azhurb / stalker_portal

A project that provides quick and effectively organized access to video services based on IP networks.
http://www.infomir.eu/eng/products/free-middleware-stalker/

Inefficient TV archive recording #122

Closed javiermatos closed 9 years ago

javiermatos commented 9 years ago

Hi,

I started using stalker portal with a second server for storage. I have developed systems in the past that faced the "multiple write operations" problem, and that is exactly what you have in your storage server: many channels getting recorded at the same time, causing hundreds of writes to the disks. The HDD head has to move from one location to another, and moving that head is the slowest, least efficient operation there is. The consequences are: HDDs with a shorter lifespan, slower servers, fewer customers per server and more money spent.

There is a tiny and stupidly simple solution for that: use RAM buffers that accumulate big chunks of the video recording and move them to the HDD in an efficient way.

I have seen that you have implemented your logic in two places: tvarchiverecorder.class.php and dumpstream. You can simply use the "buffer" Linux utility in the following way:

Stream -> dumpstream ---(pipe)---> buffer -> HDD

You can redirect dumpstream's stdout to buffer and save the result to the HDD, as shown in the following example:

python dumpstream -a 239.255.0.1 -p 1234 -n 22 | buffer -s 256K -m 10M -p 80 -o output.mpg

Once you install buffer (apt-get install buffer; it is a 72 kB utility) you have these options:

Usage: buffer [-B] [-t] [-S size] [-m memsize] [-b blocks] [-p percent] [-s blocksize] [-u pause] [-i infile] [-o outfile] [-z size] [-Z] [-d]

-B = blocked device - pad out last block
-t = show total amount written at end
-S size = show amount written every size bytes
-m size = size of shared mem chunk to grab
-b num = number of blocks in queue
-p percent = don't start writing until percent blocks filled
-s size = size of a block
-u usecs = microseconds to sleep after each write
-i infile = file to read from
-o outfile = file to write to
-z size = combined -S/-s flag
-Z = seek to beginning of output after each 1GB (for some tape drives)
-d = print debug information to stderr
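For illustration, here is a sketch of how the same pipeline could be built from Python with subprocess (my own example, not code from stalker_portal; the dumpstream and buffer flags are copied verbatim from the command above):

import subprocess

def record_with_buffer(output='output.mpg'):
    # Start dumpstream with its stdout connected to a pipe.
    dump = subprocess.Popen(
        ['python', 'dumpstream', '-a', '239.255.0.1', '-p', '1234', '-n', '22'],
        stdout=subprocess.PIPE,
    )
    # buffer reads the pipe in 256K blocks through a 10M shared-memory
    # queue and does not start writing until the queue is 80% full.
    buf = subprocess.Popen(
        ['buffer', '-s', '256K', '-m', '10M', '-p', '80', '-o', output],
        stdin=dump.stdout,
    )
    dump.stdout.close()  # so buffer sees EOF when dumpstream exits
    return buf.wait()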

Now the question is... is it safe for me to just edit tvarchiverecorder.class.php and change the dumpstream call to use buffer? Is there any side effect, anything I could break by doing so?

Regards, Javier

javiermatos commented 9 years ago

I have created a pull request with the changes that are needed to make the dumpstream Python program use buffering and reduce I/O operations with the disks. The default is an 8 MB buffer, so you improve your server's performance at the cost of 8 MB of RAM for each channel you are recording. I think it is a good tradeoff.
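To give a feel for the tradeoff (rough arithmetic, assuming a typical 5 Mbit/s channel): an 8 MB buffer holds about 13 seconds of stream, so the disk sees roughly one large sequential write every 13 seconds per channel instead of a constant stream of tiny ones. The core of the change is just opening the output file with an explicit buffer size; a minimal sketch, not the literal diff:

import io

BUFFER_SIZE = 8 * 1024 * 1024  # the proposed 8 MB default

def open_recording(path, buffering=BUFFER_SIZE):
    # 'ab' appends the raw stream bytes; the buffering argument (in bytes)
    # controls how much data Python accumulates before each disk write.
    return io.open(path, 'ab', buffering)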

I'm actually using it myself. I did some preliminary tests (in independent code) and it works as expected. I'm also testing it in a test environment with a dedicated storage server and it is OK.

azhurb commented 9 years ago

Thank you for such a detailed description. We need a little time to investigate this problem.

javiermatos commented 9 years ago

Hello again,

I wrote a small Python program so you can check the behavior of the buffer when writing content to disk. The program just creates a file using a custom buffer size. With a small buffer size you can see that it makes many disk writes, but the larger the buffer, the fewer the write operations.

Use -s or --size for the size of the file it creates (in MB) and -b or --buffering for the buffer size (in MB). You don't need to specify -c or --chunk-size. You can see that with a 100 MB file size and a 10 MB buffer you only make 10 disk writes; without buffering you would be making many more (more stress on the disks, less efficiency). I hope this helps with your analysis.

import argparse
import io
from datetime import datetime

MEGABYTES = 1024 * 1024

def buffered_write(filename, size, buffering, chunk_size=1):
    # Append `size` bytes to `filename` in `chunk_size`-byte pieces.
    # `buffering` (in bytes) is passed straight to io.open, so the OS
    # only sees a write once the internal buffer fills up.
    c_size = 0
    with io.open(filename, 'ab', buffering) as f:
        while size > c_size:
            f.write(b'a' * chunk_size)
            c_size += chunk_size

def main():
    parser = argparse.ArgumentParser(description='Python buffering test')
    parser.add_argument('-f', '--filename', type=str)
    parser.add_argument('-s', '--size', type=int, default=64)       # MB
    parser.add_argument('-b', '--buffering', type=int, default=1)   # MB
    parser.add_argument('-c', '--chunk-size', type=int, default=1)  # bytes
    args = parser.parse_args()

    if not args.filename:
        args.filename = 'buffering_%s' % datetime.utcnow().strftime('%Y%m%d%H%M%S')
    args.size *= MEGABYTES
    args.buffering *= MEGABYTES

    buffered_write(args.filename, args.size, args.buffering, args.chunk_size)

if __name__ == '__main__':
    main()
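If you want to see the effect directly, you can count the actual write() system calls with strace (assuming you saved the script as buffering_test.py; the name is mine):

strace -f -c -e trace=write python buffering_test.py -s 100 -b 10

With a 10 MB buffer you should see on the order of 10 writes for the 100 MB file; with -b 0 (unbuffered) every one-byte chunk becomes its own system call (use a small -s if you try that).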
azhurb commented 9 years ago

It looks like Python (on my test server) uses 8 MB buffering by default, so nothing actually changed for dumpstream with this option.

azhurb commented 9 years ago

But we can add an option to config.php to configure the buffering.

azhurb commented 9 years ago

Sorry, my mistake. By default it is only 8 KB (or 4 KB), not 8 MB.
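You can verify it quickly from the interpreter (a CPython-specific check of my own):

import io
import os

print(io.DEFAULT_BUFFER_SIZE)   # 8192 bytes (8 KB) in CPython
# For regular files Python prefers the filesystem block size instead,
# typically 4096 bytes (4 KB) on Linux:
print(os.stat('.').st_blksize)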

javiermatos commented 9 years ago

I can make the changes to expose the buffer size in config.php. What name do you want me to use? DUMPSTREAM_BUFFER, for instance? I will create it and leave 8 MB as the default in case the value is not defined.

azhurb commented 9 years ago

We have already added https://github.com/azhurb/stalker_portal/commit/a835ae2a29b710dff4bf1b27ea2cafd3d53e9f8f

azhurb commented 9 years ago

Thanks again!