georgmartius / vid.stab

Video stabilization library
http://public.hronopik.de/vid.stab/

Possibilities to reduce memory usage #54

Open ochilan opened 6 years ago

ochilan commented 6 years ago

Hello,

I'm using vid.stab with ffmpeg to stabilize VR recordings in order to reduce high-frequency jitter, which generally works quite well for me. I would like to keep using ffmpeg since it offers codecs (GPU encoding) and other options (colorspace conversion, etc.) that I need. However, this requires me to use vid.stab in two-pass mode.
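For reference, the two-pass workflow through ffmpeg looks roughly like this (filenames and the NVENC encoder are illustrative; `vidstabdetect` and `vidstabtransform` are ffmpeg's filter wrappers around vid.stab):

```shell
# Pass 1: motion detection; writes per-frame local motions to transforms.trf
# (this is the file that grows to several GB for multi-hour recordings)
ffmpeg -i input.mp4 -vf vidstabdetect=result=transforms.trf -f null -

# Pass 2: read transforms.trf, smooth and apply the transforms while encoding
# (h264_nvenc shown as one example of a GPU encoder)
ffmpeg -i input.mp4 -vf vidstabtransform=input=transforms.trf:smoothing=30 \
       -c:v h264_nvenc output.mp4
```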

The main problem I have is that stabilizing, say, a two- or three-hour recording in 1080p60 requires a lot of memory. The file written out by the first pass is a couple of GB on disk, and the transform pass then requires more RAM than my recording machine has. The last time, it tried to allocate 20 GB of RAM in the transform step while the machine "only" has 16 GB.

Now I would like to evaluate possibilities for reducing the memory requirements, so I inspected the code a little bit. I understand that the transform step requires a certain sliding window over the transforms in order to work. However, it seems like the first step in the transform pass is to load all the local motions into RAM and then convert them to transforms, even though the conversion itself only needs the local motions of each frame separately. Am I correct?

If this is the case, it would probably already help a lot if this part of the process read the local motions and converted them to transforms on the fly, without first loading all local motions into RAM. Another possibility might be to move the conversion to transforms into the detection step and write out the transforms, which should be much smaller.
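To sketch what I mean by "on the fly": the loop below keeps only one frame's local motions alive at a time and accumulates just the per-frame transforms. Types and names are simplified stand-ins, not vid.stab's real API (the real conversion is a robust estimate, not a plain mean), and `FrameReader` is a placeholder for "parse one frame's record from the .trf file":

```c
#include <stddef.h>

/* Illustrative stand-ins for vid.stab's data structures; the real types
 * (LocalMotion, VSTransform) have more fields. */
typedef struct { double x, y; } LocalMotion;
typedef struct { double x, y; } Transform;

/* Reduce one frame's local motions to a single transform. vid.stab uses a
 * robust estimate; a plain mean is enough to show the data flow. */
static Transform motions_to_transform(const LocalMotion *lm, size_t n) {
    Transform t = {0.0, 0.0};
    for (size_t i = 0; i < n; i++) {
        t.x += lm[i].x;
        t.y += lm[i].y;
    }
    if (n > 0) {
        t.x /= (double)n;
        t.y /= (double)n;
    }
    return t;
}

/* A reader yields one frame's local motions per call (0 motions = EOF).
 * In the real code this would parse one frame's record from the file. */
typedef size_t (*FrameReader)(LocalMotion *buf, size_t cap, void *ctx);

/* Streaming variant of the first stage: only one frame's local motions are
 * in memory at a time; what accumulates is the (much smaller) transforms. */
static size_t convert_stream(FrameReader next_frame, void *ctx,
                             Transform *out, size_t max_frames) {
    LocalMotion buf[1024];          /* scratch for ONE frame's motions */
    size_t nframes = 0;
    size_t n;
    while (nframes < max_frames && (n = next_frame(buf, 1024, ctx)) > 0) {
        out[nframes++] = motions_to_transform(buf, n);
    }
    return nframes;
}

/* Tiny in-memory reader, just for trying out convert_stream. */
typedef struct {
    const LocalMotion *data;
    const size_t *counts;           /* motions per frame */
    size_t nframes, frame, pos;
} MemReader;

static size_t mem_next_frame(LocalMotion *buf, size_t cap, void *vctx) {
    MemReader *r = (MemReader *)vctx;
    if (r->frame >= r->nframes) return 0;
    size_t n = r->counts[r->frame];
    if (n > cap) n = cap;
    for (size_t i = 0; i < n; i++) buf[i] = r->data[r->pos + i];
    r->pos += r->counts[r->frame];
    r->frame++;
    return n;
}
```

With this shape, peak memory for the first stage is bounded by the largest single frame's motion list plus the transform array, instead of the whole local-motions file.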

Are these observations correct? Do you think that this would be something worth looking into?

Best regards Ochi

georgmartius commented 6 years ago

Hi Ochi, your observations are correct. The loading of the transforms could be done in a streaming fashion. I only implemented it this way because it was simpler and I did not anticipate such long videos. In earlier versions I computed the transforms in the detection step, but that does not allow changing parameters such as smoothness in the second step, so I prefer the way it is now. If you have time to improve the implementation, go ahead. It would also be great to have a binary version of the transform output, because it gets huge, as you have seen. I am happy to assist; I just don't have time to work on it myself.

Georg
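A binary transform output could be as simple as one fixed-size record per frame, read and written with a single `fwrite`/`fread` each. The struct below is a simplified stand-in for vid.stab's transform type, not the real format; a production format would also want a magic number, a version field, and a fixed endianness so files are portable between machines:

```c
#include <stdio.h>

/* Simplified per-frame transform record (the real VSTransform has a few
 * more fields). At 32 bytes per frame, a 3 h 60 fps clip (~648,000 frames)
 * is roughly 20 MB on disk -- far smaller than the current text form. */
typedef struct { double x, y, alpha, zoom; } Transform;

/* Write n transforms as packed binary records; returns records written. */
static size_t write_transforms_binary(FILE *f, const Transform *t, size_t n) {
    return fwrite(t, sizeof(Transform), n, f);
}

/* Read up to n transforms back; returns records actually read, so the
 * caller can also consume the file in small chunks (streaming). */
static size_t read_transforms_binary(FILE *f, Transform *t, size_t n) {
    return fread(t, sizeof(Transform), n, f);
}
```

Besides the size, this also removes the text parsing from the second pass, and chunked `fread` calls combine naturally with the streaming conversion discussed above.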