Jerking video when smoothing with pan

georgmartius / vid.stab

Video stabilization library

http://public.hronopik.de/vid.stab/

Jerking video when smoothing with pan #88

Open mpanighel opened 4 years ago

mpanighel commented 4 years ago

Hi Georg, firstly thank you for this great library! I am trying to stabilize cycling videos as well as possible, similarly to what you reported here, with the latest version of vidstab+ffmpeg (I also tried the older transcode version). Without smoothing during transform, the overall stabilization is clearly rather poor. With smoothing, on the other hand, the overall stabilization is great, but I get a jerking effect on the fast-moving parts of the frame near the edges (where there is more 'pan' than 'forward' movement), so that in these regions the less shaky parts of the original actually look better. I partly overcame this effect by using the older vidstab in transcode (in which the .trf files are in the older, more comprehensible format) and then smoothing the trajectory in the .trf file (with an external piece of code) before applying the transform. The overall result is better concerning jerking, but not much better concerning stabilization. I guess the new version of vidstab, which uses the new transform file format (with local motion lists), is better at detecting shakiness, but, on the other hand, I am not able to understand the new .trf file and thus cannot post-process it. I wanted to know if I am doing something wrong, and also which parameters you used to best deshake the movie above (as in that video you do not have any jerking even in the panning part). Thanks!

georgmartius commented 4 years ago

Can it be that these shaky bits near the boundaries come from the lens distortion? Fisheye lenses and other wide-angle lenses create a non-linear transformation that becomes strongly visible when translating the image. I think you need to have smoothing high to get good results. Maybe you can run a compensation for the barrel-transform of the lens first.

Can you tell me what you did in the manual/external manipulation of the transforms to get better results? From what I understand, this should not really help. Can you upload or share your original and stabilized video somewhere?

mpanighel commented 4 years ago

Thank you for your quick reply!

Can it be that these shaky bits near the boundaries come from the lens distortion? I think you need to have smoothing high to get good results.

Indeed the camera has quite a wide angle. I did some tests with higher smoothing, but it appears that smoothing is actually the cause. Here are the videos (and the original transforms) and the parameters for a short clip in which jerking is clearly visible due to a 'panning' movement, when the city landscape is framed. It is also slightly visible on the sides when going forward (e.g. the bus shelter on the right in the first second), and also when the camera is moved more quickly. In general, for regular forward movement, the less shaky sections are actually worse in the stabilized video than in the original due to this effect (I could give an example of this as well). Here vidstabdetect has been called with shakiness=10:accuracy=15:stepsize=6, then vidstabtransform with optzoom=2:interpol="bicubic",unsharp=5:5:0.8:3:3:0.4 and the smoothing indicated in the file name (I already made sure that optzoom and interpol are not responsible for this effect, and also did some trials with the vidstabdetect parameters). I used ffmpeg-git-20200617-amd64-static from https://johnvansickle.com/ffmpeg/.

Maybe you can run a compensation for the barrel-transform of the lens first.

I will surely try that. Should I use lenscorrection/lensfun ffmpeg filters? I do not know if it would do much for the panning above, as it is in the center of the frame anyway?

Can you tell me what you did on the manual/external manipulation of the transforms to get better results. Because from what I understand this should not really help.

I just did a moving average on dx and dy of the .trf file. The improvement in jerking is not that large anyway (and I guess the detection is also somewhat limited in that old transcode version, so overall the result is not much improved), and I would say it is actually a no-go. Anyway, it seemed that the transform values for dx and dy drop to zero at regular intervals, every few frames. I do not know if that is intended and if it also occurs in the (newer version) trf file of the clip above.
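The moving average mentioned above could look roughly like the following sketch. It assumes the dx (or dy) values have already been parsed from the .trf file into a plain list of floats; the function name `moving_average` and the window size are illustrative choices, not part of vid.stab or of the poster's actual code.

```python
def moving_average(values, window=5):
    """Centered moving average; the window shrinks near both ends of the list."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical per-frame dx values, only for illustration
dx = [0.0, 3.0, 6.0, 3.0, 0.0]
dx_smooth = moving_average(dx, window=3)
```

Note that a plain average like this also smears any frame whose value is wrongly zero into its neighbours, which may be one reason it does not remove the jerking.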

georgmartius commented 4 years ago

The results are indeed not satisfying. I think it is really the detection that does not work well. I would do the detect step with show=2 and saving the output to a dummy-video. This contains information about the detection. I am sure we will see what went wrong. My guess: try a smaller value of shakiness, e.g. 6. This big value causes big patches to be used for comparison and there might be too much sky etc in them. Anyway. Try the show=2 thing and upload the result.

You can also run the transform step with debug=1; then you get the original transform file, see https://ffmpeg.org/ffmpeg-filters.html#vidstabtransform-1. The dx and dy should not drop to 0. It would typically give a warning if there is too little contrast or another reason for no detection.

mpanighel commented 4 years ago

I added a new folder shakiness6... at the same link with the dummy, the stabilized and the global motions with shakiness=6 and then smoothing=15. The effect is still there and you guessed right, the problem is in the detection.

There is also an image with the plot of dx and dy. Indeed, every 6 frames (or 0.1 seconds) all the global transforms drop to zero. Looking more closely at the original "local motions" file orginal.mov_shakiness6.trf, exactly these frames (5, 11, 17 and so on...) have every LM in the list of the form LM 0 0 x x x x 0.000000, and indeed, when playing the stabilized video slowly, the jerking occurs every 6 frames.
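A check for this kind of periodic dropout might be sketched as follows. It assumes the global motions are already available as a list of (dx, dy) pairs, one per frame; the helper names and the synthetic data are hypothetical and only mimic the pattern described above.

```python
def zero_motion_frames(motions, eps=1e-9):
    """Return the indices of frames whose dx and dy are both (near) zero."""
    return [i for i, (dx, dy) in enumerate(motions)
            if abs(dx) < eps and abs(dy) < eps]


def common_gap(indices):
    """Return the spacing if the indices are equally spaced, else None."""
    gaps = {b - a for a, b in zip(indices, indices[1:])}
    return gaps.pop() if len(gaps) == 1 else None


# Synthetic data: non-zero motion everywhere except frames 5, 11, 17
motions = [(0.0, 0.0) if i % 6 == 5 else (1.5, -0.7) for i in range(20)]
dropped = zero_motion_frames(motions)
period = common_gap(dropped)
```

A constant gap between the zero frames points to a systematic cause rather than to difficult image content.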

As a test, I opened and modified (with some external piece of code) the global transforms so that each time they are zero they are replaced by the average of the values of the previous and the next frame. The resulting transform is global_motions_fixed.trf (see fixed_transform folder). Using this trf as input for vidstabtransform, the jerking effect is indeed gone and the video is correctly stabilized, or at least almost, since the information of one frame in every six is (possibly) lost.
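The neighbour-averaging fix described above can be sketched like this. It is not the poster's actual code: it assumes the global transforms were parsed into a list of (dx, dy) pairs, and `fill_zero_transforms` is a hypothetical name. Reading and writing the .trf file itself is left out, since its exact layout is not given in this thread.

```python
def fill_zero_transforms(motions, eps=1e-9):
    """Replace all-zero (dx, dy) entries by the mean of their two neighbours.

    The first and last frames are left unchanged because they have only one
    neighbour. Neighbours are read from the original list, so two zero frames
    in a row would still be averaged against a zero.
    """
    fixed = list(motions)
    for i in range(1, len(motions) - 1):
        dx, dy = motions[i]
        if abs(dx) < eps and abs(dy) < eps:
            (px, py), (nx, ny) = motions[i - 1], motions[i + 1]
            fixed[i] = ((px + nx) / 2, (py + ny) / 2)
    return fixed


# Hypothetical values: the middle frame lost its detection
motions = [(2.0, 1.0), (0.0, 0.0), (4.0, 3.0)]
fixed = fill_zero_transforms(motions)
```

Linear interpolation between neighbours is the simplest choice here; it only recovers an estimate, so the true motion of the affected frame is still unknown.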

georgmartius commented 4 years ago

Indeed a very good observation and a good fix. I need to add this postprocessing in the transform phase to check for such "missing" frame information. The video does not look so complicated to me, so I need to check why these frames are not detected.

mpanighel commented 4 years ago

I do not think it is a problem of the particular movie's 'appearance' either. Indeed, the 'missing' frames are regularly distributed, as if there were some glitch in the computation (in this case every 6 frames, or 100 ms, as the movie is 60fps).

karbiv commented 3 years ago

The original video repeats every 5th frame (each 6th frame is identical to the 5th). That could happen after converting the original video from 50fps to 60fps.
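The arithmetic behind this guess can be sketched as follows, assuming a converter that simply repeats the nearest earlier source frame instead of blending. The function `source_index` is hypothetical, and the exact position of the duplicates in a real file depends on the converter's rounding and on where the clip was cut.

```python
def source_index(k, src_fps=50, dst_fps=60):
    """Source frame shown at output frame k when frames are repeated, not blended."""
    return (k * src_fps) // dst_fps


# Map the first 18 output frames (60 fps) back to source frames (50 fps)
out = [source_index(k) for k in range(18)]

# Output frames that show the same source frame as their predecessor
duplicates = [k for k in range(1, len(out)) if out[k] == out[k - 1]]
```

One output frame in every six is a repeat, which matches the 6-frame period of the missing detections.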

Or it could be an internal bug of the camera. Both the 50 and 60 FPS standards exist historically due to the two main TV standards in the world: PAL (50) and NTSC (60, USA and some others). Old TV sets depended on the power grid frequency of a country, hence the difference. The only country in the world that has two separate electrical grids with 50 and 60 Hz is Japan. Different parts of the country bought industrial equipment from Germany and the USA, so the two standards co-exist. As a side effect, this contributed to the higher reliability of Japanese electronics in the past.

Another strange thing about the video: the shades are too dark. But that's a completely different story. People around the world often have no idea that the original videos from their cameras may contain additional detail in dark and bright colors that is not shown even in video players. This is also caused by earlier legacy TV standard requirements to narrow the color signal levels. At least, this observation applies to cameras that save videos in the MP4 file format. Converting those videos with the FFMPEG option "-src_range 1" makes more detail visible in the output videos. Not sure about MOV videos.

karbiv commented 3 years ago

The main fatal mistake is to use the *.mov format. The best format for per-frame editing and transcoding without quality loss is h264. Video files from a camera should always be converted to h264, and tried with the src_range 1 option to check whether more visual detail appears. To test stabilization, original.mov should first be converted to h264 in an mkv container:

ffmpeg -i original.mov -c:v libx264 -crf 16 orig264.mkv

-crf 16 is an option of the libx264 codec ("constant rate factor"); at 16 it gives visually "lossless" transcoding quality. In the video, each 6th frame is a duplicate of the preceding 5th frame, most likely after a conversion from 50fps to 60fps. The video was cut out of some bigger video, so the first few frames may not be aligned with the 6-frame duplicate boundary. Those starting frames should be dropped, so that an FFMPEG filter can then drop each 6th duplicate frame:

ffmpeg -ss 00:00:00.083 -i orig264.mkv -vf select='mod(n+1\,6)' -r 50 -crf 16 -y sel.mkv

This drops several starting frames to align the video for the select filter, which drops each 6th duplicate frame. The video becomes 50fps again, so the output's framerate must be adjusted with the -r 50 option. mod in the filter is the modulus operation (a built-in function of FFMPEG expressions), so for each 6th frame the expression returns zero and that frame is skipped in the output.
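To see which frames the select expression keeps, here is a small sketch that mirrors its arithmetic outside of ffmpeg. The function `kept_by_select` is a hypothetical stand-in for the filter expression, with n counted from 0 as in the filter.

```python
def kept_by_select(n, period=6):
    # Mirrors the expression mod(n+1, 6): a frame is kept when it is non-zero
    return (n + 1) % period != 0


# Frame indices that survive among the first 12 input frames
kept = [n for n in range(12) if kept_by_select(n)]

# Five of every six frames remain, so 60 fps input gives 50 fps output
out_fps = 60 * 5 / 6

# The -ss offset of 0.083 s corresponds to about this many 60 fps frames
skipped = round(0.083 * 60)
```

Frames 5 and 11 are removed in this range, which is why the seek offset has to place the duplicates exactly on those positions.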

ffmpeg -i sel.mkv -vf vidstabdetect=shakiness=10:accuracy=15:mincontrast=0.1:show=1 -crf 16 show.mkv
ffmpeg -i sel.mkv -vf vidstabtransform=smoothing=18 -crf 16 out.mkv

Libvidstab still can't handle that panning part perfectly, because it's a perspective pan, where objects at different distances move at different speeds. Another defect remains, called "rolling shutter": a slightly wobbly image. It's a characteristic of all modern cameras (after ~2008-2011), including even the Sony a7. It can be easily fixed, but that would add an additional pass for libvidstab and 2-3 more options to set.

To deal with "perspective panning", some user interface would have to be developed where the user could click on boxes to disable them or to edit some parameters. That would be different software by definition.