daooshee / HD-VG-130M

The HD-VG-130M Dataset

Questions about the cut_videos_hdvg.py #2

Open zeyuanchen23 opened 6 months ago

zeyuanchen23 commented 6 months ago

Thanks for providing this valuable dataset. I tried your cut_videos_hdvg.py but ran into a few issues.

  1. The script runs very slowly. It took me around 1 min to process one video.
  2. The processed clips are very large. For instance, the video "-6Yn5t87K08" is 19M originally, but its clips total 79M.

I am not sure whether I have set everything up correctly. Any advice would be highly appreciated!

zizixixi commented 6 months ago

Thanks for your interest in our work. We checked the problems you reported and found that they also occur in our experiments. According to our analysis, they are mainly caused by ffmpeg during video processing.

  1. The original videos in HD-VG have a relatively high resolution (1080p), so processing them is time-consuming. Retrieving files from such a large collection also takes time. Modifications such as multi-processing or multi-threading may speed things up.
  2. We also observed the increase in size. For example, "----meyKR48" at 49.29MiB turns into 194 clips totaling 201.75MiB. On the one hand, this may be caused by different coding schemes: ffmpeg may change the coding scheme during processing (use "ffmpeg -i xxx.mp4" on the original and on a clip to check). Explicitly setting the coding scheme instead of relying on the defaults may help. On the other hand, the timestamps used to split a video may not fall on key frames, so adjacent clips may contain duplicated frames because of rounding.
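
A minimal sketch of the two suggestions above, assuming ffmpeg and ffprobe are on the PATH. The function names, the libx264/CRF/preset values, the audio codec, and the worker count are illustrative assumptions, not what cut_videos_hdvg.py currently does. Setting the codec and quality explicitly keeps the output bitrate under control, and because each clip is re-encoded rather than stream-copied, the cut does not have to snap to a key frame.

```python
import json
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_cut_command(src, dst, start, end, crf=23, preset="veryfast"):
    """Build an ffmpeg argument list that re-encodes src[start:end] into dst.

    start and end are in seconds. crf and preset are assumed values; tune them
    to trade file size against quality and speed.
    """
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",        # seek on the input side (before -i) for speed
        "-i", src,
        "-t", f"{end - start:.3f}",   # clip duration, not an absolute end time
        "-c:v", "libx264",            # explicit video codec instead of the default
        "-crf", str(crf),
        "-preset", preset,
        "-c:a", "aac",                # assumed audio codec
        dst,
    ]


def cut_clips(jobs, max_workers=4):
    """Run several cuts concurrently; jobs is a list of (src, dst, start, end).

    Threads are sufficient here because the heavy work happens in the ffmpeg
    child processes. Returns the ffmpeg exit codes in job order.
    """
    def run(job):
        cmd = build_cut_command(*job)
        done = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        return done.returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, jobs))


def probe_video_stream(path):
    """Return codec, resolution and bitrate of the first video stream via ffprobe."""
    out = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-select_streams", "v:0",
            "-show_entries", "stream=codec_name,width,height,bit_rate",
            "-of", "json",
            path,
        ],
        capture_output=True, text=True, check=True,
    ).stdout
    streams = json.loads(out).get("streams", [])
    return streams[0] if streams else {}
```

Calling probe_video_stream on an original video and on one of its clips shows whether the codec or bitrate changed, which is the same check as reading the "ffmpeg -i" output by eye. cut_clips takes a list such as [("in.mp4", "clip_000.mp4", 0.0, 4.2), ...], where the paths and timestamps are placeholders for whatever the dataset metadata provides.
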

daooshee commented 6 months ago

Thanks zixi @zizixixi for helping to answer this question.

zeyuanchen23 commented 6 months ago

Thanks for the clarification! Are you planning to update your video clipping script to fix these issues?