VTUL / vtechworks

DSpace at Virginia Tech
http://vtechworks.lib.vt.edu

Duration of mp4 differs between Finder and VTechWorks player #736

Closed alawvt closed 3 years ago

alawvt commented 3 years ago

We recently received 3 videos whose duration differs by 1-8 seconds between the files in Finder and the VTechWorks player. An example is Preserving DH Projects: Creating an Environment for Emulation http://hdl.handle.net/10919/100640. The duration in Finder is 14:45. In YouTube it is 14:44. However, for the same file in VTechWorks, the duration is 14:53. Consequently, the captions generated in YouTube do not match the file in the VTechWorks player. I created another test item on dev with just the mp4 file and it has the same behavior: https://vtechworks-dev.cloud.lib.vt.edu/handle/10919/98790.
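For anyone reproducing the comparison, here is a minimal sketch that turns the mm:ss values above into seconds and reports the gap. The function names are made up for illustration, and the ffprobe helper assumes ffprobe is installed locally; it is not part of the VTechWorks tooling.

```shell
# Convert an mm:ss duration (as shown in Finder or a player) to whole seconds.
mmss_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 60 + $2 }'
}

# Print how many seconds longer the second duration is than the first.
duration_gap() {
    echo $(( $(mmss_to_seconds "$2") - $(mmss_to_seconds "$1") ))
}

# Print the container duration in seconds as reported by ffprobe
# (assumes ffprobe is on the PATH).
media_seconds() {
    ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Usage:
#   duration_gap 14:45 14:53      # Finder vs. VTechWorks player
#   media_seconds some_video.mp4  # hypothetical file name
```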

keithgee commented 3 years ago

To add to the puzzle: I'm seeing this behavior in Firefox and Chrome, but not in Safari.

On Mon, Oct 26, 2020 at 10:31 AM alawvt notifications@github.com wrote:

We recently received 3 videos whose duration differs by 1-8 seconds between the files in Finder and the VTechWorks player. An example is Preserving DH Projects: Creating an Environment for Emulation http://hdl.handle.net/10919/100640. The duration in Finder is 14:45. In YouTube it is 14:44. However, for the same file in VTechWorks, the duration is 14:53. Consequently, the captions generated in YouTube do not match the file in the VTechWorks player. I created another test item on dev with just the mp4 file and it has the same behavior: https://vtechworks-dev.cloud.lib.vt.edu/handle/10919/98790.


keithgee commented 3 years ago

Okay, there are some things about the video formats that I don't understand yet - but I have a lead and potential solutions for these items.

In the example video, approximately the first 8 seconds have no audio. When the mp4 is played, playback starts at roughly the 8-second spot, where the audio begins, and that point is treated as time zero. When the webm is played, it starts at the beginning of those 8 silent seconds: Alex and Corinne are still setting up, and it's quiet. So by the time the audio actually starts in the webm, we're already eight seconds into the video.

Firefox and Chrome are likely using the webm version of the video. They both support it now, and it's listed first in the "video" tag in the HTML. We should change this so that the mp4 (H.264) version is listed first, since it became the most widely supported format and the mp4s are often better quality in our old implementation. Safari uses the .mp4 version (they never got around to adding support for webm, and arguably they ultimately didn't need to do so).

My recommendation for things that can be done in a reasonable time span would be to do both of these things:

  1. Get a webm version that either doesn't have that first 8-ish seconds, or somehow skips over them, and replace the webm in VTechWorks.
  2. Switch the HTML in the video implementation in VTechWorks to prefer the mp4 version over the webm version (just list it first, basically).

Another puzzle, though:

Does the mp4 also contain these 8 seconds of video without audio, but have a code that lets the player know to skip them and start from the eight second-ish mark? If that's the case, it seems we didn't account for that when we encoded the webm version. Or was the webm version created from a different file than the current mp4 file?
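One way to look into this question is to compare the per-stream start times that ffprobe reports for the two files. The sketch below is only a suggestion: the function name is made up, the mp4 file name in the usage comment is an assumption, and whether the output actually explains the 8 seconds has not been checked in this thread.

```shell
# Print index, type, and start_time for every stream in a media file.
# Set DRY_RUN=1 to print the ffprobe command instead of running it.
stream_starts() {
    set -- ffprobe -v error \
        -show_entries stream=index,codec_type,start_time \
        -of csv=p=0 "$1"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage (the mp4 name is assumed to mirror the webm name):
#   stream_starts Kinnaman_Guimont_DLF_2020.mp4
#   stream_starts Kinnaman_Guimont_DLF_2020.webm
```

If the audio and video streams report different start times in one file but not the other, that would point toward a timing offset that was handled differently in the two versions.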

Is anyone able to implement any of the fixes? If not, I can try to trim the first 8-ish seconds off the webm video. Also, can anyone answer the puzzle of whether the current webm was created from the current mp4, or did they come from different sources?

Thanks!

alawvt commented 3 years ago

@keithgee, thank you very much for investigating this issue so quickly. It certainly is interesting. According to the provenance, the mp4 was submitted, so probably @pyc1 used the video script to create the webm. I will try to create a shortened webm in QuickTime Player.

keithgee commented 3 years ago

Thank you, @alawvt !

alawvt commented 3 years ago

Since the mp4 comes in with the shorter time, I wonder if there is something we can change in the script so that extra time is not added to the webm.

alawvt commented 3 years ago

In the YouTube player, the first word, "Hello", starts at second 1 or 2, so there definitely isn't an 8-second lead-in with the mp4.

I downloaded the webm from the item in VTechWorks. QuickTime Player can't open it so it won't be able to trim it. I will try something else to trim. It does play in VLC and has that 8 extra seconds of silence at the beginning. I ran the mp4 to webm script again, perl ./4process_mp4_for_ir.pl, and the resulting webm also has that extra 8 seconds.

keithgee commented 3 years ago

Thank you, Anne. That was a good idea. Did you download the mp4 that is in VTechWorks now and then run the .mp4 to .webm script on that version, or did you find the .mp4 file somewhere else? I'm still confused about whether the .mp4 is the one that was originally uploaded, or if a different one got in there somehow. My original hypothesis about the mp4 having those 8 seconds but skipping the first 8 seconds might be wrong; it might be that the mp4 was trimmed and replaced in VTechWorks, and then the webm never updated to match it after that.

alawvt commented 3 years ago

I downloaded the mp4 from VTechWorks. The md5 of it matches the provenance data from the original submission. Then I ran 4process_mp4_for_ir.pl on it to get the webm, which is 8 seconds longer. I am learning to use Shotcut to trim the webm.

alawvt commented 3 years ago

I found this as well that chops off the beginning part of a video. The catch here is that I don't know what the value after ss is supposed to be - I thought it was in seconds, but it seems to not work exactly. On the bright side, it's very fast. - KG

ffmpeg -ss 12.0 -i Kinnaman_Guimont_DLF_2020.webm -c copy Kinnaman_Guimont_cut.webm

alawvt commented 3 years ago

Well, that's easy and fast, although not accurate. Using -ss from 8-11 yields a duration of 14:47. Bumping it to 12 yields a duration of 14:42.

"-ss position (input/output)

"When used as an input option (before -i), seeks in this input file to position. Note that in most formats it is not possible to seek exactly, so ffmpeg will seek to the closest seek point before position. When transcoding and -accurate_seek is enabled (the default), this extra segment between the seek point and position will be decoded and discarded. When doing stream copy or when -noaccurate_seek is used, it will be preserved." - http://ffmpeg.org/ffmpeg.html
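Getting the same duration for every -ss value from 8 to 11 would be consistent with this: with stream copy, each of those positions may be snapping back to the same earlier keyframe. The sketch below lists the keyframe timestamps of the video stream and picks the last one at or before a requested position. The function names are made up, and the timestamps in the usage comment are invented for illustration, not measured from the actual file.

```shell
# List the timestamps (seconds) of keyframe packets in the first video stream.
# Assumes ffprobe is on the PATH; keyframe packets carry a "K" in their flags.
keyframe_times() {
    ffprobe -v error -select_streams v:0 \
        -show_entries packet=pts_time,flags -of csv=p=0 "$1" |
        awk -F, '$2 ~ /K/ { print $1 }'
}

# Read timestamps on stdin and print the latest one that is <= the target,
# i.e. the seek point an input-side -ss with -c copy would fall back to.
snap_to_keyframe() {
    awk -v target="$1" '
        $1 + 0 <= target + 0 && (!found || $1 + 0 > best + 0) { best = $1; found = 1 }
        END { if (found) print best }
    '
}

# Usage:
#   keyframe_times Kinnaman_Guimont_DLF_2020.webm | snap_to_keyframe 8
#   printf '0\n6.0\n11.5\n' | snap_to_keyframe 8    # invented timestamps
```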

However, the approach described at http://trac.ffmpeg.org/wiki/Seeking, which I don't completely understand, seems to work: putting the -ss after the -i.

ffmpeg -i Kinnaman_Guimont_DLF_2020.webm -ss 8.0 -c copy Kinnaman_Guimont_DLF_2020_cut.webm

yields a time of 14:44 and the caption seems to be at the right time.
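Since the same fix is needed for the other videos, the command can be wrapped in a small helper. This is a sketch only: the function name and the DRY_RUN switch are made up here, and the right number of seconds to drop would need to be checked per video.

```shell
# Drop the first N seconds of a file without re-encoding, with -ss placed
# after -i as in the command above.
#   $1 = input file, $2 = seconds to drop, $3 = output file
# Set DRY_RUN=1 to print the ffmpeg command instead of running it.
trim_start() {
    set -- ffmpeg -i "$1" -ss "$2" -c copy "$3"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage:
#   trim_start Kinnaman_Guimont_DLF_2020.webm 8.0 Kinnaman_Guimont_DLF_2020_cut.webm
```

Running it first with DRY_RUN=1 shows the exact command before any file is written.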

I'd still like to figure out why this occurred for these particular mp4s and how to avoid it in the future. But it is great to have this quick way to fix it. I am going to fix the other two tomorrow.

Thank you very much!

alawvt commented 3 years ago

I have trimmed these webms and updated our video processing documentation with instructions for this fix. It still remains to figure out why this occurred. I have added to the backlog:

@keithgee, thank you very much for all your analysis and research on this issue.