critrolesync / critrolesync.github.io

https://critrolesync.github.io
MIT License

Inconsistency in podcast episode duration for The Mighty Nein Reunited Part 1, C3E42, C3E51 #93

Closed: jpgill86 closed this issue 1 year ago

jpgill86 commented 1 year ago

Copied from commit message for d72ab83 (The Mighty Nein Reunited Part 1):

There is a new issue: the podcast episode's duration is reported inconsistently. Because of this, the last two podcast timestamps had to be obtained manually. This issue appears to be different from the ABR vs. CBR issue.

This podcast episode's duration is reported as 4:19:38 by the podcast feed, by Spotify in both Chrome and Firefox, and by Google Podcasts in Chrome.

The duration is reported as 4:23:27 by librosa.get_duration, by Google Podcasts in Firefox, by VLC, and by the file properties dialog in Windows Explorer.

The podcast file format is MP3.

jpgill86 commented 1 year ago

For both The Mighty Nein Reunited Part 1 and C3E42, the duration reported by librosa.get_duration and some media players is 1.47% longer than that reported by Spotify and the podcast feed itself.

jpgill86 commented 1 year ago

For C3E51, the duration reported by librosa.get_duration is 4:30:30, whereas for Spotify in Chrome, VLC in Windows, and the podcast feed, it's 4:28:53. The former is 0.6% longer than the latter.
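The percentages quoted in the two comments above can be reproduced by converting each H:MM:SS duration to seconds. This is a small sketch that assumes the displayed values are exact to the second; `to_seconds` and `percent_longer` are hypothetical helpers, not part of the repository.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' (or 'MM:SS') string to a whole number of seconds."""
    seconds = 0
    for part in hms.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def percent_longer(longer: str, shorter: str) -> float:
    """How much longer, in percent, the first duration is than the second."""
    return (to_seconds(longer) / to_seconds(shorter) - 1) * 100


# The Mighty Nein Reunited Part 1: librosa/VLC value vs. feed/Spotify value
print(f"{percent_longer('4:23:27', '4:19:38'):.2f}%")  # 1.47%
# C3E51: librosa value vs. feed/Spotify/VLC value
print(f"{percent_longer('4:30:30', '4:28:53'):.2f}%")  # 0.60%
```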

If, in get_absolute_slice_times, I simply override the incorrect duration returned by librosa.get_duration with the duration reported by the podcast feed, like this:

    # podcast_duration = get_duration(filename=podcast_file)
    podcast_duration = Time('4:28:53')  # OVERRIDE FOR C3E51

the autosync succeeds.

Perhaps this method would have worked on The Mighty Nein Reunited Part 1 and C3E42 too.

jpgill86 commented 1 year ago

librosa.get_duration uses another third-party package, soundfile, to get durations.

That duration seems to be calculated here: https://github.com/bastibe/python-soundfile/blob/0f606ed91a34c9e72c8b756fb92b2f9e389e5620/soundfile.py#L417

    self.duration = float(self.frames)/f.samplerate

I presume that there is something wrong with frames or samplerate in these problematic podcast episodes.
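To illustrate the `frames / samplerate` formula itself, here is a minimal stdlib-only sketch using Python's `wave` module on an in-memory WAV file. It is only an illustration under that assumption: WAV is used because the standard library cannot decode MP3, so this does not reproduce the MP3 discrepancy; it just shows that any error in the frame count or sample rate passes straight through to the computed duration. `wav_duration` and `make_silent_wav` are hypothetical helpers.

```python
import io
import wave


def wav_duration(data: bytes) -> float:
    """Duration of a WAV file in seconds, computed as frames / samplerate."""
    with wave.open(io.BytesIO(data), "rb") as w:
        return w.getnframes() / w.getframerate()


def make_silent_wav(seconds: float, samplerate: int = 8000) -> bytes:
    """Build an in-memory mono, 16-bit WAV file containing silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample
        w.setframerate(samplerate)
        w.writeframes(b"\x00\x00" * int(seconds * samplerate))
    return buf.getvalue()


print(wav_duration(make_silent_wav(1.5)))  # 1.5
```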

jpgill86 commented 1 year ago

Overriding librosa.get_duration's result with the feed's duration in get_absolute_slice_times does indeed work for The Mighty Nein Reunited Part 1 and C3E42 as well.

It seems that the podcast feed reports the correct duration every time, whereas librosa.get_duration always reports a longer one. Usually the difference is small enough that no problems occur (sometimes the duration ratio is even close to 128/127.7... could this be related to #5, or am I seeing patterns where there are none?), but other times it is longer by many minutes. There seems to be no pattern to how large the discrepancy is.

Since librosa.get_duration is used to decide where to slice the ending of the podcast (2 minutes before the end), a large enough discrepancy causes the ending slice to cut into the outro (which is unique to the podcast and flummoxes the sync) or even to start after the file has ended. SP5 (The Mighty Nein Reunited Part 1) is an extreme example, with a nearly 12-minute duration discrepancy. (Note that librosa.get_duration reports a duration of 4:31:25 for SP5 today, rather than the 4:23:27 stated in this thread's first post, so the file likely changed in recent months.)
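A rough sketch of why an overestimated duration breaks the ending slice, assuming the slice simply starts 2 minutes (120 s) before whatever duration is supplied. `ending_slice_start` is a hypothetical helper, not the repository's actual code; the numbers are the feed's 4:19:38 from the first post and librosa's current 4:31:25 for SP5.

```python
def ending_slice_start(duration_s: float, offset_s: float = 120.0) -> float:
    """Hypothetical: the ending slice starts `offset_s` seconds before `duration_s`."""
    return duration_s - offset_s


feed_duration = 4 * 3600 + 19 * 60 + 38     # 4:19:38 (podcast feed) = 15578 s
librosa_duration = 4 * 3600 + 31 * 60 + 25  # 4:31:25 (librosa.get_duration) = 16285 s

for label, duration in [("feed", feed_duration), ("librosa", librosa_duration)]:
    start = ending_slice_start(duration)
    # Treat the feed value as the true length of the audio file.
    where = "inside the file" if start < feed_duration else "after the file has ended"
    print(f"{label}: ending slice starts at {start:.0f} s ({where})")
```

With the librosa value, the slice would start 587 s (almost 10 minutes) past the real end of the audio.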

If I forgo librosa.get_duration and instead use the duration reported by the feed, this should work around the issue for problematic episodes with large discrepancies and have little or no effect on the others (autosync may shift some timestamps by 1 second due to rounding).
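A minimal sketch of reading durations from the feed instead, assuming the feed publishes them in the standard iTunes `<itunes:duration>` element (this thread does not say which element the feed uses). `feed_durations` and the sample feed are hypothetical illustrations, not the code from the workaround commit.

```python
import xml.etree.ElementTree as ET

ITUNES_NS = "http://www.itunes.com/dtds/podcast-1.0.dtd"


def to_seconds(value: str) -> int:
    """Accept 'H:MM:SS', 'MM:SS', or a whole number of seconds."""
    seconds = 0
    for part in value.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def feed_durations(rss_xml: str) -> dict[str, int]:
    """Map each <item> title to its <itunes:duration>, in seconds."""
    root = ET.fromstring(rss_xml)
    durations = {}
    for item in root.iter("item"):
        title = item.findtext("title")
        duration = item.findtext(f"{{{ITUNES_NS}}}duration")
        if title is not None and duration is not None:
            durations[title] = to_seconds(duration)
    return durations


# Made-up minimal feed for illustration only.
SAMPLE_FEED = f"""<rss version="2.0" xmlns:itunes="{ITUNES_NS}">
  <channel>
    <item>
      <title>The Mighty Nein Reunited Part 1</title>
      <itunes:duration>4:19:38</itunes:duration>
    </item>
  </channel>
</rss>"""

print(feed_durations(SAMPLE_FEED))  # {'The Mighty Nein Reunited Part 1': 15578}
```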

jpgill86 commented 1 year ago

Closing after workaround in 8870817.