Puyodead1 / udemy-downloader

A Udemy downloader that can download courses, with DRM support.
MIT License

[Bug]: Video Ends Early While Audio Continues Repeating in Decrypted Files #215

Open oijm17 opened 4 months ago

oijm17 commented 4 months ago

What happened?

Description: The script downloads the audio and video files for each lesson and decrypts them correctly. The problem appears when multiplexing: the audio ends up concatenated four times within a single track in the video container, instead of appearing once. The video track, by contrast, is multiplexed correctly with a single copy.

The result is a single decrypted video file with two tracks: the first is the video track (which works correctly), and the second is the audio track, containing the original decrypted audio but concatenated four times. This means that when playing the file, the video stops after the expected duration, but the audio continues playing three more times, even though the video has stopped.

Example to Illustrate the Issue: If an encrypted lesson has a total duration of 5 minutes, the script, after decrypting and multiplexing, creates a video file with a duration of 20 minutes. After the first 5 minutes, the video stops because it's complete, but the audio starts again. The audio restarts at the 5, 10, and 15 minute marks and only ends at 20 minutes, because the audio track is concatenated four times.

This problem only occurs with encrypted lessons and seems to affect all DRM-based courses. I have tested this with three different courses and encountered the same result.

Desktop: Python: v3.9.1

Expected Result

The script should download, decrypt, and multiplex the audio and video files for each lesson correctly. The resulting video file should contain one track for the video and one track for the audio, each with the expected duration and without repetition or errors.

When playing the final video file, both the video and audio should end at the same time, without any repeated or redundant audio tracks.

Branch

master/main

What operating systems are you seeing the problem on?

Alma Linux 8.9, Windows

Relevant log output

No response

Other information

No response

FrancoStino commented 3 months ago

Confirmed.

thebetauser commented 3 months ago

Confirmed as well. I have tested it on 2 encrypted files under 5 minutes and both have the same issue.

Edit: As a temporary hotfix, I have added the -shortest flag to the mux_process function so it cuts at the shortest stream (usually the video). It works, but I will wait for an official fix to be pushed.
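
For readers who want to try the same workaround, here is a minimal sketch of what an ffmpeg mux command with -shortest might look like. The helper name build_mux_command and its parameters are hypothetical; the project's actual mux_process function may build its command differently. Note that this only trims the muxed output, so the downloaded audio file would still contain the repeated data.

```python
import subprocess


def build_mux_command(video_path, audio_path, output_path):
    """Build an ffmpeg argument list that muxes one video and one audio file.

    Hypothetical helper for illustration only; it is not the project's
    mux_process implementation.
    """
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it already exists
        "-i", video_path,  # first input: decrypted video
        "-i", audio_path,  # second input: decrypted audio
        "-c", "copy",      # copy streams without re-encoding
        "-shortest",       # finish when the shortest stream ends
        output_path,
    ]


def mux(video_path, audio_path, output_path):
    """Run the mux command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_mux_command(video_path, audio_path, output_path), check=True)
```

Calling mux("video.mp4", "audio.m4a", "lesson.mp4") would then write a file that stops when the video does, assuming ffmpeg is on the PATH and the file names are replaced with real ones.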

thebetauser commented 2 months ago

@Puyodead1 Will there be an official fix for this or should I create a PR with my changes?

Puyodead1 commented 2 months ago

Go ahead and make a PR.

auoie commented 2 months ago

It seems like yt-dlp isn't able to distinguish between the different audio tracks in the .mpd files. In contrast, ffmpeg can differentiate between them. I went into the ./temp folder and ran:

yt-dlp --allow-unplayable-formats --enable-file-urls -F file://$(pwd)/index_${ID}.mpd

It displays the following tracks:

ID EXT RESOLUTION │  TBR PROTO │ VCODEC       VBR ACODEC     ABR ASR MORE INFO
────────────────────────────────────────────────────────────────────────────────────────────────────
7  m4a audio only │  64k dash  │ audio only       mp4a.40.5  64k 44k [eng] DRM, DASH audio, m4a_dash
1  mp4 640x360    │  85k dash  │ avc1.4D401E  85k video only         DRM, DASH video, mp4_dash
2  mp4 640x360    │ 124k dash  │ avc1.4D401E 124k video only         DRM, DASH video, mp4_dash
3  mp4 768x432    │ 191k dash  │ avc1.4D401E 191k video only         DRM, DASH video, mp4_dash
4  mp4 1024x576   │ 283k dash  │ avc1.4D401F 283k video only         DRM, DASH video, mp4_dash
5  mp4 1280x720   │ 408k dash  │ avc1.4D401F 408k video only         DRM, DASH video, mp4_dash
6  mp4 1920x1080  │ 788k dash  │ avc1.4D4028 788k video only         DRM, DASH video, mp4_dash

I also ran:

ffmpeg -i index_${ID}.mpd

It displays the following tracks:

Input #0, dash, from 'index_9807122.mpd':
  Duration: 00:02:12.00, start: 0.000000, bitrate: 2 kb/s
  Program 0
  Stream #0:0: Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 640x360 [SAR 1:1 DAR 16:9], 81 kb/s, 30 fps, 30 tbr, 30k tbn (default)
    Metadata:
      variant_bitrate : 84954
      id              : 1
    Side data:
      unknown side data type 24 (1085 bytes)
  Stream #0:1: Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 640x360 [SAR 1:1 DAR 16:9], 120 kb/s, 30 fps, 30 tbr, 30k tbn (default)
    Metadata:
      variant_bitrate : 124412
      id              : 2
    Side data:
      unknown side data type 24 (1085 bytes)
  Stream #0:2: Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 768x432 [SAR 1:1 DAR 16:9], 186 kb/s, 30 fps, 30 tbr, 30k tbn (default)
    Metadata:
      variant_bitrate : 191129
      id              : 3
    Side data:
      unknown side data type 24 (1085 bytes)
  Stream #0:3: Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 1024x576 [SAR 1:1 DAR 16:9], 275 kb/s, 30 fps, 30 tbr, 30k tbn (default)
    Metadata:
      variant_bitrate : 282741
      id              : 4
    Side data:
      unknown side data type 24 (1085 bytes)
  Stream #0:4: Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 398 kb/s, 30 fps, 30 tbr, 30k tbn (default)
    Metadata:
      variant_bitrate : 408096
      id              : 5
    Side data:
      unknown side data type 24 (1085 bytes)
  Stream #0:5: Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 1920x1080 [SAR 1:1 DAR 16:9], 771 kb/s, 30 fps, 30 tbr, 30k tbn (default)
    Metadata:
      variant_bitrate : 788192
      id              : 6
    Side data:
      unknown side data type 24 (1085 bytes)
  Stream #0:6(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, 47 channels, fltp, 62 kb/s (default)
    Metadata:
      variant_bitrate : 64139
      id              : 7
    Side data:
      unknown side data type 24 (1085 bytes)
  Stream #0:7(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, 47 channels, fltp, 62 kb/s (default)
    Metadata:
      variant_bitrate : 64139
      id              : 8
    Side data:
      unknown side data type 24 (1085 bytes)
  Stream #0:8(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, 47 channels, fltp, 62 kb/s (default)
    Metadata:
      variant_bitrate : 64139
      id              : 9
    Side data:
      unknown side data type 24 (1085 bytes)
  Stream #0:9(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, 47 channels, fltp, 62 kb/s (default)
    Metadata:
      variant_bitrate : 64139
      id              : 10
    Side data:
      unknown side data type 24 (1085 bytes)
  Stream #0:10(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, 47 channels, fltp, 62 kb/s (default)
    Metadata:
      variant_bitrate : 64139
      id              : 11
    Side data:
      unknown side data type 24 (1085 bytes)
  Stream #0:11(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, 47 channels, fltp, 62 kb/s (default)
    Metadata:
      variant_bitrate : 64139
      id              : 12
    Side data:
      unknown side data type 24 (1085 bytes)

yt-dlp identifies 6 video tracks and 1 audio track. ffmpeg identifies 6 video tracks and 6 audio tracks. These 6 audio tracks are identical. I used yt-dlp and ffmpeg to download the audio:

ffmpeg \
  -loglevel verbose \
  -i index_${ID}.mpd \
  -map 0:p:0:6 \
  -c copy \
  ffmpeg.m4a # 1.05MB
yt-dlp \
  -f 7 \
  --allow-unplayable-formats \
  --enable-file-urls \
  file://$(pwd)/index_${ID}.mpd \
  -o yt-dlp.m4a # 6.78MB

Basically, yt-dlp downloads all 6 audio tracks and combines them into one file, which matches the size difference above (6.78MB versus 1.05MB). It seems unable to download just a single audio track. ffmpeg can download a single track, but it's slow. So a fix is to use an XML parser to parse the .mpd file, delete all of the audio <Representation/> elements except for one, and then use yt-dlp for downloading. This should also make downloading faster, since you're not downloading the same thing multiple times.
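
The manifest clean-up described above could be sketched roughly as follows, using only Python's standard library. This is an illustration under assumptions, not the project's code: the function name keep_first_audio_representation is made up, the manifest is assumed to use the standard DASH namespace, audio is assumed to be marked via contentType or mimeType, and all audio representations are assumed to be identical copies (as observed in this thread), so keeping the first one is enough.

```python
import xml.etree.ElementTree as ET

DASH_NS = "urn:mpeg:dash:schema:mpd:2011"
NS = {"mpd": DASH_NS}
# Serialize DASH elements without an "ns0:" prefix.
ET.register_namespace("", DASH_NS)


def _is_audio(adaptation_set, representation):
    """True if the representation or its parent set is marked as audio."""
    for element in (representation, adaptation_set):
        if element.get("contentType") == "audio":
            return True
        if (element.get("mimeType") or "").startswith("audio/"):
            return True
    return False


def keep_first_audio_representation(mpd_xml):
    """Return the manifest with only the first audio Representation per Period."""
    root = ET.fromstring(mpd_xml)
    for period in root.findall("mpd:Period", NS):
        kept_one = False
        for adaptation_set in list(period.findall("mpd:AdaptationSet", NS)):
            removed_any = False
            for rep in list(adaptation_set.findall("mpd:Representation", NS)):
                if not _is_audio(adaptation_set, rep):
                    continue  # leave video (and anything else) untouched
                if kept_one:
                    adaptation_set.remove(rep)
                    removed_any = True
                else:
                    kept_one = True
            # Drop adaptation sets that this clean-up left empty.
            if removed_any and not adaptation_set.findall("mpd:Representation", NS):
                period.remove(adaptation_set)
    return ET.tostring(root, encoding="unicode")
```

The returned string could be written back to the index_${ID}.mpd file before handing it to yt-dlp. Whether yt-dlp then downloads a single copy of the audio has not been verified here.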

Puyodead1 commented 2 months ago

What's strange is that I don't have this issue while some others do. Also, maybe yt-dlp is filtering out the other audio tracks because they are duplicates? I don't know. If that's not the case, this should be reported to the yt-dlp devs.