JohnVeness opened this issue 4 years ago (status: Open)
Having examined the code more, I think the problem is in _build_subtitles_downloads, which calls get_filename_from_prefix, passing a prefix like "02". This is fine if the first file found in the directory starting with "02" is the video file, but no good if the first such file is, say, a PDF. Fixing this is beyond my abilities, though!
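One possible direction, as a rough sketch only: restrict the prefix lookup to files that look like videos, so a PDF sharing the same prefix is never picked. The helper name find_video_basename and the VIDEO_EXTENSIONS set below are assumptions made for illustration. They are not taken from edx-dl, and the real get_filename_from_prefix may have a different signature.

```python
import os

# Assumed set of video extensions (hypothetical); the extensions that
# edx-dl actually writes to disk may differ.
VIDEO_EXTENSIONS = {".mp4", ".webm", ".mkv", ".flv"}


def find_video_basename(target_dir, filename_prefix):
    """Return the base name of the first video file matching the prefix.

    Hypothetical helper, not part of edx-dl. Files whose extension is not
    in VIDEO_EXTENSIONS (for example PDFs) are skipped, so they can never
    lend their name to a subtitle file.
    """
    # Sort the listing so the result does not depend on directory order.
    for name in sorted(os.listdir(target_dir)):
        basename, ext = os.path.splitext(name)
        if name.startswith(filename_prefix) and ext.lower() in VIDEO_EXTENSIONS:
            return basename
    # No video with this prefix was found.
    return None
```

With a directory containing both a PDF and a video that start with "02", this lookup would return the video's base name regardless of which file the directory listing yields first.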
Subject of the issue
When I first used edx-dl, I didn't realise that it wouldn't download subtitles by default. Once I discovered why, I ran the command again including the "-s" option and was pleased to see that it was downloading just the subtitles and skipping the already downloaded videos.
However, some of the .srt files were given the wrong base filename. In particular, they were given names such as 02-handouts_6002-L0-oei12-gaps-annotated.en.srt, where 02-handouts_6002-L0-oei12-gaps-annotated.pdf is a PDF, not a video!
Your environment
Steps to reproduce
edx-dl -u <censored> https://courses.edx.org/courses/course-v1:MITx+6.002.1x+2T2019/course/ --filter-section 2
edx-dl -s -u <censored> https://courses.edx.org/courses/course-v1:MITx+6.002.1x+2T2019/course/ --filter-section 2
Expected behaviour
The .srt files should have the same base name as the corresponding video.
Actual behaviour
Some .srt files have the incorrect base name. Output:
Note that the mismatch only seems to occur if you download the videos and PDFs in one command and then download the subtitles with a separate, later command (so that it skips the already downloaded videos and PDFs). If you use the -s option on the first run, in an empty directory, the .srt files get the correct filenames. See: