coursera-dl / edx-dl

A simple tool to download video lectures from edx.org (and other openedx sites)
GNU Lesser General Public License v3.0

Added filename sanitization for non-video resources #556

Open EugeneLoy opened 5 years ago

EugeneLoy commented 5 years ago

This adds sanitization to filenames given to non-video resources.

At the moment, the filename for a non-video resource is derived from the resource URL, which may contain characters that are not filesystem-friendly.

For example, running:

edx-dl --dry-run -u <username> https://courses.edx.org/courses/course-v1:MITx+18.6501x+3T2019/course/

... schedules download:

...
[skipping] https://courses.edx.org/asset-v1:MITx+18.6501x+3T2019+type@asset+block@lectureslides_chap1_annot.pdf => Downloaded\Fundamentals_of_Statistics\02-Unit_1_Introduction_to_statistics\02-asset-v1:MITx+18.6501x+3T2019+type@asset+block@lectureslides_chap1_annot.pdf
...

... (note the ":" character, which ends up in the destination filename and is not filesystem-friendly).

This results in a silent failure to download the affected resources.
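To illustrate the kind of sanitization described above, here is a minimal sketch. It is not the code from this pull request: the helper name `sanitize_filename`, the set of characters treated as unsafe, and the `_` replacement are all assumptions chosen for the example.

```python
import re

# Assumed set of unsafe characters: those rejected by Windows filesystems
# (< > : " / \ | ? *) plus ASCII control characters.
_UNSAFE_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_filename(name, replacement="_"):
    """Hypothetical helper: make a single path component filesystem-friendly."""
    cleaned = _UNSAFE_CHARS.sub(replacement, name)
    # Windows also rejects names that end in a dot or a space.
    cleaned = cleaned.rstrip(". ")
    # Never return an empty name.
    return cleaned or replacement


# Filename derived from the resource URL in the example above.
url_derived = "asset-v1:MITx+18.6501x+3T2019+type@asset+block@lectureslides_chap1_annot.pdf"
print(sanitize_filename(url_derived))
# asset-v1_MITx+18.6501x+3T2019+type@asset+block@lectureslides_chap1_annot.pdf
```

Only the ":" is replaced here; "+" and "@" are left alone because they are generally accepted in filenames. The helper is meant for a single filename, not a full path, since it would also replace directory separators.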

coveralls commented 5 years ago


Coverage remained the same at 47.7% when pulling 12e17022bb3c93278b6fc81a986c7637c417f83f on EugeneLoy:master into 265718cf35044a1ea90ac770fb7b810fe549fd30 on coursera-dl:master.