Open charle-y opened 8 years ago
Can somebody with access to Windows help us check this one?
The same problem here. Windows does not allow file names that contain special characters such as : \ / * ? " < > |
edx-dl should be able to omit such characters from file names.
If you need help testing after any change to the source code, I am happy to do so.
This course has such PDF files (assets): https://courses.edx.org/courses/course-v1:OsakaUx+CNR101x+1T2016/info
edx-dl tried this:
[download] http://courses.edx.org/asset-v1:OsakaUx+CNR101x+1T2016+type@asset+block@osakaux_cnr101x_wk1_handout.pdf => J:\Edx\Cognitive_Neuroscience_Robotics__Part_A\07-Weekly_Handout\01-asset-v1:OsakaUx+CNR101x+1T2016+type@asset+block@osakaux_cnr101x_wk1_handout.pdf
But only a 0-byte file named "01-asset-v1" was downloaded.
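For illustration, here is a minimal sketch of what dropping those characters could look like for the asset name in the log above. The helper name `strip_windows_forbidden` is hypothetical and not part of edx-dl, and replacing each forbidden character with an underscore (rather than deleting it) is just one possible choice.

```python
import re

# Characters Windows rejects in file names, taken from the list above.
_FORBIDDEN = re.compile(r'[:\\/*?"<>|]')


def strip_windows_forbidden(name, replacement="_"):
    """Hypothetical helper: replace Windows-forbidden characters in a file name."""
    return _FORBIDDEN.sub(replacement, name)


asset = ("01-asset-v1:OsakaUx+CNR101x+1T2016"
         "+type@asset+block@osakaux_cnr101x_wk1_handout.pdf")
print(strip_windows_forbidden(asset))
# prints 01-asset-v1_OsakaUx+CNR101x+1T2016+type@asset+block@osakaux_cnr101x_wk1_handout.pdf
```

Only the file name should be passed through such a helper, not the full path, since `\` and `:` are legitimate in the directory part on Windows (for example `J:\Edx\...`).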
I have a possible solution. What do you think?
usage:
./edx.py --sanitize-filename ":\/*?<>"
https://github.com/coursera-dl/edx-dl/blob/master/edx_dl/edx_dl.py#L683
def _build_filename_from_url(args, url, target_dir, filename_prefix):
    """
    Builds the appropriate filename for the given args
    """
    if is_youtube_url(url):
        filename = filename_prefix + "-%(title)s-%(id)s.%(ext)s"
    else:
        original_filename = url.rsplit('/', 1)[1]
        filename = filename_prefix + '-' + original_filename

    # https://stackoverflow.com/a/38748649/5397116
    def remove(str_, chars):
        try:
            # Python 2.x
            return str_.translate(None, chars)
        except TypeError:
            # Python 3.x
            table = {ord(char): None for char in chars}
            return str_.translate(table)

    if args.sanitize_filename:
        filename = remove(filename, args.sanitize_filename)

    filename = os.path.join(target_dir, filename)
    return filename
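To try the idea outside of edx-dl, the following self-contained sketch wires a `--sanitize-filename` option into a simplified version of the function. The `argparse` definition and the `build_filename` stand-in are assumptions made for illustration (the real edx-dl parser and function may differ), and only the non-YouTube branch and Python 3 are covered.

```python
import argparse
import os


def remove(str_, chars):
    """Delete every character in `chars` from `str_` (Python 3 only)."""
    return str_.translate({ord(char): None for char in chars})


def build_filename(args, url, target_dir, filename_prefix):
    """Simplified stand-in for _build_filename_from_url (non-YouTube URLs only)."""
    filename = filename_prefix + '-' + url.rsplit('/', 1)[1]
    if args.sanitize_filename:
        # Sanitize only the file name, never the target directory.
        filename = remove(filename, args.sanitize_filename)
    return os.path.join(target_dir, filename)


# Assumed option definition; the actual edx-dl argument parser may differ.
parser = argparse.ArgumentParser()
parser.add_argument('--sanitize-filename', dest='sanitize_filename', default=None,
                    help='characters to delete from downloaded file names')

args = parser.parse_args(['--sanitize-filename', ':\\/*?<>'])
url = ('http://courses.edx.org/asset-v1:OsakaUx+CNR101x+1T2016'
       '+type@asset+block@osakaux_cnr101x_wk1_handout.pdf')
print(build_filename(args, url, 'handouts', '01'))
```

Because the characters are removed before `os.path.join` is called, the drive letter and path separators in the target directory are left untouched.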
@xunilrj, please, send a pull request with your solution so that we can better evaluate this, if this is still a problem.
BTW, youtube-dl already has a sanitization facility built in, and you probably want to use that (or adapt what is there).
As an example, it looks like:
A file named "01-asset-v1" is created, but it is 0 KB. I am using Windows 10, and I think this is a common issue on all Windows systems. Thanks in advance.