Closed olegafx closed 11 years ago
Added a limit on the filename length set to roughly the OS limits. Are you actually getting an error or just finding such long filenames/paths annoying?
Not solved:
Failed to download url https://class.coursera.org/algo2-2012-001/lecture/subtitles?q=43_en&format=txt to C:\Videos\Coursera\algo2-2012-001\02 - II. SELECTED REVIEW FROM PART I (Week 1)\03 - Guiding Principles for Analysis of Algorithms [Part I Review - Optional](15 min)\2 - 3 - Guiding Principles for Analysis of Algorithms [Part I Review - Optional](15 min).txt: [Errno 2] No such file or directory: 'C:\Videos\Coursera\algo2-2012-001\02 - II. SELECTED REVIEW FROM PART I (Week 1)\03 - Guiding Principles for Analysis of Algorithms [Part I Review - Optional](15 min)\2 - 3 - Guiding Principles for Analysis of Algorithms [Part I Review - Optional](15 min).txt'
Failed to download url https://class.coursera.org/algo2-2012-001/lecture/subtitles?q=43_en&format=srt to C:\Videos\Coursera\algo2-2012-001\02 - II. SELECTED REVIEW FROM PART I (Week 1)\03 - Guiding Principles for Analysis of Algorithms [Part I Review - Optional](15 min)\2 - 3 - Guiding Principles for Analysis of Algorithms [Part I Review - Optional](15 min).srt: [Errno 2] No such file or directory: 'C:\Videos\Coursera\algo2-2012-001\02 - II. SELECTED REVIEW FROM PART I (Week 1)\03 - Guiding Principles for Analysis of Algorithms [Part I Review - Optional](15 min)\2 - 3 - Guiding Principles for Analysis of Algorithms [Part I Review - Optional](15 min).srt'
Failed to download url https://class.coursera.org/algo2-2012-001/lecture/download.mp4?lecture_id=43 to C:\Videos\Coursera\algo2-2012-001\02 - II. SELECTED REVIEW FROM PART I (Week 1)\03 - Guiding Principles for Analysis of Algorithms [Part I Review - Optional](15 min)\2 - 3 - Guiding Principles for Analysis of Algorithms [Part I Review - Optional](15 min).mp4: [Errno 2] No such file or directory: 'C:\Videos\Coursera\algo2-2012-001\02 - II. SELECTED REVIEW FROM PART I (Week 1)\03 - Guiding Principles for Analysis of Algorithms [Part I Review - Optional](15 min)\2 - 3 - Guiding Principles for Analysis of Algorithms [Part I Review - Optional](15 min).mp4'
The problem is that on Windows the 260-character limit (MAX_PATH in the Windows API) applies to the full path, not just the file name (for more details, search Stack Overflow).
So the check len(fileName) < 260 will not prevent the download error. Unfortunately, the check len(os.path.abspath(fileName)) < 260 will not help either, because if the absolute path is longer than 260 characters, Windows returns only the fileName itself. I am not sure whether this is a bug or a feature of os.path.abspath().
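To illustrate the difference between checking the file name and checking the full path, here is a minimal sketch. It assumes Python 3 and uses ntpath so that Windows path rules apply on any OS. The helper name fits_windows_path and its default limit are illustrative assumptions, not part of coursera-dl.

```python
import ntpath


# Hypothetical helper, not part of coursera-dl.
# 260 is the classic Windows MAX_PATH value; it is assumed here that the
# terminating NUL counts towards it, so the path must be shorter than limit.
def fits_windows_path(directory, file_name, limit=260):
    """Return True if directory + separator + file_name is shorter than limit."""
    full_path = ntpath.join(directory, file_name)
    return len(full_path) < limit


short_dir = "C:\\Videos"
long_dir = "C:\\Videos\\" + "x" * 240

# Same short file name in both cases; only the directory length differs.
print(fits_windows_path(short_dir, "lecture.mp4"))
print(fits_windows_path(long_dir, "lecture.mp4"))
```

The point of building the path with ntpath.join instead of calling os.path.abspath is that the length check then does not depend on how the OS resolves an overlong path.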
This piece of code in sanitiseFileName can be a quick and dirty fix:
# ensure it is within a sane maximum
max = 250
fullFileNameLength = len(os.getcwd()) + len(s)
if (fullFileNameLength) > max:
    cutFileNameTail = fullFileNameLength - max
    print " - The length of full file name is ", fullFileNameLength, " > max limit of ", max
    print " - Original / shortened file name:"
    print s
    # split off extension, trim, and re-add the extension
    fn,ext = os.path.splitext(s)
    s = fn[:-(cutFileNameTail+len(ext))] + ext
    print s
return s
It is dirty because it is neither OS-aware nor user-configurable.
This patch was verified on the first weeks of the Drugs and the Brain course.
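A rough sketch of what an OS-aware, user-configurable variant might look like, assuming Python 3. The function names, the 259 and 4096 defaults, and the choice to raise ValueError are all assumptions made for illustration; none of this is in coursera-dl.

```python
import os
import platform


# Assumed defaults, for illustration only: 259 usable characters on
# Windows (MAX_PATH minus the terminating NUL) and 4096 elsewhere
# (a common Linux PATH_MAX; other systems differ).
def default_max_path():
    return 259 if platform.system() == "Windows" else 4096


def shorten_file_name(file_name, directory=None, max_len=None):
    """Trim file_name (keeping its extension) so directory/file_name fits max_len."""
    directory = os.getcwd() if directory is None else directory
    max_len = default_max_path() if max_len is None else max_len
    # +1 accounts for the path separator between directory and file name
    overflow = len(directory) + 1 + len(file_name) - max_len
    if overflow <= 0:
        return file_name
    stem, ext = os.path.splitext(file_name)
    if overflow >= len(stem):
        raise ValueError("directory is too long to fit any file name")
    return stem[:len(stem) - overflow] + ext


print(shorten_file_name("a_very_long_lecture_title.mp4",
                        directory="/tmp/course", max_len=30))
```

Passing max_len explicitly is how a command-line option could override the per-OS default. Unlike the quick fix, this sketch counts the path separator and cuts only as many characters as needed.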
Had a closer look and it is actually quite tricky to fix properly, as there are lots of corner cases. Windows paths are a mess :) Will have another look later.
Isn't a better way to fix this to download the lecture videos and srt files into the week's folder, and the rest of the lecture resources into a separate folder for each lecture?
Not quite sure what you mean, feel free to clarify or propose a patch :)
Web / Blog : http://dirkgorissen.com Twitter : https://twitter.com/elazungu
On Sun, Apr 21, 2013 at 7:21 PM, Archit wrote the comment above (https://github.com/dgorissen/coursera-dl/issues/8#issuecomment-16735519).
The long file name issue is spreading. There is a new course, writing2, whose authors are taking poetic liberties with directory and file names. In the first week alone, the directory name runs to 240+ characters; in later weeks there may be no room left to create the directory structure, let alone the files. From searching the internet, the choices are limited, but if you do decide to adopt the solution by libmor (mentioned above), then dgorissen, please create a log file that records each old filename and new filename. The screen output is already so verbose that I don't even look at it anymore. If we miss certain files because a renamed file ends up with the same name as another, we can at least look at the log file and figure out what happened. Hopefully this takes care of 90% of such cases.
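The log file request could be sketched roughly as follows, assuming Python 3 and the standard logging module. The log file name renamed_files.log, the helper names, and the "~1" collision suffix are hypothetical choices for illustration, not existing coursera-dl behaviour.

```python
import logging
import os


# Hypothetical log file name; coursera-dl itself defines no such file.
def make_rename_logger(log_path="renamed_files.log"):
    logger = logging.getLogger("rename_log")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
    return logger


def unique_trimmed_name(file_name, max_name_len, used_names, logger):
    """Trim file_name to max_name_len, add a counter on collision, log the rename."""
    stem, ext = os.path.splitext(file_name)
    keep = max_name_len - len(ext)
    if keep < 1:
        raise ValueError("max_name_len leaves no room for the file name")
    candidate = stem[:keep] + ext
    counter = 1
    # Two different originals can trim to the same name; disambiguate them.
    while candidate in used_names:
        suffix = "~%d" % counter
        candidate = stem[:max(keep - len(suffix), 0)] + suffix + ext
        counter += 1
    used_names.add(candidate)
    if candidate != file_name:
        logger.info("renamed %r -> %r", file_name, candidate)
    return candidate
```

A caller would keep one used_names set per target directory, so that two lectures whose names trim to the same prefix end up as distinct files, with both renames recorded in the log.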
Annoying indeed. Hope to fix this, and redo the logging, but given my circumstances I have to say it may take a while. I will try my very best to fix bugs/crashes promptly though.
On Sun, Apr 28, 2013 at 8:25 AM, rodch-us wrote the comment above requesting a log file (https://github.com/dgorissen/coursera-dl/issues/8#issuecomment-17129479).
Having the same problem with images-2012-001. Maybe overly long file/directory names should just be trimmed?
Yeah, I have this problem, too. I think the easiest solution would be as Pastafarianist says: long file/directory names should be trimmed.
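One possible reading of "just trim them" is to cap every directory and file name at a fixed length. The sketch below assumes Python 3; the helper name trim_components and the cap of 40 characters are arbitrary illustrative choices, and the example path is the one from the original report.

```python
import ntpath


# Hypothetical helper: cap every path component at max_component characters.
# The path is split on backslashes so Windows-style paths are handled the
# same way on any OS.
def trim_components(relative_path, max_component=40):
    """Cap each directory and file name in a backslash-separated path."""
    *dirs, file_name = relative_path.split("\\")
    # Trailing spaces and dots are stripped after cutting, since Windows
    # file APIs generally do not preserve them at the end of a name.
    trimmed = [d[:max_component].rstrip(" .") for d in dirs]
    # Only the last component has a real extension worth keeping.
    stem, ext = ntpath.splitext(file_name)
    keep = max(max_component - len(ext), 1)
    trimmed.append(stem[:keep].rstrip(" .") + ext)
    return "\\".join(trimmed)


example = (r"inforiskman-2012-001\08 - Week 7"
           r"\08 - Business Continuity and Disaster Recovery Michael Ness, "
           r"Part 1 - Leadership Selling Your Ideas (1542)"
           r"\8 - 8 - Business Continuity and Disaster Recovery Michael Ness, "
           r"Part 1 - Leadership Selling Your Ideas (1542).srt")
print(trim_components(example))
```

Because the cut is deterministic, the same lecture always maps to the same folder, but two different names that share a long prefix can collide, so some disambiguation would still be needed in practice.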
Here's my version of a fix. It uses a parameter -t <max_path_length> and trims filenames in long paths to fit the specified max length. It does not trim path names as that would require more complex changes. This fix works for my ~20 courses, half of which had some path length issues.
diff --git a/courseradownloader/courseradownloader.py b/courseradownloader/courseradownloader.py
index 601d8cf..27062a9 100644
--- a/courseradownloader/courseradownloader.py
+++ b/courseradownloader/courseradownloader.py
@@ -42,7 +42,7 @@ class CourseraDownloader(object):
     # how long to try to open a URL before timing out
     TIMEOUT=60.0
-    def __init__(self,username,password,proxy=None,parser=DEFAULT_PARSER,ignorefiles=None):
+    def __init__(self,username,password,proxy=None,parser=DEFAULT_PARSER,ignorefiles=None, max_path_len=None):
         self.username = username
         self.password = password
         self.parser = parser
@@ -54,6 +54,7 @@ class CourseraDownloader(object):
         self.browser = None
         self.proxy = proxy
+        self.max_path_len = max_path_len
     def login(self,className):
         """
"""
@@ -246,6 +247,32 @@ class CourseraDownloader(object):
         r = self.browser.open(url,timeout=self.TIMEOUT)
         return r.info()
+    def trimFileName(self, pathname):
+        """
+        Trim file name in given path name to fit max_path_len characters. Only file name is trimmed,
+        path names are not affected to avoid creating multiple folders for the same lecture.
+        """
+        MIN_LEN = 5 # Minimum length of file name to keep
+
+        if len(pathname) <= self.max_path_len:
+            return pathname
+
+        fpath, name = path.split(pathname)
+        name, ext = path.splitext(name)
+
+        to_cut = len(pathname) - self.max_path_len
+        to_keep = len(name) - to_cut
+
+        if to_keep < MIN_LEN:
+            print 'Cannot trim path name "%s" to fit required length (%d)' % (pathname, self.max_path_len)
+            return pathname
+
+        name = name[:to_keep]
+        new_pathname = path.join(fpath, name + ext)
+        print 'Trimmed path name "%s" to "%s" to fit required length (%d)' % (pathname, new_pathname, self.max_path_len)
+
+        return new_pathname
+
     def download(self, url, target_dir=".", target_fname=None):
         """
         Download the url to the given filename
@@ -270,6 +297,9 @@ class CourseraDownloader(object):
         filepath = path.join(target_dir,fname)
+        if self.max_path_len:
+            filepath = self.trimFileName(filepath)
+
         dl = True
         if path.exists(filepath):
             if clen > 0:
@@ -567,6 +597,7 @@ def main():
                         default=False, help="download and save the sections in reverse order")
     parser.add_argument('course_names', nargs="+", metavar='<course name>',
                         type=str, help='one or more course names from the url (e.g., comnets-2012-001)')
+    parser.add_argument("-t", dest='max_path_len', type=int, help='attempt to trim path names to fit specified length, e.g. -t 259')
     args = parser.parse_args()
     # check the parser
@@ -593,8 +624,9 @@ def main():
password = getpass.getpass()
     # instantiate the downloader class
-    d = CourseraDownloader(username,password,proxy=args.proxy,parser=html_parser,ignorefiles=args.ignorefiles)
-
+    d = CourseraDownloader(username,password,proxy=args.proxy,parser=html_parser,ignorefiles=args.ignorefiles,
+                           max_path_len=args.max_path_len)
+
     # authenticate, only need to do this once but need a classaname to get hold
     # of the csrf token, so simply pass the first one
     print "Logging in as '%s'..." % username
Finally committed a fix, thanks in part to @ilfats. I don't have a Windows machine here to fully test with, but I'm assuming it's all OK. Reopen if there are further issues.
Some courses contain materials with very long path names.
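For what it's worth, the trimming logic can be exercised without a Windows machine by making the path module injectable and passing ntpath. The following is a sketch, assuming Python 3; trim_file_name is an illustrative standalone port of the trimFileName idea from the patch above, not the code that was committed.

```python
import ntpath
import posixpath

MIN_LEN = 5  # minimum number of file-name characters to keep, as in the patch


def trim_file_name(pathname, max_path_len, pathmod=ntpath):
    """Standalone port of the trimFileName idea, with the path module injectable."""
    if len(pathname) <= max_path_len:
        return pathname
    fpath, name = pathmod.split(pathname)
    name, ext = pathmod.splitext(name)
    to_keep = len(name) - (len(pathname) - max_path_len)
    if to_keep < MIN_LEN:
        return pathname  # cannot trim enough; the caller decides what to do
    return pathmod.join(fpath, name[:to_keep] + ext)


# Windows-style path, checked on any OS via ntpath
win = "C:\\Videos\\course\\" + "a" * 50 + ".mp4"
assert trim_file_name(win, 40) == "C:\\Videos\\course\\" + "a" * 19 + ".mp4"

# Same logic with POSIX separators
nix = "/home/u/" + "b" * 30 + ".srt"
assert trim_file_name(nix, 20, pathmod=posixpath) == "/home/u/" + "b" * 8 + ".srt"
```

This only covers the string arithmetic; it says nothing about how the Windows file APIs behave near the limit, which would still need a real Windows run.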
Example: inforiskman-2012-001\08 - Week 7\08 - Business Continuity and Disaster Recovery Michael Ness, Part 1 - Leadership Selling Your Ideas (1542)\8 - 8 - Business Continuity and Disaster Recovery Michael Ness, Part 1 - Leadership Selling Your Ideas (1542).srt