dgorissen / coursera-dl

A script for downloading course material (videos, PDFs, quizzes, etc.) from coursera.org
http://dirkgorissen.com/2012/09/07/coursera-dl-a-coursera-download-script/
GNU General Public License v3.0

mppl problems, The system cannot find the path specified: #124

Closed TroyOfHelen closed 10 years ago

TroyOfHelen commented 10 years ago

I'm trying to shorten the names using -mppl, but I get an error: [Error 3] The system cannot find the path specified:

I think I remember a time when -mppl worked, which is kind of frustrating. I thought I could figure it out myself since I know a little Python, but I haven't yet.

The rest of this post is the details.

    K:\Documents\documents\class\downloader>python "K:\Documents\documents\class\downloader\coursera-dl\courseradownloader\courseradownloader.py" -mppl 30 -w 2 -u user@gmail.com -d K:\downloader smac-001
    Warning: built-in 'html.parser' may cause problems on Python < 2.7.3
    Coursera-dl v2.0.0 (html.parser)
    Password:
    Logging in as 'user@gmail.com'...

    Course 1 of 1

I am using the latest version from GitHub (it reports 2.0.0). I'm running Windows 7 with Python 2.7. I first installed using pip, as the instructions on the main page say, then cloned the repository to make sure I was using the latest version. Note that in this course the week numbers change each week, so that the current week appears at the top when I log into coursera.org. I mention it in case it helps reproduce the problem.

shura-v commented 10 years ago

As far as I understand, the mppl parameter governs the maximum length of folder names, and it doesn't seem to have anything to do with file names. Here is the explanation and a fix:

    def trim_path_part(self, s):
        mppl = self.max_path_part_len
        if mppl and len(s) > mppl:
            return s[:mppl - 3] + "..."
        else:
            return s

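This line (127) causes problems on Windows, the trailing "..." in particular. Windows strips trailing dots from directory names, so the folder that ends up on disk most likely no longer matches the path the script goes on to use: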

            return s[:mppl - 3] + "..."

fix:

    def trim_path_part(self, s):
        s = s.strip()
        mppl = self.max_path_part_len
        if mppl and len(s) > mppl:
            return s[:mppl].strip()
        else:
            return s
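To make the difference concrete, here is a minimal standalone sketch of the same trimming idea, written as a plain function rather than a method. The explicit `max_len` argument and the extra `rstrip(" .")` are assumptions added for illustration; they are not part of the commit. The extra strip guards against a name that still ends in a dot or space after being cut.

```python
def trim_path_part(s, max_len):
    """Trim one path component to at most max_len characters.

    Illustrative sketch only: max_len stands in for self.max_path_part_len,
    and a falsy value (0 or None) disables trimming.
    """
    s = s.strip()
    if max_len and len(s) > max_len:
        s = s[:max_len]
    # Windows drops trailing dots and spaces from directory names, so remove
    # them here to keep the computed name identical to the one on disk.
    return s.rstrip(" .")


def trim_path_part_old(s, max_len):
    """The original behaviour, kept for comparison: appends an ellipsis."""
    if max_len and len(s) > max_len:
        return s[:max_len - 3] + "..."
    return s


title = "Introduction to Statistical Mechanics"
print(trim_path_part_old(title, 10))  # ends in "...", which Windows will strip
print(trim_path_part(title, 10))      # plain truncation, no trailing dots
```

The old version yields a name ending in three dots, while the new one never ends in a dot or space, so the directory created and the directory looked up later should agree.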

But removing the ellipsis alone doesn't solve the problem; you have to shorten the file name too. If you look at line 306 and below, you'll notice this isn't done:

    _, ext = path.splitext(fname)

So I decided to rewrite this part and added a new method:

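    # Requires `import os` and `import platform` at module level.
    def trim_file_name(self, fname, target_dir):
        name, ext = os.path.splitext(fname)
        name = name.strip()
        if platform.system() == 'Windows':
            # Characters left for the base name: 255 minus the directory,
            # the extension, and one path separator.
            maxlen = 255 - len(target_dir) - len(ext) - 1
            if len(name) > maxlen:
                return name[:maxlen].strip(), ext
        return name, ext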

Fix:

    _, ext = self.trim_file_name(fname, target_dir)

Line 316:

    filepath = path.join(target_dir, _ + ext)

This completely fixes the long file name issue on Windows (tested on various courses, including that one about statistical mechanics). You can edit the script in your %PYTHONPATH%\Lib\site-packages\courseradownloader\courseradownloader.py
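The length budget used by the file-name fix can be checked in isolation. The sketch below is a standalone variant with two assumptions that are not in the commit: the limit is passed in as a hypothetical `max_total` parameter instead of being tied to a platform check, and a `ValueError` is raised when the directory alone leaves no room for a name (the original would slice with a negative length in that case).

```python
import os


def trim_file_name(fname, target_dir, max_total=255):
    """Return (stem, ext) with stem shortened so that
    target_dir + separator + stem + ext fits within max_total characters.

    Illustrative sketch: max_total and the ValueError are assumptions.
    """
    stem, ext = os.path.splitext(fname)
    stem = stem.strip()
    # One character is reserved for the separator between directory and file.
    budget = max_total - len(target_dir) - len(ext) - 1
    if budget < 1:
        raise ValueError("target_dir leaves no room for a file name")
    if len(stem) > budget:
        stem = stem[:budget].strip()
    return stem, ext


# A 240-character directory leaves 255 - 240 - 4 - 1 = 10 characters
# for the stem of an ".mp4" file.
long_dir = "d" * 240
stem, ext = trim_file_name("x" * 50 + ".mp4", long_dir)
full_path = os.path.join(long_dir, stem + ext)
print(len(stem), len(full_path))
```

Short names pass through unchanged, so the function can be applied to every download without a separate length check.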

Here is my commit: https://github.com/shura-v/coursera-dl/commit/de3943a283c5f0f1ad72eee00c59f0b701d39eb1

TroyOfHelen commented 10 years ago

Dude that's awesome and makes sense. I'm trying it out right now.

TroyOfHelen commented 10 years ago

Works!