Fix for exercice download + bug report about exercice download

ankitsejwal / Lyndor

:rocket: Powerful command line tool to download lynda.com courses for personal offline use. :part_alternation_mark:

MIT License


Fix for exercice download + bug report about exercice download #45

Closed othmanelamnabhi closed 6 years ago

othmanelamnabhi commented 6 years ago

Please follow the guide below

You will be asked some questions, please read them carefully
Put an x into all the boxes [ ] relevant to your issue (like this: [x])
Use the Preview tab to see what your issue will actually look like

Make sure you are using the latest version: run `git pull` to update your version from Lyndor directory

Before submitting an issue make sure you have:

[x] At least skimmed through the README

What is the purpose of your issue?

[x] Bug report (encountered problems with Lyndor) :beetle:
[ ] Question :question:
[ ] Feature request (request for a new functionality) :point_up:
[ ] Other

If the purpose of this issue is a bug report, or you are not completely sure then provide the full terminal output as follows:

Copy the whole output and insert it here. It should look similar to one below (replace it with your log inserted between triple ```):

Videos downloaded: 16

🔰  Moving files inside: 00. Introduction
🔰  Moving files inside: 01. Foundations of Salary Negotiations
🔰  Moving files inside: 02. The Negotiation Conversation
🔰  Moving files inside: 03. Special Considerations
🔰  Moving files inside: 04. After the Negotiation

🥂  videos/subtitles moved to appropriate chapters successfully.

Exercise file is available to download

library card no. and card pin. entered successfully....
launching desired course page ....
Downloading Ex_Files_Negotiating_Your_Salary.zip
Download in progress ./Users/othmanelamnabhi/Desktop/Lyndor/exercise_file.py:57: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if folder == ex_file_name:
Download in progress ................

Answer questions related to your Environment which will help in reproducing the issue:

The issue was encountered on: :computer:

[x] MacOS
[ ] Windows
[ ] Linux

Enter the python version you are using for download. Find your python version by typing in terminal `python -V`

python 2.7.10 (happens also with 3.6.4)

If the purpose of this issue is a bug report please provide all kinds of example URLs where you encountered issues (replace following example URLs by yours):

https://www.lynda.com/Business-tutorials/Negotiating-Your-Salary/702267-2.html

Description of your issue, suggested a solution and other information

I tried to download a course through my library login. The videos download just fine, but when Lyndor launches Chrome to download the exercise files, the login fails. When I compared the login URL to the URL in exercise_file.py, I found that I had to replace "sip" with "patron" in the organization login. When I downloaded the course again, Chrome launched and logged in properly, but now Lyndor downloads the file into the "Downloads" folder and the terminal gets stuck at "Download in progress" even though the file has finished downloading, and I get the UnicodeWarning shown in the log above.

ankitsejwal commented 6 years ago

Hi @Otech-Man, thanks for reporting this issue. Unfortunately, I was not able to reproduce it on my side. But since you are getting UnicodeWarning: Unicode equal comparison failed, I've made some changes to exercise_file.py that should fix it. I've also replaced sip with patron in the URL and asked @rackyman to check whether patron works for him, since sip used to work perfectly for him (#15). Please go ahead and test the latest changes and report back. Thanks.

Cheers ANk

othmanelamnabhi commented 6 years ago

Hey @ankitsejwal thanks for getting back to me so fast. I installed the latest changes and launched the script again.

Here is what happened. On an initial run, I got this error:

Last login: Sun Aug 19 10:27:40 on ttys000
MacBook-Pro-dOthmane:~ othmanelamnabhi$ run /Users/othmanelamnabhi/Desktop/Lyndorr/Lyndor/run.py 
-bash: run: command not found
MacBook-Pro-dOthmane:~ othmanelamnabhi$ python /Users/othmanelamnabhi/Desktop/Lyndorr/Lyndor/run.py 
Traceback (most recent call last):
  File "/Users/othmanelamnabhi/Desktop/Lyndorr/Lyndor/run.py", line 7, in <module>
    import message, save, cookies, read, install, move, draw, rename, exercise_file
  File "/Users/othmanelamnabhi/Desktop/Lyndorr/Lyndor/exercise_file.py", line 58
    sys.stdout.write('\r{}'.format(f"Finding Ex_file in Downloads folder ---> {message.return_colored_message(Fore.LIGHTYELLOW_EX,folder)}"))
                                                                                                                                          ^
SyntaxError: invalid syntax

So I replaced everything after "format" with your old download message and added the variable download_message. The script ran fine after that: the videos were downloaded, and at the exercise step the "patron" change worked fine. But then, after the download launched, here is the message I got:

Exercise file is available to download

library card no. and card pin. entered successfully....
launching desired course page ....
Downloading Ex_Files_Negotiating_Your_Salary.zip
Download in progress .Traceback (most recent call last):
  File "/Users/othmanelamnabhi/Desktop/Lyndorr/Lyndor/run.py", line 130, in <module>
    main()
  File "/Users/othmanelamnabhi/Desktop/Lyndorr/Lyndor/run.py", line 35, in main
    schedule_download(url)
  File "/Users/othmanelamnabhi/Desktop/Lyndorr/Lyndor/run.py", line 51, in schedule_download
    download_course(url)
  File "/Users/othmanelamnabhi/Desktop/Lyndorr/Lyndor/run.py", line 117, in download_course
    exercise_file.download(url, course_folder_path)
  File "/Users/othmanelamnabhi/Desktop/Lyndorr/Lyndor/exercise_file.py", line 61, in download
    if folder.encode('utf-8') == ex_file_name.encode('utf-8'):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcc in position 2: ordinal not in range(128)

I wonder what is different about our machines that makes it so that you can't reproduce this, I tried on a Windows machine and it happened. Is it the OS Locale maybe? Do you know what could be affecting such a thing?
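One possible explanation, which nobody in this thread confirmed: the byte 0xcc is the first UTF-8 byte of the combining accents U+0300 to U+033F, and macOS file systems often store accented names in decomposed (NFD) form. Any accented file name sitting in the Downloads folder could then trigger the error as soon as Python 2 implicitly decodes it as ASCII, while a folder holding only ASCII names would not. The sketch below reproduces the same message in Python 3; the file name and the `same_name` helper are hypothetical.

```python
import unicodedata

# Hypothetical accented file name ("Résumé.zip"), written with escapes.
name = u"R\u00e9sum\u00e9.zip"

# Decomposed form: a plain "e" followed by the combining accent U+0301.
decomposed = unicodedata.normalize("NFD", name)
raw = decomposed.encode("utf-8")

try:
    raw.decode("ascii")          # what an implicit ASCII decode attempts
    failure = None
except UnicodeDecodeError as error:
    failure = str(error)

print(failure)


def same_name(left, right):
    """Compare two text names regardless of composed/decomposed form."""
    return unicodedata.normalize("NFC", left) == unicodedata.normalize("NFC", right)
```

If this is indeed the cause, normalizing both names before comparing them (as `same_name` does) would also protect the `==` check against exercise files whose names contain accents.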

othmanelamnabhi commented 6 years ago

So I have some good news. I can't let something rest so I tried to fiddle in anything that made sense to me (I'm no programmer).

Since the issue was with the == line and the encoding, I laid out some hypotheses and set out to try the different combinations of encode/decode:

❌ Encode / encode
❌ Decode / encode
❌ Encode / decode
❌ Encode / nothing
✓ Decode / nothing
Nothing / encode
Nothing / decode
Decode / decode

So the fifth one was the right one, and it looks like this: `if folder.decode('utf-8') == ex_file_name`. Selenium launches the browser and downloads the file, then after a few seconds the file is moved to the course folder and that's it.

I tried to make sense of these findings, so I added `print type(folder)` and `print type(ex_file_name)`. The first one is a str (a byte string in Python 2) and the second one is unicode. Since a byte string has to be decoded, not encoded, to become unicode, I tried decode and it worked.
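A minimal sketch of the same idea as a small helper, assuming UTF-8 file names; `to_text` is a hypothetical name, not part of Lyndor. It decodes byte strings and leaves text untouched, so the comparison always happens between two values of the same type.

```python
def to_text(name, encoding="utf-8"):
    """Return name as text, decoding it first if it is a byte string."""
    if isinstance(name, bytes):      # on Python 2, bytes is an alias of str
        return name.decode(encoding)
    return name


# A byte string, like the entries os.listdir returned in the Python 2 run.
folder = b"Ex_Files_Negotiating_Your_Salary.zip"
# Text, like the value read from the page through Selenium.
ex_file_name = u"Ex_Files_Negotiating_Your_Salary.zip"

matches = to_text(folder) == to_text(ex_file_name)
```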

My question now, if you don't mind: when the exercise file is downloaded, I have to wait a while (something like 30 or 40 seconds) before it gets moved. Is there some way to make that process faster, especially since the exercise file is really small (300 KB)?

    file_not_found = True
    while file_not_found:
        message.spinning_cursor()
        downloads_folder = install.get_path("Downloads")
        os.chdir(downloads_folder)
        download_message = "Download in progress ."
        for folder in os.listdir(downloads_folder):

            sys.stdout.write("\033[K")          # Clear to the end of line
            sys.stdout.write('\r{}'.format(download_message))
            sys.stdout.flush()                  # Force Python to write data into terminal.
            if folder.decode('utf-8') == ex_file_name:
                if os.path.getsize(folder) > 0: # if file downloaded completely.
                    print('\nDownload completed.')
                    file_not_found = False
                    break
            time.sleep(0.1)

EDIT: sometimes it stays stuck on "Download in progress" and the exercise file is not moved at all. I don't know if it has anything to do with refresh intervals or anything.

ankitsejwal commented 6 years ago

Hi @Otech-Man, amazing work. I think you can now say that you are a programmer :) as you've nailed it this time. Python 2 and Python 3 handle string encoding very differently, which causes a lot of compatibility problems in code that has to run on both. I checked type(folder) and type(ex_file_name) in python3 and they are both `<class 'str'>`, while, as you said, in python2 type(ex_file_name) is `<type 'unicode'>`. Thanks to you, I've made the following changes to exercise_file.py:

# exercise_file.py

            sys.stdout.write('\r{}'.format("Finding Ex_file in Downloads folder ---> " + message.return_colored_message(Fore.LIGHTYELLOW_EX,folder)))
            sys.stdout.flush()                  # Force Python to write data into terminal.

            try:
                folder = folder.decode('utf-8') # python 2.x
            except AttributeError:
                pass                            # python 3.x

            if folder == ex_file_name:
                if os.path.getsize(folder) > 0: # if file downloaded completely.
                    print('\nDownload completed.')
                    file_not_found = False
                    break
            time.sleep(0.02)                    # delay to print which file is being scanned

Please download just the exercise file to test if it works this time, no need to re-install Lyndor. You can comment all the main operations in run.py to test just exercise file download (will be faster)

# run.py
 try:
        # main operations ->
        # save.course(url, lynda_folder_path)                 # Create course folder
        # save.info_file(url, course_folder_path)             # Gather information
        # save.chapters(url, course_folder_path)              # Create chapter folders
        # save.contentmd(url)                                 # Create content.md
        # save.videos(url, cookie_path, course_folder_path)   # Download videos
        # rename.videos(course_folder_path)                   # rename videos
        # rename.subtitles(course_folder_path)                # rename subtitles
        # move.vid_srt_to_chapter(url, course_folder_path)    # Move videos and subtitles to chapter folders

        # Download exercise files
        if save.check_exercise_file(url):
            print('\nExercise file is available to download')

I guess the reason it takes 30-40 seconds to find the file is that your Downloads folder has a lot of files, and the delay of 100 milliseconds per file is making it worse:

time.sleep(0.1)

I've reduced the delay to 20 milliseconds (time.sleep(0.02)), so it should be 5 times faster now. If it's still slow for you, you can reduce it to 0.01 (10 milliseconds) or get rid of the time.sleep statement altogether (con: you won't get feedback about which file is currently being read).
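The wait grows with the folder: every entry costs one sleep, so 400 entries at 0.1 s is about 40 s per pass. Since the expected file name is already known, another option is to test that one path directly instead of walking the listing. A rough sketch, assuming the name on disk matches the name shown on the page exactly; `scan_seconds`, `is_downloaded` and the demo file name are hypothetical.

```python
import os
import tempfile


def scan_seconds(entry_count, delay):
    """Time spent sleeping during one pass over a folder listing."""
    return entry_count * delay


def is_downloaded(directory, file_name):
    """Check one known file directly instead of walking the whole folder."""
    path = os.path.join(directory, file_name)
    return os.path.isfile(path) and os.path.getsize(path) > 0


# Small demonstration in a throwaway folder standing in for Downloads.
with tempfile.TemporaryDirectory() as downloads:
    before = is_downloaded(downloads, "Ex_Files_Demo.zip")
    with open(os.path.join(downloads, "Ex_Files_Demo.zip"), "wb") as handle:
        handle.write(b"zip bytes")
    after = is_downloaded(downloads, "Ex_Files_Demo.zip")
```

The trade-off is the same one mentioned for removing the sleep: there is no per-file progress message, only a single check per polling round.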

Cheers ANk

othmanelamnabhi commented 6 years ago

OMG it WORKS @ankitsejwal 👍 I commented out only the last 4 operations in run.py, so the script could still create the directory it was going to move the file to. I initially got here because I was looking for a way to download the courses, and honestly I could have just left it at that and downloaded the exercise files manually at the end, but I like things perfect and I'm glad you were here for support. I'm thankful for the exchange because, as I told you, I have never programmed, but I can read lines of code until they kind of make sense to me, so I learned a lot from yours. Thank you for the explanation. I finally understand why it took so long, and the verbose mode you added helped with that. I have a huge Downloads folder, so it was taking time, but it's working amazingly well now.

One last thing: you said the "sip" parameter was working for another user. If he ever answers and sip does work for him, either a conditional could be added so that if login fails with one parameter it falls back to the other, or honestly it can just be changed manually and that's the end of it.
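The fallback idea could look roughly like this. It is only a sketch: `first_working` and `fake_login` are hypothetical names, and the real login attempt (a Selenium run in Lyndor) is replaced here by a stand-in function that simply reports success or failure.

```python
def first_working(candidates, attempt):
    """Return the first candidate for which attempt(candidate) succeeds."""
    for candidate in candidates:
        if attempt(candidate):
            return candidate
    return None


def fake_login(login_type):
    """Stand-in for a real login attempt; pretends only 'patron' works."""
    return login_type == "patron"


chosen = first_working(["sip", "patron"], fake_login)
print(chosen)
```

Trying "sip" first keeps the behaviour that worked for the other user, and "patron" is only used when that attempt fails.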

But all in all, I had fun with this, and I have you to thank for a great weekend spent experimenting.

othmanelamnabhi commented 6 years ago

Alright @ankitsejwal, I'm bringing you something new if you're up for it. I'll probably do some research, but wanted to let you know about it (since I'll most probably get stuck) :D

Sometimes a course will have more than one exercise file. In the attached screenshot, there are two files with the same class, .exercise-name.

Is there any way for the program to loop through the files with that class name? I suppose this will also affect the moving part at the end.

screen shot 2018-08-20 at 00 05 14

othmanelamnabhi commented 6 years ago

UPDATE 1: after some searching I found this piece of code that I customized. My train of thought was finding some setting in selenium that made it possible to click multiple elements with the same class. After some trial and error, this did the trick. Do tell me if it's efficient or not please.


    exercices = driver.find_elements_by_css_selector('a > .exercise-name')
    for x in range(0,len(exercices)):
        if exercices[x].is_displayed():
            exercices[x].click()

Now when the download is done, the program moves only one file, so I'm off to look for something to solve that, let me know what you think :)

othmanelamnabhi commented 6 years ago

UPDATE 2: added this code for testing purposes and esthetics to see if I extracted the right data


    #ex_file_name = driver.find_element_by_css_selector('.exercise-name').text
    ex_file_name = driver.find_elements_by_css_selector('.exercise-name')
    for span in ex_file_name:
        print(span.text)  # test to check if it extracts the right text
    # ex_file_size = driver.find_element_by_css_selector('.file-size').text
    for span in ex_file_name: #added this so that it shows all the files it's downloading
        print('Downloading ' + span.text) #ex_file_name)

othmanelamnabhi commented 6 years ago

UPDATE 3: I'm done haha I managed to achieve it, BUT I'm sure the code is messy and that there is a better way to formulate it, nonetheless here is what I did, if you could offer feedback, would be much appreciated.

I recycled the for span in ex_file_name: over and over and surprisingly it did the trick.

Here is the code change

            try:
                folder = folder.decode('utf-8') # python 2.x
            except AttributeError:
                pass                            # python 3.x
            for span in ex_file_name:
                if folder == span.text:
                    if os.path.getsize(folder) > 0: # if file downloaded completely.
                        print('\nDownload completed.')
                        file_not_found = False
                        break
                time.sleep(0.02)                    # delay to print which file is being scanned
    try:
        for span in ex_file_name:
            shutil.move(span.text, course_folder)
        print('Ex-File Moved to Course Folder successfully.')
    except:
        print('Moving error.')
    driver.close()

Here are my small issues with this code:
1/ The "Download completed" message shows while the script is still listing files in the Downloads folder.
2/ The script doesn't move the files until it has listed the whole Downloads directory (I believe), even though it has already found the files and declared the downloads completed.

screen shot 2018-08-20 at 03 05 06
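Both symptoms are consistent with how the loops nest: `break` leaves only the inner `for span in ex_file_name` loop, so the outer walk over the Downloads listing carries on (and "Download completed." can print in the middle of it), and `file_not_found` is only checked again once that walk has ended. A minimal sketch of a flatter check, assuming the expected names are plain strings collected beforehand (for example the `span.text` values); `completed_downloads` and the demo names are hypothetical.

```python
import os
import tempfile


def completed_downloads(directory, expected_names):
    """Return the expected names already present with a non-zero size."""
    present = set(os.listdir(directory))      # one listing, then set lookups
    return [
        name for name in expected_names
        if name in present
        and os.path.getsize(os.path.join(directory, name)) > 0
    ]


# Demonstration in a throwaway folder standing in for Downloads.
with tempfile.TemporaryDirectory() as downloads:
    with open(os.path.join(downloads, "01.zip"), "wb") as handle:
        handle.write(b"data")
    open(os.path.join(downloads, "02.zip"), "wb").close()   # still empty
    done = completed_downloads(downloads, ["01.zip", "02.zip", "03.zip"])
```

With the result in one list, the "completed" message and the move can both happen once, after the check, instead of inside the scan.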

ankitsejwal commented 6 years ago

@Otech-Man you are brilliant, I've made changes to your code please test the attached file. As you spent a lot of time, I want your contribution to be counted, can you please:

fork this repository
```
# then
$ git pull
```
and then replace this new exercise_file.py
commit new changes and then create a pull request?

I'll merge your changes to Lyndor.

Thanks for your efforts. Great learning from you too.

Please have a look at these resources. If you're stuck, I'm here to help.
https://services.github.com/on-demand/downloads/github-git-cheat-sheet.pdf (3 min read)
https://www.youtube.com/watch?v=FQsBmnZvBdc (6 min watch)

Cheers ANK

exercise_file.py.zip

othmanelamnabhi commented 6 years ago

Works perfectly :D you provided me with some reading for tonight. And I launched some courses to download while I'm away :) Thank you for offering, it really made my day (and made me mess with git which is always a plus haha)

Keep rocking 🤘🏻

othmanelamnabhi commented 6 years ago

I have a question @ankitsejwal, in this snippet

    exercises = driver.find_elements_by_css_selector('a > .exercise-name')

    for exercise in exercises:
        if exercise.is_displayed():
            print('Downloading: ' +  exercise.text)
            exercise.click()

the variable `exercises` was defined, but nowhere do I see `exercise` defined. Does Python understand plural and singular or something like that?

ankitsejwal commented 6 years ago

Hi @Otech-Man, Python doesn't attach any meaning to plural or singular; it's just a naming convention I use. `exercise` could just as well be written as x, as you see in some examples:

# the x 
list = [0, 1, 2]
for x in list:
    print(x)
# output:
0
1
2
# Hence x is defined by the 'for' statement itself; it does not need to be declared anywhere else.
# x acts as a variable that holds the current item of the iteration, so it is first 0, then 1, then 2.
# (In Python, x stays bound to the last item, here 2, after the loop ends.)

The above example can also be written as you attempted earlier

list = [0, 1, 2]
for x in range(0, len(list)):   # x is now an index: 0, 1, 2
    print(list[x])              # look the item up by its index
# output: same as before

# Hence you can see the previous example is favorable in some cases as the syntax is simpler, 
# sort of plain english -> for exercise in exercises (The magic of python: simplicity :)  )

Cheers ANk

othmanelamnabhi commented 6 years ago

@ankitsejwal makes a lot of sense. I do agree, your version is way simpler and more understandable. Mine got me confused with all the 0s there.

ankitsejwal commented 6 years ago

@Otech-Man look what I've got https://www.lynda.com/Revit-tutorials/Revit-Tips-Tricks-Troubleshooting/386630-2.html try downloading the exercise files. Wooof!!!

othmanelamnabhi commented 6 years ago

@ankitsejwal hahahahaha does it ever end? I was just watching an intro video to Javascript on TeamTreeHouse 😂do you have any idea what the issue is?

Do you get this same message? I'm trying to make sense of the last part of the error message

Traceback (most recent call last):
  File "/Users/othmanelamnabhi/Desktop/Lyndor/run.py", line 130, in <module>
    main()
  File "/Users/othmanelamnabhi/Desktop/Lyndor/run.py", line 35, in main
    schedule_download(url)
  File "/Users/othmanelamnabhi/Desktop/Lyndor/run.py", line 51, in schedule_download
    download_course(url)
  File "/Users/othmanelamnabhi/Desktop/Lyndor/run.py", line 117, in download_course
    exercise_file.download(url, course_folder_path)
  File "/Users/othmanelamnabhi/Desktop/Lyndor/exercise_file.py", line 50, in download
    exercise.click()
  File "/Library/Python/2.7/site-packages/selenium/webdriver/remote/webelement.py", line 80, in click
    self._execute(Command.CLICK_ELEMENT)
  File "/Library/Python/2.7/site-packages/selenium/webdriver/remote/webelement.py", line 628, in _execute
    return self._parent.execute(command, params)
  File "/Library/Python/2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 320, in execute
    self.error_handler.check_response(response)
  File "/Library/Python/2.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Element <span class="exercise-name">...</span> is not clickable at point (483, 805). Other element would receive the click: <div class="show-all" style="display: block;"></div>
  (Session info: chrome=68.0.3440.106)
  (Driver info: chromedriver=2.41.578706 (5f725d1b4f0a4acbf5259df887244095596231db),platform=Mac OS X 10.14.0 x86_64)

ankitsejwal commented 6 years ago

Yeah, it seems that because there are lots of files, the click doesn't land on the entries at the bottom: we would need to scroll down first to click on each zip file.

Removing this max-height property should solve the problem, or removing the CSS file altogether should work too.
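Another workaround sometimes used for "not clickable at point" errors is to trigger the click from JavaScript, which skips Selenium's check for overlapping elements. This is a sketch only and was not tried on this page; `js_click_all` is a hypothetical helper, while `execute_script` and `is_displayed` are regular Selenium calls.

```python
def js_click_all(driver, elements):
    """Click every displayed element through JavaScript; return the count."""
    clicked = 0
    for element in elements:
        if element.is_displayed():
            # arguments[0] is the element passed in after the script string.
            driver.execute_script("arguments[0].click();", element)
            clicked += 1
    return clicked
```

It would be called with the same list as before, for example `js_click_all(driver, exercises)`. Removing max-height has the advantage of keeping real mouse clicks, so the two approaches are alternatives rather than something to combine.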

screen shot 2018-08-22 at 1 17 16 am

othmanelamnabhi commented 6 years ago

Alright @ankitsejwal. So with your guess, I managed to add these two lines of code to the exercise_file.py (pushed a commit).

    element = driver.find_element_by_css_selector(".unlocked") 
    driver.execute_script("document.getElementsByClassName('unlocked')[0].style.maxHeight = 'none';")

So the list of files shows completely and the script is able to launch all downloads. BUT here is where it gets tricky. There were a hundred downloads in that course, so Chrome had to wait for available sockets and for some files to finish before launching new ones. When the script had gone through the list of all exercise files and launched them, it began searching for them (even though many files had not finished downloading or had not even started).

And so at the end, the script moved the files it found to the course folder, but a large number of them stayed in my Downloads folder. Once it got through the list, it printed the "Download successful" message (even though a 1 GB file was still downloading in the browser window).

So here are some observations and suppositions of what I think happened from my understanding:

I'm supposing the script takes each name from the list (exercise in exercises) and then looks for it in the Downloads folder.
So the script looks, for instance, for 93.zip when 93.zip hasn't even started downloading (it got clicked, yes, but the tab was waiting for available sockets, or maybe the file was too large and not done yet; the script starts looking for it anyway because it got clicked and because we always assumed those files were small?).
When it does not find the file, the script moves on to look for the next file in the list.
When the script has gone through the whole list (finding and moving some of the files while not finding others), it stops the process and closes the browser window while a huge file is still downloading.

here is the modified file, if you want to try reproducing what I told you about. Let me know if it fixes the height issue you spoke of and thus the file download and what you think causes the other issues I shared with you.

EDIT :I feel like the first line might be unnecessary.

exercise_file.py.zip

ankitsejwal commented 6 years ago

Hi @Otech-Man, great analysis; you've broken the issue down well. I pushed the update to the development branch yesterday, sorry I didn't notify you. I have just created pull request #48 (but happily that meant you came up with an analysis and your own code solution :) )

    # used jQuery to remove the max-height property
    driver.execute_script("$('.exercise-tab .content').css('max-height', 'none');")

Your observations are on point: the program skips files that are still downloading. One solution could be to keep looping until the exercises list is empty. But that has its own disadvantage: if some exercise file fails to download for whatever reason, the loop blocks and the whole bulk download fails.
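One way to keep the "loop until the list is empty" idea without the blockade is to add a deadline and report whatever is still missing. A minimal sketch, assuming the expected names are known up front; `wait_for_downloads`, its defaults and the demo names are hypothetical, and a non-zero size is only a rough completeness signal, as in the existing loop.

```python
import os
import tempfile
import time


def wait_for_downloads(directory, expected_names, timeout=300.0, poll_interval=1.0):
    """Poll until every expected file exists or the timeout expires.

    Returns (finished, missing) so one stuck file cannot block the rest.
    """
    pending = set(expected_names)
    deadline = time.monotonic() + timeout
    while pending and time.monotonic() < deadline:
        for name in list(pending):
            path = os.path.join(directory, name)
            if os.path.isfile(path) and os.path.getsize(path) > 0:
                pending.discard(name)
        if pending:
            time.sleep(poll_interval)
    finished = [name for name in expected_names if name not in pending]
    missing = [name for name in expected_names if name in pending]
    return finished, missing


# Demonstration: one file arrives, the other never does.
with tempfile.TemporaryDirectory() as downloads:
    with open(os.path.join(downloads, "93.zip"), "wb") as handle:
        handle.write(b"data")
    finished, missing = wait_for_downloads(
        downloads, ["93.zip", "94.zip"], timeout=0.2, poll_interval=0.05
    )
```

The files in `finished` can be moved as usual, and the names in `missing` can be printed so the user knows which ones to fetch by hand.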

Edit: Forgot to say that: Damn this course is crazy, a test downloads 120+ exercises my laptop had fan running on full speed (think about testing this script multiple times. lol). I've made changes to the while loop https://github.com/ankitsejwal/Lyndor/compare/development if you find time to test it (can't find courage to test it by myself 🤣 )

Cheers ANk

othmanelamnabhi commented 6 years ago

Alright so I think I made a breakthrough haha. Bear with me, this is going to be one long message. (I just read your edit while writing my message, and I know what you mean, the fan started being loud on my macbook pro and that's never a good sign haha, but with the findings below, I don't think that's needed anymore).

So initially, after reading your message, my train of thought was to find a way for the browser, through selenium, to detect when a file was downloading and when it was done. After much reading, I thought about your loop idea: a first loop starts after all the file links are clicked and checks whether the files are present in the downloads tab, then a second one checks the state of each file. How? By checking if "Show in finder" is available in the HTML. But even with these methods, we run into the issue of one file maybe not launching, so a timeout might be necessary; and then again, sometimes files take time to appear in the downloads tab because they are waiting for sockets to free up. So not optimal.

My thinking then went along with the option below: why not download it with aria2?

screen shot 2018-08-23 at 00 45 09

I mean it's a downloader, it can monitor the state of the file and then launch another one after that. So I set out to find where you call aria2c (I have to admit that this took much more time than it should, got confused with all the files there) until I started reading save.py then I stumbled upon this line

os.system('youtube-dl --no-check-certificate' + cookie + output + subtitles + url + ext_downloader)

So first I tried this (the most important thing for me was to keep the cookie variable since the download wasn't possible without it)

os.system('youtube-dl --no-check-certificate' + cookie + output + subtitles + 'https://www.lynda.com/ajax/course/83603/download/exercise/90503' + ext_downloader)

I received an error about not being able to download the video file but then youtube-dl reverted to some mode where it loads the link anyway and then started downloading a file with a weird name at the end. (I understood this happened because youtube-dl is supposed to download videos DUH haha)

I had a hunch that if I renamed the file, it would work, and so I gave it a .zip extension and it opened.

So with that in mind, I did another experiment, where I wanted to rely on aria2c as the downloader and not youtube-dl.

I found out that aria2c could also load cookies and that the same cookie file we use can be used here.

So this is how I proceeded

        output = ' -o ' +'"'+ course_folder + "exercise.zip" + '"' # don't quite understand the whole file path structure but from a test download I understood that the naming convention comes from here, so I changed it
        # External downloader option
        ext_downloader = ' --external-downloader aria2c' if read.external_downloader else '' # got rid of this one
        cookie = ' --load-cookies=' + '"' + cookie_path + '"'     # changed this one to accommodate the aria2c command
        uName = read.username

        if "'" in uName:                                     # escaping single quote (') for users with quote in their username
            uName = uName.replace("'", "\\'")
        username = ' -u ' + uName                            # username
        password = ' -p ' + read.password                    # password

        # Checking download preferences
        if  download_preference in ['cookies', 'cookie']:
            cookies.edit_cookie(cookie_path, message.NETSCAPE) # Edit cookie file
            os.system('aria2c ' + 'https://www.lynda.com/ajax/course/83603/download/exercise/90503' + cookie + output) # the final command looks like this

And lo and behold, it worked. I found the file here, with the naming convention I gave it.

screen shot 2018-08-23 at 01 08 46

Btw, I did my test with the Programming foundations course https://www.lynda.com/Programming-Foundations-tutorials/Foundations-Programming-Fundamentals/83603-2.html

So now, with that said, if this works as planned, here is how it's gonna go:

Through the Chrome webdriver we extract the list of files, the famous "exercises", then we feed them to aria2c through that command; the files get downloaded to the course folder, maybe into some exercises subfolder.
We get rid of the click/download part on the webdriver, the Downloads scanning and the exercise-moving part.
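The plan above could be sketched like this: one aria2c argument list per exercise file, built as a Python list so that paths with spaces need no manual quoting. `aria2c_command`, the demo file name and the folder names are hypothetical; `--load-cookies`, `--dir` and `--out` are regular aria2c options.

```python
def aria2c_command(url, cookie_path, target_dir, file_name):
    """Build an aria2c argument list for one exercise file."""
    return [
        "aria2c",
        "--load-cookies=" + cookie_path,   # same cookies.txt used for the videos
        "--dir=" + target_dir,             # download straight into the course folder
        "--out=" + file_name,              # file name, relative to --dir
        url,
    ]


# (file name, URL) pairs as they would be scraped from the course page.
exercises = [
    ("Ex_Files_Demo.zip",
     "https://www.lynda.com/ajax/course/83603/download/exercise/90503"),
]
commands = [
    aria2c_command(url, "cookies.txt", "course_folder", name)
    for name, url in exercises
]
```

Each entry of `commands` could then be passed to `subprocess.call`. Passing the real file name to `--out` also avoids every download being saved under the same "exercise.zip" name when a course has several files.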

So any flaw here? I believe we're timezones away, but that's a good thing I guess, although I would have loved sharing my findings with you real time.

ankitsejwal commented 6 years ago

@Otech-Man you are awesome. I'm mostly done with your solution. The username and password are used to log in to a web browser through selenium, which makes the exercise file URLs visible; those URLs are then fed into aria2 together with cookies.txt. But as you can see:

1) There is now an added step of downloading cookies.txt in order to download the exercise files.
2) For users who run Lyndor through cookies.txt: the same cookie file can be used to download the exercise files through aria2, but in order to do so one needs to be logged into the web page anyway (so the download won't work in this case).

Possible solutions:

a) If we are able to extract cookies with aria, selenium or some other way, then the added step in case 1 goes away.
b) If we are able to inject cookies into the web browser session for case 2, so that a login is possible to extract the URLs, then people using cookies.txt will get a new feature (currently they can't download exercise files).
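For option a), the cookies Selenium returns from `driver.get_cookies()` are plain dictionaries, so they could be written out in the Netscape cookies.txt layout that aria2 can load. A rough sketch under assumptions: `to_netscape` is a hypothetical helper, the sample cookies are made up, and writing session cookies with an expiry of 0 is a guess that would need checking against aria2.

```python
def to_netscape(cookies):
    """Render Selenium-style cookie dicts as Netscape cookies.txt text."""
    lines = ["# Netscape HTTP Cookie File"]
    for cookie in cookies:
        domain = cookie["domain"]
        lines.append("\t".join([
            domain,
            "TRUE" if domain.startswith(".") else "FALSE",   # include subdomains
            cookie.get("path", "/"),
            "TRUE" if cookie.get("secure") else "FALSE",
            str(int(cookie.get("expiry", 0))),               # 0 for session cookies
            cookie["name"],
            cookie["value"],
        ]))
    return "\n".join(lines) + "\n"


# Made-up cookies shaped like the dicts returned by driver.get_cookies().
sample = [
    {"name": "token", "value": "abc", "domain": ".lynda.com",
     "path": "/", "secure": True, "expiry": 1700000000},
    {"name": "s", "value": "1", "domain": "www.lynda.com"},
]
cookie_text = to_netscape(sample)
```

Writing `cookie_text` to a file after the Selenium login would remove the manual cookies.txt download for users who log in with a username and password.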

What do you think? I'll be pushing the updates in 24 hours as I'm leaving this in the middle as some urgent work popped in.

Cheers.

othmanelamnabhi commented 6 years ago

Hmmmm, I fail to see the issue here tbh. I mean we already had to download the cookie.txt file manually and place it either in desktop or downloads folder. So it's already present and ready to use. And we have to open the webdriver anyway to scrape all those exercise links (which Lyndor already does.)

So for the first situation : you're probably talking about people who set "downloading through library/normal login" in their settings. right? Can't they download the cookie manually through the cookies chrome extension?

As for the second situation: why inject a cookie when we can login with credentials and extract the files list like it was programmed initially?

Alright wait up, there are users who login to Lynda through some other portals (no credentials), and this part can't really be automated. So there needs to be a manual extraction of the cookie (which users of the cookie method already do). But for anyone who has account credentials, here is the solution I offer to save the cookie file then load it through selenium and pickle.

Code snippets are referenced in the links below:
https://stackoverflow.com/questions/45417335/python-use-cookie-to-login-with-selenium
https://stackoverflow.com/questions/15058462/how-to-save-and-load-cookies-using-python-selenium-webdriver#15058521
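The save-then-load idea boils down to roughly the following. This is a sketch with hypothetical helper names; `get_cookies` and `add_cookie` are regular Selenium webdriver methods, and the browser normally has to be on a page of the same domain before `add_cookie` is accepted.

```python
import pickle


def save_cookies(driver, path):
    """Store the current browser session cookies in a pickle file."""
    with open(path, "wb") as handle:
        pickle.dump(driver.get_cookies(), handle)


def load_cookies(driver, path):
    """Add previously saved cookies to the session; return how many."""
    with open(path, "rb") as handle:
        cookies = pickle.load(handle)
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)
```

A typical flow would be: log in once, call `save_cookies`, and on later runs open the site, call `load_cookies`, then refresh the page. Pickle files should only be loaded when they come from your own machine.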

By the way when you say

The same cookie file can be used to download the exercise_file through aria2 but in order to do so, one needs to be logged into the web page anyway. (so download won't work in this case)

You mean logged in through web driver to extract the links? Hope this is useful, have a good day :D

ankitsejwal commented 6 years ago

Hi, @Otech-Man thanks for the links, I'll have a look at them.

First of all thanks to you, I've created a pull request #49 with which now we are able to download the exercise files in headless way with superb performance and believe me it works like a charm.

Sorry for being so brief last time; let me elaborate. The element that holds the link to an exercise file does not exist until we log in to the web page through selenium. For example, if you are logged out and try to find this element with the selector "a.course-file", you won't find it:

<a target="_blank" href="/ajax/course/0000/download/exercise/0000" role="link" class="course-file" data-ga-label="tab-exercises-item" data-ga-value="0000" tabindex="-1" aria-controls="ajax/course/0000/download/exercise/0000">
    <span class="exercise-name">10.zip</span>
    <span class="file-size">(13.9MB)</span>
</a>
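To make the lookup concrete, here is a small Python 3 sketch that pulls the `href` out of anchors like the one above. It is only a stand-in: Lyndor itself locates the element through selenium, while this version parses an HTML string (for example Selenium's `driver.page_source`) with the standard library. The names `ExerciseLinkParser` and `find_exercise_links` are hypothetical.

```python
from html.parser import HTMLParser


class ExerciseLinkParser(HTMLParser):
    """Collect the href of every <a> whose class list contains 'course-file'."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if "course-file" in classes and attributes.get("href"):
            self.links.append(attributes["href"])


def find_exercise_links(page_source):
    """Return the exercise-file hrefs found in the given HTML string."""
    parser = ExerciseLinkParser()
    parser.feed(page_source)
    return parser.links
```

On a logged-out page the returned list would simply be empty, which matches the behaviour described above.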

Every time Lyndor runs, it gathers data such as the course folder, chapters, and video/srt file names. This data is gathered without logging in to the website, because all these elements are present in the DOM (page). Exercise files are different: the elements with class name "course-file" only appear if we access the page through selenium or the requests module, because login is required. So in the new method we get the elements with the links from selenium (using username and password) and we download the files with aria2 (using cookies.txt). Thus, if I want to use this new way, I have to download cookies.txt too (though I would love to do that given the advantages).

So content can now be downloaded in the following combinations:

videos : username + password / cookies.txt
exercise-files : username + password + cookies.txt
exercise-files : username + password

Hence, if someone has just cookies.txt, they can't download exercise files, just like before.
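As a rough illustration of the aria2 half of the new method, the sketch below builds an `aria2c` command that reuses cookies.txt. It is not the code from the pull request: the function names, the base URL `https://www.lynda.com`, and the download directory are assumptions. The `--load-cookies` and `--dir` options are regular aria2c options.

```python
import subprocess
from urllib.parse import urljoin

# Assumed site root; the href scraped from the page is relative.
BASE_URL = "https://www.lynda.com"


def build_aria2_command(href, cookie_file, download_dir):
    """Return the aria2c argument list for one exercise file."""
    return [
        "aria2c",
        "--load-cookies=" + cookie_file,  # Netscape-format cookies.txt
        "--dir=" + download_dir,
        urljoin(BASE_URL, href),
    ]


def download_with_aria2(href, cookie_file, download_dir):
    """Run aria2c and return its exit code (0 means success)."""
    return subprocess.call(build_aria2_command(href, cookie_file, download_dir))
```

The link itself still has to come from a logged-in selenium session, which is why this path needs username + password as well as cookies.txt.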

[Screenshot: 2018-08-23 at 11:20:53 pm]

Hence I've given a choice between the new and old methods by updating the web page on the webserver to accommodate new controls, through which we can choose between aria2 and selenium.

Please download the code on the development branch to have a look. You need to re-install, as this is a major release with lots of breaking changes. :)

Cheers ANk

othmanelamnabhi commented 6 years ago

I perfectly understand now, thanks for taking the time to explain. 👍 And great job on the rewrite, it works awesome :)

I'll need to re-read your code in light of everything you changed to piece it all together. I'd love to be able to write something like that for TeamTreehouse, although I'll probably need to get on the Python/programming wagon sometime soon so that I can at least bring some meaningful contributions to the table.

Heard Automate the Boring Stuff with Python was a great resource; let me know if you know of another one. (Also let me know if it's OK to ask you programming questions unrelated to your work here, as I don't want to bother you.)

There is a small aesthetic feature that might be interesting to add to the exercise downloads, as you did for the video downloads: the position of the current download, especially for long ones like the Revit course.

Something like Downloading 1 out of 100 so that the user has an idea about what awaits.
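A counter like that could be sketched along these lines. The names `progress_message` and `download_all` are made up, and `download_one` stands for whatever function actually fetches a single file; none of this is taken from Lyndor.

```python
def progress_message(position, total, file_name):
    """Format the 'Downloading X out of Y' line for one file."""
    return "Downloading {} out of {}: {}".format(position, total, file_name)


def download_all(file_names, download_one):
    """Download every file in order, printing its position first."""
    total = len(file_names)
    for position, name in enumerate(file_names, start=1):
        print(progress_message(position, total, name))
        download_one(name)
```

Counting from 1 with `enumerate(..., start=1)` keeps the displayed position in step with what the user expects to see.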

Btw, this is unrelated, so I don't know if you want me to open another issue for it or not, but I've never been able to edit the JSON file through the webpage, I always open settings.js to read the available values then write the ones I need in the JSON file.

Here is the traceback

Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/flask/app.py", line 1997, in __call__
    return self.wsgi_app(environ, start_response)
  File "/Library/Python/2.7/site-packages/flask/app.py", line 1985, in wsgi_app
    response = self.handle_exception(e)
  File "/Library/Python/2.7/site-packages/flask/app.py", line 1540, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/Library/Python/2.7/site-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/Library/Python/2.7/site-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/Library/Python/2.7/site-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/Library/Python/2.7/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Library/Python/2.7/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/Users/othmanelamnabhi/Desktop/Lyndor-development/settings/settings.py", line 15, in update
    settings_file = open('./settings/static/js/settings.json', 'w')
IOError: [Errno 2] No such file or directory: './settings/static/js/settings.json' # btw the file exists
127.0.0.1 - - [23/Aug/2018 15:25:42] "GET /? HTTP/1.1" 200 -
127.0.0.1 - - [23/Aug/2018 15:25:42] "GET /static/js/settings.js?1535034342.6 HTTP/1.1" 200 -
127.0.0.1 - - [23/Aug/2018 15:25:42] "GET /static/js/settings.json?_=1535034342720 HTTP/1.1" 200 -

ankitsejwal commented 6 years ago

Thanks @Otech-Man. I'm sure you will be a great contributor to a TeamTreehouse project; you are able to analyze problems and come up with good solutions. I've heard good things about Automate the Boring Stuff but never had a chance to spend time on it (maybe in the future). For absolute beginners, to me this is the best course to begin with: https://www.udacity.com/course/programming-foundations-with-python--ud036. It can be finished in one week, and then the best way to learn is to start a project, which will teach you a lot (and code every day: https://www.youtube.com/watch?v=qZKvZzRynLE). You are lucky that you already have something in mind. If you want more, then Lynda is always there, and you can reach out to me anytime for help (I will learn a thing or two from you too 😃).

# that's a good idea
Downloading 1 out of 100

With the issue:

file = open('./settings/static/js/settings.json', 'w')
IOError: [Errno 2] No such file or directory: './settings/static/js/settings.json'

# can you try removing dot [.] from ./settings/....
# so it should look like
file = open('/settings/static/js/settings.json', 'w')

And yeah, if the issue still persists, I'm happy to have it as a new issue; it's better to keep discussions that aren't related to this issue separate. Cheers

othmanelamnabhi commented 6 years ago

Thank you for the kind words @ankitsejwal. Watched the video as soon as I woke up :) Thanks for the resources; I love Udacity, so I'm not gonna have any issue getting on board with that.

I would love to know more about how you got into programming and what your path was like, your current stack, whether you work in the field, etc. (this is definitely not an investigation haha, just sheer curiosity).

I don't know what messaging app you use in Australia :) but here is my email () in case you don't want me plaguing your GitHub 😂

Regarding the settings issue, deleting the "." didn't work, but inputting the actual settings.json directory did work, in case that's any help.
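That result fits how Python resolves paths: './settings/static/js/settings.json' is looked up relative to the current working directory, so it only works when the server is started from the Lyndor root, and '/settings/...' points at the filesystem root instead. One possible fix, sketched here under the assumption that settings.json lives in static/js next to settings.py (as the traceback paths suggest), is to build the path from the module's own location. The helper name `settings_json_path` is hypothetical.

```python
import os


def settings_json_path(module_file):
    """Build an absolute path to settings.json from a module's location.

    Pass __file__ from settings.py, so the result does not depend on
    the directory the server was launched from.
    """
    base_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(base_dir, "static", "js", "settings.json")
```

Inside settings.py this would be used as `open(settings_json_path(__file__), 'w')`.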

ankitsejwal commented 6 years ago

Haha, sure I'll email you there. Cheers

Answer questions related to your Environment which will help in reproducing the issue:

The issue was encountered on: :computer:

Enter the python version you are using for download. Find your python version by typing in terminal python -V

python 2.7.10 (happens also with 3.6.4)

If the purpose of this issue is a bug report please provide all kinds of example URLs where you encountered issues (replace following example URLs by yours):

Description of your issue, suggested a solution and other information
