FallingLights / Teachable-dl

Course downloader for teachable platform written in python3 using selenium and yt-dlp
GNU Lesser General Public License v3.0

[BUG] Doesn't login, doesn't bypass Cloudflare, throws an error while downloading #32

Closed CodeSpartan closed 7 months ago

CodeSpartan commented 8 months ago
  1. With --email and --password, it doesn't perform any authentication and stays logged out.
  2. With --man_login_url, it gets stuck at login because of Cloudflare, which asks to "verify that you're human" over and over.
  3. It throws an error while trying to download a publicly available file that doesn't require credentials. I don't know if it's because the browser starts playing the video from the middle. Attempts to manually rewind the video cause an immediate error.
    ERROR: Could not find login: Message:
    Stacktrace:
        GetHandleVerifier [0x00007FF70BB782B2+55298]
        (No symbol) [0x00007FF70BAE5E02]
        (No symbol) [0x00007FF70B9A05AB]
        (No symbol) [0x00007FF70B9E175C]
        (No symbol) [0x00007FF70B9E18DC]
        (No symbol) [0x00007FF70BA1CBC7]
        (No symbol) [0x00007FF70BA020EF]
        (No symbol) [0x00007FF70BA1AAA4]
        (No symbol) [0x00007FF70BA01E83]
        (No symbol) [0x00007FF70B9D670A]
        (No symbol) [0x00007FF70B9D7964]
        GetHandleVerifier [0x00007FF70BEF0AAB+3694587]
        GetHandleVerifier [0x00007FF70BF4728E+4048862]
        GetHandleVerifier [0x00007FF70BF3F173+4015811]
        GetHandleVerifier [0x00007FF70BC147D6+695590]
        (No symbol) [0x00007FF70BAF0CE8]
        (No symbol) [0x00007FF70BAECF34]
        (No symbol) [0x00007FF70BAED062]
        (No symbol) [0x00007FF70BADD3A3]
        BaseThreadInitThunk [0x00007FF8BD9E257D+29]
        RtlUserThreadStart [0x00007FF8BF2AAA78+40]

Platform: Windows.

Also, to activate an environment on Windows, the command is: env\Scripts\activate

FallingLights commented 8 months ago

Do you mind sharing the url for the course?

CodeSpartan commented 8 months ago

Sure. https://simondev.teachable.com/courses/1783153/lectures/40253627

Also, I tried using my existing browser profile, which already has the login cookies, but the download threw an error as well. I tried this:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# in TeachableDownloader:
chrome_options = Options()
chrome_options.add_argument('user-data-dir=C:\\Users\\MyWindowsName\\AppData\\Local\\Google\\Chrome\\User Data\\MyProfileName')
self.driver = webdriver.Chrome(options=chrome_options)
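
One possible reason a reused profile does not pick up the existing cookies: Chrome expects the user-data-dir switch to point at the "User Data" directory itself, with the profile folder chosen separately through profile-directory. Chrome also usually needs to be fully closed before Selenium can reuse the profile. A minimal sketch of splitting the path used above into the two switches; chrome_profile_args is a hypothetical helper, not part of Teachable-dl:

```python
from pathlib import PureWindowsPath


def chrome_profile_args(profile_path: str) -> list[str]:
    """Split a full profile path into the two Chrome command-line switches.

    Hypothetical helper for illustration only.
    """
    path = PureWindowsPath(profile_path)
    return [
        f"--user-data-dir={path.parent}",      # the "User Data" directory
        f"--profile-directory={path.name}",    # the profile folder inside it
    ]


args = chrome_profile_args(
    r"C:\Users\MyWindowsName\AppData\Local\Google\Chrome\User Data\MyProfileName"
)
for arg in args:
    # Each string could be passed to chrome_options.add_argument(arg).
    print(arg)
```

Whether this resolves the download error is untested here; it only addresses the profile not being loaded.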

FallingLights commented 8 months ago

Try the latest commit, or you can just use this url https://simondev.teachable.com/p/glsl-shaders-from-scratch

CodeSpartan commented 8 months ago

When trying with --man_login_url, I get into an infinite "verify you're human" loop. I check the checkbox and then it asks again, never quite satisfied.

When trying with --email, same thing happens, but the program quits after 3 failures with Cloudflare:

INFO: Starting login
INFO: Trying to find login
INFO: Logging in
---- three failures here
ERROR: Could not login: Message:
Stacktrace:
        GetHandleVerifier [0x00007FF70BB782B2+55298]
        (No symbol) [0x00007FF70BAE5E02]
        (No symbol) [0x00007FF70B9A05AB]
        (No symbol) [0x00007FF70B9E175C]
        (No symbol) [0x00007FF70B9E18DC]
        (No symbol) [0x00007FF70BA1CBC7]
        (No symbol) [0x00007FF70BA020EF]
        (No symbol) [0x00007FF70BA1AAA4]
        (No symbol) [0x00007FF70BA01E83]
        (No symbol) [0x00007FF70B9D670A]
        (No symbol) [0x00007FF70B9D7964]
        GetHandleVerifier [0x00007FF70BEF0AAB+3694587]
        GetHandleVerifier [0x00007FF70BF4728E+4048862]
        GetHandleVerifier [0x00007FF70BF3F173+4015811]
        GetHandleVerifier [0x00007FF70BC147D6+695590]
        (No symbol) [0x00007FF70BAF0CE8]
        (No symbol) [0x00007FF70BAECF34]
        (No symbol) [0x00007FF70BAED062]
        (No symbol) [0x00007FF70BADD3A3]
        BaseThreadInitThunk [0x00007FF8BD9E257D+29]
        RtlUserThreadStart [0x00007FF8BF2AAA78+40]

INFO: Cleaning up

I don't have a VPN, just in case.

I'm no expert on this subject, but I recently scraped a large website and bypassed Cloudflare using cloudscraper. Maybe it can be of help.

CodeSpartan commented 8 months ago

Jabilee's method works to bypass Cloudflare. The course unfortunately fails to download.

When trying --url https://simondev.teachable.com/courses/1783153/lectures/40253627:

INFO: Starting login
INFO: Trying to find login
WARNING: Login button not found, navigating to fallback URL
INFO: Logging in
INFO: Logged in, switching to course page
INFO: Starting download of course: https://simondev.teachable.com/courses/1783153/lectures/40253627
INFO: Switching to course page
INFO: Picking course downloader
ERROR: Could not download course: https://simondev.teachable.com/courses/1783153/lectures/40253627 cause: DriverMethods.find_elements() got an unexpected keyword argument 'timeout'
INFO: Cleaning up

With --url https://simondev.teachable.com/p/glsl-shaders-from-scratch it goes like this:

INFO: Starting login
INFO: Trying to find login
INFO: Logging in
INFO: Logged in, switching to course page
INFO: Starting download of course: https://simondev.teachable.com/p/glsl-shaders-from-scratch
INFO: Switching to course page
INFO: Picking course downloader
ERROR: Could not download course: https://simondev.teachable.com/p/glsl-shaders-from-scratch cause: DriverMethods.find_elements() got an unexpected keyword argument 'timeout'
INFO: Cleaning up

iaamp commented 8 months ago

The error DriverMethods.find_elements() got an unexpected keyword argument 'timeout' can be worked around by removing the timeout argument from the calls to find_elements(), e.g. changing if self.driver.find_elements(By.ID, "__next", timeout=5): to if self.driver.find_elements(By.ID, "__next"):

This looks like a dependency issue, so I'm not sure what would be best for this package: changing the calls or changing the stated dependency on the selenium driver.
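
Dropping the timeout argument also drops the waiting behaviour. If the wait matters, one option is a small polling wrapper around the standard two-argument find_elements(by, value) call. This is only a sketch: find_elements_with_timeout is a hypothetical helper, not something Teachable-dl provides (Selenium's own WebDriverWait is another way to get an explicit wait).

```python
import time


def find_elements_with_timeout(driver, by, value, timeout=5.0, poll=0.5):
    """Poll driver.find_elements(by, value) until it returns a non-empty
    list or `timeout` seconds have passed.

    Returns the last result, which is an empty list on timeout.
    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + timeout
    while True:
        elements = driver.find_elements(by, value)
        if elements or time.monotonic() >= deadline:
            return elements
        time.sleep(poll)
```

With this in place, the call from the example would become if find_elements_with_timeout(self.driver, By.ID, "__next", timeout=5): and works with any driver object that exposes find_elements(by, value).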

ssm0801 commented 8 months ago

INFO: Starting download of course: https://ashok-it.teachable.com/courses/devops-with-aws/lectures/46799905
INFO: Switching to course page
INFO: Picking course downloader
ERROR: Could not download course: https://ashok-it.teachable.com/courses/devops-with-aws/lectures/46799905 cause: DriverMethods.find_elements() got an unexpected keyword argument 'timeout'
INFO: Cleaning up

ssm0801 commented 8 months ago

After removing all timeouts, I am getting: Downloader does not support this course template. Please open an issue on github.

mr-wh1tehat commented 8 months ago

@iaamp, thank you for pointing this out. The timeout error is handled now, but another issue came up right before downloading the course:

INFO: Downloading lecture: Special-Thanks-&-Credits
INFO: Disabling autoplay
ERROR: Could not download course: [courselinkhere] cause: Message: javascript error: Cannot read properties of null (reading 'checked')
  (Session info: chrome=119.0.6045.159)
Stacktrace:
0 0x564078918723
1 0x5640785e7047
2 0x5640785ed205
3 0x5640785ef901
4 0x56407867741f
5 0x56407865aeb2
6 0x564078676a40
7 0x56407865ac83
8 0x564078626533
9 0x5640786274de
10 0x5640788e1eea
11 0x5640788e68a4
12 0x5640788d0d22
13 0x5640788e72c0
14 0x5640788b70be
15 0x564078907848
16 0x564078907a3a
17 0x564078917849
18 0x7fe83f1ddac3

INFO: Cleaning up

So I guess this is a JavaScript error? Any idea how I can fix this, please?

EDIT: SOLVED! I simply commented out the part of the code where it tries to disable autoplay, and that was it.
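
The message "Cannot read properties of null (reading 'checked')" means the page lookup for the autoplay checkbox returned null before .checked was read. Instead of commenting the step out, the script could guard against the missing element. A minimal sketch, assuming the driver exposes Selenium's execute_script(script, *args); the selector and function name are hypothetical, since the real ones are not shown in this thread:

```python
# Hypothetical selector: the real autoplay checkbox is not shown in this thread.
AUTOPLAY_SELECTOR = "#autoplay-toggle"

# Guarded script: returns false instead of throwing when the element is missing.
DISABLE_AUTOPLAY_JS = """
const box = document.querySelector(arguments[0]);
if (box === null) { return false; }
if (box.checked) { box.click(); }
return true;
"""


def disable_autoplay(driver, selector=AUTOPLAY_SELECTOR):
    """Untick the autoplay checkbox if present; return whether it was found.

    Hypothetical helper for illustration only.
    """
    found = bool(driver.execute_script(DISABLE_AUTOPLAY_JS, selector))
    if not found:
        print("WARNING: autoplay toggle not found, skipping")
    return found
```

The download can then continue whether or not the lecture page has an autoplay toggle.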

mr-wh1tehat commented 8 months ago

FOR ANYONE WHO IS UNABLE TO USE THIS SOFTWARE:

  1. Don't use --man_login_url; use --login_url instead.
  2. If you're getting the DriverMethods.find_elements() got an unexpected keyword argument 'timeout' error, do what @iaamp said.
  3. If you're getting ERROR: Could not download course: [courselinkhere] cause: Message: javascript error: Cannot read properties of null (reading 'checked'), comment out the part of the main.py script where it tries to disable autoplay.

FallingLights commented 8 months ago

Should be fixed in the latest commit