Puyodead1 / udemy-downloader

A Udemy downloader that can download courses, with DRM support.
MIT License

[BUG] it does not download the attached resources or subtitles. #75

Closed oijm17 closed 2 years ago

oijm17 commented 2 years ago

Description: If you have already downloaded only the video files for the classes and then run the script again, this time with the arguments to download all resources and all subtitles, the script does not download any of the attached resources and downloads subtitles for only a very small number of classes.

To Reproduce

  1. Run the script without downloading attachments or subtitles:

python main.py -c https://www.udemy.com/courses/myawesomecourse -b <Bearer Token>

  2. Wait for it to end.
  3. Run the script again, this time with the --download-assets --download-captions -l all arguments, in order to complete the previous download with the attachments and subtitles:

python main.py -c https://www.udemy.com/courses/myawesomecourse -b <Bearer Token> --download-assets --download-captions -l all

  4. You will find that upon completion, it downloaded the subtitles for only some classes, and none of the attachments.
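For anyone scripting the reproduction, the two runs can be sketched as argument lists. This is only an illustration: the helper name build_repro_commands is hypothetical, and the flags are copied from the steps above rather than checked against the tool's argument parser.

```python
import shlex
from typing import List


def build_repro_commands(course_url: str, bearer_token: str) -> List[List[str]]:
    """Return the argument lists for the two runs described in the report.

    Hypothetical helper; the flags are taken verbatim from the issue text.
    """
    # First run: lectures only, no assets or captions requested.
    base = ["python", "main.py", "-c", course_url, "-b", bearer_token]
    # Second run: same command plus the asset and caption flags.
    extras = ["--download-assets", "--download-captions", "-l", "all"]
    return [base, base + extras]


# Print both commands in a shell-safe form without executing them.
for cmd in build_repro_commands(
    "https://www.udemy.com/courses/myawesomecourse", "<Bearer Token>"
):
    print(" ".join(shlex.quote(part) for part in cmd))
```

Each list can be passed to subprocess.run as-is once a real token is substituted for the placeholder.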

Desktop: OS: [Windows 10, Linux Centos 8.5 Stream] Python: [v3.9.1, v3.6.8]

Puyodead1 commented 2 years ago

Are you using the --download-assets and --download-captions options?

oijm17 commented 2 years ago

Of course, yes. However, I just found that the problem occurs in a specific situation, so I have modified the description.

Puyodead1 commented 2 years ago

Would you mind posting some of the log?

Xen0byte commented 2 years ago

Hi, I am experiencing the same issue. For me, it's generating some zero-byte HTML files instead of downloading the lecture resources.

Puyodead1 commented 2 years ago

Is there anything in the console output like an error?

Xen0byte commented 2 years ago

Not as far as I could tell. I'll paste here a snippet from the log in a bit.

UPDATE 1: If I download with the --skip-lectures --download-assets flags, nothing is downloaded and there's nothing of particular interest in the console:

[Screenshot: 1_2021-12-16_13-46-32]

UPDATE 2: Using just the --download-assets flag downloads zero-byte HTML files instead of the actual assets. There's nothing of particular interest in the console in this case, either.

[Screenshot: 2_2021-12-16_14-10-04]

Xen0byte commented 2 years ago

@Puyodead1, please let me know what information I could extract that might be useful to you in debugging this issue.

Puyodead1 commented 2 years ago

Thank you, this is very helpful. Would you mind sending me your bearer token and the course URL so I can do some testing? I don't know any courses I have with HTML file resources which is why I never was able to actually test it. You can email me puyodead@protonmail.com or DM me on Discord Puyodead1#001

Xen0byte commented 2 years ago

Yeah, sure thing, but I think your Discord handle is missing a digit from the identifier. Just to note that the resource is PDF, not HTML, but for whatever reason it's not being picked up as such.

Puyodead1 commented 2 years ago

ah damn github tried formatting it as a number and removed a digit lmao Puyodead#0001

and so it's a PDF that is generating empty HTML files?

Xen0byte commented 2 years ago

Essentially, this is the page:

[Screenshots: 2021-12-17_00-26-44, 2021-12-17_00-30-57]

... but the PDF is not downloaded, and instead there's this empty HTML file.

Xen0byte commented 2 years ago

@Puyodead1 I've emailed you, in the meantime. 😊

Puyodead1 commented 2 years ago

I've identified the potential issue. Could you please try the latest commit (8756bfc2668f688cb438db1570b7be7e57ab8cf7)? Also, I noticed this specific course is rather large. If it makes testing easier, you can add --save-to-file on the first run and --load-from-file on any further runs to reduce the wait times from processing the course data.
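The save-then-load suggestion can be sketched as follows. This is a rough illustration only: cached_run_args is a hypothetical helper, and treating --save-to-file and --load-from-file as bare switches is an assumption based on how they are written in this comment.

```python
from typing import List


def cached_run_args(course_url: str, bearer_token: str, first_run: bool) -> List[str]:
    """Build the argument list for a test run that reuses processed course data.

    Hypothetical helper. Assumes --save-to-file and --load-from-file take no
    value, as they appear in the comment above.
    """
    args = [
        "python", "main.py",
        "-c", course_url,
        "-b", bearer_token,
        "--download-assets",
    ]
    # The first run stores the processed course data; later runs reload it.
    args.append("--save-to-file" if first_run else "--load-from-file")
    return args


print(cached_run_args("<Course URL>", "<Bearer Token>", first_run=True))
print(cached_run_args("<Course URL>", "<Bearer Token>", first_run=False))
```

Keeping the rest of the arguments identical between runs makes it easier to attribute any difference in behaviour to the code change being tested.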

Xen0byte commented 2 years ago

Nice! I will most likely have a look sometime tomorrow. Also, thanks for the pointer on saving to and loading from file.

Xen0byte commented 2 years ago

Hi @Puyodead1, the PDFs are downloading fine now, but 0-byte HTML files are still being generated or downloaded too. It's not a massive problem, since I can just get rid of them all at the end, but it's indicative of an issue that may have additional implications.

UPDATE: So far, it seems to be just that one course, so it's possible that it may be structured in a different way from most other ones.
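As a stopgap for the cleanup mentioned above, the stray files could be located with a short script. This is a minimal sketch, not part of the downloader: the function names are hypothetical, and it assumes the unwanted files are exactly the zero-byte files ending in .html under the download directory.

```python
from pathlib import Path
from typing import List


def find_empty_html(root: Path) -> List[Path]:
    """Return every zero-byte .html file below root, sorted by path."""
    return sorted(
        p for p in root.rglob("*.html")
        if p.is_file() and p.stat().st_size == 0
    )


def remove_empty_html(root: Path, dry_run: bool = True) -> List[Path]:
    """List the empty .html files below root, deleting them unless dry_run is set."""
    empty = find_empty_html(root)
    if not dry_run:
        for path in empty:
            path.unlink()
    return empty
```

Calling remove_empty_html(Path("<download directory>")) first, with the default dry_run=True, shows what would be deleted before anything is removed.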

Puyodead1 commented 2 years ago

That's odd, is it the same course you provided in your email? If so, could you tell me the exact command you're using (ofc censor any sensitive stuff). During my testing, I only downloaded assets and it didn't produce any html files

Xen0byte commented 2 years ago

Yes, it's that course. Apparently, if you only download assets then it doesn't reproduce, but if you use the full python main.py --course-url <Course URL> --download-assets command, then I believe you should see the issue.

Puyodead1 commented 2 years ago

Huh, okay. Could you send me a new bearer token for testing? If you prefer Discord, my correct tag is Puyodead1#0001

Xen0byte commented 2 years ago

@Puyodead1 I'm not sure if I'm doing something wrong, but the handle doesn't seem to work. I'll email you another token.

[Screenshot]

Puyodead1 commented 2 years ago

OH, haha. Puy❄dead1#0001 give that a try. 🤦🏻

Puyodead1 commented 2 years ago

@Xen0byte Resolved in bc9f6ecb1a40aa0aa5eaed66e9935a7d582f9a76.

@oijm17 If you continue to have this issue, please open a new issue.

Foxtrod89 commented 1 year ago

--download-assets ignores *.py assets, from this one

Puyodead1 commented 1 year ago

It doesn't ignore anything. If it's an attachment on a lecture, it will be downloaded. If you have errors, make a new issue.