C0D3D3V / Moodle-DL

Moodle-DL quickly downloads course content from Moodle (e.g., lecture PDFs)
GNU General Public License v3.0
417 stars 62 forks

[Feature Request] Insider content crawling #37

Closed: federicotorrielli closed this issue 4 years ago

federicotorrielli commented 4 years ago

Most of the time the content is not on the main page but on secondary pages: the only file saved is a ".desktop" shortcut, not the content inside that page. See the screenshots for details.

[image] (It gets the PDF but not the videos, because they aren't directly on the main page.)

[image] (Clicking on "video" redirects to the page I actually wanted to download from.)

C0D3D3V commented 4 years ago

Can you post the URL of the second image (you can remove the university's domain if you want)? I think this is a /mod/page page; I already have this module on my to-do list. I hope I can add it this week.

federicotorrielli commented 4 years ago

https://something.something.it/mod/url/view.php?id=131353

Thanks!

C0D3D3V commented 4 years ago

OK, this is very interesting: this is not /mod/page but /mod/url. For Moodle it is just a link, which is why the downloader created a link file. But it seems to be just an internal link. Where do the .desktop files lead? If you open them, does the video download start directly?

federicotorrielli commented 4 years ago

If I open one of these ".desktop" files with nano:

[image]

Apart from that, I have no program that can open the file. PS: The download doesn't start; it just shows another page containing a video item.

federicotorrielli commented 4 years ago

Instead, I think the downloader should crawl self.file.content_fileurl if it points to a picture or video, and only create a shortcut if it does not.

Ref: https://github.com/C0D3D3V/Moodle-Downloader-2/blob/59d03e83fd5522df50b058fcf4c4a1e857006866/download_service/url_target.py#L191
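A minimal sketch of that idea, assuming the decision can be made from the file extension in the URL alone. The function names `is_media_url` and `choose_action` are hypothetical and are not taken from Moodle-DL:

```python
import mimetypes
from urllib.parse import urlparse


def is_media_url(url: str) -> bool:
    """Guess from the extension in the URL path whether it is an image or video."""
    path = urlparse(url).path  # drop the query string, e.g. "?id=131353"
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        return False
    return mime_type.startswith(("image/", "video/"))


def choose_action(url: str) -> str:
    """Hypothetical dispatch: 'download' for media, 'shortcut' for anything else."""
    return "download" if is_media_url(url) else "shortcut"


if __name__ == "__main__":
    print(choose_action("https://example.com/files/lecture01.mp4"))
    print(choose_action("https://something.something.it/mod/url/view.php?id=131353"))
```

An extension check like this would not catch the /mod/url/view.php link from this issue, because that page only embeds or redirects to the video. Handling it would need the page to be fetched and inspected.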

C0D3D3V commented 4 years ago

Thank you very much.

Good idea. I will make the changes in a new branch (maybe tomorrow) and let you test whether it works for you.

PS: On Linux, .desktop files are like URL shortcuts on Windows (.URL files). Your file manager should open them (via xdg-open) with your preferred web browser.
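For illustration, here is a sketch of what such a link shortcut can look like, following the freedesktop Desktop Entry format (`Type=Link` with a `URL` key). The helper `build_desktop_link` is hypothetical, and the exact keys Moodle-DL writes may differ:

```python
def build_desktop_link(name: str, url: str) -> str:
    """Return the text of a .desktop file that points at a URL."""
    lines = [
        "[Desktop Entry]",
        "Type=Link",        # marks the entry as a URL shortcut, not an application
        f"Name={name}",
        f"URL={url}",
        "Icon=text-html",   # optional icon name; an assumption for this sketch
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    content = build_desktop_link(
        "video", "https://something.something.it/mod/url/view.php?id=131353"
    )
    with open("video.desktop", "w", encoding="utf-8") as handle:
        handle.write(content)
```

Running `xdg-open video.desktop` on a desktop system should then hand the URL to the default browser; on a headless server the URL can simply be read from the file.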

federicotorrielli commented 4 years ago

Thanks for the quick feedback! I'm working on an Ubuntu VPS that I set up just for this task, so opening the files there isn't a priority; I just fetch them via FTP or open the URL directly on my PC (which also runs Linux, so thanks for the tip!). If you like, I can fork your project and propose changes in the future, if that's not a problem. Thanks again.

C0D3D3V commented 4 years ago

I am ready to accept any pull request that is made. I am open to anyone who wants to contribute to the project.

But I can also implement this small feature for you :) It will be helpful for everyone :)

C0D3D3V commented 4 years ago

I have pushed some changes to a second branch (https://github.com/C0D3D3V/Moodle-Downloader-2/tree/download_linked_files); you are welcome to check it out. I will experiment a little and then make the whole thing optional, so that you can set in the configuration whether linked files should be downloaded.

The feature requires youtube_dl to be installed (pip install --user youtube_dl), plus ffmpeg if you want to download from sources like YouTube.

C0D3D3V commented 4 years ago

By the way, this dramatically increases the download volume when there are many videos on the Moodle site. Also, only links that have not been found before will be downloaded, so you'll need to copy your config into a new folder to test it on all videos.

federicotorrielli commented 4 years ago

[Screenshot 2020-06-02 at 17.05.30]

Now the problem is that youtube-dl tries to download every single file that has a URL on the Moodle site. But this is minor; I think you already thought about that, and the message "It follows an error from youtube-dl, Don't worry, this usually just means that no video was found on this website." is yours! Apart from this little issue, the videos are now downloading (from 500 MB to 9 GB, impressive!). When the download ends, I'm going to check whether the videos are corrupted or anything like that, and (if you want) check whether the code needs to be optimized for youtube-dl.

C0D3D3V commented 4 years ago

Thanks :) The error message displayed there is only for testing for now and will be removed when I merge into master. It is just a message thrown by youtube_dl. It wouldn't help much to ask youtube-dl first whether it finds a video; performance-wise, I think it's faster to simply try the download. The message will be suppressed later.
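The "just try it and ignore the failure" approach can be sketched roughly as follows. This is not the code from the branch: the helper names are hypothetical, and only the `YoutubeDL` class, its `download` method, the `outtmpl`, `quiet` and `no_warnings` options and `DownloadError` come from youtube_dl itself:

```python
import os


def build_ydl_options(dest_dir: str) -> dict:
    """Options for youtube_dl.YoutubeDL: output template plus reduced console output."""
    return {
        "outtmpl": os.path.join(dest_dir, "%(title)s.%(ext)s"),
        "quiet": True,
        "no_warnings": True,
    }


def try_download_video(url: str, dest_dir: str) -> bool:
    """Attempt the download directly; return False if youtube_dl reports an error."""
    import youtube_dl  # imported lazily so the module loads without the dependency

    try:
        with youtube_dl.YoutubeDL(build_ydl_options(dest_dir)) as ydl:
            ydl.download([url])
    except youtube_dl.utils.DownloadError:
        # Usually just means no video was found on the page; treat it as "nothing to do".
        return False
    return True
```

A caller could fall back to creating a shortcut whenever `try_download_video` returns False. Note that youtube_dl may still print the error text itself unless a custom logger is supplied.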

federicotorrielli commented 4 years ago

Could you add, in a later build, an option to exclude some of the content (such as external content)? I don't want to download YouTube videos, only videos hosted on my Moodle site.

But I also know that someone would like to download only YouTube videos, so...

C0D3D3V commented 4 years ago

Moodle has no real way to determine whether a link is external or internal. Although I store the `isexternal` attribute, it is of no practical use because it's nearly always false xD.

Only a blacklist or whitelist would help. I can implement both, but not today :D Maybe you want to try that, or just wait :)

C0D3D3V commented 4 years ago

You can now create a whitelist and a blacklist for external file links with download_domains_whitelist and download_domains_blacklist. I will add a wiki page for it and finish up the README. It will be in the next release soon.
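As a rough sketch of how such domain filtering could work: only the two option names come from this thread. The matching rules below are assumptions and may differ from the real implementation. They are: subdomains match their parent domain, the blacklist takes precedence, and an empty whitelist allows everything.

```python
from urllib.parse import urlparse


def domain_matches(host: str, domain: str) -> bool:
    """True if host equals domain or is a subdomain of it (case-insensitive)."""
    host = host.lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def is_download_allowed(url: str, whitelist: list, blacklist: list) -> bool:
    """Hypothetical filter: blacklist first, then whitelist if it is non-empty."""
    host = urlparse(url).hostname or ""
    if any(domain_matches(host, domain) for domain in blacklist):
        return False
    if whitelist:
        return any(domain_matches(host, domain) for domain in whitelist)
    return True


if __name__ == "__main__":
    download_domains_whitelist = []
    download_domains_blacklist = ["youtube.com"]
    for link in (
        "https://www.youtube.com/watch?v=example",
        "https://something.something.it/mod/url/view.php?id=131353",
    ):
        allowed = is_download_allowed(
            link, download_domains_whitelist, download_domains_blacklist
        )
        print(link, "->", "download" if allowed else "skip")
```

With these rules, blacklisting youtube.com covers the "only videos from my Moodle site" case, while whitelisting youtube.com alone covers the opposite case of downloading only YouTube videos.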

C0D3D3V commented 4 years ago

I have finished the implementation and tested it for quite a while. If something is still missing, please just open a new issue.