Open Udinanon opened 1 year ago
The idea sounds fine to me, and it has been requested multiple times before. I think a lot of people will like it. And you are welcome to implement it :)
But please tell me more about what kind of content you want to filter and how it is listed in your Moodle. For example, what does the URL pattern you want to filter look like?
This filter is definitely good, but I'm not sure if it will do what you want :D for example: https://github.com/C0D3D3V/Moodle-DL/blob/0bfc7596c74aaf7923e82d7c9685c74a3aa81a5c/moodle_dl/moodle/result_builder.py#L87
There is no filter for LTI modules like kalvidres, helixmedia, opencast, etc. These will not be filtered by these patterns because they are "cookie mods" (maybe someday someone will implement them as normal mods) and are handled here: https://github.com/C0D3D3V/Moodle-DL/blob/0bfc7596c74aaf7923e82d7c9685c74a3aa81a5c/moodle_dl/downloader/task.py#L723
But anyway this regex filter is a good idea, and more filters for other parts of moodle-dl will come eventually.
Well, what I wanted to filter was exactly kalvidres content, so this is a bit of a bummer.
Perhaps we could also filter requests going inside external_download_url directly, applying the same blacklist/whitelist?
We could also add a regex filter to the end of filter_courses in moodle_service.py. I do not really like the idea, but we can do it. You can do that instead of adding it to the external filter only, so that all URLs will be filtered.
We will also someday add filters for modules, so that the kalvidres module can also be filtered.
If you add a filter to filter_courses in moodle_service.py, you can filter for r'https://your.moodle.com/mod/kalvidres/view.php\?.*' and that should do the trick. Adding a filter option only for the kalvidres module will take some more time, because I want to think about a good solution that we can apply to all mods.
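To make the suggested pattern concrete, here is a minimal sketch of how such a URL check could behave. The names BLACKLIST_PATTERNS and is_url_blacklisted are hypothetical and not part of moodle-dl, and your.moodle.com is the placeholder host from the comment above. The dots are escaped here, which is slightly stricter than the pattern as written.

```python
import re

# Hypothetical blacklist; the host is a placeholder, not a real Moodle instance.
BLACKLIST_PATTERNS = [
    re.compile(r'https://your\.moodle\.com/mod/kalvidres/view\.php\?.*'),
]


def is_url_blacklisted(url, patterns=BLACKLIST_PATTERNS):
    """Return True if the whole URL matches any blacklist pattern."""
    return any(pattern.fullmatch(url) for pattern in patterns)


kalvidres_url = 'https://your.moodle.com/mod/kalvidres/view.php?id=42'
resource_url = 'https://your.moodle.com/mod/resource/view.php?id=42'
print(is_url_blacklisted(kalvidres_url))
print(is_url_blacklisted(resource_url))
```

Using fullmatch means a pattern has to describe the whole URL; a search-based check would be more permissive and may be what users expect, so that choice is open.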
In principle related to #135
Description of the problem
I want to filter the files being downloaded based on their full URLs, to avoid downloading unwanted content such as videos. My university hosts a lot of content itself, so I can't just filter by domain; I need to filter on the URLs.
Solution
Regex would be a very flexible way to do this and would expand the capabilities of the tool while using a very common technology.
I want to add a download_domains_blacklist_regex to the options; I think that's all in config.py.
Then a small edit to the function is_filtered_external_domain in task.py, checking the full URLs against this regex list and then filtering by hostname using the old version of the blacklist. I might also have to add the option properties to the dataclass in types.py.
This should be all the steps needed to add the option correctly, although some of this is based on notes from a previous version of the library. I'll likely start working on a PR in the next few days, so please tell me if anything here is incorrect or inaccurate.
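A rough sketch of the two-stage check described above, written as a standalone function rather than the real method in task.py. The function name, its parameters, and the subdomain-matching rule for the old hostname blacklist are all assumptions made for illustration; the actual moodle-dl code may differ.

```python
import re
from urllib.parse import urlparse


def is_filtered_external_url(url, blacklist_regex, domains_blacklist):
    """Hypothetical stand-in for the proposed check.

    blacklist_regex: list of regex strings tested against the full URL.
    domains_blacklist: list of hostnames, as in the existing domain option.
    """
    # Stage 1: test the full URL against every regex in the new option.
    for pattern in blacklist_regex:
        if re.search(pattern, url):
            return True

    # Stage 2: fall back to the hostname blacklist.
    # Assumption: a blacklisted domain also covers its subdomains.
    hostname = urlparse(url).hostname or ''
    return any(
        hostname == domain or hostname.endswith('.' + domain)
        for domain in domains_blacklist
    )


print(is_filtered_external_url(
    'https://media.example.edu/videos/lecture1.mp4',
    [r'/videos/.*\.mp4$'],
    ['youtube.com'],
))
```

Running the regex stage first keeps the existing hostname option untouched, so configurations that only use the domain blacklist would behave as before.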
Alternatives
Perhaps the new filtering could cover the previous domain method, but that would break backwards compatibility unnecessarily. Other text-matching schemes are possible, but regex is extremely popular and well implemented in Python.