Can mark resource group as "do not download"

davidfstr commented 2 years ago

While downloading http://animeworld.com/ I observed that some patterns of linked URLs accept a connection yet never respond. For example:

http://www.assoc-amazon.com/**
http://counter.dreamhost.com/cgi-bin/Count.cgi?**

Each attempt to download one of these URLs hangs until it times out (after 10 seconds), slowing the download of the entire project.

It would be nice if there were a way to mark certain URL patterns (i.e. ResourceGroups) in a project as "do not download" so that no download is even attempted.
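
As a rough illustration of the idea, here is a minimal sketch of a "do not download" check in Python. It is not Crystal's implementation: the names `compile_url_pattern`, `DoNotDownloadList`, and `should_skip` are hypothetical, and the sketch assumes that `**` in a pattern matches any run of characters, as in the two example patterns above.

```python
import re
from typing import Iterable, List


def compile_url_pattern(pattern: str) -> 're.Pattern[str]':
    """
    Compiles a URL pattern to a regex.

    Assumption: '**' is the only wildcard and matches any run of characters.
    Everything else in the pattern is matched literally.
    """
    literal_parts = pattern.split('**')
    return re.compile('.*'.join(re.escape(part) for part in literal_parts))


class DoNotDownloadList:
    """Hypothetical container for URL patterns that should never be fetched."""

    def __init__(self, patterns: Iterable[str]) -> None:
        self._regexes: List['re.Pattern[str]'] = [
            compile_url_pattern(p) for p in patterns
        ]

    def should_skip(self, url: str) -> bool:
        # A URL is skipped if it fully matches any registered pattern
        return any(r.fullmatch(url) is not None for r in self._regexes)


skip_list = DoNotDownloadList([
    'http://www.assoc-amazon.com/**',
    'http://counter.dreamhost.com/cgi-bin/Count.cgi?**',
])
```

A downloader could consult `skip_list.should_skip(url)` before opening a connection, so URLs in a marked group would never wait out the 10-second timeout.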

davidfstr commented 1 year ago

Another example: The following domain seems to always respond with HTTP 403 Forbidden:

https://i.pximg.net/

Here's some shell code to ignore this domain (along with pixiv.net) manually:

# Ignore links from a particular domain
if True:
    import crystal.model
    import crystal.util.urls

    is_unrewritable_url1 = crystal.util.urls.is_unrewritable_url

    def is_unrewritable_url2(url: str) -> bool:
        return (
            is_unrewritable_url1(url) or
            '.pximg.net/' in url or
            '.pixiv.net/' in url
        )

    crystal.util.urls.is_unrewritable_url1 = is_unrewritable_url1  # save old
    crystal.util.urls.is_unrewritable_url2 = is_unrewritable_url2  # save new
    crystal.util.urls.is_unrewritable_url = crystal.util.urls.is_unrewritable_url2  # set new
    #crystal.util.urls.is_unrewritable_url = crystal.util.urls.is_unrewritable_url1  # restore old

    # Update imported versions of this function (IMPORTANT!)
    crystal.model.is_unrewritable_url = crystal.util.urls.is_unrewritable_url
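
The same workaround can be generalized so that the list of ignored URL fragments is passed in rather than hard-coded. The sketch below only builds the wrapped predicate; `make_ignoring_predicate` is a hypothetical helper name, and installing the result still relies on the same monkeypatching of `crystal.util.urls.is_unrewritable_url` and `crystal.model.is_unrewritable_url` shown above.

```python
from typing import Callable, Iterable


def make_ignoring_predicate(
        original: Callable[[str], bool],
        ignored_substrings: Iterable[str],
        ) -> Callable[[str], bool]:
    """
    Returns a predicate that is true whenever `original` is true
    or the URL contains any of the given substrings.
    """
    ignored = tuple(ignored_substrings)  # freeze, so later mutation has no effect

    def predicate(url: str) -> bool:
        return original(url) or any(s in url for s in ignored)

    return predicate


# Usage inside the Crystal shell (assumes the patching approach shown above):
#     old = crystal.util.urls.is_unrewritable_url
#     new = make_ignoring_predicate(old, ['.pximg.net/', '.pixiv.net/'])
#     crystal.util.urls.is_unrewritable_url = new
#     crystal.model.is_unrewritable_url = new
```

Keeping a reference to the old function, as in the usage comment, makes it possible to restore the original behavior later.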

davidfstr commented 9 months ago

Other situations that make this feature useful:

KC: Certain embedded resources are known to be ads. I do not want them archived and displayed.
YRE, KC: Certain resources are incorrectly identified as embedded Script References and downloaded alongside a group I'd prefer to download by itself. Same as issue #107.

davidfstr / Crystal-Web-Archiver

Can mark resource group as "do not download" #72