Closed davidfstr closed 9 months ago
Another example: the following domains seem to always respond with HTTP 403 Forbidden. Here's some shell code to ignore them manually:
```python
# Ignore links from a particular domain
if True:
    import crystal.model
    import crystal.util.urls

    is_unrewritable_url1 = crystal.util.urls.is_unrewritable_url
    def is_unrewritable_url2(url: str) -> bool:
        return (
            is_unrewritable_url1(url) or
            '.pximg.net/' in url or
            '.pixiv.net/' in url
        )

    crystal.util.urls.is_unrewritable_url1 = is_unrewritable_url1  # save old
    crystal.util.urls.is_unrewritable_url2 = is_unrewritable_url2  # save new
    crystal.util.urls.is_unrewritable_url = crystal.util.urls.is_unrewritable_url2  # set new
    #crystal.util.urls.is_unrewritable_url = crystal.util.urls.is_unrewritable_url1  # restore old

    # Update imported versions of this function (IMPORTANT!)
    crystal.model.is_unrewritable_url = crystal.util.urls.is_unrewritable_url
```
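The same save-old/wrap/set-new pattern can be tried outside Crystal. Here is a standalone sketch that uses a stub in place of Crystal's real `is_unrewritable_url` (the stub and its `data:` check are illustrative assumptions, not Crystal's actual behavior):

```python
# Stub standing in for crystal.util.urls.is_unrewritable_url
# (illustrative only; not Crystal's real implementation).
def is_unrewritable_url(url: str) -> bool:
    return url.startswith('data:')

_original = is_unrewritable_url  # save old

def _patched(url: str) -> bool:
    # New predicate: keep the old behavior, plus ignore two domains
    return (
        _original(url) or
        '.pximg.net/' in url or
        '.pixiv.net/' in url
    )

is_unrewritable_url = _patched  # set new

print(is_unrewritable_url('https://i.pximg.net/img/1.png'))  # True
print(is_unrewritable_url('https://example.com/page'))       # False
```

Saving the original function before patching (as the Crystal snippet does with `is_unrewritable_url1`) is what makes the commented-out "restore old" line possible.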
Other situations that make this feature useful:
It has been observed while downloading http://animeworld.com/ that sometimes there are patterns of linked URLs that accept a connection yet never respond. For example:
http://www.assoc-amazon.com/**
http://counter.dreamhost.com/cgi-bin/Count.cgi?**
Attempts to download these URLs hang until they time out (after 10 seconds), slowing the download of the entire project.
It would be nice if there were a way to mark certain URL patterns (i.e. ResourceGroups) in a project as "do not download" so that no download is even attempted.
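A minimal sketch of what such a "do not download" check might look like, matching URLs against glob-style patterns like those above with the standard-library `fnmatch` module (the pattern list, function name, and any ResourceGroup integration are assumptions, not Crystal's actual API):

```python
from fnmatch import fnmatch

# Hypothetical skip list, using glob-style patterns similar to the
# URL patterns above ('*' matches any run of characters; note that
# fnmatch also treats a literal '?' as a single-character wildcard).
DO_NOT_DOWNLOAD_PATTERNS = [
    'http://www.assoc-amazon.com/*',
    'http://counter.dreamhost.com/cgi-bin/Count.cgi?*',
]

def should_skip_download(url: str) -> bool:
    # True if the URL matches any "do not download" pattern
    return any(fnmatch(url, pattern) for pattern in DO_NOT_DOWNLOAD_PATTERNS)

print(should_skip_download('http://www.assoc-amazon.com/e/ir?t=x'))  # True
print(should_skip_download('http://animeworld.com/index.html'))      # False
```

A downloader could consult such a check before opening a connection at all, avoiding the 10-second timeout entirely for matching URLs.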
Related screenshots: