kkrugler opened 7 years ago
We could also use this to improve handling of shortened URLs: flag the URL as shortened, and in the FetchFunction use a different fetcher (one that doesn't follow redirects) to resolve it. This would solve the issue of us currently (re)fetching the same shortened URL multiple times. So basically we'd treat it as a special case of redirection, where we're anticipating the redirect and optimizing for it.
Currently we have different fetchers and (effectively) different parsers for robots.txt, sitemap, and regular URLs. This isn't very clean, and duplicates code. So an alternative approach is to have a single FetchFunction and a single ParseFunction that knows how to handle the different types of URLs.
So we'd record the URL type on the ValidUrl class - e.g. regular, robots, sitemap.
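A rough sketch of what that dispatch might look like. This is illustrative only, not flink-crawler's actual API: the UrlType enum, the type field on ValidUrl, and the handlerFor() helper are all hypothetical names, and the real FetchFunction/ParseFunction would operate inside Flink operators rather than a plain static method.

```java
// Hypothetical sketch: a URL type carried on ValidUrl so that a single
// FetchFunction/ParseFunction can branch, instead of duplicating code
// across separate fetchers/parsers for robots.txt, sitemaps, etc.
enum UrlType { REGULAR, ROBOTS, SITEMAP, SHORTENED }

class ValidUrl {
    final String url;
    final UrlType type;  // assumed new field, not in the current class

    ValidUrl(String url, UrlType type) {
        this.url = url;
        this.type = type;
    }
}

class FetchDispatch {
    // One fetch/parse path that picks its behavior from the URL's type.
    static String handlerFor(ValidUrl u) {
        switch (u.type) {
            case ROBOTS:
                return "robots-parser";
            case SITEMAP:
                return "sitemap-parser";
            case SHORTENED:
                // Use a no-redirect fetcher: we only want the Location
                // header, so we resolve without refetching repeatedly.
                return "redirect-resolver";
            default:
                return "html-parser";
        }
    }
}
```

The point is that type-specific logic collapses into one switch, so adding a new special case (like shortened URLs) means adding an enum value and a branch rather than another fetcher/parser pair.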