Use one FetchFunction and one ParseFunction for all types of URLs

Currently we have separate fetchers and (effectively) separate parsers for robots.txt, sitemap, and regular URLs. This duplicates code and isn't very clean. An alternative approach is to have a single FetchFunction and a single ParseFunction that know how to handle the different types of URLs.

Add a URL type to the ValidUrl class, e.g. regular, robots, or sitemap.
In the robots function, if there's no robots result yet for a domain, reject the URL with a new "no robots" result, inject a domain/robots.txt URL into the stream, and create a "pending robots" result for that domain.
In the robots function, if there's already a pending robots result for a domain, reject the URL with the same "no robots" result, but don't inject the robots.txt URL into the stream again.
In the fetch function, automatically adjust settings (e.g. max size) for robots or sitemap URLs.
In the parse function, process robots fetch results with the crawler-commons parser, and send the results to the robots function via an additional stream.
In the parse function, process sitemap fetch results with the crawler-commons parser.
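
The robots-function steps above could be sketched roughly as follows. This is only an illustration in plain Java, not flink-crawler code: `UrlType`, `RobotsState`, `Decision`, `RobotsGate`, and `maxContentSize` are hypothetical names, a list stands in for the stream that robots.txt URLs get injected into, the size limits are made-up values, and letting robots.txt URLs bypass the gate is an assumption the issue doesn't spell out.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical URL type, as proposed for the ValidUrl class.
enum UrlType { REGULAR, ROBOTS, SITEMAP }

// Hypothetical per-domain robots state kept by the robots function.
enum RobotsState { PENDING, RESOLVED }

// Hypothetical outcome of the robots check for one URL.
enum Decision { PASS, REJECTED_NO_ROBOTS }

final class RobotsGate {
    private final Map<String, RobotsState> stateByDomain = new HashMap<>();

    // Stand-in for the stream that robots.txt URLs are injected into.
    final List<String> injected = new ArrayList<>();

    Decision check(String url, UrlType type) {
        // Assumption: robots.txt URLs bypass the gate, otherwise they
        // could never be fetched to resolve the pending state.
        if (type == UrlType.ROBOTS) {
            return Decision.PASS;
        }

        URI uri = URI.create(url);
        String domain = uri.getHost();
        RobotsState state = stateByDomain.get(domain);

        if (state == null) {
            // No robots result yet: mark as pending, inject the robots.txt
            // URL exactly once, and reject the current URL.
            stateByDomain.put(domain, RobotsState.PENDING);
            injected.add(uri.getScheme() + "://" + domain + "/robots.txt");
            return Decision.REJECTED_NO_ROBOTS;
        }
        if (state == RobotsState.PENDING) {
            // Fetch already requested: reject, but don't inject again.
            return Decision.REJECTED_NO_ROBOTS;
        }
        // Resolved: a real implementation would apply the parsed rules here.
        return Decision.PASS;
    }

    // Called when the parser sends back a robots result for a domain.
    void robotsResolved(String domain) {
        stateByDomain.put(domain, RobotsState.RESOLVED);
    }
}

final class FetchSettings {
    // Hypothetical size limits, only to show settings varying by URL type.
    static int maxContentSize(UrlType type) {
        switch (type) {
            case ROBOTS:  return 500 * 1024;
            case SITEMAP: return 50 * 1024 * 1024;
            default:      return 1024 * 1024;
        }
    }
}
```

In this sketch the first URL seen for a domain triggers a single robots.txt injection, later URLs for that domain are rejected without another injection, and URLs pass once `robotsResolved` has been called. Keying the state on the host name alone ignores ports and subdomain policy, which a real implementation would need to decide on.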

ScaleUnlimited / flink-crawler, issue #34