no host name is extracted in the following situations:
the URL contains four slashes after the protocol (https:////example.org/): java.net.URL extracts an empty hostname, while Nutch's OkHttp-based protocol plugin seems to fetch the resource as if there were only two slashes.
similarly, java.net.URL and OkHttp behave differently if there is an overlong (or even invalid?) userinfo before the hostname (scheme://userinfo@hostname/)
the extraction of registered domains (done by crawler-commons' EffectiveTldFinder) does not extract anything if the hostname is equal to a public suffix (for example, gov.uk or kharkov.ua)
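The four-slash case can be reproduced with java.net.URL alone. The following is a minimal sketch, not code taken from Nutch: the class name HostExtractionDemo and the helper hostOf are hypothetical, and only the java.net.URL side of the comparison is shown (the OkHttp behaviour is not exercised here).

```java
import java.net.MalformedURLException;
import java.net.URL;

public class HostExtractionDemo {

    /**
     * Hypothetical helper: returns the host as parsed by java.net.URL,
     * or null if the URL cannot be parsed at all.
     */
    static String hostOf(String url) {
        try {
            return new URL(url).getHost();
        } catch (MalformedURLException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] urls = {
            "https://example.org/",   // regular URL with two slashes
            "https:////example.org/"  // four slashes after the protocol
        };
        for (String u : urls) {
            // Quote the host so that an empty string is visible in the output.
            System.out.println(u + " -> host=\"" + hostOf(u) + "\"");
        }
    }
}
```

With four slashes, java.net.URL treats the text between the second and third slash as the authority, which is empty, so "example.org" ends up in the path instead of the host.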