commoncrawl / ia-web-commons

Web archiving utility library
Apache License 2.0

URLParser fails if URL contains empty port #5

Closed: sebastian-nagel closed this issue 7 years ago

sebastian-nagel commented 7 years ago

If the port in a URL is empty (e.g., http://example.com:/), URLParser throws an exception:

java.net.URISyntaxException: bad port : http://fc-zenit.ru:/news/turnir-pamyati-sadyrina/2010/12/17/odin-iz-chetyrnadcati/
     at org.archive.url.URLParser.parse(URLParser.java:253)
     at org.archive.url.WaybackURLKeyMaker.makeKey(WaybackURLKeyMaker.java:60)

Cf. NUTCH-2337
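A URL like this is legal. RFC 3986 defines the port as `*DIGIT`, so it may be empty, and says producers and normalizers should omit an empty port along with its ":" delimiter. Until the parser accepts it, a caller could normalize the URL before passing it in. The Java sketch below shows one way to do that. `EmptyPortFix` and `stripEmptyPort` are hypothetical names. They are not part of ia-web-commons and do not represent the actual fix. The sketch assumes an absolute URL of the form `scheme://authority...`.

```java
// Hypothetical caller-side workaround, not part of ia-web-commons.
// Removes an empty port ("host:" followed by no digits) from an absolute URL,
// following RFC 3986, which treats an empty port as if no port were given.
public final class EmptyPortFix {

    private EmptyPortFix() {}

    public static String stripEmptyPort(String url) {
        int schemeSep = url.indexOf("://");
        if (schemeSep < 0) {
            return url; // not of the form scheme://authority..., leave unchanged
        }
        int authStart = schemeSep + 3;
        // The authority ends at the first '/', '?' or '#', or at the end of the string.
        int authEnd = url.length();
        for (int i = authStart; i < url.length(); i++) {
            char c = url.charAt(i);
            if (c == '/' || c == '?' || c == '#') {
                authEnd = i;
                break;
            }
        }
        // An authority ending in ':' has an empty port; drop only that colon.
        if (authEnd > authStart && url.charAt(authEnd - 1) == ':') {
            return url.substring(0, authEnd - 1) + url.substring(authEnd);
        }
        return url;
    }

    public static void main(String[] args) {
        System.out.println(stripEmptyPort("http://example.com:/"));     // http://example.com/
        System.out.println(stripEmptyPort("http://example.com:8080/")); // unchanged
    }
}
```

Only a colon at the very end of the authority is removed. A non-empty port, userinfo such as `user:@host`, and colons in the path or query are left untouched.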