gbif / parsers

Various GBIF parsers for dates, countries, languages, taxon ranks, etc.
Apache License 2.0

Extend the UrlParser to accept more URLs #19

Open marcos-lg opened 4 years ago

marcos-lg commented 4 years ago

Currently, our UrlParser doesn't accept URLs like:

For the last two examples, it would be enough if we could convert them to their IDN (internationalized domain name) form.

As a result, we reject URLs that actually work when opened in a browser.
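The IDN conversion mentioned above could be sketched roughly as follows. This is only an illustration using the JDK's `java.net.IDN` class, not the actual UrlParser code; the class name `IdnHostSketch`, the helper `toAsciiHost`, and the sample host `bücher.example` are assumptions made for the example.

```java
import java.net.IDN;

// Hypothetical helper, not part of gbif/parsers: converts a host name that
// contains non-ASCII characters to its ASCII-compatible (Punycode) form.
class IdnHostSketch {

    // IDN.toASCII leaves plain ASCII hosts unchanged and throws
    // IllegalArgumentException if the input cannot be converted.
    static String toAsciiHost(String host) {
        return IDN.toASCII(host);
    }

    public static void main(String[] args) {
        // "b\u00fccher" is "bücher"; the escape avoids source-encoding issues.
        System.out.println(toAsciiHost("b\u00fccher.example"));
        System.out.println(toAsciiHost("example.org"));
    }
}
```

Only the host part of a URL should be passed to such a helper; the path and query would need percent-encoding instead, which this sketch does not cover.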

MattBlissett commented 4 years ago

The hostname in http://dep_bio.pnzgu.ru/Gerbariy_im_I_I_Sprygina isn't valid: underscores are not allowed in hostnames.

However, it mostly works with web browsers, search engines, etc., although it's not possible to register a TLS certificate for dep_bio.pnzgu.ru (only for *.pnzgu.ru).
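To illustrate how a strict parser treats such a hostname, here is a small sketch using the JDK's `java.net.URI`. Whether UrlParser relies on `java.net.URI` is an assumption, since the thread doesn't say; the class `UnderscoreHostSketch` and its helper names are made up for the example.

```java
import java.net.URI;

// Hypothetical demo, not part of gbif/parsers: shows how java.net.URI
// handles an authority that contains an underscore.
class UnderscoreHostSketch {

    // Returns the parsed host, or null when the authority is not a valid
    // server-based host name (as is the case with an underscore).
    static String hostOrNull(String url) {
        return URI.create(url).getHost();
    }

    // The raw authority is still available even when getHost() is null.
    static String authority(String url) {
        return URI.create(url).getRawAuthority();
    }

    public static void main(String[] args) {
        String url = "http://dep_bio.pnzgu.ru/Gerbariy_im_I_I_Sprygina";
        System.out.println("host      = " + hostOrNull(url));
        System.out.println("authority = " + authority(url));
    }
}
```

A more lenient parser could fall back to the raw authority when the host is null, at the cost of accepting names that are not valid hostnames.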

mdoering commented 4 years ago

Maybe this can help: https://github.com/smola/galimatias