gbif / parsers

Various GBIF parsers for dates, countries, languages, taxon ranks, etc.
Apache License 2.0

Parse IDNs. #20

Open MattBlissett opened 4 years ago

MattBlissett commented 4 years ago

Using the library suggested by @mdoering, this change parses IDNs (internationalized domain names), at the cost of pulling in the 10 MB ICU4J library. There seems to be no alternative to that dependency; see https://github.com/smola/galimatias/issues/57.

The library claims (https://github.com/smola/galimatias/issues/23) to parse URLs like http://dep_bio.pnzgu.ru/Gerbariy_im_I_I_Sprygina, which conforms to the WHATWG URL spec but, because of the underscore in the hostname, not to the RFCs that Java follows. In practice it does not parse them. A pull request to the library is probably the best way to fix this.
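
For context, here is a minimal sketch of how the JDK's own classes treat these two cases. It uses only `java.net.URI` and `java.net.IDN`, not the library discussed above. The class name `UnderscoreHostDemo`, its helper methods, and the `example.org` and `bücher.example` inputs are hypothetical and chosen purely for illustration. Note that `java.net.IDN` implements IDNA 2003, so it is not presented here as a replacement for the ICU4J-based parsing.

```java
import java.net.IDN;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class, not part of gbif/parsers.
class UnderscoreHostDemo {

  /** Returns the host java.net.URI extracts, or null if no server-based host could be parsed. */
  static String hostOf(String url) {
    try {
      return new URI(url).getHost();
    } catch (URISyntaxException e) {
      return null;
    }
  }

  /** Converts a Unicode hostname to its ASCII (Punycode) form with the JDK's IDNA 2003 support. */
  static String toAsciiHost(String host) {
    return IDN.toASCII(host);
  }

  public static void main(String[] args) {
    // The underscore in the hostname is rejected by URI's server-based authority parsing,
    // so no exception is thrown but getHost() yields null.
    System.out.println(hostOf("http://dep_bio.pnzgu.ru/Gerbariy_im_I_I_Sprygina")); // prints "null"

    // Underscores in the path are fine; only the hostname matters here.
    System.out.println(hostOf("http://example.org/Gerbariy_im_I_I_Sprygina")); // prints "example.org"

    // An internationalized hostname ("bücher.example", written with a Unicode escape).
    System.out.println(toAsciiHost("b\u00FCcher.example")); // prints "xn--bcher-kva.example"
  }
}
```

The first call shows why such URLs are awkward for callers relying on `java.net.URI`: parsing succeeds silently, but the host comes back as `null`.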

For issue #19.