Open jggatter opened 5 months ago
It is a good guideline that valid resource URLs contain only ASCII characters, even if web browsers can handle UTF-8 characters like the example above. I could see other ontologies not ensuring this though.
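As a rough illustration of that guideline, here is a minimal sketch that lists the non-ASCII characters in a resource URL. The helper name `non_ascii_chars` is hypothetical and is not part of pronto or fastobo; the URL is the one from the report.

```python
def non_ascii_chars(url: str) -> list[str]:
    """Return the distinct non-ASCII characters in `url`, in order of first appearance."""
    seen: list[str] = []
    for ch in url:
        # Code points above 127 fall outside the ASCII range.
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    return seen


# "\u00e9" is the precomposed character "é" from the reported Wikipedia URL.
reported_url = "https://en.wikipedia.org/wiki/Ef\u00e9_people"
offending = non_ascii_chars(reported_url)
print(offending)
```

An empty list means the URL already satisfies the ASCII-only guideline.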
This issue is actually two issues I believe:
I am a bit torn. We usually recommend encoding non-ASCII characters to ensure interoperability across systems, but technically, they should be allowed. I would probably recommend to:
Thanks for the quick reply @matentzn! Sorry I am slow to respond. In the HANCESTRO issue I opened, https://github.com/EBISPOT/hancestro/issues/58, I informed them of your response.
Just curious, when could UTF-8 support be expected in pronto/fastobo? I'm not blocked by this issue, so it's no longer urgent to me. I'll continue using an older version of the HANCESTRO ontology.
> when could UTF-8 support be expected in pronto/fastobo? I'm not blocked by this issue, so it's no longer urgent to me
This is an @althonos question!
I transferred this issue to the fastobo repo, since this is a syntax issue. Either I fucked up the RFC 3987 syntax implementation for IRIs, or there is a bug that causes the URL to be parsed as a prefixed identifier instead of an IRI ...
Hello,
The newest release of hancestro.owl adds an entry with resources that are problematic for parsing:
Pronto is unable to parse the above resource, `<obo:AfPO_0000235 rdf:resource="https://en.wikipedia.org/wiki/Efé_people"/>`. See https://github.com/althonos/pronto/blob/master/pronto/pv.py#L104
The testing below suggests that the `é` character is to blame:

I don't really know much about OBO standards, so perhaps this is the intended behavior in fastobo. In any case, I felt it was worth asking about here! I'll report this to HANCESTRO as well to see if they can use the URL-safe version I show in the example above.
Thanks, James
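For reference, the URL-safe form mentioned in the report can be produced by percent-encoding the UTF-8 bytes of the non-ASCII characters. The snippet below is a minimal sketch using only the Python standard library. The choice of `safe` characters is an assumption made for this example, and it does not reflect how pronto or fastobo handle IRIs internally.

```python
from urllib.parse import quote, unquote

# URL from the report; "\u00e9" is the precomposed character "é".
iri = "https://en.wikipedia.org/wiki/Ef\u00e9_people"

# Percent-encode everything outside the safe set as UTF-8 bytes.
# Assumption for this sketch: URL delimiters and "%" are kept as-is so that
# the scheme, path separators, and any existing escapes are left untouched.
ascii_url = quote(iri, safe=":/?#[]@!$&'()*+,;=%")

print(ascii_url)  # https://en.wikipedia.org/wiki/Ef%C3%A9_people
```

Decoding with `unquote(ascii_url)` gives back the original IRI, so the two forms identify the same page.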