biopragmatics / bioregistry

📮 An integrative registry of biological databases, ontologies, and nomenclatures.
https://bioregistry.io
MIT License

https/http variants? example braininfo #148

Closed matentzn closed 3 years ago

matentzn commented 3 years ago

In PATO, for example, we can find:

http://braininfo.rprc.washington.edu/centraldirectory.aspx?ID=

bioregistry only has:

https://braininfo.rprc.washington.edu/centraldirectory.aspx?ID=

(https instead of http). What's your take on this, @cthoyt? We can easily fix PATO, but do we have a general strategy?

cthoyt commented 3 years ago

I'm a bit confused about what you mean by "the bioregistry only has https://braininfo.rprc.washington.edu/centraldirectory.aspx?ID=". I did a string search and found that this is the URL format string for Neuronames (https://bioregistry.io/registry/neuronames), not PATO.

Do you mean that within the content of PATO, it uses a URL prefix of http://braininfo.rprc.washington.edu/centraldirectory.aspx?ID= for some resource with http instead of https? I've never noticed stuff like this because I have very low confidence that the URL prefix strings mean anything. Again, I could revisit my disdain for IRIs 😛 and make the case that the prefixes are the meaningful bit and not the URL prefixes.

What's your use case? You have IRIs with this prefix and want to parse them, but http/https is an issue because it's doing exact string matching?

matentzn commented 3 years ago

I am basically trying to assign a CURIE to http://braininfo.rprc.washington.edu/centraldirectory.aspx?ID=123 when reading a Turtle file. For example:

PATO:123 oboInOwl:hasDbXref <http://braininfo.rprc.washington.edu/centraldirectory.aspx?ID=123>

Should end up in SSSOM as:

subject_id predicate_id object_id
PATO:123 oboInOwl:hasDbXref neuronames:123
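
To make the failure concrete, here is a minimal sketch of exact prefix matching in plain Python. The `PREFIX_MAP` dict and the `compress_exact` helper are hypothetical stand-ins for illustration, not the bioregistry API; the map holds only the https form, as described above.

```python
from typing import Optional

# Hypothetical stand-in for the registry: CURIE prefix -> URI prefix.
# Only the https form is registered, as described in this issue.
PREFIX_MAP = {
    "neuronames": "https://braininfo.rprc.washington.edu/centraldirectory.aspx?ID=",
}


def compress_exact(iri: str, prefix_map: dict) -> Optional[str]:
    """Return a CURIE if the IRI starts with a registered URI prefix, else None."""
    for prefix, uri_prefix in prefix_map.items():
        if iri.startswith(uri_prefix):
            return f"{prefix}:{iri[len(uri_prefix):]}"
    return None


https_iri = "https://braininfo.rprc.washington.edu/centraldirectory.aspx?ID=123"
http_iri = "http://braininfo.rprc.washington.edu/centraldirectory.aspx?ID=123"

print(compress_exact(https_iri, PREFIX_MAP))  # neuronames:123
print(compress_exact(http_iri, PREFIX_MAP))   # None
```

The http form from PATO falls through because the comparison is a literal string match on the URI prefix.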

Is this what you were asking?

cthoyt commented 3 years ago

Yes, this is what I guessed you might be trying to do. I guess we could just add an extra URL just for neuronames, but I need to spend some more time improving the data model and infrastructure for adding extra arbitrary providers.

Alternative: generally double up all possible http/https combinations (since you can't trust that anyone is consistent with those anyway).
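
A rough sketch of what doubling up could look like, assuming the registry is exported as a plain prefix-to-URI-prefix dict. The names `scheme_variants`, `build_reverse_map`, and `compress` are made up for illustration and are not part of the bioregistry package.

```python
from typing import Optional


def scheme_variants(uri_prefix: str) -> list:
    """Return the https and http forms of a URI prefix (unchanged if it has neither scheme)."""
    for scheme in ("https://", "http://"):
        if uri_prefix.startswith(scheme):
            rest = uri_prefix[len(scheme):]
            return ["https://" + rest, "http://" + rest]
    return [uri_prefix]


def build_reverse_map(prefix_map: dict) -> dict:
    """Map every scheme variant of each URI prefix back to its CURIE prefix."""
    reverse = {}
    for prefix, uri_prefix in prefix_map.items():
        for variant in scheme_variants(uri_prefix):
            reverse[variant] = prefix
    return reverse


def compress(iri: str, reverse_map: dict) -> Optional[str]:
    """Return a CURIE for the IRI, or None if no URI prefix matches."""
    # Try longer URI prefixes first so the most specific one wins.
    for uri_prefix in sorted(reverse_map, key=len, reverse=True):
        if iri.startswith(uri_prefix):
            return f"{reverse_map[uri_prefix]}:{iri[len(uri_prefix):]}"
    return None


# Hypothetical one-entry registry export, holding only the https form.
REVERSE = build_reverse_map({
    "neuronames": "https://braininfo.rprc.washington.edu/centraldirectory.aspx?ID=",
})

print(compress("http://braininfo.rprc.washington.edu/centraldirectory.aspx?ID=123", REVERSE))
# neuronames:123
```

The variants are generated when the lookup table is built rather than stored in the registry itself, so each record keeps a single canonical URI prefix.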

matentzn commented 3 years ago

I guess that is also an option! OK! Let's do that for now, but let's keep in mind that we need this for the CURIE/prefix toolkit.