biopragmatics / curies

🐸 Idiomatic conversion between URIs and compact URIs (CURIEs) in Python
https://curies.readthedocs.io
MIT License

`Converter.prefixmap` should be a bimap #96

Closed: matentzn closed this issue 11 months ago

matentzn commented 11 months ago

Right now, the prefix map in the converter object (`converter.prefix_map`) is a 1:n object, which means that any URI prefix can be linked to several prefixes, which makes it ambiguous (or rather, for those who like splitting hairs, order-dependent). In my opinion, the following test should pass (but it does not):

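import unittest

from curies import Converter

class TestBimap(unittest.TestCase):
    def test_bimap(self):
        # Extended prefix map (EPM) with one preferred prefix and one synonym
        epm = [
            {
                "prefix": "Orphanet",
                "prefix_synonyms": ["orphanet.ordo"],
                "uri_prefix": "http://www.orpha.net/ORDO/Orphanet_",
            }
        ]
        converter = Converter.from_extended_prefix_map(epm)
        self.assertIn("Orphanet", converter.prefix_map)
        # Desired behavior: the synonym should not appear (this currently fails)
        self.assertNotIn("orphanet.ordo", converter.prefix_map)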

This is important because otherwise I cannot control, as a user, which prefix (not URI prefix) is used in SSSOM. Right now, both are included in the exported CURIE map, e.g.:

#   Orphanet: http://www.orpha.net/ORDO/Orphanet_
#   orphanet.ordo: http://www.orpha.net/ORDO/Orphanet_

but only the second, the one I do not want, determines which prefix is used during compression. So there are two issues here:

  1. I want a prefix map that is a bimap to ship with my data asset (i.e. the SSSOM file).
  2. I want to be certain that the "prefix", not the "prefix_synonyms", dictates the prefix during compression.

Is this an implementation issue with the prefix map, or do we need a special extension, `converter.bimap`, to cover this?

See https://github.com/mapping-commons/sssom-py/issues/469

cthoyt commented 11 months ago

You're looking for https://curies.readthedocs.io/en/latest/api/curies.Converter.html#curies.Converter.bimap
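To make the distinction concrete, here is a minimal sketch in plain Python of the two views of an extended prefix map. It deliberately does not import `curies`; the helper names `to_prefix_map` and `to_bimap` are hypothetical and only illustrate the idea, not how the library is implemented.

```python
# Illustration only: hypothetical helpers, not the curies implementation.
epm = [
    {
        "prefix": "Orphanet",
        "prefix_synonyms": ["orphanet.ordo"],
        "uri_prefix": "http://www.orpha.net/ORDO/Orphanet_",
    }
]


def to_prefix_map(records):
    """Build the 1:n view: preferred prefixes and synonyms all map to the URI prefix."""
    rv = {}
    for record in records:
        rv[record["prefix"]] = record["uri_prefix"]
        for synonym in record.get("prefix_synonyms", []):
            rv[synonym] = record["uri_prefix"]
    return rv


def to_bimap(records):
    """Build the 1:1 view: only the preferred prefix maps to the URI prefix."""
    return {record["prefix"]: record["uri_prefix"] for record in records}


prefix_map = to_prefix_map(epm)  # contains both Orphanet and orphanet.ordo
bimap = to_bimap(epm)  # contains only Orphanet
```

In this sketch, the bimap is the view that can be shipped with a data asset, because each URI prefix appears exactly once.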

matentzn commented 11 months ago

Hmm, weird, my IDE didn't see it. Thanks! Anyhow, this only solves problem number 1. What about compress?

cthoyt commented 11 months ago

I am not sure I understand what you are asking. When compressing, it only ever goes to the preferred prefix, which is dictated by the structure of the EPM.
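As a rough illustration of that statement, the following plain-Python sketch compresses a URI against EPM-style records. The `compress` function here is a hypothetical stand-in written for this example, not the library's code; it assumes the simplest strategy of matching the longest URI prefix first. Because each record names exactly one preferred prefix, synonyms never appear in the output.

```python
# Illustration only: a hypothetical compress(), not the curies implementation.
epm = [
    {
        "prefix": "Orphanet",
        "prefix_synonyms": ["orphanet.ordo"],
        "uri_prefix": "http://www.orpha.net/ORDO/Orphanet_",
    }
]


def compress(uri, records):
    """Return a CURIE using the record's preferred prefix, or None if nothing matches."""
    # Try longer URI prefixes first so the most specific match wins
    ordered = sorted(records, key=lambda r: len(r["uri_prefix"]), reverse=True)
    for record in ordered:
        uri_prefix = record["uri_prefix"]
        if uri.startswith(uri_prefix):
            local_id = uri[len(uri_prefix):]
            # Only the preferred prefix is used; synonyms are ignored here
            return record["prefix"] + ":" + local_id
    return None


curie = compress("http://www.orpha.net/ORDO/Orphanet_1234", epm)
```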

If you use the prefix map data structure that has duplicates of the URI prefix, then there can't be any guarantee which one becomes the "primary" (I think the implementation picks the first).
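A small plain-Python sketch shows why a flat prefix map with a duplicated URI prefix is order-dependent. Both inversion helpers below are hypothetical and written only for this example; which strategy any given library version uses is not asserted here.

```python
# Illustration only: two hypothetical ways to invert a flat prefix map.
prefix_map = {
    "Orphanet": "http://www.orpha.net/ORDO/Orphanet_",
    "orphanet.ordo": "http://www.orpha.net/ORDO/Orphanet_",
}


def invert_first_wins(pm):
    """Keep the first prefix seen for each URI prefix."""
    rv = {}
    for prefix, uri_prefix in pm.items():
        rv.setdefault(uri_prefix, prefix)
    return rv


def invert_last_wins(pm):
    """Later entries overwrite earlier ones for the same URI prefix."""
    return {uri_prefix: prefix for prefix, uri_prefix in pm.items()}


first = invert_first_wins(prefix_map)
last = invert_last_wins(prefix_map)
```

With the entries in this order, `first` selects `Orphanet` and `last` selects `orphanet.ordo`, so the "primary" prefix depends entirely on the inversion strategy and the insertion order. An extended prefix map avoids the ambiguity by stating the preferred prefix explicitly.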

matentzn commented 11 months ago

Not relevant anymore