SED-ML / sed-ml

Simulation Experiment Description Markup Language (SED-ML)
http://sed-ml.org

In general, how can MIRIAM URNs be resolved to specific model files? #79

Closed jonrkarr closed 3 years ago

jonrkarr commented 3 years ago

Identifiers.org (MIRIAM) can resolve ids. But, it typically resolves ids to web pages that describe entities. I'm not aware that Identifiers.org can directly resolve data for entities (e.g., XML files).

tellurium, our tools, and maybe others know how to resolve BioModels ids to XML files. But they don't know how to resolve ids in other namespaces.

Is there a mechanism to resolve data for Identifiers.org that I'm not aware of? If not, I think this concept needs to be clarified in the specifications because the specifications misleadingly suggest any MIRIAM id can be resolved.

luciansmith commented 3 years ago

I might be wrong about this, but I thought I remembered that identifiers.org will point you at an actual file if you append a file extension to the URN. I.e. instead of asking for "BIOMD000000245" you ask for "BIOMD000000245.xml", or instead of asking for 'sbml-level-3', you ask for 'sbml-level-3.pdf'.

jonrkarr commented 3 years ago

This requires each namespace to support the protocol that you outlined. In general, identifiers.org namespaces are not required to do this. It's not part of submission to identifiers.org. Does identifiers.org have documentation that encourages maintainers of namespaces to do this?

luciansmith commented 3 years ago

Those are excellent questions, and I don't have any idea at all what the answers may be, or even if my vague recollection is correct in the first place.

The only thing I do know is that biomodels used to have the ability to resolve a miriam URN, but dropped it when they changed backends. What tellurium is supposed to do now is query the biomodels website about a model, get back a list of files, then look through that list for anything marked 'main SBML model' and then ask again for that model.

What I am actually doing is mangling the URN to a biomodels URL that ends with "BIOMDxxxxxx_url.xml" and hoping for the best. This is less than ideal.
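
For illustration, the URN mangling described above could look roughly like the sketch below. The URN shape (`urn:miriam:biomodels.db:<id>`) and the function name `biomodels_urn_to_url` are assumptions made for this example, and the download URL pattern is the one quoted later in this thread; it is not guaranteed to stay stable, and this is not the actual tellurium code.

```python
import re

# Assumed URN shape for BioModels entries, e.g.
# urn:miriam:biomodels.db:BIOMD0000000297
_URN_RE = re.compile(r"^urn:miriam:biomodels\.db:(BIOMD\d{10})$")

# Download URL pattern as quoted later in this thread; BioModels may change it.
_DOWNLOAD = (
    "http://www.ebi.ac.uk/biomodels/model/download/{id}"
    "?filename={id}_url.xml"
)


def biomodels_urn_to_url(urn: str) -> str:
    """Map a BioModels MIRIAM URN to a guessed SBML download URL."""
    match = _URN_RE.match(urn)
    if match is None:
        # Any other namespace (or a malformed id) cannot be handled this way.
        raise ValueError(f"not a recognized BioModels URN: {urn!r}")
    return _DOWNLOAD.format(id=match.group(1))
```

Nothing here checks that the guessed file actually exists on the server, which is the "hoping for the best" part.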

It would be wonderful if I could instead just ask identifiers.org to resolve a URN for me and give me a file. If it doesn't currently do this, I say we ask.

jonrkarr commented 3 years ago

I'm doing the same as what tellurium is doing: processing the URN and mapping it to a URL for BioModels.

The issue is that I don't think there's a way to extend this generally to ANY Identifiers.org namespace. Identifiers.org doesn't collect the information needed to do this for each namespace. As far as I know, this can't be solved by asking Identifiers.org to output more information. Identifiers.org would need to expand its data model and retroactively curate download URL patterns for each namespace.

Also, you cannot simply append .xml to the end of an identifiers.org URL pattern. This doesn't work because Identifiers.org validates whether the id pattern matches the curated id pattern. As an example, see http://identifiers.org/biomodels.db/BIOMD0000000297 vs http://identifiers.org/biomodels.db/BIOMD0000000297.xml.
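
A small sketch of why the suffix breaks resolution: if the resolver checks the local id against a curated regular expression, anything appended to the id fails the check. The pattern below is a simplified stand-in that I am assuming for illustration, not the exact pattern curated in the Identifiers.org registry, and `id_is_valid` is a hypothetical helper.

```python
import re

# Simplified, ASSUMED stand-in for the curated biomodels.db id pattern.
ASSUMED_PATTERN = re.compile(r"(BIOMD|MODEL)\d{10}")


def id_is_valid(local_id: str) -> bool:
    """Return True only if the whole id matches the assumed pattern."""
    return ASSUMED_PATTERN.fullmatch(local_id) is not None


print(id_is_valid("BIOMD0000000297"))      # plain id
print(id_is_valid("BIOMD0000000297.xml"))  # id with a file extension appended
```

With a full-match check like this, the plain id passes and the id with `.xml` appended is rejected before any redirect happens.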

As a concrete example, BiGG is also registered with Identifiers.org. XML files for its models can't simply be retrieved by appending .xml to the URL, whether that is the Identifiers.org URL for resolving the entity or the URL that Identifiers.org resolves to.

I think using MIRIAM URNs for models is fundamentally flawed. The infrastructure to make this work doesn't exist. Unless this can be fixed, I suggest this be removed from the SED-ML specifications.

matthiaskoenig commented 3 years ago

As far as I understand identifiers.org, you can register URLs for your registry. Currently most (probably all) registered URLs link to HTML web pages. But one could easily register an additional URL for a resource that returns XML or JSON, for instance a REST API. This would make it possible to get data or models directly from the given resource. See for instance the Gene Ontology entry, https://registry.identifiers.org/registry/go, which has multiple provider URLs. In your tool you could then choose to use the XML/JSON data URL instead of the canonical HTML page. Of course the identifiers.org link would still point to the primary URL, but one could have additional data URLs that tools can use to get machine-readable definitions directly. This was never done, but I played with the thought when registering a database of my group: https://registry.identifiers.org/registry/pkdb. We have REST endpoints for the data, and for many tools it would be great to access the data directly instead of the web page via an identifiers.org URL.

jonrkarr commented 3 years ago

Yes, one could try to edit the existing records to accept a format argument or submit additional namespaces that resolve to model files such as biomodels.db.export --> http://identifiers.org/biomodels.db.export/BIOMD0000000297&format=sbml-xml --> http://www.ebi.ac.uk/biomodels/model/download/BIOMD0000000297?filename=BIOMD0000000297_url.xml.

I've registered a few namespaces. It's a fairly easy process.

However, such resolution isn't currently available. This might need to be discussed with Manuel Bernal Llinares because this would be a little outside their paradigm. To edit existing records owned by other people, those people would also have to be brought into the conversation.

Even if all of this were done, the specifications should also clarify that this resolution is only possible for a subset of namespaces which support such resolution to model files.
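
To make the "subset of namespaces" point concrete, here is a rough sketch of what a tool-side resolver would have to look like today: a hand-maintained table of download URL patterns, with an explicit failure for every namespace not in the table. The table `DOWNLOAD_PATTERNS` and the function `resolve_model_url` are hypothetical names for this example; the single entry reuses the BioModels download URL quoted above.

```python
# Hypothetical, hand-curated table. Identifiers.org does not provide this;
# each tool would have to maintain its own entries.
DOWNLOAD_PATTERNS = {
    "biomodels.db": (
        "http://www.ebi.ac.uk/biomodels/model/download/{id}"
        "?filename={id}_url.xml"
    ),
}


def resolve_model_url(urn: str) -> str:
    """Resolve a MIRIAM URN to a model file URL, if the namespace is known."""
    parts = urn.split(":", 3)
    if len(parts) != 4 or parts[0] != "urn" or parts[1] != "miriam":
        raise ValueError(f"not a MIRIAM URN: {urn!r}")
    namespace, local_id = parts[2], parts[3]
    pattern = DOWNLOAD_PATTERNS.get(namespace)
    if pattern is None:
        # The general case: no download pattern is known for this namespace.
        raise LookupError(f"no model download pattern for {namespace!r}")
    return pattern.format(id=local_id)
```

Any URN outside the table, for example one in a BiGG namespace, ends in the `LookupError` branch, which is the behavior the specifications would need to acknowledge.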

jonrkarr commented 3 years ago

I would suggest building on top of modelXchange rather than trying to engineer this into identifiers.org. First, only a few of the many identifiers.org namespaces provide models. Second, identifiers.org doesn't presently facilitate access to records in multiple formats.

jonrkarr commented 3 years ago

I'm closing this proposal in favor of #86 to use modelXchange instead of Identifiers.org.