agroportal / project-management

Repository used to consolidate documentation about the AgroPortal project and track content-related issues.
http://agroportal.lirmm.fr

Implement a script to load SSSOM mappings to AgroPortal #265

Open jonquet opened 2 years ago

jonquet commented 2 years ago

Before doing #255, we will have to create a script to load SSSOM mappings into AgroPortal. This is related to D2KAB WP2. The correspondences between the two formats have been discussed and captured (Clement's note) with @saubin78.

CCing @graybeal @matentzn and @cmungall for information and followup.

jonquet commented 2 years ago

Example of mappings in AgroPortal: http://data.agroportal.lirmm.fr/mappings?ontologies=ANAEETHES,AGROVOC&display_links=false&display_context=false&include=all

SSSOM specification: https://mapping-commons.github.io/sssom/Mapping/

AgroPortal | SSSOM
-- | --
classes | subject_id
classes | object_id
source (mapping) | match_type
comment | comment
source_name | creator_label or author_label
source (process) | mapping_provider
relation | predicate_id
source_contact_info | NA
creator | mapping_tool or author_id or creator_id
name |
date | mapping_date
saubin78 commented 2 years ago

Hi. Thanks @matentzn for letting us know about the change from match_type to mapping_justification.

I have looked at @jonquet's proposal for AgroPortal to SSSOM, and I'd like to suggest some modifications and additional elements:

AgroPortal | SSSOM | AP example from [here](http://data.agroportal.lirmm.fr/mappings?ontologies=ANAEETHES,AGROVOC&display_links=false&display_context=false&include=all) | comments
-- | -- | -- | --
classes/id | subject_id | `http://opendata.inra.fr/anaeeThes/c2_2787` | criterion needed for choosing which one is subject/object
classes/id | object_id | `http://aims.fao.org/aos/agrovoc/c_36549` | criterion needed for choosing which one is subject/object
source (mapping) | match_type --> mapping_justification | `REST` | possible values: REST; LOOM; SameURI; CUI?
process/comment | comment | `Generated with the Ontology Mapping Harvest Tool - v.1.3 - Agroportal Project - LIRMM - 12/10/2018 15:08 - FR` |
source_name | | `ANAEETHES` | this would correspond to subject_source if it were a URI instead of a string
process/source | mapping_provider | `http://data.agroportal.lirmm.fr/ontologies/ANAEETHES` | Not bad, this is a URL...
process/relation | predicate_id | `http://www.w3.org/2004/02/skos/core#exactmatch` | Great, this is a URI!
source_contact_info | NA | `null` |
process/creator | creator_id | `http://data.agroportal.lirmm.fr/users/jonquet` | Great, this is a URI!
process/name | | `REST Mapping` | what is the difference with source (mapping)?
process/date | mapping_date | `2018-10-15T12:12:53+02:00` | clarification needed: is this the date when the mapping was created, or when it was loaded into AP?
collection/id | mapping_set_id | `http://data.agroportal.lirmm.fr/rest_backup_mappings/3b4ea420-b292-0136-8446-525400026749` |

@jonquet, you can have a look at the whole mapping in [Sonia's spreadsheet](https://docs.google.com/spreadsheets/d/1SfXPOBlQJ1MChXWyWmpk1jzkVM5Mo2ev/edit#gid=1348153467) used to test SSSOM on D2KAB's mapping use cases (3rd tab).
matentzn commented 2 years ago

I remember now, you made that issue here: https://github.com/mapping-commons/sssom/issues/139

From a cursory look I think most elements look good. These are questionable:

- source (mapping) --> this does not map to justification. Are there more examples?
- source_name

source_name should be:

- mapping_provider, if the source indicates "where the mapping was pulled from";
- subject_source, if the source indicates which terminology the source ID lives in;
- etc.

process/name --> mapping_set_title?
saubin78 commented 2 years ago

Could you please clarify the distinction between

matentzn commented 2 years ago

Moved your question here: https://github.com/mapping-commons/sssom/issues/202

syphax-bouazzouni commented 2 years ago

Todo

Resources

saubin78 commented 2 years ago

@syphax-bouazzouni please use this TEST file : https://docs.google.com/spreadsheets/d/1EpttUuJNWmp2up4SXDcJrc8mtlvZhyjs/edit?usp=sharing&ouid=101729640835482598083&rtpof=true&sd=true

Note that

I think it's a good start.

matentzn commented 2 years ago

> I am not 100% sure about the order of the columns, as the specs are not completely clear. One way to check may be to use the Python toolkit to produce the RDF serialisation. I have not been able to test it on my laptop yet.

Columns in the sssom toolkit are sorted in the order the spec prescribes: https://github.com/mapping-commons/sssom/blob/master/src/sssom_schema/schema/sssom_schema.yaml#L473, but only when using the `sssom sort` command (something happened to the CLI docs; they should be deployed automatically).

You have to be able to parse a metadata element either as global metadata or as a column (local to a mapping).
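A minimal sketch of reading an element "either as global metadata or as a column", assuming the SSSOM/TSV layout of a `#`-prefixed YAML header followed by a tab-separated table. To stay dependency-free, it only understands flat `key: value` header lines (nested blocks such as `curie_map` are skipped), and the `PROPAGATED` set is an illustrative assumption, not the spec's authoritative list of slots allowed at both levels.

```python
import csv
import io

# Illustrative assumption: slots that may be set once for the whole mapping set
# and inherited by every mapping unless the row's own column overrides them.
PROPAGATED = {"mapping_date", "mapping_provider", "subject_source", "object_source"}


def read_sssom_tsv(text):
    """Split an SSSOM/TSV document into mapping-set metadata and mapping rows."""
    header, body = [], []
    for line in text.splitlines():
        (header if line.startswith("#") else body).append(line)

    mapping_set = {}
    for line in header:
        content = line[1:]
        # Skip nested YAML (indented lines) and lines that are not "key: value".
        if not content or content[0].isspace() or ":" not in content:
            continue
        key, _, value = content.partition(":")
        if value.strip():  # an empty value opens a nested block, e.g. curie_map
            mapping_set[key.strip()] = value.strip()

    mappings = []
    for row in csv.DictReader(io.StringIO("\n".join(body)), delimiter="\t"):
        # Start from the global values, then let non-empty columns override them.
        merged = {k: v for k, v in mapping_set.items() if k in PROPAGATED}
        merged.update({k: v for k, v in row.items() if v})
        mappings.append(merged)
    return mapping_set, mappings
```

A row with an empty `mapping_date` cell inherits the mapping-set date, while a row that fills the column keeps its own value. A production parser would use a real YAML library and the schema's own list of propagatable slots.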

If you see any case that is not permitted by the spec but that you think would be useful (where a mapping set element goes into a column, or a mapping element goes into the mapping set), let us know.

There is a new feature in sssom-py, `sssom validate` (merged on master but not yet released on PyPI). If you want to try it, it will validate your SSSOM tables much more rigorously than was previously possible.

jonquet commented 2 years ago

After discussion, and based on the examples, we suggest implementing the following correspondences:

AgroPortal | SSSOM
-- | --
classes | Mapping: subject_id
classes | Mapping: object_id
comment | Mapping: comment
relation | Mapping: predicate_id
source (process) | Mapping: mapping_justification
creator | NA (fixed value: http://data.agroportal.lirmm.fr/users/mappingadmin)
source (mapping) | NA (this is a fixed property in OntoPortal; it will always be "REST")
source_name | MappingSet: mapping_set_id
source_contact_info | MappingSet: creator_id
name | MappingSet: mapping_set_description
date (we need to double-check that we can override this value and that it is not the date when the mappings were uploaded to AgroPortal) | Mapping: mapping_date OR MappingSet: mapping_date
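The correspondence table above can be sketched as a single conversion function. This is a hypothetical illustration: the nesting under `"process"` loosely follows the `process/...` names from the earlier AgroPortal example, not a confirmed OntoPortal REST schema, and the input is assumed to be plain dicts of SSSOM slot names to values.

```python
# Fixed values taken from the correspondence table.
AGROPORTAL_MAPPING_ADMIN = "http://data.agroportal.lirmm.fr/users/mappingadmin"
MAPPING_SOURCE = "REST"


def sssom_to_ontoportal(mapping, mapping_set):
    """Convert one SSSOM mapping (dict of slot -> value) into a hypothetical
    OntoPortal-style record, following the agreed correspondences."""
    return {
        # Order is kept here, but OntoPortal itself treats the pair as undirected.
        "classes": [mapping["subject_id"], mapping["object_id"]],
        "source": MAPPING_SOURCE,
        "process": {
            "comment": mapping.get("comment"),
            "relation": mapping["predicate_id"],
            "source": mapping.get("mapping_justification"),
            "creator": AGROPORTAL_MAPPING_ADMIN,
            "source_name": mapping_set.get("mapping_set_id"),
            "source_contact_info": mapping_set.get("creator_id"),
            "name": mapping_set.get("mapping_set_description"),
            # A mapping-level date wins; otherwise fall back to the set's date.
            "date": mapping.get("mapping_date") or mapping_set.get("mapping_date"),
        },
    }
```

The date fallback encodes the "Mapping: mapping_date OR MappingSet: mapping_date" row. Whether AgroPortal accepts an overridden date is the open question noted in that row.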

To address the problem of the lost mapping direction, we will add two new attributes to the Mapping model in OntoPortal to encode the subject_source and object_source: subject_source_id and object_source_id, respectively.

Then the loading script will have to resolve the ontology URIs (stored in the 2 new fields) to the OntoPortal IDs.
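The resolution step could look like the following sketch. The URI-to-acronym table is a hypothetical stand-in: a real loader would build it from the target portal's own ontology list, and returning `None` for unknown ontologies is one possible design choice, not a decided behaviour.

```python
def resolve_sources(record, acronyms):
    """Return a copy of `record` whose subject_source_id / object_source_id
    ontology URIs are replaced by this portal's acronyms.

    `acronyms` is a hypothetical URI -> acronym table.
    """
    resolved = dict(record)
    for field in ("subject_source_id", "object_source_id"):
        uri = record.get(field)
        # Ontologies not hosted by this portal resolve to None (left for the caller).
        resolved[field] = acronyms.get(uri) if uri else None
    return resolved
```

Returning a copy leaves the portal-independent record untouched, so the same converted output can be resolved against several OntoPortal instances.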

jonquet commented 2 years ago

@matentzn @jgraybeal We are thinking of implementing the "SSSOM2OntoPortal converter" in the sssom-py tool: https://mapping-commons.github.io/sssom-py/examples.html#convert-command

Note that the converter will be generic, producing JSON output compliant with any OntoPortal instance. However, at loading time, a separate loading script will resolve the ontology IDs to the local ontology IDs (acronyms) of the portal concerned.

What do you think?

matentzn commented 2 years ago

That would be amazing. Both ways would probably be even more amazing :)

graybeal commented 2 years ago

I'm having a little trouble fully grokking the directionality of the correspondences, I think I may need a walkthrough at some point. And of course your mapping model is not the same as BioPortal's any more, or the other OntoPortals. So maybe that's not an issue, but I think it deserves a bit of thought. As does the 'figure out which ontology to use' step.

syphax-bouazzouni commented 2 years ago

Hi @graybeal,

To give you more details about the directionality issue: SSSOM mappings are directional, from the start node (subject_id) to the end node (object_id), whereas in the OntoPortal model a mapping is just a link between two classes, so we cannot tell which class is the origin. The solution that we propose is:

As for our model being different from the base one, I think it will work: we remain backward compatible with the base model, and the import module (already developed, but needing some changes) will be generic enough to work with the base model.
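To illustrate the directionality issue described above, the snippet below contrasts the two models using placeholder CURIEs `ex:A` and `ex:B`. Representing an OntoPortal mapping as a `frozenset` is a deliberate simplification meant only to show what "just a link between two classes" loses.

```python
def as_ontoportal_pair(subject_id, object_id):
    """An OntoPortal-style mapping reduced to a link between two classes (no direction)."""
    return frozenset({subject_id, object_id})


def as_sssom_triple(subject_id, predicate_id, object_id):
    """An SSSOM mapping: directed from subject_id to object_id."""
    return (subject_id, predicate_id, object_id)


# skos:broadMatch is not symmetric, so these two SSSOM mappings say different things...
forward = as_sssom_triple("ex:A", "skos:broadMatch", "ex:B")
backward = as_sssom_triple("ex:B", "skos:broadMatch", "ex:A")

# ...but once reduced to an undirected pair, they become indistinguishable.
print(forward == backward)  # False
print(as_ontoportal_pair("ex:A", "ex:B") == as_ontoportal_pair("ex:B", "ex:A"))  # True
```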

Hi @matentzn, you can find a first (working) version of the "SSSOM2OntoPortal converter" here:

Finally, we will

jonquet commented 2 years ago

> That would be amazing. Both ways would probably be even more amazing :)

Hi @matentzn, we will be working on this too, a bit later: not in the sssom-py tool but directly in our API (using a proxy functionality that we have already used to produce specific formats). We need to do the conversion from the OntoPortal format to SSSOM on our side because some information (e.g., class names) is not in the mapping itself but can be populated by the portal.
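A sketch of the reverse direction, assuming the same hypothetical OntoPortal-style record used earlier. `lookup_label` is an illustrative placeholder for the portal-side class-name lookup mentioned above, not an existing OntoPortal function, and the column list is an illustrative subset of SSSOM slots.

```python
import csv
import io

# Illustrative subset of SSSOM columns written by this sketch.
COLUMNS = ["subject_id", "subject_label", "predicate_id", "object_id",
           "object_label", "mapping_justification", "comment"]


def ontoportal_to_sssom_row(record, lookup_label):
    """Turn a hypothetical OntoPortal-style record into an SSSOM row.

    `lookup_label` stands in for the portal-side lookup of class names,
    which the mapping record itself does not carry.
    """
    # OntoPortal does not record which class is the subject; list order is taken as-is.
    subject_id, object_id = record["classes"][:2]
    process = record.get("process", {})
    return {
        "subject_id": subject_id,
        "subject_label": lookup_label(subject_id),
        "predicate_id": process.get("relation"),
        "object_id": object_id,
        "object_label": lookup_label(object_id),
        # Per the correspondence table; may need translating into a valid
        # SSSOM justification value.
        "mapping_justification": process.get("source"),
        "comment": process.get("comment"),
    }


def write_sssom_tsv(rows):
    """Serialise rows as a tab-separated table (metadata header omitted)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Passing a plain `dict.get` as `lookup_label` is enough for testing; inside the portal it would query the class data, which is why this direction fits naturally on the API side.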

> And of course your mapping model is not the same as BioPortal's any more, or the other OntoPortals.

Hi @graybeal, this is actually not the case. AgroPortal's mapping representation is the same as OntoPortal's. We have just added a couple of features in the past to host "external" mappings in AgroPortal too (i.e., mappings in which only one of the two mapped classes is in AgroPortal).

@syphax-bouazzouni Amazing reactivity to develop the converter, great job!