IHTSDO / snomed-owl-toolkit

The official SNOMED CT OWL Toolkit. OWL conversion, classification and authoring support.

Inclusion of all available terms in OWL file conversion. #14

Closed kaicode closed 4 years ago

kaicode commented 5 years ago

Several members have requested that translated terms within their SNOMED release can be included when transforming from RF2 to an OWL file.

kaicode commented 5 years ago

There is a preview release available for this feature here https://github.com/IHTSDO/snomed-owl-toolkit/releases/tag/2.2.0-snapshot1

CC @fanavarro @danka74

ronaldcornet commented 5 years ago

@kaicode Any update on this preview-release? I think having labels in alternate / multiple languages, and also allowing for preferred / alternate terms, would be a great addition to this tool. I'm getting requests for this from my country / language :).

kaicode commented 5 years ago

Hi @ronaldcornet. I am still waiting for feedback on this one. If you can confirm that it works for you I would be happy to merge it into a release. Cheers.

ronaldcornet commented 5 years ago

Hi @kaicode. Reading the command line options, I see:

-preferred-terms (Optional) Flag to use preferred term annotations rather than the default Fully Specified Names.

-language-refset (Optional) The identifier of the language reference set to use when selecting an FSN or PT to include. Defaults to 900000000000509007 which is the United States of America English language reference set.

Does this mean there is no possibility (yet) to also include the synonyms in the OWL-file? I.e., you're not using, e.g., rdfs:comment for including the "acceptable" synonyms? If not, I think that that would be a helpful addition.

kaicode commented 5 years ago

It's true, there is no option to include all synonyms at the moment. At the last MAG meeting Harold introduced a standard for annotation markup in OWL. I don't think we reached agreement that it should be used, so I've not put any effort into implementing it yet.

ronaldcornet commented 5 years ago

I think there are some use cases for this to be of use (among others, searching for concepts in an OWL-ontology). I think rdfs:comment would make a sensible choice for that. You could also consider (if it's not in already) rdfs:isDefinedBy for the TextDefinition. Would we need to bring this to MAG in October?

kaicode commented 5 years ago

In April Harold sent me a turtle file with the suggested SKOS annotations. For example:

sct:10002003 a owl:Class ;
    rdfs:label "Resection of stomach fundus (procedure)"@en ;
    owl:equivalentClass [ a owl:Class ;
            owl:intersectionOf ( sct:116175006 [ a owl:Restriction ;
                        owl:onProperty sct:609096000 ;
                        owl:someValuesFrom [ a owl:Class ;
                                owl:intersectionOf ( [ a owl:Restriction ;
                                            owl:onProperty sct:405813007 ;
                                            owl:someValuesFrom sct:414003 ] [ a owl:Restriction ;
                                            owl:onProperty sct:260686004 ;
                                            owl:someValuesFrom sct:129304002 ] ) ] ] ) ] ;
    skos:altName "Gastric fundectomy"@en-GB,
        "Gastric fundusectomy"@en-GB,
        "Gastric fundectomy"@en-US,
        "Gastric fundusectomy"@en-US ;
    skos:prefName "Resection of stomach fundus"@en-GB,
        "Resection of stomach fundus"@en-US .

They include language and dialect acceptability. I will aim to implement this in functional syntax for review at the October MAG meeting.

ronaldcornet commented 5 years ago

Ah, that is great, @kaicode! Much appreciated!

kaicode commented 4 years ago

Last week the SNOMED Modeling Advisory Group met in Malaysia where they accepted recommendations from @hsolbrig to use SKOS annotations for this purpose.

I've pushed a new implementation of this to the develop branch.

We are using en-us or en-gb for terms which are only preferred/acceptable in one dialect, for example:

AnnotationAssertion(rdfs:label :10104004 "Flow cytometric crossmatch, two colors (procedure)"@en)
AnnotationAssertion(skos:prefLabel :10104004 "Flow cytometric crossmatch, two colors"@en-us)
AnnotationAssertion(skos:prefLabel :10104004 "Flow cytometric crossmatch, two colours"@en-gb)

Here the FSN is preferred in both US and GB so just en is used.
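The tagging rule described above can be sketched in Python. This is only an illustration of the rule as stated in this comment, not the toolkit's actual code: the function name and the `DIALECTS` table are hypothetical, and the two refset ids are the US and GB English language reference sets mentioned elsewhere in this thread.

```python
from typing import Iterable

US_REFSET = "900000000000509007"  # US English language refset (id quoted from the CLI help)
GB_REFSET = "900000000000508004"  # GB English language refset

# Hypothetical lookup from language refset id to dialect tag.
DIALECTS = {US_REFSET: "en-us", GB_REFSET: "en-gb"}


def language_tag(refset_ids: Iterable[str], base: str = "en") -> str:
    """Pick the tag for a term given the refsets where it has one acceptability.

    If the term has that acceptability in every known dialect, the plain
    base language is used; if it applies to a single dialect, that dialect
    tag is used.
    """
    dialects = {DIALECTS[r] for r in refset_ids if r in DIALECTS}
    if dialects == set(DIALECTS.values()):
        return base
    if len(dialects) == 1:
        return next(iter(dialects))
    raise ValueError("term is not in any known English language refset")


# FSN preferred in both dialects, so the generic tag is used.
print(language_tag([US_REFSET, GB_REFSET]))  # en
# "two colours" is only preferred in GB English.
print(language_tag([GB_REFSET]))  # en-gb
```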

We are also including all descriptions and languages available in the extension. For example if we convert the Belgian extension we get a full set of annotations on concept 10087007 | Infection caused by Schistosoma (disorder)| like this:

AnnotationAssertion(rdfs:label :10087007 "Infection caused by Schistosoma (disorder)"@en)
AnnotationAssertion(skos:altLabel :10087007 "Bilharzia"@en)
AnnotationAssertion(skos:altLabel :10087007 "Bilharziasis"@en)
AnnotationAssertion(skos:altLabel :10087007 "Blood fluke infection"@en)
AnnotationAssertion(skos:altLabel :10087007 "Haemic distomiasis"@en-gb)
AnnotationAssertion(skos:altLabel :10087007 "Hemic distomiasis"@en-us)
AnnotationAssertion(skos:altLabel :10087007 "Infection caused by Schistosoma"@en)
AnnotationAssertion(skos:altLabel :10087007 "Schistosomiasis"@en)
AnnotationAssertion(skos:altLabel :10087007 "Schistosomiasis - bilharziasis"@en)
AnnotationAssertion(skos:altLabel :10087007 "Schistosomosis"@en)
AnnotationAssertion(skos:prefLabel :10087007 "Infection by Schistosoma"@en)
AnnotationAssertion(skos:prefLabel :10087007 "infectie door Schistosoma"@nl-be)
AnnotationAssertion(skos:prefLabel :10087007 "infection à Schistosoma"@fr-be)

I will leave this in develop for a few days in case there is any test feedback. Feel free to try it out if you have a moment @ronaldcornet, @mmorine, @fanavarro.
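For anyone wanting to check the output or search it by term, annotation lines of the shape shown above can be read with a small Python sketch like the following. It assumes one `AnnotationAssertion` per line with the `:` prefix for concept ids, exactly as in the examples in this comment; the regular expression and function names are illustrative, and escaped quotes inside a term are not unescaped.

```python
import re
from collections import defaultdict

# Matches lines such as:
# AnnotationAssertion(skos:altLabel :10087007 "Bilharzia"@en)
ANNOTATION = re.compile(
    r'^AnnotationAssertion\((?P<prop>\S+) :(?P<concept>\d+) '
    r'"(?P<text>.*)"@(?P<lang>[A-Za-z-]+)\)$'
)


def parse_annotations(lines):
    """Yield (concept_id, property, text, language) for each matching line."""
    for line in lines:
        match = ANNOTATION.match(line.strip())
        if match:
            yield (match.group("concept"), match.group("prop"),
                   match.group("text"), match.group("lang"))


def build_term_index(lines):
    """Map each case-folded term to the set of concept ids that carry it."""
    index = defaultdict(set)
    for concept, _prop, text, _lang in parse_annotations(lines):
        index[text.casefold()].add(concept)
    return index


sample = [
    'AnnotationAssertion(rdfs:label :10087007 "Infection caused by Schistosoma (disorder)"@en)',
    'AnnotationAssertion(skos:altLabel :10087007 "Haemic distomiasis"@en-gb)',
    'AnnotationAssertion(skos:prefLabel :10087007 "infectie door Schistosoma"@nl-be)',
]
index = build_term_index(sample)
print(index["haemic distomiasis"])  # {'10087007'}
```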

hsolbrig commented 4 years ago

I agree w/ the identifiers, although it would be useful to also include a text definition example (perhaps in different languages).

I would propose, however, that the en-us, en-gb and fr-be all be published as separate OWL "Ontologies", with the ontology URI being the language refset id:

@prefix sctm: <http://snomed.info/sct/> .
sctm:900000000000508004 a owl:Ontology ;
    rdfs:label "SNOMED Clinical Terms, International Release, GB English" ;
    rdfs:comment """Copyright 2019 The International Health Terminology Standards Development Organisation (IHTSDO).
All Rights Reserved. SNOMED CT was originally created by The College of American Pathologists. "SNOMED" and
 "SNOMED CT" are registered trademarks of the IHTSDO.  SNOMED CT has been created by combining SNOMED RT
and a computer based nomenclature and classification known as Clinical Terms Version 3, formerly known as
Read Codes Version 3, which was created on behalf of the UK Department of Health.

This document forms part of the International Release of SNOMED CT distributed by the International Health
Terminology Standards Development Organisation (IHTSDO), and is subject to the IHTSDO's SNOMED CT Affiliate
Licence. Details of the SNOMED CT Affiliate Licence may be found at www.ihtsdo.org/our-standards/licensing/""",
        "Generated as OWL RDF/XML from SNOMED CT release files" ;
    owl:versionIRI <http://snomed.info/sct/900000000000508004/version/20190731> ;
    owl:versionInfo "International Release, GB English, Release Date: 20190731" .
...
AnnotationAssertion(rdfs:label :10087007 "Infection caused by Schistosoma (disorder)"@en)
AnnotationAssertion(skos:altLabel :10087007 "Bilharzia"@en)
AnnotationAssertion(skos:altLabel :10087007 "Bilharziasis"@en)
AnnotationAssertion(skos:altLabel :10087007 "Blood fluke infection"@en)
AnnotationAssertion(skos:altLabel :10087007 "Haemic distomiasis"@en-gb)
AnnotationAssertion(skos:altLabel :10087007 "Infection caused by Schistosoma"@en)
AnnotationAssertion(skos:altLabel :10087007 "Schistosomiasis"@en)
AnnotationAssertion(skos:altLabel :10087007 "Schistosomiasis - bilharziasis"@en)
AnnotationAssertion(skos:altLabel :10087007 "Schistosomosis"@en)
AnnotationAssertion(skos:prefLabel :10087007 "Infection by Schistosoma"@en)

kaicode commented 4 years ago

Hi @hsolbrig,

Here are the annotations for the 10012005 | Expression (procedure) | concept which includes a text definition using the skos:definition annotation. I don't have any translated examples to hand but they will work with this implementation too.

AnnotationAssertion(rdfs:label :10012005 "Expression (procedure)"@en)
AnnotationAssertion(skos:definition :10012005 "An expulsion done by manipulation"@en)
AnnotationAssertion(skos:prefLabel :10012005 "Expression"@en)

I have opted to combine the class definitions and description annotations in the same ontology for the sake of simplicity. This implementation will include annotations for all the languages available in the edition / extension being converted to OWL.
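The mapping from description type to annotation property used in these examples can be summarised with a short Python sketch. The `Description` class and its field values are hypothetical simplifications of the RF2 description and language refset data, and this is not the toolkit's implementation; it only reproduces the pattern visible in the output above (FSN as `rdfs:label`, text definition as `skos:definition`, preferred synonym as `skos:prefLabel`, acceptable synonym as `skos:altLabel`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Description:
    concept_id: str
    term: str
    kind: str           # "fsn", "synonym" or "definition" (hypothetical labels)
    acceptability: str  # "preferred" or "acceptable"
    language_tag: str   # e.g. "en", "en-gb", "nl-be"


def annotation_property(description: Description) -> str:
    """Choose the annotation property for a description."""
    if description.kind == "fsn":
        return "rdfs:label"
    if description.kind == "definition":
        return "skos:definition"
    if description.acceptability == "preferred":
        return "skos:prefLabel"
    return "skos:altLabel"


def to_annotation(description: Description) -> str:
    """Render one description as an OWL functional syntax annotation."""
    escaped = description.term.replace("\\", "\\\\").replace('"', '\\"')
    return (f'AnnotationAssertion({annotation_property(description)} '
            f':{description.concept_id} "{escaped}"@{description.language_tag})')


descriptions = [
    Description("10012005", "Expression (procedure)", "fsn", "preferred", "en"),
    Description("10012005", "An expulsion done by manipulation", "definition", "preferred", "en"),
    Description("10012005", "Expression", "synonym", "preferred", "en"),
]
for description in descriptions:
    print(to_annotation(description))
```

Run as written, this prints the same three lines as the `10012005` example above.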

Once all languages and dialects are in one file, the language being displayed can be controlled within Protégé. The user can give an ordered list of the languages/dialects they would like to see, and Protégé will display the best annotation to match their preference.

My preferred Protégé setup is this: View > Render by annotation property > skos:prefLabel. Then, to configure the display language/dialect: View > Custom Rendering... > Configure... > Set Language = en-gb, en, !. This will display a GB English preferred term if one is available, otherwise the generic English term; if no English term is found, the concept id is displayed.

This will be very helpful for SNOMED extensions which contain a partial translation of concepts, because if a translated annotation cannot be found an English annotation can be used as the class label. If the user would just prefer to see the concept id in this case, that could be configured too.
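The fallback behaviour described here can be illustrated with a few lines of Python. This only mimics the behaviour as described in this comment (an ordered language list, then the concept id); it is not how Protégé is implemented, and the function name is made up for the example.

```python
def display_label(concept_id, labels_by_language, preferred_languages):
    """Return the first label found in the ordered language list.

    Falls back to the concept id when none of the preferred
    languages has a label.
    """
    for language in preferred_languages:
        label = labels_by_language.get(language)
        if label is not None:
            return label
    return concept_id


# skos:prefLabel values for 10087007 from the Belgian extension example.
labels = {
    "en": "Infection by Schistosoma",
    "nl-be": "infectie door Schistosoma",
    "fr-be": "infection à Schistosoma",
}

print(display_label("10087007", labels, ["en-gb", "en"]))  # Infection by Schistosoma
print(display_label("10087007", labels, ["nl-be", "en"]))  # infectie door Schistosoma
print(display_label("10087007", {}, ["en-gb", "en"]))      # 10087007
```

With a partially translated extension, putting the translated dialect first and `en` second gives the translated term where one exists and the English term everywhere else.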

Cheers.

AJKellmann commented 4 years ago

Hi @kaicode,

I'm trying to use the Dutch version of the SNOMED CT ontology to search for synonyms, so I really appreciate the new way of annotating synonyms using skos:altLabel.

I would also like to add the "Dutch Patient Friendly" terms. Adding "15561000146104=nl" to language-refset-dialect-map.properties did not help. The output that I got for those terms consisted of just 2 entities; all the synonyms were missing. Maybe you can help me to fix that.

Best regards, Alexander

AJKellmann commented 4 years ago

Hi @kaicode,

I have tried the new release 2.8.0 to get the descriptions / synonyms of the patient friendly terms, but the problem is still the same: the program recognises 611 descriptions, but the output file has just 2 entries. Adding the relevant ids (15561000146104=nl or 15561000146102=nl) to language-refset-dialect-map.properties seemed to have no effect.

Best regards, Alexander
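When debugging a mapping like this, it can help to confirm what the properties file actually contains. The Python sketch below parses `refsetId=dialect` lines of the form quoted above; it assumes ordinary Java properties conventions for comments and blank lines, and the sample content is illustrative rather than a copy of the toolkit's file. It does not explain why the synonyms are missing from the output.

```python
def parse_dialect_map(text):
    """Parse 'languageRefsetId=dialectTag' lines into a dict."""
    mapping = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and properties-style comments.
        if not line or line.startswith(("#", "!")):
            continue
        key, separator, value = line.partition("=")
        if not separator:
            continue  # not a key=value line
        mapping[key.strip()] = value.strip()
    return mapping


# Illustrative content only.
sample_properties = """\
# language refset id = dialect tag
900000000000509007=en-us
900000000000508004=en-gb
15561000146104=nl
"""

dialects = parse_dialect_map(sample_properties)
print(dialects["15561000146104"])    # nl
print("15561000146102" in dialects)  # False
```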

kaicode commented 4 years ago

@AJKellmann I'm sorry to hear that generating the ontology file with Dutch patient friendly terms is not working for you. If you could provide an RF2 archive with this content, or a small archive with example rows, I will try to reproduce and debug the issue.

kaicode commented 4 years ago

Closing this issue because this feature was implemented in Release 2.8.0. @AJKellmann could you raise your problem as a separate issue ticket please?