JULIELab / trec-pm

Support code and resources for participation at the TREC Precision Medicine Track (TREC-PM)
http://trec-cds.appspot.com
MIT License

Umls format change #36

Closed khituras closed 5 years ago

khituras commented 5 years ago

While there are a lot of changes in the branch, the main thing to note, because it will break current installations, is the format change of the UMLS synset provider. The script has been adapted to create the new format. I am mainly opening this as a PR so that no one misses the fact that the UMLS synset needs to be recreated.

khituras commented 5 years ago

Jain (yes and no), as we Germans say: it is at least not a bug. There are no components of the respective type in the pipeline, hence the empty array. I never implemented logic to leave out the file completely, and I also think it might actually be nice to see explicitly that there are no components instead of wondering whether the file just got lost. On 17 July 2019, 18:14 +0200, Michel Oleynik wrote:

@michelole commented on this pull request. In uima/extra-to-xmi-db-pipeline/cmDescriptions.json:

@@ -0,0 +1 @@ +[] ditto

michelole commented 5 years ago

Short question: does it change the synonyms as well, i.e. should we re-run experiments?

khituras commented 5 years ago

It shouldn't change the synonyms. Changes made: added the CUI, which is skipped when the synonyms are read, and applied case-sensitive de-duplication to the file, since the previous file contained duplicated synonyms. However, the de-duplication was already done in Java anyway through the use of sets.
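
To illustrate why the synonyms should come out the same, here is a minimal sketch of the reading side. It is not the repository's actual provider code: the class name `SynsetLineSketch`, the method `readSynonyms`, the tab-separated layout with the CUI in the first column, and the placeholder CUI `C0000000` are all assumptions made for this example. It only shows the two points above: the CUI column is skipped, and a set removes duplicates case-sensitively regardless of whether the file was de-duplicated beforehand.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class SynsetLineSketch {

    /**
     * Parses one line of a hypothetical tab-separated synset file.
     * Assumption: column 0 holds the CUI, all further columns are synonyms.
     */
    static Set<String> readSynonyms(String line) {
        String[] columns = line.split("\t");
        // A set drops exact (case-sensitive) duplicates; LinkedHashSet keeps file order.
        Set<String> synonyms = new LinkedHashSet<>();
        // Start at index 1 so the CUI is excluded from the synonyms.
        for (int i = 1; i < columns.length; i++) {
            if (!columns[i].isEmpty()) {
                synonyms.add(columns[i]);
            }
        }
        return synonyms;
    }

    public static void main(String[] args) {
        // "C0000000" is a placeholder, not a real CUI.
        String line = "C0000000\tmelanoma\tMelanoma\tmelanoma";
        System.out.println(readSynonyms(line)); // prints [melanoma, Melanoma]
    }
}
```

Under these assumptions, a line with and without duplicated synonyms yields the same set, which is consistent with the expectation that experiments do not need to be re-run.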