INCATools / neoplasmer

Neoplasm Entity Recognition: matching disease names to ontology classes
BSD 3-Clause "New" or "Revised" License

Plug into VICC pipelines #1

Open cmungall opened 5 years ago

cmungall commented 5 years ago

Instructions from @ahwagner are below. We want to automate this better, with a workflow or service that we can run regularly over the unnormalized VICC disease terms.

This was done by using the same harvester routines we use in production, at https://github.com/ohsu-comp-bio/g2p-aggregator/tree/v0.12/harvester.

Specifically, the `harvest` and `convert` phases were run using the utility scripts `harvest-file-all.sh` and `convert-file-all.sh` here: https://github.com/ohsu-comp-bio/g2p-aggregator/tree/v0.12/util.

Finally, I extracted the relevant terms from the pre-normalized `.convert.json` files using jq:

`cat *.convert.json | jq '.association.phenotypes | .[]?.description' | sort -u > unnormalized_disease_terms.txt`
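To make the effect of that jq filter concrete, here is a minimal sketch that runs it over a few made-up records. The sample records and the `extract_terms` helper name are assumptions for illustration only; the record shape (an `association` object holding a `phenotypes` array of objects with a `description` field) is inferred from the filter itself, not taken from real `.convert.json` output.

```shell
# Hypothetical helper: the same jq filter as above, reading JSON records on stdin.
# LC_ALL=C is added here only to make the sort order deterministic.
extract_terms() {
    jq '.association.phenotypes | .[]?.description' | LC_ALL=C sort -u
}

# Made-up sample records (NOT real harvester output).
# The third record has no phenotypes; `.[]?` skips it instead of raising an error.
sample='{"association": {"phenotypes": [{"description": "melanoma"}, {"description": "lung adenocarcinoma"}]}}
{"association": {"phenotypes": [{"description": "melanoma"}]}}
{"association": {}}'

printf '%s\n' "$sample" | extract_terms
```

On this sample the pipeline prints two lines, `"lung adenocarcinoma"` and `"melanoma"`. The terms keep their JSON double quotes because jq is run without `-r`, so anything that reads the term file downstream has to tolerate or strip them.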
cmungall commented 5 years ago

Initial results here: https://github.com/cmungall/neoplasmer/blob/master/scratch/vicc-results.tsv

I also updated the Docker container.

To re-run the analysis:

`docker run -p 9055:9055 -e PORT=9055 -v $PWD:/work -w /work --rm -ti cmungall/neoplasmer swipl -G0 -p library=/tools/prolog -l /tools/utf8.pl /tools/bin/neoplasmer -X .cache -i /data/mondo.owl -i /data/doid.owl -i /data/neoplasm-core.owl TERMFILE`
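As a first step toward running this regularly, the command could be wrapped in a small shell function. This is only a sketch: the function name `run_neoplasmer` and the `DRY_RUN` variable are hypothetical, and the image name, flags, and ontology paths are copied from the command above. It drops `-ti` on the assumption that an unattended run (cron, CI) has no terminal attached, and it expects the term file to sit under the current directory, since that is what gets mounted at `/work`.

```shell
# Hypothetical wrapper around the docker command above.
# Usage: run_neoplasmer TERMFILE   (TERMFILE relative to the current directory)
# Set DRY_RUN=1 to print the command instead of executing it.
run_neoplasmer() {
    termfile=$1

    # Fail early if the term file is missing on the host.
    if [ ! -f "$termfile" ]; then
        echo "term file not found: $termfile" >&2
        return 1
    fi

    # Rebuild the original command as an argument list (without -ti).
    set -- docker run -p 9055:9055 -e PORT=9055 -v "$PWD:/work" -w /work --rm \
        cmungall/neoplasmer swipl -G0 -p library=/tools/prolog -l /tools/utf8.pl \
        /tools/bin/neoplasmer -X .cache \
        -i /data/mondo.owl -i /data/doid.owl -i /data/neoplasm-core.owl "$termfile"

    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

For example, `DRY_RUN=1 run_neoplasmer unnormalized_disease_terms.txt` prints the command that would be run, which is a cheap way to check the arguments before scheduling the job.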