facebookresearch / Clinical-Trial-Parser

Library for converting clinical trial eligibility criteria to a machine-readable format.
Apache License 2.0

the order of the labels #23

Closed bhomass closed 1 year ago

bhomass commented 1 year ago

I ran pytext train < src/resources/config/ner.json.

The training data has multiple labels. When running pytext predict, the result shows the word classifications as numerical labels. How do you match the numbers to the original text labels?

Experimentally, I can tell that 1=chronic_disease, 7=cancer, and 8=age, but where can I look this up? By the way, this order disagrees with the one given in bin/README.md.

salkola commented 1 year ago

That logic is in src/ie/ner.py. Running ./script/ie_parse.sh should reproduce the results in this repo. The following lines in the ie_parse script do the named entity recognition (NER) portion of the IE parser:

export PYTHONPATH="$(pwd)/src"
python src/ie/ner.py -m bin/ner.c2 -i data/output/ie_extracted_clinical_trials.tsv -o data/output/ie_ner_clinical_trials.tsv
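
For readers who only want to see what the index-to-label lookup asked about above might look like, here is a minimal sketch. It is not the code in src/ie/ner.py. The names OBSERVED_LABELS and decode_labels are hypothetical, and the table holds only the three pairs reported experimentally in this thread, so it should not be read as the model's confirmed or complete label order.

```python
from typing import Dict, List

# Assumption: only the three index/label pairs observed experimentally in this
# thread. This is NOT the full or confirmed label vocabulary of the model.
OBSERVED_LABELS: Dict[int, str] = {
    1: "chronic_disease",
    7: "cancer",
    8: "age",
}


def decode_labels(indices: List[int], index_to_label: Dict[int, str]) -> List[str]:
    """Replace each numeric label with its name.

    Indices missing from the table are kept visible as "unknown_<index>"
    instead of being guessed.
    """
    return [index_to_label.get(i, f"unknown_{i}") for i in indices]


if __name__ == "__main__":
    # Hypothetical per-word predictions for a four-word input.
    print(decode_labels([8, 1, 0, 7], OBSERVED_LABELS))
```

Keeping unknown indices explicit makes it easy to spot labels that still need to be identified, rather than silently mislabeling them.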