OVERVIEW: This PR adds logging for the disambiguation errors and page errors raised when terms are looked up through the wikipedia page call. It also lets get_corpus.py fetch the UMLS terms without requiring a filename.
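The error logging described above might look roughly like the sketch below. It is not the PR's code. It assumes the lookup is something like wikipedia.page from the wikipedia package, which raises DisambiguationError (carrying an options list of candidate titles) and PageError. So that the sketch runs without that package, the two exception classes are hypothetical stand-ins. The names fetch_pages, fake_page, and the "get_corpus" logger are illustrative only.

```python
import logging

logger = logging.getLogger("get_corpus")


# Hypothetical stand-ins for wikipedia.exceptions.DisambiguationError and
# wikipedia.exceptions.PageError, so this sketch runs without the package.
class DisambiguationError(Exception):
    def __init__(self, title, options):
        super().__init__(f"{title!r} may refer to: {options}")
        self.title = title
        self.options = options


class PageError(Exception):
    def __init__(self, pageid):
        super().__init__(f"Page id {pageid!r} does not match any pages.")
        self.pageid = pageid


def fetch_pages(terms, get_page):
    """Look up each term with get_page (e.g. wikipedia.page), logging failures.

    Returns a dict mapping each successfully fetched term to its page.
    Failed terms are logged and skipped instead of aborting the whole run.
    """
    pages = {}
    for term in terms:
        try:
            pages[term] = get_page(term)
        except DisambiguationError as err:
            # The term is ambiguous: log the candidate titles for later review.
            logger.warning("Disambiguation error for %r: %s", term, err.options)
        except PageError:
            # No page matched the term at all.
            logger.warning("Page not found for %r", term)
    return pages


def fake_page(term):
    """Hypothetical lookup used only to exercise fetch_pages."""
    if term == "cold":
        raise DisambiguationError("cold", ["Common cold", "Cold (temperature)"])
    if term == "xyzzy":
        raise PageError("xyzzy")
    return f"<page for {term}>"


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(fetch_pages(["aspirin", "cold", "xyzzy"], fake_page))
```

Catching the two error types separately keeps the log readable: an ambiguous term comes with its candidate titles, while a missing term is flagged on its own.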
INSTRUCTION: Run get_corpus.py from the command line to check that everything works:

python fun_3000/get_corpus.py -d run_1
This should log disambiguation errors, as well as errors for terms that cannot be found in medline.
Verify that get_corpus.py generates the expected files. They should cover a good portion of the terms in the UMLS CSV.
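One way to put a number on "a good portion" is the small check below. It is a sketch under hypothetical assumptions, since the PR does not describe the file layout: the UMLS CSV is assumed to have a header row with a "term" column, and get_corpus.py is assumed to write one <term>.txt file per term into the run directory. The function name corpus_coverage is illustrative.

```python
import csv
from pathlib import Path


def corpus_coverage(umls_csv, run_dir, column="term"):
    """Return (fraction, missing) for UMLS terms that have a file in run_dir.

    Hypothetical layout: the CSV has a header row containing `column`, and
    each term found by get_corpus.py is written to run_dir/<term>.txt.
    """
    with open(umls_csv, newline="", encoding="utf-8") as f:
        # Collect non-empty term names from the CSV.
        terms = [row[column].strip() for row in csv.DictReader(f)
                 if row[column].strip()]
    # File stems in the run directory are taken to be the generated terms.
    generated = {p.stem for p in Path(run_dir).glob("*.txt")}
    missing = [t for t in terms if t not in generated]
    fraction = (len(terms) - len(missing)) / len(terms) if terms else 0.0
    return fraction, missing
```

After a run, calling corpus_coverage with the CSV path and "run_1" gives the share of terms covered plus the list of missing ones, which can be cross-checked against the logged disambiguation and not-found errors.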