ayota / ddl_nlp

Repo for DDL research lab project.

39 retrieve term list (the real deal) #41

Closed dvetal closed 7 years ago

dvetal commented 8 years ago

OVERVIEW: This PR adds logging for disambiguation errors and page errors that occur when terms are looked up as pages in Wikipedia. It also allows get_corpus.py to grab the UMLS terms without having to pass in a filename.

INSTRUCTION: Run the get_corpus.py command from the command line to test that everything is working.

Run python fun_3000/get_corpus.py -d run_1

This should log disambiguation errors, as well as errors for terms that cannot be found in medline.
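
For reviewers, here is a rough sketch of the log-and-skip pattern described above. It is not the code in this PR. The names `fetch_terms`, `fake_fetch`, and the two exception classes are hypothetical stand-ins so the example runs without network access; in the `wikipedia` Python package the corresponding exceptions are `wikipedia.exceptions.DisambiguationError` and `wikipedia.exceptions.PageError`.

```python
import logging

logger = logging.getLogger("get_corpus_sketch")  # hypothetical logger name


class DisambiguationError(Exception):
    """Stand-in for an ambiguous-title error (hypothetical)."""


class PageError(Exception):
    """Stand-in for a page-not-found error (hypothetical)."""


def fetch_terms(terms, fetch_page):
    """Fetch each term, logging and skipping terms that raise a known error.

    Returns (pages, failures): pages maps term -> content,
    failures maps term -> the kind of error that was logged.
    """
    pages, failures = {}, {}
    for term in terms:
        try:
            pages[term] = fetch_page(term)
        except DisambiguationError:
            logger.warning("Disambiguation error for term %r; skipping", term)
            failures[term] = "disambiguation"
        except PageError:
            logger.warning("Page error for term %r; no page found", term)
            failures[term] = "page"
    return pages, failures


def fake_fetch(term):
    # Hypothetical lookup, used only to exercise the sketch offline.
    if term == "mercury":
        raise DisambiguationError(term)
    if term == "zzzz":
        raise PageError(term)
    return "text about " + term


pages, failures = fetch_terms(["aspirin", "mercury", "zzzz"], fake_fetch)
```

Collecting the failures alongside the log lines makes it easy to check during QA which terms were skipped and why.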

Verify that the expected files are generated by get_corpus.py. They should cover a good portion of the terms in the UMLS csv.
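
One possible way to put a number on "a good portion" during QA is sketched below. It assumes the UMLS csv has one term per row in the first column with no header row, and that the retrieved term names are already available as a list; both are assumptions, and `term_coverage` is a hypothetical helper, not part of this repo.

```python
import csv
import io


def term_coverage(umls_csv_file, retrieved_terms, column=0):
    """Return (fraction of csv terms retrieved, sorted missing terms).

    Comparison is case-insensitive and ignores surrounding whitespace.
    """
    reader = csv.reader(umls_csv_file)
    expected = {
        row[column].strip().lower()
        for row in reader
        if len(row) > column and row[column].strip()
    }
    got = {term.strip().lower() for term in retrieved_terms}
    if not expected:
        return 0.0, []
    missing = sorted(expected - got)
    return 1 - len(missing) / len(expected), missing


# Hypothetical sample data standing in for the real UMLS csv.
sample_csv = io.StringIO("Aspirin\nIbuprofen\nMercury\nWarfarin\n")
fraction, missing = term_coverage(sample_csv, ["aspirin", "ibuprofen", "warfarin"])
```

The missing list can then be compared against the logged disambiguation and page errors to confirm that every skipped term is accounted for.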

Waiting for #32 to land before this can be QA'd.