ayota / ddl_nlp

Repo for DDL research lab project.

Closes #48. Fix UnboundLocalError when no medical abstracts are found #49

lauralorenz closed 8 years ago

lauralorenz commented 8 years ago

Closes #48. Move the code that reads the summary variable into the try block where it is assigned, so that summary is never referenced when the assignment fails and we no longer get an UnboundLocalError.
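For anyone who wants to see the failure mode in isolation, here is a minimal sketch of the pattern. It is not the actual code in med_abstract_ingest.py: parse_summary, fetch_buggy, fetch_fixed, and the use of ValueError are all hypothetical stand-ins, chosen only to reproduce the same UnboundLocalError and the shape of the fix.

```python
import logging

logger = logging.getLogger(__name__)


def parse_summary(raw):
    """Hypothetical stand-in for the summary parser; raises when nothing is parseable."""
    if not raw:
        raise ValueError("Document does not have abstract.")
    return list(raw)


def fetch_buggy(raw):
    """Reads summary outside the try block (the pattern behind #48)."""
    abstracts = []
    try:
        summary = parse_summary(raw)
    except ValueError as err:
        logger.info(err)
    # Bug: this loop also runs when parse_summary raised, so summary
    # was never assigned and Python raises UnboundLocalError.
    for item in summary:
        abstracts.append(item)
    return abstracts


def fetch_fixed(raw):
    """Reads summary only inside the try block where it is assigned."""
    abstracts = []
    try:
        summary = parse_summary(raw)
        # Fix: the loop is only reached if the assignment succeeded.
        for item in summary:
            abstracts.append(item)
    except ValueError as err:
        logger.info(err)
    return abstracts
```

With an empty input, fetch_buggy raises UnboundLocalError while fetch_fixed logs the message and returns an empty list, which matches the behaviour shown in the two runs below.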

To test, run against the word Myopathy, which triggers the case where no summary can be parsed from the results of the medical abstract search.

The original bug, i.e. when run against develop:

# if the file bug_test_word.txt contains only the word Myopathy:
(ddl_nlp)ddl_nlp $ python fun_3000/get_corpus.py -s bug_test_word.txt -d bigoltest
2016-08-17 22:32:33,015: INFO : RDFLib Version: 4.2.1
2016-08-17 22:32:33,138: INFO : Fetching wikipedia articles and medical abstracts
2016-08-17 22:32:33,158: INFO : Starting new HTTP connection (1): en.wikipedia.org
2016-08-17 22:32:33,173: INFO : Starting new HTTPS connection (1): en.wikipedia.org
2016-08-17 22:32:33,297: INFO : Retrieving "Myopathy" page from Wikipedia.
2016-08-17 22:32:33,299: INFO : Starting new HTTP connection (1): en.wikipedia.org
2016-08-17 22:32:33,314: INFO : Starting new HTTPS connection (1): en.wikipedia.org
2016-08-17 22:32:33,420: INFO : Saving data to: /Users/llorenz/Development/ddl/ddl_nlp/data/bigoltest/Myopathy.txt
2016-08-17 22:32:33,422: INFO : Starting new HTTP connection (1): en.wikipedia.org
2016-08-17 22:32:33,437: INFO : Starting new HTTPS connection (1): en.wikipedia.org
2016-08-17 22:32:33,545: INFO : Fetched Myopathy term wiki artifacts.
2016-08-17 22:32:34,252: INFO : Document does not have abstract.
Traceback (most recent call last):
  File "fun_3000/get_corpus.py", line 84, in <module>
    fetch_corpus(search_terms, directory, results)
  File "fun_3000/get_corpus.py", line 41, in fetch_corpus
    med_search.get_medical_abstracts(term, data_dir, results)
  File "/Users/llorenz/Development/ddl/ddl_nlp/fun_3000/ingestion/med_abstract_ingest.py", line 172, in get_medical_abstracts
    abstracts_pubmed = fetch_pubmed(search_term, results)
  File "/Users/llorenz/Development/ddl/ddl_nlp/fun_3000/ingestion/med_abstract_ingest.py", line 64, in fetch_pubmed
    for item in summary:
UnboundLocalError: local variable 'summary' referenced before assignment

Fixed in this PR:

# if the file bug_test_word.txt contains only the word Myopathy:
(ddl_nlp)ddl_nlp $ python fun_3000/get_corpus.py -s bug_test_word.txt -d bigoltest
2016-08-17 22:32:20,604: INFO : RDFLib Version: 4.2.1
2016-08-17 22:32:20,752: INFO : Fetching wikipedia articles and medical abstracts
2016-08-17 22:32:20,792: INFO : Starting new HTTP connection (1): en.wikipedia.org
2016-08-17 22:32:21,230: INFO : Starting new HTTPS connection (1): en.wikipedia.org
2016-08-17 22:32:21,357: INFO : Retrieving "Myopathy" page from Wikipedia.
2016-08-17 22:32:21,359: INFO : Starting new HTTP connection (1): en.wikipedia.org
2016-08-17 22:32:21,373: INFO : Starting new HTTPS connection (1): en.wikipedia.org
2016-08-17 22:32:21,475: INFO : Saving data to: /Users/llorenz/Development/ddl/ddl_nlp/data/bigoltest/Myopathy.txt
2016-08-17 22:32:21,478: INFO : Starting new HTTP connection (1): en.wikipedia.org
2016-08-17 22:32:21,492: INFO : Starting new HTTPS connection (1): en.wikipedia.org
2016-08-17 22:32:21,595: INFO : Fetched Myopathy term wiki artifacts.
2016-08-17 22:32:22,503: INFO : Document does not have abstract.
2016-08-17 22:32:24,347: INFO : Fetched Myopathy term medical abstract artifacts.
2016-08-17 22:32:24,347: INFO : Fetching books
[ .. and so on .. ]