fhamborg / Giveme5W1H

Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did what, when, where, why, and how?
Apache License 2.0

Issues with MasterExtractor() #54

Open johanneskruse opened 4 years ago

johanneskruse commented 4 years ago

Hi,

I am trying to get Giveme5W1H up and running, but when calling MasterExtractor() in Python I get the following:

ConfigurationError: Using Nominatim with default or sample user_agent "geopy/2.0.0" is strongly discouraged, as it violates Nominatim's ToS https://operations.osmfoundation.org/policies/nominatim/ and may possibly cause 403 and 429 HTTP errors. Please specify a custom user_agent with Nominatim(user_agent="my-application") or by overriding the default user_agent: geopy.geocoders.options.default_user_agent = "my-application".

I'm not sure what is wrong, but I assume the issue might be caused by the CoreNLP server. I ran:

$ giveme5w1h-corenlp

and got:

[main] INFO CoreNLP - --- StanfordCoreNLPServer#main() called ---
[main] INFO CoreNLP - setting default constituency parser
[main] INFO CoreNLP - using SR parser: edu/stanford/nlp/models/srparser/englishSR.ser.gz
[main] INFO CoreNLP - Threads: 12
[main] INFO CoreNLP - Starting server...

The README said it could take a few minutes, but I waited a long time and nothing ever happened. Has anyone had the same issue?
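
A log that ends at "Starting server..." does not by itself indicate a failure, since a server process typically prints nothing further while it waits for requests. One way to tell whether it is up is to check that its port accepts connections. A minimal sketch, assuming the server runs on localhost and listens on CoreNLP's default port 9000 (adjust both if your setup differs); wait_for_port is a hypothetical helper, not part of Giveme5W1H:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 300.0,
                  interval: float = 2.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example call (assumed host and port):
# wait_for_port("localhost", 9000, timeout=300.0)
```

If the call returns True, the server is reachable and the extractor can be started in a second terminal.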

Shawn617 commented 4 years ago

Same here

MarwaEssam commented 4 years ago

Same here

MarwaEssam commented 4 years ago

I solved it by adding the following line in environment_extractor.py: geopy.geocoders.options.default_user_agent = "XYZ-application"
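
The error message itself names this override. If you would rather not edit the installed environment_extractor.py, the same setting can presumably be applied from your own script before the extractor is constructed. A sketch, assuming geopy is installed and that the default is read when the geocoder is created; the agent string and the helper name set_geopy_user_agent are placeholders:

```python
import importlib.util


def set_geopy_user_agent(user_agent: str) -> bool:
    """Set geopy's process-wide default user agent.

    Returns False (and changes nothing) if geopy is not installed.
    """
    if importlib.util.find_spec("geopy") is None:
        return False
    import geopy.geocoders

    geopy.geocoders.options.default_user_agent = user_agent
    return True


# Call this before creating MasterExtractor(), so that the Nominatim
# geocoder it builds picks up the custom agent (assumed ordering).
set_geopy_user_agent("my-5w1h-application")  # placeholder name
```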

However, I now get this error: raise LookupError(resource_not_found) LookupError: Resource wordnet not found.
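
A LookupError of this kind usually means that the corresponding NLTK data package has not been downloaded yet. A minimal sketch, assuming NLTK is installed and network access is available; ensure_wordnet is a hypothetical helper:

```python
import importlib.util


def ensure_wordnet() -> bool:
    """Download the WordNet corpus via NLTK if it is not already present."""
    if importlib.util.find_spec("nltk") is None:
        return False  # NLTK itself is missing
    import nltk

    try:
        nltk.data.find("corpora/wordnet")  # raises LookupError if absent
        return True
    except LookupError:
        # nltk.download returns True on success, False otherwise.
        return bool(nltk.download("wordnet"))
```

Calling ensure_wordnet() once before running the extractor should be enough; the download is stored in NLTK's data directory and reused afterwards.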

johanneskruse commented 4 years ago

> I solved it by adding the following line in environment_extractor.py: geopy.geocoders.options.default_user_agent = "XYZ-application"
>
> Yet I got this error now : raise LookupError(resource_not_found) LookupError: Resource wordnet not found.

This fixed my first issue. After downgrading Java to version 8 (#32), I now get the following errors:

Exception in thread Thread-7:
Traceback (most recent call last):
  File "/Users/johanneskruse/opt/anaconda3/envs/nlp_infosys/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/Users/johanneskruse/opt/anaconda3/envs/nlp_infosys/lib/python3.7/site-packages/Giveme5W1H/extractor/extractor.py", line 20, in run
    extractor.process(document)
  File "/Users/johanneskruse/opt/anaconda3/envs/nlp_infosys/lib/python3.7/site-packages/Giveme5W1H/extractor/extractors/abs_extractor.py", line 40, in process
    self._extract_candidates(document)
  File "/Users/johanneskruse/opt/anaconda3/envs/nlp_infosys/lib/python3.7/site-packages/Giveme5W1H/extractor/extractors/cause_extractor.py", line 92, in _extract_candidates
    for candidate in self._evaluate_tree(tree):
  File "/Users/johanneskruse/opt/anaconda3/envs/nlp_infosys/lib/python3.7/site-packages/Giveme5W1H/extractor/extractors/cause_extractor.py", line 131, in _evaluate_tree
    if sibling.label() == 'VP' and "('NP'" in sibling.unicode_repr():
AttributeError: 'ParentedTree' object has no attribute 'unicode_repr'

Exception in thread Thread-5:
Traceback (most recent call last):
  File "/Users/johanneskruse/opt/anaconda3/envs/nlp_infosys/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/Users/johanneskruse/opt/anaconda3/envs/nlp_infosys/lib/python3.7/site-packages/Giveme5W1H/extractor/extractor.py", line 20, in run
    extractor.process(document)
  File "/Users/johanneskruse/opt/anaconda3/envs/nlp_infosys/lib/python3.7/site-packages/Giveme5W1H/extractor/extractors/abs_extractor.py", line 40, in process
    self._extract_candidates(document)
  File "/Users/johanneskruse/opt/anaconda3/envs/nlp_infosys/lib/python3.7/site-packages/Giveme5W1H/extractor/extractors/environment_extractor.py", line 153, in _extract_candidates
    self._cache_nominatim.cache(location_string, location)
  File "/Users/johanneskruse/opt/anaconda3/envs/nlp_infosys/lib/python3.7/site-packages/Giveme5W1H/extractor/tools/key_value_cache.py", line 58, in cache
    self.persist()
  File "/Users/johanneskruse/opt/anaconda3/envs/nlp_infosys/lib/python3.7/site-packages/Giveme5W1H/extractor/tools/key_value_cache.py", line 43, in persist
    with open(self._cache_path, 'wb') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/johanneskruse/opt/anaconda3/envs/nlp_infosys/lib/python3.7/site-packages/Giveme5W1H/examples/caches/Nominatim.prickle'

It seems that I am missing the caches and that something is wrong with ParentedTree?
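
On the second traceback: open(path, 'wb') creates the file when it does not exist, so a FileNotFoundError at that point generally means the parent directory (here the caches folder) is missing. One possible workaround is to create that directory before the cache is written. A sketch of the idea with a stand-in function; persist_cache is hypothetical and is not the library's code:

```python
import os
import pickle


def persist_cache(cache_path: str, data: dict) -> None:
    """Pickle data to cache_path, creating parent directories first."""
    parent = os.path.dirname(cache_path)
    if parent:
        os.makedirs(parent, exist_ok=True)  # no error if it already exists
    with open(cache_path, "wb") as f:
        pickle.dump(data, f)
```

For an installed package, creating the directory named in the error message by hand should have the same effect.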

TitasDas commented 4 years ago

For the ParentedTree issue, see #47. You need to replace unicode_repr() with __repr__(). I thought this was fixed.
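
If the same code has to run against both older and newer NLTK versions, one option is a small compatibility helper instead of a direct replacement. A sketch; tree_repr is a hypothetical name, and the two stand-in classes only imitate the tree variants so that the example runs without NLTK:

```python
def tree_repr(node) -> str:
    """Use unicode_repr() where it exists, else fall back to repr()."""
    legacy = getattr(node, "unicode_repr", None)
    return legacy() if callable(legacy) else repr(node)


class OldStyleTree:
    """Stand-in for a tree class that still offers unicode_repr()."""

    def unicode_repr(self) -> str:
        return "Tree('VP', [Tree('NP', ['it'])])"


class NewStyleTree:
    """Stand-in for a tree class that only implements __repr__()."""

    def __repr__(self) -> str:
        return "Tree('VP', [Tree('NP', ['it'])])"


# The membership test from cause_extractor.py works with either variant.
for sibling in (OldStyleTree(), NewStyleTree()):
    print("('NP'" in tree_repr(sibling))
```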

For the issue of Nominatim.prickle not being found, a workaround is to go into the caches folder at the location given in the error and rename the cache file there so that its name is exactly "Nominatim.prickle", without the "_".

TitasDas commented 3 years ago

Hi @fhamborg ,

Came back to this beauty to fix some of the issues posted.

One of the first things I noticed was that, although sibling.unicode_repr() was changed to sibling.__repr__() through #48 (in the condition sibling.label() == 'VP' and "('NP'" in sibling.unicode_repr() on line 131 of cause_extractor.py), the change is not reflected in the pip installation. So when I try to run an example like

python3 parse_single_from_code.py

it still results in the attribute error:

AttributeError: 'ParentedTree' object has no attribute 'unicode_repr'

That is why people are still experiencing this issue (#63) even though this part of the code was updated.
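
To check whether a given installation still contains the old call, the installed source file can be scanned directly. A sketch; uses_legacy_repr is a hypothetical helper, and the path to pass in is the cause_extractor.py location shown in your own traceback:

```python
from pathlib import Path


def uses_legacy_repr(source_file: str) -> bool:
    """Return True if the file still contains a call to unicode_repr()."""
    text = Path(source_file).read_text(encoding="utf-8", errors="replace")
    return ".unicode_repr(" in text
```

If this still returns True after upgrading, the environment is probably running an older release of the package.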

fhamborg commented 3 years ago

Oh, my bad! I just uploaded the master branch as a new version to PyPI. Could you check whether this fixes #63?

TitasDas commented 3 years ago

> Could you check whether this fixes #63 ?

Yup, it fixes #63. Thank you for the upload.