alvations / pywsd

Python Implementations of Word Sense Disambiguation (WSD) Technologies.
MIT License

disambiguate bug #58

Open r-reilly opened 5 years ago

r-reilly commented 5 years ago

from pywsd.allwords_wsd import disambiguate
disambiguate('I have five lights')

Traceback (most recent call last):
  File "/Users/rreilly/anaconda3/envs/-/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'light.n.04'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 1, in
  File "/Users/rreilly/anaconda3/envs/-/lib/python3.7/site-packages/pywsd/allwords_wsd.py", line 51, in disambiguate
    from_cache=from_cache)
  File "/Users/rreilly/anaconda3/envs/-/lib/python3.7/site-packages/pywsd/lesk.py", line 251, in simple_lesk
    from_cache=from_cache)
  File "/Users/rreilly/anaconda3/envs/-/lib/python3.7/site-packages/pywsd/lesk.py", line 226, in simple_signatures
    from_cache=from_cache)
  File "/Users/rreilly/anaconda3/envs/-/lib/python3.7/site-packages/pywsd/lesk.py", line 123, in signatures
    from_cache=from_cache)
  File "/Users/rreilly/anaconda3/envs/-/lib/python3.7/site-packages/pywsd/lesk.py", line 48, in synset_signatures
    return synset_signatures_from_cache(ss, hyperhypo, adapted, original_lesk)
  File "/Users/rreilly/anaconda3/envs/-/lib/python3.7/site-packages/pywsd/lesk.py", line 35, in synset_signatures_from_cache
    return cached_signatures[ss.name()][signature_type]
  File "/Users/rreilly/anaconda3/envs/-/lib/python3.7/site-packages/pandas/core/frame.py", line 2980, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/Users/rreilly/anaconda3/envs/-/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2899, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'light.n.04'
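
The last pywsd frame in the traceback is the lookup `cached_signatures[ss.name()][signature_type]`, which raises a KeyError when the synset name is missing from the cached table. Below is a minimal sketch of a more defensive lookup, assuming a plain nested dict stands in for the cache. The names `lookup_signature` and `recompute` are hypothetical and are not part of pywsd; this only illustrates the failure mode and one possible fallback.

```python
from typing import Callable, Dict, Set

# Stand-in for the cached signatures table:
#   synset name -> signature type -> set of signature words.
# The traceback shows the real cache is indexed through pandas;
# a nested dict is assumed here to keep the sketch self-contained.
Cache = Dict[str, Dict[str, Set[str]]]


def lookup_signature(
    cache: Cache,
    synset_name: str,
    signature_type: str,
    compute_fallback: Callable[[str], Set[str]],
) -> Set[str]:
    """Return the cached signature, or recompute it if the key is missing."""
    try:
        return cache[synset_name][signature_type]
    except KeyError:
        # The cache was built against a different WordNet numbering,
        # so fall back to computing the signature directly.
        return compute_fallback(synset_name)


# Toy cache that knows 'light.n.01' but not 'light.n.04'.
cache: Cache = {"light.n.01": {"simple": {"illumination", "lamp"}}}


def recompute(name: str) -> Set[str]:
    # Hypothetical placeholder for a non-cached signature computation.
    return {"recomputed", name}


hit = lookup_signature(cache, "light.n.01", "simple", recompute)
miss = lookup_signature(cache, "light.n.04", "simple", recompute)
```

With the toy cache, the first call is served from the cache and the second goes through the fallback instead of raising.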

alvations commented 5 years ago

@r-reilly it seems the synset signatures went out of sync when the wn package was updated. Could you try upgrading with pip install -U pywsd and then try again? It should work now =)

r-reilly commented 5 years ago

A similar problem with disambiguate('I am going to run for president')

KeyError: 'president.n.06'

alvations commented 5 years ago

Try a hard reinstall:

python3 -m pip install --upgrade --user --force-reinstall pywsd

You should see this:

>>> import pywsd
>>> pywsd.__version__
'1.2.3'

>>> from pywsd import disambiguate
Warming up PyWSD (takes ~10 secs)... took 8.754118204116821 secs.
>>> disambiguate('I am going to run for president')
[('I', None), ('am', None), ('going', Synset('travel.v.01')), ('to', None), ('run', Synset('run.v.34')), ('for', None), ('president', Synset('president_of_the_united_states.n.02'))]
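
Since the fix here depends on which releases actually ended up installed, it can help to print the versions pip resolved before retrying. This is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); `installed_version` is a hypothetical helper name, and the package list is only the set of names mentioned in this thread.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Packages that appear in this thread; adjust as needed.
    for name in ("pywsd", "wn", "nltk", "pandas"):
        print(name, installed_version(name))
```

Comparing this output before and after the reinstall shows whether the upgrade actually took effect in the environment you are running from.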

r-reilly commented 5 years ago

Is there a way to upgrade the spaCy dependency? It changes the results.

alvations commented 5 years ago

Currently pywsd has no dependency on spacy.

Do you mean to use spaCy as the POS tagger?

r-reilly commented 5 years ago

Interesting, never mind! Somehow my spaCy version changed during an install and I thought it came from this package.

hoperiver commented 3 years ago

I got this:

from pywsd.lesk import simple_lesk
Traceback (most recent call last):
  File "", line 1, in
  File "E:\GitHub\alvations-pywsd\pywsd\__init__.py", line 14, in
    from wn import WordNet
ImportError: cannot import name 'WordNet' from 'wn' (C:\Users\Xilan\AppData\Roaming\Python\Python39\site-packages\wn\__init__.py)

This was after I reinstalled pywsd. It seems I have gotten the wrong wn module. Any advice? Thx!

E:\GitHub\alvations-pywsd>python39 -m pip install --upgrade --user --force-reinstall pywsd
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Collecting pywsd
  Using cached pywsd-1.2.4-py3-none-any.whl
Collecting wn
  Using cached https://mirrors.aliyun.com/pypi/packages/00/d0/517f29f0ead1635cf9a79e2c7d49302c33c24bc7196f1c27fe1368aa6d8d/wn-0.5.1-py3-none-any.whl (44 kB)
Collecting nltk
  Using cached nltk-3.5-py3-none-any.whl
Collecting numpy
  Downloading https://mirrors.aliyun.com/pypi/packages/ab/bb/695066483b2329d0cfa3658cad0b1c007539d5247c054033a171b03cefa0/numpy-1.20.1-cp39-cp39-win_amd64.whl (13.7 MB)
     |████████████████████████████████| 13.7 MB 3.3 MB/s
Collecting six
  Using cached https://mirrors.aliyun.com/pypi/packages/ee/ff/48bde5c0f013094d729fe4b0316ba2a24774b3ff1c52d924a8a4cb04078a/six-1.15.0-py2.py3-none-any.whl (10 kB)
Collecting pandas
  Using cached https://mirrors.aliyun.com/pypi/packages/a1/04/0446c4d78d6eafd68675cc7d77fb16591c954003c5b456e08dd167ce37eb/pandas-1.2.1-cp39-cp39-win_amd64.whl (9.3 MB)
Collecting click
  Using cached https://mirrors.aliyun.com/pypi/packages/d2/3d/fa76db83bf75c4f8d338c2fd15c8d33fdd7ad23a9b5e57eb6c5de26b430e/click-7.1.2-py2.py3-none-any.whl (82 kB)
Collecting joblib
  Using cached https://mirrors.aliyun.com/pypi/packages/34/5b/bd0f0fb5564183884d8e35b81d06d7ec06a20d1a0c8b4c407f1554691dce/joblib-1.0.0-py3-none-any.whl (302 kB)
Collecting tqdm
  Using cached https://mirrors.aliyun.com/pypi/packages/80/02/8f8880a4fd6625461833abcf679d4c12a44c76f9925f92bf212bb6cefaad/tqdm-4.56.0-py2.py3-none-any.whl (72 kB)
Collecting regex
  Using cached https://mirrors.aliyun.com/pypi/packages/01/05/bf78fd05dfa7e2c007397b2f3f449ff22b5db0fe690b33d90ba6e37cd6bb/regex-2020.11.13-cp39-cp39-win_amd64.whl (270 kB)
Collecting python-dateutil>=2.7.3
  Using cached https://mirrors.aliyun.com/pypi/packages/d4/70/d60450c3dd48ef87586924207ae8907090de0b306af2bce5d134d78615cb/python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB)
Collecting pytz>=2017.3
  Using cached https://mirrors.aliyun.com/pypi/packages/70/94/784178ca5dd892a98f113cdd923372024dc04b8d40abe77ca76b5fb90ca6/pytz-2021.1-py2.py3-none-any.whl (510 kB)
Collecting toml~=0.10
  Using cached https://mirrors.aliyun.com/pypi/packages/44/6f/7120676b6d73228c96e17f1f794d8ab046fc910d781c8d151120c3f1569e/toml-0.10.2-py2.py3-none-any.whl (16 kB)
Collecting requests~=2.25
  Using cached https://mirrors.aliyun.com/pypi/packages/29/c1/24814557f1d22c56d50280771a17307e6bf87b70727d975fd6b2ce6b014a/requests-2.25.1-py2.py3-none-any.whl (61 kB)
Collecting urllib3<1.27,>=1.21.1
  Using cached https://mirrors.aliyun.com/pypi/packages/23/fc/8a49991f7905261f9ca9df5aa9b58363c3c821ce3e7f671895442b7100f2/urllib3-1.26.3-py2.py3-none-any.whl (137 kB)
Collecting certifi>=2017.4.17
  Using cached https://mirrors.aliyun.com/pypi/packages/5e/a0/5f06e1e1d463903cf0c0eebeb751791119ed7a4b3737fdc9a77f1cdfb51f/certifi-2020.12.5-py2.py3-none-any.whl (147 kB)
Collecting idna<3,>=2.5
  Using cached https://mirrors.aliyun.com/pypi/packages/a2/38/928ddce2273eaa564f6f50de919327bf3a00f091b5baba8dfa9460f3a8a8/idna-2.10-py2.py3-none-any.whl (58 kB)
Collecting chardet<5,>=3.0.2
  Using cached https://mirrors.aliyun.com/pypi/packages/19/c7/fa589626997dd07bd87d9269342ccb74b1720384a4d739a1872bd84fbe68/chardet-4.0.0-py2.py3-none-any.whl (178 kB)
Installing collected packages: urllib3, six, idna, chardet, certifi, tqdm, toml, requests, regex, pytz, python-dateutil, numpy, joblib, click, wn, pandas, nltk, pywsd
  Attempting uninstall: toml
    Found existing installation: toml 0.10.2
    Uninstalling toml-0.10.2:
      Successfully uninstalled toml-0.10.2
Successfully installed certifi-2020.12.5 chardet-4.0.0 click-7.1.2 idna-2.10 joblib-1.0.0 nltk-3.5 numpy-1.20.1 pandas-1.2.1 python-dateutil-2.8.1 pytz-2021.1 pywsd-1.2.4 regex-2020.11.13 requests-2.25.1 six-1.15.0 toml-0.10.2 tqdm-4.56.0 urllib3-1.26.3 wn-0.5.1
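
The log shows pip resolving wn-0.5.1, and the ImportError says that release does not expose the `WordNet` name pywsd imports. A quick way to confirm which case an environment is in is to probe the installed module directly. The sketch below uses only the standard library; `module_exports` is a hypothetical helper name, not part of pywsd or wn.

```python
import importlib


def module_exports(module_name: str, attr: str) -> bool:
    """Return True if the module imports cleanly and has the given attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module is not installed (or fails to import) in this environment.
        return False
    return hasattr(module, attr)


if __name__ == "__main__":
    # pywsd's __init__.py does `from wn import WordNet` (see the traceback),
    # so this should print True in a working setup.
    print(module_exports("wn", "WordNet"))
```

If it prints False, one thing to try is installing an older wn release that still provides that name. Which release that is should be checked against the pywsd README rather than taken from here.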