cfiltnlp / pyiwn

A Python based API to access Indian language WordNets.
http://www.cfilt.iitb.ac.in/
Creative Commons Attribution Share Alike 4.0 International

Problem accessing Kannada-language synsets #3

Closed. hlsrekha closed this issue 4 years ago.

hlsrekha commented 6 years ago

I'm facing a problem accessing Kannada-language synsets:

```python
from pyiwn import pyiwn
iwn = pyiwn.IndoWordNet('kannada')
```

For some words, I'm able to get the synsets:

```python
print(iwn.synsets('ಗಂಡಸು'))
# output: [Synset('ಮಾನವ.None.858')]
```

For the words ಮನೆ, ಮಾನವ and ಗುಡುಗುಡು:

```python
print(iwn.synsets('ಮನೆ'))
print(iwn.synsets('ಮಾನವ'))
print(iwn.synsets('ಗುಡುಗುಡು'))
```

I'm getting the following error:

```
  File "<pyshell#11>", line 1, in <module>
    print(iwn.synsets('ಮನೆ'))
  File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\pyiwn\pyiwn.py", line 61, in synsets
    pos = sp[2] if pos == None else pos
IndexError: list index out of range
```

I get the same error for all three words above, even though they are present in the all.kannada file. Could you please help me resolve this issue?
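A note on why the error points at sp[2] rather than at a missing word: the comparison word == sp[1] has already succeeded when pos = sp[2] runs, so the IndexError means the line that matches the word splits into exactly two fields, with no third (POS) field. In other words, the error is raised precisely because the word is present in the words file. The sketch below reproduces this under stated assumptions: the tab-separated layout (synset_id, word, pos), the clean_line stand-in, and the sample lines and ids are all hypothetical, not pyiwn's actual data format or code.

```python
# Hypothetical stand-in for utils.clean_line: assumes tab-separated fields.
def clean_line(line):
    return line.strip().split("\t")


def first_part(lines, word, pos=None):
    """Mirror of the quoted 'first part' loop, with no guard on field count."""
    synset_id = None
    for line in lines:
        sp = clean_line(line)
        if word == sp[1]:
            synset_id = sp[0]
            # Raises IndexError when the matching line has only two fields.
            pos = sp[2] if pos is None else pos
            break
    return synset_id, pos


# Hypothetical words-file lines; the second one lacks a POS column.
words = [
    "101\tಗಂಡಸು\tnoun",
    "102\tಮನೆ",
]

print(first_part(words, "ಗಂಡಸು"))  # ('101', 'noun')
try:
    first_part(words, "ಮನೆ")
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```

A word that matches a well-formed line returns normally, while a word that matches a two-field line fails on exactly the statement shown in the traceback.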

Thanks and regards,
Shashirekha

riteshpanjwani commented 6 years ago

@hlsrekha: We are looking into this. We will try to resolve this issue as soon as possible. Sorry for the inconvenience caused.

hlsrekha commented 6 years ago

@riteshpanjwani:

I analyzed pyiwn.py after getting the same error

```
  File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\pyiwn\pyiwn.py", line 61, in synsets
    pos = sp[2] if pos == None else pos
IndexError: list index out of range
```

for the following statements:

```python
print(iwn.synsets('ಮನೆ'))
print(iwn.synsets('ಮಾನವ'))
print(iwn.synsets('ಗುಡುಗುಡು'))
```

Here is the definition of synsets in pyiwn.py:

```python
def synsets(self, word, pos=None):
    synsets = []

    # First part
    words_file_name = 'all.{}'.format(self._lang) if pos == None else '{}.{}'.format(pos, self._lang)
    with utils.read_file('{}/words/{}'.format(home, words_file_name)) as fo:
        for line in fo:
            sp = utils.clean_line(line)
            if word == sp[1]:
                synset_id = sp[0]
                pos = sp[2] if pos == None else pos
                break

    # Second part
    synset_file_name = 'all.{}'.format(self._lang) if pos == None else '{}.{}'.format(pos, self._lang)
    with utils.read_file('{}/synsets/{}'.format(home, synset_file_name)) as fo:
        for line in fo:
            sp = utils.clean_line(line)
            synset_data = utils.synset_data(sp, pos)
            if word in synset_data[2]:
                synset_id, head_word, lemma_names, pos, gloss, examples = synset_data[0], synset_data[1], synset_data[2], synset_data[3], synset_data[4], synset_data[5]
                synsets.append(Synset(synset_id, head_word, lemma_names, pos, gloss, examples))
    return synsets
```

The first part of the code is redundant. At most, it yields synset_id and pos, but these are overwritten in the second part. The synset information is obtained from the second part alone: even though the first part reads the words from \words\all.kannada (in my case), they are not used further. So I removed the first part, and now it works: I no longer get the errors mentioned earlier for words present in the file.

Regards,
Shashirekha

riteshpanjwani commented 4 years ago

Hi Shashirekha,

I have completely revamped the inner workings of the library and fixed these issues. I recommend a full clean reinstall:

```
pip uninstall pyiwn
pip install --upgrade pyiwn
```
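After a clean reinstall, one way to confirm which pyiwn release Python actually picks up is the standard library's importlib.metadata. This is a sketch, not part of pyiwn: installed_version is a hypothetical helper name, and importlib.metadata requires Python 3.8 or later, so it is not available on the Python 3.5 installation shown in the traceback.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("pyiwn")
if version is None:
    print("pyiwn is not installed")
else:
    print("pyiwn", version)
```

If the printed version does not change after upgrading, pip and the interpreter may be pointing at different environments; running `python -m pip install --upgrade pyiwn` with the same interpreter avoids that.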

Then follow the steps in this example notebook: https://github.com/riteshpanjwani/pyiwn/blob/master/examples/example.ipynb

Regards,
Ritesh