cfiltnlp / pyiwn

A Python-based API to access Indian language WordNets.
http://www.cfilt.iitb.ac.in/
Creative Commons Attribution Share Alike 4.0 International

Want to explore IndoWordNet #22

Closed by Tawseef-Mir 3 years ago

Tawseef-Mir commented 3 years ago

2021-02-14:11:58:58,246 INFO [iwn.py:43] Loading kashmiri language synsets...

UnicodeDecodeError                        Traceback (most recent call last)
in
----> 1 iwn = pyiwn.IndoWordNet(lang=pyiwn.Language.KASHMIRI)

~\anaconda3\lib\site-packages\pyiwn\iwn.py in __init__(self, lang)
     43         logger.info(f'Loading {lang.value} language synsets...')
     44         self._synset_idx_map = {}
---> 45         self._synset_df = self._load_synset_file(lang.value)
     46         self._synset_relations_dict = self._load_synset_relations()
     47

~\anaconda3\lib\site-packages\pyiwn\iwn.py in _load_synset_file(self, lang)
     49         filename = os.path.join(*[constants.IWN_DATA_PATH, 'synsets', 'all.{}'.format(lang)])
     50         f = open(filename)
---> 51         synsets = list(map(lambda line: self._load_synset(line), f.readlines()))
     52         synset_df = pd.DataFrame(synsets, columns=['synset_id', 'synsets', 'pos'])
     53         synset_df = synset_df.dropna()

~\anaconda3\lib\encodings\cp1252.py in decode(self, input, final)
     21 class IncrementalDecoder(codecs.IncrementalDecoder):
     22     def decode(self, input, final=False):
---> 23         return codecs.charmap_decode(input,self.errors,decoding_table)[0]
     24
     25 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 5: character maps to <undefined>
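The traceback points at `f = open(filename)` in `_load_synset_file`, which is called without an `encoding` argument. On Windows, Python then falls back to the locale codec (here `cp1252`), in which byte `0x90` has no mapping. The sketch below reproduces that failure on a throwaway file and shows that reading the same bytes as UTF-8 works. It assumes the pyiwn data files are UTF-8 encoded, which the traceback suggests but does not confirm. The sample line and the file name `all.sample` are hypothetical and do not reflect the real synset file format.

```python
import os
import tempfile

# Hypothetical sample line (not the real pyiwn synset format).
# U+0910 (a Devanagari letter) is encoded in UTF-8 as the bytes E0 A4 90,
# so the file contains the byte 0x90 reported in the traceback.
sample = "1\t\u0910\tnoun\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "all.sample")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)

    # Reading with cp1252 mimics open(filename) on a Windows machine
    # whose locale encoding is cp1252: byte 0x90 cannot be decoded.
    cp1252_error = None
    try:
        with open(path, encoding="cp1252") as f:
            f.readlines()
    except UnicodeDecodeError as exc:
        cp1252_error = exc

    # Passing the encoding explicitly decodes the same bytes correctly.
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()

print("cp1252 read failed:", cp1252_error is not None)
print("utf-8 read returned", len(lines), "line(s)")
```

If that assumption holds, changing the call in the installed `iwn.py` to `open(filename, encoding='utf-8')` should avoid the error. A workaround that leaves the package untouched is to start Python in UTF-8 mode (Python 3.7 and later), for example by setting the environment variable `PYTHONUTF8=1` before launching the interpreter or notebook.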