cfiltnlp / pyiwn

A Python based API to access Indian language WordNets.
http://www.cfilt.iitb.ac.in/
Creative Commons Attribution Share Alike 4.0 International

'charmap' codec can't decode byte 0x8d in position 13: character maps to <undefined> #27

Open gokul427 opened 1 year ago

gokul427 commented 1 year ago

iwn = pyiwn.IndoWordNet()

2022-11-06:13:21:14,789 INFO [iwn.py:43] Loading hindi language synsets...

UnicodeDecodeError                        Traceback (most recent call last)
Cell In [5], line 2
      1 # language defaults to Hindi
----> 2 iwn = pyiwn.IndoWordNet()

File ~\anaconda3\envs\py38torch\lib\site-packages\pyiwn\iwn.py:45, in IndoWordNet.__init__(self, lang)
     43 logger.info(f'Loading {lang.value} language synsets...')
     44 self._synset_idx_map = {}
---> 45 self._synset_df = self._load_synset_file(lang.value)
     46 self._synset_relations_dict = self._load_synset_relations()

File ~\anaconda3\envs\py38torch\lib\site-packages\pyiwn\iwn.py:51, in IndoWordNet._load_synset_file(self, lang)
     49 filename = os.path.join(*[constants.IWN_DATA_PATH, 'synsets', 'all.{}'.format(lang)])
     50 f = open(filename)
---> 51 synsets = list(map(lambda line: self._load_synset(line), f.readlines()))
     52 synset_df = pd.DataFrame(synsets, columns=['synset_id', 'synsets', 'pos'])
     53 synset_df = synset_df.dropna()

File ~\anaconda3\envs\py38torch\lib\encodings\cp1252.py:23, in IncrementalDecoder.decode(self, input, final)
     22 def decode(self, input, final=False):
---> 23     return codecs.charmap_decode(input,self.errors,decoding_table)[0]

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 13: character maps to <undefined>

SrikantShubam commented 1 year ago

On line 50 of iwn.py, change f = open(filename) to f = open(filename, encoding='utf-8').
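For anyone wondering why this helps: the traceback goes through cp1252.py, which suggests open() fell back to the Windows default code page, and byte 0x8d has no mapping in cp1252. Below is a minimal sketch that reproduces the same kind of failure without pyiwn. The sample word and the file name sample.txt are made up for illustration, and it assumes the synset files are UTF-8 encoded, which the thread does not state explicitly.

```python
import os
import tempfile

# Illustrative Devanagari word (not taken from the pyiwn data files).
# It contains U+094D (virama), which UTF-8 encodes as the bytes E0 A5 8D.
text = "\u0939\u093f\u0928\u094d\u0926\u0940"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical file name

    # Write the text as UTF-8 (assumed encoding of the real data files).
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Reading with cp1252 hits byte 0x8d, which that code page leaves undefined.
    try:
        with open(path, encoding="cp1252") as f:
            f.read()
        cp1252_failed = False
    except UnicodeDecodeError as exc:
        cp1252_failed = True
        print("cp1252 failed:", exc)

    # Reading with an explicit utf-8 encoding decodes the same bytes cleanly.
    with open(path, encoding="utf-8") as f:
        decoded = f.read()

print("utf-8 round trip ok:", decoded == text)
```

Passing encoding explicitly makes the read independent of the machine's locale, which is why the same code tends to work on Linux or macOS and fail on Windows.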

FarheenD commented 1 year ago

Hi, I am getting the same issue, is there a way to resolve it?

SrikantShubam commented 1 year ago

Open iwn.py in the installed pyiwn package and change line 50 from f = open(filename) to f = open(filename, encoding='utf-8').

bvgaurav4 commented 3 months ago

It's not working. How can I solve this?

UnicodeDecodeError                        Traceback (most recent call last)
Cell In[4], line 3
      1 import pyiwn
      2 pyiwn.download()
----> 3 iwn = pyiwn.IndoWordNet()

File c:\g\pproject\aimbot\bot\bot\lib\site-packages\pyiwn\iwn.py:45, in IndoWordNet.__init__(self, lang)
     43 logger.info(f'Loading {lang.value} language synsets...')
     44 self._synset_idx_map = {}
---> 45 self._synset_df = self._load_synset_file(lang.value)
     46 self._synset_relations_dict = self._load_synset_relations()

File c:\g\pproject\aimbot\bot\bot\lib\site-packages\pyiwn\iwn.py:51, in IndoWordNet._load_synset_file(self, lang)
     49 filename = os.path.join(*[constants.IWN_DATA_PATH, 'synsets', 'all.{}'.format(lang)])
     50 f = open(filename)
---> 51 synsets = list(map(lambda line: self._load_synset(line), f.readlines()))
     52 synset_df = pd.DataFrame(synsets, columns=['synset_id', 'synsets', 'pos'])
     53 synset_df = synset_df.dropna()

File ~\anaconda3\lib\encodings\cp1252.py:23, in IncrementalDecoder.decode(self, input, final)
     22 def decode(self, input, final=False):
---> 23     return codecs.charmap_decode(input,self.errors,decoding_table)[0]

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 13: character maps to <undefined>

This is my error message. Can you please help me resolve this?

bvgaurav4 commented 3 months ago

I also changed line 50 to use the utf-8 encoding.

bvgaurav4 commented 3 months ago

Sorry, it worked. I forgot to restart my environment.
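If editing files under site-packages is not an option, Python's UTF-8 mode (the -X utf8 flag or the PYTHONUTF8=1 environment variable, available since Python 3.7) might be worth trying, since it makes open() default to UTF-8 when no encoding is given. This has not been verified against pyiwn in this thread, so treat it as a possible workaround only. The sketch below just checks which default encoding an interpreter reports with the flag on; the helper name default_text_encoding is made up for illustration.

```python
import subprocess
import sys


def default_text_encoding(utf8_mode: bool) -> str:
    """Start a fresh interpreter and report its default text encoding.

    Hypothetical helper for illustration; it is not part of pyiwn.
    """
    args = [sys.executable]
    if utf8_mode:
        args += ["-X", "utf8"]  # same effect as setting PYTHONUTF8=1
    args += ["-c", "import locale; print(locale.getpreferredencoding(False))"]
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()


# With UTF-8 mode on, the child interpreter should report UTF-8.
forced = default_text_encoding(utf8_mode=True)
print("with -X utf8:", forced)

# Without the flag, the result depends on the platform and locale settings.
print("without flag:", default_text_encoding(utf8_mode=False))
```

Either way, the interpreter or notebook kernel has to be restarted for the change to take effect, just as with the edit to line 50.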