cfiltnlp / pyiwn

A Python based API to access Indian language WordNets.
http://www.cfilt.iitb.ac.in/
Creative Commons Attribution Share Alike 4.0 International

'charmap' codec can't encode character '\u2588' in position 2: character maps to <undefined> #20

Closed: CharviG closed this issue 2 years ago

CharviG commented 3 years ago

While integrating pyiwn with my own product, I get the following error:

2020-08-27:12:31:07,306 INFO [pythServer.py:239] Evaluation of script Completed.
2020-8-27 12:31:07,3 ERROR [2020-08-27 12:31:07,306] INFO in pythServer: Evaluation of script Completed.
2020-8-27 13:03:13,5 ERROR 2020-08-27:13:03:13,578 INFO [helpers.py:20] Downloading IndoWordNet data of size ~31 MB...
2020-8-27 13:03:28,6 ERROR --- Logging error ---
2020-8-27 13:03:28,6 ERROR 2020-08-27:13:03:28,661 ERROR [pythServer.py:235] 'charmap' codec can't encode character '\u2588' in position 2: character maps to <undefined>
2020-8-27 13:03:28,6 ERROR Traceback (most recent call last):
2020-8-27 13:03:28,6 ERROR   File "C:\Program Files\Tramp\PythService\pythServer.py", line 200, in evaluatePythonScript
2020-8-27 13:03:28,6 ERROR     exec(str1,globals(),locals())
2020-8-27 13:03:28,6 ERROR   File "<string>", line 13, in <module>
2020-8-27 13:03:28,6 ERROR   File "C:\Program Files\Tramp\lib\site-packages\pyiwn\__init__.py", line 13, in <module>
2020-8-27 13:03:28,6 ERROR     if not download():
2020-8-27 13:03:28,6 ERROR   File "C:\Program Files\Tramp\lib\site-packages\pyiwn\helpers.py", line 39, in download
2020-8-27 13:03:28,6 ERROR     sys.stdout.write('\r[{}{}]'.format('\u2588' * done, '.' * (50 - done)))
2020-8-27 13:03:28,6 ERROR   File "C:\Program Files\Tramp\lib\encodings\cp1252.py", line 19, in encode
2020-8-27 13:03:28,6 ERROR     return codecs.charmap_encode(input,self.errors,encoding_table)[0]
2020-8-27 13:03:28,6 ERROR UnicodeEncodeError: 'charmap' codec can't encode character '\u2588' in position 2: character maps to <undefined>
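
The traceback points at the progress bar in helpers.py: it writes the block character '\u2588' to sys.stdout, and cp1252 has no mapping for that character. A minimal sketch of one way to avoid this, assuming an ASCII fallback is acceptable for the bar; the function name progress_bar and its parameters are hypothetical and are not part of pyiwn:

```python
import sys


def progress_bar(done, width=50, encoding=None):
    """Build a progress bar string that the target encoding can represent.

    Hypothetical helper, not pyiwn's API. `encoding` defaults to the
    encoding of sys.stdout, falling back to ASCII if that is unknown.
    """
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    fill = "\u2588"  # the full-block character used in the traceback above
    try:
        fill.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        # The encoding (e.g. cp1252) cannot represent U+2588, or is unknown.
        fill = "#"
    return "[{}{}]".format(fill * done, "." * (width - done))


# Uses whatever encoding stdout reports, so this write should not raise.
sys.stdout.write(progress_bar(10) + "\n")
```

Checking the character against the stream's encoding before writing keeps the download itself from failing just because the console cannot draw the bar.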

SandipSPatil commented 3 years ago

The following command gives an error:

>>> iwn = pyiwn.IndoWordNet()
2020-08-27:16:35:07,116 INFO [iwn.py:43] Loading hindi language synsets...
Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    iwn = pyiwn.IndoWordNet()
  File "C:\Python\lib\site-packages\pyiwn\iwn.py", line 45, in __init__
    self._synset_df = self._load_synset_file(lang.value)
  File "C:\Python\lib\site-packages\pyiwn\iwn.py", line 51, in _load_synset_file
    synsets = list(map(lambda line: self._load_synset(line), f.readlines()))
  File "C:\Python\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 13: character maps to <undefined>

SandipSPatil commented 3 years ago

I am using Python 3.7.9.

ArupDas15 commented 2 years ago

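As a temporary fix, open iwn.py, find _load_synset_file(self, lang), and change f = open(filename) to f = open(filename, encoding='utf-8'). Without an explicit encoding, open() falls back to the platform default, which is cp1252 in the traceback above, and that codec cannot decode byte 0x8d. I edited iwn.py locally using the PyCharm IDE.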
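
To illustrate why the explicit encoding matters, here is a small self-contained sketch. It does not touch pyiwn; the file name synsets.txt and the sample word are made up for the demonstration, and it assumes the data files are UTF-8 encoded, as the fix implies:

```python
import os
import tempfile

# Sample Hindi word written with escapes; it contains U+094D, whose UTF-8
# encoding ends in byte 0x8d, the same byte reported in the traceback.
text = "\u0939\u093f\u0928\u094d\u0926\u0940"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "synsets.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write(text + "\n")

    # Reading with cp1252 (the Windows default seen above) fails,
    # because byte 0x8d has no mapping in that code page.
    try:
        with open(path, encoding="cp1252") as f:
            f.readlines()
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True

    # Reading with an explicit UTF-8 encoding round-trips the text.
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()

print("cp1252 read failed:", cp1252_failed)
print("utf-8 read matches:", lines == [text + "\n"])
```

Passing encoding='utf-8' makes the read independent of the machine's locale, which is why the same code can work on Linux and fail on a default Windows setup.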

riteshpanjwani commented 2 years ago

@ArupDas15 I have changed it. Please uninstall the pip package, clone this repo, and run python setup.py install.