bitextor / bifixer

Tool to fix bitexts and tag near-duplicates for removal
GNU General Public License v3.0

'charmap' codec can't decode input_test_2.txt #15

Closed: b3ade closed this issue 2 years ago

b3ade commented 2 years ago

Hi, I installed Bifixer and tried to run bifixer.py on the file input_test_2.txt from the tests folder, but I get a charset error:

py bifixer.py ../tests/input_test_2.txt ../tests/input_test_2.ourput.txt es en
2022-05-01 16:39:06,499 - INFO - Arguments processed.
2022-05-01 16:39:06,499 - INFO - Executing main program...
2022-05-01 16:39:06,499 - INFO - Starting fixing text
2022-05-01 16:39:06,529 - ERROR - Traceback (most recent call last):
  File "bifixer.py", line 337, in <module>
    main(args)  # Running main program
  File "bifixer.py", line 329, in main
    perform_fixing(args)
  File "bifixer.py", line 313, in perform_fixing
    fix_sentences(args)
  File "bifixer.py", line 184, in fix_sentences
    for i in args.input:
  File "C:\Python38\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 260: character maps to <undefined>

Running on output_test_1.txt works fine. Also, running tests/test_bifixer.py leaves all the output files empty.

mbanon commented 2 years ago

Hi @b3ade, I tried test_2 in a fresh installation of Bifixer and it works for me.

Which version of Unidecode do you have installed?

mbanon commented 2 years ago

Hi again, we noticed that you are using Windows. I changed the reading of the input file to force UTF-8. Please reinstall Bifixer and let me know if that fixed your issue :)
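To illustrate the kind of failure in the traceback above, here is a minimal sketch. It is not the actual change made in bifixer.py; the sample text and the file name `sample.txt` are made up for the example. It assumes the input contained a character such as the right double quotation mark (U+201D), whose UTF-8 encoding ends in byte 0x9d, a byte that cp1252 does not map to any character.

```python
import os
import tempfile

# Hypothetical sample text; U+201D encodes to the bytes E2 80 9D in UTF-8.
text = "Dijo \u201chola\u201d\n"
raw = text.encode("utf-8")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # made-up file name
    with open(path, "wb") as f:
        f.write(raw)

    # Ask for cp1252 explicitly to imitate the Windows default seen in the
    # traceback; byte 0x9d has no cp1252 mapping, so decoding fails.
    try:
        with open(path, encoding="cp1252") as f:
            f.read()
        cp1252_failed = False
    except UnicodeDecodeError as err:
        cp1252_failed = True
        print("cp1252:", err)

    # Passing encoding="utf-8" makes the result independent of the platform.
    with open(path, encoding="utf-8") as f:
        decoded = f.read()

print("utf-8 round trip ok:", decoded == text)
```

When `open()` is called without an `encoding` argument, Python falls back to the locale's preferred encoding, which is why the same file can read cleanly on Ubuntu and fail on Windows.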

b3ade commented 2 years ago

Great, it's working now, but running test_bifixer.py still returns empty files. (Make the same change in monofixer.py too.)

mbanon commented 2 years ago

Great! Regarding the tests, how are you running it? Just running pytest in /bifixer/tests should work.

b3ade commented 2 years ago

I just run the command: py test_bifixer.py. I suspect encoding is causing problems here too, but I'm not sure because I don't get any error. The script executes, but the output files are empty. Strange.

mbanon commented 2 years ago

Running python3 test_bifixer.py directly also produces empty files for me. It's a script that is meant to be run by pytest (that is, just run pytest in the tests subdirectory and it will take care of running test_bifixer.py, comparing the produced output with the expected output, etc.).

And yes, there is probably an encoding error as well.

b3ade commented 2 years ago

Running that on Windows is mission impossible, I think:

py pytest
C:\Python38\python.exe: can't open file 'pytest': [Errno 2] No such file or directory
pytest
'pytest' is not recognized as an internal or external command, operable program or batch file.

I ran it on Ubuntu and it's working fine, but thanks for the help anyway.
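For anyone hitting the same errors on Windows: `py pytest` asks the interpreter to open a file named pytest, and a bare `pytest` only works if the pytest launcher is on PATH. A possible workaround, sketched below, is to run pytest as a module of the interpreter with `-m`. This assumes pytest is installed for that interpreter, which the thread does not confirm, so the sketch checks for it first.

```python
import importlib.util
import subprocess
import sys

# Build the command from the interpreter running this script, so it does
# not depend on a `pytest` executable being on PATH.
cmd = [sys.executable, "-m", "pytest", "--version"]

if importlib.util.find_spec("pytest") is None:
    # Assumption: a missing package is one possible cause of the errors above.
    status = "pytest is not installed for this interpreter"
else:
    result = subprocess.run(cmd, capture_output=True, text=True)
    status = "pytest starts" if result.returncode == 0 else "pytest failed to start"

print(status)
```

From a Windows command prompt, the equivalent would be `py -m pytest`, run inside the tests subdirectory.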

mbanon commented 2 years ago

Ouch!
