bitextor / bifixer

Tool to fix bitexts and tag near-duplicates for removal
GNU General Public License v3.0

Bifixer IndexError: list index out of range #7

Closed jokinlasa closed 3 years ago

jokinlasa commented 3 years ago

After installing Bifixer, I ran the test suite to check that the software was working correctly, and I got this error:

```
/bifixer-master/tests$ pytest
================================================ test session starts ================================================
platform linux -- Python 3.6.9, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /home/adminitzuli/ml-bifixer-jokin/bifixer-master/tests
collected 0 items / 1 error

====================================================== ERRORS =======================================================
_ ERROR collecting test_bifixer.py __
test_bifixer.py:116: in <module>
    class TestOrthoFix:
test_bifixer.py:118: in TestOrthoFix
    replacements_es = restorative_cleaning.getReplacements("es")
../bifixer/restorative_cleaning.py:612: in getReplacements
    replacements[field[0].strip()] = field[1].strip()
E   IndexError: list index out of range
============================================== short test summary info ==============================================
ERROR test_bifixer.py - IndexError: list index out of range
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
================================================= 1 error in 0.34s ==================================================
```

I get the same error when I try to clean my memory. I had no problems with the installation, and the Python version is correct. Has anyone experienced this error, or do you have any suggestions for solving it?

Thank you,

Jokin

ZJaume commented 3 years ago

The tests work for me. Have you changed any of the replacement files? The error indicates that a line in a replacement file is missing a column.
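To locate the offending line, something like the following could help. It is only a sketch, not part of Bifixer: the helper name `find_malformed_lines` is made up, and it assumes each replacement line should hold two tab-separated fields, which is what `field[0]` and `field[1]` in the traceback suggest.

```python
def find_malformed_lines(lines, sep="\t", expected_fields=2):
    """Return (line_number, text) pairs for lines with too few fields.

    Hypothetical helper: assumes every line should contain at least
    `expected_fields` values separated by `sep` (a tab by default).
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        # A line with no tab splits into a single field, so indexing
        # field[1] on it would raise IndexError.
        if len(text.split(sep)) < expected_fields:
            bad.append((number, text))
    return bad


# Small in-memory sample; the second entry uses a space instead of a tab.
sample = ["á\ta\n", "é e\n", "ñ\tn\n"]
print(find_malformed_lines(sample))  # [(2, 'é e')]
```

For a real file, the same function can be called on an open file object, for example `find_malformed_lines(open(path, encoding="utf-8"))`, where `path` points to whichever replacement file you want to check.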

jokinlasa commented 3 years ago

No, I haven't changed any replacement file. I deleted all the files and downloaded them again, but it is still not working.

Here is the error I get when I run bifixer.py on my corpus, in case it helps:

    2021-05-20 09:30:39,633 - INFO - Arguments processed.
    2021-05-20 09:30:39,633 - INFO - Executing main program...
    2021-05-20 09:30:39,633 - INFO - Starting fixing text
    2021-05-20 09:30:39,638 - ERROR - Traceback (most recent call last):
      File "bifixer.py", line 242, in <module>
        main(args) # Running main program
      File "bifixer.py", line 234, in main
        perform_fixing(args)
      File "bifixer.py", line 218, in perform_fixing
        fix_sentences(args)
      File "bifixer.py", line 118, in fix_sentences
        replacements_tlang = restorative_cleaning.getReplacements(args.trglang)
      File "/home/adminitzuli/ml-bifixer-jokin/bifixer-master/bifixer/restorative_cleaning.py", line 612, in getReplacements
        replacements[field[0].strip()] = field[1].strip()
    IndexError: list index out of range

Thank you,

Jokin

jokinlasa commented 3 years ago

Hello @ZJaume, I just solved the problem: I used the replacement files from a virtual machine where I had run some tests, and now it works. I don't know yet why it does not work with the replacement files from the download.

I have another question, if you could answer it please: is something wrong when the following happens while running bifixer.py?

```
2021-05-20 09:40:53,390 - ERROR - Wrong column index on line 110576
2021-05-20 09:40:53,393 - ERROR - Traceback (most recent call last):
  File "bifixer.py", line 130, in fix_sentences
    target_sentence = parts[args.tcol - 1]
IndexError: list index out of range

2021-05-20 09:40:53,393 - ERROR - Wrong column index on line 110578
2021-05-20 09:40:53,394 - ERROR - Traceback (most recent call last):
  File "bifixer.py", line 130, in fix_sentences
    target_sentence = parts[args.tcol - 1]
IndexError: list index out of range

2021-05-20 09:40:53,394 - ERROR - Wrong column index on line 110580
2021-05-20 09:40:53,395 - ERROR - Traceback (most recent call last):
  File "bifixer.py", line 130, in fix_sentences
    target_sentence = parts[args.tcol - 1]
IndexError: list index out of range

2021-05-20 09:40:53,395 - ERROR - Wrong column index on line 110582
2021-05-20 09:40:53,968 - INFO - Text fixing finished
2021-05-20 09:40:53,969 - INFO - Finished
2021-05-20 09:40:53,969 - INFO - Input lines: 111303 rows
2021-05-20 09:40:53,969 - INFO - Output lines: 106245 rows
2021-05-20 09:40:53,969 - INFO - Elapsed time 57.37 s
2021-05-20 09:40:53,969 - INFO - Troughput: 1940 rows/s
```

Thank you Jaume,

Jokin

ZJaume commented 3 years ago

Hi Jokin,

Sorry for the misunderstanding, I didn't have the latest changes on my machine. After pulling them, I noticed that some replacements were separated by a space instead of a tab. I pushed a commit that fixes it.

Regarding your second question, the error is telling you that the reported lines (110582, for example) probably do not contain the target sentence, or have fewer fields than they are supposed to have.
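If you want to inspect those rows before running Bifixer, a rough check along these lines might be useful. It is a sketch under assumptions: the helper `find_short_rows` is hypothetical, the input is assumed to be tab-separated, and `tcol` is treated as a 1-based column index, as `parts[args.tcol - 1]` in the traceback suggests.

```python
def find_short_rows(lines, tcol, sep="\t"):
    """Return 1-based numbers of rows that have fewer than `tcol` fields.

    Hypothetical helper: `tcol` is the 1-based index of the target
    sentence column, and fields are assumed to be separated by `sep`.
    """
    short = []
    for number, line in enumerate(lines, start=1):
        parts = line.rstrip("\n").split(sep)
        # parts[tcol - 1] only exists when the row has at least tcol fields.
        if len(parts) < tcol:
            short.append(number)
    return short


# Made-up rows: the second one has no fourth (target) column.
rows = [
    "url1\turl2\tHello\tHola\n",
    "url1\turl2\tHello\n",
]
print(find_short_rows(rows, tcol=4))  # [2]
```

The returned numbers can then be compared with the ones in the "Wrong column index on line ..." messages.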