bitextor / bifixer

Tool to fix bitexts and tag near-duplicates for removal
GNU General Public License v3.0

Monofixer adds empty extra lines when there are multiple columns #18

Closed: cgr71ii closed this issue 1 year ago

cgr71ii commented 1 year ago

Hi!

I have installed bifixer 0.8.3 from PyPI.

I've noticed that monofixer adds an extra empty line after each output line when the input has multiple columns, but this doesn't happen with bifixer:

# Bifixer

echo -e "asd1\tasd2\tggg\thhh\nasd11\tasd22\tggg2\thhh2" | bifixer --scol 1 --tcol 2 --ignore_duplicates -q - - en es
# asd1    asd2    ggg     hhh
# asd11   asd22   ggg2    hhh2

# Monofixer
echo -e "asd1\tasd2\tggg\thhh\nasd11\tasd22\tggg2\thhh2" | monofixer --scol 1 --ignore_duplicates -q - - es # the same result with 'en'
# asd1    asd2    ggg     hhh
# 
# asd11   asd22   ggg2    hhh2
# 

When single columns are provided, it works as expected:

echo -e "asd1\nasd2" | monofixer --scol 1 --ignore_duplicates -q - - es
# asd1
# asd2
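The pattern above (extra blank lines only with multiple columns) is consistent with a trailing newline surviving on the last column and a second newline being appended on output. The sketch below shows that mechanism only; it is an assumption about how such a symptom can arise, not monofixer's actual code. The function names `join_columns_buggy` and `join_columns_fixed` are hypothetical.

```python
def join_columns_buggy(line: str, scol: int = 1) -> str:
    """Hypothetical reproduction: the newline is never stripped from the raw line."""
    parts = line.split("\t")                   # last column still ends with "\n"
    parts[scol - 1] = parts[scol - 1].strip()  # only the selected column is cleaned
    return "\t".join(parts) + "\n"             # a second newline is appended


def join_columns_fixed(line: str, scol: int = 1) -> str:
    """Same logic, but the line terminator is removed before splitting."""
    parts = line.rstrip("\n").split("\t")
    parts[scol - 1] = parts[scol - 1].strip()
    return "\t".join(parts) + "\n"


if __name__ == "__main__":
    multi = "asd1\tasd2\tggg\thhh\n"
    single = "asd1\n"
    print(repr(join_columns_buggy(multi)))   # ends with two newlines
    print(repr(join_columns_buggy(single)))  # ends with one newline
    print(repr(join_columns_fixed(multi)))   # ends with one newline
```

In this sketch the single-column case looks correct because the only column is also the cleaned one, so its newline is stripped before the final one is added.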

Output when the -q flag is not provided (bifixer):

2022-12-07 11:31:14,377 - INFO - Text fixing finished
2022-12-07 11:31:14,377 - INFO - Finished
2022-12-07 11:31:14,378 - INFO - Input lines: 2 rows
2022-12-07 11:31:14,378 - INFO - Output lines: 2 rows
2022-12-07 11:31:14,378 - INFO - Elapsed time 0.07 s
2022-12-07 11:31:14,378 - INFO - Troughput: 28 rows/s
2022-12-07 11:31:14,378 - INFO - Output file: /home/cgarcia/Documentos/<stdout>
2022-12-07 11:31:14,378 - INFO - Program finished

Output when the -q flag is not provided (monofixer):

2022-12-07 11:31:30,672 - INFO - Text fixing finished
2022-12-07 11:31:30,673 - INFO - Finished
2022-12-07 11:31:30,673 - INFO - Input lines: 2 rows
2022-12-07 11:31:30,673 - INFO - Output lines: 2 rows
2022-12-07 11:31:30,673 - INFO - Elapsed time 0.06 s
2022-12-07 11:31:30,673 - INFO - Troughput: 34 rows/s
2022-12-07 11:31:30,673 - INFO - Output file: /home/cgarcia/Documentos/<stdout>
2022-12-07 11:31:30,673 - INFO - Program finished
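Both logs report 2 output rows, so the logged count does not reveal the extra lines. A minimal sketch of a check on the captured output itself follows; `count_blank_lines` is a hypothetical helper, not part of bifixer or monofixer.

```python
def count_blank_lines(output: str) -> int:
    """Count lines in the captured output that are empty or whitespace-only."""
    return sum(1 for line in output.splitlines() if line.strip() == "")


if __name__ == "__main__":
    # Output as shown in the monofixer example of this report.
    observed = "asd1\tasd2\tggg\thhh\n\nasd11\tasd22\tggg2\thhh2\n\n"
    # Output as shown in the bifixer example of this report.
    expected = "asd1\tasd2\tggg\thhh\nasd11\tasd22\tggg2\thhh2\n"
    print(count_blank_lines(observed))
    print(count_blank_lines(expected))
```

The same function could be applied to output captured from the commands above, for example by piping it into a small script that reads standard input.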

Is there something wrong?

Thank you!

ZJaume commented 1 year ago

The latest release should fix this; it was a bug. Thanks for reporting!