Closed cgr71ii closed 1 year ago
Hi!
Both monofixer and bifixer should remove long sentences when the number of words is greater than 5000: https://github.com/bitextor/bifixer/blob/1a91e3eb47b2c4e7de9d6812fe25ed1bd5f4e9d4/bifixer/monofixer.py#L195 https://github.com/bitextor/bifixer/blob/1a91e3eb47b2c4e7de9d6812fe25ed1bd5f4e9d4/bifixer/bifixer.py#L215
However, this does not seem to be working:
pip3 install bifixer==0.8.3
# monofixer
python -c "print('asd'); print(' '.join(['a']*6000)); print('asd')" \
  | monofixer --scol 1 --ignore_duplicates -q - - es \
  | wc -w
# 6002
python -c "print('asd'); print(' '.join(['a']*6000)); print('asd')" \
  | monofixer --scol 1 --ignore_duplicates --ignore_long -q - - es \
  | wc -w
# 6002
# bifixer
python -c "print('asd\tasd'); print('asd\t' + ' '.join(['a']*6000)); print('asd\tasd')" \
  | bifixer --scol 1 --tcol 2 --ignore_duplicates -q - - en es \
  | wc -w
# 6005
python -c "print('asd\tasd'); print('asd\t' + ' '.join(['a']*6000)); print('asd\tasd')" \
  | bifixer --scol 1 --tcol 2 --ignore_long --ignore_duplicates -q - - en es \
  | wc -w
# 6005
Am I doing something wrong?
Thank you!
Long sentences are not removed; they are just ignored (not processed, but still written to the output).
The documentation is incorrect on this point; I'm fixing it.
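If the long lines really need to be dropped, one option is a separate filter step in the pipeline. The following is only a minimal sketch and not part of bifixer: the names `MAX_WORDS`, `is_too_long` and `drop_long_lines` are hypothetical, the 5000-word threshold is simply the figure quoted in this issue, and it assumes tab-separated input with 1-based column numbers like `--scol`/`--tcol`.

```python
MAX_WORDS = 5000  # threshold quoted in this issue (assumption; adjust as needed)


def is_too_long(line, columns, max_words=MAX_WORDS):
    """Return True if any selected tab-separated column has more than max_words words."""
    fields = line.rstrip("\n").split("\t")
    for col in columns:
        # Columns are 1-based; columns missing from the line are skipped.
        if col <= len(fields) and len(fields[col - 1].split()) > max_words:
            return True
    return False


def drop_long_lines(lines, columns, max_words=MAX_WORDS):
    """Yield only the lines whose selected columns are within the word limit."""
    for line in lines:
        if not is_too_long(line, columns, max_words):
            yield line
```

It could be wired into a pipe by iterating over `sys.stdin` and writing the surviving lines to `sys.stdout`, for example `sys.stdout.writelines(drop_long_lines(sys.stdin, [1, 2]))` for a two-column bitext.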
Oh! Ok, thank you!