Closed jgcb00 closed 2 years ago
Hi there,
we will look into this ASAP.
In the meantime, I think the flag you need to use is --ignore_normalization
Hi, indeed, --ignore_normalization prevents the issue!
Thanks!
Hi @jgcb00 ,
apparently, the spaces that are being removed from your text are not "regular" space characters, as seen in this hexdump:
Anyway, they should not be removed, but replaced by regular space characters. We fixed it in edc0dc57f0d121d20a0bb7769554b243ee15fd8f
Cheers, -Marta
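For anyone who wants to check their own data for this, here is a minimal Python sketch of the idea described above: list the non-regular space characters in a string, then replace them with a regular space (U+0020) instead of dropping them. This is not the code from the bifixer commit; the function names `find_odd_spaces` and `normalize_spaces` are hypothetical, and the sketch assumes the characters involved are Unicode space separators (category `Zs`), such as the no-break space U+00A0 or the narrow no-break space U+202F.

```python
import unicodedata


def find_odd_spaces(text: str) -> list[str]:
    """List the code points of space separators other than the regular U+0020."""
    return [
        f"U+{ord(ch):04X}"
        for ch in text
        if unicodedata.category(ch) == "Zs" and ch != " "
    ]


def normalize_spaces(text: str) -> str:
    """Replace every Unicode space separator (category Zs) with a regular space.

    Hypothetical helper for illustration only, not bifixer's implementation.
    """
    return "".join(" " if unicodedata.category(ch) == "Zs" else ch for ch in text)


if __name__ == "__main__":
    # Made-up sample containing a no-break space and a narrow no-break space.
    sample = "1\u00a0000\u202f€"
    print(find_odd_spaces(sample))
    print(normalize_spaces(sample))
```

Replacing rather than deleting keeps the token boundaries in dates and amounts intact, which is the behaviour the fix aims for.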
Hi, we noticed that when using bifixer, the detokenisation seems to break some dates and amounts of money by removing spaces that we want to keep!
Moreover, the --ignore_detokenization option is not working: the command now gives back empty files.
Example, raw sentence:
After bifixer with --ignore_segmentation, it becomes:
The command:
Other examples: