Hi Colin,
I'm using tidytext for tokenization, but I'm having some problems with texts in French. For instance, "L'achat" or "j'ai" are not separated as they should be (into "l'" + "achat" and "j'" + "ai"). In an issue on tidytext you mentioned that you were working on a tokenizer that would work well for French, and I got the impression it was intended for the proustr package. Could you tell me more about it?
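For reference, here is a minimal sketch of what I mean; the example sentence and the regex workaround are just illustrations I put together, not code from a real project:

```r
library(dplyr)
library(tidytext)

texts <- tibble(line = 1, text = "L'achat que j'ai fait")

# With the default word tokenizer, elided forms like "l'achat"
# and "j'ai" come out as single tokens rather than being split.
texts %>% unnest_tokens(word, text)

# A rough workaround: use a regex token that also splits on
# apostrophes, at the cost of dropping the apostrophe itself
# ("l" + "achat" instead of "l'" + "achat").
texts %>% unnest_tokens(word, text, token = "regex", pattern = "[\\s']+")
```

It works, but losing the apostrophe makes the elided articles ("l", "j", "d") hard to tell apart from real words, so a proper French-aware tokenizer would be much nicer.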
Cheers, Lise