Revamp default space tokenization and review the ((newline)) handling

We inherited space tokenization from onmt-py, which splits all text on the single space character " " (and only that character, whereas onmt-py before 3.4 split on all Python whitespace).
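To make the difference concrete, here is a small sketch using only Python's built-in `str.split`. It is not the onmt-py or eole code itself, just an illustration of the two splitting behaviors described above.

```python
text = "a  b\tc\nd"

# Splitting on the single space character only: consecutive spaces
# produce empty strings, and tabs / newlines stay glued inside a token.
space_only = text.split(" ")

# Splitting on any Python whitespace: runs of spaces, tabs and newlines
# are all treated as one separator and never produce empty tokens.
any_whitespace = text.split()

print(space_only)
print(any_whitespace)
```

With the space-only split, the tab and the newline end up inside a single token, which is why line breaks need special treatment today.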

However, it would be better to rely on the tokenizer to split the text into tokens.

That would make it easier to handle multiple spaces, multiple tabs, and line breaks (\r, \n, etc.).
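As a rough illustration of what "letting the tokenizer split the text" could buy us, the sketch below splits a raw string into whitespace runs and non-whitespace runs with a regular expression. The function name `split_keep_whitespace` is hypothetical and only stands in for a real tokenizer; the point is that nothing is lost, so the original text can be rebuilt exactly.

```python
import re
from typing import List


def split_keep_whitespace(text: str) -> List[str]:
    """Hypothetical stand-in for a tokenizer working on the raw string.

    Returns alternating runs of whitespace and non-whitespace, so that
    joining the pieces gives back the original text unchanged.
    """
    return re.findall(r"\s+|\S+", text)


raw = "a  b\tc\r\nd"
pieces = split_keep_whitespace(raw)
print(pieces)

# The split is lossless: double spaces, tabs and \r\n all survive.
assert "".join(pieces) == raw
```

A real subword tokenizer would do more than this, but the same property (no information about spacing or line breaks is dropped before tokenization) is what makes a placeholder like ((newline)) less necessary.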

This would require reviewing all transforms, because at the moment they receive lists of tokens (they should now receive strings or streams).
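One possible shape for string-based transforms is sketched below. Everything here is an assumption for discussion: `StringTransform`, `normalize_linebreaks`, `collapse_spaces`, `apply_transforms` and `tokenize` are hypothetical names, not the existing eole transform API. The idea is simply that each transform maps a string to a string, and splitting into tokens happens once, at the end.

```python
import re
from typing import Callable, List

# Hypothetical type: a transform takes the raw string and returns a string.
StringTransform = Callable[[str], str]


def normalize_linebreaks(text: str) -> str:
    """Map \\r\\n and lone \\r to \\n (illustrative transform)."""
    return re.sub(r"\r\n|\r", "\n", text)


def collapse_spaces(text: str) -> str:
    """Replace runs of spaces and tabs with a single space (illustrative)."""
    return re.sub(r"[ \t]+", " ", text)


def apply_transforms(text: str, transforms: List[StringTransform]) -> str:
    """Apply string-to-string transforms in order."""
    for transform in transforms:
        text = transform(text)
    return text


def tokenize(text: str) -> List[str]:
    """Stand-in for the real tokenizer; here it just splits on whitespace."""
    return text.split()


cleaned = apply_transforms("a \t b\r\nc", [normalize_linebreaks, collapse_spaces])
tokens = tokenize(cleaned)
print(repr(cleaned), tokens)
```

Under this design, transforms that care about whitespace or line breaks can see them directly, instead of receiving a token list where that information has already been altered by the split.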

eole-nlp / eole, issue #33