Open bittlingmayer opened 7 years ago
The offending bit of the script is tr 0-9 " ". So '1st' and '3D' are not in wiki.en.vec.
'It won 1st place in the 3D film contest.' -> 'it won st place in the d film contest .'
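To make the effect concrete, here is a minimal Python sketch of what that step does to a token. It assumes the script maps every ASCII digit to a space, as `tr 0-9 " "` does, and that runs of whitespace are collapsed afterwards. The function name `strip_digits` is hypothetical and not part of the fastText scripts, and the sketch does not reproduce the script's punctuation handling.

```python
import re

# Translation table that maps each ASCII digit to a space,
# mirroring the behaviour of: tr 0-9 " "
_DIGITS_TO_SPACE = str.maketrans("0123456789", " " * 10)


def strip_digits(text: str) -> str:
    """Replace digits with spaces, then collapse whitespace (assumed step)."""
    replaced = text.translate(_DIGITS_TO_SPACE)
    return re.sub(r"\s+", " ", replaced).strip()


if __name__ == "__main__":
    sentence = "It won 1st place in the 3D film contest."
    # '1st' and '3D' lose their digits, so neither token can reach the vocabulary.
    print(strip_digits(sentence.lower()))
```

Because the digit is replaced before tokenization, a token like '1st' is indistinguishable from 'st' by the time the vectors are trained.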
Another bug here is the final substitution, which removes '«'. Probably that's because in English it is used for navigation like breadcrumbs.
But in most European languages, it is an ordinary quotation mark. (Opening in some, closing in others.)
'Г. Шмидт, можно сказать «Давай давай!»?' -> 'г . шмидт , можно сказать давай давай ! » ?'
'Dann stammerte er »Was... was fr a Witz soll des denn sein?«' -> 'dann stammerte er »was . . . was fr a witz soll des denn sein ?'
This is quite odd.
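The asymmetry can be reproduced with a short Python sketch. It assumes the final substitution simply replaces '«' with a space and leaves '»' untouched, which matches the outputs above. Both function names are hypothetical, and `space_guillemets` is only one possible alternative, not a fix taken from the fastText repo.

```python
import re


def drop_left_guillemet(text: str) -> str:
    """Assumed current behaviour: only '«' is removed, '»' survives."""
    return re.sub(r"\s+", " ", text.replace("«", " ")).strip()


def space_guillemets(text: str) -> str:
    """Possible alternative: keep both marks and split them off as tokens."""
    spaced = re.sub(r"([«»])", r" \1 ", text)
    return re.sub(r"\s+", " ", spaced).strip()


if __name__ == "__main__":
    russian = "можно сказать «Давай давай!»?"
    german = "»Was soll des denn sein?«"
    # The opening mark disappears in Russian, the closing mark in German.
    print(drop_left_guillemet(russian))
    print(drop_left_guillemet(german))
    # Treating both marks the same way keeps the quotation balanced.
    print(space_guillemets(russian))
    print(space_guillemets(german))
```

Handling both guillemets the same way, whether by keeping or by removing them, would at least avoid leaving an unmatched quotation mark in the training text.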
See also: https://github.com/facebookresearch/fastText/issues/161
Any update on this? It would be ideal to avoid these issues before the next re-training.