What encoding is used for the dataset you provided? I opened it as UTF-8. English characters display correctly, but Russian and other non-English text comes out garbled.
The contexts/paragraphs were taken from the original source datasets. However, I did apply ftfy at runtime to fix the text; see commaqa/inference/dataset_readers.py for an example. You might want to give it a try.
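For anyone else hitting this, here is a rough sketch of what that kind of runtime repair could look like. It is not the code from the repository. It assumes the garbling is the common case where UTF-8 bytes were decoded as Windows-1252 somewhere upstream, which may not be what happened to this dataset. The names `repair_mojibake` and `fix` are made up for illustration; `ftfy.fix_text` is the real ftfy entry point and is used only if ftfy is installed.

```python
# Sketch only: assumes UTF-8 text was wrongly decoded as cp1252 upstream.
try:
    from ftfy import fix_text  # third-party: pip install ftfy
except ImportError:
    fix_text = None


def repair_mojibake(text: str, wrong_encoding: str = "cp1252") -> str:
    """Stdlib-only fallback (hypothetical helper).

    Re-encodes the text with the encoding it was wrongly decoded with,
    then decodes the resulting bytes as UTF-8. If either step fails,
    the text was probably not garbled this way, so it is returned as-is.
    """
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


def fix(text: str) -> str:
    """Use ftfy when available, otherwise the simple fallback above."""
    if fix_text is not None:
        return fix_text(text)
    return repair_mojibake(text)


if __name__ == "__main__":
    # Simulate the problem: Russian text saved as UTF-8, read as cp1252.
    garbled = "Привет".encode("utf-8").decode("cp1252")
    print(repr(garbled), "->", repr(fix(garbled)))
```

ftfy is the better choice when it is available, since it detects several kinds of mix-ups on its own; the fallback only handles the single encoding pair passed to it.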