Almost every data point is damaged: the Georgian side is nonsense. When I looked these entries up on the OpenSubtitles site, I found that they are just Russian characters mapped onto the Georgian alphabet. Many multilingual models are now poisoned by this data. It would be great to investigate this further.
https://opus.nlpl.eu/OpenSubtitles/en&ka/v2018/OpenSubtitles