Open albertvillanova opened 2 years ago
It needs support for ZIP:
ERROR:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x87 in position 23: invalid start byte
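An error like the one above is what you would expect if the raw bytes of a ZIP archive were decoded as UTF-8 text, though the report does not confirm that this is the cause here. A minimal sketch of reading a text member out of an archive with Python's standard `zipfile` module follows. The function names, the member name `pairs.txt`, and the demo archive are all hypothetical and are not taken from the actual loader.

```python
import io
import zipfile


def read_text_from_zip(zip_bytes, member_name, encoding="utf-8"):
    """Return the decoded text of one member of an in-memory ZIP archive."""
    # Open the archive first, then decode only the decompressed member bytes.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member_name) as raw:
            return raw.read().decode(encoding)


def make_demo_zip():
    """Build a tiny in-memory archive so the sketch is self-contained (hypothetical data file name)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(
            "pairs.txt",
            "Word1 Word2 POS Sim1 Sim2 STD\nbiến ngập V 3.13 5.22 0.72\n",
        )
    return buffer.getvalue()


if __name__ == "__main__":
    print(read_text_from_zip(make_demo_zip(), "pairs.txt"))
```

The point is that decoding happens only after decompression, so the archive's binary headers never reach the UTF-8 decoder.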
These datasets just contain pairs of words:
Word1        Word2  POS  Sim1  Sim2  STD
biến         ngập   V    3.13  5.22  0.72
nhà_thi_đấu  nhà    N    3.07  5.12  1.18
động         tĩnh   V    0.6   1.0   0.95
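To make the structure of these files concrete, here is a small sketch that parses the rows shown above into typed records. It assumes whitespace-separated columns and a single header line, as in the sample. The `WordPair` type and `parse_pairs` function are illustrative names, not part of any existing loader.

```python
from typing import List, NamedTuple


class WordPair(NamedTuple):
    word1: str
    word2: str
    pos: str
    sim1: float
    sim2: float
    std: float


def parse_pairs(text: str) -> List[WordPair]:
    """Parse whitespace-separated word-pair rows, skipping the header line."""
    lines = [line for line in text.splitlines() if line.strip()]
    pairs = []
    for line in lines[1:]:  # the first non-empty line is the column header
        word1, word2, pos, sim1, sim2, std = line.split()
        pairs.append(
            WordPair(word1, word2, pos, float(sim1), float(sim2), float(std))
        )
    return pairs


# Sample rows copied from the issue text.
SAMPLE = """Word1 Word2 POS Sim1 Sim2 STD
biến ngập V 3.13 5.22 0.72
nhà_thi_đấu nhà N 3.07 5.12 1.18
động tĩnh V 0.6 1.0 0.95
"""

if __name__ == "__main__":
    for pair in parse_pairs(SAMPLE):
        print(pair)
```

Each record is a pair of words plus a part-of-speech tag and numeric scores, with no running text.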
I don't think these are appropriate for training a language model. CC: @yjernite