bojone / BERT-whitening

Simple vector whitening improves sentence embedding quality

Requesting the benchmark data URL #3

Closed shm007g closed 3 years ago

shm007g commented 3 years ago

http://ixa2.si.ehu.eus/stswiki/images/4/48/Stsbenchmark.tar.gz

I downloaded the data from here and found a small problem with the data processing.

shm007g commented 3 years ago
def load_train_data(filename):
    """Load training data (with labels).
    Format of each item: (text1, text2, label)
    """
    D = []
    with open(filename, encoding='utf-8') as f:
        for i, l in enumerate(f, start=1):
            l = l.strip().split('\t')
            # if len(l) != 7:
            #     print('len != 7', i, len(l))
            D.append((l[5], l[6], float(l[4])))
    return D

It can be fixed like this!
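For anyone reading later, here is a minimal sketch of what the fixed column indices do. The thread does not say exactly what went wrong, so this rests on an assumption: that each row is tab-separated with the score at index 4 and the two sentences at indices 5 and 6, and that some rows carry extra trailing columns. The helper name `parse_sts_line` and the sample rows are made up for illustration and are not from the repo or the dataset.

```python
def parse_sts_line(line):
    """Parse one tab-separated row into (text1, text2, score).

    Assumed layout (not confirmed in this thread): score at index 4,
    sentences at indices 5 and 6. Extra trailing columns are ignored.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 7:
        raise ValueError(f"expected at least 7 fields, got {len(fields)}")
    return fields[5], fields[6], float(fields[4])


# Made-up rows for illustration only; the second has two extra columns.
rows = [
    "genre\tfile\t2012\t0001\t5.000\tA plane is taking off.\tAn air plane is taking off.",
    "genre\tfile\t2015\t0002\t2.500\tA man is cooking.\tA woman is singing.\textra1\textra2",
]
parsed = [parse_sts_line(r) for r in rows]
for text1, text2, score in parsed:
    print(score, text1, "|", text2)
```

With fixed indices, both rows yield the same three fields no matter how many columns follow. Indexing from the end of the row would pick up the extra columns instead of the sentences.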