PaddlePaddle / RocketQA

🚀 RocketQA, dense retrieval for information retrieval and question answering, including both Chinese and English state-of-the-art models.
Apache License 2.0

How is the tokenizer output extracted? #112

Open JayLin1996 opened 1 year ago

JayLin1996 commented 1 year ago

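I have searched for a long time and cannot find how the tokens are handled. I am using the official PaddleNLP pretrained models rocketqa-zh-base-query-encoder and rocketqa-zh-base-para-encoder and want to align my embeddings with this repository's output. However, the embedding I get for a text always has shape (1, xx, 768), while the baseline here always has shape (1, 768).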
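
The shape mismatch described above can be illustrated with a minimal sketch. It assumes that the (1, xx, 768) tensor is the per-token encoder output and that the (1, 768) baseline is a single pooled vector per text, for example the first-position ([CLS]) vector. That is an assumption, not something this thread confirms. The array below is random stand-in data built with NumPy, not output from either model, and all variable names are hypothetical.

```python
import numpy as np

# Hypothetical stand-in for per-token encoder output: (batch, seq_len, hidden).
rng = np.random.default_rng(0)
seq_len = 12  # the "xx" in the question; it varies with the input length
token_embeddings = rng.standard_normal((1, seq_len, 768))

# Option 1 (assumption): keep only the first-position ([CLS]) vector.
cls_embedding = token_embeddings[:, 0, :]

# Option 2 (alternative, also an assumption): masked mean over all tokens.
attention_mask = np.ones((1, seq_len))  # 1 = real token, 0 = padding
masked_sum = (token_embeddings * attention_mask[:, :, None]).sum(axis=1)
mean_embedding = masked_sum / attention_mask.sum(axis=1, keepdims=True)

print(token_embeddings.shape, cls_embedding.shape, mean_embedding.shape)
```

Both options reduce the sequence axis and give one 768-dimensional vector per text. Which one (if either) reproduces the baseline numerically would have to be checked against the repository's own inference code.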