Open wduo opened 2 years ago
At the moment we have the doc2query model only for English. Also the Cross-Encoder is only available for English.
But they could be trained on this new dataset: https://arxiv.org/abs/2203.10232
@kwang2049 What do you think, should we train doc2query & cross-encoder for Chinese?
I would really appreciate it if you could train doc2query & cross-encoder for Chinese. Thanks a lot!
Hi @liushenglei, thanks for your attention! Sorry for the late reply. I have just come back from my holiday:).
@nreimers yes! I am also very interested in that and would be very happy if there were some models for my mother tongue:). I think the big question is the training data. Do you have any suggestions? Personally, I only know Baidu's DuReader_retrieval. It has >80K query-passage pairs obtained from the Baidu search engine.
@kwang2049 @liushenglei @wduo Can you please keep me in the loop? I am also interested in the CN model. My email is max@workhere.org. Can we connect? Thank you.
@kwang2049 I think the DuReader dataset is good. But as I don't know Chinese, I'm not able to use that dataset, since the explanations etc. are mostly in Chinese.
Yeah, sure. I think we can just write and post things here for now. I will also post updates here if I have any new findings on this topic:)
Great job! Can I use GPL with Chinese models? Which query generator model should I use? Which base model should be used? Which retrieval model should be used? Looking forward to your reply. Thanks. @jcklie @reckart @dpetrak @nreimers @mbugert