BERTserini
https://github.com/castorini/bertserini
Apache License 2.0

How to build index for Chinese corpus? #13

Closed a5038c closed 2 years ago

a5038c commented 3 years ago

I want to build an index for a Chinese corpus other than your pre-built Chinese Wiki index. According to your documentation, I should use Anserini's bin/IndexCollection script, but I think that script only builds indexes for English corpora. Is there any way to build an index for a Chinese corpus? Thanks a lot!

amyxie361 commented 3 years ago

You can use the -language zh argument to index a Chinese corpus. Please refer to this Anserini PR: https://github.com/castorini/anserini/pull/804
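
For illustration, here is a minimal sketch of how the indexing command might be assembled with -language zh. Only the -language zh argument comes from this thread. The script path, the corpus and index directories, the JsonCollection type, and the thread count are assumptions chosen for the example, and depending on your Anserini version other arguments (for example -generator) may also be needed. The sketch only builds and prints the command; it does not run Anserini.

```python
def build_index_command(input_dir, index_dir, language="zh",
                        script="target/appassembler/bin/IndexCollection",
                        threads=4):
    """Build an argument list for Anserini's IndexCollection script.

    The script path, collection type, and thread count are assumptions
    for illustration; adjust them to your Anserini checkout and corpus.
    """
    return [
        script,
        "-collection", "JsonCollection",  # assumed corpus format
        "-input", input_dir,              # hypothetical corpus directory
        "-index", index_dir,              # hypothetical output directory
        "-threads", str(threads),
        "-language", language,            # "zh" selects Chinese analysis
    ]


# Hypothetical paths, used only to show the resulting command line.
cmd = build_index_command("collections/my_zh_corpus", "indexes/my_zh_index")
print(" ".join(cmd))
```

To actually run it, the list could be passed to subprocess.run(cmd, check=True) from the root of an Anserini checkout.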