CheaSim closed this issue 2 years ago
Hi, yep, it should help according to the original paper. I also found that further randomizing the naive IDs, by shuffling all the IDs instead of assigning them in ascending order across the docs, increases the scores quite a bit.
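A minimal sketch of what that shuffling could look like, assuming the docs are held in a plain list. The function name `assign_shuffled_docids`, the `seed` parameter, and the sample doc keys are made up for illustration and are not taken from the repo's `data/create_NQ_train_vali.py`:

```python
import random


def assign_shuffled_docids(doc_keys, seed=0):
    """Give every document a unique integer ID drawn from a shuffled range.

    Hypothetical helper: instead of numbering docs 0, 1, 2, ... in the
    order they appear, shuffle the whole ID range first so that an ID
    carries no information about a document's position in the corpus.
    """
    ids = list(range(len(doc_keys)))
    rng = random.Random(seed)  # local RNG so the result is reproducible
    rng.shuffle(ids)
    # IDs are returned as strings, since DSI-style models decode them as text.
    return {key: str(docid) for key, docid in zip(doc_keys, ids)}


docs = ["doc_a", "doc_b", "doc_c", "doc_d"]  # placeholder document keys
mapping = assign_shuffled_docids(docs, seed=42)
```

Each doc still gets exactly one ID and no ID is reused; only the pairing between docs and IDs changes.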
Thanks for your reply.
I don't have semantic IDs implemented; I may give it a try next month. If you want to try it yourself, you're also welcome to open a PR to add this feature!
I implemented it for another NLP task, but it didn't work. I may try using semantic IDs with your code.
Hi, maybe semantic string docids would help improve the performance of DSI? data/create_NQ_train_vali.py currently uses random doc IDs.