A silly question about the retriever part:
I was trying to index the Wikipedia dump at the paragraph level, as you did. In the paper you mention obtaining 29.5M paragraphs, but I got 33.3M instead. Could you clarify whether you applied any special filtering when splitting articles into paragraphs, or did you simply split them with
article.split("\n")
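For context, here is the kind of filtering I suspect might explain the gap: a bare newline split versus the same split with empty lines and very short fragments dropped. This is only a sketch of my own guess; the `min_words` threshold is hypothetical, not something stated in the paper.

```python
def split_into_paragraphs(article: str, min_words: int = 0) -> list[str]:
    """Split an article on newlines, optionally dropping short fragments.

    min_words is a hypothetical filter threshold (my assumption, not
    from the paper): paragraphs with fewer words are discarded.
    """
    paragraphs = [p.strip() for p in article.split("\n")]
    return [p for p in paragraphs if len(p.split()) > min_words]

article = "First paragraph of the article.\n\nOK\n\nSecond real paragraph here."

# Bare split (empty strings removed only):
print(split_into_paragraphs(article))
# With a length filter, short fragments like "OK" also disappear:
print(split_into_paragraphs(article, min_words=2))
```

Even a small `min_words` threshold like this removes millions of stub lines in a full Wikipedia dump, which could plausibly account for a 33.3M vs. 29.5M difference.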
Thanks!