daniyal214 closed this issue 1 month ago.
Could this be caused by an earlier run in which only those 3 passages were used for extraction? If the passage file was modified afterward but the cache was not cleared, it may still be reading those 3 passages. @bernaljg
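One way to catch the situation described above (corpus edited, cache not cleared) is to record a fingerprint of the corpus file after each indexing run and compare it before querying. This is a minimal sketch, assuming you manage the stamp file yourself; the stamp file name and its location are hypothetical, and nothing here is part of HippoRAG's API.

```python
import hashlib
from pathlib import Path

def corpus_fingerprint(corpus_path: str) -> str:
    """Return the SHA-256 hex digest of the corpus file's raw bytes."""
    return hashlib.sha256(Path(corpus_path).read_bytes()).hexdigest()

def cache_is_stale(corpus_path: str, stamp_path: str) -> bool:
    """True if no stamp exists yet or the corpus changed since the stamp was written."""
    stamp = Path(stamp_path)
    if not stamp.exists():
        return True
    return stamp.read_text().strip() != corpus_fingerprint(corpus_path)

def write_stamp(corpus_path: str, stamp_path: str) -> None:
    """Record the current corpus fingerprint after a successful indexing run.

    The stamp path is a hypothetical file you choose; HippoRAG does not create it.
    """
    Path(stamp_path).write_text(corpus_fingerprint(corpus_path))
```

For example, call `cache_is_stale("data/cindrella_corpus.json", "corpus.sha256")` before querying; if it returns `True`, clear the old outputs and re-index, then call `write_stamp` with the same arguments.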
This behavior is quite strange. I tried to reproduce it in my own environment with a synthetic corpus, and it works correctly. I'm willing to jump on a call and help debug this if you'd like, though!
@bernaljg Sure, that would be great. Thanks for the response.
@yhshu Thanks for the response. I'm not sure, maybe. I'll try again in a fresh environment with the cache cleared and see whether the problem persists.
@daniyal214 I suggest you also clear the output directory before re-running. If the bug is still happening shoot me an email and we can look at it together.
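Clearing the output directory before re-running can be scripted so it is not forgotten. This is a minimal sketch; the directory names you pass in are assumptions, so point it at wherever your own run actually writes its cache and outputs. Note that it permanently deletes those directories.

```python
import shutil
from pathlib import Path

def clear_dirs(dirs, root="."):
    """Delete each named directory under root if it exists.

    The directory names are supplied by the caller; none are taken from HippoRAG.
    Returns the names that were actually removed.
    """
    removed = []
    for name in dirs:
        path = Path(root) / name
        if path.is_dir():
            shutil.rmtree(path)  # removes the directory and everything inside it
            removed.append(name)
    return removed
```

For example, `clear_dirs(["output"])` would remove a hypothetical `./output` directory before re-indexing; directories that do not exist are skipped silently.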
@bernaljg @yhshu Thank you both for your quick and helpful responses!
It seems the issue was indeed related to the cache left over from the earlier extraction run on those passages. I set up a fresh environment and re-indexed everything, which resolved the problem.
I really appreciate both of your assistance and willingness to help debug the issue. Thanks again!
I'll go ahead and close this issue now.
Hi, after setting this up with my custom dataset, I'm facing an issue: it always returns the first three corpus documents as the ranked documents when I run:
ranks, scores, logs = hipporag.rank_docs(query, top_k=10)
My data is set up in data/cindrella_corpus.json like this:

And my indexing setup is:

which returns the following response:
My HippoRAG test file is src/hippo_test.py:

and I run it with:
!python3 src/hippo_test.py --dataset $DATA --query "What the two white pigeons did?"
which returned:

So no matter what the question is, the results are always drawn from documents 0, 1, and 2, and only three documents are returned instead of 10. However, the corpus texts with idx 0, 1, and 2 contain nothing about the white pigeons. You can also see that top_ranked_nodes does not mention 'white_pigeon', even though the corpus discusses it in the passages with idx 17, 22, etc.

Could you please tell me why this is not retrieving the correct documents, and how I could modify it to get the desired results?
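To check independently of the retriever which passages a query ought to hit, a plain keyword scan of the corpus can be compared against the returned ranks. This is a minimal sketch, assuming the corpus file is a JSON list of objects with "idx" and "text" keys and that the values in `ranks` line up with those idx values; both are assumptions, since the actual corpus sample is not shown above.

```python
import json
from pathlib import Path

def passages_mentioning(corpus_path, keyword):
    """Return the idx of every passage whose text contains keyword (case-insensitive).

    Assumes a JSON list of {"idx": ..., "text": ...} objects; adjust the keys if yours differ.
    """
    corpus = json.loads(Path(corpus_path).read_text(encoding="utf-8"))
    needle = keyword.lower()
    return [doc["idx"] for doc in corpus if needle in doc["text"].lower()]

def missing_from_ranks(expected, ranks):
    """Return the expected passage ids that do not appear in the retrieved ranks, in order."""
    retrieved = set(ranks)
    return [i for i in expected if i not in retrieved]
```

For example, `missing_from_ranks(passages_mentioning("data/cindrella_corpus.json", "pigeon"), ranks)` lists the pigeon passages that `rank_docs` did not return. If the keyword scan finds them but the ranks never do, the index is likely out of sync with the corpus file.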
Thanks!