Open sudhanshu-shukla-git opened 2 years ago
I'm stuck at exactly this point too :) Still trying to figure it out. Some thoughts:
I wonder what happens to that corpus in between being read from file and getting to that point?!
Oh well! Our mistake is that corpus.jsonl has the _ids as ints, not strings. The data loader code expects them to be strings, so the lookup fails with a KeyError on that key.
Change the corpus.jsonl to have string _ids.
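In case it helps, here is a minimal sketch of one way to do that conversion. It assumes each line of corpus.jsonl is a JSON object with an _id field; the function name stringify_ids and the output path are made up for illustration and are not part of GPL.

```python
import json


def stringify_ids(src_path, dst_path, id_key="_id"):
    """Copy a JSONL file, casting each record's id field to str.

    Hypothetical helper, not part of GPL. Returns how many ids were converted.
    """
    changed = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            if not isinstance(record[id_key], str):
                record[id_key] = str(record[id_key])
                changed += 1
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
    return changed
```

Calling stringify_ids("corpus.jsonl", "corpus_fixed.jsonl") writes a new file and leaves the original untouched, so you can compare the two before swapping them.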
@ahadda5 Thanks. Yes, even I have _ids as int. Let me change it to string and try again.
Thanks to both of you for your attention, @ahadda5 @sudhanshu-shukla-git! I will add a type assertion, assert type(did) == str, here.
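For readers who want to see roughly what such a check does, here is a small sketch. It is not the actual GPL loader: the function load_corpus and the title/text fields are assumptions for illustration, and only the assertion on did comes from the comment above.

```python
import json


def load_corpus(lines):
    """Build a {doc_id: document} dict from JSONL lines (illustrative only)."""
    corpus = {}
    for line in lines:
        record = json.loads(line)
        did = record["_id"]
        # Fail early with a clear message instead of a KeyError much later.
        assert type(did) == str, (
            f"_id must be a string, got {type(did).__name__}: {did!r}"
        )
        corpus[did] = {
            "title": record.get("title", ""),
            "text": record.get("text", ""),
        }
    return corpus
```

With a check like this, an integer _id is reported when the corpus is loaded rather than at the later dictionary lookup.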
This setting follows the one in the BeIR repo. I think using strings instead of integers makes the IDs more universal.
Have added the type hints and assertion: https://github.com/UKPLab/gpl/pull/12
Hello! For those who have encountered this issue during dataset generation using pandas, the following data type conversion may be helpful for transforming a column:
df = df.astype({'_id': 'string'})
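A slightly fuller sketch of that conversion, assuming a DataFrame with _id, title and text columns (the sample rows are made up). It converts the column, serializes to JSON Lines, and parses the result back to confirm the ids come out as strings.

```python
import json

import pandas as pd

# Made-up sample rows; the _id column starts out as integers.
df = pd.DataFrame(
    {
        "_id": [101, 102],
        "title": ["", ""],
        "text": ["first passage", "second passage"],
    }
)

# Same conversion as above: cast the id column to pandas' string dtype.
df = df.astype({"_id": "string"})

# With no path given, to_json returns the JSON Lines text as a str.
jsonl_text = df.to_json(orient="records", lines=True)

# Parse it back to check what a downstream loader would see.
records = [json.loads(line) for line in jsonl_text.splitlines() if line]
```

To write the file directly, pass a path as the first argument to to_json instead of capturing the returned text.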
Hi,
I am getting a KeyError during pseudo labeling. It looks like the selected pos_pid is not found in the corpus.
The corpus I have has the structure below. Does the order of the _id values, or the numbers themselves, matter?
Code to train:
Could you help me figure out what I am missing or doing wrong?
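One quick way to narrow this kind of KeyError down, given the int-versus-string cause found earlier in this thread, is to compare the ids being looked up against the ids actually present in the corpus. The sketch below is a generic diagnostic, not GPL code; find_missing_ids is a hypothetical helper, and how you collect the two id lists depends on your own files.

```python
def find_missing_ids(corpus_ids, referenced_ids):
    """Report referenced ids that are absent from the corpus.

    Returns (missing, type_mismatch):
      missing       - referenced ids with no exact match in the corpus
      type_mismatch - the subset of missing ids that would match after str(),
                      which points to an int-versus-string problem
    """
    corpus_ids = set(corpus_ids)
    missing = [pid for pid in referenced_ids if pid not in corpus_ids]
    corpus_as_str = {str(cid) for cid in corpus_ids}
    type_mismatch = [pid for pid in missing if str(pid) in corpus_as_str]
    return missing, type_mismatch
```

If type_mismatch is non-empty, the ids exist but with a different type, so converting the corpus _ids to strings should resolve it. If an id is missing but not in type_mismatch, it is absent from the corpus altogether.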