You're running the model on your own dataset, is that correct?
First, can you confirm that you've set up your virtualenv as described in the README? Just want to rule out software dependency issues.
If that's not it, it looks like something's going wrong in the tokenizer. I think the first thing to do is to isolate the example that's causing the problem and figure out why it's breaking the tokenizer. My guess is that you're passing an input sequence longer than RoBERTa's 512-token maximum, but that's just a guess. Something like the sketch below could help you find overlong examples.
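Here's a rough way you might scan your data for inputs that exceed the limit. The file path (`claims.jsonl`), the field name (`claim`), and the tokenizer checkpoint are assumptions; adjust them to match your dataset and setup.

```python
# Minimal sketch: flag examples that exceed RoBERTa's 512-token limit.
# "claims.jsonl" and the "claim" field are assumed; swap in your own
# paths/fields (you may want to run the same check over your abstracts).
import json
from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
MAX_LEN = 512  # RoBERTa's maximum sequence length

with open("claims.jsonl") as f:
    for i, line in enumerate(f):
        entry = json.loads(line)
        text = entry["claim"]  # assumed field name
        n_tokens = len(tokenizer.encode(text))  # includes special tokens
        if n_tokens > MAX_LEN:
            print(f"Line {i}: {n_tokens} tokens (exceeds {MAX_LEN})")
```

If this turns up an overlong example, truncating it (or splitting it into shorter pieces) before running the model should avoid the tokenizer failure.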
If you can't get it working, feel free to post a minimal code example with the exact string that's causing the tokenization error, and I can try to help debug further.
Closing due to lack of activity.
Hi, I'm using scifact to do fact checking on a personal dataset. First I create a corpus and then claims in the format the code expects. I run the following code to generate predictions on my own claims:
In the middle of running, after reading a bunch of lines and retrieving abstracts, I get this error:
Then it tries to skip this step, but it can't continue because it hasn't created the `merged_predictions.jsonl` file. Why does this error occur, and how can I solve it?