Closed ckald closed 3 years ago
For prediction, the code should already support reading JSONL input in a streaming way.
See line 98 in the predict command, and allennlp's _get_json_data.
Our code currently reads the data twice: first it counts the number of lines in the input at this line. If you want to make it fully streamable, remove that line and remove `total_size` from the tqdm call (lines 92 and 97, respectively).
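To illustrate the single-pass idea, here is a minimal sketch, not the actual SPECTER or AllenNLP code. The helper names `stream_jsonl` and `batched` and the `paper_id`/`title` fields are hypothetical. It parses one JSON object per line as it arrives and never counts lines up front, so memory use is bounded by the batch size rather than by the input size.

```python
import io
import json
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def stream_jsonl(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one parsed JSON object per non-empty line, without reading ahead."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def batched(records: Iterator[Dict], batch_size: int) -> Iterator[List[Dict]]:
    """Group a lazy record stream into lists of at most batch_size items."""
    while True:
        batch = list(islice(records, batch_size))
        if not batch:
            return
        yield batch


# Stand-in for sys.stdin or an open JSONL file (field names are hypothetical).
sample = io.StringIO(
    '{"paper_id": "a", "title": "First"}\n'
    '{"paper_id": "b", "title": "Second"}\n'
    '\n'
    '{"paper_id": "c", "title": "Third"}\n'
)

# The input is consumed exactly once; no line count is computed beforehand.
batches = list(batched(stream_jsonl(sample), batch_size=2))
```

Passing `sys.stdin` or an open file handle instead of `sample` should work the same way. If a progress bar is wanted, the stream can be wrapped in `tqdm(...)` without a `total` argument; tqdm then shows a running count and rate instead of a percentage.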
Closing this for now. Feel free to reopen if you still have issues.
Hi! I'm trying to embed some 100M papers using SPECTER. However, there's some kind of a memory leak that makes the whole process extremely inefficient. I see that AllenNLP models support JSONL input format.
What is the simplest way to replace the `ids` and `metadata` args with a single JSONL file or stdin?