Closed: jefflink closed this issue 1 year ago
Hi @jefflink, thank you for your interest in this project! I suspect the program was killed due to a lack of CPU memory: the evaluation process currently converts the predictions into a numpy array and keeps that array in CPU memory for further computations (see L.141-161 in run.py). Since the distantly supervised data is large, this step can require a lot of CPU memory.
We adopted this part of the code directly from the codebase without modification, but it seems possible to optimize it to save some memory. I will try to improve its memory efficiency later.
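For context, the memory-heavy pattern is roughly of the following shape. This is only a minimal sketch, not the actual code in run.py; the function name, the batch format, and the tensor shapes here are placeholders. The point is that every batch's predictions end up in a single float32 numpy array on the CPU, which grows with the number of entity pairs in train_distant.json.

```python
import numpy as np
import torch

def evaluate(model, dataloader, device):
    """Illustrative sketch: all batch predictions are accumulated
    into one numpy array in CPU memory."""
    preds = []
    model.eval()
    with torch.no_grad():
        for batch in dataloader:
            inputs = {k: v.to(device) for k, v in batch.items()}
            logits = model(**inputs)            # (pairs_in_batch, num_classes), placeholder call
            preds.append(logits.cpu().numpy())  # each batch is moved to CPU memory
    # With millions of entity pairs, this single concatenated array
    # can exceed the available RAM.
    preds = np.concatenate(preds, axis=0).astype(np.float32)
    return preds
```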
Until then, one workaround is to split train_distant.json into several parts, run the inference script on each part, and then concatenate the evaluated results, for example along the lines of the sketch below.
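Something like the following could handle the splitting and merging. It is a rough sketch under a few assumptions that are not verified against the repository: that train_distant.json is a JSON list of documents, that each part's inference output is also a JSON list, and that the file names used here are acceptable; adjust them to match the actual output format of the inference script.

```python
import json

def split_dataset(path="train_distant.json", n_parts=4):
    """Split the distantly supervised data into n_parts smaller JSON files."""
    with open(path) as f:
        data = json.load(f)  # assumed to be a list of documents
    chunk = (len(data) + n_parts - 1) // n_parts
    for i in range(n_parts):
        with open(f"train_distant_part{i}.json", "w") as f:
            json.dump(data[i * chunk:(i + 1) * chunk], f)

def merge_results(paths, out_path="train_distant_results.json"):
    """Concatenate per-part result files (assumed to be JSON lists) back into one file."""
    merged = []
    for p in paths:
        with open(p) as f:
            merged.extend(json.load(f))
    with open(out_path, "w") as f:
        json.dump(merged, f)
```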
I hope this is clear and helps you solve your problem.
Thanks @YoumiMa. I'll take a look at the script. I'm just surprised that even 128GB of RAM is not sufficient.
@jefflink Hi, I wonder if you ever solved this problem? I'm running into exactly the same situation. Thank you!
@Winson-Huang Hi! Unfortunately, I did not manage to fix it.
Can I check whether running the infer_distant script requires a lot of RAM? I have a 48GB GPU card and 128GB of RAM, but running either infer_distant_bert or infer_distant_roberta results in the Python process being killed while evaluating batches. For example: