Current State
Currently, when running a large number of test cases (e.g., 5,000 or more) with the LangTest library, model responses can be lost if the kernel restarts or an API call fails. All results for test cases that were already processed are then discarded, and the run must start over from scratch.
Suggested Solution
During the execution of harness.run(), periodically save the model responses to a checkpoint file.
After a kernel restart or API failure, users can then resume execution from the last checkpoint, so the responses that were already collected are not lost.