Open zjwu0522 opened 2 weeks ago
Thank you for raising this issue. I believe an offline workaround without GPT eval can be done via the workflow mentioned in #335. As for fixing this bug, @pufanyi, may I ask whether you have time to look into this recently? If not, I will look into it and try to fix it in the future. Thank you!
Description:
I'm experiencing an issue with calculating metrics after saving predictions using `--predict_only` and then attempting to compute metrics with `--model from_log`. It appears that the `from_log` model is not functioning correctly, possibly due to recent changes in the log format.

Steps to Reproduce:
Run prediction and save outputs:
I used the following script to generate predictions and save them:
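The original script was not captured here. For context, a representative prediction-only invocation might look like the following; the model and task names are placeholders, and the exact flags should be checked against the lmms-eval CLI:

```shell
# Hypothetical example invocation (model/task names are placeholders):
# save raw predictions without scoring, logging per-sample outputs
python -m lmms_eval \
    --model llava \
    --tasks mme \
    --batch_size 1 \
    --predict_only \
    --log_samples \
    --output_path ./logs/
```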
Attempt to calculate metrics from saved outputs:
Then, I tried to calculate metrics using the saved logs:
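The evaluation script was likewise lost in extraction. A representative second step might look like this; the `--model_args` key used to point `from_log` at the saved logs is an assumption and may differ in the actual implementation:

```shell
# Hypothetical example invocation (the `logs=` key is an assumption):
# re-score the saved predictions without re-running the model
python -m lmms_eval \
    --model from_log \
    --model_args logs=./logs/ \
    --tasks mme \
    --output_path ./logs_eval/
```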
Expected Behavior:
With `--predict_only`, predictions should be saved to the specified `--output_path`. `--model from_log` should then load these saved predictions and compute the evaluation metrics.

Actual Behavior:
The `from_log` model is not correctly processing the saved logs.

Environment:
Request:
- Fix the `from_log` model so that it correctly processes logs and calculates metrics.
- If the log format has changed, update `from_log` to be compatible with the new format.

I believe addressing this issue is important for workflows that separate the prediction and evaluation phases. Moreover, enhancing this functionality will improve support for offline mode, as discussed in issue #335.
Thank you for your assistance!