Extracts hidden states without applying templates or building contrast tuples
Can be used with eval by specifying the magic dataset “raw” and passing --data_dir
Doesn't support few-shot examples; balancing is on by default (though it is now optional everywhere); streaming is not supported (enforced in PromptConfig's __post_init__)
Adds support for inference without contrast tuples in Reporter
Renames score to score_contrast_tuple
I'm not sure whether I should instead make them a single function that behaves differently depending on the shape of the input
The dataset provided via --data_dir must contain a string column “text” and a binary column “label”, and it must not have any splits
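As a rough illustration of the schema requirement above, a plain-Python check could look like the sketch below. `validate_raw_rows` is a hypothetical helper, not something this PR adds, and it works on a list of dicts rather than on a real dataset object.

```python
def validate_raw_rows(rows):
    """Check that every row has a string "text" and a binary "label".

    Hypothetical helper for illustration only; not part of this PR.
    """
    for i, row in enumerate(rows):
        if not isinstance(row.get("text"), str):
            raise ValueError(f"row {i}: 'text' must be a string")
        label = row.get("label")
        # bool is a subclass of int, so reject it explicitly
        if not isinstance(label, int) or isinstance(label, bool) or label not in (0, 1):
            raise ValueError(f"row {i}: 'label' must be 0 or 1")
    return True


rows = [
    {"text": "The sky is blue.", "label": 1},
    {"text": "The sky is green.", "label": 0},
]
validate_raw_rows(rows)
```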
In this mode, the total log-probability the LM assigns to the text is also computed
That way you can perform ~whatever analyses you want by defining the input dataset and reading the output CSV
I prepend tokenizer.bos_token to the input so that the total log-probability can be computed (the first token needs a preceding position to be predicted from). Will this always work and be in distribution?
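To make the role of the BOS token concrete, here is a minimal pure-Python sketch of summing next-token log-probabilities. It is not the code in this PR: `total_logprob` and the toy logits are assumptions for illustration, and a real implementation would take the logits from the model's forward pass.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def total_logprob(token_ids, logits_per_position):
    """Sum log p(token_ids[i] | token_ids[:i]) for i >= 1.

    logits_per_position[i] holds the next-token logits produced after
    reading token_ids[: i + 1]. token_ids[0] is never scored, which is
    why a BOS token is prepended: it gives the first real token a
    position to be predicted from.
    """
    total = 0.0
    for i in range(1, len(token_ids)):
        total += log_softmax(logits_per_position[i - 1])[token_ids[i]]
    return total


# Toy vocabulary of 4 tokens; id 0 plays the role of BOS (an assumption).
uniform = [[0.0, 0.0, 0.0, 0.0]] * 4
with_bos = total_logprob([0, 1, 2, 3], uniform)   # scores tokens 1, 2, 3
without_bos = total_logprob([1, 2, 3], uniform)   # scores only tokens 2, 3
```

Without the prepended BOS, the first token of the text contributes nothing to the sum, so texts would be scored on all but their first token.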
Adds a base_fingerprint argument to the builder, which reads the fingerprint of the raw dataset to improve caching as the raw datasets are modified
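The general idea of keying a cache on the raw dataset's fingerprint can be sketched as follows. This is only an illustration under assumptions: `fingerprint`, `cache_key`, and `extract_cached` are hypothetical names, the hash is a stand-in for the dataset's actual fingerprint, and how the builder really consumes base_fingerprint is not shown here.

```python
import hashlib
import json


def fingerprint(rows):
    """Content hash of the raw rows (stand-in for a dataset fingerprint)."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


def cache_key(builder_name, base_fingerprint):
    """Combine the builder name with the raw dataset's fingerprint."""
    return f"{builder_name}-{base_fingerprint}"


_cache = {}


def extract_cached(rows, extract_fn, builder_name="raw"):
    """Run extract_fn only if these exact rows have not been seen before."""
    key = cache_key(builder_name, fingerprint(rows))
    if key not in _cache:
        _cache[key] = extract_fn(rows)
    return _cache[key]
```

Because the key changes whenever the raw rows change, an edited dataset triggers a fresh extraction instead of silently reusing stale results.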
Adds support for saving predictions to an output directory with --preds_out_dir