Closed danielz02 closed 9 months ago
I temporarily added an option for `max_tokens` in the fairness configuration (c9e51f93a7029bc5568d7ac0da0414dfd75b9883). In the future, we should refactor this argument into `GenerationConfig`.

We also sometimes fail to parse the results of the crime task. The examples below are from `crime_br_0.0` and fail with an error similar to the one above.
To reproduce, run:

```
dt-run +model_config=hf ++model_config.model=hf/meta-llama/Llama-2-13b-chat-hf +fairness=crime_br_0.0
```
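If the temporary `max_tokens` option can be overridden from the command line like the other Hydra parameters, raising the limit for this model might look like the following. The exact key name `fairness.max_tokens` is an assumption on my part, not confirmed against the repository:

```
dt-run +model_config=hf ++model_config.model=hf/meta-llama/Llama-2-13b-chat-hf +fairness=crime_br_0.0 ++fairness.max_tokens=128
```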
**Describe the bug**

(Initially discovered and reported by UT Austin's VITA group.) For models with verbose outputs, the `max_tokens=20` setting in the fairness perspective is too small. This leads to truncated predictions and incorrect result parsing. In addition, the fairness scoring metrics lack certain keywords, which can also be a problem when the model does not follow instructions.

**To Reproduce**
**Example Outputs**
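(The original example outputs were attached as screenshots.) To illustrate the failure mode, here is a minimal sketch of keyword-based answer extraction — this is not the actual DecodingTrust scoring code; the function name and keyword lists are illustrative assumptions:

```python
import re

# Hypothetical sketch -- NOT the project's implementation. Maps a free-form
# model response to a binary label, or None when no keyword decides it.
def parse_prediction(output,
                     pos_keywords=("yes", "greater"),
                     neg_keywords=("no", "less")):
    tokens = set(re.findall(r"[a-z]+", output.lower()))
    has_pos = any(k in tokens for k in pos_keywords)
    has_neg = any(k in tokens for k in neg_keywords)
    if has_pos == has_neg:  # both or neither matched: unparsable
        return None
    return 1 if has_pos else 0

# A verbose answer cut off by a small max_tokens never reaches the keyword:
truncated = "Sure! Based on the information provided, I would say that the"
print(parse_prediction(truncated))                        # None (parse failure)
print(parse_prediction("The income is less than $50k."))  # 0
```

This shows both halves of the bug: a truncated verbose response contains no decisive keyword at all, and if `less` were missing from the negative keyword list, even a complete answer like the second one would go unparsed.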
**Expected behavior**

Responses should be parsed correctly; currently, `less` is not included as a keyword in the fairness scoring.

**Proposed Fix**

- Increase the `max_tokens` setting for `Llama-2-13b-chat-hf`.
- Add the missing keywords in `src/dt/perspectives/fairness/score_calculation_script.py`.

**Environment:**