DCGM / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

Harness - Truncation #2

Closed mdocekal closed 6 months ago

mdocekal commented 7 months ago

I'm about to write an issue on their GitHub.

MFajcik commented 6 months ago

We implemented an alternate version of truncation (truncating the instruction instead of the few-shot samples). This doesn't seem to help on the propaganda datasets (7 of 13 tasks got worse results for CSMPT-100k), so we keep the original truncation for now.
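
Roughly, the two variants differ as in the sketch below (a minimal illustration with hypothetical names, not the actual harness code; `tokenizer` is assumed to expose HF-style `encode`/`decode`):

```python
from typing import List

def truncate_prompt(
    instruction: str,
    fewshot_examples: List[str],
    query: str,
    max_tokens: int,
    tokenizer,
    truncate_instruction: bool = False,
) -> str:
    """Assemble a prompt, dropping content when it exceeds max_tokens.

    truncate_instruction=False: drop few-shot examples from the front
    (the original truncation behaviour).
    truncate_instruction=True: keep all few-shot examples and trim the
    instruction instead (the alternate variant described above).
    """
    def length(parts):
        # Token count of the assembled prompt, skipping empty parts.
        return len(tokenizer.encode("\n\n".join(p for p in parts if p)))

    examples = list(fewshot_examples)
    if not truncate_instruction:
        # Original strategy: remove few-shot examples until the prompt fits.
        while examples and length([instruction, *examples, query]) > max_tokens:
            examples.pop(0)
        return "\n\n".join(p for p in [instruction, *examples, query] if p)

    # Alternate strategy: keep the examples, shorten the instruction instead.
    inst_ids = tokenizer.encode(instruction)
    while inst_ids and length([tokenizer.decode(inst_ids), *examples, query]) > max_tokens:
        inst_ids = inst_ids[:-1]
    return "\n\n".join(p for p in [tokenizer.decode(inst_ids), *examples, query] if p)
```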

However, this might still be an issue for instruction-tuned models; only then would we consider opening a pull request for this.