bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

The evaluation results are inconsistent across different GPUs #252

Open DonteFlynn opened 4 months ago

DonteFlynn commented 4 months ago

command:

image

result:

image

Why do the same model and the same command yield different evaluation results on different GPUs?