bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

The evaluation results are inconsistent across different GPUs #252

Open DonteFlynn opened 2 months ago

DonteFlynn commented 2 months ago

command:

[image: screenshot of the command]

result:

[image: screenshot of the evaluation results]

Why do the same model and the same command yield inconsistent results on different GPUs?