The evaluation results are inconsistent across different GPUs

bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.

Apache License 2.0

Open: DonteFlynn opened this issue 4 months ago

DonteFlynn commented 4 months ago

command:

result:

Why do the same model and the same command yield different evaluation results on different GPUs?