PhilipQuirke / quanta_maths

Tool used to verify the accuracy of transformer models
Apache License 2.0

Change terminology: "1M Qs" to "six 9s" #36

Closed: PhilipQuirke closed this issue 6 months ago

PhilipQuirke commented 6 months ago

The phrase "1M Qs" (1 million questions answered correctly) is used in the Colabs and in paper 2. Paper 2 gives the impression that, although we have only run 1M Qs successfully, we expect we could also successfully run 10M Qs. One reviewer called "1M Qs" arbitrary and railed against the assumption that a model can be completely accurate.

Some of the newly added models (using different seeds) have failed on 1 or 2 questions in 1M. Because the 1M run stops at the first error, that single failure might be found after, say, 200K questions, which does not give a good picture of how accurate (or inaccurate) the model actually is.
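To illustrate the problem with stopping at the first error, here is a minimal sketch in Python. It is not the quanta_maths test harness: `first_error_index`, `count_errors` and the toy model `buggy_add` are hypothetical names, and the "model" is ordinary addition with two deliberately wrong inputs. The early-stopping run reports only where the first failure happened, while the full run counts every failure so an error rate can be reported.

```python
from typing import Callable, Hashable, List, Optional, Tuple

# A test case is (question, expected_answer); the types are illustrative.
Case = Tuple[Hashable, Hashable]


def first_error_index(cases: List[Case], predict: Callable) -> Optional[int]:
    """Early-stopping run: return the index of the first wrong answer, or None."""
    for i, (question, expected) in enumerate(cases):
        if predict(question) != expected:
            return i  # stops here; any later failures are never seen
    return None


def count_errors(cases: List[Case], predict: Callable) -> Tuple[int, int]:
    """Full run: evaluate every case and return (errors, total)."""
    errors = 0
    for question, expected in cases:
        if predict(question) != expected:
            errors += 1
    return errors, len(cases)


# Hypothetical toy "model": correct addition except on two specific inputs.
def buggy_add(question: Tuple[int, int]) -> int:
    a, b = question
    if (a, b) in {(42, 17), (99, 1)}:
        return a + b + 1  # simulated rare failure
    return a + b


cases = [((a, b), a + b) for a in range(100) for b in range(100)]
print(first_error_index(cases, buggy_add))  # 4217
print(count_errors(cases, buggy_add))       # (2, 10000)
```

The early-stopping result (4217) says nothing about how many other failures exist, whereas the full count (2 in 10,000) can be turned directly into an accuracy figure.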

The IT industry measures reliability (say, data-centre uptime) using terms like "five 9s", meaning 99.999%. This is a well-known industry term, and it makes clear that "five 9s" allows 0.001% downtime. We should move to this terminology and approach for model accuracy, rather than continue to use our own invented term. Specifically:
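The nines arithmetic can be sketched as follows. "k 9s" means accuracy of at least 1 - 10^-k, i.e. errors / total <= 10^-k. The function name `count_nines` is hypothetical, and the treatment of a zero-error run (capping the claim at the resolution of the sample, as if one error had occurred) is an assumption of this sketch, not a rule stated in the issue.

```python
def count_nines(errors: int, total: int) -> int:
    """Return k such that accuracy >= 1 - 10**-k, i.e. "k 9s".

    Uses integer arithmetic (errors * 10**k <= total) to avoid
    floating-point rounding at exact powers of ten.
    """
    if total <= 0 or not 0 <= errors <= total:
        raise ValueError("need total > 0 and 0 <= errors <= total")
    # Assumption: a run with zero errors is capped at the resolution of the
    # sample (treated like one error), since N questions cannot demonstrate
    # an error rate much below 1/N.
    errors = max(errors, 1)
    k = 0
    while errors * 10 ** (k + 1) <= total:
        k += 1
    return k


print(count_nines(1, 1_000_000))  # 6 -> "six 9s" (99.9999%)
print(count_nines(2, 1_000_000))  # 5 -> "five 9s" (99.9998%)
print(count_nines(0, 1_000_000))  # 6 -> a clean 1M run supports "six 9s"
```

Under this reading, a model that fails 1 or 2 questions in 1M still earns "five 9s" or "six 9s", and a clean 1M run maps onto "six 9s" rather than implying unlimited accuracy.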

Claiming that a model has "five 9s" or "six 9s" accuracy is still a very impressive claim for a transformer model.

PhilipQuirke commented 6 months ago

Mostly done. Still need to update arXiv.

PhilipQuirke commented 6 months ago

Closing, as the arXiv update now depends on other things (figure updates).