Question on metrics of three runs in the dataset.

Dear researchers, I am really impressed by your work on this benchmark. I have a few questions that I hope you can help clarify. The data for a single architecture appears to be organized like this:

>>> dataset.res['cifar10']['6992']
{'val_acc': {'0': 0.7181000113487244,
  '1': 0.7139999866485596,
  '2': 0.7134000062942505,
  'threeseed': 0.7151666680971781},
 'averob': {'0': 0.5323750004172325,
  '1': 0.534000001847744,
  '2': 0.5349749997258186,
  'threeseed': 0.5337833339969317},
 'val_fgsm_3.0_acc': {'0': 0.6198999881744385,
  '1': 0.6247000098228455,
  '2': 0.6238999962806702,
  'threeseed': 0.6228333314259847},
 'val_pgd_3.0_acc': {'0': 0.6165000200271606,
  '1': 0.6207000017166138,
  '2': 0.6212999820709229,
  'threeseed': 0.6195000012715658},
 'val_fgsm_8.0_acc': {'0': 0.4632000029087066,
  '1': 0.4634000062942505,
  '2': 0.4656000137329101,
  'threeseed': 0.46406667431195575},
 'val_pgd_8.0_acc': {'0': 0.4298999905586242,
  '1': 0.4271999895572662,
  '2': 0.4291000068187713,
  'threeseed': 0.42873332897822053},
 'autoattack': 0.387499988079071}

The accuracy for one seed (which I assume corresponds to one set of trained weights) is not exactly a four-decimal number, even though the test set has 10k images and a single evaluation should therefore give a multiple of 0.0001. Were the same weights evaluated multiple times, with the mean value reported?
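
One possible explanation that can be checked locally: the trailing digits may simply be float32 rounding of a four-decimal value, rather than an average over several evaluations. The sketch below is only a sanity check under that assumption (it is not taken from the benchmark code). It uses the `val_acc` values printed above; the helper names `to_float32` and `implied_correct` are hypothetical.

```python
import math
import struct

# val_acc values copied from the dump above.
val_acc = {
    "0": 0.7181000113487244,
    "1": 0.7139999866485596,
    "2": 0.7134000062942505,
    "threeseed": 0.7151666680971781,
}
SEEDS = ["0", "1", "2"]
TEST_SET_SIZE = 10_000  # assumption: CIFAR-10 test set, as stated in the question


def to_float32(x: float) -> float:
    """Round x to the nearest float32, then widen it back to a Python float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def implied_correct(acc: float) -> int:
    """Number of correct predictions implied by an accuracy on 10k images."""
    return round(acc * TEST_SET_SIZE)


for seed in SEEDS:
    n = implied_correct(val_acc[seed])
    # If the stored value is just n / 10000 kept in float32, these should agree.
    matches = math.isclose(to_float32(n / TEST_SET_SIZE), val_acc[seed], abs_tol=1e-9)
    print(f"seed {seed}: {n} correct, float32 round-trip matches: {matches}")

# Check whether 'threeseed' is the plain mean of the three per-seed values.
mean = sum(val_acc[s] for s in SEEDS) / len(SEEDS)
print("threeseed is the mean:", math.isclose(mean, val_acc["threeseed"], rel_tol=1e-12))
```

If every line reports a match, the stored numbers would be consistent with a single evaluation per seed whose result was kept in single precision, with `threeseed` being the average over seeds. That is only consistent with the data shown here; it would still be good to have the authors confirm it.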

TT2408 / nasrobbench201