Test Set Size for Robustness Evaluation in Table 1

Hi, are the robust test accuracies against the PGD-10 and AA attacks in Table 1 of the paper obtained on the full test set or only on a subset of it? I noticed that in the code, the default value of 'n_ex' is set to 2000.

I am asking because I am trying to replicate your Table 1 results for standard training of Mamba on MNIST. After 100 epochs, the model reaches a clean accuracy of 98.57% (similar to what you reported in the paper), but its robust accuracy against PGD-10 is 7.05%, whereas you reported 19.35%. Could you help me understand the discrepancy?
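As a rough sanity check on whether subsampling alone could explain the gap, here is a minimal sketch of the sampling noise I would expect at the default 'n_ex'. It assumes the 2000 examples are a uniform random sample of the test set and that each example is an independent Bernoulli trial; both are my assumptions, not something I confirmed in the code. The helper name `accuracy_std_error` is my own.

```python
import math


def accuracy_std_error(acc, n):
    """Binomial standard error of an accuracy estimated from n independent examples."""
    return math.sqrt(acc * (1.0 - acc) / n)


reported = 0.1935  # PGD-10 accuracy reported in Table 1
observed = 0.0705  # PGD-10 accuracy from my run
n_ex = 2000        # default in the code; assumed to be a uniform random subset

se = accuracy_std_error(reported, n_ex)
gap = reported - observed
gap_in_se = gap / se

print(f"standard error at n_ex={n_ex}: {100 * se:.2f} percentage points")
print(f"gap: {100 * gap:.2f} percentage points, {gap_in_se:.1f} standard errors")
```

Under these assumptions the standard error is below one percentage point, so the roughly 12-point gap is more than ten standard errors wide. Subset size by itself therefore seems unlikely to account for the difference, which is why I suspect a difference in attack settings or training configuration.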

Thanks!

Biqing-Qi / Exploring-Adversarial-Robustness-of-Deep-State-Space-Models, issue #4