Assuming (2) confirmed your assertion of non-normality, it seems like the paper could have much more impact if it went just a little further in its analysis, without having to run more costly experiments.
Surely the key question is whether this non-normality would have a significant effect on the underlying EA. Jin (2011, Evol.Intell.) presents a survey of methods used to assess the quality of surrogate models, which includes two measures (Rho^{sel} and Rho^{tilde sel}) of how well rank-based selection using surrogate fitness correlates with the same selection using true fitness. Would it be possible to apply these measures to quantify the effect of incorrectly assuming Gaussian-distributed noise?
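To make the suggestion concrete, here is a minimal sketch of the kind of selection-agreement measure I have in mind. It is my own simplified reading, not necessarily the exact definition in Jin's survey: it counts how many of the mu individuals chosen by truncation selection on noisy fitness would also have been chosen on true fitness, and corrects for the overlap expected from random selection (mu^2/lambda). The function names, the minimization convention, and the choice of Student-t noise as the non-Gaussian alternative are all assumptions made for illustration.

```python
import numpy as np


def selection_agreement(true_fitness, noisy_fitness, mu):
    """Chance-corrected overlap between the mu individuals picked by
    truncation selection on true fitness and on noisy fitness.

    Assumes minimization. Returns 1.0 for identical selections and about
    0.0 when the noisy selection is no better than random.
    """
    true_fitness = np.asarray(true_fitness, dtype=float)
    noisy_fitness = np.asarray(noisy_fitness, dtype=float)
    lam = true_fitness.size
    if noisy_fitness.size != lam:
        raise ValueError("fitness arrays must have the same length")
    if not 0 < mu < lam:
        raise ValueError("mu must satisfy 0 < mu < lambda")

    # Indices of the mu best individuals under each fitness.
    best_true = set(np.argsort(true_fitness, kind="stable")[:mu].tolist())
    best_noisy = set(np.argsort(noisy_fitness, kind="stable")[:mu].tolist())

    overlap = len(best_true & best_noisy)
    expected = mu * mu / lam  # expected overlap under random selection
    return (overlap - expected) / (mu - expected)


def compare_noise_models(lam=100, mu=20, scale=0.5, trials=200, seed=0):
    """Hypothetical comparison: mean agreement under Gaussian noise versus
    heavy-tailed Student-t noise (df=3) with the same scale factor."""
    rng = np.random.default_rng(seed)
    gauss, heavy = [], []
    for _ in range(trials):
        f_true = rng.normal(size=lam)  # stand-in for true fitness values
        gauss.append(selection_agreement(
            f_true, f_true + scale * rng.normal(size=lam), mu))
        heavy.append(selection_agreement(
            f_true, f_true + scale * rng.standard_t(df=3, size=lam), mu))
    return float(np.mean(gauss)), float(np.mean(heavy))
```

In the paper's setting, the stand-in fitness values and synthetic noise would be replaced by the recorded evaluations from the existing experiments, so no additional costly runs should be needed.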