borismarin opened 9 years ago
Do you think we can get this fixed before SfN? Weren't you working on exactly this kind of thing with the OSM fork?
Well, the OSB model validator is mainly concerned with testing models, not the simulators themselves. There are some test-related rules in the Makefile; I wonder if Dave knows what those are supposed to do, and whether they can be reused somehow.
I was just looking through old notes from 2.3 beta testing, and trying to recover what we did. There was a genesis/tests/TestSuite directory that was not included in the final release. I'll come up with some tests to recommend later today.
I looked at the old TestSuite, and don't think it is worth reimplementing now. Here are some simple tests that should do for now:
Suggested tests for accuracy:
(1) Test the Scalable Portable Random Number Generator (SPRNG) with the following commands:
genesis #1 > setrand -sprng
genesis #2 > randseed 0
genesis #3 > echo {rand 0 1} {rand 0 1} {rand 0 1} {rand 0 1} {rand 0 1}
Regardless of platform, you should get the results:
0.01426654216 0.7493918538 0.007316101808 0.1527428776 0.1134621128
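A comparison against these reference values (like the Travis check discussed below) could be sketched as follows. This is only an illustration: the tolerance and the assumption that the GENESIS output line has been captured as a string are mine, not part of the original check.

```python
# Sketch of the SPRNG sanity check: compare the numbers GENESIS echoes
# against the platform-independent reference values, within a small
# absolute tolerance.
EXPECTED = [0.01426654216, 0.7493918538, 0.007316101808,
            0.1527428776, 0.1134621128]

def check_sprng(output_line, tol=1e-9):
    """output_line: the single line echoed by the GENESIS rand test."""
    values = [float(tok) for tok in output_line.split()]
    return len(values) == len(EXPECTED) and all(
        abs(a - b) <= tol for a, b in zip(values, EXPECTED))

print(check_sprng("0.01426654216 0.7493918538 0.007316101808 "
                  "0.1527428776 0.1134621128"))  # True
```

Since SPRNG is supposed to be deterministic across platforms, an exact string comparison would also work; the tolerance just guards against trailing-digit formatting differences.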
(2) Test the rallpacks 'axon.g' simulation:
cd rallpack/reports/genesis-2.0/rallpack3
run axon.g, which will produce the files axon.out0 and axon.outx
The last entries should be:

$ tail -n 5 axon.out0
0.24985 -0.0180403
0.2499 -0.0201349
0.24995 -0.0221736
0.25 -0.02416
0.25005 -0.0260978

$ tail -n 5 axon.outx
0.24985 -0.0749142
0.2499 -0.0748635
0.24995 -0.0748067
0.25 -0.0747422
0.25005 -0.0746679
I get the same results from both genesis executables, but the one produced by configure takes 0.351 CPU seconds, while the one built from the edited Makefile.dist takes 0.267 CPU seconds. (Note that the README says Upi set a record of 3.17 seconds back in 2006.)
We should look at the optimization flags.
I've adapted (f76929a) the Travis test to compare the output of sprng to the expected values mentioned above. The builds will fail if the comparison fails, so we now have a better indicator (well, at least better than just checking whether make returns 0) of the validity of the binaries. I'll eventually implement the rallpack check. BTW, @dbeeman, I'm getting different results for that on my machine (with both the autoconf and Makefile.dist methods):
$ tail -5 axon.out0
0.24985 -0.018699
0.2499 -0.020776
0.24995 -0.022798
0.25 -0.0247688
0.25005 -0.0266924
$ tail -5 axon.outx
0.24985 -0.0748959
0.2499 -0.0748435
0.24995 -0.0747845
0.25 -0.0747171
0.25005 -0.0746393
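Since the disagreement is only in the later decimal places, a tolerance-based comparison is probably more useful for the Travis check than a byte-for-byte diff. A minimal sketch, assuming the 'time value' two-column layout visible in the tails above (the tolerances are arbitrary choices, not part of any existing script):

```python
def compare_traces(lines_a, lines_b, t_tol=1e-9, v_tol=1e-3):
    """Compare two GENESIS ASCII traces, one 'time value' pair per line.
    Times must match almost exactly; values must agree within v_tol."""
    for la, lb in zip(lines_a, lines_b):
        ta, va = (float(tok) for tok in la.split())
        tb, vb = (float(tok) for tok in lb.split())
        if abs(ta - tb) > t_tol or abs(va - vb) > v_tol:
            return False
    return True

# The two machines' axon.out0 tails disagree in the 3rd decimal place:
ref  = ["0.25 -0.02416"]
mine = ["0.25 -0.0247688"]
print(compare_traces(ref, mine))              # True with the loose default
print(compare_traces(ref, mine, v_tol=1e-4))  # False with a tighter bound
```

How tight v_tol should be is exactly the open question in this thread: too loose and the check passes broken builds, too tight and it fails on legitimate cross-platform round-off.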
Do you get the same results with both methods? As you know, round-off errors are tricky when looking at the last APs in a run, so this may not be too surprising, but I'll have to look at runs on different machines. Also, the GENESIS SLI casts internal doubles to floats.
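That double-to-float cast is itself a plausible source of last-digit noise: IEEE-754 single precision keeps only about 7 significant decimal digits. A quick illustration in pure Python, using struct to emulate the cast (the specific value is just one taken from the tails above):

```python
import struct

def as_float32(x):
    """Round-trip a Python double through a 32-bit IEEE-754 float,
    emulating a double -> float cast in C."""
    return struct.unpack('f', struct.pack('f', x))[0]

v = -0.0221736                 # a membrane potential from axon.out0
err = abs(as_float32(v) - v)   # perturbation introduced by the cast
print(err < 1e-8)              # True: tiny, but nonzero in general
```

An absolute error of order 1e-9 on a ~0.02 V value is well below the differences reported above, so the cast alone does not explain them, but it does set a floor on how tight any output comparison can be.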
The configure script is not using the '-O2' option, and it links -lSM and -lICE, which are not needed on any modern Linux. These probably account for the differences in file size and speed. We should definitely use the optimization. The overhead of the two extra libs isn't so bad, but what happens if they are not installed?
The configure script now uses '-O2' as of af6b98801e1fdbedba000015e721ef34de661860; libSM and libICE were also removed in the same commit.
My output is shifted by 0.00005 (seconds?):
$ tail -n 5 axon.out0
0.24985 -0.0201349
0.2499 -0.0221736
0.24995 -0.02416
0.25 -0.0260978
0.25005 -0.0260978
and here it simply differs slightly:
$ tail -n 5 axon.outx
0.24985 -0.0748635
0.2499 -0.0748067
0.24995 -0.0747422
0.25 -0.0746679
0.25005 -0.0746679
What does that mean? (I did not try all different combinations of compilers & optimisations)
I tried this on a colleague's Linux machine (recent Ubuntu x86_64) and it produces the same numbers as in my case. Any ideas why?
We need to have a systematic way of testing the builds for correctness. Rallpack? The travis script (#21) needs to be adapted to run such tests.