Reduce run-time of unit-tests by using less extensive testdata samples (and fewer combinations)

martinschwinzerl commented 5 years ago

Just an FYI with some information regarding this issue:

I did some preliminary benchmarking over the weekend. The vast majority of the time is actually spent compiling the programs/kernels rather than performing any tracking or computations.

On my workstation, compiling the file that contains all four tracking kernels*) takes roughly 17-20 s on AMDGPU-PRO based OpenCL nodes. NVIDIA seems to be a lot faster and does some caching. POCL does not seem to do any caching, but it does not contribute much run-time overhead anyway.

Since a track_job is recreated from scratch for every unit-test (and for some configurations even several times within a single test, e.g. in test_track_job_opencl.py), being able to cache the compiled binaries, at least for the duration of the test run, would cut down the run-time dramatically. This could also benefit use-cases where track-jobs have to be recreated in production code (?). Split kernels and kernels created on demand would benefit from such caching as well.
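A minimal sketch of such an in-process cache, assuming compiled binaries can be memoised per (device, kernel-source hash, build options) for the lifetime of one test process. KernelBinaryCache and compile_fn are hypothetical names; compile_fn stands in for whatever backend call actually builds the program and is not an existing sixtracklib API.

```python
import hashlib
from typing import Callable, Dict, Tuple

# Cache key: (device id, SHA-256 of the kernel source, build options).
CacheKey = Tuple[str, str, str]


class KernelBinaryCache:
    """Keeps compiled program binaries alive for one test process.

    compile_fn is a hypothetical placeholder for the expensive backend
    build step; it is not a sixtracklib API.
    """

    def __init__(self, compile_fn: Callable[[str, str, str], bytes]) -> None:
        self._compile_fn = compile_fn
        self._binaries: Dict[CacheKey, bytes] = {}
        self.compilations = 0  # number of real (uncached) builds

    def get(self, device_id: str, source: str, options: str = "") -> bytes:
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        key = (device_id, digest, options)
        if key not in self._binaries:
            # Cache miss: pay the compilation cost once per key.
            self._binaries[key] = self._compile_fn(device_id, source, options)
            self.compilations += 1
        return self._binaries[key]


def fake_compile(device_id: str, source: str, options: str) -> bytes:
    # Stands in for a 17-20 s build; just returns a dummy "binary".
    return f"{device_id}|{options}|{len(source)}".encode("utf-8")


cache = KernelBinaryCache(fake_compile)
kernel_source = "__kernel void track_particles() {}"
for _ in range(3):  # e.g. three track jobs created by three tests
    cache.get("opencl:0.0", kernel_source)
print(cache.compilations)  # 1
```

Keying on the source hash and the build options means that a changed kernel or a different flag set still triggers a rebuild. A persistent on-disk variant would presumably also have to key on the driver version, which this sketch ignores.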

Alternatively, TrackJobs already have a reset() API which is intended to change the buffers (particles, beam_elements, etc.) and the output buffer configuration while keeping all machine/hardware related state (like the compiled headers) alive. This API is available, has been tested to work in principle, and would allow at least all tracking test cases in a single unit-test file to run without recompiling.

The disadvantage of this approach is that different tracking-related test cases are currently fully isolated from each other, since we throw away the kernel and the track-job instance after each call. If we keep the track-job, we risk coupling tests with each other (i.e. a test passes by accident because of something we ran before, while the actual example does not work on its own). Also, some work is involved in restructuring and rebuilding the unit-tests to allow the use of reset().
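One way to keep the reset() approach while still catching accidentally correlated tests could look like the sketch below. FakeTrackJob and its toy track() are hypothetical stand-ins (the real TrackJob constructor and reset() signatures may differ). The idea: the expensive construction happens once, every test case starts with reset(), and each case is cross-checked against a freshly built job.

```python
from typing import Callable, List


class FakeTrackJob:
    """Hypothetical stand-in for a TrackJob (not the sixtracklib API)."""

    builds = 0  # counts expensive constructions (kernel compilations)

    def __init__(self, device_id: str) -> None:
        self.device_id = device_id
        FakeTrackJob.builds += 1
        self.particles: List[float] = []
        self.beam_elements: List[Callable[[float], float]] = []
        self.output: List[List[float]] = []

    def reset(self, particles, beam_elements) -> None:
        # Swap buffers and clear the output, keep the "compiled" state.
        self.particles = list(particles)
        self.beam_elements = list(beam_elements)
        self.output = []

    def track(self) -> List[float]:
        # Toy tracking: apply every element to every particle in turn.
        for element in self.beam_elements:
            self.particles = [element(x) for x in self.particles]
        self.output.append(list(self.particles))
        return self.particles


def case_drift(job: FakeTrackJob) -> List[float]:
    job.reset([0.0, 1.0], [lambda x: x + 0.5])
    return job.track()


def case_scale(job: FakeTrackJob) -> List[float]:
    job.reset([1.0, 2.0], [lambda x: 2.0 * x])
    return job.track()


cases = [case_drift, case_scale]
shared = FakeTrackJob("opencl:0.0")  # "compiled" once for all cases
results = [case(shared) for case in cases]

# Isolation check: each case must give the same answer on a fresh job,
# otherwise it only passed because of something that ran before it.
for case, result in zip(cases, results):
    assert case(FakeTrackJob("opencl:0.0")) == result

print(results)  # [[0.5, 1.5], [2.0, 4.0]]
```

The fresh-job cross-check costs one extra build per case, so it would probably only be enabled in an opt-in or nightly run rather than in every local test run.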

Currently, the complete set of unit-tests takes 10-12 min on my machine in debug mode and a bit less in release mode. It's not super-urgent, but with additional tests (especially TrackJob-related tests) being added now, it will get longer.

*) See the track_particles files in sixtracklib/opencl/kernels for reference. Currently, all four tracking kernels are included in a single file. According to my very unscientific tests, splitting them up would incur roughly the same compilation cost for each of the separate kernels.

martinschwinzerl commented 4 years ago

The suggestion is to use a SIXTRACKLIB_DEVICES environment variable to select the device used during unit-tests. The following cases would have to be handled:

Environment variable is not set at all -> use the first device of the first platform available (or the default device of the first discovered platform)
Environment variable is set to a specific device, e.g. export SIXTRACKLIB_DEVICES="0.0" -> use the current architecture (opencl, cuda, etc.) and device 0.0, and only perform tests on this device
Environment variable is set to "all" -> run the test on all applicable devices
Optional: Environment variable is set to a list of devices -> run the test on all applicable devices in the list, e.g. export SIXTRACKLIB_DEVICES="opencl:0.0, cuda:0.0", etc.
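The four cases above could be handled by a small helper along these lines. This is a sketch under assumptions: device ids are taken to be strings of the form "arch:platform.device" in discovery order, "applicable" is read as "belongs to the architecture under test", and select_test_devices is a hypothetical name, not an existing sixtracklib function.

```python
import os
from typing import List, Mapping, Optional, Sequence

ENV_VAR = "SIXTRACKLIB_DEVICES"


def select_test_devices(available: Sequence[str], current_arch: str,
                        env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the device ids a unit-test should run on.

    available: ids of the assumed form "arch:platform.device"
    (e.g. "opencl:0.0"), in discovery order.
    """
    env = os.environ if env is None else env
    # Only devices of the architecture under test are applicable.
    applicable = [d for d in available if d.split(":", 1)[0] == current_arch]
    value = env.get(ENV_VAR, "").strip()

    if not value:  # not set: first discovered device
        return applicable[:1]
    if value.lower() == "all":  # "all": every applicable device
        return list(applicable)

    selected: List[str] = []
    for item in value.split(","):  # single id or comma-separated list
        item = item.strip()
        if not item:
            continue
        if ":" not in item:  # bare "0.0" -> current architecture
            item = f"{current_arch}:{item}"
        if item in applicable and item not in selected:
            selected.append(item)
    return selected


devices = ["opencl:0.0", "opencl:0.1", "opencl:1.0", "cuda:0.0"]
print(select_test_devices(devices, "opencl", {}))  # ['opencl:0.0']
print(select_test_devices(devices, "opencl", {ENV_VAR: "0.1"}))  # ['opencl:0.1']
print(select_test_devices(devices, "cuda", {ENV_VAR: "opencl:0.0, cuda:0.0"}))  # ['cuda:0.0']
```

Entries that don't match an applicable device are silently skipped here; whether to warn or fail instead is left open.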

rdemaria commented 4 years ago

Could we also use this mechanism to choose the device in the trackjob if we don't specify a device?

martinschwinzerl commented 4 years ago

Yes, I was thinking about this as well -> I would add support to the Context / TrackJob implementation rather than tacking it onto every unit-test; that way the behaviour would always be the same.
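Moving the lookup into the Context / TrackJob construction could look roughly like this: an explicitly requested device wins, otherwise the first entry of SIXTRACKLIB_DEVICES is used, otherwise the first discovered device. resolve_device is a hypothetical helper, not an existing sixtracklib function; the precedence order and the use of fully qualified ids like "opencl:0.0" are assumptions.

```python
import os
from typing import Mapping, Optional, Sequence


def resolve_device(available: Sequence[str], requested: Optional[str] = None,
                   env: Optional[Mapping[str, str]] = None) -> str:
    """Pick one device id for a Context / TrackJob (hypothetical helper)."""
    env = os.environ if env is None else env
    candidate = requested or ""
    if not candidate:
        value = env.get("SIXTRACKLIB_DEVICES", "").strip()
        if value and value.lower() != "all":
            # A single job needs one device: take the first listed entry.
            candidate = value.split(",")[0].strip()
    if candidate:
        if candidate not in available:
            raise ValueError(f"device {candidate!r} is not available")
        return candidate
    if not available:
        raise RuntimeError("no devices available")
    return available[0]  # default: first discovered device


devices = ["opencl:0.0", "opencl:1.0"]
print(resolve_device(devices, env={}))  # opencl:0.0
print(resolve_device(devices, env={"SIXTRACKLIB_DEVICES": "opencl:1.0"}))  # opencl:1.0
print(resolve_device(devices, "opencl:0.0",
                     env={"SIXTRACKLIB_DEVICES": "opencl:1.0"}))  # opencl:0.0
```

With this in place, unit-tests and user code that don't pass a device would end up on the same device, which is the consistency argument made above.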

martinschwinzerl commented 4 years ago

By merging #108 and #111 and excluding only those devices/platforms which incur very long compilation times for the OpenCL kernels, the overall duration of running all unit-tests has shrunk from approx. 980 s to 148 s on my developer machine. There is still a lot of room for improvement, but keeping the run-time below 3 minutes should hopefully be acceptable for the time being. Please re-open if you think the run-times are still, or are becoming, excessive.

SixTrack / sixtracklib

Reduce run-time of unit-tests by using less extensive testdata samples (and fewer combinations) #47