Goddard-Fortran-Ecosystem / pFUnit

Parallel Fortran Unit Testing Framework

Improve robust configuration #8

Open tclune opened 6 years ago

tclune commented 6 years ago

A side discussion with @MatthewHambley suggests that it would be useful to have the RobustRunner somehow run tearDown() for tests that crashed or hung. At the very least, this would help in the common case where a crashing test leaves temp files lying around, which then cause the open statements in subsequent setUp() invocations to fail.

MatthewHambley commented 5 years ago

It's been a while since I looked at this, but from memory the purpose of "robust" mode is to run each test in a sub-thread of the suite. This way an error in the test does not halt the application thread that is running the suite, so testing can continue past the crash.

To ensure that the teardown function can still be called when a crash does occur, I think the test must be run in a sub-sub-thread. This obviously increases the number of thread creation/destruction events, which will slow down execution, but it is the simplest solution I can think of. It may be possible to optimise this in the future using a pool of worker threads.

tclune commented 5 years ago

The approach you are advocating cannot be done with traditional MPI or OpenMP, neither of which really supports recovery after a crash. In theory, fault-tolerant MPI (or Coarray Fortran) could provide a robust mechanism here.

And I think we decided in a separate thread that this is really only a meaningful issue for external resources such as created files. One clearly cannot deallocate memory or tear down an object on a process that has failed. Please remind me if you have scenarios other than file creation/deletion in mind.

One approach to this issue would be to allow another annotation that decorates the proxy test that drives the real test. For example, the proxy could run something like setup() to create some files, ping the backend to run the actual test, and then run something like teardown() to release the resources. At first glance this does not seem hard. Of course, when using the regular (non-robust) driver, both kinds of setup() and teardown() would need to be performed. A bit inelegant perhaps, but still not that big a change.
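A minimal sketch of the proxy-level setup/teardown idea, written in Python rather than Fortran purely for illustration. It is not pFUnit's API. The function run_with_proxy_fixture and the scratch-file fixture are hypothetical. A separate OS process stands in for however the robust runner actually isolates the real test. The proxy creates an external resource, runs the test in a child process, and removes the resource in a finally block. Cleanup therefore happens whether the child passes, exits abruptly, or hangs past a timeout.

```python
import os
import subprocess
import sys
import tempfile


def run_with_proxy_fixture(test_code, timeout=10.0):
    """Hypothetical proxy: set up an external resource, run the real test in a
    child process, then always tear the resource down.

    Returns (status, path): status is the child's exit code, or None if it hung
    and was killed; path is the scratch file the proxy created (and removed).
    """
    # Proxy-level "setup()": create an external resource (a scratch file).
    fd, path = tempfile.mkstemp(prefix="robust_demo_")
    os.close(fd)
    try:
        try:
            # The real test runs in a separate process, so an abrupt exit
            # there cannot take the proxy down. The path is passed as argv[1].
            result = subprocess.run(
                [sys.executable, "-c", test_code, path],
                capture_output=True,
                timeout=timeout,
            )
            status = result.returncode
        except subprocess.TimeoutExpired:
            # Hung test: subprocess.run has already killed the child.
            status = None
        return status, path
    finally:
        # Proxy-level "teardown()": runs whether the child passed,
        # exited abruptly, or hung.
        if os.path.exists(path):
            os.remove(path)


if __name__ == "__main__":
    # os._exit skips the child's own cleanup, standing in for a crash.
    crash = "import os, sys; open(sys.argv[1], 'w').write('partial'); os._exit(70)"
    status, path = run_with_proxy_fixture(crash)
    print(status, os.path.exists(path))
```

As noted above, only external resources such as this file can be cleaned up this way. Memory and objects owned by the failed process are simply gone.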