ORNL-Fusion / ips-wrappers

IPS wrapper / helper codes.

Have IPS copy and store binaries - and be able to re-run from them? #35

Open dlg0 opened 9 years ago

dlg0 commented 9 years ago

@bernhold @elwasif @batchelordb

So an interesting use case occurred today. @murakamim has two FASTRAN runs he is comparing, but they use different values of a parameter that I think is hardwired into the binary itself. Masanori requested a feature, @parkjm added that feature, and then Masanori re-ran. It turned out that a subsequent case Masanori ran now failed, one that didn't fail with the original binary.

What would be a good capability here is to re-run the case with the original binary ... which no longer exists. Of course this could be accomplished via appropriate binary versioning etc., but this is physics so that doesn't happen. One alternative that occurred to me was to have the IPS framework copy all the binaries used in the workflow execution to a simulation_binaries directory or the like, the same way all the python files are copied into the simulation_setup directory. Then have the capability to re-run an existing IPS simulation using the binaries, python files, etc. that were stored within the run directory itself, say by simply setting some config file variable like RERUN_FROM_COPIES=1.
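To make the suggestion concrete, here is a minimal sketch (in Python, since the framework is Python) of what the framework-side behaviour might look like. `stage_binaries`, the `binaries` mapping, and `RERUN_FROM_COPIES` are hypothetical names from the proposal above, not anything the IPS currently provides:

```python
import os
import shutil

def stage_binaries(binaries, sim_root, rerun_from_copies=False):
    """Snapshot every executable used in the workflow into the run
    directory, and (on a re-run) resolve components to the stored copies
    instead of the original locations.

    binaries : dict mapping component name -> path of the executable it
               launches (hypothetical; how the framework learns these
               paths is not specified here).
    """
    staged_dir = os.path.join(sim_root, 'simulation_binaries')
    if not os.path.isdir(staged_dir):
        os.makedirs(staged_dir)

    resolved = {}
    for comp, exe in binaries.items():
        copy = os.path.join(staged_dir, os.path.basename(exe))
        if not rerun_from_copies:
            # Normal run: store a copy alongside the python/setup files.
            shutil.copy2(exe, copy)
            resolved[comp] = exe
        else:
            # RERUN_FROM_COPIES=1: launch from the stored copy, not the
            # (possibly changed) original location.
            resolved[comp] = copy
    return resolved
```

On a normal run the copies would just be a record; with the flag set, the component wrappers would launch from simulation_binaries instead of the original locations.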

I recall there being something like a replay component. Is its functionality anything like what I've described?

If not, I think it could be quite useful to have the capability to re-run an existing run from only those files (the binaries, python, and input files) now stored within the run directory, rather than pull them again from their original locations (which in the above use case have now changed).

Thoughts?

parkjm commented 9 years ago

@dlg0

According to the post at https://github.com/ORNL-Fusion/ips-fastran/issues/9#issuecomment-146700930, @murakamim's problem is not related to a binary change (I have not changed the fastran binary in its public location). Anyway, your suggestion is a good thing to discuss, though I'm not sure it's possible in a practical sense.

bernhold commented 9 years ago

David,

This kind of thing comes up often in circles where people think about reproducibility of scientific results. Doing this in the general case is extremely challenging. Just a couple of examples of things that cause problems...

If your executable is dynamically linked to any libraries, in principle you need to capture those too (see the sketch after these examples).

It is common for OS, compiler, and other upgrades to force recompilation of executables. Saved executables are likely to have a limited shelf life.
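On the first example above, about dynamically linked libraries: even enumerating what would have to be captured alongside the executable is non-trivial. A rough, Linux-only sketch using `ldd`, just to illustrate how quickly the capture problem grows:

```python
import subprocess

def shared_library_deps(exe):
    """List the shared libraries an executable is dynamically linked
    against, as resolved by `ldd` (Linux only; purely illustrative)."""
    out = subprocess.check_output(['ldd', exe]).decode()
    deps = []
    for line in out.splitlines():
        if '=>' not in line:
            continue
        # Typical line: "libm.so.6 => /lib/.../libm.so.6 (0x...)"
        target = line.split('=>', 1)[1].strip().split(' ')[0]
        if target and target != 'not' and not target.startswith('('):
            deps.append(target)
    return deps
```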

Another point is that having the old binaries merely gives you the possibility to run some prior version of the code. But I would argue this is fundamentally unhelpful. What you really need to know is WHAT THE DIFFERENCE IS between the two versions. For that, you need careful versioning of the source code and the ability to associate those version numbers with the executables used in any given simulation. And if you have good versioning of the source code, then you DON'T NEED to capture the binaries, because you can recreate any binary you want in a way you can be confident will run today (leaving aside the question of whether you've also versioned the third-party libraries you're relying on).

So, my conclusion is that you don't want to save the binaries, you really want to push harder to get the source code properly versioned.

It doesn't have to be that hard. If your code is in a version control repository, it already provides a perfectly good unique identifier you can use (though you are welcome to invent your own versioning scheme too). What you need to do is get that version info into the executable, so that you can (for example) print it out at the beginning of a run. This could (should) also be part of the metadata captured in the MPO system, and maybe the IPS could make a special point of gathering version info for everything used in a run, distinct from the MPO.
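A minimal sketch of what gathering version info for everything used in a run might look like, assuming each code lives in its own git working copy; this is not existing IPS functionality, and the function and file names are hypothetical:

```python
import json
import subprocess

def git_version(repo_dir):
    """Return a human-readable identifier for the checked-out revision,
    e.g. 'v1.2-14-g3a7c9e1', or the bare SHA if there are no tags."""
    return subprocess.check_output(
        ['git', 'describe', '--tags', '--always'],
        cwd=repo_dir).decode().strip()

def record_run_versions(repos, outfile='run_versions.json'):
    """Write a {code name: version id} map for every code used in the
    run, suitable for inclusion in the run metadata (or an MPO record)."""
    versions = dict((name, git_version(path)) for name, path in repos.items())
    with open(outfile, 'w') as f:
        json.dump(versions, f, indent=2)
    return versions
```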

The other thing that's useful to know is whether what you built is actually the version from the repo it claims to be, or whether it has been modified. I would treat this as a binary (yes/no) and not try to capture the differences from the repository. Real science should only be done with code that is identical to a repository version. If the code has been modified, you're in development mode, not science mode. SVN, and I think git, provide tools to tell whether your working directory differs from the repo version. This kind of check can be built into the build system, and the version identifier that goes into the executable gets modified to give a clear indication that it is _derived_ from a given repo version rather than being exactly some repo version.
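For the modified-or-not check, git can already report whether a working tree is clean; a sketch of folding that into the version identifier, with "-derived" as one possible convention for the marker:

```python
import subprocess

def version_string(repo_dir):
    """Version identifier that is the plain tag/SHA when the working
    tree matches the repository, and is clearly marked otherwise."""
    ident = subprocess.check_output(
        ['git', 'describe', '--tags', '--always'],
        cwd=repo_dir).decode().strip()
    # `git status --porcelain` prints nothing when nothing is modified.
    dirty = subprocess.check_output(
        ['git', 'status', '--porcelain'], cwd=repo_dir).decode().strip()
    return ident + '-derived' if dirty else ident
```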

It would be good to do this with the physics codes, the wrappers, and the IPS itself.
