DasLab / rna_benchmark

Python scripts & reference data for Rosetta RNA modeling benchmarks

Table creation: Efficiently determine if files need to be updated #29

Open everyday847 opened 7 years ago

everyday847 commented 7 years ago

Currently, we re-run build_full_model and clustering every time a table is created. This is wildly inefficient! Usually not much has changed: maybe only one swm_rebuild.out silent file has had decoys added, or maybe you've just changed some formatting that affects how the resulting data is processed. But of course, you WANT to be sensitive to the possibility of changes, both in the input files and in the code.

Proposed strategy:

  1. Record the git commit hash for main and for this repo, plus a hash of each input silent file.
  2. If either git hash has changed, re-run all files.
  3. If the git hashes are unchanged, re-run only the silent files whose hashes have changed.
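
A minimal Python sketch of the three steps above, just to make the bookkeeping concrete. The function names, the JSON manifest file, and the choice of SHA-256 are all assumptions for illustration; nothing like this exists in the repo yet.

```python
import hashlib
import json
import subprocess
from pathlib import Path


def git_commit_hash(repo_dir):
    """Return the HEAD commit hash of the git checkout at repo_dir."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def file_hash(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def silent_files_to_rerun(silent_files, code_hashes, manifest_path):
    """Return the silent files that need to be reprocessed.

    code_hashes is a dict such as {"main": ..., "rna_benchmark": ...}.
    manifest_path is a hypothetical JSON file written by save_manifest.
    """
    manifest_path = Path(manifest_path)
    if not manifest_path.exists():
        return list(silent_files)  # no record of a previous run
    previous = json.loads(manifest_path.read_text())
    if previous.get("code") != code_hashes:
        return list(silent_files)  # step 2: code changed, redo everything
    old = previous.get("files", {})
    # Step 3: code unchanged, redo only files whose content hash differs.
    return [f for f in silent_files if old.get(str(f)) != file_hash(f)]


def save_manifest(silent_files, code_hashes, manifest_path):
    """Record code and file hashes after a successful run."""
    record = {
        "code": code_hashes,
        "files": {str(f): file_hash(f) for f in silent_files},
    }
    Path(manifest_path).write_text(json.dumps(record, indent=2))
```

The idea would be to build `code_hashes` from `git_commit_hash` on each checkout, call `silent_files_to_rerun` before table creation, and call `save_manifest` once the run succeeds. Hashing file contents rather than comparing modification times costs a full read of each silent file, but it is not fooled by a fresh checkout or a copy that resets timestamps.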

Obvious pitfall: uncommitted changes to code (either in main or here) will not trigger re-runs. Is there a better solution, @rhiju / @calebgeniesse?
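
One possible mitigation, offered as a sketch rather than a settled answer: fold the output of `git diff HEAD` into the code fingerprint, so that uncommitted edits to tracked files change the hash too. The helper names below are hypothetical. Note that `git diff HEAD` does not cover untracked files, so a brand-new script that has never been added would still go unnoticed.

```python
import hashlib
import subprocess


def _git(repo_dir, *args):
    """Run a git command in repo_dir and return its raw stdout bytes."""
    return subprocess.run(
        ["git", *args], cwd=repo_dir, capture_output=True, check=True
    ).stdout


def fingerprint(head, diff):
    """Combine a commit hash and a diff (both bytes) into one hex digest."""
    digest = hashlib.sha256()
    digest.update(head.strip())
    digest.update(b"\0")  # separator so the two parts cannot run together
    digest.update(diff)
    return digest.hexdigest()


def code_state_hash(repo_dir):
    """Fingerprint HEAD plus staged and unstaged changes to tracked files."""
    head = _git(repo_dir, "rev-parse", "HEAD")
    diff = _git(repo_dir, "diff", "HEAD")
    return fingerprint(head, diff)
```

With a clean working tree the diff is empty, so the fingerprint depends only on the commit; any edit to a tracked file produces a different value and would trigger a full re-run under step 2.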