VIDA-NYU / reprozip

ReproZip is a tool that simplifies the process of creating reproducible experiments from command-line executions, a frequently-used common denominator in computational science.
https://www.reprozip.org/
BSD 3-Clause "New" or "Revised" License

ReproZip automatic shell/consolidated database #193

Open remram44 opened 8 years ago

remram44 commented 8 years ago

(high-level idea, needs brainstorming)

As with noWorkflow or Git, the .reprozip-trace folder in the project directory would contain past versions of the experiment (inputs, outputs, dependencies, binaries, and possibly sources) in a queryable format. This would let the user find old results, run new inputs on old software versions, query how an output file was created, run new code in a previous environment, etc.

To do that:

remram44 commented 8 years ago

I started work on this in a new branch. Went with flat file storage + SQLite3 database, where all the metadata is stored (basically a table: run id, path, UNIX perms, content hash).
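
A rough sketch of what such a layout could look like, for discussion only. The table name `files`, the column names, the choice of SHA-256, and the helpers `open_db`, `store_file`, `versions_of` and `demo` are all assumptions made for illustration; none of them are taken from the branch.

```python
import hashlib
import os
import sqlite3
import stat
import tempfile


def open_db(db_path):
    """Open the metadata database, creating the (hypothetical) table if needed."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS files (
               run_id INTEGER NOT NULL,
               path   TEXT    NOT NULL,
               perms  INTEGER NOT NULL,  -- UNIX permission bits
               hash   TEXT    NOT NULL,  -- content hash, also the blob name
               PRIMARY KEY (run_id, path)
           )"""
    )
    return conn


def store_file(conn, store_dir, run_id, path):
    """Copy `path` into the flat store under its content hash and record it."""
    with open(path, "rb") as f:
        data = f.read()
    digest = hashlib.sha256(data).hexdigest()
    blob = os.path.join(store_dir, digest)
    if not os.path.exists(blob):  # identical content is stored only once
        with open(blob, "wb") as out:
            out.write(data)
    perms = stat.S_IMODE(os.stat(path).st_mode)
    conn.execute(
        "INSERT INTO files (run_id, path, perms, hash) VALUES (?, ?, ?, ?)",
        (run_id, path, perms, digest),
    )
    return digest


def versions_of(conn, path):
    """Return (run_id, hash) for every recorded version of `path`."""
    return conn.execute(
        "SELECT run_id, hash FROM files WHERE path = ? ORDER BY run_id",
        (path,),
    ).fetchall()


def demo():
    """Record three runs that write the same output file, then query it."""
    with tempfile.TemporaryDirectory() as tmp:
        store_dir = os.path.join(tmp, "store")
        os.mkdir(store_dir)
        conn = open_db(os.path.join(tmp, "trace.sqlite3"))
        target = os.path.join(tmp, "output.txt")
        for run_id, content in enumerate([b"v1", b"v2", b"v1"], start=1):
            with open(target, "wb") as f:
                f.write(content)
            store_file(conn, store_dir, run_id, target)
        conn.commit()
        history = versions_of(conn, target)
        blobs = sorted(os.listdir(store_dir))
        conn.close()
        return history, blobs
```

In `demo()`, runs 1 and 3 write identical content, so the flat store ends up holding two blobs while the table keeps three rows. Naming blobs by content hash is one way to keep many past versions cheap to store; whether the branch deduplicates like this is not stated above.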

Looks very doable, but the line between reprozip and reprounzip becomes fuzzier.