gkiar / reproreading

Reading list for reproducibility/generalizability/replicability/etc.
GNU General Public License v3.0

Paper: ReproZip #14

Open gkiar opened 6 years ago

gkiar commented 6 years ago

URL: https://dl.acm.org/citation.cfm?id=2899401

This paper does...

Provides a method for capturing detailed provenance information from executions, including package dependencies. This seems like a great tool, and platform limitations aside (i.e., executions cannot be recorded on Mac or Windows), I don't see why one wouldn't use it.

This paper does not...

No mention of the issue that ReproZip cannot be run in Docker (as shown in NeuroDocker, Docker must be started with the --security-opt=seccomp:unconfined option to run ReproZip, since ReproZip relies on ptrace). It is also unclear whether Singularity containers permit ptrace. I'm beginning to test this, to see whether Clowdr can use ReproZip natively on clusters.
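A minimal sketch of the relaxed Docker invocation described above, in Python for illustration. It only builds the argument list for docker run and does not execute anything. The image name, host directory, and traced command are hypothetical placeholders; --security-opt=seccomp:unconfined is the option quoted above, and reprozip trace is ReproZip's own subcommand. Mounting a host directory as the working directory is an assumption added so that the trace ReproZip writes there is not lost when the --rm container exits.

```python
import shlex
from typing import List, Sequence


def docker_trace_argv(image: str, command: Sequence[str], host_dir: str,
                      security_opt: str = "seccomp:unconfined") -> List[str]:
    """Build (but do not run) a `docker run` argv that traces `command` with ReproZip.

    ReproZip relies on ptrace, so the container is started with a relaxed
    seccomp profile. The host directory is mounted at /work and used as the
    working directory, so whatever ReproZip writes there outlives the
    --rm container.
    """
    return [
        "docker", "run", "--rm",
        f"--security-opt={security_opt}",
        "-v", f"{host_dir}:/work",
        "-w", "/work",
        image,
        "reprozip", "trace", *command,
    ]


if __name__ == "__main__":
    # Hypothetical image, host directory, and pipeline command.
    argv = docker_trace_argv("my-pipeline:latest", ["python", "run_pipeline.py"], "/data/run1")
    print(shlex.join(argv))
```

A follow-up reprozip pack step would presumably be run the same way to bundle the trace into an .rpz file; that step is not shown here.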

Additional Notes?

The authors then mention a few other tools for recording provenance information. These may be worth looking into even if they cannot reproduce executions across platforms: if they can be used inside containers, the container would remove the cross-platform issue in this case anyway.

Further Reading

[1] S. B. Davidson and J. Freire. Provenance and Scientific Workflows: Challenges and Opportunities. In SIGMOD, pages 1345–1350, 2008.
[2] D. Devecsery, M. Chow, X. Dou, J. Flinn, and P. M. Chen. Eidetic Systems. In OSDI, pages 525–540, 2014.
[3] P. J. Guo and M. Seltzer. BURRITO: Wrapping Your Lab Notebook in Computational Infrastructure. In TaPP, pages 7–7, 2012.

remram44 commented 6 years ago

I wouldn't say that "ReproZip cannot be run in Docker". Many tools (including Docker, Singularity, ...) cannot run in many configurations, but those configurations can usually be relaxed. For example, I wouldn't say that gdb doesn't work on Mac or in Docker.

Really cool tool, this repo :+1:

gkiar commented 6 years ago

Fair enough, @remram44, thanks for the comment! My parenthetical was an attempt to clarify what I meant by that statement, but you're completely right: it can be run in Docker with a relaxed configuration.

Unfortunately, services such as AWS Batch, my go-to service for deploying containerized pipelines, don't support these specific relaxations without considerably more overhead in configuring the environment.

& Thanks!! 😄

gkiar commented 6 years ago

@remram44 I'm also currently experimenting with running ReproZip inside Singularity containers on Compute Canada's SLURM clusters. If this works (preliminary signs suggest it may), I'll be very excited and will start digging into the weeds of ReproZip :) Thanks for building such a valuable provenance engine!
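For illustration, a minimal sketch of what such a SLURM job might look like, written as a Python function that only renders the batch script text. It assumes ReproZip is installed inside the Singularity image and that the job's working directory is visible inside the container (typical Singularity setups bind it by default, but this depends on the site configuration). The image name, command, and .rpz name are hypothetical, and whether ptrace is actually permitted inside Singularity on a given cluster is exactly what the comment above is testing, so treat this as untested.

```python
import shlex
from typing import Sequence


def sbatch_reprozip_script(image: str, command: Sequence[str], pack_name: str,
                           time: str = "01:00:00") -> str:
    """Render a SLURM batch script that traces `command` with ReproZip inside
    a Singularity image, then packs the trace into an .rpz bundle.

    Assumes ReproZip is installed inside the image.
    """
    trace = ["singularity", "exec", image, "reprozip", "trace", *command]
    pack = ["singularity", "exec", image, "reprozip", "pack", pack_name]
    lines = [
        "#!/bin/bash",
        f"#SBATCH --time={time}",  # wall-clock limit for the job
        shlex.join(trace),         # record the execution with ptrace
        shlex.join(pack),          # bundle the recorded trace
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical image, command, and bundle name; submit the output with sbatch.
    print(sbatch_reprozip_script("pipeline.simg", ["python", "run.py"], "run.rpz"), end="")
```

Rendering the script from a function keeps the traced command and the pack step in sync, which may help when generating many such jobs.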