Open gkiar opened 6 years ago
I wouldn't say that "ReproZip cannot be run on Docker". A lot of tools (including Docker, Singularity, ...) cannot be run in a lot of configurations, but configurations can usually be relaxed. I wouldn't say that gdb doesn't work on Mac or Docker, for example.
Really cool tool, this repo :+1:
Fair enough, @remram44 - thanks for the comment! My parenthetical attempts to clarify what I mean by that statement, but you're completely right, it can be run in Docker, with a relaxed configuration.
Unfortunately, services such as Amazon Batch, which is my go-to containerized pipeline deployment service, don't support applying these specific types of relaxations without considerably more overhead in the configuration of environments.
& Thanks!! 😄
@remram44 I'm also currently playing with using ReproZip from within Singularity environments on SLURM clusters in Compute Canada. If I'm successful (preliminary signs suggest this may be the case), I will be very excited and start digging into the weeds of ReproZip more :) Thanks for making such a valuable provenance engine!
URL: https://dl.acm.org/citation.cfm?id=2899401
This paper does...
Provides a method for obtaining detailed provenance information from executions, including package dependencies. This seems like a great tool and, computational limitations aside (i.e. you cannot record executions on Mac or Windows), I don't know why one wouldn't use it.
This paper does not...
No mention of the issue that Reprozip cannot be run in Docker (as shown in NeuroDocker, Docker requires using the `--security-opt=seccomp:unconfined` option to run Reprozip, since it relies on `ptrace`), and it is unclear if Singularity containers can run `ptrace`. I'm beginning to test this to see if Clowdr can, on clusters, use reprozip natively.
Additional Notes?
They mention a few other tools for recording provenance info. It might be worth looking into them, even if they cannot reproduce executions across platforms, provided they can be used inside containers; running inside a container would remove the cross-platform issue in this case anyway.
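As a rough illustration of the Docker relaxation noted above, here is a minimal Python sketch that only assembles the `docker run` command line; it does not invoke Docker. The image name `example/reprozip-image` and the script `analysis.py` are hypothetical placeholders, and the sketch assumes ReproZip is already installed inside the image. The `--security-opt=seccomp:unconfined` flag is the one quoted from NeuroDocker.

```python
import shlex


def reprozip_docker_command(image, command, security_opt="seccomp:unconfined"):
    """Build (but do not run) a `docker run` argv that traces `command` with ReproZip.

    `image` is assumed to have ReproZip installed; the relaxed seccomp
    profile is what lets ptrace, and therefore `reprozip trace`, work.
    """
    return [
        "docker", "run", "--rm",
        f"--security-opt={security_opt}",  # relax seccomp so ptrace is permitted
        image,
        "reprozip", "trace", *command,     # record the execution of `command`
    ]


# Hypothetical image and script names, for illustration only.
argv = reprozip_docker_command("example/reprozip-image", ["python", "analysis.py"])
print(shlex.join(argv))
```

Services that do not expose the equivalent of this flag are where the extra configuration overhead comes in.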
Further Reading
[1] S. B. Davidson and J. Freire. Provenance and Scientific Workflows: Challenges and Opportunities. In SIGMOD, pages 1345–1350, 2008.
[2] D. Devecsery, M. Chow, X. Dou, J. Flinn, and P. M. Chen. Eidetic systems. In OSDI, pages 525–540, 2014.
[3] P. J. Guo and M. Seltzer. BURRITO: Wrapping Your Lab Notebook in Computational Infrastructure. In TaPP, pages 7–7, 2012.