jdkent opened 5 years ago
Thanks @jdkent for the feedback. I would say that reproducible-basics "solves the problem" (or addresses the question) of "what did/do I do to achieve X?" at an elementary level (not formalized provenance tracking). Since that usually entails installing and running software tools, particular attention is given to code/software/data distributions, to the actual execution of the tools, and to what affects it (e.g. environment variables). Indeed it is related to dataprocessing (and to FAIR-data, to the extent of managing data), and I think that is good -- we do want modules to relate to each other.
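The point about the environment affecting execution can be illustrated with a minimal sketch (the tool name `mytool` and the `/tmp/demo` path are hypothetical, just for demonstration):

```shell
# PATH order determines which copy of a tool actually runs.
mkdir -p /tmp/demo/bin
printf '#!/bin/sh\necho "local mytool"\n' > /tmp/demo/bin/mytool
chmod +x /tmp/demo/bin/mytool

# Prepending a directory to PATH makes its copy win over any system-wide one.
PATH="/tmp/demo/bin:$PATH" mytool   # → local mytool
```

Two machines with different `PATH` (or different `~/.bashrc` files that modify it) can therefore run different binaries for the "same" command, which is one reason a lesson on execution environments is useful.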
Thank you for the clarification. If I understand correctly, I see this module as a sort of "glue" that doesn't directly solve a problem for a potential learner, but does provide a foundation for further learning. If we take the Carpentries handbook as a guide, I want to remain cognizant of how many skills we assume learners already have, and how many we need to teach before the learner has "enough" practical knowledge to apply to their own problems. That is, if someone comes to a workshop on reproducible basics, how many takeaways could they bring right away to their lab? I see a number of takeaways in the shell/git/right-to-share/other episodes, but I've survived (though perhaps not truly lived :stuck_out_tongue_winking_eye:) without understanding the difference between package managers and distributions.
But I believe ascertaining the value of package managers and distributions would be a separate issue, and if I understood your explanation correctly, I've been swayed to think this module serves a good purpose overall.
> but I've survived (but perhaps not truly lived) while not understanding the difference between package managers and distributions.
;-) Indeed... something to think about and possibly refactor, e.g. by placing apt/conda/... into a less prominent supplementary submodule. With containerization approaches it becomes somewhat less important, but I do feel that at least a cursory overview of those "computing environment building blocks" could be of benefit. In your experience -- how did you install software, and were you "comfortable", i.e. knowing that you were actually running what you thought you were running? How often did you encounter WTF cases due to installation oddities (e.g. local Python modules installed under ~/.local on one box but absent on another, conflicting with system-wide installations, or your ~/.local packages being picked up inside your Docker/Singularity container environments)?
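A few shell one-liners sketch the kind of diagnostics that expose those WTF cases (illustrative only; the stdlib module `json` stands in for whichever package behaves differently across machines):

```shell
# Which interpreter is first on PATH, and which one actually runs?
which python3
python3 -c 'import sys; print(sys.executable)'

# Where is a given module loaded from? A ~/.local copy here on one box
# and a system-wide copy on another explains "same code, different results".
python3 -c 'import json; print(json.__file__)'

# The per-user site directory (typically under ~/.local) that can shadow
# system-wide installs -- including inside containers that bind-mount $HOME.
python3 -m site --user-site
```

Comparing these outputs across machines (or between host and container) usually pinpoints which copy of a tool or module is actually in play.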
Thinking about this on my own, it appears the target audience for this module substantially overlaps with dataprocessing, such that concepts covered here could be introduced as needed into that module to solve the person's data processing problems. I can see how FAIR-data, statistics, and dataprocessing help solve inter-related but separable problems, but I'm having a harder time placing what problem reproducible-basics solves.
**What problems I think the modules solve:**
- FAIR-data: how do I share/find my data?
- dataprocessing: how do I preprocess/analyze my data reproducibly?
- statistics: how do I make appropriate models / interpret my data?
- reproducible-basics: understand reproducibility???
I'm trying to think from the perspective of someone who would like to attend a workshop, and I'm having trouble identifying what concrete problem "understanding reproducibility" solves, or whether there is another problem the module solves that translates easily to someone's goals.
I do think there are additional worthwhile concepts for the dataprocessing module that are not covered in git-novice or shell-novice but are covered in this module. I'm curious what other people think about merging these two modules (and redistributing/reformatting the lessons that don't fit into dataprocessing into the introduction/FAIR-data modules)?