Open nehamoopen opened 1 year ago
In my opinion, conda is the most straightforward of the three for Python (not necessarily the best). We can briefly mention all the possible options with their pros and cons. I don't know the differences between them well, but some may be relevant depending on the project and how you want to share it in the future (like making a package). Better check with the SE about this? Maybe they already know.
About Python advanced: once the virtual environment is initialised, generating the requirements and environment files should be pretty straightforward, and those files should refer only to the local environment.
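To make the "local environment only" point concrete, here is a minimal sketch, assuming a `venv`-style environment. In practice `pip freeze` or `conda env export` would normally do this job; the function names `in_virtual_environment` and `pinned_requirements` are made up for illustration.

```python
import sys
from importlib import metadata


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    # Note: a conda environment is its own installation, so this check
    # is not expected to flag it.
    return sys.prefix != sys.base_prefix


def pinned_requirements() -> list[str]:
    """List the distributions visible to this interpreter as name==version."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip installs with unreadable metadata
            pins.add(f"{name}=={version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    if not in_virtual_environment():
        print("Warning: no venv active, global packages will be listed.",
              file=sys.stderr)
    print("\n".join(pinned_requirements()))
```

Run from an activated environment, the output should list only what was installed there. Run from the global interpreter, the warning is a hint that the resulting file would pick up system-wide packages.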
As mentioned, we can discuss how deep we want to go in terms of reproducibility:
If we want to do both R and Python in a single shot, then we can stop at virtual environments (as was done in the RepCo version I followed). Otherwise, if doing only R for example, something can be said about how to create a proper R package (or at least about the basic settings, so as to stay open to the possibility of creating a package in the future).
About what to have them do at the beginning: if we go for virtual environments, I would have them work on a dummy project from scratch (creation of the virtual environment -> creation of basic scripts -> generation of basic documentation and requirements -> publication on GitHub). If they try to put their existing project in a virtual environment, they could face several problems. If it is only a README and requirements compiled "by hand" (and other best practices), then they could certainly work with their own projects.
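The dummy-project route could look roughly like the sketch below. It is only an illustration: the file names (`analysis.py`, `README.md`, `.venv`) and the helper `scaffold_dummy_project` are assumptions, and the GitHub publication step is left out because it depends on each participant's account and tooling.

```python
import tempfile
import venv
from pathlib import Path


def scaffold_dummy_project(root: Path) -> Path:
    """Create a throwaway project: environment, script, README, requirements."""
    root.mkdir(parents=True, exist_ok=True)

    # Step 1: the virtual environment. with_pip=False keeps this sketch
    # fast; a real workshop project would want with_pip=True so that
    # packages can be installed into the environment.
    venv.create(root / ".venv", with_pip=False)

    # Step 2: a basic script.
    (root / "analysis.py").write_text('print("hello, reproducible world")\n')

    # Step 3: basic documentation and a (still empty) requirements file.
    (root / "README.md").write_text(
        "# Dummy project\n\nRun `python analysis.py`.\n"
    )
    (root / "requirements.txt").write_text("")

    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = scaffold_dummy_project(Path(tmp) / "dummy-project")
        for path in sorted(project.iterdir()):
            print(path.name)
```

Starting from an empty directory like this avoids the problems of retrofitting an environment onto an existing project with unknown dependencies.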
I think the section on dependency management could improve with some (re)exploration of the topic and (potential) reorganization of the content.
It would be nice to present options along the reproducibility spectrum, from easy (noting dependencies in a README file) to advanced solutions (containerization). During the workshop, we dive into the easy and middle-ground solutions. We should also be prepared to explain the differences between these options better, like what is the difference between `renv` environments and Docker containers.
I'm going to organize the ideas per programming language for now:
R
- `annotater` package to annotate package load calls.
- `sessionInfo()` to print version information about R, the OS and attached or loaded packages + automate writing the output of `sessionInfo()` into the README or another file.
- `groundhog` package. This is an interesting solution but it has some caveats, see: https://www.brodrigues.co/blog/2023-01-12-repro_r/
- How `renv` works, including common issues. Some things to consider: should we ask participants to initialize `renv` at the beginning of the workshop already, when they reorganize their project + how do you ensure that `renv` only records the project libraries in the lockfile and not the system/global libraries (recording global libraries happens now and then during the workshop).
Python
I'm not aware of easy and middle-ground solutions in Python that are comparable to the ones listed for R above. It might be nice to do some research into it but I don't think they're necessary to include in the workshop if they are not standard/best practices.
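For what it's worth, a rough Python counterpart to R's `sessionInfo()` can be put together from the standard library alone. The sketch below is not an established best practice, just an illustration; `session_info` is a made-up helper, and `numpy` and `pandas` are placeholder package names.

```python
import platform
from importlib import metadata


def session_info(packages: list[str]) -> str:
    """Summarise Python, OS and selected package versions as plain text."""
    lines = [
        f"Python {platform.python_version()}",
        f"Platform: {platform.platform()}",
        "Packages:",
    ]
    for name in sorted(packages):
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            version = "not installed"
        lines.append(f"  {name} {version}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder names; replace with the packages the project imports.
    print(session_info(["numpy", "pandas"]))
```

The returned text could be pasted or appended into a README, which would correspond to the "easy" end of the spectrum described for R.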
- `venv` is apparently the standard tool for Python (it ships with the standard library); I think there are also `pipenv` and `conda` environments if you use those package managers. Similar to `renv` - should we ask participants to initialize these environments at the beginning of the workshop already?
- `requirements.txt` and `environment.yml` again. Similar to `renv` - how do you ensure that only project libraries get noted and not the system/global libraries? We also assume that everyone uses either `pip` or `conda` - is that correct?
Other