UtrechtUniversity / workshop-computational-reproducibility

Material for the workshop 'Best Practices for Writing Reproducible Code'
https://utrechtuniversity.github.io/workshop-computational-reproducibility/

content on dependency management #38

Open nehamoopen opened 1 year ago

nehamoopen commented 1 year ago

I think the section on dependency management could be improved by (re)exploring the topic and potentially reorganizing the content.

It would be nice to present options along the reproducibility spectrum, from easy (noting dependencies in a README file) to advanced (containerization). During the workshop, we dive into the easy and middle-ground solutions. We should also be better prepared to explain the differences between these options, for example how renv environments differ from Docker containers.
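To make the "easy" end of the spectrum concrete for Python, here is a minimal sketch of how a dependency note for a README could be generated instead of typed by hand. The function name `readme_dependency_section` and the `## Dependencies` heading are assumptions for illustration, not part of the workshop material.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def readme_dependency_section(packages):
    """Return README text listing the Python version and the given packages.

    `packages` is a list of distribution names chosen by the project author
    (hypothetical input; the workshop material does not prescribe one).
    """
    lines = ["## Dependencies", "", f"- Python {platform.python_version()}"]
    for name in packages:
        try:
            # Look up the version installed in the current environment.
            lines.append(f"- {name} {version(name)}")
        except PackageNotFoundError:
            # Flag packages that are listed but not actually installed.
            lines.append(f"- {name} (not installed in this environment)")
    return "\n".join(lines)
```

Calling `print(readme_dependency_section(["numpy", "pandas"]))` would give a short list that can be pasted into a README. It records versions but does nothing to restore them, which is what separates this option from the environment-based ones.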

I'm going to organize the ideas per programming language for now:

R

Python

I'm not aware of easy and middle-ground solutions in Python that are comparable to the ones listed for R above. It might be worth researching them, but I don't think they need to be included in the workshop if they are not standard/best practices.

Other

StefanoRapisarda commented 1 year ago

In my opinion, conda is the most straightforward of the three options for Python (though not necessarily the best). We can briefly mention all the possible options with their pros and cons. I don't know these differences well, but some of them may be relevant depending on the project and how you want to share it in the future (for example, as a package). Should we check with the SE about this? Maybe they already know.

As for the advanced Python option: once the virtual environment has been initialised, generating the requirements and environment files should be straightforward, since they refer only to the local environment.
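As a rough illustration of why this step is straightforward, the sketch below lists every distribution installed in the active environment and writes pinned `name==version` lines to a requirements file. It is a stdlib stand-in for what `pip freeze > requirements.txt` or `conda env export > environment.yml` would normally do, not a replacement for them; the function names are hypothetical.

```python
from importlib.metadata import distributions
from pathlib import Path


def pinned_requirements():
    """Return sorted 'name==version' pins for the active environment."""
    pins = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken or missing metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


def write_requirements(path="requirements.txt"):
    """Write the pins to `path` (hypothetical default) and return the path."""
    target = Path(path)
    target.write_text("\n".join(pinned_requirements()) + "\n", encoding="utf-8")
    return target
```

Run inside an activated virtual environment, this only sees that environment's packages, which is the point made above. Run outside one, it picks up whatever is installed for that interpreter, so the resulting file is far less useful.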

As mentioned, we can discuss how deep we want to go in terms of reproducibility:

If we want to cover both R and Python in a single session, we can stop at virtual environments (as was done in the RepCo version I followed). Otherwise, if we cover only R, for example, we can say something about how to create a proper R package (or at least about the basic setup, to keep open the possibility of creating a package in the future).

As for what to have participants do at the beginning: if we go for virtual environments, I would have them work on a dummy project from scratch (creation of the virtual environment -> creation of basic scripts -> generation of basic documentation and requirements -> publication on GitHub). If they try to put an existing project into a virtual environment, they could face several problems. If it is only a README and requirements compiled "by hand" (plus other best practices), then they could work with their own projects.
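The from-scratch exercise could look roughly like the sketch below, which sets up the first steps of that sequence for Python. The layout (`src/analysis.py`, `.venv`, `README.md`, `requirements.txt`) and the function name are assumptions chosen for illustration; publication on GitHub is left as a manual step.

```python
import venv
from pathlib import Path


def scaffold_dummy_project(root, create_venv=True):
    """Create a minimal practice project under `root` and return its path."""
    root = Path(root)
    (root / "src").mkdir(parents=True, exist_ok=True)

    if create_venv:
        # Step 1: an isolated environment that lives inside the project folder.
        venv.create(root / ".venv", with_pip=True)

    # Step 2: a basic script to run inside the environment.
    (root / "src" / "analysis.py").write_text(
        'print("hello, reproducible world")\n', encoding="utf-8"
    )

    # Step 3: basic documentation and a requirements file to fill in later.
    (root / "README.md").write_text(
        "# Dummy project\n\nRun `python src/analysis.py` inside the virtual environment.\n",
        encoding="utf-8",
    )
    (root / "requirements.txt").write_text(
        "# pinned dependencies go here, e.g. name==version\n", encoding="utf-8"
    )

    # Keep the environment itself out of version control.
    (root / ".gitignore").write_text(".venv/\n", encoding="utf-8")
    return root
```

Starting from an empty folder like this avoids the problems that come with retrofitting an existing project, at the cost of participants not working on their own code.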