benmarwick / rrtools

rrtools: Tools for Writing Reproducible Research in R

Allow for multiple compendia in a single repo #134

Closed: kieran-mace closed this issue 2 years ago

kieran-mace commented 3 years ago

In my organization, we hope to use a monorepo to store all of our analytic projects. We are not ok with creating a dedicated git repo per analysis project.

Despite this, we still want to create compendia for our work.

Could you add the ability to create multiple compendia within a shared git repo?

wolass commented 3 years ago

Hi Kieran, may I ask what the idea behind keeping a "monorepo" is?

Correct me if I am wrong, but let's assume that:

  1. your org conducts multiple projects in parallel
  2. you publish a paper using rrtools that makes up only a small part (10%) of those projects

Would that mean that if I wanted to reproduce your analysis, I would have to download the other 90% of the code, which is irrelevant to me? This seems counterproductive.

The idea behind rrtools is to make reproducibility for other researchers easier, and help you maintain good organization of your project as a byproduct.

benmarwick commented 3 years ago

@kieran-mace Yes, I don't see why not. I guess that a simple way to do what you describe is to have multiple analysis directories in your monorepo. Each directory, analysis-project1, analysis-project2, analysis-project3, etc., would have subdirectories of data, paper, figures, etc., and whatever else you need.

So the general principles of rrtools could work for your situation. But I'm not sure if rrtools functions would be very useful in a monorepo situation. We use usethis in many places, which assumes one project = one package = one git repo. You might be better off writing your own use_analysis function that generates the templates according to your monorepo's requirements. I don't think we would add this function to rrtools because it is not consistent with our design philosophy of one project = one package = one git repo. But maybe I'm missing something, let's see what @nevrome thinks!
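A minimal sketch of what such a `use_analysis` function might look like, written with base R only so that it makes no assumptions about usethis or rrtools internals. The function name comes from the comment above; its arguments (`name`, `repo_root`, `subdirs`), the default subdirectories, and the README stub are illustrative assumptions, not anything rrtools provides.

```r
# Hypothetical helper, NOT part of rrtools or usethis.
# Scaffolds one analysis directory inside an existing monorepo.
use_analysis <- function(name, repo_root = ".",
                         subdirs = c("data", "paper", "figures")) {
  project_dir <- file.path(repo_root, name)

  # Refuse to overwrite an analysis that already exists.
  if (dir.exists(project_dir)) {
    stop("Analysis directory already exists: ", project_dir, call. = FALSE)
  }

  # Create the project directory, then each requested subdirectory.
  dir.create(project_dir, recursive = TRUE)
  for (subdir in subdirs) {
    dir.create(file.path(project_dir, subdir))
  }

  # Write a small README stub (assumed template, adjust to your needs).
  writeLines(
    c(paste("#", name), "", "Describe the analysis here."),
    file.path(project_dir, "README.md")
  )

  invisible(project_dir)
}

# Usage: scaffold one analysis in a throwaway "monorepo" directory.
repo <- tempfile("monorepo-")
use_analysis("analysis-project1", repo_root = repo)
list.files(file.path(repo, "analysis-project1"))
```

Because a helper like this never touches the active project or git, it sidesteps the one project = one package = one git repo assumption. The trade-off is that it generates no package infrastructure (such as a DESCRIPTION file) for the new directory.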

nevrome commented 3 years ago

rrtools is an opinionated wizard to help you set up a specific type of research compendium with some well-conceived design choices. @kieran-mace You can just adopt some of these choices and drop or modify others. Just as Ben writes, you could create a compendium with multiple analysis-project folders. I don't think we would rewrite the rrtools code to consider this specific use case, though. So some rrtools functions will still work, but others will then produce configuration files that are insufficient for this layout. You have to adjust them yourself, but that can be a rewarding adventure.

If you do this, you should keep in mind that our research compendia are essentially R packages with some additional directories and files stuck on. I suggest you keep the underlying package working, to benefit from the R package infrastructure (e.g. installation of dependencies, documentation of data and functions, etc.). A monolithic mega-repo certainly has advantages and disadvantages: you can keep data for all the analysis projects in one place and easily share functions, but you also have to install all the dependencies for projects that only your colleagues are working on.

So to sum this up: You can easily do this, and rrtools can help with the initial steps, but you then have to come up with your own structure and keep on top of your config files (e.g. .Rbuildignore, .gitignore). And finally, I personally would not recommend going this way; I would rather split things up into different repos. Modularization is usually superior and git repos are free real estate. Code and data that are shared across projects should go into their own R package, which can then be loaded in the individual project repositories.
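As one illustration of keeping on top of those config files: each line of `.Rbuildignore` is a regular expression matched against paths relative to the package root, so the extra directories in a compendium need to be listed there to stay out of the built package. The helper below is a hypothetical base-R sketch (the name `add_build_ignore` and the example paths are assumptions); within a single-package project, `usethis::use_build_ignore()` does a similar job.

```r
# Hypothetical helper, NOT part of rrtools or usethis.
# Appends anchored patterns to one compendium's .Rbuildignore,
# skipping patterns that are already present.
add_build_ignore <- function(pkg_dir, paths) {
  ignore_file <- file.path(pkg_dir, ".Rbuildignore")

  existing <- if (file.exists(ignore_file)) {
    readLines(ignore_file)
  } else {
    character()
  }

  # Escape literal dots and anchor each path so "paper" does not
  # also match "paper-old" or "newspaper".
  patterns <- paste0("^", gsub("\\.", "\\\\.", paths), "$")

  # union() keeps existing lines first and drops duplicates.
  updated <- union(existing, patterns)
  writeLines(updated, ignore_file)

  invisible(updated)
}

# Usage: a throwaway directory stands in for one analysis package.
pkg_dir <- tempfile("analysis-project1-")
dir.create(pkg_dir)
add_build_ignore(pkg_dir, c("paper", "figures", "README.Rmd"))
add_build_ignore(pkg_dir, "paper")  # already listed, so nothing is added
readLines(file.path(pkg_dir, ".Rbuildignore"))
```

In a monorepo, a helper like this would have to be run once per analysis directory, since each underlying package carries its own `.Rbuildignore`.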