AU-BCE-EE / GitHub-guidance

Guidance for AU-BCE-EE members

Analysis file here and separate templates #10

Open sashahafner opened 9 months ago

sashahafner commented 9 months ago

We should have something like analysis-templates.md to describe our data analysis templates, which still need to be created!

sashahafner commented 9 months ago

Use of "make" is intriguing. See here for example: http://zmjones.com/make/. If everything is done in R, it is unnecessary. But with a mix of tools, it makes a lot of sense.

Two bits from that post that I particularly agree with:

Make has encouraged me to modularize my code (here is an example). Rather than having one file analysis.R that cleans your data, estimates your model(s), and produces your plots, you can have them split up by task. This makes it easier for others to understand your code and for you to debug it. When you are using R in batch mode, .Rout files are automatically produced; these contain the output from the interpreter, making it easier to see what, if anything, went wrong. Modularizing your code in this manner usually necessitates writing intermediate results to file. For complex objects this can sometimes be a pain (though R makes it easy with .RData files, which are automatically produced when R is run in batch mode).

I think literate programming (sweave, knitr, and org-babel are a few implementations) is great, but I am not convinced it actually makes it easier to reproduce results (most implementations are language specific, though org-babel is not). Using any of these tools makes building a document necessary to see the results of the data analysis (or any of the intermediate products of the data analysis). The things you need to have installed to produce the document (some flavor of TeX and/or org-mode and Emacs) are not installed by default on most systems with a Unix command line interface. Make is. With Make you can separate your data analysis from the document which presents your analysis while retaining the automated integration between the two.
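
As a rough sketch of the split-by-task workflow described in the first quote, a Makefile could look something like the one below. All script, data, and output names (scripts/clean.R, data/raw.csv, output/clean.rds, and so on) are hypothetical and not taken from the linked post or any of our repos. It also assumes each R script writes its own intermediate result explicitly (e.g. with saveRDS()) and that the output/ and logs/ directories already exist.

```makefile
# Hypothetical file names throughout; adjust to the project.
# Note: recipe lines must start with a tab character, not spaces.

all: output/fig1.pdf

# Step 1: clean the raw data and write an intermediate file
output/clean.rds: scripts/clean.R data/raw.csv
	R CMD BATCH --no-save --no-restore scripts/clean.R logs/clean.Rout

# Step 2: fit the model(s) using the cleaned data
output/model.rds: scripts/model.R output/clean.rds
	R CMD BATCH --no-save --no-restore scripts/model.R logs/model.Rout

# Step 3: produce the plot from the fitted model
output/fig1.pdf: scripts/plot.R output/model.rds
	R CMD BATCH --no-save --no-restore scripts/plot.R logs/plot.Rout

# Remove generated files so everything can be rebuilt from scratch
clean:
	rm -f output/*.rds output/*.pdf logs/*.Rout

.PHONY: all clean
```

With a setup like this, running `make` reruns only the steps whose inputs have changed. For example, editing scripts/plot.R would rerun only step 3, and the .Rout files in logs/ show what each step printed.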

I often see something like this online: "this or that notebook approach completely replaces all the earlier suggestions for project organization". No way!

sashahafner commented 9 months ago

Check out this directory structure for a Python project: https://drivendata.github.io/cookiecutter-data-science/#directory-structure. Some good ideas there. Overly complex though?

sashahafner commented 9 months ago

I still think this approach is good for R: https://github.com/sashahafner/R-template/tree/main