aaronpeikert opened 4 years ago
As far as I'm concerned, the goal is: "Create reproducible and transparent research projects in 'R', with a minimal amount of code." Guiding principles:
Thoughts?
I agree with everything you state above, but I would like to add one crucial point (at least for me):
Learning time is the metric to optimize for.
For me, a reproducible workflow includes the following steps:
And the primary goal of worcs is to provide a tool that lets researchers achieve such a workflow with minimal learning time. Hence, it should be usable almost "without thinking". I would argue that the conceptual model users need to know about in order to use worcs should be as simple and as clearly communicated as possible (but maybe that is the engineering psychologist in me speaking).
So the goal of worcs is to get researchers started with reproducibility as fast as possible (whereas the goal of repro is to streamline the use of different "reproducibility" solutions, which requires the user to understand the software behind them).
Ideally, I'd like to see a function in worcs that corresponds to each of the light grey steps above (sidenote 1: the names are just what came to my mind at this moment; sidenote 2: or should these functions belong to repro?). Furthermore, I'd like one function for each black heading that bundles the more granular functions:
- setup
  - check_software (checks whether everything [R, Git] is installed, up to date, and configured, and if not, advises how to resolve the problems)
- creation
  - create_project (sets up the folder/Git/GitHub)
  - preregister
  - open_data / closed_data
  - add_manuscript?
- reproduction
  - fork_project? (creates a fork on GitHub and downloads the project?)
  - verify (is everything there that we need?)
  - rerun? (chooses the entry point and runs the analysis)
- change
  - compile_changes (gathers changes, checks that everything worcs)
  - commit_changes (adds changes to code/data/software)
  - publish_changes (git push, optionally a GitHub release or OSF or something)

It would help me a lot if we could extend or change this list of steps so that it is evident what worcs is supposed to do. Such a document could then serve as an authoritative reference for the implementation and as a place to discuss new features etc.
However, I am probably too conceptual in my thinking, and it could well be that I am the only one benefiting from such a thing. In that case, it is not worth the energy and time.
This is fantastic, and you've given me a whole new perspective on this project. Thank you!
I think we should do all of this, while at the same time maintaining a clearly defined scope:
- check_software is already implemented in repro, right? We could just rely on repro for this.
- create_project should be handled through RStudio's New Project wizard.
- fork_project, commit_changes, and publish_changes are imho well implemented in Git and nicely integrated in the RStudio IDE, and are out of scope for worcs.
- check_software, verify, and rerun seem like really valuable and urgent additions. rerun would require the project's entry point to be logged in .worcs, which is probably good practice.
Cool. Yes to a clearly defined project scope!
I could draft a lengthier version of this list that includes details about the concrete implementation, but only if you could read and check it (this would be the basis for my attempts to modularize worcs / integrate with repro).
But for the concrete points you raise:
- create_project: I think there are merits to having it as its own function (which is called by RStudio's New Project wizard), namely documentation and testing.
- fork_project: this is implemented in usethis (together with functions to create a PR), and in my opinion it is nice that it automates something that otherwise requires manual labour. Maybe we re-export it?
- commit_changes: I thought about a wrapper around renv snapshot / git add / git commit (for users uncomfortable with Git/renv).
- publish_changes: automatically creates a release on GitHub and uploads/updates the preprint on OSF (if they have an API). I repeatedly made mistakes in this process (see aaronpeikert/reproducible-research#56), and I think we should automate everything that is possible (but I have no idea about the technical challenges; if it is too complicated, we drop the automation, but then I'd like to see step-by-step explanations of what users have to do).

Yeah, I'm gonna read it :D My main priority now is getting the paper accepted, but I'm happy to play a role in the conceptual development.
Also: OSF does have an API, and it allows users to interact with their profile: https://developer.osf.io/#tag/Authentication
Thank you, that would be great. The paper should definitely be the top priority. I just had the idea for the graphic today and thought it fitting to better explain what I mean.
Automating the preregistration and the preprint is probably doable because of the API, but it will be tricky. If we can hack something together that works, we may submit it to the osfr package.
I don't know where to write down that thought, but I think a good rule of thumb for what features to include in worcs is:
The worcs paper gives a great idea of what users might do with worcs; however, I think it would help if the contributors to this project had an equivalent, some sort of "manifesto": a document that describes guiding principles, design decisions, scope, distant future plans, etc.
I think such a thing would streamline decision-making, make our efforts more coherent, and especially help with coordinating worcs with repro.
There is no time pressure on this; it would just be nice to have before we deeply integrate with repro.