Write down project vision

aaronpeikert commented 4 years ago

The worcs paper gives a great idea of what users might do with worcs. However, I think it would help if the contributors to this project had an equivalent, some sort of "manifesto": a document that describes guiding principles, design decisions, scope, distant future plans, etc.

I think such a document would streamline decision-making, make our efforts more coherent, and especially help in coordinating worcs with repro.

There is no time pressure; it would just be nice to have before we integrate deeply with repro.

cjvanlissa commented 4 years ago

AFAIC:

Goal: "Create reproducible and transparent research projects in 'R', with a minimal amount of code."

Guiding principles:

- Achieve "just-good-enough" reproducibility with minimal work
- Make the process nearly fully automatic so users don't have to think about it
  - In doing so, involve experts to evaluate whether the default process adheres to best practices
- Minimal dependencies (many users wrestle with dependencies that break)
  - I try to use dependencies that are, themselves, light on dependencies and robust
  - If the cost in dependencies is too high, write my own function (e.g., both the packages 'codebook' and 'dataMaid' are a nightmare of dependencies, so I wrote my own)
- Extensive documentation and tutorials
- Embed in teaching (coauthor Barbara Vreede teaches my university's reproducibility course, I teach open science for our stats research master's, and am the open science faculty ambassador)

Thoughts?

aaronpeikert commented 4 years ago

I agree with everything you state above, but would like to add one crucial point (at least for me):

learning time is the metric to optimize for

For me, a reproducible workflow includes the following steps:

[Image: steps]

And the primary goal of worcs is to provide a tool for researchers that requires minimal learning time to achieve such a workflow. Hence, it should be usable almost "thoughtlessly". I would argue that the conceptual model the user needs to know to use worcs should be as simple and as clearly communicated as possible (but maybe that is the engineering psychologist in me speaking).

So the goal of worcs is to get researchers started with reproducibility as fast as possible (while it is for repro to streamline the use of different "reproducibility" solutions, which requires the user to understand the software behind them).

Ideally, I'd like to see a function in worcs that corresponds to each of the light grey steps above (sidenote 1: the names are just what came to my mind at this moment, sidenote 2: or should these functions belong to repro?). Furthermore, one function for each black heading that bundles the more granular functions:

setup

check_software (checks if everything [r, git] is installed, up to date and configured and if not advises how to resolve problems)

creation

create_project (setup folder/git/github)
preregister
open_/closed_data
add_manuscript?

reproduction

fork_project? (creates a fork on GitHub & downloads the project?)
verify (is everything there we need?)
rerun? (chooses the entry point and runs the analysis)

change

compile_changes (gathers changes, checks that everything worcs)
commit_changes (adds changes to code/data/software)
publish_changes (git push, optionally github release or osf or something)
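
To make the setup step above more concrete, here is a minimal sketch of what check_software might do. It covers only the "is it installed" part (not "up to date" or "configured"), and it is written as a shell function purely for illustration; the output format and the exit-code convention are assumptions, not anything repro or worcs actually implements.

```shell
# Hypothetical sketch: only the name check_software comes from the thread.
# Reports, for each tool given as an argument, whether it is found on PATH.
check_software() {
  rc=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      rc=1   # remember that at least one tool was not found
    fi
  done
  return "$rc"
}
```

Calling `check_software Rscript git` would print one line per tool and return a non-zero status if either is missing, which a bundling "setup" function could use to decide whether to print installation advice.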

It would help me a lot if we could extend/change this list of steps so it is evident what worcs is supposed to do. Such a document then can serve as an authoritative reference for the implementation and place to discuss new features etc. However, I am probably too conceptual in my thinking, and it could well be that I am the only one benefiting from such a thing. In that case, it is not worth the energy and time.

cjvanlissa commented 4 years ago

This is fantastic, and you've given me a whole new perspective on this project. Thank you!

I think we should do all of this - while at the same time maintaining a clearly defined scope:

check_software is already implemented in repro, right? We could just rely on repro for this
create_project should be through RStudio's New project wizard
Preregister: If OSF has an API, we should develop further - otherwise, how do you imagine this?
fork_project, commit_changes and publish_changes are imho well implemented in Git and nicely integrated in the RStudio IDE, and out of scope for worcs

check_software, verify, and rerun seem like really valuable and urgent additions. rerun would require the project's entry point to be logged in .worcs, which is probably good practice.
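
A rough sketch of how rerun could use a logged entry point, written as shell functions for illustration only. It assumes that .worcs is a plain-text file with a line such as `entry_point: manuscript.Rmd` (the key name is hypothetical) and that the entry point is an R Markdown file that can be rendered with rmarkdown.

```shell
# Hypothetical sketch: the "entry_point" key and both function bodies are
# assumptions; only the name rerun and the .worcs file come from the thread.

# Print the value recorded after "entry_point:" in the given file.
read_entry_point() {
  worcs_file="${1:-.worcs}"
  sed -n 's/^entry_point:[[:space:]]*//p' "$worcs_file"
}

# Look up the entry point and render it; fail if none is recorded.
rerun() {
  entry=$(read_entry_point "${1:-.worcs}")
  if [ -z "$entry" ]; then
    echo "No entry point recorded in .worcs" >&2
    return 1
  fi
  # The file name is passed as a trailing argument to the R expression.
  Rscript -e 'rmarkdown::render(commandArgs(trailingOnly = TRUE)[1])' "$entry"
}
```

Keeping the lookup separate from the rendering step means verify could reuse read_entry_point to check that an entry point is recorded and that the file exists.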

aaronpeikert commented 4 years ago

Cool. Yes to a clearly defined project scope! I could draft a lengthier version of this list, which includes details about the concrete implementation, but only if you could read/check it (this would be the basis for my attempts to modularize worcs / integrate with repro). But for the concrete points you raise:

yes software checking is ready in repro (but see aaronpeikert/repro#34)
create_project: I think there is merit in having it as its own function (which is called by RStudio's New project wizard), namely for documentation and testing
change
- fork_project is implemented in usethis (together with functions to create a PR), and in my opinion it is nice that it automates something that otherwise requires manual labour. Maybe we re-export it?
- commit_changes: I thought about a wrapper around renv snapshot / git add / git commit (for users uncomfortable with git/renv)
- publish_changes automatically creates a release on GitHub and uploads/updates the preprint on OSF (if they have an API). I repeatedly made mistakes in this process (see aaronpeikert/reproducible-research#56), and I think we should automate everything that is possible (but I have no idea about the technical challenges; if it is too complicated we drop the automation, but then I'd like to see step-by-step explanations of what users have to do)
- I honestly have not thought/have no idea about what exactly rerun and verify should entail...
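
The commit_changes idea above (renv snapshot, then git add, then git commit) could look roughly like the following shell sketch. The `run` helper and the `DRY_RUN` switch are hypothetical additions for illustration, so the sequence can be inspected without touching a repository; an actual worcs function would presumably be written in R.

```shell
# Hypothetical sketch: only the name commit_changes and the three steps
# (renv snapshot / git add / git commit) come from the thread.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Record package versions, stage everything, and commit with the given message.
commit_changes() {
  msg="${1:-}"
  if [ -z "$msg" ]; then
    echo "usage: commit_changes MESSAGE" >&2
    return 2
  fi
  # Each step runs only if the previous one succeeded.
  run Rscript -e 'renv::snapshot(prompt = FALSE)' &&
    run git add --all &&
    run git commit -m "$msg"
}
```

Running `DRY_RUN=1 commit_changes "Add analysis"` prints the three commands instead of executing them, which may also be a gentle way to show git/renv newcomers what the wrapper does on their behalf.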

cjvanlissa commented 4 years ago

Yeah I'm gonna read it :D My main priority now is getting the paper accepted, but I'm happy to play a role in the conceptual development

Also: OSF does have an API and it allows users to interact with their profile https://developer.osf.io/#tag/Authentication

aaronpeikert commented 4 years ago

Thank you, that would be great. The paper should definitely be the top priority. I just had the idea for the graphic today and thought it fitting to better explain what I mean.

Automation of preregistration and preprint is probably doable because of the API but it will be tricky. If we can hack something together that works we may submit it to the osfr-package.

aaronpeikert commented 4 years ago

I don't know where to write down that thought, but I think a good rule of thumb for what features to include in worcs is:

worcs should only contain features that are well documented, both in the help pages and in the vignettes, and that are likely to simplify the lives of 50% of potential users.

cjvanlissa / worcs

Write down project vision #60