ReproNim / module-reproducible-basics

Module 0: Reproducible Basics
http://www.repronim.org/module-reproducible-basics

[HWK]: Repronim instructor fellow training feedback #28

Open gkiar opened 5 years ago

gkiar commented 5 years ago
yarikoptic commented 5 years ago

Hi @gkiar -- thanks a lot for the feedback, but I wonder what is the best way to proceed... maybe we should at least split them into separate issues, so as not to breed one lengthy discussion on all of them at the same time? Also, if you feel that there is an easy way to improve any aspect (e.g. re "Link to choose a license"), please feel more than welcome to submit a PR to make it happen.

yarikoptic commented 5 years ago

With the above comment in mind, here we can try to go through the "Page 0" items:

  • "reproducibility requires knowledge of what, when, and how" --> I think repetition or re-running requires this, replication or reproduction is a bit of another concept that can use bits of each of these pieces of information, but is broader and relies on much more (at least by the definitions I attribute to each). Would be good to clarify terms often; the paper I link in ReproNim/module-intro#6 is helpful here.

FTR, the paper in question is Reproducibility vs. Replicability: A Brief History of a Confused Terminology. An interesting review, and as you can see from it -- there is still no consensus! IMHO part of the problem is that in all of those cases we are trying to come up with a single word to cover some corner of the multi-dimensional space defined by at least the following dimensions:

Note that in ACM's description "repeatability" is "Same team, same experimental setup", and there is not even a term for "Same team, different experimental setup" -- what is it then? Probably "Reproducibility". So, if you can guarantee that your setup is 100% the same, we could start using the term "repeatable", but I am not sure that it would help more than confuse.

Similar to "Goodman et al. (2016)", I feel that "reproducibility" is a good generic term to use at any level or combination of variants, and it needs an additional description to signify the level of reproducibility. You are right that typically "reproducibility" and "replicability" refer to study-level results, and in this section we are talking about very elementary, down-to-earth aspects which help to achieve reproducibility "from the ground up". The idea is that if someone cannot reproduce what they did a day or a week ago to produce their results (within the allowed "measurement error", if any), how could they reproduce the entire study? If we teach students to become more efficient in repetitive tasks, managing computing environments, etc., we would help them take control over those "lower" dimensions above.

So, "what, when, and how" to me is merely a colloquial way to say that we need to know the details of the environment, study, and analysis. "When" might often be irrelevant, unless you are (like I was many times) on a "data archaeological" expedition trying to figure out "how" things were actually done; then knowing "when" helps to locate the right point in the history/lab notebook/bash history/etc.

We could indeed refer to that paper here, but I think it might be better in the intro, against which you filed the recommendation already. If you feel that we could adjust wording in this section somewhat to make it more specific, please suggest how (PR).

Sorry, somehow it came out too long ;-)

yarikoptic commented 5 years ago

  • clarify intention: who is this for? True beginners, or people that know a bit? Is the goal just to have them know basic infrastructure they should use in this space, or understand why using the shell and git are important?

Well, we have tried to answer "for whom" and "why" in the opening of http://www.repronim.org/module-reproducible-basics/00-Overview/

As for shell/git specifically -- those are argued for in the corresponding sections. Again, if you see a way to improve them, please suggest it in a PR.

yarikoptic commented 5 years ago

  • "very unlikely that you have managed to completely avoid using those tools" --> fix language; feels a bit alienating for those who truly are new to this

To me it sounds like perfect Russian English with a Chinese influence -- we are all friends and not trying to alienate anyone (at least recently). If you feel a better wording would make us all cuddle even more, please suggest it. Otherwise I would make it an even longer Russian English with some Finnish influence (doing lots of saunas recently), and then I am not sure where it would lead us ;-)

gkiar commented 5 years ago

@yarikoptic thanks so much for all the responses! I'll give a longer read later but wanted to acknowledge your responses :)

I did this review in prep for the instructor training ahead of HBM, so was planning to discuss changes then and make changes after that. Happy to do some before, of course!

yarikoptic commented 5 years ago

If we fix it all up before - more time for gelato and wine will be left for us at OHBM ;-) I have yet to see what parts/changes we need to pull in from our recent workshops, e.g. https://github.com/ReproNim/sfn2018-training/tree/gh-pages/_episodes

gkiar commented 5 years ago

And with regard to your long response on the reproducibility clarification - I agree with all of your points :)

I think it's important to squarely address that it is a confused terminology, though, rather than just accept that it's cloudy, and point out that words are overloaded and we mean _____ when we're talking about these things.

gkiar commented 5 years ago

I will definitely open PRs for these various points - I apologize for not clarifying these were more notes for me than "tasks" for anybody else :) I didn't know of a better place to keep these notes than on the repo!