Open tracykteal opened 9 years ago
I have previously tweeted about this.
Why so few tutorials on data cleaning or Windows scientific S/W installs? Once you're done, only a masochist would sit down and write it up.
— Jennifer Bryan (@JennyBryan) November 11, 2014
One of my first papers was about data cleaning, a topic close to my heart.
I completely agree that it's important to make this part reproducible. And for data cleaning it's particularly important to capture motivation (the why and not just the what). For example, the results may be completely reproducible, but why did you remove subject A and not subject B?
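As a minimal sketch of recording the why alongside the what: the exclusion list below stores a reason next to each dropped subject, and the cleaning step writes that reason to a log. The subject IDs, values, and the `clean` helper are all invented for illustration and are not taken from any real analysis.

```python
import csv
import io

# Hypothetical raw data; subject IDs and values are invented for illustration.
RAW = """subject,weight_kg
A,-1
B,71.5
C,68.2
"""

# Each exclusion carries its reason, so the "why" lives next to the "what".
EXCLUSIONS = {
    "A": "negative weight, which is physically impossible",
}


def clean(raw_text, exclusions):
    """Return (kept_rows, log); log lists every dropped subject with its reason."""
    kept, log = [], []
    for row in csv.DictReader(io.StringIO(raw_text)):
        reason = exclusions.get(row["subject"])
        if reason is None:
            kept.append(row)
        else:
            log.append({"subject": row["subject"], "reason": reason})
    return kept, log


kept, log = clean(RAW, EXCLUSIONS)
for entry in log:
    print(f"dropped {entry['subject']}: {entry['reason']}")
```

Keeping the reasons in a data structure rather than in a free-text comment means the log can be saved with the cleaned data, so a reader can see why subject A was removed and subject B was not.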
Great, and thanks for the link to the paper @kbroman!
A step in many researchers' workflows is data cleaning: taking data from public repositories or their own lab output and cleaning it for use in an analysis. Being able to track how that data was cleaned is an important part of making the research reproducible, but there aren't currently many how-tos on this process or on the importance of this step. It would be interesting to discuss including a module on data cleaning in a reproducible research workshop, or developing one that we can point to online.
One example would be a module for using OpenRefine reproducibly.