Reproducible-Science-Curriculum / Reproducible-Science-Hackathon-Dec-08-2014

Workshop bringing together individuals interested in developing curriculum, workflows, and tools to strengthen reproducibility in research.

Reproducible research and data cleaning #17

Open tracykteal opened 9 years ago

tracykteal commented 9 years ago

A step in many researchers' workflows is data cleaning: taking data from public repositories or their own lab output and cleaning it for use in an analysis. Being able to track how that data was cleaned is an important part of making the research reproducible, but there currently aren't many how-tos on this process or on the importance of this step. It would be interesting to discuss including a module on data cleaning in a reproducible research workshop, or developing one that we can point to online.

One example would be a module for using OpenRefine reproducibly.

jennybc commented 9 years ago

I have previously tweeted about this.

kbroman commented 9 years ago

One of my first papers was about data cleaning, a topic close to my heart.

I completely agree that it's important to make this part reproducible. And for data cleaning it's particularly important to capture motivation (the why and not just the what). For example, the results may be completely reproducible, but why did you remove subject A and not subject B?
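
As a minimal sketch of capturing the why alongside the what, each exclusion could be stored with a written reason and logged when it is applied. Everything here is hypothetical and for illustration only: the subject IDs, the values, the `EXCLUSIONS` list, and the `clean` helper are assumptions, not something from an existing tool or from the paper mentioned above.

```python
import csv
import io

# Hypothetical raw data; subject IDs and weights are invented for illustration.
RAW = """subject,weight
A,21.3
B,19.8
C,-4.0
"""

# Each exclusion records the rule (the what) and the reason (the why).
# The "rule" string is documentation only; it is not evaluated by the code.
EXCLUSIONS = [
    {
        "subject": "C",
        "rule": "weight <= 0",
        "reason": "negative weight is physically impossible; likely entry error",
    },
]


def clean(raw_text, exclusions):
    """Return (kept_rows, log); log explains why each dropped row was dropped."""
    drop = {e["subject"]: e for e in exclusions}
    kept, log = [], []
    for row in csv.DictReader(io.StringIO(raw_text)):
        if row["subject"] in drop:
            # Keep the original values next to the stated reason.
            log.append({**drop[row["subject"]], "original": dict(row)})
        else:
            kept.append(row)
    return kept, log


kept, log = clean(RAW, EXCLUSIONS)
for entry in log:
    print(f"dropped {entry['subject']}: {entry['reason']}")
```

Keeping the exclusions in a data structure rather than in ad hoc filter expressions means the log can be saved with the cleaned data, so a reader can see why subject C was removed and subjects A and B were not.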

tracykteal commented 9 years ago

Great, and thanks for the link to the paper @kbroman!