datacarpentry / spreadsheet-ecology-lesson

Data Organization in Spreadsheets for Ecologists
https://datacarpentry.org/spreadsheet-ecology-lesson

file system for keeping track of your data / analysis #192

Closed: DanielleQuinn closed this issue 3 years ago

DanielleQuinn commented 7 years ago

The "Formatting Data Tables in Spreadsheets" segment of the lesson suggests using multiple tabs in a spreadsheet file when you are cleaning up or modifying your data. I find keeping all of your original and modified data, along with your notes about how and why the data have been modified, in a single file worrisome and perhaps a little clunky. If I may, I'd like to put forward an alternative method, just as a potential discussion point.

What I teach students (and practice myself) is to create a folder called "Data" in the appropriate location (e.g., Thesis > Data). Within that folder, I create three subfolders:

  1. Original Data, which contains a read-only copy of the original data file,
  2. Working Data, which contains the most up-to-date version of the data, the one currently being analyzed, including any changes or modifications that have been made, and
  3. Archived Data, which contains the versions of the data produced between the original and the working data, each labeled with the date on which it was archived and the working data were updated.
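
A minimal sketch of setting up this layout, assuming Python rather than a manual setup. The helper name `setup_data_folders` is hypothetical; the three subfolder names come from the list above. It copies the original file into both "Original Data" and "Working Data" and then removes write permission from the original copy.

```python
import os
import shutil
import stat
from pathlib import Path

# Subfolder names follow the scheme described above.
SUBFOLDERS = ("Original Data", "Working Data", "Archived Data")


def setup_data_folders(data_root, original_file):
    """Create the Data folder layout and seed it with the original file.

    Hypothetical helper, meant to be run once per data file: copies the
    original into "Original Data" (then marks that copy read-only) and
    into "Working Data".
    """
    data_root = Path(data_root)
    for name in SUBFOLDERS:
        (data_root / name).mkdir(parents=True, exist_ok=True)

    original_file = Path(original_file)
    original_copy = data_root / "Original Data" / original_file.name
    working_copy = data_root / "Working Data" / original_file.name

    shutil.copy2(original_file, original_copy)
    shutil.copy2(original_file, working_copy)

    # Read-only for owner, group and others. This guards against
    # accidental edits, not against an administrator.
    os.chmod(original_copy, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return original_copy, working_copy
```

For example, `setup_data_folders("Thesis/Data", "surveys.csv")` (file name hypothetical) would create the three subfolders and leave a protected copy of `surveys.csv` in "Original Data".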

I have a text file in the Data folder that documents what changes occurred at each date / iteration.
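
The dated-archive step and the change log could be scripted roughly as follows. This is a sketch under assumptions: the helper name `archive_working_file` and the log file name `data_changes.txt` are hypothetical (the post only says "a text file"), and ISO dates (YYYY-MM-DD) are used so archived files sort chronologically.

```python
import shutil
from datetime import date
from pathlib import Path


def archive_working_file(data_root, filename, change_note, on=None):
    """Archive the current working file and log the change.

    Hypothetical helper. Copies "Working Data/<filename>" to
    "Archived Data/<stem>_<YYYY-MM-DD><suffix>" and appends a dated,
    tab-separated entry to a plain-text log in the Data folder. The
    working file can then be modified in place.
    """
    data_root = Path(data_root)
    stamp = (on or date.today()).isoformat()

    working = data_root / "Working Data" / filename
    archived = data_root / "Archived Data" / f"{working.stem}_{stamp}{working.suffix}"
    archived.parent.mkdir(parents=True, exist_ok=True)
    # Note: archiving the same file twice on one day overwrites that day's copy.
    shutil.copy2(working, archived)

    # Assumed log file name and format: date, file, description of the change.
    log_path = data_root / "data_changes.txt"
    with log_path.open("a", encoding="utf-8") as log:
        log.write(f"{stamp}\t{filename}\t{change_note}\n")
    return archived
```

Copying rather than moving keeps the working file in place, so scripts that read from "Working Data" never find it missing.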

When using R, for example, the data are always imported from the Working Data folder, so the analysis always uses the most up-to-date version of the data. This also ensures that if the Working Data folder is shared or used by multiple people, there is no chance of accidentally modifying or analyzing a previous version of the data: earlier versions are stored separately in the Archived Data folder, and the date in each file name marks them as no longer in use.
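
The post describes this in R; a rough Python analogue, assuming CSV files and using only the standard library, is a loader that always builds its path from the Working Data folder. The names `load_working_data` and `WORKING_DIR` are hypothetical.

```python
import csv
from pathlib import Path

# Analysis scripts only ever read from this folder.
WORKING_DIR = "Working Data"


def load_working_data(data_root, filename):
    """Read a CSV from the Working Data folder as a list of dicts.

    Hypothetical helper: because the path is always built from
    WORKING_DIR, a script cannot accidentally pick up an archived or
    original copy of the same file.
    """
    path = Path(data_root) / WORKING_DIR / filename
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

Keeping the folder name in one constant means that every analysis script shares the same single source of truth for "current" data.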

Thoughts or suggestions are more than welcome!

-DQ

hoytpr commented 6 years ago

DQ, your idea sounds like a good one, and I've told my grad student to do much the same (so she brings me the good stuff!). My understanding is that this lesson focuses on keeping tidy data in spreadsheets, not so much on tidy folder organization. I'd be happy for others to chime in.