LibraryCarpentry / lc-spreadsheets

Tidy data for librarians
https://librarycarpentry.github.io/lc-spreadsheets/

Conflicting recommendations about text documentation #36

Open carakey opened 5 years ago

carakey commented 5 years ago

The recommendations pertaining to text documentation about the data are inconsistent.

About "Keeping track of your analyses," module 2, Formatting data tables in Spreadsheets states:

[K]eep track of the steps you took in your clean up or analysis. You should track these steps as a scientist would each step in an experiment. You can do this in another text file, or a good option is to create a new tab in your spreadsheet with your notes. This way the notes and data stay together.

About "Inclusion of metadata," module 3, Formatting problems states:

Unlike a table in a paper or a supplemental file, metadata (in the form of legends) should not be included in a data file since this information is not data, and including it can disrupt how computer programs interpret your data file. Rather, metadata should be stored as a separate file in the same directory as your data file, preferably in plain text format with a name that clearly associates it with your data file.

While not referring to exactly the same documentation, both pieces of advice do refer to text documentation about the dataset rather than the data themselves. It seems like best practice would be to store documentation about the dataset in a single location.

shlake commented 5 years ago

I never liked the part about multiple tabs in this lesson. The lesson is about tidying data, and recording changes is important, but the lesson should also emphasize good practices for machine-readable spreadsheets, and having multiple tabs is not one of them. When a file is saved as tab- or comma-delimited text, all formatting, including multiple tabs, is lost. Tabs are only good for the Excel format.
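The point about delimited formats can be sketched in Python. This is only an illustration under stated assumptions: the `workbook` dictionary is a hypothetical in-memory stand-in for a multi-tab spreadsheet (it is not how Excel stores data), and `export_sheets` and the file names are invented for the example. It shows that a CSV file holds a single table, so a notes tab has to become its own file.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical stand-in for a workbook: sheet name -> rows of cells.
workbook = {
    "data": [["title", "year"], ["Moby-Dick", "1851"]],
    "notes": [["step", "action"], ["1", "Removed blank rows"]],
}


def export_sheets(workbook, out_dir, stem):
    """Write each sheet to its own CSV file, because one CSV holds one table."""
    out_dir = Path(out_dir)
    paths = {}
    for sheet_name, rows in workbook.items():
        path = out_dir / f"{stem}_{sheet_name}.csv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerows(rows)
        paths[sheet_name] = path
    return paths


# Usage: export into a temporary directory and list the resulting files.
out_dir = tempfile.mkdtemp()
paths = export_sheets(workbook, out_dir, "catalog")
for name, path in paths.items():
    print(name, "->", path.name)
```

Each tab ends up as a separate plain-text file, which is one way to keep notes from being silently dropped on export.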

morskyjezek commented 2 years ago

Hi @carakey - thanks for this observation. I realize it has been a while since you posted this issue, but it appears to still be unresolved. As I understand it, there seems to be a contradiction between the better practices of formatting spreadsheets for data analysis (that is, to avoid using tabs) and the advice given in the "Keeping track of your analyses" section, which rightly advises keeping track of changes but also states that a new tab may be used to do this. And as @shlake points out, the tab option might only be possible in Excel formats. This does appear to be misleading, and I see that the issue has been marked as an "enhancement."

Since the issue has been lingering, I'd like to check in and see if this is indeed something we want to address and how to do that. Although I agree these two parts seem contradictory, I wonder if this may be resolved with a clarification in the "Formatting data tables" episode. I would propose to say that "another option" (not a good option, but another option) is to create a new tab for tracking changes, if the changes are being made in Excel and the data is managed in Excel. For any situation where data may be shared, reused, or stored for a while, make sure that the changes are exported to a standalone file that accompanies the csv/tsv or other desired sharing format. In fact, we should be foreshadowing or leading up to the advice in the QC episode, which suggests creating a README and offers advice on how to do so.

This would address some of the contradictions and it would also make the lesson clearer. Any thoughts welcomed.

morskyjezek commented 1 year ago

I've checked the DataCarpentry ecology lesson, and it is much clearer about this issue: create a text file (even better, in Markdown) to keep track of any data processing actions, and keep it in the same folder as your data file. Their page on this is here: https://datacarpentry.org/spreadsheet-ecology-lesson/01-format-data/index.html .

Given that the lessons are quite similar, I will work on proposing a more unified approach for this lesson, one that suggests keeping processing notes in a separate file. If anyone has comments or concerns, please let me know.
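As a rough illustration of the separate-file approach described above, here is a minimal Python sketch. The function names `notes_path_for` and `log_step`, the `_README.md` naming pattern, and the sample file name are all assumptions made for this example; neither lesson prescribes them.

```python
import tempfile
from datetime import date
from pathlib import Path


def notes_path_for(data_path):
    """Return a notes file in the same folder, named after the data file."""
    data_path = Path(data_path)
    return data_path.with_name(f"{data_path.stem}_README.md")


def log_step(data_path, description, when=None):
    """Append one dated processing step to the notes file for data_path."""
    notes = notes_path_for(data_path)
    when = when or date.today()
    is_new = not notes.exists()
    with notes.open("a", encoding="utf-8") as handle:
        if is_new:
            # Start the file with a heading that ties it to the data file.
            handle.write(f"# Processing notes for {Path(data_path).name}\n\n")
        handle.write(f"- {when.isoformat()}: {description}\n")
    return notes


# Usage with a hypothetical data file in a temporary folder.
data_file = Path(tempfile.mkdtemp()) / "loans.csv"
notes_file = log_step(data_file, "Split the date column into year, month, day")
log_step(data_file, "Replaced blank cells with NA")
print(notes_file.read_text(encoding="utf-8"))
```

Because the notes are plain text and sit next to the data file, they survive any export to csv/tsv and can be read without a spreadsheet program.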

jt14den commented 3 months ago

@morskyjezek, has this issue progressed at all? If you can formulate what needs to be changed, we can flag this issue with help wanted.

morskyjezek commented 3 months ago

Thanks for bumping this up! I'll put together more concrete suggestions and add a help tag! 👍

jt14den commented 1 week ago

@morskyjezek, we're teaching this Monday. I'll submit a PR that follows what the DC ecology lesson does with a README. I think that's a good idea.