Open carakey opened 5 years ago
Never liked the part about multiple tabs in this lesson. The lesson is about tidying data, and recording changes is important, but the lesson should also emphasize good practices for machine-readable spreadsheets, and having multiple tabs is not one of them. When saving as tab- or comma-delimited, ALL formatting, including multiple tabs, is lost. Multiple tabs only work in Excel format.
Hi @carakey - thanks for this observation. I realize it has been a while since you posted this issue, but it appears to still be unresolved. As I understand it, there seems to be a contradiction between the better practices of formatting spreadsheets for data analysis (that is, to avoid using tabs) and the advice given in the "Keeping track of your analyses" section, which accurately advises to keep track of changes but also states that a new tab may be used to do this. And as @shlake points out, the tab option might only be possible in Excel formats. This does appear to be misleading, and I see that the issue has been marked as an "enhancement."
Since the issue has been lingering, I'd like to check in and see if this is indeed something we want to address and how to do that. Although I agree these two parts seem contradictory, I wonder if this may be resolved with a clarification in the "Formatting data tables" episode. I would propose to say that "another option" (not a good option, but another option) is to create a new tab for tracking changes, if the changes are being made in Excel and the data is managed in Excel. For any situation where data may be shared, reused, or stored for a while, make sure that the changes are exported to a standalone file that accompanies the csv/tsv or the desired sharing format. In fact, we should be foreshadowing or leading up to the advice in the QC episode, which suggests creating a README and offers advice on how to do that.
This would address some of the contradictions and it would also make the lesson clearer. Any thoughts welcomed.
I've checked the DataCarpentry ecology lesson, and it is much clearer about this issue: create a text file (even better, recommend markdown) to keep track of any data processing actions, and keep this in the same folder as your data file. Their page on this is here: https://datacarpentry.org/spreadsheet-ecology-lesson/01-format-data/index.html
Given that the lessons are quite similar, I will work on proposing a more unified approach for the lesson, one that suggests keeping processing notes in a separate file. If anyone has comments or concerns, please let me know.
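To make the separate-file idea concrete, here is a minimal sketch in Python of what "notes next to the data" could look like. It is only an illustration, not something from either lesson: the function name `save_with_notes` and the file names `survey_data.csv` and `README.md` are hypothetical, and it uses only the standard library.

```python
import csv
from datetime import date
from pathlib import Path


def save_with_notes(rows, data_dir, notes):
    """Write rows to a CSV and append processing notes to a README.md beside it.

    Hypothetical helper for illustration; names and layout are assumptions.
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)

    # The data goes into a plain delimited file (no tabs, no formatting).
    data_path = data_dir / "survey_data.csv"  # hypothetical file name
    with data_path.open("w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)

    # The processing notes go into a markdown file in the same folder.
    # Appending keeps earlier entries, so the file acts as a change log.
    readme_path = data_dir / "README.md"
    with readme_path.open("a", encoding="utf-8") as f:
        f.write(f"## {date.today().isoformat()}\n")
        for note in notes:
            f.write(f"- {note}\n")
        f.write("\n")

    return data_path, readme_path
```

Calling something like `save_with_notes(rows, "data", ["Removed blank rows"])` would leave both files in one folder, so the notes travel with the csv when it is shared, which a second tab in an Excel workbook would not do.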
@morskyjezek has this issue progressed at all? If you can formulate what needs to be changed, we can flag this issue with help wanted.
Thanks for bumping this up! I'll put together more concrete suggestions and add a help tag! 👍
@morskyjezek, we're teaching this Monday. I'll submit a PR that follows what the DC ecology lesson does with a README. I think that's a good idea.
The recommendations pertaining to text documentation about the data are inconsistent.
About "Keeping track of your analyses," module 2, Formatting data tables in Spreadsheets states:
About "Inclusion of metadata," module 3, Formatting problems states:
While not referring to exactly the same documentation, both pieces of advice do refer to text documentation about the dataset rather than the data themselves. It seems like best practice would be to store documentation about the dataset in a single location.