datacarpentry / spreadsheet-ecology-lesson

Data Organization in Spreadsheets for Ecologists
https://datacarpentry.org/spreadsheet-ecology-lesson

Formatting examples contradict each other in 01 and 02 #227

Closed: hoytpr closed this issue 5 years ago

hoytpr commented 6 years ago

There is a conflict between 02-common-mistakes.md where it says: "However, metadata should not be contained in the data file itself. Unlike a table in a paper or a supplemental file, metadata (in the form of legends) should not be included in a data file since this information is not data, and including it can disrupt how computer programs interpret your data file. Rather, metadata should be stored as a separate file in the same directory as your data file, preferably in plain text format with a name that clearly associates it with your data file."

But in 01-format-data.md it says: "...create a new file or tab with your cleaned or analyzed data. Don't modify the original dataset, or you will never know where you started! keep track of the steps you took in your clean up or analysis. You should track these steps as you would any step in an experiment. You can do this in another text file, or a good option is to create a new tab in your spreadsheet with your notes. This way the notes and data stay together. This might be an example of a spreadsheet setup:"

First, 02-common-mistakes.md clearly says not to use multiple tabs. Second, the example shown in 01-format-data.md has multiple tabs in the spreadsheet, and the information stored in them is essentially metadata.

We should choose one of these methods (a separate file for metadata, or a new tab) and be consistent. I prefer to put metadata in separate files, but it might be easier to soften the language in 02-common-mistakes.md to allow multiple tabs. Also, the image in 01-format-data.md shows six tabs. I'd be happy to try to fix these differences once we agree on the best solution.

amandawhitmire commented 6 years ago

Here is what I propose to do:

1) Align the language in 02-common-mistakes.md regarding the use of tabs. Not only is 01 in conflict with 02 (as @hoytpr mentions), but two of the Key Points in 02 also conflict with each other: "Avoid spreading data across multiple tabs (but do use a new tab to record data cleaning or manipulations)," and "Record metadata in a separate plain text file." I would modify the first Key Point and the narrative section on tabs. I would keep the section on "Inclusion of metadata in data table" as-is.

2) Update Figure 1 in 01-format-data.md to show notes in a text file rather than in a new tab of the working spreadsheet, and update the narrative text in "Keeping track of your analyses" to match that advice. Also update the Key Point on tracking steps to mention using a text file.

If this is acceptable to @hoytpr, @ErinBecker, @fmichonneau, @cbahlai, I'd be happy to submit PRs for these updates.

fmichonneau commented 6 years ago

That seems reasonable to me!

hoytpr commented 6 years ago

Sounds great to me also.

JDCampbell301 commented 6 years ago

I am an instructor-in-training and am not familiar with the full content of this course. From a broader data management perspective, putting metadata into an unstructured plain text file (or tab) is good, but may not be best practice.

At the risk of opening a can of worms, the metadata could be stored in a file conforming to a metadata standard such as Ecological Metadata Language (EML). Exploratory processing and analysis may be too early a stage to justify formal documentation in an established format. However, knowing the desired final format helps in deciding which metadata is best recorded contemporaneously.

hoytpr commented 6 years ago

Hi @JDCampbell301, community input is always welcome. The EML link you posted leads to a project that was stopped several years ago, but your point is still good. Learners in this lesson are usually beginners in data formatting, and sometimes the simplest solution isn't the most elegant. Introducing a new set of tools and standards would probably fit better in a more advanced lesson. But thanks for your input, and good luck with your instructor training!

hoytpr commented 5 years ago

Closing #227, though it remains relevant to other metadata issues.