datacarpentry / spreadsheet-ecology-lesson

Data Organization in Spreadsheets for Ecologists
https://datacarpentry.org/spreadsheet-ecology-lesson

Add pointer to good versioning strategies #209

Closed agbeltran closed 5 years ago

agbeltran commented 7 years ago

Lesson 01 on formatting data tables in spreadsheets (http://www.datacarpentry.org/spreadsheet-ecology-lesson/01-format-data/) emphasizes "create a new file or tab with your cleaned or analyzed data. Don’t modify the original dataset, or you will never know where you started!". Adding a pointer to strategies for versioning data, including relying on version control systems, would be useful (even if covering those topics is not the intent of this lesson).

hoytpr commented 6 years ago

You have a good point @agbeltran, but from a life-science perspective, this would not be the proper time to introduce Git or GitHub. They are just too complex and command-line oriented for biologists to grasp easily. Because we are basically working in a GUI, and the point is to enter data correctly, in my opinion it is sufficient to make sure everyone is aware of the concept of backing up their spreadsheet data (and how easy it is).

You've made several good suggestions, thanks! There are several things that may need to be improved, but my understanding is that one goal here is to avoid adding more topics (due to time constraints).

hoytpr commented 6 years ago

My comment wasn't meant to be final. I really think @agbeltran has a good point, as tidy data is version-controlled... right? We could always add a comment like "For information on how to maintain version control over your data, look at our lesson on 'Git'". I was hoping to get more discussion about this from @agbeltran, @ErinBecker, @fmichonneau, or @cbahlai.

amandawhitmire commented 6 years ago

@hoytpr this discussion reminds me of the issue you started, "Formatting examples contradict each other in 01 and 02 #227", although this is versioning and your issue is related to where to put metadata. They are both related to whether or not to use tabs in spreadsheet programs (I say no! ;-) ). Your suggestion on Mar 19 is a good one for this particular issue, and overall a clarified message on the use of tabs (for any reason) would be very helpful to learners.

hoytpr commented 6 years ago

@amandawhitmire, @agbeltran, thanks very much for your excellent input. Maintainers don't merge their own PRs, so if one of you has a short descriptor, link, or even just a strong opinion, put in a PR and all the maintainers (myself + @ErinBecker, @fmichonneau, or @cbahlai) can work on closing this. I'd like to get as many of these issues closed as possible. My personal opinion is that all spreadsheets are essentially early versions of a future relational database. As such, having tables is good, but having tabs could cause problems.

amandawhitmire commented 6 years ago

Thanks, @hoytpr - I'm learning the processes here, so this is very helpful feedback.

It seems like if we add your comment from Mar 19 and clarify the recommendation regarding the use of tabs, we can close both issues. I'd be happy to take a stab and submit PRs. After that, the maintainers close the issues?

hoytpr commented 6 years ago

That's correct. So stab that PR and be proud!

hoytpr commented 5 years ago

Closing #209