datacarpentry / spreadsheet-ecology-lesson

Data Organization in Spreadsheets for Ecologists
https://datacarpentry.org/spreadsheet-ecology-lesson

Various lesson improvements in a separate repo #296

Closed anenadic closed 3 weeks ago

anenadic commented 3 years ago

We did a copy (rather than a fork) of this lesson at: https://github.com/Southampton-RSG/spreadsheets-data-organisation-and-management. The main modifications include:

I'd like to invite you to have a look and review this version of the lesson. If you feel that this is something worth merging back into the main lesson, I'd be happy to work with you to make a fork that would be suitable for creating a PR.

anenadic commented 3 years ago

Tagging @hoytpr as not sure who the current official maintainers of the lesson are.

hoytpr commented 3 years ago

Hi @anenadic , I'll take a look.

hoytpr commented 3 years ago

Okay, WOW you did a great job @anenadic !!!!

I copied your repo to my home computer to render it, and went through all the lessons. Very impressive!

Setup page was very good. Good use of styles (didn't even know 'hearts' were a thing!). Removed repetitive aspects of setup.

Ep. 01: The solution is shortened, and that actually improves the flow of the lesson. Removing the reference to an R book is probably best.

You correctly turned "Using Multiple Tables" into an exercise.

Change suggestion Ep. 01-A

The following might be shortened. I currently use this in a class and emphasize that it's computers that need this data, not humans. I emphasize to think of your file differently because it's not just a data-recording file, it's the entry file to your computational analysis.

There is no reason why observations from different sites or different years should not go into a single table. You just need to keep track of them by using new columns (in this example, one for site and one for year). Keeping data from the same experiment in a single table will help you stick to a consistent data structure, and avoid errors when looking up data from different tables.

Using multiple tables on one page also makes it difficult for humans to keep track of the data, especially if one of the tables is hidden off the edge of the spreadsheet. Using multiple tables also increases the risk of using the same column name in multiple places, which will make it significantly harder to clean your data.
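To make the single-table point concrete, here is a minimal sketch in plain Python (not part of the lesson). The site names, years, species codes, and weights are all invented for illustration, and `combine` is a hypothetical helper. It stacks separate per-site, per-year tables into one table and records site and year as ordinary columns:

```python
# Hypothetical per-site, per-year tables, as they might appear as separate
# blocks on one spreadsheet page. All values are made up for illustration.
tables = {
    ("site_A", 2013): [
        {"species": "DM", "weight": 40},
        {"species": "PF", "weight": 7},
    ],
    ("site_B", 2014): [
        {"species": "DM", "weight": 44},
    ],
}


def combine(tables):
    """Stack separate tables into one, recording site and year as columns."""
    rows = []
    for (site, year), records in tables.items():
        for record in records:
            # Each observation keeps its own values and gains site/year columns.
            rows.append({"site": site, "year": year, **record})
    return rows


combined = combine(tables)
for row in combined:
    print(row)
```

Every row now has the same four columns, so a downstream program can filter or group by `site` and `year` instead of having to know where each table sits on the page.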

Change suggestion Ep. 01-B

I would change:

"Whatever the reason, it is a problem if unknown or missing data is recorded as -999, 999, or 0. Statistical programs do not know that these are intended to represent missing (null) values and, because they are valid numbers, they will be included in calculations which will lead to incorrect results. How these values are interpreted will depend on the software you use to analyse your data."

TO:

"Statistical software packages differ in what they interpret as a null value. Make sure you know the correct representation for null in your downstream analyses. If uncertain, blanks, NA (used in R), or NaN (used in Python) are good options."
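A short sketch of why a sentinel such as -999 is a problem, using only the Python standard library. The weights are invented for illustration. The sentinel is a valid number, so it is silently included in the calculation, whereas an explicit NaN can be detected and skipped:

```python
import math
from statistics import mean

# Made-up weights; the third observation is missing in both lists.
weights_sentinel = [40.0, 44.0, -999, 42.0]      # missing value coded as -999
weights_nan = [40.0, 44.0, math.nan, 42.0]       # missing value coded as NaN

# The sentinel is treated as a real measurement and drags the mean down.
naive_mean = mean(weights_sentinel)

# NaN is recognisable as "missing", so it can be dropped before averaging.
clean_mean = mean(w for w in weights_nan if not math.isnan(w))

print(naive_mean)  # -218.25
print(clean_mean)  # 42.0
```

How a given analysis tool treats blanks, NA, or NaN still depends on that tool, so this only illustrates the general risk rather than the behaviour of any particular statistics package.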

General comments on Ep. 02:

You did a great job explaining these points, but included a lot of extra text. This is probably okay for a lesson that is mostly discussion, but I liked the "Example" + "Solution" format of the topics.

General comments on Ep. 03

You did a great job reformatting this lesson. I couldn't find anything to change, and liked very much the link to changing gene names.

General comments on Ep. 04

Again, you must have spent a lot of time on this and did a great job. This section needed updating and it's good that you used examples from a Mac because the Windows examples were from mixed versions.

I also agree that "A Note on Cross-platform Operability" should be eliminated. This has bothered me since 2016, and we tried to re-explain it in 2019, but it is confusing and just isn't an issue anymore. As such, it's a bad way to end the workshop.


It would be fantastic for you to work on a PR for this and I would be happy to assist in any way.

If you want to work on your repo at https://southampton-rsg.github.io/spreadsheets-data-organisation-and-management/ before submitting a PR, that would be fine with me too.

Thanks very much for such hard and careful work!!!

Peter

PS: If we do a major rewrite, we will want to pass the final review through the Carpentries CAC.

anenadic commented 3 years ago

Hi @hoytpr! Thank you so much for being so prompt and giving me great feedback. I have been annoyed in the past by some of the same things you mention ;-). I am going to have a look at your comments and try and incorporate them in my repo first. Wrt PR - because I modified the logo and a few other things in config - do you think merging from my repo could be relatively easily done (and easier for me but more work for you potentially) or it would require a separate clean fork with me then copying in all the lessons (more work for me)? I am fine either way, just the speed of things would be affected.

anenadic commented 3 years ago

Actually, a new clean fork with me copying in various episodes and bits is probably a better idea - I'd be up for that @hoytpr.

hoytpr commented 3 years ago

Of course! You're correct. Thanks again!

Peter

Peter R. Hoyt Oklahoma State University Genomics Facility Stillwater, OK


anenadic commented 3 years ago

I've just done a PR: https://github.com/datacarpentry/spreadsheet-ecology-lesson/pull/297 @hoytpr. Let me know if I missed anything.

Wrt your comments on Ep. 01-A and Ep. 01-B - I have incorporated them in the text at various places rather than shortening and replacing my original text. This made the new text longer, which is probably not the desired effect, but I felt that, e.g., explaining why using -999 as a null value is a problem is important for novices who might not understand why it may cause trouble. It may be obvious to instructors, but if we want this lesson to be used in a self-learning manner then perhaps "more is more". Feel free to cut things out when merging - it is ultimately your call.

anenadic commented 3 years ago

Btw. @hoytpr I've done the same thing with the OpenRefine lesson (https://github.com/Southampton-RSG/openrefine-data-cleaning), which was devoid of any screenshots and very hard for new instructors to pick up (let alone novice learners trying to go through it). I have poked one of them (who did the last commit) to do the review but I am not sure who the official maintainers are. If you happen to know them - send them my way :-). Thank you once again for being so prompt!

fontikar commented 3 weeks ago

Hi @anenadic @hoytpr 👋 @metut and I are the new maintainers for this lesson! Thanks for both your contributions, it looks like everything here is all set so we are closing this issue :)

Feel free to reopen if we have missed anything!