datacarpentry / spreadsheet-ecology-lesson

Data Organization in Spreadsheets for Ecologists
https://datacarpentry.org/spreadsheet-ecology-lesson
Other
37 stars 141 forks source link

Sorting could be an Exercise #119

Closed ErinBecker closed 7 years ago

ErinBecker commented 7 years ago

In 04-quality-control.md under "Sorting" there is an exercise that isn't marked as one. The lines starting with "When you do this sort . . ." and "Try sorting" could be called out as an exercise.

zjsteyn commented 7 years ago

We have used it in both workshops that I have been involved in, what worked will in the first workshop was a "clean" version of the species data that was supplied to the participants. This saved time and gave the same data to work with to participants, I've got a clean version of 2013 Field data that could possibly be added as a separate download to base the question on.

ErinBecker commented 7 years ago

Thanks @zjsteyn, could you share this data?

zjsteyn commented 7 years ago

Please find attached,

The data is tabular but still have errors that will work will in a sorting or a data validation exercise. The weight_grams column still has cells with no data and the letter "g" in some of the cells.

A sorting exercise will show how outliers short to the top or bottom, it can also work will with a data validation exercise where data validation can circle values that do not conform. I will draft two scenarios for consideration.

The xlsx file. survey_data_messy_quality_control.xlsx

zjsteyn commented 7 years ago
  1. The note to the "1980" data on line 102, also seem to be in isolation as there is no 1980 tab in the original data provided.
  2. The second exercise may be a bit much as it requires a lot of steps, and I'm not sure what the equivalent may be in LibreOffice Calc, the sort Exercise may have more in common.
  3. I will be able to adapt into a pull request if found useful.

Possible Validation Exercise using Sort

Exercise 1

Challenge: Sort the weight_grams column in your spreadsheet program from Smallets to Largest. What do you notice?

(Remember to expand your sort in order to prevent data corruption. Expanding your sort ensures that the the all the data in one row move together instead of only sorting a single column in isolation.)

Solution

screen shot 2017-04-02 at 2 31 38 pm

Note how the outliers in the case of the 2013 data sort to the bottom of the tabular data. The cells containing no data values as well as the cells where the letter "g" was included can be found towards the bottom of the tabular data.

screen shot 2017-04-02 at 2 32 51 pm 1

{: .solution}

{: .challenge}

Possible Validation Exercise using Data Validation

Exercise 2

Challenge: Utilising Data Validation (Excel) or Validity (Libre Office Calc) to validate whether the values entered in the Weigh_grams colum are valid integers (whole numbers) between 1 and 150.

Solution

In order to identify all the values that are not whole numbers, you have to select all the cells in the Weight_grams colum and click on Data Validation. You then need to alter the Data Validation input screen to resemble the following screenshot.

screen shot 2017-04-02 at 2 48 20 pm

After clicking on OK you need to select Circle Invalid Data from the Data Validation drop down.

screen shot 2017-04-02 at 2 53 13 pm

Note how all the cells where values are not whole numbers are indicated once you run the validation function on them.

screen shot 2017-04-02 at 2 52 34 pm

{: .solution}

{: .challenge}

zjsteyn commented 7 years ago

Will be refining during 7th Bug BBQ

ErinBecker commented 7 years ago

@zjsteyn I figured out how to update the data link and merged this. Thanks for your contribution, it's great!