cherrypi / Science-Fair_2019

Vernal Pond graphing and data, as well as data analysis.

Clean up your data table #3

VCF closed 5 years ago

VCF commented 5 years ago

It turns out you do have some issues with your data. If you look at the str() report, you'll see something like this:

> str(Pond_Data)
'data.frame':   141 obs. of  11 variables:
 $ Date               : Factor w/ 137 levels " "," 4:15\\",..: 30 31 32 33 34 35 36 37 38 39 ...
 $ Rain.in.           : Factor w/ 49 levels ""," 4:30\\"," Algae",..: NA 21 21 21 21 NA 21 NA NA NA ...
 $ Depth.cm.          : Factor w/ 62 levels ""," Algae"," around pond",..: NA 40 39 48 42 NA 46 NA NA NA ...
 $ South              : Factor w/ 48 levels ""," Some measurements were difficult to read due to ice and snow",..: NA 34 31 33 34 NA 35 NA NA NA ...
 $ North              : Factor w/ 43 levels ""," Clear sky",..: NA 10 42 42 12 NA 4 NA NA NA ...
 $ West               : num  NA 55 54 25 25 NA 19 NA NA NA ...
 $ East               : num  NA 50 56 50 51 NA 65 NA NA NA ...
 $ Area               : Factor w/ 44 levels "","10160","14",..: NA 31 2 43 29 41 36 21 17 18 ...
 $ Temperature..F..Max: Factor w/ 43 levels ""," Warm\\","-1",..: 39 36 32 36 18 35 30 22 22 18 ...
 $ Temperature..F..Min: Factor w/ 51 levels ""," Snow/ice melting\\",..: 7 8 5 4 NA NA NA 15 15 15 ...
 $ Other.Observations : Factor w/ 30 levels ""," Algae"," Algae\\",..: 30 NA NA NA 1 1 1 1 1 1 ...

We'll get to the "Factor" thing in a bit - that's not so much an issue as a fundamental, but confusing, part of R. You now have a "data.frame" (which I'll call "DF") holding your hard-won data. DFs have a few features / rules:

Let's start by dealing with the Factor issue. We can talk about what factors are later, but for now just know that you don't want them here. You want to update your read.csv so that strings are not automatically converted to factors. Look at ?read.csv to see how to set stringsAsFactors to FALSE and then push the changes.
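As a rough sketch of what that change might look like: the inline CSV below is a made-up stand-in for your real file (the column names are borrowed from the str() report, the values are placeholders), so swap in your own read.csv call and file path. Note that R 4.0.0 and later already default stringsAsFactors to FALSE, so setting it explicitly mainly matters on older versions.

```r
# Hypothetical inline CSV standing in for the real pond data file.
csv_text <- "Date,Depth.cm.,Other.Observations
1/5/2019,40,Frozen
1/6/2019,39,Algae
1/7/2019,,Clear sky"

# stringsAsFactors = FALSE keeps text columns as plain character vectors
# instead of converting them to factors.
Pond_Data <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# The text columns should now be reported as "chr" rather than "Factor".
str(Pond_Data)
```

With the real file, you would pass the file name as the first argument instead of text = csv_text.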

VCF commented 5 years ago

Ok, I see one problem. I told you to use \ for escapes, but that turned out to be bad advice. You should remove all the backslashes (e.g. Frozen\, Algae) and instead quote the whole value (e.g. "Frozen, Algae").
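To see why quoting works, here is a small sketch using made-up rows (not your actual data): read.csv treats everything inside double quotes as a single value, so the embedded comma no longer splits the field.

```r
# Hypothetical rows; the first observation contains a comma inside quotes.
csv_text <- 'Date,Depth.cm.,Other.Observations
1/5/2019,40,"Frozen, Algae"
1/6/2019,39,Clear sky'

quoted <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# The quoted value stays in one column, comma included.
print(quoted$Other.Observations[1])

# The table still has exactly the three columns named in the header.
print(ncol(quoted))
```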

Per our discussion, you can also remove the $Area column - you will be using R to calculate that dynamically from your data.
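If you would rather drop the column in R than edit the CSV by hand, one way is to assign NULL to it. This is only a sketch: the small data frame below is a placeholder with invented values, standing in for your Pond_Data.

```r
# Placeholder data frame with invented values, standing in for Pond_Data.
Pond_Data <- data.frame(
  West = c(55, 54, 25),
  East = c(50, 56, 50),
  Area = c(1, 2, 3)
)

# Assigning NULL removes the column from the data frame.
Pond_Data$Area <- NULL

# Only the remaining columns are listed.
print(names(Pond_Data))
```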

VCF commented 5 years ago

Alright, things are improving with 706e7a4, but you still have some problems. Let's open a new issue to deal with them.