cherrypi / Science-Fair_2019

Vernal Pond graphing and data, as well as data analysis.

Generating new, derived columns from one or more columns in your data.frame #10

Open VCF opened 5 years ago

VCF commented 5 years ago

Ok, here's some sample code:

## I am buying a bunch of widgets. There are three types of widgets,
## each costs a different amount, and I am getting a different quantity
## for each. I am building the data frame "by hand", rather than loading
## from a file.
myData <- data.frame(widgetName=c("Alpha","Beta","Gamma"),
                     widgetCount=c(1,3,5),
                     widgetCost=c(7,11,13))
## Look at our order:
print(myData)

## Calculate the total cost for each widget:
myData$subtotal <- myData$widgetCount * myData$widgetCost
## Look at the updated order:
print(myData)

## Look at just the subtotal. It's just a numeric vector
str(myData$subtotal)

## We could calculate a grand total if we wanted:
message("Your grand total for all three widget types is ", sum(myData$subtotal))

## What if we had some boxes, where we measured the sides in inches?

myBoxes <- data.frame(x=c(2,3,4),
                      y=c(1,2,2),
                      z=c(5,3,8))
## Volume is just the product of all three sides...
myBoxes$cubicInches <- myBoxes$x * myBoxes$y * myBoxes$z

## What if we wanted this in cubic feet?
## A cubic foot is 12 * 12 * 12 cubic inches

myBoxes$cubicFeet <- myBoxes$cubicInches / (12 * 12 * 12)

## Let's see what we ended up with:

print(myBoxes)

I want you to look at what I've done above to make derived columns, and think about how you can use these methods to calculate some useful values for your pond.
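As one possible starting point, here is a minimal sketch of the same approach applied to pond measurements. The data frame `myPond`, its column names, and every number in it are made up for illustration. Treating the pond as a rectangle with the same depth everywhere is a simplifying assumption, not a claim about how your pond should be modeled.

```r
## Hypothetical pond measurements (all values invented for illustration).
## Each row is one visit; lengths are in meters.
myPond <- data.frame(visit=c("March","April","May"),
                     lengthM=c(10,9,7),
                     widthM=c(6,5,4),
                     depthM=c(0.5,0.4,0.2))

## Surface area, assuming the pond is roughly rectangular:
myPond$Area <- myPond$lengthM * myPond$widthM

## Volume, assuming the depth is the same everywhere:
myPond$Volume <- myPond$Area * myPond$depthM

## Convert cubic meters to liters (1 cubic meter = 1000 liters):
myPond$liters <- myPond$Volume * 1000

## Look at the original columns plus the three derived ones:
print(myPond)
```

Note that `Volume` is built from `Area`, which is itself a derived column. Once a derived column exists, it can be used like any other.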

cherrypi commented 5 years ago

What happens to the column that the values are being calculated for? Does it just get added to the end of the rest of the data, do I have to create it beforehand, or does something else happen?

VCF commented 5 years ago

Did you try the example code?
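Running something like the following short sketch is one way to check. It rebuilds `myData` from the sample code above and prints the column names before and after the assignment, so you can see where the new column ends up.

```r
## Rebuild the widget order from the sample code:
myData <- data.frame(widgetName=c("Alpha","Beta","Gamma"),
                     widgetCount=c(1,3,5),
                     widgetCost=c(7,11,13))

## Column names before adding anything:
print(names(myData))

## "subtotal" does not exist yet, so asking for it gives NULL:
print(is.null(myData$subtotal))

## Assigning to a new name creates the column; nothing needs to be
## set up beforehand:
myData$subtotal <- myData$widgetCount * myData$widgetCost

## The new column is appended after the existing ones:
print(names(myData))
```

Assigning to a name that already exists would overwrite that column instead of adding a new one.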

cherrypi commented 5 years ago

Did it (I think)! For reference, the names of the area and volume columns are: Area and Volume

VCF commented 5 years ago

Ok ... I'm looking at 1868df0 ... You need to do a few things:

VCF commented 5 years ago

Looking at commit 2093192 - You're writing comments as if they were the discussion section of a paper. That's not wrong, per se, but it's not always the best approach. It's often better to break the comments into small, "digestible" bits. For that matter, it's often good to break your code into smaller bits too. In particular, any time you find yourself re-typing the same code, you should probably "break out" the repeated bit into either a new variable or, in some cases, its own function (but we won't work on functions here). Example:

## Example purchase prices (made-up values, only so the code below runs)
x1 <- 100
x2 <- 250
## Bad
c1 <- x1 * 1.07 * 1.05
c2 <- x2 * 1.07 * 1.05
## Better
## We apply a 7% sales tax and a 5% commission for both purchases
c1 <- x1 * 1.07 * 1.05
c2 <- x2 * 1.07 * 1.05
## Best
markup <- 1.07 * 1.05 # 7% sales tax, 5% sales commission
## Calculate total costs of both purchases, with markup
c1 <- x1 * markup
c2 <- x2 * markup
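
The comment above mentions that a repeated bit can also be broken out into its own function. As a rough sketch of what that might look like (the function name `applyMarkup`, its argument names, and the purchase prices are all hypothetical, chosen only for illustration):

```r
## Hypothetical helper: apply a sales tax and a commission to a price.
## The default rates match the 7% tax and 5% commission used above.
applyMarkup <- function(price, taxRate=0.07, commissionRate=0.05) {
    price * (1 + taxRate) * (1 + commissionRate)
}

## Made-up purchase prices, for illustration only:
x1 <- 100
x2 <- 250

## Calculate total costs of both purchases, with markup
c1 <- applyMarkup(x1)
c2 <- applyMarkup(x2)

## Because R arithmetic is vectorized, several prices can be
## handled in one call:
allCosts <- applyMarkup(c(x1, x2))
print(allCosts)
```

The rates now live in one place, so a change to the tax or commission only has to be made once.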