Adding a column to a dataframe

Kirstensos commented 5 years ago

Hello Stirling coding club,

I am having difficulties with a data organisation problem and thought I would try out Github to see if anyone can help me out.

I have data in the following format:

plot | spp | value 
1    | 1   | 0.1    
1    | 2   | 0.2    
1    | 3   | 0.3   
2    | 2   | 0.1    
2    | 3   | 0.2    
3    | 1   | 0.3    
3    | 2   | 0.1    
3    | 3   | 0.2    
3    | 4   | 0.3

I want to calculate the sum of values for each plot, then use this to calculate the percentage contribution of each spp in each plot. I am trying to make a data.frame something similar to this:

plot | spp | value| total| % of total
1    | 1   | 0.1  | 0.6  | 16.7
1    | 2   | 0.2  | 0.6  | 33.3
1    | 3   | 0.3  | 0.6  | 50
2    | 2   | 0.1  | 0.3  | 33.3
2    | 3   | 0.2  | 0.3  | 66.6
3    | 1   | 0.3  | 0.9  | 33.3
3    | 2   | 0.1  | 0.9  | 11.1
3    | 3   | 0.2  | 0.9  | 22.2
3    | 4   | 0.3  | 0.9  | 33.3

I think I need to modify something like this to incorporate a way of sorting by plot:

data$total<- sum(data$value)

I have also tried aggregate() but can't seem to make it work:

data$total<- aggregate(data$value, by=list(data$plot), FUN=sum)

Any ideas would be really helpful. Thanks!

MattGuy57 commented 5 years ago

There might be simpler ways to do it but I would use tidyverse and do something like:

# install package 
require(tidyverse)

# recreate your data set
 x <- tibble(plot = c(1, 1, 1, 2, 2, 3, 3, 3, 3), 
spp = c(1, 2, 3, 2, 3, 1 ,2 , 3, 4), 
value = c(0.1,0.2,0.3,0.1,0.2,0.3,0.1,0.2,0.3))

# create a new summary data set with the total value for each plot
x2 <- x %>%        
group_by(plot) %>%      
summarise(total = sum(value))         

# join the two data sets together
x3 <- inner_join(x, x2)           

# add in column for % of total
 x4 <- x3 %>% mutate(per_cent = (value/total) * 100)

or if you don't want all those extra tables in your environment you could do it all in one:

x2 <- x %>% 
group_by(plot) %>% 
summarise(total = sum(value)) %>% 
inner_join(x) %>% 
mutate(per_cent = (value/total)*100)

Kirstensos commented 5 years ago

Great! Thanks Matt, that's worked.

jejoenje commented 5 years ago

Hi - sorry late to the party and I realise this has been sorted, but here is another way in "base" R (no extra packages needed).

I like using tapply() for doing this sort of stuff.

First, reconstruct your example:

dat <- data.frame(plot = c(1,1,1,2,2,3,3,3,3),
                  spp = c(1,2,3,2,3,1,2,3,4), 
                  value = c(0.1,0.2,0.3,0.1,0.2,0.3,0.1,0.2,0.3))

Now change column plot into a factor so we can work with it as a grouping variable:

dat$plot <- factor(dat$plot)

Now we can use tapply() to calculate the sum and sample size in each plot. The first argument in tapply() is the values you want to summarise, the second the grouping variable, and the third is the function you want to use on the values. In this case, we just want to use the sum() function.

totals <- tapply(dat$value, dat$plot, sum)
totals

Note that this gives you a little table with the sum of values for each plot. You can now want to repeat each of these totals by as many samples there are in each plot. We can work the latter out using tapply() too but this time using the length() function. Then we can use the rep() function to create as many totals as we have observations in each plot (the first argument is the values to repeat, the second is how many times we want to repeat each value). We add the result to your dataframe:

n <- tapply(dat$value, dat$plot, length)
n
totals <- rep(totals,n)
dat$totals <- totals
dat

Now the desired fraction is just

dat$value/dat$totals

Or, expressed as a proper percentage, rounded, and added to the dataframe:

dat$perc <- round((dat$value/dat$totals)*100,1)
dat

Kirstensos commented 5 years ago

Thanks Jeroen, I'll take a look at that.

StirlingCodingClub / studyGroup

Adding a column to a dataframe #28