Open Kirstensos opened 5 years ago
There might be simpler ways to do it but I would use tidyverse and do something like:
# load the tidyverse (install it once with install.packages("tidyverse") if needed)
library(tidyverse)
# recreate your data set
x <- tibble(plot = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
            spp = c(1, 2, 3, 2, 3, 1, 2, 3, 4),
            value = c(0.1, 0.2, 0.3, 0.1, 0.2, 0.3, 0.1, 0.2, 0.3))
# create a new summary data set with the total value for each plot
x2 <- x %>%
  group_by(plot) %>%
  summarise(total = sum(value))
# join the two data sets together
x3 <- inner_join(x, x2, by = "plot")
# add in column for % of total
x4 <- x3 %>% mutate(per_cent = (value/total) * 100)
Or, if you don't want all those extra tables in your environment, you could do it all in one go:
x2 <- x %>%
  group_by(plot) %>%
  summarise(total = sum(value)) %>%
  inner_join(x, by = "plot") %>%
  mutate(per_cent = (value / total) * 100)
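A possible shortcut, sketched here as an alternative rather than as what was suggested above: with a grouped mutate() the per-plot total can be added directly to each row, so no join is needed. The name x_pct is made up for this example, and it assumes dplyr is installed.

```r
library(dplyr)

# same example data as above
x <- tibble(plot = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
            spp = c(1, 2, 3, 2, 3, 1, 2, 3, 4),
            value = c(0.1, 0.2, 0.3, 0.1, 0.2, 0.3, 0.1, 0.2, 0.3))

# inside group_by(), sum(value) is computed per plot and recycled to every row of that plot
x_pct <- x %>%
  group_by(plot) %>%
  mutate(total = sum(value),
         per_cent = value / total * 100) %>%
  ungroup()  # drop the grouping so later steps are not affected by it

x_pct
```

This keeps the original column order (plot, spp, value) and adds total and per_cent at the end.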
Great! Thanks Matt, that's worked.
Hi - sorry late to the party and I realise this has been sorted, but here is another way in "base" R (no extra packages needed).
I like using tapply() for doing this sort of stuff.
First, reconstruct your example:
dat <- data.frame(plot = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                  spp = c(1, 2, 3, 2, 3, 1, 2, 3, 4),
                  value = c(0.1, 0.2, 0.3, 0.1, 0.2, 0.3, 0.1, 0.2, 0.3))
Now change column plot into a factor so we can work with it as a grouping variable:
dat$plot <- factor(dat$plot)
Now we can use tapply() to calculate the sum and sample size in each plot. The first argument in tapply() is the values you want to summarise, the second the grouping variable, and the third is the function you want to use on the values. In this case, we just want to use the sum() function.
totals <- tapply(dat$value, dat$plot, sum)
totals
Note that this gives you a little table with the sum of values for each plot. You now want to repeat each of these totals as many times as there are samples in each plot. We can work the latter out using tapply() too, but this time using the length() function. Then we can use the rep() function to create as many totals as we have observations in each plot (the first argument is the values to repeat, the second is how many times we want to repeat each value). Note that this relies on the rows of the dataframe being sorted by plot, as they are here. We add the result to your dataframe:
n <- tapply(dat$value, dat$plot, length)
n
totals <- rep(totals,n)
dat$totals <- totals
dat
Now the desired fraction is just
dat$value/dat$totals
Or, expressed as a proper percentage, rounded, and added to the dataframe:
dat$perc <- round((dat$value/dat$totals)*100,1)
dat
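If the rows are not guaranteed to be sorted by plot, one option (a sketch, not part of the steps above) is base R's ave(), which applies a function within each group and returns a result lined up with the original rows. That replaces the tapply()/rep() pair with a single call:

```r
# same example data as above
dat <- data.frame(plot = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                  spp = c(1, 2, 3, 2, 3, 1, 2, 3, 4),
                  value = c(0.1, 0.2, 0.3, 0.1, 0.2, 0.3, 0.1, 0.2, 0.3))

# ave() returns one value per row: the sum of 'value' within that row's plot
# (FUN must be passed by name, because the unnamed arguments are grouping variables)
dat$totals <- ave(dat$value, dat$plot, FUN = sum)

# percentage contribution of each spp within its plot, rounded to 1 decimal place
dat$perc <- round(dat$value / dat$totals * 100, 1)
dat
```

Because ave() matches totals to rows by group rather than by position, it should give the same answer whatever order the rows are in.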
Thanks Jeroen, I'll take a look at that.
Hello Stirling coding club,
I am having difficulties with a data organisation problem and thought I would try out GitHub to see if anyone can help me out.
I have data in the following format:
I want to calculate the sum of values for each plot, then use this to calculate the percentage contribution of each spp in each plot. I am trying to make a data.frame similar to this:
I think I need to modify something like this to incorporate a way of sorting by plot:
I have also tried aggregate() but can't seem to make it work:
Any ideas would be really helpful. Thanks!