Closed bc-anaisabel closed 3 years ago
Can we use dplyr/tidyverse functions to group the data and summarize only the mean and SD, instead of using the summarySE function?
https://rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
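For reference, a minimal sketch of what the dplyr route could look like. The data frame soil and its values are made up for illustration (a stand-in for soilvariables.csv), and only two of the variables are shown; it assumes dplyr 1.0 or later for across().

```r
library(dplyr)

# Hypothetical stand-in for soilvariables.csv: two sites, two soil variables
soil <- data.frame(
  Site = c("A", "A", "A", "B", "B", "B"),
  pH   = c(5.0, 5.5, 6.0, 6.5, 7.0, NA),
  Ca   = c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
)

vars <- c("pH", "Ca")

# One row per Site; columns are named <variable>_mean and <variable>_sd
soil_summary <- soil %>%
  group_by(Site) %>%
  summarise(
    across(all_of(vars),
           list(mean = ~ mean(.x, na.rm = TRUE),
                sd   = ~ sd(.x, na.rm = TRUE))),
    .groups = "drop"
  )

print(soil_summary)
```

This gives a wide table (one row per site). It only returns the mean and SD, so the N, se and ci columns that summarySE reports are not included.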
We tried to create a for loop using:
###summary(data)
pH <- summarySE(soilvariables, measurevar= "pH", groupvars=c("Site"), na.rm = TRUE)
Pdis <-summarySE(soilvariables, measurevar= "Pdis", groupvars=c("Site"), na.rm = TRUE)
Ca <-summarySE(data, measurevar= "Ca", groupvars=c("Site"), na.rm = TRUE)
Mg <-summarySE(data, measurevar= "Mg", groupvars=c("Site"), na.rm = TRUE)
K <-summarySE(data, measurevar= "K", groupvars=c("Site"), na.rm = TRUE)
Na <-summarySE(data, measurevar= "Na", groupvars=c("Site"), na.rm = TRUE)
H <-summarySE(data, measurevar= "H", groupvars=c("Site"), na.rm = TRUE)
Al <-summarySE(data, measurevar= "Al", groupvars=c("Site"), na.rm = TRUE)
SoilM <-summarySE(data, measurevar= "SoilM", groupvars=c("Site"), na.rm = TRUE)
for (i in c("pH","Pdis","Ca","Mg","K","Na","H","Al","SoilM")){i = vector()
i<- summarySE(data, measurevar= i, groupvars=c("Site"), na.rm = T)}
We need the for loop to add a new column named "Variable" that repeats the name of the variable in every row of the data frame, so that we can use that column when we combine the results with rbind later.
We also realized that the object the for loop fills needs to be created beforehand with the dimensions of the result we want to obtain, and we don't know how to do that in R.
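One possible way to handle the pre-allocation, sketched in base R only: create an empty list with one slot per variable and fill a slot on each pass of the loop. The data frame soil and its values are hypothetical, and the per-site statistics are computed by hand here rather than with summarySE, so this is an illustration of the pattern and not a drop-in replacement.

```r
# Hypothetical stand-in for the soil data
soil <- data.frame(
  Site = c("A", "A", "A", "B", "B", "B"),
  pH   = c(5.0, 5.5, 6.0, 6.5, 7.0, NA),
  Ca   = c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
)

vars <- c("pH", "Ca")

# Pre-allocate a list with one empty slot per variable
results <- vector("list", length(vars))
names(results) <- vars

for (v in vars) {
  # Split the values of this variable by site
  by_site <- split(soil[[v]], soil$Site)

  # Store one small summary table per variable, tagged with its name
  results[[v]] <- data.frame(
    Site     = names(by_site),
    N        = sapply(by_site, function(z) sum(!is.na(z))),
    mean     = sapply(by_site, mean, na.rm = TRUE),
    sd       = sapply(by_site, sd, na.rm = TRUE),
    Variable = v
  )
}

# Stack the per-variable tables into one long table
soil_long <- do.call(rbind, results)
rownames(soil_long) <- NULL

print(soil_long)
```

Keeping the results in a list means the final rbind does not need every variable name typed out by hand.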
So I think I figured out the for loop, after referencing an issue from a previous semester that Alicia mentioned...
data <-read.csv("soilvariables.csv", row.names = 1)
# extract variable names from data table column names ("soilvariables.csv")
x <- colnames(data[,1:9])
# loop that calculates summary statistics for each variable
# changes the third column name to "mean" instead of variable name
# and adds additional column titled 'Variable' with variable name
for (i in x){
a <- summarySE(data, measurevar= i, groupvars=c("Site"), na.rm = T)
names(a)[names(a) == i] <- "mean"
a$Variable <- i
assign(i,a)}
This results in 9 data frames (3 rows, 7 columns each) that can theoretically be combined and/or summarized using functions in dplyr and tidyverse. I tried to do this, but wasn't sure exactly how you want the finalized data table to look... if you post an example final table, maybe I can help more.
Thanks! This worked. So now what I did was use the command rbind to get what I wanted, which was a long-format table rather than a wide one:
Bind_soilvar<-rbind(pH, Pdis, Ca, Mg, K, Na, H, Al, SoilM, C, Nit)
The only change I had to make for this to work was renaming my nitrogen variable. My abbreviation was "N", which confused both summarySE and dplyr because "N" is also the name of one of the output columns (the number of samples). So instead of N I used Nit.
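The rename can be done in R before summarizing, rather than by editing the CSV. A small sketch, using a made-up data frame soil in which nitrogen is stored as "N":

```r
# Hypothetical data frame in which nitrogen was abbreviated as "N"
soil <- data.frame(
  Site = c("A", "B"),
  N    = c(0.21, 0.35),
  C    = c(2.1, 3.4)
)

# Rename the nitrogen column so it no longer clashes with the
# sample-count column "N" that the summary adds
names(soil)[names(soil) == "N"] <- "Nit"

print(names(soil))
```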
So the simplified script looks like this:
# Load Rmisc (provides summarySE) and import data
library(Rmisc)
data <-read.csv("soilvariables.csv", row.names = 1)
# extract variable names from data table column names ("soilvariables.csv")
x <- colnames(data[,1:11])
# loop that calculates summary statistics for each variable
# changes the third column name to "mean" instead of variable name
# and adds additional column titled 'Variable' with variable name
for (i in x){
a <- summarySE(data, measurevar= i, groupvars=c("Site"), na.rm = T)
names(a)[names(a) == i] <- "mean"
a$Variable <- i
assign(i,a)}
# Combine the per-variable data frames by row (long format)
Bind_soilvar<-rbind(pH, Pdis, Ca, Mg, K, Na, H, Al, SoilM, C, Nit)
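For comparison, the same kind of long table can probably be built without the loop and without assign, by reshaping first and summarizing afterwards. This is only a sketch: soil is a made-up stand-in for soilvariables.csv with two variables, it assumes dplyr and tidyr (version 1.0 or later for pivot_longer), and it reports N, mean and sd but not the se and ci columns from summarySE.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for soilvariables.csv
soil <- data.frame(
  Site = c("A", "A", "A", "B", "B", "B"),
  pH   = c(5.0, 5.5, 6.0, 6.5, 7.0, NA),
  Ca   = c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
)

vars <- c("pH", "Ca")

soil_long <- soil %>%
  # Stack the measured variables into Variable/value pairs
  pivot_longer(cols = all_of(vars),
               names_to = "Variable", values_to = "value") %>%
  # One summary row per variable and site
  group_by(Variable, Site) %>%
  summarise(
    N    = sum(!is.na(value)),
    mean = mean(value, na.rm = TRUE),
    sd   = sd(value, na.rm = TRUE),
    .groups = "drop"
  )

print(soil_long)
```

Because the variable name lives in its own column from the start, a variable called "N" would not clash with the sample-count column here.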
The output looks like this:
I want to create a function to summarize the mean and SD of all the variables I have in a data frame and combine them into one table.
At the moment I am obtaining the values for each of the variables separately using the function summarySE from the Rmisc package in R version 3.6.2, and then putting them together by creating a data frame object with all of them.
Input data looks like this:
What I am obtaining for each variable looks like this: