imarin79 opened 4 years ago
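Hi Isaac,
The by argument of merge() specifies the column used to match rows between the two data.frames. It can be given either as a column position or as a column name, so by=1 and by="Genes" below give the same result because Genes is the first column of both data.frames.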
> expressDF <- data.frame(Genes=c("PTBP1","PTBP2","PTBP3"),
+ expression=c(10,100,200))
>
> lenDF <- data.frame(Genes=c("PTBP1","PTBP2","PTBP3"),
+ length=c(10000,1020,200000))
>
>
> merge(expressDF,lenDF,by=1)
  Genes expression length
1 PTBP1         10  10000
2 PTBP2        100   1020
3 PTBP3        200 200000
>
> merge(expressDF,lenDF,by="Genes")
  Genes expression length
1 PTBP1         10  10000
2 PTBP2        100   1020
3 PTBP3        200 200000
>
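A minimal check, reusing the two data.frames above (recreated here so the snippet runs on its own), that a column position and a column name select the same matching column. It also shows that merge() by default keeps only rows whose key appears in both data.frames. The extra gene PTBP4 and the object names byPos, byName and expressDF2 are hypothetical, added only for illustration.

```r
# Recreate the example data.frames from above
expressDF <- data.frame(Genes = c("PTBP1", "PTBP2", "PTBP3"),
                        expression = c(10, 100, 200))
lenDF <- data.frame(Genes = c("PTBP1", "PTBP2", "PTBP3"),
                    length = c(10000, 1020, 200000))

# Matching by position (column 1) and by name ("Genes") is equivalent here
byPos  <- merge(expressDF, lenDF, by = 1)
byName <- merge(expressDF, lenDF, by = "Genes")
identical(byPos, byName)

# Hypothetical extra gene that has no entry in lenDF
expressDF2 <- rbind(expressDF, data.frame(Genes = "PTBP4", expression = 50))

# Default (all = FALSE): only genes present in both data.frames are kept
merge(expressDF2, lenDF, by = "Genes")

# all.x = TRUE keeps every row of the first data.frame, filling gaps with NA
merge(expressDF2, lenDF, by = "Genes", all.x = TRUE)
```

The default behaviour is what you want when only genes common to all data.frames should survive the merge.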
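When the matching columns have different names or positions in the two data.frames, we specify by.x and by.y: by.x gives the column to match on in the first data.frame and by.y the column in the second. Numbers are column positions and character strings are column names, so by.x=1, by.y=2 below matches column 1 of expressDF (Genes) against column 2 of lenDF (Symbols).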
> expressDF <- data.frame(Genes=c("PTBP1","PTBP2","PTBP3"),
+ expression=c(10,100,200))
>
> lenDF <- data.frame(IDS=c("ID121","ID122","ID123"),
+ Symbols=c("PTBP1","PTBP2","PTBP3"),
+ length=c(10000,1020,200000))
>
>
> merge(expressDF,lenDF,by.x=1,by.y=2)
  Genes expression   IDS length
1 PTBP1         10 ID121  10000
2 PTBP2        100 ID122   1020
3 PTBP3        200 ID123 200000
>
> merge(expressDF,lenDF,by.x="Genes",by.y="Symbols")
  Genes expression   IDS length
1 PTBP1         10 ID121  10000
2 PTBP2        100 ID122   1020
3 PTBP3        200 ID123 200000
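Swapping the order of the two data.frames gives a case with by.x=2 and by.y=1, the pattern in the question: column 2 of the first data.frame (Symbols in lenDF) is matched against column 1 of the second (Genes in expressDF). The numbers are column positions, not a count of columns. A minimal sketch reusing the data above; the names swapped and swappedByName are hypothetical.

```r
# Recreate the example data.frames from above
expressDF <- data.frame(Genes = c("PTBP1", "PTBP2", "PTBP3"),
                        expression = c(10, 100, 200))
lenDF <- data.frame(IDS = c("ID121", "ID122", "ID123"),
                    Symbols = c("PTBP1", "PTBP2", "PTBP3"),
                    length = c(10000, 1020, 200000))

# by.x = 2: 2nd column of the first data.frame (lenDF$Symbols)
# by.y = 1: 1st column of the second data.frame (expressDF$Genes)
swapped <- merge(lenDF, expressDF, by.x = 2, by.y = 1)
swapped

# The same match written with column names instead of positions
swappedByName <- merge(lenDF, expressDF, by.x = "Symbols", by.y = "Genes")
identical(swapped, swappedByName)

# The key column comes first and keeps its name from the first data.frame
names(swapped)
```

Using column names is usually safer than positions, since the code keeps working if columns are later added or reordered.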
Hi Matt,
Thanks again for the great presentation last Friday. I am currently working through the exercises on factors and data frames and have a question about the "by" argument of merge(). Specifically, in the question "Create a data frame containing only those gene names common to all data frames with all information from Annotation and the expression from Sample 1 and Sample 2", I do not quite understand the meaning of by.x=2 and by.y=1. Do they refer to the number of columns to merge between Sample 1 and Sample 2? Which columns are those? Many thanks!