Question about the by argument in the merge function

Hi Isaac,

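The by argument in the merge function specifies the column used to match rows between the two data.frames when merging. It can be given either as a column position or as a column name; the two calls below give the same result.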

> expressDF <-  data.frame(Genes=c("PTBP1","PTBP2","PTBP3"),
+                          expression=c(10,100,200))
> 
> lenDF <-  data.frame(Genes=c("PTBP1","PTBP2","PTBP3"),
+                          length=c(10000,1020,200000))
> 
> 
> merge(expressDF,lenDF,by=1)
  Genes expression length
1 PTBP1         10  10000
2 PTBP2        100   1020
3 PTBP3        200 200000
> 
> merge(expressDF,lenDF,by="Genes")
  Genes expression length
1 PTBP1         10  10000
2 PTBP2        100   1020
3 PTBP3        200 200000
>
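
To make the matching a bit more concrete, here is a minimal sketch. The data.frame lenDF2, its shuffled row order, and the extra gene PTBP4 are made up for illustration. It shows that merge pairs rows by the values in the by column rather than by row position and, with its default settings, keeps only the genes found in both data.frames.

```r
expressDF <- data.frame(Genes=c("PTBP1","PTBP2","PTBP3"),
                        expression=c(10,100,200))

# Hypothetical lengths: rows in a different order, PTBP3 absent, PTBP4 extra
lenDF2 <- data.frame(Genes=c("PTBP4","PTBP2","PTBP1"),
                     length=c(500,1020,10000))

# Rows are paired on the Genes values; genes missing from either side are dropped
merged <- merge(expressDF, lenDF2, by="Genes")
merged
```

Only PTBP1 and PTBP2 appear in the result, each with its own expression and length, because those are the only genes present in both inputs.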

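When the matching column has a different position or name in each data.frame, we use by.x and by.y instead of by: by.x gives the matching column in the first data.frame and by.y gives the matching column in the second. The merged result keeps the column name from the first data.frame (Genes in this example).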

> expressDF <-  data.frame(Genes=c("PTBP1","PTBP2","PTBP3"),
+                          expression=c(10,100,200))
> 
> lenDF <-  data.frame(IDS=c("ID121","ID122","ID123"),
+                      Symbols=c("PTBP1","PTBP2","PTBP3"),
+                          length=c(10000,1020,200000))
> 
> 
> merge(expressDF,lenDF,by.x=1,by.y=2)
  Genes expression   IDS length
1 PTBP1         10 ID121  10000
2 PTBP2        100 ID122   1020
3 PTBP3        200 ID123 200000
> 
> merge(expressDF,lenDF,by.x="Genes",by.y="Symbols")
  Genes expression   IDS length
1 PTBP1         10 ID121  10000
2 PTBP2        100 ID122   1020
3 PTBP3        200 ID123 200000
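
One way to see what by.x and by.y are doing is to compare them with renaming the column first. The sketch below is only an illustration (lenRenamed, byXY, and byName are names invented here): it renames Symbols to Genes in a copy of lenDF and merges with a plain by, which should give the same table as the by.x/by.y call.

```r
expressDF <- data.frame(Genes=c("PTBP1","PTBP2","PTBP3"),
                        expression=c(10,100,200))
lenDF <- data.frame(IDS=c("ID121","ID122","ID123"),
                    Symbols=c("PTBP1","PTBP2","PTBP3"),
                    length=c(10000,1020,200000))

# Match on differently named columns directly
byXY <- merge(expressDF, lenDF, by.x="Genes", by.y="Symbols")

# Alternative: rename the column in a copy so both share a name, then use by
lenRenamed <- lenDF
names(lenRenamed)[names(lenRenamed) == "Symbols"] <- "Genes"
byName <- merge(expressDF, lenRenamed, by="Genes")

# Compare the two results
all.equal(byXY, byName)
```

Using by.x and by.y saves the renaming step and leaves the original data.frames untouched.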

RockefellerUniversity / Intro_To_R_1Day
