SciLifeLab / NGI-RNAseq

Nextflow RNA-Seq Best Practice analysis pipeline, used at the SciLifeLab National Genomics Infrastructure.
https://ngisweden.scilifelab.se/
MIT License

edgeR_heatmap_MDS.r: duplicated column name(s) #235

Closed: omerfaruk84 closed this issue 6 years ago

omerfaruk84 commented 6 years ago

When I tried to run the pipeline with 36 samples, the edgeR_heatmap_MDS.r script failed while merging multiple data frames into a single one. The error said something like "x has some duplicated column name(s): x. Please remove or rename the..." This is the merge step:

merge.all <- function(x, y) {
  merge(x, y, all=TRUE, by="Geneid")
}
data <- data.frame(Reduce(merge.all, temp))
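To see which columns collide, one option is to list, before calling Reduce(), the non-key column names that occur in more than one data frame in temp; these are the columns merge() has to disambiguate. The helper below is a minimal base-R sketch, not part of the pipeline: find_shared_columns and the toy data are hypothetical, and it assumes temp is a list of data frames that each have a Geneid column, as in the snippet above.

```r
# Hypothetical diagnostic helper (not part of NGI-RNAseq): report the
# non-key column names that occur in more than one data frame of a list.
find_shared_columns <- function(dfs, key = "Geneid") {
  # Non-key column names of every data frame, concatenated in order
  nms <- unlist(lapply(dfs, function(df) setdiff(names(df), key)))
  # Names seen more than once, each reported a single time
  unique(nms[duplicated(nms)])
}

# Toy input: the first and third tables both name their count column "sampleA"
temp <- list(
  data.frame(Geneid = c("g1", "g2"), sampleA = c(5, 0)),
  data.frame(Geneid = c("g1", "g2"), sampleB = c(3, 1)),
  data.frame(Geneid = c("g1", "g2"), sampleA = c(7, 2))
)
find_shared_columns(temp)
# [1] "sampleA"
```

An empty result (character(0)) means every non-key column name is unique across the list.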

I worked around this by changing how the data frames are merged, using the plyr package. I added the following lines:

if (!require("plyr")) {
  install.packages("plyr", dependencies=TRUE, repos='http://cloud.r-project.org/')
  library("plyr")
}

and merged the data frames with:

# With no `by` argument, join_all() joins on every column the data frames share
data <- data.frame(join_all(temp))

I also modified the following part:

# Convert data frame to edgeR DGE object
dataDGE <- DGEList( counts=data.matrix(data[, 6:ncol(data)]) )
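A plyr-free alternative is to strip each table down to Geneid plus its count column before merging, so merge() never sees shared non-key columns. This is a sketch under stated assumptions, not the pipeline's code and not the fix that was merged upstream: merge_counts and the toy data are hypothetical, and it assumes each table holds its counts in the last column and that the sample names passed in are unique.

```r
# Hypothetical base-R alternative to join_all() (not the pipeline's code).
# Assumes each data frame has a Geneid column and holds its counts in the
# last column; sample_names must be unique.
merge_counts <- function(dfs, sample_names, key = "Geneid") {
  stopifnot(length(dfs) == length(sample_names), !anyDuplicated(sample_names))
  # Keep only the key and the count column, renamed after its sample
  slim <- Map(function(df, s) {
    out <- df[, c(key, names(df)[ncol(df)])]
    names(out) <- c(key, s)
    out
  }, dfs, sample_names)
  # With no other shared columns, merge() only ever matches on the key
  Reduce(function(x, y) merge(x, y, all = TRUE, by = key), slim)
}

# Toy tables that repeat the same annotation columns (Chr, Length)
temp <- list(
  data.frame(Geneid = c("g1", "g2"), Chr = "1", Length = c(100, 200), s1 = c(5, 0)),
  data.frame(Geneid = c("g1", "g2"), Chr = "1", Length = c(100, 200), s2 = c(3, 1)),
  data.frame(Geneid = c("g1", "g2"), Chr = "1", Length = c(100, 200), s3 = c(7, 2))
)
counts <- merge_counts(temp, c("A", "B", "C"))

# Numeric count matrix with gene IDs as row names
m <- data.matrix(counts[, -1])
rownames(m) <- as.character(counts$Geneid)
```

Because the counts end up in a matrix with gene IDs as row names, edgeR's DGEList(counts = m) should be able to take it directly, without picking the count columns out by position.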

I hope this will be helpful to someone.

ewels commented 6 years ago

Great stuff @omerfaruk84 - thanks for this! I take it you had some samples with identical file names / sample names somehow? This is definitely something we should look into fixing in the pipeline.

FYI, development of this pipeline has now moved to https://github.com/nf-core/rnaseq; this repository will soon be archived.

Phil

ewels commented 6 years ago

This was fixed in https://github.com/nf-core/rnaseq/pull/34 and is now available in the latest nf-core/rnaseq release 🎉