crj32 / M3C

Monte Carlo Reference-based Consensus Clustering
https://bioconductor.org/packages/release/bioc/html/M3C.html

Duplicate row.names error message #6

Open jackgisby opened 4 years ago

jackgisby commented 4 years ago

Originally posted as a Stack Overflow question.

While attempting to run consensus clustering with M3C, I get the error below. This is my console output (the actual row names have been changed for the example code):

running consensus cluster algorithm for real data...
done.
Error in `.rowNamesDF<-`(x, value = value) : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘ABCDEF’, ‘ABCDGH’ 
> traceback()
6: stop("duplicate 'row.names' are not allowed")
5: `.rowNamesDF<-`(x, value = value)
4: `row.names<-.data.frame`(`*tmp*`, value = newerdes$ID)
3: `row.names<-`(`*tmp*`, value = newerdes$ID)
2: M3Creal(as.matrix(mydata), maxK = maxK, reps = repsreal, pItem = pItem, 
       pFeature = 1, clusterAlg = clusteralg, distance = distance, 
       title = "/home/christopher/Desktop/", des = des, lthick = lthick, 
       dotsize = dotsize, x1 = pacx1, x2 = pacx2, seed = seed, removeplots = removeplots, 
       silent = silent, fsize = fsize, method = method, objective = objective)
1: M3C::M3C(dissADJ, iters = 25, repsref = 1, repsreal = 100, clusteralg = "hc", 
       objective = "PAC", cores = 3)

I ran the equivalent of the following using M3C:

df_wide_matrix  # my expression matrix
any(duplicated(colnames(df_wide_matrix)))  # result = FALSE

M3C::M3C(df_wide_matrix, iters=2, repsref=2, repsreal=2, clusteralg="hc", objective="PAC")

I assumed the issue was caused by the fact that the first four characters of these two feature names are identical ("ABCD"). I therefore temporarily changed their names before running M3C:

# Rename the two columns flagged in the warning (same matrix that is passed to M3C)
dup_ids <- which(colnames(df_wide_matrix) %in% c("ABCDEF", "ABCDGH"))
colnames(df_wide_matrix)[dup_ids] <- c("A", "B")

M3C::M3C(df_wide_matrix, iters=2, repsref=2, repsreal=2, clusteralg="hc", objective="PAC")

M3C then runs correctly. This works as a workaround, but I was wondering whether I have missed something or whether this is a bug.
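The thread does not establish why the two names collide, so the following is only a diagnostic sketch. It assumes the collision might come from name sanitisation and checks whether any identifiers become duplicates after base R's `make.names()`. The helper `find_name_collisions()` and the toy matrix are hypothetical and not part of M3C.

```r
# Hypothetical helper (not part of M3C): report identifiers that are
# duplicated as given, or that collide after make.names() sanitisation.
find_name_collisions <- function(ids) {
  ids <- as.character(ids)
  sanitised <- make.names(ids)
  collides <- duplicated(sanitised) | duplicated(sanitised, fromLast = TRUE)
  list(
    duplicated_raw = unique(ids[duplicated(ids)]),
    duplicated_sanitised = unique(ids[collides])
  )
}

# Toy matrix standing in for the real expression matrix (assumed data)
toy <- matrix(rnorm(12), nrow = 3,
              dimnames = list(NULL, c("ABCDEF", "ABCDGH", "1-A", "1.A")))

collisions <- find_name_collisions(colnames(toy))
print(collisions)
```

In this toy case only "1-A" and "1.A" collide (both sanitise to "X1.A"), while "ABCDEF" and "ABCDGH" do not, so a shared prefix alone would not trigger this particular mechanism.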

hamidghaedi commented 3 years ago

Working on TCGA data, I am getting the same error:

Error in `.rowNamesDF<-`(x, value = value) : 
  duplicate 'row.names' are not allowed

After trying what @Jack mentioned (trimming the sample IDs so that each begins with a unique character string), the error changed to:

Error in `[.data.frame`(df, neworder2) : undefined columns selected
> traceback()
5: stop("undefined columns selected")
4: `[.data.frame`(df, neworder2)
3: df[neworder2]
2: M3Creal(as.matrix(mydata), maxK = maxK, reps = repsreal, pItem = pItem, 
       pFeature = 1, clusterAlg = clusteralg, distance = distance, 
       title = "/home/christopher/Desktop/", des = des, lthick = lthick, 
       dotsize = dotsize, x1 = pacx1, x2 = pacx2, seed = seed, removeplots = removeplots, 
       silent = silent, fsize = fsize, method = method, objective = objective)
1: M3C(pro.vst, des = clin, removeplots = FALSE, iters = 25, objective = "PAC", 
       fsize = 8, lthick = 1, dotsize = 1.25)

CamGriffiths commented 3 years ago

I got the same error as @hamidghaedi while running M3C. I managed to track it down to the following line of code (line 476 in the M3C.R file):

df <- data.frame(m_matrix)

Many of my sample names (column names) started with a number, and the data.frame() function prepended an "X" to each of those names ("1" becomes "X1"). This caused a mismatch with the names listed in neworder2.
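The renaming described above can be reproduced in isolation with base R. This is a small illustration on made-up data, not M3C's own code: `data.frame()` defaults to `check.names = TRUE`, which passes column names through `make.names()`. The hyphenated name is an assumed example of a TCGA-style ID.

```r
# Toy matrix with one numeric-looking and one hyphenated sample name
m <- matrix(1:4, nrow = 2,
            dimnames = list(c("g1", "g2"), c("1", "TCGA-01")))

# Default behaviour: names are sanitised into syntactically valid R names
default_names <- colnames(data.frame(m))

# With check.names = FALSE the original names are kept
kept_names <- colnames(data.frame(m, check.names = FALSE))

print(default_names)  # "X1" "TCGA.01"
print(kept_names)     # "1"  "TCGA-01"
```

Any later lookup that still uses the original names (such as `df[neworder2]` in the traceback) would then fail with "undefined columns selected".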

To get around this problem, I changed all of my sample names to start with a letter and M3C is now running correctly.

Edit: This workaround can be easily applied by using the data.frame() function on your input dataset before running M3C.
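A minimal sketch of that workaround follows, under some assumptions. It uses `make.names()` directly (the same sanitisation `data.frame()` applies by default) so the input can stay a matrix, and it applies the same mapping to the `ID` column of an annotation data frame so the two stay in sync. The helper `sanitise_samples()` and the toy objects are hypothetical, and the final `M3C::M3C()` call is left as a comment.

```r
# Hypothetical helper (not part of M3C): sanitise sample names once, up front,
# and apply the same old-to-new mapping to the annotation IDs.
sanitise_samples <- function(mat, des = NULL) {
  old_names <- colnames(mat)
  new_names <- make.names(old_names, unique = TRUE)
  if (!is.null(des)) {
    # Assumes des has an ID column holding the sample names
    des$ID <- new_names[match(as.character(des$ID), old_names)]
  }
  colnames(mat) <- new_names
  list(data = mat, des = des)
}

# Toy inputs standing in for the expression matrix and annotation (assumed data)
toy <- matrix(rnorm(6), nrow = 2,
              dimnames = list(NULL, c("1", "2", "TCGA-03")))
toy_des <- data.frame(ID = c("TCGA-03", "1", "2"), group = c("a", "b", "a"))

clean <- sanitise_samples(toy, toy_des)
print(colnames(clean$data))
print(clean$des$ID)

# clean$data and clean$des could then be passed on, e.g.
# M3C::M3C(clean$data, des = clean$des)   # not run here
```

Renaming before the call means every name M3C sees is already a valid R name, so the later conversion to a data frame should leave the names unchanged.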

hamidghaedi commented 3 years ago

Cool. I will try this solution. If you don't mind, please also post your solution on the Stack Overflow entry: https://stackoverflow.com/questions/65010759/clustering-by-m3c-package-error-in-data-framedf-neworder2-undefined-c