Vitek-Lab / MSstats

R package - MSstats

MaxQtoMSstatsFormat dumps Charge ==1 ; and fails when Intensity is type Integer64 #40

Closed bpolacco closed 3 years ago

bpolacco commented 4 years ago

Thanks for the great tools we continue to use routinely here. I came across a few issues today while trying to use the MaxQuant utility function. I'd be happy to help implement fixes, but as I say below, my way of doing it may not be how you'd like the code to go.

These may be two separate issues, but since both relate to the use of dcast/melt to include explicit missing values, their fixes may be related as well.

1.) Any row in the evidence file with Charge == 1 will have Charge set to NA (effectively removing the row) by line 391 in MaxQtoMSstatsFormat.R, which is the last line in the excerpt below:

    data_w <- dcast( Proteins + Modified.sequence + Charge + IsotopeLabelType ~ Raw.file, data=d_long, 
                     value.var='Intensity', 
                     fun.aggregate=aggregateFun, na.rm=T, 
                     keep=TRUE) 
    ## keep=TRUE : will keep the data.frame value as 1 even though there is no values for certain feature and certain run.

    ## when there is completely missing in certain feature and certain run, '1' will be filled. Therefore put NA instead of 1.
    data_w[data_w == 1] <- NA

The intention is to replace only Intensity == 1 with NA, but the assignment is applied to every column of data_w, so Charge == 1 is also replaced with NA.
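The effect is easy to reproduce on a toy data frame. The sketch below is not the MSstats code: the column names `Sequence`, `run1`, and `run2` and the vector `run_cols` are made up for illustration, and it assumes the run (intensity) columns can be identified by name. It contrasts the global replacement with one restricted to the intensity columns.

```r
# Toy wide table: one row per feature, one column per run.
# A value of 1 in a run column stands in for the "missing" filler.
df <- data.frame(
  Sequence = c("_AAK_", "_CCR_"),
  Charge   = c(1L, 2L),
  run1     = c(1, 500),
  run2     = c(250, 1),
  stringsAsFactors = FALSE
)

# Global replacement, as in the excerpt: every cell equal to 1 becomes NA,
# including Charge in the first row.
bad <- df
bad[bad == 1] <- NA

# Replacement restricted to the run columns: Charge is left untouched.
run_cols <- c("run1", "run2")  # hypothetical: names of the intensity columns
good <- df
good[run_cols][good[run_cols] == 1] <- NA

print(bad)
print(good)
```

In the real function the run columns would presumably be the unique values of Raw.file, but that is an assumption about code not shown in the excerpt.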

To test this, look for PrecursorCharge == 1 in the output of MaxQtoMSstatsFormat:

    nrow(ev.df[ev.df$Charge == 1, ])                    # rows with Charge == 1 in the input
    mssDat <- MSstats::MaxQtoMSstatsFormat(ev.df, keys, protGroups)
    nrow(mssDat[which(mssDat$PrecursorCharge == 1), ])  # rows with charge 1 in the output
    mssDat[is.na(mssDat$PrecursorCharge), ]             # rows whose charge was set to NA

2.) If the Intensity column of the evidence table is of type integer64 (from bit64), which can happen when the file is read with fread instead of read.table, the Intensity values are replaced with garbage such as "4.940656e-324" during the call to reshape2::melt. To reproduce the problem, read the attached evidence file with fread, or read it with read.table and convert Intensity to integer64:

    ev.df <- read.table("ev.short.txt", header = TRUE, sep = "\t")
    ev.df$Intensity <- bit64::as.integer64(ev.df$Intensity)
    # or
    ev.df <- data.table::setDF(data.table::fread("ev.short.txt", check.names = FALSE))

    mssDat <- MSstats::MaxQtoMSstatsFormat(ev.df, keys, protGroups)
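For context, 4.940656e-324 is what the integer64 value 1 looks like when its underlying bytes are read as a double, which is consistent with the integer64 class being dropped somewhere along the way (where exactly that happens inside melt is an assumption here). The sketch below shows this with bit64, plus a possible caller-side workaround: converting integer64 columns to numeric before calling MaxQtoMSstatsFormat. The helper name `coerce_integer64` and the toy data frame are hypothetical, not part of MSstats.

```r
library(bit64)  # assumed to be installed

x <- as.integer64(c(1, 12345678901))

# Dropping the class exposes the raw double storage: the first element
# is the smallest positive double, not 1.
unclass(x)[1]

# A proper conversion keeps the values.
as.numeric(x)

# Hypothetical helper: convert every integer64 column of a data frame
# to numeric, leaving all other columns as they are.
coerce_integer64 <- function(df) {
  is_i64 <- vapply(df, inherits, logical(1), what = "integer64")
  df[is_i64] <- lapply(df[is_i64], as.numeric)
  df
}

toy <- data.frame(Charge = c(1L, 2L))
toy$Intensity <- x
fixed <- coerce_integer64(toy)
class(fixed$Intensity)
```

Doubles represent integers exactly up to 2^53, so this conversion is lossless for values below that. When reading with fread, passing `integer64 = "double"` should avoid creating integer64 columns in the first place.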

I've implemented a fix for both problems that avoids dcast/melt and works for my own use. It makes liberal use of data.table functionality. I see that MSstats already requires data.table, but only for a single function, rbindlist. If you're happy to take on more data.table dependencies, let me know and I'll prepare a pull request. Otherwise I'll let you code up fixes as you see fit. Thanks!!!
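The fix itself is not shown in this thread. As a rough illustration only, one way to get explicit missing values without dcast/melt is to build the full feature-by-run grid and left-join the aggregated intensities onto it, so absent combinations come out as NA directly and no placeholder value is needed. The toy data, the use of `max` as the aggregate, and the variable names below are all assumptions, not the author's implementation.

```r
library(data.table)  # assumed to be installed

# Toy long-format evidence data; feature P2 is missing in run2.
d_long <- data.table(
  Proteins          = c("P1", "P1", "P2"),
  Modified.sequence = c("_AAK_", "_AAK_", "_CCR_"),
  Charge            = c(1L, 1L, 2L),
  IsotopeLabelType  = "L",
  Raw.file          = c("run1", "run2", "run1"),
  Intensity         = c(100, 200, 300)
)

feature_cols <- c("Proteins", "Modified.sequence", "Charge", "IsotopeLabelType")

# Aggregate duplicates per feature and run (max is an assumption here).
agg <- d_long[, .(Intensity = max(Intensity, na.rm = TRUE)),
              by = c(feature_cols, "Raw.file")]

# Every feature crossed with every run.
features <- unique(agg[, feature_cols, with = FALSE])
runs     <- unique(agg$Raw.file)
grid     <- features[, .(Raw.file = runs), by = feature_cols]

# Left join: combinations absent from the data get Intensity = NA,
# and Charge == 1 is never touched.
complete <- merge(grid, agg, by = c(feature_cols, "Raw.file"), all.x = TRUE)
print(complete)
```

Because missing combinations are NA from the start, there is no step that needs to turn a filler value back into NA.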

mstaniak commented 3 years ago

Please let us know if the problem persists after an update. Closing the issue for now.