ebecht / infinityFlow


Error in quantile.default(data, q) #32

Open Sandyna opened 3 months ago

Sandyna commented 3 months ago

Hi, I've been stuck on this for a while. Some of my datasets give this error. With stricter downsampling the error didn't show, but ever since we changed to input_events_downsampling <- 1e6 and prediction_events_downsampling <- 1e6, some files are problematic. After I trim off margins, different datasets give this error. The problem appears both when I trim off too much and when I trim off too little. We use a compensation matrix too, in case it's relevant.

This is where the program crashes:

Parsing and subsampling input data
    Downsampling to 1e+06 events per input file
uneven number of tokens: 665
The last keyword is dropped.
uneven number of tokens: 665
The last keyword is dropped.
    Concatenating expression matrices
    Writing to disk
Logicle-transforming the data
    Backbone data
Error in quantile.default(data, q) : 
  missing values and NaN's not allowed if 'na.rm' is FALSE
Calls: <Anonymous> ... setNames -> lapply -> FUN -> quantile -> quantile.default
Execution halted

Thank you!

ebecht commented 3 months ago

Hello, it seems like there are NAs or NaNs in your data, which I don't think should be happening. How did you handle trimming in your preprocessing?

Sandyna commented 3 months ago

How do I find where they are? Do you happen to know which part of the data the function works with?

Trimming was done via https://github.com/saeyslab/PeacoQC/blob/master/vignettes/PeacoQC_Vignette.Rmd by specifying which channels, what values they should have and running the RemoveMargins function.

Sandyna commented 3 months ago

> I don't know what is the file that was linked, I'd probably be careful with it if I were you

Yeah, that was really suspicious, thank you.

ebecht commented 3 months ago

It seems that PeacoQC should not be adding NA / NaN values, so that probably isn't what is happening. Since this doesn't happen with stricter downsampling, I guess it's caused by a few events with weird values.

I can't really tell you much more without having access to the files / script.

What you could do is navigate to the rds subfolder of the path_to_intermediary_results folder (one of the arguments of the main function), start an R session and load the file xp.Rds using xp = readRDS("xp.Rds"). Then run apply(xp, 2, quantile, 0.95), which should give the same error. From there, try to identify which column is causing the issue and why. I think what is happening is that one column has NA / NaN values, so maybe try to locate them with which(is.na(xp), arr.ind = TRUE), which returns the row and column index of every missing value.
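The check described above could look roughly like the sketch below. It uses a small made-up matrix in place of the real xp.Rds so that it runs on its own; the channel names and the injected NaN / NA values are assumptions for illustration only, and with real data the first step would be xp <- readRDS("xp.Rds").

```r
# Stand-in for xp <- readRDS("xp.Rds"); channel names here are hypothetical.
set.seed(1)
xp <- matrix(
    rnorm(20), nrow = 5,
    dimnames = list(NULL, c("FSC-A", "SSC-A", "CD3", "CD4"))
)
xp[2, "CD3"] <- NaN  # injected for illustration
xp[4, "CD4"] <- NA   # injected for illustration

# Count missing values per column; is.na() is TRUE for both NA and NaN.
na_per_column <- colSums(is.na(xp))
print(na_per_column[na_per_column > 0])

# Row/column position of every missing value.
na_positions <- which(is.na(xp), arr.ind = TRUE)
print(na_positions)

# Events (rows) that contain at least one missing value.
bad_rows <- unique(na_positions[, "row"])

# With na.rm = TRUE the quantile call no longer fails, which helps
# confirm that missing values are the cause of the original error.
q95 <- apply(xp, 2, quantile, probs = 0.95, na.rm = TRUE)
```

If na_per_column is nonzero for many columns but bad_rows is short, the problem is probably a handful of events rather than a whole channel.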

Sandyna commented 2 months ago

> I think what is happening is that one column has NA / NaN values

Thank you so much for your help. I think it's a lot more than one column. Seems like every column has at least a handful. I'm pretty new to R; do you happen to know where in my flow frames these values are located and how to access them?

Sandyna commented 2 months ago

Looks like one of my input files is really weird.

In case anyone else needs it, to look for NaNs in a flowset, I used this:

# Print the index and name of each flowFrame containing NaNs, then the NaNs' locations.

library(flowCore)

find_nan_in_fs <- function(fs) {
    # seq_len() is safe for an empty flowSet, unlike 1:length(fs)
    for (i in seq_len(length(fs))) {
        expr <- exprs(fs[[i]])
        # is.na() is TRUE for both NA and NaN
        if (anyNA(expr)) {
            print(i)
            print(keyword(fs[[i]])$GUID.original)
            print(which(is.na(expr), arr.ind = TRUE))
        }
    }
}