giannimonaco / flowAI


Biorad S3 Cell Sorter - dimnames error #9

Closed rwbaer closed 2 years ago

rwbaer commented 2 years ago

I tried running the timestamp-updated code you posted on GitHub (https://github.com/giannimonaco/flowAI/commit/5ed7dc6cd1cf6536d7b97c78434a12e146fbd232) on a small flowSet, and I got an error and a warning I don't understand. The first file in the set is the one I previously supplied. The second file caused an error.

Error in names(dimnames(sub_exprs)[[2]]) <- sprintf("$P%sN", 1:NN) : 'names' attribute [22] must be the same length as the vector [9]

Is this related to my reading in only a subset of the columns using the column.pattern flag, or is it something else? I've shared a few zipped Bio-Rad files so that you can reproduce the code sequence I used. If you need more examples, I'm glad to share them.

In addition, I get a warning about the expression set not being time ordered that seems concerning to me, but I guess it is just a warning. Does it suggest a bigger problem?

In addition: Warning message: In ord_fcs_time(set[[i]], timeCh) : Expression data in the file M0_CD68-CD36-CCR7-CD16_S1.fcs were not originally ordered by time.

The console output follows:

> # Autogate example
> library(flowCore)
> library(flowWorkspace)
> library(openCyto)
> library(ggcyto)
> # Clean the data
> devtools::install_github("giannimonaco/flowAI")
Skipping install of 'flowAI' from a github remote, the SHA1 (5ed7dc6c) has not changed since last install.
  Use `force = TRUE` to force installation
> library(flowAI)
> 
> #Load data keeping only channels of interest (see column.pattern)
> myfiles <- list.files(path="H:/Flow Cytometry Analysis/JenTest/Samples/", pattern = ".fcs", ignore.case = TRUE)
> fs <- read.flowSet(myfiles, path="H:/Flow Cytometry Analysis/JenTest/Samples/", 
+                    column.pattern = "(TIME_LSW)|(FSC-HEIGHT)|(-AREA)")
> # More standard name for time channel
> colnames(fs)[1] = "TIME"
> 
> # use built in spillover correction
> spillmatrix = spillover(fs[[1]])$"$SPILLOVER"
> fs_comp <-compensate(fs, spillmatrix)
> 
> # Clean the data
> fs_comp_clean <- flow_auto_qc(fs_comp, 
+                    timeCh ="TIME", 
+                    timestep = 0.0000001,
+                    second_fractionFR=0.1 ,
+                    ChExcludeFS = c("FSC","SSC"))
Quality control for the file: M0_CD68-CD36-CCR7-CD16_S1
1.56% of anomalous cells detected in the flow rate check. 
0% of anomalous cells detected in signal acquisition check. 
0% of anomalous cells detected in the dynamic range check. 
Quality control for the file: M0_CD68-CD36-CCR7-CD16_S2
0.84% of anomalous cells detected in the flow rate check. 
0% of anomalous cells detected in signal acquisition check. 
0.01% of anomalous cells detected in the dynamic range check. 
Error in names(dimnames(sub_exprs)[[2]]) <- sprintf("$P%sN", 1:NN) : 
  'names' attribute [22] must be the same length as the vector [9]
In addition: Warning message:
In ord_fcs_time(set[[i]], timeCh) :
  Expression data in the file M0_CD68-CD36-CCR7-CD16_S1.fcs were not originally ordered by time.

BioradS3eFiles.zip

giannimonaco commented 2 years ago

I just fixed the issue. The error occurred because you were subsetting the columns of the FCS files when loading them as a flowSet. flowAI could not handle this properly when it tried to add a new QC parameter. It should now handle this case too.
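For readers who hit the same message, the length mismatch can be reproduced in base R without flowAI. This is only a sketch of the class of error, not flowAI's actual code or its fix: the matrix, the channel names, and the idea that the count 22 comes from the full FCS header are assumptions, with 9 and 22 taken from the error message above.

```r
# Stand-in for the expression matrix of a file loaded with column.pattern:
# only 9 channels were kept (hypothetical channel names).
sub_exprs <- matrix(0, nrow = 3, ncol = 9,
                    dimnames = list(NULL, paste0("Ch", 1:9)))

# Assumption: the parameter count comes from the full FCS header (22),
# not from the columns that were actually loaded.
NN <- 22

# Naming 9 columns with 22 labels fails; capture the error message.
msg <- tryCatch({
  names(dimnames(sub_exprs)[[2]]) <- sprintf("$P%sN", 1:NN)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Deriving the count from the loaded columns keeps the lengths consistent.
NN <- ncol(sub_exprs)
names(dimnames(sub_exprs)[[2]]) <- sprintf("$P%sN", seq_len(NN))
print(colnames(sub_exprs))
```

The point of the sketch is only that any label vector built from the original parameter count will stop matching once columns are dropped at load time.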

The warning message "Expression data in the file ... were not originally ordered by time" is generally not a problem. Usually when you get this warning, only some rows of the FCS file are out of time order. As long as the instrument recorded the time of each event correctly, it does not matter that the rows are not ordered by time. However, if you always get this warning, it might be worth exploring why the instrument does not order the rows by time, to be sure that this is normal.
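To see how many rows are affected in a given file, something like the following base R check may help. It is a rough sketch on a toy matrix, not what ord_fcs_time does internally; the helper name check_time_order and the demo values are made up for illustration. With real data the matrix would come from flowCore's exprs() applied to a flowFrame.

```r
# Hypothetical helper: summarise how far a time column is from being sorted.
check_time_order <- function(mat, time_col = "TIME") {
  t <- mat[, time_col]
  steps <- diff(t)
  list(
    unsorted   = is.unsorted(t),   # TRUE if any event goes back in time
    n_backward = sum(steps < 0)    # number of backward steps
  )
}

# Toy data: one event (time 3) was recorded after a later one (time 4).
demo <- cbind(TIME = c(1, 2, 4, 3, 5, 6),
              FSC  = c(10, 11, 12, 13, 14, 15))

res <- check_time_order(demo)
print(res)

# Reordering the rows by time restores a monotonic time channel.
demo_sorted <- demo[order(demo[, "TIME"]), , drop = FALSE]
print(is.unsorted(demo_sorted[, "TIME"]))
```

A small n_backward relative to the number of events would match the "only some rows" situation described above.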

rwbaer commented 2 years ago

@giannimonaco Thumbs up and thanks! This fix does the trick for both flowsets and cytosets. As a side effect I'm now able to tell that only 3 of my 8 files have times that are out of temporal order. Way less worrisome.

A couple of quick follow-up questions if I might:

cs_comp_clean <- flow_auto_qc(cs_comp, 
                              timeCh ="TIME", 
                              timestep = 0.0000001,
                              second_fractionFR=0.1 ,
                              ChExcludeFS = c("FSC","SSC"))

I'm wondering about the use of the assignment operator here. This code chunk produces a cytoset that has a shorter exprs() than the parent set. Similarly for a flowSet. This seems to verify that we have the "cleaned data". Is this a deep copy or a reference set with anomalies removed? I'm imagining a reference set with anomalies removed.

The second question I have is related to the best time in the workflow to run flowAI. I noticed that the anomalies were quite a bit higher on unstained samples. This got me thinking that I was running flowAI before gating out debris. Is this the most sensible time to run it, or should it be run on the nonDebris gate? I am assuming the percent of anomalies is higher on unstained samples because the range of staining intensities is lower. Perhaps I am misunderstanding this. Any comments/suggestions?

giannimonaco commented 2 years ago

Great to know that you have no errors now and that not many of your files are out of temporal order!

Regarding your other questions:

1 - The output is not a reference set. When the package was developed, the FCS files were loaded and saved with functions from the flowCore package, which make deep copies of the files.

2 and 3 - By default, the R object produced is not the same as the FCS files stored in the resultsQC folder. You can see this described in the "output" argument of the flow_auto_qc function. The default output ('output = 1') produces an R object that contains only high quality events, so that the object can be used directly with other data analysis functions (e.g. FlowSOM); and it produces FCS files that still contain the bad quality events, together with an extra parameter on which you can gate out the bad quality events. The FCS files are saved in this way so that it is possible to visually evaluate the results in software like FlowJo. In any case you can change the output by changing the arguments: output (as I said already), fcs_highQ and fcs_lowQ.
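As an illustration of gating on that extra parameter in R rather than in FlowJo, here is a minimal base R sketch. It rests on assumptions: the column name remove_from_all is the one reported in this thread, and the convention that flagged events carry a value of 10000 or more is my recollection and should be checked against the flowAI documentation. The toy matrix stands in for exprs() of an FCS file read back from resultsQC.

```r
# Toy matrix standing in for the events of an FCS file saved in resultsQC.
# Assumption: bad quality events have a value >= 10000 in the QC column.
events <- cbind(
  FSC             = c(100, 120, 90, 300, 110),
  remove_from_all = c(50, 12000, 80, 15000, 30)
)

# Hypothetical helper: keep only rows whose QC value is below the threshold.
keep_good_events <- function(mat, qc_col = "remove_from_all",
                             threshold = 10000) {
  mat[mat[, qc_col] < threshold, , drop = FALSE]
}

clean <- keep_good_events(events)
print(nrow(events))  # events before gating
print(nrow(clean))   # events that pass the QC gate
```

With the default output, the R object returned by flow_auto_qc should already correspond to the gated version, so a step like this would only be needed when starting from the saved FCS files.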

On Mon, 31 Jan 2022 at 23:09, Rob Baer @.***> wrote:

  • Is it a reference set?
  • Is the assignment produced by this code the same flowframe/cytoframe that is saved/written to the resultsQC folder?
  • Is the ".fcs" file saved in resultsQC a cleaned version of the parent flowframe/cytoframe? These files seem to have an additional column named remove_from_all. What is this column and is it useful for anything?


rwbaer commented 2 years ago

Thank you for all your help in getting me going. I'll mark the issue closed.