RGLab / flowCore

Core flow cytometry infrastructure

How do I know that my files are correctly compensated and transformed? #211

Closed: algebio closed this issue 2 years ago

algebio commented 3 years ago

Hi, I hope you all are well.

Describe the bug: not really a bug, just a call for guidance.

I don't have much experience analysing flow data. A medical doctor, who is not very experienced either, has sent me a dataset from a Symphony analyser. I have compensated and transformed the files, but I get odd-looking plots like the one attached below, and I don't know how to be sure that my files are correctly pre-processed and ready to be used as input to an analysis pipeline. I have worked through the flowCore tutorial, but I haven't managed to produce a better plot with properly adjusted axes. I would appreciate any help in learning how to check the quality of my files.

Regards Juan

[Attached plot: CD4CD8.png, CD4 vs CD8]

To Reproduce: Sample dataset


list.files()
#>  [1] "20200120 T Compensation.csv"     "20200130 BIOCONT T_CD45+.fcs"   
#>  [3] "6 PKBDS 200130_Single Cells.fcs" "BF 202 BL T_CD45+.fcs"          
#>  [5] "BF202 ADHOC T FMO_CD45+.fcs"     "BF202 ADHOC T_CD45+.fcs"        
#>  [7] "BF202 BL T FMO_CD45+.fcs"        "BF202 w2 T FMO_CD45+.fcs"       
#>  [9] "BF202 w2 T_CD45+.fcs"            "BF202 w5 T FMO_CD45+.fcs"       
#> [11] "BF202 w5 T_CD45+.fcs"            "BF206 BL T FMO_CD45+.fcs"       
#> [13] "BF206 BL T_CD45+.fcs"            "BF206 w2 T FMO_CD45+.fcs"       
#> [15] "BF206 w2 T_CD45+.fcs"            "BF206 w5 T FMO_CD45+.fcs"       
#> [17] "BF206 w5 T_CD45+.fcs"            "BF206 w8 T FMO_CD45+.fcs"       
#> [19] "BF206 w8 T_CD45+.fcs"            "CD4CD8.png"                     
#> [21] "reprex.R"                        "Tcells_20200120_Comp_matrix.pdf"

# dir <- system.file("extdata", package="FlowSOMworkshop")
dir <- getwd()
files <- list.files(dir, pattern = ".fcs")
files
#>  [1] "20200130 BIOCONT T_CD45+.fcs"    "6 PKBDS 200130_Single Cells.fcs"
#>  [3] "BF 202 BL T_CD45+.fcs"           "BF202 ADHOC T FMO_CD45+.fcs"    
#>  [5] "BF202 ADHOC T_CD45+.fcs"         "BF202 BL T FMO_CD45+.fcs"       
#>  [7] "BF202 w2 T FMO_CD45+.fcs"        "BF202 w2 T_CD45+.fcs"           
#>  [9] "BF202 w5 T FMO_CD45+.fcs"        "BF202 w5 T_CD45+.fcs"           
#> [11] "BF206 BL T FMO_CD45+.fcs"        "BF206 BL T_CD45+.fcs"           
#> [13] "BF206 w2 T FMO_CD45+.fcs"        "BF206 w2 T_CD45+.fcs"           
#> [15] "BF206 w5 T FMO_CD45+.fcs"        "BF206 w5 T_CD45+.fcs"           
#> [17] "BF206 w8 T FMO_CD45+.fcs"        "BF206 w8 T_CD45+.fcs"

# Load an FCS file into a flowFrame

file.path(dir, files[6])
#> [1] "C:/Users/njo47/OneDrive_1_1-25-2021/T panel/20200130 T/BF202 BL T FMO_CD45+.fcs"
ff <- read.FCS(file.path(dir, files[6]),  truncate_max_range = F, transformation=FALSE, min.limit = NULL)
#> Error in read.FCS(file.path(dir, files[6]), truncate_max_range = F, transformation = FALSE, : could not find function "read.FCS"
ff
#> Error in eval(expr, envir, enclos): object 'ff' not found
# flowCore QC

summary(ff)
#> Error in summary(ff): object 'ff' not found

keyword(ff,c("$P1E", "$P2E", "$P3E", "$P4E"))
#> Error in keyword(ff, c("$P1E", "$P2E", "$P3E", "$P4E")): could not find function "keyword"

flowDensity::plotDens(ff, get_channels(ff, c("CD4", "CD8")))
#> Warning: replacing previous import 'flowCore::plot' by 'graphics::plot' when
#> loading 'flowDensity'
#> Error in flowDensity::plotDens(ff, get_channels(ff, c("CD4", "CD8"))): object 'ff' not found

summary(read.FCS(file.path(dir, files[6]),truncate_max_range = F,transformation="scale"))
#> Error in read.FCS(file.path(dir, files[6]), truncate_max_range = F, transformation = "scale"): could not find function "read.FCS"

ff <- read.FCS(file.path(dir, files[6]),truncate_max_range = F,transformation="scale")
#> Error in read.FCS(file.path(dir, files[6]), truncate_max_range = F, transformation = "scale"): could not find function "read.FCS"

flowDensity::plotDens(ff, get_channels(ff, c("CD4", "CD8")))
#> Error in flowDensity::plotDens(ff, get_channels(ff, c("CD4", "CD8"))): object 'ff' not found

# edit markers of interest
markers <- FlowSOM::get_markers(ff, colnames(ff))
#> Error in is.data.frame(x): object 'ff' not found
markers
#> Error in eval(expr, envir, enclos): object 'markers' not found
markers_of_interest <- markers[c(7:22)]
#> Error in eval(expr, envir, enclos): object 'markers' not found

flowDensity::plotDens(ff, get_channels(ff, c("CD4", "CD8")))
#> Error in flowDensity::plotDens(ff, get_channels(ff, c("CD4", "CD8"))): object 'ff' not found
flowDensity::plotDens(ff, get_channels(ff, c("Time", "FSC-H")))
#> Error in flowDensity::plotDens(ff, get_channels(ff, c("Time", "FSC-H"))): object 'ff' not found

# Read compensation matrix
comp <- read.csv("20200120 T Compensation.csv",
                 check.names = FALSE,
                 row.names = 1)

# compensate the file
ff_comp <- compensate(ff,comp)
#> Error in compensate(ff, comp): could not find function "compensate"

flowTransform <- estimateLogicle(ff_comp, names(markers_of_interest))
#> Error in estimateLogicle(ff_comp, names(markers_of_interest)): could not find function "estimateLogicle"
ff_transformed <- transform(ff_comp, flowTransform)
#> Error in transform(ff_comp, flowTransform): object 'ff_comp' not found

flowDensity::plotDens(ff_transformed, get_channels(ff_transformed, c("CD4", "CD8")))
#> Error in flowDensity::plotDens(ff_transformed, get_channels(ff_transformed, : object 'ff_transformed' not found

# Heatmap of the compensation matrix
pdf("Tcells_20200120_Comp_matrix.pdf", width = 11.7, height = 8.3)
pheatmap::pheatmap(comp[order(colnames(comp)),order(colnames(comp))],
                   cluster_rows = FALSE,
                   cluster_cols = FALSE,
                   display_numbers = TRUE,
                   annotation_names_row = TRUE)
dev.off()
#> pdf 
#>   3

Created on 2021-03-01 by the reprex package (v1.0.0)
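One way to sanity-check the compensation matrix before calling compensate() is to confirm that it is square, has ones on its diagonal, and uses the same channel names as the flowFrame (as far as I know, flowCore matches spillover columns to channels by name). The sketch below assumes the CSV is read with read.csv(..., check.names = FALSE, row.names = 1) as in the script above; check_spillover() is a hypothetical helper, and the two-channel matrix and channel vector are made up for illustration.

```r
# Hypothetical helper: basic structural checks on a spillover matrix read
# from CSV. Rows and columns should both name the same fluorescence channels.
check_spillover <- function(comp, channels) {
  comp <- as.matrix(comp)  # read.csv() returns a data.frame
  problems <- character(0)
  if (nrow(comp) != ncol(comp)) {
    problems <- c(problems, "matrix is not square")
  }
  if (!identical(rownames(comp), colnames(comp))) {
    problems <- c(problems, "row names and column names differ")
  }
  if (!isTRUE(all.equal(unname(diag(comp)), rep(1, min(dim(comp)))))) {
    problems <- c(problems, "diagonal is not all 1")
  }
  missing <- setdiff(colnames(comp), channels)
  if (length(missing) > 0) {
    problems <- c(problems, paste("not channels in the flowFrame:",
                                  paste(missing, collapse = ", ")))
  }
  problems
}

# Made-up two-channel example; in the script above, channels would be colnames(ff).
comp <- data.frame(`355 379_28-A` = c(1, 0.02),
                   `355 515_30-A` = c(0.15, 1),
                   row.names = c("355 379_28-A", "355 515_30-A"),
                   check.names = FALSE)
channels <- c("FSC-A", "SSC-A", "355 379_28-A", "355 515_30-A", "Time")
check_spillover(comp, channels)  # character(0) means no problems were found
```

These checks only catch structural mistakes (wrong file, mislabelled channels); whether the spillover values themselves are right still has to be judged from the single-stain controls and the compensated plots.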


SamGG commented 3 years ago

Hi, it seems the command library(flowCore) is missing. It must be called before calling any function from the flowCore package. Hope I am not wrong.
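In practice, the script should attach flowCore before the first read.FCS() call. A minimal sketch, assuming flowCore is already installed from Bioconductor:

```r
# flowCore is a Bioconductor package; install it once with
# BiocManager::install("flowCore") if it is not available yet.
library(flowCore)

# read.FCS(), keyword(), compensate() and estimateLogicle() are flowCore
# functions, so they are only found after the library() call above.
exists("read.FCS")
```

Functions called with an explicit prefix, such as flowDensity::plotDens() or pheatmap::pheatmap(), do not need a library() call, which is why those lines in the reprex failed only because ff was never created.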

algebio commented 3 years ago

Hi Sam

Thank you for your feedback; as always, much appreciated! I have tried running library(flowCore), just in case I had forgotten it at the beginning, but I got the same result. I think I am missing something in either the compensation or the transformation of these files.

Regards Juan

SamGG commented 3 years ago

Hi, IMHO the script stops at Error in read.FCS(file.path(dir, files[6]), truncate_max_range = F, transformation = FALSE, : could not find function "read.FCS", which is caused by the library not being loaded. Perhaps the code passed to reprex did not include the library call, which prevents a correct understanding of the problem. I think you should select the code including the library call and redo the reprex. HTH

algebio commented 3 years ago

Oh, I see what you mean, Sam.

I added the libraries to the script, loaded them and ran reprex() again, but I got many error messages like this one:

ff <- read.FCS(file.path(dir, files[6]), truncate_max_range = F, transformation=FALSE, min.limit = NULL)
> Error in read.FCS(file.path(dir, files[6]), truncate_max_range = F, transformation = FALSE, : 'C:/Users/njo47/AppData/Local/Temp/RtmpCuQtEi/reprex4ae47f87748f/NA' is not a valid file

Anyway, when I run the script line by line without reprex(), I get no error messages at all. Any idea why the cell populations in the plot are so far from the centre? Could it be a mistake on my part when transforming the file, or something done in the lab when exporting the files?

Regards Juan

SamGG commented 3 years ago

OK, let's forget about reprex in this case. My first suspect is the negative cell population. This population might not be centred at zero, which would drive estimateLogicle to the wrong parameters. I would try a simple asinh transformation with a cofactor chosen so that the positive population lies between 4 and 10 on the transformed scale. Then I would check where the negative population sits. Since asinh(raw_intensity/cofactor) maps zero to zero, the transform does not change the answer.
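The effect of the cofactor can be checked with plain arithmetic: asinh(x / cofactor) maps 0 to 0, is roughly linear for |x| much smaller than the cofactor, and grows roughly like log(2x / cofactor) for large x. The intensities below are made-up values, the cofactor of 150 is only the value used later in this thread, and cofactor_for() is a hypothetical helper that inverts the transform to hit a chosen position on the asinh scale.

```r
# Cofactor used later in this thread; illustrative, not a recommendation.
cofactor <- 150

# Made-up raw intensities: negative and near-zero events plus bright events.
raw <- c(-500, -50, 0, 50, 500, 5000, 50000, 262143)
scaled <- asinh(raw / cofactor)
round(scaled, 2)

# asinh is odd and maps 0 to 0, so a negative population centred on zero
# stays centred on zero after the transform, whatever the cofactor.
asinh(0 / cofactor)

# Hypothetical helper: the cofactor that places a bright population with the
# given raw intensity at `target` on the asinh scale (e.g. 5, inside 4-10).
cofactor_for <- function(bright_intensity, target) bright_intensity / sinh(target)
cofactor_for(20000, 5)
```

Because the transform is monotonic and fixes zero, a negative population that appears far from zero after asinh was already far from zero in the raw (compensated) data, which points back at compensation or export rather than at the transform.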

algebio commented 3 years ago

Hi Samuel

Thank you for your help and sorry to be a pain... I have tried to follow your advice but I got these errors:

ff <- read.FCS(file.path(dir, files[16]), truncate_max_range = F, transformation=FALSE, min.limit = NULL)
ff
flowFrame object '330f73cc-8856-4390-899a-a009da87d199'
with 89953 cells and 23 observables:
             name   desc  range  minRange maxRange
$P1         FSC-A     NA 262144         0   262143
$P2         FSC-H     NA 262144         0   262143
$P3         FSC-W     NA 262144         0   262143
$P4         SSC-A     NA 262144         0   262143
$P5         SSC-H     NA 262144         0   262143
...           ...    ...    ...       ...      ...
$P19 561 610_20-A   CD56 262144  -81.3227   262143
$P20 561 780_60-A   pERK 262144  -90.4746   262143
$P21 640 670_30-A pSTAT3 262144 -111.0000   262143
$P22 640 730_45-A   CD69 262144  -88.0213   262143
$P23         Time     NA  15100    0.0000    15099
221 keywords are stored in the 'description' slot
markers <- FlowSOM::get_markers(ff, colnames(ff))
markers_of_interest <- markers[c(7:22)]
asinhTrans <- arcsinhTransform(transformationId="ln-transformation", a=1, b=150, c=1)
translist <- transformList(markers_of_interest, asinhTrans)
dataTransform <- transform(ff, translist)
Error in translist %on% _data : CD45 is not a variable in the flowFrame
markers_of_interest
355 379_28-A 355 515_30-A 355 580_20-A 355 740_35-A 405 450_50-A 405 525_50-A 405 610_20-A 405 710_50-A
      "CD45"        "CD4"     "CD45RA"        "CD8"     "pSTAT5"        "CD3"       "PD-1"       "CD25"
405 780_60-A 488 530_30-A 488 695_40-A 561 586_15-A 561 610_20-A 561 780_60-A 640 670_30-A 640 730_45-A
    "CTLA-4"     "pSTAT1"     "pSTAT6"       "CD27"       "CD56"       "pERK"     "pSTAT3"       "CD69"
markernames(ff)
355 379_28-A 355 515_30-A 355 580_20-A 355 740_35-A 405 450_50-A 405 525_50-A 405 610_20-A 405 710_50-A
      "CD45"        "CD4"     "CD45RA"        "CD8"     "pSTAT5"        "CD3"       "PD-1"       "CD25"
405 780_60-A 488 530_30-A 488 695_40-A 561 586_15-A 561 610_20-A 561 780_60-A 640 670_30-A 640 730_45-A
    "CTLA-4"     "pSTAT1"     "pSTAT6"       "CD27"       "CD56"       "pERK"     "pSTAT3"       "CD69"
translist <- transformList("355 379_28-A 355", asinhTrans)
dataTransform <- transform(ff, translist)
Error in translist %on% _data : 355 379_28-A 355 is not a variable in the flowFrame

Do you know of any online course or tutorial for learning more about compensation and transformation of flow cytometry datasets? I have read and run the flowCore tutorial, but I'm still confused. I need to learn more if I want to stop depending so much on others' help.

Regards Juan
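The error "CD45 is not a variable in the flowFrame" follows from what markers_of_interest holds: judging from the printed output, it is a named character vector whose names are channel names (e.g. 355 379_28-A) and whose values are marker descriptions (e.g. CD45). transformList() matches against the flowFrame's channel names, so, as the error message suggests, it needs names(markers_of_interest) rather than the values. The second attempt failed because the string "355 379_28-A 355" also contains the start of the next name in the printout. A base-R sketch using two entries copied from that output:

```r
# Two entries copied from the printed markers_of_interest output:
# names are channel names, values are marker descriptions.
markers_of_interest <- c("355 379_28-A" = "CD45", "355 515_30-A" = "CD4")

names(markers_of_interest)   # channel names: what colnames(ff) contains
unname(markers_of_interest)  # marker labels: not variables of the flowFrame

# Looking up the channel that carries a given marker:
names(markers_of_interest)[markers_of_interest == "CD4"]
```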

SamGG commented 3 years ago

I have a poor internet connection at the moment. The flowCore vignette is the right place to learn this. arcsinhTransform follows the formula x <- asinh(a + b*x) + c. So, to put it simply, use a = 0, c = 0 and b = 1/cofactor = 1/150. There is no need to use FlowSOM::get_markers, which I don't know well; you can use colnames(ff) instead.
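Combining the formula with the advice to use channel names gives the minimal flowCore sketch below: with a = 0, b = 1/150 and c = 0, arcsinhTransform() computes asinh(x/150), and transformList() receives channel names taken from colnames() rather than marker descriptions. The small synthetic flowFrame (built with the flowFrame() constructor) and its made-up values stand in for the real file; in the thread's script the channels would be names(markers_of_interest) or a subset of colnames(ff).

```r
library(flowCore)

# Synthetic stand-in for the real file: two fluorescence channels plus Time.
set.seed(1)
raw <- cbind(`355 379_28-A` = c(-80, rnorm(99, 3000, 500)),
             `355 515_30-A` = c(rnorm(50, 0, 100), rnorm(50, 20000, 3000)),
             Time = seq_len(100))
ff <- flowFrame(raw)

# asinh(a + b * x) + c with a = 0, b = 1 / cofactor, c = 0, i.e. asinh(x / 150)
asinhTrans <- arcsinhTransform(transformationId = "asinh-150",
                               a = 0, b = 1 / 150, c = 0)

# transformList() takes channel names (colnames(ff)), not marker descriptions.
# Time is left untransformed.
channels <- setdiff(colnames(ff), "Time")
translist <- transformList(channels, asinhTrans)
ff_asinh <- transform(ff, translist)

summary(ff_asinh)
```

Passing a marker label such as "CD45" instead of a channel name reproduces the "is not a variable in the flowFrame" error from the thread.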