Closed denvercal1234GitHub closed 1 year ago
Hi,
I think R shows warnings at the end of the function call so I would still expect these warnings to be produced by data import. Which is weird because based on the name of the parameters, these appear to have already been produced by regression models. So I am a bit confused about what is going on here ! Any additional details you could share?
Thanks, Etienne
Hi @ebecht - These warnings were produced after the call to infinity_flow(). My input to infinity_flow() was untransformed FCS files that had been passed through PeacoQC::RemoveMargins() with minRange = -Inf, maxRange = 2000000 for all channels (except Time, FSC, and SSC).
I used fluorescence-minus-PE as "Blank" for Isotype.
This line, all(names(TRIAL21_F37_annotation) %in% list.files(path_to_fcs)), returned FALSE, but all(names(isotypes) %in% list.files(path_to_fcs)) returned TRUE.
In the backbone specification file, I "discard" FJComp-AF-A -- do you do this too?
I used XGBoost, SVM, LASSO2, and LM.
Below is the code:
TRIAL_fcs_data <- read.flowSet(path=".../FCS_trial", pattern="*.fcs", transformation = FALSE, truncate_max_range = FALSE)
channels <- c("FJComp-AF-A",
"FJComp-AF594-A",
"FJComp-APC-A",
"FJComp-APC-Fire 750-A", "FJComp-PE-A",...)
channelSpecs <- list(
"FJComp-AF-A" = c(minRange = -Inf, maxRange = 2000000),
"FJComp-AF594-A" = c(minRange = -Inf, maxRange = 2000000), ...,
"FJComp-PE-A" = c(minRange = -Inf, maxRange = 2000000), ...)
TRIAL_fcs_data_RemovedMargin <- list()
for (i in seq_along(TRIAL_fcs_data)){
  # Remove margin events from each flowFrame and keep the result as a flowFrame
  TRIAL_fcs_data_RemovedMargin[[i]] <- PeacoQC::RemoveMargins(ff=TRIAL_fcs_data[[i]], channels=channels, channel_specifications = channelSpecs, output="frame")
  # Name each list element after the file name stored in the $FIL keyword
  names(TRIAL_fcs_data_RemovedMargin)[i] <- TRIAL_fcs_data[[i]]@description$`$FIL`
}
## Optional: convert the list to a flowSet for use with flowViz
TRIAL_fcs_data_RemovedMargin_fs <- flowCore::as(TRIAL_fcs_data_RemovedMargin, "flowSet")
My annotation file
My backbone specification
regression_functions <- list(
XGBoost = fitter_xgboost, # XGBoost
## Passed to fitter_nn, e.g. neural networks through keras::fit. See https://keras.rstudio.com/articles/tutorial_basic_regression.html
#NN = fitter_nn,
SVM = fitter_svm, # SVM
LASSO2 = fitter_glmnet, # L1-penalized 2nd degree polynomial model
LM = fitter_linear # Linear model
)
backbone_size <- table(read.csv(TRIAL21_backbone_selection_file)[,"type"])["backbone"]
backbone_size
extra_args_regression_params <- list(
list(nrounds = 500, eta = 0.05),
list(type = "nu-regression", cost = 8, nu=0.5, kernel="radial"),
list(alpha = 1, nfolds=10, degree = 2),
list(degree = 1)
)
if(length(regression_functions) != length(extra_args_regression_params)){
stop("Number of models and number of lists of hyperparameters mismatch")
}
imputed_data <- infinity_flow(
regression_functions = regression_functions,
extra_args_regression_params = extra_args_regression_params,
path_to_fcs = "..../FCS_trial",
path_to_output = "..../TRIAL_21FCS/output",
path_to_intermediary_results = "..../TRIAL_21FCS/tmp",
backbone_selection_file = TRIAL21_backbone_selection_file,
annotation = targets,
isotype = "Blank",
input_events_downsampling = Inf,
prediction_events_downsampling = 1000,
verbose = TRUE,
#Note: there is an issue with serialization of the neural networks and socketing since I updated to R-4.0.1. If you want to use neural networks, please make sure to set cores = 1L
cores = cores,
neural_networks_seed = NULL
)
The thing is, I don't see extra arguments in flowCore::write.FCS about data truncation, so I am a bit puzzled by what you are showing. Are you sure the input files do not contain channels named CD244.LASSO2, for instance?
In the backbone specification file, I "discard" FJComp-AF-A -- do you do this too?
I don't have much experience with spectral flow cytometry so I don't really know the answer to that question. My intuition is that it is a potentially informative feature of the cells so it may be worth keeping, unless it somehow biases prediction...
Thanks @ebecht for your response. The input FCS files definitely do not have any channels with .LASSO2.
Hi @ebecht -- From the flowCore discussions, it looks like the problem is with PnR and data truncation happening in read.FCS rather than write.FCS (https://github.com/RGLab/flowCore/issues/169).
From https://support.bioconductor.org/p/130629/, it was advised to call range(fr, type = "data") after read.FCS() to view the min/max taken from the actual data.
fr <- read.FCS(fcs_file_path)
range(fr) # will default to instrument range
range(fr, type = "data") # Will use the actual values, as in summary
But this would be read-only. Responses in https://github.com/RGLab/flowCore/issues/169 clarified that the data may already be truncated by read.FCS, so we would just need to switch truncation off when reading the file: read.FCS(fcs_file_path, truncate_max_range=F).
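To check which channels actually hold values above a declared upper limit, the comparison can be sketched in base R. This is only an illustration: count_over_range is a hypothetical helper (not part of flowCore or infinityFlow), and the matrix and limits are made-up toy values. With a real flowFrame, one would presumably pass exprs(fr) as the values and the "max" row of range(fr) as the limits.

```r
# Hypothetical helper: count events per channel above a declared upper limit.
count_over_range <- function(values, max_range) {
  stopifnot(is.matrix(values), all(colnames(values) %in% names(max_range)))
  limits <- max_range[colnames(values)]
  # sweep() compares every column against its own limit
  over <- sweep(values, 2, limits, FUN = ">")
  colSums(over)
}

# Toy data: 5 events, 2 channels (made-up numbers, not from the thread)
toy_values <- cbind(
  "CD244.XGBoost" = c(10, 200, 3000, 150, 90),
  "CD244.LASSO2"  = c(12, 180, 9e8, 140, 1e9)
)
toy_limits <- c("CD244.XGBoost" = 722220591, "CD244.LASSO2" = 722220591)

over_counts <- count_over_range(toy_values, toy_limits)
print(over_counts)
```

A non-zero count for a channel means that some events in it would be affected by truncation to that limit.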
Would you mind telling me how we could set truncate_max_range=F within infinity_flow()?
Interestingly, when I checked the code of infinity_flow() (https://github.com/ebecht/infinityFlow/blob/master/R/00_master.R), line 16 indicates that truncate_max_range was already set to FALSE when reading FCS files into infinity_flow().
And in the default values for infinity_flow(), it looks like this has already been set to FALSE?
extra_args_read_FCS = list(emptyValue = FALSE, truncate_max_range = FALSE,
ignore.text.offset = TRUE)
Perhaps, this issue here helps?
Hi @ebecht -- I also realized that the warning "Some data values of '.....LASSO2' channel exceed its $PnR value 722220591 and will be truncated!" appeared only when I set input_events_downsampling <- Inf.
When I set input_events_downsampling <- 2000, there was no warning. And it appears that only markers predicted by LASSO2 have this warning.
Q1. If the value of a channel really exceeds 722,220,591, wouldn't we want to remove those events anyway?
Q2. From the output plots for some markers like CD99, the predictions across the 4 models in files that did not have the CD99-PE Ab did not match well with the actual staining in the file that has the CD99-PE Ab. But for some other markers, the prediction was okay. Is there a quantitative way to decide which model is better for which marker (instead of manually inspecting plots)? Or, if we should pick 1 model for all files, is there a more quantitative way to decide which model is best for which FCS files?
Thanks for your help!
Hi @denvercal1234GitHub
As I was suggesting in my first reply, this warning does indeed sound like it is produced by flowCore::read.FCS. The only place where this happens in the code is when input data is read (see here). These input files should not have a channel called something.LASSO2.
As for models, we showed in our paper that LASSO2 (2nd degree polynomial models) were quite bad in this context and shouldn't be used. In general we recommended XGBoost. SVMs, NNs and XGBoost give overall very similar results so it does not matter much which one you choose.
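For a per-marker number rather than a visual check, one option is to score each model's predictions against the measured intensity on events from the file where the marker was actually stained, ideally events not used for training. The sketch below is not infinityFlow's own evaluation: score_models is a hypothetical helper, and the data are simulated stand-ins with made-up noise levels.

```r
# Toy example: score several models' predictions against measured values.
set.seed(1)
measured <- rnorm(500, mean = 2, sd = 1)      # stand-in for measured PE intensity
predictions <- data.frame(
  XGBoost = measured + rnorm(500, sd = 0.3),  # made-up prediction noise
  SVM     = measured + rnorm(500, sd = 0.4),
  LM      = measured + rnorm(500, sd = 0.9)
)

# Hypothetical helper: one row per model with rank correlation and RMSE.
score_models <- function(measured, predictions) {
  data.frame(
    model    = names(predictions),
    spearman = vapply(predictions,
                      function(p) cor(p, measured, method = "spearman"),
                      numeric(1)),
    rmse     = vapply(predictions,
                      function(p) sqrt(mean((p - measured)^2)),
                      numeric(1)),
    row.names = NULL
  )
}

scores <- score_models(measured, predictions)
print(scores[order(scores$rmse), ])  # lowest error first
```

Repeating this per marker gives a table from which the best model for each marker can be read off; whether correlation or RMSE is the better criterion depends on how the imputed values will be used.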
Thanks @ebecht for your response. That is very strange because indeed my input file does not have any channel called something.LASSO2, but infinity_flow still threw this warning. Yet the warning only happens when input_events_downsampling <- Inf and not when input_events_downsampling <- 2000, for example.
That is likely because when you use 2000 events you randomly get rid of the problematic ones.
In any case, I'd encourage you to not use linear and polynomial models which we have shown to be less accurate, so if the warnings are not produced by the other models I think you can safely ignore them.
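The downsampling explanation above can be made concrete with a small probability calculation. All three numbers below are hypothetical (the thread does not say how many events the files contain or how many are extreme); the point is only that a small random subsample will often contain none of a handful of rare events.

```r
# Hypothetical numbers: events in a file, how many are extreme, subsample size.
n_events    <- 500000
n_extreme   <- 20
n_subsample <- 2000

# Probability that a random subsample drawn without replacement
# contains no extreme event (hypergeometric distribution).
p_none <- dhyper(0, m = n_extreme, n = n_events - n_extreme, k = n_subsample)
print(p_none)
```

With no downsampling (input_events_downsampling = Inf) every event is kept, so any extreme event present in the file is always included.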
Great. Thank you, Etienne!
Hi there,
Thanks again for the package.
During the "Exporting results" step, I encountered a warning as below for almost all markers, even though I used PeacoQC::RemoveMargins to filter out events greater than 2e+06 before running infinityFlow. My data was acquired on a spectral Cytek Aurora. Do you usually set truncate_max_range = FALSE in this scenario? If so, how might I do so, given that infinity_flow is just one function? The argument extra_args_read_FCS is for reading the input, but I did not find an analogous argument for the output FCS. Thank you for your help.