HelenaLC / CATALYST

Cytometry dATa anALYsis Tools

sce2fcs: SCE to 'flowFrame/Set' - Error in readFCSdata(con, offsets, txt, transformation, which.lines, scale, #152

Closed PaulineMaby closed 4 years ago

PaulineMaby commented 4 years ago

I would like to export FCS files that include the UMAP and t-SNE channels after running the CyTOF workflow (Nowicka et al.).
First, I tried to create a flowFrame or a flowSet using the CATALYST function sce2fcs, but I get an error that I do not know how to handle. Could you please help me work through it?

    My_FlowFrameKeep <- sce2fcs(sce, split_by = NULL, keep_cd = TRUE, keep_dr = TRUE, assay = "counts")
    Error in readFCSdata(con, offsets, txt, transformation, which.lines, scale,  :
      $PnRNAis larger than R's numeric limit:1.79769313486232e+308
    In addition: Warning message:
    In readFCSdata(con, offsets, txt, transformation, which.lines, scale,  :
      NAs introduced by coercion

HelenaLC commented 4 years ago

Hard to pin down the issue from this... Could you please post sce, head(colData(sce)), head(reducedDim(sce)) (so I can get an idea of the data), and (always!) the output of your sessionInfo()? Thanks!

PaulineMaby commented 4 years ago

Thanks a lot for your reply. Please find the info below. Thanks for your help. Best

    sce
    class: SingleCellExperiment
    dim: 36 811774
    metadata(4): experiment_info cluster_codes SOM_codes delta_area
    assays(2): counts exprs
    rownames(36): CD16 CD19 ... Granulysin CD127
    rowData names(4): channel_name marker_name marker_class used_for_clustering
    colnames: NULL
    colData names(4): sample_id condition patient_id cluster_id
    reducedDimNames(1): UMAP
    altExpNames(0):

    head(colData(sce))
    DataFrame with 6 rows and 4 columns
      sample_id condition patient_id cluster_id
    1      05t2        t2  Patient05        100
    2      05t2        t2  Patient05         47
    3      05t2        t2  Patient05        100
    4      05t2        t2  Patient05         98
    5      05t2        t2  Patient05         49
    6      05t2        t2  Patient05         56

    head(reducedDim(sce))
         [,1] [,2]
    [1,]   NA   NA
    [2,]   NA   NA
    [3,]   NA   NA
    [4,]   NA   NA
    [5,]   NA   NA
    [6,]   NA   NA

sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] flowCore_2.0.1 CATALYST_1.12.2 SingleCellExperiment_1.10.1
[4] SummarizedExperiment_1.18.2 DelayedArray_0.14.1 matrixStats_0.57.0
[7] Biobase_2.48.0 GenomicRanges_1.40.0 GenomeInfoDb_1.24.2
[10] IRanges_2.22.2 S4Vectors_0.26.1 BiocGenerics_0.34.0

loaded via a namespace (and not attached):
[1] ggbeeswarm_0.6.0 TH.data_1.0-10 Rtsne_0.15
[4] colorspace_1.4-1 rjson_0.2.20 ellipsis_0.3.1
[7] rio_0.5.16 ggridges_0.5.2 circlize_0.4.10
[10] cytolib_2.0.3 XVector_0.28.0 GlobalOptions_0.1.2
[13] base64enc_0.1-3 BiocNeighbors_1.6.0 clue_0.3-57
[16] rstudioapi_0.11 hexbin_1.28.1 CytoML_2.0.5
[19] ggrepel_0.8.2 fansi_0.4.1 mvtnorm_1.1-1
[22] xml2_1.3.2 codetools_0.2-16 splines_4.0.2
[25] scater_1.16.2 jsonlite_1.7.1 cluster_2.1.0
[28] png_0.1-7 graph_1.66.0 compiler_4.0.2
[31] drc_3.0-1 assertthat_0.2.1 Matrix_1.2-18
[34] cli_2.0.2 BiocSingular_1.4.0 tools_4.0.2
[37] ncdfFlow_2.34.0 rsvd_1.0.3 igraph_1.2.5
[40] gtable_0.3.0 glue_1.4.2 GenomeInfoDbData_1.2.3
[43] flowWorkspace_4.0.6 reshape2_1.4.4 dplyr_1.0.2
[46] ggcyto_1.16.0 Rcpp_1.0.5 carData_3.0-4
[49] cellranger_1.1.0 vctrs_0.3.4 DelayedMatrixStats_1.10.1
[52] stringr_1.4.0 openxlsx_4.2.2 irlba_2.3.3
[55] lifecycle_0.2.0 gtools_3.8.2 XML_3.99-0.5
[58] zlibbioc_1.34.0 MASS_7.3-53 zoo_1.8-8
[61] scales_1.1.1 RProtoBufLib_2.0.0 hms_0.5.3
[64] RBGL_1.64.0 sandwich_3.0-0 RColorBrewer_1.1-2
[67] ComplexHeatmap_2.4.3 yaml_2.2.1 curl_4.3
[70] gridExtra_2.3 ggplot2_3.3.2 latticeExtra_0.6-29
[73] stringi_1.5.3 plotrix_3.7-8 zip_2.1.1
[76] BiocParallel_1.22.0 shape_1.4.5 rlang_0.4.7
[79] pkgconfig_2.0.3 bitops_1.0-6 lattice_0.20-41
[82] purrr_0.3.4 cowplot_1.1.0 tidyselect_1.1.0
[85] plyr_1.8.6 magrittr_1.5 R6_2.4.1
[88] generics_0.0.2 nnls_1.4 multcomp_1.4-14
[91] pillar_1.4.6 haven_2.3.1 foreign_0.8-80
[94] survival_3.2-7 abind_1.4-5 RCurl_1.98-1.2
[97] FlowSOM_1.20.0 tibble_3.0.3 tsne_0.1-3
[100] crayon_1.3.4 car_3.0-10 viridis_0.5.1
[103] jpeg_0.1-8.1 GetoptLong_1.0.3 grid_4.0.2
[106] readxl_1.3.1 data.table_1.13.0 Rgraphviz_2.32.0
[109] ConsensusClusterPlus_1.52.0 forcats_0.5.0 digest_0.6.25
[112] RcppParallel_5.0.2 munsell_0.5.0 viridisLite_0.3.0
[115] beeswarm_0.2.3 vipor_0.4.5

HelenaLC commented 4 years ago

Okay, I finally figured it out. The "issue" is that runDR supports running dimensionality reductions on a subset of cells, so the reducedDims contain NAs for cells that were not included. I traced the issue back to flowCore::flowFrame(), which does not seem to support encoding NAs. That explains the error, which was confusing at first but now makes sense: $PnRNAis larger than R's numeric limit:1.79769313486232e+308.
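A quick way to confirm this diagnosis before exporting is to count the cells whose UMAP coordinates are missing. Below is a minimal base-R sketch. The toy matrix xy is a hypothetical stand-in for reducedDim(sce, "UMAP"), with one row per cell and NA rows for cells that runDR skipped, so it runs without CATALYST. With real data you would start from xy <- reducedDim(sce, "UMAP") instead.

```r
# Hypothetical stand-in for reducedDim(sce, "UMAP"): one row per cell.
# NA rows mimic cells that were not in the subset passed to runDR.
set.seed(1)
n_cells <- 10
xy <- matrix(rnorm(2 * n_cells), ncol = 2,
             dimnames = list(NULL, c("UMAP1", "UMAP2")))
xy[c(2, 5, 9), ] <- NA

# Flag cells lacking a coordinate in any dimension; per the diagnosis above,
# these NAs are what flowFrame() does not seem to encode during sce2fcs()
has_na <- rowSums(is.na(xy)) > 0
cat(sum(has_na), "of", nrow(xy), "cells have no UMAP coordinates\n")
# prints: 3 of 10 cells have no UMAP coordinates
```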

Depending on what you're interested in, there are two alternative workarounds (or you could do both):

  1. drop cells for which no UMAP coordinates are available:

    xy <- reducedDim(sce, "UMAP")
    sce <- sce[, !is.na(xy[, 1])]
    sce2fcs(...)
  2. set NAs to some value, and thus keep all the data, via the snippet that follows. With this option, the (-)Inf UMAP coordinates should of course be excluded for plotting; depending on what you do outside of R, I am not sure how simple that is.

    xy <- reducedDim(sce, "UMAP")
    reducedDim(sce, "UMAP")[is.na(xy)] <- Inf 
    # ...or -Inf (I think these are best to distinguish from "real" coordinates)
    sce2fcs(...)
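Both workarounds can be tried out on a toy matrix without CATALYST. This is a minimal base-R sketch. It assumes, as in the snippets above, that rows of reducedDim(sce, "UMAP") correspond to cells (i.e. to columns of sce). The names xy, keep, xy_filled and plottable are hypothetical. With real data, keep would subset the object itself (sce[, keep]), and xy_filled would be written back with reducedDim(sce, "UMAP") <- xy_filled before calling sce2fcs.

```r
# Hypothetical stand-in for reducedDim(sce, "UMAP"); rows 2 and 4 were not embedded
xy <- rbind(c(1.5, -0.3), c(NA, NA), c(0.2, 2.1), c(NA, NA), c(-1.0, 0.7))
colnames(xy) <- c("UMAP1", "UMAP2")

# Workaround 1: keep only cells that have coordinates.
# On a real object the same logical index subsets cells: sce[, keep]
keep <- !is.na(xy[, 1])
xy_kept <- xy[keep, , drop = FALSE]

# Workaround 2: keep all cells, encoding missing coordinates as -Inf
xy_filled <- xy
xy_filled[is.na(xy_filled)] <- -Inf

# Later (e.g. for plotting), drop the placeholder rows again
plottable <- apply(is.finite(xy_filled), 1, all)
xy_plot <- xy_filled[plottable, , drop = FALSE]
```

Workaround 1 tests only the first column. That is enough when whole rows are NA, as in head(reducedDim(sce)) above. If only some dimensions could be missing, rowSums(is.na(xy)) == 0 is a stricter alternative.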
PaulineMaby commented 3 years ago

Sorry for my late answer. Thank you very much, this works:

    xy <- reducedDim(sce, "UMAP")
    sce <- sce[, !is.na(xy[, 1])]

Best, Pauline

(Sent by email in reply to Helena L. Crowell's comment of Fri, 23 Oct 2020, 11:31.)

-- Pauline Maby, Ph.D. Postdoctoral Research Associate in Tumor Immunology INSERM UMRS1138, Integrative Cancer Immunology Cordeliers Research Center 15 Rue de l'Ecole de Medecine 75006 Paris, France
