SingleR-inc / SingleR

Clone of the Bioconductor repository for the SingleR package.
https://bioconductor.org/packages/devel/bioc/html/SingleR.html

Astrocytes (mis)classified as adipocytes in Blueprint / ENCODE data #96

PeteHaitch closed this issue 4 years ago

PeteHaitch commented 4 years ago

I'm no biologist, but this doesn't seem right.

suppressPackageStartupMessages(library(SingleR))
ref <- BlueprintEncodeData()
#> snapshotDate(): 2019-10-22
#> see ?SingleR and browseVignettes('SingleR') for documentation
#> loading from cache
#> see ?SingleR and browseVignettes('SingleR') for documentation
#> loading from cache
#> snapshotDate(): 2019-10-22
#> see ?SingleR and browseVignettes('SingleR') for documentation
#> loading from cache
#> see ?SingleR and browseVignettes('SingleR') for documentation
#> loading from cache
colData(ref)[ref$label.fine == "Astrocytes", ]
#> DataFrame with 2 rows and 2 columns
#>              label.main  label.fine
#>             <character> <character>
#> astrocyte    Adipocytes  Astrocytes
#> astrocyte.1  Adipocytes  Astrocytes

Created on 2020-02-13 by the reprex package (v0.3.0)

Session info

``` r
sessionInfo()
#> R version 3.6.2 (2019-12-12)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.4 LTS
#>
#> Matrix products: default
#> BLAS/LAPACK: /opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64_lin/libmkl_rt.so
#>
#> locale:
#>  [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C
#>  [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8
#>  [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8
#>  [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] parallel  stats4    stats     graphics  grDevices datasets  utils
#> [8] methods   base
#>
#> other attached packages:
#>  [1] SingleR_1.0.5               SummarizedExperiment_1.16.1
#>  [3] DelayedArray_0.12.2         BiocParallel_1.20.1
#>  [5] matrixStats_0.55.0          Biobase_2.46.0
#>  [7] GenomicRanges_1.38.0        GenomeInfoDb_1.22.0
#>  [9] IRanges_2.20.2              S4Vectors_0.24.3
#> [11] BiocGenerics_0.32.0
#>
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.3                    lattice_0.20-38
#>  [3] assertthat_0.2.1              digest_0.6.23
#>  [5] mime_0.8                      BiocFileCache_1.10.2
#>  [7] R6_2.4.1                      RSQLite_2.2.0
#>  [9] evaluate_0.14                 httr_1.4.1
#> [11] highr_0.8                     pillar_1.4.3
#> [13] zlibbioc_1.32.0               rlang_0.4.4
#> [15] curl_4.3                      blob_1.2.1
#> [17] Matrix_1.2-18                 rmarkdown_2.1
#> [19] BiocNeighbors_1.4.1           AnnotationHub_2.18.0
#> [21] stringr_1.4.0                 RCurl_1.98-1.1
#> [23] bit_1.1-15.1                  shiny_1.4.0
#> [25] compiler_3.6.2                httpuv_1.5.2
#> [27] xfun_0.12                     pkgconfig_2.0.3
#> [29] htmltools_0.4.0               tidyselect_1.0.0
#> [31] tibble_2.1.3                  GenomeInfoDbData_1.2.2
#> [33] interactiveDisplayBase_1.24.0 later_1.0.0
#> [35] crayon_1.3.4                  dplyr_0.8.4
#> [37] dbplyr_1.4.2                  bitops_1.0-6
#> [39] rappdirs_0.3.1                grid_3.6.2
#> [41] xtable_1.8-4                  DBI_1.1.0
#> [43] magrittr_1.5                  stringi_1.4.5
#> [45] XVector_0.26.0                renv_0.9.2
#> [47] promises_1.1.0                DelayedMatrixStats_1.8.0
#> [49] vctrs_0.2.2                   tools_3.6.2
#> [51] bit64_0.9-7                   glue_1.3.1
#> [53] BiocVersion_3.10.1            purrr_0.3.3
#> [55] fastmap_1.0.1                 yaml_2.2.1
#> [57] AnnotationDbi_1.48.0          BiocManager_1.30.10
#> [59] ExperimentHub_1.12.0          memoise_1.1.0
#> [61] knitr_1.28
```

LTLA commented 4 years ago

I guess that does look a bit wrong.

We inherited that from the source, so I don't have much insight beyond that. (Searching for "Blueprint encode" just brings us back to this repository.) Perhaps @dviraran may have some thoughts.

If it is indeed an error, I suppose we should update the file on ExperimentHub.

PeteHaitch commented 4 years ago

Were you able to find the source of these annotations? Are they really astrocytes mislabelled as adipocytes or vice versa?

LTLA commented 4 years ago

:man_shrugging:

That's 2 votes for astrocytes (label.fine and the row names) against 1 for adipocytes (label.main), so I'm going to call them astrocytes. You can verify this by checking the expression of known astrocyte markers in those two samples.
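
For anyone who wants to run that marker check, here is a minimal sketch. It assumes the reference stores log-expression values in an assay named "logcounts" and uses gene symbols as row names. The marker lists are my own illustrative picks of commonly cited genes, not something taken from the dataset or from this thread.

``` r
# In SingleR 1.0.x the reference getter lives in SingleR itself;
# in later releases it moved to the celldex package.
library(SingleR)
library(SummarizedExperiment)

ref <- BlueprintEncodeData()

# Illustrative marker sets (assumption: commonly cited genes, swap in your own).
astro.markers <- c("GFAP", "AQP4", "SLC1A3", "ALDH1L1")
adipo.markers <- c("ADIPOQ", "LEP", "PLIN1", "FABP4")

# Keep only the markers actually present in the reference.
markers <- intersect(c(astro.markers, adipo.markers), rownames(ref))

# The two disputed samples versus every other sample labelled as adipocyte.
is.astro <- ref$label.fine == "Astrocytes"
is.adipo <- ref$label.main == "Adipocytes" & !is.astro

# Mean log-expression of each marker within each group.
logexp <- assay(ref, "logcounts")[markers, , drop = FALSE]
marker.means <- data.frame(
    disputed = rowMeans(logexp[, is.astro, drop = FALSE]),
    adipocytes = rowMeans(logexp[, is.adipo, drop = FALSE])
)
marker.means
```

If the disputed samples show high astrocyte-marker expression and low adipocyte-marker expression relative to the other group, that would support treating label.main as the mislabelled field.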