SingleR-inc / SingleR

Clone of the Bioconductor repository for the SingleR package.
https://bioconductor.org/packages/devel/bioc/html/SingleR.html
GNU General Public License v3.0

Processing of the default reference sets #192

Status: Closed (alexandruioanvoda closed this 1 year ago)

alexandruioanvoda commented 3 years ago

SingleR is awesome and really user friendly! Thank you for this!

I was wondering whether the expression values in the HPCA and BlueprintENCODE references are normalised by library size. It seems so for HPCA (its colSums vary very little, between 115k and 121k). However, I wasn't sure for BPE: its colSums range between 24k and 44k, and the values are fractional, so they are definitely not counts either. Is there a supplement somewhere that describes how the datasets were processed (TPM, FPKM, TMM-normalised)?

@dviraran

j-andrews7 commented 1 year ago

They are indeed "normalized", though it's not entirely clear how.

The celldex scripts that create each of these objects mention this:

https://github.com/LTLA/celldex/blob/master/inst/scripts/1.0.0/make-hpca-data.Rmd
https://github.com/LTLA/celldex/blob/master/inst/scripts/1.0.0/make-blueprint_encode-data.Rmd

You'll have to go back to the legacy repo or original publication to get any further details.
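
In the meantime, the column-sum check described in the question can be sketched in a few lines. This is only a rough diagnostic written in Python on toy data, not the processing actually used for either reference. The function names are hypothetical, and the log2(x + 1) transform is an assumption that nothing in this thread confirms. The one firm fact it relies on is that TPM values sum to one million per sample on the linear scale, whereas FPKM and raw counts carry no such constraint.

```python
import math


def column_sums(matrix):
    """Sum each column of a genes x samples matrix given as a list of rows."""
    return [sum(col) for col in zip(*matrix)]


def relative_spread(values):
    """(max - min) / mean: a small value means the samples are on a similar scale."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean


def looks_like_log2_tpm(matrix, tolerance=0.01):
    """Return True if undoing an ASSUMED log2(x + 1) transform gives
    per-sample sums within `tolerance` of 1e6, as TPM would."""
    # Values this large cannot plausibly be on a log2 scale; bail out
    # early, which also avoids overflow in 2 ** v below.
    if any(v > 64 for row in matrix for v in row):
        return False
    linear = [[2.0 ** v - 1.0 for v in row] for row in matrix]
    return all(abs(s - 1e6) / 1e6 <= tolerance for s in column_sums(linear))


# Toy data (made up): 3 genes x 2 samples, each sample summing to 1e6.
tpm = [[500000.0, 250000.0],
       [300000.0, 250000.0],
       [200000.0, 500000.0]]
log_tpm = [[math.log2(v + 1.0) for v in row] for row in tpm]

# Toy counts (made up) with unequal library sizes, same assumed transform.
counts = [[10.0, 200.0],
          [30.0, 50.0]]
log_counts = [[math.log2(v + 1.0) for v in row] for row in counts]

print(column_sums(log_tpm))             # sums on the log scale differ per sample
print(looks_like_log2_tpm(log_tpm))     # back-transformed sums are checked against 1e6
print(looks_like_log2_tpm(log_counts))
# Approximate extremes quoted in the question for HPCA (115k and 121k):
print(relative_spread([115000.0, 121000.0]))
```

Note that near-constant column sums on their own only suggest that the samples were brought onto a common scale; they do not identify which method was used, so the original publications remain the authority.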