The code at L286-317 computes TPMs with a simple scaling by gene length prior to passing them into SingleR. However, the SingleR tutorial states its requirement on the assay values only for the reference dataset, not for our test dataset. The relevant passage reads:
For the reference dataset, the assay matrix must contain log-transformed normalized expression values. This is because the default marker detection scheme computes log-fold changes by subtracting the medians, which makes little sense unless the input expression values are already log-transformed. For alternative schemes, this requirement may be relaxed (e.g., Wilcoxon rank sum tests do not require transformation); similarly, if pre-defined markers are supplied, no transformation or normalization is necessary.
I will continue running the code at L286-317 as discussed, but we should keep this in mind.
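To make the two transformations under discussion concrete, here is a minimal sketch in Python. It is not the R code at L286-317. It assumes TPM is computed per cell as counts divided by gene length in kilobases, rescaled to sum to one million, and that a log2(x + 1) transform is then applied. The function names, the pseudocount, and the toy numbers are all hypothetical.

```python
import math


def counts_to_tpm(counts, lengths_bp):
    """Convert one cell's raw counts to TPM by simple gene-length scaling."""
    # Reads per kilobase: divide each count by its gene length in kb.
    rpk = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rpk)
    if total == 0:
        # A cell with no counts stays all-zero instead of dividing by zero.
        return [0.0] * len(rpk)
    # Rescale so that the values for this cell sum to one million.
    return [r / total * 1e6 for r in rpk]


def log2_transform(values, pseudocount=1.0):
    """Log-transform normalized values; the pseudocount of 1 is an assumption."""
    return [math.log2(v + pseudocount) for v in values]


# Toy data (hypothetical): three genes in a single cell.
counts = [10, 20, 30]
lengths_bp = [1000, 2000, 3000]

tpm = counts_to_tpm(counts, lengths_bp)
log_tpm = log2_transform(tpm)
```

In this toy case the three genes have the same count per kilobase, so they end up with identical TPM values even though their raw counts differ.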
https://github.com/cameliaquek/singlecellmel/blob/8a9661f32f0cbe9573a5b2dd433df251473e6005/scripts/STEP0-CITEseq.Rmd#L286-L317
http://bioconductor.org/books/release/SingleRBook/using-the-classic-mode.html#choices-of-assay-data
Cheers, Cam