Possible bug in CBCS_normalization_tutorial.pdf

Dear Mr. Bhattacharya,

Either I'm missing something or there is a small bug in the tutorial's R code. In the step "Below limit of detection", I think the aim is to count, per sample, the number of transcripts with counts below the LOD. You do this with, e.g.,

num_hk_blod = colSums(raw[raw$Code.Class == 'Housekeeping',-c(1:2)] < lod)

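However, I think that doing it this way makes R recycle the lod vector element-wise down the rows of the count table (in column-major order) instead of matching one LOD to each sample column. As a result, LODs from several samples are compared against measurements within a single sample. This will result in counting errors if the LODs vary substantially between samples.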

You can easily recreate what I mean in R with the following example:

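test_expression <- data.frame(sample1 = c(1, 3, 1, 3, 1),
                              sample2 = c(5, 5, 5, 5, 5))

# One LOD per sample
test_lod <- c(sample1 = 2, sample2 = 6)

# Gives sample1 = 5 and sample2 = 3, although comparing each sample
# only against its own LOD would give 3 and 5
test_blod <- colSums(test_expression < test_lod)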
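One possible way to get per-sample counts is sketched here on the same toy data. It is only a sketch: the names below_lod, test_blod_fixed and test_blod_mapply are made up for illustration, and applying the same idea to the tutorial code assumes that lod holds exactly one value per sample, in the same order as the sample columns.

```r
# Toy data: rows are transcripts, columns are samples.
test_expression <- data.frame(sample1 = c(1, 3, 1, 3, 1),
                              sample2 = c(5, 5, 5, 5, 5))
test_lod <- c(sample1 = 2, sample2 = 6)

# sweep() with MARGIN = 2 compares every value in column j with test_lod[j],
# so each sample is only ever checked against its own LOD.
below_lod <- sweep(as.matrix(test_expression), 2, test_lod, "<")
test_blod_fixed <- colSums(below_lod)

# Alternative that pairs each column with its LOD explicitly.
test_blod_mapply <- mapply(function(x, l) sum(x < l), test_expression, test_lod)

print(test_blod_fixed)
print(test_blod_mapply)
```

Both variants give 3 for sample1 and 5 for sample2, which is what I would expect from comparing each sample with its own LOD.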

If the misunderstanding is on my side, I'm sorry.

Best wishes, Andi
