RMolania / TCGA_PanCancer_UnwantedVariation


Merged dataset with TCGA-KIRC #3

Open kevinblighe opened 1 year ago

kevinblighe commented 1 year ago

Hi @RMolania, thank you for developing this elaborate approach to batch correction. I am currently testing RUV-III on a merged dataset comprising TCGA-KIRC bulk RNA-seq and another study. However, after running the RUV_III_PRPS function, I find that the 'batch-corrected' dataset is exactly the same as the raw counts dataset. I am running the function as follows:

ruviii.norm <- RUV_III_PRPS(
        Y = t(log2(ruv.data.input + 1)),
        M = ruv.rep.matrix,
        ctl = negative.control.genes,
        k = 0,
        average = TRUE,
        return.info = FALSE)
ruviii.prps.norm <- t(ruviii.norm[1:ncol(dds),])

Prior to this, ~950 negative control genes were selected across my biological condition of interest, and I cannot see any issues in the pseudo-replicate step.

I also found that the RUV_III_PRPS function returned errors when return.info = TRUE. I checked the function code, and I am not sure that it handles every combination of arguments an end user might try, which results in errors.

Any thoughts?

Kind regards, Kevin

RMolania commented 1 year ago

Hi @kevinblighe, thanks for your question. I would say that the problem comes from k = 0 in the RUV-III-PRPS code. You need to specify a value for k, which is the number of unwanted factors to use. If k = 0, no adjustment is made.
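To see why k = 0 returns the input unchanged, here is a minimal base-R sketch of the core RUV-III adjustment, written to my understanding of the original RUV-III algorithm (as in the RUVIII function of the CRAN ruv package). It is not the RUV_III_PRPS code itself. The names residop and ruv3_sketch and the simulated data are hypothetical and for illustration only. Unwanted factors (alpha) are estimated from the residuals left after removing the replicate structure in M, and only the first k of them are regressed out, so with k = 0 nothing is removed.

```r
# Hypothetical helper: residual operator, removes the column space of B from A
residop <- function(A, B) A - B %*% solve(t(B) %*% B) %*% t(B) %*% A

# Hypothetical sketch of the RUV-III core step (not the RUV_III_PRPS code).
# Y: samples x genes; M: samples x replicate-groups indicator; ctl: logical over genes
ruv3_sketch <- function(Y, M, ctl, k) {
  if (k == 0) return(Y)  # zero unwanted factors: no adjustment at all
  Y0 <- residop(Y, M)    # variation *within* replicate groups = unwanted variation
  m <- nrow(Y)
  # Candidate unwanted factors from the left singular vectors of Y0 Y0'
  U <- svd(Y0 %*% t(Y0))$u[, 1:min(m - ncol(M), sum(ctl)), drop = FALSE]
  fullalpha <- t(U) %*% Y
  alpha <- fullalpha[1:min(k, nrow(fullalpha)), , drop = FALSE]
  ac <- alpha[, ctl, drop = FALSE]
  # Estimate W from the negative control genes, then regress out W %*% alpha
  W <- Y[, ctl, drop = FALSE] %*% t(ac) %*% solve(ac %*% t(ac))
  Y - W %*% alpha
}

# Simulated example: 3 replicate groups of 2 samples, one batch effect
set.seed(1)
M <- kronecker(diag(3), matrix(1, 2, 1))          # 6 x 3 replicate matrix
batch <- rep(c(0, 1), 3)                          # batch varies within each group
Y <- matrix(rnorm(6 * 50), 6, 50) + outer(batch, rnorm(50, sd = 2))
ctl <- rep(c(TRUE, FALSE), c(20, 30))             # first 20 genes as controls

identical(ruv3_sketch(Y, M, ctl, k = 0), Y)       # TRUE: output equals input
adj <- ruv3_sketch(Y, M, ctl, k = 1)              # one unwanted factor removed
```

In practice, k is usually chosen by trying several values and inspecting diagnostics such as RLE plots or PCA coloured by batch; the appropriate value depends on the data.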