Why is additional normalisation of the input samples required when the input data are already normalised before selecting high-variance genes for ML?

dncR / MLSeq-archived

This repository has been moved to the archive. See the "dncR/MLSeq" repository for current changes, issues and bug fixes.


Why is additional normalisation of the input samples required when the input data are already normalised before selecting high-variance genes for ML? #8

Open nayanvs opened 1 year ago

nayanvs commented 1 year ago

Why is normalization by one of the following methods required when I already supply normalized, batch-effect-adjusted data for training? deseq-vst, deseq-rlog, deseq-logcpm, tmm-logcpm

dncR commented 1 year ago

Hello @NayanVS. The classify function of MLSeq offers several normalization and transformation alternatives, as well as an option to use the data untouched, which is your case. The "normalize" argument has a "none" option; however, that option is missing from the "preProcessing" argument, where it obviously should be available as well. I will check the source code for this issue and fix it.

Thank you, Best.

nayanvs commented 1 year ago

Hello Prof. @dncR, thank you for your feedback. I was able to run the chunk by replacing the preProcessing argument with normalize="none". I was wondering how appropriate this is.
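For reference, a minimal sketch of that kind of call. It is not the exact chunk from my analysis: it uses the cervical example counts shipped with MLSeq as a stand-in for real data, and the subset of 50 features, the "PLDA" method and the cross-validation settings are arbitrary choices for illustration. As far as I understand the documentation, "normalize" is read by the discrete and voom-based classifiers, while caret-based classifiers use "preProcessing", so the sketch assumes a discrete classifier.

```r
library(MLSeq)
library(DESeq2)

# Example count table bundled with MLSeq (stand-in for the real input data).
filepath <- system.file("extdata/cervical.txt", package = "MLSeq")
cervical <- read.table(filepath, header = TRUE)

# Sample labels: 29 normal (N) followed by 29 tumour (T) samples.
labels <- DataFrame(condition = factor(rep(c("N", "T"), c(29, 29))))

# Keep a small, arbitrary subset of features so the sketch runs quickly.
counts <- as.matrix(cervical[1:50, ])

dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = labels,
                              design = formula(~condition))

set.seed(2128)

# normalize = "none" asks classify() to use the counts as supplied.
# Assumption: the chosen method is a discrete classifier (here PLDA).
fit <- classify(data = dds,
                method = "PLDA",
                normalize = "none",
                ref = "T",
                control = discreteControl(method = "repeatedcv",
                                          number = 5,
                                          repeats = 1,
                                          tuneLength = 10))

show(fit)
```

If the method is caret-based instead, it may be worth checking which transformation is actually applied when "preProcessing" is left out, since the argument may fall back to its default rather than skipping the transformation.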

Also, I have found that your algorithm is equally effective for other sequencing-based classification problems on biological samples and is not limited to expression counts. But I think you are aware of that, since you mention ChIP-seq etc. in the vignette.