BaselAbujamous / clust

Automatic and optimised consensus clustering of one or more heterogeneous datasets

Log2 fold change normalization -- help #27

Closed bioinfo17 closed 5 years ago

bioinfo17 commented 5 years ago

Hi Basel,

Congratulations on developing this great tool; it is very helpful. I just wanted to confirm whether we need to specify any normalization codes when using log2 fold change expression values as input. I believe no normalization is required: in the manual, normalization is recommended for log2 RNA-seq TPM and FPKM values but not for log2 fold change values. The log2 fold change values were calculated using T0 as the control.

Specifically, the data is in the format:

gene_id        T0  T1            T2            T3            T4
MSTRG.21649.1  0   -17.99461767  -17.99461767  -17.99461767  -17.99461767
MSTRG.18239.1  0   -20.38068299  -20.38068299  -20.38068299  20.38068299
MSTRG.6149.1   0   -18.56707533  -18.56707533  19.56707533   16.56707533
MSTRG.6144.1   0   -17.17941598  -17.17941598  -17.17941598  -17.17941598
MSTRG.21338.1  0   -16.79450764  -12.51669354  -16.79450764  -12.45173893
MSTRG.19827.1  0   -16.30894521  -16.30894521  -16.30894521  -16.30894521
MSTRG.13234.1  0   -16.3043002   -8.283472721  -16.3043002   -7.745002345

Please advise. Many thanks.

BaselAbujamous commented 5 years ago

Hi,

Thanks for using Clust and for your question.

I think it makes sense to calculate z-scores (normalisation code 4). To see why, consider two genes:

gene1 has fold changes: 0 1 2 3 4
gene2 has fold changes: 0 2 4 6 8

I would say that these two genes are co-expressed because their profiles have the same shape and differ only in scale (gene2 is exactly twice gene1 at every time point). Z-score normalisation (code 4) removes that difference in scale, so the two genes end up with the same profile.
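To make the two-gene example concrete, here is a minimal sketch in Python with NumPy. It is not Clust's own code: the helper name `zscore_rows` is made up for illustration, and using the population standard deviation (NumPy's default, `ddof=0`) is an assumption. The two genes come out identical with either choice of standard deviation, since both rows are scaled the same way.

```python
import numpy as np

def zscore_rows(x):
    """Z-score each row (gene): subtract the row mean, divide by the row std.

    Hypothetical helper for illustration only. Uses the population standard
    deviation (ddof=0), which is an assumption about the convention.
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)
    return (x - mean) / std

# The two fold-change profiles from the example above (T0..T4).
profiles = np.array([
    [0, 1, 2, 3, 4],  # gene1
    [0, 2, 4, 6, 8],  # gene2
])

z = zscore_rows(profiles)
print(np.round(z, 3))
# Both rows are approximately [-1.414, -0.707, 0, 0.707, 1.414]
```

Note that a gene whose values are identical at every time point has a standard deviation of zero, so this sketch would divide by zero for such a row; it does not handle that case.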

All the best,
Basel