WubingZhang / MAGeCKFlute

Integrative analysis pipeline for pooled CRISPR functional genetic screens
https://github.com/WubingZhang/MAGeCKFlute

Flute normalization increases beta scores #6

Open NicolasH2 opened 4 years ago

NicolasH2 commented 4 years ago

Hi, I used Flute as a follow-up to the mageck pipeline (mageck count -> mageck mle). Compared with the mle values, the Flute normalization by cell cycle increases the absolute value of the beta scores. Where my beta score distribution was between -2 and +2 (output of mle), it is now between -5 and +5 (output of Flute). From the vignette I understand that Flute should center the distributions of the conditions; I didn't expect it to widen their range. Is this known behaviour, or maybe even intended? I should mention that the conditions' distributions are already quite similar without Flute (though not perfect). Thanks in advance

My handling of Flute is as follows:

df = ReadBeta("path/to/gene_summary.txt")
df2 = NormalizeBeta(df, samples=c(ctrlname, treatname), method="cell_cycle")

The gene_summary.txt file (needed for Flute) was generated by: mageck mle --count-table "path/to/count.txt" --design-matrix "path/to/designmatrix.txt"

The count.txt file (needed for mageck mle) was generated by: mageck count --list-seq "path/to/library.txt" --fastq $fastqlist

WubingZhang commented 4 years ago

Hi,

Normalization is needed when the data quality is poor or when cell cycle effects confound the conditions. If your beta score distributions are already similar, you can skip the normalization step and go straight to the downstream analysis. The normalization can indeed increase the range of the beta scores: when the median beta score of the cell cycle genes is below 1 in absolute value, it is scaled to 1, and all other beta scores are stretched by the same factor. This is why, in CRISPR screen data analysis, we identify hits based on the rank of the genes or by using one or two standard deviations as the cutoff.
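
To make the scaling concrete, here is a minimal sketch in base R with made-up numbers. It is not the MAGeCKFlute source: the gene names, the beta values and the reference gene set are hypothetical, and it assumes the normalization simply divides every beta score by the absolute median of the cell cycle genes.

```r
# Toy beta scores for one sample (made-up values, for illustration only).
beta <- c(GeneA = -1.0, GeneB = -0.4, GeneC = -0.2, GeneD = 0.0, GeneE = 0.8)

# Hypothetical reference set standing in for the cell cycle genes.
cell_cycle_genes <- c("GeneA", "GeneB", "GeneC")

# Median beta score of the reference genes; its absolute value is below 1 here.
ref_median <- median(beta[cell_cycle_genes])

# Assumed scaling: divide by the absolute median so the reference median
# ends up with an absolute value of 1.
scaled <- beta / abs(ref_median)

# The range widens because the scaling factor is larger than 1.
print(range(beta))
print(range(scaled))

# The gene ranking is unaffected by a positive scaling factor.
same_ranking <- identical(rank(beta), rank(scaled))

# Hits beyond one standard deviation from the mean (one possible cutoff,
# chosen here as an assumption) are the same before and after scaling.
sd_hits <- function(x) names(x)[abs(x - mean(x)) > sd(x)]
same_hits <- identical(sd_hits(beta), sd_hits(scaled))
```

Because a rank or a standard deviation cutoff depends only on where a gene sits relative to the others, the wider range after normalization should not change which genes are called as hits in this toy case.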

Best