davidliwei / mageck

Experimental source code for MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout)

how to obtain normalized (median or total) data file from filtered raw counts? #3

Open BhaktiDwivedi opened 3 years ago

BhaktiDwivedi commented 3 years ago

Hi,

Thank you for developing mageck. I am using it to quantify expression from paired-end sgRNA fastq files:

mageck count -l Human_GeCKOv2_Library_combine.csv -n treatment_median --pdf-report --sample-label C,C,C,TR,TR,TR --fastq file1_R1.fastq file2_R1.fastq file3_R1.fastq --fastq-2 file1_R2.fastq file2_R2.fastq file3_R2.fastq

I tried all normalization methods (total, control) using --norm-method, and it looks like median normalization works best with my data. However, no essential genes fall below an FDR of 5% after I run the following:

mageck mle -k treatment_median.count.txt --norm-method median -d designmat -n treatment_median --cnv-norm CCLE_copynumber_byGene_2013-12-03_NCIH460_LUNG.txt --sgrna-efficiency Human_GeCKOv2_Library_combine_eff.txt --sgrna-eff-name-column 1 --sgrna-eff-score-column 3 --control-sgrna negativeControl_sgRNA_list.txt

Here, I input the 'raw counts' data file obtained from mageck count not the normalized counts.

I thought perhaps I should first filter the raw counts, removing lowly expressed genes (genes with < 8 raw reads in less than 50% of samples) and genes with very few sgRNAs (<4), and then normalize and run the differential analysis. I have the filtered raw count file. Is there a way to get just the normalized counts using the filtered raw count file as input? I can run mageck mle on the filtered raw count file with --norm-method median, but it does not output the normalized data file. How can I get this data?

Appreciate any help. Thank you.