cumc / xqtl-protocol

Molecular QTL analysis protocol developed by ADSP Functional Genomics Consortium
https://cumc.github.io/xqtl-protocol/
MIT License
38 stars 42 forks source link

leafcutter_norm output produce data that should not be there #391

Open hsun3163 opened 1 year ago

hsun3163 commented 1 year ago

after running leafcutter on the STAR_output of xqtl_protocol_data, the output is:

hs3163@node74:/mnt/vast/hpc/csg/xqtl_workflow_testing/finalizing$ cat output/leaf_cutter/xqtl_protocol_data_bam_list_intron_usage_perind.counts.gz_phenotype_file_list.txt
#chr    #dir
chr12   output/leaf_cutter/xqtl_protocol_data_bam_list_intron_usage_perind.counts.gz.qqnorm_chr12
chr16   output/leaf_cutter/xqtl_protocol_data_bam_list_intron_usage_perind.counts.gz.qqnorm_chr16
chr21   output/leaf_cutter/xqtl_protocol_data_bam_list_intron_usage_perind.counts.gz.qqnorm_chr21
chr7    output/leaf_cutter/xqtl_protocol_data_bam_list_intron_usage_perind.counts.gz.qqnorm_chr7
chr22   output/leaf_cutter/xqtl_protocol_data_bam_list_intron_usage_perind.counts.gz.qqnorm_chr22
chr14   output/leaf_cutter/xqtl_protocol_data_bam_list_intron_usage_perind.counts.gz.qqnorm_chr14
chr1    output/leaf_cutter/xqtl_protocol_data_bam_list_intron_usage_perind.counts.gz.qqnorm_chr1
chr2    output/leaf_cutter/xqtl_protocol_data_bam_list_intron_usage_perind.counts.gz.qqnorm_chr2

However, the bam file should only contains reads from chr21 and chr22. It is unknown whether this is a leafcutter issue or a alignment issue or a subsetting issue. Will need to see after post processing the rnaseqc output.

hsun3163 commented 1 year ago

Result for leafcutter are as followed:

hs3163@node74:/mnt/vast/hpc/csg/xqtl_workflow_testing/finalizing$ head output/leaf_cutter/xqtl_protocol_data_bam_list_intron_usage_perind.counts.gz_raw_data.qqnorm.txt | cut -f 1,2,3,4,5,6
#Chr    start   end     ID      Sample_1..md    Sample_2..md
chr7    44796793        44799247        chr7:44796793:44799247:clu_1602_+       -0.375137722329233      -0.676953120121036
chr7    44796793        44801287        chr7:44796793:44801287:clu_1602_+       -0.07519121744185064    -0.058230583849770104
chr7    44799874        44801287        chr7:44799874:44801287:clu_1602_+       0.47132103915795354     0.8293914740026799
chr7    44799874        44801354        chr7:44799874:44801354:clu_1602_+       0.04872372340298696     0.07684699337903445
chr1    92837630        92840551        chr1:92837630:92840551:clu_1_+  0.03798209293523225     0.0851291257010744
chr1    92837633        92840551        chr1:92837633:92840551:clu_1_+  0.526485443476123       0.6697043621526297
chr14   19062466        19064583        chr14:19062466:19064583:clu_3_+ 0.11832189039402831     0.023941763430948903
chr14   19062466        19065968        chr14:19062466:19065968:clu_3_+ -0.009493383220677543   -0.1233111713641115
chr14   19062466        19066001        chr14:19062466:19066001:clu_3_+ -0.6249147595342097     0.002476500085801292
hsun3163 commented 1 year ago

As it turn out, similar incident happend in the rnaseq analysis as well. As demonstrated below are the number of genes that are called and passed QC in each chromosome:

hs3163@node83:/mnt/vast/hpc/csg/xqtl_workflow_testing/finalizing$ for i in `ls output/rnaseq/xqtl_protocol_data.rnaseqc.low_expression_filtered.outlier_removed.tmm.expression.bed.chr*.mol_phe.bed.gz` ; do zcat $i | wc ; echo $i ; done
      7     371    6471
output/rnaseq/xqtl_protocol_data.rnaseqc.low_expression_filtered.outlier_removed.tmm.expression.bed.chr1.mol_phe.bed.gz
      5     265    4522
output/rnaseq/xqtl_protocol_data.rnaseqc.low_expression_filtered.outlier_removed.tmm.expression.bed.chr10.mol_phe.bed.gz
      3     159    2494
output/rnaseq/xqtl_protocol_data.rnaseqc.low_expression_filtered.outlier_removed.tmm.expression.bed.chr11.mol_phe.bed.gz
      5     265    4526
output/rnaseq/xqtl_protocol_data.rnaseqc.low_expression_filtered.outlier_removed.tmm.expression.bed.chr12.mol_phe.bed.gz
      5     265    4505
output/rnaseq/xqtl_protocol_data.rnaseqc.low_expression_filtered.outlier_removed.tmm.expression.bed.chr13.mol_phe.bed.gz
      6     318    5413
output/rnaseq/xqtl_protocol_data.rnaseqc.low_expression_filtered.outlier_removed.tmm.expression.bed.chr14.mol_phe.bed.gz
      6     318    5601
output/rnaseq/xqtl_protocol_data.rnaseqc.low_expression_filtered.outlier_removed.tmm.expression.bed.chr15.mol_phe.bed.gz
      2     106    1504
output/rnaseq/xqtl_protocol_data.rnaseqc.low_expression_filtered.outlier_removed.tmm.expression.bed.chr16.mol_phe.bed.gz
      3     159    2482
output/rnaseq/xqtl_protocol_data.rnaseqc.low_expression_filtered.outlier_removed.tmm.expression.bed.chr17.mol_phe.bed.gz
      5     265    4533
output/rnaseq/xqtl_protocol_data.rnaseqc.low_expression_filtered.outlier_removed.tmm.expression.bed.chr18.mol_phe.bed.gz
      7     371    6480
output/rnaseq/xqtl_protocol_data.rnaseqc.low_expression_filtered.outlier_removed.tmm.expression.bed.chr19.mol_phe.bed.gz
     12     636   11492
output/rnaseq/xqtl_protocol_data.rnaseqc.low_expression_filtered.outlier_removed.tmm.expression.bed.chr2.mol_phe.bed.gz
      3     159    2484
output/rnaseq/xqtl_protocol_data.rnaseqc.low_expression_filtered.outlier_removed.tmm.expression.bed.chr20.mol_phe.bed.gz
    415   21995  409519
output/rnaseq/xqtl_protocol_data.rnaseqc.low_expression_filtered.outlier_removed.tmm.expression.bed.chr21.mol_phe.bed.gz
    891   47223  878025
output/rnaseq/xqtl_protocol_data.rnaseqc.low_expression_filtered.outlier_removed.tmm.expression.bed.chr22.mol_phe.bed.gz
      4     212    3530
output/rnaseq/xqtl_protocol_data.rnaseqc.low_expression_filtered.outlier_removed.tmm.expression.bed.chr3.mol_phe.bed.gz
      2     106    1490
output/rnaseq/xqtl_protocol_data.rnaseqc.low_expression_filtered.outlier_removed.tmm.expression.bed.chr5.mol_phe.bed.gz
      6     318    5466
output/rnaseq/xqtl_protocol_data.rnaseqc.low_expression_filtered.outlier_removed.tmm.expression.bed.chr6.mol_phe.bed.gz
      5     265    4556
output/rnaseq/xqtl_protocol_data.rnaseqc.low_expression_filtered.outlier_removed.tmm.expression.bed.chr7.mol_phe.bed.gz
      3     159    2530
output/rnaseq/xqtl_protocol_data.rnaseqc.low_expression_filtered.outlier_removed.tmm.expression.bed.chr8.mol_phe.bed.gz
      4     212    3522
output/rnaseq/xqtl_protocol_data.rnaseqc.low_expression_filtered.outlier_removed.tmm.expression.bed.chr9.mol_phe.bed.gz
      5     265    4455
output/rnaseq/xqtl_protocol_data.rnaseqc.low_expression_filtered.outlier_removed.tmm.expression.bed.chrX.mol_phe.bed.gz