KChen-lab / Monopogen

SNV calling from single cell sequencing
GNU General Public License v3.0

GermCalling Target Markers #48

Closed ArunaNannapaneni closed 6 months ago

ArunaNannapaneni commented 8 months ago

For germline calling, the example output had 10,000 markers, but when I ran it on the example data I got only 635. Is there a reason for the difference?

Also, when I ran LD refinement on the putative somatic SNVs, my plot had far fewer data points.
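
For anyone comparing marker counts as in the question above, a minimal sketch for counting variant records in a VCF follows. It assumes only the standard VCF layout (header lines start with `#`) and relies on Python's `gzip` module being able to read bgzip-compressed files; the function name `count_vcf_records` is hypothetical and not part of Monopogen.

```python
import gzip


def count_vcf_records(path):
    """Count variant (non-header) lines in a plain or gzip/bgzip-compressed VCF."""
    # Assumption: compressed files end in ".gz"; anything else is read as plain text.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # Header and meta lines start with "#"; blank lines are ignored.
        return sum(1 for line in handle if line.strip() and not line.startswith("#"))
```

Calling it on, for example, `chr20.germline.vcf` and on the reference example output would give two numbers that can be compared directly.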

jinzhuangdou commented 8 months ago

Which example data are you looking at for germline calling? The retina 19D013 or chr20:0-2M test data?

ArunaNannapaneni commented 8 months ago

I am using chr20.maester_scRNA.bam.

jinzhuangdou commented 8 months ago

Do you mean you got only 635 SNVs from chr20.maester_scRNA.bam in the germline module? Could you show the files in the germline output folder using ls -lrt? I would like to check whether each step ran correctly.

ArunaNannapaneni commented 8 months ago
ls -alrt *
germline:
total 145
drwxrwsr-x 6 anannapaneni gquon     6 Feb  6 15:28 ..
-rw-rw-r-- 1 anannapaneni gquon 13396 Feb  7 10:29 chr20.gl.vcf.gz.tbi
drwxrwsr-x 2 anannapaneni gquon    10 Feb  7 10:39 .
-rw-rw-r-- 1 anannapaneni gquon   585 Feb  7 10:39 chr20.phased.vcf.gz.tbi
-rw-rw-r-- 1 anannapaneni gquon  3258 Mar  6 07:02 chr20.gp.log
-rw-rw-r-- 1 anannapaneni gquon  5790 Mar  6 07:02 chr20.gp.vcf.gz
-rw-rw-r-- 1 anannapaneni gquon 49224 Mar  6 07:02 chr20.germline.vcf
-rw-rw-r-- 1 anannapaneni gquon  4008 Mar  6 07:02 chr20.phased.vcf.gz
-rw-rw-r-- 1 anannapaneni gquon  3264 Mar  6 07:02 chr20.phased.log
-rw-rw-r-- 1 anannapaneni gquon     0 Mar  6 07:04 chr20.gl.vcf.gz

Script:
total 25
drwxrwsr-x 6 anannapaneni gquon    6 Feb  6 15:28 ..
drwxrwsr-x 2 anannapaneni gquon    4 Feb  7 10:43 .
-rw-rw-r-- 1 anannapaneni gquon  456 Feb 20 22:50 bamExtract_chr20.sh
-rw-rw-r-- 1 anannapaneni gquon 1878 Mar  6 07:04 runGermline_chr20.sh

Bam:
total 402075
-rw-rw-r-- 1 anannapaneni gquon      2888 Feb  6 14:56 bm_chr1.filter.bam
-rw-rw-r-- 1 anannapaneni gquon      1568 Feb  6 14:56 bm_chr1.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon      2888 Feb  6 14:56 bm_chr2.filter.bam
-rw-rw-r-- 1 anannapaneni gquon      1568 Feb  6 14:56 bm_chr2.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon      2888 Feb  6 14:56 bm_chr3.filter.bam
-rw-rw-r-- 1 anannapaneni gquon      1568 Feb  6 14:56 bm_chr3.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon      2888 Feb  6 14:56 bm_chr4.filter.bam
-rw-rw-r-- 1 anannapaneni gquon      1568 Feb  6 14:56 bm_chr4.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon      2888 Feb  6 14:56 bm_chr5.filter.bam
-rw-rw-r-- 1 anannapaneni gquon      1568 Feb  6 14:56 bm_chr5.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon      2888 Feb  6 14:56 bm_chr6.filter.bam
-rw-rw-r-- 1 anannapaneni gquon      1568 Feb  6 14:56 bm_chr6.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon      2888 Feb  6 14:56 bm_chr7.filter.bam
-rw-rw-r-- 1 anannapaneni gquon      1568 Feb  6 14:56 bm_chr7.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon      2888 Feb  6 14:56 bm_chr8.filter.bam
-rw-rw-r-- 1 anannapaneni gquon      1568 Feb  6 14:56 bm_chr8.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon      2888 Feb  6 14:56 bm_chr9.filter.bam
-rw-rw-r-- 1 anannapaneni gquon      1568 Feb  6 14:56 bm_chr9.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon      2888 Feb  6 14:56 bm_chr10.filter.bam
-rw-rw-r-- 1 anannapaneni gquon      1568 Feb  6 14:56 bm_chr10.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon      2888 Feb  6 14:56 bm_chr11.filter.bam
-rw-rw-r-- 1 anannapaneni gquon      1568 Feb  6 14:56 bm_chr11.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon      2888 Feb  6 14:56 bm_chr12.filter.bam
-rw-rw-r-- 1 anannapaneni gquon      1568 Feb  6 14:56 bm_chr12.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon      2888 Feb  6 14:56 bm_chr13.filter.bam
-rw-rw-r-- 1 anannapaneni gquon      1568 Feb  6 14:56 bm_chr13.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon      2888 Feb  6 14:56 bm_chr14.filter.bam
-rw-rw-r-- 1 anannapaneni gquon      1568 Feb  6 14:56 bm_chr14.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon      2888 Feb  6 14:56 bm_chr15.filter.bam
-rw-rw-r-- 1 anannapaneni gquon      1568 Feb  6 14:56 bm_chr15.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon      2888 Feb  6 14:56 bm_chr16.filter.bam
-rw-rw-r-- 1 anannapaneni gquon      1568 Feb  6 14:56 bm_chr16.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon      2888 Feb  6 14:56 bm_chr17.filter.bam
-rw-rw-r-- 1 anannapaneni gquon      1568 Feb  6 14:56 bm_chr17.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon      2888 Feb  6 14:56 bm_chr18.filter.bam
-rw-rw-r-- 1 anannapaneni gquon      1568 Feb  6 14:56 bm_chr18.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon      2888 Feb  6 14:56 bm_chr19.filter.bam
-rw-rw-r-- 1 anannapaneni gquon      1568 Feb  6 14:56 bm_chr19.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon 250233377 Feb  6 15:01 bm_chr20.filter.bam
-rw-rw-r-- 1 anannapaneni gquon     66192 Feb  6 15:01 bm_chr20.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon      2888 Feb  6 15:01 bm_chr21.filter.bam
-rw-rw-r-- 1 anannapaneni gquon      1568 Feb  6 15:01 bm_chr21.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon      2888 Feb  6 15:01 bm_chr22.filter.bam
-rw-rw-r-- 1 anannapaneni gquon      1568 Feb  6 15:01 bm_chr22.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon        78 Feb  6 15:01 chr1.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon        78 Feb  6 15:01 chr3.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon        78 Feb  6 15:01 chr2.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon        78 Feb  6 15:01 chr5.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon        78 Feb  6 15:01 chr4.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon        78 Feb  6 15:01 chr7.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon        78 Feb  6 15:01 chr6.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon        78 Feb  6 15:01 chr9.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon        78 Feb  6 15:01 chr8.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon        79 Feb  6 15:01 chr11.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon        79 Feb  6 15:01 chr10.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon        79 Feb  6 15:01 chr12.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon        79 Feb  6 15:01 chr13.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon        79 Feb  6 15:01 chr15.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon        79 Feb  6 15:01 chr14.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon        79 Feb  6 15:01 chr17.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon        79 Feb  6 15:01 chr16.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon        79 Feb  6 15:01 chr19.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon        79 Feb  6 15:01 chr18.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon        79 Feb  6 15:01 chr22.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon        79 Feb  6 15:01 chr21.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon        79 Feb  6 15:01 chr20.filter.bam.lst
drwxrwsr-x 6 anannapaneni gquon         6 Feb  6 15:28 ..
drwxrwsr-x 3 anannapaneni gquon        73 Feb  7 10:44 .
drwxrwsr-x 2 anannapaneni gquon     16173 Feb  7 21:44 split_bam
-rw-rw-r-- 1 anannapaneni gquon  77551721 Feb 20 22:51 chr20.filter.targeted.bam
-rw-rw-r-- 1 anannapaneni gquon     36888 Feb 20 22:51 chr20.filter.targeted.bam.bai
-rw-rw-r-- 1 anannapaneni gquon  77551851 Feb 20 22:51 merge.filter.targeted.bam
-rw-rw-r-- 1 anannapaneni gquon     36888 Feb 20 22:51 merge.filter.targeted.bam.bai

somatic:
total 125610
drwxrwsr-x 6 anannapaneni gquon        6 Feb  6 15:28 ..
-rw-rw-r-- 1 anannapaneni gquon   259324 Feb 20 22:50 chr20.bed
-rw-rw-r-- 1 anannapaneni gquon    45709 Feb 21 01:49 chr20:40000002-50000001.cell.gl.vcf.gz
-rw-rw-r-- 1 anannapaneni gquon    45709 Feb 21 01:49 chr20:50000002-60000001.cell.gl.vcf.gz
-rw-rw-r-- 1 anannapaneni gquon    45708 Feb 21 01:49 chr20:60000002-64444167.cell.gl.vcf.gz
-rw-rw-r-- 1 anannapaneni gquon 10155449 Feb 21 02:03 chr20:20000002-30000001.cell.gl.vcf.gz
-rw-rw-r-- 1 anannapaneni gquon  5714257 Feb 21 02:06 chr20:10000002-20000001.cell.gl.vcf.gz
-rw-rw-r-- 1 anannapaneni gquon 17187516 Feb 21 02:42 chr20:2-10000001.cell.gl.vcf.gz
-rw-rw-r-- 1 anannapaneni gquon 27793493 Feb 21 02:55 chr20:30000002-40000001.cell.gl.vcf.gz
-rw-rw-r-- 1 anannapaneni gquon 60715409 Feb 21 02:57 chr20.cell.gl.vcf.gz
-rw-rw-r-- 1 anannapaneni gquon     9312 Feb 21 02:57 chr20.cell.gl.vcf.gz.tbi
-rw-rw-r-- 1 anannapaneni gquon   137445 Feb 21 02:57 chr20.cell.txt
-rw-rw-r-- 1 anannapaneni gquon  3040717 Feb 21 03:04 chr20.gl.filter.hc.cell.mat.gz
-rw-rw-r-- 1 anannapaneni gquon    47798 Feb 21 11:44 svm_feature.chr20.pdf
-rw-rw-r-- 1 anannapaneni gquon     5422 Feb 21 11:45 LDrefinement_germline.chr20.pdf
-rw-rw-r-- 1 anannapaneni gquon     1628 Feb 21 11:45 chr20.putativeSNVs.csv
-rw-rw-r-- 1 anannapaneni gquon      296 Feb 21 11:45 chr20.germlineTwoLoci_model.csv
-rw-rw-r-- 1 anannapaneni gquon      230 Feb 21 11:45 chr20.germlineTrioLoci_model.csv
-rw-rw-r-- 1 anannapaneni gquon    66620 Feb 21 11:45 chr20.SNV_mat.RDS
drwxrwsr-x 2 anannapaneni gquon       24 Feb 23 11:28 .
-rw-rw-r-- 1 anannapaneni gquon  8447265 Mar  5 15:44 chr20.gl.vcf.DP4
-rw-rw-r-- 1 anannapaneni gquon   913809 Mar  5 15:44 chr20.gl.vcf.filter.DP4
-rw-rw-r-- 1 anannapaneni gquon   259324 Mar  5 15:44 chr20.gl.vcf.filter.hc.bed
-rw-rw-r-- 1 anannapaneni gquon   163082 Mar  5 15:44 chr20.gl.vcf.filter.hc.pos

jinzhuangdou commented 7 months ago

The BAM filtering step looks good. The file sizes in the germline folder suggest a problem, though. Below is what I have. Is there a log file available from your Monopogen run? Thanks.

(base) [jdou1@ldragon1 germline]$ ls -lrt
total 6657
-rw-r--r-- 1 jdou1 bcb 5725884 Feb 19 17:07 chr20.gl.vcf.gz
-rw-r--r-- 1 jdou1 bcb    3194 Feb 19 17:16 chr20.gp.log
-rw-r--r-- 1 jdou1 bcb   83434 Feb 19 17:16 chr20.gp.vcf.gz
-rw-r--r-- 1 jdou1 bcb  699569 Feb 19 17:16 chr20.germline.vcf
-rw-r--r-- 1 jdou1 bcb    3186 Feb 19 17:21 chr20.phased.log
-rw-r--r-- 1 jdou1 bcb   50649 Feb 19 17:21 chr20.phased.vcf.gz
-rw-r--r-- 1 jdou1 bcb    9392 Mar  4 09:55 chr20.phased.vcf.gz.tbi
-rw-r--r-- 1 jdou1 bcb   13647 Mar  4 09:55 chr20.gl.vcf.gz.tbi

ArunaNannapaneni commented 7 months ago

Here is the output I got from running monopogen.

germCalling_out.txt

jinzhuangdou commented 7 months ago

The chr20.gl.vcf.gz in the germline folder is now empty. Could you re-run it so that I can see the file size?

ls -alrt *
germline:
total 145
drwxrwsr-x 6 anannapaneni gquon     6 Feb  6 15:28 ..
-rw-rw-r-- 1 anannapaneni gquon 13396 Feb  7 10:29 chr20.gl.vcf.gz.tbi
drwxrwsr-x 2 anannapaneni gquon    10 Feb  7 10:39 .
-rw-rw-r-- 1 anannapaneni gquon   585 Feb  7 10:39 chr20.phased.vcf.gz.tbi
-rw-rw-r-- 1 anannapaneni gquon  3258 Mar  6 07:02 chr20.gp.log
-rw-rw-r-- 1 anannapaneni gquon  5790 Mar  6 07:02 chr20.gp.vcf.gz
-rw-rw-r-- 1 anannapaneni gquon 49224 Mar  6 07:02 chr20.germline.vcf
-rw-rw-r-- 1 anannapaneni gquon  4008 Mar  6 07:02 chr20.phased.vcf.gz
-rw-rw-r-- 1 anannapaneni gquon  3264 Mar  6 07:02 chr20.phased.log
-rw-rw-r-- 1 anannapaneni gquon     0 Mar  6 07:04 chr20.gl.vcf.gz
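
A zero-byte output such as the chr20.gl.vcf.gz entry in this listing can also be spotted programmatically. The sketch below is only an illustration: the helper name `find_empty_files` is hypothetical, and it checks nothing beyond file size.

```python
from pathlib import Path


def find_empty_files(directory):
    """Return the sorted names of regular files in `directory` that are 0 bytes."""
    return sorted(
        entry.name
        for entry in Path(directory).iterdir()
        # Skip subdirectories; only regular files with no content are reported.
        if entry.is_file() and entry.stat().st_size == 0
    )
```

Running `find_empty_files("germline")` after each Monopogen step would show whether an intermediate file was truncated or overwritten by a later run.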

ArunaNannapaneni commented 7 months ago

[Screenshot attached: Screen Shot 2024-04-03 at 10 39 01 AM]

jinzhuangdou commented 7 months ago

You need to download and use the imputation panel for the whole chromosome. The one you are using (/home/anannapaneni/anannapaneni/Monopogen/example/CCDG_14151_B01_GRM_WGS_2020-08-05_chr20.filtered.shapeit2-duohmm-phased.vcf.gz) only includes variants from chr20:1-2Mb. It is a small test dataset for the example.
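
One way to confirm which region a reference panel covers is to scan its CHROM and POS columns. The following is a rough sketch under the assumption that the panel is a standard tab-delimited VCF, plain or gzip/bgzip-compressed; `vcf_position_span` is a hypothetical helper, not a Monopogen function, and it reads the whole file, so it can be slow on a full-chromosome panel.

```python
import gzip


def vcf_position_span(path):
    """Return {chrom: (min_pos, max_pos, n_records)} for a plain or gzipped VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    spans = {}
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip header/meta lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            # The first two tab-separated VCF columns are CHROM and POS.
            chrom, pos = line.split("\t", 2)[:2]
            pos = int(pos)
            lo, hi, n = spans.get(chrom, (pos, pos, 0))
            spans[chrom] = (min(lo, pos), max(hi, pos), n + 1)
    return spans
```

If the largest position reported for chr20 is around 2,000,000, the panel is the small example subset rather than the whole-chromosome panel.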