Closed ArunaNannapaneni closed 6 months ago
Which example data are you looking at for germline calling? The retina 19D013 or chr20:0-2M test data?
I am using chr20.maester_scRNA.bam.
Do you mean you got only 635 SNVs from chr20.maester_scRNA.bam in the germline module? Could you show the files in the germline output folder using ls -lrt? I would like to see whether each step was performed correctly.
ls -alrt *
germline:
total 145
drwxrwsr-x 6 anannapaneni gquon 6 Feb 6 15:28 ..
-rw-rw-r-- 1 anannapaneni gquon 13396 Feb 7 10:29 chr20.gl.vcf.gz.tbi
drwxrwsr-x 2 anannapaneni gquon 10 Feb 7 10:39 .
-rw-rw-r-- 1 anannapaneni gquon 585 Feb 7 10:39 chr20.phased.vcf.gz.tbi
-rw-rw-r-- 1 anannapaneni gquon 3258 Mar 6 07:02 chr20.gp.log
-rw-rw-r-- 1 anannapaneni gquon 5790 Mar 6 07:02 chr20.gp.vcf.gz
-rw-rw-r-- 1 anannapaneni gquon 49224 Mar 6 07:02 chr20.germline.vcf
-rw-rw-r-- 1 anannapaneni gquon 4008 Mar 6 07:02 chr20.phased.vcf.gz
-rw-rw-r-- 1 anannapaneni gquon 3264 Mar 6 07:02 chr20.phased.log
-rw-rw-r-- 1 anannapaneni gquon 0 Mar 6 07:04 chr20.gl.vcf.gz
Script:
total 25
drwxrwsr-x 6 anannapaneni gquon 6 Feb 6 15:28 ..
drwxrwsr-x 2 anannapaneni gquon 4 Feb 7 10:43 .
-rw-rw-r-- 1 anannapaneni gquon 456 Feb 20 22:50 bamExtract_chr20.sh
-rw-rw-r-- 1 anannapaneni gquon 1878 Mar 6 07:04 runGermline_chr20.sh
Bam:
total 402075
-rw-rw-r-- 1 anannapaneni gquon 2888 Feb 6 14:56 bm_chr1.filter.bam
-rw-rw-r-- 1 anannapaneni gquon 1568 Feb 6 14:56 bm_chr1.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon 2888 Feb 6 14:56 bm_chr2.filter.bam
-rw-rw-r-- 1 anannapaneni gquon 1568 Feb 6 14:56 bm_chr2.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon 2888 Feb 6 14:56 bm_chr3.filter.bam
-rw-rw-r-- 1 anannapaneni gquon 1568 Feb 6 14:56 bm_chr3.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon 2888 Feb 6 14:56 bm_chr4.filter.bam
-rw-rw-r-- 1 anannapaneni gquon 1568 Feb 6 14:56 bm_chr4.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon 2888 Feb 6 14:56 bm_chr5.filter.bam
-rw-rw-r-- 1 anannapaneni gquon 1568 Feb 6 14:56 bm_chr5.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon 2888 Feb 6 14:56 bm_chr6.filter.bam
-rw-rw-r-- 1 anannapaneni gquon 1568 Feb 6 14:56 bm_chr6.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon 2888 Feb 6 14:56 bm_chr7.filter.bam
-rw-rw-r-- 1 anannapaneni gquon 1568 Feb 6 14:56 bm_chr7.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon 2888 Feb 6 14:56 bm_chr8.filter.bam
-rw-rw-r-- 1 anannapaneni gquon 1568 Feb 6 14:56 bm_chr8.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon 2888 Feb 6 14:56 bm_chr9.filter.bam
-rw-rw-r-- 1 anannapaneni gquon 1568 Feb 6 14:56 bm_chr9.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon 2888 Feb 6 14:56 bm_chr10.filter.bam
-rw-rw-r-- 1 anannapaneni gquon 1568 Feb 6 14:56 bm_chr10.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon 2888 Feb 6 14:56 bm_chr11.filter.bam
-rw-rw-r-- 1 anannapaneni gquon 1568 Feb 6 14:56 bm_chr11.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon 2888 Feb 6 14:56 bm_chr12.filter.bam
-rw-rw-r-- 1 anannapaneni gquon 1568 Feb 6 14:56 bm_chr12.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon 2888 Feb 6 14:56 bm_chr13.filter.bam
-rw-rw-r-- 1 anannapaneni gquon 1568 Feb 6 14:56 bm_chr13.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon 2888 Feb 6 14:56 bm_chr14.filter.bam
-rw-rw-r-- 1 anannapaneni gquon 1568 Feb 6 14:56 bm_chr14.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon 2888 Feb 6 14:56 bm_chr15.filter.bam
-rw-rw-r-- 1 anannapaneni gquon 1568 Feb 6 14:56 bm_chr15.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon 2888 Feb 6 14:56 bm_chr16.filter.bam
-rw-rw-r-- 1 anannapaneni gquon 1568 Feb 6 14:56 bm_chr16.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon 2888 Feb 6 14:56 bm_chr17.filter.bam
-rw-rw-r-- 1 anannapaneni gquon 1568 Feb 6 14:56 bm_chr17.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon 2888 Feb 6 14:56 bm_chr18.filter.bam
-rw-rw-r-- 1 anannapaneni gquon 1568 Feb 6 14:56 bm_chr18.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon 2888 Feb 6 14:56 bm_chr19.filter.bam
-rw-rw-r-- 1 anannapaneni gquon 1568 Feb 6 14:56 bm_chr19.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon 250233377 Feb 6 15:01 bm_chr20.filter.bam
-rw-rw-r-- 1 anannapaneni gquon 66192 Feb 6 15:01 bm_chr20.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon 2888 Feb 6 15:01 bm_chr21.filter.bam
-rw-rw-r-- 1 anannapaneni gquon 1568 Feb 6 15:01 bm_chr21.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon 2888 Feb 6 15:01 bm_chr22.filter.bam
-rw-rw-r-- 1 anannapaneni gquon 1568 Feb 6 15:01 bm_chr22.filter.bam.bai
-rw-rw-r-- 1 anannapaneni gquon 78 Feb 6 15:01 chr1.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon 78 Feb 6 15:01 chr3.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon 78 Feb 6 15:01 chr2.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon 78 Feb 6 15:01 chr5.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon 78 Feb 6 15:01 chr4.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon 78 Feb 6 15:01 chr7.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon 78 Feb 6 15:01 chr6.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon 78 Feb 6 15:01 chr9.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon 78 Feb 6 15:01 chr8.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon 79 Feb 6 15:01 chr11.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon 79 Feb 6 15:01 chr10.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon 79 Feb 6 15:01 chr12.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon 79 Feb 6 15:01 chr13.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon 79 Feb 6 15:01 chr15.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon 79 Feb 6 15:01 chr14.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon 79 Feb 6 15:01 chr17.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon 79 Feb 6 15:01 chr16.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon 79 Feb 6 15:01 chr19.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon 79 Feb 6 15:01 chr18.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon 79 Feb 6 15:01 chr22.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon 79 Feb 6 15:01 chr21.filter.bam.lst
-rw-rw-r-- 1 anannapaneni gquon 79 Feb 6 15:01 chr20.filter.bam.lst
drwxrwsr-x 6 anannapaneni gquon 6 Feb 6 15:28 ..
drwxrwsr-x 3 anannapaneni gquon 73 Feb 7 10:44 .
drwxrwsr-x 2 anannapaneni gquon 16173 Feb 7 21:44 split_bam
-rw-rw-r-- 1 anannapaneni gquon 77551721 Feb 20 22:51 chr20.filter.targeted.bam
-rw-rw-r-- 1 anannapaneni gquon 36888 Feb 20 22:51 chr20.filter.targeted.bam.bai
-rw-rw-r-- 1 anannapaneni gquon 77551851 Feb 20 22:51 merge.filter.targeted.bam
-rw-rw-r-- 1 anannapaneni gquon 36888 Feb 20 22:51 merge.filter.targeted.bam.bai
somatic:
total 125610
drwxrwsr-x 6 anannapaneni gquon 6 Feb 6 15:28 ..
-rw-rw-r-- 1 anannapaneni gquon 259324 Feb 20 22:50 chr20.bed
-rw-rw-r-- 1 anannapaneni gquon 45709 Feb 21 01:49 chr20:40000002-50000001.cell.gl.vcf.gz
-rw-rw-r-- 1 anannapaneni gquon 45709 Feb 21 01:49 chr20:50000002-60000001.cell.gl.vcf.gz
-rw-rw-r-- 1 anannapaneni gquon 45708 Feb 21 01:49 chr20:60000002-64444167.cell.gl.vcf.gz
-rw-rw-r-- 1 anannapaneni gquon 10155449 Feb 21 02:03 chr20:20000002-30000001.cell.gl.vcf.gz
-rw-rw-r-- 1 anannapaneni gquon 5714257 Feb 21 02:06 chr20:10000002-20000001.cell.gl.vcf.gz
-rw-rw-r-- 1 anannapaneni gquon 17187516 Feb 21 02:42 chr20:2-10000001.cell.gl.vcf.gz
-rw-rw-r-- 1 anannapaneni gquon 27793493 Feb 21 02:55 chr20:30000002-40000001.cell.gl.vcf.gz
-rw-rw-r-- 1 anannapaneni gquon 60715409 Feb 21 02:57 chr20.cell.gl.vcf.gz
-rw-rw-r-- 1 anannapaneni gquon 9312 Feb 21 02:57 chr20.cell.gl.vcf.gz.tbi
-rw-rw-r-- 1 anannapaneni gquon 137445 Feb 21 02:57 chr20.cell.txt
-rw-rw-r-- 1 anannapaneni gquon 3040717 Feb 21 03:04 chr20.gl.filter.hc.cell.mat.gz
-rw-rw-r-- 1 anannapaneni gquon 47798 Feb 21 11:44 svm_feature.chr20.pdf
-rw-rw-r-- 1 anannapaneni gquon 5422 Feb 21 11:45 LDrefinement_germline.chr20.pdf
-rw-rw-r-- 1 anannapaneni gquon 1628 Feb 21 11:45 chr20.putativeSNVs.csv
-rw-rw-r-- 1 anannapaneni gquon 296 Feb 21 11:45 chr20.germlineTwoLoci_model.csv
-rw-rw-r-- 1 anannapaneni gquon 230 Feb 21 11:45 chr20.germlineTrioLoci_model.csv
-rw-rw-r-- 1 anannapaneni gquon 66620 Feb 21 11:45 chr20.SNV_mat.RDS
drwxrwsr-x 2 anannapaneni gquon 24 Feb 23 11:28 .
-rw-rw-r-- 1 anannapaneni gquon 8447265 Mar 5 15:44 chr20.gl.vcf.DP4
-rw-rw-r-- 1 anannapaneni gquon 913809 Mar 5 15:44 chr20.gl.vcf.filter.DP4
-rw-rw-r-- 1 anannapaneni gquon 259324 Mar 5 15:44 chr20.gl.vcf.filter.hc.bed
-rw-rw-r-- 1 anannapaneni gquon 163082 Mar 5 15:44 chr20.gl.vcf.filter.hc.pos
The BAM filtering step looks good, but the file sizes in the germline folder look wrong. Below is what I have for comparison. Are there log files available from your Monopogen run? Thanks.
(base) [jdou1@ldragon1 germline]$ ls -lrt
total 6657
-rw-r--r-- 1 jdou1 bcb 5725884 Feb 19 17:07 chr20.gl.vcf.gz
-rw-r--r-- 1 jdou1 bcb 3194 Feb 19 17:16 chr20.gp.log
-rw-r--r-- 1 jdou1 bcb 83434 Feb 19 17:16 chr20.gp.vcf.gz
-rw-r--r-- 1 jdou1 bcb 699569 Feb 19 17:16 chr20.germline.vcf
-rw-r--r-- 1 jdou1 bcb 3186 Feb 19 17:21 chr20.phased.log
-rw-r--r-- 1 jdou1 bcb 50649 Feb 19 17:21 chr20.phased.vcf.gz
-rw-r--r-- 1 jdou1 bcb 9392 Mar 4 09:55 chr20.phased.vcf.gz.tbi
-rw-r--r-- 1 jdou1 bcb 13647 Mar 4 09:55 chr20.gl.vcf.gz.tbi
Here is the output I got from running Monopogen.
The chr20.gl.vcf.gz in the germline folder is empty now. Could you re-run it so that I can see the file size?
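A zero-byte output like this is easy to miss in a long listing. As a minimal sketch, the helper below reports every empty file in an output folder. The name `find_empty_files` is hypothetical and is not part of Monopogen; the only thing taken from this thread is the idea of checking the germline folder.

```python
import os


def find_empty_files(folder):
    """Return the sorted names of regular files in `folder` that are 0 bytes."""
    empty = []
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        # Skip sub-directories; only regular files can be "empty outputs".
        if os.path.isfile(path) and os.path.getsize(path) == 0:
            empty.append(name)
    return sorted(empty)
```

Calling `find_empty_files("germline")` from the run directory returns a list of file names; an empty list means no zero-byte files were found there.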
ls -alrt *
germline:
total 145
drwxrwsr-x 6 anannapaneni gquon 6 Feb 6 15:28 ..
-rw-rw-r-- 1 anannapaneni gquon 13396 Feb 7 10:29 chr20.gl.vcf.gz.tbi
drwxrwsr-x 2 anannapaneni gquon 10 Feb 7 10:39 .
-rw-rw-r-- 1 anannapaneni gquon 585 Feb 7 10:39 chr20.phased.vcf.gz.tbi
-rw-rw-r-- 1 anannapaneni gquon 3258 Mar 6 07:02 chr20.gp.log
-rw-rw-r-- 1 anannapaneni gquon 5790 Mar 6 07:02 chr20.gp.vcf.gz
-rw-rw-r-- 1 anannapaneni gquon 49224 Mar 6 07:02 chr20.germline.vcf
-rw-rw-r-- 1 anannapaneni gquon 4008 Mar 6 07:02 chr20.phased.vcf.gz
-rw-rw-r-- 1 anannapaneni gquon 3264 Mar 6 07:02 chr20.phased.log
-rw-rw-r-- 1 anannapaneni gquon 0 Mar 6 07:04 chr20.gl.vcf.gz
You need to download and use the imputation panel for the whole chromosome. The one you are using (/home/anannapaneni/anannapaneni/Monopogen/example/CCDG_14151_B01_GRM_WGS_2020-08-05_chr20.filtered.shapeit2-duohmm-phased.vcf.gz) only includes variants from chr20:1-2Mb. It is a simple example test dataset.
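One way to confirm how much of a chromosome a reference panel covers is to scan its positions. The sketch below is an illustration only: `vcf_span` is a hypothetical helper, not a Monopogen function, and it assumes a standard tab-delimited VCF, either plain or gzip/bgzip-compressed. Reading a full panel this way is slow but needs only the standard library.

```python
import gzip


def vcf_span(path):
    """Return {chrom: (min_pos, max_pos, n_records)} for a .vcf or .vcf.gz file."""
    opener = gzip.open if path.endswith(".gz") else open
    spans = {}
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip header lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            chrom, pos = line.split("\t", 2)[:2]
            pos = int(pos)
            lo, hi, n = spans.get(chrom, (pos, pos, 0))
            spans[chrom] = (min(lo, pos), max(hi, pos), n + 1)
    return spans
```

If the maximum position reported for chr20 is near 2,000,000, the panel is the small test subset rather than a whole-chromosome panel.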
For germline calling, the example output had 10,000 markers, but when I ran it with the example data I got only 635. Is there a reason for the difference?
Also, when I ran LD refinement on the putative somatic SNVs, my graph had far fewer data points.
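To compare marker counts between two runs, it helps to count SNV records the same way in both output files. The following is a rough sketch under stated assumptions: `count_snvs` is a hypothetical helper, not part of Monopogen, and it treats a record as an SNV only when REF and every ALT allele are single A/C/G/T bases.

```python
import gzip

BASES = {"A", "C", "G", "T"}


def count_snvs(path):
    """Count VCF records whose REF and all ALT alleles are single bases."""
    opener = gzip.open if path.endswith(".gz") else open
    n = 0
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            ref, alts = fields[3].upper(), fields[4].upper().split(",")
            if ref in BASES and all(alt in BASES for alt in alts):
                n += 1
    return n
```

Running it on chr20.germline.vcf from each run gives numbers that can be compared directly, without depending on how a viewer or plot summarizes the file.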