KChen-lab / Monopogen

SNV calling from single cell sequencing
GNU General Public License v3.0

The error in Germline calling step #60

Closed zhangdong360 closed 4 months ago

zhangdong360 commented 4 months ago

Hi, I have the following problem in the germline calling step. Oddly, the problem occurs only on chr1; I have tested chr20 and chr2 and they run without problems.

prepare step02 germline calling
[2024-05-12 18:08:17,374] INFO Monopogen.py Performing germline variant calling...
[2024-05-12 18:08:17,374] INFO germline.py Parameters in effect:
[2024-05-12 18:08:17,374] INFO germline.py --subcommand = [germline]
[2024-05-12 18:08:17,374] INFO germline.py --region = [region.lst]
[2024-05-12 18:08:17,374] INFO germline.py --step = [all]
[2024-05-12 18:08:17,374] INFO germline.py --out = [out_CYK_R]
[2024-05-12 18:08:17,374] INFO germline.py --reference = [/share/home/zhangd/project/single_cell/X101SC22101845-Z01-F028-B1-1_10X_release_20231111/scomatic/genome.fa]
[2024-05-12 18:08:17,374] INFO germline.py --imputation_panel = [/share/home/zhangd/tools/scRNA/Monopogen/Monopogen_vcf/]
[2024-05-12 18:08:17,374] INFO germline.py --max_softClipped = [3]
[2024-05-12 18:08:17,374] INFO germline.py --app_path = [/share/home/zhangd/tools/scRNA/Monopogen/apps]
[2024-05-12 18:08:17,374] INFO germline.py --nthreads = [8]
[2024-05-12 18:08:17,374] INFO germline.py --norun = [FALSE]
[2024-05-12 18:08:17,374] INFO Monopogen.py Checking existence of essenstial resource files...
[2024-05-12 18:08:17,415] INFO Monopogen.py Checking dependencies...
['bash out_CYK_R/Script/runGermline_chr1:2-10000001.sh']
bash out_CYK_R/Script/runGermline_chr1:2-10000001.sh
[mpileup] 1 samples in 1 input files
(mpileup) Max depth is above 1M. Potential memory hog!
Lines total/split/realigned/skipped: 7756607/39782/1368/0
beagle.27Jul16.86a.jar (version 4.1)
Copyright (C) 2014-2015 Brian L. Browning
Enter "java -jar beagle.27Jul16.86a.jar" for a summary of command line arguments.
Start time: 06:19 PM CST on 12 May 2024

Command line: java -Xmx18204m -jar beagle.jar gl=out_CYK_R/germline/chr1:2-10000001.gl.vcf.gz ref=/share/home/zhangd/tools/scRNA/Monopogen/Monopogen_vcf/CCDG_14151_B01_GRM_WGS_2020-08-05_chr1.filtered.shapeit2-duohmm-phased.vcf.gz chrom=chr1 out=out_CYK_R/germline/chr1:2-10000001.gp impute=false modelscale=2 nthreads=24 gprobs=true niterations=0

No genetic map is specified: using 1 cM = 1 Mb

reference samples: 3202
target samples: 1
Exception in thread "Thread-2" net.sf.samtools.SAMFormatException: Did not inflate expected amount
    at net.sf.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:98)
    at net.sf.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:383)
    at net.sf.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:365)
    at net.sf.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:109)
    at net.sf.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:238)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.BufferedReader.fill(BufferedReader.java:161)
    at java.io.BufferedReader.readLine(BufferedReader.java:324)
    at java.io.BufferedReader.readLine(BufferedReader.java:389)
    at blbutil.InputIt.next(InputIt.java:120)
    at blbutil.InputIt.next(InputIt.java:48)
    at vcf.RefIt.readLine(RefIt.java:288)
    at vcf.RefIt.lambda$fileReadingThread$15(RefIt.java:168)
    at java.lang.Thread.run(Thread.java:750)

From a preliminary check, the failure comes from the java command in out_CYK_D/Script/runGermline_chr1:1-50000001.sh:

java -Xmx20g -jar /share/home/zhangd/tools/scRNA/Monopogen/apps/beagle.27Jul16.86a.jar gl=out_CYK_D/germline/chr1:1-50000001.gl.vcf.gz ref=/share/home/zhangd/tools/scRNA/Monopogen/Monopogen_vcf/CCDG_14151_B01_GRM_WGS_2020-08-05_chr1.filtered.shapeit2-duohmm-phased.vcf.gz chrom=chr1 out=out_CYK_D/germline/chr1:1-50000001.gp impute=false modelscale=2 nthreads=24 gprobs=true niterations=0

However, I don't know much about Java, and my own attempts to solve this have not worked.

zhangdong360 commented 4 months ago

Furthermore, I didn't encounter any issues when computing chr1 using sample data, which makes me even more frustrated. Currently, I am attempting to deploy and test on another platform.

zhangdong360 commented 4 months ago

I found the issue - the MD5 checksum of the VCF database did not pass, leading to the problem mentioned earlier. After re-downloading and passing the MD5 check, it seems that the program is now working fine.
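For anyone hitting the same error, the MD5 check described above can be sketched in Python as follows. This is only an illustration, not part of Monopogen: the function names `md5_of_file` and `matches_expected` are hypothetical, and the expected checksum has to come from wherever the reference panel was downloaded.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use low for multi-gigabyte VCF files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_md5):
    """Compare a file's MD5 with a published checksum (hypothetical helper).

    The expected value is normalised so that stray whitespace or
    upper-case hex digits do not cause a false mismatch.
    """
    return md5_of_file(path) == expected_md5.strip().lower()
```

Calling `matches_expected` with the path to the chr1 panel file and the checksum published alongside it returns `False` when the download is damaged, in which case the file should be downloaded again.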

jinzhuangdou commented 4 months ago

Thanks. The imputation reference panel is sometimes not downloaded correctly. I will close this issue.
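When no published checksum is at hand, a rough way to spot a damaged panel file is to decompress it from start to end and see whether that fails. The sketch below assumes the panel is a gzip-compatible `.vcf.gz` (bgzip output is a series of gzip members, which Python's `gzip` module reads); the function name `gzip_stream_is_intact` is hypothetical and not part of Monopogen or Beagle.

```python
import gzip
import zlib


def gzip_stream_is_intact(path, chunk_size=1 << 20):
    """Return True if the whole gzip stream decompresses without error.

    Note: an unreadable or missing file also returns False, because
    FileNotFoundError and PermissionError are subclasses of OSError.
    """
    try:
        with gzip.open(path, "rb") as handle:
            # Read and discard the data; only decompression errors matter here.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # BadGzipFile (an OSError) covers bad headers and CRC mismatches,
        # EOFError covers truncated streams, zlib.error covers corrupt data.
        return False
    return True
```

A `True` result only shows that the file decompresses cleanly. A download that was cut off exactly at a block boundary could still pass, so comparing against the published MD5 remains the stronger test.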