KChen-lab / Monopogen

SNV calling from single cell sequencing
GNU General Public License v3.0

chrX #11

Closed: lifan2022 closed this issue 7 months ago

lifan2022 commented 1 year ago

Hello, I tried to use Monopogen for germline SNP analysis of chrX and ran into two problems. First, the "monopogen.py" script only handles chr1-chr22; I managed to get "chrX.gl.vcf.gz" by modifying "monopogen.py". Second, the only chrX reference panel I could find at "http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased" is "CCDG_14151_B01_GRM_WGS_2020-08-05_chrX.filtered.eagle2-phased.v2.vcf.gz". When I use this file for the subsequent analysis, an error referring to "CCDG_14151_B01_GRM_WGS_2020-08-05_chrX.filtered.shapeit2-duohmm-phased.vcf.gz" is displayed. I also found that "CCDG_14151_B01_GRM_WGS_2020-08-05_chrX.filtered.eagle2-phased.v2.vcf.gz" and "CCDG_14151_B01_GRM_WGS_2020-08-05_chr1.filtered.shapeit2-duohmm-phased.vcf.gz" differ in content format. Is there any way to get an appropriate chrX reference file for the next step?
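One quick way to see how the chrX eagle2 panel differs from the chr1 shapeit2 panel is to compare the FORMAT fields each file declares in its VCF header (for example, whether GT and/or GL are present). Below is a minimal Python sketch, assuming both panels are bgzip-compressed VCFs; the function names are hypothetical helpers, not part of Monopogen.

```python
import gzip


def format_ids_from_lines(lines):
    """Collect FORMAT IDs (e.g. GT, GL) declared in VCF header lines."""
    ids = set()
    prefix = "##FORMAT=<ID="
    for line in lines:
        if line.startswith(prefix):
            # '##FORMAT=<ID=GT,Number=1,...>' -> 'GT'
            ids.add(line[len(prefix):].split(",", 1)[0])
        elif line.startswith("#CHROM"):
            break  # header ends here; skip the (large) record section
    return ids


def vcf_format_ids(path):
    """Read FORMAT IDs from a bgzip-compressed VCF such as the 1000G panels."""
    # bgzip output is a series of gzip members, which gzip.open reads transparently
    with gzip.open(path, "rt") as fh:
        return format_ids_from_lines(fh)
```

For example, calling `vcf_format_ids` on the chr1 shapeit2-duohmm file and on the chrX eagle2 file and taking the symmetric difference of the two sets (`a ^ b`) shows which per-sample fields one panel has and the other lacks.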

jinzhuangdou commented 1 year ago

Monopogen has not been tested on sex-chromosome SNV calling. Does CCDG_14151_B01_GRM_WGS_2020-08-05_chrX.filtered.eagle2-phased.v2.vcf.gz have the GL (genotype likelihood) field? If not, you can use the GT field. We will update this in the next release. You can use the following command for your task:

java -Xmx20g -jar /Monopogen/apps/beagle.27Jul16.86a.jar \
    gt=chrX.gl.vcf.gz \
    ref=CCDG_14151_B01_GRM_WGS_2020-08-05_chrX.filtered.eagle2-phased.v2.vcf.gz \
    chrom=chrX out=chrX.gp impute=false \
    modelscale=2 \
    nthreads=24 \
    gprobs=true \
    niterations=0
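All of these options belong to a single java invocation; pasting them onto separate shell lines without continuations would run `modelscale=2`, `nthreads=24`, and so on as separate commands. If you prefer to drive the step from Python, a minimal sketch is to build the argument list explicitly and pass it to `subprocess.run`. The helper names and the `threads` parameter are hypothetical; the option values are copied from the command above.

```python
import subprocess


def beagle_chrx_cmd(jar="/Monopogen/apps/beagle.27Jul16.86a.jar",
                    target="chrX.gl.vcf.gz",
                    ref="CCDG_14151_B01_GRM_WGS_2020-08-05_chrX.filtered.eagle2-phased.v2.vcf.gz",
                    threads=24):
    """Build the argument list for the Beagle call above as one command."""
    return [
        "java", "-Xmx20g", "-jar", jar,
        f"gt={target}",
        f"ref={ref}",
        "chrom=chrX",
        "out=chrX.gp",
        "impute=false",
        "modelscale=2",
        f"nthreads={threads}",
        "gprobs=true",
        "niterations=0",
    ]


def run_beagle(cmd):
    # check=True raises CalledProcessError if Beagle exits with a non-zero status
    subprocess.run(cmd, check=True)
```

Passing a list (rather than a shell string) avoids quoting and line-continuation problems entirely.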

slinnarsson commented 11 months ago

Also, preProcess does not process X, Y or MT chromosomes, due to this code snippet in the preProcess() method:

# generate postProcess bam files 
for chr in range(1, 23):
    bamlist = open(args.out + "/Bam/chr" +  str(chr) +  ".filter.bam.lst","w")
    for s in sample:
        bamlist.write(args.out+"/Bam/"+s+"_chr"+str(chr)+".filter.bam\n")
    bamlist.close()

As you can see, range(1, 23) stops at 22, so only chromosomes 1-22 are preprocessed.

Later, the germline and somatic modules crash if you specify the X chromosome (or Y or MT), due to the missing preprocessed BAM files.
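A minimal sketch of how the BAM-list loop above could be generalized to the sex and mitochondrial chromosomes, keeping the same `chr<N>.filter.bam.lst` and `<sample>_chr<N>.filter.bam` naming. `CHROMS` and `write_bam_lists` are hypothetical names, not Monopogen's code. This only writes the list files: the per-chromosome filtered BAMs for X, Y and MT would still have to be produced elsewhere in preProcess, and the contig names (e.g. `MT` vs `M`) are assumed to match those in your BAMs and reference.

```python
import os

# Hypothetical chromosome list: autosomes plus the sex and mitochondrial chromosomes
CHROMS = tuple(str(c) for c in range(1, 23)) + ("X", "Y", "MT")


def write_bam_lists(out_dir, samples, chroms=CHROMS):
    """Write one chr<N>.filter.bam.lst per chromosome, listing each sample's filtered BAM."""
    bam_dir = os.path.join(out_dir, "Bam")
    for chrom in chroms:
        lst = os.path.join(bam_dir, f"chr{chrom}.filter.bam.lst")
        with open(lst, "w") as fh:
            for s in samples:
                fh.write(os.path.join(bam_dir, f"{s}_chr{chrom}.filter.bam") + "\n")
```

Making the chromosome set a parameter (rather than a hard-coded `range`) would also let the germline and somatic modules check up front that every requested chromosome was preprocessed.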

jinzhuangdou commented 11 months ago

Yes. We will add this option in a future release.