KChen-lab / Monopogen

SNV calling from single cell sequencing
GNU General Public License v3.0

About phased.vcf.gz generation during germline calling #15

Open scg-dgist opened 12 months ago

scg-dgist commented 12 months ago

When I conducted the germline calling process, I noticed that only a limited number of chromosomes were successfully processed into 'phased.vcf.gz,' whereas most of the chromosomes remained unprocessed. I included all standard chromosomes (chromosomes 1-22) as listed in 'region.lst' and utilized the GRCh38 human reference FASTA file for this analysis. Could this variation in success be linked to the inherent low read depth typically associated with 10x scRNA-seq data? Additionally, I'm interested in knowing if there are any potential solutions to address this issue.

Many thanks.

jinzhuangdou commented 12 months ago

Based on our testing, the phasing step should work for most single-cell sequencing platforms (even with only 100 cells). Could you let me know how many SNVs are in the chromosomes that were phased successfully? Are they the larger chromosomes (such as chr1, chr2, etc.)?

scg-dgist commented 12 months ago

Thank you for the prompt response. The chromosomes that were processed successfully are 12, 17, 18, and 22, out of 22 in total.

jinzhuangdou commented 12 months ago

Could you share the chr20.gl.vcf.gz file with me so that I can look into why the phasing step failed?

scg-dgist commented 11 months ago

Oh, I have solved the problem. The issue was with the panel VCF file. I find it confusing that when I used the "CCDG_14151_B01_GRM_WGS_2020-08-05_chr20.filtered.shapeit2-duohmm-phased.vcf.gz" from your GitHub repository (located in the example directory), the process proceeded successfully. However, when I downloaded the same file from "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/" (as suggested in your paper), it did not generate the phased.vcf.gz file. Could you please explain if there are additional steps I need to take after downloading the phased.vcf.gz files from the public 1000Genomes project database?

I appreciate your invaluable assistance.
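
The thread does not establish what differs between the two panel files. One thing that could be checked is whether they agree on contig naming (for example `chr20` versus `20`) and sample count. The sketch below reads only the header and the first records of each file; the function names and the example paths in the usage note are hypothetical, and nothing here is part of Monopogen itself.

```python
import gzip


def vcf_summary(path, max_records=1000):
    """Summarize contig IDs, sample count, and CHROM values of a gzipped VCF."""
    prefix = "##contig=<ID="
    header_contigs = []
    record_chroms = set()
    n_samples = None
    seen = 0
    # bgzip-compressed VCFs are multi-member gzip files, which gzip.open can read.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith(prefix):
                # e.g. ##contig=<ID=chr20,length=64444167>
                header_contigs.append(line[len(prefix):].split(",")[0].rstrip(">\n"))
            elif line.startswith("#CHROM"):
                # Nine fixed columns (CHROM..FORMAT) precede the sample columns.
                n_samples = max(len(line.rstrip("\n").split("\t")) - 9, 0)
            elif not line.startswith("#"):
                record_chroms.add(line.split("\t", 1)[0])
                seen += 1
                if seen >= max_records:
                    break
    return {
        "header_contigs": header_contigs,
        "record_chroms": sorted(record_chroms),
        "n_samples": n_samples,
    }


def compare_panels(path_a, path_b, max_records=1000):
    """Return only the summary fields on which the two files disagree."""
    a = vcf_summary(path_a, max_records)
    b = vcf_summary(path_b, max_records)
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}
```

Calling `compare_panels("example_panel_chr20.vcf.gz", "downloaded_panel_chr20.vcf.gz")` with the two copies of the chr20 panel would return an empty dict if they match on these fields. An empty result does not rule out other differences, such as a missing index file or different record content further into the file.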

jinzhuangdou commented 11 months ago

Did it generate the .gp.vcf.gz file? If not, could you share the commands in ./Script/runGermline_.sh? That file contains the full command lines, which would let us debug the issue.
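
To answer this for all chromosomes at once, a small script can list which intermediate files exist for each one. This is only a sketch: the three suffixes are the ones mentioned in this thread, while the `<chrom>.<suffix>` naming, the `chr1` to `chr22` labels, and the `out/germline` directory are assumptions that should be adjusted to match the actual run.

```python
from pathlib import Path

# File suffixes mentioned in this thread; the naming scheme is an assumption.
SUFFIXES = ("gl.vcf.gz", "gp.vcf.gz", "phased.vcf.gz")


def germline_outputs(out_dir, chroms):
    """Return {chrom: {suffix: bool}}, True where a non-empty file exists."""
    out_dir = Path(out_dir)
    report = {}
    for chrom in chroms:
        status = {}
        for suffix in SUFFIXES:
            path = out_dir / f"{chrom}.{suffix}"
            status[suffix] = path.is_file() and path.stat().st_size > 0
        report[chrom] = status
    return report


if __name__ == "__main__":
    # Hypothetical output directory and chromosome labels.
    chroms = [f"chr{i}" for i in range(1, 23)]
    for chrom, status in germline_outputs("out/germline", chroms).items():
        missing = [suffix for suffix, ok in status.items() if not ok]
        print(chrom, "ok" if not missing else "missing: " + ", ".join(missing))
```

A chromosome that has a `gl.vcf.gz` file but no `gp.vcf.gz` or `phased.vcf.gz` points at the step right after genotype-likelihood calling, which is where the reference panel comes in.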