KChen-lab / Monopogen

SNV calling from single cell sequencing
GNU General Public License v3.0

Explanation .gl, .gp, .phased files #40

Open rafaella-buzatu opened 9 months ago

rafaella-buzatu commented 9 months ago

Hello! While running Monopogen, I noticed that it outputs quite a number of different files. I read in your tutorial that the final output is the .phased.vcf.gz file; however, that file only provides the genotype. I would also like to obtain the read depth and allele frequency for those variants. I found that the .gl.vcf.gz file contains depth information, while the .gp.vcf.gz file contains the genotype and allele frequency. I also noticed that the .gl.vcf file contains unfiltered variants, while the .gp.vcf seems to contain only filtered variants, the same ones as in .phased.vcf.

Could you help me understand what all these files are and how I could go about extracting as much information as possible for all variants (even unfiltered) from them?

Thank you!

jinzhuangdou commented 9 months ago

Hi, you can find more details in the Beagle software manual: https://faculty.washington.edu/browning/beagle/beagle_4.1_21Jan17.pdf. Briefly:

- .gl includes all candidate SNVs with alignment information available.
- .gp.vcf.gz includes genotypes of SNVs that overlap with 1KG3 after LD refinement.
- *.phased.vcf is similar to gp.vcf.gz but with phasing information available.

Hope this helps answer your question.
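To collect as much information as possible for every variant, including the unfiltered ones, one option is to join the .gl and .gp files by site. The following is a minimal sketch, not part of Monopogen: it uses only the Python standard library, and the function names (`read_vcf`, `merge_sites`) are hypothetical. It assumes both files are standard VCFs, that a site can be matched on CHROM, POS, REF and ALT exactly as written, and that only the first sample column is of interest. It copies whatever INFO and FORMAT tags are present rather than assuming specific tag names.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def parse_info(info):
    """Turn an INFO string such as 'A=1;B' into {'A': '1', 'B': 'true'}."""
    fields = {}
    if info in (".", ""):
        return fields
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else "true"  # flag tags carry no value
    return fields


def read_vcf(path):
    """Yield ((chrom, pos, ref, alt), fields) for every record in a VCF."""
    with open_text(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip header and blank lines
            cols = line.rstrip("\n").split("\t")
            chrom, pos, _id, ref, alt = cols[:5]
            fields = {f"INFO.{k}": v for k, v in parse_info(cols[7]).items()}
            if len(cols) > 9:
                # Pair FORMAT keys with the values of the first sample only.
                for k, v in zip(cols[8].split(":"), cols[9].split(":")):
                    fields[f"FORMAT.{k}"] = v
            yield (chrom, int(pos), ref, alt), fields


def merge_sites(gl_path, gp_path):
    """Join .gl and .gp records per site; every .gl site is kept."""
    merged = {}
    for key, fields in read_vcf(gl_path):
        row = {f"gl.{k}": v for k, v in fields.items()}
        row["in_gp"] = "no"
        merged[key] = row
    for key, fields in read_vcf(gp_path):
        row = merged.setdefault(key, {"in_gp": "no"})
        row.update({f"gp.{k}": v for k, v in fields.items()})
        row["in_gp"] = "yes"
    return merged
```

Calling `merge_sites("sample.gl.vcf.gz", "sample.gp.vcf.gz")` (file names are placeholders) returns one dictionary per site; rows with `in_gp == "no"` are candidate SNVs that appear only in the .gl file. If the two files encode multi-allelic sites differently, the exact-match key would miss them, so normalizing both files first may be needed.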