DecodeGenetics / graphtyper

Population-scale genotyping using pangenome graphs
http://dx.doi.org/10.1038/ng.3964
MIT License
170 stars 20 forks source link

Genotype outputs all sites #142

Open lihaicheng7003 opened 11 months ago

lihaicheng7003 commented 11 months ago

Can graphtyper output the genotyping of all sites in a region, such as mitochondrial 16569bp?

./graphtyper genotype ~/data1/References/human/Homo_sapiens_assembly38.fasta --sam=~/data1/project10/subdata/1.markdup.bam --region=chrM --output=~/data1/project19/results_chrM --vcf=output_chrM.vcf.modif.gz --force_no_filter_zero_qual

I tried this command but the resulting vcf file does not have any site information in it. output_chrM.vcf.modif.gz like:

##fileformat=VCFv4.2
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
chrM    1       .       G       G       0       .       .
chrM    2       .       A       A       0       .       .
lihaicheng7003 commented 11 months ago

I found the problem, REF and ALT in --vcf cannot be the same, graphtyper will ignore sites where REF and ALT are the same. I manually made a vcf file for --vcf, so that all sites can be output

POS REF ALT #CHROM  ID  QUAL    FILTER  INFO
1   G   A   chrM    .   0   .   .
1   G   T   chrM    .   0   .   .
1   G   C   chrM    .   0   .   .
2   A   G   chrM    .   0   .   .
2   A   T   chrM    .   0   .   .
2   A   C   chrM    .   0   .   .

For others’ reference

hannespetur commented 11 months ago

Hey, this will create a graph where every read maps at every position perfectly. I guess you want to get read counts for each base? I'd suggest using samtools mpileup.

Best, Hannes

lihaicheng7003 commented 11 months ago

Thanks for the reply, I am testing this software

lihaicheng7003 commented 11 months ago

When using default parameters for both tools, GraphTyper calls out approximately 19,000 variants with only NA12878 (1000 Genomes) as input, while GATK calls out over 40,000. Is this normal?