I am using Freebayes (v1.3.6) for SNP calling analysis on eggplant population data. While scrutinizing the VCF file, it has come to my attention the heterozygous assignation in some cases. This was the command used to perform the SNP calling:
In this case, a homozygous genotype was assigned to sample5, supported by 11 reference reads and 1 alternate read, resulting in an alternate allele frequency of 1/12=0.083. On the other hand, sample8 exhibited a heterozygous genotype, supported by 9 reference reads and 1 alternate read, with an alternate allele frequency of 1/10=0.1. Despite the similarity in alternate allele frequencies between samples, Freebayes assigns different genotypes at the same loci.
Moreover, at other loci, sample6, supported by 11 reference reads and 1 alternate read, exhibits a heterozygous genotype instead of a homozygous one, as exhibited in the previous case by sample5.
/SMEL3Ch01 82135 . A G 885.116 . AB=0.0833333;ABP=21.1059;AC=5;AF=0.3125;AN=16;AO=30;CIGAR=1X;DP=107;DPB=107;DPRA=1.03535;EPP=10.2485;EPPR=11.1604;GTI=1;LEN=1;MEANALT=1;MQM=60;MQMR=59.6104;NS=8;NUMALT=1;ODDS=0.82119;PAIRED=1;PAIREDR=0.922078;PAO=0;PQA=0;PQR=0;PRO=0;QA=1174;QR=2974;RO=77;RPL=19;RPP=7.64277;RPPR=9.35551;RPR=11;RUN=1;SAF=11;SAP=7.64277;SAR=19;SRF=35;SRP=4.39215;SRR=42;TYPE=snp GT:GQ:DP:AD:RO:QR:AO:QA:GL 1/1:30.4148:14:0,14:0:0:14:551:-49.9224,-4.21442,0 0/0:52.3018:15:15,0:15:570:0:0:0,-4.51545,-51.6258 1/1:33.4251:15:0,15:0:0:15:582:-52.7037,-4.51545,0 0/0:40.2606:11:11,0:11:418:0:0:0,-3.31133,-37.9553 0/0:40.2606:11:11,0:11:434:0:0:0,-3.31133,-39.4105 0/1:1.58028:12:11,1:11:398:1:41:-0.482207,0,-32.3809 0/0:58.3224:17:17,0:17:679:0:0:0,-5.11751,-61.4334 0/0:43.2709:12:12,0:12:475:0:0:0,-3.61236,-43.0969
This performance was also observed at higher coverages. For example, in this case, sample1, sample2, sample4, and sample7 carry a heterozygous genotype when their alternate allele frequencies are 4/45=0.088, 4/27=0.148, 6/45=0.133, 3/24=0.125, respectively.
/SMEL3Ch01 96344 . T C 30.2059 . AB=0.120567;ABP=179.331;AC=4;AF=0.25;AN=16;AO=24;CIGAR=1X;DP=242;DPB=242;DPRA=0;EPP=4.45795;EPPR=3.82085;GTI=2;LEN=1;MEANALT=1.125;MQM=45.4583;MQMR=54.682;NS=8;NUMALT=1;ODDS=5.42383;PAIRED=1;PAIREDR=0.949309;PAO=0;PQA=0;PQR=0;PRO=0;QA=966;QR=8706;RO=217;RPL=21;RPP=32.3252;RPPR=3.10036;RPR=3;RUN=1;SAF=15;SAP=6.26751;SAR=9;SRF=114;SRP=4.22112;SRR=103;TYPE=snp GT:GQ:DP:AD:RO:QR:AO:QA:GL 0/1:0.000458462:45:41,4:41:1610:4:164:-1.33939,0,-127.693 0/1:22.7339:27:23,4:23:927:4:164:-6.37746,0,-71.7496 0/0:82.5237:17:16,1:16:643:1:32:0,-1.94407,-52.6745 0/1:29.6502:45:38,6:38:1525:6:246:-8.28092,0,-119.131 0/0:64.3246:27:25,2:25:1013:2:82:0,-0.614318,-82.214 0/0:66.6283:32:29,3:29:1168:3:114:0,-0.956644,-93.4502 0/1:0.0107044:24:21,3:21:853:3:123:-2.78843,0,-68.7308 0/0:96.5417:25:24,1:24:967:1:41:0,-3.63818,-80.9723
Summarizing, those results showed that Freebayes assigns genotypes randomly (two different sites supported by the same number of reads have different genotypes) and it assigns heterozygous genotypes even when the alternate allele frequency is too low, expecting homozygous genotypes at those sites. How can I deal with this issue? The VCF file's head is attached.
I am using Freebayes (v1.3.6) for SNP calling analysis on eggplant population data. While scrutinizing the VCF file, it has come to my attention the heterozygous assignation in some cases. This was the command used to perform the SNP calling:
/usr/bin/freebayes -b all_parents_mapped_BWA_v3_20X_dedup.bam -f /data/users/vbarfon/Genomes/Solgenomics_2019_08_30/Eggplant_V3_Chromosomes.fa -v multithreadfreebayes/subfiles/all_parents_mapped_BWA_v3_20X_dedup_SMEL3Ch01_0_136534347.vcf -r SMEL3Ch01:0-136534347 --limit-coverage 800 --min-coverage 10 -m 20 -q 20
In this case, a homozygous genotype was assigned to sample5, supported by 11 reference reads and 1 alternate read, resulting in an alternate allele frequency of 1/12=0.083. On the other hand, sample8 exhibited a heterozygous genotype, supported by 9 reference reads and 1 alternate read, with an alternate allele frequency of 1/10=0.1. Despite the similarity in alternate allele frequencies between samples, Freebayes assigns different genotypes at the same loci.
/SMEL3Ch01 96806 . TTTGCC CTGGCG 47.5168 . AB=0.185185;ABP=49.4959;AC=3;AF=0.1875;AN=16;AO=12;CIGAR=1X1M1X2M1X;DP=131;DPB=131;DPRA=1.18636;EPP=3.73412;EPPR=3.90444;GTI=1;LEN=6;MEANALT=1;MQM=35.5833;MQMR=60;NS=8;NUMALT=1;ODDS=4.88556;PAIRED=1;PAIREDR=0.915966;PAO=0;PQA=0;PQR=0;PRO=0;QA=465;QR=4692;RO=119;RPL=10;RPP=14.5915;RPPR=4.48836;RPR=2;RUN=1;SAF=9;SAP=9.52472;SAR=3;SRF=58;SRP=3.17453;SRR=61;TYPE=complex GT:GQ:DP:AD:RO:QR:AO:QA:GL 0/1:31.4607:17:13,4:13:520:4:159:-6.03904,0,-42.0287 0/0:101.192:22:22,0:22:872:0:0:0,-6.62266,-78.7923 0/0:71.0896:12:12,0:12:460:0:0:0,-3.61236,-41.7443 0/0:63.2764:21:20,1:20:792:1:38:0,-2.73405,-68.0036 0/0:51.1134:12:11,1:11:437:1:40:0,-1.51775,-37.5857 0/1:41.2006:27:22,5:22:851:5:187:-7.81267,0,-68.7726 0/0:65.069:10:10,0:10:401:0:0:0,-3.0103,-36.4467 0/1:0.0326615:10:9,1:9:359:1:41:-0.735817,0,-29.6608
Moreover, at other loci, sample6, supported by 11 reference reads and 1 alternate read, exhibits a heterozygous genotype instead of a homozygous one, as exhibited in the previous case by sample5.
/SMEL3Ch01 82135 . A G 885.116 . AB=0.0833333;ABP=21.1059;AC=5;AF=0.3125;AN=16;AO=30;CIGAR=1X;DP=107;DPB=107;DPRA=1.03535;EPP=10.2485;EPPR=11.1604;GTI=1;LEN=1;MEANALT=1;MQM=60;MQMR=59.6104;NS=8;NUMALT=1;ODDS=0.82119;PAIRED=1;PAIREDR=0.922078;PAO=0;PQA=0;PQR=0;PRO=0;QA=1174;QR=2974;RO=77;RPL=19;RPP=7.64277;RPPR=9.35551;RPR=11;RUN=1;SAF=11;SAP=7.64277;SAR=19;SRF=35;SRP=4.39215;SRR=42;TYPE=snp GT:GQ:DP:AD:RO:QR:AO:QA:GL 1/1:30.4148:14:0,14:0:0:14:551:-49.9224,-4.21442,0 0/0:52.3018:15:15,0:15:570:0:0:0,-4.51545,-51.6258 1/1:33.4251:15:0,15:0:0:15:582:-52.7037,-4.51545,0 0/0:40.2606:11:11,0:11:418:0:0:0,-3.31133,-37.9553 0/0:40.2606:11:11,0:11:434:0:0:0,-3.31133,-39.4105 0/1:1.58028:12:11,1:11:398:1:41:-0.482207,0,-32.3809 0/0:58.3224:17:17,0:17:679:0:0:0,-5.11751,-61.4334 0/0:43.2709:12:12,0:12:475:0:0:0,-3.61236,-43.0969
This performance was also observed at higher coverages. For example, in this case, sample1, sample2, sample4, and sample7 carry a heterozygous genotype when their alternate allele frequencies are 4/45=0.088, 4/27=0.148, 6/45=0.133, 3/24=0.125, respectively.
/SMEL3Ch01 96344 . T C 30.2059 . AB=0.120567;ABP=179.331;AC=4;AF=0.25;AN=16;AO=24;CIGAR=1X;DP=242;DPB=242;DPRA=0;EPP=4.45795;EPPR=3.82085;GTI=2;LEN=1;MEANALT=1.125;MQM=45.4583;MQMR=54.682;NS=8;NUMALT=1;ODDS=5.42383;PAIRED=1;PAIREDR=0.949309;PAO=0;PQA=0;PQR=0;PRO=0;QA=966;QR=8706;RO=217;RPL=21;RPP=32.3252;RPPR=3.10036;RPR=3;RUN=1;SAF=15;SAP=6.26751;SAR=9;SRF=114;SRP=4.22112;SRR=103;TYPE=snp GT:GQ:DP:AD:RO:QR:AO:QA:GL 0/1:0.000458462:45:41,4:41:1610:4:164:-1.33939,0,-127.693 0/1:22.7339:27:23,4:23:927:4:164:-6.37746,0,-71.7496 0/0:82.5237:17:16,1:16:643:1:32:0,-1.94407,-52.6745 0/1:29.6502:45:38,6:38:1525:6:246:-8.28092,0,-119.131 0/0:64.3246:27:25,2:25:1013:2:82:0,-0.614318,-82.214 0/0:66.6283:32:29,3:29:1168:3:114:0,-0.956644,-93.4502 0/1:0.0107044:24:21,3:21:853:3:123:-2.78843,0,-68.7308 0/0:96.5417:25:24,1:24:967:1:41:0,-3.63818,-80.9723
Summarizing, those results showed that Freebayes assigns genotypes randomly (two different sites supported by the same number of reads have different genotypes) and it assigns heterozygous genotypes even when the alternate allele frequency is too low, expecting homozygous genotypes at those sites. How can I deal with this issue? The VCF file's head is attached.
data_20X_head.txt