Closed dgordon562 closed 7 years ago
Sorry for the delay in responding. I'm finally catching up on reported pilon issues. I think you're saying that the sample has different inserted bases in the two alleles with respect to the reference, and Pilon isn't calling anything. To get a sense of what's going on, could you please send relevant lines of such a region from the VCF output?
HI, Bruce,
I have zillions of examples. Here is a random one:
gunzip -c ../filtered_freebayes1.vcf.gz | grep 18056357 000016F_1_26158149_quiver_pilon 18056357 . TTACAA TTACAAA,TTACAGA 47762.4 . AC=1,1;AF=0.5,0.5;AN=2;MQM=60,60;MQMR=60;NS=1;PQR=850;QA=36408,4911;QR=1181 GT:DP:GL 1/2:45:-133.775,-72.1093,-63.9815,-63.8398,0,-54.8089 {dgordon}e181:/net/eichler/vol25/projects/whole_genome_assembly/nobackups/chimp/Drafts/V3.0a-falcon/freebayes_polishing5/why_worse/igv
Take a look at this IGV screenshot of this particular error (attached): it shows you that the reference (result of pilon) is clearly wrong and that it should be replaced by either allele.
We are wrestling right now with various home-grown solutions but if you can fix it in pilon, we would be grateful.
Best wishes, David
On Wed, 15 Mar 2017, Bruce Walker wrote:
Sorry for the delay in responding. I'm finally catching up on reported pilon issues. I think you're saying that the sample has different inserted bases in the two alleles with respect to the reference, and Pilon isn't calling anything. To get a sense of what's going on, could you please send relevant lines of such a region from the VCF output?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.[AK_rixbh2jC3E-13L6E7dje--_ChYcerks5rl__fgaJpZM4L8m3c.gif]
Bruce: be aware that the 2 indels are often further away (in this case the insertions are just 1bp away). I've seen many such errors where the indels are 5 or 6 bp away.
Here is the command we used: java -Xmx110g -jar ~chrismh/src/pilon-1.17.jar --genome " + str( inp ut[2] ) + " --changes --frags " + str( input[0] ) + " --diploid --fix all --output " + szPrefix + " --threads 16
Is there some other parameters that would sove the problems shown above?
I should mention that I tried pilon without the diploid and I tried pilon version 1.21 but no joy.
Thanks for all the info, David. However, this is not going to be easy to fix in the short term. Pilon does not know how to do read phasing yet. While it should successfully call the heterozygous insertions at the respective locations in the Pilon VCF output, it doesn't know that they are mutually exclusive, and it probably won't incorporate either into the actual FASTA changes. There is certainly more work like this which could be done to make Pilon better at diploid genome improvement, but my Pilon development time is extremely limited these days, so no timeline.
Thanks for calling it to my attention, though!
''' AAGAA allele1 AAATA allele2 AAAA sequence NOT changed by pilon ''' Notice the G and T are in different positions in the two alleles.
Such an error in genes erroneously indicates the sequence has a gene-killing 1-base deletion. This happens perhaps a hundred times in the human genome in coding sequence--terrible. I've tried version 1.21 and get the same problem.
Any suggestions?