NicolaDM / MAPLE

MAPLE - a new approximate approach for maximum likelihood phylogenetics at short divergence.
GNU General Public License v3.0

Bacteria #9

Closed derekcg closed 10 months ago

derekcg commented 2 years ago

Can MAPLE be run on bacteria? We tried running MAPLE on a diff variant file that we converted from the SNPs and deletions in our VCF files, which were originally made by aligning whole-genome sequencing reads of Mycobacterium tuberculosis isolates. We also supplied a FASTA sequence of the reference genome those reads were aligned to when creating the VCF files. MAPLE threw this error:

$ pypy ../../MAPLE/estimatePhylogenyIterativeFastLK.py --reference H37Rv-NC_000962.3.fasta --input all_snps_variants_MAPLE_Diff.txt --model UNREST --numTopologyImprovements 3 --overwrite --output ./all_snps_maple.tree
164 sequences in diff file.
Now doing sorting
Distances from the reference calculated
Starting topological impromevement attempt traversing number 1
Error: found likelihood cost is very heavy, this might mean that the reference used is not the same used to generate the input diff file
-inf
-inf
[['R', 1, 466, 2.2667862320844551e-07, True, 0.0], ['O', 467, 1, 0.0, [0.17194749358326247, 2.6362538740468722e-08, 0.0991628924522651, 8.09563565171249e-09]], 
etc.

We are wondering whether this is because MAPLE does not expect the number of variants per isolate found in bacterial genomes. Admittedly, the error could also be on our end: I converted the VCFs to a MAPLE-formatted variant file using a quick custom Python script that I don't fully trust. If you think this error may be due to the number of variants in bacterial genomes, please let us know. Thank you for your time.
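Since the conversion script is the untrusted part, one independent check is to count the entries per sample and confirm that every entry falls within the reference length. The following is only a minimal sketch, not part of MAPLE: it assumes the diff file uses `>sample` header lines followed by whitespace-separated `character position [length]` entries, with the length given only for runs such as `-` and `n`. The function names are hypothetical.

```python
from collections import Counter


def reference_length(fasta_lines):
    """Total number of bases in a single-record FASTA (header lines skipped)."""
    return sum(len(line.strip()) for line in fasta_lines if not line.startswith(">"))


def summarize_diff(diff_lines, ref_len):
    """Count entries per sample and collect entries outside 1..ref_len."""
    counts = Counter()
    out_of_range = []
    sample = None
    for raw in diff_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            sample = line[1:]
            counts[sample] += 0  # register samples that have no entries
            continue
        fields = line.split()
        start = int(fields[1])
        # Assumption: a third column, when present, is the run length.
        length = int(fields[2]) if len(fields) > 2 else 1
        counts[sample] += 1
        if start < 1 or start + length - 1 > ref_len:
            out_of_range.append((sample, line))
    return counts, out_of_range
```

Any entry reported in `out_of_range` would suggest that the diff file and the reference do not match, or that the conversion miscounted positions.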

NicolaDM commented 2 years ago

I am not sure - could you attach or send me the diff and reference file you used?

derekcg commented 2 years ago

Here is a link to the diff and reference file I used. Thanks for looking into it. https://drive.google.com/drive/folders/1vvkmeSgrpw05jOxyD-lIF3Pgv6bctIg6?usp=sharing

On Fri, Apr 1, 2022 at 1:48 PM NicolaDM @.***> wrote:

I am not sure - could you attach or send me the diff and reference file you used?


NicolaDM commented 2 years ago

It looks like a problem with the thresholds I am using, which were tailored for SARS-CoV-2. I will modify the code to make it work more generally; please bear with me for the moment!

derekcg commented 2 years ago

Thank you for modifying the code to be more general. We are happy to wait.

NicolaDM commented 2 years ago

Sorry for the long wait! I worked on a major rewrite of the code to improve its performance. After coming back to this issue, I realized that the problem was indeed in the input file: there are multiple lines that refer to the same genome position for the same sample, for example

  - 4160534 9
  A 4160536

As of version 0.0.6, MAPLE throws an error in these circumstances. I did, however, complete a run in which only the first entry among overlapping ones was considered; the result is attached (outputNewMaple_tree.tree.zip). I used the option --maxBLen 10000.
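The overlap check and the "keep only the first entry" workaround described above can be sketched roughly as follows. This is not MAPLE's own code. It assumes a diff file made of `>sample` header lines followed by whitespace-separated `character position [length]` entries, with the length given only for runs such as `-` and `n`, and it assumes the entries of each sample are sorted by position. The function names are hypothetical.

```python
def parse_entries(lines):
    """Yield (sample, char, start, end) for each entry of a MAPLE-style diff."""
    sample = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            sample = line[1:]
            continue
        fields = line.split()
        char, start = fields[0], int(fields[1])
        # Assumption: a third column, when present, is the run length.
        length = int(fields[2]) if len(fields) > 2 else 1
        yield sample, char, start, start + length - 1


def keep_first_of_overlaps(lines):
    """Split entries into (kept, dropped), keeping the first of any overlap."""
    kept, dropped = [], []
    last_end = {}  # sample -> last position covered by a kept entry
    for entry in parse_entries(lines):
        sample, _char, start, end = entry
        # Assumes entries are sorted by position within each sample.
        if start <= last_end.get(sample, 0):
            dropped.append(entry)
        else:
            kept.append(entry)
            last_end[sample] = end
    return kept, dropped


# Illustrative input: a 9-base deletion followed by a base inside it.
demo = [">sample1", "-\t4160534\t9", "A\t4160536"]
kept, dropped = keep_first_of_overlaps(demo)
print("kept:", kept)
print("dropped:", dropped)
```

A non-empty `dropped` list points at the entries the conversion script should not have emitted, which may help when tracing the cause back to the VCF records.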

derekcg commented 2 years ago

Hello Nicola,

Thank you for looking into this. It's good to know the problem is something on our end that we can fix internally. I'll investigate the cause of these overlapping variant positions, and in the meantime use the results you provided.

Regards, Derek

On Wed, Jun 8, 2022 at 3:00 PM NicolaDM @.***> wrote:

Sorry for the long wait! I worked on a major rewrite of the code to improve its performance. After coming back to this issue, I realized that the problem was indeed in the input file: there are multiple lines that refer to the same genome position for the same sample, for example

  - 4160534 9
  A 4160536

As of version 0.0.6, MAPLE throws an error in these circumstances. I did, however, complete a run in which only the first entry among overlapping ones was considered; the result is attached. I used the option --maxBLen 10000. outputNewMaple_tree.tree.zip https://github.com/NicolaDM/MAPLE/files/8865276/outputNewMaple_tree.tree.zip
