amkozlov / raxml-ng

RAxML Next Generation: faster, easier-to-use and more flexible
GNU Affero General Public License v3.0

terminate when using non-consecutive sites of a partition #82

Closed Gongmian784 closed 4 years ago

Gongmian784 commented 4 years ago

I specified a partition file (all.aln.part5.partition) of five partitions in RAxML-style format:

GTR+G, 1stpos = 7910-8590\3, 7749-7952\3, 5319-6863\3, 6997-7680\3, 8590-9373\3, 14120-15259\3, 2769-3725\3, 3933-4973\3, 9443-9788\3, 10148-11525\3, 9858-10154\3, 11725-13539\3, 13523-14050\3
GTR+G, 2ndpos = 7911-8590\3, 7750-7952\3, 5320-6863\3, 6998-7680\3, 8591-9373\3, 14121-15259\3, 2770-3725\3, 3934-4973\3, 9444-9788\3, 10149-11525\3, 9859-10154\3, 11726-13539\3, 13524-14050\3
GTR+G, 3rdpos = 7912-8590\3, 7751-7952\3, 5321-6863\3, 6999-7680\3, 8592-9373\3, 14122-15259\3, 2771-3725\3, 3935-4973\3, 9445-9788\3, 10150-11525\3, 9860-10154\3, 11727-13539\3, 13525-14050\3
GTR+G, rRNA = 71-1045, 1113-2693
GTR+G, tRNA = 1-70, 1046-1112, 2694-2768, 3725-3932, 4972-5318, 6861-6996, 7681-7748, 9374-9442, 9789-9857, 11526-11724, 14051-14119, 15260-15399

I then ran the following command:

raxml-ng --all --msa all.aln.part5.fa --model all.aln.part5.partition --tree pars{10} --threads 24 --bs-trees 1000

However, the following error message appeared:

ERROR: Alignment site 7911 assigned to multiple partitions: "2ndpos" and "1stpos"!

Does raxml-ng support non-consecutive sites within a partition (in the format XXX-XXX\3), or is there something wrong with my partition file?

amkozlov commented 4 years ago

raxml-ng does support this format, but your partition contains overlapping regions, namely

GTR+G, 1stpos = 7910-8590\3, 7749-7952\3

and

GTR+G, 2ndpos = 7911-8590\3, 7750-7952\3
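
The overlap can be checked before running raxml-ng. The following is a minimal sketch, not part of raxml-ng: the helper names `parse_partition_line` and `find_overlaps` are hypothetical, and the parser assumes the simple `MODEL, name = start-end\stride, ...` layout shown above with no commas inside the model string. It expands each range into explicit site numbers and reports sites shared by two partitions.

```python
import re
from itertools import combinations

def parse_partition_line(line):
    """Parse 'MODEL, name = a-b\\3, c-d, e' into (name, set of 1-based sites).

    Assumption: the model string itself contains no comma.
    """
    model_and_name, ranges = line.split("=", 1)
    name = model_and_name.split(",", 1)[1].strip()
    sites = set()
    for token in ranges.split(","):
        m = re.fullmatch(r"\s*(\d+)(?:-(\d+))?(?:\\(\d+))?\s*", token)
        if m is None:
            raise ValueError(f"cannot parse range: {token!r}")
        start = int(m.group(1))
        end = int(m.group(2)) if m.group(2) else start
        stride = int(m.group(3)) if m.group(3) else 1
        # 'a-b\3' means sites a, a+3, a+6, ... up to and including b
        sites.update(range(start, end + 1, stride))
    return name, sites

def find_overlaps(lines):
    """Return (name1, name2, sorted shared sites) for each overlapping pair."""
    parts = [parse_partition_line(l) for l in lines if l.strip()]
    overlaps = []
    for (n1, s1), (n2, s2) in combinations(parts, 2):
        shared = s1 & s2
        if shared:
            overlaps.append((n1, n2, sorted(shared)))
    return overlaps

# The two overlapping definitions quoted above
lines = [
    r"GTR+G, 1stpos = 7910-8590\3, 7749-7952\3",
    r"GTR+G, 2ndpos = 7911-8590\3, 7750-7952\3",
]
for n1, n2, shared in find_overlaps(lines):
    print(n1, n2, shared[:5])  # 1stpos 2ndpos [7911, 7914, 7917, 7920, 7923]
```

The first shared site is 7911, the same site named in the error message: it is reached both by `7749-7952\3` in 1stpos (7749 + 54 × 3) and by `7911-8590\3` in 2ndpos. Building explicit site sets is simple but memory-hungry, so for very long alignments an interval-based check would be a better fit.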

Gongmian784 commented 4 years ago

OK, I understand. Thanks a lot!

Gongmian784 commented 4 years ago

Hi, I ran raxml-ng with the corrected partition file, but it aborted as follows:

RAxML-NG v. 0.9.0 released on 20.05.2019 by The Exelixis Lab.
Developed by: Alexey M. Kozlov and Alexandros Stamatakis.
Contributors: Diego Darriba, Tomas Flouri, Benoit Morel, Sarah Lutteropp, Ben Bettisworth.
Latest version: https://github.com/amkozlov/raxml-ng
Questions/problems/suggestions? Please visit: https://groups.google.com/forum/#!forum/raxml

RAxML-NG was called at 07-Jan-2020 11:08:54 as follows:

raxml-ng --all --msa chrAuto_cds_notran.fa --model chrAuto_cds_notran.partition --tree pars{10} --threads 24 --bs-trees 1000

Analysis options:
  run mode: ML tree search + bootstrapping (Felsenstein Bootstrap)
  start tree(s): parsimony (10)
  bootstrap replicates: 1000
  random seed: 1578366534
  tip-inner: OFF
  pattern compression: ON
  per-rate scalers: OFF
  site repeats: ON
  branch lengths: proportional (ML estimate, algorithm: NR-FAST)
  SIMD kernels: AVX2
  parallelization: PTHREADS (24 threads), thread pinning: OFF

[00:00:05] Reading alignment from file: chrAuto_cds_notran.fa
[00:00:07] Loaded alignment with 14 taxa and 28573486 sites

WARNING: Fully undetermined columns found: 451025

NOTE: Reduced alignment (with duplicates and gap-only sites/taxa removed)
NOTE: was saved to: /PATH/chrAuto_cds_notran.fa.raxml.reduced.phy

NOTE: The corresponding reduced partition file was saved to: /PATH/chrAuto_cds_notran.fa.raxml.reduced.partition

raxml-ng: /opt/conda/conda-bld/raxml-ng_1569616353378/work/src/MSA.cpp:107: void MSA::compress_patterns(const pll_state_t*): Assertion `_pll_msa->count && _pll_msa->length' failed.
raxml-ng.sh: line 6: 12976 Aborted (core dumped) raxml-ng --all --msa chrAuto_cds_notran.fa --model chrAuto_cds_notran.partition --tree pars{10} --threads 24 --bs-trees 1000

Could you help me fix this? Thanks in advance.

amkozlov commented 4 years ago

I believe I fixed a similar issue some time ago. Could you please try to reproduce this with the most recent raxml-ng version from GitHub?

If the problem persists, please send me your input files via e-mail so that I can have a look.

Gongmian784 commented 4 years ago

Sure, the input files are attached. I am using raxml-ng v0.9.0 and always get this error. Could it be caused by the large number of partitions I defined (57213: 1st, 2nd, and 3rd codon positions per gene)?

amkozlov commented 4 years ago

seems like you forgot the attachment...

Gongmian784 commented 4 years ago

Sorry, the attachment was too big, so I took the first 2,000,000 sites as input. You can download it now: input.zip. The error still occurs with this subset.

amkozlov commented 4 years ago

Thanks, now I see the problem: some of your partitions (see full list below) consist of only missing data (gaps). Normally, raxml-ng will automatically ignore/remove such columns and proceed with analysis, but in this particular situation this would yield empty partitions - and I do not handle this case properly.

I will fix this, but for now, you can just remove those empty partitions from your alignment - which is a reasonable thing to do anyways :)

Partition 906: RBP4_1st
Partition 907: RBP4_2nd
Partition 908: RBP4_3rd
Partition 915: LOC111773844_1st
Partition 916: LOC111773844_2nd
Partition 917: LOC111773844_3rd
Partition 918: LOC111773841_3rd
Partition 919: LOC111773841_2nd
Partition 920: LOC111773841_1st
Partition 1029: LOC100062284_1st
Partition 1030: LOC100062284_2nd
Partition 1031: LOC100062284_3rd
Partition 2391: LOC111767506_3rd
Partition 2392: LOC111767506_2nd
Partition 2393: LOC111767506_1st
Partition 2394: LOC111775266_3rd
Partition 2395: LOC111775266_2nd
Partition 2396: LOC111775266_1st
Partition 2397: LOC111775633_1st
Partition 2398: LOC111775633_2nd
Partition 2399: LOC111775633_3rd
Partition 2400: MORF4L1_3rd
Partition 2401: MORF4L1_2nd
Partition 2402: MORF4L1_1st
Partition 2403: LOC111775272_3rd
Partition 2404: LOC111775272_2nd
Partition 2405: LOC111775272_1st
Partition 2970: C1H15orf65_3rd
Partition 2971: C1H15orf65_2nd
Partition 2972: C1H15orf65_1st
Partition 3093: DUT_3rd
Partition 3094: DUT_2nd
Partition 3095: DUT_1st
Partition 3096: SLC12A1_3rd
Partition 3097: SLC12A1_2nd
Partition 3098: SLC12A1_1st
Partition 3099: LOC100629943_3rd
Partition 3100: LOC100629943_2nd
Partition 3101: LOC100629943_1st
Partition 3102: LOC111767887_1st
Partition 3103: LOC111767887_2nd
Partition 3104: LOC111767887_3rd
Partition 3105: SLC24A5_3rd
Partition 3106: SLC24A5_2nd
Partition 3107: SLC24A5_1st
Partition 3108: LOC106782180_3rd
Partition 3109: LOC106782180_2nd
Partition 3110: LOC106782180_1st
Partition 3111: LOC111767871_3rd
Partition 3112: LOC111767871_2nd
Partition 3113: LOC111767871_1st
Partition 3114: LOC111767893_3rd
Partition 3115: LOC111767893_2nd
Partition 3116: LOC111767893_1st
Partition 3117: MYEF2_1st
Partition 3118: MYEF2_2nd
Partition 3119: MYEF2_3rd
Partition 3120: LOC111767897_3rd
Partition 3121: LOC111767897_2nd
Partition 3122: LOC111767897_1st
Partition 3453: TMCO5A_3rd
Partition 3454: TMCO5A_2nd
Partition 3455: TMCO5A_1st
Partition 3456: LOC111768329_3rd
Partition 3457: LOC111768329_2nd
Partition 3458: LOC111768329_1st
Partition 3558: LOC111775620_3rd
Partition 3559: LOC111775620_2nd
Partition 3560: LOC111775620_1st
Partition 3561: LOC111775626_3rd
Partition 3562: LOC111775626_2nd
Partition 3563: LOC111775626_1st
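
Partitions like these can be found before running the analysis. The sketch below is an illustration only and does not reproduce raxml-ng's own logic: the function names, the toy alignment, and the set of characters treated as missing data (`-`, `?`, `N`, `n`) are all assumptions. It flags every partition in which every column is fully undetermined across all taxa.

```python
# Assumption: characters counted as missing data in a DNA alignment
MISSING = set("-?Nn")

def column_is_undetermined(seqs, site):
    """True if every taxon has a missing-data character at 1-based `site`."""
    return all(seq[site - 1] in MISSING for seq in seqs)

def empty_partitions(seqs, partitions):
    """Return names of partitions whose sites are all fully undetermined."""
    return [
        name
        for name, sites in partitions.items()
        if all(column_is_undetermined(seqs, s) for s in sites)
    ]

# Toy alignment (hypothetical): 3 taxa, 9 sites; sites 4-6 contain only gaps
seqs = ["ACG---TTA", "ACG---TTC", "ATG---TAA"]
partitions = {
    "geneA_all": [1, 2, 3],
    "geneB_all": [4, 5, 6],
    "geneC_all": [7, 8, 9],
}

empty = empty_partitions(seqs, partitions)
kept = {name: sites for name, sites in partitions.items() if name not in empty}
print(empty)       # ['geneB_all']
print(list(kept))  # ['geneA_all', 'geneC_all']
```

The partitions reported as empty would then be dropped from the partition file, and the corresponding columns from the alignment, before calling raxml-ng.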

Gongmian784 commented 4 years ago

OK, I will remove empty partitions first. Thanks for your kind help!