Cibiv / IQ-TREE

Efficient phylogenomic software by maximum likelihood
http://www.iqtree.org
GNU General Public License v2.0

Incompatible prefix.nex.best_scheme.nex produced when nexus file provides alignments with * syntax #178

Open davised opened 4 years ago

davised commented 4 years ago

Background

I generated a tree using the -p prefix.nex syntax, with alignment files provided using the * syntax to include the entire sequence, e.g.:

#nexus
begin sets;
        charset PA0005 = aligned/PA0005.aln: *;
        charset PA0008 = aligned/PA0008.aln: *;
        charset PA0308 = aligned/PA0308.aln: *;
        charset PA0373 = aligned/PA0373.aln: *;
        charset PA0375 = aligned/PA0375.aln: *;
        charset PA0411 = aligned/PA0411.aln: *;
        charset PA0586 = aligned/PA0586.aln: *;
        charset PA0759 = aligned/PA0759.aln: *;
        charset PA0934 = aligned/PA0934.aln: *;
        charset PA0944 = aligned/PA0944.aln: *;
        charset PA1005 = aligned/PA1005.aln: *;
        charset PA1011 = aligned/PA1011.aln: *;
        charset PA1294 = aligned/PA1294.aln: *;
        charset PA1375 = aligned/PA1375.aln: *;
 ...

I want to use the prefix.nex.best_scheme.nex file in another analysis. Here is the file:

#nexus
begin sets;
  charset PA0005_PA0373_PA0375_PA1005_PA1011_PA1294_PA1805_PA1814_PA2630_PA2858_PA3020_PA3047_PA3800_PA3805_PA3831_PA4000_PA4001_PA4423_PA4636_PA5045_PA5133_PA5203_PA5209_PA5568 = aligned/PA0005.aln,aligned/PA0373.aln,aligned/PA0375.aln,aligned/PA1005.aln,aligned/PA1011.aln,aligned/PA1294.aln,aligned/PA1805.aln,aligned/PA1814.aln,aligned/PA2630.aln,aligned/PA2858.aln,aligned/PA3020.aln,aligned/PA3047.aln,aligned/PA3800.aln,aligned/PA3805.aln,aligned/PA3831.aln,aligned/PA4000.aln,aligned/PA4001.aln,aligned/PA4423.aln,aligned/PA4636.aln,aligned/PA5045.aln,aligned/PA5133.aln,aligned/PA5203.aln,aligned/PA5209.aln,aligned/PA5568.aln: *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *;
  charset PA0008_PA0411_PA0586_PA0934_PA1614_PA1803_PA2615_PA3002_PA3011_PA3068_PA3198_PA3297_PA3308_PA3344_PA3620_PA3658_PA4044_PA4233_PA4472_PA4542_PA4725_PA4727_PA4763_PA4937_PA5134_PA5224_PA5241_PA5242_PA5345_PA5361 = aligned/PA0008.aln,aligned/PA0411.aln,aligned/PA0586.aln,aligned/PA0934.aln,aligned/PA1614.aln,aligned/PA1803.aln,aligned/PA2615.aln,aligned/PA3002.aln,aligned/PA3011.aln,aligned/PA3068.aln,aligned/PA3198.aln,aligned/PA3297.aln,aligned/PA3308.aln,aligned/PA3344.aln,aligned/PA3620.aln,aligned/PA3658.aln,aligned/PA4044.aln,aligned/PA4233.aln,aligned/PA4472.aln,aligned/PA4542.aln,aligned/PA4725.aln,aligned/PA4727.aln,aligned/PA4763.aln,aligned/PA4937.aln,aligned/PA5134.aln,aligned/PA5224.aln,aligned/PA5241.aln,aligned/PA5242.aln,aligned/PA5345.aln,aligned/PA5361.aln: *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *;
  charset PA0308_PA1375_PA2964_PA3111_PA4397_PA4969_PA5064_PA5223_PA5257 = aligned/PA0308.aln,aligned/PA1375.aln,aligned/PA2964.aln,aligned/PA3111.aln,aligned/PA4397.aln,aligned/PA4969.aln,aligned/PA5064.aln,aligned/PA5223.aln,aligned/PA5257.aln: *  *  *  *  *  *  *  *  *;
  charset PA0759_PA1758_PA2974_PA2981_PA3456_PA3638_PA4051_PA5156 = aligned/PA0759.aln,aligned/PA1758.aln,aligned/PA2974.aln,aligned/PA2981.aln,aligned/PA3456.aln,aligned/PA3638.aln,aligned/PA4051.aln,aligned/PA5156.aln: *  *  *  *  *  *  *  *;
  charset PA0944_PA1528_PA2543_PA2961_PA2963_PA3087_PA3200_PA3243_PA3626_PA4617_PA4627_PA5206_PA5215_PA5221_PA5493_PA5567 = aligned/PA0944.aln,aligned/PA1528.aln,aligned/PA2543.aln,aligned/PA2961.aln,aligned/PA2963.aln,aligned/PA3087.aln,aligned/PA3200.aln,aligned/PA3243.aln,aligned/PA3626.aln,aligned/PA4617.aln,aligned/PA4627.aln,aligned/PA5206.aln,aligned/PA5215.aln,aligned/PA5221.aln,aligned/PA5493.aln,aligned/PA5567.aln: *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *;
  charset PA2633_PA3073_PA3074_PA3075_PA3949_PA5146_PA5258_PA5485 = aligned/PA2633.aln,aligned/PA3073.aln,aligned/PA3074.aln,aligned/PA3075.aln,aligned/PA3949.aln,aligned/PA5146.aln,aligned/PA5258.aln,aligned/PA5485.aln: *  *  *  *  *  *  *  *;
  charset PA3238_PA3257_PA4446_PA4749 = aligned/PA3238.aln,aligned/PA3257.aln,aligned/PA4446.aln,aligned/PA4749.aln: *  *  *  *;
  charpartition mymodels =
    JTTDCMut+F+R4: PA0005_PA0373_PA0375_PA1005_PA1011_PA1294_PA1805_PA1814_PA2630_PA2858_PA3020_PA3047_PA3800_PA3805_PA3831_PA4000_PA4001_PA4423_PA4636_PA5045_PA5133_PA5203_PA5209_PA5568,
    JTT+F+R4: PA0008_PA0411_PA0586_PA0934_PA1614_PA1803_PA2615_PA3002_PA3011_PA3068_PA3198_PA3297_PA3308_PA3344_PA3620_PA3658_PA4044_PA4233_PA4472_PA4542_PA4725_PA4727_PA4763_PA4937_PA5134_PA5224_PA5241_PA5242_PA5345_PA5361,
    JTT+F+R4: PA0308_PA1375_PA2964_PA3111_PA4397_PA4969_PA5064_PA5223_PA5257,
    JTT+F+R8: PA0759_PA1758_PA2974_PA2981_PA3456_PA3638_PA4051_PA5156,
    JTTDCMut+F+R4: PA0944_PA1528_PA2543_PA2961_PA2963_PA3087_PA3200_PA3243_PA3626_PA4617_PA4627_PA5206_PA5215_PA5221_PA5493_PA5567,
    JTT+F+R4: PA2633_PA3073_PA3074_PA3075_PA3949_PA5146_PA5258_PA5485,
    LG+R4: PA3238_PA3257_PA4446_PA4749;
end;

Error log file

When I specify this file to a new run, I get this message:

IQ-TREE multicore version 2.1.2 COVID-edition for Linux 64-bit built Oct 22 2020
Developed by Bui Quang Minh, James Barbetti, Nguyen Lam Tung,
Olga Chernomor, Heiko Schmidt, Dominik Schrempf, Michael Woodhams.

Host:    fungi0.cgrb.oregonstate.local (SSE3, 125 GB RAM)
Command: iqtree2 -p hesse_phylo_2020.nex.best_scheme.nex -nt 16 -m MFP -B 1000 -alrt 1000 --msub nuclear --merge rclusterf -o Pseudomonas_aeruginosa_231_PPRO,Pseudomonas_aeruginosa_PAO1,Pseudomonas_aeruginosa_MPAO1_P1
Seed:    965846 (Using SPRNG - Scalable Parallel Random Number Generator)
Time:    Wed Nov  4 15:31:41 2020
Kernel:  SSE2 - 16 threads (24 CPU cores detected)

Reading partition model file hesse_phylo_2020.nex.best_scheme.nex ...

Loading 7 partitions...
Reading 24 alignment files...
Reading alignment file aligned/PA0005.aln ... Fasta format detected
Alignment most likely contains protein sequences
Alignment has 135 sequences with 257 columns, 142 distinct patterns
114 parsimony-informative, 8 singleton sites, 135 constant sites

 ... BREAK ...

 128  Pseudomonas_synxantha_BG33R                5.74%    passed    100.00%
 129  Pseudomonas_chlororaphis_JD37              6.76%    passed    100.00%
 130  Pseudomonas_savastanoi_ICMP4352            6.25%    passed    100.00%
 131  Pseudomonas_sp_NFIX28                      6.76%    passed    100.00%
 132  Pseudomonas_mucidolens_NCTC8068            6.93%    passed    100.00%
 133  Pseudomonas_syringae_ICMP4303              6.42%    passed    100.00%
 134  Pseudomonas_fildesensis_KG01               5.74%    passed    100.00%
 135  Pseudomonas_avellanae_BPIC_631             6.42%    passed    100.00%
****  TOTAL                                      6.16%  0 sequences failed composition chi2 test (p-value<5%; df=19)
ERROR: Expecting integer, but found "*  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *" instead

This can be resolved by editing the prefix.nex.best_scheme.nex file so that each merged charset ends with a single * instead of one * per file.
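The manual edit described above can be scripted. The following is a minimal Python sketch, assuming every merged charset line ends with a run of whitespace-separated * tokens followed by ";", exactly as in the best_scheme file above. The names collapse_star_runs and fix_file are hypothetical helpers, not part of IQ-TREE.

```python
import re
from pathlib import Path

# Two or more "*" tokens separated by whitespace, between the ":" and the
# ";" that close a charset definition (e.g. ": *  *  *;").
STAR_RUN = re.compile(r":\s*\*(?:\s+\*)+\s*;")


def collapse_star_runs(nexus_text: str) -> str:
    """Replace every ': *  *  ... *;' run with a single ': *;'."""
    return STAR_RUN.sub(": *;", nexus_text)


def fix_file(src: str, dst: str) -> None:
    """Write a repaired copy of a .best_scheme.nex file to dst."""
    text = Path(src).read_text()
    Path(dst).write_text(collapse_star_runs(text))
```

For example, fix_file("prefix.nex.best_scheme.nex", "prefix.fixed.nex") writes a copy with the runs collapsed. Charsets that already end in a single *, and the model lines in the charpartition block, do not match the pattern and are left unchanged.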

davised commented 4 years ago

I can provide the original files, but any -p prefix.nex file with the * syntax works to produce this output.

bqminh commented 3 years ago

Thanks for the report, Ed! Indeed, this is a bug in writing the .best_scheme.nex file. IQ-TREE just concatenates the string “*” in the merged partitions, which causes this unwanted behaviour. To work around it for now, you can leave out the ": *" whenever you want to use all alignment positions. That means, e.g.:

#nexus
begin sets;
        charset PA0005 = aligned/PA0005.aln;
        charset PA0008 = aligned/PA0008.aln;
        charset PA0308 = aligned/PA0308.aln;
        charset PA0373 = aligned/PA0373.aln;
        charset PA0375 = aligned/PA0375.aln;
 ...

And the problem should be gone.

Cheers Minh
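To make the cause described above concrete, here is a small, hypothetical Python model of merging whole-file charsets. It is not IQ-TREE's actual (C++) code; the Charset class and the function names are invented for illustration. merge_concatenating reproduces the reported output (one * per member file), while merge_whole_files writes the single * that the thread reports loads correctly. Merging charsets with explicit column ranges is deliberately left out of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Charset:
    """Hypothetical in-memory charset: name, alignment files, position spec."""
    name: str
    files: list[str]
    positions: str  # "*" stands for "all columns of the file(s)"


def merge_concatenating(parts: list[Charset]) -> Charset:
    # Mimics the reported behaviour: position specs are glued together,
    # so n whole-file members produce "*  *  ...  *" (n stars).
    return Charset(
        name="_".join(p.name for p in parts),
        files=[f for p in parts for f in p.files],
        positions="  ".join(p.positions for p in parts),
    )


def merge_whole_files(parts: list[Charset]) -> Charset:
    # Writes a single "*" when every member uses all of its positions.
    if any(p.positions != "*" for p in parts):
        raise NotImplementedError("mixed position ranges are outside this sketch")
    merged = merge_concatenating(parts)
    merged.positions = "*"
    return merged


def to_nexus_line(cs: Charset) -> str:
    """Render a charset in the 'name = file1,file2: spec;' form used above."""
    return f"charset {cs.name} = {','.join(cs.files)}: {cs.positions};"
```

Rendering merge_concatenating over three whole-file charsets gives a line ending in ": *  *  *;", the shape that triggers the "Expecting integer" error, whereas merge_whole_files yields ": *;".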


davised commented 3 years ago

Thanks! I'll remove the * from now on.
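Following the workaround Minh describes, the ": *" can be stripped from an existing input partition file in bulk. Below is a minimal Python sketch, assuming charsets written as in the original file (one "file: *;" definition per charset). The function name drop_whole_file_specs is hypothetical. Charsets with explicit ranges such as ": 1-100;" are left as they are, and so are already-merged best_scheme lines with several * tokens.

```python
import re

# ": *" (with optional surrounding whitespace) directly before the ";" that
# ends a charset line; ranges such as ": 1-100;" are not matched.
WHOLE_FILE_SPEC = re.compile(r"\s*:\s*\*\s*;")


def drop_whole_file_specs(nexus_text: str) -> str:
    """Turn 'charset X = file.aln: *;' into 'charset X = file.aln;'."""
    return WHOLE_FILE_SPEC.sub(";", nexus_text)
```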