biobakery / phylophlan

Precise phylogenetic analysis of microbial isolates and genomes from metagenomes
https://huttenhower.sph.harvard.edu/phylophlan
MIT License

RAxML can't parse alignments for which there are sequences made entirely of undetermined characters #113

Open RyanCook94 opened 1 year ago

RyanCook94 commented 1 year ago

Hi, thanks for the amazing tool by the way! I'm having an issue where RAxML can't parse alignments for which there are sequences made entirely of undetermined characters:

[e] error while refining gene tree {'program_name': '/opt/miniconda3/envs/phylophlan/bin/raxmlHPC', 'params': '-p 1989', 'database': '-t', 'input': '-s', 'output_path': '-w', 'output': '-n', 'version': '-v', 'model': '-m', 'command_line': '#program_name# #model# #params# #database# #output_path# #input# #output#'} PROTCATRTREV /home/ubuntu/Evelien-Adriaenssens/James_Doc/james_phylophlan_bif/take_two/james_phylophlan_bif_phylophlan/tmp/trim_not_variant/p0094.aln /home/ubuntu/Evelien-Adriaenssens/James_Doc/james_phylophlan_bif/take_two/james_phylophlan_bif_phylophlan/tmp/gene_tree1_polytomies/p0094.tre /home/ubuntu/Evelien-Adriaenssens/James_Doc/james_phylophlan_bif/take_two/james_phylophlan_bif_phylophlan/tmp/gene_tree2 p0094.tre

[e] Command '['/opt/miniconda3/envs/phylophlan/bin/raxmlHPC', '-m', 'PROTCATRTREV', '-p', '1989', '-t', '/home/ubuntu/Evelien-Adriaenssens/James_Doc/james_phylophlan_bif/take_two/james_phylophlan_bif_phylophlan/tmp/gene_tree1_polytomies/p0094.tre', '-w', '/home/ubuntu/Evelien-Adriaenssens/James_Doc/james_phylophlan_bif/take_two/james_phylophlan_bif_phylophlan/tmp/gene_tree2', '-s', '/home/ubuntu/Evelien-Adriaenssens/James_Doc/james_phylophlan_bif/take_two/james_phylophlan_bif_phylophlan/tmp/trim_not_variant/p0094.aln', '-n', 'p0094.tre']' returned non-zero exit status 255.

If I run the individual command, I get:

ubuntu@viral-metagenomics:~/Evelien-Adriaenssens/James_Doc/james_phylophlan_bif$ /opt/miniconda3/envs/phylophlan/bin/raxmlHPC -m PROTCATRTREV -p 1989 -t /home/ubuntu/Evelien-Adriaenssens/James_Doc/james_phylophlan_bif/take_two/james_phylophlan_bif_phylophlan/tmp/gene_tree1_polytomies/p0094.tre -w /home/ubuntu/Evelien-Adriaenssens/James_Doc/james_phylophlan_bif/take_two/james_phylophlan_bif_phylophlan/tmp/gene_tree2 -s /home/ubuntu/Evelien-Adriaenssens/James_Doc/james_phylophlan_bif/take_two/james_phylophlan_bif_phylophlan/tmp/trim_not_variant/p0094.aln -n p0094.tre

Warning, you specified a working directory via "-w"
Keep in mind that RAxML only accepts absolute path names, not relative ones!

RAxML can't, parse the alignment file as phylip file
it will now try to parse it as FASTA file

ERROR: Sequence PROKKA_Bifidobacterium_bifidum_GCA_022731955.1_ASM2273195v1_genomic consists entirely of undetermined values which will be treated as missing data
ERROR: Sequence PROKKA_Bifidobacterium_bifidum_GCA_022728715.1_ASM2272871v1_genomic consists entirely of undetermined values which will be treated as missing data
ERROR: Found 2 sequences that consist entirely of undetermined values, exiting...

This is similar to an issue Mick Watson reported previously (https://forum.biobakery.org/t/raxmlhpc-failed-e-refine-gene-tree-crashed/1606/5). I followed the steps suggested there (cloning the git repository and calling the script from the repo), but that hasn't resolved the issue. Any advice?

Many thanks and best wishes, Ryan

schmittel commented 5 months ago

I'm having the same issue. I've run phylophlan with various combinations of --fragmentary_threshold and --remove_fragmentary_entries, but I still get the same error. The error has been reported before, but none of the suggested solutions help in my case.
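
For anyone trying to narrow this down, here is a minimal Python sketch for checking which entries in a failing FASTA alignment (such as p0094.aln from the log above) consist entirely of undetermined characters, and for separating them from the rest. It is not part of PhyloPhlAn and not a confirmed fix. The function names are hypothetical, and the set of characters counted as undetermined (`-`, `?`, `X`, `*`) is an assumption about how RAxML handles protein data.

```python
from typing import Iterable, List, Tuple

# Assumption: characters RAxML counts as undetermined in protein alignments.
UNDETERMINED = set("-?Xx*")


def parse_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA-formatted lines into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def is_all_undetermined(sequence: str) -> bool:
    """True if every character is undetermined (also true for an empty sequence)."""
    return all(ch in UNDETERMINED for ch in sequence)


def split_records(
    records: List[Tuple[str, str]],
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (records to keep, headers of all-undetermined records)."""
    kept = [(h, s) for h, s in records if not is_all_undetermined(s)]
    dropped = [h for h, s in records if is_all_undetermined(s)]
    return kept, dropped


def format_fasta(records: List[Tuple[str, str]]) -> str:
    """Serialize records back to FASTA text, one sequence per line."""
    return "".join(f">{h}\n{s}\n" for h, s in records)
```

To use it, open the alignment file, pass the file handle to `parse_fasta`, and inspect the `dropped` list from `split_records`; it should match the sequence names in the RAxML error. Note that writing out only the kept records may not be enough on its own: the starting tree passed with `-t` may still contain the removed taxa, so the tree and alignment could end up out of sync.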