biobakery / phylophlan

Precise phylogenetic analysis of microbial isolates and genomes from metagenomes
https://huttenhower.sph.harvard.edu/phylophlan
MIT License

RAxML bad base #91

Closed nicholascdove closed 2 years ago

nicholascdove commented 2 years ago

During refining phylogeny, I came across this error:

Refining phylogeny "output_references/B_subtilis_input_bins_resolved.tre"

[e] Command '['/Users/ndove/opt/miniconda3/envs/phylophylan/bin/raxmlHPC-PTHREADS-SSE3', '-p', '1989', '-m', 'GTRCAT', '-T', '4', '-t', 'output_references/B_subtilis_input_bins_resolved.tre', '-w', '/Users/ndove/bioinfo_finite_2022/220518_phylophlan_test/output_references', '-s', 'output_references/B_subtilis_input_bins_concatenated.aln', '-n', 'B_subtilis_input_bins_refined.tre']' returned non-zero exit status 255.

[e] error while executing command_line: /Users/ndove/opt/miniconda3/envs/phylophylan/bin/raxmlHPC-PTHREADS-SSE3 -p 1989 -m GTRCAT -T 4 -t output_references/B_subtilis_input_bins_resolved.tre -w /Users/ndove/bioinfo_finite_2022/220518_phylophlan_test/output_references -s output_references/B_subtilis_input_bins_concatenated.aln -n B_subtilis_input_bins_refined.tre stdin: None stdout: None env: {'TERM_PROGRAM': 'Apple_Terminal', 'TERM': 'xterm-256color', 'SHELL': '/bin/zsh', 'TMPDIR': '/var/folders/h3/z23fmbw11ks59wkbttfxbb8c0000gp/T/', 'CONDA_SHLVL': '2', 'CONDA_PROMPT_MODIFIER': '(phylophylan) ', 'TERM_PROGRAM_VERSION': '444', 'TERM_SESSION_ID': '07208D07-5EAB-408E-9D0C-4D3E1D2FE5FF', 'USER': 'ndove', 'LS_COLORS': 'rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=00:tw=30;42:ow=34;42:st=37;44:ex=01;32:.tar=01;31:.tgz=01;31:.arc=01;31:.arj=01;31:.taz=01;31:.lha=01;31:.lz4=01;31:.lzh=01;31:.lzma=01;31:.tlz=01;31:.txz=01;31:.tzo=01;31:.t7z=01;31:.zip=01;31:.z=01;31:.dz=01;31:.gz=01;31:.lrz=01;31:.lz=01;31:.lzo=01;31:.xz=01;31:.zst=01;31:.tzst=01;31:.bz2=01;31:.bz=01;31:.tbz=01;31:.tbz2=01;31:.tz=01;31:.deb=01;31:.rpm=01;31:.jar=01;31:.war=01;31:.ear=01;31:.sar=01;31:.rar=01;31:.alz=01;31:.ace=01;31:.zoo=01;31:.cpio=01;31:.7z=01;31:.rz=01;31:.cab=01;31:.wim=01;31:.swm=01;31:.dwm=01;31:.esd=01;31:.avif=01;35:.jpg=01;35:.jpeg=01;35:.mjpg=01;35:.mjpeg=01;35:.gif=01;35:.bmp=01;35:.pbm=01;35:.pgm=01;35:.ppm=01;35:.tga=01;35:.xbm=01;35:.xpm=01;35:.tif=01;35:.tiff=01;35:.png=01;35:.svg=01;35:.svgz=01;35:.mng=01;35:.pcx=01;35:.mov=01;35:.mpg=01;35:.mpeg=01;35:.m2v=01;35:.mkv=01;35:.webm=01;35:.webp=01;35:.ogm=01;35:.mp4=01;35:.m4v=01;35:.mp4v=01;35:.vob=01;35:.qt=01;35:.nuv=01;35:.wmv=01;35:.asf=01;35:.rm=01;35:.rmvb=01;35:.flc=01;35:.avi=01;35:.fli=01;35:.flv=01;35:.gl=01;35:.dl=01;35:.xcf=01;35:.xwd=01;35:.yuv=01;35:.cgm=01;35:.emf=01;35:.ogv=01;35:.ogx=01;35:.aac=00;36:.au=00;36:.flac=00;36:.m4a=00;36:.mid=
00;36:.midi=00;36:.mka=00;36:.mp3=00;36:.mpc=00;36:.ogg=00;36:.ra=00;36:.wav=00;36:.oga=00;36:.opus=00;36:.spx=00;36:.xspf=00;36:~=00;90:#=00;90:.bak=00;90:.old=00;90:.orig=00;90:.part=00;90:.rej=00;90:.swp=00;90:.tmp=00;90:.dpkg-dist=00;90:.dpkg-old=00;90:.ucf-dist=00;90:.ucf-new=00;90:.ucf-old=00;90:.rpmnew=00;90:.rpmorig=00;90:.rpmsave=00;90:', 'CONDA_EXE': '/Users/ndove/opt/miniconda3/bin/conda', 'SSH_AUTH_SOCK': '/private/tmp/com.apple.launchd.3sFXQ1WuQa/Listeners', '_CE_CONDA': '', 'CONDA_PREFIX_1': '/Users/ndove/opt/miniconda3', 'PATH': '/Users/ndove/opt/miniconda3/envs/phylophylan/bin:/Users/ndove/opt/miniconda3/condabin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin', 'CFBundleIdentifier': 'com.apple.Terminal', 'CONDA_PREFIX': '/Users/ndove/opt/miniconda3/envs/phylophylan', 'PWD': '/Users/ndove/bioinfo_finite_2022/220518_phylophlan_test', 'LANG': 'en_US.UTF-8', 'XPC_FLAGS': '0x0', 'XPC_SERVICE_NAME': '0', '_CE_M': '', 'HOME': '/Users/ndove', 'SHLVL': '4', 'LOGNAME': 'ndove', 'CONDA_PYTHON_EXE': '/Users/ndove/opt/miniconda3/bin/python', 'CONDA_DEFAULT_ENV': 'phylophylan', 'STAGING_BUCKET': 's3://aws-athena-query-results-728348960442-us-west-2/', 'CF_USER_TEXT_ENCODING': '0x1F6:0x0:0x0'}

I looked through my output folder and found no temporary RAxML files.

I then re-ran the following code:

raxmlHPC-PTHREADS-SSE3 -p 1989 -m GTRCAT -T 4 -t output_references/B_subtilis_input_bins_resolved.tre -w /Users/ndove/bioinfo_finite_2022/220518_phylophlan_test/output_references -s output_references/B_subtilis_input_bins_concatenated.aln -n B_subtilis_input_bins_refined.tre

and this is the error message I received:

Warning, you specified a working directory via "-w" Keep in mind that RAxML only accepts absolute path names, not relative ones!

RAxML can't, parse the alignment file as phylip file it will now try to parse it as FASTA file

ERROR: Bad base (E) at site 10 of sequence 1

Here is the sequence in question:

head output_references/B_subtilis_input_bins_concatenated.aln

>AIM086319_proj1_run165_20210408_plate905_C8_seq103551_asm106461
SRDAKDVRMEREKALFLILQEQTISPTNMLAARVAKDEVVITKTKIVIGSSFTIASKLNR
RSNFVKATPVNVIANTPSNMDKAAQVFVIKTQKEVMADSFQQNRATVVHDMEQKNEAAMS
GAMADTNEKCSMAKITNRIISLEVVDKVRVSPMSNTANTQKEKVDHVSTESENTLDFRED
IINHTESVKKTAVEVYEQNGTNERDKGVAILCASKYHYERGLEEGDDNDVANANEINVGD
QMLRHLQENVSRSTDDSFKRAGAIMSNEAVAETIEADIDIDAKTFKDNQPSHHRTQDQKT
IDNCFVQNANNKQEKLDEHQNVHEQIVFKELENGTENSIIENDQRLIENQADIGRSSMEE
MSEKIVIKSEETTREDVRSQEERASPTRSKDQWREVRPELPSIEQNKAATHAVSNKVAIK
HNAVTHQPKDSGKYGPLVFKVKNHIHVSYESLRINETMSARTNIHTSEPAMQAVRVDVEE
AENDNDQDTAANTQKSESKVLVQELVETVNSRNTSHHTPAALAIAIIFILVAVCVVQAVA

It looks like the issue is with the RAxML model: GTRCAT is a nucleotide model, but the alignment file contains amino acids. How do you suggest proceeding?
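For anyone who wants to confirm this kind of mismatch before launching RAxML, here is a minimal Python sketch that guesses whether an alignment is nucleotide or protein. The helper names (`read_fasta`, `guess_alphabet`) and the 0.9 threshold are my own assumptions for illustration and are not part of PhyloPhlAn or RAxML; the example sequence is the first line of the alignment shown above.

```python
# Hypothetical helpers (not part of PhyloPhlAn or RAxML): guess whether an
# alignment holds nucleotides or amino acids before choosing a -m model.

NUCLEOTIDE_CHARS = set("ACGTUN")
GAP_CHARS = set("-.?")


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def guess_alphabet(sequence, threshold=0.9):
    """Return 'nucleotide' if at least `threshold` of the non-gap characters
    are A/C/G/T/U/N, otherwise 'protein'. The threshold is an assumption."""
    residues = [c for c in sequence.upper() if c not in GAP_CHARS]
    if not residues:
        raise ValueError("sequence contains only gaps")
    nucleotide_like = sum(c in NUCLEOTIDE_CHARS for c in residues)
    return "nucleotide" if nucleotide_like / len(residues) >= threshold else "protein"


example = """>seq1
SRDAKDVRMEREKALFLILQEQTISPTNMLAARVAKDEVVITKTKIVIGSSFTIASKLNR
"""
header, seq = read_fasta(example)[0]
# RAxML reports sites 1-based, so "site 10" is index 9 here.
print(header, guess_alphabet(seq), seq[9])
```

On a real run you would pass the contents of the concatenated `.aln` file to `read_fasta` and check a few records; a "protein" result alongside a GTRCAT command line points to the model mismatch described above.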

fasnicar commented 2 years ago

Hi, I guess you specified the --force_nucleotides parameter when creating the configuration file (that would explain the GTRCAT model), but then ran PhyloPhlAn without the same --force_nucleotides parameter, so PhyloPhlAn defaulted to running the analysis in protein space instead of nucleotide space. Depending on your goals, you could either re-run PhyloPhlAn with the --force_nucleotides parameter added, or create a new config file without --force_nucleotides and then re-run PhyloPhlAn.

Please let me know if anything is unclear.

nicholascdove commented 2 years ago

That's exactly right. Is it possible to proceed with my current tree (FastTree) and .aln file with RAxML? For instance, do you have any guidance on choosing a different model? I'm thinking I can use -m LG instead of -m GTRCAT.

fasnicar commented 2 years ago

Yes, -m PROTCATLG is the model that PhyloPhlAn will use in RAxML for amino acid alignments.

If you ran FastTree as well, you probably can't use that tree (although I am surprised you got one rather than an error): with --force_nucleotides the -gtr -nt parameters are specified for FastTree, whereas for amino acids the -lg parameter should be used.

So, in this case, I think you can run RAxML directly, replacing GTRCAT with PROTCATLG as the -m parameter.
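As a rough illustration of that change, the Python sketch below rebuilds the command line from the original report with PROTCATLG in place of GTRCAT. The function name `raxml_refine_command` and its defaults are assumptions made for this example, not PhyloPhlAn's own code; the flags and paths are the ones from the failing command above. It only prints the command and does not run RAxML.

```python
import os
import shlex


def raxml_refine_command(alignment, out_dir, run_name, start_tree=None,
                         model="PROTCATLG", threads=4, seed=1989,
                         binary="raxmlHPC-PTHREADS-SSE3"):
    """Build (but do not execute) a RAxML argument list.

    Hypothetical helper: it reuses only the flags seen in the original
    command (-p, -m, -T, -t, -w, -s, -n).
    """
    # RAxML warns that -w only accepts absolute paths, so check early.
    if not os.path.isabs(out_dir):
        raise ValueError("RAxML requires an absolute path for -w")
    cmd = [binary, "-p", str(seed), "-m", model, "-T", str(threads)]
    if start_tree is not None:
        # Optional starting tree, passed with -t as in the original command.
        cmd += ["-t", start_tree]
    cmd += ["-w", out_dir, "-s", alignment, "-n", run_name]
    return cmd


cmd = raxml_refine_command(
    alignment="output_references/B_subtilis_input_bins_concatenated.aln",
    out_dir="/Users/ndove/bioinfo_finite_2022/220518_phylophlan_test/output_references",
    run_name="B_subtilis_input_bins_refined.tre",
    start_tree="output_references/B_subtilis_input_bins_resolved.tre",
)
print(shlex.join(cmd))
```

Leaving `start_tree` as `None` drops the `-t` option, which corresponds to running RAxML on the concatenated alignment alone.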

Many thanks, Francesco

nicholascdove commented 2 years ago

Thank you!

sentausa commented 1 year ago

Hi. I was following the tutorial https://github.com/biobakery/biobakery/wiki/PhyloPhlAn-3.0:-Example-01:-S.-aureus and hit a similar error at step 5, while refining the phylogeny, because the phylophlan command in that step does not include --force_nucleotides. The previous steps ran without errors and produced the resolved.tre. So, are you saying that the resolved.tre should not be used, and that I can't just run RAxML directly on the resolved.tre with -m PROTCATLG to get the refined.tre?

fasnicar commented 1 year ago

Hi @sentausa, my previous comment meant that without the refined.tre one could run RAxML directly on the concatenated alignment with the appropriate model parameters. I'm unsure about your specific case, but if you have the resolved.tre, you can use it as the starting tree for the RAxML refinement.

Thank you, Francesco