Closed: TlaskalV closed this issue 4 years ago
Hi, many thanks for reporting this. I'll update the code to also check for empty sequences after the alignments have been subsampled.
A quick fix would be to remove that genome, 3376_74, from the MSA and then run RAxML.
Another option would be to discard that marker/gene tree from your analysis.
Many thanks, Francesco
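The first quick fix (dropping a genome from an MSA before running RAxML) could be sketched in plain Python as below. This is only an illustration, not part of PhyloPhlAn: the helper names `read_fasta` and `drop_genomes` are hypothetical, and the sketch assumes the MSA is a FASTA file in which the genome ID is the first whitespace-delimited token of each header.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def drop_genomes(msa_in, msa_out, genomes_to_drop):
    """Copy msa_in to msa_out, skipping records whose ID is in genomes_to_drop.

    Assumption: the record ID is the first token of the FASTA header.
    Returns the number of records that were kept.
    """
    kept = 0
    with open(msa_out, "w") as out:
        for header, seq in read_fasta(msa_in):
            parts = header.split()
            record_id = parts[0] if parts else ""
            if record_id in genomes_to_drop:
                continue
            # Each kept sequence is written on a single line.
            out.write(f">{header}\n{seq}\n")
            kept += 1
    return kept
```

For example, `drop_genomes("in.aln", "out.aln", {"3376_74"})` would write a copy of the alignment without that genome; the file names here are placeholders. Writing to a new file keeps the original MSA untouched in case the filtered one needs to be rebuilt.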
I think I would have to remove more markers and more genomes. RAxML gave errors for tens of MAGs in the MSAs. I will try a command-line approach to omit them. With MAGs, even those with high completeness, this can easily happen.
If you have many MSAs with the same problem, you can import phylophlan.py and use the is_msa_empty(msa, path=None) function, which returns True if there is at least one empty sequence in the msa file.
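For readers who prefer not to import phylophlan.py, a standalone check in the same spirit might look like the sketch below. It does not reproduce the is_msa_empty implementation; `gap_only_ids` and `scan_folder` are hypothetical helpers, and the `-` gap character and the `*.aln` file pattern are assumptions that may need adjusting to the actual output folder.

```python
from pathlib import Path


def gap_only_ids(msa_path, gap_chars="-"):
    """Return IDs of records whose sequence is empty or contains only gap characters."""
    ids, current, has_residue = [], None, False
    with open(msa_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # Close the previous record before starting a new one.
                if current is not None and not has_residue:
                    ids.append(current)
                parts = line[1:].split()
                current = parts[0] if parts else ""
                has_residue = False
            elif current is not None and line.strip(gap_chars):
                # Something other than gap characters is left on this line.
                has_residue = True
    if current is not None and not has_residue:
        ids.append(current)
    return ids


def scan_folder(folder, pattern="*.aln"):
    """Map each MSA file name in folder to its gap-only record IDs (problem files only)."""
    report = {}
    for path in sorted(Path(folder).glob(pattern)):
        ids = gap_only_ids(path)
        if ids:
            report[path.name] = ids
    return report
```

Calling `scan_folder` on the folder holding the subsampled alignments would give, per marker, the genomes that are likely to trigger the RAxML error, which can then be used to decide whether to drop genomes or whole markers.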
I am able to get the list of markers present after subsampling in each genome (i.e. subsampled MSAs which do not have gap-only entries). I am using the PhyloPhlAn db with 400 markers. May I ask whether there is a way to restrict the RAxML refinement to only this subset of markers? It always runs on the whole set. One solution might be to build a custom reduced database. Is there an easier approach? Many thanks!
If you want to make a custom db with a reduced set of markers, you can make a copy of the phylophlan database folder and remove from the phylophlan.faa fasta file the markers you don't want to use. Then you just need to delete the indexed db file (phylophlan.dmnd if you used diamond) and re-launch PhyloPhlAn.
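The marker-removal step described above could be scripted roughly as follows. This is a sketch under assumptions, not a PhyloPhlAn utility: `filter_markers` is a hypothetical helper, and it assumes that the marker ID is the first whitespace-delimited token of each FASTA header in the .faa file. It is worth inspecting a few headers first and adapting the matching if the IDs are encoded differently.

```python
def filter_markers(faa_in, faa_out, markers_to_keep):
    """Write to faa_out only the records of faa_in whose ID is in markers_to_keep.

    Assumption: the marker ID is the first token of the FASTA header.
    Returns the number of records written.
    """
    keep = set(markers_to_keep)
    writing = False
    written = 0
    with open(faa_in) as src, open(faa_out, "w") as dst:
        for line in src:
            if line.startswith(">"):
                tokens = line[1:].split()
                writing = bool(tokens) and tokens[0] in keep
                if writing:
                    written += 1
            if writing:
                # Header and sequence lines are copied unchanged.
                dst.write(line)
    return written
```

The function reads from one file and writes to another, so it fits the advice to work on a copy of the database folder. The returned count can be compared with the number of requested markers as a quick sanity check before deleting the index file and re-launching.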
Hello Francesco, thank you for the great documentation for PhyloPhlAn. I looked for a solution to my issue but was not able to find one, so I opened a new issue. When analyzing a couple of MAGs, I can see gaps in the subsampled alignments:
Later, RAxML outputs files (e.g. RAxML_info.p0111.tre) in the gene_tree2 folder with an error for each MAG with gaps:
My command is below:
I guess remove_fragmentary_entries works correctly; the marker is not completely missing in the non-subsampled alignment:
Please, do you have any suggestion on how to proceed with RAxML? Thank you in advance!