amkozlov / raxml-ng

RAxML Next Generation: faster, easier-to-use and more flexible
GNU Affero General Public License v3.0

very very large dataset recommendations #167

Open sapuizait opened 6 months ago

sapuizait commented 6 months ago

Hi all

As the title says, I have a very large dataset of 1500 genomes that share 1200 single-copy genes. The plan is to build a concatenated alignment (let's see if it's even possible :D) and then use raxml to build a global phylogeny. Do you think it is even feasible, or am I daydreaming and should I consider alternative approaches?

Cheers P

PS: I have access to a cluster that has 64 nodes and 500GB RAM, and jobs can run for a maximum of 7 days.

amkozlov commented 6 months ago

In principle, it sounds feasible.

We successfully used raxml-ng for concatenated datasets with ~1400 taxa and ~1000 genes, as well as ~350 taxa and ~64000 loci (unfortunately, neither paper has been published yet).

sapuizait commented 6 months ago

That's excellent! Any advice/suggestions on how to do that? Do you use partitions and check models for each partition, etc.? Which algorithm? Thanks in advance for any tips! :)
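
For readers following the thread, here is a minimal Python sketch of the concatenation step discussed above: it joins per-gene alignments into one supermatrix, pads taxa missing from a gene with gaps, and emits one partition line per gene in the `MODEL, name = start-end` style. The function name `concatenate`, the in-memory dict layout, and the `GTR+G` model string are illustrative assumptions, not something recommended in this thread; the appropriate model per partition would need to be chosen separately.

```python
from typing import Dict, List, Tuple


def concatenate(
    genes: Dict[str, Dict[str, str]],
    model: str = "GTR+G",  # placeholder model, an assumption for illustration
) -> Tuple[Dict[str, str], List[str]]:
    """Concatenate per-gene alignments into a supermatrix.

    genes maps gene name -> {taxon name -> aligned sequence}.
    Returns (supermatrix, partition_lines).
    """
    # Union of all taxa seen in any gene alignment.
    taxa = sorted({taxon for aln in genes.values() for taxon in aln})
    chunks: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[str] = []
    start = 1  # partition coordinates are 1-based and inclusive

    for name, aln in genes.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{name}: sequences are empty or differ in length")
        length = lengths.pop()
        for taxon in taxa:
            # Taxa absent from this gene are filled with gap characters.
            chunks[taxon].append(aln.get(taxon, "-" * length))
        end = start + length - 1
        partitions.append(f"{model}, {name} = {start}-{end}")
        start = end + 1

    supermatrix = {taxon: "".join(parts) for taxon, parts in chunks.items()}
    return supermatrix, partitions


if __name__ == "__main__":
    toy = {
        "geneA": {"t1": "ACGT", "t2": "ACGA"},
        "geneB": {"t1": "GG", "t3": "GC"},
    }
    matrix, parts = concatenate(toy)
    for taxon, seq in matrix.items():
        print(f">{taxon}\n{seq}")
    print("\n".join(parts))
```

Written out to files, the supermatrix and the partition lines could then be supplied to raxml-ng through its `--msa` and `--model` options; for a dataset of the size described here, the alignments would more realistically be streamed from disk rather than held in a dict.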