davidemms / SHOOT

SHOOT.bio - the phylogenetic search engine
https://www.shoot.bio
GNU General Public License v3.0

Database creation without orthofinder #4

Open mrouard opened 1 year ago

mrouard commented 1 year ago

Hello,

I was wondering whether it is possible to use SHOOT with an existing database of multiple sequence alignments and trees. Say I reproduce the same directory layout as OrthoFinder and include the DIAMOND databases, the MSAs (FASTA) and the gene trees (Newick): would that be enough to get SHOOT working?

Thank you

guignonv commented 1 year ago

It is possible if you provide the following directory structure and make some changes to the files:

ShootDB
    ├── Gene_Trees
    ├── MultipleSequenceAlignments
    ├── Orthogroup_Sequences
    └── WorkingDirectory
        └── Alignments_ids
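
As a rough sketch, the empty layout above could be created and checked with a few lines of Python. The function names here are hypothetical and are not part of SHOOT; only the directory names come from the tree above.

```python
from pathlib import Path

# Sub-directories copied from the layout above.
SHOOT_DB_SUBDIRS = [
    "Gene_Trees",
    "MultipleSequenceAlignments",
    "Orthogroup_Sequences",
    "WorkingDirectory/Alignments_ids",
]


def make_shoot_db_skeleton(root):
    """Create the empty directory layout under `root` and return the paths."""
    root = Path(root)
    created = []
    for sub in SHOOT_DB_SUBDIRS:
        path = root / sub
        # parents=True also creates WorkingDirectory itself.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


def missing_subdirs(root):
    """List the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [sub for sub in SHOOT_DB_SUBDIRS if not (root / sub).is_dir()]
```

Calling `make_shoot_db_skeleton("ShootDB")` only creates the folders; the trees, alignments and sequences still have to be copied in and renamed by hand or by your own script.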

All your clusters need to be renamed using the OrthoFinder naming scheme: "OG" followed by 7 digits, starting from 0 for the first cluster (OG0000000) and incrementing by one for each following cluster. Numbers must match OrthoMCL output. This means that if you use a different naming scheme, you will need a lookup table that maps your cluster names to the OG names. You will also have to adjust those names in several places inside the file contents.
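
To illustrate the renaming, here is a minimal sketch of such a lookup table in Python. It assumes your cluster names are already listed in the order the numbering should follow; the function names and the example cluster names are made up for illustration.

```python
def og_name(index):
    """Return the OG-style name for a 0-based cluster index (0 -> OG0000000)."""
    if not 0 <= index <= 9_999_999:
        raise ValueError(f"index {index} does not fit in 7 digits")
    return f"OG{index:07d}"


def build_lookup(cluster_names):
    """Map each of your own cluster names to an OG name.

    Assumption: `cluster_names` is already in the order the numbering
    must follow (first cluster first).
    """
    lookup = {}
    for index, name in enumerate(cluster_names):
        if name in lookup:
            raise ValueError(f"duplicate cluster name: {name}")
        lookup[name] = og_name(index)
    return lookup


# Hypothetical cluster names, for illustration only.
table = build_lookup(["family_A", "family_B", "family_C"])
print(table["family_B"])  # OG0000001
```

The same table can then be used both to rename the files and to rewrite the cluster names that appear inside them.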

Then SHOOT can be used to initialize the "SHOOT database" with these commands:

python shoot/create_shoot_db.py <your "ShootDB" path> full
python shoot/create_shoot_db.py <your "ShootDB" path> profiles
python shoot/bifurcating_trees.py <your "ShootDB" path>
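
If you prefer to script these three steps, a small wrapper along these lines might help. It is only a sketch: the helper names and the `shoot_dir` parameter are my own, and it assumes you run it from the root of a SHOOT checkout so that the `shoot/` scripts resolve as in the commands above.

```python
import subprocess
import sys


def shoot_init_commands(db_path, shoot_dir="shoot"):
    """Build the three initialization commands, in the order given above."""
    py = sys.executable  # reuse the interpreter running this script
    return [
        [py, f"{shoot_dir}/create_shoot_db.py", db_path, "full"],
        [py, f"{shoot_dir}/create_shoot_db.py", db_path, "profiles"],
        [py, f"{shoot_dir}/bifurcating_trees.py", db_path],
    ]


def run_shoot_init(db_path, shoot_dir="shoot"):
    """Run the commands one after another, stopping at the first failure."""
    for cmd in shoot_init_commands(db_path, shoot_dir):
        subprocess.run(cmd, check=True)
```

With `check=True`, a failing step raises `subprocess.CalledProcessError`, so a later step never runs on a half-built database.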