Welcome to the official repository of ROADIES, a novel pipeline designed to infer species phylogenetic trees directly from raw genome assemblies. ROADIES offers a fully automated, easy-to-use, scalable solution that eliminates error-prone manual steps and lets you adjust the tradeoff between accuracy and runtime.
For more detailed information on all the features and settings of ROADIES, please refer to our Wiki.
To run ROADIES using DockerHub, follow these steps:
docker pull ang037/roadies:latest
docker run -it ang037/roadies:latest
First, clone the repository (requires git to be installed on the system):
git clone https://github.com/TurakhiaLab/ROADIES.git
cd ROADIES
Then build and run the Docker container:
docker build -t roadies_image .
docker run -it roadies_image
First, clone the repository:
git clone https://github.com/TurakhiaLab/ROADIES.git
cd ROADIES
Then, execute the installation script:
chmod +x roadies_env.sh
source roadies_env.sh
This will install and build all tools and dependencies. Once the setup is complete, it will print "Setup complete" in the terminal and activate the roadies_env environment with all Conda packages installed.
To run this script, ensure the required dependencies are installed, including the wget and unzip commands. For Ubuntu, you can install these dependencies with:
sudo apt-get install -y wget unzip make g++ python3 python3-pip python3-setuptools git default-jre libgomp1 libboost-all-dev cmake
Note: If you encounter issues with the Boost library, add its path to $CPLUS_LIBRARY_PATH and save it in ~/.bashrc.
Once setup is done, you can run the ROADIES pipeline using the provided test dataset. Follow these steps for a 16-core machine:
mkdir -p test/test_data && cat test/input_genome_links.txt | xargs -I {} sh -c 'wget -O test/test_data/$(basename {}) {}'
python run_roadies.py --cores 16
The first command downloads the 11 Drosophila genomic datasets (links provided in test/input_genome_links.txt) and saves them in the test/test_data directory. The second command runs ROADIES on those 11 Drosophila genomes and, upon completion, saves the final Newick tree as roadies.nwk in a separate ROADIES/output_files folder.
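To make the file naming in the download step easier to follow, here is a minimal Python sketch of the same mapping from links to local files. It only plans the downloads and does not fetch anything. The function name `plan_downloads` and the example URLs are hypothetical and not part of ROADIES; the destination directory is the one used in the command above.

```python
from pathlib import PurePosixPath


def plan_downloads(links_text, dest_dir="test/test_data"):
    """Map each non-empty line (a URL) to the local path it would be saved under."""
    plan = []
    for line in links_text.splitlines():
        url = line.strip()
        if not url:
            continue  # ignore blank lines
        # Mirror `basename` in the shell command: keep the last path component.
        filename = url.rstrip("/").rsplit("/", 1)[-1]
        plan.append((url, str(PurePosixPath(dest_dir) / filename)))
    return plan


# Hypothetical URLs, for illustration only.
example_links = (
    "https://example.org/genomes/speciesA.fa.gz\n"
    "https://example.org/genomes/speciesB.fa.gz\n"
)
for url, target in plan_downloads(example_links):
    print(url, "->", target)
```

Passing the contents of test/input_genome_links.txt to this function should list the files that the wget command is expected to create, which can help when checking for incomplete downloads.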
To run ROADIES with your own datasets, follow these steps:
Edit the config.yaml file to include the path to your input datasets under the GENOMES parameter. Ensure all input genomic assemblies are in .fa or .fa.gz format and named according to the species' name (e.g., Aardvark.fa). Note: Each file should contain the genome assembly of one unique species. If a file contains multiple species, split it into individual genome files (faSplit can be used: faSplit byname <input_dir> <output_dir>).
Configure Other Parameters: Adjust other parameters in config.yaml as needed. Detailed information on each parameter is available in the Usage section.
Run the Pipeline: Execute the pipeline with the following command (example for 16 cores):
python run_roadies.py --cores 16
The output species tree in Newick format will be saved as roadies.nwk in the output_files folder.
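Before launching a run on your own data, it may help to check that the input directory follows the naming rules from the steps above (one .fa or .fa.gz file per species, named after the species). The sketch below is one way to do that. It is not part of ROADIES, and the helper names `species_name` and `check_genome_dir` are hypothetical.

```python
from pathlib import Path

# Longest suffix first, so "Aardvark.fa.gz" is not mistaken for a ".gz" file.
ALLOWED_SUFFIXES = (".fa.gz", ".fa")


def species_name(filename):
    """Return the species name implied by a genome file name, or None if the extension is not accepted."""
    for suffix in ALLOWED_SUFFIXES:
        if filename.endswith(suffix) and len(filename) > len(suffix):
            return filename[: -len(suffix)]
    return None


def check_genome_dir(genome_dir):
    """Split the entries of genome_dir into {species: filename} and a list of rejected file names."""
    accepted, rejected = {}, []
    for path in sorted(Path(genome_dir).iterdir()):
        name = species_name(path.name)
        # Reject unsupported extensions and a second file for the same species.
        if name is None or name in accepted:
            rejected.append(path.name)
        else:
            accepted[name] = path.name
    return accepted, rejected
```

Calling `check_genome_dir` on the directory listed under GENOMES returns the species it recognizes and any files that would need renaming, converting, or splitting first. It does not inspect file contents, so it cannot detect a file that holds more than one species.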
ROADIES supports multiple modes of operation (fast, balanced, accurate) that control the accuracy-runtime tradeoff. Use any one of the following commands to select a mode (accurate mode is the default):
python run_roadies.py --cores 16 --mode accurate
python run_roadies.py --cores 16 --mode balanced
python run_roadies.py --cores 16 --mode fast
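If you launch ROADIES from another script, a small wrapper can validate the mode before starting a long run. This is a sketch under assumptions: it uses only the --cores and --mode flags shown above, and the function name `roadies_command` is hypothetical rather than part of ROADIES.

```python
MODES = ("accurate", "balanced", "fast")


def roadies_command(cores, mode="accurate"):
    """Build the argument list for one of the run_roadies.py invocations shown above."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    return ["python", "run_roadies.py", "--cores", str(cores), "--mode", mode]


print(" ".join(roadies_command(16, "balanced")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, so that a failed run raises an error instead of passing silently.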
We welcome contributions from the community. If you encounter any issues or have suggestions for improvement, please open an issue on GitHub. For general inquiries and support, reach out to our team.
If you use ROADIES in your research or publications, please cite the following paper:
Gupta A, Mirarab S, Turakhia Y, (2024). Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES. bioRxiv. https://www.biorxiv.org/content/10.1101/2024.05.27.596098v1
The output files containing the gene trees and species trees generated by ROADIES have been deposited in Dryad. To access them, please refer to this link (Note: the dataset submission is undergoing review and a permanent link will be posted once available).