Arborist builds trees for the IEDB. The trees are used for the user interface on https://iedb.org and the IEDB curation interface, and also for validating IEDB data. They combine data from the IEDB with community ontologies such as the NCBI Taxonomy and open scientific databases such as UniProt and GenBank.
WARN: This version of Arborist is still a work in progress. It makes extensive use of Nanobot, which is also a work in progress.
The Makefile defines and documents all the specific steps for Arborist. Run make help to see the list of main tasks.

You can run make either directly or inside a Docker container. For Docker, run ./run_image.sh make or sudo -E ./run_image.sh make. If you aren't using Docker, first install the required software by running make deps.
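The two ways of invoking make described above can be wrapped in a small helper. This is a minimal sketch and not part of Arborist: the function name arborist_make and the USE_DOCKER and USE_SUDO variables are assumptions made for illustration; only make, ./run_image.sh, and sudo -E come from the text above. The helper prints the command instead of running it, so it can be inspected first.

```shell
#!/bin/sh
# Hypothetical helper (not part of Arborist): print the command line
# for a given Make task.
# USE_DOCKER=1 selects the Docker wrapper; USE_SUDO=1 prefixes sudo -E.
arborist_make() {
  if [ "${USE_DOCKER:-0}" = "1" ]; then
    if [ "${USE_SUDO:-0}" = "1" ]; then
      echo "sudo -E ./run_image.sh make $*"
    else
      echo "./run_image.sh make $*"
    fi
  else
    echo "make $*"
  fi
}

# Show the command that would list the main tasks.
arborist_make help
```

To execute the printed command instead of only displaying it, pass it to eval, for example eval "$(USE_DOCKER=1 arborist_make all)".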
NOTE: Arborist currently supports only Linux on the x86_64 architecture.
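Because only Linux on x86_64 is supported, it can help to check the platform before installing anything. The following is a sketch under stated assumptions: platform_supported is a hypothetical function name, and the check relies only on the standard uname utility, not on anything Arborist itself provides.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeed only on Linux/x86_64.
# Both arguments are optional and default to the values reported by uname.
platform_supported() {
  os="${1:-$(uname -s)}"
  arch="${2:-$(uname -m)}"
  [ "$os" = "Linux" ] && [ "$arch" = "x86_64" ]
}

if platform_supported; then
  echo "platform OK"
else
  echo "unsupported platform: $(uname -s) $(uname -m)" >&2
fi
```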
The suggested workflow is:

1. Run src/iedb/update-cache. This requires MySQL/MariaDB connection parameters to be set as IEDB_MYSQL_* environment variables: IEDB_MYSQL_HOST, IEDB_MYSQL_PORT, IEDB_MYSQL_USER, IEDB_MYSQL_PASSWORD, IEDB_MYSQL_DATABASE.
2. Run make all to build all trees.
3. Run make serve to start the web interface on http://localhost:3000.

These are the key Make tasks for building trees, in their dependency order:
- make iedb: load IEDB data. This runs the src/iedb/update-cache script.
- make ncbitaxon: build the NCBI Taxonomy
- make organism: build the organism and subspecies trees. This also creates the list of "active species" used by IEDB, and the "active taxa" that fall under these species.
- make proteome: select a proteome for each active species
- make protein: build the protein tree
- make all: build all trees

TODO: build more trees: peptide, molecule, assay, disease, geolocation, ...
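Loading IEDB data depends on the five IEDB_MYSQL_* connection variables, and the tree-building tasks depend on each other in the order listed. The sketch below checks the variables and steps through the tasks one at a time. It is illustrative only: the function names missing_iedb_vars and build_in_order and the MAKE_CMD override are assumptions, not part of Arborist; the variable and task names are taken from the text above.

```shell
#!/bin/sh
# Print the name of every required IEDB_MYSQL_* variable that is unset or empty.
missing_iedb_vars() {
  for name in IEDB_MYSQL_HOST IEDB_MYSQL_PORT IEDB_MYSQL_USER \
              IEDB_MYSQL_PASSWORD IEDB_MYSQL_DATABASE; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
}

# Run the tree-building tasks one at a time, in dependency order,
# stopping at the first failure. MAKE_CMD is a hypothetical override,
# for example MAKE_CMD="echo make" for a dry run.
build_in_order() {
  for task in iedb ncbitaxon organism proteome protein; do
    ${MAKE_CMD:-make} "$task" || return 1
  done
}

# Report missing variables; nothing is built at this point.
missing=$(missing_iedb_vars)
if [ -n "$missing" ]; then
  echo "Set these variables before loading IEDB data:" >&2
  echo "$missing" >&2
else
  echo "IEDB connection variables are set; build_in_order can be called."
fi
```

Running make all remains the simplest way to build everything. Stepping through the tasks individually is mainly useful for seeing which stage fails.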
Here are some other important Make tasks:

- make deps: install required software
- make serve: run the web interface on http://localhost:3000
- make clean: remove all build files
- make clobber: remove all generated files
- make help: print this message

The files are organized as follows:

- bin/: contains any required binaries that aren't already installed
- build/: all sorts of generated files
  - iedb/: selected tables from IEDB for use here
  - arborist/: general build files
  - <species_id>/: species-specific build files
- cache/: compressed data from various sources
  - iedb/: selected tables from IEDB
  - ncbitaxon/: NCBI Taxonomy's taxdmp.zip files
- current/: links to the cached data to use for builds
  - iedb: links to a subdirectory of cache/iedb/
  - taxdmp.zip: links to a file in cache/ncbitaxon/
- result/: TODO date-stamped directories of results, and latest link
- src/
  - iedb/: config and schemas for IEDB data
  - arborist/: config and schemas for Arborist tables
  - species/: config and schemas for species proteomes and protein trees
  - organism/: scripts for building the organism tree
  - proteome/: scripts for selecting proteomes
  - util/: utility scripts for working with databases
  - templates/: Nanobot HTML templates
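The current/ directory described above holds links into cache/, which is what selects the cached data a build will use. How Arborist itself maintains these links is not stated here, so the following is only a sketch of the idea: link_current is a hypothetical helper, and the snapshot and file names passed to it are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: point current/ at one cached IEDB snapshot and one
# cached taxdmp.zip.
# Arguments: root directory, name of a subdirectory of cache/iedb/,
# name of a file in cache/ncbitaxon/.
link_current() {
  root="$1"
  iedb_version="$2"
  taxdmp_file="$3"
  mkdir -p "$root/current"
  # Relative targets keep the links valid if the whole tree is moved.
  # -n replaces an existing link instead of descending into its target.
  ln -sfn "../cache/iedb/$iedb_version" "$root/current/iedb"
  ln -sfn "../cache/ncbitaxon/$taxdmp_file" "$root/current/taxdmp.zip"
}

# Example call (placeholder names):
#   link_current . some-iedb-snapshot some-taxdmp.zip
```

Calling the helper again with a different snapshot name repoints the links without touching the cached data.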