A Federated Index of Virus Metadata and Hyperdata in Public Repositories
Status: Extensible DRAFT API
https://test.pypi.org/project/viral-index/
Requirements:
python3
Install the viral-index module
python3 -m venv .env
source .env/bin/activate
pip install -q --extra-index-url https://test.pypi.org/simple/ viral-index
Configure BigQuery access credentials
Using this API requires access to GCP BigQuery. To set up authentication, follow the instructions in the section "Setting up authentication" on this page. Note: when prompted to save the JSON file with your key, we suggest saving it under a filename without spaces. That makes it easier to set the GOOGLE_APPLICATION_CREDENTIALS environment variable :)
N.B.: You may be charged for using this API. Please learn more about BigQuery pricing.
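A quick way to catch a misconfigured credentials variable before any query is sent is to check it from Python first. This is a minimal sketch using only the standard library; the helper name check_credentials is hypothetical and is not part of viral-index.

```python
import os
from pathlib import Path


def check_credentials(env_var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Return the credentials path if the variable is set and the file exists.

    Hypothetical helper (not part of viral-index). Raises RuntimeError
    with a short hint when the variable is unset or the file is missing.
    """
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"{env_var} is not set")
    path = Path(value).expanduser()
    if not path.is_file():
        raise RuntimeError(f"{env_var} points to a missing file: {path}")
    return path
```

Calling check_credentials() before creating the client turns a late authentication failure into an early, readable error.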
>>> from viral_index.client import ViralIndex
>>> viral_client = ViralIndex()
>>> cdd_id = 165276
>>> runs = viral_client.get_SRAs_where_CDD_is_found(cdd_id)
>>> print([r for r in runs])
['SRR2187433', 'SRR533343', 'ERR1915143']
>>>
>>> pig_taxid = 9823
>>> viruses = viral_client.get_viruses_for_host_taxonomy(pig_taxid)
>>> if viruses is not None:
...     for virus in viruses:
...         print(virus)
...
['Rotavirus C', 36427]
['Porcine rubulavirus', 53179]
['Porcine associated porprismacovirus 7', 2170123]
['Porcine enterovirus b/BEL/15V010', 2017720]
[...]
>>>
>>> spacer_seqs=viral_client.get_spacer_seqs(1915496)
>>> print([s for s in spacer_seqs])
[['112', 'CAGCCATCCGCGACGCCACGACAGCGGCCGAGAGTGT', 'GCF_002508705', 'GTDB'], ['1', 'AATCAGCCCGTCGGGGTAGCCAGGGACGCCCTCCA', 'GCF_002508705', 'GTDB'],
[...]
>>> spacer_seq='CACGAGTGCGAAGCATCCAATCCATATGACTACAT'
>>> spacer_tax_ids=viral_client.get_taxid_from_spacer_seq(str(spacer_seq))
>>> print([t for t in spacer_tax_ids])
[['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915496], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915507], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915502], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915504], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915506], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915510], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915499], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915512], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915500], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915495], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915498], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915505], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915508], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915503]]
Additional sample code can be found in python/sample-viral-index-access.py.
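The rows returned by these calls are plain lists, so they are easy to post-process. The sketch below assumes the row layouts shown in the sample output above ([name, taxid] for viruses, and the taxid as the last element of each spacer row); the helper names are hypothetical and the data is copied from the example output rather than fetched from BigQuery.

```python
def viruses_by_name(rows):
    """Map virus name -> taxid, assuming [name, taxid] rows."""
    return {name: taxid for name, taxid in rows}


def unique_taxids(rows):
    """Collect distinct taxids, assuming the taxid is the last element of each row."""
    return sorted({row[-1] for row in rows})


# Rows copied from the sample output above (truncated for brevity).
virus_rows = [
    ["Rotavirus C", 36427],
    ["Porcine rubulavirus", 53179],
]
spacer = "CACGAGTGCGAAGCATCCAATCCATATGACTACAT"
spacer_rows = [
    ["31", spacer, "GCF_002508705", "GTDB", 1915496],
    ["31", spacer, "GCF_002508705", "GTDB", 1915507],
    ["31", spacer, "GCF_002508705", "GTDB", 1915502],
]

print(viruses_by_name(virus_rows)["Rotavirus C"])  # 36427
print(unique_taxids(spacer_rows))  # [1915496, 1915502, 1915507]
```

If the row layout differs in a later version of the API, only the two helpers need to change.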
If you get an error like the one below, it's likely that you don't have BigQuery configured properly for your project. See step 2 in developer instructions above.
Access Denied: Project {YOUR_PROJECT_HERE}:
User does not have bigquery.jobs.create permission in project
{YOUR_PROJECT_HERE}
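In a script, this failure can be turned into a more actionable message. The sketch below only inspects the error text for the permission name shown above. The function explain_bigquery_error is hypothetical, and the exact exception type raised by the BigQuery client is not assumed here.

```python
def explain_bigquery_error(exc):
    """Return a hint if the exception looks like the permission error above.

    Hypothetical helper: it matches on the message text only and
    returns None for any other error.
    """
    if "bigquery.jobs.create" in str(exc):
        return (
            "BigQuery rejected the query: the account in "
            "GOOGLE_APPLICATION_CREDENTIALS lacks bigquery.jobs.create "
            "in the target project."
        )
    return None


# Example with a stand-in exception carrying the message shown above.
err = Exception("Access Denied: User does not have bigquery.jobs.create permission")
print(explain_bigquery_error(err))
```

Wrapping client calls in try/except and passing the caught exception to this helper keeps the original traceback available while adding the hint.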
make: Run sudo apt-get -y -m update && sudo apt-get install -y make or the equivalent command for your system.
python3
git clone https://github.com/NCBI-Codeathons/The_Virus_Index.git
make .env
source .env/bin/activate
export GOOGLE_APPLICATION_CREDENTIALS=${PATH_TO_CREDENTIALS_JSON_FILE}
The client class is viral_index.client.ViralIndex.
Automated testing is available in TravisCI.
The Makefile has several targets that may be helpful:
.env: initializes the Python virtual environment.
check_bq: checks command line access to BigQuery (tool availability and authentication).
check_python_syntax: checks the syntax of Python scripts in this repo.
check_taxadb: checks that taxadb was properly installed.
check_api: checks that the API can be retrieved from PyPI, runs demo script.
init_taxadb: initializes and configures taxadb (needed for the taxonomy utilities).
deploy: builds a tarball for distribution and uploads it to test.pypi.org (requires twine, contact @christiam).
setup_bigquery_authentication: sample command lines to set up authentication for BigQuery.
The module's version is stored in setup.py.
(Assumes bash and Linux)
make init_taxadb (this will take about 2-3 minutes).
source .env/bin/activate
export TAXADB_CONFIG=${PWD}/etc/taxadb.cfg
python/name2taxid.py: takes scientific names on standard input or input files (spelling is significant) and outputs NCBI taxonomy IDs.
python/taxid2lineage.py: takes NCBI taxonomy IDs on standard input (or input files) and outputs the lineage for that given taxid.
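Because both utilities read from standard input, they can also be driven from another Python script. The following is a rough sketch using subprocess. It assumes one entry per input line and that the scripts can be run with the current Python interpreter, neither of which is confirmed here. The helper name run_on_stdin is hypothetical.

```python
import subprocess
import sys


def run_on_stdin(script, lines):
    """Run a Python script with the given lines on stdin; return its stdout lines.

    Hypothetical helper. Raises subprocess.CalledProcessError if the
    script exits with a non-zero status.
    """
    result = subprocess.run(
        [sys.executable, script],
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


# Intended use (requires the repo checkout and TAXADB_CONFIG to be set):
# taxids = run_on_stdin("python/name2taxid.py", ["Sus scrofa"])
```

The output format of the two scripts is not specified above, so the returned lines are left unparsed.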