NCBI-Codeathons / The_Virus_Index

A Federated Index of Virus Metadata and Hyperdata in Public Repositories
MIT License

The_Virus_Index

API

Status: Extensible DRAFT API

https://test.pypi.org/project/viral-index/

Requirements:

Developer instructions

  1. Install the viral-index module

    python3 -m venv .env
    source .env/bin/activate
    pip install -q --extra-index-url https://test.pypi.org/simple/ viral-index 
  2. Configure BigQuery access credentials

Using this API requires access to GCP BigQuery. To set up authentication, follow the instructions in the section "Setting up authentication" on this page. Note: when prompted to save the downloaded JSON key file, we suggest saving it under a filename without spaces. That makes it easier to set the GOOGLE_APPLICATION_CREDENTIALS environment variable :)

N.B.: You may be charged for using this API. Please learn more about BigQuery pricing.
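
Before creating a client, it can help to confirm that the credentials variable from step 2 is set and points at a readable JSON file. The following is a minimal sketch using only the Python standard library; the helper name `check_credentials_file` is hypothetical and not part of the viral-index module, and it does not verify that the key actually grants BigQuery access.

```python
import json
import os


def check_credentials_file(env=os.environ):
    """Return the credentials path if it looks usable, else raise ValueError.

    Hypothetical helper (not part of viral-index): it only checks that the
    variable is set and that the file exists and parses as JSON.
    """
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise ValueError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise ValueError(f"No file found at {path!r}")
    with open(path, encoding="utf-8") as handle:
        try:
            json.load(handle)
        except json.JSONDecodeError as err:
            raise ValueError(f"{path!r} is not valid JSON: {err}") from err
    return path


if __name__ == "__main__":
    try:
        print("Credentials file looks usable:", check_credentials_file())
    except ValueError as problem:
        print("Credentials problem:", problem)
```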

  3. Write your code to access the index!

Sample code

>>> from viral_index.client import ViralIndex
>>> viral_client = ViralIndex()
>>> cdd_id = 165276
>>> runs = viral_client.get_SRAs_where_CDD_is_found(cdd_id)
>>> print([r for r in runs])
['SRR2187433', 'SRR533343', 'ERR1915143']
>>> 

>>> pig_taxid = 9823
>>> viruses = viral_client.get_viruses_for_host_taxonomy(pig_taxid)
>>> if viruses is not None:
...     for virus in viruses:
...         print(virus)
...
['Rotavirus C', 36427]
['Porcine rubulavirus', 53179]
['Porcine associated porprismacovirus 7', 2170123]
['Porcine enterovirus b/BEL/15V010', 2017720]
[...]
>>>
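
The rows printed by the host-taxonomy example above can be turned into a lookup table. This is a rough sketch that assumes each row is a two-element `[name, taxonomy ID]` pair, as the sample output suggests; `index_viruses_by_taxid` is a hypothetical helper, and the rows are copied from the output above so the snippet runs without BigQuery access.

```python
def index_viruses_by_taxid(rows):
    """Map taxonomy ID -> virus name for rows shaped like [name, taxid].

    Hypothetical helper; the row shape is an assumption based on the
    sample output, not on documented behavior of the client.
    """
    by_taxid = {}
    for name, taxid in rows:
        by_taxid[int(taxid)] = name
    return by_taxid


# Rows copied from the sample output above.
sample_rows = [
    ["Rotavirus C", 36427],
    ["Porcine rubulavirus", 53179],
    ["Porcine associated porprismacovirus 7", 2170123],
    ["Porcine enterovirus b/BEL/15V010", 2017720],
]

lookup = index_viruses_by_taxid(sample_rows)
print(lookup[36427])  # prints: Rotavirus C
```

With a live client, the same helper could be fed the result of `viral_client.get_viruses_for_host_taxonomy(pig_taxid)` once it has been checked for `None`.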

>>> spacer_seqs=viral_client.get_spacer_seqs(1915496)
>>> print([s for s in spacer_seqs])
[['112', 'CAGCCATCCGCGACGCCACGACAGCGGCCGAGAGTGT', 'GCF_002508705', 'GTDB'], ['1', 'AATCAGCCCGTCGGGGTAGCCAGGGACGCCCTCCA', 'GCF_002508705', 'GTDB'],
[...]

>>> spacer_seq='CACGAGTGCGAAGCATCCAATCCATATGACTACAT'
>>> spacer_tax_ids=viral_client.get_taxid_from_spacer_seq(str(spacer_seq))
>>> print([t for t in spacer_tax_ids])
[['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915496], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915507], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915502], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915504], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915506], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915510], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915499], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915512], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915500], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915495], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915498], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915505], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915508], ['31', 'CACGAGTGCGAAGCATCCAATCCATATGACTACAT', 'GCF_002508705', 'GTDB', 1915503]]
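
The spacer lookup above returns one row per match, so the same spacer repeats with a different final value each time. A small sketch for collecting those final values follows. It assumes the last element of each row is the taxonomy ID, which is a guess based on the function name and the sample output; `unique_last_column` is a hypothetical helper, and the rows are a subset of the output above.

```python
def unique_last_column(rows):
    """Return the sorted, de-duplicated last element of every row."""
    return sorted({row[-1] for row in rows})


# A few rows copied from the sample output above (one repeated on purpose).
spacer = "CACGAGTGCGAAGCATCCAATCCATATGACTACAT"
sample_rows = [
    ["31", spacer, "GCF_002508705", "GTDB", 1915496],
    ["31", spacer, "GCF_002508705", "GTDB", 1915507],
    ["31", spacer, "GCF_002508705", "GTDB", 1915502],
    ["31", spacer, "GCF_002508705", "GTDB", 1915496],
]

tax_ids = unique_last_column(sample_rows)
print(tax_ids)  # prints: [1915496, 1915502, 1915507]
```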

Additional sample code can be found in python/sample-viral-index-access.py.

Troubleshooting

  1. If you get an error like the one below, BigQuery is probably not configured properly for your project. See step 2 in the developer instructions above.

    Access Denied: Project {YOUR_PROJECT_HERE}:
    User does not have bigquery.jobs.create permission in project
    {YOUR_PROJECT_HERE}
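
To surface this case more clearly in a script, the error text can be inspected before re-raising. The sketch below is only a heuristic: it matches on the message shown above rather than on a specific exception class, since the class raised by the client is not stated here. Both function names are hypothetical and not part of the viral-index module.

```python
def is_bigquery_permission_error(error):
    """Heuristic: does the error text mention the missing job-creation permission?"""
    text = str(error)
    return "Access Denied" in text and "bigquery.jobs.create" in text


def run_query_with_hint(query_fn, *args):
    """Call query_fn(*args); print a pointer to step 2 if access is denied.

    The original exception is always re-raised so callers still see it.
    """
    try:
        return query_fn(*args)
    except Exception as error:  # the concrete exception class is not assumed
        if is_bigquery_permission_error(error):
            print("BigQuery access is not configured; "
                  "see step 2 of the developer instructions.")
        raise
```

For example, `run_query_with_hint(viral_client.get_SRAs_where_CDD_is_found, 165276)` would behave like the direct call, with the extra hint printed on an access-denied failure.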

Maintainer instructions

Maintainer dependencies

  1. make: Run sudo apt-get -y -m update && sudo apt-get install -y make or the equivalent command for your system.
  2. python3
  3. GCP SDK

Instructions

  1. Check out the source code: git clone https://github.com/NCBI-Codeathons/The_Virus_Index.git
  2. Set up the Python virtual environment: make .env
  3. Activate the Python virtual environment: source .env/bin/activate
  4. Set up the GCP credentials: export GOOGLE_APPLICATION_CREDENTIALS=${PATH_TO_CREDENTIALS_JSON_FILE}
  5. Write code that uses viral_index.client.ViralIndex

Automated testing is available in Travis CI.

The Makefile has several targets that may be helpful:

The module's version is stored in setup.py.

Bonus: Taxonomy utilities

Dependencies

Initialize taxadb and environment

(Assumes bash and Linux)

  1. Download and set up taxadb: Run make init_taxadb (this will take about 2-3 minutes).
  2. Activate the Python virtual environment: Run source .env/bin/activate
  3. Set environment variable: export TAXADB_CONFIG=${PWD}/etc/taxadb.cfg

Available tools

Future work