gks-anvil / vrs_anvil_toolkit

Extract clinical variant interpretations from VCF using GA4GH VRS IDs
MIT License

Add a documented script to download and set up seqrepo and vrs-python #1

Closed (bwalsh closed 9 months ago)

bwalsh commented 10 months ago

See biocommons, specifically the local read-only archive.

Constraints: In the Terra environment, /usr/local/share is read-only, so seqrepo needs to be installed in a user-controlled directory, e.g. ~/seqrepo. This directory should not be hard-coded.
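
One way to keep the directory from being hard-coded is to resolve it at run time. The following is a minimal sketch, assuming the SEQREPO_ROOT variable exported in the rough steps is the override and ~/seqrepo is only a fallback. The function name resolve_seqrepo_root is hypothetical and not part of seqrepo or vrs-python.

```python
import os
from pathlib import Path


def resolve_seqrepo_root(env=None, default="~/seqrepo"):
    """Return the seqrepo root directory as a Path.

    Assumption: SEQREPO_ROOT (the variable exported in the rough steps)
    overrides the default; an unset or empty value falls back to ~/seqrepo.
    """
    env = os.environ if env is None else env
    raw = env.get("SEQREPO_ROOT") or default
    # Expand a leading "~" so the result is usable without a shell.
    return Path(raw).expanduser()
```

Passing a plain dict as env makes the lookup easy to exercise in a unit test without touching the real environment.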

Rough steps:

Setup

mkdir -p ~/seqrepo
export SEQREPO_ROOT=~/seqrepo

# can you determine which of these is the best way to maintain 
seqrepo pull -i 2021-01-29 --root-directory ~/seqrepo
seqrepo --root-directory ~/seqrepo pull -i 2021-01-29
seqrepo pull -i 2021-01-29
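
The setup steps above could be wrapped in a small Python script. This is a sketch only: it uses the form that places --root-directory before the pull subcommand, which is an assumption and not an answer to the open question above (check seqrepo --help before relying on it). The function names build_pull_command and pull_snapshot are hypothetical.

```python
import subprocess
from pathlib import Path


def build_pull_command(root, instance):
    """Build the seqrepo command line as a list of arguments.

    Assumption: --root-directory is given before the subcommand,
    matching the second of the three forms listed in the rough steps.
    """
    return ["seqrepo", "--root-directory", str(root), "pull", "-i", instance]


def pull_snapshot(root, instance="2021-01-29"):
    """Create the root directory if needed, then pull the snapshot into it."""
    root = Path(root).expanduser()
    root.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if seqrepo exits with an error.
    subprocess.run(build_pull_command(root, instance), check=True)
    return root / instance
```

Calling pull_snapshot("~/seqrepo") would then return the snapshot directory, ~/seqrepo/2021-01-29 expanded to an absolute path, which is the value the manual test passes to --seqrepo_root_dir.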

Testing

Manual testing

curl https://raw.githubusercontent.com/ga4gh/vrs-python/main/tests/extras/data/test_vcf_input.vcf --output test_vcf_input.vcf
python3 -m ga4gh.vrs.extras.vcf_annotation --vcf_in test_vcf_input.vcf --vcf_out output.vcf.gz --vrs_pickle_out vrs_objects.pkl --seqrepo_root_dir ~/seqrepo/2021-01-29

pytest

Implement a pytest module that replicates the manual tests above.

e.g. tests/unit/your-test-here.py
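
A pytest module along these lines might replicate the manual test. It is a sketch under several assumptions: the snapshot lives under the directory named by SEQREPO_ROOT (falling back to ~/seqrepo), network access is available to fetch the test VCF, and the test is skipped when no snapshot is installed. The helper name annotation_command is hypothetical; the module path and flags are copied from the manual command above.

```python
import os
import subprocess
import sys
import urllib.request
from pathlib import Path

TEST_VCF_URL = (
    "https://raw.githubusercontent.com/ga4gh/vrs-python/main/"
    "tests/extras/data/test_vcf_input.vcf"
)
SNAPSHOT = "2021-01-29"


def annotation_command(vcf_in, out_dir, seqrepo_dir):
    """Build the same command line as the manual test, as an argument list."""
    out_dir = Path(out_dir)
    return [
        sys.executable, "-m", "ga4gh.vrs.extras.vcf_annotation",
        "--vcf_in", str(vcf_in),
        "--vcf_out", str(out_dir / "output.vcf.gz"),
        "--vrs_pickle_out", str(out_dir / "vrs_objects.pkl"),
        "--seqrepo_root_dir", str(seqrepo_dir),
    ]


def test_vcf_annotation(tmp_path):
    # Imported here so the helper above stays importable without pytest.
    import pytest

    # Assumption: SEQREPO_ROOT points at the seqrepo root, as in the setup steps.
    root = Path(os.environ.get("SEQREPO_ROOT", "~/seqrepo")).expanduser()
    seqrepo_dir = root / SNAPSHOT
    if not seqrepo_dir.is_dir():
        pytest.skip(f"no seqrepo snapshot at {seqrepo_dir}")

    # Download the input VCF, mirroring the curl step of the manual test.
    vcf_in = tmp_path / "test_vcf_input.vcf"
    urllib.request.urlretrieve(TEST_VCF_URL, vcf_in)

    subprocess.run(annotation_command(vcf_in, tmp_path, seqrepo_dir), check=True)

    # Both output files should exist and be non-empty.
    assert (tmp_path / "output.vcf.gz").stat().st_size > 0
    assert (tmp_path / "vrs_objects.pkl").stat().st_size > 0
```

Note that pytest's default discovery only collects files named test_*.py or *_test.py, so the module would need a matching file name to be picked up without extra configuration.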

bwalsh commented 10 months ago

Create a Pull Request, targeting the development branch.

quinnwai commented 9 months ago

Updated the PR to address your comments and simplified some code; it is ready for your re-review.