gks-anvil / vrs_anvil_toolkit

Extract clinical variant interpretations from VCF using GA4GH VRS IDs
MIT License

feature/terra-testing-results-analysis #40

Closed bwalsh closed 7 months ago

bwalsh commented 8 months ago
# delete old log
rm -f state/vrs_anvil.log

# launch in background, send progress bars, etc to /dev/null
export SEQREPO_FD_CACHE_MAXSIZE=25 # 50, 100
nohup vrs_anvil annotate >/dev/null 2>&1  &

# monitor
htop
# or simply
ps -ef | grep annotate

# examine logs and metrics
ls -l state/
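
The launch steps above can also be driven from Python, which may make it easier to try the `SEQREPO_FD_CACHE_MAXSIZE` values noted in the comment (25, 50, 100). This is a minimal sketch, assuming `vrs_anvil` is on `PATH` and the log lives at `state/vrs_anvil.log` as above; `build_env` and `launch_annotate` are hypothetical helpers, not part of the toolkit.

```python
import os
import subprocess
from pathlib import Path


def build_env(fd_cache_maxsize: int) -> dict:
    """Copy the current environment and set the SeqRepo FD cache size."""
    env = dict(os.environ)
    env["SEQREPO_FD_CACHE_MAXSIZE"] = str(fd_cache_maxsize)
    return env


def launch_annotate(fd_cache_maxsize: int = 25,
                    log_path: str = "state/vrs_anvil.log") -> subprocess.Popen:
    """Delete the old log, then start `vrs_anvil annotate` in the background."""
    Path(log_path).unlink(missing_ok=True)  # same effect as `rm -f`
    return subprocess.Popen(
        ["vrs_anvil", "annotate"],
        env=build_env(fd_cache_maxsize),
        stdout=subprocess.DEVNULL,  # discard progress bars, as in the shell version
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # detach from the terminal, roughly like nohup
    )
```

Calling `proc = launch_annotate(50)` returns immediately; `proc.poll()` stays `None` while the job is still running.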
bwalsh commented 8 months ago

Current config (`cat manifest.yaml`):

# Path to the cache directory, defaults to cache/ (relative to the root of the repository)
cache_directory: "cache/"

# Number of workers to use for processing, defaults to 20
num_threads: 40

# Whether to create new VCFs with annotations. FOR FUTURE USE
# annotate_vcfs: false

# where to store the state of the application, log files, etc.
state_directory: "state/"

# where to store temporary files, etc.
work_directory: "work/"

# max lines from a vcf file (optional)
#limit: 100000

# The local file paths or URLs to vcf files to be processed
vcf_files:
  - "/home/jupyter/vrs-python-testing/tests/fixtures/1kGP.chr1.1000.vcf"
  - "gs://fc-2ee2ca2a-a140-48a1-b793-e27badb7945d/high_coverage_3202_samples/1kGP_high_coverage_Illumina.chr1.filtered.SNV_INDEL_SV_phased_panel.vcf.gz"

# installation path
seqrepo_directory: "~/seqrepo/latest"

# normalize the VRS ids
normalize: false

# Control if cache is used
cache_enabled: false

# compute reference allele
compute_for_ref: false
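
The `vcf_files` list above mixes a local path with a `gs://` URL, so a quick pre-flight check can be useful before a long run. The sketch below assumes the manifest has already been parsed into a dict (for example with a YAML loader, not shown); `split_vcf_sources` is a hypothetical helper, not part of `vrs_anvil`.

```python
from urllib.parse import urlparse


def split_vcf_sources(manifest: dict) -> tuple[list[str], list[str]]:
    """Return (local_paths, remote_urls) from the manifest's vcf_files list."""
    local, remote = [], []
    for entry in manifest.get("vcf_files", []):
        scheme = urlparse(entry).scheme
        # Treat bucket and web URLs as remote; everything else as a local path.
        (remote if scheme in {"gs", "http", "https"} else local).append(entry)
    return local, remote


# Hand-built dict mirroring the relevant keys of the manifest above.
manifest = {
    "num_threads": 40,
    "vcf_files": [
        "/home/jupyter/vrs-python-testing/tests/fixtures/1kGP.chr1.1000.vcf",
        "gs://fc-2ee2ca2a-a140-48a1-b793-e27badb7945d/high_coverage_3202_samples/"
        "1kGP_high_coverage_Illumina.chr1.filtered.SNV_INDEL_SV_phased_panel.vcf.gz",
    ],
}
local, remote = split_vcf_sources(manifest)
```

Local entries can then be checked with `os.path.exists`, while remote ones need bucket credentials at run time.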
bwalsh commented 8 months ago

Maintain the run instructions and add them to the documentation and the presentation.

bwalsh commented 8 months ago

feature/metakb is ready for review; see `metakb_directory` in the manifest:

# Path to the cache directory, defaults to cache/ (relative to the root of the repository)
cache_directory: "cache/"

# Number of workers to use for processing, defaults to 20
num_threads: 10

# Whether to create new VCFs with annotations. FOR FUTURE USE
# annotate_vcfs: false

# where to store the state of the application, log files, etc.
state_directory: "state/"

# where to store temporary files, etc.
work_directory: "work/"

# max lines from a vcf file (optional)
limit: 100000

# The local file paths or URLs to vcf files to be processed
vcf_files:
  - "/home/jupyter/vrs-python-testing/tests/fixtures/1kGP.chr1.1000.vcf"
  - "gs://fc-2ee2ca2a-a140-48a1-b793-e27badb7945d/high_coverage_3202_samples/1kGP_high_coverage_Illumina.chr1.filtered.SNV_INDEL_SV_phased_panel.vcf.gz"

# installation path
seqrepo_directory: "~/seqrepo/latest"

# normalize the VRS ids
normalize: false

# Control if cache is used
cache_enabled: false

# compute reference allele
compute_for_ref: false

# required - where we can find CDM files
metakb_directory: "../tests/fixtures/metakb"
bwalsh commented 8 months ago

```
tests/fixtures/metakb$ gsutil cp $WORKSPACE_BUCKET/metakb/*.* .
Copying $WORKSPACE_BUCKET/metakb/civic_cdm_20240103.json...
Copying $WORKSPACE_BUCKET/metakb/civic_cdm_20240103.json.zip...
Copying $WORKSPACE_BUCKET/metakb/moa_cdm_20240103.json...
Copying $WORKSPACE_BUCKET/metakb/moa_cdm_20240103.json.zip...
- [4 files][ 19.7 MiB/ 19.7 MiB]
Operation completed over 4 objects/19.7 MiB.
```
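
After the copy, it may be worth confirming that the CDM files in `metakb_directory` are present and readable. A minimal sketch, assuming the `*_cdm_*.json` naming seen in the listing above; `list_cdm_files` and `check_cdm_files` are hypothetical helpers, not part of the toolkit.

```python
import json
from pathlib import Path


def list_cdm_files(metakb_directory: str) -> list[Path]:
    """Return the *_cdm_*.json files in the directory, sorted by name."""
    # The pattern must match the whole name, so .json.zip archives are skipped.
    return sorted(Path(metakb_directory).glob("*_cdm_*.json"))


def check_cdm_files(metakb_directory: str) -> dict[str, bool]:
    """Map each CDM file name to whether it parses as JSON."""
    results = {}
    for path in list_cdm_files(metakb_directory):
        try:
            with path.open() as handle:
                json.load(handle)
            results[path.name] = True
        except json.JSONDecodeError:
            results[path.name] = False
    return results
```

For example, `check_cdm_files("../tests/fixtures/metakb")` returns one entry per CDM JSON file, with `False` marking any file that is truncated or otherwise not valid JSON.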
bwalsh commented 8 months ago
$ cat manifest.yaml
# Path to the cache directory, defaults to cache/ (relative to the root of the repository)
cache_directory: "cache/"

# Number of workers to use for processing, defaults to 20
num_threads: 10

# Whether to create new VCFs with annotations. FOR FUTURE USE
# annotate_vcfs: false

# where to store the state of the application, log files, etc.
state_directory: "state/"

# where to store temporary files, etc.
work_directory: "work/"

# max lines from a vcf file (optional)
#limit: 100000

# The local file paths or URLs to vcf files to be processed
vcf_files:
  - "/home/jupyter/vrs-python-testing/tests/fixtures/1kGP.chr1.1000.vcf"
  - "gs://fc-2ee2ca2a-a140-48a1-b793-e27badb7945d/high_coverage_3202_samples/1kGP_high_coverage_Illumina.chr1.filtered.SNV_INDEL_SV_phased_panel.vcf.gz"

# installation path
seqrepo_directory: "~/seqrepo/latest"

# normalize the VRS ids
normalize: false

# Control if cache is used
cache_enabled: false

# compute reference allele
compute_for_ref: false

# required - where we can find CDM files
metakb_directory: "../tests/fixtures/metakb"