Clinical-Genomics / scout

VCF visualization interface
https://clinical-genomics.github.io/scout
BSD 3-Clause "New" or "Revised" License
150 stars, 46 forks

Partially fix #4802 - update demo VCF header with missing keys #4803

Closed: dnil closed this 4 weeks ago

dnil commented 1 month ago

This PR adds a functionality or fixes a bug.

(scout312) ➜  demo git:(main) ✗ diff 643594.clinical.vcf.orig 643594.clinical.vcf
113a114,118
> ##INFO=<ID=AZ,Number=1,Type=Flag,Description="Autozygous positon call">
> ##INFO=<ID=AZLENGTH,Number=1,Type=String,Description="Autozygous region length">
> ##INFO=<ID=AZMARKERS,Number=1,Type=String,Description="Autozygous region length">
> ##INFO=<ID=AZQUAL,Number=1,Type=String,Description="Autozygous positon call quality">
> ##INFO=<ID=AZTYPE,Number=1,Type=String,Description="Autozygous region type">
202a208,211
> ##INFO=<ID=SWEGENAC_Hemi,Number=A,Type=Integer,Description="Allele counts in hemizygous genotypes">
> ##INFO=<ID=SWEGENAC_Het,Number=A,Type=Integer,Description="Allele counts in heterozygous genotypes">
> ##INFO=<ID=SWEGENAC_Hom,Number=A,Type=Integer,Description="Allele counts in homozygous genotypes">
> ##INFO=<ID=SWEGENAF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
223,224d231
< ##INFO=<ID=AZLENGTH,Number=1,Type=String,Description="Autozygous region length">
< ##INFO=<ID=AZQUAL,Number=1,Type=String,Description="Autozygous positon call quality">
225a233,235
> ##INFO=<ID=HmtVar,Number=A,Type=String,Description="HmtVar ID of the variant (can be used to view the related VariantCard on https://www.hmtvar.uniba.it/varCard/<HmtVarID>)">
> ##INFO=<ID=Annotation,Number=.,Type=String,Description="Annotates what feature(s) this variant belongs to.">
> ##INFO=<ID=Hom,Number=1,Type=Integer,Description="The number of observed homozygotes (from /rare-disease/references/references_12.0/grch37_loqusdb_snv_indel_variants_export-20240220-.vcf.gz)">
Testing on cg-vm1 server (Clinical Genomics Stockholm)

**Prepare for testing**

1. Make sure the PR is pushed and available on [Docker Hub](https://hub.docker.com/repository/docker/clinicalgenomics/scout-server-stage)
1. First book your testing time using the Pax software available at [https://pax.scilifelab.se/](https://pax.scilifelab.se). The resource you are going to call dibs on is `scout-stage` and the server is `cg-vm1`.
1. `ssh @cg-vm1.scilifelab.se`
1. `sudo -iu hiseq.clinical`
1. `ssh localhost`
1. (optional) Find out which scout branch is currently deployed on cg-vm1: `podman ps`
1. Stop the service with the currently deployed branch: `systemctl --user stop scout.target`
1. Start the scout service with the branch to test: `systemctl --user start scout@`
1. Make sure the branch is deployed: `systemctl --user status scout.target`
1. After testing is done, repeat the procedure at [https://pax.scilifelab.se/](https://pax.scilifelab.se), which will release the allocated resource (`scout-stage`) to be used for testing by other users.

Testing on hasta server (Clinical Genomics Stockholm)

**Prepare for testing**

1. `ssh @hasta.scilifelab.se`
1. Book your testing time using the Pax software: `us; paxa -u -s hasta -r scout-stage`. You can also use the WSGI Pax app available at [https://pax.scilifelab.se/](https://pax.scilifelab.se).
1. (optional) Find out which scout branch is currently deployed on cg-vm1: `conda activate S_scout; pip freeze | grep scout-browser`
1. Deploy the branch to test: `bash /home/proj/production/servers/resources/hasta.scilifelab.se/update-tool-stage.sh -e S_scout -t scout -b `
1. Make sure the branch is deployed: `us; scout --version`
1. After testing is done, repeat the `paxa` procedure, which will release the allocated resource (`scout-stage`) to be used for testing by other users.

How to test:

  1. how to test it, possibly with real cases/data

Expected outcome: The functionality should be working. Take a screenshot and attach it, or copy/paste the output.

Review:

codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 84.21%. Comparing base (f13bc9e) to head (ec46dc5). Report is 1 commit behind head on main.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main    #4803   +/-   ##
=======================================
  Coverage   84.21%   84.21%
=======================================
  Files         318      318
  Lines       19179    19179
=======================================
  Hits        16152    16152
  Misses       3027     3027
```


dnil commented 4 weeks ago

I think I will keep this PR simple by fixing only the "simple" missing keys. The ones that became invalid with VCF 4.3 will also need careful updates in the load functions, to make sure all is well with the new keys and with the same ones on CSQ.

dnil commented 4 weeks ago

Fixed the missing keys, which likely stem from some creative cutting and pasting when the demo file was put together.

Before:

2024-08-30 12:20:15 MacBook-Pro-4.local scout.adapter.mongo.variant_loader[33662] INFO Start inserting clinical snv variants into database
[W::vcf_parse_info] INFO 'AZ' is not defined in the header, assuming Type=String
[W::vcf_parse_info] INFO 'AZMARKERS' is not defined in the header, assuming Type=String
[W::vcf_parse_info] INFO 'AZTYPE' is not defined in the header, assuming Type=String
[W::vcf_parse_info] INFO 'Annotation' is not defined in the header, assuming Type=String
[W::vcf_parse_info] INFO 'Hom' is not defined in the header, assuming Type=String
[W::vcf_parse_info] INFO 'SWEGENAC_Hemi' is not defined in the header, assuming Type=String
[W::vcf_parse_info] INFO 'SWEGENAC_Het' is not defined in the header, assuming Type=String
[W::vcf_parse_info] INFO 'SWEGENAC_Hom' is not defined in the header, assuming Type=String
[W::vcf_parse_info] INFO 'SWEGENAF' is not defined in the header, assuming Type=String
[W::vcf_parse_info] INFO 'HmtVar' is not defined in the header, assuming Type=String
2024-08-30 12:20:18 MacBook-Pro-4.local scout.adapter.mongo.variant_loader[33662] INFO All variants inserted, time to insert variants: 0:00:03.057766

After:

2024-09-02 07:49:28 MacBook-Pro-4.local scout.adapter.mongo.variant_loader[81891] INFO Start inserting clinical snv variants into database
2024-09-02 07:49:31 MacBook-Pro-4.local scout.adapter.mongo.variant_loader[81891] INFO All variants inserted, time to insert variants: 0:00:02.681580
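
The `vcf_parse_info` warnings in the "Before" log appear when a record uses an INFO key that has no `##INFO` line in the header. As a rough way to spot such keys before loading, here is a minimal sketch using only the Python standard library. The function name `undeclared_info_keys` and the three-line example input are hypothetical and not part of Scout; the sketch assumes an uncompressed, tab-separated VCF read as text lines.

```python
import re

# Matches the ID of an INFO declaration such as ##INFO=<ID=AZ,Number=...>
INFO_HEADER = re.compile(r"^##INFO=<ID=([^,>]+)")


def undeclared_info_keys(vcf_lines):
    """Return INFO keys used in records but missing from the header, sorted."""
    declared = set()
    used = set()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            match = INFO_HEADER.match(line)
            if match:
                declared.add(match.group(1))
        elif line.startswith("#") or not line:
            continue  # column header line (#CHROM ...) or blank line
        else:
            columns = line.split("\t")
            if len(columns) < 8 or columns[7] == ".":
                continue  # no INFO column, or INFO is empty
            for entry in columns[7].split(";"):
                if entry:
                    # Flags have no "=", key=value entries are split once
                    used.add(entry.split("=", 1)[0])
    return sorted(used - declared)


# Hypothetical example input, not taken from the demo file
example = [
    '##INFO=<ID=AZLENGTH,Number=1,Type=String,Description="Autozygous region length">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t50\tPASS\tAZ;AZLENGTH=12;SWEGENAF=0.01",
]
print(undeclared_info_keys(example))  # → ['AZ', 'SWEGENAF']
```

For a real file, the lines could come from `open("643594.clinical.vcf")`; an empty result would correspond to the clean "After" log.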

We will still have the initial invalid tag check to deal with, but as stated that is a slightly different problem:

2024-08-30 12:20:14 MacBook-Pro-4.local scout.adapter.mongo.hgnc[33662] INFO Building interval trees...
[W::bcf_hrec_check] Invalid tag name: "1000GAF"
[W::bcf_hrec_check] Invalid tag name: "1000G_MAX_AF"
[W::bcf_hrec_check] Invalid tag name: "GERP++_RS_prediction_term"
[W::bcf_hrec_check] Invalid tag name: "1000GAF"
[W::bcf_hrec_check] Invalid tag name: "1000G_MAX_AF"
[W::bcf_hrec_check] Invalid tag name: "GERP++_RS_prediction_term"

I'll open a new issue for these.
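
For context on the invalid tag warnings: as far as I read the VCF 4.3 specification, INFO keys must match `^([A-Za-z_][0-9A-Za-z_.]*|1000G)$`, so a key may not start with a digit (with `1000G` as the only exception) and may not contain `+`. The sketch below applies that pattern to the three tags from the log. The helper name `invalid_info_keys` is hypothetical, and the exact check htslib performs in `bcf_hrec_check` may differ in detail.

```python
import re

# INFO key pattern as given in the VCF 4.3 specification (assumption:
# htslib's own check may not be identical to this expression)
VALID_INFO_KEY = re.compile(r"^([A-Za-z_][0-9A-Za-z_.]*|1000G)$")


def invalid_info_keys(keys):
    """Return the keys that do not match the VCF 4.3 INFO key pattern."""
    return [key for key in keys if not VALID_INFO_KEY.match(key)]


keys = ["1000GAF", "1000G_MAX_AF", "GERP++_RS_prediction_term", "SWEGENAF"]
print(invalid_info_keys(keys))
# → ['1000GAF', '1000G_MAX_AF', 'GERP++_RS_prediction_term']
```

All three flagged names would need renaming to pass such a check, which is why the load functions and the matching CSQ keys would have to be updated together.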

northwestwitch commented 4 weeks ago

Is this a fix to #4802 ??

I'm answering myself: nope, or not yet?

dnil commented 4 weeks ago

Right, no. This fixes the second thing mentioned in the same issue: the VCF parsing warnings.

sonarcloud[bot] commented 4 weeks ago

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud