broadinstitute / seqr

web-based analysis tool for rare disease genomics
GNU Affero General Public License v3.0

Cannot go from the old clinvar.GRCh37.2020-06-15.ht Hail file to the new clinvar.GRCh37.2020-12-26.ht for on-prem loading #1621

Closed dmcgoldrick closed 3 years ago

dmcgoldrick commented 3 years ago

Describe the bug
I would like to use the updated ClinVar reference data for an on-prem load of my WES VCF, but the new Hail table is not compatible.

Scope of the bug
In all projects

Screenshots
hail.utils.java.FatalError: HailException: incompatible file format when reading: seqr-reference-data/GRCh37/clinvar.GRCh37.2020-12-26.ht supported version: 1.4.0, found 1.5.0

How do I get the pipeline-runner to load the ClinVar table written in Hail file format 1.5.0?

I am getting this when updating the container:

[mcgold@rainier docker-shares]$ docker-compose up -d pipeline-runner
dockershares_elasticsearch_1 is up-to-date
dockershares_pipeline-runner_1 is up-to-date

hanars commented 3 years ago

@mike-w-wilson are we seeing this problem on our end, or is this a local-specific issue?

mike-w-wilson commented 3 years ago

We are not seeing this. We are on hail 0.2.61. This issue can happen when a hail table is being read by an older version of hail. The clinvar hail table causing the issue was likely created on hail >= 0.2.60. @dmcgoldrick can you check what version of hail your pipeline is running? I believe if you upgrade hail to a more recent release, it will fix the issue.
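For anyone reading later, the two numbers in the error message are Hail table file-format versions, not Hail release numbers. A minimal sketch of how to read the message, assuming it keeps the "supported version: X, found Y" wording shown in this issue (the helper names are hypothetical, not part of Hail):

```python
import re

# Matches the two file-format versions in the error text quoted in this issue.
_VERSION_RE = re.compile(r"supported version: (\d+(?:\.\d+)*), found (\d+(?:\.\d+)*)")


def parse_format_versions(message):
    """Return (supported, found) as tuples of ints, or None if the text does not match."""
    match = _VERSION_RE.search(message)
    if match is None:
        return None
    supported, found = (
        tuple(int(part) for part in version.split(".")) for version in match.groups()
    )
    return supported, found


def table_is_newer_than_reader(message):
    """True when the table was written in a newer format than the running Hail can read."""
    versions = parse_format_versions(message)
    if versions is None:
        return False
    supported, found = versions
    return found > supported


error = (
    "HailException: incompatible file format when reading: "
    "seqr-reference-data/GRCh37/clinvar.GRCh37.2020-12-26.ht "
    "supported version: 1.4.0, found 1.5.0"
)
print(parse_format_versions(error))       # ((1, 4, 0), (1, 5, 0))
print(table_is_newer_than_reader(error))  # True
```

When the found version is higher than the supported one, as here, the table is fine and the Hail reading it is too old.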

dmcgoldrick commented 3 years ago

Hi Mike --

Thanks I'll do that next

Daniel


dmcgoldrick commented 3 years ago

Thanks! I ran:

$ docker-compose up -d pipeline-runner
dockershares_elasticsearch_1 is up-to-date

It seems that the docker image is using an older Hail. How do I update my pipeline-runner image to include hail >= 0.2.60?

Isn't Hail installed in the docker image, and so updated when I run docker-compose up -d pipeline-runner?


hanars commented 3 years ago

You are probably using an old docker image, so I would recommend forcing docker to run the latest image. Here is a Stack Overflow question with suggestions for how to do that if you are having trouble: https://stackoverflow.com/questions/37685581/how-to-get-docker-compose-to-use-the-latest-image-from-repository
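As a rough sketch of what forcing the latest image can look like, the helper below builds a docker-compose pull followed by an up with --force-recreate. The function names and the dry-run default are assumptions made for illustration; only the docker-compose subcommands and flags are real.

```python
import subprocess


def refresh_commands(service="pipeline-runner"):
    """Commands that re-pull the image for a service and recreate its container."""
    return [
        ["docker-compose", "pull", service],
        ["docker-compose", "up", "-d", "--force-recreate", service],
    ]


def refresh_service(service="pipeline-runner", dry_run=True):
    """Print each command; run it as well when dry_run is False."""
    for command in refresh_commands(service):
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError if docker-compose fails.
            subprocess.run(command, check=True)


refresh_service()  # dry run: prints the two commands without executing them
```

Note that pull only fetches the tag named in docker-compose.yml, so if the file names a tag that still points at an old image, pulling will not bring in a newer Hail.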

dmcgoldrick commented 3 years ago

I completed the steps from the page that you recommended:

docker-compose stop
docker-compose rm -f
docker-compose -f docker-compose.yml up -d

I then get:

Hail version: 0.2.39-ef87446bd1c7
Error summary: HailException: incompatible file format when reading: seqr-reference-data/GRCh37/clinvar.GRCh37.2020-12-26.ht supported version: 1.4.0, found 1.5.0

That is Hail 0.2.39. Does the latest image really have hail >= 0.2.60?

Can you specify the exact steps to update that image?

This is what I have after the above process:

[mcgold@rainier docker-shares]$ docker image list
REPOSITORY                            TAG          IMAGE ID      CREATED        SIZE
gcr.io/seqr-project/pipeline-runner   latest       f3942ad990ee  3 weeks ago    3.21 GB
gcr.io/seqr-project/postgres          gcloud-prod  36bbd6a134a7  6 months ago   321 MB
gcr.io/seqr-project/redis             gcloud-prod  cf4d24bc8594  6 months ago   104 MB
gcr.io/seqr-project/seqr              gcloud-prod  4de37bd63419  6 months ago   3.49 GB
gcr.io/seqr-project/pipeline-runner   gcloud-prod  e06b26d59bba  9 months ago   3.08 GB
gcr.io/seqr-project/elasticsearch     gcloud-prod  ec5415d3fb0f  10 months ago  657 MB
gcr.io/seqr-project/kibana            gcloud-prod  0eff989c6b98  2 years ago    1.13 GB

hanars commented 3 years ago

I just confirmed that the latest pipeline runner docker image has hail==0.2.61. You can confirm the installed version you have by running docker-compose exec pipeline-runner pip freeze | grep hail

If you do have an out-of-date image, maybe try editing your docker-compose file to explicitly request the most recent tag, i.e. replace gcr.io/seqr-project/pipeline-runner:gcloud-prod with gcr.io/seqr-project/pipeline-runner:f3942ad990ee
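If it helps, the version check can also be scripted. This is a small sketch that reads the text produced by pip freeze and compares the Hail release against the 0.2.60 minimum mentioned earlier in this thread. The function names are hypothetical, and it assumes a plain numeric version such as 0.2.61.

```python
MINIMUM_HAIL = (0, 2, 60)  # release mentioned in this thread as new enough


def hail_version_from_freeze(freeze_output):
    """Return the pinned hail version as a tuple of ints, or None if hail is not listed."""
    for line in freeze_output.splitlines():
        name, separator, version = line.strip().partition("==")
        if separator and name.lower() == "hail":
            # Assumes a purely numeric version string, e.g. "0.2.61".
            return tuple(int(part) for part in version.split("."))
    return None


def hail_is_recent_enough(freeze_output, minimum=MINIMUM_HAIL):
    """True when hail is installed and at least the minimum release."""
    version = hail_version_from_freeze(freeze_output)
    return version is not None and version >= minimum


sample = "decorator==4.4.2\nhail==0.2.61\nnumpy==1.18.1\n"
print(hail_version_from_freeze(sample))         # (0, 2, 61)
print(hail_is_recent_enough(sample))            # True
print(hail_is_recent_enough("hail==0.2.39\n"))  # False
```

The sample text stands in for the output of docker-compose exec pipeline-runner pip freeze; the other package lines in it are made up for the example.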

dmcgoldrick commented 3 years ago

I reproduced the result:

$ docker-compose exec pipeline-runner pip freeze | grep hail
hail==0.2.61

Works - Great! :-)

Then I was explicit in the docker-compose file as suggested, i.e. I replaced gcr.io/seqr-project/pipeline-runner:gcloud-prod with gcr.io/seqr-project/pipeline-runner:f3942ad990ee.

When using that image I see that Hail is now actually the right version - Awesome! :-) :-)

...
Welcome to Hail version 0.2.61-3c86d3ba497a
LOGGING: writing to /hail-20210202-2144-0.2.61-3c86d3ba497a.log

So in the future I'll have to remember to edit my docker-compose.yml to match the image shown by docker image list after running:

docker-compose stop
docker-compose rm -f
docker-compose -f docker-compose.yml up -d

e.g. gcr.io/seqr-project/pipeline-runner latest f3942ad990ee 3 weeks ago 3.21 GB

Once again, thank you hanars and mike-w-wilson :-)

D

hanars commented 3 years ago

Great, glad it worked!