broadinstitute / seqr

web-based analysis tool for rare disease genomics
GNU Affero General Public License v3.0

SeqrMTToESTask fails #1555

Closed davidhemp closed 3 years ago

davidhemp commented 3 years ago

A clean on-prem build, set up as described in the docs, is unable to load a test dataset into ES.

elasticsearch.exceptions.RequestError: TransportError(400, 'mapper_parsing_exception', 'Root mapping definition has unsupported parameters: [variant : {_meta={gencodeVersion=unknown, hail_version=0.2.39, genomeVersion=38, sampleType=WES, sourceFilePath=/input_vcfs/PID_trio.vcf.gz},

This test VCF worked well on the previous installation, and the .mt has been created successfully. It seems to be an ES issue; curl localhost:9200 gives the version as 7.8.1.
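For anyone wanting to script the same version check, here is a minimal sketch. It assumes the node answers on http://localhost:9200 as in the curl call above; the helper names `fetch_es_info` and `es_major_version` are made up for illustration and are not part of seqr.

```python
import json
from urllib.request import urlopen


def es_major_version(info: dict) -> int:
    """Return the major version from the parsed JSON body of GET / on an ES node."""
    # The root endpoint reports the version as a string such as "7.8.1".
    return int(info["version"]["number"].split(".")[0])


def fetch_es_info(url: str = "http://localhost:9200") -> dict:
    """Fetch and parse the root endpoint (URL is an assumption, adjust as needed)."""
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)
```

Calling `es_major_version(fetch_es_info())` against the cluster described here should report major version 7, which matters because several mapping-related defaults changed between major releases.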

JakeHagen commented 3 years ago

I am having the same issue.

hanars commented 3 years ago

This looks like you are using the latest docker image for elasticsearch but an old docker image for the pipeline runner. Try updating everything to the latest version.
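As background on why a version mismatch could produce this message: Elasticsearch 7 no longer accepts a mapping type name (here `variant`) wrapped around the mapping body when an index is created, and reports the wrapper as an unsupported root parameter. The sketch below only illustrates the two body shapes. It is not the pipeline's code, the thread does not say what actually changed in the fixed image, and `untyped_mapping` plus the `contig` field are hypothetical.

```python
# Some top-level keys that a typeless (ES 7 style) mapping may contain.
# This is a partial list, used only to recognise an already-unwrapped body.
ROOT_MAPPING_KEYS = {
    "properties", "_meta", "_source", "_routing", "dynamic", "dynamic_templates",
}


def untyped_mapping(mappings: dict) -> dict:
    """Unwrap a mapping nested under a single type name, e.g. {"variant": {...}}."""
    if len(mappings) == 1:
        ((key, value),) = mappings.items()
        if key not in ROOT_MAPPING_KEYS and isinstance(value, dict):
            return value  # ES 6 style: drop the type-name wrapper
    return mappings  # already typeless, leave untouched


# ES 6 style body, with the type name taken from the error message above.
typed = {
    "variant": {
        "_meta": {"genomeVersion": "38"},  # illustrative value
        "properties": {"contig": {"type": "keyword"}},  # hypothetical field
    }
}
typeless = untyped_mapping(typed)
```

Here `typeless` holds `_meta` and `properties` at the root, which is the shape Elasticsearch 7 expects by default.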

JakeHagen commented 3 years ago

I think I am up to date. I ran docker pull gcr.io/seqr-project/pipeline-runner:gcloud-prod and it gave me:

gcloud-prod: Pulling from seqr-project/pipeline-runner
Digest: sha256:ac5edcbc36023a118d0a255a04c3ed43003b823fba469bb3460ec05c164a4752
Status: Image is up to date for gcr.io/seqr-project/pipeline-runner:gcloud-prod
gcr.io/seqr-project/pipeline-runner:gcloud-prod

Is this how I should be updating the docker image?

davidhemp commented 3 years ago

Similarly, I am using the newest version of the docker-compose.yml which uses gcr.io/seqr-project/pipeline-runner:gcloud-prod

I have tried deleting all images and re-downloading with

docker-compose down
docker image prune -a
docker-compose up -d seqr
docker-compose up -d pipeline-runner

Inspect gives me

"Id": "sha256:168b34f4eb4b08ddc07d0399069641d2586e8b962c272f8a633c0cc8c1bf71f1",
"RepoTags": [ "gcr.io/seqr-project/seqr:gcloud-prod" ],
"RepoDigests": [ "gcr.io/seqr-project/seqr@sha256:d96b61f4516af49460d375db1a15e780859a545f588292b8dfbe04d3917f7ad7" ],
"Parent": "",
"Comment": "",
"Created": "2020-11-05T02:21:05.87479731Z",
"Container": "840c4e2f7821ee2958d216cad1c50c54b6677a51b9791967cfe5d439acd2d03c",

but I still get the error when running python3 -m seqr_loading SeqrMTToESTask --local-scheduler --dest-path /input_vcfs/NG149_duo.g.mt --genome-version 38 --es-host elasticsearch --es-index ng149duo

I'll try regenerating the .mt
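To compare what is actually installed against the registry, the inspect output can be reduced to tags, digests, and creation date. This is a rough sketch that assumes the input is the full JSON array printed by `docker image inspect`; the function name `summarize_images` is made up, and the sample reuses only fields from the excerpt above. Note that the excerpt is for the seqr image, so the same check would have to be repeated for the pipeline-runner image.

```python
import json


def summarize_images(inspect_output: str) -> list:
    """Summarize the JSON array printed by `docker image inspect`."""
    return [
        {
            "tags": image.get("RepoTags", []),
            "digests": image.get("RepoDigests", []),
            "created": image.get("Created", ""),
        }
        for image in json.loads(inspect_output)
    ]


# Sample input built from the fields quoted in this comment.
SAMPLE = json.dumps([
    {
        "RepoTags": ["gcr.io/seqr-project/seqr:gcloud-prod"],
        "RepoDigests": [
            "gcr.io/seqr-project/seqr@sha256:"
            "d96b61f4516af49460d375db1a15e780859a545f588292b8dfbe04d3917f7ad7"
        ],
        "Created": "2020-11-05T02:21:05.87479731Z",
    }
])

for summary in summarize_images(SAMPLE):
    print(summary["tags"], summary["created"])
```

The digest is the more reliable thing to compare, since a tag such as gcloud-prod can be moved to a new image at any time.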

JakeHagen commented 3 years ago

I've tried using older docker images for both the pipeline runner and elasticsearch, as well as the newest images as of yesterday, and I am getting the same error. Do you have an idea where I should start in the codebase to troubleshoot this?

hanars commented 3 years ago

I've just added a new pipeline docker image, can you try that and see if it works?

JakeHagen commented 3 years ago

Where did you add it? On the gcr.io repo the newest one looks like it's from September. Thank you for your help.

hanars commented 3 years ago

Sorry, it should be there now.

JakeHagen commented 3 years ago

That worked! Thank you very much

hanars commented 3 years ago

awesome! Sorry it took so long to get this up and fixed