amnonkhen closed this 2 weeks ago
@amnonkhen @Jeena-Rajan This has been bumped up in urgency since the last managed access meeting. We will need to be able to tag HCA projects with DUO codes, which are driven by an ontology. Please provide an estimated completion time; we will probably need this within a few weeks from today. Thanks!
Gabby has mentioned that this will be required for schema updates happening within the next 2 weeks.
For ingest-validator, the work would involve:
- CurieExpansion.expandCurie(): /api/search?q=MONDO:0018177&exact=true&groupField=true&queryFields=obo_id
- findChildTerm in GraphRestriction, which checks whether a term (q param) is a child of another term (allChildrenOf param) in the given ontologies (ontology param): /api/search?q=MONDO:0018177&exact=true&groupField=true&allChildrenOf=http://purl.obolibrary.org/obo/MONDO_0000001&ontology=mondo,efo&queryFields=obo_id
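The two OLS search calls listed above can be sketched as URL builders. This is a minimal Python sketch under stated assumptions, not the ingest-validator code: the function names `curie_search_url` and `child_term_search_url` are hypothetical, and the base URL is assumed to be the OLS4 address that appears in the error message quoted in this thread.

```python
from urllib.parse import urlencode

# Assumption: OLS4 base URL, taken from the error message quoted in this thread.
OLS_BASE = "https://www.ebi.ac.uk/ols4"


def curie_search_url(curie, base=OLS_BASE):
    """Build the exact-match search URL that looks a term up by its CURIE."""
    params = {
        "q": curie,
        "exact": "true",
        "groupField": "true",
        "queryFields": "obo_id",
    }
    return f"{base}/api/search?{urlencode(params)}"


def child_term_search_url(curie, parent_iri, ontologies, base=OLS_BASE):
    """Build the search URL that asks whether `curie` is a child of `parent_iri`
    within the given ontologies."""
    params = {
        "q": curie,
        "exact": "true",
        "groupField": "true",
        "allChildrenOf": parent_iri,
        "ontology": ",".join(ontologies),
        "queryFields": "obo_id",
    }
    return f"{base}/api/search?{urlencode(params)}"
```

`urlencode` percent-encodes the colon in the CURIE, so `curie_search_url("HANCESTRO:0004")` yields the same URL form as the one logged in the error message.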
This showed a validation error for the biomaterial:
Could not retrieve IRI for HANCESTRO:0004.
I added the URL to the error message, and it now shows as:
Could not retrieve IRI for HANCESTRO:0004. OLS URL: https://www.ebi.ac.uk/ols4/api/search?q=HANCESTRO%3A0004&exact=true&groupField=true&queryFields=obo_id at .human_specific.ethnicity[0].ontology
When I add the OLS response to the log I notice that:
- for HANCESTRO:0004, two documents are returned
- for data:0006, zero documents are returned

Both cases result in the Could not retrieve IRI message. See the discussion of these errors in the comment below.
Hey, @amnonkhen
Thanks for prioritising this. @ESapenaVentura @idazucchi - who can do the testing at the end?
In a meeting between @ESapenaVentura @arschat & @amnonkhen we found the following:
1) data:0006
- This value is missing from OLS4 because the EDAM ontology changed its prefix from data: to EDAM:. The resolution here would be:
  - Change data: values to EDAM:
  - Where a document still uses data:, the validation would fail until the wrangler/contributor fixes the document in a similar way
  - EDAM: see HumanCellAtlas/metadata-schema#1572
2) HANCESTRO:0004
- Check whether the returned documents have the same obo_id (possibly with a different case), and if so, treat them as a single document.

As discussed with Enrique, I've uploaded a selection of projects to the dev environment to test the ontology. All of them have errors due to the EDAM change, and a few of them also have errors due to HANCESTRO. I haven't seen any other type of error.
If you need more projects, please let me know through Slack.
- Project 1 - b5d05080-5417-4c22-89f8-4a5eff18d1f9
- Project 2 - ec30720b-cfe2-424f-9655-a69a132e2883
- Project 3 - cab32543-a44e-496e-b889-f1459686e59f
- Project 4 - 554e28e7-0b2d-4e69-a117-4e9f45e347b5
- Project 5 - a1251330-f57a-4f96-8339-00e77f745e6e
- Project 6 - f7bfc34e-1ee2-43eb-bb7f-b3c65bbfaa67
- Project 7 - f2600a33-4d20-469a-a5cc-3f9581856e10
- Project 8 - c5123642-fb9a-44fb-8ab0-55709f688456
- Project 9 - 957d3311-6f02-4167-9f5a-0adfe4441565
- Project 10 - 72b36d11-4a2d-49b8-8b54-54e1c7d0e306
After deploying the change to the IRI resolution in ingest-validator that groups documents with identical obo_id together, these projects pass validation. The only errors are due to missing files.
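The grouping described above can be sketched roughly as follows. This is an illustrative Python sketch, not the code deployed in ingest-validator: the names `resolve_iri` and `IriResolutionError` are hypothetical, the documents are assumed to be dicts carrying `obo_id` and `iri` fields, and picking the first IRI of a group is an assumption.

```python
from collections import defaultdict


class IriResolutionError(Exception):
    """Raised when a CURIE cannot be resolved to exactly one term."""


def resolve_iri(curie, docs):
    """Return the IRI for `curie` from a list of OLS search documents.

    Documents whose obo_id differs only by case are treated as one term.
    """
    groups = defaultdict(list)
    for doc in docs:
        # Case-insensitive grouping key, e.g. "HANCESTRO:0004" and "hancestro:0004".
        groups[doc["obo_id"].lower()].append(doc)

    if len(groups) != 1:
        # Zero documents (the data:0006 case) or several distinct terms.
        raise IriResolutionError(
            f"Could not retrieve IRI for {curie}: "
            f"{len(groups)} distinct obo_id values in {len(docs)} documents"
        )

    # Assumption: documents in one group describe the same term,
    # so the IRI of the first document is used.
    (group,) = groups.values()
    return group[0]["iri"]
```

With this approach, two documents for the same term resolve to one IRI, while an empty result still raises the error, which matches the behaviour reported for the test projects.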
Before this ticket can be finished, the schema PR HumanCellAtlas/metadata-schema#1572 needs to be merged to dev so that the integration test can pass.
The changes were deployed to the dev environment successfully, but the deployment to prod fails with errors in the GitLab job log:
$ helm upgrade --debug -f k8s/apps/$ENVIRONMENT_NAME.yaml $APP_NAME k8s/apps/$APP_NAME --set-string environment=$ENVIRONMENT_NAME,image=$RELEASE_TAG,replicas=$REPLICAS,gitlab_app=$CI_PROJECT_PATH_SLUG,gitlab_env=$CI_ENVIRONMENT_SLUG --wait --install
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /builds/hca/ingest-validator.tmp/KUBECONFIG
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /builds/hca/ingest-validator.tmp/KUBECONFIG
history.go:56: [debug] getting history for release ingest-validator
upgrade.go:123: [debug] preparing upgrade for ingest-validator
upgrade.go:131: [debug] performing update for ingest-validator
upgrade.go:303: [debug] creating upgraded release for ingest-validator
client.go:201: [debug] checking 1 resources for changes
wait.go:53: [debug] beginning wait for 1 resources with timeout of 5m0s
wait.go:244: [debug] Deployment is not ready: staging-environment/ingest-validator. 0 out of 1 expected pods are ready
Then the gitlab job times out.
Further inspection of the k8s prod env shows:
❯ kubectl get pods -l app=ingest-validator
NAME                                READY   STATUS              RESTARTS        AGE
ingest-validator-54ff49d58f-f862g   0/1     ContainerCreating   0               48d
ingest-validator-6fd4d775c5-4p88k   1/1     Running             48 (157d ago)   401d
It appears that there is a problem mounting the secret and configmap:
❯ kubectl describe pods -l app=ingest-validator
Name: ingest-validator-54ff49d58f-f862g
Namespace: prod-environment
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 22m (x24140 over 48d) kubelet Unable to attach or mount volumes: unmounted volumes=[secret-volume], unattached volumes=[secret-volume kube-api-access-tp2f5]: timed out waiting for the condition
Warning FailedMount 3m6s (x34378 over 48d) kubelet MountVolume.SetUp failed for volume "secret-volume" : references non-existent secret key: ingest-service-account-auth-info
The secret key ingest-service-account-auth-info is missing.
The secret was missing from both staging and prod. Redeploying the secrets and checking the key in each environment:
source ~/dev/ingest-kube-deployment/config/environment_staging
kubectx ingest-eks-staging
make deploy-secrets
kubectl get secrets api-keys -o jsonpath="{.data.ingest-service-account-auth-info}"
source ~/dev/ingest-kube-deployment/config/environment_prod
kubectx ingest-eks-prod
make deploy-secrets
kubectl get secrets api-keys -o jsonpath="{.data.ingest-service-account-auth-info}"
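The jsonpath checks above print the base64-encoded value of the key, so a missing key shows up as empty output. A scripted version of the same check might look like the sketch below. It is only an illustration: the helper name `read_secret_key` is hypothetical, and it assumes the input is the JSON printed by `kubectl get secrets api-keys -o json`.

```python
import base64
import json


def read_secret_key(secret_json, key):
    """Return the decoded value of `key` from a Kubernetes Secret in JSON form,
    or None when the key is absent.

    Assumption: `secret_json` is the text printed by
    `kubectl get secrets <name> -o json`, where values under "data"
    are base64-encoded.
    """
    data = json.loads(secret_json).get("data") or {}
    if key not in data:
        return None
    return base64.b64decode(data[key]).decode("utf-8")
```

A `None` result for `ingest-service-account-auth-info` would correspond to the `references non-existent secret key` mount failure reported by the kubelet.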
2 HCA components use OLS:
options:
Option 1 is preferable in terms of maintenance. Amnon and Enrique need to check whether public OLS can be used or what the gap is. See discussion from 10/7 by James, Enrique and Amnon
James' github handle - @udp
Deadline for decommissioning public OLS 3 - Oct 2023