ebi-ait / dcp-ingest-central

Central point of access for the Ingestion Service of the HCA DCP
Apache License 2.0

hca-util sync'ed files still missing in ingest #988

Closed idazucchi closed 8 months ago

idazucchi commented 10 months ago

Describe the bug After using hca-util to sync the files to the ingest s3 bucket, the files' status remains missing in ingest, despite the files being present in the ingest s3 bucket. This problem persists after deleting the submission, re-uploading the spreadsheet and syncing the files again. Affected projects: potentially all new projects; examples:

To Reproduce Steps to reproduce the behaviour:

  1. Go to https://contribute.data.humancellatlas.org/projects/all
  2. Select one project and upload a spreadsheet - example
  3. Upload files with hca-util sync - example uuid fa6518af-997d-4c4a-9936-e3e83f97d926
    hca-util select fa6518af-997d-4c4a-9936-e3e83f97d926
    hca-util sync s3://org-hca-data-archive-upload-prod/9fd083a0-d72d-4425-a391-f3b7a9795a3b/
  4. Files are transferred successfully
  5. See error - the files are still missing in the submission

Expected behaviour I expect the files to be recognised by ingest and validation to start

amnonkhen commented 10 months ago

Trying to follow the dev troubleshooting SOP.

Checking a file from DendriticCellActivationHCM

amnonkhen commented 9 months ago

Step 2. Validation job logs in AWS: nothing here.

amnonkhen commented 9 months ago

Step 3. File in S3: the file is there, nothing evident.

amnonkhen commented 9 months ago

Step 4. Lambda function logs, filtering for the file: dcp-upload-csum-prod. I found errors sending notifications to ingest about file validation status:

2024-01-18T17:44:10+0000 INFO upload.common.ingest_notifier failed to send notification d47ab7e8-69b7-4a59-8741-b50653ebd87b with payload
{
  'upload_area_id': '8e330a2f-25c1-4bb7-95a0-0fc4e602b0bd',
  'name': 'Human_5171_counts_0_1_HHJTNDSXY_S1_L003_I1.fastq.gz',
  'size': 894113556,
  'content_type': 'binary/octet-stream; dcp-type=data',
  'url': 's3://org-hca-data-archive-upload-prod/8e330a2f-25c1-4bb7-95a0-0fc4e602b0bd/Human_5171_counts_0_1_HHJTNDSXY_S1_L003_I1.fastq.gz',
  'checksums': {
    'sha1': 'e6353f4c2aa830d3d58cc3278d4d10465ea20f08',
    'crc32c': '824be187',
    'sha256': '2b46f29fdd0e1894d9c333755061c1d6e2dc34290d16d044ff6ee5df00144fa3',
    's3_etag': 'cfdda042513ef5106a9cf273a4e57380-14'
  },
  'last_modified': '2024-01-18T17:44:05+00:00'
}
and error Expecting value: line 1 column 1 (char 0)

This is probably not the issue: a bit further down the log, at 2024-01-18T17:44:41.087+00:00, it shows that the notification succeeded.

[INFO]  2024-01-18T17:44:41.87Z 7b41a1e8-92e3-5879-947f-ffcbf775ef3a    Notified Ingest: file_info={
  'upload_area_id': '8e330a2f-25c1-4bb7-95a0-0fc4e602b0bd',
  'name': 'Human_5171_counts_0_1_HHJTNDSXY_S1_L003_I1.fastq.gz',
  'size': 894113556,
  'content_type': 'binary/octet-stream; dcp-type=data',
  'url': 's3://org-hca-data-archive-upload-prod/8e330a2f-25c1-4bb7-95a0-0fc4e602b0bd/Human_5171_counts_0_1_HHJTNDSXY_S1_L003_I1.fastq.gz',
  'checksums': {
    'sha1': 'e6353f4c2aa830d3d58cc3278d4d10465ea20f08',
    'crc32c': '824be187',
    'sha256': '2b46f29fdd0e1894d9c333755061c1d6e2dc34290d16d044ff6ee5df00144fa3',
    's3_etag': 'cfdda042513ef5106a9cf273a4e57380-14'
  },
  'last_modified': '2024-01-18T17:44:05+00:00'
}, status=d47ab7e8-69b7-4a59-8741-b50653ebd87b
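A note on the earlier failure text, "Expecting value: line 1 column 1 (char 0)": this is the message Python's standard json module produces when it is asked to decode an empty or non-JSON string. Whether the upload service's notifier parses the ingest response this way is an assumption, not something confirmed from its code. The sketch below only reproduces the message; parse_ingest_response is a hypothetical helper name.

```python
import json


def parse_ingest_response(body: str):
    """Hypothetical helper: decode a response body, returning (data, error_text)."""
    try:
        return json.loads(body), None
    except json.JSONDecodeError as err:
        # str(err) has the form "<msg>: line N column M (char K)"
        return None, str(err)


# An empty body and an HTML error page both fail at the very first character.
empty_result = parse_ingest_response("")
html_result = parse_ingest_response("<html>Bad Gateway</html>")
valid_result = parse_ingest_response('{"ok": true}')
```

If that assumption holds, the first notification attempt most likely received an empty or non-JSON reply from ingest, which would be consistent with ingest-core failing requests at that time.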

amnonkhen commented 9 months ago

Step 5: suspicious error messages in the logs

Looking in the logs of ingest-core in Grafana I noticed errors related to the service account ingest-service-gcp-svc-acc@ingest-service-301110.iam.gserviceaccount.com.

java.lang.RuntimeException: com.auth0.jwk.SigningKeyNotFoundException: Cannot obtain jwks from url 
https://www.googleapis.com/service_accounts/v1/jwk/ingest-service-gcp-svc-acc@ingest-service-301110.iam.gserviceaccount.com

Hitting the url mentioned in the error message resulted in an HTTP 404 error message:

{
  "error": {
    "code": 404,
    "message": "Requested entity was not found.",
    "status": "NOT_FOUND"
  }
}
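The manual check above can be scripted so it is easy to repeat after a fix. This is a minimal sketch using only the Python standard library; check_jwks and jwks_url are hypothetical helper names, and the code assumes the endpoint returns a standard JWKS document with a top-level "keys" array when the service account exists.

```python
import json
import urllib.error
import urllib.request

JWKS_BASE = "https://www.googleapis.com/service_accounts/v1/jwk/"


def jwks_url(service_account_email: str) -> str:
    """Build the JWKS URL for a service account, as seen in the error message."""
    return JWKS_BASE + service_account_email


def check_jwks(url: str, opener=urllib.request.urlopen):
    """Return (http_status, number_of_keys) for a JWKS URL.

    `opener` is injectable so the function can be exercised without network access.
    """
    try:
        with opener(url) as resp:
            body = json.load(resp)
            return resp.status, len(body.get("keys", []))
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses such as the 404 above.
        return err.code, 0


# Example (performs a real HTTP request, so it is left commented out):
# print(check_jwks(jwks_url(
#     "ingest-service-gcp-svc-acc@ingest-service-301110.iam.gserviceaccount.com")))
```

A result with status 404 and zero keys corresponds to the failure shown above; status 200 with at least one key means the public keys are being served.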

Investigating in the GCP console, I noticed the project ingest-service-301110 had been shut down and marked for deletion. I was able to restore it, and that fixed the problematic URL, which now returns the public keys again.

The next steps are:

  1. Re-sync the upload areas and verify whether this solved the file validation problem.
    • [x] DendriticCellActivationHCM
    • [x] EndStageHeartFailureSimonson
  2. Check with the cloud consultants why I did not receive a notification about the deletion of the GCP project. To this end I created incident INC0015024 in ServiceNow.

amnonkhen commented 9 months ago

It was indeed the deleted GCP project ingest-service-301110 that caused this mess. The workaround for any affected project is to re-sync the upload area so that validation is re-triggered. The 2 projects mentioned in this ticket are currently Metadata Valid.
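For anyone who needs to apply the workaround to several projects, the re-sync can be scripted. This is a rough sketch that only wraps the two hca-util commands shown in the reproduction steps (select, then sync); resync_commands and resync are hypothetical helper names, and the bucket name is taken from the report above.

```python
import subprocess

# Bucket name as it appears in the bug report.
INGEST_BUCKET = "org-hca-data-archive-upload-prod"


def resync_commands(hca_util_area: str, ingest_upload_area: str) -> list[list[str]]:
    """Build the hca-util commands that re-sync one upload area."""
    return [
        ["hca-util", "select", hca_util_area],
        ["hca-util", "sync", f"s3://{INGEST_BUCKET}/{ingest_upload_area}/"],
    ]


def resync(hca_util_area: str, ingest_upload_area: str, run=subprocess.run) -> None:
    """Run the commands in order, stopping at the first failure.

    `run` is injectable so the sequence can be checked without invoking hca-util.
    """
    for cmd in resync_commands(hca_util_area, ingest_upload_area):
        run(cmd, check=True)
```

Calling resync with the hca-util area uuid and the ingest upload area uuid from the reproduction steps issues the same two commands as in step 3.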