ebi-ait / dcp-ingest-central

Central point of access for the Ingestion Service of the HCA DCP
Apache License 2.0

Managed Access Deployment #1029

Closed: KociOrges closed this issue 3 months ago

KociOrges commented 4 months ago

This ticket tracks the progress of updates for deploying Managed Access changes across several components.

Related ticket #967

See also ticket #1028 for manual submissions

Components to be updated:

Plan for each component:

1. integration-tests

2. ingest-core

3. ingest-graph-validator (potentially - see Additional Notes)

4. ingest-exporter

5. ingest-ui

6. state-tracking

7. staging-manager

Additional Notes:

broker: It is just a test change, so it is not necessary; it can be pushed in the future.

graph validator: The change switches to authenticated ingest API calls. It is not clear whether it is actually necessary. Suggestion: roll back the graph validator on dev and rerun a manual test on dev.

validator: There is a managed access branch, but it has not been merged to dev because it is not needed for the feature.

dipayan1985 commented 4 months ago
  1. Monitoring dashboard set up in ingest-staging Grafana after getting access.
  2. After the ingest-exporter deployment, the first staging submission monitoring URL is 669fa5e2ad1b3e0b95227f5a/9a519a55-333c-4409-add1-f3c479fdd969
dipayan1985 commented 4 months ago
  1. Attempting to initiate graph validation for the above submission.
dipayan1985 commented 4 months ago
  1. Arsenios triggered the graph validation and it went through; the submission is GRAPH_VALID.
dipayan1985 commented 4 months ago
  1. Deploying ingest-ui to staging.
dipayan1985 commented 4 months ago
  1. After the ingest-ui deployment, the staging submission monitoring URL is 66a0d95dad1b3e0b95228683/e04067ae-40ce-4fd1-b55b-cbac5bec1a2b
dipayan1985 commented 4 months ago
  1. The submission is GRAPH_VALID; we then sent the PUT to /submissionEvent and the submission is now in status SUBMITTED. I was expecting EXPORTING/EXPORTED. Checking with Amnon whether this is expected behaviour.
dipayan1985 commented 4 months ago
  1. Trying a new submission that uses the archive action in the /submissionEvent call: 66a21ac2ad1b3e0b952289a7/458ac070-76a5-4402-a772-0fc5b674e826
dipayan1985 commented 4 months ago
  1. The above submission was sent with the actions ["Archive", "Export", "Cleanup"] and got stuck in PROCESSING.
dipayan1985 commented 4 months ago
  1. Results are the same in manual and integration tests (run one at a time): submissions are stuck in the SUBMITTED state. The notable log message is `Submission event was not accepted`.
dipayan1985 commented 4 months ago

ingest_staging_logging.txt

dipayan1985 commented 4 months ago

We are hitting an OptimisticLockingFailureException in MongoDB.
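For context, Spring's `OptimisticLockingFailureException` is what Spring Data MongoDB throws when a document with a `@Version` field is saved from a stale copy: the write is rejected instead of silently overwriting a concurrent update. The sketch below reproduces that behaviour with a hypothetical in-memory `VersionedStore` (not ingest-core's code), plus a hypothetical reload-and-retry helper `save_with_retry`, which is one common way to handle the conflict; whether ingest-core retries is not stated in this ticket. Document IDs and field names in the demo are illustrative only.

```python
from dataclasses import dataclass


class OptimisticLockingFailure(Exception):
    """Raised when a save is based on a stale version of a document."""


@dataclass
class Document:
    doc_id: str
    content: dict
    version: int = 0


class VersionedStore:
    """Hypothetical in-memory store with a version-checked save."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, content):
        self._docs[doc_id] = Document(doc_id, dict(content), 0)

    def load(self, doc_id):
        stored = self._docs[doc_id]
        # Hand out a copy so each caller edits its own snapshot
        return Document(stored.doc_id, dict(stored.content), stored.version)

    def save(self, doc):
        current = self._docs[doc.doc_id]
        if current.version != doc.version:
            # Someone else saved in between: reject instead of overwriting
            raise OptimisticLockingFailure(
                f"{doc.doc_id}: expected version {doc.version}, found {current.version}"
            )
        new_version = doc.version + 1
        self._docs[doc.doc_id] = Document(doc.doc_id, dict(doc.content), new_version)
        return new_version


def save_with_retry(store, doc_id, mutate, attempts=3):
    """Reload the latest version, reapply the change and save again on conflict."""
    for _ in range(attempts):
        doc = store.load(doc_id)
        mutate(doc.content)
        try:
            return store.save(doc)
        except OptimisticLockingFailure:
            continue
    raise OptimisticLockingFailure(f"{doc_id}: gave up after {attempts} attempts")


# Two concurrent updates to the same (hypothetical) biomaterial document
store = VersionedStore()
store.insert("biomaterial-1", {"validationState": "DRAFT"})
first = store.load("biomaterial-1")
second = store.load("biomaterial-1")
first.content["validationState"] = "VALIDATING"
store.save(first)  # accepted: version 0 -> 1
second.content["validationState"] = "VALID"
try:
    store.save(second)  # rejected: still based on version 0
except OptimisticLockingFailure as exc:
    print("conflict:", exc)
```

In an HTTP API, a handler would typically map this exception to 409 CONFLICT, which is consistent with the "this will generate a CONFLICT RESPONSE" warnings in the ingest-core logs. A retry is only appropriate when reapplying the change on top of the latest version is safe.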

dipayan1985 commented 4 months ago

Restarted ingest-core and initiated a new submission: 66a36163e8c73060882dcc20/1ce6cd2c-efc7-4963-9ab7-e157ee5039f9

dipayan1985 commented 4 months ago

ingest_core_logs.txt

We are hitting:

`2024-07-26 09:44:58 {"log":"2024-07-26 08:44:58.309 WARN 7 --- [nio-8080-exec-2] o.h.i.c.web.GlobalStateExceptionHandler : Handling ResourceNotFoundException and returning NOT_FOUND response\n","stream":"stdout","time":"2024-07-26T08:44:58.309583696Z"}

2024-07-26 09:44:58 {"log":"2024-07-26 08:44:58.308 WARN 7 --- [nio-8080-exec-2] o.h.i.c.web.GlobalStateExceptionHandler : Caught a resource not found exception argument at 'http://172.20.186.196/files/search/findByValidationJobValidationId'; this will generate a NOT_FOUND RESPONSE. Error message: Resource not found!\n","stream":"stdout","time":"2024-07-26T08:44:58.309246611Z"}

2024-07-26 09:44:52 {"log":"2024-07-26 08:44:52.231 WARN 7 --- [io-8080-exec-35] o.h.i.c.web.GlobalStateExceptionHandler : Handling ResourceNotFoundException and returning NOT_FOUND response\n","stream":"stdout","time":"2024-07-26T08:44:52.231909724Z"}

2024-07-26 09:44:52 {"log":"2024-07-26 08:44:52.231 WARN 7 --- [io-8080-exec-35] o.h.i.c.web.GlobalStateExceptionHandler : Caught a resource not found exception argument at 'http://172.20.186.196/files/search/findByValidationJobValidationId'; this will generate a NOT_FOUND RESPONSE. Error message: Resource not found!\n","stream":"stdout","time":"2024-07-26T08:44:52.231780723Z"}

2024-07-26 09:44:52 {"log":"2024-07-26 08:44:52.198 WARN 7 --- [nio-8080-exec-4] o.h.i.c.web.GlobalStateExceptionHandler : Handling ResourceNotFoundException and returning NOT_FOUND response\n","stream":"stdout","time":"2024-07-26T08:44:52.19895465Z"}

2024-07-26 09:44:52 {"log":"2024-07-26 08:44:52.198 WARN 7 --- [nio-8080-exec-4] o.h.i.c.web.GlobalStateExceptionHandler : Caught a resource not found exception argument at 'http://172.20.186.196/files/search/findByValidationJobValidationId'; this will generate a NOT_FOUND RESPONSE. Error message: Resource not found!\n","stream":"stdout","time":"2024-07-26T08:44:52.198481634Z"}

2024-07-26 09:42:27 {"log":"2024-07-26 08:42:27.210 WARN 7 --- [io-8080-exec-15] o.h.i.c.web.GlobalStateExceptionHandler : Attempt a failed save, likely due to multiple requests, at 'http://172.20.186.196/biomaterials/66a36168e8c73060882dcc24'; this will generate a CONFLICT RESPONSE\n","stream":"stdout","time":"2024-07-26T08:42:27.210816008Z"}

2024-07-26 09:42:27 {"log":"2024-07-26 08:42:27.114 WARN 7 --- [io-8080-exec-21] o.h.i.c.web.GlobalStateExceptionHandler : Attempt a failed save, likely due to multiple requests, at 'http://172.20.186.196/biomaterials/66a36168e8c73060882dcc24'; this will generate a CONFLICT RESPONSE\n","stream":"stdout","time":"2024-07-26T08:42:27.114793581Z"}

2024-07-26 09:42:26 {"log":"2024-07-26 08:42:26.715 INFO 7 --- [ntContainer#0-1] o.h.ingest.file.FileService : File validation state is DRAFT for file with cloudUrl s3://org-hca-data-archive-upload-staging/1ce6cd2c-efc7-4963-9ab7-e157ee5039f9/SRR3562314_1.fastq.gz and submission UUID 1ce6cd2c-efc7-4963-9ab7-e157ee5039f9 \n","stream":"stdout","time":"2024-07-26T08:42:26.715992344Z"}

2024-07-26 09:42:26 {"log":"2024-07-26 08:42:26.692 WARN 7 --- [nio-8080-exec-4] o.h.i.c.web.GlobalStateExceptionHandler : Attempt a failed save, likely due to multiple requests, at 'http://172.20.186.196/biomaterials/66a36168e8c73060882dcc24/validEvent'; this will generate a CONFLICT RESPONSE\n","stream":"stdout","time":"2024-07-26T08:42:26.692820328Z"}

2024-07-26 09:42:26 {"log":"2024-07-26 08:42:26.670 INFO 7 --- [ntContainer#0-1] o.h.ingest.file.FileService : Updating file with cloudUrl s3://org-hca-data-archive-upload-staging/1ce6cd2c-efc7-4963-9ab7-e157ee5039f9/SRR3562314_1.fastq.gz and submission UUID 1ce6cd2c-efc7-4963-9ab7-e157ee5039f9\n","stream":"stdout","time":"2024-07-26T08:42:26.671029382Z"}

2024-07-26 09:42:26 {"log":"2024-07-26 08:42:26.624 WARN 7 --- [io-8080-exec-44] o.h.i.c.web.GlobalStateExceptionHandler : Attempt a failed save, likely due to multiple requests, at 'http://172.20.186.196/biomaterials/66a36168e8c73060882dcc23/validEvent'; this will generate a CONFLICT RESPONSE\n","stream":"stdout","time":"2024-07-26T08:42:26.624878504Z"}`
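To separate noise from signal in dumps like the one above, a small script can parse each line and count WARN entries by outcome and endpoint. This is a minimal sketch, assuming each line is an optional viewer timestamp followed by a Docker json-file record (`log`, `stream`, `time`) whose `log` field follows Spring Boot's default console pattern, as in the lines above. The CONFLICT/NOT_FOUND categories simply match text in these particular messages; `parse_line`, `classify` and `summarise` are hypothetical helpers, not part of any ingest component.

```python
import json
import re
from collections import Counter

# Spring Boot's default console pattern: date time LEVEL pid --- [thread] logger : message
SPRING_LINE = re.compile(
    r"^(?P<ts>\S+ \S+)\s+(?P<level>[A-Z]+)\s+\d+\s+---\s+"
    r"\[\s*(?P<thread>[^\]]*)\]\s+(?P<logger>\S+)\s+:\s+(?P<msg>.*)$"
)
# The handler messages quote the request URL as: at '<url>'
URL_IN_MSG = re.compile(r"at '(?P<url>[^']+)'")


def parse_line(raw):
    """Return the Spring fields of one exported line, or None if it does not parse."""
    start = raw.find("{")
    if start < 0:
        return None  # e.g. a bare viewer timestamp split onto its own line
    try:
        # raw_decode tolerates trailing characters after the JSON object
        record, _ = json.JSONDecoder().raw_decode(raw[start:])
    except json.JSONDecodeError:
        return None
    match = SPRING_LINE.match(str(record.get("log", "")).rstrip())
    return match.groupdict() if match else None


def classify(msg):
    if "CONFLICT RESPONSE" in msg:
        return "CONFLICT"
    if "NOT_FOUND" in msg:
        return "NOT_FOUND"
    return "OTHER"


def summarise(lines):
    """Count WARN entries keyed by (category, URL or '-')."""
    counts = Counter()
    for raw in lines:
        entry = parse_line(raw)
        if entry is None or entry["level"] != "WARN":
            continue
        url = URL_IN_MSG.search(entry["msg"])
        counts[(classify(entry["msg"]), url.group("url") if url else "-")] += 1
    return counts
```

Usage might look like `summarise(open("ingest_core_logs.txt"))`, assuming the attached file holds one record per line as pasted above. Grouping by URL makes it quick to see whether the warnings cluster on a few documents, as the CONFLICT lines here do on two biomaterials.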

dipayan1985 commented 4 months ago

We are still investigating why the submission state change from SUBMITTED to EXPORT is not being accepted. The issue only occurs on staging. /cc @tburdett @amnonkhen

@amnonkhen, thanks for helping me get into the pods and check the logs.

dipayan1985 commented 4 months ago

Ruled out both of the above errors (CONFLICT and NOT_FOUND); they are non-issues.

dipayan1985 commented 4 months ago

Added more logging to the state tracker and opened a new PR, which was reviewed by Amnon and merged to dev.

dipayan1985 commented 3 months ago

logs_comparison_dev_vs_staging.txt

There are many differences in the guard logs when the same submission goes through dev and staging.
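One way to read the `Submission event was not accepted` message together with these guard differences: in a guarded state machine, an event only moves the submission on if a transition exists from the current state and its guard passes; otherwise the state stays put. The sketch below is generic and hypothetical, not state-tracking's implementation: GRAPH_VALID and SUBMITTED come from this ticket, while the EXPORTING target, the event names and both guards are invented for illustration. Logging every guard decision produces lines that can be diffed between environments.

```python
from dataclasses import dataclass
from typing import Callable

Guard = Callable[[dict], bool]


@dataclass(frozen=True)
class Transition:
    target: str
    guard: Guard


class SubmissionStateMachine:
    """Hypothetical guarded state machine keyed by (source state, event)."""

    def __init__(self, transitions, initial="GRAPH_VALID"):
        self.transitions = transitions
        self.state = initial
        self.log = []

    def send_event(self, event, context):
        """Apply the event if a transition exists and its guard passes."""
        transition = self.transitions.get((self.state, event))
        if transition is None:
            self.log.append(f"no transition for {event} in {self.state}")
            return False
        if not transition.guard(context):
            # Event not accepted: the submission stays in its current state
            self.log.append(f"guard rejected {event} in {self.state}")
            return False
        self.log.append(f"{self.state} -> {transition.target} on {event}")
        self.state = transition.target
        return True


def all_documents_graph_valid(ctx):
    # Hypothetical guard: every tracked document must be GRAPH_VALID
    return all(s == "GRAPH_VALID" for s in ctx.get("document_states", []))


def export_requested(ctx):
    # Hypothetical guard: the submission event must have asked for export
    return ctx.get("export_requested", False)


TRANSITIONS = {
    ("GRAPH_VALID", "SUBMIT"): Transition("SUBMITTED", all_documents_graph_valid),
    ("SUBMITTED", "EXPORT"): Transition("EXPORTING", export_requested),
}

machine = SubmissionStateMachine(TRANSITIONS)
context = {"document_states": ["GRAPH_VALID", "GRAPH_VALID"]}
machine.send_event("SUBMIT", context)  # accepted: GRAPH_VALID -> SUBMITTED
machine.send_event("EXPORT", context)  # rejected: export_requested is missing
print(machine.state, machine.log)
```

If the same submission produces an accepted guard line on dev but a rejected one on staging, the difference lies in whatever inputs that guard reads in each environment, which is where a log comparison like the one attached would point.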

dipayan1985 commented 3 months ago

Revised deployment list:

dipayan1985 commented 3 months ago

@amnonkhen, @tburdett, @gabsie: ingest-core and ingest-ui are the projects impacted by this change, and both are now deployed to production.

Testing was successful end to end, from submission -> validation -> graph validation -> submission event -> export -> complete.

Test submissions:

  1. First

  2. Second

Test jobs:

  1. https://gitlab.ebi.ac.uk/hca/ingest-integration-tests/-/jobs/1757211
  2. https://gitlab.ebi.ac.uk/hca/ingest-integration-tests/-/jobs/1757166