ebi-ait / dcp-ingest-central

Central point of access for the Ingestion Service of the HCA DCP
Apache License 2.0

When 2 hca integration tests run in parallel, one fails and one passes #1028

Closed amnonkhen closed 3 months ago

amnonkhen commented 4 months ago

Related to the managed access tickets: ebi-ait/dcp-ingest-central#967 & ebi-ait/dcp-ingest-central#1012

While testing the managed access changes, I noticed a problem when running several integration tests in parallel: one or more might fail, but when later run sequentially they would all pass. To make sure that this was not due to a problem with the integration tests themselves, I reproduced a similar scenario using only API calls in Postman and command-line calls of hca-util.

Scenario:

  1. Call the /api_upload endpoint twice to upload a spreadsheet:
    curl --location 'https://ingest.dev.archive.data.humancellatlas.org/api_upload' \
    --header 'Authorization: ••••••' \
    --form 'params="{\"updateProject\":false,\"isUpdate\":false}"' \
    --form 'file=@"/Users/amnon/dev/ingest-integration-tests/tests/fixtures/datasets/SS2/SS2.xlsx"'
  2. 1st submission becomes METADATA_INVALID as it should.
  3. 2nd submission remains PENDING, which is a bug.
  4. When the data files are uploaded using the hca-util tool, the 1st submission becomes METADATA_VALID, while the failing one stays in its failed state.
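
A minimal sketch of how the two uploads from step 1 could be fired concurrently from a shell, in case it helps with reproducing the scenario. The request itself is the one shown above; the function names and the `INGEST_TOKEN`, `SPREADSHEET` and `CURL` variables are assumptions introduced for illustration only and do not come from the original report.

```shell
# Hypothetical reproduction helper. INGEST_TOKEN, SPREADSHEET and CURL are
# assumed variable names, not part of the original report.
API_URL='https://ingest.dev.archive.data.humancellatlas.org/api_upload'

upload_spreadsheet() {
    # Same request as in step 1. Set CURL=echo to print the command
    # instead of sending it (dry run, no network access needed).
    "${CURL:-curl}" --location "$API_URL" \
        --header "Authorization: ${INGEST_TOKEN:-}" \
        --form 'params="{\"updateProject\":false,\"isUpdate\":false}"' \
        --form "file=@\"${SPREADSHEET:-SS2.xlsx}\""
}

upload_twice_in_parallel() {
    upload_spreadsheet &   # first upload runs in the background
    upload_spreadsheet &   # second upload starts without waiting for the first
    wait                   # block until both requests have returned
}
```

Running `CURL=echo upload_twice_in_parallel` prints the two requests without sending them. With `INGEST_TOKEN` and `SPREADSHEET` set, `upload_twice_in_parallel` sends both uploads at nearly the same time.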

Example submissions:

See diff of logs: Logs from 2 Submissions in Parallel - 20240715 - one failure.pdf

The logs show that the failing submission does not start validation of its metadata documents. Furthermore, the failing submission does not have its project short name updated. See screenshot:

image.png
amnonkhen commented 4 months ago

To prevent two simultaneous spreadsheet uploads during the integration tests, I am going to add a random pause of between 5 and 15 seconds before starting each GitLab job:

sleep $(( RANDOM % 11 + 5 ))
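
For reference, the arithmetic in that line works as follows: `RANDOM % 11` yields a value from 0 to 10, and adding 5 shifts the range to 5 to 15 seconds. The sketch below wraps the same calculation in a helper with explicit bounds. The name `pause_seconds` is hypothetical, and the helper relies on `$RANDOM`, which bash provides. Note that two jobs can still draw the same delay, so a random pause lowers the chance of simultaneous uploads but does not rule them out.

```shell
# pause_seconds MIN MAX: print a random whole number of seconds in [MIN, MAX].
# Hypothetical helper; relies on bash's $RANDOM (an integer from 0 to 32767).
pause_seconds() {
    min=$1
    max=$2
    # RANDOM % (max - min + 1) is 0..(max - min); adding min shifts it to min..max.
    echo $(( RANDOM % (max - min + 1) + min ))
}
```

Calling `sleep "$(pause_seconds 5 15)"` is then equivalent to the one-liner above.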