ebi-ait / dcp-ingest-central

Central point of access for the Ingestion Service of the HCA DCP
Apache License 2.0

Files stuck in validation more frequently #1049

Open idazucchi opened 2 weeks ago

idazucchi commented 2 weeks ago

Describe the bug Once the files are synced to ingest they begin validation, but a number of them never conclude the operation and remain validating indefinitely. In the last week @arschat and I have noticed that this is happening more frequently, and that files do not reach the valid state even if validation is re-triggered following this script

To Reproduce Steps to reproduce the behaviour:

  1. Go to a project page
  2. Upload submission
  3. Sync files
  4. Validation starts
  5. After 1 day files are still validating

Expected behaviour I expect files to either pass or fail validation, if not on the first try then at least after validation is re-triggered

Environment

Browser

Example submission These two examples are the same project with the same files. In the second attempt at validation more files got stuck: 4 files stuck, 19 files stuck

arschat commented 2 weeks ago

A second example submission is in this ticket ebi-ait/hca-ebi-wrangler-central#1306

KociOrges commented 4 days ago

Background: Spot Instances provide a cost-effective solution for validation requests (up to 90% savings compared to On-Demand Instances); however, they rely on spare AWS capacity to fulfil incoming requests.

Spot Fleet requests for validation jobs are failing due to capacity constraints and configuration issues, leading to job interruptions. Investigations identified insufficient capacity and unsupported instance types in specific availability zones, alongside some bid prices being below the Spot market price.
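To make two of the findings above concrete (bids below the Spot market price, and instance types not offered in a given availability zone), here is a minimal Python sketch of the kind of check involved. It is not the code used in the investigation: `LaunchSpec`, `find_unfulfillable_specs`, and all prices, instance types, and zones are hypothetical, illustrative values. In practice the market prices could be fetched with the EC2 `describe_spot_price_history` API; here they are passed in as a plain dictionary. Insufficient capacity cannot be detected from prices alone, so it is out of scope for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaunchSpec:
    """One launch specification of a Spot Fleet request (hypothetical shape)."""
    instance_type: str
    availability_zone: str
    max_price: float  # maximum bid, USD per hour


def find_unfulfillable_specs(specs, market_prices):
    """Return (spec, reason) pairs for specs that cannot be fulfilled.

    market_prices maps (instance_type, availability_zone) to the current
    Spot price. A missing key is treated as "type not offered in that zone".
    """
    problems = []
    for spec in specs:
        price = market_prices.get((spec.instance_type, spec.availability_zone))
        if price is None:
            problems.append((spec, "instance type not offered in zone"))
        elif spec.max_price < price:
            problems.append((spec, "bid below Spot market price"))
    return problems


# Made-up example data, not taken from the real fleet configuration.
specs = [
    LaunchSpec("m5.large", "us-east-1a", 0.03),
    LaunchSpec("m5.large", "us-east-1b", 0.05),
    LaunchSpec("c5.xlarge", "us-east-1c", 0.10),
]
market_prices = {
    ("m5.large", "us-east-1a"): 0.04,
    ("m5.large", "us-east-1b"): 0.04,
}

for spec, reason in find_unfulfillable_specs(specs, market_prices):
    print(f"{spec.instance_type} in {spec.availability_zone}: {reason}")
```

Running a check like this against the fleet's actual launch specifications would show which entries need a higher maximum price or a different instance type or zone.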

Proposed Actions: