broadinstitute / single_cell_portal_core

Rails/Docker application for the Broad Institute's single cell RNA-seq data portal
https://singlecell.broadinstitute.org
BSD 3-Clause "New" or "Revised" License

Adding machine type & failure properties to ingestSummary event (SCP-5720) #2120

Closed bistline closed 1 month ago

bistline commented 1 month ago

BACKGROUND & CHANGES

In an effort to make the ingestSummary Mixpanel event for AnnData ingest more useful, we are adding three new properties:

  1. machineType: GCE instance type, e.g. n2d-highmem-8
  2. action: ingest process that failed (if present), e.g. ingest_cell_metadata, ingest_anndata, etc.
  3. exitCode: exit code from process that terminated the instance - 0 for success, non-zero for failures

Along with existing properties, these new fields will give us better insight into how and why AnnData ingests succeed or fail. For instance, the error and exitCode fields will let us determine which type of extracted data fails most frequently, and whether those failures are memory-related (137 or 139) or, in the case of metadata files, JSON schema errors (65). The machineType field will be useful in conjunction with memory-related exit codes, as it will allow us to plot file size against machine type to better determine where we should scale up to the next available machine type.
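As a rough illustration of the kind of analysis this enables, the sketch below buckets exit codes using only the values named above (0, 137/139, and 65) and tallies failures per action. The names `classify_exit_code` and `failure_counts` are hypothetical and are not part of `IngestJob` or this PR; the input is assumed to be a list of event property hashes with `:action` and `:exitCode` keys.

```ruby
# Hypothetical helpers for illustration only; not part of this PR.
# Exit code meanings are taken from the description above.
MEMORY_EXIT_CODES = [137, 139].freeze
SCHEMA_ERROR_EXIT_CODE = 65 # JSON schema errors, seen for metadata files

# Map a raw exit code to a coarse category
def classify_exit_code(exit_code)
  case exit_code
  when 0 then :success
  when *MEMORY_EXIT_CODES then :memory
  when SCHEMA_ERROR_EXIT_CODE then :schema_error
  else :other_failure
  end
end

# Count failed events per [action, category] pair
def failure_counts(events)
  events.reject { |event| event[:exitCode].zero? }
        .group_by { |event| [event[:action], classify_exit_code(event[:exitCode])] }
        .transform_values(&:count)
end
```

Grouping on both action and category keeps the "which extraction fails" and "why it fails" questions answerable from the same tally.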

There could be future work in scp-ingest-pipeline to differentiate other failure states with unique exit codes, if desired. Since many of our handled exceptions exit with 1, we could choose different codes for the more common errors to give us better visibility into how often they happen, which could then drive future development in CSFV to prevent these uploads from proceeding to ingest.

MANUAL TESTING

  1. Boot all services and sign in
  2. Go to the Feature Flags control panel and deactivate clientside_validation for your user account (otherwise you will not be able to upload the invalid AnnData file later)
  3. Create a new study and select the AnnData upload UX
  4. Upload the AnnData file test/test_data/anndata_test_bad_header_no_species.h5ad and specify that you have clustering in the X_umap slot (this doesn't exist and will throw an error to be caught later)
  5. After a few minutes, you should see the ingestSummary event in the logs with the following properties:
    jobStatus: "failed", 
    numFilesExtracted: 0, 
    machineType: "n2d-highmem-4",
    action: "ingest_anndata",
    exitCode: 1
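
If it helps while testing, the expected values from step 5 can be compared against the logged event programmatically. This is only a sketch: `property_mismatches` is a hypothetical helper, and it assumes the logged properties have already been parsed into a Ruby hash with symbol keys.

```ruby
# Expected values copied from step 5 above
EXPECTED_PROPS = {
  jobStatus: 'failed',
  numFilesExtracted: 0,
  machineType: 'n2d-highmem-4',
  action: 'ingest_anndata',
  exitCode: 1
}.freeze

# Hypothetical helper: returns { key => [expected, actual] } for every
# expected property that is missing or different in the logged event.
# Extra properties in the logged event are ignored.
def property_mismatches(logged_props, expected = EXPECTED_PROPS)
  expected.each_with_object({}) do |(key, value), diff|
    actual = logged_props[key]
    diff[key] = [value, actual] unless actual == value
  end
end
```

An empty hash means every expected property matched.
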
codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 69.57%. Comparing base (166e7e6) to head (ca74dc7). Report is 4 commits behind head on development.

Additional details and impacted files

[![Impacted file tree graph](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2120/graphs/tree.svg?width=650&height=150&src=pr&token=HMWE5BO2a4&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute)](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute)

```diff
@@               Coverage Diff               @@
##           development    #2120      +/-   ##
===============================================
+ Coverage        69.55%   69.57%   +0.02%
===============================================
  Files              327      327
  Lines            27545    27556      +11
  Branches          2289     2289
===============================================
+ Hits             19159    19173      +14
+ Misses            8252     8249       -3
  Partials           134      134
```

| [Files with missing lines](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2120?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute) | Coverage Δ | |
|---|---|---|
| [app/models/ingest\_job.rb](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2120?src=pr&el=tree&filepath=app%2Fmodels%2Fingest_job.rb&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute#diff-YXBwL21vZGVscy9pbmdlc3Rfam9iLnJi) | `41.57% <100.00%> (+1.14%)` | :arrow_up: |

... and [2 files with indirect coverage changes](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2120/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute)