BACKGROUND & CHANGES
In an effort to make the ingestSummary Mixpanel event for AnnData ingest more useful, we are adding three new properties:
machineType: GCE instance type, e.g. n2d-highmem-8
action: ingest process that failed (if present), e.g. ingest_cell_metadata, ingest_anndata, etc.
exitCode: exit code from process that terminated the instance - 0 for success, non-zero for failures
Together with the existing properties, these new fields will give us better insight into how and why AnnData ingests succeed or fail. For instance, the error and exitCode fields will let us determine which type of extracted data fails most frequently, and whether those failures are memory-related (137 or 139) or, in the case of metadata files, JSON schema errors (65). The machineType field will be useful in conjunction with memory-related exit codes, as it will allow us to plot file size against machine type to better determine when we should scale up to the next available machine type.
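As a rough illustration of the kind of analysis these fields enable, the sketch below groups exported ingestSummary events by failure category. The exit codes (0, 137, 139, 65) and the action and exitCode property names come from this description; the function names, the category labels, and the shape of the events list are assumptions made for the example, not part of the portal or Mixpanel code.

```python
from collections import Counter

# Exit codes named in this PR description.
MEMORY_EXIT_CODES = {137, 139}
JSON_SCHEMA_EXIT_CODE = 65


def classify_exit_code(exit_code):
    """Map an exitCode value to a coarse category (labels are illustrative)."""
    if exit_code == 0:
        return "success"
    if exit_code in MEMORY_EXIT_CODES:
        return "memory"
    if exit_code == JSON_SCHEMA_EXIT_CODE:
        return "json_schema"
    return "other"


def failures_by_action(events):
    """Count failed events per (action, category) pair.

    `events` is assumed to be a list of dicts carrying the `action` and
    `exitCode` properties, e.g. from an exported set of ingestSummary events.
    """
    counts = Counter()
    for event in events:
        category = classify_exit_code(event["exitCode"])
        if category != "success":
            counts[(event.get("action"), category)] += 1
    return counts


# Hypothetical sample data, for illustration only.
sample_events = [
    {"machineType": "n2d-highmem-8", "action": "ingest_anndata", "exitCode": 137},
    {"machineType": "n2d-highmem-8", "action": "ingest_cell_metadata", "exitCode": 65},
    {"machineType": "n2d-highmem-8", "action": None, "exitCode": 0},
]
summary = failures_by_action(sample_events)
```

Joining the memory category with machineType and file size would then give the plot described above.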
Future work in scp-ingest-pipeline could differentiate other failure states with unique exit codes, if desired. Since many of our handled exceptions exit with 1, we could choose distinct codes for the more common errors to give us better visibility into how often these happen, which could then drive future development in CSFV to prevent these uploads from proceeding to ingest.
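One possible shape for that future work is sketched below, purely as an illustration. Only the codes 0, 1, and 65 come from this description; the enum, the exception class, the value 66, and the lookup helper are all hypothetical and do not exist in scp-ingest-pipeline today.

```python
from enum import IntEnum


class IngestExitCode(IntEnum):
    SUCCESS = 0
    GENERIC_ERROR = 1        # what many handled exceptions exit with today
    JSON_SCHEMA_ERROR = 65   # already used for metadata JSON schema errors
    # Hypothetical new code: the value is a placeholder, not an agreed convention.
    MISSING_CLUSTERING_SLOT = 66


class MissingClusteringSlotError(Exception):
    """Hypothetical exception for a clustering slot that is absent from the file."""


# Hypothetical mapping from handled exception types to dedicated exit codes.
EXIT_CODE_BY_EXCEPTION = {
    MissingClusteringSlotError: IngestExitCode.MISSING_CLUSTERING_SLOT,
}


def exit_code_for(exc):
    """Return the dedicated exit code for a handled exception, else the generic 1."""
    for exc_type, code in EXIT_CODE_BY_EXCEPTION.items():
        if isinstance(exc, exc_type):
            return code
    return IngestExitCode.GENERIC_ERROR
```

A top-level handler could then call sys.exit(exit_code_for(exc)), so that the exitCode property distinguishes these failures without any change to the event itself.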
MANUAL TESTING
Boot all services and sign in
Go to the Feature Flags control panel and deactivate clientside_validation for your user account (otherwise you will not be able to upload the invalid AnnData file later)
Create a new study and select the AnnData upload UX
Upload the AnnData file test/test_data/anndata_test_bad_header_no_species.h5ad and specify that you have clustering in the X_umap slot (this doesn't exist and will throw an error to be caught later)
After a few minutes, you should see the ingestSummary event in the logs with the following properties: