Closed bistline closed 2 weeks ago
Attention: Patch coverage is 92.30769% with 2 lines in your changes missing coverage. Please review.
Project coverage is 69.74%. Comparing base (0eaa7e3) to head (631f415). Report is 15 commits behind head on development.
Files with missing lines | Patch % | Lines |
---|---|---|
app/models/ingest_job.rb | 92.00% | 2 Missing :warning: |
Functional review: Confirmed one IngestSummary event (note: hidden event in Mixpanel).
Sending AnnData summary for compliant_pbmc3K.h5ad (66fd3d54eb0c8dcba8f32c7a) after ingest_expression
I extracted processed, metadata, X_tsne, X_umap and raw. The Mixpanel event indicates 4 files extracted - does raw not count?
I realize that raw doesn't generate a fragment file, but it might count as an "extraction"? Would we want to reflect "raw" in numFilesExtracted, or leave numFilesExtracted as-is and have an array of detailed extraction types? Possibly the detailed info isn't useful; I don't have enough caffeine in me to have an opinion.
BACKGROUND & CHANGES
This update addresses some issues surrounding reporting the ingestSummary event for AnnData files. These events are a rollup of the 3 main ingest processes for AnnData files - clustering, metadata, and processed expression. Recent updates to how related jobs were identified proved to be prone to race conditions: jobs checking simultaneously would identify each other as "still processing" and both exit, leading to no summary being sent. Now, the query specifically looks for jobs of the 3 main types that are not "done" according to their status in the LifeSciences API. Furthermore, this check can happen after any job that is submitted for this file, be it main ingest, differential expression, subsampling, etc. Once it is determined that there are no more jobs running that are extracting data, a summary is sent, and then that file is marked as having submitted an ingestSummary. No further summaries will be sent for that file.
Additional clusterings or raw counts data can theoretically still be extracted, but neither will result in submitting an ingestSummary - only the standard ingest event will be reported. Since the point of the ingestSummary is to gauge the amount of time it takes a user to go from initial upload to an "initialized" study and whether or not this was successful, this is acceptable.
This also addresses an unrelated bug with deleting AnnData files where DE output files could not be cleaned up due to missing annotation data (which was being used only for logging).
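The gating logic described above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the code in app/models/ingest_job.rb: Job, FileRecord, SUMMARY_ACTIONS, the action names other than ingest_expression, and maybe_send_summary are all hypothetical stand-ins, and the done flag stands in for whatever status the LifeSciences API reports.

```ruby
# Hypothetical, simplified stand-ins; not the real IngestJob model or API client.
# Only the 3 main extraction types are allowed to block the summary.
SUMMARY_ACTIONS = %i[ingest_cluster ingest_cell_metadata ingest_expression].freeze

# done mirrors a "done" status as it might be reported by the pipelines API (assumed)
Job = Struct.new(:action, :done)

# Minimal stand-in for the AnnData file record and its "summary already sent" marker
FileRecord = Struct.new(:summary_sent)

# Returns true only on the single call that actually sends the summary.
def maybe_send_summary(file, jobs, sent_events)
  return false if file.summary_sent # a summary was already reported for this file

  # Jobs of other types (e.g. differential expression) never block the summary
  still_extracting = jobs.any? do |job|
    SUMMARY_ACTIONS.include?(job.action) && !job.done
  end
  return false if still_extracting

  sent_events << :ingest_summary
  file.summary_sent = true # mark the file so no further summaries are sent
  true
end

file = FileRecord.new(false)
events = []
jobs = [
  Job.new(:ingest_cluster, true),
  Job.new(:ingest_cell_metadata, true),
  Job.new(:ingest_expression, false),
  Job.new(:differential_expression, false)
]

maybe_send_summary(file, jobs, events) # false: expression is still extracting
jobs[2].done = true
maybe_send_summary(file, jobs, events) # true: summary sent despite the pending DE job
maybe_send_summary(file, jobs, events) # false: file is already marked
```

Because the check looks at API-reported completion rather than at whether sibling jobs are currently running their own check, two jobs checking at the same moment no longer see each other as "still processing". With separate worker processes, the marker would presumably need to be persisted on the file record rather than held in memory as it is here.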
MANUAL TESTING
This is complicated to test, especially since in normal development the race condition can never happen as it requires multiple concurrent job workers processing jobs for the same AnnData file. However, you can start DelayedJob locally with multiple workers using the following command (once your environment is initialized):
Note that once you are done testing, you will need to manually stop the workers as they are launched headlessly. This can be done with bin/delayed_job stop development.
Once your workers are running, follow these steps:
Check the development.log file for messages regarding the AnnData summary (your filename/id will be different but the wording will be exact). It will take several minutes to appear, and while you may see multiple "Checking" messages, you should only see one message that says "Sending AnnData summary" and no further messages after that.
ingestSummary
<your accession>
<your file>
numFilesExtracted: 1 each for expression/metadata, 1 if you extracted raw counts, and 1 per clustering.
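As a quick sanity check while testing, the counting rule stated above can be written out as a small helper. This is a sketch of the rule as worded in these notes, not the portal's implementation; expected_num_files_extracted and its parameters are hypothetical names, and it assumes both expression and metadata were extracted.

```ruby
# Hypothetical helper: applies the counting rule from the testing notes.
# Assumes processed expression and metadata are always extracted.
def expected_num_files_extracted(clusterings:, raw_counts:)
  count = 2                 # 1 for expression + 1 for metadata
  count += 1 if raw_counts  # 1 more if raw counts were extracted
  count + clusterings.size  # plus 1 per clustering
end

expected_num_files_extracted(clusterings: %w[X_tsne X_umap], raw_counts: false) # => 4
expected_num_files_extracted(clusterings: %w[X_tsne X_umap], raw_counts: true)  # => 5
```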