broadinstitute / single_cell_portal_core

Rails/Docker application for the Broad Institute's single cell RNA-seq data portal
https://singlecell.broadinstitute.org
BSD 3-Clause "New" or "Revised" License
62 stars 26 forks source link

Fast validation for remote AnnData headers (SCP-5770) #2118

Closed eweitz closed 1 month ago

eweitz commented 1 month ago

This extends up-front checks to cover adding an AnnData file from a bucket. It makes authoring smoother, and thus faster.

Impact

Previously, new AnnData validation caught errors upon selecting a local file. Validating before upload is a great idea. It eliminates long wait times, potentially reducing a disjointed 15-45 minute process into a cohesive 10 second process. But local AnnData validation doesn't help when the file is remote, i.e. in a Terra workspace GCP bucket.

Now, client-side file validation (CSFV) for AnnData headers also applies to remote files. When adding AnnData via "Use bucket path", header validation finishes in < 10 seconds rather than 5-10 minutes like before.

Demo

Here's how it looks!

https://github.com/user-attachments/assets/360b2bfe-f744-4476-b439-d37fdabb8bf9

Technical notes

This required fixing a bug in the external HDF5 library that prevented using it with files needing an OAuth token. Details on that are in https://github.com/jrobinso/hdf5-indexed-reader/pull/2. That HDF5 library is now also available as a canonical NPM package, which replaces our use of an org-scope package.

Next steps

This satisfies SCP-5770.

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 56.25000% with 7 lines in your changes missing coverage. Please review.

Project coverage is 69.54%. Comparing base (20fdea2) to head (4c47c6b). Report is 13 commits behind head on development.

Files with missing lines Patch % Lines
app/javascript/lib/validation/validate-anndata.js 62.50% 3 Missing :warning:
...javascript/lib/validation/validate-file-content.js 50.00% 2 Missing :warning:
app/javascript/lib/validation/validate-file.js 0.00% 2 Missing :warning:
Additional details and impacted files [![Impacted file tree graph](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2118/graphs/tree.svg?width=650&height=150&src=pr&token=HMWE5BO2a4&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute)](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2118?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute) ```diff @@ Coverage Diff @@ ## development #2118 +/- ## =============================================== - Coverage 69.54% 69.54% -0.01% =============================================== Files 327 327 Lines 27536 27544 +8 Branches 2287 2289 +2 =============================================== + Hits 19151 19156 +5 - Misses 8251 8254 +3 Partials 134 134 ``` | [Files with missing lines](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2118?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute) | Coverage Δ | | |---|---|---| | [app/javascript/lib/scp-api.jsx](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2118?src=pr&el=tree&filepath=app%2Fjavascript%2Flib%2Fscp-api.jsx&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute#diff-YXBwL2phdmFzY3JpcHQvbGliL3NjcC1hcGkuanN4) | `42.15% <100.00%> (+0.28%)` | :arrow_up: | | [...javascript/lib/validation/validate-file-content.js](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2118?src=pr&el=tree&filepath=app%2Fjavascript%2Flib%2Fvalidation%2Fvalidate-file-content.js&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute#diff-YXBwL2phdmFzY3JpcHQvbGliL3ZhbGlkYXRpb24vdmFsaWRhdGUtZmlsZS1jb250ZW50Lmpz) | `87.64% <50.00%> (-0.43%)` | :arrow_down: | | [app/javascript/lib/validation/validate-file.js](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2118?src=pr&el=tree&filepath=app%2Fjavascript%2Flib%2Fvalidation%2Fvalidate-file.js&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute#diff-YXBwL2phdmFzY3JpcHQvbGliL3ZhbGlkYXRpb24vdmFsaWRhdGUtZmlsZS5qcw==) | `66.26% <0.00%> (-0.81%)` | :arrow_down: | | [app/javascript/lib/validation/validate-anndata.js](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2118?src=pr&el=tree&filepath=app%2Fjavascript%2Flib%2Fvalidation%2Fvalidate-anndata.js&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute#diff-YXBwL2phdmFzY3JpcHQvbGliL3ZhbGlkYXRpb24vdmFsaWRhdGUtYW5uZGF0YS5qcw==) | `87.50% <62.50%> (-5.61%)` | :arrow_down: | ... and [1 file with indirect coverage changes](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2118/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute)
eweitz commented 1 month ago

Thanks for the enhanced testing, Jean! I see expected validation failure with unconventional.h5ad per screenshot, so my hunch is your finding is due to a discrepancy in local setup. The other file is a known separate case. So this seems good to merge. Let's pair when you've got a few minutes!

Screenshot 2024-09-03 at 12 58 21 PM