Closed eweitz closed 2 months ago
Attention: Patch coverage is 90.76923% with 6 lines in your changes missing coverage. Please review.
Project coverage is 69.83%. Comparing base (809a8ae) to head (c256082). Report is 29 commits behind head on development.
Thanks for the catch. Adaptations I made for the unit test broke the UX -- a downside of discrepant web and Node APIs.
I've fixed the UX, ensured the unit test still works, added another unit test to help ensure soundness, and refined the manual test steps. The 2 unit tests pass, and the 3 cases outlined in the manual test steps also work for me now.
Thanks for the reviews!
When I run yarn ui-test
All the CI test runs pass, and my local tests via yarn test pass (modulo two pre-existing, unrelated, local-only failures). So I think your hunch is right: this is an issue on your end. Ping me if you'd like to pair on it!
When CSFV errors are encountered, the CSFV error reports the most recently chosen file but the form itself retains the previous filename
Useful find! It turns out this is reproducible already on staging, and affects that UX flow for all file types. So this is a pre-existing issue. I agree it's confusing when it occurs. Its incidence is likely very low, as selecting a valid file then an invalid file via "Replace" seems rare. Fixing this while we're refining the upload UI might be opportune, though. I've opened SCP-5771 so we can assess priority next sprint.
This helps users avoid waiting 15-45 minutes to upload a big H5AD file, only to learn of a user error at the end.
Impact
Previously, authors would learn of any validation errors in their AnnData file only after uploading it and then checking their email. That's quite slow because .h5ad files in single cell science tend to be large: 2 GB, 6 GB, sometimes 60 GB. Needing to upload such big files to detect validation problems is frustrating.
Now, some problems are found before upload. Basic AnnData validation is done right in the web upload UI in less than a second. It validates AnnData metadata headers with a few existing rules, and reports any problems instantly and in context. The user journey is smoother, and thus faster.
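
As a rough illustration of the kind of header check described above, here is a minimal sketch. It assumes the parser has already produced an array of `obs` key names. The list of required headers, the function name `findMissingHeaders`, and the shape of the returned issue objects are all hypothetical and are not taken from the portal's actual CSFV code.

```javascript
// Hypothetical list of required metadata headers (an assumption for this
// sketch); the real rules live in the existing CSFV code and may differ.
const REQUIRED_HEADERS = ['species', 'disease', 'organ']

/**
 * Return one issue object per required header missing from the `obs` keys.
 * `obsKeys` stands in for whatever the HDF5 parser yields for the file.
 */
function findMissingHeaders(obsKeys, required = REQUIRED_HEADERS) {
  const present = new Set(obsKeys)
  return required
    .filter(header => !present.has(header))
    .map(header => ({
      type: 'error',
      category: 'missing-header', // illustrative label, not a real CSFV code
      message: `Missing required metadata header: "${header}"`
    }))
}
```

Because a check like this only needs key names, it can run as soon as the metadata headers are parsed, well before any upload starts.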
We made AnnData upload generally available in July 2023, and have noticed a distinct uptick in its usage in the last few months. So improving this UX seems worthwhile.
Demo
Here's how it looks!
https://github.com/user-attachments/assets/7f21c8d4-33f5-4edf-94f6-c43193feb3bb
Technical notes
This uses hdf5-indexed-reader*, an efficient JavaScript library for HDF5 files. JS in the web browser parses the start of the AnnData file, reads obs keys, and applies some basic pre-existing client-side file validation (CSFV) rules. It reuses existing rules via minor rearrangement of parsed AnnData. So we can use the same validation rule code for both classic and AnnData files.
Parsing is streaming and non-blocking. That's important because AnnData files can be larger than client memory. We recognized this as an underlying blocker for AnnData back in November 2022, and started a community discussion about how to solve it. Since then, building on @bmaranville's foundational work, @jrobinso made hdf5-indexed-reader and we now have a library that neatly fits this need.
This changeset also includes a standalone HTML page and spinning DNA graphic. That enables nimbly experimenting with HDF5 and JS using something like Simple Web Server.
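
To make the "parse only the start of the file" idea concrete, here is a small sketch that uses only standard web APIs (`Blob.slice` and `Blob.arrayBuffer`) to read the first 8 bytes of a file and compare them with the HDF5 format signature. This is not the hdf5-indexed-reader API and not the code in this changeset; the function name `looksLikeHdf5` is made up for illustration.

```javascript
// The 8-byte HDF5 format signature: \x89 H D F \r \n \x1a \n
const HDF5_SIGNATURE = [0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a]

/**
 * Read only the first bytes of a File or Blob and check for the HDF5
 * signature. slice() only references a byte range; data is read when
 * arrayBuffer() is awaited, so a multi-gigabyte file is never loaded whole.
 * This checks offset 0 only; HDF5 files with a user block place the
 * signature at a later offset, which this sketch does not handle.
 */
async function looksLikeHdf5(file) {
  const head = await file.slice(0, HDF5_SIGNATURE.length).arrayBuffer()
  const bytes = new Uint8Array(head)
  return bytes.length === HDF5_SIGNATURE.length &&
    HDF5_SIGNATURE.every((byte, i) => bytes[i] === byte)
}
```

The function is async and returns a promise, so calling it on a file chosen in the upload form would not block the UI thread while the bytes are read.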
* Because hdf5-indexed-reader isn't on NPM, it can't be used as a conventional package. So I went ahead and published an org-scoped version of it, @single-cell-portal/hdf5-indexed-reader. The package.json source trivially adds a scope to the original name. This ensures the canonical version can use a simpler top-level name (presumably hdf5-indexed-reader) if / when it's published to NPM.
Next steps
Test
A new automated test confirms some behavior. To manually test:
yarn install
single_cell_portal_core/test/test_data/
anndata_test_bad_header_no_species.h5ad
anndata_test.h5ad
anndata_test_bad_header_no_species.h5ad and anndata_test.h5ad to your study's GCP bucket
Further details
For more context, see 2024-08-13 SCP demo video. This satisfies SCP-5718.