Closed mlin closed 1 month ago
Attention: Patch coverage is 96.82540% with 2 lines in your changes missing coverage. Please review.
Project coverage is 91.41%. Comparing base (eb8f449) to head (5c0668e). Report is 2 commits behind head on main.
Files | Patch % | Lines |
---|---|---|
...cellxgene_census/experimental/_embedding_search.py | 96.72% | 2 Missing :warning: |
@ebezzi Putting this up for initial review since it's working well, but we still need to plan action on #1181 -- this still copies the approach of hard-coding the base S3 URI.
@ebezzi @pablo-gar @ivirshup Updated this to resolve indexes through the mirrors/contributions JSON and remove the need for the caller to use `get_embedding_metadata_by_name()` on their own. Please take another pass, including the prior discussion. Unfortunately we currently have known CI issues, but I've run the new test cases locally. 🙏
@ivirshup I split out the perf optimization to #1257 since I was still getting an error, will write more there -- hope you don't mind, it's only because I need to triage desperately right now!
Adds two new functions to `cellxgene_census.experimental`:

- `find_nearest_obs` uses TileDB-Vector-Search indexes of Census embeddings to find nearest neighbors of given embedding vectors (in an AnnData obsm layer). #1114
- `predict_obs_metadata` uses the nearest neighbors to predict metadata attributes like cell_type and tissue_general for the query cells. The naive initial implementation is just a starting point for experimentation. #1115

The TileDB-Vector-Search query speed seems to be very S3-latency-sensitive, even more so than typical Census queries. It's many times faster to run from within AWS us-west-2 than externally.
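
For reviewers who want a feel for what "naive" nearest-neighbor metadata prediction can look like, here is a minimal standalone sketch using a per-cell majority vote. It is not the code in this PR: `predict_by_majority_vote`, `neighbor_ids`, and `obs_labels` are hypothetical names, the toy data is invented for illustration, and the actual `predict_obs_metadata` may combine neighbors differently.

```python
from collections import Counter


def predict_by_majority_vote(neighbor_ids, obs_labels):
    """Hypothetical helper: predict one label per query cell by majority vote.

    neighbor_ids: one list of neighbor obs IDs per query cell.
    obs_labels:   mapping from obs ID to a metadata label (e.g. cell_type).
    Returns a list of (label, fraction_of_neighbors_agreeing) tuples.
    """
    predictions = []
    for ids in neighbor_ids:
        # Count how often each label occurs among this cell's neighbors.
        votes = Counter(obs_labels[i] for i in ids)
        label, count = votes.most_common(1)[0]
        # The winning vote share serves as a rough confidence score.
        predictions.append((label, count / len(ids)))
    return predictions


# Toy data (invented): five reference cells, two query cells with k=3 neighbors.
obs_labels = {101: "T cell", 102: "T cell", 103: "B cell", 104: "B cell", 105: "B cell"}
neighbor_ids = [[101, 102, 103], [103, 104, 105]]

predictions = predict_by_majority_vote(neighbor_ids, obs_labels)
for label, confidence in predictions:
    print(f"{label} ({confidence:.2f})")
```

The vote share gives a crude confidence signal: a query cell whose neighbors disagree gets a lower score than one whose neighbors all share the same label.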