chanzuckerberg / cellxgene-census

CZ CELLxGENE Discover Census
https://chanzuckerberg.github.io/cellxgene-census/
MIT License

ExperimentDataPipe should record the iterated joinids #1182

Open ebezzi opened 1 month ago

ebezzi commented 1 month ago

When ExperimentDataPipe iterates through cells, the obs_joinids it yields are not recorded anywhere. This isn't important for training, but it is necessary if the same datapipe is used for a forward pass (e.g. when generating embeddings), where each output has to be mapped back to its cell. The current _obs_joinids field can be used, but:

  1. It requires shuffling to be off.
  2. It doesn't work with multiple workers, since they don't process in order.
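
To illustrate the request, here is a minimal sketch of one possible approach: yield the joinids together with each batch, so the consumer can realign outputs regardless of shuffling or worker ordering. This is not the cellxgene-census API. `JoinidTrackingPipe` and `embed_with_joinids` are hypothetical names, the features are fake values derived from the joinid, and the "model" is a placeholder.

```python
import random
from typing import Dict, Iterator, List, Sequence, Tuple


class JoinidTrackingPipe:
    """Hypothetical stand-in for a datapipe (NOT the real ExperimentDataPipe).

    Each batch is yielded together with the joinids of the cells it contains.
    """

    def __init__(
        self,
        obs_joinids: Sequence[int],
        batch_size: int,
        shuffle: bool = False,
        seed: int = 0,
    ) -> None:
        self.obs_joinids = list(obs_joinids)
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.seed = seed

    def __iter__(self) -> Iterator[Tuple[List[float], List[int]]]:
        order = list(self.obs_joinids)
        if self.shuffle:
            # Shuffling changes the iteration order, so a stored,
            # unshuffled joinid list would no longer match the batches.
            random.Random(self.seed).shuffle(order)
        for start in range(0, len(order), self.batch_size):
            ids = order[start : start + self.batch_size]
            # Fake per-cell feature: a value derived from the joinid.
            features = [float(j) * 0.5 for j in ids]
            yield features, ids


def embed_with_joinids(pipe: JoinidTrackingPipe) -> Dict[int, float]:
    """Run a placeholder forward pass and key each output by its joinid."""
    result: Dict[int, float] = {}
    for features, ids in pipe:
        embeddings = [x * 2.0 for x in features]  # placeholder "model"
        for joinid, emb in zip(ids, embeddings):
            result[joinid] = emb
    return result


pipe = JoinidTrackingPipe([10, 11, 12, 13, 14], batch_size=2, shuffle=True, seed=1)
embeddings_by_joinid = embed_with_joinids(pipe)
```

Because the joinids travel with the data, the same idea would carry over to multiple workers: batches can arrive in any order and still be matched to their cells.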