cellarium-ai / cellarium-ml

Distributed single-cell data analysis.
BSD 3-Clause "New" or "Revised" License

Cache throws KeyError for missing entry #181

Open bricewang opened 4 months ago

bricewang commented 4 months ago

To reproduce:

  1. Check out the cache-bug branch, which includes a config for running the onepass_mean_var_std model.
  2. Run cellarium-ml onepass_mean_var_std fit --config=config/onepass.yaml from the base directory.

The run throws a KeyError when attempting to index the second data shard from the cache. It also triggers a downstream error about obs attribute dtypes not matching the reference, which does not appear to reflect an actual problem in the data.
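For context, here is a minimal sketch of how a bounded cache can raise KeyError for an entry that is missing or already evicted, along with a guarded lookup that reloads on a miss. This is not the cellarium-ml implementation: ShardCache, get_or_load, and load_shard are hypothetical names used only to illustrate the general failure mode, and the actual cause of the error here may be different.

```python
from collections import OrderedDict


class ShardCache:
    """Tiny LRU cache keyed by shard index (hypothetical, for illustration only)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._store = OrderedDict()

    def __getitem__(self, key):
        # Raises KeyError if the shard was never cached or was already evicted.
        value = self._store[key]
        self._store.move_to_end(key)  # mark as most recently used
        return value

    def __setitem__(self, key, value):
        self._store[key] = value
        self._store.move_to_end(key)
        while len(self._store) > self.max_size:
            self._store.popitem(last=False)  # evict the least recently used entry

    def get_or_load(self, key, loader):
        # Guarded lookup: fall back to the loader instead of propagating KeyError.
        try:
            return self[key]
        except KeyError:
            value = loader(key)
            self[key] = value
            return value


def load_shard(index):
    # Stand-in for reading a data shard from disk or remote storage.
    return f"shard-{index}"


cache = ShardCache(max_size=1)
cache[0] = load_shard(0)
cache[1] = load_shard(1)  # evicts shard 0 because max_size is 1

try:
    cache[0]
    missed = False
except KeyError:
    missed = True  # plain indexing fails for the evicted shard

recovered = cache.get_or_load(0, load_shard)  # reloads instead of failing
```

Plain indexing surfaces the miss as an exception, while the guarded variant trades that strictness for a reload; which behavior is appropriate depends on whether a miss indicates a real bug.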

ordabayevy commented 4 months ago

Hi Brice, it seems like you forgot to add the config/onepass.yaml file.

bricewang commented 4 months ago

My mistake; the config file is now pushed.

ordabayevy commented 4 months ago

Hey Brice, it works fine on my VM. The only errors I got were related to the DDPStrategy arguments (probably because the config file was generated with an older version of PyTorch Lightning). I changed them to the following, with the defaults removed:

  strategy:
    class_path: lightning.pytorch.strategies.DDPStrategy
    dict_kwargs:
      broadcast_buffers: false

I also had to set obs_columns_to_validate: [] to avoid validation errors, which are unrelated to your bug.

Can you upgrade packages in your environment (or reinstall cellarium-ml in a new environment) and check again?
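To make it easier to compare environments, a small script along these lines could report the installed versions of the relevant packages. It is only a sketch: installed_versions is a hypothetical helper, and the distribution names cellarium-ml, lightning, and torch are assumptions about how the packages are registered in your environment.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # Distribution names below are assumptions; adjust them to your setup.
    for name, ver in installed_versions(["cellarium-ml", "lightning", "torch"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Posting the output from both machines would show whether a version difference in PyTorch Lightning explains the DDPStrategy argument errors.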