Open forrwill opened 1 year ago
Please provide more details, exact error message, package versions.
On Fri, 24 Feb 2023 at 03:21, Forward @.***> wrote:
With Stereo-seq and single-cell raw counts data, I got the error: NotImplementedError: The Cell2location model currently does not support minified data. But the data I use is raw counts.
Please use the template below to post a question to https://discourse.scverse.org/c/ecosytem/cell2location/.
Problem
...
- I follow the instructions from the cell2location tutorial (based on scvi-tools): https://cell2location.readthedocs.io/en/latest/notebooks/cell2location_tutorial.html
- I have adjusted the required hyperparameters (N_cells_per_location and detection_alpha) to my dataset and tissue.
- I have provided 10X reaction/inlet as batch_key for reference NB regression.
- I have checked scverse Discourse https://discourse.scverse.org/c/ecosytem/cell2location/ and old Cell2location Community Forum https://github.com/BayraktarLab/cell2location/discussions, and did not find a solution.
Description of the data input and hyperparameters
...
... Single cell reference data: number of cells, number of cell types, number of genes
... Single cell reference data: technology type (e.g. mix of 10X 3' and 5')
... Spatial data: number of locations, technology type (e.g. Visium, ISS, Nanostring WTA)
...
Package Version
absl-py 1.4.0
aiohttp 3.8.4
aiosignal 1.3.1
anndata 0.8.0
async-timeout 4.0.2
attrs 22.2.0
brotlipy 0.7.0
cached-property 1.5.2
cell2location 0.1.3
certifi 2022.12.7
cffi 1.15.0
charset-normalizer 2.0.4
chex 0.1.6
colorama 0.4.4
conda 22.11.1
conda-content-trust 0+unknown
conda-package-handling 1.8.1
contextlib2 21.6.0
contourpy 1.0.7
cryptography 36.0.0
cycler 0.11.0
dm-tree 0.1.8
docrep 0.3.2
et-xmlfile 1.1.0
etils 1.0.0
flax 0.6.4
fonttools 4.38.0
frozenlist 1.3.3
fsspec 2023.1.0
h5py 3.8.0
idna 3.3
igraph 0.10.4
importlib-resources 5.12.0
jax 0.4.4
jaxlib 0.4.4
joblib 1.2.0
kiwisolver 1.4.4
leidenalg 0.9.1
lightning-utilities 0.7.0
llvmlite 0.39.1
markdown-it-py 2.1.0
matplotlib 3.7.0
mdurl 0.1.2
ml-collections 0.1.1
msgpack 1.0.4
mudata 0.2.1
multidict 6.0.4
multipledispatch 0.6.0
natsort 8.2.0
networkx 3.0
numba 0.56.4
numpy 1.23.5
numpyro 0.11.0
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
opencv-python 4.7.0.68
openpyxl 3.1.1
opt-einsum 3.3.0
optax 0.1.4
orbax 0.1.2
packaging 23.0
pandas 1.5.3
patsy 0.5.3
Pillow 9.4.0
pip 21.2.4
pluggy 1.0.0
pycosat 0.6.3
pycparser 2.21
Pygments 2.14.0
pynndescent 0.5.8
pyOpenSSL 22.0.0
pyparsing 3.0.9
pyro-api 0.1.2
pyro-ppl 1.8.4
PySocks 1.7.1
python-dateutil 2.8.2
python-igraph 0.10.4
pytorch-lightning 1.9.2
pytz 2022.7.1
PyYAML 6.0
requests 2.27.1
rich 13.3.1
ruamel.yaml 0.17.21
ruamel.yaml.clib 0.2.7
ruamel-yaml-conda 0.15.100
scanpy 1.9.2
scikit-learn 1.2.1
scipy 1.10.1
scvi-tools 0.20.1
seaborn 0.12.2
session-info 1.0.0
setuptools 61.2.0
six 1.16.0
statsmodels 0.13.5
stdlib-list 0.8.0
tensorstore 0.1.32
texttable 1.6.7
threadpoolctl 3.1.0
toolz 0.12.0
torch 1.13.1
torchmetrics 0.11.1
tqdm 4.63.0
typing_extensions 4.5.0
umap-learn 0.5.3
urllib3 1.26.8
wheel 0.37.1
yarl 1.8.2
zipp 3.14.0
Can you provide more info on where you downloaded this data?
My adata was created from a GEM file, which I converted to an AnnData object; adata.X is the raw count matrix. I don't know what is wrong. Another question: what does "minified data" mean?
Traceback (most recent call last):
File "/cell2loc/cell2loc_mapping.py", line 130, in
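One quick way to back up the claim that adata.X holds raw counts is to test that every stored value is a non-negative integer. The sketch below is only an illustration: looks_like_raw_counts is a hypothetical helper (not part of cell2location or scvi-tools), and it works on a plain matrix so it can be tried without an AnnData object.

```python
import numpy as np
from scipy import sparse


def looks_like_raw_counts(X):
    """Return True if every stored value is a non-negative integer."""
    # Sparse matrices keep their explicit values in `.data`; dense input is used as-is.
    values = X.data if sparse.issparse(X) else np.asarray(X)
    if values.size == 0:
        return True
    return bool(np.all(values >= 0) and np.all(np.mod(values, 1) == 0))


raw = sparse.csr_matrix(np.array([[0, 3, 1], [2, 0, 0]]))
normalised = np.log1p(raw.toarray())  # log-transformed values are no longer integers

print(looks_like_raw_counts(raw))         # True
print(looks_like_raw_counts(normalised))  # False
```

With a real object, the same check would be applied to adata.X or to whichever layer is passed to the model.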
Can you please provide a reproducible example of your code? and the full traceback?
My code follows the tutorial pipeline at https://cell2location.readthedocs.io/en/latest/notebooks/cell2location_tutorial.html#Cell2location:-spatial-mapping. The difference is that the tutorial data is 10x spatial data and my data is Stereo-seq. The code failed with an error.
What is minified data?
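The thread does not settle this question. As far as the scvi-tools source suggests, "minified" describes an AnnData object whose count matrix has been replaced by a compact latent representation, and the check that raises this error looks for a marker stored in adata.uns. The key name below is an assumption and should be verified against the installed scvi-tools version; minify_flag is a hypothetical helper, and a plain dict stands in for adata.uns so the sketch runs on its own.

```python
# ASSUMPTION: key name as it appears to be used by scvi-tools; verify for your version.
MINIFY_KEY = "_scvi_adata_minify_type"


def minify_flag(uns):
    """Return the minified-data marker from an `adata.uns`-like mapping, or None."""
    return uns.get(MINIFY_KEY)


plain_uns = {"spatial": {}}                                 # no marker present
flagged_uns = {MINIFY_KEY: "latent_posterior_parameters"}   # illustrative marker value

print(minify_flag(plain_uns))    # None
print(minify_flag(flagged_uns))  # latent_posterior_parameters
```

If minify_flag(adata.uns) returns something other than None for data that really is raw counts, that stray entry would be the first thing to inspect.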
@forrwill I recommend making sure that adata.X (or adata.layers["whatever slot you are using"]) is a scipy.sparse.csr_matrix and that its data type is "float32":
adata.X = scipy.sparse.csr_matrix(adata.X, dtype="float32")
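A minimal sketch of that conversion, with a check that it took effect. It uses a small dense matrix in place of adata.X so it runs without AnnData; to_csr_float32 is a hypothetical helper name, not a library function.

```python
import numpy as np
from scipy import sparse


def to_csr_float32(X):
    """Convert a dense or sparse matrix to CSR format with float32 values."""
    return sparse.csr_matrix(X, dtype=np.float32)


dense_counts = np.array([[0, 3, 1], [2, 0, 0]], dtype=np.int64)
X = to_csr_float32(dense_counts)

# Confirm both the storage format and the data type after conversion.
print(X.format, X.dtype)  # csr float32
```

The same call also accepts other sparse formats (for example COO or CSC) and returns a CSR matrix.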
Thank you, I will check it.
@forrwill can you provide a full traceback of the error?
Do you mean the log file, or the input data I use? I set adata.X = scipy.sparse.csr_matrix(adata.X, dtype="float32"), but the error still exists.
extra_categorical_covs State Registry
┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃ Source Location ┃ Categories ┃ scvi-tools Encoding ┃
┡━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ adata.obs['batch'] │ W │ 0 │
│ │ XC │ 1 │
│ │ │ │
└────────────────────┴────────────┴─────────────────────┘
/soft/Miniconda3/envs/cell2loc_env/lib/python3.9/site-packages/lightning_fabric/plugins/e
rank_zero_warn(
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/soft/Miniconda3/envs/cell2loc_env/lib/python3.9/site-packages/lightning_fabric/plugins/e
rank_zero_warn(
/soft/Miniconda3/envs/cell2loc_env/lib/python3.9/site-packages/pytorch_lightning/trainer/
rank_zero_warn("You passed in a `val_dataloader` but have no `validation_step`. Skipping val loop.")
You are using a CUDA device ('NVIDIA A800 80GB PCIe') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [1]
/soft/Miniconda3/envs/cell2loc_env/lib/python3.9/site-packages/pytorch_lightning/trainer/
rank_zero_warn(
Training: 0%| | 0/250 [00:00<?, ?it/s
Epoch 250/250: 100%|███████████████████████████████████████████████████████████| 250/250 [03:39<00:00, 1.14it/s, v_num=1, elbo_train=1.62e+8
Sampling local variables, batch: 0%| | 0/75 [00:00<?, ?it/s
Sampling global variables, sample: 0%| | 0/199 [00:00<?, ?it/s
Traceback (most recent call last):
File "./cell2loc_mapping.py", line 135, in <module>
mod = cell2location.models.Cell2location(
File "/soft/Miniconda3/envs/cell2loc_env/lib/python3.9/site-packages/cell2location/mode
super().__init__(adata)
File "/soft/Miniconda3/envs/cell2loc_env/lib/python3.9/site-packages/scvi/model/base/_b
raise NotImplementedError(
NotImplementedError: The Cell2location model currently does not support minified data.
Which model are you attempting to train? The error message suggests cell2location.models.Cell2location, but the number of epochs is that of the regression model.
We strongly advise against minibatch training (batch_size=number) for cell2location.models.Cell2location because it gives lower accuracy and requires extremely long training to achieve decent results. You have a fairly large GPU, so please try full-data training (batch_size=None).
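As a rough sketch, the full-data settings can be collected as keyword arguments for the training call. The values are the ones used in the cell2location tutorial and may need adjusting; mod is assumed to be an already constructed cell2location.models.Cell2location instance, so the call itself is left commented out.

```python
# Keyword arguments for a full-data training run (no minibatching).
train_kwargs = dict(
    max_epochs=30000,  # value from the cell2location tutorial; adjust as needed
    batch_size=None,   # None = use all locations in every training step
    train_size=1,      # train on the whole dataset, no validation split
)

# `mod` is assumed to exist already; it is not created in this sketch.
# mod.train(**train_kwargs)

print(train_kwargs["batch_size"])  # None
```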
If you really need to limit batch_size, you need to use our experimental amortised inference approach, which uses a neural network to approximate cell abundance. This approach is generally less sensitive (especially on low-count data such as Stereo-seq), but on good-quality data (such as the human lymph node and mouse brain used in the cell2location paper) it can give very similar results to our preferred approach. You can try aggregating proximal Stereo-seq locations to get higher data quality. See https://github.com/BayraktarLab/cell2location/discussions/264#discussioncomment-5341068 for the required settings, and please post the exact code you will use here to make sure this approach is used correctly.
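One possible way to aggregate proximal locations is to sum raw counts over square bins of the coordinate grid. This is only a sketch under stated assumptions: aggregate_bins is a hypothetical helper (not a cell2location function), the bin size is arbitrary, and plain arrays stand in for adata.X and the spatial coordinates.

```python
import numpy as np
from scipy import sparse


def aggregate_bins(counts, coords, bin_size):
    """Sum counts over square bins of side `bin_size`.

    counts: (n_locations, n_genes) sparse or dense matrix of raw counts
    coords: (n_locations, 2) array of x/y positions
    Returns (binned_counts, bin_coords); bin_coords holds integer bin indices.
    """
    counts = sparse.csr_matrix(counts)
    # Integer bin index of every location along x and y.
    bin_idx = np.floor_divide(np.asarray(coords), bin_size).astype(np.int64)
    # Map each distinct (x_bin, y_bin) pair to one row of the output matrix.
    bin_coords, membership = np.unique(bin_idx, axis=0, return_inverse=True)
    membership = np.ravel(membership)
    n_loc, n_bins = counts.shape[0], bin_coords.shape[0]
    # Indicator matrix: entry (b, i) is 1 when location i falls into bin b.
    indicator = sparse.csr_matrix(
        (np.ones(n_loc, dtype=counts.dtype), (membership, np.arange(n_loc))),
        shape=(n_bins, n_loc),
    )
    return indicator @ counts, bin_coords


coords = np.array([[0, 0], [1, 1], [5, 0], [6, 1]])
counts = np.array([[1, 0], [2, 1], [0, 4], [3, 0]])
binned, bin_coords = aggregate_bins(counts, coords, bin_size=5)

print(binned.toarray())  # rows: [3 1] and [3 4]
print(bin_coords)        # rows: [0 0] and [1 0]
```

Summing keeps the result as raw counts, so the binned matrix can still be passed to a model that expects unnormalised data.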
It is possible that cell2location.models.Cell2location doesn't support minified data, but I don't know what minified data is.
Just a tip: wrap your code in triple backticks to display it nicely: open the block with "```python" and close it with "```".