cistrome / MIRA

Python package for analysis of multiomic single cell RNA-seq and ATAC-seq.

Input data must be raw transcript counts, represented as integers. Provided data contains non-integer values. #24

Closed samhkim closed 10 months ago

samhkim commented 1 year ago

I was working through the tutorial data (the synthetic cells and the 10x multiome brain datasets) that were available with the library but ran into this issue on both of the datasets.

It occurs with .get_learning_rate_bounds. Below is the error output for the 10x multiome brain dataset but the error is identical with the synthetic cells as well.

AssertionError                            Traceback (most recent call last)
Input In [6], in <cell line: 1>()
----> 1 example_rna_model.get_learning_rate_bounds(rna_data)
      2 example_rna_model.trim_learning_rate_bounds(2.5, 1) # The trim function moves the lower and upper bounds in by a factor of 10 from the spline-estimated learning rate values
      3 example_rna_model.plot_learning_rate_bounds(figsize=(5,3))

File /oak/stanford/groups/wjg/skim/software/miniconda3/envs/mira-env/lib/python3.8/site-packages/mira/adata_interface/core.py:177, in wraps_modelfunc.<locals>.run.<locals>._run(self, adata, *args, **kwargs)
    172     if not any(
    173         [kwarg in subfunction_kwargs.keys() for subfunction_kwargs in [getter_kwargs, adder_kwargs, function_kwargs]]
    174     ):
    175         raise TypeError('{} is not a valid keyword arg for this function.'.format(kwarg))
--> 177 output = func(self, **fetch(self, adata, **getter_kwargs), **function_kwargs)
    179 return add(adata, output, **adder_kwargs)

File /oak/stanford/groups/wjg/skim/software/miniconda3/envs/mira-env/lib/python3.8/site-packages/mira/topic_model/base.py:650, in BaseModel.get_learning_rate_bounds(self, num_epochs, eval_every, lower_bound_lr, upper_bound_lr, features, highly_variable, dataset)
    647 for epoch in range(num_epochs + 1):
    649     self.train()
--> 650     for batch in self.transform_batch(data_loader, bar = False):
    652         step_loss += self._step(batch, 1.)['loss']
    653         batches_complete+=1

File /oak/stanford/groups/wjg/skim/software/miniconda3/envs/mira-env/lib/python3.8/site-packages/mira/topic_model/base.py:488, in BaseModel.transform_batch(self, data_loader, bar, desc)
    486 def transform_batch(self, data_loader, bar = True, desc = ''):
--> 488     for batch in tqdm(data_loader, desc = desc) if bar else data_loader:
    489         yield {k : torch.tensor(v, requires_grad = False).to(self.device)
    490             for k, v in batch.items()}

File /oak/stanford/groups/wjg/skim/software/miniconda3/envs/mira-env/lib/python3.8/site-packages/torch/utils/data/dataloader.py:628, in _BaseDataLoaderIter.__next__(self)
    625 if self._sampler_iter is None:
    626     # TODO(https://github.com/pytorch/pytorch/issues/76750)
    627     self._reset()  # type: ignore[call-arg]
--> 628 data = self._next_data()
    629 self._num_yielded += 1
    630 if self._dataset_kind == _DatasetKind.Iterable and \
    631         self._IterableDataset_len_called is not None and \
    632         self._num_yielded > self._IterableDataset_len_called:

File /oak/stanford/groups/wjg/skim/software/miniconda3/envs/mira-env/lib/python3.8/site-packages/torch/utils/data/dataloader.py:671, in _SingleProcessDataLoaderIter._next_data(self)
    669 def _next_data(self):
    670     index = self._next_index()  # may raise StopIteration
--> 671     data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    672     if self._pin_memory:
    673         data = _utils.pin_memory.pin_memory(data, self._pin_memory_device)

File /oak/stanford/groups/wjg/skim/software/miniconda3/envs/mira-env/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py:61, in _MapDatasetFetcher.fetch(self, possibly_batched_index)
     59 else:
     60     data = self.dataset[possibly_batched_index]
---> 61 return self.collate_fn(data)

File /oak/stanford/groups/wjg/skim/software/miniconda3/envs/mira-env/lib/python3.8/site-packages/mira/adata_interface/topic_model.py:18, in collate_batch(batch, preprocess_endog, preprocess_exog, preprocess_read_depth)
     14 endog, exog = list(zip(*batch))
     15 endog, exog = sparse.vstack(endog), sparse.vstack(exog)
     17 return {
---> 18     'endog_features' : preprocess_endog(endog),
     19     'exog_features' : preprocess_exog(exog),
     20     'read_depth' : preprocess_read_depth(exog)
     21 }

File /oak/stanford/groups/wjg/skim/software/miniconda3/envs/mira-env/lib/python3.8/site-packages/mira/topic_model/expression_model.py:172, in ExpressionTopicModel.get_endog_fn.<locals>.preprocess_endog(X)
    169 assert(len(X.shape) == 2)
    170 assert(X.shape[1] == self.num_endog_features)
--> 172 assert(np.isclose(X.astype(np.int64), X, 1e-2).all()), 'Input data must be raw transcript counts, represented as integers. Provided data contains non-integer values.'
    174 X = self._residual_transform(X, self.residual_pi).astype(np.float32)
    176 return X

AssertionError: Input data must be raw transcript counts, represented as integers. Provided data contains non-integer values.
qinqian commented 1 year ago

This might be due to a difference in how scanpy handles data.raw. The solution is to change data.raw = data to data.raw = data.copy().
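The general idea behind that suggestion can be illustrated with a minimal NumPy sketch. It is not scanpy or MIRA code: the `counts` matrix and the `normalize_in_place` helper are hypothetical, and whether scanpy actually aliases the data when assigning data.raw depends on the installed version. The sketch only shows why a reference to a matrix stops holding integer counts after in-place normalization, while an explicit copy does not.

```python
import numpy as np

# Hypothetical 3-cell x 2-gene raw count matrix (illustration only).
counts = np.array([[1.0, 0.0], [3.0, 2.0], [0.0, 5.0]])


def normalize_in_place(X):
    """Scale each row to sum to 1, then log1p-transform, modifying X in place."""
    X /= X.sum(axis=1, keepdims=True)
    np.log1p(X, out=X)


def is_integer_valued(X):
    """Same style of check as the assertion in the traceback above."""
    return bool(np.isclose(X.astype(np.int64), X, 1e-2).all())


# Case 1: "raw" is only another name for the same array.
X1 = counts.copy()
raw_alias = X1
normalize_in_place(X1)
alias_still_integer = is_integer_valued(raw_alias)

# Case 2: "raw" is an independent copy taken before normalization.
X2 = counts.copy()
raw_copy = X2.copy()
normalize_in_place(X2)
copy_still_integer = is_integer_valued(raw_copy)

print(alias_still_integer, copy_still_integer)
```

In this sketch only the explicit copy keeps the integer counts; the aliased matrix is normalized along with the original.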

samhkim commented 1 year ago

The same error seems to persist even after setting data.raw = data.copy() unfortunately.
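One way to narrow this down might be to run the same kind of check that the failing assertion performs on each candidate matrix before handing the data to the model. The sketch below is an assumption-laden illustration, not part of MIRA: `looks_like_raw_counts` is a hypothetical helper, and the two small arrays stand in for a raw and a log-normalized expression matrix. For a SciPy sparse matrix, passing its stored values (the `.data` array) should be sufficient.

```python
import numpy as np


def looks_like_raw_counts(values, rtol=1e-2):
    """Return True if every value is close to its integer truncation.

    Mirrors the np.isclose-based assertion shown in the traceback;
    `values` is a dense NumPy array (or the .data array of a sparse matrix).
    """
    values = np.asarray(values)
    return bool(np.isclose(values.astype(np.int64), values, rtol).all())


# Hypothetical stand-ins for a raw and a log-normalized matrix.
raw = np.array([[1.0, 0.0, 4.0], [0.0, 2.0, 0.0]])
normalized = np.log1p(raw)

raw_ok = looks_like_raw_counts(raw)
normalized_ok = looks_like_raw_counts(normalized)
print(raw_ok, normalized_ok)
```

If the matrix that reaches the model fails this check, the data was probably transformed somewhere before the call to get_learning_rate_bounds.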

AllenWLynch commented 1 year ago

Hi samhkim,

What version of Scanpy are you using? I noticed a new behavior recently as well.

AL



samhkim commented 1 year ago

Hello,

Below are all of the package versions in the conda environment I currently have set up for MIRA.

I have scanpy 1.9.3 in this environment.

Name Version Build Channel

_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_kmp_llvm conda-forge
absl-py 1.4.0 pyhd8ed1ab_0 conda-forge
alembic 1.11.1 pyhd8ed1ab_0 conda-forge
anndata 0.9.1 pyhd8ed1ab_0 conda-forge
arpack 3.7.0 hc6cf775_2 conda-forge
asttokens 2.2.1 pypi_0 pypi
attrs 23.1.0 pyh71513ae_1 conda-forge
autopage 0.5.1 pyhd8ed1ab_0 conda-forge
backcall 0.2.0 pypi_0 pypi
backports 1.0 pyhd8ed1ab_3 conda-forge
backports.functools_lru_cache 1.6.4 pyhd8ed1ab_0 conda-forge
biopython 1.81 py38h1de0b5d_0 conda-forge
brotli 1.0.9 h166bdaf_8 conda-forge
brotli-bin 1.0.9 h166bdaf_8 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.19.1 hd590300_0 conda-forge
ca-certificates 2023.5.7 hbcca054_0 conda-forge
cached-property 1.5.2 hd8ed1ab_1 conda-forge
cached_property 1.5.2 pyha770c72_1 conda-forge
certifi 2023.5.7 pyhd8ed1ab_0 conda-forge
cffi 1.15.1 py38h4a40e3a_3 conda-forge
charset-normalizer 3.1.0 pyhd8ed1ab_0 conda-forge
cliff 4.2.0 pyhd8ed1ab_0 conda-forge
cmaes 0.9.1 pyhd8ed1ab_0 conda-forge
cmd2 2.4.3 py38h578d9bd_0 conda-forge
colorama 0.4.6 pyhd8ed1ab_0 conda-forge
colorlog 6.7.0 py38h578d9bd_1 conda-forge
comm 0.1.3 pypi_0 pypi
contourpy 1.0.7 py38hfbd4bf9_0 conda-forge
cycler 0.11.0 pyhd8ed1ab_0 conda-forge
debugpy 1.6.7 pypi_0 pypi
decorator 5.1.1 pypi_0 pypi
executing 1.2.0 pypi_0 pypi
fonttools 4.39.4 py38h01eb140_0 conda-forge
freetype 2.12.1 hca18f0e_1 conda-forge
glpk 5.0 h445213a_0 conda-forge
gmp 6.2.1 h58526e2_0 conda-forge
greenlet 2.0.2 py38h17151c0_1 conda-forge
grpcio 1.42.0 py38hce63b2e_0
h5py 3.1.0 nompi_py38hafa665b_100 conda-forge
hdf5 1.10.6 nompi_h3c11f04_101 conda-forge
icu 72.1 hcb278e6_0 conda-forge
idna 3.4 pyhd8ed1ab_0 conda-forge
igraph 0.10.4 hb9ddf80_2 conda-forge
importlib-metadata 6.6.0 pyha770c72_0 conda-forge
importlib-resources 5.12.0 pyhd8ed1ab_0 conda-forge
importlib_metadata 6.6.0 hd8ed1ab_0 conda-forge
importlib_resources 5.12.0 pyhd8ed1ab_0 conda-forge
ipykernel 6.23.1 pypi_0 pypi
ipython 8.12.2 pypi_0 pypi
jedi 0.18.2 pypi_0 pypi
joblib 1.2.0 pyhd8ed1ab_0 conda-forge
jupyter-client 8.2.0 pypi_0 pypi
jupyter-core 5.3.0 pypi_0 pypi
kiwisolver 1.4.4 py38h43d8883_1 conda-forge
lcms2 2.15 haa2dc70_1 conda-forge
ld_impl_linux-64 2.40 h41732ed_0 conda-forge
leidenalg 0.9.1 py38h8dc9893_0 conda-forge
lerc 4.0.0 h27087fc_0 conda-forge
libblas 3.9.0 8_openblas conda-forge
libbrotlicommon 1.0.9 h166bdaf_8 conda-forge
libbrotlidec 1.0.9 h166bdaf_8 conda-forge
libbrotlienc 1.0.9 h166bdaf_8 conda-forge
libcblas 3.9.0 8_openblas conda-forge
libdeflate 1.18 h0b41bf4_0 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 13.1.0 he5830b7_0 conda-forge
libgfortran-ng 7.5.0 h14aa051_20 conda-forge
libgfortran4 7.5.0 h14aa051_20 conda-forge
libhwloc 2.9.1 hd6dc26d_0 conda-forge
libiconv 1.17 h166bdaf_0 conda-forge
libjpeg-turbo 2.1.5.1 h0b41bf4_0 conda-forge
liblapack 3.9.0 8_openblas conda-forge
libllvm14 14.0.6 hcd5def8_3 conda-forge
libopenblas 0.3.12 pthreads_hb3c22a3_1 conda-forge
libpng 1.6.39 h753d276_0 conda-forge
libprotobuf 3.20.3 h3eb15da_0 conda-forge
libsqlite 3.42.0 h2797004_0 conda-forge
libstdcxx-ng 13.1.0 hfd8a6a1_0 conda-forge
libtiff 4.5.0 ha587672_6 conda-forge
libwebp-base 1.3.0 h0b41bf4_0 conda-forge
libxcb 1.15 h0b41bf4_0 conda-forge
libxml2 2.10.4 hfdac1af_0 conda-forge
libzlib 1.2.13 h166bdaf_4 conda-forge
lisa2 2.3.0 pyhdfd78af_0 bioconda
llvm-openmp 16.0.5 h4dfa4b3_0 conda-forge
llvmlite 0.40.0 py38h94a1851_0 conda-forge
mako 1.2.4 pyhd8ed1ab_0 conda-forge
markdown 3.4.3 pyhd8ed1ab_0 conda-forge
markupsafe 2.1.3 py38h01eb140_0 conda-forge
matplotlib-base 3.7.1 py38hd6c3c57_0 conda-forge
matplotlib-inline 0.1.6 pypi_0 pypi
metis 5.1.0 h58526e2_1006 conda-forge
mira-multiome 1.0.4 py_0 liulab-dfci
mkl 2022.2.1 h84fe81f_16997 conda-forge
moods 1.9.4.1 py38hcbe9525_4 bioconda
mpfr 4.2.0 hb012696_0 conda-forge
munkres 1.1.4 pyh9f0ad1d_0 conda-forge
natsort 8.3.1 pyhd8ed1ab_0 conda-forge
ncurses 6.4 hcb278e6_0 conda-forge
nest-asyncio 1.5.6 pypi_0 pypi
networkx 2.8.8 pyhd8ed1ab_0 conda-forge
ninja 1.11.1 h924138e_0 conda-forge
numba 0.57.0 py38hd559b08_1 conda-forge
numpy 1.23.5 py38h7042d01_0 conda-forge
openjpeg 2.5.0 hfec8fc6_2 conda-forge
openssl 1.1.1u hd590300_0 conda-forge
opt_einsum 3.3.0 pyhd8ed1ab_1 conda-forge
optuna 2.10.1 pyhd8ed1ab_0 conda-forge
packaging 23.1 pyhd8ed1ab_0 conda-forge
pandas 2.0.2 py38h01efb38_0 conda-forge
parso 0.8.3 pypi_0 pypi
patsy 0.5.3 pyhd8ed1ab_0 conda-forge
pbr 5.11.1 pyhd8ed1ab_0 conda-forge
pcre2 10.40 hc3806b6_0 conda-forge
pexpect 4.8.0 pypi_0 pypi
pickleshare 0.7.5 pypi_0 pypi
pillow 9.5.0 py38h885162f_1 conda-forge
pip 23.1.2 pyhd8ed1ab_0 conda-forge
platformdirs 3.5.1 pypi_0 pypi
prettytable 3.7.0 pyhd8ed1ab_0 conda-forge
prompt-toolkit 3.0.38 pypi_0 pypi
protobuf 3.20.3 py38h8dc9893_1 conda-forge
psutil 5.9.5 pypi_0 pypi
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
ptyprocess 0.7.0 pypi_0 pypi
pure-eval 0.2.2 pypi_0 pypi
pycparser 2.21 pyhd8ed1ab_0 conda-forge
pyfaidx 0.7.2.1 pyh7cba7a3_1 bioconda
pygments 2.15.1 pypi_0 pypi
pynndescent 0.5.10 pyh1a96a4e_0 conda-forge
pyparsing 3.0.9 pyhd8ed1ab_0 conda-forge
pyperclip 1.8.2 pyhd8ed1ab_2 conda-forge
pyro-api 0.1.2 pyhd8ed1ab_0 conda-forge
pyro-ppl 1.8.4 pyhd8ed1ab_0 conda-forge
pysocks 1.7.1 pyha2e5f31_6 conda-forge
python 3.8.16 h7a1cb2a_3
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python-igraph 0.10.4 py38hd98a34f_0 conda-forge
python-tzdata 2023.3 pyhd8ed1ab_0 conda-forge
python_abi 3.8 2_cp38 conda-forge
pytorch 1.12.1 cpu_py38h39c826d_0 conda-forge
pytz 2023.3 pyhd8ed1ab_0 conda-forge
pyvcf3 1.0.3 pyhdfd78af_0 bioconda
pyyaml 6.0 py38h0a891b7_5 conda-forge
pyzmq 25.1.0 pypi_0 pypi
readline 8.2 h8228510_1 conda-forge
requests 2.31.0 pyhd8ed1ab_0 conda-forge
scanpy 1.9.3 pyhd8ed1ab_0 conda-forge
scikit-learn 0.24.2 py38hacb3eff_1 conda-forge
scipy 1.5.3 py38h828c644_0 conda-forge
seaborn 0.12.2 hd8ed1ab_0 conda-forge
seaborn-base 0.12.2 pyhd8ed1ab_0 conda-forge
session-info 1.0.0 pyhd8ed1ab_0 conda-forge
setuptools 67.7.2 pyhd8ed1ab_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
sleef 3.5.1 h9b69904_2 conda-forge
sqlalchemy 2.0.15 py38h01eb140_0 conda-forge
sqlite 3.42.0 h2c6b66d_0 conda-forge
stack-data 0.6.2 pypi_0 pypi
statsmodels 0.14.0 py38h31356c5_1 conda-forge
stdlib-list 0.8.0 pyhd8ed1ab_0 conda-forge
stevedore 5.1.0 pyhd8ed1ab_0 conda-forge
suitesparse 5.10.1 h9e50725_1 conda-forge
swig 4.1.1 he155508_1 conda-forge
tbb 2021.9.0 hf52228f_0 conda-forge
tensorboard 1.15.0 py38_0 conda-forge
texttable 1.6.7 pyhd8ed1ab_0 conda-forge
threadpoolctl 3.1.0 pyh8a188c0_0 conda-forge
tk 8.6.12 h27826a3_0 conda-forge
tornado 6.3.2 pypi_0 pypi
tqdm 4.65.0 pyhd8ed1ab_1 conda-forge
traitlets 5.9.0 pypi_0 pypi
typing-extensions 4.6.3 hd8ed1ab_0 conda-forge
typing_extensions 4.6.3 pyha770c72_0 conda-forge
umap-learn 0.5.3 py38h578d9bd_1 conda-forge
unicodedata2 15.0.0 py38h0a891b7_0 conda-forge
urllib3 2.0.3 pyhd8ed1ab_0 conda-forge
wcwidth 0.2.6 pyhd8ed1ab_0 conda-forge
werkzeug 2.3.4 pyhd8ed1ab_0 conda-forge
wheel 0.40.0 pyhd8ed1ab_0 conda-forge
xorg-libxau 1.0.11 hd590300_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xz 5.4.2 h5eee18b_0
yaml 0.2.5 h7f98852_2 conda-forge
zipp 3.15.0 pyhd8ed1ab_0 conda-forge
zlib 1.2.13 h166bdaf_4 conda-forge
zstd 1.5.2 h3eb15da_6 conda-forge

AllenWLynch commented 10 months ago

Hello, sorry I forgot to follow up on this message, but I changed the recommended preprocessing in the tutorials to reflect this change to the scanpy API.