Hi @sztankatt
Thanks for using cell2location!
We generally recommend using a GPU unless you have a small spatial dataset with e.g. < 100 locations / multi-cell samples (for example, @AlexanderAivazidis recently added a notebook showing how to analyse Nanostring WTA data: https://github.com/BayraktarLab/cell2location/blob/master/docs/notebooks/cell2loation_for_NanostringWTA.ipynb). Training on the data in the tutorial notebook (2 sections of mouse brain Visium) is going to take >100 hours on CPU but just 25 minutes on GPU. We are working on porting cell2location to numpyro (https://github.com/vitkl/cell2location_numpyro) and pyro, which have better CPU performance (hours for the demo notebook), but there will always be a substantial speedup from using the GPU.
With regard to the error you are getting,
A) If you can share just a few (e.g. 10) observations from your adata_vis and a subset of inf_aver, this will help reproduce the error (e.g. by email: vitalii.kleshchevnikov@sanger.ac.uk).
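If it helps, here is a minimal sketch of exporting such a subset (the file names below are just placeholders):
# save the first 10 spatial locations and the first 10 rows of the signature table
adata_vis[:10, :].copy().write('adata_vis_subset.h5ad')
inf_aver.iloc[:10, :].to_csv('inf_aver_subset.csv')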
B) Could you please print out the types of the objects: type(inf_aver) and type(adata_vis.raw.X)?
C) Could you tell which cell2location version you are using (e.g. latest github / singularity container)?
D) Print out all package versions as shown in the last code cell here: https://cell2location.readthedocs.io/en/latest/notebooks/cell2location_short_demo.html
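For reference, one way to print package versions (not necessarily identical to the notebook's cell, which may call sinfo directly) is:
import scanpy as sc
# prints the versions of scanpy and its direct and indirect dependencies
sc.logging.print_versions()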
Hi, I am trying to run through the whole tutorial of cell2location on the data provided by the tutorial before trying it on my own data. However, I encounter problems when I get to running the actual model. When running:
sc.settings.set_figure_params(dpi = 100, color_map = 'viridis', dpi_save = 100,
vector_friendly = True, format = 'pdf',
facecolor='white')
r = cell2location.run_cell2location(
# Single cell reference signatures as pd.DataFrame
# (could also be data as anndata object for estimating signatures
# as cluster average expression - `sc_data=adata_snrna_raw`)
sc_data=inf_aver,
# Spatial data as anndata object
sp_data=adata_vis,
# the column in sc_data.obs that gives cluster identity of each cell
summ_sc_data_args={'cluster_col': "annotation_1",
# select marker genes of cell types by specificity of their expression signatures
'selection': "cluster_specificity",
# specificity cutoff (1 = max, 0 = min)
'selection_specificity': 0.07
},
train_args={'use_raw': True, # By default uses raw slots in both of the input datasets.
'n_iter': 40000, # Increase the number of iterations if needed (see QC below)
# When analysing data that contains multiple experiments,
# cell2location automatically enters the mode which pools information across experiments
'sample_name_col': 'sample'}, # Column in sp_data.obs with experiment ID (see above)
export_args={'path': results_folder, # path where to save results
'run_name_suffix': '' # optional suffix to modify the name of the run
},
model_kwargs={ # Prior on the number of cells, cell types and co-located groups
'cell_number_prior': {
# - N - the expected number of cells per location:
'cells_per_spot': 8,
# - A - the expected number of cell types per location:
'factors_per_spot': 9,
# - Y - the expected number of co-located cell type groups per location
'combs_per_spot': 5
},
# Prior beliefs on the sensitivity of spatial technology:
'gene_level_prior':{
# Prior on the mean
'mean': 1/2,
# Prior on standard deviation,
# a good choice of this value should be at least 2 times lower than the mean
'sd': 1/4
}
}
)
I encounter the following error, of the same type as mentioned above:
---------------------------------------------------------------------------
AsTensorError Traceback (most recent call last)
<ipython-input-28-5578f54fa95f> in <module>
49 # Prior on standard deviation,
50 # a good choice of this value should be at least 2 times lower that the mean
---> 51 'sd': 1/4
52 }
53 }
/srv/scratch/lottep/anaconda3/envs/cellpymc2/lib/python3.7/site-packages/cell2location/run_c2l.py in run_cell2location(sc_data, sp_data, model_name, verbose, show_locations, return_all, summ_sc_data_args, train_args, model_kwargs, posterior_args, export_args)
343 fact_names=fact_names,
344 sample_id=sp_data.obs[train_args['sample_name_col']],
--> 345 **model_kwargs)
346
347 ####### Print run name #######
/srv/scratch/lottep/anaconda3/envs/cellpymc2/lib/python3.7/site-packages/cell2location/models/LocationModelLinearDependentWMultiExperiment.py in __init__(self, cell_state_mat, X_data, n_comb, data_type, n_iter, learning_rate, total_grad_norm_constraint, verbose, var_names, var_names_read, obs_names, fact_names, sample_id, gene_level_prior, gene_level_var_prior, cell_number_prior, cell_number_var_prior, phi_hyp_prior, spot_fact_mean_var_ratio, exper_gene_level_mean_var_ratio)
276 1 / tt.pow(self.gene_E, 2)),
277 observed=self.x_data,
--> 278 total_size=self.X_data.shape)
279
280 # =====================Compute nUMI from each factor in spots ======================= #
/srv/scratch/lottep/anaconda3/envs/cellpymc2/lib/python3.7/site-packages/pymc3/distributions/distribution.py in __new__(cls, name, *args, **kwargs)
81 else:
82 dist = cls.dist(*args, **kwargs)
---> 83 return model.Var(name, dist, data, total_size, dims=dims)
84
85 def __getnewargs__(self):
/srv/scratch/lottep/anaconda3/envs/cellpymc2/lib/python3.7/site-packages/pymc3/model.py in Var(self, name, dist, data, total_size, dims)
1115 distribution=dist,
1116 total_size=total_size,
-> 1117 model=self,
1118 )
1119 self.observed_RVs.append(var)
/srv/scratch/lottep/anaconda3/envs/cellpymc2/lib/python3.7/site-packages/pymc3/model.py in __init__(self, type, owner, index, name, data, distribution, total_size, model)
1735
1736 if distribution is not None:
-> 1737 data = as_tensor(data, name, model, distribution)
1738
1739 self.missing_values = data.missing_values
/srv/scratch/lottep/anaconda3/envs/cellpymc2/lib/python3.7/site-packages/pymc3/model.py in as_tensor(data, name, model, distribution)
1689 return data
1690 else:
-> 1691 data = tt.as_tensor_variable(data, name=name)
1692 data.missing_values = None
1693 return data
/srv/scratch/lottep/anaconda3/envs/cellpymc2/lib/python3.7/site-packages/theano/tensor/basic.py in as_tensor_variable(x, name, ndim)
156 if not isinstance(x.type, TensorType):
157 raise AsTensorError(
--> 158 "Variable type field must be a TensorType.", x, x.type)
159
160 if ndim is None:
AsTensorError: ('Variable type field must be a TensorType.', SparseVariable{csr,int16}, Sparse[int16, csr])
I run the model on an 'old' Titan X GPU, as newer GPUs don't support theano anymore. I installed the package yesterday, based on the environment.yml file provided, and then from GitHub as mentioned by you.
However, I encountered a problem with the arviz package. When loading the package in python, I received the following error: module 'arviz' has no attribute 'geweke'. For this reason, I installed arviz version 0.10.0 instead of 0.11.2. This made it possible to load the package into python. If necessary, I can send more information, but I just followed all the provided steps.
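A quick way to check which arviz version is actually being imported (and whether it still exposes the attribute pymc3 looks for):
import arviz
print(arviz.__version__)         # 0.11.x appears to lack the 'geweke' attribute this pymc3 version expects
print(hasattr(arviz, 'geweke'))  # False would explain the import error above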
I did exactly the same as @lopollar, and I am getting the same error. I also tried to take only one sample for the spatial data and erased the 'sample_name_col': 'sample'. At first I got an error that 'LocationModelLinearDependentW' is not defined in cell2location.models. I added it to the models __init__.py and then received the same error (AsTensorError) again. So it is probably not specific to the model; please also check the single-sample case.
Dear @vitkl
A) I'll send the data as you asked.
B) Running
print(type(adata_vis.raw.X))
print(type(inf_aver))
results in
<class 'scipy.sparse.csr.csr_matrix'>
<class 'pandas.core.frame.DataFrame'>
C) I've downloaded the latest version of cell2location. I've installed it with pip install git+https://github.com/BayraktarLab/cell2location.git, and added dependencies via conda.
D) I'm printing out all the package info:
sys 3.8.6 | packaged by conda-forge | (default, Dec 26 2020, 05:05:16)
[GCC 9.3.0]
ipykernel._version 5.4.2
re 2.2.1
json 2.0.9
IPython.core.release 7.19.0
logging 0.5.1.2
zlib 1.0
traitlets._version 5.0.5
traitlets 5.0.5
argparse 1.1
ipython_genutils._version 0.2.0
ipython_genutils 0.2.0
platform 1.0.8
pygments 2.7.4
ptyprocess 0.7.0
pexpect 4.8.0
IPython.core.crashhandler 7.19.0
decorator 4.4.2
pickleshare 0.7.5
backcall 0.2.0
_sqlite3 2.6.0
sqlite3.dbapi2 2.6.0
sqlite3 2.6.0
wcwidth 0.2.5
prompt_toolkit 3.0.10
parso 0.7.1
jedi 0.17.2
urllib.request 3.8
IPython 7.19.0
jupyter_client._version 6.1.11
_ctypes 1.1.0
ctypes 1.1.0
zmq.backend.cython.constants 40303
zmq.backend.cython 40303
zmq.sugar.constants 40303
_decimal 1.70
decimal 1.70
simplejson 3.17.2
zmq.sugar.version 21.0.1
zmq.sugar 21.0.1
zmq 21.0.1
jupyter_core.version 4.7.0
jupyter_core 4.7.0
jupyter_client 6.1.11
ipykernel 5.4.2
tornado 6.1
_curses b'2.2'
dateutil._version 2.8.1
dateutil 2.8.1
six 1.15.0
distutils 3.8.6
pkg_resources._vendor.appdirs 1.4.3
pkg_resources.extern.appdirs 1.4.3
pkg_resources._vendor.packaging.__about__ 20.4
pkg_resources._vendor.packaging 20.4
pkg_resources.extern.packaging 20.4
pkg_resources._vendor.pyparsing 2.2.1
pkg_resources.extern.pyparsing 2.2.1
packaging.__about__ 20.8
packaging 20.8
_csv 1.0
csv 1.0
scanpy._metadata 1.7.0rc1
numpy.version 1.19.5
numpy.core._multiarray_umath 3.1
numpy.core 1.19.5
numpy.linalg._umath_linalg b'0.1.5'
numpy.lib 1.19.5
numpy 1.19.5
scipy.version 1.5.3
scipy._lib._uarray 0.5.1+49.g4c3f1d7.scipy
scipy 1.5.3
anndata._metadata 0.7.5
h5py.version 3.1.0
h5py 3.1.0
natsort 7.1.0
pytz 2020.5
pandas.compat.numpy.function 1.19.5
pandas 1.2.0
anndata 0.7.5
stdlib_list v0.8.0
sinfo 0.3.1
yaml 5.3.1
llvmlite 0.35.0
numba.misc.appdirs 1.4.1
numba 0.52.0
joblib.externals.cloudpickle 1.6.0
psutil 5.8.0
joblib.externals.loky 2.9.0
joblib 1.0.0
sklearn.utils._joblib 1.0.0
scipy._lib.decorator 4.0.5
scipy.linalg._fblas b'$Revision: $'
scipy.linalg._flapack b'$Revision: $'
scipy.linalg._flinalg b'$Revision: $'
scipy.special.specfun b'$Revision: $'
scipy.ndimage 2.0
scipy.optimize.minpack2 b'$Revision: $'
scipy.sparse.linalg.isolve._iterative b'$Revision: $'
scipy.sparse.linalg.eigen.arpack._arpack b'$Revision: $'
scipy.optimize._lbfgsb b'$Revision: $'
scipy.optimize._cobyla b'$Revision: $'
scipy.optimize._slsqp b'$Revision: $'
scipy.optimize._minpack 1.10
scipy.optimize.__nnls b'$Revision: $'
scipy.integrate._odepack 1.9
scipy.integrate._quadpack 1.13
scipy.integrate.vode b'$Revision: $'
scipy.integrate._dop b'$Revision: $'
scipy.integrate.lsoda b'$Revision: $'
scipy.integrate._ode $Id$
scipy.interpolate._fitpack 1.7
scipy.interpolate.dfitpack b'$Revision: $'
scipy.stats.statlib b'$Revision: $'
scipy.stats.mvn b'$Revision: $'
sklearn.base 0.22
sklearn 0.22
cairo._cairo 1.20.0
cairo 1.20.0
texttable 1.6.3
igraph.version 0.8.3
igraph 0.8.3
leidenalg 0.8.3
pyparsing 2.4.7
cycler 0.10.0
kiwisolver 1.3.1
matplotlib 3.3.3
PIL._version 8.1.0
PIL 8.1.0
xml.etree.ElementTree 1.3.0
cffi 1.14.4
PIL.Image 8.1.0
numexpr.version 2.7.2
numexpr 2.7.2
tables 3.6.1
get_version 2.1
legacy_api_wrap 1.2
scanpy 1.7.0rc1
seaborn.external.husl 2.1.0
statsmodels 0.12.1
ipywidgets._version 7.6.3
ipywidgets 7.6.3
seaborn 0.11.1
_cffi_backend 1.14.4
pycparser.ply 3.9
pycparser.ply.yacc 3.10
pycparser.ply.lex 3.10
pycparser 2.20
pynndescent 0.5.2
umap 0.4.6
theano.version 1.0.5
scipy.signal.spline 0.2
theano 1.0.5
patsy.version 0.5.1
patsy 0.5.1
mizani 0.7.2
palettable 3.3.0
mizani.external.husl 4.0.3
statsmodels.__init__ 0.12.1
statsmodels.tools.web 0.12.1
statsmodels.api 0.12.1
plotnine 0.7.1
xarray 0.16.2
arviz.data.base 0.11.1
cftime._cftime 1.4.1
cftime 1.4.1
netCDF4._netCDF4 1.5.6
netCDF4 1.5.6
arviz 0.11.1
fastprogress 0.2.7
pymc3 3.9.0
tqdm._dist_ver 4.57.0
tqdm.version 4.57.0
tqdm.cli 4.57.0
tqdm 4.57.0
torch.version 1.7.1
tarfile 0.9.0
torch.cuda.nccl 2708
torch.backends.cudnn 7605
torch 1.7.1
Hi @lopollar @sztankatt @onahman!
Thanks for using cell2location! Is using a docker/singularity container an option for you? We recommend using the containers to avoid version incompatibility issues such as the one you are facing now.
Pymc3 and related packages (arviz, theano from the pymc3 developers) have undergone major changes recently and, unfortunately, I was not able to adapt cell2location to those changes just yet, sorry. I will likely do that within the next 2-3 weeks.
@lopollar thanks for pointing out the arviz version issue! Also:
as newer GPU's don't support theano anymore
This is not true - I use Tesla V100 / P100 all the time. The issue might lie in the latest drivers rather than the hardware. Using the containers we provide should help with this issue.
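As a quick sanity check (assuming THEANO_FLAGS with device=cuda is set before starting Python), you can confirm whether theano actually runs on the GPU:
import theano
print(theano.config.device)   # expect 'cuda' (or 'cuda0') rather than 'cpu'
print(theano.config.floatX)   # float32 is the usual choice on GPU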
@onahman Sorry for this issue. I believe using the containers should resolve it. I also just introduced changes that should address it in https://github.com/BayraktarLab/cell2location/commit/d4c4e7f0407af6c9c19a64334e094dafcfe3114e .
@vitkl Thank you for the fast reply! Is it possible to create a conda / virtual env with the compatible versions? I am reluctant to move to Docker: I don't know how it affects performance, but I assume it does, and I am not sure how long it would take to set up the GPU in there.
@onahman It should be possible, but it will take some trial and error which I cannot do this week. If you decide to do that, it would be great if you could contribute it back.
On performance: 1) the container will not affect training speed; 2) the accuracy of cell2location was tested using the driver versions in the container (e.g. CUDA 10.2), but I would not expect a substantial difference with newer drivers.
On setting up the GPU, I am actually not an expert on this (our institute's central IT set up singularity for us). Maybe @yozhikoff can help?
@sztankatt I can confirm that your issue is also because of incompatible dependency versions. Cell2location works on your data with our singularity container. Is it possible for you to use the container?
Also, just as a side note, since you are using a technology with a different spatial resolution, I would recommend adjusting these priors accordingly (see also the sketch after the snippet):
'cell_number_prior': {
# - N - the expected number of cells per location (1 cell for 10um locations):
'cells_per_spot': 8,
# - A - the expected number of cell types per location (also 1 for 10um locations):
'factors_per_spot': 9,
# - Y - the expected number of co-located cell type groups per location (also 1 for 10um locations)
'combs_per_spot': 5
},
# Prior beliefs on the sensitivity of spatial technology (you need to compare the total count per location to snRNA-seq):
'gene_level_prior':{
# Prior on the mean
'mean': 1/2,
# Prior on standard deviation,
# a good choice of this value should be at least 2 times lower than the mean
'sd': 1/4
}
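As a rough guide for that comparison, something like the sketch below can be used to eyeball the sensitivity ratio (this assumes raw counts are available in .raw of both the spatial object and the snRNA-seq reference, using the adata_vis / adata_snrna_raw names from the tutorial):
import numpy as np
# mean total counts per spatial location vs per nucleus in the reference
sp_total = np.asarray(adata_vis.raw.X.sum(axis=1)).flatten().mean()
sc_total = np.asarray(adata_snrna_raw.raw.X.sum(axis=1)).flatten().mean()
print(sp_total / sc_total)  # use this ratio to inform 'mean' in gene_level_prior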
Also, if you want to use cell2location plotting (such as the automatic plotting for all cell types in cell2location.run_cell2location) or general scanpy plotting, you need to:
1) add an adata.obsm['spatial'] slot with X and Y coordinates;
2) set sp_data.uns['spatial'] = {'wt_1': 'random letters', 'wt_2': 'random letters', ... for other samples};
3) use the cell2location.run_cell2location(export_args={'img_key': None}) option.
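A minimal sketch of points 1-3 (the coordinate column names and sample names are hypothetical):
# 1) X and Y coordinates per location
sp_data.obsm['spatial'] = sp_data.obs[['x_coord', 'y_coord']].values
# 2) one uns['spatial'] entry per sample (the values can be placeholders)
sp_data.uns['spatial'] = {s: {} for s in sp_data.obs['sample'].unique()}
# 3) then pass export_args={'img_key': None} to cell2location.run_cell2location
#    so that plotting does not expect a histology image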
@onahman Setting up the GPU shouldn't be too hard, we have a short guide about it here. The only potential issue is that it does require root access; is that a problem for you?
@yozhikoff No, I have root access. I will check out the guide. Thank you!
Hi @lopollar @sztankatt @onahman @yozhikoff !
This error was due to a bug in cell2location.run_cell2location, which failed to recognise all possible sparse matrix types and did not convert some of them (csr) to a dense np.array. Sorry about this.
Now fixed in https://github.com/BayraktarLab/cell2location/commit/4e260ca3c5e657d3e6c97f5e98c761f5faed8d42
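For anyone on an older version (before that commit), a possible workaround is to densify the raw counts yourself before calling run_cell2location, e.g.:
import numpy as np
# replace the sparse csr raw counts with a dense array
adata_raw_dense = adata_vis.raw.to_adata()
adata_raw_dense.X = np.asarray(adata_raw_dense.X.todense())
adata_vis.raw = adata_raw_dense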
@vitkl thank you!
Hi,
I've followed the tutorial; however, I'm stuck during the last part of step 2/3. When I run the code above, I get an error.
The error I get is: