Hi @sztankatt
Thanks for using cell2location!
We generally recommend using a GPU unless you have a small spatial dataset with e.g. < 100 locations / multi-cell samples (for example, @AlexanderAivazidis recently added a notebook showing how to analyse Nanostring WTA data: https://github.com/BayraktarLab/cell2location/blob/master/docs/notebooks/cell2loation_for_NanostringWTA.ipynb). Training on the data in the tutorial notebook (2 sections of mouse brain Visium) is going to take >100 hours on CPU but just 25 minutes on GPU. We are working on porting cell2location to numpyro (https://github.com/vitkl/cell2location_numpyro) and pyro, which have better CPU performance (hours for the demo notebook), but there will always be a substantial speedup from using the GPU.
With regard to the error you are getting,
A) If you can share just a few (e.g. 10) observations from your adata_vis and a subset of inf_aver, this will help reproduce the error (e.g. by email: vitalii.kleshchevnikov@sanger.ac.uk).
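If it helps, here is a minimal sketch of exporting such a subset (the file names below are just placeholders):
# save the first 10 spatial locations and the first 10 rows of the signature table
adata_vis[:10, :].copy().write('adata_vis_subset.h5ad')
inf_aver.iloc[:10, :].to_csv('inf_aver_subset.csv')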
B) Could you please print out the types of the objects: type(inf_aver) and type(adata_vis.raw.X)?
C) Could you tell which cell2location version you are using (e.g. latest github / singularity container)?
D) Print out all package versions as shown in the last code cell here: https://cell2location.readthedocs.io/en/latest/notebooks/cell2location_short_demo.html
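For reference, one way to print package versions (not necessarily identical to the notebook's cell, which may call sinfo directly) is:
import scanpy as sc
# prints the versions of scanpy and its direct and indirect dependencies
sc.logging.print_versions()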
Hi, I am trying to run through the whole tutorial of cell2location on the data provided by the tutorial before trying it on my own data. However, I encounter problems when I get to running the actual model. When running:
sc.settings.set_figure_params(dpi = 100, color_map = 'viridis', dpi_save = 100,
vector_friendly = True, format = 'pdf',
facecolor='white')
r = cell2location.run_cell2location(
# Single cell reference signatures as pd.DataFrame
# (could also be data as anndata object for estimating signatures
# as cluster average expression - `sc_data=adata_snrna_raw`)
sc_data=inf_aver,
# Spatial data as anndata object
sp_data=adata_vis,
# the column in sc_data.obs that gives cluster identity of each cell
summ_sc_data_args={'cluster_col': "annotation_1",
# select marker genes of cell types by specificity of their expression signatures
'selection': "cluster_specificity",
# specificity cutoff (1 = max, 0 = min)
'selection_specificity': 0.07
},
train_args={'use_raw': True, # By default uses raw slots in both of the input datasets.
'n_iter': 40000, # Increase the number of iterations if needed (see QC below)
# When analysing data that contains multiple experiments,
# cell2location automatically enters the mode which pools information across experiments
'sample_name_col': 'sample'}, # Column in sp_data.obs with experiment ID (see above)
export_args={'path': results_folder, # path where to save results
'run_name_suffix': '' # optional suffix to modify the name of the run
},
model_kwargs={ # Prior on the number of cells, cell types and co-located groups
'cell_number_prior': {
# - N - the expected number of cells per location:
'cells_per_spot': 8,
# - A - the expected number of cell types per location:
'factors_per_spot': 9,
# - Y - the expected number of co-located cell type groups per location
'combs_per_spot': 5
},
# Prior beliefs on the sensitivity of spatial technology:
'gene_level_prior':{
# Prior on the mean
'mean': 1/2,
# Prior on standard deviation,
# a good choice of this value should be at least 2 times lower than the mean
'sd': 1/4
}
}
)
I encounter the following error, of the same type as mentioned above:
---------------------------------------------------------------------------
AsTensorError Traceback (most recent call last)
<ipython-input-28-5578f54fa95f> in <module>
49 # Prior on standard deviation,
50 # a good choice of this value should be at least 2 times lower that the mean
---> 51 'sd': 1/4
52 }
53 }
/srv/scratch/lottep/anaconda3/envs/cellpymc2/lib/python3.7/site-packages/cell2location/run_c2l.py in run_cell2location(sc_data, sp_data, model_name, verbose, show_locations, return_all, summ_sc_data_args, train_args, model_kwargs, posterior_args, export_args)
343 fact_names=fact_names,
344 sample_id=sp_data.obs[train_args['sample_name_col']],
--> 345 **model_kwargs)
346
347 ####### Print run name #######
/srv/scratch/lottep/anaconda3/envs/cellpymc2/lib/python3.7/site-packages/cell2location/models/LocationModelLinearDependentWMultiExperiment.py in __init__(self, cell_state_mat, X_data, n_comb, data_type, n_iter, learning_rate, total_grad_norm_constraint, verbose, var_names, var_names_read, obs_names, fact_names, sample_id, gene_level_prior, gene_level_var_prior, cell_number_prior, cell_number_var_prior, phi_hyp_prior, spot_fact_mean_var_ratio, exper_gene_level_mean_var_ratio)
276 1 / tt.pow(self.gene_E, 2)),
277 observed=self.x_data,
--> 278 total_size=self.X_data.shape)
279
280 # =====================Compute nUMI from each factor in spots ======================= #
/srv/scratch/lottep/anaconda3/envs/cellpymc2/lib/python3.7/site-packages/pymc3/distributions/distribution.py in __new__(cls, name, *args, **kwargs)
81 else:
82 dist = cls.dist(*args, **kwargs)
---> 83 return model.Var(name, dist, data, total_size, dims=dims)
84
85 def __getnewargs__(self):
/srv/scratch/lottep/anaconda3/envs/cellpymc2/lib/python3.7/site-packages/pymc3/model.py in Var(self, name, dist, data, total_size, dims)
1115 distribution=dist,
1116 total_size=total_size,
-> 1117 model=self,
1118 )
1119 self.observed_RVs.append(var)
/srv/scratch/lottep/anaconda3/envs/cellpymc2/lib/python3.7/site-packages/pymc3/model.py in __init__(self, type, owner, index, name, data, distribution, total_size, model)
1735
1736 if distribution is not None:
-> 1737 data = as_tensor(data, name, model, distribution)
1738
1739 self.missing_values = data.missing_values
/srv/scratch/lottep/anaconda3/envs/cellpymc2/lib/python3.7/site-packages/pymc3/model.py in as_tensor(data, name, model, distribution)
1689 return data
1690 else:
-> 1691 data = tt.as_tensor_variable(data, name=name)
1692 data.missing_values = None
1693 return data
/srv/scratch/lottep/anaconda3/envs/cellpymc2/lib/python3.7/site-packages/theano/tensor/basic.py in as_tensor_variable(x, name, ndim)
156 if not isinstance(x.type, TensorType):
157 raise AsTensorError(
--> 158 "Variable type field must be a TensorType.", x, x.type)
159
160 if ndim is None:
AsTensorError: ('Variable type field must be a TensorType.', SparseVariable{csr,int16}, Sparse[int16, csr])
I run the model on an 'old' Titan X GPU, as newer GPUs don't support theano anymore. I installed the package yesterday, based on the environment.yml file provided, and then from GitHub as mentioned by you.
However, I encountered a problem with the arviz package. When loading the package in python, I received the following error: module 'arviz' has no attribute 'geweke'. For this reason, I installed arviz version 0.10.0 instead of 0.11.2. This made it possible to load the package into python. If necessary, I can send more information, but I just followed all the provided steps.
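A quick way to check which arviz version is actually being imported (and whether it still exposes the attribute pymc3 looks for):
import arviz
print(arviz.__version__)         # 0.11.x appears to lack the 'geweke' attribute this pymc3 version expects
print(hasattr(arviz, 'geweke'))  # False would explain the import error above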
I did exactly the same as @lopollar, and I am getting the same error. I also tried to take only one sample for the spatial data and erased the 'sample_name_col': 'sample'. At first I got an error that 'LocationModelLinearDependentW' is not defined in cell2location.models. I added it to the models __init__.py and then received the same error (AsTensorError) again. So it is probably not specific to the model; please also check the single-sample case.
Dear @vitkl
A) I'll send the data as you asked.
B) Running
print(type(adata_vis.raw.X))
print(type(inf_aver))
results in
<class 'scipy.sparse.csr.csr_matrix'>
<class 'pandas.core.frame.DataFrame'>
C) I've downloaded the latest version of cell2location. I've installed it with pip install git+https://github.com/BayraktarLab/cell2location.git, and added dependencies via conda.
D) I'm printing out all the package info:
sys 3.8.6 | packaged by conda-forge | (default, Dec 26 2020, 05:05:16)
[GCC 9.3.0]
ipykernel._version 5.4.2
re 2.2.1
json 2.0.9
IPython.core.release 7.19.0
logging 0.5.1.2
zlib 1.0
traitlets._version 5.0.5
traitlets 5.0.5
argparse 1.1
ipython_genutils._version 0.2.0
ipython_genutils 0.2.0
platform 1.0.8
pygments 2.7.4
ptyprocess 0.7.0
pexpect 4.8.0
IPython.core.crashhandler 7.19.0
decorator 4.4.2
pickleshare 0.7.5
backcall 0.2.0
_sqlite3 2.6.0
sqlite3.dbapi2 2.6.0
sqlite3 2.6.0
wcwidth 0.2.5
prompt_toolkit 3.0.10
parso 0.7.1
jedi 0.17.2
urllib.request 3.8
IPython 7.19.0
jupyter_client._version 6.1.11
_ctypes 1.1.0
ctypes 1.1.0
zmq.backend.cython.constants 40303
zmq.backend.cython 40303
zmq.sugar.constants 40303
_decimal 1.70
decimal 1.70
simplejson 3.17.2
zmq.sugar.version 21.0.1
zmq.sugar 21.0.1
zmq 21.0.1
jupyter_core.version 4.7.0
jupyter_core 4.7.0
jupyter_client 6.1.11
ipykernel 5.4.2
tornado 6.1
_curses b'2.2'
dateutil._version 2.8.1
dateutil 2.8.1
six 1.15.0
distutils 3.8.6
pkg_resources._vendor.appdirs 1.4.3
pkg_resources.extern.appdirs 1.4.3
pkg_resources._vendor.packaging.__about__ 20.4
pkg_resources._vendor.packaging 20.4
pkg_resources.extern.packaging 20.4
pkg_resources._vendor.pyparsing 2.2.1
pkg_resources.extern.pyparsing 2.2.1
packaging.__about__ 20.8
packaging 20.8
_csv 1.0
csv 1.0
scanpy._metadata 1.7.0rc1
numpy.version 1.19.5
numpy.core._multiarray_umath 3.1
numpy.core 1.19.5
numpy.linalg._umath_linalg b'0.1.5'
numpy.lib 1.19.5
numpy 1.19.5
scipy.version 1.5.3
scipy._lib._uarray 0.5.1+49.g4c3f1d7.scipy
scipy 1.5.3
anndata._metadata 0.7.5
h5py.version 3.1.0
h5py 3.1.0
natsort 7.1.0
pytz 2020.5
pandas.compat.numpy.function 1.19.5
pandas 1.2.0
anndata 0.7.5
stdlib_list v0.8.0
sinfo 0.3.1
yaml 5.3.1
llvmlite 0.35.0
numba.misc.appdirs 1.4.1
numba 0.52.0
joblib.externals.cloudpickle 1.6.0
psutil 5.8.0
joblib.externals.loky 2.9.0
joblib 1.0.0
sklearn.utils._joblib 1.0.0
scipy._lib.decorator 4.0.5
scipy.linalg._fblas b'$Revision: $'
scipy.linalg._flapack b'$Revision: $'
scipy.linalg._flinalg b'$Revision: $'
scipy.special.specfun b'$Revision: $'
scipy.ndimage 2.0
scipy.optimize.minpack2 b'$Revision: $'
scipy.sparse.linalg.isolve._iterative b'$Revision: $'
scipy.sparse.linalg.eigen.arpack._arpack b'$Revision: $'
scipy.optimize._lbfgsb b'$Revision: $'
scipy.optimize._cobyla b'$Revision: $'
scipy.optimize._slsqp b'$Revision: $'
scipy.optimize._minpack 1.10
scipy.optimize.__nnls b'$Revision: $'
scipy.integrate._odepack 1.9
scipy.integrate._quadpack 1.13
scipy.integrate.vode b'$Revision: $'
scipy.integrate._dop b'$Revision: $'
scipy.integrate.lsoda b'$Revision: $'
scipy.integrate._ode $Id$
scipy.interpolate._fitpack 1.7
scipy.interpolate.dfitpack b'$Revision: $'
scipy.stats.statlib b'$Revision: $'
scipy.stats.mvn b'$Revision: $'
sklearn.base 0.22
sklearn 0.22
cairo._cairo 1.20.0
cairo 1.20.0
texttable 1.6.3
igraph.version 0.8.3
igraph 0.8.3
leidenalg 0.8.3
pyparsing 2.4.7
cycler 0.10.0
kiwisolver 1.3.1
matplotlib 3.3.3
PIL._version 8.1.0
PIL 8.1.0
xml.etree.ElementTree 1.3.0
cffi 1.14.4
PIL.Image 8.1.0
numexpr.version 2.7.2
numexpr 2.7.2
tables 3.6.1
get_version 2.1
legacy_api_wrap 1.2
scanpy 1.7.0rc1
seaborn.external.husl 2.1.0
statsmodels 0.12.1
ipywidgets._version 7.6.3
ipywidgets 7.6.3
seaborn 0.11.1
_cffi_backend 1.14.4
pycparser.ply 3.9
pycparser.ply.yacc 3.10
pycparser.ply.lex 3.10
pycparser 2.20
pynndescent 0.5.2
umap 0.4.6
theano.version 1.0.5
scipy.signal.spline 0.2
theano 1.0.5
patsy.version 0.5.1
patsy 0.5.1
mizani 0.7.2
palettable 3.3.0
mizani.external.husl 4.0.3
statsmodels.__init__ 0.12.1
statsmodels.tools.web 0.12.1
statsmodels.api 0.12.1
plotnine 0.7.1
xarray 0.16.2
arviz.data.base 0.11.1
cftime._cftime 1.4.1
cftime 1.4.1
netCDF4._netCDF4 1.5.6
netCDF4 1.5.6
arviz 0.11.1
fastprogress 0.2.7
pymc3 3.9.0
tqdm._dist_ver 4.57.0
tqdm.version 4.57.0
tqdm.cli 4.57.0
tqdm 4.57.0
torch.version 1.7.1
tarfile 0.9.0
torch.cuda.nccl 2708
torch.backends.cudnn 7605
torch 1.7.1
Hi @lopollar @sztankatt @onahman!
Thanks for using cell2location! Is using a docker/singularity container an option for you? We recommend using the containers to avoid version incompatibility issues such as the one you are facing now.
Pymc3 and related packages (arviz, theano from the pymc3 developers) have undergone major changes recently and, unfortunately, I was not able to adapt cell2location to those changes just yet, sorry. I will likely do that within the next 2-3 weeks.
@lopollar thanks for pointing out the arviz version issue! Also:
as newer GPU's don't support theano anymore
This is not true - I use Tesla V100 / P100 all the time. The issue might lie in the latest drivers rather than the hardware. Using the containers we provide should help with this issue.
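As a quick sanity check (assuming THEANO_FLAGS with device=cuda is set before starting Python), you can confirm whether theano actually runs on the GPU:
import theano
print(theano.config.device)   # expect 'cuda' (or 'cuda0') rather than 'cpu'
print(theano.config.floatX)   # float32 is the usual choice on GPU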
@onahman Sorry for this issue. I believe using the containers should resolve it. I also just introduced changes that should address it in https://github.com/BayraktarLab/cell2location/commit/d4c4e7f0407af6c9c19a64334e094dafcfe3114e .
@vitkl Thank you for the fast reply! Is it possible to create a conda / virtual env with the compatible versions? I am reluctant to move to Docker: I don't know how it affects performance, but I assume it does, and I am not sure how long it would take to set up the GPU in there.
@onahman It should be possible, but it will take some trial and error which I cannot do this week. If you decide to do that, it would be great if you could contribute it back.
On performance: 1) the container will not affect training speed; 2) the accuracy of cell2location was tested using the driver versions in the container (e.g. CUDA 10.2), but I would not expect a substantial difference with newer drivers.
On setting up the GPU, I am actually not an expert on this (our institute's central IT set up singularity for us). Maybe @yozhikoff can help?
@sztankatt I can confirm that your issue is also because of incompatible dependency versions. Cell2location works on your data with our singularity container. Is it possible for you to use the container?
Also, just as a side note, since you are using a technology with a different spatial resolution, I would recommend adjusting these priors accordingly (see also the sketch after the snippet):
'cell_number_prior': {
# - N - the expected number of cells per location (1 cell for 10um locations):
'cells_per_spot': 8,
# - A - the expected number of cell types per location (also 1 for 10um locations):
'factors_per_spot': 9,
# - Y - the expected number of co-located cell type groups per location (also 1 for 10um locations)
'combs_per_spot': 5
},
# Prior beliefs on the sensitivity of spatial technology (you need to compare the total count per location to snRNA-seq):
'gene_level_prior':{
# Prior on the mean
'mean': 1/2,
# Prior on standard deviation,
# a good choice of this value should be at least 2 times lower than the mean
'sd': 1/4
}
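As a rough guide for that comparison, something like the sketch below can be used to eyeball the sensitivity ratio (this assumes raw counts are available in .raw of both the spatial object and the snRNA-seq reference, using the adata_vis / adata_snrna_raw names from the tutorial):
import numpy as np
# mean total counts per spatial location vs per nucleus in the reference
sp_total = np.asarray(adata_vis.raw.X.sum(axis=1)).flatten().mean()
sc_total = np.asarray(adata_snrna_raw.raw.X.sum(axis=1)).flatten().mean()
print(sp_total / sc_total)  # use this ratio to inform 'mean' in gene_level_prior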
Also, if you want to use cell2location plotting (such as the automatic plotting for all cell types in cell2location.run_cell2location) or general scanpy plotting, you need to:
1) add an adata.obsm['spatial'] slot with X and Y coordinates;
2) set sp_data.uns['spatial'] = {'wt_1': 'random letters', 'wt_2': 'random letters', ... for other samples};
3) use the cell2location.run_cell2location(export_args={'img_key': None}) option.
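A minimal sketch of points 1-3 (the coordinate column names and sample names are hypothetical):
# 1) X and Y coordinates per location
sp_data.obsm['spatial'] = sp_data.obs[['x_coord', 'y_coord']].values
# 2) one uns['spatial'] entry per sample (the values can be placeholders)
sp_data.uns['spatial'] = {s: {} for s in sp_data.obs['sample'].unique()}
# 3) then pass export_args={'img_key': None} to cell2location.run_cell2location
#    so that plotting does not expect a histology image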
@onahman Setting up the GPU shouldn't be too hard, we have a short guide about it here. The only potential issue is that it does require root access; is that a problem for you?
@yozhikoff No, I have root access. I will check out the guide. Thank you!
Hi @lopollar @sztankatt @onahman @yozhikoff !
This error was due to a bug in cell2location.run_cell2location, which failed to recognise all possible sparse matrix types and did not convert some of them (csr) to a dense np.array. Sorry about this.
Now fixed in https://github.com/BayraktarLab/cell2location/commit/4e260ca3c5e657d3e6c97f5e98c761f5faed8d42
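For anyone on an older version (before that commit), a possible workaround is to densify the raw counts yourself before calling run_cell2location, e.g.:
import numpy as np
# replace the sparse csr raw counts with a dense array
adata_raw_dense = adata_vis.raw.to_adata()
adata_raw_dense.X = np.asarray(adata_raw_dense.X.todense())
adata_vis.raw = adata_raw_dense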
@vitkl thank you!
Hi,
I've followed the tutorial; however, I'm stuck during the last part of step 2/3. When I run the code above, I get an error.
The error I get is: