Teichlab / MultiMAP

MultiMAP for integration of single cell multi-omics
MIT License
53 stars 12 forks source link

MultiMAP.Integration error for CITEseq #3

Closed kathrinjansen closed 3 years ago

kathrinjansen commented 3 years ago

Dear developers,

MultiMap looks great in the pre-print! I’m trying to apply it to a CITEseq dataset and got an error. I’ve pulled the current code again (today at 11.30 am UK time) and now tried to run it with your example data following the tutorial but I still get the error (see below). In addition to the error I’ve added the pip freeze for the packages defined in your setup.py and the anndata/scanpy versions.

Could you help me to figure out what is going wrong?

Thanks a lot for your help. Best wishes, Kathrin

Command: adata = MultiMAP.Integration([rna, atac_genes], ['X_pca', 'X_lsi'])

Error on example dataset from tutorial

xxx/devel/conda_envs/sc_analysis/lib/python3.6/site-packages/anndata/_core/anndata.py:794: UserWarning: AnnData expects .var.index to contain strings, but got values like: []

Inferred to be: empty

value_idx = self._prep_dim_index(value.index, attr)

ValueError Traceback (most recent call last)

in ----> 1 adata = MultiMAP.Integration([rna, atac_genes], ['X_pca', 'X_lsi']) xxx/devel/conda_envs/sc_analysis/lib/python3.6/site-packages/MultiMAP/__init__.py in Integration(adatas, use_reps, scale, embedding, **kwargs) 185 186 #call the wrapper. returns (params, connectivities, embedding), with embedding optional --> 187 mmp = Wrapper(flagged=flagged, use_reps=use_reps, embedding=embedding, **kwargs) 188 189 #make one happy collapsed object and shove the stuff in correct places xxx/devel/conda_envs/sc_analysis/lib/python3.6/site-packages/MultiMAP/__init__.py in Wrapper(flagged, use_reps, embedding, **kwargs) 62 #collapse into a single object and run a PCA 63 adata = flagged[ind1].concatenate(flagged[ind2], join='inner') ---> 64 sc.tl.pca(adata) 65 #preserve space by deleting the intermediate object and just keeping its PCA 66 #and multimap index thing xxx/devel/conda_envs/sc_analysis/lib/python3.6/site-packages/scanpy/preprocessing/_pca.py in pca(data, n_comps, zero_center, svd_solver, random_state, return_info, use_highly_variable, dtype, copy, chunked, chunk_size) 186 n_components=n_comps, svd_solver=svd_solver, random_state=random_state 187 ) --> 188 X_pca = pca_.fit_transform(X) 189 elif issparse(X) and zero_center: 190 from sklearn.decomposition import PCA xxx/devel/conda_envs/sc_analysis/lib/python3.6/site-packages/sklearn/decomposition/_pca.py in fit_transform(self, X, y) 381 C-ordered array, use 'np.ascontiguousarray'. 382 """ --> 383 U, S, Vt = self._fit(X) 384 U = U[:, :self.n_components_] 385 xxx/devel/conda_envs/sc_analysis/lib/python3.6/site-packages/sklearn/decomposition/_pca.py in _fit(self, X) 403 404 X = self._validate_data(X, dtype=[np.float64, np.float32], --> 405 ensure_2d=True, copy=self.copy) 406 407 # Handle n_components==None xxx/devel/conda_envs/sc_analysis/lib/python3.6/site-packages/sklearn/base.py in _validate_data(self, X, y, reset, validate_separately, **check_params) 419 out = X 420 elif isinstance(y, str) and y == 'no_validation': --> 421 X = check_array(X, **check_params) 422 out = X 423 else: xxx/devel/conda_envs/sc_analysis/lib/python3.6/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs) 61 extra_args = len(args) - len(all_args) 62 if extra_args <= 0: ---> 63 return f(*args, **kwargs) 64 65 # extra_args > 0 xxx/devel/conda_envs/sc_analysis/lib/python3.6/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator) 735 " a minimum of %d is required%s." 736 % (n_features, array.shape, ensure_min_features, --> 737 context)) 738 739 if copy and np.may_share_memory(array, array_orig): ValueError: Found array with 0 feature(s) (shape=(6332, 0)) while a minimum of 1 is required. **Versions in my conda env:** anndata==0.7.6 annoy==1.17.0 numba==0.51.2 numpy==1.19.2 scanpy==1.7.2 scikit-learn==0.24.2 scipy==1.5.3
ktpolanski commented 3 years ago

Thank you for your interest in the method and your comprehensive reporting of the encountered issue. Unfortunately I am unable to replicate it.

I'm not much of a conda guy, so here is how I attempted to recreate your specified setup:

#needed python <3.9 to have numba==0.51.2
conda create --name mmpgh python=3.8
source activate mmpgh
conda install anndata==0.7.6 numpy==1.19.2 numba==0.51.2 scanpy==1.7.2 scikit-learn==0.24.2 scipy==1.5.3
conda install -c conda-forge python-annoy==1.17.0
python -m ipykernel install --user --name mmpgh
#cloned repository
cd MultiMAP
pip install .

Once complete, I started a new Jupyter notebook and ran the first five commands of the tutorial. Everything ran, no errors occurred.


It seems the error you're seeing comes from the MultiMAP wrapper trying to compute the paired PCA between the two inputs, and encountering no genes to use. I was able to get the same error message when running this:

test = anndata.AnnData(np.zeros((500,0)))
sc.pp.pca(test)

However, the number in your error message (6332) does not make a lot of sense to me. The dimensions of the original objects are (4382, 13575) and (3166, 19410), while the shape of the intersection object created within the wrapper is (7548, 11055).

kathrinjansen commented 3 years ago

Hi @ktpolanski ,

thanks for checking, I've made a new conda env and can now run the tutorial dataset in the terminal (not in jupyter but that could be some HPC issue on my end). I had a mix-up in the data loading too and that could have caused issues too.

However, I still can't run my own CITEseq dataset (see error below). Could you give some guidance on how this could be run? Initially I thought my .var slot was different so I've removed all columns from it to match the tutorial dataset, but without success. I've then manually ran the code within the Multimap.Integration function and at the concatenation step (line 63 in init.py) it returns an anndata object with no features in it. This makes sense to me as the rna and adt adata (n=73) objects now have no matching feature names and so the concatenation with inner merge will not be able to merge the two anndata objects.

Here is my error message from running adata = MultiMAP.Integration([rna, adt], ['X_pca', 'X_adtpca'])

AnnData expects .var.index to contain strings, but got values like:
    []

    Inferred to be: empty
  value_idx = self._prep_dim_index(value.index, attr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "xxx/devel/conda_envs/sc_analysis/lib/python3.6/site-packages/MultiMAP/__init__.py", line 187, in Integration
    mmp = Wrapper(flagged=flagged, use_reps=use_reps, embedding=embedding, **kwargs)
  File "xxx/devel/conda_envs/sc_analysis/lib/python3.6/site-packages/MultiMAP/__init__.py", line 64, in Wrapper
    sc.tl.pca(adata)
  File "xxx/devel/conda_envs/sc_analysis/lib/python3.6/site-packages/scanpy/preprocessing/_pca.py", line 188, in pca
    X_pca = pca_.fit_transform(X)
 [....]
ValueError: Found array with 0 feature(s) (shape=(46040, 0)) while a minimum of 1 is required. 

Any help would be appreciated! I can rename the issue to include the word CITEseq for future reference or put this in a new issue if you prefer?

ktpolanski commented 3 years ago

Glad to hear it's not some crazy ghost error.

In that case, the package is behaving as expected - the paired PCA is computed on the shared features between the two datasets. If you have no shared features, then there's no input for the PCA.

Can you tell me more about what you're trying to do?

kathrinjansen commented 3 years ago

I have data from a CITEseq experiment (all genes in GEX plus 73 ADTs). This would be a vertical integration - cells as anchors as referred to in this recent review. I assumed MultiMap was a method that can do vertical integration, but maybe it's meant to be used for horizontal integration and I have mis-interpreted it?

ktpolanski commented 3 years ago

I see. So you've got the same cells and two sets of features for them.

I don't believe MultiMAP can help here, but I'm tagging @mikasarkinjain in case I'm wrong.

kathrinjansen commented 3 years ago

Thanks for your help and for clarifying! Apologies for mixing up how MultiMap can be applied. I'll close the issue now but happy to hear Mika's comments of course.

bio-la commented 3 years ago

Hi, I'm sorry to comment on a closed issue, could you clarify why multimap would not work on multiple modalities from same cells? or is it only that multimap specifically works on ATAC+ GEX but not ADT+GEX? I'm a little confused, your (beautiful) preprint says differently! thanks!

ktpolanski commented 3 years ago

I have taken a look at the preprint and I think I understand where the confusion may be originating from. One of the analyses mentions working with the 10x Multiome kit. However, if you look at the corresponding figure, you'll see that there are dots on the plot labelled both RNA and ATAC. While these are technically the same cells as they came out of the multiome kit, this connection appears to be ignored and they are treated as separate datasets.

Was this that caused the confusion? Or did some other part of the preprint give you this impression? Since multiple people are being confused by this, it needs resolving.

bio-la commented 3 years ago

Thank you for the reply. It does need clarification. Upon re- reading the whole preprint in light of this comment i can appreciate that in every scenario the cells are actually coming from independent experiments, which has so far been always the case for epigenetic assays and RNA in SC. the Multiome kit is just a recent exception to this. However one of the reasons why the reader is also thrown off is the following sentence from the discussion:

Finally, given the rapid development of joint multimodal single cell genomics methods (e.g. CITEseq for protein and RNA, joint snRNA-and ATACseq), it is relevant to point out that MultiMAP can be applied to multi-omic data acquired both from different cells as well as from the same cells.

Is there a particular setting that would allow multimap to deal with this type of data? Thank you.

ktpolanski commented 3 years ago

Thanks for bringing this up.

@mikasarkinjain