Closed kathrinjansen closed 3 years ago
Thank you for your interest in the method and your comprehensive reporting of the encountered issue. Unfortunately I am unable to replicate it.
I'm not much of a conda guy, so here is how I attempted to recreate your specified setup:
#needed python <3.9 to have numba==0.51.2
conda create --name mmpgh python=3.8
source activate mmpgh
conda install anndata==0.7.6 numpy==1.19.2 numba==0.51.2 scanpy==1.7.2 scikit-learn==0.24.2 scipy==1.5.3
conda install -c conda-forge python-annoy==1.17.0
python -m ipykernel install --user --name mmpgh
#cloned repository
cd MultiMAP
pip install .
Once complete, I started a new Jupyter notebook and ran the first five commands of the tutorial. Everything ran, no errors occurred.
It seems the error you're seeing comes from the MultiMAP wrapper trying to compute the paired PCA between the two inputs, and encountering no genes to use. I was able to get the same error message when running this:
test = anndata.AnnData(np.zeros((500,0)))
sc.pp.pca(test)
However, the number in your error message (6332) does not make a lot of sense to me. The dimensions of the original objects are (4382, 13575)
and (3166, 19410)
, while the shape of the intersection object created within the wrapper is (7548, 11055)
.
Hi @ktpolanski ,
thanks for checking, I've made a new conda env and can now run the tutorial dataset in the terminal (not in jupyter but that could be some HPC issue on my end). I had a mix-up in the data loading too and that could have caused issues too.
However, I still can't run my own CITEseq dataset (see error below). Could you give some guidance on how this could be run? Initially I thought my .var slot was different so I've removed all columns from it to match the tutorial dataset, but without success. I've then manually ran the code within the Multimap.Integration function and at the concatenation step (line 63 in init.py) it returns an anndata object with no features in it. This makes sense to me as the rna and adt adata (n=73) objects now have no matching feature names and so the concatenation with inner merge will not be able to merge the two anndata objects.
Here is my error message from running
adata = MultiMAP.Integration([rna, adt], ['X_pca', 'X_adtpca'])
AnnData expects .var.index to contain strings, but got values like:
[]
Inferred to be: empty
value_idx = self._prep_dim_index(value.index, attr)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "xxx/devel/conda_envs/sc_analysis/lib/python3.6/site-packages/MultiMAP/__init__.py", line 187, in Integration
mmp = Wrapper(flagged=flagged, use_reps=use_reps, embedding=embedding, **kwargs)
File "xxx/devel/conda_envs/sc_analysis/lib/python3.6/site-packages/MultiMAP/__init__.py", line 64, in Wrapper
sc.tl.pca(adata)
File "xxx/devel/conda_envs/sc_analysis/lib/python3.6/site-packages/scanpy/preprocessing/_pca.py", line 188, in pca
X_pca = pca_.fit_transform(X)
[....]
ValueError: Found array with 0 feature(s) (shape=(46040, 0)) while a minimum of 1 is required.
Any help would be appreciated! I can rename the issue to include the word CITEseq for future reference or put this in a new issue if you prefer?
Glad to hear it's not some crazy ghost error.
In that case, the package is behaving as expected - the paired PCA is computed on the shared features between the two datasets. If you have no shared features, then there's no input for the PCA.
Can you tell me more about what you're trying to do?
I have data from a CITEseq experiment (all genes in GEX plus 73 ADTs). This would be a vertical integration - cells as anchors as referred to in this recent review. I assumed MultiMap was a method that can do vertical integration, but maybe it's meant to be used for horizontal integration and I have mis-interpreted it?
I see. So you've got the same cells and two sets of features for them.
I don't believe MultiMAP can help here, but I'm tagging @mikasarkinjain in case I'm wrong.
Thanks for your help and for clarifying! Apologies for mixing up how MultiMap can be applied. I'll close the issue now but happy to hear Mika's comments of course.
Hi, I'm sorry to comment on a closed issue, could you clarify why multimap would not work on multiple modalities from same cells? or is it only that multimap specifically works on ATAC+ GEX but not ADT+GEX? I'm a little confused, your (beautiful) preprint says differently! thanks!
I have taken a look at the preprint and I think I understand where the confusion may be originating from. One of the analyses mentions working with the 10x Multiome kit. However, if you look at the corresponding figure, you'll see that there are dots on the plot labelled both RNA and ATAC. While these are technically the same cells as they came out of the multiome kit, this connection appears to be ignored and they are treated as separate datasets.
Was this that caused the confusion? Or did some other part of the preprint give you this impression? Since multiple people are being confused by this, it needs resolving.
Thank you for the reply. It does need clarification. Upon re- reading the whole preprint in light of this comment i can appreciate that in every scenario the cells are actually coming from independent experiments, which has so far been always the case for epigenetic assays and RNA in SC. the Multiome kit is just a recent exception to this. However one of the reasons why the reader is also thrown off is the following sentence from the discussion:
Finally, given the rapid development of joint multimodal single cell genomics methods (e.g. CITEseq for protein and RNA, joint snRNA-and ATACseq), it is relevant to point out that MultiMAP can be applied to multi-omic data acquired both from different cells as well as from the same cells.
Is there a particular setting that would allow multimap to deal with this type of data? Thank you.
Thanks for bringing this up.
@mikasarkinjain
Dear developers,
MultiMap looks great in the pre-print! I’m trying to apply it to a CITEseq dataset and got an error. I’ve pulled the current code again (today at 11.30 am UK time) and now tried to run it with your example data following the tutorial but I still get the error (see below). In addition to the error I’ve added the pip freeze for the packages defined in your setup.py and the anndata/scanpy versions.
Could you help me to figure out what is going wrong?
Thanks a lot for your help. Best wishes, Kathrin
Command: adata = MultiMAP.Integration([rna, atac_genes], ['X_pca', 'X_lsi'])
Error on example dataset from tutorial
xxx/devel/conda_envs/sc_analysis/lib/python3.6/site-packages/anndata/_core/anndata.py:794: UserWarning: AnnData expects .var.index to contain strings, but got values like: []
value_idx = self._prep_dim_index(value.index, attr)
ValueError Traceback (most recent call last)