Closed: @FiaSB1 closed this issue 1 year ago.
@lisa

User Notification

Hello! You're receiving this notice because you've tagged me, @lisa, on GitHub, likely in error. Lisa is a common first name, and it happens to be mine; I use the @lisa username here on GitHub. While being tagged in various repositories is an interesting way for me to discover new projects, it's likely not your intention. Did you mean to tag someone else? Or did you mean to use `@lisa` as a word rather than as a mention?

I'm unsubscribing from this issue, pull request or project, so if you meant to tag someone else to notify them, I would recommend doing that. If you meant to use my user ID as a plain string, without having GitHub notify me, you can do so by wrapping it in backticks (the ` character), like so: `@lisa`.
Hi @FiaSB1,

Thanks for your message!

The error message `ValueError: Value passed for key 'PCs' is of incorrect shape. Values of varm must match dimensions (1,) of parent. Value had shape (10305, 50) while it should have had (2000,).`, which occurs in a concatenation step (`---> 45 adata_sub = anndata.concat([adata_sub, adata_padding], axis=1, join='outer', index_unique=None, merge='unique')`), seems to be related to the shape of the PC matrix in your `adata.obsm`.

Could you try removing your PC matrix (and all obsm matrices) by running `del query_data_full.obsm` after loading the data in the line `query_data_full = sc.read_h5ad(test_dataset)`? Just to see if that solves the problem. If so, I'll update some code to prevent this error in the future.
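Put together, a minimal sketch of the suggested fix (the file path is a hypothetical placeholder; replace it with your own query file):

```python
import scanpy as sc

test_dataset = "query_data.h5ad"  # hypothetical path to your query .h5ad file

# load the query data
query_data_full = sc.read_h5ad(test_dataset)

# drop all obsm matrices (per-cell embeddings such as X_pca) so that stored
# results cannot conflict with the gene subsetting and padding done later
del query_data_full.obsm
```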
GitHub tip: if you paste Python code, start it with a line consisting of three backticks plus 'python' (i.e. ```python) and end it with another line of three backticks (i.e. ```). It will then be rendered in a cleaner way.
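For example, a comment written literally as:

````
```python
print("hello world")
```
````

will render the snippet as a highlighted Python code block.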
Let me know if removing your obsm solves the error!
Hi @LisaSikkema,
Thank you so much for your response!
Unfortunately, I still encountered an error. Below is the full output and traceback:
```
Gene names detected: ensembl gene symbols.
1861 genes detected out of 2000 used for mapping.
Not all genes were recovered, filling in zeros for 139 missing genes...
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_9/2669176631.py in <module>
----> 1 query_data = subset_and_pad_adata_object(query_data_full, reference_gene_order)

/tmp/ipykernel_9/2768654640.py in subset_and_pad_adata_object(adata, ref_genes_df, min_n_genes_included)
     43     adata_padding = sc.AnnData(df_padding)
     44     # Concatenate object
---> 45     adata_sub = anndata.concat([adata_sub, adata_padding], axis=1, join='outer', index_unique=None, merge='unique')
     46     # and order:
     47     adata_sub = adata_sub[:,ref_genes_df[gene_type]].copy()

/opt/conda/envs/scarches_0_3_5/lib/python3.7/site-packages/anndata/_core/merge.py in concat(adatas, axis, join, merge, uns_merge, label, keys, index_unique, fill_value, pairwise)
    905                 f"{alt_dim}p": alt_pairwise,
    906                 "uns": uns,
--> 907                 "raw": raw,
    908             }
    909         )

/opt/conda/envs/scarches_0_3_5/lib/python3.7/site-packages/anndata/_core/anndata.py in __init__(self, X, obs, var, uns, obsm, varm, layers, raw, dtype, shape, filename, filemode, asview, obsp, varp, oidx, vidx)
    319             varp=varp,
    320             filename=filename,
--> 321             filemode=filemode,
    322         )
    323

/opt/conda/envs/scarches_0_3_5/lib/python3.7/site-packages/anndata/_core/anndata.py in _init_as_actual(self, X, obs, var, uns, obsm, varm, varp, obsp, raw, layers, dtype, shape, filename, filemode)
    508         # TODO: Think about consequences of making obsm a group in hdf
    509         self._obsm = AxisArrays(self, 0, vals=convert_to_dict(obsm))
--> 510         self._varm = AxisArrays(self, 1, vals=convert_to_dict(varm))
    511
    512         self._obsp = PairwiseArrays(self, 0, vals=convert_to_dict(obsp))

/opt/conda/envs/scarches_0_3_5/lib/python3.7/site-packages/anndata/_core/aligned_mapping.py in __init__(self, parent, axis, vals)
    230         self._data = dict()
    231         if vals is not None:
--> 232             self.update(vals)
    233
    234

/opt/conda/envs/scarches_0_3_5/lib/python3.7/_collections_abc.py in update(*args, **kwds)
    839         if isinstance(other, Mapping):
    840             for key in other:
--> 841                 self[key] = other[key]
    842         elif hasattr(other, "keys"):
    843             for key in other.keys():

/opt/conda/envs/scarches_0_3_5/lib/python3.7/site-packages/anndata/_core/aligned_mapping.py in __setitem__(self, key, value)
    149
    150     def __setitem__(self, key: str, value: V):
--> 151         value = self._validate_value(value, key)
    152         self._data[key] = value
    153

/opt/conda/envs/scarches_0_3_5/lib/python3.7/site-packages/anndata/_core/aligned_mapping.py in _validate_value(self, val, key)
    213                 f"value.index does not match parent’s axis {self.axes[0]} names"
    214             )
--> 215         return super()._validate_value(val, key)
    216
    217

/opt/conda/envs/scarches_0_3_5/lib/python3.7/site-packages/anndata/_core/aligned_mapping.py in _validate_value(self, val, key)
     51             right_shape = tuple(self.parent.shape[a] for a in self.axes)
     52             raise ValueError(
---> 53                 f"Value passed for key {key!r} is of incorrect shape. "
     54                 f"Values of {self.attrname} must match dimensions "
     55                 f"{self.axes} of parent. Value had shape {val.shape} while "

ValueError: Value passed for key 'PCs' is of incorrect shape. Values of varm must match dimensions (1,) of parent. Value had shape (10305, 50) while it should have had (2000,).
```
Below is the printed summary of the AnnData object, to give you a better idea of the data:

```
AnnData object with n_obs × n_vars = 8444 × 45947
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'CellBarcode_Identity', 'nUMI', 'nGene', 'CellType_Category', 'Manuscript_Identity', 'Subclass_Cell_Identity', 'Disease_Identity', 'Subject_Identity', 'Library_Identity', 'percent.mt', 'RNA_snn_res.0.3', 'seurat_clusters'
    var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable'
    varm: 'PCs'
```
Could it be that I missed something?
Thank you!
Hi @FiaSB1,
You still have information from your PCA run in `adata.varm` (that's where the loadings of features onto the PCs are stored). So if you delete the varm information with `del adata.varm`, things should work.
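A minimal sketch of that fix (assuming `query_data_full` is your query AnnData object, as in the earlier snippet):

```python
# inspect what is stored in varm; the PCA loadings show up under the key 'PCs'
print(query_data_full.varm.keys())

# drop all varm matrices so their gene dimension no longer conflicts
# with the 2000-gene reference space after subsetting and padding
del query_data_full.varm
```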
Yes, true, I forgot about the varm. Thanks Malte! @FiaSB1, let us know if removing that solves the problem!
Hi @LuckyMD and @LisaSikkema,
Thank you so much for your responses! Unfortunately, I still received an error after removing the varm. Below is the error that I received. Could it be the format of the data that's causing issues?
```python
query_data.raw = query_data
raw = query_data.raw.to_adata()
raw.X = query_data.X
query_data.raw = raw
```

A quick check whether `X` and `raw.X` contain integers (do a more systematic check if you have any doubts!):

```python
query_data.raw.X[:10, :10].toarray()
```
```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_10/2602429211.py in <module>
----> 1 query_data.raw.X[:10, :10].toarray()

AttributeError: 'numpy.ndarray' object has no attribute 'toarray'
```
```python
query_data.X[:10, :10].toarray()
```
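As hinted at by the "more systematic check" note above, a sketch of such a check (an assumption on my part, not code from the notebook; it works whether `X` is dense or sparse):

```python
import numpy as np
from scipy import sparse

# pull out the stored values, whether X is a dense array or a sparse matrix
values = query_data.X.data if sparse.issparse(query_data.X) else query_data.X

# raw counts should all be whole numbers
print("all integers:", bool(np.all(values == np.round(values))))
```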
Set `query_data.obs["scanvi_label"]` to "unlabeled". Keep this code as is; it has to do with the way the reference model was changed.

```python
query_data.obs["scanvi_label"] = "unlabeled"
```
Thank you so much!
Okay, the first error is resolved at least! This one is related to your count matrix not being sparse. You can make it sparse by running:

```python
from scipy import sparse

query_data.X = sparse.csr_matrix(query_data.X)
```
Just keeping note of the assumptions that we can add to the description in the notebook in the future.
Hi @LisaSikkema and @LuckyMD,
Thank you so much for the suggestion! That solved that error perfectly! Unfortunately, the analysis still returned another error. My apologies, I have no idea what it means:
```
KeyError                                  Traceback (most recent call last)
/opt/conda/envs/scarches_0_3_5/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3080             try:
-> 3081                 return self._engine.get_loc(casted_key)
   3082             except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'dataset'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_10/772632715.py in <module>
      1 batch_variable = "dataset" # the column name under which you stored your batch variable
----> 2 query_batches = sorted(query_data.obs[batch_variable].unique())
      3 print(query_batches)

/opt/conda/envs/scarches_0_3_5/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   3022         if self.columns.nlevels > 1:
   3023             return self._getitem_multilevel(key)
-> 3024         indexer = self.columns.get_loc(key)
   3025         if is_integer(indexer):
   3026             indexer = [indexer]

/opt/conda/envs/scarches_0_3_5/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3081             return self._engine.get_loc(casted_key)
   3082         except KeyError as err:
-> 3083             raise KeyError(key) from err
   3084
   3085         if tolerance is not None:

KeyError: 'dataset'
```
Can I get a hand with this too, please? Is it a naming issue, as in, is the column supposed to be named 'dataset'?
Thank you so much for all your help. I really appreciate it.
Hi @FiaSB1,
If you only have one dataset, just add:

```python
adata.obs['dataset'] = "My dataset"
```

to your AnnData object. That will address this error. If your query object contains multiple datasets/batches, then please add a column here that specifies which cells belong to which dataset or batch.
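For the multi-batch case, a hypothetical sketch (it assumes you already have a per-cell sample annotation, here called `sample`, in `adata.obs`):

```python
# reuse an existing per-cell sample/batch annotation as the 'dataset' column
adata.obs['dataset'] = adata.obs['sample'].astype(str)
```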
@LuckyMD I will just adapt the scripts so that these things are done automatically. @FiaSB1 that is part of the prerequisites your data has to meet to run this code; I improved/clarified the description on another page but forgot to do it on GitHub. It's now fixed, you can find it here in the readme file. You'll need to specify the batches in the data you're working with under `adata.obs["dataset"]`.
Thanks for going through all this; I'll make sure that the next person won't encounter these problems!
@LisaSikkema I should have an updated function to make this work as well. I just need to make sure that the current HLCA model can still be loaded.
A function that removes obsm and varm and makes `adata.X` sparse? That should be only three lines of code, right?
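Something like this minimal sketch, for instance (the function name and the in-place behavior are my assumptions, not code from the repo):

```python
from scipy import sparse

def prepare_query_adata(adata):
    """Strip stored embeddings/loadings and sparsify the counts, in place."""
    del adata.obsm   # per-cell embeddings such as X_pca
    del adata.varm   # per-gene loadings such as 'PCs'
    if not sparse.issparse(adata.X):
        adata.X = sparse.csr_matrix(adata.X)
    return adata
```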
I guess I have a bit more than that, including specifying where the counts data are stored, setting the cell type key, batch key, and unlabeled category... but I am using updated scvi-tools (which has prep functions that take care of some of this stuff, like padding zeros, etc.)... so I'm not sure if this is what we want, actually.
Yeah, then it's probably not ideal for this specific setting.
Hi @LuckyMD @LisaSikkema,
Thank you so much for your response! It seems like the website is currently down, but I will get back to you with my results as soon as possible. Fingers crossed that the mapping runs smoothly this time!
Thank you!
Hi @LuckyMD @LisaSikkema,
Hope you are well! Sorry I took a while to get back to you.

The previous error was thankfully resolved! Although, I then realised that my data was not raw counts, so I had to make several changes.

After using just the raw counts of one sample, the analysis succeeded! Thank you so much for all your help!

I then tried to map the entire dataset, which is 3 GB in size. Unfortunately, I received this error:
```
INFO:jupyterfg.execute:Stripping output from notebook analysis/LCA_scArches_mapping_new_data_to_hlca.ipynb.
INFO:jupyterfg.execute:Executing notebook analysis/LCA_scArches_mapping_new_data_to_hlca.ipynb.
ERROR:traitlets:Kernel died while waiting for execute reply.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/nbclient/client.py", line 627, in _async_poll_for_reply
    msg = await ensure_async(self.kc.shell_channel.get_msg(timeout=new_timeout))
  File "/opt/conda/lib/python3.9/site-packages/nbclient/util.py", line 89, in ensure_async
    result = await obj
  File "/opt/conda/lib/python3.9/site-packages/jupyter_client/channels.py", line 224, in get_msg
    ready = await self.socket.poll(timeout)
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/nbclient/client.py", line 846, in async_execute_cell
    exec_reply = await self.task_poll_for_reply
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.9/site-packages/jupyterfg/__main__.py", line 43, in <module>
    main(args.notebook, args.cell_timeout)
  File "/opt/conda/lib/python3.9/site-packages/jupyterfg/__main__.py", line 38, in main
    execute_and_save(nb_file, cell_timeout=cell_timeout)
  File "/opt/conda/lib/python3.9/site-packages/jupyterfg/execute.py", line 37, in execute_and_save
    ep.preprocess(nb, res)
  File "/opt/conda/lib/python3.9/site-packages/nbconvert/preprocessors/execute.py", line 84, in preprocess
    self.preprocess_cell(cell, resources, index)
  File "/opt/conda/lib/python3.9/site-packages/nbconvert/preprocessors/execute.py", line 105, in preprocess_cell
    cell = self.execute_cell(cell, index, store_history=True)
  File "/opt/conda/lib/python3.9/site-packages/nbclient/util.py", line 78, in wrapped
    return just_run(coro(*args, **kwargs))
  File "/opt/conda/lib/python3.9/site-packages/nbclient/util.py", line 57, in just_run
    return loop.run_until_complete(coro)
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/opt/conda/lib/python3.9/site-packages/nbclient/client.py", line 850, in async_execute_cell
    raise DeadKernelError("Kernel died")
nbclient.exceptions.DeadKernelError: Kernel died
```
Do you reckon this is because the file is still too big? The `adata.X` is a sparse matrix with float64 dtype; would converting it to float32 be a potential solution?
Thank you so much once again!
Kernel errors are really difficult to address. It could be a memory limitation of your system. I'm actually not sure if you can reduce the matrix to float32; I haven't tried that yet.
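For what it's worth, the conversion itself would be a one-liner (a sketch; whether it prevents the kernel from dying is untested, per the above):

```python
import numpy as np

# halve the memory footprint of the sparse count matrix
query_data.X = query_data.X.astype(np.float32)
```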
Are you planning to correct every sample separately? Or do you only expect a dataset-level batch? In the former case, you could also map every sample separately... it's the same as combining all the samples and then using the sample as a batch (encoded in `adata.obs['dataset']`).
Yeah, I'm also not sure where this comes from; in which part of the notebook did it happen?
I also noticed you're using Python 3.9, which is not the recommended Python version to use with this version of scArches. You can try installing the conda environment that I prepared in the GitHub repo, although I am not sure if that's in any way related to the error you're observing here (probably not...).
Also, are you running this with a GPU? That's also recommended; on CPU it will be much slower, and it might also run into memory issues faster (although again, I'm not sure).
Hi @LuckyMD,
I was planning to try both. But seeing as mapping one sample was successful, and it would be the same if all the samples were combined, I will probably just use the results from the one sample.
@LisaSikkema thank you for the suggestions! I will try a different version of Python to see if that works. I have been using a CPU; I'll try to get access to a GPU and run it with that.
Thank you so much for all the help! In any case, I'm grateful that we got to resolve the errors and successfully analyse the data!
Better late than never... the above-mentioned issues related to:

- leftover `obsm`/`varm` matrices (e.g. PCA results) in the query object
- the count matrix not being stored as a sparse matrix
- the missing `adata.obs['dataset']` batch column
- the query data not containing raw counts
Hi there,
Pardon me if this post is a bit all over the place; it's my first time posting on GitHub, but hopefully I articulated my issues well enough.

To start off, I am interested in mapping a COPD single-cell dataset onto the HLCA. I have been facing repeated errors despite converting the object (which was originally an R Seurat object) to the .h5ad format and subsetting the data to reduce the file size. Am I able to get a hand with understanding what the issue might be and how I can fix it? Here are the analysis results.

Afterwards, I realised I could just prepare an AnnData object from scratch with the raw counts instead of preparing a Seurat object first and then converting it to AnnData. However, I still received errors in the analysis. Below are the analysis results.
Please let me know if there's anything else I can add. Thank you so much.