Rafael-Silva-Oliveira opened this issue 3 weeks ago
Hi @Rafael-Silva-Oliveira

Previous users have run Starfysh on Slide-seq datasets for deconvolution, but we haven't tried it on Visium HD (I'm assuming it's the latest 10x Visium assay with 2 µm resolution?). I'm wondering at which stage the memory error occurs (e.g. data loading, preprocessing, model training). Thanks in advance.
> Hi @Rafael-Silva-Oliveira
> Previous users have run Starfysh on Slide-seq datasets for deconvolution, but we haven't tried it on Visium HD (I'm assuming it's the latest 10x Visium assay with 2 µm resolution?). I'm wondering at which stage the memory error occurs (e.g. data loading, preprocessing, model training). Thanks in advance.
Hi, yes, 2 µm resolution, but either 8 µm (>580k bins/spots) or 16 µm (>150k bins/spots) binning still gives the same error for the sample I'm using to test (the Visium HD lung dataset from the 10x website). The error happens when defining the VisiumArguments:
```python
visium_args = utils.VisiumArguments(
    adata,
    adata_normed,
    gene_sig,
    img_metadata,
    n_anchors=60,
    window_size=3,
    sample_id=sample_id,
)
```
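For context on why these bin counts hurt, here is a rough back-of-the-envelope estimate of what a single dense float32 copy of the expression matrix costs at each binning. This is only a sketch: the 2,000-gene figure is an illustrative assumption (the real number depends on the HVG/signature subset kept), and it counts just one copy.

```python
def dense_matrix_gib(n_bins: int, n_genes: int, bytes_per_value: int = 4) -> float:
    """Memory (GiB) needed for one dense n_bins x n_genes matrix of float32 values."""
    return n_bins * n_genes * bytes_per_value / 2**30

# Illustrative gene count; the real number depends on the gene subset kept.
n_genes = 2_000
for label, n_bins in [("8 um binning (~580k bins)", 580_000),
                      ("16 um binning (~150k bins)", 150_000)]:
    print(f"{label}: {dense_matrix_gib(n_bins, n_genes):.1f} GiB")
```

Preprocessing steps typically create several intermediate copies (normalized layers, centered matrices, distance buffers), so peak usage is a multiple of these figures.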
Thanks for the update! I'll test the Visium HD dataset from 10x to diagnose at which stage Starfysh is RAM-intensive. In the meantime, does regular scanpy preprocessing (e.g. library-size normalization, UMAP visualization) run properly on your machine?
> Thanks for the update! I'll test the Visium HD dataset from 10x to diagnose at which stage Starfysh is RAM-intensive. In the meantime, does regular scanpy preprocessing (e.g. library-size normalization, UMAP visualization) run properly on your machine?
Hey, yes, regular scanpy works fine (both on my machine and on the server). But the process is killed after a few minutes once visium_args starts running (on both). There's no informative error message; the terminal simply says "Killed" and any debugging session or running script stops.
> Thanks for the update! I'll test the Visium HD dataset from 10x to diagnose at which stage Starfysh is RAM-intensive. In the meantime, does regular scanpy preprocessing (e.g. library-size normalization, UMAP visualization) run properly on your machine?
I created a subsample with 15k bins and it still crashes, so it would have to be optimized a fair bit to work at the 8 µm resolution of Visium HD :)
```
[2024-06-17 14:29:49] Subsetting highly variable & signature genes ...
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "<path_to_project>\lib\site-packages\joblib\_utils.py", line 72, in __call__
    return self.func(**kwargs)
  File "<path_to_project>\lib\site-packages\joblib\parallel.py", line 598, in __call__
    return [func(*args, **kwargs)
  File "<path_to_project>\lib\site-packages\joblib\parallel.py", line 598, in <listcomp>
    return [func(*args, **kwargs)
MemoryError: Allocation failed (probably too large).
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<path_to_project>\lib\site-packages\<project_name>\utils.py", line 96, in __init__
    sc.pp.neighbors(self.adata_norm, n_neighbors=15, n_pcs=40, use_rep='X')
  File "<path_to_project>\lib\site-packages\scanpy\neighbors\__init__.py", line 176, in neighbors
    neighbors.compute_neighbors(
  File "<path_to_project>\lib\site-packages\scanpy\neighbors\__init__.py", line 561, in compute_neighbors
    self._distances = transformer.fit_transform(X)
  File "<path_to_project>\lib\site-packages\sklearn\utils\_set_output.py", line 313, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "<path_to_project>\lib\site-packages\pynndescent\pynndescent_.py", line 2252, in fit_transform
    self.fit(X, compress_index=False)
  File "<path_to_project>\lib\site-packages\pynndescent\pynndescent_.py", line 2170, in fit
    self.index_ = NNDescent(
  File "<path_to_project>\lib\site-packages\pynndescent\pynndescent_.py", line 829, in __init__
    leaf_array = rptree_leaf_array(self._rp_forest)
  File "<path_to_project>\lib\site-packages\pynndescent\rp_trees.py", line 1436, in rptree_leaf_array
    return np.vstack(rptree_leaf_array_parallel(rp_forest))
  File "<path_to_project>\lib\site-packages\pynndescent\rp_trees.py", line 1428, in rptree_leaf_array_parallel
    result = joblib.Parallel(n_jobs=-1, require="sharedmem")(
  File "<path_to_project>\lib\site-packages\joblib\parallel.py", line 2007, in __call__
    return output if self.return_generator else list(output)
  File "<path_to_project>\lib\site-packages\joblib\parallel.py", line 1650, in _get_outputs
    yield from self._retrieve()
  File "<path_to_project>\lib\site-packages\joblib\parallel.py", line 1754, in _retrieve
    self._raise_error_fast()
  File "<path_to_project>\lib\site-packages\joblib\parallel.py", line 1789, in _raise_error_fast
    error_job.get_result(self.timeout)
  File "<path_to_project>\lib\site-packages\joblib\parallel.py", line 745, in get_result
    return self._return_or_raise()
  File "<path_to_project>\lib\site-packages\joblib\parallel.py", line 763, in _return_or_raise
    raise self._result
MemoryError: Allocation failed (probably too large).
```
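The traceback pinpoints `sc.pp.neighbors(..., use_rep='X')`, which hands pynndescent the full normalized expression matrix rather than a low-dimensional embedding. A common mitigation (a sketch of the general idea, not a confirmed Starfysh fix) is to build the neighbor graph on a PCA representation, which shrinks the kNN input by orders of magnitude. The toy sizes below are illustrative stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-in for a normalized expression matrix: 2,000 bins x 500 genes
# (real Visium HD at 8 um binning has >580k bins and far more genes).
X = rng.standard_normal((2_000, 500)).astype(np.float32)

# PCA via truncated SVD on the centered matrix, keeping the same 40
# components that the neighbors call in the traceback requests as n_pcs.
Xc = X - X.mean(axis=0)
U, S, _ = np.linalg.svd(Xc, full_matrices=False)
X_pca = U[:, :40] * S[:40]  # 2,000 x 40 embedding

print(f"kNN input on X:     {X.nbytes / 2**20:.1f} MiB")
print(f"kNN input on X_pca: {X_pca.nbytes / 2**20:.2f} MiB")
```

In scanpy terms this corresponds to `sc.pp.pca(adata, n_comps=40)` followed by `sc.pp.neighbors(adata, n_neighbors=15, use_rep='X_pca')` instead of `use_rep='X'`; whether Starfysh's pipeline permits that substitution is up to the maintainers.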
Hi @Rafael-Silva-Oliveira,

Sorry for the late reply! I reproduced the 16 µm-binned Visium HD processing with the Starfysh function you attached, `utils.VisiumArguments`. It runs without memory issues on a machine with 64 GB of RAM. The MemoryError was likely induced by the computation of the neighborhood graph & UMAP (`sc.pp.neighbors`; `sc.tl.umap`). I'll update the processing function this week to ensure 1) removal of unnecessary computations and 2) that processing hyperparameters can be specified by users.

In addition, spatial information is stored differently in Visium HD than in regular Visium, so we'll try to address that as well. My quickest suggestion would be running Starfysh (or any other processing/analysis package) on a server with more memory.
Here are the reproduction logs, which show that the "standard" processing indeed took a long time:
```
[2024-06-25 17:51:23] Subsetting highly variable & signature genes ...
[2024-06-25 17:58:19] Smoothing library size by taking averaging with neighbor spots...
[2024-06-25 17:59:45] Retrieving & normalizing signature gene expressions...
[2024-06-25 17:59:57] Identifying anchor spots (highly expression of specific cell-type signatures)...
```
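Since the quickest workaround is a machine with more memory, a small stdlib check before launching can avoid a surprise OOM kill. This is a sketch: it uses POSIX `sysconf` (Linux; not portable to Windows), and the 64 GB threshold is hypothetical, simply mirroring the machine the reproduction ran on.

```python
import os

def total_ram_gib() -> float:
    """Total physical RAM in GiB, via POSIX sysconf (Linux)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / 2**30

# Hypothetical threshold mirroring the 64 GB reproduction machine.
if total_ram_gib() < 64:
    print("Under 64 GiB of RAM: consider 16 um binning or a larger server.")
else:
    print("Memory comparable to the reproduction machine.")
```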
Hey, I was wondering if Starfysh works with Visium HD data (over 200-300k spots/bins)? I'm getting memory errors on my local machine and also on a server with a large amount of memory, so I was wondering whether it's optimized to work with Visium HD data.

Thank you