Mark-wt opened 3 months ago

Hello, I want to install this tool with PyTorch 2.3.1, but it fails because of a dependency conflict. How can I install it with PyTorch 2.3.x?
Hi @Mark-wt,
Thank you for bringing this to our attention. I will test compatibility with PyTorch 2.x (which hadn't been released during step's development) and release a new version if necessary. In the meantime, you can clone the repository, modify the `pyproject.toml` file to match your PyTorch version, and then build and use the package with Poetry.
Clone the repository:
git clone https://github.com/SGGb0nd/step.git
cd step
Modify the `pyproject.toml` file:
Open the `pyproject.toml` file in a text editor and update the PyTorch dependency to your required version (e.g., `2.3.x`); a sketch of what this change might look like follows the steps below.
Build and install the package using Poetry:
poetry install
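For reference, the dependency edit might look something like the sketch below. This assumes the PyTorch dependency is declared under Poetry's `[tool.poetry.dependencies]` table; the exact constraint and surrounding entries in the repository's `pyproject.toml` may differ.

```toml
[tool.poetry.dependencies]
# ...keep the other dependencies as they appear in the repository...
torch = "^2.3.0"  # update the PyTorch constraint to match your installed/target version
```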
For more detailed instructions, you can refer to the Poetry documentation.
Please let me know if you encounter any further issues.
Thanks so much. It works.
Hello, I have many high-resolution ST slices (visium hd 16um bin) for integration, but it cannot run because of out of cuda memory (80G), so how does it run with multiple GPUs? or what parameters should I adjust without affecting the performance?
Hi @Mark-wt, you can set the `sample_rate` argument of the `.run` method to an appropriate integer, e.g., 2048, which represents the number of sampled nodes in each slice in each iteration.
By the way, I've identified and fixed a bug in the spatial graph construction where the graph only had self-loops. This issue was hindering spatial domain-level analysis, which focuses more on localized and niche areas rather than individual cells. I'll push the fix soon. If your targets prioritize spatial domains, microenvironments, or spatial niches, please keep an eye out for this update.
Thanks for the timely reply. Is it true that the larger `n_samples` is, the larger the amount of data? What about the other parameters, like `graph_batch_size=1, n_modules=8, edge_clip=1, n_glayers=4, hidden_dim=30, module_dim=20`?
Yes, I focus on spatial domain detection, so I'm looking forward to that update.
Sorry for the confusing naming of the parameters. `n_samples` only works in the single-slice scenario, when no `batch` is specified at the initialization of step's interface/object/instance; when it comes to multi-slice, the number of slices used for training in each iteration is given by `graph_batch_size`. So, to answer your question: yes, a larger `n_samples` results in a larger amount of data, i.e., cells or spots, involved in each iteration of model training. Analogously, when you're integrating multiple slices, the number of cells/spots used for training in each iteration is given by either:

- `graph_batch_size` $\times$ `sample_rate` (when `sample_rate` is an integer larger than 1), or
- the summation of cells/spots sampled with the given `sample_rate` from the `graph_batch_size` sampled slices (if 2, then 2 slices would be sampled), when `sample_rate` is within 0 to 1.

As for the other params, they indeed affect GPU memory usage since they directly contribute to the number of model parameters. However, I recommend tuning the two parameters above, `graph_batch_size` and `sample_rate`, as well as `n_iterations`, because I've successfully run `step` on 26 newly released MERFISH sagittal slices (millions of cells in total, with `graph_batch_size=5, sample_rate=3000, n_iterations=4000`) and gotten decent spatial domains on a single V100 GPU.
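As a rough illustration of the arithmetic above, here is a small back-of-the-envelope helper (a reader-side sketch, not part of step's API) for estimating the per-iteration data volume before picking values; `cells_per_slice` is an assumed average cell/spot count per slice.

```python
def cells_per_iteration(graph_batch_size: int, sample_rate: float, cells_per_slice: int) -> int:
    """Approximate cells/spots seen per training iteration, per the rules above."""
    if sample_rate > 1:
        # Integer sample_rate: a fixed number of nodes is sampled from each slice.
        return int(graph_batch_size * sample_rate)
    # Fractional sample_rate (0-1): a fraction of each sampled slice is used.
    return int(graph_batch_size * sample_rate * cells_per_slice)

# Example matching the MERFISH setting mentioned above:
# 5 slices per iteration x 3000 sampled cells each = 15,000 cells per iteration.
print(cells_per_iteration(graph_batch_size=5, sample_rate=3000, cells_per_slice=100_000))
```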
Thanks very much. It runs now after setting these two parameters as you advised.
Good to hear. You can also leave `n_modules`, `hidden_dim`, and `module_dim` at their defaults for a larger model capacity.
Yeah, it also works.
@Mark-wt Hi Mark, I've pushed the fix for the spatial graph construction bug I mentioned earlier (where graphs only had self-loops). The update should now properly handle spatial domain-level analysis for large scale datasets.
Thanks very much for letting me know.