Closed liouhy closed 4 months ago
You could run it with a subset of cell barcodes, e.g. run it 4 times, each on a different subset. Once you have the clusters for each of the 4 runs, subsample the biggest clusters in each run, keep all cell barcodes from the small clusters, and run the impute_accessibility
step again with that combined set of cell barcodes so the matrix fits in memory (without losing any resolution).
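The per-cluster subsampling described above could be sketched as follows. This is a hypothetical helper, not part of the SCENIC+ API; the function name, the `max_cells` threshold, and the barcode-to-cluster mapping are all assumptions for illustration.

```python
import random

def subsample_barcodes(barcode_to_cluster, max_cells=2000, seed=42):
    """Keep every barcode from small clusters; randomly subsample
    clusters larger than max_cells down to max_cells barcodes.

    barcode_to_cluster: dict mapping cell barcode -> cluster label.
    Returns a list of barcodes to pass to impute_accessibility
    via selected_cells.
    """
    rng = random.Random(seed)
    # Group barcodes by cluster label.
    by_cluster = {}
    for barcode, cluster in barcode_to_cluster.items():
        by_cluster.setdefault(cluster, []).append(barcode)
    # Subsample only the clusters that exceed the cap.
    keep = []
    for barcodes in by_cluster.values():
        if len(barcodes) > max_cells:
            barcodes = rng.sample(barcodes, max_cells)
        keep.extend(barcodes)
    return keep
```

In practice the mapping could come from something like `cistopic_obj.cell_data['cell_type'].to_dict()`, and the returned list would be the reduced barcode set used to restart the workflow.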
Did you mean I could subset the cistopic object based on the cell barcodes and run topic modeling separately? If I separate them, at which step should I merge all the sub-dataset?
I did try to run impute_accessibility
on each cell type and merge the resulting imputed_acc_object
objects as follows, but it still gave me a MemoryError when merging them. Is this what you meant?
```python
imputed_acc_obj = None
for sample in cistopic_obj.cell_data['cell_type'].unique():
    subset_cell = (cistopic_obj.cell_data['cell_type'] == sample)
    cell_list = list(cistopic_obj.cell_data.index[subset_cell])
    temp = impute_accessibility(cistopic_obj, selected_cells=cell_list, selected_regions=None, scale_factor=10**6)
    if imputed_acc_obj is None:
        imputed_acc_obj = temp
    else:
        imputed_acc_obj = imputed_acc_obj.merge([temp], copy=True)
```
Run the SCENIC+ workflow until you have cell types / clusters for each of your (e.g. 4) runs. Then, for your biggest clusters, sample only a subset of the cell barcodes (but keep all cell barcodes for the small clusters). Then start from the beginning again with all cell barcodes for the small clusters and the subset of cell barcodes for the big clusters, so you don't run out of memory.
I see. Thanks for your advice!
Dear all,
Thanks for the great work!
I am trying to run SCENIC+ on another public dataset, but when I ran impute_accessibility on it, I got a MemoryError.
I have read this issue https://github.com/aertslab/scenicplus/issues/241, but unfortunately reserving more memory is not possible at my site.
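To see why this step hits memory limits, note that the imputed accessibility matrix is dense, so its footprint is roughly cells × regions × bytes per value. A minimal back-of-envelope sketch (the cell/region counts below are made-up example numbers, not from this dataset):

```python
def imputed_matrix_gib(n_cells, n_regions, bytes_per_value=4):
    """Approximate size in GiB of a dense n_cells x n_regions matrix
    (bytes_per_value=4 assumes float32 values)."""
    return n_cells * n_regions * bytes_per_value / 2**30

# e.g. 100k cells x 500k regions at float32 -> ~186 GiB
size = imputed_matrix_gib(100_000, 500_000)
```

This is why subsetting cells (or regions) is often the only option when more RAM cannot be reserved.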
My question is: is it feasible to split the cistopic object into smaller objects with fewer cells and combine them after running
impute_accessibility()
? I have already set the chunk_size
to 1000 on the features; I am just wondering whether it is also possible to split the cells.

My versions: scenicplus=1.0.1.dev4+ge4bdd9f, numpy=1.22.3, python=3.8.16
Best, liouhy