Closed adkinsrs closed 3 months ago
Why does saving the dataset at the highly variable genes step take so long?
if save_dataset:
    # Regress out effects of total counts per cell and the percentage of mitochondrial genes expressed.
    if regress_out == 'true':
        sc.pp.regress_out(adata, ['n_counts', 'percent_mito'])
    if scale_unit_variance == 'true':
        sc.pp.scale(adata, max_value=10)
    adata.write(dest_datafile_path)
The scanpy "regress out" and "scaling" functions run only when the dataset is being saved, not otherwise. The single-cell workbench has never had options to enable or disable these two steps, so they were never run in v1 of gEAR. I will disable them for now to speed up the saving step, but I think @DanLesperance and @JPReceveur should also have a say. More information is in the last paragraphs of the Preprocessing step of https://scanpy.readthedocs.io/en/stable/tutorials/basics/clustering-2017.html
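To see which of the two steps accounts for the slow save before deciding whether to keep them disabled, something like the following could help. This is only a sketch: `run_optional_steps` is a hypothetical helper, not part of gEAR or scanpy, and it just runs whichever steps are enabled and records how long each one took.

```python
import time


def run_optional_steps(data, steps):
    """Run each enabled step on `data` and return seconds spent per step.

    `steps` is a list of (name, enabled, func) tuples; `func` takes `data`
    and is expected to modify it in place, as the scanpy calls above do.
    Hypothetical helper for illustration only.
    """
    timings = {}
    for name, enabled, func in steps:
        if not enabled:
            continue  # skipped steps do not appear in the timings
        start = time.perf_counter()
        func(data)
        timings[name] = time.perf_counter() - start
    return timings


# Possible usage with the snippet above (not executed here; assumes
# `import scanpy as sc` and an existing AnnData object `adata`):
# steps = [
#     ("regress_out", regress_out == 'true',
#      lambda a: sc.pp.regress_out(a, ['n_counts', 'percent_mito'])),
#     ("scale", scale_unit_variance == 'true',
#      lambda a: sc.pp.scale(a, max_value=10)),
# ]
# print(run_optional_steps(adata, steps))
```

Keeping the steps in a list like this would also make it straightforward to hook them up to workbench options later, if we decide they should be user-controllable.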
mdi-forbid to mdi-lock to show irreversible steps