IGS / gEAR

The gEAR Portal was created as a data archive and viewer for gene expression data including microarrays, bulk RNA-Seq, single-cell RNA-Seq and more.
https://umgear.org
GNU Affero General Public License v3.0

gEAR Workshop - Single-cell workbench - Misc fixes #727

Closed adkinsrs closed 3 months ago

adkinsrs commented 3 months ago

Why does saving the dataset in the highly variable genes step take so long?

    if save_dataset:
        # Regress out effects of total counts per cell and the percentage of mitochondrial genes expressed.
        if regress_out == 'true':
            sc.pp.regress_out(adata, ['n_counts', 'percent_mito'])

        if scale_unit_variance == 'true':
            sc.pp.scale(adata, max_value=10)

        adata.write(dest_datafile_path)

Both the scanpy "regress out" and "scaling" functions run only when a dataset is being saved, not otherwise. Because the single-cell workbench has never had options to enable or disable these two steps, neither was ever run in v1 of gEAR. I will disable them for now to speed up the saving step, but I think @DanLesperance and @JPReceveur should also have a say. More information can be found in the last paragraphs of the Preprocessing step of https://scanpy.readthedocs.io/en/stable/tutorials/basics/clustering-2017.html
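To confirm which part of the save path is actually slow, each optional step could be timed separately. The following is only a rough sketch, not the gEAR implementation: `parse_flag` and `run_timed_steps` are hypothetical helpers, and the lambdas are stand-ins for the real `sc.pp.regress_out(...)`, `sc.pp.scale(...)` and `adata.write(...)` calls from the snippet above. It also assumes the flags arrive as strings such as `'true'`, and it matches them case-insensitively, which is looser than the exact `== 'true'` comparison in the snippet.

```python
import time
from typing import Callable, Dict, List, Tuple


def parse_flag(value: str) -> bool:
    """Hypothetical helper: treat only the string 'true' (any case) as enabled."""
    return str(value).strip().lower() == "true"


def run_timed_steps(
    steps: List[Tuple[str, bool, Callable[[], None]]]
) -> Dict[str, float]:
    """Run each enabled step in order; return elapsed seconds keyed by step name."""
    timings: Dict[str, float] = {}
    for name, enabled, func in steps:
        if not enabled:
            continue  # disabled steps are skipped and get no timing entry
        start = time.perf_counter()
        func()
        timings[name] = time.perf_counter() - start
    return timings


# Stand-ins for the real scanpy/anndata calls, recorded so the order is visible.
calls: List[str] = []
steps = [
    ("regress_out", parse_flag("false"), lambda: calls.append("regress_out")),
    ("scale", parse_flag("false"), lambda: calls.append("scale")),
    ("write", True, lambda: calls.append("write")),
]

timings = run_timed_steps(steps)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f}s")
```

With both flags off, only the write step runs, which matches the proposed interim behavior. Switching one flag back to `'true'` at a time would show how much each step contributes to the save time.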