BradhamLab / icat

Identifying Cell-states Across Treatments
BSD 3-Clause "New" or "Revised" License

Perturbation collapses subpop identity #23

Closed dakota-hawkins closed 3 years ago

dakota-hawkins commented 3 years ago

`icat.simulate` does not produce populations that are easily separable in perturbation-exclusive data:

controls

[image: umapproblem_controls]

perturbed

[image: umapproblem_treatment]

Reproduce

import icat
import ncfs
import numpy as np
import scanpy as sc

# 1000 dispersion values, each drawn from {1, 2}
dispersions = np.random.randint(1, 3, 1000)
ctrls = icat.simulate.SingleCellDataset(dispersion=dispersions).simulate()
prtbs = icat.simulate.perturb(ctrls)

for each in [ctrls, prtbs]:
    each.obs['Population'] = each.obs['Population'].astype('category')
    # Neighbor graph under the phi_s distance with uniform feature weights
    sc.pp.neighbors(each,
                    n_neighbors=50,
                    metric=ncfs.distances.phi_s,
                    metric_kwds={'w': np.ones(each.shape[1])})
    sc.tl.umap(each)

# Populations separate in the controls but collapse in the perturbed cells
sc.pl.umap(ctrls, color='Population')
sc.pl.umap(prtbs, color='Population')
dakota-hawkins commented 3 years ago

The collapse happened because perturbation, on average, increased the expected expression of perturbed genes relative to controls, which caused marker genes to drop out during zero-inflation. Fixed by holding marker gene dropout rates constant between simulations, and by changing the gamma shifts used for both perturbation and marker genes. Fixed in 63f4f2a
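
For anyone reading this later, here is a toy sketch of the kind of effect described above. It is not icat's actual zero-inflation code: the logistic dropout model, the `dropout_probs` helper, and the gene means are all assumptions made up for illustration. It assumes dropout probability depends on a gene's log-mean relative to a midpoint estimated from the dataset, so raising other genes' means can push a marker gene below the midpoint.

```python
import math
from statistics import median


def dropout_probs(means, midpoint=None, steepness=1.0):
    """Hypothetical zero-inflation model (an assumption, not icat's code).

    Dropout probability is a decreasing logistic function of log-mean
    expression. If no midpoint is given, it is estimated as the median
    log-mean of the dataset being simulated.
    """
    logs = [math.log(m) for m in means]
    if midpoint is None:
        midpoint = median(logs)
    probs = [1.0 / (1.0 + math.exp(steepness * (x - midpoint))) for x in logs]
    return probs, midpoint


# Toy gene means: index 0 is a marker gene, the rest are background genes.
control_means = [4.0, 2.0, 2.0, 2.0, 2.0]
# The perturbation raises expected expression for most background genes.
perturbed_means = [4.0, 8.0, 8.0, 8.0, 2.0]

marker = 0
ctrl_p, ctrl_mid = dropout_probs(control_means)
# Midpoint re-estimated from the perturbed data: marker dropout rises.
pert_p, _ = dropout_probs(perturbed_means)
# Midpoint reused from the controls: marker dropout is unchanged.
fixed_p, _ = dropout_probs(perturbed_means, midpoint=ctrl_mid)

print(f"control marker dropout:          {ctrl_p[marker]:.2f}")
print(f"perturbed, re-estimated dropout: {pert_p[marker]:.2f}")
print(f"perturbed, fixed dropout:        {fixed_p[marker]:.2f}")
```

Under these assumptions the marker gene's own mean never changes, yet its dropout probability goes up once the midpoint is re-estimated from the perturbed data. Reusing the control midpoint keeps it constant, which is the spirit of fixing dropout rates between simulations.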