cistrome / MIRA

Python package for analysis of multiomic single cell RNA-seq and ATAC-seq.

AssertionError: Graphs with multiple connected components may not be used. Subset cells to include only one connected component. #9

Closed shaistamadad closed 2 years ago

shaistamadad commented 2 years ago

Hi, I am trying to build a lineage trajectory on a female ovary dataset, and I get the following error. The start cell is the barcode of a cell belonging to the progenitor germ cell type (earliest in pseudotime).

sc.tl.diffmap(rna_ds) # denoise the KNN graph by calculating a diffusion map
sc.pp.neighbors(rna_ds, use_rep='X_diffmap', key_added='X_diffmap', n_neighbors = 5, n_pcs = 5) # calculate another KNN graph, this time in diffusion space
mira.time.normalize_diffmap(rna_ds)
mira.time.get_connected_components(rna_ds) # calculate the subgraphs within the data. Lineage inference may only be used on connected groups of cells.
mira.time.get_transport_map(rna_sub, start_cell = 'FCA_GND10287603_AACAGGATCATCCTCA') 

AssertionError: Graphs with multiple connected components may not be used. Subset cells to include only one connected component.

Many thanks for your help.

AllenWLynch commented 2 years ago

Hi,

This simply means that MIRA found disconnected subgraphs within your nearest-neighbors graph representation, for example disjoint clusters of cells with no edges between them. The MIRA pseudotime API only works on connected groups of cells, so you can plot:

sc.pl.umap(data, color = 'mira_connected_components')

This shows which subgraph contains your cells of interest. Then subset your adata to contain only those cells:

data = data[data.obs.mira_connected_components == '1'] # for example, if group 1 contains your cells of interest (cells are the first axis of an AnnData object, so the mask goes in the row position)
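
Instead of reading the group off the UMAP, you could also look up the component from the start cell's barcode. Here is a minimal sketch of that idea, using a toy pandas table as a stand-in for `data.obs`. The barcodes and labels are made up, and it assumes `mira_connected_components` holds one label per cell, as in the snippet above:

```python
import pandas as pd

# Toy stand-in for `data.obs`: one row per cell, indexed by barcode.
# Barcodes and component labels are invented for illustration only.
obs = pd.DataFrame(
    {"mira_connected_components": ["0", "0", "1", "1", "1"]},
    index=["cell_a", "cell_b", "cell_c", "cell_d", "cell_e"],
)

start_cell = "cell_d"  # hypothetical barcode of the progenitor cell

# Look up which component the start cell belongs to.
start_component = obs.loc[start_cell, "mira_connected_components"]

# Boolean mask over cells (rows) that share the start cell's component.
keep = obs["mira_connected_components"] == start_component

print(start_component)
print(obs.index[keep].tolist())
```

On a real AnnData object the same mask would go on the cell axis, for example `data[keep.values]`, so the start cell is guaranteed to stay in the subset you pass on to the pseudotime functions.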

You may also manually edit the NN graph to connect clusters. I haven't tried that before, though, and it would sort of go against what the NN graph is saying since there are no edges between the groups.
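
To make the idea concrete, here is a rough sketch of what "connecting clusters" means at the graph level, using SciPy on a toy adjacency matrix. This is not MIRA code: the matrix, the cell indices, and the choice of bridging edge are all hypothetical, and which stored graph MIRA actually reads from your adata is not covered here.

```python
from scipy.sparse import lil_matrix
from scipy.sparse.csgraph import connected_components

# Toy symmetric adjacency matrix (hypothetical data):
# cells 0-2 form one cluster, cells 3-4 form another, no edges between them.
adj = lil_matrix((5, 5))
for i, j in [(0, 1), (1, 2), (3, 4)]:
    adj[i, j] = 1.0
    adj[j, i] = 1.0

n_before, labels_before = connected_components(adj.tocsr(), directed=False)

# Manually add one bridging edge between the two clusters (cell 2 <-> cell 3).
adj[2, 3] = 1.0
adj[3, 2] = 1.0

n_after, labels_after = connected_components(adj.tocsr(), directed=False)

print(n_before, n_after)
```

The component count drops from two to one after the single edge is added, which is exactly why this is a judgment call: the bridge is something you assert, not something the data showed.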

Let me know if this helps.
AL