self-connections

jasonserviss commented 7 years ago

At the moment the only situation in which we report a self-connection is when a multiplet consists entirely of a single cell type...

For example, if the spSwarm results are the following:

          A1    B1    C1    D1
mA1B1    0.5   0.5     0     0
mA1A1      1     0     0     0

We report a self-connection for multiplet mA1A1 but not mA1B1. Should this be the case? What alternatives are there?
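To make the current behaviour concrete, here is a minimal sketch of the rule as described above. It is not the package code: the function name `current_connections` and the `cutoff` value of 0.25 are assumptions chosen only for illustration.

```python
from itertools import combinations


def current_connections(fractions, cutoff=0.25):
    """Connections reported under the rule described in this issue.

    fractions: dict mapping cell type -> estimated fraction in the multiplet.
    cutoff: hypothetical presence threshold (an assumption, not the package value).
    """
    present = sorted(t for t, f in fractions.items() if f > cutoff)
    if len(present) == 1:
        # The multiplet consists of a single cell type: report a self-connection.
        return [(present[0], present[0])]
    # Otherwise report one connection per pair of distinct types, no self-connections.
    return list(combinations(present, 2))


# The two multiplets from the example table above.
swarm = {
    "mA1B1": {"A1": 0.5, "B1": 0.5, "C1": 0.0, "D1": 0.0},
    "mA1A1": {"A1": 1.0, "B1": 0.0, "C1": 0.0, "D1": 0.0},
}

for name, fractions in swarm.items():
    print(name, current_connections(fractions))
```

With these inputs mA1A1 yields an A1-A1 self-connection, while mA1B1 yields only A1-B1, even if it actually contains more than one A1 cell.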

jasonserviss commented 7 years ago

When a multiplet contains several cells of each of 2 (or more) cell types, the number of connections is most likely underestimated, so the reported number of connections is lower than the true number. This reduces the power of the method. Despite this, the "error" would be expected to be equally distributed over all cell types and thus should not drastically affect the findings, provided hypothesis testing is applied appropriately.

For a multiplet containing cell types A1 and B1 as A1A1B1B1, the fractions for these cell types would be high (maybe 0.5, 0.5), but this currently gives rise to only one A1-B1 connection when there are actually 4. Similarly, A1A1B1 would give one A1-B1 connection where there are actually 2.
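The counts of 4 and 2 follow if every cell in the multiplet is assumed to touch every other cell. A small sketch of that counting, where `all_pairs_connections` is a hypothetical helper and the all-cells-touch assumption is mine:

```python
from collections import Counter
from itertools import combinations


def all_pairs_connections(cells):
    """Count connections per type pair, assuming every cell touches every other cell."""
    counts = Counter()
    for a, b in combinations(cells, 2):
        # Sort the pair so that (B1, A1) and (A1, B1) are counted together.
        counts[tuple(sorted((a, b)))] += 1
    return dict(counts)


print(all_pairs_connections(["A1", "A1", "B1", "B1"]))
print(all_pairs_connections(["A1", "A1", "B1"]))
```

Under this assumption the number of A1-B1 connections is the product of the two cell counts, and the same enumeration also produces the A1-A1 and B1-B1 self-connections.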

Can this be resolved? Making the fraction cutoff (the value above which a fraction is taken to mean that the cell type is present in the multiplet) dependent on the number of cells estimated to be in the multiplet may resolve this. For example, where the multiplet is a doublet, fractions of 0.5, 0.5 for cell types A1 and B1 would indicate one connection between A1 and B1, and the fraction cutoff would be 0.5 (or slightly less to allow for error).

Where the multiplet is estimated to be a triplet, fractions of 0.66 for A1 and 0.33 for B1 would indicate that the multiplet is composed of 2 A1 cells and 1 B1 cell, giving 2 A1-B1 connections and 1 A1-A1 connection.
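A rough sketch of how a size-dependent cutoff could turn fractions into per-type cell counts. Everything here is an assumption for illustration: the function name `estimate_cell_counts`, the generalisation of the cutoff to `1 / n_cells` minus a `tolerance`, and the simple rounding, which does not guarantee that the counts sum to `n_cells`.

```python
def estimate_cell_counts(fractions, n_cells, tolerance=0.05):
    """Convert swarm fractions to per-type cell counts for a multiplet of n_cells.

    The cutoff is one cell's share of the multiplet minus a tolerance
    (an assumed generalisation of the doublet example, not the package logic).
    """
    cutoff = 1.0 / n_cells - tolerance
    counts = {}
    for cell_type, fraction in fractions.items():
        if fraction >= cutoff:
            # A type that passes the cutoff contributes at least one cell.
            counts[cell_type] = max(1, round(fraction * n_cells))
    return counts


# Doublet: one A1 and one B1 cell.
print(estimate_cell_counts({"A1": 0.5, "B1": 0.5, "C1": 0.0}, n_cells=2))
# Triplet: two A1 cells and one B1 cell.
print(estimate_cell_counts({"A1": 0.66, "B1": 0.33, "C1": 0.0}, n_cells=3))
```

The resulting counts could then be expanded into connections, which for the triplet gives the 2 A1-B1 connections and 1 A1-A1 connection mentioned above.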

To gauge the feasibility of this solution we would need to:

1. Have an accurate method to determine the number of cells per multiplet using the ERCC spike-ins. This information could potentially be stored in the spCounts object.
2. Write a method downstream of the spSwarm method (or as a final stage of that method) that uses the above information to quantify the connections.

jasonserviss commented 7 years ago

On the other hand... Should a multiplet containing A1A1A1B1 actually give rise to 3 A1-B1 connections? It is unclear whether B1 is connected to all of the A1 cells or only one of them. In the latter case you would want only 1 A1-B1 connection, which is what happens now. For this reason, it is preferable that the experimental protocol gives rise to doublets rather than triplets, quadruplets, etc.
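One way to state this uncertainty is as a range: if both types are present in a physically connected multiplet there must be at least 1 connection between them, and at most one per pair of cells. A small sketch, with `connection_bounds` as a hypothetical helper:

```python
def connection_bounds(n_a, n_b):
    """Return (minimum, maximum) possible A-B connections for n_a A cells and n_b B cells.

    Assumes the multiplet is one physically connected clump, so two types that
    are both present share at least one contact.
    """
    if n_a == 0 or n_b == 0:
        return (0, 0)
    return (1, n_a * n_b)


# A1A1A1B1: between 1 and 3 A1-B1 connections.
print(connection_bounds(3, 1))
# A doublet A1B1 has no ambiguity.
print(connection_bounds(1, 1))
```

The current behaviour reports the lower bound, and the bounds only coincide for doublets, which is why doublets are the easiest case to interpret.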

EngeLab / CIMseq

self-connections #18