flyconnectome / ALPN_crossmatching

Data and code snippets for cross-matching olfactory PNs between FAFB (left + right) and hemibrain

Potential directions #2

Open schlegelp opened 3 years ago

schlegelp commented 3 years ago

@bdpedigo et al. (I'm afraid I don't have the others' GitHub handles)

Just to quickly follow up on our short discussion just now.

I'm afraid I was a bit scatter-brained and didn't do a good job explaining the question I'm (personally) most keen on but you actually summed it up quite nicely into a single word: "stability". Basically: what's the highest granularity we can reach while making sure that groups/clusters can still be reliably recovered across data sets? So it's both a matching and a grouping/clustering problem.

Let's say, for example, you have 5 neurons A, B, C, D and E on FAFB left that fall into two obvious clusters (A, B, C) and (D, E). In a first step you would try to find matches for these 5 neurons in FAFB right and hemibrain (keeping in mind that it might not be a 1:1:1 matching). Once we have that, we can ask whether we see the same clusters in the other two data sets, or whether we see e.g. (A, B) and (C, D, E) in the hemibrain and (A, D) and (B, C, E) in FAFB right. If the clusters are conserved, that supports the view that (A, B, C) and (D, E) likely represent two cell types. If they are not, the five neurons are more likely to represent a single cell type (A, B, C, D, E). To my mind, it's critical to include all three data sets in that comparison (e.g. to have a tie-breaker).
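A minimal sketch of how the comparison in this example could be quantified, assuming the matching step has already been done and is 1:1:1, so that every neuron in hemibrain and FAFB right can be relabelled with the name of its FAFB left match. It uses the plain Rand index (the fraction of neuron pairs that both clusterings treat the same way); the function names are made up for illustration and this is not the approach from the preprint.

```python
from itertools import combinations


def co_clustered_pairs(clusters):
    """Return every unordered pair of neurons that shares a cluster."""
    pairs = set()
    for cluster in clusters:
        pairs.update(combinations(sorted(cluster), 2))
    return pairs


def rand_index(clusters_x, clusters_y):
    """Fraction of neuron pairs on which two clusterings agree
    (both group the pair together, or both keep it apart)."""
    neurons = sorted(set().union(*clusters_x))
    pairs_x = co_clustered_pairs(clusters_x)
    pairs_y = co_clustered_pairs(clusters_y)
    all_pairs = list(combinations(neurons, 2))
    agree = sum((p in pairs_x) == (p in pairs_y) for p in all_pairs)
    return agree / len(all_pairs)


# Clusters from the example, expressed in FAFB-left neuron names
# (assumes a 1:1:1 matching, which may not hold in practice).
fafb_left = [{"A", "B", "C"}, {"D", "E"}]
hemibrain = [{"A", "B"}, {"C", "D", "E"}]
fafb_right = [{"A", "D"}, {"B", "C", "E"}]

print(rand_index(fafb_left, hemibrain))
print(rand_index(fafb_left, fafb_right))
```

For the example partitions this gives 0.6 (left vs. hemibrain) and 0.4 (left vs. right), compared with 1.0 for perfectly conserved clusters. The plain Rand index is not corrected for chance, so with real data a chance-adjusted variant would probably be the better choice.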

In practical terms for potential next steps: I'd be very curious to see how the highly granular hemibrain labels behave after matching the hemibrain neurons to FAFB left and right. To illustrate with another example: let's say you have five hemibrain mPNs falling into two types (labels) - 2 x M_lvPNm25 and 3 x M_lvPNm26 - and you find matches for all 5 in FAFB left and FAFB right. When you then look at the 2 M_lvPNm25 and 3 M_lvPNm26 candidates in FAFB left and FAFB right: are they more similar to each other within type (i.e. M_lvPNm25 <-> M_lvPNm25 and M_lvPNm26 <-> M_lvPNm26) than across type, or do you see cases where a putative M_lvPNm25 match is actually more similar to an M_lvPNm26 candidate?

Rephrasing the above: leveraging not just one but three data sets, do you see any indication that e.g. M_lvPNm25 and M_lvPNm26 should really have the same label? Or conversely: does M_lvPNm25 actually break into multiple groups in FAFB left and right?
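The within-type versus across-type question could be checked roughly like this. It is only a sketch: the neuron IDs and similarity scores below are invented placeholders (in practice they would come from whatever morphological or connectivity similarity measure is used), and `cross_type_cases` is a hypothetical helper. For each FAFB candidate it finds the most similar other candidate and reports the cases where that partner carries a different transferred hemibrain label.

```python
def similarity(scores, a, b):
    """Look up a symmetric pairwise score."""
    return scores[frozenset((a, b))]


def cross_type_cases(scores, labels):
    """List (neuron, best_partner) pairs where the most similar other
    neuron carries a different transferred label."""
    cases = []
    for neuron in labels:
        others = [m for m in labels if m != neuron]
        best = max(others, key=lambda m: similarity(scores, neuron, m))
        if labels[best] != labels[neuron]:
            cases.append((neuron, best))
    return cases


# Hypothetical FAFB-left candidates with labels transferred from hemibrain.
labels = {
    "m25_a": "M_lvPNm25", "m25_b": "M_lvPNm25",
    "m26_a": "M_lvPNm26", "m26_b": "M_lvPNm26", "m26_c": "M_lvPNm26",
}

# Made-up similarity scores, purely for illustration (not real data).
raw_scores = {
    ("m25_a", "m25_b"): 0.80,
    ("m26_a", "m26_b"): 0.70, ("m26_a", "m26_c"): 0.60, ("m26_b", "m26_c"): 0.75,
    ("m25_a", "m26_a"): 0.30, ("m25_a", "m26_b"): 0.20, ("m25_a", "m26_c"): 0.25,
    ("m25_b", "m26_a"): 0.35, ("m25_b", "m26_b"): 0.30, ("m25_b", "m26_c"): 0.20,
}
scores = {frozenset(pair): s for pair, s in raw_scores.items()}

# An empty list means every candidate is closest to one of its own type.
print(cross_type_cases(scores, labels))
```

An empty result would support keeping the two labels separate in that data set; entries in the list flag candidates whose nearest neighbour belongs to the other type, which is the kind of case that might argue for merging.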

I hope this makes some sense. As Greg mentioned, in our recent preprint I used a rather naive approach with only across- but not within-dataset matches to try and address this but you guys are obviously much more experienced with that kind of thing. I also imagine that it will be difficult to get clear-cut answers to the above questions but maybe you can think of a way to get something like a "stability score" - i.e. something that describes how well a given group can be recovered in another dataset.
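To make the "stability score" idea a bit more concrete, here is one very simple possibility, offered as a sketch rather than a proposal: map a group through the matching into the other dataset and take the best Jaccard overlap between that mapped set and any cluster found there. The function name, the `h`-prefixed hemibrain IDs and the match table are all hypothetical, and the matching is allowed to be one-to-many.

```python
def group_stability(group, matches, target_clusters):
    """Best Jaccard overlap between the matched image of `group`
    and any cluster in the target dataset (1.0 = perfectly recovered)."""
    image = set()
    for neuron in group:
        # A neuron may have zero, one or several matches.
        image.update(matches.get(neuron, ()))
    if not image:
        return 0.0
    return max(
        len(image & set(cluster)) / len(image | set(cluster))
        for cluster in target_clusters
    )


# Hypothetical FAFB-left -> hemibrain matches (not real data).
matches = {"A": ["hA"], "B": ["hB"], "C": ["hC"], "D": ["hD"], "E": ["hE"]}

# Hemibrain clusters from the earlier example, in hemibrain IDs.
hemibrain_clusters = [{"hA", "hB"}, {"hC", "hD", "hE"}]

for group in ({"A", "B", "C"}, {"D", "E"}):
    print(sorted(group), group_stability(group, matches, hemibrain_clusters))
```

A score like this could be computed for each group against each of the other two data sets and then combined, e.g. by taking the minimum, so that a group only counts as stable if it is recovered everywhere.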

bdpedigo commented 3 years ago

@jovo @asaadeldin11 @tliu68 lots for us to think about above ^

Thanks @schlegelp, we'll look closely at this.