Nanostring-Biostats / InSituType

An R package for performing cell typing in SMI and other single-cell data

semi-supervised final version #142

Closed: patrickjdanaher closed this issue 8 months ago

patrickjdanaher commented 2 years ago

Basic question: do we use only anchors in defining profiles, or do we use all cells in the anchor clusters?

If the former, we can save time by not computing log-likelihoods for anchor cells. This should hugely speed up early iterations. Anchor log-likelihoods can then be calculated once at the end, when running insitutypeML.
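A minimal sketch of the time saving, in Python rather than the package's R, assuming each profile already gives a cell's expected counts (the per-cell scaling and background terms InSituType uses are omitted). The names `nb_loglik` and `logliks_for_cells` are hypothetical. During early iterations the anchor IDs would be passed as `skip`, and one final call with no `skip` fills in the anchor log-likelihoods.

```python
import math

def nb_loglik(counts, mean, size):
    """Negative binomial log-likelihood of one cell's counts under one profile.

    counts: non-negative ints per gene; mean: expected counts per gene (> 0);
    size: NB size (dispersion) parameter.
    """
    total = 0.0
    for y, mu in zip(counts, mean):
        total += (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
                  + size * math.log(size / (size + mu))
                  + y * math.log(mu / (size + mu)))
    return total

def logliks_for_cells(cells, profiles, size, skip=frozenset()):
    """Return {cell_id: {profile_name: loglik}}, skipping any cell id in `skip`."""
    out = {}
    for cell_id, counts in cells.items():
        if cell_id in skip:
            continue  # e.g. anchor cells whose log-likelihoods can wait until the end
        out[cell_id] = {name: nb_loglik(counts, mu, size) for name, mu in profiles.items()}
    return out
```

The saving scales with the share of cells that are anchors, since those cells are simply never scored until the last pass.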

Also, if we only use anchors to estimate profiles, then anchors can be omitted from the algorithm entirely, and we're back to the old nbclust, where some profiles are pre-defined and never updated! Do we really want that? It would be more predictable, but then you could only use cell types with enough anchors to specify a profile, say 500. (Not that anchors work great when you only have 20 of them...)
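A sketch of the "anchors only, never updated" option, again in hypothetical Python: each profile is the plain mean of its anchors' raw counts (no normalization for cell size, which is a simplifying assumption), and cell types below a minimum anchor count (500 by default, echoing the number above) get no profile at all.

```python
def profiles_from_anchors(cells, anchors, min_anchors=500):
    """Estimate one fixed profile per cell type as the mean count vector of its anchors.

    cells: {cell_id: [counts per gene]}; anchors: {cell_id: cell_type}.
    Cell types with fewer than `min_anchors` anchors are left out.
    """
    by_type = {}
    for cell_id, cell_type in anchors.items():
        by_type.setdefault(cell_type, []).append(cells[cell_id])
    profiles = {}
    for cell_type, rows in by_type.items():
        if len(rows) < min_anchors:
            continue  # too few anchors to specify a profile
        n_genes = len(rows[0])
        profiles[cell_type] = [sum(r[g] for r in rows) / len(rows) for g in range(n_genes)]
    return profiles
```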

In other words, are anchors just a better way to pre-define cell type profiles, i.e. to update them for CosMx?

Read the CallR paper from Wei and Zhang: it does semi-supervised cell typing in scRNA-seq. Not very relevant.

patrickjdanaher commented 2 years ago

Findings / conclusions

Proceed with a workflow that reverts to the "old" nbclust, using a "fixedprofiles" argument.
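A toy version of the fixedprofiles idea, assuming a Poisson log-likelihood and hard assignments in place of nbclust's negative binomial EM; `cluster_with_fixed_profiles` is a hypothetical name. Fixed profiles take part in assignment but are never re-estimated; only the free clusters move. Fixed and free profile names are assumed not to collide.

```python
import math

def poisson_loglik(counts, mean):
    # log P(counts | mean) with independent Poisson genes; every mean must be > 0
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y, mu in zip(counts, mean))

def cluster_with_fixed_profiles(cells, fixed_profiles, init_free, n_iter=10):
    """Hard-assignment clustering where `fixed_profiles` are never updated
    and only the `init_free` profiles are re-estimated each iteration."""
    free = {k: list(v) for k, v in init_free.items()}
    assign = {}
    for _ in range(n_iter):
        profiles = {**fixed_profiles, **free}
        # assignment step: each cell goes to its highest-likelihood profile
        assign = {cid: max(profiles, key=lambda k: poisson_loglik(x, profiles[k]))
                  for cid, x in cells.items()}
        # update step: only free profiles move; floor at 1e-3 keeps the log defined
        for k in free:
            members = [cells[cid] for cid, a in assign.items() if a == k]
            if members:
                free[k] = [max(sum(col) / len(members), 1e-3) for col in zip(*members)]
    return assign, free
```

On well-separated toy data, cells close to a fixed profile stay with it while the free cluster settles on the remaining cells.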

patrickjdanaher commented 2 years ago

What's needed for the fixedprofiles -> anchor -> updatedprofiles workflow:

Terms:

patrickjdanaher commented 2 years ago

Status:

patrickjdanaher commented 2 years ago

Ways to get fixedprofiles:

Need a function: update_reference_profiles.
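One hypothetical shape for update_reference_profiles, written in Python for illustration only: the signature and the linear blending rule are assumptions, not the package's implementation. Each reference profile is pulled toward the mean expression of its anchors by `weight`; a small weight is one way to move the profiles only slightly.

```python
def update_reference_profiles(reference, anchor_means, weight=0.5):
    """Hypothetical sketch: blend each reference profile toward its anchors' mean.

    weight=0 keeps the reference; weight=1 uses the anchor means alone.
    Cell types without anchors keep their reference profile unchanged.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    updated = {}
    for cell_type, ref in reference.items():
        anchors = anchor_means.get(cell_type)
        if anchors is None:
            updated[cell_type] = list(ref)
        else:
            updated[cell_type] = [(1 - weight) * r + weight * a for r, a in zip(ref, anchors)]
    return updated
```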

Calling in wrapper fns:

Functions:

patrickjdanaher commented 2 years ago

Status: partway through updating runinsitutype (line 118). Next: continue stripping out references to anchors, and add fixed_profiles references where needed.

patrickjdanaher commented 2 years ago

Next: test that the new version works in multiple datasets. Finding: some stealing from anchor clusters still occurs. It's a somewhat small effect, but still obviously wrong.
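One simple way to put a number on the stealing, assuming it shows up as anchor cells ending up in a cluster other than their anchor type; `anchor_stealing_rate` is a hypothetical helper, not a package function.

```python
from collections import Counter

def anchor_stealing_rate(anchors, assignments):
    """Per anchor cell type, the fraction of its anchors assigned to another cluster.

    anchors: {cell_id: anchor_type}; assignments: {cell_id: final_cluster}.
    """
    total = Counter(anchors.values())
    stolen = Counter(t for cid, t in anchors.items() if assignments.get(cid) != t)
    return {t: stolen[t] / n for t, n in total.items()}
```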

Next: how do you slightly update the anchor-based profiles without letting them run away? Solution to try:

patrickjdanaher commented 2 years ago

Best guess at how to optimize semi-supervised typing: make anchor selection less biased toward the cells with the most extreme agreement with the reference, e.g. just take all cells meeting the criteria. (But that alone won't accomplish it.)
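A sketch contrasting the two selection rules, with hypothetical names and a single per-cell agreement score standing in for the actual anchor criteria: taking the top k keeps only the most extreme cells, while a threshold keeps every cell that meets the criterion.

```python
def select_anchors_top_k(scores, k):
    """Biased rule: the k cells with the most extreme agreement scores."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def select_anchors_threshold(scores, min_score):
    """Less biased rule: every cell whose score meets the criterion."""
    return {cid for cid, s in scores.items() if s >= min_score}
```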