Closed patrickjdanaher closed 8 months ago
Findings / conclusions
What's needed for the fixedprofiles -> anchor -> updatedprofiles workflow:
Terms:
Status:
Ways to get fixedprofiles:
Need a function: update_reference_profiles.
Calling in wrapper fns:
Functions:
Status: partway thru updating runinsitutype (line 118). Next: continue stripping references to anchors, add in fixed_profiles references when needed
Next: test that the new version works in multiple datasets. Finding: some stealing from anchor clusters still occurs. It's a somewhat small effect, but still obviously wrong.
Next: how do you slightly update the anchor-based profiles without letting them run away? Solution to try:
Best guess at how to optimize semi-sup: make anchor selection less biased towards the cells with the most extreme agreements with the reference. E.g. just take all cells meeting criteria. (But that alone won't accomplish it.)
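To make the two selection rules concrete, here is a minimal sketch. It assumes each cell has a single agreement score against the reference. The function names, the score, and the threshold are hypothetical and are not taken from the package. Taking the top n keeps only the most extreme cells, while the alternative keeps every cell that meets the criterion.

```python
def select_top_n(scores, min_score, n):
    """Keep the n highest-scoring cells among those passing min_score.

    This is the rule that favors the most extreme agreements.
    """
    passing = [i for i, s in enumerate(scores) if s >= min_score]
    passing.sort(key=lambda i: scores[i], reverse=True)
    return sorted(passing[:n])


def select_all_passing(scores, min_score):
    """Keep every cell meeting the criterion, with no ranking step."""
    return [i for i, s in enumerate(scores) if s >= min_score]


# Hypothetical agreement scores for six cells (illustrative values only).
scores = [0.95, 0.62, 0.88, 0.71, 0.40, 0.99]

top2 = select_top_n(scores, min_score=0.6, n=2)
all_passing = select_all_passing(scores, min_score=0.6)

print(top2)         # indices of the two most extreme cells
print(all_passing)  # indices of every cell passing the threshold
```

The second rule gives a larger and less extreme anchor set, which is the reduction in bias described above. As noted, that change alone is not expected to fix the problem.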
Basic question: do we use only anchors in defining profiles, or do we use all cells in the anchor clusters?
If the former, then save time by not computing logliks of anchor cells. This will hugely speed up early iters. Can then finally calc anchor logliks when running insitutypeML at the end.
Also, if we only use anchors to estimate profiles, then they can be totally omitted from the algorithm. And we're back to the old nbclust, where some profiles are just pre-defined and never updated! Do we really want that? Would be more predictable, but then you'd only be able to use cell types with enough anchors to specify a profile, say 500. (Not that anchors work great when you only have 20 of them...)
I.e., are anchors just a better way to pre-define cell type profiles? I.e., to update them for CosMx?
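A rough sketch of the "skip anchor logliks" idea, under stated assumptions: anchor cells keep their pre-assigned type, and likelihoods are computed only for the remaining cells. A simple Poisson likelihood stands in for the package's actual model, and all names here (`poisson_loglik`, `assign_cells`, the toy profiles) are hypothetical.

```python
import math


def poisson_loglik(counts, profile, eps=1e-9):
    """Poisson log-likelihood of one cell under a profile rescaled to the cell's total counts."""
    scale = sum(counts) / sum(profile)
    return sum(
        c * math.log(scale * p + eps) - scale * p - math.lgamma(c + 1)
        for c, p in zip(counts, profile)
    )


def assign_cells(cells, profiles, anchors):
    """Assign each cell to a type; anchors (cell index -> type) are never scored.

    Returns the labels and the number of log-likelihood evaluations performed.
    """
    labels, n_evals = {}, 0
    for i, counts in enumerate(cells):
        if i in anchors:
            labels[i] = anchors[i]  # fixed label, no loglik needed
            continue
        best_type, best_ll = None, -math.inf
        for name, profile in profiles.items():
            ll = poisson_loglik(counts, profile)
            n_evals += 1
            if ll > best_ll:
                best_type, best_ll = name, ll
        labels[i] = best_type
    return labels, n_evals


# Toy data: two cell types, three genes, four cells (illustrative values only).
profiles = {"T": [10, 1, 1], "B": [1, 10, 1]}
cells = [[20, 2, 1], [1, 18, 3], [9, 1, 0], [2, 11, 1]]
anchors = {0: "T", 1: "B"}

labels, n_evals = assign_cells(cells, profiles, anchors)
print(labels, n_evals)
```

With two of the four cells anchored, only half of the cell-by-profile likelihoods are evaluated. The anchors' likelihoods could still be computed once in a final pass, as the notes suggest for the `insitutypeML` step.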
Read CallR paper from Wei and Zhang - does semi-sup in scRNAseq. Not very relevant.