Open jeremymiller opened 1 month ago
Thanks for putting this together! I think @xuehanci has basically done the first part of (1) already - testing the original code vs the new code (she's on vacation this week and next, though). I believe one of the differences is that the current scrattch.mapping/scrattch.taxonomy can only use binary dendrograms, while the original code and taxonomies had dendrograms that could have multiple children per node (see https://github.com/AllenInstitute/scrattch.taxonomy/issues/15).
She has been working on implementing that change but has found that a multi-child dendrogram causes errors elsewhere in the code (I believe `hclust` is used in a couple of instances to manage taxonomy dendrograms, and that doesn't accept multi-child dendrograms). So it's taken longer to fix that difference between the two code bases, which we'd need to do before identifying whether there are other relevant differences (like with marker gene selection). But I think it is a pretty key difference to reconcile.
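To make the binary vs. multi-child distinction concrete, here is a minimal Python sketch. It is not scrattch code: the `Node` class and the `multi_child_nodes` helper are hypothetical names made up for illustration. It just walks a tree and reports any node with more than two children, which is the kind of node a strictly binary representation can't hold.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical tree node for illustration; not a scrattch data structure.
    label: str
    children: List["Node"] = field(default_factory=list)


def multi_child_nodes(root: Node) -> List[str]:
    """Return labels of nodes with more than two children, in pre-order."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        if len(node.children) > 2:
            found.append(node.label)
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return found


# A binary dendrogram: every internal node has exactly two children.
binary = Node("root", [Node("A"), Node("n1", [Node("B"), Node("C")])])

# A multi-child dendrogram: the root has three children.
multi = Node("root", [Node("A"), Node("B"), Node("C")])

print(multi_child_nodes(binary))
print(multi_child_nodes(multi))
```

A check like this could be run on a taxonomy's tree before handing it to any step that assumes binary merges, so the failure shows up early rather than deep inside another function.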
I'm compiling a list of several error corrections and requests for mapping algorithms into a single issue. These are sorted from hardest/lowest priority (# 1) to easiest/highest priority (# 3), although it would be ideal if we could address them all at some point: