Closed jmcbroome closed 2 years ago
Thank you so much for the feedback and PR, @jmcbroome ! Left a small code suggestion (just making it a bit shorter for space). LMK what you think.
Re: your other questions --
Yes, I agree! I'm actually about to push a nice cluster diagram (i.e., an unrooted subtree with a few visual tweaks to help it feel more familiar to epi folks). I'd love your feedback once it's live!
I also worry about sampling bias quite a bit. The hard thing in this case is that we don't know anything about the data coming in -- e.g., they may well have already done their own downsampling, or they may have selected specific samples to include based on who they're trying to do contact tracing for. If you have any ideas on how to make the sampling bias information more interpretable or visible, I'd love to hear! :)
This feedback on the workflow is gold! We're in the midst of a design audit, so I'll make sure to pass this along to our UI designers. To clarify: are you talking about the controls under the "case definition" section?
cc @happyimadesignr
RE Point 1, let me know once it's live and I'll check it out! I mentioned a timetree specifically so that it could appear in line with your existing graph, though if it's unrooted I suppose that doesn't apply.
RE Point 2, at the most basic level, it's possible to remove samples that are identical to other samples and were collected from the same region at about the same time, which is pretty safe. You could even replace these samples with a number of condensed or collapsed samples on your tree visual, potentially... this definitely could use some more thought.
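To make the basic version of Point 2 concrete, here is a minimal sketch of collapsing redundant samples. It is only an illustration under stated assumptions: the `Sample` fields, the `collapse_redundant` name, and the 7-day `window_days` default are all hypothetical, "identical" is taken to mean an identical mutation set, and none of this reflects how Galago or matUtils actually store or process data.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sample:
    # Hypothetical record layout, for illustration only.
    name: str
    mutations: frozenset  # e.g. frozenset({"C241T", "A23403G"})
    region: str
    collected: date


def collapse_redundant(samples, window_days=7):
    """Collapse samples that share a mutation set and region and were
    collected within `window_days` of the first sample in their run.

    Returns a list of (representative, count) pairs, so a tree view could
    draw one node labelled with the number of samples it stands for.
    """
    # Group by (mutation set, region), keeping each group in date order.
    groups = defaultdict(list)
    for s in sorted(samples, key=lambda s: s.collected):
        groups[(s.mutations, s.region)].append(s)

    collapsed = []
    for members in groups.values():
        rep, count = members[0], 1
        for s in members[1:]:
            if (s.collected - rep.collected).days <= window_days:
                count += 1  # close enough in time: fold into representative
            else:
                collapsed.append((rep, count))
                rep, count = s, 1  # start a new run for this group
        collapsed.append((rep, count))
    return collapsed
```

Keeping the count next to each representative means no information is hidden outright: the visual gets lighter, but the number of collapsed samples is still available to display.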
RE Point 3, I mean the panel that appears when "Filter and Suggest Clades" is clicked, where the text edit I proposed is and so on, if that's what you're referring to.
Hi Sidney, I played around with your demo and a local development server version of Galago and have a handful of points of feedback and questions. I have also proposed a small text edit here, specifically with regard to the matUtils cluster method, clarifying what the method is and linking my recently published manuscript on it as a reference for more information.
Additional feedback/questions:
Have you considered using a timetree as your base for analysis? It might make a nicer visual paired with your time-based graph. Theo Sanderson’s Chronumental is reportedly pretty sweet for SARS-CoV-2 big trees.
The data representation and assumption sections are nice additions. Have you considered metadata-informed downsampling to try to reduce the bias, since you’re often working with relatively limited subsets of the data anyways? Or would that feel like you’re obfuscating all available information too much?
Adding to the text box serially based on multiple metadata types is a nice solution for users who want N groups of samples selected simultaneously, but it’s a little unintuitive that it updates immediately and that the pane must be exited by clicking out when the user is satisfied. Simply withholding the update until the user clicks a confirmation button at the bottom of the panel, which applies the new parameters and closes the panel, could be useful. This would help especially if the user clicks back and forth between different clustering methods and the site freezes up as it computes matUtils heuristic clusters, though that doesn’t seem to take any time in the current demo. It’s a little unclear what features are currently available.
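The confirmation-button idea in the last point could look roughly like the sketch below. It is written in Python purely to show the pattern, not as a proposal for Galago's actual front-end code: `FilterParams`, `FilterPanel`, and the `on_apply` callback are hypothetical names, and the expensive step (for example, recomputing clusters) is assumed to live behind `on_apply`.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FilterParams:
    # Hypothetical panel settings, for illustration only.
    clustering_method: str = "none"
    selected_groups: tuple = ()


class FilterPanel:
    """Holds edits in a draft and only applies them on confirm()."""

    def __init__(self, applied, on_apply):
        self.applied = applied    # parameters currently in effect
        self.draft = applied      # working copy edited inside the panel
        self.on_apply = on_apply  # expensive recompute, called on confirm
        self.is_open = False

    def open(self):
        self.draft = self.applied
        self.is_open = True

    def edit(self, **changes):
        # Cheap: only the draft changes, nothing is recomputed.
        self.draft = replace(self.draft, **changes)

    def confirm(self):
        # Recompute once, and only if something actually changed.
        if self.draft != self.applied:
            self.applied = self.draft
            self.on_apply(self.applied)
        self.is_open = False

    def cancel(self):
        # Discard the draft and leave the applied parameters untouched.
        self.draft = self.applied
        self.is_open = False
```

With this shape, clicking back and forth between clustering methods only touches the draft, so any heavy computation runs at most once per confirmation.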
Let me know your thoughts and if there's anything I could assist with!