AlexsLemonade / training-modules

A collection of modules that are combined into 1-5 day workshops on computational topics for the childhood cancer research community.

Update "roadmap" diagram for all single cell #648

Closed jashapiro closed 1 year ago

jashapiro commented 1 year ago

In the intro to single-cell module, we have a "roadmap" that describes the steps that we take through that workshop, but we did not have one for advanced single cell. Having that overview was a good framing mechanism, both for introducing the overall flow of analysis and for reorienting participants to where we were in that process at various points in the workshop.

We should therefore expand the current roadmap to include the topics that we cover in advanced single-cell. We can then use the same roadmap in both the intro and advanced, highlighting the steps we take in each notebook. This diagram will almost certainly have to be a bit more complex than the current roadmap: There will be places where we have multiple options from a prior step, as steps like cell typing and integration can be performed at multiple stages/in different orders. It may be that we want to have a few versions of the diagram as well, that zoom in and out of different levels of detail. For example, we may want to collapse filtering and normalization in some views.

We also may want to add tools/commands that we use when we are highlighting a particular step (the "you are here" versions of the diagram)

I am not at the moment sure where the source file for https://github.com/AlexsLemonade/training-modules/blob/master/scRNA-seq/diagrams/overview_workflow.png is. @allyhawkins was the last to update it, so I hope we can add that location to this issue, but we may well want to make a Google Drawing with that within the training folder for easier updating/collaboration. We will still want to store output PNGs in this repo (and link to them in the notebooks), but the source files should be linked in readme files for easier editing.

allyhawkins commented 1 year ago

I believe I took it from this slide deck of training diagrams - https://docs.google.com/presentation/d/1j_qH4KD2WxAbwX3n9vCkoiG87E1wIhnYzThjk8VgZ34/edit?usp=sharing

jashapiro commented 1 year ago

> I believe I took it from this slide deck of training diagrams - https://docs.google.com/presentation/d/1j_qH4KD2WxAbwX3n9vCkoiG87E1wIhnYzThjk8VgZ34/edit?usp=sharing

It does seem reasonable to keep the diagrams there!

jashapiro commented 1 year ago

I made some rough sketches (in OmniGraffle) of the general idea I was thinking for a "full" roadmap. I started with a general overview that I think captures the main components that we introduce:

[Image: Canvas 1]

My thought was that we might then expand sections as we work through them, so for the first two modules of the intro, that might look something like these, which break the major steps into their substeps, with references to the tools/functions that we will be using:

[Image: Canvas 2]

[Image: Canvas 3]

Having made it, I tend to think this makes for a diagram that is a bit busier than I would like, but I'm not really sure I have great ideas about how to represent the possible flows without it being a bit busy.

But I'm curious to hear others' ideas about this general format, as well as specific suggestions. Are there other analysis steps we should add to the main diagram? If there are topics we don't cover, we could add them as dotted boxes or some other faded mode.

We can do things like color-code by workshop, etc.

allyhawkins commented 1 year ago

A few comments:

jashapiro commented 1 year ago

Thanks for your comments Ally! I think I agree about having separate diagrams for single-sample vs. multi-sample... I had a thought about using color to show steps that apply only to multi-sample analysis, but I think fully separate diagrams might work out better.

  • I haven't quite decided how I feel yet about including Find Markers since we don't explicitly cover it, but we do talk about using markers to classify cell types. I think the dotted boxes could be a good idea, but I also wonder if we could move it out of the main diagram and then include it in the sub diagram that I mentioned in my first point.

We do use Find Markers in the intro workshop, which is why I included it here. I had thought about making the arrow from find markers to cell typing a dotted one, since we don't explicitly cover that flow, but we heavily imply it!

jashapiro commented 1 year ago

We discussed in DSTM that we would like to collect initial comments here by EOD Monday, so I am assigning those who have not commented yet for their feedback: @sjspielman, @cbethell and @jaclyn-taroni. I don't know if we want to have Deepa weigh in too.

If others want to play around with things while I am gone, and happen to have OmniGraffle, I am also attaching the OmniGraffle file I was working from here: Single Cell diagram.zip

sjspielman commented 1 year ago

First, I'd like to say that while I was previously unfamiliar with OmniGraffle, I am happy to learn about it for the joy of saying that word.

I do like the idea of single- vs multi-sample versions of this figure, but more simply we could also consider one for each of our modules (many of these will be "the same," just with different end points!): flowcharts for each of cell annotation, integration, and DE. These can be small flowcharts to pop at the top of each of those notebooks, for example, as a roadmap for JUST that notebook. That said, separating out single/multi may indeed achieve a level of simplicity that I'm thinking of for this!

jaclyn-taroni commented 1 year ago

Okay, this comment contains my attempt to both synthesize what others have said so far and add my own thoughts about where each piece of content would “live.”

First, I agree with separating the single-sample and multi-sample diagrams, but I think this will only work as (I) intended (example: help participants distinguish finding marker genes vs. differential expression analysis) if we’re very clear about the purpose of each step in the introduction of both workshops.

Introduction slides version

So, starting with what should be in the introduction slides for the intro workshop (couched as part of what you will learn perhaps):

[Image: Single-cell Roadmap Diagrams, single sample overview with purpose]

As noted in the handwritten note here, this slide would also go in the advanced material slides (single-cell refresher), ideally with an explanation of why you might take some of these steps first even if you intend to perform a multi-sample analysis.

Instruction/module slides version

Then when it comes to the instruction slides for the preprocess and import steps, we could expand as follows:

[Image: Single-cell Roadmap Diagrams, slide version of preprocessing:import]

So the features are — this very simplified roadmap up top and an expansion that covers purpose (reiterated from intro slides), the input, the output, and any summary of considerations we’d like to add (presumably to be expanded in other instruction slides).

Note: For later analytical steps (e.g., clustering), I’d also love to see what scientific questions you can answer with this as part of the purpose.

Notebook version

Then the road map version that could go into the notebooks themselves could look something like (these tools might be wrong, but hopefully you get the point):

[Image: Single-cell Roadmap Diagrams, notebook version of preprocess-import]

Other notes

cbethell commented 1 year ago

The initial diagrams drafted on this issue are a bit much to follow, so I agree with others on keeping the single-sample vs multi-sample diagrams separate to make things a bit easier to reference.

The first instance of the diagrams would be in the intro/slides, where I agree the purpose of each step may be emphasized but not yet the tools used at each step. This would keep the diagrams as clean as possible, and waiting to introduce the tools in the notebook version of the diagrams (as described in the illustration above) seems most intuitive. I also feel that color-coding single-sample vs multi-sample steps may still be useful (reasoning here is to distinguish the steps that would overlap between single-sample vs multi-sample and those that would not overlap), but I do not feel as strongly about this point.

jashapiro commented 1 year ago

Thank you all for your feedback on the initial diagram! I made separated single and multisample versions to start... Here is the single sample version, which I did want to include cell typing on, even though that is not something we actually do in the intro workshop...

[Image: single sample roadmap]

And here is the multisample version, where I added some color coding to distinguish the single sample parts from multisample parts. I have cell typing on here twice, because one could do it at either stage... not sure what people think of that, or of the dotted lines. I also recognize that this version is still a bit "messy", but I did want to be sure to get across the idea that you probably want cell types for your differential expression analysis.

[Image: multisample roadmap]

I have not added the "whys" for these steps, or zoom-ins for each step, but I like the idea that I think Jackie was proposing of keeping the zooms as separate diagrams, as that will minimize the amount of reformatting that has to happen as we expand each step.

allyhawkins commented 1 year ago

I think the first diagram looks good! I don't really have any comments on it and I like the dotted lines.

For the second diagram, I think it would be helpful to either add a legend or labels above the multi vs. single sample blocks so we know what the colors are for. I do like that you have cell type twice, because in reality you can do it on either dataset. I actually don't think it's that messy and I think it shows a nice overall view of what a real workflow looks like. I would not add any more to it though other than the labels.

I think I was trying to say this in my initial comment too, but I prefer having the "zoom-ins" as completely separate diagrams that we would use for each section of the advanced workshop.

jashapiro commented 1 year ago

I translated the diagrams to Google Slides here, with some small modifications:

I didn't yet add labels, per se, but I did make the multi-sample steps "stacks". Of course, now that I think about it, maybe the single samples should be the stacks...

jashapiro commented 1 year ago

This is my attempt to implement @jaclyn-taroni's "why" statements... thoughts?

[Image: Training Diagrams]

Feel free to also leave comments on the slides themselves (linked in the comment above)

jaclyn-taroni commented 1 year ago

I like it!

sjspielman commented 1 year ago

I also like it! Misc feedback -

allyhawkins commented 1 year ago

This looks good! I also like how you divided up the single/multi sample in the second overview in the slides 🎉

jashapiro commented 1 year ago

I made a few tentative "zoom in" slides... Here is the Google Slides link.

Attachments: Preprocess.svg, QC.svg, Reduce.svg

I think I like them, but curious what others think.

If these get 👍, I think I will close this issue, and move on to the ones where we incorporate these into notebooks and slides. (And allow people to make their own "zooms" for their sections).

jaclyn-taroni commented 1 year ago

These look good to me overall @jashapiro. I might have a quibble about spelling out PCA but not UMAP, for example, but I understand you to be advocating for leaving that discussion for the individual implementation 🎟️

jashapiro commented 1 year ago

Since we have moved on to subdiagrams at this point, I am calling this issue closed!