AlexsLemonade / sc-data-integration


Dive into SingleR and comparing references with and without immune cells #210

Closed by allyhawkins 1 year ago

allyhawkins commented 1 year ago

Begins to address #204

This PR begins some exploration of cell type assignment with SingleR. Specifically, I spent some time comparing cell type assignments for some AML samples between references with and without immune cells. In doing this, I also looked at the statistics that SingleR reports and how we might use them to compare across samples, specifically the score and the delta.

Some things to keep in mind as you go through this notebook:

  1. Other than the initial stacked bar charts, where I looked at the overall cell type labels assigned across each reference for each library we have, I mainly focused on the Blueprint and HPCA references. The reason is that those are the ones for which we have a corresponding reference where we removed immune cell types before building/training the model and running SingleR (see the sketch after this list). I think the main question this notebook serves to answer is how SingleR annotates cells when we have a negative or incorrect reference. There are follow-up questions, such as which of the immune cell type references is the correct or best one to use, but that is not what I'm after here. You will see that each reference gives a slightly different annotation for the most part; that is due to differences in the original references, so I didn't want to get too far into that here.
  2. For some plots I looked at all libraries, using purrr to generate the same plot for each library, but for a few I made the plot for a single library only, mostly because the notebook was getting long and similar patterns were seen across all libraries.
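For the curious, here is a minimal sketch of the with/without-immune comparison (not the exact notebook code; `sce` is a placeholder for a normalized SingleCellExperiment, and the immune label list is illustrative, not the set we actually removed):

```r
library(celldex)
library(SingleR)

ref <- celldex::BlueprintEncodeData()

# Hypothetical set of immune labels to drop; the real list used in the
# notebook may differ.
immune_labels <- c("B-cells", "CD4+ T-cells", "CD8+ T-cells", "NK cells",
                   "Monocytes", "Neutrophils", "Macrophages", "Eosinophils")
ref_no_immune <- ref[, !ref$label.main %in% immune_labels]

# `sce` is assumed to be a normalized SingleCellExperiment for one library.
full_result <- SingleR(test = sce, ref = ref, labels = ref$label.main)
no_immune_result <- SingleR(test = sce, ref = ref_no_immune,
                            labels = ref_no_immune$label.main)

# Cross-tabulate assignments to see where cells go when immune labels
# are unavailable.
table(full = full_result$labels, no_immune = no_immune_result$labels)
```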

Some overall conclusions and things that I learned here:

  1. SingleR really wants to make an assignment, regardless of whether it's right. If there are cell types in the reference, and those cell types have marker genes that are correlated with cells in our test dataset, then SingleR will assign them. Also, the score metric is not easily compared across samples, as it is strongly influenced by batch effects. As noted in the annotation diagnostics section of the SingleR book, the magnitude of the differences between scores has no direct interpretation; only the pre-tuned scores are comparable, and only across labels, not across samples.
  2. The magnitude of the delta, i.e., the difference between the score for the assigned label pre-fine-tuning and the median score across all labels, is less sensitive to batch effects and might be a good metric for comparing cell type annotation across samples to identify a "good" annotation (and maybe for the label shuffling test); see the sketch after this list. However, with a more granular reference like HPCA, which seemed to toggle cells between CMP/GMP progenitor states, the delta was much lower because cells got high scores for both of those cell types. So again... the reference is really important.
  3. This may seem obvious, but I think the biggest influence on cell type assignment is the marker gene lists and how they change from reference to reference. I noticed that removing immune cells actually changed the marker gene lists for the non-immune cells: the adipocyte and endothelial cell marker gene lists now contained genes that were previously markers for HSCs or monocytes when immune cells were included, which probably explains why SingleR now classifies cells as these other non-immune cell types.
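Here is a minimal sketch of the delta computation from point 2 (`result` is assumed to be a SingleR result DataFrame):

```r
library(matrixStats)

# Cells x labels matrix of pre-fine-tuning scores.
scores <- result$scores

# Score of the label assigned before fine-tuning, for each cell.
assigned_score <- scores[cbind(seq_len(nrow(scores)),
                               match(result$first.labels, colnames(scores)))]

# The delta: assigned score minus the median score across all labels.
delta_median <- assigned_score - matrixStats::rowMedians(scores)
summary(delta_median)
```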

One big thought I had for a future analysis is to try some of the methods for combining references suggested by SingleR. That may help us identify a consensus for immune cell types, rather than dealing with 4 separate references that all have slightly different levels of annotation.
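SingleR supports this directly by passing lists of references and labels; a minimal sketch with two celldex references (`sce` is again a placeholder for a normalized SingleCellExperiment):

```r
library(SingleR)
library(celldex)

blueprint <- celldex::BlueprintEncodeData()
hpca <- celldex::HumanPrimaryCellAtlasData()

# Passing lists makes SingleR score cells against each reference and then
# pick the best label per cell across references.
combined <- SingleR(
  test = sce,
  ref = list(Blueprint = blueprint, HPCA = hpca),
  labels = list(blueprint$label.main, hpca$label.main)
)
table(combined$labels)
```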

Here's a copy of the current rendered notebook for review: SingleR-non-immune-ref-comparison.nb.html.zip

sjspielman commented 1 year ago

@allyhawkins Just want to give you a sense of my time here: I am almost certainly not going to be able to get to this until after Research Focus next week, if that changes anything for you (though I don't know that others' availability is necessarily better!)

allyhawkins commented 1 year ago

I went ahead and requested a review from @jashapiro as well, but really, whoever is able to review is fine with me.

allyhawkins commented 1 year ago

@jashapiro Thanks so much for the feedback and for the helpful discussion last week. I also discussed with Jackie what kinds of things we would like to look at and made changes accordingly. In general, the goal here is to think through what we can show in a QC report to help users evaluate whether the reference(s) we used are appropriate.

I did a pretty big revamp of this based on feedback and made the following changes:

Here's a copy of the latest rendered notebook: SingleR-non-immune-ref-comparison.nb.html.zip

allyhawkins commented 1 year ago

Thanks for the comments @jashapiro. I made the smaller changes but didn't make any real plot changes, because I think some of them are minor comments we could address when including those plots in the QC report. For the heatmap comment: if we are going to include a heatmap of scores in the QC report, I think we would want to create our own custom heatmap so we can make sure the scales are the same.
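To make that concrete, here's a rough sketch using pheatmap (an assumption; we might pick a different package) with fixed breaks so every heatmap shares the same color scale:

```r
library(pheatmap)

# Cells x labels score matrix from a SingleR result (`result` is assumed).
scores <- result$scores

# Fix the breaks to a common range so color is comparable across heatmaps;
# the [0, 1] range is an assumption about where the scores fall.
breaks <- seq(0, 1, length.out = 101)

pheatmap(t(scores),
         color = colorRampPalette(c("white", "darkblue"))(100),
         breaks = breaks,
         show_colnames = FALSE)  # hide per-cell column names
```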

Regarding the comments about looking into the delta.next and why it might be higher than expected: I did a little digging into the codebase, but I'm still a little confused, partly because the actual code that calculates and returns the delta.next is C++, which is not my forte. I can dig in more if we need to, but I think we've made the decision to focus on the delta median distribution (or other similar distributions). For your reference, though, this is the part of the classifySingleR code where the delta is calculated, and it refers to a function found in the RcppExports.R file. Looking at the actual function, I think this line might be responsible for getting the delta.next, but I would honestly have to do a lot more googling about C++ to figure that out for sure: https://github.com/LTLA/SingleR/blob/5e4daf8b3db5068d8ed317ec42d5d13df1edd7ca/src/run.cpp#L28
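For what it's worth, here is a plain-R sketch of what we'd expect delta.next to mean if it were computed from a cells x labels score matrix (the real value is computed during fine-tuning in C++, so this is the expected definition, not a re-implementation):

```r
# `scores` is assumed to be the cells x labels score matrix from a SingleR
# result. For each cell: top score minus the second-highest score.
expected_delta_next <- apply(scores, 1, function(x) {
  sorted <- sort(x, decreasing = TRUE)
  sorted[1] - sorted[2]
})
summary(expected_delta_next)
```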

The last thing I wanted to address is your comment about looking at different ways of computing a delta score. I think we do want to do this, but maybe outside this PR, in a separate notebook.

jashapiro commented 1 year ago

> Regarding the comments about looking into the delta.next and why it might be higher than expected: I did a little digging into the codebase, but I'm still a little confused, partly because the actual code that calculates and returns the delta.next is C++, which is not my forte. I can dig in more if we need to, but I think we've made the decision to focus on the delta median distribution (or other similar distributions).

Yeah, I just tried to look at that code, and it is pretty opaque! Finding where the delta code actually lives among a mess of C++ references is not easy at all! Not to mention this is not bare C++ but Rcpp, which has its own little nuances.

But even worse, I think the real code lives in another repo: https://github.com/LTLA/singlepp/blob/99b5c62d9c65703e30b29feb449e07dcde8519ff/tests/src/naive_method.h#L50-L51

(Which does seem to be computing the delta.next as expected.) So I'm not sure why the median we are getting is smaller. I think I might look at some of the cases where delta.next is particularly large and see what the distribution of scores looks like for those cases.
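A quick sketch of what that check could look like (`result` is assumed to be a SingleR result DataFrame; the 0.99 quantile cutoff is arbitrary):

```r
library(ggplot2)

# Cells in the top 1% of delta.next values.
big_delta <- which(result$delta.next >
                     quantile(result$delta.next, 0.99, na.rm = TRUE))

# Long-format data frame: every label's score for each selected cell.
score_df <- data.frame(
  cell = factor(rep(big_delta, each = ncol(result$scores))),
  score = as.vector(t(result$scores[big_delta, , drop = FALSE]))
)

# One boxplot of scores per cell, to see how spread out the scores are
# when delta.next is large.
ggplot(score_df, aes(x = cell, y = score)) +
  geom_boxplot() +
  labs(x = "cell index", y = "SingleR score")
```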