Closed allyhawkins closed 1 year ago
@allyhawkins Just want to give you a sense of my time here - I am almost certainly not going to be able to get to this until after Research Focus next week, if that changes anything for you (though I don't know that others' availability is necessarily better..!)
I went ahead and requested @jashapiro also, but really whoever is able to review is fine with me.
@jashapiro Thanks so much for the feedback and for the helpful discussion last week. I also talked with Jackie about what types of things we would like to look at and made changes accordingly. In general, the goal here is to think about what we can show in a QC report to help users evaluate whether the reference(s) we have used are appropriate.
I did a pretty big revamp of this based on feedback and made the following changes:
Here's a copy of the latest rendered notebook. SingleR-non-immune-ref-comparison.nb.html.zip
Thanks for the comments @jashapiro. I made the smaller changes, but didn't make any real plot changes because I think some of them are minor comments that we could address if we include those plots in the QC report. For the heatmap comment, I think if we are going to include a heatmap of scores in the QC report, we would want to create our own custom heatmap so we could make sure the scales are the same across references.
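As a rough illustration of the shared-scale idea for the heatmap, the sketch below computes one set of color limits across several score matrices and maps every score onto that common range. It is written in Python for brevity rather than in the notebook's R. The function names, reference names, and score values are made up for illustration and are not results from this analysis.

```python
def shared_limits(score_matrices):
    """Return (min, max) over every score in every matrix."""
    all_scores = [s for m in score_matrices.values() for row in m for s in row]
    return min(all_scores), max(all_scores)


def rescale(matrix, lo, hi):
    """Map scores to [0, 1] using common limits so colors are comparable."""
    span = hi - lo
    if span == 0:
        # All scores identical: nothing to distinguish, so map everything to 0.
        return [[0.0 for _ in row] for row in matrix]
    return [[(s - lo) / span for s in row] for row in matrix]


# Hypothetical scores (rows = cells, columns = labels) for two references.
scores_by_ref = {
    "ref_a": [[0.10, 0.40], [0.30, 0.20]],
    "ref_b": [[0.50, 0.90], [0.70, 0.60]],
}

lo, hi = shared_limits(scores_by_ref)
scaled = {name: rescale(m, lo, hi) for name, m in scores_by_ref.items()}
```

Because every matrix is rescaled with the same `lo` and `hi`, a given color would mean the same score in each reference's heatmap, which is the property a per-heatmap default scale does not guarantee.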
Regarding some comments about looking into the `delta.next` and why it might be higher than expected, I did do a little digging into the codebase but I'm still a little confused, partly because the actual code that calculates and returns the `delta.next` is C++, which is not my forte. I can dig in more if we need to, but I think we've made the decision to focus on the delta median distribution (or other similar distributions).
For your reference though, this is the part of the `classifySingleR` code where the delta is calculated, and it refers to a function that is found in the `RcppExports.R` file.
In looking at the actual function, I think this line might be responsible for getting the `delta.next`, but I honestly would have to do a lot more googling about C++ to figure that out for sure. https://github.com/LTLA/SingleR/blob/5e4daf8b3db5068d8ed317ec42d5d13df1edd7ca/src/run.cpp#L28
The last thing I wanted to address was your comment about looking at the distributions from different ways of computing a delta score. I think we do want to do this, but maybe outside this PR and in a separate notebook.
> Regarding some comments about looking into the `delta.next` and why it might be higher than expected, I did do a little digging into the codebase but I'm still a little confused. Partially because the actual code that calculates and returns the `delta.next` is C++ which is not my forte. I can dig in more if we need to, but I think we've made the decision to focus on the delta median distribution (or other similar distributions).
Yeah, I just tried to look at that code, and it is pretty opaque! Finding where the delta code actually lives from a mess of C++ references is not easy at all! Not to mention this is not bare C++, but Rcpp, which has its own little nuances.
But even worse, I think the real code lives in another repo: https://github.com/LTLA/singlepp/blob/99b5c62d9c65703e30b29feb449e07dcde8519ff/tests/src/naive_method.h#L50-L51
(Which does seem to be computing the `delta.next` as expected.) So I'm not sure why the median we are getting is smaller. I think I might look at some of the cases where `delta.next` is particularly large and see what the distribution of scores looks like for those cases.
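To make the two quantities in this discussion concrete, here is a minimal sketch of both deltas for a single cell. It assumes `delta.next` is the best score minus the next-best score, and that the delta from the median is the best score minus the median score across labels. It is written in Python for brevity, the function names and score values are hypothetical, and it is not SingleR's implementation.

```python
from statistics import median


def delta_next(scores):
    """Best score minus the second-best score for one cell."""
    if len(scores) < 2:
        raise ValueError("need scores for at least two labels")
    top, runner_up = sorted(scores, reverse=True)[:2]
    return top - runner_up


def delta_median(scores):
    """Best score minus the median score across all labels for one cell."""
    return max(scores) - median(scores)


# Hypothetical scores for a single cell across five labels.
cell_scores = [0.82, 0.80, 0.35, 0.30, 0.25]

d_next = delta_next(cell_scores)   # gap to the closest competing label
d_med = delta_median(cell_scores)  # gap to the "typical" label
```

One thing this makes visible: when both are computed from the same score vector with three or more labels, the next-best score is never below the median, so `d_next` cannot exceed `d_med`. If `delta.next` comes out larger than the delta from the median for a cell, that would suggest the two values were computed from different sets of scores, which may be worth checking in the cases where `delta.next` is particularly large.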
Begins to address #204
This PR begins some exploration of cell type assignment with SingleR. Specifically, I spent some time comparing cell type assignments of some AML samples between references with and without immune cells. In doing this, I also spent some time looking at the stats that come out of SingleR, specifically the score and the delta, and how we might be able to use them to compare across samples.
Some things to keep in mind as you go through this notebook:
`purrr` to generate the same plot across each library, but for a few, I just made the plot for a single library, mostly because the notebook was getting long and similar patterns were seen across all libraries.
Some overall conclusions and things that I learned here:
One big thought I had for a future analysis was to actually try some of the methods of combining references suggested by SingleR. That may help us identify a consensus for immune cell types, rather than dealing with 4 separate references that all have slightly different levels of annotation.
Here's a copy of the current rendered notebook for reviewing. SingleR-non-immune-ref-comparison.nb.html.zip