SingleR-inc / SingleR

Clone of the Bioconductor repository for the SingleR package.
https://bioconductor.org/packages/devel/bioc/html/SingleR.html
GNU General Public License v3.0

Interpreting first.label vs. label? #26

Closed: bbimber closed this issue 4 years ago

bbimber commented 5 years ago

Hello,

The DataFrame returned by SingleR has 'label', 'first.label', and 'score' columns. I find instances where a cell's highest score is for 'NK_cell', yet label=T_cells and first.label=NK_cell. This is being generated using hpca as the reference. Do you have a suggestion on how I should interpret this?

Related to that: do you expect that users might interrogate the score matrix and set some kind of threshold, or are you assuming SingleR has internally done this and one would interpret any call to be of good quality?

dtm2451 commented 5 years ago

Great questions.

So here's what's in your output: The labels column contains the results after fine-tuning (in which cells are re-checked against just their top potential labels). The first.labels column holds the labels with the top scores from before fine-tuning was run. The scores from fine-tuning are not included in the scores matrix because they are not calculated for every label for every cell. Thus, you should interpret the results as initially pointing to 'NK_cell' for those instances, yet ultimately showing better correlation with T_cells upon further inspection.
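
To make that concrete, here is a minimal sketch (assuming `pred` is the DataFrame returned by SingleR() for your cells against the hpca reference) of how you could pull out the cells whose fine-tuned call differs from their top pre-fine-tuning score:

```r
## Minimal sketch, assuming `pred` is the DataFrame returned by SingleR().
library(SingleR)

# Cells whose fine-tuned call ('labels') differs from the pre-fine-tuning
# top-scoring label ('first.labels'):
changed <- which(pred$labels != pred$first.labels)
head(pred[changed, c("first.labels", "labels")])

# The scores matrix only holds the initial, pre-fine-tuning correlations:
head(pred$scores[changed, , drop = FALSE], 2)
```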

We actually recently added a pruneScores function for exactly the kind of interrogation you are getting at in your second question; it uses the scores matrix (and a newer output from fine-tuning, see the note below) to distinguish likely good calls from likely bad calls. The main SingleR function does not evaluate call quality on its own, in part because the defaults are not one-size-fits-all; check out the documentation for pruneScores. We recommend using this function for further interpretation of the call outputs, because relying only on the scores matrix ignores fine-tuning.
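
For illustration, a hedged sketch of that interrogation (again assuming `pred` is a SingleR() output DataFrame; see ?pruneScores for the tunable thresholds):

```r
## TRUE means the call is considered low quality under the default heuristics.
to.prune <- pruneScores(pred)
summary(to.prune)

# One option: blank out the dubious calls before downstream use.
pred$labels[to.prune] <- NA
```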

(Note: If you update SingleR, restart, and rerun your analysis, I believe you will find a fourth output in the DataFrame called tuning.scores, which was recently added and reports the best and second-best scores from fine-tuning.)

j-andrews7 commented 5 years ago

On a related note, I've found it can be useful to compare to multiple reference datasets given the variability in how sorting is done, the granularity of the cell types in each set, etc. I find this is particularly true for immune cells (CD8 cytotoxic vs NK vs NK T cells are always a bit of a mess, for instance).

I've just submitted a PR (#27) for additional immune reference sets, and we're hoping to add more varied and expansive sets as we go. In tandem with the methods Dan described, you can usually nail down calls with a bit more confidence for those tougher groups without too much effort.

bbimber commented 5 years ago

OK. As a general comment: if you do find comparing multiple sets and reconciling their calls to be best-practice, it would be extremely convenient if one could call your code with a vector of refs (i.e. SingleR(refs = c('hpca', 'otherRef'))) and have your code run both and merge their calls.

j-andrews7 commented 5 years ago

That's a tricky thing to do. What would a merged call look like? How would we define it?

It would take a lot of effort to standardize labels across datasets, even if we wanted to go that route. It is quite easy to write a function that runs SingleR multiple times and tacks the calls onto your SCE/Seurat object metadata (or returns them as a list) for easy comparison/visualization. Some amount of interpretation will always fall to the user, even if we try to minimize it as much as possible.
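
As a rough illustration of that approach (not an official SingleR interface): assuming `sce` is a log-normalized test SingleCellExperiment and `ref.hpca`/`ref.other` are two hypothetical reference SummarizedExperiments, each with a `label.main` column, something like this would collect per-reference calls and attach them to the object:

```r
library(SingleR)
library(SingleCellExperiment)

# Hypothetical list of references; the names become metadata column suffixes.
refs <- list(HPCA = ref.hpca, Other = ref.other)

calls <- lapply(refs, function(ref) {
    SingleR(test = sce, ref = ref, labels = ref$label.main)
})

# Tack each set of calls onto colData for side-by-side comparison.
for (nm in names(calls)) {
    colData(sce)[[paste0("SingleR.", nm)]] <- calls[[nm]]$labels
}
```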

bbimber commented 5 years ago

Alright, I can see that.

Note: iterating/tacking calls onto a Seurat object is effectively what we do right now with hpca coarse/fine calls (one reference dataset, two label types).

j-andrews7 commented 5 years ago

I have a function you might be interested in here. It can be provided multiple reference sets/methods/clusters to add them to your Seurat object, but it also spits out the score heatmaps for each if you want. Be warned, docs are sparse, as it's part of a package that's very much a WIP.

LTLA commented 5 years ago

if you do find comparing multiple sets and reconciling their calls to be best-practice, it would be extremely convenient if one could call your code with a vector of refs (i.e. SingleR(refs = c('hpca', 'otherRef'))) and have your code run both and merge their calls

Not to flog a dead horse, but this is literally an FTE's worth of work to harmonize labels across all datasets of interest. Otherwise you'd have "T cells" competing with "t_cells" and "t" and "T" and whatever else the analyst decided to call their T cells. It's only a minor problem computationally, as I think SingleR would be pretty robust to these redundancies after fine-tuning; the real problem is that the burden of interpretation is then shifted over to the user when they get predictions back.

It would not be particularly hard to computationally guess which labels match up, by assigning datasets against each other and looking at the % of overlapping labels between datasets. But I'd be disinclined to do this automatically; I'd like some human oversight on the sensibility of "equivalent" labels.

Note that if you want to combine references (and have some way of dealing with redundant labels), you can just cbind your two reference objects together after subsetting to the common genes.
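
A minimal sketch of that manual combination, assuming both references store logcounts and their labels (here a `label.main` column) have already been harmonized:

```r
# Restrict both references to the shared gene set, then combine the samples.
common <- intersect(rownames(ref1), rownames(ref2))
combined <- cbind(ref1[common, ], ref2[common, ])
combined.labels <- c(ref1$label.main, ref2$label.main)  # labels must already agree

pred <- SingleR(test = sce, ref = combined, labels = combined.labels)
```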

bbimber commented 5 years ago

I get it. Still, anything you can do to help users interpret the results of the calls, or guidance on how to obtain the best calls, is appreciated.

@j-andrews7 my lab is making something that seems similar as well: https://github.com/bimberlabinternal/OOSAP, though it's not necessarily intended to be public.

dtm2451 commented 5 years ago

We are planning on making some comprehensive vignettes with examples of SingleR analysis in diverse datasets in the near future. We'll link them here once they're ready.

LTLA commented 5 years ago

If you want to combine references, I just added a matchReferences function (see #25) that should help in matching up labels across two references. It's not quite fully automatic, but it should get you pretty close. I wrote it with single-cell references in mind, as the statistics are only interesting if you have lots of representatives per label; it should work okay with bulk references too, where it would basically just be looking for mutual nearest neighbors.
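
A hedged usage example (argument names per my reading of ?matchReferences; `ref1`/`ref2` and their `label.main` columns are placeholders):

```r
library(SingleR)

mat <- matchReferences(ref1, ref2,
                       labels1 = ref1$label.main,
                       labels2 = ref2$label.main)

# `mat` scores how strongly each label in one reference corresponds to each
# label in the other; strong mutual scores suggest equivalent cell types.
round(mat, 2)
```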

LTLA commented 4 years ago

I'm going to close this, because the questions here are probably answered in the new vignette in one way or another.