Closed MartaBenegas closed 4 months ago
So, if a label has been pruned, an NA is assigned on the pruned.labels?
This is correct.
Shouldn't the NA be present in the labels column? I would expect that, if an assignment is not reliable, no cell label should be present in the labels column.
The `labels` column contains the best guess. If you want a better idea of ambiguity, use `pruned.labels`.
Under which circumstances are the columns first.labels and pruned.scores shown or not shown?
The vignette is seemingly just out of date, and these columns are no longer returned. The vignette was superseded by a book, but it's not currently live due to build issues. The function docstrings (e.g. `?SingleR`) are reliable sources.
Additionally, what's the difference between the fine-tuning and the pruning?
Fine-tuning is a way to improve resolution of closely related labels, e.g. subtypes of a broader cell type. Those subtypes will have less separation between them and will all score well when compared to all other labels. When used, fine-tuning will take those labels that score well and perform another round of marker finding and scoring for just those labels to determine the highest scorer among them, which will then be returned as the label.
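To make the fine-tuning idea above concrete, here is a minimal toy sketch in Python. It is not SingleR's actual implementation: the function name `fine_tune`, the `rescore` callback (standing in for "find markers and score again using only the remaining labels"), the `thresh` value, and all scores are assumptions made up for illustration.

```python
def fine_tune(scores, rescore, thresh=0.05):
    """Pick a label for one cell by repeatedly narrowing the candidate set.

    scores  : dict mapping label -> initial score for this cell.
    rescore : hypothetical callback; given a set of candidate labels, it
              returns new scores computed using only those labels.
    thresh  : labels scoring within `thresh` of the best stay candidates
              (the value 0.05 is an assumption for this sketch).
    """
    candidates = set(scores)
    current = dict(scores)
    while len(candidates) > 1:
        best = max(current.values())
        keep = {lab for lab in candidates if current[lab] >= best - thresh}
        if keep == candidates:
            break  # nothing was dropped, so another round would not help
        candidates = keep
        if len(candidates) == 1:
            break
        current = rescore(candidates)  # score again among close labels only
    return max(candidates, key=lambda lab: current[lab])


def toy_rescore(candidates):
    # Made-up second-round scores: once unrelated labels are excluded,
    # the two T cell subtypes separate more clearly.
    refined = {"CD4 T": 0.40, "CD8 T": 0.55, "B": 0.10, "Mono": 0.05}
    return {lab: refined[lab] for lab in candidates}


initial = {"CD4 T": 0.61, "CD8 T": 0.60, "B": 0.30, "Mono": 0.20}
label = fine_tune(initial, toy_rescore)
print(label)  # CD8 T
```

In this toy example the two T cell subtypes are nearly tied in the first round, so only they are rescored, and the second round decides between them.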
Pruning is used to prevent erroneous labels of cell types not well represented in the reference dataset. For example, if your dataset has a population of macrophages that are not present in the reference being used, they may score relatively equally for a number of the labels present. But in comparison to other cell types in your test set that are represented, they'll have poor separation between their top score and the next best. As such, they get labeled as ambiguous/low confidence. You can read `?pruneScores` for more details about the process.
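The pruning idea can be sketched the same way. The toy Python below assumes each cell has a confidence gap ("delta") and that a cell is pruned when its delta is an outlier on the low side among cells assigned the same label, here more than 3 scaled MADs below that label's median. The function name, the exact rule, and the numbers are assumptions for illustration; `?pruneScores` describes what SingleR really does.

```python
from statistics import median


def prune_labels(labels, deltas, nmads=3.0):
    """Return a copy of `labels` with low-confidence entries set to None.

    None plays the role of NA in R. A cell is pruned when its delta falls
    below (median - nmads * MAD) computed among cells with the same label.
    """
    by_label = {}
    for lab, d in zip(labels, deltas):
        by_label.setdefault(lab, []).append(d)

    thresholds = {}
    for lab, vals in by_label.items():
        med = median(vals)
        # 1.4826 scales the MAD to be comparable to a standard deviation.
        mad = median([abs(v - med) for v in vals]) * 1.4826
        thresholds[lab] = med - nmads * mad

    return [lab if d >= thresholds[lab] else None
            for lab, d in zip(labels, deltas)]


# Made-up data: the last "T" cell has a much smaller delta than its peers.
labels = ["T"] * 6 + ["B"] * 5
deltas = [0.30, 0.31, 0.29, 0.32, 0.30, 0.05,
          0.40, 0.41, 0.39, 0.42, 0.40]
pruned = prune_labels(labels, deltas)

# Every pruned entry is either None or identical to the unpruned label.
same_or_na = all(p is None or p == l for p, l in zip(pruned, labels))
```

Note that the unpruned `labels` list is left untouched, which mirrors how the `labels` column keeps the best guess while `pruned.labels` only blanks out the doubtful cells.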
Shouldn't the NA be present in the labels column? I would expect that, if an assignment is not reliable, no cell label should be present in the labels column.
Also some historical context: the original version of SingleR didn't do any pruning in the reported labels, and to avoid introducing a change in results for active users, we kept the unpruned labels in `labels`. In addition, I would say the pruning itself is... just okay. It's our best guess for what is a "bad" assignment, but it's hard to say it with much certainty because all of these things are relative. At least I wasn't confident enough in the pruning to force it on everyone else.
Moreover, in your vignette you mention those columns but they do not appear in the example:
Yes, as @j-andrews7 mentioned, I just forgot to update the vignette when we switched implementations. The new implementation is based on the singlepp C++ library and doesn't provide the pre-fine-tuning labels by default. If these are needed, you could just set `fine.tune=FALSE` to compute them explicitly.
@j-andrews7 @LTLA thank you very much for your detailed explanations! They helped a lot.
Just to double-check: then, it is expected that the `labels` and the `pruned.labels` columns contain the same cell labels, but the `pruned.labels` will contain NA in case the label has been pruned, right?
At first, I misunderstood the `pruned.labels` column and I thought it contained what would be present in the old `first.labels` column.
Third eyes now =)
Just to double-check: then, it is expected that the `labels` and the `pruned.labels` columns contain the same cell labels, but the `pruned.labels` will contain NA in case the label has been pruned, right?
Correct! `pruned.labels` is the more confident set where values are either `NA` or the same as in `labels`.
Crystal clear! Thanks again :D I'll close the issue now.
Hi SingleR Team!
I'm quite confused with the `pruned.labels` output. I would expect it to contain the candidate labels that finally have not been assigned to the cell after the fine-tuning. However, for my dataset I seem to have the same `labels` and `pruned.labels`:
![image](https://github.com/LTLA/SingleR/assets/57805802/4d4d8c58-76c4-46dd-b19a-a7600d9a0ce2)
Additionally, in your vignette it says:
So, if a label has been pruned, an NA is assigned on the `pruned.labels`? Shouldn't the NA be present in the `labels` column? I would expect that, if an assignment is not reliable, no cell label should be present in the `labels` column.
Moreover, I've seen in your vignette and in this YouTube tutorial that the output should contain a column of `first.labels` and `pruned.scores` as well, but I don't receive those columns in the output, even if I explicitly say to run fine-tuning and pruning:
Moreover, in your vignette you mention those columns but they do not appear in the example:
![image](https://github.com/LTLA/SingleR/assets/57805802/4e654e67-8f90-4c16-9156-8368834d0c6f)
Under which circumstances are the columns `first.labels` and `pruned.scores` shown or not shown? Additionally, what's the difference between the fine-tuning and the pruning?
Sorry if I missed something!