AlexsLemonade / sc-data-integration


Combine immune references for SingleR #219

Closed allyhawkins closed 1 year ago

allyhawkins commented 1 year ago

Closes #217

Here I ran a quick test of SingleR using all of the immune-specific references in celldex combined. I used the combining references section of the SingleR book as a guide. I took the cell ontology labels from each reference and provided them to SingleR. This gives an individual label from each included reference plus a single label that considers all references. We also get the information about which reference each combined label came from.
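To make the shape of that output concrete, here is a minimal sketch in plain Python of what "one label per reference plus one combined label" looks like for a single cell. It is only an illustration, not SingleR's API: `combine_calls` and `CombinedCall` are hypothetical names, the scores are made up, and it simply takes the highest-scoring label across references, whereas SingleR's combined mode does additional work before comparing references.

```python
from typing import Dict, NamedTuple


class CombinedCall(NamedTuple):
    label: str                     # ontology ID chosen across all references
    reference: str                 # reference that supplied the chosen label
    score: float                   # score of the chosen label
    per_reference: Dict[str, str]  # best label within each individual reference


def combine_calls(scores_by_ref: Dict[str, Dict[str, float]]) -> CombinedCall:
    """Pick the best label within each reference, then the best of those overall."""
    # Best (label, score) pair inside each reference.
    best_per_ref = {
        ref: max(scores.items(), key=lambda kv: kv[1])
        for ref, scores in scores_by_ref.items()
    }
    # Reference whose best label has the highest score.
    winner = max(best_per_ref, key=lambda ref: best_per_ref[ref][1])
    label, score = best_per_ref[winner]
    per_reference = {ref: lab for ref, (lab, _) in best_per_ref.items()}
    return CombinedCall(label, winner, score, per_reference)


# Hypothetical scores for one cell (made-up numbers, not taken from the report).
cell_scores = {
    "BlueprintEncode": {"CL:0000557": 0.62, "CL:0000236": 0.41},
    "HPCA": {"CL:0000557": 0.58, "CL:0000084": 0.47},
}
call = combine_calls(cell_scores)
print(call.label, call.reference, call.per_reference)
```

Keying everything on ontology IDs is what lets labels from different references be compared directly, which is the reason for passing the ontology labels rather than each reference's own label names.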

I also initially tried assigning with just the Blueprint and HPCA datasets, and the results looked fairly similar, with most of the cells assigned to the HSC group (CL:0000557).

The one thing I thought about doing was looking at the score distributions, but I don't really know how to do that with the combined labels, since only one score is output for each cell and the rest are NA. I am definitely open to ideas on how we could validate that the combined reference is really "better" than the individual refs.
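One possible way around the NA scores, sketched here in plain Python under stated assumptions, would be to go back to the per-reference scores and compare, for each cell, the gap between the top score and the median score within each reference. This is not SingleR's API: `delta_from_median` and `deltas_by_reference` are hypothetical helpers, the labels `A`, `B`, `C` and all numbers are made up, and it assumes a full score per label is available for each individual reference.

```python
from statistics import median
from typing import Dict, List


def delta_from_median(scores: Dict[str, float]) -> float:
    """Gap between a cell's top score and the median of its scores in one reference."""
    values = list(scores.values())
    return max(values) - median(values)


def deltas_by_reference(
    cells: List[Dict[str, Dict[str, float]]]
) -> Dict[str, List[float]]:
    """Collect one delta per cell for each reference, ready to plot as distributions."""
    out: Dict[str, List[float]] = {}
    for cell in cells:
        for ref, scores in cell.items():
            out.setdefault(ref, []).append(delta_from_median(scores))
    return out


# Two hypothetical cells scored against two references (made-up numbers).
cells = [
    {
        "BlueprintEncode": {"A": 0.60, "B": 0.40, "C": 0.30},
        "HPCA": {"A": 0.55, "B": 0.50, "C": 0.45},
    },
    {
        "BlueprintEncode": {"A": 0.35, "B": 0.70, "C": 0.30},
        "HPCA": {"A": 0.50, "B": 0.52, "C": 0.48},
    },
]
deltas = deltas_by_reference(cells)
print(deltas)
```

A larger delta suggests the top label stands out more clearly from the alternatives in that reference, so comparing these distributions across references is one rough proxy for confidence. It would not by itself show that the combined call is more accurate.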

My main goal here was to see whether using the combined reference is a better approach than using individual references and picking one "good" reference. We talked about wanting to provide users with a single label from the one reference that is most appropriate. Because of that, I'm in favor of using the combined reference when we have a way of combining by ontology IDs, rather than trying to pick out which reference is most appropriate for a given tissue. Given that SingleR will push to assign a cell type even if it's not the most accurate, I'm in favor of including more options rather than using a limited reference.

Report: SingleR-combining-refs.nb.html.zip

allyhawkins commented 1 year ago

Okay I updated this to use the cell type names pulled directly from the cell ontology throughout. In doing that I actually removed the plot looking at all of the labels and just have the top 10. I can put it back or look at a higher number if you want, but it was a very busy plot and hard to interpret anything with so many labels and very long names for everything.

I also made the change from the very beginning, which allowed me to use the labels in the heatmap too. The one thing I could not figure out was how to remove the annotation legend in the heatmaps and show just the row labels.

Here's an updated report: SingleR-combining-refs.nb.html.zip

jashapiro commented 1 year ago

> Okay I updated this to use the cell type names pulled directly from the cell ontology throughout. In doing that I actually removed the plot looking at all of the labels and just have the top 10. I can put it back or look at a higher number if you want, but it was a very busy plot and hard to interpret anything with so many labels and very long names for everything.
>
> I also made the change from the very beginning, which allowed me to use the labels in the heatmap too. The one thing I could not figure out was how to remove the annotation legend in the heatmaps and show just the row labels.
>
> Here's an updated report: SingleR-combining-refs.nb.html.zip

It looks like there are changes here that might not have been pushed yet?

allyhawkins commented 1 year ago

> It looks like there are changes here that might not have been pushed yet?

You're right, it would help if I had done that 🤦‍♀️ All good now!