SingleR-inc / SingleR

Clone of the Bioconductor repository for the SingleR package.
https://bioconductor.org/packages/devel/bioc/html/SingleR.html
GNU General Public License v3.0

Vignette Doesn't Explore Between-Atlas Labelling Stability #236

Closed DarioS closed 1 year ago

DarioS commented 1 year ago

I think the vignette should place more emphasis on labelling stability. For example, I have been investigating a publication by Regeneron Pharmaceuticals in Clinical Cancer Research titled "Immunostimulatory Cancer-Associated Fibroblast Subpopulations Can Predict Immunotherapy Response in Head and Neck Cancer" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9161438/). The main text mentions only BLUEPRINT.

However, the RDS file (https://data.mendeley.com/datasets/yk8wj7xgdg/1, the one named hnscc.gene.expression.integrated.rds) shows that they also tried HPCA. Almost all of the main labels change between the two attempts. So it is not clear whether the biological conclusions are valid, or why BLUEPRINT was chosen over HPCA (this was swept under the rug, and reviewers probably didn't even try downloading the RDS file to reproduce any of the article's results). Image: https://user-images.githubusercontent.com/631218/224514147-46391114-4bad-45a6-a2bf-bfd46c268dff.png

The Annotation Diagnostics section of the vignette uses just one reference. It would be nice to see the concordance (or lack of it) demonstrated between different atlases that contain mostly the same cell types. It might at least help to make more peer reviewers aware of the issue.
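As a rough illustration of the kind of concordance check being asked for, here is a minimal, language-agnostic sketch in Python. It assumes the per-cell labels from the two references have already been exported as two equal-length lists. The function names, the toy labels, and the `harmonise` mapping are all hypothetical and are not taken from SingleR, the vignette, or the RDS file. Label names usually differ between atlases, so some harmonisation step like the one below is assumed before agreement can be measured.

```python
from collections import Counter


def cross_tabulate(labels_a, labels_b):
    """Count how often each (label_a, label_b) pair occurs across cells."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must cover the same cells")
    return Counter(zip(labels_a, labels_b))


def agreement_rate(labels_a, labels_b, harmonise=None):
    """Fraction of cells whose two labels match after optional renaming.

    `harmonise` is a hypothetical dict mapping atlas-specific label names
    to a shared vocabulary; labels missing from it are compared as-is.
    """
    harmonise = harmonise or {}
    pairs = cross_tabulate(labels_a, labels_b)
    total = sum(pairs.values())
    if total == 0:
        raise ValueError("no cells to compare")
    matches = sum(
        n for (a, b), n in pairs.items()
        if harmonise.get(a, a) == harmonise.get(b, b)
    )
    return matches / total


# Toy labels, made up for illustration only (one entry per cell).
ref_a = ["Fibroblasts", "CD8+ T-cells", "Fibroblasts", "Monocytes"]
ref_b = ["Fibroblasts", "T_cells", "Smooth_muscle_cells", "Monocyte"]

# Assumed mapping onto a shared vocabulary; choosing it is a judgment call.
harmonise = {
    "CD8+ T-cells": "T cells",
    "T_cells": "T cells",
    "Monocyte": "Monocytes",
}

table = cross_tabulate(ref_a, ref_b)
rate = agreement_rate(ref_a, ref_b, harmonise)
for (a, b), n in sorted(table.items()):
    print(f"{a!r} vs {b!r}: {n}")
print(f"agreement after harmonisation: {rate:.2f}")
```

In R, the same cross-tabulation can be obtained by calling base `table()` on the two label vectors. The agreement figure depends heavily on how the labels are harmonised, so it is best read alongside the full cross-tabulation rather than on its own.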

j-andrews7 commented 1 year ago

This is covered in the SingleR book. http://bioconductor.org/books/release/SingleRBook/using-multiple-references.html

This issue also applies to any reference-based method that relies on author annotations, which is made quite clear throughout the book.

If you want to really get in the weeds, pick any mouse brain atlas and compare it to any other similar brain atlas. They'll label certain populations completely differently from each other even if they pull out similar markers.

They will not align perfectly.

(Reply sent by email to Dario Strbenac's comment of Sat, Mar 11, 2023.)

DarioS commented 1 year ago

Doh! Thanks. I'll read the section titled Comparing Scores Across References.