chanzuckerberg / single-cell-data-portal

The data portal supporting the submission, exploration, and management of projects and datasets to cellxgene.
MIT License

Citations in RDS datasets incorrectly reference H5AD dataset versions #6671

Closed: brianraymor closed this issue 7 months ago

brianraymor commented 8 months ago

Context

I was examining RDS datasets in RStudio and noted that the citation references the H5AD dataset version and not the RDS dataset version:

cellxgene@misc[["citation"]]
[1] "Publication: https://doi.org/10.1126/sciadv.adh1914 Dataset Version: https://datasets.cellxgene.cziscience.com/188dfb70-dd81-44ec-a8fd-648d54cb7b5e.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/8c4bcf0d-b4df-45c7-888c-74fb0013e9e7"

See the requirements and the RDS example under `citation`.
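To make the mismatch concrete, here is a minimal Python sketch that splits a citation of the form shown above into its Publication, Dataset Version, and Collection parts and checks which artifact format the Dataset Version URL points at. The regex and the helper names (`parse_citation`, `references_format`) are hypothetical, written against the single example string in this issue rather than the schema text; treating the "Publication:" part as optional is an assumption.

```python
import re

# Hypothetical pattern derived from the example citation in this issue.
# The "Publication: ..." prefix is treated as optional (an assumption).
CITATION_RE = re.compile(
    r"(?:Publication: (?P<publication>\S+) )?"
    r"Dataset Version: (?P<dataset_version>\S+) "
    r"curated and distributed by CZ CELLxGENE Discover in "
    r"Collection: (?P<collection>\S+)"
)


def parse_citation(citation: str) -> dict:
    """Split a citation into its publication, dataset_version and collection URLs."""
    match = CITATION_RE.fullmatch(citation)
    if match is None:
        raise ValueError(f"unrecognized citation format: {citation!r}")
    # Missing optional groups (e.g. no publication) come back as None.
    return match.groupdict()


def references_format(citation: str, expected_ext: str) -> bool:
    """True if the Dataset Version URL ends with the expected file extension."""
    url = parse_citation(citation)["dataset_version"]
    return url.endswith("." + expected_ext)
```

Run against the citation above, `references_format(citation, "rds")` returns `False`, which is exactly the defect reported here.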

danieljhegeman commented 8 months ago

Brainstorming an initial solution, slightly hacky but it would get the job done: base the RDS conversion on an H5AD file whose citation already contains the RDS dataset version.
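A minimal Python sketch of that idea, assuming the RDS artifact URL differs from the H5AD one only by its `.rds` extension (an assumption, not confirmed in this thread). The hypothetical `with_rds_citation` helper works on a plain dict standing in for the H5AD file's `uns` metadata; reading and writing the actual H5AD file (e.g. with anndata) before handing it to the RDS conversion step is not shown.

```python
import re

# Matches a Dataset Version URL that points at the .h5ad artifact.
_H5AD_VERSION_RE = re.compile(r"(Dataset Version: \S+)\.h5ad\b")


def with_rds_citation(uns: dict) -> dict:
    """Return a copy of `uns` whose citation references the .rds artifact.

    Hypothetical helper: assumes the RDS URL is the H5AD URL with the
    extension swapped from .h5ad to .rds.
    """
    citation = uns.get("citation")
    if citation is None:
        # Nothing to rewrite; hand back an unchanged copy.
        return dict(uns)
    new_citation, n = _H5AD_VERSION_RE.subn(r"\1.rds", citation, count=1)
    if n == 0:
        raise ValueError("citation does not reference an .h5ad dataset version")
    updated = dict(uns)  # shallow copy: the H5AD metadata itself is untouched
    updated["citation"] = new_citation
    return updated
```

Returning a copy rather than mutating in place keeps the citation in the original H5AD artifact pointing at the H5AD version, and raising when nothing matches avoids silently shipping an RDS file with the wrong citation.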

brianraymor commented 8 months ago

This impacts the related issue: Embedding "publication/citation" information of queried cells to seurat object.

joyceyan commented 8 months ago

The conversion to RDS happens here: https://github.com/chanzuckerberg/single-cell-data-portal/blob/main/backend/layers/processing/make_seurat.R#L16. It uses the sceasy package to do the conversion: https://github.com/cellgeni/sceasy

Note that we will eventually want to remove in 5.1: https://github.com/chanzuckerberg/single-cell-data-portal/issues/6672, so if it's easier to use a different package, that could be one potential solution.