giovp / spatialdata-sandbox

GNU General Public License v3.0

datasets #1

Closed giovp closed 4 months ago

giovp commented 2 years ago

Available datasets

Missing datasets

melonora commented 2 years ago

Looks like there is a CODEX dataset we can get without having to wait on a rep: https://portal.hubmapconsortium.org/browse/dataset/053544cd63125fc25f6a71a8f444bafc#files

giovp commented 2 years ago

@LucaMarconato could you post the link to the metaspace and the nanostring geomx that you had in mind?

LucaMarconato commented 2 years ago
LucaMarconato commented 1 year ago

Here is a cool 3D single-molecule dataset with time information and multiple samples (multiple coordinate spaces), paired with images. It would benefit from our infrastructure for deep learning applications. It is a few GB, so we would need to subsample it. The single-molecule data can be downloaded from here (link found in this paper). The associated images are not public yet (they will be soon), but for now we have private access.

LucaMarconato commented 1 year ago

2 datasets from Resolve:

Mouse skin (melanoma) - from this publication: https://www.nature.com/articles/s41586-022-05242-7 Data available here: https://zenodo.org/record/6856193#.Yzr3H3ZByF5

Mouse and human liver – from this publication: https://www.cell.com/cell/fulltext/S0092-8674(21)01481-1? Data available here: https://cloud.irc.ugent.be/public/index.php/s/HrXG9WKqjqHBEzS

kevinyamauchi commented 1 year ago

Serial sections (i.e., multiple slides of the same sample) with different immunofluorescence techniques applied: https://mcmicro.org/datasets/#whole-slide-images-wsis

I haven't opened it, but based on the description I think it also comes with registration transformations, segmentation masks and measurements. Might be a nice example.

Additional info: https://www.synapse.org/#!Synapse:syn24849819/wiki/608441

edit: preview of the data with a browser-based viewer (minerva): https://labsyspharm.github.io/mcmicro-images/

kevinyamauchi commented 1 year ago

Lightseq - sequencing + imaging

https://www.nature.com/articles/s41592-022-01604-1#data-availability

ivirshup commented 1 year ago

@kevinyamauchi, more resources (including notebooks for handling raw data) from their website: lightseq.io

giovp commented 1 year ago

Use case 1:

Use case 2:

Use case 3 (using 4 datasets):

LucaMarconato commented 1 year ago

Nice dataset and study from our colleagues at EBI and DKFZ: base-specific in situ sequencing (BaSISS), for studying clonal evolution in breast cancer

berl commented 1 year ago

hey all- I'm excited to learn about this SpatialData project and see the progress!

It looks like the readme.MD refers to the MERSCOPE data as being from the Zhuang lab, but the download code here shows it grabbing the VISp data from our prototype MERFISH pipeline. It's fine to use this if you want, but

  1. it's a bit of an orphan dataset in terms of formatting, technical details, etc. Upcoming BICCN Zhuang lab data or Allen Institute MERSCOPE data will be more useful long term.
  2. If you continue to use it, please change the readme.md to reflect that this is data from the Allen Institute prototype MERFISH pipeline.

LucaMarconato commented 1 year ago

Thank you @berl, we are working on making a beta release as soon as possible!

I have updated the README.md with the correct data attribution, sorry for that. And thank you for pointing us to the datasets you mentioned.

kevinyamauchi commented 1 year ago

Thanks for catching the misattribution, @berl . Sorry about that!

"it's a bit of an orphan dataset in terms of formatting, technical details, etc. upcoming BICCN Zhuang lab data or Allen Institute MERSCOPE data will be more useful long term."

Do you know when/where those datasets will be released? I'd be interested in checking them out.

kevinyamauchi commented 1 year ago

Curio (commercialized slideseq) has example data upon request. It could be worth making a reader if we can get some of the data.

https://curiobioscience.com/example-data/

LucaMarconato commented 4 months ago

Closing, enough datasets added.