ebi-ait / hca-ebi-wrangler-central

This repo is for tracking work related to wrangling datasets for the HCA, associated tasks and for maintaining related documentation.
https://ebi-ait.github.io/hca-ebi-wrangler-central/
Apache License 2.0

What types of spatial sequencing and imaging technologies are the community using #266

Closed lauraclarke closed 3 years ago

lauraclarke commented 3 years ago

Goal: Find evidence to help choose which spatially resolved sequencing and imaging technology should be used for the first pilot data ingestion into the DCP.

There is growing use of spatially resolved sequencing technologies for transcriptomics, where an anchor image provides spatial information on the location of the cell or cells that have been sequenced and associated with a particular sample index, e.g. 10x Visium, Spatial Transcriptomics or Slide-seq.

We need ballpark figures across these different technologies so we can select the most prevalent technology for the pilot.

Possible sources of information

  • WSSS reports
  • MRC awardees @ESapenaVentura do you have any estimates, I know https://app.zenhub.com/workspaces/dataset-wrangling-status-5f994cb88e0805001759d2e9/issues/ebi-ait/hca-ebi-wrangler-central/239 has some, any understanding for any others?
  • European Commission H2020 projects https://www.humancellatlas.org/euh2020/
  • Valentine Svensson's publication spreadsheet?
  • Gut Cell Atlas https://www.gutcellatlas.helmsleytrust.org/

ESapenaVentura commented 3 years ago

I know most of the MRC projects contain imaging/spatial data (Visium, CODEX, etc.). I have notes somewhere, so if we need a specific list of the technologies I can try and get them.

lauraclarke commented 3 years ago

There is a desire to be evidence led here, so if there is a source you can point Marion to so that we can get the number of MRC awardees using each technology, that would be great. Do you have a gut feel for which is most popular?

ESapenaVentura commented 3 years ago

Not super straightforward, but I think the easiest way to get a quick summary would be looking at these slides: https://drive.google.com/file/d/1weSFsj3SDUwqOtThYzz-tHJG9tTXeg8B/view?usp=sharing

lauraclarke commented 3 years ago

@mshadbolt Tim is seeking ballpark figures by this Thursday, is that a reasonable deadline or do you need more time?

mshadbolt commented 3 years ago

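Just to be clear on what is wanted:

  • We are only interested in spatially resolved transcriptomics, not any FISH based method?

  • The methods I have found for spatially resolved transcriptomics are:

    • Visium
    • Slide-seq
    • NanoString GeoMx
    • APEX-seq
    • HDST

    Are there any others that are missing from this list?

  • The ideal output is a count of any known publication or project that has used one of these techniques that would also be suitable for HCA? i.e. human or mouse data, data is from at least some normal subjects or comparison between some phenotype and normal state, project/pub was funded by HCA related grant, or is official HCA publication. Any other criteria for inclusion in this list?

Are there any other facets/metadata that are wanted about each project/publication that is included? possible examples:

  • is hca_pub
  • organs involved
  • any other technology in the same study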

This is probably going to involve a fair bit of manual curation of the tracker sheet because I think when we have curated in the past we have mostly been capturing only the sequencing methods, rather than both what was used for sequencing as well as imaging in the same paper.

I am not really sure what is meant by 'ball park' figures. I am putting together a list of projects/publications that I can find. I will do what I can by Thursday.

mshadbolt commented 3 years ago

I am also not really sure from the MRC slides that @ESapenaVentura provided, how I would tell which have used/will use spatial methods?

lauraclarke commented 3 years ago

Just to be clear on what is wanted:

  • We are only interested in spatially resolved transcriptomics, not any FISH based method?

Yes

  • The methods I have found for spatially resolved transcriptomics are:

    • Visium
    • Slide-seq
    • NanoString GeoMx
    • APEX-seq
    • HDST

    Are there any others that are missing from this list?

That is a more extensive list than I am aware of so sounds good to me

  • The ideal output is a count of any known publication or project that has used one of these techniques that would also be suitable for HCA? i.e. human or mouse data, data is from at least some normal subjects or comparison between some phenotype and normal state, project/pub was funded by HCA related grant, or is official HCA publication. Any other criteria for inclusion in this list?

HCA publications and knowledge about rough numbers from consortia (even just how many consortia are doing each method helps) I think is a good starting point

If you can use Valentine's spreadsheet or any other publication source to get specific numbers more generally, following your guidelines (human or mouse, primary tissue or case/control type scenario), that would be great, but I think that is a lower priority.

Are there any other facets/metadata that are wanted about each project/publication that is included? possible examples:

  • is hca_pub
  • organs involved
  • any other technology in the same study

Right now, the most important number is how many studies/publications include these spatially resolved sequencing methods. If it is easy to get us stratification by organ and what other technologies are included please do but that isn't essential

This is probably going to involve a fair bit of manual curation of the tracker sheet because I think when we have curated in the past we have mostly been capturing only the sequencing methods, rather than both what was used for sequencing as well as imaging in the same paper.

I am not really sure what is meant by 'ball park' figures. I am putting together a list of projects/publications that I can find. I will do what I can by Thursday.

When I say ballpark I mean an estimate; we don't need a high level of precision here. If what you can say is that all the consortia seem to be using 10x Visium, but only half of them seem to be using Slide-seq and none are obviously using any other method, that is fine. We don't need specific numbers of experiments or specific versions. This doesn't need to be an exhaustive search, just a first pass to reassure the people who want this that, when we pick 10x Visium to highlight, we can point to numbers suggesting it is the right assay to try first.

If you can get precise numbers from any of these sources that would be great but this shouldn't be a long task (no more than 1 day)

mshadbolt commented 3 years ago

OK, I am not really sure what level of precision is wanted, because at one point you say you want a count of publications, and then later it is at the level of consortia. I guess I don't really know how to do this in one day, since it requires a lot of manual digging into what papers or reports have been published by various consortia (which are not usually that easy to track down), looking at the methods and classifying those. Anyway, I will do what I can.

lauraclarke commented 3 years ago

Doing what you can is the right thing to do.

I have mentioned publications and consortia as they are both possible sources of information for this. You should pick the route which works best for you

The level of precision doesn't need to be high, this does not need to be an exhaustive search

mshadbolt commented 3 years ago

I did the very brief overview as suggested. The doc is here: https://docs.google.com/document/d/1Ud_HF2oAPVEcbui0rKaasdpo_iYT5TtrriNQGDWLCnk/edit#heading=h.pncjs49lwrcq

In short, the only consortium for which I could find detailed information about the methods they are using for spatial transcriptomics was WSSS. They only mention Visium but refer to some 'other hiplex spatial sequencing modalities' without naming them.

The MRC slides above have one project (developmental lung) that mentions Visium.

I browsed through the various websites above but could not find any detailed reporting on what specific methods have been or will be used. If there is any other detailed information, or there are reports I could browse, I would be happy to look.

The wrangler's data tracker has 4 projects with a spatial transcriptomics method listed, they are all Visium.

The Svensson single cell database has 2 papers for spatial transcriptomics, but they are the methods papers for each method, APEX-seq and HDST, so not sure if we would consider them as 'suitable' to wrangle for the HCA.

mshadbolt commented 3 years ago

I added a section to the doc with a very quick literature search on Europe PMC.