gigascience / giga-omero

Source code for deploying a test instance of GigaOMERO
GNU General Public License v3.0

BBSRC China Partnering Award proposal #3

Open pli888 opened 5 years ago

pli888 commented 5 years ago

@pli888 Focus the proposal on metabolic imaging?

pli888 commented 5 years ago

@ChrisArmit Can you provide more detail on the metabolic imaging data that you have in mind?

To give you a bit more context, I tend to think of metabolic imaging as modalities that are translatable to the clinic such as PET (positron emission tomography) and MRS (magnetic resonance spectroscopy). PET utilises metabolic probes such as 18F-fluoro-deoxyglucose which are administered to the patient. In contrast, MRS imaging measures endogenous metabolites, such as N-acetylaspartate (NAA) for neuronal integrity, or lactate for anaerobic metabolism. There are MRS datasets in MetaboLights, for example: https://www.ebi.ac.uk/metabolights/MTBLS482 and so this could definitely be worth exploring further.

However, a query for 'positron' does not find any datasets, and so perhaps MetaboLights does not currently include PET data.

pli888 commented 5 years ago

@pli888 I was thinking along the lines of mass spectrometry imaging, specifically involving biological cells and tissues, maybe in the form of histological sections, being blasted with a laser and the resulting ions undergoing analysis in a mass spectrometer. This would keep the metabolomics angle from the CUDDEL grant, and also bring biological imaging and therefore Jason Swedlow's work into the project.

pli888 commented 5 years ago

@ScottBGI We've also published examples of benchmark datasets.

pli888 commented 5 years ago

@ChrisArmit Okay, thanks to @pli888 and @ScottBGI for clarifying. I note that the benchmark dataset is also available in MetaboLights.

Did the authors independently submit this dataset to both GigaDB and MetaboLights? Or was there communication between both resources with GigaDB scraping data from MetaboLights or vice versa?

An important consideration for the grant proposal is that, as these data already exist in both MetaboLights and GigaDB, we need to be very clear on the additional benefits that are to be gained from this collaboration. Have you any thoughts on this?

pli888 commented 5 years ago

@pli888 I was thinking that we can include Susanna's team in the proposal. The metadata about the work in the form of ISA files which are associated with MetaboLights datasets are not usually as detailed as they could be when created by Philippe.

I would try to find a problem associated with mass spectrometry imaging that could be addressed by integrating the use of OMERO, ISA, GigaDB and GigaScience journal. Then in the proposal, say how this relates to a key BBSRC objective.

The other option for the proposal is to just focus on GigaDB and OMERO. Also, promote the use of OMERO to researchers from Hong Kong universities.

pli888 commented 5 years ago

@ChrisArmit says: I guess the main issue with mass spec imaging is the relatively low-resolution images that it generates. MALDI 2D maps of thin sections can deliver an XY spatial resolution of about 20 µm, whereas DESI has an even poorer resolution of about 100 µm. Consequently, these image maps can look quite pixelated. As a means of addressing this, researchers often display a MALDI or DESI pixelated image alongside an adjacent histological section, stained with an appropriate histological stain (e.g. Nissl staining for sections of brain tissue), that helps the researcher interpret the MALDI/DESI 2D maps.

From a GigaDB perspective, one editorial check that we could provide is to ensure that MALDI/DESI maps plus the associated histology images are collected and archived in OMERO. The histology images provide the necessary context for interpreting MALDI/DESI datasets and contribute towards the re-use potential of the entire dataset.

In addition, we could develop a web-tool to allow researchers to annotate their histology images. This could either be by regions of interest (e.g. boxes), by point annotation (i.e. flags) or by segmenting the histology image to produce a series of masks reflecting different compartments within a tissue section. Then, through the use of overlays between annotated images and pixelated MALDI/DESI images, we could infer whether the mass spec profiles of individual pixels relate to, for example, parenchymal tissue, connective tissue and/or blood vessels. This would increase the granularity of the annotation from that of whole tissue e.g. bladder, to microanatomical structure, such as [adventitial layer of bladder], [lamina propria of bladder] etc, or even to individual cell types. This may be a nice hook for a grant application.
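As a rough illustration of the overlay idea, here is a minimal Python sketch. It assumes the segmented histology mask has already been registered (aligned) to the MALDI/DESI map and that each mass spec pixel covers a whole block of histology pixels. The function name `label_ms_pixels`, the `block` parameter, and the toy labels are hypothetical and are not part of OMERO or any existing tool.

```python
from collections import Counter


def label_ms_pixels(mask, block):
    """Assign each mass spec pixel the most common histology label beneath it.

    mask  : 2D list of labels from a segmented histology image (hypothetical)
    block : number of histology pixels per mass spec pixel along each axis
    """
    rows, cols = len(mask) // block, len(mask[0]) // block
    labelled = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Collect every histology label that falls under this MS pixel
            patch = [mask[y][x]
                     for y in range(r * block, (r + 1) * block)
                     for x in range(c * block, (c + 1) * block)]
            # Majority vote decides the compartment for the MS pixel
            row.append(Counter(patch).most_common(1)[0][0])
        labelled.append(row)
    return labelled


# Toy 4x4 histology mask, with 2x2 histology pixels per mass spec pixel
mask = [
    ["parenchyma", "parenchyma", "vessel", "vessel"],
    ["parenchyma", "connective", "vessel", "vessel"],
    ["connective", "connective", "parenchyma", "parenchyma"],
    ["connective", "connective", "parenchyma", "vessel"],
]
ms_labels = label_ms_pixels(mask, block=2)
```

A majority vote is only one possible rule. Recording the fraction of each compartment under a pixel might be more informative where a mass spec pixel straddles a tissue boundary.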

pli888 commented 5 years ago

@ChrisArmit says: If we alternatively wish to focus on GigaDB and OMERO, there may be value in integrating GigaGalaxy with OMERO. There have been attempts at this already, with OMERO Biobank in Sardinia using a combination of OMERO and Galaxy to deliver pipelines for sequence data, and to additionally act as a "bridge between –omic data produced by the analysis of tissue picked from pathology slides (or paraffin blocks) and the relative morphological information contained in the corresponding digital pathology slides".

Perhaps we could deliver something similar for metabolomics data?

More details at the following link.

pli888 commented 5 years ago

@ChrisArmit says: A further alternative is to use OMERO and GigaGalaxy to explore reproducible image analysis. In a similar approach to that already used in the previous CUDDEL workshop, where you re-implemented the metabolomics data analysis from Eva’s PhD studies and passed the output onto collaborating researchers from EBI who helped develop a Galaxy pipeline, we could attempt something similar with GigaDB imaging data. If we consider the CAMELYON dataset which we are currently using as a test set for OMERO, it would be great if we could use GigaGalaxy to run a workflow on OMERO that utilises the MATLAB scripts and Python scripts that are included in this dataset. In this way, we would be pioneering an approach where:

  1. Imaging data is first organised using OMERO.
  2. Galaxy workflows are designed to run on OMERO.

This could additionally serve as the basis of a hackathon that would explore basic workflow editing procedures for OMERO users. This grant application would draw on the success of the previous CUDDEL workshop, but with an emphasis on imaging data. It would also be useful from an imaging standards perspective, as we would be defining a more standardised approach for image processing of research data.

Do any of these ideas appeal to @pli888 @ScottBGI? I additionally like @pli888's idea of promoting the use of OMERO to researchers from Hong Kong universities. We should also include this in the grant application, perhaps as an outreach initiative for the imaging standards that we are proposing.

pli888 commented 5 years ago

@ChrisArmit I think we need to base the proposal on a fundamental problem, for example the replication of image processing as Galaxy workflows to prove reproducibility as you propose.

I was wondering about the use of OMERO to store images from mass spectrometry imaging data. We might want to do this because I believe (correct me if I'm wrong) it is best practice in biological data curation to store data in appropriate data repositories. The problem then is if the mass spectrometry data is in MetaboLights then how do we link them to the imaging data in OMERO, IDR or your favourite image database? Also, how might we exchange the metadata between MS and imaging repositories?

I've not looked at mass spec imaging data in detail in MetaboLights, but in the one dataset I saw, the images were included among the other data files. I should check if the dataset's ISA files relate them to each other too.

ChrisArmit commented 5 years ago

Thanks Peter, and it’s a good point you make that data should be stored in an appropriate database. For imaging data, OMERO and IDR are both excellent options. With MALDI data, there is the additional complexity of displaying mass-to-charge (m/z) values and so I think we would also have to consider a means of delivering this. Mass spectrum profiles for each pixel are probably better displayed as a series of vertical bar graphs – one for each pixel - and the underlying mass spec data can be archived as a measurements table in HDF format or similar.

From a visualisation perspective, I could envisage a scenario where a user selects a molecular mass on a bar graph, and this is used to deliver a colormap visualisation or similar that allows the end user to interactively explore the spatial complexity of a given molecular mass. In this hypothetical viewer, the colormap would show a pixelated 2D image with hotspot regions where a single metabolite or protein is enriched. One could then zoom in on those pixels of interest, select an individual pixel, and explore the spectra of that pixel using the bar graph.
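To make the hypothetical viewer a little more concrete, here is a minimal Python sketch of the data handling behind it. It assumes the mass spec data are held as a cube in which `cube[y][x]` is the list of intensities for one pixel, aligned with a shared `mz_axis`. The names `ion_image` and `normalise`, the tolerance `tol`, and the toy values are all illustrative assumptions and not part of OMERO.

```python
def ion_image(cube, mz_axis, target_mz, tol=0.5):
    """Sum the intensities within target_mz +/- tol for every pixel.

    cube[y][x] is a list of intensities aligned with mz_axis.
    """
    # Indices of the m/z bins that fall inside the selected window
    idx = [i for i, mz in enumerate(mz_axis) if abs(mz - target_mz) <= tol]
    return [[sum(spectrum[i] for i in idx) for spectrum in row] for row in cube]


def normalise(image):
    """Scale an ion image to the range 0..1 so it can drive a colormap."""
    peak = max(max(row) for row in image)
    return [[v / peak if peak else 0.0 for v in row] for row in image]


# Toy 2x2 pixel dataset with four m/z bins per pixel
mz_axis = [100.0, 150.0, 150.4, 200.0]
cube = [
    [[1, 0, 0, 5], [0, 2, 2, 0]],
    [[3, 8, 0, 1], [0, 0, 0, 9]],
]

img = ion_image(cube, mz_axis, target_mz=150.2)  # 2D map for the chosen mass
heat = normalise(img)                            # hotspot is the pixel at (1, 0)
spectrum = cube[1][0]                            # bar graph data for that pixel
```

Selecting a mass on the bar graph would correspond to calling `ion_image` with a new `target_mz`, and clicking a pixel would correspond to looking up `cube[y][x]`.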

However, this interactive functionality of using a vertical bar graph to drive the visualisation is not built into OMERO, and so it would require a lot of developer effort to customise OMERO in this way!

With the developer effort that we have available, I think it might be more straightforward to focus on the replication of image processing.

pli888 commented 5 years ago

@ChrisArmit These are all good ideas for implementation but we need to bear in mind that the proposal itself doesn't require such detail - IIRC, I already sent you a copy of the previous China Partnering Award proposal. We need to explain the reasoning why we want to bring the different groups together, describe what we will do from a much higher level point of view and tell BBSRC how it relates to their own objectives.

We will need to involve people who have data for storage in OMERO and MetaboLights. We can invite Nancy Ip's group at HKUST as a HK participant. Maybe invite the authors of the benchmark dataset. I have been thinking about Doug Kell and Roy Goodacre at Liverpool University who do Raman spectroscopy imaging of cells.

ChrisArmit commented 5 years ago

Here is a first draft with the rationale and main scientific objectives.

Reproducibility of Open Source Image Analysis Platforms

Rationale

A major challenge in Omics is in the integration and interpretation of diverse biological datasets that include genomics, proteomics, metabolomics, and imaging data. Towards this end, multi-omics databases and biobanks are emerging that can archive these diverse data types. It follows that in this era of Big Data, computational tools are necessary for the high-throughput analysis of these multi-omics datasets, and that there is a clear need for open source software to ensure data provenance and reproducibility of all data processing steps outlined in an analysis. However, a question remains as to what degree computational analysis is reproducible on different platforms.

One means of addressing this issue is to explore whether different open source, analytic platforms that are currently used for data intensive biomedical research do actually deliver comparable results. We propose to test this in the context of high-throughput image processing of large-volume image datasets archived in the GigaScience DataBase (GigaDB). All data files and scripts archived in GigaDB are open access and CC0, and it is an additional de facto requirement of GigaScience that an Open Source Initiative approved license is ascribed to all software that is archived in the GigaScience DataBase. Consequently, this is an ideal resource for obtaining open source data that can be used to test image analysis platforms. Furthermore, the peer-review process has ensured that the various scripts that were used in the original analysis are included in the published dataset. Use of these scripts allows us to generate a ‘gold standard’ computational analysis to which we can compare analysis on different platforms.

Main Scientific Objectives

We wish to compare analyses performed on the following open source image analysis platforms to ensure that image processing is reproducible:

ImageJ/Fiji – ImageJ is an NIH-funded Java-based image processing program and Fiji is an open source image-processing package based on ImageJ

OMERO.mtools - OMERO is an open source microscopy image archive that ensures compatibility with a wide range of proprietary image formats (Bio-Formats). OMERO.mtools is a suite of MATLAB-based tools that allow image analysis of OMERO datasets.

Galaxy - Galaxy is an open source, web-based platform for data intensive biomedical research. Galaxy workflows have been used for reproducible image analysis (Figure 1), and GigaScience has previously used the Galaxy Toolshed to ensure that metabolomics analyses that are archived in GigaDB and MetaboLights are reproducible.

Image-processed data will be compared against gold standard datasets published in GigaScience and archived in GigaDB.
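One way such a comparison could be scored, sketched in Python. The draft does not name a metric, so the Dice coefficient used here is an assumption chosen for illustration, and it presumes that each platform outputs a binary segmentation mask of the same size as the gold standard. The function name `dice` and the toy masks are hypothetical.

```python
def dice(mask_a, mask_b):
    """Dice coefficient between two binary masks given as 2D lists.

    Returns 1.0 for identical masks and 0.0 for masks with no overlap.
    """
    a = [bool(v) for row in mask_a for v in row]
    b = [bool(v) for row in mask_b for v in row]
    if len(a) != len(b):
        raise ValueError("masks must have the same number of pixels")
    overlap = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    # Two empty masks agree perfectly, so avoid dividing by zero
    return 1.0 if total == 0 else 2.0 * overlap / total


# Toy masks: the gold standard and the output of one platform under test
gold = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 0],
]
platform_output = [
    [0, 1, 1],
    [0, 1, 0],
    [0, 0, 0],
]
score = dice(gold, platform_output)
```

For analyses that produce measurement tables rather than masks, a tolerance-based comparison of the numeric values would be the analogous check.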

It is further noteworthy that there have been attempts at integrating Galaxy with OMERO, with OMERO.biobank using a combination of OMERO and Galaxy to deliver reproducible pipelines for sequence data (https://www-legacy.openmicroscopy.org/site/products/partner/omero.biobank). Consequently, we additionally wish to explore whether Galaxy-based image analysis of OMERO datasets is a plausible web-based alternative to using OMERO.mtools, which requires software download and installation.

ChrisArmit commented 5 years ago


Figure 1 – An example CSIRO-funded Galaxy workflow demonstrates how Galaxy has been used for reproducible image analysis. Image sourced from: https://eresearchau.files.wordpress.com/2013/02/04_cellular_imaging_tools.pdf

ScottBGI commented 5 years ago

We published another imaging-related Galaxy project:

Fernandez-Gutierrez MM, van Zessen DBH, van Baarlen P, Kleerebezem M, Stubbs AP. KREAP: an automated Galaxy platform to quantify in vitro re-epithelialization kinetics. Gigascience. 2018 Jul 1;7(7). doi: 10.1093/gigascience/giy078.

ChrisArmit commented 5 years ago

Thanks Scott! I've added the corresponding author for this paper (Andrew Stubbs) as a potential scientist involved in the project. Please feel free to amend the following list.

Scientists involved

Susanna-Assunta Sansone (Oxford) Why – Interested in interoperable and scalable data analysis applications in imaging (check)

Jason Swedlow (Dundee) Why – OMERO and OMERO.mtools

Josh Moore Why - Senior Software Architect for OMERO

Gianluigi Zanetti (Sardinia) Why - OMERO.biobank

Geert Litjens (Radboud University) Why - Potential use case for OMERO https://doi.org/10.1093/gigascience/giy065

Andrew Stubbs (Rotterdam) Why - Potential use case for OMERO and GigaGalaxy https://doi.org/10.1093/gigascience/giy078

Nancy Yuk-Yu Ip (HKUST) Why – Interested in high-throughput image analysis in neuroscience (check)

Peter Li, Scott Edmunds, Chris Armit GigaScience

ChrisArmit commented 5 years ago

The BBSRC responsive mode priority that is most relevant to this project is as follows:

Data driven biology, which has the following aim: “The data driven biology priority aims to encourage the development of the bioinformatics tools and computational approaches that are required to extract value and generate new biological understanding from the huge volume and diversity of bioscience data now available and so underpin and enable biological research as it continues to evolve as a data intensive discipline.” https://bbsrc.ukri.org/funding/grants/priorities/data-driven-biology/

ChrisArmit commented 5 years ago

The international partnership that I have outlined here involves the UK, China / Hong Kong, and the Netherlands. Have you explored a 3-way partnership like this before? The following BBSRC guidelines are at least suggestive that there may be scope for this.

Provide the UK participation in joint research activity with partners outside the UK

“BBSRC grants may cover the UK costs of joint research in larger projects involving more than one country. Please contact the relevant Committee contact before submission if there are parallel applications being submitted in another country, we can usually (with sufficient advance information) liaise with partner funding organisations to ensure complementarity in review mechanisms (e.g. shared referees) and timescales. We see benefit in how such research levers resource in other countries, reduces fragmentation of research effort and draws from a larger skills and resource base.” https://bbsrc.ukri.org/funding/grants/priorities/international-partnerships/

ChrisArmit commented 5 years ago

I've created a Google Doc that is easier to read.

https://docs.google.com/document/d/1QOuqN-ZyHbgFyu2cv0OqmfsshhmhHBSgYwTFwkdJ5TM/edit?usp=sharing