alan-turing-institute / grace

Graph Representation Analysis for Connected Embeddings

GRACE - Graph Representation Analysis for Connected Embeddings 🌐 📊 🤓

project logo

This grace repository contains a Python library 🐍 for identification of patterns in imaging data. The package provides a method 🖥️ to find connected objects & regions of interest in images by constructing graph-like representations 🌐.


Science

The acronym grace stands for Graph Representation Analysis for Connected Embeddings 📈📉. This tool was developed by researchers as a scientific project at The Alan Turing Institute in the Data Science for Science programme.

As the initial use case, we (see the list of contributors below) developed grace for localising filaments in cryo-electron microscopy (cryoEM) imaging datasets. It serves as an image processing tool that automatically identifies filamentous proteins and locates the regions of interest, for example an accessory or binding protein.

Find out more details about the project aims & objectives here & here or visit the citation panel below to check out the overarching research projects.


Workflow

workflow steps

The grace workflow consists of the following steps:

  1. Image data acquisition (e.g. cryo-electron microscopy)
  2. Object detection via bounding boxes (e.g. crYOLO, RELION, or FasterRCNN)
  3. Organisation of the bounding boxes as nodes connected via edges as a 2D graph structure (e.g. Delaunay triangulation)
  4. Cropping of image patches (at various scales) from each bounding box detected in the image
  5. Latent feature extraction from image patches (e.g. pre-trained neural network, such as ResNet-152)
  6. 'Human-in-the-loop' annotation of the desired pattern in the image data (see the napari plugin below)
  7. Classification of each node and edge to obtain 'nodeness' and 'edgeness' confidence scores via deep neural network classifiers. The neural network can be applied to the full graph or to subgraphs around each node (e.g. using the immediate 1-hop neighbourhood).
  8. Combinatorial optimisation via integer linear programming (ILP) to connect the candidate object nodes via edges (see the expected outcomes below)
  9. Quantitative evaluation of the filament detection performance
  10. Ta-da! 🥳
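
Steps 3 and 7 above can be illustrated with a minimal, dependency-free sketch. This is not grace's implementation: the function names (`delaunay_edges`, `one_hop_subgraph`) and the example coordinates are hypothetical, and the brute-force empty-circumcircle test is only practical for a handful of points in general position. A real pipeline would more likely rely on a library routine such as `scipy.spatial.Delaunay`.

```python
from itertools import combinations


def circumcircle(a, b, c):
    """Return (centre_x, centre_y, radius_squared), or None if collinear."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2


def delaunay_edges(points):
    """Brute-force Delaunay: keep triangles whose circumcircle is empty."""
    edges = set()
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue  # collinear triple, no triangle
        ux, uy, r2 = circle
        empty = all(
            (px - ux) ** 2 + (py - uy) ** 2 >= r2 - 1e-9
            for n, (px, py) in enumerate(points)
            if n not in (i, j, k)
        )
        if empty:
            edges.update({(i, j), (i, k), (j, k)})
    return sorted(edges)


def one_hop_subgraph(edges, node):
    """Return the node plus its immediate neighbours and the edges among them."""
    members = {node}
    for u, v in edges:
        if u == node:
            members.add(v)
        elif v == node:
            members.add(u)
    sub_edges = [(u, v) for u, v in edges if u in members and v in members]
    return sorted(members), sub_edges


# Made-up bounding-box centres standing in for object detections.
centres = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
edges = delaunay_edges(centres)
nodes_3, edges_3 = one_hop_subgraph(edges, node=3)
```

Here each detection becomes a node index, the triangulation supplies the candidate edges, and the 1-hop subgraph is the kind of local neighbourhood a classifier could be applied to.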

Installation

grace has been tested with Python 3.8+ on OS X.

For local development, clone the repo and install in editable mode following these guidelines:

Note: Choose which conda environment you'd like to use:

Specify your preference & follow the steps below:

# clone the grace GitHub repository

git clone https://github.com/alan-turing-institute/grace.git
cd ./grace

# create a conda playground from the respective environment.yaml
conda env create -f YOUR-CHOSEN-ENVIRONMENT.yaml

# To activate this environment, use
#
#     $ conda activate grace-env-with-napari
#     OR
#     $ conda activate grace-env-napari-free
#
# To deactivate an active environment, use
#
#     $ conda deactivate

conda activate grace-env-OF-YOUR-CHOICE

# install grace from local folder (not on pypi yet)

pip install -e ".[dev]"

# install pre-commit separately

conda install -c conda-forge pre_commit

# follow the hooks from .pre-commit-config.yaml

pre-commit install

Note: when exporting your own grace conda environment, use the following:

conda env export --no-builds > new_environment.yaml

This allows environments to be shared across platforms and operating systems. For a new install with a grace version that is not on PyPI, please remove grace from the requirements under pip within the newly created yaml file.
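
If you prefer not to edit the exported file by hand, the removal can be scripted. The snippet below is only a sketch under assumptions: `drop_pip_package` is a hypothetical helper, the environment name and version numbers in the example are made up, and it expects the pip requirements to appear as an indented list under a `- pip:` entry.

```python
def drop_pip_package(yaml_text, package="grace"):
    """Remove `package` from the pip section of an exported environment file."""
    kept, pip_indent = [], None
    for line in yaml_text.splitlines():
        stripped = line.strip()
        indent = len(line) - len(line.lstrip())
        if stripped == "- pip:":
            pip_indent = indent  # entering the pip section
        elif pip_indent is not None and stripped and indent <= pip_indent:
            pip_indent = None  # left the pip section
        elif pip_indent is not None and stripped.startswith("- "):
            name = stripped[2:].split("==")[0].split(" @ ")[0].strip().lower()
            if name == package:
                continue  # skip the unwanted requirement
        kept.append(line)
    return "\n".join(kept) + "\n"


# Made-up example of an exported environment file.
exported = """name: grace-env
dependencies:
  - python=3.8
  - pip:
    - grace==0.0.1
    - numpy==1.24.0
"""
cleaned = drop_pip_package(exported)
```

Only entries inside the pip section are touched, so a conda dependency with the same name elsewhere in the file would be left alone.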


If you currently do not have any data to test / implement GRACE on, have a look at the option of simulating a synthetic dataset as described in this README. An accessible link to some pre-annotated simulated images is coming soon! 🚧

Annotator GUI

Our repository contains a graphical user interface (GUI) which allows the user to manually annotate the regions of interest (motifs) in their cryo-EM data.

To test the annotator, make sure you've installed the repository using the annotation environment, then run:

python examples/show_data.py

https://user-images.githubusercontent.com/48791041/233156173-cf2a69d3-d4be-4ba1-ae57-aebf6b9501cc.mov

Demonstration of the napari widget to annotate cryo-EM images.

The recording above 👆 shows a napari-based GUI widget for annotation of the desired motifs, in our case, filamentous proteins. Follow these steps to test the plugin out:

  1. Build the graph from all vertices (nodes, shown as white circles) using the 'build graph' function in the right-hand panel.
  2. Navigate the triangulated graph by zooming in/out or moving along the image from either the 'nodes_...' or 'edges_...' layer list.
  3. Choose the 'annotation_...' layer in the left-hand layer list and click on the 'brush' 🖌️ icon at the top of the layer control.
  4. Annotate nodes belonging to object instances by drawing over the nodes in a continuous line.
  5. Identify edges within connected objects (green 🟩 lines) versus edges outside of annotated objects (magenta 🟪 lines) by cutting the graph using the 'cut graph' function in the right-hand panel.
  6. In case of an annotation error ❌, choose the eraser icon at the top of the layer control to erase incorrect annotations. Re-cut the graph until you are happy with the overall annotation of the image.
  7. Note: Not every single node / object has to be accounted for when annotating; take it easy 😎.
  8. Once happy with the annotations, save them by exporting via the 'export...' button on the right-hand side. Conversely, you can load previously saved annotations using the 'import...' button.
  9. Ta-da! 🥳
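
Conceptually, the 'cut graph' action in step 5 labels each edge according to whether both of its endpoints sit under the same annotated object. The sketch below illustrates that idea only and is not the plugin's actual code: `cut_graph` and `node_labels` are hypothetical names, and it assumes each node carries the integer label painted over it, with 0 meaning 'not annotated'.

```python
def cut_graph(edges, node_labels):
    """Split edges into those inside an annotated object and the rest."""
    inside, outside = [], []
    for u, v in edges:
        label_u = node_labels.get(u, 0)
        label_v = node_labels.get(v, 0)
        # An edge counts as 'inside' only if both ends share a non-zero label.
        if label_u != 0 and label_u == label_v:
            inside.append((u, v))
        else:
            outside.append((u, v))
    return inside, outside


# Made-up toy graph: nodes 0-2 painted with label 1, node 4 with label 2.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
node_labels = {0: 1, 1: 1, 2: 1, 3: 0, 4: 2}  # 0 = not annotated
inside, outside = cut_graph(edges, node_labels)
```

In the GUI terms used above, `inside` would correspond to the green edges and `outside` to the magenta ones.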

Outcomes

🚧 Work in progress 🚧

The expected outcome of the grace workflow is to identify all connected objects as individual filament instances. We tested the combinatorial optimisation step on simulated data with 3 levels of 'line-seeding' densities: dense, medium and sparse.

optimising dummy graphs

As you can see, the optimiser works well to identify filamentous object instances simulated at various densities, and appears to work across object cross-overs (middle image, pink objects).
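
Once an optimiser has decided which edges to keep, reading off individual filament instances amounts to finding the connected components of the retained graph. The sketch below covers that final grouping only. It assumes the edge selection has already happened and uses a hypothetical `selected_edges` list and `filament_instances` function rather than grace's own data structures.

```python
from collections import defaultdict


def filament_instances(selected_edges):
    """Group nodes into connected components, one per candidate instance."""
    neighbours = defaultdict(set)
    for u, v in selected_edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    seen, instances = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first traversal collecting every node reachable from `start`.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        instances.append(sorted(component))
    return instances


# Made-up edges retained after optimisation.
selected_edges = [(0, 1), (1, 2), (5, 6), (8, 9), (9, 10), (6, 7)]
instances = filament_instances(selected_edges)
```

With this grouping, two filaments that cross stay separate only if no retained edge links a node of one to a node of the other.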

More details about how this type of graph representation analysis could be applied to other image data processing will become available soon - stay tuned! 😎👌


Contributors

Methodology / software development [The Alan Turing Institute]:

Dataset generation / processing [The University of Bristol]:

...and many others...

If you'd like to contribute to our ongoing work, please do not hesitate to let us know your suggestions for potential improvements by raising an issue on GitHub.


Citation

🚧 Work in progress 🚧

Project:ML_for_CryoEM

Project:Mol_Structures

We are currently writing up our methodology and key results, so please stay tuned for future updates!

In the meantime, please use the template below to cite our work:

@unpublished{grace_repository,
    year = {2023},
    month = {April},
    publisher = {{CCP-EM} Collaborative Computational Project for Electron cryo-Microscopy},
    howpublished = {Paper presented at the 2023 {CCP-EM} Spring Symposium},
    url = {https://www.ccpem.ac.uk/downloads/symposium/ccp-em_symp_schedule_2023.pdf},
    author = {Beatriz Costa-Gomes and Kristina Ulicna and Christopher Soelistyo and Marjan Famili and Alan Lowe},
    title = {Deconstructing cryoEM micrographs with a graph-based analysis for effective structure detection},
    abstract = {Reliable detection of structures is a fundamental step in analysis of cryoEM micrographs.
    Despite intense developments of computational approaches in recent years, time-consuming hand annotating
    remains inevitable and represents a rate-limiting step in the analysis of cryoEM data samples with
    heterogeneous objects. Furthermore, many of the current solutions are constrained by image characteristics:
    the large sizes of individual micrographs, the need to perform extensive re-training of the detection models
    to find objects of various categories in the same image dataset, and the presence of artefacts that might
    have similar shapes to the intended targets.
    To address these challenges, we developed GRACE (Graph Representation Analysis for Connected Embeddings),
    a computer vision-based Python package for identification of structural motifs in complex imaging data.
    GRACE sources from large images populated with low-fidelity object detections to build a graph representation
    of the entire image. This global graph is then traversed to find structured regions of interest via extracting
    latent node representations from the local image patches and connecting candidate objects in a supervised manner
    with a graph neural network.
    Using a human-in-the-loop approach, the user is encouraged to annotate the desired motifs of interest, making
    our tool agnostic to the type of object detections. The user-nominated structures are then localised and
    connected using a combinatorial optimisation step, which uses the latent embeddings to decide whether the
    graph nodes belong to an object instance.
    Importantly, GRACE reduces the search space from millions of pixels to hundreds of nodes, which allows for
    fast and efficient implementation and potential tool customisation. In addition, our method can be repurposed
    to search for different motifs of interest within the same dataset in a significantly smaller time scale to
    the currently available open-source methods. We envisage that our end-to-end approach could be extended to
    other types of imaging data where object segmentation and detection remains challenging.}
}

Happy graphing! 🎮