CAMMA-public / rendezvous

A transformer-inspired neural network for surgical action triplet recognition from laparoscopic videos.



Rendezvous: Attention Mechanisms for the Recognition of Surgical Action Triplets in Endoscopic Videos

C.I. Nwoye, T. Yu, C. Gonzalez, B. Seeliger, P. Mascagni, D. Mutter, J. Marescaux, and N. Padoy

This repository contains the implementation code, inference demo, and evaluation scripts.

Abstract

Of all existing frameworks for surgical workflow analysis in endoscopic videos, action triplet recognition stands out as the only one aiming to provide truly fine-grained and comprehensive information on surgical activities. This information, presented as <instrument, verb, target> combinations, is highly challenging to identify accurately. Triplet components can be difficult to recognize individually; the task requires not only recognizing all three triplet components simultaneously, but also correctly establishing the data association between them.

To achieve this task, we introduce our new model, the Rendezvous (RDV), which recognizes triplets directly from surgical videos by leveraging attention at two different levels. We first introduce a new form of spatial attention to capture individual action triplet components in a scene, called Class Activation Guided Attention Mechanism (CAGAM). This technique focuses on the recognition of verbs and targets using activations resulting from instruments. To solve the association problem, our RDV model adds a new form of semantic attention inspired by Transformer networks: Multi-Head of Mixed Attention (MHMA). This technique uses several cross- and self-attentions to effectively capture relationships between instruments, verbs, and targets.
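To illustrate the difference between self-attention and cross-attention mentioned above, here is a minimal sketch of generic scaled dot-product attention as used in Transformer networks. It is not the RDV implementation: the function names and the toy feature vectors are hypothetical, and the real model operates on learned feature maps with multiple heads.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Generic scaled dot-product attention over lists of vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs


# Hypothetical toy features (illustration only, not model outputs).
instrument_feats = [[1.0, 0.0], [0.0, 1.0]]
verb_feats = [[1.0, 1.0]]

# Self-attention: queries, keys, and values all come from the same source.
self_out = attention(instrument_feats, instrument_feats, instrument_feats)

# Cross-attention: verb queries attend over instrument keys and values.
cross_out = attention(verb_feats, instrument_feats, instrument_feats)
print(cross_out)
```

In this sketch the only difference between the two modes is where the queries come from; a mixed-attention design combines both kinds of calls.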

We also introduce CholecT50 - a dataset of 50 endoscopic videos in which every frame has been annotated with labels from 100 triplet classes. Our proposed RDV model significantly improves the triplet prediction mAP by over 9% compared to the state-of-the-art methods on this dataset.


News and Updates


Model Overview

The RDV model is composed of:

We hope this repo will help researchers and engineers develop surgical action recognition systems. For algorithm development, we provide training data, baseline models, and evaluation methods to create a level playing field. For application usage, we also provide a small video demo that takes raw videos as input without any bells and whistles.


Performance

Results Table

Components AP           Association AP
AP_I   AP_V   AP_T      AP_IV  AP_IT  AP_IVT
92.0   60.7   38.3      39.4   36.9   29.9


Video Demo

Available on Youtube.


Installation

Requirements

The model depends on the following libraries:

  1. sklearn
  2. PIL
  3. Python >= 3.5
  4. ivtmetrics
  5. Developer's framework:
    1. For TensorFlow version 1:
      • TF >= 1.10
    2. For TensorFlow version 2:
      • TF >= 2.1
    3. For PyTorch version:
      • PyTorch >= 1.10.1
      • TorchVision >= 0.11
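Before running the model, it can help to confirm which of these dependencies are importable in the current environment. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and the import names assumed for the frameworks (`torch`, `torchvision`, `tensorflow`) are the usual ones for those packages.

```python
import importlib.util
import sys

# Import names of the libraries listed above.
REQUIRED = ["sklearn", "PIL", "ivtmetrics"]
# Only the framework you intend to use needs to be present.
FRAMEWORKS = ["torch", "torchvision", "tensorflow"]


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def report(modules):
    """Map each module name to whether it is importable."""
    return {name: is_installed(name) for name in modules}


if __name__ == "__main__":
    print("Python >= 3.5: {}".format(sys.version_info >= (3, 5)))
    for name, found in report(REQUIRED + FRAMEWORKS).items():
        print("{:<12} {}".format(name, "found" if found else "missing"))
```

This only checks presence, not versions; compare the installed versions against the minimums listed above.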


System Requirements:

The code has been tested on the Linux operating system. It runs on both CPU and GPU. Equivalents of basic OS commands such as unzip, cd, wget, etc. will be needed to run it on Windows or macOS.


Quick Start


Docker Example

coming soon . . .


Dataset Zoo


Data Preparation


Evaluation Metrics

The ivtmetrics library computes AP for triplet recognition. It also supports evaluating the recognition of the individual triplet components.

pip install ivtmetrics

or

conda install -c nwoye ivtmetrics

A usage guide is available on pypi.org.
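To make the metric concrete, here is a minimal pure-Python sketch of per-class average precision (AP) and its mean over classes. It is not the ivtmetrics implementation and should not be used for reported numbers. The toy labels and scores are hypothetical, ties between scores are not treated specially, and skipping classes with no positive sample is a choice made for this sketch.

```python
import math


def average_precision(labels, scores):
    """AP for one class: mean of precision@k at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    # AP is undefined when the class never occurs.
    return precision_sum / hits if hits else float("nan")


def mean_average_precision(label_matrix, score_matrix):
    """Rows are frames, columns are classes; undefined classes are skipped."""
    aps = []
    for c in range(len(label_matrix[0])):
        labels = [row[c] for row in label_matrix]
        scores = [row[c] for row in score_matrix]
        ap = average_precision(labels, scores)
        if not math.isnan(ap):
            aps.append(ap)
    return sum(aps) / len(aps) if aps else float("nan")


# Hypothetical toy data: 4 frames, 3 classes (the third never occurs).
toy_labels = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
toy_scores = [[0.9, 0.2, 0.1], [0.8, 0.7, 0.3], [0.4, 0.1, 0.2], [0.1, 0.6, 0.5]]

toy_map = mean_average_precision(toy_labels, toy_scores)
print(round(toy_map, 4))
```

The same computation applies whether the columns are triplet classes or individual component classes; only the label space changes.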


Running the Model

The code can be run in training mode (-t), testing mode (-e), or both (-t -e) if you want to evaluate at the end of training:


Training on CholecT45/CholecT50 Dataset

Simple training on CholecT50 dataset:

python run.py -t  --data_dir="/path/to/dataset" --dataset_variant=cholect50 --version=1

You can include more details such as epoch, batch size, cross-validation and evaluation fold, weight initialization, learning rates for all subtasks, etc.:

python3 run.py -t -e  --data_dir="/path/to/dataset" --dataset_variant=cholect45-crossval --kfold=1 --epochs=180 --batch=64 --version=2 -l 1e-2 1e-3 1e-4 --pretrain_dir='path/to/imagenet/weights'

All the flags can be seen in the run.py file. The experimental setup of the published model is described in the paper.
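When running cross-validation, the training command has to be repeated once per fold. The following is a minimal sketch of a hypothetical helper that builds those commands for folds 1 to 5 of the cholect45-crossval variant. It uses only flags shown in the examples above; the helper name and default values are assumptions for illustration, and run.py remains the authoritative list of flags.

```python
import shlex


def build_command(data_dir, fold, epochs=180, batch=64, version=2):
    """Build the argument list for training and evaluating one fold."""
    return [
        "python3", "run.py", "-t", "-e",
        "--data_dir={}".format(data_dir),
        "--dataset_variant=cholect45-crossval",
        "--kfold={}".format(fold),
        "--epochs={}".format(epochs),
        "--batch={}".format(batch),
        "--version={}".format(version),
    ]


# One command per cross-validation fold (k1 to k5).
commands = [build_command("/path/to/dataset", fold) for fold in range(1, 6)]

for cmd in commands:
    # Print a shell-safe version of each command for inspection.
    print(" ".join(shlex.quote(part) for part in cmd))
    # To launch it instead, one option is: subprocess.run(cmd, check=True)
```

Printing the commands first makes it easy to check the paths and fold numbers before starting long training runs.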


Testing

python3 run.py -e --data_dir="/path/to/dataset" --dataset_variant=cholect45-crossval --kfold 3 --batch 32 --version=1 --test_ckpt="/path/to/model-k3/weights"


Training on Custom Dataset

Adding a custom dataset is quite simple. What you need to do is:


Model Zoo

PyTorch

Network     Base       Resolution  Dataset    Data split   Model Weights
Rendezvous  ResNet-18  Low         CholecT50  RDV          Download
Rendezvous  ResNet-18  High        CholecT50  RDV          Download
Rendezvous  ResNet-18  Low         CholecT50  Challenge    Download
Rendezvous  ResNet-18  Low         CholecT50  crossval k1  Download
Rendezvous  ResNet-18  Low         CholecT50  crossval k2  Download
Rendezvous  ResNet-18  Low         CholecT50  crossval k3  Download
Rendezvous  ResNet-18  Low         CholecT50  crossval k4  Download
Rendezvous  ResNet-18  Low         CholecT50  crossval k5  Download
Rendezvous  ResNet-18  Low         CholecT45  crossval k1  Download
Rendezvous  ResNet-18  Low         CholecT45  crossval k2  Download
Rendezvous  ResNet-18  Low         CholecT45  crossval k3  Download
Rendezvous  ResNet-18  Low         CholecT45  crossval k4  Download
Rendezvous  ResNet-18  Low         CholecT45  crossval k5  Download


TensorFlow v1

Network     Base       Resolution  Dataset    Data split   Link
Rendezvous  ResNet-18  High        CholecT50  RDV          Download
Rendezvous  ResNet-18  High        CholecT50  Challenge    Download
Rendezvous  ResNet-18  High        CholecT50  Challenge    Download


TensorFlow v2

Network     Base       Resolution  Dataset    Data split   Link
Rendezvous  ResNet-18  High        CholecT50  RDV          Download
Rendezvous  ResNet-18  Low         CholecT50  RDV          Download


Baseline Models

TensorFlow v1

Model       Layer Size  Ablation Component                 AP_IVT  Link
Rendezvous  1           Proposed                           24.6    Download
Rendezvous  2           Proposed                           27.0    Download
Rendezvous  4           Proposed                           27.3    Download
Rendezvous  8           Proposed                           29.9    Download
Rendezvous  8           Patch sequence                     24.1    Download
Rendezvous  8           Temporal sequence                  --.--   Download
Rendezvous  8           Single Self Attention Head         18.8    Download
Rendezvous  8           Multiple Self Attention Head       26.1    Download
Rendezvous  8           CholecTriplet2021 Challenge Model  32.7    Download

Model weights are released periodically because some training runs are still in progress.




License

This code, models, and datasets are available for non-commercial scientific research purposes under the CC BY-NC-SA 4.0 license, attached as the LICENSE file. By downloading and using this code, you agree to the terms in the LICENSE. Third-party code is subject to its respective licenses.



Acknowledgment

This work was supported by French state funds managed within the Investissements d'Avenir program by BPI France in the scope of ANR project CONDOR, ANR Labex CAMI, ANR DeepSurg, ANR IHU Strasbourg and ANR National AI Chair AI4ORSafety. We thank the research teams of IHU and IRCAD for their help in the initial annotation of the dataset during the CONDOR project.





Related Resources

  • CholecT45 / CholecT50 Datasets: Download dataset, GitHub
  • Official Dataset Splits: Official dataset split
  • Tripnet: ArXiv paper, GitHub
  • Attention Tripnet: ArXiv paper, GitHub
  • CholecTriplet2021 Challenge: Challenge website, ArXiv paper, GitHub
  • CholecTriplet2022 Challenge: Challenge website, GitHub



Citation

If you find this repo useful in your project or research, please consider citing the relevant publications:

This repo is maintained by CAMMA. Comments and suggestions on models are welcome. Check this page for updates.