RoyHirsch / endossl

Code and models for the MICCAI 2023 paper "Self-Supervised Learning for Endoscopic Video Analysis".
BSD 3-Clause "New" or "Revised" License

Self-Supervised Learning for Endoscopic Video Analysis


Background

Self-supervised learning (SSL) has led to important breakthroughs in computer vision by allowing models to learn from large amounts of unlabeled data. As such, it may play a pivotal role in biomedicine, where annotating data requires highly specialized expertise.

In this work, we study the use of a leading SSL framework, Masked Siamese Networks (MSNs), for endoscopic video analysis, such as colonoscopy and laparoscopy. To fully exploit the power of SSL, we create sizable endoscopic video datasets. Our extensive experiments show that MSN training on this data leads to state-of-the-art performance on standard public endoscopic benchmarks, such as surgical phase recognition during laparoscopy and colonoscopic polyp characterization.
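For readers unfamiliar with MSN, the following is a minimal NumPy sketch of an MSN-style objective as generally described for Masked Siamese Networks: masked anchor views and an unmasked target view of the same image are softly assigned to a set of learnable prototypes, the anchor assignment is trained to match a sharper (lower-temperature) target assignment, and a mean-entropy (ME-MAX) term discourages collapse onto a few prototypes. It is illustrative only and is not this repository's training code; the function names, temperatures and ME-MAX weight are assumptions, and the encoders, patch masking and EMA target update are omitted.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row of x to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def msn_loss(anchor_emb, target_emb, prototypes,
             tau=0.1, tau_target=0.025, me_max_weight=1.0):
    """Illustrative MSN-style loss (forward pass only; hyperparameters are assumptions).

    anchor_emb: (B * M, d) embeddings of M masked anchor views per image,
                with the M views of image i stored in consecutive rows.
    target_emb: (B, d) embeddings of the unmasked target view (in MSN these
                come from an EMA target encoder and receive no gradient).
    prototypes: (K, d) learnable prototype vectors.
    """
    z_anchor = l2_normalize(anchor_emb)
    z_target = l2_normalize(target_emb)
    protos = l2_normalize(prototypes)

    # Soft assignment of each view to the K prototypes (cosine similarity / temperature).
    p_anchor = softmax(z_anchor @ protos.T / tau)
    # A lower temperature gives a sharper target distribution.
    p_target = softmax(z_target @ protos.T / tau_target)

    # Pair every anchor view with the target of the image it came from.
    views_per_image = z_anchor.shape[0] // z_target.shape[0]
    p_target = np.repeat(p_target, views_per_image, axis=0)

    # Cross-entropy H(p_target, p_anchor), averaged over anchor views.
    cross_entropy = -np.sum(p_target * np.log(p_anchor + 1e-12), axis=-1).mean()

    # ME-MAX: entropy of the mean anchor prediction, subtracted so that it is maximized.
    mean_p = p_anchor.mean(axis=0)
    mean_entropy = -np.sum(mean_p * np.log(mean_p + 1e-12))

    return cross_entropy - me_max_weight * mean_entropy


# Toy usage: 4 images, 2 masked views each, 16-dim embeddings, 10 prototypes.
rng = np.random.default_rng(0)
loss = msn_loss(anchor_emb=rng.normal(size=(8, 16)),
                target_emb=rng.normal(size=(4, 16)),
                prototypes=rng.normal(size=(10, 16)))
```

Because the cross-entropy is non-negative and the entropy of a distribution over K prototypes is at most log K, this sketch's loss is bounded below by `-me_max_weight * log(K)`.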

Furthermore, we show that 50% of the annotated data is sufficient to match the performance obtained when training on the entire labeled dataset. Our work provides evidence that SSL can dramatically reduce the need for annotated data in endoscopy.


Pre-trained models

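We release a series of models pre-trained with our method on a large corpus of endoscopic videos (the pre-training datasets are private). Down-stream results are reported as F1 on Cholec80 for the laparoscopy models and as accuracy on PolypSet for the colonoscopy models: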

Arch  | Pre-training dataset | Down-stream benchmark | Result    | Weights
ViT-S | Private Laparoscopy  | Cholec80              | F1: 83.4  | Link
ViT-B | Private Laparoscopy  | Cholec80              | F1: 82.6  | Link
ViT-L | Private Laparoscopy  | Cholec80              | F1: 84.0  | Link
ViT-S | Private Colonoscopy  | PolypSet              | Acc: 78.5 | Link
ViT-B | Private Colonoscopy  | PolypSet              | Acc: 78.2 | Link
ViT-L | Private Colonoscopy  | PolypSet              | Acc: 80.4 | Link
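The laparoscopy models are scored with F1 on Cholec80 phase recognition. One common way to compute such a score is macro-averaged F1 over the phase classes, sketched below in plain Python. Whether the reported numbers pool frames across videos or average per-video scores is not stated here, so the averaging choice and the function name `macro_f1` are assumptions.

```python
from collections import Counter


def macro_f1(y_true, y_pred, num_classes):
    """Macro-averaged F1: per-class F1 = 2TP / (2TP + FP + FN), then an unweighted mean.

    A class that never appears in y_true or y_pred contributes an F1 of 0.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    f1_scores = []
    for c in range(num_classes):
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1_scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(f1_scores) / num_classes


# Toy example with two phases: class 0 has F1 = 2/3, class 1 has F1 = 0.8.
score = macro_f1(y_true=[0, 0, 1, 1], y_pred=[0, 1, 1, 1], num_classes=2)
```

Macro averaging weights every phase equally, so short phases count as much as long ones.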

Repository

Environment

You can use the requirements.txt file to reproduce our development environment:

conda create --name <env_name> --file ./requirements.txt

Data

We publish the data modules for the Cholec80 experiments, which can easily be adapted to the paper's other experiments. Our data pipeline is heavily adapted from TF-Cholec80.

Run prepare.py to download and extract the public Cholec80 dataset:

python prepare.py --data_rootdir YOUR_LOCATION

The ./data/cholec80_images.py module contains classes for loading the pre-processed datasets into a TensorFlow (tf.data) dataset object.
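As a sketch of the kind of label preprocessing involved, the plain-Python function below maps Cholec80 phase names to integer labels and subsamples the annotations to 1 fps. It assumes that the public release ships per-video phase annotation files with a `Frame<TAB>Phase` header, frame indices at 25 fps and the seven phase names listed below. It is not the API of ./data/cholec80_images.py; the function, the constants and the file format are assumptions to check against the downloaded data.

```python
import csv
import io

# Assumed phase vocabulary of the public Cholec80 phase annotations (7 phases).
CHOLEC80_PHASES = [
    "Preparation",
    "CalotTriangleDissection",
    "ClippingCutting",
    "GallbladderDissection",
    "GallbladderPackaging",
    "CleaningCoagulation",
    "GallbladderRetraction",
]
PHASE_TO_ID = {name: idx for idx, name in enumerate(CHOLEC80_PHASES)}


def parse_phase_annotations(text, source_fps=25, target_fps=1):
    """Parse tab-separated 'Frame<TAB>Phase' annotations (hypothetical helper).

    Returns (frame_index, phase_id) pairs, keeping one frame every
    source_fps // target_fps frames.
    """
    step = source_fps // target_fps
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    next(rows)  # skip the assumed 'Frame<TAB>Phase' header row
    labels = []
    for row in rows:
        if not row:  # tolerate blank lines
            continue
        frame, phase = int(row[0]), row[1].strip()
        if frame % step == 0:  # keep one frame per target-rate interval
            labels.append((frame, PHASE_TO_ID[phase]))
    return labels


# Example on an in-memory snippet; for a real file, pass open(path).read().
sample = "Frame\tPhase\n0\tPreparation\n1\tPreparation\n25\tPreparation\n50\tCalotTriangleDissection\n"
labels = parse_phase_annotations(sample)
```

Subsampling to 1 fps by frame index is a common choice for Cholec80 phase recognition, but the frame rate actually used by this repository's pipeline should be checked in ./data/.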

Down-stream experiments

./down_stream/main.py is the entry point for the down-stream experiments, in which a pre-trained model is fine-tuned for surgical phase classification.

Inference

The ./inference.py script loads a pre-trained model and extracts representations from it.
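Extracted representations are often evaluated with a linear probe, i.e. a softmax classifier trained on frozen features. The NumPy sketch below shows that idea on synthetic features standing in for model outputs. It is a lighter alternative to the full fine-tuning run by ./down_stream/main.py, not a description of it; the function names, learning rate and step count are illustrative assumptions.

```python
import numpy as np


def train_linear_probe(features, labels, num_classes, lr=0.1, steps=300):
    """Fit a softmax (multinomial logistic regression) classifier on frozen
    features with full-batch gradient descent. Returns weights (d, C) and bias (C,)."""
    n, d = features.shape
    weights = np.zeros((d, num_classes))
    bias = np.zeros(num_classes)
    one_hot = np.eye(num_classes)[labels]
    for _ in range(steps):
        logits = features @ weights + bias
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad_logits = (probs - one_hot) / n  # gradient of mean cross-entropy w.r.t. logits
        weights -= lr * features.T @ grad_logits
        bias -= lr * grad_logits.sum(axis=0)
    return weights, bias


def predict(features, weights, bias):
    """Return the highest-scoring class for each feature vector."""
    return np.argmax(features @ weights + bias, axis=1)


# Synthetic stand-in for extracted representations: three well-separated clusters.
rng = np.random.default_rng(0)
centers = np.array([[3.0, 0.0], [0.0, 3.0], [-3.0, -3.0]])
labels = rng.integers(0, 3, size=300)
features = centers[labels] + 0.5 * rng.normal(size=(300, 2))

weights, bias = train_linear_probe(features, labels, num_classes=3)
accuracy = (predict(features, weights, bias) == labels).mean()
```

Keeping the backbone frozen makes a linear probe a quick check of representation quality; full fine-tuning usually performs better but is far more expensive.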

Citation

Please cite:

@misc{hirsch2023selfsupervised,
      title={Self-Supervised Learning for Endoscopic Video Analysis}, 
      author={Roy Hirsch and Mathilde Caron and Regev Cohen and Amir Livne and Ron Shapiro and Tomer Golany and Roman Goldenberg and Daniel Freedman and Ehud Rivlin},
      year={2023},
      eprint={2308.12394},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

This work is licensed under the BSD 3-Clause license; see the LICENSE file.