egeozsoy / 4D-OR

Official code of the paper 4D-OR: Semantic Scene Graphs for OR Domain Modeling accepted at MICCAI 2022. This repo includes both the dataset and our code.
MIT License
43 stars 1 forks source link
3d 4d deep-learning point-cloud scene-graph scene-graph-generation video

4D-OR: Semantic Scene Graphs for OR Domain Modeling

Official code of the paper 4D-OR: Semantic Scene Graphs for OR Domain Modeling (https://link.springer.com/chapter/10.1007/978-3-031-16449-1_45) published at MICCAI 2022.

PDF Paper: https://link.springer.com/content/pdf/10.1007/978-3-031-16449-1_45.pdf?pdf=inline%20link
PDF Book: https://link.springer.com/content/pdf/10.1007/978-3-031-16449-1.pdf?pdf=button%20sticky
Oral Presentation at MICCAI22: https://youtu.be/U_VUr-IPIPE
We publish both a new dataset, 4D-OR and our code.

4d_or

Authors: Ege Özsoy, Evin Pınar Örnek, Ulrich Eck, Tobias Czempiel, Federico Tombari, Nassir Navab

@Article{Özsoy2023,
author={{\"O}zsoy, Ege
and Czempiel, Tobias
and {\"O}rnek, Evin P{\i}nar
and Eck, Ulrich
and Tombari, Federico
and Navab, Nassir},
title={Holistic OR domain modeling: a semantic scene graph approach},
journal={International Journal of Computer Assisted Radiology and Surgery},
year={2023},
doi={10.1007/s11548-023-03022-w},
url={https://doi.org/10.1007/s11548-023-03022-w}
}
@inproceedings{Özsoy2022_4D_OR,
    title={4D-OR: Semantic Scene Graphs for OR Domain Modeling},
    author={Ege Özsoy, Evin Pınar Örnek, Ulrich Eck, Tobias Czempiel, Federico Tombari, Nassir Navab},
    booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},
    year={2022},
    organization={Springer}
}
@inproceedings{Özsoy2021_MSSG,
    title={Multimodal Semantic Scene Graphs for Holistic Modeling of Surgical Procedures},
    author={Ege Özsoy, Evin Pınar Örnek, Ulrich Eck, Federico Tombari, Nassir Navab},
    booktitle={Arxiv},
    year={2021}
}

In the extended version of our paper, which is published at IJCARS (https://link.springer.com/article/10.1007/s11548-023-03022-w) we add a secondary downstream task, which is surgery phase recognition, where we get excellent results. We update this repository to include the new surgical phase labels as well as the code to predict them.

LABRAD-OR (https://github.com/egeozsoy/LABRAD-OR) achieves significantly higher results, by using temporal information in form of memory scene graphs.

ORacle (https://github.com/egeozsoy/ORacle) proposes a new end-to-end LVLM based approach, which at the same time simplies the existing approaches, relaxes some requirements, reaches SOTA results and allows inference time adaptation without retraining by leveraging visual or textual prompts.

4D-OR Dataset

4D-OR includes a total of 6734 scenes, recorded by six calibrated RGB-D Kinect sensors 1 mounted to the ceiling of the OR, with one frame-per-second, providing synchronized RGB and depth images. We provide fused point cloud sequences of entire scenes, automatically annotated human 6D poses and 3D bounding boxes for OR objects. Furthermore, we provide SSG annotations for each step of the surgery together with the clinical roles of all the humans in the scenes, e.g., nurse, head surgeon, anesthesiologist. More details are provided in the paper.

To use it:

2D, 3D Human Pose Prediction and 3D Object Pose Prediction (External Code)

Please refer to README in external_src. The rest assumes you have completed the steps described in that README.

Create Scene Graph Prediction Environment (Can be used for the rest of the tasks)

Instance Label Computation (Project Human and Object Pose Detections to Point Cloud)

Scene Graph Prediction

pipeline example_ssg

We use https://github.com/ShunChengWu/3DSSG as a starting point for our scene graph prediction codebase.

Case Study: Role Prediction

Case Study: Surgery Phase Recognition