Abstract
Concept Bottleneck Models (CBMs) are considered inherently interpretable because they first predict a set of human-defined concepts before using these concepts to predict the output of a downstream task. For inherent interpretability to be fully realised, and ensure trust in a model's output, we need to guarantee concepts are predicted based on semantically mapped input features. For example, one might expect the pixels representing a broken bone in an image to be used for the prediction of a fracture. However, current literature indicates this is not the case, as concept predictions are often mapped to irrelevant input features. We hypothesise that this occurs when concept annotations are inaccurate or how input features should relate to concepts is unclear. In general, the effect of dataset labelling on concept representations in CBMs remains an understudied area. Therefore, in this paper, we examine how CBMs learn concepts from datasets with fine-grained concept annotations. We demonstrate that CBMs can learn concept representations with semantic mapping to input features by removing problematic concept correlations, such as two concepts always appearing together. To support our evaluation, we introduce a new synthetic image dataset based on a playing cards domain, which we hope will serve as a benchmark for future CBM research. For validation, we provide empirical evidence on a real-world dataset of chest X-rays, to demonstrate semantically meaningful concepts can be learned in real-world applications.
Keyword: x-ray
VISION-MAE: A Foundation Model for Medical Image Segmentation and Classification
Authors: Zelong Liu, Andrew Tieu, Nikhil Patel, Alexander Zhou, George Soultanidis, Zahi A. Fayad, Timothy Deyer, Xueyan Mei
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Artificial Intelligence (AI) has the potential to revolutionize diagnosis and segmentation in medical imaging. However, development and clinical implementation face multiple challenges including limited data availability, lack of generalizability, and the necessity to incorporate multi-modal data effectively. A foundation model, which is a large-scale pre-trained AI model, offers a versatile base that can be adapted to a variety of specific tasks and contexts. Here, we present a novel foundation model, VISION-MAE, specifically designed for medical imaging. Specifically, VISION-MAE is trained on a dataset of 2.5 million unlabeled images from various modalities (CT, MR, PET, X-rays, and ultrasound), using self-supervised learning techniques. It is then adapted to classification and segmentation tasks using explicit labels. VISION-MAE has high label efficiency, outperforming several benchmark models in both in-domain and out-of-domain applications, and achieves high performance even with reduced availability of labeled data. This model represents a significant advancement in medical imaging AI, offering a generalizable and robust solution for improving segmentation and classification tasks while reducing the data annotation workload.
Unconditional Latent Diffusion Models Memorize Patient Imaging Data
Authors: Salman Ul Hassan Dar, Marvin Seyfarth, Jannik Kahmann, Isabelle Ayx, Theano Papavassiliu, Stefan O. Schoenberg, Sandy Engelhardt
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Generative latent diffusion models hold a wide range of applications in the medical imaging domain. A noteworthy application is privacy-preserved open-data sharing by proposing synthetic data as surrogates of real patient data. Despite the promise, these models are susceptible to patient data memorization, where models generate patient data copies instead of novel synthetic samples. This undermines the whole purpose of preserving patient data and may even result in patient re-identification. Considering the importance of the problem, surprisingly it has received relatively little attention in the medical imaging community. To this end, we assess memorization in latent diffusion models for medical image synthesis. We train 2D and 3D latent diffusion models on CT, MR, and X-ray datasets for synthetic data generation. Afterwards, we examine the amount of training data memorized utilizing self-supervised models and further investigate various factors that can possibly lead to memorization by training models in different settings. We observe a surprisingly large amount of data memorization among all datasets, with up to 41.7%, 19.6%, and 32.6% of the training data memorized in CT, MRI, and X-ray datasets respectively. Further analyses reveal that increasing training data size and using data augmentation reduce memorization, while over-training enhances it. Overall, our results suggest a call for memorization-informed evaluation of synthetic data prior to open-data sharing.
Keyword: clinical
Deep Multimodal Fusion of Data with Heterogeneous Dimensionality via Projective Networks
Authors: José Morano, Guilherme Aresta, Christoph Grechenig, Ursula Schmidt-Erfurth, Hrvoje Bogunović
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
The use of multimodal imaging has led to significant improvements in the diagnosis and treatment of many diseases. Similar to clinical practice, some works have demonstrated the benefits of multimodal fusion for automatic segmentation and classification using deep learning-based methods. However, current segmentation methods are limited to fusion of modalities with the same dimensionality (e.g., 3D+3D, 2D+2D), which is not always possible, and the fusion strategies implemented by classification methods are incompatible with localization tasks. In this work, we propose a novel deep learning-based framework for the fusion of multimodal data with heterogeneous dimensionality (e.g., 3D+2D) that is compatible with localization tasks. The proposed framework extracts the features of the different modalities and projects them into the common feature subspace. The projected features are then fused and further processed to obtain the final prediction. The framework was validated on the following tasks: segmentation of geographic atrophy (GA), a late-stage manifestation of age-related macular degeneration, and segmentation of retinal blood vessels (RBV) in multimodal retinal imaging. Our results show that the proposed method outperforms the state-of-the-art monomodal methods on GA and RBV segmentation by up to 3.10% and 4.64% Dice, respectively.
3D Vertebrae Measurements: Assessing Vertebral Dimensions in Human Spine Mesh Models Using Local Anatomical Vertebral Axes
Authors: Ivanna Kramer, Vinzent Rittel, Lara Blomenkamp, Sabine Bauer, Dietrich Paulus
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Vertebral morphological measurements are important across various disciplines, including spinal biomechanics and clinical applications, pre- and post-operatively. These measurements also play a crucial role in anthropological longitudinal studies, where spinal metrics are repeatedly documented over extended periods. Traditionally, such measurements have been manually conducted, a process that is time-consuming. In this study, we introduce a novel, fully automated method for measuring vertebral morphology using 3D meshes of lumbar and thoracic spine models.Our experimental results demonstrate the method's capability to accurately measure low-resolution patient-specific vertebral meshes with mean absolute error (MAE) of 1.09 mm and those derived from artificially created lumbar spines, where the average MAE value was 0.7 mm. Our qualitative analysis indicates that measurements obtained using our method on 3D spine models can be accurately reprojected back onto the original medical images if these images are available.
MRAnnotator: A Multi-Anatomy Deep Learning Model for MRI Segmentation
Authors: Alexander Zhou, Zelong Liu, Andrew Tieu, Nikhil Patel, Sean Sun, Anthony Yang, Peter Choi, Valentin Fauveau, George Soultanidis, Mingqian Huang, Amish Doshi, Zahi A. Fayad, Timothy Deyer, Xueyan Mei
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Purpose To develop a deep learning model for multi-anatomy and many-class segmentation of diverse anatomic structures on MRI imaging. Materials and Methods In this retrospective study, two datasets were curated and annotated for model development and evaluation. An internal dataset of 1022 MRI sequences from various clinical sites within a health system and an external dataset of 264 MRI sequences from an independent imaging center were collected. In both datasets, 49 anatomic structures were annotated as the ground truth. The internal dataset was divided into training, validation, and test sets and used to train and evaluate an nnU-Net model. The external dataset was used to evaluate nnU-Net model generalizability and performance in all classes on independent imaging data. Dice scores were calculated to evaluate model segmentation performance. Results The model achieved an average Dice score of 0.801 on the internal test set, and an average score of 0.814 on the complete external dataset across 49 classes. Conclusion The developed model achieves robust and generalizable segmentation of 49 anatomic structures on MRI imaging. A future direction is focused on the incorporation of additional anatomic regions and structures into the datasets and model.
Assessing Patient Eligibility for Inspire Therapy through Machine Learning and Deep Learning Models
Authors: Mohsena Chowdhury, Tejas Vyas, Rahul Alapati, Andrés M Bur, Guanghui Wang
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Inspire therapy is an FDA-approved internal neurostimulation treatment for obstructive sleep apnea. However, not all patients respond to this therapy, posing a challenge even for experienced otolaryngologists to determine candidacy. This paper makes the first attempt to leverage both machine learning and deep learning techniques in discerning patient responsiveness to Inspire therapy using medical data and videos captured through Drug-Induced Sleep Endoscopy (DISE), an essential procedure for Inspire therapy. To achieve this, we gathered and annotated three datasets from 127 patients. Two of these datasets comprise endoscopic videos focused on the Base of the Tongue and Velopharynx. The third dataset composes the patient's clinical information. By utilizing these datasets, we benchmarked and compared the performance of six deep learning models and five classical machine learning algorithms. The results demonstrate the potential of employing machine learning and deep learning techniques to determine a patient's eligibility for Inspire therapy, paving the way for future advancements in this field.
Keyword: biomedical
There is no result
Keyword: radiology
There is no result
Keyword: radiography
There is no result
Keyword: medical
AmbientCycleGAN for Establishing Interpretable Stochastic Object Models Based on Mathematical Phantoms and Medical Imaging Measurements
Authors: Xichen Xu, Wentao Chen, Weimin Zhou
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Medical imaging systems that are designed for producing diagnostically informative images should be objectively assessed via task-based measures of image quality (IQ). Ideally, computation of task-based measures of IQ needs to account for all sources of randomness in the measurement data, including the variability in the ensemble of objects to be imaged. To address this need, stochastic object models (SOMs) that can generate an ensemble of synthesized objects or phantoms can be employed. Various mathematical SOMs or phantoms were developed that can interpretably synthesize objects, such as lumpy object models and parameterized torso phantoms. However, such SOMs that are purely mathematically defined may not be able to comprehensively capture realistic object variations. To establish realistic SOMs, it is desirable to use experimental data. An augmented generative adversarial network (GAN), AmbientGAN, was recently proposed for establishing SOMs from medical imaging measurements. However, it remains unclear to which extent the AmbientGAN-produced objects can be interpretably controlled. This work introduces a novel approach called AmbientCycleGAN that translates mathematical SOMs to realistic SOMs by use of noisy measurement data. Numerical studies that consider clustered lumpy background (CLB) models and real mammograms are conducted. It is demonstrated that our proposed method can stably establish SOMs based on mathematical models and noisy measurement data. Moreover, the ability of the proposed AmbientCycleGAN to interpretably control image features in the synthesized objects is investigated.
Ambient-Pix2PixGAN for Translating Medical Images from Noisy Data
Authors: Wentao Chen, Xichen Xu, Jie Luo, Weimin Zhou
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Image-to-image translation is a common task in computer vision and has been rapidly increasing the impact on the field of medical imaging. Deep learning-based methods that employ conditional generative adversarial networks (cGANs), such as Pix2PixGAN, have been extensively explored to perform image-to-image translation tasks. However, when noisy medical image data are considered, such methods cannot be directly applied to produce clean images. Recently, an augmented GAN architecture named AmbientGAN has been proposed that can be trained on noisy measurement data to synthesize high-quality clean medical images. Inspired by AmbientGAN, in this work, we propose a new cGAN architecture, Ambient-Pix2PixGAN, for performing medical image-to-image translation tasks by use of noisy measurement data. Numerical studies that consider MRI-to-PET translation are conducted. Both traditional image quality metrics and task-based image quality metrics are employed to assess the proposed Ambient-Pix2PixGAN. It is demonstrated that our proposed Ambient-Pix2PixGAN can be successfully trained on noisy measurement data to produce high-quality translated images in target imaging modality.
Keyword: chest
Can we Constrain Concept Bottleneck Models to Learn Semantically Meaningful Input Features?
Keyword: x-ray
VISION-MAE: A Foundation Model for Medical Image Segmentation and Classification
Unconditional Latent Diffusion Models Memorize Patient Imaging Data
Keyword: clinical
Deep Multimodal Fusion of Data with Heterogeneous Dimensionality via Projective Networks
3D Vertebrae Measurements: Assessing Vertebral Dimensions in Human Spine Mesh Models Using Local Anatomical Vertebral Axes
MRAnnotator: A Multi-Anatomy Deep Learning Model for MRI Segmentation
Assessing Patient Eligibility for Inspire Therapy through Machine Learning and Deep Learning Models
Keyword: biomedical
There is no result
Keyword: radiology
There is no result
Keyword: radiography
There is no result
Keyword: medical
AmbientCycleGAN for Establishing Interpretable Stochastic Object Models Based on Mathematical Phantoms and Medical Imaging Measurements
Ambient-Pix2PixGAN for Translating Medical Images from Noisy Data
Keyword: chexpert
There is no result