Abstract
Vision-language pre-training for chest X-rays has made significant strides, primarily by utilizing paired radiographs and radiology reports. However, existing approaches often face challenges in encoding medical knowledge effectively. While radiology reports provide insights into the current disease manifestation, medical definitions (as used by contemporary methods) tend to be overly abstract, creating a gap in knowledge. To address this, we propose DeViDe, a novel transformer-based method that leverages radiographic descriptions from the open web. These descriptions outline general visual characteristics of diseases in radiographs and, when combined with abstract definitions and radiology reports, provide a holistic snapshot of knowledge. DeViDe incorporates three key features for knowledge-augmented vision-language alignment. First, a large language model-based augmentation is employed to homogenise medical knowledge from diverse sources. Second, this knowledge is aligned with image information at various levels of granularity. Third, a novel projection layer is proposed to handle the complexity of aligning each image with multiple descriptions arising in a multi-label setting. In zero-shot settings, DeViDe performs comparably to fully supervised models on external datasets and achieves state-of-the-art results on three large-scale datasets. Additionally, fine-tuning DeViDe on four downstream tasks and six segmentation tasks showcases its superior performance across data from diverse distributions.
Keyword: x-ray
Linear Anchored Gaussian Mixture Model for Location and Width Computation of Objects in Thick Line Shape
Abstract
Accurate detection of the centerlines of linear objects is a challenging topic in many sensitive real-world applications such as X-ray imaging, remote sensing, and lane marking detection in road traffic. Model-based approaches using Hough and Radon transforms are often used but are not recommended for thick line detection, whereas approaches based on image derivatives need further step-by-step processing, making their efficiency dependent on the outcome of each step. In this paper, we aim to detect linear structures found in images by considering the 3D representation of the image gray levels as a finite mixture model of a statistical distribution. The latter, which we named the linear anchored Gaussian distribution, can be parametrized by a scale value σ describing the linear structure thickness and a line equation, parametrized, in turn, by a radius ρ and an orientation angle θ, describing the linear structure centerline location. The Expectation-Maximization (EM) algorithm is used for the mixture model parameter estimation, where a new paradigm, using background subtraction for the likelihood function computation, is proposed. For the EM algorithm, two θ parameter initialization schemes are used: the first is based on a random choice of the first component of the θ vector, whereas the second is based on the image Hessian with a simultaneous computation of the number of mixture model components. Experiments on real-world images and synthetic images corrupted by blur and additive noise show the good performance of the proposed methods, where the algorithm using background subtraction and Hessian-based θ initialization provides outstanding accuracy of linear structure detection despite irregular image background and the presence of blur and noise.
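The abstract does not give the exact form of the linear anchored Gaussian, so the following is only a rough sketch under stated assumptions: the centerline is taken in normal form, x·cos(θ) + y·sin(θ) = ρ, and the gray-level profile is assumed to fall off as an unnormalised Gaussian of the signed distance to that line with scale σ. The function name `linear_anchored_gaussian` and the image size are hypothetical and chosen for illustration; this is a single component, not the paper's mixture model or its EM estimation.

```python
import numpy as np


def linear_anchored_gaussian(shape, rho, theta, sigma):
    """Assumed single-component profile: an unnormalised Gaussian of the
    signed distance to the line x*cos(theta) + y*sin(theta) = rho."""
    rows, cols = shape
    # Pixel coordinate grids: y indexes rows, x indexes columns.
    y, x = np.mgrid[0:rows, 0:cols]
    # Signed distance of each pixel to the centerline (normal form of a line).
    dist = x * np.cos(theta) + y * np.sin(theta) - rho
    # sigma controls the apparent thickness of the line.
    return np.exp(-0.5 * (dist / sigma) ** 2)


# Illustrative values only: a vertical line at column 32 with scale 3 pixels.
profile = linear_anchored_gaussian((64, 64), rho=32.0, theta=0.0, sigma=3.0)
```

With θ = 0 the centerline is the column x = ρ, so the profile peaks along that column and decays symmetrically on both sides; a mixture of several such components, each with its own (ρ, θ, σ), would describe several thick lines in one image.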
Keyword: clinical
A dataset of primary nasopharyngeal carcinoma MRI with multi-modalities segmentation
Authors: Yin Li, Qi Chen, Kai Wang, Meige Li, Liping Si, Yingwei Guo, Yu Xiong, Qixing Wang, Yang Qin, Ling Xu, Patrick van der Smagt, Jun Tang, Nutan Chen
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Multi-modality magnetic resonance imaging data with various sequences facilitate the early diagnosis, tumor segmentation, and disease staging in the management of nasopharyngeal carcinoma (NPC). The lack of publicly available, comprehensive datasets limits advancements in diagnosis, treatment planning, and the development of machine learning algorithms for NPC. Addressing this critical need, we introduce the first comprehensive NPC MRI dataset, encompassing MR axial imaging of 277 primary NPC patients. This dataset includes T1-weighted, T2-weighted, and contrast-enhanced T1-weighted sequences, totaling 831 scans. In addition to the corresponding clinical data, manually annotated and labeled segmentations by experienced radiologists offer high-quality data resources from untreated primary NPC.
Keyword: biomedical
There is no result
Keyword: radiology
There is no result
Keyword: radiography
There is no result
Keyword: medical
MeshBrush: Painting the Anatomical Mesh with Neural Stylization for Endoscopy
Authors: John J. Han, Ayberk Acar, Nicholas Kavoussi, Jie Ying Wu
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Style transfer is a promising approach to close the sim-to-real gap in medical endoscopy. Rendering realistic endoscopic videos by traversing pre-operative scans (such as MRI or CT) can generate realistic simulations as well as ground truth camera poses and depth maps. Although image-to-image (I2I) translation models such as CycleGAN perform well, they are unsuitable for video-to-video synthesis due to the lack of temporal consistency, resulting in artifacts between frames. We propose MeshBrush, a neural mesh stylization method to synthesize temporally consistent videos with differentiable rendering. MeshBrush uses the underlying geometry of patient imaging data while leveraging existing I2I methods. With learned per-vertex textures, the stylized mesh guarantees consistency while producing high-fidelity outputs. We demonstrate that mesh stylization is a promising approach for creating realistic simulations for downstream tasks such as training and preoperative planning. Although our method is tested and designed for ureteroscopy, its components are transferable to general endoscopic and laparoscopic procedures.
Segmentation-Guided Knee Radiograph Generation using Conditional Diffusion Models
Abstract
Deep learning-based medical image processing algorithms require representative data during development. In particular, surgical data might be difficult to obtain, and high-quality public datasets are limited. To overcome this limitation and augment datasets, a widely adopted solution is the generation of synthetic images. In this work, we employ conditional diffusion models to generate knee radiographs from contour and bone segmentations. Two distinct strategies are presented that incorporate the segmentation as a condition into the sampling and the training process, namely conditional sampling and conditional training. The results demonstrate that both methods can generate realistic images while adhering to the conditioning segmentation. The conditional training method outperforms the conditional sampling method and the conventional U-Net.
Keyword: chest
DeViDe: Faceted medical knowledge for improved medical vision-language pre-training
Keyword: x-ray
Linear Anchored Gaussian Mixture Model for Location and Width Computation of Objects in Thick Line Shape
Keyword: clinical
A dataset of primary nasopharyngeal carcinoma MRI with multi-modalities segmentation
Keyword: biomedical
There is no result
Keyword: radiology
There is no result
Keyword: radiography
There is no result
Keyword: medical
MeshBrush: Painting the Anatomical Mesh with Neural Stylization for Endoscopy
Segmentation-Guided Knee Radiograph Generation using Conditional Diffusion Models
Keyword: chexpert
There is no result