9 New submissions for Wed, 1 May 24

Keyword: chest

Learning Low-Rank Feature for Thorax Disease Classification

Authors: Rajeev Goel, Utkarsh Nath, Yancheng Wang, Alvin C. Silva, Teresa Wu, Yingzhen Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.18933
Pdf link: https://arxiv.org/pdf/2404.18933
Abstract Deep neural networks, including Convolutional Neural Networks (CNNs) and Visual Transformers (ViT), have achieved stunning success in medical image domain. We study thorax disease classification in this paper. Effective extraction of features for the disease areas is crucial for disease classification on radiographic images. While various neural architectures and training techniques, such as self-supervised learning with contrastive/restorative learning, have been employed for disease classification on radiographic images, there are no principled methods which can effectively reduce the adverse effect of noise and background, or non-disease areas, on the radiographic images for disease classification. To address this challenge, we propose a novel Low-Rank Feature Learning (LRFL) method in this paper, which is universally applicable to the training of all neural networks. The LRFL method is both empirically motivated by the low frequency property observed on all the medical datasets in this paper, and theoretically motivated by our sharp generalization bound for neural networks with low-rank features. In the empirical study, using a neural network such as a ViT or a CNN pre-trained on unlabeled chest X-rays by Masked Autoencoders (MAE), our novel LRFL method is applied on the pre-trained neural network and demonstrate better classification results in terms of both multiclass area under the receiver operating curve (mAUC) and classification accuracy.
Keyword: x-ray

Distributed Stochastic Optimization of a Neural Representation Network for Time-Space Tomography Reconstruction
Authors: K. Aditya Mohan, Massimiliano Ferrucci, Chuck Divin, Garrett A. Stevenson, Hyojin Kim
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2404.19075
Pdf link: https://arxiv.org/pdf/2404.19075
Abstract 4D time-space reconstruction of dynamic events or deforming objects using X-ray computed tomography (CT) is an extremely ill-posed inverse problem. Existing approaches assume that the object remains static for the duration of several tens or hundreds of X-ray projection measurement images (reconstruction of consecutive limited-angle CT scans). However, this is an unrealistic assumption for many in-situ experiments that causes spurious artifacts and inaccurate morphological reconstructions of the object. To solve this problem, we propose to perform a 4D time-space reconstruction using a distributed implicit neural representation (DINR) network that is trained using a novel distributed stochastic training algorithm. Our DINR network learns to reconstruct the object at its output by iterative optimization of its network parameters such that the measured projection images best match the output of the CT forward measurement model. We use a continuous time and space forward measurement model that is a function of the DINR outputs at a sparsely sampled set of continuous valued object coordinates. Unlike existing state-of-the-art neural representation architectures that forward and back propagate through dense voxel grids that sample the object's entire time-space coordinates, we only propagate through the DINR at a small subset of object coordinates in each iteration resulting in an order-of-magnitude reduction in memory and compute for training. DINR leverages distributed computation across several compute nodes and GPUs to produce high-fidelity 4D time-space reconstructions even for extremely large CT data sizes. We use both simulated parallel-beam and experimental cone-beam X-ray CT datasets to demonstrate the superior performance of our approach.
X-Diffusion: Generating Detailed 3D MRI Volumes From a Single Image Using Cross-Sectional Diffusion Models
Authors: Emmanuelle Bourigault, Abdullah Hamdi, Amir Jamaludin
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.19604
Pdf link: https://arxiv.org/pdf/2404.19604
Abstract In this work, we present X-Diffusion, a cross-sectional diffusion model tailored for Magnetic Resonance Imaging (MRI) data. X-Diffusion is capable of generating the entire MRI volume from just a single MRI slice or optionally from few multiple slices, setting new benchmarks in the precision of synthesized MRIs from extremely sparse observations. The uniqueness lies in the novel view-conditional training and inference of X-Diffusion on MRI volumes, allowing for generalized MRI learning. Our evaluations span both brain tumour MRIs from the BRATS dataset and full-body MRIs from the UK Biobank dataset. Utilizing the paired pre-registered Dual-energy X-ray Absorptiometry (DXA) and MRI modalities in the UK Biobank dataset, X-Diffusion is able to generate detailed 3D MRI volume from a single full-body DXA. Remarkably, the resultant MRIs not only stand out in precision on unseen examples (surpassing state-of-the-art results by large margins) but also flawlessly retain essential features of the original MRI, including tumour profiles, spine curvature, brain volume, and beyond. Furthermore, the trained X-Diffusion model on the MRI datasets attains a generalization capacity out-of-domain (e.g. generating knee MRIs even though it is trained on brains). The code is available on the project website https://emmanuelleb985.github.io/XDiffusion/ .
Keyword: clinical

Artificial Intelligence in Bone Metastasis Analysis: Current Advancements, Opportunities and Challenges
Authors: Marwa Afnouch, Fares Bougourzi, Olfa Gaddour, Fadi Dornaika, Abdelmalik Taleb-Ahmed
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.19598
Pdf link: https://arxiv.org/pdf/2404.19598
Abstract In recent years, Artificial Intelligence (AI) has been widely used in medicine, particularly in the analysis of medical imaging, which has been driven by advances in computer vision and deep learning methods. This is particularly important in overcoming the challenges posed by diseases such as Bone Metastases (BM), a common and complex malignancy of the bones. Indeed, there have been an increasing interest in developing Machine Learning (ML) techniques into oncologic imaging for BM analysis. In order to provide a comprehensive overview of the current state-of-the-art and advancements for BM analysis using artificial intelligence, this review is conducted with the accordance with PRISMA guidelines. Firstly, this review highlights the clinical and oncologic perspectives of BM and the used medical imaging modalities, with discussing their advantages and limitations. Then the review focuses on modern approaches with considering the main BM analysis tasks, which includes: classification, detection and segmentation. The results analysis show that ML technologies can achieve promising performance for BM analysis and have significant potential to improve clinician efficiency and cope with time and cost limitations. Furthermore, there are requirements for further research to validate the clinical performance of ML tools and facilitate their integration into routine clinical practice.
Keyword: biomedical

There is no result

Keyword: radiology

There is no result

Keyword: radiography

There is no result

Keyword: medical

Quantifying Nematodes through Images: Datasets, Models, and Baselines of Deep Learning
Authors: Zhipeng Yuan, Nasamu Musa, Katarzyna Dybal, Matthew Back, Daniel Leybourne, Po Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.19748
Pdf link: https://arxiv.org/pdf/2404.19748
Abstract Every year, plant parasitic nematodes, one of the major groups of plant pathogens, cause a significant loss of crops worldwide. To mitigate crop yield losses caused by nematodes, an efficient nematode monitoring method is essential for plant and crop disease management. In other respects, efficient nematode detection contributes to medical research and drug discovery, as nematodes are model organisms. With the rapid development of computer technology, computer vision techniques provide a feasible solution for quantifying nematodes or nematode infections. In this paper, we survey and categorise the studies and available datasets on nematode detection through deep-learning models. To stimulate progress in related research, this survey presents the potential state-of-the-art object detection models, training techniques, optimisation techniques, and evaluation metrics for deep learning beginners. Moreover, seven state-of-the-art object detection models are validated on three public datasets and the AgriNema dataset for plant parasitic nematodes to construct a baseline for nematode detection.
Foundations of Multisensory Artificial Intelligence
Authors: Paul Pu Liang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2404.18976
Pdf link: https://arxiv.org/pdf/2404.18976
Abstract Building multisensory AI systems that learn from multiple sensory inputs such as text, speech, video, real-world sensors, wearable devices, and medical data holds great promise for impact in many scientific areas with practical benefits, such as in supporting human health and well-being, enabling multimedia content processing, and enhancing real-world autonomous agents. By synthesizing a range of theoretical frameworks and application domains, this thesis aims to advance the machine learning foundations of multisensory AI. In the first part, we present a theoretical framework formalizing how modalities interact with each other to give rise to new information for a task. These interactions are the basic building blocks in all multimodal problems, and their quantification enables users to understand their multimodal datasets, design principled approaches to learn these interactions, and analyze whether their model has succeeded in learning. In the second part, we study the design of practical multimodal foundation models that generalize over many modalities and tasks, which presents a step toward grounding large language models to real-world sensory modalities. We introduce MultiBench, a unified large-scale benchmark across a wide range of modalities, tasks, and research areas, followed by the cross-modal attention and multimodal transformer architectures that now underpin many of today's multimodal foundation models. Scaling these architectures on MultiBench enables the creation of general-purpose multisensory AI systems, and we discuss our collaborative efforts in applying these models for real-world impact in affective computing, mental health, cancer prognosis, and robotics. Finally, we conclude this thesis by discussing how future work can leverage these ideas toward more general, interactive, and safe multisensory AI.
Data Set Terminology of Artificial Intelligence in Medicine: A Historical Review and Recommendation
Authors: Shannon L. Walston, Hiroshi Seki, Hirotaka Takita, Yasuhito Mitsuyama, Shingo Sato, Akifumi Hagiwara, Rintaro Ito, Shouhei Hanaoka, Yukio Miki, Daiju Ueda
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.19303
Pdf link: https://arxiv.org/pdf/2404.19303
Abstract Medicine and artificial intelligence (AI) engineering represent two distinct fields each with decades of published history. With such history comes a set of terminology that has a specific way in which it is applied. However, when two distinct fields with overlapping terminology start to collaborate, miscommunication and misunderstandings can occur. This narrative review aims to give historical context for these terms, accentuate the importance of clarity when these terms are used in medical AI contexts, and offer solutions to mitigate misunderstandings by readers from either field. Through an examination of historical documents, including articles, writing guidelines, and textbooks, this review traces the divergent evolution of terms for data sets and their impact. Initially, the discordant interpretations of the word 'validation' in medical and AI contexts are explored. Then the data sets used for AI evaluation are classified, namely random splitting, cross-validation, temporal, geographic, internal, and external sets. The accurate and standardized description of these data sets is crucial for demonstrating the robustness and generalizability of AI applications in medicine. This review clarifies existing literature to provide a comprehensive understanding of these classifications and their implications in AI evaluation. This review then identifies often misunderstood terms and proposes pragmatic solutions to mitigate terminological confusion. Among these solutions are the use of standardized terminology such as 'training set,' 'validation (or tuning) set,' and 'test set,' and explicit definition of data set splitting terminologies in each medical AI research publication. This review aspires to enhance the precision of communication in medical AI, thereby fostering more effective and transparent research methodologies in this interdisciplinary field.
Enhancing Deep Learning Model Explainability in Brain Tumor Datasets using Post-Heuristic Approaches
Authors: Konstantinos Pasvantis, Eftychios Protopapadakis
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.19568
Pdf link: https://arxiv.org/pdf/2404.19568
Abstract The application of deep learning models in medical diagnosis has showcased considerable efficacy in recent years. Nevertheless, a notable limitation involves the inherent lack of explainability during decision-making processes. This study addresses such a constraint, by enhancing the interpretability robustness. The primary focus is directed towards refining the explanations generated by the LIME Library and LIME image explainer. This is achieved throuhg post-processing mechanisms, based on scenario-specific rules. Multiple experiments have been conducted using publicly accessible datasets related to brain tumor detection. Our proposed post-heuristic approach demonstrates significant advancements, yielding more robust and concrete results, in the context of medical diagnosis.
Automatic Cardiac Pathology Recognition in Echocardiography Images Using Higher Order Dynamic Mode Decomposition and a Vision Transformer for Small Datasets
Authors: Andrés Bell-Navas, Nourelhouda Groun, María Villalba-Orero, Enrique Lara-Pezzi, Jesús Garicano-Mena, Soledad Le Clainche
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.19579
Pdf link: https://arxiv.org/pdf/2404.19579
Abstract Heart diseases are the main international cause of human defunction. According to the WHO, nearly 18 million people decease each year because of heart diseases. Also considering the increase of medical data, much pressure is put on the health industry to develop systems for early and accurate heart disease recognition. In this work, an automatic cardiac pathology recognition system based on a novel deep learning framework is proposed, which analyses in real-time echocardiography video sequences. The system works in two stages. The first one transforms the data included in a database of echocardiography sequences into a machine-learning-compatible collection of annotated images which can be used in the training stage of any kind of machine learning-based framework, and more specifically with deep learning. This includes the use of the Higher Order Dynamic Mode Decomposition (HODMD) algorithm, for the first time to the authors' knowledge, for both data augmentation and feature extraction in the medical field. The second stage is focused on building and training a Vision Transformer (ViT), barely explored in the related literature. The ViT is adapted for an effective training from scratch, even with small datasets. The designed neural network analyses images from an echocardiography sequence to predict the heart state. The results obtained show the superiority of the proposed system and the efficacy of the HODMD algorithm, even outperforming pretrained Convolutional Neural Networks (CNNs), which are so far the method of choice in the literature.
Keyword: chexpert

There is no result

PanagiotisFytas / get-daily-arxiv-noti

9 New submissions for Wed, 1 May 24 #580

Keyword: chest

Learning Low-Rank Feature for Thorax Disease Classification

Keyword: x-ray

Distributed Stochastic Optimization of a Neural Representation Network for Time-Space Tomography Reconstruction

X-Diffusion: Generating Detailed 3D MRI Volumes From a Single Image Using Cross-Sectional Diffusion Models

Keyword: clinical

Artificial Intelligence in Bone Metastasis Analysis: Current Advancements, Opportunities and Challenges

Keyword: biomedical

Keyword: radiology

Keyword: radiography

Keyword: medical

Quantifying Nematodes through Images: Datasets, Models, and Baselines of Deep Learning

Foundations of Multisensory Artificial Intelligence

Data Set Terminology of Artificial Intelligence in Medicine: A Historical Review and Recommendation

Enhancing Deep Learning Model Explainability in Brain Tumor Datasets using Post-Heuristic Approaches

Automatic Cardiac Pathology Recognition in Echocardiography Images Using Higher Order Dynamic Mode Decomposition and a Vision Transformer for Small Datasets

Keyword: chexpert