Abstract
This Special Report presents the findings of the 2023 AAPM Grand Challenge on Deep Generative Modeling for Learning Medical Image Statistics. The goal of this challenge was to promote the development of deep generative models (DGMs) for medical imaging and to emphasize the need for their domain-relevant assessment via the analysis of relevant image statistics. For this Grand Challenge, a training dataset was developed from 3D anthropomorphic breast phantoms produced with the VICTRE virtual imaging toolbox. A two-stage evaluation procedure was developed. The first stage was a preliminary check for memorization and image quality, based on the Fréchet Inception distance (FID). The second stage evaluated how well the submissions reproduced image statistics corresponding to domain-relevant radiomic features. A summary measure was employed to rank the submissions. Additional analyses of the submissions were performed to assess DGM performance on individual feature families and to identify various artifacts. Fifty-eight submissions from 12 unique users were received for this Challenge. The top-ranked submission employed a conditional latent diffusion model, whereas the joint runners-up employed a generative adversarial network followed by a second network for image super-resolution. We observed that the overall ranking of the top 9 submissions according to our evaluation method (i) did not match the FID-based ranking and (ii) differed across individual feature families. Another important finding from the additional analyses was that different DGMs exhibited similar kinds of artifacts. This Grand Challenge highlighted the need for domain-specific evaluation to advance both DGM design and deployment. It also demonstrated that the specification of a DGM may differ depending on its intended use.
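The FID used in the first evaluation stage fits a Gaussian to each of two sets of image embeddings (real and generated) and computes the Fréchet distance between them, d^2 = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)). The Python/NumPy sketch below implements only that distance. It assumes the embeddings have already been extracted; the standard FID uses 2048-dimensional Inception-v3 pooling features. The function names are illustrative and this is not the Challenge's evaluation code.

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Squared Frechet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    sigma1, sigma2 = np.asarray(sigma1, float), np.asarray(sigma2, float)
    diff = mu1 - mu2
    # Tr((sigma1 sigma2)^(1/2)) is the sum of the square roots of the
    # eigenvalues of sigma1 @ sigma2. These are real and non-negative for
    # PSD inputs, so clip tiny negative values caused by round-off.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)


def fid_from_features(feats_real, feats_gen):
    """FID from two (n_samples, n_features) arrays of embeddings (n_features >= 2)."""
    mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_gen, rowvar=False)
    return frechet_distance(mu1, s1, mu2, s2)
```

Because FID summarizes each image set by a single mean and covariance in a generic feature space, it can rank models differently from radiomic, domain-specific statistics. The report's finding (i) reflects this kind of disagreement.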
Keyword: medical visualization
There is no result
Keyword: interactive volume
There is no result
Keyword: rendering
There is no result
Keyword: cinematic rendering
There is no result
Keyword: volume data
There is no result
Keyword: remote visualization
There is no result
Keyword: direct volume rendering
There is no result
Keyword: mobile device
There is no result
Keyword: transfer function
There is no result
Keyword: retrieval
Analysing PolSAR data from vegetation by using the subaperture decomposition approach
Abstract
A common assumption in radar remote sensing studies of vegetation is that radar returns originate from a target made up of a set of uniformly distributed isotropic scatterers. Nonetheless, several studies in the literature have noted that orientation effects and heterogeneities have a noticeable impact on backscattering signatures, depending on the specific vegetation type and sensor frequency. In this paper we have employed the subaperture decomposition technique (i.e. a time-frequency analysis) and the 3-D Barakat degree of polarisation to assess the variation of the volume backscattering power as a function of the azimuth look angle. Three datasets have been employed in this study: multi-frequency indoor acquisitions over short vegetation samples, P-band airborne data over boreal forest, and L-band satellite data over tropical forest. We have argued that, although depolarising effects may be sensed through only a small portion of the synthetic aperture, they can lead to overestimated retrievals of the volume scattering for the full-resolution image. This has direct implications for existing model-based and model-free polarimetric SAR decompositions.
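The two ingredients named above can be illustrated with a Python/NumPy sketch. The first function computes the 3-D Barakat degree of polarisation, P = sqrt(1 - 27 det(T) / tr(T)^3), of a 3x3 coherency matrix T. The last function splits a 1-D azimuth signal into sub-looks by partitioning its Doppler spectrum into contiguous, non-overlapping bands. The band-splitting scheme, the sample-averaged coherency estimate and all function names are simplifying assumptions, not the paper's processing chain. Practical subaperture processing usually also compensates the azimuth spectral weighting and may use overlapping sub-bands.

```python
import numpy as np


def barakat_dop_3d(T):
    """3-D Barakat degree of polarisation of a 3x3 Hermitian coherency matrix.

    P = sqrt(1 - 27 det(T) / tr(T)^3): P = 1 for a rank-1 (fully polarised)
    matrix and P = 0 for a matrix proportional to the identity.
    """
    T = np.asarray(T, dtype=complex)
    tr = np.trace(T).real
    det = np.linalg.det(T).real
    # Clip tiny negative values caused by round-off before the square root.
    return float(np.sqrt(max(1.0 - 27.0 * det / tr**3, 0.0)))


def coherency_matrix(k):
    """Sample estimate T = <k k^H> from an (n, 3) array of Pauli scattering vectors."""
    k = np.asarray(k, dtype=complex)
    return np.einsum("ni,nj->ij", k, k.conj()) / k.shape[0]


def split_subapertures(signal, n_sub):
    """Split a 1-D azimuth signal into n_sub sub-looks.

    The (fftshift-ed) Doppler spectrum is cut into contiguous,
    non-overlapping bands, and each band is transformed back to the
    azimuth domain. Each sub-look corresponds to a range of look angles.
    """
    spec = np.fft.fftshift(np.fft.fft(signal))
    edges = np.linspace(0, len(spec), n_sub + 1).astype(int)
    looks = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = np.zeros_like(spec)
        band[lo:hi] = spec[lo:hi]
        looks.append(np.fft.ifft(np.fft.ifftshift(band)))
    return looks
```

Applied per polarimetric channel, each sub-look yields its own coherency matrix and degree of polarisation. Variation of P across sub-looks is the kind of azimuth-dependent depolarisation the abstract refers to.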
Keyword: video retrieval
There is no result
Keyword: mobile
Reinforcement Learning control strategies for Electric Vehicles and Renewable energy sources Virtual Power Plants
Abstract
The increasing demand for electric energy in the grid is also tied to the growing use of Electric Vehicles (EVs) in cities, which will eventually replace combustion-engine vehicles entirely. Nevertheless, the large amount of energy stored in EV batteries is not always used, and it can constitute a virtual power plant on its own. Bidirectional EVs whose batteries are connected to the grid can therefore charge or discharge depending on public needs, shifting energy where and when it is needed. EVs employed as mobile storage devices can add resilience and supply/demand balance benefits to specific loads, in many cases as part of a Microgrid (MG). Depending on the direction of the energy transfer, EVs can provide backup power to households through vehicle-to-house (V2H) charging, or store unused renewable power through renewable-to-vehicle (RE2V) charging. V2H and RE2V solutions can complement renewable power sources whose output fluctuates over time, such as solar photovoltaic (PV) panels and wind turbines (WT), increasing self-consumption and autarky. The concept of distributed energy resources (DERs) is becoming increasingly prevalent and requires new solutions for integrating multiple complementary resources whose supply varies over time. The development of these ideas is coupled with the growth of new AI techniques that could become the managing core of such systems. Machine learning techniques can model the energy grid environment flexibly enough to allow continuous optimization. This working principle introduces the wider concept of an interconnected, shared, decentralized energy grid. This research on Reinforcement Learning control strategies for Electric Vehicles and Renewable energy sources Virtual Power Plants focuses on providing solutions for such energy supply optimization models.
Enhancing NLoS RIS-Aided Localization with Optimization and Machine Learning
Authors: Rafael A. Aguiar, Nuno Paulino, Luís M. Pessoa
Abstract
This paper introduces two machine learning optimization algorithms that significantly enhance position estimation in Reconfigurable Intelligent Surface (RIS)-aided localization of mobile user equipment in Non-Line-of-Sight (NLoS) conditions. Leveraging the strengths of these algorithms, we present two methods capable of extremely high accuracy, reaching sub-centimeter or even sub-millimeter levels at 3.5 GHz. The simulation results highlight the potential of these approaches, showing significant improvements in indoor mobile localization. The demonstrated precision and reliability of the proposed methods open new opportunities for practical applications in real-world scenarios, particularly NLoS indoor localization. By evaluating four optimization techniques, we determine that combining a Genetic Algorithm (GA) with Particle Swarm Optimization (PSO) yields localization errors under 30 cm in 90% of cases and under 5 mm in close to 85% of cases, for a simulated 10 m by 10 m room in which two of the walls are equipped with RIS tiles.
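The abstract does not detail the NLoS RIS signal model, so this Python sketch only illustrates the PSO half of the GA+PSO combination on a hypothetical stand-in problem. The task is to recover a 2-D position inside a 10 m by 10 m room from exact ranges to four assumed reference points, by minimizing the sum of squared range residuals. The cost function, the reference points and the PSO coefficients (inertia w = 0.7, c1 = c2 = 1.5) are illustrative assumptions, not the paper's settings.

```python
import math
import random


def range_cost(p, anchors, ranges):
    """Hypothetical cost: sum of squared range residuals at candidate position p."""
    return sum((math.dist(p, a) - r) ** 2 for a, r in zip(anchors, ranges))


def pso_localize(anchors, ranges, bounds=(0.0, 10.0), n_particles=30,
                 n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over a square room; returns (best position, best cost)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi), rng.uniform(lo, hi)] for _ in range(n_particles)]
    vel = [[0.0, 0.0] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                       # personal best positions
    pbest_cost = [range_cost(p, anchors, ranges) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]    # swarm best
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(2):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep particles inside the room.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            cost = range_cost(pos[i], anchors, ranges)
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = pos[i][:], cost
    return gbest, gbest_cost
```

In a GA+PSO hybrid, a GA stage could, for example, supply the initial swarm. That coupling, and the RIS-specific measurement model, are left out here.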
Multipath-based SLAM with Cooperation and Map Fusion
Authors: Erik Leitinger, Lukas Wielandner, Alexander Venus, Klaus Witrisal
Abstract
Multipath-based simultaneous localization and mapping (MP-SLAM) is a promising approach in wireless networks for obtaining position information of transmitters and receivers as well as information on the propagation environment. MP-SLAM models specular reflections of radio frequency (RF) signals at flat surfaces as virtual anchors (VAs), the mirror images of base stations (BSs). Conventional methods for MP-SLAM consider a single mobile terminal (MT) that has to be localized. The availability of additional MTs makes it possible to exploit additional information in the scenario. Specifically, enabling MTs to exchange information allows data fusion across observations of VAs made by different MTs. Furthermore, cooperative localization becomes possible in addition to multipath-based localization. Utilizing this additional information enables more robust mapping and higher localization accuracy.
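The virtual-anchor idea reduces to mirror geometry. Reflecting a BS across a flat wall gives a VA whose straight-line distance to the MT equals the length of the single-bounce specular path. The following minimal 2-D Python sketch assumes an ideal, infinite planar wall described by a point on it and its normal. The function names are hypothetical, and the sketch covers only the geometry, not the probabilistic SLAM and fusion machinery.

```python
import math


def virtual_anchor(bs, wall_point, wall_normal):
    """Mirror image of a base station across a flat wall (2-D)."""
    nx, ny = wall_normal
    norm = math.hypot(nx, ny)
    nx, ny = nx / norm, ny / norm
    # Signed distance of the BS from the wall along the unit normal.
    d = (bs[0] - wall_point[0]) * nx + (bs[1] - wall_point[1]) * ny
    return (bs[0] - 2 * d * nx, bs[1] - 2 * d * ny)


def specular_path_length(bs, mt, wall_point, wall_normal):
    """Length of the single-bounce path BS -> wall -> MT, computed via the VA."""
    return math.dist(virtual_anchor(bs, wall_point, wall_normal), mt)
```

For example, a BS at (2, 3) and a wall along x = 0 give a VA at (-2, 3). An MT at (4, 3) is then 6 m from the VA, which matches the 2 m + 4 m reflected path. Because every MT sees the same VA for a given wall, observations of that VA from several MTs can be fused.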
Keyword: smartphone
There is no result
Keyword: medical volume data
There is no result
Keyword: webgpu
There is no result
Keyword: webgl
There is no result
Keyword: pre-rendering
There is no result
Keyword: prerendering
There is no result
Keyword: motion prediction
There is no result
Keyword: incremental learning
There is no result
Keyword: svm incremental
There is no result
Keyword: nerf
There is no result
Keyword: multiorgan
There is no result
Keyword: multi-organ
There is no result
Keyword: multi organ
There is no result
Keyword: SAM
Deep Learning Descriptor Hybridization with Feature Reduction for Accurate Cervical Cancer Colposcopy Image Classification
Abstract
Cervical cancer is a leading cause of female mortality, underscoring the need for regular screening to enable early diagnosis and preemptive treatment of pre-cancerous conditions. The transformation zone of the cervix, where cellular differentiation occurs, plays a critical role in the detection of abnormalities. Colposcopy has emerged as a pivotal tool in cervical cancer prevention because it provides a meticulous examination of cervical abnormalities. However, challenges in visual evaluation necessitate the development of Computer Aided Diagnosis (CAD) systems. We propose a novel CAD system that combines the strengths of several deep-learning descriptors (ResNet50, ResNet101, and ResNet152) with feature normalization (min-max) and a feature reduction technique (LDA). Combining different descriptors ensures that both low-level features (such as edges and colour) and high-level features (such as shape and texture) are captured. Feature normalization prevents biased learning, and feature reduction avoids overfitting. We conduct experiments on the IARC dataset provided by the WHO. The dataset is first segmented and balanced. Our approach achieves performance in the range of 97%-100% for both normal-abnormal and type classification, whereas a competing approach for type classification on the same dataset achieved 81%-91%.
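The descriptor hybridization and min-max normalization steps can be sketched in Python/NumPy. The sketch assumes each descriptor has already been reduced to one feature vector per image, for example globally pooled ResNet activations. The function names are hypothetical. The LDA step is only indicated in a comment; scikit-learn's LinearDiscriminantAnalysis is one possible implementation, not necessarily the authors'.

```python
import numpy as np


def hybridize(*descriptor_blocks):
    """Concatenate per-image descriptor blocks (n_images x d_k each) column-wise."""
    return np.concatenate(descriptor_blocks, axis=1)


def fit_min_max(features):
    """Per-feature minimum and maximum, estimated on the training split only."""
    return features.min(axis=0), features.max(axis=0)


def apply_min_max(features, fmin, fmax, eps=1e-12):
    """Scale each feature to [0, 1]; eps guards against constant features."""
    return (features - fmin) / np.maximum(fmax - fmin, eps)

# A feature-reduction step (e.g. an LDA projection fitted on the normalized
# training features) would follow here; it is not shown in this sketch.
```

Fitting the minima and maxima on the training split and reusing them on test images keeps test statistics from leaking into the normalization.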
Converting Anyone's Voice: End-to-End Expressive Voice Conversion with a Conditional Diffusion Model
Authors: Zongyang Du, Junchen Lu, Kun Zhou, Lakshmish Kaushik, Berrak Sisman
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
Abstract
Expressive voice conversion (VC) performs speaker identity conversion for emotional speakers by jointly converting speaker identity and emotional style. Emotional style modeling for arbitrary speakers in expressive VC has not been extensively explored. Previous approaches have relied on vocoders for speech reconstruction, which makes speech quality heavily dependent on vocoder performance. A further major challenge of expressive VC lies in emotion prosody modeling. To address these challenges, this paper proposes a fully end-to-end expressive VC framework based on a conditional denoising diffusion probabilistic model (DDPM). We use speech units derived from self-supervised speech models as content conditioning, along with deep features extracted from speech emotion recognition and speaker verification systems to model emotional style and speaker identity. Objective and subjective evaluations show the effectiveness of our framework. Code and samples are publicly available.
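For readers unfamiliar with DDPMs, the following Python/NumPy sketch shows the standard forward (noising) process, q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I), and the usual noise-prediction training loss. The linear schedule from 1e-4 to 0.02 over 1000 steps is a common default, not necessarily the paper's choice. The eps_model callable and the cond argument are hypothetical placeholders; cond stands in for the content units, emotion features and speaker features described above.

```python
import numpy as np


def linear_beta_schedule(n_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_t, increasing linearly."""
    return np.linspace(beta_start, beta_end, n_steps)


def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I); also return the noise."""
    noise = rng.standard_normal(np.shape(x0))
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return xt, noise


def ddpm_loss(eps_model, x0, cond, alpha_bar, rng):
    """Mean squared error between the true noise and eps_model(x_t, t, cond)
    at a uniformly sampled step t."""
    t = int(rng.integers(len(alpha_bar)))
    xt, noise = q_sample(x0, t, alpha_bar, rng)
    pred = eps_model(xt, t, cond)
    return float(np.mean((pred - noise) ** 2))


betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)  # alpha_bar_t = product of (1 - beta_s) for s <= t
```

Conditioning enters only through eps_model. This sketch therefore reflects the general DDPM recipe and makes no claim about how the paper injects its conditioning features.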
RF Chain-Free mmWave Transmission: Modeling and Experimental Verification
Authors: M.Yaser Yağan, Ibrahim Hökelek, Ali E. Pusane, Ali Görçin
Abstract
The utilization of millimeter wave frequency bands is expected to become prevalent in next-generation communication systems. However, generating and transmitting communication signals at these frequencies is not as straightforward as in sub-6 GHz bands, owing to complex transceiver structures. As an alternative to conventional transmitter architectures, this paper investigates the implementation of time-modulated arrays to effectively modulate and transmit high-quality communication signals at millimeter wave frequencies. By exploiting array structures and analog beamformers, which are fundamental components of millimeter wave transmitters, secure and low-cost transmission can be achieved. However, harmonics of theoretically infinite bandwidth arise as a fundamental problem in this approach. This paper therefore presents a frequency analysis tool for time-modulated arrays with hardware impairments and shows how controlling the sampling period can reduce the harmonics. Furthermore, the derived results are experimentally verified at 25 GHz, yielding two important observations. First, the phase error of received signals can be reduced by 32% using the proposed architecture. Second, the harmonics can be significantly suppressed by the correct choice of sampling period for the given hardware.
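The origin of the harmonics can be seen from the Fourier series of an ideal switching window. If an element is switched on for the normalized interval [t_on, t_on + tau) of each modulation period, the m-th harmonic coefficient is a_m = tau sinc(m tau) exp(-j pi m (2 t_on + tau)). Energy therefore leaks into every harmonic m for which sinc(m tau) is nonzero. The Python/NumPy sketch below evaluates this idealized textbook expression. It ignores the hardware impairments and sampling effects analysed in the paper, and the function name is hypothetical.

```python
import numpy as np


def tma_harmonic_coeffs(t_on, tau, m):
    """Fourier coefficients a_m of an ideal on/off switching window.

    The element is on for normalized times [t_on, t_on + tau) within each
    modulation period (0 <= tau <= 1). np.sinc is the normalized sinc,
    sin(pi x) / (pi x). By Parseval, sum_m |a_m|^2 = tau, so only a
    fraction tau of the switched power stays at the fundamental (m = 0).
    """
    m = np.asarray(m, dtype=float)
    return tau * np.sinc(m * tau) * np.exp(-1j * np.pi * m * (2 * t_on + tau))
```

With tau = 0.5, for instance, all even harmonics other than m = 0 vanish, while the odd ones decay only as 1/m. This slow decay is why the harmonic content is spread over a theoretically infinite bandwidth.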
Analysing PolSAR data from vegetation by using the subaperture decomposition approach
Abstract
A common assumption in radar remote sensing studies of vegetation is that radar returns originate from a target made up of a set of uniformly distributed isotropic scatterers. Nonetheless, several studies in the literature have noted that orientation effects and heterogeneities have a noticeable impact on backscattering signatures, depending on the specific vegetation type and sensor frequency. In this paper we have employed the subaperture decomposition technique (i.e. a time-frequency analysis) and the 3-D Barakat degree of polarisation to assess the variation of the volume backscattering power as a function of the azimuth look angle. Three datasets have been employed in this study: multi-frequency indoor acquisitions over short vegetation samples, P-band airborne data over boreal forest, and L-band satellite data over tropical forest. We have argued that, although depolarising effects may be sensed through only a small portion of the synthetic aperture, they can lead to overestimated retrievals of the volume scattering for the full-resolution image. This has direct implications for existing model-based and model-free polarimetric SAR decompositions.
Physics-informed generative neural networks for RF propagation prediction with application to indoor body perception
Abstract
Electromagnetic (EM) body models designed to predict Radio-Frequency (RF) propagation are time-consuming, which prevents their adoption in strict real-time computational imaging problems such as human body localization and sensing. Physics-informed Generative Neural Network (GNN) models have recently been proposed to reproduce EM effects, namely to simulate or reconstruct missing data or samples by incorporating relevant EM principles and constraints. The paper discusses a Variational Auto-Encoder (VAE) model that is trained to reproduce the effects of human motion on the EM field and to incorporate EM body diffraction principles. The proposed physics-informed generative neural network models are verified against both classical diffraction-based EM tools and full-wave EM body simulations.
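The VAE ingredients mentioned above, the reparameterization step and the negative evidence lower bound (ELBO) with a Gaussian KL term, can be sketched in Python/NumPy. The optional physics term penalizes the distance between the reconstruction and a reference field phys_ref (for example, the output of a diffraction model) with weight lam. It is a hypothetical illustration of "physics-informed" training, not the paper's formulation. In practice these functions would be written in an autodiff framework so that gradients can flow.

```python
import numpy as np


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I). In an autodiff framework
    this keeps the sample differentiable with respect to mu and logvar."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * logvar) * eps


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return float(-0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar)))


def vae_loss(x, x_rec, mu, logvar, beta=1.0, phys_ref=None, lam=0.0):
    """Negative ELBO with a squared-error reconstruction term, plus an optional
    hypothetical physics penalty lam * ||x_rec - phys_ref||^2."""
    loss = float(np.sum((x - x_rec) ** 2)) + beta * kl_to_standard_normal(mu, logvar)
    if phys_ref is not None:
        loss += lam * float(np.sum((x_rec - phys_ref) ** 2))
    return loss
```

Setting lam = 0 recovers a plain VAE objective. A larger lam pulls reconstructions toward the physics-based reference at the cost of fidelity to the training samples.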
Reference-Free Image Quality Metric for Degradation and Reconstruction Artifacts
Authors: Han Cui, Alfredo De Goyeneche, Efrat Shimron, Boyuan Ma, Michael Lustig
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Image Quality Assessment (IQA) is essential in various Computer Vision tasks such as image deblurring and super-resolution. However, most IQA methods require reference images, which are not always available. While some reference-free IQA metrics exist, they are limited in their ability to simulate human perception and to discern subtle variations in image quality. We hypothesize that the JPEG quality factor is representative of image quality. We further hypothesize that a well-trained neural network can learn to evaluate image quality accurately without a clean reference, because it can recognize image degradation artifacts based on prior knowledge. We therefore developed a reference-free quality evaluation network, dubbed "Quality Factor (QF) Predictor". Our QF Predictor is a lightweight, fully convolutional network comprising seven layers. The model is trained in a self-supervised manner: it receives a JPEG-compressed image patch with a random QF as input and is trained to predict the corresponding QF. We demonstrate the versatility of the model by applying it to various tasks. First, our QF Predictor generalizes to measuring the severity of various image artifacts, such as Gaussian blur and Gaussian noise. Second, we show that the QF Predictor can be trained to predict the undersampling rate of images reconstructed from Magnetic Resonance Imaging (MRI) data.
Keyword: volume render
There is no result
Keyword: volumetric render
There is no result
Keyword: remote render
There is no result
Keyword: hybrid render
There is no result
Keyword: raycast
There is no result
Keyword: medical imaging
Report on the AAPM Grand Challenge on deep generative modeling for learning medical image statistics
Keyword: medical visualization
There is no result
Keyword: interactive volume
There is no result
Keyword: rendering
There is no result
Keyword: cinematic rendering
There is no result
Keyword: volume data
There is no result
Keyword: remote visualization
There is no result
Keyword: direct volume rendering
There is no result
Keyword: mobile device
There is no result
Keyword: transfer function
There is no result
Keyword: retrieval
Analysing PolSAR data from vegetation by using the subaperture decomposition approach
Keyword: video retrieval
There is no result
Keyword: mobile
Reinforcement Learning control strategies for Electric Vehicles and Renewable energy sources Virtual Power Plants
Enhancing NLoS RIS-Aided Localization with Optimization and Machine Learning
Multipath-based SLAM with Cooperation and Map Fusion
Keyword: smartphone
There is no result
Keyword: medical volume data
There is no result
Keyword: webgpu
There is no result
Keyword: webgl
There is no result
Keyword: pre-rendering
There is no result
Keyword: prerendering
There is no result
Keyword: motion prediction
There is no result
Keyword: incremental learning
There is no result
Keyword: svm incremental
There is no result
Keyword: nerf
There is no result
Keyword: multiorgan
There is no result
Keyword: multi-organ
There is no result
Keyword: multi organ
There is no result
Keyword: SAM
Deep Learning Descriptor Hybridization with Feature Reduction for Accurate Cervical Cancer Colposcopy Image Classification
Converting Anyone's Voice: End-to-End Expressive Voice Conversion with a Conditional Diffusion Model
RF Chain-Free mmWave Transmission: Modeling and Experimental Verification
Analysing PolSAR data from vegetation by using the subaperture decomposition approach
Physics-informed generative neural networks for RF propagation prediction with application to indoor body perception
Reference-Free Image Quality Metric for Degradation and Reconstruction Artifacts