New submissions for Wed, 17 Apr 24

Keyword: detection

Optimizing Malware Detection in IoT Networks: Leveraging Resource-Aware Distributed Computing for Enhanced Security

Authors: Sreenitha Kasarapu, Sanket Shukla, Sai Manoj Pudukotai Dinakarrao
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2404.10012
Pdf link: https://arxiv.org/pdf/2404.10012
Abstract In recent years, networked IoT systems have revolutionized connectivity, portability, and functionality, offering a myriad of advantages. However, these systems are increasingly targeted by adversaries due to inherent security vulnerabilities and limited computational and storage resources. Malicious applications, commonly known as malware, pose a significant threat to IoT devices and networks. While numerous malware detection techniques have been proposed, existing approaches often overlook the resource constraints inherent in IoT environments, assuming abundant resources for detection tasks. This oversight is compounded by ongoing workloads such as sensing and on-device computations, further diminishing available resources for malware detection. To address these challenges, we present a novel resource- and workload-aware malware detection framework integrated with distributed computing for IoT networks. Our approach begins by analyzing available resources for malware detection using a lightweight regression model. Depending on resource availability, ongoing workload executions, and communication costs, the malware detection task is dynamically allocated either on-device or offloaded to neighboring IoT nodes with sufficient resources. To safeguard data integrity and user privacy, rather than transferring the entire malware detection task, the classifier is partitioned and distributed across multiple nodes, and subsequently integrated at the parent node for comprehensive malware detection. Experimental analysis demonstrates the efficacy of our proposed technique, achieving a remarkable speed-up of 9.8x compared to on-device inference, while maintaining a high malware detection accuracy of 96.7%.
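As a rough illustration of the partition-and-integrate idea (not the authors' implementation), the Python sketch below assumes the malware classifier is a simple linear model, which the abstract does not state. A hypothetical parent node splits the weight and feature vectors into slices, each node computes only a partial dot product, and the parent sums the partial scores with the bias. Because a dot product is additive over slices, the partitioned result matches on-device inference exactly. The names `partial_score` and `partitioned_predict` are illustrative.

```python
from typing import List, Sequence


def partial_score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Work done on one node: dot product over its slice of the feature vector."""
    return sum(w * x for w, x in zip(weights, features))


def partitioned_predict(weights: Sequence[float], bias: float,
                        features: Sequence[float], num_nodes: int) -> int:
    """Split a (hypothetical) linear malware classifier across nodes, merge at the parent.

    The parent cuts weights and features into contiguous slices, each node
    returns only its partial score, and the parent adds the partial scores
    and the bias, then thresholds at 0 (1 = malware, 0 = benign).
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    n = len(weights)
    chunk = max(1, -(-n // num_nodes))  # ceiling division: slice size per node
    partials: List[float] = []
    for start in range(0, n, chunk):
        end = start + chunk
        # In a real deployment this call would run on a neighboring IoT node.
        partials.append(partial_score(weights[start:end], features[start:end]))
    score = sum(partials) + bias  # integration step at the parent node
    return 1 if score > 0 else 0
```

For example, with weights [0.5, -1.0, 2.0, 0.25] and bias -1.0, the feature vector [1.0, 0.0, 1.0, 2.0] scores 2.0 and is labeled malware regardless of how many nodes share the work.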
Enhanced Low-Complexity Receiver Design for Short Block Transmission Systems
Authors: Mody Sy, Raymond Knopp
Subjects: Information Theory (cs.IT); Emerging Technologies (cs.ET)
Arxiv link: https://arxiv.org/abs/2404.10065
Pdf link: https://arxiv.org/pdf/2404.10065
Abstract This paper presents a comprehensive analysis and performance enhancement of short block length channel detection incorporating training information. In current communication systems, short block length channel detection typically consists of least squares channel estimation followed by quasi-coherent detection. By investigating the receiver structure, specifically the estimator-correlator, we show that the non-coherent term, often disregarded in conventional detection metrics, results in significant losses in performance and sensitivity in typical operating regimes of 5G and 6G systems. A comparison with the fully non-coherent receiver in multi-antenna configurations reveals substantial losses in low spectral efficiency operating regions. Additionally, we demonstrate that by employing an adaptive DMRS-data power adjustment, it is possible to reduce the performance loss gap, which is amenable to a more sensitive quasi-coherent receiver. However, both of the aforementioned ML detection strategies can incur substantial computational complexity when processing long bit-length codes. We propose to tackle this challenge by introducing the principle of block or segment coding using first-order Reed-Muller (RM) codes, which are amenable to low-cost decoding through block-based fast Hadamard transforms (FHTs). The block-based FHT has proven to be cost-efficient with regard to decoding time, as the complexity drops from quadratic to quasi-linear with a manageable decline in performance. Additionally, by incorporating an adaptive DMRS-data power adjustment technique, we are able to reduce the performance gap with respect to the conventional maximum likelihood receiver and attain high sensitivity, leading to a good trade-off between performance and complexity to efficiently handle small payloads.
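To show why Hadamard-based decoding of first-order RM codes is cheap, here is a minimal Python sketch of plain single-block fast Hadamard transform decoding of RM(1, m). It is background on the standard technique, not the block or segment scheme proposed in the paper, and it assumes BPSK with bit 0 mapped to +1 and bit 1 to -1. Correlating the received vector against all 2^(m+1) codewords directly costs O(n^2) for block length n = 2^m, while the transform computes every correlation in O(n log n). The function names are hypothetical.

```python
from typing import List, Tuple


def fht(values: List[float]) -> List[float]:
    """Fast Walsh-Hadamard transform: O(n log n) butterflies for n = 2**m."""
    a = list(values)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    return a


def encode_rm1(a0: int, u: int, m: int) -> List[int]:
    """RM(1, m) codeword: bit at position x is a0 XOR (u . x mod 2)."""
    return [a0 ^ (bin(u & x).count("1") & 1) for x in range(1 << m)]


def decode_rm1(received: List[float]) -> Tuple[int, int]:
    """ML soft-decision decoding, assuming BPSK with bit 0 -> +1.0, bit 1 -> -1.0.

    The transform correlates the received vector with all 2**m linear parts
    at once; the largest magnitude picks u and its sign picks the constant a0.
    """
    spectrum = fht(received)
    u_hat = max(range(len(spectrum)), key=lambda k: abs(spectrum[k]))
    a0_hat = 0 if spectrum[u_hat] > 0 else 1
    return a0_hat, u_hat
```

RM(1, 3) has minimum distance 4, so flipping the sign of any single symbol of a transmitted codeword still decodes to the original (a0, u).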
Explainable Light-Weight Deep Learning Pipeline for Improved Drought Stress
Authors: Aswini Kumar Patra, Lingaraj Sahoo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.10073
Pdf link: https://arxiv.org/pdf/2404.10073
Abstract Early identification of drought stress in crops is vital for implementing effective mitigation measures and reducing yield loss. Non-invasive imaging techniques hold immense potential by capturing subtle physiological changes in plants under water deficit. Sensor-based imaging data serves as a rich source of information for machine learning and deep learning algorithms, facilitating further analysis aimed at identifying drought stress. While these approaches yield favorable results, real-time field applications require algorithms specifically designed for the complexities of natural agricultural conditions. Our work proposes a novel deep learning framework for classifying drought stress in potato crops from images captured by UAVs in natural settings. The novelty lies in the synergistic combination of a pretrained network with carefully designed custom layers. This architecture leverages the feature extraction capabilities of the pretrained network, while the custom layers enable targeted dimensionality reduction and enhanced regularization, ultimately leading to improved performance. A key innovation of our work involves the integration of Gradient-weighted Class Activation Mapping (Grad-CAM), an explainability technique. Grad-CAM sheds light on the internal workings of the deep learning model, which is typically regarded as a black box. By visualizing the focus areas of the model within the images, Grad-CAM fosters interpretability and builds trust in the model's decision-making process. Our proposed framework achieves superior performance, particularly with the DenseNet121 pretrained network, reaching a precision of 98% in identifying the stressed class with an overall accuracy of 90%. Comparative analysis against existing state-of-the-art object detection algorithms shows that our approach achieves significantly higher precision and accuracy.
Low-Light Image Enhancement Framework for Improved Object Detection in Fisheye Lens Datasets
Authors: Dai Quoc Tran, Armstrong Aboah, Yuntae Jeon, Maged Shoman, Minsoo Park, Seunghee Park
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.10078
Pdf link: https://arxiv.org/pdf/2404.10078
Abstract This study addresses the evolving challenges faced by urban traffic monitoring detection systems based on fisheye lens cameras by proposing a framework that improves the efficacy and accuracy of these systems. In the context of urban infrastructure and transportation management, advanced traffic monitoring systems have become critical for managing the complexities of urbanization and increasing vehicle density. Traditional monitoring methods, which rely on static cameras with narrow fields of view, are ineffective in dynamic urban environments and require the installation of multiple cameras, which raises costs. Recently introduced fisheye lenses provide wide, omnidirectional coverage in a single frame, making them a transformative solution. However, they suffer from distorted views and blurriness, which hinder accurate object detection on these images. Motivated by these challenges, this study proposes a novel approach that combines a transformer-based image enhancement framework with an ensemble learning technique to improve traffic monitoring accuracy, making significant contributions to the future of intelligent traffic management systems. Our proposed methodological framework won 5th place in the 2024 AI City Challenge, Track 4, with an F1 score of 0.5965 on experimental validation data. The experimental results demonstrate the effectiveness, efficiency, and robustness of the proposed system. Our code is publicly available at https://github.com/daitranskku/AIC2024-TRACK4-TEAM15.
Epistemic Uncertainty Quantification For Pre-trained Neural Network
Authors: Hanjing Wang, Qiang Ji
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.10124
Pdf link: https://arxiv.org/pdf/2404.10124
Abstract Epistemic uncertainty quantification (UQ) identifies where models lack knowledge. Traditional UQ methods, often based on Bayesian neural networks, are not suitable for pre-trained non-Bayesian models. Our study addresses quantifying epistemic uncertainty for any pre-trained model, which does not need the original training data or model modifications and can ensure broad applicability regardless of network architectures or training techniques. Specifically, we propose a gradient-based approach to assess epistemic uncertainty, analyzing the gradients of outputs relative to model parameters, and thereby indicating necessary model adjustments to accurately represent the inputs. We first explore theoretical guarantees of gradient-based methods for epistemic UQ, questioning the view that this uncertainty is only calculable through differences between multiple models. We further improve gradient-driven UQ by using class-specific weights for integrating gradients and emphasizing distinct contributions from neural network layers. Additionally, we enhance UQ accuracy by combining gradient and perturbation methods to refine the gradients. We evaluate our approach on out-of-distribution detection, uncertainty calibration, and active learning, demonstrating its superiority over current state-of-the-art UQ methods for pre-trained models.
Using Long Short-term Memory (LSTM) to merge precipitation data over mountainous area in Sierra Nevada
Authors: Yihan Wang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Atmospheric and Oceanic Physics (physics.ao-ph)
Arxiv link: https://arxiv.org/abs/2404.10135
Pdf link: https://arxiv.org/pdf/2404.10135
Abstract Obtaining reliable precipitation estimates with high temporal and spatial resolution is of great importance to hydrological studies. However, accurately estimating precipitation is a challenging task over high, complex mountainous terrain. The three widely used precipitation measurement approaches, namely rain gauges, precipitation radars, and satellite-based precipitation sensors, each have their own pros and cons in producing reliable precipitation products over complex areas. One way to decrease the detection error probability and improve data reliability is precipitation data merging. With the rapid advancements in computational capabilities and the escalating volume and diversity of earth observational data, Deep Learning (DL) models have gained considerable attention in geoscience. In this study, a deep learning technique, namely Long Short-term Memory (LSTM), was employed to merge a radar-based precipitation product and the satellite-based Integrated Multi-Satellite Retrievals for GPM (IMERG) product of the Global Precipitation Measurement (GPM) mission at an hourly scale. The merged results are compared with the widely used reanalysis precipitation product, Multi-Radar Multi-Sensor (MRMS), and assessed against gauge observational data from the California Data Exchange Center (CDEC). The findings indicated that the LSTM-based merged precipitation notably underestimated gauge observations and, at times, failed to provide meaningful estimates, showing predominantly near-zero values. Relying solely on individual Quantitative Precipitation Estimates (QPEs) without additional meteorological input proved insufficient for generating a reliable merged QPE. However, the merged results effectively captured the temporal trends of the observations, outperforming MRMS in this aspect. This suggests that incorporating bias correction techniques could potentially enhance the accuracy of the merged product.
High-Resolution Detection of Earth Structural Heterogeneities from Seismic Amplitudes using Convolutional Neural Networks with Attention layers
Authors: Luiz Schirmer, Guilherme Schardong, Vinícius da Silva, Rogério Santos, Hélio Lopes
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.10170
Pdf link: https://arxiv.org/pdf/2404.10170
Abstract Earth structural heterogeneities play a remarkable role in the petroleum economy for both exploration and production projects. Automatic detection of detailed structural heterogeneities is challenging, even for modern machine learning techniques such as deep neural networks. These techniques can be an excellent tool for assisted interpretation of such heterogeneities, but their performance depends heavily on the amount of training data. We propose an efficient and cost-effective architecture for detecting seismic structural heterogeneities using Convolutional Neural Networks (CNNs) combined with Attention layers. The attention mechanism reduces costs and enhances accuracy, even in cases with relatively noisy data. Our model has half as many parameters as the state-of-the-art model, and it outperforms previous methods in terms of Intersection over Union (IoU) by 0.6% and precision by 0.4%. By leveraging synthetic data, we apply transfer learning to train and fine-tune the model, addressing the challenge of limited annotated data availability.
Anomaly Correction of Business Processes Using Transformer Autoencoder
Authors: Ziyou Gong, Xianwen Fang, Ping Wu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.10211
Pdf link: https://arxiv.org/pdf/2404.10211
Abstract An event log records all events that occur during the execution of business processes, so detecting and correcting anomalies in event logs provides a reliable basis for subsequent process analysis. Previous works mainly include methods based on next-event prediction and autoencoder-based methods. These methods cannot accurately and efficiently detect and correct anomalies at the same time, and they all rely on a preset threshold to detect anomalies. To solve these problems, we propose a business process anomaly correction method based on a Transformer autoencoder. By using the self-attention mechanism and the autoencoder structure, it can efficiently process event sequences of arbitrary length and directly output corrected business process instances, allowing it to adapt to various scenarios. At the same time, anomaly detection is transformed into a classification problem by means of self-supervised learning, so there is no need to set a specific threshold for anomaly detection. Experimental results on several real-life event logs show that the proposed method outperforms previous methods in terms of anomaly detection accuracy and anomaly correction results while ensuring high running efficiency.
TC-OCR: TableCraft OCR for Efficient Detection & Recognition of Table Structure & Content
Authors: Avinash Anand, Raj Jaiswal, Pijush Bhuyan, Mohit Gupta, Siddhesh Bangar, Md. Modassir Imam, Rajiv Ratn Shah, Shin'ichi Satoh
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.10305
Pdf link: https://arxiv.org/pdf/2404.10305
Abstract The automatic recognition of tabular data in document images presents a significant challenge due to the diverse range of table styles and complex structures. Tables offer valuable content representation, enhancing the predictive capabilities of various systems such as search engines and Knowledge Graphs. The two main problems, namely table detection (TD) and table structure recognition (TSR), have traditionally been addressed independently. In this research, we propose an end-to-end pipeline that integrates deep learning models, including DETR, CascadeTabNet, and PP OCR v2, to achieve comprehensive image-based table recognition. This integrated approach effectively handles diverse table styles, complex structures, and image distortions, resulting in improved accuracy and efficiency compared to existing methods like Table Transformers. Our system achieves simultaneous table detection (TD), table structure recognition (TSR), and table content recognition (TCR), preserving table structures and accurately extracting tabular data from document images. The integration of multiple models addresses the intricacies of table recognition, making our approach a promising solution for image-based table understanding, data extraction, and information retrieval applications. Our proposed approach achieves an IoU of 0.96 and an OCR accuracy of 78%, showcasing a remarkable improvement of approximately 25% in OCR accuracy compared to the previous Table Transformer approach.
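For reference, IoU for axis-aligned boxes is the intersection area divided by the union area. The abstract does not say how the reported IoU of 0.96 is computed or aggregated, so this minimal Python sketch only scores one predicted box against one ground-truth box, assuming (x_min, y_min, x_max, y_max) coordinates.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Two 2x2 boxes offset by one unit in each direction overlap in a 1x1 square, giving an IoU of 1/7.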
Multiple Mobile Target Detection and Tracking in Active Sonar Array Using a Track-Before-Detect Approach
Authors: Avi Abu, Nikola Miskovic, Oleg Chebotar, Neven Cukrov, Roee Diamant
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2404.10316
Pdf link: https://arxiv.org/pdf/2404.10316
Abstract We present an algorithm for detecting and tracking underwater mobile objects using active acoustic transmission of broadband chirp signals whose reflections are received by a hydrophone array. The method overcomes the problem of a high false alarm rate by applying a track-before-detect approach to the sequence of received reflections. A 2D time-space matrix is created for the reverberations received from each transmitted probe signal by performing delay-and-sum beamforming and pulse compression. The result is filtered by a 2D constant false alarm rate (CFAR) detector to identify reflection patterns corresponding to potential targets. Closely spaced signals for multiple probe transmissions are combined into blobs to avoid multiple detections of a single object. A track-before-detect method using a Nearly Constant Velocity (NCV) model is employed to track multiple objects. The position and velocity are estimated by the debiased converted measurement Kalman filter. Results are analyzed for simulated scenarios and for experiments at sea, where GPS-tagged gilt-head seabream fish were tracked. Compared to two benchmark schemes, the results show favorable track continuity and accuracy that is robust to the choice of detection threshold.
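The 2D CFAR step can be pictured with a cell-averaging CFAR (CA-CFAR), one common variant; the abstract does not say which CFAR variant the authors use. In this Python sketch, each cell under test is compared with `scale` times the mean power of the surrounding training cells, skipping a band of guard cells. The parameters `guard`, `train`, and `scale` are hypothetical, and border cells without a full window are simply left undetected.

```python
from typing import List


def ca_cfar_2d(power: List[List[float]], guard: int, train: int,
               scale: float) -> List[List[bool]]:
    """Cell-averaging CFAR on a 2D power map (e.g. the time-space matrix)."""
    rows, cols = len(power), len(power[0])
    half = guard + train  # half-width of the full window around each cell
    detections = [[False] * cols for _ in range(rows)]
    for i in range(half, rows - half):
        for j in range(half, cols - half):
            total, count = 0.0, 0
            for di in range(-half, half + 1):
                for dj in range(-half, half + 1):
                    if abs(di) <= guard and abs(dj) <= guard:
                        continue  # skip the guard region and the cell under test
                    total += power[i + di][j + dj]
                    count += 1
            noise_level = total / count  # local noise estimate from training cells
            detections[i][j] = power[i][j] > scale * noise_level
    return detections
```

On a 9x9 grid of unit noise power with a single cell of power 20 at the center, `guard=1`, `train=2`, and `scale=3.0` flag only the center cell.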
Application of Deep Learning Methods to Processing of Noisy Medical Video Data
Authors: Danil Afonchikov, Elena Kornaeva, Irina Makovik, Alexey Kornaev
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.10319
Pdf link: https://arxiv.org/pdf/2404.10319
Abstract Cell counting becomes a challenging problem when the cells move in a continuous stream and their boundaries are difficult to detect visually. To resolve this problem, we modified the training and decision-making processes using curriculum learning and multi-view prediction techniques, respectively.
CARE to Compare: A real-world dataset for anomaly detection in wind turbine data
Authors: Christian Gück, Cyriana M. A. Roelofs, Stefan Faulstich
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.10320
Pdf link: https://arxiv.org/pdf/2404.10320
Abstract Anomaly detection plays a crucial role in the field of predictive maintenance for wind turbines, yet comparing different algorithms is difficult because domain-specific public datasets are scarce. Many comparisons of different approaches use benchmarks composed of data from many different domains, inaccessible data, or one of the few publicly available datasets, which lack detailed information about the faults. Moreover, many publications highlight only a couple of case studies where fault detection was successful. With this paper, we publish a high-quality dataset that contains data from 36 wind turbines across 3 different wind farms as well as, to the best of our knowledge, the most detailed fault information of any public wind turbine dataset. The new dataset contains 89 years' worth of real-world operating data of wind turbines, distributed across 44 labeled time frames for anomalies that led up to faults, as well as 51 time series representing normal behavior. Additionally, the quality of training data is ensured by turbine-status-based labels for each data point. Furthermore, we propose a new scoring method, called CARE (Coverage, Accuracy, Reliability and Earliness), which takes advantage of the information depth that is present in the dataset to identify a good all-around anomaly detection model. This score considers the anomaly detection performance, the ability to recognize normal behavior properly, and the capability to raise as few false alarms as possible while simultaneously detecting anomalies early.
Asset management, condition monitoring and Digital Twins: damage detection and virtual inspection on a reinforced concrete bridge
Authors: Arnulf Hagen, Trond Michael Andersen
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.10341
Pdf link: https://arxiv.org/pdf/2404.10341
Abstract In April 2021, Stava bridge, a main bridge on the E6 in Norway, was abruptly closed to traffic. A structural defect had seriously compromised the bridge's structural integrity. The Norwegian Public Roads Administration (NPRA) closed it, implemented a temporary solution, and reopened it with severe traffic restrictions. The incident was flagged by what constitutes the bridge's Digital Twin, which processes data from Internet of Things sensors. This solution was crucial for online and offline diagnostics, and the case demonstrates the value of such technologies both for tackling emerging dangerous situations and for acting preventively. Critical, rapidly developing damage was detected in time to stop its development, but not in time to avoid the incident altogether. The paper puts risk in a broader perspective for an organization responsible for highway infrastructure. It positions online monitoring and Digital Twins in the context of Risk- and Condition-Based Maintenance. The situation that arose at Stava bridge, and how it was detected, analyzed, and diagnosed during virtual inspection, is described. The case demonstrates how combining physics-based methods with Machine Learning can facilitate damage detection and diagnostics. A summary of lessons learnt, from both technical and organizational perspectives, is presented, together with plans for future work.
On the Universality of Spatially Coupled LDPC Codes Over Intersymbol Interference Channels
Authors: Mgeni Makambi Mashauri, Alexandre Graell i Amat, Michael Lentmaier
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2404.10348
Pdf link: https://arxiv.org/pdf/2404.10348
Abstract In this paper, we derive the exact input/output transfer functions of the optimal a-posteriori probability channel detector for a general ISI channel with erasures. Considering three channel impulse responses of different memory as an example, we compute the BP and MAP thresholds for regular spatially coupled LDPC codes with joint iterative detection and decoding. When we compare the results with the thresholds of ISI channels with Gaussian noise we observe an apparent inconsistency, i.e., a channel which performs better with erasures performs worse with AWGN. We show that this anomaly can be resolved by looking at the thresholds from an entropy perspective. We finally show that with spatial coupling we can achieve the symmetric information rates of different ISI channels using the same code.
Stampede Alert Clustering Algorithmic System Based on Tiny-Scale Strengthened DETR
Authors: Mingze Sun, Yiqing Wang, Zhenyi Zhao
Subjects: Social and Information Networks (cs.SI); Emerging Technologies (cs.ET)
Arxiv link: https://arxiv.org/abs/2404.10359
Pdf link: https://arxiv.org/pdf/2404.10359
Abstract A novel crowd stampede detection and prediction algorithm based on Deformable DETR is proposed to address the challenges of detecting a large number of small targets and target occlusion in crowded airport and train station environments. In terms of model design, the algorithm incorporates a multi-scale feature fusion module to enlarge the receptive field and enhance the detection capability for small targets. Furthermore, the deformable attention mechanism is improved to reduce missed detections and false alarms for critical targets. Additionally, a new algorithm is introduced for stampede event prediction and visualization. Experimental evaluations on the PKX-LHR dataset demonstrate that the enhanced algorithm achieves a 34% improvement in small-target detection accuracy while maintaining the original detection speed.
Camera clustering for scalable stream-based active distillation
Authors: Dani Manjah, Davide Cacciarelli, Christophe De Vleeschouwer, Benoit Macq
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.10411
Pdf link: https://arxiv.org/pdf/2404.10411
Abstract We present a scalable framework designed to craft efficient lightweight models for video object detection utilizing self-training and knowledge distillation techniques. We scrutinize methodologies for the ideal selection of training images from video streams and the efficacy of model sharing across numerous cameras. By advocating for a camera clustering methodology, we aim to diminish the requisite number of models for training while augmenting the distillation dataset. The findings affirm that proper camera clustering notably amplifies the accuracy of distilled models, eclipsing the methodologies that employ distinct models for each camera or a universal model trained on the aggregate camera data.
Community detection and anomaly prediction in dynamic networks
Authors: Hadiseh Safdari, Caterina De Bacco
Subjects: Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2404.10468
Pdf link: https://arxiv.org/pdf/2404.10468
Abstract Anomaly detection is an essential task in the analysis of dynamic networks, as it can provide early warning of potential threats or abnormal behavior. We present a principled approach to detect anomalies in dynamic networks that integrates community structure as a foundational model for regular behavior. Our model identifies anomalies as irregular edges while capturing structural changes. Leveraging a Markovian approach for temporal transitions and incorporating structural information via latent variables for communities and anomaly detection, our model infers these hidden parameters to pinpoint abnormal interactions within the network. Our approach is evaluated on both synthetic and real-world datasets. Real-world network analysis shows strong anomaly detection performance across diverse scenarios. In a more specific study of transfers of professional male football players, we observe various types of unexpected patterns and investigate how the country and wealth of clubs influence interactions. Additionally, we identify anomalies between clubs with incompatible community memberships, but also instances of anomalous transactions between clubs with similar memberships. The latter is due in particular to the dynamic nature of the transactions, as we find that the frequency of transfers produces anomalous behavior between clubs that would otherwise be expected to interact because they belong to similar communities.
How quickly can you pack short paths? Engineering a search-tree algorithm for disjoint s-t paths of bounded length
Authors: Michael Kiran Huber
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2404.10469
Pdf link: https://arxiv.org/pdf/2404.10469
Abstract We study the Short Path Packing problem which asks, given a graph $G$, integers $k$ and $\ell$, and vertices $s$ and $t$, whether there exist $k$ pairwise internally vertex-disjoint $s$-$t$ paths of length at most $\ell$. The problem has been proven to be NP-hard and fixed-parameter tractable parameterized by $k$ and $\ell$. Most previous research on this problem has been theoretical, with limited practical implementations. We present an exact FPT-algorithm based on a search-tree approach in combination with greedy localization. While its worst-case runtime complexity of $(k\cdot \ell^2)^{k\cdot \ell}\cdot n^{O(1)}$ is larger than that of the state of the art, the nature of search-tree algorithms allows for a broad range of potential optimizations. We exploit this potential by presenting techniques for input preprocessing, early detection of trivial and infeasible instances, and strategic selection of promising subproblems. These approaches were implemented and extensively tested on a large dataset of diverse graphs. The results show that our heuristic improvements are very effective and that, for the majority of instances, we can achieve fast runtimes.
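To make the problem statement concrete, the following Python sketch is a deliberately naive search tree for Short Path Packing, not the paper's FPT algorithm: it has no greedy localization, preprocessing, or pruning heuristics. It enumerates every simple s-t path with at most `max_len` edges, then branches on which candidate path to add next, keeping the chosen paths internally vertex-disjoint. Its running time is exponential, and the adjacency-list representation and function names are assumptions for illustration.

```python
from typing import Dict, List, Optional, Set

Graph = Dict[int, List[int]]  # adjacency lists (assumed representation)


def short_paths(g: Graph, s: int, t: int, max_len: int) -> List[List[int]]:
    """All simple s-t paths with at most max_len edges, found by DFS."""
    paths: List[List[int]] = []

    def dfs(v: int, path: List[int]) -> None:
        if v == t:
            paths.append(list(path))
            return
        if len(path) - 1 == max_len:  # length budget used up
            return
        for w in g.get(v, []):
            if w not in path:
                path.append(w)
                dfs(w, path)
                path.pop()

    dfs(s, [s])
    return paths


def pack_short_paths(g: Graph, s: int, t: int, k: int,
                     max_len: int) -> Optional[List[List[int]]]:
    """Return k internally vertex-disjoint s-t paths of length <= max_len, or None."""
    candidates = short_paths(g, s, t, max_len)
    chosen: List[List[int]] = []

    def search(start: int, used: Set[int]) -> bool:
        if len(chosen) == k:
            return True
        for idx in range(start, len(candidates)):
            inner = set(candidates[idx][1:-1])  # internal vertices only
            if inner & used:
                continue  # would share an internal vertex with a chosen path
            chosen.append(candidates[idx])
            if search(idx + 1, used | inner):
                return True
            chosen.pop()  # backtrack
        return False

    return list(chosen) if search(0, set()) else None
```

In a graph where vertex 0 reaches vertex 5 via 1, via 2, and via 3 then 4, three such paths exist for `max_len = 3` but not for `max_len = 2`.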
Toward a Realistic Benchmark for Out-of-Distribution Detection
Authors: Pietro Recalcati, Fabio Garcea, Luca Piano, Fabrizio Lamberti, Lia Morra
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.10474
Pdf link: https://arxiv.org/pdf/2404.10474
Abstract Deep neural networks are increasingly used in a wide range of technologies and services, but remain highly susceptible to out-of-distribution (OOD) samples, that is, samples drawn from a different distribution than the original training set. A common approach to address this issue is to endow deep neural networks with the ability to detect OOD samples. Several benchmarks have been proposed to design and validate OOD detection techniques. However, many of them are based on far-OOD samples drawn from very different distributions, and thus lack the complexity needed to capture the nuances of real-world scenarios. In this work, we introduce a comprehensive benchmark for OOD detection, based on ImageNet and Places365, that assigns individual classes as in-distribution or out-of-distribution depending on their semantic similarity with the training set. Several techniques can be used to determine which classes should be considered in-distribution, yielding benchmarks with varying properties. Experimental results on different OOD detection techniques show how their measured efficacy depends on the selected benchmark and how confidence-based techniques may outperform classifier-based ones on near-OOD samples.
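As one example of a confidence-based OOD score, the Python sketch below uses the maximum softmax probability (MSP): an input is flagged as OOD when the classifier's highest class probability falls below a threshold. Which confidence-based techniques the benchmark evaluates is not listed in the abstract, and the threshold here is a hypothetical value that would normally be tuned on held-out data.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(logits: Sequence[float]) -> float:
    """Maximum softmax probability: higher means more in-distribution-like."""
    return max(softmax(logits))


def is_ood(logits: Sequence[float], threshold: float) -> bool:
    """Flag an input as OOD when the model is not confident enough (threshold is assumed)."""
    return msp_score(logits) < threshold
```

A confident prediction such as logits [10.0, 0.0, 0.0] passes a 0.5 threshold, while nearly flat logits such as [0.1, 0.0, 0.0] (MSP about 0.36) are flagged.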
Efficient optimal dispersed Haar-like filters for face detection
Authors: Zeinab Sedaghatjoo, Hossein Hosseinzadeh, Ahmad shirzadi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2404.10476
Pdf link: https://arxiv.org/pdf/2404.10476
Abstract This paper introduces a new dispersed Haar-like filter for efficient face detection. The basic idea for finding the filter is maximising between-class variance and minimising within-class variance. The proposed filters can be considered optimally configured dispersed Haar-like filters, that is, filters with disjoint black and white parts.
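A minimal Python sketch of the idea under stated assumptions: a dispersed filter is represented as two disjoint sets of pixel coordinates (white and black), its response is the sum over white pixels minus the sum over black pixels, and candidate filters are ranked by a Fisher-style ratio of between-class variance to within-class variance on face and non-face samples. The ratio is an assumed way to combine the two objectives; the paper's exact criterion and optimization procedure are not given in the abstract, and the function names are hypothetical.

```python
from statistics import mean, pvariance
from typing import List, Sequence, Set, Tuple

Pixel = Tuple[int, int]  # (row, col)
Image = Sequence[Sequence[float]]


def haar_response(img: Image, white: Set[Pixel], black: Set[Pixel]) -> float:
    """Response of a dispersed filter: white and black parts need not be rectangles."""
    if white & black:
        raise ValueError("white and black parts must be disjoint")
    return sum(img[r][c] for r, c in white) - sum(img[r][c] for r, c in black)


def fisher_score(faces: List[Image], non_faces: List[Image],
                 white: Set[Pixel], black: Set[Pixel]) -> float:
    """Assumed criterion: between-class variance divided by within-class variance."""
    f = [haar_response(img, white, black) for img in faces]
    g = [haar_response(img, white, black) for img in non_faces]
    between = (mean(f) - mean(g)) ** 2
    within = pvariance(f) + pvariance(g)
    return between / within if within > 0 else float("inf")
```

On tiny 2x2 samples where "faces" are bright on the diagonal, a diagonal white/black pattern scores higher than a filter that only compares the two top pixels, which is the behavior the criterion is meant to reward.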
An Enhanced Differential Grouping Method for Large-Scale Overlapping Problems
Authors: Maojiang Tian, Mingke Chen, Wei Du, Yang Tang, Yaochu Jin
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2404.10515
Pdf link: https://arxiv.org/pdf/2404.10515
Abstract Large-scale overlapping problems are prevalent in practical engineering applications, and the optimization challenge is significantly amplified due to the existence of shared variables. Decomposition-based cooperative coevolution (CC) algorithms have demonstrated promising performance in addressing large-scale overlapping problems. However, current CC frameworks designed for overlapping problems rely on grouping methods for the identification of overlapping problem structures and the current grouping methods for large-scale overlapping problems fail to consider both accuracy and efficiency simultaneously. In this article, we propose a two-stage enhanced grouping method for large-scale overlapping problems, called OEDG, which achieves accurate grouping while significantly reducing computational resource consumption. In the first stage, OEDG employs a grouping method based on the finite differences principle to identify all subcomponents and shared variables. In the second stage, we propose two grouping refinement methods, called subcomponent union detection (SUD) and subcomponent detection (SD), to enhance and refine the grouping results. SUD examines the information of the subcomponents and shared variables obtained in the previous stage, and SD corrects inaccurate grouping results. To better verify the performance of the proposed OEDG, we propose a series of novel benchmarks that consider various properties of large-scale overlapping problems, including the topology structure, overlapping degree, and separability. Extensive experimental results demonstrate that OEDG is capable of accurately grouping different types of large-scale overlapping problems while consuming fewer computational resources. Finally, we empirically verify that the proposed OEDG can effectively improve the optimization performance of diverse large-scale overlapping problems.
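The finite-difference principle behind differential grouping can be sketched as follows: variables i and j are treated as interacting when the change in f caused by perturbing x_i depends on the value of x_j. This Python sketch applies that test to every pair and merges interacting variables with union-find. It is classic-style pairwise grouping, not OEDG's two-stage procedure, and `delta`, `eps`, and the function names are hypothetical. It also merges overlapping subcomponents linked by a shared variable into a single group, which is the situation that overlap-aware methods such as OEDG aim to handle more carefully.

```python
from typing import Callable, Dict, List, Sequence

Objective = Callable[[Sequence[float]], float]


def interacts(f: Objective, x: Sequence[float], i: int, j: int,
              delta: float = 1.0, eps: float = 1e-6) -> bool:
    """Finite-difference interaction test between variables i and j."""
    shifted_i = list(x)
    shifted_i[i] += delta
    d1 = f(shifted_i) - f(list(x))  # effect of moving x_i at the base point
    shifted_j = list(x)
    shifted_j[j] += delta
    shifted_ij = list(shifted_j)
    shifted_ij[i] += delta
    d2 = f(shifted_ij) - f(shifted_j)  # same move of x_i after moving x_j
    return abs(d1 - d2) > eps


def group_variables(f: Objective, x: Sequence[float]) -> List[List[int]]:
    """Pairwise grouping: connected components of the interaction graph."""
    n = len(x)
    parent = list(range(n))

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i in range(n):
        for j in range(i + 1, n):
            if interacts(f, x, i, j):
                parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())
```

For f(x) = x0*x1 + x1*x2 + x3^2 evaluated around the origin, the shared variable x1 links {x0, x1} and {x1, x2}, so the sketch returns [[0, 1, 2], [3]].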
A Game-Theoretic Approach for PMU Deployment Against False Data Injection Attacks
Authors: Sajjad Maleki, Subhash Lakshminarayana, E. Veronica Belmega, Carsten Maple
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2404.10520
Pdf link: https://arxiv.org/pdf/2404.10520
Abstract Phasor Measurement Units (PMUs) are used in the measurement, control and protection of power grids. However, deploying PMUs at every bus in a power system is prohibitively expensive, necessitating partial PMU placement that can ensure system observability with minimal units. One consequence of this economic approach is increased system vulnerability to False Data Injection Attacks (FDIAs). This paper proposes a zero-sum game-based approach to strategically place an additional PMU (following the initial optimal PMU deployment that ensures full observability) to bolster robustness against FDIAs by introducing redundancy in attack-susceptible areas. To compute the Nash equilibrium (NE) solution, we leverage a reinforcement learning algorithm that mitigates the need for complete knowledge of the opponent's actions. The proposed PMU deployment algorithm increases the detection rate of FDIA by 36% compared to benchmark algorithms.
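The paper computes the Nash equilibrium with reinforcement learning. As a simpler, swapped-in technique for the same zero-sum setting, fictitious play has the defender (choosing where to place the extra PMU) and the attacker (choosing where to inject false data) repeatedly best-respond to each other's empirical play. The function names and payoff matrices below are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def fictitious_play(payoff: Matrix, iterations: int = 20000) -> Tuple[List[float], List[float]]:
    """Approximate a mixed Nash equilibrium of a zero-sum matrix game.

    payoff[i][j] is the row player's (defender's) gain when the defender picks i
    and the column player (attacker) picks j; the attacker receives -payoff[i][j].
    Each round, both players best-respond to the opponent's empirical mix; in
    zero-sum games the empirical frequencies converge to an equilibrium.
    """
    m, n = len(payoff), len(payoff[0])
    row_counts, col_counts = [0] * m, [0] * n
    r, c = 0, 0  # arbitrary opening moves
    for _ in range(iterations):
        row_counts[r] += 1
        col_counts[c] += 1
        # Defender maximizes expected gain against the attacker's history.
        r = max(range(m), key=lambda i: sum(payoff[i][j] * col_counts[j] for j in range(n)))
        # Attacker minimizes the defender's expected gain against its history.
        c = min(range(n), key=lambda j: sum(payoff[i][j] * row_counts[i] for i in range(m)))
    return [k / iterations for k in row_counts], [k / iterations for k in col_counts]


def game_value(payoff: Matrix, p: List[float], q: List[float]) -> float:
    """Expected defender payoff under mixed strategies p (rows) and q (columns)."""
    return sum(p[i] * payoff[i][j] * q[j] for i in range(len(p)) for j in range(len(q)))


# Matching-pennies-style game: the unique equilibrium mixes 50/50 with value 0.
p, q = fictitious_play([[1.0, -1.0], [-1.0, 1.0]])
print(round(p[0], 2), round(q[0], 2))
```

In a PMU setting, rows might be candidate buses for the additional PMU and columns attack targets, with entries such as an estimated detection probability; those payoffs would have to come from the grid model. Unlike the RL approach in the abstract, fictitious play assumes the full payoff matrix is known.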
SEVD: Synthetic Event-based Vision Dataset for Ego and Fixed Traffic Perception
Authors: Manideep Reddy Aliminati, Bharatesh Chakravarthi, Aayush Atul Verma, Arpitsinh Vaghela, Hua Wei, Xuesong Zhou, Yezhou Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.10540
Pdf link: https://arxiv.org/pdf/2404.10540
Abstract Recently, event-based vision sensors have gained attention for autonomous driving applications, as conventional RGB cameras face limitations in handling challenging dynamic conditions. However, the availability of real-world and synthetic event-based vision datasets remains limited. In response to this gap, we present SEVD, a first-of-its-kind synthetic event-based dataset for multi-view ego and fixed perception, recorded with multiple dynamic vision sensors within the CARLA simulator. Data sequences are recorded across diverse lighting (noon, nighttime, twilight) and weather conditions (clear, cloudy, wet, rainy, foggy) with domain shifts (discrete and continuous). SEVD spans urban, suburban, rural, and highway scenes featuring various classes of objects (car, truck, van, bicycle, motorcycle, and pedestrian). Alongside event data, SEVD includes RGB imagery, depth maps, optical flow, and semantic and instance segmentation, facilitating a comprehensive understanding of the scene. Furthermore, we evaluate the dataset using state-of-the-art event-based (RED, RVT) and frame-based (YOLOv8) methods for traffic participant detection and provide baseline benchmarks. Additionally, we conduct experiments to assess the generalization capabilities of the synthetic event-based dataset. The dataset is available at https://eventbasedvision.github.io/SEVD
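Frame-based detectors such as YOLOv8 cannot consume a raw event stream directly. A common preprocessing step, assumed here and not necessarily the one used for SEVD, bins the events from a time window into a per-polarity count image. The `(timestamp, x, y, polarity)` tuple layout and the function name are hypothetical.

```python
from typing import List, Tuple

# Hypothetical event layout: (timestamp_us, x, y, polarity), polarity +1 (ON) or -1 (OFF).
Event = Tuple[int, int, int, int]


def events_to_histogram(events: List[Event], width: int, height: int,
                        t_start: int, t_end: int) -> List[List[List[int]]]:
    """Accumulate events with t_start <= t < t_end into a 2 x height x width count image.

    Channel 0 counts ON events and channel 1 counts OFF events; events outside
    the window or the sensor bounds are ignored.
    """
    frame = [[[0] * width for _ in range(height)] for _ in range(2)]
    for t, x, y, p in events:
        if t_start <= t < t_end and 0 <= x < width and 0 <= y < height:
            channel = 0 if p > 0 else 1
            frame[channel][y][x] += 1
    return frame


events = [(10, 1, 0, 1), (20, 1, 0, 1), (30, 2, 1, -1), (99, 0, 0, 1)]
hist = events_to_histogram(events, width=3, height=2, t_start=0, t_end=50)
print(hist[0][0][1], hist[1][1][2], hist[0][0][0])  # → 2 1 0
```

Count histograms discard timing within the window; richer encodings keep more temporal detail at the cost of more channels.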
Shining Light into the Tunnel: Understanding and Classifying Network Traffic of Residential Proxies
Authors: Ronghong Huang, Dongfang Zhao, Xianghang Mi, Xiaofeng Wang
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2404.10610
Pdf link: https://arxiv.org/pdf/2404.10610
Abstract Residential proxies (RESIPs), which have emerged in recent years, have several characteristics that distinguish them from traditional network proxies (e.g., commercial VPNs): in particular, they are deployed in residential networks rather than data center networks, they are distributed worldwide across tens of thousands of cities and ISPs, and they operate at a scale of millions of exit nodes. Together, these factors allow RESIP users to effectively masquerade their traffic flows as those of authentic residential users, which has led to the increasing adoption of RESIP services, especially for malicious online activities. However, current understanding of the (malicious) usage of RESIPs (i.e., what traffic is relayed by RESIPs) remains insufficient. In particular, previous work on RESIP traffic studied only the maliciousness of web traffic destinations and the suspicious patterns of visits to popular websites. Moreover, a general methodology for capturing large-scale RESIP traffic and analyzing it for security risks is missing. Furthermore, since many RESIP nodes are found in corporate networks and are deployed without proper authorization from device owners or network administrators, it is becoming increasingly necessary to detect and block RESIP traffic flows; unfortunately, this is impeded by the scarcity of realistic RESIP traffic datasets and effective detection methodologies. To fill these gaps, this study designs and implements multiple novel tools, including a general framework to deploy RESIP nodes and collect RESIP traffic in a distributed manner, a RESIP traffic analyzer to efficiently process RESIP traffic logs and surface suspicious traffic flows, and multiple machine-learning-based RESIP traffic classifiers to detect, timely and accurately, whether a given traffic flow is RESIP traffic.
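The abstract does not name the classifiers or features. As a deliberately simple stand-in, the sketch computes per-flow statistics and classifies with a nearest-centroid rule. The packet format, feature set, function names, and the `resip`/`direct` labels are all hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical packet record for one flow: (timestamp in seconds, payload bytes).
Packet = Tuple[float, int]


def flow_features(packets: List[Packet]) -> List[float]:
    """[packet_count, mean_size, mean_inter_arrival, duration] for a non-empty flow."""
    if not packets:
        raise ValueError("flow must contain at least one packet")
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    gaps = [b - a for a, b in zip(times, times[1:])]
    return [float(len(packets)), float(mean(sizes)),
            mean(gaps) if gaps else 0.0, times[-1] - times[0]]


def fit_centroids(samples: List[List[float]], labels: List[str]) -> Dict[str, List[float]]:
    """Nearest-centroid training: average the feature vectors of each class."""
    by_label: Dict[str, List[List[float]]] = {}
    for features, label in zip(samples, labels):
        by_label.setdefault(label, []).append(features)
    return {label: [mean(col) for col in zip(*rows)] for label, rows in by_label.items()}


def predict(centroids: Dict[str, List[float]], features: List[float]) -> str:
    """Label of the centroid closest to `features` in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


print(flow_features([(0.0, 100), (0.5, 300), (1.5, 200)]))  # → [3.0, 200.0, 0.75, 1.5]
centroids = fit_centroids([[1.0, 1.0], [1.0, 2.0], [10.0, 10.0], [10.0, 11.0]],
                          ["resip", "resip", "direct", "direct"])
print(predict(centroids, [2.0, 2.0]))  # → resip
```

The features are unscaled here; in practice byte counts would dwarf time gaps, so a real classifier would normalize them first.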
A Calibrated and Automated Simulator for Innovations in 5G
Authors: Conrado Boeira, Antor Hasan, Khaleda Papry, Yue Ju, Zhongwen Zhu, Israat Haque
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2404.10643
Pdf link: https://arxiv.org/pdf/2404.10643
Abstract The rise of 5G deployments has created an environment in which many emerging technologies can flourish. Self-driving vehicles, Augmented and Virtual Reality, and remote operations are examples of applications that leverage the support of 5G networks for extremely low latency, high bandwidth, and increased throughput. However, the complex architecture of 5G hinders innovation because testbeds or realistic simulators with adequate 5G functionality are hard to access. In addition, configuring and managing simulators is complex and time-consuming. Finally, the lack of adequately representative data hinders data-driven designs in 5G campaigns. We therefore calibrated a system-level open-source simulator, Simu5G, following 3GPP guidelines to enable faster innovation in the 5G domain. Furthermore, we developed an API for automatic simulator configuration that does not require knowledge of the underlying architectural details. Finally, we demonstrate the usage of the calibrated and automated simulator by developing an ML-based anomaly detector for a 5G Radio Access Network (RAN).
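The abstract does not describe the anomaly detector itself. As a hypothetical baseline for monitoring a RAN KPI series, a trailing-window z-score rule flags samples that sit far from recent behaviour. The window size, threshold, and KPI values are illustrative assumptions.

```python
from statistics import mean, pstdev
from typing import List


def zscore_anomalies(series: List[float], window: int = 5, threshold: float = 3.0) -> List[int]:
    """Indices whose value deviates from the trailing-window mean by more than
    `threshold` population standard deviations. The first `window` points are
    never flagged because they have no full history."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change as anomalous.
            if series[i] != mu:
                flagged.append(i)
        elif abs(series[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


kpi = [10.0, 10.5, 9.8, 10.2, 10.1, 10.3, 25.0, 10.0, 9.9]  # hypothetical throughput samples
print(zscore_anomalies(kpi))  # → [6]
```

Note that the spike inflates the window statistics for the next few samples, which is why points right after it are not flagged; a learned model, as in the paper, would be expected to handle such effects better.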
Assessing The Impact of CNN Auto Encoder-Based Image Denoising on Image Classification Tasks
Authors: Mohsen Hami, Mahdi JameBozorg
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2404.10664
Pdf link: https://arxiv.org/pdf/2404.10664
Abstract Images captured in the real world are often affected by different types of noise, which can significantly impact the performance of computer vision systems and the quality of visual data. This study presents a novel approach for defect detection in noisy images of cast products, specifically submersible pump impellers. The methodology uses deep learning models such as VGG16 and InceptionV3, among others, in both the spatial and frequency domains to identify noise types and defect status. The process begins with image preprocessing, followed by denoising techniques tailored to specific noise categories. The goal is to enhance the accuracy and robustness of defect detection by integrating noise detection and denoising into the classification pipeline. Using VGG16 for noise-type classification in the frequency domain, the study achieved an accuracy of over 99%. Removal of salt-and-pepper noise resulted in an average SSIM of 87.9, Gaussian noise removal in an average SSIM of 64.0, and periodic noise removal in an average SSIM of 81.6. This comprehensive approach demonstrates the effectiveness of the deep autoencoder model and the median filter as denoising strategies in real-world industrial applications. Finally, our study reports significant improvements in binary classification accuracy for defect detection compared to previous methods. For the VGG16 classifier, accuracy increased from 94.6% to 97.0%, demonstrating the effectiveness of the proposed noise detection and denoising approach. Similarly, for the InceptionV3 classifier, accuracy improved from 84.7% to 90.0%, further validating the benefits of integrating noise analysis into the classification pipeline.
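The abstract credits a median filter for salt-and-pepper removal. A minimal 3x3 median filter on a grayscale image stored as nested lists is sketched below; the kernel size and the edge-replication border handling are assumptions, since the abstract states neither.

```python
from statistics import median
from typing import List


def median_filter3x3(img: List[List[int]]) -> List[List[int]]:
    """3x3 median filter with edge replication at the borders.

    Isolated extreme pixels (salt or pepper) are replaced by the median of their
    neighbourhood, which keeps edges sharper than a mean filter would.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            out[y][x] = int(median(window))  # 9 values, so the median is one of them
    return out


noisy = [[100, 100, 100],
         [100, 255, 100],   # a single "salt" pixel
         [100, 100, 100]]
print(median_filter3x3(noisy))  # → [[100, 100, 100], [100, 100, 100], [100, 100, 100]]
```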
Network architecture search of X-ray based scientific applications
Authors: Adarsha Balaji, Ramyad Hadidi, Gregory Kollmer, Mohammed E. Fouda, Prasanna Balaprakash
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2404.10689
Pdf link: https://arxiv.org/pdf/2404.10689
Abstract X-ray and electron diffraction-based microscopy use Bragg peak detection and ptychography to perform 3-D imaging at atomic resolution. Typically, these techniques are implemented using computationally complex tasks such as fitting a pseudo-Voigt function or solving a complex inverse problem. Recently, the use of deep neural networks has improved on the existing state-of-the-art approaches. However, the design and development of the neural network models depend on time- and labor-intensive tuning by application experts. To that end, we propose a hyperparameter search (HPS) and neural architecture search (NAS) approach to automate the design and optimization of the neural network models for model size, energy consumption, and throughput. We demonstrate the improved performance of the auto-tuned models compared to the manually tuned BraggNN and PtychoNN benchmarks. We study and demonstrate the importance of exploring the search space of tunable hyperparameters in enhancing the performance of Bragg peak detection and ptychographic reconstruction. Our NAS and HPS of (1) BraggNN achieves a 31.03% improvement in Bragg peak detection accuracy with an 87.57% reduction in model size, and (2) PtychoNN achieves a 16.77% improvement in model accuracy and a 12.82% reduction in model size when compared to the baseline PtychoNN model. When inferred on the Orin-AGX edge platform, the optimized BraggNN and PtychoNN models demonstrate a 10.51% and 9.47% reduction in inference latency and a 44.18% and 15.34% reduction in energy consumption, respectively, compared to their baselines.
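The pseudo-Voigt function mentioned above is a standard peak model: a weighted sum of a Lorentzian and a Gaussian. The sketch assumes both components share one full width at half maximum and that the profile is area-normalized; other parameterizations exist, and this is not taken from the paper.

```python
import math


def pseudo_voigt(x: float, center: float, fwhm: float, eta: float) -> float:
    """Area-normalized pseudo-Voigt profile: eta * Lorentzian + (1 - eta) * Gaussian.

    Both components share the same FWHM; eta in [0, 1] is the Lorentzian fraction.
    """
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))  # Gaussian std from FWHM
    gamma = fwhm / 2.0                                     # Lorentzian half width at half max
    d = x - center
    gauss = math.exp(-d * d / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))
    lorentz = gamma / (math.pi * (d * d + gamma * gamma))
    return eta * lorentz + (1.0 - eta) * gauss


peak = pseudo_voigt(0.0, center=0.0, fwhm=2.0, eta=0.3)
half = pseudo_voigt(1.0, center=0.0, fwhm=2.0, eta=0.3)
print(round(half / peak, 6))  # → 0.5
```

Because both components are at half their maximum at one half-width from the center, the mixture is too, so `fwhm` keeps its meaning for any `eta`.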
GazeHTA: End-to-end Gaze Target Detection with Head-Target Association
Authors: Zhi-Yi Lin, Jouh Yeong Chew, Jan van Gemert, Xucong Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.10718
Pdf link: https://arxiv.org/pdf/2404.10718
Abstract We propose an end-to-end approach for gaze target detection: predicting a head-target connection between individuals and the target image regions they are looking at. Most existing methods rely on independent components such as off-the-shelf head detectors, or have difficulty establishing associations between heads and gaze targets. In contrast, we investigate an end-to-end multi-person Gaze target detection framework with Heads and Targets Association (GazeHTA), which predicts multiple head-target instances based solely on the input scene image. GazeHTA addresses challenges in gaze target detection by (1) leveraging a pre-trained diffusion model to extract scene features for rich semantic understanding, (2) re-injecting a head feature to enhance the head priors for improved head understanding, and (3) learning a connection map as the explicit visual associations between heads and gaze targets. Our extensive experimental results demonstrate that GazeHTA outperforms state-of-the-art gaze target detection methods and two adapted diffusion-based baselines on two standard datasets.
Watch Your Step: Optimal Retrieval for Continual Learning at Scale
Authors: Truman Hickok, Dhireesha Kudithipudi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.10758
Pdf link: https://arxiv.org/pdf/2404.10758
Abstract One of the most widely used approaches in continual learning is replay. Replay methods support interleaved learning by storing past experiences in a replay buffer. Although there are methods for selectively constructing the buffer and reprocessing its contents, the problem of selectively retrieving samples from the buffer has received limited exploration. Current solutions have been tested in limited settings and, more importantly, in isolation. Existing work has also not explored the impact of duplicate replays on performance. In this work, we propose a framework for evaluating selective retrieval strategies, categorized by simple, independent class- and sample-selective primitives. We evaluate several combinations of existing strategies for selective retrieval and report their performance. Furthermore, we propose a set of strategies to prevent duplicate replays and explore whether new samples with low loss values can be learned without replay. To match our problem setting to a realistic continual learning pipeline, we restrict our experiments to a setting involving a large, pre-trained, open-vocabulary object detection model that is fully fine-tuned on a sequence of 15 datasets.
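Replay buffers are often filled by reservoir sampling. The sketch pairs that with one simple way to prevent duplicate replays: sampling without replacement until every stored sample has been replayed once, then starting a new pass. This is an illustrative strategy, not necessarily one of the paper's; the class and method names are hypothetical.

```python
import random
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class ReplayBuffer(Generic[T]):
    """Reservoir-sampled buffer with duplicate-free retrieval within a pass."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        self.capacity = capacity
        self.items: List[T] = []
        self.seen = 0                    # total samples offered to the buffer
        self.unreplayed: List[int] = []  # slots not yet replayed in this pass
        self.rng = random.Random(seed)

    def add(self, item: T) -> None:
        """Reservoir sampling: each offered item is kept with probability capacity / seen."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
            self.unreplayed.append(len(self.items) - 1)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item
                if j not in self.unreplayed:
                    self.unreplayed.append(j)  # the new occupant has not been replayed yet

    def retrieve(self, k: int) -> List[T]:
        """Return up to k stored items; no slot repeats until all slots were replayed."""
        batch: List[T] = []
        while len(batch) < k and self.items:
            if not self.unreplayed:
                self.unreplayed = list(range(len(self.items)))  # begin a new pass
            idx = self.unreplayed.pop(self.rng.randrange(len(self.unreplayed)))
            batch.append(self.items[idx])
        return batch


buf: ReplayBuffer[int] = ReplayBuffer(capacity=10, seed=0)
for sample in range(10):
    buf.add(sample)
replayed = buf.retrieve(4) + buf.retrieve(4) + buf.retrieve(2)
print(sorted(replayed))  # → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

A batch that straddles two passes can still contain a repeat; class- or loss-based selection would replace the uniform draw in `retrieve`.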
Learning Feature Inversion for Multi-class Anomaly Detection under General-purpose COCO-AD Benchmark
Authors: Jiangning Zhang, Chengjie Wang, Xiangtai Li, Guanzhong Tian, Zhucun Xue, Yong Liu, Guansong Pang, Dacheng Tao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.10760
Pdf link: https://arxiv.org/pdf/2404.10760
Abstract Anomaly detection (AD) often focuses on detecting anomalous regions for industrial quality inspection and medical lesion examination. However, because AD targets specific scenarios, its data scale is relatively small, and its evaluation metrics are still deficient compared to classic vision tasks such as object detection and semantic segmentation. To fill these gaps, this work first constructs a large-scale, general-purpose COCO-AD dataset by extending COCO to the AD field. This enables fair evaluation and sustainable development of different methods on this challenging benchmark. Moreover, current metrics such as AU-ROC have nearly reached saturation on simple datasets, which prevents a comprehensive evaluation of different methods. Inspired by the metrics in the segmentation field, we further propose several more practical threshold-dependent AD-specific metrics, i.e., m$F_1$$^{.2}_{.8}$, mAcc$^{.2}_{.8}$, mIoU$^{.2}_{.8}$, and mIoU-max. Motivated by the high-quality reconstruction capability of GAN inversion, we propose a simple but more powerful InvAD framework to achieve high-quality feature reconstruction. Our method improves the effectiveness of reconstruction-based methods on the popular MVTec AD and VisA datasets and on our newly proposed COCO-AD dataset under a multi-class unsupervised setting, where only a single detection model is trained to detect anomalies from different classes. Extensive ablation experiments demonstrate the effectiveness of each component of InvAD. Full code and models are available at https://github.com/zhangzjn/ader.
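The abstract does not define the threshold-dependent metrics. One plausible reading of m$F_1$$^{.2}_{.8}$ is pixel-level F1 averaged over binarization thresholds from 0.2 to 0.8 applied to normalized anomaly scores. The sketch assumes that reading and an evenly spaced grid of seven thresholds; both are assumptions to check against the paper, and the function names are hypothetical.

```python
from typing import List, Sequence


def f1_at_threshold(scores: Sequence[float], labels: Sequence[int], t: float) -> float:
    """Pixel-level F1 after marking every pixel with score >= t as anomalous."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
    denom = 2 * tp + fp + fn
    # No ground-truth and no predicted anomalies at all: count it as perfect.
    return 2 * tp / denom if denom else 1.0


def mean_f1(scores: Sequence[float], labels: Sequence[int],
            lo: float = 0.2, hi: float = 0.8, steps: int = 7) -> float:
    """Average F1 over `steps` evenly spaced thresholds in [lo, hi] (grid is assumed)."""
    thresholds: List[float] = [lo + k * (hi - lo) / (steps - 1) for k in range(steps)]
    return sum(f1_at_threshold(scores, labels, t) for t in thresholds) / len(thresholds)


print(mean_f1([0.9, 0.95, 0.1, 0.05], [1, 1, 0, 0]))  # → 1.0
```

Averaging over a threshold range rewards score maps that separate normal and anomalous pixels robustly, rather than only at a single tuned threshold.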
Keyword: face recognition

Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data
Authors: Ivan DeAndres-Tame, Ruben Tolosana, Pietro Melzi, Ruben Vera-Rodriguez, Minchul Kim, Christian Rathgeb, Xiaoming Liu, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia, Zhizhou Zhong, Yuge Huang, Yuxi Mi, Shouhong Ding, Shuigeng Zhou, Shuai He, Lingzhi Fu, Heng Cong, Rongyu Zhang, Zhihong Xiao, Evgeny Smirnov, Anton Pimenov, Aleksei Grigorev, Denis Timoshenko, Kaleb Mesfin Asfaw, Cheng Yaw Low, Hao Liu, Chuyi Wang, Qing Zuo, Zhixiang He, Hatef Otroshi Shahreza, Anjith George, Alexander Unnervik, Parsa Rahimi, Sébastien Marcel, Pedro C. Neto, Marco Huber, Jan Niklas Kolf, Naser Damer, Fadi Boutros, Jaime S. Cardoso, Ana F. Sequeira, Andrea Atzori, Gianni Fenu, Mirko Marras, Vitomir Štruc, Jiang Yu, Zhangjie Li, Jichun Li, Weisong Zhao, Zhen Lei, Xiangyu Zhu, Xiao-Yu Zhang, Bernardo Biesseck, et al. (4 additional authors not shown)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.10378
Pdf link: https://arxiv.org/pdf/2404.10378
Abstract Synthetic data is gaining increasing relevance for training machine learning models. This is mainly motivated by several factors, such as the lack of real data and intra-class variability, the time and errors involved in manual labeling, and, in some cases, privacy concerns. This paper presents an overview of the 2nd edition of the Face Recognition Challenge in the Era of Synthetic Data (FRCSyn) organized at CVPR 2024. FRCSyn aims to investigate the use of synthetic data in face recognition to address current technological limitations, including data privacy concerns, demographic biases, generalization to novel scenarios, and performance constraints in challenging situations such as aging, pose variations, and occlusions. Unlike the 1st edition, in which only synthetic data from the DCFace and GANDiffFace methods was allowed for training face recognition systems, in this 2nd edition we propose new sub-tasks that allow participants to explore novel face generative methods. The outcomes of the 2nd FRCSyn Challenge, along with the proposed experimental protocol and benchmarking, contribute significantly to the application of synthetic data to face recognition.
Adversarial Identity Injection for Semantic Face Image Synthesis
Authors: Giuseppe Tarollo, Tomaso Fontanini, Claudio Ferrari, Guido Borghi, Andrea Prati
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.10408
Pdf link: https://arxiv.org/pdf/2404.10408
Abstract Nowadays, deep learning models have reached incredible performance in the task of image generation. Many works address face generation and editing, to the point that both humans and automatic systems struggle to distinguish real faces from generated ones. While most systems reach excellent visual generation quality, they still have difficulty preserving the identity of the input subject. Among the explored techniques, Semantic Image Synthesis (SIS) methods, whose goal is to generate an image conditioned on a semantic segmentation mask, are the most promising, even though preserving the perceived identity of the input subject is not their main concern. Therefore, in this paper, we investigate the problem of identity preservation in face image generation and present an SIS architecture that exploits a cross-attention mechanism to merge identity, style, and semantic features to generate faces whose identities are as similar as possible to the input ones. Experimental results reveal that the proposed method is not only suitable for preserving identity but is also effective for face recognition adversarial attacks, i.e., hiding a second identity in the generated faces.
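Cross-attention lets one set of features attend to another. As a generic illustration, not the paper's architecture (learned projections and multiple heads are omitted), queries from one stream, say semantic features, gather information from keys and values of another stream, say identity features. All names below are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d)) V for one head.

    queries: n_q x d, keys: n_k x d, values: n_k x d_v; returns n_q x d_v.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


# Identical keys give uniform weights, so the output is the mean of the values.
print(cross_attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]], [[2.0, 0.0], [4.0, 2.0]]))
# → [[3.0, 1.0]]
```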
Keyword: augmentation

Vision Augmentation Prediction Autoencoder with Attention Design (VAPAAD)
Authors: Yiqiao Yin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.10096
Pdf link: https://arxiv.org/pdf/2404.10096
Abstract Despite significant advancements in sequence prediction, current methods lack attention-based mechanisms for next-frame prediction. Our work introduces VAPAAD (Vision Augmentation Prediction Autoencoder with Attention Design), an innovative model that enhances predictive performance by integrating attention designs, allowing nuanced understanding and handling of temporal dynamics in video sequences. Using the well-known Moving MNIST dataset, we demonstrate the robust performance of the proposed model and the potential applicability of such a design in the literature.
Shaping Realities: Enhancing 3D Generative AI with Fabrication Constraints
Authors: Faraz Faruqi, Yingtao Tian, Vrushank Phadnis, Varun Jampani, Stefanie Mueller
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.10142
Pdf link: https://arxiv.org/pdf/2404.10142
Abstract Generative AI tools are becoming more prevalent in 3D modeling, enabling users to manipulate or create new models with text or images as inputs. This makes it easier for users to rapidly customize and iterate on their 3D designs and explore new creative ideas. These methods focus on the aesthetic quality of the 3D models, refining them to look similar to the prompts provided by the user. However, when creating 3D models intended for fabrication, designers need to trade off the aesthetic qualities of a 3D model against its intended physical properties. To be functional post-fabrication, 3D models have to satisfy structural constraints informed by physical principles. Currently, such requirements are not enforced by generative AI tools. This leads to aesthetically appealing but potentially non-functional 3D geometry that would be hard to fabricate and use in the real world. This workshop paper highlights the limitations of generative AI tools in translating digital creations into the physical world and proposes new augmentations to generative AI tools for creating physically viable 3D models. We advocate for the development of tools that manipulate or generate 3D models by considering not only the aesthetic appearance but also using physical properties as constraints. This exploration seeks to bridge the gap between digital creativity and real-world applicability, extending the creative potential of generative AI into the tangible domain.
Awareness of uncertainty in classification using a multivariate model and multi-views
Authors: Alexey Kornaev, Elena Kornaeva, Oleg Ivanov, Ilya Pershin, Danis Alukaev
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.10314
Pdf link: https://arxiv.org/pdf/2404.10314
Abstract One way to make artificial intelligence more natural is to give it some room for doubt. Two main questions must be resolved to do so. First, how can a model be trained to estimate the uncertainty of its own predictions? And second, what should be done with uncertain predictions when they appear? First, we propose an uncertainty-aware negative log-likelihood loss, based on an N-dimensional multivariate normal distribution with a spherical covariance matrix, for solving N-class classification tasks. The loss is similar to the heteroscedastic regression loss. The proposed model regularizes uncertain predictions and learns to compute both the predictions and their uncertainty estimates. The model fits well with the label smoothing technique. Second, we expand the limits of data augmentation at the training and test stages and make the trained model give multiple predictions for a given number of augmented versions of each test sample. Given the multi-view predictions together with their uncertainties and confidences, we propose several methods to calculate final predictions, including mode values and bin counts with soft and hard weights. For the latter method, we formalize the model tuning task as multimodal optimization with a non-differentiable criterion of maximum accuracy and apply particle swarm optimization to solve it. The proposed methodology was tested on the CIFAR-10 dataset with clean and noisy labels and demonstrated good results compared with other uncertainty estimation methods related to sample selection, co-teaching, and label smoothing.
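The loss described above can be written as the negative log-likelihood of a one-hot target under a normal distribution N(mu, sigma^2 I) in N dimensions. The sketch assumes the network outputs a mean vector and a single log-variance per sample. The confidence-weighted vote over augmented views is a hypothetical simplification of the "bin counts with soft weights" idea, not the paper's exact rule.

```python
import math
from typing import List


def spherical_gaussian_nll(mean: List[float], log_var: float, target: List[float]) -> float:
    """NLL of `target` under N(mean, sigma^2 I), with log_var = log(sigma^2).

    0.5 * (||target - mean||^2 / sigma^2 + N * log(sigma^2) + N * log(2 * pi))
    A large error can be "explained" by a larger variance, but the N * log(sigma^2)
    term penalizes claiming too much uncertainty, as in heteroscedastic regression.
    """
    n = len(mean)
    sq_err = sum((t - m) ** 2 for t, m in zip(target, mean))
    return 0.5 * (sq_err * math.exp(-log_var) + n * log_var + n * math.log(2 * math.pi))


def one_hot(label: int, num_classes: int) -> List[float]:
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def aggregate_views(view_probs: List[List[float]], view_log_vars: List[float]) -> int:
    """Hypothetical soft-weighted vote: each view is weighted by its precision 1 / sigma^2."""
    num_classes = len(view_probs[0])
    totals = [0.0] * num_classes
    for probs, log_var in zip(view_probs, view_log_vars):
        weight = math.exp(-log_var)
        for c, p in enumerate(probs):
            totals[c] += weight * p
    return max(range(num_classes), key=lambda c: totals[c])


# The confident first view dominates the uncertain second one.
print(aggregate_views([[0.6, 0.4], [0.2, 0.8]], [0.0, 3.0]))  # → 0
```

For a fixed squared error e, the loss is minimized at sigma^2 = e / N, so the model is pushed to report larger variance exactly where its predictions are worse.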
Improving Bracket Image Restoration and Enhancement with Flow-guided Alignment and Enhanced Feature Aggregation
Authors: Wenjie Lin, Zhen Liu, Chengzhi Jiang, Mingyan Han, Ting Jiang, Shuaicheng Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.10358
Pdf link: https://arxiv.org/pdf/2404.10358
Abstract In this paper, we propose a novel framework for the Bracket Image Restoration and Enhancement (BracketIRE) task, which requires restoring a high-quality high dynamic range (HDR) image from a sequence of noisy, blurred, low dynamic range (LDR) multi-exposure RAW inputs. To overcome this challenge, we present IREANet, which improves multi-exposure alignment and aggregation with a Flow-guided Feature Alignment Module (FFAM) and an Enhanced Feature Aggregation Module (EFAM). Specifically, the proposed FFAM uses inter-frame optical flow as guidance to facilitate the deformable alignment and spatial attention modules for better feature alignment. The EFAM further employs the proposed Enhanced Residual Block (ERB) as a foundational component, in which a unidirectional recurrent network aggregates the aligned temporal features to better reconstruct the results. To improve model generalization and performance, we additionally employ the Bayer-preserving augmentation (BayerAug) strategy to augment the multi-exposure RAW inputs. Our experimental evaluations demonstrate that the proposed IREANet achieves state-of-the-art performance compared with previous methods.
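Flipping a RAW mosaic naively shifts its colour-filter-array (CFA) phase, so an RGGB image becomes GRBG. A minimal sketch of the Bayer-preserving idea for a horizontal flip, assuming an RGGB pattern and an even width: cropping one column on each side after the flip restores the phase. This may differ from the exact BayerAug implementation the authors use (padding before flipping is another option), and the function name is hypothetical.

```python
from typing import List, TypeVar

T = TypeVar("T")


def bayer_preserving_hflip(raw: List[List[T]]) -> List[List[T]]:
    """Horizontally flip a Bayer mosaic without changing its CFA phase.

    Assumes an even width. A plain flip turns RGGB into GRBG, so the first and
    last columns of the flipped image are dropped; the result starts with the
    same colour as the input and keeps an even width (W - 2).
    """
    width = len(raw[0])
    if width % 2 or width < 4:
        raise ValueError("expected an even width of at least 4")
    return [row[::-1][1:-1] for row in raw]


mosaic = [["R", "G", "R", "G"],
          ["G", "B", "G", "B"]]
print(bayer_preserving_hflip(mosaic))  # → [['R', 'G'], ['G', 'B']]
```

Row parity is untouched by a horizontal flip, so only columns need adjusting; a vertical flip would need the same treatment on rows.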
Offline Trajectory Generalization for Offline Reinforcement Learning
Authors: Ziqi Zhao, Zhaochun Ren, Liu Yang, Fajie Yuan, Pengjie Ren, Zhumin Chen, jun Ma, Xin Xin
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.10393
Pdf link: https://arxiv.org/pdf/2404.10393
Abstract Offline reinforcement learning (RL) aims to learn policies from static datasets of previously collected trajectories. Existing methods for offline RL either constrain the learned policy to the support of offline data or use model-based virtual environments to generate simulated rollouts. However, these methods suffer from (i) poor generalization to unseen states and (ii) only trivial improvement from low-quality rollout simulations. In this paper, we propose offline trajectory generalization through world transformers for offline reinforcement learning (OTTO). Specifically, we use causal Transformers, a.k.a. World Transformers, to predict state dynamics and the immediate reward. We then propose four strategies that use World Transformers to generate high-reward trajectory simulations by perturbing the offline data. Finally, we jointly use offline data and simulated data to train an offline RL algorithm. OTTO serves as a plug-in module and can be integrated with existing offline RL methods to enhance them with the better generalization capability of transformers and high-reward data augmentation. Through extensive experiments on D4RL benchmark datasets, we verify that OTTO significantly outperforms state-of-the-art offline RL methods.
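The abstract does not spell out the four perturbation strategies. As one hypothetical instance of the general pattern (perturb offline data, simulate with a learned model, keep high-reward results), the sketch adds Gaussian noise to logged actions, rolls them through a stand-in world model, and keeps rollouts whose predicted return exceeds the logged return. The world model is any callable here; no Transformer is implemented, and all names are hypothetical.

```python
import random
from typing import Callable, List, Tuple

State = List[float]
Action = List[float]
Transition = Tuple[State, Action, float, State]
# Stand-in for a learned world model: (state, action) -> (next_state, reward).
WorldModel = Callable[[State, Action], Tuple[State, float]]


def perturbed_rollout(model: WorldModel, start: State, actions: List[Action],
                      noise: float, rng: random.Random) -> Tuple[List[Transition], float]:
    """Re-simulate a logged action sequence with Gaussian noise added to each action."""
    transitions: List[Transition] = []
    total, state = 0.0, start
    for action in actions:
        noisy = [a + rng.gauss(0.0, noise) for a in action]
        next_state, reward = model(state, noisy)
        transitions.append((state, noisy, reward, next_state))
        total += reward
        state = next_state
    return transitions, total


def augment(model: WorldModel, start: State, actions: List[Action], logged_return: float,
            n_candidates: int = 8, noise: float = 0.1, seed: int = 0) -> List[List[Transition]]:
    """Keep simulated trajectories whose predicted return beats the logged one."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_candidates):
        trajectory, predicted_return = perturbed_rollout(model, start, actions, noise, rng)
        if predicted_return > logged_return:
            kept.append(trajectory)
    return kept


# Hypothetical toy dynamics: the state accumulates the action; reward = sum of the action.
def toy_model(state: State, action: Action) -> Tuple[State, float]:
    return [s + a for s, a in zip(state, action)], sum(action)


kept = augment(toy_model, [0.0], [[1.0]] * 3, logged_return=3.0, noise=0.5, seed=1)
print(len(kept), "of 8 simulated trajectories kept")
```

The kept transitions would then be mixed with the offline dataset before training the downstream offline RL algorithm, as the abstract describes.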
Self-Supervised Visual Preference Alignment
Authors: Ke Zhu, Liang Zhao, Zheng Ge, Xiangyu Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.10501
Pdf link: https://arxiv.org/pdf/2404.10501
Abstract This paper makes the first attempt at unsupervised preference alignment in Vision-Language Models (VLMs). We generate chosen and rejected responses for pairs of original and augmented images, and conduct preference alignment with direct preference optimization. The approach rests on a core idea: properly designed augmentation of the image input will induce the VLM to generate false but hard negative responses, which help the model learn from them and produce more robust and powerful answers. The whole pipeline no longer depends on supervision from GPT-4 or human involvement during alignment, and it is highly efficient, requiring only a few lines of code. With only 8k randomly sampled unsupervised data, it achieves a 90% relative score to GPT-4 on complex reasoning in LLaVA-Bench, and improves LLaVA-7B/13B by 6.7%/5.6% on the complex multi-modal benchmark MM-Vet. Visualizations show its improved ability to align with user intentions. A series of ablations is conducted to reveal the latent mechanism of the approach, which also indicates its potential for further scaling. Code will be available.
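For readers unfamiliar with direct preference optimization, the standard per-pair DPO objective is sketched below. It assumes sequence-level log-probabilities from the policy and a frozen reference model are already computed; `beta = 0.1` is an illustrative default, not a value from the paper. In this setting, the chosen response would presumably come from the original image and the rejected one from the augmented image.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float, beta: float = 0.1) -> float:
    """-log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)]) for one preference pair."""
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch so exp() never overflows.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: the loss is log(2).
print(round(dpo_loss(-5.0, -5.0, -5.0, -5.0), 4))  # → 0.6931
```

The loss falls below log(2) only when the policy raises the chosen response's likelihood relative to the reference more than the rejected one's, which is what drives the model away from the hard negatives.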
Continuous Control Reinforcement Learning: Distributed Distributional DrQ Algorithms
Authors: Zehao Zhou
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2404.10645
Pdf link: https://arxiv.org/pdf/2404.10645
Abstract Distributed Distributional DrQ is a model-free, off-policy RL algorithm for continuous control tasks based on the state and observation of the agent. It is an actor-critic method that combines data augmentation with a distributional perspective on the critic value function. It aims to learn to control the agent and master tasks in a high-dimensional continuous space. DrQ-v2 uses DDPG as its backbone and achieves superior performance in various continuous control tasks. Here, Distributed Distributional DrQ uses Distributed Distributional DDPG as the backbone; this modification aims to achieve better performance on hard continuous control tasks through the greater expressiveness of the distributional value function and distributed actor policies.
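The data augmentation in the DrQ family is a random shift of pixel observations: pad the image by a few pixels, then crop back to the original size at a random offset. A minimal sketch on a single-channel image stored as nested lists follows; `pad = 4` with replicate padding matches the commonly cited DrQ setting for 84x84 inputs, but treat it as an assumption here. The distributional critic and distributed actors are not shown.

```python
import random
from typing import List, Optional


def random_shift(img: List[List[float]], pad: int = 4,
                 rng: Optional[random.Random] = None) -> List[List[float]]:
    """Replicate-pad an image by `pad` pixels, then crop it back to its original
    size at a random offset, i.e. shift it by up to +/- pad pixels."""
    rng = rng or random.Random()
    h, w = len(img), len(img[0])
    dy, dx = rng.randint(0, 2 * pad), rng.randint(0, 2 * pad)
    out = []
    for y in range(h):
        # Crop row y is padded row (y + dy), i.e. original row (y + dy - pad),
        # clamped to the image to mimic replicate padding.
        src_y = min(max(y + dy - pad, 0), h - 1)
        out.append([img[src_y][min(max(x + dx - pad, 0), w - 1)] for x in range(w)])
    return out


frame = [[float(10 * y + x) for x in range(5)] for y in range(5)]
shifted = random_shift(frame, pad=2, rng=random.Random(3))
print(len(shifted), len(shifted[0]))  # → 5 5
```

Because the shift is small and the output size is unchanged, the same encoder can consume augmented and clean observations, which is what makes this augmentation cheap to add to an off-policy learner.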

