Abstract
This paper investigates the emerging challenges of conflict detection and mitigation in Open Radio Access Network (O-RAN). Because O-RAN is disaggregated, conflicts can arise between Extended Applications (xApps) that affect network performance and stability. This work provides a detailed theoretical framework of xApp-level conflicts, i.e., direct, indirect, and implicit conflicts. Leveraging conflict graphs, we further highlight how conflicts impact Key Performance Indicators (KPIs) and explore strategies for conflict detection using Service Level Agreements (SLAs) and Quality of Service (QoS) thresholds. We evaluate the effectiveness of several mitigation strategies in a simulated environment with Mobility Robustness Optimization (MRO) and Energy Saving (ES) xApps and present experimental results showing comparisons among these strategies. The findings of this research provide significant insights for enhancing O-RAN deployments with flexible and efficient conflict management.
Title:
Maximal Extractable Value in Decentralized Finance: Taxonomy, Detection, and Mitigation
Authors: Huned Materwala, Shraddha M. Naik, Aya Taha, Tala Abdulrahman Abed, Davor Svetinovic
Subjects:
Cryptography and Security (cs.CR); Computational Engineering, Finance, and Science (cs.CE); Computers and Society (cs.CY)
Abstract
Decentralized Finance (DeFi) leverages blockchain-enabled smart contracts to deliver automated and trustless financial services without the need for intermediaries. However, the public visibility of financial transactions on the blockchain can be exploited, as participants can reorder, insert, or remove transactions to extract value, often at the expense of others. This extracted value is known as the Maximal Extractable Value (MEV). MEV causes financial losses and consensus instability, disrupting the security, efficiency, and decentralization goals of the DeFi ecosystem. Therefore, it is crucial to analyze, detect, and mitigate MEV to safeguard DeFi. Our comprehensive survey offers a holistic view of the MEV landscape in the DeFi ecosystem. We present an in-depth understanding of MEV through a novel taxonomy of MEV transactions supported by real transaction examples. We perform a critical comparative analysis of various MEV detection approaches, evaluating their effectiveness in identifying different transaction types. Furthermore, we assess different categories of MEV mitigation strategies and discuss their limitations. We identify the challenges of current mitigation and detection approaches and discuss potential solutions. This survey provides valuable insights for researchers, developers, stakeholders, and policymakers, helping to curb and democratize MEV for a more secure and efficient DeFi ecosystem.
Title:
LLM-based Continuous Intrusion Detection Framework for Next-Gen Networks
Abstract
In this paper, we present an adaptive framework designed for the continuous detection, identification and classification of emerging attacks in network traffic. The framework employs a transformer encoder architecture, which captures hidden patterns in a bidirectional manner to differentiate between malicious and legitimate traffic. Initially, the framework focuses on the accurate detection of malicious activities, achieving a perfect recall of 100% in distinguishing between attack and benign traffic. Subsequently, the system incrementally identifies unknown attack types by leveraging a Gaussian Mixture Model (GMM) to cluster features derived from high-dimensional BERT embeddings. This approach allows the framework to dynamically adjust its identification capabilities as new attack clusters are discovered, maintaining high detection accuracy. Even after integrating additional unknown attack clusters, the framework continues to perform at a high level, achieving 95.6% in both classification accuracy and this http URL results demonstrate the effectiveness of the proposed framework in adapting to evolving threats while maintaining high accuracy in both detection and identification tasks. Our ultimate goal is to develop a scalable, real-time intrusion detection system that can continuously evolve with the ever-changing network threat landscape.
Title:
Exploring Feature Importance and Explainability Towards Enhanced ML-Based DoS Detection in AI Systems
Authors: Paul Badu Yakubu, Evans Owusu, Lesther Santana, Mohamed Rahouti, Abdellah Chehri, Kaiqi Xiong
Abstract
Denial of Service (DoS) attacks pose a significant threat in the realm of AI systems security, causing substantial financial losses and downtime. However, AI systems' high computational demands, dynamic behavior, and data variability make monitoring and detecting DoS attacks challenging. Nowadays, statistical and machine learning (ML)-based DoS classification and detection approaches utilize a broad range of feature selection mechanisms to select a feature subset from networking traffic datasets. Feature selection is critical in enhancing the overall model performance and attack detection accuracy while reducing the training time. In this paper, we investigate the importance of feature selection in improving ML-based detection of DoS attacks. Specifically, we explore feature contribution to the overall components in DoS traffic datasets by utilizing statistical analysis and feature engineering approaches. Our experimental findings demonstrate the usefulness of the thorough statistical analysis of DoS traffic and feature engineering in understanding the behavior of the attack and identifying the best feature selection for ML-based DoS classification and detection.
Title:
Self-Calibrated Tuning of Vision-Language Models for Out-of-Distribution Detection
Authors: Geng Yu, Jianing Zhu, Jiangchao Yao, Bo Han
Abstract
Out-of-distribution (OOD) detection is crucial for deploying reliable machine learning models in open-world applications. Recent advances in CLIP-based OOD detection have shown promising results via regularizing prompt tuning with OOD features extracted from ID data. However, the irrelevant context mined from ID data can be spurious due to the inaccurate foreground-background decomposition, thus limiting the OOD detection performance. In this work, we propose a novel framework, namely, Self-Calibrated Tuning (SCT), to mitigate this problem for effective OOD detection with only the given few-shot ID data. Specifically, SCT introduces modulating factors respectively on the two components of the original learning objective. It adaptively directs the optimization process between the two tasks during training on data with different prediction uncertainty to calibrate the influence of OOD regularization, which is compatible with many prompt tuning based OOD detection methods. Extensive experiments and analyses have been conducted to characterize and demonstrate the effectiveness of the proposed SCT. The code is publicly available.
Title:
TDDBench: A Benchmark for Training data detection
Authors: Zhihao Zhu, Yi Yang, Defu Lian
Subjects:
Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Abstract
Training Data Detection (TDD) is a task aimed at determining whether a specific data instance is used to train a machine learning model. In the computer security literature, TDD is also referred to as Membership Inference Attack (MIA). Given its potential to assess the risks of training data breaches, ensure copyright authentication, and verify model unlearning, TDD has garnered significant attention in recent years, leading to the development of numerous methods. Despite these advancements, there is no comprehensive benchmark to thoroughly evaluate the effectiveness of TDD methods. In this work, we introduce TDDBench, which consists of 13 datasets spanning three data modalities: image, tabular, and text. We benchmark 21 different TDD methods across four detection paradigms and evaluate their performance from five perspectives: average detection performance, best detection performance, memory consumption, and computational efficiency in both time and memory. With TDDBench, researchers can identify bottlenecks and areas for improvement in TDD algorithms, while practitioners can make informed trade-offs between effectiveness and efficiency when selecting TDD algorithms for specific use cases. Our large-scale benchmarking also reveals the generally unsatisfactory performance of TDD algorithms across different datasets. To enhance accessibility and reproducibility, we open-source TDDBench for the research community.
Title:
Enhanced Real-Time Threat Detection in 5G Networks: A Self-Attention RNN Autoencoder Approach for Spectral Intrusion Analysis
Authors: Mohammadreza Kouchaki, Minglong Zhang, Aly S. Abdalla, Guangchen Lan, Christopher G. Brinton, Vuk Marojevic
Subjects:
Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Abstract
In the rapidly evolving landscape of 5G technology, safeguarding Radio Frequency (RF) environments against sophisticated intrusions is paramount, especially in dynamic spectrum access and management. This paper presents an enhanced experimental model that integrates a self-attention mechanism with a Recurrent Neural Network (RNN)-based autoencoder for the detection of anomalous spectral activities in 5G networks at the waveform level. Our approach, grounded in time-series analysis, processes in-phase and quadrature (I/Q) samples to identify irregularities that could indicate potential jamming attacks. The model's architecture, augmented with a self-attention layer, extends the capabilities of RNN autoencoders, enabling a more nuanced understanding of temporal dependencies and contextual relationships within the RF spectrum. Utilizing a simulated 5G Radio Access Network (RAN) test-bed constructed with srsRAN 5G and Software Defined Radios (SDRs), we generated a comprehensive stream of data that reflects real-world RF spectrum conditions and attack scenarios. The model is trained to reconstruct standard signal behavior, establishing a normative baseline against which deviations, indicative of security threats, are identified. The proposed architecture is designed to balance between detection precision and computational efficiency, so the LSTM network, enriched with self-attention, continues to optimize for minimal execution latency and power consumption. Conducted on a real-world SDR-based testbed, our results demonstrate the model's improved performance and accuracy in threat detection. Keywords: self-attention, real-time intrusion detection, RNN autoencoder, Transformer architecture, LSTM, time series anomaly detection, 5G Security, spectrum access security.
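The reconstruction-based detection idea described in this abstract can be illustrated with a minimal sketch. It assumes that a trained autoencoder already exists and only shows the thresholding step: errors measured on normal traffic define a baseline, and a window is flagged when its error exceeds that baseline. The mean-plus-k-standard-deviations rule, the value of `k`, and all function names and numbers are illustrative assumptions, not details taken from the paper.

```python
from statistics import mean, stdev


def mse(original, reconstructed):
    """Mean squared reconstruction error between two equal-length sample windows."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def fit_threshold(baseline_errors, k=3.0):
    """Assumed rule: threshold = mean + k * sample standard deviation of normal errors."""
    return mean(baseline_errors) + k * stdev(baseline_errors)


def flag_anomalies(errors, threshold):
    """Mark each window whose reconstruction error exceeds the threshold."""
    return [e > threshold for e in errors]


# Hypothetical reconstruction errors measured on attack-free signal windows.
baseline = [0.010, 0.012, 0.009, 0.011, 0.013, 0.010]
threshold = fit_threshold(baseline)

# One window close to the baseline and one with a much larger error.
flags = flag_anomalies([0.011, 0.250], threshold)
print(flags)
```

In practice the threshold would be tuned on held-out data to trade detection rate against false alarms.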
Title:
Blockchain-Based Multi-Path Mobile Access Point Selection for Secure 5G VANETs
Authors: Zhiou Zhang, Weian Guo, Li Li, Dongyang Li
Subjects:
Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)
Abstract
This letter presents a blockchain-based multi-path mobile access point (MAP) selection strategy for secure 5G vehicular ad-hoc networks (VANETs). The proposed method leverages blockchain technology for decentralized, transparent, and secure MAP selection, while the multi-path transmission strategy enhances network reliability and reduces communication delays. A trust-based attack detection mechanism is integrated to ensure network security. Simulation results demonstrate that the proposed algorithm reduces both handover frequency and average communication delay by over 80%, and successfully identifies and excludes more than 95% of Sybil nodes, ensuring reliable and secure communication in highly dynamic vehicular environments.
Title:
Enhancing Maritime Situational Awareness through End-to-End Onboard Raw Data Analysis
Authors: Roberto Del Prete, Manuel Salvoldi, Domenico Barretta, Nicolas Longépé, Gabriele Meoni, Arnon Karnieli, Maria Daniela Graziano, Alfredo Renga
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Abstract
Satellite-based onboard data processing is crucial for time-sensitive applications requiring a timely and efficient response. Advances in edge artificial intelligence are shifting computational power from ground-based centers to on-orbit platforms, transforming the "sensing-communication-decision-feedback" cycle and reducing latency from acquisition to delivery. The current research presents a framework addressing the strict bandwidth, energy, and latency constraints of small satellites, focusing on maritime monitoring. The study contributes three main innovations. Firstly, it investigates the application of deep learning techniques for direct ship detection and classification from raw satellite imagery. By simplifying the onboard processing chain, our approach facilitates direct analyses without requiring computationally intensive steps such as calibration and ortho-rectification. Secondly, to address the scarcity of raw satellite data, we introduce two novel datasets, VDS2Raw and VDV2Raw, which are derived from raw data from Sentinel-2 and Vegetation and Environment Monitoring New Micro Satellite (VENuS) missions, respectively, and enriched with Automatic Identification System (AIS) records. Thirdly, we characterize the tasks' optimal single and multiple spectral band combinations through statistical and feature-based analyses validated on both datasets. In sum, we demonstrate the feasibility of the proposed method through a proof-of-concept on CubeSat-like hardware, confirming the models' potential for operational satellite-based maritime monitoring.
Title:
Solving Trojan Detection Competitions with Linear Weight Classification
Abstract
Neural networks can conceal malicious Trojan backdoors that allow a trigger to covertly change the model behavior. Detecting signs of these backdoors, particularly without access to any triggered data, is the subject of ongoing research and open challenges. In one common formulation of the problem, we are given a set of clean and poisoned models and need to predict whether a given test model is clean or poisoned. In this paper, we introduce a detector that works remarkably well across many of the existing datasets and domains. It is obtained by training a binary classifier on a large number of models' weights after performing a few different pre-processing steps including feature selection and standardization, reference model weights subtraction, and model alignment prior to detection. We evaluate this algorithm on a diverse set of Trojan detection benchmarks and domains and examine the cases where the approach is most and least effective.
Title:
An Application-Agnostic Automatic Target Recognition System Using Vision Language Models
Authors: Anthony Palladino, Dana Gajewski, Abigail Aronica, Patryk Deptula, Alexander Hamme, Seiyoung C. Lee, Jeff Muri, Todd Nelling, Michael A. Riley, Brian Wong, Margaret Duff
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Abstract
We present a novel Automatic Target Recognition (ATR) system using open-vocabulary object detection and classification models. A primary advantage of this approach is that target classes can be defined just before runtime by a non-technical end user, using either a few natural language text descriptions of the target, or a few image exemplars, or both. Nuances in the desired targets can be expressed in natural language, which is useful for unique targets with little or no training data. We also implemented a novel combination of several techniques to improve performance, such as leveraging the additional information in the sequence of overlapping frames to perform tubelet identification (i.e., sequential bounding box matching), bounding box re-scoring, and tubelet linking. Additionally, we developed a technique to visualize the aggregate output of many overlapping frames as a mosaic of the area scanned during the aerial surveillance or reconnaissance, and a kernel density estimate (or heatmap) of the detected targets. We initially applied this ATR system to the use case of detecting and clearing unexploded ordnance on airfield runways and we are currently extending our research to other real-world applications.
Title:
Hamiltonian Monte Carlo methods for spectroscopy data analysis
Abstract
We present a scalable Bayesian framework for the analysis of confocal fluorescence spectroscopy data, addressing key limitations in traditional fluorescence correlation spectroscopy methods. Our framework captures molecular motion, microscope optics, and photon detection with high fidelity, enabling statistical inference of molecule trajectories from raw photon count data, introducing a superresolution parameter which further enhances trajectory estimation beyond the native time resolution of data acquisition. To handle the high dimensionality of the arising posterior distribution, we develop a family of Hamiltonian Monte Carlo (HMC) algorithms that leverages the unique characteristics inherent to spectroscopy data analysis. Here, due to the highly-coupled correlation structure of the target posterior distribution, HMC requires the numerical solution of a stiff ordinary differential equation containing a two-scale discrete Laplacian. By considering the spectral properties of this operator, we produce a CFL-type integrator stability condition for the standard Störmer-Verlet integrator used in HMC. To circumvent this instability we introduce a semi-implicit (IMEX) method which treats the stiff and non-stiff parts differently, while leveraging the sparse structure of the discrete Laplacian for computational efficiency. Detailed numerical experiments demonstrate that this method improves upon fully explicit approaches, allowing larger HMC step sizes and maintaining second-order accuracy in position and energy. Our framework provides a foundation for extensions to more complex models such as surface constrained molecular motion or motion with multiple diffusion modes.
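To make the step-size restriction mentioned in this abstract concrete, here is a minimal sketch of the standard Störmer-Verlet (leapfrog) integrator on a toy problem. It assumes a one-dimensional harmonic oscillator with Hamiltonian H = p^2/2 + omega^2 q^2/2, for which the scheme is stable only when h * omega < 2. The oscillator, the parameter values, and the function names are illustrative assumptions; the paper's actual system involves a two-scale discrete Laplacian and an IMEX scheme that are not reproduced here.

```python
def leapfrog(q, p, omega, h, steps):
    """Stormer-Verlet: half kick, full drift, half kick, repeated `steps` times."""
    for _ in range(steps):
        p -= 0.5 * h * omega * omega * q  # half step in momentum (force = -omega^2 q)
        q += h * p                        # full step in position
        p -= 0.5 * h * omega * omega * q  # second half step in momentum
    return q, p


def energy(q, p, omega):
    """Hamiltonian of the toy oscillator."""
    return 0.5 * p * p + 0.5 * omega * omega * q * q


omega = 10.0           # stiff frequency of the toy problem
q0, p0 = 1.0, 0.0
e0 = energy(q0, p0, omega)

# h * omega = 0.5 satisfies the stability bound: energy stays near its initial value.
e_stable = energy(*leapfrog(q0, p0, omega, h=0.05, steps=50), omega)

# h * omega = 2.5 violates the bound: energy grows geometrically.
e_unstable = energy(*leapfrog(q0, p0, omega, h=0.25, steps=50), omega)

print(e0, e_stable, e_unstable)
```

The larger the stiff frequency, the smaller the admissible explicit step, which is the motivation for treating the stiff part implicitly.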
Title:
A Behavior Architecture for Fast Humanoid Robot Door Traversals
Authors: Duncan Calvert, Luigi Penco, Dexton Anderson, Tomasz Bialek, Arghya Chatterjee, Bhavyansh Mishra, Geoffrey Clark, Sylvain Bertrand, Robert Griffin
Abstract
Towards the role of humanoid robots as squad mates in urban operations and other domains, we identified doors as a major area lacking capability development. In this paper, we focus on the ability of humanoid robots to navigate and deal with doors. Human-sized doors are ubiquitous in many environment domains and the humanoid form factor is uniquely suited to operate and traverse them. We present an architecture which incorporates GPU accelerated perception and a tree based interactive behavior coordination system with a whole body motion and walking controller. Our system is capable of performing door traversals on a variety of door types. It supports rapid authoring of behaviors for unseen door types and techniques to achieve re-usability of those authored behaviors. The behaviors are modelled using trees and feature logical reactivity and action sequences that can be executed with layered concurrency to increase speed. Primitive actions are built on top of our existing whole body controller which supports manipulation while walking. We include a perception system using both neural networks and classical computer vision for door mechanism detection outside of the lab environment. We present operator-robot interdependence analysis charts to explore how human cognition is combined with artificial intelligence to produce complex robot behavior. Finally, we present and discuss real robot performances of fast door traversals on our Nadia humanoid robot. Videos online at this https URL.
Title:
Hybrid Attention for Robust RGB-T Pedestrian Detection in Real-World Conditions
Authors: Arunkumar Rathinam, Leo Pauly, Abd El Rahman Shabayek, Wassim Rharbaoui, Anis Kacem, Vincent Gaudillière, Djamila Aouada
Abstract
Multispectral pedestrian detection has gained significant attention in recent years, particularly in autonomous driving applications. To address the challenges posed by adversarial illumination conditions, the combination of thermal and visible images has demonstrated its advantages. However, existing fusion methods rely on the critical assumption that the RGB-Thermal (RGB-T) image pairs are fully overlapping. This assumption often does not hold in real-world applications, where only partial overlap between images can occur due to sensor configuration. Moreover, sensor failure can cause loss of information in one modality. In this paper, we propose a novel module called the Hybrid Attention (HA) mechanism as our main contribution to mitigate performance degradation caused by partial overlap and sensor failure, i.e., when at least part of the scene is acquired by only one sensor. We propose an improved RGB-T fusion algorithm, robust against partial overlap and sensor failure encountered during inference in real-world applications. We also leverage a mobile-friendly backbone to cope with resource constraints in embedded systems. We conducted experiments by simulating various partial overlap and sensor failure scenarios to evaluate the performance of our proposed method. The results demonstrate that our approach outperforms state-of-the-art methods, showcasing its superiority in handling real-world challenges.
Title:
vMF-Contact: Uncertainty-aware Evidential Learning for Probabilistic Contact-grasp in Noisy Clutter
Authors: Yitian Shi, Edgar Welte, Maximilian Gilles, Rania Rayyes
Abstract
Grasp learning in noisy environments, such as occlusions, sensor noise, and out-of-distribution (OOD) objects, poses significant challenges. Recent learning-based approaches focus primarily on capturing aleatoric uncertainty from inherent data noise. The epistemic uncertainty, which represents the OOD recognition, is often addressed by ensembles with multiple forward paths, limiting real-time application. In this paper, we propose an uncertainty-aware approach for 6-DoF grasp detection using evidential learning to comprehensively capture both uncertainties in real-world robotic grasping. As a key contribution, we introduce vMF-Contact, a novel architecture for learning hierarchical contact grasp representations with probabilistic modeling of directional uncertainty as von Mises-Fisher (vMF) distribution. To achieve this, we derive and analyze the theoretical formulation of the second-order objective on the posterior parametrization, providing formal guarantees for the model's ability to quantify uncertainty and improve grasp prediction performance. Moreover, we enhance feature expressiveness by applying partial point reconstructions as an auxiliary task, improving the comprehension of uncertainty quantification as well as the generalization to unseen objects. In the real-world experiments, our method demonstrates a significant improvement by 39% in the overall clearance rate compared to the baselines. Video is under this https URL
Title:
3DGS-CD: 3D Gaussian Splatting-based Change Detection for Physical Object Rearrangement
Authors: Ziqi Lu, Jianbo Ye, John Leonard
Subjects:
Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
We present 3DGS-CD, the first 3D Gaussian Splatting (3DGS)-based method for detecting physical object rearrangements in 3D scenes. Our approach estimates 3D object-level changes by comparing two sets of unaligned images taken at different times. Leveraging 3DGS's novel view rendering and EfficientSAM's zero-shot segmentation capabilities, we detect 2D object-level changes, which are then associated and fused across views to estimate 3D changes. Our method can detect changes in cluttered environments using sparse post-change images within as little as 18s, using as few as a single new image. It does not rely on depth input, user instructions, object classes, or object models: an object is recognized simply if it has been rearranged. Our approach is evaluated on both public and self-collected real-world datasets, achieving up to 14% higher accuracy and three orders of magnitude faster performance compared to the state-of-the-art radiance-field-based change detection method. This significant performance boost enables a broad range of downstream applications, where we highlight three key use cases: object reconstruction, robot workspace reset, and 3DGS model update. Our code and data will be made available at this https URL.
Title:
Estimation of Psychosocial Work Environment Exposures Through Video Object Detection. Proof of Concept Using CCTV Footage
Authors: Claus D. Hansen, Thuy Hai Le, David Campos
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Abstract
This paper examines the use of computer vision algorithms to estimate aspects of the psychosocial work environment using CCTV footage. We present a proof of concept for a methodology that detects and tracks people in video footage and estimates interactions between customers and employees by estimating their poses and calculating the duration of their encounters. We propose a pipeline that combines existing object detection and tracking algorithms (YOLOv8 and DeepSORT) with pose estimation algorithms (BlazePose) to estimate the number of customers and employees in the footage as well as the duration of their encounters. We use a simple rule-based approach to classify the interactions as positive, neutral or negative based on three different criteria: distance, duration and pose. The proposed methodology is tested on a small dataset of CCTV footage. While the data is quite limited, in particular with respect to the quality of the footage, we have chosen this case as it represents a typical setting where the method could be applied. The results show that the object detection and tracking part of the pipeline has a reasonable performance on the dataset with a high degree of recall and reasonable accuracy. At this stage, the pose estimation is still too limited to fully detect the type of interactions due to difficulties in tracking employees in the footage. We conclude that the method is a promising alternative to self-reported measures of the psychosocial work environment and could be used in future studies to obtain external observations of the work environment.
Title:
Efficient Fourier Filtering Network with Contrastive Learning for UAV-based Unaligned Bi-modal Salient Object Detection
Abstract
Unmanned aerial vehicle (UAV)-based bi-modal salient object detection (BSOD) aims to segment salient objects in a scene utilizing complementary cues in unaligned RGB and thermal image pairs. However, the high computational expense of existing UAV-based BSOD models limits their applicability to real-world UAV devices. To address this problem, we propose an efficient Fourier filter network with contrastive learning that achieves both real-time and accurate performance. Specifically, we first design a semantic contrastive alignment loss to align the two modalities at the semantic level, which facilitates mutual refinement in a parameter-free way. Second, inspired by the fast Fourier transform that obtains global relevance in linear complexity, we propose synchronized alignment fusion, which aligns and fuses bi-modal features in the channel and spatial dimensions by a hierarchical filtering mechanism. Our proposed model, AlignSal, reduces the number of parameters by 70.0%, decreases the floating point operations by 49.4%, and increases the inference speed by 152.5% compared to the cutting-edge BSOD model (i.e., MROS). Extensive experiments on the UAV RGB-T 2400 and three weakly aligned datasets demonstrate that AlignSal achieves both real-time inference speed and better performance and generalizability compared to sixteen state-of-the-art BSOD models across most evaluation metrics. In addition, our ablation studies further verify AlignSal's potential in boosting the performance of existing aligned BSOD models on UAV-based unaligned data. The code is available at: this https URL.
Title:
Understanding the Effects of Human-written Paraphrases in LLM-generated Text Detection
Authors: Hiu Ting Lau, Arkaitz Zubiaga
Subjects:
Computation and Language (cs.CL)
Abstract
Natural Language Generation has been rapidly developing with the advent of large language models (LLMs). While their usage has sparked significant attention from the general public, it is important for readers to be aware when a piece of text is LLM-generated. This has brought about the need for building models that enable automated LLM-generated text detection, with the aim of mitigating potential negative outcomes of such content. Existing LLM-generated text detectors show competitive performance in telling apart LLM-generated and human-written text, but this performance is likely to deteriorate when paraphrased texts are considered. In this study, we devise a new data collection strategy to collect Human & LLM Paraphrase Collection (HLPC), a first-of-its-kind dataset that incorporates human-written texts and paraphrases, as well as LLM-generated texts and paraphrases. With the aim of understanding the effects of human-written paraphrases on the performance of state-of-the-art LLM-generated text detectors OpenAI RoBERTa and watermark detectors, we perform classification experiments that incorporate human-written paraphrases, watermarked and non-watermarked LLM-generated documents from GPT and OPT, and LLM-generated paraphrases from DIPPER and BART. The results show that the inclusion of human-written paraphrases has a significant impact on LLM-generated text detector performance, improving TPR@1%FPR with a possible trade-off in AUROC and accuracy.
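The TPR@1%FPR metric reported in this abstract can be sketched as follows: choose the decision threshold so that at most the allowed fraction of human-written texts is flagged, then measure the fraction of LLM-generated texts that score above it. This is a minimal sketch under assumed conventions (label 1 = LLM-generated, higher score = more likely LLM-generated, strict comparison at the threshold); the function name and the toy scores are hypothetical, and the paper may compute the metric differently, for example by interpolating on a ROC curve.

```python
def tpr_at_fpr(scores, labels, max_fpr=0.01):
    """True positive rate at the strictest threshold whose FPR does not exceed max_fpr."""
    negatives = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not negatives or not positives:
        raise ValueError("need at least one positive and one negative example")

    # Number of negatives that may be (wrongly) flagged without exceeding max_fpr.
    allowed = int(max_fpr * len(negatives))
    if allowed >= len(negatives):
        return 1.0  # every negative may be flagged, so the threshold can accept everything

    # Flag only scores strictly above the (allowed + 1)-th highest negative score.
    threshold = negatives[allowed]
    return sum(s > threshold for s in positives) / len(positives)


# Toy detector scores: four human-written (0) and four LLM-generated (1) texts.
scores = [0.10, 0.20, 0.30, 0.90, 0.95, 0.80, 0.40, 0.25]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

tpr_loose = tpr_at_fpr(scores, labels, max_fpr=0.25)  # one false positive allowed
tpr_strict = tpr_at_fpr(scores, labels, max_fpr=0.0)  # no false positives allowed
print(tpr_loose, tpr_strict)
```

Because the metric depends only on the highest-scoring negatives, it can move in a different direction from AUROC or accuracy, which is consistent with the trade-off the abstract mentions.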
Title:
Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination
Abstract
The rapid progression of multimodal large language models (MLLMs) has demonstrated superior performance on various multimodal benchmarks. However, the issue of data contamination during training creates challenges in performance evaluation and comparison. While numerous methods exist for detecting dataset contamination in large language models (LLMs), they are less effective for MLLMs due to their various modalities and multiple training phases. In this study, we introduce a multimodal data contamination detection framework, MM-Detect, designed for MLLMs. Our experimental results indicate that MM-Detect is sensitive to varying degrees of contamination and can highlight significant performance improvements due to leakage of the training set of multimodal benchmarks. Furthermore, we explore the possibility of contamination originating from the pre-training phase of LLMs used by MLLMs and the fine-tuning phase of MLLMs, offering new insights into the stages at which contamination may be introduced.
Title:
Generalize or Detect? Towards Robust Semantic Segmentation Under Multiple Distribution Shifts
Authors: Zhitong Gao, Bingnan Li, Mathieu Salzmann, Xuming He
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Abstract
In open-world scenarios, where both novel classes and domains may exist, an ideal segmentation model should detect anomaly classes for safety and generalize to new domains. However, existing methods often struggle to distinguish between domain-level and semantic-level distribution shifts, leading to poor out-of-distribution (OOD) detection or domain generalization performance. In this work, we aim to equip the model to generalize effectively to covariate-shift regions while precisely identifying semantic-shift regions. To achieve this, we design a novel generative augmentation method to produce coherent images that incorporate both anomaly (or novel) objects and various covariate shifts at both image and object levels. Furthermore, we introduce a training strategy that recalibrates uncertainty specifically for semantic shifts and enhances the feature extractor to align features associated with domain shifts. We validate the effectiveness of our method across benchmarks featuring both semantic and domain shifts. Our method achieves state-of-the-art performance across all benchmarks for both OOD detection and domain generalization. Code is available at this https URL.
Title:
An Enhancement of Haar Cascade Algorithm Applied to Face Recognition for Gate Pass Security
Authors: Clarence A. Antipona, Romeo R. Magsino, Raymund M. Dioses, Khatalyn E. Mata
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Abstract
This study is focused on enhancing the Haar Cascade Algorithm to decrease the false positive and false negative rate in face matching and face detection to increase the accuracy rate even under challenging conditions. The face recognition library was implemented with Haar Cascade Algorithm in which the 128-dimensional vectors representing the unique features of a face are encoded. A subprocess was applied where the grayscale image from Haar Cascade was converted to RGB to improve the face encoding. Logical process and face filtering are also used to decrease non-face detection. The Enhanced Haar Cascade Algorithm produced a 98.39% accuracy rate (21.39% increase), 63.59% precision rate, 98.30% recall rate, and 72.23% in F1 Score. In comparison, the Haar Cascade Algorithm achieved a 46.70% to 77.00% accuracy rate, 44.15% precision rate, 98.61% recall rate, and 47.01% in F1 Score. Both algorithms used the Confusion Matrix Test with 301,950 comparisons using the same dataset of 550 images. The 98.39% accuracy rate shows a significant decrease in false positive and false negative rates in facial recognition. Face matching and face detection are more accurate in images with complex backgrounds, lighting variations, and occlusions, or even those with similar attributes.
Title:
RAGulator: Lightweight Out-of-Context Detectors for Grounded Text Generation
Authors: Ian Poey, Jiajun Liu, Qishuai Zhong, Adrien Chenailler
Subjects:
Computation and Language (cs.CL)
Abstract
Real-time detection of out-of-context LLM outputs is crucial for enterprises looking to safely adopt RAG applications. In this work, we train lightweight models to discriminate LLM-generated text that is semantically out-of-context from retrieved text documents. We preprocess a combination of summarisation and semantic textual similarity datasets to construct training data using minimal resources. We find that DeBERTa is not only the best-performing model under this pipeline, but it is also fast and does not require additional text preprocessing or feature engineering. While emerging work demonstrates that generative LLMs can also be fine-tuned and used in complex data pipelines to achieve state-of-the-art performance, we note that speed and resource limits are important considerations for on-premise deployment.
Title:
Towards Resource-Efficient Federated Learning in Industrial IoT for Multivariate Time Series Analysis
Abstract
Anomalies and missing data constitute a thorny problem in industrial applications. In recent years, deep-learning-enabled anomaly detection has emerged as a critical direction; however, the improved detection accuracy is achieved by using large neural networks, which increases storage and computational cost. Moreover, the data collected on edge devices contain private user information, introducing challenges that can be addressed by the privacy-preserving distributed paradigm known as federated learning (FL). This framework allows edge devices to train and exchange models, which also increases the communication cost. Thus, to deal with the increased communication, processing, and storage challenges of FL-based deep anomaly detection, neural network (NN) pruning is expected to bring significant benefits by reducing processing, storage, and communication complexity. With this focus, a novel compression-based optimization problem is proposed at the server side of an FL paradigm that fuses the received local models and performs pruning, generating a more compressed model. Experiments in the context of anomaly detection and missing value imputation demonstrate that the proposed FL scenario, along with the proposed compression-based method, achieves high compression rates (more than $99.7\%$) with negligible performance losses (less than $1.18\%$) as compared to the centralized solutions.
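To make the server-side "fuse, then prune" idea concrete, here is a minimal sketch. It is not the compression-based optimization proposed in the paper: it assumes plain equal-weight averaging for the fusion step and simple magnitude pruning for the compression step, and the function names and toy weight vectors are hypothetical.

```python
def fed_avg(local_models):
    """Element-wise average of equally weighted local weight vectors."""
    n = len(local_models)
    return [sum(ws) / n for ws in zip(*local_models)]


def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitudes."""
    k = int(len(weights) * sparsity)
    # Indices of the k smallest-magnitude weights.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# Hypothetical local models received from two edge devices.
local_models = [
    [0.9, -0.02, 0.4, 0.01],
    [1.1, 0.02, 0.6, -0.03],
]
global_model = fed_avg(local_models)                       # fusion step
compressed = magnitude_prune(global_model, sparsity=0.5)   # pruning step
```

In this toy case the two small-magnitude weights are zeroed while the two large ones survive, so only half of the fused model would need to be stored or sent back to the devices.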
Title:
Beemo: Benchmark of Expert-edited Machine-generated Outputs
Authors: Ekaterina Artemova, Jason Lucas, Saranya Venkatraman, Jooyoung Lee, Sergei Tilga, Adaku Uchendu, Vladislav Mikhailov
Subjects:
Computation and Language (cs.CL)
Abstract
The rapid proliferation of large language models (LLMs) has increased the volume of machine-generated texts (MGTs) and blurred text authorship in various domains. However, most existing MGT benchmarks include single-author texts (human-written and machine-generated). This conventional design fails to capture more practical multi-author scenarios, where the user refines the LLM response for natural flow, coherence, and factual correctness. Our paper introduces the Benchmark of Expert-edited Machine-generated Outputs (Beemo), which includes 6.5k texts written by humans, generated by ten instruction-finetuned LLMs, and edited by experts for various use cases, ranging from creative writing to summarization. Beemo additionally comprises 13.1k machine-generated and LLM-edited texts, allowing for diverse MGT detection evaluation across various edit types. We document Beemo's creation protocol and present the results of benchmarking 33 configurations of MGT detectors in different experimental setups. We find that expert-based editing evades MGT detection, while LLM-edited texts are unlikely to be recognized as human-written. Beemo and all materials are publicly available.
Title:
Community Forensics: Using Thousands of Generators to Train Fake Image Detectors
Authors: Jeongsoo Park, Andrew Owens
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Abstract
One of the key challenges of detecting AI-generated images is spotting images that have been created by previously unseen generative models. We argue that the limited diversity of the training data is a major obstacle to addressing this problem, and we propose a new dataset that is significantly larger and more diverse than prior work. As part of creating this dataset, we systematically download thousands of text-to-image latent diffusion models and sample images from them. We also collect images from dozens of popular open source and commercial models. The resulting dataset contains 2.7M images that have been sampled from 4803 different models. These images collectively capture a wide range of scene content, generator architectures, and image processing settings. Using this dataset, we study the generalization abilities of fake image detectors. Our experiments suggest that detection performance improves as the number of models in the training set increases, even when these models have similar architectures. We also find that detection performance improves as the diversity of the models increases, and that our trained detectors generalize better than those trained on other datasets.
Keyword: face recognition
Title:
Undermining Image and Text Classification Algorithms Using Adversarial Attacks
Authors: Langalibalele Lunga, Suhas Sreehari
Subjects:
Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Machine learning models are prone to adversarial attacks, where inputs can be manipulated in order to cause misclassifications. While previous research has focused on techniques like Generative Adversarial Networks (GANs), there is limited exploration of GANs and the Synthetic Minority Oversampling Technique (SMOTE) in text and image classification models to perform adversarial attacks. Our study addresses this gap by training various machine learning models and using GANs and SMOTE to generate additional data points aimed at attacking text classification models. Furthermore, we extend our investigation to face recognition models, training a Convolutional Neural Network (CNN) and subjecting it to adversarial attacks with fast gradient sign perturbations on key features identified by GradCAM, a technique used to highlight key image characteristics CNNs use in classification. Our experiments reveal a significant vulnerability in classification models. Specifically, we observe a 20% decrease in accuracy for the top-performing text classification models post-attack, along with a 30% decrease in facial recognition accuracy. This highlights the susceptibility of these models to manipulation of input data. Adversarial attacks not only compromise the security but also undermine the reliability of machine learning systems. By showcasing the impact of adversarial attacks on both text classification and face recognition models, our study underscores the urgent need to develop robust defenses against such vulnerabilities.
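The fast gradient sign perturbation mentioned above can be sketched on a much simpler model than a CNN. The example below assumes a logistic-regression classifier with hand-picked weights, and uses a hypothetical binary `mask` as a stand-in for the "key features" that GradCAM would select; none of these names or values come from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps, mask=None):
    """One-step fast gradient sign perturbation of input x.

    mask (hypothetical) restricts the perturbation to selected features.
    """
    p = predict(w, b, x)
    # Gradient of the binary cross-entropy loss w.r.t. the input: (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    if mask is None:
        mask = [1] * len(x)
    return [xi + eps * sign(g) * m for xi, g, m in zip(x, grad, mask)]


# Hypothetical model and input with true label 1.
w, b = [2.0, -1.0, 0.5], 0.0
x, y = [1.0, 0.5, 0.2], 1
x_adv = fgsm(w, b, x, y, eps=0.8, mask=[1, 1, 0])

clean_prob = predict(w, b, x)      # above 0.5: classified as class 1
adv_prob = predict(w, b, x_adv)    # below 0.5: prediction flips
```

Each masked-in feature moves by exactly `eps` in the direction that increases the loss, which is enough here to flip the prediction while leaving the masked-out feature untouched.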
Title:
An Enhancement of Haar Cascade Algorithm Applied to Face Recognition for Gate Pass Security
Authors: Clarence A. Antipona, Romeo R. Magsino, Raymund M. Dioses, Khatalyn E. Mata
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Abstract
This study is focused on enhancing the Haar Cascade Algorithm to decrease the false positive and false negative rate in face matching and face detection to increase the accuracy rate even under challenging conditions. The face recognition library was implemented with Haar Cascade Algorithm in which the 128-dimensional vectors representing the unique features of a face are encoded. A subprocess was applied where the grayscale image from Haar Cascade was converted to RGB to improve the face encoding. Logical process and face filtering are also used to decrease non-face detection. The Enhanced Haar Cascade Algorithm produced a 98.39% accuracy rate (21.39% increase), 63.59% precision rate, 98.30% recall rate, and 72.23% in F1 Score. In comparison, the Haar Cascade Algorithm achieved a 46.70% to 77.00% accuracy rate, 44.15% precision rate, 98.61% recall rate, and 47.01% in F1 Score. Both algorithms used the Confusion Matrix Test with 301,950 comparisons using the same dataset of 550 images. The 98.39% accuracy rate shows a significant decrease in false positive and false negative rates in facial recognition. Face matching and face detection are more accurate in images with complex backgrounds, lighting variations, and occlusions, or even those with similar attributes.
Title:
Face Reconstruction from Face Embeddings using Adapter to a Face Foundation Model
Abstract
Face recognition systems extract embedding vectors from face images and use these embeddings to verify or identify individuals. A face reconstruction attack (also known as template inversion) reconstructs face images from face embeddings and uses the reconstructed face image to gain entry to a face recognition system. In this paper, we propose to use a face foundation model to reconstruct face images from the embeddings of a blackbox face recognition model. The foundation model is trained with 42M images to generate face images from the facial embeddings of a fixed face recognition model. We propose to use an adapter to translate target embeddings into the embedding space of the foundation model. The generated images are evaluated on different face recognition models and different datasets, demonstrating the effectiveness of our method in translating embeddings of different face recognition models. We also evaluate the transferability of reconstructed face images when attacking different face recognition models. Our experimental results show that our reconstructed face images outperform previous reconstruction attacks against face recognition models.
Title:
Aligning Characteristic Descriptors with Images for Human-Expert-like Explainability
Abstract
In mission-critical domains such as law enforcement and medical diagnosis, the ability to explain and interpret the outputs of deep learning models is crucial for ensuring user trust and supporting informed decision-making. Despite advancements in explainability, existing methods often fall short in providing explanations that mirror the depth and clarity of those given by human experts. Such expert-level explanations are essential for the dependable application of deep learning models in law enforcement and medical contexts. Additionally, we recognize that most explanations in real-world scenarios are communicated primarily through natural language. Addressing these needs, we propose a novel approach that utilizes characteristic descriptors to explain model decisions by identifying their presence in images, thereby generating expert-like explanations. Our method incorporates a concept bottleneck layer within the model architecture, which calculates the similarity between image and descriptor encodings to deliver inherent and faithful explanations. Through experiments in face recognition and chest X-ray diagnosis, we demonstrate that our approach offers a significant contrast over existing techniques, which are often limited to the use of saliency maps. We believe our approach represents a significant step toward making deep learning systems more accountable, transparent, and trustworthy in the critical domains of face recognition and medical diagnosis.
Keyword: augmentation
Title:
FUsion-based ConstitutivE model (FuCe): Towards model-data augmentation in constitutive modelling
Abstract
Constitutive modeling is crucial for engineering design and simulations to accurately describe material behavior. However, traditional phenomenological models often struggle to capture the complexities of real materials under varying stress conditions due to their fixed forms and limited parameters. While recent advances in deep learning have addressed some limitations of classical models, purely data-driven methods tend to require large datasets, lack interpretability, and struggle to generalize beyond their training data. To tackle these issues, we introduce "Fusion-based Constitutive model (FuCe): Towards model-data augmentation in constitutive modelling". This approach combines established phenomenological models with an ICNN architecture, designed to train on the limited and noisy force-displacement data typically available in practical applications. The hybrid model inherently adheres to necessary constitutive conditions. During inference, Monte Carlo dropout is employed to generate Bayesian predictions, providing mean values and confidence intervals that quantify uncertainty. We demonstrate the model's effectiveness by learning two isotropic constitutive models and one anisotropic model with a single fiber direction, across six different stress states. The framework's applicability is also showcased in finite element simulations across three geometries of varying complexities. Our results highlight the framework's superior extrapolation capabilities, even when trained on limited and noisy data, delivering accurate and physically meaningful predictions across all numerical examples.
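As an illustration of the Monte Carlo dropout step described above, the sketch below keeps dropout active at inference time and summarizes repeated stochastic forward passes by a mean and an interval. It assumes a tiny one-hidden-layer ReLU network with hypothetical fixed weights rather than the ICNN-based hybrid model, and the interval is a simple normal approximation (mean plus or minus 1.96 standard deviations), which is an assumption of this sketch.

```python
import random
import statistics


def forward(x, w1, w2, rng, p_drop):
    """One stochastic forward pass with dropout kept on at inference."""
    hidden = []
    for row in w1:
        h = max(0.0, sum(wi * xi for wi, xi in zip(row, x)))  # ReLU unit
        # Inverted dropout: zero a unit with probability p_drop,
        # rescale the survivors so the expected activation is unchanged.
        keep = rng.random() >= p_drop
        hidden.append(h / (1.0 - p_drop) if keep else 0.0)
    return sum(wi * hi for wi, hi in zip(w2, hidden))


def mc_dropout_predict(x, w1, w2, n_samples=200, p_drop=0.2, seed=0):
    """Return the predictive mean and an approximate 95% interval."""
    rng = random.Random(seed)
    samples = [forward(x, w1, w2, rng, p_drop) for _ in range(n_samples)]
    mean = statistics.fmean(samples)
    std = statistics.stdev(samples)
    return mean, (mean - 1.96 * std, mean + 1.96 * std)


# Hypothetical weights and input.
w1 = [[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]]
w2 = [1.0, -0.5, 0.7]
x = [1.0, 2.0]
mean, (lower, upper) = mc_dropout_predict(x, w1, w2)
```

Setting `p_drop=0.0` recovers the deterministic network output with a zero-width interval, which is a quick sanity check that the spread comes only from the dropout masks.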
Title:
An Open API Architecture to Discover the Trustworthy Explanation of Cloud AI Services
Authors: Zerui Wang, Yan Liu, Jun Huang
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)
Abstract
This article presents the design of an open-API-based explainable AI (XAI) service to provide feature contribution explanations for cloud AI services. Cloud AI services are widely used to develop domain-specific applications with precise learning metrics. However, the underlying cloud AI services remain opaque on how the model produces the prediction. We argue that XAI operations are accessible as open APIs to enable the consolidation of the XAI operations into the cloud AI services assessment. We propose a design using a microservice architecture that offers feature contribution explanations for cloud AI services without unfolding the network structure of the cloud models. We can also utilize this architecture to evaluate the model performance and XAI consistency metrics showing cloud AI services trustworthiness. We collect provenance data from operational pipelines to enable reproducibility within the XAI service. Furthermore, we present the discovery scenarios for the experimental tests regarding model performance and XAI consistency metrics for the leading cloud vision AI services. The results confirm that the architecture, based on open APIs, is cloud-agnostic. Additionally, data augmentations result in measurable improvements in XAI consistency metrics for cloud AI services.
Title:
Understanding Contrastive Learning via Gaussian Mixture Models
Authors: Parikshit Bansal, Ali Kavis, Sujay Sanghavi
Abstract
Contrastive learning attempts to learn representations from unlabeled data; it does so via a loss function that encourages the embedding of a point to be close to that of its augmentations, and far from the embeddings of random other points. This simple idea performs remarkably well, yet it is not precisely theoretically understood why this is the case. In this paper we analyze contrastive learning (specifically, the InfoNCE loss) in a natural context: dimensionality reduction in Gaussian Mixture Models. Crucially, we define an augmentation of a data point as being another independent draw from the same underlying mixture component. We show that vanilla InfoNCE is able to find the optimal lower-dimensional subspace even when the Gaussians are not isotropic -- something that vanilla spectral techniques cannot do. We further extend our analyses to multi-modal contrastive learning algorithms (e.g., CLIP). In this setting we show that contrastive learning learns a subset of the Fisher-optimal subspace, effectively filtering out all the noise from the learnt representations.
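The setup described above can be sketched in a few lines. The example assumes a standard in-batch form of the InfoNCE loss with dot-product similarity and no learned projection, and a hypothetical two-component mixture in two dimensions; the paper's exact formulation and analysis may differ. An "augmentation" is drawn, as in the abstract, as a second independent sample from the same mixture component.

```python
import math
import random


def info_nce(anchors, positives, temperature=1.0):
    """Average InfoNCE loss; positives[i] is the augmentation of anchors[i]."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        # Similarity of anchor i to every candidate; candidate i is the positive.
        logits = [
            sum(a * b for a, b in zip(anchors[i], positives[j])) / temperature
            for j in range(n)
        ]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[i]
    return total / n


def sample_pair(rng, means, noise=0.1):
    """A point and its augmentation: two independent draws from one component."""
    mu = rng.choice(means)
    first = [m + rng.gauss(0.0, noise) for m in mu]
    second = [m + rng.gauss(0.0, noise) for m in mu]
    return first, second


rng = random.Random(0)
means = [[3.0, 0.0], [-3.0, 0.0]]  # hypothetical mixture means
pairs = [sample_pair(rng, means) for _ in range(64)]
anchors = [p[0] for p in pairs]
positives = [p[1] for p in pairs]

loss_aligned = info_nce(anchors, positives)

# Break the pairing: each anchor is now matched with a random other point.
shuffled = positives[:]
rng.shuffle(shuffled)
loss_shuffled = info_nce(anchors, shuffled)
```

The loss is lower when each anchor is paired with a draw from its own component than when the pairing is shuffled, which is the signal a learned projection would exploit.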
Title:
Generalize or Detect? Towards Robust Semantic Segmentation Under Multiple Distribution Shifts
Authors: Zhitong Gao, Bingnan Li, Mathieu Salzmann, Xuming He
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Abstract
In open-world scenarios, where both novel classes and domains may exist, an ideal segmentation model should detect anomaly classes for safety and generalize to new domains. However, existing methods often struggle to distinguish between domain-level and semantic-level distribution shifts, leading to poor out-of-distribution (OOD) detection or domain generalization performance. In this work, we aim to equip the model to generalize effectively to covariate-shift regions while precisely identifying semantic-shift regions. To achieve this, we design a novel generative augmentation method to produce coherent images that incorporate both anomaly (or novel) objects and various covariate shifts at both image and object levels. Furthermore, we introduce a training strategy that recalibrates uncertainty specifically for semantic shifts and enhances the feature extractor to align features associated with domain shifts. We validate the effectiveness of our method across benchmarks featuring both semantic and domain shifts. Our method achieves state-of-the-art performance across all benchmarks for both OOD detection and domain generalization. Code is available at this https URL.