New submissions for Mon, 25 Dec 23

Keyword: detection

Fast Diffusion-Based Counterfactuals for Shortcut Removal and Generation

Authors: Authors: Nina Weng, Paraskevas Pegios, Aasa Feragen, Eike Petersen, Siavash Bigdeli
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.14223
Pdf link: https://arxiv.org/pdf/2312.14223
Abstract Shortcut learning is when a model -- e.g. a cardiac disease classifier -- exploits correlations between the target label and a spurious shortcut feature, e.g. a pacemaker, to predict the target label based on the shortcut rather than real discriminative features. This is common in medical imaging, where treatment and clinical annotations correlate with disease labels, making them easy shortcuts to predict disease. We propose a novel detection and quantification of the impact of potential shortcut features via a fast diffusion-based counterfactual image generation that can synthetically remove or add shortcuts. Via a novel inpainting-based modification we spatially limit the changes made with no extra inference step, encouraging the removal of spatially constrained shortcut features while ensuring that the shortcut-free counterfactuals preserve their remaining image features to a high degree. Using these, we assess how shortcut features influence model predictions. This is enabled by our second contribution: An efficient diffusion-based counterfactual explanation method with significant inference speed-up at comparable image quality as state-of-the-art. We confirm this on two large chest X-ray datasets, a skin lesion dataset, and CelebA.
Low-power event-based face detection with asynchronous neuromorphic hardware
Authors: Authors: Caterina Caccavella, Federico Paredes-Vallés, Marco Cannici, Lyes Khacef
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2312.14261
Pdf link: https://arxiv.org/pdf/2312.14261
Abstract The rise of mobility, IoT and wearables has shifted processing to the edge of the sensors, driven by the need to reduce latency, communication costs and overall energy consumption. While deep learning models have achieved remarkable results in various domains, their deployment at the edge for real-time applications remains computationally expensive. Neuromorphic computing emerges as a promising paradigm shift, characterized by co-localized memory and computing as well as event-driven asynchronous sensing and processing. In this work, we demonstrate the possibility of solving the ubiquitous computer vision task of object detection at the edge with low-power requirements, using the event-based N-Caltech101 dataset. We present the first instance of an on-chip spiking neural network for event-based face detection deployed on the SynSense Speck neuromorphic chip, which comprises both an event-based sensor and a spike-based asynchronous processor implementing Integrate-and-Fire neurons. We show how to reduce precision discrepancies between off-chip clock-driven simulation used for training and on-chip event-driven inference. This involves using a multi-spike version of the Integrate-and-Fire neuron on simulation, where spikes carry values that are proportional to the extent the membrane potential exceeds the firing threshold. We propose a robust strategy to train spiking neural networks with back-propagation through time using multi-spike activation and firing rate regularization and demonstrate how to decode output spikes into bounding boxes. We show that the power consumption of the chip is directly proportional to the number of synaptic operations in the spiking neural network, and we explore the trade-off between power consumption and detection precision with different firing rate regularization, achieving an on-chip face detection mAP[0.5] of ~0.6 while consuming only ~20 mW.
Invariant Anomaly Detection under Distribution Shifts: A Causal Perspective
Authors: Authors: João B. S. Carvalho, Mengtao Zhang, Robin Geyer, Carlos Cotrini, Joachim M. Buhmann
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.14329
Pdf link: https://arxiv.org/pdf/2312.14329
Abstract Anomaly detection (AD) is the machine learning task of identifying highly discrepant abnormal samples by solely relying on the consistency of the normal training samples. Under the constraints of a distribution shift, the assumption that training samples and test samples are drawn from the same distribution breaks down. In this work, by leveraging tools from causal inference we attempt to increase the resilience of anomaly detection models to different kinds of distribution shifts. We begin by elucidating a simple yet necessary statistical property that ensures invariant representations, which is critical for robust AD under both domain and covariate shifts. From this property, we derive a regularization term which, when minimized, leads to partial distribution invariance across environments. Through extensive experimental evaluation on both synthetic and real-world tasks, covering a range of six different AD methods, we demonstrated significant improvements in out-of-distribution performance. Under both covariate and domain shift, models regularized with our proposed term showed marked increased robustness. Code is available at: https://github.com/JoaoCarv/invariant-anomaly-detection.
Graph Attention-Based Symmetry Constraint Extraction for Analog Circuits
Authors: Authors: Qi Xu, Lijie Wang, Jing Wang, Song Chen, Lin Cheng, Yi Kang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.14405
Pdf link: https://arxiv.org/pdf/2312.14405
Abstract In recent years, analog circuits have received extensive attention and are widely used in many emerging applications. The high demand for analog circuits necessitates shorter circuit design cycles. To achieve the desired performance and specifications, various geometrical symmetry constraints must be carefully considered during the analog layout process. However, the manual labeling of these constraints by experienced analog engineers is a laborious and time-consuming process. To handle the costly runtime issue, we propose a graph-based learning framework to automatically extract symmetric constraints in analog circuit layout. The proposed framework leverages the connection characteristics of circuits and the devices'information to learn the general rules of symmetric constraints, which effectively facilitates the extraction of device-level constraints on circuit netlists. The experimental results demonstrate that compared to state-of-the-art symmetric constraint detection approaches, our framework achieves higher accuracy and lower false positive rate.
Generative Pretraining at Scale: Transformer-Based Encoding of Transactional Behavior for Fraud Detection
Authors: Authors: Ze Yu Zhao (1), Zheng Zhu (1), Guilin Li (1), Wenhan Wang (1), Bo Wang (1) ((1) Tencent, WeChat Pay)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.14406
Pdf link: https://arxiv.org/pdf/2312.14406
Abstract In this work, we introduce an innovative autoregressive model leveraging Generative Pretrained Transformer (GPT) architectures, tailored for fraud detection in payment systems. Our approach innovatively confronts token explosion and reconstructs behavioral sequences, providing a nuanced understanding of transactional behavior through temporal and contextual analysis. Utilizing unsupervised pretraining, our model excels in feature representation without the need for labeled data. Additionally, we integrate a differential convolutional approach to enhance anomaly detection, bolstering the security and efficacy of one of the largest online payment merchants in China. The scalability and adaptability of our model promise broad applicability in various transactional contexts.
GROOD: GRadient-aware Out-Of-Distribution detection in interpolated manifolds
Authors: Authors: Mostafa ElAraby, Sabyasachi Sahoo, Yann Pequignot, Paul Novello, Liam Paull
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.14427
Pdf link: https://arxiv.org/pdf/2312.14427
Abstract Deep neural networks (DNNs) often fail silently with over-confident predictions on out-of-distribution (OOD) samples, posing risks in real-world deployments. Existing techniques predominantly emphasize either the feature representation space or the gradient norms computed with respect to DNN parameters, yet they overlook the intricate gradient distribution and the topology of classification regions. To address this gap, we introduce GRadient-aware Out-Of-Distribution detection in interpolated manifolds (GROOD), a novel framework that relies on the discriminative power of gradient space to distinguish between in-distribution (ID) and OOD samples. To build this space, GROOD relies on class prototypes together with a prototype that specifically captures OOD characteristics. Uniquely, our approach incorporates a targeted mix-up operation at an early intermediate layer of the DNN to refine the separation of gradient spaces between ID and OOD samples. We quantify OOD detection efficacy using the distance to the nearest neighbor gradients derived from the training set, yielding a robust OOD score. Experimental evaluations substantiate that the introduction of targeted input mix-upamplifies the separation between ID and OOD in the gradient space, yielding impressive results across diverse datasets. Notably, when benchmarked against ImageNet-1k, GROOD surpasses the established robustness of state-of-the-art baselines. Through this work, we establish the utility of leveraging gradient spaces and class prototypes for enhanced OOD detection for DNN in image classification.
How to Overcome Curse-of-Dimensionality for Out-of-Distribution Detection?
Authors: Authors: Soumya Suvra Ghosal, Yiyou Sun, Yixuan Li
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.14452
Pdf link: https://arxiv.org/pdf/2312.14452
Abstract Machine learning models deployed in the wild can be challenged by out-of-distribution (OOD) data from unknown classes. Recent advances in OOD detection rely on distance measures to distinguish samples that are relatively far away from the in-distribution (ID) data. Despite the promise, distance-based methods can suffer from the curse-of-dimensionality problem, which limits the efficacy in high-dimensional feature space. To combat this problem, we propose a novel framework, Subspace Nearest Neighbor (SNN), for OOD detection. In training, our method regularizes the model and its feature representation by leveraging the most relevant subset of dimensions (i.e. subspace). Subspace learning yields highly distinguishable distance measures between ID and OOD data. We provide comprehensive experiments and ablations to validate the efficacy of SNN. Compared to the current best distance-based method, SNN reduces the average FPR95 by 15.96% on the CIFAR-100 benchmark.
FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection
Authors: Authors: Dongmei Zhang, Chang Li, Ray Zhang, Shenghao Xie, Wei Xue, Xiaodong Xie, Shanghang Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.14465
Pdf link: https://arxiv.org/pdf/2312.14465
Abstract The superior performances of pre-trained foundation models in various visual tasks underscore their potential to enhance the 2D models' open-vocabulary ability. Existing methods explore analogous applications in the 3D space. However, most of them only center around knowledge extraction from singular foundation models, which limits the open-vocabulary ability of 3D models. We hypothesize that leveraging complementary pre-trained knowledge from various foundation models can improve knowledge transfer from 2D pre-trained visual language models to the 3D space. In this work, we propose FM-OV3D, a method of Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection, which improves the open-vocabulary localization and recognition abilities of 3D model by blending knowledge from multiple pre-trained foundation models, achieving true open-vocabulary without facing constraints from original 3D datasets. Specifically, to learn the open-vocabulary 3D localization ability, we adopt the open-vocabulary localization knowledge of the Grounded-Segment-Anything model. For open-vocabulary 3D recognition ability, We leverage the knowledge of generative foundation models, including GPT-3 and Stable Diffusion models, and cross-modal discriminative models like CLIP. The experimental results on two popular benchmarks for open-vocabulary 3D object detection show that our model efficiently learns knowledge from multiple foundation models to enhance the open-vocabulary ability of the 3D model and successfully achieves state-of-the-art performance in open-vocabulary 3D object detection tasks. Code is released at https://github.com/dmzhang0425/FM-OV3D.git.
MonoLSS: Learnable Sample Selection For Monocular 3D Detection
Authors: Authors: Zhenjia Li, Jinrang Jia, Yifeng Shi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2312.14474
Pdf link: https://arxiv.org/pdf/2312.14474
Abstract In the field of autonomous driving, monocular 3D detection is a critical task which estimates 3D properties (depth, dimension, and orientation) of objects in a single RGB image. Previous works have used features in a heuristic way to learn 3D properties, without considering that inappropriate features could have adverse effects. In this paper, sample selection is introduced that only suitable samples should be trained to regress the 3D properties. To select samples adaptively, we propose a Learnable Sample Selection (LSS) module, which is based on Gumbel-Softmax and a relative-distance sample divider. The LSS module works under a warm-up strategy leading to an improvement in training stability. Additionally, since the LSS module dedicated to 3D property sample selection relies on object-level features, we further develop a data augmentation method named MixUp3D to enrich 3D property samples which conforms to imaging principles without introducing ambiguity. As two orthogonal methods, the LSS module and MixUp3D can be utilized independently or in conjunction. Sufficient experiments have shown that their combined use can lead to synergistic effects, yielding improvements that transcend the mere sum of their individual applications. Leveraging the LSS module and the MixUp3D, without any extra data, our method named MonoLSS ranks 1st in all three categories (Car, Cyclist, and Pedestrian) on KITTI 3D object detection benchmark, and achieves competitive results on both the Waymo dataset and KITTI-nuScenes cross-dataset evaluation. The code is included in the supplementary material and will be released to facilitate related academic and industrial studies.
Navigating the Concurrency Landscape: A Survey of Race Condition Vulnerability Detectors
Authors: Authors: Aishwarya Upadhyay, Vijay Laxmi, Smita Naval
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2312.14479
Pdf link: https://arxiv.org/pdf/2312.14479
Abstract As technology continues to advance and we usher in the era of Industry 5.0, there has been a profound paradigm shift in operating systems, file systems, web, and network applications. The conventional utilization of multiprocessing and multicore systems has made concurrent programming increasingly pervasive. However, this transformation has brought about a new set of issues known as concurrency bugs, which, due to their wide prevalence in concurrent programs, have led to severe failures and potential security exploits. Over the past two decades, numerous researchers have dedicated their efforts to unveiling, detecting, mitigating, and preventing these bugs, with the last decade witnessing a surge in research within this domain. Among the spectrum of concurrency bugs, data races or race condition vulnerabilities stand out as the most prevalent, accounting for a staggering 80\% of all concurrency bugs. This survey paper is focused on the realm of race condition bug detectors. We systematically categorize these detectors based on the diverse methodologies they employ. Additionally, we delve into the techniques and algorithms associated with race detection, tracing the evolution of this field over time. Furthermore, we shed light on the application of fuzzing techniques in the detection of race condition vulnerabilities. By reviewing these detectors and their static analyses, we draw conclusions and outline potential future research directions, including enhancing accuracy, performance, applicability, and comprehensiveness in race condition vulnerability detection.
Context Enhanced Transformer for Single Image Object Detection
Authors: Authors: Seungjun An, Seonghoon Park, Gyeongnyeon Kim, Jeongyeol Baek, Byeongwon Lee, Seungryong Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.14492
Pdf link: https://arxiv.org/pdf/2312.14492
Abstract With the increasing importance of video data in real-world applications, there is a rising need for efficient object detection methods that utilize temporal information. While existing video object detection (VOD) techniques employ various strategies to address this challenge, they typically depend on locally adjacent frames or randomly sampled images within a clip. Although recent Transformer-based VOD methods have shown promising results, their reliance on multiple inputs and additional network complexity to incorporate temporal information limits their practical applicability. In this paper, we propose a novel approach to single image object detection, called Context Enhanced TRansformer (CETR), by incorporating temporal context into DETR using a newly designed memory module. To efficiently store temporal information, we construct a class-wise memory that collects contextual information across data. Additionally, we present a classification-based sampling technique to selectively utilize the relevant memory for the current image. In the testing, We introduce a test-time memory adaptation method that updates individual memory functions by considering the test distribution. Experiments with CityCam and ImageNet VID datasets exhibit the efficiency of the framework on various video systems. The project page and code will be made available at: https://ku-cvlab.github.io/CETR.
Revisiting Few-Shot Object Detection with Vision-Language Models
Authors: Authors: Anish Madan, Neehar Peri, Shu Kong, Deva Ramanan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.14494
Pdf link: https://arxiv.org/pdf/2312.14494
Abstract Few-shot object detection (FSOD) benchmarks have advanced techniques for detecting new categories with limited annotations. Existing benchmarks repurpose well-established datasets like COCO by partitioning categories into base and novel classes for pre-training and fine-tuning respectively. However, these benchmarks do not reflect how FSOD is deployed in practice. Rather than only pre-training on a small number of base categories, we argue that it is more practical to fine-tune a foundation model (e.g., a vision-language model (VLM) pre-trained on web-scale data) for a target domain. Surprisingly, we find that zero-shot inference from VLMs like GroundingDINO significantly outperforms the state-of-the-art (48.3 vs. 33.1 AP) on COCO. However, such zero-shot models can still be misaligned to target concepts of interest. For example, trailers on the web may be different from trailers in the context of autonomous vehicles. In this work, we propose Foundational FSOD, a new benchmark protocol that evaluates detectors pre-trained on any external datasets and fine-tuned on K-shots per target class. Further, we note that current FSOD benchmarks are actually federated datasets containing exhaustive annotations for each category on a subset of the data. We leverage this insight to propose simple strategies for fine-tuning VLMs with federated losses. We demonstrate the effectiveness of our approach on LVIS and nuImages, improving over prior work by 5.9 AP.
ADA-GAD: Anomaly-Denoised Autoencoders for Graph Anomaly Detection
Authors: Authors: Junwei He, Qianqian Xu, Yangbangyan Jiang, Zitai Wang, Qingming Huang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2312.14535
Pdf link: https://arxiv.org/pdf/2312.14535
Abstract Graph anomaly detection is crucial for identifying nodes that deviate from regular behavior within graphs, benefiting various domains such as fraud detection and social network. Although existing reconstruction-based methods have achieved considerable success, they may face the \textit{Anomaly Overfitting} and \textit{Homophily Trap} problems caused by the abnormal patterns in the graph, breaking the assumption that normal nodes are often better reconstructed than abnormal ones. Our observations indicate that models trained on graphs with fewer anomalies exhibit higher detection performance. Based on this insight, we introduce a novel two-stage framework called Anomaly-Denoised Autoencoders for Graph Anomaly Detection (ADA-GAD). In the first stage, we design a learning-free anomaly-denoised augmentation method to generate graphs with reduced anomaly levels. We pretrain graph autoencoders on these augmented graphs at multiple levels, which enables the graph autoencoders to capture normal patterns. In the next stage, the decoders are retrained for detection on the original graph, benefiting from the multi-level representations learned in the previous stage. Meanwhile, we propose the node anomaly distribution regularization to further alleviate \textit{Anomaly Overfitting}. We validate the effectiveness of our approach through extensive experiments on both synthetic and real-world datasets.
The Projection Method: a Unified Formalism for Community Detection
Authors: Authors: Martijn Gösgens, Remco van der Hofstad, Nelly Litvak
Subjects: Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2312.14568
Pdf link: https://arxiv.org/pdf/2312.14568
Abstract We present the class of projection methods for community detection that generalizes many popular community detection methods. In this framework, we represent each clustering (partition) by a vector on a high-dimensional hypersphere. A community detection method is a projection method if it can be described by the following two-step approach: 1) the graph is mapped to a query vector on the hypersphere; and 2) the query vector is projected on the set of clustering vectors. This last projection step is performed by minimizing the distance between the query vector and the clustering vector, over the set of clusterings. We prove that optimizing Markov stability, modularity, the likelihood of planted partition models and correlation clustering fit this framework. A consequence of this equivalence is that algorithms for each of these methods can be modified to perform the projection step in our framework. In addition, we show that these different methods suffer from the same granularity problem: they have parameters that control the granularity of the resulting clustering, but choosing these to obtain clusterings of the desired granularity is nontrivial. We provide a general heuristic to address this granularity problem, which can be applied to any projection method. Finally, we show how, given a generator of graphs with community structure, we can optimize a projection method for this generator in order to obtain a community detection method that performs well on this generator.
PoseViNet: Distracted Driver Action Recognition Framework Using Multi-View Pose Estimation and Vision Transformer
Authors: Authors: Neha Sengar, Indra Kumari, Jihui Lee, Dongsoo Har
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.14577
Pdf link: https://arxiv.org/pdf/2312.14577
Abstract Driver distraction is a principal cause of traffic accidents. In a study conducted by the National Highway Traffic Safety Administration, engaging in activities such as interacting with in-car menus, consuming food or beverages, or engaging in telephonic conversations while operating a vehicle can be significant sources of driver distraction. From this viewpoint, this paper introduces a novel method for detection of driver distraction using multi-view driver action images. The proposed method is a vision transformer-based framework with pose estimation and action inference, namely PoseViNet. The motivation for adding posture information is to enable the transformer to focus more on key features. As a result, the framework is more adept at identifying critical actions. The proposed framework is compared with various state-of-the-art models using SFD3 dataset representing 10 behaviors of drivers. It is found from the comparison that the PoseViNet outperforms these models. The proposed framework is also evaluated with the SynDD1 dataset representing 16 behaviors of driver. As a result, the PoseViNet achieves 97.55% validation accuracy and 90.92% testing accuracy with the challenging dataset.
Reasons to Reject? Aligning Language Models with Judgments
Authors: Authors: Weiwen Xu, Deng Cai, Zhisong Zhang, Wai Lam, Shuming Shi
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2312.14591
Pdf link: https://arxiv.org/pdf/2312.14591
Abstract As humans, we consistently engage in interactions with our peers and receive feedback in the form of natural language. This language feedback allows us to reflect on our actions, maintain appropriate behavior, and rectify our errors. The question arises naturally: can we use language feedback to align large language models (LLMs)? In contrast to previous research that aligns LLMs with reward or preference data, we present the first systematic exploration of alignment through the lens of language feedback (i.e., judgment). We commence with an in-depth investigation of potential methods that can be adapted for aligning LLMs with judgments, revealing that these methods are unable to fully capitalize on the judgments. To facilitate more effective utilization of judgments, we propose a novel framework, Contrastive Unlikelihood Training (CUT), that allows for fine-grained inappropriate content detection and correction based on judgments. Our offline alignment results show that, with merely 1317 off-the-shelf judgment data, CUT (LLaMA2-13b) can beat the 175B DaVinci003 and surpass the best baseline by 52.34 points on AlpacaEval. The online alignment results demonstrate that CUT can align LLMs (LLaMA2-chat-13b) in an iterative fashion using model-specific judgment data, with a steady performance improvement from 81.09 to 91.36 points on AlpacaEval. Our analysis further suggests that judgments exhibit greater potential than rewards for LLM alignment and warrant future research.
Explainable Multi-Camera 3D Object Detection with Transformer-Based Saliency Maps
Authors: Authors: Till Beemelmanns, Wassim Zahr, Lutz Eckstein
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.14606
Pdf link: https://arxiv.org/pdf/2312.14606
Abstract Vision Transformers (ViTs) have achieved state-of-the-art results on various computer vision tasks, including 3D object detection. However, their end-to-end implementation also makes ViTs less explainable, which can be a challenge for deploying them in safety-critical applications, such as autonomous driving, where it is important for authorities, developers, and users to understand the model's reasoning behind its predictions. In this paper, we propose a novel method for generating saliency maps for a DetR-like ViT with multiple camera inputs used for 3D object detection. Our method is based on the raw attention and is more efficient than gradient-based methods. We evaluate the proposed method on the nuScenes dataset using extensive perturbation tests and show that it outperforms other explainability methods in terms of visual quality and quantitative metrics. We also demonstrate the importance of aggregating attention across different layers of the transformer. Our work contributes to the development of explainable AI for ViTs, which can help increase trust in AI applications by establishing more transparency regarding the inner workings of AI models.
MEAOD: Model Extraction Attack against Object Detectors
Authors: Authors: Zeyu Li, Chenghui Shi, Yuwen Pu, Xuhong Zhang, Yu Li, Jinbao Li, Shouling Ji
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.14677
Pdf link: https://arxiv.org/pdf/2312.14677
Abstract The widespread use of deep learning technology across various industries has made deep neural network models highly valuable and, as a result, attractive targets for potential attackers. Model extraction attacks, particularly query-based model extraction attacks, allow attackers to replicate a substitute model with comparable functionality to the victim model and present a significant threat to the confidentiality and security of MLaaS platforms. While many studies have explored threats of model extraction attacks against classification models in recent years, object detection models, which are more frequently used in real-world scenarios, have received less attention. In this paper, we investigate the challenges and feasibility of query-based model extraction attacks against object detection models and propose an effective attack method called MEAOD. It selects samples from the attacker-possessed dataset to construct an efficient query dataset using active learning and enhances the categories with insufficient objects. We additionally improve the extraction effectiveness by updating the annotations of the query dataset. According to our gray-box and black-box scenarios experiments, we achieve an extraction performance of over 70% under the given condition of a 10k query budget.
BonnBeetClouds3D: A Dataset Towards Point Cloud-based Organ-level Phenotyping of Sugar Beet Plants under Field Conditions
Authors: Authors: Elias Marks, Jonas Bömer, Federico Magistri, Anurag Sah, Jens Behley, Cyrill Stachniss
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2312.14706
Pdf link: https://arxiv.org/pdf/2312.14706
Abstract Agricultural production is facing severe challenges in the next decades induced by climate change and the need for sustainability, reducing its impact on the environment. Advancements in field management through non-chemical weeding by robots in combination with monitoring of crops by autonomous unmanned aerial vehicles (UAVs) and breeding of novel and more resilient crop varieties are helpful to address these challenges. The analysis of plant traits, called phenotyping, is an essential activity in plant breeding, it however involves a great amount of manual labor. With this paper, we address the problem of automatic fine-grained organ-level geometric analysis needed for precision phenotyping. As the availability of real-world data in this domain is relatively scarce, we propose a novel dataset that was acquired using UAVs capturing high-resolution images of a real breeding trial containing 48 plant varieties and therefore covering great morphological and appearance diversity. This enables the development of approaches for autonomous phenotyping that generalize well to different varieties. Based on overlapping high-resolution images from multiple viewing angles, we compute photogrammetric dense point clouds and provide detailed and accurate point-wise labels for plants, leaves, and salient points as the tip and the base. Additionally, we include measurements of phenotypic traits performed by experts from the German Federal Plant Variety Office on the real plants, allowing the evaluation of new approaches not only on segmentation and keypoint detection but also directly on the downstream tasks. The provided labeled point clouds enable fine-grained plant analysis and support further progress in the development of automatic phenotyping approaches, but also enable further research in surface reconstruction, point cloud completion, and semantic interpretation of point clouds.
Progressing from Anomaly Detection to Automated Log Labeling and Pioneering Root Cause Analysis
Authors: Authors: Thorsten Wittkopp, Alexander Acker, Odej Kao
Subjects: Machine Learning (cs.LG); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2312.14748
Pdf link: https://arxiv.org/pdf/2312.14748
Abstract The realm of AIOps is transforming IT landscapes with the power of AI and ML. Despite the challenge of limited labeled data, supervised models show promise, emphasizing the importance of leveraging labels for training, especially in deep learning contexts. This study enhances the field by introducing a taxonomy for log anomalies and exploring automated data labeling to mitigate labeling challenges. It goes further by investigating the potential of diverse anomaly detection techniques and their alignment with specific anomaly types. However, the exploration doesn't stop at anomaly detection. The study envisions a future where root cause analysis follows anomaly detection, unraveling the underlying triggers of anomalies. This uncharted territory holds immense potential for revolutionizing IT systems management. In essence, this paper enriches our understanding of anomaly detection, and automated labeling, and sets the stage for transformative root cause analysis. Together, these advances promise more resilient IT systems, elevating operational efficiency and user satisfaction in an ever-evolving technological landscape.
Large Language Model (LLM) Bias Index -- LLMBI
Authors: Authors: Abiodun Finbarrs Oketunji, Muhammad Anas, Deepthi Saina
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.14769
Pdf link: https://arxiv.org/pdf/2312.14769
Abstract The Large Language Model Bias Index (LLMBI) is a pioneering approach designed to quantify and address biases inherent in large language models (LLMs), such as GPT-4. We recognise the increasing prevalence and impact of LLMs across diverse sectors. This research introduces a novel metric, LLMBI, to systematically measure and mitigate biases potentially skewing model responses. We formulated LLMBI using a composite scoring system incorporating multiple dimensions of bias, including but not limited to age, gender, and racial biases. To operationalise this metric, we engaged in a multi-step process involving collecting and annotating LLM responses, applying sophisticated Natural Language Processing (NLP) techniques for bias detection, and computing the LLMBI score through a specially crafted mathematical formula. The formula integrates weighted averages of various bias dimensions, a penalty for dataset diversity deficiencies, and a correction for sentiment biases. Our empirical analysis, conducted using responses from OpenAI's API, employs advanced sentiment analysis as a representative method for bias detection. The research reveals LLMs, whilst demonstrating impressive capabilities in text generation, exhibit varying degrees of bias across different dimensions. LLMBI provides a quantifiable measure to compare biases across models and over time, offering a vital tool for systems engineers, researchers and regulators in enhancing the fairness and reliability of LLMs. It highlights the potential of LLMs in mimicking unbiased human-like responses. Additionally, it underscores the necessity of continuously monitoring and recalibrating such models to align with evolving societal norms and ethical standards.
Prototype-Guided Text-based Person Search based on Rich Chinese Descriptions
Authors: Authors: Ziqiang Wu, Bingpeng Ma
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.14834
Pdf link: https://arxiv.org/pdf/2312.14834
Abstract Text-based person search aims to simultaneously localize and identify the target person based on query text from uncropped scene images, which can be regarded as the unified task of person detection and text-based person retrieval task. In this work, we propose a large-scale benchmark dataset named PRW-TPS-CN based on the widely used person search dataset PRW. Our dataset contains 47,102 sentences, which means there is quite more information than existing dataset. These texts precisely describe the person images from top to bottom, which in line with the natural description order. We also provide both Chinese and English descriptions in our dataset for more comprehensive evaluation. These characteristics make our dataset more applicable. To alleviate the inconsistency between person detection and text-based person retrieval, we take advantage of the rich texts in PRW-TPS-CN dataset. We propose to aggregate multiple texts as text prototypes to maintain the prominent text features of a person, which can better reflect the whole character of a person. The overall prototypes lead to generating the image attention map to eliminate the detection misalignment causing the decrease of text-based person retrieval. Thus, the inconsistency between person detection and text-based person retrieval is largely alleviated. We conduct extensive experiments on the PRW-TPS-CN dataset. The experimental results show the PRW-TPS-CN dataset's effectiveness and the state-of-the-art performance of our approach.
Advancing VAD Systems Based on Multi-Task Learning with Improved Model Structures
Authors: Authors: Lingyun Zuo, Keyu An, Shiliang Zhang, Zhijie Yan
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2312.14860
Pdf link: https://arxiv.org/pdf/2312.14860
Abstract In a speech recognition system, voice activity detection (VAD) is a crucial frontend module. Addressing the issues of poor noise robustness in traditional binary VAD systems based on DFSMN, the paper further proposes semantic VAD based on multi-task learning with improved models for real-time and offline systems, to meet specific application requirements. Evaluations on internal datasets show that, compared to the real-time VAD system based on DFSMN, the real-time semantic VAD system based on RWKV achieves relative decreases in CER of 7.0\%, DCF of 26.1\% and relative improvement in NRR of 19.2\%. Similarly, when compared to the offline VAD system based on DFSMN, the offline VAD system based on SAN-M demonstrates relative decreases in CER of 4.4\%, DCF of 18.6\% and relative improvement in NRR of 3.5\%.
Enriching Automatic Test Case Generation by Extracting Relevant Test Inputs from Bug Reports
Authors: Authors: Wendkûuni C. Ouédraogo, Laura Plein, Kader Kaboré, Andrew Habib, Jacques Klein, David Lo, Tegawendé F. Bissyandé
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2312.14898
Pdf link: https://arxiv.org/pdf/2312.14898
Abstract The quality of a software is highly dependent on the quality of the tests it is submitted to. Writing tests for bug detection is thus essential. However, it is time-consuming when done manually. Automating test cases generation has therefore been an exciting research area in the software engineering community. Most approaches have been focused on generating unit tests. Unfortunately, current efforts often do not lead to the generation of relevant inputs, which limits the efficiency of automatically generated tests. Towards improving the relevance of test inputs, we present \name, a technique for exploring bug reports to identify input values that can be fed to automatic test generation tools. In this work, we investigate the performance of using inputs extracted from bug reports with \name to generate test cases with Evosuite. The evaluation is performed on the Defects4J benchmark. For Defects4J projects, our study has shown that \name successfully extracted 68.68\% of relevant inputs when using regular expression in its approach versus 50.21\% relevant inputs without regular expression. Further, our study has shown the potential to improve the Line and Instruction Coverage across all projects. Overall, we successfully collected relevant inputs that led to the detection of 45 bugs that were previously undetected by the baseline.
Lift-Attend-Splat: Bird's-eye-view camera-lidar fusion using transformers
Authors: Authors: James Gunn, Zygmunt Lenyk, Anuj Sharma, Andrea Donati, Alexandru Buburuzan, John Redford, Romain Mueller
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.14919
Pdf link: https://arxiv.org/pdf/2312.14919
Abstract Combining complementary sensor modalities is crucial to providing robust perception for safety-critical robotics applications such as autonomous driving (AD). Recent state-of-the-art camera-lidar fusion methods for AD rely on monocular depth estimation which is a notoriously difficult task compared to using depth information from the lidar directly. Here, we find that this approach does not leverage depth as expected and show that naively improving depth estimation does not lead to improvements in object detection performance and that, strikingly, removing depth estimation altogether does not degrade object detection performance. This suggests that relying on monocular depth could be an unnecessary architectural bottleneck during camera-lidar fusion. In this work, we introduce a novel fusion method that bypasses monocular depth estimation altogether and instead selects and fuses camera and lidar features in a bird's-eye-view grid using a simple attention mechanism. We show that our model can modulate its use of camera features based on the availability of lidar features and that it yields better 3D object detection on the nuScenes dataset than baselines relying on monocular depth estimation.
Keyword: face recognition

Autoencoder Based Face Verification System
Authors: Authors: Enoch Solomon, Abraham Woubie, Eyael Solomon Emiru
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2312.14301
Pdf link: https://arxiv.org/pdf/2312.14301
Abstract The primary objective of this work is to present an alternative approach aimed at reducing the dependency on labeled data. Our proposed method involves utilizing autoencoder pre-training within a face image recognition task with two step processes. Initially, an autoencoder is trained in an unsupervised manner using a substantial amount of unlabeled training dataset. Subsequently, a deep learning model is trained with initialized parameters from the pre-trained autoencoder. This deep learning training process is conducted in a supervised manner, employing relatively limited labeled training dataset. During evaluation phase, face image embeddings is generated as the output of deep neural network layer. Our training is executed on the CelebA dataset, while evaluation is performed using benchmark face recognition datasets such as Labeled Faces in the Wild (LFW) and YouTube Faces (YTF). Experimental results demonstrate that by initializing the deep neural network with pre-trained autoencoder parameters achieve comparable results to state-of-the-art methods.
Inclusive normalization of face images to passport format
Authors: Authors: Hongliu Cao, Minh Nhat Do, Alexis Ravanel, Eoin Thomas
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.14544
Pdf link: https://arxiv.org/pdf/2312.14544
Abstract Face recognition has been used more and more in real world applications in recent years. However, when the skin color bias is coupled with intra-personal variations like harsh illumination, the face recognition task is more likely to fail, even during human inspection. Face normalization methods try to deal with such challenges by removing intra-personal variations from an input image while keeping the identity the same. However, most face normalization methods can only remove one or two variations and ignore dataset biases such as skin color bias. The outputs of many face normalization methods are also not realistic to human observers. In this work, a style based face normalization model (StyleFNM) is proposed to remove most intra-personal variations including large changes in pose, bad or harsh illumination, low resolution, blur, facial expressions, and accessories like sunglasses among others. The dataset bias is also dealt with in this paper by controlling a pretrained GAN to generate a balanced dataset of passport-like images. The experimental results show that StyleFNM can generate more realistic outputs and can improve significantly the accuracy and fairness of face recognition systems.
Keyword: augmentation

Experimenting with Large Language Models and vector embeddings in NASA SciX
Authors: Authors: Sergi Blanco-Cuaresma, Ioana Ciucă, Alberto Accomazzi, Michael J. Kurtz, Edwin A. Henneken, Kelly E. Lockhart, Felix Grezes, Thomas Allen, Golnaz Shapurian, Carolyn S. Grant, Donna M. Thompson, Timothy W. Hostetler, Matthew R. Templeton, Shinyi Chen, Jennifer Koch, Taylor Jacovich, Daniel Chivvis, Fernanda de Macedo Alves, Jean-Claude Paquin, Jennifer Bartlett, Mugdha Polimera, Stephanie Jarmak
Subjects: Computation and Language (cs.CL); Instrumentation and Methods for Astrophysics (astro-ph.IM); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.14211
Pdf link: https://arxiv.org/pdf/2312.14211
Abstract Open-source Large Language Models enable projects such as NASA SciX (i.e., NASA ADS) to think out of the box and try alternative approaches for information retrieval and data augmentation, while respecting data copyright and users' privacy. However, when large language models are directly prompted with questions without any context, they are prone to hallucination. At NASA SciX we have developed an experiment where we created semantic vectors for our large collection of abstracts and full-text content, and we designed a prompt system to ask questions using contextual chunks from our system. Based on a non-systematic human evaluation, the experiment shows a lower degree of hallucination and better responses when using Retrieval Augmented Generation. Further exploration is required to design new features and data augmentation processes at NASA SciX that leverages this technology while respecting the high level of trust and quality that the project holds.
MonoLSS: Learnable Sample Selection For Monocular 3D Detection
Authors: Authors: Zhenjia Li, Jinrang Jia, Yifeng Shi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2312.14474
Pdf link: https://arxiv.org/pdf/2312.14474
Abstract In the field of autonomous driving, monocular 3D detection is a critical task which estimates 3D properties (depth, dimension, and orientation) of objects in a single RGB image. Previous works have used features in a heuristic way to learn 3D properties, without considering that inappropriate features could have adverse effects. In this paper, sample selection is introduced that only suitable samples should be trained to regress the 3D properties. To select samples adaptively, we propose a Learnable Sample Selection (LSS) module, which is based on Gumbel-Softmax and a relative-distance sample divider. The LSS module works under a warm-up strategy leading to an improvement in training stability. Additionally, since the LSS module dedicated to 3D property sample selection relies on object-level features, we further develop a data augmentation method named MixUp3D to enrich 3D property samples which conforms to imaging principles without introducing ambiguity. As two orthogonal methods, the LSS module and MixUp3D can be utilized independently or in conjunction. Sufficient experiments have shown that their combined use can lead to synergistic effects, yielding improvements that transcend the mere sum of their individual applications. Leveraging the LSS module and the MixUp3D, without any extra data, our method named MonoLSS ranks 1st in all three categories (Car, Cyclist, and Pedestrian) on KITTI 3D object detection benchmark, and achieves competitive results on both the Waymo dataset and KITTI-nuScenes cross-dataset evaluation. The code is included in the supplementary material and will be released to facilitate related academic and industrial studies.
ADA-GAD: Anomaly-Denoised Autoencoders for Graph Anomaly Detection
Authors: Authors: Junwei He, Qianqian Xu, Yangbangyan Jiang, Zitai Wang, Qingming Huang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2312.14535
Pdf link: https://arxiv.org/pdf/2312.14535
Abstract Graph anomaly detection is crucial for identifying nodes that deviate from regular behavior within graphs, benefiting various domains such as fraud detection and social network. Although existing reconstruction-based methods have achieved considerable success, they may face the \textit{Anomaly Overfitting} and \textit{Homophily Trap} problems caused by the abnormal patterns in the graph, breaking the assumption that normal nodes are often better reconstructed than abnormal ones. Our observations indicate that models trained on graphs with fewer anomalies exhibit higher detection performance. Based on this insight, we introduce a novel two-stage framework called Anomaly-Denoised Autoencoders for Graph Anomaly Detection (ADA-GAD). In the first stage, we design a learning-free anomaly-denoised augmentation method to generate graphs with reduced anomaly levels. We pretrain graph autoencoders on these augmented graphs at multiple levels, which enables the graph autoencoders to capture normal patterns. In the next stage, the decoders are retrained for detection on the original graph, benefiting from the multi-level representations learned in the previous stage. Meanwhile, we propose the node anomaly distribution regularization to further alleviate \textit{Anomaly Overfitting}. We validate the effectiveness of our approach through extensive experiments on both synthetic and real-world datasets.
Data augmentation for the POD formulation of the parametric laminar incompressible Navier-Stokes equations
Authors: Authors: Alba Muixí, Sergio Zlotnik, Matteo Giacomini, Pedro Díez
Subjects: Numerical Analysis (math.NA); Computational Engineering, Finance, and Science (cs.CE); Fluid Dynamics (physics.flu-dyn)
Arxiv link: https://arxiv.org/abs/2312.14756
Pdf link: https://arxiv.org/pdf/2312.14756
Abstract A posteriori reduced-order models, e.g. proper orthogonal decomposition, are essential to affordably tackle realistic parametric problems. They rely on a trustful training set, that is a family of full-order solutions (snapshots) representative of all possible outcomes of the parametric problem. Having such a rich collection of snapshots is not, in many cases, computationally viable. A strategy for data augmentation, designed for parametric laminar incompressible flows, is proposed to enrich poorly populated training sets. The goal is to include in the new, artificial snapshots emerging features, not present in the original basis, that do enhance the quality of the reduced-order solution. The methodologies devised are based on exploiting basic physical principles, such as mass and momentum conservation, to devise physically-relevant, artificial snapshots at a fraction of the cost of additional full-order solutions. Interestingly, the numerical results show that the ideas exploiting only mass conservation (i.e., incompressibility) are not producing significant added value with respect to the standard linear combinations of snapshots. Conversely, accounting for the linearized momentum balance via the Oseen equation does improve the quality of the resulting approximation and therefore is an effective data augmentation strategy in the framework of viscous incompressible laminar flows.

LeeKyungwook / get-arxiv-noti

New submissions for Mon, 25 Dec 23 #902

Keyword: detection

Fast Diffusion-Based Counterfactuals for Shortcut Removal and Generation

Low-power event-based face detection with asynchronous neuromorphic hardware

Invariant Anomaly Detection under Distribution Shifts: A Causal Perspective

Graph Attention-Based Symmetry Constraint Extraction for Analog Circuits

Generative Pretraining at Scale: Transformer-Based Encoding of Transactional Behavior for Fraud Detection

GROOD: GRadient-aware Out-Of-Distribution detection in interpolated manifolds

How to Overcome Curse-of-Dimensionality for Out-of-Distribution Detection?

FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection

MonoLSS: Learnable Sample Selection For Monocular 3D Detection

Navigating the Concurrency Landscape: A Survey of Race Condition Vulnerability Detectors

Context Enhanced Transformer for Single Image Object Detection

Revisiting Few-Shot Object Detection with Vision-Language Models

ADA-GAD: Anomaly-Denoised Autoencoders for Graph Anomaly Detection

The Projection Method: a Unified Formalism for Community Detection

PoseViNet: Distracted Driver Action Recognition Framework Using Multi-View Pose Estimation and Vision Transformer

Reasons to Reject? Aligning Language Models with Judgments

Explainable Multi-Camera 3D Object Detection with Transformer-Based Saliency Maps

MEAOD: Model Extraction Attack against Object Detectors

BonnBeetClouds3D: A Dataset Towards Point Cloud-based Organ-level Phenotyping of Sugar Beet Plants under Field Conditions

Progressing from Anomaly Detection to Automated Log Labeling and Pioneering Root Cause Analysis

Large Language Model (LLM) Bias Index -- LLMBI

Prototype-Guided Text-based Person Search based on Rich Chinese Descriptions

Advancing VAD Systems Based on Multi-Task Learning with Improved Model Structures

Enriching Automatic Test Case Generation by Extracting Relevant Test Inputs from Bug Reports

Lift-Attend-Splat: Bird's-eye-view camera-lidar fusion using transformers

Keyword: face recognition

Autoencoder Based Face Verification System

Inclusive normalization of face images to passport format

Keyword: augmentation

Experimenting with Large Language Models and vector embeddings in NASA SciX

MonoLSS: Learnable Sample Selection For Monocular 3D Detection

ADA-GAD: Anomaly-Denoised Autoencoders for Graph Anomaly Detection

Data augmentation for the POD formulation of the parametric laminar incompressible Navier-Stokes equations