New submissions for Wednesday, 3 July 2024 (showing 448 of 448 entries )

Keyword: detection

Title:

      Graphical user interface agents optimization for visual instruction grounding using multi-modal artificial intelligence systems

Authors: Tassnim Dardouri, Laura Minkova, Jessica López Espejel, Walid Dahhane, El Hassane Ettifouri
Subjects: Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Most instance perception and image understanding solutions focus mainly on natural images. However, applications for synthetic images, and more specifically, images of Graphical User Interfaces (GUI) remain limited. This hinders the development of autonomous computer-vision-powered Artificial Intelligence (AI) agents. In this work, we present Search Instruction Coordinates or SIC, a multi-modal solution for object identification in a GUI. More precisely, given a natural language instruction and a screenshot of a GUI, SIC locates the coordinates of the component on the screen where the instruction would be executed. To this end, we develop two methods. The first method is a three-part architecture that relies on a combination of a Large Language Model (LLM) and an object detection model. The second approach uses a multi-modal foundation model.
Title:
```
  neuROSym: Deployment and Evaluation of a ROS-based Neuro-Symbolic Model for Human Motion Prediction
```
Authors: Sariah Mghames, Luca Castri, Marc Hanheide, Nicola Bellotto
Subjects: Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Autonomous mobile robots can rely on several human motion detection and prediction systems for safe and efficient navigation in human environments, but the underline model architectures can have different impacts on the trustworthiness of the robot in the real world. Among existing solutions for context-aware human motion prediction, some approaches have shown the benefit of integrating symbolic knowledge with state-of-the-art neural networks. In particular, a recent neuro-symbolic architecture (NeuroSyM) has successfully embedded context with a Qualitative Trajectory Calculus (QTC) for spatial interactions representation. This work achieved better performance than neural-only baseline architectures on offline datasets. In this paper, we extend the original architecture to provide neuROSym, a ROS package for robot deployment in real-world scenarios, which can run, visualise, and evaluate previous neural-only and neuro-symbolic models for motion prediction online. We evaluated these models, NeuroSyM and a baseline SGAN, on a TIAGo robot in two scenarios with different human motion patterns. We assessed accuracy and runtime performance of the prediction models, showing a general improvement in case our neuro-symbolic architecture is used. We make the neuROSym package1 publicly available to the robotics community.
Title:
```
  NLPGuard: A Framework for Mitigating the Use of Protected Attributes by NLP Classifiers
```
Authors: Salvatore Greco, Ke Zhou, Licia Capra, Tania Cerquitelli, Daniele Quercia
Subjects: Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract AI regulations are expected to prohibit machine learning models from using sensitive attributes during training. However, the latest Natural Language Processing (NLP) classifiers, which rely on deep learning, operate as black-box systems, complicating the detection and remediation of such misuse. Traditional bias mitigation methods in NLP aim for comparable performance across different groups based on attributes like gender or race but fail to address the underlying issue of reliance on protected attributes. To partly fix that, we introduce NLPGuard, a framework for mitigating the reliance on protected attributes in NLP classifiers. NLPGuard takes an unlabeled dataset, an existing NLP classifier, and its training data as input, producing a modified training dataset that significantly reduces dependence on protected attributes without compromising accuracy. NLPGuard is applied to three classification tasks: identifying toxic language, sentiment analysis, and occupation classification. Our evaluation shows that current NLP classifiers heavily depend on protected attributes, with up to $23\%$ of the most predictive words associated with these attributes. However, NLPGuard effectively reduces this reliance by up to $79\%$, while slightly improving accuracy.
Title:
```
  Uncertainty Quantification in Table Structure Recognition
```
Authors: Kehinde Ajayi, Leizhen Zhang, Yi He, Jian Wu
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Quantifying uncertainties for machine learning models is a critical step to reduce human verification effort by detecting predictions with low confidence. This paper proposes a method for uncertainty quantification (UQ) of table structure recognition (TSR). The proposed UQ method is built upon a mixture-of-expert approach termed Test-Time Augmentation (TTA). Our key idea is to enrich and diversify the table representations, to spotlight the cells with high recognition uncertainties. To evaluate the effectiveness, we proposed two heuristics to differentiate highly uncertain cells from normal cells, namely, masking and cell complexity quantification. Masking involves varying the pixel intensity to deem the detection uncertainty. Cell complexity quantification gauges the uncertainty of each cell by its topological relation with neighboring cells. The evaluation results based on standard benchmark datasets demonstrate that the proposed method is effective in quantifying uncertainty in TSR models. To our best knowledge, this study is the first of its kind to enable UQ in TSR tasks. Our code and data are available at: this https URL.
Title:
```
  Algorithmic Dimensions via Learning Functions
```
Authors: Jack H. Lutz, Andrei N. Migunov
Subjects: Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract We characterize the algorithmic dimensions (i.e., the lower and upper asymptotic densities of information) of infinite binary sequences in terms of the inability of learning functions having an algorithmic constraint to detect patterns in them. Our pattern detection criterion is a quantitative extension of the criterion that Zaffora Blando used to characterize the algorithmically random (i.e., Martin-Löf random) sequences. Our proof uses Lutz's and Mayordomo's respective characterizations of algorithmic dimension in terms of gales and Kolmogorov complexity.
Title:
```
  Deepfake Audio Detection Using Spectrogram-based Feature and Ensemble of Deep Learning Models
```
Authors: Lam Pham, Phat Lam, Truong Nguyen, Huyen Nguyen, Alexander Schindler
Subjects: Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract In this paper, we propose a deep learning based system for the task of deepfake audio detection. In particular, the draw input audio is first transformed into various spectrograms using three transformation methods of Short-time Fourier Transform (STFT), Constant-Q Transform (CQT), Wavelet Transform (WT) combined with different auditory-based filters of Mel, Gammatone, linear filters (LF), and discrete cosine transform (DCT). Given the spectrograms, we evaluate a wide range of classification models based on three deep learning approaches. The first approach is to train directly the spectrograms using our proposed baseline models of CNN-based model (CNN-baseline), RNN-based model (RNN-baseline), C-RNN model (C-RNN baseline). Meanwhile, the second approach is transfer learning from computer vision models such as ResNet-18, MobileNet-V3, EfficientNet-B0, DenseNet-121, SuffleNet-V2, Swint, Convnext-Tiny, GoogLeNet, MNASsnet, RegNet. In the third approach, we leverage the state-of-the-art audio pre-trained models of Whisper, Seamless, Speechbrain, and Pyannote to extract audio embeddings from the input spectrograms. Then, the audio embeddings are explored by a Multilayer perceptron (MLP) model to detect the fake or real audio samples. Finally, high-performance deep learning models from these approaches are fused to achieve the best performance. We evaluated our proposed models on ASVspoof 2019 benchmark dataset. Our best ensemble model achieved an Equal Error Rate (EER) of 0.03, which is highly competitive to top-performing systems in the ASVspoofing 2019 challenge. Experimental results also highlight the potential of selective spectrograms and deep learning approaches to enhance the task of audio deepfake detection.
Title:
```
  Analyzing Persuasive Strategies in Meme Texts: A Fusion of Language Models with Paraphrase Enrichment
```
Authors: Kota Shamanth Ramanath Nayak, Leila Kosseim
Subjects: Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract This paper describes our approach to hierarchical multi-label detection of persuasion techniques in meme texts. Our model, developed as a part of the recent SemEval task, is based on fine-tuning individual language models (BERT, XLM-RoBERTa, and mBERT) and leveraging a mean-based ensemble model in addition to dataset augmentation through paraphrase generation from ChatGPT. The scope of the study encompasses enhancing model performance through innovative training techniques and data augmentation strategies. The problem addressed is the effective identification and classification of multiple persuasive techniques in meme texts, a task complicated by the diversity and complexity of such content. The objective of the paper is to improve detection accuracy by refining model training methods and examining the impact of balanced versus unbalanced training datasets. Novelty in the results and discussion lies in the finding that training with paraphrases enhances model performance, yet a balanced training set proves more advantageous than a larger unbalanced one. Additionally, the analysis reveals the potential pitfalls of indiscriminate incorporation of paraphrases from diverse distributions, which can introduce substantial noise. Results with the SemEval 2024 data confirm these insights, demonstrating improved model efficacy with the proposed methods.
Title:
```
  Science DMZ Networks: How Different are They Really?
```
Authors: Emily Mutter, Susmit Shannigrahi
Subjects: Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract The Science Demilitarized Zone (Science DMZ) is a network environment optimized for scientific applications. A Science DMZ provides an environment mostly free from competing traffic flows and complex security middleware such as firewalls or intrusion detection systems that often impede data transfer performance. The Science DMZ model provides a reference set of network design patterns, tuned hosts and protocol stacks dedicated to large data transfers and streamlined security postures that significantly improve data transfer performance, accelerating scientific collaborations and discovery. Over the past decade, many universities and organizations have adopted this model for their research computing. Despite becoming increasingly popular, there is a lack of quantitative studies comparing such a specialized network to conventional production networks regarding network characteristics and data transfer performance. We strive to answer the following research questions in this study: Does a Science DMZ exhibit significantly different behavior than a general-purpose campus network? Does it improve application performance compared to such general-purpose networks? Through a two-year-long quantitative network measurement study, we find that a Science DMZ exhibits lower latency, higher throughput, and lower jitter behaviors. However, we also see several non-intuitive results. For example, a DMZ may take a longer route to external destinations and experience higher latency than the campus network. While the DMZ model benefits researchers, the benefits are not automatic - careful network tuning based on specific use cases is required to realize the full potential of such infrastructure.
Title:
```
  A Study of Nationality Bias in Names and Perplexity using Off-the-Shelf Affect-related Tweet Classifiers
```
Authors: Valentin Barriere, Sebastian Cifuentes
Subjects: Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract In this paper, we apply a method to quantify biases associated with named entities from various countries. We create counterfactual examples with small perturbations on target-domain data instead of relying on templates or specific datasets for bias detection. On widely used classifiers for subjectivity analysis, including sentiment, emotion, hate speech, and offensive text using Twitter data, our results demonstrate positive biases related to the language spoken in a country across all classifiers studied. Notably, the presence of certain country names in a sentence can strongly influence predictions, up to a 23\% change in hate speech detection and up to a 60\% change in the prediction of negative emotions such as anger. We hypothesize that these biases stem from the training data of pre-trained language models (PLMs) and find correlations between affect predictions and PLMs likelihood in English and unknown languages like Basque and Maori, revealing distinct patterns with exacerbate correlations. Further, we followed these correlations in-between counterfactual examples from a same sentence to remove the syntactical component, uncovering interesting results suggesting the impact of the pre-training data was more important for English-speaking-country names. Our anonymized code is [https://anonymous.4open.science/r/biases_ppl-576B/README.md](available here).
Title:
```
  Research on target detection method of distracted driving behavior based on improved YOLOv8
```
Authors: Shiquan Shen, Zhizhong Wu, Pan Zhang
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract With the development of deep learning technology, the detection and classification of distracted driving behaviour requires higher accuracy. Existing deep learning-based methods are computationally intensive and parameter redundant, limiting the efficiency and accuracy in practical applications. To solve this problem, this study proposes an improved YOLOv8 detection method based on the original YOLOv8 model by integrating the BoTNet module, GAM attention mechanism and EIoU loss function. By optimising the feature extraction and multi-scale feature fusion strategies, the training and inference processes are simplified, and the detection accuracy and efficiency are significantly improved. Experimental results show that the improved model performs well in both detection speed and accuracy, with an accuracy rate of 99.4%, and the model is smaller and easy to deploy, which is able to identify and classify distracted driving behaviours in real time, provide timely warnings, and enhance driving safety.
Title:
```
  Adaptive Modality Balanced Online Knowledge Distillation for Brain-Eye-Computer based Dim Object Detection
```
Authors: Zixing Li, Chao Yan, Zhen Lan, Dengqing Tang, Xiaojia Xiang, Han Zhou, Jun Lai
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Advanced cognition can be extracted from the human brain using brain-computer interfaces. Integrating these interfaces with computer vision techniques, which possess efficient feature extraction capabilities, can achieve more robust and accurate detection of dim targets in aerial images. However, existing target detection methods primarily concentrate on homogeneous data, lacking efficient and versatile processing capabilities for heterogeneous multimodal data. In this paper, we first build a brain-eye-computer based object detection system for aerial images under few-shot conditions. This system detects suspicious targets using region proposal networks, evokes the event-related potential (ERP) signal in electroencephalogram (EEG) through the eye-tracking-based slow serial visual presentation (ESSVP) paradigm, and constructs the EEG-image data pairs with eye movement data. Then, an adaptive modality balanced online knowledge distillation (AMBOKD) method is proposed to recognize dim objects with the EEG-image data. AMBOKD fuses EEG and image features using a multi-head attention module, establishing a new modality with comprehensive features. To enhance the performance and robust capability of the fusion modality, simultaneous training and mutual learning between modalities are enabled by end-to-end online knowledge distillation. During the learning process, an adaptive modality balancing module is proposed to ensure multimodal equilibrium by dynamically adjusting the weights of the importance and the training gradients across various modalities. The effectiveness and superiority of our method are demonstrated by comparing it with existing state-of-the-art methods. Additionally, experiments conducted on public datasets and system validations in real-world scenarios demonstrate the reliability and practicality of the proposed system and the designed method.
Title:
```
  LogEval: A Comprehensive Benchmark Suite for Large Language Models In Log Analysis
```
Authors: Tianyu Cui, Shiyu Ma, Ziang Chen, Tong Xiao, Shimin Tao, Yilun Liu, Shenglin Zhang, Duoming Lin, Changchang Liu, Yuzhe Cai, Weibin Meng, Yongqian Sun, Dan Pei
Subjects: Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Log analysis is crucial for ensuring the orderly and stable operation of information systems, particularly in the field of Artificial Intelligence for IT Operations (AIOps). Large Language Models (LLMs) have demonstrated significant potential in natural language processing tasks. In the AIOps domain, they excel in tasks such as anomaly detection, root cause analysis of faults, operations and maintenance script generation, and alert information summarization. However, the performance of current LLMs in log analysis tasks remains inadequately validated. To address this gap, we introduce LogEval, a comprehensive benchmark suite designed to evaluate the capabilities of LLMs in various log analysis tasks for the first time. This benchmark covers tasks such as log parsing, log anomaly detection, log fault diagnosis, and log summarization. LogEval evaluates each task using 4,000 publicly available log data entries and employs 15 different prompts for each task to ensure a thorough and fair assessment. By rigorously evaluating leading LLMs, we demonstrate the impact of various LLM technologies on log analysis performance, focusing on aspects such as self-consistency and few-shot contextual learning. We also discuss findings related to model quantification, Chinese-English question-answering evaluation, and prompt engineering. These findings provide insights into the strengths and weaknesses of LLMs in multilingual environments and the effectiveness of different prompt strategies. Various evaluation methods are employed for different tasks to accurately measure the performance of LLMs in log analysis, ensuring a comprehensive assessment. The insights gained from LogEvals evaluation reveal the strengths and limitations of LLMs in log analysis tasks, providing valuable guidance for researchers and practitioners.
Title:
```
  Enhancing Multi-Class Anomaly Detection via Diffusion Refinement with Dual Conditioning
```
Authors: Jiawei Zhan, Jinxiang Lai, Bin-Bin Gao, Jun Liu, Xiaochen Chen, Chengjie Wang
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Anomaly detection, the technique of identifying abnormal samples using only normal samples, has attracted widespread interest in industry. Existing one-model-per-category methods often struggle with limited generalization capabilities due to their focus on a single category, and can fail when encountering variations in product. Recent feature reconstruction methods, as representatives in one-model-all-categories schemes, face challenges including reconstructing anomalous samples and blurry reconstructions. In this paper, we creatively combine a diffusion model and a transformer for multi-class anomaly detection. This approach leverages diffusion to obtain high-frequency information for refinement, greatly alleviating the blurry reconstruction problem while maintaining the sampling efficiency of the reverse diffusion process. The task is transformed into image inpainting to disconnect the input-output correlation, thereby mitigating the "identical shortcuts" problem and avoiding the model from reconstructing anomalous samples. Besides, we introduce category-awareness using dual conditioning to ensure the accuracy of prediction and reconstruction in the reverse diffusion process, preventing excessive deviation from the target category, thus effectively enabling multi-class anomaly detection. Futhermore, Spatio-temporal fusion is also employed to fuse heatmaps predicted at different timesteps and scales, enhancing the performance of multi-class anomaly detection. Extensive experiments on benchmark datasets demonstrate the superior performance and exceptional multi-class anomaly detection capabilities of our proposed method compared to others.
Title:
```
  Securing Distributed Network Digital Twin Systems Against Model Poisoning Attacks
```
Authors: Zifan Zhang, Minghong Fang, Mingzhe Chen, Gaolei Li, Xi Lin, Yuchen Liu
Subjects: Subjects: Networking and Internet Architecture (cs.NI); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract In the era of 5G and beyond, the increasing complexity of wireless networks necessitates innovative frameworks for efficient management and deployment. Digital twins (DTs), embodying real-time monitoring, predictive configurations, and enhanced decision-making capabilities, stand out as a promising solution in this context. Within a time-series data-driven framework that effectively maps wireless networks into digital counterparts, encapsulated by integrated vertical and horizontal twinning phases, this study investigates the security challenges in distributed network DT systems, which potentially undermine the reliability of subsequent network applications such as wireless traffic forecasting. Specifically, we consider a minimal-knowledge scenario for all attackers, in that they do not have access to network data and other specialized knowledge, yet can interact with previous iterations of server-level models. In this context, we spotlight a novel fake traffic injection attack designed to compromise a distributed network DT system for wireless traffic prediction. In response, we then propose a defense mechanism, termed global-local inconsistency detection (GLID), to counteract various model poisoning threats. GLID strategically removes abnormal model parameters that deviate beyond a particular percentile range, thereby fortifying the security of network twinning process. Through extensive experiments on real-world wireless traffic datasets, our experimental evaluations show that both our attack and defense strategies significantly outperform existing baselines, highlighting the importance of security measures in the design and implementation of DTs for 5G and beyond network systems.
Title:
```
  Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning
```
Authors: Chengchao Shen, Jianzhong Chen, Jianxin Wang
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract The existing contrastive learning methods mainly focus on single-grained representation learning, e.g., part-level, object-level or scene-level ones, thus inevitably neglecting the transferability of representations on other granularity levels. In this paper, we aim to learn multi-grained representations, which can effectively describe the image on various granularity levels, thus improving generalization on extensive downstream tasks. To this end, we propose a novel Multi-Grained Contrast method (MGC) for unsupervised representation learning. Specifically, we construct delicate multi-grained correspondences between positive views and then conduct multi-grained contrast by the correspondences to learn more general unsupervised representations. Without pretrained on large-scale dataset, our method significantly outperforms the existing state-of-the-art methods on extensive downstream tasks, including object detection, instance segmentation, scene parsing, semantic segmentation and keypoint detection. Moreover, experimental results support the data-efficient property and excellent representation transferability of our method. The source code and trained weights are available at \url{this https URL}.
Title:
```
  Fake News Detection and Manipulation Reasoning via Large Vision-Language Models
```
Authors: Ruihan Jin, Ruibo Fu, Zhengqi Wen, Shuai Zhang, Yukun Liu, Jianhua Tao
Subjects: Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Fake news becomes a growing threat to information security and public opinion with the rapid sprawl of media manipulation. Therefore, fake news detection attracts widespread attention from academic community. Traditional fake news detection models demonstrate remarkable performance on authenticity binary classification but their ability to reason detailed faked traces based on the news content remains under-explored. Furthermore, due to the lack of external knowledge, the performance of existing methods on fact-related news is questionable, leaving their practical implementation unclear. In this paper, we propose a new multi-media research topic, namely manipulation reasoning. Manipulation reasoning aims to reason manipulations based on news content. To support the research, we introduce a benchmark for fake news detection and manipulation reasoning, referred to as Human-centric and Fact-related Fake News (HFFN). The benchmark highlights the centrality of human and the high factual relevance, with detailed manual annotations. HFFN encompasses four realistic domains with fake news samples generated through three manipulation approaches. Moreover, a Multi-modal news Detection and Reasoning langUage Model (M-DRUM) is presented not only to judge on the authenticity of multi-modal news, but also raise analytical reasoning about potential manipulations. On the feature extraction level, a cross-attention mechanism is employed to extract fine-grained fusion features from multi-modal inputs. On the reasoning level, a large vision-language model (LVLM) serves as the backbone to facilitate fact-related reasoning. A two-stage training framework is deployed to better activate the capacity of identification and reasoning. Comprehensive experiments demonstrate that our model outperforms state-of-the-art (SOTA) fake news detection models and powerful LVLMs like GPT-4 and LLaVA.
Title:
```
  HC-GLAD: Dual Hyperbolic Contrastive Learning for Unsupervised Graph-Level Anomaly Detection
```
Authors: Yali Fu, Jindong Li, Jiahong Liu, Qianli Xing, Qi Wang, Irwin King
Subjects: Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Unsupervised graph-level anomaly detection (UGAD) has garnered increasing attention in recent years due to its significance. However, most existing methods only rely on traditional graph neural networks to explore pairwise relationships but such kind of pairwise edges are not enough to describe multifaceted relationships involving anomaly. There is an emergency need to exploit node group information which plays a crucial role in UGAD. In addition, most previous works ignore the global underlying properties (e.g., hierarchy and power-law structure) which are common in real-world graph datasets and therefore are indispensable factors on UGAD task. In this paper, we propose a novel Dual Hyperbolic Contrastive Learning for Unsupervised Graph-Level Anomaly Detection (HC-GLAD in short). To exploit node group connections, we construct hypergraphs based on gold motifs and subsequently perform hypergraph convolution. Furthermore, to preserve the hierarchy of real-world graphs, we introduce hyperbolic geometry into this field and conduct both graph and hypergraph embedding learning in hyperbolic space with hyperboloid model. To the best of our knowledge, this is the first work to simultaneously apply hypergraph with node group connections and hyperbolic geometry into this field. Extensive experiments on several real world datasets of different fields demonstrate the superiority of HC-GLAD on UGAD task. The code is available at this https URL.
Title:
```
  LiDAR-based HD Map Localization using Semantic Generalized ICP with Road Marking Detection
```
Authors: Yansong Gong, Xinglian Zhang, Jingyi Feng, Xiao He, Dan Zhang
Subjects: Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract In GPS-denied scenarios, a robust environmental perception and localization system becomes crucial for autonomous driving. In this paper, a LiDAR-based online localization system is developed, incorporating road marking detection and registration on a high-definition (HD) map. Within our system, a road marking detection approach is proposed with real-time performance, in which an adaptive segmentation technique is first introduced to isolate high-reflectance points correlated with road markings, enhancing real-time efficiency. Then, a spatio-temporal probabilistic local map is formed by aggregating historical LiDAR scans, providing a dense point cloud. Finally, a LiDAR bird's-eye view (LiBEV) image is generated, and an instance segmentation network is applied to accurately label the road markings. For road marking registration, a semantic generalized iterative closest point (SG-ICP) algorithm is designed. Linear road markings are modeled as 1-manifolds embedded in 2D space, mitigating the influence of constraints along the linear direction, addressing the under-constrained problem and achieving a higher localization accuracy on HD maps than ICP. Extensive experiments are conducted in real-world scenarios, demonstrating the effectiveness and robustness of our system.
Title:
```
  DM3D: Distortion-Minimized Weight Pruning for Lossless 3D Object Detection
```
Authors: Kaixin Xu, Qingtian Feng, Hao Chen, Zhe Wang, Xue Geng, Xulei Yang, Min Wu, Xiaoli Li, Weisi Lin
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Applying deep neural networks to 3D point cloud processing has attracted increasing attention due to its advanced performance in many areas, such as AR/VR, autonomous driving, and robotics. However, as neural network models and 3D point clouds expand in size, it becomes a crucial challenge to reduce the computational and memory overhead to meet latency and energy constraints in real-world applications. Although existing approaches have proposed to reduce both computational cost and memory footprint, most of them only address the spatial redundancy in inputs, i.e. removing the redundancy of background points in 3D data. In this paper, we propose a novel post-training weight pruning scheme for 3D object detection that is (1) orthogonal to all existing point cloud sparsifying methods, which determines redundant parameters in the pretrained model that lead to minimal distortion in both locality and confidence (detection distortion); and (2) a universal plug-and-play pruning framework that works with arbitrary 3D detection model. This framework aims to minimize detection distortion of network output to maximally maintain detection precision, by identifying layer-wise sparsity based on second-order Taylor approximation of the distortion. Albeit utilizing second-order information, we introduced a lightweight scheme to efficiently acquire Hessian information, and subsequently perform dynamic programming to solve the layer-wise sparsity. Extensive experiments on KITTI, Nuscenes and ONCE datasets demonstrate that our approach is able to maintain and even boost the detection precision on pruned model under noticeable computation reduction (FLOPs). Noticeably, we achieve over 3.89x, 3.72x FLOPs reduction on CenterPoint and PVRCNN model, respectively, without mAP decrease, significantly improving the state-of-the-art.
Title:
```
  Fake News Detection: It's All in the Data!
```
Authors: Soveatin Kuntur, Anna Wróblewska, Marcin Paprzycki, Maria Ganzha
Subjects: Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract This comprehensive survey serves as an indispensable resource for researchers embarking on the journey of fake news detection. By highlighting the pivotal role of dataset quality and diversity, it underscores the significance of these elements in the effectiveness and robustness of detection models. The survey meticulously outlines the key features of datasets, various labeling systems employed, and prevalent biases that can impact model performance. Additionally, it addresses critical ethical issues and best practices, offering a thorough overview of the current state of available datasets. Our contribution to this field is further enriched by the provision of GitHub repository, which consolidates publicly accessible datasets into a single, user-friendly portal. This repository is designed to facilitate and stimulate further research and development efforts aimed at combating the pervasive issue of fake news.
Title:
```
  Efficient Nearest Neighbor based Uncertainty Estimation for Natural Language Processing Tasks
```
Authors: Wataru Hashimoto, Hidetaka Kamigaito, Taro Watanabe
Subjects: Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Trustworthy prediction in Deep Neural Networks (DNNs), including Pre-trained Language Models (PLMs) is important for safety-critical applications in the real world. However, DNNs often suffer from uncertainty estimation, such as miscalibration. In particular, approaches that require multiple stochastic inference can mitigate this problem, but the expensive cost of inference makes them impractical. In this study, we propose $k$-Nearest Neighbor Uncertainty Estimation ($k$NN-UE), which is an uncertainty estimation method that uses the distances from the neighbors and label-existence ratio of neighbors. Experiments on sentiment analysis, natural language inference, and named entity recognition show that our proposed method outperforms the baselines or recent density-based methods in confidence calibration, selective prediction, and out-of-distribution detection. Moreover, our analyses indicate that introducing dimension reduction or approximate nearest neighbor search inspired by recent $k$NN-LM studies reduces the inference overhead without significantly degrading estimation performance when combined them appropriately.
Title:
```
  Counterfactual Data Augmentation with Denoising Diffusion for Graph Anomaly Detection
```
Authors: Chunjing Xiao, Shikang Pang, Xovee Xu, Xuan Li, Goce Trajcevski, Fan Zhou
Subjects: Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract A critical aspect of Graph Neural Networks (GNNs) is to enhance the node representations by aggregating node neighborhood information. However, when detecting anomalies, the representations of abnormal nodes are prone to be averaged by normal neighbors, making the learned anomaly representations less distinguishable. To tackle this issue, we propose CAGAD -- an unsupervised Counterfactual data Augmentation method for Graph Anomaly Detection -- which introduces a graph pointer neural network as the heterophilic node detector to identify potential anomalies whose neighborhoods are normal-node-dominant. For each identified potential anomaly, we design a graph-specific diffusion model to translate a part of its neighbors, which are probably normal, into anomalous ones. At last, we involve these translated neighbors in GNN neighborhood aggregation to produce counterfactual representations of anomalies. Through aggregating the translated anomalous neighbors, counterfactual representations become more distinguishable and further advocate detection performance. The experimental results on four datasets demonstrate that CAGAD significantly outperforms strong baselines, with an average improvement of 2.35% on F1, 2.53% on AUC-ROC, and 2.79% on AUC-PR.
Title:
```
  VRBiom: A New Periocular Dataset for Biometric Applications of HMD
```
Authors: Ketan Kotwal, Ibrahim Ulucan, Gokhan Ozbulak, Janani Selliah, Sebastien Marcel
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract With advancements in hardware, high-quality HMD devices are being developed by numerous companies, driving increased consumer interest in AR, VR, and MR applications. In this work, we present a new dataset, called VRBiom, of periocular videos acquired using a Virtual Reality headset. The VRBiom, targeted at biometric applications, consists of 900 short videos acquired from 25 individuals recorded in the NIR spectrum. These 10s long videos have been captured using the internal tracking cameras of Meta Quest Pro at 72 FPS. To encompass real-world variations, the dataset includes recordings under three gaze conditions: steady, moving, and partially closed eyes. We have also ensured an equal split of recordings without and with glasses to facilitate the analysis of eye-wear. These videos, characterized by non-frontal views of the eye and relatively low spatial resolutions (400 x 400), can be instrumental in advancing state-of-the-art research across various biometric applications. The VRBiom dataset can be utilized to evaluate, train, or adapt models for biometric use-cases such as iris and/or periocular recognition and associated sub-tasks such as detection and semantic segmentation. In addition to data from real individuals, we have included around 1100 PA constructed from 92 PA instruments. These PAIs fall into six categories constructed through combinations of print attacks (real and synthetic identities), fake 3D eyeballs, plastic eyes, and various types of masks and mannequins. These PA videos, combined with genuine (bona-fide) data, can be utilized to address concerns related to spoofing, which is a significant threat if these devices are to be used for authentication. The VRBiom dataset is publicly available for research purposes related to biometric applications only.
Title:
```
  GMM-ResNet2: Ensemble of Group ResNet Networks for Synthetic Speech Detection
```
Authors: Zhenchun Lei, Hui Yan, Changhong Liu, Yong Zhou, Minglei Ma
Subjects: Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Deep learning models are widely used for speaker recognition and spoofing speech detection. We propose the GMM-ResNet2 for synthesis speech detection. Compared with the previous GMM-ResNet model, GMM-ResNet2 has four improvements. Firstly, the different order GMMs have different capabilities to form smooth approximations to the feature distribution, and multiple GMMs are used to extract multi-scale Log Gaussian Probability features. Secondly, the grouping technique is used to improve the classification accuracy by exposing the group cardinality while reducing both the number of parameters and the training time. The final score is obtained by ensemble of all group classifier outputs using the averaging method. Thirdly, the residual block is improved by including one activation function and one batch normalization layer. Finally, an ensemble-aware loss function is proposed to integrate the independent loss functions of all ensemble members. On the ASVspoof 2019 LA task, the GMM-ResNet2 achieves a minimum t-DCF of 0.0227 and an EER of 0.79\%. On the ASVspoof 2021 LA task, the GMM-ResNet2 achieves a minimum t-DCF of 0.2362 and an EER of 2.19\%, and represents a relative reductions of 31.4\% and 76.3\% compared with the LFCC-LCNN baseline.
Title:
```
  Detecting Driver Fatigue With Eye Blink Behavior
```
Authors: Ali Akin, Habil Kalkan
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Traffic accidents, causing millions of deaths and billions of dollars in economic losses each year globally, have become a significant issue. One of the main causes of these accidents is drivers being sleepy or fatigued. Recently, various studies have focused on detecting drivers' sleep/wake states using camera-based solutions that do not require physical contact with the driver, thereby enhancing ease of use. In this study, besides the eye blink frequency, a driver adaptive eye blink behavior feature set have been evaluated to detect the fatigue status. It is observed from the results that behavior of eye blink carries useful information on fatigue detection. The developed image-based system provides a solution that can work adaptively to the physical characteristics of the drivers and their positions in the vehicle
Title:
```
  MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders
```
Authors: Baijiong Lin, Weisen Jiang, Pengguang Chen, Yu Zhang, Shu Liu, Ying-Cong Chen
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Multi-task dense scene understanding, which learns a model for multiple dense prediction tasks, has a wide range of application scenarios. Modeling long-range dependency and enhancing cross-task interactions are crucial to multi-task dense prediction. In this paper, we propose MTMamba, a novel Mamba-based architecture for multi-task scene understanding. It contains two types of core blocks: self-task Mamba (STM) block and cross-task Mamba (CTM) block. STM handles long-range dependency by leveraging Mamba, while CTM explicitly models task interactions to facilitate information exchange across tasks. Experiments on NYUDv2 and PASCAL-Context datasets demonstrate the superior performance of MTMamba over Transformer-based and CNN-based methods. Notably, on the PASCAL-Context dataset, MTMamba achieves improvements of +2.08, +5.01, and +4.90 over the previous best method in the tasks of semantic segmentation, human parsing, and object boundary detection, respectively. The code is available at \url{this https URL}.
Title:
```
  Improving Explainability of Softmax Classifiers Using a Prototype-Based Joint Embedding Method
```
Authors: Hilarie Sit, Brendan Keith, Karianne Bergen
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract We propose a prototype-based approach for improving explainability of softmax classifiers that provides an understandable prediction confidence, generated through stochastic sampling of prototypes, and demonstrates potential for out of distribution detection (OOD). By modifying the model architecture and training to make predictions using similarities to any set of class examples from the training dataset, we acquire the ability to sample for prototypical examples that contributed to the prediction, which provide an instance-based explanation for the model's decision. Furthermore, by learning relationships between images from the training dataset through relative distances within the model's latent space, we obtain a metric for uncertainty that is better able to detect out of distribution data than softmax confidence.
Title:
```
  Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks
```
Authors: Adrian Rebmann, Fabian David Schmidt, Goran Glavaš, Han van der Aa
Subjects: Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract The process mining community has recently recognized the potential of large language models (LLMs) for tackling various process mining tasks. Initial studies report the capability of LLMs to support process analysis and even, to some extent, that they are able to reason about how processes work. This latter property suggests that LLMs could also be used to tackle process mining tasks that benefit from an understanding of process behavior. Examples of such tasks include (semantic) anomaly detection and next activity prediction, which both involve considerations of the meaning of activities and their inter-relations. In this paper, we investigate the capabilities of LLMs to tackle such semantics-aware process mining tasks. Furthermore, whereas most works on the intersection of LLMs and process mining only focus on testing these models out of the box, we provide a more principled investigation of the utility of LLMs for process mining, including their ability to obtain process mining knowledge post-hoc by means of in-context learning and supervised fine-tuning. Concretely, we define three process mining tasks that benefit from an understanding of process semantics and provide extensive benchmarking datasets for each of them. Our evaluation experiments reveal that (1) LLMs fail to solve challenging process mining tasks out of the box and when provided only a handful of in-context examples, (2) but they yield strong performance when fine-tuned for these tasks, consistently surpassing smaller, encoder-based language models.
Title:
```
  OpenSlot: Mixed Open-set Recognition with Object-centric Learning
```
Authors: Xu Yin, Fei Pan, Guoyuan An, Yuchi Huo, Zixuan Xie, Sung-Eui Yoon
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Existing open-set recognition (OSR) studies typically assume that each image contains only one class label, and the unknown test set (negative) has a disjoint label space from the known test set (positive), a scenario termed full-label shift. This paper introduces the mixed OSR problem, where test images contain multiple class semantics, with known and unknown classes co-occurring in negatives, leading to a more challenging super-label shift. Addressing the mixed OSR requires classification models to accurately distinguish different class semantics within images and measure their "knowness". In this study, we propose the OpenSlot framework, built upon object-centric learning. OpenSlot utilizes slot features to represent diverse class semantics and produce class predictions. Through our proposed anti-noise-slot (ANS) technique, we mitigate the impact of noise (invalid and background) slots during classification training, effectively addressing the semantic misalignment between class predictions and the ground truth. We conduct extensive experiments with OpenSlot on mixed & conventional OSR benchmarks. Without elaborate designs, OpenSlot not only exceeds existing OSR studies in detecting super-label shifts across single & multi-label mixed OSR tasks but also achieves state-of-the-art performance on conventional benchmarks. Remarkably, our method can localize class objects without using bounding boxes during training. The competitive performance in open-set object detection demonstrates OpenSlot's ability to explicitly explain label shifts and benefits in computational efficiency and generalization.
Title:
```
  Similarity Distance-Based Label Assignment for Tiny Object Detection
```
Authors: Shuohao Shi, Qiang Fang, Tong Zhao, Xin Xu
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Tiny object detection is becoming one of the most challenging tasks in computer vision because of the limited object size and lack of information. The label assignment strategy is a key factor affecting the accuracy of object detection. Although there are some effective label assignment strategies for tiny objects, most of them focus on reducing the sensitivity to the bounding boxes to increase the number of positive samples and have some fixed hyperparameters need to set. However, more positive samples may not necessarily lead to better detection results, in fact, excessive positive samples may lead to more false positives. In this paper, we introduce a simple but effective strategy named the Similarity Distance (SimD) to evaluate the similarity between bounding boxes. This proposed strategy not only considers both location and shape similarity but also learns hyperparameters adaptively, ensuring that it can adapt to different datasets and various object sizes in a dataset. Our approach can be simply applied in common anchor-based detectors in place of the IoU for label assignment and Non Maximum Suppression (NMS). Extensive experiments on four mainstream tiny object detection datasets demonstrate superior performance of our method, especially, 1.8 AP points and 4.1 AP points of very tiny higher than the state-of-the-art competitors on AI-TOD. Code is available at: \url{this https URL}.
Title:
```
  Assessing the Code Clone Detection Capability of Large Language Models
```
Authors: Zixian Zhang, Takfarinas Saber
Subjects: Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract This study aims to assess the performance of two advanced Large Language Models (LLMs), GPT-3.5 and GPT-4, in the task of code clone detection. The evaluation involves testing the models on a variety of code pairs of different clone types and levels of similarity, sourced from two datasets: BigCloneBench (human-made) and GPTCloneBench (LLM-generated). Findings from the study indicate that GPT-4 consistently surpasses GPT-3.5 across all clone types. A correlation was observed between the GPTs' accuracy at identifying code clones and code similarity, with both GPT models exhibiting low effectiveness in detecting the most complex Type-4 code clones. Additionally, GPT models demonstrate a higher performance identifying code clones in LLM-generated code compared to humans-generated code. However, they do not reach impressive accuracy. These results emphasize the imperative for ongoing enhancements in LLM capabilities, particularly in the recognition of code clones and in mitigating their predisposition towards self-generated code clones--which is likely to become an issue as software engineers are more numerous to leverage LLM-enabled code generation and code refactoring tools.
Title:
```
  Evaluating the Robustness of Adverse Drug Event Classification Models Using Templates
```
Authors: Dorothea MacPhail, David Harbecke, Lisa Raithel, Sebastian Möller
Subjects: Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract An adverse drug effect (ADE) is any harmful event resulting from medical drug treatment. Despite their importance, ADEs are often under-reported in official channels. Some research has therefore turned to detecting discussions of ADEs in social media. Impressive results have been achieved in various attempts to detect ADEs. In a high-stakes domain such as medicine, however, an in-depth evaluation of a model's abilities is crucial. We address the issue of thorough performance evaluation in English-language ADE detection with hand-crafted templates for four capabilities: Temporal order, negation, sentiment, and beneficial effect. We find that models with similar performance on held-out test sets have varying results on these capabilities.
Title:
```
  Ensemble of pre-trained language models and data augmentation for hate speech detection from Arabic tweets
```
Authors: Kheir Eddine Daouadi, Yaakoub Boualleg, Kheir Eddine Haouaouchi
Subjects: Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Today, hate speech classification from Arabic tweets has drawn the attention of several researchers. Many systems and techniques have been developed to resolve this classification task. Nevertheless, two of the major challenges faced in this context are the limited performance and the problem of imbalanced data. In this study, we propose a novel approach that leverages ensemble learning and semi-supervised learning based on previously manually labeled. We conducted experiments on a benchmark dataset by classifying Arabic tweets into 5 distinct classes: non-hate, general hate, racial, religious, or sexism. Experimental results show that: (1) ensemble learning based on pre-trained language models outperforms existing related works; (2) Our proposed data augmentation improves the accuracy results of hate speech detection from Arabic tweets and outperforms existing related works. Our main contribution is the achievement of encouraging results in Arabic hate speech detection.
Keyword: face recognition

Title:
```
  Face Reconstruction Transfer Attack as Out-of-Distribution Generalization
```
Authors: Yoon Gyo Jung, Jaewoo Park, Xingbo Dong, Hojin Park, Andrew Beng Jin Teoh, Octavia Camps
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Understanding the vulnerability of face recognition systems to malicious attacks is of critical importance. Previous works have focused on reconstructing face images that can penetrate a targeted verification system. Even in the white-box scenario, however, naively reconstructed images misrepresent the identity information, hence the attacks are easily neutralized once the face system is updated or changed. In this paper, we aim to reconstruct face images which are capable of transferring face attacks on unseen encoders. We term this problem as Face Reconstruction Transfer Attack (FRTA) and show that it can be formulated as an out-of-distribution (OOD) generalization problem. Inspired by its OOD nature, we propose to solve FRTA by Averaged Latent Search and Unsupervised Validation with pseudo target (ALSUV). To strengthen the reconstruction attack on OOD unseen encoders, ALSUV reconstructs the face by searching the latent of amortized generator StyleGAN2 through multiple latent optimization, latent optimization trajectory averaging, and unsupervised validation with a pseudo target. We demonstrate the efficacy and generalization of our method on widely used face datasets, accompanying it with extensive ablation studies and visually, qualitatively, and quantitatively analyses. The source code will be released.
Keyword: augmentation

Title:
```
  Small but Fair! Fairness for Multimodal Human-Human and Robot-Human Mental Wellbeing Coaching
```
Authors: Jiaee Cheong, Micol Spitale, Hatice Gunes
Subjects: Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract In recent years, the affective computing (AC) and human-robot interaction (HRI) research communities have put fairness at the centre of their research agenda. However, none of the existing work has addressed the problem of machine learning (ML) bias in HRI settings. In addition, many of the current datasets for AC and HRI are "small", making ML bias and debias analysis challenging. This paper presents the first work to explore ML bias analysis and mitigation of three small multimodal datasets collected within both a human-human and robot-human wellbeing coaching settings. The contributions of this work includes: i) being the first to explore the problem of ML bias and fairness within HRI settings; and ii) providing a multimodal analysis evaluated via modelling performance and fairness metrics across both high and low-level features and proposing a simple and effective data augmentation strategy (MixFeat) to debias the small datasets presented within this paper; and iii) conducting extensive experimentation and analyses to reveal ML fairness insights unique to AC and HRI research in order to distill a set of recommendations to aid AC and HRI researchers to be more engaged with fairness-aware ML-based research.
Title:
```
  Multi-Epoch learning with Data Augmentation for Deep Click-Through Rate Prediction
```
Authors: Zhongxiang Fan, Zhaocheng Liu, Jian Liang, Dongying Kong, Han Li, Peng Jiang, Shuang Li, Kun Gai
Subjects: Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract This paper investigates the one-epoch overfitting phenomenon in Click-Through Rate (CTR) models, where performance notably declines at the start of the second epoch. Despite extensive research, the efficacy of multi-epoch training over the conventional one-epoch approach remains unclear. We identify the overfitting of the embedding layer, caused by high-dimensional data sparsity, as the primary issue. To address this, we introduce a novel and simple Multi-Epoch learning with Data Augmentation (MEDA) framework, suitable for both non-continual and continual learning scenarios, which can be seamlessly integrated into existing deep CTR models and may have potential applications to handle the "forgetting or overfitting" dilemma in the retraining and the well-known catastrophic forgetting problems. MEDA minimizes overfitting by reducing the dependency of the embedding layer on subsequent training data or the Multi-Layer Perceptron (MLP) layers, and achieves data augmentation through training the MLP with varied embedding spaces. Our findings confirm that pre-trained MLP layers can adapt to new embedding spaces, enhancing performance without overfitting. This adaptability underscores the MLP layers' role in learning a matching function focused on the relative relationships among embeddings rather than their absolute positions. To our knowledge, MEDA represents the first multi-epoch training strategy tailored for deep CTR prediction models. We conduct extensive experiments on several public and business datasets, and the effectiveness of data augmentation and superiority over conventional single-epoch training are fully demonstrated. Besides, MEDA has exhibited significant benefits in a real-world online advertising system.
Title:
```
  Uncertainty Quantification in Table Structure Recognition
```
Authors: Kehinde Ajayi, Leizhen Zhang, Yi He, Jian Wu
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Quantifying uncertainties for machine learning models is a critical step to reduce human verification effort by detecting predictions with low confidence. This paper proposes a method for uncertainty quantification (UQ) of table structure recognition (TSR). The proposed UQ method is built upon a mixture-of-expert approach termed Test-Time Augmentation (TTA). Our key idea is to enrich and diversify the table representations, to spotlight the cells with high recognition uncertainties. To evaluate the effectiveness, we proposed two heuristics to differentiate highly uncertain cells from normal cells, namely, masking and cell complexity quantification. Masking involves varying the pixel intensity to deem the detection uncertainty. Cell complexity quantification gauges the uncertainty of each cell by its topological relation with neighboring cells. The evaluation results based on standard benchmark datasets demonstrate that the proposed method is effective in quantifying uncertainty in TSR models. To our best knowledge, this study is the first of its kind to enable UQ in TSR tasks. Our code and data are available at: this https URL.
Title:
```
  Improving Trip Mode Choice Modeling Using Ensemble Synthesizer (ENSY)
```
Authors: Amirhossein Parsi, Melina Jafari, Sina Sabzekar, Zahra Amini
Subjects: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Accurate classification of mode choice datasets is crucial for transportation planning and decision-making processes. However, conventional classification models often struggle to adequately capture the nuanced patterns of minority classes within these datasets, leading to sub-optimal accuracy. In response to this challenge, we present Ensemble Synthesizer (ENSY) which leverages probability distribution for data augmentation, a novel data model tailored specifically for enhancing classification accuracy in mode choice datasets. In our study, ENSY demonstrates remarkable efficacy by nearly quadrupling the F1 score of minority classes and improving overall classification accuracy by nearly 3%. To assess its performance comprehensively, we compare ENSY against various augmentation techniques including Random Oversampling, SMOTE-NC, and CTGAN. Through experimentation, ENSY consistently outperforms these methods across various scenarios, underscoring its robustness and effectiveness
Title:
```
  Analyzing Persuasive Strategies in Meme Texts: A Fusion of Language Models with Paraphrase Enrichment
```
Authors: Kota Shamanth Ramanath Nayak, Leila Kosseim
Subjects: Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract This paper describes our approach to hierarchical multi-label detection of persuasion techniques in meme texts. Our model, developed as a part of the recent SemEval task, is based on fine-tuning individual language models (BERT, XLM-RoBERTa, and mBERT) and leveraging a mean-based ensemble model in addition to dataset augmentation through paraphrase generation from ChatGPT. The scope of the study encompasses enhancing model performance through innovative training techniques and data augmentation strategies. The problem addressed is the effective identification and classification of multiple persuasive techniques in meme texts, a task complicated by the diversity and complexity of such content. The objective of the paper is to improve detection accuracy by refining model training methods and examining the impact of balanced versus unbalanced training datasets. Novelty in the results and discussion lies in the finding that training with paraphrases enhances model performance, yet a balanced training set proves more advantageous than a larger unbalanced one. Additionally, the analysis reveals the potential pitfalls of indiscriminate incorporation of paraphrases from diverse distributions, which can introduce substantial noise. Results with the SemEval 2024 data confirm these insights, demonstrating improved model efficacy with the proposed methods.
Title:
```
  Simple Augmentations of Logical Rules for Neuro-Symbolic Knowledge Graph Completion
```
Authors: Ananjan Nandi, Navdeep Kaur, Parag Singla, Mausam
Subjects: Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract High-quality and high-coverage rule sets are imperative to the success of Neuro-Symbolic Knowledge Graph Completion (NS-KGC) models, because they form the basis of all symbolic inferences. Recent literature builds neural models for generating rule sets, however, preliminary experiments show that they struggle with maintaining high coverage. In this work, we suggest three simple augmentations to existing rule sets: (1) transforming rules to their abductive forms, (2) generating equivalent rules that use inverse forms of constituent relations and (3) random walks that propose new rules. Finally, we prune potentially low quality rules. Experiments over four datasets and five ruleset-baseline settings suggest that these simple augmentations consistently improve results, and obtain up to 7.1 pt MRR and 8.5 pt Hits@1 gains over using rules without augmentations.
Title:
```
  Are Data Augmentation Methods in Named Entity Recognition Applicable for Uncertainty Estimation?
```
Authors: Wataru Hashimoto, Hidetaka Kamigaito, Taro Watanabe
Subjects: Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract This work investigates the impact of data augmentation on confidence calibration and uncertainty estimation in Named Entity Recognition (NER) tasks. For the future advance of NER in safety-critical fields like healthcare and finance, it is essential to achieve accurate predictions with calibrated confidence when applying Deep Neural Networks (DNNs), including Pre-trained Language Models (PLMs), as a real-world application. However, DNNs are prone to miscalibration, which limits their applicability. Moreover, existing methods for calibration and uncertainty estimation are computational expensive. Our investigation in NER found that data augmentation improves calibration and uncertainty in cross-genre and cross-lingual setting, especially in-domain setting. Furthermore, we showed that the calibration for NER tends to be more effective when the perplexity of the sentences generated by data augmentation is lower, and that increasing the size of the augmentation further improves calibration and uncertainty.
Title:
```
  Counterfactual Data Augmentation with Denoising Diffusion for Graph Anomaly Detection
```
Authors: Chunjing Xiao, Shikang Pang, Xovee Xu, Xuan Li, Goce Trajcevski, Fan Zhou
Subjects: Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract A critical aspect of Graph Neural Networks (GNNs) is to enhance the node representations by aggregating node neighborhood information. However, when detecting anomalies, the representations of abnormal nodes are prone to be averaged by normal neighbors, making the learned anomaly representations less distinguishable. To tackle this issue, we propose CAGAD -- an unsupervised Counterfactual data Augmentation method for Graph Anomaly Detection -- which introduces a graph pointer neural network as the heterophilic node detector to identify potential anomalies whose neighborhoods are normal-node-dominant. For each identified potential anomaly, we design a graph-specific diffusion model to translate a part of its neighbors, which are probably normal, into anomalous ones. At last, we involve these translated neighbors in GNN neighborhood aggregation to produce counterfactual representations of anomalies. Through aggregating the translated anomalous neighbors, counterfactual representations become more distinguishable and further advocate detection performance. The experimental results on four datasets demonstrate that CAGAD significantly outperforms strong baselines, with an average improvement of 2.35% on F1, 2.53% on AUC-ROC, and 2.79% on AUC-PR.
Title:
```
  Rethinking Data Augmentation for Robust LiDAR Semantic Segmentation in Adverse Weather
```
Authors: Junsung Park, Kyungmin Kim, Hyunjung Shim
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Existing LiDAR semantic segmentation methods often struggle with performance declines in adverse weather conditions. Previous research has addressed this issue by simulating adverse weather or employing universal data augmentation during training. However, these methods lack a detailed analysis and understanding of how adverse weather negatively affects LiDAR semantic segmentation performance. Motivated by this issue, we identified key factors of adverse weather and conducted a toy experiment to pinpoint the main causes of performance degradation: (1) Geometric perturbation due to refraction caused by fog or droplets in the air and (2) Point drop due to energy absorption and occlusions. Based on these findings, we propose new strategic data augmentation techniques. First, we introduced a Selective Jittering (SJ) that jitters points in the random range of depth (or angle) to mimic geometric perturbation. Additionally, we developed a Learnable Point Drop (LPD) to learn vulnerable erase patterns with Deep Q-Learning Network to approximate the point drop phenomenon from adverse weather conditions. Without precise weather simulation, these techniques strengthen the LiDAR semantic segmentation model by exposing it to vulnerable conditions identified by our data-centric analysis. Experimental results confirmed the suitability of the proposed data augmentation methods for enhancing robustness against adverse weather conditions. Our method attains a remarkable 39.5 mIoU on the SemanticKITTI-to-SemanticSTF benchmark, surpassing the previous state-of-the-art by over 5.4%p, tripling the improvement over the baseline compared to previous methods achieved.
Title:
```
  Ensemble of pre-trained language models and data augmentation for hate speech detection from Arabic tweets
```
Authors: Kheir Eddine Daouadi, Yaakoub Boualleg, Kheir Eddine Haouaouchi
Subjects: Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Today, hate speech classification from Arabic tweets has drawn the attention of several researchers. Many systems and techniques have been developed to resolve this classification task. Nevertheless, two of the major challenges faced in this context are the limited performance and the problem of imbalanced data. In this study, we propose a novel approach that leverages ensemble learning and semi-supervised learning based on previously manually labeled. We conducted experiments on a benchmark dataset by classifying Arabic tweets into 5 distinct classes: non-hate, general hate, racial, religious, or sexism. Experimental results show that: (1) ensemble learning based on pre-trained language models outperforms existing related works; (2) Our proposed data augmentation improves the accuracy results of hate speech detection from Arabic tweets and outperforms existing related works. Our main contribution is the achievement of encouraging results in Arabic hate speech detection.

LeeKyungwook / get-arxiv-noti

New submissions for Wednesday, 3 July 2024 (showing 448 of 448 entries ) #1169

Keyword: detection

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Keyword: face recognition

Title:

Keyword: augmentation

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title: