New submissions for Mon, 12 Feb 24

Keyword: detection

Uncertainty Awareness of Large Language Models Under Code Distribution Shifts: A Benchmark Study

Authors: Authors: Yufei Li, Simin Chen, Yanghong Guo, Wei Yang, Yue Dong, Cong Liu
Subjects: Software Engineering (cs.SE); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.05939
Pdf link: https://arxiv.org/pdf/2402.05939
Abstract Large Language Models (LLMs) have been widely employed in programming language analysis to enhance human productivity. Yet, their reliability can be compromised by various code distribution shifts, leading to inconsistent outputs. While probabilistic methods are known to mitigate such impact through uncertainty calibration and estimation, their efficacy in the language domain remains underexplored compared to their application in image-based tasks. In this work, we first introduce a large-scale benchmark dataset, incorporating three realistic patterns of code distribution shifts at varying intensities. Then we thoroughly investigate state-of-the-art probabilistic methods applied to CodeLlama using these shifted code snippets. We observe that these methods generally improve the uncertainty awareness of CodeLlama, with increased calibration quality and higher uncertainty estimation~(UE) precision. However, our study further reveals varied performance dynamics across different criteria (e.g., calibration error vs misclassification detection) and trade-off between efficacy and efficiency, highlighting necessary methodological selection tailored to specific contexts.
A hybrid IndRNNLSTM approach for real-time anomaly detection in software-defined networks
Authors: Authors: Sajjad Salem, Salman Asoudeh
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2402.05943
Pdf link: https://arxiv.org/pdf/2402.05943
Abstract Anomaly detection in SDN using data flow prediction is a difficult task. This problem is included in the category of time series and regression problems. Machine learning approaches are challenging in this field due to the manual selection of features. On the other hand, deep learning approaches have important features due to the automatic selection of features. Meanwhile, RNN-based approaches have been used the most. The LSTM and GRU approaches learn dependent entities well; on the other hand, the IndRNN approach learns non-dependent entities in time series. The proposed approach tried to use a combination of IndRNN and LSTM approaches to learn dependent and non-dependent features. Feature selection approaches also provide a suitable view of features for the models; for this purpose, four feature selection models, Filter, Wrapper, Embedded, and Autoencoder were used. The proposed IndRNNLSTM algorithm, in combination with Embedded, was able to achieve MAE=1.22 and RMSE=9.92 on NSL-KDD data.
Tool wear monitoring using an online, automatic and low cost system based on local texture
Authors: Authors: M. T. García-Ordás, E. Alegre-Gutiérrez, R. Alaiz-Rodríguez, V. González-Castro
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.05977
Pdf link: https://arxiv.org/pdf/2402.05977
Abstract In this work we propose a new online, low cost and fast approach based on computer vision and machine learning to determine whether cutting tools used in edge profile milling processes are serviceable or disposable based on their wear level. We created a new dataset of 254 images of edge profile cutting heads which is, to the best of our knowledge, the first publicly available dataset with enough quality for this purpose. All the inserts were segmented and their cutting edges were cropped, obtaining 577 images of cutting edges: 301 functional and 276 disposable. The proposed method is based on (1) dividing the cutting edge image in different regions, called Wear Patches (WP), (2) characterising each one as worn or serviceable using texture descriptors based on different variants of Local Binary Patterns (LBP) and (3) determine, based on the state of these WP, if the cutting edge (and, therefore, the tool) is serviceable or disposable. We proposed and assessed five different patch division configurations. The individual WP were classified by a Support Vector Machine (SVM) with an intersection kernel. The best patch division configuration and texture descriptor for the WP achieves an accuracy of 90.26% in the detection of the disposable cutting edges. These results show a very promising opportunity for automatic wear monitoring in edge profile milling processes.
Antenna system for trilateral drone precise vertical landing
Authors: Authors: Víctor Araña-Pulido, Eugenio Jiménez-Yguácel, Francisco Cabrera-Almeida, Pedro Quintana-Morales
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2402.06011
Pdf link: https://arxiv.org/pdf/2402.06011
Abstract This article presents a radio frequency system that can be used to perform precise vertical landings of drones. The system is based on the three-way phase shift detection of a signal transmitted from the landing point. The antenna system is designed by taking into account parameters such as landing tracking area, analog-to-digital converter (ADC) resolution, phase detector output range, antenna polarization, and the effect of antenna axial ratio. The fabricated prototype consists of a landing point antenna that transmits a signal at 2.46 GHz, as well as a drone triantenna system that includes a phase shift detection circuitry, ADC, and a simple control program that provides the correction instructions for landing. The prototype provides an averaged output data rate (ODR) suitable for landing maneuvers (>300 Hz). A simple system calibration procedure (detector output zeroing) is performed by aligning the antenna system. The measurements performed at different altitudes demonstrate both the correct operation of the proposed solution and its viability as an instrument for precision vertical landings.
A versatile robotic hand with 3D perception, force sensing for autonomous manipulation
Authors: Authors: Nikolaus Correll, Dylan Kriegman, Stephen Otto, James Watson
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2402.06018
Pdf link: https://arxiv.org/pdf/2402.06018
Abstract We describe a force-controlled robotic gripper with built-in tactile and 3D perception. We also describe a complete autonomous manipulation pipeline consisting of object detection, segmentation, point cloud processing, force-controlled manipulation, and symbolic (re)-planning. The design emphasizes versatility in terms of applications, manufacturability, use of commercial off-the-shelf parts, and open-source software. We validate the design by characterizing force control (achieving up to 32N, controllable in steps of 0.08N), force measurement, and two manipulation demonstrations: assembly of the Siemens gear assembly problem, and a sensor-based stacking task requiring replanning. These demonstrate robust execution of long sequences of sensor-based manipulation tasks, which makes the resulting platform a solid foundation for researchers in task-and-motion planning, educators, and quick prototyping of household, industrial and warehouse automation tasks.
AntiCopyPaster 2.0: Whitebox just-in-time code duplicates extraction
Authors: Authors: Eman Abdullah AlOmar, Benjamin Knobloch, Thomas Kain, Christopher Kalish, Mohamed Wiem Mkaouer, Ali Ouni
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2402.06035
Pdf link: https://arxiv.org/pdf/2402.06035
Abstract AntiCopyPaster is an IntelliJ IDEA plugin, implemented to detect and refactor duplicate code interactively as soon as a duplicate is introduced. The plugin only recommends the extraction of a duplicate when it is worth it. In contrast to current Extract Method refactoring approaches, our tool seamlessly integrates with the developer's workflow and actively provides recommendations for refactorings. This work extends our tool to allow developers to customize the detection rules, i.e., metrics, based on their needs and preferences. The plugin and its source code are publicly available on GitHub at https://github.com/refactorings/anti-copy-paster. The demonstration video can be found on YouTube: https://youtu.be/ Y1sbfpds2Ms.
Descriptive Kernel Convolution Network with Improved Random Walk Kernel
Authors: Authors: Meng-Chieh Lee, Lingxiao Zhao, Leman Akoglu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.06087
Pdf link: https://arxiv.org/pdf/2402.06087
Abstract Graph kernels used to be the dominant approach to feature engineering for structured data, which are superseded by modern GNNs as the former lacks learnability. Recently, a suite of Kernel Convolution Networks (KCNs) successfully revitalized graph kernels by introducing learnability, which convolves input with learnable hidden graphs using a certain graph kernel. The random walk kernel (RWK) has been used as the default kernel in many KCNs, gaining increasing attention. In this paper, we first revisit the RWK and its current usage in KCNs, revealing several shortcomings of the existing designs, and propose an improved graph kernel RWK+, by introducing color-matching random walks and deriving its efficient computation. We then propose RWK+CN, a KCN that uses RWK+ as the core kernel to learn descriptive graph features with an unsupervised objective, which can not be achieved by GNNs. Further, by unrolling RWK+, we discover its connection with a regular GCN layer, and propose a novel GNN layer RWK+Conv. In the first part of experiments, we demonstrate the descriptive learning ability of RWK+CN with the improved random walk kernel RWK+ on unsupervised pattern mining tasks; in the second part, we show the effectiveness of RWK+ for a variety of KCN architectures and supervised graph learning tasks, and demonstrate the expressiveness of RWK+Conv layer, especially on the graph-level tasks. RWK+ and RWK+Conv adapt to various real-world applications, including web applications such as bot detection in a web-scale Twitter social network, and community classification in Reddit social interaction networks.
Veni, Vidi, Vici: Solving the Myriad of Challenges before Knowledge Graph Learning
Authors: Authors: Jeffrey Sardina, Luca Costabello, Christophe Guéret
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.06098
Pdf link: https://arxiv.org/pdf/2402.06098
Abstract Knowledge Graphs (KGs) have become increasingly common for representing large-scale linked data. However, their immense size has required graph learning systems to assist humans in analysis, interpretation, and pattern detection. While there have been promising results for researcher- and clinician- empowerment through a variety of KG learning systems, we identify four key deficiencies in state-of-the-art graph learning that simultaneously limit KG learning performance and diminish the ability of humans to interface optimally with these learning systems. These deficiencies are: 1) lack of expert knowledge integration, 2) instability to node degree extremity in the KG, 3) lack of consideration for uncertainty and relevance while learning, and 4) lack of explainability. Furthermore, we characterise state-of-the-art attempts to solve each of these problems and note that each attempt has largely been isolated from attempts to solve the other problems. Through a formalisation of these problems and a review of the literature that addresses them, we adopt the position that not only are deficiencies in these four key areas holding back human-KG empowerment, but that the divide-and-conquer approach to solving these problems as individual units rather than a whole is a significant barrier to the interface between humans and KG learning systems. We propose that it is only through integrated, holistic solutions to the limitations of KG learning systems that human and KG learning co-empowerment will be efficiently affected. We finally present our "Veni, Vidi, Vici" framework that sets a roadmap for effectively and efficiently shifting to a holistic co-empowerment model in both the KG learning and the broader machine learning domain.
Multiple Instance Learning for Cheating Detection and Localization in Online Examinations
Authors: Authors: Yemeng Liu, Jing Ren, Jianshuo Xu, Xiaomei Bai, Roopdeep Kaur, Feng Xia
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.06107
Pdf link: https://arxiv.org/pdf/2402.06107
Abstract The spread of the Coronavirus disease-2019 epidemic has caused many courses and exams to be conducted online. The cheating behavior detection model in examination invigilation systems plays a pivotal role in guaranteeing the equality of long-distance examinations. However, cheating behavior is rare, and most researchers do not comprehensively take into account features such as head posture, gaze angle, body posture, and background information in the task of cheating behavior detection. In this paper, we develop and present CHEESE, a CHEating detection framework via multiplE inStancE learning. The framework consists of a label generator that implements weak supervision and a feature encoder to learn discriminative features. In addition, the framework combines body posture and background features extracted by 3D convolution with eye gaze, head posture and facial features captured by OpenFace 2.0. These features are fed into the spatio-temporal graph module by stitching to analyze the spatio-temporal changes in video clips to detect the cheating behaviors. Our experiments on three datasets, UCF-Crime, ShanghaiTech and Online Exam Proctoring (OEP), prove the effectiveness of our method as compared to the state-of-the-art approaches, and obtain the frame-level AUC score of 87.58% on the OEP dataset.
Learning Contrastive Feature Representations for Facial Action Unit Detection
Authors: Authors: Ziqiao Shang, Bin Liu, Fei Teng, Tianrui Li
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.06165
Pdf link: https://arxiv.org/pdf/2402.06165
Abstract The predominant approach to facial action unit (AU) detection revolves around a supervised multi-label binary classification problem. Existing methodologies often encode pixel-level information of AUs, thereby imposing substantial demands on model complexity and expressiveness. Moreover, this practice elevates the susceptibility to overfitting due to the presence of noisy AU labels. In the present study, we introduce a contrastive learning framework enhanced by both supervised and self-supervised signals. The objective is to acquire discriminative features, deviating from the conventional pixel-level learning paradigm within the domain of AU detection. To address the challenge posed by noisy AU labels, we augment the supervised signal through the introduction of a self-supervised signal. This augmentation is achieved through positive sample sampling, encompassing three distinct types of positive sample pairs. Furthermore, to mitigate the imbalanced distribution of each AU type, we employ an importance re-weighting strategy tailored for minority AUs. The resulting loss, denoted as AUNCE, is proposed to encapsulate this strategy. Our experimental assessments, conducted on two widely-utilized benchmark datasets (BP4D and DISFA), underscore the superior performance of our approach compared to state-of-the-art methods in the realm of AU detection.
MLS2LoD3: Refining low LoDs building models with MLS point clouds to reconstruct semantic LoD3 building models
Authors: Authors: Olaf Wysocki, Ludwig Hoegner, Uwe Stilla
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.06288
Pdf link: https://arxiv.org/pdf/2402.06288
Abstract Although highly-detailed LoD3 building models reveal great potential in various applications, they have yet to be available. The primary challenges in creating such models concern not only automatic detection and reconstruction but also standard-consistent modeling. In this paper, we introduce a novel refinement strategy enabling LoD3 reconstruction by leveraging the ubiquity of lower LoD building models and the accuracy of MLS point clouds. Such a strategy promises at-scale LoD3 reconstruction and unlocks LoD3 applications, which we also describe and illustrate in this paper. Additionally, we present guidelines for reconstructing LoD3 facade elements and their embedding into the CityGML standard model, disseminating gained knowledge to academics and professionals. We believe that our method can foster development of LoD3 reconstruction algorithms and subsequently enable their wider adoption.
A New Approach to Voice Authenticity
Authors: Authors: Nicolas M. Müller, Piotr Kawa, Shen Hu, Matthias Neu, Jennifer Williams, Philip Sperl, Konstantin Böttinger
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2402.06304
Pdf link: https://arxiv.org/pdf/2402.06304
Abstract Voice faking, driven primarily by recent advances in text-to-speech (TTS) synthesis technology, poses significant societal challenges. Currently, the prevailing assumption is that unaltered human speech can be considered genuine, while fake speech comes from TTS synthesis. We argue that this binary distinction is oversimplified. For instance, altered playback speeds can be used for malicious purposes, like in the 'Drunken Nancy Pelosi' incident. Similarly, editing of audio clips can be done ethically, e.g., for brevity or summarization in news reporting or podcasts, but editing can also create misleading narratives. In this paper, we propose a conceptual shift away from the binary paradigm of audio being either 'fake' or 'real'. Instead, our focus is on pinpointing 'voice edits', which encompass traditional modifications like filters and cuts, as well as TTS synthesis and VC systems. We delineate 6 categories and curate a new challenge dataset rooted in the M-AILABS corpus, for which we present baseline detection systems. And most importantly, we argue that merely categorizing audio as fake or real is a dangerous over-simplification that will fail to move the field of speech technology forward.
SWITCH: An Exemplar for Evaluating Self-Adaptive ML-Enabled Systems
Authors: Authors: Arya Marda, Shubham Kulkarni, Karthik Vaidhyanathan
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2402.06351
Pdf link: https://arxiv.org/pdf/2402.06351
Abstract Addressing runtime uncertainties in Machine Learning-Enabled Systems (MLS) is crucial for maintaining Quality of Service (QoS). The Machine Learning Model Balancer is a concept that addresses these uncertainties by facilitating dynamic ML model switching, showing promise in improving QoS in MLS. Leveraging this concept, this paper introduces SWITCH, an exemplar developed to enhance self-adaptive capabilities in such systems through dynamic model switching in runtime. SWITCH is designed as a comprehensive web service catering to a broad range of ML scenarios, with its implementation demonstrated through an object detection use case. SWITCH provides researchers with a flexible platform to apply and evaluate their ML model switching strategies, aiming to enhance QoS in MLS. SWITCH features advanced input handling, real-time data processing, and logging for adaptation metrics supplemented with an interactive real-time dashboard for enhancing system observability. This paper details SWITCH's architecture, self-adaptation strategies through ML model switching, and its empirical validation through a case study, illustrating its potential to improve QoS in MLS. By enabling a hands-on approach to explore adaptive behaviors in ML systems, SWITCH contributes a valuable tool to the SEAMS community for research into self-adaptive mechanisms for MLS and their practical applications.
CurveFormer++: 3D Lane Detection by Curve Propagation with Temporal Curve Queries and Attention
Authors: Authors: Yifeng Bai, Zhirong Chen, Pengpeng Liang, Erkang Cheng
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.06423
Pdf link: https://arxiv.org/pdf/2402.06423
Abstract In autonomous driving, 3D lane detection using monocular cameras is an important task for various downstream planning and control tasks. Recent CNN and Transformer approaches usually apply a two-stage scheme in the model design. The first stage transforms the image feature from a front image into a bird's-eye-view (BEV) representation. Subsequently, a sub-network processes the BEV feature map to generate the 3D detection results. However, these approaches heavily rely on a challenging image feature transformation module from a perspective view to a BEV representation. In our work, we present CurveFormer++, a single-stage Transformer-based method that does not require the image feature view transform module and directly infers 3D lane detection results from the perspective image features. Specifically, our approach models the 3D detection task as a curve propagation problem, where each lane is represented by a curve query with a dynamic and ordered anchor point set. By employing a Transformer decoder, the model can iteratively refine the 3D lane detection results. A curve cross-attention module is introduced in the Transformer decoder to calculate similarities between image features and curve queries of lanes. To handle varying lane lengths, we employ context sampling and anchor point restriction techniques to compute more relevant image features for a curve query. Furthermore, we apply a temporal fusion module that incorporates selected informative sparse curve queries and their corresponding anchor point sets to leverage historical lane information. In the experiments, we evaluate our approach for the 3D lane detection task on two publicly available real-world datasets. The results demonstrate that our method provides outstanding performance compared with both CNN and Transformer based methods. We also conduct ablation studies to analyze the impact of each component in our approach.
Explaining Veracity Predictions with Evidence Summarization: A Multi-Task Model Approach
Authors: Authors: Recep Firat Cekinel, Pinar Karagoz
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2402.06443
Pdf link: https://arxiv.org/pdf/2402.06443
Abstract The rapid dissemination of misinformation through social media increased the importance of automated fact-checking. Furthermore, studies on what deep neural models pay attention to when making predictions have increased in recent years. While significant progress has been made in this field, it has not yet reached a level of reasoning comparable to human reasoning. To address these gaps, we propose a multi-task explainable neural model for misinformation detection. Specifically, this work formulates an explanation generation process of the model's veracity prediction as a text summarization problem. Additionally, the performance of the proposed model is discussed on publicly available datasets and the findings are evaluated with related studies.
Refining Myocardial Infarction Detection: A Novel Multi-Modal Composite Kernel Strategy in One-Class Classification
Authors: Authors: Muhammad Uzair Zahid, Aysen Degerli, Fahad Sohrab, Serkan Kiranyaz, Moncef Gabbouj
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2402.06530
Pdf link: https://arxiv.org/pdf/2402.06530
Abstract Early detection of myocardial infarction (MI), a critical condition arising from coronary artery disease (CAD), is vital to prevent further myocardial damage. This study introduces a novel method for early MI detection using a one-class classification (OCC) algorithm in echocardiography. Our study overcomes the challenge of limited echocardiography data availability by adopting a novel approach based on Multi-modal Subspace Support Vector Data Description. The proposed technique involves a specialized MI detection framework employing multi-view echocardiography incorporating a composite kernel in the non-linear projection trick, fusing Gaussian and Laplacian sigmoid functions. Additionally, we enhance the update strategy of the projection matrices by adapting maximization for both or one of the modalities in the optimization process. Our method boosts MI detection capability by efficiently transforming features extracted from echocardiography data into an optimized lower-dimensional subspace. The OCC model trained specifically on target class instances from the comprehensive HMC-QU dataset that includes multiple echocardiography views indicates a marked improvement in MI detection accuracy. Our findings reveal that our proposed multi-view approach achieves a geometric mean of 71.24\%, signifying a substantial advancement in echocardiography-based MI diagnosis and offering more precise and efficient diagnostic tools.
Feature Density Estimation for Out-of-Distribution Detection via Normalizing Flows
Authors: Authors: Evan D. Cook, Marc-Antoine Lavoie, Steven L. Waslander
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.06537
Pdf link: https://arxiv.org/pdf/2402.06537
Abstract Out-of-distribution (OOD) detection is a critical task for safe deployment of learning systems in the open world setting. In this work, we investigate the use of feature density estimation via normalizing flows for OOD detection and present a fully unsupervised approach which requires no exposure to OOD data, avoiding researcher bias in OOD sample selection. This is a post-hoc method which can be applied to any pretrained model, and involves training a lightweight auxiliary normalizing flow model to perform the out-of-distribution detection via density thresholding. Experiments on OOD detection in image classification show strong results for far-OOD data detection with only a single epoch of flow training, including 98.2% AUROC for ImageNet-1k vs. Textures, which exceeds the state of the art by 7.8%. We additionally explore the connection between the feature space distribution of the pretrained model and the performance of our method. Finally, we provide insights into training pitfalls that have plagued normalizing flows for use in OOD detection.
Bryndza at ClimateActivism 2024: Stance, Target and Hate Event Detection via Retrieval-Augmented GPT-4 and LLaMA
Authors: Authors: Marek Šuppa, Daniel Skala, Daniela Jašš, Samuel Sučík, Andrej Švec, Peter Hraška
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2402.06549
Pdf link: https://arxiv.org/pdf/2402.06549
Abstract This study details our approach for the CASE 2024 Shared Task on Climate Activism Stance and Hate Event Detection, focusing on Hate Speech Detection, Hate Speech Target Identification, and Stance Detection as classification challenges. We explored the capability of Large Language Models (LLMs), particularly GPT-4, in zero- or few-shot settings enhanced by retrieval augmentation and re-ranking for Tweet classification. Our goal was to determine if LLMs could match or surpass traditional methods in this context. We conducted an ablation study with LLaMA for comparison, and our results indicate that our models significantly outperformed the baselines, securing second place in the Target Detection task. The code for our submission is available at https://github.com/NaiveNeuron/bryndza-case-2024
Keyword: face recognition

There is no result

Keyword: augmentation

Phase-driven Domain Generalizable Learning for Nonstationary Time Series
Authors: Authors: Payal Mohapatra, Lixu Wang, Qi Zhu
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2402.05960
Pdf link: https://arxiv.org/pdf/2402.05960
Abstract Monitoring and recognizing patterns in continuous sensing data is crucial for many practical applications. These real-world time-series data are often nonstationary, characterized by varying statistical and spectral properties over time. This poses a significant challenge in developing learning models that can effectively generalize across different distributions. In this work, based on our observation that nonstationary statistics are intrinsically linked to the phase information, we propose a time-series learning framework, PhASER. It consists of three novel elements: 1) phase augmentation that diversifies non-stationarity while preserving discriminatory semantics, 2) separate feature encoding by viewing time-varying magnitude and phase as independent modalities, and 3) feature broadcasting by incorporating phase with a novel residual connection for inherent regularization to enhance distribution invariant learning. Upon extensive evaluation on 5 datasets from human activity recognition, sleep-stage classification, and gesture recognition against 10 state-of-the-art baseline methods, we demonstrate that PhASER consistently outperforms the best baselines by an average of 5% and up to 13% in some cases. Moreover, PhASER's principles can be applied broadly to boost the generalization ability of existing time series classification models.
Learning Contrastive Feature Representations for Facial Action Unit Detection
Authors: Authors: Ziqiao Shang, Bin Liu, Fei Teng, Tianrui Li
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.06165
Pdf link: https://arxiv.org/pdf/2402.06165
Abstract The predominant approach to facial action unit (AU) detection revolves around a supervised multi-label binary classification problem. Existing methodologies often encode pixel-level information of AUs, thereby imposing substantial demands on model complexity and expressiveness. Moreover, this practice elevates the susceptibility to overfitting due to the presence of noisy AU labels. In the present study, we introduce a contrastive learning framework enhanced by both supervised and self-supervised signals. The objective is to acquire discriminative features, deviating from the conventional pixel-level learning paradigm within the domain of AU detection. To address the challenge posed by noisy AU labels, we augment the supervised signal through the introduction of a self-supervised signal. This augmentation is achieved through positive sample sampling, encompassing three distinct types of positive sample pairs. Furthermore, to mitigate the imbalanced distribution of each AU type, we employ an importance re-weighting strategy tailored for minority AUs. The resulting loss, denoted as AUNCE, is proposed to encapsulate this strategy. Our experimental assessments, conducted on two widely-utilized benchmark datasets (BP4D and DISFA), underscore the superior performance of our approach compared to state-of-the-art methods in the realm of AU detection.
Pushing Boundaries: Mixup's Influence on Neural Collapse
Authors: Authors: Quinn Fisher, Haoming Meng, Vardan Papyan
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.06171
Pdf link: https://arxiv.org/pdf/2402.06171
Abstract Mixup is a data augmentation strategy that employs convex combinations of training instances and their respective labels to augment the robustness and calibration of deep neural networks. Despite its widespread adoption, the nuanced mechanisms that underpin its success are not entirely understood. The observed phenomenon of Neural Collapse, where the last-layer activations and classifier of deep networks converge to a simplex equiangular tight frame (ETF), provides a compelling motivation to explore whether mixup induces alternative geometric configurations and whether those could explain its success. In this study, we delve into the last-layer activations of training data for deep networks subjected to mixup, aiming to uncover insights into its operational efficacy. Our investigation, spanning various architectures and dataset pairs, reveals that mixup's last-layer activations predominantly converge to a distinctive configuration different than one might expect. In this configuration, activations from mixed-up examples of identical classes align with the classifier, while those from different classes delineate channels along the decision boundary. Moreover, activations in earlier layers exhibit patterns, as if trained with manifold mixup. These findings are unexpected, as mixed-up features are not simple convex combinations of feature class means (as one might get, for example, by training mixup with the mean squared error loss). By analyzing this distinctive geometric configuration, we elucidate the mechanisms by which mixup enhances model calibration. To further validate our empirical observations, we conduct a theoretical analysis under the assumption of an unconstrained features model, utilizing the mixup loss. Through this, we characterize and derive the optimal last-layer features under the assumption that the classifier forms a simplex ETF.
Retrieve, Merge, Predict: Augmenting Tables with Data Lakes
Authors: Authors: Riccardo Cappuzzo (1), Gael Varoquaux (1), Aimee Coelho (2), Paolo Papotti (3) ((1) SODA Team - Inria Saclay, (2) Dataiku, (3) EURECOM)
Subjects: Databases (cs.DB); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.06282
Pdf link: https://arxiv.org/pdf/2402.06282
Abstract We present an in-depth analysis of data discovery in data lakes, focusing on table augmentation for given machine learning tasks. We analyze alternative methods used in the three main steps: retrieving joinable tables, merging information, and predicting with the resultant table. As data lakes, the paper uses YADL (Yet Another Data Lake) -- a novel dataset we developed as a tool for benchmarking this data discovery task -- and Open Data US, a well-referenced real data lake. Through systematic exploration on both lakes, our study outlines the importance of accurately retrieving join candidates and the efficiency of simple merging methods. We report new insights on the benefits of existing solutions and on their limitations, aiming at guiding future research in this space.
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning
Authors: Authors: Huaiyuan Ying, Shuo Zhang, Linyang Li, Zhejian Zhou, Yunfan Shao, Zhaoye Fei, Yichuan Ma, Jiawei Hong, Kuikun Liu, Ziyi Wang, Yudong Wang, Zijian Wu, Shuaibin Li, Fengzhe Zhou, Hongwei Liu, Songyang Zhang, Wenwei Zhang, Hang Yan, Xipeng Qiu, Jiayu Wang, Kai Chen, Dahua Lin
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2402.06332
Pdf link: https://arxiv.org/pdf/2402.06332
Abstract The math abilities of large language models can represent their abstract reasoning ability. In this paper, we introduce and open-source our math reasoning LLMs InternLM-Math which is continue pre-trained from InternLM2. We unify chain-of-thought reasoning, reward modeling, formal reasoning, data augmentation, and code interpreter in a unified seq2seq format and supervise our model to be a versatile math reasoner, verifier, prover, and augmenter. These abilities can be used to develop the next math LLMs or self-iteration. InternLM-Math obtains open-sourced state-of-the-art performance under the setting of in-context learning, supervised fine-tuning, and code-assisted reasoning in various informal and formal benchmarks including GSM8K, MATH, Hungary math exam, MathBench-ZH, and MiniF2F. Our pre-trained model achieves 30.3 on the MiniF2F test set without fine-tuning. We further explore how to use LEAN to solve math problems and study its performance under the setting of multi-task learning which shows the possibility of using LEAN as a unified platform for solving and proving in math. Our models, codes, and data are released at \url{https://github.com/InternLM/InternLM-Math}.
ExaRanker-Open: Synthetic Explanation for IR using Open-Source LLMs
Authors: Authors: Fernando Ferraretto, Thiago Laitz, Roberto Lotufo, Rodrigo Nogueira
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2402.06334
Pdf link: https://arxiv.org/pdf/2402.06334
Abstract ExaRanker recently introduced an approach to training information retrieval (IR) models, incorporating natural language explanations as additional labels. The method addresses the challenge of limited labeled examples, leading to improvements in the effectiveness of IR models. However, the initial results were based on proprietary language models such as GPT-3.5, which posed constraints on dataset size due to its cost and data privacy. In this paper, we introduce ExaRanker-Open, where we adapt and explore the use of open-source language models to generate explanations. The method has been tested using different LLMs and datasets sizes to better comprehend the effective contribution of data augmentation. Our findings reveal that incorporating explanations consistently enhances neural rankers, with benefits escalating as the LLM size increases. Notably, the data augmentation method proves advantageous even with large datasets, as evidenced by ExaRanker surpassing the target baseline by 0.6 nDCG@10 points in our study. To encourage further advancements by the research community, we have open-sourced both the code and datasets at https://github.com/unicamp-dl/ExaRanker.
Bryndza at ClimateActivism 2024: Stance, Target and Hate Event Detection via Retrieval-Augmented GPT-4 and LLaMA
Authors: Authors: Marek Šuppa, Daniel Skala, Daniela Jašš, Samuel Sučík, Andrej Švec, Peter Hraška
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2402.06549
Pdf link: https://arxiv.org/pdf/2402.06549
Abstract This study details our approach for the CASE 2024 Shared Task on Climate Activism Stance and Hate Event Detection, focusing on Hate Speech Detection, Hate Speech Target Identification, and Stance Detection as classification challenges. We explored the capability of Large Language Models (LLMs), particularly GPT-4, in zero- or few-shot settings enhanced by retrieval augmentation and re-ranking for Tweet classification. Our goal was to determine if LLMs could match or surpass traditional methods in this context. We conducted an ablation study with LLaMA for comparison, and our results indicate that our models significantly outperformed the baselines, securing second place in the Target Detection task. The code for our submission is available at https://github.com/NaiveNeuron/bryndza-case-2024

LeeKyungwook / get-arxiv-noti

New submissions for Mon, 12 Feb 24 #973

Keyword: detection

Uncertainty Awareness of Large Language Models Under Code Distribution Shifts: A Benchmark Study

A hybrid IndRNNLSTM approach for real-time anomaly detection in software-defined networks

Tool wear monitoring using an online, automatic and low cost system based on local texture

Antenna system for trilateral drone precise vertical landing

A versatile robotic hand with 3D perception, force sensing for autonomous manipulation

AntiCopyPaster 2.0: Whitebox just-in-time code duplicates extraction

Descriptive Kernel Convolution Network with Improved Random Walk Kernel

Veni, Vidi, Vici: Solving the Myriad of Challenges before Knowledge Graph Learning

Multiple Instance Learning for Cheating Detection and Localization in Online Examinations

Learning Contrastive Feature Representations for Facial Action Unit Detection

MLS2LoD3: Refining low LoDs building models with MLS point clouds to reconstruct semantic LoD3 building models

A New Approach to Voice Authenticity

SWITCH: An Exemplar for Evaluating Self-Adaptive ML-Enabled Systems

CurveFormer++: 3D Lane Detection by Curve Propagation with Temporal Curve Queries and Attention

Explaining Veracity Predictions with Evidence Summarization: A Multi-Task Model Approach

Refining Myocardial Infarction Detection: A Novel Multi-Modal Composite Kernel Strategy in One-Class Classification

Feature Density Estimation for Out-of-Distribution Detection via Normalizing Flows

Bryndza at ClimateActivism 2024: Stance, Target and Hate Event Detection via Retrieval-Augmented GPT-4 and LLaMA

Keyword: face recognition

Keyword: augmentation

Phase-driven Domain Generalizable Learning for Nonstationary Time Series

Learning Contrastive Feature Representations for Facial Action Unit Detection

Pushing Boundaries: Mixup's Influence on Neural Collapse

Retrieve, Merge, Predict: Augmenting Tables with Data Lakes

InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning

ExaRanker-Open: Synthetic Explanation for IR using Open-Source LLMs

Bryndza at ClimateActivism 2024: Stance, Target and Hate Event Detection via Retrieval-Augmented GPT-4 and LLaMA