New submissions for Wed, 14 Feb 24

Keyword: detection

LLMs Among Us: Generative AI Participating in Digital Discourse

Authors: Kristina Radivojevic, Nicholas Clark, Paul Brenner
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2402.07940
Pdf link: https://arxiv.org/pdf/2402.07940
Abstract The emergence of Large Language Models (LLMs) has great potential to reshape the landscape of many social media platforms. While this can bring promising opportunities, it also raises many threats, such as biases and privacy concerns, and may contribute to the spread of propaganda by malicious actors. We developed the "LLMs Among Us" experimental framework on top of the Mastodon social media platform for bot and human participants to communicate without knowing the ratio or nature of bot and human participants. We built 10 personas with three different LLMs: GPT-4, Llama 2 Chat, and Claude. We conducted three rounds of the experiment and surveyed participants after each round to measure the ability of LLMs to pose as human participants without human detection. We found that participants correctly identified the nature of other users in the experiment only 42% of the time despite knowing that both bots and humans were present. We also found that the choice of persona had substantially more impact on human perception than the choice of mainstream LLMs.
Dumviri: Detecting Trackers and Mixed Trackers with a Breakage Detector
Authors: He Shuang, Lianying Zhao, David Lie
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2402.08031
Pdf link: https://arxiv.org/pdf/2402.08031
Abstract Previous automatic tracker detection work lacks features to recognize web page breakage and often resorts to manual analysis to assess the breakage caused by blocking trackers. We introduce Dumviri, which incorporates a breakage detector that can automatically detect web page breakage caused by erroneously blocking a resource that is needed by the page to function properly. This addition allows Dumviri to prevent functional resources from being misclassified as trackers and increases overall detection accuracy. We designed Dumviri to take differential features. We further find that these features are agnostic to analysis granularity and enable Dumviri to predict tracking resources at the request field granularity, allowing Dumviri to handle some mixed trackers. Evaluating Dumviri on 15K pages shows its ability to replicate the labels of human-generated filter lists with an accuracy of 97.44%. Through a manual analysis, we found that Dumviri identified previously unreported trackers and that its breakage detector can identify rules that cause web page breakage in commonly used filter lists like EasyPrivacy. In the case of mixed trackers, Dumviri, being the first automated mixed tracker detector, achieves a 79.09% accuracy. We have confirmed 22 previously unreported unique trackers and 26 unique mixed trackers. We promptly reported these findings to privacy developers, and we will publish our filter lists in uBlock Origin's extended syntax.
Out-of-Distribution Detection and Data Drift Monitoring using Statistical Process Control
Authors: Ghada Zamzmi, Kesavan Venkatesh, Brandon Nelson, Smriti Prathapan, Paul H. Yi, Berkman Sahiner, Jana G. Delfino
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2402.08088
Pdf link: https://arxiv.org/pdf/2402.08088
Abstract Background: Machine learning (ML) methods often fail with data that deviates from their training distribution. This is a significant concern for ML-enabled devices in clinical settings, where data drift may cause unexpected performance that jeopardizes patient safety. Method: We propose an ML-enabled Statistical Process Control (SPC) framework for out-of-distribution (OOD) detection and drift monitoring. SPC is advantageous as it visually and statistically highlights deviations from the expected distribution. To demonstrate the utility of the proposed framework for monitoring data drift in radiological images, we investigated different design choices, including methods for extracting feature representations, drift quantification, and SPC parameter selection. Results: We demonstrate the effectiveness of our framework for two tasks: 1) differentiating axial vs. non-axial computed tomography (CT) images and 2) separating chest x-ray (CXR) from other modalities. For both tasks, we achieved high accuracy in detecting OOD inputs, with 0.913 in CT and 0.995 in CXR, and sensitivity of 0.980 in CT and 0.984 in CXR. Our framework was also adept at monitoring data streams and identifying the time a drift occurred. In a simulation with 100 daily CXR cases, we detected a drift in OOD input percentage from 0-1% to 3-5% within two days, maintaining a low false-positive rate. Through additional experimental results, we demonstrate the framework's data-agnostic nature and independence from the underlying model's structure. Conclusion: We propose a framework for OOD detection and drift monitoring that is agnostic to data, modality, and model. The framework is customizable and can be adapted for specific applications.
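As a rough illustration of the drift-monitoring idea in the abstract above, the sketch below tracks a daily OOD proportion against a classic Shewhart p-chart upper control limit. The abstract does not say which SPC chart or parameters the authors use, so the chart type, the 3-sigma limit, the function names, and the example counts are all assumptions chosen for illustration.

```python
import math


def p_chart_limits(baseline_rate, n_per_day, sigmas=3.0):
    """Return (lower, upper) control limits for a daily proportion.

    Assumes a Shewhart p-chart: limits sit `sigmas` binomial standard
    errors away from the in-control baseline rate.
    """
    se = math.sqrt(baseline_rate * (1.0 - baseline_rate) / n_per_day)
    lower = max(0.0, baseline_rate - sigmas * se)
    upper = min(1.0, baseline_rate + sigmas * se)
    return lower, upper


def first_out_of_control(daily_ood_counts, n_per_day, baseline_rate, sigmas=3.0):
    """Return the index of the first day above the upper limit, or None."""
    _, upper = p_chart_limits(baseline_rate, n_per_day, sigmas)
    for day, count in enumerate(daily_ood_counts):
        if count / n_per_day > upper:
            return day
    return None


# Hypothetical stream: 100 cases per day, about 1% OOD at first, then 3-5%.
counts = [0, 1, 1, 0, 1, 4, 5, 3]
alarm_day = first_out_of_control(counts, n_per_day=100, baseline_rate=0.01)
print(alarm_day)
```

A plain 3-sigma rule only fires on single days well above baseline; charts that accumulate evidence over several days (for example CUSUM or EWMA) are the usual choice when small sustained shifts matter.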
CMA-R: Causal Mediation Analysis for Explaining Rumour Detection
Authors: Lin Tian, Xiuzhen Zhang, Jey Han Lau
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2402.08155
Pdf link: https://arxiv.org/pdf/2402.08155
Abstract We apply causal mediation analysis to explain the decision-making process of neural models for rumour detection on Twitter. Interventions at the input and network level reveal the causal impacts of tweets and words in the model output. We find that our approach CMA-R -- Causal Mediation Analysis for Rumour detection -- identifies salient tweets that explain model predictions and show strong agreement with human judgements for critical tweets determining the truthfulness of stories. CMA-R can further highlight causally impactful words in the salient tweets, providing another layer of interpretability and transparency into these blackbox rumour detection systems. Code is available at: https://github.com/ltian678/cma-r.
Monolithic Silicon-Photonics Linear-Algebra Accelerators Enabling Next-Gen Massive MIMO
Authors: Tzu-Chien Hsueh, Yeshaiahu Fainman, Bill Lin
Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP); Optics (physics.optics)
Arxiv link: https://arxiv.org/abs/2402.08192
Pdf link: https://arxiv.org/pdf/2402.08192
Abstract A system-on-chip (SoC) photonic-electronic linear-algebra accelerator with the features of wavelength-division-multiplexing (WDM) based broadband photodetections and high-dimensional matrix-inversion operations fabricated in advanced monolithic silicon-photonics (M-SiPh) semiconductor process technology is proposed to achieve substantial leaps in computation density and energy efficiency, including realistic considerations of energy/area overhead due to electronic/photonic on-chip conversions, integrations, and calibrations through holistic co-design methodologies to support linear-detection based massive multiple-input multiple-output (MIMO) decoding technology requiring the inversion of channel matrices and other emergent applications limited by linear-algebra computation capacities.
Confronting Discrimination in Classification: Smote Based on Marginalized Minorities in the Kernel Space for Imbalanced Data
Authors: Lingyun Zhong
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.08202
Pdf link: https://arxiv.org/pdf/2402.08202
Abstract Financial fraud detection poses a typical challenge characterized by class imbalance, where instances of fraud are extremely rare but can lead to unpredictable economic losses if misidentified. Precisely classifying these critical minority samples represents a challenging task within the classification. The primary difficulty arises from mainstream classifiers, which often exhibit "implicit discrimination" against minority samples in evaluation metrics, which results in frequent misclassifications, and the key to the problem lies in the overlap of feature spaces between majority and minority samples. To address these challenges, oversampling is a feasible solution, yet current classical oversampling methods often lack the necessary caution in sample selection, exacerbating feature space overlap. In response, we propose a novel classification oversampling approach based on the decision boundary and sample proximity relationships. This method carefully considers the distance between critical samples and the decision hyperplane, as well as the density of surrounding samples, resulting in an adaptive oversampling strategy in the kernel space. Finally, we test the proposed method on a classic financial fraud dataset, and the results show that our proposed method provides an effective and robust solution that can improve the classification accuracy of minorities.
APALU: A Trainable, Adaptive Activation Function for Deep Learning Networks
Authors: Barathi Subramanian, Rathinaraja Jeyaraj, Rakhmonov Akhrorjon Akhmadjon Ugli, Jeonghong Kim
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2402.08244
Pdf link: https://arxiv.org/pdf/2402.08244
Abstract The activation function is a pivotal component of deep learning, facilitating the extraction of intricate data patterns. While classical activation functions like ReLU and its variants are extensively utilized, their static nature and simplicity, despite being advantageous, often limit their effectiveness in specialized tasks. Trainable activation functions also sometimes struggle to adapt to the unique characteristics of the data. Addressing these limitations, we introduce a novel trainable activation function, adaptive piecewise approximated activation linear unit (APALU), to enhance the learning performance of deep learning across a broad range of tasks. It presents a unique set of features that enable it to maintain stability and efficiency in the learning process while adapting to complex data representations. Experiments reveal significant improvements over widely used activation functions for different tasks. In image classification, APALU increases MobileNet and GoogleNet accuracy by 0.37% and 0.04%, respectively, on the CIFAR10 dataset. In anomaly detection, it improves the average area under the curve of One-Class Deep SVDD by 0.8% on the MNIST dataset, and by 1.81% and 1.11% with DifferNet and knowledge distillation, respectively, on the MVTec dataset. Notably, APALU achieves 100% accuracy on a sign language recognition task with a limited dataset. For regression tasks, APALU enhances the performance of deep neural networks and recurrent neural networks on different datasets. These improvements highlight the robustness and adaptability of APALU across diverse deep-learning applications.
Object Detection in Thermal Images Using Deep Learning for Unmanned Aerial Vehicles
Authors: Minh Dang Tu, Kieu Trang Le, Manh Duong Phung
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.08251
Pdf link: https://arxiv.org/pdf/2402.08251
Abstract This work presents a neural network model capable of recognizing small and tiny objects in thermal images collected by unmanned aerial vehicles. Our model consists of three parts: the backbone, the neck, and the prediction head. The backbone is developed based on the structure of YOLOv5 combined with the use of a transformer encoder at the end. The neck includes a BI-FPN block combined with the use of a sliding window and a transformer to increase the information fed into the prediction head. The prediction head carries out the detection by evaluating feature maps with the Sigmoid function. The use of transformers with attention and sliding windows increases recognition accuracy while keeping the model at a reasonable number of parameters and computation requirements for embedded systems. Experiments conducted on the public VEDAI dataset and our collected datasets show that our model has a higher accuracy than state-of-the-art methods such as ResNet, Faster RCNN, ComNet, ViT, YOLOv5, SMPNet, and DPNetV3. Experiments on the embedded computer Jetson AGX show that our model achieves a real-time computation speed with a stability rate of over 90%.
Improving Image Coding for Machines through Optimizing Encoder via Auxiliary Loss
Authors: Kei Iino, Shunsuke Akamatsu, Hiroshi Watanabe, Shohei Enomoto, Akira Sakamoto, Takeharu Eda
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2402.08267
Pdf link: https://arxiv.org/pdf/2402.08267
Abstract Image coding for machines (ICM) aims to compress images for machine analysis using recognition models rather than human vision. Hence, in ICM, it is important for the encoder to recognize and compress the information necessary for the machine recognition task. There are two main approaches in learned ICM: optimization of the compression model based on task loss, and Region of Interest (ROI) based bit allocation. These approaches provide the encoder with the recognition capability. However, optimization with task loss becomes difficult when the recognition model is deep, and ROI-based methods often involve extra overhead during evaluation. In this study, we propose a novel training method for learned ICM models that applies auxiliary loss to the encoder to improve its recognition capability and rate-distortion performance. Our method achieves Bjontegaard Delta rate improvements of 27.7% and 20.3% in object detection and semantic segmentation tasks, respectively, compared to the conventional training method.
Prompted Contextual Vectors for Spear-Phishing Detection
Authors: Daniel Nahmias, Gal Engelberg, Dan Klein, Asaf Shabtai
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2402.08309
Pdf link: https://arxiv.org/pdf/2402.08309
Abstract Spear-phishing attacks present a significant security challenge, with large language models (LLMs) escalating the threat by generating convincing emails and facilitating target reconnaissance. To address this, we propose a detection approach based on a novel document vectorization method that utilizes an ensemble of LLMs to create representation vectors. By prompting LLMs to reason and respond to human-crafted questions, we quantify the presence of common persuasion principles in the email's content, producing prompted contextual document vectors for a downstream supervised machine learning model. We evaluate our method using a unique dataset generated by a proprietary system that automates target reconnaissance and spear-phishing email creation. Our method achieves a 91% F1 score in identifying LLM-generated spear-phishing emails, with the training set comprising only traditional phishing and benign emails. Key contributions include an innovative document vectorization method utilizing LLM reasoning, a publicly available dataset of high-quality spear-phishing emails, and the demonstrated effectiveness of our method in detecting such emails. This methodology can be utilized for various document classification tasks, particularly in adversarial problem domains.
LOSS-GAT: Label Propagation and One-Class Semi-Supervised Graph Attention Network for Fake News Detection
Authors: Batool Lakzaei, Mostafa Haghir Chehreghani, Alireza Bagheri
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2402.08401
Pdf link: https://arxiv.org/pdf/2402.08401
Abstract In the era of widespread social networks, the rapid dissemination of fake news has emerged as a significant threat, inflicting detrimental consequences across various dimensions of people's lives. Machine learning and deep learning approaches have been extensively employed for identifying fake news. However, a significant challenge in identifying fake news is the limited availability of labeled news datasets. Therefore, the One-Class Learning (OCL) approach, utilizing only a small set of labeled data from the interest class, can be a suitable approach to address this challenge. On the other hand, representing data as a graph enables access to diverse content and structural information, and label propagation methods on graphs can be effective in predicting node labels. In this paper, we adopt a graph-based model for data representation and introduce a semi-supervised and one-class approach for fake news detection, called LOSS-GAT. Initially, we employ a two-step label propagation algorithm, utilizing Graph Neural Networks (GNNs) as an initial classifier to categorize news into two groups: interest (fake) and non-interest (real). Subsequently, we enhance the graph structure using structural augmentation techniques. Ultimately, we predict the final labels for all unlabeled data using a GNN that induces randomness within the local neighborhood of nodes through the aggregation function. We evaluate our proposed method on five common datasets and compare the results against a set of baseline models, including both OCL and binary labeled models. The results demonstrate that LOSS-GAT achieves a notable improvement, surpassing 10%, with the advantage of utilizing only a limited set of labeled fake news. Notably, LOSS-GAT even outperforms binary labeled models.
Leveraging Self-Supervised Instance Contrastive Learning for Radar Object Detection
Authors: Colin Decourt, Rufin VanRullen, Didier Salle, Thomas Oberlin
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.08427
Pdf link: https://arxiv.org/pdf/2402.08427
Abstract In recent years, driven by the need for safer and more autonomous transport systems, the automotive industry has shifted toward integrating a growing number of Advanced Driver Assistance Systems (ADAS). Among the array of sensors employed for object recognition tasks, radar sensors have emerged as a formidable contender due to their abilities in adverse weather conditions or low-light scenarios and their robustness in maintaining consistent performance across diverse environments. However, the small size of radar datasets and the complexity of labelling those data limit the performance of radar object detectors. Driven by the promising results of self-supervised learning in computer vision, this paper presents RiCL, an instance contrastive learning framework to pre-train radar object detectors. We propose to exploit the detection from the radar and the temporal information to pre-train the radar object detection model in a self-supervised way using contrastive learning. We aim to pre-train an object detector's backbone, head and neck to learn with fewer data. Experiments on the CARRADA and the RADDet datasets show the effectiveness of our approach in learning generic representations of objects in range-Doppler maps. Notably, our pre-training strategy allows us to use only 20% of the labelled data to reach an mAP@0.5 similar to that of a supervised approach using the whole training set.
ROSpace: Intrusion Detection Dataset for a ROS2-Based Cyber-Physical System
Authors: Tommaso Puccetti, Simone Nardi, Cosimo Cinquilli, Tommaso Zoppi, Andrea Ceccarelli
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.08468
Pdf link: https://arxiv.org/pdf/2402.08468
Abstract Most intrusion detection datasets used to research machine learning-based intrusion detection systems (IDSs) are devoted to cyber-only systems, and they typically collect data from one architectural layer. Additionally, the attacks are often generated in dedicated attack sessions, without reproducing the realistic alternation and overlap of normal and attack actions. We present a dataset for intrusion detection by performing penetration testing on an embedded cyber-physical system built over Robot Operating System 2 (ROS2). Features are monitored from three architectural layers: the Linux operating system, the network, and the ROS2 services. The dataset is structured as a time series and describes the expected behavior of the system and its response to ROS2-specific attacks: it repeatedly alternates periods of attack-free operation with periods when a specific attack is being performed. Notably, this allows measuring the time to detect an attacker and the number of malicious activities performed before detection. Also, it allows training an intrusion detector to minimize both, by taking advantage of the numerous alternating periods of normal and attack operations.
Intelligent Diagnosis of Alzheimer's Disease Based on Machine Learning
Authors: Mingyang Li, Hongyu Liu, Yixuan Li, Zejun Wang, Yuan Yuan, Honglin Dai
Subjects: Machine Learning (cs.LG); Applications (stat.AP)
Arxiv link: https://arxiv.org/abs/2402.08539
Pdf link: https://arxiv.org/pdf/2402.08539
Abstract This study is based on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset and aims to explore early detection and disease progression in Alzheimer's disease (AD). We employ innovative data preprocessing strategies, including the use of the random forest algorithm to fill missing data and the handling of outliers and invalid data, thereby fully mining and utilizing these limited data resources. Through Spearman correlation coefficient analysis, we identify some features strongly correlated with AD diagnosis. We build and test three machine learning models using these features: random forest, XGBoost, and support vector machine (SVM). Among them, the XGBoost model performs the best in terms of diagnostic performance, achieving an accuracy of 91%. Overall, this study successfully overcomes the challenge of missing data and provides valuable insights into early detection of Alzheimer's disease, demonstrating its unique research value and practical significance.
Graph Feature Preprocessor: Real-time Extraction of Subgraph-based Features from Transaction Graphs
Authors: Jovan Blanuša, Maximo Cravero Baraja, Andreea Anghel, Luc von Niederhäusern, Erik Altman, Haris Pozidis, Kubilay Atasu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2402.08593
Pdf link: https://arxiv.org/pdf/2402.08593
Abstract In this paper, we present "Graph Feature Preprocessor", a software library for detecting typical money laundering and fraud patterns in financial transaction graphs in real time. These patterns are used to produce a rich set of transaction features for downstream machine learning training and inference tasks such as money laundering detection. We show that our enriched transaction features dramatically improve the prediction accuracy of gradient-boosting-based machine learning models. Our library exploits multicore parallelism, maintains a dynamic in-memory graph, and efficiently mines subgraph patterns in the incoming transaction stream, which enables it to be operated in a streaming manner. We evaluate our library using highly-imbalanced synthetic anti-money laundering (AML) and real-life Ethereum phishing datasets. In these datasets, the proportion of illicit transactions is very small, which makes the learning process challenging. Our solution, which combines our Graph Feature Preprocessor and gradient-boosting-based machine learning models, is able to detect these illicit transactions with higher minority-class F1 scores than standard graph neural networks. In addition, the end-to-end throughput rate of our solution executed on a multicore CPU outperforms the graph neural network baselines executed on a powerful V100 GPU. Overall, the combination of high accuracy, a high throughput rate, and low latency of our solution demonstrates the practical value of our library in real-world applications. Graph Feature Preprocessor has been integrated into IBM mainframe software products, namely "IBM Cloud Pak for Data on Z" and "AI Toolkit for IBM Z and LinuxONE".
A Cost-Sensitive Transformer Model for Prognostics Under Highly Imbalanced Industrial Data
Authors: Ali Beikmohammadi, Mohammad Hosein Hamian, Neda Khoeyniha, Tony Lindgren, Olof Steinert, Sindri Magnússon
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2402.08611
Pdf link: https://arxiv.org/pdf/2402.08611
Abstract The rapid influx of data-driven models into the industrial sector has been facilitated by the proliferation of sensor technology, enabling the collection of vast quantities of data. However, leveraging these models for failure detection and prognosis poses significant challenges, including issues like missing values and class imbalances. Moreover, the cost sensitivity associated with industrial operations further complicates the application of conventional models in this context. This paper introduces a novel cost-sensitive transformer model developed as part of a systematic workflow, which also integrates a hybrid resampler and a regression-based imputer. After subjecting our approach to rigorous testing using the APS failure dataset from Scania trucks and the SECOM dataset, we observed a substantial enhancement in performance compared to state-of-the-art methods. Moreover, we conduct an ablation study to analyze the contributions of different components in our proposed method. Our findings highlight the potential of our method in addressing the unique challenges of failure prediction in industrial settings, thereby contributing to enhanced reliability and efficiency in industrial operations.
PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
Authors: Michael Dorkenwald, Nimrod Barazani, Cees G. M. Snoek, Yuki M. Asano
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.08657
Pdf link: https://arxiv.org/pdf/2402.08657
Abstract Vision-Language Models (VLMs), such as Flamingo and GPT-4V, have shown immense potential by integrating large language models with vision systems. Nevertheless, these models face challenges in the fundamental computer vision task of object localisation, due to their training on multimodal data containing mostly captions without explicit spatial grounding. While it is possible to construct custom, supervised training pipelines with bounding box annotations that integrate with VLMs, these result in specialized and hard-to-scale models. In this paper, we aim to explore the limits of caption-based VLMs and instead propose to tackle the challenge in a simpler manner by i) keeping the weights of a caption-based VLM frozen and ii) not using any supervised detection data. To this end, we introduce an input-agnostic Positional Insert (PIN), a learnable spatial prompt, containing a minimal set of parameters that are slid inside the frozen VLM, unlocking object localisation capabilities. Our PIN module is trained with a simple next-token prediction task on synthetic data without requiring the introduction of new output heads. Our experiments demonstrate strong zero-shot localisation performances on a variety of images, including Pascal VOC, COCO, LVIS, and diverse images like paintings or cartoons.
Keyword: face recognition

There is no result

Keyword: augmentation

Advancing Data-driven Weather Forecasting: Time-Sliding Data Augmentation of ERA5
Authors: Minjong Cheon, Daehyun Kang, Yo-Hwan Choi, Seon-Yu Kang
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Atmospheric and Oceanic Physics (physics.ao-ph)
Arxiv link: https://arxiv.org/abs/2402.08185
Pdf link: https://arxiv.org/pdf/2402.08185
Abstract Modern deep learning techniques, which mimic traditional numerical weather prediction (NWP) models and are derived from global atmospheric reanalysis data, have caused a significant revolution within a few years. In this new paradigm, our research introduces a novel strategy that deviates from the common dependence on high-resolution data, which is often constrained by computational resources, and instead utilizes low-resolution data (2.5 degrees) for global weather prediction and climate data analysis. Our main focus is evaluating data-driven weather prediction (DDWP) frameworks, specifically addressing sample size adequacy, structural improvements to the model, and the ability of climate data to represent current climatic trends. By using the Adaptive Fourier Neural Operator (AFNO) model via FourCastNet and a proposed time-sliding method to inflate the dataset of the ECMWF Reanalysis v5 (ERA5), this paper improves on conventional approaches by adding more variables and a novel approach to data augmentation and processing. Our findings reveal that despite the lower resolution, the proposed approach demonstrates considerable accuracy in predicting atmospheric conditions, effectively rivaling higher-resolution models. Furthermore, the study confirms the model's proficiency in reflecting current climate trends and its potential in predicting future climatic events, underscoring its utility in climate change strategies. This research marks a pivotal step in the realm of meteorological forecasting, showcasing the feasibility of lower-resolution data in producing reliable predictions and opening avenues for more accessible and inclusive climate modeling. The insights gleaned from this study not only contribute to the advancement of climate science but also lay the groundwork for future innovations in the field.
MetaTra: Meta-Learning for Generalized Trajectory Prediction in Unseen Domain
Authors: Xiaohe Li, Feilong Huang, Zide Fan, Fangli Mou, Yingyan Hou, Chen Qian, Lijie Wen
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.08221
Pdf link: https://arxiv.org/pdf/2402.08221
Abstract Trajectory prediction has garnered widespread attention in different fields, such as autonomous driving and robotic navigation. However, due to the significant variations in trajectory patterns across different scenarios, models trained in known environments often falter in unseen ones. To learn a generalized model that can directly handle unseen domains without requiring any model updating, we propose a novel meta-learning-based trajectory prediction method called MetaTra. This approach incorporates a Dual Trajectory Transformer (Dual-TT), which enables a thorough exploration of the individual intention and the interactions within group motion patterns in diverse scenarios. Building on this, we propose a meta-learning framework to simulate the generalization process between source and target domains. Furthermore, to enhance the stability of our prediction outcomes, we propose a Serial and Parallel Training (SPT) strategy along with a feature augmentation method named MetaMix. Experimental results on several real-world datasets confirm that MetaTra not only surpasses other state-of-the-art methods but also exhibits plug-and-play capabilities, particularly in the realm of domain generalization.
Improving Black-box Robustness with In-Context Rewriting
Authors: Kyle O'Brien, Nathan Ng, Isha Puri, Jorge Mendez, Hamid Palangi, Yoon Kim, Marzyeh Ghassemi, Thomas Hartvigsen
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.08225
Pdf link: https://arxiv.org/pdf/2402.08225
Abstract Machine learning models often excel on in-distribution (ID) data but struggle with unseen out-of-distribution (OOD) inputs. Most techniques for improving OOD robustness are not applicable to settings where the model is effectively a black box, such as when the weights are frozen, retraining is costly, or the model is leveraged via an API. Test-time augmentation (TTA) is a simple post-hoc technique for improving robustness that sidesteps black-box constraints by aggregating predictions across multiple augmentations of the test input. TTA has seen limited use in NLP due to the challenge of generating effective natural language augmentations. In this work, we propose LLM-TTA, which uses LLM-generated augmentations as TTA's augmentation function. LLM-TTA outperforms conventional augmentation functions across sentiment, toxicity, and news classification tasks for BERT and T5 models, with BERT's OOD robustness improving by an average of 4.30 percentage points without regressing average ID performance. We explore selectively augmenting inputs based on prediction entropy to reduce the rate of expensive LLM augmentations, allowing us to maintain performance gains while reducing the average number of generated augmentations by 57.76%. LLM-TTA is agnostic to the task model architecture, does not require OOD labels, and is effective across low and high-resource settings. We share our data, models, and code for reproducibility.
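The two mechanisms described in the abstract above, averaging predictions over augmented copies of the input and skipping augmentation when the model is already confident, can be sketched as follows. This is not the authors' code: `tta_predict`, the entropy threshold, and the toy classifier and augmenter are hypothetical stand-ins, and a real setup would call a task model and an LLM rewriter in their place.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def tta_predict(text, classify, augment, entropy_threshold):
    """Test-time augmentation with entropy-based gating.

    classify(text) -> list of class probabilities (black-box task model).
    augment(text)  -> list of rewritten versions of the input.
    Augmentations are generated only when the prediction on the original
    input is uncertain, i.e. its entropy exceeds the threshold.
    """
    base = list(classify(text))
    if entropy(base) <= entropy_threshold:
        return base  # confident enough: skip the costly augmentation step
    preds = [base] + [list(classify(variant)) for variant in augment(text)]
    # Aggregate by averaging class probabilities across all versions.
    return [sum(p[i] for p in preds) / len(preds) for i in range(len(base))]


# Toy stand-ins (assumptions for illustration only).
def toy_classify(text):
    positive = 0.9 if "good" in text else 0.5
    return [positive, 1.0 - positive]


def toy_augment(text):
    return [text.replace("gr8", "good")]


uncertain = tta_predict("this is gr8", toy_classify, toy_augment, 0.5)
confident = tta_predict("good film", toy_classify, toy_augment, 0.5)
print(uncertain, confident)
```

Averaging probabilities is only one possible aggregation rule; majority voting over predicted labels is a common alternative.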
Distal Interference: Exploring the Limits of Model-Based Continual Learning
Authors: Heinrich van Deventer, Anna Sergeevna Bosman
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2402.08255
Pdf link: https://arxiv.org/pdf/2402.08255
Abstract Continual learning is the sequential learning of different tasks by a machine learning model. Continual learning is known to be hindered by catastrophic interference or forgetting, i.e. rapid unlearning of earlier learned tasks when new tasks are learned. Despite their practical success, artificial neural networks (ANNs) are prone to catastrophic interference. This study analyses how gradient descent and overlapping representations between distant input points lead to distal interference and catastrophic interference. Distal interference refers to the phenomenon where training a model on a subset of the domain leads to non-local changes on other subsets of the domain. This study shows that uniformly trainable models without distal interference must be exponentially large. A novel antisymmetric bounded exponential layer B-spline ANN architecture named ABEL-Spline is proposed that can approximate any continuous function, is uniformly trainable, has polynomial computational complexity, and provides some guarantees for distal interference. Experiments are presented to demonstrate the theoretical properties of ABEL-Splines. ABEL-Splines are also evaluated on benchmark regression problems. It is concluded that the weaker distal interference guarantees in ABEL-Splines are insufficient for model-only continual learning. It is conjectured that continual learning with polynomial complexity models requires augmentation of the training data or algorithm.
LOSS-GAT: Label Propagation and One-Class Semi-Supervised Graph Attention Network for Fake News Detection
Authors: Batool Lakzaei, Mostafa Haghir Chehreghani, Alireza Bagheri
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2402.08401
Pdf link: https://arxiv.org/pdf/2402.08401
Abstract In the era of widespread social networks, the rapid dissemination of fake news has emerged as a significant threat, inflicting detrimental consequences across various dimensions of people's lives. Machine learning and deep learning approaches have been extensively employed for identifying fake news. However, a significant challenge in identifying fake news is the limited availability of labeled news datasets. Therefore, the One-Class Learning (OCL) approach, utilizing only a small set of labeled data from the interest class, can be a suitable approach to address this challenge. On the other hand, representing data as a graph enables access to diverse content and structural information, and label propagation methods on graphs can be effective in predicting node labels. In this paper, we adopt a graph-based model for data representation and introduce a semi-supervised and one-class approach for fake news detection, called LOSS-GAT. Initially, we employ a two-step label propagation algorithm, utilizing Graph Neural Networks (GNNs) as an initial classifier to categorize news into two groups: interest (fake) and non-interest (real). Subsequently, we enhance the graph structure using structural augmentation techniques. Ultimately, we predict the final labels for all unlabeled data using a GNN that induces randomness within the local neighborhood of nodes through the aggregation function. We evaluate our proposed method on five common datasets and compare the results against a set of baseline models, including both OCL and binary labeled models. The results demonstrate that LOSS-GAT achieves a notable improvement, surpassing 10%, with the advantage of utilizing only a limited set of labeled fake news. Notably, LOSS-GAT even outperforms binary labeled models.
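As a rough illustration of one-class label propagation on a graph, the sketch below spreads scores outward from a few labeled "interest" nodes. It is a classical damped neighbour-averaging scheme, not the GNN-based two-step procedure of LOSS-GAT; the function name, the damping factor, and the toy path graph are all assumptions made for the example.

```python
from typing import Dict, List, Set

Graph = Dict[int, List[int]]  # adjacency lists of an undirected graph


def propagate_scores(
    graph: Graph, seeds: Set[int], damping: float = 0.5, iterations: int = 50
) -> Dict[int, float]:
    """Spread interest-class scores from labeled seed nodes to the rest."""
    scores = {node: (1.0 if node in seeds else 0.0) for node in graph}
    for _ in range(iterations):
        updated = {}
        for node, neighbours in graph.items():
            if node in seeds:
                # Labeled nodes stay clamped to the interest class.
                updated[node] = 1.0
            elif neighbours:
                # Unlabeled nodes take a damped mean of their neighbours,
                # so the score decays with distance from the seeds.
                mean = sum(scores[n] for n in neighbours) / len(neighbours)
                updated[node] = damping * mean
            else:
                updated[node] = 0.0
        scores = updated
    return scores


# Toy path graph 0 - 1 - 2 - 3 with node 0 labeled as the interest class.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
scores = propagate_scores(path, seeds={0})
```

Thresholding the resulting scores would give a crude interest versus non-interest split. The damping is needed because, with only one labeled class, undamped averaging would drive every connected node toward the same score.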
Amplifying Exploration in Monte-Carlo Tree Search by Focusing on the Unknown
Authors: Cedric Derstroff, Jannis Brugger, Jannis Blüml, Mira Mezini, Stefan Kramer, Kristian Kersting
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2402.08511
Pdf link: https://arxiv.org/pdf/2402.08511
Abstract Monte-Carlo tree search (MCTS) is an effective anytime algorithm with a vast number of applications. It strategically allocates computational resources to focus on promising segments of the search tree, making it a very attractive search algorithm in large search spaces. However, it often expends its limited resources on reevaluating previously explored regions when they remain the most promising path. Our proposed methodology, denoted as AmEx-MCTS, solves this problem by introducing a novel MCTS formulation. Central to AmEx-MCTS is the decoupling of value updates, visit count updates, and the selected path during the tree search, thereby enabling the exclusion of already explored subtrees or leaves. This segregation preserves the utility of visit counts for both exploration-exploitation balancing and quality metrics within MCTS. The resultant augmentation facilitates a considerably broader search using identical computational resources, preserving the essential characteristics of MCTS. The expanded coverage not only yields more precise estimations but also proves instrumental in larger and more complex problems. Our empirical evaluation demonstrates the superior performance of AmEx-MCTS, surpassing classical MCTS and related approaches by a substantial margin.
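The idea of steering the search away from already explored subtrees can be sketched with standard UCB1 child selection plus a filter. This is not the AmEx-MCTS formulation, which decouples value updates, visit counts, and the selected path; the `exhausted` flag and the `Node` layout below are hypothetical and only illustrate the exclusion step.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    visits: int = 0
    value_sum: float = 0.0
    exhausted: bool = False  # hypothetical flag: subtree fully explored
    children: List["Node"] = field(default_factory=list)


def ucb1(parent_visits: int, child: Node, c: float = math.sqrt(2)) -> float:
    """Standard UCB1 score: mean value plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # always try unvisited children first
    mean = child.value_sum / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def select_child(parent: Node) -> Optional[Node]:
    """Pick the best-scoring child, skipping fully explored subtrees."""
    candidates = [ch for ch in parent.children if not ch.exhausted]
    if not candidates:
        return None  # nothing left to explore below this node
    return max(candidates, key=lambda ch: ucb1(parent.visits, ch))


# The first child scores higher under UCB1 but is marked as explored,
# so selection falls through to the second child.
explored = Node(visits=2, value_sum=2.0, exhausted=True)
open_child = Node(visits=8, value_sum=2.0)
root = Node(visits=10, children=[explored, open_child])
chosen = select_child(root)
```

In plain UCB1 the first child would be selected again and again while it remains the most promising, which is the wasted re-evaluation the abstract describes.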
Improving Generalization in Semantic Parsing by Increasing Natural Language Variation
Authors: Irina Saparina, Mirella Lapata
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2402.08666
Pdf link: https://arxiv.org/pdf/2402.08666
Abstract Text-to-SQL semantic parsing has made significant progress in recent years, with various models demonstrating impressive performance on the challenging Spider benchmark. However, it has also been shown that these models often struggle to generalize even when faced with small perturbations of previously (accurately) parsed expressions. This is mainly due to the linguistic form of questions in Spider, which are overly specific, unnatural, and display limited variation. In this work, we use data augmentation to enhance the robustness of text-to-SQL parsers against natural language variations. Existing approaches either generate question reformulations via models trained on Spider or introduce only local changes. In contrast, we leverage the capabilities of large language models to generate more realistic and diverse questions. Using only a few prompts, we achieve a two-fold increase in the number of questions in Spider. Training on this augmented dataset yields substantial improvements on a range of evaluation sets, including robustness benchmarks and out-of-domain data.
