Abstract
The advances in the Neural Radiance Fields (NeRF) research offer extensive applications in diverse domains, but protecting their copyrights has not yet been researched in depth. Recently, NeRF watermarking has been considered one of the pivotal solutions for safely deploying NeRF-based 3D representations. However, existing methods are designed to apply only to implicit or explicit NeRF representations. In this work, we introduce an innovative watermarking method that can be employed in both representations of NeRF. This is achieved by fine-tuning NeRF to embed binary messages in the rendering process. In detail, we propose utilizing the discrete wavelet transform in the NeRF space for watermarking. Furthermore, we adopt a deferred back-propagation technique and introduce a combination with the patch-wise loss to improve rendering quality and bit accuracy with minimum trade-offs. We evaluate our method in three different aspects: capacity, invisibility, and robustness of the embedded watermarks in the 2D-rendered images. Our method achieves state-of-the-art performance with faster training speed over the compared state-of-the-art methods.
Keyword: multiorgan
There is no result
Keyword: multi-organ
There is no result
Keyword: multi organ
There is no result
Keyword: SAM
An Experimental Study on the Rashomon Effect of Balancing Methods in Imbalanced Classification
Abstract
Predictive models may generate biased predictions when classifying imbalanced datasets. This happens when the model favors the majority class, leading to low performance in accurately predicting the minority class. To address this issue, balancing or resampling methods are critical pre-processing steps in the modeling process. However, there have been debates and questioning of the functionality of these methods in recent years. In particular, many candidate models may exhibit very similar predictive performance, which is called the Rashomon effect, in model selection. Selecting one of them without considering predictive multiplicity which is the case of yielding conflicting models' predictions for any sample may lead to a loss of using another model. In this study, in addition to the existing debates, the impact of balancing methods on predictive multiplicity is examined through the Rashomon effect. It is important because the blind model selection is risky from a set of approximately equally accurate models. This may lead to serious problems in model selection, validation, and explanation. To tackle this matter, we conducted real dataset experiments to observe the impact of balancing methods on predictive multiplicity through the Rashomon effect. Our findings showed that balancing methods inflate the predictive multiplicity, and they yield varying results. To monitor the trade-off between performance and predictive multiplicity for conducting the modeling process responsibly, we proposed using the extended performance-gain plot for the Rashomon effect.
Mitigating LLM Hallucinations via Conformal Abstention
Authors: Yasin Abbasi Yadkori, Ilja Kuzborskij, David Stutz, András György, Adam Fisch, Arnaud Doucet, Iuliya Beloshapka, Wei-Hung Weng, Yao-Yuan Yang, Csaba Szepesvári, Ali Taylan Cemgil, Nenad Tomasev
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract
We develop a principled procedure for determining when a large language model (LLM) should abstain from responding (e.g., by saying "I don't know") in a general domain, instead of resorting to possibly "hallucinating" a non-sensical or incorrect answer. Building on earlier approaches that use self-consistency as a more reliable measure of model confidence, we propose using the LLM itself to self-evaluate the similarity between each of its sampled responses for a given query. We then further leverage conformal prediction techniques to develop an abstention procedure that benefits from rigorous theoretical guarantees on the hallucination rate (error rate). Experimentally, our resulting conformal abstention method reliably bounds the hallucination rate on various closed-book, open-domain generative question answering datasets, while also maintaining a significantly less conservative abstention rate on a dataset with long responses (Temporal Sequences) compared to baselines using log-probability scores to quantify uncertainty, while achieveing comparable performance on a dataset with short answers (TriviaQA). To evaluate the experiments automatically, one needs to determine if two responses are equivalent given a question. Following standard practice, we use a thresholded similarity function to determine if two responses match, but also provide a method for calibrating the threshold based on conformal prediction, with theoretical guarantees on the accuracy of the match prediction, which might be of independent interest.
Transfer Learning and Transformer Architecture for Financial Sentiment Analysis
Abstract
Financial sentiment analysis allows financial institutions like Banks and Insurance Companies to better manage the credit scoring of their customers in a better way. Financial domain uses specialized mechanisms which makes sentiment analysis difficult. In this paper, we propose a pre-trained language model which can help to solve this problem with fewer labelled data. We extend on the principles of Transfer learning and Transformation architecture principles and also take into consideration recent outbreak of pandemics like COVID. We apply the sentiment analysis to two different sets of data. We also take smaller training set and fine tune the same as part of the model.
Simplifying Multimodality: Unimodal Approach to Multimodal Challenges in Radiology with General-Domain Large Language Model
Abstract
Recent advancements in Large Multimodal Models (LMMs) have attracted interest in their generalization capability with only a few samples in the prompt. This progress is particularly relevant to the medical domain, where the quality and sensitivity of data pose unique challenges for model training and application. However, the dependency on high-quality data for effective in-context learning raises questions about the feasibility of these models when encountering with the inevitable variations and errors inherent in real-world medical data. In this paper, we introduce MID-M, a novel framework that leverages the in-context learning capabilities of a general-domain Large Language Model (LLM) to process multimodal data via image descriptions. MID-M achieves a comparable or superior performance to task-specific fine-tuned LMMs and other general-domain ones, without the extensive domain-specific training or pre-training on multimodal data, with significantly fewer parameters. This highlights the potential of leveraging general-domain LLMs for domain-specific tasks and offers a sustainable and cost-effective alternative to traditional LMM developments. Moreover, the robustness of MID-M against data quality issues demonstrates its practical utility in real-world medical domain applications.
Improving Disease Detection from Social Media Text via Self-Augmentation and Contrastive Learning
Authors: Pervaiz Iqbal Khan, Andreas Dengel, Sheraz Ahmed
Abstract
Detecting diseases from social media has diverse applications, such as public health monitoring and disease spread detection. While language models (LMs) have shown promising performance in this domain, there remains ongoing research aimed at refining their discriminating representations. In this paper, we propose a novel method that integrates Contrastive Learning (CL) with language modeling to address this challenge. Our approach introduces a self-augmentation method, wherein hidden representations of the model are augmented with their own representations. This method comprises two branches: the first branch, a traditional LM, learns features specific to the given data, while the second branch incorporates augmented representations from the first branch to encourage generalization. CL further refines these representations by pulling pairs of original and augmented versions closer while pushing other samples away. We evaluate our method on three NLP datasets encompassing binary, multi-label, and multi-class classification tasks involving social media posts related to various diseases. Our approach demonstrates notable improvements over traditional fine-tuning methods, achieving up to a 2.48% increase in F1-score compared to baseline approaches and a 2.1% enhancement over state-of-the-art methods.
Efficient Sample-Specific Encoder Perturbations
Authors: Yassir Fathullah, Mark J. F. Gales
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
Encoder-decoder foundation models have displayed state-of-the-art performance on a range of autoregressive sequence tasks. This paper proposes a simple and lightweight modification to such systems to control the behaviour according to a specific attribute of interest. This paper proposes a novel inference-efficient approach to modifying the behaviour of an encoder-decoder system according to a specific attribute of interest. Specifically, we show that a small proxy network can be used to find a sample-by-sample perturbation of the encoder output of a frozen foundation model to trigger the decoder to generate improved decodings. This work explores a specific realization of this framework focused on improving the COMET performance of Flan-T5 on Machine Translation and the WER of Whisper foundation models on Speech Recognition. Results display consistent improvements in performance evaluated through COMET and WER respectively. Furthermore, experiments also show that the proxies are robust to the exact nature of the data used to train them and can extend to other domains.
Unifying and extending Precision Recall metrics for assessing generative models
Abstract
With the recent success of generative models in image and text, the evaluation of generative models has gained a lot of attention. Whereas most generative models are compared in terms of scalar values such as Frechet Inception Distance (FID) or Inception Score (IS), in the last years (Sajjadi et al., 2018) proposed a definition of precision-recall curve to characterize the closeness of two distributions. Since then, various approaches to precision and recall have seen the light (Kynkaanniemi et al., 2019; Naeem et al., 2020; Park & Kim, 2023). They center their attention on the extreme values of precision and recall, but apart from this fact, their ties are elusive. In this paper, we unify most of these approaches under the same umbrella, relying on the work of (Simon et al., 2019). Doing so, we were able not only to recover entire curves, but also to expose the sources of the accounted pitfalls of the concerned metrics. We also provide consistency results that go well beyond the ones presented in the corresponding literature. Last, we study the different behaviors of the curves obtained experimentally.
Out-of-distribution detection based on subspace projection of high-dimensional features output by the last convolutional layer
Authors: Qiuyu Zhu, Yiwei He
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Out-of-distribution (OOD) detection, crucial for reliable pattern classification, discerns whether a sample originates outside the training distribution. This paper concentrates on the high-dimensional features output by the final convolutional layer, which contain rich image features. Our key idea is to project these high-dimensional features into two specific feature subspaces, leveraging the dimensionality reduction capacity of the network's linear layers, trained with Predefined Evenly-Distribution Class Centroids (PEDCC)-Loss. This involves calculating the cosines of three projection angles and the norm values of features, thereby identifying distinctive information for in-distribution (ID) and OOD data, which assists in OOD detection. Building upon this, we have modified the batch normalization (BN) and ReLU layer preceding the fully connected layer, diminishing their impact on the output feature distributions and thereby widening the distribution gap between ID and OOD data features. Our method requires only the training of the classification network model, eschewing any need for input pre-processing or specific OOD data pre-tuning. Extensive experiments on several benchmark datasets demonstrates that our approach delivers state-of-the-art performance. Our code is available at https://github.com/Hewell0/ProjOOD.
Generative AI in Cybersecurity
Authors: Shivani Metta, Isaac Chang, Jack Parker, Michael P. Roman, Arturo F. Ehuan
Abstract
The dawn of Generative Artificial Intelligence (GAI), characterized by advanced models such as Generative Pre-trained Transformers (GPT) and other Large Language Models (LLMs), has been pivotal in reshaping the field of data analysis, pattern recognition, and decision-making processes. This surge in GAI technology has ushered in not only innovative opportunities for data processing and automation but has also introduced significant cybersecurity challenges. As GAI rapidly progresses, it outstrips the current pace of cybersecurity protocols and regulatory frameworks, leading to a paradox wherein the same innovations meant to safeguard digital infrastructures also enhance the arsenal available to cyber criminals. These adversaries, adept at swiftly integrating and exploiting emerging technologies, may utilize GAI to develop malware that is both more covert and adaptable, thus complicating traditional cybersecurity efforts. The acceleration of GAI presents an ambiguous frontier for cybersecurity experts, offering potent tools for threat detection and response, while concurrently providing cyber attackers with the means to engineer more intricate and potent malware. Through the joint efforts of Duke Pratt School of Engineering, Coalfire, and Safebreach, this research undertakes a meticulous analysis of how malicious agents are exploiting GAI to augment their attack strategies, emphasizing a critical issue for the integrity of future cybersecurity initiatives. The study highlights the critical need for organizations to proactively identify and develop more complex defensive strategies to counter the sophisticated employment of GAI in malware creation.
Clones, closed categories, and combinatory logic
Authors: Philip Saville
Subjects: Logic in Computer Science (cs.LO); Category Theory (math.CT)
Abstract
We give an exposition of the semantics of the simply-typed lambda-calculus, and its linear and ordered variants, using multi-ary structures. We define universal properties for multicategories, and use these to derive familiar rules for products, tensors, and exponentials. Finally we explain how to recover both the category-theoretic syntactic model and its semantic interpretation from the multi-ary framework. We then use these ideas to study the semantic interpretation of combinatory logic and the simply-typed lambda-calculus without products. We introduce extensional SK-clones and show these are sound and complete for both combinatory logic with extensional weak equality and the simply-typed lambda-calculus without products. We then show such SK-clones are equivalent to a variant of closed categories called SK-categories, so the simply-typed lambda-calculus without products is the internal language of SK-categories. As a corollary, we deduce that SK-categories have the same relationship to cartesian monoidal categories that closed categories have to monoidal categories.
Adapting Self-Supervised Learning for Computational Pathology
Authors: Eric Zimmermann, Neil Tenenholtz, James Hall, George Shaikovski, Michal Zelechowski, Adam Casson, Fausto Milletari, Julian Viret, Eugene Vorontsov, Siqi Liu, Kristen Severson
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Self-supervised learning (SSL) has emerged as a key technique for training networks that can generalize well to diverse tasks without task-specific supervision. This property makes SSL desirable for computational pathology, the study of digitized images of tissues, as there are many target applications and often limited labeled training samples. However, SSL algorithms and models have been primarily developed in the field of natural images and whether their performance can be improved by adaptation to particular domains remains an open question. In this work, we present an investigation of modifications to SSL for pathology data, specifically focusing on the DINOv2 algorithm. We propose alternative augmentations, regularization functions, and position encodings motivated by the characteristics of pathology images. We evaluate the impact of these changes on several benchmarks to demonstrate the value of tailored approaches.
Active Learning Enabled Low-cost Cell Image Segmentation Using Bounding Box Annotation
Authors: Yu Zhu, Qiang Yang, Li Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Cell image segmentation is usually implemented using fully supervised deep learning methods, which heavily rely on extensive annotated training data. Yet, due to the complexity of cell morphology and the requirement for specialized knowledge, pixel-level annotation of cell images has become a highly labor-intensive task. To address the above problems, we propose an active learning framework for cell segmentation using bounding box annotations, which greatly reduces the data annotation cost of cell segmentation algorithms. First, we generate a box-supervised learning method (denoted as YOLO-SAM) by combining the YOLOv8 detector with the Segment Anything Model (SAM), which effectively reduces the complexity of data annotation. Furthermore, it is integrated into an active learning framework that employs the MC DropBlock method to train the segmentation model with fewer box-annotated samples. Extensive experiments demonstrate that our model saves more than ninety percent of data annotation time compared to mask-supervised deep learning methods.
Optimization without retraction on the random generalized Stiefel manifold
Authors: Simon Vary, Pierre Ablin, Bin Gao, P.-A. Absil
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Abstract
Optimization over the set of matrices that satisfy $X^\top B X = I_p$, referred to as the generalized Stiefel manifold, appears in many applications involving sampled covariance matrices such as canonical correlation analysis (CCA), independent component analysis (ICA), and the generalized eigenvalue problem (GEVP). Solving these problems is typically done by iterative methods, such as Riemannian approaches, which require a computationally expensive eigenvalue decomposition involving fully formed $B$. We propose a cheap stochastic iterative method that solves the optimization problem while having access only to a random estimate of the feasible set. Our method does not enforce the constraint in every iteration exactly, but instead it produces iterations that converge to a critical point on the generalized Stiefel manifold defined in expectation. The method has lower per-iteration cost, requires only matrix multiplications, and has the same convergence rates as its Riemannian counterparts involving the full matrix $B$. Experiments demonstrate its effectiveness in various machine learning applications involving generalized orthogonality constraints, including CCA, ICA, and GEVP.
Large Language Models are Inconsistent and Biased Evaluators
Authors: Rickard Stureborg, Dimitris Alikaniotis, Yoshi Suhara
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
The zero-shot capability of Large Language Models (LLMs) has enabled highly flexible, reference-free metrics for various tasks, making LLM evaluators common tools in NLP. However, the robustness of these LLM evaluators remains relatively understudied; existing work mainly pursued optimal performance in terms of correlating LLM scores with human expert scores. In this paper, we conduct a series of analyses using the SummEval dataset and confirm that LLMs are biased evaluators as they: (1) exhibit familiarity bias-a preference for text with lower perplexity, (2) show skewed and biased distributions of ratings, and (3) experience anchoring effects for multi-attribute judgments. We also found that LLMs are inconsistent evaluators, showing low "inter-sample" agreement and sensitivity to prompt differences that are insignificant to human understanding of text quality. Furthermore, we share recipes for configuring LLM evaluators to mitigate these limitations. Experimental results on the RoSE dataset demonstrate improvements over the state-of-the-art LLM evaluators.
Explainability Guided Adversarial Evasion Attacks on Malware Detectors
Abstract
As the focus on security of Artificial Intelligence (AI) is becoming paramount, research on crafting and inserting optimal adversarial perturbations has become increasingly critical. In the malware domain, this adversarial sample generation relies heavily on the accuracy and placement of crafted perturbation with the goal of evading a trained classifier. This work focuses on applying explainability techniques to enhance the adversarial evasion attack on a machine-learning-based Windows PE malware detector. The explainable tool identifies the regions of PE malware files that have the most significant impact on the decision-making process of a given malware detector, and therefore, the same regions can be leveraged to inject the adversarial perturbation for maximum efficiency. Profiling all the PE malware file regions based on their impact on the malware detector's decision enables the derivation of an efficient strategy for identifying the optimal location for perturbation injection. The strategy should incorporate the region's significance in influencing the malware detector's decision and the sensitivity of the PE malware file's integrity towards modifying that region. To assess the utility of explainable AI in crafting an adversarial sample of Windows PE malware, we utilize the DeepExplainer module of SHAP for determining the contribution of each region of PE malware to its detection by a CNN-based malware detector, MalConv. Furthermore, we analyzed the significance of SHAP values at a more granular level by subdividing each section of Windows PE into small subsections. We then performed an adversarial evasion attack on the subsections based on the corresponding SHAP values of the byte sequences.
Dynamic Anisotropic Smoothing for Noisy Derivative-Free Optimization
Authors: Sam Reifenstein, Timothee Leleu, Yoshihisa Yamamoto
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Abstract
We propose a novel algorithm that extends the methods of ball smoothing and Gaussian smoothing for noisy derivative-free optimization by accounting for the heterogeneous curvature of the objective function. The algorithm dynamically adapts the shape of the smoothing kernel to approximate the Hessian of the objective function around a local optimum. This approach significantly reduces the error in estimating the gradient from noisy evaluations through sampling. We demonstrate the efficacy of our method through numerical experiments on artificial problems. Additionally, we show improved performance when tuning NP-hard combinatorial optimization solvers compared to existing state-of-the-art heuristic derivative-free and Bayesian optimization methods.
SoftMCL: Soft Momentum Contrastive Learning for Fine-grained Sentiment-aware Pre-training
Abstract
The pre-training for language models captures general language understanding but fails to distinguish the affective impact of a particular context to a specific word. Recent works have sought to introduce contrastive learning (CL) for sentiment-aware pre-training in acquiring affective information. Nevertheless, these methods present two significant limitations. First, the compatibility of the GPU memory often limits the number of negative samples, hindering the opportunities to learn good representations. In addition, using only a few sentiment polarities as hard labels, e.g., positive, neutral, and negative, to supervise CL will force all representations to converge to a few points, leading to the issue of latent space collapse. This study proposes a soft momentum contrastive learning (SoftMCL) for fine-grained sentiment-aware pre-training. Instead of hard labels, we introduce valence ratings as soft-label supervision for CL to fine-grained measure the sentiment similarities between samples. The proposed SoftMCL is conducted on both the word- and sentence-level to enhance the model's ability to learn affective information. A momentum queue was introduced to expand the contrastive samples, allowing storing and involving more negatives to overcome the limitations of hardware platforms. Extensive experiments were conducted on four different sentiment-related tasks, which demonstrates the effectiveness of the proposed SoftMCL method. The code and data of the proposed SoftMCL is available at: https://www.github.com/wangjin0818/SoftMCL/.
SocialGFs: Learning Social Gradient Fields for Multi-Agent Reinforcement Learning
Abstract
Multi-agent systems (MAS) need to adaptively cope with dynamic environments, changing agent populations, and diverse tasks. However, most of the multi-agent systems cannot easily handle them, due to the complexity of the state and task space. The social impact theory regards the complex influencing factors as forces acting on an agent, emanating from the environment, other agents, and the agent's intrinsic motivation, referring to the social force. Inspired by this concept, we propose a novel gradient-based state representation for multi-agent reinforcement learning. To non-trivially model the social forces, we further introduce a data-driven method, where we employ denoising score matching to learn the social gradient fields (SocialGFs) from offline samples, e.g., the attractive or repulsive outcomes of each force. During interactions, the agents take actions based on the multi-dimensional gradients to maximize their own rewards. In practice, we integrate SocialGFs into the widely used multi-agent reinforcement learning algorithms, e.g., MAPPO. The empirical results reveal that SocialGFs offer four advantages for multi-agent systems: 1) they can be learned without requiring online interaction, 2) they demonstrate transferability across diverse tasks, 3) they facilitate credit assignment in challenging reward settings, and 4) they are scalable with the increasing number of agents.
Abstract
Artificial intelligence systems exhibit many useful capabilities, but they appear to lack understanding. This essay describes how we could go about constructing a machine capable of understanding. As John Locke (1689) pointed out words are signs for ideas, which we can paraphrase as thoughts and concepts. To understand a word is to know and be able to work with the underlying concepts for which it is an indicator. Understanding between a speaker and a listener occurs when the speaker casts his or her concepts into words and the listener recovers approximately those same concepts. Current models rely on the listener to construct any potential meaning. The diminution of behaviorism as a psychological paradigm and the rise of cognitivism provide examples of many experimental methods that can be used to determine whether and to what extent a machine might understand and to make suggestions about how that understanding might be instantiated.
Closing the Gap: Achieving Global Convergence (Last Iterate) of Actor-Critic under Markovian Sampling with Neural Network Parametrization
Authors: Mudit Gaur, Amrit Singh Bedi, Di Wang, Vaneet Aggarwal
Abstract
The current state-of-the-art theoretical analysis of Actor-Critic (AC) algorithms significantly lags in addressing the practical aspects of AC implementations. This crucial gap needs bridging to bring the analysis in line with practical implementations of AC. To address this, we advocate for considering the MMCLG criteria: \textbf{M}ulti-layer neural network parametrization for actor/critic, \textbf{M}arkovian sampling, \textbf{C}ontinuous state-action spaces, the performance of the \textbf{L}ast iterate, and \textbf{G}lobal optimality. These aspects are practically significant and have been largely overlooked in existing theoretical analyses of AC algorithms. In this work, we address these gaps by providing the first comprehensive theoretical analysis of AC algorithms that encompasses all five crucial practical aspects (covers MMCLG criteria). We establish global convergence sample complexity bounds of $\tilde{\mathcal{O}}\left({\epsilon^{-3}}\right)$. We achieve this result through our novel use of the weak gradient domination property of MDP's and our unique analysis of the error in critic estimation.
A Model-based Multi-Agent Personalized Short-Video Recommender System
Abstract
Recommender selects and presents top-K items to the user at each online request, and a recommendation session consists of several sequential requests. Formulating a recommendation session as a Markov decision process and solving it by reinforcement learning (RL) framework has attracted increasing attention from both academic and industry communities. In this paper, we propose a RL-based industrial short-video recommender ranking framework, which models and maximizes user watch-time in an environment of user multi-aspect preferences by a collaborative multi-agent formulization. Moreover, our proposed framework adopts a model-based learning approach to alleviate the sample selection bias which is a crucial but intractable problem in industrial recommender system. Extensive offline evaluations and live experiments confirm the effectiveness of our proposed method over alternatives. Our proposed approach has been deployed in our real large-scale short-video sharing platform, successfully serving over hundreds of millions users.
Defect Image Sample Generation With Diffusion Prior for Steel Surface Defect Recognition
Authors: Yichun Tai, Kun Yang, Tao Peng, Zhenzhen Huang, Zhijiang Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The task of steel surface defect recognition is an industrial problem with great industry values. The data insufficiency is the major challenge in training a robust defect recognition network. Existing methods have investigated to enlarge the dataset by generating samples with generative models. However, their generation quality is still limited by the insufficiency of defect image samples. To this end, we propose Stable Surface Defect Generation (StableSDG), which transfers the vast generation distribution embedded in Stable Diffusion model for steel surface defect image generation. To tackle with the distinctive distribution gap between steel surface images and generated images of the diffusion model, we propose two processes. First, we align the distribution by adapting parameters of the diffusion model, adopted both in the token embedding space and network parameter space. Besides, in the generation process, we propose image-oriented generation rather than from pure Gaussian noises. We conduct extensive experiments on steel surface defect dataset, demonstrating state-of-the-art performance on generating high-quality samples and training recognition models, and both designed processes are significant for the performance.
DALLMi: Domain Adaption for LLM-based Multi-label Classifier
Authors: Miruna Beţianu, Abele Mălan, Marco Aldinucci, Robert Birke, Lydia Chen
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
Large language models (LLMs) increasingly serve as the backbone for classifying text associated with distinct domains and simultaneously several labels (classes). When encountering domain shifts, e.g., classifier of movie reviews from IMDb to Rotten Tomatoes, adapting such an LLM-based multi-label classifier is challenging due to incomplete label sets at the target domain and daunting training overhead. The existing domain adaptation methods address either image multi-label classifiers or text binary classifiers. In this paper, we design DALLMi, Domain Adaptation Large Language Model interpolator, a first-of-its-kind semi-supervised domain adaptation method for text data models based on LLMs, specifically BERT. The core of DALLMi is the novel variation loss and MixUp regularization, which jointly leverage the limited positively labeled and large quantity of unlabeled text and, importantly, their interpolation from the BERT word embeddings. DALLMi also introduces a label-balanced sampling strategy to overcome the imbalance between labeled and unlabeled data. We evaluate DALLMi against the partial-supervised and unsupervised approach on three datasets under different scenarios of label availability for the target domain. Our results show that DALLMi achieves higher mAP than unsupervised and partially-supervised approaches by 19.9% and 52.2%, respectively.
Which Identities Are Mobilized: Towards an automated detection of social group appeals in political texts
Abstract
This paper proposes a computational text classification strategy to identify references to social groups in European party manifestos and beyond. Our methodology uses machine learning techniques, including BERT and large language models, to capture group-based appeals in texts. We propose to combine automated identification of social groups using the Mistral-7B-v0.1 Large Language Model with Embedding Space-based filtering to extend a sample of core social groups to all social groups mentioned in party manifestos. By applying this approach to RRP's and mainstream parties' group images in manifestos, we explore whether electoral dynamics explain similarities in group appeals and potential convergence or divergence in party strategies. Contrary to expectations, increasing RRP support or mainstream parties' vote loss does not necessarily lead to convergence in group appeals. Nonetheless, our methodology enables mapping similarities in group appeals across time and space in 15 European countries from 1980 to 2021 and can be transferred to other use cases as well.
Common Randomness Generation from Sources with Infinite Polish Alphabet
Authors: Wafa Labidi, Rami Ezzine, Moritz Wiese, Christian Deppe, Holger Boche
Abstract
We investigate the problem of common randomness (CR) generation in the basic two-party communication setting in which a sender and a receiver aim to agree on a common random variable with high probability. The terminals observe independent and identically distributed (i.i.d.) samples of sources with an arbitrary distribution defined on a Polish alphabet and are allowed to communicate as little as possible over a noisy, memoryless channel. We establish single-letter upper and lower bounds on the CR capacity for the specified model. The derived bounds hold with equality except for at most countably many points where discontinuity issues might arise.
From Attack to Defense: Insights into Deep Learning Security Measures in Black-Box Settings
Authors: Firuz Juraev, Mohammed Abuhamad, Eric Chan-Tin, George K. Thiruvathukal, Tamer Abuhmed
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Deep Learning (DL) is rapidly maturing to the point that it can be used in safety- and security-crucial applications. However, adversarial samples, which are undetectable to the human eye, pose a serious threat that can cause the model to misbehave and compromise the performance of such applications. Addressing the robustness of DL models has become crucial to understanding and defending against adversarial attacks. In this study, we perform comprehensive experiments to examine the effect of adversarial attacks and defenses on various model architectures across well-known datasets. Our research focuses on black-box attacks such as SimBA, HopSkipJump, MGAAttack, and boundary attacks, as well as preprocessor-based defensive mechanisms, including bits squeezing, median smoothing, and JPEG filter. Experimenting with various models, our results demonstrate that the level of noise needed for the attack increases as the number of layers increases. Moreover, the attack success rate decreases as the number of layers increases. This indicates that model complexity and robustness have a significant relationship. Investigating the diversity and robustness relationship, our experiments with diverse models show that having a large number of parameters does not imply higher robustness. Our experiments extend to show the effects of the training dataset on model robustness. Using various datasets such as ImageNet-1000, CIFAR-100, and CIFAR-10 are used to evaluate the black-box attacks. Considering the multiple dimensions of our analysis, e.g., model complexity and training dataset, we examined the behavior of black-box attacks when models apply defenses. Our results show that applying defense strategies can significantly reduce attack effectiveness. This research provides in-depth analysis and insight into the robustness of DL models against various attacks, and defenses.
A Sonar-based AUV Positioning System for Underwater Environments with Low Infrastructure Density
Authors: Emilio Olivastri, Daniel Fusaro, Wanmeng Li, Simone Mosco, Alberto Pretto
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Abstract
The increasing demand for underwater vehicles highlights the necessity for robust localization solutions in inspection missions. In this work, we present a novel real-time sonar-based underwater global positioning algorithm for AUVs (Autonomous Underwater Vehicles) designed for environments with a sparse distribution of human-made assets. Our approach exploits two synergistic data interpretation frontends applied to the same stream of sonar data acquired by a multibeam Forward-Looking Sonar (FSD). These observations are fused within a Particle Filter (PF) either to weigh more particles that belong to high-likelihood regions or to solve symmetric ambiguities. Preliminary experiments carried out on a simulated environment resembling a real underwater plant provided promising results. This work represents a starting point towards future developments of the method and consequent exhaustive evaluations also in real-world scenarios.
Abstract
PU learning refers to the classification problem in which only part of positive samples are labeled. Existing PU learning methods treat unlabeled samples equally. However, in many real tasks, from common sense or domain knowledge, some unlabeled samples are more likely to be positive than others. In this paper, we propose soft label PU learning, in which unlabeled data are assigned soft labels according to their probabilities of being positive. Considering that the ground truth of TPR, FPR, and AUC are unknown, we then design PU counterparts of these metrics to evaluate the performances of soft label PU learning methods within validation data. We show that these new designed PU metrics are good substitutes for the real metrics. After that, a method that optimizes such metrics is proposed. Experiments on public datasets and real datasets for anti-cheat services from Tencent games demonstrate the effectiveness of our proposed method.
Cooperation and Federation in Distributed Radar Point Cloud Processing
Authors: S. Savazzi, V. Rampa, S. Kianoush, A. Minora, L. Costa
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT)
Abstract
The paper considers the problem of human-scale RF sensing utilizing a network of resource-constrained MIMO radars with low range-azimuth resolution. The radars operate in the mmWave band and obtain time-varying 3D point cloud (PC) information that is sensitive to body movements. They also observe the same scene from different views and cooperate while sensing the environment using a sidelink communication channel. Conventional cooperation setups allow the radars to mutually exchange raw PC information to improve ego sensing. The paper proposes a federation mechanism where the radars exchange the parameters of a Bayesian posterior measure of the observed PCs, rather than raw data. The radars act as distributed parameter servers to reconstruct a global posterior (i.e., federated posterior) using Bayesian tools. The paper quantifies and compares the benefits of radar federation with respect to cooperation mechanisms. Both approaches are validated by experiments with a real-time demonstration platform. Federation makes minimal use of the sidelink communication channel (20 {\div} 25 times lower bandwidth use) and is less sensitive to unresolved targets. On the other hand, cooperation reduces the mean absolute target estimation error of about 20%.
Abstract
This paper presents a novel self-supervised two-frame multi-camera metric depth estimation network, termed M${^2}$Depth, which is designed to predict reliable scale-aware surrounding depth in autonomous driving. Unlike the previous works that use multi-view images from a single time-step or multiple time-step images from a single camera, M${^2}$Depth takes temporally adjacent two-frame images from multiple cameras as inputs and produces high-quality surrounding depth. We first construct cost volumes in spatial and temporal domains individually and propose a spatial-temporal fusion module that integrates the spatial-temporal information to yield a strong volume presentation. We additionally combine the neural prior from SAM features with internal features to reduce the ambiguity between foreground and background and strengthen the depth edges. Extensive experimental results on nuScenes and DDAD benchmarks show M${^2}$Depth achieves state-of-the-art performance. More results can be found in https://heiheishuang.xyz/M2Depth .
MemorAI: Energy-Efficient Last-Level Cache Memory Optimization for Virtualized RANs
Authors: Ethan Sanchez Hidalgo, J. Xavier Salvat Lozano, Jose A. Ayala-Romero, Andres Garcia-Saavedra, Xi Li, Xavier Costa-Perez
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
The virtualization of Radio Access Networks (vRAN) is well on its way to become a reality, driven by its advantages such as flexibility and cost-effectiveness. However, virtualization comes at a high price - virtual Base Stations (vBSs) sharing the same computing platform incur a significant computing overhead due to in extremis consumption of shared cache memory resources. Consequently, vRAN suffers from increased energy consumption, which fuels the already high operational costs in 5G networks. This paper investigates cache memory allocation mechanisms' effectiveness in reducing total energy consumption. Using an experimental vRAN platform, we profile the energy consumption and CPU utilization of vBS as a function of the network state (e.g., traffic demand, modulation scheme). Then, we address the high dimensionality of the problem by decomposing it per vBS, which is possible thanks to the Last-Level Cache (LLC) isolation implemented in our system. Based on this, we train a vBS digital twin, which allows us to train offline a classifier, avoiding the performance degradation of the system during training. Our results show that our approach performs very closely to an offline optimal oracle, outperforming standard approaches used in today's deployments.
Sampling to Achieve the Goal: An Age-aware Remote Markov Decision Process
Authors: Aimin Li, Shaohua Wu, Gary C.F. Lee, Xiaomeng Cheng, Sumei Sun
Abstract
Age of Information (AoI) has been recognized as an important metric to measure the freshness of information. Central to this consensus is that minimizing AoI can enhance the freshness of information, thereby facilitating the accuracy of subsequent decision-making processes. However, to date the direct causal relationship that links AoI to the utility of the decision-making process is unexplored. To fill this gap, this paper provides a sampling-control co-design problem, referred to as an age-aware remote Markov Decision Process (MDP) problem, to explore this unexplored relationship. Our framework revisits the sampling problem in [1] with a refined focus: moving from AoI penalty minimization to directly optimizing goal-oriented remote decision-making process under random delay. We derive that the age-aware remote MDP problem can be reduced to a standard MDP problem without delays, and reveal that treating AoI solely as a metric for optimization is not optimal in achieving remote decision making. Instead, AoI can serve as important side information to facilitate remote decision making.
Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach
Authors: Anton Plaksin, Vitaly Kalev
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract
Robust Reinforcement Learning (RRL) is a promising Reinforcement Learning (RL) paradigm aimed at training robust to uncertainty or disturbances models, making them more efficient for real-world applications. Following this paradigm, uncertainty or disturbances are interpreted as actions of a second adversarial agent, and thus, the problem is reduced to seeking the agents' policies robust to any opponent's actions. This paper is the first to propose considering the RRL problems within the positional differential game theory, which helps us to obtain theoretically justified intuition to develop a centralized Q-learning approach. Namely, we prove that under Isaacs's condition (sufficiently general for real-world dynamical systems), the same Q-function can be utilized as an approximate solution of both minimax and maximin Bellman equations. Based on these results, we present the Isaacs Deep Q-Network algorithms and demonstrate their superiority compared to other baseline RRL and Multi-Agent RL algorithms in various environments.
Few-sample Variational Inference of Bayesian Neural Networks with Arbitrary Nonlinearities
Abstract
Bayesian Neural Networks (BNNs) extend traditional neural networks to provide uncertainties associated with their outputs. On the forward pass through a BNN, predictions (and their uncertainties) are made either by Monte Carlo sampling network weights from the learned posterior or by analytically propagating statistical moments through the network. Though flexible, Monte Carlo sampling is computationally expensive and can be infeasible or impractical under resource constraints or for large networks. While moment propagation can ameliorate the computational costs of BNN inference, it can be difficult or impossible for networks with arbitrary nonlinearities, thereby restricting the possible set of network layers permitted with such a scheme. In this work, we demonstrate a simple yet effective approach for propagating statistical moments through arbitrary nonlinearities with only 3 deterministic samples, enabling few-sample variational inference of BNNs without restricting the set of network layers used. Furthermore, we leverage this approach to demonstrate a novel nonlinear activation function that we use to inject physics-informed prior information into output nodes of a BNN.
Histogram-Based Federated XGBoost using Minimal Variance Sampling for Federated Tabular Data
Authors: William Lindskog, Christian Prehofer, Sarandeep Singh
Abstract
Federated Learning (FL) has gained considerable traction, yet, for tabular data, FL has received less attention. Most FL research has focused on Neural Networks while Tree-Based Models (TBMs) such as XGBoost have historically performed better on tabular data. It has been shown that subsampling of training data when building trees can improve performance but it is an open problem whether such subsampling can improve performance in FL. In this paper, we evaluate a histogram-based federated XGBoost that uses Minimal Variance Sampling (MVS). We demonstrate the underlying algorithm and show that our model using MVS can improve performance in terms of accuracy and regression error in a federated setting. In our evaluation, our model using MVS performs better than uniform (random) sampling and no sampling at all. It achieves both outstanding local and global performance on a new set of federated tabular datasets. Federated XGBoost using MVS also outperforms centralized XGBoost in half of the studied cases.
MVP-Shot: Multi-Velocity Progressive-Alignment Framework for Few-Shot Action Recognition
Abstract
Recent few-shot action recognition (FSAR) methods achieve promising performance by performing semantic matching on learned discriminative features. However, most FSAR methods focus on single-scale (e.g., frame-level, segment-level, \etc) feature alignment, which ignores that human actions with the same semantic may appear at different velocities. To this end, we develop a novel Multi-Velocity Progressive-alignment (MVP-Shot) framework to progressively learn and align semantic-related action features at multi-velocity levels. Concretely, a Multi-Velocity Feature Alignment (MVFA) module is designed to measure the similarity between features from support and query videos with different velocity scales and then merge all similarity scores in a residual fashion. To avoid the multiple velocity features deviating from the underlying motion semantic, our proposed Progressive Semantic-Tailored Interaction (PSTI) module injects velocity-tailored text information into the video feature via feature interaction on channel and temporal domains at different velocities. The above two modules compensate for each other to predict query categories more accurately under the few-shot settings. Experimental results show our method outperforms current state-of-the-art methods on multiple standard few-shot benchmarks (i.e., HMDB51, UCF101, Kinetics, and SSv2-small).
A Mutual Information Perspective on Federated Contrastive Learning
Abstract
We investigate contrastive learning in the federated setting through the lens of SimCLR and multi-view mutual information maximization. In doing so, we uncover a connection between contrastive representation learning and user verification; by adding a user verification loss to each client's local SimCLR loss we recover a lower bound to the global multi-view mutual information. To accommodate for the case of when some labelled data are available at the clients, we extend our SimCLR variant to the federated semi-supervised setting. We see that a supervised SimCLR objective can be obtained with two changes: a) the contrastive loss is computed between datapoints that share the same label and b) we require an additional auxiliary head that predicts the correct labels from either of the two views. Along with the proposed SimCLR extensions, we also study how different sources of non-i.i.d.-ness can impact the performance of federated unsupervised learning through global mutual information maximization; we find that a global objective is beneficial for some sources of non-i.i.d.-ness but can be detrimental for others. We empirically evaluate our proposed extensions in various tasks to validate our claims and furthermore demonstrate that our proposed modifications generalize to other pretraining methods.
Multi-level projection with exponential parallel speedup; Application to sparse auto-encoders neural networks
Abstract
The $\ell{1,\infty}$ norm is an efficient structured projection but the complexity of the best algorithm is unfortunately $\mathcal{O}\big(n m \log(n m)\big)$ for a matrix in $\mathbb{R}^{n\times m}$. In this paper, we propose a new bi-level projection method for which we show that the time complexity for the $\ell{1,\infty}$ norm is only $\mathcal{O}\big(n m \big)$ for a matrix in $\mathbb{R}^{n\times m}$, and $\mathcal{O}\big(n + m \big)$ with full parallel power. We generalize our method to tensors and we propose a new multi-level projection, having an induced decomposition that yields a linear parallel speedup up to an exponential speedup factor, resulting in a time complexity lower-bounded by the sum of the dimensions. Experiments show that our bi-level $\ell_{1,\infty}$ projection is $2.5$ times faster than the actual fastest algorithm provided by \textit{Chu et. al.} while providing same accuracy and better sparsity in neural networks applications.
Advanced Detection of Source Code Clones via an Ensemble of Unsupervised Similarity Measures
Abstract
The capability of accurately determining code similarity is crucial in many tasks related to software development. For example, it might be essential to identify code duplicates for performing software maintenance. This research introduces a novel ensemble learning approach for code similarity assessment, combining the strengths of multiple unsupervised similarity measures. The key idea is that the strengths of a diverse set of similarity measures can complement each other and mitigate individual weaknesses, leading to improved performance. Preliminary results show that while Transformers-based CodeBERT and its variant GraphCodeBERT are undoubtedly the best option in the presence of abundant training data, in the case of specific small datasets (up to 500 samples), our ensemble achieves similar results, without prejudice to the interpretability of the resulting solution, and with a much lower associated carbon footprint due to training. The source code of this novel approach can be downloaded from https://github.com/jorge-martinez-gil/ensemble-codesim.
Equal Requests are Asymptotically Hardest for Data Recovery
Authors: Jüri Lember, Ago-Erik Riet
Subjects: Information Theory (cs.IT); Combinatorics (math.CO); Probability (math.PR)
Abstract
In a distributed storage system serving hot data, the data recovery performance becomes important, captured e.g. by the service rate. We give partial evidence for it being hardest to serve a sequence of equal user requests (as in PIR coding regime) both for concrete and random user requests and server contents. We prove that a constant request sequence is locally hardest to serve: If enough copies of each vector are stored in servers, then if a request sequence with all requests equal can be served then we can still serve it if a few requests are changed. For random iid server contents, with number of data symbols constant (for simplicity) and the number of servers growing, we show that the maximum number of user requests we can serve divided by the number of servers we need approaches a limit almost surely. For uniform server contents, we show this limit is 1/2, both for sequences of copies of a fixed request and of any requests, so it is at least as hard to serve equal requests as any requests. For iid requests independent from the uniform server contents the limit is at least 1/2 and equal to 1/2 if requests are all equal to a fixed request almost surely, confirming the same. As a building block, we deduce from a 1952 result of Marshall Hall, Jr. on abelian groups, that any collection of half as many requests as coded symbols in the doubled binary simplex code can be served by this code. This implies the fractional version of the Functional Batch Code Conjecture that allows half-servers.
Probablistic Restoration with Adaptive Noise Sampling for 3D Human Pose Estimation
Abstract
The accuracy and robustness of 3D human pose estimation (HPE) are limited by 2D pose detection errors and 2D to 3D ill-posed challenges, which have drawn great attention to Multi-Hypothesis HPE research. Most existing MH-HPE methods are based on generative models, which are computationally expensive and difficult to train. In this study, we propose a Probabilistic Restoration 3D Human Pose Estimation framework (PRPose) that can be integrated with any lightweight single-hypothesis model. Specifically, PRPose employs a weakly supervised approach to fit the hidden probability distribution of the 2D-to-3D lifting process in the Single-Hypothesis HPE model and then reverse-map the distribution to the 2D pose input through an adaptive noise sampling strategy to generate reasonable multi-hypothesis samples effectively. Extensive experiments on 3D HPE benchmarks (Human3.6M and MPI-INF-3DHP) highlight the effectiveness and efficiency of PRPose. Code is available at: https://github.com/xzhouzeng/PRPose.
Can We Identify Unknown Audio Recording Environments in Forensic Scenarios?
Authors: Denise Moussa, Germans Hirsch, Christian Riess
Abstract
Audio recordings may provide important evidence in criminal investigations. One such case is the forensic association of the recorded audio to the recording location. For example, a voice message may be the only investigative cue to narrow down the candidate sites for a crime. Up to now, several works provide tools for closed-set recording environment classification under relatively clean recording conditions. However, in forensic investigations, the candidate locations are case-specific. Thus, closed-set tools are not applicable without retraining on a sufficient amount of training samples for each case and respective candidate set. In addition, a forensic tool has to deal with audio material from uncontrolled sources with variable properties and quality. In this work, we therefore attempt a major step towards practical forensic application scenarios. We propose a representation learning framework called EnvId, short for environment identification. EnvId avoids case-specific retraining. Instead, it is the first tool for robust few-shot classification of unseen environment locations. We demonstrate that EnvId can handle forensically challenging material. It provides good quality predictions even under unseen signal degradations, environment characteristics or recording position mismatches. Our code and datasets will be made publicly available upon acceptance.
Payout Races and Congested Channels: A Formal Analysis of Security in the Lightning Network
Authors: Ben Weintraub, Satwik Prabhu Kumble, Cristina Nita-Rotaru, Stefanie Roos
Abstract
The Lightning Network, a payment channel network with a market cap of over 192M USD, is designed to resolve Bitcoin's scalability issues through fast off-chain transactions. There are multiple Lightning Network client implementations, all of which conform to the same textual specifications known as BOLTs. Several vulnerabilities have been manually discovered, but to-date there have been few works systematically analyzing the security of the Lightning Network. In this work, we take a foundational approach to analyzing the security of the Lightning Network with the help of formal methods. Based on the BOLTs' specifications, we build a detailed formal model of the Lightning Network's single-hop payment protocol and verify it using the Spin model checker. Our model captures both concurrency and error semantics of the payment protocol. We then define several security properties which capture the correct intermediate operation of the protocol, ensuring that the outcome is always certain to both channel peers, and using them we re-discover a known attack previously reported in the literature along with a novel attack, referred to as a Payout Race. A Payout Race consists of a particular sequence of events that can lead to an ambiguity in the protocol in which innocent users can unwittingly lose funds. We confirm the practicality of this attack by reproducing it in a local testbed environment.
Towards a Formal Creativity Theory: Preliminary results in Novelty and Transformativeness
Abstract
Formalizing creativity-related concepts has been a long-term goal of Computational Creativity. To the same end, we explore Formal Learning Theory in the context of creativity. We provide an introduction to the main concepts of this framework and a re-interpretation of terms commonly found in creativity discussions, proposing formal definitions for novelty and transformational creativity. This formalisation marks the beginning of a research branch we call Formal Creativity Theory, exploring how learning can be included as preparation for exploratory behaviour and how learning is a key part of transformational creative behaviour. By employing these definitions, we argue that, while novelty is neither necessary nor sufficient for transformational creativity in general, when using an inspiring set, rather than a sequence of experiences, an agent actually requires novelty for transformational creativity to occur.
The AI Review Lottery: Widespread AI-Assisted Peer Reviews Boost Paper Scores and Acceptance Rates
Authors: Giuseppe Russo Latona, Manoel Horta Ribeiro, Tim R. Davidson, Veniamin Veselovsky, Robert West
Abstract
Journals and conferences worry that peer reviews assisted by artificial intelligence (AI), in particular, large language models (LLMs), may negatively influence the validity and fairness of the peer-review system, a cornerstone of modern science. In this work, we address this concern with a quasi-experimental study of the prevalence and impact of AI-assisted peer reviews in the context of the 2024 International Conference on Learning Representations (ICLR), a large and prestigious machine-learning conference. Our contributions are threefold. Firstly, we obtain a lower bound for the prevalence of AI-assisted reviews at ICLR 2024 using the GPTZero LLM detector, estimating that at least $15.8\%$ of reviews were written with AI assistance. Secondly, we estimate the impact of AI-assisted reviews on submission scores. Considering pairs of reviews with different scores assigned to the same paper, we find that in $53.4\%$ of pairs the AI-assisted review scores higher than the human review ($p = 0.002$; relative difference in probability of scoring higher: $+14.4\%$ in favor of AI-assisted reviews). Thirdly, we assess the impact of receiving an AI-assisted peer review on submission acceptance. In a matched study, submissions near the acceptance threshold that received an AI-assisted peer review were $4.9$ percentage points ($p = 0.024$) more likely to be accepted than submissions that did not. Overall, we show that AI-assisted reviews are consequential to the peer-review process and offer a discussion on future implications of current trends
Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models
Authors: Alessandro Pianese, Davide Cozzolino, Giovanni Poggi, Luisa Verdoliva
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
Abstract
Generalization is a main issue for current audio deepfake detectors, which struggle to provide reliable results on out-of-distribution data. Given the speed at which more and more accurate synthesis methods are developed, it is very important to design techniques that work well also on data they were not trained for. In this paper we study the potential of large-scale pre-trained models for audio deepfake detection, with special focus on generalization ability. To this end, the detection problem is reformulated in a speaker verification framework and fake audios are exposed by the mismatch between the voice sample under test and the voice of the claimed identity. With this paradigm, no fake speech sample is necessary in training, cutting off any link with the generation method at the root, and ensuring full generalization ability. Features are extracted by general-purpose large pre-trained models, with no need for training or fine-tuning on specific fake detection or speaker verification datasets. At detection time only a limited set of voice fragments of the identity under test is required. Experiments on several datasets widespread in the community show that detectors based on pre-trained models achieve excellent performance and show strong generalization ability, rivaling supervised methods on in-distribution data and largely overcoming them on out-of-distribution data.
Imitation Learning in Discounted Linear MDPs without exploration assumptions
Abstract
We present a new algorithm for imitation learning in infinite horizon linear MDPs dubbed ILARL which greatly improves the bound on the number of trajectories that the learner needs to sample from the environment. In particular, we remove exploration assumptions required in previous works and we improve the dependence on the desired accuracy $\epsilon$ from $\mathcal{O}\br{\epsilon^{-5}}$ to $\mathcal{O}\br{\epsilon^{-4}}$. Our result relies on a connection between imitation learning and online learning in MDPs with adversarial losses. For the latter setting, we present the first result for infinite horizon linear MDP which may be of independent interest. Moreover, we are able to provide a strengthen result for the finite horizon case where we achieve $\mathcal{O}\br{\epsilon^{-2}}$. Numerical experiments with linear function approximation shows that ILARL outperforms other commonly used algorithms.
Metalearners for Ranking Treatment Effects
Authors: Toon Vanderschueren, Wouter Verbeke, Felipe Moraes, Hugo Manuel Proença
Abstract
Efficiently allocating treatments with a budget constraint constitutes an important challenge across various domains. In marketing, for example, the use of promotions to target potential customers and boost conversions is limited by the available budget. While much research focuses on estimating causal effects, there is relatively limited work on learning to allocate treatments while considering the operational context. Existing methods for uplift modeling or causal inference primarily estimate treatment effects, without considering how this relates to a profit maximizing allocation policy that respects budget constraints. The potential downside of using these methods is that the resulting predictive model is not aligned with the operational context. Therefore, prediction errors are propagated to the optimization of the budget allocation problem, subsequently leading to a suboptimal allocation policy. We propose an alternative approach based on learning to rank. Our proposed methodology directly learns an allocation policy by prioritizing instances in terms of their incremental profit. We propose an efficient sampling procedure for the optimization of the ranking model to scale our methodology to large-scale data sets. Theoretically, we show how learning to rank can maximize the area under a policy's incremental profit curve. Empirically, we validate our methodology and show its effectiveness in practice through a series of experiments on both synthetic and real-world data.
Non-Destructive Peat Analysis using Hyperspectral Imaging and Machine Learning
Authors: Yijun Yan, Jinchang Ren, Barry Harrison, Oliver Lewis, Yinhe Li, Ping Ma
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Abstract
Peat, a crucial component in whisky production, imparts distinctive and irreplaceable flavours to the final product. However, the extraction of peat disrupts ancient ecosystems and releases significant amounts of carbon, contributing to climate change. This paper aims to address this issue by conducting a feasibility study on enhancing peat use efficiency in whisky manufacturing through non-destructive analysis using hyperspectral imaging. Results show that shot-wave infrared (SWIR) data is more effective for analyzing peat samples and predicting total phenol levels, with accuracies up to 99.81%.
Automatic Programming: Large Language Models and Beyond
Authors: Michael R. Lyu, Baishakhi Ray, Abhik Roychoudhury, Shin Hwei Tan, Patanamon Thongtanunam
Abstract
Automatic programming has seen increasing popularity due to the emergence of tools like GitHub Copilot which rely on Large Language Models (LLMs). At the same time, automatically generated code faces challenges during deployment due to concerns around quality and trust. In this article, we study automated coding in a general sense and study the concerns around code quality, security and related issues of programmer responsibility. These are key issues for organizations while deciding on the usage of automatically generated code. We discuss how advances in software engineering such as program repair and analysis can enable automatic programming. We conclude with a forward looking view, focusing on the programming environment of the near future, where programmers may need to switch to different roles to fully utilize the power of automatic programming. Automated repair of automatically generated programs from LLMs, can help produce higher assurance code from LLMs, along with evidence of assurance
FairEvalLLM. A Comprehensive Framework for Benchmarking Fairness in Large Language Model Recommender Systems
Abstract
This paper presents a framework for evaluating fairness in recommender systems powered by Large Language Models (RecLLMs), addressing the need for a unified approach that spans various fairness dimensions including sensitivity to user attributes, intrinsic fairness, and discussions of fairness based on underlying benefits. In addition, our framework introduces counterfactual evaluations and integrates diverse user group considerations to enhance the discourse on fairness evaluation for RecLLMs. Our key contributions include the development of a robust framework for fairness evaluation in LLM-based recommendations and a structured method to create \textit{informative user profiles} from demographic data, historical user preferences, and recent interactions. We argue that the latter is essential for enhancing personalization in such systems, especially in temporal-driven scenarios. We demonstrate the utility of our framework through practical applications on two datasets, LastFM-1K and ML-1M. We conduct experiments on a subsample of 80 users from each dataset, testing and assessing the effectiveness of various prompt construction scenarios and in-context learning, comprising more than 50 scenarios. This results in more than 4000 recommendations (80 * 50 = 4000). Our study reveals that while there are no significant unfairness issues in scenarios involving sensitive attributes, some concerns remain. However, in terms of intrinsic fairness, which does not involve direct sensitivity, unfairness across demographic groups remains significant. The code and data used for this paper are available at: \url{https://shorturl.at/awBFM}.
REASONS: A benchmark for REtrieval and Automated citationS Of scieNtific Sentences using Public and Proprietary LLMs
Authors: Deepa Tilwani, Yash Saxena, Ali Mohammadi, Edward Raff, Amit Sheth, Srinivasan Parthasarathy, Manas Gaur
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract
Automatic citation generation for sentences in a document or report is paramount for intelligence analysts, cybersecurity, news agencies, and education personnel. In this research, we investigate whether large language models (LLMs) are capable of generating references based on two forms of sentence queries: (a) Direct Queries, LLMs are asked to provide author names of the given research article, and (b) Indirect Queries, LLMs are asked to provide the title of a mentioned article when given a sentence from a different article. To demonstrate where LLM stands in this task, we introduce a large dataset called REASONS comprising abstracts of the 12 most popular domains of scientific research on arXiv. From around 20K research articles, we make the following deductions on public and proprietary LLMs: (a) State-of-the-art, often called anthropomorphic GPT-4 and GPT-3.5, suffers from high pass percentage (PP) to minimize the hallucination rate (HR). When tested with Perplexity.ai (7B), they unexpectedly made more errors; (b) Augmenting relevant metadata lowered the PP and gave the lowest HR; (c) Advance retrieval-augmented generation (RAG) using Mistral demonstrates consistent and robust citation support on indirect queries and matched performance to GPT-3.5 and GPT-4. The HR across all domains and models decreased by an average of 41.93% and the PP was reduced to 0% in most cases. In terms of generation quality, the average F1 Score and BLEU were 68.09% and 57.51%, respectively; (d) Testing with adversarial samples showed that LLMs, including the Advance RAG Mistral, struggle to understand context, but the extent of this issue was small in Mistral and GPT-4-Preview. Our study con tributes valuable insights into the reliability of RAG for automated citation generation tasks.
Learning Optimal Deterministic Policies with Stochastic Policy Gradients
Authors: Alessandro Montenegro, Marco Mussi, Alberto Maria Metelli, Matteo Papini
Abstract
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems. They learn stochastic parametric (hyper)policies by either exploring in the space of actions or in the space of parameters. Stochastic controllers, however, are often undesirable from a practical perspective because of their lack of robustness, safety, and traceability. In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version. In this paper, we make a step towards the theoretical understanding of this practice. After introducing a novel framework for modeling this scenario, we study the global convergence to the best deterministic policy, under (weak) gradient domination assumptions. Then, we illustrate how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy. Finally, we quantitatively compare action-based and parameter-based exploration, giving a formal guise to intuitive results.
A second-order semi-Lagrangian exponential scheme with application to the shallow-water equations on the rotating sphere
Authors: João Guilherme Caldas Steinstraesser, Pedro da Silva Peixoto, Martin Schreiber
Abstract
In this work, we study and extend a class of semi-Lagrangian exponential methods, which combine exponential time integration techniques, suitable for integrating stiff linear terms, with a semi-Lagrangian treatment of nonlinear advection terms. Partial differential equations involving both processes arise for instance in atmospheric circulation models. Through a truncation error analysis, we first show that previously formulated semi-Lagrangian exponential schemes are limited to first-order accuracy due to the discretization of the linear term; we then formulate a new discretization leading to a second-order accurate method. Also, a detailed stability study, both considering a linear stability analysis and an empirical simulation-based one, is conducted to compare several Eulerian and semi-Lagrangian exponential schemes, as well as a well-established semi-Lagrangian semi-implicit method, which is used in operational atmospheric models. Numerical simulations of the shallow-water equations on the rotating sphere, considering standard and challenging benchmark test cases, are performed to assess the orders of convergence, stability properties, and computational cost of each method. The proposed second-order semi-Lagrangian exponential method was shown to be more stable and accurate than the previously formulated schemes of the same class at the expense of larger wall-clock times; however, the method is more stable and has a similar cost compared to the well-established semi-Lagrangian semi-implicit; therefore, it is a competitive candidate for potential operational applications in atmospheric circulation modeling.
Keyword: webgpu
There is no result
Keyword: webgl
There is no result
Keyword: pre-rendering
There is no result
Keyword: prerendering
There is no result
Keyword: motion prediction
There is no result
Keyword: incremental learning
There is no result
Keyword: svm incremental
There is no result
Keyword: nerf
WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights
Keyword: multiorgan
There is no result
Keyword: multi-organ
There is no result
Keyword: multi organ
There is no result
Keyword: SAM
An Experimental Study on the Rashomon Effect of Balancing Methods in Imbalanced Classification
Mitigating LLM Hallucinations via Conformal Abstention
Transfer Learning and Transformer Architecture for Financial Sentiment Analysis
Simplifying Multimodality: Unimodal Approach to Multimodal Challenges in Radiology with General-Domain Large Language Model
Improving Disease Detection from Social Media Text via Self-Augmentation and Contrastive Learning
Efficient Sample-Specific Encoder Perturbations
Unifying and extending Precision Recall metrics for assessing generative models
Out-of-distribution detection based on subspace projection of high-dimensional features output by the last convolutional layer
Generative AI in Cybersecurity
Clones, closed categories, and combinatory logic
Adapting Self-Supervised Learning for Computational Pathology
Active Learning Enabled Low-cost Cell Image Segmentation Using Bounding Box Annotation
Optimization without retraction on the random generalized Stiefel manifold
Large Language Models are Inconsistent and Biased Evaluators
Explainability Guided Adversarial Evasion Attacks on Malware Detectors
Dynamic Anisotropic Smoothing for Noisy Derivative-Free Optimization
SoftMCL: Soft Momentum Contrastive Learning for Fine-grained Sentiment-aware Pre-training
SocialGFs: Learning Social Gradient Fields for Multi-Agent Reinforcement Learning
An Essay concerning machine understanding
Closing the Gap: Achieving Global Convergence (Last Iterate) of Actor-Critic under Markovian Sampling with Neural Network Parametrization
A Model-based Multi-Agent Personalized Short-Video Recommender System
Defect Image Sample Generation With Diffusion Prior for Steel Surface Defect Recognition
DALLMi: Domain Adaption for LLM-based Multi-label Classifier
Which Identities Are Mobilized: Towards an automated detection of social group appeals in political texts
Common Randomness Generation from Sources with Infinite Polish Alphabet
From Attack to Defense: Insights into Deep Learning Security Measures in Black-Box Settings
A Sonar-based AUV Positioning System for Underwater Environments with Low Infrastructure Density
Soft Label PU Learning
Cooperation and Federation in Distributed Radar Point Cloud Processing
M${^2}$Depth: Self-supervised Two-Frame Multi-camera Metric Depth Estimation
MemorAI: Energy-Efficient Last-Level Cache Memory Optimization for Virtualized RANs
Sampling to Achieve the Goal: An Age-aware Remote Markov Decision Process
Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach
Few-sample Variational Inference of Bayesian Neural Networks with Arbitrary Nonlinearities
Histogram-Based Federated XGBoost using Minimal Variance Sampling for Federated Tabular Data
MVP-Shot: Multi-Velocity Progressive-Alignment Framework for Few-Shot Action Recognition
A Mutual Information Perspective on Federated Contrastive Learning
Multi-level projection with exponential parallel speedup; Application to sparse auto-encoders neural networks
Advanced Detection of Source Code Clones via an Ensemble of Unsupervised Similarity Measures
Equal Requests are Asymptotically Hardest for Data Recovery
Probablistic Restoration with Adaptive Noise Sampling for 3D Human Pose Estimation
Can We Identify Unknown Audio Recording Environments in Forensic Scenarios?
Payout Races and Congested Channels: A Formal Analysis of Security in the Lightning Network
Towards a Formal Creativity Theory: Preliminary results in Novelty and Transformativeness
The AI Review Lottery: Widespread AI-Assisted Peer Reviews Boost Paper Scores and Acceptance Rates
Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models
Imitation Learning in Discounted Linear MDPs without exploration assumptions
Metalearners for Ranking Treatment Effects
Non-Destructive Peat Analysis using Hyperspectral Imaging and Machine Learning
Automatic Programming: Large Language Models and Beyond
FairEvalLLM. A Comprehensive Framework for Benchmarking Fairness in Large Language Model Recommender Systems
REASONS: A benchmark for REtrieval and Automated citationS Of scieNtific Sentences using Public and Proprietary LLMs
Learning Optimal Deterministic Policies with Stochastic Policy Gradients
A second-order semi-Lagrangian exponential scheme with application to the shallow-water equations on the rotating sphere