New submissions for Fri, 28 Oct 22

Keyword: out of distribution detection

There is no result

Keyword: out-of-distribution detection

There is no result

Keyword: expected calibration error

A Graph Is More Than Its Nodes: Towards Structured Uncertainty-Aware Learning on Graphs

Authors: Hans Hao-Hsun Hsu, Yuesong Shen, Daniel Cremers
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2210.15575
Pdf link: https://arxiv.org/pdf/2210.15575
Abstract Current graph neural networks (GNNs) that tackle node classification on graphs tend to only focus on nodewise scores and are solely evaluated by nodewise metrics. This limits uncertainty estimation on graphs since nodewise marginals do not fully characterize the joint distribution given the graph structure. In this work, we propose novel edgewise metrics, namely the edgewise expected calibration error (ECE) and the agree/disagree ECEs, which provide criteria for uncertainty estimation on graphs beyond the nodewise setting. Our experiments demonstrate that the proposed edgewise metrics can complement the nodewise results and yield additional insights. Moreover, we show that GNN models which consider the structured prediction problem on graphs tend to have better uncertainty estimations, which illustrates the benefit of going beyond the nodewise setting.
Keyword: overconfident

There is no result

Keyword: overconfidence

There is no result

Keyword: confidence

RulE: Neural-Symbolic Knowledge Graph Reasoning with Rule Embedding
Authors: Xiaojuan Tang, Song-Chun Zhu, Yitao Liang, Muhan Zhang
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2210.14905
Pdf link: https://arxiv.org/pdf/2210.14905
Abstract Knowledge graph (KG) reasoning is an important problem for knowledge graphs. It predicts missing links by reasoning on existing facts. Knowledge graph embedding (KGE) is one of the most popular methods to address this problem. It embeds entities and relations into low-dimensional vectors and uses the learned entity/relation embeddings to predict missing facts. However, KGE only uses zeroth-order (propositional) logic to encode existing triplets (e.g., Alice is Bob's wife."); it is unable to leverage first-order (predicate) logic to represent generally applicable logical \textbf{rules} (e.g.,$\forall x,y \colon x ~\text{is}~ y\text{'s wife} \rightarrow y ~\text{is}~ x\text{'s husband}$''). On the other hand, traditional rule-based KG reasoning methods usually rely on hard logical rule inference, making it brittle and hardly competitive with KGE. In this paper, we propose RulE, a novel and principled framework to represent and model logical rules and triplets. RulE jointly represents entities, relations and logical rules in a unified embedding space. By learning an embedding for each logical rule, RulE can perform logical rule inference in a soft way and give a confidence score to each grounded rule, similar to how KGE gives each triplet a confidence score. Compared to KGE alone, RulE allows injecting prior logical rule information into the embedding space, which improves the generalization of knowledge graph embedding. Besides, the learned confidence scores of rules improve the logical rule inference process by softly controlling the contribution of each rule, which alleviates the brittleness of logic. We evaluate our method with link prediction tasks. Experimental results on multiple benchmark KGs demonstrate the effectiveness of RulE.
Deep Latent Mixture Model for Recommendation
Authors: Jun Zhang, Ping Li, Wei Wang
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2210.15112
Pdf link: https://arxiv.org/pdf/2210.15112
Abstract Recent advances in neural networks have been successfully applied to many tasks in online recommendation applications. We propose a new framework called cone latent mixture model which makes use of hand-crafted state being able to factor distinct dependencies among multiple related documents. Specifically, it uses discriminative optimization techniques in order to generate effective multi-level knowledge bases, and uses online discriminative learning techniques in order to leverage these features. And for this joint model which uses confidence estimates for each topic and is able to learn a discriminatively trained jointly to automatically extracted salient features where discriminative training is only uses features and then is able to accurately trained.
Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Resource MT Models
Authors: Harshita Diddee, Sandipan Dandapat, Monojit Choudhury, Tanuja Ganu, Kalika Bali
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2210.15184
Pdf link: https://arxiv.org/pdf/2210.15184
Abstract Leveraging shared learning through Massively Multilingual Models, state-of-the-art machine translation models are often able to adapt to the paucity of data for low-resource languages. However, this performance comes at the cost of significantly bloated models which are not practically deployable. Knowledge Distillation is one popular technique to develop competitive, lightweight models: In this work, we first evaluate its use to compress MT models focusing on languages with extremely limited training data. Through our analysis across 8 languages, we find that the variance in the performance of the distilled models due to their dependence on priors including the amount of synthetic data used for distillation, the student architecture, training hyperparameters and confidence of the teacher models, makes distillation a brittle compression mechanism. To mitigate this, we explore the use of post-training quantization for the compression of these models. Here, we find that while distillation provides gains across some low-resource languages, quantization provides more consistent performance trends for the entire range of languages, especially the lowest-resource languages in our target set.
TASA: Deceiving Question Answering Models by Twin Answer Sentences Attack
Authors: Yu Cao, Dianqi Li, Meng Fang, Tianyi Zhou, Jun Gao, Yibing Zhan, Dacheng Tao
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2210.15221
Pdf link: https://arxiv.org/pdf/2210.15221
Abstract We present Twin Answer Sentences Attack (TASA), an adversarial attack method for question answering (QA) models that produces fluent and grammatical adversarial contexts while maintaining gold answers. Despite phenomenal progress on general adversarial attacks, few works have investigated the vulnerability and attack specifically for QA models. In this work, we first explore the biases in the existing models and discover that they mainly rely on keyword matching between the question and context, and ignore the relevant contextual relations for answer prediction. Based on two biases above, TASA attacks the target model in two folds: (1) lowering the model's confidence on the gold answer with a perturbed answer sentence; (2) misguiding the model towards a wrong answer with a distracting answer sentence. Equipped with designed beam search and filtering methods, TASA can generate more effective attacks than existing textual attack methods while sustaining the quality of contexts, in extensive experiments on five QA datasets and human evaluations.
Iterative pseudo-forced alignment by acoustic CTC loss for self-supervised ASR domain adaptation
Authors: Fernando López, Jordi Luque
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2210.15226
Pdf link: https://arxiv.org/pdf/2210.15226
Abstract High-quality data labeling from specific domains is costly and human time-consuming. In this work, we propose a self-supervised domain adaptation method, based upon an iterative pseudo-forced alignment algorithm. The produced alignments are employed to customize an end-to-end Automatic Speech Recognition (ASR) and iteratively refined. The algorithm is fed with frame-wise character posteriors produced by a seed ASR, trained with out-of-domain data, and optimized throughout a Connectionist Temporal Classification (CTC) loss. The alignments are computed iteratively upon a corpus of broadcast TV. The process is repeated by reducing the quantity of text to be aligned or expanding the alignment window until finding the best possible audio-text alignment. The starting timestamps, or temporal anchors, are produced uniquely based on the confidence score of the last aligned utterance. This score is computed with the paths of the CTC-alignment matrix. With this methodology, no human-revised text references are required. Alignments from long audio files with low-quality transcriptions, like TV captions, are filtered out by confidence score and ready for further ASR adaptation. The obtained results, on both the Spanish RTVE2022 and CommonVoice databases, underpin the feasibility of using CTC-based systems to perform: highly accurate audio-text alignments, domain adaptation and semi-supervised training of end-to-end ASR.
Exploring Predictive Uncertainty and Calibration in NLP: A Study on the Impact of Method & Data Scarcity
Authors: Dennis Ulmer, Jes Frellsen, Christian Hardmeier
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2210.15452
Pdf link: https://arxiv.org/pdf/2210.15452
Abstract We investigate the problem of determining the predictive confidence (or, conversely, uncertainty) of a neural classifier through the lens of low-resource languages. By training models on sub-sampled datasets in three different languages, we assess the quality of estimates from a wide array of approaches and their dependence on the amount of available data. We find that while approaches based on pre-trained models and ensembles achieve the best results overall, the quality of uncertainty estimates can surprisingly suffer with more data. We also perform a qualitative analysis of uncertainties on sequences, discovering that a model's total uncertainty seems to be influenced to a large degree by its data uncertainty, not model uncertainty. All model implementations are open-sourced in a software package.
Towards Correlated Sequential Rules
Authors: Lili Chen, Wensheng Gan, Chien-Ming Chen
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2210.15637
Pdf link: https://arxiv.org/pdf/2210.15637
Abstract The goal of high-utility sequential pattern mining (HUSPM) is to efficiently discover profitable or useful sequential patterns in a large number of sequences. However, simply being aware of utility-eligible patterns is insufficient for making predictions. To compensate for this deficiency, high-utility sequential rule mining (HUSRM) is designed to explore the confidence or probability of predicting the occurrence of consequence sequential patterns based on the appearance of premise sequential patterns. It has numerous applications, such as product recommendation and weather prediction. However, the existing algorithm, known as HUSRM, is limited to extracting all eligible rules while neglecting the correlation between the generated sequential rules. To address this issue, we propose a novel algorithm called correlated high-utility sequential rule miner (CoUSR) to integrate the concept of correlation into HUSRM. The proposed algorithm requires not only that each rule be correlated but also that the patterns in the antecedent and consequent of the high-utility sequential rule be correlated. The algorithm adopts a utility-list structure to avoid multiple database scans. Additionally, several pruning strategies are used to improve the algorithm's efficiency and performance. Based on several real-world datasets, subsequent experiments demonstrated that CoUSR is effective and efficient in terms of operation time and memory consumption.
Keyword: scaling

Neuro-symbolic partial differential equation solver
Authors: Pouria Mistani, Samira Pakravan, Rajesh Ilango, Sanjay Choudhry, Frederic Gibou
Subjects: Numerical Analysis (math.NA); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2210.14907
Pdf link: https://arxiv.org/pdf/2210.14907
Abstract We present a highly scalable strategy for developing mesh-free neuro-symbolic partial differential equation solvers from existing numerical discretizations found in scientific computing. This strategy is unique in that it can be used to efficiently train neural network surrogate models for the solution functions and the differential operators, while retaining the accuracy and convergence properties of state-of-the-art numerical solvers. This neural bootstrapping method is based on minimizing residuals of discretized differential systems on a set of random collocation points with respect to the trainable parameters of the neural network, achieving unprecedented resolution and optimal scaling for solving physical and biological systems.
Accountable Safety for Rollups
Authors: Ertem Nusret Tas, John Adler, Mustafa Al-Bassam, Ismail Khoffi, David Tse, Nima Vaziri
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2210.15017
Pdf link: https://arxiv.org/pdf/2210.15017
Abstract Accountability, the ability to provably identify protocol violators, gained prominence as the main economic argument for the security of proof-of-stake (PoS) protocols. Rollups, the most popular scaling solution for blockchains, typically use PoS protocols as their parent chain. We define accountability for rollups, and present an attack that shows the absence of accountability on existing designs. We provide an accountable rollup design and prove its security, both for the traditional `enshrined' rollups and for sovereign rollups, an emergent alternative built on lazy blockchains, tasked only with ordering and availability of the rollup data.
Deep-MDS Framework for Recovering the 3D Shape of 2D Landmarks from a Single Image
Authors: Shima Kamyab, Zohreh Azimifar
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2210.15200
Pdf link: https://arxiv.org/pdf/2210.15200
Abstract In this paper, a low parameter deep learning framework utilizing the Non-metric Multi-Dimensional scaling (NMDS) method, is proposed to recover the 3D shape of 2D landmarks on a human face, in a single input image. Hence, NMDS approach is used for the first time to establish a mapping from a 2D landmark space to the corresponding 3D shape space. A deep neural network learns the pairwise dissimilarity among 2D landmarks, used by NMDS approach, whose objective is to learn the pairwise 3D Euclidean distance of the corresponding 2D landmarks on the input image. This scheme results in a symmetric dissimilarity matrix, with the rank larger than 2, leading the NMDS approach toward appropriately recovering the 3D shape of corresponding 2D landmarks. In the case of posed images and complex image formation processes like perspective projection which causes occlusion in the input image, we consider an autoencoder component in the proposed framework, as an occlusion removal part, which turns different input views of the human face into a profile view. The results of a performance evaluation using different synthetic and real-world human face datasets, including Besel Face Model (BFM), CelebA, CoMA - FLAME, and CASIA-3D, indicates the comparable performance of the proposed framework, despite its small number of training parameters, with the related state-of-the-art and powerful 3D reconstruction methods from the literature, in terms of efficiency and accuracy.
Noise in the Clouds: Influence of Network Performance Variability on Application Scalability
Authors: Daniele De Sensi, Tiziano De Matteis, Konstantin Taranov, Salvatore Di Girolamo, Tobias Rahn, Torsten Hoefler
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI); Performance (cs.PF)
Arxiv link: https://arxiv.org/abs/2210.15315
Pdf link: https://arxiv.org/pdf/2210.15315
Abstract Cloud computing represents an appealing opportunity for cost-effective deployment of HPC workloads on the best-fitting hardware. However, although cloud and on-premise HPC systems offer similar computational resources, their network architecture and performance may differ significantly. For example, these systems use fundamentally different network transport and routing protocols, which may introduce network noise that can eventually limit the application scaling. This work analyzes network performance, scalability, and cost of running HPC workloads on cloud systems. First, we consider latency, bandwidth, and collective communication patterns in detailed small-scale measurements, and then we simulate network performance at a larger scale. We validate our approach on four popular cloud providers and three on-premise HPC systems, showing that network (and also OS) noise can significantly impact performance and cost both at small and large scale.
What Language Model to Train if You Have One Million GPU Hours?
Authors: Teven Le Scao, Thomas Wang, Daniel Hesslow, Lucile Saulnier, Stas Bekman, M Saiful Bari, Stella Bideman, Hady Elsahar, Niklas Muennighoff, Jason Phang, Ofir Press, Colin Raffel, Victor Sanh, Sheng Shen, Lintang Sutawika, Jaesung Tae, Zheng Xin Yong, Julien Launay, Iz Beltagy
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2210.15424
Pdf link: https://arxiv.org/pdf/2210.15424
Abstract The crystallization of modeling methods around the Transformer architecture has been a boon for practitioners. Simple, well-motivated architectural variations can transfer across tasks and scale, increasing the impact of modeling research. However, with the emergence of state-of-the-art 100B+ parameters models, large language models are increasingly expensive to accurately design and train. Notably, it can be difficult to evaluate how modeling decisions may impact emergent capabilities, given that these capabilities arise mainly from sheer scale alone. In the process of building BLOOM--the Big Science Large Open-science Open-access Multilingual language model--our goal is to identify an architecture and training setup that makes the best use of our 1,000,000 A100-GPU-hours budget. Specifically, we perform an ablation study at the billion-parameter scale comparing different modeling practices and their impact on zero-shot generalization. In addition, we study the impact of various popular pre-training corpora on zero-shot generalization. We also study the performance of a multilingual model and how it compares to the English-only one. Finally, we consider the scaling behaviour of Transformers to choose the target model size, shape, and training setup. All our models and code are open-sourced at https://huggingface.co/bigscience .
Investigating the Origins of Fractality Based on Two Novel Fractal Network Models
Authors: Enikő Zakar-Polyák, Marcell Nagy, Roland Molontay
Subjects: Social and Information Networks (cs.SI); Discrete Mathematics (cs.DM); Data Analysis, Statistics and Probability (physics.data-an); Physics and Society (physics.soc-ph)
Arxiv link: https://arxiv.org/abs/2210.15505
Pdf link: https://arxiv.org/pdf/2210.15505
Abstract Numerous network models have been investigated to gain insights into the origins of fractality. In this work, we introduce two novel network models, to better understand the growing mechanism and structural characteristics of fractal networks. The Repulsion Based Fractal Model (RBFM) is built on the well-known Song-Havlin-Makse (SHM) model, but in RBFM repulsion is always present among a specific group of nodes. The model resolves the contradiction between the SHM model and the Hub Attraction Dynamical Growth model, by showing that repulsion is the characteristic that induces fractality. The Lattice Small-world Transition Model (LSwTM) was motivated by the fact that repulsion directly influences the node distances. Through LSwTM we study the fractal-small-world transition. The model illustrates the transition on a fixed number of nodes and edges using a preferential-attachment-based edge rewiring process. It shows that a small average distance works against fractal scaling, and also demonstrates that fractality is not a dichotomous property, continuous transition can be observed between the pure fractal and non-fractal characteristics.
Keyword: calibration

A Graph Is More Than Its Nodes: Towards Structured Uncertainty-Aware Learning on Graphs
Authors: Hans Hao-Hsun Hsu, Yuesong Shen, Daniel Cremers
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2210.15575
Pdf link: https://arxiv.org/pdf/2210.15575
Abstract Current graph neural networks (GNNs) that tackle node classification on graphs tend to only focus on nodewise scores and are solely evaluated by nodewise metrics. This limits uncertainty estimation on graphs since nodewise marginals do not fully characterize the joint distribution given the graph structure. In this work, we propose novel edgewise metrics, namely the edgewise expected calibration error (ECE) and the agree/disagree ECEs, which provide criteria for uncertainty estimation on graphs beyond the nodewise setting. Our experiments demonstrate that the proposed edgewise metrics can complement the nodewise results and yield additional insights. Moreover, we show that GNN models which consider the structured prediction problem on graphs tend to have better uncertainty estimations, which illustrates the benefit of going beyond the nodewise setting.

ericbeyer / L-arxiv-interest-tracker

New submissions for Fri, 28 Oct 22 #674

Keyword: out of distribution detection

Keyword: out-of-distribution detection

Keyword: expected calibration error

A Graph Is More Than Its Nodes: Towards Structured Uncertainty-Aware Learning on Graphs

Keyword: overconfident

Keyword: overconfidence

Keyword: confidence

RulE: Neural-Symbolic Knowledge Graph Reasoning with Rule Embedding

Deep Latent Mixture Model for Recommendation

Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Resource MT Models

TASA: Deceiving Question Answering Models by Twin Answer Sentences Attack

Iterative pseudo-forced alignment by acoustic CTC loss for self-supervised ASR domain adaptation

Exploring Predictive Uncertainty and Calibration in NLP: A Study on the Impact of Method & Data Scarcity

Towards Correlated Sequential Rules

Keyword: scaling

Neuro-symbolic partial differential equation solver

Accountable Safety for Rollups

Deep-MDS Framework for Recovering the 3D Shape of 2D Landmarks from a Single Image

Noise in the Clouds: Influence of Network Performance Variability on Application Scalability

What Language Model to Train if You Have One Million GPU Hours?

Investigating the Origins of Fractality Based on Two Novel Fractal Network Models

Keyword: calibration

A Graph Is More Than Its Nodes: Towards Structured Uncertainty-Aware Learning on Graphs