New submissions for Mon, 12 Jun 23

Keyword: efficient

One-step Multi-view Clustering with Diverse Representation

Authors: Xinhang Wan, Jiyuan Liu, Jue Wang, Xinwang Liu, Siwei Wang, Yi Wen, Tianjiao Wan, En Zhu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.05437
Pdf link: https://arxiv.org/pdf/2306.05437
Abstract Multi-view clustering has attracted broad attention due to its capacity to utilize consistent and complementary information among views. Although tremendous progress has been made recently, most existing methods suffer from high complexity, preventing them from being applied to large-scale tasks. Multi-view clustering via matrix factorization is a representative approach to addressing this issue. However, most such methods map the data matrices into a fixed dimension, which limits the expressiveness of the model. Moreover, a range of methods suffer from a two-step process, i.e., multimodal learning and the subsequent $k$-means, inevitably causing a sub-optimal clustering result. In light of this, we propose a one-step multi-view clustering with diverse representation method, which incorporates multi-view learning and $k$-means into a unified framework. Specifically, we first project original data matrices into various latent spaces to attain comprehensive information and auto-weight them in a self-supervised manner. Then we directly use the information matrices under diverse dimensions to obtain consensus discrete clustering labels. The unified work of representation learning and clustering boosts the quality of the final results. Furthermore, we develop an efficient optimization algorithm to solve the resultant problem with proven convergence. Comprehensive experiments on various datasets demonstrate the promising clustering performance of our proposed method.
CLC: Cluster Assignment via Contrastive Representation Learning
Authors: Fei Ding, Dan Zhang, Yin Yang, Venkat Krovi, Feng Luo
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2306.05439
Pdf link: https://arxiv.org/pdf/2306.05439
Abstract Clustering remains an important and challenging task of grouping samples into clusters without manual annotations. Recent works have achieved excellent results on small datasets by performing clustering on feature representations learned from self-supervised learning. However, for datasets with a large number of clusters, such as ImageNet, current methods still cannot achieve high clustering performance. In this paper, we propose Contrastive Learning-based Clustering (CLC), which uses contrastive learning to directly learn cluster assignment. We decompose the representation into two parts: one encodes the categorical information under an equipartition constraint, and the other captures the instance-wise factors. We propose a contrastive loss using both parts of the representation. We theoretically analyze the proposed contrastive loss and reveal that CLC sets different weights for the negative samples while learning cluster assignments. Further gradient analysis shows that the larger weights tend to focus more on the hard negative samples. Therefore, the proposed loss has high expressiveness that enables us to efficiently learn cluster assignments. Experimental evaluation shows that CLC achieves overall state-of-the-art or highly competitive clustering performance on multiple benchmark datasets. In particular, we achieve 53.4% accuracy on the full ImageNet dataset and outperform existing methods by large margins (+10.2%).
On the Importance of Exploration for Generalization in Reinforcement Learning
Authors: Yiding Jiang, J. Zico Kolter, Roberta Raileanu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.05483
Pdf link: https://arxiv.org/pdf/2306.05483
Abstract Existing approaches for improving generalization in deep reinforcement learning (RL) have mostly focused on representation learning, neglecting RL-specific aspects such as exploration. We hypothesize that the agent's exploration strategy plays a key role in its ability to generalize to new environments. Through a series of experiments in a tabular contextual MDP, we show that exploration is helpful not only for efficiently finding the optimal policy for the training environments but also for acquiring knowledge that helps decision making in unseen environments. Based on these observations, we propose EDE: Exploration via Distributional Ensemble, a method that encourages exploration of states with high epistemic uncertainty through an ensemble of Q-value distributions. Our algorithm is the first value-based approach to achieve state-of-the-art on both Procgen and Crafter, two benchmarks for generalization in RL with high-dimensional observations. The open-sourced implementation can be found at https://github.com/facebookresearch/ede .
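The idea of steering exploration toward states with high epistemic uncertainty can be illustrated with a much simpler stand-in than EDE itself. The sketch below is not the authors' algorithm: it assumes each ensemble member outputs a scalar Q-value per action (EDE uses Q-value distributions), and it treats the disagreement between members as the uncertainty signal. The function name `ucb_action` and the `bonus_scale` parameter are hypothetical.

```python
from statistics import mean, pstdev


def ucb_action(ensemble_q: list[list[float]], bonus_scale: float = 1.0) -> int:
    """Pick an action by mean Q-value plus a disagreement bonus.

    ensemble_q[m][a] is ensemble member m's Q-value estimate for action a.
    The standard deviation across members is used as a rough proxy for
    epistemic uncertainty (an assumption of this sketch).
    """
    n_actions = len(ensemble_q[0])
    scores = []
    for a in range(n_actions):
        values = [member[a] for member in ensemble_q]
        scores.append(mean(values) + bonus_scale * pstdev(values))
    # Ties are broken in favour of the lowest action index.
    return max(range(n_actions), key=scores.__getitem__)
```

With `bonus_scale=0.0` the rule reduces to greedy selection on the ensemble mean, so the bonus term is the only thing that favours actions the ensemble disagrees about.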
Boosting with Tempered Exponential Measures
Authors: Richard Nock, Ehsan Amid, Manfred K. Warmuth
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2306.05487
Pdf link: https://arxiv.org/pdf/2306.05487
Abstract One of the most popular ML algorithms, AdaBoost, can be derived from the dual of a relative entropy minimization problem subject to the fact that the positive weights on the examples sum to one. Essentially, harder examples receive higher probabilities. We generalize this setup to the recently introduced tempered exponential measures (TEMs) where normalization is enforced on a specific power of the measure and not the measure itself. TEMs are indexed by a parameter $t$ and generalize exponential families ($t=1$). Our algorithm, $t$-AdaBoost, recovers AdaBoost as a special case ($t=1$). We show that $t$-AdaBoost retains AdaBoost's celebrated exponential convergence rate when $t\in [0,1)$ while allowing a slight improvement of the rate's hidden constant compared to $t=1$. $t$-AdaBoost partially computes on a generalization of classical arithmetic over the reals and brings notable properties like guaranteed bounded leveraging coefficients for $t\in [0,1)$. From the loss that $t$-AdaBoost minimizes (a generalization of the exponential loss), we show how to derive a new family of tempered losses for the induction of domain-partitioning classifiers like decision trees. Crucially, strict properness is ensured for all while their boosting rates span the full known spectrum. Experiments using $t$-AdaBoost+trees display that significant leverage can be achieved by tuning $t$.
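The abstract does not spell out the tempered functions it builds on. As a minimal sketch, the code below uses the tempered logarithm and exponential as they are commonly defined in the tempered-loss literature, which is an assumption about what the paper uses. It only covers $t \in [0, 1]$, the range discussed above, and shows that $t=1$ recovers the ordinary `exp` and `log`.

```python
import math


def _check_t(t: float) -> None:
    # This sketch is restricted to the range discussed in the abstract.
    if not 0.0 <= t <= 1.0:
        raise ValueError("this sketch only supports t in [0, 1]")


def log_t(x: float, t: float) -> float:
    """Tempered logarithm: (x^(1-t) - 1) / (1 - t); math.log when t == 1."""
    _check_t(t)
    if x <= 0.0:
        raise ValueError("log_t is defined for x > 0")
    if t == 1.0:
        return math.log(x)
    return (x ** (1.0 - t) - 1.0) / (1.0 - t)


def exp_t(x: float, t: float) -> float:
    """Tempered exponential: max(0, 1 + (1-t) x)^(1/(1-t)); math.exp when t == 1."""
    _check_t(t)
    if t == 1.0:
        return math.exp(x)
    base = max(0.0, 1.0 + (1.0 - t) * x)
    return base ** (1.0 / (1.0 - t))
```

For $t < 1$ the tempered exponential is clipped to zero for sufficiently negative arguments, which is one way to see why quantities built from it can stay bounded.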
Learnability with PAC Semantics for Multi-agent Beliefs
Authors: Ionela G. Mocanu, Vaishak Belle, Brendan Juba
Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
Arxiv link: https://arxiv.org/abs/2306.05490
Pdf link: https://arxiv.org/pdf/2306.05490
Abstract The tension between deduction and induction is perhaps the most fundamental issue in areas such as philosophy, cognition and artificial intelligence. In an influential paper, Valiant recognised that the challenge of learning should be integrated with deduction. In particular, he proposed a semantics to capture the quality possessed by the output of Probably Approximately Correct (PAC) learning algorithms when formulated in a logic. Although weaker than classical entailment, it allows for a powerful model-theoretic framework for answering queries. In this paper, we provide a new technical foundation to demonstrate PAC learning with multi-agent epistemic logics. To circumvent the negative results in the literature on the difficulty of robust learning with the PAC semantics, we consider so-called implicit learning where we are able to incorporate observations into the background theory in service of deciding the entailment of an epistemic query. We prove correctness of the learning procedure and discuss results on the sample complexity, that is, how many observations we will need to provably assert that the query is entailed given a user-specified error bound. Finally, we investigate under what circumstances this algorithm can be made efficient. On the last point, given that reasoning in epistemic logics, especially in multi-agent epistemic logics, is PSPACE-complete, it might seem like there is no hope for this problem. We leverage some recent results on the so-called Representation Theorem explored for single-agent and multi-agent epistemic logics with the only knowing operator to reduce modal reasoning to propositional reasoning.
PeFLL: A Lifelong Learning Approach to Personalized Federated Learning
Authors: Jonathan Scott, Hossein Zakerinia, Christoph H. Lampert
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.05515
Pdf link: https://arxiv.org/pdf/2306.05515
Abstract Personalized federated learning (pFL) has emerged as a popular approach to dealing with the challenge of statistical heterogeneity between the data distributions of the participating clients. Instead of learning a single global model, pFL aims to learn an individual model for each client while still making use of the data available at other clients. In this work, we present PeFLL, a new pFL approach rooted in lifelong learning that performs well not only on clients present during its training phase, but also on any that may emerge in the future. PeFLL learns to output client specific models by jointly training an embedding network and a hypernetwork. The embedding network learns to represent clients in a latent descriptor space in a way that reflects their similarity to each other. The hypernetwork learns a mapping from this latent space to the space of possible client models. We demonstrate experimentally that PeFLL produces models of superior accuracy compared to previous methods, especially for clients not seen during training, and that it scales well to large numbers of clients. Moreover, generating a personalized model for a new client is efficient as no additional fine-tuning or optimization is required by either the client or the server. We also present theoretical results supporting PeFLL in the form of a new PAC-Bayesian generalization bound for lifelong learning and we prove the convergence of our proposed optimization procedure.
FACTIFY3M: A Benchmark for Multimodal Fact Verification with Explainability through 5W Question-Answering
Authors: Megha Chakraborty, Khusbu Pahwa, Anku Rani, Adarsh Mahor, Aditya Pakala, Arghya Sarkar, Harshit Dave, Ishan Paul, Janvita Reddy, Preethi Gurumurthy, Ritvik G, Samahriti Mukherjee, Shreyas Chatterjee, Kinjal Sensharma, Dwip Dalal, Suryavardan S, Shreyash Mishra, Parth Patwa, Aman Chadha, Amit Sheth, Amitava Das
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2306.05523
Pdf link: https://arxiv.org/pdf/2306.05523
Abstract Combating disinformation is one of the burning societal crises -- about 67% of the American population believes that disinformation produces a lot of uncertainty, and 10% of them knowingly propagate disinformation. Evidence shows that disinformation can manipulate democratic processes and public opinion, causing disruption in the share market, panic and anxiety in society, and even death during crises. Therefore, disinformation should be identified promptly and, if possible, mitigated. With approximately 3.2 billion images and 720,000 hours of video shared online daily on social media platforms, scalable detection of multimodal disinformation requires efficient fact verification. Despite progress in automatic text-based fact verification (e.g., FEVER, LIAR), the research community lacks substantial effort in multimodal fact verification. To address this gap, we introduce FACTIFY 3M, a dataset of 3 million samples that pushes the boundaries of the domain of fact verification via a multimodal fake news dataset, in addition to offering explainability through the concept of 5W question-answering. Salient features of the dataset include: (i) textual claims, (ii) ChatGPT-generated paraphrased claims, (iii) associated images, (iv) stable diffusion-generated additional images (i.e., visual paraphrases), (v) pixel-level image heatmap to foster image-text explainability of the claim, (vi) 5W QA pairs, and (vii) adversarial fake news stories.
AaKOS: Aspect-adaptive Knowledge-based Opinion Summarization
Authors: Guan Wang, Weihua Li, Edmund M-K. Lai, Quan Bai
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2306.05537
Pdf link: https://arxiv.org/pdf/2306.05537
Abstract The rapid growth of information on the Internet has led to an overwhelming amount of opinions and comments on various activities, products, and services. This makes it difficult and time-consuming for users to process all the available information when making decisions. Text summarization, a Natural Language Processing (NLP) task, has been widely explored to help users quickly retrieve relevant information by generating short and salient content from long or multiple documents. Recent advances in pre-trained language models, such as ChatGPT, have demonstrated the potential of Large Language Models (LLMs) in text generation. However, LLMs require massive amounts of data and resources and are challenging to implement as offline applications. Furthermore, existing text summarization approaches often lack the "adaptive" nature required to capture diverse aspects in opinion summarization, which is particularly detrimental to users with specific requirements or preferences. In this paper, we propose an Aspect-adaptive Knowledge-based Opinion Summarization model for product reviews, which effectively captures the adaptive nature required for opinion summarization. The model generates aspect-oriented summaries given a set of reviews for a particular product, efficiently providing users with useful information on specific aspects they are interested in, ensuring the generated summaries are more personalized and informative. Extensive experiments have been conducted using real-world datasets to evaluate the proposed model. The results demonstrate that our model outperforms state-of-the-art approaches and is adaptive and efficient in generating summaries that focus on particular aspects, enabling users to make well-informed decisions and catering to their diverse interests and preferences.
DetectLLM: Leveraging Log Rank Information for Zero-Shot Detection of Machine-Generated Text
Authors: Jinyan Su, Terry Yue Zhuo, Di Wang, Preslav Nakov
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.05540
Pdf link: https://arxiv.org/pdf/2306.05540
Abstract With the rapid progress of large language models (LLMs) and the huge amount of text they generate, it is becoming increasingly impractical to manually determine whether a text is machine-generated. The growing use of LLMs in social media and education prompts us to develop methods to detect machine-generated text, preventing malicious usage such as plagiarism, misinformation, and propaganda. Previous work has studied several zero-shot methods, which require no training data. These methods achieve good performance, but there is still a lot of room for improvement. In this paper, we introduce two novel zero-shot methods for detecting machine-generated text by leveraging the log rank information. One is called DetectLLM-LRR, which is fast and efficient, and the other is called DetectLLM-NPR, which is more accurate, but slower due to the need for perturbations. Our experiments on three datasets and seven language models show that our proposed methods improve over the state of the art by 3.9 and 1.75 AUROC points absolute. Moreover, DetectLLM-NPR needs fewer perturbations than previous work to achieve the same level of performance, which makes it more practical for real-world use. We also investigate the efficiency-performance trade-off based on users' preferences on these two measures and we provide intuition for using them in practice effectively. We release the data and the code of both methods at https://github.com/mbzuai-nlp/DetectLLM
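The abstract does not give the scoring formula, so the following is only a rough sketch of how log-rank information could be combined with log-likelihood in a perturbation-free score. It assumes the score is the ratio of the summed negative log-likelihood to the summed log-rank of the observed tokens under a scoring model; the function name `log_rank_ratio` and the input format (per-token probabilities and 1-indexed ranks, already extracted from some language model) are hypothetical.

```python
import math


def log_rank_ratio(token_probs: list[float], token_ranks: list[int]) -> float:
    """Ratio of summed negative log-likelihood to summed log-rank.

    token_probs[i] is the model probability of the i-th observed token and
    token_ranks[i] is its 1-indexed rank in the model's predicted distribution.
    The exact definition is an assumption of this sketch.
    """
    if not token_probs or len(token_probs) != len(token_ranks):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(not 0.0 < p <= 1.0 for p in token_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    if any(r < 1 for r in token_ranks):
        raise ValueError("ranks are 1-indexed")

    neg_log_likelihood = -sum(math.log(p) for p in token_probs)
    log_rank = sum(math.log(r) for r in token_ranks)
    if log_rank == 0.0:
        # Every token was the model's top choice, so the ratio is unbounded.
        return math.inf
    return neg_log_likelihood / log_rank
```

A detector built on such a score would still need a decision threshold chosen on held-out text, which the sketch leaves out.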
BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping
Authors: Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Lingjie Liu, Josh Susskind
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.05544
Pdf link: https://arxiv.org/pdf/2306.05544
Abstract Diffusion models have demonstrated excellent potential for generating diverse images. However, their performance often suffers from slow generation due to iterative denoising. Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few without significant quality degradation. However, existing distillation methods either require significant amounts of offline computation for generating synthetic training data from the teacher model or need to perform expensive online learning with the help of real data. In this work, we present a novel technique called BOOT, that overcomes these limitations with an efficient data-free distillation algorithm. The core idea is to learn a time-conditioned model that predicts the output of a pre-trained diffusion model teacher given any time step. Such a model can be efficiently trained based on bootstrapping from two consecutive sampled steps. Furthermore, our method can be easily adapted to large-scale text-to-image diffusion models, which are challenging for conventional methods given the fact that the training sets are often large and difficult to access. We demonstrate the effectiveness of our approach on several benchmark datasets in the DDIM setting, achieving comparable generation quality while being orders of magnitude faster than the diffusion teacher. The text-to-image results show that the proposed approach is able to handle highly complex distributions, shedding light on more efficient generative modeling.
A pseudo-reversible normalizing flow for stochastic dynamical systems with various initial distributions
Authors: Minglei Yang, Pengjun Wang, Diego del-Castillo-Negrete, Yanzhao Cao, Guannan Zhang
Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph)
Arxiv link: https://arxiv.org/abs/2306.05580
Pdf link: https://arxiv.org/pdf/2306.05580
Abstract We present a pseudo-reversible normalizing flow method for efficiently generating samples of the state of a stochastic differential equation (SDE) with different initial distributions. The primary objective is to construct an accurate and efficient sampler that can be used as a surrogate model for computationally expensive numerical integration of SDE, such as those employed in particle simulation. After training, the normalizing flow model can directly generate samples of the SDE's final state without simulating trajectories. Existing normalizing flows for SDEs depend on the initial distribution, meaning the model needs to be re-trained when the initial distribution changes. The main novelty of our normalizing flow model is that it can learn the conditional distribution of the state, i.e., the distribution of the final state conditional on any initial state, such that the model only needs to be trained once and the trained model can be used to handle various initial distributions. This feature can provide a significant computational saving in studies of how the final state varies with the initial distribution. We provide a rigorous convergence analysis of the pseudo-reversible normalizing flow model to the target probability density function in the Kullback-Leibler divergence metric. Numerical experiments are provided to demonstrate the effectiveness of the proposed normalizing flow model.
The Viability of Domain Constrained Coalition Formation for Robotic Collectives
Authors: Grace Diehl, Julie A. Adams
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.05590
Pdf link: https://arxiv.org/pdf/2306.05590
Abstract Applications, such as military and disaster response, can benefit from robotic collectives' ability to perform multiple cooperative tasks (e.g., surveillance, damage assessments) efficiently across a large spatial area. Coalition formation algorithms can potentially facilitate collective robots' assignment to appropriate task teams; however, most coalition formation algorithms were designed for smaller multiple robot systems (i.e., 2-50 robots). Collectives' scale and domain-relevant constraints (i.e., distribution, near real-time, minimal communication) make coalition formation more challenging. This manuscript identifies the challenges inherent to designing coalition formation algorithms for very large collectives (e.g., 1000 robots). A survey of multiple robot coalition formation algorithms finds that most are unable to transfer directly to collectives, due to the identified system differences; however, auctions and hedonic games may be the most transferable. A simulation-based evaluation of three auction and hedonic game algorithms, applied to homogeneous and heterogeneous collectives, demonstrates that there are collective compositions for which no existing algorithm is viable; however, the experimental results and literature survey suggest paths forward.
Throughput of Hybrid UAV Networks with Scale-Free Topology
Authors: Zhiqing Wei, Ziyu Wang, Zeyang Meng, Ning Zhang, Huici Wu, Zhiyong Feng
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2306.05616
Pdf link: https://arxiv.org/pdf/2306.05616
Abstract Unmanned Aerial Vehicles (UAVs) hold great potential to support a wide range of applications due to their high maneuverability and flexibility. Compared with a single UAV, a UAV swarm carries out tasks efficiently in harsh environments, where network resilience is of vital importance to the swarm. The network topology has a fundamental impact on the resilience of a UAV network. Scale-free network topology, which exists widely in nature, has been found to enhance network resilience. Besides, increasing network throughput can enhance the efficiency of information interaction, improving network resilience. Motivated by these facts, this paper studies the throughput of UAV networks with scale-free topology. By introducing a hybrid network structure that combines the ad hoc and cellular transmission modes into the UAV network, the throughput is improved compared with that of a pure ad hoc UAV network. Furthermore, this work also investigates the optimal setting of the hop threshold for the selection of ad hoc or cellular transmission mode. It is discovered that the optimal hop threshold is related to the number of UAVs and the parameters of the scale-free topology. This paper may motivate the application of hybrid network structures to UAV networks.
Quantifying the Knowledge in GNNs for Reliable Distillation into MLPs
Authors: Lirong Wu, Haitao Lin, Yufei Huang, Stan Z. Li
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.05628
Pdf link: https://arxiv.org/pdf/2306.05628
Abstract To bridge the gaps between topology-aware Graph Neural Networks (GNNs) and inference-efficient Multi-Layer Perceptrons (MLPs), GLNN proposes to distill knowledge from a well-trained teacher GNN into a student MLP. Despite their great progress, comparatively little work has been done to explore the reliability of different knowledge points (nodes) in GNNs, especially their roles played during distillation. In this paper, we first quantify the knowledge reliability in GNNs by measuring the invariance of their information entropy to noise perturbations, from which we observe that different knowledge points (1) show different distillation speeds (temporally); (2) are differentially distributed in the graph (spatially). To achieve reliable distillation, we propose an effective approach, namely Knowledge-inspired Reliable Distillation (KRD), that models the probability of each node being an informative and reliable knowledge point, based on which we sample a set of additional reliable knowledge points as supervision for training student MLPs. Extensive experiments show that KRD improves over the vanilla MLPs by 12.62% and outperforms its corresponding teacher GNNs by 2.16% averaged over 7 datasets and 3 GNN architectures.
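To make "invariance of information entropy to noise perturbations" concrete, the sketch below compares the Shannon entropy of a node's predicted class distribution before and after perturbation. It is an illustration under stated assumptions, not KRD's actual reliability measure: the two probability vectors are taken as given (no GNN is run), and the absolute entropy difference in `entropy_shift` is a hypothetical choice of invariance score, where a smaller shift would suggest a more reliable node.

```python
import math


def entropy(probs: list[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution; 0 * log 0 is treated as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_shift(clean_probs: list[float], noisy_probs: list[float]) -> float:
    """Absolute change in prediction entropy caused by a perturbation.

    clean_probs / noisy_probs are a node's predicted class distributions
    without and with input noise (assumed to be computed elsewhere).
    """
    return abs(entropy(clean_probs) - entropy(noisy_probs))
```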
Customizing General-Purpose Foundation Models for Medical Report Generation
Authors: Bang Yang, Asif Raza, Yuexian Zou, Tong Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2306.05642
Pdf link: https://arxiv.org/pdf/2306.05642
Abstract Medical caption prediction, which can be regarded as a task of medical report generation (MRG), requires the automatic generation of coherent and accurate captions for the given medical images. However, the scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks capable of harnessing the potential artificial general intelligence power like large language models (LLMs). In this work, we propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs), in computer vision and natural language processing with a specific focus on medical report generation. Specifically, following BLIP-2, a state-of-the-art vision-language pre-training approach, we introduce our encoder-decoder-based MRG model. This model utilizes a lightweight query Transformer to connect two FMs: the giant vision Transformer EVA-ViT-g and a bilingual LLM trained to align with human intentions (referred to as ChatGLM-6B). Furthermore, we conduct ablative experiments on the trainable components of the model to identify the crucial factors for effective transfer learning. Our findings demonstrate that unfreezing EVA-ViT-g to learn medical image representations, followed by parameter-efficient training of ChatGLM-6B to capture the writing styles of medical reports, is essential for achieving optimal results. Our best attempt (PCLmed Team) ranked 4th and 2nd, respectively, out of 13 participating teams, based on the BERTScore and ROUGE-1 metrics, in the ImageCLEFmedical Caption 2023 Caption Prediction Task competition.
A fast reduced order method for linear parabolic inverse source problems
Authors: Yuxuan Huang, Yangwen Zhang
Subjects: Numerical Analysis (math.NA); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2306.05677
Pdf link: https://arxiv.org/pdf/2306.05677
Abstract In this paper, we propose a novel, computationally efficient reduced order method to solve linear parabolic inverse source problems. Our approach provides accurate numerical solutions without relying on specific training data. The forward solution is constructed using a Krylov sequence, while the source term is recovered via the conjugate gradient (CG) method. Under a weak regularity assumption on the solution of the parabolic partial differential equations (PDEs), we establish convergence of the forward solution and provide a rigorous error estimate for our method. Numerical results demonstrate that our approach offers substantial computational savings compared to the traditional finite element method (FEM) and retains equivalent accuracy.
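For readers unfamiliar with the conjugate gradient (CG) method mentioned above, here is a minimal, generic sketch for a symmetric positive definite system $Ax = b$. It is not the paper's reduced order solver: how the inverse source problem is turned into such a system is not described in the abstract, and the dense list-of-lists matrix format is chosen only to keep the example dependency-free.

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite matrix A (list of rows)."""

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    def matvec(M, v):
        return [dot(row, v) for row in M]

    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)          # first search direction is the residual
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break        # residual norm is small enough
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)                       # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                            # keeps directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

In exact arithmetic CG terminates in at most n iterations for an n-by-n system, so the 2-by-2 system with rows (4, 1) and (1, 3) and right-hand side (1, 2) is solved after two steps, giving x = (1/11, 7/11).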
Space-time Trade-offs for the LCP Array of Wheeler DFAs
Authors: Nicola Cotumaccio, Travis Gagie, Dominik Köppl, Nicola Prezza
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2306.05684
Pdf link: https://arxiv.org/pdf/2306.05684
Abstract Recently, Conte et al. generalized the longest-common prefix (LCP) array from strings to Wheeler DFAs, and they showed that it can be used to efficiently determine matching statistics on a Wheeler DFA [DCC 2023]. However, storing the LCP array requires $O(n \log n)$ bits, $n$ being the number of states, while the compact representation of Wheeler DFAs often requires much less space. In particular, the BOSS representation of a de Bruijn graph only requires a linear number of bits, if the size of the alphabet is constant. In this paper, we propose a sampling technique that allows accessing an entry of the LCP array in logarithmic time by only storing a linear number of bits. We use our technique to provide a space-time trade-off to compute matching statistics on a Wheeler DFA. In addition, we show that by augmenting the BOSS representation of a $k$-th order de Bruijn graph with a linear number of bits we can navigate the underlying variable-order de Bruijn graph in time logarithmic in $k$, thus improving a previous bound by Boucher et al. which was linear in $k$ [DCC 2015].
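As background, the classical LCP array for a plain string (the object that Conte et al. generalize to Wheeler DFAs) can be sketched as follows. This is the textbook string version only, built naively for clarity; it stores the full array and does not implement the sampling technique or the Wheeler DFA setting described above.

```python
def suffix_array(s: str) -> list[int]:
    """Start positions of the suffixes of s in lexicographic order (naive sort)."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s: str) -> tuple[list[int], list[int]]:
    """Return (suffix array, LCP array) for s.

    lcp[k] is the length of the longest common prefix of the suffixes at
    sa[k - 1] and sa[k]; lcp[0] is 0 by convention.
    """
    sa = suffix_array(s)
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = s[sa[k - 1]:], s[sa[k]:]
        length = 0
        while length < min(len(a), len(b)) and a[length] == b[length]:
            length += 1
        lcp[k] = length
    return sa, lcp
```

For the string `banana` the sorted suffixes are `a`, `ana`, `anana`, `banana`, `na`, `nana`, so the suffix array is `[5, 3, 1, 0, 4, 2]` and the LCP array is `[0, 1, 3, 0, 0, 2]`.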
Single-Stage Visual Relationship Learning using Conditional Queries
Authors: Alakh Desai, Tz-Ying Wu, Subarna Tripathi, Nuno Vasconcelos
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.05689
Pdf link: https://arxiv.org/pdf/2306.05689
Abstract Research in scene graph generation (SGG) usually considers two-stage models, that is, detecting a set of entities, followed by combining them and labeling all possible relationships. While showing promising results, the pipeline structure induces large parameter and computation overhead, and typically hinders end-to-end optimizations. To address this, recent research attempts to train single-stage models that are computationally efficient. With the advent of DETR, a set based detection model, one-stage models attempt to predict a set of subject-predicate-object triplets directly in a single shot. However, SGG is inherently a multi-task learning problem that requires modeling entity and predicate distributions simultaneously. In this paper, we propose Transformers with conditional queries for SGG, namely, TraCQ with a new formulation for SGG that avoids the multi-task learning problem and the combinatorial entity pair distribution. We employ a DETR-based encoder-decoder design and leverage conditional queries to significantly reduce the entity label space as well, which leads to 20% fewer parameters compared to state-of-the-art single-stage models. Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods, it also beats many state-of-the-art two-stage methods on the Visual Genome dataset, yet is capable of end-to-end training and faster inference.
DIFT: Dynamic Iterative Field Transforms for Memory Efficient Optical Flow
Authors: Risheek Garrepalli, Jisoo Jeong, Rajeswaran C Ravindran, Jamie Menjay Lin, Fatih Porikli
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.05691
Pdf link: https://arxiv.org/pdf/2306.05691
Abstract Recent advancements in neural network-based optical flow estimation often come with prohibitively high computational and memory requirements, presenting challenges in their model adaptation for mobile and low-power use cases. In this paper, we introduce a lightweight, low-latency, and memory-efficient model, Dynamic Iterative Field Transforms (DIFT), for optical flow estimation feasible for edge applications such as mobile, XR, micro UAVs, robotics and cameras. DIFT follows an iterative refinement framework leveraging variable resolution of cost volumes for correspondence estimation. We propose a memory-efficient solution for cost volume processing to reduce peak memory. Also, we present a novel dynamic coarse-to-fine cost volume processing during various stages of refinement to avoid multiple levels of cost volumes. We demonstrate the first real-time cost-volume-based optical flow DL architecture on the Snapdragon 8 Gen 1 HTP efficient mobile AI accelerator with 32 inf/sec and 5.89 EPE (endpoint error) on KITTI with manageable accuracy-performance tradeoffs.
Power Beacon Energy Consumption Minimization in Wireless Powered Backscatter Communication Networks
Authors: Haohang Yang, Yinghui Ye, Kai Liang, Xiaoli Chu
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2306.05695
Pdf link: https://arxiv.org/pdf/2306.05695
Abstract Internet-of-Things (IoT) networks are expected to support the wireless connection of massive energy limited IoT nodes. The emerging wireless powered backscatter communications (WPBC) enable IoT nodes to harvest energy from the incident radio frequency signals transmitted by a power beacon (PB) to support their circuit operation, but the energy consumption of the PB (a potentially high cost borne by the network operator) has not been sufficiently studied for WPBC. In this paper, we aim to minimize the energy consumption of the PB while satisfying the throughput requirement per IoT node by jointly optimizing the time division multiple access (TDMA) time slot duration and backscatter reflection coefficient of each IoT node and the PB transmit power per time slot. As the formulated joint optimization problem is non-convex, we transform it into a convex problem by using auxiliary variables, then employ the Lagrange dual method to obtain the optimal solutions. To reduce the implementation complexity required for adjusting the PB's transmit power every time slot, we keep the PB transmit power constant in each time block and solve the corresponding PB energy consumption minimization problem by using auxiliary variables, the block coordinate descent method and the successive convex approximation technique. Based on the above solutions, two iterative algorithms are proposed for the dynamic PB transmit power scheme and the static PB transmit power scheme. The simulation results show that the dynamic PB transmit power scheme and the static PB transmit power scheme both achieve a lower PB energy consumption than the benchmark schemes, and the former achieves the lowest PB energy consumption.
Understanding How Consistency Works in Federated Learning via Stage-wise Relaxed Initialization
Authors: Yan Sun, Li Shen, Dacheng Tao
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2306.05706
Pdf link: https://arxiv.org/pdf/2306.05706
Abstract Federated learning (FL) is a distributed paradigm that coordinates massive local clients to collaboratively train a global model via stage-wise local training processes on the heterogeneous dataset. Previous works have implicitly studied that FL suffers from the "client-drift" problem, which is caused by the inconsistent optimum across local clients. However, till now it still lacks solid theoretical analysis to explain the impact of this local inconsistency. To alleviate the negative impact of the "client drift" and explore its substance in FL, in this paper, we first design an efficient FL algorithm FedInit, which allows employing the personalized relaxed initialization state at the beginning of each local training stage. Specifically, FedInit initializes the local state by moving away from the current global state towards the reverse direction of the latest local state. This relaxed initialization helps to revise the local divergence and enhance the local consistency level. Moreover, to further understand how inconsistency disrupts performance in FL, we introduce the excess risk analysis and study the divergence term to investigate the test error of the proposed FedInit method. Our studies show that optimization error is not sensitive to this local inconsistency, while it mainly affects the generalization error bound in FedInit. Extensive experiments are conducted to validate this conclusion. Our proposed FedInit could achieve state-of-the-art (SOTA) results compared to several advanced benchmarks without any additional costs. Meanwhile, stage-wise relaxed initialization could also be incorporated into the current advanced algorithms to achieve higher performance in the FL paradigm.
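The relaxed initialization described in this abstract (start from the global state and step away from the latest local state) can be read as a simple extrapolation. The following is a minimal sketch under that reading, not the authors' implementation: the function name `relaxed_init`, the coefficient `beta`, and the use of plain Python lists for model parameters are all assumptions made for illustration, since the abstract does not state the exact update rule.

```python
def relaxed_init(global_state, last_local_state, beta=0.1):
    """Start from the global state and step away from the latest local state.

    `beta` is a hypothetical relaxation coefficient (an assumption, not taken
    from the abstract); beta = 0 falls back to starting at the global state.
    """
    return [g + beta * (g - l) for g, l in zip(global_state, last_local_state)]


global_state = [1.0, 2.0]
last_local_state = [2.0, 0.0]

# Each coordinate moves from the global value in the direction opposite
# to where the client's previous local training ended up.
print(relaxed_init(global_state, last_local_state, beta=0.5))  # [0.5, 3.0]
```

With `beta=0.0` every client starts exactly at the global state, which is the usual initialization in federated averaging.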
Pave the Way to Grasp Anything: Transferring Foundation Models for Universal Pick-Place Robots
Authors: Jiange Yang, Wenhui Tan, Chuhao Jin, Bei Liu, Jianlong Fu, Ruihua Song, Limin Wang
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.05716
Pdf link: https://arxiv.org/pdf/2306.05716
Abstract Improving the generalization capabilities of general-purpose robotic agents has long been a significant challenge actively pursued by research communities. Existing approaches often rely on collecting large-scale real-world robotic data, such as the RT-1 dataset. However, these approaches typically suffer from low efficiency, limiting their capability in open-domain scenarios with new objects, and diverse backgrounds. In this paper, we propose a novel paradigm that effectively leverages language-grounded segmentation masks generated by state-of-the-art foundation models, to address a wide range of pick-and-place robot manipulation tasks in everyday scenarios. By integrating precise semantics and geometries conveyed from masks into our multi-view policy model, our approach can perceive accurate object poses and enable sample-efficient learning. Besides, such design facilitates effective generalization for grasping new objects with similar shapes observed during training. Our approach consists of two distinct steps. First, we introduce a series of foundation models to accurately ground natural language demands across multiple tasks. Second, we develop a Multi-modal Multi-view Policy Model that incorporates inputs such as RGB images, semantic masks, and robot proprioception states to jointly predict precise and executable robot actions. Extensive real-world experiments conducted on a Franka Emika robot arm validate the effectiveness of our proposed paradigm. Real-world demos are shown in YouTube (https://www.youtube.com/watch?v=1m9wNzfp_4E ) and Bilibili (https://www.bilibili.com/video/BV178411Z7H2/ ).
Advancing Counterfactual Inference through Quantile Regression
Authors: Shaoan Xie, Biwei Huang, Bin Gu, Tongliang Liu, Kun Zhang
Subjects: Machine Learning (cs.LG); Methodology (stat.ME)
Arxiv link: https://arxiv.org/abs/2306.05751
Pdf link: https://arxiv.org/pdf/2306.05751
Abstract The capacity to address counterfactual "what if" inquiries is crucial for understanding and making use of causal influences. Traditional counterfactual inference usually assumes a structural causal model is available. However, in practice, such a causal model is often unknown and may not be identifiable. This paper aims to perform reliable counterfactual inference based on the (learned) qualitative causal structure and observational data, without a given causal model or even directly estimating conditional distributions. We re-cast counterfactual reasoning as an extended quantile regression problem using neural networks. The approach is statistically more efficient than existing ones, and further makes it possible to develop the generalization ability of the estimated counterfactual outcome to unseen data and provide an upper bound on the generalization error. Experiment results on multiple datasets strongly support our theoretical claims.
Efficient GNN Explanation via Learning Removal-based Attribution
Authors: Yao Rong, Guanchu Wang, Qizhang Feng, Ninghao Liu, Zirui Liu, Enkelejda Kasneci, Xia Hu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.05760
Pdf link: https://arxiv.org/pdf/2306.05760
Abstract As Graph Neural Networks (GNNs) have been widely used in real-world applications, model explanations are required not only by users but also by legal regulations. However, simultaneously achieving high fidelity and low computational costs in generating explanations has been a challenge for current methods. In this work, we propose a framework of GNN explanation named LeArn Removal-based Attribution (LARA) to address this problem. Specifically, we introduce removal-based attribution and demonstrate its substantiated link to interpretability fidelity theoretically and experimentally. The explainer in LARA learns to generate removal-based attribution which enables providing explanations with high fidelity. A strategy of subgraph sampling is designed in LARA to improve the scalability of the training process. In the deployment, LARA can efficiently generate the explanation through a feed-forward pass. We benchmark our approach with other state-of-the-art GNN explanation methods on six datasets. Results highlight the effectiveness of our framework regarding both efficiency and fidelity. In particular, LARA is 3.5 times faster and achieves higher fidelity than the state-of-the-art method on the large dataset ogbn-arxiv (more than 160K nodes and 1M edges), showing its great potential in real-world applications. Our source code is available at https://anonymous.4open.science/r/LARA-10D8/README.md.
End-to-End Neural Network Compression via $\frac{\ell_1}{\ell_2}$ Regularized Latency Surrogates
Authors: Anshul Nasery, Hardik Shah, Arun Sai Suggala, Prateek Jain
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.05785
Pdf link: https://arxiv.org/pdf/2306.05785
Abstract Neural network (NN) compression via techniques such as pruning, quantization requires setting compression hyperparameters (e.g., number of channels to be pruned, bitwidths for quantization) for each layer either manually or via neural architecture search (NAS) which can be computationally expensive. We address this problem by providing an end-to-end technique that optimizes for model's Floating Point Operations (FLOPs) or for on-device latency via a novel $\frac{\ell_1}{\ell_2}$ latency surrogate. Our algorithm is versatile and can be used with many popular compression methods including pruning, low-rank factorization, and quantization. Crucially, it is fast and runs in almost the same amount of time as single model training; which is a significant training speed-up over standard NAS methods. For BERT compression on GLUE fine-tuning tasks, we achieve $50\%$ reduction in FLOPs with only $1\%$ drop in performance. For compressing MobileNetV3 on ImageNet-1K, we achieve $15\%$ reduction in FLOPs, and $11\%$ reduction in on-device latency without drop in accuracy, while still requiring $3\times$ less training compute than SOTA compression techniques. Finally, for transfer learning on smaller datasets, our technique identifies $1.2\times$-$1.4\times$ cheaper architectures than standard MobileNetV3, EfficientNet suite of architectures at almost the same training cost and accuracy.
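The abstract does not spell out how the $\frac{\ell_1}{\ell_2}$ surrogate enters the latency objective, so the sketch below only illustrates a general property of the ratio itself: for a vector with $n$ entries it equals 1 when a single entry is nonzero and $\sqrt{n}$ when all entries have equal magnitude, which makes it a scale-invariant measure of how many entries are active. The function name `l1_over_l2` is a hypothetical helper introduced for illustration.

```python
import math


def l1_over_l2(values):
    """Return ||x||_1 / ||x||_2 for a non-zero vector given as a list."""
    l1 = sum(abs(v) for v in values)
    l2 = math.sqrt(sum(v * v for v in values))
    if l2 == 0.0:
        raise ValueError("ratio is undefined for the all-zero vector")
    return l1 / l2


sparse = [0.0, 3.0, 0.0, 0.0]   # one active entry
dense = [1.0, 1.0, 1.0, 1.0]    # four equally active entries

print(l1_over_l2(sparse))  # 1.0
print(l1_over_l2(dense))   # 2.0, i.e. sqrt(4)
```

Because the ratio is unchanged when the vector is rescaled, it reflects the number of active entries rather than their magnitude, unlike a plain $\ell_1$ penalty.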
Extending Kernel PCA through Dualization: Sparsity, Robustness and Fast Algorithms
Authors: Francesco Tonin, Alex Lambert, Panagiotis Patrinos, Johan A. K. Suykens
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2306.05815
Pdf link: https://arxiv.org/pdf/2306.05815
Abstract The goal of this paper is to revisit Kernel Principal Component Analysis (KPCA) through dualization of a difference of convex functions. This allows to naturally extend KPCA to multiple objective functions and leads to efficient gradient-based algorithms avoiding the expensive SVD of the Gram matrix. Particularly, we consider objective functions that can be written as Moreau envelopes, demonstrating how to promote robustness and sparsity within the same framework. The proposed method is evaluated on synthetic and real-world benchmarks, showing significant speedup in KPCA training time as well as highlighting the benefits in terms of robustness and sparsity.
Detecting Phishing Sites Using ChatGPT
Authors: Takashi Koide, Naoki Fukushi, Hiroki Nakano, Daiki Chiba
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2306.05816
Pdf link: https://arxiv.org/pdf/2306.05816
Abstract The rise of large language models (LLMs) has had a significant impact on various domains, including natural language processing and artificial intelligence. While LLMs such as ChatGPT have been extensively researched for tasks such as code generation and text synthesis, their application in detecting malicious web content, particularly phishing sites, has been largely unexplored. To combat the rising tide of automated cyber attacks facilitated by LLMs, it is imperative to automate the detection of malicious web content, which requires approaches that leverage the power of LLMs to analyze and classify phishing sites. In this paper, we propose a novel method that utilizes ChatGPT to detect phishing sites. Our approach involves leveraging a web crawler to gather information from websites and generate prompts based on this collected data. This approach enables us to detect various phishing sites without the need for fine-tuning machine learning models and identify social engineering techniques from the context of entire websites and URLs. To evaluate the performance of our proposed method, we conducted experiments using a dataset. The experimental results using GPT-4 demonstrated promising performance, with a precision of 98.3% and a recall of 98.4%. Comparative analysis between GPT-3.5 and GPT-4 revealed an enhancement in the latter's capability to reduce false negatives. These findings not only highlight the potential of LLMs in efficiently identifying phishing sites but also have significant implications for enhancing cybersecurity measures and protecting users from the dangers of online fraudulent activities.
Complexity of Reachability Problems in Neural Networks
Authors: Adrian Wurm
Subjects: Computational Complexity (cs.CC)
Arxiv link: https://arxiv.org/abs/2306.05818
Pdf link: https://arxiv.org/pdf/2306.05818
Abstract In this paper we investigate formal verification problems for Neural Network computations. Various reachability problems will be in the focus, such as: Given symbolic specifications of allowed inputs and outputs in form of Linear Programming instances, one question is whether valid inputs exist such that the given network computes a valid output? Does this property hold for all valid inputs? The former question's complexity has been investigated recently by Sälzer and Lange for nets using the Rectified Linear Unit and the identity function as their activation functions. We complement their achievements by showing that the problem is NP-complete for piecewise linear functions with rational coefficients that are not linear, NP-hard for almost all suitable activation functions including non-linear ones that are continuous on an interval, complete for the Existential Theory of the Reals $\exists \mathbb R$ for every non-linear polynomial and $\exists \mathbb R$-hard for the exponential function and various sigmoidal functions. For the completeness results, linking the verification tasks with the theory of Constraint Satisfaction Problems turns out helpful.
Simulation of the 3D Radiative Transfer with Anisotropic Scattering for Convective Trails
Authors: Olivier Pironneau, Pierre-Henri Tournier
Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph)
Arxiv link: https://arxiv.org/abs/2306.05833
Pdf link: https://arxiv.org/pdf/2306.05833
Abstract The integro-differential formulation of the RTE and its solution by iterations on the source has been extended here to handle anisotropic scattering. The iterative part of the method is O(N ln N), thanks to an efficient use of H-matrices. The precision is good enough to evaluate the effect of sensitive parameters for the study of contrails. Most of the time the stratified 1D approximation should suffice, but in complex cases with high relief the 3D formulation is needed.
A Complete Proof Synthesis Method for the Cube of Type Systems
Authors: Gilles Dowek (DEDUCTEAM)
Subjects: Logic in Computer Science (cs.LO)
Arxiv link: https://arxiv.org/abs/2306.05835
Pdf link: https://arxiv.org/pdf/2306.05835
Abstract We present a complete proof synthesis method for the eight type systems of Barendregt's cube extended with $\eta$-conversion. Because these systems verify the proofs-as-objects paradigm, the proof synthesis method is a one level process merging unification and resolution. Then we present a variant of this method, which is incomplete but much more efficient. At last we show how to turn this algorithm into a unification algorithm.
Expectation-Complete Graph Representations with Homomorphisms
Authors: Pascal Welke, Maximilian Thiessen, Fabian Jogl, Thomas Gärtner
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2306.05838
Pdf link: https://arxiv.org/pdf/2306.05838
Abstract We investigate novel random graph embeddings that can be computed in expected polynomial time and that are able to distinguish all non-isomorphic graphs in expectation. Previous graph embeddings have limited expressiveness and either cannot distinguish all graphs or cannot be computed efficiently for every graph. To be able to approximate arbitrary functions on graphs, we are interested in efficient alternatives that become arbitrarily expressive with increasing resources. Our approach is based on Lovász' characterisation of graph isomorphism through an infinite dimensional vector of homomorphism counts. Our empirical evaluation shows competitive results on several benchmark graph learning tasks.
Detecting Adversarial Directions in Deep Reinforcement Learning to Make Robust Decisions
Authors: Ezgi Korkmaz, Jonah Brown-Cohen
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2306.05873
Pdf link: https://arxiv.org/pdf/2306.05873
Abstract Learning in MDPs with highly complex state representations is currently possible due to multiple advancements in reinforcement learning algorithm design. However, this incline in complexity, and furthermore the increase in the dimensions of the observation came at the cost of volatility that can be taken advantage of via adversarial attacks (i.e. moving along worst-case directions in the observation space). To solve this policy instability problem we propose a novel method to detect the presence of these non-robust directions via local quadratic approximation of the deep neural policy loss. Our method provides a theoretical basis for the fundamental cut-off between safe observations and adversarial observations. Furthermore, our technique is computationally efficient, and does not depend on the methods used to produce the worst-case directions. We conduct extensive experiments in the Arcade Learning Environment with several different adversarial attack techniques. Most significantly, we demonstrate the effectiveness of our approach even in the setting where non-robust directions are explicitly optimized to circumvent our proposed method.
Efficient parallelization strategy for real-time FE simulations
Authors: Ziqiu Zeng (MIMESIS, UNISTRA, ICube), Hadrien Courtecuisse (MIMESIS, UNISTRA, ICube)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2306.05893
Pdf link: https://arxiv.org/pdf/2306.05893
Abstract This paper introduces an efficient and generic framework for finite-element simulations under an implicit time integration scheme. Being compatible with generic constitutive models, a fast matrix assembly method exploits the fact that system matrices are created in a deterministic way as long as the mesh topology remains constant. Using the sparsity pattern of the assembled system brings about significant optimizations on the assembly stage. As a result, developed techniques of GPU-based parallelization can be directly applied with the assembled system. Moreover, an asynchronous Cholesky precondition scheme is used to improve the convergence of the system solver. On this basis, a GPU-based Cholesky preconditioner is developed, significantly reducing the data transfer between the CPU/GPU during the solving stage. We evaluate the performance of our method with different mesh elements and hyperelastic models and compare it with typical approaches on the CPU and the GPU.
TreeDQN: Learning to minimize Branch-and-Bound tree
Authors: Dmitry Sorokin, Alexander Kostin
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2306.05905
Pdf link: https://arxiv.org/pdf/2306.05905
Abstract Combinatorial optimization problems require an exhaustive search to find the optimal solution. A convenient approach to solving combinatorial optimization tasks in the form of Mixed Integer Linear Programs is Branch-and-Bound. Branch-and-Bound solver splits a task into two parts dividing the domain of an integer variable, then it solves them recursively, producing a tree of nested sub-tasks. The efficiency of the solver depends on the branching heuristic used to select a variable for splitting. In the present work, we propose a reinforcement learning method that can efficiently learn the branching heuristic. We view the variable selection task as a tree Markov Decision Process, prove that the Bellman operator adapted for the tree Markov Decision Process is contracting in mean, and propose a modified learning objective for the reinforcement learning agent. Our agent requires less training data and produces smaller trees compared to previous reinforcement learning methods.
Sketch2Stress: Sketching with Structural Stress Awareness
Authors: Deng Yu, Chufeng Xiao, Manfred Lau, Hongbo Fu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2306.05911
Pdf link: https://arxiv.org/pdf/2306.05911
Abstract In the process of product design and digital fabrication, the structural analysis of a designed prototype is a fundamental and essential step. However, such a step is usually invisible or inaccessible to designers at the early sketching phase. This limits the user's ability to consider a shape's physical properties and structural soundness. To bridge this gap, we introduce a novel approach Sketch2Stress that allows users to perform structural analysis of desired objects at the sketching stage. This method takes as input a 2D freehand sketch and one or multiple locations of user-assigned external forces. With the specially-designed two-branch generative-adversarial framework, it automatically predicts a normal map and a corresponding structural stress map distributed over the user-sketched underlying object. In this way, our method empowers designers to easily examine the stress sustained everywhere and identify potential problematic regions of their sketched object. Furthermore, combined with the predicted normal map, users are able to conduct a region-wise structural analysis efficiently by aggregating the stress effects of multiple forces in the same direction. Finally, we demonstrate the effectiveness and practicality of our system with extensive experiments and user studies.
GAN-CAN: A Novel Attack to Behavior-Based Driver Authentication Systems
Authors: Emad Efatinasab, Francesco Marchiori, Denis Donadel, Alessandro Brighente
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2306.05923
Pdf link: https://arxiv.org/pdf/2306.05923
Abstract For many years, car keys have been the sole means of authentication in vehicles. Whether the access control process is physical or wireless, entrusting the ownership of a vehicle to a single token is prone to stealing attempts. For this reason, many researchers started developing behavior-based authentication systems. By collecting data in a moving vehicle, Deep Learning (DL) models can recognize patterns in the data and identify drivers based on their driving behavior. This can be used as an anti-theft system, as a thief would exhibit a different driving style compared to the vehicle owner's. However, the assumption that an attacker cannot replicate the legitimate driver behavior falls under certain conditions. In this paper, we propose GAN-CAN, the first attack capable of fooling state-of-the-art behavior-based driver authentication systems in a vehicle. Based on the adversary's knowledge, we propose different GAN-CAN implementations. Our attack leverages the lack of security in the Controller Area Network (CAN) to inject suitably designed time-series data to mimic the legitimate driver. Our design of the malicious time series results from the combination of different Generative Adversarial Networks (GANs) and our study on the safety importance of the injected values during the attack. We tested GAN-CAN in an improved version of the most efficient driver behavior-based authentication model in the literature. We prove that our attack can fool it with an attack success rate of up to 0.99. We show how an attacker, without prior knowledge of the authentication system, can steal a car by deploying GAN-CAN in an off-the-shelf system in under 22 minutes.
Positivity certificates for linear recurrences
Authors: Alaa Ibrahim, Bruno Salvy
Subjects: Symbolic Computation (cs.SC); Discrete Mathematics (cs.DM)
Arxiv link: https://arxiv.org/abs/2306.05930
Pdf link: https://arxiv.org/pdf/2306.05930
Abstract We show that for solutions of linear recurrences with polynomial coefficients of Poincaré type and with a unique simple dominant eigenvalue, positivity reduces to deciding the genericity of initial conditions in a precisely defined way. We give an algorithm that produces a certificate of positivity that is a data-structure for a proof by induction. This induction works by showing that an explicitly computed cone is contracted by the iteration of the recurrence.
DDLP: Unsupervised Object-Centric Video Prediction with Deep Dynamic Latent Particles
Authors: Tal Daniel, Aviv Tamar
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.05957
Pdf link: https://arxiv.org/pdf/2306.05957
Abstract We propose a new object-centric video prediction algorithm based on the deep latent particle (DLP) representation. In comparison to existing slot- or patch-based representations, DLPs model the scene using a set of keypoints with learned parameters for properties such as position and size, and are both efficient and interpretable. Our method, deep dynamic latent particles (DDLP), yields state-of-the-art object-centric video prediction results on several challenging datasets. The interpretable nature of DDLP allows us to perform "what-if" generation -- predict the consequence of changing properties of objects in the initial frames, and DLP's compact structure enables efficient diffusion-based unconditional video generation. Videos, code and pre-trained models are available: https://taldatech.github.io/ddlp-web
Automating Model Comparison in Factor Graphs
Authors: Bart van Erp, Wouter W. L. Nuijten, Thijs van de Laar, Bert de Vries
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2306.05965
Pdf link: https://arxiv.org/pdf/2306.05965
Abstract Bayesian state and parameter estimation have been automated effectively in the literature, however, this has not yet been the case for model comparison, which therefore still requires error-prone and time-consuming manual derivations. As a result, model comparison is often overlooked and ignored, despite its importance. This paper efficiently automates Bayesian model averaging, selection, and combination by message passing on a Forney-style factor graph with a custom mixture node. Parameter and state inference, and model comparison can then be executed simultaneously using message passing with scale factors. This approach shortens the model design cycle and allows for the straightforward extension to hierarchical and temporal model priors to accommodate for modeling complicated time-varying processes.
Efficient Tensor-Product Spectral-Element Operators with the Summation-by-Parts Property on Curved Triangles and Tetrahedra
Authors: Tristan Montoya, David W. Zingg
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2306.05975
Pdf link: https://arxiv.org/pdf/2306.05975
Abstract We present an extension of the summation-by-parts (SBP) framework to tensor-product spectral-element operators in collapsed coordinates. The proposed approach enables the construction of provably stable discretizations of arbitrary order which combine the geometric flexibility of unstructured triangular and tetrahedral meshes with the efficiency of sum-factorization algorithms. Specifically, a methodology is developed for constructing triangular and tetrahedral spectral-element operators of any order which possess the SBP property (i.e. satisfying a discrete analogue of integration by parts) as well as a tensor-product decomposition. Such operators are then employed within the context of discontinuous spectral-element methods based on nodal expansions collocated at the tensor-product quadrature nodes as well as modal expansions employing Proriol-Koornwinder-Dubiner polynomials, the latter approach resolving the time step limitation associated with the singularity of the collapsed coordinate transformation. Energy-stable formulations for curvilinear meshes are obtained using a skew-symmetric splitting of the metric terms, and a weight-adjusted approximation is used to efficiently invert the curvilinear modal mass matrix. The proposed schemes are compared to those using non-tensorial multidimensional SBP operators, and are found to offer comparable accuracy to such schemes in the context of smooth linear advection problems on curved meshes, but at a reduced computational cost for higher polynomial degrees.
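To make the SBP property mentioned in this abstract concrete, here is a minimal sketch using the classical second-order finite-difference SBP operator on a uniform 1D grid. This is a deliberately simpler stand-in, not the tensor-product spectral-element operators on triangles and tetrahedra that the paper constructs; the helper names `sbp_second_order` and `derivative` are assumptions for illustration. The operator is $D = H^{-1}Q$ with $Q + Q^T = \mathrm{diag}(-1, 0, \dots, 0, 1)$, which yields the discrete analogue of integration by parts checked at the end.

```python
def sbp_second_order(n, h):
    """Return (H_diag, Q) for the classical second-order 1D SBP operator."""
    h_diag = [h] * n
    h_diag[0] = h_diag[-1] = h / 2.0      # trapezoidal-rule quadrature weights
    q = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        q[i][i + 1] = 0.5                 # skew-symmetric interior part
        q[i + 1][i] = -0.5
    q[0][0] = -0.5                        # boundary corrections
    q[-1][-1] = 0.5
    return h_diag, q


def derivative(h_diag, q, u):
    """Apply D = H^{-1} Q to the grid function u."""
    n = len(u)
    return [sum(q[i][j] * u[j] for j in range(n)) / h_diag[i] for i in range(n)]


n, h = 5, 0.25
h_diag, q = sbp_second_order(n, h)

# SBP property: Q + Q^T is zero except for -1 and +1 at the two boundary nodes.
boundary = [[q[i][j] + q[j][i] for j in range(n)] for i in range(n)]

u = [i * h for i in range(n)]            # u(x) = x on [0, 1]
v = [(i * h) ** 2 for i in range(n)]     # v(x) = x^2 on [0, 1]
du, dv = derivative(h_diag, q, u), derivative(h_diag, q, v)

# Discrete integration by parts: u^T H (Dv) + (Du)^T H v = u_N v_N - u_0 v_0
lhs = sum(h_diag[i] * (u[i] * dv[i] + du[i] * v[i]) for i in range(n))
rhs = u[-1] * v[-1] - u[0] * v[0]
print(abs(lhs - rhs) < 1e-12)
```

The identity holds for any grid functions `u` and `v`, not only the polynomials chosen here, because it follows directly from the structure of `Q`.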
Distributed Consensus Algorithm for Decision-Making in Multi-agent Multi-armed Bandit
Authors: Xiaotong Cheng, Setareh Maghsudi
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2306.05998
Pdf link: https://arxiv.org/pdf/2306.05998
Abstract We study a structured multi-agent multi-armed bandit (MAMAB) problem in a dynamic environment. A graph reflects the information-sharing structure among agents, and the arms' reward distributions are piecewise-stationary with several unknown change points. The agents face the identical piecewise-stationary MAB problem. The goal is to develop a decision-making policy for the agents that minimizes the regret, which is the expected total loss of not playing the optimal arm at each time step. Our proposed solution, Restarted Bayesian Online Change Point Detection in Cooperative Upper Confidence Bound Algorithm (RBO-Coop-UCB), involves an efficient multi-agent UCB algorithm as its core enhanced with a Bayesian change point detector. We also develop a simple restart decision cooperation that improves decision-making. Theoretically, we establish that the expected group regret of RBO-Coop-UCB is upper bounded by $\mathcal{O}(KNM\log T + K\sqrt{MT\log T})$, where K is the number of agents, M is the number of arms, and T is the number of time steps. Numerical experiments on synthetic and real-world datasets demonstrate that our proposed method outperforms the state-of-the-art algorithms.
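For readers unfamiliar with the UCB building block named in this abstract, the following sketch shows the textbook single-agent UCB1 index (empirical mean plus an exploration bonus). It is not RBO-Coop-UCB: the cooperative information sharing, the Bayesian change point detector and the restart rule are not specified in the abstract and are omitted. The function names and the sample statistics are assumptions for illustration.

```python
import math


def ucb1_index(mean_reward, pulls, t):
    """Textbook UCB1 index; arms that were never played get priority."""
    if pulls == 0:
        return math.inf
    return mean_reward + math.sqrt(2.0 * math.log(t) / pulls)


def select_arm(means, pulls, t):
    """Pick the arm with the largest UCB1 index at time step t."""
    indices = [ucb1_index(m, n, t) for m, n in zip(means, pulls)]
    return max(range(len(indices)), key=indices.__getitem__)


means = [0.6, 0.5, 0.0]   # empirical mean reward per arm (made-up numbers)
pulls = [50, 5, 0]        # how often each arm has been played so far

print(select_arm(means, pulls, t=55))          # 2: the unplayed arm comes first
print(select_arm(means[:2], pulls[:2], t=55))  # 1: fewer pulls, larger bonus
```

In the second call the arm with the lower empirical mean is chosen because it has been played far less often, so its exploration bonus dominates.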
Semi-online Scheduling with Lookahead
Authors: Debasis Dwibedy, Rakesh Mohanty
Subjects: Data Structures and Algorithms (cs.DS); Operating Systems (cs.OS)
Arxiv link: https://arxiv.org/abs/2306.06003
Pdf link: https://arxiv.org/pdf/2306.06003
Abstract The knowledge of future partial information in the form of a lookahead to design efficient online algorithms is a theoretically-efficient and realistic approach to solving computational problems. Design and analysis of semi-online algorithms with extra-piece-of-information (EPI) as a new input parameter has gained the attention of the theoretical computer science community in the last couple of decades. Though competitive analysis is a pessimistic worst-case performance measure to analyze online algorithms, it has immense theoretical value in developing the foundation and advancing the state-of-the-art contributions in online and semi-online scheduling. In this paper, we study and explore the impact of lookahead as an EPI in the context of online scheduling in identical machine frameworks. We introduce a $k$-lookahead model and design improved competitive semi-online algorithms. For a $2$-identical machine setting, we prove a lower bound of $\frac{4}{3}$ and design an optimal algorithm with a matching upper bound of $\frac{4}{3}$ on the competitive ratio. For a $3$-identical machine setting, we show a lower bound of $\frac{15}{11}$ and design a $\frac{16}{11}$-competitive improved semi-online algorithm.
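As background for the competitive ratios quoted in this abstract, the sketch below implements the classical greedy list-scheduling rule for purely online scheduling on identical machines (each arriving job goes to the currently least-loaded machine) and compares its makespan with the optimum on a small instance. This is a baseline without lookahead, not the $k$-lookahead algorithms of the paper, which the abstract does not describe; the function name `greedy_makespan` and the job list are assumptions for illustration.

```python
import heapq


def greedy_makespan(jobs, machines):
    """Assign each job, in arrival order, to the least-loaded machine."""
    loads = [0] * machines
    heapq.heapify(loads)
    for processing_time in jobs:
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + processing_time)
    return max(loads)


jobs = [1, 1, 2]                              # processing times in arrival order
online = greedy_makespan(jobs, machines=2)    # loads end up as 3 and 1
optimal = 2                                   # {2} on one machine, {1, 1} on the other

print(online / optimal)  # 1.5
```

On this instance the greedy rule is a factor 1.5 worse than the optimum; the bounds in the abstract concern how far extra information about future jobs can push that ratio down.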
SNeL: A Structured Neuro-Symbolic Language for Entity-Based Multimodal Scene Understanding
Authors: Silvan Ferreira, Allan Martins, Ivanovitch Silva
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.06036
Pdf link: https://arxiv.org/pdf/2306.06036
Abstract In the evolving landscape of artificial intelligence, multimodal and Neuro-Symbolic paradigms stand at the forefront, with a particular emphasis on the identification and interaction with entities and their relations across diverse modalities. Addressing the need for complex querying and interaction in this context, we introduce SNeL (Structured Neuro-symbolic Language), a versatile query language designed to facilitate nuanced interactions with neural networks processing multimodal data. SNeL's expressive interface enables the construction of intricate queries, supporting logical and arithmetic operators, comparators, nesting, and more. This allows users to target specific entities, specify their properties, and limit results, thereby efficiently extracting information from a scene. By aligning high-level symbolic reasoning with low-level neural processing, SNeL effectively bridges the Neuro-Symbolic divide. The language's versatility extends to a variety of data types, including images, audio, and text, making it a powerful tool for multimodal scene understanding. Our evaluations demonstrate SNeL's potential to reshape the way we interact with complex neural networks, underscoring its efficacy in driving targeted information extraction and facilitating a deeper understanding of the rich semantics encapsulated in multimodal AI models.
Combining a Meta-Policy and Monte-Carlo Planning for Scalable Type-Based Reasoning in Partially Observable Environments
Authors: Jonathon Schwartz, Hanna Kurniawati, Marcus Hutter
Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2306.06067
Pdf link: https://arxiv.org/pdf/2306.06067
Abstract The design of autonomous agents that can interact effectively with other agents without prior coordination is a core problem in multi-agent systems. Type-based reasoning methods achieve this by maintaining a belief over a set of potential behaviours for the other agents. However, current methods are limited in that they assume full observability of the state and actions of the other agent or do not scale efficiently to larger problems with longer planning horizons. Addressing these limitations, we propose Partially Observable Type-based Meta Monte-Carlo Planning (POTMMCP) - an online Monte-Carlo Tree Search based planning method for type-based reasoning in large partially observable environments. POTMMCP incorporates a novel meta-policy for guiding search and evaluating beliefs, allowing it to search more effectively to longer horizons using less planning time. We show that our method converges to the optimal solution in the limit and empirically demonstrate that it effectively adapts online to diverse sets of other agents across a range of environments. Comparisons with the state-of-the-art method on problems with up to $10^{14}$ states and $10^8$ observations indicate that POTMMCP is able to compute better solutions significantly faster.
Improved flood mapping for efficient policy design by fusion of Sentinel-1, Sentinel-2, and Landsat-9 imagery to identify population and infrastructure exposed to floods
Authors: Usman Nazir, Muhammad Ahmad Waseem, Falak Sher Khan, Rabia Saeed, Syed Muhammad Hasan, Momin Uppal, Zubair Khalid
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.06074
Pdf link: https://arxiv.org/pdf/2306.06074
Abstract A reliable yet inexpensive tool for the estimation of flood water spread is conducive to efficient disaster management. The application of optical and SAR imagery in tandem provides a means of extended availability and enhanced reliability of flood mapping. We propose a methodology to merge these two types of imagery into a common data space and demonstrate its use in the identification of affected populations and infrastructure for the 2022 floods in Pakistan. The merging of optical and SAR data provides us with improved observations in cloud-prone regions, which are then used to gain additional insights into flood mapping applications. The use of open source datasets from WorldPop and OSM for population and roads, respectively, makes the exercise globally replicable. The integration of flood maps with spatial data on population and infrastructure facilitates informed policy design. We have shown that within the top five flood-affected districts in Sindh province, Pakistan, the affected population accounts for 31%, while the length of affected roads measures 1410.25 km out of a total of 7537.96 km.
DeepSeaNet: Improving Underwater Object Detection using EfficientDet
Authors: Sanyam Jain
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.06075
Pdf link: https://arxiv.org/pdf/2306.06075
Abstract Marine animals and deep underwater objects are difficult to recognize and monitor for safety of aquatic life. There is an increasing challenge when the water is saline with granular particles and impurities. In such a natural adversarial environment, traditional approaches like CNN start to fail and are expensive to compute. This project involves implementing and evaluating various object detection models, including EfficientDet, YOLOv5, YOLOv8, and Detectron2, on an existing annotated underwater dataset, called the Brackish-Dataset. The dataset comprises annotated image sequences of fish, crabs, starfish, and other aquatic animals captured in Limfjorden water with limited visibility. The aim of this research project is to study the efficiency of newer models on the same dataset and contrast them with the previous results based on accuracy and inference time. Firstly, I compare the results of YOLOv3 (31.10% mean Average Precision (mAP)), YOLOv4 (83.72% mAP), YOLOv5 (97.6%), YOLOv8 (98.20%), EfficientDet (98.56% mAP) and Detectron2 (95.20% mAP) on the same dataset. Secondly, I provide a modified BiSkFPN mechanism (BiFPN neck with skip connections) to perform complex feature fusion in adversarial noise, which makes the modified EfficientDet robust to perturbations. Third, I analyze the effect on accuracy of EfficientDet (98.63% mAP) and YOLOv5 by adversarial learning (98.04% mAP). Last, I provide class activation map (CAM) based explanations for the two models to promote explainability in black box models. Overall, the results indicate that the modified EfficientDet achieved higher accuracy with five-fold cross validation than the other models, with 88.54% IoU of feature maps.
Error Feedback Can Accurately Compress Preconditioners
Authors: Ionut-Vlad Modoranu, Aleksei Kalinov, Eldar Kurtic, Dan Alistarh
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2306.06098
Pdf link: https://arxiv.org/pdf/2306.06098
Abstract Leveraging second-order information at the scale of deep networks is one of the main lines of approach for improving the performance of current optimizers for deep learning. Yet, existing approaches for accurate full-matrix preconditioning, such as Full-Matrix Adagrad (GGT) or Matrix-Free Approximate Curvature (M-FAC), suffer from massive storage costs when applied even to medium-scale models, as they must store a sliding window of gradients, whose memory requirements are multiplicative in the model dimension. In this paper, we address this issue via an efficient and simple-to-implement error-feedback technique that can be applied to compress preconditioners by up to two orders of magnitude in practice, without loss of convergence. Specifically, our approach compresses the gradient information via sparsification or low-rank compression before it is fed into the preconditioner, feeding the compression error back into future iterations. Extensive experiments on deep neural networks for vision show that this approach can compress full-matrix preconditioners by up to two orders of magnitude without impact on accuracy, effectively removing the memory overhead of full-matrix preconditioning for implementations of full-matrix Adagrad (GGT) and natural gradient (M-FAC). Our code is available at https://github.com/IST-DASLab/EFCP.
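The error-feedback idea described in this abstract (compress the gradient, then carry the compression error into later iterations) can be sketched in a few lines. The version below uses top-k sparsification as the compressor and omits the preconditioner entirely; it is a generic illustration under those assumptions, not the EFCP code, and the class and function names are hypothetical.

```python
def top_k_sparsify(vec, k):
    """Keep the k largest-magnitude entries of vec and zero the rest."""
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]


class ErrorFeedbackCompressor:
    """Accumulates what compression dropped and re-injects it next step."""

    def __init__(self, dim, k):
        self.k = k
        self.error = [0.0] * dim

    def compress(self, grad):
        # Add the residual left over from earlier steps.
        corrected = [g + e for g, e in zip(grad, self.error)]
        compressed = top_k_sparsify(corrected, self.k)
        # Store what was dropped so it is not lost permanently.
        self.error = [c - s for c, s in zip(corrected, compressed)]
        return compressed


comp = ErrorFeedbackCompressor(dim=4, k=1)
first = comp.compress([1.0, 0.5, 0.25, 0.0])
second = comp.compress([0.0, 0.5, 0.0, 0.0])
```

In the second call the entry at index 1 is transmitted with value 1.0: the 0.5 dropped in the first step has been fed back and added to the new gradient. In the setting of the abstract, the compressed vector would be what is passed to the preconditioner.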
Keyword: faster

BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping
Authors: Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Lingjie Liu, Josh Susskind
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.05544
Pdf link: https://arxiv.org/pdf/2306.05544
Abstract Diffusion models have demonstrated excellent potential for generating diverse images. However, their performance often suffers from slow generation due to iterative denoising. Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few without significant quality degradation. However, existing distillation methods either require significant amounts of offline computation for generating synthetic training data from the teacher model or need to perform expensive online learning with the help of real data. In this work, we present a novel technique called BOOT that overcomes these limitations with an efficient data-free distillation algorithm. The core idea is to learn a time-conditioned model that predicts the output of a pre-trained diffusion model teacher given any time step. Such a model can be efficiently trained based on bootstrapping from two consecutive sampled steps. Furthermore, our method can be easily adapted to large-scale text-to-image diffusion models, which are challenging for conventional methods because the training sets are often large and difficult to access. We demonstrate the effectiveness of our approach on several benchmark datasets in the DDIM setting, achieving comparable generation quality while being orders of magnitude faster than the diffusion teacher. The text-to-image results show that the proposed approach is able to handle highly complex distributions, shedding light on more efficient generative modeling.
Single-Stage Visual Relationship Learning using Conditional Queries
Authors: Alakh Desai, Tz-Ying Wu, Subarna Tripathi, Nuno Vasconcelos
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.05689
Pdf link: https://arxiv.org/pdf/2306.05689
Abstract Research in scene graph generation (SGG) usually considers two-stage models, that is, detecting a set of entities, followed by combining them and labeling all possible relationships. While showing promising results, the pipeline structure induces large parameter and computation overhead, and typically hinders end-to-end optimizations. To address this, recent research attempts to train single-stage models that are computationally efficient. With the advent of DETR, a set-based detection model, one-stage models attempt to predict a set of subject-predicate-object triplets directly in a single shot. However, SGG is inherently a multi-task learning problem that requires modeling entity and predicate distributions simultaneously. In this paper, we propose Transformers with conditional queries for SGG, namely TraCQ, with a new formulation for SGG that avoids the multi-task learning problem and the combinatorial entity pair distribution. We employ a DETR-based encoder-decoder design and leverage conditional queries to significantly reduce the entity label space as well, which leads to 20% fewer parameters compared to state-of-the-art single-stage models. Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods but also beats many state-of-the-art two-stage methods on the Visual Genome dataset, while being capable of end-to-end training and faster inference.
Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion
Authors: Haogeng Liu, Tao Wang, Jie Cao, Ran He, Jianhua Tao
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2306.05708
Pdf link: https://arxiv.org/pdf/2306.05708
Abstract Denoising Diffusion Probabilistic Models have shown extraordinary ability on various generative tasks. However, their slow inference speed renders them impractical in speech synthesis. This paper proposes a linear diffusion model (LinDiff) based on an ordinary differential equation to simultaneously reach fast inference and high sample quality. Firstly, we employ linear interpolation between the target and noise to design a diffusion sequence for training, whereas previously the diffusion path that links the noise and target was a curved segment. When decreasing the number of sampling steps (i.e., the number of line segments used to fit the path), the ease of fitting straight lines compared to curves allows us to generate higher quality samples from random noise with fewer iterations. Secondly, to reduce computational complexity and achieve effective global modeling of noisy speech, LinDiff employs a patch-based processing approach that partitions the input signal into small patches. The patch-wise token leverages the Transformer architecture for effective modeling of global information. Adversarial training is used to further improve the sample quality with decreased sampling steps. We test the proposed method with speech synthesis conditioned on acoustic features (Mel-spectrograms). Experimental results verify that our model can synthesize high-quality speech even with only one diffusion step. Both subjective and objective evaluations demonstrate that our model can synthesize speech of a quality comparable to that of autoregressive models with faster synthesis speed (3 diffusion steps).
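The claim that a straight path needs few sampling steps can be illustrated with a toy example. The sketch below builds the linear interpolation between a target and a noise vector and then integrates the path backwards with Euler steps. It assumes a time convention of t=0 at the target and t=1 at the noise, and it uses the exact (oracle) velocity in place of a trained network, so it shows only the geometry, not LinDiff itself; all names are hypothetical.

```python
import random


def interpolate(target, noise, t):
    """Point on the straight line from target (t=0) to noise (t=1)."""
    return [(1.0 - t) * x + t * z for x, z in zip(target, noise)]


def euler_sample(noise, velocity_fn, steps):
    """Integrate dx/dt = velocity from t=1 down to t=0 with equal Euler steps."""
    x = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * dt
        v = velocity_fn(x, t)
        x = [xi - dt * vi for xi, vi in zip(x, v)]
    return x


rng = random.Random(0)
target = [0.5, -0.25, 1.0]                      # stand-in for a clean signal
noise = [rng.gauss(0.0, 1.0) for _ in target]


def oracle_velocity(x, t):
    """Exact derivative of the straight path: noise - target, constant in t."""
    return [z - x0 for z, x0 in zip(noise, target)]


recovered = euler_sample(noise, oracle_velocity, steps=1)
```

Because the velocity along a straight line is constant, a single Euler step already lands on the target here (up to floating-point rounding). With a curved path the same step would incur discretization error, which is the intuition the abstract appeals to.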
Efficient GNN Explanation via Learning Removal-based Attribution
Authors: Yao Rong, Guanchu Wang, Qizhang Feng, Ninghao Liu, Zirui Liu, Enkelejda Kasneci, Xia Hu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.05760
Pdf link: https://arxiv.org/pdf/2306.05760
Abstract As Graph Neural Networks (GNNs) have been widely used in real-world applications, model explanations are required not only by users but also by legal regulations. However, simultaneously achieving high fidelity and low computational costs in generating explanations has been a challenge for current methods. In this work, we propose a framework of GNN explanation named LeArn Removal-based Attribution (LARA) to address this problem. Specifically, we introduce removal-based attribution and demonstrate its substantiated link to interpretability fidelity theoretically and experimentally. The explainer in LARA learns to generate removal-based attribution which enables providing explanations with high fidelity. A strategy of subgraph sampling is designed in LARA to improve the scalability of the training process. In the deployment, LARA can efficiently generate the explanation through a feed-forward pass. We benchmark our approach with other state-of-the-art GNN explanation methods on six datasets. Results highlight the effectiveness of our framework regarding both efficiency and fidelity. In particular, LARA is 3.5 times faster and achieves higher fidelity than the state-of-the-art method on the large dataset ogbn-arxiv (more than 160K nodes and 1M edges), showing its great potential in real-world applications. Our source code is available at https://anonymous.4open.science/r/LARA-10D8/README.md.
Motion-DVAE: Unsupervised learning for fast human motion denoising
Authors: Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Renaud Séguier
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.05846
Pdf link: https://arxiv.org/pdf/2306.05846
Abstract Pose and motion priors are crucial for recovering realistic and accurate human motion from noisy observations. Substantial progress has been made on pose and shape estimation from images, and recent works showed impressive results using priors to refine frame-wise predictions. However, a lot of motion priors only model transitions between consecutive poses and are used in time-consuming optimization procedures, which is problematic for many applications requiring real-time motion capture. We introduce Motion-DVAE, a motion prior to capture the short-term dependencies of human motion. As part of the dynamical variational autoencoder (DVAE) models family, Motion-DVAE combines the generative capability of VAE models and the temporal modeling of recurrent architectures. Together with Motion-DVAE, we introduce an unsupervised learned denoising method unifying regression- and optimization-based approaches in a single framework for real-time 3D human pose estimation. Experiments show that the proposed approach reaches competitive performance with state-of-the-art methods while being much faster.
Towards Universally Optimal Shortest Paths Algorithms in the Hybrid Model
Authors: Philipp Schneider
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computational Complexity (cs.CC)
Arxiv link: https://arxiv.org/abs/2306.05977
Pdf link: https://arxiv.org/pdf/2306.05977
Abstract A drawback of the classic approach for complexity analysis of distributed graph problems is that it mostly informs about the complexity of notorious classes of "worst case" graphs. Algorithms that are used to prove a tight (existential) bound are essentially optimized to perform well on such worst case graphs. However, such graphs are often either unlikely or actively avoided in practice, where benign graph instances usually admit much faster solutions. To circumnavigate these drawbacks, the concept of universal complexity analysis in the distributed setting was suggested by [Kutten and Peleg, PODC'95] and actively pursued by [Haeupler et al., STOC'21]. Here, the aim is to gauge the complexity of a distributed graph problem depending on the given graph instance. The challenge is to identify and understand the graph property that allows one to accurately quantify the complexity of a distributed problem on a given graph. In the present work, we consider distributed shortest paths problems in the HYBRID model of distributed computing, where nodes have simultaneous access to two different modes of communication: one is restricted by locality and the other is restricted by congestion. We identify the graph parameter of neighborhood quality and show that it accurately describes a universal bound for the complexity of a certain class of shortest paths problems in the HYBRID model.
Combining a Meta-Policy and Monte-Carlo Planning for Scalable Type-Based Reasoning in Partially Observable Environments
Authors: Jonathon Schwartz, Hanna Kurniawati, Marcus Hutter
Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2306.06067
Pdf link: https://arxiv.org/pdf/2306.06067
Abstract The design of autonomous agents that can interact effectively with other agents without prior coordination is a core problem in multi-agent systems. Type-based reasoning methods achieve this by maintaining a belief over a set of potential behaviours for the other agents. However, current methods are limited in that they assume full observability of the state and actions of the other agent or do not scale efficiently to larger problems with longer planning horizons. Addressing these limitations, we propose Partially Observable Type-based Meta Monte-Carlo Planning (POTMMCP) - an online Monte-Carlo Tree Search based planning method for type-based reasoning in large partially observable environments. POTMMCP incorporates a novel meta-policy for guiding search and evaluating beliefs, allowing it to search more effectively to longer horizons using less planning time. We show that our method converges to the optimal solution in the limit and empirically demonstrate that it effectively adapts online to diverse sets of other agents across a range of environments. Comparisons with the state-of-the-art method on problems with up to $10^{14}$ states and $10^8$ observations indicate that POTMMCP is able to compute better solutions significantly faster.
Keyword: mobile

Lightweight Monocular Depth Estimation via Token-Sharing Transformer
Authors: Dong-Jae Lee, Jae Young Lee, Hyounguk Shon, Eojindl Yi, Yeong-Hun Park, Sung-Sik Cho, Junmo Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2306.05682
Pdf link: https://arxiv.org/pdf/2306.05682
Abstract Depth estimation is an important task in various robotics systems and applications. In mobile robotics systems, monocular depth estimation is desirable since a single RGB camera can be deployed at a low cost and compact size. Due to its significant and growing needs, many lightweight monocular depth estimation networks have been proposed for mobile robotics systems. While most lightweight monocular depth estimation methods have been developed using convolutional neural networks, the Transformer has been gradually utilized in monocular depth estimation recently. However, the massive parameter counts and large computational costs of the Transformer hinder deployment on embedded devices. In this paper, we present a Token-Sharing Transformer (TST), an architecture using the Transformer for monocular depth estimation, optimized especially for embedded devices. The proposed TST utilizes global token sharing, which enables the model to obtain an accurate depth prediction with high throughput on embedded devices. Experimental results show that TST outperforms the existing lightweight monocular depth estimation methods. On the NYU Depth v2 dataset, TST can deliver depth maps at up to 63.4 FPS on NVIDIA Jetson Nano and 142.6 FPS on NVIDIA Jetson TX2, with lower errors than the existing methods. Furthermore, TST achieves real-time depth estimation of high-resolution images on Jetson TX2 with competitive results.
DIFT: Dynamic Iterative Field Transforms for Memory Efficient Optical Flow
Authors: Risheek Garrepalli, Jisoo Jeong, Rajeswaran C Ravindran, Jamie Menjay Lin, Fatih Porikli
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.05691
Pdf link: https://arxiv.org/pdf/2306.05691
Abstract Recent advancements in neural network-based optical flow estimation often come with prohibitively high computational and memory requirements, presenting challenges in their model adaptation for mobile and low-power use cases. In this paper, we introduce a lightweight, low-latency, and memory-efficient model, Dynamic Iterative Field Transforms (DIFT), for optical flow estimation feasible for edge applications such as mobile, XR, micro UAVs, robotics and cameras. DIFT follows an iterative refinement framework leveraging variable resolution of cost volumes for correspondence estimation. We propose a memory-efficient solution for cost volume processing to reduce peak memory. Also, we present a novel dynamic coarse-to-fine cost volume processing during various stages of refinement to avoid multiple levels of cost volumes. We demonstrate the first real-time cost-volume based optical flow DL architecture on the Snapdragon 8 Gen 1 HTP efficient mobile AI accelerator with 32 inf/sec and 5.89 EPE (endpoint error) on KITTI with manageable accuracy-performance tradeoffs.
End-to-End Neural Network Compression via $\frac{\ell_1}{\ell_2}$ Regularized Latency Surrogates
Authors: Anshul Nasery, Hardik Shah, Arun Sai Suggala, Prateek Jain
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.05785
Pdf link: https://arxiv.org/pdf/2306.05785
Abstract Neural network (NN) compression via techniques such as pruning and quantization requires setting compression hyperparameters (e.g., number of channels to be pruned, bitwidths for quantization) for each layer either manually or via neural architecture search (NAS), which can be computationally expensive. We address this problem by providing an end-to-end technique that optimizes for the model's Floating Point Operations (FLOPs) or for on-device latency via a novel $\frac{\ell_1}{\ell_2}$ latency surrogate. Our algorithm is versatile and can be used with many popular compression methods including pruning, low-rank factorization, and quantization. Crucially, it is fast and runs in almost the same amount of time as single model training, which is a significant training speed-up over standard NAS methods. For BERT compression on GLUE fine-tuning tasks, we achieve $50\%$ reduction in FLOPs with only $1\%$ drop in performance. For compressing MobileNetV3 on ImageNet-1K, we achieve $15\%$ reduction in FLOPs, and $11\%$ reduction in on-device latency without drop in accuracy, while still requiring $3\times$ less training compute than SOTA compression techniques. Finally, for transfer learning on smaller datasets, our technique identifies $1.2\times$-$1.4\times$ cheaper architectures than the standard MobileNetV3 and EfficientNet suite of architectures at almost the same training cost and accuracy.
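The abstract does not spell out how the $\frac{\ell_1}{\ell_2}$ surrogate is constructed, so the sketch below shows only the generic property that makes such a ratio attractive as a sparsity measure: it is invariant to rescaling and ranges from 1 (a single nonzero entry) to $\sqrt{n}$ (all $n$ entries equal in magnitude). How the paper ties this quantity to FLOPs or latency is not reproduced here, and the function name is an assumption.

```python
import math


def l1_over_l2(values):
    """Ratio ||x||_1 / ||x||_2; undefined for the zero vector."""
    l1 = sum(abs(v) for v in values)
    l2 = math.sqrt(sum(v * v for v in values))
    if l2 == 0.0:
        raise ValueError("ratio is undefined for the zero vector")
    return l1 / l2


dense = l1_over_l2([1.0, 1.0, 1.0, 1.0])     # 4 / 2 = 2.0 = sqrt(4)
sparse = l1_over_l2([3.0, 0.0, 0.0, 0.0])    # 3 / 3 = 1.0
scaled = l1_over_l2([30.0, 0.0, 0.0, 0.0])   # rescaling leaves the ratio at 1.0
```

Unlike a plain $\ell_1$ penalty, the ratio cannot be reduced by shrinking every entry uniformly, so lowering it requires concentrating the mass on fewer entries.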
DeepStay: Stay Region Extraction from Location Trajectories using Weak Supervision
Authors: Christian Löwens, Daniela Thyssens, Emma Andersson, Christina Jenkins, Lars Schmidt-Thieme
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.06068
Pdf link: https://arxiv.org/pdf/2306.06068
Abstract Nowadays, mobile devices enable constant tracking of the user's position, and location trajectories can be used to infer personal points of interest (POIs) like homes, workplaces, or stores. A common way to extract POIs is to first identify spatio-temporal regions where a user spends a significant amount of time, known as stay regions (SRs). Common approaches to SR extraction are evaluated either solely unsupervised or on a small-scale private dataset, as popular public datasets are unlabeled. Most of these methods rely on hand-crafted features or thresholds and do not learn beyond hyperparameter optimization. Therefore, we propose a weakly and self-supervised transformer-based model called DeepStay, which is trained on location trajectories to predict stay regions. To the best of our knowledge, this is the first approach based on deep learning and the first approach that is evaluated on a public, labeled dataset. Our SR extraction method outperforms state-of-the-art methods. In addition, we conducted a limited experiment on the task of transportation mode detection from GPS trajectories using the same architecture and achieved significantly higher scores than the state-of-the-art. Our code is available at https://github.com/christianll9/deepstay.
Keyword: pruning

End-to-End Neural Network Compression via $\frac{\ell_1}{\ell_2}$ Regularized Latency Surrogates
Authors: Anshul Nasery, Hardik Shah, Arun Sai Suggala, Prateek Jain
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.05785
Pdf link: https://arxiv.org/pdf/2306.05785
Abstract Neural network (NN) compression via techniques such as pruning and quantization requires setting compression hyperparameters (e.g., number of channels to be pruned, bitwidths for quantization) for each layer either manually or via neural architecture search (NAS), which can be computationally expensive. We address this problem by providing an end-to-end technique that optimizes for the model's Floating Point Operations (FLOPs) or for on-device latency via a novel $\frac{\ell_1}{\ell_2}$ latency surrogate. Our algorithm is versatile and can be used with many popular compression methods including pruning, low-rank factorization, and quantization. Crucially, it is fast and runs in almost the same amount of time as single model training, which is a significant training speed-up over standard NAS methods. For BERT compression on GLUE fine-tuning tasks, we achieve $50\%$ reduction in FLOPs with only $1\%$ drop in performance. For compressing MobileNetV3 on ImageNet-1K, we achieve $15\%$ reduction in FLOPs, and $11\%$ reduction in on-device latency without drop in accuracy, while still requiring $3\times$ less training compute than SOTA compression techniques. Finally, for transfer learning on smaller datasets, our technique identifies $1.2\times$-$1.4\times$ cheaper architectures than the standard MobileNetV3 and EfficientNet suite of architectures at almost the same training cost and accuracy.
Keyword: diffusion

Word-Level Explanations for Analyzing Bias in Text-to-Image Models
Authors: Alexander Lin, Lucas Monteiro Paes, Sree Harsha Tanneru, Suraj Srinivas, Himabindu Lakkaraju
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.05500
Pdf link: https://arxiv.org/pdf/2306.05500
Abstract Text-to-image models take a sentence (i.e., prompt) and generate images associated with this input prompt. These models have created award-winning art, videos, and even synthetic datasets. However, text-to-image (T2I) models can generate images that underrepresent minorities based on race and sex. This paper investigates which word in the input prompt is responsible for bias in generated images. We introduce a method for computing scores for each word in the prompt; these scores represent its influence on biases in the model's output. Our method follows the principle of explaining by removing, leveraging masked language models to calculate the influence scores. We perform experiments on Stable Diffusion to demonstrate that our method identifies the replication of societal stereotypes in generated images.
FACTIFY3M: A Benchmark for Multimodal Fact Verification with Explainability through 5W Question-Answering
Authors: Megha Chakraborty, Khusbu Pahwa, Anku Rani, Adarsh Mahor, Aditya Pakala, Arghya Sarkar, Harshit Dave, Ishan Paul, Janvita Reddy, Preethi Gurumurthy, Ritvik G, Samahriti Mukherjee, Shreyas Chatterjee, Kinjal Sensharma, Dwip Dalal, Suryavardan S, Shreyash Mishra, Parth Patwa, Aman Chadha, Amit Sheth, Amitava Das
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2306.05523
Pdf link: https://arxiv.org/pdf/2306.05523
Abstract Combating disinformation is one of the burning societal crises -- about 67% of the American population believes that disinformation produces a lot of uncertainty, and 10% of them knowingly propagate disinformation. Evidence shows that disinformation can manipulate democratic processes and public opinion, causing disruption in the share market, panic and anxiety in society, and even death during crises. Therefore, disinformation should be identified promptly and, if possible, mitigated. With approximately 3.2 billion images and 720,000 hours of video shared online daily on social media platforms, scalable detection of multimodal disinformation requires efficient fact verification. Despite progress in automatic text-based fact verification (e.g., FEVER, LIAR), the research community lacks substantial effort in multimodal fact verification. To address this gap, we introduce FACTIFY 3M, a dataset of 3 million samples that pushes the boundaries of the domain of fact verification via a multimodal fake news dataset, in addition to offering explainability through the concept of 5W question-answering. Salient features of the dataset include: (i) textual claims, (ii) ChatGPT-generated paraphrased claims, (iii) associated images, (iv) stable diffusion-generated additional images (i.e., visual paraphrases), (v) pixel-level image heatmap to foster image-text explainability of the claim, (vi) 5W QA pairs, and (vii) adversarial fake news stories.
Explicit synchronous partitioned scheme for coupled reduced order models based on composite reduced bases
Authors: Amy de Castro, Pavel Bochev, Paul Kuberry, Irina Tezaur
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2306.05531
Pdf link: https://arxiv.org/pdf/2306.05531
Abstract This paper formulates, analyzes, and demonstrates numerically a method for the partitioned solution of coupled interface problems involving combinations of projection-based reduced order models (ROM) and/or full order methods (FOMs). The method builds on the partitioned scheme developed in [1], which starts from a well-posed formulation of the coupled interface problem and uses its dual Schur complement to obtain an approximation of the interface flux. Explicit time integration of this problem decouples its subdomain equations and enables their independent solution on each subdomain. Extension of this partitioned scheme to coupled ROM-ROM or ROM-FOM problems required formulations with non-singular Schur complements. To obtain these problems, we project a well-posed coupled FOM-FOM problem onto a composite reduced basis comprising separate sets of basis vectors for the interface and interior variables, and use the interface reduced basis as a Lagrange multiplier. Our analysis confirms that the resulting coupled ROM-ROM and ROM-FOM problems have provably non-singular Schur complements, independent of the mesh size and the reduced basis size. In the ROM-FOM case, analysis shows that one can also use the interface FOM space as a Lagrange multiplier. We illustrate the theoretical and computational properties of the partitioned scheme through reproductive and predictive tests for a model advection-diffusion transmission problem.
BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping
Authors: Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Lingjie Liu, Josh Susskind
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.05544
Pdf link: https://arxiv.org/pdf/2306.05544
Abstract Diffusion models have demonstrated excellent potential for generating diverse images. However, their performance often suffers from slow generation due to iterative denoising. Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few without significant quality degradation. However, existing distillation methods either require significant amounts of offline computation for generating synthetic training data from the teacher model or need to perform expensive online learning with the help of real data. In this work, we present a novel technique called BOOT, that overcomes these limitations with an efficient data-free distillation algorithm. The core idea is to learn a time-conditioned model that predicts the output of a pre-trained diffusion model teacher given any time step. Such a model can be efficiently trained based on bootstrapping from two consecutive sampled steps. Furthermore, our method can be easily adapted to large-scale text-to-image diffusion models, which are challenging for conventional methods given the fact that the training sets are often large and difficult to access. We demonstrate the effectiveness of our approach on several benchmark datasets in the DDIM setting, achieving comparable generation quality while being orders of magnitude faster than the diffusion teacher. The text-to-image results show that the proposed approach is able to handle highly complex distributions, shedding light on more efficient generative modeling.
Reconstructing the somatotopic organization of the corticospinal tract remains a challenge for modern tractography methods
Authors: Jianzhong He, Fan Zhang, Yiang Pan, Yuanjing Feng, Jarrett Rushmore, Erickson Torio, Yogesh Rathi, Nikos Makris, Ron Kikinis, Alexandra J. Golby, Lauren J. O'Donnell
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.05623
Pdf link: https://arxiv.org/pdf/2306.05623
Abstract The corticospinal tract (CST) is a critically important white matter fiber tract in the human brain that enables control of voluntary movements of the body. Diffusion MRI tractography is the only method that enables the study of the anatomy and variability of the CST pathway in human health. In this work, we explore the performance of six widely used tractography methods for reconstructing the CST and its somatotopic organization. We perform experiments using diffusion MRI data from the Human Connectome Project. We use four quantitative measurements, including reconstruction rate, WM-GM interface coverage, anatomical distribution of streamlines, and correlation with cortical volumes, to assess the advantages and limitations of each method. Overall, we conclude that while current tractography methods have made progress toward the well-known challenge of improving the reconstruction of the lateral projections of the CST, the overall problem of performing a comprehensive CST reconstruction, including clinically important projections in the lateral (hand and face area) and medial portions (leg area), remains an important challenge for diffusion MRI tractography.
RePaint-NeRF: NeRF Editting via Semantic Masks and Diffusion Models
Authors: Xingchen Zhou, Ying He, F. Richard Yu, Jianqiang Li, You Li
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2306.05668
Pdf link: https://arxiv.org/pdf/2306.05668
Abstract The emergence of Neural Radiance Fields (NeRF) has promoted the development of synthesized high-fidelity views of the intricate real world. However, it is still a very demanding task to repaint the content in NeRF. In this paper, we propose a novel framework that can take RGB images as input and alter the 3D content in neural scenes. Our work leverages existing diffusion models to guide changes in the designated 3D content. Specifically, we semantically select the target object and a pre-trained diffusion model will guide the NeRF model to generate new 3D objects, which can improve the editability, diversity, and application range of NeRF. Experiment results show that our algorithm is effective for editing 3D objects in NeRF under different text prompts, including editing appearance, shape, and more. We validate our method on both real-world datasets and synthetic-world datasets for these editing tasks. Please visit https://repaintnerf.github.io for a better view of our results.
Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion
Authors: Haogeng Liu, Tao Wang, Jie Cao, Ran He, Jianhua Tao
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2306.05708
Pdf link: https://arxiv.org/pdf/2306.05708
Abstract Denoising Diffusion Probabilistic Models have shown extraordinary ability on various generative tasks. However, their slow inference speed renders them impractical in speech synthesis. This paper proposes a linear diffusion model (LinDiff) based on an ordinary differential equation to simultaneously reach fast inference and high sample quality. Firstly, we employ linear interpolation between the target and noise to design a diffusion sequence for training, whereas previously the diffusion path that links the noise and target was a curved segment. When decreasing the number of sampling steps (i.e., the number of line segments used to fit the path), the ease of fitting straight lines compared to curves allows us to generate higher quality samples from random noise with fewer iterations. Secondly, to reduce computational complexity and achieve effective global modeling of noisy speech, LinDiff employs a patch-based processing approach that partitions the input signal into small patches. The patch-wise token leverages Transformer architecture for effective modeling of global information. Adversarial training is used to further improve the sample quality with decreased sampling steps. We test the proposed method on speech synthesis conditioned on acoustic features (Mel-spectrograms). Experimental results verify that our model can synthesize high-quality speech even with only one diffusion step. Both subjective and objective evaluations demonstrate that our model can synthesize speech of a quality comparable to that of autoregressive models with faster synthesis speed (3 diffusion steps).
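As a rough illustration of the straight-line diffusion path the abstract above describes, the sketch below interpolates linearly between a target signal and a noise sample. The function name `linear_path`, the time convention (t = 0 is the target, t = 1 is the noise), and the toy values are assumptions made for illustration; they are not taken from the paper.

```python
import random


def linear_path(target, noise, t):
    """Return the point at time t on the straight line from target to noise.

    Assumed convention (hypothetical): t = 0 gives the target, t = 1 the noise.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [(1.0 - t) * x + t * z for x, z in zip(target, noise)]


# Toy stand-in for a short speech frame and a Gaussian noise sample.
rng = random.Random(0)
target = [0.0, 0.5, 1.0, 0.5]
noise = [rng.gauss(0.0, 1.0) for _ in target]

# Three points along the path: clean target, midpoint, pure noise.
path = [linear_path(target, noise, t) for t in (0.0, 0.5, 1.0)]
```

Because every intermediate point lies on one line segment, a sampler that takes a few large steps follows the same path as one that takes many small steps, which is the intuition the abstract gives for sampling with fewer iterations.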
Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model
Authors: Yida Chen, Fernanda Viégas, Martin Wattenberg
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.05720
Pdf link: https://arxiv.org/pdf/2306.05720
Abstract Latent diffusion models (LDMs) exhibit an impressive ability to produce realistic images, yet the inner workings of these models remain mysterious. Even when trained purely on images without explicit depth information, they typically output coherent pictures of 3D scenes. In this work, we investigate a basic interpretability question: does an LDM create and use an internal representation of simple scene geometry? Using linear probes, we find evidence that the internal activations of the LDM encode linear representations of both 3D depth data and a salient-object / background distinction. These representations appear surprisingly early in the denoising process, well before a human can easily make sense of the noisy images. Intervention experiments further indicate these representations play a causal role in image synthesis, and may be used for simple high-level editing of an LDM's output.
DDLP: Unsupervised Object-Centric Video Prediction with Deep Dynamic Latent Particles
Authors: Tal Daniel, Aviv Tamar
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.05957
Pdf link: https://arxiv.org/pdf/2306.05957
Abstract We propose a new object-centric video prediction algorithm based on the deep latent particle (DLP) representation. In comparison to existing slot- or patch-based representations, DLPs model the scene using a set of keypoints with learned parameters for properties such as position and size, and are both efficient and interpretable. Our method, deep dynamic latent particles (DDLP), yields state-of-the-art object-centric video prediction results on several challenging datasets. The interpretable nature of DDLP allows us to perform "what-if" generation -- predict the consequence of changing properties of objects in the initial frames, and DLP's compact structure enables efficient diffusion-based unconditional video generation. Videos, code and pre-trained models are available: https://taldatech.github.io/ddlp-web
Neural FIM for learning Fisher Information Metrics from point cloud data
Authors: Oluwadamilola Fasina, Guillaume Huguet, Alexander Tong, Yanlei Zhang, Guy Wolf, Maximilian Nickel, Ian Adelstein, Smita Krishnaswamy
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.06062
Pdf link: https://arxiv.org/pdf/2306.06062
Abstract Although data diffusion embeddings are ubiquitous in unsupervised learning and have proven to be a viable technique for uncovering the underlying intrinsic geometry of data, diffusion embeddings are inherently limited due to their discrete nature. To this end, we propose neural FIM, a method for computing the Fisher information metric (FIM) from point cloud data - allowing for a continuous manifold model for the data. Neural FIM creates an extensible metric space from discrete point cloud data such that information from the metric can inform us of manifold characteristics such as volume and geodesics. We demonstrate Neural FIM's utility in selecting parameters for the PHATE visualization method as well as its ability to obtain information pertaining to local volume illuminating branching points and cluster centers embeddings of a toy dataset and two single-cell datasets of IPSC reprogramming and PBMCs (immune cells).
Keyword: adaptive

AaKOS: Aspect-adaptive Knowledge-based Opinion Summarization
Authors: Guan Wang, Weihua Li, Edmund M-K. Lai, Quan Bai
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2306.05537
Pdf link: https://arxiv.org/pdf/2306.05537
Abstract The rapid growth of information on the Internet has led to an overwhelming amount of opinions and comments on various activities, products, and services. This makes it difficult and time-consuming for users to process all the available information when making decisions. Text summarization, a Natural Language Processing (NLP) task, has been widely explored to help users quickly retrieve relevant information by generating short and salient content from long or multiple documents. Recent advances in pre-trained language models, such as ChatGPT, have demonstrated the potential of Large Language Models (LLMs) in text generation. However, LLMs require massive amounts of data and resources and are challenging to implement as offline applications. Furthermore, existing text summarization approaches often lack the "adaptive" nature required to capture diverse aspects in opinion summarization, which is particularly detrimental to users with specific requirements or preferences. In this paper, we propose an Aspect-adaptive Knowledge-based Opinion Summarization model for product reviews, which effectively captures the adaptive nature required for opinion summarization. The model generates aspect-oriented summaries given a set of reviews for a particular product, efficiently providing users with useful information on specific aspects they are interested in, ensuring the generated summaries are more personalized and informative. Extensive experiments have been conducted using real-world datasets to evaluate the proposed model. The results demonstrate that our model outperforms state-of-the-art approaches and is adaptive and efficient in generating summaries that focus on particular aspects, enabling users to make well-informed decisions and catering to their diverse interests and preferences.
Learning Domain-Aware Detection Head with Prompt Tuning
Authors: Haochen Li, Rui Zhang, Hantao Yao, Xinkai Song, Yifan Hao, Yongwei Zhao, Ling Li, Yunji Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.05718
Pdf link: https://arxiv.org/pdf/2306.05718
Abstract Domain adaptive object detection (DAOD) aims to generalize detectors trained on an annotated source domain to an unlabelled target domain. However, existing methods focus on reducing the domain bias of the detection backbone by inferring a discriminative visual encoder, while ignoring the domain bias in the detection head. Inspired by the high generalization of vision-language models (VLMs), applying a VLM as the robust detection backbone followed by a domain-aware detection head is a reasonable way to learn the discriminative detector for each domain, rather than reducing the domain bias as in traditional methods. To address this issue, we propose a novel DAOD framework named Domain-Aware detection head with Prompt tuning (DA-Pro), which applies the learnable domain-adaptive prompt to generate the dynamic detection head for each domain. Formally, the domain-adaptive prompt consists of the domain-invariant tokens, domain-specific tokens, and the domain-related textual description along with the class label. Furthermore, two constraints between the source and target domains are applied to ensure that the domain-adaptive prompt can capture the domain-shared and domain-specific knowledge. A prompt ensemble strategy is also proposed to reduce the effect of prompt disturbance. Comprehensive experiments over multiple cross-domain adaptation tasks demonstrate that using the domain-adaptive prompt can produce an effectively domain-related detection head for boosting domain-adaptive object detection.
DP-HyPO: An Adaptive Private Hyperparameter Optimization Framework
Authors: Hua Wang, Sheng Gao, Huanyu Zhang, Weijie J. Su, Milan Shen
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2306.05734
Pdf link: https://arxiv.org/pdf/2306.05734
Abstract Hyperparameter optimization, also known as hyperparameter tuning, is a widely recognized technique for improving model performance. Regrettably, when training private ML models, many practitioners often overlook the privacy risks associated with hyperparameter optimization, which could potentially expose sensitive information about the underlying dataset. Currently, the sole existing approach to allow privacy-preserving hyperparameter optimization is to uniformly and randomly select hyperparameters for a number of runs, subsequently reporting the best-performing hyperparameter. In contrast, in non-private settings, practitioners commonly utilize "adaptive" hyperparameter optimization methods such as Gaussian process-based optimization, which select the next candidate based on information gathered from previous outputs. This substantial contrast between private and non-private hyperparameter optimization underscores a critical concern. In our paper, we introduce DP-HyPO, a pioneering framework for "adaptive" private hyperparameter optimization, aiming to bridge the gap between private and non-private hyperparameter optimization. To accomplish this, we provide a comprehensive differential privacy analysis of our framework. Furthermore, we empirically demonstrate the effectiveness of DP-HyPO on a diverse set of real-world and synthetic datasets.
Adaptivity Complexity for Causal Graph Discovery
Authors: Davin Choo, Kirankumar Shiragur
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Methodology (stat.ME); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2306.05781
Pdf link: https://arxiv.org/pdf/2306.05781
Abstract Causal discovery from interventional data is an important problem, where the task is to design an interventional strategy that learns the hidden ground truth causal graph $G(V,E)$ on $|V| = n$ nodes while minimizing the number of performed interventions. Most prior interventional strategies broadly fall into two categories: non-adaptive and adaptive. Non-adaptive strategies decide on a single fixed set of interventions to be performed while adaptive strategies can decide on which nodes to intervene on sequentially based on past interventions. While adaptive algorithms may use exponentially fewer interventions than their non-adaptive counterparts, there are practical concerns that constrain the amount of adaptivity allowed. Motivated by this trade-off, we study the problem of $r$-adaptivity, where the algorithm designer recovers the causal graph under a total of $r$ sequential rounds whilst trying to minimize the total number of interventions. For this problem, we provide a $r$-adaptive algorithm that achieves $O(\min\{r,\log n\} \cdot n^{1/\min\{r,\log n\}})$ approximation with respect to the verification number, a well-known lower bound for adaptive algorithms. Furthermore, for every $r$, we show that our approximation is tight. Our definition of $r$-adaptivity interpolates nicely between the non-adaptive ($r=1$) and fully adaptive ($r=n$) settings where our approximation simplifies to $O(n)$ and $O(\log n)$ respectively, matching the best-known approximation guarantees for both extremes. Our results also extend naturally to the bounded size interventions.
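The two extremes quoted in the abstract above can be checked by substituting into the stated bound. This is only a worked substitution, assuming $n$ is large enough that $\log_b n \ge 1$ and writing $b > 1$ for the base of the logarithm (a symbol introduced here for illustration, not used in the abstract):

```latex
\begin{align*}
r = 1:\quad & \min\{1,\log_b n\} = 1
  \;\Rightarrow\; 1 \cdot n^{1/1} = n = O(n), \\
r = n:\quad & \min\{n,\log_b n\} = \log_b n
  \;\Rightarrow\; \log_b n \cdot n^{1/\log_b n} = b \,\log_b n = O(\log n).
\end{align*}
```

The second line uses the identity $n^{1/\log_b n} = b$, so the polynomial factor collapses to a constant once $r$ reaches $\log n$ rounds.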
Reflective Conditions for Radiative Transfer in Integral Form with H-Matrices
Authors: Olivier Pironneau, Pierre-Henri Tournier
Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph)
Arxiv link: https://arxiv.org/abs/2306.05789
Pdf link: https://arxiv.org/pdf/2306.05789
Abstract In a recent article, the authors showed that the radiative transfer equations with multiple frequencies and scattering can be formulated as a nonlinear integral system. In the present article, the formulation is extended to handle reflective boundary conditions. The fixed point method to solve the system is shown to be monotone. The discretization is done with a $P^1$ Finite Element Method. The convolution integrals are precomputed at every vertex of the mesh and stored in compressed hierarchical matrices, using Partially Pivoted Adaptive Cross-Approximation. Then the fixed point iterations involve only matrix vector products. The method is $O(N\sqrt[3]{N}\ln N)$, with respect to the number of vertices, when everything is smooth. A numerical implementation is proposed and tested on two examples. As there are some analogies with ray tracing, the programming is complex.
Domain-Agnostic Batch Bayesian Optimization with Diverse Constraints via Bayesian Quadrature
Authors: Masaki Adachi, Satoshi Hayakawa, Xingchen Wan, Martin Jørgensen, Harald Oberhauser, Michael A. Osborne
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Numerical Analysis (math.NA); Computation (stat.CO); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2306.05843
Pdf link: https://arxiv.org/pdf/2306.05843
Abstract Real-world optimisation problems often feature complex combinations of (1) diverse constraints, (2) discrete and mixed spaces, and are (3) highly parallelisable. (4) There are also cases where the objective function cannot be queried if unknown constraints are not satisfied, e.g. in drug discovery, safety on animal experiments (unknown constraints) must be established before human clinical trials (querying objective function) may proceed. However, most existing works target each of the above three problems in isolation and do not consider (4) unknown constraints with query rejection. For problems with diverse constraints and/or unconventional input spaces, it is difficult to apply these techniques as they are often mutually incompatible. We propose cSOBER, a domain-agnostic prudent parallel active sampler for Bayesian optimisation, based on SOBER of Adachi et al. (2023). We consider infeasibility under unknown constraints as a type of integration error that we can estimate. We propose a theoretically-driven approach that propagates such error as a tolerance in the quadrature precision that automatically balances exploitation and exploration with the expected rejection rate. Moreover, our method flexibly accommodates diverse constraints and/or discrete and mixed spaces via adaptive tolerance, including conventional zero-risk cases. We show that cSOBER outperforms competitive baselines on diverse real-world blackbox-constrained problems, including safety-constrained drug discovery, and human-relationship-aware team optimisation over graph-structured space.
Adaptive Multi-Armed Bandit Learning for Task Offloading in Edge Computing
Authors: Lin Wang, Jingjing Zhang
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2306.05856
Pdf link: https://arxiv.org/pdf/2306.05856
Abstract The widespread adoption of edge computing has emerged as a prominent trend for alleviating task processing delays and reducing energy consumption. However, the dynamic nature of network conditions and the varying computation capacities of edge servers (ESs) can introduce disparities between computation loads and available computing resources in edge computing networks, potentially leading to inadequate service quality. To address this challenge, this paper investigates a practical scenario characterized by dynamic task offloading. Initially, we examine traditional Multi-armed Bandit (MAB) algorithms, namely the $\varepsilon$-greedy algorithm and the UCB1-based algorithm. However, both algorithms exhibit certain weaknesses in effectively addressing the tidal data traffic patterns. Consequently, based on MAB, we propose an adaptive task offloading algorithm (ATOA) that overcomes these limitations. By conducting extensive simulations, we demonstrate the superiority of our ATOA solution in reducing task processing latency compared to conventional MAB methods. This substantiates the effectiveness of our approach in enhancing the performance of edge computing networks and improving overall service quality.
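For readers unfamiliar with the two baseline bandit rules named in the abstract above, here is a minimal sketch of textbook $\varepsilon$-greedy and UCB1 arm selection. Treating each edge server as an arm and its empirical mean reward as something like negative latency is an assumption made for illustration. The function names are hypothetical, and this is not the paper's ATOA algorithm.

```python
import math
import random


def epsilon_greedy(means, epsilon, rng):
    """With probability epsilon explore a random arm, else exploit the best mean."""
    if rng.random() < epsilon:
        return rng.randrange(len(means))
    return max(range(len(means)), key=lambda arm: means[arm])


def ucb1(means, counts, total_pulls):
    """Pick the arm with the largest mean plus exploration bonus.

    Arms that have never been pulled are tried first; otherwise the bonus is
    sqrt(2 * ln(total_pulls) / count), the standard UCB1 index.
    """
    for arm, count in enumerate(counts):
        if count == 0:
            return arm
    return max(
        range(len(means)),
        key=lambda arm: means[arm]
        + math.sqrt(2.0 * math.log(total_pulls) / counts[arm]),
    )


# Hypothetical example: three servers with empirical mean rewards.
rng = random.Random(0)
greedy_choice = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0, rng=rng)
ucb_choice = ucb1([0.5, 0.4], counts=[100, 1], total_pulls=101)
```

In the last call, UCB1 prefers the rarely tried second arm even though its mean is lower, because its exploration bonus dominates. Both rules rely on averages accumulated over the whole history, so they adapt slowly when rewards shift over time.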
Adaptive Contextual Perception: How to Generalize to New Backgrounds and Ambiguous Objects
Authors: Zhuofan Ying, Peter Hase, Mohit Bansal
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.05963
Pdf link: https://arxiv.org/pdf/2306.05963
Abstract Biological vision systems make adaptive use of context to recognize objects in new settings with novel contexts as well as occluded or blurry objects in familiar settings. In this paper, we investigate how vision models adaptively use context for out-of-distribution (OOD) generalization and leverage our analysis results to improve model OOD generalization. First, we formulate two distinct OOD settings where the contexts are either irrelevant (Background-Invariance) or beneficial (Object-Disambiguation), reflecting the diverse contextual challenges faced in biological vision. We then analyze model performance in these two different OOD settings and demonstrate that models that excel in one setting tend to struggle in the other. Notably, prior works on learning causal features improve on one setting but hurt in the other. This underscores the importance of generalizing across both OOD settings, as this ability is crucial for both human cognition and robust AI systems. Next, to better understand the model properties contributing to OOD generalization, we use representational geometry analysis and our own probing methods to examine a population of models, and we discover that those with more factorized representations and appropriate feature weighting are more successful in handling Background-Invariance and Object-Disambiguation tests. We further validate these findings through causal intervention on representation factorization and feature weighting to demonstrate their causal effect on performance. Lastly, we propose new augmentation methods to enhance model generalization. These methods outperform strong baselines, yielding improvements in both in-distribution and OOD tests. In conclusion, to replicate the generalization abilities of biological vision, computer vision models must have factorized object vs. background representations and appropriately weight both kinds of features.
CARSO: Counter-Adversarial Recall of Synthetic Observations
Authors: Emanuele Ballarin, Alessio Ansuini, Luca Bortolussi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.06081
Pdf link: https://arxiv.org/pdf/2306.06081
Abstract In this paper, we propose a novel adversarial defence mechanism for image classification -- CARSO -- inspired by cues from cognitive neuroscience. The method is synergistically complementary to adversarial training and relies on knowledge of the internal representation of the attacked classifier. Exploiting a generative model for adversarial purification, conditioned on such representation, it samples reconstructions of inputs to be finally classified. Experimental evaluation by a well-established benchmark of varied, strong adaptive attacks, across diverse image datasets and classifier architectures, shows that CARSO is able to defend the classifier significantly better than state-of-the-art adversarial training alone -- with a tolerable clean accuracy toll. Furthermore, the defensive architecture succeeds in effectively shielding itself from unforeseen threats, and end-to-end attacks adapted to fool stochastic defences. Code and pre-trained models are available at https://github.com/emaballarin/CARSO .
Prodigy: An Expeditiously Adaptive Parameter-Free Learner
Authors: Konstantin Mishchenko, Aaron Defazio
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2306.06101
Pdf link: https://arxiv.org/pdf/2306.06101
Abstract We consider the problem of estimating the learning rate in adaptive methods, such as Adagrad and Adam. We describe two techniques, Prodigy and Resetting, to provably estimate the distance to the solution $D$, which is needed to set the learning rate optimally. Our techniques are modifications of the D-Adaptation method for learning-rate-free learning. Our methods improve upon the convergence rate of D-Adaptation by a factor of $O(\sqrt{\log(D/d_0)})$, where $d_0$ is the initial estimate of $D$. We test our methods on 12 common logistic-regression benchmark datasets, VGG11 and ResNet-50 training on CIFAR10, ViT training on Imagenet, LSTM training on IWSLT14, DLRM training on Criteo dataset, VarNet on Knee MRI dataset, as well as RoBERTa and GPT transformer training on BookWiki. Our experimental results show that our approaches consistently outperform D-Adaptation and reach test accuracy values close to that of hand-tuned Adam.
Keyword: quantization

End-to-End Neural Network Compression via $\frac{\ell_1}{\ell_2}$ Regularized Latency Surrogates
Authors: Anshul Nasery, Hardik Shah, Arun Sai Suggala, Prateek Jain
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.05785
Pdf link: https://arxiv.org/pdf/2306.05785
Abstract Neural network (NN) compression via techniques such as pruning and quantization requires setting compression hyperparameters (e.g., number of channels to be pruned, bitwidths for quantization) for each layer either manually or via neural architecture search (NAS), which can be computationally expensive. We address this problem by providing an end-to-end technique that optimizes for the model's Floating Point Operations (FLOPs) or for on-device latency via a novel $\frac{\ell_1}{\ell_2}$ latency surrogate. Our algorithm is versatile and can be used with many popular compression methods including pruning, low-rank factorization, and quantization. Crucially, it is fast and runs in almost the same amount of time as single model training, which is a significant training speed-up over standard NAS methods. For BERT compression on GLUE fine-tuning tasks, we achieve $50\%$ reduction in FLOPs with only $1\%$ drop in performance. For compressing MobileNetV3 on ImageNet-1K, we achieve $15\%$ reduction in FLOPs, and $11\%$ reduction in on-device latency without drop in accuracy, while still requiring $3\times$ less training compute than SOTA compression techniques. Finally, for transfer learning on smaller datasets, our technique identifies $1.2\times$-$1.4\times$ cheaper architectures than the standard MobileNetV3 and EfficientNet suite of architectures at almost the same training cost and accuracy.
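To give some intuition for why an $\frac{\ell_1}{\ell_2}$ ratio can stand in for a count of active units, the sketch below computes the ratio for a plain vector. The helper name `l1_over_l2` and the toy vectors are assumptions for illustration. The abstract does not say how the ratio enters the latency surrogate, so this shows only the ratio itself.

```python
import math


def l1_over_l2(values):
    """Ratio of the l1 norm to the l2 norm of a non-zero vector."""
    l2 = math.sqrt(sum(x * x for x in values))
    if l2 == 0.0:
        raise ValueError("ratio is undefined for the zero vector")
    return sum(abs(x) for x in values) / l2


# One active entry out of four: the ratio reaches its minimum of 1.
sparse_ratio = l1_over_l2([3.0, 0.0, 0.0, 0.0])

# Four equally active entries: the ratio reaches its maximum of sqrt(4) = 2.
dense_ratio = l1_over_l2([1.0, 1.0, 1.0, 1.0])
```

For a vector of length $n$ the ratio always lies between 1 and $\sqrt{n}$, and it is unchanged when the vector is rescaled. It therefore behaves like a smooth, scale-invariant proxy for the number of non-zero entries.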

A-suozhang / GetArxivDaily

New submissions for Mon, 12 Jun 23 #80

Keyword: efficient

One-step Multi-view Clustering with Diverse Representation

CLC: Cluster Assignment via Contrastive Representation Learning

On the Importance of Exploration for Generalization in Reinforcement Learning

Boosting with Tempered Exponential Measures

Learnability with PAC Semantics for Multi-agent Beliefs

PeFLL: A Lifelong Learning Approach to Personalized Federated Learning

FACTIFY3M: A Benchmark for Multimodal Fact Verification with Explainability through 5W Question-Answering

AaKOS: Aspect-adaptive Knowledge-based Opinion Summarization

DetectLLM: Leveraging Log Rank Information for Zero-Shot Detection of Machine-Generated Text

BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping

A pseudo-reversible normalizing flow for stochastic dynamical systems with various initial distributions

The Viability of Domain Constrained Coalition Formation for Robotic Collectives

Throughput of Hybrid UAV Networks with Scale-Free Topology

Quantifying the Knowledge in GNNs for Reliable Distillation into MLPs

Customizing General-Purpose Foundation Models for Medical Report Generation

A fast reduced order method for linear parabolic inverse source problems

Space-time Trade-offs for the LCP Array of Wheeler DFAs

Single-Stage Visual Relationship Learning using Conditional Queries

DIFT: Dynamic Iterative Field Transforms for Memory Efficient Optical Flow

Power Beacon Energy Consumption Minimization in Wireless Powered Backscatter Communication Networks

Understanding How Consistency Works in Federated Learning via Stage-wise Relaxed Initialization

Pave the Way to Grasp Anything: Transferring Foundation Models for Universal Pick-Place Robots

Advancing Counterfactual Inference through Quantile Regression

Efficient GNN Explanation via Learning Removal-based Attribution

End-to-End Neural Network Compression via $\frac{\ell_1}{\ell_2}$ Regularized Latency Surrogates

Extending Kernel PCA through Dualization: Sparsity, Robustness and Fast Algorithms

Detecting Phishing Sites Using ChatGPT

Complexity of Reachability Problems in Neural Networks

Simulation of the 3D Radiative Transfer with Anisotropic Scattering for Convective Trails

A Complete Proof Synthesis Method for the Cube of Type Systems

Expectation-Complete Graph Representations with Homomorphisms

Detecting Adversarial Directions in Deep Reinforcement Learning to Make Robust Decisions

Efficient parallelization strategy for real-time FE simulations

TreeDQN: Learning to minimize Branch-and-Bound tree

Sketch2Stress: Sketching with Structural Stress Awareness

GAN-CAN: A Novel Attack to Behavior-Based Driver Authentication Systems

Positivity certificates for linear recurrences

DDLP: Unsupervised Object-Centric Video Prediction with Deep Dynamic Latent Particles

Automating Model Comparison in Factor Graphs

Efficient Tensor-Product Spectral-Element Operators with the Summation-by-Parts Property on Curved Triangles and Tetrahedra

Distributed Consensus Algorithm for Decision-Making in Multi-agent Multi-armed Bandit

Semi-online Scheduling with Lookahead

SNeL: A Structured Neuro-Symbolic Language for Entity-Based Multimodal Scene Understanding

Combining a Meta-Policy and Monte-Carlo Planning for Scalable Type-Based Reasoning in Partially Observable Environments

Improved flood mapping for efficient policy design by fusion of Sentinel-1, Sentinel-2, and Landsat-9 imagery to identify population and infrastructure exposed to floods

DeepSeaNet: Improving Underwater Object Detection using EfficientDet

Error Feedback Can Accurately Compress Preconditioners

Keyword: faster

BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping

Single-Stage Visual Relationship Learning using Conditional Queries

Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion

Efficient GNN Explanation via Learning Removal-based Attribution

Motion-DVAE: Unsupervised learning for fast human motion denoising

Towards Universally Optimal Shortest Paths Algorithms in the Hybrid Model

Combining a Meta-Policy and Monte-Carlo Planning for Scalable Type-Based Reasoning in Partially Observable Environments

Keyword: mobile

Lightweight Monocular Depth Estimation via Token-Sharing Transformer

DIFT: Dynamic Iterative Field Transforms for Memory Efficient Optical Flow

End-to-End Neural Network Compression via $\frac{\ell_1}{\ell_2}$ Regularized Latency Surrogates

DeepStay: Stay Region Extraction from Location Trajectories using Weak Supervision

Keyword: pruning

End-to-End Neural Network Compression via $\frac{\ell_1}{\ell_2}$ Regularized Latency Surrogates

Keyword: diffusion

Word-Level Explanations for Analyzing Bias in Text-to-Image Models

FACTIFY3M: A Benchmark for Multimodal Fact Verification with Explainability through 5W Question-Answering

Explicit synchronous partitioned scheme for coupled reduced order models based on composite reduced bases

BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping

Reconstructing the somatotopic organization of the corticospinal tract remains a challenge for modern tractography methods

RePaint-NeRF: NeRF Editting via Semantic Masks and Diffusion Models

Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion

Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model

DDLP: Unsupervised Object-Centric Video Prediction with Deep Dynamic Latent Particles

Neural FIM for learning Fisher Information Metrics from point cloud data

Keyword: adaptive

AaKOS: Aspect-adaptive Knowledge-based Opinion Summarization

Learning Domain-Aware Detection Head with Prompt Tuning

DP-HyPO: An Adaptive Private Hyperparameter Optimization Framework