New submissions for Wed, 12 Jul 23

Keyword: efficient

Q-YOLO: Efficient Inference for Real-time Object Detection

Authors: Mingze Wang, Huixin Sun, Jun Shi, Xuhui Liu, Baochang Zhang, Xianbin Cao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.04816
Pdf link: https://arxiv.org/pdf/2307.04816
Abstract Real-time object detection plays a vital role in various computer vision applications. However, deploying real-time object detectors on resource-constrained platforms poses challenges due to high computational and memory requirements. This paper describes a low-bit quantization method to build a highly efficient one-stage detector, dubbed as Q-YOLO, which can effectively address the performance degradation problem caused by activation distribution imbalance in traditional quantized YOLO models. Q-YOLO introduces a fully end-to-end Post-Training Quantization (PTQ) pipeline with a well-designed Unilateral Histogram-based (UH) activation quantization scheme, which determines the maximum truncation values through histogram analysis by minimizing the Mean Squared Error (MSE) quantization errors. Extensive experiments on the COCO dataset demonstrate the effectiveness of Q-YOLO, outperforming other PTQ methods while achieving a more favorable balance between accuracy and computational cost. This research contributes to advancing the efficient deployment of object detection models on resource-limited edge devices, enabling real-time detection with reduced computational and memory overhead.
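The histogram-based truncation search mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the uniform unsigned quantizer, the bin count, and the assumption that activations are non-negative (one-sided) are all illustrative choices. The sketch builds a histogram of activations, tries each bin edge as a clipping maximum, and keeps the one with the lowest MSE between the bin centers and their quantized values.

```python
def quantize(x, clip_max, n_bits):
    """Uniform unsigned quantization of x onto [0, clip_max] (assumed scheme)."""
    levels = 2 ** n_bits - 1
    scale = clip_max / levels
    clipped = min(max(x, 0.0), clip_max)
    return round(clipped / scale) * scale


def histogram(values, n_bins):
    """Return (bin_centers, counts) for non-negative values on [0, max(values)]."""
    top = max(values)
    width = top / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = min(int(v / width), n_bins - 1)
        counts[idx] += 1
    centers = [(i + 0.5) * width for i in range(n_bins)]
    return centers, counts


def best_clip_max(values, n_bits=4, n_bins=256):
    """Pick the clipping maximum (a bin edge) that minimizes histogram-weighted MSE."""
    centers, counts = histogram(values, n_bins)
    width = centers[0] * 2
    total = sum(counts)
    best_mse, best_clip = float("inf"), None
    for k in range(1, n_bins + 1):
        clip_max = k * width
        mse = sum(
            c * (x - quantize(x, clip_max, n_bits)) ** 2
            for x, c in zip(centers, counts)
            if c
        ) / total
        if mse < best_mse:
            best_mse, best_clip = mse, clip_max
    return best_clip, best_mse
```

With a long-tailed input (many small activations and a single large outlier), the selected clipping maximum falls below the largest observed value, trading a clipping error on the outlier for finer resolution on the bulk of the distribution.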
SigOpt Mulch: An Intelligent System for AutoML of Gradient Boosted Trees
Authors: Aleksei Sorokin, Xinran Zhu, Eric Hans Lee, Bolong Cheng
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Mathematical Software (cs.MS)
Arxiv link: https://arxiv.org/abs/2307.04849
Pdf link: https://arxiv.org/pdf/2307.04849
Abstract Gradient boosted trees (GBTs) are ubiquitous models used by researchers, machine learning (ML) practitioners, and data scientists because of their robust performance, interpretable behavior, and ease-of-use. One critical challenge in training GBTs is the tuning of their hyperparameters. In practice, selecting these hyperparameters is often done manually. Recently, the ML community has advocated for tuning hyperparameters through black-box optimization and developed state-of-the-art systems to do so. However, applying such systems to tune GBTs suffers from two drawbacks. First, these systems are not model-aware, rather they are designed to apply to a generic model; this leaves significant optimization performance on the table. Second, using these systems requires domain knowledge such as the choice of hyperparameter search space, which is an antithesis to the automatic experimentation that black-box optimization aims to provide. In this paper, we present SigOpt Mulch, a model-aware hyperparameter tuning system specifically designed for automated tuning of GBTs that provides two improvements over existing systems. First, Mulch leverages powerful techniques in metalearning and multifidelity optimization to perform model-aware hyperparameter optimization. Second, it automates the process of learning performant hyperparameters by making intelligent decisions about the optimization search space, thus reducing the need for user domain knowledge. These innovations allow Mulch to identify good GBT hyperparameters far more efficiently -- and in a more seamless and user-friendly way -- than existing black-box hyperparameter tuning systems.
SHAP@k: Efficient and Probably Approximately Correct (PAC) Identification of Top-k Features
Authors: Sanjay Kariyappa, Leonidas Tsepenekas, Freddy Lécué, Daniele Magazzeni
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.04850
Pdf link: https://arxiv.org/pdf/2307.04850
Abstract The SHAP framework provides a principled method to explain the predictions of a model by computing feature importance. Motivated by applications in finance, we introduce the Top-k Identification Problem (TkIP), where the objective is to identify the k features with the highest SHAP values. While any method to compute SHAP values with uncertainty estimates (such as KernelSHAP and SamplingSHAP) can be trivially adapted to solve TkIP, doing so is highly sample inefficient. The goal of our work is to improve the sample efficiency of existing methods in the context of solving TkIP. Our key insight is that TkIP can be framed as an Explore-m problem--a well-studied problem related to multi-armed bandits (MAB). This connection enables us to improve sample efficiency by leveraging two techniques from the MAB literature: (1) a better stopping-condition (to stop sampling) that identifies when PAC (Probably Approximately Correct) guarantees have been met and (2) a greedy sampling scheme that judiciously allocates samples between different features. By adopting these methods we develop KernelSHAP@k and SamplingSHAP@k to efficiently solve TkIP, offering an average improvement of $5\times$ in sample-efficiency and runtime across most common credit related datasets.
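The stopping condition described in the abstract can be pictured with a generic Explore-m style check. This is a sketch under stated assumptions, not the paper's exact rule: the function name, the symmetric confidence intervals given as half-widths, and the tolerance `epsilon` are all hypothetical. Sampling may stop once the lowest lower bound among the current top-k features is within `epsilon` of the highest upper bound among the remaining features.

```python
def top_k_is_resolved(means, half_widths, k, epsilon=0.0):
    """Return True when confidence intervals separate the top-k from the rest.

    means[i] is the current SHAP estimate of feature i and half_widths[i]
    is the half-width of its confidence interval (both assumed inputs).
    """
    order = sorted(range(len(means)), key=lambda i: means[i], reverse=True)
    top, rest = order[:k], order[k:]
    if not rest:
        return True  # every feature is in the top-k, nothing to separate
    lowest_top = min(means[i] - half_widths[i] for i in top)
    highest_rest = max(means[i] + half_widths[i] for i in rest)
    return lowest_top + epsilon >= highest_rest
```

A greedy sampler in the same spirit would spend the next batch of samples on the two features that define `lowest_top` and `highest_rest`, since only those intervals block the stopping condition.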
Fed-CPrompt: Contrastive Prompt for Rehearsal-Free Federated Continual Learning
Authors: Gaurav Bagwe, Xiaoyong Yuan, Miao Pan, Lan Zhang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.04869
Pdf link: https://arxiv.org/pdf/2307.04869
Abstract Federated continual learning (FCL) learns incremental tasks over time from confidential datasets distributed across clients. This paper focuses on rehearsal-free FCL, which has severe forgetting issues when learning new tasks due to the lack of access to historical task data. To address this issue, we propose Fed-CPrompt based on prompt learning techniques to obtain task-specific prompts in a communication-efficient way. Fed-CPrompt introduces two key components, asynchronous prompt learning, and contrastive continual loss, to handle asynchronous task arrival and heterogeneous data distributions in FCL, respectively. Extensive experiments demonstrate the effectiveness of Fed-CPrompt in achieving SOTA rehearsal-free FCL performance.
Temporal network compression via network hashing
Authors: Rémi Vaudaine, Pierre Borgnat, Paulo Goncalves, Rémi Gribonval, Márton Karsai
Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY); Data Structures and Algorithms (cs.DS); Data Analysis, Statistics and Probability (physics.data-an); Physics and Society (physics.soc-ph)
Arxiv link: https://arxiv.org/abs/2307.04890
Pdf link: https://arxiv.org/pdf/2307.04890
Abstract Pairwise temporal interactions between entities can be represented as temporal networks, which code the propagation of processes such as epidemic spreading or information cascades, evolving on top of them. The largest outcome of these processes is directly linked to the structure of the underlying network. Indeed, a node of a network at a given time cannot affect more nodes in the future than it can reach via time-respecting paths. This set of nodes reachable from a source defines an out-component, whose identification is costly. In this paper, we first propose an efficient matrix algorithm to tackle this issue and show that it outperforms other state-of-the-art methods. Secondly, we propose a hashing framework to coarsen large temporal networks into smaller proxies on which out-components are easier to estimate, and which are then recombined to obtain the initial components. Our graph hashing solution has implications for privacy-respecting representations of temporal networks.
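To make the notion of an out-component concrete, here is a naive baseline traversal. It is neither the matrix algorithm nor the hashing framework from the paper. The event format `(source_node, target_node, time)`, the directed interpretation, and the requirement that consecutive hops have strictly increasing timestamps are assumptions made for this sketch.

```python
def out_component(events, source, start_time=float("-inf")):
    """Nodes reachable from `source` via time-respecting paths.

    events: iterable of (u, v, t) directed contacts (assumed format).
    Only contacts strictly after `start_time` are usable, and each hop must
    happen strictly later than the arrival at its starting node.
    """
    arrival = {source: start_time}  # node -> earliest known arrival time
    for u, v, t in sorted(events, key=lambda e: e[2]):
        if u in arrival and arrival[u] < t and v not in arrival:
            arrival[v] = t
    return set(arrival)
```

Because events are processed in time order, the first recorded arrival at a node is also its earliest, so a single pass suffices for one source. Repeating this for every (node, time) pair is what makes exact computation costly on large networks.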
Learning to Solve Constraint Satisfaction Problems with Recurrent Transformer
Authors: Zhun Yang, Adam Ishay, Joohyung Lee
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.04895
Pdf link: https://arxiv.org/pdf/2307.04895
Abstract Constraint satisfaction problems (CSPs) are about finding values of variables that satisfy the given constraints. We show that Transformer extended with recurrence is a viable approach to learning to solve CSPs in an end-to-end manner, having clear advantages over state-of-the-art methods such as Graph Neural Networks, SATNet, and some neuro-symbolic models. With the ability of Transformer to handle visual input, the proposed Recurrent Transformer can straightforwardly be applied to visual constraint reasoning problems while successfully addressing the symbol grounding problem. We also show how to leverage deductive knowledge of discrete constraints in the Transformer's inductive learning to achieve sample-efficient learning and semi-supervised learning for CSPs.
FedYolo: Augmenting Federated Learning with Pretrained Transformers
Authors: Xuechen Zhang, Mingchen Li, Xiangyu Chang, Jiasi Chen, Amit K. Roy-Chowdhury, Ananda Theertha Suresh, Samet Oymak
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2307.04905
Pdf link: https://arxiv.org/pdf/2307.04905
Abstract The growth and diversity of machine learning applications motivate a rethinking of learning with mobile and edge devices. How can we address diverse client goals and learn with scarce heterogeneous data? While federated learning aims to address these issues, it has challenges hindering a unified solution. Large transformer models have been shown to work across a variety of tasks achieving remarkable few-shot adaptation. This raises the question: Can clients use a single general-purpose model, rather than custom models for each task, while obeying device and network constraints? In this work, we investigate pretrained transformers (PTF) to achieve these on-device learning goals and thoroughly explore the roles of model size and modularity, where the latter refers to adaptation through modules such as prompts or adapters. Focusing on federated learning, we demonstrate that: (1) Larger scale shrinks the accuracy gaps between alternative approaches and improves heterogeneity robustness. Scale allows clients to run more local SGD epochs which can significantly reduce the number of communication rounds. At the extreme, clients can achieve respectable accuracy locally highlighting the potential of fully-local learning. (2) Modularity, by design, enables $>100\times$ less communication in bits. Surprisingly, it also boosts the generalization capability of local adaptation methods and the robustness of smaller PTFs. Finally, it enables clients to solve multiple unrelated tasks simultaneously using a single PTF, whereas full updates are prone to catastrophic forgetting. These insights on scale and modularity motivate a new federated learning approach we call "You Only Load Once" (FedYolo): The clients load a full PTF model once and all future updates are accomplished through communication-efficient modules with limited catastrophic-forgetting, where each task is assigned to its own module.
Probabilistic Counterexample Guidance for Safer Reinforcement Learning
Authors: Xiaotong Ji, Antonio Filieri
Subjects: Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
Arxiv link: https://arxiv.org/abs/2307.04927
Pdf link: https://arxiv.org/pdf/2307.04927
Abstract Safe exploration aims at addressing the limitations of Reinforcement Learning (RL) in safety-critical scenarios, where failures during trial-and-error learning may incur high costs. Several methods exist to incorporate external knowledge or to use proximal sensor data to limit the exploration of unsafe states. However, reducing exploration risks in unknown environments, where an agent must discover safety threats during exploration, remains challenging. In this paper, we target the problem of safe exploration by guiding the training with counterexamples of the safety requirement. Our method abstracts both continuous and discrete state-space systems into compact abstract models representing the safety-relevant knowledge acquired by the agent during exploration. We then exploit probabilistic counterexample generation to construct minimal simulation submodels eliciting safety requirement violations, where the agent can efficiently train offline to refine its policy towards minimising the risk of safety violations during the subsequent online exploration. We demonstrate our method's effectiveness in reducing safety violations during online exploration in preliminary experiments by an average of 40.3% compared with QL and DQN standard algorithms and 29.1% compared with previous related work, while achieving comparable cumulative rewards with respect to unrestricted exploration and alternative approaches.
Intrinsically motivated graph exploration using network theories of human curiosity
Authors: Shubhankar P. Patankar, Mathieu Ouellet, Juan Cervino, Alejandro Ribeiro, Kieran A. Murphy, Dani S. Bassett
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2307.04962
Pdf link: https://arxiv.org/pdf/2307.04962
Abstract Intrinsically motivated exploration has proven useful for reinforcement learning, even without additional extrinsic rewards. When the environment is naturally represented as a graph, how to guide exploration best remains an open question. In this work, we propose a novel approach for exploring graph-structured data motivated by two theories of human curiosity: the information gap theory and the compression progress theory. The theories view curiosity as an intrinsic motivation to optimize for topological features of subgraphs induced by the visited nodes in the environment. We use these proposed features as rewards for graph neural-network-based reinforcement learning. On multiple classes of synthetically generated graphs, we find that trained agents generalize to larger environments and to longer exploratory walks than are seen during training. Our method computes more efficiently than the greedy evaluation of the relevant topological properties. The proposed intrinsic motivations bear particular relevance for recommender systems. We demonstrate that curiosity-based recommendations are more predictive of human behavior than PageRank centrality for several real-world graph datasets, including MovieLens, Amazon Books, and Wikispeedia.
Secrets of RLHF in Large Language Models Part I: PPO
Authors: Rui Zheng, Shihan Dou, Songyang Gao, Wei Shen, Binghai Wang, Yan Liu, Senjie Jin, Qin Liu, Limao Xiong, Lu Chen, Zhiheng Xi, Yuhao Zhou, Nuo Xu, Wenbin Lai, Minghao Zhu, Rongxiang Weng, Wensen Cheng, Cheng Chang, Zhangyue Yin, Yuan Hua, Haoran Huang, Tianxiang Sun, Hang Yan, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.04964
Pdf link: https://arxiv.org/pdf/2307.04964
Abstract Large language models (LLMs) have formulated a blueprint for the advancement of artificial general intelligence. Their primary objective is to function as a human-centric (helpful, honest, and harmless) assistant. Alignment with humans assumes paramount significance, and reinforcement learning with human feedback (RLHF) emerges as the pivotal technological paradigm underpinning this pursuit. Current technical routes usually include reward models to measure human preferences, Proximal Policy Optimization (PPO) to optimize policy model outputs, and process supervision to improve step-by-step reasoning capabilities. However, due to the challenges of reward design, environment interaction, and agent training, coupled with the huge trial-and-error cost of large language models, there is a significant barrier for AI researchers to motivate the development of technical alignment and safe landing of LLMs. The stable training of RLHF has still been a puzzle. In the first report, we dissect the framework of RLHF, re-evaluate the inner workings of PPO, and explore how the parts comprising PPO algorithms impact policy agent training. We identify policy constraints as the key factor for the effective implementation of the PPO algorithm. Therefore, we explore PPO-max, an advanced version of the PPO algorithm, to efficiently improve the training stability of the policy model. Based on our main results, we perform a comprehensive analysis of RLHF abilities compared with SFT models and ChatGPT. The absence of open-source implementations has posed significant challenges to the investigation of LLM alignment. Therefore, we are eager to release technical reports, reward models and PPO codes
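For readers unfamiliar with the policy constraint at the heart of PPO, the standard clipped surrogate objective can be sketched as follows. This is the textbook PPO clipping term, not the PPO-max variant, whose details are not given in the abstract. The function name and the plain-list inputs are assumptions for illustration.

```python
def ppo_clipped_objective(ratios, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized).

    ratios[i] is pi_new(a|s) / pi_old(a|s) for sample i and advantages[i]
    is its advantage estimate; both are assumed to be precomputed.
    """
    terms = []
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum removes any incentive to push the ratio
        # outside [1 - clip_eps, 1 + clip_eps] in the favorable direction.
        terms.append(min(r * a, clipped * a))
    return sum(terms) / len(terms)
```

The clipping bounds how far a single update can move the policy away from the one that generated the data, which is one common form of the policy constraint the abstract refers to.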
Model-Driven Sensing-Node Selection and Power Allocation for Tracking Maneuvering Targets in Perceptive Mobile Networks
Authors: Lei Xie, Shenghui Song, Yonina C. Eldar
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2307.04977
Pdf link: https://arxiv.org/pdf/2307.04977
Abstract Maneuvering target tracking will be an important service of future wireless networks to assist innovative applications such as intelligent transportation. However, tracking maneuvering targets by cellular networks faces many challenges. For example, the dense network and high-speed targets make the selection of the sensing nodes (SNs), e.g., base stations, and the associated power allocation very difficult, given the stringent latency requirement of sensing applications. Existing methods have demonstrated engaging tracking performance, but with very high computational complexity. In this paper, we propose a model-driven deep learning approach for SN selection to meet the latency requirement. To this end, we first propose an iterative SN selection method by jointly exploiting the majorization-minimization (MM) framework and the alternating direction method of multipliers (ADMM). Then, we unfold the iterative algorithm as a deep neural network (DNN) and prove its convergence. The proposed model-driven method has a low computational complexity, because the number of layers is less than the number of iterations required by the original algorithm, and each layer only involves simple matrix-vector additions/multiplications. Finally, we propose an efficient power allocation method based on fixed point (FP) water filling (WF) and solve the joint SN selection and power allocation problem under the alternative optimization framework. Simulation results show that the proposed method achieves better performance than the conventional optimization-based methods with much lower computational complexity.
Monotone deep Boltzmann machines
Authors: Zhili Feng, Ezra Winston, J. Zico Kolter
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.04990
Pdf link: https://arxiv.org/pdf/2307.04990
Abstract Deep Boltzmann machines (DBMs), one of the first "deep" learning methods ever studied, are multi-layered probabilistic models governed by a pairwise energy function that describes the likelihood of all variables/nodes in the network. In practice, DBMs are often constrained, i.e., via the restricted Boltzmann machine (RBM) architecture (which does not permit intra-layer connections), in order to allow for more efficient inference. In this work, we revisit the generic DBM approach, and ask the question: are there other possible restrictions to their design that would enable efficient (approximate) inference? In particular, we develop a new class of restricted model, the monotone DBM, which allows for arbitrary self-connection in each layer, but restricts the weights in a manner that guarantees the existence and global uniqueness of a mean-field fixed point. To do this, we leverage tools from the recently-proposed monotone Deep Equilibrium model and show that a particular choice of activation results in a fixed-point iteration that gives a variational mean-field solution. While this approach is still largely conceptual, it is the first architecture that allows for efficient approximate inference in fully-general weight structures for DBMs. We apply this approach to simple deep convolutional Boltzmann architectures and demonstrate that it allows for tasks such as the joint completion and classification of images, within a single deep probabilistic setting, while avoiding the pitfalls of mean-field inference in traditional RBMs.
PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR
Authors: Zixuan Ma, Haojie Wang, Jingze Xing, Liyan Zheng, Chen Zhang, Huanqi Cao, Kezhao Huang, Shizhi Tang, Penghan Wang, Jidong Zhai
Subjects: Machine Learning (cs.LG); Programming Languages (cs.PL)
Arxiv link: https://arxiv.org/abs/2307.04995
Pdf link: https://arxiv.org/pdf/2307.04995
Abstract Deep neural networks (DNNs) are of critical use in different domains. To accelerate DNN computation, tensor compilers are proposed to generate efficient code on different domain-specific accelerators. Existing tensor compilers mainly focus on optimizing computation efficiency. However, memory access is becoming a key performance bottleneck because the computational performance of accelerators is increasing much faster than memory performance. The lack of direct description of memory access and data dependence in current tensor compilers' intermediate representation (IR) brings significant challenges to generating memory-efficient code. In this paper, we propose IntelliGen, a tensor compiler that can generate high-performance code for memory-intensive operators by considering both computation and data movement optimizations. IntelliGen represents a DNN program using GIR, which includes primitives indicating its computation, data movement, and parallel strategies. This information will be further composed as an instruction-level dataflow graph to perform holistic optimizations by searching different memory access patterns and computation operations, and generating memory-efficient code on different hardware. We evaluate IntelliGen on NVIDIA GPU, AMD GPU, and Cambricon MLU, showing speedup up to 1.97x, 2.93x, and 16.91x (1.28x, 1.23x, and 2.31x on average), respectively, compared to current most performant frameworks.
Optimization of Adams-type difference formulas in Hilbert space $W_2^{(2,1)}(0,1)$
Authors: Kh. M. Shadimetov, R. S. Karimov
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2307.05026
Pdf link: https://arxiv.org/pdf/2307.05026
Abstract In this paper, we consider the problem of constructing new optimal explicit and implicit Adams-type difference formulas for finding an approximate solution to the Cauchy problem for an ordinary differential equation in a Hilbert space. By minimizing the norm of the error functional of the difference formula with respect to the coefficients, we obtain a system of linear algebraic equations for the coefficients of the difference formulas. This system is reduced to a system of equations in convolution form, which is completely solved using a discrete analog of the differential operator $d^2/dx^2-1$. We present an algorithm for constructing optimal explicit and implicit difference formulas in a specific Hilbert space. In addition, numerical experiments comparing the Euler method with the optimal explicit and implicit difference formulas are given. The experiments show that the optimal formulas give a good approximation compared to the Euler method.
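As background for the comparison with the Euler method, the sketch below contrasts explicit Euler with the classical two-step Adams-Bashforth formula on a test problem. The coefficients 3/2 and -1/2 are the classical ones, not the optimal coefficients constructed in the paper, and the test equation $y' = -y$, step size, and function names are assumptions chosen for illustration.

```python
import math


def euler(f, t0, y0, h, n_steps):
    """Explicit Euler: y_{n+1} = y_n + h * f(t_n, y_n)."""
    t, y = t0, y0
    for _ in range(n_steps):
        y += h * f(t, y)
        t += h
    return y


def adams_bashforth2(f, t0, y0, y1, h, n_steps):
    """Classical explicit two-step Adams formula.

    y_{n+1} = y_n + h * (3/2 * f_n - 1/2 * f_{n-1}); y1 is the starting
    value at t0 + h. Returns the approximation at t0 + n_steps * h.
    """
    t_prev, y_prev = t0, y0
    t_curr, y_curr = t0 + h, y1
    for _ in range(n_steps - 1):
        y_next = y_curr + h * (1.5 * f(t_curr, y_curr) - 0.5 * f(t_prev, y_prev))
        t_prev, y_prev = t_curr, y_curr
        t_curr, y_curr = t_curr + h, y_next
    return y_curr


def decay(t, y):
    """Right-hand side of the test problem y' = -y, y(0) = 1."""
    return -y
```

Integrating `decay` from 0 to 1 with `h = 0.1` and the exact value `math.exp(-h)` as the second starting point, the two-step formula lands closer to `math.exp(-1.0)` than Euler does, which is the kind of comparison the abstract reports for its optimized formulas.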
Number Systems for Deep Neural Network Architectures: A Survey
Authors: Ghada Alsuhli, Vasileios Sakellariou, Hani Saleh, Mahmoud Al-Qutayri, Baker Mohammad, Thanos Stouraitis
Subjects: Neural and Evolutionary Computing (cs.NE); Hardware Architecture (cs.AR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.05035
Pdf link: https://arxiv.org/pdf/2307.05035
Abstract Deep neural networks (DNNs) have become an enabling component for a myriad of artificial intelligence applications. DNNs have sometimes shown superior performance, even compared to humans, in cases such as self-driving, health applications, etc. Because of their computational complexity, deploying DNNs in resource-constrained devices still faces many challenges related to computing complexity, energy efficiency, latency, and cost. To this end, several research directions are being pursued by both academia and industry to accelerate and efficiently implement DNNs. One important direction is determining the appropriate data representation for the massive amount of data involved in DNN processing. Using conventional number systems has been found to be sub-optimal for DNNs. Alternatively, a great body of research focuses on exploring suitable number systems. This article aims to provide a comprehensive survey and discussion about alternative number systems for more efficient representations of DNN data. Various number systems (conventional/unconventional) exploited for DNNs are discussed. The impact of these number systems on the performance and hardware design of DNNs is considered. In addition, this paper highlights the challenges associated with each number system and various solutions that are proposed for addressing them. The reader will be able to understand the importance of an efficient number system for DNN, learn about the widely used number systems for DNN, understand the trade-offs between various number systems, and consider various design aspects that affect the impact of number systems on DNN performance. In addition, the recent trends and related research opportunities will be highlighted.
Strong convergence in the infinite horizon of numerical methods for stochastic differential equations
Authors: Wei Liu, Yudong Wang
Subjects: Numerical Analysis (math.NA); Probability (math.PR)
Arxiv link: https://arxiv.org/abs/2307.05039
Pdf link: https://arxiv.org/pdf/2307.05039
Abstract The strong convergence of numerical methods for stochastic differential equations (SDEs) for $t\in[0,\infty)$ is proved. The result is applicable to any one-step numerical method with the Markov property that has finite-time strong convergence and uniformly bounded moments. In addition, the convergence of the numerical stationary distribution to the underlying one can be derived from this result. To demonstrate the application of this result, the strong convergence in the infinite horizon of the backward Euler-Maruyama method in the $L^p$ sense for some small $p\in (0,1)$ is proved for SDEs with super-linear coefficients, which is also a standalone new result. Numerical simulations are provided to illustrate the theoretical results.
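A minimal sketch of the backward Euler-Maruyama scheme on one SDE with a super-linear drift is given below. The specific equation $dX = -X^3\,dt + \sigma\,dW$, the additive noise, the Newton solver, and all function names are assumptions made for illustration and are not taken from the paper. Each step solves the implicit equation $X_{n+1} + h X_{n+1}^3 = X_n + \sigma\,\Delta W_n$.

```python
import math
import random


def solve_implicit_step(c, h, iters=60):
    """Solve y + h * y**3 = c by Newton's method.

    The left-hand side is strictly increasing in y, so the root is unique;
    starting from y = c, the iterates move monotonically toward it.
    """
    y = c
    for _ in range(iters):
        y -= (y + h * y ** 3 - c) / (1.0 + 3.0 * h * y * y)
    return y


def backward_euler_maruyama(x0, h, n_steps, sigma, rng):
    """Simulate one path of dX = -X^3 dt + sigma dW with the implicit scheme."""
    path = [x0]
    x = x0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(h))  # Brownian increment over one step
        x = solve_implicit_step(x + sigma * dw, h)
        path.append(x)
    return path
```

Calling `backward_euler_maruyama(2.0, 0.5, 20, 0.0, random.Random(0))` gives the noise-free case, where the path decays monotonically toward zero even for this fairly large step size. The implicit treatment of the cubic drift is what keeps the iterates bounded in that setting.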
Maximizing Social Welfare in Score-Based Social Distance Games
Authors: Robert Ganian, Thekla Hamm, Dušan Knop, Sanjukta Roy, Šimon Schierreich, Ondřej Suchý
Subjects: Computer Science and Game Theory (cs.GT); Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2307.05061
Pdf link: https://arxiv.org/pdf/2307.05061
Abstract Social distance games have been extensively studied as a coalition formation model where the utilities of agents in each coalition were captured using a utility function u that took into account distances in a given social network. In this paper, we consider a non-normalized score-based definition of social distance games where the utility function u_v depends on a generic scoring vector v, which may be customized to match the specifics of each individual application scenario. As our main technical contribution, we establish the tractability of computing a welfare-maximizing partitioning of the agents into coalitions on tree-like networks, for every score-based function u_v. We provide more efficient algorithms when dealing with specific choices of u_v or simpler networks, and also extend all of these results to computing coalitions that are Nash stable or individually rational. We view these results as a further strong indication of the usefulness of the proposed score-based utility function: even on very simple networks, the problem of computing a welfare-maximizing partitioning into coalitions remains open for the originally considered canonical function u.
A Theory of Bounded Inductive Rationality
Authors: Caspar Oesterheld (Carnegie Mellon University), Abram Demski (Machine Intelligence Research Institute), Vincent Conitzer (Carnegie Mellon University)
Subjects: Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.05068
Pdf link: https://arxiv.org/pdf/2307.05068
Abstract The dominant theories of rational choice assume logical omniscience. That is, they assume that when facing a decision problem, an agent can perform all relevant computations and determine the truth value of all relevant logical/mathematical claims. This assumption is unrealistic when, for example, we offer bets on remote digits of pi or when an agent faces a computationally intractable planning problem. Furthermore, the assumption of logical omniscience creates contradictions in cases where the environment can contain descriptions of the agent itself. Importantly, strategic interactions as studied in game theory are decision problems in which a rational agent is predicted by its environment (the other players). In this paper, we develop a theory of rational decision making that does not assume logical omniscience. We consider agents who repeatedly face decision problems (including ones like betting on digits of pi or games against other agents). The main contribution of this paper is to provide a sensible theory of rationality for such agents. Roughly, we require that a boundedly rational inductive agent tests each efficiently computable hypothesis infinitely often and follows those hypotheses that keep their promises of high rewards. We then prove that agents that are rational in this sense have other desirable properties. For example, they learn to value random and pseudo-random lotteries at their expected reward. Finally, we consider strategic interactions between different agents and prove a folk theorem for what strategies bounded rational inductive agents can converge to.
SAR-NeRF: Neural Radiance Fields for Synthetic Aperture Radar Multi-View Representation
Authors: Zhengxin Lei, Feng Xu, Jiangtao Wei, Feng Cai, Feng Wang, Ya-Qiu Jin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2307.05087
Pdf link: https://arxiv.org/pdf/2307.05087
Abstract SAR images are highly sensitive to observation configurations, and they exhibit significant variations across different viewing angles, making it challenging to represent and learn their anisotropic features. As a result, deep learning methods often generalize poorly across different view angles. Inspired by the concept of neural radiance fields (NeRF), this study combines SAR imaging mechanisms with neural networks to propose a novel NeRF model for SAR image generation. Following the mapping and projection principles, a set of SAR images is modeled implicitly as a function of attenuation coefficients and scattering intensities in the 3D imaging space through a differentiable rendering equation. SAR-NeRF is then constructed to learn the distribution of attenuation coefficients and scattering intensities of voxels, where the vectorized form of the 3D voxel SAR rendering equation and the sampling relationship between the 3D space voxels and the 2D view ray grids are analytically derived. Through quantitative experiments on various datasets, we thoroughly assess the multi-view representation and generalization capabilities of SAR-NeRF. Additionally, it is found that a SAR-NeRF augmented dataset can significantly improve SAR target classification performance under a few-shot learning setup, where a 10-type classification accuracy of 91.6% can be achieved by using only 12 images per class.
Rational Solutions of Parametric First-Order Algebraic Differential Equations
Authors: Sebastian Falkensteiner, Rafael Sendra
Subjects: Symbolic Computation (cs.SC)
Arxiv link: https://arxiv.org/abs/2307.05102
Pdf link: https://arxiv.org/pdf/2307.05102
Abstract In this paper we give a procedure for finding rational solutions of a given first-order ODE with functional and constant coefficients which occur in a rational way. We derive an associated system with the same solvability, and sufficient and necessary conditions for the existence of rational solutions are given. In the case where all parametric coefficients are constant, we give an algorithm to compute the rational solutions. In the case where one functional coefficient appears, we algorithmically find rational general solutions which rationally depend on the appearing transcendental constant. In the other cases, the presented procedure is not completely algorithmic.
Conformalization of Sparse Generalized Linear Models
Authors: Etash Kumar Guha, Eugene Ndiaye, Xiaoming Huo
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2307.05109
Pdf link: https://arxiv.org/pdf/2307.05109
Abstract Given a sequence of observable variables $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, the conformal prediction method estimates a confidence set for $y_{n+1}$ given $x_{n+1}$ that is valid for any finite sample size by merely assuming that the joint distribution of the data is permutation invariant. Although attractive, computing such a set is computationally infeasible in most regression problems. Indeed, in these cases, the unknown variable $y_{n+1}$ can take an infinite number of possible candidate values, and generating conformal sets requires retraining a predictive model for each candidate. In this paper, we focus on a sparse linear model with only a subset of variables for prediction and use numerical continuation techniques to approximate the solution path efficiently. The critical property we exploit is that the set of selected variables is invariant under a small perturbation of the input data. Therefore, it is sufficient to enumerate and refit the model only at the change points of the set of active features and smoothly interpolate the rest of the solution via a Predictor-Corrector mechanism. We show how our path-following algorithm accurately approximates conformal prediction sets and illustrate its performance using synthetic and real data examples.
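To make the cost described in the abstract above concrete, here is a minimal sketch of naive full conformal prediction over a finite candidate grid. It is not the paper's path-following algorithm: the one-parameter least-squares model, the absolute-residual conformity score, and the names `fit_slope` and `full_conformal_set` are all assumptions chosen for illustration. The point it shows is that the model is refit once per candidate value.

```python
def fit_slope(xs, ys):
    """Least-squares slope for the no-intercept model y = beta * x (illustrative model)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def full_conformal_set(xs, ys, x_new, candidates, alpha):
    """Return the candidates z kept by naive full conformal prediction.

    For each candidate z, the data set is augmented with (x_new, z), the model
    is refit, and z is kept if its conformity p-value exceeds alpha.
    """
    kept = []
    for z in candidates:
        aug_xs = list(xs) + [x_new]
        aug_ys = list(ys) + [z]
        beta = fit_slope(aug_xs, aug_ys)  # one full refit per candidate
        residuals = [abs(y - beta * x) for x, y in zip(aug_xs, aug_ys)]
        # Fraction of points whose residual is at least as large as the candidate's.
        p_value = sum(r >= residuals[-1] for r in residuals) / len(residuals)
        if p_value > alpha:
            kept.append(z)
    return kept


xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.9]
grid = [i * 0.5 for i in range(21)]  # candidate values 0.0, 0.5, ..., 10.0
conformal_set = full_conformal_set(xs, ys, x_new=5.0, candidates=grid, alpha=0.2)
```

A finite grid only approximates the set, and the loop costs one refit per grid point, which is the overhead that motivates approximating the solution path instead.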
SuryaKiran at MEDIQA-Sum 2023: Leveraging LoRA for Clinical Dialogue Summarization
Authors: Kunal Suri, Prakhar Mishra, Saumajit Saha, Atul Singh
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.05162
Pdf link: https://arxiv.org/pdf/2307.05162
Abstract Finetuning Large Language Models helps improve the results for domain-specific use cases. End-to-end finetuning of large language models is time- and resource-intensive and has high storage requirements to store the finetuned version of the large language model. Parameter Efficient Fine Tuning (PEFT) methods address the time and resource challenges by keeping the large language model as a fixed base and adding additional layers, which the PEFT methods finetune. This paper demonstrates the evaluation results for one such PEFT method, Low Rank Adaptation (LoRA), for Clinical Dialogue Summarization. The evaluation results show that LoRA performs on par with end-to-end finetuning for a large language model. The paper presents the evaluations done for solving both Subtask A and Subtask B from ImageCLEFmedical (https://www.imageclef.org/2023/medical).
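As a rough illustration of the low-rank idea behind LoRA mentioned above, the sketch below keeps a weight matrix `W` frozen and adds a scaled product of two small matrices `B` and `A`. The function names, the pure-Python matrix helper, and the example sizes are assumptions for illustration; nothing here reflects the specific configuration used in the paper.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full finetuning parameters, LoRA parameters) for one weight matrix."""
    return d_out * d_in, rank * (d_out + d_in)


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(W, B, A, alpha, rank):
    """Frozen W plus the scaled low-rank update (alpha / rank) * B @ A."""
    scale = alpha / rank
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


full, low_rank = lora_param_counts(d_out=1024, d_in=1024, rank=8)

W = [[1.0, 2.0], [3.0, 4.0]]           # frozen base weight
A = [[1.0, 2.0]]                       # rank x d_in, trainable
B_zero = [[0.0], [0.0]]                # d_out x rank; a zero B leaves W unchanged
B_trained = [[1.0], [0.0]]             # hypothetical values after training
unchanged = lora_effective_weight(W, B_zero, A, alpha=2.0, rank=1)
adapted = lora_effective_weight(W, B_trained, A, alpha=2.0, rank=1)
```

Only `A` and `B` would be stored per task, which is where the storage saving over a full finetuned copy comes from.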
Neural Quantile Optimization for Edge-Cloud Computing
Authors: Bin Du, He Zhang, Xiangle Cheng, Lei Zhang
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.05170
Pdf link: https://arxiv.org/pdf/2307.05170
Abstract We seek the best traffic allocation scheme for the edge-cloud computing network that satisfies constraints and minimizes the cost based on burstable billing. First, for a fixed network topology, we formulate a family of integer programming problems with random parameters describing the various traffic demands. Then, to overcome the difficulty caused by the discrete feature of the problem, we generalize the Gumbel-softmax reparameterization method to induce an unconstrained continuous optimization problem as a regularized continuation of the discrete problem. Finally, we introduce the Gumbel-softmax sampling network to solve the optimization problems via unsupervised learning. The network structure reflects the edge-cloud computing topology and is trained to minimize the expectation of the cost function for unconstrained continuous optimization problems. The trained network works as an efficient traffic allocation scheme sampler, remarkably outperforming the random strategy in feasibility and cost function value. Besides testing the quality of the output allocation scheme, we examine the generalization property of the network by increasing the time steps and the number of users. We also feed the solution to existing integer optimization solvers as initial conditions and verify the warm-starts can accelerate the short-time iteration process. The framework is general with solid performance, and the decoupled feature of the random neural networks is adequate for practical implementations.
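For readers unfamiliar with the reparameterization named above, here is a minimal sketch of a standard Gumbel-softmax sample, which relaxes a discrete choice into a point on the probability simplex. It shows the textbook form only, not the generalized version or the sampling network proposed in the paper; the function name, logits, and temperature are assumptions for illustration.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature, rng):
    """Draw one relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)        # guard against log(0)
        gumbel = -math.log(-math.log(u))    # standard Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / temperature)
    peak = max(noisy)                       # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
sample = gumbel_softmax_sample([1.0, 0.5, -0.5], temperature=0.5, rng=rng)
```

Lower temperatures push the sample closer to a one-hot vector, while higher temperatures make it smoother; that continuity is what allows gradient-based training on an otherwise discrete allocation.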
Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery
Authors: Long Bai, Mobarakol Islam, Hongliang Ren
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2307.05182
Pdf link: https://arxiv.org/pdf/2307.05182
Abstract Medical students and junior surgeons often rely on senior surgeons and specialists to answer their questions when learning surgery. However, experts are often busy with clinical and academic work, and have little time to give guidance. Meanwhile, existing deep learning (DL)-based surgical Visual Question Answering (VQA) systems can only provide simple answers without the location of the answers. In addition, vision-language (ViL) embedding is still a less explored research area in these kinds of tasks. Therefore, a surgical Visual Question Localized-Answering (VQLA) system would be helpful for medical students and junior surgeons to learn and understand from recorded surgical videos. We propose an end-to-end Transformer with Co-Attention gaTed Vision-Language (CAT-ViL) for VQLA in surgical scenarios, which does not require feature extraction through detection models. The CAT-ViL embedding module is designed to fuse heterogeneous features from visual and textual sources. The fused embedding will feed a standard Data-Efficient Image Transformer (DeiT) module, before the parallel classifier and detector for joint prediction. We conduct the experimental validation on public surgical videos from MICCAI EndoVis Challenge 2017 and 2018. The experimental results highlight the superior performance and robustness of our proposed model compared to the state-of-the-art approaches. Ablation studies further prove the outstanding performance of all the proposed components. The proposed method provides a promising solution for surgical scene understanding, and opens up a primary step in the Artificial Intelligence (AI)-based VQLA system for surgical training. Our code is publicly available.
Membership Inference Attacks on DNNs using Adversarial Perturbations
Authors: Hassan Ali, Adnan Qayyum, Ala Al-Fuqaha, Junaid Qadir
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2307.05193
Pdf link: https://arxiv.org/pdf/2307.05193
Abstract Several membership inference (MI) attacks have been proposed to audit a target DNN. Given a set of subjects, MI attacks tell which subjects the target DNN has seen during training. This work focuses on the post-training MI attacks emphasizing high confidence membership detection -- True Positive Rates (TPR) at low False Positive Rates (FPR). Current works in this category -- likelihood ratio attack (LiRA) and enhanced MI attack (EMIA) -- only perform well on complex datasets (e.g., CIFAR-10 and Imagenet) where the target DNN overfits its train set, but perform poorly on simpler datasets (0% TPR by both attacks on Fashion-MNIST, 2% and 0% TPR respectively by LiRA and EMIA on MNIST at 1% FPR). To address this, firstly, we unify current MI attacks by presenting a framework divided into three stages -- preparation, indication and decision. Secondly, we utilize the framework to propose two novel attacks: (1) Adversarial Membership Inference Attack (AMIA) efficiently utilizes the membership and the non-membership information of the subjects while adversarially minimizing a novel loss function, achieving 6% TPR on both Fashion-MNIST and MNIST datasets; and (2) Enhanced AMIA (E-AMIA) combines EMIA and AMIA to achieve 8% and 4% TPRs on Fashion-MNIST and MNIST datasets respectively, at 1% FPR. Thirdly, we introduce two novel augmented indicators that positively leverage the loss information in the Gaussian neighborhood of a subject. This improves TPR of all four attacks on average by 2.5% and 0.25% respectively on Fashion-MNIST and MNIST datasets at 1% FPR. Finally, we propose a simple yet novel evaluation metric, the running TPR average (RTA) at a given FPR, that better distinguishes different MI attacks in the low FPR region. We also show that AMIA and E-AMIA are more transferable to the unknown DNNs (other than the target DNN) and are more robust to DP-SGD training as compared to LiRA and EMIA.
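The "TPR at low FPR" figure of merit used above can be computed from per-subject attack scores. The sketch below is one plausible way to do so, assuming a higher score means "more likely a training member"; the function name and the toy scores are hypothetical, and it does not implement the paper's running TPR average (RTA) metric.

```python
def tpr_at_fpr(member_scores, nonmember_scores, max_fpr):
    """Best true positive rate over all thresholds whose false positive rate <= max_fpr.

    A subject is predicted to be a member when its score is >= the threshold.
    """
    thresholds = sorted(set(member_scores) | set(nonmember_scores), reverse=True)
    best_tpr = 0.0
    for t in thresholds:
        fpr = sum(s >= t for s in nonmember_scores) / len(nonmember_scores)
        if fpr <= max_fpr:
            tpr = sum(s >= t for s in member_scores) / len(member_scores)
            best_tpr = max(best_tpr, tpr)
    return best_tpr


# Toy attack scores (hypothetical values, not taken from the paper).
members = [0.9, 0.8, 0.4, 0.3]
nonmembers = [0.7, 0.2, 0.1, 0.05]
strict = tpr_at_fpr(members, nonmembers, max_fpr=0.0)
relaxed = tpr_at_fpr(members, nonmembers, max_fpr=0.25)
```

With the strict budget only the two highest-scoring members can be flagged without any false positive, so the attack looks far weaker than it does under the relaxed budget; this is why results are reported at a fixed low FPR.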
The Staged Knowledge Distillation in Video Classification: Harmonizing Student Progress by a Complementary Weakly Supervised Framework
Authors: Chao Wang, Zheng Tang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.05201
Pdf link: https://arxiv.org/pdf/2307.05201
Abstract In the context of label-efficient learning on video data, the distillation method and the structural design of the teacher-student architecture have a significant impact on knowledge distillation. However, the relationship between these factors has been overlooked in previous research. To address this gap, we propose a new weakly supervised learning framework for knowledge distillation in video classification that is designed to improve the efficiency and accuracy of the student model. Our approach leverages the concept of substage-based learning to distill knowledge based on the combination of student substages and the correlation of corresponding substages. We also employ the progressive cascade training method to address the accuracy loss caused by the large capacity gap between the teacher and the student. Additionally, we propose a pseudo-label optimization strategy to improve the initial data label. To optimize the loss functions of different distillation substages during the training process, we introduce a new loss method based on feature distribution. We conduct extensive experiments on both real and simulated data sets, demonstrating that our proposed approach outperforms existing distillation methods in terms of knowledge distillation for video classification tasks. Our proposed substage-based distillation approach has the potential to inform future research on label-efficient learning for video data.
Attribute Controlled Dialogue Prompting
Authors: Runcheng Liu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.05228
Pdf link: https://arxiv.org/pdf/2307.05228
Abstract Prompt-tuning has become an increasingly popular parameter-efficient method for adapting large pretrained language models to downstream tasks. However, both discrete prompting and continuous prompting assume fixed prompts for all data samples within a task, neglecting the fact that inputs vary greatly in some tasks such as open-domain dialogue generation. In this paper, we present a novel, instance-specific prompt-tuning algorithm for dialogue generation. Specifically, we generate prompts based on instance-level control code, rather than the conversation history, to explore their impact on controlled dialogue generation. Experiments on popular open-domain dialogue datasets, evaluated on both automated metrics and human evaluation, demonstrate that our method is superior to prompting baselines and comparable to fine-tuning with only 5%-6% of total parameters.
Does pre-training on brain-related tasks results in better deep-learning-based brain age biomarkers?
Authors: Bruno Machado Pacheco, Victor Hugo Rocha de Oliveira, Augusto Braga Fernandes Antunes, Saulo Domingos de Souza Pedro, Danilo Silva
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2307.05241
Pdf link: https://arxiv.org/pdf/2307.05241
Abstract Brain age prediction using neuroimaging data has shown great potential as an indicator of overall brain health and successful aging, as well as a disease biomarker. Deep learning models have been established as reliable and efficient brain age estimators, being trained to predict the chronological age of healthy subjects. In this paper, we investigate the impact of a pre-training step on deep learning models for brain age prediction. More precisely, instead of the common approach of pre-training on natural imaging classification, we propose pre-training the models on brain-related tasks, which led to state-of-the-art results in our experiments on ADNI data. Furthermore, we validate the resulting brain age biomarker on images of patients with mild cognitive impairment and Alzheimer's disease. Interestingly, our results indicate that better-performing deep learning models in terms of brain age prediction on healthy patients do not result in more reliable biomarkers.
OpenAL: An Efficient Deep Active Learning Framework for Open-Set Pathology Image Classification
Authors: Linhao Qu, Yingfan Ma, Zhiwei Yang, Manning Wang, Zhijian Song
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.05254
Pdf link: https://arxiv.org/pdf/2307.05254
Abstract Active learning (AL) is an effective approach to select the most informative samples to label so as to reduce the annotation cost. Existing AL methods typically work under the closed-set assumption, i.e., all classes existing in the unlabeled sample pool need to be classified by the target model. However, in some practical clinical tasks, the unlabeled pool may contain not only the target classes that need to be fine-grainedly classified, but also non-target classes that are irrelevant to the clinical tasks. Existing AL methods cannot work well in this scenario because they tend to select a large number of non-target samples. In this paper, we formulate this scenario as an open-set AL problem and propose an efficient framework, OpenAL, to address the challenge of querying samples from an unlabeled pool with both target class and non-target class samples. Experiments on fine-grained classification of pathology images show that OpenAL can significantly improve the query quality of target class samples and achieve higher performance than current state-of-the-art AL methods. Code is available at https://github.com/miccaiif/OpenAL.
Integrated Planning in Hospitals: A Review
Authors: Sebastian Rachuba, Melanie Reuter-Oppermann, Clemens Thielen
Subjects: Artificial Intelligence (cs.AI); Discrete Mathematics (cs.DM); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2307.05258
Pdf link: https://arxiv.org/pdf/2307.05258
Abstract Efficient planning of scarce resources in hospitals is a challenging task for which a large variety of Operations Research and Management Science approaches have been developed since the 1950s. While efficient planning of single resources such as operating rooms, beds, or specific types of staff can already lead to enormous efficiency gains, integrated planning of several resources has been shown to hold even greater potential, and a large number of integrated planning approaches have been presented in the literature over the past decades. This paper provides the first literature review that focuses specifically on the Operations Research and Management Science literature related to integrated planning of different resources in hospitals. We collect the relevant literature and analyze it regarding different aspects such as uncertainty modeling and the use of real-life data. Several cross comparisons reveal interesting insights concerning, e.g., relations between the modeling and solution methods used and the practical implementation of the approaches developed. Moreover, we provide a high-level taxonomy for classifying different resource-focused integration approaches and point out gaps in the literature as well as promising directions for future research.
On the efficient preconditioning of the Stokes equations in tight geometries
Authors: Vladislav Pimanov, Oleg Iliev, Ivan Oseledets, Ekaterina Muravleva
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2307.05266
Pdf link: https://arxiv.org/pdf/2307.05266
Abstract If the Stokes equations are properly discretized, it is well-known that the Schur complement matrix is spectrally equivalent to the identity matrix. Moreover, in the case of simple geometries, it is often observed that most of its eigenvalues are equal to one. These facts form the basis for the famous Uzawa and Krylov-Uzawa algorithms. However, in the case of complex geometries, the Schur complement matrix can become arbitrarily ill-conditioned having a significant portion of non-unit eigenvalues, which makes the established Uzawa preconditioner inefficient. In this article, we study the Schur complement formulation for the staggered finite-difference discretization of the Stokes problem in 3D CT images and synthetic 2D geometries. We numerically investigate the performance of the CG iterative method with the Uzawa and SIMPLE preconditioners and draw several conclusions. First, we show that in the case of low porosity, CG with the SIMPLE preconditioner converges faster to the discrete pressure and provides a more accurate calculation of sample permeability. Second, we show that an increase in the surface-to-volume ratio leads to an increase in the condition number of the Schur complement matrix, while the dependence is inverse for the Schur complement matrix preconditioned with the SIMPLE. As an explanation, we conjecture that the no-slip boundary conditions are the reason for non-unit eigenvalues of the Schur complement.
A Mixed Reality System for Interaction with Heterogeneous Robotic Systems
Authors: Valeria Villani, Beatrice Capelli, Lorenzo Sabattini
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2307.05280
Pdf link: https://arxiv.org/pdf/2307.05280
Abstract The growing spread of robots for service and industrial purposes calls for versatile, intuitive and portable interaction approaches. In particular, in industrial environments, operators should be able to interact with robots in a fast, effective, and possibly effortless manner. To this end, reality enhancement techniques have been used to achieve efficient management and simplify interactions, in particular in manufacturing and logistics processes. Building upon this, in this paper we propose a system based on mixed reality that allows a ubiquitous interface for heterogeneous robotic systems in dynamic scenarios, where users are involved in different tasks and need to interact with different robots. By means of mixed reality, users can interact with a robot through manipulation of its virtual replica, which is always colocated with the user and is extracted when interaction is needed. The system has been tested in a simulated intralogistics setting, where different robots are present and require sporadic intervention by human operators, who are involved in other tasks. In our setting we consider the presence of drones and AGVs with different levels of autonomy, calling for different user interventions. The proposed approach has been validated in virtual reality, considering quantitative and qualitative assessment of performance and user's feedback.
Navigating Uncertainty: The Role of Short-Term Trajectory Prediction in Autonomous Vehicle Safety
Authors: Sushil Sharma, Ganesh Sistu, Lucie Yahiaoui, Arindam Das, Mark Halton, Ciarán Eising
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.05288
Pdf link: https://arxiv.org/pdf/2307.05288
Abstract Autonomous vehicles require accurate and reliable short-term trajectory predictions for safe and efficient driving. While most commercial automated vehicles currently use state machine-based algorithms for trajectory forecasting, recent efforts have focused on end-to-end data-driven systems. Often, the design of these models is limited by the availability of datasets, which are typically restricted to generic scenarios. To address this limitation, we have developed a synthetic dataset for short-term trajectory prediction tasks using the CARLA simulator. This dataset is extensive and incorporates what are considered complex scenarios - pedestrians crossing the road, vehicles overtaking - and comprises 6000 perspective view images with corresponding IMU and odometry information for each frame. Furthermore, an end-to-end short-term trajectory prediction model using convolutional neural networks (CNN) and long short-term memory (LSTM) networks has also been developed. This model can handle corner cases, such as slowing down near zebra crossings and stopping when pedestrians cross the road, without the need for explicit encoding of the surrounding environment. In an effort to accelerate this research and assist others, we are releasing our dataset and model to the research community. Our datasets are publicly available on https://github.com/navigatinguncertainty.
Domain-Agnostic Neural Architecture for Class Incremental Continual Learning in Document Processing Platform
Authors: Mateusz Wójcik, Witold Kościukiewicz, Mateusz Baran, Tomasz Kajdanowicz, Adam Gonczarek
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.05399
Pdf link: https://arxiv.org/pdf/2307.05399
Abstract Production deployments in complex systems require ML architectures to be highly efficient and usable against multiple tasks. Particularly demanding are classification problems in which data arrives in a streaming fashion and each class is presented separately. Recent methods with stochastic gradient learning have been shown to struggle in such setups or have limitations, such as reliance on memory buffers and restriction to specific domains, that prevent their use in real-world scenarios. For this reason, we present a fully differentiable architecture based on the Mixture of Experts model, that enables the training of high-performance classifiers when examples from each class are presented separately. We conducted exhaustive experiments that proved its applicability in various domains and ability to learn online in production environments. The proposed technique achieves SOTA results without a memory buffer and clearly outperforms the reference methods.
Boosting Feedback Efficiency of Interactive Reinforcement Learning by Adaptive Learning from Scores
Authors: Shukai Liu, Chenming Wu, Ying Li, Liangjun Zhang
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.05405
Pdf link: https://arxiv.org/pdf/2307.05405
Abstract Interactive reinforcement learning has shown promise in learning complex robotic tasks. However, the process can be human-intensive due to the requirement of a large amount of interactive feedback. This paper presents a new method that uses scores provided by humans, instead of pairwise preferences, to improve the feedback efficiency of interactive reinforcement learning. Our key insight is that scores can yield significantly more data than pairwise preferences. Specifically, we require a teacher to interactively score the full trajectories of an agent to train a behavioral policy in a sparse reward environment. To prevent unstable scores given by humans from negatively impacting the training process, we propose an adaptive learning scheme. This enables the learning paradigm to be insensitive to imperfect or unreliable scores. We extensively evaluate our method on robotic locomotion and manipulation tasks. The results show that the proposed method can efficiently learn near-optimal policies by adaptive learning from scores, while requiring less feedback compared to pairwise preference learning methods. The source codes are publicly available at https://github.com/SSKKai/Interactive-Scoring-IRL.
Differential Analysis of Triggers and Benign Features for Black-Box DNN Backdoor Detection
Authors: Hao Fu, Prashanth Krishnamurthy, Siddharth Garg, Farshad Khorrami
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.05422
Pdf link: https://arxiv.org/pdf/2307.05422
Abstract This paper proposes a data-efficient detection method for deep neural networks against backdoor attacks under a black-box scenario. The proposed approach is motivated by the intuition that features corresponding to triggers have a higher influence in determining the backdoored network output than any other benign features. To quantitatively measure the effects of triggers and benign features on determining the backdoored network output, we introduce five metrics. To calculate the five-metric values for a given input, we first generate several synthetic samples by injecting the input's partial contents into clean validation samples. Then, the five metrics are computed by using the output labels of the corresponding synthetic samples. One contribution of this work is the use of a tiny clean validation dataset. Having the computed five metrics, five novelty detectors are trained from the validation dataset. A meta novelty detector fuses the output of the five trained novelty detectors to generate a meta confidence score. During online testing, our method determines if online samples are poisoned or not via assessing their meta confidence scores output by the meta novelty detector. We show the efficacy of our methodology through a broad range of backdoor attacks, including ablation studies and comparison to existing approaches. Our methodology is promising since the proposed five metrics quantify the inherent differences between clean and poisoned samples. Additionally, our detection method can be incrementally improved by appending more metrics that may be proposed to address future advanced attacks.
Self-Supervised Learning with Lie Symmetries for Partial Differential Equations
Authors: Grégoire Mialon, Quentin Garrido, Hannah Lawrence, Danyal Rehman, Yann LeCun, Bobak T. Kiani
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2307.05432
Pdf link: https://arxiv.org/pdf/2307.05432
Abstract Machine learning for differential equations paves the way for computationally efficient alternatives to numerical solvers, with potentially broad impacts in science and engineering. Though current algorithms typically require simulated training data tailored to a given setting, one may instead wish to learn useful information from heterogeneous sources, or from real dynamical systems observations that are messy or incomplete. In this work, we learn general-purpose representations of PDEs from heterogeneous data by implementing joint embedding methods for self-supervised learning (SSL), a framework for unsupervised representation learning that has had notable success in computer vision. Our representation outperforms baseline approaches to invariant tasks, such as regressing the coefficients of a PDE, while also improving the time-stepping performance of neural solvers. We hope that our proposed methodology will prove useful in the eventual development of general-purpose foundation models for PDEs.
One-Versus-Others Attention: Scalable Multimodal Integration
Authors: Michal Golovanevsky, Eva Schiller, Akira Nair, Ritambhara Singh, Carsten Eickhoff
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.05435
Pdf link: https://arxiv.org/pdf/2307.05435
Abstract Multimodal learning models have become increasingly important as they surpass single-modality approaches on diverse tasks ranging from question-answering to autonomous driving. Despite the importance of multimodal learning, existing efforts focus on NLP applications, where the number of modalities is typically less than four (audio, video, text, images). However, data inputs in other domains, such as the medical field, may include X-rays, PET scans, MRIs, genetic screening, clinical notes, and more, creating a need for both efficient and accurate information fusion. Many state-of-the-art models rely on pairwise cross-modal attention, which does not scale well for applications with more than three modalities. For $n$ modalities, computing attention will result in $n \choose 2$ operations, potentially requiring considerable amounts of computational resources. To address this, we propose a new domain-neutral attention mechanism, One-Versus-Others (OvO) attention, that scales linearly with the number of modalities and requires only $n$ attention operations, thus offering a significant reduction in computational complexity compared to existing cross-modal attention algorithms. Using three diverse real-world datasets as well as an additional simulation experiment, we show that our method improves performance compared to popular fusion techniques while decreasing computation costs.
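The scaling claim above can be checked with a few lines of arithmetic: pairwise cross-modal attention needs one operation per unordered pair of modalities, while the linear scheme needs one per modality. The function names below are illustrative only and say nothing about how OvO attention is computed internally.

```python
from math import comb


def pairwise_attention_ops(n_modalities):
    """Attention operations for pairwise cross-modal attention: n choose 2."""
    return comb(n_modalities, 2)


def ovo_attention_ops(n_modalities):
    """Attention operations for a scheme that is linear in the number of modalities."""
    return n_modalities


# (modalities, pairwise count, linear count) for a few sizes
op_counts = [(n, pairwise_attention_ops(n), ovo_attention_ops(n)) for n in (3, 5, 10)]
```

The two counts coincide at three modalities and diverge quadratically beyond that, which matches the abstract's remark that pairwise attention scales poorly past three modalities.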
AutoDecoding Latent 3D Diffusion Models
Authors: Evangelos Ntavelis, Aliaksandr Siarohin, Kyle Olszewski, Chaoyang Wang, Luc Van Gool, Sergey Tulyakov
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.05445
Pdf link: https://arxiv.org/pdf/2307.05445
Abstract We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core. The 3D autodecoder framework embeds properties learned from the target dataset in the latent space, which can then be decoded into a volumetric representation for rendering view-consistent appearance and geometry. We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations to learn a 3D diffusion from 2D images or monocular videos of rigid or articulated objects. Our approach is flexible enough to use either existing camera supervision or no camera information at all -- instead efficiently learning it during training. Our evaluations demonstrate that our generation results outperform state-of-the-art alternatives on various benchmark datasets and metrics, including multi-view image datasets of synthetic objects, real in-the-wild videos of moving people, and a large-scale, real video dataset of static objects.
Polynomial-Time Linear-Swap Regret Minimization in Imperfect-Information Sequential Games
Authors: Gabriele Farina, Charilaos Pipis
Subjects: Computer Science and Game Theory (cs.GT)
Arxiv link: https://arxiv.org/abs/2307.05448
Pdf link: https://arxiv.org/pdf/2307.05448
Abstract No-regret learners seek to minimize the difference between the loss they cumulated through the actions they played, and the loss they would have cumulated in hindsight had they consistently modified their behavior according to some strategy transformation function. The size of the set of transformations considered by the learner determines a natural notion of rationality. As the set of transformations each learner considers grows, the strategies played by the learners recover more complex game-theoretic equilibria, including correlated equilibria in normal-form games and extensive-form correlated equilibria in extensive-form games. At the extreme, a no-swap-regret agent is one that minimizes regret against the set of all functions from the set of strategies to itself. While it is known that the no-swap-regret condition can be attained efficiently in nonsequential (normal-form) games, understanding what is the strongest notion of rationality that can be attained efficiently in the worst case in sequential (extensive-form) games is a longstanding open problem. In this paper we provide a positive result, by showing that it is possible, in any sequential game, to retain polynomial-time (in the game tree size) iterations while achieving sublinear regret with respect to all linear transformations of the mixed strategy space, a notion called no-linear-swap regret. This notion of hindsight rationality is as strong as no-swap-regret in nonsequential games, and stronger than no-trigger-regret in sequential games -- thereby proving the existence of a subset of extensive-form correlated equilibria robust to linear deviations, which we call linear-deviation correlated equilibria, that can be approached efficiently.
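The regret notions contrasted above can be made concrete in the simplest setting: a nonsequential game with pure actions. The sketch below compares external regret (best single fixed action in hindsight) with swap regret (best function mapping each played action to a replacement). It is an illustration under those assumptions only, with hypothetical function names and toy losses; it does not cover the sequential games or the linear-swap transformations studied in the paper.

```python
def external_regret(actions, losses):
    """Incurred loss minus the loss of the best single fixed action in hindsight."""
    n_actions = len(losses[0])
    incurred = sum(loss[a] for a, loss in zip(actions, losses))
    best_fixed = min(sum(loss[b] for loss in losses) for b in range(n_actions))
    return incurred - best_fixed


def swap_regret(actions, losses):
    """Incurred loss minus the loss under the best action-to-action swap function.

    The best swap function can be chosen independently for each played action,
    so the maximization decomposes over the rounds in which each action was played.
    """
    n_actions = len(losses[0])
    total = 0.0
    for a in range(n_actions):
        rounds = [loss for played, loss in zip(actions, losses) if played == a]
        incurred = sum(loss[a] for loss in rounds)
        best_swap = min(sum(loss[b] for loss in rounds) for b in range(n_actions))
        total += incurred - best_swap
    return total


# Toy history: the learner always picks the action that turns out to be costly.
actions = [0, 1, 0, 1]
losses = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
ext = external_regret(actions, losses)
swp = swap_regret(actions, losses)
```

Because the identity-free swap "replace 0 with 1 and 1 with 0" beats every fixed action here, swap regret is strictly larger than external regret, reflecting that a larger transformation set gives a stronger rationality requirement.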
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone
Authors: Shraman Pramanick, Yale Song, Sayan Nag, Kevin Qinghong Lin, Hardik Shah, Mike Zheng Shou, Rama Chellappa, Pengchuan Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.05463
Pdf link: https://arxiv.org/pdf/2307.05463
Abstract Video-language pre-training (VLP) has become increasingly important due to its ability to generalize to various vision and language tasks. However, existing egocentric VLP frameworks utilize separate video and language encoders and learn task-specific cross-modal information only during fine-tuning, limiting the development of a unified system. In this work, we introduce the second generation of egocentric video-language pre-training (EgoVLPv2), a significant improvement from the previous generation, by incorporating cross-modal fusion directly into the video and language backbones. EgoVLPv2 learns strong video-text representation during pre-training and reuses the cross-modal attention modules to support different downstream tasks in a flexible and efficient manner, reducing fine-tuning costs. Moreover, our proposed fusion in the backbone strategy is more lightweight and compute-efficient than stacking additional fusion-specific layers. Extensive experiments on a wide range of VL tasks demonstrate the effectiveness of EgoVLPv2 by achieving consistent state-of-the-art performance over strong baselines across all downstream tasks. Our project page can be found at https://shramanpramanick.github.io/EgoVLPv2/.
My3DGen: Building Lightweight Personalized 3D Generative Model
Authors: Luchao Qi, Jiaye Wu, Shengze Wang, Soumyadip Sengupta
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.05468
Pdf link: https://arxiv.org/pdf/2307.05468
Abstract Our paper presents My3DGen, a practical system for creating a personalized and lightweight 3D generative prior using as few as 10 images. My3DGen can reconstruct multi-view consistent images from an input test image, and generate novel appearances by interpolating between any two images of the same individual. While recent studies have demonstrated the effectiveness of personalized generative priors in producing high-quality 2D portrait reconstructions and syntheses, to the best of our knowledge, we are the first to develop a personalized 3D generative prior. Instead of fine-tuning a large pre-trained generative model with millions of parameters to achieve personalization, we propose a parameter-efficient approach. Our method involves utilizing a pre-trained model with fixed weights as a generic prior, while training a separate personalized prior through low-rank decomposition of the weights in each convolution and fully connected layer. However, parameter-efficient few-shot fine-tuning on its own often leads to overfitting. To address this, we introduce a regularization technique based on symmetry of human faces. This regularization enforces that novel view renderings of a training sample, rendered from symmetric poses, exhibit the same identity. By incorporating this symmetry prior, we enhance the quality of reconstruction and synthesis, particularly for non-frontal (profile) faces. Our final system combines low-rank fine-tuning with symmetry regularization and significantly surpasses the performance of pre-trained models, e.g. EG3D. It introduces only approximately 0.6 million additional parameters per identity compared to 31 million for full finetuning of the original model. As a result, our system achieves a 50-fold reduction in model size without sacrificing the quality of the generated 3D faces. Code will be available at our project page: https://luchaoqi.github.io/my3dgen.
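The parameter saving described in the My3DGen abstract comes from training a low-rank factorization per layer while the pre-trained weights stay fixed. The sketch below illustrates the idea for one fully connected layer. It assumes an additive update of the form W_generic + A·B, which the abstract does not spell out; the function names and layer sizes are hypothetical and not taken from the paper.

```python
# Illustrative sketch only: assumes an additive low-rank update W + A @ B.
# The exact decomposition used by My3DGen is not given in the abstract.

def full_params(out_dim, in_dim):
    """Number of trainable weights when the whole matrix is fine-tuned."""
    return out_dim * in_dim


def low_rank_params(out_dim, in_dim, rank):
    """Trainable weights when only A (out_dim x rank) and B (rank x in_dim) are trained."""
    return rank * (out_dim + in_dim)


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def personalized_weight(w_generic, a, b):
    """Frozen generic weight plus the trained low-rank correction A @ B."""
    delta = matmul(a, b)
    return [[w + d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w_generic, delta)]


if __name__ == "__main__":
    # Hypothetical 512 x 512 layer with a rank-4 update.
    print(full_params(512, 512), low_rank_params(512, 512, 4))
```

For a square layer the trainable parameter count drops from out_dim × in_dim to rank × (out_dim + in_dim), which is where a large reduction in per-identity model size can come from when the rank is small.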
Keyword: faster

DyCL: Dynamic Neural Network Compilation Via Program Rewriting and Graph Optimization
Authors: Simin Chen, Shiyi Wei, Cong Liu, Wei Yang
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Programming Languages (cs.PL)
Arxiv link: https://arxiv.org/abs/2307.04963
Pdf link: https://arxiv.org/pdf/2307.04963
Abstract A DL compiler's primary function is to translate DNN programs written in high-level DL frameworks such as PyTorch and TensorFlow into portable executables. These executables can then be flexibly executed by the deployed host programs. However, existing DL compilers rely on a tracing mechanism, which involves feeding a runtime input to a neural network program and tracing the program execution paths to generate the computational graph necessary for compilation. Unfortunately, this mechanism falls short when dealing with modern dynamic neural networks (DyNNs) that possess varying computational graphs depending on the inputs. Consequently, conventional DL compilers struggle to accurately compile DyNNs into executable code. To address this limitation, we propose DyCL, a general approach that enables any existing DL compiler to successfully compile DyNNs. DyCL tackles the dynamic nature of DyNNs by introducing a compilation mechanism that redistributes the control and data flow of the original DNN programs during the compilation process. Specifically, DyCL develops program analysis and program transformation techniques to convert a dynamic neural network into multiple sub-neural networks. Each sub-neural network is devoid of conditional statements and is compiled independently. Furthermore, DyCL synthesizes a host module that models the control flow of the DyNNs and facilitates the invocation of the sub-neural networks. Our evaluation demonstrates the effectiveness of DyCL, achieving a 100% success rate in compiling all dynamic neural networks. Moreover, the compiled executables generated by DyCL exhibit significantly improved performance, running between $1.12\times$ and $20.21\times$ faster than the original DyNNs executed on general-purpose DL frameworks.
PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR
Authors: Zixuan Ma, Haojie Wang, Jingze Xing, Liyan Zheng, Chen Zhang, Huanqi Cao, Kezhao Huang, Shizhi Tang, Penghan Wang, Jidong Zhai
Subjects: Machine Learning (cs.LG); Programming Languages (cs.PL)
Arxiv link: https://arxiv.org/abs/2307.04995
Pdf link: https://arxiv.org/pdf/2307.04995
Abstract Deep neural networks (DNNs) are of critical use in different domains. To accelerate DNN computation, tensor compilers are proposed to generate efficient code on different domain-specific accelerators. Existing tensor compilers mainly focus on optimizing computation efficiency. However, memory access is becoming a key performance bottleneck because the computational performance of accelerators is increasing much faster than memory performance. The lack of direct description of memory access and data dependence in current tensor compilers' intermediate representation (IR) brings significant challenges to generating memory-efficient code. In this paper, we propose IntelliGen, a tensor compiler that can generate high-performance code for memory-intensive operators by considering both computation and data movement optimizations. IntelliGen represents a DNN program using GIR, which includes primitives indicating its computation, data movement, and parallel strategies. This information is further composed into an instruction-level dataflow graph to perform holistic optimizations by searching different memory access patterns and computation operations, and generating memory-efficient code on different hardware. We evaluate IntelliGen on NVIDIA GPU, AMD GPU, and Cambricon MLU, showing speedups of up to 1.97x, 2.93x, and 16.91x (1.28x, 1.23x, and 2.31x on average), respectively, compared to the most performant current frameworks.
Best approximation results and essential boundary conditions for novel types of weak adversarial network discretizations for PDEs
Authors: Silvia Bertoluzza, Erik Burman, Cuiyu He
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2307.05012
Pdf link: https://arxiv.org/pdf/2307.05012
Abstract In this paper, we provide a theoretical analysis of the recently introduced weakly adversarial networks (WAN) method, used to approximate partial differential equations in high dimensions. We address the existence and stability of the solution, as well as approximation bounds. More precisely, we prove the existence of discrete solutions, intended in a suitable weak sense, for which we prove a quasi-best approximation estimate similar to Cea's lemma, a result commonly found in finite element methods. We also propose two new stabilized WAN-based formulas that avoid the need for direct normalization. Furthermore, we analyze the method's effectiveness for the Dirichlet boundary problem that employs the implicit representation of the geometry. The key requirement for achieving the best approximation outcome is to ensure that the space for the test network satisfies a specific condition, known as the inf-sup condition, essentially requiring that the test network set is sufficiently large when compared to the trial space. The method's accuracy, however, is only determined by the space of the trial network. We also devise a pseudo-time XNODE neural network class for static PDE problems, yielding significantly faster convergence results than the classical DNN network.
Deep Probabilistic Movement Primitives with a Bayesian Aggregator
Authors: Michael Przystupa, Faezeh Haghverd, Martin Jagersand, Samuele Tosatto
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.05141
Pdf link: https://arxiv.org/pdf/2307.05141
Abstract Movement primitives are trainable parametric models that reproduce robotic movements starting from a limited set of demonstrations. Previous works proposed simple linear models that exhibited high sample efficiency and generalization power by allowing temporal modulation of movements (reproducing movements faster or slower), blending (merging two movements into one), via-point conditioning (constraining a movement to meet some particular via-points) and context conditioning (generation of movements based on an observed variable, e.g., position of an object). Other works have proposed neural network-based motor primitive models and demonstrated their capacity to perform tasks with some forms of input conditioning or time-modulation representations. However, no single unified deep motor primitive model has been proposed that is capable of all the previous operations, limiting the potential applications of neural motor primitives. This paper proposes a deep movement primitive architecture that encodes all the operations above and uses a Bayesian context aggregator that allows a more sound context conditioning and blending. Our results demonstrate that our approach can scale to reproduce complex motions on a larger variety of input choices compared to baselines while maintaining the operations that linear movement primitives provide.
Using Linear Regression for Iteratively Training Neural Networks
Authors: Harshad Khadilkar
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.05189
Pdf link: https://arxiv.org/pdf/2307.05189
Abstract We present a simple linear regression based approach for learning the weights and biases of a neural network, as an alternative to standard gradient based backpropagation. The present work is exploratory in nature, and we restrict the description and experiments to (i) simple feedforward neural networks, (ii) scalar (single output) regression problems, and (iii) invertible activation functions. However, the approach is intended to be extensible to larger, more complex architectures. The key idea is the observation that the input to every neuron in a neural network is a linear combination of the activations of neurons in the previous layer, as well as the parameters (weights and biases) of the layer. If we are able to compute the ideal total input values to every neuron by working backwards from the output, we can formulate the learning problem as a linear least squares problem which iterates between updating the parameters and the activation values. We present an explicit algorithm that implements this idea, and we show that (at least for simple problems) the approach is more stable and faster than gradient-based backpropagation.
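The key observation in the abstract above (with an invertible activation, the ideal pre-activation can be recovered from the target and the parameters then follow from linear least squares) can be illustrated on a single neuron. This is a deliberately reduced sketch and not the paper's iterative multi-layer algorithm: it assumes one scalar input, a sigmoid activation, and targets strictly inside (0, 1); all function names are hypothetical.

```python
import math

# Single-neuron illustration only; the paper's algorithm alternates between
# parameter and activation updates across layers, which is not shown here.


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(y):
    """Inverse of the sigmoid; requires 0 < y < 1."""
    return math.log(y / (1.0 - y))


def fit_neuron(xs, ys):
    """Fit y = sigmoid(w * x + b) by ordinary least squares on the pre-activations."""
    zs = [logit(y) for y in ys]          # ideal total input to the neuron
    n = len(xs)
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    w = sxz / sxx                        # closed-form slope
    b = mean_z - w * mean_x              # closed-form intercept
    return w, b


if __name__ == "__main__":
    xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
    ys = [sigmoid(2.0 * x - 1.0) for x in xs]
    print(fit_neuron(xs, ys))
```

Because the targets here are generated by an exact sigmoid neuron, the least-squares fit recovers the generating weight and bias up to floating-point error, with no gradient steps involved.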
U-CREAT: Unsupervised Case Retrieval using Events extrAcTion
Authors: Abhinav Joshi, Akshat Sharma, Sai Kiran Tanikella, Ashutosh Modi
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.05260
Pdf link: https://arxiv.org/pdf/2307.05260
Abstract The task of Prior Case Retrieval (PCR) in the legal domain is about automatically citing relevant (based on facts and precedence) prior legal cases in a given query case. To further promote research in PCR, in this paper, we propose a new large benchmark (in English) for the PCR task: IL-PCR (Indian Legal Prior Case Retrieval) corpus. Given the complex nature of case relevance and the long size of legal documents, BM25 remains a strong baseline for ranking the cited prior documents. In this work, we explore the role of events in legal case retrieval and propose an unsupervised retrieval method-based pipeline U-CREAT (Unsupervised Case Retrieval using Events Extraction). We find that the proposed unsupervised retrieval method significantly increases performance compared to BM25 and makes retrieval faster by a considerable margin, making it applicable to real-time case retrieval systems. Our proposed system is generic: we show that it generalizes across two different legal systems (Indian and Canadian), and it shows state-of-the-art performance on the benchmarks for both legal systems (IL-PCR and COLIEE corpora).
On the efficient preconditioning of the Stokes equations in tight geometries
Authors: Vladislav Pimanov, Oleg Iliev, Ivan Oseledets, Ekaterina Muravleva
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2307.05266
Pdf link: https://arxiv.org/pdf/2307.05266
Abstract If the Stokes equations are properly discretized, it is well known that the Schur complement matrix is spectrally equivalent to the identity matrix. Moreover, in the case of simple geometries, it is often observed that most of its eigenvalues are equal to one. These facts form the basis for the famous Uzawa and Krylov-Uzawa algorithms. However, in the case of complex geometries, the Schur complement matrix can become arbitrarily ill-conditioned, having a significant portion of non-unit eigenvalues, which makes the established Uzawa preconditioner inefficient. In this article, we study the Schur complement formulation for the staggered finite-difference discretization of the Stokes problem in 3D CT images and synthetic 2D geometries. We numerically investigate the performance of the CG iterative method with the Uzawa and SIMPLE preconditioners and draw several conclusions. First, we show that in the case of low porosity, CG with the SIMPLE preconditioner converges faster to the discrete pressure and provides a more accurate calculation of sample permeability. Second, we show that an increase in the surface-to-volume ratio leads to an increase in the condition number of the Schur complement matrix, while the dependence is inverse for the Schur complement matrix preconditioned with SIMPLE. As an explanation, we conjecture that the no-slip boundary conditions are the reason for non-unit eigenvalues of the Schur complement.
Keyword: mobile

A Kalman Filter based Low Complexity Throughput Prediction Algorithm for 5G Cellular Networks
Authors: Mayukh Biswas, Ayan Chakraborty, Basabdatta Palit
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2307.04819
Pdf link: https://arxiv.org/pdf/2307.04819
Abstract Throughput Prediction is one of the primary preconditions for the uninterrupted operation of several network-aware mobile applications, namely video streaming. Recent works have advocated using Machine Learning (ML) and Deep Learning (DL) for cellular network throughput prediction. In contrast, this work has proposed a low computationally complex simple solution which models the future throughput as a multiple linear regression of several present network parameters and present throughput. It then feeds the variance of prediction error and measurement error, which is inherent in any measurement setup but unaccounted for in existing works, to a Kalman filter-based prediction-correction approach to obtain the optimal estimates of the future throughput. Extensive experiments across seven publicly available 5G throughput datasets for different prediction window lengths have shown that the proposed method outperforms the baseline ML and DL algorithms by delivering more accurate results within a shorter timeframe for inferencing and retraining. Furthermore, in comparison to its ML and DL counterparts, the proposed throughput prediction method is also found to deliver higher QoE to both streaming and live video users when used in conjunction with popular Model Predictive Control (MPC) based adaptive bitrate streaming algorithms.
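The prediction-correction loop described in the abstract above can be sketched as a scalar Kalman filter whose prediction step is a linear regression. The sketch assumes a single regressor (the previous throughput estimate) with pre-fitted coefficients a and b; the paper uses several network parameters as regressors, and the names a, b, q (prediction-error variance) and r (measurement-error variance) are illustrative choices rather than the paper's notation.

```python
# Minimal scalar sketch; a, b, q and r are assumed to be known in advance.


def kalman_step(x_est, p_est, z, a, b, q, r):
    """One predict/correct cycle.

    x_est, p_est: previous throughput estimate and its variance
    z: new throughput measurement
    a, b: regression coefficients of the prediction model x_pred = a * x_est + b
    q, r: variances of the prediction error and the measurement error
    """
    # Predict: push the estimate and its uncertainty through the regression model.
    x_pred = a * x_est + b
    p_pred = a * a * p_est + q
    # Correct: blend prediction and measurement according to their variances.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new, gain


def filter_series(measurements, x0, p0, a, b, q, r):
    """Run the filter over a list of measurements and return all estimates."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        x, p, _ = kalman_step(x, p, z, a, b, q, r)
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    print(filter_series([12.0, 11.0, 13.0], x0=10.0, p0=1.0, a=1.0, b=0.0, q=1.0, r=2.0))
```

The gain moves toward 1 when the measurement noise r is small relative to the prediction uncertainty, and toward 0 in the opposite case, so the corrected estimate always lies between the regression prediction and the measurement.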
FedYolo: Augmenting Federated Learning with Pretrained Transformers
Authors: Xuechen Zhang, Mingchen Li, Xiangyu Chang, Jiasi Chen, Amit K. Roy-Chowdhury, Ananda Theertha Suresh, Samet Oymak
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2307.04905
Pdf link: https://arxiv.org/pdf/2307.04905
Abstract The growth and diversity of machine learning applications motivate a rethinking of learning with mobile and edge devices. How can we address diverse client goals and learn with scarce heterogeneous data? While federated learning aims to address these issues, it has challenges hindering a unified solution. Large transformer models have been shown to work across a variety of tasks achieving remarkable few-shot adaptation. This raises the question: Can clients use a single general-purpose model, rather than custom models for each task, while obeying device and network constraints? In this work, we investigate pretrained transformers (PTF) to achieve these on-device learning goals and thoroughly explore the roles of model size and modularity, where the latter refers to adaptation through modules such as prompts or adapters. Focusing on federated learning, we demonstrate that: (1) Larger scale shrinks the accuracy gaps between alternative approaches and improves heterogeneity robustness. Scale allows clients to run more local SGD epochs which can significantly reduce the number of communication rounds. At the extreme, clients can achieve respectable accuracy locally highlighting the potential of fully-local learning. (2) Modularity, by design, enables $>$100$\times$ less communication in bits. Surprisingly, it also boosts the generalization capability of local adaptation methods and the robustness of smaller PTFs. Finally, it enables clients to solve multiple unrelated tasks simultaneously using a single PTF, whereas full updates are prone to catastrophic forgetting. These insights on scale and modularity motivate a new federated learning approach we call "You Only Load Once" (FedYolo): The clients load a full PTF model once and all future updates are accomplished through communication-efficient modules with limited catastrophic-forgetting, where each task is assigned to its own module.
Kinematically-Decoupled Impedance Control for Fast Object Visual Servoing and Grasping on Quadruped Manipulators
Authors: Riccardo Parosi, Mattia Risiglione, Darwin G. Caldwell, Claudio Semini, Victor Barasuol
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2307.04918
Pdf link: https://arxiv.org/pdf/2307.04918
Abstract We propose a control pipeline for SAG (Searching, Approaching, and Grasping) of objects, based on a decoupled arm kinematic chain and impedance control, which integrates image-based visual servoing (IBVS). The kinematic decoupling allows for fast end-effector motions and recovery that leads to robust visual servoing. The whole approach and pipeline can be generalized for any mobile platform (wheeled or tracked vehicles), but is most suitable for dynamically moving quadruped manipulators thanks to their reactivity against disturbances. The compliance of the impedance controller makes the robot safer for interactions with humans and the environment. We demonstrate the performance and robustness of the proposed approach with various experiments on our 140 kg HyQReal quadruped robot equipped with a 7-DoF manipulator arm. The experiments consider dynamic locomotion, tracking under external disturbances, and fast motions of the target object.
The smarty4covid dataset and knowledge base: a framework enabling interpretable analysis of audio signals
Authors: Konstantia Zarkogianni, Edmund Dervakos, George Filandrianos, Theofanis Ganitidis, Vasiliki Gkatzou, Aikaterini Sakagianni, Raghu Raghavendra, C.L. Max Nikias, Giorgos Stamou, Konstantina S. Nikita
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2307.05096
Pdf link: https://arxiv.org/pdf/2307.05096
Abstract Harnessing the power of Artificial Intelligence (AI) and m-health towards detecting new bio-markers indicative of the onset and progress of respiratory abnormalities/conditions has attracted considerable scientific and research interest, especially during the COVID-19 pandemic. The smarty4covid dataset contains audio signals of cough (4,676), regular breathing (4,665), deep breathing (4,695) and voice (4,291) as recorded by means of mobile devices following a crowd-sourcing approach. Other self-reported information is also included (e.g. COVID-19 virus tests), thus providing a comprehensive dataset for the development of COVID-19 risk detection models. The smarty4covid dataset is released in the form of a web-ontology language (OWL) knowledge base enabling data consolidation from other relevant datasets, complex queries and reasoning. It has been utilized towards the development of models able to: (i) extract clinically informative respiratory indicators from regular breathing records, and (ii) identify cough, breath and voice segments in crowd-sourced audio recordings. A new framework utilizing the smarty4covid OWL knowledge base towards generating counterfactual explanations in opaque AI-based COVID-19 risk detection models is proposed and validated.
Keyword: pruning

There is no result

Keyword: diffusion

Collaborative Score Distillation for Consistent Visual Synthesis
Authors: Subin Kim, Kyungmin Lee, June Suk Choi, Jongheon Jeong, Kihyuk Sohn, Jinwoo Shin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.04787
Pdf link: https://arxiv.org/pdf/2307.04787
Abstract Generative priors of large-scale text-to-image diffusion models enable a wide range of new generation and editing applications on diverse visual modalities. However, when adapting these priors to complex visual modalities, often represented as multiple images (e.g., video), achieving consistency across a set of images is challenging. In this paper, we address this challenge with a novel method, Collaborative Score Distillation (CSD). CSD is based on the Stein Variational Gradient Descent (SVGD). Specifically, we propose to consider multiple samples as "particles" in the SVGD update and combine their score functions to distill generative priors over a set of images synchronously. Thus, CSD facilitates seamless integration of information across 2D images, leading to a consistent visual synthesis across multiple samples. We show the effectiveness of CSD in a variety of tasks, encompassing the visual editing of panorama images, videos, and 3D scenes. Our results underline the competency of CSD as a versatile method for enhancing inter-sample consistency, thereby broadening the applicability of text-to-image diffusion models.
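The abstract above builds on Stein Variational Gradient Descent, in which a set of particles is updated jointly: each particle is pulled by the kernel-weighted scores of all particles and pushed away from its neighbours by the kernel gradient. The toy sketch below shows the standard SVGD update on one-dimensional particles with an RBF kernel and a Gaussian target. It is not the CSD method itself, which applies the update to images with scores from a text-to-image diffusion model; the target, bandwidth h and step size are illustrative assumptions.

```python
import math

# Toy 1D SVGD; the Gaussian target stands in for a diffusion-model score.


def rbf(xj, xi, h):
    """RBF kernel k(xj, xi) with bandwidth h."""
    return math.exp(-((xj - xi) ** 2) / (2.0 * h * h))


def svgd_step(particles, score, step=0.1, h=1.0):
    """One SVGD update applied to every particle simultaneously."""
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = rbf(xj, xi, h)
            phi += k * score(xj)             # attraction: kernel-weighted scores
            phi += k * (xi - xj) / (h * h)   # repulsion: d/dxj of k(xj, xi)
        updated.append(xi + step * phi / n)
    return updated


def run_svgd(particles, score, iters=500, step=0.1, h=1.0):
    for _ in range(iters):
        particles = svgd_step(particles, score, step, h)
    return particles


def gaussian_score(mu):
    """Score (gradient of the log-density) of a unit-variance Gaussian centred at mu."""
    return lambda x: -(x - mu)


if __name__ == "__main__":
    print(run_svgd([-1.0, 0.0, 1.0], gaussian_score(3.0)))
```

The repulsion term is what keeps the particles from collapsing onto a single mode, while the shared attraction term couples them, which is the property the abstract relies on for consistency across samples.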
Articulated 3D Head Avatar Generation using Text-to-Image Diffusion Models
Authors: Alexander W. Bergman, Wang Yifan, Gordon Wetzstein
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.04859
Pdf link: https://arxiv.org/pdf/2307.04859
Abstract The ability to generate diverse 3D articulated head avatars is vital to a plethora of applications, including augmented reality, cinematography, and education. Recent work on text-guided 3D object generation has shown great promise in addressing these needs. These methods directly leverage pre-trained 2D text-to-image diffusion models to generate 3D-multi-view-consistent radiance fields of generic objects. However, due to the lack of geometry and texture priors, these methods have limited control over the generated 3D objects, making it difficult to operate inside a specific domain, e.g., human heads. In this work, we develop a new approach to text-guided 3D head avatar generation to address this limitation. Our framework directly operates on the geometry and texture of an articulable 3D morphable model (3DMM) of a head, and introduces novel optimization procedures to update the geometry and texture while keeping the 2D and 3D facial features aligned. The result is a 3D head avatar that is consistent with the text description and can be readily articulated using the deformation model of the 3DMM. We show that our diffusion-based articulated head avatars outperform state-of-the-art approaches for this task. The latter are typically based on CLIP, which is known to provide limited diversity of generation and accuracy for 3D object generation.
DDGM: Solving inverse problems by Diffusive Denoising of Gradient-based Minimization
Authors: Kyle Luther, H. Sebastian Seung
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2307.04946
Pdf link: https://arxiv.org/pdf/2307.04946
Abstract Inverse problems generally require a regularizer or prior for a good solution. A recent trend is to train a convolutional net to denoise images, and use this net as a prior when solving the inverse problem. Several proposals depend on a singular value decomposition of the forward operator, and several others backpropagate through the denoising net at runtime. Here we propose a simpler approach that combines the traditional gradient-based minimization of reconstruction error with denoising. Noise is also added at each step, so the iterative dynamics resembles a Langevin or diffusion process. Both the level of added noise and the size of the denoising step decay exponentially with time. We apply our method to the problem of tomographic reconstruction from electron micrographs acquired at multiple tilt angles. With empirical studies using simulated tilt views, we find parameter settings for our method that produce good results. We show that high accuracy can be achieved with as few as 50 denoising steps. We also compare with DDRM and DPS, more complex diffusion methods of the kinds mentioned above. These methods are less accurate (as measured by MSE and SSIM) for our tomography problem, even after the generation hyperparameters are optimized. Finally we extend our method to reconstruction of arbitrary-sized images and show results on 128 $\times$ 1568 pixel images.
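The iteration described in the DDGM abstract (a gradient step on the reconstruction error, added noise, and a denoising step, with noise level and denoising step size both decaying exponentially) can be sketched on a one-dimensional signal. Everything specific here is an assumption made for illustration: the forward operator is a diagonal sampling mask, the denoiser is a 3-tap moving average standing in for a trained convolutional net, and the ordering of the three sub-steps and all hyperparameter values are hypothetical.

```python
import random

# Illustrative 1D sketch; the moving-average denoiser replaces a trained network.


def denoise(x):
    """Stand-in denoiser: 3-tap moving average with edge replication."""
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0 for i in range(n)]


def ddgm_sketch(y, mask, steps=50, lr=0.5, sigma0=0.5, step0=0.5, decay=0.9, seed=0):
    """Reconstruct x from y = mask * x using gradient, noise and denoising steps."""
    rng = random.Random(seed)
    x = [0.0] * len(y)
    for t in range(steps):
        scale = decay ** t  # exponential decay shared by noise level and denoising step
        # Gradient step on 0.5 * sum((mask * x - y) ** 2).
        x = [xi - lr * m * (m * xi - yi) for xi, yi, m in zip(x, y, mask)]
        # Inject noise whose level shrinks over time.
        x = [xi + sigma0 * scale * rng.gauss(0.0, 1.0) for xi in x]
        # Partial denoising step whose size shrinks over time.
        alpha = step0 * scale
        d = denoise(x)
        x = [(1.0 - alpha) * xi + alpha * di for xi, di in zip(x, d)]
    return x


if __name__ == "__main__":
    signal = [1.0] * 16
    recon = ddgm_sketch(signal, mask=[1.0] * 16)
    print(max(abs(r - s) for r, s in zip(recon, signal)))
```

Early iterations are dominated by noise and strong smoothing, which lets the prior shape the solution; late iterations are dominated by the data-fidelity gradient, so the result settles close to a signal consistent with the measurements.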
Diffusion idea exploration for art generation
Authors: Nikhil Verma
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.04978
Pdf link: https://arxiv.org/pdf/2307.04978
Abstract Cross-modal learning tasks have picked up pace in recent times. With a plethora of applications in diverse areas, the generation of novel content using multiple modalities of data has remained a challenging problem. To address this, various generative modelling techniques have been proposed for specific tasks. Novel and creative image generation is one important aspect for industrial applications, which could serve as an arm for novel content generation. Previously proposed techniques used Generative Adversarial Networks (GANs), autoregressive models and Variational Autoencoders (VAEs) for accomplishing similar tasks. These approaches are limited in their capability to produce images guided by either text instructions or rough sketch images, decreasing the overall performance of the image generator. We used state-of-the-art diffusion models to generate creative art by primarily leveraging text with additional support from rough sketches. Diffusion starts with a pattern of random dots and slowly converts that pattern into a design image using the guiding information fed into the model. Diffusion models have recently outperformed other generative models in image generation tasks using cross-modal data as guiding information. The initial experiments for this task of novel image generation demonstrated promising qualitative results.
Comparing Social Network Dynamic Operators
Authors: Edoardo Baccini (University of Groningen), Zoé Christoff (University of Groningen)
Subjects: Multiagent Systems (cs.MA); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2307.05055
Pdf link: https://arxiv.org/pdf/2307.05055
Abstract Numerous logics have been developed to reason either about threshold-induced opinion diffusion in a network, or about similarity-driven network structure evolution, or about both. In this paper, we first introduce a logic containing different dynamic operators to capture changes that are 'asynchronous' (opinion change only, network-link change only) and changes that are 'synchronous' (both at the same time). Second, we show that synchronous operators cannot, in general, be replaced by asynchronous operators and vice versa. Third, we characterise the class of models on which the synchronous operator can be reduced to sequences of asynchronous operators.
Solving Minimal Residual Methods in $W^{-1,p}$ with large Exponents $p$
Authors: Johannes Storn
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2307.05178
Pdf link: https://arxiv.org/pdf/2307.05178
Abstract We introduce a numerical scheme that approximates solutions to linear PDEs by minimizing a residual in the $W^{-1,p}(\Omega)$ norm with exponents $p> 2$. The resulting problem is solved by regularized Kacanov iterations, which make it possible to compute the solution to the non-linear minimization problem even for large exponents $p\gg 2$. Such large exponents remedy instabilities of finite element methods for problems like convection-dominated diffusion.
Turing patterns in a 3D morpho-chemical bulk-surface reaction-diffusion system for battery modeling
Authors: Massimo Frittelli, Ivonne Sgura, Benedetto Bozzini
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2307.05285
Pdf link: https://arxiv.org/pdf/2307.05285
Abstract In this paper we introduce a bulk-surface reaction-diffusion (BSRD) model in three space dimensions that extends the DIB morphochemical model to account for the electrolyte contribution in the application, in order to study structure formation during discharge-charge processes in batteries. Here we propose to approximate the model by the Bulk-Surface Virtual Element Method on a tailor-made mesh that proves to be competitive with fast bespoke methods for PDEs on Cartesian grids. We present a selection of numerical simulations that accurately match the classical morphologies found in experiments. Finally, we compare the Turing patterns obtained by the coupled 3D BS-DIB model with those obtained with the original 2D version.
Stability analysis of a second-order difference scheme for the time-fractional mixed sub-diffusion and diffusion-wave equation
Authors: Anatoly A. Alikhanov, Mohammad Shahbazi Asl, Chengming Huang
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2307.05349
Pdf link: https://arxiv.org/pdf/2307.05349
Abstract This study investigates a class of initial-boundary value problems pertaining to the time-fractional mixed sub-diffusion and diffusion-wave equation (SDDWE). To facilitate the development of a numerical method and analysis, the original problem is transformed into a new integro-differential model which includes the Caputo derivatives and the Riemann-Liouville fractional integrals with orders belonging to (0,1). By providing an a priori estimate of the solution, we have established the existence and uniqueness of a numerical solution for the problem. We propose a second-order method to approximate the fractional Riemann-Liouville integral and employ an L2 type formula to approximate the Caputo derivative. This results in a method with a temporal accuracy of second-order for approximating the considered model. The proof of the unconditional stability of the proposed difference scheme is established. Moreover, we demonstrate the proposed method's potential to construct and analyze a second-order L2-type numerical scheme for a broader class of the time-fractional mixed SDDWEs with multi-term time-fractional derivatives. Numerical results are presented to assess the accuracy of the method and validate the theoretical findings.
On the Vulnerability of DeepFake Detectors to Attacks Generated by Denoising Diffusion Models
Authors: Marija Ivanovska, Vitomir Štruc
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.05397
Pdf link: https://arxiv.org/pdf/2307.05397
Abstract The detection of malicious Deepfakes is a constantly evolving problem that requires continuous monitoring of detectors to ensure they are able to detect image manipulations generated by the latest emerging models. In this paper, we present a preliminary study that investigates the vulnerability of single-image Deepfake detectors to attacks created by a representative of the newest generation of generative methods, i.e. Denoising Diffusion Models (DDMs). Our experiments are run on FaceForensics++, a commonly used benchmark dataset consisting of Deepfakes generated with various techniques for face swapping and face reenactment. The analysis shows that reconstructing existing Deepfakes with only one denoising diffusion step significantly decreases the accuracy of all tested detectors, without introducing visually perceptible image changes.
Metropolis Sampling for Constrained Diffusion Models
Authors: Nic Fishman, Leo Klarner, Emile Mathieu, Michael Hutchinson, Valentin de Bortoli
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.05439
Pdf link: https://arxiv.org/pdf/2307.05439
Abstract Denoising diffusion models have recently emerged as the predominant paradigm for generative modelling. Their extension to Riemannian manifolds has facilitated their application to an array of problems in the natural sciences. Yet, in many practical settings, such manifolds are defined by a set of constraints and are not covered by the existing (Riemannian) diffusion model methodology. Recent work has attempted to address this issue by employing novel noising processes based on logarithmic barrier methods or reflected Brownian motions. However, the associated samplers are computationally burdensome as the complexity of the constraints increases. In this paper, we introduce an alternative simple noising scheme based on Metropolis sampling that affords substantial gains in computational efficiency and empirical performance compared to the earlier samplers. Of independent interest, we prove that this new process corresponds to a valid discretisation of the reflected Brownian motion. We demonstrate the scalability and flexibility of our approach on a range of problem settings with convex and non-convex constraints, including applications from geospatial modelling, robotics and protein design.
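As a rough illustration of the idea summarised in this abstract, the following is a minimal sketch of a Metropolis-style noising step on a constrained set: a Gaussian move is proposed and kept only if it stays inside the constraint, otherwise the current point is retained. The unit-ball constraint, the step size, and all function names are assumptions made for this example and are not taken from the paper.

```python
import random


def in_unit_ball(x):
    """Hypothetical constraint set for illustration: the closed unit ball."""
    return sum(v * v for v in x) <= 1.0


def metropolis_noising_step(x, step_size, inside, rng):
    """Propose a Gaussian move; accept it only if it satisfies the constraint."""
    proposal = [v + step_size * rng.gauss(0.0, 1.0) for v in x]
    return proposal if inside(proposal) else x


def noise_trajectory(x0, n_steps, step_size, inside, seed=0):
    """Run the rejection-based noising process starting from a feasible point x0."""
    rng = random.Random(seed)
    path = [list(x0)]
    for _ in range(n_steps):
        path.append(metropolis_noising_step(path[-1], step_size, inside, rng))
    return path


path = noise_trajectory([0.0, 0.0], n_steps=500, step_size=0.2, inside=in_unit_ball)
```

Because rejected proposals leave the point unchanged, every state along the path stays feasible, and only a membership test for the constraint set is needed rather than a barrier function or an explicit reflection.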
AutoDecoding Latent 3D Diffusion Models
Authors: Evangelos Ntavelis, Aliaksandr Siarohin, Kyle Olszewski, Chaoyang Wang, Luc Van Gool, Sergey Tulyakov
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.05445
Pdf link: https://arxiv.org/pdf/2307.05445
Abstract We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core. The 3D autodecoder framework embeds properties learned from the target dataset in the latent space, which can then be decoded into a volumetric representation for rendering view-consistent appearance and geometry. We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations to learn a 3D diffusion from 2D images or monocular videos of rigid or articulated objects. Our approach is flexible enough to use either existing camera supervision or no camera information at all -- instead efficiently learning it during training. Our evaluations demonstrate that our generation results outperform state-of-the-art alternatives on various benchmark datasets and metrics, including multi-view image datasets of synthetic objects, real in-the-wild videos of moving people, and a large-scale, real video dataset of static objects.
Keyword: adaptive

A Kalman Filter based Low Complexity Throughput Prediction Algorithm for 5G Cellular Networks
Authors: Mayukh Biswas, Ayan Chakraborty, Basabdatta Palit
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2307.04819
Pdf link: https://arxiv.org/pdf/2307.04819
Abstract Throughput Prediction is one of the primary preconditions for the uninterrupted operation of several network-aware mobile applications, namely video streaming. Recent works have advocated using Machine Learning (ML) and Deep Learning (DL) for cellular network throughput prediction. In contrast, this work proposes a simple, computationally lightweight solution which models the future throughput as a multiple linear regression of several present network parameters and present throughput. It then feeds the variance of prediction error and measurement error, which is inherent in any measurement setup but unaccounted for in existing works, to a Kalman filter-based prediction-correction approach to obtain the optimal estimates of the future throughput. Extensive experiments across seven publicly available 5G throughput datasets for different prediction window lengths have shown that the proposed method outperforms the baseline ML and DL algorithms by delivering more accurate results within a shorter timeframe for inferencing and retraining. Furthermore, in comparison to its ML and DL counterparts, the proposed throughput prediction method is also found to deliver higher QoE to both streaming and live video users when used in conjunction with popular Model Predictive Control (MPC) based adaptive bitrate streaming algorithms.
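The prediction-correction loop described in this abstract can be sketched as a scalar Kalman filter whose prediction step is a linear regression. This is a generic textbook filter written under assumptions, not the authors' implementation: the regression coefficients `a`, `b`, `c`, the noise variances `q` (prediction error) and `r` (measurement error), and the function name are all hypothetical.

```python
def kalman_throughput_step(x_est, p_est, features, z, a, b, c, q, r):
    """One predict-correct step of a scalar Kalman filter.

    x_est, p_est : previous throughput estimate and its variance
    features     : present network parameters (regression inputs)
    z            : new throughput measurement
    a, b, c      : assumed regression coefficients (throughput, features, intercept)
    q, r         : assumed prediction-error and measurement-error variances
    """
    # Prediction: linear regression on present throughput and network parameters.
    x_prior = a * x_est + sum(bi * ui for bi, ui in zip(b, features)) + c
    p_prior = a * a * p_est + q
    # Correction: blend prediction and measurement using the Kalman gain.
    gain = p_prior / (p_prior + r)
    x_post = x_prior + gain * (z - x_prior)
    p_post = (1.0 - gain) * p_prior
    return x_post, p_post, gain


x, p, k = kalman_throughput_step(
    x_est=10.0, p_est=0.0, features=[2.0], z=12.0,
    a=1.0, b=[0.5], c=0.0, q=1.0, r=1.0,
)
```

In this example the regression predicts 11.0, the gain is 0.5 because both variances are equal, and the corrected estimate lands halfway between the prediction and the measurement.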
On Detecting Some Defective Items in Group Testing
Authors: Nader H. Bshouty, Catherine A. Haddad-Zaknoon
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.04822
Pdf link: https://arxiv.org/pdf/2307.04822
Abstract Group testing is an approach aimed at identifying up to $d$ defective items among a total of $n$ elements. This is accomplished by examining subsets to determine if at least one defective item is present. In our study, we focus on the problem of identifying a subset of $\ell\leq d$ defective items. We develop upper and lower bounds on the number of tests required to detect $\ell$ defective items in both the adaptive and non-adaptive settings while considering scenarios where no prior knowledge of $d$ is available, and situations where an estimate of $d$ or at least some non-trivial upper bound on $d$ is available. When no prior knowledge on $d$ is available, we prove a lower bound of $ \Omega(\frac{\ell \log^2n}{\log \ell +\log\log n})$ tests in the randomized non-adaptive settings and an upper bound of $O(\ell \log^2 n)$ for the same settings. Furthermore, we demonstrate that any non-adaptive deterministic algorithm must ask $\Theta(n)$ tests, signifying a fundamental limitation in this scenario. For adaptive algorithms, we establish tight bounds in different scenarios. In the deterministic case, we prove a tight bound of $\Theta(\ell\log{(n/\ell)})$. Moreover, in the randomized settings, we derive a tight bound of $\Theta(\ell\log{(n/d)})$. When $d$, or at least some non-trivial estimate of $d$, is known, we prove a tight bound of $\Theta(d\log (n/d))$ for the deterministic non-adaptive settings, and $\Theta(\ell\log(n/d))$ for the randomized non-adaptive settings. In the adaptive case, we present an upper bound of $O(\ell \log (n/\ell))$ for the deterministic settings, and a lower bound of $\Omega(\ell\log(n/d)+\log n)$. Additionally, we establish a tight bound of $\Theta(\ell \log(n/d))$ for the randomized adaptive settings.
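To make the adaptive setting concrete, here is a minimal sketch of classic binary splitting, which locates one defective item with about $\log_2 n$ pooled tests. It only illustrates what an adaptive group test looks like; it is not one of the algorithms from the paper, and the oracle `is_positive` and the example defective set are assumptions for the demonstration.

```python
def find_one_defective(items, is_positive):
    """Return (one defective item or None, number of pooled tests used).

    is_positive(pool) must return True iff the pool contains a defective item.
    """
    tests = 1
    if not is_positive(items):
        return None, tests
    pool = list(items)
    while len(pool) > 1:
        mid = len(pool) // 2
        first_half = pool[:mid]
        tests += 1
        # The pool is known to be positive, so a negative first half
        # implies that the second half contains a defective item.
        pool = first_half if is_positive(first_half) else pool[mid:]
    return pool[0], tests


defectives = {37, 900}  # hypothetical ground truth, hidden behind the oracle
found, used = find_one_defective(
    range(1024), lambda pool: any(i in defectives for i in pool)
)
```

Each test result decides which pool is tested next, which is exactly what a non-adaptive design cannot do because all of its pools are fixed in advance.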
AmadeusGPT: a natural language interface for interactive animal behavioral analysis
Authors: Shaokai Ye, Jessy Lauer, Mu Zhou, Alexander Mathis, Mackenzie W. Mathis
Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)
Arxiv link: https://arxiv.org/abs/2307.04858
Pdf link: https://arxiv.org/pdf/2307.04858
Abstract The process of quantifying and analyzing animal behavior involves translating the naturally occurring descriptive language of their actions into machine-readable code. Yet, codifying behavior analysis is often challenging without deep understanding of animal behavior and technical machine learning knowledge. To limit this gap, we introduce AmadeusGPT: a natural language interface that turns natural language descriptions of behaviors into machine-executable code. Large-language models (LLMs) such as GPT3.5 and GPT4 allow for interactive language-based queries that are potentially well suited for making interactive behavior analysis. However, the comprehension capability of these LLMs is limited by the context window size, which prevents it from remembering distant conversations. To overcome the context window limitation, we implement a novel dual-memory mechanism to allow communication between short-term and long-term memory using symbols as context pointers for retrieval and saving. Concretely, users directly use language-based definitions of behavior and our augmented GPT develops code based on the core AmadeusGPT API, which contains machine learning, computer vision, spatio-temporal reasoning, and visualization modules. Users then can interactively refine results, and seamlessly add new behavioral modules as needed. We benchmark AmadeusGPT and show we can produce state-of-the-art performance on the MABE 2022 behavior challenge tasks. Note, an end-user would not need to write any code to achieve this. Thus, collectively AmadeusGPT presents a novel way to merge deep biological knowledge, large-language models, and core computer vision modules into a more naturally intelligent system. Code and demos can be found at: https://github.com/AdaptiveMotorControlLab/AmadeusGPT.
DFR: Depth from Rotation by Uncalibrated Image Rectification with Latitudinal Motion Assumption
Authors: Yongcong Zhang, Yifei Xue, Ming Liao, Huiqing Zhang, Yizhen Lao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.05129
Pdf link: https://arxiv.org/pdf/2307.05129
Abstract Despite the increasing prevalence of rotating-style capture (e.g., surveillance cameras), conventional stereo rectification techniques frequently fail due to the rotation-dominant motion and small baseline between views. In this paper, we tackle the challenge of performing stereo rectification for uncalibrated rotating cameras. To that end, we propose Depth-from-Rotation (DfR), a novel image rectification solution that analytically rectifies two images with two-point correspondences and serves for further depth estimation. Specifically, we model the motion of a rotating camera as the camera rotates on a sphere with fixed latitude. The camera's optical axis lies perpendicular to the sphere's surface. We call this latitudinal motion assumption. Then we derive a 2-point analytical solver from directly computing the rectified transformations on the two images. We also present a self-adaptive strategy to reduce the geometric distortion after rectification. Extensive synthetic and real data experiments demonstrate that the proposed method outperforms existing works in effectiveness and efficiency by a significant margin.
SecFlow: Adaptive Security-Aware Workflow Management System in Multi-Cloud Environments
Authors: Nafiseh Soveizi, Fatih Turkmen
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2307.05137
Pdf link: https://arxiv.org/pdf/2307.05137
Abstract In this paper, we propose an architecture for a security-aware workflow management system (WfMS) we call SecFlow in answer to the recent developments of combining workflow management systems with Cloud environments and the still lacking abilities of such systems to ensure the security and privacy of cloud-based workflows. The SecFlow architecture focuses on full workflow life cycle coverage as, in addition to the existing approaches to design security-aware processes, there is a need to fill in the gap of maintaining security properties of workflows during their execution phase. To address this gap, we derive the requirements for such a security-aware WfMS and design a system architecture that meets these requirements. SecFlow integrates key functional components such as secure model construction, security-aware service selection, security violation detection, and adaptive response mechanisms while considering all potential malicious parties in multi-tenant and cloud-based WfMS.
Unbiased Scene Graph Generation via Two-stage Causal Modeling
Authors: Shuzhou Sun, Shuaifeng Zhi, Qing Liao, Janne Heikkilä, Li Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.05276
Pdf link: https://arxiv.org/pdf/2307.05276
Abstract Despite the impressive performance of recent unbiased Scene Graph Generation (SGG) methods, the current debiasing literature mainly focuses on the long-tailed distribution problem, whereas it overlooks another source of bias, i.e., semantic confusion, which makes the SGG model prone to yield false predictions for similar relationships. In this paper, we explore a debiasing procedure for the SGG task leveraging causal inference. Our central insight is that the Sparse Mechanism Shift (SMS) in causality allows independent intervention on multiple biases, thereby potentially preserving head category performance while pursuing the prediction of high-informative tail relationships. However, the noisy datasets lead to unobserved confounders for the SGG task, and thus the constructed causal models are always causal-insufficient to benefit from SMS. To remedy this, we propose Two-stage Causal Modeling (TsCM) for the SGG task, which takes the long-tailed distribution and semantic confusion as confounders to the Structural Causal Model (SCM) and then decouples the causal intervention into two stages. The first stage is causal representation learning, where we use a novel Population Loss (P-Loss) to intervene in the semantic confusion confounder. The second stage introduces the Adaptive Logit Adjustment (AL-Adjustment) to eliminate the long-tailed distribution confounder to complete causal calibration learning. These two stages are model agnostic and thus can be used in any SGG model that seeks unbiased predictions. Comprehensive experiments conducted on the popular SGG backbones and benchmarks show that our TsCM can achieve state-of-the-art performance in terms of mean recall rate. Furthermore, TsCM can maintain a higher recall rate than other debiasing methods, which indicates that our method can achieve a better tradeoff between head and tail relationships.
Smart Environment for Adaptive Learning of Cybersecurity Skills
Authors: Jan Vykopal, Pavel Seda, Valdemar Švábenský, Pavel Čeleda
Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2307.05281
Pdf link: https://arxiv.org/pdf/2307.05281
Abstract Hands-on computing education requires a realistic learning environment that enables students to gain and deepen their skills. Available learning environments, including virtual and physical labs, provide students with real-world computer systems but rarely adapt the learning environment to individual students of various proficiency and background. We designed a unique and novel smart environment for adaptive training of cybersecurity skills. The environment collects a variety of student data to assign a suitable learning path through the training. To enable such adaptiveness, we proposed, developed, and deployed a new tutor model and a training format. We evaluated the learning environment using two different adaptive trainings attended by 114 students of various proficiency. The results show students were assigned tasks with a more appropriate difficulty, which enabled them to successfully complete the training. Students reported that they enjoyed the training, felt the training difficulty was appropriately designed, and would attend more training sessions like these. Instructors can use the environment for teaching any topic involving real-world computer networks and systems because it is not tailored to particular training. We freely released the software along with exemplary training so that other instructors can adopt the innovations in their teaching practice.
Combating Data Imbalances in Federated Semi-supervised Learning with Dual Regulators
Authors: Sikai Bai, Shuaicheng Li, Weiming Zhuang, Kunlin Yang, Jun Hou, Shuai Yi, Shuai Zhang, Junyu Gao, Jie Zhang, Song Guo
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.05358
Pdf link: https://arxiv.org/pdf/2307.05358
Abstract Federated learning has become a popular method to learn from decentralized heterogeneous data. Federated semi-supervised learning (FSSL) emerges to train models from a small fraction of labeled data due to label scarcity on decentralized clients. Existing FSSL methods assume independent and identically distributed (IID) labeled data across clients and consistent class distribution between labeled and unlabeled data within a client. This work studies a more practical and challenging scenario of FSSL, where data distribution is different not only across clients but also within a client between labeled and unlabeled data. To address this challenge, we propose a novel FSSL framework with dual regulators, FedDure. FedDure lifts the previous assumption with a coarse-grained regulator (C-reg) and a fine-grained regulator (F-reg): C-reg regularizes the updating of the local model by tracking the learning effect on labeled data distribution; F-reg learns an adaptive weighting scheme tailored for unlabeled instances in each client. We further formulate the client model training as bi-level optimization that adaptively optimizes the model in the client with two regulators. Theoretically, we show the convergence guarantee of the dual regulators. Empirically, we demonstrate that FedDure is superior to the existing methods across a wide range of settings, notably by more than 11% on CIFAR-10 and CINIC-10 datasets.
Boosting Feedback Efficiency of Interactive Reinforcement Learning by Adaptive Learning from Scores
Authors: Shukai Liu, Chenming Wu, Ying Li, Liangjun Zhang
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.05405
Pdf link: https://arxiv.org/pdf/2307.05405
Abstract Interactive reinforcement learning has shown promise in learning complex robotic tasks. However, the process can be human-intensive due to the requirement of a large amount of interactive feedback. This paper presents a new method that uses scores provided by humans, instead of pairwise preferences, to improve the feedback efficiency of interactive reinforcement learning. Our key insight is that scores can yield significantly more data than pairwise preferences. Specifically, we require a teacher to interactively score the full trajectories of an agent to train a behavioral policy in a sparse reward environment. To prevent unstable scores given by humans from negatively impacting the training process, we propose an adaptive learning scheme. This enables the learning paradigm to be insensitive to imperfect or unreliable scores. We extensively evaluate our method on robotic locomotion and manipulation tasks. The results show that the proposed method can efficiently learn near-optimal policies by adaptive learning from scores, while requiring less feedback compared to pairwise preference learning methods. The source codes are publicly available at https://github.com/SSKKai/Interactive-Scoring-IRL.
Dynamic Tolling in Arc-based Traffic Assignment Models
Authors: Chih-Yuan Chiu, Chinmay Maheshwari, Pan-Yang Su, Shankar Sastry
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2307.05466
Pdf link: https://arxiv.org/pdf/2307.05466
Abstract Tolling in traffic networks offers a popular measure to minimize overall congestion. Existing toll designs primarily focus on congestion in route-based traffic assignment models (TAMs), in which travelers make a single route selection from their source to destination. However, these models do not reflect real-world traveler decisions because they preclude deviations from a chosen route, and because the enumeration of all routes is computationally expensive. To address these limitations, our work focuses on arc-based TAMs, in which travelers sequentially select individual arcs (or edges) on the network to reach their destination. We first demonstrate that marginal pricing, a tolling scheme commonly used in route-based TAMs, also achieves socially optimal congestion levels in our arc-based formulation. Then, we use perturbed best response dynamics to model the evolution of travelers' arc selection preferences over time, and a marginal pricing scheme to the social planner's adaptive toll updates in response. We prove that our adaptive learning and marginal pricing dynamics converge to a neighborhood of the socially optimal loads and tolls. We then present empirical results that verify our theoretical claims.
Keyword: quantization

Q-YOLO: Efficient Inference for Real-time Object Detection
Authors: Mingze Wang, Huixin Sun, Jun Shi, Xuhui Liu, Baochang Zhang, Xianbin Cao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.04816
Pdf link: https://arxiv.org/pdf/2307.04816
Abstract Real-time object detection plays a vital role in various computer vision applications. However, deploying real-time object detectors on resource-constrained platforms poses challenges due to high computational and memory requirements. This paper describes a low-bit quantization method to build a highly efficient one-stage detector, dubbed as Q-YOLO, which can effectively address the performance degradation problem caused by activation distribution imbalance in traditional quantized YOLO models. Q-YOLO introduces a fully end-to-end Post-Training Quantization (PTQ) pipeline with a well-designed Unilateral Histogram-based (UH) activation quantization scheme, which determines the maximum truncation values through histogram analysis by minimizing the Mean Squared Error (MSE) quantization errors. Extensive experiments on the COCO dataset demonstrate the effectiveness of Q-YOLO, outperforming other PTQ methods while achieving a more favorable balance between accuracy and computational cost. This research contributes to advancing the efficient deployment of object detection models on resource-limited edge devices, enabling real-time detection with reduced computational and memory overhead.
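The abstract describes choosing a maximum truncation value for activations by histogram analysis that minimizes the MSE quantization error. A minimal sketch of that general idea follows, assuming non-negative activations, a uniform quantizer, and a simple grid search over candidate thresholds. The bin count, candidate grid, and function names are assumptions for illustration and do not reproduce the paper's UH scheme.

```python
def build_histogram(values, n_bins):
    """Histogram of non-negative values over [0, max]; returns bin centers and counts."""
    top = max(values)
    width = top / n_bins
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v / width), n_bins - 1)] += 1
    centers = [(i + 0.5) * width for i in range(n_bins)]
    return centers, counts


def quantize(v, clip_max, n_levels):
    """Uniform unsigned quantization of v onto n_levels points in [0, clip_max]."""
    scale = clip_max / (n_levels - 1)
    return min(round(v / scale), n_levels - 1) * scale


def search_clip_max(values, bits=4, n_bins=256, n_candidates=100):
    """Grid-search the truncation value that minimizes histogram-weighted MSE."""
    centers, counts = build_histogram(values, n_bins)
    top, total, n_levels = max(values), sum(counts), 2 ** bits
    best_clip, best_mse = top, float("inf")
    for i in range(1, n_candidates + 1):
        clip = top * i / n_candidates
        err = sum(c * (x - quantize(x, clip, n_levels)) ** 2
                  for x, c in zip(centers, counts))
        if err / total < best_mse:
            best_clip, best_mse = clip, err / total
    return best_clip, best_mse


# Synthetic activations: a dense bulk in [0, 1) plus a single large outlier.
activations = [i / 10000 for i in range(10000)] + [8.0]
clip, mse = search_clip_max(activations, bits=4)
```

With this synthetic input the selected threshold falls well below the outlier, because clipping one rare large value costs less error than spreading 16 levels over the whole range.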

New submissions for Wed, 12 Jul 23 #100

Keyword: efficient

Q-YOLO: Efficient Inference for Real-time Object Detection

SigOpt Mulch: An Intelligent System for AutoML of Gradient Boosted Trees

SHAP@k:Efficient and Probably Approximately Correct (PAC) Identification of Top-k Features

Fed-CPrompt: Contrastive Prompt for Rehearsal-Free Federated Continual Learning

Temporal network compression via network hashing

Learning to Solve Constraint Satisfaction Problems with Recurrent Transformer

FedYolo: Augmenting Federated Learning with Pretrained Transformers

Probabilistic Counterexample Guidance for Safer Reinforcement Learning

Intrinsically motivated graph exploration using network theories of human curiosity

Secrets of RLHF in Large Language Models Part I: PPO

Model-Driven Sensing-Node Selection and Power Allocation for Tracking Maneuvering Targets in Perceptive Mobile Networks

Monotone deep Boltzmann machines

PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR

Optimization of Adams-type difference formulas in Hilbert space $W_2^{(2,1)}(0,1)$

Number Systems for Deep Neural Network Architectures: A Survey

Strong convergence in the infinite horizon of numerical methods for stochastic differential equations

Maximizing Social Welfare in Score-Based Social Distance Games

A Theory of Bounded Inductive Rationality

SAR-NeRF: Neural Radiance Fields for Synthetic Aperture Radar Multi-View Representation

Rational Solutions of Parametric First-Order Algebraic Differential Equations

Conformalization of Sparse Generalized Linear Models

SuryaKiran at MEDIQA-Sum 2023: Leveraging LoRA for Clinical Dialogue Summarization

Neural Quantile Optimization for Edge-Cloud Computing

Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery

Membership Inference Attacks on DNNs using Adversarial Perturbations

The Staged Knowledge Distillation in Video Classification: Harmonizing Student Progress by a Complementary Weakly Supervised Framework

Attribute Controlled Dialogue Prompting

Does pre-training on brain-related tasks results in better deep-learning-based brain age biomarkers?

OpenAL: An Efficient Deep Active Learning Framework for Open-Set Pathology Image Classification

Integrated Planning in Hospitals: A Review

On the efficient preconditioning of the Stokes equations in tight geometries

A Mixed Reality System for Interaction with Heterogeneous Robotic Systems

Navigating Uncertainty: The Role of Short-Term Trajectory Prediction in Autonomous Vehicle Safety

Domain-Agnostic Neural Architecture for Class Incremental Continual Learning in Document Processing Platform

Boosting Feedback Efficiency of Interactive Reinforcement Learning by Adaptive Learning from Scores

Differential Analysis of Triggers and Benign Features for Black-Box DNN Backdoor Detection

Self-Supervised Learning with Lie Symmetries for Partial Differential Equations

One-Versus-Others Attention: Scalable Multimodal Integration

AutoDecoding Latent 3D Diffusion Models

Polynomial-Time Linear-Swap Regret Minimization in Imperfect-Information Sequential Games

EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone

My3DGen: Building Lightweight Personalized 3D Generative Model

Keyword: faster

DyCL: Dynamic Neural Network Compilation Via Program Rewriting and Graph Optimization

PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR

Best approximation results and essential boundary conditions for novel types of weak adversarial network discretizations for PDEs

Deep Probabilistic Movement Primitives with a Bayesian Aggregator

Using Linear Regression for Iteratively Training Neural Networks

U-CREAT: Unsupervised Case Retrieval using Events extrAcTion

On the efficient preconditioning of the Stokes equations in tight geometries

Keyword: mobile

A Kalman Filter based Low Complexity Throughput Prediction Algorithm for 5G Cellular Networks

FedYolo: Augmenting Federated Learning with Pretrained Transformers

Kinematically-Decoupled Impedance Control for Fast Object Visual Servoing and Grasping on Quadruped Manipulators

The smarty4covid dataset and knowledge base: a framework enabling interpretable analysis of audio signals

Keyword: pruning

Keyword: diffusion

Collaborative Score Distillation for Consistent Visual Synthesis

Articulated 3D Head Avatar Generation using Text-to-Image Diffusion Models

DDGM: Solving inverse problems by Diffusive Denoising of Gradient-based Minimization

Diffusion idea exploration for art generation

Comparing Social Network Dynamic Operators

Solving Minimal Residual Methods in $W^{-1,p}$ with large Exponents $p$

Turing patterns in a 3D morpho-chemical bulk-surface reaction-diffusion system for battery modeling

Stability analysis of a second-order difference scheme for the time-fractional mixed sub-diffusion and diffusion-wave equation

On the Vulnerability of DeepFake Detectors to Attacks Generated by Denoising Diffusion Models

Metropolis Sampling for Constrained Diffusion Models

AutoDecoding Latent 3D Diffusion Models

Keyword: adaptive

A Kalman Filter based Low Complexity Throughput Prediction Algorithm for 5G Cellular Networks

On Detecting Some Defective Items in Group Testing

AmadeusGPT: a natural language interface for interactive animal behavioral analysis

DFR: Depth from Rotation by Uncalibrated Image Rectification with Latitudinal Motion Assumption

SecFlow: Adaptive Security-Aware Workflow Management System in Multi-Cloud Environments

Unbiased Scene Graph Generation via Two-stage Causal Modeling

Smart Environment for Adaptive Learning of Cybersecurity Skills

Combating Data Imbalances in Federated Semi-supervised Learning with Dual Regulators