New submissions for Thu, 6 Apr 23

Keyword: efficient

A Compositional Resilience Index for Computationally Efficient Safety Analysis of Interconnected Systems

Authors: Luyao Niu, Abdullah Al Maruf, Andrew Clark, J. Sukarno Mertoguno, Radha Poovendran
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.02058
Pdf link: https://arxiv.org/pdf/2304.02058
Abstract Interconnected systems such as power systems and chemical processes are often required to satisfy safety properties in the presence of faults and attacks. Verifying safety of these systems, however, is computationally challenging due to nonlinear dynamics, high dimensionality, and combinatorial number of possible faults and attacks that can be incurred by the subsystems interconnected within the network. In this paper, we develop a compositional resilience index to verify safety properties of interconnected systems under faults and attacks. The resilience index is a tuple serving the following two purposes. First, it quantifies how a safety property is impacted when a subsystem is compromised by faults and attacks. Second, the resilience index characterizes the needed behavior of a subsystem during normal operations to ensure safety violations will not occur when future adverse events occur. We develop a set of sufficient conditions on the dynamics of each subsystem to satisfy its safety constraint, and leverage these conditions to formulate an optimization program to compute the resilience index. When multiple subsystems are interconnected and their resilience indices are given, we show that the safety constraints of the interconnected system can be efficiently verified by solving a system of linear inequalities. We demonstrate our developed resilience index using a numerical case study on chemical reactors connected in series.
GUTS: Generalized Uncertainty-Aware Thompson Sampling for Multi-Agent Active Search
Authors: Nikhil Angad Bakshi, Tejus Gupta, Ramina Ghods, Jeff Schneider
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.02075
Pdf link: https://arxiv.org/pdf/2304.02075
Abstract Robotic solutions for quick disaster response are essential to ensure minimal loss of life, especially when the search area is too dangerous or too vast for human rescuers. We model this problem as an asynchronous multi-agent active-search task where each robot aims to efficiently seek objects of interest (OOIs) in an unknown environment. This formulation addresses the requirement that search missions should focus on quick recovery of OOIs rather than full coverage of the search region. Previous approaches fail to accurately model sensing uncertainty, account for occlusions due to foliage or terrain, or consider the requirement for heterogeneous search teams and robustness to hardware and communication failures. We present the Generalized Uncertainty-aware Thompson Sampling (GUTS) algorithm, which addresses these issues and is suitable for deployment on heterogeneous multi-robot systems for active search in large unstructured environments. We show through simulation experiments that GUTS consistently outperforms existing methods such as parallelized Thompson Sampling and exhaustive search, recovering all OOIs in 80% of all runs. In contrast, existing approaches recover all OOIs in less than 40% of all runs. We conduct field tests using our multi-robot system in an unstructured environment with a search area of approximately 75,000 sq. m. Our system demonstrates robustness to various failure modes, achieving full recovery of OOIs (where feasible) in every field run, and significantly outperforming our baseline.
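For readers unfamiliar with the Thompson Sampling baseline that the abstract above compares against, the following is a minimal single-agent sketch. It is not the GUTS algorithm: the Beta-Bernoulli detection model, the function name `thompson_search`, and the per-cell detection probabilities are all illustrative assumptions.

```python
import random


def thompson_search(true_probs, n_rounds=2000, seed=0):
    """Generic Beta-Bernoulli Thompson Sampling over candidate cells.

    true_probs[i] is a hypothetical probability that sensing cell i yields
    a detection. This is an illustration, not the model used by GUTS.
    """
    rng = random.Random(seed)
    n = len(true_probs)
    alpha = [1.0] * n  # 1 + number of detections per cell (uniform prior)
    beta = [1.0] * n   # 1 + number of misses per cell
    visits = [0] * n
    for _ in range(n_rounds):
        # Draw one plausible detection rate per cell from its posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n)]
        # Sense the cell whose sampled rate is highest.
        cell = max(range(n), key=samples.__getitem__)
        detected = rng.random() < true_probs[cell]
        if detected:
            alpha[cell] += 1.0
        else:
            beta[cell] += 1.0
        visits[cell] += 1
    return visits


visits = thompson_search([0.1, 0.2, 0.8])
```

Sampling from the posterior rather than acting on its mean is what lets the searcher keep exploring cells it is still uncertain about while spending most of its effort on the promising ones.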
MadEye: Boosting Live Video Analytics Accuracy with Adaptive Camera Configurations
Authors: Mike Wong, Murali Ramanujam, Guha Balakrishnan, Ravi Netravali
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computer Vision and Pattern Recognition (cs.CV); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2304.02101
Pdf link: https://arxiv.org/pdf/2304.02101
Abstract Camera orientations (i.e., rotation and zoom) govern the content that a camera captures in a given scene, which in turn heavily influences the accuracy of live video analytics pipelines. However, existing analytics approaches leave this crucial adaptation knob untouched, instead opting to only alter the way that captured images from fixed orientations are encoded, streamed, and analyzed. We present MadEye, a camera-server system that automatically and continually adapts orientations to maximize accuracy for the workload and resource constraints at hand. To realize this using commodity pan-tilt-zoom (PTZ) cameras, MadEye embeds (1) a search algorithm that rapidly explores the massive space of orientations to identify a fruitful subset at each time, and (2) a novel knowledge distillation strategy to efficiently (with only camera resources) select the ones that maximize workload accuracy. Experiments on diverse workloads show that MadEye boosts accuracy by 2.9-25.7% for the same resource usage, or achieves the same accuracy with 2-3.7x lower resource costs.
DIR-AS: Decoupling Individual Identification and Temporal Reasoning for Action Segmentation
Authors: Peiyao Wang, Haibin Ling
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.02110
Pdf link: https://arxiv.org/pdf/2304.02110
Abstract Fully supervised action segmentation works on frame-wise action recognition with dense annotations and often suffers from the over-segmentation issue. Existing works have proposed a variety of solutions such as boundary-aware networks, multi-stage refinement, and temporal smoothness losses. However, most of them take advantage of frame-wise supervision, which cannot effectively tackle the evaluation metrics with different granularities. In this paper, for the desirable large receptive field, we first develop a novel local-global attention mechanism with temporal pyramid dilation and temporal pyramid pooling for efficient multi-scale attention. Then we decouple two inherent goals in action segmentation, i.e., (1) individual identification solved by frame-wise supervision, and (2) temporal reasoning tackled by action set prediction. Afterward, an action alignment module fuses these different granularity predictions, leading to more accurate and smoother action segmentation. We achieve state-of-the-art accuracy, e.g., 82.8% (+2.6%) on GTEA and 74.7% (+1.2%) on Breakfast, which demonstrates the effectiveness of our proposed method, accompanied by extensive ablation studies. The code will be made available later.
Initialization Approach for Nonlinear State-Space Identification via the Subspace Encoder Approach
Authors: Rishi Ramkannan, Gerben I. Beintema, Roland Tóth, Maarten Schoukens
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.02119
Pdf link: https://arxiv.org/pdf/2304.02119
Abstract The SUBNET neural network architecture has been developed to identify nonlinear state-space models from input-output data. To achieve this, it combines the rolled-out nonlinear state-space equations and a state encoder function, both parameterised as a neural network. The encoder function is introduced to reconstruct the current state from past input-output data. Hence it enables the forward simulation of the rolled-out state-space model. While this approach has been shown to provide high-accuracy and consistent model estimation, its convergence can be significantly improved by efficient initialisation of the training process. This paper focuses on such an initialisation of the subspace encoder approach using the Best Linear Approximation (BLA). Using the BLA-provided state-space matrices and the associated reconstructability map, both the state-transition part of the network and the encoder are initialised. The performance of the improved initialisation scheme is evaluated on a Wiener-Hammerstein simulation example and a benchmark dataset. The results show that for a weakly nonlinear system, the proposed initialisation based on the linear reconstructability map results in faster convergence and better model quality.
The Bit Complexity of Efficient Continuous Optimization
Authors: Mehrdad Ghadiri, Richard Peng, Santosh S. Vempala
Subjects: Data Structures and Algorithms (cs.DS); Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2304.02124
Pdf link: https://arxiv.org/pdf/2304.02124
Abstract We analyze the bit complexity of efficient algorithms for fundamental optimization problems, such as linear regression, $p$-norm regression, and linear programming (LP). State-of-the-art algorithms are iterative, and in terms of the number of arithmetic operations, they match the current time complexity of multiplying two $n$-by-$n$ matrices (up to polylogarithmic factors). However, previous work has typically assumed infinite precision arithmetic, and due to complicated inverse maintenance techniques, the actual running times of these algorithms are unknown. To settle the running time and bit complexity of these algorithms, we demonstrate that a core common subroutine, known as \emph{inverse maintenance}, is backward-stable. Additionally, we show that iterative approaches for solving constrained weighted regression problems can be accomplished with bounded-error pre-conditioners. Specifically, we prove that linear programs can be solved approximately in matrix multiplication time multiplied by polylog factors that depend on the condition number $\kappa$ of the matrix and the inner and outer radius of the LP problem. $p$-norm regression can be solved approximately in matrix multiplication time multiplied by polylog factors in $\kappa$. Lastly, linear regression can be solved approximately in input-sparsity time multiplied by polylog factors in $\kappa$. Furthermore, we present results for achieving lower than matrix multiplication time for $p$-norm regression by utilizing faster solvers for sparse linear systems.
Sequential Linearithmic Time Optimal Unimodal Fitting When Minimizing Univariate Linear Losses
Authors: Kaan Gokcesu, Hakan Gokcesu
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2304.02141
Pdf link: https://arxiv.org/pdf/2304.02141
Abstract This paper focuses on optimal unimodal transformation of the score outputs of a univariate learning model under linear loss functions. We demonstrate that the optimal mapping between score values and the target region is a rectangular function. To produce this optimal rectangular fit for the observed samples, we propose a sequential approach that can update its estimation with each incoming new sample. Our approach has logarithmic time complexity per iteration and is optimally efficient.
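One way to read the rectangular-fit claim above is as an interval selection problem. The sketch below is a simple batch illustration under stated assumptions, not the paper's sequential logarithmic-time method: each sample is assumed to carry a hypothetical linear loss coefficient `costs[i]` that is incurred only when its score falls inside the rectangle, so the best rectangle is the minimum-sum contiguous run of samples in score order (a Kadane-style scan). The name `best_rectangle` is illustrative.

```python
def best_rectangle(scores, costs):
    """Return (low, high, total_loss) of the loss-minimising score interval.

    Assumption: a sample contributes costs[i] if its score lies inside the
    interval and 0 otherwise. An empty interval (None, None, 0.0) is allowed.
    """
    order = sorted(range(len(scores)), key=scores.__getitem__)
    best_sum, best_lo, best_hi = 0.0, None, None
    run_sum, run_start = 0.0, 0
    for pos, idx in enumerate(order):
        # A running total that is not negative can never improve a later
        # interval, so restart the candidate interval at this sample.
        if run_sum >= 0.0:
            run_sum, run_start = 0.0, pos
        run_sum += costs[idx]
        if run_sum < best_sum:
            best_sum = run_sum
            best_lo, best_hi = scores[order[run_start]], scores[idx]
    return best_lo, best_hi, best_sum


low, high, loss = best_rectangle([0.5, 0.2, 0.1, 0.3, 0.4], [-1, -2, 1, -1, 3])
```

This batch version costs a sort plus one linear pass; the abstract's contribution is to maintain the fit incrementally as each new sample arrives.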
Dynamic Adversarial Resource Allocation: the dDAB Game
Authors: Daigo Shishika, Yue Guan, Jason R. Marden, Michael Dorothy, Panagiotis Tsiotras, Vijay Kumar
Subjects: Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2304.02172
Pdf link: https://arxiv.org/pdf/2304.02172
Abstract This work proposes a dynamic and adversarial resource allocation problem in a graph environment, which is referred to as the dynamic Defender-Attacker Blotto (dDAB) game. A team of defender robots is tasked to ensure numerical advantage at every node in the graph against a team of attacker robots. The engagement is formulated as a discrete-time dynamic game, where the two teams reallocate their robots in sequence and each robot can move at most one hop at each time step. The game terminates with the attacker's victory if any node has more attacker robots than defender robots. Our goal is to identify the necessary and sufficient number of defender robots to guarantee defense. Through a reachability analysis, we first solve the problem for the case where the attacker team stays as a single group. The results are then generalized to the case where the attacker team can freely split and merge into subteams. Crucially, our analysis indicates that there is no incentive for the attacker team to split, which significantly reduces the search space for the attacker's winning strategies and also enables us to design defender counter-strategies using superposition. We also present an efficient numerical algorithm to identify the necessary and sufficient number of defender robots to defend a given graph. Finally, we present illustrative examples to verify the efficacy of the proposed framework.
Explainable Automated Debugging via Large Language Model-driven Scientific Debugging
Authors: Sungmin Kang, Bei Chen, Shin Yoo, Jian-Guang Lou
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2304.02195
Pdf link: https://arxiv.org/pdf/2304.02195
Abstract Automated debugging techniques have the potential to reduce developer effort in debugging, and have matured enough to be adopted by industry. However, one critical issue with existing techniques is that, while developers want rationales for the provided automatic debugging results, existing techniques are ill-suited to provide them, as their deduction process differs significantly from that of human developers. Inspired by the way developers interact with code when debugging, we propose Automated Scientific Debugging (AutoSD), a technique that given buggy code and a bug-revealing test, prompts large language models to automatically generate hypotheses, uses debuggers to actively interact with buggy code, and thus automatically reach conclusions prior to patch generation. By aligning the reasoning of automated debugging more closely with that of human developers, we aim to produce intelligible explanations of how a specific patch has been generated, with the hope that the explanation will lead to more efficient and accurate developer decisions. Our empirical analysis on three program repair benchmarks shows that AutoSD performs competitively with other program repair baselines, and that it can indicate when it is confident in its results. Furthermore, we perform a human study with 20 participants, including six professional developers, to evaluate the utility of explanations from AutoSD. Participants with access to explanations could judge patch correctness in roughly the same time as those without, but their accuracy improved for five out of six real-world bugs studied: 70% of participants answered that they wanted explanations when using repair tools, while 55% answered that they were satisfied with the Scientific Debugging presentation.
PIKS: A Technique to Identify Actionable Trends for Policy-Makers Through Open Healthcare Data
Authors: A. Ravishankar Rao, Subrata Garai, Soumyabrata Dey, Hang Peng
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2304.02208
Pdf link: https://arxiv.org/pdf/2304.02208
Abstract With calls for increasing transparency, governments are releasing greater amounts of data in multiple domains including finance, education and healthcare. The efficient exploratory analysis of healthcare data constitutes a significant challenge. Key concerns in public health include the quick identification and analysis of trends, and the detection of outliers. This allows policies to be rapidly adapted to changing circumstances. We present an efficient outlier detection technique, termed PIKS (Pruned iterative-k means searchlight), which combines an iterative k-means algorithm with a pruned searchlight based scan. We apply this technique to identify outliers in two publicly available healthcare datasets from the New York Statewide Planning and Research Cooperative System, and California's Office of Statewide Health Planning and Development. We provide a comparison of our technique with three other existing outlier detection techniques, consisting of auto-encoders, isolation forests and feature bagging. We identified outliers in conditions including suicide rates, immunity disorders, social admissions, cardiomyopathies, and pregnancy in the third trimester. We demonstrate that the PIKS technique produces results consistent with other techniques such as the auto-encoder. However, the auto-encoder needs to be trained, which requires several parameters to be tuned. In comparison, the PIKS technique has far fewer parameters to tune. This makes it advantageous for fast, "out-of-the-box" data exploration. The PIKS technique is scalable and can readily ingest new datasets. Hence, it can provide valuable, up-to-date insights to citizens, patients and policy-makers. We have made our code open source, and with the availability of open data, other researchers can easily reproduce and extend our work. This will help promote a deeper understanding of healthcare policies and public health issues.
METransformer: Radiology Report Generation by Transformer with Multiple Learnable Expert Tokens
Authors: Zhanyu Wang, Lingqiao Liu, Lei Wang, Luping Zhou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.02211
Pdf link: https://arxiv.org/pdf/2304.02211
Abstract In clinical scenarios, multi-specialist consultation could significantly benefit the diagnosis, especially for intricate cases. This inspires us to explore a "multi-expert joint diagnosis" mechanism to upgrade the existing "single expert" framework commonly seen in the current literature. To this end, we propose METransformer, a method to realize this idea with a transformer-based backbone. The key design of our method is the introduction of multiple learnable "expert" tokens into both the transformer encoder and decoder. In the encoder, each expert token interacts with both vision tokens and other expert tokens to learn to attend different image regions for image representation. These expert tokens are encouraged to capture complementary information by an orthogonal loss that minimizes their overlap. In the decoder, each attended expert token guides the cross-attention between input words and visual tokens, thus influencing the generated report. A metrics-based expert voting strategy is further developed to generate the final report. By the multi-experts concept, our model enjoys the merits of an ensemble-based approach but through a manner that is computationally more efficient and supports more sophisticated interactions among experts. Experimental results demonstrate the promising performance of our proposed model on two widely used benchmarks. Last but not least, the framework-level innovation makes our work ready to incorporate advances on existing "single-expert" models to further improve its performance.
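The orthogonal loss mentioned above can be pictured as a penalty on the pairwise similarity of expert tokens. The sketch below is one common form of such a penalty, the mean squared cosine similarity over distinct token pairs; the abstract does not give the exact formula, so both this form and the name `orthogonal_loss` are assumptions made for illustration.

```python
import math


def orthogonal_loss(expert_tokens):
    """Mean squared cosine similarity between distinct expert tokens.

    Assumed form of an overlap penalty, not necessarily the paper's loss.
    Requires at least two non-zero token vectors.
    """
    unit = []
    for token in expert_tokens:
        norm = math.sqrt(sum(x * x for x in token))
        unit.append([x / norm for x in token])
    k = len(unit)
    total = 0.0
    for i in range(k):
        for j in range(k):
            if i != j:
                cosine = sum(a * b for a, b in zip(unit[i], unit[j]))
                total += cosine * cosine
    return total / (k * (k - 1))


loss_orthogonal = orthogonal_loss([[1.0, 0.0], [0.0, 1.0]])
loss_parallel = orthogonal_loss([[1.0, 2.0], [2.0, 4.0]])
```

The penalty is zero when the tokens are mutually orthogonal and reaches one when they all point in the same direction, so minimising it pushes the experts toward complementary representations.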
BiFormer: Learning Bilateral Motion Estimation via Bilateral Transformer for 4K Video Frame Interpolation
Authors: Junheum Park, Jintae Kim, Chang-Su Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.02225
Pdf link: https://arxiv.org/pdf/2304.02225
Abstract A novel 4K video frame interpolator based on bilateral transformer (BiFormer) is proposed in this paper, which performs three steps: global motion estimation, local motion refinement, and frame synthesis. First, in global motion estimation, we predict symmetric bilateral motion fields at a coarse scale. To this end, we propose BiFormer, the first transformer-based bilateral motion estimator. Second, we refine the global motion fields efficiently using blockwise bilateral cost volumes (BBCVs). Third, we warp the input frames using the refined motion fields and blend them to synthesize an intermediate frame. Extensive experiments demonstrate that the proposed BiFormer algorithm achieves excellent interpolation performance on 4K datasets. The source codes are available at https://github.com/JunHeum/BiFormer.
Towards Efficient Task-Driven Model Reprogramming with Foundation Models
Authors: Shoukai Xu, Jiangchao Yao, Ran Luo, Shuhai Zhang, Zihao Lian, Mingkui Tan, Yaowei Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.02263
Pdf link: https://arxiv.org/pdf/2304.02263
Abstract Vision foundation models exhibit impressive power, benefiting from the extremely large model capacity and broad training data. However, in practice, downstream scenarios may only support a small model due to the limited computational resources or efficiency considerations. Moreover, the data used for pretraining foundation models are usually invisible and very different from the target data of downstream tasks. This brings a critical challenge for the real-world application of foundation models: one has to transfer the knowledge of a foundation model to the downstream task that has a quite different architecture with only downstream target data. Existing transfer learning or knowledge distillation methods depend on either the same model structure or finetuning of the foundation model. Thus, naively introducing these methods can be either infeasible or very inefficient. To address this, we propose a Task-Driven Model Reprogramming (TDMR) framework. Specifically, we reprogram the foundation model to project the knowledge into a proxy space, which alleviates the adverse effect of task mismatch and domain inconsistency. Then, we reprogram the target model via progressive distillation from the proxy space to efficiently learn the knowledge from the reprogrammed foundation model. TDMR is compatible with different pre-trained model types (CNN, transformer or their mix) and limited target data, and promotes the wide applications of vision foundation models to downstream tasks in a cost-effective manner. Extensive experiments on different downstream classification tasks and target model structures demonstrate the effectiveness of our methods with both CNNs and transformer foundation models.
About optimal loss function for training physics-informed neural networks under respecting causality
Authors: Vasiliy A. Es'kin, Danil V. Davydov, Ekaterina D. Egorova, Alexey O. Malkhanov, Mikhail A. Akhukov, Mikhail E. Smorkalov
Subjects: Numerical Analysis (math.NA); Artificial Intelligence (cs.AI); Computational Physics (physics.comp-ph)
Arxiv link: https://arxiv.org/abs/2304.02282
Pdf link: https://arxiv.org/pdf/2304.02282
Abstract A method is presented that allows one to reduce a problem described by differential equations with initial and boundary conditions to a problem described only by differential equations. The advantage of using the modified problem for physics-informed neural networks (PINNs) methodology is that it becomes possible to represent the loss function in the form of a single term associated with differential equations, thus eliminating the need to tune the scaling coefficients for the terms related to boundary and initial conditions. The weighted loss functions respecting causality are modified, and new weighted loss functions based on generalized functions are derived. Numerical experiments have been carried out for a number of problems, demonstrating the accuracy of the proposed methods.
Deep Quantigraphic Image Enhancement via Comparametric Equations
Authors: Xiaomeng Wu, Yongqing Sun, Akisato Kimura
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.02285
Pdf link: https://arxiv.org/pdf/2304.02285
Abstract Most recent methods of deep image enhancement can be generally classified into two types: decompose-and-enhance and illumination estimation-centric. The former is usually less efficient, and the latter is constrained by a strong assumption regarding image reflectance as the desired enhancement result. To alleviate this constraint while retaining high efficiency, we propose a novel trainable module that diversifies the conversion from the low-light image and illumination map to the enhanced image. It formulates image enhancement as a comparametric equation parameterized by a camera response function and an exposure compensation ratio. By incorporating this module in an illumination estimation-centric DNN, our method improves the flexibility of deep image enhancement, limits the computational burden to illumination estimation, and allows for fully unsupervised learning adaptable to the diverse demands of different tasks.
A step towards the applicability of algorithms based on invariant causal learning on observational data
Authors: Borja Guerrero Santillan
Subjects: Machine Learning (cs.LG); Methodology (stat.ME)
Arxiv link: https://arxiv.org/abs/2304.02286
Pdf link: https://arxiv.org/pdf/2304.02286
Abstract Machine learning can benefit from causal discovery for interpretation and from causal inference for generalization. In this line of research, a few invariant learning algorithms for out-of-distribution (OOD) generalization have been proposed by using multiple training environments to find invariant relationships. Some of them are focused on causal discovery, such as Invariant Causal Prediction (ICP), which finds causal parents of a variable of interest, and some directly provide a causal optimal predictor that generalizes well in OOD environments, such as Invariant Risk Minimization (IRM). This group of algorithms works under the assumption of multiple environments that represent different interventions in the causal inference context. Those environments are not normally available when working with observational data and real-world applications. Here we propose a method to generate them in an efficient way. We assess the performance of this unsupervised learning problem by implementing ICP on simulated data. We also show how to apply ICP efficiently integrated with our method for causal discovery. Finally, we propose an improved version of our method in combination with ICP for datasets with multiple covariates where ICP and other causal discovery methods normally degrade in performance.
Efficient Deduplication and Leakage Detection in Large Scale Image Datasets with a focus on the CrowdAI Mapping Challenge Dataset
Authors: Yeshwanth Kumar Adimoolam, Bodhiswatta Chatterjee, Charalambos Poullis, Melinos Averkiou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.02296
Pdf link: https://arxiv.org/pdf/2304.02296
Abstract Recent advancements in deep learning and computer vision have led to widespread use of deep neural networks to extract building footprints from remote-sensing imagery. The success of such methods relies on the availability of large databases of high-resolution remote sensing images with high-quality annotations. The CrowdAI Mapping Challenge Dataset is one of these datasets that has been used extensively in recent years to train deep neural networks. This dataset consists of $\sim$280k training images and $\sim$60k testing images, with polygonal building annotations for all images. However, issues such as low-quality and incorrect annotations, extensive duplication of image samples, and data leakage significantly reduce the utility of deep neural networks trained on the dataset. Therefore, it is an imperative pre-condition to adopt a data validation pipeline that evaluates the quality of the dataset prior to its use. To this end, we propose a drop-in pipeline that employs perceptual hashing techniques for efficient de-duplication of the dataset and identification of instances of data leakage between training and testing splits. In our experiments, we demonstrate that nearly 250k ($\sim$90%) images in the training split were identical. Moreover, our analysis on the validation split demonstrates that roughly 56k of the 60k images also appear in the training split, resulting in a data leakage of 93%. The source code used for the analysis and de-duplication of the CrowdAI Mapping Challenge dataset is publicly available at https://github.com/yeshwanth95/CrowdAI_Hash_and_search .
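To make the hashing-based leakage check above concrete, here is a minimal sketch. The abstract says only that perceptual hashing is used, so the specific choice of an average hash, exact hash matching, and the names `average_hash` and `leakage_rate` are assumptions; images are taken to be grayscale 2D lists whose sides are multiples of the hash grid size.

```python
def average_hash(image, size=4):
    """Average hash: block-average to a size x size grid, threshold at the mean.

    Illustrative choice of perceptual hash; the pipeline in the abstract
    may use a different one.
    """
    block_h, block_w = len(image) // size, len(image[0]) // size
    cells = []
    for r in range(size):
        for c in range(size):
            block = [image[y][x]
                     for y in range(r * block_h, (r + 1) * block_h)
                     for x in range(c * block_w, (c + 1) * block_w)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        # One bit per cell: 1 if the cell is brighter than the grid mean.
        bits = (bits << 1) | int(value > mean)
    return bits


def leakage_rate(train_images, test_images, size=4):
    """Fraction of test images whose hash also occurs in the training split."""
    train_hashes = {average_hash(img, size) for img in train_images}
    leaked = sum(average_hash(img, size) in train_hashes for img in test_images)
    return leaked / len(test_images)


gradient = [[r * 4 + c for c in range(4)] for r in range(4)]
brighter = [[v + 10 for v in row] for row in gradient]   # same content, brighter
reversed_img = [[15 - v for v in row] for row in gradient]
rate = leakage_rate([gradient], [brighter, reversed_img])
```

Because each image is reduced to a short integer and lookups go through a set, comparing the splits takes time roughly linear in the number of images rather than quadratic, which is what makes this kind of check practical at the scale the abstract describes.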
FASTAGEDS: Fast Approximate Graph Entity Dependency Discovery
Authors: Guangtong Zhou, Selasi Kwashie, Yidi Zhang, Michael Bewong, Vincent M. Nofong, Debo Cheng, Keqing He, Zaiwen Feng
Subjects: Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2304.02323
Pdf link: https://arxiv.org/pdf/2304.02323
Abstract This paper studies the discovery of approximate rules in property graphs. We propose a semantically meaningful measure of error for mining graph entity dependencies (GEDs) that almost hold, to tolerate errors and inconsistencies that exist in real-world graphs. We present a new characterisation of GED satisfaction, and devise a depth-first search strategy to traverse the search space of candidate rules efficiently. Further, we perform experiments to demonstrate the feasibility and scalability of our solution, FASTAGEDS, with three real-world graphs.
Direction splitting of $\varphi$-functions in exponential integrators for $d$-dimensional problems in Kronecker form
Authors: Marco Caliari, Fabio Cassini
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2304.02327
Pdf link: https://arxiv.org/pdf/2304.02327
Abstract In this manuscript, we propose an efficient, practical and easy-to-implement way to approximate actions of $\varphi$-functions for matrices with $d$-dimensional Kronecker sum structure in the context of exponential integrators up to second order. The method is based on a direction splitting of the involved matrix functions, which lets us exploit the highly efficient level 3 BLAS for the actual computation of the required actions in a $\mu$-mode fashion. The approach has been successfully tested on two- and three-dimensional problems with various exponential integrators, resulting in a consistent speedup with respect to a technique designed to compute actions of $\varphi$-functions for Kronecker sums.
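As background for the Kronecker sum structure mentioned above, the two-dimensional case of the matrix exponential (the simplest of the $\varphi$-functions) factorizes exactly. This is a standard identity and not the paper's splitting scheme; the symbols $A_1$, $A_2$, $V$, $n_1$, $n_2$ and the column-major $\mathrm{vec}$ convention are notation chosen here for illustration.

```latex
% Kronecker sum of A_1 (n_1 x n_1) and A_2 (n_2 x n_2)
\begin{align*}
A &= I_{n_2} \otimes A_1 + A_2 \otimes I_{n_1}, \\
% the two summands commute, so the exponential factorizes
\exp(A) &= \exp(A_2) \otimes \exp(A_1), \\
% action on a vector v = vec(V), with V of size n_1 x n_2
\exp(A)\,\mathrm{vec}(V) &= \mathrm{vec}\!\bigl(\exp(A_1)\, V\, \exp(A_2)^{\mathsf{T}}\bigr).
\end{align*}
```

The right-hand side needs only small dense matrix-matrix products, which is the kind of operation level 3 BLAS handles efficiently. For the higher-order $\varphi$-functions no such exact factorization holds in general, which is consistent with the abstract describing an approximation.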
SMPConv: Self-moving Point Representations for Continuous Convolution
Authors: Sanghyeon Kim, Eunbyung Park
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.02330
Pdf link: https://arxiv.org/pdf/2304.02330
Abstract Continuous convolution has recently gained prominence due to its ability to handle irregularly sampled data and model long-term dependency. Also, the promising experimental results of using large convolutional kernels have catalyzed the development of continuous convolution since they can construct large kernels very efficiently. Leveraging neural networks, more specifically multilayer perceptrons (MLPs), is by far the most prevalent approach to implementing continuous convolution. However, there are a few drawbacks, such as high computational costs, complex hyperparameter tuning, and limited descriptive power of filters. This paper suggests an alternative approach to building a continuous convolution without neural networks, resulting in better computational efficiency and improved performance. We present self-moving point representations where weight parameters freely move, and interpolation schemes are used to implement continuous functions. When applied to construct convolutional kernels, the experimental results have shown improved performance with drop-in replacement in the existing frameworks. Due to its lightweight structure, we are the first to demonstrate the effectiveness of continuous convolution in a large-scale setting, e.g., ImageNet, presenting improvements over the prior art. Our code is available at https://github.com/sangnekim/SMPConv
Efficient Optimization-based Cable Force Allocation for Geometric Control of Multiple Quadrotors Transporting a Payload
Authors: Khaled Wahba, Wolfgang Hönig
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.02359
Pdf link: https://arxiv.org/pdf/2304.02359
Abstract We consider transporting a heavy payload that is attached to multiple quadrotors. The current state-of-the-art controllers either do not avoid inter-robot collision at all, leading to crashes when tasked with carrying payloads that are small in size compared to the cable lengths, or use computationally demanding nonlinear optimization. We propose an extension to an existing efficient geometric payload transport controller to effectively avoid such collisions by designing an optimized cable force allocation method, thus retaining the original stability properties. Our approach introduces a cascade of carefully designed quadratic programs that can be solved efficiently on highly constrained embedded flight controllers. We demonstrate our method on challenging scenarios with up to three small quadrotors with various payloads and cable lengths, with our controller running in real-time directly on the robots.
Robust Performance Analysis for Time-Varying Multi-Agent Systems with Stochastic Packet Loss
Authors: Christian Hespe, Herbert Werner
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.02393
Pdf link: https://arxiv.org/pdf/2304.02393
Abstract Recently, a scalable approach to system analysis and controller synthesis for homogeneous multi-agent systems with Bernoulli distributed packet loss has been proposed. As a key result of that line of work, it was shown how to obtain upper bounds on the $H_2$-norm that are robust with respect to uncertain interconnection topologies. The main contribution of the current paper is to show that the same upper bounds hold not only for uncertain but also time-varying topologies that are superimposed with the stochastic packet loss. Because the results are formulated in terms of linear matrix inequalities that are independent of the number of agents, multi-agent systems of any size can be analysed efficiently. The applicability of the approach is demonstrated on a numerical first-order consensus example, on which the obtained upper bounds are compared to estimates from Monte-Carlo simulations.
Relative Entropy-Based Waveform Optimization for Rician Target Detection with Dual-Function Radar Communication Systems
Authors: Xuyang Wang, Bo Tang, Wenjun Wu, Da Li
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2304.02409
Pdf link: https://arxiv.org/pdf/2304.02409
Abstract In this paper, we consider waveform design for dual-function radar-communication systems based on multiple-input multiple-output arrays. To achieve better Rician target detection performance, we use the relative entropy associated with the formulated detection problem as the design metric. We also impose a multiuser interference energy constraint on the waveforms to ensure the achievable sum-rate of the communications. Two algorithms are presented to tackle the nonlinear non-convex waveform design problem. In the first algorithm, we derive a quadratic function to minorize the objective function. To tackle the quadratically constrained quadratic programming problem at each iteration, a semidefinite relaxation approach followed by a rank-one decomposition procedure and an efficient alternating direction method of multipliers (ADMM) are proposed, respectively. In the second algorithm, we present a novel ADMM algorithm to tackle the optimization problem and employ an efficient minorization-maximization approach in the inner loop of the ADMM algorithm. Numerical results demonstrate the superiority of both algorithms. Moreover, the presented algorithms can be extended to synthesize peak-to-average-power ratio constrained waveforms, which allows the radio frequency amplifier to operate at an increased efficiency.
Payload Grasping and Transportation by a Quadrotor with a Hook-Based Manipulator
Authors: Péter Antal, Tamás Péni, Roland Tóth
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.02444
Pdf link: https://arxiv.org/pdf/2304.02444
Abstract The paper proposes an efficient trajectory planning and control approach for payload grasping and transportation using an aerial manipulator. The proposed manipulator structure consists of a hook attached to a quadrotor using a 1 DoF revolute joint. To perform payload grasping, transportation, and release, first, time-optimal reference trajectories are designed through specific waypoints to ensure the fast and reliable execution of the tasks. Then, a two-stage motion control approach is developed based on a robust geometric controller for precise and reliable reference tracking and a linear-quadratic payload regulator for rapid setpoint stabilization of the payload swing. The proposed control architecture and design are evaluated in a high-fidelity physical simulator with external disturbances and also in real flight experiments.
Doubly Stochastic Matrix Models for Estimation of Distribution Algorithms
Authors: Valentino Santucci, Josu Ceberio
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2304.02458
Pdf link: https://arxiv.org/pdf/2304.02458
Abstract Problems with solutions represented by permutations are very prominent in combinatorial optimization. Thus, in recent decades, a number of evolutionary algorithms have been proposed to solve them, and among them, those based on probability models have received much attention. In that sense, most efforts have focused on introducing algorithms that are suited for solving ordering/ranking nature problems. However, when it comes to proposing probability-based evolutionary algorithms for assignment problems, the works have not gone beyond proposing simple and in most cases univariate models. In this paper, we explore the use of Doubly Stochastic Matrices (DSM) for optimizing matching and assignment nature permutation problems. To that end, we explore some learning and sampling methods to efficiently incorporate DSMs within the picture of evolutionary algorithms. Specifically, we adopt the framework of estimation of distribution algorithms and compare DSMs to some existing proposals for permutation problems. Conducted preliminary experiments on instances of the quadratic assignment problem validate this line of research and show that DSMs may obtain very competitive results, while computational cost issues still need to be further investigated.
Rediscovering Hashed Random Projections for Efficient Quantization of Contextualized Sentence Embeddings
Authors: Ulf A. Hamster, Ji-Ung Lee, Alexander Geyken, Iryna Gurevych
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2304.02481
Pdf link: https://arxiv.org/pdf/2304.02481
Abstract Training and inference on edge devices often require an efficient setup due to computational limitations. While pre-computing data representations and caching them on a server can mitigate extensive edge device computation, this leads to two challenges. First, the amount of storage required on the server scales linearly with the number of instances. Second, a large bandwidth is required to send extensively large amounts of data to an edge device. To reduce the memory footprint of pre-computed data representations, we propose a simple, yet effective approach that uses randomly initialized hyperplane projections. To further reduce their size by up to 98.96%, we quantize the resulting floating-point representations into binary vectors. Despite the greatly reduced size, we show that the embeddings remain effective for training models across various English and German sentence classification tasks that retain 94%-99% of their floating-point.
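The abstract above describes projecting floating-point embeddings onto randomly initialized hyperplanes and then quantizing the result into binary vectors. A minimal sketch of that general idea follows, using only the Python standard library. It is not the authors' implementation: the function names, the Gaussian initialization, the sign-based thresholding, and the toy dimensions (8 floats in, 16 bits out) are all assumptions made for illustration.

```python
import random


def make_hyperplanes(in_dim, out_bits, seed=0):
    """Draw `out_bits` random hyperplane normals with Gaussian entries.

    Assumption: Gaussian initialization; the abstract only says the
    hyperplanes are randomly initialized.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)]
            for _ in range(out_bits)]


def hash_embedding(vec, hyperplanes):
    """Project `vec` onto each normal and keep one bit per projection.

    The bit is 1 when the projection is positive, 0 otherwise.
    """
    return [1 if sum(w * x for w, x in zip(plane, vec)) > 0.0 else 0
            for plane in hyperplanes]


def hamming(a, b):
    """Number of positions where two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


# Toy usage with hypothetical values.
planes = make_hyperplanes(in_dim=8, out_bits=16, seed=42)
v = [0.3, -1.2, 0.5, 0.0, 2.1, -0.7, 0.9, 1.4]
code = hash_embedding(v, planes)
scaled = hash_embedding([2.0 * x for x in v], planes)   # same direction
flipped = hash_embedding([-x for x in v], planes)       # opposite direction
print(code, hamming(code, scaled), hamming(code, flipped))
```

Because only the sign of each projection is kept, rescaling a vector by a positive factor leaves its code unchanged, while negating it flips every bit (barring a projection that is exactly zero). The same hyperplanes must be reused for every embedding so that the codes remain comparable.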
Opening the random forest black box by the analysis of the mutual impact of features
Authors: Lucas F. Voges, Lukas C. Jarren, Stephan Seifert
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.02490
Pdf link: https://arxiv.org/pdf/2304.02490
Abstract Random forest is a popular machine learning approach for the analysis of high-dimensional data because it is flexible and provides variable importance measures for the selection of relevant features. However, the complex relationships between the features are usually not considered for the selection and thus also neglected for the characterization of the analysed samples. Here we propose two novel approaches that focus on the mutual impact of features in random forests. Mutual forest impact (MFI) is a relation parameter that evaluates the mutual association of the features to the outcome and, hence, goes beyond the analysis of correlation coefficients. Mutual impurity reduction (MIR) is an importance measure that combines this relation parameter with the importance of the individual features. MIR and MFI are implemented together with testing procedures that generate p-values for the selection of related and important features. Applications to various simulated data sets and the comparison to other methods for feature selection and relation analysis show that MFI and MIR are very promising to shed light on the complex relationships between features and outcome. In addition, they are not affected by common biases, e.g. that features with many possible splits or high minor allele frequencies are preferred.
Supporting Energy-Based Learning With An Ising Machine Substrate: A Case Study on RBM
Authors: Uday Kumar Reddy Vengalam, Yongchao Liu, Tong Geng, Hui Wu, Michael Huang
Subjects: Emerging Technologies (cs.ET)
Arxiv link: https://arxiv.org/abs/2304.02525
Pdf link: https://arxiv.org/pdf/2304.02525
Abstract Nature apparently does a lot of computation constantly. If we can harness some of that computation at an appropriate level, we can potentially perform certain types of computation (much) faster and more efficiently than we can do with a von Neumann computer. Indeed, many powerful algorithms are inspired by nature and are thus prime candidates for nature-based computation. One particular branch of this effort that has seen some recent rapid advances is Ising machines. Some Ising machines are already showing better performance and energy efficiency for optimization problems. Through design iterations and co-evolution between hardware and algorithm, we expect more benefits from nature-based computing systems. In this paper, we make a case for an augmented Ising machine suitable for both training and inference using an energy-based machine learning algorithm. We show that with a small change, the Ising substrate can accelerate key parts of the algorithm and achieve non-trivial speedup and efficiency gain. With a more substantial change, we can turn the machine into a self-sufficient gradient follower to virtually complete training entirely in hardware. This can bring about 29x speedup and about 1000x reduction in energy compared to a Tensor Processing Unit (TPU) host.
Conformal Off-Policy Evaluation in Markov Decision Processes
Authors: Daniele Foffano, Alessio Russo, Alexandre Proutiere
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.02574
Pdf link: https://arxiv.org/pdf/2304.02574
Abstract Reinforcement Learning aims at identifying and evaluating efficient control policies from data. In many real-world applications, the learner is not allowed to experiment and cannot gather data in an online manner (this is the case when experimenting is expensive, risky or unethical). For such applications, the reward of a given policy (the target policy) must be estimated using historical data gathered under a different policy (the behavior policy). Most methods for this learning task, referred to as Off-Policy Evaluation (OPE), do not come with accuracy and certainty guarantees. We present a novel OPE method based on Conformal Prediction that outputs an interval containing the true reward of the target policy with a prescribed level of certainty. The main challenge in OPE stems from the distribution shift due to the discrepancies between the target and the behavior policies. We propose and empirically evaluate different ways to deal with this shift. Some of these methods yield conformalized intervals with reduced length compared to existing approaches, while maintaining the same certainty level.
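The abstract above builds on Conformal Prediction to output an interval that contains the true value with a prescribed level of certainty. For orientation, here is a minimal sketch of standard split conformal prediction for a scalar target, in plain Python. It is not the method proposed in the paper: it assumes the calibration data and the new point are exchangeable, so it does not handle the shift between behavior and target policies that the abstract identifies as the main challenge. The function name and the toy numbers are hypothetical.

```python
import math


def conformal_interval(calib_preds, calib_targets, new_pred, alpha=0.1):
    """Split conformal interval with target coverage 1 - alpha.

    Scores are absolute residuals on a held-out calibration set. The
    half-width is the k-th smallest score, k = ceil((n + 1) * (1 - alpha)).
    """
    scores = sorted(abs(y - p) for p, y in zip(calib_preds, calib_targets))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: only the trivial
        # interval carries the guarantee.
        return (-math.inf, math.inf)
    q = scores[k - 1]
    return (new_pred - q, new_pred + q)


# Toy calibration set (hypothetical numbers): predictions 1..9 and
# targets that deviate from them by a known error.
preds = [float(i) for i in range(1, 10)]
errors = [0.125, -0.25, 0.375, -0.5, 0.625, -0.75, 0.875, -1.0, 1.125]
targets = [p + e for p, e in zip(preds, errors)]

interval = conformal_interval(preds, targets, new_pred=10.0, alpha=0.25)
print(interval)
```

With nine calibration points and alpha = 0.25, k = ceil(10 * 0.75) = 8, so the half-width is the eighth-smallest residual, 1.0. Requesting alpha = 0.05 on the same data would need k = 10 > 9 and therefore returns the unbounded interval.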
Energy Efficiency of Unsourced Random Access over the Binary-Input Gaussian Channel
Authors: Anton Glebov, Pavel Rybin, Kirill Andreev, Alexey Frolov
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2304.02598
Pdf link: https://arxiv.org/pdf/2304.02598
Abstract We investigate the fundamental limits of the unsourced random access over the binary-input Gaussian channel. By fundamental limits, we mean the minimal energy per bit required to achieve the target per-user probability of error. The original method proposed by Y. Polyanskiy (2017) and based on Gallager's trick does not work well for binary signaling. We utilize Fano's method, which is based on the choice of the so-called "good" region. We apply this method for the cases of Gaussian and binary codebooks and obtain two achievability bounds. The first bound is very close to Polyanskiy's bound but does not lead to any improvement. At the same time, the numerical results show that the bound for the binary case practically coincides with the bound for the Gaussian codebook. Thus, we conclude that binary modulation does not lead to performance degradation, and energy-efficient schemes with binary modulation do exist.
A Checklist to Publish Collections as Data in GLAM Institutions
Authors: Gustavo Candela, Nele Gabriëls, Sally Chambers, Thuy-An Pham, Sarah Ames, Neil Fitzgerald, Katrine Hofmann, Victor Harbo, Abigail Potter, Meghan Ferriter, Eileen Manchester, Alba Irollo, Ellen Van Keer, Mahendra Mahey, Olga Holownia, Milena Dobreva
Subjects: Digital Libraries (cs.DL)
Arxiv link: https://arxiv.org/abs/2304.02603
Pdf link: https://arxiv.org/pdf/2304.02603
Abstract Large-scale digitization in Galleries, Libraries, Archives and Museums (GLAM) created the conditions for providing access to collections as data. It opened new opportunities to explore, use and reuse digital collections. Strong proponents of collections as data are the Innovation Labs which provided numerous examples of publishing datasets under open licenses in order to reuse digital content in novel and creative ways. Within the current transition to the emerging data spaces, clouds for cultural heritage and open science, the need to identify practices which support more GLAM institutions to offer datasets becomes a priority, especially within the smaller and medium-sized institutions. This paper answers the need to support GLAM institutions in facilitating the transition into publishing their digital content and to introduce collections as data services; this will also help their future efficient contribution to data spaces and cultural heritage clouds. It offers a checklist that can be used for both creating and evaluating digital collections suitable for computational use. The main contributions of this paper are i) a methodology for devising a checklist to create and assess digital collections for computational use; ii) a checklist to create and assess digital collections suitable for use with computational methods; iii) the assessment of the checklist against the practice of institutions innovating in the Collections as data field; and iv) the results obtained after the application and recommendations for the use of the checklist in GLAM institutions.
Dynamic Point Fields
Authors: Sergey Prokudin, Qianli Ma, Maxime Raafat, Julien Valentin, Siyu Tang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.02626
Pdf link: https://arxiv.org/pdf/2304.02626
Abstract Recent years have witnessed significant progress in the field of neural surface reconstruction. While the extensive focus was put on volumetric and implicit approaches, a number of works have shown that explicit graphics primitives such as point clouds can significantly reduce computational complexity, without sacrificing the reconstructed surface quality. However, less emphasis has been put on modeling dynamic surfaces with point primitives. In this work, we present a dynamic point field model that combines the representational benefits of explicit point-based graphics with implicit deformation networks to allow efficient modeling of non-rigid 3D surfaces. Using explicit surface primitives also allows us to easily incorporate well-established constraints such as as-isometric-as-possible regularisation. While learning this deformation model is prone to local optima when trained in a fully unsupervised manner, we propose to additionally leverage semantic information such as keypoint dynamics to guide the deformation learning. We demonstrate our model with an example application of creating an expressive animatable human avatar from a collection of 3D scans. Here, previous methods mostly rely on variants of the linear blend skinning paradigm, which fundamentally limits the expressivity of such models when dealing with complex cloth appearances such as long skirts. We show the advantages of our dynamic point field framework in terms of its representational power, learning efficiency, and robustness to out-of-distribution novel poses.
HNeRV: A Hybrid Neural Representation for Videos
Authors: Hao Chen, Matt Gwilliam, Ser-Nam Lim, Abhinav Shrivastava
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.02633
Pdf link: https://arxiv.org/pdf/2304.02633
Abstract Implicit neural representations store videos as neural networks and have performed well for various vision tasks such as video compression and denoising. With frame index or positional index as input, implicit representations (NeRV, E-NeRV, etc.) reconstruct video from fixed and content-agnostic embeddings. Such embedding largely limits the regression capacity and internal generalization for video interpolation. In this paper, we propose a Hybrid Neural Representation for Videos (HNeRV), where a learnable encoder generates content-adaptive embeddings, which act as the decoder input. Besides the input embedding, we introduce HNeRV blocks, which ensure model parameters are evenly distributed across the entire network, such that higher layers (layers near the output) can have more capacity to store high-resolution content and video details. With content-adaptive embeddings and re-designed architecture, HNeRV outperforms implicit methods in video regression tasks for both reconstruction quality ($+4.7$ PSNR) and convergence speed ($16\times$ faster), and shows better internal generalization. As a simple and efficient video representation, HNeRV also shows decoding advantages for speed, flexibility, and deployment, compared to traditional codecs (H.264, H.265) and learning-based compression methods. Finally, we explore the effectiveness of HNeRV on downstream tasks such as video compression and video inpainting. We provide a project page at https://haochen-rye.github.io/HNeRV, and code at https://github.com/haochen-rye/HNeRV
Segment Anything
Authors: Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.02643
Pdf link: https://arxiv.org/pdf/2304.02643
Abstract We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive -- often competitive with or even superior to prior fully supervised results. We are releasing the Segment Anything Model (SAM) and corresponding dataset (SA-1B) of 1B masks and 11M images at https://segment-anything.com to foster research into foundation models for computer vision.
Keyword: faster

Initialization Approach for Nonlinear State-Space Identification via the Subspace Encoder Approach
Authors: Rishi Ramkannan, Gerben I. Beintema, Roland Tóth, Maarten Schoukens
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.02119
Pdf link: https://arxiv.org/pdf/2304.02119
Abstract The SUBNET neural network architecture has been developed to identify nonlinear state-space models from input-output data. To achieve this, it combines the rolled-out nonlinear state-space equations and a state encoder function, both parameterised as a neural network. The encoder function is introduced to reconstruct the current state from past input-output data. Hence it enables the forward simulation of the rolled-out state-space model. While this approach has been shown to provide high-accuracy and consistent model estimation, its convergence can be significantly improved by efficient initialisation of the training process. This paper focuses on such an initialisation of the subspace encoder approach using the Best Linear Approximation (BLA). Using the BLA-provided state-space matrices and their associated reconstructability map, both the state-transition part of the network and the encoder are initialised. The performance of the improved initialisation scheme is evaluated on a Wiener-Hammerstein simulation example and a benchmark dataset. The results show that for a weakly nonlinear system, the proposed initialisation based on the linear reconstructability map results in a faster convergence and a better model quality.
The Bit Complexity of Efficient Continuous Optimization
Authors: Mehrdad Ghadiri, Richard Peng, Santosh S. Vempala
Subjects: Data Structures and Algorithms (cs.DS); Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2304.02124
Pdf link: https://arxiv.org/pdf/2304.02124
Abstract We analyze the bit complexity of efficient algorithms for fundamental optimization problems, such as linear regression, $p$-norm regression, and linear programming (LP). State-of-the-art algorithms are iterative, and in terms of the number of arithmetic operations, they match the current time complexity of multiplying two $n$-by-$n$ matrices (up to polylogarithmic factors). However, previous work has typically assumed infinite precision arithmetic, and due to complicated inverse maintenance techniques, the actual running times of these algorithms are unknown. To settle the running time and bit complexity of these algorithms, we demonstrate that a core common subroutine, known as inverse maintenance, is backward-stable. Additionally, we show that iterative approaches for solving constrained weighted regression problems can be accomplished with bounded-error pre-conditioners. Specifically, we prove that linear programs can be solved approximately in matrix multiplication time multiplied by polylog factors that depend on the condition number $\kappa$ of the matrix and the inner and outer radius of the LP problem. $p$-norm regression can be solved approximately in matrix multiplication time multiplied by polylog factors in $\kappa$. Lastly, linear regression can be solved approximately in input-sparsity time multiplied by polylog factors in $\kappa$. Furthermore, we present results for achieving lower than matrix multiplication time for $p$-norm regression by utilizing faster solvers for sparse linear systems.
Efficient CNNs via Passive Filter Pruning
Authors: Arshdeep Singh, Mark D. Plumbley
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2304.02319
Pdf link: https://arxiv.org/pdf/2304.02319
Abstract Convolutional neural networks (CNNs) have shown state-of-the-art performance in various applications. However, CNNs are resource-hungry due to their requirement of high computational complexity and memory storage. Recent efforts toward achieving computational efficiency in CNNs involve filter pruning methods that eliminate some of the filters in CNNs based on the "importance" of the filters. The majority of existing filter pruning methods are either "active", which use a dataset and generate feature maps to quantify filter importance, or "passive", which compute filter importance using the entry-wise norm of the filters without involving data. Under a high pruning ratio where a large number of filters are to be pruned from the network, the entry-wise norm methods eliminate relatively smaller norm filters without considering the significance of the filters in producing the node output, resulting in degradation in the performance. To address this, we present a passive filter pruning method where the filters are pruned based on their contribution in producing output by considering the operator norm of the filters. The proposed pruning method generalizes better across various CNNs compared to that of the entry-wise norm-based pruning methods. In comparison to the existing active filter pruning methods, the proposed pruning method is at least 4.5 times faster in computing filter importance and is able to achieve similar performance compared to that of the active filter pruning methods. The efficacy of the proposed pruning method is evaluated on audio scene classification and image classification using various CNN architectures such as VGGish, DCASE21_Net, VGG-16 and ResNet-50.
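The abstract above contrasts passive pruning scores based on the entry-wise norm of a filter with scores based on its operator norm. The sketch below illustrates, in plain Python, how the two norms can rank filters differently. It is an illustration only and not the authors' procedure: treating each filter as a small 2-D matrix, computing the operator norm as the largest singular value via power iteration, and the two toy filters are all assumptions.

```python
import math


def entrywise_norm(m):
    """Entry-wise (Frobenius) norm: square root of the sum of squares."""
    return math.sqrt(sum(x * x for row in m for x in row))


def operator_norm(m, iters=200):
    """Largest singular value of `m`, estimated by power iteration on M^T M.

    Assumption: the uniform start vector is not orthogonal to the top
    right-singular vector (true for the toy filters below).
    """
    rows, cols = len(m), len(m[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iters):
        u = [sum(a * b for a, b in zip(row, v)) for row in m]              # u = M v
        w = [sum(m[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # w = M^T u
        norm_w = math.sqrt(sum(x * x for x in w))
        if norm_w == 0.0:
            return 0.0
        v = [x / norm_w for x in w]
    mv = [sum(a * b for a, b in zip(row, v)) for row in m]
    return math.sqrt(sum(x * x for x in mv))


# Two hypothetical filters, each reshaped to a 2x2 matrix.
filter_a = [[3.0, 0.0], [0.0, 0.0]]   # energy concentrated in one direction
filter_b = [[2.0, 0.0], [0.0, 2.0]]   # energy spread over two directions

for name, f in (("a", filter_a), ("b", filter_b)):
    print(name, round(entrywise_norm(f), 3), round(operator_norm(f), 3))
```

In this toy case the entry-wise norm ranks filter b above filter a (about 2.83 versus 3.0 reversed: b scores 2.83, a scores 3.0 only under the operator norm comparison below), so the two criteria disagree: by entry-wise norm b (about 2.83) is smaller than a (3.0) is false only if compared per entry count; concretely, entry-wise norms are 3.0 for a and about 2.83 for b, while operator norms are 3.0 for a and 2.0 for b, so the gap between the filters is much larger under the operator norm.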
Convex Optimization-based Policy Adaptation to Compensate for Distributional Shifts
Authors: Navid Hashemi, Justin Ruths, Jyotirmoy V. Deshmukh
Subjects: Systems and Control (eess.SY); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.02324
Pdf link: https://arxiv.org/pdf/2304.02324
Abstract Many real-world systems involve physical components or operating environments with highly nonlinear and uncertain dynamics. A number of different control algorithms can be used to design optimal controllers for such systems, assuming a reasonably high-fidelity model of the actual system. However, the assumptions made on the stochastic dynamics of the model when designing the optimal controller may no longer be valid when the system is deployed in the real world. The problem addressed by this paper is the following: Suppose we obtain an optimal trajectory by solving a control problem in the training environment; how do we ensure that the real-world system trajectory tracks this optimal trajectory with a minimal amount of error in a deployment environment? In other words, we want to learn how we can adapt an optimal trained policy to distribution shifts in the environment. Distribution shifts are problematic in safety-critical systems, where a trained policy may lead to unsafe outcomes during deployment. We show that this problem can be cast as a nonlinear optimization problem that could be solved using a heuristic method such as particle swarm optimization (PSO). However, if we instead consider a convex relaxation of this problem, we can learn policies that track the optimal trajectory with much better error performance, and faster computation times. We demonstrate the efficacy of our approach on tracking an optimal path using a Dubin's car model, and collision avoidance using both a linear and nonlinear model for adaptive cruise control.
Unfolded Self-Reconstruction LSH: Towards Machine Unlearning in Approximate Nearest Neighbour Search
Authors: Kim Yong Tan, Lyu Yueming, Yew-Soon Ong, Ivor Tsang
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2304.02350
Pdf link: https://arxiv.org/pdf/2304.02350
Abstract Approximate nearest neighbour (ANN) search is an essential component of search engines, recommendation systems, etc. Many recent works focus on learning-based data-distribution-dependent hashing and achieve good retrieval performance. However, due to increasing demand for users' privacy and security, we often need to remove users' data information from Machine Learning (ML) models to satisfy specific privacy and security requirements. This need requires the ANN search algorithm to support fast online data deletion and insertion. Current learning-based hashing methods need to retrain the hash function, which is prohibitive due to the vast time-cost of large-scale data. To address this problem, we propose a novel data-dependent hashing method named unfolded self-reconstruction locality-sensitive hashing (USR-LSH). Our USR-LSH unfolds the optimization update for instance-wise data reconstruction, which is better for preserving data information than data-independent LSH. Moreover, our USR-LSH supports fast online data deletion and insertion without retraining. To the best of our knowledge, we are the first to address the machine unlearning of retrieval problems. Empirically, we demonstrate that USR-LSH outperforms the state-of-the-art data-distribution-independent LSH in ANN tasks in terms of precision and recall. We also show that USR-LSH has significantly faster data deletion and insertion time than learning-based data-dependent hashing.
On the Power of Threshold-Based Algorithms for Detecting Cycles in the CONGEST Model
Authors: Pierre Fraigniaud, Maël Luce, Ioan Todinca
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2304.02360
Pdf link: https://arxiv.org/pdf/2304.02360
Abstract It is known that, for every $k\geq 2$, $C_{2k}$-freeness can be decided by a generic Monte-Carlo algorithm running in $n^{1-1/\Theta(k^2)}$ rounds in the CONGEST model. For $2\leq k\leq 5$, faster Monte-Carlo algorithms do exist, running in $O(n^{1-1/k})$ rounds, based on upper bounding the number of messages to be forwarded, and aborting search sub-routines for which this number exceeds certain thresholds. We investigate the possible extension of these threshold-based algorithms, for the detection of larger cycles. We first show that, for every $k\geq 6$, there exists an infinite family of graphs containing a $2k$-cycle for which any threshold-based algorithm fails to detect that cycle. Hence, in particular, neither $C_{12}$-freeness nor $C_{14}$-freeness can be decided by threshold-based algorithms. Nevertheless, we show that $\{C_{12},C_{14}\}$-freeness can still be decided by a threshold-based algorithm, running in $O(n^{1-1/7})= O(n^{0.857\dots})$ rounds, which is faster than using the generic algorithm, which would run in $O(n^{1-1/22})\simeq O(n^{0.954\dots})$ rounds. Moreover, we exhibit an infinite collection of families of cycles such that threshold-based algorithms can decide $\mathcal{F}$-freeness for every $\mathcal{F}$ in this collection.
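The two round-complexity exponents quoted in the abstract above, and the size of the gap between them, can be checked with a short piece of arithmetic. This only restates the numbers given in the abstract; the final ratio is derived from them and is not a claim taken from the paper.

```latex
\begin{align*}
1 - \tfrac{1}{7}  &= \tfrac{6}{7}   \approx 0.857, &
1 - \tfrac{1}{22} &= \tfrac{21}{22} \approx 0.954, \\[4pt]
\frac{n^{21/22}}{n^{6/7}} &= n^{\frac{147 - 132}{154}} = n^{15/154} \approx n^{0.097}.
\end{align*}
```

So, ignoring constant factors, the threshold-based bound improves on the generic bound by a factor of roughly $n^{0.097}$.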
HyPFuzz: Formal-Assisted Processor Fuzzing
Authors: Chen Chen, Rahul Kande, Nathan Nyugen, Flemming Andersen, Aakash Tyagi, Ahmad-Reza Sadeghi, Jeyavijayan Rajendran
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2304.02485
Pdf link: https://arxiv.org/pdf/2304.02485
Abstract Recent research has shown that hardware fuzzers can effectively detect security vulnerabilities in modern processors. However, existing hardware fuzzers do not fuzz well the hard-to-reach design spaces. Consequently, these fuzzers cannot effectively fuzz security-critical control- and data-flow logic in the processors, hence missing security vulnerabilities. To tackle this challenge, we present HyPFuzz, a hybrid fuzzer that leverages formal verification tools to help fuzz the hard-to-reach part of the processors. To increase the effectiveness of HyPFuzz, we perform optimizations in time and space. First, we develop a scheduling strategy to prevent under- or over-utilization of the capabilities of formal tools and fuzzers. Second, we develop heuristic strategies to select points in the design space for the formal tool to target. We evaluate HyPFuzz on five widely-used open-source processors. HyPFuzz detected all the vulnerabilities detected by the most recent processor fuzzer and found three new vulnerabilities that were missed by previous extensive fuzzing and formal verification. This led to two new common vulnerabilities and exposures (CVE) entries. HyPFuzz also achieves 11.68$\times$ faster coverage than the most recent processor fuzzer.
APIHarvest: Harvesting API Information from Various Online Sources
Authors: Ferdian Thung, Kisub Kim, Ting Zhang, Ivana Clairine Irsan, Ratnadira Widyasari, Zhou Yang, David Lo
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2304.02514
Pdf link: https://arxiv.org/pdf/2304.02514
Abstract Using APIs to develop software applications is the norm. APIs help developers to build applications faster as they do not need to reinvent the wheel. It is therefore important for developers to understand the APIs that they plan to use. Developers should also make themselves aware of relevant information updates about APIs. In order to do so, developers need to find and keep track of relevant information about the APIs that they are concerned with. Yet, the API information is scattered across various online sources, which makes it difficult to track by hand. Moreover, identifying content that is related to an API is not trivial. Motivated by these challenges, in this work, we introduce a tool named APIHarvest that aims to ease the process of finding API information from various online sources. APIHarvest is built on works that link APIs or libraries to various online sources. It supports finding API information on GitHub repositories, Stack Overflow's posts, tweets, YouTube videos, and common vulnerability and exposure (CVE) entries; and is extensible to support other sources.
Supporting Energy-Based Learning With An Ising Machine Substrate: A Case Study on RBM
Authors: Uday Kumar Reddy Vengalam, Yongchao Liu, Tong Geng, Hui Wu, Michael Huang
Subjects: Emerging Technologies (cs.ET)
Arxiv link: https://arxiv.org/abs/2304.02525
Pdf link: https://arxiv.org/pdf/2304.02525
Abstract Nature apparently does a lot of computation constantly. If we can harness some of that computation at an appropriate level, we can potentially perform certain types of computation (much) faster and more efficiently than we can do with a von Neumann computer. Indeed, many powerful algorithms are inspired by nature and are thus prime candidates for nature-based computation. One particular branch of this effort that has seen some recent rapid advances is Ising machines. Some Ising machines are already showing better performance and energy efficiency for optimization problems. Through design iterations and co-evolution between hardware and algorithm, we expect more benefits from nature-based computing systems. In this paper, we make a case for an augmented Ising machine suitable for both training and inference using an energy-based machine learning algorithm. We show that with a small change, the Ising substrate can accelerate key parts of the algorithm and achieve non-trivial speedup and efficiency gain. With a more substantial change, we can turn the machine into a self-sufficient gradient follower to virtually complete training entirely in hardware. This can bring about 29x speedup and about 1000x reduction in energy compared to a Tensor Processing Unit (TPU) host.
HNeRV: A Hybrid Neural Representation for Videos
Authors: Hao Chen, Matt Gwilliam, Ser-Nam Lim, Abhinav Shrivastava
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.02633
Pdf link: https://arxiv.org/pdf/2304.02633
Abstract Implicit neural representations store videos as neural networks and have performed well for various vision tasks such as video compression and denoising. With frame index or positional index as input, implicit representations (NeRV, E-NeRV, etc.) reconstruct video from fixed and content-agnostic embeddings. Such embeddings largely limit the regression capacity and internal generalization for video interpolation. In this paper, we propose a Hybrid Neural Representation for Videos (HNeRV), where a learnable encoder generates content-adaptive embeddings, which act as the decoder input. Besides the input embedding, we introduce HNeRV blocks, which ensure model parameters are evenly distributed across the entire network, such that higher layers (layers near the output) can have more capacity to store high-resolution content and video details. With content-adaptive embeddings and re-designed architecture, HNeRV outperforms implicit methods in video regression tasks for both reconstruction quality ($+4.7$ PSNR) and convergence speed ($16\times$ faster), and shows better internal generalization. As a simple and efficient video representation, HNeRV also shows decoding advantages for speed, flexibility, and deployment, compared to traditional codecs (H.264, H.265) and learning-based compression methods. Finally, we explore the effectiveness of HNeRV on downstream tasks such as video compression and video inpainting. We provide a project page at https://haochen-rye.github.io/HNeRV, and code at https://github.com/haochen-rye/HNeRV
Keyword: mobile

Coarse Grained FLS-based Processor with Prognostic Malfunction Feature for UAM Drones using FPGA
Authors: Hossam O. Ahmed
Subjects: Systems and Control (eess.SY); Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2304.02099
Pdf link: https://arxiv.org/pdf/2304.02099
Abstract Many overall safety factors need to be considered in the next generation of Urban Air Mobility (UAM) systems and addressing these can become the anchor point for such technology to reach consent for worldwide application. On the other hand, fulfilling the safety requirements arising from an exponential increase of prolific UAM systems is extremely complicated and requires careful consideration of a variety of issues. One of the key goals of these Unmanned Air Systems (UAS) is the requirement to support the launch and control of hundreds of thousands of these advanced drones in the air simultaneously. Given the impracticalities of training the corresponding number of expert pilots, achieving this goal can only be realized through safe operation in either fully autonomous or semi-autonomous modes. According to many recent studies, the majority of flight accidents are concentrated in the last three stages of a flight trip, which include the Initial Approach, Final Approach, and Landing Phases of an airplane trip. Therefore, this paper proposes a novel decentralized processing system for enhancing the safety factors during the critical phases of Vertical and/or Short Take-Off and Landing (V/STOL) drones. This has been achieved by adopting several processing and control algorithms such as an Open Fuzzy Logic System (FLS) integrated with a Flight Rules Unit (FRU), FIR filters, and a novel Prognostic Malfunction processing unit. After applying several optimization techniques, this novel coarse-grained Autonomous Landing Guidance Assistance System (ALGAS3) processing architecture has been optimized to achieve a maximum computational processing performance of 70.82 Giga Operations per Second (GOPS). Also, the proposed ALGAS3 system shows an ultra-low dynamic thermal power dissipation (I/O and core) of 145.4 mW, which is ideal for mobile avionic systems using the INTEL 5CGXFC9D6F27C7 FPGA chip.
Proprioception and reaction for walking among entanglements
Authors: Justin K. Yim, Jiming Ren, David Ologan, Selvin Garcia Gonzalez, Aaron M. Johnson
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.02129
Pdf link: https://arxiv.org/pdf/2304.02129
Abstract Entanglements like vines and branches in natural settings or cords and pipes in human spaces prevent mobile robots from accessing many environments. Legged robots should be effective in these settings, and more so than wheeled or tracked platforms, but naive controllers quickly become entangled and stuck. In this paper we present a method for proprioception aimed specifically at the task of sensing entanglements of a robot's legs as well as a reaction strategy to disentangle legs during their swing phase as they advance to their next foothold. We demonstrate our proprioception and reaction strategy enables traversal of entanglements of many stiffnesses and geometries succeeding in 14 out of 16 trials in laboratory tests, as well as a natural outdoor environment.
Minimum algorithm sizes for self-stabilizing gathering and related problems of autonomous mobile robots
Authors: Yuichi Asahiro, Masafumi Yamashita
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2304.02212
Pdf link: https://arxiv.org/pdf/2304.02212
Abstract We investigate a swarm of autonomous mobile robots in the Euclidean plane. A robot has a function called the target function to determine the destination point from the robots' positions. All robots in the swarm conventionally take the same target function, but there is an apparent limitation in problem-solving ability. We allow the robots to take different target functions. The number of different target functions necessary and sufficient to solve a problem $\Pi$ is called the minimum algorithm size (MAS) for $\Pi$. We establish the MASs for solving the gathering and related problems from any initial configuration, i.e., in a self-stabilizing manner. We show, for example, for $1 \leq c \leq n$, there is a problem $\Pi_c$ such that the MAS for $\Pi_c$ is $c$, where $n$ is the size of the swarm. The MAS for the gathering problem is 2, and the MAS for the fault tolerant gathering problem is 3, when $1 \leq f (< n)$ robots may crash, but the problem of gathering all robots (including faulty ones) at a point is not solvable (even if all robots have distinct target functions), as long as a robot may crash.
DEFLOW: Self-supervised 3D Motion Estimation of Debris Flow
Authors: Liyuan Zhu, Yuru Jia, Shengyu Huang, Nicholas Meyer, Andreas Wieser, Konrad Schindler, Jordan Aaron
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.02569
Pdf link: https://arxiv.org/pdf/2304.02569
Abstract Existing work on scene flow estimation focuses on autonomous driving and mobile robotics, while automated solutions are lacking for motion in nature, such as that exhibited by debris flows. We propose DEFLOW, a model for 3D motion estimation of debris flows, together with a newly captured dataset. We adopt a novel multi-level sensor fusion architecture and self-supervision to incorporate the inductive biases of the scene. We further adopt a multi-frame temporal processing module to enable flow speed estimation over time. Our model achieves state-of-the-art optical flow and depth estimation on our dataset, and fully automates the motion estimation for debris flows. The source code and dataset are available at the project page.
Keyword: pruning

Semantic Communications for Image Recovery and Classification via Deep Joint Source and Channel Coding
Authors: Zhonghao Lyu, Guangxu Zhu, Jie Xu, Bo Ai, Shuguang Cui
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2304.02317
Pdf link: https://arxiv.org/pdf/2304.02317
Abstract With the recent advancements in edge artificial intelligence (AI), future sixth-generation (6G) networks need to support new AI tasks such as classification and clustering apart from data recovery. Motivated by the success of deep learning, the semantic-aware and task-oriented communications with deep joint source and channel coding (JSCC) have emerged as new paradigm shifts in 6G from the conventional data-oriented communications with separate source and channel coding (SSCC). However, most existing works focused on the deep JSCC designs for one task of data recovery or AI task execution independently, which cannot be transferred to other unintended tasks. Differently, this paper investigates the JSCC semantic communications to support multi-task services, by performing the image data recovery and classification task execution simultaneously. First, we propose a new end-to-end deep JSCC framework by unifying the coding rate reduction maximization and the mean square error (MSE) minimization in the loss function. Here, the coding rate reduction maximization facilitates the learning of discriminative features for enabling to perform classification tasks directly in the feature space, and the MSE minimization helps the learning of informative features for high-quality image data recovery. Next, to further improve the robustness against variational wireless channels, we propose a new gated deep JSCC design, in which a gated net is incorporated for adaptively pruning the output features to adjust their dimensions based on channel conditions. Finally, we present extensive numerical experiments to validate the performance of our proposed deep JSCC designs as compared to various benchmark schemes.
Efficient CNNs via Passive Filter Pruning
Authors: Arshdeep Singh, Mark D. Plumbley
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2304.02319
Pdf link: https://arxiv.org/pdf/2304.02319
Abstract Convolutional neural networks (CNNs) have shown state-of-the-art performance in various applications. However, CNNs are resource-hungry due to their requirement of high computational complexity and memory storage. Recent efforts toward achieving computational efficiency in CNNs involve filter pruning methods that eliminate some of the filters in CNNs based on the "importance" of the filters. The majority of existing filter pruning methods are either "active", which use a dataset and generate feature maps to quantify filter importance, or "passive", which compute filter importance using the entry-wise norm of the filters without involving data. Under a high pruning ratio, where a large number of filters are to be pruned from the network, the entry-wise norm methods eliminate relatively smaller norm filters without considering the significance of the filters in producing the node output, resulting in degradation in the performance. To address this, we present a passive filter pruning method where the filters are pruned based on their contribution in producing output by considering the operator norm of the filters. The proposed pruning method generalizes better across various CNNs compared to the entry-wise norm-based pruning methods. In comparison to the existing active filter pruning methods, the proposed pruning method is at least 4.5 times faster in computing filter importance and is able to achieve similar performance. The efficacy of the proposed pruning method is evaluated on audio scene classification and image classification using various CNN architectures such as VGGish, DCASE21_Net, VGG-16 and ResNet-50.
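The difference between the two passive importance scores named in this abstract can be illustrated with a minimal sketch. It assumes each filter has already been reshaped into a small matrix (the abstract does not say how the paper forms that matrix), takes the entry-wise norm to be the Frobenius norm, and estimates the operator norm as the largest singular value by power iteration. The function names and the two toy filters are hypothetical and are not taken from the paper.

```python
import math


def frobenius_norm(a):
    """Entry-wise (Frobenius) norm: square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in a for x in row))


def operator_norm(a, iters=200):
    """Largest singular value of `a`, estimated by power iteration on a^T a."""
    rows, cols = len(a), len(a[0])
    # Deterministic, non-uniform start vector; power iteration can stall if
    # the start is exactly orthogonal to the top singular vector.
    v = [1.0 / (j + 1) for j in range(cols)]
    for _ in range(iters):
        u = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(a[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm_w = math.sqrt(sum(x * x for x in w))
        if norm_w == 0.0:
            return 0.0
        v = [x / norm_w for x in w]
    # v is now a unit vector, so ||a v|| approximates the top singular value.
    u = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(x * x for x in u))


def rank_filters(filters, score):
    """Indices of `filters`, most important first under `score`."""
    return sorted(range(len(filters)), key=lambda i: score(filters[i]), reverse=True)


if __name__ == "__main__":
    # Hypothetical toy filters: a rank-one filter and a scaled identity.
    toy_filters = [
        [[1.0, 1.0], [1.0, 1.0]],
        [[1.5, 0.0], [0.0, 1.5]],
    ]
    print("entry-wise ranking:", rank_filters(toy_filters, frobenius_norm))
    print("operator ranking:  ", rank_filters(toy_filters, operator_norm))
```

On these toy filters the two scores disagree: the scaled identity has the larger entry-wise norm, while the rank-one filter has the larger operator norm, so which filter is pruned first depends on the score chosen.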
Keyword: voxel

There is no result

Keyword: lidar

Re-Evaluating LiDAR Scene Flow for Autonomous Driving
Authors: Nathaniel Chodosh, Deva Ramanan, Simon Lucey
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.02150
Pdf link: https://arxiv.org/pdf/2304.02150
Abstract Current methods for self-supervised LiDAR scene flow estimation work poorly on real data. A variety of flaws in common evaluation protocols have caused leading approaches to focus on problems that do not exist in real data. We analyze a suite of recent works and find that despite their focus on deep learning, the main challenges of the LiDAR scene flow problem -- removing the dominant rigid motion and robustly estimating the simple motions that remain -- can be more effectively solved with classical techniques such as ICP motion compensation and enforcing piecewise rigid assumptions. We combine these steps with a test-time optimization method to form a state-of-the-art system that does not require any training data. Because our final approach is dataless, it can be applied on different datasets with diverse LiDAR rigs without retraining. Our proposed approach outperforms all existing methods on Argoverse 2.0, halves the error rate on NuScenes, and even rivals the performance of supervised networks on Waymo and lidarKITTI.
GINA-3D: Learning to Generate Implicit Neural Assets in the Wild
Authors: Bokui Shen, Xinchen Yan, Charles R. Qi, Mahyar Najibi, Boyang Deng, Leonidas Guibas, Yin Zhou, Dragomir Anguelov
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.02163
Pdf link: https://arxiv.org/pdf/2304.02163
Abstract Modeling the 3D world from sensor data for simulation is a scalable way of developing testing and validation environments for robotic learning problems such as autonomous driving. However, manually creating or re-creating real-world-like environments is difficult, expensive, and not scalable. Recent generative model techniques have shown promising progress to address such challenges by learning 3D assets using only plentiful 2D images -- but still suffer limitations as they leverage either human-curated image datasets or renderings from manually-created synthetic 3D environments. In this paper, we introduce GINA-3D, a generative model that uses real-world driving data from camera and LiDAR sensors to create realistic 3D implicit neural assets of diverse vehicles and pedestrians. Compared to the existing image datasets, the real-world driving setting poses new challenges due to occlusions, lighting-variations and long-tail distributions. GINA-3D tackles these challenges by decoupling representation learning and generative modeling into two stages with a learned tri-plane latent structure, inspired by recent advances in generative modeling of images. To evaluate our approach, we construct a large-scale object-centric dataset containing over 520K images of vehicles and pedestrians from the Waymo Open Dataset, and a new set of 80K images of long-tail instances such as construction equipment, garbage trucks, and cable cars. We compare our model with existing approaches and demonstrate that it achieves state-of-the-art performance in quality and diversity for both generated images and geometries.
Can a Laplace PDE Define Air Corridors through Low-Altitude Airspace?
Authors: Aeris El Asslouj, Ella Atkins, Hossein Rastgoftar
Subjects: Systems and Control (eess.SY); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2304.02175
Pdf link: https://arxiv.org/pdf/2304.02175
Abstract This paper develops a high-density air corridor traffic flow model for Uncrewed Aircraft System (UAS) operation in urban low altitude airspace. To maximize throughput with safe separation guarantees, we define an airspace spatiotemporal planning problem. For the spatial planning, we propose a multi-floor UAS coordination structure divided into a finite number of air corridors safely wrapping buildings and obstacles. We use the USGS Lidar data to map buildings and in turn generate air corridors by modeling UAS coordination as ideal fluid flow with the streamlines obtained by solving the Laplace partial differential equation (PDE). Proper boundary conditions for the differential equations are imposed to direct air corridors along each floor's desired motion direction. For temporal planning, we use 4-dimensional path-finding through the corridor network with A* search to maximize airspace usability given each UAS's initial and destination waypoint pair.
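The idea of obtaining obstacle-wrapping streamlines from Laplace's equation can be sketched on a small grid. This is only an illustration under stated assumptions, not the paper's formulation: it solves for a stream function `psi` by Gauss-Seidel relaxation, fixes `psi = 0` on the bottom wall and on a hypothetical rectangular obstacle attached to it, fixes `psi = 1` on the top wall, and imposes a linear profile on the left and right edges. The grid size, obstacle, and boundary conditions are all invented for the example.

```python
def solve_stream_function(width, height, obstacle, sweeps=2000):
    """Relax Laplace's equation for a stream function psi[y][x] on a grid.

    `obstacle` is a set of (x, y) interior cells held at psi = 0. The outer
    edges keep the linear profile psi = y / (height - 1) and are never updated.
    """
    top = height - 1
    # Start from the obstacle-free solution: psi grows linearly from 0 to 1.
    psi = [[y / top for _ in range(width)] for y in range(height)]
    for x, y in obstacle:
        psi[y][x] = 0.0
    for _ in range(sweeps):
        for y in range(1, top):
            for x in range(1, width - 1):
                if (x, y) not in obstacle:
                    # Discrete Laplace equation: each free cell equals the
                    # average of its four neighbours.
                    psi[y][x] = 0.25 * (psi[y - 1][x] + psi[y + 1][x]
                                        + psi[y][x - 1] + psi[y][x + 1])
    return psi


if __name__ == "__main__":
    # Hypothetical obstacle occupying columns 8..12 and rows 1..4.
    building = {(x, y) for x in range(8, 13) for y in range(1, 5)}
    psi = solve_stream_function(21, 11, building)
    # Level sets of psi are streamlines; print one row above the obstacle.
    print(" ".join(f"{value:.2f}" for value in psi[5]))
```

Contours of constant `psi` play the role of streamlines here: above the obstacle, the values drop below the undisturbed linear profile, which means the contours are pushed upward and pass over the obstacle rather than through it.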
Keyword: diffusion

Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing
Authors: Alberto Baldrati, Davide Morelli, Giuseppe Cartella, Marcella Cornia, Marco Bertini, Rita Cucchiara
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2304.02051
Pdf link: https://arxiv.org/pdf/2304.02051
Abstract Fashion illustration is used by designers to communicate their vision and to bring the design idea from conceptualization to realization, showing how clothes interact with the human body. In this context, computer vision can thus be used to improve the fashion design process. Differently from previous works that mainly focused on the virtual try-on of garments, we propose the task of multimodal-conditioned fashion image editing, guiding the generation of human-centric fashion images by following multimodal prompts, such as text, human body poses, and garment sketches. We tackle this problem by proposing a new architecture based on latent diffusion models, an approach that has not been used before in the fashion domain. Given the lack of existing datasets suitable for the task, we also extend two existing fashion datasets, namely Dress Code and VITON-HD, with multimodal annotations collected in a semi-automatic manner. Experimental results on these new datasets demonstrate the effectiveness of our proposal, both in terms of realism and coherence with the given multimodal inputs. Source code and collected multimodal annotations will be publicly released at: https://github.com/aimagelab/multimodal-garment-designer.
A Diffusion-based Method for Multi-turn Compositional Image Generation
Authors: Chao Wang, Xiaoyu Yang, Jinmiao Huang, Kevin Ferreira
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.02192
Pdf link: https://arxiv.org/pdf/2304.02192
Abstract Multi-turn compositional image generation (M-CIG) is a challenging task that aims to iteratively manipulate a reference image given a modification text. While most of the existing methods for M-CIG are based on generative adversarial networks (GANs), recent advances in image generation have demonstrated the superiority of diffusion models over GANs. In this paper, we propose a diffusion-based method for M-CIG named conditional denoising diffusion with image compositional matching (CDD-ICM). We leverage CLIP as the backbone of image and text encoders, and incorporate a gated fusion mechanism, originally proposed for question answering, to compositionally fuse the reference image and the modification text at each turn of M-CIG. We introduce a conditioning scheme to generate the target image based on the fusion results. To prioritize the semantic quality of the generated target image, we learn an auxiliary image compositional match (ICM) objective, along with the conditional denoising diffusion (CDD) objective in a multi-task learning framework. Additionally, we also perform ICM guidance and classifier-free guidance to improve performance. Experimental results show that CDD-ICM achieves state-of-the-art results on two benchmark datasets for M-CIG, i.e., CoDraw and i-CLEVR.
JPEG Compressed Images Can Bypass Protections Against AI Editing
Authors: Pedro Sandoval-Segura, Jonas Geiping, Tom Goldstein
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.02234
Pdf link: https://arxiv.org/pdf/2304.02234
Abstract Recently developed text-to-image diffusion models make it easy to edit or create high-quality images. Their ease of use has raised concerns about the potential for malicious editing or deepfake creation. Imperceptible perturbations have been proposed as a means of protecting images from malicious editing by preventing diffusion models from generating realistic images. However, we find that the aforementioned perturbations are not robust to JPEG compression, which poses a major weakness because of the common usage and availability of JPEG. We discuss the importance of robustness for additive imperceptible perturbations and encourage alternative approaches to protect images against editing.
Few-shot Semantic Image Synthesis with Class Affinity Transfer
Authors: Marlène Careil, Jakob Verbeek, Stéphane Lathuilière
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.02321
Pdf link: https://arxiv.org/pdf/2304.02321
Abstract Semantic image synthesis aims to generate photo realistic images given a semantic segmentation map. Despite much recent progress, training them still requires large datasets of images annotated with per-pixel label maps that are extremely tedious to obtain. To alleviate the high annotation cost, we propose a transfer method that leverages a model trained on a large source dataset to improve the learning ability on small target datasets via estimated pairwise relations between source and target classes. The class affinity matrix is introduced as a first layer to the source model to make it compatible with the target label maps, and the source model is then further finetuned for the target domain. To estimate the class affinities we consider different approaches to leverage prior knowledge: semantic segmentation on the source domain, textual label embeddings, and self-supervised vision features. We apply our approach to GAN-based and diffusion-based architectures for semantic synthesis. Our experiments show that the different ways to estimate class affinity can be effectively combined, and that our approach significantly improves over existing state-of-the-art transfer approaches for generative image models.
Goal-Conditioned Imitation Learning using Score-based Diffusion Policies
Authors: Moritz Reuss, Maximilian Li, Xiaogang Jia, Rudolf Lioutikov
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.02532
Pdf link: https://arxiv.org/pdf/2304.02532
Abstract We propose a new policy representation based on score-based diffusion models (SDMs). We apply our new policy representation in the domain of Goal-Conditioned Imitation Learning (GCIL) to learn general-purpose goal-specified policies from large uncurated datasets without rewards. Our new goal-conditioned policy architecture "BEhavior generation with ScOre-based Diffusion Policies" (BESO) leverages a generative, score-based diffusion model as its policy. BESO decouples the learning of the score model from the inference sampling process and hence allows for fast sampling strategies to generate goal-specified behavior in just 3 denoising steps, compared to 30+ steps of other diffusion-based policies. Furthermore, BESO is highly expressive and can effectively capture multi-modality present in the solution space of the play data. Unlike previous methods such as Latent Plans or C-Bet, BESO does not rely on complex hierarchical policies or additional clustering for effective goal-conditioned behavior learning. Finally, we show how BESO can even be used to learn a goal-independent policy from play-data using classifier-free guidance. To the best of our knowledge this is the first work that a) represents a behavior policy based on such a decoupled SDM, b) learns an SDM-based policy in the domain of GCIL, and c) provides a way to simultaneously learn a goal-dependent and a goal-independent policy from play-data. We evaluate BESO through detailed simulation and show that it consistently outperforms several state-of-the-art goal-conditioned imitation learning methods on challenging benchmarks. We additionally provide extensive ablation studies and experiments to demonstrate the effectiveness of our method for goal-conditioned behavior generation.
Generative Novel View Synthesis with 3D-Aware Diffusion Models
Authors: Eric R. Chan, Koki Nagano, Matthew A. Chan, Alexander W. Bergman, Jeong Joon Park, Axel Levy, Miika Aittala, Shalini De Mello, Tero Karras, Gordon Wetzstein
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2304.02602
Pdf link: https://arxiv.org/pdf/2304.02602
Abstract We present a diffusion-based model for 3D-aware generative novel view synthesis from as few as a single input image. Our model samples from the distribution of possible renderings consistent with the input and, even in the presence of ambiguity, is capable of rendering diverse and plausible novel views. To achieve this, our method makes use of existing 2D diffusion backbones but, crucially, incorporates geometry priors in the form of a 3D feature volume. This latent feature field captures the distribution over possible scene representations and improves our method's ability to generate view-consistent novel renderings. In addition to generating novel views, our method has the ability to autoregressively synthesize 3D-consistent sequences. We demonstrate state-of-the-art results on synthetic renderings and room-scale scenes; we also show compelling results for challenging, real-world objects.
GenPhys: From Physical Processes to Generative Models
Authors: Ziming Liu, Di Luo, Yilun Xu, Tommi Jaakkola, Max Tegmark
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Physics (physics.comp-ph); Data Analysis, Statistics and Probability (physics.data-an); Quantum Physics (quant-ph)
Arxiv link: https://arxiv.org/abs/2304.02637
Pdf link: https://arxiv.org/pdf/2304.02637
Abstract Since diffusion models (DM) and the more recent Poisson flow generative models (PFGM) are inspired by physical processes, it is reasonable to ask: Can physical processes offer additional new generative models? We show that the answer is yes. We introduce a general family, Generative Models from Physical Processes (GenPhys), where we translate partial differential equations (PDEs) describing physical processes to generative models. We show that generative models can be constructed from s-generative PDEs (s for smooth). GenPhys subsumes the two existing generative models (DM and PFGM) and even gives rise to new families of generative models, e.g., "Yukawa Generative Models" inspired by weak interactions. On the other hand, some physical processes by default do not belong to the GenPhys family, e.g., the wave equation and the Schrödinger equation, but could be made into the GenPhys family with some modifications. Our goal with GenPhys is to explore and expand the design space of generative models.
Keyword: dynamic

A Bibliometric Review of Large Language Models Research from 2017 to 2023
Authors: Lizhou Fan, Lingyao Li, Zihui Ma, Sanggyu Lee, Huizi Yu, Libby Hemphill
Subjects: Digital Libraries (cs.DL); Computation and Language (cs.CL); Computers and Society (cs.CY); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2304.02020
Pdf link: https://arxiv.org/pdf/2304.02020
Abstract Large language models (LLMs) are a class of language models that have demonstrated outstanding performance across a range of natural language processing (NLP) tasks and have become a highly sought-after research area, because of their ability to generate human-like language and their potential to revolutionize science and technology. In this study, we conduct bibliometric and discourse analyses of scholarly literature on LLMs. Synthesizing over 5,000 publications, this paper serves as a roadmap for researchers, practitioners, and policymakers to navigate the current landscape of LLMs research. We present the research trends from 2017 to early 2023, identifying patterns in research paradigms and collaborations. We start with analyzing the core algorithm developments and NLP tasks that are fundamental in LLMs research. We then investigate the applications of LLMs in various fields and domains including medicine, engineering, social science, and humanities. Our review also reveals the dynamic, fast-paced evolution of LLMs research. Overall, this paper offers valuable insights into the current state, impact, and potential of LLMs research and its applications.
Online Joint Assortment-Inventory Optimization under MNL Choices
Authors: Yong Liang, Xiaojie Mao, Shiyuan Wang
Subjects: Machine Learning (cs.LG); Methodology (stat.ME)
Arxiv link: https://arxiv.org/abs/2304.02022
Pdf link: https://arxiv.org/pdf/2304.02022
Abstract We study an online joint assortment-inventory optimization problem, in which we assume that the choice behavior of each customer follows the Multinomial Logit (MNL) choice model, and the attraction parameters are unknown a priori. The retailer makes periodic assortment and inventory decisions to dynamically learn from the realized demands about the attraction parameters while maximizing the expected total profit over time. In this paper, we propose a novel algorithm that can effectively balance the exploration and exploitation in the online decision-making of assortment and inventory. Our algorithm builds on a new estimator for the MNL attraction parameters, a novel approach to incentivize exploration by adaptively tuning certain known and unknown parameters, and an optimization oracle to static single-cycle assortment-inventory planning problems with given parameters. We establish a regret upper bound for our algorithm and a lower bound for the online joint assortment-inventory optimization problem, suggesting that our algorithm achieves nearly optimal regret rate, provided that the static optimization oracle is exact. Then we incorporate more practical approximate static optimization oracles into our algorithm, and bound from above the impact of static optimization errors on the regret of our algorithm. At last, we perform numerical studies to demonstrate the effectiveness of our proposed algorithm.
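For readers unfamiliar with the Multinomial Logit (MNL) choice model named in this abstract, the standard choice probabilities can be sketched as follows. The sketch assumes the usual normalization in which the no-purchase option has attraction 1, so a product with attraction parameter v_i in the offered assortment S is chosen with probability v_i / (1 + sum of v_j over S). It ignores inventory and learning entirely, and the function names and numbers are hypothetical rather than taken from the paper.

```python
def mnl_choice_probabilities(attractions):
    """Return (no_purchase_prob, purchase_probs) under the MNL model.

    `attractions` holds the attraction parameter v_i > 0 of each offered
    product; the no-purchase option is normalized to attraction 1.
    """
    if any(v <= 0 for v in attractions):
        raise ValueError("attraction parameters must be positive")
    denom = 1.0 + sum(attractions)
    return 1.0 / denom, [v / denom for v in attractions]


def expected_revenue(attractions, prices):
    """Expected per-customer revenue of an assortment (inventory ignored)."""
    _, probs = mnl_choice_probabilities(attractions)
    return sum(p * r for p, r in zip(probs, prices))


if __name__ == "__main__":
    # Hypothetical two-product assortment.
    no_purchase, probs = mnl_choice_probabilities([1.0, 2.0])
    print(no_purchase, probs)
    print(expected_revenue([1.0, 2.0], [10.0, 4.0]))
```

In the online setting the abstract describes, the attraction parameters are unknown and must be estimated from realized demand, so a function like this would be evaluated on estimates rather than true values.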
Online augmentation of learned grasp sequence policies for more adaptable and data-efficient in-hand manipulation
Authors: Ethan K. Gordon, Rana Soltani Zarrin
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.02052
Pdf link: https://arxiv.org/pdf/2304.02052
Abstract When using a tool, the grasps used for picking it up, reposing, and holding it in a suitable pose for the desired task could be distinct. Therefore, a key challenge for autonomous in-hand tool manipulation is finding a sequence of grasps that facilitates every step of the tool use process while continuously maintaining force closure and stability. Due to the complexity of modeling the contact dynamics, reinforcement learning (RL) techniques can provide a solution in this continuous space subject to highly parameterized physical models. However, these techniques impose a trade-off in adaptability and data efficiency. At test time the tool properties, desired trajectory, and desired application forces could differ substantially from training scenarios. Adapting to this necessitates more data or computationally expensive online policy updates. In this work, we apply the principles of discrete dynamic programming (DP) to augment RL performance with domain knowledge. Specifically, we first design a computationally simple approximation of our environment. We then demonstrate in physical simulation that performing tree searches (i.e., lookaheads) and policy rollouts with this approximation can improve an RL-derived grasp sequence policy with minimal additional online computation. Additionally, we show that pretraining a deep RL network with the DP-derived solution to the discretized problem can speed up policy training.
A Compositional Resilience Index for Computationally Efficient Safety Analysis of Interconnected Systems
Authors: Luyao Niu, Abdullah Al Maruf, Andrew Clark, J. Sukarno Mertoguno, Radha Poovendran
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.02058
Pdf link: https://arxiv.org/pdf/2304.02058
Abstract Interconnected systems such as power systems and chemical processes are often required to satisfy safety properties in the presence of faults and attacks. Verifying safety of these systems, however, is computationally challenging due to nonlinear dynamics, high dimensionality, and combinatorial number of possible faults and attacks that can be incurred by the subsystems interconnected within the network. In this paper, we develop a compositional resilience index to verify safety properties of interconnected systems under faults and attacks. The resilience index is a tuple serving the following two purposes. First, it quantifies how a safety property is impacted when a subsystem is compromised by faults and attacks. Second, the resilience index characterizes the needed behavior of a subsystem during normal operations to ensure safety violations will not occur when future adverse events occur. We develop a set of sufficient conditions on the dynamics of each subsystem to satisfy its safety constraint, and leverage these conditions to formulate an optimization program to compute the resilience index. When multiple subsystems are interconnected and their resilience indices are given, we show that the safety constraints of the interconnected system can be efficiently verified by solving a system of linear inequalities. We demonstrate our developed resilience index using a numerical case study on chemical reactors connected in series.
Coarse Grained FLS-based Processor with Prognostic Malfunction Feature for UAM Drones using FPGA
Authors: Hossam O. Ahmed
Subjects: Systems and Control (eess.SY); Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2304.02099
Pdf link: https://arxiv.org/pdf/2304.02099
Abstract Many overall safety factors need to be considered in the next generation of Urban Air Mobility (UAM) systems and addressing these can become the anchor point for such technology to reach consent for worldwide application. On the other hand, fulfilling the safety requirements arising from an exponential increase in UAM systems is extremely complicated and requires careful consideration of a variety of issues. One of the key goals of these Unmanned Air Systems (UAS) is the requirement to support the launch and control of hundreds of thousands of these advanced drones in the air simultaneously. Given the impracticalities of training the corresponding number of expert pilots, achieving this goal can only be realized through safe operation in either fully autonomous or semi-autonomous modes. According to many recent studies, the majority of flight accidents are concentrated in the last three stages of a flight trip, which include the Initial Approach, Final Approach, and Landing Phases of an airplane trip. Therefore, this paper proposes a novel decentralized processing system for enhancing the safety factors during the critical phases of Vertical and/or Short Take-Off and Landing (V/STOL) drones. This has been achieved by adopting several processing and control algorithms such as an Open Fuzzy Logic System (FLS) integrated with a Flight Rules Unit (FRU), FIR filters, and a novel Prognostic Malfunction processing unit. After applying several optimization techniques, this novel coarse-grained Autonomous Landing Guidance Assistance System (ALGAS3) processing architecture has been optimized to achieve a maximum computational processing performance of 70.82 Giga Operations per Second (GOPS). Also, the proposed ALGAS3 system shows an ultra-low dynamic thermal power dissipation (I/O and core) of 145.4 mW, which is ideal for mobile avionic systems using the INTEL 5CGXFC9D6F27C7 FPGA chip.
ConvFormer: Parameter Reduction in Transformer Models for 3D Human Pose Estimation by Leveraging Dynamic Multi-Headed Convolutional Attention
Authors: Alec Diaz-Arias, Dmitriy Shin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.02147
Pdf link: https://arxiv.org/pdf/2304.02147
Abstract Recently, fully-transformer architectures have replaced the de facto convolutional architecture for the 3D human pose estimation task. In this paper we propose ConvFormer, a novel convolutional transformer that leverages a new dynamic multi-headed convolutional self-attention mechanism for monocular 3D human pose estimation. We designed a spatial and temporal convolutional transformer to comprehensively model human joint relations within individual frames and globally across the motion sequence. Moreover, we introduce a novel notion of temporal joints profile for our temporal ConvFormer that fuses complete temporal information immediately for a local neighborhood of joint features. We have quantitatively and qualitatively validated our method on three common benchmark datasets: Human3.6M, MPI-INF-3DHP, and HumanEva. Extensive experiments have been conducted to identify the optimal hyper-parameter set. These experiments demonstrated that we achieved a significant parameter reduction relative to prior transformer models while attaining State-of-the-Art (SOTA) or near SOTA on all three datasets. Additionally, we achieved SOTA for Protocol III on H36M for both GT and CPN detection inputs. Finally, we obtained SOTA on all three metrics for the MPI-INF-3DHP dataset and for all three subjects on HumanEva under Protocol II.
Dynamic Adversarial Resource Allocation: the dDAB Game
Authors: Daigo Shishika, Yue Guan, Jason R. Marden, Michael Dorothy, Panagiotis Tsiotras, Vijay Kumar
Subjects: Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2304.02172
Pdf link: https://arxiv.org/pdf/2304.02172
Abstract This work proposes a dynamic and adversarial resource allocation problem in a graph environment, which is referred to as the dynamic Defender-Attacker Blotto (dDAB) game. A team of defender robots is tasked to ensure numerical advantage at every node in the graph against a team of attacker robots. The engagement is formulated as a discrete-time dynamic game, where the two teams reallocate their robots in sequence and each robot can move at most one hop at each time step. The game terminates with the attacker's victory if any node has more attacker robots than defender robots. Our goal is to identify the necessary and sufficient number of defender robots to guarantee defense. Through a reachability analysis, we first solve the problem for the case where the attacker team stays as a single group. The results are then generalized to the case where the attacker team can freely split and merge into subteams. Crucially, our analysis indicates that there is no incentive for the attacker team to split, which significantly reduces the search space for the attacker's winning strategies and also enables us to design defender counter-strategies using superposition. We also present an efficient numerical algorithm to identify the necessary and sufficient number of defender robots to defend a given graph. Finally, we present illustrative examples to verify the efficacy of the proposed framework.
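The game rules stated in this abstract (one-hop reallocation per time step, attacker victory when any node has more attackers than defenders) can be illustrated with a small sketch. This is not the authors' code: the function names `apply_move` and `attacker_wins`, the dictionary-based allocation format, and the flow-style move encoding are all assumptions made for illustration, and the reachability analysis itself is not reproduced.

```python
def apply_move(adjacency, allocation, flows):
    """Return the allocation after one time step of reallocation.

    adjacency:  dict mapping node -> set of nodes reachable in one hop
                (hypothetical encoding of the graph environment)
    allocation: dict mapping node -> number of robots currently there
    flows:      dict mapping (u, v) -> number of robots moved from u to v;
                robots not mentioned in any flow stay where they are
    """
    outgoing = {}
    for (u, v), count in flows.items():
        if count < 0:
            raise ValueError("robot counts must be non-negative")
        if v != u and v not in adjacency[u]:
            raise ValueError(f"{u} -> {v} is more than one hop")
        outgoing[u] = outgoing.get(u, 0) + count
    for u, total in outgoing.items():
        if total > allocation.get(u, 0):
            raise ValueError(f"node {u} does not hold {total} robots")

    new_allocation = dict(allocation)
    for (u, v), count in flows.items():
        new_allocation[u] = new_allocation.get(u, 0) - count
        new_allocation[v] = new_allocation.get(v, 0) + count
    return new_allocation


def attacker_wins(defenders, attackers):
    """Terminal condition: some node has more attackers than defenders."""
    return any(count > defenders.get(node, 0)
               for node, count in attackers.items())


# Usage on a 3-node path graph 0 - 1 - 2.
adjacency = {0: {1}, 1: {0, 2}, 2: {1}}
defenders = {0: 1, 1: 1, 2: 1}
attackers = apply_move(adjacency, {0: 1, 1: 0, 2: 0}, {(0, 1): 1})
print(attacker_wins(defenders, attackers))  # a tie at node 1 is not a win
```

The sketch deliberately leaves out which team moves first in each time step, since the abstract only says that the teams reallocate in sequence.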
Redrafting Requirements Modeling Using a Single Multilevel Diagram
Authors: Sabah Al-Fedaghi
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2304.02188
Pdf link: https://arxiv.org/pdf/2304.02188
Abstract The complexity of software-based systems has increased significantly, especially with regard to capturing requirements along with dependencies among requirements. A conceptual model is a way of thinking about and making sense of the real world's complexities. In this paper, we focused on two approaches in this context: (a) multiple models applied to the same system with simultaneous usage of dissimilar notations vs. (b) a single model that utilizes a single framework of notations. In the first approach, inconsistencies arise among models that require a great deal of painstaking discipline and coordination between them. The multiple-model notion is based on the claim that it is not possible to present all application views in a single representation, so diverse models are used, with each model representing a different view. This article advocates a second approach that utilizes a single model with multilevel (static/dynamic and behavioral) specification. To substantiate this approach's feasibility, we embrace the occurrence-only model, which comprises (a) Stoic ontology, (b) thinging machine (TM) language and (c) Lupascian logic. In this paper, we focus on TM modeling as the mechanism of single-model building. We claim that a TM can be a unifying diagrammatic language for virtually all current modeling languages. To demonstrate such a claim, we redraft almost all the diagrammatic representations in The Handbook of Requirements Modeling of the International Requirements Engineering Board. This redrafting includes context, class, activity, use case, data flow and state diagrams. The results seem to indicate that there are no difficulties in representing all views in TM.
Folklore Sampling is Optimal for Exact Hopsets: Confirming the $\sqrt{n}$ Barrier
Authors: Greg Bodwin, Gary Hoppenworth
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2304.02193
Pdf link: https://arxiv.org/pdf/2304.02193
Abstract For a graph $G$, a $D$-diameter-reducing exact hopset is a small set of additional edges $H$ that, when added to $G$, maintains its graph metric but guarantees that all node pairs have a shortest path in $G \cup H$ using at most $D$ edges. A shortcut set is the analogous concept for reachability. These objects have been studied since the early '90s due to applications in parallel, distributed, dynamic, and streaming graph algorithms. For most of their history, the state-of-the-art construction for either object was a simple folklore algorithm, based on randomly sampling nodes to hit long paths in the graph. However, recent breakthroughs of Kogan and Parter [SODA '22] and Bernstein and Wein [SODA '23] have finally improved over the folklore diameter bound of $\widetilde{O}(n^{1/2})$ for shortcut sets and for $(1+\epsilon)$-approximate hopsets. For both objects it is now known that one can use $O(n)$ hop-edges to reduce diameter to $\widetilde{O}(n^{1/3})$. The only setting where folklore sampling remains unimproved is for exact hopsets. Can these improvements be continued? We settle this question negatively by constructing graphs on which any exact hopset of $O(n)$ edges has diameter $\widetilde{\Omega}(n^{1/2})$. This improves on the previous lower bound of $\widetilde{\Omega}(n^{1/3})$ by Kogan and Parter [FOCS '22]. Using similar ideas, we also polynomially improve the current lower bounds for shortcut sets, constructing graphs on which any shortcut set of $O(n)$ edges reduces diameter to $\widetilde{\Omega}(n^{1/4})$. This improves on the previous lower bound of $\Omega(n^{1/6})$ by Huang and Pettie [SIAM J. Disc. Math. '18]. We also extend our constructions to provide lower bounds against $O(p)$-size exact hopsets and shortcut sets for other values of $p$; in particular, we show that folklore sampling is near-optimal for exact hopsets in the entire range of $p \in [1, n^2]$.
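The abstract describes the folklore construction only as randomly sampling nodes to hit long paths. One common formulation, assumed here rather than taken from the paper, samples about $\sqrt{n}$ nodes and connects every pair of sampled nodes with a hop-edge whose weight is their exact distance, which gives at most $n$ hop-edges and leaves the graph metric unchanged. The names `dijkstra` and `folklore_hopset`, the adjacency format, and the fixed seed are all illustrative assumptions.

```python
import heapq
import math
import random


def dijkstra(adj, source):
    """Exact shortest-path distances from source.

    adj maps node -> {neighbor: non-negative weight}. Nodes are assumed to be
    mutually comparable (e.g. ints) so that heap ties can be broken.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def folklore_hopset(adj, num_samples=None, seed=0):
    """Sample roughly sqrt(n) nodes and join each sampled pair by an exact-distance edge."""
    nodes = list(adj)
    k = num_samples or max(1, math.isqrt(len(nodes)))
    sampled = random.Random(seed).sample(nodes, min(k, len(nodes)))
    hopset = {}
    for s in sampled:
        dist = dijkstra(adj, s)
        for t in sampled:
            if t != s and t in dist:
                hopset[(s, t)] = dist[t]  # weight equals dist(s, t), so the metric is preserved
    return hopset


# Usage: an undirected path on 9 nodes with unit weights.
path = {i: {} for i in range(9)}
for i in range(8):
    path[i][i + 1] = 1
    path[i + 1][i] = 1
hop_edges = folklore_hopset(path)
print(len(hop_edges), "hop-edges added")
```

With $k \approx \sqrt{n}$ samples the sketch adds at most $k(k-1) \le n$ directed hop-edges, matching the $O(n)$ edge budget discussed in the abstract; the diameter guarantee holds only with high probability and is not checked by this code.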
Algorithm and Hardness for Dynamic Attention Maintenance in Large Language Models
Authors: Jan van den Brand, Zhao Song, Tianyi Zhou
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)
Arxiv link: https://arxiv.org/abs/2304.02207
Pdf link: https://arxiv.org/pdf/2304.02207
Abstract Large language models (LLMs) have made fundamental changes in human life. The attention scheme is one of the key components of all LLMs, such as BERT, GPT-1, Transformers, GPT-2, 3, 3.5 and 4. Inspired by previous theoretical study of the static version of the attention multiplication problem [Zandieh, Han, Daliri, and Karbasi arXiv 2023, Alman and Song arXiv 2023], in this work we formally define a dynamic version of the attention matrix multiplication problem. There are matrices $Q,K, V \in \mathbb{R}^{n \times d}$, which represent the query, key and value in LLMs. In each iteration we update one entry in $K$ or $V$. In the query stage, we receive $(i,j) \in [n] \times [d]$ as input, and want to answer $(D^{-1} A V)_{i,j}$, where $A:=\exp(QK^\top) \in \mathbb{R}^{n \times n}$ is a square matrix and $D := \mathrm{diag}(A {\bf 1}_n) \in \mathbb{R}^{n \times n}$ is a diagonal matrix. Here ${\bf 1}_n$ denotes a length-$n$ vector whose entries are all ones. We provide two results: an algorithm and a conditional lower bound. $\bullet$ On one hand, inspired by the lazy update idea from [Demetrescu and Italiano FOCS 2000, Sankowski FOCS 2004, Cohen, Lee and Song STOC 2019, Brand SODA 2020], we provide a data structure that uses $O(n^{\omega(1,1,\tau)-\tau})$ amortized update time, and $O(n^{1+\tau})$ worst-case query time. $\bullet$ On the other hand, we show that unless the hinted matrix vector multiplication conjecture [Brand, Nanongkai and Saranurak FOCS 2019] is false, there is no algorithm that can use both $O(n^{\omega(1,1,\tau) - \tau- \Omega(1)})$ amortized update time, and $O(n^{1+\tau-\Omega(1)})$ worst-case query time. In conclusion, our algorithmic result is conditionally optimal unless the hinted matrix vector multiplication conjecture is false.
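To make the problem statement in this abstract concrete, here is a brute-force baseline for the dynamic query $(D^{-1} A V)_{i,j}$. It is a sketch of the problem definition only, not the lazy-update data structure the abstract refers to: the class name `NaiveDynamicAttention` and its method names are hypothetical, updates cost $O(1)$, and each query recomputes one row of $A$ in $O(nd)$ time.

```python
import math


class NaiveDynamicAttention:
    """Brute-force baseline for entry updates to K or V and queries of (D^{-1} A V)[i][j]."""

    def __init__(self, Q, K, V):
        # Q, K, V are n x d lists of lists; copy them so updates stay local.
        self.Q = [row[:] for row in Q]
        self.K = [row[:] for row in K]
        self.V = [row[:] for row in V]

    def update_k(self, i, j, value):
        self.K[i][j] = value

    def update_v(self, i, j, value):
        self.V[i][j] = value

    def query(self, i, j):
        # Row i of Q K^T; A is its entrywise exponential.
        scores = [sum(q * k for q, k in zip(self.Q[i], k_row)) for k_row in self.K]
        # Subtracting the max is for numerical stability only; the common
        # factor exp(-m) cancels between A and D in D^{-1} A.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        row_sum = sum(weights)  # D[i][i], up to the cancelled factor
        return sum(w * v_row[j] for w, v_row in zip(weights, self.V)) / row_sum


# Usage: with Q = 0 every row of A is all ones, so queries return column means of V.
attn = NaiveDynamicAttention(Q=[[0.0, 0.0], [0.0, 0.0]],
                             K=[[1.0, 2.0], [3.0, 4.0]],
                             V=[[1.0, 2.0], [3.0, 4.0]])
print(attn.query(0, 0))
attn.update_v(0, 0, 5.0)
print(attn.query(0, 0))
```

A baseline like this is mainly useful as a correctness reference when testing a faster data structure against random update and query sequences.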
DiGA: Distil to Generalize and then Adapt for Domain Adaptive Semantic Segmentation
Authors: Fengyi Shen, Akhil Gurram, Ziyuan Liu, He Wang, Alois Knoll
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.02222
Pdf link: https://arxiv.org/pdf/2304.02222
Abstract Domain adaptive semantic segmentation methods commonly utilize stage-wise training, consisting of a warm-up and a self-training stage. However, this popular approach still faces several challenges in each stage: for warm-up, the widely adopted adversarial training often results in limited performance gain, due to blind feature alignment; for self-training, finding proper categorical thresholds is very tricky. To alleviate these issues, we first propose to replace the adversarial training in the warm-up stage by a novel symmetric knowledge distillation module that only accesses the source domain data and makes the model domain generalizable. Surprisingly, this domain generalizable warm-up model brings substantial performance improvement, which can be further amplified via our proposed cross-domain mixture data augmentation technique. Then, for the self-training stage, we propose a threshold-free dynamic pseudo-label selection mechanism to ease the aforementioned threshold problem and make the model better adapted to the target domain. Extensive experiments demonstrate that our framework achieves remarkable and consistent improvements compared to the prior art on popular benchmarks. Code and models are available at https://github.com/fy-vision/DiGA
Topological Characterization of Consensus Solvability in Directed Dynamic Networks
Authors: Hugo Rincon Galeana, Ulrich Schmid, Kyrill Winkler, Ami Paz, Stefan Schmid
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2304.02316
Pdf link: https://arxiv.org/pdf/2304.02316
Abstract Consensus is one of the most fundamental problems in distributed computing. This paper studies the consensus problem in a synchronous dynamic directed network, in which communication is controlled by an oblivious message adversary. The question of when consensus is possible in this model has already been studied thoroughly in the literature from a combinatorial perspective, and is known to be challenging. This paper presents a topological perspective on consensus solvability under oblivious message adversaries, which provides interesting new insights. Our main contribution is a topological characterization of consensus solvability, which also leads to explicit decision procedures. Our approach is based on the novel notion of a communication pseudosphere, which can be seen as the message-passing analog of the well-known standard chromatic subdivision for wait-free shared memory systems. We further push the elegance and expressiveness of the "geometric" reasoning enabled by the topological approach by dealing with uninterpreted complexes, which considerably reduce the size of the protocol complex, and by labeling facets with information flow arrows, which give an intuitive meaning to the implicit epistemic status of the faces in a protocol complex.
Convex Optimization-based Policy Adaptation to Compensate for Distributional Shifts
Authors: Navid Hashemi, Justin Ruths, Jyotirmoy V. Deshmukh
Subjects: Systems and Control (eess.SY); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.02324
Pdf link: https://arxiv.org/pdf/2304.02324
Abstract Many real-world systems involve physical components or operating environments with highly nonlinear and uncertain dynamics. A number of different control algorithms can be used to design optimal controllers for such systems, assuming a reasonably high-fidelity model of the actual system. However, the assumptions made on the stochastic dynamics of the model when designing the optimal controller may no longer be valid when the system is deployed in the real world. The problem addressed by this paper is the following: Suppose we obtain an optimal trajectory by solving a control problem in the training environment; how do we ensure that the real-world system trajectory tracks this optimal trajectory with a minimal amount of error in a deployment environment? In other words, we want to learn how we can adapt an optimal trained policy to distribution shifts in the environment. Distribution shifts are problematic in safety-critical systems, where a trained policy may lead to unsafe outcomes during deployment. We show that this problem can be cast as a nonlinear optimization problem that could be solved using a heuristic method such as particle swarm optimization (PSO). However, if we instead consider a convex relaxation of this problem, we can learn policies that track the optimal trajectory with much better error performance, and faster computation times. We demonstrate the efficacy of our approach on tracking an optimal path using a Dubins car model, and collision avoidance using both a linear and nonlinear model for adaptive cruise control.
Constructing and deconstructing bias: modeling privilege and mentorship in agent-based simulations
Authors: Andria L. Smith, Simon Heuschkel, Ksenia Keplinger, Charley M. Wu
Subjects: Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2304.02351
Pdf link: https://arxiv.org/pdf/2304.02351
Abstract Bias exists in how we pick leaders, who we perceive as being influential, and who we interact with, not only in society, but in organizational contexts. Drawing from leadership emergence and social influence theories, we investigate potential interventions that support diverse leaders. Using agent-based simulations, we model a collective search process on a fitness landscape. Agents combine individual and social learning, and are represented as a feature vector blending relevant (e.g., individual learning characteristics) and irrelevant (e.g., race or gender) features. Agents use rational principles of learning to estimate feature weights on the basis of performance predictions, which are used to dynamically define social influence in their network. We show how biases arise based on historic privilege, but can be drastically reduced through the use of an intervention (e.g. mentorship). This work provides important insights into the cognitive mechanisms underlying bias construction and deconstruction, while pointing towards real-world interventions to be tested in future empirical work.
Impact Sensitivity Analysis of Cooperative Adaptive Cruise Control Against Resource-Limited Adversaries
Authors: Mischa Huisman, Carlos Murguia, Erjen Lefeber, Nathan van de Wouw
Subjects: Systems and Control (eess.SY); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2304.02395
Pdf link: https://arxiv.org/pdf/2304.02395
Abstract Cooperative Adaptive Cruise Control (CACC) is a promising technology that allows groups of vehicles to form in automated tightly-coupled platoons. CACC schemes exploit Vehicle-to-Vehicle (V2V) wireless communications to exchange kinematic information among adjacent vehicles. However, the use of communication networks brings security concerns as cyberattacks could access the vehicles' internal networks and computers to disrupt their operation and even cause crashes. In this manuscript, we present a sensitivity analysis of standard CACC schemes against a class of resource-limited attacks. We present a modelling framework that allows us to systematically compute outer ellipsoidal approximations of reachable sets induced by attacks. We use the size of these sets as a security metric to quantify the potential damage of attacks entering the dynamics at different points and study how two key system parameters (sampling and headway constant) change these metrics. We carry out the latter sensitivity analysis for two different controller implementations (as given the available sensors there is an infinite number of realizations of the same controller) and show how different implementations can significantly affect the impact of attacks. We present extensive simulation experiments to illustrate our ideas.
AutoRL Hyperparameter Landscapes
Authors: Aditya Mohan, Carolin Benjamins, Konrad Wienecke, Alexander Dockhorn, Marius Lindauer
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.02396
Pdf link: https://arxiv.org/pdf/2304.02396
Abstract Although Reinforcement Learning (RL) has been shown to be capable of producing impressive results, its use is limited by the impact of its hyperparameters on performance. This often makes it difficult to achieve good results in practice. Automated RL (AutoRL) addresses this difficulty, yet little is known about the dynamics of the hyperparameter landscapes that hyperparameter optimization (HPO) methods traverse in search of optimal configurations. In view of existing AutoRL approaches dynamically adjusting hyperparameter configurations, we propose an approach to build and analyze these hyperparameter landscapes not just for one point in time but at multiple points in time throughout training. Addressing an important open question on the legitimacy of such dynamic AutoRL approaches, we provide thorough empirical evidence that the hyperparameter landscapes strongly vary over time across representative algorithms from RL literature (DQN and SAC) in different kinds of environments (Cartpole and Hopper). This supports the theory that hyperparameters should be dynamically adjusted during training and shows the potential for more insights on AutoRL problems that can be gained through landscape analyses.
Adaptive Data Augmentation for Contrastive Learning
Authors: Yuhan Zhang, He Zhu, Shan Yu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.02451
Pdf link: https://arxiv.org/pdf/2304.02451
Abstract In computer vision, contrastive learning is the most advanced unsupervised learning framework. Yet most previous methods simply apply a fixed composition of data augmentations to improve data efficiency, which ignores the changes in their optimal settings over training. Thus, the pre-determined parameters of augmentation operations cannot always fit well with an evolving network during the whole training period, which degrades the quality of the learned representations. In this work, we propose AdDA, which adds a closed-loop feedback structure to a generic contrastive learning network. AdDA works by allowing the network to adaptively adjust the augmentation compositions according to the real-time feedback. This online adjustment helps maintain the dynamic optimal composition and enables the network to acquire more generalizable representations with minimal computational overhead. AdDA achieves competitive results under the common linear protocol on ImageNet-100 classification (+1.11% on MoCo v2).
FPGA-Patch: Mitigating Remote Side-Channel Attacks on FPGAs using Dynamic Patch Generation
Authors: Mahya Morid Ahmadi, Lilas Alrahis, Ozgur Sinanoglu, Muhammad Shafique
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2304.02510
Pdf link: https://arxiv.org/pdf/2304.02510
Abstract We propose FPGA-Patch, the first-of-its-kind defense that leverages automated program repair concepts to thwart power side-channel attacks on cloud FPGAs. FPGA-Patch generates isofunctional variants of the target hardware by injecting faults and finding transformations that eliminate failure. The obtained variants display different hardware characteristics, ensuring a maximal diversity in power traces once dynamically swapped at run-time. Yet, FPGA-Patch forces the variants to have enough similarity, enabling bitstream compression and minimizing dynamic exchange costs. Considering AES running on AMD/Xilinx FPGA, FPGA-Patch increases the attacker's effort by three orders of magnitude, while preserving the performance of AES and incurring a minimal area overhead of 14.2%.
Sensor-based Planning and Control for Robotic Systems: Introducing Clarity and Perceivability
Authors: Devansh R Agrawal, Dimitra Panagou
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.02578
Pdf link: https://arxiv.org/pdf/2304.02578
Abstract We introduce an information measure, termed clarity, motivated by information entropy, and show that it has intuitive properties relevant to dynamic coverage control and informative path planning. Clarity defines the quality of the information we have about a variable of interest in an environment on a scale of [0, 1], and has useful properties for control and planning such as: (I) clarity lower bounds the expected estimation error of any estimator, and (II) given noisy measurements, clarity monotonically approaches a level $q_\infty < 1$. We establish a connection between coverage controllers and information theory via clarity, suggesting a coverage model that is physically consistent with how information is acquired. Next, we define the notion of perceivability of an environment under a given robotic (or more generally, sensing and control) system, i.e., whether the system has sufficient sensing and actuation capabilities to gather desired information. We show that perceivability relates to the reachability of an augmented system, and derive the corresponding Hamilton-Jacobi-Bellman equations to determine perceivability. In simulations, we demonstrate how clarity is a useful concept for planning trajectories, how perceivability can be determined using reachability analysis, and how a Control Barrier Function (CBF) based controller can dramatically reduce the computational burden.
Dynamic Point Fields
Authors: Sergey Prokudin, Qianli Ma, Maxime Raafat, Julien Valentin, Siyu Tang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.02626
Pdf link: https://arxiv.org/pdf/2304.02626
Abstract Recent years have witnessed significant progress in the field of neural surface reconstruction. While the extensive focus was put on volumetric and implicit approaches, a number of works have shown that explicit graphics primitives such as point clouds can significantly reduce computational complexity, without sacrificing the reconstructed surface quality. However, less emphasis has been put on modeling dynamic surfaces with point primitives. In this work, we present a dynamic point field model that combines the representational benefits of explicit point-based graphics with implicit deformation networks to allow efficient modeling of non-rigid 3D surfaces. Using explicit surface primitives also allows us to easily incorporate well-established constraints such as as-isometric-as-possible regularisation. While learning this deformation model is prone to local optima when trained in a fully unsupervised manner, we propose to additionally leverage semantic information such as keypoint dynamics to guide the deformation learning. We demonstrate our model with an example application of creating an expressive animatable human avatar from a collection of 3D scans. Here, previous methods mostly rely on variants of the linear blend skinning paradigm, which fundamentally limits the expressivity of such models when dealing with complex cloth appearances such as long skirts. We show the advantages of our dynamic point field framework in terms of its representational power, learning efficiency, and robustness to out-of-distribution novel poses.

New submissions for Thu, 6 Apr 23 #25

Keyword: efficient

A Compositional Resilience Index for Computationally Efficient Safety Analysis of Interconnected Systems

GUTS: Generalized Uncertainty-Aware Thompson Sampling for Multi-Agent Active Search

MadEye: Boosting Live Video Analytics Accuracy with Adaptive Camera Configurations

DIR-AS: Decoupling Individual Identification and Temporal Reasoning for Action Segmentation

Initialization Approach for Nonlinear State-Space Identification via the Subspace Encoder Approach

The Bit Complexity of Efficient Continuous Optimization

Sequential Linearithmic Time Optimal Unimodal Fitting When Minimizing Univariate Linear Losses

Dynamic Adversarial Resource Allocation: the dDAB Game

Explainable Automated Debugging via Large Language Model-driven Scientific Debugging

PIKS: A Technique to Identify Actionable Trends for Policy-Makers Through Open Healthcare Data

METransformer: Radiology Report Generation by Transformer with Multiple Learnable Expert Tokens

BiFormer: Learning Bilateral Motion Estimation via Bilateral Transformer for 4K Video Frame Interpolation

Towards Efficient Task-Driven Model Reprogramming with Foundation Models

About optimal loss function for training physics-informed neural networks under respecting causality

Deep Quantigraphic Image Enhancement via Comparametric Equations

A step towards the applicability of algorithms based on invariant causal learning on observational data

Efficient Deduplication and Leakage Detection in Large Scale Image Datasets with a focus on the CrowdAI Mapping Challenge Dataset

FASTAGEDS: Fast Approximate Graph Entity Dependency Discovery

Direction splitting of $\varphi$-functions in exponential integrators for $d$-dimensional problems in Kronecker form

SMPConv: Self-moving Point Representations for Continuous Convolution

Efficient Optimization-based Cable Force Allocation for Geometric Control of Multiple Quadrotors Transporting a Payload

Robust Performance Analysis for Time-Varying Multi-Agent Systems with Stochastic Packet Loss

Relative Entropy-Based Waveform Optimization for Rician Target Detection with Dual-Function Radar Communication Systems

Payload Grasping and Transportation by a Quadrotor with a Hook-Based Manipulator

Doubly Stochastic Matrix Models for Estimation of Distribution Algorithms

Rediscovering Hashed Random Projections for Efficient Quantization of Contextualized Sentence Embeddings

Opening the random forest black box by the analysis of the mutual impact of features

Supporting Energy-Based Learning With An Ising Machine Substrate: A Case Study on RBM

Conformal Off-Policy Evaluation in Markov Decision Processes

Energy Efficiency of Unsourced Random Access over the Binary-Input Gaussian Channel

A Checklist to Publish Collections as Data in GLAM Institutions

Dynamic Point Fields

HNeRV: A Hybrid Neural Representation for Videos

Segment Anything

Keyword: faster

Initialization Approach for Nonlinear State-Space Identification via the Subspace Encoder Approach

The Bit Complexity of Efficient Continuous Optimization

Efficient CNNs via Passive Filter Pruning

Convex Optimization-based Policy Adaptation to Compensate for Distributional Shifts

Unfolded Self-Reconstruction LSH: Towards Machine Unlearning in Approximate Nearest Neighbour Search

On the Power of Threshold-Based Algorithms for Detecting Cycles in the CONGEST Model

HyPFuzz: Formal-Assisted Processor Fuzzing

APIHarvest: Harvesting API Information from Various Online Sources

Supporting Energy-Based Learning With An Ising Machine Substrate: A Case Study on RBM

HNeRV: A Hybrid Neural Representation for Videos

Keyword: mobile

Coarse Grained FLS-based Processor with Prognostic Malfunction Feature for UAM Drones using FPGA

Proprioception and reaction for walking among entanglements

Minimum algorithm sizes for self-stabilizing gathering and related problems of autonomous mobile robots

DEFLOW: Self-supervised 3D Motion Estimation of Debris Flow

Keyword: pruning

Semantic Communications for Image Recovery and Classification via Deep Joint Source and Channel Coding

Efficient CNNs via Passive Filter Pruning

Keyword: voxel

Keyword: lidar

Re-Evaluating LiDAR Scene Flow for Autonomous Driving

GINA-3D: Learning to Generate Implicit Neural Assets in the Wild

Can a Laplace PDE Define Air Corridors through Low-Altitude Airspace?

Keyword: diffusion

Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing

A Diffusion-based Method for Multi-turn Compositional Image Generation

JPEG Compressed Images Can Bypass Protections Against AI Editing

Few-shot Semantic Image Synthesis with Class Affinity Transfer

Goal-Conditioned Imitation Learning using Score-based Diffusion Policies

Generative Novel View Synthesis with 3D-Aware Diffusion Models

GenPhys: From Physical Processes to Generative Models

Keyword: dynamic

A Bibliometric Review of Large Language Models Research from 2017 to 2023

Online Joint Assortment-Inventory Optimization under MNL Choices

Online augmentation of learned grasp sequence policies for more adaptable and data-efficient in-hand manipulation

A Compositional Resilience Index for Computationally Efficient Safety Analysis of Interconnected Systems

Coarse Grained FLS-based Processor with Prognostic Malfunction Feature for UAM Drones using FPGA

ConvFormer: Parameter Reduction in Transformer Models for 3D Human Pose Estimation by Leveraging Dynamic Multi-Headed Convolutional Attention

Dynamic Adversarial Resource Allocation: the dDAB Game

Redrafting Requirements Modeling Using a Single Multilevel Diagram

Folklore Sampling is Optimal for Exact Hopsets: Confirming the $\sqrt{n}$ Barrier

Algorithm and Hardness for Dynamic Attention Maintenance in Large Language Models