Abstract
In recent years, newly developed methods to train spiking neural networks (SNNs) have rendered them a plausible alternative to Artificial Neural Networks (ANNs) in terms of accuracy, while being much more energy efficient at inference and potentially at training time. However, it is still unclear what constitutes a good initialisation for an SNN. SNNs are commonly initialised with schemes developed for ANN training, which are often inadequate and require manual tuning. In this paper, we attempt to tackle this issue using techniques from the ANN initialisation literature as well as computational neuroscience results. We show that the problem of weight initialisation is more nuanced for SNNs than for ANNs due to the spike-and-reset non-linearity of SNNs and the firing rate collapse problem. We first identify the firing rate collapse problem and propose several solutions under different sets of assumptions, which successfully resolve the issue by leveraging classical random walk and Wiener process results. Second, we devise a general strategy for SNN initialisation which combines variance propagation techniques from ANNs with different methods to obtain the expected firing rate and membrane potential distribution based on diffusion and shot-noise approximations. Altogether, we obtain theoretical results for SNN initialisation that account for the membrane potential distribution in the presence of a threshold. To what extent these methods can be successfully applied to SNNs on real datasets remains an open question.
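The variance-propagation component mentioned above builds on a standard ANN-style calculation. The sketch below is generic background (fan-in-scaled Gaussian initialisation, keeping pre-activation variance close to the input variance), not the paper's SNN-specific derivation:

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in = 1000

# Input with unit variance, as assumed by classical ANN initialisation analyses.
x = rng.standard_normal(fan_in)

# He/Glorot-style scaling: Var(w) = 1/fan_in keeps the pre-activation
# variance close to the input variance after the linear layer.
W = rng.standard_normal((fan_in, fan_in)) / np.sqrt(fan_in)
pre = W @ x

print(np.var(pre))  # close to 1
```

For SNNs the same calculation feeds the membrane potential rather than a smooth activation, which is where the threshold and reset complicate the analysis.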
Training Neural Networks without Backpropagation: A Deeper Dive into the Likelihood Ratio Method
Abstract
Backpropagation (BP) is the most important gradient estimation method for training neural networks in deep learning. However, the literature shows that neural networks trained by BP are vulnerable to adversarial attacks. We develop the likelihood ratio (LR) method, a new gradient estimation method, for training a broad range of neural network architectures, including convolutional neural networks, recurrent neural networks, graph neural networks, and spiking neural networks, without recursive gradient computation. We propose three methods to efficiently reduce the variance of the gradient estimation in the neural network training process. Our experiments yield numerical results for training different neural networks on several datasets. All results demonstrate that the LR method is effective for training various neural networks and significantly improves the robustness of the neural networks under adversarial attacks relative to the BP method.
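As background, the likelihood-ratio (score-function) idea estimates a gradient from forward evaluations alone, with no recursive gradient computation. The toy sketch below is an illustration on a scalar function (not the paper's network-level estimator); it also shows one simple variance-reduction trick, antithetic sampling:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma, n = 1.0, 0.5, 200_000

def f(x):
    # Toy "loss"; for f(x) = x**2 the smoothed objective E[f(theta + sigma*eps)]
    # has gradient 2*theta with respect to theta.
    return x ** 2

eps = rng.standard_normal(n)

# Plain likelihood-ratio (score-function) estimator: only forward passes of f.
lr = f(theta + sigma * eps) * eps / sigma

# Antithetic sampling: pair each eps with -eps to cancel even-order noise terms.
lr_anti = 0.5 * (f(theta + sigma * eps) - f(theta - sigma * eps)) * eps / sigma

print(lr.mean(), lr_anti.mean())   # both near the true gradient 2.0
print(lr.var() > lr_anti.var())    # antithetic pairs reduce variance here
```

The paper's contribution concerns scaling this style of estimator to full network architectures with effective variance reduction; the above only conveys the core mechanism.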
Numerical implementation of generalized V-line transforms on 2D vector fields and their inversions
Authors: Gaik Ambartsoumian, Mohammad Javad Latifi Jebelli, Rohit Kumar Mishra
Abstract
The paper discusses numerical implementations of various inversion schemes for generalized V-line transforms on vector fields introduced in [6]. It demonstrates the possibility of efficient recovery of an unknown vector field from five different types of data sets, with and without noise. We examine the performance of the proposed algorithms in a variety of setups, and illustrate our results with numerical simulations on different phantoms.
Abstract
In this work, we develop a new algorithm to solve large-scale incompressible time-dependent fluid--structure interaction (FSI) problems using a matrix-free finite element method in an arbitrary Lagrangian--Eulerian (ALE) frame of reference. We derive a semi-implicit time integration scheme which improves the geometry-convective explicit (GCE) scheme for problems involving the interaction between incompressible hyperelastic solids and incompressible fluids. The proposed algorithm relies on the reformulation of the time-discrete problem as a generalized Stokes problem with strongly variable coefficients, for which optimal preconditioners have recently been developed. The resulting algorithm is scalable, optimal, and robust: we test our implementation on model problems that mimic classical Turek benchmarks in two and three dimensions, and investigate timing and scalability results.
Convex Geometric Trajectory Tracking using Lie Algebraic MPC for Autonomous Marine Vehicles
Abstract
Controlling marine vehicles in challenging environments is a complex task due to the presence of nonlinear hydrodynamics and uncertain external disturbances. Despite nonlinear model predictive control (MPC) showing potential in addressing these issues, its practical implementation is often constrained by computational limitations. In this paper, we propose an efficient controller for trajectory tracking of marine vehicles by employing a convex error-state MPC on the Lie group. By leveraging the inherent geometric properties of the Lie group, we can construct globally valid error dynamics and formulate a quadratic programming-based optimization problem. Our proposed MPC demonstrates effectiveness in trajectory tracking through extensive numerical simulations, including scenarios involving ocean currents. Notably, our method substantially reduces computation time compared to nonlinear MPC, making it well-suited for real-time control applications with long prediction horizons or involving small marine vehicles.
Gaussian Process Port-Hamiltonian Systems: Bayesian Learning with Physics Prior
Authors: Thomas Beckers, Jacob Seidman, Paris Perdikaris, George J. Pappas
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Abstract
Data-driven approaches achieve remarkable results for the modeling of complex dynamics based on collected data. However, these models often neglect basic physical principles which determine the behavior of any real-world system. This omission is unfavorable in two ways: The models are not as data-efficient as they could be by incorporating physical prior knowledge, and the model itself might not be physically correct. We propose Gaussian Process Port-Hamiltonian systems (GP-PHS) as a physics-informed Bayesian learning approach with uncertainty quantification. The Bayesian nature of GP-PHS uses collected data to form a distribution over all possible Hamiltonians instead of a single point estimate. Due to the underlying physics model, a GP-PHS generates passive systems with respect to designated inputs and outputs. Further, the proposed approach preserves the compositional nature of Port-Hamiltonian systems.
Phase Field Modeling and Numerical Algorithm for Two-Phase Dielectric Fluid Flows
Authors: Jielin Yang, Ivan C. Christov, Suchuan Dong
Abstract
We develop a method for modeling and simulating a class of two-phase flows consisting of two immiscible incompressible dielectric fluids and their interactions with imposed external electric fields in two and three dimensions. We first present a thermodynamically-consistent and reduction-consistent phase field model for two-phase dielectric fluids. The model honors the conservation laws and thermodynamic principles, and has the property that, if only one fluid component is present in the system, the two-phase formulation will exactly reduce to that of the corresponding single-phase system. In particular, this model accommodates an equilibrium solution that is compatible with the zero-velocity requirement based on physics. This property provides a simpler method for simulating the equilibrium state of two-phase dielectric systems. We further present an efficient numerical algorithm, together with a spectral-element (for two dimensions) or a hybrid Fourier-spectral/spectral-element (for three dimensions) discretization in space, for simulating this class of problems. This algorithm computes different dynamic variables successively in an uncoupled fashion, and involves only coefficient matrices that are time-independent in the resultant linear algebraic systems upon discretization, even when the physical properties (e.g., permittivity, density, viscosity) of the two dielectric fluids are different. This property is crucial and enables us to employ fast Fourier transforms for three-dimensional problems. Ample numerical simulations of two-phase dielectric flows under imposed voltage are presented to demonstrate the performance of the method herein and to compare the simulation results with theoretical models and experimental data.
Scalable Adaptive Traffic Light Control Over a Traffic Network Including Transit Delays
Abstract
We study the Traffic Light Control (TLC) problem for a traffic network with multiple intersections in an artery, including the effect of transit delays for vehicles moving from one intersection to the next. The goal is to minimize the overall mean waiting time and improve the ``green wave'' properties in such systems. Based on a stochastic hybrid system model with parametric traffic light controllers, we use Infinitesimal Perturbation Analysis (IPA) to derive a data-driven cost gradient estimator with respect to these parameters. We then iteratively adjust them through an online gradient-based algorithm. We show that the event-driven nature of the IPA estimators driving the controllers leads to scalable, computationally efficient controllers as the dimensionality of the traffic network increases.
Adaptive Federated Pruning in Hierarchical Wireless Networks
Abstract
Federated Learning (FL) is a promising privacy-preserving distributed learning framework where a server aggregates models updated by multiple devices without accessing their private datasets. Hierarchical FL (HFL), with a device-edge-cloud aggregation hierarchy, can enjoy both the cloud server's access to more datasets and the edge servers' efficient communications with devices. However, the learning latency increases with the HFL network scale due to the increasing number of edge servers and devices with limited local computation capability and communication bandwidth. To address this issue, in this paper, we introduce model pruning for HFL in wireless networks to reduce the neural network scale. We present a convergence analysis of an upper bound on the l2 norm of gradients for HFL with model pruning, analyze the computation and communication latency of the proposed model pruning scheme, and formulate an optimization problem to maximize the convergence rate under a given latency threshold by jointly optimizing the pruning ratio and wireless resource allocation. By decoupling the optimization problem and using Karush-Kuhn-Tucker (KKT) conditions, closed-form solutions for the pruning ratio and wireless resource allocation are derived. Simulation results show that our proposed HFL with model pruning achieves learning accuracy similar to HFL without model pruning while reducing communication cost by about 50 percent.
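For intuition on what a pruning ratio controls, the sketch below shows generic unstructured magnitude pruning on a single weight matrix. The paper's scheme additionally optimizes this ratio jointly with wireless resource allocation under a latency constraint, which is not modeled here:

```python
import numpy as np

def magnitude_prune(w, ratio):
    """Zero out the `ratio` fraction of weights with smallest magnitude."""
    k = int(ratio * w.size)
    if k == 0:
        return w.copy()
    # k-th smallest absolute value acts as the pruning threshold.
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    pruned = w.copy()
    pruned[np.abs(pruned) <= thresh] = 0.0
    return pruned

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
pruned = magnitude_prune(w, 0.5)
print(np.mean(pruned == 0))  # about 0.5
```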
Physics-informed Convolutional Recurrent Surrogate Model for Reservoir Simulation with Well Controls
Authors: Jungang Chen, Eduardo Gildin, John E. Killough (Texas A&M University)
Abstract
This paper presents a novel surrogate model for subsurface fluid flow with well controls using a physics-informed convolutional recurrent neural network (PICRNN). The model uses a convolutional long short-term memory (ConvLSTM) network to capture the spatiotemporal dependencies of the state evolution dynamics in porous flow. The ConvLSTM is linked to the state space equations, enabling the incorporation of a discrete-time sequence of well controls. The model takes an initial state condition and a sequence of well controls as inputs and predicts the state variables of the system, such as pressure, as outputs. By minimizing the residuals of the reservoir flow state-space equations, the network is trained without the need for labeled data. The model is designed to serve as a surrogate for predicting future reservoir states based on the initial reservoir state and input engineering controls. Boundary conditions are enforced in the state-space equations, so no additional loss term is needed. Three numerical cases are studied, demonstrating the model's effectiveness in predicting reservoir dynamics based on future well/system controls. The proposed model provides a new approach for efficient and accurate prediction of subsurface fluid flow, with potential applications in optimal control design for reservoir engineering.
Private Training Set Inspection in MLaaS
Authors: Mingxue Xu, Tongtong Xu, Po-Yu Chen
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Databases (cs.DB)
Abstract
Machine Learning as a Service (MLaaS) is a popular cloud-based solution for customers who aim to use an ML model but lack training data, computation resources, or expertise in ML. In this case, the training datasets are typically a private possession of the ML or data companies and are inaccessible to the customers, but the customers still need an approach to confirm that the training datasets meet their expectations and fulfil regulatory measures like fairness. However, no existing work addresses these customers' concerns. This work is the first attempt to solve this problem, taking data origin as an entry point. We first define an origin membership measurement and, based on it, define diversity and fairness metrics to address customers' concerns. We then propose a strategy to estimate the values of these two metrics in the inaccessible training dataset, combining shadow training techniques from membership inference and an efficient featurization scheme in multiple instance learning. The evaluation covers a text review polarity classification application based on the BERT language model. Experimental results show that our solution can achieve up to 0.87 accuracy for membership inspection and up to 99.3% confidence in inspecting diversity and fairness distribution.
Is a Video worth $n\times n$ Images? A Highly Efficient Approach to Transformer-based Video Question Answering
Authors: Chenyang Lyu, Tianbo Ji, Yvette Graham, Jennifer Foster
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
Abstract
Conventional Transformer-based Video Question Answering (VideoQA) approaches generally encode frames independently through one or more image encoders, followed by interaction between frames and question. However, such a scheme incurs significant memory use and inevitably slows down training and inference. In this work, we present a highly efficient approach for VideoQA based on existing vision-language pre-trained models, where we concatenate video frames into an $n\times n$ matrix and then convert it to one image. By doing so, we reduce the use of the image encoder from $n^{2}$ calls to $1$ while maintaining the temporal structure of the original video. Experimental results on MSRVTT and TrafficQA show that our proposed approach achieves state-of-the-art performance with nearly $4\times$ faster speed and only 30% memory use. We show that by integrating our approach into VideoQA systems we can achieve comparable, even superior, performance with a significant speed-up for training and inference. We believe the proposed approach can facilitate VideoQA-related research by reducing the computational requirements for those who have limited budgets and resources. Our code will be made publicly available for research use.
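The frame-concatenation step can be sketched directly. The helper below is a hypothetical illustration of tiling $n^{2}$ frames into a single grid image, not the authors' released code:

```python
import numpy as np

def frames_to_grid(frames, n):
    """Tile n*n frames (each H x W x C) into one (n*H) x (n*W) x C image."""
    h, w, c = frames[0].shape
    grid = np.zeros((n * h, n * w, c), dtype=frames[0].dtype)
    for idx, frame in enumerate(frames[: n * n]):
        row, col = divmod(idx, n)  # row-major order preserves temporal order
        grid[row * h:(row + 1) * h, col * w:(col + 1) * w] = frame
    return grid

# Four dummy 32x32 RGB frames, each filled with its own index value.
frames = [np.full((32, 32, 3), i, dtype=np.uint8) for i in range(4)]
grid = frames_to_grid(frames, 2)
print(grid.shape)  # (64, 64, 3)
```

The resulting single image can then be passed through one image encoder call instead of $n^{2}$ separate ones.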
Smooth Robustness Measures for Symbolic Control Via Signal Temporal Logic
Authors: Shirantha Welikala, Hai Lin, Panos J. Antsaklis
Abstract
Symbolic control problems aim to synthesize control policies for dynamical systems under complex temporal specifications. For such problems, Signal Temporal Logic (STL) is increasingly used as the formal specification language due to its rich expressiveness. Moreover, the degree of satisfaction of STL specifications can be evaluated using ``STL robust semantics'' as a scalar robustness measure. This capability of STL enables transforming a symbolic control problem into an optimization problem that optimizes the corresponding robustness measure. However, since these robustness measures are non-smooth and non-convex, exact solutions can only be computed using computationally inefficient mixed-integer programming techniques that do not scale well. Therefore, recent literature has focused on using smooth approximations of these robustness measures so that scalable and computationally efficient gradient-based methods can be applied to find locally optimal solutions. In this paper, we first generalize two recently established smooth robustness measures (SRMs), propose two new ones, and discuss their strengths and weaknesses. Next, we propose ``STL error semantics'' to characterize the approximation errors associated with different SRMs under different parameter configurations. This allows one to sensibly select an SRM (to optimize) along with its parameter values. We then propose ``STL gradient semantics'' to derive explicit gradients of SRMs, leading to improved computational efficiency and accuracy compared to using numerically estimated gradients. Finally, these contributions are highlighted using extensive simulation results.
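One widely used smooth robustness building block of the kind discussed above is the log-sum-exp smooth maximum, whose approximation error admits an explicit bound of log(m)/k for m arguments (a smooth minimum is obtained by negation). A minimal sketch, not tied to any particular SRM from the paper:

```python
import numpy as np

def smooth_max(x, k=10.0):
    """Log-sum-exp approximation of max(x); error is at most log(len(x))/k."""
    x = np.asarray(x, dtype=float)
    m = x.max()  # subtract the max before exponentiating, for numerical stability
    return m + np.log(np.exp(k * (x - m)).sum()) / k

x = np.array([0.3, -1.2, 0.25, 0.1])
for k in (1.0, 10.0, 100.0):
    approx = smooth_max(x, k)
    # Over-approximation with a known error bound: max <= approx <= max + log(m)/k.
    assert x.max() <= approx <= x.max() + np.log(x.size) / k
print(smooth_max(x, 100.0))  # close to 0.3
```

Larger k tightens the approximation but sharpens the gradients, which is exactly the kind of parameter trade-off an error-semantics analysis makes explicit.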
A lightweight semi-centralized strategy for the massive parallelization of branching algorithms
Authors: Andres Pastrana-Cruz, Manuel Lafond
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Several NP-hard problems are solved exactly using exponential-time branching strategies, whether it be branch-and-bound algorithms, or bounded search trees in fixed-parameter algorithms. The number of tractable instances that can be handled by sequential algorithms is usually small, whereas massive parallelization has been shown to significantly increase the space of instances that can be solved exactly. However, previous centralized approaches require too much communication to be efficient, whereas decentralized approaches are more efficient but have difficulty keeping track of the global state of the exploration. In this work, we propose to revisit the centralized paradigm while avoiding previous bottlenecks. In our strategy, the center has lightweight responsibilities, requires only a few bits for every communication, but is still able to keep track of the progress of every worker. In particular, the center never holds any task but is able to guarantee that a process with no work always receives the highest priority task globally. Our strategy was implemented in a generic C++ library called GemPBA, which allows a programmer to convert a sequential branching algorithm into a parallel version by changing only a few lines of code. An experimental case study on the vertex cover problem demonstrates that some of the toughest instances from the DIMACS challenge graphs that would take months to solve sequentially can be handled within two hours with our approach.
Graph Reinforcement Learning for Network Control via Bi-Level Optimization
Authors: Daniele Gammelli, James Harrison, Kaidi Yang, Marco Pavone, Filipe Rodrigues, Francisco C. Pereira
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract
Optimization problems over dynamic networks have been extensively studied and widely used in the past decades to formulate numerous real-world problems. However, (1) traditional optimization-based approaches do not scale to large networks, and (2) the design of good heuristics or approximation algorithms often requires significant manual trial-and-error. In this work, we argue that data-driven strategies can automate this process and learn efficient algorithms without compromising optimality. To do so, we present network control problems through the lens of reinforcement learning and propose a graph network-based framework to handle a broad class of problems. Instead of naively computing actions over high-dimensional graph elements, e.g., edges, we propose a bi-level formulation where we (1) specify a desired next state via RL, and (2) solve a convex program to best achieve it, leading to drastically improved scalability and performance. We further highlight a collection of desirable features to system designers, investigate design decisions, and present experiments on real-world control problems showing the utility, scalability, and flexibility of our framework.
Deep Ensembling for Perceptual Image Quality Assessment
Authors: Nisar Ahmed, H. M. Shahzad Asif, Abdul Rauf Bhatti, Atif Khan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Blind image quality assessment is a challenging task, particularly due to the unavailability of reference information. Training a deep neural network requires a large amount of training data, which is not readily available for image quality. Transfer learning is usually adopted to overcome this limitation, and different deep architectures are used for this purpose as they learn features differently. After extensive experiments, we have designed a deep architecture containing two CNN architectures as its sub-units. Moreover, a self-collected image database, BIQ2021, is proposed with 12,000 images having natural distortions. The self-collected database is subjectively scored and is used for model training and validation. It is demonstrated that synthetic distortion databases cannot provide generalization beyond the distortion types used in the database and they are not ideal candidates for general-purpose image quality assessment. Moreover, a large-scale database of 18.75 million images with synthetic distortions is used to pretrain the model, which is then retrained on benchmark databases for evaluation. Experiments are conducted on six benchmark databases, three of which are synthetic distortion databases (LIVE, CSIQ and TID2013) and three are natural distortion databases (LIVE Challenge Database, CID2013 and KonIQ-10k). The proposed approach achieves Pearson correlation coefficients of 0.8992, 0.8472 and 0.9452, respectively, and Spearman correlation coefficients of 0.8863, 0.8408 and 0.9421. Moreover, the performance is demonstrated using perceptually weighted rank correlation to indicate the perceptual superiority of the proposed approach. Multiple experiments are conducted to validate the generalization performance of the proposed model by training on different subsets of the databases and validating on the test subset of the BIQ2021 database.
Dual-Alignment Pre-training for Cross-lingual Sentence Embedding
Abstract
Recent studies have shown that dual encoder models trained with the sentence-level translation ranking task are effective methods for cross-lingual sentence embedding. However, our research indicates that token-level alignment is also crucial in multilingual scenarios, which has not been fully explored previously. Based on our findings, we propose a dual-alignment pre-training (DAP) framework for cross-lingual sentence embedding that incorporates both sentence-level and token-level alignment. To achieve this, we introduce a novel representation translation learning (RTL) task, where the model learns to use one-side contextualized token representation to reconstruct its translation counterpart. This reconstruction objective encourages the model to embed translation information into the token representation. Compared to other token-level alignment methods such as translation language modeling, RTL is more suitable for dual encoder architectures and is computationally efficient. Extensive experiments on three sentence-level cross-lingual benchmarks demonstrate that our approach can significantly improve sentence embedding. Our code is available at https://github.com/ChillingDream/DAP.
Style Transfer Enabled Sim2Real Framework for Efficient Learning of Robotic Ultrasound Image Analysis Using Simulated Data
Authors: Keyu Li, Xinyu Mao, Chengwei Ye, Ang Li, Yangxin Xu, Max Q.-H. Meng
Abstract
Robotic ultrasound (US) systems have shown great potential to make US examinations easier and more accurate. Recently, various machine learning techniques have been proposed to realize automatic US image interpretation for robotic US acquisition tasks. However, obtaining large amounts of real US imaging data for training is usually expensive or even infeasible in some clinical applications. An alternative is to build a simulator to generate synthetic US data for training, but the differences between simulated and real US images may result in poor model performance. This work presents a Sim2Real framework to efficiently learn robotic US image analysis tasks based only on simulated data for real-world deployment. A style transfer module is proposed based on unsupervised contrastive learning and used as a preprocessing step to convert the real US images into the simulation style. Thereafter, a task-relevant model is designed to combine CNNs with vision transformers to generate the task-dependent prediction with improved generalization ability. We demonstrate the effectiveness of our method in an image regression task to predict the probe position based on US images in robotic transesophageal echocardiography (TEE). Our results show that using only simulated US data and a small amount of unlabelled real data for training, our method can achieve comparable performance to semi-supervised and fully supervised learning methods. Moreover, the effectiveness of our previously proposed CT-based US image simulation method is also indirectly confirmed.
Lightweight Self-Knowledge Distillation with Multi-source Information Fusion
Authors: Xucong Wang, Pengchao Han, Lei Guo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Knowledge Distillation (KD) is a powerful technique for transferring knowledge between neural network models, where a pre-trained teacher model is used to facilitate the training of the target student model. However, the availability of a suitable teacher model is not always guaranteed. To address this challenge, Self-Knowledge Distillation (SKD) attempts to construct a teacher model from itself. Existing SKD methods add Auxiliary Classifiers (AC) to intermediate layers of the model or use the history models and models with different input data within the same class. However, these methods are computationally expensive and only capture time-wise and class-wise features of data. In this paper, we propose a lightweight SKD framework that utilizes multi-source information to construct a more informative teacher. Specifically, we introduce a Distillation with Reverse Guidance (DRG) method that considers different levels of information extracted by the model, including edge, shape, and detail of the input data, to construct a more informative teacher. Additionally, we design a Distillation with Shape-wise Regularization (DSR) method that ensures a consistent shape of ranked model output for all data. We validate the performance of the proposed DRG, DSR, and their combination through comprehensive experiments on various datasets and models. Our results demonstrate the superiority of the proposed methods over baselines (up to 2.87%) and state-of-the-art SKD methods (up to 1.15%), while being computationally efficient and robust. The code is available at https://github.com/xucong-parsifal/LightSKD.
Machine learning enhanced real-time aerodynamic forces prediction based on sparse pressure sensor inputs
Authors: Junming Duan, Qian Wang, Jan S. Hesthaven
Abstract
Accurate prediction of aerodynamic forces in real-time is crucial for autonomous navigation of unmanned aerial vehicles (UAVs). This paper presents a data-driven aerodynamic force prediction model based on a small number of pressure sensors located on the surface of the UAV. The model combines a linear term that makes a reasonably accurate prediction with a nonlinear correction for improved accuracy. The linear term is based on a reduced basis reconstruction of the surface pressure distribution, where the basis is extracted from numerical simulation data and the basis coefficients are determined by solving linear pressure reconstruction equations at a set of sensor locations. Sensor placement is optimized using the discrete empirical interpolation method (DEIM). Aerodynamic forces are computed by integrating the reconstructed surface pressure distribution. The nonlinear term is an artificial neural network (NN) that is trained to bridge the gap between the ground truth and the DEIM prediction, especially in the scenario where the DEIM model is constructed from simulation data with limited fidelity. A large network is not necessary for accurate correction as the linear model already captures the main dynamics of the surface pressure field, thus yielding an efficient DEIM+NN aerodynamic force prediction model. The model is tested on numerical and experimental dynamic stall data of a 2D NACA0015 airfoil, and numerical simulation data of dynamic stall of a 3D drone. Numerical results demonstrate that the machine learning enhanced model can make fast and accurate predictions of aerodynamic forces using only a few pressure sensors, even for the NACA0015 case in which the simulations do not agree well with the wind tunnel experiments. Furthermore, the model is robust to noise.
Can we forget how we learned? Representing states in iterated belief revision
Abstract
The three most common representations of states in iterated belief revision are compared: explicit, by levels, and by history. The first is a connected preorder between models, the second is a list of formulae representing equivalence classes, and the third is the sequence of the previous revisions. The third depends on the revision semantics and on history rewriting, which in turn depends on the allowed rewritings. All mechanisms represent all possible states. A rewritten history of lexicographic revision is more efficient than the other considered representations in terms of size, given arbitrary history rewritings. Establishing the redundancy of such a history is a mild form of rewriting. It is coNP-complete in the general case, hard even on histories of two revisions or revisions of arbitrary length of Horn formulae, and polynomial on histories of two Horn formulae. A minor technical result is a polynomial-time algorithm for establishing whether a Horn formula is equivalent to the negation of another Horn formula.
Counterfactual Outcome Prediction using Structured State Space Model
Abstract
Counterfactual outcome prediction in longitudinal data has recently gained attention due to its potential applications in healthcare and social sciences. In this paper, we explore the use of the state space model, a popular sequence model, for this task. Specifically, we compare the performance of two models: Treatment Effect Neural Controlled Differential Equation (TE-CDE) and structured state space model (S4Model). While TE-CDE uses controlled differential equations to address time-dependent confounding, it suffers from optimization issues and slow training. In contrast, S4Model is more efficient at modeling long-range dependencies and easier to train. We evaluate the models on a simulated lung tumor growth dataset and find that S4Model outperforms TE-CDE with 1.63x reduction in per epoch training time and 10x better normalized mean squared error. Additionally, S4Model is more stable during training and less sensitive to weight initialization than TE-CDE. Our results suggest that the state space model may be a promising approach for counterfactual outcome prediction in longitudinal data, with S4Model offering a more efficient and effective alternative to TE-CDE.
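For readers unfamiliar with the model class, a structured state space model is, at its core, a discrete linear recurrence. The sketch below shows only that recurrent backbone with a hand-picked stable diagonal transition matrix; S4Model additionally uses structured (HiPPO-initialized) matrices and a convolutional reformulation for efficient long-range modeling, which are not shown:

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Run a discrete linear SSM: x[t+1] = A x[t] + B u[t], y[t] = C x[t+1]."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_t in u:
        x = A @ x + B * u_t   # state update driven by the scalar input u_t
        ys.append(C @ x)      # scalar readout of the hidden state
    return np.array(ys)

# A stable diagonal transition matrix (illustrative values, not HiPPO).
A = np.diag([0.9, 0.5])
B = np.array([1.0, 1.0])
C = np.array([1.0, -1.0])
y = ssm_scan(A, B, C, np.ones(5))
print(y[:2])  # [0.0, 0.4]
```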
PIQI: Perceptual Image Quality Index based on Ensemble of Gaussian Process Regression
Authors: Nisar Ahmed, Hafiz Muhammad Shahzad Asif, Hassan Khalid
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Digital images contain a lot of redundancy; therefore, compression techniques are applied to reduce image size while retaining reasonable image quality. This becomes more prominent in the case of videos, which contain image sequences, and higher compression ratios are used in low-throughput networks. Assessment of image quality in such scenarios has become of particular interest. Subjective evaluation is infeasible in most scenarios, so objective evaluation is preferred. Among the three classes of objective quality measures, full-reference and reduced-reference methods require the original image in some form to calculate image quality, which is infeasible in scenarios such as broadcasting, acquisition or enhancement. Therefore, a no-reference Perceptual Image Quality Index (PIQI) is proposed in this paper to assess the quality of digital images, which calculates luminance and gradient statistics along with mean subtracted contrast normalized products in multiple scales and color spaces. These extracted features are provided to a stacked ensemble of Gaussian Process Regression (GPR) models to perform the perceptual quality evaluation. The performance of PIQI is checked on six benchmark databases and compared with twelve state-of-the-art methods, achieving competitive results. The comparison is based on RMSE, Pearson and Spearman correlation coefficients between ground truth and predicted quality scores; scores of 0.0552, 0.9802 and 0.9776, respectively, are achieved for these metrics on the CSIQ database. Two cross-dataset evaluation experiments are performed to check the generalization of PIQI.
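The GPR building block underlying such an ensemble can be sketched in a few lines. The following is a minimal zero-mean GP posterior mean with an RBF kernel and assumed hyperparameters, not PIQI's stacked ensemble or its handcrafted features:

```python
import numpy as np

def rbf(a, b, length=1.0):
    """RBF (squared-exponential) kernel between two 1-D point sets."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gpr_predict(x_train, y_train, x_test, noise=1e-4):
    """Posterior mean of a zero-mean GP with an RBF kernel."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    k_star = rbf(x_test, x_train)
    return k_star @ np.linalg.solve(K, y_train)

# Fit a smooth 1-D target and query between the training points.
x = np.linspace(0.0, 3.0, 10)
y = np.sin(x)
mean = gpr_predict(x, y, np.array([1.5]))
print(mean)  # close to sin(1.5)
```

A stacked ensemble would train several such regressors (e.g., on different feature groups) and combine their predictions with a second-level model.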
A Multi-Client Searchable Encryption Scheme for IoT Environment
Authors: Nazatul H. Sultan, Shabnam Kasra-Kermanshahi, Yen Tran, Shangqi Lai, Vijay Varadharajan, Surya Nepal, Xun Yi
Abstract
The proliferation of connected devices through Internet connectivity presents both opportunities for smart applications and risks to security and privacy. It is vital to proactively address these concerns to fully leverage the potential of the Internet of Things. IoT services in which one data owner serves multiple clients, such as smart city transportation, smart building management, and healthcare, can offer benefits but also bring cybersecurity and data privacy risks. For example, in healthcare, a hospital may collect data from medical devices and make it available to multiple clients such as researchers and pharmaceutical companies. This data can be used to improve medical treatments and research, but if not protected, it can also put patients' personal information at risk. To ensure the benefits of these services, it is important to implement proper security and privacy measures. In this paper, we propose a symmetric searchable encryption scheme with dynamic updates on a database that has a single owner and multiple clients, designed for IoT environments. Our proposed scheme supports both forward and backward privacy. Additionally, it supports a decentralized storage environment in which data owners can outsource data across multiple servers, or even across multiple service providers, to improve security and privacy. Further, revoking a client's access to our system at any time requires minimal effort and cost. The performance and formal security analyses show that our scheme provides better functionality and security, and is more efficient in terms of computation and storage, than closely related works.
One-shot neural band selection for spectral recovery
Authors: Hai-Miao Hu, Zhenbo Xu, Wenshuai Xu, You Song, YiTao Zhang, Liu Liu, Zhilin Han, Ajin Meng
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Band selection has a great impact on spectral recovery quality. To solve this ill-posed inverse problem, most band selection methods adopt hand-crafted priors or exploit clustering or sparse-regularization constraints to find the most prominent bands. These methods are very slow due to the computational cost of repeatedly training with respect to different selection frequencies or different band combinations. Many traditional methods rely on scene priors and thus are not applicable to other scenarios. In this paper, we present a novel one-shot Neural Band Selection (NBS) framework for spectral recovery. Unlike conventional searching approaches with a discrete search space and a non-differentiable search strategy, our NBS is based on a continuous relaxation of the band selection process, thus allowing efficient band search using gradient descent. To enable selecting any number of bands in one shot, we further exploit band-wise correlation matrices to progressively suppress similar adjacent bands. Extensive evaluations on the NTIRE 2022 Spectral Reconstruction Challenge demonstrate that our NBS achieves consistent performance gains over competitive baselines when examined with four different spectral recovery methods. Our code will be publicly available.
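The continuous relaxation underlying such a band search can be illustrated with a minimal sketch (our own simplification, not the paper's NBS implementation): selection logits are softened into mixing weights via a softmax, making the selection differentiable, and lowering the temperature recovers a hard one-hot pick of a single band per row.

```python
import numpy as np

def soft_band_select(spectra, logits, temperature=1.0):
    # spectra: (num_bands, num_pixels); logits: (num_selected, num_bands).
    # Each "selected band" is a softmax-weighted mixture of all bands, so
    # the selection is differentiable; as temperature -> 0 each row of
    # weights approaches a one-hot vector, i.e. a discrete band choice.
    z = (logits - logits.max(axis=1, keepdims=True)) / temperature
    w = np.exp(z)
    w /= w.sum(axis=1, keepdims=True)
    return w @ spectra

rng = np.random.default_rng(1)
spectra = rng.normal(size=(31, 64))          # 31 bands, 64 pixels
logits = rng.normal(size=(4, 31))            # scores for selecting 4 bands
winners = np.array([2, 7, 15, 30])
logits[np.arange(4), winners] += 10.0        # give each row a clear winner
soft = soft_band_select(spectra, logits, temperature=0.1)
hard = spectra[winners]                      # the hard (discrete) selection
print(np.allclose(soft, hard, atol=1e-6))    # True
```

In a real pipeline the logits would be trained by gradient descent against the downstream spectral recovery loss.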
A SKG Security Challenge: Indoor SKG Under an On-The-Shoulder Eavesdropping Attack
Authors: Amitha Mayya, Miroslav Mitev, Arsenia Chorti, Gerhard Fettweis
Abstract
Physical layer security (PLS) is seen as the means to enhance physical layer trustworthiness in 6G. This work provides a proof-of-concept for one of the most mature PLS technologies, i.e., secret key generation (SKG) from wireless fading coefficients during the channel's coherence time. As opposed to other works, where only specific parts of the protocol are typically investigated, here, we implement the full SKG chain in four indoor experimental campaigns. In detail, we consider two legitimate nodes, who use the wireless channel to extract secret keys, and a malicious node placed in the immediate vicinity of one of them, who acts as a passive eavesdropper. To estimate the final SKG rate we evaluate the conditional min-entropy by taking into account all information available to the eavesdropper. Finally, we use this paper to announce the first ever physical layer security challenge, mirroring practices in cryptography. We call the community to scrutinize the presented results and try to ``break'' our SKG implementation. To this end, we provide, i) the full dataset observed by the eavesdropper and all algorithms used, ii) $20$ blocks of $16$-byte long ciphertexts, encrypted using AES-256 with $20$ distilled secret keys, and, iii) all codes and software used in our SKG implementation. An attack will be considered successful if any part(s) of the plaintext are successfully retrieved.
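A core step in any SKG chain is quantizing reciprocal channel measurements into key bits. The toy sketch below is our own illustration of that idea, not the implementation released with the paper: each node thresholds its measurements around its median, with a guard band whose samples are discarded (in real protocols the nodes exchange discard indices), so that channel reciprocity yields mostly matching bits.

```python
import numpy as np

def quantize(rssi, guard=0.5):
    """Toy SKG quantizer (illustrative; not the paper's full pipeline).

    Samples well above the node's median become 1, well below become 0,
    and samples inside the guard band are marked None for discarding,
    which reduces bit disagreement between the two legitimate nodes.
    """
    med = np.median(rssi)
    return [1 if x > med + guard else 0 if x < med - guard else None
            for x in rssi]

rng = np.random.default_rng(7)
alice = rng.normal(0.0, 2.0, size=64)          # Alice's channel estimates
bob = alice + rng.normal(0.0, 0.05, size=64)   # reciprocity: Bob sees ~same
ka, kb = quantize(alice), quantize(bob)
kept = [(a, b) for a, b in zip(ka, kb) if a is not None and b is not None]
agreement = sum(a == b for a, b in kept) / len(kept)
print(round(agreement, 2))
```

Downstream stages (information reconciliation and privacy amplification via the conditional min-entropy estimate) then turn these raw bits into the final key.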
Challenges with the Application of Cyber Security for Airworthiness (CSA) in Real-World Contexts
Authors: Beckett LeClair, James McLeod, Lee Ramsay, Mick Warren
Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY)
Abstract
The ever-increasing reliance upon computerised technology in commercial, general, and military aerospace brings with it an increasing number of potential cyber hazards and attacks. Consequently, the variety of attack vectors is greater than ever. Recognized Good Practice standards such as DO 326A and ED 202A attempt to address this by providing guidelines for cyber security on in-service aircraft, though implementation work for such initiatives is still in its early stages. From previous work on in-service aircraft, the authors have determined that one of the key challenges is the retrospective application of new regulations to existing designs. This can present significant requirements for time, money, and Suitably Qualified and Experienced Personnel, resources which are often already in limited supply in military environments. The authors have previously explored efficient ways of approaching compliance, with promising results. There is still a need to consider this retroactivity challenge in tandem with other key factors affecting the application of CSA, in order to identify further mitigating actions that could lower the barrier to effective and efficient implementation of secure approaches in the air domain. This work explores the interrelated challenges surrounding real-world applications of CSA and the beginnings of how they may be overcome.
Massive Uniform Mesh Decimation via a Fast Intrinsic Delaunay Triangulation
Abstract
Triangular meshes remain the data structure at the foundation of many computer graphics applications. With the increasing demand for content variety, much effort has been, and is being, put into developing new algorithms to automatically generate and edit geometric assets, with a particular focus on 3D scans. However, this kind of content is often generated at a dramatically high resolution, making it impractical for a large variety of tasks. Furthermore, procedural assets and 3D scans largely suffer from poor geometry quality, which makes them unsuitable for various applications. We propose a new, efficient technique for massively decimating dense, high-vertex-count meshes very quickly. The proposed method relies on a fast algorithm for computing geodesic farthest point sampling and Voronoi partitioning, and generates simplified meshes with high-quality uniform triangulations.
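The greedy farthest-point-sampling step can be sketched as follows. For simplicity this sketch uses Euclidean distances on a raw point set, whereas the paper computes geodesic distances intrinsically on the mesh surface; the greedy scheme (repeatedly pick the point farthest from everything chosen so far) is the same.

```python
import numpy as np

def farthest_point_sampling(points, k, seed=0):
    """Greedy farthest-point sampling (illustrative Euclidean sketch).

    Maintains, for every point, the distance to its nearest already-chosen
    sample, and repeatedly picks the point where that distance is largest.
    """
    chosen = [seed]
    dist = np.linalg.norm(points - points[seed], axis=1)
    for _ in range(k - 1):
        nxt = int(dist.argmax())            # farthest from all chosen so far
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return chosen

rng = np.random.default_rng(42)
pts = rng.uniform(size=(200, 3))
sample = farthest_point_sampling(pts, 10)
print(len(sample), len(set(sample)))  # 10 10
```

The chosen samples can then seed a Voronoi partition of the surface, from which the simplified uniform triangulation is extracted.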
Latent Distribution Adjusting for Face Anti-Spoofing
Abstract
With the development of deep learning, the field of face anti-spoofing (FAS) has witnessed great progress. FAS is usually considered a classification problem, where each class is assumed to contain a single cluster optimized by softmax loss. In practical deployment, one class can contain several local clusters, and a single center is insufficient to capture the inherent structure of the FAS data. However, few approaches consider large distribution discrepancies in the field of FAS. In this work, we propose a unified framework called Latent Distribution Adjusting (LDA), with latent, discriminative, adaptive, and generic properties, to improve the robustness of the FAS model by adjusting complex data distributions with multiple prototypes. 1) Latent. LDA attempts to model the data of each class as a Gaussian mixture distribution, and acquires a flexible number of centers for each class in the last fully connected layer implicitly. 2) Discriminative. To enhance intra-class compactness and inter-class discrepancy, we propose a margin-based loss that provides distribution constraints for prototype learning. 3) Adaptive. To make LDA more efficient and decrease redundant parameters, we propose Adaptive Prototype Selection (APS), which selects the appropriate number of centers adaptively according to different distributions. 4) Generic. Furthermore, LDA can adapt to unseen distributions by utilizing very few training data without re-training. Extensive experiments demonstrate that our framework can 1) make the final representation space both intra-class compact and inter-class separable, and 2) outperform the state-of-the-art methods on multiple standard FAS benchmarks.
Adversarial Word Dilution as Text Data Augmentation in Low-Resource Regime
Abstract
Data augmentation is widely used in text classification, especially in the low-resource regime where only a few examples for each class are available during training. Despite this success, generating data augmentations as hard positive examples, which may increase their effectiveness, is under-explored. This paper proposes an Adversarial Word Dilution (AWD) method that can generate hard positive examples as text data augmentations to train low-resource text classification models efficiently. Our idea for augmenting the text data is to dilute the embeddings of strong positive words by weighted mixing with the unknown-word embedding, making the augmented inputs hard for the classification model to recognize as positive. We adversarially learn the dilution weights through a constrained min-max optimization process guided by the labels. Empirical studies on three benchmark datasets show that AWD can generate more effective data augmentations and outperforms state-of-the-art text data augmentation methods. Additional analysis demonstrates that the data augmentations generated by AWD are interpretable and can flexibly extend to new examples without further training.
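The dilution operation itself is a simple convex mixing of embeddings. The sketch below illustrates it with fixed example weights in place of the adversarially learned ones (the weights, dimensions, and names here are our own illustrative assumptions):

```python
import numpy as np

def dilute(word_embeddings, unk_embedding, weights):
    """Word dilution: mix each word embedding with the unknown-word embedding.

    weights[i] in [0, 1] is the dilution weight for word i (adversarially
    learned in AWD; fixed here for illustration). Higher weight pushes the
    word toward "unknown", producing a harder positive example.
    """
    w = weights[:, None]
    return (1.0 - w) * word_embeddings + w * unk_embedding

rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 8))            # 5 words, dim-8 embeddings
unk = rng.normal(size=(8,))              # unknown-word embedding
w = np.array([0.0, 0.2, 0.5, 0.8, 1.0])  # 0 = unchanged, 1 = fully diluted
out = dilute(emb, unk, w)
print(np.allclose(out[0], emb[0]), np.allclose(out[-1], unk))  # True True
```

In AWD these weights are the inner maximization variables of the constrained min-max training objective.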
On CSI-Free Multi-Antenna Schemes for Massive Wireless-Powered Underground Sensor Networks
Abstract
Radio-frequency wireless energy transfer (WET) is a promising technology to realize wireless-powered underground sensor networks (WPUSNs) and enable sustainable underground monitoring. However, due to the severe attenuation in harsh underground soil and the tight energy budget of the underground sensors, traditional WPUSNs relying on the channel state information (CSI) are highly inefficient, especially in massive WET scenarios. To address this challenge, we comparatively assess the feasibility of several state-of-the-art CSI-free multi-antenna WET schemes for WPUSNs, under a given power budget. Moreover, to overcome the extremely low WET efficiency in underground channels, we propose a distributed CSI-free system, where multiple power beacons (PBs) simultaneously charge a large set of underground sensors without any CSI. We consider the position-aware K-Means and the position-agnostic equally-far-from-center (EFFC) approaches for the optimal deployment of the PBs. Our results evince that the performance of the proposed distributed CSI-free system can approach or even surpass that of a traditional full-CSI WET strategy, especially when adopting an appropriate CSI-free scheme, applying the advisable PBs deployment approach, and equipping the PBs with an appropriate number of antennas. Finally, we discuss the impact of underground parameters, i.e., the burial depth of devices and the volumetric water content of soil, on the system's performance, and identify potential challenges and research opportunities for practical distributed CSI-free WPUSNs deployment.
OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research
Abstract
AI systems empowered by reinforcement learning (RL) algorithms harbor the immense potential to catalyze societal advancement, yet their deployment is often impeded by significant safety concerns. Particularly in safety-critical applications, researchers have raised concerns about unintended harms or unsafe behaviors of unaligned RL agents. The philosophy of safe reinforcement learning (SafeRL) is to align RL agents with harmless intentions and safe behavioral patterns. In SafeRL, agents learn to develop optimal policies by receiving feedback from the environment, while also fulfilling the requirement of minimizing the risk of unintended harm or unsafe behavior. However, due to the intricate nature of SafeRL algorithm implementation, combining methodologies across various domains presents a formidable challenge. This has led to the absence of a cohesive and efficacious learning framework within the contemporary SafeRL research milieu. In this work, we introduce a foundational framework designed to expedite SafeRL research. Our comprehensive framework encompasses an array of algorithms spanning different RL domains and places heavy emphasis on safety elements. We aim to make the SafeRL-related research process more streamlined and efficient, thereby facilitating further research in AI safety. Our project is released at: https://github.com/PKU-Alignment/omnisafe.
Energy-Efficient WiFi Backscatter Communication for Green IoTs
Abstract
The boom of the Internet of Things has revolutionized people's lives, but it has also resulted in massive resource consumption and environmental pollution. Recently, Green IoT (GIoT) has become a worldwide consensus to address this issue. In this paper, we propose EEWScatter, an energy-efficient WiFi backscatter communication system to pursue the goal of GIoT. Unlike previous backscatter systems that solely focus on tags, our approach offers a comprehensive system-wide view on energy conservation. Specifically, we reuse ambient signals as carriers and utilize an ultra-low-power and battery-free design for tag nodes by backscatter. Further, we design a new CRC-based algorithm that enables the demodulation of both ambient and tag data by only a single receiver while using ambient carriers. Such a design eliminates system reliance on redundant transceivers with high power consumption. Results demonstrate that EEWScatter achieves the lowest overall system power consumption and saves at least half of the energy. What's more, the power consumption of our tag is only 1/1000 of that of active radio. We believe that EEWScatter is a critical step towards a sustainable future.
Bézout identities and control of the heat equation
Authors: François Ollivier
Subjects: Symbolic Computation (cs.SC); Optimization and Control (math.OC)
Abstract
Computing analytic B\'ezout identities remains a difficult task, which has many applications in control theory. Flat PDE systems have cast a new light on this problem. We consider here a simple case of special interest: a rod of length $a+b$, insulated at both ends and heated at point $x=a$. The case $a=0$ is classical, the temperature of the other end $\theta(b,t)$ being then a flat output, with parametrization $\theta(x,t)=\cosh((b-x)(\partial/\partial t)^{1/2})\theta(b,t)$. When $a$ and $b$ are integers, with $a$ odd and $b$ even, the system is flat and the flat output is obtained from the B\'ezout identity $f(x)\cosh(ax)+g(x)\cosh(bx)=1$, the computation of which boils down to a B\'ezout identity of Chebyshev polynomials. But this form is not the most efficient, and a smaller expression $f(x)=\sum_{k=1}^{n} c_{k}\cosh(kx)$ may be computed in linear time. These results are compared with approximations by a finite system, using a classical discretization. We provide experimental computations, approximating a non-rational value $r$ by a sequence of fractions $b/a$, showing that the power series for the B\'ezout relation seems to converge.
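The reduction to Chebyshev polynomials can be made explicit using the standard identity $\cosh(kx)=T_k(\cosh x)$; the following is our own sketch of the argument, not taken verbatim from the paper:

```latex
% Substituting u = \cosh x and using \cosh(kx) = T_k(\cosh x), seeking f, g
% as polynomials in \cosh x turns the analytic identity into a polynomial one:
\begin{align*}
  f(x)\cosh(ax) + g(x)\cosh(bx) = 1
  \quad\Longleftrightarrow\quad
  F(u)\,T_a(u) + G(u)\,T_b(u) = 1, \qquad u = \cosh x,
\end{align*}
% with f(x) = F(\cosh x) and g(x) = G(\cosh x). The roots of T_n are
% \cos\bigl((2k-1)\pi/(2n)\bigr), so a common root of T_a and T_b would
% require (2k-1)\,b = (2j-1)\,a; with a odd and b even this equates an even
% number to an odd one, which is impossible. Hence \gcd(T_a, T_b) = 1 and
% polynomial B\'ezout cofactors F, G exist, e.g. via the extended Euclidean
% algorithm.
```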
Establishing Shared Query Understanding in an Open Multi-Agent System
Authors: Nikolaos Kondylidis, Ilaria Tiddi, Annette ten Teije
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Multiagent Systems (cs.MA)
Abstract
We propose a method that allows two agents to develop shared understanding for the purpose of performing a task that requires cooperation. Our method focuses on efficiently establishing successful task-oriented communication in an open multi-agent system, where the agents do not know anything about each other and can only communicate via grounded interaction. The method aims to assist researchers who work on human-machine interaction or scenarios that require a human-in-the-loop, by defining interaction restrictions and efficiency metrics. To that end, we point out the challenges and limitations of such a (diverse) setup, as well as restrictions and requirements which aim to ensure that high task performance truthfully reflects the extent to which the agents correctly understand each other. Furthermore, we demonstrate a use-case where our method can be applied for the task of cooperative query answering. We design the experiments by modifying an established ontology alignment benchmark. In this example, the agents want to query each other while representing different databases, defined in their own ontologies that contain different and incomplete knowledge. Grounded interaction here takes the form of examples that consist of common instances, for which the agents are expected to have similar knowledge. Our experiments demonstrate successful communication establishment under the required restrictions, and compare different agent policies that aim to solve the task in an efficient manner.
Multi-task convolutional neural network for image aesthetic assessment
Authors: Derya Soydaner, Johan Wagemans
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
As people's aesthetic preferences for images are far from understood, image aesthetic assessment is a challenging artificial intelligence task. The range of factors underlying this task is almost unlimited, but we know that some aesthetic attributes affect those preferences. In this study, we present a multi-task convolutional neural network that takes into account these attributes. The proposed neural network jointly learns the attributes along with the overall aesthetic scores of images. This multi-task learning framework allows for effective generalization through the utilization of shared representations. Our experiments demonstrate that the proposed method outperforms the state-of-the-art approaches in predicting overall aesthetic scores for images in one benchmark of image aesthetics. We achieve near-human performance in terms of overall aesthetic scores when considering the Spearman's rank correlations. Moreover, our model pioneers the application of multi-tasking in another benchmark, serving as a new baseline for future research. Notably, our approach achieves this performance while using fewer parameters compared to existing multi-task neural networks in the literature, and consequently makes our method more efficient in terms of computational complexity.
Transformational Supervisor Localization
Authors: Sander Thuijsman, Kai Cai, Michel Reniers
Subjects: Formal Languages and Automata Theory (cs.FL)
Abstract
Supervisor localization can be applied to distribute a monolithic supervisor into local supervisors. Performing supervisor localization can be computationally costly. In this work, we consider systems that evolve over time. We study how to reuse the results from a previous supervisor localization, to more efficiently compute local supervisors when the system is adapted. We call this approach transformational supervisor localization, and present algorithms for the procedure. The efficiency of the procedure is experimentally evaluated.
Towards Automatic Identification of Globally Valid Geometric Flat Outputs via Numerical Optimization
Authors: Jake Welde, Vijay Kumar
Subjects: Robotics (cs.RO); Differential Geometry (math.DG); Optimization and Control (math.OC)
Abstract
Differential flatness enables efficient planning and control for underactuated robotic systems, but we lack a systematic and practical means of identifying a flat output (or determining whether one exists) for an arbitrary robotic system. In this work, we leverage recent results elucidating the role of symmetry in constructing flat outputs for free-flying robotic systems. Using the tools of Riemannian geometry, Lie group theory, and differential forms, we cast the search for a globally valid, equivariant flat output as an optimization problem. An approximate transcription of this continuum formulation to a quadratic program is performed, and its solutions for two example systems achieve precise agreement with the known closed-form flat outputs. Our results point towards a systematic, automated approach to numerically identify geometric flat outputs directly from the system model, particularly useful when complexity renders pen and paper analysis intractable.
Learning Higher-order Object Interactions for Keypoint-based Video Understanding
Authors: Yi Huang, Asim Kadav, Farley Lai, Deep Patel, Hans Peter Graf
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Action recognition is an important problem that requires identifying actions in video by learning complex interactions across scene actors and objects. However, modern deep-learning-based networks often require significant computation and may capture scene context using various modalities, further increasing compute costs. Efficient methods, such as those used for AR/VR, often use only human-keypoint information but suffer from a loss of scene context that hurts accuracy. In this paper, we describe an action-localization method, KeyNet, that uses only keypoint data for tracking and action recognition. Specifically, KeyNet introduces the use of object-based keypoint information to capture context in the scene. Our method illustrates how to build a structured intermediate representation that allows modeling higher-order interactions in the scene from object and human keypoints, without using any RGB information. We find that KeyNet is able to track and classify human actions at just 5 FPS. More importantly, we demonstrate that object keypoints can be modeled to recover the loss of context that comes from using only keypoint information, on the AVA action and Kinetics datasets.
Increasing Melanoma Diagnostic Confidence: Forcing the Convolutional Network to Learn from the Lesion
Authors: Norsang Lama, R. Joe Stanley, Anand Nambisan, Akanksha Maurya, Jason Hagerty, William V. Stoecker
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Deep learning implemented with convolutional network architectures can exceed specialists' diagnostic accuracy. However, whole-image deep learning trained on a given dataset may not generalize to other datasets. The problem arises because extra-lesional features - ruler marks, ink marks, and other melanoma correlates - may serve as information leaks. These extra-lesional features, discoverable by heat maps, degrade melanoma diagnostic performance and cause techniques learned on one dataset to fail to generalize. We propose a novel technique to improve melanoma recognition by an EfficientNet model. Our method trains the network to detect the lesion and learn features from the detected lesion. A generalizable elliptical segmentation model for lesions was developed, with an ellipse enclosing a lesion and the ellipse enclosed by an extended rectangle (bounding box). The minimal bounding box was extended by 20% to allow some background around the lesion. The publicly available International Skin Imaging Collaboration (ISIC) 2020 skin lesion image dataset was used to evaluate the effectiveness of the proposed method. Our test results show that the proposed method improved diagnostic accuracy by increasing the mean area under the receiver operating characteristic curve (mean AUC) score from 0.9 to 0.922. Additionally, the scores of correctly diagnosed cases are improved, providing better separation of scores and thereby increasing melanoma diagnostic confidence. The proposed lesion-focused convolutional technique warrants further study.
Interactive and Incremental Learning of Spatial Object Relations from Human Demonstrations
Abstract
Humans use semantic concepts such as spatial relations between objects to describe scenes and communicate tasks such as "Put the tea to the right of the cup" or "Move the plate between the fork and the spoon." Just like children, assistive robots must be able to learn the sub-symbolic meaning of such concepts from human demonstrations and instructions. We address the problem of incrementally learning geometric models of spatial relations from few demonstrations collected online during interaction with a human. Such models enable a robot to manipulate objects in order to fulfill desired spatial relations specified by verbal instructions. At the start, we assume the robot has no geometric model of spatial relations. Given a task as above, the robot requests the user to demonstrate the task once in order to create a model from a single demonstration, leveraging a cylindrical probability distribution as a generative representation of spatial relations. We show how this model can be updated incrementally with each new demonstration, without access to past examples, in a sample-efficient way using incremental maximum likelihood estimation, and demonstrate the approach on a real humanoid robot.
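The incremental maximum-likelihood update can be illustrated with a plain 1D Gaussian in place of the paper's cylindrical distributions (a simplifying assumption for the sketch): Welford's algorithm updates the running mean and variance one demonstration at a time, without storing past examples.

```python
class IncrementalGaussian:
    """Incremental maximum-likelihood estimate of a 1D Gaussian.

    Welford's algorithm: each new sample updates the running mean and the
    sum of squared deviations in O(1), so past samples need not be kept.
    """
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # ML estimate (divide by n, not n - 1)
        return self._m2 / self.n if self.n else 0.0

model = IncrementalGaussian()
for x in [1.0, 2.0, 3.0, 4.0]:  # four "demonstrations"
    model.update(x)
print(model.mean, model.variance)  # 2.5 1.25
```

The same pattern extends to the angular and radial components of a cylindrical distribution, each updated from the latest demonstration alone.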
InstaLoc: One-shot Global Lidar Localisation in Indoor Environments through Instance Learning
Authors: Lintong Zhang, Tejaswi Digumarti, Georgi Tinchev, Maurice Fallon
Abstract
Localization of autonomous robots in prior maps is crucial for their functionality. This paper offers a solution to this problem for indoor environments called InstaLoc, which operates on an individual lidar scan to localize it within a prior map. We draw inspiration from how humans navigate and position themselves by recognizing the layout of distinctive objects and structures. Mimicking the human approach, InstaLoc identifies and matches object instances in the scene with those from a prior map. To the best of our knowledge, this is the first method to apply panoptic segmentation directly to 3D lidar scans for indoor localization. InstaLoc operates through two networks, based on spatially sparse tensors, that infer directly on dense 3D lidar point clouds. The first network is a panoptic segmentation network that produces object instances and their semantic classes. The second, smaller network produces a descriptor for each object instance. A consensus-based matching algorithm then matches the instances to the prior map and estimates a six-degrees-of-freedom (DoF) pose for the input cloud in the prior map. The significance of InstaLoc is that its two networks are efficient: they require only one to two hours of training on a mobile GPU and run in real time at 1 Hz. Our method achieves between two and four times more detections when localizing than baseline methods, and achieves higher precision on these detections.
Copula-based Performance Analysis for Fluid Antenna Systems under Arbitrary Fading Channels
Authors: Farshad Rostami Ghadi, Kai-Kit Wong, F. Javier Lopez-Martinez, Kin-Fai Tong
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
In this letter, we study the performance of a single-user fluid antenna system (FAS) under arbitrary fading distributions, in which the fading channel coefficients over the ports are correlated. We adopt copula theory to model the structure of dependency between fading coefficients. Specifically, we first derive an exact closed-form expression for the outage probability in the most general case, i.e., for any arbitrary choice of fading distribution and copula. Afterwards, for an important specific case, we analyze the outage probability under correlated Nakagami-$m$ fading channels by exploiting popular Archimedean copulas, namely, Frank, Clayton, and Gumbel. The results demonstrate that FAS outperforms the conventional single fixed-antenna system in terms of outage probability. We also see that the spatial correlation dependency structure of the FAS is a key factor in determining its performance, and it is natively captured through the choice of copula.
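To make the copula construction concrete, here is a minimal sketch (our own illustration, not the paper's derivation) of the outage probability of a best-port FAS under a Clayton copula with identical marginals: the system is in outage only if every port falls below the SNR threshold, so the outage probability is the joint CDF at the threshold, which the copula expresses directly.

```python
def clayton_copula(us, theta):
    """N-dimensional Clayton copula C(u_1, ..., u_N), theta > 0."""
    s = sum(u ** (-theta) for u in us) - len(us) + 1
    return s ** (-1.0 / theta)

def fas_outage(num_ports, marginal_cdf_at_threshold, theta):
    """Outage of a best-port-selection FAS (illustrative sketch).

    P_out = P(all port SNRs < threshold) = C(F(snr_th), ..., F(snr_th)),
    with identical marginals F evaluated at the SNR threshold.
    """
    return clayton_copula([marginal_cdf_at_threshold] * num_ports, theta)

# With strong positive dependence (large theta) the ports are nearly
# identical and outage approaches the single-antenna value F(snr_th);
# with weak dependence (theta -> 0) it approaches independence, F(snr_th)**N.
p = 0.3  # marginal CDF at the threshold, e.g. from a Nakagami-m fit
print(fas_outage(4, p, theta=50.0))   # close to 0.3
print(fas_outage(4, p, theta=1e-6))   # close to 0.3**4 = 0.0081
```

This also shows why the dependency structure is decisive: the same marginals yield outage anywhere between $F(\gamma_{th})^N$ and $F(\gamma_{th})$ depending on the copula.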
Ray-Patch: An Efficient Decoder for Light Field Transformers
Authors: T. B. Martins, J. Civera
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this paper we propose the Ray-Patch decoder, a novel model to efficiently query transformers to decode implicit representations into target views. Our Ray-Patch decoding reduces the computational footprint by up to two orders of magnitude compared to previous models, without losing global attention, and hence maintains task-specific metrics. The key idea of our novel decoder is to split the target image into a set of patches, then to query the transformer for each patch to extract a set of feature vectors, which are finally decoded into the target image using convolutional layers. Our experimental results quantify the effectiveness of our method, showing a notable boost in rendering speed with unchanged task-specific metrics across different baselines and datasets.
Deep Fourier Residual method for solving time-harmonic Maxwell's equations
Authors: Jamie M. Taylor, Manuela Bastidas, David Pardo, Ignacio Muga
Abstract
Solving PDEs with machine learning techniques has become a popular alternative to conventional methods. In this context, neural networks (NNs) are among the most commonly used machine learning tools, and in these models, the choice of an appropriate loss function is critical. In general, the main goal is to guarantee that minimizing the loss during training translates to minimizing the error in the solution at the same rate. In this work, we focus on the time-harmonic Maxwell's equations, whose weak formulation takes H(curl) as the space of test functions. We propose a NN in which the loss function is a computable approximation of the dual norm of the weak-form PDE residual. To that end, we employ the Helmholtz decomposition of the space H(curl) and construct an orthonormal basis for this space in two and three spatial dimensions. We use the Discrete Sine/Cosine Transform to accurately and efficiently compute the discrete version of our proposed loss function. Moreover, in the numerical examples we show a high correlation between the proposed loss function and the H(curl)-norm of the error, even in problems with low-regularity solutions.
Accelerating Communications in Federated Applications with Transparent Object Proxies
Authors: J. Gregory Pauloski, Valerie Hayot-Sasson, Logan Ward, Nathaniel Hudson, Charlie Sabino, Matt Baughman, Kyle Chard, Ian Foster
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Advances in networks, accelerators, and cloud services encourage programmers to reconsider where to compute -- such as when fast networks make it cost-effective to compute on remote accelerators despite added latency. Workflow and cloud-hosted serverless computing frameworks can manage multi-step computations spanning federated collections of cloud, high-performance computing (HPC), and edge systems, but passing data among computational steps via cloud storage can incur high costs. Here, we overcome this obstacle with a new programming paradigm that decouples control flow from data flow by extending the pass-by-reference model to distributed applications. We describe ProxyStore, a system that implements this paradigm by providing object proxies that act as wide-area object references with just-in-time resolution. This proxy model enables data producers to communicate data unilaterally, transparently, and efficiently to both local and remote consumers. We demonstrate the benefits of this model with synthetic benchmarks and real-world scientific applications, running across various computing platforms.
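The pass-by-reference idea can be sketched with a minimal lazy proxy. This is an illustration of the concept only, not ProxyStore's actual API: a factory (standing in for a fetch from an object store) is shipped in place of the data, and the first attribute access resolves and caches the target.

```python
class Proxy:
    """Minimal just-in-time-resolving object proxy (conceptual sketch).

    The proxy carries a zero-argument factory instead of the data itself.
    The first attribute access triggers resolution (e.g. a fetch from a
    remote object store) and caches the result for later accesses.
    """
    def __init__(self, factory):
        self._factory = factory
        self._target = None

    def _resolve(self):
        if self._target is None:
            self._target = self._factory()  # resolve on first use
        return self._target

    def __getattr__(self, name):
        # Called only for attributes not found on the proxy itself,
        # so it forwards everything else to the resolved target.
        return getattr(self._resolve(), name)

resolved = []
def factory():
    resolved.append(True)          # record when resolution happens
    return "hello from the store"

p = Proxy(factory)
print(len(resolved))               # 0 -- nothing fetched yet
print(p.upper())                   # resolves on first use
print(len(resolved))               # 1 -- resolved exactly once, then cached
```

A real wide-area proxy additionally serializes the factory so the consumer process, possibly on another machine, performs the resolution.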
Urban-StyleGAN: Learning to Generate and Manipulate Images of Urban Scenes
Authors: George Eskandar, Youssef Farag, Tarun Yenamandra, Daniel Cremers, Karim Guirguis, Bin Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
A promise of Generative Adversarial Networks (GANs) is to provide cheap photorealistic data for training and validating AI models in autonomous driving. Despite their huge success, their performance on complex images featuring multiple objects is understudied. While some frameworks produce high-quality street scenes with little to no control over the image content, others offer more control at the expense of high-quality generation. A common limitation of both approaches is the use of global latent codes for the whole image, which hinders the learning of independent object distributions. Motivated by SemanticStyleGAN (SSG), a recent work on latent space disentanglement in human face generation, we propose a novel framework, Urban-StyleGAN, for urban scene generation and manipulation. We find that a straightforward application of SSG leads to poor results because urban scenes are more complex than human faces. To provide a more compact yet disentangled latent representation, we develop a class grouping strategy wherein individual classes are grouped into super-classes. Moreover, we employ an unsupervised latent exploration algorithm in the $\mathcal{S}$-space of the generator and show that it is more efficient than the conventional $\mathcal{W}^{+}$-space in controlling the image content. Results on the Cityscapes and Mapillary datasets show the proposed approach achieves significantly more controllability and improved image quality than previous approaches on urban scenes and is on par with general-purpose non-controllable generative models (like StyleGAN2) in terms of quality.
Addressing computational challenges in physical system simulations with machine learning
Abstract
In this paper, we present a machine learning-based data generator framework tailored to aid researchers who utilize simulations to examine various physical systems or processes. High computational costs and the resulting limited data often pose significant challenges to gaining insights into these systems or processes. Our approach involves a two-step process: initially, we train a supervised predictive model using a limited simulated dataset to predict simulation outcomes. Subsequently, a reinforcement learning agent is trained to generate accurate, simulation-like data by leveraging the supervised model. With this framework, researchers can generate more accurate data and know the outcomes without running high computational simulations, which enables them to explore the parameter space more efficiently and gain deeper insights into physical systems or processes. We demonstrate the effectiveness of the proposed framework by applying it to two case studies, one focusing on earthquake rupture physics and the other on new material development.
SoundStorm: Efficient Parallel Audio Generation
Authors: Zalán Borsos, Matt Sharifi, Damien Vincent, Eugene Kharitonov, Neil Zeghidour, Marco Tagliasacchi
Abstract
We present SoundStorm, a model for efficient, non-autoregressive audio generation. SoundStorm receives as input the semantic tokens of AudioLM, and relies on bidirectional attention and confidence-based parallel decoding to generate the tokens of a neural audio codec. Compared to the autoregressive generation approach of AudioLM, our model produces audio of the same quality and with higher consistency in voice and acoustic conditions, while being two orders of magnitude faster. SoundStorm generates 30 seconds of audio in 0.5 seconds on a TPU-v4. We demonstrate the ability of our model to scale audio generation to longer sequences by synthesizing high-quality, natural dialogue segments, given a transcript annotated with speaker turns and a short prompt with the speakers' voices.
Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation
Authors: Yuxin Ren, Zihan Zhong, Xingjian Shi, Yi Zhu, Chun Yuan, Mu Li
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
It has been commonly observed that a teacher model with superior performance does not necessarily result in a stronger student, highlighting a discrepancy between current teacher training practices and effective knowledge transfer. In order to enhance the guidance of the teacher training process, we introduce the concept of distillation influence to determine the impact of distillation from each training sample on the student's generalization ability. In this paper, we propose Learning Good Teacher Matters (LGTM), an efficient training technique for incorporating distillation influence into the teacher's learning process. By prioritizing samples that are likely to enhance the student's generalization ability, our LGTM outperforms 10 common knowledge distillation baselines on 6 text classification tasks in the GLUE benchmark.
Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage
Authors: Jose Blanchet, Miao Lu, Tong Zhang, Han Zhong
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
Abstract
We study distributionally robust offline reinforcement learning (robust offline RL), which seeks to find an optimal robust policy purely from an offline dataset that can perform well in perturbed environments. We propose a generic algorithm framework \underline{D}oubly \underline{P}essimistic \underline{M}odel-based \underline{P}olicy \underline{O}ptimization ($\texttt{P}^2\texttt{MPO}$) for robust offline RL, which features a novel combination of a flexible model estimation subroutine and a doubly pessimistic policy optimization step. The \emph{double pessimism} principle is crucial to overcome the distributional shift incurred by i) the mismatch between the behavior policy and the family of target policies; and ii) the perturbation of the nominal model. Under certain accuracy assumptions on the model estimation subroutine, we show that $\texttt{P}^2\texttt{MPO}$ is provably efficient with \emph{robust partial coverage data}, which means that the offline dataset has good coverage of the distributions induced by the optimal robust policy and perturbed models around the nominal model. By tailoring specific model estimation subroutines for concrete examples including the tabular Robust Markov Decision Process (RMDP), the factored RMDP, and RMDPs with kernel and neural function approximations, we show that $\texttt{P}^2\texttt{MPO}$ enjoys a $\tilde{\mathcal{O}}(n^{-1/2})$ convergence rate, where $n$ is the number of trajectories in the offline dataset. Notably, except for the tabular case, these models are identified and proven tractable for the first time in this paper. To the best of our knowledge, we are the first to propose a general learning principle -- double pessimism -- for robust offline RL and to show that it is provably efficient in the context of general function approximations.
Keyword: faster
Scalable and Robust Tensor Ring Decomposition for Large-scale Data
Abstract
Tensor ring (TR) decomposition has recently received increased attention due to its superior expressive performance for high-order tensors. However, the applicability of traditional TR decomposition algorithms to real-world applications is hindered by prevalent large data sizes, missing entries, and corruption with outliers. In this work, we propose a scalable and robust TR decomposition algorithm capable of handling large-scale tensor data with missing entries and gross corruptions. We first develop a novel auto-weighted steepest descent method that can adaptively fill the missing entries and identify the outliers during the decomposition process. Further, taking advantage of the tensor ring model, we develop a novel fast Gram matrix computation (FGMC) approach and a randomized subtensor sketching (RStS) strategy, which yield a significant reduction in storage and computational complexity. Experimental results demonstrate that the proposed method outperforms existing TR decomposition methods in the presence of outliers, and runs significantly faster than existing robust tensor completion algorithms.
Is a Video worth $n\times n$ Images? A Highly Efficient Approach to Transformer-based Video Question Answering
Authors: Chenyang Lyu, Tianbo Ji, Yvette Graham, Jennifer Foster
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
Abstract
Conventional Transformer-based Video Question Answering (VideoQA) approaches generally encode frames independently through one or more image encoders, followed by interaction between frames and question. However, such a schema incurs significant memory use and inevitably slows down training and inference. In this work, we present a highly efficient approach for VideoQA based on existing vision-language pre-trained models, in which we concatenate video frames into an $n\times n$ matrix and then convert it to one image. By doing so, we reduce the use of the image encoder from $n^{2}$ to $1$ while maintaining the temporal structure of the original video. Experimental results on MSRVTT and TrafficQA show that our proposed approach achieves state-of-the-art performance with nearly $4\times$ faster speed and only 30% memory use. We show that by integrating our approach into VideoQA systems we can achieve comparable, even superior, performance with a significant speed-up for training and inference. We believe the proposed approach can facilitate VideoQA-related research by reducing the computational requirements for those who have limited access to budgets and resources. Our code will be made publicly available for research use.
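The core trick, tiling frames into an $n\times n$ grid so a single image encoder sees the whole clip, can be sketched as follows. This is a toy pure-Python version with frames as nested lists of pixel values; the paper's pipeline of course operates on real image tensors.

```python
def frames_to_grid(frames, n):
    """Tile the first n*n frames (each an H x W grid of pixel values)
    into a single (n*H) x (n*W) image, row-major in temporal order."""
    h, w = len(frames[0]), len(frames[0][0])
    grid = [[0] * (n * w) for _ in range(n * h)]
    for idx in range(min(len(frames), n * n)):
        r, c = divmod(idx, n)              # grid cell for this frame
        for y in range(h):
            grid[r * h + y][c * w:(c + 1) * w] = frames[idx][y]
    return grid


# Four 2x3 "frames", each filled with its temporal index t:
frames = [[[t] * 3 for _ in range(2)] for t in range(4)]
image = frames_to_grid(frames, 2)          # one 4x6 image: n^2 encodings -> 1
```

The temporal order survives as spatial order (left to right, top to bottom), which is what lets a single encoder pass still preserve the video's structure.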
Component Training of Turbo Autoencoders
Authors: Jannis Clausius, Marvin Geiselhart, Stephan ten Brink
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG)
Abstract
Isolated training with Gaussian priors (TGP) of the component autoencoders of turbo-autoencoder architectures enables faster, more consistent training and better generalization to arbitrary decoding iterations than training based on deep unfolding. We propose fitting the components via extrinsic information transfer (EXIT) charts to a desired behavior, which enables scaling to larger message lengths ($k \approx 1000$) while retaining competitive performance. To the best of our knowledge, this is the first autoencoder that performs close to classical codes in this regime. Although the binary cross-entropy (BCE) loss function optimizes the bit error rate (BER) of the components, the design via EXIT charts makes it possible to focus on the block error rate (BLER). In serially concatenated systems the component-wise TGP approach is well known for inner components with a fixed outer binary interface, e.g., a learned inner code or equalizer with an outer binary error correcting code. In this paper we extend the component training to structures with an inner and outer autoencoder, where we propose a new 1-bit quantization strategy for the encoder outputs based on the underlying communication problem. Finally, we discuss the model complexity of the learned components during design time (training) and inference, and show that the number of weights in the encoder can be reduced by 99.96%.
HyHTM: Hyperbolic Geometry based Hierarchical Topic Models
Abstract
Hierarchical Topic Models (HTMs) are useful for discovering topic hierarchies in a collection of documents. However, traditional HTMs often produce hierarchies where lower-level topics are unrelated and not specific enough to their higher-level topics. Additionally, these methods can be computationally expensive. We present HyHTM - a hyperbolic geometry based hierarchical topic model - that addresses these limitations by incorporating hierarchical information from hyperbolic geometry to explicitly model hierarchies in topic models. Experimental results with four baselines show that HyHTM can better attend to parent-child relationships among topics. HyHTM produces coherent topic hierarchies that specialise in granularity from generic higher-level topics to specific lower-level topics. Further, our model is significantly faster and leaves a much smaller memory footprint than our best-performing baseline. We have made the source code for our algorithm publicly accessible.
Chatting with GPT-3 for Zero-Shot Human-Like Mobile Automated GUI Testing
Authors: Zhe Liu, Chunyang Chen, Junjie Wang, Mengzhuo Chen, Boyu Wu, Xing Che, Dandan Wang, Qing Wang
Abstract
Mobile apps are indispensable for people's daily life, and automated GUI (Graphical User Interface) testing is widely used for app quality assurance. There is growing interest in using learning-based techniques for automated GUI testing, which aim at generating human-like actions and interactions. However, limitations such as low testing coverage, weak generalization, and heavy reliance on training data create an urgent need for a more effective approach to generate human-like actions that thoroughly test mobile apps. Inspired by the success of Large Language Models (LLMs), e.g., GPT-3 and ChatGPT, in natural language understanding and question answering, we formulate the mobile GUI testing problem as a Q&A task. We propose GPTDroid, which asks the LLM to chat with mobile apps by passing GUI page information to the LLM to elicit testing scripts, executing them, and continually feeding the app's responses back to the LLM, iterating the whole process. Within it, we extract the static context of the GUI page and the dynamic context of the iterative testing process, design prompts for inputting this information to the LLM, and develop a neural matching network to decode the LLM's output into actionable steps to execute in the app. We evaluate GPTDroid on 86 apps from Google Play; its activity coverage is 71%, 32% higher than the best baseline, and it detects 36% more bugs faster than the best baseline. GPTDroid also detects 48 new bugs on Google Play, 25 of which have been confirmed or fixed. We further summarize the capabilities behind GPTDroid's superior performance, including semantic text input, compound actions, long meaningful test traces, and test case prioritization.
Experiences in Building a Composable and Functional API for Runtime SPIR-V Code Generation
Authors: Juan Fumero, György Rethy, Athanasios Stratikopoulos, Nikos Foutris, Christos Kotselidis
Subjects: Software Engineering (cs.SE); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
This paper presents the Beehive SPIR-V Toolkit, a framework that can automatically generate a composable and functional Java library for dynamically building SPIR-V binary modules. The Beehive SPIR-V Toolkit can be used by optimizing compilers and runtime systems to generate and validate SPIR-V binary modules from managed runtime systems, such as the Java Virtual Machine (JVM). Furthermore, our framework is architected to accommodate new SPIR-V releases in an easy-to-maintain manner, and it facilitates the automatic generation of Java libraries for other standards besides SPIR-V. The Beehive SPIR-V Toolkit also includes an assembler that emits SPIR-V binary modules from disassembled SPIR-V text files, a disassembler that converts SPIR-V binary code into a text file, and a console client application. To the best of our knowledge, the Beehive SPIR-V Toolkit is the first Java programming framework that can dynamically generate SPIR-V binary modules. To demonstrate the use of our framework, we showcase the integration of the Beehive SPIR-V Toolkit in the context of TornadoVM, a Java framework for automatically offloading and running Java programs on heterogeneous hardware. We show that, via the Beehive SPIR-V Toolkit, TornadoVM is able to compile code 3x faster than its existing OpenCL C JIT compiler, and it performs up to 1.52x faster than the existing OpenCL C backend in TornadoVM.
AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation
Abstract
Diffusion models have gained significant attention in the realm of image generation due to their exceptional performance. Their success has recently been extended to text generation by generating all tokens within a sequence concurrently. However, natural language exhibits a far more pronounced sequential dependency than images, and most existing language models are trained with a left-to-right auto-regressive approach. To account for the inherent sequential characteristic of natural language, we introduce Auto-Regressive Diffusion (AR-Diffusion). AR-Diffusion ensures that the generation of tokens on the right depends on the generated ones on the left, a mechanism achieved by employing a dynamic number of denoising steps that varies with token position. As a result, tokens on the left undergo fewer denoising steps than those on the right, enabling them to be generated earlier and subsequently influence the generation of tokens on the right. In a series of experiments on various text generation tasks, including text summarization, machine translation, and common sense generation, AR-Diffusion clearly demonstrates superiority over existing diffusion language models and can be $100\times$ to $600\times$ faster while achieving comparable results. Our code will be publicly released.
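The position-dependent schedule can be illustrated with a small helper that assigns fewer denoising steps to left tokens and more to right ones. The linear ramp below is only a plausible stand-in for whatever schedule the paper actually uses.

```python
def denoising_steps(seq_len, min_steps, max_steps):
    """Number of denoising steps per token position: left tokens get
    fewer steps (so they finish earlier), right tokens get more, letting
    generation on the right condition on already-denoised left tokens."""
    if seq_len == 1:
        return [min_steps]
    span = max_steps - min_steps
    return [min_steps + round(span * i / (seq_len - 1))
            for i in range(seq_len)]


print(denoising_steps(5, 2, 10))  # prints [2, 4, 6, 8, 10]
```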
An Exploratory Study on the Evidence of Hackathons' Role in Solving OSS Newcomers' Challenges
Authors: Ahmed Samir Imam Mahmoud, Alexander Nolte, Dietmar Pfahl
Abstract
Background: OSS projects face various challenges. One major challenge is to onboard and integrate newcomers into the project. Aim: We aim to understand and discuss the challenges newcomers face when joining an OSS project and to present evidence on how hackathons can mitigate those challenges. Method: We conducted two searches on digital libraries to (1) explore challenges faced by newcomers joining OSS projects, and (2) collect evidence on how hackathons were used to address them. We defined evidence categories (positive, inconclusive, and no evidence) to classify how hackathons address challenges. In addition, we investigated whether a hackathon event was related to an OSS project or not. Result: We identified a range of newcomer challenges that were successfully addressed using hackathons. However, not all of the solutions we identified were applied in the context of OSS. Conclusion: There seems to be potential in using hackathons to overcome newcomers' challenges in OSS projects and allow them to integrate faster into the project.
Robust and lightweight audio fingerprint for Automatic Content Recognition
Abstract
This research paper presents a novel audio fingerprinting system for Automatic Content Recognition (ACR). By using signal processing techniques and statistical transformations, our proposed method generates compact fingerprints of audio segments that are robust to noise degradations present in real-world audio. The system is designed to be highly scalable, with the ability to identify thousands of hours of content using fingerprints generated from millions of TVs. The fingerprint's high temporal correlation and utilization of existing GPU-compatible Approximate Nearest Neighbour (ANN) search algorithms make this possible. Furthermore, the fingerprint generation can run on low-power devices with limited compute, making it accessible to a wide range of applications. Experimental results show improvements in our proposed system compared to a min-hash based audio fingerprint on all evaluated metrics, including accuracy on proprietary ACR datasets, retrieval speed, memory usage, and robustness to various noises. For similar retrieval accuracy, our system is 30x faster and uses 6x fewer fingerprints than the min-hash method.
SoundStorm: Efficient Parallel Audio Generation
Authors: Zalán Borsos, Matt Sharifi, Damien Vincent, Eugene Kharitonov, Neil Zeghidour, Marco Tagliasacchi
Abstract
We present SoundStorm, a model for efficient, non-autoregressive audio generation. SoundStorm receives as input the semantic tokens of AudioLM, and relies on bidirectional attention and confidence-based parallel decoding to generate the tokens of a neural audio codec. Compared to the autoregressive generation approach of AudioLM, our model produces audio of the same quality and with higher consistency in voice and acoustic conditions, while being two orders of magnitude faster. SoundStorm generates 30 seconds of audio in 0.5 seconds on a TPU-v4. We demonstrate the ability of our model to scale audio generation to longer sequences by synthesizing high-quality, natural dialogue segments, given a transcript annotated with speaker turns and a short prompt with the speakers' voices.
Keyword: mobile
Exploration of unknown indoor regions by a swarm of energy-constrained drones
Authors: Ori Rappel, Joseph Z. Ben-Asher, Alfred M. Bruckstein
Abstract
Several distributed algorithms are presented for the exploration of unknown indoor regions by a swarm of flying, energy-constrained agents. The agents, which are identical, autonomous, anonymous, and oblivious, uniformly cover the region and thus explore it using predefined action rules based on locally sensed information and the agents' energy levels. While flying drones have many advantages in search and rescue scenarios, their main drawback is high power consumption during flight combined with limited on-board energy. Furthermore, in these scenarios agent size is severely limited, and consequently so are the total weight and capabilities of the agents. The region is modeled as a connected subset of a regular grid composed of square cells that the agents enter, over time, via entry points. Some of the agents may settle in unoccupied cells as the exploration progresses. Settled agents conserve energy and become virtual pheromones for the exploration and coverage process, beacons that subsequently aid the remaining, still-exploring, mobile agents. The termination of the coverage process is based on a backward-propagating information diffusion scheme. Various algorithmic alternatives are discussed, and upper bounds are derived and compared to experimental results. Finally, an optimal entry rate that minimizes the total energy consumption is derived for the case of a linear region.
Trust-Worthy Semantic Communications for the Metaverse Relying on Federated Learning
Abstract
As an evolving successor to the mobile Internet, the Metaverse creates the impression of an immersive environment, integrating the virtual as well as the real world. In contrast to the traditional mobile Internet based on servers, the Metaverse is constructed by billions of cooperating users by harnessing their smart edge devices having limited communication and computation resources. In this immersive environment an unprecedented amount of multi-modal data has to be processed. To circumvent this impending bottleneck, low-rate semantic communication might be harnessed in support of the Metaverse. But given that private multi-modal data is exchanged in the Metaverse, we have to guard against security breaches and privacy invasions. Hence we conceive a trust-worthy semantic communication system for the Metaverse based on a federated learning architecture by exploiting its distributed decision-making and privacy-preserving capability. We conclude by identifying a suite of promising research directions and open issues.
Age of Incorrect Information in Semantic Communications for NOMA Aided XR Applications
Authors: Jianrui Chen, Jingjing Wang, Chunxiao Jiang, Jiaxing Wang
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
As an evolving successor to the mobile Internet, extended reality (XR) devices can generate a fully digital immersive environment similar to the real world, integrating virtual and real-world elements. However, in addition to the difficulties encountered in traditional communications, a range of new challenges emerges, such as ultra-massive access, real-time synchronization, and an unprecedented amount of multi-modal data transmission and processing. To address these challenges, semantic communications might be harnessed in support of XR applications, but the field lacks a practical and effective performance metric. To broaden a new path for evaluating semantic communications, in this paper we construct a multi-user uplink non-orthogonal multiple access (NOMA) system and analyze its transmission performance by harnessing a novel metric called age of incorrect information (AoII). First, we derive the average semantic similarity of all the users based on DeepSC and obtain closed-form expressions for the packets' age of information (AoI) relying on queueing theory. Besides, we formulate a non-convex optimization problem for the proposed AoII, which combines both error- and AoI-based performance under constraints on semantic rate, transmit power, and status update rate. Finally, to solve the problem, we apply an exact linear-search-based algorithm to find the optimal policy. Simulation results show that the AoII metric can beneficially evaluate both the error- and AoI-based transmission performance simultaneously.
Chatting with GPT-3 for Zero-Shot Human-Like Mobile Automated GUI Testing
Authors: Zhe Liu, Chunyang Chen, Junjie Wang, Mengzhuo Chen, Boyu Wu, Xing Che, Dandan Wang, Qing Wang
Abstract
Mobile apps are indispensable for people's daily life, and automated GUI (Graphical User Interface) testing is widely used for app quality assurance. There is growing interest in using learning-based techniques for automated GUI testing, which aim at generating human-like actions and interactions. However, limitations such as low testing coverage, weak generalization, and heavy reliance on training data create an urgent need for a more effective approach to generate human-like actions that thoroughly test mobile apps. Inspired by the success of Large Language Models (LLMs), e.g., GPT-3 and ChatGPT, in natural language understanding and question answering, we formulate the mobile GUI testing problem as a Q&A task. We propose GPTDroid, which asks the LLM to chat with mobile apps by passing GUI page information to the LLM to elicit testing scripts, executing them, and continually feeding the app's responses back to the LLM, iterating the whole process. Within it, we extract the static context of the GUI page and the dynamic context of the iterative testing process, design prompts for inputting this information to the LLM, and develop a neural matching network to decode the LLM's output into actionable steps to execute in the app. We evaluate GPTDroid on 86 apps from Google Play; its activity coverage is 71%, 32% higher than the best baseline, and it detects 36% more bugs faster than the best baseline. GPTDroid also detects 48 new bugs on Google Play, 25 of which have been confirmed or fixed. We further summarize the capabilities behind GPTDroid's superior performance, including semantic text input, compound actions, long meaningful test traces, and test case prioritization.
A sequential transit network design algorithm with optimal learning under correlated beliefs
Authors: Gyugeun Yoon, Joseph Y. J. Chow
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract
Mobility service route design requires potential demand information to accommodate travel demand within the service region. Transit planners and operators can access various data sources, including household travel survey data and mobile device location logs. However, when implementing a mobility system with emerging technologies, estimating demand becomes harder because of greater uncertainty in user behavior. Therefore, this study proposes an artificial intelligence-driven algorithm that combines sequential transit network design with optimal learning. An operator gradually expands its route system to avoid risks from inconsistency between designed routes and actual travel demand. At the same time, observed information is archived to update the knowledge that the operator currently uses. Three learning policies are compared within the algorithm: multi-armed bandit, knowledge gradient, and knowledge gradient with correlated beliefs. For validation, a new route system is designed on an artificial network based on public use microdata areas in New York City. Prior knowledge is reproduced from the regional household travel survey data. The results suggest that exploration considering correlations can, in general, achieve better performance than greedy choices. In future work, the problem may incorporate more complexities such as demand elasticity to travel time, no limit on the number of transfers, and costs for expansion.
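Of the three policies compared, the multi-armed bandit is the simplest; an epsilon-greedy variant can be sketched as below. This is illustrative only: the study's knowledge-gradient policies additionally model the value of information and, in the correlated case, beliefs shared across routes.

```python
import random

def choose_route(rewards_history, n_routes, epsilon=0.1):
    """One epsilon-greedy multi-armed-bandit step: usually pick the route
    with the best observed mean demand, occasionally explore at random.
    Unvisited routes get +inf so each is tried at least once."""
    means = []
    for route in range(n_routes):
        obs = rewards_history.get(route, [])
        means.append(sum(obs) / len(obs) if obs else float("inf"))
    if random.random() < epsilon:
        return random.randrange(n_routes)                 # explore
    return max(range(n_routes), key=lambda r: means[r])   # exploit
```

The key difference from a purely greedy choice is the exploration term, which is what the study's results suggest pays off, especially when exploration exploits correlations between candidate routes.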
Your Identity is Your Behavior -- Continuous User Authentication based on Machine Learning and Touch Dynamics
Authors: Brendan Pelto, Mounika Vanamala, Rushit Dave
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Abstract
The aim of this research is to investigate the use of continuous authentication with mobile touch dynamics, using three different algorithms: Neural Network, Extreme Gradient Boosting, and Support Vector Machine. Mobile devices are constantly increasing in popularity; today, smartphone subscriptions have surpassed 6 billion worldwide. Mobile touch dynamics refer to the distinct patterns of how a user interacts with their mobile device; these include factors such as touch pressure, swipe speed, and touch duration. Continuous authentication refers to the process of continuously verifying a user's identity while they are using a device, rather than just at the initial login. This research used a dataset of touch dynamics collected from 40 subjects using the LG V30+. The participants played four mobile games, PUBG, Diep.io, Slither, and Minecraft, for 10 minutes each. The three algorithms were trained and tested on the extracted dataset, and their performance was evaluated based on metrics such as accuracy, precision, false negative rate, and false positive rate. The results showed that all three algorithms were able to effectively classify users based on their individual touch dynamics, with accuracy ranging from 80% to 95%. The Neural Network performed best, achieving the highest accuracy and precision scores, followed closely by XGBoost and SVM. The data show that continuous authentication using mobile touch dynamics has the potential to be a useful method for enhancing security and reducing the risk of unauthorized access to personal devices. This research also notes the importance of choosing the correct algorithm for a given dataset and use case, as different algorithms may have varying levels of performance depending on the specific task.
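As a toy illustration of the pipeline (touch features in, user identity out), a nearest-centroid classifier over hypothetical feature vectors can stand in for the paper's three far stronger models:

```python
import math

def train_centroids(samples):
    """samples: list of (user, feature_vector) pairs, where the features
    might be touch pressure, swipe speed, touch duration, etc.
    Returns each user's mean feature vector."""
    per_user = {}
    for user, feats in samples:
        per_user.setdefault(user, []).append(feats)
    return {u: [sum(col) / len(col) for col in zip(*rows)]
            for u, rows in per_user.items()}

def authenticate(centroids, feats):
    """Attribute a new touch sample to the closest user centroid."""
    return min(centroids, key=lambda u: math.dist(centroids[u], feats))


model = train_centroids([("alice", [0.9, 1.0]), ("alice", [1.1, 1.0]),
                         ("bob", [3.0, 4.0])])
print(authenticate(model, [1.0, 1.05]))  # prints alice
```

In the continuous-authentication setting this check runs on every interaction window rather than once at login, flagging a session whenever the incoming features drift too far from the enrolled user's profile.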
Curious Rhythms: Temporal Regularities of Wikipedia Consumption
Authors: Tiziano Piccardi, Martin Gerlach, Robert West
Subjects: Computers and Society (cs.CY); Digital Libraries (cs.DL)
Abstract
Wikipedia, in its role as the world's largest encyclopedia, serves a broad range of information needs. Although previous studies have noted that Wikipedia users' information needs vary throughout the day, there is to date no large-scale, quantitative study of the underlying dynamics. The present paper fills this gap by investigating temporal regularities in daily consumption patterns in a large-scale analysis of billions of timezone-corrected page requests mined from English Wikipedia's server logs, with the goal of investigating how context and time relate to the kind of information consumed. First, we show that even after removing the global pattern of day-night alternation, the consumption habits of individual articles maintain strong diurnal regularities. Then, we characterize the prototypical shapes of consumption patterns, finding a particularly strong distinction between articles preferred during the evening/night and articles preferred during working hours. Finally, we investigate topical and contextual correlates of Wikipedia articles' access rhythms, finding that article topic, reader country, and access device (mobile vs. desktop) are all important predictors of daily attention patterns. These findings shed new light on how humans seek information on the Web by focusing on Wikipedia as one of the largest open platforms for knowledge and learning, emphasizing Wikipedia's role as a rich knowledge base that fulfills information needs spread throughout the day, with implications for understanding information seeking across the globe and for designing appropriate information systems.
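The deseasonalization step the study describes, removing the global day-night alternation before examining per-article shapes, amounts to something like the following. This is a hedged sketch over 24-hour count vectors; the study's actual pipeline over billions of timezone-corrected requests is considerably more involved.

```python
def residual_profile(article_hourly, global_hourly):
    """Divide an article's hourly pageview counts by the global hourly
    pattern, then renormalize to a probability profile, leaving only the
    article-specific diurnal shape."""
    ratio = [a / g for a, g in zip(article_hourly, global_hourly)]
    total = sum(ratio)
    return [r / total for r in ratio]
```

After this correction, an article whose residual profile still peaks in the evening (or during working hours) exhibits the kind of strong diurnal regularity the paper reports.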
InstaLoc: One-shot Global Lidar Localisation in Indoor Environments through Instance Learning
Authors: Lintong Zhang, Tejaswi Digumarti, Georgi Tinchev, Maurice Fallon
Abstract
Localization in prior maps is crucial for autonomous robots. This paper offers a solution for indoor environments called InstaLoc, which localizes an individual lidar scan within a prior map. We draw inspiration from how humans navigate and position themselves by recognizing the layout of distinctive objects and structures. Mimicking this approach, InstaLoc identifies and matches object instances in the scene with those from a prior map. To the best of our knowledge, this is the first method to use panoptic segmentation inferring directly on 3D lidar scans for indoor localization. InstaLoc operates through two networks based on spatially sparse tensors that infer directly on dense 3D lidar point clouds. The first is a panoptic segmentation network that produces object instances and their semantic classes. The second, smaller network produces a descriptor for each object instance. A consensus-based matching algorithm then matches the instances to the prior map and estimates a six-degrees-of-freedom (DoF) pose for the input cloud. Notably, both networks are efficient: they require only one to two hours of training on a mobile GPU and run in real time at 1 Hz. Our method achieves two to four times more detections when localizing than baseline methods, with higher precision on these detections.
Keyword: pruning
Image Matching by Bare Homography
Authors: Fabio Bellavia
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This paper presents Slime, a novel non-deep image matching framework that models the scene as rough, locally overlapping planes. This intermediate representation sits between the local affine approximation of keypoint patches and global matching based on both geometric and similarity constraints, providing a progressive pruning of correspondences, since planes are easier to handle than general scenes. Slime proceeds by selectively detecting, expanding, merging, and refining the matches associated with almost-planar areas of the scene by exploiting homography constraints. As a result, both the coverage and the stability of correct matches over the scene are amplified, allowing traditional hybrid matching pipelines to make up lost ground against recent end-to-end deep matching methods. In addition, the paper gives a thorough comparative analysis of the recent state of the art in image matching, represented by end-to-end deep networks and hybrid pipelines. The evaluation considers both planar and non-planar scenes, taking into account critical and challenging scenarios including abrupt temporal image changes and strong variations in relative image rotation. This analysis shows that, despite the impressive progress made in this field, there is still wide room for improvement to be investigated in future research.
Influential Billboard Slot Selection using Spatial Clustering and Pruned Submodularity Graph
Abstract
Billboard advertising is a popular out-of-home advertising technique adopted by commercial houses. Companies own billboards and offer them to commercial houses on a payment basis. Given a database of billboards with slot information, we want to determine which k slots to choose to maximize influence. We call this the INFLUENTIAL BILLBOARD SLOT SELECTION (IBSS) problem and pose it as a combinatorial optimization problem. We show that the influence function considered in this paper is non-negative, monotone, and submodular. The incremental greedy approach based on marginal gain computation leads to a constant-factor approximation guarantee. However, this method scales very poorly when the problem instance is very large. To address this, we propose a spatial partitioning and pruned submodularity graph-based approach divided into three steps: preprocessing, pruning, and selection. We analyze the proposed approaches to understand their time and space requirements and their performance guarantees. We conduct an extensive set of experiments with real-world datasets and compare the performance of the proposed approaches with available baseline methods. We observe that the proposed approaches lead to more influence than all the baselines within reasonable computational time.
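The incremental greedy approach mentioned in the abstract is the classic algorithm for monotone submodular maximization. A minimal sketch with a toy coverage-style influence function (the function and data are illustrative, not the paper's):

```python
def greedy_select(slots, influence, k):
    """Repeatedly add the slot with the largest marginal gain.
    For a non-negative, monotone, submodular `influence` (mapping a
    frozenset of slots to a score), this gives the classic
    (1 - 1/e) approximation guarantee."""
    chosen = set()
    for _ in range(k):
        base = influence(frozenset(chosen))
        best, best_gain = None, -1.0
        for s in slots:
            if s in chosen:
                continue
            gain = influence(frozenset(chosen | {s})) - base
            if gain > best_gain:
                best, best_gain = s, gain
        chosen.add(best)
    return chosen

# Toy influence: number of distinct people reached by the chosen slots.
reach = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}}
cover = lambda S: len(set().union(*(reach[s] for s in S))) if S else 0
picked = greedy_select(reach, cover, k=2)
```

The paper's pruned submodularity graph accelerates exactly this loop by discarding slots whose marginal gain can be bounded away from the best.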
Adaptive Federated Pruning in Hierarchical Wireless Networks
Abstract
Federated Learning (FL) is a promising privacy-preserving distributed learning framework where a server aggregates models updated by multiple devices without accessing their private datasets. Hierarchical FL (HFL), as a device-edge-cloud aggregation hierarchy, can enjoy both the cloud server's access to more datasets and the edge servers' efficient communications with devices. However, the learning latency increases with the HFL network scale due to the increasing number of edge servers and devices with limited local computation capability and communication bandwidth. To address this issue, in this paper, we introduce model pruning for HFL in wireless networks to reduce the neural network scale. We present the convergence analysis of an upper bound on the l2 norm of gradients for HFL with model pruning, analyze the computation and communication latency of the proposed pruning scheme, and formulate an optimization problem to maximize the convergence rate under a given latency threshold by jointly optimizing the pruning ratio and wireless resource allocation. By decoupling the optimization problem and using Karush-Kuhn-Tucker (KKT) conditions, we derive closed-form solutions for the pruning ratio and wireless resource allocation. Simulation results show that our proposed HFL with model pruning achieves learning accuracy similar to HFL without pruning while reducing communication cost by about 50 percent.
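The paper's contribution is choosing the pruning ratio via KKT conditions; applying a given ratio is the simpler mechanical step. A minimal magnitude-pruning sketch (illustrative; the paper does not specify this particular criterion):

```python
import numpy as np

def prune_by_ratio(weights, ratio):
    """Zero out the `ratio` fraction of smallest-magnitude weights.
    This sketch only applies a given ratio; selecting the ratio per
    layer (e.g. via KKT conditions, as in the paper) is the hard part."""
    flat = np.abs(weights).ravel()
    k = int(ratio * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude serves as the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

w = np.array([[0.5, -0.1], [0.05, -2.0]])
pruned = prune_by_ratio(w, 0.5)  # half of the weights are zeroed
```

A smaller network shrinks both the per-round model upload and the local computation, which is exactly the latency trade-off the optimization problem balances.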
Keyword: voxel
There is no result
Keyword: lidar
Correlation Pyramid Network for 3D Single Object Tracking
Abstract
3D LiDAR-based single object tracking (SOT) has gained increasing attention as it plays a crucial role in 3D applications such as autonomous driving. The central problem is how to learn a target-aware representation from the sparse and incomplete point clouds. In this paper, we propose a novel Correlation Pyramid Network (CorpNet) with a unified encoder and a motion-factorized decoder. Specifically, the encoder introduces multi-level self attentions and cross attentions in its main branch to enrich the template and search region features and realize their fusion and interaction, respectively. Additionally, considering the sparsity characteristics of the point clouds, we design a lateral correlation pyramid structure for the encoder to keep as many points as possible by integrating hierarchical correlated features. The output features of the search region from the encoder can be directly fed into the decoder for predicting target locations without any extra matcher. Moreover, in the decoder of CorpNet, we design a motion-factorized head to explicitly learn the different movement patterns of the up axis and the x-y plane together. Extensive experiments on two commonly-used datasets show our CorpNet achieves state-of-the-art results while running in real-time.
Graph-based Global Robot Simultaneous Localization and Mapping using Architectural Plans
Authors: Muhammad Shaheer, Jose Andres Millan-Romera, Hriday Bavle, Jose Luis Sanchez-Lopez, Javier Civera, Holger Voos
Abstract
In this paper, we propose a solution for graph-based global robot simultaneous localization and mapping (SLAM) using architectural plans. Before the start of the robot operation, the previously available architectural plan of the building is converted into our proposed architectural graph (A-Graph). When the robot starts its operation, it uses its onboard LIDAR and odometry to carry out online SLAM relying on our situational graph (S-Graph), which includes both a representation of the environment at multiple levels of abstraction, such as walls or rooms, and their relationships, as well as the robot poses with their associated keyframes. Our novel graph-to-graph matching method relates the aforementioned S-Graph and A-Graph, which are aligned and merged, resulting in our novel informed Situational Graph (iS-Graph). The iS-Graph not only provides graph-based global robot localization, but also extends the graph-based SLAM capabilities of the S-Graph by incorporating the prior knowledge of the environment contained in the architectural plan.
InstaLoc: One-shot Global Lidar Localisation in Indoor Environments through Instance Learning
Authors: Lintong Zhang, Tejaswi Digumarti, Georgi Tinchev, Maurice Fallon
Abstract
Localization in prior maps is crucial for autonomous robots. This paper offers a solution for indoor environments called InstaLoc, which localizes an individual lidar scan within a prior map. We draw inspiration from how humans navigate and position themselves by recognizing the layout of distinctive objects and structures. Mimicking this approach, InstaLoc identifies and matches object instances in the scene with those from a prior map. To the best of our knowledge, this is the first method to use panoptic segmentation inferring directly on 3D lidar scans for indoor localization. InstaLoc operates through two networks based on spatially sparse tensors that infer directly on dense 3D lidar point clouds. The first is a panoptic segmentation network that produces object instances and their semantic classes. The second, smaller network produces a descriptor for each object instance. A consensus-based matching algorithm then matches the instances to the prior map and estimates a six-degrees-of-freedom (DoF) pose for the input cloud. Notably, both networks are efficient: they require only one to two hours of training on a mobile GPU and run in real time at 1 Hz. Our method achieves two to four times more detections when localizing than baseline methods, with higher precision on these detections.
Keyword: diffusion
Spiking Network Initialisation and Firing Rate Collapse
Authors: Nicolas Perez-Nieves, Dan F.M Goodman
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
Abstract
In recent years, newly developed methods to train spiking neural networks (SNNs) have rendered them a plausible alternative to Artificial Neural Networks (ANNs) in terms of accuracy, while at the same time being much more energy efficient at inference and potentially at training time. However, it is still unclear what constitutes a good initialisation for an SNN. We often use initialisation schemes developed for ANN training, which are often inadequate and require manual tuning. In this paper, we attempt to tackle this issue by using techniques from the ANN initialisation literature as well as computational neuroscience results. We show that the problem of weight initialisation for SNNs is a more nuanced problem than it is for ANNs due to the spike-and-reset non-linearity of SNNs and the firing rate collapse problem. We firstly identify and propose several solutions to the firing rate collapse problem under different sets of assumptions which successfully solve the issue by leveraging classical random walk and Wiener process results. Secondly, we devise a general strategy for SNN initialisation which combines variance propagation techniques from ANNs with different methods to obtain the expected firing rate and membrane potential distribution based on diffusion and shot-noise approximations. Altogether, we obtain theoretical results for SNN initialisation that take into account the membrane potential distribution in the presence of a threshold. Yet, to what extent these methods can be successfully applied to SNNs on real datasets remains an open question.
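The dependence of an SNN's firing rate on its input statistics, which makes initialisation harder than for ANNs, can be illustrated with a minimal leaky integrate-and-fire (LIF) simulation under Gaussian input (the diffusion approximation). All parameter values here are illustrative; the paper derives the rate analytically rather than by simulation:

```python
import numpy as np

def lif_firing_rate(mu, sigma, threshold=1.0, tau=0.02, dt=1e-3, T=5.0, seed=0):
    """Euler simulation of a LIF neuron driven by Gaussian input
    current with mean `mu` and noise scale `sigma`; returns the
    empirical firing rate in Hz."""
    rng = np.random.default_rng(seed)
    v, spikes = 0.0, 0
    for _ in range(int(T / dt)):
        noise = sigma * np.sqrt(dt) * rng.standard_normal()
        v += dt * (-v / tau + mu) + noise
        if v >= threshold:
            spikes += 1
            v = 0.0  # spike-and-reset non-linearity
    return spikes / T

rate_low = lif_firing_rate(mu=20.0, sigma=0.5)   # subthreshold drive
rate_high = lif_firing_rate(mu=80.0, sigma=0.5)  # suprathreshold drive
```

Weight initialisations that leave the mean drive far below threshold give near-zero rates (the firing rate collapse), which is why variance propagation alone, as in ANN schemes, is not sufficient.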
Common Diffusion Noise Schedules and Sample Steps are Flawed
Authors: Shanchuan Lin, Bingchen Liu, Jiashi Li, Xiao Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
We discover that common diffusion noise schedules do not enforce the last timestep to have zero signal-to-noise ratio (SNR), and some implementations of diffusion samplers do not start from the last timestep. Such designs are flawed and do not reflect the fact that the model is given pure Gaussian noise at inference, creating a discrepancy between training and inference. We show that the flawed design causes real problems in existing implementations. In Stable Diffusion, it severely limits the model to only generate images with medium brightness and prevents it from generating very bright and dark samples. We propose a few simple fixes: (1) rescale the noise schedule to enforce zero terminal SNR; (2) train the model with v prediction; (3) change the sampler to always start from the last timestep; (4) rescale classifier-free guidance to prevent over-exposure. These simple changes ensure the diffusion process is congruent between training and inference and allow the model to generate samples more faithful to the original data distribution.
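Fix (1), rescaling the schedule to zero terminal SNR, amounts to a shift-and-scale of the square roots of the cumulative alphas so the last value becomes exactly zero while the first is preserved. A sketch under those assumptions (the linear beta schedule below is a common default, not necessarily the one the paper evaluates):

```python
import numpy as np

def rescale_zero_terminal_snr(betas):
    """Rescale a diffusion noise schedule so the final timestep has
    zero SNR, i.e. the model sees pure Gaussian noise at t = T."""
    alphas_cumprod = np.cumprod(1.0 - betas)
    sqrt_ac = np.sqrt(alphas_cumprod)
    s0, sT = sqrt_ac[0], sqrt_ac[-1]
    sqrt_ac = sqrt_ac - sT               # shift so the last value is 0
    sqrt_ac = sqrt_ac * s0 / (s0 - sT)   # rescale so the first is unchanged
    alphas_cumprod = sqrt_ac ** 2
    alphas = alphas_cumprod / np.concatenate([[1.0], alphas_cumprod[:-1]])
    return 1.0 - alphas

betas = np.linspace(1e-4, 0.02, 1000)    # common linear schedule
new_betas = rescale_zero_terminal_snr(betas)
```

Note that zero terminal SNR makes epsilon prediction degenerate at the last step, which is why the abstract pairs this fix with v prediction.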
Exploration of unknown indoor regions by a swarm of energy-constrained drones
Authors: Ori Rappel, Joseph Z. Ben-Asher, Alfred M. Bruckstein
Abstract
Several distributed algorithms are presented for the exploration of unknown indoor regions by a swarm of flying, energy-constrained agents. The agents, which are identical, autonomous, anonymous and oblivious, uniformly cover the region and thus explore it using predefined action rules based on locally sensed information and the energy level of the agents. While flying drones have many advantages in search and rescue scenarios, their main drawback is high power consumption during flight combined with limited, on-board energy. Furthermore, in these scenarios agent size is severely limited, and consequently so are the total weight and capabilities of the agents. The region is modeled as a connected subset of a regular grid composed of square cells that the agents enter, over time, via entry points. Some of the agents may settle in unoccupied cells as the exploration progresses. Settled agents conserve energy and become virtual pheromones for the exploration and coverage process, beacons that subsequently aid the remaining, still-exploring mobile agents. The termination of the coverage process is based on a backward-propagating information diffusion scheme. Various algorithmic alternatives are discussed, and upper bounds are derived and compared to experimental results. Finally, an optimal entry rate that minimizes the total energy consumption is derived for the case of linear regions.
Denoising Diffusion Models for Plug-and-Play Image Restoration
Authors: Yuanzhi Zhu, Kai Zhang, Jingyun Liang, Jiezhang Cao, Bihan Wen, Radu Timofte, Luc Van Gool
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Plug-and-play Image Restoration (IR) has been widely recognized as a flexible and interpretable method for solving various inverse problems by utilizing any off-the-shelf denoiser as the implicit image prior. However, most existing methods focus on discriminative Gaussian denoisers. Although diffusion models have shown impressive performance for high-quality image synthesis, their potential to serve as a generative denoiser prior for plug-and-play IR methods remains to be further explored. While several other attempts have been made to adopt diffusion models for image restoration, they either fail to achieve satisfactory results or typically require an unacceptable number of Neural Function Evaluations (NFEs) during inference. This paper proposes DiffPIR, which integrates the traditional plug-and-play method into the diffusion sampling framework. Compared to plug-and-play IR methods that rely on discriminative Gaussian denoisers, DiffPIR is expected to inherit the generative ability of diffusion models. Experimental results on three representative IR tasks, including super-resolution, image deblurring, and inpainting, demonstrate that DiffPIR achieves state-of-the-art performance on both the FFHQ and ImageNet datasets in terms of reconstruction faithfulness and perceptual quality with no more than 100 NFEs. The source code is available at {\url{https://github.com/yuanzhi-zhu/DiffPIR}}
A deep learning method for multi-material diffusion problems based on physics-informed neural networks
Abstract
Given the extensiveness of multi-material diffusion problems and the inability of the standard PINN (Physics-Informed Neural Network) method to handle them, in this paper we present a novel PINN method that can accurately solve the multi-material diffusion equation. The new method applies continuity conditions at the material interface derived from properties of the diffusion equation, and combines a distinctive spatial separation strategy with a loss-term normalization strategy. Together, these address three difficulties: residual points cannot be arranged at the material interface; a single neural network struggles to express non-smooth functions; and loss terms of different magnitudes are hard to optimize jointly. The result is a usable prediction function for a class of multi-material diffusion problems. Numerical experiments verify the robustness and effectiveness of the new method.
CDDM: Channel Denoising Diffusion Models for Wireless Communications
Abstract
Diffusion models (DMs) gradually learn to remove noise and have been widely used in artificial-intelligence-generated content (AIGC) in recent years. This denoising property leads us to ask whether DMs can be applied to wireless communications to help the receiver eliminate channel noise. To address this, we propose channel denoising diffusion models (CDDM) for wireless communications in this paper. CDDM can be applied as a new physical-layer module after channel equalization to learn the distribution of the channel input signal, and then uses this learned knowledge to remove channel noise. We design corresponding training and sampling algorithms for the forward diffusion process and the reverse sampling process of CDDM. Moreover, we apply CDDM to a semantic communications system based on joint source-channel coding (JSCC). Experimental results demonstrate that CDDM can further reduce the mean square error (MSE) after the minimum mean square error (MMSE) equalizer, and the joint CDDM and JSCC system achieves better performance than the JSCC system and the traditional JPEG2000 with low-density parity-check (LDPC) code approach.
Unlearnable Examples Give a False Sense of Security: Piercing through Unexploitable Data with Learnable Examples
Authors: Wan Jiang, Yunfeng Diao, He Wang, Jianxin Sun, Meng Wang, Richang Hong
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Safeguarding data from unauthorized exploitation is vital for privacy and security, especially amid rampant security breaches such as adversarial and membership attacks. To this end, \textit{unlearnable examples} (UEs) have recently been proposed as a compelling protection: imperceptible perturbations are added to data so that models trained on them cannot classify the original clean distribution accurately. Unfortunately, we find that UEs provide a false sense of security, because they cannot stop unauthorized users from exploiting other unprotected data to remove the protection, turning unlearnable data learnable again. Motivated by this observation, we formally define a new threat by introducing \textit{learnable unauthorized examples} (LEs), which are UEs with their protection removed. The core of this approach is a novel purification process that projects UEs onto the manifold of LEs. It is realized by a new joint-conditional diffusion model that denoises UEs conditioned on the pixel and perceptual similarity between UEs and LEs. Extensive experiments demonstrate that LE delivers state-of-the-art countering performance against both supervised and unsupervised UEs in various scenarios, making it the first generalizable countermeasure to UEs across supervised and unsupervised learning.
AMD Autoregressive Motion Diffusion
Authors: Bo Han, Hao Peng, Minjing Dong, Chang Xu, Yi Ren, Yixuan Shen, Yuheng Li
Abstract
Human motion generation aims to produce plausible human motion sequences according to various conditional inputs, such as text or audio. Despite the feasibility of existing methods in generating motion based on short prompts and simple motion patterns, they encounter difficulties when dealing with long prompts or complex motions. The challenges are two-fold: 1) the scarcity of motion-capture data for long prompts and complex motions, and 2) the high diversity of human motions in the temporal domain and the substantial divergence of distributions from conditional modalities, leading to a many-to-many mapping problem when generating motion from complex and long texts. In this work, we address these gaps by 1) building HumanLong3D, the first dataset pairing long textual descriptions with complex 3D motions, and 2) proposing an autoregressive motion diffusion model (AMD). Specifically, AMD integrates the text prompt at the current timestep with the text prompt and action sequences at the previous timestep as conditional information to predict the current action sequences in an iterative manner. Furthermore, we present its generalization for X-to-Motion with "No Modality Left Behind", enabling for the first time the generation of high-definition and high-fidelity human motions from user-defined modality input.
Diffusion Dataset Generation: Towards Closing the Sim2Real Gap for Pedestrian Detection
Authors: Andrew Farley, Mohsen Zand, Michael Greenspan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
We propose a method that augments a simulated dataset using diffusion models to improve the performance of pedestrian detection in real-world data. The high cost of collecting and annotating data in the real-world has motivated the use of simulation platforms to create training datasets. While simulated data is inexpensive to collect and annotate, it unfortunately does not always closely match the distribution of real-world data, which is known as the sim2real gap. In this paper we propose a novel method of synthetic data creation meant to close the sim2real gap for the challenging pedestrian detection task. Our method uses a diffusion-based architecture to learn a real-world distribution which, once trained, is used to generate datasets. We mix this generated data with simulated data as a form of augmentation and show that training on a combination of generated and simulated data increases average precision by as much as 27.3% for pedestrian detection models in real-world data, compared against training on purely simulated data.
Multi-Level Global Context Cross Consistency Model for Semi-Supervised Ultrasound Image Segmentation with Diffusion Model
Authors: Fenghe Tang, Jianrui Ding, Lingtao Wang, Min Xian, Chunping Ning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Medical image segmentation is a critical step in computer-aided diagnosis, and convolutional neural networks are popular segmentation networks nowadays. However, their inherently local operations make it difficult to focus on the global contextual information of lesions with different positions, shapes, and sizes. Semi-supervised learning can be used to learn from both labeled and unlabeled samples, alleviating the burden of manual labeling. However, obtaining a large number of unlabeled images in medical scenarios remains challenging. To address these issues, we propose a Multi-level Global Context Cross-consistency (MGCC) framework that uses images generated by a Latent Diffusion Model (LDM) as unlabeled images for semi-supervised learning. The framework involves two stages. In the first stage, an LDM is used to generate synthetic medical images, which reduces the workload of data annotation and addresses privacy concerns associated with collecting medical data. In the second stage, varying levels of global context noise perturbation are added to the input of the auxiliary decoder, and output consistency is maintained between decoders to improve the representation ability. Experiments conducted on open-source breast ultrasound and private thyroid ultrasound datasets demonstrate the effectiveness of our framework in bridging the probability distribution and the semantic representation of medical images. Our approach enables the effective transfer of probability distribution knowledge to the segmentation network, resulting in improved segmentation accuracy. The code is available at https://github.com/FengheTan9/Multi-Level-Global-Context-Cross-Consistency.
Discrete Diffusion Probabilistic Models for Symbolic Music Generation
Authors: Matthias Plasser, Silvan Peter, Gerhard Widmer
Abstract
Denoising Diffusion Probabilistic Models (DDPMs) have made great strides in generating high-quality samples in both discrete and continuous domains. However, Discrete DDPMs (D3PMs) have yet to be applied to the domain of symbolic music. This work presents the direct generation of polyphonic symbolic music using D3PMs. Our model exhibits state-of-the-art sample quality according to current quantitative evaluation metrics, and allows for flexible infilling at the note level. We further show that our models are amenable to post-hoc classifier guidance, widening the scope of possible applications. However, we also cast a critical eye on quantitative evaluation of music sample quality via statistical metrics, and present a simple algorithm that can confound our metrics with completely spurious, non-musical samples.
AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation
Abstract
Diffusion models have gained significant attention in the realm of image generation due to their exceptional performance. Their success has recently been expanded to text generation by generating all tokens within a sequence concurrently. However, natural language exhibits a far more pronounced sequential dependency than images, and the majority of existing language models are trained with a left-to-right auto-regressive approach. To account for the inherent sequential characteristic of natural language, we introduce Auto-Regressive Diffusion (AR-Diffusion). AR-Diffusion ensures that the generation of tokens on the right depends on the generated ones on the left, a mechanism achieved by employing a dynamic number of denoising steps that vary based on token position. This results in tokens on the left undergoing fewer denoising steps than those on the right, thereby enabling them to generate earlier and subsequently influence the generation of tokens on the right. In a series of experiments on various text generation tasks, including text summarization, machine translation, and common sense generation, AR-Diffusion clearly demonstrated its superiority over existing diffusion language models, and it can be $100\times\sim600\times$ faster while achieving comparable results. Our code will be publicly released.
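The position-dependent denoising mechanism described in the abstract can be illustrated with a toy schedule in which each token's remaining noise level lags its index, so left tokens finish first. This is a sketch of the idea only, not AR-Diffusion's actual rule, and the function and its parameters are hypothetical:

```python
def position_timesteps(seq_len, total_steps, current_step):
    """Illustrative position-dependent denoising schedule: at a given
    global step, earlier (left) positions are assigned smaller
    remaining-noise timesteps than later (right) positions."""
    ts = []
    for pos in range(seq_len):
        # Each position starts its descent with a delay proportional
        # to its index; clamp to the valid range [0, total_steps].
        t = total_steps - current_step + pos
        ts.append(max(0, min(total_steps, t)))
    return ts

# With 10 total steps, after 7 global steps the leftmost tokens have
# nearly finished denoising while the rightmost are still noisy.
ts = position_timesteps(seq_len=5, total_steps=10, current_step=7)
```

Because left tokens reach low noise levels first, their (nearly final) values condition the remaining denoising of the tokens to their right, recovering a left-to-right dependency inside a diffusion sampler.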
Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation
Authors: Samaneh Azadi, Akbar Shah, Thomas Hayes, Devi Parikh, Sonal Gupta
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Text-guided human motion generation has drawn significant interest because of its impactful applications spanning animation and robotics. Recently, application of diffusion models for motion generation has enabled improvements in the quality of generated motions. However, existing approaches are limited by their reliance on relatively small-scale motion capture data, leading to poor performance on more diverse, in-the-wild prompts. In this paper, we introduce Make-An-Animation, a text-conditioned human motion generation model which learns more diverse poses and prompts from large-scale image-text datasets, enabling significant improvement in performance over prior works. Make-An-Animation is trained in two stages. First, we train on a curated large-scale dataset of (text, static pseudo-pose) pairs extracted from image-text datasets. Second, we fine-tune on motion capture data, adding additional layers to model the temporal dimension. Unlike prior diffusion models for motion generation, Make-An-Animation uses a U-Net architecture similar to recent text-to-video generation models. Human evaluation of motion realism and alignment with input text shows that our model reaches state-of-the-art performance on text-to-motion generation.
Keyword: dynamic
A General Model of Vehicle Routing Guidance Systems Based on a Distributive Learning Scheme
Authors: Ke Wan, Zuo Zhang, Zhiquan Chen
Subjects: Networking and Internet Architecture (cs.NI); Multiagent Systems (cs.MA); Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract
Dynamic traffic assignment (DTA) and vehicle route guidance have been important problems in ITS for some time. This paper proposes a new model for vehicle route guidance systems (VRGS) that takes into consideration information propagation, user selection, and information reaction. A parameter p is defined as the updating weight for computing traffic cost based on a distributive learning scheme; p is calculated through a function that represents information propagation over time and space, and this function requires further optimization. Comparisons to static traffic assignment and DTA are given, feasible strategies are discussed, and future work is outlined.
Fully analog photonic deep Reservoir Computer based on frequency multiplexing
Abstract
Reservoir computers (RC) are randomized recurrent neural networks well adapted to process time series, performing tasks such as nonlinear distortion compensation or prediction of chaotic dynamics. Deep reservoir computers (deep-RC), in which the output of one reservoir is used as input of the next reservoir, can lead to improved performance because, as in other deep artificial neural networks, the successive layers represent the data in more and more abstract ways. We present a fiber-based photonic implementation of a two-layer deep-RC based on frequency multiplexing. The two RC layers are encoded in two frequency combs propagating in the same experimental setup. The connection between layers is fully analog and does not require any digital processing. We find that the deep-RC outperforms traditional RC by up to two orders of magnitude on two benchmark tasks. This work thus paves the way towards using fully analog photonic neuromorphic computing for complex processing of time series, while avoiding costly analog-to-digital and digital-to-analog conversions.
Event Camera-based Visual Odometry for Dynamic Motion Tracking of a Legged Robot Using Adaptive Time Surface
Authors: Shifan Zhu, Zhipeng Tang, Michael Yang, Erik Learned-Miller, Donghyun Kim
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Our paper proposes a direct sparse visual odometry method that combines event and RGB-D data to estimate the pose of agile-legged robots during dynamic locomotion and acrobatic behaviors. Event cameras offer high temporal resolution and dynamic range, which can eliminate the issue of blurred RGB images during fast movements. This unique strength holds potential for accurate pose estimation of agile-legged robots, which has been a challenging problem to tackle. Our framework leverages the benefits of both RGB-D and event cameras to achieve robust and accurate pose estimation, even during dynamic maneuvers such as jumping and landing of a quadruped robot, the Mini-Cheetah. Our major contributions are threefold: Firstly, we introduce an adaptive time surface (ATS) method that addresses the whiteout and blackout issue in conventional time surfaces by formulating pixel-wise decay rates based on scene complexity and motion speed. Secondly, we develop an effective pixel selection method that directly samples from event data and applies sample filtering through ATS, enabling us to pick pixels on distinct features. Lastly, we propose a nonlinear pose optimization formula that simultaneously performs 3D-2D alignment on both RGB-based and event-based maps and images, allowing the algorithm to fully exploit the benefits of both data streams. We extensively evaluate the performance of our framework on both public datasets and our own quadruped robot dataset, demonstrating its effectiveness in accurately estimating the pose of agile robots during dynamic movements.
Deliberation and Voting in Approval-Based Multi-Winner Elections
Authors: Kanav Mehra, Nanda Kishore Sreenivas, Kate Larson
Abstract
Citizen-focused democratic processes where participants deliberate on alternatives and then vote to make the final decision are increasingly popular today. While the computational social choice literature has extensively investigated voting rules, there is limited work that explicitly looks at the interplay of the deliberative process and voting. In this paper, we build a deliberation model using established models from the opinion-dynamics literature and study the effect of different deliberation mechanisms on voting outcomes achieved when using well-studied voting rules. Our results show that deliberation generally improves welfare and representation guarantees, but the results are sensitive to how the deliberation process is organized. We also show, experimentally, that simple voting rules, such as approval voting, perform as well as more sophisticated rules such as proportional approval voting or method of equal shares if deliberation is properly supported. This has ramifications on the practical use of such voting rules in citizen-focused democratic processes.
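A toy pipeline of deliberation followed by approval voting might look as follows. The DeGroot averaging step is one established opinion-dynamics model, and the approval radius is an illustrative modeling choice; neither is claimed to be the paper's exact mechanism.

```python
import numpy as np

def degroot_deliberate(opinions, trust, rounds=10):
    """One simple deliberation mechanism from the opinion-dynamics
    literature (DeGroot averaging): each round, every voter's opinion
    becomes a trust-weighted average of the group's opinions."""
    x = np.asarray(opinions, dtype=float)
    W = trust / trust.sum(axis=1, keepdims=True)  # row-stochastic trust matrix
    for _ in range(rounds):
        x = W @ x
    return x

def approval_winners(opinions, alternatives, radius, k=1):
    """Approval voting: each voter approves every alternative within
    `radius` of their (post-deliberation) opinion; the k most-approved
    alternatives win."""
    scores = [sum(abs(o - a) <= radius for o in opinions) for a in alternatives]
    order = np.argsort(scores)[::-1]
    return [alternatives[i] for i in order[:k]]
```

With all-to-all trust, deliberation pulls opinions toward consensus, after which even plain approval voting selects a broadly representative alternative, matching the abstract's observation that simple rules perform well when deliberation is properly supported.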
Exponential Integrators for Phase-Field Equations using Pseudo-spectral Methods: A Python Implementation
Authors: Elvis do A. Soares, Amaro G. Barreto Jr., Frederico W. Tavares
Subjects: Numerical Analysis (math.NA); Soft Condensed Matter (cond-mat.soft); Pattern Formation and Solitons (nlin.PS); Chemical Physics (physics.chem-ph)
Abstract
In this paper, we implement exponential integrators, specifically Integrating Factor (IF) and Exponential Time Differencing (ETD) methods, using pseudo-spectral techniques to solve phase-field equations within a Python framework. These exponential integrators have showcased robust performance and accuracy when addressing stiff nonlinear partial differential equations. We compare these integrators to the well-known implicit-explicit (IMEX) Euler integrators used in phase-field modeling. The synergy between pseudo-spectral techniques and exponential integrators yields significant benefits for modeling intricate systems governed by phase-field dynamics, such as solidification processes and pattern formation. Our comprehensive Python implementation illustrates the effectiveness of this combined approach in solving phase-field model equations. The results obtained from this implementation highlight the accuracy and computational advantages of the ETD method compared to other numerical techniques.
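To make the ETD idea concrete, here is a minimal pseudo-spectral sketch for the 1D Allen-Cahn phase-field equation with periodic boundary conditions. The equation, grid, and parameter values are illustrative choices, not taken from the paper's implementation.

```python
import numpy as np

def etd1_allen_cahn(u0, eps=0.05, h=1e-2, steps=500):
    """First-order Exponential Time Differencing (ETD1) for the 1D
    Allen-Cahn equation u_t = eps^2 u_xx + u - u^3 on [0, 2*pi],
    periodic BCs, pseudo-spectral in space: the stiff linear (diffusion)
    part is integrated exactly in Fourier space, the nonlinearity
    is treated with the phi_1 weight."""
    n = u0.size
    k = np.fft.fftfreq(n, d=1.0 / n)   # integer wavenumbers on [0, 2*pi)
    L = -(eps * k) ** 2                # Fourier symbol of eps^2 d_xx
    hL = h * L
    eL = np.exp(hL)
    # phi_1(hL) = (e^{hL} - 1)/(hL), with the k = 0 limit handled explicitly
    safe = np.where(hL != 0.0, hL, 1.0)
    phi1 = np.where(hL != 0.0, np.expm1(hL) / safe, 1.0)
    u = u0.copy()
    for _ in range(steps):
        uh = np.fft.fft(u)
        Nh = np.fft.fft(u - u ** 3)    # nonlinear term, evaluated in physical space
        uh = eL * uh + h * phi1 * Nh   # ETD1 update
        u = np.real(np.fft.ifft(uh))
    return u
```

Because the diffusion term is handled exactly, the step size is limited only by the mild nonlinearity, which is the computational advantage over IMEX Euler that the abstract highlights.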
Equilibria of Fully Decentralized Learning in Networked Systems
Authors: Yan Jiang, Wenqi Cui, Baosen Zhang, Jorge Cortés
Abstract
Existing settings of decentralized learning either require players to have full information or require the system to have certain special structure that may be hard to check, hindering their applicability to practical systems. To overcome this, we identify a structure that is simple to check for linear dynamical systems, where each player learns in a fully decentralized fashion to minimize its cost. We first establish the existence of pure strategy Nash equilibria in the resulting noncooperative game. We then conjecture that the Nash equilibrium is unique provided that the system satisfies an additional requirement on its structure. We also introduce a decentralized mechanism based on projected gradient descent to have agents learn the Nash equilibrium. Simulations on a $5$-player game validate our results.
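A minimal version of projected-gradient play is easy to sketch: each player updates only its own action with a projected gradient step on its own cost. The two-player quadratic game below is an illustrative example with a known unique Nash equilibrium, not the paper's networked system.

```python
import numpy as np

def projected_gradient_play(grads, proj, a0, eta=0.1, iters=500):
    """Fully decentralized learning sketch: player i only ever evaluates
    the gradient of its own cost w.r.t. its own action, then all players
    take a simultaneous projected step."""
    a = np.array(a0, dtype=float)
    for _ in range(iters):
        g = np.array([grads[i](a) for i in range(len(a))])
        a = proj(a - eta * g)
    return a

# Two-player quadratic game: J1 = (a1-1)^2 + a1*a2, J2 = (a2+1)^2 + a1*a2.
grads = [lambda a: 2 * (a[0] - 1) + a[1],   # dJ1/da1
         lambda a: 2 * (a[1] + 1) + a[0]]   # dJ2/da2
proj = lambda a: np.clip(a, -3.0, 3.0)      # project onto the box [-3, 3]^2
nash = projected_gradient_play(grads, proj, a0=[0.0, 0.0])
# the unique Nash equilibrium of this game is (2, -2)
```

Convergence here follows because the game's pseudo-gradient is strongly monotone (its Jacobian [[2,1],[1,2]] is positive definite), which is the kind of checkable structural condition the abstract alludes to.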
Physics-enhanced Gaussian Process Variational Autoencoder
Authors: Thomas Beckers, Qirui Wu, George J. Pappas
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract
Variational autoencoders allow the learning of a lower-dimensional latent space based on high-dimensional input/output data. Using video clips as input data, the encoder may be used to describe the movement of an object in the video without ground truth data (unsupervised learning). Even though the object's dynamics are typically based on first principles, this prior knowledge is mostly ignored in the existing literature. Thus, we propose a physics-enhanced variational autoencoder that places a physics-enhanced Gaussian process prior on the latent dynamics to improve the efficiency of the variational autoencoder and to allow physically correct predictions. The physical prior knowledge, expressed as a linear dynamical system, is reflected here by the Green's function and included in the kernel function of the Gaussian process. The benefits of the proposed approach are highlighted in a simulation with an oscillating particle.
Convex Geometric Trajectory Tracking using Lie Algebraic MPC for Autonomous Marine Vehicles
Abstract
Controlling marine vehicles in challenging environments is a complex task due to the presence of nonlinear hydrodynamics and uncertain external disturbances. Despite nonlinear model predictive control (MPC) showing potential in addressing these issues, its practical implementation is often constrained by computational limitations. In this paper, we propose an efficient controller for trajectory tracking of marine vehicles by employing a convex error-state MPC on the Lie group. By leveraging the inherent geometric properties of the Lie group, we can construct globally valid error dynamics and formulate a quadratic programming-based optimization problem. Our proposed MPC demonstrates effectiveness in trajectory tracking through extensive numerical simulations, including scenarios involving ocean currents. Notably, our method substantially reduces computation time compared to nonlinear MPC, making it well-suited for real-time control applications with long prediction horizons or involving small marine vehicles.
Gaussian Process Port-Hamiltonian Systems: Bayesian Learning with Physics Prior
Authors: Thomas Beckers, Jacob Seidman, Paris Perdikaris, George J. Pappas
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Abstract
Data-driven approaches achieve remarkable results for the modeling of complex dynamics based on collected data. However, these models often neglect basic physical principles which determine the behavior of any real-world system. This omission is unfavorable in two ways: The models are not as data-efficient as they could be by incorporating physical prior knowledge, and the model itself might not be physically correct. We propose Gaussian Process Port-Hamiltonian systems (GP-PHS) as a physics-informed Bayesian learning approach with uncertainty quantification. The Bayesian nature of GP-PHS uses collected data to form a distribution over all possible Hamiltonians instead of a single point estimate. Due to the underlying physics model, a GP-PHS generates passive systems with respect to designated inputs and outputs. Further, the proposed approach preserves the compositional nature of Port-Hamiltonian systems.
Phase Field Modeling and Numerical Algorithm for Two-Phase Dielectric Fluid Flows
Authors: Jielin Yang, Ivan C. Christov, Suchuan Dong
Abstract
We develop a method for modeling and simulating a class of two-phase flows consisting of two immiscible incompressible dielectric fluids and their interactions with imposed external electric fields in two and three dimensions. We first present a thermodynamically-consistent and reduction-consistent phase field model for two-phase dielectric fluids. The model honors the conservation laws and thermodynamic principles, and has the property that, if only one fluid component is present in the system, the two-phase formulation will exactly reduce to that of the corresponding single-phase system. In particular, this model accommodates an equilibrium solution that is compatible with the zero-velocity requirement based on physics. This property provides a simpler method for simulating the equilibrium state of two-phase dielectric systems. We further present an efficient numerical algorithm, together with a spectral-element (for two dimensions) or a hybrid Fourier-spectral/spectral-element (for three dimensions) discretization in space, for simulating this class of problems. This algorithm computes different dynamic variables successively in an un-coupled fashion, and involves only coefficient matrices that are time-independent in the resultant linear algebraic systems upon discretization, even when the physical properties (e.g. permittivity, density, viscosity) of the two dielectric fluids are different. This property is crucial and enables us to employ fast Fourier transforms for three-dimensional problems. Ample numerical simulations of two-phase dielectric flows under imposed voltage are presented to demonstrate the performance of the method herein and to compare the simulation results with theoretical models and experimental data.
Algorithmic Censoring in Dynamic Learning Systems
Authors: Jennifer Chien, Margaret Roberts, Berk Ustun
Abstract
Dynamic learning systems subject to selective labeling exhibit censoring, i.e. persistent negative predictions assigned to one or more subgroups of points. In applications like consumer finance, this results in groups of applicants that are persistently denied and thus never enter into the training data. In this work, we formalize censoring, demonstrate how it can arise, and highlight difficulties in detection. We consider two safeguards against censoring, recourse and randomized exploration, both of which ensure we collect labels for points that would otherwise go unobserved. The resulting techniques allow examples from censored groups to enter into the training data and correct the model. Our results highlight the otherwise unmeasured harms of censoring and demonstrate the effectiveness of mitigation strategies across a range of data generating processes.
Physics-informed Convolutional Recurrent Surrogate Model for Reservoir Simulation with Well Controls
Authors: Jungang Chen, Eduardo Gildin, John E. Killough (Texas A&M University)
Abstract
This paper presents a novel surrogate model for modeling subsurface fluid flow with well controls using a physics-informed convolutional recurrent neural network (PICRNN). The model uses a convolutional long-short term memory (ConvLSTM) to capture the spatiotemporal dependencies of the state evolution dynamics in the porous flow. The ConvLSTM is linked to the state space equations, enabling the incorporation of a discrete-time sequence of well controls. The model requires the initial state and a sequence of well controls as inputs, and predicts the state variables of the system, such as pressure, as output. By minimizing the residuals of the reservoir flow state-space equations, the network is trained without the need for labeled data. The model is designed to serve as a surrogate model for predicting future reservoir states based on the initial reservoir state and input engineering controls. Boundary conditions are enforced in the state-space equations, so no additional loss term is needed. Three numerical cases are studied, demonstrating the model's effectiveness in predicting reservoir dynamics based on future well/system controls. The proposed model provides a new approach for efficient and accurate prediction of subsurface fluid flow, with potential applications in optimal control design for reservoir engineering.
Self-Supervised Pretraining on Paired Sequences of fMRI Data for Transfer Learning to Brain Decoding Tasks
Authors: Sean Paulsen, Michael Casey
Subjects: Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
Abstract
In this work we introduce a self-supervised pretraining framework for transformers on functional Magnetic Resonance Imaging (fMRI) data. First, we pretrain our architecture on two self-supervised tasks simultaneously to teach the model a general understanding of the temporal and spatial dynamics of human auditory cortex during music listening. Our pretraining results are the first to suggest a synergistic effect of multitask training on fMRI data. Second, we finetune the pretrained models and train additional fresh models on a supervised fMRI classification task. We observe significantly improved accuracy on held-out runs with the finetuned models, which demonstrates the ability of our pretraining tasks to facilitate transfer learning. This work contributes to the growing body of literature on transformer architectures for pretraining and transfer learning with fMRI data, and serves as a proof of concept for our pretraining tasks and multitask pretraining on fMRI data.
Learning Linear Embeddings for Non-Linear Network Dynamics with Koopman Message Passing
Authors: King Fai Yeh, Paris Flood, William Redman, Pietro Liò
Abstract
Recently, Koopman operator theory has become a powerful tool for developing linear representations of non-linear dynamical systems. However, existing data-driven applications of Koopman operator theory, including both traditional and deep learning approaches, perform poorly on non-linear network dynamics problems as they do not address the underlying geometric structure. In this paper we present a novel approach based on Koopman operator theory and message passing networks that finds a linear representation for the dynamical system which is globally valid at any time step. The linearisations found by our method produce predictions on a suite of network dynamics problems that are several orders of magnitude better than current state-of-the-art techniques. We also apply our approach to the highly non-linear training dynamics of neural network architectures, and obtain linear representations which can generate network parameters with comparable performance to networks trained by classical optimisers.
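For contrast with the proposed method, the traditional data-driven baseline is Extended Dynamic Mode Decomposition (EDMD), which fits a linear operator on a dictionary of lifted observables by least squares. The toy system below is illustrative: it is chosen so that the chosen dictionary makes the lifted dynamics exactly linear.

```python
import numpy as np

def edmd(X, Y, lift):
    """Extended Dynamic Mode Decomposition: fit a linear operator K with
    lift(Y) ~= K @ lift(X) by least squares, giving a linear
    representation of the nonlinear dynamics in observable space."""
    PX, PY = lift(X), lift(Y)
    return PY @ np.linalg.pinv(PX)

# Toy nonlinear map: x' = 0.9 x,  y' = 0.5 y + x^2.
# In the dictionary (x, y, x^2) the dynamics are exactly linear, so EDMD
# recovers the true Koopman matrix on this invariant subspace.
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 200))                      # sampled states
step = lambda s: np.vstack([0.9 * s[0], 0.5 * s[1] + s[0] ** 2])
Y = step(X)                                            # one-step successors
lift = lambda s: np.vstack([s[0], s[1], s[0] ** 2])    # observable dictionary
K = edmd(X, Y, lift)
```

On network dynamics the hard part, per the abstract, is choosing observables that respect the graph structure; message passing replaces this hand-picked dictionary with a learned, geometry-aware lifting.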
Smooth Robustness Measures for Symbolic Control Via Signal Temporal Logic
Authors: Shirantha Welikala, Hai Lin, Panos J. Antsaklis
Abstract
Symbolic control problems aim to synthesize control policies for dynamical systems under complex temporal specifications. For such problems, Signal Temporal Logic (STL) is increasingly used as the formal specification language due to its rich expressiveness. Moreover, the degree of satisfaction of STL specifications can be evaluated using "STL robust semantics" as a scalar robustness measure. This capability of STL enables transforming a symbolic control problem into an optimization problem that optimizes the corresponding robustness measure. However, since these robustness measures are non-smooth and non-convex, exact solutions can only be computed using computationally inefficient mixed-integer programming techniques that do not scale well. Therefore, recent literature has focused on using smooth approximations of these robustness measures to apply scalable and computationally efficient gradient-based methods to find locally optimal solutions. In this paper, we first generalize two recently established smooth robustness measures (SRMs), propose two new ones, and discuss their strengths and weaknesses. Next, we propose "STL error semantics" to characterize the approximation errors associated with different SRMs under different parameter configurations. This allows one to sensibly select an SRM (to optimize) along with its parameter values. We then propose "STL gradient semantics" to derive explicit gradients of SRMs, leading to improved computational efficiency as well as accuracy compared to using numerically estimated gradients. Finally, these contributions are highlighted using extensive simulation results.
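A common family of SRMs replaces the min/max operators in the robust semantics with log-sum-exp smoothings. The sketch below shows one standard choice together with its approximation-error bound; the paper's specific SRMs and parameterizations may differ.

```python
import numpy as np

def smooth_min(x, k=10.0):
    """Smooth under-approximation of min via -(1/k) * log(sum(exp(-k x)));
    the error satisfies 0 <= min(x) - smooth_min(x) <= log(n)/k, so
    larger k trades smoothness for tightness."""
    x = np.asarray(x, dtype=float)
    return -np.log(np.sum(np.exp(-k * x))) / k

def smooth_max(x, k=10.0):
    """Smooth over-approximation of max, the dual construction."""
    x = np.asarray(x, dtype=float)
    return np.log(np.sum(np.exp(k * x))) / k

# Robustness of "always (signal > 0)" over a finite horizon is min_t s_t;
# the smooth version is differentiable everywhere, enabling gradient-based
# policy synthesis instead of mixed-integer programming.
s = np.array([0.5, 0.2, 0.9])
rho_true = s.min()                   # non-smooth robustness: 0.2
rho_smooth = smooth_min(s, k=50.0)   # differentiable surrogate, <= 0.2
```

The explicit log(n)/k error bound is exactly the kind of "error semantics" information that lets one pick an SRM and its parameter k sensibly.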
Graph Reinforcement Learning for Network Control via Bi-Level Optimization
Authors: Daniele Gammelli, James Harrison, Kaidi Yang, Marco Pavone, Filipe Rodrigues, Francisco C. Pereira
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract
Optimization problems over dynamic networks have been extensively studied and widely used in the past decades to formulate numerous real-world problems. However, (1) traditional optimization-based approaches do not scale to large networks, and (2) the design of good heuristics or approximation algorithms often requires significant manual trial-and-error. In this work, we argue that data-driven strategies can automate this process and learn efficient algorithms without compromising optimality. To do so, we present network control problems through the lens of reinforcement learning and propose a graph network-based framework to handle a broad class of problems. Instead of naively computing actions over high-dimensional graph elements, e.g., edges, we propose a bi-level formulation where we (1) specify a desired next state via RL, and (2) solve a convex program to best achieve it, leading to drastically improved scalability and performance. We further highlight a collection of desirable features to system designers, investigate design decisions, and present experiments on real-world control problems showing the utility, scalability, and flexibility of our framework.
Constructing Feedback Linearizable Discretizations for Continuous-Time Systems using Retraction Maps
Authors: Ashutosh Jindal, Ravi Banavar, David Martin Diego
Abstract
Control laws for continuous-time dynamical systems are most often implemented via digital controllers using a sample-and-hold technique. Numerical discretization of the continuous system is an integral part of subsequent analysis. Feedback linearizability of such sampled systems is dependent upon the choice of discretization map or technique. In this article, for feedback linearizable continuous-time systems, we utilize the idea of retraction maps to construct discretizations that are feedback linearizable as well. We also propose a method to functionally compose discretizations to obtain higher-order integrators that are feedback linearizable.
Modelling Human Visual Motion Processing with Trainable Motion Energy Sensing and a Self-attention Network for Adaptive Motion Integration
Abstract
Visual motion processing is essential for organisms to perceive and interact with dynamic environments. Despite extensive research in cognitive neuroscience, image-computable models that can extract informative motion flow from natural scenes in a manner consistent with human visual processing have yet to be established. Meanwhile, recent advancements in computer vision (CV), propelled by deep learning, have led to significant progress in optical flow estimation, a task closely related to motion perception. Here we propose an image-computable model of human motion perception by bridging the gap between human and CV models. Specifically, we introduce a novel two-stage approach that combines trainable motion energy sensing with a recurrent self-attention network for adaptive motion integration and segregation. This model architecture aims to capture the computations in V1-MT, the core structure for motion perception in the biological visual system. In silico neurophysiology reveals that our model's unit responses are similar to mammalian neural recordings regarding motion pooling and speed tuning. The proposed model can also replicate human responses to a range of stimuli examined in past psychophysical studies. The experimental results on the Sintel benchmark demonstrate that our model predicts human responses better than the ground truth, whereas the CV models show the opposite. Further partial correlation analysis indicates our model outperforms several state-of-the-art CV models in explaining the human responses that deviate from the ground truth. Our study provides a computational architecture consistent with human visual motion processing, although the physiological correspondence may not be exact.
Fusion-Based Multi-User Semantic Communications for Wireless Image Transmission over Degraded Broadcast Channels
Abstract
Degraded broadcast channels (DBC) are a typical multi-user communications scenario. There exist classic transmission methods, such as superposition coding with successive interference cancellation, to achieve the DBC capacity region. However, semantic communications over DBC still lack in-depth research. To address this, we design a fusion-based multi-user semantic communications system for wireless image transmission over DBC in this paper. The proposed architecture supports a transmitter extracting semantic features for two users separately, and learns to dynamically fuse these semantic features into a joint latent representation for broadcasting. The key here is to design a flexible image semantic fusion (FISF) module to fuse the semantic features of the two users, and to use a multi-layer perceptron (MLP) based neural network to adjust the weights of different users' semantic features for flexible adaptability to different users' channels. Experiments present the semantic performance region based on the peak signal-to-noise ratio (PSNR) of both users, and show that the proposed system dominates the traditional methods.
Static Pricing Guarantees for Queueing Systems
Authors: Jacob Bergquist, Adam N. Elmachtoub
Subjects: Data Structures and Algorithms (cs.DS); Computer Science and Game Theory (cs.GT)
Abstract
We consider a general queueing model with price-sensitive customers in which the service provider seeks to balance two objectives, maximizing the average revenue rate and minimizing the average queue length. Customers arrive according to a Poisson process, observe an offered price, and decide to join the queue if their valuation exceeds the price. The queue is operated first-in first-out, and the service times are exponential. Our model represents applications in areas like make-to-order manufacturing, cloud computing, and food delivery. The optimal solution for our model is dynamic; the price changes as the state of the system changes. However, such dynamic pricing policies may be undesirable for a variety of reasons. In this work, we provide performance guarantees for a simple and natural class of static pricing policies which charge a fixed price up to a certain occupancy threshold and then allow no more customers into the system. We provide a series of results showing that such static policies can simultaneously guarantee a constant fraction of the optimal revenue with at most a constant factor increase in expected queue length. For instance, our policy for the M/M/1 setting allows bi-criteria approximations of $(0.5, 1), (0.66, 1.16), (0.75, 1.54)$ and $(0.8, 2)$ for the revenue and queue length, respectively. We also provide guarantees for settings with multiple customer classes and multiple servers, as well as the expected sojourn time objective.
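The revenue rate and mean queue length of such a static threshold policy can be computed in closed form from the stationary distribution of the M/M/1/c birth-death chain. The sketch below assumes Uniform[0,1] customer valuations for illustration; the paper's analysis covers general valuation distributions.

```python
import numpy as np

def static_policy_performance(price, cutoff, Lam=1.0, mu=1.2, F=lambda v: v):
    """Performance of the static policy that charges `price` until the
    queue holds `cutoff` customers, then admits no one. Arrivals are
    Poisson(Lam) with valuations ~ F (default Uniform[0,1]); service is
    exponential(mu). Standard M/M/1/c analysis."""
    lam = Lam * (1.0 - F(price))              # rate of customers who join
    rho = lam / mu
    pi = np.array([rho ** n for n in range(cutoff + 1)])
    pi /= pi.sum()                            # stationary distribution
    revenue_rate = price * lam * (1.0 - pi[-1])  # price paid only if admitted
    mean_queue = np.dot(np.arange(cutoff + 1), pi)
    return revenue_rate, mean_queue
```

Sweeping `price` and `cutoff` with this function traces the (revenue, queue length) frontier against which bi-criteria guarantees like $(0.66, 1.16)$ are stated.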
Cooperative Aerial Transportation of Nonuniform Load through Quadrotors by Elastic and Flexible Cables
Authors: Ali Akbar Rezaei Lori, Mohammad Danesh, Iman Izadi
Abstract
In this paper, first the full dynamics of aerial transportation of a rigid body with an arbitrary number of quadrotors is derived. Then a control strategy is proposed to convey the nonuniform rigid body appropriately to the desired trajectory. In the dynamical model of this transportation system, not only is the load considered as a nonuniform and non-homogeneous rigid body, but the mass, flexibility, and tension of the cables are also considered. Each cable is modeled as successive masses, springs, and dampers where each mass, spring, and damper has 4 degrees of freedom (DOF). The Euler-Lagrange equations are used to derive the equations of motion. The control strategy includes three loops of attitude control, formation control, and navigation control. The sliding mode control is designed based on multi-agent systems for the formation control, where the controller is proven to be asymptotically stable. The navigation control loop, based on the load states, guarantees that the load reaches the desired location. Finally, numerical examples and simulations are presented to verify the appropriate operation of the proposed system for transporting both homogeneous and non-homogeneous bodies by spreading quadrotors according to the mass distribution of the body.
Ortho-ODE: Enhancing Robustness of Neural ODEs against Adversarial Attacks
Abstract
Neural Ordinary Differential Equations (NODEs) introduced the use of numerical solvers to solve the differential equation characterized by a neural network (NN), thereby initiating a new paradigm of deep learning models with infinite depth. NODEs were originally designed to tackle irregular time series problems, but they have also demonstrated robustness against various noises and adversarial attacks. This paper examines the natural robustness of NODEs and the cause behind this surprising behaviour. We show that by controlling the Lipschitz constant of the ODE dynamics, the robustness can be significantly improved. We derive our approach from Gronwall's inequality. Further, we draw parallels between contractivity theory and Gronwall's inequality. Experimentally, we corroborate the enhanced robustness on numerous datasets: MNIST, CIFAR-10, and CIFAR-100. We also present the impact of adaptive and non-adaptive solvers on the robustness of NODEs.
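The Gronwall argument can be checked numerically: for dynamics f with global Lipschitz constant L, two trajectories started a distance ||d0|| apart stay within ||d0|| exp(L t) of each other. The toy tanh dynamics, weight scale, and RK4 integrator below are illustrative stand-ins, not the paper's NODE architecture.

```python
import numpy as np

def integrate(f, x0, t1, n=2000):
    """Fixed-step classical RK4 integrator for dx/dt = f(x)."""
    x, h = np.array(x0, dtype=float), t1 / n
    for _ in range(n):
        k1 = f(x); k2 = f(x + h / 2 * k1)
        k3 = f(x + h / 2 * k2); k4 = f(x + h * k3)
        x = x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

# f(x) = tanh(W x) has global Lipschitz constant L <= ||W||_2, so by
# Gronwall's inequality ||x(t) - x~(t)|| <= ||d0|| * exp(L t); shrinking
# ||W|| tightens the bound, the idea behind Lipschitz-controlled NODEs.
rng = np.random.default_rng(1)
W = rng.standard_normal((4, 4)) * 0.3
f = lambda x: np.tanh(W @ x)
L = np.linalg.norm(W, 2)
x0 = rng.standard_normal(4)
d0 = 1e-3 * rng.standard_normal(4)   # adversarial-style input perturbation
t1 = 2.0
gap = np.linalg.norm(integrate(f, x0 + d0, t1) - integrate(f, x0, t1))
bound = np.linalg.norm(d0) * np.exp(L * t1)
```

The perturbation growth `gap` never exceeds `bound`, which is why constraining L (e.g. via orthogonal or norm-constrained weights) directly caps how much an input perturbation can be amplified.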
Information Energy Ratio of XOR Logic Gate at Mesoscopic Scale
Authors: Xiaohu Ge, Muyao Ruan, Xiaoxuan Peng, Yong Xiao, Yang Yang
Abstract
As the size of transistors approaches the mesoscopic scale, existing energy consumption analysis methods exhibit various limits, especially when applied to describe the non-equilibrium information processing of transistors at ultra-low voltages. Stochastic thermodynamics offers a theoretical tool to analyze the energy consumption of transistors during non-equilibrium information processing. Based on this theory, an information energy ratio of an XOR gate composed of single-electron transistors is proposed at the mesoscopic scale, which can be used to quantify the exchange between information and energy at XOR gates. Furthermore, the energy efficiency of the parity check circuit is proposed to analyze the energy consumption of digital signal processing systems. Compared with the energy efficiency of a parity check circuit operating at the 7 nm semiconductor process supply voltage, simulation results show that the energy efficiency of the parity check circuit is improved by 266% when the supply voltage is chosen at a specified value.
Capacitor Voltage Synchronizing Control of 100% Full-Scale Wind Power Generator-Supplied Power Systems
Abstract
This paper proposes a capacitor voltage synchronizing control (CVSC) system for the regulation of full-scale wind power generator-supplied power systems (FWPS). The capacitor combined with the inverter of a full-scale wind power generator (WPG) is controlled with a CVSC system to mimic the rotor dynamics of a synchronous generator (SG). WPGs are thereby enabled to offer inertial response and primary regulation. The generation and load imbalance of a FWPS is reflected by the deviation of the capacitor voltages of the WPGs. Small-signal analysis was carried out to investigate the oscillatory modes of a FWPS with the CVSC system. Time-domain simulation studies were undertaken on the FWPS, and the performance of the CVSC system was studied in the cases where a load increase and a three-phase-to-ground fault occurred on the FWPS, respectively.
Machine learning enhanced real-time aerodynamic forces prediction based on sparse pressure sensor inputs
Authors: Junming Duan, Qian Wang, Jan S. Hesthaven
Abstract
Accurate prediction of aerodynamic forces in real-time is crucial for autonomous navigation of unmanned aerial vehicles (UAVs). This paper presents a data-driven aerodynamic force prediction model based on a small number of pressure sensors located on the surface of UAV. The model is built on a linear term that can make a reasonably accurate prediction and a nonlinear correction for accuracy improvement. The linear term is based on a reduced basis reconstruction of the surface pressure distribution, where the basis is extracted from numerical simulation data and the basis coefficients are determined by solving linear pressure reconstruction equations at a set of sensor locations. Sensor placement is optimized using the discrete empirical interpolation method (DEIM). Aerodynamic forces are computed by integrating the reconstructed surface pressure distribution. The nonlinear term is an artificial neural network (NN) that is trained to bridge the gap between the ground truth and the DEIM prediction, especially in the scenario where the DEIM model is constructed from simulation data with limited fidelity. A large network is not necessary for accurate correction as the linear model already captures the main dynamics of the surface pressure field, thus yielding an efficient DEIM+NN aerodynamic force prediction model. The model is tested on numerical and experimental dynamic stall data of a 2D NACA0015 airfoil, and numerical simulation data of dynamic stall of a 3D drone. Numerical results demonstrate that the machine learning enhanced model can make fast and accurate predictions of aerodynamic forces using only a few pressure sensors, even for the NACA0015 case in which the simulations do not agree well with the wind tunnel experiments. Furthermore, the model is robust to noise.
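The DEIM sensor-placement step mentioned above can be sketched in a few lines: greedily pick one row index (sensor location) per basis vector at the point of largest current interpolation residual, then reconstruct the full field from the sensor readings alone. The low-rank test field used here is illustrative, not the paper's UAV data.

```python
import numpy as np

def deim_sensors(U):
    """Discrete Empirical Interpolation Method: given an n x m basis U
    (e.g. POD modes of surface pressure), greedily select m sensor rows."""
    n, m = U.shape
    idx = [int(np.argmax(np.abs(U[:, 0])))]
    for j in range(1, m):
        # residual of fitting column j with previous columns at chosen rows
        c = np.linalg.solve(U[np.ix_(idx, range(j))], U[idx, j])
        r = U[:, j] - U[:, :j] @ c
        idx.append(int(np.argmax(np.abs(r))))
    return idx

def reconstruct(U, idx, samples):
    """Recover the full field from the sensor `samples` via the basis:
    solve for the basis coefficients at the sensor rows, then expand."""
    coeff = np.linalg.solve(U[idx, :], samples)
    return U @ coeff
```

Integrating the reconstructed pressure field then yields the linear force estimate; the paper's NN correction is trained on the residual between this estimate and the ground truth.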
A Multi-Client Searchable Encryption Scheme for IoT Environment
Authors: Nazatul H. Sultan, Shabnam Kasra-Kermanshahi, Yen Tran, Shangqi Lai, Vijay Varadharajan, Surya Nepal, Xun Yi
Abstract
The proliferation of connected devices through Internet connectivity presents both opportunities for smart applications and risks to security and privacy. It is vital to proactively address these concerns to fully leverage the potential of the Internet of Things. IoT services where one data owner serves multiple clients, like smart city transportation, smart building management and healthcare, can offer benefits but also bring cybersecurity and data privacy risks. For example, in healthcare, a hospital may collect data from medical devices and make it available to multiple clients such as researchers and pharmaceutical companies. This data can be used to improve medical treatments and research but, if not protected, it can also put patients' personal information at risk. To ensure the benefits of these services, it is important to implement proper security and privacy measures. In this paper, we propose a symmetric searchable encryption scheme with dynamic updates on a database that has a single owner and multiple clients for IoT environments. Our proposed scheme supports both forward and backward privacy. Additionally, our scheme supports a decentralized storage environment in which data owners can outsource data across multiple servers or even across multiple service providers to improve security and privacy. Further, it takes minimal effort and cost to revoke a client's access to our system at any time. The performance and formal security analyses of the proposed scheme show that our scheme provides better functionality and security, and is more efficient in terms of computation and storage, than the closely related works.
A Dynamically Similar Lab-Scale District Heating Network via Dimensional Analysis
Abstract
Strict user demands and large variability in external disturbances, along with limited richness in the data collected on the daily operating conditions of district heating networks, make the design and testing of novel energy-reducing control algorithms for district heating networks challenging. This paper presents the development of a dynamically similar lab-scale district heating network that can be used as a test bench for such control algorithms. This test bench is developed by using the Buckingham pi theorem to match the lab-scale components to the full-scale ones. By retaining the relative thermodynamics and fluid dynamics of a full-scale network in the lab-scale system, the experimental setup allows for repeatability of the experiments being performed and flexibility in the testing conditions. Moreover, the down-scaling of the experiment is leveraged to accelerate testing, allowing for the recreation of operating periods of weeks and months in hours and days. A PID controller is implemented on the lab-scale test bench to validate its response against literature data. Results show 63% efficiency during heating operations compared to 70% efficiency for a similar full-scale system, with comparable pressure losses across the system.
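As a sketch of the matching procedure, one equates a dimensionless pi-group between the two scales and solves for a lab-scale operating parameter. Reynolds-number matching below is purely an illustrative example, since the abstract does not state which pi-groups the test bench actually matches.

```python
def matched_lab_scale(L_full, v_full, nu_full, L_lab, nu_lab):
    """Dimensional-analysis sketch: choose the lab flow speed so that the
    Reynolds number Re = v * L / nu (one candidate Buckingham-pi group for
    pipe flow) matches the full-scale network. All argument names and the
    choice of Re as the matched group are assumptions for illustration."""
    Re_full = v_full * L_full / nu_full
    v_lab = Re_full * nu_lab / L_lab     # enforce Re_lab == Re_full
    return v_lab, Re_full
```

Note that shrinking the length scale raises the required lab velocity, which shortens the characteristic time L/v; this is the mechanism behind the accelerated testing (weeks of operation replayed in hours) mentioned above.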
Numerical solution of Poisson partial differential equations in high dimension using deep neural networks
Authors: Mathias Dus, Virginie Ehrlacher
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP); Optimization and Control (math.OC)
Abstract
The aim of this article is to analyze numerical schemes that use two-layer neural networks of infinite width to solve high-dimensional Poisson partial differential equations (PDEs) with Neumann boundary conditions. Using Barron's representation of the solution with a probability measure, the energy is minimized via a gradient-curve dynamic on the $2$-Wasserstein space of the parameters defining the neural network. Inspired by the work of Bach and Chizat, we prove that if the gradient curve converges, then the represented function is the solution of the elliptic equation under consideration. Numerical experiments are given to show the potential of the method.
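The energy-minimization idea can be sketched in one dimension: represent the solution with a small two-layer network and descend the discretized Ritz energy, whose minimizer solves the Poisson problem with natural (Neumann) boundary conditions. This is a finite-width, finite-difference caricature of the Wasserstein gradient curve analyzed in the paper; the source term, width, and step size are all our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
xs = np.linspace(0.0, 1.0, 101)
dx = xs[1] - xs[0]
f = np.cos(np.pi * xs)                 # zero-mean source (Neumann-compatible)

m = 16                                 # width of the two-layer network
params = rng.normal(scale=0.5, size=(m, 3))  # rows hold (a_j, w_j, b_j)

def u(p, x):
    # Barron-style two-layer network: u(x) = sum_j a_j * tanh(w_j x + b_j).
    a, w, b = p[:, 0], p[:, 1], p[:, 2]
    return np.tanh(np.outer(x, w) + b) @ a

def energy(p):
    # Discretized Ritz energy E(u) = int (|u'|^2 / 2 - f u) dx; its
    # minimizer solves -u'' = f with natural Neumann boundary conditions.
    uu = u(p, xs)
    du = np.gradient(uu, xs)
    return float(np.sum(0.5 * du**2 - f * uu) * dx)

# Plain finite-difference gradient descent over the parameters, standing
# in for the infinite-width Wasserstein gradient curve of the paper.
eps, lr = 1e-5, 0.05
e0 = energy(params)
for _ in range(200):
    e_cur = energy(params)
    g = np.zeros_like(params)
    for idx in np.ndindex(params.shape):
        q = params.copy()
        q[idx] += eps
        g[idx] = (energy(q) - e_cur) / eps
    params -= lr * g
e_final = energy(params)
print(e_final, "<", e0)  # the energy decreases along the descent
```

The descent here only demonstrates that the energy is a workable training objective; the paper's contribution is proving that the idealized gradient curve, if convergent, reaches the PDE solution.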
Chatting with GPT-3 for Zero-Shot Human-Like Mobile Automated GUI Testing
Authors: Zhe Liu, Chunyang Chen, Junjie Wang, Mengzhuo Chen, Boyu Wu, Xing Che, Dandan Wang, Qing Wang
Abstract
Mobile apps are indispensable in people's daily lives, and automated GUI (Graphical User Interface) testing is widely used for app quality assurance. There is growing interest in learning-based techniques for automated GUI testing, which aim at generating human-like actions and interactions. However, limitations such as low testing coverage, weak generalization, and heavy reliance on training data create an urgent need for a more effective approach to generating human-like actions that thoroughly test mobile apps. Inspired by the success of Large Language Models (LLMs), e.g., GPT-3 and ChatGPT, in natural language understanding and question answering, we formulate mobile GUI testing as a Q&A task. We propose GPTDroid, which asks an LLM to chat with mobile apps: it passes GUI page information to the LLM to elicit testing scripts, executes them, and keeps passing the app's feedback back to the LLM, iterating the whole process. Within it, we extract the static context of the GUI page and the dynamic context of the iterative testing process, design prompts for inputting this information to the LLM, and develop a neural matching network to decode the LLM's output into actionable steps to execute the app. We evaluate GPTDroid on 86 apps from Google Play; its activity coverage is 71%, 32% higher than the best baseline, and it detects 36% more bugs, faster than the best baseline. GPTDroid also detects 48 new bugs on Google Play, 25 of which have been confirmed or fixed. We further summarize the capabilities behind GPTDroid's superior performance, including semantic text input, compound actions, long meaningful test traces, and test case prioritization.
STLCCP: An Efficient Convex Optimization-based Framework for Signal Temporal Logic Specifications
Abstract
Signal Temporal Logic (STL) is capable of expressing a broad range of temporal properties that controlled dynamical systems must satisfy. In the literature, both mixed-integer programming (MIP) and nonlinear programming (NLP) methods have been applied to solve optimal control problems with STL specifications. However, neither approach has succeeded in solving problems with complex long-horizon STL specifications within a realistic timeframe. This study proposes a new optimization framework, called \textit{STLCCP}, which explicitly incorporates several structures of STL to mitigate this issue. The core of our framework is a structure-aware decomposition of STL formulas, which converts the original program into a difference of convex (DC) programs. This program is then solved as a convex quadratic program sequentially, based on the convex-concave procedure (CCP). Our numerical experiments on several commonly used benchmarks demonstrate that this framework can effectively handle complex scenarios over long horizons, which have been challenging to address even using state-of-the-art optimization methods.
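The convex-concave procedure (CCP) at the core of the framework above can be shown on a toy difference-of-convex program: at each step the concave part is replaced by its affine minorant and the resulting convex surrogate is minimized. The 1-D example and the grid-search surrogate solver (standing in for the quadratic programs solved in the paper) are our illustrative assumptions.

```python
import numpy as np

def ccp_minimize(f, g, g_sub, x0, iters=30):
    """Convex-concave procedure for h(x) = f(x) - g(x), with f and g
    convex: linearize g at the current iterate and minimize the convex
    surrogate (here by fine 1-D grid search instead of a QP solver)."""
    x = x0
    grid = np.linspace(-3.0, 3.0, 6001)
    for _ in range(iters):
        s = g_sub(x)  # subgradient of the concave part at x
        surrogate = f(grid) - (g(x) + s * (grid - x))
        x = float(grid[np.argmin(surrogate)])
    return x

# Toy DC program: minimize x^2 - 2|x|, whose local minima sit at x = +-1.
f = lambda z: z**2
g = lambda z: 2.0 * np.abs(z)
g_sub = lambda z: 2.0 if z > 0 else (-2.0 if z < 0 else 0.0)

x_star = ccp_minimize(f, g, g_sub, x0=0.3)
print(round(x_star, 6))  # converges to the local minimum near 1.0
```

Each CCP iterate only improves the surrogate, so the method finds a local optimum cheaply; this is why the decomposition into DC form lets the paper avoid mixed-integer programming.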
Integrated Planning and Control of Robotic Surgical Instruments for Tasks Autonomy
Authors: Fangxun Zhong, Yun-Hui Liu
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
Agile maneuvers are essential for robot-enabled complex tasks such as surgical procedures. Prior explorations of surgical autonomy have been limited to feasibility studies of completing a single task, without systematically addressing generic manipulation safety across different tasks. We present an integrated planning and control framework for 6-DoF robotic instruments for pipeline automation of surgical tasks. We leverage the geometry of a robotic instrument and propose the nodal state space (NSS) to represent the robot state in SE(3). Each elementary robot motion can be encoded by regulating the state parameters via a dynamical system. This theoretically ensures that every in-process trajectory is globally feasible and stably reaches an admissible target, and the controller has a closed form that does not require computing 6-DoF inverse kinematics. Then, to plan motion steps reliably, we propose an interactive (instant) goal state of the robot that transforms manipulation planning under desired path constraints into a goal-varying manipulation (GVM) problem. We detail how GVM can adaptively and smoothly plan the procedure (proceeding or rewinding the process as needed) based on on-the-fly situations in dynamic or disturbed environments. Finally, we extend the above policy to characterize complete pipelines of various surgical tasks. Simulations show that our framework can smoothly solve twisted maneuvers while avoiding collisions. Physical experiments using the da Vinci Research Kit (dVRK) validate its capability of automating individual tasks including tissue debridement, dissection, and wound suturing. The results confirm good task-level consistency and reliability compared to state-of-the-art automation algorithms.
Your Identity is Your Behavior -- Continuous User Authentication based on Machine Learning and Touch Dynamics
Authors: Brendan Pelto, Mounika Vanamala, Rushit Dave
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Abstract
The aim of this research is to investigate continuous authentication with mobile touch dynamics using three algorithms: a Neural Network, Extreme Gradient Boosting (XGBoost), and a Support Vector Machine (SVM). Mobile devices are constantly growing in popularity worldwide; smartphone subscriptions have now surpassed 6 billion. Mobile touch dynamics refer to the distinct patterns of how a user interacts with their device, including factors such as touch pressure, swipe speed, and touch duration. Continuous authentication is the process of verifying a user's identity throughout their use of a device, rather than only at the initial login. This research used a dataset of touch dynamics collected from 40 subjects on the LG V30+. The participants played four mobile games (PUBG, Diep.io, Slither, and Minecraft) for 10 minutes each. The three algorithms were trained and tested on the extracted dataset, and their performance was evaluated on metrics such as accuracy, precision, false negative rate, and false positive rate. The results showed that all three algorithms could effectively classify users based on their individual touch dynamics, with accuracy ranging from 80% to 95%. The Neural Network performed best, achieving the highest accuracy and precision scores, followed closely by XGBoost and the SVM. The data show that continuous authentication using mobile touch dynamics has the potential to enhance security and reduce the risk of unauthorized access to personal devices. This research also highlights the importance of choosing the right algorithm for a given dataset and use case, as different algorithms may perform differently depending on the specific task.
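The classification setup above (per-user feature patterns, train/test split, accuracy evaluation) can be sketched end to end. The synthetic features, user count, and the deliberately simple nearest-centroid classifier are all our assumptions, standing in for the paper's real touch data and its Neural Network / XGBoost / SVM models:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for touch-dynamics features: each user has a
# characteristic mean (touch pressure, swipe speed, touch duration).
n_users, n_samples = 5, 200
centers = rng.uniform(0.0, 10.0, size=(n_users, 3))
X = np.vstack([c + rng.normal(scale=0.3, size=(n_samples, 3)) for c in centers])
y = np.repeat(np.arange(n_users), n_samples)

# 80/20 split, then a nearest-centroid classifier as a minimal baseline.
perm = rng.permutation(len(y))
cut = int(0.8 * len(y))
tr, te = perm[:cut], perm[cut:]
centroids = np.array([X[tr][y[tr] == k].mean(axis=0) for k in range(n_users)])
pred = np.argmin(((X[te][:, None, :] - centroids) ** 2).sum(axis=-1), axis=1)
acc = float((pred == y[te]).mean())
print(f"identification accuracy: {acc:.2f}")
```

Even this minimal pipeline separates users whose feature distributions differ, which is the premise behind using touch dynamics for continuous authentication; the paper's stronger models push accuracy into the 80-95% range on real data.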
Experiences in Building a Composable and Functional API for Runtime SPIR-V Code Generation
Authors: Juan Fumero, György Rethy, Athanasios Stratikopoulos, Nikos Foutris, Christos Kotselidis
Subjects: Software Engineering (cs.SE); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
This paper presents the Beehive SPIR-V Toolkit, a framework that can automatically generate a composable and functional Java library for dynamically building SPIR-V binary modules. The Beehive SPIR-V Toolkit can be used by optimizing compilers and runtime systems to generate and validate SPIR-V binary modules from managed runtime systems, such as the Java Virtual Machine (JVM). Furthermore, our framework is architected to accommodate new SPIR-V releases in an easy-to-maintain manner, and it facilitates the automatic generation of Java libraries for other standards besides SPIR-V. The Beehive SPIR-V Toolkit also includes an assembler that emits SPIR-V binary modules from disassembled SPIR-V text files, a disassembler that converts SPIR-V binary code into a text file, and a console client application. To the best of our knowledge, the Beehive SPIR-V Toolkit is the first Java programming framework that can dynamically generate SPIR-V binary modules. To demonstrate its use, we showcase its integration in the context of TornadoVM, a Java framework for automatically offloading and running Java programs on heterogeneous hardware. We show that, via the Beehive SPIR-V Toolkit, TornadoVM is able to compile code 3x faster than its existing OpenCL C JIT compiler, and it performs up to 1.52x faster than the existing OpenCL C backend in TornadoVM.
Curious Rhythms: Temporal Regularities of Wikipedia Consumption
Authors: Tiziano Piccardi, Martin Gerlach, Robert West
Subjects: Computers and Society (cs.CY); Digital Libraries (cs.DL)
Abstract
Wikipedia, in its role as the world's largest encyclopedia, serves a broad range of information needs. Although previous studies have noted that Wikipedia users' information needs vary throughout the day, there is to date no large-scale, quantitative study of the underlying dynamics. The present paper fills this gap by investigating temporal regularities in daily consumption patterns in a large-scale analysis of billions of timezone-corrected page requests mined from English Wikipedia's server logs, with the goal of investigating how context and time relate to the kind of information consumed. First, we show that even after removing the global pattern of day-night alternation, the consumption habits of individual articles maintain strong diurnal regularities. Then, we characterize the prototypical shapes of consumption patterns, finding a particularly strong distinction between articles preferred during the evening/night and articles preferred during working hours. Finally, we investigate topical and contextual correlates of Wikipedia articles' access rhythms, finding that article topic, reader country, and access device (mobile vs. desktop) are all important predictors of daily attention patterns. These findings shed new light on how humans seek information on the Web by focusing on Wikipedia as one of the largest open platforms for knowledge and learning, emphasizing Wikipedia's role as a rich knowledge base that fulfills information needs spread throughout the day, with implications for understanding information seeking across the globe and for designing appropriate information systems.
AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation
Abstract
Diffusion models have gained significant attention in image generation due to their exceptional performance. Their success has recently been extended to text generation, where all tokens within a sequence are generated concurrently. However, natural language exhibits a far more pronounced sequential dependency than images, and the majority of existing language models are trained with a left-to-right auto-regressive approach. To account for the inherently sequential characteristic of natural language, we introduce Auto-Regressive Diffusion (AR-Diffusion). AR-Diffusion ensures that the generation of tokens on the right depends on the already generated tokens on the left, a mechanism achieved by employing a dynamic number of denoising steps that varies with token position. As a result, tokens on the left undergo fewer denoising steps than those on the right, allowing them to finish generating earlier and subsequently influence the generation of tokens to their right. In a series of experiments on various text generation tasks, including text summarization, machine translation, and common sense generation, AR-Diffusion clearly demonstrated its superiority over existing diffusion language models, and showed that it can be $100\times\sim600\times$ faster while achieving comparable results. Our code will be publicly released.
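The position-dependent step count described above can be made concrete with a small schedule function. The linear interpolation used here is our illustrative assumption, not AR-Diffusion's exact rule; the point is only that the number of denoising steps grows with token position:

```python
def denoising_steps(position: int, seq_len: int,
                    min_steps: int = 2, max_steps: int = 20) -> int:
    """Illustrative position-dependent schedule in the spirit of
    AR-Diffusion: tokens further to the right receive more denoising
    steps, so left tokens finalize earlier and condition the tokens to
    their right. (This linear form is an assumption, not the paper's.)"""
    frac = position / max(seq_len - 1, 1)
    return round(min_steps + frac * (max_steps - min_steps))

schedule = [denoising_steps(i, 10) for i in range(10)]
print(schedule)  # [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
```

Because leftmost tokens need only `min_steps` passes, they are committed early and act as clean conditioning context while the rightmost tokens are still being denoised.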
Stable Dinner Party Seating Arrangements
Authors: Damien Berriaud, Andrei Constantinescu, Roger Wattenhofer
Subjects: Computer Science and Game Theory (cs.GT); Computational Complexity (cs.CC)
Abstract
A group of $n$ agents with numerical preferences for each other are to be assigned to the $n$ seats of a dining table. We study two natural topologies: circular (cycle) tables and panel (path) tables. For a given seating arrangement, an agent's utility is the sum of its preference values towards its (at most two) direct neighbors. An arrangement is envy-free if no agent strictly prefers someone else's seat, and it is stable if no two agents strictly prefer each other's seats. We show that it is NP-complete to decide whether an envy-free arrangement exists for both paths and cycles, even with binary preferences. In contrast, under the assumption that agents come from a bounded number of classes, for both topologies, we present polynomial-time algorithms computing envy-free and stable arrangements, working even for general preferences. Proving the hardness of computing stable arrangements seems more difficult, as even constructing unstable instances can be challenging. To this end, we propose a characterization of the existence of stable arrangements based on the number of distinct values in the preference matrix and the number of classes of agents. For two classes of agents, we show that stability can always be ensured, both for paths and cycles. For cycles, we moreover show that binary preferences with four classes of agents, as well as three-valued preferences with three classes of agents, are sufficient to prevent the existence of a stable arrangement. For paths, the latter still holds, while we argue that a path-stable arrangement always exists in the binary case under the additional constraint that agents can only swap seats when sitting at most two positions away. We moreover consider the swap dynamics and exhibit instances where they do not converge, despite a stable arrangement existing.
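The envy-freeness notion above is easy to operationalize for tiny instances. The sketch below checks arrangements by brute force over all $n!$ seatings (consistent with the NP-completeness result, this only scales to toy sizes); as a simplification, an agent evaluating an adjacent seat counts the current occupants of that seat's neighbors, with self-preference taken to be zero.

```python
from itertools import permutations

def seat_value(agent, seat, arr, pref, cyclic=True):
    # Utility the agent would get in this seat: preferences toward the
    # current occupants of the (at most two) neighboring seats.
    n = len(arr)
    nbrs = ([(seat - 1) % n, (seat + 1) % n] if cyclic
            else [k for k in (seat - 1, seat + 1) if 0 <= k < n])
    return sum(pref[agent][arr[k]] for k in nbrs)

def is_envy_free(arr, pref, cyclic=True):
    # No agent strictly prefers someone else's seat, neighbors as seated.
    n = len(arr)
    return all(
        seat_value(arr[i], i, arr, pref, cyclic)
        >= seat_value(arr[i], j, arr, pref, cyclic)
        for i in range(n) for j in range(n) if i != j
    )

def find_envy_free(pref, cyclic=True):
    # Brute force over all n! arrangements -- toy instances only.
    n = len(pref)
    for arr in permutations(range(n)):
        if is_envy_free(list(arr), pref, cyclic):
            return list(arr)
    return None

# Binary preferences on a 4-seat circular table: agents 0 and 1 like each
# other; seating them next to each other is envy-free.
pref = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(find_envy_free(pref))  # [0, 1, 2, 3]
```

A stability check would be analogous but only compare pairs of agents swapping seats, which is why instances without stable arrangements are harder to construct, as the abstract notes.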
Walking the Walk of AI Ethics: Organizational Challenges and the Individualization of Risk among Ethics Entrepreneurs
Authors: Sanna J. Ali, Angèle Christin, Andrew Smart, Riitta Katila
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
Abstract
Amidst decline in public trust in technology, computing ethics have taken center stage, and critics have raised questions about corporate ethics washing. Yet few studies examine the actual implementation of AI ethics values in technology companies. Based on a qualitative analysis of technology workers tasked with integrating AI ethics into product development, we find that workers experience an environment where policies, practices, and outcomes are decoupled. We analyze AI ethics workers as ethics entrepreneurs who work to institutionalize new ethics-related practices within organizations. We show that ethics entrepreneurs face three major barriers to their work. First, they struggle to have ethics prioritized in an environment centered around software product launches. Second, ethics are difficult to quantify in a context where company goals are incentivized by metrics. Third, the frequent reorganization of teams makes it difficult to access knowledge and maintain relationships central to their work. Consequently, individuals take on great personal risk when raising ethics issues, especially when they come from marginalized backgrounds. These findings shed light on complex dynamics of institutional change at technology companies.
Inductive Graph Neural Networks for Moving Object Segmentation
Authors: Wieke Prummel, Jhony H. Giraldo, Anastasia Zakharova, Thierry Bouwmans
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Moving Object Segmentation (MOS) is a challenging problem in computer vision, particularly in scenarios with dynamic backgrounds, abrupt lighting changes, shadows, camouflage, and moving cameras. While graph-based methods have shown promising results in MOS, they have mainly relied on transductive learning which assumes access to the entire training and testing data for evaluation. However, this assumption is not realistic in real-world applications where the system needs to handle new data during deployment. In this paper, we propose a novel Graph Inductive Moving Object Segmentation (GraphIMOS) algorithm based on a Graph Neural Network (GNN) architecture. Our approach builds a generic model capable of performing prediction on newly added data frames using the already trained model. GraphIMOS outperforms previous inductive learning methods and is more generic than previous transductive techniques. Our proposed algorithm enables the deployment of graph-based MOS models in real-world applications.
The Power of Learned Locally Linear Models for Nonlinear Policy Optimization
Authors: Daniel Pfrommer, Max Simchowitz, Tyler Westenbroek, Nikolai Matni, Stephen Tu
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Abstract
A common pipeline in learning-based control is to iteratively estimate a model of the system dynamics and apply a trajectory optimization algorithm - e.g.~$\mathtt{iLQR}$ - on the learned model to minimize a target cost. This paper conducts a rigorous analysis of a simplified variant of this strategy for general nonlinear systems. We analyze an algorithm that iterates between estimating local linear models of the nonlinear system dynamics and performing $\mathtt{iLQR}$-like policy updates. We demonstrate that this algorithm attains sample complexity polynomial in the relevant problem parameters and, by synthesizing locally stabilizing gains, overcomes exponential dependence on the problem horizon. Experimental results validate the performance of our algorithm and compare it to natural deep-learning baselines.
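One round of the estimate-then-optimize loop above can be sketched in the scalar case: fit a local linear model from rollout data by least squares, then run the discrete Riccati recursion to get an LQR gain. The dynamics, costs, and single-iteration structure are our illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def step(x, u):
    # "Unknown" nonlinear scalar dynamics (illustrative, not the paper's).
    return np.sin(x) + 0.5 * u

# 1) Estimate a local linear model x' ~ a x + b u from data near x = 0.
X = rng.normal(scale=0.1, size=200)
U = rng.normal(scale=0.1, size=200)
Y = step(X, U)
a, b = np.linalg.lstsq(np.column_stack([X, U]), Y, rcond=None)[0]

# 2) Discrete Riccati recursion for the scalar LQR gain (cost q x^2 + r u^2),
#    the policy-update half of the loop.
q, r, p = 1.0, 1.0, 1.0
for _ in range(500):
    k = -(b * p * a) / (r + b * p * b)
    p = q + a * p * a + a * p * b * k
k = -(b * p * a) / (r + b * p * b)

print(abs(a + b * k))  # closed-loop pole magnitude; < 1 means stabilizing
```

Around the origin the true dynamics linearize to roughly $a \approx 1$, $b = 0.5$, and the resulting gain places the closed-loop pole well inside the unit circle; the paper's analysis shows that iterating this fit-and-stabilize step controls error growth over long horizons.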
Keyword: efficient
Spiking Network Initialisation and Firing Rate Collapse
Training Neural Networks without Backpropagation: A Deeper Dive into the Likelihood Ratio Method
Numerical implementation of generalized V-line transforms on 2D vector fields and their inversions
Exploiting high-contrast Stokes preconditioners to efficiently solve incompressible fluid-structure interaction problems
Convex Geometric Trajectory Tracking using Lie Algebraic MPC for Autonomous Marine Vehicles
Gaussian Process Port-Hamiltonian Systems: Bayesian Learning with Physics Prior
Phase Field Modeling and Numerical Algorithm for Two-Phase Dielectric Fluid Flows
Scalable Adaptive Traffic Light Control Over a Traffic Network Including Transit Delays
Adaptive Federated Pruning in Hierarchical Wireless Networks
Physics-informed Convolutional Recurrent Surrogate Model for Reservoir Simulation with Well Controls
Private Training Set Inspection in MLaaS
Is a Video worth $n\times n$ Images? A Highly Efficient Approach to Transformer-based Video Question Answering
Smooth Robustness Measures for Symbolic Control Via Signal Temporal Logic
A lightweight semi-centralized strategy for the massive parallelization of branching algorithms
Graph Reinforcement Learning for Network Control via Bi-Level Optimization
Deep Ensembling for Perceptual Image Quality Assessment
Dual-Alignment Pre-training for Cross-lingual Sentence Embedding
Style Transfer Enabled Sim2Real Framework for Efficient Learning of Robotic Ultrasound Image Analysis Using Simulated Data
Lightweight Self-Knowledge Distillation with Multi-source Information Fusion
Machine learning enhanced real-time aerodynamic forces prediction based on sparse pressure sensor inputs
Can we forget how we learned? Representing states in iterated belief revision
Counterfactual Outcome Prediction using Structured State Space Model
PIQI: Perceptual Image Quality Index based on Ensemble of Gaussian Process Regression
A Multi-Client Searchable Encryption Scheme for IoT Environment
One-shot neural band selection for spectral recovery
A SKG Security Challenge: Indoor SKG Under an On-The-Shoulder Eavesdropping Attack
Challenges with the Application of Cyber Security for Airworthiness (CSA) in Real-World Contexts
Massive Uniform Mesh Decimation via a Fast Intrinsic Delaunay Triangulation
Latent Distribution Adjusting for Face Anti-Spoofing
Adversarial Word Dilution as Text Data Augmentation in Low-Resource Regime
On CSI-Free Multi-Antenna Schemes for Massive Wireless-Powered Underground Sensor Networks
OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research
Energy-Efficient WiFi Backscatter Communication for Green IoTs
Bézout identities and control of the heat equation
Establishing Shared Query Understanding in an Open Multi-Agent System
Multi-task convolutional neural network for image aesthetic assessment
Transformational Supervisor Localization
Towards Automatic Identification of Globally Valid Geometric Flat Outputs via Numerical Optimization
Learning Higher-order Object Interactions for Keypoint-based Video Understanding
Increasing Melanoma Diagnostic Confidence: Forcing the Convolutional Network to Learn from the Lesion
Interactive and Incremental Learning of Spatial Object Relations from Human Demonstrations
InstaLoc: One-shot Global Lidar Localisation in Indoor Environments through Instance Learning
Copula-based Performance Analysis for Fluid Antenna Systems under Arbitrary Fading Channels
Ray-Patch: An Efficient Decoder for Light Field Transformers
Deep Fourier Residual method for solving time-harmonic Maxwell's equations
Accelerating Communications in Federated Applications with Transparent Object Proxies
Urban-StyleGAN: Learning to Generate and Manipulate Images of Urban Scenes
Addressing computational challenges in physical system simulations with machine learning
SoundStorm: Efficient Parallel Audio Generation
Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation
Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage
Keyword: faster
Scalable and Robust Tensor Ring Decomposition for Large-scale Data
Is a Video worth $n\times n$ Images? A Highly Efficient Approach to Transformer-based Video Question Answering
Component Training of Turbo Autoencoders
HyHTM: Hyperbolic Geometry based Hierarchical Topic Models
Chatting with GPT-3 for Zero-Shot Human-Like Mobile Automated GUI Testing
Experiences in Building a Composable and Functional API for Runtime SPIR-V Code Generation
AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation
An Exploratory Study on the Evidence of Hackathons' Role in Solving OSS Newcomers' Challenges
Robust and lightweight audio fingerprint for Automatic Content Recognition
SoundStorm: Efficient Parallel Audio Generation
Keyword: mobile
Exploration of unknown indoor regions by a swarm of energy-constrained drones
Trust-Worthy Semantic Communications for the Metaverse Relying on Federated Learning
Age of Incorrect Information in Semantic Communications for NOMA Aided XR Applications
Chatting with GPT-3 for Zero-Shot Human-Like Mobile Automated GUI Testing
A sequential transit network design algorithm with optimal learning under correlated beliefs
Your Identity is Your Behavior -- Continuous User Authentication based on Machine Learning and Touch Dynamics
Curious Rhythms: Temporal Regularities of Wikipedia Consumption
InstaLoc: One-shot Global Lidar Localisation in Indoor Environments through Instance Learning
Keyword: pruning
Image Matching by Bare Homography
Influential Billboard Slot Selection using Spatial Clustering and Pruned Submodularity Graph
Adaptive Federated Pruning in Hierarchical Wireless Networks
Keyword: voxel
There is no result
Keyword: lidar
Correlation Pyramid Network for 3D Single Object Tracking
Graph-based Global Robot Simultaneous Localization and Mapping using Architectural Plans
InstaLoc: One-shot Global Lidar Localisation in Indoor Environments through Instance Learning
Keyword: diffusion
Spiking Network Initialisation and Firing Rate Collapse
Common Diffusion Noise Schedules and Sample Steps are Flawed
Exploration of unknown indoor regions by a swarm of energy-constrained drones
Denoising Diffusion Models for Plug-and-Play Image Restoration
A deep learning method for multi-material diffusion problems based on physics-informed neural networks
CDDM: Channel Denoising Diffusion Models for Wireless Communications
Unlearnable Examples Give a False Sense of Security: Piercing through Unexploitable Data with Learnable Examples
AMD Autoregressive Motion Diffusion
Diffusion Dataset Generation: Towards Closing the Sim2Real Gap for Pedestrian Detection
Multi-Level Global Context Cross Consistency Model for Semi-Supervised Ultrasound Image Segmentation with Diffusion Model
Discrete Diffusion Probabilistic Models for Symbolic Music Generation
AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation
Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation
Keyword: dynamic
a general model of vehicle routing guidance systems based on distributive learning scheme
Fully analog photonic deep Reservoir Computer based on frequency multiplexing
Event Camera-based Visual Odometry for Dynamic Motion Tracking of a Legged Robot Using Adaptive Time Surface
Deliberation and Voting in Approval-Based Multi-Winner Elections
Exponential Integrators for Phase-Field Equations using Pseudo-spectral Methods: A Python Implementation
Equilibria of Fully Decentralized Learning in Networked Systems
Physics-enhanced Gaussian Process Variational Autoencoder
Convex Geometric Trajectory Tracking using Lie Algebraic MPC for Autonomous Marine Vehicles
Gaussian Process Port-Hamiltonian Systems: Bayesian Learning with Physics Prior
Phase Field Modeling and Numerical Algorithm for Two-Phase Dielectric Fluid Flows
Algorithmic Censoring in Dynamic Learning Systems
Physics-informed Convolutional Recurrent Surrogate Model for Reservoir Simulation with Well Controls
Self-Supervised Pretraining on Paired Sequences of fMRI Data for Transfer Learning to Brain Decoding Tasks
Learning Linear Embeddings for Non-Linear Network Dynamics with Koopman Message Passing
Smooth Robustness Measures for Symbolic Control Via Signal Temporal Logic
Graph Reinforcement Learning for Network Control via Bi-Level Optimization
Constructing Feedback Linearizable Discretizations for Continuous-Time Systems using Retraction Maps
Modelling Human Visual Motion Processing with Trainable Motion Energy Sensing and a Self-attention Network for Adaptive Motion Integration
Fusion-Based Multi-User Semantic Communications for Wireless Image Transmission over Degraded Broadcast Channels
Static Pricing Guarantees for Queueing Systems
Cooperative Aerial Transportation of Nonuniform Load through Quadrotors by Elastic and Flexible Cables
Ortho-ODE: Enhancing Robustness and of Neural ODEs against Adversarial Attacks
Information Energy Ratio of XOR Logic Gate at Mesoscopic Scale
Capacitor Voltage Synchronizing Control of 100% Full-Scale Wind Power Generator-Supplied Power Systems
Machine learning enhanced real-time aerodynamic forces prediction based on sparse pressure sensor inputs
A Multi-Client Searchable Encryption Scheme for IoT Environment
A Dynamically Similar Lab-Scale District Heating Network via Dimensional Analysis
Numerical solution of Poisson partial differential equations in high dimension using deep neural networks
Chatting with GPT-3 for Zero-Shot Human-Like Mobile Automated GUI Testing
STLCCP: An Efficient Convex Optimization-based Framework for Signal Temporal Logic Specifications
Integrated Planning and Control of Robotic Surgical Instruments for Tasks Autonomy
Your Identity is Your Behavior -- Continuous User Authentication based on Machine Learning and Touch Dynamics
Experiences in Building a Composable and Functional API for Runtime SPIR-V Code Generation
Curious Rhythms: Temporal Regularities of Wikipedia Consumption
AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation
Stable Dinner Party Seating Arrangements
Walking the Walk of AI Ethics: Organizational Challenges and the Individualization of Risk among Ethics Entrepreneurs
Inductive Graph Neural Networks for Moving Object Segmentation
The Power of Learned Locally Linear Models for Nonlinear Policy Optimization