Abstract
Modern smart home control systems utilize real-time occupancy and activity monitoring to ensure control efficiency, occupants' comfort, and optimal energy consumption. Moreover, adopting machine learning-based anomaly detection models (ADMs) enhances security and reliability. However, sufficient system knowledge allows adversaries/attackers to alter sensor measurements through stealthy false data injection (FDI) attacks. Although ADMs limit attack scopes, the availability of information like occupants' location, conducted activities, and alteration capability of smart appliances increase the attack surface. Therefore, performing an attack space analysis of modern home control systems is crucial to design robust defense solutions. However, state-of-the-art analyzers do not consider contemporary control and defense solutions and generate trivial attack vectors. To address this, we propose a control and defense-aware novel attack analysis framework for a modern smart home control system, efficiently extracting ADM rules. We verify and validate our framework using a state-of-the-art dataset and a prototype testbed.
Integrating Node Importance and Network Topological Properties for Link Prediction in Complex Network
Authors: Zhu Junxi, Dai Fang, Zhao Fengqun, Guo Wenyan
Subjects: Social and Information Networks (cs.SI); Optimization and Control (math.OC)
Abstract
Link prediction is one of the most important and challenging tasks in complex network analysis, which aims to predict the likelihood of the existence of missing links based on the known information in the network. As critical topological properties in the network, node degree and clustering coefficient are well-suited for describing the tightness of connection between nodes. The node importance can affect the possibility of link existence to a certain extent. By analyzing the impact of different centrality on links, which concluded that the degree centrality and proximity centrality have the greatest influence on link prediction. So, a link prediction algorithm combines node importance and attribute, called DCCLP, is proposed in this paper. In the training phase of the DCCLP algorithm, the maximized AUC indicator in the training set as the objective, and the optimal parameters are estimated by utilizing the White Shark Optimization algorithm. Then the prediction accuracy of the DCCLP algorithm is evaluated in the test set. By experimenting on twenty-one networks with different scales, and comparing with existing algorithms, the experimental results show that the effectiveness and feasibility of DCCLP algorithm, and further illustrate the importance of the degree centrality of node pairs and proximity centrality of nodes to improve the prediction accuracy of link prediction.
Adversarial Security and Differential Privacy in mmWave Beam Prediction in 6G networks
Authors: Ghanta Sai Krishna, Kundrapu Supriya, Sanskar Singh, Sabur Baidya
Abstract
In the forthcoming era of 6G, the mmWave communication is envisioned to be used in dense user scenarios with high bandwidth requirements, that necessitate efficient and accurate beam prediction. Machine learning (ML) based approaches are ushering as a critical solution for achieving such efficient beam prediction for 6G mmWave communications. However, most contemporary ML classifiers are quite susceptible to adversarial inputs. Attackers can easily perturb the methodology through noise addition in the model itself. To mitigate this, the current work presents a defensive mechanism for attenuating the adversarial attacks against projected ML-based models for mmWave beam anticipation by incorporating adversarial training. Furthermore, as training 6G mmWave beam prediction model necessitates the use of large and comprehensive datasets that could include sensitive information regarding the user's location, differential privacy (DP) has been introduced as a technique to preserve the confidentiality of the information by purposefully adding a low sensitivity controlled noise in the datasets. It ensures that even if the information about a user location could be retrieved, the attacker would have no means to determine whether the information is significant or meaningless. With ray-tracing simulations for various outdoor and indoor scenarios, we illustrate the advantage of our proposed novel framework in terms of beam prediction accuracy and effective achievable rate while ensuring the security and privacy in communications.
Random Edge Coding: One-Shot Bits-Back Coding of Large Labeled Graphs
Authors: Daniel Severo, James Townsend, Ashish Khisti, Alireza Makhzani
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT)
Abstract
We present a one-shot method for compressing large labeled graphs called Random Edge Coding. When paired with a parameter-free model based on P\'olya's Urn, the worst-case computational and memory complexities scale quasi-linearly and linearly with the number of observed edges, making it efficient on sparse graphs, and requires only integer arithmetic. Key to our method is bits-back coding, which is used to sample edges and vertices without replacement from the edge-list in a way that preserves the structure of the graph. Optimality is proven under a class of random graph models that are invariant to permutations of the edges and of vertices within an edge. Experiments indicate Random Edge Coding can achieve competitive compression performance on real-world network datasets and scales to graphs with millions of nodes and edges.
FedHGN: A Federated Framework for Heterogeneous Graph Neural Networks
Authors: Xinyu Fu, Irwin King
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Social and Information Networks (cs.SI)
Abstract
Heterogeneous graph neural networks (HGNNs) can learn from typed and relational graph data more effectively than conventional GNNs. With larger parameter spaces, HGNNs may require more training data, which is often scarce in real-world applications due to privacy regulations (e.g., GDPR). Federated graph learning (FGL) enables multiple clients to train a GNN collaboratively without sharing their local data. However, existing FGL methods mainly focus on homogeneous GNNs or knowledge graph embeddings; few have considered heterogeneous graphs and HGNNs. In federated heterogeneous graph learning, clients may have private graph schemas. Conventional FL/FGL methods attempting to define a global HGNN model would violate schema privacy. To address these challenges, we propose FedHGN, a novel and general FGL framework for HGNNs. FedHGN adopts schema-weight decoupling to enable schema-agnostic knowledge sharing and employs coefficients alignment to stabilize the training process and improve HGNN performance. With better privacy preservation, FedHGN consistently outperforms local training and conventional FL methods on three widely adopted heterogeneous graph datasets with varying client numbers. The code is available at https://github.com/cynricfu/FedHGN .
ADDSL: Hand Gesture Detection and Sign Language Recognition on Annotated Danish Sign Language
Abstract
For a long time, detecting hand gestures and recognizing them as letters or numbers has been a challenging task. This creates communication barriers for individuals with disabilities. This paper introduces a new dataset, the Annotated Dataset for Danish Sign Language (ADDSL). Annota-tions for the dataset were made using the open-source tool LabelImg in the YOLO format. Using this dataset, a one-stage ob-ject detector model (YOLOv5) was trained with the CSP-DarkNet53 backbone and YOLOv3 head to recognize letters (A-Z) and numbers (0-9) using only seven unique images per class (without augmen-tation). Five models were trained with 350 epochs, resulting in an average inference time of 9.02ms per image and a best accu-racy of 92% when compared to previous research. Our results show that modified model is efficient and more accurate than existing work in the same field. The code repository for our model is available at the GitHub repository https://github.com/s4nyam/pvt-addsl.
CQural: A Novel CNN based Hybrid Architecture for Quantum Continual Machine Learning
Abstract
Training machine learning models in an incremental fashion is not only important but also an efficient way to achieve artificial general intelligence. The ability that humans possess of continuous or lifelong learning helps them to not forget previously learned tasks. However, current neural network models are prone to catastrophic forgetting when it comes to continual learning. Many researchers have come up with several techniques in order to reduce the effect of forgetting from neural networks, however, all techniques are studied classically with a very less focus on changing the machine learning model architecture. In this research paper, we show that it is not only possible to circumvent catastrophic forgetting in continual learning with novel hybrid classical-quantum neural networks, but also explains what features are most important to learn for classification. In addition, we also claim that if the model is trained with these explanations, it tends to give better performance and learn specific features that are far from the decision boundary. Finally, we present the experimental results to show comparisons between classical and classical-quantum hybrid architectures on benchmark MNIST and CIFAR-10 datasets. After successful runs of learning procedure, we found hybrid neural network outperforms classical one in terms of remembering the right evidences of the class-specific features.
Distributionally Robust Differential Dynamic Programming with Wasserstein Distance
Authors: Astghik Hakobyan, Insoon Yang
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract
Differential dynamic programming (DDP) is a popular technique for solving nonlinear optimal control problems with locally quadratic approximations. However, existing DDP methods are not designed for stochastic systems with unknown disturbance distributions. To address this limitation, we propose a novel DDP method that approximately solves the Wasserstein distributionally robust control (WDRC) problem, where the true disturbance distribution is unknown but a disturbance sample dataset is given. Our approach aims to develop a practical and computationally efficient DDP solution. To achieve this, we use the Kantrovich duality principle to decompose the value function in a novel way and derive closed-form expressions of the distributionally robust control and worst-case distribution policies to be used in each iteration of our DDP algorithm. This characterization makes our method tractable and scalable without the need for numerically solving any minimax optimization problems. The superior out-of-sample performance and scalability of our algorithm are demonstrated through kinematic car navigation and coupled oscillator problems.
NerfBridge: Bringing Real-time, Online Neural Radiance Field Training to Robotics
Authors: Javier Yu, Jun En Low, Keiko Nagami, Mac Schwager
Abstract
This work was presented at the IEEE International Conference on Robotics and Automation 2023 Workshop on Unconventional Spatial Representations. Neural radiance fields (NeRFs) are a class of implicit scene representations that model 3D environments from color images. NeRFs are expressive, and can model the complex and multi-scale geometry of real world environments, which potentially makes them a powerful tool for robotics applications. Modern NeRF training libraries can generate a photo-realistic NeRF from a static data set in just a few seconds, but are designed for offline use and require a slow pose optimization pre-computation step. In this work we propose NerfBridge, an open-source bridge between the Robot Operating System (ROS) and the popular Nerfstudio library for real-time, online training of NeRFs from a stream of images. NerfBridge enables rapid development of research on applications of NeRFs in robotics by providing an extensible interface to the efficient training pipelines and model libraries provided by Nerfstudio. As an example use case we outline a hardware setup that can be used NerfBridge to train a NeRF from images captured by a camera mounted to a quadrotor in both indoor and outdoor environments. For accompanying video https://youtu.be/EH0SLn-RcDg and code https://github.com/javieryu/nerf_bridge.
Shortest Path to Boundary for Self-Intersecting Meshes
Abstract
We introduce a method for efficiently computing the exact shortest path to the boundary of a mesh from a given internal point in the presence of self-intersections. We provide a formal definition of shortest boundary paths for self-intersecting objects and present a robust algorithm for computing the actual shortest boundary path. The resulting method offers an effective solution for collision and self-collision handling while simulating deformable volumetric objects, using fast simulation techniques that provide no guarantees on collision resolution. Our evaluation includes complex self-collision scenarios with a large number of active contacts, showing that our method can successfully handle them by introducing a relatively minor computational overhead.
A Scalable Walsh-Hadamard Regularizer to Overcome the Low-degree Spectral Bias of Neural Networks
Authors: Ali Gorji, Andisheh Amrollahi, Andreas Krause
Abstract
Despite the capacity of neural nets to learn arbitrary functions, models trained through gradient descent often exhibit a bias towards ``simpler'' functions. Various notions of simplicity have been introduced to characterize this behavior. Here, we focus on the case of neural networks with discrete (zero-one) inputs through the lens of their Fourier (Walsh-Hadamard) transforms, where the notion of simplicity can be captured through the \emph{degree} of the Fourier coefficients. We empirically show that neural networks have a tendency to learn lower-degree frequencies. We show how this spectral bias towards simpler features can in fact \emph{hurt} the neural network's generalization on real-world datasets. To remedy this we propose a new scalable functional regularization scheme that aids the neural network to learn higher degree frequencies. Our regularizer also helps avoid erroneous identification of low-degree frequencies, which further improves generalization. We extensively evaluate our regularizer on synthetic datasets to gain insights into its behavior. Finally, we show significantly improved generalization on four different datasets compared to standard neural networks and other relevant baselines.
A Note on Dimensionality Reduction in Deep Neural Networks using Empirical Interpolation Method
Abstract
Empirical interpolation method (EIM) is a well-known technique to efficiently approximate parameterized functions. This paper proposes to use EIM algorithm to efficiently reduce the dimension of the training data within supervised machine learning. This is termed as DNN-EIM. Applications in data science (e.g., MNIST) and parameterized (and time-dependent) partial differential equations (PDEs) are considered. The proposed DNNs in case of classification are trained in parallel for each class. This approach is sequential, i.e., new classes can be added without having to retrain the network. In case of PDEs, a DNN is designed corresponding to each EIM point. Again, these networks can be trained in parallel, for each EIM point. In all cases, the parallel networks require fewer than ten times the number of training weights. Significant gains are observed in terms of training times, without sacrificing accuracy.
Accessible Interfaces for the Development and Deployment of Robotic Platforms
Abstract
Accessibility is one of the most important features in the design of robots and their interfaces. This thesis proposes methods that improve the accessibility of robots for three different target audiences: consumers, researchers, and learners. In order for humans and robots to work together effectively, they both must be able to communicate with each other. We tackle the problem of generating route instructions that are readily understandable by novice humans for the navigation of a priori unknown indoor environments. We then move on to the related problem of enabling robots to understand natural language utterances in the context of learning to operate articulated objects (e.g., fridges, drawers) by leveraging kinematic models. Next, we turn our focus to the development of accessible and reproducible robotic platforms for scientific research. We propose a new concept for reproducible robotics research that integrates development and benchmarking, so that reproducibility is obtained "by design" from the beginning of the research and development process. We then propose a framework called SHARC (SHared Autonomy for Remote Collaboration), to improve accessibility for underwater robotic intervention operations. SHARC allows multiple remote scientists to efficiently plan and execute high-level sampling procedures using an underwater manipulator while deferring low-level control to the robot. Lastly, we developed the first hardware-based MOOC in AI and robotics. This course allows learners to study autonomy hands-on by making real robots make their own decisions and accomplish broadly defined tasks. We design a new robotic platform from the ground up to support this new learning experience. A fully browser-based interface, based on leading tools and technologies for code development, testing, validation, and deployment serves to maximize the accessibility of these educational resources.
MINT: Multiplier-less Integer Quantization for Spiking Neural Networks
Authors: Ruokai Yin, Yuhang Li, Abhishek Moitra, Priyadarshini Panda
Subjects: Neural and Evolutionary Computing (cs.NE)
Abstract
We propose Multiplier-less INTeger (MINT) quantization, an efficient uniform quantization scheme for the weights and membrane potentials in spiking neural networks (SNNs). Unlike prior SNN quantization works, MINT quantizes the memory-hungry membrane potentials to extremely low bit-width (2-bit) to significantly reduce the total memory footprint. Additionally, MINT quantization shares the quantization scale between the weights and membrane potentials, eliminating the need for multipliers and floating arithmetic units, which are required by the standard uniform quantization. Experimental results demonstrate that our proposed method achieves accuracy that matches other state-of-the-art SNN quantization works while outperforming them on total memory footprint and hardware cost at deployment time. For instance, 2-bit MINT VGG-16 achieves 48.6% accuracy on TinyImageNet (0.28% better than the full-precision baseline) with approximately 93.8% reduction in total memory footprint from the full-precision model; meanwhile, our model reduces area by 93% and dynamic power by 98% compared to other SNN quantization counterparts.
Crossing the Reality Gap in Tactile-Based Learning
Authors: Ya-Yen Tsai, Bidan Huang, Yu Zheng, Lei Han, Wang Wei Lee, Edward Johns
Abstract
Tactile sensors are believed to be essential in robotic manipulation, and prior works often rely on experts to reason the sensor feedback and design a controller. With the recent advancement in data-driven approaches, complicated manipulation can be realised, but an accurate and efficient tactile simulation is necessary for policy training. To this end, we present an approach to model a commonly used pressure sensor array in simulation and to train a tactile-based manipulation policy with sim-to-real transfer in mind. Each taxel in our model is represented as a mass-spring-damper system, in which the parameters are iteratively identified as plausible ranges. This allows a policy to be trained with domain randomisation which improves its robustness to different environments. Then, we introduce encoders to further align the critical tactile features in a latent space. Finally, our experiments answer questions on tactile-based manipulation, tactile modelling and sim-to-real performance.
Equivariant Few-Shot Learning from Pretrained Models
Abstract
Efficient transfer learning algorithms are key to the success of foundation models on diverse downstream tasks even with limited data. Recent works of \cite{basu2022equi} and \cite{kaba2022equivariance} propose group averaging (\textit{equitune}) and optimization-based methods, respectively, over features from group-transformed inputs to obtain equivariant outputs from non-equivariant neural networks. While \cite{kaba2022equivariance} are only concerned with training from scratch, we find that equitune performs poorly on equivariant zero-shot tasks despite good finetuning results. We hypothesize that this is because pretrained models provide better quality features for certain transformations than others and simply averaging them is deleterious. Hence, we propose $\lambda$-\textit{equitune} that averages the features using \textit{importance weights}, $\lambda$s. These weights are learned directly from the data using a small neural network, leading to excellent zero-shot and finetuned results that outperform equitune. Further, we prove that $\lambda$-equitune is equivariant and a universal approximator of equivariant functions. Additionally, we show that the method of \cite{kaba2022equivariance} used with appropriate loss functions, which we call \textit{equizero}, also gives excellent zero-shot and finetuned performance. Both equitune and equizero are special cases of $\lambda$-equitune. To show the simplicity and generality of our method, we validate on a wide range of diverse applications and models such as 1) image classification using CLIP, 2) deep Q-learning, 3) fairness in natural language generation (NLG), 4) compositional generalization in languages, and 5) image classification using pretrained CNNs such as Resnet and Alexnet.
Abstract
Recently, Transformers have emerged as the go-to architecture for both vision and language modeling tasks, but their computational efficiency is limited by the length of the input sequence. To address this, several efficient variants of Transformers have been proposed to accelerate computation or reduce memory consumption while preserving performance. This paper presents an efficient vision Transformer, called CageViT, that is guided by convolutional activation to reduce computation. Our CageViT, unlike current Transformers, utilizes a new encoder to handle the rearranged tokens, bringing several technical contributions: 1) Convolutional activation is used to pre-process the token after patchifying the image to select and rearrange the major tokens and minor tokens, which substantially reduces the computation cost through an additional fusion layer. 2) Instead of using the class activation map of the convolutional model directly, we design a new weighted class activation to lower the model requirements. 3) To facilitate communication between major tokens and fusion tokens, Gated Linear SRA is proposed to further integrate fusion tokens into the attention mechanism. We perform a comprehensive validation of CageViT on the image classification challenge. Experimental results demonstrate that the proposed CageViT outperforms the most recent state-of-the-art backbones by a large margin in terms of efficiency, while maintaining a comparable level of accuracy (e.g. a moderate-sized 43.35M model trained solely on 224 x 224 ImageNet-1K can achieve Top-1 accuracy of 83.4% accuracy).
Asynchronous Grant-Free Random Access: Receiver Design with Partially Uni-Directional Message Passing and Interference Suppression Analysis
Abstract
Massive Machine-Type Communications (mMTC) features a massive number of low-cost user equipments (UEs) with sparse activity. Tailor-made for these features, grant-free random access (GF-RA) serves as an efficient access solution for mMTC. However, most existing GF-RA schemes rely on strict synchronization, which incurs excessive coordination burden for the low-cost UEs. In this work, we propose a receiver design for asynchronous GF-RA, and address the joint user activity detection (UAD) and channel estimation (CE) problem in the presence of asynchronization-induced inter-symbol interference. Specifically, the delay profile is exploited at the receiver to distinguish different UEs. However, a sample correlation problem in this receiver design impedes the factorization of the joint likelihood function, which complicates the UAD and CE problem. To address this correlation problem, we design a partially uni-directional (PUD) factor graph representation for the joint likelihood function. Building on this PUD factor graph, we further propose a PUD message passing based sparse Bayesian learning (SBL) algorithm for asynchronous UAD and CE (PUDMP-SBL-aUADCE). Our theoretical analysis shows that the PUDMP-SBL-aUADCE algorithm exhibits higher signal-to-interference-and-noise ratio (SINR) in the asynchronous case than in the synchronous case, i.e., the proposed receiver design can exploit asynchronization to suppress multi-user interference. In addition, considering potential timing error from the low-cost UEs, we investigate the impacts of imperfect delay profile, and reveal the advantages of adopting the SBL method in this case. Finally, extensive simulation results are provided to demonstrate the performance of the PUDMP-SBL-aUADCE algorithm.
Singly Exponential Translation of Alternating Weak Büchi Automata to Unambiguous Büchi Automata
Authors: Yong Li, Sven Schewe, Moshe Y. Vardi
Subjects: Formal Languages and Automata Theory (cs.FL)
Abstract
We introduce a method for translating an alternating weak B\"uchi automaton (AWA), which corresponds to a Linear Dynamic Logic (LDL) formula, to an unambiguous B\"uchi automaton (UBA). Our translations generalise constructions for Linear Temporal Logic (LTL), a less expressive specification language than LDL. In classical constructions, LTL formulas are first translated to alternating \emph{very weak} automata (AVAs) -- automata that have only singleton strongly connected components (SCCs); the AVAs are then handled by efficient disambiguation procedures. However, general AWAs can have larger SCCs, which complicates disambiguation. Currently, the only available disambiguation procedure has to go through an intermediate construction of nondeterministic B\"uchi automata (NBAs), which would incur an exponential blow-up of its own. We introduce a translation from \emph{general} AWAs to UBAs with a \emph{singly} exponential blow-up, which also immediately provides a singly exponential translation from LDL to UBAs. Interestingly, the complexity of our translation is \emph{smaller} than the best known disambiguation algorithm for NBAs (broadly $(0.53n)^n$ vs. $(0.76n)^n$), while the input of our construction can be exponentially more succinct.
Border Complexity of Symbolic Determinant under Rank One Restriction
Abstract
VBP is the class of polynomial families that can be computed by the determinant of a symbolic matrix of the form $A0 + \sum{i=1}^n A_ix_i$ where the size of each $A_i$ is polynomial in the number of variables (equivalently, computable by polynomial-sized algebraic branching programs (ABP)). A major open problem in geometric complexity theory (GCT) is to determine whether VBP is closed under approximation. The power of approximation is well understood for some restricted models of computation, e.g., the class of depth-two circuits, read-once oblivious ABPs (ROABP), monotone ABPs, depth-three circuits of bounded top fan-in, and width-two ABPs. The former three classes are known to be closed under approximation [Bl"{a}ser, Ikenmeyer, Mahajan, Pandey, and Saurabh (2020)], whereas the approximative closure of the last one captures the whole class of polynomial families computable by polynomial-sized formulas [Bringmann, Ikenmeyer, and Zuiddam (2017)]. In this work, we consider the subclass of VBP computed by the determinant of a symbolic matrix of the form $A0 + \sum{i=1}^n A_ix_i$ where for each $1\leq i \leq n$, $A_i$ is of rank one. It has been studied extensively [Edmonds(1968), Edmonds(1979)] and efficient identity testing algorithms are known [Lov"{a}sz (1989), Gurjar and Thierauf (2020)]. We show that this class is closed under approximation. In the language of algebraic geometry, we show that the set obtained by taking coordinatewise products of pairs of points from (the Pl\"{u}cker embedding of) a Grassmannian variety is closed.
River of No Return: Graph Percolation Embeddings for Efficient Knowledge Graph Reasoning
Abstract
We study Graph Neural Networks (GNNs)-based embedding techniques for knowledge graph (KG) reasoning. For the first time, we link the path redundancy issue in the state-of-the-art KG reasoning models based on path encoding and message passing to the transformation error in model training, which brings us new theoretical insights into KG reasoning, as well as high efficacy in practice. On the theoretical side, we analyze the entropy of transformation error in KG paths and point out query-specific redundant paths causing entropy increases. These findings guide us to maintain the shortest paths and remove redundant paths for minimized-entropy message passing. To achieve this goal, on the practical side, we propose an efficient Graph Percolation Process motivated by the percolation model in Fluid Mechanics, and design a lightweight GNN-based KG reasoning framework called Graph Percolation Embeddings (GraPE). GraPE outperforms previous state-of-the-art methods in both transductive and inductive reasoning tasks while requiring fewer training parameters and less inference time.
Abstract
Let $T$ be a matrix whose entries are linear forms over the noncommutative variables $x_1, x_2, \ldots, x_n$. The noncommutative Edmonds' problem (NSINGULAR) aims to determine whether $T$ is invertible in the free skew field generated by $x_1,x_2,\ldots,x_n$. Currently, there are three different deterministic polynomial-time algorithms to solve this problem: using operator scaling [Garg, Gurvits, Oliveira, and Wigserdon (2016)], algebraic methods [Ivanyos, Qiao, and Subrahmanyam (2018)], and convex optimization [Hamada and Hirai (2021)]. In this paper, we present a simpler algorithm for the NSINGULAR problem. While our algorithmic template is similar to the one in Ivanyos et. al.(2018), it significantly differs in its implementation of the rank increment step. Instead of computing the limit of a second Wong sequence, we reduce the problem to the polynomial identity testing (PIT) of noncommutative algebraic branching programs (ABPs). This enables us to bound the bit-complexity of the algorithm over $\mathbb{Q}$ without requiring special care. Moreover, the rank increment step can be implemented in quasipolynomial-time even without an explicit description of the coefficient matrices in $T$. This is possible by exploiting the connection with the black-box PIT of noncommutative ABPs [Forbes and Shpilka (2013)].
EfficientSCI: Densely Connected Network with Space-time Factorization for Large-scale Video Snapshot Compressive Imaging
Authors: Lishun Wang, Miao Cao, Xin Yuan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Image and Video Processing (eess.IV)
Abstract
Video snapshot compressive imaging (SCI) uses a two-dimensional detector to capture consecutive video frames during a single exposure time. Following this, an efficient reconstruction algorithm needs to be designed to reconstruct the desired video frames. Although recent deep learning-based state-of-the-art (SOTA) reconstruction algorithms have achieved good results in most tasks, they still face the following challenges due to excessive model complexity and GPU memory limitations: 1) these models need high computational cost, and 2) they are usually unable to reconstruct large-scale video frames at high compression ratios. To address these issues, we develop an {\bf{\em efficient network}} for video SCI by using {\bf {\em dense connections and space-time factorization mechanism}} within a single residual block, dubbed {\bf \emph{EfficientSCI}}. The EfficientSCI network can well establish spatial-temporal correlation by using {\bf {\em convolution in the spatial domain and Transformer in the temporal domain}}, respectively. We are the first time to show that an UHD color video with high compression ratio can be reconstructed from a snapshot 2D measurement using a single end-to-end deep learning model with PSNR above 32 dB. Extensive results on both simulation and real data show that our method significantly outperforms all previous SOTA algorithms with better real-time performance. The code is at \url{https://github.com/ucaswangls/EfficientSCI.git}.
An efficient solver for ASP(Q)
Authors: Wolfgang Faber, Giuseppe Mazzotta, Francesco Ricca
Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
Abstract
Answer Set Programming with Quantifiers ASP(Q) extends Answer Set Programming (ASP) to allow for declarative and modular modeling of problems from the entire polynomial hierarchy. The first implementation of ASP(Q), called qasp, was based on a translation to Quantified Boolean Formulae (QBF) with the aim of exploiting the well-developed and mature QBF-solving technology. However, the implementation of the QBF encoding employed in qasp is very general and might produce formulas that are hard to evaluate for existing QBF solvers because of the large number of symbols and sub-clauses. In this paper, we present a new implementation that builds on the ideas of qasp and features both a more efficient encoding procedure and new optimized encodings of ASP(Q) programs in QBF. The new encodings produce smaller formulas (in terms of the number of quantifiers, variables, and clauses) and result in a more efficient evaluation process. An algorithm selection strategy automatically combines several QBF-solving back-ends to further increase performance. An experimental analysis, conducted on known benchmarks, shows that the new system outperforms qasp.
An Efficient Solution Space Exploring and Descent Method for Packing Equal Spheres in a Sphere
Authors: Jianrong Zhou, Shuo Ren, Kun He, Yanli Liu, Chu-Min Li
Abstract
The problem of packing equal spheres in a spherical container is a classic global optimization problem, which has attracted enormous studies in academia and found various applications in industry. This problem is computationally challenging, and many efforts focus on small-scale instances with the number of spherical items less than 200 in the literature. In this work, we propose an efficient local search heuristic algorithm named solution space exploring and descent for solving this problem, which can quantify the solution's quality to determine the number of exploring actions and quickly discover a high-quality solution. Besides, we propose an adaptive neighbor object maintenance method to speed up the convergence of the continuous optimization process and reduce the time consumption. Computational experiments on a large number of benchmark instances with $5 \leq n \leq 400$ spherical items show that our algorithm significantly outperforms the state-of-the-art algorithm. In particular, it improves the 274 best-known results and matches the 84 best-known results out of the 396 well-known benchmark instances.
SHoP: A Deep Learning Framework for Solving High-order Partial Differential Equations
Abstract
Solving partial differential equations (PDEs) has been a fundamental problem in computational science and of wide applications for both scientific and engineering research. Due to its universal approximation property, neural network is widely used to approximate the solutions of PDEs. However, existing works are incapable of solving high-order PDEs due to insufficient calculation accuracy of higher-order derivatives, and the final network is a black box without explicit explanation. To address these issues, we propose a deep learning framework to solve high-order PDEs, named SHoP. Specifically, we derive the high-order derivative rule for neural network, to get the derivatives quickly and accurately; moreover, we expand the network into a Taylor series, providing an explicit solution for the PDEs. We conduct experimental validations four high-order PDEs with different dimensions, showing that we can solve high-order PDEs efficiently and accurately.
Optimized Joint Beamforming for Wireless Powered Over-the-Air Computation
Authors: Siyao Zhang, Xinmin Li, Yin Long, Jie Xu, Shuguang Cui
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
This correspondence studies the wireless powered over-the-air computation (AirComp) for achieving sustainable wireless data aggregation (WDA) by integrating AirComp and wireless power transfer (WPT) into a joint design. In particular, we consider that a multi-antenna hybrid access point (HAP) employs the transmit energy beamforming to charge multiple single-antenna low-power wireless devices (WDs) in the downlink, and the WDs use the harvested energy to simultaneously send their messages to the HAP for AirComp in the uplink. Under this setup, we minimize the computation mean square error (MSE), by jointly optimizing the transmit energy beamforming and the receive AirComp beamforming at the HAP, as well as the transmit power at the WDs, subject to the maximum transmit power constraint at the HAP and the wireless energy harvesting constraints at individual WDs. To tackle the non-convex computation MSE minimization problem, we present an efficient algorithm to find a converged high-quality solution by using the alternating optimization technique. Numerical results show that the proposed joint WPT-AirComp approach significantly reduces the computation MSE, as compared to other benchmark schemes.
Synthesizing Resilient Strategies for Infinite-Horizon Objectives in Multi-Agent Systems
Authors: David Klaška, Antonín Kučera, Martin Kurečka, Vít Musil, Petr Novotný, Vojtěch Řehák
Abstract
We consider the problem of synthesizing resilient and stochastically stable strategies for systems of cooperating agents striving to minimize the expected time between consecutive visits to selected locations in a known environment. A strategy profile is resilient if it retains its functionality even if some of the agents fail, and stochastically stable if the visiting time variance is small. We design a novel specification language for objectives involving resilience and stochastic stability, and we show how to efficiently compute strategy profiles (for both autonomous and coordinated agents) optimizing these objectives. Our experiments show that our strategy synthesis algorithm can construct highly non-trivial and efficient strategy profiles for environments with general topology.
CWD30: A Comprehensive and Holistic Dataset for Crop Weed Recognition in Precision Agriculture
Authors: Talha Ilyas, Dewa Made Sri Arsa, Khubaib Ahmad, Yong Chae Jeong, Okjae Won, Jong Hoon Lee, Hyongsuk Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The growing demand for precision agriculture necessitates efficient and accurate crop-weed recognition and classification systems. Current datasets often lack the sample size, diversity, and hierarchical structure needed to develop robust deep learning models for discriminating crops and weeds in agricultural fields. Moreover, the similar external structure and phenomics of crops and weeds complicate recognition tasks. To address these issues, we present the CWD30 dataset, a large-scale, diverse, holistic, and hierarchical dataset tailored for crop-weed recognition tasks in precision agriculture. CWD30 comprises over 219,770 high-resolution images of 20 weed species and 10 crop species, encompassing various growth stages, multiple viewing angles, and environmental conditions. The images were collected from diverse agricultural fields across different geographic locations and seasons, ensuring a representative dataset. The dataset's hierarchical taxonomy enables fine-grained classification and facilitates the development of more accurate, robust, and generalizable deep learning models. We conduct extensive baseline experiments to validate the efficacy of the CWD30 dataset. Our experiments reveal that the dataset poses significant challenges due to intra-class variations, inter-class similarities, and data imbalance. Additionally, we demonstrate that minor training modifications like using CWD30 pretrained backbones can significantly enhance model performance and reduce convergence time, saving training resources on several downstream tasks. These challenges provide valuable insights and opportunities for future research in crop-weed recognition. We believe that the CWD30 dataset will serve as a benchmark for evaluating crop-weed recognition algorithms, promoting advancements in precision agriculture, and fostering collaboration among researchers in the field.
Adaptive aggregation of Monte Carlo augmented decomposed filters for efficient group-equivariant convolutional neural network
Authors: Wenzhao Zhao, Barbara D. Wichtmann, Steffen Albert, Angelika Maurer, Frank G. Zöllner, Ulrike Attenberger, Jürgen Hesser
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Filter-decomposition-based group-equivariant convolutional neural networks (G-CNN) have been demonstrated to increase CNN's data efficiency and contribute to better interpretability and controllability of CNN models. However, so far filter-decomposition-based affine G-CNN methods rely on parameter sharing for achieving high parameter efficiency and suffer from a heavy computational burden. They also use a limited number of transformations and in particular ignore the shear transform in the application. In this paper, we address these problems by emphasizing the importance of the diversity of transformations. We propose a flexible and efficient strategy based on weighted filter-wise Monte Carlo sampling. In addition, we introduce shear equivariant CNN to address the highly sparse representations of natural images. We demonstrate that the proposed methods are intrinsically an efficient generalization of traditional CNNs, and we explain the advantage of bottleneck architectures used in the existing state-of-the-art CNN models such as ResNet, ResNext, and ConvNeXt from the group-equivariant perspective. Experiments on image classification and image denoising tasks show that with a set of suitable filter basis, our methods achieve superior performance to standard CNN with high data efficiency. The code will be available at https://github.com/ZhaoWenzhao/MCG_CNN.
Abstract
Working in a semi-constructive logical system that supports the extraction of concurrent programs, we extract a program inverting non-singular real valued matrices from a constructive proof based on Gaussian elimination. Concurrency is used for efficient pivoting, that is, for finding an entry that is apart from zero in a non-null vector of real numbers.
Fusion-S2iGan: An Efficient and Effective Single-Stage Framework for Speech-to-Image Generation
Authors: Zhenxing Zhang, Lambert Schomaker
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Abstract
The goal of a speech-to-image transform is to produce a photo-realistic picture directly from a speech signal. Recently, various studies have focused on this task and have achieved promising performance. However, current speech-to-image approaches are based on a stacked modular framework that suffers from three vital issues: 1) Training separate networks is time-consuming as well as inefficient and the convergence of the final generative model strongly depends on the previous generators; 2) The quality of precursor images is ignored by this architecture; 3) Multiple discriminator networks are required to be trained. To this end, we propose an efficient and effective single-stage framework called Fusion-S2iGan to yield perceptually plausible and semantically consistent image samples on the basis of given spoken descriptions. Fusion-S2iGan introduces a visual+speech fusion module (VSFM), constructed with a pixel-attention module (PAM), a speech-modulation module (SMM) and a weighted-fusion module (WFM), to inject the speech embedding from a speech encoder into the generator while improving the quality of synthesized pictures. Fusion-S2iGan spreads the bimodal information over all layers of the generator network to reinforce the visual feature maps at various hierarchical levels in the architecture. We conduct a series of experiments on four benchmark data sets, i.e., CUB birds, Oxford-102, Flickr8k and Places-subset. The experimental results demonstrate the superiority of the presented Fusion-S2iGan compared to the state-of-the-art models with a multi-stage architecture and a performance level that is close to traditional text-to-image approaches.
Evolving the Digital Industrial Infrastructure for Production: Steps Taken and the Road Ahead
Authors: Jan Pennekamp, Anastasiia Belova, Thomas Bergs, Matthias Bodenbenner, Andreas Bührig-Polaczek, Markus Dahlmanns, Ike Kunze, Moritz Kröger, Sandra Geisler, Martin Henze, Daniel Lütticke, Benjamin Montavon, Philipp Niemietz, Lucia Ortjohann, Maximilian Rudack, Robert H. Schmitt, Uwe Vroomen, Klaus Wehrle, Michael Zeng
Subjects: Networking and Internet Architecture (cs.NI); Cryptography and Security (cs.CR); Databases (cs.DB)
Abstract
The Internet of Production (IoP) leverages concepts such as digital shadows, data lakes, and a World Wide Lab (WWL) to advance today's production. Consequently, it requires a technical infrastructure that can support the agile deployment of these concepts and corresponding high-level applications, which, e.g., demand the processing of massive data in motion and at rest. As such, key research aspects are the support for low-latency control loops, concepts on scalable data stream processing, deployable information security, and semantically rich and efficient long-term storage. In particular, such an infrastructure cannot continue to be limited to machines and sensors, but additionally needs to encompass networked environments: production cells, edge computing, and location-independent cloud infrastructures. Finally, in light of the envisioned WWL, i.e., the interconnection of production sites, the technical infrastructure must be advanced to support secure and privacy-preserving industrial collaboration. To evolve today's production sites and lay the infrastructural foundation for the IoP, we identify five broad streams of research: (1) adapting data and stream processing to heterogeneous data from distributed sources, (2) ensuring data interoperability between systems and production sites, (3) exchanging and sharing data with different stakeholders, (4) network security approaches addressing the risks of increasing interconnectivity, and (5) security architectures to enable secure and privacy-preserving industrial collaboration. With our research, we evolve the underlying infrastructure from isolated, sparsely networked production sites toward an architecture that supports high-level applications and sophisticated digital shadows while facilitating the transition toward a WWL.
Iterated learning and communication jointly explain efficient color naming systems
Authors: Emil Carlsson, Devdatt Dubhashi, Terry Regier
Abstract
It has been argued that semantic systems reflect pressure for efficiency, and a current debate concerns the cultural evolutionary process that produces this pattern. We consider efficiency as instantiated in the Information Bottleneck (IB) principle, and a model of cultural evolution that combines iterated learning and communication. We show that this model, instantiated in neural networks, converges to color naming systems that are efficient in the IB sense and similar to human color naming systems. We also show that iterated learning alone, and communication alone, do not yield the same outcome as clearly.
Personality Understanding of Fictional Characters during Book Reading
Authors: Mo Yu, Jiangnan Li, Shunyu Yao, Wenjie Pang, Xiaochen Zhou, Zhou Xiao, Fandong Meng, Jie Zhou
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
Comprehending characters' personalities is a crucial aspect of story reading. As readers engage with a story, their understanding of a character evolves based on new events and information; and multiple fine-grained aspects of personalities can be perceived. This leads to a natural problem of situated and fine-grained personality understanding. The problem has not been studied in the NLP field, primarily due to the lack of appropriate datasets mimicking the process of book reading. We present the first labeled dataset PersoNet for this problem. Our novel annotation strategy involves annotating user notes from online reading apps as a proxy for the original books. Experiments and human studies indicate that our dataset construction is both efficient and accurate; and our task heavily relies on long-term context to achieve accurate predictions for both machines and humans. The dataset is available at https://github.com/Gorov/personet_acl23.
Provably Correct Physics-Informed Neural Networks
Authors: Francisco Eiras, Adel Bibi, Rudy Bunel, Krishnamurthy Dj Dvijotham, Philip Torr, M. Pawan Kumar
Abstract
Recent work provides promising evidence that Physics-informed neural networks (PINN) can efficiently solve partial differential equations (PDE). However, previous works have failed to provide guarantees on the worst-case residual error of a PINN across the spatio-temporal domain - a measure akin to the tolerance of numerical solvers - focusing instead on point-wise comparisons between their solution and the ones obtained by a solver on a set of inputs. In real-world applications, one cannot consider tests on a finite set of points to be sufficient grounds for deployment, as the performance could be substantially worse on a different set. To alleviate this issue, we establish tolerance-based correctness conditions for PINNs over the entire input domain. To verify the extent to which they hold, we introduce $\partial$-CROWN: a general, efficient and scalable post-training framework to bound PINN residual errors. We demonstrate its effectiveness in obtaining tight certificates by applying it to two classically studied PDEs - Burgers' and Schr\"odinger's equations -, and two more challenging ones with real-world applications - the Allan-Cahn and Diffusion-Sorption equations.
Deep and Fast Approximate Order Independent Transparency
Authors: Grigoris Tsopouridis, Andreas-Alexandros Vasilakis, Ioannis Fudos
Abstract
We present a machine learning approach for efficiently computing order independent transparency (OIT). Our method is fast, requires a small constant amount of memory (depends only on the screen resolution and not on the number of triangles or transparent layers), is more accurate as compared to previous approximate methods, works for every scene without setup and is portable to all platforms running even with commodity GPUs. Our method requires a rendering pass to extract all features that are subsequently used to predict the overall OIT pixel color with a pre-trained neural network. We provide a comparative experimental evaluation and shader source code of all methods for reproduction of the experiments.
People Talking and AI Listening: How Stigmatizing Language in EHR Notes Affect AI Performance
Abstract
Electronic health records (EHRs) serve as an essential data source for the envisioned artificial intelligence (AI)-driven transformation in healthcare. However, clinician biases reflected in EHR notes can lead to AI models inheriting and amplifying these biases, perpetuating health disparities. This study investigates the impact of stigmatizing language (SL) in EHR notes on mortality prediction using a Transformer-based deep learning model and explainable AI (XAI) techniques. Our findings demonstrate that SL written by clinicians adversely affects AI performance, particularly so for black patients, highlighting SL as a source of racial disparity in AI model development. To explore an operationally efficient way to mitigate SL's impact, we investigate patterns in the generation of SL through a clinicians' collaborative network, identifying central clinicians as having a stronger impact on racial disparity in the AI model. We find that removing SL written by central clinicians is a more efficient bias reduction strategy than eliminating all SL in the entire corpus of data. This study provides actionable insights for responsible AI development and contributes to understanding clinician behavior and EHR note writing in healthcare.
Exploring the Space of Key-Value-Query Models with Intention
Authors: Marta Garnelo, Wojciech Marian Czarnecki
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Abstract
Attention-based models have been a key element of many recent breakthroughs in deep learning. Two key components of Attention are the structure of its input (which consists of keys, values and queries) and the computations by which these three are combined. In this paper we explore the space of models that share said input structure but are not restricted to the computations of Attention. We refer to this space as Keys-Values-Queries (KVQ) Space. Our goal is to determine whether there are any other stackable models in KVQ Space that Attention cannot efficiently approximate, which we can implement with our current deep learning toolbox and that solve problems that are interesting to the community. Maybe surprisingly, the solution to the standard least squares problem satisfies these properties. A neural network module that is able to compute this solution not only enriches the set of computations that a neural network can represent but is also provably a strict generalisation of Linear Attention. Even more surprisingly the computational complexity of this module is exactly the same as that of Attention, making it a suitable drop in replacement. With this novel connection between classical machine learning (least squares) and modern deep learning (Attention) established we justify a variation of our model which generalises regular Attention in the same way. Both new modules are put to the test an a wide spectrum of tasks ranging from few-shot learning to policy distillation that confirm their real-worlds applicability.
Abstract
Nanopore sequencers, being superior to other sequencing technologies for DNA storage in multiple aspects, have attracted considerable attention in recent times. Their high error rates however demand thorough research on practical and efficient coding schemes to enable accurate recovery of stored data. To this end, we consider a simplified model of a nanopore sequencer inspired by Mao \emph{et al.}, that incorporates intersymbol interference and measurement noise. Essentially, our channel model passes a sliding window of length $\ell$ over an input sequence, that outputs the $L_1$-weight of the enclosed $\ell$ bits and shifts by $\delta$ positions with each time step. The resulting $(\ell+1)$-ary vector, termed the \emph{read vector}, may also be corrupted by $t$ substitution errors. By employing graph-theoretic techniques, we deduce that for $\delta=1$, at least $\log \log n$ bits of redundancy are required to correct a single ($t=1$) substitution. Finally for $\ell \geq 3$, we exploit some inherent characteristics of read vectors to arrive at an error-correcting code that is optimal up to an additive constant for this setting.
NFI$_2$: Learning Noise-Free Illuminance-Interpolator for Unsupervised Low-Light Image Enhancement
Abstract
Low-light situations severely restrict the pursuit of aesthetic quality in consumer photography. Although many efforts are devoted to designing heuristics, it is generally mired in a shallow spiral of tedium, such as piling up complex network architectures and empirical strategies. How to delve into the essential physical principles of illumination compensation has been neglected. Following the way of simplifying the complexity, this paper innovatively proposes a simple and efficient Noise-Free Illumination Interpolator (NFI$_2$). According to the constraint principle of illuminance and reflectance within a limited dynamic range, as a prior knowledge in the recovery process, we construct a learnable illuminance interpolator and thereby compensating for non-uniform lighting. With the intention of adapting denoising without annotated data, we design a self-calibrated denoiser with the intrinsic image properties to acquire noise-free low-light images. Starting from the properties of natural image manifolds, a self-regularized recovery loss is introduced as a way to encourage more natural and realistic reflectance map. The model architecture and training losses, guided by prior knowledge, complement and benefit each other, forming a powerful unsupervised leaning framework. Comprehensive experiments demonstrate that the proposed algorithm produces competitive qualitative and quantitative results while maintaining favorable generalization capability in unknown real-world scenarios.
Exploring Inductive Biases in Contrastive Learning: A Clustering Perspective
Authors: Yunzhe Zhang, Yao Lu, Lei Xu, Kunlin Yang, Hui Tang, Shuyuan Ye, Qi Xuan
Abstract
This paper investigates the differences in data organization between contrastive and supervised learning methods, focusing on the concept of locally dense clusters. We introduce a novel metric, Relative Local Density (RLD), to quantitatively measure local density within clusters. Visual examples are provided to highlight the distinctions between locally dense clusters and globally dense ones. By comparing the clusters formed by contrastive and supervised learning, we reveal that contrastive learning generates locally dense clusters without global density, while supervised learning creates clusters with both local and global density. We further explore the use of a Graph Convolutional Network (GCN) classifier as an alternative to linear classifiers for handling locally dense clusters. Finally, we utilize t-SNE visualizations to substantiate the differences between the features generated by contrastive and supervised learning methods. We conclude by proposing future research directions, including the development of efficient classifiers tailored to contrastive learning and the creation of innovative augmentation algorithms.
Abstract
We propose a novel Energy Loss Prediction(ELP) framework that estimates the energy loss in sharing crowdsourced energy services. Crowdsourcing wireless energy services is a novel and convenient solution to enable the ubiquitous charging of nearby IoT devices. Therefore, capturing the wireless energy sharing loss is essential for the successful deployment of efficient energy service composition techniques. We propose Easeformer, a novel attention-based algorithm to predict the battery levels of IoT devices in a crowdsourced energy sharing environment. The predicted battery levels are used to estimate the energy loss. A set of experiments were conducted to demonstrate the feasibility and effectiveness of the proposed framework. We conducted extensive experiments on real wireless energy datasets to demonstrate that our framework significantly outperforms existing methods.
State Representation Learning Using an Unbalanced Atlas
Authors: Li Meng, Morten Goodwin, Anis Yazidi, Paal Engelstad
Abstract
The manifold hypothesis posits that high-dimensional data often lies on a lower-dimensional manifold and that utilizing this manifold as the target space yields more efficient representations. While numerous traditional manifold-based techniques exist for dimensionality reduction, their application in self-supervised learning has witnessed slow progress. The recent MSIMCLR method combines manifold encoding with SimCLR but requires extremely low target encoding dimensions to outperform SimCLR, limiting its applicability. This paper introduces a novel learning paradigm using an unbalanced atlas (UA), capable of surpassing state-of-the-art self-supervised learning approaches. We meticulously investigated and engineered the DeepInfomax with an unbalanced atlas (DIM-UA) method by systematically adapting the Spatiotemporal DeepInfomax (ST-DIM) framework to align with our proposed UA paradigm, employing rigorous scientific methodologies throughout the process. The efficacy of DIM-UA is demonstrated through training and evaluation on the Atari Annotated RAM Interface (AtariARI) benchmark, a modified version of the Atari 2600 framework that produces annotated image samples for representation learning. The UA paradigm improves the existing algorithm significantly when the number of target encoding dimensions grows. For instance, the mean F1 score averaged over categories of DIM-UA is ~75% compared to ~70% of ST-DIM when using 16384 hidden units.
Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning
Authors: Gen Li, Wenhao Zhan, Jason D. Lee, Yuejie Chi, Yuxin Chen
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Statistics Theory (math.ST); Machine Learning (stat.ML)
Abstract
This paper studies tabular reinforcement learning (RL) in the hybrid setting, which assumes access to both an offline dataset and online interactions with the unknown environment. A central question boils down to how to efficiently utilize online data collection to strengthen and complement the offline dataset and enable effective policy fine-tuning. Leveraging recent advances in reward-agnostic exploration and model-based offline RL, we design a three-stage hybrid RL algorithm that beats the best of both worlds -- pure offline RL and pure online RL -- in terms of sample complexities. The proposed algorithm does not require any reward information during data collection. Our theory is developed based on a new notion called single-policy partial concentrability, which captures the trade-off between distribution mismatch and miscoverage and guides the interplay between offline and online data.
Explain Any Concept: Segment Anything Meets Concept-Based Explanation
Authors: Ao Sun, Pingchuan Ma, Yuanyuan Yuan, Shuai Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
EXplainable AI (XAI) is an essential topic to improve human understanding of deep neural networks (DNNs) given their black-box internals. For computer vision tasks, mainstream pixel-based XAI methods explain DNN decisions by identifying important pixels, and emerging concept-based XAI explore forming explanations with concepts (e.g., a head in an image). However, pixels are generally hard to interpret and sensitive to the imprecision of XAI methods, whereas "concepts" in prior works require human annotation or are limited to pre-defined concept sets. On the other hand, driven by large-scale pre-training, Segment Anything Model (SAM) has been demonstrated as a powerful and promotable framework for performing precise and comprehensive instance segmentation, enabling automatic preparation of concept sets from a given image. This paper for the first time explores using SAM to augment concept-based XAI. We offer an effective and flexible concept-based explanation method, namely Explain Any Concept (EAC), which explains DNN decisions with any concept. While SAM is highly effective and offers an "out-of-the-box" instance segmentation, it is costly when being integrated into defacto XAI pipelines. We thus propose a lightweight per-input equivalent (PIE) scheme, enabling efficient explanation with a surrogate model. Our evaluation over two popular datasets (ImageNet and COCO) illustrate the highly encouraging performance of EAC over commonly-used XAI methods.
Abstract
Existing deep learning models for hyperspectral image (HSI) reconstruction achieve good performance but require powerful hardwares with enormous memory and computational resources. Consequently, these methods can hardly be deployed on resource-limited mobile devices. In this paper, we propose a novel method, Binarized Spectral-Redistribution Network (BiSRNet), for efficient and practical HSI restoration from compressed measurement in snapshot compressive imaging (SCI) systems. Firstly, we redesign a compact and easy-to-deploy base model to be binarized. Then we present the basic unit, Binarized Spectral-Redistribution Convolution (BiSR-Conv). BiSR-Conv can adaptively redistribute the HSI representations before binarizing activation and uses a scalable hyperbolic tangent function to closer approximate the Sign function in backpropagation. Based on our BiSR-Conv, we customize four binarized convolutional modules to address the dimension mismatch and propagate full-precision information throughout the whole network. Finally, our BiSRNet is derived by using the proposed techniques to binarize the base model. Comprehensive quantitative and qualitative experiments manifest that our proposed BiSRNet outperforms state-of-the-art binarization methods and achieves comparable performance with full-precision algorithms. Code and models will be released at https://github.com/caiyuanhao1998/BiSCI and https://github.com/caiyuanhao1998/MST
FACE: Evaluating Natural Language Generation with Fourier Analysis of Cross-Entropy
Abstract
Measuring the distance between machine-produced and human language is acritical open problem. Inspired by empirical findings from psycholinguistics on theperiodicity of entropy in language, we propose FACE, a set of metrics based onFourier Analysis of the estimated Cross-Entropy of language, for measuring thesimilarity between model-generated and human-written languages. Based on anopen-ended generation task and the experimental data from previous studies, weind that FACE can effectively identify the human-model gap, scales with modelsize, reflects the outcomes of different sampling methods for decoding, correlateswell with other evaluation metrics and with human judgment scores. FACE iscomputationally efficient and provides intuitive interpretations.
CostFormer:Cost Transformer for Cost Aggregation in Multi-view Stereo
Abstract
The core of Multi-view Stereo(MVS) is the matching process among reference and source pixels. Cost aggregation plays a significant role in this process, while previous methods focus on handling it via CNNs. This may inherit the natural limitation of CNNs that fail to discriminate repetitive or incorrect matches due to limited local receptive fields. To handle the issue, we aim to involve Transformer into cost aggregation. However, another problem may occur due to the quadratically growing computational complexity caused by Transformer, resulting in memory overflow and inference latency. In this paper, we overcome these limits with an efficient Transformer-based cost aggregation network, namely CostFormer. The Residual Depth-Aware Cost Transformer(RDACT) is proposed to aggregate long-range features on cost volume via self-attention mechanisms along the depth and spatial dimensions. Furthermore, Residual Regression Transformer(RRT) is proposed to enhance spatial attention. The proposed method is a universal plug-in to improve learning-based MVS methods.
Joint Denoising and Few-angle Reconstruction for Low-dose Cardiac SPECT Using a Dual-domain Iterative Network with Adaptive Data Consistency
Authors: Xiongchao Chen, Bo Zhou, Huidong Xie, Xueqi Guo, Qiong Liu, Albert J. Sinusas, Chi Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Myocardial perfusion imaging (MPI) by single-photon emission computed tomography (SPECT) is widely applied for the diagnosis of cardiovascular diseases. Reducing the dose of the injected tracer is essential for lowering the patient's radiation exposure, but it will lead to increased image noise. Additionally, the latest dedicated cardiac SPECT scanners typically acquire projections in fewer angles using fewer detectors to reduce hardware expenses, potentially resulting in lower reconstruction accuracy. To overcome these challenges, we propose a dual-domain iterative network for end-to-end joint denoising and reconstruction from low-dose and few-angle projections of cardiac SPECT. The image-domain network provides a prior estimate for the projection-domain networks. The projection-domain primary and auxiliary modules are interconnected for progressive denoising and few-angle reconstruction. Adaptive Data Consistency (ADC) modules improve prediction accuracy by efficiently fusing the outputs of the primary and auxiliary modules. Experiments using clinical MPI data show that our proposed method outperforms existing image-, projection-, and dual-domain techniques, producing more accurate projections and reconstructions. Ablation studies confirm the significance of the image-domain prior estimate and ADC modules in enhancing network performance.
G-Adapter: Towards Structure-Aware Parameter-Efficient Transfer Learning for Graph Transformer Networks
Abstract
It has become a popular paradigm to transfer the knowledge of large-scale pre-trained models to various downstream tasks via fine-tuning the entire model parameters. However, with the growth of model scale and the rising number of downstream tasks, this paradigm inevitably meets the challenges in terms of computation consumption and memory footprint issues. Recently, Parameter-Efficient Fine-Tuning (PEFT) (e.g., Adapter, LoRA, BitFit) shows a promising paradigm to alleviate these concerns by updating only a portion of parameters. Despite these PEFTs having demonstrated satisfactory performance in natural language processing, it remains under-explored for the question of whether these techniques could be transferred to graph-based tasks with Graph Transformer Networks (GTNs). Therefore, in this paper, we fill this gap by providing extensive benchmarks with traditional PEFTs on a range of graph-based downstream tasks. Our empirical study shows that it is sub-optimal to directly transfer existing PEFTs to graph-based tasks due to the issue of feature distribution shift. To address this issue, we propose a novel structure-aware PEFT approach, named G-Adapter, which leverages graph convolution operation to introduce graph structure (e.g., graph adjacent matrix) as an inductive bias to guide the updating process. Besides, we propose Bregman proximal point optimization to further alleviate feature distribution shift by preventing the model from aggressive update. Extensive experiments demonstrate that G-Adapter obtains the state-of-the-art performance compared to the counterparts on nine graph benchmark datasets based on two pre-trained GTNs, and delivers tremendous memory footprint efficiency compared to the conventional paradigm.
Inertial-based Navigation by Polynomial Optimization: Inertial-Magnetic Attitude Estimation
Abstract
Inertial-based navigation refers to the navigation methods or systems that have inertial information or sensors as the core part and integrate a spectrum of other kinds of sensors for enhanced performance. Through a series of papers, the authors attempt to explore information blending of inertial-based navigation by a polynomial optimization method. The basic idea is to model rigid motions as finite-order polynomials and then attacks the involved navigation problems by optimally solving their coefficients, taking into considerations the constraints posed by inertial sensors and others. In the current paper, a continuous-time attitude estimation approach is proposed, which transforms the attitude estimation into a constant parameter determination problem by the polynomial optimization. Specifically, the continuous attitude is first approximated by a Chebyshev polynomial, of which the unknown Chebyshev coefficients are determined by minimizing the weighted residuals of initial conditions, dynamics and measurements. We apply the derived estimator to the attitude estimation with the magnetic and inertial sensors. Simulation and field tests show that the estimator has much better stability and faster convergence than the traditional extended Kalman filter does, especially in the challenging large initial state error scenarios.
A 334$μ$W 0.158mm$^2$ ASIC for Post-Quantum Key-Encapsulation Mechanism Saber with Low-latency Striding Toom-Cook Multiplication Authors Version
Authors: Archisman Ghosh, Jose Maria Bermudo Mera, Angshuman Karmakar, Debayan Das, Santosh Ghosh, Ingrid Verbauwhede, Shreyas Sen
Abstract
The hard mathematical problems that assure the security of our current public-key cryptography (RSA, ECC) are broken if and when a quantum computer appears rendering them ineffective for use in the quantum era. Lattice based cryptography is a novel approach to public key cryptography, of which the mathematical investigation (so far) resists attacks from quantum computers. By choosing a module learning with errors (MLWE) algorithm as the next standard, National Institute of Standard & Technology (NIST) follows this approach. The multiplication of polynomials is the central bottleneck in the computation of lattice based cryptography. Because public key cryptography is mostly used to establish common secret keys, focus is on compact area, power and energy budget and to a lesser extent on throughput or latency. While most other work focuses on optimizing number theoretic transform (NTT) based multiplications, in this paper we highly optimize a Toom-Cook based multiplier. We demonstrate that a memory-efficient striding Toom-Cook with lazy interpolation, results in a highly compact, low power implementation, which on top enables a very regular memory access scheme. To demonstrate the efficiency, we integrate this multiplier into a Saber post-quantum accelerator, one of the four NIST finalists. Algorithmic innovation to reduce active memory, timely clock gating and shift-add multiplier has helped to achieve 38% less power than state-of-the art PQC core, 4x less memory, 36.8% reduction in multiplier energy and 118x reduction in active power with respect to state-of-the-art Saber accelerator (not silicon verified). This accelerator consumes 0.158mm2 active area which is lowest reported till date despite process disadvantages of the state-of-the-art designs.
Logit-Based Ensemble Distribution Distillation for Robust Autoregressive Sequence Uncertainties
Authors: Yassir Fathullah, Guoxuan Xia, Mark Gales
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Abstract
Efficiently and reliably estimating uncertainty is an important objective in deep learning. It is especially pertinent to autoregressive sequence tasks, where training and inference costs are typically very high. However, existing research has predominantly focused on tasks with static data such as image classification. In this work, we investigate Ensemble Distribution Distillation (EDD) applied to large-scale natural language sequence-to-sequence data. EDD aims to compress the superior uncertainty performance of an expensive (teacher) ensemble into a cheaper (student) single model. Importantly, the ability to separate knowledge (epistemic) and data (aleatoric) uncertainty is retained. Existing probability-space approaches to EDD, however, are difficult to scale to large vocabularies. We show, for modern transformer architectures on large-scale translation tasks, that modelling the ensemble logits, instead of softmax probabilities, leads to significantly better students. Moreover, the students surprisingly even outperform Deep Ensembles by up to ~10% AUROC on out-of-distribution detection, whilst matching them at in-distribution translation.
Motion Planning (In)feasibility Detection using a Prior Roadmap via Path and Cut Search
Abstract
Motion planning seeks a collision-free path in a configuration space (C-space), representing all possible robot configurations in the environment. As it is challenging to construct a C-space explicitly for a high-dimensional robot, we generally build a graph structure called a roadmap, a discrete approximation of a complex continuous C-space, to reason about connectivity. Checking collision-free connectivity in the roadmap requires expensive edge-evaluation computations, and thus, reducing the number of evaluations has become a significant research objective. However, in practice, we often face infeasible problems: those in which there is no collision-free path in the roadmap between the start and the goal locations. Existing studies often overlook the possibility of infeasibility, becoming highly inefficient by performing many edge evaluations. In this work, we address this oversight in scenarios where a prior roadmap is available; that is, the edges of the roadmap contain the probability of being a collision-free edge learned from past experience. To this end, we propose an algorithm called iterative path and cut finding (IPC) that iteratively searches for a path and a cut in a prior roadmap to detect infeasibility while reducing expensive edge evaluations as much as possible. We further improve the efficiency of IPC by introducing a second algorithm, iterative decomposition and path and cut finding (IDPC), that leverages the fact that cut-finding algorithms partition the roadmap into smaller subgraphs. We analyze the theoretical properties of IPC and IDPC, such as completeness and computational complexity, and evaluate their performance in terms of completion time and the number of edge evaluations in large-scale simulations.
PaLM 2 Technical Report
Authors: Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, Junwhan Ahn, Jacob Austin, Paul Barham, Jan Botha, James Bradbury, Siddhartha Brahma, Kevin Brooks, Michele Catasta, Yong Cheng, Colin Cherry, Christopher A. Choquette-Choo, Aakanksha Chowdhery, Clément Crepy, Shachi Dave, Mostafa Dehghani, Sunipa Dev, Jacob Devlin, Mark Díaz, Nan Du, Ethan Dyer, Vlad Feinberg, Fangxiaoyu Feng, Vlad Fienber, Markus Freitag, Xavier Garcia, Sebastian Gehrmann, Lucas Gonzalez, et al. (76 additional authors not shown)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM. This improved efficiency enables broader deployment while also allowing the model to respond faster, for a more natural pace of interaction. PaLM 2 demonstrates robust reasoning capabilities exemplified by large improvements over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable performance on a suite of responsible AI evaluations, and enables inference-time control over toxicity without additional overhead or impact on other capabilities. Overall, PaLM 2 achieves state-of-the-art performance across a diverse set of tasks and capabilities. When discussing the PaLM 2 family, it is important to distinguish between pre-trained models (of various sizes), fine-tuned variants of these models, and the user-facing products that use these models. In particular, user-facing products typically include additional pre- and post-processing steps. Additionally, the underlying models may evolve over time. Therefore, one should not expect the performance of user-facing products to exactly match the results reported in this report.
BAD: BiAs Detection for Large Language Models in the context of candidate screening
Abstract
Application Tracking Systems (ATS) have allowed talent managers, recruiters, and college admissions committees to process large volumes of potential candidate applications efficiently. Traditionally, this screening process was conducted manually, creating major bottlenecks due to the quantity of applications and introducing many instances of human bias. The advent of large language models (LLMs) such as ChatGPT and the potential of adopting methods to current automated application screening raises additional bias and fairness issues that must be addressed. In this project, we wish to identify and quantify the instances of social bias in ChatGPT and other OpenAI LLMs in the context of candidate screening in order to demonstrate how the use of these models could perpetuate existing biases and inequalities in the hiring process.
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering
Abstract
In this paper, we focus on the problem of Medical Visual Question Answering (MedVQA), which is crucial in efficiently interpreting medical images with vital clinic-relevant information. Firstly, we reframe the problem of MedVQA as a generation task that naturally follows the human-machine interaction, we propose a generative-based model for medical visual understanding by aligning visual information from a pre-trained vision encoder with a large language model. Secondly, we establish a scalable pipeline to construct a large-scale medical visual question-answering dataset, named PMC-VQA, which contains 227k VQA pairs of 149k images that cover various modalities or diseases. Thirdly, we pre-train our proposed model on PMC-VQA and then fine-tune it on multiple public benchmarks, e.g., VQA-RAD and SLAKE, outperforming existing work by a large margin. Additionally, we propose a test set that has undergone manual verification, which is significantly more challenging, even the best models struggle to solve.
SLiC-HF: Sequence Likelihood Calibration with Human Feedback
Authors: Yao Zhao, Rishabh Joshi, Tianqi Liu, Misha Khalman, Mohammad Saleh, Peter J. Liu
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
Learning from human feedback has been shown to be effective at aligning language models with human preferences. Past work has often relied on Reinforcement Learning from Human Feedback (RLHF), which optimizes the language model using reward scores assigned from a reward model trained on human preference data. In this work we show how the recently introduced Sequence Likelihood Calibration (SLiC), can also be used to effectively learn from human preferences (SLiC-HF). Furthermore, we demonstrate this can be done with human feedback data collected for a different model, similar to off-policy, offline RL data. Automatic and human evaluation experiments on the TL;DR summarization task show that SLiC-HF significantly improves supervised fine-tuning baselines. Furthermore, SLiC-HF presents a competitive alternative to the PPO RLHF implementation used in past work while being much simpler to implement, easier to tune and more computationally efficient in practice.
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
Authors: Sang Michael Xie, Hieu Pham, Xuanyi Dong, Nan Du, Hanxiao Liu, Yifeng Lu, Percy Liang, Quoc V. Le, Tengyu Ma, Adams Wei Yu
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
The mixture proportions of pretraining data domains (e.g., Wikipedia, books, web text) greatly affect language model (LM) performance. In this paper, we propose Domain Reweighting with Minimax Optimization (DoReMi), which first trains a small proxy model using group distributionally robust optimization (Group DRO) over domains to produce domain weights (mixture proportions) without knowledge of downstream tasks. We then resample a dataset with these domain weights and train a larger, full-sized model. In our experiments, we use DoReMi on a 280M-parameter proxy model to find domain weights for training an 8B-parameter model (30x larger) more efficiently. On The Pile, DoReMi improves perplexity across all domains, even when it downweights a domain. DoReMi improves average few-shot downstream accuracy by 6.5% over a baseline model trained using The Pile's default domain weights and reaches the baseline accuracy with 2.6x fewer training steps. On the GLaM dataset, DoReMi, which has no knowledge of downstream tasks, even matches the performance of using domain weights tuned on downstream tasks.
FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
Authors: Guangxuan Xiao, Tianwei Yin, William T. Freeman, Frédo Durand, Song Han
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Diffusion models excel at text-to-image generation, especially in subject-driven generation for personalized images. However, existing methods are inefficient due to the subject-specific fine-tuning, which is computationally intensive and hampers efficient deployment. Moreover, existing methods struggle with multi-subject generation as they often blend features among subjects. We present FastComposer which enables efficient, personalized, multi-subject text-to-image generation without fine-tuning. FastComposer uses subject embeddings extracted by an image encoder to augment the generic text conditioning in diffusion models, enabling personalized image generation based on subject images and textual instructions with only forward passes. To address the identity blending problem in the multi-subject generation, FastComposer proposes cross-attention localization supervision during training, enforcing the attention of reference subjects localized to the correct regions in the target images. Naively conditioning on subject embeddings results in subject overfitting. FastComposer proposes delayed subject conditioning in the denoising step to maintain both identity and editability in subject-driven image generation. FastComposer generates images of multiple unseen individuals with different styles, actions, and contexts. It achieves 300$\times$-2500$\times$ speedup compared to fine-tuning-based methods and requires zero extra storage for new subjects. FastComposer paves the way for efficient, personalized, and high-quality multi-subject image creation. Code, model, and dataset are available at https://github.com/mit-han-lab/fastcomposer.
Abstract
It is notoriously difficult to train Transformers on small datasets; typically, large pre-trained models are instead used as the starting point. We explore the weights of such pre-trained Transformers (particularly for vision) to attempt to find reasons for this discrepancy. Surprisingly, we find that simply initializing the weights of self-attention layers so that they "look" more like their pre-trained counterparts allows us to train vanilla Transformers faster and to higher final accuracies, particularly on vision tasks such as CIFAR-10 and ImageNet classification, where we see gains in accuracy of over 5% and 4%, respectively. Our initialization scheme is closed form, learning-free, and very simple: we set the product of the query and key weights to be approximately the identity, and the product of the value and projection weights to approximately the negative identity. As this mimics the patterns we saw in pre-trained Transformers, we call the technique "mimetic initialization".
On the ISS Property of the Gradient Flow for Single Hidden-Layer Neural Networks with Linear Activations
Authors: Arthur Castello B. de Oliveira, Milad Siami, Eduardo D. Sontag
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract
Recent research in neural networks and machine learning suggests that using many more parameters than strictly required by the initial complexity of a regression problem can result in more accurate or faster-converging models -- contrary to classical statistical belief. This phenomenon, sometimes known as ``benign overfitting'', raises questions regarding in what other ways might overparameterization affect the properties of a learning problem. In this work, we investigate the effects of overfitting on the robustness of gradient-descent training when subject to uncertainty on the gradient estimation. This uncertainty arises naturally if the gradient is estimated from noisy data or directly measured. Our object of study is a linear neural network with a single, arbitrarily wide, hidden layer and an arbitrary number of inputs and outputs. In this paper we solve the problem for the case where the input and output of our neural-network are one-dimensional, deriving sufficient conditions for robustness of our system based on necessary and sufficient conditions for convergence in the undisturbed case. We then show that the general overparametrized formulation introduces a set of spurious equilibria which lay outside the set where the loss function is minimized, and discuss directions of future work that might extend our current results for more general formulations.
SIMGA: A Simple and Effective Heterophilous Graph Neural Network with Efficient Global Aggregation
Authors: Haoyu Liu, Ningyi Liao, Siqiang Luo
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Abstract
Graph neural networks (GNNs) realize great success in graph learning but suffer from performance loss when meeting heterophily, i.e. neighboring nodes are dissimilar, due to their local and uniform aggregation. Existing attempts in incoorporating global aggregation for heterophilous GNNs usually require iteratively maintaining and updating full-graph information, which entails $\mathcal{O}(n^2)$ computation efficiency for a graph with $n$ nodes, leading to weak scalability to large graphs. In this paper, we propose SIMGA, a GNN structure integrating SimRank structural similarity measurement as global aggregation. The design of SIMGA is simple, yet it leads to promising results in both efficiency and effectiveness. The simplicity of SIMGA makes it the first heterophilous GNN model that can achieve a propagation efficiency near-linear to $n$. We theoretically demonstrate its effectiveness by treating SimRank as a new interpretation of GNN and prove that the aggregated node representation matrix has expected grouping effect. The performances of SIMGA are evaluated with 11 baselines on 12 benchmark datasets, usually achieving superior accuracy compared with the state-of-the-art models. Efficiency study reveals that SIMGA is up to 5$\times$ faster than the state-of-the-art method on the largest heterophily dataset pokec with over 30 million edges.
Pyramid Diffusion Models For Low-light Image Enhancement
Authors: Dewei Zhou, Zongxin Yang, Yi Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recovering noise-covered details from low-light images is challenging, and the results given by previous methods leave room for improvement. Recent diffusion models show realistic and detailed image generation through a sequence of denoising refinements and motivate us to introduce them to low-light image enhancement for recovering realistic details. However, we found two problems when doing this, i.e., 1) diffusion models keep constant resolution in one reverse process, which limits the speed; 2) diffusion models sometimes result in global degradation (e.g., RGB shift). To address the above problems, this paper proposes a Pyramid Diffusion model (PyDiff) for low-light image enhancement. PyDiff uses a novel pyramid diffusion method to perform sampling in a pyramid resolution style (i.e., progressively increasing resolution in one reverse process). Pyramid diffusion makes PyDiff much faster than vanilla diffusion models and introduces no performance degradation. Furthermore, PyDiff uses a global corrector to alleviate the global degradation that may occur in the reverse process, significantly improving the performance and making the training of diffusion models easier with little additional computational consumption. Extensive experiments on popular benchmarks show that PyDiff achieves superior performance and efficiency. Moreover, PyDiff can generalize well to unseen noise and illumination distributions.
Cross-domain Iterative Network for Simultaneous Denoising, Limited-angle Reconstruction, and Attenuation Correction of Low-dose Cardiac SPECT
Authors: Xiongchao Chen, Bo Zhou, Huidong Xie, Xueqi Guo, Qiong Liu, Albert J. Sinusas, Chi Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Single-Photon Emission Computed Tomography (SPECT) is widely applied for the diagnosis of ischemic heart diseases. Low-dose (LD) SPECT aims to minimize radiation exposure but leads to increased image noise. Limited-angle (LA) SPECT enables faster scanning and reduced hardware costs but results in lower reconstruction accuracy. Additionally, computed tomography (CT)-derived attenuation maps ($\mu$-maps) are commonly used for SPECT attenuation correction (AC), but it will cause extra radiation exposure and SPECT-CT misalignments. In addition, the majority of SPECT scanners in the market are not hybrid SPECT/CT scanners. Although various deep learning methods have been introduced to separately address these limitations, the solution for simultaneously addressing these challenges still remains highly under-explored and challenging. To this end, we propose a Cross-domain Iterative Network (CDI-Net) for simultaneous denoising, LA reconstruction, and CT-free AC in cardiac SPECT. In CDI-Net, paired projection- and image-domain networks are end-to-end connected to fuse the emission and anatomical information across domains and iterations. Adaptive Weight Recalibrators (AWR) adjust the multi-channel input features to enhance prediction accuracy. Our experiments using clinical data showed that CDI-Net produced more accurate $\mu$-maps, projections, and reconstructions compared to existing approaches that addressed each task separately. Ablation studies demonstrated the significance of cross-domain and cross-iteration connections, as well as AWR, in improving the reconstruction performance.
Inertial-based Navigation by Polynomial Optimization: Inertial-Magnetic Attitude Estimation
Abstract
Inertial-based navigation refers to the navigation methods or systems that have inertial information or sensors as the core part and integrate a spectrum of other kinds of sensors for enhanced performance. Through a series of papers, the authors attempt to explore information blending of inertial-based navigation by a polynomial optimization method. The basic idea is to model rigid motions as finite-order polynomials and then attacks the involved navigation problems by optimally solving their coefficients, taking into considerations the constraints posed by inertial sensors and others. In the current paper, a continuous-time attitude estimation approach is proposed, which transforms the attitude estimation into a constant parameter determination problem by the polynomial optimization. Specifically, the continuous attitude is first approximated by a Chebyshev polynomial, of which the unknown Chebyshev coefficients are determined by minimizing the weighted residuals of initial conditions, dynamics and measurements. We apply the derived estimator to the attitude estimation with the magnetic and inertial sensors. Simulation and field tests show that the estimator has much better stability and faster convergence than the traditional extended Kalman filter does, especially in the challenging large initial state error scenarios.
PaLM 2 Technical Report
Authors: Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, Junwhan Ahn, Jacob Austin, Paul Barham, Jan Botha, James Bradbury, Siddhartha Brahma, Kevin Brooks, Michele Catasta, Yong Cheng, Colin Cherry, Christopher A. Choquette-Choo, Aakanksha Chowdhery, Clément Crepy, Shachi Dave, Mostafa Dehghani, Sunipa Dev, Jacob Devlin, Mark Díaz, Nan Du, Ethan Dyer, Vlad Feinberg, Fangxiaoyu Feng, Vlad Fienber, Markus Freitag, Xavier Garcia, Sebastian Gehrmann, Lucas Gonzalez, et al. (76 additional authors not shown)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM. This improved efficiency enables broader deployment while also allowing the model to respond faster, for a more natural pace of interaction. PaLM 2 demonstrates robust reasoning capabilities exemplified by large improvements over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable performance on a suite of responsible AI evaluations, and enables inference-time control over toxicity without additional overhead or impact on other capabilities. Overall, PaLM 2 achieves state-of-the-art performance across a diverse set of tasks and capabilities. When discussing the PaLM 2 family, it is important to distinguish between pre-trained models (of various sizes), fine-tuned variants of these models, and the user-facing products that use these models. In particular, user-facing products typically include additional pre- and post-processing steps. Additionally, the underlying models may evolve over time. Therefore, one should not expect the performance of user-facing products to exactly match the results reported in this report.
ZeroFlow: Fast Zero Label Scene Flow via Distillation
Authors: Kyle Vedder, Neehar Peri, Nathaniel Chodosh, Ishan Khatri, Eric Eaton, Dinesh Jayaraman, Yang Liu, Deva Ramanan, James Hays
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Scene flow estimation is the task of describing the 3D motion field between temporally successive point clouds. State-of-the-art methods use strong priors and test-time optimization techniques, but require on the order of tens of seconds for large-scale point clouds, making them unusable as computer vision primitives for real-time applications such as open world object detection. Feed forward methods are considerably faster, running on the order of tens to hundreds of milliseconds for large-scale point clouds, but require expensive human supervision. To address both limitations, we propose Scene Flow via Distillation, a simple distillation framework that uses a label-free optimization method to produce pseudo-labels to supervise a feed forward model. Our instantiation of this framework, ZeroFlow, produces scene flow estimates in real-time on large-scale point clouds at quality competitive with state-of-the-art methods while using zero human labels. Notably, at test-time ZeroFlow is over 1000$\times$ faster than label-free state-of-the-art optimization-based methods on large-scale point clouds and over 1000$\times$ cheaper to train on unlabeled data compared to the cost of human annotation of that data. To facilitate research reuse, we release our code, trained model weights, and high quality pseudo-labels for the Argoverse 2 and Waymo Open datasets.
Keyword: mobile
Mobile User Interface Element Detection Via Adaptively Prompt Tuning
Authors: Zhangxuan Gu, Zhuoer Xu, Haoxing Chen, Jun Lan, Changhua Meng, Weiqiang Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recent object detection approaches rely on pretrained vision-language models for image-text alignment. However, they fail to detect the Mobile User Interface (MUI) element since it contains additional OCR information, which describes its content and function but is often ignored. In this paper, we develop a new MUI element detection dataset named MUI-zh and propose an Adaptively Prompt Tuning (APT) module to take advantage of discriminating OCR information. APT is a lightweight and effective module to jointly optimize category prompts across different modalities. For every element, APT uniformly encodes its visual features and OCR descriptions to dynamically adjust the representation of frozen category prompts. We evaluate the effectiveness of our plug-and-play APT upon several existing CLIP-based detectors for both standard and open-vocabulary MUI element detection. Extensive experiments show that our method achieves considerable improvements on two datasets. The datasets is available at \url{github.com/antmachineintelligence/MUI-zh}.
Four Factor Authentication with emerging cybersecurity for Mobile Transactions
Authors: Sanyam Jain, Raju gautam, Shivani Sharma, Ravi Tomar
Abstract
Cybersecurity is very essential for Mobile Transactions to complete seamlessly. Mobile Commerce (Mcom.) is the very basic transaction type, which is very commonly used (2 in 5 people uses mobile as transaction medium), To secure this there are various technologies used by this research. The four factors formally known as Multi-Factor-Authentication are: two of them are Traditional methods (User Login-password and One Time Password (aka OTP)) with addition of Geolocation and Facial Recognition. All the data is converted to a text file, which is hidden in an image (using Babushka algorithm). The end-point then decrypts the image using same algorithm.
Abstract
Existing deep learning models for hyperspectral image (HSI) reconstruction achieve good performance but require powerful hardwares with enormous memory and computational resources. Consequently, these methods can hardly be deployed on resource-limited mobile devices. In this paper, we propose a novel method, Binarized Spectral-Redistribution Network (BiSRNet), for efficient and practical HSI restoration from compressed measurement in snapshot compressive imaging (SCI) systems. Firstly, we redesign a compact and easy-to-deploy base model to be binarized. Then we present the basic unit, Binarized Spectral-Redistribution Convolution (BiSR-Conv). BiSR-Conv can adaptively redistribute the HSI representations before binarizing activation and uses a scalable hyperbolic tangent function to closer approximate the Sign function in backpropagation. Based on our BiSR-Conv, we customize four binarized convolutional modules to address the dimension mismatch and propagate full-precision information throughout the whole network. Finally, our BiSRNet is derived by using the proposed techniques to binarize the base model. Comprehensive quantitative and qualitative experiments manifest that our proposed BiSRNet outperforms state-of-the-art binarization methods and achieves comparable performance with full-precision algorithms. Code and models will be released at https://github.com/caiyuanhao1998/BiSCI and https://github.com/caiyuanhao1998/MST
NUANCE: Near Ultrasound Attack On Networked Communication Environments
Authors: Forrest McKee, David Noever
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract
This study investigates a primary inaudible attack vector on Amazon Alexa voice services using near ultrasound trojans and focuses on characterizing the attack surface and examining the practical implications of issuing inaudible voice commands. The research maps each attack vector to a tactic or technique from the MITRE ATT&CK matrix, covering enterprise, mobile, and Industrial Control System (ICS) frameworks. The experiment involved generating and surveying fifty near-ultrasonic audios to assess the attacks' effectiveness, with unprocessed commands having a 100% success rate and processed ones achieving a 58% overall success rate. This systematic approach stimulates previously unaddressed attack surfaces, ensuring comprehensive detection and attack design while pairing each ATT&CK Identifier with a tested defensive method, providing attack and defense tactics for prompt-response options. The main findings reveal that the attack method employs Single Upper Sideband Amplitude Modulation (SUSBAM) to generate near-ultrasonic audio from audible sources, transforming spoken commands into a frequency range beyond human-adult hearing. By eliminating the lower sideband, the design achieves a 6 kHz minimum from 16-22 kHz while remaining inaudible after transformation. The research investigates the one-to-many attack surface where a single device simultaneously triggers multiple actions or devices. Additionally, the study demonstrates the reversibility or demodulation of the inaudible signal, suggesting potential alerting methods and the possibility of embedding secret messages like audio steganography.
Keyword: pruning
There is no result
Keyword: voxel
Explaining black box text modules in natural language with language models
Authors: Chandan Singh, Aliyah R. Hsu, Richard Antonello, Shailee Jain, Alexander G. Huth, Bin Yu, Jianfeng Gao
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
Abstract
Large language models (LLMs) have demonstrated remarkable prediction performance for a growing array of tasks. However, their rapid proliferation and increasing opaqueness have created a growing need for interpretability. Here, we ask whether we can automatically obtain natural language explanations for black box text modules. A "text module" is any function that maps text to a scalar continuous value, such as a submodule within an LLM or a fitted model of a brain region. "Black box" indicates that we only have access to the module's inputs/outputs. We introduce Summarize and Score (SASC), a method that takes in a text module and returns a natural language explanation of the module's selectivity along with a score for how reliable the explanation is. We study SASC in 3 contexts. First, we evaluate SASC on synthetic modules and find that it often recovers ground truth explanations. Second, we use SASC to explain modules found within a pre-trained BERT model, enabling inspection of the model's internals. Finally, we show that SASC can generate explanations for the response of individual fMRI voxels to language stimuli, with potential applications to fine-grained brain mapping. All code for using SASC and reproducing results is made available on Github.
Keyword: lidar
Rethinking the Open-Loop Evaluation of End-to-End Autonomous Driving in nuScenes
Authors: Jiang-Tian Zhai, Ze Feng, Jinhao Du, Yongqiang Mao, Jiang-Jiang Liu, Zichang Tan, Yifu Zhang, Xiaoqing Ye, Jingdong Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Modern autonomous driving systems are typically divided into three main tasks: perception, prediction, and planning. The planning task involves predicting the trajectory of the ego vehicle based on inputs from both internal intention and the external environment, and manipulating the vehicle accordingly. Most existing works evaluate their performance on the nuScenes dataset using the L2 error and collision rate between the predicted trajectories and the ground truth. In this paper, we reevaluate these existing evaluation metrics and explore whether they accurately measure the superiority of different methods. Specifically, we design an MLP-based method that takes raw sensor data (e.g., past trajectory, velocity, etc.) as input and directly outputs the future trajectory of the ego vehicle, without using any perception or prediction information such as camera images or LiDAR. Surprisingly, such a simple method achieves state-of-the-art end-to-end planning performance on the nuScenes dataset, reducing the average L2 error by about 30%. We further conduct in-depth analysis and provide new insights into the factors that are critical for the success of the planning task on nuScenes dataset. Our observation also indicates that we need to rethink the current open-loop evaluation scheme of end-to-end autonomous driving in nuScenes. Codes are available at https://github.com/E2E-AD/AD-MLP.
Keyword: diffusion
Dynamic Causal Explanation Based Diffusion-Variational Graph Neural Network for Spatio-temporal Forecasting
Authors: Guojun Liang, Prayag Tiwari, Sławomir Nowaczyk, Stefan Byttner, Fernando Alonso-Fernandez
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Abstract
Graph neural networks (GNNs), especially dynamic GNNs, have become a research hotspot in spatio-temporal forecasting problems. While many dynamic graph construction methods have been developed, relatively few of them explore the causal relationship between neighbour nodes. Thus, the resulting models lack strong explainability for the causal relationship between the neighbour nodes of the dynamically generated graphs, which can easily lead to a risk in subsequent decisions. Moreover, few of them consider the uncertainty and noise of dynamic graphs based on the time series datasets, which are ubiquitous in real-world graph structure networks. In this paper, we propose a novel Dynamic Diffusion-Variational Graph Neural Network (DVGNN) for spatio-temporal forecasting. For dynamic graph construction, an unsupervised generative model is devised. Two layers of graph convolutional network (GCN) are applied to calculate the posterior distribution of the latent node embeddings in the encoder stage. Then, a diffusion model is used to infer the dynamic link probability and reconstruct causal graphs in the decoder stage adaptively. The new loss function is derived theoretically, and the reparameterization trick is adopted in estimating the probability distribution of the dynamic graphs by Evidence Lower Bound during the backpropagation period. After obtaining the generated graphs, dynamic GCN and temporal attention are applied to predict future states. Experiments are conducted on four real-world datasets of different graph structures in different domains. The results demonstrate that the proposed DVGNN model outperforms state-of-the-art approaches and achieves outstanding Root Mean Squared Error result while exhibiting higher robustness. Also, by F1-score and probability distribution analysis, we demonstrate that DVGNN better reflects the causal relationship and uncertainty of dynamic graphs.
A Method for Training-free Person Image Picture Generation
Abstract
The current state-of-the-art Diffusion model has demonstrated excellent results in generating images. However, the images are monotonous and are mostly the result of the distribution of images of people in the training set, making it challenging to generate multiple images for a fixed number of individuals. This problem can often only be solved by fine-tuning the training of the model. This means that each individual/animated character image must be trained if it is to be drawn, and the hardware and cost of this training is often beyond the reach of the average user, who accounts for the largest number of people. To solve this problem, the Character Image Feature Encoder model proposed in this paper enables the user to use the process by simply providing a picture of the character to make the image of the character in the generated image match the expectation. In addition, various details can be adjusted during the process using prompts. Unlike traditional Image-to-Image models, the Character Image Feature Encoder extracts only the relevant image features, rather than information about the model's composition or movements. In addition, the Character Image Feature Encoder can be adapted to different models after training. The proposed model can be conveniently incorporated into the Stable Diffusion generation process without modifying the model's ontology or used in combination with Stable Diffusion as a joint model.
Selective Guidance: Are All the Denoising Steps of Guided Diffusion Important?
Authors: Pareesa Ameneh Golnari, Zhewei Yao, Yuxiong He
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
This study examines the impact of optimizing the Stable Diffusion (SD) guided inference pipeline. We propose optimizing certain denoising steps by limiting the noise computation to conditional noise and eliminating unconditional noise computation, thereby reducing the complexity of the target iterations by 50%. Additionally, we demonstrate that later iterations of the SD are less sensitive to optimization, making them ideal candidates for applying the suggested optimization. Our experiments show that optimizing the last 20% of the denoising loop iterations results in an 8.2% reduction in inference time with almost no perceivable changes to the human eye. Furthermore, we found that by extending the optimization to 50% of the last iterations, we can reduce inference time by approximately 20.3%, while still generating visually pleasing images.
Link prediction for ex ante influence maximization on temporal networks
Authors: Eric Yanchenko, Tsuyoshi Murata, Petter Holme
Abstract
Influence maximization (IM) is the task of finding the most important nodes in order to maximize the spread of influence or information on a network. This task is typically studied on static or temporal networks where the complete topology of the graph is known. In practice, however, the seed nodes must be selected before observing the future evolution of the network. In this work, we consider this realistic ex ante setting where $p$ time steps of the network have been observed before selecting the seed nodes. Then the influence is calculated after the network continues to evolve for a total of $T>p$ time steps. We address this problem by using statistical, non-negative matrix factorization and graph neural networks link prediction algorithms to predict the future evolution of the network and then apply existing influence maximization algorithms on the predicted networks. Additionally, the output of the link prediction methods can be used to construct novel IM algorithms. We apply the proposed methods to many real-world networks to compare their performance. In many settings, choosing seed nodes based only historical edges provides results comparable to the results treating the future graph snapshots as known. The proposed heuristics based on the link prediction model are also some of the best-performing methods. These findings indicate that, for these data sets and this diffusion mechanism, the latent process which determines the most influential nodes may not have large temporal variation. Thus, knowing the future status of the network is not necessary to obtain good results for ex ante IM.
Pyramid Diffusion Models For Low-light Image Enhancement
Authors: Dewei Zhou, Zongxin Yang, Yi Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recovering noise-covered details from low-light images is challenging, and the results given by previous methods leave room for improvement. Recent diffusion models show realistic and detailed image generation through a sequence of denoising refinements and motivate us to introduce them to low-light image enhancement for recovering realistic details. However, we found two problems when doing this, i.e., 1) diffusion models keep constant resolution in one reverse process, which limits the speed; 2) diffusion models sometimes result in global degradation (e.g., RGB shift). To address the above problems, this paper proposes a Pyramid Diffusion model (PyDiff) for low-light image enhancement. PyDiff uses a novel pyramid diffusion method to perform sampling in a pyramid resolution style (i.e., progressively increasing resolution in one reverse process). Pyramid diffusion makes PyDiff much faster than vanilla diffusion models and introduces no performance degradation. Furthermore, PyDiff uses a global corrector to alleviate the global degradation that may occur in the reverse process, significantly improving the performance and making the training of diffusion models easier with little additional computational consumption. Extensive experiments on popular benchmarks show that PyDiff achieves superior performance and efficiency. Moreover, PyDiff can generalize well to unseen noise and illumination distributions.
Selective Amnesia: A Continual Learning Approach to Forgetting in Deep Generative Models
Abstract
The recent proliferation of large-scale text-to-image models has led to growing concerns that such models may be misused to generate harmful, misleading, and inappropriate content. Motivated by this issue, we derive a technique inspired by continual learning to selectively forget concepts in pretrained deep generative models. Our method, dubbed Selective Amnesia, enables controllable forgetting where a user can specify how a concept should be forgotten. Selective Amnesia can be applied to conditional variational likelihood models, which encompass a variety of popular deep generative frameworks, including variational autoencoders and large-scale text-to-image diffusion models. Experiments across different models demonstrate that our approach induces forgetting on a variety of concepts, from entire classes in standard datasets to celebrity and nudity prompts in text-to-image models. Our code is publicly available at https://github.com/clear-nus/selective-amnesia.
Abstract
Brain signal visualization has emerged as an active research area, serving as a critical interface between the human visual system and computer vision models. Although diffusion models have shown promise in analyzing functional magnetic resonance imaging (fMRI) data, including reconstructing high-quality images consistent with original visual stimuli, their accuracy in extracting semantic and silhouette information from brain signals remains limited. In this regard, we propose a novel approach, referred to as Controllable Mind Visual Diffusion Model (CMVDM). CMVDM extracts semantic and silhouette information from fMRI data using attribute alignment and assistant networks. Additionally, a residual block is incorporated to capture information beyond semantic and silhouette features. We then leverage a control model to fully exploit the extracted information for image synthesis, resulting in generated images that closely resemble the visual stimuli in terms of semantics and silhouette. Through extensive experimentation, we demonstrate that CMVDM outperforms existing state-of-the-art methods both qualitatively and quantitatively.
Provably Correct Physics-Informed Neural Networks
Authors: Francisco Eiras, Adel Bibi, Rudy Bunel, Krishnamurthy Dj Dvijotham, Philip Torr, M. Pawan Kumar
Abstract
Recent work provides promising evidence that Physics-informed neural networks (PINN) can efficiently solve partial differential equations (PDE). However, previous works have failed to provide guarantees on the worst-case residual error of a PINN across the spatio-temporal domain - a measure akin to the tolerance of numerical solvers - focusing instead on point-wise comparisons between their solution and the ones obtained by a solver on a set of inputs. In real-world applications, one cannot consider tests on a finite set of points to be sufficient grounds for deployment, as the performance could be substantially worse on a different set. To alleviate this issue, we establish tolerance-based correctness conditions for PINNs over the entire input domain. To verify the extent to which they hold, we introduce $\partial$-CROWN: a general, efficient and scalable post-training framework to bound PINN residual errors. We demonstrate its effectiveness in obtaining tight certificates by applying it to two classically studied PDEs - Burgers' and Schr\"odinger's equations -, and two more challenging ones with real-world applications - the Allan-Cahn and Diffusion-Sorption equations.
Raising the Bar for Certified Adversarial Robustness with Diffusion Models
Authors: Thomas Altstidl, David Dobre, Björn Eskofier, Gauthier Gidel, Leo Schwinn
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Certified defenses against adversarial attacks offer formal guarantees on the robustness of a model, making them more reliable than empirical methods such as adversarial training, whose effectiveness is often later reduced by unseen attacks. Still, the limited certified robustness that is currently achievable has been a bottleneck for their practical adoption. Gowal et al. and Wang et al. have shown that generating additional training data using state-of-the-art diffusion models can considerably improve the robustness of adversarial training. In this work, we demonstrate that a similar approach can substantially improve deterministic certified defenses. In addition, we provide a list of recommendations to scale the robustness of certified training approaches. One of our main insights is that the generalization gap, i.e., the difference between the training and test accuracy of the original model, is a good predictor of the magnitude of the robustness improvement when using additional generated data. Our approach achieves state-of-the-art deterministic robustness certificates on CIFAR-10 for the $\ell2$ ($\epsilon = 36/255$) and $\ell\infty$ ($\epsilon = 8/255$) threat models, outperforming the previous best results by $+3.95\%$ and $+1.39\%$, respectively. Furthermore, we report similar improvements for CIFAR-100.
FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
Authors: Guangxuan Xiao, Tianwei Yin, William T. Freeman, Frédo Durand, Song Han
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Diffusion models excel at text-to-image generation, especially in subject-driven generation for personalized images. However, existing methods are inefficient due to the subject-specific fine-tuning, which is computationally intensive and hampers efficient deployment. Moreover, existing methods struggle with multi-subject generation as they often blend features among subjects. We present FastComposer which enables efficient, personalized, multi-subject text-to-image generation without fine-tuning. FastComposer uses subject embeddings extracted by an image encoder to augment the generic text conditioning in diffusion models, enabling personalized image generation based on subject images and textual instructions with only forward passes. To address the identity blending problem in the multi-subject generation, FastComposer proposes cross-attention localization supervision during training, enforcing the attention of reference subjects localized to the correct regions in the target images. Naively conditioning on subject embeddings results in subject overfitting. FastComposer proposes delayed subject conditioning in the denoising step to maintain both identity and editability in subject-driven image generation. FastComposer generates images of multiple unseen individuals with different styles, actions, and contexts. It achieves 300$\times$-2500$\times$ speedup compared to fine-tuning-based methods and requires zero extra storage for new subjects. FastComposer paves the way for efficient, personalized, and high-quality multi-subject image creation. Code, model, and dataset are available at https://github.com/mit-han-lab/fastcomposer.
Keyword: dynamic
Vulnerability Detection Using Two-Stage Deep Learning Models
Authors: Mohamed Mjd Alhafi, Mohammad Hammade, Khloud Al Jallad
Abstract
Application security is an essential part of developing modern software, as lots of attacks depend on vulnerabilities in software. The number of attacks is increasing globally due to technological advancements. Companies must include security in every stage of developing, testing, and deploying their software in order to prevent data breaches. There are several methods to detect software vulnerability Non-AI-based such as Static Application Security Testing (SAST) and Dynamic Application Security Testing (DAST). However, these approaches have substantial false-positive and false-negative rates. On the other side, researchers have been interested in developing an AI-based vulnerability detection system employing deep learning models like BERT, BLSTM, etc. In this paper, we proposed a two-stage solution, two deep learning models were proposed for vulnerability detection in C/C++ source codes, the first stage is CNN which detects if the source code contains any vulnerability (binary classification model) and the second stage is CNN-LTSM that classifies this vulnerability into a class of 50 different types of vulnerabilities (multiclass classification model). Experiments were done on SySeVR dataset. Results show an accuracy of 99% for the first and 98% for the second stage.
Learning Switching Port-Hamiltonian Systems with Uncertainty Quantification
Authors: Thomas Beckers, Tom Z. Jiahao, George J. Pappas
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
Abstract
Switching physical systems are ubiquitous in modern control applications, for instance, locomotion behavior of robots and animals, power converters with switches and diodes. The dynamics and switching conditions are often hard to obtain or even inaccessible in case of a-priori unknown environments and nonlinear components. Black-box neural networks can learn to approximately represent switching dynamics, but typically require a large amount of data, neglect the underlying axioms of physics, and lack of uncertainty quantification. We propose a Gaussian process based learning approach enhanced by switching Port-Hamiltonian systems (GP-SPHS) to learn physical plausible system dynamics and identify the switching condition. The Bayesian nature of Gaussian processes uses collected data to form a distribution over all possible switching policies and dynamics that allows for uncertainty quantification. Furthermore, the proposed approach preserves the compositional nature of Port-Hamiltonian systems. A simulation with a hopping robot validates the effectiveness of the proposed approach.
Mobile User Interface Element Detection Via Adaptively Prompt Tuning
Authors: Zhangxuan Gu, Zhuoer Xu, Haoxing Chen, Jun Lan, Changhua Meng, Weiqiang Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recent object detection approaches rely on pretrained vision-language models for image-text alignment. However, they fail to detect the Mobile User Interface (MUI) element since it contains additional OCR information, which describes its content and function but is often ignored. In this paper, we develop a new MUI element detection dataset named MUI-zh and propose an Adaptively Prompt Tuning (APT) module to take advantage of discriminating OCR information. APT is a lightweight and effective module to jointly optimize category prompts across different modalities. For every element, APT uniformly encodes its visual features and OCR descriptions to dynamically adjust the representation of frozen category prompts. We evaluate the effectiveness of our plug-and-play APT upon several existing CLIP-based detectors for both standard and open-vocabulary MUI element detection. Extensive experiments show that our method achieves considerable improvements on two datasets. The datasets is available at \url{github.com/antmachineintelligence/MUI-zh}.
Dynamic Causal Explanation Based Diffusion-Variational Graph Neural Network for Spatio-temporal Forecasting
Authors: Guojun Liang, Prayag Tiwari, Sławomir Nowaczyk, Stefan Byttner, Fernando Alonso-Fernandez
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Abstract
Graph neural networks (GNNs), especially dynamic GNNs, have become a research hotspot in spatio-temporal forecasting problems. While many dynamic graph construction methods have been developed, relatively few of them explore the causal relationship between neighbour nodes. Thus, the resulting models lack strong explainability for the causal relationship between the neighbour nodes of the dynamically generated graphs, which can easily lead to a risk in subsequent decisions. Moreover, few of them consider the uncertainty and noise of dynamic graphs based on the time series datasets, which are ubiquitous in real-world graph structure networks. In this paper, we propose a novel Dynamic Diffusion-Variational Graph Neural Network (DVGNN) for spatio-temporal forecasting. For dynamic graph construction, an unsupervised generative model is devised. Two layers of graph convolutional network (GCN) are applied to calculate the posterior distribution of the latent node embeddings in the encoder stage. Then, a diffusion model is used to infer the dynamic link probability and reconstruct causal graphs in the decoder stage adaptively. The new loss function is derived theoretically, and the reparameterization trick is adopted in estimating the probability distribution of the dynamic graphs by Evidence Lower Bound during the backpropagation period. After obtaining the generated graphs, dynamic GCN and temporal attention are applied to predict future states. Experiments are conducted on four real-world datasets of different graph structures in different domains. The results demonstrate that the proposed DVGNN model outperforms state-of-the-art approaches and achieves outstanding Root Mean Squared Error result while exhibiting higher robustness. Also, by F1-score and probability distribution analysis, we demonstrate that DVGNN better reflects the causal relationship and uncertainty of dynamic graphs.
Distributionally Robust Differential Dynamic Programming with Wasserstein Distance
Authors: Astghik Hakobyan, Insoon Yang
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract
Differential dynamic programming (DDP) is a popular technique for solving nonlinear optimal control problems with locally quadratic approximations. However, existing DDP methods are not designed for stochastic systems with unknown disturbance distributions. To address this limitation, we propose a novel DDP method that approximately solves the Wasserstein distributionally robust control (WDRC) problem, where the true disturbance distribution is unknown but a disturbance sample dataset is given. Our approach aims to develop a practical and computationally efficient DDP solution. To achieve this, we use the Kantrovich duality principle to decompose the value function in a novel way and derive closed-form expressions of the distributionally robust control and worst-case distribution policies to be used in each iteration of our DDP algorithm. This characterization makes our method tractable and scalable without the need for numerically solving any minimax optimization problems. The superior out-of-sample performance and scalability of our algorithm are demonstrated through kinematic car navigation and coupled oscillator problems.
ConvXAI: Delivering Heterogeneous AI Explanations via Conversations to Support Human-AI Scientific Writing
Abstract
While various AI explanation (XAI) methods have been proposed to interpret AI systems, whether the state-of-the-art XAI methods are practically useful for humans remains inconsistent findings. To improve the usefulness of XAI methods, a line of studies identifies the gaps between the diverse and dynamic real-world user needs with the status quo of XAI methods. Although prior studies envision mitigating these gaps by integrating multiple XAI methods into the universal XAI interfaces (e.g., conversational or GUI-based XAI systems), there is a lack of work investigating how these systems should be designed to meet practical user needs. In this study, we present ConvXAI, a conversational XAI system that incorporates multiple XAI types, and empowers users to request a variety of XAI questions via a universal XAI dialogue interface. Particularly, we innovatively embed practical user needs (i.e., four principles grounding on the formative study) into ConvXAI design to improve practical usefulness. Further, we design the domain-specific language (DSL) to implement the essential conversational XAI modules and release the core conversational universal XAI API for generalization. The findings from two within-subjects studies with 21 users show that ConvXAI is more useful for humans in perceiving the understanding and writing improvement, and improving the writing process in terms of productivity and sentence quality. Finally, this work contributes insight into the design space of useful XAI, reveals humans' XAI usage patterns with empirical evidence in practice, and identifies opportunities for future useful XAI work.
Reinforcement Learning for Safe Robot Control using Control Lyapunov Barrier Functions
Authors: Desong Du, Shaohang Han, Naiming Qi, Haitham Bou Ammar, Jun Wang, Wei Pan
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract
Reinforcement learning (RL) exhibits impressive performance when managing complicated control tasks for robots. However, its wide application to physical robots is limited by the absence of strong safety guarantees. To overcome this challenge, this paper explores the control Lyapunov barrier function (CLBF) to analyze the safety and reachability solely based on data without explicitly employing a dynamic model. We also proposed the Lyapunov barrier actor-critic (LBAC), a model-free RL algorithm, to search for a controller that satisfies the data-based approximation of the safety and reachability conditions. The proposed approach is demonstrated through simulation and real-world robot control experiments, i.e., a 2D quadrotor navigation task. The experimental findings reveal this approach's effectiveness in reachability and safety, surpassing other model-free RL methods.
MINT: Multiplier-less Integer Quantization for Spiking Neural Networks
Authors: Ruokai Yin, Yuhang Li, Abhishek Moitra, Priyadarshini Panda
Subjects: Neural and Evolutionary Computing (cs.NE)
Abstract
We propose Multiplier-less INTeger (MINT) quantization, an efficient uniform quantization scheme for the weights and membrane potentials in spiking neural networks (SNNs). Unlike prior SNN quantization works, MINT quantizes the memory-hungry membrane potentials to extremely low bit-width (2-bit) to significantly reduce the total memory footprint. Additionally, MINT quantization shares the quantization scale between the weights and membrane potentials, eliminating the need for multipliers and floating arithmetic units, which are required by the standard uniform quantization. Experimental results demonstrate that our proposed method achieves accuracy that matches other state-of-the-art SNN quantization works while outperforming them on total memory footprint and hardware cost at deployment time. For instance, 2-bit MINT VGG-16 achieves 48.6% accuracy on TinyImageNet (0.28% better than the full-precision baseline) with approximately 93.8% reduction in total memory footprint from the full-precision model; meanwhile, our model reduces area by 93% and dynamic power by 98% compared to other SNN quantization counterparts.
Knowledge Graph Completion Models are Few-shot Learners: An Empirical Study of Relation Labeling in E-commerce with LLMs
Abstract
Knowledge Graphs (KGs) play a crucial role in enhancing e-commerce system performance by providing structured information about entities and their relationships, such as complementary or substitutable relations between products or product types, which can be utilized in recommender systems. However, relation labeling in KGs remains a challenging task due to the dynamic nature of e-commerce domains and the associated cost of human labor. Recently, breakthroughs in Large Language Models (LLMs) have shown surprising results in numerous natural language processing tasks. In this paper, we conduct an empirical study of LLMs for relation labeling in e-commerce KGs, investigating their powerful learning capabilities in natural language and effectiveness in predicting relations between product types with limited labeled data. We evaluate various LLMs, including PaLM and GPT-3.5, on benchmark datasets, demonstrating their ability to achieve competitive performance compared to humans on relation labeling tasks using just 1 to 5 labeled examples per relation. Additionally, we experiment with different prompt engineering techniques to examine their impact on model performance. Our results show that LLMs significantly outperform existing KG completion models in relation labeling for e-commerce KGs and exhibit performance strong enough to replace human labeling.
Abstract
In partial label learning (PLL), each training sample is associated with a set of candidate labels, among which only one is valid. The core of PLL is to disambiguate the candidate labels to get the ground-truth one. In disambiguation, the existing works usually do not fully investigate the effectiveness of the non-candidate label set (a.k.a. complementary labels), which accurately indicates a set of labels that do not belong to a sample. In this paper, we use the non-candidate labels to induce a complementary classifier, which naturally forms an adversarial relationship against the traditional PLL classifier, to eliminate the false-positive labels in the candidate label set. Besides, we assume the feature space and the label space share the same local topological structure captured by a dynamic graph, and use it to assist disambiguation. Extensive experimental results validate the superiority of the proposed approach against state-of-the-art PLL methods on 4 controlled UCI data sets and 6 real-world data sets, and reveal the usefulness of complementary learning in PLL. The code has been released in the link https://github.com/Chongjie-Si/PL-CL.
Model-based Validation as Probabilistic Inference
Authors: Harrison Delecki, Anthony Corso, Mykel J. Kochenderfer
Abstract
Estimating the distribution over failures is a key step in validating autonomous systems. Existing approaches focus on finding failures for a small range of initial conditions or make restrictive assumptions about the properties of the system under test. We frame estimating the distribution over failure trajectories for sequential systems as Bayesian inference. Our model-based approach represents the distribution over failure trajectories using rollouts of system dynamics and computes trajectory gradients using automatic differentiation. Our approach is demonstrated in an inverted pendulum control system, an autonomous vehicle driving scenario, and a partially observable lunar lander. Sampling is performed using an off-the-shelf implementation of Hamiltonian Monte Carlo with multiple chains to capture multimodality and gradient smoothing for safe trajectories. In all experiments, we observed improvements in sample efficiency and parameter space coverage compared to black-box baseline approaches. This work is open sourced.
Impact of ROS 2 Node Composition in Robotic Systems
Authors: Steve Macenski, Alberto Soragna, Michael Carroll, Zhenpeng Ge
Abstract
The Robot Operating System 2 (ROS 2) is the second generation of ROS representing a step forward in the robotic framework. Several new types of nodes and executor models are integral to control where, how, and when information is processed in the computational graph. This paper explores and benchmarks one of these new node types -- the Component node -- which allows nodes to be composed manually or dynamically into processes while retaining separation of concerns in a codebase for distributed development. Composition is shown to achieve a high degree of performance optimization, particularly valuable for resource-constrained systems and sensor processing pipelines, enabling distributed tasks that would not be otherwise possible in ROS 2. In this work, we briefly introduce the significance and design of node composition, then our contribution of benchmarking is provided to analyze its impact on robotic systems. Its compelling influence on performance is shown through several experiments on the latest Long Term Support (LTS) ROS 2 distribution, Humble Hawksbill.
CooK: Empowering General-Purpose Language Models with Modular and Collaborative Knowledge
Abstract
Large language models (LLMs) are increasingly adopted for knowledge-intensive tasks and contexts. Existing approaches improve the knowledge capabilities of general-purpose LLMs through retrieval or generated knowledge prompting, but they fall short of reflecting two key properties of knowledge-rich models: knowledge should be modular, ever-growing, sourced from diverse domains; knowledge acquisition and production should be a collaborative process, where diverse stakeholders contribute new information. To this end, we propose CooK, a novel framework to empower general-purpose large language models with modular and collaboratively sourced knowledge. We first introduce specialized language models, autoregressive models trained on corpora from a wide range of domains and sources. These specialized LMs serve as parametric knowledge repositories that are later prompted to generate background knowledge for general-purpose LLMs. We then propose three knowledge filters to dynamically select and retain information in generated documents by controlling for relevance, brevity, and factuality. Finally, we propose bottom-up and top-down knowledge integration approaches to augment general-purpose LLMs with the curated (relevant, factual) knowledge from community-driven specialized LMs that enable multi-domain knowledge synthesis and on-demand knowledge requests. Through extensive experiments, we demonstrate that CooK achieves state-of-the-art performance on six benchmark datasets. Our results highlight the potential of enriching general-purpose LLMs with evolving and modular knowledge -- relevant knowledge that can be continuously updated through the collective efforts of the research community.
Singly Exponential Translation of Alternating Weak Büchi Automata to Unambiguous Büchi Automata
Authors: Yong Li, Sven Schewe, Moshe Y. Vardi
Subjects: Formal Languages and Automata Theory (cs.FL)
Abstract
We introduce a method for translating an alternating weak B\"uchi automaton (AWA), which corresponds to a Linear Dynamic Logic (LDL) formula, to an unambiguous B\"uchi automaton (UBA). Our translations generalise constructions for Linear Temporal Logic (LTL), a less expressive specification language than LDL. In classical constructions, LTL formulas are first translated to alternating \emph{very weak} automata (AVAs) -- automata that have only singleton strongly connected components (SCCs); the AVAs are then handled by efficient disambiguation procedures. However, general AWAs can have larger SCCs, which complicates disambiguation. Currently, the only available disambiguation procedure has to go through an intermediate construction of nondeterministic B\"uchi automata (NBAs), which would incur an exponential blow-up of its own. We introduce a translation from \emph{general} AWAs to UBAs with a \emph{singly} exponential blow-up, which also immediately provides a singly exponential translation from LDL to UBAs. Interestingly, the complexity of our translation is \emph{smaller} than the best known disambiguation algorithm for NBAs (broadly $(0.53n)^n$ vs. $(0.76n)^n$), while the input of our construction can be exponentially more succinct.
Angle-based formation stabilization and maneuvers in port-Hamiltonian form with bearing and velocity measurements
Authors: Ningbo Li, Pablo Borja, Arjan van der Schaft, Jacquelien M. A. Scherpen
Abstract
This paper proposes a port-Hamiltonian framework for angle-based formation stabilization and maneuvers using bearing and velocity measurements with an underlying triangulated Laman graph. The corresponding port-Hamiltonian controller is designed using virtual couplings on the errors of angle constraints in angle space and then the angle constraints and agent actuators are mapped by the constraint Jacobian, which can be applied to other formation constraints. In addition, due to the fact that the port-Hamiltonian model allows for complex and heterogeneous agent dynamics, our framework can be extended to networks with different agent dynamics and formation constraints. To avoid unavailable distance terms in the control law, an estimator is designed based on port-Hamiltonian theory and the property that energy is coordinate-free for different sensor modalities using bearing and velocity measurements, which permits our framework to inject damping for the formation maneuvers. Furthermore, several maneuvers are analyzed under both considerations of stabilization and transient performance. Simulations are performed to illustrate the effectiveness of the approach.
Risk Assessment of Lymph Node Metastases in Endometrial Cancer Patients: A Causal Approach
Authors: Alessio Zanga, Alice Bernasconi, Peter J.F. Lucas, Hanny Pijnenborg, Casper Reijnen, Marco Scutari, Fabio Stella
Abstract
Assessing the pre-operative risk of lymph node metastases in endometrial cancer patients is a complex and challenging task. In principle, machine learning and deep learning models are flexible and expressive enough to capture the dynamics of clinical risk assessment. However, in this setting we are limited to observational data with quality issues, missing values, small sample size and high dimensionality: we cannot reliably learn such models from limited observational data with these sources of bias. Instead, we choose to learn a causal Bayesian network to mitigate the issues above and to leverage the prior knowledge on endometrial cancer available from clinicians and physicians. We introduce a causal discovery algorithm for causal Bayesian networks based on bootstrap resampling, as opposed to the single imputation used in related works. Moreover, we include a context variable to evaluate whether selection bias results in learning spurious associations. Finally, we discuss the strengths and limitations of our findings in light of the presence of missing data that may be missing-not-at-random, which is common in real-world clinical settings.
Dynamic event-triggered control for multi-agent systems with adjustable inter-event time: a moving average approach
Abstract
This extended abstract presents our recent work on the leader-following consensus control for generic linear multi-agent systems. An improved dynamic event-triggered control framework are proposed, based on a moving average approach. The proposed methods involve model-based estimation and clock-like auxiliary dynamic variables to increase the inter-event time as long as possible eventually. Compared to the static event-triggered strategy and the existing state-of-the-art dynamic event-triggered mechanism, the proposed approach significantly reduces the communication frequency while still guaranteeing asymptotic convergence. Numerical simulations demonstrate the validity of the proposed theoretical results.
Recursive Dynamic State Estimation for Power Systems with an Incomplete Nonlinear DAE Model
Authors: Milos Katanic, John Lygeros, Gabriela Hug
Abstract
Power systems are highly-complex, large-scale engineering systems subject to many uncertainties, which makes accurate mathematical modeling challenging. This paper proposes a novel, centralized dynamic state estimator for power systems that lack models of some components. Including the available dynamic evolution equations, algebraic network equations, and phasor measurements, we apply the least squares criterion to estimate all dynamic and algebraic states recursively. This approach results in an algorithm that generalizes the iterated extended Kalman filter and does not require static network observability. We further derive a graph theoretic condition to guarantee estimability for the proposed approach. A numerical study evaluates the performance under short circuits in the network and load changes for three different discretization schemes. The results show superior tracking performance compared to robust two-stage procedures within computational times that are feasible for real-time application.
Dynamic Structural Brain Network Construction by Hierarchical Prototype Embedding GCN using T1-MRI
Abstract
Constructing structural brain networks using T1-weighted magnetic resonance imaging (T1-MRI) presents a significant challenge due to the lack of direct regional connectivity information. Current methods with T1-MRI rely on predefined regions or isolated pretrained location modules to obtain atrophic regions, which neglects individual specificity. Besides, existing methods capture global structural context only on the whole-image-level, which weaken correlation between regions and the hierarchical distribution nature of brain connectivity.We hereby propose a novel dynamic structural brain network construction method based on T1-MRI, which can dynamically localize critical regions and constrain the hierarchical distribution among them for constructing dynamic structural brain network. Specifically, we first cluster spatially-correlated channel and generate several critical brain regions as prototypes. Further, we introduce a contrastive loss function to constrain the prototypes distribution, which embed the hierarchical brain semantic structure into the latent space. Self-attention and GCN are then used to dynamically construct hierarchical correlations of critical regions for brain network and explore the correlation, respectively. Our method is evaluated on ADNI-1 and ADNI-2 databases for mild cognitive impairment (MCI) conversion prediction, and acheive the state-of-the-art (SOTA) performance. Our source code is available at this http URL***.
Automatic Traffic Scenario Conversion from OpenSCENARIO to CommonRoad
Authors: Yuanfei Lin, Michael Ratzel, Matthias Althoff
Subjects: Robotics (cs.RO); Formal Languages and Automata Theory (cs.FL); Software Engineering (cs.SE)
Abstract
Scenarios are a crucial element for developing, testing, and verifying autonomous driving systems. However, open-source scenarios are often formulated using different terminologies. This limits their usage across different applications as many scenario representation formats are not directly compatible with each other. To address this problem, we present the first open-source converter from the OpenSCENARIO format to the CommonRoad format, which are two of the most popular scenario formats used in autonomous driving. Our converter employs a simulation tool to execute the dynamic elements defined by OpenSCENARIO. The converter is available at commonroad.in.tum.de and we demonstrate its usefulness by converting publicly available scenarios in the OpenSCENARIO format and evaluating them using CommonRoad tools.
Use of a Taxonomy of Empathetic Response Intents to Control and Interpret Empathy in Neural Chatbots
Authors: Anuradha Welivita, Pearl Pu
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
A recent trend in the domain of open-domain conversational agents is enabling them to converse empathetically to emotional prompts. Current approaches either follow an end-to-end approach or condition the responses on similar emotion labels to generate empathetic responses. But empathy is a broad concept that refers to the cognitive and emotional reactions of an individual to the observed experiences of another and it is more complex than mere mimicry of emotion. Hence, it requires identifying complex human conversational strategies and dynamics in addition to generic emotions to control and interpret empathetic responding capabilities of chatbots. In this work, we make use of a taxonomy of eight empathetic response intents in addition to generic emotion categories in building a dialogue response generation model capable of generating empathetic responses in a controllable and interpretable manner. It consists of two modules: 1) a response emotion/intent prediction module; and 2) a response generation module. We propose several rule-based and neural approaches to predict the next response's emotion/intent and generate responses conditioned on these predicted emotions/intents. Automatic and human evaluation results emphasize the importance of the use of the taxonomy of empathetic response intents in producing more diverse and empathetically more appropriate responses than end-to-end models.
Bridging the Gap: Enhancing the Utility of Synthetic Data via Post-Processing Techniques
Authors: Andrea Lampis, Eugenio Lomurno, Matteo Matteucci
Abstract
Acquiring and annotating suitable datasets for training deep learning models is challenging. This often results in tedious and time-consuming efforts that can hinder research progress. However, generative models have emerged as a promising solution for generating synthetic datasets that can replace or augment real-world data. Despite this, the effectiveness of synthetic data is limited by their inability to fully capture the complexity and diversity of real-world data. To address this issue, we explore the use of Generative Adversarial Networks to generate synthetic datasets for training classifiers that are subsequently evaluated on real-world images. To improve the quality and diversity of the synthetic dataset, we propose three novel post-processing techniques: Dynamic Sample Filtering, Dynamic Dataset Recycle, and Expansion Trick. In addition, we introduce a pipeline called Gap Filler (GaFi), which applies these techniques in an optimal and coordinated manner to maximise classification accuracy on real-world data. Our experiments show that GaFi effectively reduces the gap with real-accuracy scores to an error of 2.03%, 1.78%, and 3.99% on the Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets, respectively. These results represent a new state of the art in Classification Accuracy Score and highlight the effectiveness of post-processing techniques in improving the quality of synthetic datasets.
Curriculum Learning in Job Shop Scheduling using Reinforcement Learning
Authors: Constantin Waubert de Puiseau, Hasan Tercan, Tobias Meisen
Abstract
Solving job shop scheduling problems (JSSPs) with a fixed strategy, such as a priority dispatching rule, may yield satisfactory results for several problem instances but, nevertheless, insufficient results for others. From this single-strategy perspective finding a near optimal solution to a specific JSSP varies in difficulty even if the machine setup remains the same. A recent intensively researched and promising method to deal with difficulty variability is Deep Reinforcement Learning (DRL), which dynamically adjusts an agent's planning strategy in response to difficult instances not only during training, but also when applied to new situations. In this paper, we further improve DLR as an underlying method by actively incorporating the variability of difficulty within the same problem size into the design of the learning process. We base our approach on a state-of-the-art methodology that solves JSSP by means of DRL and graph neural network embeddings. Our work supplements the training routine of the agent by a curriculum learning strategy that ranks the problem instances shown during training by a new metric of problem instance difficulty. Our results show that certain curricula lead to significantly better performances of the DRL solutions. Agents trained on these curricula beat the top performance of those trained on randomly distributed training data, reaching 3.2% shorter average makespans.
IDO-VFI: Identifying Dynamics via Optical Flow Guidance for Video Frame Interpolation with Events
Abstract
Video frame interpolation (VFI) increases the video frame rate by inserting a reconstruction frame into two consecutive frames. Due to the limitation of the fixed frame rate of ordinary camera, the frame-only video frame interpolation methods inevitably lose the dynamics in the interval between consecutive frames. In order to compensate for the lack of inter-frame information, motion models are often used, but those models cannot account for the real motions. Event cameras are bio-inspired vision sensor, each pixel of which independently perceives and encodes relative changes in light intensity. Event cameras output sparse, asynchronous streams of events instead of frames, with advantages of high temporal resolution, high dynamics, and low power consumption. An event is usually expressed as a tuple e=(x,y,p,t), which means that at timestamp t, an event with polarity is generated at the pixel (x,y). Positive polarity indicates that the change of light intensity from week to strong is beyond the threshold, while negative polarity is just the opposite. Because an event camera has high temporal resolution up to microseconds, it can capture complete changes or motion between frames. The event flow is the embodiment of inter-frame changes. Therefore, the optical flow estimated from the events does not require any motion model to be fitted, which can be inherently nonlinear. Since events lack intensity information, frame-based optical flow is complementary to event-based optical flow. By combining these two kinds of optical flow, more accurate estimation results can be obtained. Meanwhile, it is possible to reconstruct high-quality keyframes at any timestamp, since real inter-frame dynamics are captured.
NFI$_2$: Learning Noise-Free Illuminance-Interpolator for Unsupervised Low-Light Image Enhancement
Abstract
Low-light situations severely restrict the pursuit of aesthetic quality in consumer photography. Although many efforts are devoted to designing heuristics, it is generally mired in a shallow spiral of tedium, such as piling up complex network architectures and empirical strategies. How to delve into the essential physical principles of illumination compensation has been neglected. Following the way of simplifying the complexity, this paper innovatively proposes a simple and efficient Noise-Free Illumination Interpolator (NFI$_2$). According to the constraint principle of illuminance and reflectance within a limited dynamic range, as a prior knowledge in the recovery process, we construct a learnable illuminance interpolator and thereby compensating for non-uniform lighting. With the intention of adapting denoising without annotated data, we design a self-calibrated denoiser with the intrinsic image properties to acquire noise-free low-light images. Starting from the properties of natural image manifolds, a self-regularized recovery loss is introduced as a way to encourage more natural and realistic reflectance map. The model architecture and training losses, guided by prior knowledge, complement and benefit each other, forming a powerful unsupervised leaning framework. Comprehensive experiments demonstrate that the proposed algorithm produces competitive qualitative and quantitative results while maintaining favorable generalization capability in unknown real-world scenarios.
Agent Heterogeneity Mediates Extremism in an Adaptive Social Network Model
Authors: Seth Bullock, Hiroki Sayama
Subjects: Social and Information Networks (cs.SI); Multiagent Systems (cs.MA); Dynamical Systems (math.DS); Adaptation and Self-Organizing Systems (nlin.AO)
Abstract
An existing model of opinion dynamics on an adaptive social network is extended to introduce update policy heterogeneity, representing the fact that individual differences between social animals can affect their tendency to form, and be influenced by, their social bonds with other animals. As in the original model, the opinions and social connections of a population of model agents change due to three social processes: conformity, homophily and neophily. Here, however, we explore the case in which each node's susceptibility to these three processes is parameterised by node-specific values drawn independently at random from some distribution. This introduction of heterogeneity increases both the degree of extremism and connectedness in the final population (relative to comparable homogeneous networks) and leads to significant assortativity with respect to node update policy parameters as well as node opinions. Each node's update policy parameters also predict properties of the community that they will belong to in the final network configuration. These results suggest that update policy heterogeneity in social populations may have a significant impact on the formation of extremist communities in real-world populations.
SGAD: Spiking Generative Adversarial Network with Attention Scoring Decoding
Authors: Linghao Feng, Dongcheng Zhao, Yi Zeng
Subjects: Neural and Evolutionary Computing (cs.NE)
Abstract
Generative models based on neural networks present a substantial challenge within deep learning. As it stands, such models are primarily limited to the domain of artificial neural networks. Spiking neural networks, as the third generation of neural networks, offer a closer approximation to brain-like processing due to their rich spatiotemporal dynamics. However, generative models based on spiking neural networks are not well studied. In this work, we pioneer constructing a spiking generative adversarial network capable of handling complex images. Our first task was to identify the problems of out-of-domain inconsistency and temporal inconsistency inherent in spiking generative adversarial networks. We addressed these issues by incorporating the Earth-Mover distance and an attention-based weighted decoding method, significantly enhancing the performance of our algorithm across several datasets. Experimental results reveal that our approach outperforms existing methods on the MNIST, FashionMNIST, CIFAR10, and CelebA datasets. Moreover, compared with hybrid spiking generative adversarial networks, where the discriminator is an artificial analog neural network, our methodology demonstrates closer alignment with the information processing patterns observed in the mouse.
Improving Link Prediction in Social Networks Using Local and Global Features: A Clustering-based Approach
Authors: Safiye Ghasemi, Amin Zarei
Subjects: Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Abstract
Link prediction problem has increasingly become prominent in many domains such as social network analyses, bioinformatics experiments, transportation networks, criminal investigations and so forth. A variety of techniques has been developed for link prediction problem, categorized into 1) similarity based approaches which study a set of features to extract similar nodes; 2) learning based approaches which extract patterns from the input data; 3) probabilistic statistical approaches which optimize a set of parameters to establish a model which can best compute formation probability. However, existing literatures lack approaches which utilize strength of each approach by integrating them to achieve a much more productive one. To tackle the link prediction problem, we propose an approach based on the combination of first and second group methods; the existing studied works use just one of these categories. Our two-phase developed method firstly determines new features related to the position and dynamic behavior of nodes, which enforce the approach more efficiency compared to approaches using mere measures. Then, a subspace clustering algorithm is applied to group social objects based on the computed similarity measures which differentiate the strength of clusters; basically, the usage of local and global indices and the clustering information plays an imperative role in our link prediction process. Some extensive experiments held on real datasets including Facebook, Brightkite and HepTh indicate good performances of our proposal method. Besides, we have experimentally verified our approach with some previous techniques in the area to prove the supremacy of ours.
Digital Twin for Non-Terrestrial Networks: Vision, Challenges, and Enabling Technologies
Authors: Hayder Al-Hraishawi, Madyan Alsenwi, Junaid ur Rehman, Eva Lagunas, Symeon Chatzinotas
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
The ongoing digital transformation has sparked the emergence of various new network applications that demand cutting-edge technologies to enhance their efficiency and functionality. One of the promising technologies in this direction is the digital twin, which is a new approach to design and manage complicated cyber-physical systems with a high degree of automation, intelligence, and resilience. This article discusses the use of digital twin technology as a new approach for modeling non-terrestrial networks (NTNs). Digital twin technology can create accurate data-driven NTN models that operate in real-time, allowing for rapid testing and deployment of new NTN technologies and services, besides facilitating innovation and cost reduction. Specifically, we provide a vision on integrating the digital twin into NTNs and explore the primary deployment challenges, as well as the key potential enabling technologies within NTN realm. In closing, we present a case study that employs a data-driven digital twin model for dynamic and service-oriented network slicing within an open radio access network (O-RAN) NTN architecture.
High-order ADER Discontinuous Galerkin schemes for a symmetric hyperbolic model of compressible barotropic two-fluid flows
Abstract
This paper presents a high-order discontinuous Galerkin finite element method to solve the barotropic version of the conservative symmetric hyperbolic and thermodynamically compatible (SHTC) model of compressible two-phase flow, introduced by Romenski et al., in multiple space dimensions. In the absence of algebraic source terms, the model is endowed with a curl constraint on the relative velocity field. In this paper, the hyperbolicity of the system is studied for the first time in the multidimensional case, showing that the original model is only weakly hyperbolic in multiple space dimensions. To restore strong hyperbolicity, two different methodologies are used: i) the explicit symmetrization of the system, which can be achieved by adding terms that contain linear combinations of the curl involution, similar to the Godunov-Powell terms in the MHD equations; ii) the use of the hyperbolic generalized Lagrangian multiplier (GLM) curl-cleaning approach forwarded. The PDE system is solved using a high-order ADER discontinuous Galerkin method with a posteriori sub-cell finite volume limiter to deal with shock waves and the steep gradients in the volume fraction commonly appearing in the solutions of this type of model. To illustrate the performance of the method, several different test cases and benchmark problems have been run, showing the high-order of the scheme and the good agreement when compared to reference solutions computed with other well-known methods.
Inertial-based Navigation by Polynomial Optimization: Inertial-Magnetic Attitude Estimation
Abstract
Inertial-based navigation refers to the navigation methods or systems that have inertial information or sensors as the core part and integrate a spectrum of other kinds of sensors for enhanced performance. Through a series of papers, the authors attempt to explore information blending of inertial-based navigation by a polynomial optimization method. The basic idea is to model rigid motions as finite-order polynomials and then attacks the involved navigation problems by optimally solving their coefficients, taking into considerations the constraints posed by inertial sensors and others. In the current paper, a continuous-time attitude estimation approach is proposed, which transforms the attitude estimation into a constant parameter determination problem by the polynomial optimization. Specifically, the continuous attitude is first approximated by a Chebyshev polynomial, of which the unknown Chebyshev coefficients are determined by minimizing the weighted residuals of initial conditions, dynamics and measurements. We apply the derived estimator to the attitude estimation with the magnetic and inertial sensors. Simulation and field tests show that the estimator has much better stability and faster convergence than the traditional extended Kalman filter does, especially in the challenging large initial state error scenarios.
Towards Multi-Layered 3D Garments Animation
Authors: Yidi Shao, Chen Change Loy, Bo Dai
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Mimicking realistic dynamics in 3D garment animations is a challenging task due to the complex nature of multi-layered garments and the variety of outer forces involved. Existing approaches mostly focus on single-layered garments driven by only human bodies and struggle to handle general scenarios. In this paper, we propose a novel data-driven method, called LayersNet, to model garment-level animations as particle-wise interactions in a micro physics system. We improve simulation efficiency by representing garments as patch-level particles in a two-level structural hierarchy. Moreover, we introduce a novel Rotation Equivalent Transformation that leverages the rotation invariance and additivity of physics systems to better model outer forces. To verify the effectiveness of our approach and bridge the gap between experimental environments and real-world scenarios, we introduce a new challenging dataset, D-LAYERS, containing 700K frames of dynamics of 4,900 different combinations of multi-layered garments driven by both human bodies and randomly sampled wind. Our experiments show that LayersNet achieves superior performance both quantitatively and qualitatively. We will make the dataset and code publicly available at https://mmlab-ntu.github.io/project/layersnet/index.html .
Keyword: efficient
SHATTER: Control and Defense-Aware Attack Analytics for Activity-Driven Smart Home Systems
Integrating Node Importance and Network Topological Properties for Link Prediction in Complex Network
Adversarial Security and Differential Privacy in mmWave Beam Prediction in 6G networks
Random Edge Coding: One-Shot Bits-Back Coding of Large Labeled Graphs
FedHGN: A Federated Framework for Heterogeneous Graph Neural Networks
ADDSL: Hand Gesture Detection and Sign Language Recognition on Annotated Danish Sign Language
CQural: A Novel CNN based Hybrid Architecture for Quantum Continual Machine Learning
Distributionally Robust Differential Dynamic Programming with Wasserstein Distance
NerfBridge: Bringing Real-time, Online Neural Radiance Field Training to Robotics
Shortest Path to Boundary for Self-Intersecting Meshes
A Scalable Walsh-Hadamard Regularizer to Overcome the Low-degree Spectral Bias of Neural Networks
A Note on Dimensionality Reduction in Deep Neural Networks using Empirical Interpolation Method
Accessible Interfaces for the Development and Deployment of Robotic Platforms
MINT: Multiplier-less Integer Quantization for Spiking Neural Networks
Crossing the Reality Gap in Tactile-Based Learning
Equivariant Few-Shot Learning from Pretrained Models
CageViT: Convolutional Activation Guided Efficient Vision Transformer
Asynchronous Grant-Free Random Access: Receiver Design with Partially Uni-Directional Message Passing and Interference Suppression Analysis
Singly Exponential Translation of Alternating Weak Büchi Automata to Unambiguous Büchi Automata
Border Complexity of Symbolic Determinant under Rank One Restriction
River of No Return: Graph Percolation Embeddings for Efficient Knowledge Graph Reasoning
The Noncommutative Edmonds' Problem Re-visited
EfficientSCI: Densely Connected Network with Space-time Factorization for Large-scale Video Snapshot Compressive Imaging
An efficient solver for ASP(Q)
An Efficient Solution Space Exploring and Descent Method for Packing Equal Spheres in a Sphere
SHoP: A Deep Learning Framework for Solving High-order Partial Differential Equations
Optimized Joint Beamforming for Wireless Powered Over-the-Air Computation
Synthesizing Resilient Strategies for Infinite-Horizon Objectives in Multi-Agent Systems
CWD30: A Comprehensive and Holistic Dataset for Crop Weed Recognition in Precision Agriculture
Adaptive aggregation of Monte Carlo augmented decomposed filters for efficient group-equivariant convolutional neural network
Concurrent Gaussian elimination
Fusion-S2iGan: An Efficient and Effective Single-Stage Framework for Speech-to-Image Generation
Evolving the Digital Industrial Infrastructure for Production: Steps Taken and the Road Ahead
Iterated learning and communication jointly explain efficient color naming systems
Personality Understanding of Fictional Characters during Book Reading
Provably Correct Physics-Informed Neural Networks
Deep and Fast Approximate Order Independent Transparency
People Talking and AI Listening: How Stigmatizing Language in EHR Notes Affect AI Performance
Exploring the Space of Key-Value-Query Models with Intention
Error-Correcting Codes for Nanopore Sequencing
NFI$_2$: Learning Noise-Free Illuminance-Interpolator for Unsupervised Low-Light Image Enhancement
Exploring Inductive Biases in Contrastive Learning: A Clustering Perspective
Energy Loss Prediction in IoT Energy Services
State Representation Learning Using an Unbalanced Atlas
Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning
Explain Any Concept: Segment Anything Meets Concept-Based Explanation
Binarized Spectral Compressive Imaging
FACE: Evaluating Natural Language Generation with Fourier Analysis of Cross-Entropy
CostFormer:Cost Transformer for Cost Aggregation in Multi-view Stereo
Joint Denoising and Few-angle Reconstruction for Low-dose Cardiac SPECT Using a Dual-domain Iterative Network with Adaptive Data Consistency
G-Adapter: Towards Structure-Aware Parameter-Efficient Transfer Learning for Graph Transformer Networks
Inertial-based Navigation by Polynomial Optimization: Inertial-Magnetic Attitude Estimation
A 334$μ$W 0.158mm$^2$ ASIC for Post-Quantum Key-Encapsulation Mechanism Saber with Low-latency Striding Toom-Cook Multiplication Authors Version
Logit-Based Ensemble Distribution Distillation for Robust Autoregressive Sequence Uncertainties
Motion Planning (In)feasibility Detection using a Prior Roadmap via Path and Cut Search
PaLM 2 Technical Report
BAD: BiAs Detection for Large Language Models in the context of candidate screening
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering
SLiC-HF: Sequence Likelihood Calibration with Human Feedback
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
Keyword: faster
Mimetic Initialization of Self-Attention Layers
On the ISS Property of the Gradient Flow for Single Hidden-Layer Neural Networks with Linear Activations
SIMGA: A Simple and Effective Heterophilous Graph Neural Network with Efficient Global Aggregation
Pyramid Diffusion Models For Low-light Image Enhancement
Cross-domain Iterative Network for Simultaneous Denoising, Limited-angle Reconstruction, and Attenuation Correction of Low-dose Cardiac SPECT
Inertial-based Navigation by Polynomial Optimization: Inertial-Magnetic Attitude Estimation
PaLM 2 Technical Report
ZeroFlow: Fast Zero Label Scene Flow via Distillation
Keyword: mobile
Mobile User Interface Element Detection Via Adaptively Prompt Tuning
Four Factor Authentication with emerging cybersecurity for Mobile Transactions
Binarized Spectral Compressive Imaging
NUANCE: Near Ultrasound Attack On Networked Communication Environments
Keyword: pruning
There is no result
Keyword: voxel
Explaining black box text modules in natural language with language models
Keyword: lidar
Rethinking the Open-Loop Evaluation of End-to-End Autonomous Driving in nuScenes
Keyword: diffusion
Dynamic Causal Explanation Based Diffusion-Variational Graph Neural Network for Spatio-temporal Forecasting
A Method for Training-free Person Image Picture Generation
Selective Guidance: Are All the Denoising Steps of Guided Diffusion Important?
Link prediction for ex ante influence maximization on temporal networks
Pyramid Diffusion Models For Low-light Image Enhancement
Selective Amnesia: A Continual Learning Approach to Forgetting in Deep Generative Models
Controllable Mind Visual Diffusion Model
Provably Correct Physics-Informed Neural Networks
Raising the Bar for Certified Adversarial Robustness with Diffusion Models
FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
Keyword: dynamic
Vulnerability Detection Using Two-Stage Deep Learning Models
Learning Switching Port-Hamiltonian Systems with Uncertainty Quantification
Mobile User Interface Element Detection Via Adaptively Prompt Tuning
Dynamic Causal Explanation Based Diffusion-Variational Graph Neural Network for Spatio-temporal Forecasting
Distributionally Robust Differential Dynamic Programming with Wasserstein Distance
ConvXAI: Delivering Heterogeneous AI Explanations via Conversations to Support Human-AI Scientific Writing
Reinforcement Learning for Safe Robot Control using Control Lyapunov Barrier Functions
MINT: Multiplier-less Integer Quantization for Spiking Neural Networks
Knowledge Graph Completion Models are Few-shot Learners: An Empirical Study of Relation Labeling in E-commerce with LLMs
Complementary Classifier Induced Partial Label Learning
Model-based Validation as Probabilistic Inference
Impact of ROS 2 Node Composition in Robotic Systems
CooK: Empowering General-Purpose Language Models with Modular and Collaborative Knowledge
Singly Exponential Translation of Alternating Weak Büchi Automata to Unambiguous Büchi Automata
Angle-based formation stabilization and maneuvers in port-Hamiltonian form with bearing and velocity measurements
Risk Assessment of Lymph Node Metastases in Endometrial Cancer Patients: A Causal Approach
Dynamic event-triggered control for multi-agent systems with adjustable inter-event time: a moving average approach
Recursive Dynamic State Estimation for Power Systems with an Incomplete Nonlinear DAE Model
Dynamic Structural Brain Network Construction by Hierarchical Prototype Embedding GCN using T1-MRI
Automatic Traffic Scenario Conversion from OpenSCENARIO to CommonRoad
Use of a Taxonomy of Empathetic Response Intents to Control and Interpret Empathy in Neural Chatbots
Bridging the Gap: Enhancing the Utility of Synthetic Data via Post-Processing Techniques
Curriculum Learning in Job Shop Scheduling using Reinforcement Learning
IDO-VFI: Identifying Dynamics via Optical Flow Guidance for Video Frame Interpolation with Events
NFI$_2$: Learning Noise-Free Illuminance-Interpolator for Unsupervised Low-Light Image Enhancement
Agent Heterogeneity Mediates Extremism in an Adaptive Social Network Model
SGAD: Spiking Generative Adversarial Network with Attention Scoring Decoding
Improving Link Prediction in Social Networks Using Local and Global Features: A Clustering-based Approach
Digital Twin for Non-Terrestrial Networks: Vision, Challenges, and Enabling Technologies
High-order ADER Discontinuous Galerkin schemes for a symmetric hyperbolic model of compressible barotropic two-fluid flows
Inertial-based Navigation by Polynomial Optimization: Inertial-Magnetic Attitude Estimation
Towards Multi-Layered 3D Garments Animation