Abstract
Disk scrubbing is a process for detecting and resolving read errors on disks by reading data back from them. However, scrubbing the entire storage array at once can adversely impact system performance, particularly during periods of high input/output activity. Additionally, the continuous reading involved in scrubbing causes wear and tear, especially on larger-capacity disks, and consumes significant time and energy. To address these issues, we propose a selective disk scrubbing method that enhances overall reliability and power efficiency in data centers. Our method employs a machine learning model based on Mondrian conformal prediction to identify specific disks for scrubbing: using an open-source dataset, it proactively predicts the health status of each disk in the storage pool n days in advance. Disks predicted as non-healthy are marked for replacement without further action. For the healthy drives, we quantify their relative health across the entire storage pool based on the predictor's confidence, which lets us prioritize drives for selective scrubbing at a frequency derived from the scrub cycle. The proposed method provides an efficient and dependable solution for managing enterprise disk drives: by scrubbing just 22.7% of the total storage disks, we achieve optimized energy consumption and reduce the carbon footprint of the data center.
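The paper's Mondrian conformal predictor is not specified in detail in the abstract; the following minimal sketch (assuming a per-class calibration set and an arbitrary nonconformity score, with all disk names and score values hypothetical) illustrates how Mondrian p-values could rank the relative health of drives already predicted healthy.

```python
import numpy as np

def mondrian_p_value(cal_scores, test_score):
    """Mondrian conformal p-value: calibration scores come from the
    same category (here, the 'healthy' class) as the test example."""
    n = len(cal_scores)
    return (np.sum(cal_scores >= test_score) + 1) / (n + 1)

# Hypothetical nonconformity scores for calibration drives labeled healthy
# (e.g., 1 - predicted probability of 'healthy' from any underlying model).
cal_scores_healthy = np.array([0.05, 0.10, 0.08, 0.30, 0.12, 0.07])

# Drives the predictor labeled healthy, with their test-time scores.
pool = {"disk_a": 0.06, "disk_b": 0.25, "disk_c": 0.11}

# Lower p-value -> less typical of the healthy class -> scrub sooner.
ranked = sorted(pool, key=lambda d: mondrian_p_value(cal_scores_healthy, pool[d]))
print("scrub priority (least healthy first):", ranked)
```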
FANET Experiment: Real-Time Surveillance Applications Connected to Image Processing System
Authors: Bashir Olaniyi Sadiq, Muhammed Yusuf Abiodun, Sikiru Olayinka Zakariyya, Mohammed Dahiru Buhari
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The major goal of this paper is to apply image enhancement techniques to extract and improve data in FANET applications, thereby increasing the efficiency of surveillance. The proposed conceptual system design can improve the likelihood of successful FANET operations in oil pipeline surveillance and in sports and media coverage, with the ultimate goal of providing efficient services to interested parties. The system architecture model is based on current scientific principles and emerging technologies. The system has two primary components: a FANET capable of gathering image data from video-enabled drones, and an image processing system that permits data collection and analysis. Based on the image processing technique, a proof of concept for efficient data extraction and enhancement in FANET scenarios and possible services is illustrated.
Photon: A Cross Platform P2P Data Transfer Application
Abstract
Modern computing requires efficient and dependable data transfer. Current solutions like Bluetooth, SMS (Short Message Service), and Email are restricted in efficiency, file size, compatibility, and cost. To facilitate direct communication and resource sharing amongst linked devices, this research study offers a cross-platform peer-to-peer (P2P) data transmission solution that takes advantage of the features of P2P networks. The system enables cost-effective and high-performance data transport by using the compute, storage, and network resources of the participating devices. Simple file sharing, adaptability, dependability, and high performance are some of its important benefits. The examination of the suggested solution presented in this paper covers the P2P architecture, data transfer mechanisms, performance assessment, implementation issues, security concerns, and the potential difficulties that need to be addressed. Through practical investigations and comparisons with existing approaches, the research intends to validate the efficacy and potential of the suggested cross-platform P2P data transfer solution, delivering better efficiency and dependability for users across various platforms.
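Photon's protocol details are beyond the abstract; purely as a minimal illustration of direct device-to-device transfer without an intermediary server, the sketch below streams a file over a plain TCP socket between two peers (all hostnames, ports, and paths are placeholders, and this is not Photon's actual mechanism).

```python
import socket

CHUNK = 64 * 1024  # 64 KiB chunks keep memory use flat for large files

def serve_file(path: str, port: int = 9000) -> None:
    """Peer A: listen once and stream the file to whoever connects."""
    with socket.create_server(("", port)) as srv:
        conn, _addr = srv.accept()
        with conn, open(path, "rb") as f:
            while chunk := f.read(CHUNK):
                conn.sendall(chunk)

def fetch_file(host: str, out_path: str, port: int = 9000) -> None:
    """Peer B: connect directly to peer A and write the stream to disk."""
    with socket.create_connection((host, port)) as conn, open(out_path, "wb") as f:
        while chunk := conn.recv(CHUNK):
            f.write(chunk)

# Usage (two machines on the same network; the address is a placeholder):
#   peer A: serve_file("video.mp4")
#   peer B: fetch_file("192.168.1.42", "video.mp4")
```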
HYDRA: Hybrid Robot Actions for Imitation Learning
Abstract
Imitation Learning (IL) is a sample efficient paradigm for robot learning using expert demonstrations. However, policies learned through IL suffer from state distribution shift at test time, due to compounding errors in action prediction which lead to previously unseen states. Choosing an action representation for the policy that minimizes this distribution shift is critical in imitation learning. Prior work proposes using temporal action abstractions to reduce compounding errors, but they often sacrifice policy dexterity or require domain-specific knowledge. To address these trade-offs, we introduce HYDRA, a method that leverages a hybrid action space with two levels of action abstractions: sparse high-level waypoints and dense low-level actions. HYDRA dynamically switches between action abstractions at test time to enable both coarse and fine-grained control of a robot. In addition, HYDRA employs action relabeling to increase the consistency of actions in the dataset, further reducing distribution shift. HYDRA outperforms prior imitation learning methods by 30-40% on seven challenging simulation and real world environments, involving long-horizon tasks in the real world like making coffee and toasting bread. Videos can be found on our website: https://tinyurl.com/3mc6793z
The power of motifs as inductive bias for learning molecular distributions
Authors: Johanna Sommer, Leon Hetzel, David Lüdke, Fabian Theis, Stephan Günnemann
Abstract
Machine learning for molecules holds great potential for efficiently exploring the vast chemical space and thus streamlining the drug discovery process by facilitating the design of new therapeutic molecules. Deep generative models have shown promising results for molecule generation, but the benefits of specific inductive biases for learning distributions over small graphs are unclear. Our study aims to investigate the impact of subgraph structures and vocabulary design on distribution learning, using small drug molecules as a case study. To this end, we introduce Subcover, a new subgraph-based fragmentation scheme, and evaluate it through a two-step variational auto-encoder. Our results show that Subcover's improved identification of chemically meaningful subgraphs leads to a relative improvement of the FCD score by 30%, outperforming previous methods. Our findings highlight the potential of Subcover to enhance the performance and scalability of existing methods, contributing to the advancement of drug discovery.
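Subcover itself is not reproduced here; as a point of reference for subgraph-based fragmentation in general, the sketch below uses RDKit's BRICS decomposition, a common rule-based baseline (not the paper's scheme), to extract a small fragment vocabulary from SMILES strings.

```python
from rdkit import Chem
from rdkit.Chem import BRICS

smiles = ["CC(=O)Oc1ccccc1C(=O)O",           # aspirin
          "CN1C=NC2=C1C(=O)N(C)C(=O)N2C"]    # caffeine

vocab = set()
for smi in smiles:
    mol = Chem.MolFromSmiles(smi)
    # BRICSDecompose yields SMILES of rule-based molecular fragments.
    vocab |= set(BRICS.BRICSDecompose(mol))

print(f"{len(vocab)} fragments:", sorted(vocab))
```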
TemperatureGAN: Generative Modeling of Regional Atmospheric Temperatures
Authors: Emmanuel Balogun, Robert Buechler, Ram Rajagopal, Arun Majumdar
Subjects: Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph)
Abstract
Stochastic generators are useful for estimating climate impacts on various sectors. Projecting climate risk in various sectors, e.g., energy systems, requires generators that are accurate (statistical resemblance to ground-truth), reliable (do not produce erroneous examples), and efficient. Leveraging data from the North American Land Data Assimilation System, we introduce TemperatureGAN, a Generative Adversarial Network conditioned on months, locations, and time periods, to generate 2m-above-ground atmospheric temperatures at an hourly resolution. We propose evaluation methods and metrics to measure the quality of generated samples. We show that TemperatureGAN produces high-fidelity examples with good spatial representation and temporal dynamics consistent with known diurnal cycles.
A Unified Framework for Online Data-Driven Predictive Control with Robust Safety Guarantees
Authors: Amin Vahidi-Moghaddam, Kaian Chen, Kaixiang Zhang, Zhaojian Li, Yan Wang, Kai Wu
Abstract
Despite great successes, model predictive control (MPC) relies on an accurate dynamical model and requires high onboard computational power, impeding its wider adoption in engineering systems, especially for nonlinear real-time systems with limited computation power. These shortcomings of MPC motivate this work to make such a control framework more practically viable for real-world applications. Specifically, to remove the required accurate dynamical model and reduce the computational cost for nonlinear MPC (NMPC), this paper develops a unified online data-driven predictive control pipeline to efficiently control a system with guaranteed safety without incurring large computational complexity. The new aspect of this idea is learning not only the real system but also the control policy, which results in a reasonable computational cost for the data-driven predictive controllers. More specifically, we first develop a spatial temporal filter (STF)-based concurrent learning scheme to systematically identify system dynamics for general nonlinear systems. We then develop a robust control barrier function (RCBF) for safety guarantees in the presence of model uncertainties and learn the RCBF-based NMPC policy. Furthermore, to mitigate the performance degradation due to the existing model uncertainties, we propose an online policy correction scheme through perturbation analysis and design of an ancillary feedback controller. Finally, extensive simulations on two applications, cart-inverted pendulum and automotive powertrain control, are performed to demonstrate the efficacy of the proposed framework, which shows comparable performance with much lower computational cost in comparison with several benchmark algorithms.
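The paper's RCBF and the learned NMPC policy are not detailed in the abstract; the sketch below shows only the standard (non-robust) control barrier function quadratic program such safety filters build on, for a single integrator avoiding a circular obstacle, using cvxpy. All constants, the system model, and the nominal controller are illustrative assumptions.

```python
import cvxpy as cp
import numpy as np

# Single integrator x_dot = u; circular obstacle of radius r at the origin.
# Barrier h(x) = ||x||^2 - r^2, with h(x) >= 0 defining the safe set.
x = np.array([1.5, 0.5])
r, alpha = 1.0, 2.0
u_nom = np.array([-1.0, 0.0])           # nominal controller: head left

h = x @ x - r**2                         # barrier value at x
grad_h = 2 * x                           # dh/dx for the single integrator

u = cp.Variable(2)
# Safety filter: stay close to u_nom subject to h_dot + alpha * h >= 0.
prob = cp.Problem(cp.Minimize(cp.sum_squares(u - u_nom)),
                  [grad_h @ u + alpha * h >= 0])
prob.solve()
print("safe input:", u.value)
```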
ReMaX: Relaxing for Better Training on Efficient Panoptic Segmentation
Authors: Shuyang Sun, Weijun Wang, Qihang Yu, Andrew Howard, Philip Torr, Liang-Chieh Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This paper presents a new mechanism to facilitate the training of mask transformers for efficient panoptic segmentation, democratizing its deployment. We observe that due to its high complexity, the training objective of panoptic segmentation will inevitably lead to much higher false positive penalization. Such unbalanced loss makes the training process of the end-to-end mask-transformer based architectures difficult, especially for efficient models. In this paper, we present ReMaX that adds relaxation to mask predictions and class predictions during training for panoptic segmentation. We demonstrate that via these simple relaxation techniques during training, our model can be consistently improved by a clear margin \textbf{without} any extra computational cost on inference. By combining our method with efficient backbones like MobileNetV3-Small, our method achieves new state-of-the-art results for efficient panoptic segmentation on COCO, ADE20K and Cityscapes. Code and pre-trained checkpoints will be available at \url{https://github.com/google-research/deeplab2}.
Scaling Model Checking for DNN Analysis via State-Space Reduction and Input Segmentation (Extended Version)
Authors: Mahum Naseer, Osman Hasan, Muhammad Shafique
Abstract
Owing to their remarkable learning capabilities and performance in real-world applications, the use of machine learning systems based on Neural Networks (NNs) has been continuously increasing. However, various case studies and empirical findings in the literature suggest that slight variations to NN inputs can lead to erroneous and undesirable NN behavior. This has led to considerable interest in their formal analysis, aiming to provide guarantees regarding a given NN's behavior. Existing frameworks provide robustness and/or safety guarantees for trained NNs using satisfiability solving and linear programming. We previously proposed FANNet, the first model checking-based framework for analyzing a broader range of NN properties. However, the state-space explosion associated with model checking entails a scalability problem, making FANNet applicable only to small NNs. This work develops state-space reduction and input segmentation approaches to improve the scalability and timing efficiency of formal NN analysis. Compared to the state-of-the-art FANNet, this enables our new model checking-based framework to reduce the verification's timing overhead by a factor of up to 8000, making the framework applicable to NNs with approximately $80$ times more network parameters. This in turn allows the analysis of NN safety properties using the new framework, in addition to all the NN properties already covered by FANNet. The framework is shown to efficiently analyze properties of NNs trained on healthcare datasets as well as the well-acknowledged ACAS Xu NNs.
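FANNet's model-checking machinery is not shown here; the sketch below illustrates the input-segmentation idea in isolation, using simple interval bound propagation through a tiny ReLU network: if a property cannot be certified on an input box, the box is bisected and each half is checked separately. The network weights and the property are made up for illustration, and interval propagation is a deliberately loose stand-in for a real model checker.

```python
import numpy as np

def affine_bounds(W, b, lo, hi):
    """Propagate an axis-aligned box through x -> W x + b."""
    c, r = (lo + hi) / 2, (hi - lo) / 2
    center = W @ c + b
    radius = np.abs(W) @ r
    return center - radius, center + radius

def output_bounds(layers, lo, hi):
    for i, (W, b) in enumerate(layers):
        lo, hi = affine_bounds(W, b, lo, hi)
        if i < len(layers) - 1:          # ReLU on hidden layers only
            lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)
    return lo, hi

def certify(layers, lo, hi, depth=0, max_depth=12):
    """Property: the (single) output stays >= 0 on the input box."""
    out_lo, _out_hi = output_bounds(layers, lo, hi)
    if out_lo[0] >= 0:
        return True                      # certified on this segment
    if depth == max_depth:
        return False                     # give up: possibly violated
    i = int(np.argmax(hi - lo))          # bisect the widest input dimension
    mid = (lo[i] + hi[i]) / 2
    lo1, hi1 = lo.copy(), hi.copy(); hi1[i] = mid
    lo2, hi2 = lo.copy(), hi.copy(); lo2[i] = mid
    return (certify(layers, lo1, hi1, depth + 1)
            and certify(layers, lo2, hi2, depth + 1))

# Toy 2-2-1 network with hypothetical weights.
layers = [(np.array([[1.0, -1.0], [0.5, 1.0]]), np.array([0.1, 0.0])),
          (np.array([[1.0, 1.0]]), np.array([0.0]))]
print(certify(layers, np.array([0.0, 0.0]), np.array([1.0, 1.0])))
```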
Visualizing Geophylogenies -- Internal and External Labeling with Phylogenetic Tree Constraints
Authors: Jonathan Klawitter, Felix Klesen, Joris Y. Scholl, Thomas C. van Dijk, Alexander Zaft
Abstract
A geophylogeny is a phylogenetic tree where each leaf (biological taxon) has an associated geographic location (site). To clearly visualize a geophylogeny, the tree is typically represented as a crossing-free drawing next to a map. The correspondence between the taxa and the sites is either shown with matching labels on the map (internal labeling) or with leaders that connect each site to the corresponding leaf of the tree (external labeling). In both cases, a good order of the leaves is paramount for understanding the association between sites and taxa. We define several quality measures for internal labeling and give an efficient algorithm for optimizing them. In contrast, minimizing the number of leader crossings in an external labeling is NP-hard. We show nonetheless that optimal solutions can be found in a matter of seconds on realistic instances using integer linear programming. Finally, we provide several efficient heuristic algorithms and experimentally show them to be near optimal on real-world and synthetic instances.
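As a small illustration of why the leaf order matters in external labeling, the sketch below counts leader crossings under the simplifying assumption that sites and leaves lie on two parallel lines: straight leaders between parallel lines cross exactly when a pair of sites is inverted in the leaf order, so crossings equal inversions. This is a toy model, not the paper's ILP or heuristics.

```python
def leader_crossings(leaf_of_site):
    """Sites and leaves lie on two parallel lines, both indexed left to
    right; leaf_of_site[i] is the leaf that site i connects to. Straight
    leaders between parallel lines cross iff the pair is inverted."""
    n = len(leaf_of_site)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if leaf_of_site[i] > leaf_of_site[j])

print(leader_crossings([0, 2, 1, 3]))  # one inverted pair -> 1 crossing
print(leader_crossings([3, 2, 1, 0]))  # fully reversed    -> 6 crossings
```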
Topology-Aware Resilient Routing Protocol for FANETs: An Adaptive Q-Learning Approach
Authors: Yanpeng Cui, Qixun Zhang, Zhiyong Feng, Zhiqing Wei, Ce Shi, Heng Yang
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
Flying ad hoc networks (FANETs) play a crucial role in numerous military and civil applications, since they shorten mission duration and enhance coverage significantly compared with a single unmanned aerial vehicle (UAV). However, designing an energy-efficient FANET routing protocol with a high packet delivery rate (PDR) and low delay is challenging owing to dynamic topology changes. In this article, we propose a topology-aware resilient routing strategy based on adaptive Q-learning (TARRAQ) to accurately capture topology changes with low overhead and to make routing decisions in a distributed and autonomous way. First, we analyze the dynamic behavior of UAV nodes via queuing theory, and then derive closed-form solutions for the neighbors' change rate (NCR) and the neighbors' change interarrival time (NCIT) distribution. Based on the real-time NCR and NCIT, a resilient sensing interval (SI) is determined by defining the expected sensing delay of network events. Besides, we present an adaptive Q-learning approach that enables UAVs to make distributed, autonomous, and adaptive routing decisions, where the above SI ensures that the action space can be updated in time at a low cost. The simulation results verify the accuracy of the topology dynamics analysis model and show that TARRAQ outperforms Q-learning-based topology-aware routing (QTAR), mobility prediction-based virtual routing (MPVR), and greedy perimeter stateless routing based on energy-efficient hello (EE-Hello), achieving 25.23%, 20.24%, and 13.73% lower overhead; 9.41%, 14.77%, and 16.70% higher PDR; and 5.12%, 15.65%, and 11.31% lower energy consumption, respectively.
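TARRAQ's adaptive ingredients (the NCR/NCIT-driven sensing interval) are derived analytically in the paper; the sketch below shows only the underlying tabular Q-learning update a UAV could run to score next-hop neighbors. The reward mixing link delay and residual energy, and all node names and constants, are made-up illustrations.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
Q = defaultdict(float)                    # Q[(node, next_hop)]

def choose_next_hop(node, neighbors):
    """Epsilon-greedy over the current neighbor set (the action space
    shrinks and grows as topology sensing adds or removes neighbors)."""
    if random.random() < EPS:
        return random.choice(neighbors)
    return max(neighbors, key=lambda nb: Q[(node, nb)])

def update(node, hop, reward, next_node, next_neighbors):
    """One-step Q-learning backup after the packet reaches next_node."""
    best_next = max((Q[(next_node, nb)] for nb in next_neighbors), default=0.0)
    Q[(node, hop)] += ALPHA * (reward + GAMMA * best_next - Q[(node, hop)])

# Example step: reward trades off link delay against residual energy.
delay_ms, energy_frac = 12.0, 0.8
reward = -0.01 * delay_ms + energy_frac
update("u1", "u3", reward, "u3", ["u2", "u5"])
```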
HVTSurv: Hierarchical Vision Transformer for Patient-Level Survival Prediction from Whole Slide Image
Abstract
Survival prediction based on whole slide images (WSIs) is a challenging task for patient-level multiple instance learning (MIL). Due to the vast amount of data per patient (one or multiple gigapixel WSIs) and the irregular shape of WSIs, it is difficult to fully explore spatial, contextual, and hierarchical interaction in the patient-level bag. Many studies adopt a random sampling pre-processing strategy and WSI-level aggregation models, which inevitably lose critical prognostic information in the patient-level bag. In this work, we propose a hierarchical vision Transformer framework named HVTSurv, which can encode local-level relative spatial information, strengthen WSI-level context-aware communication, and establish patient-level hierarchical interaction. Firstly, we design a feature pre-processing strategy, including feature rearrangement and random window masking. Then, we devise three layers to progressively obtain the patient-level representation: a local-level interaction layer adopting Manhattan distance, a WSI-level interaction layer employing spatial shuffle, and a patient-level interaction layer using attention pooling. Moreover, the hierarchical design makes the model more computationally efficient. Finally, we validate HVTSurv with 3,104 patients and 3,752 WSIs across 6 cancer types from The Cancer Genome Atlas (TCGA). The average C-Index is 2.50-11.30% higher than all prior weakly supervised methods over the 6 TCGA datasets. Ablation studies and attention visualization further verify the superiority of the proposed HVTSurv. The implementation is available at: https://github.com/szc19990412/HVTSurv.
Physics-informed invertible neural network for the Koopman operator learning
Abstract
In Koopman operator theory, a finite-dimensional nonlinear system is transformed into an infinite-dimensional linear system using a set of observable functions. However, manually selecting observable functions that span the invariant subspace of the Koopman operator based on prior knowledge is inefficient and challenging, particularly when little or no information is available about the underlying systems. Furthermore, current methodologies tend to disregard the importance of the invertibility of observable functions, which leads to inaccurate results. To address these challenges, we propose FlowDMD, a Flow-based Dynamic Mode Decomposition that utilizes the Coupling Flow Invertible Neural Network (CF-INN) framework. FlowDMD leverages the intrinsically invertible characteristics of the CF-INN to learn the invariant subspaces of the Koopman operator and accurately reconstruct state variables. Numerical experiments demonstrate the superior performance of our algorithm compared to state-of-the-art methodologies.
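The CF-INN's internals are not given in the abstract; below is a minimal PyTorch sketch of the standard affine coupling block that such flows are typically built from, which is exactly invertible by construction (the layer sizes and the small conditioning network are illustrative assumptions, not FlowDMD's architecture).

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """RealNVP-style coupling: y1 = x1, y2 = x2 * exp(s(x1)) + t(x1).
    The inverse is closed-form, so state reconstruction is exact."""
    def __init__(self, dim):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, 64), nn.Tanh(),
            nn.Linear(64, 2 * (dim - self.half)))

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=1)
        return torch.cat([x1, x2 * torch.exp(s) + t], dim=1)

    def inverse(self, y):
        y1, y2 = y[:, :self.half], y[:, self.half:]
        s, t = self.net(y1).chunk(2, dim=1)
        return torch.cat([y1, (y2 - t) * torch.exp(-s)], dim=1)

x = torch.randn(4, 6)
block = AffineCoupling(6)
assert torch.allclose(block.inverse(block(x)), x, atol=1e-5)
```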
LMBot: Distilling Graph Knowledge into Language Model for Graph-less Deployment in Twitter Bot Detection
Abstract
As malicious actors employ increasingly advanced and widespread bots to disseminate misinformation and manipulate public opinion, the detection of Twitter bots has become a crucial task. Though graph-based Twitter bot detection methods achieve state-of-the-art performance, we find that their inference depends on neighboring users multiple hops away from the targets, and fetching neighbors is time-consuming and may introduce bias. At the same time, we find that after finetuning on Twitter bot detection, pretrained language models achieve competitive performance and do not require a graph structure during deployment. Inspired by this finding, we propose LMBot, a novel bot detection framework that distills the knowledge of graph neural networks (GNNs) into language models (LMs) for graph-less deployment in Twitter bot detection, combating the challenge of data dependency. Moreover, LMBot is compatible with both graph-based and graph-less datasets. Specifically, we first represent each user as a textual sequence and feed them into the LM for domain adaptation. For graph-based datasets, the output of the LM provides input features for the GNN, enabling it to optimize for bot detection and distill knowledge back to the LM in an iterative, mutually enhancing process. Armed with the LM alone, we can perform graph-less inference, which resolves the graph data dependency and sampling bias issues. For datasets without graph structure, we simply replace the GNN with an MLP, which has also shown strong performance. Our experiments demonstrate that LMBot achieves state-of-the-art performance on four Twitter bot detection benchmarks. Extensive studies also show that LMBot is more robust, versatile, and efficient than graph-based Twitter bot detection methods.
Screw and Lie Group Theory in Multibody Kinematics -- Motion Representation and Recursive Kinematics of Tree-Topology Systems
Authors: Andreas Mueller
Subjects: Numerical Analysis (math.NA); Computational Engineering, Finance, and Science (cs.CE); Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
After three decades of computational multibody system (MBS) dynamics, current research is centered on the development of compact, user-friendly, yet computationally efficient formulations for the analysis of complex MBS. The key to this is a holistic geometric approach to kinematics modeling, observing that the general motion of rigid bodies as well as the relative motion due to technical joints are screw motions. Moreover, screw theory provides the geometric setting and Lie group theory the analytic foundation for an intuitive and compact MBS modeling. The inherent frame invariance of this modeling approach gives rise to very efficient recursive $O\left( n\right)$ algorithms, of which the so-called 'spatial operator algebra' is one example, and allows for the use of readily available geometric data. In this paper, three variants for describing the configuration of tree-topology MBS in terms of relative coordinates, i.e. joint variables, are presented: the standard formulation using body-fixed joint frames, a formulation without joint frames, and a formulation without either joint or body-fixed reference frames. This allows for describing the MBS kinematics without introducing joint reference frames, rendering restrictive modeling conventions, such as Denavit-Hartenberg parameters, redundant. Four different definitions of twists are recalled, the corresponding recursive expressions are derived, and the corresponding Jacobians and their factorizations are presented. The aim of this paper is to motivate the use of Lie group modeling and to provide a review of the different formulations for the kinematics of tree-topology MBS in terms of relative (joint) coordinates from the unifying perspective of screw and Lie group theory.
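To make the screw-motion viewpoint concrete, here is a small numpy sketch of the exponential map that sends a unit twist (omega, v) and a joint variable theta to a rigid-body transformation, following the standard closed form (Rodrigues' formula for the rotation and the usual V(theta) term for the translation); it is a generic textbook construction, not code from the paper.

```python
import numpy as np

def skew(w):
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]])

def exp_twist(omega, v, theta):
    """SE(3) exponential for a unit twist (||omega|| = 1): rotation via
    Rodrigues' formula, translation via the standard V(theta) matrix."""
    K = skew(omega)
    R = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)
    V = (np.eye(3) * theta + (1 - np.cos(theta)) * K
         + (theta - np.sin(theta)) * (K @ K))
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ v
    return T

# Revolute joint about the z-axis through the origin: omega = e_z, v = 0.
print(exp_twist(np.array([0, 0, 1.0]), np.zeros(3), np.pi / 2).round(3))
```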
Hashing-Based Distributed Clustering for Massive High-Dimensional Data
Authors: Yifeng Xiao, Jiang Xue, Deyu Meng
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Clustering analysis is of substantial significance for data mining. The properties of big data raise the demand for more efficient and economical distributed clustering methods. However, existing distributed clustering methods mainly focus on the size of the data and ignore possible problems caused by its dimensionality. To solve this problem, we propose a new distributed algorithm, referred to as Hashing-Based Distributed Clustering (HBDC). Motivated by the outstanding performance of hashing methods for nearest neighbor search, this algorithm applies the learning-to-hash technique to the clustering problem, which offers clear advantages for data storage, transmission, and computation. Following a global-sub-site paradigm, HBDC consists of distributed training of a hashing network at the sub-sites and spectral clustering of the hash codes at the global site. The sub-sites use the learnable network as a hash function to convert massive high-dimensional (HD) original data into a small number of hash codes, and send them to the global site for final clustering. In addition, a sample-selection method and lightweight network structures are designed to accelerate the convergence of the hash network. We also analyze the transmission cost of HBDC, including its upper bound. Our experiments on synthetic and real datasets illustrate the superiority of HBDC compared with existing state-of-the-art algorithms.
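HBDC learns its hash function; as a non-learned stand-in that shows the same pipeline shape (sub-sites compress points to short binary codes, the global site clusters only code representatives), the sketch below uses random hyperplane hashing and k-means. Sizes, the 10-bit code length, and k-means in place of spectral clustering are all illustrative simplifications.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 64))        # high-dimensional sub-site data

# Sub-site: 10-bit random hyperplane codes (stand-in for the learned hash).
H = rng.normal(size=(64, 10))
codes = X @ H > 0

# Only the unique codes and their bucket means travel to the global site.
uniq, inv = np.unique(codes, axis=0, return_inverse=True)
reps = np.stack([X[inv == k].mean(axis=0) for k in range(len(uniq))])

# Global site: cluster the (few) representatives, then broadcast labels.
labels = KMeans(n_clusters=10, n_init=10).fit_predict(reps)[inv]
print(len(uniq), "codes sent instead of", len(X), "points")
```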
STTracker: Spatio-Temporal Tracker for 3D Single Object Tracking
Authors: Yubo Cui, Zhiheng Li, Zheng Fang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
3D single object tracking with point clouds is a critical task in 3D computer vision. Previous methods usually input the last two frames, use the predicted box to obtain the template point cloud from the previous frame and the search area point cloud from the current frame, and then use similarity-based or motion-based methods to predict the current box. Although these methods achieve good tracking performance, they ignore the historical information of the target, which is important for tracking. In this paper, rather than inputting two frames of point clouds, we input multiple frames of point clouds to encode the spatio-temporal information of the target and implicitly learn its motion, which builds correlations among different frames to efficiently track the target in the current frame. Meanwhile, rather than directly using the point features for feature fusion, we first crop the point cloud features into many patches, then use a sparse attention mechanism to encode the patch-level similarity, and finally fuse the multi-frame features. Extensive experiments show that our method achieves competitive results on challenging large-scale benchmarks (62.6% on KITTI and 49.66% on NuScenes).
Collision-free Motion Planning for Mobile Robots by Zero-order Robust Optimization-based MPC
Authors: Yunfan Gao, Florian Messerer, Jonathan Frey, Niels van Duijkeren, Moritz Diehl
Abstract
This paper presents an implementation of robust model predictive control (MPC) for collision-free reference trajectory tracking for mobile robots. The presented approach considers the robot motion to be subject to process noise bounded by ellipsoidal sets. In order to efficiently handle the evolution of the disturbance ellipsoids within the MPC, the zero-order robust optimization (zoRO) scheme is applied. The idea is to fix the disturbance ellipsoids within one optimization iteration and solve the problem repeatedly with updated disturbance ellipsoid trajectories. The zero-order approach is suboptimal in general. However, we show that it does not impair convergence to the reference trajectory in the absence of obstacles. The experiments on an industrial mobile robot prototype demonstrate the performance of the controller.
High-throughput Simulation of Federated Learning via Resource-Aware Client Placement
Authors: Lorenzo Sani, Pedro Porto Buarque de Gusmão, Alex Iacob, Wanru Zhao, Xinchi Qiu, Yan Gao, Javier Fernandez-Marques, Nicholas Donald Lane
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Federated Learning (FL) is a privacy-preserving machine learning paradigm that collaboratively trains a model across millions of devices. Simulated environments are fundamental to large-scale FL research, allowing researchers to quickly test new ideas to solve system and statistical heterogeneity issues. This work proposes \emph{Pollen}, a novel resource-aware system capable of speeding up FL simulations by efficiently placing clients across distributed and heterogeneous hardware. We propose minimising server-GPU communication and using an efficient client placement policy based on the inherent trade-offs of FL client placement on heterogeneous GPUs. These trade-offs are explored experimentally via relevant baselines on three popular FL tasks: image classification, speech recognition and text generation. We compare \emph{Pollen} to existing ad-hoc FL frameworks, such as Flower, Flute and FedScale, and show performance gains of $50\%$ to $400\%$.
FedBone: Towards Large-Scale Federated Multi-Task Learning
Abstract
Heterogeneous federated multi-task learning (HFMTL) is a federated learning technique that combines heterogeneous tasks of different clients to achieve more accurate, comprehensive predictions. In real-world applications, visual and natural language tasks typically require large-scale models to extract high-level abstract features. However, large-scale models cannot be directly applied to existing federated multi-task learning methods. Existing HFMTL methods also disregard the impact of gradient conflicts on multi-task optimization during the federated aggregation process. In this work, we propose an innovative framework called FedBone, which enables the construction of large-scale models with better generalization from the perspective of server-client split learning and gradient projection. We split the entire model into two components: a large-scale general model (referred to as the general model) on the cloud server and multiple task-specific models (referred to as client models) on edge clients, solving the problem of insufficient computing power on edge clients. The conflicting-gradient projection technique is used to enhance the generalization of the large-scale general model across different tasks. The proposed framework is evaluated on two benchmark datasets and a real ophthalmic dataset. Comprehensive results demonstrate that FedBone efficiently adapts to the heterogeneous local tasks of each client and outperforms existing federated learning algorithms in most dense prediction and classification tasks with off-the-shelf computational resources on the client side.
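The abstract's "conflicting gradient projection" is shown below in its generic PCGrad-style form, which may differ from FedBone's exact variant: when two task gradients point against each other, the conflicting component is removed before aggregation. The two task gradients are hypothetical values.

```python
import numpy as np

def project_conflicts(grads):
    """PCGrad-style rule: for each task gradient, subtract its projection
    onto any other task gradient it conflicts with (negative dot product)."""
    out = [g.copy() for g in grads]
    for i, g in enumerate(out):
        for j, other in enumerate(grads):
            if i != j and g @ other < 0:
                g -= (g @ other) / (other @ other) * other
    return out

g_seg = np.array([1.0, 2.0])    # hypothetical task gradients
g_cls = np.array([1.0, -1.0])   # conflicts with g_seg (dot = -1 < 0)
merged = sum(project_conflicts([g_seg, g_cls]))
print(merged)                   # aggregated, conflict-free update direction
```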
Secure and Efficient Flexibility Service Procurement: A Game-Theoretic Approach
Authors: Xiupeng Chen, Koorosh Shomalzadeh, Jacquelien M. A. Scherpen, Nima Monshizadeh
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract
Procuring flexibility services from energy consumers is a potential solution for accommodating renewable generation in future power systems. However, efficiently and securely coordinating the behaviors of diverse market participants within a privacy-preserving environment remains a challenge. This paper addresses this issue by introducing a game-theoretic market framework for real-time energy balancing. The competition among energy consumers is modeled as a Generalized Nash Game (GNG), which enables the analysis of their strategic decision-making. To mitigate the market power exerted by active energy consumers, we employ a supply function-based bidding method in this market design, and we incorporate physical constraints to ensure the secure operation of the distribution network. Previous approaches to steering consumers towards the Generalized Nash Equilibrium (GNE) of this game often necessitate the sharing of private information, either in full or in part, which may not be practically feasible. To overcome this limitation, we propose a preconditioned forward-backward algorithm, with analytical convergence guarantees, that only requires participants to share limited, non-privacy-sensitive information with others. Finally, numerical simulations on the enhanced IEEE 33-bus test case validate the effectiveness of our proposed market mechanism and algorithm.
Landmark Guided Active Exploration with Stable Low-level Policy Learning
Authors: Fei Cui, Jiaojiao Fang, Mengke Yang, Guizhong Liu
Abstract
Goal-conditioned hierarchical reinforcement learning (GCHRL) decomposes long-horizon tasks into sub-tasks through a hierarchical framework and has demonstrated promising results across a variety of domains. However, the high-level policy's action space is often excessively large, presenting a significant challenge to effective exploration and resulting in potentially inefficient training. Moreover, the dynamic variability of the low-level policy introduces non-stationarity to the high-level state transition function, significantly impeding the learning of the high-level policy. In this paper, we design a measure of prospect for subgoals by planning in the goal space based on the goal-conditioned value function. Building upon this measure, we propose a landmark-guided exploration strategy that integrates the measures of prospect and novelty, aiming to guide the agent to explore efficiently and improve sample efficiency. To address the non-stationarity arising from the dynamic changes of the low-level policy, we apply a state-specific regularization to the learning of the low-level policy, which facilitates stable learning of the hierarchical policy. The experimental results demonstrate that our proposed exploration strategy significantly outperforms the baseline methods across multiple tasks.
Multigrid-Augmented Deep Learning for the Helmholtz Equation: Better Scalability with Compact Implicit Layers
Authors: Bar Lerer, Ido Ben-Yair, Eran Treister
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)
Abstract
We present a deep learning-based iterative approach to solve the discrete heterogeneous Helmholtz equation for high wavenumbers. Combining classical iterative multigrid solvers and convolutional neural networks (CNNs) via preconditioning, we obtain a learned neural solver that is faster and scales better than a standard multigrid solver. Our approach offers three main contributions over previous neural methods of this kind. First, we construct a multilevel U-Net-like encoder-solver CNN with an implicit layer on the coarsest grid of the U-Net, where convolution kernels are inverted. This alleviates the field of view problem in CNNs and allows better scalability. Second, we improve upon the previous CNN preconditioner in terms of the number of parameters, computation time, and convergence rates. Third, we propose a multiscale training approach that enables the network to scale to problems of previously unseen dimensions while still maintaining a reasonable training procedure. Our encoder-solver architecture can be used to generalize over different slowness models of various difficulties and is efficient at solving for many right-hand sides per slowness model. We demonstrate the benefits of our novel architecture with numerical experiments on a variety of heterogeneous two-dimensional problems at high wavenumbers.
Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer
Authors: Peng Mi, Li Shen, Tianhe Ren, Yiyi Zhou, Tianshuo Xu, Xiaoshuai Sun, Tongliang Liu, Rongrong Ji, Dacheng Tao
Abstract
Deep neural networks often suffer from poor generalization due to complex and non-convex loss landscapes. Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss landscape by minimizing the maximized change of training loss when adding a perturbation to the weights. However, indiscriminately perturbing all parameters, as SAM does, is suboptimal and results in excessive computation, doubling the overhead of common optimizers like Stochastic Gradient Descent (SGD). In this paper, we propose Sparse SAM (SSAM), an efficient and effective training scheme that achieves sparse perturbation via a binary mask. To obtain the sparse mask, we provide two solutions based on Fisher information and dynamic sparse training, respectively. We investigate the impact of different masks, including unstructured, structured, and $N$:$M$ structured patterns, as well as explicit and implicit forms of implementing sparse perturbation. We theoretically prove that SSAM converges at the same rate as SAM, i.e., $O(\log T/\sqrt{T})$. Sparse SAM thus has the potential to accelerate training while smoothing the loss landscape effectively. Extensive experimental results on CIFAR and ImageNet-1K confirm that our method is superior to SAM in terms of efficiency, and that performance is preserved or even improved with a perturbation of merely 50\% sparsity. Code is available at https://github.com/Mi-Peng/Systematic-Investigation-of-Sparse-Perturbed-Sharpness-Aware-Minimization-Optimizer.
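A minimal PyTorch sketch of a sparse-perturbation SAM step follows. The mask construction is simplified to top-magnitude gradients, a stand-in for the paper's Fisher-information and dynamic-sparse-training solutions, and the normalization choice is an assumption.

```python
import torch

def ssam_step(model, loss_fn, batch, base_opt, rho=0.05, sparsity=0.5):
    """One SSAM-like step: perturb only masked parameters, then descend
    using the gradient at the perturbed point."""
    x, y = batch
    loss_fn(model(x), y).backward()                 # gradient at w
    grads = [p.grad.detach().clone() for p in model.parameters()]

    # Mask = largest-|g| half (stand-in for Fisher / dynamic sparse masks).
    flat = torch.cat([g.abs().flatten() for g in grads])
    k = max(1, int(sparsity * flat.numel()))
    thresh = flat.kthvalue(k).values

    norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
    eps = []
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            e = rho * g / (norm + 1e-12) * (g.abs() > thresh)  # sparse ascent
            p.add_(e)
            eps.append(e)
    model.zero_grad()
    loss_fn(model(x), y).backward()                 # gradient at w + eps
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.sub_(e)                               # restore the weights
    base_opt.step()                                 # e.g., SGD with new grad
    base_opt.zero_grad()
```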
Comparative study of subset selection methods for rapid prototyping of 3D object detection algorithms
Authors: Konrad Lis, Tomasz Kryjak
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Object detection in 3D is a crucial aspect in the context of autonomous vehicles and drones. However, prototyping detection algorithms is time-consuming and costly in terms of energy and environmental impact. To address these challenges, one can check the effectiveness of different models by training on a subset of the original training set. In this paper, we present a comparison of three algorithms for selecting such a subset - random sampling, random per class sampling, and our proposed MONSPeC (Maximum Object Number Sampling per Class). We provide empirical evidence for the superior effectiveness of random per class sampling and MONSPeC over basic random sampling. By replacing random sampling with one of the more efficient algorithms, the results obtained on the subset are more likely to transfer to the results on the entire dataset. The code is available at: https://github.com/vision-agh/monspec.
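A minimal sketch of random per-class sampling, one of the three compared strategies, follows; it samples the same fraction independently within each class so the subset keeps the class balance of the full set (labels and the fraction are illustrative).

```python
import numpy as np

def random_per_class(labels, fraction, rng=None):
    """Sample the same fraction independently within every class,
    preserving the class distribution of the full training set."""
    rng = rng or np.random.default_rng(0)
    labels = np.asarray(labels)
    picked = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        k = max(1, round(fraction * len(idx)))
        picked.append(rng.choice(idx, size=k, replace=False))
    return np.concatenate(picked)

labels = ["car"] * 900 + ["pedestrian"] * 90 + ["cyclist"] * 10
subset = random_per_class(labels, fraction=0.1)
print(len(subset), "samples")   # ~100, with every class represented
```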
RBSR: Efficient and Flexible Recurrent Network for Burst Super-Resolution
Abstract
Burst super-resolution (BurstSR) aims at reconstructing a high-resolution (HR) image from a sequence of low-resolution (LR) and noisy images, which helps enhance the imaging quality of smartphones with limited sensors. The main challenge of BurstSR is to effectively combine the complementary information from input frames, which existing methods still struggle with. In this paper, we suggest fusing cues frame-by-frame with an efficient and flexible recurrent network. In particular, we emphasize the role of the base frame and utilize it as a key prompt to guide the knowledge acquisition from other frames in every recurrence. Moreover, we introduce an implicit weighting loss to improve the model's flexibility in handling a variable number of input frames. Extensive experiments on both synthetic and real-world datasets demonstrate that our method achieves better results than state-of-the-art ones. Code and pre-trained models are available at https://github.com/ZcsrenlongZ/RBSR.
Razor SNN: Efficient Spiking Neural Network with Temporal Embeddings
Abstract
The event streams generated by dynamic vision sensors (DVS) are sparse and non-uniform in the spatial domain, yet dense and redundant in the temporal domain. Although the spiking neural network (SNN), an event-driven neuromorphic model, has the potential to extract spatio-temporal features from event streams, it is neither effective nor efficient at doing so. Based on this observation, we propose an event sparsification spiking framework dubbed Razor SNN, which progressively prunes pointless event frames. Concretely, we extend the dynamic mechanism based on global temporal embeddings, reconstruct the features, and adaptively emphasize the effect of events at the training stage. During the inference stage, fruitless frames are eliminated hierarchically according to a binary mask generated by the trained temporal embeddings. Comprehensive experiments demonstrate that our Razor SNN achieves consistently competitive performance on four event-based benchmarks: DVS 128 Gesture, N-Caltech 101, CIFAR10-DVS and SHD.
Projection-based first-order constrained optimization solver for robotics
Authors: Hakan Girgin, Tobias Löw, Teng Xue, Sylvain Calinon
Abstract
Robot programming tools ranging from inverse kinematics (IK) to model predictive control (MPC) are most often described as constrained optimization problems. Even though many second-order solvers are commercially available, the robotics literature has recently focused on efficient implementations of and improvements over these solvers for real-time robotic applications. However, these implementations most often remain problem-specific, are not easy to access or implement, or do not exploit the geometric aspects of robotics problems. In this work, we propose to solve these problems using a fast, easy-to-implement first-order method that fully exploits the geometric constraints via Euclidean projections, called Augmented Lagrangian Spectral Projected Gradient Descent (ALSPG). We show that (1) using projections instead of full constraints and gradients improves the performance of the solver, and (2) ALSPG stays competitive with standard second-order methods such as iLQR in the unconstrained case. We showcase these results with IK and motion planning problems on simulated examples and with an MPC problem in a 7-axis manipulator experiment.
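The core idea of handling constraints via Euclidean projections fits in a few lines; below is plain projected gradient descent onto a Euclidean ball (ALSPG's spectral step lengths and augmented Lagrangian outer loop are deliberately omitted, and the toy objective is made up).

```python
import numpy as np

def project_ball(x, center, radius):
    """Euclidean projection onto the ball ||x - center|| <= radius."""
    d = x - center
    n = np.linalg.norm(d)
    return x if n <= radius else center + radius * d / n

def projected_gradient(grad_f, project, x0, step=0.1, iters=200):
    x = x0
    for _ in range(iters):
        x = project(x - step * grad_f(x))   # gradient step, then project
    return x

# Toy problem: min ||x - target||^2 with x constrained to the unit ball.
target = np.array([2.0, 1.0])
x_star = projected_gradient(lambda x: 2 * (x - target),
                            lambda x: project_ball(x, np.zeros(2), 1.0),
                            x0=np.zeros(2))
print(x_star, "~", target / np.linalg.norm(target))
```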
An Integrated FPGA Accelerator for Deep Learning-based 2D/3D Path Planning
Abstract
Path planning is a crucial component for realizing the autonomy of mobile robots. However, due to limited computational resources on mobile robots, it remains challenging to deploy state-of-the-art methods and achieve real-time performance. To address this, we propose P3Net (PointNet-based Path Planning Networks), a lightweight deep-learning-based method for 2D/3D path planning, and design an IP core (P3NetCore) targeting FPGA SoCs (Xilinx ZCU104). P3Net improves the algorithm and model architecture of the recently-proposed MPNet: it employs an encoder with a PointNet backbone and a lightweight planning network in order to extract robust point cloud features and sample path points from a promising region. P3NetCore comprises a fully-pipelined point cloud encoder, a batched bidirectional path planner, and a parallel collision checker, covering most parts of the algorithm. On the 2D (3D) datasets, P3Net with the IP core runs 24.54-149.57x and 6.19-115.25x (10.03-59.47x and 3.38-28.76x) faster than an ARM Cortex CPU and an Nvidia Jetson while consuming only 0.255W (0.809W), and is up to 1049.42x (133.84x) more power-efficient than a workstation. P3Net improves the success rate by up to 28.2% and plans a near-optimal path, leading to a significantly better tradeoff between computation and solution quality than MPNet and the state-of-the-art sampling-based methods.
Termination of Picard Iteration for Coupled Neutronics/Thermal-Hydraulics Simulations
Abstract
In this paper, we consider the coupled neutronics/thermal-hydraulics (N/TH) problem, in which the termination criterion for the neutronics iteration adopts an adaptive tolerance with respect to the fuel temperature residual at each Picard iteration. We refer to this coupling scheme as the inexact Picard iteration method. Fourier analysis is performed to investigate how the convergence behavior of Picard iteration is influenced by the inexact neutronics solution. It is found that if the convergence rate of the neutronics solution is slow, Picard coupling may become unstable unless a tighter tolerance is used for the neutronics iteration. Nevertheless, our analysis indicates that a certain amount of over-solving is necessary for the stability of Picard iteration unless the iterative solution of the subproblem is very efficient, a point that has not been addressed in previous studies.
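The scheme can be illustrated on a toy two-field fixed point: the "neutronics" subproblem is itself solved iteratively, with its stopping tolerance tied to the current fuel-temperature residual. The scalar model, the damping factor, and the inexactness constant c are all made up for illustration and have no physical meaning.

```python
import numpy as np

def solve_flux(T, phi0, tol):
    """Inner 'neutronics' solve by damped fixed-point iteration on a toy
    model phi = 1 / (1 + 0.01 * T), stopped at the adaptive tolerance."""
    phi = phi0
    while True:
        phi_new = phi + 0.5 * (1.0 / (1.0 + 0.01 * T) - phi)
        if abs(phi_new - phi) < tol:
            return phi_new
        phi = phi_new

T, phi, c = 600.0, 1.0, 0.3          # initial guesses; c sets 'inexactness'
res = np.inf
while res > 1e-10:
    tol = max(c * res, 1e-12)        # adaptive inner tolerance
    phi = solve_flux(T, phi, tol)
    T_new = 500.0 + 200.0 * phi      # toy 'thermal-hydraulics' update
    res, T = abs(T_new - T), T_new
print(f"converged: T = {T:.6f}, phi = {phi:.6f}")
```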
MCQUIC -- A Multicast Extension for QUIC
Authors: Max Franke, Jake Holland, Stefan Schmid
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
Mass live content, such as World Cups, the Super Bowl or the Olympics, attracts audiences of hundreds of millions of viewers. While such events were predominantly consumed on TV, more and more viewers follow big events on the Internet, which poses a scalability challenge: current unicast delivery over the web comes with large overheads and is inefficient. An attractive alternative is multicast-based transmission; however, current solutions have several drawbacks, mostly related to security and privacy, which prevent them from being implemented in browsers. In this paper we introduce a multicast extension to QUIC, a widely popular transport protocol standardized by the IETF, that solves several of these problems. It enables multicast delivery by offering encryption as well as integrity verification of packets distributed over multicast, along with automatic unicast fallback, which removes one of multicast's major obstacles to large-scale deployment. It is transparent to applications and can be easily utilized by simply enabling an option in QUIC. The extension is solely focused on the transport layer and uses existing multicast mechanisms on the network layer.
Learning Delays in Spiking Neural Networks using Dilated Convolutions with Learnable Spacings
Abstract
Spiking Neural Networks (SNNs) are a promising research direction for building power-efficient information processing systems, especially for temporal tasks such as speech recognition. In SNNs, delays refer to the time needed for one spike to travel from one neuron to another. These delays matter because they influence the spike arrival times, and it is well-known that spiking neurons respond more strongly to coincident input spikes. More formally, it has been shown theoretically that plastic delays greatly increase the expressivity in SNNs. Yet, efficient algorithms to learn these delays have been lacking. Here, we propose a new discrete-time algorithm that addresses this issue in deep feedforward SNNs using backpropagation, in an offline manner. To simulate delays between consecutive layers, we use 1D convolutions across time. The kernels contain only a few non-zero weights - one per synapse - whose positions correspond to the delays. These positions are learned together with the weights using the recently proposed Dilated Convolution with Learnable Spacings (DCLS). We evaluated our method on the Spiking Heidelberg Dataset (SHD) and the Spiking Speech Commands (SSC) benchmarks, which require detecting temporal patterns. We used feedforward SNNs with two hidden fully connected layers. We showed that fixed random delays help, and that learning them helps even more. Furthermore, our method outperformed the state-of-the-art in both SHD and SSC without using recurrent connections and with substantially fewer parameters. Our work demonstrates the potential of delay learning in developing accurate and precise models for temporal data processing. Our code is based on PyTorch / SpikingJelly and available at: https://github.com/Thvnvtos/SNN-delays
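The abstract's "one non-zero weight per synapse" kernels can be illustrated with fixed (non-learned) delays: a depthwise 1D convolution whose kernel is one-hot at each channel's delay. The learnable-position part (DCLS) is precisely what the paper adds on top; the sketch below, with made-up tensor sizes, shows only the fixed-delay mechanism.

```python
import torch
import torch.nn.functional as F

def delay_channels(x, delays, max_delay):
    """x: (batch, channels, time). Delay channel c by delays[c] steps
    using a depthwise 1D conv with a one-hot kernel per channel."""
    C = x.shape[1]
    k = torch.zeros(C, 1, max_delay + 1)
    k[torch.arange(C), 0, max_delay - delays] = 1.0  # tap encodes the delay
    x = F.pad(x, (max_delay, 0))                     # causal left-padding
    return F.conv1d(x, k, groups=C)

x = torch.zeros(1, 2, 8); x[:, :, 0] = 1.0           # a spike at t=0
y = delay_channels(x, delays=torch.tensor([0, 3]), max_delay=3)
print(y[0])   # channel 0 unchanged; channel 1's spike moved to t=3
```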
Thompson sampling for improved exploration in GFlowNets
Abstract
Generative flow networks (GFlowNets) are amortized variational inference algorithms that treat sampling from a distribution over compositional objects as a sequential decision-making problem with a learnable action policy. Unlike other algorithms for hierarchical sampling that optimize a variational bound, GFlowNet algorithms can stably run off-policy, which can be advantageous for discovering modes of the target distribution. Despite this flexibility in the choice of behaviour policy, the optimal way of efficiently selecting trajectories for training has not yet been systematically explored. In this paper, we view the choice of trajectories for training as an active learning problem and approach it using Bayesian techniques inspired by methods for multi-armed bandits. The proposed algorithm, Thompson sampling GFlowNets (TS-GFN), maintains an approximate posterior distribution over policies and samples trajectories from this posterior for training. We show in two domains that TS-GFN yields improved exploration and thus faster convergence to the target distribution than the off-policy exploration strategies used in past work.
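The bandit machinery behind TS-GFN can be seen in its simplest form below: Thompson sampling for Bernoulli arms with Beta posteriors. TS-GFN maintains an approximate posterior over policies rather than over arms, but the sample-then-act loop is the same; the arm probabilities are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
true_p = np.array([0.2, 0.5, 0.8])     # unknown success rates of 3 arms
alpha = np.ones(3); beta = np.ones(3)  # Beta(1, 1) priors

for _ in range(2000):
    theta = rng.beta(alpha, beta)      # sample one plausible world
    arm = int(np.argmax(theta))        # act greedily in that world
    reward = rng.random() < true_p[arm]
    alpha[arm] += reward               # conjugate posterior update
    beta[arm] += 1 - reward

print("posterior means:", (alpha / (alpha + beta)).round(2))
```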
MTR++: Multi-Agent Motion Prediction with Symmetric Scene Modeling and Guided Intention Querying
Authors: Shaoshuai Shi, Li Jiang, Dengxin Dai, Bernt Schiele
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Motion prediction is crucial for autonomous driving systems to understand complex driving scenarios and make informed decisions. However, this task is challenging due to the diverse behaviors of traffic participants and complex environmental contexts. In this paper, we propose Motion TRansformer (MTR) frameworks to address these challenges. The initial MTR framework utilizes a transformer encoder-decoder structure with learnable intention queries, enabling efficient and accurate prediction of future trajectories. By customizing intention queries for distinct motion modalities, MTR improves multimodal motion prediction while reducing reliance on dense goal candidates. The framework comprises two essential processes: global intention localization, identifying the agent's intent to enhance overall efficiency, and local movement refinement, adaptively refining predicted trajectories for improved accuracy. Moreover, we introduce an advanced MTR++ framework, extending the capability of MTR to simultaneously predict multimodal motion for multiple agents. MTR++ incorporates symmetric context modeling and mutually-guided intention querying modules to facilitate future behavior interaction among multiple agents, resulting in scene-compliant future trajectories. Extensive experimental results demonstrate that the MTR framework achieves state-of-the-art performance on the highly-competitive motion prediction benchmarks, while the MTR++ framework surpasses its precursor, exhibiting enhanced performance and efficiency in predicting accurate multimodal future trajectories for multiple agents.
Screw and Lie Group Theory in Multibody Dynamics -- Recursive Algorithms and Equations of Motion of Tree-Topology Systems
Authors: Andreas Mueller
Subjects: Numerical Analysis (math.NA); Robotics (cs.RO); Differential Geometry (math.DG); Optimization and Control (math.OC)
Abstract
Screw and Lie group theory allows for user-friendly modeling of multibody systems (MBS) while at the same time giving rise to computationally efficient recursive algorithms. The inherent frame invariance of such formulations allows for the use of arbitrary reference frames within the kinematics modeling (rather than obeying modeling conventions such as the Denavit-Hartenberg convention) and avoids the introduction of joint frames. The computational efficiency is owed to a representation of twists, accelerations, and wrenches that minimizes the computational effort, which can be directly carried over to dynamics formulations. In this paper, recursive $O\left( n\right)$ Newton-Euler algorithms are derived for the four most frequently used representations of twists, and their specific features are discussed. These formulations are related to the corresponding algorithms that were presented in the literature. The MBS motion equations are derived in closed form using the Lie group formulation. One set comprises the so-called 'Euler-Jourdain' or 'projection' equations, of which Kane's equations are a special case; the other is the Lagrange equations. The recursive kinematics formulations are readily extended to higher orders in order to compute derivatives of the motion equations; to this end, recursive formulations for the acceleration and jerk are derived. It is briefly discussed how this can be employed for the derivation of the linearized motion equations and their time derivatives. The geometric modeling allows for direct application of Lie group integration methods, which is also briefly discussed.
Circular Systems Engineering
Authors: Istvan David, Dominik Bork, Gerti Kappel
Subjects: Computers and Society (cs.CY); Software Engineering (cs.SE); Systems and Control (eess.SY)
Abstract
The perception of the value and propriety of modern engineered systems is changing. In addition to their functional and extra-functional properties, today's systems are also evaluated by their sustainability properties. The next generation of systems will be characterized by an overall elevated sustainability -- including their post-life, driven by efficient value retention mechanisms. Current systems engineering practices fall short of supporting these ambitions and need to be revised appropriately. In this paper, we introduce the concept of circular systems engineering, a novel paradigm for systems sustainability. After defining a conceptual reference framework to situate systems engineering practices within it, we derive prerequisites for circular systems engineering. Finally, we outline the challenges and research opportunities associated with circular systems engineering.
Meta-Reasoning: Semantics-Symbol Deconstruction For Large Language Models
Abstract
Symbolization methods in large language models (LLMs) have been shown to be effective in improving LLMs' reasoning ability. However, most of these approaches hinge on mapping natural language to formal languages (e.g., Python, SQL) that are more syntactically complete and free of ambiguity. Although effective, they depart from natural language itself and deviate from the habits of human thinking, catering instead to the execution mindset of computers. In contrast, we aim to simplify natural language by starting from the concept of symbols in linguistics itself, so that LLMs can learn the common formulation and general solution of reasoning problems wrapped in different natural semantics. From this consideration, we propose \textbf{Meta-Reasoning}, which allows LLMs to automatically accomplish semantic-symbol deconstruction, i.e., semantic resolution, to maximally reduce different questions of certain reasoning tasks to similar natural language representations, thus gaining the ability to learn by analogy and facilitating data-efficient in-context learning. Our experiments show that the Meta-Reasoning paradigm saliently enhances LLMs' reasoning performance with fewer demonstrations; they learn not only reasoning chains but also general solutions to certain types of tasks. In particular, for symbolic reasoning tasks such as 7-step Tracking Shuffled Objects, GPT-3 (text-davinci-002) achieves over 99% accuracy with only one Meta-Reasoning demonstration, outperforming all current LLMs with standard chain-of-thought prompting.
Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors
Authors: Guocheng Qian, Jinjie Mai, Abdullah Hamdi, Jian Ren, Aliaksandr Siarohin, Bing Li, Hsin-Ying Lee, Ivan Skorokhodov, Peter Wonka, Sergey Tulyakov, Bernard Ghanem
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
We present Magic123, a two-stage coarse-to-fine approach for high-quality, textured 3D mesh generation from a single unposed in-the-wild image, using both 2D and 3D priors. In the first stage, we optimize a neural radiance field to produce a coarse geometry. In the second stage, we adopt a memory-efficient differentiable mesh representation to yield a high-resolution mesh with a visually appealing texture. In both stages, the 3D content is learned through reference-view supervision and novel views guided by a combination of 2D and 3D diffusion priors. We introduce a single trade-off parameter between the 2D and 3D priors to control the exploration (more imaginative) and exploitation (more precise) of the generated geometry. Additionally, we employ textual inversion and monocular depth regularization to encourage consistent appearances across views and to prevent degenerate solutions, respectively. Magic123 demonstrates a significant improvement over previous image-to-3D techniques, as validated through extensive experiments on synthetic benchmarks and diverse real-world images. Our code, models, and generated 3D assets are available at https://github.com/guochengqian/Magic123.
Keyword: faster
Multigrid-Augmented Deep Learning for the Helmholtz Equation: Better Scalability with Compact Implicit Layers
Authors: Bar Lerer, Ido Ben-Yair, Eran Treister
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)
Abstract
We present a deep learning-based iterative approach to solve the discrete heterogeneous Helmholtz equation for high wavenumbers. Combining classical iterative multigrid solvers and convolutional neural networks (CNNs) via preconditioning, we obtain a learned neural solver that is faster and scales better than a standard multigrid solver. Our approach offers three main contributions over previous neural methods of this kind. First, we construct a multilevel U-Net-like encoder-solver CNN with an implicit layer on the coarsest grid of the U-Net, where convolution kernels are inverted. This alleviates the field of view problem in CNNs and allows better scalability. Second, we improve upon the previous CNN preconditioner in terms of the number of parameters, computation time, and convergence rates. Third, we propose a multiscale training approach that enables the network to scale to problems of previously unseen dimensions while still maintaining a reasonable training procedure. Our encoder-solver architecture can be used to generalize over different slowness models of various difficulties and is efficient at solving for many right-hand sides per slowness model. We demonstrate the benefits of our novel architecture with numerical experiments on a variety of heterogeneous two-dimensional problems at high wavenumbers.
Control of Cross-Directional Systems with Approximate Symmetries
Authors: Idris Kempf, Paul Goulart, Stephen Duncan
Abstract
Structural symmetries of linear dynamical systems can be exploited for decoupling the dynamics and reducing the computational complexity of the controller implementation. However, in practical applications, inexact structural symmetries undermine the ability to decouple the system, resulting in the loss of any potential complexity reduction. To address this, we propose substituting an approximation with exact structural symmetries for the original system model, thereby introducing an approximation error. We focus on internal model controllers for cross-directional systems encountered in large-scale and high-speed control problems of synchrotrons or the process industry and characterise the stability, performance, and robustness properties of the resulting closed loop. While existing approaches replace the original system model with one that minimises the Frobenius norm of the approximation error, we show that this can lead to instability or poor performance. Instead, we propose approximations that are obtained from semidefinite programming problems. We show that our proposed approximations can yield stable systems even when the Frobenius norm approximation does not. The paper concludes with numerical examples and a case study of a synchrotron light source with inexact structural symmetries. Exploiting structural symmetries in large-scale and high-speed systems enables faster sampling times and the use of more advanced control techniques, even when the symmetries are approximate.
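As a concrete instance of the baseline the abstract critiques, the Frobenius-optimal approximation of a matrix by a circulant one (one common exact symmetry class for cross-directional systems) averages each wrapped diagonal, after which the DFT decouples the dynamics into scalar loops. The paper's SDP-based approximations replace exactly this projection step.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
B = rng.normal(size=(n, n))              # system matrix, inexactly symmetric

# Frobenius-optimal circulant approximation: average each wrapped diagonal.
c = np.array([np.mean([B[i, (i + k) % n] for i in range(n)]) for k in range(n)])
C = np.array([[c[(j - i) % n] for j in range(n)] for i in range(n)])

# Circulant matrices are diagonalised by the DFT, so the approximated
# dynamics decouple into n independent scalar loops.
F = np.fft.fft(np.eye(n))                # DFT matrix (columns = Fourier modes)
D = np.linalg.inv(F) @ C @ F
print(np.allclose(D, np.diag(np.fft.fft(c))))   # True: fully decoupled
```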
An Integrated FPGA Accelerator for Deep Learning-based 2D/3D Path Planning
Abstract
Path planning is a crucial component for realizing the autonomy of mobile robots. However, due to limited computational resources on mobile robots, it remains challenging to deploy state-of-the-art methods and achieve real-time performance. To address this, we propose P3Net (PointNet-based Path Planning Networks), a lightweight deep-learning-based method for 2D/3D path planning, and design an IP core (P3NetCore) targeting FPGA SoCs (Xilinx ZCU104). P3Net improves the algorithm and model architecture of the recently-proposed MPNet. P3Net employs an encoder with a PointNet backbone and a lightweight planning network in order to extract robust point cloud features and sample path points from a promising region. P3NetCore comprises a fully-pipelined point cloud encoder, a batched bidirectional path planner, and a parallel collision checker, covering most of the algorithm. On the 2D (3D) datasets, P3Net with the IP core runs 24.54-149.57x and 6.19-115.25x (10.03-59.47x and 3.38-28.76x) faster than an ARM Cortex CPU and an Nvidia Jetson, respectively, while consuming only 0.255W (0.809W), and is up to 1049.42x (133.84x) more power-efficient than the workstation. P3Net improves the success rate by up to 28.2% and plans a near-optimal path, leading to a significantly better tradeoff between computation and solution quality than MPNet and the state-of-the-art sampling-based methods.
Thompson sampling for improved exploration in GFlowNets
Abstract
Generative flow networks (GFlowNets) are amortized variational inference algorithms that treat sampling from a distribution over compositional objects as a sequential decision-making problem with a learnable action policy. Unlike other algorithms for hierarchical sampling that optimize a variational bound, GFlowNet algorithms can stably run off-policy, which can be advantageous for discovering modes of the target distribution. Despite this flexibility in the choice of behaviour policy, the optimal way of efficiently selecting trajectories for training has not yet been systematically explored. In this paper, we view the choice of trajectories for training as an active learning problem and approach it using Bayesian techniques inspired by methods for multi-armed bandits. The proposed algorithm, Thompson sampling GFlowNets (TS-GFN), maintains an approximate posterior distribution over policies and samples trajectories from this posterior for training. We show in two domains that TS-GFN yields improved exploration and thus faster convergence to the target distribution than the off-policy exploration strategies used in past work.
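In outline, the algorithm keeps an approximate posterior over policies and draws one member to generate each training trajectory. The sketch below uses a bootstrapped ensemble as the posterior; the environment and update rule are toy stand-ins, not the paper's API.

```python
import random

K = 8
ensemble = [{"theta": random.random()} for _ in range(K)]   # policy "heads"

def sample_trajectory(policy):
    # Stand-in rollout; a real GFlowNet samples a compositional object
    # action by action under the given policy.
    return [("action", policy["theta"])]

def update(policy, trajectory):
    # Stand-in for a trajectory-balance / flow-matching gradient step.
    policy["theta"] *= 0.99

for step in range(1000):
    head = random.choice(ensemble)     # Thompson draw from the posterior
    traj = sample_trajectory(head)     # behaviour policy = sampled head
    update(head, traj)                 # train on the sampled trajectory
```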
Solving Edge Clique Cover Exactly via Synergistic Data Reduction
Authors: Anthony Hevia, Benjamin Kallus, Summer McClintic, Samantha Reisner, Darren Strash, Johnathan Wilson
Abstract
The edge clique cover (ECC) problem -- where the goal is to find a minimum cardinality set of cliques that cover all the edges of a graph -- is a classic NP-hard problem that has received much attention from both the theoretical and experimental algorithms communities. While small sparse graphs can be solved exactly via the branch-and-reduce algorithm of Gramm et al. [JEA 2009], larger instances can currently only be solved inexactly using heuristics with unknown overall solution quality. We revisit computing minimum ECCs exactly in practice by combining data reduction for both the ECC \emph{and} vertex clique cover (VCC) problems, which we do by modifying the polynomial-time reduction of Kou et al. [Commun. ACM, 1978] to transform a reduced ECC instance to a VCC instance; alternatively, we show it is possible to ``lift'' VCC reductions to the ECC problem. Our experiments show that combining data reduction for both problems (which we call \emph{synergistic data reduction}) enables finding exact minimum ECCs orders of magnitude faster than the technique of Gramm et al., and enables solving large sparse graphs on up to millions of vertices and edges that have never before been solved. With these new exact solutions in hand, we objectively evaluate the quality of recent heuristic algorithms on large instances for the first time. The most recent of these, \textsf{EO-ECC} by Abdullah et al. [ICCS 2022], is able to exactly solve 8 of the 21 instances for which we have exact solutions. It is our hope that our strategy rallies researchers to seek improved exact and heuristic methods for the ECC problem.
Keyword: mobile
ReMaX: Relaxing for Better Training on Efficient Panoptic Segmentation
Authors: Shuyang Sun, Weijun Wang, Qihang Yu, Andrew Howard, Philip Torr, Liang-Chieh Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This paper presents a new mechanism to facilitate the training of mask transformers for efficient panoptic segmentation, democratizing its deployment. We observe that due to its high complexity, the training objective of panoptic segmentation will inevitably lead to much higher false positive penalization. Such unbalanced loss makes the training process of the end-to-end mask-transformer based architectures difficult, especially for efficient models. In this paper, we present ReMaX that adds relaxation to mask predictions and class predictions during training for panoptic segmentation. We demonstrate that via these simple relaxation techniques during training, our model can be consistently improved by a clear margin \textbf{without} any extra computational cost on inference. By combining our method with efficient backbones like MobileNetV3-Small, our method achieves new state-of-the-art results for efficient panoptic segmentation on COCO, ADE20K and Cityscapes. Code and pre-trained checkpoints will be available at \url{https://github.com/google-research/deeplab2}.
Collision-free Motion Planning for Mobile Robots by Zero-order Robust Optimization-based MPC
Authors: Yunfan Gao, Florian Messerer, Jonathan Frey, Niels van Duijkeren, Moritz Diehl
Abstract
This paper presents an implementation of robust model predictive control (MPC) for collision-free reference trajectory tracking for mobile robots. The presented approach considers the robot motion to be subject to process noise bounded by ellipsoidal sets. In order to efficiently handle the evolution of the disturbance ellipsoids within the MPC, the zero-order robust optimization (zoRO) scheme is applied. The idea is to fix the disturbance ellipsoids within one optimization iteration and solve the problem repeatedly with updated disturbance ellipsoid trajectories. The zero-order approach is suboptimal in general. However, we show that it does not impair convergence to the reference trajectory in the absence of obstacles. The experiments on an industrial mobile robot prototype demonstrate the performance of the controller.
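The zoRO scheme alternates between solving the OCP with frozen uncertainty sets and re-propagating those sets along the new trajectory. A skeleton of that loop, with placeholder solver and dynamics, might look as follows.

```python
import numpy as np

N = 20                                       # horizon length

def solve_ocp_with_fixed_ellipsoids(x0, ellipsoids):
    # Placeholder for the tightened-constraint OCP solve.
    return np.zeros((N, 2))                  # toy control trajectory

def propagate_ellipsoids(x0, controls):
    # Placeholder: forward-propagate the ellipsoidal disturbance sets
    # along the trajectory induced by `controls`.
    return [np.eye(2) * 0.01 * (k + 1) for k in range(N)]

x0 = np.zeros(2)
ellipsoids = [np.eye(2) * 0.01] * N
for it in range(5):                          # zoRO outer iterations
    u = solve_ocp_with_fixed_ellipsoids(x0, ellipsoids)   # ellipsoids frozen
    ellipsoids = propagate_ellipsoids(x0, u)              # then updated
```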
An Integrated FPGA Accelerator for Deep Learning-based 2D/3D Path Planning
Abstract
Path planning is a crucial component for realizing the autonomy of mobile robots. However, due to limited computational resources on mobile robots, it remains challenging to deploy state-of-the-art methods and achieve real-time performance. To address this, we propose P3Net (PointNet-based Path Planning Networks), a lightweight deep-learning-based method for 2D/3D path planning, and design an IP core (P3NetCore) targeting FPGA SoCs (Xilinx ZCU104). P3Net improves the algorithm and model architecture of the recently-proposed MPNet. P3Net employs an encoder with a PointNet backbone and a lightweight planning network in order to extract robust point cloud features and sample path points from a promising region. P3NetCore comprises a fully-pipelined point cloud encoder, a batched bidirectional path planner, and a parallel collision checker, covering most of the algorithm. On the 2D (3D) datasets, P3Net with the IP core runs 24.54-149.57x and 6.19-115.25x (10.03-59.47x and 3.38-28.76x) faster than an ARM Cortex CPU and an Nvidia Jetson, respectively, while consuming only 0.255W (0.809W), and is up to 1049.42x (133.84x) more power-efficient than the workstation. P3Net improves the success rate by up to 28.2% and plans a near-optimal path, leading to a significantly better tradeoff between computation and solution quality than MPNet and the state-of-the-art sampling-based methods.
Keyword: pruning
Subgraph Stationary Hardware-Software Inference Co-Design
Abstract
A growing number of applications depend on Machine Learning (ML) functionality and benefit from both higher-quality ML predictions and better timeliness (latency) at the same time. A growing body of research in computer architecture, ML, and systems software literature focuses on reaching better latency-accuracy tradeoffs for ML models. Efforts include compression, quantization, pruning, early-exit models, mixed DNN precision, as well as ML inference accelerator designs that minimize latency and energy while preserving delivered accuracy. All of them, however, yield improvements for a single static point in the latency-accuracy tradeoff space. We make a case for applications that operate in dynamically changing deployment scenarios, where no single static point is optimal. We draw on a recently proposed weight-shared SuperNet mechanism to enable serving a stream of queries that uses (activates) different SubNets within this weight-shared construct. This creates an opportunity to exploit the inherent temporal locality with our proposed SubGraph Stationary (SGS) optimization. We take a hardware-software co-design approach with a real implementation of SGS in SushiAccel and the implementation of a software scheduler SushiSched controlling which SubNets to serve and what to cache in real time. Combined, they are vertically integrated into SUSHI, an inference serving stack. For the stream of queries, SUSHI yields up to 25% improvement in latency and a 0.98% increase in served accuracy. SUSHI can achieve up to 78.7% off-chip energy savings.
Detection-segmentation convolutional neural network for autonomous vehicle perception
Authors: Maciej Baczmanski, Robert Synoczek, Mateusz Wasala, Tomasz Kryjak
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Object detection and segmentation are two core modules of an autonomous vehicle perception system. They should have high efficiency and low latency while reducing computational complexity. Currently, the most commonly used algorithms are based on deep neural networks, which guarantee high efficiency but require high-performance computing platforms. In the case of autonomous vehicles, such as cars but also drones, it is necessary to use embedded platforms with limited computing power, which makes it difficult to meet the requirements described above. A reduction in the complexity of the network can be achieved by using an appropriate architecture, representation (reduced numerical precision, quantisation, pruning), and computing platform. In this paper, we focus on the first factor - the use of so-called detection-segmentation networks as a component of a perception system. We considered the task of segmenting the drivable area and road markings in combination with the detection of selected objects (pedestrians, traffic lights, and obstacles). We compared the performance of three different architectures described in the literature: MultiTask V3, HybridNets, and YOLOP. We conducted the experiments on a custom dataset consisting of approximately 500 images of the drivable area and lane markings, and 250 images of detected objects. Of the three methods analysed, MultiTask V3 proved to be the best, achieving 99% mAP_50 for detection, 97% MIoU for drivable area segmentation, and 91% MIoU for lane segmentation, as well as 124 fps on the RTX 3060 graphics card. This architecture is a good solution for embedded perception systems for autonomous vehicles. The code is available at: https://github.com/vision-agh/MMAR_2023.
Miniaturized Graph Convolutional Networks with Topologically Consistent Pruning
Authors: Hichem Sahbi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Magnitude pruning is one of the mainstream methods in lightweight architecture design whose goal is to extract subnetworks with the largest weight connections. This method is known to be successful, but under very high pruning regimes, it suffers from topological inconsistency, which renders the extracted subnetworks disconnected, and this hinders their generalization ability. In this paper, we devise a novel magnitude pruning method that allows extracting subnetworks while guaranteeing their topological consistency. The latter ensures that only accessible and co-accessible -- impactful -- connections are kept in the resulting lightweight networks. Our solution is based on a novel reparametrization and two supervisory bi-directional networks which implement accessibility/co-accessibility and guarantee that only connected subnetworks will be selected during training. This solution significantly enhances generalization under very high pruning regimes, as corroborated through extensive experiments, involving graph convolutional networks, on the challenging task of skeleton-based action recognition.
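The accessibility/co-accessibility condition can be phrased as a reachability computation on binary masks: a connection survives only if both endpoints are reachable from the input and can reach the output through kept weights. A small sketch of that check, our simplification of the paper's reparametrized training-time mechanism:

```python
import numpy as np

rng = np.random.default_rng(0)
Ws = [rng.normal(size=(8, 16)), rng.normal(size=(16, 16)), rng.normal(size=(16, 4))]
masks = [np.abs(W) > np.quantile(np.abs(W), 0.9) for W in Ws]   # 90% pruning

# Forward reachability (accessibility) and backward reachability
# (co-accessibility) through the surviving connections.
acc = [np.ones(Ws[0].shape[0], dtype=bool)]
for M in masks:
    acc.append((M.T.astype(int) @ acc[-1].astype(int)) > 0)
coacc = [np.ones(Ws[-1].shape[1], dtype=bool)]
for M in reversed(masks):
    coacc.insert(0, (M.astype(int) @ coacc[0].astype(int)) > 0)

# Keep a connection only if both endpoints are accessible and co-accessible.
consistent = [
    M & np.outer(acc[l] & coacc[l], acc[l + 1] & coacc[l + 1])
    for l, M in enumerate(masks)
]
print([int(m.sum()) for m in masks], [int(m.sum()) for m in consistent])
```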
Razor SNN: Efficient Spiking Neural Network with Temporal Embeddings
Abstract
The event streams generated by dynamic vision sensors (DVS) are sparse and non-uniform in the spatial domain, while still dense and redundant in the temporal domain. Although the spiking neural network (SNN), an event-driven neuromorphic model, has the potential to extract spatio-temporal features from event streams, it is neither effective nor efficient at doing so. Based on this observation, we propose an event-sparsification spiking framework, dubbed Razor SNN, that progressively prunes pointless event frames. Concretely, we extend the dynamic mechanism based on global temporal embeddings, reconstruct the features, and emphasize the effect of events adaptively at the training stage. During the inference stage, we eliminate fruitless frames hierarchically according to a binary mask generated by the trained temporal embeddings. Comprehensive experiments demonstrate that our Razor SNN achieves consistently competitive performance on four event-based benchmarks: DVS 128 Gesture, N-Caltech 101, CIFAR10-DVS and SHD.
Keyword: diffusion
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
Authors: Simian Luo, Chuanhao Yan, Chenxu Hu, Hang Zhao
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Abstract
The Video-to-Audio (V2A) model has recently gained attention for its practical application in generating audio directly from silent videos, particularly in video/film production. However, previous methods in V2A have limited generation quality in terms of temporal synchronization and audio-visual relevance. We present Diff-Foley, a synchronized Video-to-Audio synthesis method with a latent diffusion model (LDM) that generates high-quality audio with improved synchronization and audio-visual relevance. We adopt contrastive audio-visual pretraining (CAVP) to learn more temporally and semantically aligned features, then train an LDM with CAVP-aligned visual features on spectrogram latent space. The CAVP-aligned features enable the LDM to capture subtler audio-visual correlations via a cross-attention module. We further significantly improve sample quality with `double guidance'. Diff-Foley achieves state-of-the-art V2A performance on a current large-scale V2A dataset. Furthermore, we demonstrate Diff-Foley's practical applicability and generalization capabilities via downstream finetuning. Project Page: see https://diff-foley.github.io/
Class-Incremental Learning using Diffusion Model for Distillation and Replay
Authors: Quentin Jodelet, Xin Liu, Yin Jun Phua, Tsuyoshi Murata
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Class-incremental learning aims to learn new classes in an incremental fashion without forgetting the previously learned ones. Several research works have shown how additional data can be used by incremental models to help mitigate catastrophic forgetting. In this work, following the recent breakthrough in text-to-image generative models and their wide distribution, we propose the use of a pretrained Stable Diffusion model as a source of additional data for class-incremental learning. Compared to competitive methods that rely on external, often unlabeled, datasets of real images, our approach can generate synthetic samples belonging to the same classes as the previously encountered images. This allows us to use those additional data samples not only in the distillation loss but also for replay in the classification loss. Experiments on the competitive benchmarks CIFAR100, ImageNet-Subset, and ImageNet demonstrate how this new approach can be used to further improve the performance of state-of-the-art methods for class-incremental learning on large scale datasets.
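Conceptually, the stored exemplar buffer of classical replay is swapped for on-demand generation. A hedged sketch using the diffusers library; the model id, prompts, and loss combination are illustrative assumptions, not the paper's code.

```python
from diffusers import StableDiffusionPipeline

# Illustrative: synthesise replay samples for previously seen classes
# instead of storing real exemplars.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

def synthesize_replay(old_classes, per_class=8):
    replay = []
    for name in old_classes:
        for _ in range(per_class):
            image = pipe(f"a photo of a {name}").images[0]
            replay.append((image, name))
    return replay

# The synthetic samples then enter BOTH losses of the incremental learner:
#   L = CE(new_data and replay) + lam * KD(replay; teacher = previous model)
replay = synthesize_replay(["dog", "car"])
```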
Counting Guidance for High Fidelity Text-to-Image Synthesis
Authors: Wonjun Kang, Kevin Galim, Hyung Il Koo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recently, the quality and performance of text-to-image generation has advanced significantly due to the impressive results of diffusion models. However, text-to-image diffusion models still fail to generate high-fidelity content with respect to the input prompt. One problem where text-to-image diffusion models struggle is generating the exact number of objects specified in the text prompt. For example, given the prompt "five apples and ten lemons on a table", diffusion-generated images usually contain the wrong number of objects. In this paper, we propose a method to improve diffusion models so that they produce the correct object count for a given input prompt. We adopt a counting network that performs reference-less, class-agnostic counting for any given image. We calculate the gradients of the counting network and refine the predicted noise at each step. To handle multiple types of objects in the prompt, we use novel attention map guidance to obtain high-fidelity masks for each object. Finally, we guide the denoising process by the calculated gradients for each object. Through extensive experiments and evaluation, we demonstrate that our proposed guidance method greatly improves the fidelity of diffusion models with respect to object count.
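The mechanism resembles classifier guidance: at each denoising step the predicted noise is nudged by the gradient of a counting loss. A schematic sketch with stub networks; the real method backpropagates through a reference-less counting model and adds per-object attention masks.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoiser(x, t):
    return rng.normal(size=x.shape)            # stub noise prediction

def count_grad(x, target_count):
    # Stub: real code decodes x, runs a class-agnostic counting network,
    # and backpropagates |count - target|^2 to the latent.
    return np.zeros_like(x)

scale = 0.1                                    # guidance strength (assumed)
x = rng.normal(size=(4, 4))
for t in reversed(range(50)):
    eps = denoiser(x, t)
    eps = eps + scale * count_grad(x, target_count=5)   # refine the noise
    x = x - 0.02 * eps                         # toy denoising update
```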
Content-Preserving Diffusion Model for Unsupervised AS-OCT image Despeckling
Authors: Li Sanqian, Higashita Risa, Fu Huazhu, Li Heng, Niu Jingxuan, Liu Jiang
Subjects: Graphics (cs.GR); Image and Video Processing (eess.IV)
Abstract
Anterior segment optical coherence tomography (AS-OCT) is a non-invasive imaging technique that is highly valuable for ophthalmic diagnosis. However, speckles in AS-OCT images can often degrade the image quality and affect clinical analysis. As a result, removing speckles in AS-OCT images can greatly benefit automatic ophthalmology analysis. Unfortunately, challenges still exist in deploying effective AS-OCT image denoising algorithms, including collecting sufficient paired training data and the requirement to preserve consistent content in medical images. To address these practical issues, we propose an unsupervised AS-OCT despeckling algorithm via Content Preserving Diffusion Model (CPDM) with statistical knowledge. At the training stage, a Markov chain transforms clean images to white Gaussian noise by repeatedly adding random noise and removes the predicted noise in a reverse procedure. At the inference stage, we first analyze the statistical distribution of speckles and convert it into a Gaussian distribution, aiming to match the fast truncated reverse diffusion process. We then explore the posterior distribution of observed images as a fidelity term to ensure content consistency in the iterative procedure. Our experimental results show that CPDM significantly improves image quality compared to competitive methods. Furthermore, we validate the benefits of CPDM for subsequent clinical analysis, including ciliary muscle (CM) segmentation and scleral spur (SS) localization.
On Numerical Methods for Stochastic SINDy
Authors: Mathias Wanner, Igor Mezić
Subjects: Numerical Analysis (math.NA); Dynamical Systems (math.DS)
Abstract
The Sparse Identification of Nonlinear Dynamics (SINDy) algorithm can be applied to stochastic differential equations to estimate the drift and the diffusion function using data from a realization of the SDE. The SINDy algorithm requires sample data from each of these functions, which is typically estimated numerically from the data of the state. We analyze the performance of the previously proposed estimates for the drift and diffusion function to give bounds on the error for finite data. However, since this algorithm only converges as both the sampling frequency and the length of trajectory go to infinity, obtaining approximations within a certain tolerance may be infeasible. To combat this, we develop estimates with higher orders of accuracy for use in the SINDy framework. For a given sampling frequency, these estimates give more accurate approximations of the drift and diffusion functions, making SINDy a far more feasible system identification method.
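The first-order estimates the abstract analyzes can be written down directly: the conditional increment mean approximates the drift and the squared increment approximates the squared diffusion, after which a library regression recovers the coefficients. A self-contained illustration on the SDE dX = -X dt + 0.5 dW:

```python
import numpy as np

rng = np.random.default_rng(0)
dt, n = 1e-3, 200_000
x = np.empty(n); x[0] = 1.0
for k in range(n - 1):              # Euler-Maruyama for dX = -X dt + 0.5 dW
    x[k + 1] = x[k] - x[k] * dt + 0.5 * np.sqrt(dt) * rng.normal()

dx = np.diff(x)
drift_targets = dx / dt             # first-order estimate of f(X)
diff2_targets = dx**2 / dt          # first-order estimate of g(X)^2

# Least-squares regression on a polynomial library (SINDy adds sparsity).
lib = np.vstack([np.ones(n - 1), x[:-1], x[:-1] ** 2]).T
print(np.linalg.lstsq(lib, drift_targets, rcond=None)[0])  # approx [0, -1, 0]
print(np.linalg.lstsq(lib, diff2_targets, rcond=None)[0])  # approx [0.25, 0, 0]
```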
Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors
Authors: Guocheng Qian, Jinjie Mai, Abdullah Hamdi, Jian Ren, Aliaksandr Siarohin, Bing Li, Hsin-Ying Lee, Ivan Skorokhodov, Peter Wonka, Sergey Tulyakov, Bernard Ghanem
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
We present Magic123, a two-stage coarse-to-fine approach for high-quality, textured 3D mesh generation from a single unposed image in the wild using both 2D and 3D priors. In the first stage, we optimize a neural radiance field to produce a coarse geometry. In the second stage, we adopt a memory-efficient differentiable mesh representation to yield a high-resolution mesh with a visually appealing texture. In both stages, the 3D content is learned through reference-view supervision and novel views guided by a combination of 2D and 3D diffusion priors. We introduce a single trade-off parameter between the 2D and 3D priors to control the exploration (more imaginative) and exploitation (more precise) of the generated geometry. Additionally, we employ textual inversion and monocular depth regularization to encourage consistent appearances across views and to prevent degenerate solutions, respectively. Magic123 demonstrates a significant improvement over previous image-to-3D techniques, as validated through extensive experiments on synthetic benchmarks and diverse real-world images. Our code, models, and generated 3D assets are available at https://github.com/guochengqian/Magic123.
Keyword: adaptive
AdaCache: A Disaggregated Cache System with Adaptive Block Size for Cloud Block Storage
Authors: Qirui Yang, Runyu Jin, Ni Fan, Devasena Inupakutika, Bridget Davis, Ming Zhao
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
NVMe SSD caching has demonstrated impressive capabilities in solving cloud block storage's I/O bottleneck and enhancing application performance in public, private, and hybrid cloud environments. However, traditional host-side caching solutions have several serious limitations. First, the cache cannot be shared across hosts, leading to low cache utilization. Second, the commonly used fixed-size cache block allocation mechanism is unable to provide good cache performance with low memory overhead for diverse cloud workloads with vastly different I/O patterns. This paper presents AdaCache, a novel userspace disaggregated cache system that utilizes adaptive cache block allocation for cloud block storage. First, AdaCache proposes an innovative adaptive cache block allocation scheme that allocates cache blocks based on the request size to achieve both good cache performance and low memory overhead. Second, AdaCache proposes a group-based cache organization that stores cache blocks in groups to solve the fragmentation problem brought by variable-sized cache blocks. Third, AdaCache designs a two-level cache replacement policy that replaces cache blocks at the level of both single blocks and groups to improve the hit ratio. Experimental results with real-world traces show that AdaCache can substantially improve I/O performance and reduce storage accesses caused by cache misses with a much lower memory usage than traditional fixed-size cache systems.
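The core allocation idea can be sketched in a few lines: choose the smallest block-size class that fits the request and keep blocks grouped by size class to bound fragmentation. The size classes and structures below are illustrative, not AdaCache's actual layout.

```python
# Toy sketch of request-size-based cache block allocation with grouping.
BLOCK_SIZES = [4096, 16384, 65536]           # illustrative size classes (bytes)

def pick_block_size(request_bytes):
    for size in BLOCK_SIZES:
        if request_bytes <= size:
            return size
    return BLOCK_SIZES[-1]

groups = {size: {} for size in BLOCK_SIZES}  # size class -> {lba: data}

def cache_put(lba, data):
    size = pick_block_size(len(data))
    groups[size][lba] = data                 # blocks stored by size group

cache_put(0, b"x" * 3000)
print({s: len(g) for s, g in groups.items()})
```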
Towards Anatomy Education with Generative AI-based Virtual Assistants in Immersive Virtual Reality Environments
Abstract
Anatomy education is essential to support medical students in understanding the morphology, location, and spatial relationships of anatomical structures. Virtual reality (VR) and interactive 3D visualization systems have been proposed to provide an engaging learning experience and environment. However, VR-based systems integrated with a generative artificial intelligence (AI) assistant for anatomy education are still underrepresented. This work presents a VR environment with a generative AI virtual assistant to support human anatomy education, allowing the user to communicate verbally with the virtual assistant. We aim to provide a more interactive, adaptive, and informative learning experience. The proposed environment was assessed in a pilot user study (n = 16) comparing two configurations: an avatar and a screen-based virtual assistant. We observed no significant differences between the configurations or difficulty levels in task completion time and the number of interactions with the virtual assistant. However, there was a significant difference in score between difficulty levels in the avatar configuration. The results also provide insights into the usability, task load, and sense of presence in the virtual environment. Our proposed environment offers potential benefits and research directions for medical education, using generative AI to assist and enhance the learning experience.
Designing Stable Neural Networks using Convex Analysis and ODEs
Authors: Ferdia Sherry, Elena Celledoni, Matthias J. Ehrhardt, Davide Murari, Brynjulf Owren, Carola-Bibiane Schönlieb
Abstract
Motivated by classical work on the numerical integration of ordinary differential equations we present a ResNet-styled neural network architecture that encodes non-expansive (1-Lipschitz) operators, as long as the spectral norms of the weights are appropriately constrained. This is to be contrasted with the ordinary ResNet architecture which, even if the spectral norms of the weights are constrained, has a Lipschitz constant that, in the worst case, grows exponentially with the depth of the network. Further analysis of the proposed architecture shows that the spectral norms of the weights can be further constrained to ensure that the network is an averaged operator, making it a natural candidate for a learned denoiser in Plug-and-Play algorithms. Using a novel adaptive way of enforcing the spectral norm constraints, we show that, even with these constraints, it is possible to train performant networks. The proposed architecture is applied to the problem of adversarially robust image classification, to image denoising, and finally to the inverse problem of deblurring.
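One standard construction in this spirit (a sketch, not necessarily the paper's exact parametrisation) is the residual step x_{k+1} = x_k - tau * W^T relu(W x_k + b), which is non-expansive whenever tau <= 2 / ||W||_2^2, so stacking layers cannot inflate distances:

```python
import numpy as np

rng = np.random.default_rng(0)

def spectral_norm(W, iters=100):
    v = rng.normal(size=W.shape[1])
    for _ in range(iters):               # power iteration on W^T W
        v = W.T @ (W @ v)
        v /= np.linalg.norm(v)
    return np.linalg.norm(W @ v)

def layer(x, W, b):
    # Non-expansive residual step: 1-Lipschitz when tau <= 2 / ||W||_2^2.
    tau = 2.0 / (spectral_norm(W) ** 2 + 1e-9)
    return x - tau * W.T @ np.maximum(W @ x + b, 0.0)

W, b = rng.normal(size=(32, 16)), rng.normal(size=32)
x, y = rng.normal(size=16), rng.normal(size=16)
print(np.linalg.norm(layer(x, W, b) - layer(y, W, b)) <= np.linalg.norm(x - y))
```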
Topology-Aware Resilient Routing Protocol for FANETs: An Adaptive Q-Learning Approach
Authors: Yanpeng Cui, Qixun Zhang, Zhiyong Feng, Zhiqing Wei, Ce Shi, Heng Yang
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
Flying ad hoc networks (FANETs) play a crucial role in numerous military and civil applications, since they shorten mission duration and significantly enhance coverage compared with a single unmanned aerial vehicle (UAV). However, designing an energy-efficient FANET routing protocol with a high packet delivery rate (PDR) and low delay is challenging owing to dynamic topology changes. In this article, we propose a topology-aware resilient routing strategy based on adaptive Q-learning (TARRAQ) to accurately capture topology changes with low overhead and make routing decisions in a distributed and autonomous way. First, we analyze the dynamic behavior of UAV nodes via queuing theory, and then derive closed-form solutions for the neighbors' change rate (NCR) and the neighbors' change interarrival time (NCIT) distribution. Based on the real-time NCR and NCIT, a resilient sensing interval (SI) is determined by defining the expected sensing delay of network events. Besides, we also present an adaptive Q-learning approach that enables UAVs to make distributed, autonomous, and adaptive routing decisions, where the above SI ensures that the action space can be updated in time at a low cost. The simulation results verify the accuracy of the topology dynamics analysis model and also prove that our TARRAQ outperforms the Q-learning-based topology-aware routing (QTAR), mobility prediction-based virtual routing (MPVR), and greedy perimeter stateless routing based on energy-efficient hello (EE-Hello) with 25.23%, 20.24%, and 13.73% lower overhead, 9.41%, 14.77%, and 16.70% higher PDR, and 5.12%, 15.65%, and 11.31% lower energy consumption, respectively.
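Stripped of the queuing-theoretic sensing-interval analysis, the routing core is a per-node Q-learning rule over the current neighbour set. A toy sketch with illustrative rewards and parameters:

```python
import random

Q = {}                                     # (node, neighbour) -> value
alpha, gamma, eps = 0.3, 0.9, 0.1          # illustrative hyperparameters

def choose_next_hop(node, neighbours):
    if random.random() < eps:
        return random.choice(neighbours)   # explore
    return max(neighbours, key=lambda n: Q.get((node, n), 0.0))

def update(node, hop, reward, next_neighbours):
    # Standard Q-learning update; the neighbour sets (action spaces) are
    # refreshed every sensing interval in the paper's scheme.
    best_next = max((Q.get((hop, n), 0.0) for n in next_neighbours), default=0.0)
    q = Q.get((node, hop), 0.0)
    Q[(node, hop)] = q + alpha * (reward + gamma * best_next - q)

update("u1", "u2", reward=1.0, next_neighbours=["u3", "u4"])
print(choose_next_hop("u1", ["u2", "u5"]))
```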
Improving Federated Aggregation with Deep Unfolding Networks
Authors: Shanika I Nanayakkara, Shiva Raj Pokhrel, Gang Li
Abstract
The performance of federated learning (FL) is negatively affected by device heterogeneity and statistical differences between participating clients. To address this issue, we introduce a deep unfolding network (DUN)-based technique that learns adaptive weights that unbiasedly ameliorate the adverse impacts of heterogeneity. The proposed method demonstrates impressive accuracy and quality-aware aggregation. Furthermore, we evaluate a best-weighted normalization approach that reduces the computational cost of the aggregation step. The numerical experiments in this study demonstrate the effectiveness of this approach and provide insights into the interpretability of the unbiased weights learned. By incorporating unbiased weights into the model, the proposed approach effectively addresses quality-aware aggregation under heterogeneity of the participating clients and the FL environment. Code and details are available \href{https://github.com/shanikairoshi/Improved_DUN_basedFL_Aggregation}{here}.
3D induction log modelling with integral equation method and domain decomposition preconditioning
Abstract
The deployment of electromagnetic (EM) induction tools while drilling is one of the standard routines for assisting the geosteering decision-making process. The conductivity distribution obtained through the inversion of the EM induction log can provide important information about the geological structure around the borehole. To image the 3D geological structure in the subsurface, 3D inversion of the EM induction log is required. Because the inversion process depends mainly on forward modelling, the use of fast and accurate forward modelling is essential. In this paper, we present an improved version of the integral equation (IE) based modelling technique for general anisotropic media with domain decomposition preconditioning. The discretised IE after domain decomposition reduces to a fixed-point equation that is solved iteratively with either block Gauss-Seidel or Jacobi preconditioning. Within each iteration, the inverse of the block matrix is computed using a Krylov subspace method instead of a direct solver. An additional reduction in computational time is obtained by using an adaptive relative residual stopping criterion in the iterative solver. Numerical experiments show a maximum reduction in computational time of 35 per cent compared to solving the full-domain IE with a conventional GMRES solver. Additionally, the reduced memory requirement for covering a large area of the induction tool sensitivity enables acceleration with limited GPU memory. Hence, we conclude that the domain decomposition method improves the efficiency of the IE method by reducing both the computation time and the memory requirement.
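The block Gauss-Seidel variant of the fixed-point iteration is easy to sketch on a generic linear system: sweep over the subdomain blocks, solving each block with the latest values of the others (the paper uses a Krylov method for the inner solves). The system below is a toy stand-in for the discretised IE.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
A = np.eye(n) * 4 + rng.normal(scale=0.3, size=(n, n))   # diagonally dominant
b = rng.normal(size=n)
blocks = [slice(0, 4), slice(4, 8)]                       # two subdomains

x = np.zeros(n)
for it in range(50):
    for s in blocks:                        # block Gauss-Seidel sweep
        r = b[s] - A[s, :] @ x + A[s, s] @ x[s]
        x[s] = np.linalg.solve(A[s, s], r)  # inner solve (Krylov in the paper)
print(np.linalg.norm(b - A @ x))
```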
Razor SNN: Efficient Spiking Neural Network with Temporal Embeddings
Abstract
The event streams generated by dynamic vision sensors (DVS) are sparse and non-uniform in the spatial domain, while still dense and redundant in the temporal domain. Although the spiking neural network (SNN), an event-driven neuromorphic model, has the potential to extract spatio-temporal features from event streams, it is neither effective nor efficient at doing so. Based on this observation, we propose an event-sparsification spiking framework, dubbed Razor SNN, that progressively prunes pointless event frames. Concretely, we extend the dynamic mechanism based on global temporal embeddings, reconstruct the features, and emphasize the effect of events adaptively at the training stage. During the inference stage, we eliminate fruitless frames hierarchically according to a binary mask generated by the trained temporal embeddings. Comprehensive experiments demonstrate that our Razor SNN achieves consistently competitive performance on four event-based benchmarks: DVS 128 Gesture, N-Caltech 101, CIFAR10-DVS and SHD.
Termination of Picard Iteration for Coupled Neutronics/Thermal-Hydraulics Simulations
Abstract
In this paper, we consider the coupled neutronics/thermal-hydraulics (N/TH) problem, in which the termination criterion for the neutronics iteration adopts an adaptive tolerance with respect to the fuel temperature residual at each Picard iteration. We refer to this coupling scheme as the inexact Picard iteration method. Fourier analysis is performed to investigate how the convergence behavior of the Picard iteration is influenced by the inexact neutronics solution. It is found that if the convergence rate of the neutronics solution is slow, the Picard coupling may become unstable unless a tighter tolerance is used for the neutronics iteration. Nevertheless, our analysis indicates that a certain amount of over-solving is necessary for the stability of the Picard iteration unless the iterative solution of the subproblem is very efficient, a point that has not been addressed in previous studies.
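A toy version of the inexact Picard loop makes the adaptive-tolerance coupling explicit: the inner solve is only pushed as far as a multiple of the current outer residual. The model problem and the factor eta below are illustrative, not the paper's reactor physics.

```python
import numpy as np

def inner_solve(T, tol):
    target = 0.5 * (T + 1.0)             # toy "neutronics" fixed point
    phi = np.zeros_like(T)
    while np.linalg.norm(target - phi) > tol:
        phi += 0.5 * (target - phi)      # inner iteration
    return phi

eta = 0.1                                # adaptive-tolerance factor
T, res = np.ones(4), 1.0
while res > 1e-8:
    phi = inner_solve(T, tol=eta * res)  # solve only as far as needed
    T_new = 1.0 + 0.2 * phi              # toy "thermal-hydraulics" update
    res = float(np.linalg.norm(T_new - T))
    T = T_new
print(T)
```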
MTR++: Multi-Agent Motion Prediction with Symmetric Scene Modeling and Guided Intention Querying
Authors: Shaoshuai Shi, Li Jiang, Dengxin Dai, Bernt Schiele
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Motion prediction is crucial for autonomous driving systems to understand complex driving scenarios and make informed decisions. However, this task is challenging due to the diverse behaviors of traffic participants and complex environmental contexts. In this paper, we propose Motion TRansformer (MTR) frameworks to address these challenges. The initial MTR framework utilizes a transformer encoder-decoder structure with learnable intention queries, enabling efficient and accurate prediction of future trajectories. By customizing intention queries for distinct motion modalities, MTR improves multimodal motion prediction while reducing reliance on dense goal candidates. The framework comprises two essential processes: global intention localization, identifying the agent's intent to enhance overall efficiency, and local movement refinement, adaptively refining predicted trajectories for improved accuracy. Moreover, we introduce an advanced MTR++ framework, extending the capability of MTR to simultaneously predict multimodal motion for multiple agents. MTR++ incorporates symmetric context modeling and mutually-guided intention querying modules to facilitate future behavior interaction among multiple agents, resulting in scene-compliant future trajectories. Extensive experimental results demonstrate that the MTR framework achieves state-of-the-art performance on the highly-competitive motion prediction benchmarks, while the MTR++ framework surpasses its precursor, exhibiting enhanced performance and efficiency in predicting accurate multimodal future trajectories for multiple agents.
Act3D: Infinite Resolution Action Detection Transformer for Robotic Manipulation
Abstract
3D perceptual representations are well suited for robot manipulation as they easily encode occlusions and simplify spatial reasoning. Many manipulation tasks require high spatial precision in end-effector pose prediction, typically demanding high-resolution 3D perceptual grids that are computationally expensive to process. As a result, most manipulation policies operate directly in 2D, foregoing 3D inductive biases. In this paper, we propose Act3D, a manipulation policy Transformer that casts 6-DoF keypose prediction as 3D detection with adaptive spatial computation. It takes as input 3D feature clouds unprojected from one or more camera views, iteratively samples 3D point grids in free space in a coarse-to-fine manner, featurizes them using relative spatial attention to the physical feature cloud, and selects the best feature point for end-effector pose prediction. Act3D sets a new state-of-the-art in RLbench, an established manipulation benchmark. Our model achieves 10% absolute improvement over the previous SOTA 2D multi-view policy on 74 RLbench tasks and 22% absolute improvement with 3x less compute over the previous SOTA 3D policy. In thorough ablations, we show the importance of relative spatial attention, large-scale vision-language pre-trained 2D backbones, and weight tying across coarse-to-fine attentions. Code and videos are available at our project site: https://act3d.github.io/.
Keyword: quantization
Subgraph Stationary Hardware-Software Inference Co-Design
Abstract
A growing number of applications depend on Machine Learning (ML) functionality and benefit from both higher-quality ML predictions and better timeliness (latency) at the same time. A growing body of research in computer architecture, ML, and systems software literature focuses on reaching better latency-accuracy tradeoffs for ML models. Efforts include compression, quantization, pruning, early-exit models, mixed DNN precision, as well as ML inference accelerator designs that minimize latency and energy while preserving delivered accuracy. All of them, however, yield improvements for a single static point in the latency-accuracy tradeoff space. We make a case for applications that operate in dynamically changing deployment scenarios, where no single static point is optimal. We draw on a recently proposed weight-shared SuperNet mechanism to enable serving a stream of queries that uses (activates) different SubNets within this weight-shared construct. This creates an opportunity to exploit the inherent temporal locality with our proposed SubGraph Stationary (SGS) optimization. We take a hardware-software co-design approach with a real implementation of SGS in SushiAccel and the implementation of a software scheduler SushiSched controlling which SubNets to serve and what to cache in real time. Combined, they are vertically integrated into SUSHI, an inference serving stack. For the stream of queries, SUSHI yields up to 25% improvement in latency and a 0.98% increase in served accuracy. SUSHI can achieve up to 78.7% off-chip energy savings.
Designing strong baselines for ternary neural network quantization through support and mass equalization
Authors: Edouard Yvinec, Arnaud Dapogny, Kevin Bailly
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Deep neural networks (DNNs) offer the highest performance in a wide range of computer vision applications. These results rely on over-parameterized backbones, which are expensive to run. This computational burden can be dramatically reduced by quantizing floating point values to ternary values (2 bits, with each weight taking a value in {-1,0,1}), whether in data-free (DFQ), post-training (PTQ), or quantization-aware training (QAT) scenarios. In this context, we observe that rounding to nearest minimizes the expected error given a uniform distribution and thus does not account for the skewness and kurtosis of the weight distribution, which strongly affect ternary quantization performance. This raises the following question: should one minimize the highest or the average quantization error? To answer this, we design two operators, TQuant and MQuant, that correspond to these respective minimization tasks. We show experimentally that our approach allows us to significantly improve the performance of ternary quantization in a variety of scenarios in DFQ, PTQ, and QAT, and we provide strong insights to pave the way for future research in deep neural network quantization.
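The two objectives named in the abstract, worst-case versus average quantization error, can be compared with a brute-force search over the ternary threshold and scale. The grid search below is our illustration, not the paper's TQuant/MQuant algorithm; the heavy-tailed weights mimic the skew/kurtosis issue the abstract raises.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_t(df=3, size=4096)        # heavy-tailed (high-kurtosis) weights

def ternarize(w, delta, scale):
    # Map each weight to scale * {-1, 0, +1} using threshold delta.
    return scale * np.sign(w) * (np.abs(w) > delta)

def fit(w, objective):
    best = (np.inf, None)
    for delta in np.quantile(np.abs(w), np.linspace(0.05, 0.95, 19)):
        for scale in np.linspace(0.1, 3.0, 30):
            err = np.abs(w - ternarize(w, delta, scale))
            val = err.max() if objective == "max" else err.mean()
            best = min(best, (val, (delta, scale)))
    return best

print("min-max  error:", fit(w, "max")[0])    # TQuant-style objective
print("min-mean error:", fit(w, "mean")[0])   # MQuant-style objective
```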
Keyword: efficient
Enterprise Disk Drive Scrubbing Based on Mondrian Conformal Predictors
FANET Experiment: Real-Time Surveillance Applications Connected to Image Processing System
Photon: A Cross Platform P2P Data Transfer Application
HYDRA: Hybrid Robot Actions for Imitation Learning
The power of motifs as inductive bias for learning molecular distributions
TemperatureGAN: Generative Modeling of Regional Atmospheric Temperatures
A Unified Framework for Online Data-Driven Predictive Control with Robust Safety Guarantees
ReMaX: Relaxing for Better Training on Efficient Panoptic Segmentation
Scaling Model Checking for DNN Analysis via State-Space Reduction and Input Segmentation (Extended Version)
Visualizing Geophylogenies -- Internal and External Labeling with Phylogenetic Tree Constraints
Topology-Aware Resilient Routing Protocol for FANETs: An Adaptive Q-Learning Approach
HVTSurv: Hierarchical Vision Transformer for Patient-Level Survival Prediction from Whole Slide Image
Physics-informed invertible neural network for the Koopman operator learning
LMBot: Distilling Graph Knowledge into Language Model for Graph-less Deployment in Twitter Bot Detection
Screw and Lie Group Theory in Multibody Kinematics -- Motion Representation and Recursive Kinematics of Tree-Topology Systems
Hashing-Based Distributed Clustering for Massive High-Dimensional Data
STTracker: Spatio-Temporal Tracker for 3D Single Object Tracking
Collision-free Motion Planning for Mobile Robots by Zero-order Robust Optimization-based MPC
High-throughput Simulation of Federated Learning via Resource-Aware Client Placement
FedBone: Towards Large-Scale Federated Multi-Task Learning
Secure and Efficient Flexibility Service Procurement: A Game-Theoretic Approach
Landmark Guided Active Exploration with Stable Low-level Policy Learning
Multigrid-Augmented Deep Learning for the Helmholtz Equation: Better Scalability with Compact Implicit Layers
Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer
Comparative study of subset selection methods for rapid prototyping of 3D object detection algorithms
RBSR: Efficient and Flexible Recurrent Network for Burst Super-Resolution
Razor SNN: Efficient Spiking Neural Network with Temporal Embeddings
Projection-based first-order constrained optimization solver for robotics
An Integrated FPGA Accelerator for Deep Learning-based 2D/3D Path Planning
Termination of Picard Iteration for Coupled Neutronics/Thermal-Hydraulics Simulations
MCQUIC -- A Multicast Extension for QUIC
Learning Delays in Spiking Neural Networks using Dilated Convolutions with Learnable Spacings
Thompson sampling for improved exploration in GFlowNets
MTR++: Multi-Agent Motion Prediction with Symmetric Scene Modeling and Guided Intention Querying
Screw and Lie Group Theory in Multibody Dynamics -- Recursive Algorithms and Equations of Motion of Tree-Topology Systems
Circular Systems Engineering
Meta-Reasoning: Semantics-Symbol Deconstruction For Large Language Models
Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors
Keyword: faster
Multigrid-Augmented Deep Learning for the Helmholtz Equation: Better Scalability with Compact Implicit Layers
Control of Cross-Directional Systems with Approximate Symmetries
An Integrated FPGA Accelerator for Deep Learning-based 2D/3D Path Planning
Thompson sampling for improved exploration in GFlowNets
Solving Edge Clique Cover Exactly via Synergistic Data Reduction
Keyword: mobile
ReMaX: Relaxing for Better Training on Efficient Panoptic Segmentation
Collision-free Motion Planning for Mobile Robots by Zero-order Robust Optimization-based MPC
An Integrated FPGA Accelerator for Deep Learning-based 2D/3D Path Planning
Keyword: pruning
Subgraph Stationary Hardware-Software Inference Co-Design
Detection-segmentation convolutional neural network for autonomous vehicle perception
Miniaturized Graph Convolutional Networks with Topologically Consistent Pruning
Razor SNN: Efficient Spiking Neural Network with Temporal Embeddings
Keyword: diffusion
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
Class-Incremental Learning using Diffusion Model for Distillation and Replay
Counting Guidance for High Fidelity Text-to-Image Synthesis
Content-Preserving Diffusion Model for Unsupervised AS-OCT image Despeckling
On Numerical Methods for Stochastic SINDy
Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors
Keyword: adaptive
AdaCache: A Disaggregated Cache System with Adaptive Block Size for Cloud Block Storage
Towards Anatomy Education with Generative AI-based Virtual Assistants in Immersive Virtual Reality Environments
Designing Stable Neural Networks using Convex Analysis and ODEs
Topology-Aware Resilient Routing Protocol for FANETs: An Adaptive Q-Learning Approach
Improving Federated Aggregation with Deep Unfolding Networks
3D induction log modelling with integral equation method and domain decomposition preconditioning
Razor SNN: Efficient Spiking Neural Network with Temporal Embeddings
Termination of Picard Iteration for Coupled Neutronics/Thermal-Hydraulics Simulations
MTR++: Multi-Agent Motion Prediction with Symmetric Scene Modeling and Guided Intention Querying
Act3D: Infinite Resolution Action Detection Transformer for Robotic Manipulation
Keyword: quantization
Subgraph Stationary Hardware-Software Inference Co-Design
Designing strong baselines for ternary neural network quantization through support and mass equalization