Abstract
In this paper, we explore the challenges associated with navigating complex T-intersections in dense traffic scenarios for autonomous vehicles (AVs). Reinforcement learning algorithms have emerged as a promising approach to address these challenges by enabling AVs to make safe and efficient decisions in real time. Here, we address the problem of efficiently and safely navigating T-intersections using a lower-cost, single-agent approach based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) reinforcement learning algorithm. We show that our TD3-based method, when trained and tested in the CARLA simulation platform, demonstrates stable convergence and improved safety performance across various traffic densities. Our results reveal that the proposed approach enables the AV to effectively navigate T-intersections, outperforming previous methods in terms of travel delay, collision minimization, and overall cost. This study contributes to the growing body of knowledge on reinforcement learning applications in autonomous driving and highlights the potential of single-agent, cost-effective methods for addressing more complex driving scenarios and advancing reinforcement learning algorithms in the future.
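For readers unfamiliar with TD3, the following is a minimal sketch of its core update step (illustrative only, not this paper's implementation; the network, optimizer, and batch names are placeholders). It shows the three ingredients that give TD3 its stability: clipped double-Q targets, target policy smoothing, and delayed actor/target updates.

```python
# Minimal TD3 update sketch (assumed interfaces; not the paper's code).
import torch

def td3_update(actor, actor_targ, q1, q2, q1_targ, q2_targ,
               q_opt, pi_opt, batch, step, gamma=0.99, tau=0.005,
               policy_noise=0.2, noise_clip=0.5, policy_delay=2):
    s, a, r, s2, done = batch  # tensors sampled from a replay buffer
    with torch.no_grad():
        # Target policy smoothing: clipped noise on the target action.
        noise = (torch.randn_like(a) * policy_noise).clamp(-noise_clip, noise_clip)
        a2 = (actor_targ(s2) + noise).clamp(-1.0, 1.0)
        # Clipped double-Q: take the min of the twin target critics.
        y = r + gamma * (1 - done) * torch.min(q1_targ(s2, a2), q2_targ(s2, a2))
    q_loss = ((q1(s, a) - y) ** 2).mean() + ((q2(s, a) - y) ** 2).mean()
    q_opt.zero_grad(); q_loss.backward(); q_opt.step()
    if step % policy_delay == 0:  # delayed policy and target updates
        pi_loss = -q1(s, actor(s)).mean()
        pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()
        for net, targ in ((actor, actor_targ), (q1, q1_targ), (q2, q2_targ)):
            for p, pt in zip(net.parameters(), targ.parameters()):
                pt.data.mul_(1 - tau).add_(tau * p.data)
```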
Time-vectorized numerical integration for systems of ODEs
Authors: Mark C. Messner, Tianchen Hu, Tianju Chen
Abstract
Stiff systems of ordinary differential equations (ODEs) and sparse training data are common in scientific problems. This paper describes efficient, implicit, vectorized methods for integrating stiff systems of ordinary differential equations through time and calculating parameter gradients with the adjoint method. The main innovation is to vectorize the problem both over the number of independent time series and over a batch or "chunk" of sequential time steps, effectively vectorizing the assembly of the implicit system of ODEs. The block-bidiagonal structure of the linearized implicit system for the backward Euler method allows for further vectorization using parallel cyclic reduction (PCR). Vectorizing over both axes of the input data provides a higher bandwidth of calculations to the computing device, allowing even problems with comparatively sparse data to fully utilize modern GPUs and achieve speedups of greater than 100x compared to standard, sequential time integration. We demonstrate the advantages of implicit, vectorized time integration with several example problems, drawn from analytical stiff and non-stiff ODE models as well as neural ODE models. We also describe and provide a freely available open-source implementation of the methods developed here.
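The chunking idea can be sketched in a few lines: assemble the backward Euler residuals for an entire chunk of time steps and all independent time series in one vectorized call. This is an illustrative NumPy sketch under assumed shapes, not the paper's implementation, which additionally solves the block-bidiagonal Newton system with PCR.

```python
# Hedged sketch: vectorized residual assembly for chunked backward Euler.
import numpy as np

def chunk_residual(f, y, y_prev, t, dt):
    """y: (nchunk, nbatch, n) trial states for the whole chunk;
    y_prev: (nbatch, n) converged state before the chunk;
    t: (nchunk,) times; dt: (nchunk,) step sizes; f vectorized over both axes.
    Residual of step i: R_i = y_i - y_{i-1} - dt_i * f(t_i, y_i)."""
    y_im1 = np.concatenate([y_prev[None], y[:-1]], axis=0)
    return y - y_im1 - dt[:, None, None] * f(t, y)

# Newton's method on this residual yields a block-bidiagonal linear system
# (identity blocks coupling each step to its predecessor), which is what
# makes parallel cyclic reduction applicable.
```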
Electrical Grid Anomaly Detection via Tensor Decomposition
Authors: Alexander Most, Maksim Eren, Nigel Lawrence, Boian Alexandrov
Abstract
Supervisory Control and Data Acquisition (SCADA) systems often serve as the nervous system for substations within power grids. These systems facilitate real-time monitoring, data acquisition, and control of equipment, and ensure smooth and efficient operation of the substation and its connected devices. Previous work has shown that dimensionality reduction-based approaches, such as Principal Component Analysis (PCA), can be used for accurate identification of anomalies in SCADA systems. While not specifically applied to SCADA, non-negative matrix factorization (NMF) has shown strong results at detecting anomalies in wireless sensor networks. These unsupervised approaches model the normal or expected behavior and detect unseen types of attacks or anomalies by identifying the events that deviate from the expected behavior. These approaches, however, do not model the complex and multi-dimensional interactions that are naturally present in SCADA systems. In contrast, non-negative tensor decomposition is a powerful unsupervised machine learning (ML) method that can model the complex and multi-faceted activity details of SCADA events. In this work, we present a novel application of the tensor decomposition method Canonical Polyadic Alternating Poisson Regression (CP-APR) with a probabilistic framework, which has previously shown state-of-the-art anomaly detection results on cyber network data, to identify anomalies in SCADA systems. We showcase that the use of statistical behavior analysis of SCADA communication with tensor decomposition improves the specificity and accuracy of identifying anomalies in electrical grid systems. In our experiments, we model real-world SCADA system data collected from the electrical grid operated by Los Alamos National Laboratory (LANL), which provides transmission and distribution service through a partnership with Los Alamos County, and detect synthetically generated anomalies.
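As a rough illustration of the scoring idea, the sketch below fits a non-negative CP model to a count tensor of SCADA events and flags cells that are unlikely under the learned Poisson rates. Note the hedges: tensorly's non_negative_parafac is used here only as a stand-in for CP-APR (which fits the Poisson likelihood directly), and the tensor axes are assumptions, not taken from the paper.

```python
# Hedged sketch: Poisson anomaly scoring with a non-negative CP model.
import numpy as np
import tensorly as tl
from tensorly.decomposition import non_negative_parafac
from scipy.stats import poisson

def fit_and_score(counts, rank=16):
    """counts: dense event-count tensor, e.g. (device, message_type, time_bin)."""
    cp = non_negative_parafac(tl.tensor(counts, dtype=tl.float64), rank=rank)
    lam = tl.to_numpy(tl.cp_to_tensor(cp)) + 1e-9  # expected "normal" rates
    # Low log-likelihood under the learned model => anomalous cell.
    return -poisson.logpmf(counts.astype(int), lam)
```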
Learning RL-Policies for Joint Beamforming Without Exploration: A Batch Constrained Off-Policy Approach
Authors: Heasung Kim, Sravan Ankireddy
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Abstract
In this project, we consider the problem of network parameter optimization for rate maximization. We frame this as a joint optimization problem of power control, beamforming, and interference cancellation. We consider the setting where multiple Base Stations (BSs) are communicating with multiple user equipments (UEs). Because of the exponential computational complexity of brute-force search, we instead solve this non-convex optimization problem using deep reinforcement learning (RL) techniques. Modern communication systems are notoriously difficult to model exactly, which limits the use of RL-based algorithms, as interaction with the environment is needed for the agent to explore and learn efficiently. Further, it is ill-advised to deploy the algorithm in the real world for exploration and learning because of the high cost of failure. In contrast to previously proposed RL-based solutions, such as deep Q-network (DQN) based control, we propose taking an offline, model-based approach. We specifically consider discrete batch-constrained deep Q-learning (BCQ) and show that performance similar to DQN can be achieved with only a fraction of the data and without the need for exploration. This maximizes sample efficiency and minimizes risk in the deployment of a new algorithm to commercial networks. We provide the entire resource of the project, including code and data, at the following link: https://github.com/Heasung-Kim/safe-rl-deployment-for-5g.
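The key mechanism of discrete BCQ can be shown compactly: Q-learning is restricted to actions that a behaviour-cloning model considers plausible given the logged data, which prevents the agent from exploiting Q-value errors on actions unseen in the batch. A minimal sketch follows (assumed network interfaces; not the repository's code).

```python
# Hedged sketch of the discrete BCQ action filter.
import torch

def bcq_select(q_net, bc_net, state, tau=0.3):
    q = q_net(state)           # (batch, n_actions) Q-values
    p = bc_net(state).exp()    # behaviour-cloning action probabilities
    # Mask actions whose relative propensity in the data falls below tau.
    mask = p / p.max(dim=1, keepdim=True).values >= tau
    return q.masked_fill(~mask, float("-inf")).argmax(dim=1)
```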
Every Parameter Matters: Ensuring the Convergence of Federated Learning with Dynamic Heterogeneous Models Reduction
Authors: Hanhan Zhou, Tian Lan, Guru Venkataramani, Wenbo Ding
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Cross-device Federated Learning (FL) faces significant challenges where low-end clients that could potentially make unique contributions are excluded from training large models due to their resource bottlenecks. Recent research efforts have focused on model-heterogeneous FL, extracting reduced-size models from the global model and applying them to local clients accordingly. Despite the empirical success, general theoretical guarantees of convergence for this method remain an open question. In this paper, we present a unifying framework for heterogeneous FL algorithms with online model extraction and provide a general convergence analysis. In particular, we prove that under certain sufficient conditions, and for both IID and non-IID data, these algorithms converge to a stationary point of standard FL for general smooth cost functions. Moreover, we illuminate two key factors impacting convergence: model-extraction noise and the minimum coverage index, advocating a joint design of local model extraction for efficient heterogeneous FL.
An Efficient Resilient MPC Scheme via Constraint Tightening against Cyberattacks: Application to Vehicle Cruise Control
Authors: Milad Farsi, Shuhao Bian, Nasser L. Azad, Xiaobing Shi, Andrew Walenstein
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract
We propose a novel framework for designing a resilient Model Predictive Control (MPC) scheme for uncertain linear systems under cyberattack. Assuming a periodic attack scenario, we model the system under a Denial of Service (DoS) attack, together with measurement noise, as an uncertain linear system with parametric and additive uncertainty. To detect anomalies, we employ a Kalman filter-based approach. Then, through our observations of the intensity of the launched attack, we determine a range of possible values for the system matrices and establish bounds on the additive uncertainty for the equivalent uncertain system. Leveraging a recent constraint-tightening robust MPC method, we present an optimization-based resilient algorithm. Accordingly, we compute the uncertainty bounds and corresponding constraints offline for various attack magnitudes; this data can then be used efficiently in the online MPC computations. We demonstrate the effectiveness of the developed framework on the Adaptive Cruise Control (ACC) problem.
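To make the constraint-tightening idea concrete, here is a hedged sketch of a single MPC step in which the state constraints are shrunk by offline-computed margins that grow with the assumed attack magnitude. The dynamics, costs, and margins are illustrative placeholders, not the paper's formulation.

```python
# Hedged sketch: MPC step with attack-dependent constraint tightening.
import cvxpy as cp
import numpy as np

def resilient_mpc_step(A, B, x0, N, x_max, u_max, margins):
    """margins[k]: offline-computed tightening for step k; larger margins
    correspond to stronger assumed DoS/noise uncertainty (illustrative)."""
    n, m = B.shape
    x = cp.Variable((n, N + 1))
    u = cp.Variable((m, N))
    cost, cons = 0, [x[:, 0] == x0]
    for k in range(N):
        cost += cp.sum_squares(x[:, k]) + 0.1 * cp.sum_squares(u[:, k])
        cons += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
                 cp.norm(x[:, k + 1], "inf") <= x_max - margins[k],
                 cp.norm(u[:, k], "inf") <= u_max]
    cp.Problem(cp.Minimize(cost), cons).solve()
    return u.value[:, 0]  # apply the first input, then re-solve online
```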
ELDEN: Exploration via Local Dependencies
Authors: Jiaheng Hu, Zizhao Wang, Peter Stone, Roberto Martin-Martin
Abstract
Tasks with large state spaces and sparse rewards present a longstanding challenge to reinforcement learning. In these tasks, an agent needs to explore the state space efficiently until it finds a reward. To deal with this problem, the community has proposed to augment the reward function with an intrinsic reward, a bonus signal that encourages the agent to visit interesting states. In this work, we propose a new way of defining interesting states for environments with factored state spaces and complex chained dependencies, where an agent's actions may change the value of one entity that, in turn, may affect the value of another entity. Our insight is that, in these environments, interesting states for exploration are states where the agent is uncertain whether (as opposed to how) entities such as the agent or objects have some influence on each other. We present ELDEN, Exploration via Local DepENdencies, a novel intrinsic reward that encourages the discovery of new interactions between entities. ELDEN utilizes a novel scheme -- the partial derivative of the learned dynamics -- to model the local dependencies between entities accurately and computationally efficiently. The uncertainty of the predicted dependencies is then used as an intrinsic reward to encourage exploration toward new interactions. We evaluate the performance of ELDEN on four different domains with complex dependencies, ranging from 2D grid worlds to 3D robotic tasks. In all domains, ELDEN correctly identifies local dependencies and learns successful policies, significantly outperforming previous state-of-the-art exploration methods.
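The partial-derivative dependency test admits a compact sketch: differentiate a learned dynamics model with respect to the current state to see which entities locally influence which predictions, and reward disagreement about that dependency graph across an ensemble. The interfaces below are assumptions, not ELDEN's actual code.

```python
# Hedged sketch: local dependencies via Jacobians of learned dynamics,
# with ensemble disagreement as the intrinsic exploration reward.
import torch

def local_dependencies(dynamics, state, action, eps=1e-3):
    """dynamics: (state, action) -> next-state prediction; state: (n,)."""
    jac = torch.autograd.functional.jacobian(
        lambda s: dynamics(s, action), state)   # (out_dim, in_dim)
    return jac.abs() > eps  # entry (i, j): entity j locally affects entity i

def intrinsic_reward(ensemble, state, action):
    deps = torch.stack([local_dependencies(m, state, action).float()
                        for m in ensemble])
    return deps.var(dim=0).mean()  # high when models disagree on dependencies
```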
Heterophily-Based Graph Neural Network for Imbalanced Classification
Abstract
Graph neural networks (GNNs) have shown promise in addressing graph-related problems, including node classification. However, conventional GNNs assume an even distribution of data across classes, which is often not the case in real-world scenarios, where certain classes are severely underrepresented. This leads to suboptimal performance of standard GNNs on imbalanced graphs. In this paper, we introduce a unique approach that tackles imbalanced classification on graphs by considering graph heterophily. We investigate the intricate relationship between class imbalance and graph heterophily, revealing that minority classes not only exhibit a scarcity of samples but also manifest lower levels of homophily, facilitating the propagation of erroneous information among neighboring nodes. Drawing upon this insight, we propose an efficient method, called Fast Im-GBK, which integrates an imbalance classification strategy with heterophily-aware GNNs to effectively address the class imbalance problem while significantly reducing training time. Our experiments on real-world graphs demonstrate our model's superiority in classification performance and efficiency for node classification tasks compared to existing baselines.
A Zero-Shot Language Agent for Computer Control with Structured Reflection
Authors: Tao Li, Gang Li, Zhiwei Deng, Bryan Wang, Yang Li
Subjects: Computation and Language (cs.CL); Systems and Control (eess.SY)
Abstract
Large language models (LLMs) have shown increasing capacity for planning and executing a high-level goal in a live computer environment (e.g., MiniWoB++). To perform a task, recent works often require a model to learn from trace examples of the task via either supervised learning or few/many-shot prompting. Without these trace examples, it remains a challenge for an agent to autonomously learn and improve its control of a computer, which limits its ability to perform new tasks. We approach this problem with a zero-shot agent that requires no given expert traces. Our agent plans executable actions in a partially observed environment, and iteratively progresses a task by identifying and learning from its mistakes via self-reflection and structured thought management. On the easy tasks of MiniWoB++, we show that our zero-shot agent often outperforms recent state-of-the-art methods, with more efficient reasoning. For more complex tasks, our reflective agent performs on par with prior best models, even though previous works had the advantage of accessing expert traces or additional screen information.
AcTExplore: Active Tactile Exploration on Unknown Objects
Authors: Amir-Hossein Shahidzadeh, Seong Jong Yoo, Pavan Mantripragada, Chahat Deep Singh, Cornelia Fermüller, Yiannis Aloimonos
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Tactile exploration plays a crucial role in understanding object structures for fundamental robotics tasks such as grasping and manipulation. However, efficiently exploring such objects using tactile sensors is challenging, primarily due to the large-scale unknown environments and the limited sensing coverage of these sensors. To this end, we present AcTExplore, an active tactile exploration method driven by reinforcement learning that automatically explores object surfaces at scale in a limited number of steps. Through sufficient exploration, our algorithm incrementally collects tactile data and reconstructs 3D shapes of the objects, which can serve as a representation for higher-level downstream tasks. Our method achieves an average of 95.97% IoU coverage on unseen YCB objects while being trained only on primitive shapes.
Search-Adaptor: Text Embedding Customization for Information Retrieval
Authors: Jinsung Yoon, Sercan O Arik, Yanfei Chen, Tomas Pfister
Abstract
Text embeddings extracted by pre-trained Large Language Models (LLMs) have significant potential to improve information retrieval and search. Beyond the zero-shot setup in which they are conventionally used, taking advantage of the information from relevant query-corpus paired data can further boost LLM capabilities. In this paper, we propose a novel method, Search-Adaptor, for customizing LLMs for information retrieval in an efficient and robust way. Search-Adaptor modifies the original text embedding generated by pre-trained LLMs, and can be integrated with any LLM, including those only available via APIs. On multiple real-world English and multilingual retrieval datasets, we show consistent and significant performance benefits for Search-Adaptor -- e.g., improvements of more than 5.2% over the Google Embedding APIs in nDCG@10, averaged over 13 BEIR datasets.
Constrained Bayesian Optimization with Adaptive Active Learning of Unknown Constraints
Abstract
Optimizing objectives under constraints, where both the objectives and constraints are black box functions, is a common scenario in real-world applications such as scientific experimental design, design of medical therapies, and industrial process optimization. One popular approach to handling these complex scenarios is Bayesian Optimization (BO). In terms of theoretical behavior, BO is relatively well understood in the unconstrained setting, where its principles have been well explored and validated. However, when it comes to constrained Bayesian optimization (CBO), the existing framework often relies on heuristics or approximations without the same level of theoretical guarantees. In this paper, we delve into the theoretical and practical aspects of constrained Bayesian optimization, where the objective and constraints can be independently evaluated and are subject to noise. By recognizing that both the objective and constraints can help identify high-confidence regions of interest (ROI), we propose an efficient CBO framework that intersects the ROIs identified from each aspect to determine the general ROI. The ROI, coupled with a novel acquisition function that adaptively balances the optimization of the objective and the identification of feasible regions, enables us to derive rigorous theoretical justifications for its performance. We showcase the efficiency and robustness of our proposed CBO framework through empirical evidence and discuss the fundamental challenge of deriving practical regret bounds for CBO algorithms.
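One way to picture the ROI construction: with Gaussian process surrogates for the objective and each constraint, keep only candidate points that are plausibly feasible and plausibly optimal under confidence bounds, then intersect. The sketch below (minimization, a single constraint c(x) <= 0, scikit-learn GPs) is illustrative only, not the paper's algorithm.

```python
# Hedged sketch: region of interest (ROI) as the intersection of
# objective and constraint confidence sets.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def roi_mask(gp_obj, gp_con, X_cand, beta=2.0):
    mu_f, sd_f = gp_obj.predict(X_cand, return_std=True)
    mu_c, sd_c = gp_con.predict(X_cand, return_std=True)
    # Plausibly feasible: the constraint's lower confidence bound <= 0.
    feasible = mu_c - beta * sd_c <= 0.0
    # Plausibly optimal: LCB not dominated by the best feasible UCB.
    best = np.min((mu_f + beta * sd_f)[feasible]) if feasible.any() else np.inf
    return feasible & (mu_f - beta * sd_f <= best)
```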
Tokenizer Choice For LLM Training: Negligible or Crucial?
Authors: Mehdi Ali, Michael Fromm, Klaudia Thellmann, Richard Rutmann, Max Lübbering, Johannes Leveling, Katrin Klug, Jan Ebert, Niclas Doll, Jasper Schulze Buschhoff, Charvi Jain, Alexander Arno Weber, Lena Jurkschat, Hammam Abdelwahab, Chelsea John, Pedro Ortiz Suarez, Malte Ostendorff, Samuel Weinbach, Rafet Sifa, Stefan Kesselheim, Nicolas Flores-Herr
Abstract
The recent success of LLMs has been predominantly driven by curating the training dataset composition, scaling model architectures and dataset sizes, and advancing pretraining objectives, leaving tokenizer influence as a blind spot. Shedding light on this underexplored area, we conduct a comprehensive study on the influence of tokenizer choice on LLM downstream performance by training 24 mono- and multilingual LLMs at a 2.6B parameter scale, ablating different tokenizer algorithms and parameterizations. Our studies highlight that the tokenizer choice can significantly impact the model's downstream performance and training and inference costs. In particular, we find that the common tokenizer evaluation metrics, fertility and parity, are not always predictive of model downstream performance, rendering these metrics a questionable choice for tokenizer evaluation. Furthermore, we show that multilingual tokenizers trained on the five most frequent European languages require vocabulary size increases by a factor of three in comparison to English. While English-only tokenizers have been applied to the training of multilingual LLMs in the past, we find that this approach results in severe downstream performance degradation and additional training costs of up to 68%, due to an inefficient tokenization vocabulary.
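For reference, the fertility metric mentioned above is simple to compute: it is the average number of tokens a tokenizer produces per word, with values near 1.0 indicating little subword splitting. A hedged one-function sketch, assuming a HuggingFace-style tokenizer and whitespace word splitting:

```python
# Hedged sketch: tokenizer "fertility" = average tokens per word.
def fertility(tok, texts):
    words = sum(len(t.split()) for t in texts)
    tokens = sum(len(tok.encode(t, add_special_tokens=False)) for t in texts)
    return tokens / words  # ~1.0 means few subword splits per word
```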
Stabilizing Subject Transfer in EEG Classification with Divergence Estimation
Authors: Niklas Smedemark-Margulies, Ye Wang, Toshiaki Koike-Akino, Jing Liu, Kieran Parsons, Yunus Bicer, Deniz Erdogmus
Abstract
Classification models for electroencephalogram (EEG) data show a large decrease in performance when evaluated on unseen test subjects. We reduce this performance decrease using new regularization techniques during model training. We propose several graphical models to describe an EEG classification task. From each model, we identify statistical relationships that should hold true in an idealized training scenario (with infinite data and a globally-optimal model) but that may not hold in practice. We design regularization penalties to enforce these relationships in two stages. First, we identify suitable proxy quantities (divergences such as Mutual Information and Wasserstein-1) that can be used to measure statistical independence and dependence relationships. Second, we provide algorithms to efficiently estimate these quantities during training using secondary neural network models. We conduct extensive computational experiments using a large benchmark EEG dataset, comparing our proposed techniques with a baseline method that uses an adversarial classifier. We find our proposed methods significantly increase balanced accuracy on test subjects and decrease overfitting. The proposed methods exhibit a larger benefit over a greater range of hyperparameters than the baseline method, with only a small computational cost at training time. These benefits are largest when used for a fixed training period, though there is still a significant benefit for a subset of hyperparameters when our techniques are used in conjunction with early stopping regularization.
Implicit Shape and Appearance Priors for Few-Shot Full Head Reconstruction
Authors: Pol Caselles, Eduard Ramon, Jaime Garcia, Gil Triginer, Francesc Moreno-Noguer
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recent advancements in learning techniques that employ coordinate-based neural representations have yielded remarkable results in multi-view 3D reconstruction tasks. However, these approaches often require a substantial number of input views (typically several tens) and computationally intensive optimization procedures to achieve their effectiveness. In this paper, we address these limitations specifically for the problem of few-shot full 3D head reconstruction. We accomplish this by incorporating a probabilistic shape and appearance prior into coordinate-based representations, enabling faster convergence and improved generalization when working with only a few input images (even as few as a single image). During testing, we leverage this prior to guide the fitting process of a signed distance function using a differentiable renderer. By incorporating the statistical prior alongside parallelizable ray tracing and dynamic caching strategies, we achieve an efficient and accurate approach to few-shot full 3D head reconstruction. Moreover, we extend the H3DS dataset, which now comprises 60 high-resolution 3D full head scans and their corresponding posed images and masks, used here for evaluation purposes. By leveraging this dataset, we demonstrate the remarkable capabilities of our approach in achieving state-of-the-art results in geometry reconstruction while being an order of magnitude faster than previous approaches.
DeltaSpace: A Semantic-aligned Feature Space for Flexible Text-guided Image Editing
Authors: Yueming Lyu, Kang Zhao, Bo Peng, Yue Jiang, Yingya Zhang, Jing Dong
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Text-guided image editing faces significant challenges in training and inference flexibility. Much prior work collects large amounts of annotated image-text pairs to train text-conditioned generative models from scratch, which is expensive and inefficient. Subsequently, some approaches that leverage pre-trained vision-language models have been put forward to avoid data collection, but they are also limited by either per-text-prompt optimization or inference-time hyper-parameter tuning. To address these issues, we investigate and identify a specific space, referred to as CLIP DeltaSpace, where the CLIP visual feature difference of two images is semantically aligned with the CLIP textual feature difference of their corresponding text descriptions. Based on DeltaSpace, we propose a novel framework called DeltaEdit, which maps the CLIP visual feature differences to the latent space directions of a generative model during the training phase, and predicts the latent space directions from the CLIP textual feature differences during the inference phase. This design endows DeltaEdit with two advantages: (1) text-free training; (2) generalization to various text prompts for zero-shot inference. Extensive experiments validate the effectiveness and versatility of DeltaEdit with different generative models, including both GAN models and diffusion models, in achieving flexible text-guided image editing. Code is available at https://github.com/Yueming6568/DeltaEdit.
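The training/inference asymmetry of DeltaEdit can be sketched in a few lines: train a mapper on CLIP image-feature deltas of image pairs (no text needed), then drive it at inference with CLIP text-feature deltas. All names below are illustrative placeholders, not the released code.

```python
# Hedged sketch of the DeltaSpace idea: text-free training, text-driven editing.
import torch
import torch.nn.functional as F

def train_step(mapper, clip_img, z1, z2, img1, img2):
    """z1, z2: generator latents of an image pair; clip_img: CLIP image encoder."""
    delta_clip = F.normalize(clip_img(img2) - clip_img(img1), dim=-1)
    pred_dir = mapper(z1, delta_clip)            # predicted latent direction
    return ((pred_dir - (z2 - z1)) ** 2).mean()  # supervised without any text

def edit(mapper, clip_txt, z, src_prompt, tgt_prompt):
    delta_t = F.normalize(clip_txt(tgt_prompt) - clip_txt(src_prompt), dim=-1)
    return z + mapper(z, delta_t)                # zero-shot text-guided edit
```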
Price of Stability in Quality-Aware Federated Learning
Authors: Yizhou Yan, Xinyu Tang, Chao Huang, Ming Tang
Abstract
Federated Learning (FL) is a distributed machine learning scheme that enables clients to train a shared global model without exchanging local data. The presence of label noise can severely degrade FL performance, and some existing studies have focused on algorithm design for label denoising. However, they ignored the important issue that clients may not apply costly label denoising strategies, since clients are self-interested and have heterogeneous valuations of the FL performance. To fill this gap, we model the clients' interactions as a novel label denoising game and characterize its equilibrium. We also analyze the price of stability, which quantifies the difference in system performance (e.g., global model accuracy, social welfare) between the equilibrium outcome and the socially optimal solution. We prove that the equilibrium outcome always leads to a lower global model accuracy than the socially optimal solution does. We further design an efficient algorithm to compute the socially optimal solution. Numerical experiments on the MNIST dataset show that the price of stability increases as the clients' data become noisier, calling for an effective incentive mechanism.
Analysis of Weather and Time Features in Machine Learning-aided ERCOT Load Forecasting
Authors: Jonathan Yang, Mingjian Tuo, Jin Lu, Xingpeng Li
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract
Accurate load forecasting is critical for efficient and reliable operations of the electric power system. A large part of electricity consumption is affected by weather conditions, making weather information an important determinant of electricity usage. Personal appliances and industrial equipment also contribute significantly to electricity demand with temporal patterns, making time a useful factor to consider in load forecasting. This work develops several machine learning (ML) models that take various time and weather information as part of the input features to predict the short-term system-wide total load. Ablation studies were also performed to investigate and compare the impacts of different weather factors on the prediction accuracy. Actual load and historical weather data for the same region were processed and then used to train the ML models. It is interesting to observe that using all available features, each of which may be correlated with the load, is unlikely to achieve the best forecasting performance; redundant features may even decrease the inference capabilities of ML models. This indicates the importance of feature selection for ML models. Overall, case studies demonstrated the effectiveness of ML models trained with different weather and time input features for ERCOT load forecasting.
A Comparative Analysis of Task-Agnostic Distillation Methods for Compressing Transformer Language Models
Abstract
Large language models have become a vital component in modern NLP, achieving state-of-the-art performance in a variety of tasks. However, they are often inefficient for real-world deployment due to their expensive inference costs. Knowledge distillation is a promising technique to improve their efficiency while retaining most of their effectiveness. In this paper, we reproduce, compare, and analyze several representative methods for task-agnostic (general-purpose) distillation of Transformer language models. Our study covers Output Distribution (OD) transfer, Hidden State (HS) transfer with various layer mapping strategies, and Multi-Head Attention (MHA) transfer based on MiniLMv2. Through extensive experiments, we study the effectiveness of each method for various student architectures in both monolingual (English) and multilingual settings. Overall, we show that MHA transfer based on MiniLMv2 is generally the best option for distillation and explain the potential reasons behind its success. Moreover, we show that HS transfer remains a competitive baseline, especially under a sophisticated layer mapping strategy, while OD transfer consistently lags behind other approaches. Findings from this study helped us deploy efficient yet effective student models for latency-critical applications.
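The MiniLMv2-style MHA transfer referenced above reduces to matching attention relation distributions between teacher and student, e.g. via a KL term over query-key attention maps. A hedged sketch follows (tensor shapes assumed; the actual method also transfers key-key and value-value relations):

```python
# Hedged sketch: MiniLMv2-style self-attention relation transfer loss.
import torch
import torch.nn.functional as F

def relation_kl(q_t, k_t, q_s, k_s):
    """q_*, k_*: (batch, heads, seq, head_dim), with teacher and student
    relations split into the same number of relation heads."""
    def rel(q, k):
        scores = q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5
        return F.log_softmax(scores, dim=-1)
    return F.kl_div(rel(q_s, k_s), rel(q_t, k_t),
                    log_target=True, reduction="batchmean")
```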
Revisiting Multi-modal 3D Semantic Segmentation in Real-world Autonomous Driving
Authors: Feng Jiang, Chaoping Tu, Gang Zhang, Jun Li, Hanqing Huang, Junyu Lin, Di Feng, Jian Pu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
LiDAR and camera are two critical sensors for multi-modal 3D semantic segmentation and must be fused efficiently and robustly to ensure safety in various real-world scenarios. However, existing multi-modal methods face two key challenges: 1) difficulty with efficient deployment and real-time execution; and 2) drastic performance degradation under weak calibration between LiDAR and cameras. To address these challenges, we propose CPGNet-LCF, a new multi-modal fusion framework extending the LiDAR-only CPGNet. CPGNet-LCF solves the first challenge by inheriting the easy deployment and real-time capabilities of CPGNet. For the second challenge, we introduce a novel weak calibration knowledge distillation strategy during training to improve robustness against weak calibration. CPGNet-LCF achieves state-of-the-art performance on the nuScenes and SemanticKITTI benchmarks. Remarkably, it can be easily deployed to run in 20 ms per frame on a single Tesla V100 GPU using TensorRT FP16 mode. Furthermore, we benchmark performance over four weak calibration levels, demonstrating the robustness of our proposed approach.
HybridChain: Fast, Accurate, and Secure Transaction Processing with Distributed Learning
Authors: Amirhossein Taherpour, Xiaodong Wang
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
In order to fully unlock the transformative power of distributed ledgers and blockchains, it is crucial to develop innovative consensus algorithms that can overcome the obstacles of security, scalability, and interoperability, which currently hinder their widespread adoption. This paper introduces HybridChain, which combines the advantages of a sharded blockchain and a DAG distributed ledger, together with a consensus algorithm that leverages decentralized learning. Our approach involves validators exchanging perceptions as votes to assess potential conflicts between transactions and the witness set, representing input transactions in the UTXO model. These perceptions collectively contribute to an intermediate belief regarding the validity of transactions. By integrating their beliefs with those of other validators, localized decisions are made to determine validity. Ultimately, a final consensus is achieved through a majority vote, ensuring precise and efficient validation of transactions. Our proposed approach is compared to the existing DAG-based scheme IOTA and the sharded blockchain Omniledger through extensive simulations. The results show that IOTA has high throughput and low latency but sacrifices accuracy and is vulnerable to orphanage attacks, especially at low transaction rates. Omniledger achieves stable accuracy by increasing shards, but at the cost of increased latency. In contrast, the proposed HybridChain exhibits fast, accurate, and secure transaction processing, as well as excellent scalability.
Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Authors: Abhishek Padalkar, Acorn Pooley, Ajinkya Jain, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anikait Singh, Anthony Brohan, Antonin Raffin, Ayzaan Wahid, Ben Burgess-Limerick, Beomjoon Kim, Bernhard Schölkopf, Brian Ichter, Cewu Lu, Charles Xu, Chelsea Finn, Chenfeng Xu, Cheng Chi, Chenguang Huang, Christine Chan, Chuer Pan, Chuyuan Fu, Coline Devin, Danny Driess, Deepak Pathak, Dhruv Shah, Dieter Büchler, Dmitry Kalashnikov, Dorsa Sadigh, Edward Johns, Federico Ceola, Fei Xia, Freek Stulp, Gaoyue Zhou, Gaurav S. Sukhatme, Gautam Salhotra, Ge Yan, Giulio Schiavi, Gregory Kahn, Hao Su, Hao-Shu Fang, Haochen Shi, Heni Ben Amor, Henrik I Christensen, Hiroki Furuta, Homer Walke, Hongjie Fang, Igor Mordatch, Ilija Radosavovic, Isabel Leal, Jacky Liang, Jad Abou-Chakra, et al. (121 additional authors not shown)
Abstract
Large, high-capacity models trained on diverse datasets have shown remarkable success in efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots, collected through a collaboration between 21 institutions and demonstrating 527 skills (160,266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms.
Adaptivity and Modularity for Efficient Generalization Over Task Complexity
Abstract
Can transformers generalize efficiently on problems that require dealing with examples of different levels of difficulty? We introduce a new task tailored to assess generalization over different complexities and present results indicating that standard transformers face challenges in solving these tasks. These tasks are variations of pointer value retrieval, previously introduced by Zhang et al. (2021). We investigate how the use of a mechanism for adaptive and modular computation in transformers facilitates the learning of tasks that demand generalization over the number of sequential computation steps (i.e., the depth of the computation graph). Based on our observations, we propose a transformer-based architecture called Hyper-UT, which combines dynamic function generation from hypernetworks with adaptive depth from Universal Transformers. This model demonstrates higher accuracy and a fairer allocation of computational resources when generalizing to higher numbers of computation steps. We conclude that mechanisms for adaptive depth and modularity complement each other in improving efficient generalization with respect to example complexity. Additionally, to emphasize the broad applicability of our findings, we illustrate that in a standard image recognition task, Hyper-UT's performance matches that of a ViT model but with considerably reduced computational demands (achieving over 70% average savings by effectively using fewer layers).
Gesture Recognition for FMCW Radar on the Edge
Authors: Maximilian Strobel, Stephan Schoenfeldt, Jonas Daugalas
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Abstract
This paper introduces a lightweight gesture recognition system based on 60 GHz frequency-modulated continuous-wave (FMCW) radar. We show that gestures can be characterized efficiently by a set of five features, and propose a slim radar processing algorithm to extract these features. In contrast to previous approaches, we avoid heavy 2D processing, i.e., range-Doppler imaging, and instead perform early target detection; this allows us to port the system to fully embedded platforms with tight constraints on memory, compute, and power consumption. A recurrent neural network (RNN) based architecture exploits these features to jointly detect and classify five different gestures. The proposed system recognizes gestures with an F1 score of 98.4% on our hold-out test dataset, runs on an Arm Cortex-M4 microcontroller requiring less than 280 kB of flash memory and 120 kB of RAM, and consumes 75 mW of power.
Retrieval-Generation Alignment for End-to-End Task-Oriented Dialogue System
Abstract
Developing an efficient retriever to retrieve knowledge from a large-scale knowledge base (KB) is critical for task-oriented dialogue systems to effectively handle localized and specialized tasks. However, widely used generative models such as T5 and ChatGPT often struggle to distinguish subtle differences among the retrieved KB records when generating responses, resulting in suboptimal quality of the generated responses. In this paper, we propose applying maximal marginal likelihood to train a perceptive retriever by utilizing signals from response generation for supervision. In addition, our approach goes beyond considering solely retrieved entities and incorporates various meta knowledge to guide the generator, thus improving the utilization of knowledge. We evaluate our approach on three task-oriented dialogue datasets using T5 and ChatGPT as the backbone models. The results demonstrate that when combined with meta knowledge, the response generator can effectively leverage high-quality knowledge records from the retriever and enhance the quality of generated responses. The code and models of this paper are available at https://github.com/shenwzh3/MK-TOD.
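The maximal-marginal-likelihood objective mentioned above has a short form: treat the retrieved record as a latent variable and maximize the marginal likelihood of the response over the top-K records, so the generator's signal flows back into the retriever. A hedged sketch with assumed inputs:

```python
# Hedged sketch: maximal marginal likelihood over top-K retrieved records.
import torch

def mml_loss(retriever_scores, gen_logprobs):
    """retriever_scores: (batch, K) scores of top-K KB records;
    gen_logprobs: (batch, K) log p(response | dialogue, record_k)."""
    log_prior = torch.log_softmax(retriever_scores, dim=-1)
    # log p(y|x) = log sum_k p(z_k|x) p(y|x, z_k); the record z is marginalized.
    return -torch.logsumexp(log_prior + gen_logprobs, dim=-1).mean()
```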
Convexification for a Coefficient Inverse Problem of Mean Field Games
Authors: Michael V. Klibanov, Jingzhi Li, Zhipeng Yang
Abstract
The globally convergent convexification numerical method is constructed for a Coefficient Inverse Problem for the Mean Field Games System. A coefficient characterizing the global interaction term is recovered from single measurement data. In particular, a new Carleman estimate for the Volterra integral operator is proven, and it is stronger than the previously known one. Numerical results demonstrate accurate reconstructions from noisy data.
Extending Multi-modal Contrastive Representations
Authors: Zehan Wang, Ziang Zhang, Luping Liu, Yang Zhao, Haifeng Huang, Tao Jin, Zhou Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Multi-modal contrastive representation (MCR) of more than three modalities is critical in multi-modal learning. Although recent methods showcase impressive achievements, the high dependence on large-scale, high-quality paired data and the expensive training costs limit their further development. Inspired by the recent C-MCR, this paper proposes Extending Multimodal Contrastive Representation (Ex-MCR), a training-efficient and paired-data-free method to flexibly learn a unified contrastive representation space for more than three modalities by integrating the knowledge of existing MCR spaces. Specifically, Ex-MCR aligns multiple existing MCRs into the same base MCR, which can effectively preserve the original semantic alignment of the base MCR. Besides, we comprehensively enhance the entire learning pipeline for aligning MCR spaces from the perspectives of training data, architecture, and learning objectives. With the preserved original modality alignment and the enhanced space alignment, Ex-MCR shows superior representation learning performance and excellent modality extensibility. To demonstrate the effectiveness of Ex-MCR, we align the MCR spaces of CLAP (audio-text) and ULIP (3D-vision) into CLIP (vision-text), leveraging the overlapping text and image modalities, respectively. Remarkably, without using any paired data, Ex-MCR learns a 3D-image-text-audio unified contrastive representation, and it achieves state-of-the-art performance on audio-visual, 3D-image, audio-text, and visual-text retrieval, as well as 3D object classification tasks. More importantly, extensive qualitative results further demonstrate the emergent semantic alignment between the extended modalities (e.g., audio and 3D), which highlights the great potential of modality extensibility.
InstructTODS: Large Language Models for End-to-End Task-Oriented Dialogue Systems
Abstract
Large language models (LLMs) have been used for diverse tasks in natural language processing (NLP), yet remain under-explored for task-oriented dialogue systems (TODS), especially for end-to-end TODS. We present InstructTODS, a novel off-the-shelf framework for zero-shot end-to-end task-oriented dialogue systems that can adapt to diverse domains without fine-tuning. By leveraging LLMs, InstructTODS generates a proxy belief state that seamlessly translates user intentions into dynamic queries for efficient interaction with any KB. Our extensive experiments demonstrate that InstructTODS achieves comparable performance to fully fine-tuned TODS in guiding dialogues to successful completion without prior knowledge or task-specific data. Furthermore, a rigorous human evaluation of end-to-end TODS shows that InstructTODS produces dialogue responses that notably outperform both the gold responses and the state-of-the-art TODS in terms of helpfulness, informativeness, and humanness. Moreover, the effectiveness of LLMs in TODS is further supported by our comprehensive evaluations on TODS subtasks: dialogue state tracking, intent classification, and response generation. Code and implementations can be found at https://github.com/WillyHC22/InstructTODS/
A Hybrid Transfer Learning Assisted Decision Support System for Accurate Prediction of Alzheimer Disease
Authors: Mahin Khan Mahadi, Abdullah Abdullah, Jamal Uddin, Asif Newaz
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Alzheimer's disease (AD) is the most common long-term illness in elderly people. In recent years, deep learning has become popular in medical imaging and has achieved considerable success, becoming one of the most effective ways to analyze medical images. For AD detection, deep neural models are more accurate and effective than general machine learning. Our research contributes to the development of a more comprehensive understanding and detection of the disease by identifying four distinct classes that are predictive of AD with a high weighted accuracy of 98.91%. In this study, a unique strategy is proposed to improve accuracy on the imbalanced-dataset classification problem via the combination of ensemble averaging models and five different transfer learning models. The EfficientNetB0+ResNet152 (effnet+res152) and InceptionV3+EfficientNetB0+ResNet50 (incep+effnet+res50) models have been fine-tuned and reached the highest weighted accuracy for multi-class AD stage classification.
EHI: End-to-end Learning of Hierarchical Index for Efficient Dense Retrieval
Abstract
Dense embedding-based retrieval is now the industry standard for semantic search and ranking problems, like obtaining relevant web documents for a given query. Such techniques use a two-stage process: (a) contrastive learning to train a dual encoder to embed both the query and documents and (b) approximate nearest neighbor search (ANNS) for finding similar documents for a given query. These two stages are disjoint; the learned embeddings might be ill-suited for the ANNS method and vice-versa, leading to suboptimal performance. In this work, we propose End-to-end Hierarchical Indexing -- EHI -- that jointly learns both the embeddings and the ANNS structure to optimize retrieval performance. EHI uses a standard dual encoder model for embedding queries and documents while learning an inverted file index (IVF) style tree structure for efficient ANNS. To ensure stable and efficient learning of the discrete tree-based ANNS structure, EHI introduces the notion of dense path embedding that captures the position of a query/document in the tree. We demonstrate the effectiveness of EHI on several benchmarks, including the de-facto industry-standard MS MARCO (Dev set and TREC DL19) datasets. For example, with the same compute budget, EHI outperforms state-of-the-art (SOTA) methods by 0.6% (MRR@10) on the MS MARCO dev set and by 4.2% (nDCG@10) on the TREC DL19 benchmark.
Scalarization for Multi-Task and Multi-Domain Learning at Scale
Abstract
Training a single model on multiple input domains and/or output tasks allows compressing information from multiple sources into a unified backbone, hence improving model efficiency. It also enables potential positive knowledge transfer across tasks/domains, leading to improved accuracy and data-efficient training. However, optimizing such networks is a challenge, in particular due to discrepancies between the different tasks or domains: despite several hypotheses and solutions proposed over the years, recent work has shown that uniform scalarization training, i.e., simply minimizing the average of the task losses, yields on-par performance with more costly SotA optimization methods. This raises the issue of how well we understand the training dynamics of multi-task and multi-domain networks. In this work, we first devise a large-scale unified analysis of multi-domain and multi-task learning to better understand the dynamics of scalarization across varied task/domain combinations and model sizes. Following these insights, we then propose to leverage population-based training to efficiently search for the optimal scalarization weights when dealing with a large number of tasks or domains.
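Uniform scalarization itself is nearly a one-liner, which is what makes the finding notable; the weight vector is exactly what population-based training would later search over. A hedged sketch:

```python
# Hedged sketch: scalarized multi-task loss; uniform weights by default.
import torch

def scalarized_loss(task_losses, weights=None):
    losses = torch.stack(task_losses)   # one scalar loss per task/domain
    if weights is None:                 # uniform scalarization
        weights = torch.full_like(losses, 1.0 / len(losses))
    return (weights * losses).sum()
```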
Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs
Authors: Yuxin Zhang, Lirui Zhao, Mingbao Lin, Yunyun Sun, Yiwu Yao, Xingjia Han, Jared Tanner, Shiwei Liu, Rongrong Ji
Abstract
Ever-larger large language models (LLMs), though opening a potential path toward the upcoming artificial general intelligence, sadly drop a daunting obstacle on the way towards their on-device deployment. As one of the most well-established pre-LLM approaches to reducing model complexity, network pruning appears to lag behind in the era of LLMs, due mostly to the costly fine-tuning (or re-training) it requires given the massive volumes of model parameters and training data. To close this industry-academia gap, we introduce Dynamic Sparse No Training (DSnoT), a training-free fine-tuning approach that slightly updates sparse LLMs without expensive backpropagation or any weight updates. Inspired by Dynamic Sparse Training, DSnoT minimizes the reconstruction error between the dense and sparse LLMs by performing iterative weight pruning-and-growing on top of sparse LLMs. To accomplish this purpose, DSnoT particularly takes into account the anticipated reduction in reconstruction error for pruning and growing, as well as the variance w.r.t. different input data for growing each weight. This practice can be executed efficiently in linear time, since it obviates the need for backpropagation when fine-tuning LLMs. Extensive experiments on LLaMA-V1/V2, Vicuna, and OPT across various benchmarks demonstrate the effectiveness of DSnoT in enhancing the performance of sparse LLMs, especially at high sparsity levels. For instance, DSnoT is able to outperform the state-of-the-art Wanda by 26.79 perplexity at 70% sparsity with LLaMA-7B. Our paper offers fresh insights into how to fine-tune sparse LLMs in an efficient, training-free manner and opens new avenues for scaling the great potential of sparsity to LLMs. Code is available at https://github.com/zxyxmu/DSnoT.
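A rough picture of a training-free prune-and-grow pass: on a calibration batch, repeatedly swap a pruned weight in and a kept weight out whenever a first-order estimate says the swap reduces the layer's reconstruction error. The criterion below is an illustrative first-order stand-in, not DSnoT's exact pruning/growing rule.

```python
# Hedged sketch: training-free prune-and-grow on one linear layer.
import torch

def prune_and_grow(W, mask, X, iters=100):
    """W: dense weights (out, in); mask: 0/1 sparsity pattern (out, in);
    X: calibration activations (n, in). No backprop, no weight updates."""
    for _ in range(iters):
        err = X @ (W * mask).T - X @ W.T          # sparse-vs-dense error
        infl = (err.T @ X).abs() * W.abs()        # first-order influence
        grow = torch.where(mask == 0, infl, torch.zeros_like(infl))
        prune = torch.where(mask == 1, infl, torch.full_like(infl, float("inf")))
        g, p = grow.flatten().argmax(), prune.flatten().argmin()
        if grow.flatten()[g] <= prune.flatten()[p]:
            break                                  # no improving swap remains
        mask.view(-1)[g], mask.view(-1)[p] = 1.0, 0.0  # swap keeps sparsity fixed
    return mask
```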
Relation-aware Ensemble Learning for Knowledge Graph Embedding
Abstract
Knowledge graph (KG) embedding is a fundamental task in natural language processing, and various methods have been proposed to explore semantic patterns in distinctive ways. In this paper, we propose to learn an ensemble by leveraging existing methods in a relation-aware manner. However, exploring these semantics using a relation-aware ensemble leads to a much larger search space than general ensemble methods. To address this issue, we propose a divide-search-combine algorithm, RelEns-DSC, that searches the relation-wise ensemble weights independently. This algorithm has the same computation cost as general ensemble methods but much better performance. Experimental results on benchmark datasets demonstrate the effectiveness of the proposed method in efficiently searching relation-aware ensemble weights and achieving state-of-the-art embedding performance. The code is publicly available at https://github.com/LARS-research/RelEns.
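The divide-search-combine step is easy to picture: because weights are searched per relation, each relation reduces to a small independent search over model-combination weights on its own validation triples. A hedged grid-search sketch (scoring conventions assumed; the paper's search strategy may differ):

```python
# Hedged sketch: per-relation grid search of ensemble weights by MRR.
import numpy as np
from itertools import product

def search_relation_weights(val_scores, val_labels, grid=5):
    """val_scores[r]: (n_models, n_queries, n_candidates) base-model scores;
    val_labels[r]: (n_queries,) index of the true candidate."""
    weights = {}
    for r, scores in val_scores.items():
        best_w, best_mrr = None, -1.0
        for w in product(np.linspace(0, 1, grid), repeat=scores.shape[0]):
            if sum(w) == 0:
                continue
            combo = np.tensordot(np.array(w), scores, axes=1)  # (nq, nc)
            true = combo[np.arange(len(val_labels[r])), val_labels[r]]
            ranks = (combo > true[:, None]).sum(axis=1) + 1
            mrr = (1.0 / ranks).mean()
            if mrr > best_mrr:
                best_w, best_mrr = w, mrr
        weights[r] = best_w
    return weights
```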
Attacking The Assortativity Coefficient Under A Rewiring Strategy
Authors: Shuo Zou, Bo Zhou, Qi Xuan
Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
Abstract
Degree correlation is an important characteristic of networks and is usually quantified by the assortativity coefficient. However, concerns arise about how the assortativity coefficient of a network changes when the network suffers adversarial attacks. In this paper, we analyze the factors that affect the assortativity coefficient and study the optimization problem of maximizing or minimizing the assortativity coefficient $r$ in networks rewired with $k$ pairs of edges. We propose a greedy algorithm and formulate the optimization problem using integer programming to obtain the optimal solution for this problem. Through experiments, we demonstrate the reasonableness and effectiveness of our proposed algorithm. For example, after rewiring 10% of the edges in an ER network, the assortativity coefficient improved by 60%.
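A greedy variant of the attack is easy to state: among sampled double-edge swaps (which preserve every node's degree), apply the one that moves $r$ furthest in the desired direction. A hedged networkx sketch, not necessarily the paper's exact algorithm:

```python
# Hedged sketch: greedy degree-preserving rewiring to shift assortativity.
import networkx as nx
import random

def greedy_rewire(G, k, maximize=True, candidates=50):
    for _ in range(k):
        best_swap, best_r = None, nx.degree_assortativity_coefficient(G)
        for _ in range(candidates):
            (u, v), (x, y) = random.sample(list(G.edges()), 2)
            if len({u, v, x, y}) < 4 or G.has_edge(u, x) or G.has_edge(v, y):
                continue  # avoid self-loops and multi-edges
            H = G.copy()
            H.remove_edges_from([(u, v), (x, y)])
            H.add_edges_from([(u, x), (v, y)])
            r = nx.degree_assortativity_coefficient(H)
            if (r > best_r) if maximize else (r < best_r):
                best_swap, best_r = ((u, v), (x, y)), r
        if best_swap is None:
            break  # no sampled swap improves the objective
        (u, v), (x, y) = best_swap
        G.remove_edges_from([(u, v), (x, y)])
        G.add_edges_from([(u, x), (v, y)])
    return G
```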
TIDE: Temporally Incremental Disparity Estimation via Pattern Flow in Structured Light System
Abstract
We introduce the Temporally Incremental Disparity Estimation Network (TIDE-Net), a learning-based technique for disparity computation in mono-camera structured light systems. In our hardware setting, a static pattern is projected onto a dynamic scene and captured by a monocular camera. Different from most former disparity estimation methods that operate in a frame-wise manner, our network acquires disparity maps in a temporally incremental way. Specifically, we exploit the deformation of projected patterns (named pattern flow) on captured image sequences to model the temporal information. Notably, this newly proposed pattern flow formulation reflects the disparity changes along the epipolar line, which is a special form of optical flow. Tailored for pattern flow, the TIDE-Net, a recurrent architecture, is proposed and implemented. For each incoming frame, our model fuses correlation volumes (from the current frame) and disparity (from the former frame) warped by pattern flow. From the fused features, the final stage of TIDE-Net estimates the residual disparity rather than the full disparity, as done by many previous methods. Interestingly, this design brings clear empirical advantages in terms of efficiency and generalization ability. Using only synthetic data for training, our extensive evaluation results (w.r.t. both accuracy and efficiency metrics) show superior performance to several SOTA models on unseen real data. The code is available at https://github.com/CodePointer/TIDENet.
Weighted sparsity and sparse tensor networks for least squares approximation
Authors: Philipp Trunschke, Anthony Nouy, Martin Eigel
Abstract
Approximation of high-dimensional functions is a problem in many scientific fields that is only feasible if advantageous structural properties, such as sparsity in a given basis, can be exploited. A relevant tool for analysing sparse approximations is Stechkin's lemma. In its standard form, however, this lemma does not explain convergence rates for a wide range of relevant function classes. This work presents a new weighted version of Stechkin's lemma that improves the best $n$-term rates for weighted $\ell^p$-spaces and associated function classes such as Sobolev or Besov spaces. For the class of holomorphic functions, which occur as solutions of common high-dimensional parameter-dependent PDEs, we recover exponential rates that are not directly obtainable with Stechkin's lemma. Since weighted $\ell^p$-summability induces weighted sparsity, compressed sensing algorithms can be used to approximate the associated functions. To break the curse of dimensionality, from which these algorithms suffer, we recall that sparse approximations can be encoded efficiently using tensor networks with sparse component tensors. We also demonstrate that weighted $\ell^p$-summability induces low ranks, which motivates a second tensor train format with low ranks and a single weighted sparse core. We present new alternating algorithms for best $n$-term approximation in both formats. To analyse the sample complexity for the new model classes, we derive a novel result of independent interest that allows the transfer of the restricted isometry property from one set to another sufficiently close set. Although they lead up to the analysis of our final model class, our contributions on weighted Stechkin's lemma and the restricted isometry property are of independent interest and can be read on their own.
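For orientation, the classical (unweighted) Stechkin lemma that this work strengthens can be stated as follows; the weighted version replaces the plain sequence norms by weighted ones, which is what yields the improved rates.

```latex
% Classical Stechkin lemma (unweighted form, stated here for reference).
% For 0 < p < q <= \infty and x \in \ell^p, the best n-term approximation
% error in \ell^q decays algebraically:
\[
  \sigma_n(x)_{\ell^q}
  := \min_{\substack{S \subset \mathbb{N} \\ |S| \le n}}
     \Big( \sum_{i \notin S} |x_i|^q \Big)^{1/q}
  \le (n+1)^{-(1/p - 1/q)} \, \|x\|_{\ell^p}.
\]
```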
CAMELL: Confidence-based Acquisition Model for Efficient Self-supervised Active Learning with Label Validation
Authors: Carel van Niekerk, Christian Geishauser, Michael Heck, Shutong Feng, Hsien-chin Lin, Nurul Lubis, Benjamin Ruppik, Renato Vukovic, Milica Gašić
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
Supervised neural approaches are hindered by their dependence on large, meticulously annotated datasets, a requirement that is particularly cumbersome for sequential tasks. The quality of annotations tends to deteriorate with the transition from expert-based to crowd-sourced labelling. To address these challenges, we present \textbf{CAMELL} (Confidence-based Acquisition Model for Efficient self-supervised active Learning with Label validation), a pool-based active learning framework tailored for sequential multi-output problems. CAMELL possesses three core features: (1) it requires expert annotators to label only a fraction of a chosen sequence, (2) it facilitates self-supervision for the remainder of the sequence, and (3) it employs a label validation mechanism to prevent erroneous labels from contaminating the dataset and harming model performance. We evaluate CAMELL on sequential tasks, with a special emphasis on dialogue belief tracking, a task plagued by the constraints of limited and noisy datasets. Our experiments demonstrate that CAMELL outperforms the baselines in terms of efficiency. Furthermore, the data corrections suggested by our method contribute to an overall improvement in the quality of the resulting datasets.
Making Multimodal Generation Easier: When Diffusion Models Meet LLMs
Abstract
We present EasyGen, an efficient model designed to enhance multimodal understanding and generation by harnessing the capabilities of diffusion models and large language models (LLMs). Unlike existing multimodal models that predominantly depend on encoders like CLIP or ImageBind and need ample amounts of training data to bridge the gap between modalities, EasyGen is built upon a bidirectional conditional diffusion model named BiDiffuser, which promotes more efficient interactions between modalities. EasyGen handles image-to-text generation by integrating BiDiffuser and an LLM via a simple projection layer. Unlike most existing multimodal models that are limited to generating text responses, EasyGen can also facilitate text-to-image generation by leveraging the LLM to create textual descriptions, which BiDiffuser can interpret to generate appropriate visual responses. Extensive quantitative and qualitative experiments demonstrate the effectiveness of EasyGen, whose training can be easily achieved in a lab setting. The source code is available at https://github.com/zxy556677/EasyGen.
LRRU: Long-short Range Recurrent Updating Networks for Depth Completion
Authors: Yufei Wang, Bo Li, Ge Zhang, Qi Liu, Tao Gao, Yuchao Dai
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Existing deep learning-based depth completion methods generally employ massive stacked layers to predict the dense depth map from sparse input data. Although such approaches greatly advance this task, the accompanying huge computational complexity hinders their practical application. To accomplish depth completion more efficiently, we propose a novel lightweight deep network framework, the Long-short Range Recurrent Updating (LRRU) network. Without learning complex feature representations, LRRU first roughly fills the sparse input to obtain an initial dense depth map, and then iteratively updates it through learned spatially-variant kernels. Our iterative update process is content-adaptive and highly flexible, where the kernel weights are learned by jointly considering the guidance RGB images and the depth map to be updated, and large-to-small kernel scopes are dynamically adjusted to capture long-to-short range dependencies. Our initial depth map has coarse but complete scene depth information, which helps relieve the burden of directly regressing dense depth from sparse input, while our proposed method can effectively refine it to an accurate depth map with fewer learnable parameters and less inference time. Experimental results demonstrate that our proposed LRRU variants achieve state-of-the-art performance across different parameter regimes. In particular, the LRRU-Base model outperforms competing approaches on the NYUv2 dataset, and ranks 1st on the KITTI depth completion benchmark at the time of submission. Project page: https://npucvr.github.io/LRRU/.
Generalization of splitting methods based on modified potentials to nonlinear evolution equations of parabolic and Schrödinger type
Authors: Sergio Blanes, Fernando Casas, Cesáreo González, Mechthild Thalhammer
Abstract
The present work is concerned with the extension of modified potential operator splitting methods to specific classes of nonlinear evolution equations. The considered partial differential equations of Schrödinger and parabolic type comprise the Laplacian, a potential acting as multiplication operator, and a cubic nonlinearity. Moreover, an invariance principle is deduced that has a significant impact on the efficient realisation of the resulting modified operator splitting methods in the Schrödinger case. Numerical illustrations for the time-dependent Gross–Pitaevskii equation in the physically most relevant case of three space dimensions, and for its parabolic counterpart related to ground state and excited state computations, confirm the benefits of the proposed fourth-order modified operator splitting method in comparison with standard splitting methods. The presented results are novel and of particular interest from both a theoretical perspective, to inspire future investigations of modified operator splitting methods for other classes of nonlinear evolution equations, and a practical perspective, to advance the reliable and efficient simulation of Gross–Pitaevskii systems in real and imaginary time.
ChatKBQA: A Generate-then-Retrieve Framework for Knowledge Base Question Answering with Fine-tuned Large Language Models
Abstract
Knowledge Base Question Answering (KBQA) aims to derive answers to natural language questions over large-scale knowledge bases (KBs), and is generally divided into two research components: knowledge retrieval and semantic parsing. However, three core challenges remain: inefficient knowledge retrieval, retrieval errors adversely affecting semantic parsing, and the complexity of previous KBQA methods. In the era of large language models (LLMs), we introduce ChatKBQA, a novel generate-then-retrieve KBQA framework built on fine-tuning open-source LLMs such as Llama-2, ChatGLM2, and Baichuan2. ChatKBQA first generates the logical form with a fine-tuned LLM, then retrieves and replaces entities and relations through an unsupervised retrieval method, which makes both generation and retrieval more straightforward to improve. Experimental results reveal that ChatKBQA achieves new state-of-the-art performance on the standard KBQA datasets WebQSP and ComplexWebQuestions (CWQ). This work also provides a new paradigm for combining LLMs with knowledge graphs (KGs) for interpretable and knowledge-required question answering. Our code is publicly available.
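As a rough illustration of the generate-then-retrieve pattern (our sketch, not ChatKBQA's actual code; `llm`, `embed`, and the draft's interface are hypothetical stand-ins), the LLM drafts a logical form and its mentions are then grounded against the KB by embedding similarity:

```python
import numpy as np

def nearest(surface_form, candidates, embed):
    """Replace a generated surface form with its closest KB item."""
    q = embed(surface_form)
    scores = [float(np.dot(q, embed(c))) for c in candidates]
    return candidates[int(np.argmax(scores))]

def answer(question, llm, kb_entities, kb_relations, embed):
    draft = llm.generate_logical_form(question)  # draft with free-text mentions
    for mention in draft.entity_mentions:        # ground entities against the KB
        draft.replace(mention, nearest(mention, kb_entities, embed))
    for rel in draft.relation_mentions:          # ground relations likewise
        draft.replace(rel, nearest(rel, kb_relations, embed))
    return draft.execute()                       # run the grounded query
```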
Big data-driven prediction of airspace congestion
Authors: Samet Ayhan, Ítalo Romani de Oliveira, Glaucia Balvedi, Pablo Costas, Alexandre Leite, Felipe C. F. de Azevedo
Subjects: Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Systems and Control (eess.SY)
Abstract
Air Navigation Service Providers (ANSP) worldwide have been making a considerable effort to develop better methods to measure and predict aircraft counts within a particular airspace, also referred to as airspace density. Accurate measurement and prediction of airspace density are crucial for better managed airspace, both strategically and tactically, yielding a higher level of automation and thereby reducing the air traffic controller's workload. Although prior approaches have addressed the problem to some extent, data management and query processing of the ever-increasing volume of air traffic data arriving at high rates, for analytics purposes such as predicting aircraft counts, still remain a challenge, especially when only linear prediction models are used. In this paper, we present a novel data management and prediction system that accurately predicts aircraft counts for a particular airspace sector within the National Airspace System (NAS). The incoming Traffic Flow Management (TFM) data is streaming, big, uncorrelated, and noisy. In the preprocessing step, the system continuously processes the incoming raw data, reduces it to a compact size, and stores it in a NoSQL database, where it makes the data available for efficient query processing. In the prediction step, the system learns from historical trajectories and uses their segments to collect key features such as sector boundary crossings, weather parameters, and other air traffic data. The features are fed into various regression models, including linear, non-linear, and ensemble models, and the best-performing model is used for prediction. Evaluation on an extensive set of real track, weather, and air traffic data, including boundary crossings in the U.S., verifies that our system efficiently and accurately predicts aircraft counts in each airspace sector.
UniParser: Multi-Human Parsing with Unified Correlation Representation Learning
Authors: Jiaming Chu, Lei Jin, Junliang Xing, Jian Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Multi-human parsing is an image segmentation task necessitating both instance-level and fine-grained category-level information. However, prior research has typically processed these two types of information through separate branches and distinct output formats, leading to inefficient and redundant frameworks. This paper introduces UniParser, which integrates instance-level and category-level representations in three key aspects: 1) we propose a unified correlation representation learning approach, allowing our network to learn instance and category features within the cosine space; 2) we unify the output form of each module as pixel-level segmentation results while supervising instance and category features using a homogeneous label accompanied by an auxiliary loss; and 3) we design a joint optimization procedure to fuse instance and category representations. By virtue of unifying instance-level and category-level outputs, UniParser circumvents manually designed post-processing techniques and surpasses state-of-the-art methods, achieving 49.3% AP on MHPv2.0 and 60.4% AP on CIHP. We will release our source code, pretrained models, and online demos to facilitate future studies.
Reroute Prediction Service
Authors: Ítalo Romani de Oliveira, Samet Ayhan, Michael Biglin, Pablo Costas, Euclides C. Pinto Neto
Abstract
The cost of delays in the US National Airspace System was estimated at 33 billion US dollars in 2019 alone, a peak value following a growth trend in past years. Aiming to address this huge inefficiency, we designed and developed a novel Data Analytics and Machine Learning system, which aims at reducing delays by proactively supporting re-routing decisions. Given a time interval up to a few days in the future, the system predicts whether a reroute advisory for a certain Air Route Traffic Control Center or for a certain advisory identifier will be issued, which may impact the pertinent routes. To deliver such predictions, the system uses historical reroute data, collected from the System Wide Information Management (SWIM) data services provided by the FAA, and weather data, provided by the US National Centers for Environmental Prediction (NCEP). The data is huge in volume, with many items streamed at high velocity, uncorrelated and noisy. The system continuously processes the incoming raw data and makes it available for the next step, where an interim data store is created and adaptively maintained for efficient query processing. The resulting data is fed into an array of ML algorithms, which compete for higher accuracy. The best-performing algorithm is used to generate the final prediction. Mean accuracy values higher than 90% were obtained in our experiments with this system. Our algorithm divides the area of interest into units of aggregation and uses time series of aggregate weather-forecast measures in each geographical unit in order to detect correlations with reroutes and where they will most likely occur. Aiming at practical application, the system is composed of a number of microservices deployed in the cloud, making it distributed, scalable, and highly available.
A Parallel Feature-preserving Mesh Variable Offsetting Method with Dynamic Programming
Abstract
Mesh offsetting plays an important role in discrete geometric processing. In this paper, we propose a parallel feature-preserving mesh offsetting framework with variable distance. Different from traditional methods based on distance and normal vectors, we compute the offset position using dynamic programming and quadratic programming, so that sharp features are preserved after offsetting. Instead of a distance implicit field, a spatial coverage region represented by polyhedra is proposed for computing offsets. Our method can generate an offset model with a smaller mesh size, and can also achieve high quality without gaps, holes, or self-intersections. Moreover, several acceleration techniques are proposed for efficient mesh offsetting, such as grid-based parallel computing, AABB trees, and ray computations. To show the efficiency and robustness of the proposed framework, we have tested our method on the quadmesh dataset, which is available at [https://www.quadmesh.cloud]. The source code of the proposed algorithm is available on GitHub at [https://github.com/iGame-Lab/PFPOffset].
μ-DDRL: A QoS-Aware Distributed Deep Reinforcement Learning Technique for Service Offloading in Fog computing Environments
Authors: Mohammad Goudarzi, Maria A. Rodriguez, Majid Sarvi, Rajkumar Buyya
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Fog and Edge computing extend cloud services to the proximity of end users, enabling many Internet of Things (IoT) use cases, particularly latency-critical applications. Smart devices, such as traffic and surveillance cameras, often do not have sufficient resources to process computation-intensive and latency-critical services. Hence, the constituent parts of services can be offloaded to nearby Edge/Fog resources for processing and storage. However, making offloading decisions for complex services in highly stochastic and dynamic environments is an important yet difficult task. Recently, Deep Reinforcement Learning (DRL) has been used in many complex service offloading problems; however, existing techniques are best suited to centralized environments, and their convergence to the most suitable solutions is slow. In addition, constituent parts of services often have predefined data dependencies and quality of service constraints, which further intensify the complexity of service offloading. To solve these issues, we propose a distributed DRL technique following the actor-critic architecture based on Asynchronous Proximal Policy Optimization (APPO) to achieve efficient and diverse distributed experience trajectory generation. Also, we employ PPO clipping and V-trace techniques for off-policy correction to reach faster convergence to the most suitable service offloading solutions. The results obtained demonstrate that our technique converges quickly, offers high scalability and adaptability, and outperforms its counterparts by improving the execution time of heterogeneous services.
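For readers unfamiliar with the PPO clipping the abstract refers to, a generic PyTorch rendering of the standard clipped surrogate loss looks as follows; this is the textbook formulation, not the authors' implementation:

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Standard PPO clipped surrogate loss (generic sketch)."""
    ratio = torch.exp(logp_new - logp_old)            # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    # take the pessimistic bound, then negate because optimizers minimize
    return -torch.min(unclipped, clipped).mean()
```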
Credit Blockchain for Faster Transactions in P2P Energy Trading
Authors: Amit kumar Vishwakarma, Yatindra Nath Singh
Subjects: Networking and Internet Architecture (cs.NI); Computer Science and Game Theory (cs.GT)
Abstract
P2P trading of energy can be a good alternative to incentivize distributed non-conventional energy production and meet the burgeoning energy demand. For efficient P2P trading, a free market for trading needs to be established while ensuring information reliability, security, and privacy. Blockchain has been used to provide this framework, but it consumes a great deal of energy and is slow. Further, until now, no blockchain model has considered the role of conventional electric utility companies in P2P trading. In this paper, we introduce a credit blockchain that reduces energy consumption by employing a new mechanism to update transactions and increases speed by providing interest-free loans to buyers. This model also integrates the electric utility companies within the P2P trading framework, thereby increasing members' trading options. We also discuss pricing strategies for trading. All the above assertions have been verified through simulations, demonstrating that this model will promote P2P trading by providing enhanced security, speed, and greater trading options. The proposed model will also help trade energy at prices beneficial for both sellers and buyers.
Generative AI-driven Semantic Communication Framework for NextG Wireless Network
Authors: Avi Deb Raha, Md. Shirajum Munir, Apurba Adhikary, Yu Qiao, Choong Seon Hong
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
This work designs a novel semantic communication (SemCom) framework for the next-generation wireless network to tackle the challenges of unnecessarily transmitting vast amounts of data: high bandwidth consumption, increased latency, and degraded quality of service (QoS). In particular, these challenges hinder applications like intelligent transportation systems (ITS), the metaverse, mixed reality, and the Internet of Everything, where real-time and efficient data transmission is paramount. Therefore, to reduce communication overhead and maintain the QoS of emerging applications such as the metaverse, ITS, and digital twin creation, this work proposes a novel semantic communication framework. First, an intelligent semantic transmitter is designed to capture the meaningful information (e.g., the road-side image in ITS) via a domain-specific Mobile Segment Anything Model (MSAM)-based mechanism that reduces the potential communication traffic while keeping QoS intact. Second, the concept of generative AI is introduced for building the SemCom to reconstruct and denoise the received semantic data frame at the receiver end. In particular, a Generative Adversarial Network (GAN) mechanism is designed to maintain superior reconstruction quality under different signal-to-noise ratio (SNR) channel conditions. Finally, we have tested and evaluated the proposed semantic communication (SemCom) framework in a real-world 6G scenario of ITS, in which the base station is equipped with an RGB camera and a mmWave phased array. Experimental results demonstrate the efficacy of the proposed SemCom framework by achieving high-quality reconstruction across various SNR channel conditions, resulting in a 93.45% reduction in communicated data.
Sparse Suffix and LCP Array: Simple, Direct, Small, and Fast
Authors: Lorraine A. K. Ayad, Grigorios Loukides, Solon P. Pissis, Hilde Verbeek
Abstract
Sparse suffix sorting is the problem of sorting $b=o(n)$ suffixes of a string of length $n$. Efficient sparse suffix sorting algorithms have existed for more than a decade. Despite the multitude of works and their justified claims for applications in text indexing, the existing algorithms have not been employed by practitioners. Arguably this is because there are no simple, direct, and efficient algorithms for sparse suffix array construction. We provide two new algorithms for constructing the sparse suffix and LCP arrays that are simultaneously simple, direct, small, and fast. In particular, our algorithms are: simple in the sense that they can be implemented using only basic data structures; direct in the sense that the output arrays are not a byproduct of constructing the sparse suffix tree or an LCE data structure; fast in the sense that they run in $\mathcal{O}(n\log b)$ time, in the worst case, or in $\mathcal{O}(n)$ time, when the total number of suffixes with an LCP value greater than $2^{\lfloor \log \frac{n}{b} \rfloor + 1}-1$ is in $\mathcal{O}(b/\log b)$, matching the time of the optimal yet much more complicated algorithms [Gawrychowski and Kociumaka, SODA 2017; Birenzwige et al., SODA 2020]; and small in the sense that they can be implemented using only $8b+o(b)$ machine words. Our algorithms are simplified, yet non-trivial, space-efficient adaptations of the Monte Carlo algorithm by I et al. for constructing the sparse suffix tree in $\mathcal{O}(n\log b)$ time [STACS 2014]. We also provide proof-of-concept experiments to justify our claims on simplicity and efficiency.
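As a concrete reference point, the naive Python baseline below computes the sparse suffix and LCP arrays by direct sorting; it serves only as a correctness check and is nowhere near the paper's $\mathcal{O}(n\log b)$ algorithms:

```python
# Naive reference: sort the b chosen suffixes lexicographically and record
# the longest common prefix of each pair of neighbours.

def sparse_suffix_and_lcp(text, positions):
    ssa = sorted(positions, key=lambda i: text[i:])   # sparse suffix array
    lcp = [0] * len(ssa)
    for k in range(1, len(ssa)):
        a, b = text[ssa[k - 1]:], text[ssa[k]:]
        m = 0
        while m < min(len(a), len(b)) and a[m] == b[m]:
            m += 1
        lcp[k] = m                                    # LCP with previous suffix
    return ssa, lcp

# Example: suffixes of "banana" starting at positions 1, 3, 5
print(sparse_suffix_and_lcp("banana", [1, 3, 5]))     # ([5, 3, 1], [0, 1, 3])
```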
Survey on Near-Space Information Networks: Channel Modeling, Networking, and Transmission Perspectives
Authors: Xianbin Cao, Peng Yang, Xiaoning Su
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
Near-space information networks (NSIN), composed of high-altitude platforms (HAPs) and high- and low-altitude unmanned aerial vehicles (UAVs), are a new regime for providing sensing and communication services quickly, robustly, and cost-efficiently. Precipitated by innovations and breakthroughs in manufacturing, materials, communications, electronics, and control technologies, NSIN have become an essential component of the emerging sixth generation of mobile communication systems. This article aims to provide and discuss the latest advances in NSIN in the research areas of channel modeling, networking, and transmission from a forward-looking, comparative, and technological evolutionary perspective. In this article, we highlight the characteristics of NSIN and present the promising use-cases of NSIN. The impact of airborne platforms' unstable movements on the phase delays of onboard antenna arrays with diverse structures is mathematically analyzed. The recent advancements in HAP channel modeling are elaborated on, along with the significant differences between HAP and UAV channel modeling. A comprehensive review of the networking technologies of NSIN in the network deployment, handoff management, and network management aspects is provided. Besides, the promising technologies and communication protocols of the physical layer, medium access control (MAC) layer, network layer, and transport layer of NSIN for achieving efficient transmission are overviewed. Finally, we outline some open issues and promising directions of NSIN deserving future study and discuss the corresponding challenges.
Subspace Adaptation Prior for Few-Shot Learning
Authors: Mike Huisman, Aske Plaat, Jan N. van Rijn
Abstract
Gradient-based meta-learning techniques aim to distill useful prior knowledge from a set of training tasks such that new tasks can be learned more efficiently with gradient descent. While these methods have achieved successes in various scenarios, they commonly adapt all parameters of trainable layers when learning new tasks. This neglects potentially more efficient learning strategies for a given task distribution and may be susceptible to overfitting, especially in few-shot learning where tasks must be learned from a limited number of examples. To address these issues, we propose Subspace Adaptation Prior (SAP), a novel gradient-based meta-learning algorithm that jointly learns good initialization parameters (prior knowledge) and layer-wise parameter subspaces in the form of operation subsets that should be adaptable. In this way, SAP can learn which operation subsets to adjust with gradient descent based on the underlying task distribution, simultaneously decreasing the risk of overfitting when learning new tasks. We demonstrate that this ability is helpful as SAP yields superior or competitive performance in few-shot image classification settings (gains between 0.1% and 3.9% in accuracy). Analysis of the learned subspaces demonstrates that low-dimensional operations often yield high activation strengths, indicating that they may be important for achieving good few-shot learning performance. For reproducibility purposes, we publish all our research code publicly.
DATT: Deep Adaptive Trajectory Tracking for Quadrotor Control
Authors: Kevin Huang, Rwik Rana, Alexander Spitzer, Guanya Shi, Byron Boots
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract
Precise arbitrary trajectory tracking for quadrotors is challenging due to unknown nonlinear dynamics, trajectory infeasibility, and actuation limits. To tackle these challenges, we present Deep Adaptive Trajectory Tracking (DATT), a learning-based approach that can precisely track arbitrary, potentially infeasible trajectories in the presence of large disturbances in the real world. DATT builds on a novel feedforward-feedback-adaptive control structure trained in simulation using reinforcement learning. When deployed on real hardware, DATT is augmented with a disturbance estimator using L1 adaptive control in closed-loop, without any fine-tuning. DATT significantly outperforms competitive adaptive nonlinear and model predictive controllers for both feasible smooth and infeasible trajectories in unsteady wind fields, including challenging scenarios where baselines completely fail. Moreover, DATT can efficiently run online with an inference time of less than 3.2 ms, less than a quarter of that of the adaptive nonlinear model predictive control baseline.
A RISC-V MCU with adaptive reverse body bias and ultra-low-power retention mode in 22 nm FD-SOI
Authors: Heiner Bauer, Marco Stolba, Stefan Scholze, Dennis Walter, Christian Mayr, Alexander Oefelein, Sebastian Höppner, André Scharfe, Flo Schraut, Holger Eisenreich
Abstract
We present a low-power, energy-efficient 32-bit RISC-V microprocessor unit (MCU) in 22 nm FD-SOI. It achieves ultra-low leakage, even at high temperatures, by using an adaptive reverse-body-biasing-aware sign-off approach, a low-power optimized physical implementation, and custom SRAM macros with retention mode. We demonstrate the robustness of the chip with measurements over the full industrial temperature range, from -40 °C to 125 °C. Our results match the state of the art (SOTA) with 4.8 µW/MHz at 50 MHz in active mode and surpass the SOTA in ultra-low-power retention mode.
A Frustratingly Easy Plug-and-Play Detection-and-Reasoning Module for Chinese Spelling Check
Abstract
In recent years, Chinese Spelling Check (CSC) has been greatly improved by designing task-specific pre-training methods or introducing auxiliary tasks, which mostly solve this task in an end-to-end fashion. In this paper, we propose to decompose the CSC workflow into detection, reasoning, and searching subtasks so that the rich external knowledge about the Chinese language can be leveraged more directly and efficiently. Specifically, we design a plug-and-play detection-and-reasoning module that is compatible with existing SOTA non-autoregressive CSC models to further boost their performance. We find that the detection-and-reasoning module trained for one model can also benefit other models. We also study the primary interpretability provided by the task decomposition. Extensive experiments and detailed analyses demonstrate the effectiveness and competitiveness of the proposed module.
Tikuna: An Ethereum Blockchain Network Security Monitoring System
Authors: Andres Gomez Ramirez, Loui Al Sardy, Francis Gomez Ramirez
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Blockchain security is becoming increasingly relevant in today's cyberspace as blockchain extends its influence in many industries. This paper focuses on protecting the lowest-level layer of the blockchain, particularly the P2P network that allows the nodes to communicate and share information. The P2P network layer may be vulnerable to several families of attacks, such as Distributed Denial of Service (DDoS), eclipse attacks, or Sybil attacks. This layer is prone to threats inherited from traditional P2P networks, and it must be analyzed and understood by collecting data and extracting insights from the network behavior to reduce those risks. We introduce Tikuna, an open-source tool for monitoring and detecting potential attacks on the Ethereum blockchain P2P network at an early stage. Tikuna employs an unsupervised Long Short-Term Memory (LSTM) method based on recurrent neural networks (RNNs) to detect attacks and alert users. Empirical results indicate that the proposed approach significantly improves detection performance, with the ability to detect and classify attacks with high accuracy, including eclipse attacks, Covert Flash attacks, and others that target the Ethereum blockchain P2P network layer. Our research findings demonstrate that Tikuna is a valuable security tool for helping operators efficiently monitor and safeguard the status of Ethereum validators and the wider P2P network.
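A minimal sketch of an LSTM-based unsupervised detector in this spirit: the model learns to predict the next vector of P2P metrics, and events with large prediction error are flagged as anomalous. The feature layout, model size, and scoring rule here are our assumptions, not Tikuna's actual configuration:

```python
import torch
import torch.nn as nn

class NextStepLSTM(nn.Module):
    def __init__(self, n_features=8, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, x):                  # x: (batch, time, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])       # predict the next metrics vector

def anomaly_scores(model, windows, targets):
    """High reconstruction error suggests behavior unlike normal traffic."""
    with torch.no_grad():
        pred = model(windows)
    return ((pred - targets) ** 2).mean(dim=1)
```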
Variational autoencoder with weighted samples for high-dimensional non-parametric adaptive importance sampling
Authors: Julien Demange-Chryst, François Bachoc, Jérôme Morio, Timothé Krauth
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST)
Abstract
Probability density function estimation with weighted samples is the main foundation of all adaptive importance sampling algorithms. Classically, a target distribution is approximated either by a non-parametric model or within a parametric family. However, these models suffer from the curse of dimensionality or from a lack of flexibility. In this contribution, we suggest using a distribution parameterised by a variational autoencoder as the approximating model. We extend the existing framework to the case of weighted samples by introducing a new objective function. The flexibility of the obtained family of distributions makes it as expressive as a non-parametric model, and despite the very high number of parameters to estimate, this family is much more efficient in high dimension than the classical Gaussian or Gaussian mixture families. Moreover, in order to add flexibility to the model and to be able to learn multimodal distributions, we consider a learnable prior distribution for the variational autoencoder latent variables. We also introduce a new pre-training procedure for the variational autoencoder to find good starting weights of the neural networks and to prevent posterior collapse as much as possible. Finally, we make explicit how the resulting distribution can be combined with importance sampling, and we exploit the proposed procedure in existing adaptive importance sampling algorithms to draw points from a target distribution and to estimate a rare event probability in high dimension on two multimodal problems.
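Although the paper's exact objective is not reproduced here, a natural form for an evidence lower bound extended to weighted samples, with importance weights $w_i$ and a learnable prior $p_\lambda(z)$, would be:

```latex
% Our reconstruction of a plausible weighted ELBO, not the paper's equation:
% each sample's ELBO term is scaled by its importance weight w_i.
\mathcal{L}(\theta,\phi,\lambda)
  = \sum_{i=1}^{N} w_i \Big(
      \mathbb{E}_{q_\phi(z \mid x_i)}\big[\log p_\theta(x_i \mid z)\big]
      - \mathrm{KL}\big(q_\phi(z \mid x_i)\,\Vert\, p_\lambda(z)\big)
    \Big)
```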
Fast & Efficient Learning of Bayesian Networks from Data: Knowledge Discovery and Causality
Abstract
Structure learning is essential for Bayesian networks (BNs) as it uncovers causal relationships and enables knowledge discovery, predictions, inferences, and decision-making under uncertainty. Two novel algorithms, FSBN and SSBN, based on the PC algorithm, employ a local search strategy and conditional independence tests to learn the causal network structure from data. They incorporate d-separation to infer additional topology information, prioritize conditioning sets, and terminate the search as early as possible. FSBN achieves up to 52% computation cost reduction, while SSBN surpasses it with a remarkable 72% reduction for a 200-node network. SSBN demonstrates further efficiency gains due to its intelligent strategy. Experimental studies show that both algorithms match the induction quality of the PC algorithm while significantly reducing computation costs. This enables them to offer interpretability and adaptability while reducing the computational burden, making them valuable for various applications in big data analytics.
Precedent-Enhanced Legal Judgment Prediction with LLM and Domain-Model Collaboration
Abstract
Legal Judgment Prediction (LJP) has become an increasingly crucial task in Legal AI: predicting the judgment of a case from its fact description. Precedents are previous legal cases with similar facts, and they are the basis for judgments of subsequent cases in national legal systems. Thus, it is worthwhile to explore the use of precedents in LJP. Recent advances in deep learning have enabled a variety of techniques for the LJP task, which can be divided into two categories: large language models (LLMs) and domain-specific models. LLMs are capable of interpreting and generating complex natural language, while domain models are efficient at learning task-specific information. In this paper, we propose the precedent-enhanced LJP framework (PLJP), a system that leverages the strengths of both LLMs and domain models in the context of precedents. Specifically, the domain models are designed to provide candidate labels and find the proper precedents efficiently, and the LLM makes the final prediction by comprehending the precedents in context. Experiments on a real-world dataset demonstrate the effectiveness of our PLJP. Moreover, our work shows a promising direction for LLM and domain-model collaboration that can be generalized to other vertical domains.
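A hedged sketch of the collaboration pattern as we read it: a small domain model proposes candidate labels and retrieves precedents, and the LLM makes the final call with those precedents in context. All interfaces here (`domain_model`, `precedent_index`, `llm`) are illustrative assumptions:

```python
def predict_judgment(fact_description, domain_model, precedent_index, llm):
    candidates = domain_model.top_k_labels(fact_description, k=3)
    precedents = precedent_index.retrieve(fact_description, k=3)
    prompt = (
        "Fact description:\n" + fact_description + "\n\n"
        "Relevant precedents:\n"
        + "\n".join(p.summary for p in precedents) + "\n\n"
        "Candidate judgments: " + ", ".join(candidates) + "\n"
        "Choose the most appropriate judgment and justify it briefly."
    )
    return llm.complete(prompt)   # LLM decides with precedents in context
```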
Towards End-to-end 4-Bit Inference on Generative Large Language Models
Authors: Saleh Ashkboos, Ilia Markov, Elias Frantar, Tingxuan Zhong, Xincheng Wang, Jie Ren, Torsten Hoefler, Dan Alistarh
Abstract
We show that the majority of the inference computations for large generative models such as LLaMA and OPT can be performed with both weights and activations being cast to 4 bits, in a way that leads to practical speedups while at the same time maintaining good accuracy. We achieve this via a hybrid quantization strategy called QUIK, which compresses most of the weights and activations to 4-bit, while keeping some outlier weights and activations in higher-precision. Crucially, our scheme is designed with computational efficiency in mind: we provide GPU kernels with highly-efficient layer-wise runtimes, which lead to practical end-to-end throughput improvements of up to 3.1x relative to FP16 execution. Code and models are provided at https://github.com/IST-DASLab/QUIK.
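As a toy illustration of the hybrid idea (our simplification, not the QUIK kernels): quantize most weight columns to symmetric 4-bit integers while keeping a few outlier columns in full precision, here selected by column norm:

```python
import numpy as np

def hybrid_quantize(W, n_outlier_cols=8):
    """Quantize most columns of W to int4; keep outlier columns in FP."""
    norms = np.linalg.norm(W, axis=0)
    outliers = np.argsort(norms)[-n_outlier_cols:]     # largest columns stay FP
    q = np.zeros_like(W, dtype=np.int8)
    scales = np.zeros(W.shape[1])
    for j in range(W.shape[1]):
        if j in outliers:
            continue
        scales[j] = np.abs(W[:, j]).max() / 7 + 1e-12  # symmetric int4: [-8, 7]
        q[:, j] = np.clip(np.round(W[:, j] / scales[j]), -8, 7)
    return q, scales, outliers, W[:, outliers]

def dequantize(q, scales, outliers, W_out):
    W = q.astype(np.float32) * scales                  # per-column rescaling
    W[:, outliers] = W_out                             # restore FP outliers
    return W
```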
Transformer-based Multimodal Change Detection with Multitask Consistency Constraints
Authors: Biyuan Liu, Huaixin Chen, Kun Li, Michael Ying Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Change detection plays a fundamental role in Earth observation for analyzing changes over time. However, recent studies have largely neglected the utilization of multimodal data, which presents significant practical and technical advantages compared to single-modal approaches. This research focuses on leveraging digital surface model (DSM) data and aerial images captured at different times for detecting change beyond 2D. We observe that current change detection methods struggle with multitask conflicts between the semantic and height change detection tasks. To address this challenge, we propose an efficient Transformer-based network that learns a shared representation between cross-dimensional inputs through cross-attention. It adopts a consistency constraint to establish the multimodal relationship, which involves obtaining pseudo change through height change thresholding and minimizing the difference between semantic and pseudo change within their overlapping regions. A DSM-to-image multimodal dataset encompassing three cities in the Netherlands was constructed. It lays a new foundation for beyond-2D change detection from cross-dimensional inputs. Compared to five state-of-the-art change detection methods, our model demonstrates consistent multitask superiority in terms of semantic and height change detection. Furthermore, the consistency strategy can be seamlessly adapted to other methods, yielding promising improvements.
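One plausible reading of the consistency constraint, sketched below with an assumed threshold `tau` and masking scheme (the paper's exact formulation may differ):

```python
import torch

def consistency_loss(sem_logits, height_change, tau=0.5):
    """Penalize disagreement between semantic and height-derived change."""
    sem_prob = torch.sigmoid(sem_logits)
    pseudo = (height_change.abs() > tau).float()       # pseudo change from height
    overlap = ((sem_prob > 0.5) | (pseudo > 0.5)).float()  # regions either flags
    diff = (sem_prob - pseudo).abs() * overlap         # difference inside overlap
    return diff.sum() / overlap.sum().clamp(min=1.0)
```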
Keyword: faster
Pay Attention to How You Drive: Safe and Adaptive Model-Based Reinforcement Learning for Off-Road Driving
Authors: Sean J. Wang, Honghao Zhu, Aaron M. Johnson
Abstract
Autonomous off-road driving is challenging, as risky actions taken by the robot may lead to catastrophic damage. As such, developing controllers in simulation is often desirable, since it provides a safer and more economical alternative. However, accurately modeling robot dynamics is difficult due to the complex robot dynamics and terrain interactions in unstructured environments. Domain randomization addresses this problem by randomizing the simulation's dynamics parameters; however, this approach sacrifices performance for robustness, leading to policies that are sub-optimal for any target dynamics. We introduce a novel model-based reinforcement learning approach that aims to balance robustness with adaptability. Our approach trains a System Identification Transformer (SIT) and an Adaptive Dynamics Model (ADM) under a variety of simulated dynamics. The SIT uses attention mechanisms to distill state-transition observations from the target system into a context vector, which provides an abstraction of its target dynamics. Conditioned on this, the ADM probabilistically models the system's dynamics. Online, we use a Risk-Aware Model Predictive Path Integral (MPPI) controller to safely control the robot under its current understanding of the dynamics. We demonstrate in simulation, as well as in multiple real-world environments, that this approach enables safer behaviors upon initialization and becomes less conservative (i.e., faster) as its understanding of the target system dynamics improves with more observations. In particular, our approach results in an approximately 41% improvement in lap time over the non-adaptive baseline while remaining safe across different environments.
Investigating the Robustness and Properties of Detection Transformers (DETR) Toward Difficult Images
Authors: Zhao Ning Zou, Yuhang Zhang, Robert Wijaya
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Transformer-based object detectors (DETR) have shown significant performance across machine vision tasks, particularly in object detection. This detector is based on a self-attention mechanism along with the transformer encoder-decoder architecture to capture the global context in the image. The critical issue to be addressed is how this model architecture handles different image nuisances, such as occlusion and adversarial perturbations. We studied this issue by measuring the performance of DETR in different experiments and benchmarking the network against convolutional neural network (CNN) based detectors like YOLO and Faster-RCNN. We found that DETR performs well when it comes to resisting interference from information loss in occluded images. Despite that, we found that adversarial stickers put on the image require the network to produce a new, unnecessary set of keys, queries, and values, which in most cases results in misdirecting the network. DETR also performed worse than YOLOv5 in the image corruption benchmark. Furthermore, we found that DETR depends heavily on the main query when making a prediction, which leads to imbalanced contributions between queries, since the main query receives most of the gradient flow.
Implicit Shape and Appearance Priors for Few-Shot Full Head Reconstruction
Authors: Pol Caselles, Eduard Ramon, Jaime Garcia, Gil Triginer, Francesc Moreno-Noguer
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recent advancements in learning techniques that employ coordinate-based neural representations have yielded remarkable results in multi-view 3D reconstruction tasks. However, these approaches often require a substantial number of input views (typically several tens) and computationally intensive optimization procedures to achieve their effectiveness. In this paper, we address these limitations specifically for the problem of few-shot full 3D head reconstruction. We accomplish this by incorporating a probabilistic shape and appearance prior into coordinate-based representations, enabling faster convergence and improved generalization when working with only a few input images (even as few as a single image). During testing, we leverage this prior to guide the fitting process of a signed distance function using a differentiable renderer. By incorporating the statistical prior alongside parallelizable ray tracing and dynamic caching strategies, we achieve an efficient and accurate approach to few-shot full 3D head reconstruction. Moreover, we extend the H3DS dataset, which now comprises 60 high-resolution 3D full head scans and their corresponding posed images and masks, which we use for evaluation purposes. By leveraging this dataset, we demonstrate the remarkable capabilities of our approach in achieving state-of-the-art results in geometry reconstruction while being an order of magnitude faster than previous approaches.
End-to-end Story Plot Generator
Authors: Hanlin Zhu, Andrew Cohen, Danqing Wang, Kevin Yang, Xiaomeng Yang, Jiantao Jiao, Yuandong Tian
Abstract
Story plots, while short, carry most of the essential information of a full story that may contain tens of thousands of words. We study the problem of automatic generation of story plots, which includes story premise, character descriptions, plot outlines, etc. To generate a single engaging plot, existing plot generators (e.g., DOC (Yang et al., 2022a)) require hundreds to thousands of calls to LLMs (e.g., OpenAI API) in the planning stage of the story plot, which is costly and takes at least several minutes. Moreover, the hard-wired nature of the method makes the pipeline non-differentiable, blocking fast specialization and personalization of the plot generator. In this paper, we propose three models, $\texttt{OpenPlot}$, $\texttt{E2EPlot}$ and $\texttt{RLPlot}$, to address these challenges. $\texttt{OpenPlot}$ replaces expensive OpenAI API calls with LLaMA2 (Touvron et al., 2023) calls via careful prompt designs, which leads to inexpensive generation of high-quality training datasets of story plots. We then train an end-to-end story plot generator, $\texttt{E2EPlot}$, by supervised fine-tuning (SFT) using approximately 13000 story plots generated by $\texttt{OpenPlot}$. $\texttt{E2EPlot}$ generates story plots of comparable quality to $\texttt{OpenPlot}$, and is > 10$\times$ faster (1k tokens in only 30 seconds on average). Finally, we obtain $\texttt{RLPlot}$ that is further fine-tuned with RLHF on several different reward models for different aspects of story quality, which yields a 60.0$\%$ winning rate against $\texttt{E2EPlot}$ on the aspect of suspense and surprise.
Online Adaptive Disparity Estimation for Dynamic Scenes in Structured Light Systems
Abstract
In recent years, deep neural networks have shown remarkable progress in dense disparity estimation from dynamic scenes in monocular structured light systems. However, their performance significantly drops when applied in unseen environments. To address this issue, self-supervised online adaptation has been proposed as a solution to bridge this performance gap. Unlike traditional fine-tuning processes, online adaptation performs test-time optimization to adapt networks to new domains. Therefore, achieving fast convergence during the adaptation process is critical for attaining satisfactory accuracy. In this paper, we propose an unsupervised loss function based on long sequential inputs. It ensures better gradient directions and faster convergence. Our loss function is designed using a multi-frame pattern flow, which comprises a set of sparse trajectories of the projected pattern along the sequence. We estimate the sparse pseudo ground truth with a confidence mask using a filter-based method, which guides the online adaptation process. Our proposed framework significantly improves the online adaptation speed and achieves superior performance on unseen data.
μ-DDRL: A QoS-Aware Distributed Deep Reinforcement Learning Technique for Service Offloading in Fog computing Environments
Authors: Mohammad Goudarzi, Maria A. Rodriguez, Majid Sarvi, Rajkumar Buyya
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Fog and Edge computing extend cloud services to the proximity of end users, enabling many Internet of Things (IoT) use cases, particularly latency-critical applications. Smart devices, such as traffic and surveillance cameras, often do not have sufficient resources to process computation-intensive and latency-critical services. Hence, the constituent parts of services can be offloaded to nearby Edge/Fog resources for processing and storage. However, making offloading decisions for complex services in highly stochastic and dynamic environments is an important yet difficult task. Recently, Deep Reinforcement Learning (DRL) has been used in many complex service offloading problems; however, existing techniques are best suited to centralized environments, and their convergence to the most suitable solutions is slow. In addition, constituent parts of services often have predefined data dependencies and quality of service constraints, which further intensify the complexity of service offloading. To solve these issues, we propose a distributed DRL technique following the actor-critic architecture based on Asynchronous Proximal Policy Optimization (APPO) to achieve efficient and diverse distributed experience trajectory generation. Also, we employ PPO clipping and V-trace techniques for off-policy correction to reach faster convergence to the most suitable service offloading solutions. The results obtained demonstrate that our technique converges quickly, offers high scalability and adaptability, and outperforms its counterparts by improving the execution time of heterogeneous services.
Training and Predicting Visual Error for Real-Time Applications
Authors: João Libório Cardoso, Bernhard Kerbl, Lei Yang, Yury Uralsky, Michael Wimmer
Abstract
Visual error metrics play a fundamental role in the quantification of perceived image similarity. Most recently, use cases for them in real-time applications have emerged, such as content-adaptive shading and shading reuse to increase performance and improve efficiency. A wide range of different metrics has been established, with the most sophisticated being capable of capturing the perceptual characteristics of the human visual system. However, their complexity, computational expense, and reliance on reference images to compare against prevent their generalized use in real-time, restricting such applications to using only the simplest available metrics. In this work, we explore the abilities of convolutional neural networks to predict a variety of visual metrics without requiring either reference or rendered images. Specifically, we train and deploy a neural network to estimate the visual error resulting from reusing shading or using reduced shading rates. The resulting models account for 70%-90% of the variance while achieving up to an order of magnitude faster computation times. Our solution combines image-space information that is readily available in most state-of-the-art deferred shading pipelines with reprojection from previous frames to enable an adequate estimate of visual errors, even in previously unseen regions. We describe a suitable convolutional network architecture and considerations for data preparation for training. We demonstrate the capability of our network to predict complex error metrics at interactive rates in a real-time application that implements content-adaptive shading in a deferred pipeline. Depending on the portion of unseen image regions, our approach can achieve up to $2\times$ performance compared to state-of-the-art methods.
PaLI-3 Vision Language Models: Smaller, Faster, Stronger
Authors: Xi Chen, Xiao Wang, Lucas Beyer, Alexander Kolesnikov, Jialin Wu, Paul Voigtlaender, Basil Mustafa, Sebastian Goodman, Ibrahim Alabdulmohsin, Piotr Padlewski, Daniel Salz, Xi Xiong, Daniel Vlasic, Filip Pavetic, Keran Rong, Tianli Yu, Daniel Keysers, Xiaohua Zhai, Radu Soricut
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This paper presents PaLI-3, a smaller, faster, and stronger vision language model (VLM) that compares favorably to similar models that are 10x larger. As part of arriving at this strong performance, we compare Vision Transformer (ViT) models pretrained using classification objectives to contrastively (SigLIP) pretrained ones. We find that, while slightly underperforming on standard image classification benchmarks, SigLIP-based PaLI shows superior performance across various multimodal benchmarks, especially on localization and visually-situated text understanding. We scale the SigLIP image encoder up to 2 billion parameters and achieve a new state-of-the-art on multilingual cross-modal retrieval. We hope that PaLI-3, at only 5B parameters, rekindles research on fundamental pieces of complex VLMs, and could fuel a new generation of scaled-up models.
Keyword: mobile
Defect Analysis of 3D Printed Cylinder Object Using Transfer Learning Approaches
Abstract
Additive manufacturing (AM) is gaining attention across various industries like healthcare, aerospace, and automotive. However, identifying defects early in the AM process can reduce production costs and improve productivity - a key challenge. This study explored the effectiveness of machine learning (ML) approaches, specifically transfer learning (TL) models, for defect detection in 3D-printed cylinders. Images of cylinders were analyzed using models including VGG16, VGG19, ResNet50, ResNet101, InceptionResNetV2, and MobileNetV2. Performance was compared across two datasets using accuracy, precision, recall, and F1-score metrics. In the first study, VGG16, InceptionResNetV2, and MobileNetV2 achieved perfect scores. In contrast, ResNet50 had the lowest performance, with an average F1-score of 0.32. Similarly, in the second study, MobileNetV2 correctly classified all instances, while ResNet50 struggled with more false positives and fewer true positives, resulting in an F1-score of 0.75. Overall, the findings suggest certain TL models like MobileNetV2 can deliver high accuracy for AM defect classification, although performance varies across algorithms. The results provide insights into model optimization and integration needs for reliable automated defect analysis during 3D printing. By identifying the top-performing TL techniques, this study aims to enhance AM product quality through robust image-based monitoring and inspection.
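A typical transfer-learning setup of the kind compared in the study might look as follows in Keras; the image size, frozen base, and head layers are our assumptions, not the study's exact configuration:

```python
import tensorflow as tf

# Pretrained MobileNetV2 backbone, frozen as a fixed feature extractor
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False

# Small classification head for defective vs. non-defective cylinders
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.Precision(),
                       tf.keras.metrics.Recall()])
```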
Exploring the relationship between response time sequence in scale answering process and severity of insomnia: a machine learning approach
Authors: Zhao Su, Rongxun Liu, Keyin Zhou, Xinru Wei, Ning Wang, Zexin Lin, Yuanchen Xie, Jie Wang, Fei Wang, Shenzhong Zhang, Xizhe Zhang
Abstract
Objectives: The study aims to investigate the relationship between insomnia and response time. Additionally, it aims to develop a machine learning model to predict the presence of insomnia in participants using response time data. Methods: A mobile application was designed to administer scale tests and collect response time data from 2729 participants. The relationship between symptom severity and response time was explored, and a machine learning model was developed to predict the presence of insomnia. Results: The result revealed a statistically significant difference (p<.001) in the total response time between participants with or without insomnia symptoms. A correlation was observed between the severity of specific insomnia aspects and response times at the individual questions level. The machine learning model demonstrated a high predictive accuracy of 0.743 in predicting insomnia symptoms based on response time data. Conclusions: These findings highlight the potential utility of response time data to evaluate cognitive and psychological measures, demonstrating the effectiveness of using response time as a diagnostic tool in the assessment of insomnia.
Generative AI-driven Semantic Communication Framework for NextG Wireless Network
Authors: Avi Deb Raha, Md. Shirajum Munir, Apurba Adhikary, Yu Qiao, Choong Seon Hong
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
This work designs a novel semantic communication (SemCom) framework for the next-generation wireless network to tackle the challenges of unnecessary transmission of vast amounts that cause high bandwidth consumption, more latency, and experience with bad quality of services (QoS). In particular, these challenges hinder applications like intelligent transportation systems (ITS), metaverse, mixed reality, and the Internet of Everything, where real-time and efficient data transmission is paramount. Therefore, to reduce communication overhead and maintain the QoS of emerging applications such as metaverse, ITS, and digital twin creation, this work proposes a novel semantic communication framework. First, an intelligent semantic transmitter is designed to capture the meaningful information (e.g., the rode-side image in ITS) by designing a domain-specific Mobile Segment Anything Model (MSAM)-based mechanism to reduce the potential communication traffic while QoS remains intact. Second, the concept of generative AI is introduced for building the SemCom to reconstruct and denoise the received semantic data frame at the receiver end. In particular, the Generative Adversarial Network (GAN) mechanism is designed to maintain a superior quality reconstruction under different signal-to-noise (SNR) channel conditions. Finally, we have tested and evaluated the proposed semantic communication (SemCom) framework with the real-world 6G scenario of ITS; in particular, the base station equipped with an RGB camera and a mmWave phased array. Experimental results demonstrate the efficacy of the proposed SemCom framework by achieving high-quality reconstruction across various SNR channel conditions, resulting in 93.45% data reduction in communication.
Survey on Near-Space Information Networks: Channel Modeling, Networking, and Transmission Perspectives
Authors: Xianbin Cao, Peng Yang, Xiaoning Su
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
Near-space information networks (NSIN) composed of high-altitude platforms (HAPs), high- and low-altitude unmanned aerial vehicles (UAVs) are a new regime for providing quickly, robustly, and cost-efficiently sensing and communication services. Precipitated by innovations and breakthroughs in manufacturing, materials, communications, electronics, and control technologies, NSIN have emerged as an essential component of the emerging sixth-generation of mobile communication systems. This article aims at providing and discussing the latest advances in NSIN in the research areas of channel modeling, networking, and transmission from a forward-looking, comparative, and technological evolutionary perspective. In this article, we highlight the characteristics of NSIN and present the promising use-cases of NSIN. The impact of airborne platforms' unstable movements on the phase delays of onboard antenna arrays with diverse structures is mathematically analyzed. The recent advancements in HAP channel modeling are elaborated on, along with the significant differences between HAP and UAV channel modeling. A comprehensive review of the networking technologies of NSIN in network deployment, handoff management, and network management aspects is provided. Besides, the promising technologies and communication protocols of the physical layer, medium access control (MAC) layer, network layer, and transport layer of NSIN for achieving efficient transmission over NSIN are overviewed. Finally, we outline some open issues and promising directions of NSIN deserved for future study and discuss the corresponding challenges.
SAI: Solving AI Tasks with Systematic Artificial Intelligence in Communication Network
Authors: Lei Yao, Yong Zhang, Zilong Yan, Jialu Tian
Abstract
In the rapid development of artificial intelligence, solving complex AI tasks is a crucial technology in intelligent mobile networks. Despite the good performance of specialized AI models in intelligent mobile networks, they are unable to handle complicated AI tasks. To address this challenge, we propose Systematic Artificial Intelligence (SAI), a framework designed to solve AI tasks by leveraging Large Language Models (LLMs) and JSON-format intent-based input to connect a self-designed model library and database. Specifically, we first design a multi-input component, which simultaneously integrates Large Language Models (LLMs) and JSON-format intent-based inputs to fulfill the diverse intent requirements of different users. In addition, we introduce a model library module based on model cards, which are used to pairwise match different modules for model composition. Model cards contain the corresponding model's name and the required performance metrics. Then, upon receiving user network requirements, we execute each subtask for multiple selected model combinations and provide output based on the execution results and LLM feedback. By leveraging the language capabilities of LLMs and the abundant AI models in the model library, SAI can complete numerous complex AI tasks in the communication network, achieving impressive results in network optimization, resource allocation, and other challenging tasks.
Timestamp-supervised Wearable-based Activity Segmentation and Recognition with Contrastive Learning and Order-Preserving Optimal Transport
Authors: Songpengcheng Xia, Lei Chu, Ling Pei, Jiarui Yang, Wenxian Yu, Robert C. Qiu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Human activity recognition (HAR) with wearables is one of the enabling technologies in ubiquitous and mobile computing applications. The sliding-window scheme is widely adopted but suffers from the multi-class windows problem. As a result, there is a growing focus on joint segmentation and recognition with deep-learning methods, aiming to simultaneously deal with HAR and time-series segmentation issues. However, obtaining full activity annotations of wearable data sequences is resource-intensive or time-consuming, while unsupervised methods yield poor performance. To address these challenges, we propose a novel method for joint activity segmentation and recognition with timestamp supervision, in which only a single annotated sample is needed in each activity segment. However, the limited information of sparse annotations exacerbates the gap between recognition and segmentation tasks, leading to sub-optimal model performance. Therefore, the prototypes are estimated by class-activation maps to form a sample-to-prototype contrast module for well-structured embeddings. Moreover, with optimal transport theory, our approach generates sample-level pseudo-labels that take advantage of unlabeled data between timestamp annotations for further performance improvement. Comprehensive experiments on four public HAR datasets demonstrate that our model trained with timestamp supervision is superior to state-of-the-art weakly-supervised methods and achieves comparable performance to fully-supervised approaches.
Keyword: pruning
Selectivity Drives Productivity: Efficient Dataset Pruning for Enhanced Transfer Learning
Abstract
Massive data is often considered essential for deep learning applications, but it also incurs significant computational and infrastructural costs. Therefore, dataset pruning (DP) has emerged as an effective way to improve data efficiency by identifying and removing redundant training samples without sacrificing performance. In this work, we aim to address the problem of DP for transfer learning, i.e., how to prune a source dataset for improved pretraining efficiency and lossless finetuning accuracy on downstream target tasks. To our best knowledge, the problem of DP for transfer learning remains open, as previous studies have primarily addressed DP and transfer learning as separate problems. By contrast, we establish a unified viewpoint to integrate DP with transfer learning and find that existing DP methods are not suitable for the transfer learning paradigm. We then propose two new DP methods, label mapping and feature mapping, for supervised and self-supervised pretraining settings respectively, by revisiting the DP problem through the lens of source-target domain mapping. Furthermore, we demonstrate the effectiveness of our approach on numerous transfer learning tasks. We show that source data classes can be pruned by up to 40% ~ 80% without sacrificing downstream performance, resulting in a significant 2 ~ 5 times speed-up during the pretraining stage. Besides, our proposal exhibits broad applicability and can improve other computationally intensive transfer learning techniques, such as adversarial pretraining. Codes are available at https://github.com/OPTML-Group/DP4TL.
Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs
Authors: Yuxin Zhang, Lirui Zhao, Mingbao Lin, Yunyun Sun, Yiwu Yao, Xingjia Han, Jared Tanner, Shiwei Liu, Rongrong Ji
Abstract
Ever-larger large language models (LLMs), though opening a potential path toward artificial general intelligence, pose a daunting obstacle to on-device deployment. As one of the most well-established pre-LLM approaches to reducing model complexity, network pruning appears to lag behind in the era of LLMs, due mostly to the cost of the fine-tuning (or re-training) it requires given the massive volumes of model parameters and training data. To close this industry-academia gap, we introduce Dynamic Sparse No Training (DSnoT), a training-free fine-tuning approach that slightly adjusts sparse LLMs by pruning and growing weights, without expensive backpropagation or any gradient-based weight updates. Inspired by Dynamic Sparse Training, DSnoT minimizes the reconstruction error between the dense and sparse LLMs by performing iterative weight pruning-and-growing on top of sparse LLMs. To accomplish this, DSnoT takes into account the anticipated reduction in reconstruction error when pruning and growing, as well as the variance w.r.t. different input data when growing each weight. This procedure runs in linear time, since it obviates the need for backpropagation when fine-tuning LLMs. Extensive experiments on LLaMA-V1/V2, Vicuna, and OPT across various benchmarks demonstrate the effectiveness of DSnoT in enhancing the performance of sparse LLMs, especially at high sparsity levels. For instance, DSnoT outperforms the state-of-the-art Wanda by 26.79 perplexity points at 70% sparsity with LLaMA-7B. Our paper offers fresh insights into fine-tuning sparse LLMs in an efficient, training-free manner and opens new avenues for scaling the great potential of sparsity to LLMs. Code is available at https://github.com/zxyxmu/DSnoT.
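To make the pruning-and-growing loop concrete, here is a heavily simplified sketch. It keeps per-row sparsity fixed and scores weights with the |weight| x activation-norm heuristic popularized by Wanda; DSnoT's actual criterion is built on the expected reduction in reconstruction error and its variance across inputs:

```python
import torch

def prune_and_grow(W, mask, x_norm):
    """One illustrative prune-and-grow pass. W: (out, in) dense weights;
    mask: binary sparsity mask of the same shape; x_norm: (in,) per-channel
    activation norms from calibration data."""
    score = W.abs() * x_norm                 # saliency of every weight
    for row in range(W.shape[0]):            # per-row, sparsity-preserving
        kept = mask[row].bool()
        if kept.all() or (~kept).all():
            continue
        drop = score[row].masked_fill(~kept, float("inf")).argmin()   # weakest kept
        grow = score[row].masked_fill(kept, float("-inf")).argmax()   # strongest pruned
        mask[row, drop] = 0.0
        mask[row, grow] = 1.0
    return mask
```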
Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA
Authors: Sheng Zhou, Dan Guo, Jia Li, Xun Yang, Meng Wang
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Text-based visual question answering (TextVQA) faces the significant challenge of avoiding redundant relational inference. Specifically, the large number of detected objects and optical character recognition (OCR) tokens gives rise to rich visual relationships. Existing works take all visual relationships into account for answer prediction. However, we make three observations: (1) a single subject in an image can easily be detected as multiple objects with distinct bounding boxes (repetitive objects); the associations between these repetitive objects are superfluous for answer reasoning; (2) two spatially distant OCR tokens detected in an image frequently have weak semantic dependencies for answer reasoning; and (3) the co-existence of nearby objects and tokens may indicate important visual cues for predicting answers. Rather than utilizing all relationships for answer prediction, we make an effort to identify the most important connections and eliminate redundant ones. We propose a sparse spatial graph network (SSGN) that introduces a spatially aware relation pruning technique to this task. As spatial factors for relation measurement, we employ spatial distance, geometric dimension, overlap area, and DIoU for spatially aware pruning. We consider three visual relationships for graph learning: object-object, OCR-OCR token, and object-OCR token relationships. SSGN is a progressive graph learning architecture that verifies the pivotal relations in the correlated object-token sparse graph, and then in the respective object-based and token-based sparse graphs. Experimental results on the TextVQA and ST-VQA datasets demonstrate that SSGN achieves promising performance, and visualization results further demonstrate the interpretability of our method.
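For reference, DIoU (Distance-IoU) augments IoU with a normalized center-distance penalty; a plain-Python sketch follows (the thresholds SSGN applies on top of such measures are not reproduced here):

```python
def diou(box1, box2):
    """Distance-IoU between boxes in (x1, y1, x2, y2) format."""
    # intersection area
    ix1, iy1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    ix2, iy2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    iou = inter / (area1 + area2 - inter + 1e-9)
    # squared distance between box centers
    rho2 = ((box1[0] + box1[2] - box2[0] - box2[2]) ** 2
            + (box1[1] + box1[3] - box2[1] - box2[3]) ** 2) / 4.0
    # squared diagonal of the smallest enclosing box
    c2 = ((max(box1[2], box2[2]) - min(box1[0], box2[0])) ** 2
          + (max(box1[3], box2[3]) - min(box1[1], box2[1])) ** 2 + 1e-9)
    return iou - rho2 / c2
```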
Keyword: diffusion
Histogram- and Diffusion-Based Medical Out-of-Distribution Detection
Authors: Evi M.C. Huijben, Sina Amirrajab, Josien P.W. Pluim
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Out-of-distribution (OOD) detection is crucial for the safety and reliability of artificial intelligence algorithms, especially in the medical domain. In the context of the Medical OOD (MOOD) detection challenge 2023, we propose a pipeline that combines a histogram-based method and a diffusion-based method. The histogram-based method is designed to accurately detect homogeneous anomalies in the toy examples of the challenge, such as blobs with constant intensity values. The diffusion-based method builds on one of the latest methods for unsupervised anomaly detection, DDPM-OOD. We explore this method and propose extensive post-processing steps for pixel-level and sample-level anomaly detection on the brain MRI and abdominal CT data provided by the challenge. Our results show that the proposed DDPM method is sensitive to blur and bias-field samples, but faces challenges with anatomical deformations, black slices, and swapped patches. These findings suggest that further research is needed to improve the performance of DDPM for OOD detection in medical images.
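The homogeneous-anomaly detector can be as simple as a histogram distance; a toy sketch follows (the challenge pipeline's exact features and thresholds are not reproduced):

```python
import numpy as np

def histogram_ood_score(volume, ref_hist, bins=64, value_range=(0.0, 1.0)):
    """Compare a sample's intensity histogram against a reference histogram
    from in-distribution data via total-variation distance: 0 = identical,
    1 = fully disjoint."""
    hist, _ = np.histogram(volume.ravel(), bins=bins, range=value_range)
    hist = hist / max(hist.sum(), 1)
    ref = ref_hist / max(ref_hist.sum(), 1)
    return 0.5 * np.abs(hist - ref).sum()
```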
DeltaSpace: A Semantic-aligned Feature Space for Flexible Text-guided Image Editing
Authors: Yueming Lyu, Kang Zhao, Bo Peng, Yue Jiang, Yingya Zhang, Jing Dong
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Text-guided image editing faces significant challenges in training and inference flexibility. Much of the literature collects large amounts of annotated image-text pairs to train text-conditioned generative models from scratch, which is expensive and inefficient. Approaches that leverage pre-trained vision-language models avoid data collection, but they are limited by either per-text-prompt optimization or inference-time hyper-parameter tuning. To address these issues, we investigate and identify a specific space, referred to as CLIP DeltaSpace, where the CLIP visual feature difference of two images is semantically aligned with the CLIP textual feature difference of their corresponding text descriptions. Based on DeltaSpace, we propose a novel framework called DeltaEdit, which maps CLIP visual feature differences to the latent-space directions of a generative model during the training phase, and predicts the latent-space directions from CLIP textual feature differences during the inference phase. This design endows DeltaEdit with two advantages: (1) text-free training and (2) generalization to various text prompts for zero-shot inference. Extensive experiments validate the effectiveness and versatility of DeltaEdit with different generative models, including both GAN models and diffusion models, in achieving flexible text-guided image editing. Code is available at https://github.com/Yueming6568/DeltaEdit.
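A minimal sketch of the inference step, with a trained `mapper` network and precomputed CLIP text embeddings assumed (interfaces are ours for illustration):

```python
import torch

@torch.no_grad()
def edit_latent(z_src, e_src_text, e_tgt_text, mapper):
    """Turn a CLIP textual feature difference into a generator latent
    direction. `mapper` stands in for the network that, at training time,
    learned to map CLIP *visual* feature differences to latent directions."""
    delta_t = e_tgt_text - e_src_text                       # textual delta
    delta_t = delta_t / delta_t.norm(dim=-1, keepdim=True)  # normalize
    return z_src + mapper(delta_t)                          # edited latent
```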
DDMT: Denoising Diffusion Mask Transformer Models for Multivariate Time Series Anomaly Detection
Authors: Chaocheng Yang, Tingyin Wang, Xuanhui Yan
Abstract
Anomaly detection in multivariate time series has emerged as a crucial challenge in time-series research, with significant implications in fields such as fraud detection, fault diagnosis, and system state estimation. Reconstruction-based models have shown promising potential in recent years for detecting anomalies in time-series data. However, due to the rapid increase in data scale and dimensionality, the issues of noise and Weak Identity Mapping (WIM) during time-series reconstruction have become increasingly pronounced. To address this, we introduce a novel Adaptive Dynamic Neighbor Mask (ADNM) mechanism and integrate it with the Transformer and a denoising diffusion model, creating a new framework for multivariate time-series anomaly detection named the Denoising Diffusion Mask Transformer (DDMT). The ADNM module mitigates information leakage between input and output features during data reconstruction, thereby alleviating the WIM problem. The Denoising Diffusion Transformer (DDT) employs the Transformer as the internal neural network of the denoising diffusion model. It learns the stepwise generation process of time-series data to model the probability distribution of the data, capturing normal data patterns and progressively restoring time-series data by removing noise, resulting in a clear recovery of anomalies. To the best of our knowledge, this is the first model that combines a denoising diffusion model and the Transformer for multivariate time-series anomaly detection. Experimental evaluations were conducted on five publicly available multivariate time-series anomaly detection datasets. The results demonstrate that the model effectively identifies anomalies in time-series data, achieving state-of-the-art performance in anomaly detection.
R&B: Region and Boundary Aware Zero-shot Grounded Text-to-image Generation
Abstract
Recent text-to-image (T2I) diffusion models have achieved remarkable progress in generating high-quality images from text prompts. However, these models fail to convey the spatial composition specified by a layout instruction. In this work, we probe zero-shot grounded T2I generation with diffusion models, that is, generating images that correspond to input layout information without training auxiliary modules or finetuning the diffusion model. We propose a Region and Boundary (R&B) aware cross-attention guidance approach that gradually modulates the attention maps of the diffusion model during the generative process, helping the model synthesize images that (1) have high fidelity, (2) are highly compatible with the textual input, and (3) follow layout instructions accurately. Specifically, we leverage discrete sampling to bridge the gap between consecutive attention maps and discrete layout constraints, and design a region-aware loss to refine the generative layout during the diffusion process. We further propose a boundary-aware loss to strengthen object discriminability within the corresponding regions. Experimental results show that our method outperforms existing state-of-the-art zero-shot grounded T2I generation methods by a large margin, both qualitatively and quantitatively, on several benchmarks.
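One plausible shape for the region-aware term (ours, for illustration; the paper's losses are more elaborate) is to penalize a token's cross-attention mass that falls outside its layout box and descend on the latents:

```python
import torch

def region_guidance_loss(attn_map, box_mask):
    """attn_map: (H, W) cross-attention map of one text token; box_mask:
    (H, W) binary mask of that token's layout box. Returns the share of
    attention mass falling outside the box."""
    inside = (attn_map * box_mask).sum()
    return 1.0 - inside / (attn_map.sum() + 1e-8)

# Sketch of a guidance step on the diffusion latents z_t:
#   loss = region_guidance_loss(attn, mask)
#   z_t = z_t - eta * torch.autograd.grad(loss, z_t)[0]
```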
Making Multimodal Generation Easier: When Diffusion Models Meet LLMs
Abstract
We present EasyGen, an efficient model designed to enhance multimodal understanding and generation by harnessing the capabilities of diffusion models and large language models (LLMs). Unlike existing multimodal models that predominantly depend on encoders like CLIP or ImageBind and need ample training data to bridge the gap between modalities, EasyGen is built upon a bidirectional conditional diffusion model named BiDiffuser, which promotes more efficient interactions between modalities. EasyGen handles image-to-text generation by integrating BiDiffuser and an LLM via a simple projection layer. Unlike most existing multimodal models, which are limited to generating text responses, EasyGen can also facilitate text-to-image generation by leveraging the LLM to create textual descriptions, which BiDiffuser interprets to generate appropriate visual responses. Extensive quantitative and qualitative experiments demonstrate the effectiveness of EasyGen, whose training can easily be carried out in a lab setting. The source code is available at https://github.com/zxy556677/EasyGen.
MINDE: Mutual Information Neural Diffusion Estimation
Authors: Giulio Franzese, Mustapha Bounoua, Pietro Michiardi
Abstract
In this work we present a new method for estimating the Mutual Information (MI) between random variables. Our approach is based on an original interpretation of the Girsanov theorem, which allows us to use score-based diffusion models to estimate the Kullback-Leibler divergence between two densities as a difference between their score functions. As a by-product, our method also enables the estimation of the entropy of random variables. Armed with these building blocks, we present a general recipe to measure MI, which unfolds in two directions: one uses conditional diffusion processes, whereas the other uses joint diffusion processes that allow simultaneous modelling of two random variables. Our results, which derive from a thorough experimental protocol over all the variants of our approach, indicate that our method is more accurate than the main alternatives from the literature, especially for challenging distributions. Furthermore, our method passes MI self-consistency tests, including data processing and additivity under independence, which remain a pain point for existing methods.
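Schematically, the score-based building blocks take the following form (standard score-based diffusion notation, ours rather than the paper's exact statement):

$$\mathrm{KL}(p \,\|\, q) = \frac{1}{2}\int_0^T g(t)^2\, \mathbb{E}_{x_t \sim p_t}\!\left[\lVert s_p(x_t, t) - s_q(x_t, t) \rVert^2\right] dt, \qquad I(X;Y) = \mathbb{E}_{y}\!\left[\mathrm{KL}\big(p(x \mid y)\,\|\,p(x)\big)\right],$$

where $s_p$ and $s_q$ are the score functions of the diffused densities and $g(t)$ is the diffusion coefficient; using a conditional score yields the first variant, while a jointly trained score over $(X, Y)$ yields the joint-diffusion variant.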
SUPG-stabilized stabilization-free VEM: a numerical investigation
Authors: Andrea Borio, Martina Busetto, Francesca Marcon
Abstract
We numerically investigate the possibility of defining stabilization-free Virtual Element Method (VEM) discretizations of advection-diffusion problems in the advection-dominated regime. To this end, we consider an SUPG-stabilized formulation of the scheme. Numerical tests comparing the proposed method with standard VEM show that omitting the additional arbitrary stabilization term typical of VEM schemes, which adds artificial diffusion to the discrete solution, allows boundary layers to be approximated more accurately, particularly for low-order schemes.
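As a reminder, the classical SUPG-stabilized weak form of an advection-diffusion problem reads (standard finite-element notation; the virtual-element version applies suitable projections to these residual terms):

$$a(u_h, v_h) + \sum_{E \in \mathcal{T}_h} \tau_E \left( \boldsymbol{\beta}\cdot\nabla v_h,\; -\varepsilon\,\Delta u_h + \boldsymbol{\beta}\cdot\nabla u_h - f \right)_{0,E} = (f, v_h) \qquad \forall\, v_h \in V_h,$$

where $\varepsilon$ is the diffusion coefficient, $\boldsymbol{\beta}$ the advection field, and $\tau_E$ a local stabilization parameter.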
Unseen Image Synthesis with Diffusion Models
Authors: Ye Zhu, Yu Wu, Zhiwei Deng, Olga Russakovsky, Yan Yan
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
While the current trend in the generative field is scaling up towards larger models and more training data for generalized domain representations, we go in the opposite direction in this work by synthesizing unseen-domain images without additional training. We do so via latent sampling and geometric optimization using pre-trained, frozen Denoising Diffusion Probabilistic Models (DDPMs) trained on single-domain datasets. Our key observation is that DDPMs pre-trained even on single-domain images are already equipped with sufficient representation ability to reconstruct arbitrary images from their inverted latent encodings, following bi-directional deterministic diffusion and denoising trajectories. This motivates us to investigate the statistical and geometric behaviors of Out-Of-Distribution (OOD) samples from unseen image domains in the latent spaces along the denoising chain. Notably, we theoretically and empirically show that the inverted OOD samples also form Gaussians that are distinguishable from the original In-Domain (ID) samples in the intermediate latent spaces, which allows us to sample from them directly. Geometric, domain-specific and model-dependent information about the unseen subspace (e.g., sample-wise distances and angles) is used to further optimize the sampled OOD latent encodings from the estimated Gaussian prior. We conduct extensive analysis and experiments using pre-trained diffusion models (DDPM, iDDPM) on different datasets (AFHQ, CelebA-HQ, LSUN-Church, and LSUN-Bedroom), demonstrating the effectiveness of this novel perspective for exploring and rethinking diffusion models' generalization ability in data synthesis.
Hypernymy Understanding Evaluation of Text-to-Image Models via WordNet Hierarchy
Authors: Anton Baryshnikov, Max Ryabinin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
Text-to-image synthesis has recently attracted widespread attention due to rapidly improving quality and numerous practical applications. However, the language understanding capabilities of text-to-image models are still poorly understood, which makes it difficult to reason about prompt formulations that a given model would understand well. In this work, we measure the capability of popular text-to-image models to understand $\textit{hypernymy}$, or the "is-a" relation between words. We design two automatic metrics based on the WordNet semantic hierarchy and existing image classifiers pretrained on ImageNet. These metrics both enable broad quantitative comparison of linguistic capabilities for text-to-image models and offer a way of finding fine-grained qualitative differences, such as words that are unknown to models and thus are difficult for them to draw. We comprehensively evaluate popular text-to-image models, including GLIDE, Latent Diffusion, and Stable Diffusion, showing how our metrics can provide a better understanding of the individual strengths and weaknesses of these models.
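On the WordNet side, collecting the ImageNet classes that fall under a concept is straightforward; here is a sketch using NLTK (the class-index-to-synset mapping is assumed precomputed, and the paper's exact metric aggregation is not reproduced):

```python
from nltk.corpus import wordnet as wn

def imagenet_classes_under(concept, class_synsets):
    """Return the ImageNet class indices whose synsets lie under `concept`
    (e.g. 'bird.n.01'). class_synsets: dict mapping class index -> synset.
    A hypernymy score can then aggregate a pretrained classifier's
    probability mass over these classes for images generated from the
    hypernym prompt."""
    root = wn.synset(concept)
    descendants = set(root.closure(lambda s: s.hyponyms())) | {root}
    return [idx for idx, syn in class_synsets.items() if syn in descendants]
```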
Keyword: adaptive
Safe Deep Policy Adaptation
Authors: Wenli Xiao, Tairan He, John Dolan, Guanya Shi
Abstract
A critical goal of autonomy and artificial intelligence is enabling autonomous robots to rapidly adapt in dynamic and uncertain environments. Classic adaptive control and safe control provide stability and safety guarantees but are limited to specific system classes. In contrast, policy adaptation based on reinforcement learning (RL) offers versatility and generalizability but presents safety and robustness challenges. We propose SafeDPA, a novel RL and control framework that simultaneously tackles policy adaptation and safe reinforcement learning. SafeDPA jointly learns an adaptive policy and dynamics models in simulation, predicts environment configurations, and fine-tunes the dynamics models with few-shot real-world data. A safety filter based on Control Barrier Functions (CBFs), applied on top of the RL policy, is introduced to ensure safety during real-world deployment. We provide theoretical safety guarantees for SafeDPA and show its robustness against learning errors and extra perturbations. Comprehensive experiments on (1) classic control problems (Inverted Pendulum), (2) simulation benchmarks (Safety Gym), and (3) a real-world agile robotics platform (RC Car) demonstrate the superiority of SafeDPA in both safety and task performance over state-of-the-art baselines. In particular, SafeDPA demonstrates notable generalizability, achieving a 300% increase in safety rate compared to the baselines under unseen disturbances in real-world experiments.
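For a single affine constraint, a CBF safety filter has a closed form; a minimal sketch follows (SafeDPA's filter operates on top of its learned dynamics, which this toy version does not model):

```python
import numpy as np

def cbf_filter(u_rl, g_grad, h_val, alpha=1.0):
    """Project the RL action onto the half-space
    g_grad . u + alpha * h_val >= 0 (minimal-norm correction)."""
    c = g_grad @ u_rl + alpha * h_val
    if c >= 0:
        return u_rl                                  # already safe
    return u_rl - (c / (g_grad @ g_grad + 1e-12)) * g_grad
```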
Pay Attention to How You Drive: Safe and Adaptive Model-Based Reinforcement Learning for Off-Road Driving
Authors: Sean J. Wang, Honghao Zhu, Aaron M. Johnson
Abstract
Autonomous off-road driving is challenging, as risky actions taken by the robot may lead to catastrophic damage. As such, developing controllers in simulation is often desirable, as it provides a safer and more economical alternative. However, accurately modeling robot dynamics is difficult due to complex robot dynamics and terrain interactions in unstructured environments. Domain randomization addresses this problem by randomizing simulation dynamics parameters; however, this approach sacrifices performance for robustness, leading to policies that are sub-optimal for any target dynamics. We introduce a novel model-based reinforcement learning approach that aims to balance robustness with adaptability. Our approach trains a System Identification Transformer (SIT) and an Adaptive Dynamics Model (ADM) under a variety of simulated dynamics. The SIT uses attention mechanisms to distill state-transition observations from the target system into a context vector, which provides an abstraction of the target dynamics. Conditioned on this, the ADM probabilistically models the system's dynamics. Online, we use a Risk-Aware Model Predictive Path Integral (MPPI) controller to safely control the robot under its current understanding of the dynamics. We demonstrate, in simulation as well as in multiple real-world environments, that this approach enables safer behaviors upon initialization and becomes less conservative (i.e., faster) as its understanding of the target system dynamics improves with more observations. In particular, our approach yields an approximately 41% improvement in lap time over the non-adaptive baseline while remaining safe across different environments.
An Efficient Resilient MPC Scheme via Constraint Tightening against Cyberattacks: Application to Vehicle Cruise Control
Authors: Milad Farsi, Shuhao Bian, Nasser L. Azad, Xiaobing Shi, Andrew Walenstein
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract
We propose a novel framework for designing resilient Model Predictive Control (MPC) for uncertain linear systems under cyberattack. Assuming a periodic attack scenario, we model the system under a Denial of Service (DoS) attack, together with measurement noise, as an uncertain linear system with parametric and additive uncertainty. To detect anomalies, we employ a Kalman filter-based approach. Then, from our observations of the intensity of the launched attack, we determine a range of possible values for the system matrices and establish bounds on the additive uncertainty for the equivalent uncertain system. Leveraging a recent constraint-tightening robust MPC method, we present an optimization-based resilient algorithm. Accordingly, we compute the uncertainty bounds and corresponding constraints offline for various attack magnitudes; this data can then be used efficiently in the online MPC computations. We demonstrate the effectiveness of the developed framework on the Adaptive Cruise Control (ACC) problem.
Constrained Bayesian Optimization with Adaptive Active Learning of Unknown Constraints
Abstract
Optimizing objectives under constraints, where both the objectives and constraints are black box functions, is a common scenario in real-world applications such as scientific experimental design, design of medical therapies, and industrial process optimization. One popular approach to handling these complex scenarios is Bayesian Optimization (BO). In terms of theoretical behavior, BO is relatively well understood in the unconstrained setting, where its principles have been well explored and validated. However, when it comes to constrained Bayesian optimization (CBO), the existing framework often relies on heuristics or approximations without the same level of theoretical guarantees. In this paper, we delve into the theoretical and practical aspects of constrained Bayesian optimization, where the objective and constraints can be independently evaluated and are subject to noise. By recognizing that both the objective and constraints can help identify high-confidence regions of interest (ROI), we propose an efficient CBO framework that intersects the ROIs identified from each aspect to determine the general ROI. The ROI, coupled with a novel acquisition function that adaptively balances the optimization of the objective and the identification of feasible regions, enables us to derive rigorous theoretical justifications for its performance. We showcase the efficiency and robustness of our proposed CBO framework through empirical evidence and discuss the fundamental challenge of deriving practical regret bounds for CBO algorithms.
DDMT: Denoising Diffusion Mask Transformer Models for Multivariate Time Series Anomaly Detection
Authors: Chaocheng Yang, Tingyin Wang, Xuanhui Yan
Abstract
Anomaly detection in multivariate time series has emerged as a crucial challenge in time-series research, with significant implications in fields such as fraud detection, fault diagnosis, and system state estimation. Reconstruction-based models have shown promising potential in recent years for detecting anomalies in time-series data. However, due to the rapid increase in data scale and dimensionality, the issues of noise and Weak Identity Mapping (WIM) during time-series reconstruction have become increasingly pronounced. To address this, we introduce a novel Adaptive Dynamic Neighbor Mask (ADNM) mechanism and integrate it with the Transformer and a denoising diffusion model, creating a new framework for multivariate time-series anomaly detection named the Denoising Diffusion Mask Transformer (DDMT). The ADNM module mitigates information leakage between input and output features during data reconstruction, thereby alleviating the WIM problem. The Denoising Diffusion Transformer (DDT) employs the Transformer as the internal neural network of the denoising diffusion model. It learns the stepwise generation process of time-series data to model the probability distribution of the data, capturing normal data patterns and progressively restoring time-series data by removing noise, resulting in a clear recovery of anomalies. To the best of our knowledge, this is the first model that combines a denoising diffusion model and the Transformer for multivariate time-series anomaly detection. Experimental evaluations were conducted on five publicly available multivariate time-series anomaly detection datasets. The results demonstrate that the model effectively identifies anomalies in time-series data, achieving state-of-the-art performance in anomaly detection.
Overcoming Recency Bias of Normalization Statistics in Continual Learning: Balance and Adaptation
Authors: Yilin Lyu, Liyuan Wang, Xingxing Zhang, Zicheng Sun, Hang Su, Jun Zhu, Liping Jing
Abstract
Continual learning entails learning a sequence of tasks and balancing their knowledge appropriately. With limited access to old training samples, much of the current work on deep neural networks has focused on overcoming catastrophic forgetting of old tasks in gradient-based optimization. However, normalization layers are an exception, as they are updated interdependently by the gradient and the statistics of currently observed training samples, which requires specialized strategies to mitigate recency bias. In this work, we focus on the most popular choice, Batch Normalization (BN), and provide an in-depth theoretical analysis of its sub-optimality in continual learning. Our analysis demonstrates a dilemma between the balance and the adaptation of BN statistics for incremental tasks, which potentially affects training stability and generalization. Targeting these particular challenges, we propose Adaptive Balance of BN (AdaB$^2$N), which incorporates a Bayesian strategy to adapt task-wise contributions and a modified momentum to balance BN statistics, corresponding to the training and testing stages. By implementing BN in a continual learning fashion, our approach achieves significant performance gains across a wide range of benchmarks, particularly for challenging yet realistic online scenarios (e.g., up to 7.68%, 6.86% and 4.26% on Split CIFAR-10, Split CIFAR-100 and Split Mini-ImageNet, respectively). Our code is available at https://github.com/lvyilin/AdaB2N.
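Mechanically, "balancing" BN statistics can be pictured as a momentum that decays with the number of observed batches, so that old tasks are not washed out; the rule below is an illustrative toy, not AdaB$^2$N's actual derivation:

```python
def update_bn_stats(run_mean, run_var, batch_mean, batch_var, t, m0=0.1):
    """Exponential-moving-average BN update with a momentum that shrinks
    as more batches t have been seen (toy balancing rule)."""
    m = m0 / (1.0 + t)
    new_mean = (1 - m) * run_mean + m * batch_mean
    new_var = (1 - m) * run_var + m * batch_var
    return new_mean, new_var
```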
Adaptivity and Modularity for Efficient Generalization Over Task Complexity
Abstract
Can transformers generalize efficiently on problems that require dealing with examples of different levels of difficulty? We introduce a new task tailored to assess generalization over different complexities and present results indicating that standard transformers face challenges in solving these tasks. These tasks are variations of pointer-value retrieval, previously introduced by Zhang et al. (2021). We investigate how a mechanism for adaptive and modular computation in transformers facilitates the learning of tasks that demand generalization over the number of sequential computation steps (i.e., the depth of the computation graph). Based on our observations, we propose a transformer-based architecture called Hyper-UT, which combines dynamic function generation from hypernetworks with adaptive depth from Universal Transformers. This model demonstrates higher accuracy and a fairer allocation of computational resources when generalizing to higher numbers of computation steps. We conclude that mechanisms for adaptive depth and modularity complement each other in improving efficient generalization with respect to example complexity. Additionally, to emphasize the broad applicability of our findings, we show that in a standard image recognition task, Hyper-UT's performance matches that of a ViT model but with considerably reduced computational demands (achieving over 70% average savings by effectively using fewer layers).
Multi-level Adaptive Contrastive Learning for Knowledge Internalization in Dialogue Generation
Abstract
Knowledge-grounded dialogue generation aims to mitigate the issue of text degeneration by incorporating external knowledge to supplement the context. However, the model often fails to internalize this information into responses in a human-like manner; instead, it simply inserts segments of the provided knowledge into generic responses. As a result, the generated responses tend to be tedious, incoherent, and lacking in interactivity, meaning the degeneration problem remains unsolved. In this work, we first find that such copying-style degeneration is primarily due to the weak likelihood objective, which allows the model to "cheat" the objective by merely duplicating knowledge segments through superficial, overlap-based pattern matching. To overcome this challenge, we then propose a Multi-level Adaptive Contrastive Learning (MACL) framework that dynamically samples negative examples and subsequently penalizes degeneration behaviors at both the token level and the sequence level. Extensive experiments on the WoW dataset demonstrate the effectiveness of our approach across various pre-trained models.
LRRU: Long-short Range Recurrent Updating Networks for Depth Completion
Authors: Yufei Wang, Bo Li, Ge Zhang, Qi Liu, Tao Gao, Yuchao Dai
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Existing deep learning-based depth completion methods generally employ massive stacked layers to predict a dense depth map from sparse input data. Although such approaches have greatly advanced this task, their huge computational complexity hinders practical application. To accomplish depth completion more efficiently, we propose a novel lightweight deep network framework, the Long-short Range Recurrent Updating (LRRU) network. Without learning complex feature representations, LRRU first roughly fills the sparse input to obtain an initial dense depth map, and then iteratively updates it through learned, spatially-variant kernels. Our iterative update process is content-adaptive and highly flexible: the kernel weights are learned by jointly considering the guidance RGB images and the depth map to be updated, and large-to-small kernel scopes are dynamically adjusted to capture long-to-short range dependencies. The initial depth map provides coarse but complete scene depth, which relieves the burden of directly regressing dense depth from sparse input, and our method can effectively refine it into an accurate depth map with fewer learnable parameters and less inference time. Experimental results demonstrate that our LRRU variants achieve state-of-the-art performance across different parameter regimes. In particular, the LRRU-Base model outperforms competing approaches on the NYUv2 dataset and ranked 1st on the KITTI depth completion benchmark at the time of submission. Project page: https://npucvr.github.io/LRRU/.
PAGE: Equilibrate Personalization and Generalization in Federated Learning
Abstract
Federated learning (FL) is becoming a major driving force behind machine learning as a service, where customers (clients) collaboratively benefit from shared local updates under the orchestration of the service provider (server). Local model personalization, representing clients' current demands, and global model generalization, representing the server's future demand, are usually investigated separately, as the ill effects of data heterogeneity force the community to focus on one over the other. However, these two seemingly competing goals are of equal importance rather than an either-or choice, and should be pursued simultaneously. In this paper, we propose the first algorithm to balance personalization and generalization using game theory, dubbed PAGE, which reshapes FL as a co-opetition game between clients and the server. To explore the equilibrium, PAGE further formulates the game as a Markov decision process and leverages reinforcement learning, which simplifies the solution. Extensive experiments on four widespread datasets show that PAGE outperforms state-of-the-art FL baselines in terms of global and local prediction accuracy simultaneously, with accuracy improvements of up to 35.20% and 39.91%, respectively. In addition, biased variants of PAGE show promising adaptiveness to demand shifts in practice.
Reroute Prediction Service
Authors: Ítalo Romani de Oliveira, Samet Ayhan, Michael Biglin, Pablo Costas, Euclides C. Pinto Neto
Abstract
The cost of delays in the US National Airspace System was estimated at 33 billion US dollars in 2019 alone, a peak value following a growth trend in past years. Aiming to address this huge inefficiency, we designed and developed a novel Data Analytics and Machine Learning system, which aims to reduce delays by proactively supporting re-routing decisions. Given a time interval up to a few days in the future, the system predicts whether a reroute advisory for a certain Air Route Traffic Control Center or for a certain advisory identifier will be issued, which may impact the pertinent routes. To deliver such predictions, the system uses historical reroute data, collected from the System Wide Information Management (SWIM) data services provided by the FAA, and weather data, provided by the US National Centers for Environmental Prediction (NCEP). The data is huge in volume, with many items streamed at high velocity, uncorrelated and noisy. The system continuously processes the incoming raw data and makes it available for the next step, in which an interim data store is created and adaptively maintained for efficient query processing. The resulting data is fed into an array of ML algorithms, which compete for higher accuracy; the best-performing algorithm is used to generate the final predictions. Mean accuracy values higher than 90% were obtained in our experiments with this system. Our algorithm divides the area of interest into units of aggregation and uses temporal series of aggregate weather-forecast measures in each geographical unit to detect correlations with reroutes and where they will most likely occur. Aiming at practical application, the system is composed of a number of microservices deployed in the cloud, making it distributed, scalable, and highly available.
Federated Meta-Learning for Few-Shot Fault Diagnosis with Representation Encoding
Authors: Jixuan Cui, Jun Li, Zhen Mei, Kang Wei, Sha Wei, Ming Ding, Wen Chen, Song Guo
Abstract
Deep learning-based fault diagnosis (FD) approaches require a large amount of training data, which is difficult to obtain because it is spread across different entities. Federated learning (FL) enables multiple clients to collaboratively train a shared model with data privacy guaranteed. However, the domain discrepancy and data scarcity among clients deteriorate the performance of the global FL model. To tackle these issues, we propose a novel framework called representation encoding-based federated meta-learning (REFML) for few-shot FD. First, a novel training strategy based on representation encoding and meta-learning is developed. It harnesses the inherent heterogeneity among training clients, effectively turning it into an advantage for out-of-distribution generalization on unseen working conditions or equipment types. Additionally, an adaptive interpolation method is proposed that computes the optimal combination of local and global models as the initialization of local training. This helps to further utilize local information to mitigate the negative effects of domain discrepancy. As a result, high diagnostic accuracy can be achieved on unseen working conditions or equipment types with limited training data. Compared with state-of-the-art methods such as FedProx, the proposed REFML framework achieves accuracy improvements of 2.17%-6.50% when tested on unseen working conditions of the same equipment type and 13.44%-18.33% when tested on entirely unseen equipment types.
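The adaptive interpolation amounts to initializing each round of local training from a convex combination of local and global weights; a minimal sketch follows (how REFML chooses the coefficient is not reproduced here):

```python
def init_local_model(local_params, global_params, alpha):
    """Initialize local training from alpha * local + (1 - alpha) * global,
    where params are dicts of tensors and alpha in [0, 1] is the (adaptively
    chosen) mixing coefficient."""
    return {k: alpha * local_params[k] + (1.0 - alpha) * global_params[k]
            for k in global_params}
```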
DATT: Deep Adaptive Trajectory Tracking for Quadrotor Control
Authors: Kevin Huang, Rwik Rana, Alexander Spitzer, Guanya Shi, Byron Boots
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract
Precise arbitrary trajectory tracking for quadrotors is challenging due to unknown nonlinear dynamics, trajectory infeasibility, and actuation limits. To tackle these challenges, we present Deep Adaptive Trajectory Tracking (DATT), a learning-based approach that can precisely track arbitrary, potentially infeasible trajectories in the presence of large disturbances in the real world. DATT builds on a novel feedforward-feedback-adaptive control structure trained in simulation using reinforcement learning. When deployed on real hardware, DATT is augmented with a disturbance estimator using L1 adaptive control in closed loop, without any fine-tuning. DATT significantly outperforms competitive adaptive nonlinear and model predictive controllers on both feasible smooth and infeasible trajectories in unsteady wind fields, including challenging scenarios where baselines fail completely. Moreover, DATT runs efficiently online with an inference time below 3.2 ms, less than a quarter of that of the adaptive nonlinear model predictive control baseline.
ImageManip: Image-based Robotic Manipulation with Affordance-guided Next View Selection
Authors: Xiaoqi Li, Yanzi Wang, Yan Shen, Ponomarenko Iaroslav, Haoran Lu, Qianxu Wang, Boshi An, Jiaming Liu, Hao Dong
Abstract
In the realm of future home-assistant robots, 3D articulated object manipulation is essential for enabling robots to interact with their environment. Many existing studies use 3D point clouds as the primary input for manipulation policies. However, this approach encounters challenges due to data sparsity and the significant cost of acquiring point cloud data, which can limit its practicality. In contrast, RGB images offer high-resolution observations from cost-effective devices but lack spatial 3D geometric information. To overcome these limitations, we present a novel image-based robotic manipulation framework. This framework is designed to capture multiple perspectives of the target object and infer depth information to complement its geometry. Initially, the system employs an eye-on-hand RGB camera to capture an overall view of the target object and predicts an initial depth map and a coarse affordance map. The affordance map indicates actionable areas on the object and serves as a constraint for selecting subsequent viewpoints. Based on the global visual prior, we adaptively identify the optimal next viewpoint for a detailed observation of the potential manipulation success area. We leverage geometric consistency to fuse the views, resulting in a refined depth map and a more precise affordance map for robot manipulation decisions. By comparing with prior works that adopt point clouds or RGB images as inputs, we demonstrate the effectiveness and practicality of our method. On the project webpage (https://sites.google.com/view/imagemanip), real-world experiments further highlight the potential of our method for practical deployment.
A RISC-V MCU with adaptive reverse body bias and ultra-low-power retention mode in 22 nm FD-SOI
Authors: Heiner Bauer, Marco Stolba, Stefan Scholze, Dennis Walter, Christian Mayr, Alexander Oefelein, Sebastian Höppner, André Scharfe, Flo Schraut, Holger Eisenreich
Abstract
We present a low-power, energy-efficient 32-bit RISC-V microcontroller unit (MCU) in 22 nm FD-SOI. It achieves ultra-low leakage, even at high temperatures, by using an adaptive reverse-body-bias-aware sign-off approach, a low-power optimized physical implementation, and custom SRAM macros with a retention mode. We demonstrate the robustness of the chip with measurements over the full industrial temperature range, from -40 °C to 125 °C. Our results match the state of the art (SOTA) with 4.8 µW/MHz at 50 MHz in active mode and surpass the SOTA in ultra-low-power retention mode.
Training and Predicting Visual Error for Real-Time Applications
Authors: João Libório Cardoso, Bernhard Kerbl, Lei Yang, Yury Uralsky, Michael Wimmer
Abstract
Visual error metrics play a fundamental role in the quantification of perceived image similarity. Most recently, use cases for them in real-time applications have emerged, such as content-adaptive shading and shading reuse to increase performance and improve efficiency. A wide range of different metrics has been established, with the most sophisticated being capable of capturing the perceptual characteristics of the human visual system. However, their complexity, computational expense, and reliance on reference images to compare against prevent their generalized use in real-time, restricting such applications to using only the simplest available metrics. In this work, we explore the abilities of convolutional neural networks to predict a variety of visual metrics without requiring either reference or rendered images. Specifically, we train and deploy a neural network to estimate the visual error resulting from reusing shading or using reduced shading rates. The resulting models account for 70%-90% of the variance while achieving up to an order of magnitude faster computation times. Our solution combines image-space information that is readily available in most state-of-the-art deferred shading pipelines with reprojection from previous frames to enable an adequate estimate of visual errors, even in previously unseen regions. We describe a suitable convolutional network architecture and considerations for data preparation for training. We demonstrate the capability of our network to predict complex error metrics at interactive rates in a real-time application that implements content-adaptive shading in a deferred pipeline. Depending on the portion of unseen image regions, our approach can achieve up to $2\times$ performance compared to state-of-the-art methods.
Variational autoencoder with weighted samples for high-dimensional non-parametric adaptive importance sampling
Authors: Julien Demange-Chryst, François Bachoc, Jérôme Morio, Timothé Krauth
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST)
Abstract
Probability density function estimation with weighted samples is the main foundation of all adaptive importance sampling algorithms. Classically, a target distribution is approximated either by a non-parametric model or within a parametric family. However, such models suffer from the curse of dimensionality or from a lack of flexibility. In this contribution, we suggest using a distribution parameterized by a variational autoencoder as the approximating model. We extend the existing framework to the case of weighted samples by introducing a new objective function. The flexibility of the obtained family of distributions makes it as expressive as a non-parametric model, and despite the very high number of parameters to estimate, this family is much more efficient in high dimension than the classical Gaussian or Gaussian mixture families. Moreover, to add flexibility to the model and to enable the learning of multimodal distributions, we consider a learnable prior distribution for the variational autoencoder latent variables. We also introduce a new pre-training procedure for the variational autoencoder to find good starting weights for the neural networks and to prevent posterior collapse as far as possible. Finally, we make explicit how the resulting distribution can be combined with importance sampling, and we exploit the proposed procedure within existing adaptive importance sampling algorithms to draw points from a target distribution and to estimate a rare-event probability in high dimension on two multimodal problems.
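One natural form for the weighted objective (a sketch of the general idea, not the paper's exact function) scales each sample's negative ELBO by its normalized importance weight:

```python
import torch

def weighted_nelbo(recon_nll, kl, weights):
    """recon_nll: (N,) per-sample reconstruction negative log-likelihoods;
    kl: (N,) per-sample KL terms; weights: (N,) non-negative importance
    weights. Returns the weight-averaged negative ELBO."""
    w = weights / weights.sum()
    return torch.sum(w * (recon_nll + kl))
```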
Keyword: quantization
LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
Abstract
Quantization is an indispensable technique for serving Large Language Models (LLMs) and has recently found its way into LoRA fine-tuning. In this work we focus on the scenario where quantization and LoRA fine-tuning are applied together to a pre-trained model. In such cases, it is common to observe a consistent gap on downstream tasks between full fine-tuning and the quantization-plus-LoRA fine-tuning approach. In response, we propose LoftQ (LoRA-Fine-Tuning-aware Quantization), a novel quantization framework that simultaneously quantizes an LLM and finds a proper low-rank initialization for LoRA fine-tuning. Such an initialization alleviates the discrepancy between the quantized and full-precision models and significantly improves generalization on downstream tasks. We evaluate our method on natural language understanding, question answering, summarization, and natural language generation tasks. Experiments show that our method is highly effective and outperforms existing quantization methods, especially in the challenging 2-bit and 2/4-bit mixed-precision regimes. We will release our code.
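The core initialization can be sketched as an alternating minimization in which quantizing the residual and fitting a rank-r correction take turns (our rendering; `quant`/`dequant` stand in for any low-bit quantizer):

```python
import torch

def loftq_style_init(W, rank, quant, dequant, steps=5):
    """Alternately quantize the residual and fit a truncated-SVD low-rank
    correction so that dequant(Q) + A @ B approximates the pretrained W."""
    A = torch.zeros(W.shape[0], rank)
    B = torch.zeros(rank, W.shape[1])
    for _ in range(steps):
        Q = quant(W - A @ B)                       # quantize current residual
        R = W - dequant(Q)                         # remaining error
        U, S, Vh = torch.linalg.svd(R, full_matrices=False)
        A = U[:, :rank] * S[:rank]                 # rank-r factors of R
        B = Vh[:rank, :]
    return Q, A, B                                 # quantized base + LoRA init
```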
Towards End-to-end 4-Bit Inference on Generative Large Language Models
Authors: Saleh Ashkboos, Ilia Markov, Elias Frantar, Tingxuan Zhong, Xincheng Wang, Jie Ren, Torsten Hoefler, Dan Alistarh
Abstract
We show that the majority of the inference computations for large generative models such as LLaMA and OPT can be performed with both weights and activations being cast to 4 bits, in a way that leads to practical speedups while at the same time maintaining good accuracy. We achieve this via a hybrid quantization strategy called QUIK, which compresses most of the weights and activations to 4-bit, while keeping some outlier weights and activations in higher-precision. Crucially, our scheme is designed with computational efficiency in mind: we provide GPU kernels with highly-efficient layer-wise runtimes, which lead to practical end-to-end throughput improvements of up to 3.1x relative to FP16 execution. Code and models are provided at https://github.com/IST-DASLab/QUIK.
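The outlier idea can be pictured as follows (an illustrative sketch; QUIK's actual selection rule, per-group scales, and GPU kernels differ):

```python
import torch

def split_and_quantize(W, outlier_frac=0.01, n_bits=4):
    """Keep the highest-norm columns of W in full precision and round the
    rest to n_bits integers with a per-column scale."""
    k = max(1, int(outlier_frac * W.shape[1]))
    keep = torch.zeros(W.shape[1], dtype=torch.bool)
    keep[W.norm(dim=0).topk(k).indices] = True      # outlier columns
    base = W[:, ~keep]
    qmax = 2 ** (n_bits - 1) - 1
    scale = base.abs().max(dim=0, keepdim=True).values / qmax
    q = torch.clamp((base / scale).round(), -qmax - 1, qmax)
    return q, scale, W[:, keep], keep               # low-bit part + FP outliers
```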