Abstract
Aircraft design is an iterative process that requires an estimation of aerodynamic characteristics including drag and lift coefficients, stall behavior, velocity, and pressure profiles repeatedly, especially in the conceptual design phase. PanAir is a high-order aerodynamic panel method-based algorithm developed as a part of the Public Domain Aeronautical Software program mostly with NASA sponsorship. It is based upon potential flow theory and is used to numerically compute lift, induced drag, and moment coefficients of the aircraft in both subsonic and supersonic flight regimes. Quoting from developers it is the most versatile and accurate of all the linear theory panel codes. PanAir is a classical software that requires geometric data as an input in the form of a PanAir-compatible format; however, commonly used Computer-Aided Design Software packages no longer conform to the PanAir input format. Likewise, PanAir produces its output in a PanAir-specific output file which is not compatible with commonly used visualization software. The input geometry required by PanAir and its output, therefore, involves significant manual preand post-processing using intermediary software. The work proposed here is an automated pre and post-processor to be used together with PanAir. With the environment proposed in this work, manipulation of input and output data using several intermediary software to and from PanAir is bypassed successfully. The proposed environment is validated over a Cessna 210 aircraft geometry with a modified NLF (1)-0414 airfoil. The aircraft is numerically analyzed using PanAir together
Improved Shortest Path Restoration Lemmas for Multiple Edge Failures: Trade-offs Between Fault-tolerance and Subpaths
Authors: Greg Bodwin, Lily Wang
Subjects: Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)
Abstract
The restoration lemma is a classic result by Afek, Bremler-Barr, Kaplan, Cohen, and Merritt [PODC '01], which relates the structure of shortest paths in a graph $G$ before and after some edges in the graph fail. Their work shows that, after one edge failure, any replacement shortest path avoiding this failing edge can be partitioned into two pre-failure shortest paths. More generally, this implies an additive tradeoff between fault tolerance and subpath count: for any $f, k$, we can partition any $f$-edge-failure replacement shortest path into $k+1$ subpaths which are each an $(f-k)$-edge-failure replacement shortest path. This generalized result has found applications in routing, graph algorithms, fault tolerant network design, and more. Our main result improves this to a multiplicative tradeoff between fault tolerance and subpath count. We show that for all $f, k$, any $f$-edge-failure replacement path can be partitioned into $O(k)$ subpaths that are each an $(f/k)$-edge-failure replacement path. We also show an asymptotically matching lower bound. In particular, our results imply that the original restoration lemma is exactly tight in the case $k=1$, but can be significantly improved for larger $k$. We also show an extension of this result to weighted input graphs, and we give efficient algorithms that compute path decompositions satisfying our improved restoration lemmas.
Fast Safe Rectangular Corridor-based Online AGV Trajectory Optimization with Obstacle Avoidance
Authors: Shaoqiang Liang, Songyuan Fa, Zong Chen, Yiqun Li
Abstract
Automated Guided Vehicles (AGVs) are widely adopted in various industries due to their efficiency and adaptability. However, safely deploying AGVs in dynamic environments remains a significant challenge. This paper introduces an online trajectory optimization framework, the Fast Safe Rectangular Corridor (FSRC), designed for AGVs in obstacle-rich settings. The primary challenge is efficiently planning trajectories that prioritize safety and collision avoidance. To tackle this challenge, the FSRC algorithm constructs convex regions, represented as rectangular corridors, to address obstacle avoidance constraints within an optimal control problem. This conversion from non-convex to box constraints improves the collision avoidance efficiency and quality. Additionally, the Modified Visibility Graph algorithm speeds up path planning, and a boundary discretization strategy expedites FSRC construction. The framework also includes a dynamic obstacle avoidance strategy for real-time adaptability. Our framework's effectiveness and superiority have been demonstrated in experiments, particularly in computational efficiency (see Fig. \ref{fig:case1} and \ref{fig:case23}). Compared to state-of-the-art frameworks, our trajectory planning framework significantly enhances computational efficiency, ranging from 1 to 2 orders of magnitude (see Table \ref{tab:res}). Notably, the FSRC algorithm outperforms other safe convex corridor-based methods, substantially improving computational efficiency by 1 to 2 orders of magnitude (see Table \ref{tab:FRSC}).
Inclusive-PIM: Hardware-Software Co-design for Broad Acceleration on Commercial PIM Architectures
Authors: Johnathan Alsop, Shaizeen Aga, Mohamed Ibrahim, Mahzabeen Islam, Andrew Mccrabb, Nuwan Jayasena
Abstract
Continual demand for memory bandwidth has made it worthwhile for memory vendors to reassess processing in memory (PIM), which enables higher bandwidth by placing compute units in/near-memory. As such, memory vendors have recently proposed commercially viable PIM designs. However, these proposals are largely driven by the needs of (a narrow set of) machine learning (ML) primitives. While such proposals are reasonable given the the growing importance of ML, as memory is a pervasive component, %in this work, we make there is a case for a more inclusive PIM design that can accelerate primitives across domains. In this work, we ascertain the capabilities of commercial PIM proposals to accelerate various primitives across domains. We first begin with outlining a set of characteristics, termed PIM-amenability-test, which aid in assessing if a given primitive is likely to be accelerated by PIM. Next, we apply this test to primitives under study to ascertain efficient data-placement and orchestration to map the primitives to underlying PIM architecture. We observe here that, even though primitives under study are largely PIM-amenable, existing commercial PIM proposals do not realize their performance potential for these primitives. To address this, we identify bottlenecks that arise in PIM execution and propose hardware and software optimizations which stand to broaden the acceleration reach of commercial PIM designs (improving average PIM speedups from 1.12x to 2.49x relative to a GPU baseline). Overall, while we believe emerging commercial PIM proposals add a necessary and complementary design point in the application acceleration space, hardware-software co-design is necessary to deliver their benefits broadly.
Abstract
Text-to-image diffusion models understand spatial relationship between objects, but do they represent the true 3D structure of the world from only 2D supervision? We demonstrate that yes, 3D knowledge is encoded in 2D image diffusion models like Stable Diffusion, and we show that this structure can be exploited for 3D vision tasks. Our method, Viewpoint Neural Textual Inversion (ViewNeTI), controls the 3D viewpoint of objects in generated images from frozen diffusion models. We train a small neural mapper to take camera viewpoint parameters and predict text encoder latents; the latents then condition the diffusion generation process to produce images with the desired camera viewpoint. ViewNeTI naturally addresses Novel View Synthesis (NVS). By leveraging the frozen diffusion model as a prior, we can solve NVS with very few input views; we can even do single-view novel view synthesis. Our single-view NVS predictions have good semantic details and photorealism compared to prior methods. Our approach is well suited for modeling the uncertainty inherent in sparse 3D vision problems because it can efficiently generate diverse samples. Our view-control mechanism is general, and can even change the camera view in images generated by user-defined prompts.
Bipedal Walking on Constrained Footholds with MPC Footstep Control
Abstract
Bipedal robots promise the ability to traverse rough terrain quickly and efficiently, and indeed, humanoid robots can now use strong ankles and careful foot placement to traverse discontinuous terrain. However, more agile underactuated bipeds have small feet and weak ankles, and must constantly adjust their planned footstep position to maintain balance. We introduce a new model-predictive footstep controller which jointly optimizes over the robot's discrete choice of stepping surface, impending footstep position sequence, ankle torque in the sagittal plane, and center of mass trajectory, to track a velocity command. The controller is formulated as a single Mixed Integer Quadratic Program (MIQP) which is solved at 50-200 Hz, depending on terrain complexity. We implement a state of the art real-time elevation mapping and convex terrain decomposition framework to inform the controller of its surroundings in the form on convex polygons representing steppable terrain. We investigate the capabilities and challenges of our approach through hardware experiments on the underactuated biped Cassie.
Test Case Generation and Test Oracle Support for Testing CPSs using Hybrid Models
Abstract
Cyber-Physical Systems (CPSs) play a central role in the behavior of a wide range of autonomous physical systems such as medical devices, autonomous vehicles, and smart homes, many of which are safety-critical. CPSs are often specified iteratively as a sequence of models at different levels that can be tested via simulation systems at early stages of their development cycle. One such model is a hybrid automaton; these are used frequently for CPS applications and have the advantage of encapsulating both continuous and discrete CPS behaviors. When testing CPSs, engineers can take advantage of these models to generate test cases that target both types of these behaviors. Moreover, since these models are constructed early in the development process for CPSs, they allow test cases to be generated early in that process for those CPSs, even before simulation models of the CPSs have been designed. One challenge when testing CPSs is that these systems may operate differently even under an identically applied test scenario. In such cases, we cannot employ test oracles that use predetermined deterministic behaviors; instead, test oracles should consider sets of desired behaviors in order to determine whether the CPS has behaved appropriately. In this paper we present a test case generation technique, HYTEST, that generates test cases based on hybrid models, accompanied by appropriate test oracles, for use in testing CPSs early in their development cycle. To evaluate the effectiveness and efficiency of HYTEST, we conducted an empirical study in which we applied the technique to several CPSs and measured its ability to detect faults in those CPSs and the amount of time required to perform the testing process. The results of the study show that HYTEST was able to detect faults more effectively and efficiently than the baseline techniques we compare it to.
Efficient online update of model predictive control in embedded systems using first-order methods
Authors: Victor Gracia, Pablo Krupa, Teodoro Alamo, Daniel Limon
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract
Model Predictive Control (MPC) is typically characterized for being computationally demanding, as it requires solving optimization problems online; a particularly relevant point when considering its implementation in embedded systems. To reduce the computational burden of the optimization algorithm, most solvers perform as many offline operations as possible, typically performing the computation and factorization of its expensive matrices offline and then storing them in the embedded system. This improves the efficiency of the solver, with the disadvantage that online changes on some of the ingredients of the MPC formulation require performing these expensive computations online. This article presents an efficient algorithm for the factorization of the key matrix used in several first-order optimization methods applied to linear MPC formulations, allowing its prediction model and cost function matrices to be updated online at the expense of a small computational cost. We show results comparing the proposed approach with other solvers from the literature applied to a linear time-varying system.
Traveling Waves Encode the Recent Past and Enhance Sequence Learning
Authors: T. Anderson Keller, Lyle Muller, Terrence Sejnowski, Max Welling
Abstract
Traveling waves of neural activity have been observed throughout the brain at a diversity of regions and scales; however, their precise computational role is still debated. One physically grounded hypothesis suggests that the cortical sheet may act like a wave-field capable of storing a short-term memory of sequential stimuli through induced waves traveling across the cortical surface. To date, however, the computational implications of this idea have remained hypothetical due to the lack of a simple recurrent neural network architecture capable of exhibiting such waves. In this work, we introduce a model to fill this gap, which we denote the Wave-RNN (wRNN), and demonstrate how both connectivity constraints and initialization play a crucial role in the emergence of wave-like dynamics. We then empirically show how such an architecture indeed efficiently encodes the recent past through a suite of synthetic memory tasks where wRNNs learn faster and perform significantly better than wave-free counterparts. Finally, we explore the implications of this memory storage system on more complex sequence modeling tasks such as sequential image classification and find that wave-based models not only again outperform comparable wave-free RNNs while using significantly fewer parameters, but additionally perform comparably to more complex gated architectures such as LSTMs and GRUs. We conclude with a discussion of the implications of these results for both neuroscience and machine learning.
VoicePAT: An Efficient Open-source Evaluation Toolkit for Voice Privacy Research
Authors: Sarina Meyer, Xiaoxiao Miao, Ngoc Thang Vu
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract
Speaker anonymization is the task of modifying a speech recording such that the original speaker cannot be identified anymore. Since the first Voice Privacy Challenge in 2020, along with the release of a framework, the popularity of this research topic is continually increasing. However, the comparison and combination of different anonymization approaches remains challenging due to the complexity of evaluation and the absence of user-friendly research frameworks. We therefore propose an efficient speaker anonymization and evaluation framework based on a modular and easily extendable structure, almost fully in Python. The framework facilitates the orchestration of several anonymization approaches in parallel and allows for interfacing between different techniques. Furthermore, we propose modifications to common evaluation methods which make the evaluation more powerful and reduces their computation time by 65 to 95\%, depending on the metric. Our code is fully open source.
A Bayesian approach to breaking things: efficiently predicting and repairing failure modes via sampling
Authors: Charles Dawson, Chuchu Fan
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
Before autonomous systems can be deployed in safety-critical applications, we must be able to understand and verify the safety of these systems. For cases where the risk or cost of real-world testing is prohibitive, we propose a simulation-based framework for a) predicting ways in which an autonomous system is likely to fail and b) automatically adjusting the system's design to preemptively mitigate those failures. We frame this problem through the lens of approximate Bayesian inference and use differentiable simulation for efficient failure case prediction and repair. We apply our approach on a range of robotics and control problems, including optimizing search patterns for robot swarms and reducing the severity of outages in power transmission networks. Compared to optimization-based falsification techniques, our method predicts a more diverse, representative set of failure modes, and we also find that our use of differentiable simulation yields solutions that have up to 10x lower cost and requires up to 2x fewer iterations to converge relative to gradient-free techniques. Code and videos can be found at https://mit-realm.github.io/breaking-things/
SSL-Net: A Synergistic Spectral and Learning-based Network for Efficient Bird Sound Classification
Authors: Yiyuan Yang, Kaichen Zhou, Niki Trigoni, Andrew Markham
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract
Efficient and accurate bird sound classification is of important for ecology, habitat protection and scientific research, as it plays a central role in monitoring the distribution and abundance of species. However, prevailing methods typically demand extensively labeled audio datasets and have highly customized frameworks, imposing substantial computational and annotation loads. In this study, we present an efficient and general framework called SSL-Net, which combines spectral and learned features to identify different bird sounds. Encouraging empirical results gleaned from a standard field-collected bird audio dataset validate the efficacy of our method in extracting features efficiently and achieving heightened performance in bird sound classification, even when working with limited sample sizes. Furthermore, we present three feature fusion strategies, aiding engineers and researchers in their selection through quantitative analysis.
Low-rank Tensor Train Decomposition Using TensorSketch
Abstract
Tensor train decomposition is one of the most powerful approaches for processing high-dimensional data. For low-rank tensor train decomposition of large tensors, the alternating least squares (ALS) algorithm is widely used by updating each core tensor alternatively. However, it may suffer from the curse of dimensionality due to the large scale of subproblems. In this paper, a novel randomized proximal ALS algorithm is proposed for low-rank tensor train decomposition by using TensorSketch, which allows for efficient implementation via fast Fourier transform. The theoretical lower bounds of sketch size are estimated for approximating the optimal value of subproblems. Numerical experiments on synthetic and real-world data also demonstrate the effectiveness and efficiency of the proposed algorithm.
MetaF2N: Blind Image Super-Resolution by Learning Efficient Model Adaptation from Faces
Abstract
Due to their highly structured characteristics, faces are easier to recover than natural scenes for blind image super-resolution. Therefore, we can extract the degradation representation of an image from the low-quality and recovered face pairs. Using the degradation representation, realistic low-quality images can then be synthesized to fine-tune the super-resolution model for the real-world low-quality image. However, such a procedure is time-consuming and laborious, and the gaps between recovered faces and the ground-truths further increase the optimization uncertainty. To facilitate efficient model adaptation towards image-specific degradations, we propose a method dubbed MetaF2N, which leverages the contained Faces to fine-tune model parameters for adapting to the whole Natural image in a Meta-learning framework. The degradation extraction and low-quality image synthesis steps are thus circumvented in our MetaF2N, and it requires only one fine-tuning step to get decent performance. Considering the gaps between the recovered faces and ground-truths, we further deploy a MaskNet for adaptively predicting loss weights at different positions to reduce the impact of low-confidence areas. To evaluate our proposed MetaF2N, we have collected a real-world low-quality dataset with one or multiple faces in each image, and our MetaF2N achieves superior performance on both synthetic and real-world datasets. Source code, pre-trained models, and collected datasets are available at https://github.com/yinzhicun/MetaF2N.
Diversity-based core-set selection for text-to-speech with linguistic and acoustic features
Abstract
This paper proposes a method for extracting a lightweight subset from a text-to-speech (TTS) corpus ensuring synthetic speech quality. In recent years, methods have been proposed for constructing large-scale TTS corpora by collecting diverse data from massive sources such as audiobooks and YouTube. Although these methods have gained significant attention for enhancing the expressive capabilities of TTS systems, they often prioritize collecting vast amounts of data without considering practical constraints like storage capacity and computation time in training, which limits the available data quantity. Consequently, the need arises to efficiently collect data within these volume constraints. To address this, we propose a method for selecting the core subset~(known as \textit{core-set}) from a TTS corpus on the basis of a \textit{diversity metric}, which measures the degree to which a subset encompasses a wide range. Experimental results demonstrate that our proposed method performs significantly better than the baseline phoneme-balanced data selection across language and corpus size.
AdSEE: Investigating the Impact of Image Style Editing on Advertisement Attractiveness
Authors: Liyao Jiang, Chenglin Li, Haolan Chen, Xiaodong Gao, Xinwang Zhong, Yang Qiu, Shani Ye, Di Niu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract
Online advertisements are important elements in e-commerce sites, social media platforms, and search engines. With the increasing popularity of mobile browsing, many online ads are displayed with visual information in the form of a cover image in addition to text descriptions to grab the attention of users. Various recent studies have focused on predicting the click rates of online advertisements aware of visual features or composing optimal advertisement elements to enhance visibility. In this paper, we propose Advertisement Style Editing and Attractiveness Enhancement (AdSEE), which explores whether semantic editing to ads images can affect or alter the popularity of online advertisements. We introduce StyleGAN-based facial semantic editing and inversion to ads images and train a click rate predictor attributing GAN-based face latent representations in addition to traditional visual and textual features to click rates. Through a large collected dataset named QQ-AD, containing 20,527 online ads, we perform extensive offline tests to study how different semantic directions and their edit coefficients may impact click rates. We further design a Genetic Advertisement Editor to efficiently search for the optimal edit directions and intensity given an input ad cover image to enhance its projected click rates. Online A/B tests performed over a period of 5 days have verified the increased click-through rates of AdSEE-edited samples as compared to a control group of original ads, verifying the relation between image styles and ad popularity. We open source the code for AdSEE research at https://github.com/LiyaoJiang1998/adsee.
Differentiable Resolution Compression and Alignment for Efficient Video Classification and Retrieval
Authors: Rui Deng, Qian Wu, Yuke Li, Haoran Fu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Optimizing video inference efficiency has become increasingly important with the growing demand for video analysis in various fields. Some existing methods achieve high efficiency by explicit discard of spatial or temporal information, which poses challenges in fast-changing and fine-grained scenarios. To address these issues, we propose an efficient video representation network with Differentiable Resolution Compression and Alignment mechanism, which compresses non-essential information in the early stage of the network to reduce computational costs while maintaining consistent temporal correlations. Specifically, we leverage a Differentiable Context-aware Compression Module to encode the saliency and non-saliency frame features, refining and updating the features into a high-low resolution video sequence. To process the new sequence, we introduce a new Resolution-Align Transformer Layer to capture global temporal correlations among frame features with different resolutions, while reducing spatial computation costs quadratically by utilizing fewer spatial tokens in low-resolution non-saliency frames. The entire network can be end-to-end optimized via the integration of the differentiable compression module. Experimental results show that our method achieves the best trade-off between efficiency and performance on near-duplicate video retrieval and competitive results on dynamic video classification compared to state-of-the-art methods. Code:https://github.com/dun-research/DRCA
FedJudge: Federated Legal Large Language Model
Authors: Linan Yue, Qi Liu, Yichao Du, Weibo Gao, Ye Liu, Fangzhou Yao
Abstract
Large Language Models (LLMs) have gained prominence in the field of Legal Intelligence, offering potential applications in assisting legal professionals and laymen. However, the centralized training of these Legal LLMs raises data privacy concerns, as legal data is distributed among various institutions containing sensitive individual information. This paper addresses this challenge by exploring the integration of Legal LLMs with Federated Learning (FL) methodologies. By employing FL, Legal LLMs can be fine-tuned locally on devices or clients, and their parameters are aggregated and distributed on a central server, ensuring data privacy without directly sharing raw data. However, computation and communication overheads hinder the full fine-tuning of LLMs under the FL setting. Moreover, the distribution shift of legal data reduces the effectiveness of FL methods. To this end, in this paper, we propose the first Federated Legal Large Language Model (FedJudge) framework, which fine-tunes Legal LLMs efficiently and effectively. Specifically, FedJudge utilizes parameter-efficient fine-tuning methods to update only a few additional parameters during the FL training. Besides, we explore the continual learning methods to preserve the global model's important parameters when training local clients to mitigate the problem of data shifts. Extensive experimental results on three real-world datasets clearly validate the effectiveness of FedJudge. Code is released at https://github.com/yuelinan/FedJudge.
A Precision-Scalable RISC-V DNN Processor with On-Device Learning Capability at the Extreme Edge
Authors: Longwei Huang, Chao Fang, Qiong Li, Jun Lin, Zhongfeng Wang
Abstract
Extreme edge platforms, such as in-vehicle smart devices, require efficient deployment of quantized deep neural networks (DNNs) to enable intelligent applications with limited amounts of energy, memory, and computing resources. However, many edge devices struggle to boost inference throughput of various quantized DNNs due to the varying quantization levels, and these devices lack floating-point (FP) support for on-device learning, which prevents them from improving model accuracy while ensuring data privacy. To tackle the challenges above, we propose a precision-scalable RISC-V DNN processor with on-device learning capability. It facilitates diverse precision levels of fixed-point DNN inference, spanning from 2-bit to 16-bit, and enhances on-device learning through improved support with FP16 operations. Moreover, we employ multiple methods such as FP16 multiplier reuse and multi-precision integer multiplier reuse, along with balanced mapping of FPGA resources, to significantly improve hardware resource utilization. Experimental results on the Xilinx ZCU102 FPGA show that our processor significantly improves inference throughput by 1.6$\sim$14.6$\times$ and energy efficiency by 1.1$\sim$14.6$\times$ across various DNNs, compared to the prior art, XpulpNN. Additionally, our processor achieves a 16.5$\times$ higher FP throughput for on-device learning.
TF-SepNet: An Efficient 1D Kernel Design in CNNs for Low-Complexity Acoustic Scene Classification
Authors: Yiqiang Cai, Peihong Zhang, Shengchen Li
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract
Recent studies focus on developing efficient systems for acoustic scene classification (ASC) using convolutional neural networks (CNNs), which typically consist of consecutive kernels. This paper highlights the benefits of using separate kernels as a more powerful and efficient design approach in ASC tasks. Inspired by the time-frequency nature of audio signals, we propose TF-SepNet, a CNN architecture that separates the feature processing along the time and frequency dimensions. Features resulted from the separate paths are then merged by channels and directly forwarded to the classifier. Instead of the conventional two dimensional (2D) kernel, TF-SepNet incorporates one dimensional (1D) kernels to reduce the computational costs. Experiments have been conducted using the TAU Urban Acoustic Scene 2022 Mobile development dataset. The results show that TF-SepNet outperforms similar state-of-the-arts that use consecutive kernels. A further investigation reveals that the separate kernels lead to a larger effective receptive field (ERF), which enables TF-SepNet to capture more time-frequency features.
Abstract
Quasi-quotation (or, code templates) has long been used as a convenient tool for code generation, commonly implemented as a pre-processing/translation into code-generation combinators. The original MetaOCaml was also based on such translation, done post type checking. BER MetaOCaml employs a significantly different, efficient (especially in version N114) translation integrated with type-checking, in the least intrusive way. This paper presents the integrated efficient translation for the first time.
HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods
Authors: Hyun-seo Shin, Jungwoo Heo, Ju-ho Kim, Chan-yeong Lim, Wonbin Kim, Ha-Jin Yu
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Abstract
Audio deepfake detection (ADD) is the task of detecting spoofing attacks generated by text-to-speech or voice conversion systems. Spoofing evidence, which helps to distinguish between spoofed and bona-fide utterances, might exist either locally or globally in the input features. To capture these, the Conformer, which consists of Transformers and CNN, possesses a suitable structure. However, since the Conformer was designed for sequence-to-sequence tasks, its direct application to ADD tasks may be sub-optimal. To tackle this limitation, we propose HM-Conformer by adopting two components: (1) Hierarchical pooling method progressively reducing the sequence length to eliminate duplicated information (2) Multi-level classification token aggregation method utilizing classification tokens to gather information from different blocks. Owing to these components, HM-Conformer can efficiently detect spoofing evidence by processing various sequence lengths and aggregating them. In experimental results on the ASVspoof 2021 Deepfake dataset, HM-Conformer achieved a 15.71% EER, showing competitive performance compared to recent systems.
Practical Program Repair via Preference-based Ensemble Strategy
Authors: Wenkang Zhong, Chuanyi Li, Kui Liu, Tongtong Xu, Tegawendé F. Bissyandé, Jidong Ge, Bin Luo, Vincent Ng
Abstract
To date, over 40 Automated Program Repair (APR) tools have been designed with varying bug-fixing strategies, which have been demonstrated to have complementary performance in terms of being effective for different bug classes. Intuitively, it should be feasible to improve the overall bug-fixing performance of APR via assembling existing tools. Unfortunately, simply invoking all available APR tools for a given bug can result in unacceptable costs on APR execution as well as on patch validation (via expensive testing). Therefore, while assembling existing tools is appealing, it requires an efficient strategy to reconcile the need to fix more bugs and the requirements for practicality. In light of this problem, we propose a Preference-based Ensemble Program Repair framework (P-EPR), which seeks to effectively rank APR tools for repairing different bugs. P-EPR is the first non-learning-based APR ensemble method that is novel in its exploitation of repair patterns as a major source of knowledge for ranking APR tools and its reliance on a dynamic update strategy that enables it to immediately exploit and benefit from newly derived repair results. Experimental results show that P-EPR outperforms existing strategies significantly both in flexibility and effectiveness.
Astrocyte-Integrated Dynamic Function Exchange in Spiking Neural Networks
Authors: Murat Isik, Kayode Inadagbo
Subjects: Neural and Evolutionary Computing (cs.NE)
Abstract
This paper presents an innovative methodology for improving the robustness and computational efficiency of Spiking Neural Networks (SNNs), a critical component in neuromorphic computing. The proposed approach integrates astrocytes, a type of glial cell prevalent in the human brain, into SNNs, creating astrocyte-augmented networks. To achieve this, we designed and implemented an astrocyte model in two distinct platforms: CPU/GPU and FPGA. Our FPGA implementation notably utilizes Dynamic Function Exchange (DFX) technology, enabling real-time hardware reconfiguration and adaptive model creation based on current operating conditions. The novel approach of leveraging astrocytes significantly improves the fault tolerance of SNNs, thereby enhancing their robustness. Notably, our astrocyte-augmented SNN displays near-zero latency and theoretically infinite throughput, implying exceptional computational efficiency. Through comprehensive comparative analysis with prior works, it's established that our model surpasses others in terms of neuron and synapse count while maintaining an efficient power consumption profile. These results underscore the potential of our methodology in shaping the future of neuromorphic computing, by providing robust and energy-efficient systems.
PRIEST: Projection Guided Sampling-Based Optimization For Autonomous Navigation
Abstract
Efficient navigation in unknown and dynamic environments is crucial for expanding the application domain of mobile robots. The core challenge stems from the nonavailability of a feasible global path for guiding optimization-based local planners. As a result, existing local planners often get trapped in poor local minima. In this paper, we present a novel optimizer that can explore multiple homotopies to plan high-quality trajectories over long horizons while still being fast enough for real-time applications. We build on the gradient-free paradigm by augmenting the trajectory sampling strategy with a projection optimization that guides the samples toward a feasible region. As a result, our approach can recover from the frequently encountered pathological cases wherein all the sampled trajectories lie in the high-cost region. Furthermore, we also show that our projection optimization has a highly parallelizable structure that can be easily accelerated over GPUs. We push the state-of-the-art in the following respects. Over the navigation stack of the Robot Operating System (ROS), we show an improvement of 7-13% in success rate and up to two times in total travel time metric. On the same benchmarks and metrics, our approach achieves up to 44% improvement over MPPI and its recent variants. On simple point-to-point navigation tasks, our optimizer is up to two times more reliable than SOTA gradient-based solvers, as well as sampling-based approaches such as the Cross-Entropy Method (CEM) and VPSTO. Codes: https://github.com/fatemeh-rastgar/PRIEST
A Real-time Faint Space Debris Detector With Learning-based LCM
Authors: Zherui Lu, Gangyi Wang, Xinguo Wei, Jian Li
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
With the development of aerospace technology, the increasing population of space debris has posed a great threat to the safety of spacecraft. However, the low intensity of reflected light and high angular velocity of space debris impede the extraction. Besides, due to the limitations of the ground observation methods, small space debris can hardly be detected, making it necessary to enhance the spacecraft's capacity for space situational awareness (SSA). Considering that traditional methods have some defects in low-SNR target detection, such as low effectiveness and large time consumption, this paper proposes a method for low-SNR streak extraction based on local contrast and maximum likelihood estimation (MLE), which can detect space objects with SNR 2.0 efficiently. In the proposed algorithm, local contrast will be applied for crude classifications, which will return connected components as preliminary results, and then MLE will be performed to reconstruct the connected components of targets via orientated growth, further improving the precision. The algorithm has been verified with both simulated streaks and real star tracker images, and the average centroid error of the proposed algorithm is close to the state-of-the-art method like ODCC. At the same time, the algorithm in this paper has significant advantages in efficiency compared with ODCC. In conclusion, the algorithm in this paper is of high speed and precision, which guarantees its promising applications in the extraction of high dynamic targets.
Verifiable Privacy-Preserving Computing
Authors: Tariq Bontekoe, Dimka Karastoyanova, Fatih Turkmen
Abstract
Privacy-enhancing technologies (PETs), such as secure multi-party computation (MPC) and homomorphic encryption (HE), are deployed increasingly often to guarantee data confidentiality in computations over private, distributed data. Similarly, we observe a steep increase in the adoption of zero-knowledge proofs (ZKPs) to guarantee (public) verifiability of locally executed computations. We project that applications that are data intensive and require strong privacy guarantees, are also likely to require correctness guarantees. While the combination of methods for (public) verifiability and privacy protection has clear significance, many attempts are far from practical adoption. In this work, we analyze existing solutions that add (public) verifiability to privacy-preserving computations over distributed data, in order to preserve confidentiality and guarantee correctness. To determine the required security and usability properties and whether these are satisfied, we look at various application areas including verifiable outsourcing, distributed ledger technology (DLT), and genomics. We then classify the solutions and describe frequently used approaches as well as efficiency metrics. Last but not least, we identify open challenges and discuss directions for future research that make verifiable, privacy-preserving computations more secure, efficient, and applicable in the real world.
Sampling-Free Probabilistic Deep State-Space Models
Authors: Andreas Look, Melih Kandemir, Barbara Rakitsch, Jan Peters
Abstract
Many real-world dynamical systems can be described as State-Space Models (SSMs). In this formulation, each observation is emitted by a latent state, which follows first-order Markovian dynamics. A Probabilistic Deep SSM (ProDSSM) generalizes this framework to dynamical systems of unknown parametric form, where the transition and emission models are described by neural networks with uncertain weights. In this work, we propose the first deterministic inference algorithm for models of this type. Our framework allows efficient approximations for training and testing. We demonstrate in our experiments that our new method can be employed for a variety of tasks and enjoys a superior balance between predictive performance and computational budget.
Structural Self-Supervised Objectives for Transformers
Authors: Luca Di Liello
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Abstract
This thesis focuses on improving the pre-training of natural language models using unsupervised raw data to make them more efficient and aligned with downstream applications. In the first part, we introduce three alternative pre-training objectives to BERT's Masked Language Modeling (MLM), namely Random Token Substitution (RTS), Cluster-based Random Token Substitution (C-RTS), and Swapped Language Modeling (SLM). These objectives involve token swapping instead of masking, with RTS and C-RTS aiming to predict token originality and SLM predicting the original token values. Results show that RTS and C-RTS require less pre-training time while maintaining performance comparable to MLM. Surprisingly, SLM outperforms MLM on certain tasks despite using the same computational budget. In the second part, we proposes self-supervised pre-training tasks that align structurally with downstream applications, reducing the need for labeled data. We use large corpora like Wikipedia and CC-News to train models to recognize if text spans originate from the same paragraph or document in several ways. By doing continuous pre-training, starting from existing models like RoBERTa, ELECTRA, DeBERTa, BART, and T5, we demonstrate significant performance improvements in tasks like Fact Verification, Answer Sentence Selection, and Summarization. These improvements are especially pronounced when limited annotation data is available. The proposed objectives also achieve state-of-the-art results on various benchmark datasets, including FEVER (dev set), ASNQ, WikiQA, and TREC-QA, as well as enhancing the quality of summaries. Importantly, these techniques can be easily integrated with other methods without altering the internal structure of Transformer models, making them versatile for various NLP applications.
On Sparse Grid Interpolation for American Option Pricing with Multiple Underlying Assets
Abstract
In this work, we develop a novel efficient quadrature and sparse grid based polynomial interpolation method to price American options with multiple underlying assets. The approach is based on first formulating the pricing of American options using dynamic programming, and then employing static sparse grids to interpolate the continuation value function at each time step. To achieve high efficiency, we first transform the domain from $\mathbb{R}^d$ to $(-1,1)^d$ via a scaled tanh map, and then remove the boundary singularity of the resulting multivariate function over $(-1,1)^d$ by a bubble function and simultaneously, to significantly reduce the number of interpolation points. We rigorously establish that with a proper choice of the bubble function, the resulting function has bounded mixed derivatives up to a certain order, which provides theoretical underpinnings for the use of sparse grids. Numerical experiments for American arithmetic and geometric basket put options with the number of underlying assets up to 16 are presented to validate the effectiveness of the approach.
Optimal Mobility and Communication Strategy to Maximize the Value of Information in IoT Networks
Authors: Zijing Wang, Mihai-Alin Badiu, Justin P. Coon
Abstract
The Internet of Things (IoT) is an emerging next-generation technology in the fourth industrial revolution. In industrial IoT networks, sensing devices are largely deployed to monitor various types of physical processes. They are required to transmit the collected data in a timely manner to support real-time monitoring, control and automation. The timeliness of information is very important in such systems. Recently, an information-theoretic metric named the "value of information" (VoI) has been proposed to measure the usefulness of information. In this work, we consider an industrial IoT network with a set of heterogeneous sensing devices and an intelligent mobile entity. The concept of the value of information is applied to study a joint path planning and user scheduling optimisation problem. We aim to maximise the network-level VoI under mobility and communication constraints. We formulate this problem as a Markov decision process (MDP), and an efficient algorithm based on reinforcement learning is proposed to solve this problem. Through numerical results, we show that the proposed method is able to capture the usefulness of data from both time and space dimensions. By exploiting the correlation property of the data source, the proposed method is suitable for applications in resource-limited networks.
Uncertainty-bounded Active Monitoring of Unknown Dynamic Targets in Road-networks with Minimum Fleet
Abstract
Fleets of unmanned robots can be beneficial for the long-term monitoring of large areas, e.g., to monitor wild flocks, detect intruders, search and rescue. Monitoring numerous dynamic targets in a collaborative and efficient way is a challenging problem that requires online coordination and information fusion. The majority of existing works either assume a passive all-to-all observation model to minimize the summed uncertainties over all targets by all robots, or optimize over the jointed discrete actions while neglecting the dynamic constraints of the robots and unknown behaviors of the targets. This work proposes an online task and motion coordination algorithm that ensures an explicitly-bounded estimation uncertainty for the target states, while minimizing the average number of active robots. The robots have a limited-range perception to actively track a limited number of targets simultaneously, of which their future control decisions are all unknown. It includes: (i) the assignment of monitoring tasks, modeled as a flexible size multiple vehicle routing problem with time windows (m-MVRPTW), given the predicted target trajectories with uncertainty measure in the road-networks; (ii) the nonlinear model predictive control (NMPC) for optimizing the robot trajectories under uncertainty and safety constraints. It is shown that the robots can switch between active and inactive roles dynamically online as required by the unknown monitoring task. The proposed methods are validated via large-scale simulations of up to $100$ robots and targets.
An Efficient Wide-Range Pseudo-3D Vehicle Detection Using A Single Camera
Authors: Zhupeng Ye, Yinqi Li, Zejian Yuan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Wide-range and fine-grained vehicle detection plays a critical role in enabling active safety features in intelligent driving systems. However, existing vehicle detection methods based on rectangular bounding boxes (BBox) often struggle with perceiving wide-range objects, especially small objects at long distances. And BBox expression cannot provide detailed geometric shape and pose information of vehicles. This paper proposes a novel wide-range Pseudo-3D Vehicle Detection method based on images from a single camera and incorporates efficient learning methods. This model takes a spliced image as input, which is obtained by combining two sub-window images from a high-resolution image. This image format maximizes the utilization of limited image resolution to retain essential information about wide-range vehicle objects. To detect pseudo-3D objects, our model adopts specifically designed detection heads. These heads simultaneously output extended BBox and Side Projection Line (SPL) representations, which capture vehicle shapes and poses, enabling high-precision detection. To further enhance the performance of detection, a joint constraint loss combining both the object box and SPL is designed during model training, improving the efficiency, stability, and prediction accuracy of the model. Experimental results on our self-built dataset demonstrate that our model achieves favorable performance in wide-range pseudo-3D vehicle detection across multiple evaluation metrics. Our demo video has been placed at https://www.youtube.com/watch?v=1gk1PmsQ5Q8.
Efficient Graphics Representation with Differentiable Indirection
Abstract
We introduce differentiable indirection -- a novel learned primitive that employs differentiable multi-scale lookup tables as an effective substitute for traditional compute and data operations across the graphics pipeline. We demonstrate its flexibility on a number of graphics tasks, i.e., geometric and image representation, texture mapping, shading, and radiance field representation. In all cases, differentiable indirection seamlessly integrates into existing architectures, trains rapidly, and yields both versatile and efficient results.
Bayes-Optimal Estimation in Generalized Linear Models via Spatial Coupling
Authors: Pablo Pascual Cobo, Kuan Hsieh, Ramji Venkataramanan
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
We consider the problem of signal estimation in a generalized linear model (GLM). GLMs include many canonical problems in statistical estimation, such as linear regression, phase retrieval, and 1-bit compressed sensing. Recent work has precisely characterized the asymptotic minimum mean-squared error (MMSE) for GLMs with i.i.d. Gaussian sensing matrices. However, in many models there is a significant gap between the MMSE and the performance of the best known feasible estimators. In this work, we address this issue by considering GLMs defined via spatially coupled sensing matrices. We propose an efficient approximate message passing (AMP) algorithm for estimation and prove that with a simple choice of spatially coupled design, the MSE of a carefully tuned AMP estimator approaches the asymptotic MMSE in the high-dimensional limit. To prove the result, we first rigorously characterize the asymptotic performance of AMP for a GLM with a generic spatially coupled design. This characterization is in terms of a deterministic recursion (`state evolution') that depends on the parameters defining the spatial coupling. Then, using a simple spatially coupled design and judicious choice of functions defining the AMP, we analyze the fixed points of the resulting state evolution and show that it achieves the asymptotic MMSE. Numerical results for phase retrieval and rectified linear regression show that spatially coupled designs can yield substantially lower MSE than i.i.d. Gaussian designs at finite dimensions when used with AMP algorithms.
Deformable Neural Radiance Fields using RGB and Event Cameras
Authors: Qi Ma, Danda Pani Paudel, Ajad Chhatkuli, Luc Van Gool
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Modeling Neural Radiance Fields for fast-moving deformable objects from visual data alone is a challenging problem. A major issue arises due to the high deformation and low acquisition rates. To address this problem, we propose to use event cameras that offer very fast acquisition of visual change in an asynchronous manner. In this work, we develop a novel method to model the deformable neural radiance fields using RGB and event cameras. The proposed method uses the asynchronous stream of events and calibrated sparse RGB frames. In our setup, the camera pose at the individual events required to integrate them into the radiance fields remains unknown. Our method jointly optimizes these poses and the radiance field. This happens efficiently by leveraging the collection of events at once and actively sampling the events during learning. Experiments conducted on both realistically rendered graphics and real-world datasets demonstrate a significant benefit of the proposed method over the state-of-the-art and the compared baseline. This shows a promising direction for modeling deformable neural radiance fields in real-world dynamic scenes.
A posteriori error control for fourth-order semilinear problems with quadratic nonlinearity
Abstract
A general a posteriori error analysis applies to five lowest-order finite element methods for two fourth-order semi-linear problems with trilinear non-linearity and a general source. A quasi-optimal smoother extends the source term to the discrete trial space, and more importantly, modifies the trilinear term in the stream-function vorticity formulation of the incompressible 2D Navier-Stokes and the von K\'{a}rm\'{a}n equations. This enables the first efficient and reliable a posteriori error estimates for the 2D Navier-Stokes equations in the stream-function vorticity formulation for Morley, two discontinuous Galerkin, $C^0$ interior penalty, and WOPSIP discretizations with piecewise quadratic polynomials.
SilverRetriever: Advancing Neural Passage Retrieval for Polish Question Answering
Authors: Piotr Rybak, Maciej Ogrodniczuk
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Abstract
Modern open-domain question answering systems often rely on accurate and efficient retrieval components to find passages containing the facts necessary to answer the question. Recently, neural retrievers have gained popularity over lexical alternatives due to their superior performance. However, most of the work concerns popular languages such as English or Chinese. For others, such as Polish, few models are available. In this work, we present SilverRetriever, a neural retriever for Polish trained on a diverse collection of manually or weakly labeled datasets. SilverRetriever achieves much better results than other Polish models and is competitive with larger multilingual models. Together with the model, we open-source five new passage retrieval datasets.
VulnSense: Efficient Vulnerability Detection in Ethereum Smart Contracts by Multimodal Learning with Graph Neural Network and Language Model
Authors: Phan The Duy, Nghi Hoang Khoa, Nguyen Huu Quyen, Le Cong Trinh, Vu Trung Kien, Trinh Minh Hoang, Van-Hau Pham
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Abstract
This paper presents VulnSense framework, a comprehensive approach to efficiently detect vulnerabilities in Ethereum smart contracts using a multimodal learning approach on graph-based and natural language processing (NLP) models. Our proposed framework combines three types of features from smart contracts comprising source code, opcode sequences, and control flow graph (CFG) extracted from bytecode. We employ Bidirectional Encoder Representations from Transformers (BERT), Bidirectional Long Short-Term Memory (BiLSTM) and Graph Neural Network (GNN) models to extract and analyze these features. The final layer of our multimodal approach consists of a fully connected layer used to predict vulnerabilities in Ethereum smart contracts. Addressing limitations of existing vulnerability detection methods relying on single-feature or single-model deep learning techniques, our method surpasses accuracy and effectiveness constraints. We assess VulnSense using a collection of 1.769 smart contracts derived from the combination of three datasets: Curated, SolidiFI-Benchmark, and Smartbugs Wild. We then make a comparison with various unimodal and multimodal learning techniques contributed by GNN, BiLSTM and BERT architectures. The experimental outcomes demonstrate the superior performance of our proposed approach, achieving an average accuracy of 77.96\% across all three categories of vulnerable smart contracts.
Doeblin Coefficients and Related Measures
Authors: Anuran Makur, Japneet Singh
Subjects: Information Theory (cs.IT); Probability (math.PR); Statistics Theory (math.ST)
Abstract
Doeblin coefficients are a classical tool for analyzing the ergodicity and exponential convergence rates of Markov chains. Propelled by recent works on contraction coefficients of strong data processing inequalities, we investigate whether Doeblin coefficients also exhibit some of the notable properties of canonical contraction coefficients. In this paper, we present several new structural and geometric properties of Doeblin coefficients. Specifically, we show that Doeblin coefficients form a multi-way divergence, exhibit tensorization, and possess an extremal trace characterization. We then show that they also have extremal coupling and simultaneously maximal coupling characterizations. By leveraging these characterizations, we demonstrate that Doeblin coefficients act as a nice generalization of the well-known total variation (TV) distance to a multi-way divergence, enabling us to measure the "distance" between multiple distributions rather than just two. We then prove that Doeblin coefficients exhibit contraction properties over Bayesian networks similar to other canonical contraction coefficients. We additionally derive some other results and discuss an application of Doeblin coefficients to distribution fusion. Finally, in a complementary vein, we introduce and discuss three new quantities: max-Doeblin coefficient, max-DeGroot distance, and min-DeGroot distance. The max-Doeblin coefficient shares a connection with the concept of maximal leakage in information security; we explore its properties and provide a coupling characterization. On the other hand, the max-DeGroot and min-DeGroot measures extend the concept of DeGroot distance to multiple distributions.
A Spiking Binary Neuron -- Detector of Causal Links
Abstract
Causal relationship recognition is a fundamental operation in neural networks aimed at learning behavior, action planning, and inferring external world dynamics. This operation is particularly crucial for reinforcement learning (RL). In the context of spiking neural networks (SNNs), events are represented as spikes emitted by network neurons or input nodes. Detecting causal relationships within these events is essential for effective RL implementation. This research paper presents a novel approach to realize causal relationship recognition using a simple spiking binary neuron. The proposed method leverages specially designed synaptic plasticity rules, which are both straightforward and efficient. Notably, our approach accounts for the temporal aspects of detected causal links and accommodates the representation of spiking signals as single spikes or tight spike sequences (bursts), as observed in biological brains. Furthermore, this study places a strong emphasis on the hardware-friendliness of the proposed models, ensuring their efficient implementation on modern and future neuroprocessors. Being compared with precise machine learning techniques, such as decision tree algorithms and convolutional neural networks, our neuron demonstrates satisfactory accuracy despite its simplicity. In conclusion, we introduce a multi-neuron structure capable of operating in more complex environments with enhanced accuracy, making it a promising candidate for the advancement of RL applications in SNNs.
P-ROCKET: Pruning Random Convolution Kernels for Time Series Classification
Authors: Shaowu Chen, Weize Sun, Lei Huang, Xiaopeng Li, Qingyuan Wang, Deepu John
Abstract
In recent years, two time series classification models, ROCKET and MINIROCKET, have attracted much attention for their low training cost and state-of-the-art accuracy. Utilizing random 1-D convolutional kernels without training, ROCKET and MINIROCKET can rapidly extract features from time series data, allowing for the efficient fitting of linear classifiers. However, to comprehensively capture useful features, a large number of random kernels are required, which is incompatible for resource-constrained devices. Therefore, a heuristic evolutionary algorithm named S-ROCKET is devised to recognize and prune redundant kernels. Nevertheless, the inherent nature of evolutionary algorithms renders the evaluation of kernels within S-ROCKET an unacceptable time-consuming process. In this paper, diverging from S-ROCKET, which directly evaluates random kernels with nonsignificant differences, we remove kernels from a feature selection perspective by eliminating associating connections in the sequential classification layer. To this end, we start by formulating the pruning challenge as a Group Elastic Net classification problem and employ the ADMM method to arrive at a solution. Sequentially, we accelerate the aforementioned time-consuming solving process by bifurcating the $l_{2,1}$ and $l_2$ regularizations into two sequential stages and solve them separately, which ultimately forms our core algorithm, named P-ROCKET. Stage 1 of P-ROCKET employs group-wise regularization similarly to our initial ADMM-based Algorithm, but introduces dynamically varying penalties to greatly accelerate the process. To mitigate overfitting, Stage 2 of P-ROCKET implements element-wise regularization to refit a linear classifier, utilizing the retained features.
SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels
Authors: Henry Hengyuan Zhao, Pichao Wang, Yuyang Zhao, Hao Luo, Fan Wang, Mike Zheng Shou
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Pre-trained vision transformers have strong representation benefits to various downstream tasks. Recently, many parameter-efficient fine-tuning (PEFT) methods have been proposed, and their experiments demonstrate that tuning only 1% of extra parameters could surpass full fine-tuning in low-data resource scenarios. However, these methods overlook the task-specific information when fine-tuning diverse downstream tasks. In this paper, we propose a simple yet effective method called "Salient Channel Tuning" (SCT) to leverage the task-specific information by forwarding the model with the task images to select partial channels in a feature map that enables us to tune only 1/8 channels leading to significantly lower parameter costs. Experiments outperform full fine-tuning on 18 out of 19 tasks in the VTAB-1K benchmark by adding only 0.11M parameters of the ViT-B, which is 780$\times$ fewer than its full fine-tuning counterpart. Furthermore, experiments on domain generalization and few-shot learning surpass other PEFT methods with lower parameter costs, demonstrating our proposed tuning technique's strong capability and effectiveness in the low-data regime.
Breathing New Life into 3D Assets with Generative Repainting
Authors: Tianfu Wang, Menelaos Kanakis, Konrad Schindler, Luc Van Gool, Anton Obukhov
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Abstract
Diffusion-based text-to-image models ignited immense attention from the vision community, artists, and content creators. Broad adoption of these models is due to significant improvement in the quality of generations and efficient conditioning on various modalities, not just text. However, lifting the rich generative priors of these 2D models into 3D is challenging. Recent works have proposed various pipelines powered by the entanglement of diffusion models and neural fields. We explore the power of pretrained 2D diffusion models and standard 3D neural radiance fields as independent, standalone tools and demonstrate their ability to work together in a non-learned fashion. Such modularity has the intrinsic advantage of eased partial upgrades, which became an important property in such a fast-paced domain. Our pipeline accepts any legacy renderable geometry, such as textured or untextured meshes, orchestrates the interaction between 2D generative refinement and 3D consistency enforcement tools, and outputs a painted input geometry in several formats. We conduct a large-scale study on a wide range of objects and categories from the ShapeNetSem dataset and demonstrate the advantages of our approach, both qualitatively and quantitatively. Project page: https://www.obukhov.ai/repainting_3d_assets
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens
Authors: Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, Yong Man Ro
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
Abstract
In this paper, we propose methods to build a powerful and efficient Image-to-Speech captioning (Im2Sp) model. To this end, we start with importing the rich knowledge related to image comprehension and language modeling from a large-scale pre-trained vision-language model into Im2Sp. We set the output of the proposed Im2Sp as discretized speech units, i.e., the quantized speech features of a self-supervised speech model. The speech units mainly contain linguistic information while suppressing other characteristics of speech. This allows us to incorporate the language modeling capability of the pre-trained vision-language model into the spoken language modeling of Im2Sp. With the vision-language pre-training strategy, we set new state-of-the-art Im2Sp performances on two widely used benchmark databases, COCO and Flickr8k. Then, we further improve the efficiency of the Im2Sp model. Similar to the speech unit case, we convert the original image into image units, which are derived through vector quantization of the raw image. With these image units, we can drastically reduce the required data storage for saving image data to just 0.8% when compared to the original image data in terms of bits. Demo page: https://ms-dot-k.github.io/Image-to-Speech-Captioning.
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
Authors: Qingyan Guo, Rui Wang, Junliang Guo, Bei Li, Kaitao Song, Xu Tan, Guoqing Liu, Jiang Bian, Yujiu Yang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
Large Language Models (LLMs) excel in various tasks, but they rely on carefully crafted prompts that often demand substantial human effort. To automate this process, in this paper, we propose a novel framework for discrete prompt optimization, called EvoPrompt, which borrows the idea of evolutionary algorithms (EAs) as they exhibit good performance and fast convergence. To enable EAs to work on discrete prompts, which are natural language expressions that need to be coherent and human-readable, we connect LLMs with EAs. This approach allows us to simultaneously leverage the powerful language processing capabilities of LLMs and the efficient optimization performance of EAs. Specifically, abstaining from any gradients or parameters, EvoPrompt starts from a population of prompts and iteratively generates new prompts with LLMs based on the evolutionary operators, improving the population based on the development set. We optimize prompts for both closed- and open-source LLMs including GPT-3.5 and Alpaca, on 9 datasets spanning language understanding and generation tasks. EvoPrompt significantly outperforms human-engineered prompts and existing methods for automatic prompt generation by up to 25% and 14% respectively. Furthermore, EvoPrompt demonstrates that connecting LLMs with EAs creates synergies, which could inspire further research on the combination of LLMs and conventional algorithms.
Quadcopter Trajectory Time Minimization and Robust Collision Avoidance via Optimal Time Allocation
Abstract
Autonomous navigation requires robots to generate trajectories for collision avoidance efficiently. Although plenty of previous works have proven successful in generating smooth and spatially collision-free trajectories, their solutions often suffer from suboptimal time efficiency and potential unsafety, particularly when accounting for uncertainties in robot perception and control. To address this issue, this paper presents the Robust Optimal Time Allocation (ROTA) framework. This framework is designed to optimize the time progress of the trajectories temporally, serving as a post-processing tool to enhance trajectory time efficiency and safety under uncertainties. In this study, we begin by formulating a non-convex optimization problem aimed at minimizing trajectory execution time while incorporating constraints on collision probability as the robot approaches obstacles. Subsequently, we introduce the concept of the trajectory braking zone and adopt the chance-constrained formulation for robust collision avoidance in the braking zones. Finally, the non-convex optimization problem is reformulated into a second-order cone programming problem to achieve real-time performance. Through simulations and physical flight experiments, we demonstrate that the proposed approach effectively reduces trajectory execution time while enabling robust collision avoidance in complex environments.
Efficient and robust Sensor Placement in Complex Environments
Authors: Lukas Taus, Yen-Hsi Richard Tsai
Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Optimization and Control (math.OC)
Abstract
We address the problem of efficient and unobstructed surveillance or communication in complex environments. On one hand, one wishes to use a minimal number of sensors to cover the environment. On the other hand, it is often important to consider solutions that are robust against sensor failure or adversarial attacks. This paper addresses these challenges of designing minimal sensor sets that achieve multi-coverage constraints -- every point in the environment is covered by a prescribed number of sensors. We propose a greedy algorithm to achieve the objective. Further, we explore deep learning techniques to accelerate the evaluation of the objective function formulated in the greedy algorithm. The training of the neural network reveals that the geometric properties of the data significantly impact the network's performance, particularly at the end stage. By taking into account these properties, we discuss the differences in using greedy and $\epsilon$-greedy algorithms to generate data and their impact on the robustness of the network.
Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization
Abstract
The pursuit of long-term autonomy mandates that robotic agents must continuously adapt to their changing environments and learn to solve new tasks. Continual learning seeks to overcome the challenge of catastrophic forgetting, where learning to solve new tasks causes a model to forget previously learnt information. Prior-based continual learning methods are appealing for robotic applications as they are space efficient and typically do not increase in computational complexity as the number of tasks grows. Despite these desirable properties, prior-based approaches typically fail on important benchmarks and consequently are limited in their potential applications compared to their memory-based counterparts. We introduce Bayesian adaptive moment regularization (BAdam), a novel prior-based method that better constrains parameter growth, leading to lower catastrophic forgetting. Our method boasts a range of desirable properties for robotic applications such as being lightweight and task label-free, converging quickly, and offering calibrated uncertainty that is important for safe real-world deployment. Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments such as Split MNIST and Split FashionMNIST, and does so without relying on task labels or discrete task boundaries.
HINT: Healthy Influential-Noise based Training to Defend against Data Poisoning Attacks
Abstract
While numerous defense methods have been proposed to prohibit potential poisoning attacks from untrusted data sources, most research works only defend against specific attacks, which leaves many avenues for an adversary to exploit. In this work, we propose an efficient and robust training approach to defend against data poisoning attacks based on influence functions, named Healthy Influential-Noise based Training. Using influence functions, we craft healthy noise that helps to harden the classification model against poisoning attacks without significantly affecting the generalization ability on test data. In addition, our method can perform effectively when only a subset of the training data is modified, instead of the current method of adding noise to all examples that has been used in several previous works. We conduct comprehensive evaluations over two image datasets with state-of-the-art poisoning attacks under different realistic attack scenarios. Our empirical results show that HINT can efficiently protect deep learning models against the effect of both untargeted and targeted poisoning attacks.
Augmenting conformers with structured state space models for online speech recognition
Authors: Haozhe Shan, Albert Gu, Zhong Meng, Weiran Wang, Krzysztof Choromanski, Tara Sainath
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract
Online speech recognition, where the model only accesses context to the left, is an important and challenging use case for ASR systems. In this work, we investigate augmenting neural encoders for online ASR by incorporating structured state-space sequence models (S4), which are a family of models that provide a parameter-efficient way of accessing arbitrarily long left context. We perform systematic ablation studies to compare variants of S4 models and propose two novel approaches that combine them with convolutions. We find that the most effective design is to stack a small S4 using real-valued recurrent weights with a local convolution, allowing them to work complementarily. Our best model achieves WERs of 4.01%/8.53% on test sets from Librispeech, outperforming Conformers with extensively tuned convolution.
A Bayesian Approach to Robust Inverse Reinforcement Learning
Authors: Ran Wei, Siliang Zeng, Chenliang Li, Alfredo Garcia, Anthony McDonald, Mingyi Hong
Abstract
We consider a Bayesian approach to offline model-based inverse reinforcement learning (IRL). The proposed framework differs from existing offline model-based IRL approaches by performing simultaneous estimation of the expert's reward function and subjective model of environment dynamics. We make use of a class of prior distributions which parameterizes how accurate the expert's model of the environment is to develop efficient algorithms to estimate the expert's reward and subjective dynamics in high-dimensional settings. Our analysis reveals a novel insight that the estimated policy exhibits robust performance when the expert is believed (a priori) to have a highly accurate model of the environment. We verify this observation in the MuJoCo environments and show that our algorithms outperform state-of-the-art offline IRL algorithms.
Lamination-based efficient treatment of weak discontinuities for non-conforming finite element meshes
Abstract
When modelling discontinuities (interfaces) using the finite element method, the standard approach is to use a conforming finite-element mesh in which the mesh matches the interfaces. However, this approach can prove cumbersome if the geometry is complex, in particular in 3D. In this work, we develop an efficient technique for a non-conforming finite-element treatment of weak discontinuities by using laminated microstructures. The approach is inspired by the so-called composite voxel technique that has been developed for FFT-based spectral solvers in computational homogenization. The idea behind the method is rather simple. Each finite element that is cut by an interface is treated as a simple laminate with the volume fraction of the phases and the lamination orientation determined in terms of the actual geometrical arrangement of the interface within the element. The approach is illustrated by several computational examples relevant to the micromechanics of heterogeneous materials. Elastic and elastic-plastic materials at small and finite strain are considered in the examples. The performance of the proposed method is compared to two alternative, simple methods showing that the new approach is in most cases superior to them while maintaining the simplicity.
ICLEF: In-Context Learning with Expert Feedback for Explainable Style Transfer
Abstract
While state-of-the-art language models excel at the style transfer task, current work does not address explainability of style transfer systems. Explanations could be generated using large language models such as GPT-3.5 and GPT-4, but the use of such complex systems is inefficient when smaller, widely distributed, and transparent alternatives are available. We propose a framework to augment and improve a formality style transfer dataset with explanations via model distillation from ChatGPT. To further refine the generated explanations, we propose a novel way to incorporate scarce expert human feedback using in-context learning (ICLEF: In-Context Learning from Expert Feedback) by prompting ChatGPT to act as a critic to its own outputs. We use the resulting dataset of 9,960 explainable formality style transfer instances (e-GYAFC) to show that current openly distributed instruction-tuned models (and, in some settings, ChatGPT) perform poorly on the task, and that fine-tuning on our high-quality dataset leads to significant improvements as shown by automatic evaluation. In human evaluation, we show that models much smaller than ChatGPT fine-tuned on our data align better with expert preferences. Finally, we discuss two potential applications of models fine-tuned on the explainable style transfer task: interpretable authorship verification and interpretable adversarial attacks on AI-generated text detectors.
Phase field method for quasi-static hydro-fracture in porous media under stress boundary condition considering the effect of initial stress field
Abstract
Phase field model (PFM) is an efficient fracture modeling method and has high potential for hydraulic fracturing (HF). However, the current PFMs in HF do not consider well the effect of in-situ stress field and the numerical examples of porous media with stress boundary conditions were rarely presented. The main reason is that if the remote stress is applied on the boundaries of the calculation domain, there will be relatively large deformation induced on these stress boundaries, which is not consistent with the engineering observations. To eliminate this limitation, this paper proposes a new phase field method to describe quasi-static hydraulic fracture propagation in porous media subjected to stress boundary conditions, and the new method is more in line with engineering practice. A new energy functional, which considers the effect of initial in-situ stress field, is established and then it is used to achieve the governing equations for the displacement and phase fields through the variational approach. Biot poroelasticity theory is used to couple the fluid pressure field and the displacement field while the phase field is used for determining the fluid properties from the intact domain to the fully broken domain. In addition, we present several 2D and 3D examples to show the effects of in-situ stress on hydraulic fracture propagation. The numerical examples indicate that under stress boundary condition our approach obtains correct displacement distribution and it is capable of capturing complex hydraulic fracture growth patterns.
Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes
Authors: Fabien Delattre, David Dirnfeld, Phat Nguyen, Stephen Scarano, Michael J. Jones, Pedro Miraldo, Erik Learned-Miller
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
We present an approach to estimating camera rotation in crowded, real-world scenes from handheld monocular video. While camera rotation estimation is a well-studied problem, no previous methods exhibit both high accuracy and acceptable speed in this setting. Because the setting is not addressed well by other datasets, we provide a new dataset and benchmark, with high-accuracy, rigorously verified ground truth, on 17 video sequences. Methods developed for wide baseline stereo (e.g., 5-point methods) perform poorly on monocular video. On the other hand, methods used in autonomous driving (e.g., SLAM) leverage specific sensor setups, specific motion models, or local optimization strategies (lagging batch processing) and do not generalize well to handheld video. Finally, for dynamic scenes, commonly used robustification techniques like RANSAC require large numbers of iterations, and become prohibitively slow. We introduce a novel generalization of the Hough transform on SO(3) to efficiently and robustly find the camera rotation most compatible with optical flow. Among comparably fast methods, ours reduces error by almost 50\% over the next best, and is more accurate than any method, irrespective of speed. This represents a strong new performance point for crowded scenes, an important setting for computer vision. The code and the dataset are available at https://fabiendelattre.com/robust-rotation-estimation.
Neural Machine Translation Models Can Learn to be Few-shot Learners
Authors: Raphael Reinauer, Patrick Simianer, Kaden Uhlig, Johannes E. M. Mosig, Joern Wuebker
Abstract
The emergent ability of Large Language Models to use a small number of examples to learn to perform in novel domains and tasks, also called in-context learning (ICL). In this work, we show that a much smaller model can be trained to perform ICL by fine-tuning towards a specialized training objective, exemplified on the task of domain adaptation for neural machine translation. With this capacity for ICL, the model can take advantage of relevant few-shot examples to adapt its output towards the domain. We compare the quality of this domain adaptation to traditional supervised techniques and ICL with a 40B-parameter Large Language Model. Our approach allows efficient batch inference on a mix of domains and outperforms state-of-the-art baselines in terms of both translation quality and immediate adaptation rate, i.e. the ability to reproduce a specific term after being shown a single example.
Robust e-NeRF: NeRF from Sparse & Noisy Events under Non-Uniform Motion
Abstract
Event cameras offer many advantages over standard cameras due to their distinctive principle of operation: low power, low latency, high temporal resolution and high dynamic range. Nonetheless, the success of many downstream visual applications also hinges on an efficient and effective scene representation, where Neural Radiance Field (NeRF) is seen as the leading candidate. Such promise and potential of event cameras and NeRF inspired recent works to investigate on the reconstruction of NeRF from moving event cameras. However, these works are mainly limited in terms of the dependence on dense and low-noise event streams, as well as generalization to arbitrary contrast threshold values and camera speed profiles. In this work, we propose Robust e-NeRF, a novel method to directly and robustly reconstruct NeRFs from moving event cameras under various real-world conditions, especially from sparse and noisy events generated under non-uniform motion. It consists of two key components: a realistic event generation model that accounts for various intrinsic parameters (e.g. time-independent, asymmetric threshold and refractory period) and non-idealities (e.g. pixel-to-pixel threshold variation), as well as a complementary pair of normalized reconstruction losses that can effectively generalize to arbitrary speed profiles and intrinsic parameter values without such prior knowledge. Experiments on real and novel realistically simulated sequences verify our effectiveness. Our code, synthetic dataset and improved event simulator are public.
Keyword: faster
Traveling Waves Encode the Recent Past and Enhance Sequence Learning
Authors: T. Anderson Keller, Lyle Muller, Terrence Sejnowski, Max Welling
Abstract
Traveling waves of neural activity have been observed throughout the brain at a diversity of regions and scales; however, their precise computational role is still debated. One physically grounded hypothesis suggests that the cortical sheet may act like a wave-field capable of storing a short-term memory of sequential stimuli through induced waves traveling across the cortical surface. To date, however, the computational implications of this idea have remained hypothetical due to the lack of a simple recurrent neural network architecture capable of exhibiting such waves. In this work, we introduce a model to fill this gap, which we denote the Wave-RNN (wRNN), and demonstrate how both connectivity constraints and initialization play a crucial role in the emergence of wave-like dynamics. We then empirically show how such an architecture indeed efficiently encodes the recent past through a suite of synthetic memory tasks where wRNNs learn faster and perform significantly better than wave-free counterparts. Finally, we explore the implications of this memory storage system on more complex sequence modeling tasks such as sequential image classification and find that wave-based models not only again outperform comparable wave-free RNNs while using significantly fewer parameters, but additionally perform comparably to more complex gated architectures such as LSTMs and GRUs. We conclude with a discussion of the implications of these results for both neuroscience and machine learning.
MPCGPU: Real-Time Nonlinear Model Predictive Control through Preconditioned Conjugate Gradient on the GPU
Authors: Emre Adabag, Miloni Atal, William Gerard, Brian Plancher
Subjects: Robotics (cs.RO); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Nonlinear Model Predictive Control (NMPC) is a state-of-the-art approach for locomotion and manipulation which leverages trajectory optimization at each control step. While the performance of this approach is computationally bounded, implementations of direct trajectory optimization that use iterative methods to solve the underlying moderately-large and sparse linear systems, are a natural fit for parallel hardware acceleration. In this work, we introduce MPCGPU, a GPU-accelerated, real-time NMPC solver that leverages an accelerated preconditioned conjugate gradient (PCG) linear system solver at its core. We show that MPCGPU increases the scalability and real-time performance of NMPC, solving larger problems, at faster rates. In particular, for tracking tasks using the Kuka IIWA manipulator, MPCGPU is able to scale to kilohertz control rates with trajectories as long as 512 knot points. This is driven by a custom PCG solver which outperforms state-of-the-art, CPU-based, linear system solvers by at least 10x for a majority of solves and 3.6x on average.
Greedy Optimization of Resistance-based Graph Robustness with Global and Local Edge Insertions
Authors: Maria Predari (1), Lukas Berner (1), Robert Kooij (2 and 3), Henning Meyerhenke (1) ((1) Department of Computer Science, Humboldt-Universität zu Berlin, (2) Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, (3) UNIT ICT, Strategy & Policy, TNO (Netherlands Organisation for Applied Scientific Research))
Abstract
The total effective resistance, also called the Kirchhoff index, provides a robustness measure for a graph $G$. We consider two optimization problems of adding $k$ new edges to $G$ such that the resulting graph has minimal total effective resistance (i.e., is most robust) -- one where the new edges can be anywhere in the graph and one where the new edges need to be incident to a specified focus node. The total effective resistance and effective resistances between nodes can be computed using the pseudoinverse of the graph Laplacian. The pseudoinverse may be computed explicitly via pseudoinversion; yet, this takes cubic time in practice and quadratic space. We instead exploit combinatorial and algebraic connections to speed up gain computations in an established generic greedy heuristic. Moreover, we leverage existing randomized techniques to boost the performance of our approaches by introducing a sub-sampling step. Our different graph- and matrix-based approaches are indeed significantly faster than the state-of-the-art greedy algorithm, while their quality remains reasonably high and is often quite close. Our experiments show that we can now process larger graphs for which the application of the state-of-the-art greedy approach was impractical before.
Constraint-Free Structure Learning with Smooth Acyclic Orientations
Authors: Riccardo Massidda, Francesco Landolfi, Martina Cinquini, Davide Bacciu
Abstract
The structure learning problem consists of fitting data generated by a Directed Acyclic Graph (DAG) to correctly reconstruct its arcs. In this context, differentiable approaches constrain or regularize the optimization problem using a continuous relaxation of the acyclicity property. The computational cost of evaluating graph acyclicity is cubic on the number of nodes and significantly affects scalability. In this paper we introduce COSMO, a constraint-free continuous optimization scheme for acyclic structure learning. At the core of our method, we define a differentiable approximation of an orientation matrix parameterized by a single priority vector. Differently from previous work, our parameterization fits a smooth orientation matrix and the resulting acyclic adjacency matrix without evaluating acyclicity at any step. Despite the absence of explicit constraints, we prove that COSMO always converges to an acyclic solution. In addition to being asymptotically faster, our empirical analysis highlights how COSMO performance on graph reconstruction compares favorably with competing structure learning methods.
Keyword: mobile
Efficiently Identifying Hotspots in a Spatially Varying Field with Multiple Robots
Abstract
In this paper, we present algorithms to identify environmental hotspots using mobile sensors. We examine two approaches: one involving a single robot and another using multiple robots coordinated through a decentralized robot system. We introduce an adaptive algorithm that does not require precise knowledge of Gaussian Processes (GPs) hyperparameters, making the modeling process more flexible. The robots operate for a pre-defined time in the environment. The multi-robot system uses Voronoi partitioning to divide tasks and a Monte Carlo Tree Search for optimal path planning. Our tests on synthetic and a real-world dataset of Chlorophyll density from a Pacific Ocean sub-region suggest that accurate estimation of GP hyperparameters may not be essential for hotspot detection, potentially simplifying environmental monitoring tasks.
AnyOKP: One-Shot and Instance-Aware Object Keypoint Extraction with Pretrained ViT
Authors: Fangbo Qin, Taogang Hou, Shan Lin, Kaiyuan Wang, Michael C. Yip, Shan Yu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Towards flexible object-centric visual perception, we propose a one-shot instance-aware object keypoint (OKP) extraction approach, AnyOKP, which leverages the powerful representation ability of pretrained vision transformer (ViT), and can obtain keypoints on multiple object instances of arbitrary category after learning from a support image. An off-the-shelf petrained ViT is directly deployed for generalizable and transferable feature extraction, which is followed by training-free feature enhancement. The best-prototype pairs (BPPs) are searched for in support and query images based on appearance similarity, to yield instance-unaware candidate keypoints.Then, the entire graph with all candidate keypoints as vertices are divided to sub-graphs according to the feature distributions on the graph edges. Finally, each sub-graph represents an object instance. AnyOKP is evaluated on real object images collected with the cameras of a robot arm, a mobile robot, and a surgical robot, which not only demonstrates the cross-category flexibility and instance awareness, but also show remarkable robustness to domain shift and viewpoint change.
A Testbed for Automating and Analysing Mobile Devices and their Applications
Abstract
The need for improved network situational awareness has been highlighted by the growing complexity and severity of cyber-attacks. Mobile phones pose a significant risk to network situational awareness due to their dynamic behaviour and lack of visibility on a network. Machine learning techniques enhance situational awareness by providing administrators insight into the devices and activities which form their network. Developing machine learning techniques for situational awareness requires a testbed to generate and label network traffic. Current testbeds, however, are unable to automate the generation and labelling of realistic network traffic. To address this, we describe a testbed which automates applications on mobile devices to generate and label realistic traffic. From this testbed, two labelled datasets of network traffic have been created. We provide an analysis of the testbed automation reliability and benchmark the datasets for the task of application classification.
AdSEE: Investigating the Impact of Image Style Editing on Advertisement Attractiveness
Authors: Liyao Jiang, Chenglin Li, Haolan Chen, Xiaodong Gao, Xinwang Zhong, Yang Qiu, Shani Ye, Di Niu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract
Online advertisements are important elements in e-commerce sites, social media platforms, and search engines. With the increasing popularity of mobile browsing, many online ads are displayed with visual information in the form of a cover image in addition to text descriptions to grab the attention of users. Various recent studies have focused on predicting the click rates of online advertisements aware of visual features or composing optimal advertisement elements to enhance visibility. In this paper, we propose Advertisement Style Editing and Attractiveness Enhancement (AdSEE), which explores whether semantic editing to ads images can affect or alter the popularity of online advertisements. We introduce StyleGAN-based facial semantic editing and inversion to ads images and train a click rate predictor attributing GAN-based face latent representations in addition to traditional visual and textual features to click rates. Through a large collected dataset named QQ-AD, containing 20,527 online ads, we perform extensive offline tests to study how different semantic directions and their edit coefficients may impact click rates. We further design a Genetic Advertisement Editor to efficiently search for the optimal edit directions and intensity given an input ad cover image to enhance its projected click rates. Online A/B tests performed over a period of 5 days have verified the increased click-through rates of AdSEE-edited samples as compared to a control group of original ads, verifying the relation between image styles and ad popularity. We open source the code for AdSEE research at https://github.com/LiyaoJiang1998/adsee.
TF-SepNet: An Efficient 1D Kernel Design in CNNs for Low-Complexity Acoustic Scene Classification
Authors: Yiqiang Cai, Peihong Zhang, Shengchen Li
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract
Recent studies focus on developing efficient systems for acoustic scene classification (ASC) using convolutional neural networks (CNNs), which typically consist of consecutive kernels. This paper highlights the benefits of using separate kernels as a more powerful and efficient design approach in ASC tasks. Inspired by the time-frequency nature of audio signals, we propose TF-SepNet, a CNN architecture that separates the feature processing along the time and frequency dimensions. Features resulted from the separate paths are then merged by channels and directly forwarded to the classifier. Instead of the conventional two dimensional (2D) kernel, TF-SepNet incorporates one dimensional (1D) kernels to reduce the computational costs. Experiments have been conducted using the TAU Urban Acoustic Scene 2022 Mobile development dataset. The results show that TF-SepNet outperforms similar state-of-the-arts that use consecutive kernels. A further investigation reveals that the separate kernels lead to a larger effective receptive field (ERF), which enables TF-SepNet to capture more time-frequency features.
PRIEST: Projection Guided Sampling-Based Optimization For Autonomous Navigation
Abstract
Efficient navigation in unknown and dynamic environments is crucial for expanding the application domain of mobile robots. The core challenge stems from the nonavailability of a feasible global path for guiding optimization-based local planners. As a result, existing local planners often get trapped in poor local minima. In this paper, we present a novel optimizer that can explore multiple homotopies to plan high-quality trajectories over long horizons while still being fast enough for real-time applications. We build on the gradient-free paradigm by augmenting the trajectory sampling strategy with a projection optimization that guides the samples toward a feasible region. As a result, our approach can recover from the frequently encountered pathological cases wherein all the sampled trajectories lie in the high-cost region. Furthermore, we also show that our projection optimization has a highly parallelizable structure that can be easily accelerated over GPUs. We push the state-of-the-art in the following respects. Over the navigation stack of the Robot Operating System (ROS), we show an improvement of 7-13% in success rate and up to two times in total travel time metric. On the same benchmarks and metrics, our approach achieves up to 44% improvement over MPPI and its recent variants. On simple point-to-point navigation tasks, our optimizer is up to two times more reliable than SOTA gradient-based solvers, as well as sampling-based approaches such as the Cross-Entropy Method (CEM) and VPSTO. Codes: https://github.com/fatemeh-rastgar/PRIEST
Human-Inspired Topological Representations for Visual Object Recognition in Unseen Environments
Authors: Ekta U. Samani, Ashis G. Banerjee
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
Visual object recognition in unseen and cluttered indoor environments is a challenging problem for mobile robots. Toward this goal, we extend our previous work to propose the TOPS2 descriptor, and an accompanying recognition framework, THOR2, inspired by a human reasoning mechanism known as object unity. We interleave color embeddings obtained using the Mapper algorithm for topological soft clustering with the shape-based TOPS descriptor to obtain the TOPS2 descriptor. THOR2, trained using synthetic data, achieves substantially higher recognition accuracy than the shape-based THOR framework and outperforms RGB-D ViT on two real-world datasets: the benchmark OCID dataset and the UW-IS Occluded dataset. Therefore, THOR2 is a promising step toward achieving robust recognition in low-cost robots.
Optimal Mobility and Communication Strategy to Maximize the Value of Information in IoT Networks
Authors: Zijing Wang, Mihai-Alin Badiu, Justin P. Coon
Abstract
The Internet of Things (IoT) is an emerging next-generation technology in the fourth industrial revolution. In industrial IoT networks, sensing devices are largely deployed to monitor various types of physical processes. They are required to transmit the collected data in a timely manner to support real-time monitoring, control and automation. The timeliness of information is very important in such systems. Recently, an information-theoretic metric named the "value of information" (VoI) has been proposed to measure the usefulness of information. In this work, we consider an industrial IoT network with a set of heterogeneous sensing devices and an intelligent mobile entity. The concept of the value of information is applied to study a joint path planning and user scheduling optimisation problem. We aim to maximise the network-level VoI under mobility and communication constraints. We formulate this problem as a Markov decision process (MDP), and an efficient algorithm based on reinforcement learning is proposed to solve this problem. Through numerical results, we show that the proposed method is able to capture the usefulness of data from both time and space dimensions. By exploiting the correlation property of the data source, the proposed method is suitable for applications in resource-limited networks.
Object-oriented mapping in dynamic environments
Authors: Matti Pekkanen, Francesco Verdoja, Ville Kyrki
Abstract
Grid maps, especially occupancy grid maps, are ubiquitous in many mobile robot applications. To simplify the process of learning the map, grid maps subdivide the world into a grid of cells, whose occupancies are independently estimated using only measurements in the perceptual field of the particular cell. However, the world consists of objects that span multiple cells, which means that measurements falling onto a cell provide evidence on the occupancy of other cells belonging to the same object. This correlation is not captured by current models. In this work, we present a way to generalize the update of grid maps relaxing the assumption of independence by modeling the relationship between the measurements and the occupancy of each cell as a set of latent variables, and jointly estimating those variables and the posterior of the map. Additionally, we propose a method to estimate the latent variables by clustering based on semantic labels and an extension to the Normal Distributions Transfer Occupancy Map (NDT-OM) to facilitate the proposed map update method. We perform comprehensive experiments of map creation and localization with real world data sets, and show that the proposed method creates better maps in highly dynamic environments compared to state-of-the-art methods. Finally, we demonstrate the ability of the proposed method to remove occluded objects from the map in a lifelong map update scenario.
Abstract
Invariance describes transformations that do not alter data's underlying semantics. Neural networks that preserve natural invariance capture good inductive biases and achieve superior performance. Hence, modern networks are handcrafted to handle well-known invariances (ex. translations). We propose a framework to learn novel network architectures that capture data-dependent invariances via pruning. Our learned architectures consistently outperform dense neural networks on both vision and tabular datasets in both efficiency and effectiveness. We demonstrate our framework on multiple deep learning models across 3 vision and 40 tabular datasets.
P-ROCKET: Pruning Random Convolution Kernels for Time Series Classification
Authors: Shaowu Chen, Weize Sun, Lei Huang, Xiaopeng Li, Qingyuan Wang, Deepu John
Abstract
In recent years, two time series classification models, ROCKET and MINIROCKET, have attracted much attention for their low training cost and state-of-the-art accuracy. Utilizing random 1-D convolutional kernels without training, ROCKET and MINIROCKET can rapidly extract features from time series data, allowing for the efficient fitting of linear classifiers. However, to comprehensively capture useful features, a large number of random kernels are required, which is incompatible for resource-constrained devices. Therefore, a heuristic evolutionary algorithm named S-ROCKET is devised to recognize and prune redundant kernels. Nevertheless, the inherent nature of evolutionary algorithms renders the evaluation of kernels within S-ROCKET an unacceptable time-consuming process. In this paper, diverging from S-ROCKET, which directly evaluates random kernels with nonsignificant differences, we remove kernels from a feature selection perspective by eliminating associating connections in the sequential classification layer. To this end, we start by formulating the pruning challenge as a Group Elastic Net classification problem and employ the ADMM method to arrive at a solution. Sequentially, we accelerate the aforementioned time-consuming solving process by bifurcating the $l_{2,1}$ and $l_2$ regularizations into two sequential stages and solve them separately, which ultimately forms our core algorithm, named P-ROCKET. Stage 1 of P-ROCKET employs group-wise regularization similarly to our initial ADMM-based Algorithm, but introduces dynamically varying penalties to greatly accelerate the process. To mitigate overfitting, Stage 2 of P-ROCKET implements element-wise regularization to refit a linear classifier, utilizing the retained features.
Keyword: diffusion
Generative AI
Authors: Stefan Feuerriegel, Jochen Hartmann, Christian Janiesch, Patrick Zschech
Abstract
The term "generative AI" refers to computational techniques that are capable of generating seemingly new, meaningful content such as text, images, or audio from training data. The widespread diffusion of this technology with examples such as Dall-E 2, GPT-4, and Copilot is currently revolutionizing the way we work and communicate with each other. In this article, we provide a conceptualization of generative AI as an entity in socio-technical systems and provide examples of models, systems, and applications. Based on that, we introduce limitations of current generative AI and provide an agenda for Business & Information Systems Engineering (BISE) research. Different from previous works, we focus on generative AI in the context of information systems, and, to this end, we discuss several opportunities and challenges that are unique to the BISE community and make suggestions for impactful directions for BISE research.
Text-to-Image Models for Counterfactual Explanations: a Black-Box Approach
Abstract
This paper addresses the challenge of generating Counterfactual Explanations (CEs), involving the identification and modification of the fewest necessary features to alter a classifier's prediction for a given image. Our proposed method, Text-to-Image Models for Counterfactual Explanations (TIME), is a black-box counterfactual technique based on distillation. Unlike previous methods, this approach requires solely the image and its prediction, omitting the need for the classifier's structure, parameters, or gradients. Before generating the counterfactuals, TIME introduces two distinct biases into Stable Diffusion in the form of textual embeddings: the context bias, associated with the image's structure, and the class bias, linked to class-specific features learned by the target classifier. After learning these biases, we find the optimal latent code applying the classifier's predicted class token and regenerate the image using the target embedding as conditioning, producing the counterfactual explanation. Extensive empirical studies validate that TIME can generate explanations of comparable effectiveness even when operating within a black-box setting.
Abstract
Text-to-image diffusion models understand spatial relationship between objects, but do they represent the true 3D structure of the world from only 2D supervision? We demonstrate that yes, 3D knowledge is encoded in 2D image diffusion models like Stable Diffusion, and we show that this structure can be exploited for 3D vision tasks. Our method, Viewpoint Neural Textual Inversion (ViewNeTI), controls the 3D viewpoint of objects in generated images from frozen diffusion models. We train a small neural mapper to take camera viewpoint parameters and predict text encoder latents; the latents then condition the diffusion generation process to produce images with the desired camera viewpoint. ViewNeTI naturally addresses Novel View Synthesis (NVS). By leveraging the frozen diffusion model as a prior, we can solve NVS with very few input views; we can even do single-view novel view synthesis. Our single-view NVS predictions have good semantic details and photorealism compared to prior methods. Our approach is well suited for modeling the uncertainty inherent in sparse 3D vision problems because it can efficiently generate diverse samples. Our view-control mechanism is general, and can even change the camera view in images generated by user-defined prompts.
Abstract
The challenge in fine-grained visual categorization lies in how to explore the subtle differences between different subclasses and achieve accurate discrimination. Previous research has relied on large-scale annotated data and pre-trained deep models to achieve the objective. However, when only a limited amount of samples is available, similar methods may become less effective. Diffusion models have been widely adopted in data augmentation due to their outstanding diversity in data generation. However, the high level of detail required for fine-grained images makes it challenging for existing methods to be directly employed. To address this issue, we propose a novel approach termed the detail reinforcement diffusion model~(DRDM), which leverages the rich knowledge of large models for fine-grained data augmentation and comprises two key components including discriminative semantic recombination (DSR) and spatial knowledge reference~(SKR). Specifically, DSR is designed to extract implicit similarity relationships from the labels and reconstruct the semantic mapping between labels and instances, which enables better discrimination of subtle differences between different subclasses. Furthermore, we introduce the SKR module, which incorporates the distributions of different datasets as references in the feature space. This allows the SKR to aggregate the high-dimensional distribution of subclass features in few-shot FGVC tasks, thus expanding the decision boundary. Through these two critical components, we effectively utilize the knowledge from large models to address the issue of data scarcity, resulting in improved performance for fine-grained visual recognition tasks. Extensive experiments demonstrate the consistent performance gain offered by our DRDM.
Cartoondiff: Training-free Cartoon Image Generation with Diffusion Transformer Models
Authors: Feihong He, Gang Li, Lingyu Si, Leilei Yan, Shimeng Hou, Hongwei Dong, Fanzhang Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Image cartoonization has attracted significant interest in the field of image generation. However, most of the existing image cartoonization techniques require re-training models using images of cartoon style. In this paper, we present CartoonDiff, a novel training-free sampling approach which generates image cartoonization using diffusion transformer models. Specifically, we decompose the reverse process of diffusion models into the semantic generation phase and the detail generation phase. Furthermore, we implement the image cartoonization process by normalizing high-frequency signal of the noisy image in specific denoising steps. CartoonDiff doesn't require any additional reference images, complex model designs, or the tedious adjustment of multiple parameters. Extensive experimental results show the powerful ability of our CartoonDiff. The project page is available at: https://cartoondiff.github.io/
Unsupervised Disentangling of Facial Representations with 3D-aware Latent Diffusion Models
Authors: Ruian He, Zhen Xing, Weimin Tan, Bo Yan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Unsupervised learning of facial representations has gained increasing attention for face understanding ability without heavily relying on large-scale annotated datasets. However, it remains unsolved due to the coupling of facial identities, expressions, and external factors like pose and light. Prior methods primarily focus on 2D factors and pixel-level consistency, leading to incomplete disentangling and suboptimal performance in downstream tasks. In this paper, we propose LatentFace, a novel unsupervised disentangling framework for facial expression and identity representation. We suggest the disentangling problem should be performed in latent space and propose the solution using a 3D-ware latent diffusion model. First, we introduce a 3D-aware autoencoder to encode face images into 3D latent embeddings. Second, we propose a novel representation diffusion model (RDM) to disentangle 3D latent into facial identity and expression. Consequently, our method achieves state-of-the-art performance in facial expression recognition and face verification among unsupervised facial representation learning models.
A Finite-Volume Scheme for Fractional Diffusion on Bounded Domains
Authors: Rafael Bailo, José A. Carrillo, Stefano Fronzoni, David Gómez-Castro
Abstract
We propose a new fractional Laplacian for bounded domains, expressed as a conservation law and thus particularly suited to finite-volume schemes. Our approach permits the direct prescription of no-flux boundary conditions. We first show the well-posedness theory for the fractional heat equation. We also develop a numerical scheme, which correctly captures the action of the fractional Laplacian and its anomalous diffusion effect. We benchmark numerical solutions for the L\'evy-Fokker-Planck equation against known analytical solutions. We conclude by numerically exploring properties of these equations with respect to their stationary states and long-time asymptotics.
Large Intestine 3D Shape Refinement Using Point Diffusion Models for Digital Phantom Generation
Authors: Kaouther Mouheb, Mobina Ghojogh Nejad, Lavsen Dahal, Ehsan Samei, W. Paul Segars, Joseph Y. Lo
Abstract
Accurate 3D modeling of human organs plays a crucial role in building computational phantoms for virtual imaging trials. However, generating anatomically plausible reconstructions of organ surfaces from computed tomography scans remains challenging for many structures in the human body. This challenge is particularly evident when dealing with the large intestine. In this study, we leverage recent advancements in geometric deep learning and denoising diffusion probabilistic models to refine the segmentation results of the large intestine. We begin by representing the organ as point clouds sampled from the surface of the 3D segmentation mask. Subsequently, we employ a hierarchical variational autoencoder to obtain global and local latent representations of the organ's shape. We train two conditional denoising diffusion models in the hierarchical latent space to perform shape refinement. To further enhance our method, we incorporate a state-of-the-art surface reconstruction model, allowing us to generate smooth meshes from the obtained complete point clouds. Experimental results demonstrate the effectiveness of our approach in capturing both the global distribution of the organ's shape and its fine details. Our complete refinement pipeline demonstrates remarkable enhancements in surface representation compared to the initial segmentation, reducing the Chamfer distance by 70%, the Hausdorff distance by 32%, and the Earth Mover's distance by 6%. By combining geometric deep learning, denoising diffusion models, and advanced surface reconstruction techniques, our proposed method offers a promising solution for accurately modeling the large intestine's surface and can easily be extended to other anatomical structures.
Breathing New Life into 3D Assets with Generative Repainting
Authors: Tianfu Wang, Menelaos Kanakis, Konrad Schindler, Luc Van Gool, Anton Obukhov
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Abstract
Diffusion-based text-to-image models ignited immense attention from the vision community, artists, and content creators. Broad adoption of these models is due to significant improvement in the quality of generations and efficient conditioning on various modalities, not just text. However, lifting the rich generative priors of these 2D models into 3D is challenging. Recent works have proposed various pipelines powered by the entanglement of diffusion models and neural fields. We explore the power of pretrained 2D diffusion models and standard 3D neural radiance fields as independent, standalone tools and demonstrate their ability to work together in a non-learned fashion. Such modularity has the intrinsic advantage of eased partial upgrades, which became an important property in such a fast-paced domain. Our pipeline accepts any legacy renderable geometry, such as textured or untextured meshes, orchestrates the interaction between 2D generative refinement and 3D consistency enforcement tools, and outputs a painted input geometry in several formats. We conduct a large-scale study on a wide range of objects and categories from the ShapeNetSem dataset and demonstrate the advantages of our approach, both qualitatively and quantitatively. Project page: https://www.obukhov.ai/repainting_3d_assets
Denoising Diffusion Probabilistic Models for Hardware-Impaired Communications
Authors: Mehdi Letafati, Samad Ali, Matti Latva-aho
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
Generative AI has received significant attention among a spectrum of diverse industrial and academic domains, thanks to the magnificent results achieved from deep generative models such as generative pre-trained transformers (GPT) and diffusion models. In this paper, we explore the applications of denoising diffusion probabilistic models (DDPMs) in wireless communication systems under practical assumptions such as hardware impairments (HWI), low-SNR regime, and quantization error. Diffusion models are a new class of state-of-the-art generative models that have already showcased notable success with some of the popular examples by OpenAI and Google Brain. The intuition behind DDPM is to decompose the data generation process over small "denoising" steps. Inspired by this, we propose using denoising diffusion model-based receiver for a practical wireless communication scheme, while providing network resilience in low-SNR regimes, non-Gaussian noise, different HWI levels, and quantization error. We evaluate the reconstruction performance of our scheme in terms of bit error rate (BER) and mean-squared error (MSE). Our results show that 30% and 20% improvement in BER could be achieved compared to deep neural network (DNN)-based receivers in AWGN and non-Gaussian scenarios, respectively.
Compositional Foundation Models for Hierarchical Planning
Abstract
To make effective decisions in novel environments with long-horizon goals, it is crucial to engage in hierarchical reasoning across spatial and temporal scales. This entails planning abstract subgoal sequences, visually reasoning about the underlying plans, and executing actions in accordance with the devised plan through visual-motor control. We propose Compositional Foundation Models for Hierarchical Planning (HiP), a foundation model which leverages multiple expert foundation model trained on language, vision and action data individually jointly together to solve long-horizon tasks. We use a large language model to construct symbolic plans that are grounded in the environment through a large video diffusion model. Generated video plans are then grounded to visual-motor control, through an inverse dynamics model that infers actions from generated videos. To enable effective reasoning within this hierarchy, we enforce consistency between the models via iterative refinement. We illustrate the efficacy and adaptability of our approach in three different long-horizon table-top manipulation tasks.
Keyword: adaptive
Efficiently Identifying Hotspots in a Spatially Varying Field with Multiple Robots
Abstract
In this paper, we present algorithms to identify environmental hotspots using mobile sensors. We examine two approaches: one involving a single robot and another using multiple robots coordinated through a decentralized robot system. We introduce an adaptive algorithm that does not require precise knowledge of Gaussian Processes (GPs) hyperparameters, making the modeling process more flexible. The robots operate for a pre-defined time in the environment. The multi-robot system uses Voronoi partitioning to divide tasks and a Monte Carlo Tree Search for optimal path planning. Our tests on synthetic and a real-world dataset of Chlorophyll density from a Pacific Ocean sub-region suggest that accurate estimation of GP hyperparameters may not be essential for hotspot detection, potentially simplifying environmental monitoring tasks.
MetaF2N: Blind Image Super-Resolution by Learning Efficient Model Adaptation from Faces
Abstract
Due to their highly structured characteristics, faces are easier to recover than natural scenes for blind image super-resolution. Therefore, we can extract the degradation representation of an image from the low-quality and recovered face pairs. Using the degradation representation, realistic low-quality images can then be synthesized to fine-tune the super-resolution model for the real-world low-quality image. However, such a procedure is time-consuming and laborious, and the gaps between recovered faces and the ground-truths further increase the optimization uncertainty. To facilitate efficient model adaptation towards image-specific degradations, we propose a method dubbed MetaF2N, which leverages the contained Faces to fine-tune model parameters for adapting to the whole Natural image in a Meta-learning framework. The degradation extraction and low-quality image synthesis steps are thus circumvented in our MetaF2N, and it requires only one fine-tuning step to get decent performance. Considering the gaps between the recovered faces and ground-truths, we further deploy a MaskNet for adaptively predicting loss weights at different positions to reduce the impact of low-confidence areas. To evaluate our proposed MetaF2N, we have collected a real-world low-quality dataset with one or multiple faces in each image, and our MetaF2N achieves superior performance on both synthetic and real-world datasets. Source code, pre-trained models, and collected datasets are available at https://github.com/yinzhicun/MetaF2N.
Adaptive finite element approximation of sparse optimal control with integral fractional Laplacian
Abstract
In this paper we present and analyze a weighted residual a posteriori error estimate for an optimal control problem. The problem involves a nondifferentiable cost functional, a state equation with an integral fractional Laplacian, and control constraints. We employ subdifferentiation in the context of nondifferentiable convex analysis to obtain first-order optimality conditions. Piecewise linear polynomials are utilized to approximate the solutions of the state and adjoint equations. The control variable is discretized using the variational discretization method. Upper and lower bounds for the a posteriori error estimate of the finite element approximation of the optimal control problem are derived. In the region where 3/2 < alpha < 2, the residuals do not satisfy the L2(Omega) regularity. To address this issue, an additional weight is included in the weighted residual estimator, which is based on a power of the distance from the mesh skeleton. Furthermore, we propose an h-adaptive algorithm driven by the posterior view error estimator, utilizing the Dorfler labeling criterion. The convergence analysis results show that the approximation sequence generated by the adaptive algorithm converges at the optimal algebraic rate. Finally, numerical experiments are conducted to validate the theoretical results.
Abstract
The key challenge in image-text retrieval is effectively leveraging semantic information to measure the similarity between vision and language data. However, using instance-level binary labels, where each image is paired with a single text, fails to capture multiple correspondences between different semantic units, leading to uncertainty in multi-modal semantic understanding. Although recent research has captured fine-grained information through more complex model structures or pre-training techniques, few studies have directly modeled uncertainty of correspondence to fully exploit binary labels. To address this issue, we propose an Uncertainty-Aware Multi-View Visual Semantic Embedding (UAMVSE)} framework that decomposes the overall image-text matching into multiple view-text matchings. Our framework introduce an uncertainty-aware loss function (UALoss) to compute the weighting of each view-text loss by adaptively modeling the uncertainty in each view-text correspondence. Different weightings guide the model to focus on different semantic information, enhancing the model's ability to comprehend the correspondence of images and texts. We also design an optimized image-text matching strategy by normalizing the similarity matrix to improve model performance. Experimental results on the Flicker30k and MS-COCO datasets demonstrate that UAMVSE outperforms state-of-the-art models.
Astrocyte-Integrated Dynamic Function Exchange in Spiking Neural Networks
Authors: Murat Isik, Kayode Inadagbo
Subjects: Neural and Evolutionary Computing (cs.NE)
Abstract
This paper presents an innovative methodology for improving the robustness and computational efficiency of Spiking Neural Networks (SNNs), a critical component in neuromorphic computing. The proposed approach integrates astrocytes, a type of glial cell prevalent in the human brain, into SNNs, creating astrocyte-augmented networks. To achieve this, we designed and implemented an astrocyte model in two distinct platforms: CPU/GPU and FPGA. Our FPGA implementation notably utilizes Dynamic Function Exchange (DFX) technology, enabling real-time hardware reconfiguration and adaptive model creation based on current operating conditions. The novel approach of leveraging astrocytes significantly improves the fault tolerance of SNNs, thereby enhancing their robustness. Notably, our astrocyte-augmented SNN displays near-zero latency and theoretically infinite throughput, implying exceptional computational efficiency. Through comprehensive comparative analysis with prior works, it's established that our model surpasses others in terms of neuron and synapse count while maintaining an efficient power consumption profile. These results underscore the potential of our methodology in shaping the future of neuromorphic computing, by providing robust and energy-efficient systems.
A New Adaptive Phase-locked Loop for Synchronization of a Grid-Connected Voltage Source Converter: Simulation and Experimental Results
Abstract
In [1] a new adaptive phase-locked loop scheme for synchronization of a grid connected voltage source converter with guaranteed (almost) global stability properties was reported. To guarantee a suitable synchronization with the angle of the three-phase grid voltage we design an adaptive observer for such a signal requiring measurements only at the point of common coupling. An interesting feature of this scheme is the ability to synchronize in the challenging condition of connection with a grid with reduced short-circuit ratio. In this paper we present some simulation and experimental illustration of the excellent performance of the proposed solution.
Two-fingered Hand with Gear-type Synchronization Mechanism with Magnet for Improved Small and Offset Objects Grasping: F2 Hand
Abstract
A problem that plagues robotic grasping is the misalignment of the object and gripper due to difficulties in precise localization, actuation, etc. Under-actuated robotic hands with compliant mechanisms are used to adapt and compensate for these inaccuracies. However, these mechanisms come at the cost of controllability and coordination. For instance, adaptive functions that let the fingers of a two-fingered gripper adapt independently may affect the coordination necessary for grasping small objects. In this work, we develop a two-fingered robotic hand capable of grasping objects that are offset from the gripper's center, while still having the requisite coordination for grasping small objects via a novel gear-type synchronization mechanism with a magnet. This gear synchronization mechanism allows the adaptive finger's tips to be aligned enabling it to grasp objects as small as toothpicks and washers. The magnetic component allows this coordination to automatically turn off when needed, allowing for the grasping of objects that are offset/misaligned from the gripper. This equips the hand with the capability of grasping light, fragile objects (strawberries, creampuffs, etc) to heavy frying pan lids, all while maintaining their position and posture which is vital in numerous applications that require precise positioning or careful manipulation.
Adaptive Priority Reweighing for Generalizing Fairness Improvement
Authors: Zhihao Hu, Yiran Xu, Mengnan Du, Jindong Gu, Xinmei Tian, Fengxiang He
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
Abstract
With the increasing penetration of machine learning applications in critical decision-making areas, calls for algorithmic fairness are more prominent. Although there have been various modalities to improve algorithmic fairness through learning with fairness constraints, their performance does not generalize well in the test set. A performance-promising fair algorithm with better generalizability is needed. This paper proposes a novel adaptive reweighing method to eliminate the impact of the distribution shifts between training and test data on model generalizability. Most previous reweighing methods propose to assign a unified weight for each (sub)group. Rather, our method granularly models the distance from the sample predictions to the decision boundary. Our adaptive reweighing method prioritizes samples closer to the decision boundary and assigns a higher weight to improve the generalizability of fair classifiers. Extensive experiments are performed to validate the generalizability of our adaptive priority reweighing method for accuracy and fairness measures (i.e., equal opportunity, equalized odds, and demographic parity) in tabular benchmarks. We also highlight the performance of our method in improving the fairness of language and vision models. The code is available at https://github.com/che2198/APW.
Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization
Abstract
The pursuit of long-term autonomy mandates that robotic agents must continuously adapt to their changing environments and learn to solve new tasks. Continual learning seeks to overcome the challenge of catastrophic forgetting, where learning to solve new tasks causes a model to forget previously learnt information. Prior-based continual learning methods are appealing for robotic applications as they are space efficient and typically do not increase in computational complexity as the number of tasks grows. Despite these desirable properties, prior-based approaches typically fail on important benchmarks and consequently are limited in their potential applications compared to their memory-based counterparts. We introduce Bayesian adaptive moment regularization (BAdam), a novel prior-based method that better constrains parameter growth, leading to lower catastrophic forgetting. Our method boasts a range of desirable properties for robotic applications such as being lightweight and task label-free, converging quickly, and offering calibrated uncertainty that is important for safe real-world deployment. Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments such as Split MNIST and Split FashionMNIST, and does so without relying on task labels or discrete task boundaries.
Keyword: quantization
A Precision-Scalable RISC-V DNN Processor with On-Device Learning Capability at the Extreme Edge
Authors: Longwei Huang, Chao Fang, Qiong Li, Jun Lin, Zhongfeng Wang
Abstract
Extreme edge platforms, such as in-vehicle smart devices, require efficient deployment of quantized deep neural networks (DNNs) to enable intelligent applications with limited amounts of energy, memory, and computing resources. However, many edge devices struggle to boost inference throughput of various quantized DNNs due to the varying quantization levels, and these devices lack floating-point (FP) support for on-device learning, which prevents them from improving model accuracy while ensuring data privacy. To tackle the challenges above, we propose a precision-scalable RISC-V DNN processor with on-device learning capability. It facilitates diverse precision levels of fixed-point DNN inference, spanning from 2-bit to 16-bit, and enhances on-device learning through improved support with FP16 operations. Moreover, we employ multiple methods such as FP16 multiplier reuse and multi-precision integer multiplier reuse, along with balanced mapping of FPGA resources, to significantly improve hardware resource utilization. Experimental results on the Xilinx ZCU102 FPGA show that our processor significantly improves inference throughput by 1.6$\sim$14.6$\times$ and energy efficiency by 1.1$\sim$14.6$\times$ across various DNNs, compared to the prior art, XpulpNN. Additionally, our processor achieves a 16.5$\times$ higher FP throughput for on-device learning.
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens
Authors: Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, Yong Man Ro
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
Abstract
In this paper, we propose methods to build a powerful and efficient Image-to-Speech captioning (Im2Sp) model. To this end, we start with importing the rich knowledge related to image comprehension and language modeling from a large-scale pre-trained vision-language model into Im2Sp. We set the output of the proposed Im2Sp as discretized speech units, i.e., the quantized speech features of a self-supervised speech model. The speech units mainly contain linguistic information while suppressing other characteristics of speech. This allows us to incorporate the language modeling capability of the pre-trained vision-language model into the spoken language modeling of Im2Sp. With the vision-language pre-training strategy, we set new state-of-the-art Im2Sp performances on two widely used benchmark databases, COCO and Flickr8k. Then, we further improve the efficiency of the Im2Sp model. Similar to the speech unit case, we convert the original image into image units, which are derived through vector quantization of the raw image. With these image units, we can drastically reduce the required data storage for saving image data to just 0.8% when compared to the original image data in terms of bits. Demo page: https://ms-dot-k.github.io/Image-to-Speech-Captioning.
Denoising Diffusion Probabilistic Models for Hardware-Impaired Communications
Authors: Mehdi Letafati, Samad Ali, Matti Latva-aho
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
Generative AI has received significant attention among a spectrum of diverse industrial and academic domains, thanks to the magnificent results achieved from deep generative models such as generative pre-trained transformers (GPT) and diffusion models. In this paper, we explore the applications of denoising diffusion probabilistic models (DDPMs) in wireless communication systems under practical assumptions such as hardware impairments (HWI), low-SNR regime, and quantization error. Diffusion models are a new class of state-of-the-art generative models that have already showcased notable success with some of the popular examples by OpenAI and Google Brain. The intuition behind DDPM is to decompose the data generation process over small "denoising" steps. Inspired by this, we propose using denoising diffusion model-based receiver for a practical wireless communication scheme, while providing network resilience in low-SNR regimes, non-Gaussian noise, different HWI levels, and quantization error. We evaluate the reconstruction performance of our scheme in terms of bit error rate (BER) and mean-squared error (MSE). Our results show that 30% and 20% improvement in BER could be achieved compared to deep neural network (DNN)-based receivers in AWGN and non-Gaussian scenarios, respectively.
Keyword: efficient
An Automated and Efficient Aerodynamic Design and Analysis Framework Integrated to PANAIR
Improved Shortest Path Restoration Lemmas for Multiple Edge Failures: Trade-offs Between Fault-tolerance and Subpaths
Fast Safe Rectangular Corridor-based Online AGV Trajectory Optimization with Obstacle Avoidance
Inclusive-PIM: Hardware-Software Co-design for Broad Acceleration on Commercial PIM Architectures
Viewpoint Textual Inversion: Unleashing Novel View Synthesis with Pretrained 2D Diffusion Models
Bipedal Walking on Constrained Footholds with MPC Footstep Control
Test Case Generation and Test Oracle Support for Testing CPSs using Hybrid Models
Efficient online update of model predictive control in embedded systems using first-order methods
Traveling Waves Encode the Recent Past and Enhance Sequence Learning
VoicePAT: An Efficient Open-source Evaluation Toolkit for Voice Privacy Research
A Bayesian approach to breaking things: efficiently predicting and repairing failure modes via sampling
SSL-Net: A Synergistic Spectral and Learning-based Network for Efficient Bird Sound Classification
Low-rank Tensor Train Decomposition Using TensorSketch
MetaF2N: Blind Image Super-Resolution by Learning Efficient Model Adaptation from Faces
Diversity-based core-set selection for text-to-speech with linguistic and acoustic features
AdSEE: Investigating the Impact of Image Style Editing on Advertisement Attractiveness
Differentiable Resolution Compression and Alignment for Efficient Video Classification and Retrieval
FedJudge: Federated Legal Large Language Model
A Precision-Scalable RISC-V DNN Processor with On-Device Learning Capability at the Extreme Edge
TF-SepNet: An Efficient 1D Kernel Design in CNNs for Low-Complexity Acoustic Scene Classification
MetaOCaml Theory and Implementation
HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods
Practical Program Repair via Preference-based Ensemble Strategy
Astrocyte-Integrated Dynamic Function Exchange in Spiking Neural Networks
PRIEST: Projection Guided Sampling-Based Optimization For Autonomous Navigation
A Real-time Faint Space Debris Detector With Learning-based LCM
Verifiable Privacy-Preserving Computing
Sampling-Free Probabilistic Deep State-Space Models
Structural Self-Supervised Objectives for Transformers
On Sparse Grid Interpolation for American Option Pricing with Multiple Underlying Assets
Optimal Mobility and Communication Strategy to Maximize the Value of Information in IoT Networks
Uncertainty-bounded Active Monitoring of Unknown Dynamic Targets in Road-networks with Minimum Fleet
An Efficient Wide-Range Pseudo-3D Vehicle Detection Using A Single Camera
Efficient Graphics Representation with Differentiable Indirection
Bayes-Optimal Estimation in Generalized Linear Models via Spatial Coupling
Deformable Neural Radiance Fields using RGB and Event Cameras
A posteriori error control for fourth-order semilinear problems with quadratic nonlinearity
SilverRetriever: Advancing Neural Passage Retrieval for Polish Question Answering
VulnSense: Efficient Vulnerability Detection in Ethereum Smart Contracts by Multimodal Learning with Graph Neural Network and Language Model
Doeblin Coefficients and Related Measures
A Spiking Binary Neuron -- Detector of Causal Links
P-ROCKET: Pruning Random Convolution Kernels for Time Series Classification
SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels
Breathing New Life into 3D Assets with Generative Repainting
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
Quadcopter Trajectory Time Minimization and Robust Collision Avoidance via Optimal Time Allocation
Efficient and robust Sensor Placement in Complex Environments
Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization
HINT: Healthy Influential-Noise based Training to Defend against Data Poisoning Attacks
Augmenting conformers with structured state space models for online speech recognition
A Bayesian Approach to Robust Inverse Reinforcement Learning
Lamination-based efficient treatment of weak discontinuities for non-conforming finite element meshes
ICLEF: In-Context Learning with Expert Feedback for Explainable Style Transfer
Phase field method for quasi-static hydro-fracture in porous media under stress boundary condition considering the effect of initial stress field
Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes
Neural Machine Translation Models Can Learn to be Few-shot Learners
Robust e-NeRF: NeRF from Sparse & Noisy Events under Non-Uniform Motion
Keyword: faster
Traveling Waves Encode the Recent Past and Enhance Sequence Learning
MPCGPU: Real-Time Nonlinear Model Predictive Control through Preconditioned Conjugate Gradient on the GPU
Greedy Optimization of Resistance-based Graph Robustness with Global and Local Edge Insertions
Constraint-Free Structure Learning with Smooth Acyclic Orientations
Keyword: mobile
Efficiently Identifying Hotspots in a Spatially Varying Field with Multiple Robots
AnyOKP: One-Shot and Instance-Aware Object Keypoint Extraction with Pretrained ViT
A Testbed for Automating and Analysing Mobile Devices and their Applications
AdSEE: Investigating the Impact of Image Style Editing on Advertisement Attractiveness
TF-SepNet: An Efficient 1D Kernel Design in CNNs for Low-Complexity Acoustic Scene Classification
PRIEST: Projection Guided Sampling-Based Optimization For Autonomous Navigation
Human-Inspired Topological Representations for Visual Object Recognition in Unseen Environments
Optimal Mobility and Communication Strategy to Maximize the Value of Information in IoT Networks
Object-oriented mapping in dynamic environments
Keyword: pruning
Unveiling Invariances via Neural Network Pruning
P-ROCKET: Pruning Random Convolution Kernels for Time Series Classification
Keyword: diffusion
Generative AI
Text-to-Image Models for Counterfactual Explanations: a Black-Box Approach
Viewpoint Textual Inversion: Unleashing Novel View Synthesis with Pretrained 2D Diffusion Models
Detail Reinforcement Diffusion Model: Augmentation Fine-Grained Visual Categorization in Few-Shot Conditions
Cartoondiff: Training-free Cartoon Image Generation with Diffusion Transformer Models
Unsupervised Disentangling of Facial Representations with 3D-aware Latent Diffusion Models
A Finite-Volume Scheme for Fractional Diffusion on Bounded Domains
Large Intestine 3D Shape Refinement Using Point Diffusion Models for Digital Phantom Generation
Breathing New Life into 3D Assets with Generative Repainting
Denoising Diffusion Probabilistic Models for Hardware-Impaired Communications
Compositional Foundation Models for Hierarchical Planning
Keyword: adaptive
Efficiently Identifying Hotspots in a Spatially Varying Field with Multiple Robots
MetaF2N: Blind Image Super-Resolution by Learning Efficient Model Adaptation from Faces
Adaptive finite element approximation of sparse optimal control with integral fractional Laplacian
Uncertainty-Aware Multi-View Visual Semantic Embedding
Astrocyte-Integrated Dynamic Function Exchange in Spiking Neural Networks
A New Adaptive Phase-locked Loop for Synchronization of a Grid-Connected Voltage Source Converter: Simulation and Experimental Results
Two-fingered Hand with Gear-type Synchronization Mechanism with Magnet for Improved Small and Offset Objects Grasping: F2 Hand
Adaptive Priority Reweighing for Generalizing Fairness Improvement
Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization
Keyword: quantization
A Precision-Scalable RISC-V DNN Processor with On-Device Learning Capability at the Extreme Edge
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens
Denoising Diffusion Probabilistic Models for Hardware-Impaired Communications