Abstract
Text-to-image diffusion models have demonstrated remarkable capabilities in transforming textual prompts into coherent images, yet the computational cost of their inference remains a persistent challenge. To address this issue, we present UFOGen, a novel generative model designed for ultra-fast, one-step text-to-image synthesis. In contrast to conventional approaches that focus on improving samplers or employing distillation techniques for diffusion models, UFOGen adopts a hybrid methodology, integrating diffusion models with a GAN objective. Leveraging a newly introduced diffusion-GAN objective and initialization with pre-trained diffusion models, UFOGen excels in efficiently generating high-quality images conditioned on textual descriptions in a single step. Beyond traditional text-to-image generation, UFOGen showcases versatility in applications. Notably, UFOGen stands among the pioneering models enabling one-step text-to-image generation and diverse downstream tasks, presenting a significant advancement in the landscape of efficient generative models. \blfootnote{*Work done as a student researcher of Google, $\dagger$ indicates equal contribution.
Auto-ICL: In-Context Learning without Human Supervision
Authors: Jinghan Yang, Shuming Ma, Furu Wei
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract
In the era of Large Language Models (LLMs), human-computer interaction has evolved towards natural language, offering unprecedented flexibility. Despite this, LLMs are heavily reliant on well-structured prompts to function efficiently within the realm of In-Context Learning. Vanilla In-Context Learning relies on human-provided contexts, such as labeled examples, explicit instructions, or other guiding mechanisms that shape the model's outputs. To address this challenge, our study presents a universal framework named Automatic In-Context Learning. Upon receiving a user's request, we ask the model to independently generate examples, including labels, instructions, or reasoning pathways. The model then leverages this self-produced context to tackle the given problem. Our approach is universally adaptable and can be implemented in any setting where vanilla In-Context Learning is applicable. We demonstrate that our method yields strong performance across a range of tasks, standing up well when compared to existing methods.
Adversarially Robust Spiking Neural Networks Through Conversion
Abstract
Spiking neural networks (SNNs) provide an energy-efficient alternative to a variety of artificial neural network (ANN) based AI applications. As the progress in neuromorphic computing with SNNs expands their use in applications, the problem of adversarial robustness of SNNs becomes more pronounced. To the contrary of the widely explored end-to-end adversarial training based solutions, we address the limited progress in scalable robust SNN training methods by proposing an adversarially robust ANN-to-SNN conversion algorithm. Our method provides an efficient approach to embrace various computationally demanding robust learning objectives that have been proposed for ANNs. During a post-conversion robust finetuning phase, our method adversarially optimizes both layer-wise firing thresholds and synaptic connectivity weights of the SNN to maintain transferred robustness gains from the pre-trained ANN. We perform experimental evaluations in numerous adaptive adversarial settings that account for the spike-based operation dynamics of SNNs, and show that our approach yields a scalable state-of-the-art solution for adversarially robust deep SNNs with low-latency.
Linear time Evidence Accumulation Clustering with KMeans
Abstract
Among ensemble clustering methods, Evidence Accumulation Clustering is one of the simplest technics. In this approach, a co-association (CA) matrix representing the co-clustering frequency is built and then clustered to extract consensus clusters. Compared to other approaches, this one is simple as there is no need to find matches between clusters obtained from two different partitionings. Nevertheless, this method suffers from computational issues, as it requires to compute and store a matrix of size n x n, where n is the number of items. Due to the quadratic cost, this approach is reserved for small datasets. This work describes a trick which mimic the behavior of average linkage clustering. We found a way of computing efficiently the density of a partitioning, reducing the cost from a quadratic to linear complexity. Additionally, we proved that the k-means maximizes naturally the density. We performed experiments on several benchmark datasets where we compared the k-means and the bisecting version to other state-of-the-art consensus algorithms. The k-means results are comparable to the best state of the art in terms of NMI while keeping the computational cost low. Additionally, the k-means led to the best results in terms of density. These results provide evidence that consensus clustering can be solved with simple algorithms.
Lighter, yet More Faithful: Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization
Authors: George Chrysostomou, Zhixue Zhao, Miles Williams, Nikolaos Aletras
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
Despite their remarkable performance on abstractive summarization, large language models (LLMs) face two significant challenges: their considerable size and tendency to hallucinate. Hallucinations are concerning because they erode the reliability of LLMs and raise safety issues. Pruning is a technique that reduces model size by removing redundant weights to create sparse models that enable more efficient inference. Pruned models yield comparable performance to their counterpart full-sized models, making them ideal alternatives when operating on a limited budget. However, the effect that pruning has upon hallucinations in abstractive summarization with LLMs has yet to be explored. In this paper, we provide an extensive empirical study on the hallucinations produced by pruned models across three standard summarization tasks, two pruning approaches, three instruction-tuned LLMs, and three hallucination evaluation metrics. Surprisingly, we find that pruned LLMs hallucinate less compared to their full-sized counterparts. Our follow-up analysis suggests that pruned models tend to depend more on the source input and less on their parametric knowledge from pre-training for generation. This greater dependency on the source input leads to a higher lexical overlap between generated content and the source input, which can be a reason for the reduction in hallucinations.
Language and Task Arithmetic with Parameter-Efficient Layers for Zero-Shot Summarization
Authors: Alexandra Chronopoulou, Jonas Pfeiffer, Joshua Maynez, Xinyi Wang, Sebastian Ruder, Priyanka Agrawal
Abstract
Parameter-efficient fine-tuning (PEFT) using labeled task data can significantly improve the performance of large language models (LLMs) on the downstream task. However, there are 7000 languages in the world and many of these languages lack labeled data for real-world language generation tasks. In this paper, we propose to improve zero-shot cross-lingual transfer by composing language or task specialized parameters. Our method composes language and task PEFT modules via element-wise arithmetic operations to leverage unlabeled data and English labeled data. We extend our approach to cases where labeled data from more languages is available and propose to arithmetically compose PEFT modules trained on languages related to the target. Empirical results on summarization demonstrate that our method is an effective strategy that obtains consistent gains using minimal training of PEFT modules.
DISTA: Denoising Spiking Transformer with intrinsic plasticity and spatiotemporal attention
Authors: Boxun Xu, Hejia Geng, Yuxuan Yin, Peng Li
Subjects: Neural and Evolutionary Computing (cs.NE)
Abstract
Among the array of neural network architectures, the Vision Transformer (ViT) stands out as a prominent choice, acclaimed for its exceptional expressiveness and consistent high performance in various vision applications. Recently, the emerging Spiking ViT approach has endeavored to harness spiking neurons, paving the way for a more brain-inspired transformer architecture that thrives in ultra-low power operations on dedicated neuromorphic hardware. Nevertheless, this approach remains confined to spatial self-attention and doesn't fully unlock the potential of spiking neural networks. We introduce DISTA, a Denoising Spiking Transformer with Intrinsic Plasticity and SpatioTemporal Attention, designed to maximize the spatiotemporal computational prowess of spiking neurons, particularly for vision applications. DISTA explores two types of spatiotemporal attentions: intrinsic neuron-level attention and network-level attention with explicit memory. Additionally, DISTA incorporates an efficient nonlinear denoising mechanism to quell the noise inherent in computed spatiotemporal attention maps, thereby resulting in further performance gains. Our DISTA transformer undergoes joint training involving synaptic plasticity (i.e., weight tuning) and intrinsic plasticity (i.e., membrane time constant tuning) and delivers state-of-the-art performances across several static image and dynamic neuromorphic datasets. With only 6 time steps, DISTA achieves remarkable top-1 accuracy on CIFAR10 (96.26%) and CIFAR100 (79.15%), as well as 79.1% on CIFAR10-DVS using 10 time steps.
A Characteristic Mapping Method for Vlasov-Poisson with Extreme Resolution Properties
Authors: Philipp Krah, Xi-Yuan Yin, Julius Bergmann, Jean-Christophe Nave, Kai Schneider
Abstract
We propose an efficient semi-Lagrangian characteristic mapping method for solving the one+one-dimensional Vlasov-Poisson equations with high precision on a coarse grid. The flow map is evolved numerically and exponential resolution in linear time is obtained. Global third-order convergence in space and time is shown and conservation properties are assessed. For benchmarking, we consider linear and nonlinear Landau damping and the two-stream instability. We compare the results with a Fourier pseudo-spectral method. The extreme fine-scale resolution features are illustrated showing the method's capabilities to efficiently treat filamentation in fusion plasma simulations.
DeepMartNet -- A Martingale Based Deep Neural Network Learning Method for Dirichlet BVP and Eigenvalue Problems of Elliptic PDEs
Abstract
In this paper, we propose DeepMartNet - a Martingale based deep neural network learning method for solving Dirichlet boundary value problems (BVPs) and eigenvalue problems for elliptic partial differential equations (PDEs) in high dimensions. The method is based on Varadhan's Martingale problem formulation for the BVP/eigenvalue problems where a loss function enforcing the Martingale property for the PDE solution is used for efficient optimization by sampling the stochastic processes associated with elliptic operators. High dimensional numerical results for BVPs of the Poisson-Boltzmann equation and eigenvalue problems of a Fokker-Planck equation demonstrate the capability of the proposed DeepMartNet learning method for solving high dimensional PDE problems.
Near-Optimal Streaming Ellipsoidal Rounding for General Convex Polytopes
Authors: Yury Makarychev, Naren Sarayu Manoj, Max Ovsiankin
Subjects: Data Structures and Algorithms (cs.DS); Computational Geometry (cs.CG)
Abstract
We give near-optimal algorithms for computing an ellipsoidal rounding of a convex polytope whose vertices are given in a stream. The approximation factor is linear in the dimension (as in John's theorem) and only loses an excess logarithmic factor in the aspect ratio of the polytope. Our algorithms are nearly optimal in two senses: first, their runtimes nearly match those of the most efficient known algorithms for the offline version of the problem. Second, their approximation factors nearly match a lower bound we show against a natural class of geometric streaming algorithms. In contrast to existing works in the streaming setting that compute ellipsoidal roundings only for centrally symmetric convex polytopes, our algorithms apply to general convex polytopes. We also show how to use our algorithms to construct coresets from a stream of points that approximately preserve both the ellipsoidal rounding and the convex hull of the original set of points.
A new stabilized time-spectral finite element solver for fast simulation of blood flow
Abstract
The increasing application of cardiorespiratory simulations for diagnosis and surgical planning necessitates the development of computational methods significantly faster than the current technology. To achieve this objective, we leverage the time-periodic nature of these flows by discretizing equations in the frequency domain instead of the time domain. This approach markedly reduces the size of the discrete problem and, consequently, the simulation cost. With this motivation, we introduce a finite element method for simulating time-periodic flows that are physically stable. The proposed time-spectral method is formulated by augmenting the baseline Galerkin's method with a least-squares penalty term. This penalty term is weighted by a positive-definite stabilization tensor, computed by solving an eigenvalue problem that involves the contraction of the velocity convolution matrix with the element metric tensor. The outcome is a formally stable residual-based method that emulates the standard time method when simulating steady flows. Consequently, it preserves the appealing properties of the standard method, including stability in strong convection and the convenient use of equal-order interpolation functions for velocity and pressure, among other benefits. This method is tested on a patient-specific Fontan model at nominal Reynolds and Womersley numbers of 500 and 10, respectively, demonstrating its ability to replicate conventional time simulation results using as few as 7 modes at 11\% of the computational cost. Owing to its higher local-to-processor computation density, the proposed method also exhibits improved parallel scalability, thereby enabling efficient utilization of computational resources for the rapid simulation of time-critical applications.
Center Focusing Network for Real-Time LiDAR Panoptic Segmentation
Abstract
LiDAR panoptic segmentation facilitates an autonomous vehicle to comprehensively understand the surrounding objects and scenes and is required to run in real time. The recent proposal-free methods accelerate the algorithm, but their effectiveness and efficiency are still limited owing to the difficulty of modeling non-existent instance centers and the costly center-based clustering modules. To achieve accurate and real-time LiDAR panoptic segmentation, a novel center focusing network (CFNet) is introduced. Specifically, the center focusing feature encoding (CFFE) is proposed to explicitly understand the relationships between the original LiDAR points and virtual instance centers by shifting the LiDAR points and filling in the center points. Moreover, to leverage the redundantly detected centers, a fast center deduplication module (CDM) is proposed to select only one center for each instance. Experiments on the SemanticKITTI and nuScenes panoptic segmentation benchmarks demonstrate that our CFNet outperforms all existing methods by a large margin and is 1.6 times faster than the most efficient method. The code is available at https://github.com/GangZhang842/CFNet.
Abstract
Task-oriented dialogue (ToD) systems help users execute well-defined tasks across a variety of domains (e.g., $\textit{flight booking}$ or $\textit{food ordering}$), with their Natural Language Understanding (NLU) components being dedicated to the analysis of user utterances, predicting users' intents ($\textit{Intent Detection}$, ID) and extracting values for informational slots ($\textit{Value Extraction}$, VE). In most domains, labelled NLU data is scarce, making sample-efficient learning -- enabled with effective transfer paradigms -- paramount. In this work, we introduce SQATIN, a new framework for dialog NLU based on (i) instruction tuning and (ii) question-answering-based formulation of ID and VE tasks. According to the evaluation on established NLU benchmarks, SQATIN sets the new state of the art in dialogue NLU, substantially surpassing the performance of current models based on standard fine-tuning objectives in both in-domain training and cross-domain transfer. SQATIN yields particularly large performance gains in cross-domain transfer, owing to the fact that our QA-based instruction tuning leverages similarities between natural language descriptions of classes (i.e., slots and intents) across domains.
LightEMU: Hardware Assisted Fuzzing of Trusted Applications
Abstract
Trusted Execution Environments (TEEs) are deployed in many CPU designs because of the confidentiality and integrity guarantees they provide. ARM TrustZone is a TEE extensively deployed on smart phones, IoT devices, and notebooks. Specifically, TrustZone is used to separate code execution and data into two worlds, normal world and secure world. However, this separation inherently prevents traditional fuzzing approaches which rely upon coverage-guided feedback and existing fuzzing research is, therefore, extremely limited. In this paper, we present a native and generic method to perform efficient and scalable feedback-driven fuzzing on Trusted Applications (TAs) using ARM CoreSight. We propose LightEMU, a novel fuzzing framework that allows us to fuzz TAs by decoupling them from relied TEE. We argue that LightEMU is a promising first-stage approach for rapidly discovering TA vulnerabilities prior to investing effort in whole system TEE evaluation precisely because the majority of publicly disclosed TrustZone bugs reside in the TA code itself. We implement LightEMU and adapt it to Teegris, Trusty, OP-TEE and QSEE and evaluate 8 real-world TAs while triggering 3 unique crashes and achieving x10 time speedup when fuzzing TAs using the state-of-the-art TrustZone fuzzing framework.
Scaling User Modeling: Large-scale Online User Representations for Ads Personalization in Meta
Authors: Wei Zhang, Dai Li, Chen Liang, Fang Zhou, Zhongke Zhang, Xuewei Wang, Ru Li, Yi Zhou, Yaning Huang, Dong Liang, Kai Wang, Zhangyuan Wang, Zhengxing Chen, Min Li, Fenggang Wu, Minghai Chen, Huayu Li, Yunnan Wu, Zhan Shu, Mindi Yuan, Sri Reddy
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
Effective user representations are pivotal in personalized advertising. However, stringent constraints on training throughput, serving latency, and memory, often limit the complexity and input feature set of online ads ranking models. This challenge is magnified in extensive systems like Meta's, which encompass hundreds of models with diverse specifications, rendering the tailoring of user representation learning for each model impractical. To address these challenges, we present Scaling User Modeling (SUM), a framework widely deployed in Meta's ads ranking system, designed to facilitate efficient and scalable sharing of online user representation across hundreds of ads models. SUM leverages a few designated upstream user models to synthesize user embeddings from massive amounts of user features with advanced modeling techniques. These embeddings then serve as inputs to downstream online ads ranking models, promoting efficient representation sharing. To adapt to the dynamic nature of user features and ensure embedding freshness, we designed SUM Online Asynchronous Platform (SOAP), a latency free online serving system complemented with model freshness and embedding stabilization, which enables frequent user model updates and online inference of user embeddings upon each user request. We share our hands-on deployment experiences for the SUM framework and validate its superiority through comprehensive experiments. To date, SUM has been launched to hundreds of ads ranking models in Meta, processing hundreds of billions of user requests daily, yielding significant online metric gains and infrastructure cost savings.
Accelerating material discovery with a threshold-driven hybrid acquisition policy-based Bayesian optimization
Authors: Ahmed Shoyeb Raihan, Hamed Khosravi, Srinjoy Das, Imtiaz Ahmed
Abstract
Advancements in materials play a crucial role in technological progress. However, the process of discovering and developing materials with desired properties is often impeded by substantial experimental costs, extensive resource utilization, and lengthy development periods. To address these challenges, modern approaches often employ machine learning (ML) techniques such as Bayesian Optimization (BO), which streamline the search for optimal materials by iteratively selecting experiments that are most likely to yield beneficial results. However, traditional BO methods, while beneficial, often struggle with balancing the trade-off between exploration and exploitation, leading to sub-optimal performance in material discovery processes. This paper introduces a novel Threshold-Driven UCB-EI Bayesian Optimization (TDUE-BO) method, which dynamically integrates the strengths of Upper Confidence Bound (UCB) and Expected Improvement (EI) acquisition functions to optimize the material discovery process. Unlike the classical BO, our method focuses on efficiently navigating the high-dimensional material design space (MDS). TDUE-BO begins with an exploration-focused UCB approach, ensuring a comprehensive initial sweep of the MDS. As the model gains confidence, indicated by reduced uncertainty, it transitions to the more exploitative EI method, focusing on promising areas identified earlier. The UCB-to-EI switching policy dictated guided through continuous monitoring of the model uncertainty during each step of sequential sampling results in navigating through the MDS more efficiently while ensuring rapid convergence. The effectiveness of TDUE-BO is demonstrated through its application on three different material datasets, showing significantly better approximation and optimization performance over the EI and UCB-based BO methods in terms of the RMSE scores and convergence efficiency, respectively.
Comprehensive Evaluation and Insights into the Use of Deep Neural Networks to Detect and Quantify Lymphoma Lesions in PET/CT Images
Authors: Shadab Ahamed, Yixi Xu, Claire Gowdy, Joo H. O, Ingrid Bloise, Don Wilson, Patrick Martineau, François Bénard, Fereshteh Yousefirizi, Rahul Dodhia, Juan M. Lavista, William B. Weeks, Carlos F. Uribe, Arman Rahmim
Abstract
This study performs comprehensive evaluation of four neural network architectures (UNet, SegResNet, DynUNet, and SwinUNETR) for lymphoma lesion segmentation from PET/CT images. These networks were trained, validated, and tested on a diverse, multi-institutional dataset of 611 cases. Internal testing (88 cases; total metabolic tumor volume (TMTV) range [0.52, 2300] ml) showed SegResNet as the top performer with a median Dice similarity coefficient (DSC) of 0.76 and median false positive volume (FPV) of 4.55 ml; all networks had a median false negative volume (FNV) of 0 ml. On the unseen external test set (145 cases with TMTV range: [0.10, 2480] ml), SegResNet achieved the best median DSC of 0.68 and FPV of 21.46 ml, while UNet had the best FNV of 0.41 ml. We assessed reproducibility of six lesion measures, calculated their prediction errors, and examined DSC performance in relation to these lesion measures, offering insights into segmentation accuracy and clinical relevance. Additionally, we introduced three lesion detection criteria, addressing the clinical need for identifying lesions, counting them, and segmenting based on metabolic characteristics. We also performed expert intra-observer variability analysis revealing the challenges in segmenting easy'' vs.hard'' cases, to assist in the development of more resilient segmentation algorithms. Finally, we performed inter-observer agreement assessment underscoring the importance of a standardized ground truth segmentation protocol involving multiple expert annotators. Code is available at: https://github.com/microsoft/lymphoma-segmentation-dnn
Reconstructing Continuous Light Field From Single Coded Image
Abstract
We propose a method for reconstructing a continuous light field of a target scene from a single observed image. Our method takes the best of two worlds: joint aperture-exposure coding for compressive light-field acquisition, and a neural radiance field (NeRF) for view synthesis. Joint aperture-exposure coding implemented in a camera enables effective embedding of 3-D scene information into an observed image, but in previous works, it was used only for reconstructing discretized light-field views. NeRF-based neural rendering enables high quality view synthesis of a 3-D scene from continuous viewpoints, but when only a single image is given as the input, it struggles to achieve satisfactory quality. Our method integrates these two techniques into an efficient and end-to-end trainable pipeline. Trained on a wide variety of scenes, our method can reconstruct continuous light fields accurately and efficiently without any test time optimization. To our knowledge, this is the first work to bridge two worlds: camera design for efficiently acquiring 3-D information and neural rendering.
Zenkai -- Framework For Exploring Beyond Backpropagation
Abstract
Zenkai is an open-source framework designed to give researchers more control and flexibility over building and training deep learning machines. It does this by dividing the deep learning machine into layers of semi-autonomous learning machines with their own target and learning algorithm. This is to allow researchers greater exploration such as the use of non-differentiable layers or learning algorithms beyond those based on error backpropagation. Backpropagation Rumelhart et al. [1986] has powered deep learning to become one of the most exciting fields of the 21st century. As a result, a large number of software tools have been developed to support efficient implementation and training of neural networks through the use of backpropa- gation. While these have been critical to the success of deep learning, building frameworks around backpropagation can make it challenging to implement solutions that do not adhere to it. Zenkai aims to make it easier to get around these limitations and help researchers more easily explore new frontiers in deep learning that do not strictly adhere to the backpropagation framework.
MacGyver: Are Large Language Models Creative Problem Solvers?
Authors: Yufei Tian, Abhilasha Ravichander, Lianhui Qin, Ronan Le Bras, Raja Marjieh, Nanyun Peng, Yejin Choi, Thomas L. Griffiths, Faeze Brahman
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
We explore the creative problem-solving capabilities of modern large language models (LLMs) in a constrained setting. The setting requires circumventing a cognitive bias known in psychology as ''functional fixedness'' to use familiar objects in innovative or unconventional ways. To this end, we create MacGyver, an automatically generated dataset consisting of 1,600 real-world problems that deliberately trigger functional fixedness and require thinking 'out-of-the-box'. We then present our collection of problems to both LLMs and humans to compare and contrast their problem-solving abilities. We show that MacGyver is challenging for both groups, but in unique and complementary ways. For example, humans typically excel in solving problems that they are familiar with but may struggle with tasks requiring domain-specific knowledge, leading to a higher variance. On the other hand, LLMs, being exposed to a variety of highly specialized knowledge, attempt broader problems but are prone to overconfidence and propose actions that are physically infeasible or inefficient. We also provide a detailed error analysis of LLMs, and demonstrate the potential of enhancing their problem-solving ability with novel prompting techniques such as iterative step-wise reflection and divergent-convergent thinking. This work provides insight into the creative problem-solving capabilities of humans and AI and illustrates how psychological paradigms can be extended into large-scale tasks for comparing humans and machines.
CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs
Abstract
Deep Neural Networks (DNNs) have shown excellent performance in a wide range of machine learning applications. Knowing the latency of running a DNN model or tensor program on a specific device is useful in various tasks, such as DNN graph- or tensor-level optimization and device selection. Considering the large space of DNN models and devices that impede direct profiling of all combinations, recent efforts focus on building a predictor to model the performance of DNN models on different devices. However, none of the existing attempts have achieved a cost model that can accurately predict the performance of various tensor programs while supporting both training and inference accelerators. We propose CDMPP, an efficient tensor program latency prediction framework for both cross-model and cross-device prediction. We design an informative but efficient representation of tensor programs, called compact ASTs, and a pre-order-based positional encoding method, to capture the internal structure of tensor programs. We develop a domain-adaption-inspired method to learn domain-invariant representations and devise a KMeans-based sampling algorithm, for the predictor to learn from different domains (i.e., different DNN operators and devices). Our extensive experiments on a diverse range of DNN models and devices demonstrate that CDMPP significantly outperforms state-of-the-art baselines with 14.03% and 10.85% prediction error for cross-model and cross-device prediction, respectively, and one order of magnitude higher training efficiency. The implementation and the expanded dataset are available at https://github.com/joapolarbear/cdmpp.
Outcome-supervised Verifiers for Planning in Mathematical Reasoning
Authors: Fei Yu, Anningzhe Gao, Benyou Wang
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract
Large language models (LLMs) often struggle with maintaining accuracy across a sequence of intermediate reasoning steps in mathematical reasoning, leading to error propagation that undermines the final result. The current methodology to mitigate this issue primarily involves using a verifier model to assess the correctness of generated solution candidates, focusing either on the overall reasoning path or on an incomplete reasoning path. By rethinking this approach, we argue that assessing potentials of incomplete reasoning paths could be more advantageous as it guides towards correct final answers, transforming the task into a \textit{planning} problem. Our proposed verifier, the Outcome-supervision Value Model (OVM), employs outcome supervision for training, offering an efficient and intuitive method for \textit{planning} by prioritizing steps that lead to accurate conclusions over mere per-step correctness. Furthermore, the OVM eschews the need for labor-intensive annotations on step-level correctness, enhancing its scalability. Our experiments on two multi-step mathematical reasoning datasets, GSM8K and Game of 24, demonstrate the superior performance of the OVM model. Notably, in GSM8K, our \textbf{OVM-7B model achieves state-of-the-art results among LLMs up to 13B parameters}; especially it does not utilize GPT-4 or code execution. These findings offer a novel perspective on the role of outcome supervision in training verifiers for multi-step reasoning tasks and provide theoretical justification for its advantage in value estimation for planning.
Redefining the Laparoscopic Spatial Sense: AI-based Intra- and Postoperative Measurement from Stereoimages
Authors: Leopold Müller, Patrick Hemmer, Moritz Queisner, Igor Sauer, Simeon Allmendinger, Johannes Jakubik, Michael Vössing, Niklas Kühl
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
A significant challenge in image-guided surgery is the accurate measurement task of relevant structures such as vessel segments, resection margins, or bowel lengths. While this task is an essential component of many surgeries, it involves substantial human effort and is prone to inaccuracies. In this paper, we develop a novel human-AI-based method for laparoscopic measurements utilizing stereo vision that has been guided by practicing surgeons. Based on a holistic qualitative requirements analysis, this work proposes a comprehensive measurement method, which comprises state-of-the-art machine learning architectures, such as RAFT-Stereo and YOLOv8. The developed method is assessed in various realistic experimental evaluation environments. Our results outline the potential of our method achieving high accuracies in distance measurements with errors below 1 mm. Furthermore, on-surface measurements demonstrate robustness when applied in challenging environments with textureless regions. Overall, by addressing the inherent challenges of image-guided surgery, we lay the foundation for a more robust and accurate solution for intra- and postoperative measurements, enabling more precise, safe, and efficient surgical procedures.
DIFFNAT: Improving Diffusion Image Quality Using Natural Image Statistics
Abstract
Diffusion models have advanced generative AI significantly in terms of editing and creating naturalistic images. However, efficiently improving generated image quality is still of paramount interest. In this context, we propose a generic "naturalness" preserving loss function, viz., kurtosis concentration (KC) loss, which can be readily applied to any standard diffusion model pipeline to elevate the image quality. Our motivation stems from the projected kurtosis concentration property of natural images, which states that natural images have nearly constant kurtosis values across different band-pass versions of the image. To retain the "naturalness" of the generated images, we enforce reducing the gap between the highest and lowest kurtosis values across the band-pass versions (e.g., Discrete Wavelet Transform (DWT)) of images. Note that our approach does not require any additional guidance like classifier or classifier-free guidance to improve the image quality. We validate the proposed approach for three diverse tasks, viz., (1) personalized few-shot finetuning using text guidance, (2) unconditional image generation, and (3) image super-resolution. Integrating the proposed KC loss has improved the perceptual quality across all these tasks in terms of both FID, MUSIQ score, and user evaluation.
How Does Calibration Data Affect the Post-training Pruning and Quantization of Large Language Models?
Abstract
Pruning and quantization form the foundation of model compression for neural networks, enabling efficient inference for large language models (LLMs). Recently, various quantization and pruning techniques have demonstrated state-of-the-art performance in a post-training setting. They rely upon calibration data, a small set of unlabeled examples, to generate layer activations. However, no prior work has systematically investigated how the calibration data impacts the effectiveness of model compression methods. In this paper, we present the first extensive empirical study on the effect of calibration data upon LLM performance. We trial a variety of pruning and quantization methods, tasks, models, and datasets. Surprisingly, we find substantial variations in downstream task performance, contrasting existing work that suggests a greater level of robustness to the calibration data. Finally, we make a series of recommendations for the effective use of calibration data in LLM quantization and pruning.
Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in Dense Encoders
Authors: Hyunji Lee, Luca Soldaini, Arman Cohan, Minjoon Seo, Kyle Lo
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Abstract
Prevailing research practice today often relies on training dense retrievers on existing large datasets such as MSMARCO and then experimenting with ways to improve zero-shot generalization capabilities to unseen domains. While prior work has tackled this challenge through resource-intensive steps such as data augmentation, architectural modifications, increasing model size, or even further base model pretraining, comparatively little investigation has examined whether the training procedures themselves can be improved to yield better generalization capabilities in the resulting models. In this work, we recommend a simple recipe for training dense encoders: Train on MSMARCO with parameter-efficient methods, such as LoRA, and opt for using in-batch negatives unless given well-constructed hard negatives. We validate these recommendations using the BEIR benchmark and find results are persistent across choice of dense encoder and base model size and are complementary to other resource-intensive strategies for out-of-domain generalization such as architectural modifications or additional pretraining. We hope that this thorough and impartial study around various training techniques, which augments other resource-intensive methods, offers practical insights for developing a dense retrieval model that effectively generalizes, even when trained on a single dataset.
To be or not to be? an exploration of continuously controllable prompt engineering
Abstract
As the use of large language models becomes more widespread, techniques like parameter-efficient fine-tuning and other methods for controlled generation are gaining traction for customizing models and managing their outputs. However, the challenge of precisely controlling how prompts influence these models is an area ripe for further investigation. In response, we introduce ControlPE (Continuously Controllable Prompt Engineering). ControlPE enables finer adjustments to prompt effects, complementing existing prompt engineering, and effectively controls continuous targets. This approach harnesses the power of LoRA (Low-Rank Adaptation) to create an effect akin to prompt weighting, enabling fine-tuned adjustments to the impact of prompts. Our methodology involves generating specialized datasets for prompt distillation, incorporating these prompts into the LoRA model, and carefully adjusting LoRA merging weight to regulate the influence of prompts. This provides a dynamic and adaptable tool for prompt control. Through our experiments, we have validated the practicality and efficacy of ControlPE. It proves to be a promising solution for control a variety of prompts, ranging from generating short responses prompts, refusal prompts to chain-of-thought prompts.
MEGA: A Memory-Efficient GNN Accelerator Exploiting Degree-Aware Mixed-Precision Quantization
Abstract
Graph Neural Networks (GNNs) are becoming a promising technique in various domains due to their excellent capabilities in modeling non-Euclidean data. Although a spectrum of accelerators has been proposed to accelerate the inference of GNNs, our analysis demonstrates that the latency and energy consumption induced by DRAM access still significantly impedes the improvement of performance and energy efficiency. To address this issue, we propose a Memory-Efficient GNN Accelerator (MEGA) through algorithm and hardware co-design in this work. Specifically, at the algorithm level, through an in-depth analysis of the node property, we observe that the data-independent quantization in previous works is not optimal in terms of accuracy and memory efficiency. This motivates us to propose the Degree-Aware mixed-precision quantization method, in which a proper bitwidth is learned and allocated to a node according to its in-degree to compress GNNs as much as possible while maintaining accuracy. At the hardware level, we employ a heterogeneous architecture design in which the aggregation and combination phases are implemented separately with different dataflows. In order to boost the performance and energy efficiency, we also present an Adaptive-Package format to alleviate the storage overhead caused by the fine-grained bitwidth and diverse sparsity, and a Condense-Edge scheduling method to enhance the data locality and further alleviate the access irregularity induced by the extremely sparse adjacency matrix in the graph. We implement our MEGA accelerator in a 28nm technology node. Extensive experiments demonstrate that MEGA can achieve an average speedup of 38.3x, 7.1x, 4.0x, 3.6x and 47.6x, 7.2x, 5.4x, 4.5x energy savings over four state-of-the-art GNN accelerators, HyGCN, GCNAX, GROW, and SGCN, respectively, while retaining task accuracy.
Model Checking for Closed-Loop Robot Reactive Planning
Authors: Christopher Chandler (School of Computing Science, University of Glasgow), Bernd Porr (School of Biomedical Engineering, University of Glasgow), Alice Miller (School of Computing Science, University of Glasgow), Giulia Lafratta (School of Engineering, University of Glasgow)
Abstract
In this paper, we show how model checking can be used to create multi-step plans for a differential drive wheeled robot so that it can avoid immediate danger. Using a small, purpose built model checking algorithm in situ we generate plans in real-time in a way that reflects the egocentric reactive response of simple biological agents. Our approach is based on chaining temporary control systems which are spawned to eliminate disturbances in the local environment that disrupt an autonomous agent from its preferred action (or resting state). The method involves a novel discretization of 2D LiDAR data which is sensitive to bounded stochastic variations in the immediate environment. We operationalise multi-step planning using invariant checking by forward depth-first search, using a cul-de-sac scenario as a first test case. Our results demonstrate that model checking can be used to plan efficient trajectories for local obstacle avoidance, improving on the performance of a reactive agent which can only plan one step. We achieve this in near real-time using no pre-computed data. While our method has limitations, we believe our approach shows promise as an avenue for the development of safe, reliable and transparent trajectory planning in the context of autonomous vehicles.
How Far Can We Extract Diverse Perspectives from Large Language Models? Criteria-Based Diversity Prompting!
Authors: Shirley Anugrah Hayati, Minhwa Lee, Dheeraj Rajagopal, Dongyeop Kang
Abstract
Collecting diverse human data on subjective NLP topics is costly and challenging. As Large Language Models (LLMs) have developed human-like capabilities, there is a recent trend in collaborative efforts between humans and LLMs for generating diverse data, offering potential scalable and efficient solutions. However, the extent of LLMs' capability to generate diverse perspectives on subjective topics remains an unexplored question. In this study, we investigate LLMs' capacity for generating diverse perspectives and rationales on subjective topics, such as social norms and argumentative texts. We formulate this problem as diversity extraction in LLMs and propose a criteria-based prompting technique to ground diverse opinions and measure perspective diversity from the generated criteria words. Our results show that measuring semantic diversity through sentence embeddings and distance metrics is not enough to measure perspective diversity. To see how far we can extract diverse perspectives from LLMs, or called diversity coverage, we employ a step-by-step recall prompting for generating more outputs from the model in an iterative manner. As we apply our prompting method to other tasks (hate speech labeling and story continuation), indeed we find that LLMs are able to generate diverse opinions according to the degree of task subjectivity.
EvaSurf: Efficient View-Aware Implicit Textured Surface Reconstruction on Mobile Devices
Abstract
Reconstructing real-world 3D objects has numerous applications in computer vision, such as virtual reality, video games, and animations. Ideally, 3D reconstruction methods should generate high-fidelity results with 3D consistency in real-time. Traditional methods match pixels between images using photo-consistency constraints or learned features, while differentiable rendering methods like Neural Radiance Fields (NeRF) use surface-based representations or differentiable volume rendering to generate high-fidelity scenes. However, these methods require excessive runtime for rendering, making them impractical for daily applications. To address these challenges, we present $\textbf{EvaSurf}$, an $\textbf{E}$fficient $\textbf{V}$iew-$\textbf{A}$ware Implicit Textured $\textbf{Surf}$ace Reconstruction method on Mobile Devices. In our method, we first employ an efficient surface-based model with a multi-view supervision module to ensure accurate mesh creation. To enable high-fidelity rendering, we learn an implicit texture embedded with a set of Gaussian lobes to capture view-dependent information. Furthermore, With the explicit geometry and the implicit texture, we can employ a lightweight neural shader to reduce the expense of computation and further support real-time rendering on common mobile devices. Extensive experiments demonstrate that our method can reconstruct high-quality appearance and accurate mesh on both synthetic and real-world datasets. Moreover, our method can be trained in just 1-2 hours using a single GPU and run on mobile devices at over 40FPS (Frames Per Second), with a final package required for rendering taking up only 40-50 MB.
Abstract
Table-to-Text has been traditionally approached as a linear language to text problem. However, visually represented tables are rich in visual information and serve as a concise, effective form of representing data and its relationships. When using text-based approaches, after the linearization process, this information is either lost or represented in a space inefficient manner. This inefficiency has remained a constant challenge for text-based approaches making them struggle with large tables. In this paper, we demonstrate that image representation of tables are more space-efficient than the typical textual linearizations, and multi-modal approaches are competitive in Table-to-Text tasks. We present PixT3, a multimodal table-to-text model that outperforms the state-of-the-art (SotA) in the ToTTo benchmark in a pure Table-to-Text setting while remaining competitive in controlled Table-to-Text scenarios. It also generalizes better in unseen datasets, outperforming ToTTo SotA in all generation settings. Additionally, we introduce a new intermediate training curriculum to reinforce table structural awareness, leading to improved generation and overall faithfulness of the models.
Runtime Verification of Learning Properties for Reinforcement Learning Algorithms
Authors: Tommaso Mannucci (TNO -- Netherlands Organisation for Applied Scientific Research), Julio de Oliveira Filho (TNO -- Netherlands Organisation for Applied Scientific Research)
Abstract
Reinforcement learning (RL) algorithms interact with their environment in a trial-and-error fashion. Such interactions can be expensive, inefficient, and timely when learning on a physical system rather than in a simulation. This work develops new runtime verification techniques to predict when the learning phase has not met or will not meet qualitative and timely expectations. This paper presents three verification properties concerning the quality and timeliness of learning in RL algorithms. With each property, we propose design steps for monitoring and assessing the properties during the system's operation.
Stacked Intelligent Metasurface-Aided MIMO Transceiver Design
Authors: Jiancheng An, Chau Yuen, Chao Xu, Hongbin Li, Derrick Wing Kwan Ng, Marco Di Renzo, Mérouane Debbah, Lajos Hanzo
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
Next-generation wireless networks are expected to utilize the limited radio frequency (RF) resources more efficiently with the aid of intelligent transceivers. To this end, we propose a promising transceiver architecture relying on stacked intelligent metasurfaces (SIM). An SIM is constructed by stacking an array of programmable metasurface layers, where each layer consists of a massive number of low-cost passive meta-atoms that individually manipulate the electromagnetic (EM) waves. By appropriately configuring the passive meta-atoms, an SIM is capable of accomplishing advanced computation and signal processing tasks, such as multiple-input multiple-output (MIMO) precoding/combining, multi-user interference mitigation, and radar sensing, as the EM wave propagates through the multiple layers of the metasurface, which effectively reduces both the RF-related energy consumption and processing delay. Inspired by this, we provide an overview of the SIM-aided MIMO transceiver design, which encompasses its hardware architecture and its potential benefits over state-of-the-art solutions. Furthermore, we discuss promising application scenarios and identify the open research challenges associated with the design of advanced SIM architectures for next-generation wireless networks. Finally, numerical results are provided for quantifying the benefits of wave-based signal processing in wireless systems.
Semantic-Relay-Aided Text Transmission: Placement Optimization and Bandwidth Allocation
Abstract
Semantic communication has emerged as a promising technology to break the Shannon limit by extracting the meaning of source data and sending relevant semantic information only. However, some mobile devices may have limited computation and storage resources, which renders it difficult to deploy and implement the resource-demanding deep learning based semantic encoder/decoder. To tackle this challenge, we propose in this paper a new semantic relay (SemRelay), which is equipped with a semantic receiver for assisting text transmission from a resource-abundant base station (BS) to a resource-constrained mobile device. Specifically, the SemRelay first decodes the semantic information sent by the BS (with a semantic transmitter) and then forwards it to the user by adopting conventional bit transmission, hence effectively improving the text transmission efficiency. We formulate an optimization problem to maximize the achievable (effective) bit rate by jointly designing the SemRelay placement and bandwidth allocation. Although this problem is non-convex and generally difficult to solve, we propose an efficient penalty-based algorithm to obtain a high-quality suboptimal solution. Numerical results show the close-to-optimal performance of the proposed algorithm as well as significant rate performance gain of the proposed SemRelay over conventional decode-and-forward relay.
Contribution Evaluation in Federated Learning: Examining Current Approaches
Authors: Vasilis Siomos, Jonathan Passerat-Palmbach
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Computer Science and Game Theory (cs.GT)
Abstract
Federated Learning (FL) has seen increasing interest in cases where entities want to collaboratively train models while maintaining privacy and governance over their data. In FL, clients with private and potentially heterogeneous data and compute resources come together to train a common model without raw data ever leaving their locale. Instead, the participants contribute by sharing local model updates, which, naturally, differ in quality. Quantitatively evaluating the worth of these contributions is termed the Contribution Evaluation (CE) problem. We review current CE approaches from the underlying mathematical framework to efficiently calculate a fair value for each client. Furthermore, we benchmark some of the most promising state-of-the-art approaches, along with a new one we introduce, on MNIST and CIFAR-10, to showcase their differences. Designing a fair and efficient CE method, while a small part of the overall FL system design, is tantamount to the mainstream adoption of FL.
Fuel Saving Effect and Performance of Velocity Control for Modern Combustion-Powered Scooters
Abstract
This paper investigates the performance and fuel-saving effect of a velocity control algorithm on modern 50 cc scooters (Euro 5). The European Parliament has adopted major CO$_2$ emission reductions by 2030. But modern combustion- powered scooters are inefficiently restricted and emit unnecessary amounts of CO$_2$. Replacing the original restriction method with the system presented in this paper, the engine's operating point is being improved significantly. Therefore, a Throttle-by-Wire-System senses the rider's throttle command and manipulates the throttle valve. A redundant wheel speed sensor measures the precise vehicle velocity using the magneto-resistive principle. The entire system is managed by a central ECU, executing the actual velocity control, fail-safe functions, power supply and handling inputs/outputs. For velocity control, an adaptive PI-controller has been simulated, virtually tuned and implemented, limiting the max. velocity regulated by legal constraints (45 km/h). In this way, the environmentally harmful restrictors used today can be bypassed. By implementing a human-machine interface, including a virtual dashboard, the system is capable of interfacing with the rider. For evaluation purposes a measurement box has been developed, logging vehicle orientation, system/control variables and engine parameters. A Peugeot Kisbee 50 4T (Euro 5) is serving as test vehicle. Finally, the system has been evaluated regarding performance and fuel efficiency both through simulation and road testing. Fuel savings of 13.6 % in real-world test scenarios were achieved while maintaining vehicle performance.
The Software Genome Project: Venture to the Genomic Pathways of Open Source Software and Its Applications
Authors: Yueming Wu, Chengwei Liu, Yang Liu
Subjects: Software Engineering (cs.SE); Cryptography and Security (cs.CR)
Abstract
With the boom in modern software development, open-source software has become an integral part of various industries, driving progress in computer science. However, the immense complexity and diversity of the open-source ecosystem also pose a series of challenges, including issues of quality, security, management, maintenance, compliance, and sustainability. Existing open-source governance approaches, while excelling in community building and collaboration, still face shortcomings in decentralized management, security, and maintenance. To address these challenges, inspired by the Human Genome Project, we treat the software source code as software DNA and propose the \textbf{Software Genome Project}, which is geared towards the secure monitoring and exploitation of open-source software. By identifying and labeling integrated and classified code features at a fine-grained level, and effectively identifying safeguards for functional implementations and non-functional requirements at different levels of granularity, Software Genome Project builds a complete set of software genome maps to help developers and managers gain a deeper understanding of software complexity and diversity. By dissecting and summarizing functional and undesirable genes, Software Genome Project helps facilitate targeted software remediation and optimization, provides valuable insight and understanding of the entire software ecosystem, and supports critical development tasks such as technology selection and open source governance. This project is expected to drive the evolution of software development towards more efficient, reliable, and sustainable software solutions.
A Physics-Informed Neural Network approach for compartmental epidemiological models
Abstract
Compartmental models provide simple and efficient tools to analyze the relevant transmission processes during an outbreak, to produce short-term forecasts or transmission scenarios, and to assess the impact of vaccination campaigns. However, their calibration is not straightforward, since many factors contribute to the rapid change of the transmission dynamics during an epidemic. For example, there might be changes in the individual awareness, the imposition of non-pharmacological interventions and the emergence of new variants. As a consequence, model parameters such as the transmission rate are doomed to change in time, making their assessment more challenging. Here, we propose to use Physics-Informed Neural Networks (PINNs) to track the temporal changes in the model parameters and provide an estimate of the model state variables. PINNs recently gained attention in many engineering applications thanks to their ability to consider both the information from data (typically uncertain) and the governing equations of the system. The ability of PINNs to identify unknown model parameters makes them particularly suitable to solve ill-posed inverse problems, such as those arising in the application of epidemiological models. Here, we develop a reduced-split approach for the implementation of PINNs to estimate the temporal changes in the state variables and transmission rate of an epidemic based on the SIR model equation and infectious data. The main idea is to split the training first on the epidemiological data, and then on the residual of the system equations. The proposed method is applied to five synthetic test cases and two real scenarios reproducing the first months of the COVID-19 Italian pandemic. Our results show that the split implementation of PINNs outperforms the standard approach in terms of accuracy (up to one order of magnitude) and computational times (speed up of 20%).
Match and Locate: low-frequency monocular odometry based on deep feature matching
Authors: Stepan Konev, Yuriy Biktairov
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Accurate and robust pose estimation plays a crucial role in many robotic systems. Popular algorithms for pose estimation typically rely on high-fidelity and high-frequency signals from various sensors. Inclusion of these sensors makes the system less affordable and much more complicated. In this work we introduce a novel approach for the robotic odometry which only requires a single camera and, importantly, can produce reliable estimates given even extremely low-frequency signal of around one frame per second. The approach is based on matching image features between the consecutive frames of the video stream using deep feature matching models. The resulting coarse estimate is then adjusted by a convolutional neural network, which is also responsible for estimating the scale of the transition, otherwise irretrievable using only the feature matching information. We evaluate the performance of the approach in the AISG-SLA Visual Localisation Challenge and find that while being computationally efficient and easy to implement our method shows competitive results with only around $3^{\circ}$ of orientation estimation error and $2m$ of translation estimation error taking the third place in the challenge.
A characterization of efficiently compilable constraint languages
Authors: Christoph Berkholz, Stefan Mengel, Hermann Wilhelm
Subjects: Logic in Computer Science (cs.LO); Computational Complexity (cs.CC)
Abstract
A central task in knowledge compilation is to compile a CNF-SAT instance into a succinct representation format that allows efficient operations such as testing satisfiability, counting, or enumerating all solutions. Useful representation formats studied in this area range from ordered binary decision diagrams (OBDDs) to circuits in decomposable negation normal form (DNNFs). While it is known that there exist CNF formulas that require exponential size representations, the situation is less well studied for other types of constraints than Boolean disjunctive clauses. The constraint satisfaction problem (CSP) is a powerful framework that generalizes CNF-SAT by allowing arbitrary sets of constraints over any finite domain. The main goal of our work is to understand for which type of constraints (also called the constraint language) it is possible to efficiently compute representations of polynomial size. We answer this question completely and prove two tight characterizations of efficiently compilable constraint languages, depending on whether target format is structured. We first identify the combinatorial property of strong blockwise decomposability'' and show that if a constraint language has this property, we can compute DNNF representations of linear size. For all other constraint languages we construct families of CSP-instances that provably require DNNFs of exponential size. For a subclass ofstrong uniformly blockwise decomposable'' constraint languages we obtain a similar dichotomy for structured DNNFs. In fact, strong (uniform) blockwise decomposability even allows efficient compilation into multi-valued analogs of OBDDs and FBDDs, respectively. Thus, we get complete characterizations for all knowledge compilation classes between O(B)DDs and DNNFs.
Visual Environment Assessment for Safe Autonomous Quadrotor Landing
Authors: Mattia Secchiero, Nishanth Bobbili, Yang Zhou, Giuseppe Loianno
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Autonomous identification and evaluation of safe landing zones are of paramount importance for ensuring the safety and effectiveness of aerial robots in the event of system failures, low battery, or the successful completion of specific tasks. In this paper, we present a novel approach for detection and assessment of potential landing sites for safe quadrotor landing. Our solution efficiently integrates 2D and 3D environmental information, eliminating the need for external aids such as GPS and computationally intensive elevation maps. The proposed pipeline combines semantic data derived from a Neural Network (NN), to extract environmental features, with geometric data obtained from a disparity map, to extract critical geometric attributes such as slope, flatness, and roughness. We define several cost metrics based on these attributes to evaluate safety, stability, and suitability of regions in the environments and identify the most suitable landing area. Our approach runs in real-time on quadrotors equipped with limited computational capabilities. Experimental results conducted in diverse environments demonstrate that the proposed method can effectively assess and identify suitable landing areas, enabling the safe and autonomous landing of a quadrotor.
A Computationally Efficient Sparsified Online Newton Method
Abstract
Second-order methods hold significant promise for enhancing the convergence of deep neural network training; however, their large memory and computational demands have limited their practicality. Thus there is a need for scalable second-order methods that can efficiently train large models. In this paper, we introduce the Sparsified Online Newton (SONew) method, a memory-efficient second-order algorithm that yields a sparsified yet effective preconditioner. The algorithm emerges from a novel use of the LogDet matrix divergence measure; we combine it with sparsity constraints to minimize regret in the online convex optimization framework. Empirically, we test our method on large scale benchmarks of up to 1B parameters. We achieve up to 30% faster convergence, 3.4% relative improvement in validation performance, and 80% relative improvement in training loss, in comparison to memory efficient optimizers including first order methods. Powering the method is a surprising fact -- imposing structured sparsity patterns, like tridiagonal and banded structure, requires little to no overhead, making it as efficient and parallelizable as first-order methods. In wall-clock time, tridiagonal SONew is only about 3% slower per step than first-order methods but gives overall gains due to much faster convergence. In contrast, one of the state-of-the-art (SOTA) memory-intensive second-order methods, Shampoo, is unable to scale to large benchmarks. Additionally, while Shampoo necessitates significant engineering efforts to scale to large benchmarks, SONew offers a more straightforward implementation, increasing its practical appeal. SONew code is available at: https://github.com/devvrit/SONew
JaxMARL: Multi-Agent RL Environments in JAX
Authors: Alexander Rutherford, Benjamin Ellis, Matteo Gallici, Jonathan Cook, Andrei Lupu, Gardar Ingvarsson, Timon Willi, Akbir Khan, Christian Schroeder de Witt, Alexandra Souly, Saptarashmi Bandyopadhyay, Mikayel Samvelyan, Minqi Jiang, Robert Tjarko Lange, Shimon Whiteson, Bruno Lacerda, Nick Hawes, Tim Rocktaschel, Chris Lu, Jakob Nicolaus Foerster
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Abstract
Benchmarks play an important role in the development of machine learning algorithms. For example, research in reinforcement learning (RL) has been heavily influenced by available environments and benchmarks. However, RL environments are traditionally run on the CPU, limiting their scalability with typical academic compute. Recent advancements in JAX have enabled the wider use of hardware acceleration to overcome these computational hurdles, enabling massively parallel RL training pipelines and environments. This is particularly useful for multi-agent reinforcement learning (MARL) research. First of all, multiple agents must be considered at each environment step, adding computational burden, and secondly, the sample complexity is increased due to non-stationarity, decentralised partial observability, or other MARL challenges. In this paper, we present JaxMARL, the first open-source code base that combines ease-of-use with GPU enabled efficiency, and supports a large number of commonly used MARL environments as well as popular baseline algorithms. When considering wall clock time, our experiments show that per-run our JAX-based training pipeline is up to 12500x faster than existing approaches. This enables efficient and thorough evaluations, with the potential to alleviate the evaluation crisis of the field. We also introduce and benchmark SMAX, a vectorised, simplified version of the popular StarCraft Multi-Agent Challenge, which removes the need to run the StarCraft II game engine. This not only enables GPU acceleration, but also provides a more flexible MARL environment, unlocking the potential for self-play, meta-learning, and other future applications in MARL. We provide code at https://github.com/flairox/jaxmarl.
Adaptive Shells for Efficient Neural Radiance Field Rendering
Authors: Zian Wang, Tianchang Shen, Merlin Nimier-David, Nicholas Sharp, Jun Gao, Alexander Keller, Sanja Fidler, Thomas Müller, Zan Gojcic
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Abstract
Neural radiance fields achieve unprecedented quality for novel view synthesis, but their volumetric formulation remains expensive, requiring a huge number of samples to render high-resolution images. Volumetric encodings are essential to represent fuzzy geometry such as foliage and hair, and they are well-suited for stochastic optimization. Yet, many scenes ultimately consist largely of solid surfaces which can be accurately rendered by a single sample per pixel. Based on this insight, we propose a neural radiance formulation that smoothly transitions between volumetric- and surface-based rendering, greatly accelerating rendering speed and even improving visual fidelity. Our method constructs an explicit mesh envelope which spatially bounds a neural volumetric representation. In solid regions, the envelope nearly converges to a surface and can often be rendered with a single sample. To this end, we generalize the NeuS formulation with a learned spatially-varying kernel size which encodes the spread of the density, fitting a wide kernel to volume-like regions and a tight kernel to surface-like regions. We then extract an explicit mesh of a narrow band around the surface, with width determined by the kernel size, and fine-tune the radiance field within this band. At inference time, we cast rays against the mesh and evaluate the radiance field only within the enclosed region, greatly reducing the number of samples required. Experiments show that our approach enables efficient rendering at very high fidelity. We also demonstrate that the extracted envelope enables downstream applications such as animation and simulation.
Keyword: faster
Improved Sparse Ising Optimization
Authors: Kenneth M. Zick
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn)
Abstract
Sparse Ising problems can be found in application areas such as logistics, condensed matter physics and training of deep Boltzmann networks, but can be very difficult to tackle with high efficiency and accuracy. This report presents new data demonstrating significantly higher performance on some longstanding benchmark problems with up to 20,000 variables. The data come from a new heuristic algorithm tested on the large sparse instances from the Gset benchmark suite. Relative to leading reported combinations of speed and accuracy (e.g., from Toshiba's Simulated Bifurcation Machine and Breakout Local Search), a proof-of-concept implementation reached targets 2-4 orders of magnitude faster. For two instances (G72 and G77) the new algorithm discovered a better solution than all previously reported values. Solution bitstrings confirming these two best solutions are provided. The data suggest exciting possibilities for pushing the sparse Ising performance frontier to potentially strengthen algorithm portfolios, AI toolkits and decision-making systems.
A new stabilized time-spectral finite element solver for fast simulation of blood flow
Abstract
The increasing application of cardiorespiratory simulations for diagnosis and surgical planning necessitates the development of computational methods significantly faster than the current technology. To achieve this objective, we leverage the time-periodic nature of these flows by discretizing equations in the frequency domain instead of the time domain. This approach markedly reduces the size of the discrete problem and, consequently, the simulation cost. With this motivation, we introduce a finite element method for simulating time-periodic flows that are physically stable. The proposed time-spectral method is formulated by augmenting the baseline Galerkin's method with a least-squares penalty term. This penalty term is weighted by a positive-definite stabilization tensor, computed by solving an eigenvalue problem that involves the contraction of the velocity convolution matrix with the element metric tensor. The outcome is a formally stable residual-based method that emulates the standard time method when simulating steady flows. Consequently, it preserves the appealing properties of the standard method, including stability in strong convection and the convenient use of equal-order interpolation functions for velocity and pressure, among other benefits. This method is tested on a patient-specific Fontan model at nominal Reynolds and Womersley numbers of 500 and 10, respectively, demonstrating its ability to replicate conventional time simulation results using as few as 7 modes at 11\% of the computational cost. Owing to its higher local-to-processor computation density, the proposed method also exhibits improved parallel scalability, thereby enabling efficient utilization of computational resources for the rapid simulation of time-critical applications.
Think While You Write: Hypothesis Verification Promotes Faithful Knowledge-to-Text Generation
Authors: Yifu Qiu, Varun Embar, Shay B. Cohen, Benjamin Han
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
Neural knowledge-to-text generation models often struggle to faithfully generate descriptions for the input facts: they may produce hallucinations that contradict the given facts, or describe facts not present in the input. To reduce hallucinations, we propose a novel decoding method, TWEAK (Think While Effectively Articulating Knowledge). TWEAK treats the generated sequences at each decoding step and its future sequences as hypotheses, and ranks each generation candidate based on how well their corresponding hypotheses support the input facts using a Hypothesis Verification Model (HVM). We first demonstrate the effectiveness of TWEAK by using a Natural Language Inference (NLI) model as the HVM and report improved faithfulness with minimal impact on the quality. We then replace the NLI model with our task-specific HVM trained with a first-of-a-kind dataset, FATE (Fact-Aligned Textual Entailment), which pairs input facts with their faithful and hallucinated descriptions with the hallucinated spans marked. The new HVM improves the faithfulness and the quality further and runs faster. Overall the best TWEAK variants improve on average 2.22/7.17 points on faithfulness measured by FactKB over WebNLG and TekGen/GenWiki, respectively, with only 0.14/0.32 points degradation on quality measured by BERTScore over the same datasets. Since TWEAK is a decoding-only approach, it can be integrated with any neural generative model without retraining.
Center Focusing Network for Real-Time LiDAR Panoptic Segmentation
Abstract
LiDAR panoptic segmentation facilitates an autonomous vehicle to comprehensively understand the surrounding objects and scenes and is required to run in real time. The recent proposal-free methods accelerate the algorithm, but their effectiveness and efficiency are still limited owing to the difficulty of modeling non-existent instance centers and the costly center-based clustering modules. To achieve accurate and real-time LiDAR panoptic segmentation, a novel center focusing network (CFNet) is introduced. Specifically, the center focusing feature encoding (CFFE) is proposed to explicitly understand the relationships between the original LiDAR points and virtual instance centers by shifting the LiDAR points and filling in the center points. Moreover, to leverage the redundantly detected centers, a fast center deduplication module (CDM) is proposed to select only one center for each instance. Experiments on the SemanticKITTI and nuScenes panoptic segmentation benchmarks demonstrate that our CFNet outperforms all existing methods by a large margin and is 1.6 times faster than the most efficient method. The code is available at https://github.com/GangZhang842/CFNet.
Future Full-Ocean Deep SSPs Prediction based on Hierarchical Long Short-Term Memory Neural Networks
Abstract
The spatial-temporal distribution of underwater sound velocity affects the propagation mode of underwater acoustic signals. Therefore, rapid estimation and prediction of underwater sound velocity distribution is crucial for providing underwater positioning, navigation and timing (PNT) services. Currently, sound speed profile (SSP) inversion methods have a faster time response rate compared to direct measurement methods, however, most SSP inversion methods focus on constructing spatial dimensional sound velocity fields and are highly dependent on sonar observation data, thus high requirements have been placed on observation data sources. To explore the distribution pattern of sound velocity in the time dimension and achieve future SSP prediction without sonar observation data, we propose a hierarchical long short-term memory (H-LSTM) neural network for SSP prediction. By our SSP prediction method, the sound speed distribution could be estimated without any on-site data measurement process, so that the time efficiency could be greatly improved. Through comparing with other state-of-the-art methods, H-LSTM has better accuracy performance on prediction of monthly average sound velocity distribution, which is less than 1 m/s in different depth layers.
A Speed Odyssey for Deployable Quantization of LLMs
Authors: Qingyuan Li, Ran Meng, Yiduo Li, Bo Zhang, Liang Li, Yifan Lu, Xiangxiang Chu, Yerui Sun, Yuchen Xie
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Abstract
The large language model era urges faster and less costly inference. Prior model compression works on LLMs tend to undertake a software-centric approach primarily focused on the simulated quantization performance. By neglecting the feasibility of deployment, these approaches are typically disabled in real practice. They used to drastically push down the quantization bit range for a reduced computation which might not be supported by the mainstream hardware, or involve sophisticated algorithms that introduce extra computation or memory access overhead. We argue that pursuing a hardware-centric approach in the construction of quantization algorithms is crucial. In this regard, we are driven to build our compression method on top of hardware awareness, eliminating impractical algorithm choices while maximizing the benefit of hardware acceleration. Our method, OdysseyLLM, comes with a novel W4A8 kernel implementation called FastGEMM and a combined recipe of quantization strategies. Extensive experiments manifest the superiority of our W4A8 method which brings the actual speed boosting up to \textbf{4$\times$} compared to Hugging Face FP16 inference and \textbf{2.23$\times$} vs. the state-of-the-art inference engine TensorRT-LLM in FP16, and \textbf{1.45$\times$} vs. TensorRT-LLM in INT8, yet without substantially harming the performance.
GistScore: Learning Better Representations for In-Context Example Selection with Gist Bottlenecks
Authors: Shivanshu Gupta, Clemens Rosenbaum, Ethan R. Elenberg
Abstract
Large language models (LLMs) have the ability to perform in-context learning (ICL) of new tasks by conditioning on prompts comprising a few task examples. This work studies the problem of selecting the best examples given a candidate pool to improve ICL performance on given a test input. Existing approaches either require training with feedback from a much larger LLM or are computationally expensive. We propose a novel metric, GistScore, based on Example Gisting, a novel approach for training example retrievers for ICL using an attention bottleneck via Gisting, a recent technique for compressing task instructions. To tradeoff performance with ease of use, we experiment with both fine-tuning gist models on each dataset and multi-task training a single model on a large collection of datasets. On 21 diverse datasets spanning 9 tasks, we show that our fine-tuned models get state-of-the-art ICL performance with 20% absolute average gain over off-the-shelf retrievers and 7% over the best prior methods. Our multi-task model generalizes well out-of-the-box to new task categories, datasets, and prompt templates with retrieval speeds that are consistently thousands of times faster than the best prior training-free method.
SynDiffix: More accurate synthetic structured data
Authors: Paul Francis, Cristian Berneanu, Edon Gashi
Abstract
This paper introduces SynDiffix, a mechanism for generating statistically accurate, anonymous synthetic data for structured data. Recent open source and commercial systems use Generative Adversarial Networks or Transformed Auto Encoders to synthesize data, and achieve anonymity through overfitting-avoidance. By contrast, SynDiffix exploits traditional mechanisms of aggregation, noise addition, and suppression among others. Compared to CTGAN, ML models generated from SynDiffix are twice as accurate, marginal and column pairs data quality is one to two orders of magnitude more accurate, and execution time is two orders of magnitude faster. Compared to the best commercial product we measured (MostlyAI), ML model accuracy is comparable, marginal and pairs accuracy is 5 to 10 times better, and execution time is an order of magnitude faster. Similar to the other approaches, SynDiffix anonymization is very strong. This paper describes SynDiffix and compares its performance with other popular open source and commercial systems.
LIO-EKF: High Frequency LiDAR-Inertial Odometry using Extended Kalman Filters
Authors: Yibin Wu, Tiziano Guadagnino, Louis Wiesmann, Lasse Klingbeil, Cyrill Stachniss, Heiner Kuhlmann
Abstract
Odometry estimation is a key element for every autonomous system requiring navigation in an unknown environment. In modern mobile robots, 3D LiDAR-inertial systems are often used for this task. By fusing LiDAR scans and IMU measurements, these systems can reduce the accumulated drift caused by sequentially registering individual LiDAR scans and provide a robust pose estimate. Although effective, LiDAR-inertial odometry systems require proper parameter tuning to be deployed. In this paper, we propose LIO-EKF, a tightly-coupled LiDAR-inertial odometry system based on point-to-point registration and the classical extended Kalman filter scheme. We propose an adaptive data association that considers the relative pose uncertainty, the map discretization errors, and the LiDAR noise. In this way, we can substantially reduce the parameters to tune for a given type of environment. The experimental evaluation suggests that the proposed system performs on par with the state-of-the-art LiDAR-inertial odometry pipelines, but is significantly faster in computing the odometry.
Fast multiplication by two's complement addition of numbers represented as a set of polynomial radix 2 indexes, stored as an integer list for massively parallel computation
Authors: Mark Stocks
Subjects: Mathematical Software (cs.MS); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
Abstract
We demonstrate a multiplication method based on numbers represented as set of polynomial radix 2 indices stored as an integer list. The 'polynomial integer index multiplication' method is a set of algorithms implemented in python code. We demonstrate the method to be faster than both the Number Theoretic Transform (NTT) and Karatsuba for multiplication within a certain bit range. Also implemented in python code for comparison purposes with the polynomial radix 2 integer method. We demonstrate that it is possible to express any integer or real number as a list of integer indices, representing a finite series in base two. The finite series of integer index representation of a number can then be stored and distributed across multiple CPUs / GPUs. We show that operations of addition and multiplication can be applied as two's complement additions operating on the index integer representations and can be fully distributed across a given CPU / GPU architecture. We demonstrate fully distributed arithmetic operations such that the 'polynomial integer index multiplication' method overcomes the current limitation of parallel multiplication methods. Ie, the need to share common core memory and common disk for the calculation of results and intermediate results.
A Computationally Efficient Sparsified Online Newton Method
Abstract
Second-order methods hold significant promise for enhancing the convergence of deep neural network training; however, their large memory and computational demands have limited their practicality. Thus there is a need for scalable second-order methods that can efficiently train large models. In this paper, we introduce the Sparsified Online Newton (SONew) method, a memory-efficient second-order algorithm that yields a sparsified yet effective preconditioner. The algorithm emerges from a novel use of the LogDet matrix divergence measure; we combine it with sparsity constraints to minimize regret in the online convex optimization framework. Empirically, we test our method on large scale benchmarks of up to 1B parameters. We achieve up to 30% faster convergence, 3.4% relative improvement in validation performance, and 80% relative improvement in training loss, in comparison to memory efficient optimizers including first order methods. Powering the method is a surprising fact -- imposing structured sparsity patterns, like tridiagonal and banded structure, requires little to no overhead, making it as efficient and parallelizable as first-order methods. In wall-clock time, tridiagonal SONew is only about 3% slower per step than first-order methods but gives overall gains due to much faster convergence. In contrast, one of the state-of-the-art (SOTA) memory-intensive second-order methods, Shampoo, is unable to scale to large benchmarks. Additionally, while Shampoo necessitates significant engineering efforts to scale to large benchmarks, SONew offers a more straightforward implementation, increasing its practical appeal. SONew code is available at: https://github.com/devvrit/SONew
JaxMARL: Multi-Agent RL Environments in JAX
Authors: Alexander Rutherford, Benjamin Ellis, Matteo Gallici, Jonathan Cook, Andrei Lupu, Gardar Ingvarsson, Timon Willi, Akbir Khan, Christian Schroeder de Witt, Alexandra Souly, Saptarashmi Bandyopadhyay, Mikayel Samvelyan, Minqi Jiang, Robert Tjarko Lange, Shimon Whiteson, Bruno Lacerda, Nick Hawes, Tim Rocktaschel, Chris Lu, Jakob Nicolaus Foerster
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Abstract
Benchmarks play an important role in the development of machine learning algorithms. For example, research in reinforcement learning (RL) has been heavily influenced by available environments and benchmarks. However, RL environments are traditionally run on the CPU, limiting their scalability with typical academic compute. Recent advancements in JAX have enabled the wider use of hardware acceleration to overcome these computational hurdles, enabling massively parallel RL training pipelines and environments. This is particularly useful for multi-agent reinforcement learning (MARL) research. First of all, multiple agents must be considered at each environment step, adding computational burden, and secondly, the sample complexity is increased due to non-stationarity, decentralised partial observability, or other MARL challenges. In this paper, we present JaxMARL, the first open-source code base that combines ease-of-use with GPU enabled efficiency, and supports a large number of commonly used MARL environments as well as popular baseline algorithms. When considering wall clock time, our experiments show that per-run our JAX-based training pipeline is up to 12500x faster than existing approaches. This enables efficient and thorough evaluations, with the potential to alleviate the evaluation crisis of the field. We also introduce and benchmark SMAX, a vectorised, simplified version of the popular StarCraft Multi-Agent Challenge, which removes the need to run the StarCraft II game engine. This not only enables GPU acceleration, but also provides a more flexible MARL environment, unlocking the potential for self-play, meta-learning, and other future applications in MARL. We provide code at https://github.com/flairox/jaxmarl.
Keyword: mobile
Toward Ultra-Low-Power Remote Health Monitoring: An Optimal and Adaptive Compressed Sensing Framework for Activity Recognition
Authors: J. Pagan, R. Fallahzadeh, M. Pedram, José L. Risco-Martín, J. M. Moya, J. L. Ayala, H. Ghasemzadeh
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
Activity recognition, as an important component of behavioral monitoring and intervention, has attracted enormous attention, especially in Mobile Cloud Computing (MCC) and Remote Health Monitoring (RHM) paradigms. While recently resource constrained wearable devices have been gaining popularity, their battery life is limited and constrained by the frequent wireless transmission of data to more computationally powerful back-ends. This paper proposes an ultra-low power activity recognition system using a novel adaptive compressed sensing technique that aims to minimize transmission costs. Coarse-grained on-body sensor localization and unsupervised clustering modules are devised to autonomously reconfigure the compressed sensing module for further power saving. We perform a thorough heuristic optimization using Grammatical Evolution (GE) to ensure minimal computation overhead of the proposed methodology. Our evaluation on a real-world dataset and a low power wearable sensing node demonstrates that our approach can reduce the energy consumption of the wireless data transmission up to $81.2\%$ and $61.5\%$, with up to $60.6\%$ and $35.0\%$ overall power savings in comparison with baseline and a naive state-of-the-art approaches, respectively. These solutions lead to an average activity recognition accuracy of $89.0\%$ -- only $4.8\%$ less than the baseline accuracy -- while having a negligible energy overhead of on-node computation.
FedCode: Communication-Efficient Federated Learning via Transferring Codebooks
Authors: Saeed Khalilian, Vasileios Tsouvalas, Tanir Ozcelebi, Nirvana Meratnia
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Federated Learning (FL) is a distributed machine learning paradigm that enables learning models from decentralized local data. While FL offers appealing properties for clients' data privacy, it imposes high communication burdens for exchanging model weights between a server and the clients. Existing approaches rely on model compression techniques, such as pruning and weight clustering to tackle this. However, transmitting the entire set of weight updates at each federated round, even in a compressed format, limits the potential for a substantial reduction in communication volume. We propose FedCode where clients transmit only codebooks, i.e., the cluster centers of updated model weight values. To ensure a smooth learning curve and proper calibration of clusters between the server and the clients, FedCode periodically transfers model weights after multiple rounds of solely communicating codebooks. This results in a significant reduction in communication volume between clients and the server in both directions, without imposing significant computational overhead on the clients or leading to major performance degradation of the models. We evaluate the effectiveness of FedCode using various publicly available datasets with ResNet-20 and MobileNet backbone model architectures. Our evaluations demonstrate a 12.2-fold data transmission reduction on average while maintaining a comparable model performance with an average accuracy loss of 1.3% compared to FedAvg. Further validation of FedCode performance under non-IID data distributions showcased an average accuracy loss of 2.0% compared to FedAvg while achieving approximately a 12.7-fold data transmission reduction.
GWP-ASan: Sampling-Based Detection of Memory-Safety Bugs in Production
Authors: Kostya Serebryany, Chris Kennelly, Mitch Phillips, Matt Denton, Marco Elver, Alexander Potapenko, Matt Morehouse, Vlad Tsyrklevich, Christian Holler, Julian Lettner, David Kilzer, Lander Brandt
Subjects: Software Engineering (cs.SE); Programming Languages (cs.PL)
Abstract
Despite the recent advances in pre-production bug detection, heap-use-after-free and heap-buffer-overflow bugs remain the primary problem for security, reliability, and developer productivity for applications written in C or C++, across all major software ecosystems. Memory-safe languages solve this problem when they are used, but the existing code bases consisting of billions of lines of C and C++ continue to grow, and we need additional bug detection mechanisms. This paper describes a family of tools that detect these two classes of memory-safety bugs, while running in production, at near-zero overhead. These tools combine page-granular guarded allocation and low-rate sampling. In other words, we added an "if" statement to a 36-year-old idea and made it work at scale. We describe the basic algorithm, several of its variants and implementations, and the results of multi-year deployments across mobile, desktop, and server applications.
Adaptive Interventions with User-Defined Goals for Health Behavior Change
Authors: Aishwarya Mandyam, Matthew Joerke, Barbara E. Engelhardt, Emma Brunskill
Abstract
Physical inactivity remains a major public health concern, having associations with adverse health outcomes such as cardiovascular disease and type-2 diabetes. Mobile health applications present a promising avenue for low-cost, scalable physical activity promotion, yet often suffer from small effect sizes and low adherence rates, particularly in comparison to human coaching. Goal-setting is a critical component of health coaching that has been underutilized in adaptive algorithms for mobile health interventions. This paper introduces a modification to the Thompson sampling algorithm that places emphasis on individualized goal-setting by optimizing personalized reward functions. As a step towards supporting goal-setting, this paper offers a balanced approach that can leverage shared structure while optimizing individual preferences and goals. We prove that our modification incurs only a constant penalty on the cumulative regret while preserving the sample complexity benefits of data sharing. In a physical activity simulator, we demonstrate that our algorithm achieves substantial improvements in cumulative regret compared to baselines that do not share data or do not optimize for individualized rewards.
Modelling daily mobility using mobile data traffic at fine spatiotemporal scale
Authors: Panayotis Christidis, Maria Vega Gonzalo, Miklos Radics
Abstract
We applied a data-driven approach that explores the usability of the NetMob 2023 dataset in modelling mobility patterns within an urban context. We combined the data with a highly suitable external source, the ENACT dataset, which provides a 1 km x 1km grid with estimates of the day and night population across Europe. We developed three sets of XGBoost models that predict the population in each 100m x 100m grid cell used in NetMob2023 based on the mobile data traffic of the 68 online services covered in the dataset, using the ENACT values as ground truth. The results suggest that the NetMob 2023 data can be useful for the estimation of the day and night population and grid cell level and can explain part of the dynamics of urban mobility.
Learning effects in variable autonomy human-robot systems: how much training is enough?
Authors: Manolis Chiou, Mohammed Talha, Rustam Stolkin
Abstract
This paper investigates learning effects and human operator training practices in variable autonomy robotic systems. These factors are known to affect performance of a human-robot system and are frequently overlooked. We present the results from an experiment inspired by a search and rescue scenario in which operators remotely controlled a mobile robot with either Human-Initiative (HI) or Mixed-Initiative (MI) control. Evidence suggests learning in terms of primary navigation task and secondary (distractor) task performance. Further evidence is provided that MI and HI performance in a pure navigation task is equal. Lastly, guidelines are proposed for experimental design and operator training practices.
EvaSurf: Efficient View-Aware Implicit Textured Surface Reconstruction on Mobile Devices
Abstract
Reconstructing real-world 3D objects has numerous applications in computer vision, such as virtual reality, video games, and animations. Ideally, 3D reconstruction methods should generate high-fidelity results with 3D consistency in real-time. Traditional methods match pixels between images using photo-consistency constraints or learned features, while differentiable rendering methods like Neural Radiance Fields (NeRF) use surface-based representations or differentiable volume rendering to generate high-fidelity scenes. However, these methods require excessive runtime for rendering, making them impractical for daily applications. To address these challenges, we present $\textbf{EvaSurf}$, an $\textbf{E}$fficient $\textbf{V}$iew-$\textbf{A}$ware Implicit Textured $\textbf{Surf}$ace Reconstruction method on Mobile Devices. In our method, we first employ an efficient surface-based model with a multi-view supervision module to ensure accurate mesh creation. To enable high-fidelity rendering, we learn an implicit texture embedded with a set of Gaussian lobes to capture view-dependent information. Furthermore, With the explicit geometry and the implicit texture, we can employ a lightweight neural shader to reduce the expense of computation and further support real-time rendering on common mobile devices. Extensive experiments demonstrate that our method can reconstruct high-quality appearance and accurate mesh on both synthetic and real-world datasets. Moreover, our method can be trained in just 1-2 hours using a single GPU and run on mobile devices at over 40FPS (Frames Per Second), with a final package required for rendering taking up only 40-50 MB.
Semantic-Relay-Aided Text Transmission: Placement Optimization and Bandwidth Allocation
Abstract
Semantic communication has emerged as a promising technology to break the Shannon limit by extracting the meaning of source data and sending relevant semantic information only. However, some mobile devices may have limited computation and storage resources, which renders it difficult to deploy and implement the resource-demanding deep learning based semantic encoder/decoder. To tackle this challenge, we propose in this paper a new semantic relay (SemRelay), which is equipped with a semantic receiver for assisting text transmission from a resource-abundant base station (BS) to a resource-constrained mobile device. Specifically, the SemRelay first decodes the semantic information sent by the BS (with a semantic transmitter) and then forwards it to the user by adopting conventional bit transmission, hence effectively improving the text transmission efficiency. We formulate an optimization problem to maximize the achievable (effective) bit rate by jointly designing the SemRelay placement and bandwidth allocation. Although this problem is non-convex and generally difficult to solve, we propose an efficient penalty-based algorithm to obtain a high-quality suboptimal solution. Numerical results show the close-to-optimal performance of the proposed algorithm as well as significant rate performance gain of the proposed SemRelay over conventional decode-and-forward relay.
LIO-EKF: High Frequency LiDAR-Inertial Odometry using Extended Kalman Filters
Authors: Yibin Wu, Tiziano Guadagnino, Louis Wiesmann, Lasse Klingbeil, Cyrill Stachniss, Heiner Kuhlmann
Abstract
Odometry estimation is a key element for every autonomous system requiring navigation in an unknown environment. In modern mobile robots, 3D LiDAR-inertial systems are often used for this task. By fusing LiDAR scans and IMU measurements, these systems can reduce the accumulated drift caused by sequentially registering individual LiDAR scans and provide a robust pose estimate. Although effective, LiDAR-inertial odometry systems require proper parameter tuning to be deployed. In this paper, we propose LIO-EKF, a tightly-coupled LiDAR-inertial odometry system based on point-to-point registration and the classical extended Kalman filter scheme. We propose an adaptive data association that considers the relative pose uncertainty, the map discretization errors, and the LiDAR noise. In this way, we can substantially reduce the parameters to tune for a given type of environment. The experimental evaluation suggests that the proposed system performs on par with the state-of-the-art LiDAR-inertial odometry pipelines, but is significantly faster in computing the odometry.
Dynamic modeling of wing-assisted inclined running with a morphing multi-modal robot
Authors: Eric Sihite, Alireza Ramezani, Morteza Gharib
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
Robot designs can take many inspirations from nature, where there are many examples of highly resilient and fault-tolerant locomotion strategies to navigate complex terrains by using multi-functional appendages. For example, Chukar and Hoatzin birds can repurpose their wings for quadrupedal walking and wing-assisted incline running (WAIR) to climb steep surfaces. We took inspiration from nature and designed a morphing robot with multi-functional thruster-wheel appendages that allows the robot to change its mode of locomotion by transforming into a rover, quad-rotor, mobile inverted pendulum (MIP), and other modes. In this work, we derive a dynamic model and formulate a nonlinear model predictive controller to perform WAIR to showcase the unique capabilities of our robot. We implemented the model and controller in a numerical simulation and experiments to show their feasibility and the capabilities of our transforming multi-modal robot.
Keyword: pruning
FedCode: Communication-Efficient Federated Learning via Transferring Codebooks
Authors: Saeed Khalilian, Vasileios Tsouvalas, Tanir Ozcelebi, Nirvana Meratnia
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Federated Learning (FL) is a distributed machine learning paradigm that enables learning models from decentralized local data. While FL offers appealing properties for clients' data privacy, it imposes high communication burdens for exchanging model weights between a server and the clients. Existing approaches rely on model compression techniques, such as pruning and weight clustering to tackle this. However, transmitting the entire set of weight updates at each federated round, even in a compressed format, limits the potential for a substantial reduction in communication volume. We propose FedCode where clients transmit only codebooks, i.e., the cluster centers of updated model weight values. To ensure a smooth learning curve and proper calibration of clusters between the server and the clients, FedCode periodically transfers model weights after multiple rounds of solely communicating codebooks. This results in a significant reduction in communication volume between clients and the server in both directions, without imposing significant computational overhead on the clients or leading to major performance degradation of the models. We evaluate the effectiveness of FedCode using various publicly available datasets with ResNet-20 and MobileNet backbone model architectures. Our evaluations demonstrate a 12.2-fold data transmission reduction on average while maintaining a comparable model performance with an average accuracy loss of 1.3% compared to FedAvg. Further validation of FedCode performance under non-IID data distributions showcased an average accuracy loss of 2.0% compared to FedAvg while achieving approximately a 12.7-fold data transmission reduction.
Lighter, yet More Faithful: Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization
Authors: George Chrysostomou, Zhixue Zhao, Miles Williams, Nikolaos Aletras
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
Despite their remarkable performance on abstractive summarization, large language models (LLMs) face two significant challenges: their considerable size and tendency to hallucinate. Hallucinations are concerning because they erode the reliability of LLMs and raise safety issues. Pruning is a technique that reduces model size by removing redundant weights to create sparse models that enable more efficient inference. Pruned models yield comparable performance to their counterpart full-sized models, making them ideal alternatives when operating on a limited budget. However, the effect that pruning has upon hallucinations in abstractive summarization with LLMs has yet to be explored. In this paper, we provide an extensive empirical study on the hallucinations produced by pruned models across three standard summarization tasks, two pruning approaches, three instruction-tuned LLMs, and three hallucination evaluation metrics. Surprisingly, we find that pruned LLMs hallucinate less compared to their full-sized counterparts. Our follow-up analysis suggests that pruned models tend to depend more on the source input and less on their parametric knowledge from pre-training for generation. This greater dependency on the source input leads to a higher lexical overlap between generated content and the source input, which can be a reason for the reduction in hallucinations.
Investigating the Impact of Weight Sharing Decisions on Knowledge Transfer in Continual Learning
Authors: Josh Andle, Ali Payani, Salimeh Yasaei-Sekeh
Abstract
Continual Learning (CL) has generated attention as a method of avoiding Catastrophic Forgetting (CF) in the sequential training of neural networks, improving network efficiency and adaptability to different tasks. Additionally, CL serves as an ideal setting for studying network behavior and Forward Knowledge Transfer (FKT) between tasks. Pruning methods for CL train subnetworks to handle the sequential tasks which allows us to take a structured approach to investigating FKT. Sharing prior subnetworks' weights leverages past knowledge for the current task through FKT. Understanding which weights to share is important as sharing all weights can yield sub-optimal accuracy. This paper investigates how different sharing decisions affect the FKT between tasks. Through this lens we demonstrate how task complexity and similarity influence the optimal weight sharing decisions, giving insights into the relationships between tasks and helping inform decision making in similar CL methods. We implement three sequential datasets designed to emphasize variation in task complexity and similarity, reporting results for both ResNet-18 and VGG-16. By sharing in accordance with the decisions supported by our findings, we show that we can improve task accuracy compared to other sharing decisions.
SparseAuto: An Auto-Scheduler for Sparse Tensor Computations Using Recursive Loop Nest Restructuring
Abstract
Automated code generation and performance optimizations for sparse tensor algebra are cardinal since they have become essential in many real-world applications like quantum computing, physics, chemistry, and machine learning. General sparse tensor algebra compilers are not always versatile enough to generate asymptotically optimal code for sparse tensor contractions. This paper shows how to optimize and generate asymptotically better schedules for complex tensor expressions using kernel fission and fusion. We present a generalized loop transformation to achieve loop nesting for minimized memory footprint and reduced asymptotic complexity. Furthermore, we present an auto-scheduler that uses a partially ordered set-based cost model that uses both time and auxiliary memory complexities in its pruning stages. In addition, we highlight the use of SMT solvers in sparse auto-schedulers to prune the Pareto frontier of schedules to the smallest number of possible schedules with user-defined constraints available at compile time. Finally, we show that our auto-scheduler can select asymptotically better schedules that use our compiler transformation to generate optimized code. Our results show that the auto-scheduler achieves orders of magnitude speedup compared to the TACO-generated code for several real-world tensor algebra computations on different real-world inputs.
CRISPR: Eliminating Bias Neurons from an Instruction-following Language Model
Authors: Nakyeong Yang, Taegwan Kang, Kyomin Jung
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
Large language models (LLMs) executing tasks through instruction-based prompts often face challenges stemming from distribution differences between user instructions and training instructions. This leads to distractions and biases, especially when dealing with inconsistent dynamic labels. In this paper, we introduces a novel bias mitigation method, CRISPR, designed to alleviate instruction-label biases in LLMs. CRISPR utilizes attribution methods to identify bias neurons influencing biased outputs and employs pruning to eliminate the bias neurons. Experimental results demonstrate the method's effectiveness in mitigating biases in instruction-based prompting, enhancing language model performance on social bias benchmarks without compromising pre-existing knowledge. CRISPR proves highly practical, model-agnostic, offering flexibility in adapting to evolving social biases.
How Does Calibration Data Affect the Post-training Pruning and Quantization of Large Language Models?
Abstract
Pruning and quantization form the foundation of model compression for neural networks, enabling efficient inference for large language models (LLMs). Recently, various quantization and pruning techniques have demonstrated state-of-the-art performance in a post-training setting. They rely upon calibration data, a small set of unlabeled examples, to generate layer activations. However, no prior work has systematically investigated how the calibration data impacts the effectiveness of model compression methods. In this paper, we present the first extensive empirical study on the effect of calibration data upon LLM performance. We trial a variety of pruning and quantization methods, tasks, models, and datasets. Surprisingly, we find substantial variations in downstream task performance, contrasting existing work that suggests a greater level of robustness to the calibration data. Finally, we make a series of recommendations for the effective use of calibration data in LLM quantization and pruning.
Abstract
The Strong Lottery Ticket Hypothesis (SLTH) states that randomly-initialised neural networks likely contain subnetworks that perform well without any training. Although unstructured pruning has been extensively studied in this context, its structured counterpart, which can deliver significant computational and memory efficiency gains, has been largely unexplored. One of the main reasons for this gap is the limitations of the underlying mathematical tools used in formal analyses of the SLTH. In this paper, we overcome these limitations: we leverage recent advances in the multidimensional generalisation of the Random Subset-Sum Problem and obtain a variant that admits the stochastic dependencies that arise when addressing structured pruning in the SLTH. We apply this result to prove, for a wide class of random Convolutional Neural Networks, the existence of structured subnetworks that can approximate any sufficiently smaller network. This result provides the first sub-exponential bound around the SLTH for structured pruning, opening up new avenues for further research on the hypothesis and contributing to the understanding of the role of over-parameterization in deep learning.
Keyword: diffusion
Scalable Diffusion for Materials Generation
Authors: Mengjiao Yang, KwangHwan Cho, Amil Merchant, Pieter Abbeel, Dale Schuurmans, Igor Mordatch, Ekin Dogus Cubuk
Abstract
Generative models trained on internet-scale data are capable of generating novel and realistic texts, images, and videos. A natural next question is whether these models can advance science, for example by generating novel stable materials. Traditionally, models with explicit structures (e.g., graphs) have been used in modeling structural relationships in scientific data (e.g., atoms and bonds in crystals), but generating structures can be difficult to scale to large and complex systems. Another challenge in generating materials is the mismatch between standard generative modeling metrics and downstream applications. For instance, common metrics such as the reconstruction error do not correlate well with the downstream goal of discovering stable materials. In this work, we tackle the scalability challenge by developing a unified crystal representation that can represent any crystal structure (UniMat), followed by training a diffusion probabilistic model on these UniMat representations. Our empirical results suggest that despite the lack of explicit structure modeling, UniMat can generate high fidelity crystal structures from larger and more complex chemical systems, outperforming previous graph-based approaches under various generative modeling metrics. To better connect the generation quality of materials to downstream applications, such as discovering novel stable materials, we propose additional metrics for evaluating generative models of materials, including per-composition formation energy and stability with respect to convex hulls through decomposition energy from Density Function Theory (DFT). Lastly, we show that conditional generation with UniMat can scale to previously established crystal datasets with up to millions of crystals structures, outperforming random structure search (the current leading method for structure discovery) in discovering new stable materials.
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
Authors: Yanwu Xu, Yang Zhao, Zhisheng Xiao, Tingbo Hou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Text-to-image diffusion models have demonstrated remarkable capabilities in transforming textual prompts into coherent images, yet the computational cost of their inference remains a persistent challenge. To address this issue, we present UFOGen, a novel generative model designed for ultra-fast, one-step text-to-image synthesis. In contrast to conventional approaches that focus on improving samplers or employing distillation techniques for diffusion models, UFOGen adopts a hybrid methodology, integrating diffusion models with a GAN objective. Leveraging a newly introduced diffusion-GAN objective and initialization with pre-trained diffusion models, UFOGen excels in efficiently generating high-quality images conditioned on textual descriptions in a single step. Beyond traditional text-to-image generation, UFOGen showcases versatility in applications. Notably, UFOGen stands among the pioneering models enabling one-step text-to-image generation and diverse downstream tasks, presenting a significant advancement in the landscape of efficient generative models. \blfootnote{*Work done as a student researcher of Google, $\dagger$ indicates equal contribution.
Disentangling the Potential Impacts of Papers into Diffusion, Conformity, and Contribution Values
Authors: Zhikai Xue, Guoxiu He, Zhuoren Jiang, Yangyang Kang, Star Zhao, Wei Lu
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)
Abstract
The potential impact of an academic paper is determined by various factors, including its popularity and contribution. Existing models usually estimate original citation counts based on static graphs and fail to differentiate values from nuanced perspectives. In this study, we propose a novel graph neural network to Disentangle the Potential impacts of Papers into Diffusion, Conformity, and Contribution values (called DPPDCC). Given a target paper, DPPDCC encodes temporal and structural features within the constructed dynamic heterogeneous graph. Particularly, to capture the knowledge flow, we emphasize the importance of comparative and co-cited/citing information between papers and aggregate snapshots evolutionarily. To unravel popularity, we contrast augmented graphs to extract the essence of diffusion and predict the accumulated citation binning to model conformity. We further apply orthogonal constraints to encourage distinct modeling of each perspective and preserve the inherent value of contribution. To evaluate models' generalization for papers published at various times, we reformulate the problem by partitioning data based on specific time points to mirror real-world conditions. Extensive experimental results on three datasets demonstrate that DPPDCC significantly outperforms baselines for previously, freshly, and immediately published papers. Further analyses confirm its robust capabilities. We will make our datasets and codes publicly available.
FastBlend: a Powerful Model-Free Toolkit Making Video Stylization Easier
Authors: Zhongjie Duan, Chengyu Wang, Cen Chen, Weining Qian, Jun Huang, Mingyi Jin
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
With the emergence of diffusion models and rapid development in image processing, it has become effortless to generate fancy images in tasks such as style transfer and image editing. However, these impressive image processing approaches face consistency issues in video processing. In this paper, we propose a powerful model-free toolkit called FastBlend to address the consistency problem for video processing. Based on a patch matching algorithm, we design two inference modes, including blending and interpolation. In the blending mode, FastBlend eliminates video flicker by blending the frames within a sliding window. Moreover, we optimize both computational efficiency and video quality according to different application scenarios. In the interpolation mode, given one or more keyframes rendered by diffusion models, FastBlend can render the whole video. Since FastBlend does not modify the generation process of diffusion models, it exhibits excellent compatibility. Extensive experiments have demonstrated the effectiveness of FastBlend. In the blending mode, FastBlend outperforms existing methods for video deflickering and video synthesis. In the interpolation mode, FastBlend surpasses video interpolation and model-based video processing approaches. The source codes have been released on GitHub.
Generative AI-Based Probabilistic Constellation Shaping With Diffusion Models
Authors: Mehdi Letafati, Samad Ali, Matti Latva-aho
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
Diffusion models are at the vanguard of generative AI research with renowned solutions such as ImageGen by Google Brain and DALL.E 3 by OpenAI. Nevertheless, the potential merits of diffusion models for communication engineering applications are not fully understood yet. In this paper, we aim to unleash the power of generative AI for PHY design of constellation symbols in communication systems. Although the geometry of constellations is predetermined according to networking standards, e.g., quadrature amplitude modulation (QAM), probabilistic shaping can design the probability of occurrence (generation) of constellation symbols. This can help improve the information rate and decoding performance of communication systems. We exploit the denoise-and-generate'' characteristics of denoising diffusion probabilistic models (DDPM) for probabilistic constellation shaping. The key idea is to learn generating constellation symbols out of noise,mimicking'' the way the receiver performs symbol reconstruction. This way, we make the constellation symbols sent by the transmitter, and what is inferred (reconstructed) at the receiver become as similar as possible, resulting in as few mismatches as possible. Our results show that the generative AI-based scheme outperforms deep neural network (DNN)-based benchmark and uniform shaping, while providing network resilience as well as robust out-of-distribution performance under low-SNR regimes and non-Gaussian assumptions. Numerical evaluations highlight 30% improvement in terms of cosine similarity and a threefold improvement in terms of mutual information compared to DNN-based approach for 64-QAM geometry.
Privacy Threats in Stable Diffusion Models
Authors: Thomas Cilloni, Charles Fleming, Charles Walter
Abstract
This paper introduces a novel approach to membership inference attacks (MIA) targeting stable diffusion computer vision models, specifically focusing on the highly sophisticated Stable Diffusion V2 by StabilityAI. MIAs aim to extract sensitive information about a model's training data, posing significant privacy concerns. Despite its advancements in image synthesis, our research reveals privacy vulnerabilities in the stable diffusion models' outputs. Exploiting this information, we devise a black-box MIA that only needs to query the victim model repeatedly. Our methodology involves observing the output of a stable diffusion model at different generative epochs and training a classification model to distinguish when a series of intermediates originated from a training sample or not. We propose numerous ways to measure the membership features and discuss what works best. The attack's efficacy is assessed using the ROC AUC method, demonstrating a 60\% success rate in inferring membership information. This paper contributes to the growing body of research on privacy and security in machine learning, highlighting the need for robust defenses against MIAs. Our findings prompt a reevaluation of the privacy implications of stable diffusion models, urging practitioners and developers to implement enhanced security measures to safeguard against such attacks.
Synthetically Enhanced: Unveiling Synthetic Data's Potential in Medical Imaging Research
Authors: Bardia Khosravi, Frank Li, Theo Dapamede, Pouria Rouzrokh, Cooper U. Gamble, Hari M. Trivedi, Cody C. Wyles, Andrew B. Sellergren, Saptarshi Purkayastha, Bradley J. Erickson, Judy W. Gichoya
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Chest X-rays (CXR) are the most common medical imaging study and are used to diagnose multiple medical conditions. This study examines the impact of synthetic data supplementation, using diffusion models, on the performance of deep learning (DL) classifiers for CXR analysis. We employed three datasets: CheXpert, MIMIC-CXR, and Emory Chest X-ray, training conditional denoising diffusion probabilistic models (DDPMs) to generate synthetic frontal radiographs. Our approach ensured that synthetic images mirrored the demographic and pathological traits of the original data. Evaluating the classifiers' performance on internal and external datasets revealed that synthetic data supplementation enhances model accuracy, particularly in detecting less prevalent pathologies. Furthermore, models trained on synthetic data alone approached the performance of those trained on real data. This suggests that synthetic data can potentially compensate for real data shortages in training robust DL models. However, despite promising outcomes, the superiority of real data persists.
Abstract
High-dimensional images, known for their rich semantic information, are widely applied in remote sensing and other fields. The spatial information in these images reflects the object's texture features, while the spectral information reveals the potential spectral representations across different bands. Currently, the understanding of high-dimensional images remains limited to a single-domain perspective with performance degradation. Motivated by the masking texture effect observed in the human visual system, we present a multi-domain diffusion-driven feature learning network (MDFL) , a scheme to redefine the effective information domain that the model really focuses on. This method employs diffusion-based posterior sampling to explicitly consider joint information interactions between the high-dimensional manifold structures in the spectral, spatial, and frequency domains, thereby eliminating the influence of masking texture effects in visual models. Additionally, we introduce a feature reuse mechanism to gather deep and raw features of high-dimensional data. We demonstrate that MDFL significantly improves the feature extraction performance of high-dimensional data, thereby providing a powerful aid for revealing the intrinsic patterns and structures of such data. The experimental results on three multi-modal remote sensing datasets show that MDFL reaches an average overall accuracy of 98.25%, outperforming various state-of-the-art baseline schemes. The code will be released, contributing to the computer vision community.
3D Paintbrush: Local Stylization of 3D Shapes with Cascaded Score Distillation
Abstract
In this work we develop 3D Paintbrush, a technique for automatically texturing local semantic regions on meshes via text descriptions. Our method is designed to operate directly on meshes, producing texture maps which seamlessly integrate into standard graphics pipelines. We opt to simultaneously produce a localization map (to specify the edit region) and a texture map which conforms to it. This synergistic approach improves the quality of both the localization and the stylization. To enhance the details and resolution of the textured area, we leverage multiple stages of a cascaded diffusion model to supervise our local editing technique with generative priors learned from images at different resolutions. Our technique, referred to as Cascaded Score Distillation (CSD), simultaneously distills scores at multiple resolutions in a cascaded fashion, enabling control over both the granularity and global understanding of the supervision. We demonstrate the effectiveness of 3D Paintbrush to locally texture a variety of shapes within different semantic regions. Project page: https://threedle.github.io/3d-paintbrush
DECDM: Document Enhancement using Cycle-Consistent Diffusion Models
Abstract
The performance of optical character recognition (OCR) heavily relies on document image quality, which is crucial for automatic document processing and document intelligence. However, most existing document enhancement methods require supervised data pairs, which raises concerns about data separation and privacy protection, and makes it challenging to adapt these methods to new domain pairs. To address these issues, we propose DECDM, an end-to-end document-level image translation method inspired by recent advances in diffusion models. Our method overcomes the limitations of paired training by independently training the source (noisy input) and target (clean output) models, making it possible to apply domain-specific diffusion models to other pairs. DECDM trains on one dataset at a time, eliminating the need to scan both datasets concurrently, and effectively preserving data privacy from the source or target domain. We also introduce simple data augmentation strategies to improve character-glyph conservation during translation. We compare DECDM with state-of-the-art methods on multiple synthetic data and benchmark datasets, such as document denoising and {\color{black}shadow} removal, and demonstrate the superiority of performance quantitatively and qualitatively.
What Constitutes a Faithful Summary? Preserving Author Perspectives in News Summarization
Authors: Yuhan Liu, Shangbin Feng, Xiaochuang Han, Vidhisha Balachandran, Chan Young Park, Sachin Kumar, Yulia Tsvetkov
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
In this work, we take a first step towards designing summarization systems that are faithful to the author's opinions and perspectives. Focusing on a case study of preserving political perspectives in news summarization, we find that existing approaches alter the political opinions and stances of news articles in more than 50% of summaries, misrepresenting the intent and perspectives of the news authors. We thus propose P^3Sum, a diffusion model-based summarization approach controlled by political perspective classifiers. In P^3Sum, the political leaning of a generated summary is iteratively evaluated at each decoding step, and any drift from the article's original stance incurs a loss back-propagated to the embedding layers, steering the political stance of the summary at inference time. Extensive experiments on three news summarization datasets demonstrate that P^3Sum outperforms state-of-the-art summarization systems and large language models by up to 11.4% in terms of the success rate of stance preservation, with on-par performance on standard summarization utility metrics. These findings highlight the lacunae that even for state-of-the-art models it is still challenging to preserve author perspectives in news summarization, while P^3Sum presents an important first step towards evaluating and developing summarization systems that are faithful to author intent and perspectives.
DIFFNAT: Improving Diffusion Image Quality Using Natural Image Statistics
Abstract
Diffusion models have advanced generative AI significantly in terms of editing and creating naturalistic images. However, efficiently improving generated image quality is still of paramount interest. In this context, we propose a generic "naturalness" preserving loss function, viz., kurtosis concentration (KC) loss, which can be readily applied to any standard diffusion model pipeline to elevate the image quality. Our motivation stems from the projected kurtosis concentration property of natural images, which states that natural images have nearly constant kurtosis values across different band-pass versions of the image. To retain the "naturalness" of the generated images, we enforce reducing the gap between the highest and lowest kurtosis values across the band-pass versions (e.g., Discrete Wavelet Transform (DWT)) of images. Note that our approach does not require any additional guidance like classifier or classifier-free guidance to improve the image quality. We validate the proposed approach for three diverse tasks, viz., (1) personalized few-shot finetuning using text guidance, (2) unconditional image generation, and (3) image super-resolution. Integrating the proposed KC loss has improved the perceptual quality across all these tasks in terms of both FID, MUSIQ score, and user evaluation.
Scene Text Image Super-resolution based on Text-conditional Diffusion Models
Abstract
Scene Text Image Super-resolution (STISR) has recently achieved great success as a preprocessing method for scene text recognition. STISR aims to transform blurred and noisy low-resolution (LR) text images in real-world settings into clear high-resolution (HR) text images suitable for scene text recognition. In this study, we leverage text-conditional diffusion models (DMs), known for their impressive text-to-image synthesis capabilities, for STISR tasks. Our experimental results revealed that text-conditional DMs notably surpass existing STISR methods. Especially when texts from LR text images are given as input, the text-conditional DMs are able to produce superior quality super-resolution text images. Utilizing this capability, we propose a novel framework for synthesizing LR-HR paired text image datasets. This framework consists of three specialized text-conditional DMs, each dedicated to text image synthesis, super-resolution, and image degradation. These three modules are vital for synthesizing distinct LR and HR paired images, which are more suitable for training STISR methods. Our experiments confirmed that these synthesized image pairs significantly enhance the performance of STISR methods in the TextZoom evaluation.
Diffusion-Augmented Neural Processes
Authors: Lorenzo Bonito, James Requeima, Aliaksandra Shysheya, Richard E. Turner
Abstract
Over the last few years, Neural Processes have become a useful modelling tool in many application areas, such as healthcare and climate sciences, in which data are scarce and prediction uncertainty estimates are indispensable. However, the current state of the art in the field (AR CNPs; Bruinsma et al., 2023) presents a few issues that prevent its widespread deployment. This work proposes an alternative, diffusion-based approach to NPs which, through conditioning on noised datasets, addresses many of these limitations, whilst also exceeding SOTA performance.
DSR-Diff: Depth Map Super-Resolution with Diffusion Model
Authors: Yuan Shi, Bin Xia, Rui Zhu, Qingmin Liao, Wenming Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Color-guided depth map super-resolution (CDSR) improve the spatial resolution of a low-quality depth map with the corresponding high-quality color map, benefiting various applications such as 3D reconstruction, virtual reality, and augmented reality. While conventional CDSR methods typically rely on convolutional neural networks or transformers, diffusion models (DMs) have demonstrated notable effectiveness in high-level vision tasks. In this work, we present a novel CDSR paradigm that utilizes a diffusion model within the latent space to generate guidance for depth map super-resolution. The proposed method comprises a guidance generation network (GGN), a depth map super-resolution network (DSRN), and a guidance recovery network (GRN). The GGN is specifically designed to generate the guidance while managing its compactness. Additionally, we integrate a simple but effective feature fusion module and a transformer-style feature extraction module into the DSRN, enabling it to leverage guided priors in the extraction, fusion, and reconstruction of multi-model images. Taking into account both accuracy and efficiency, our proposed method has shown superior performance in extensive experiments when compared to state-of-the-art methods. Our codes will be made available at https://github.com/shiyuan7/DSR-Diff.
TransFusion -- A Transparency-Based Diffusion Model for Anomaly Detection
Abstract
Surface anomaly detection is a vital component in manufacturing inspection. Reconstructive anomaly detection methods restore the normal appearance of an object, ideally modifying only the anomalous regions. Due to the limitations of commonly used reconstruction architectures, the produced reconstructions are often poor and either still contain anomalies or lack details in anomaly-free regions. Recent reconstructive methods adopt diffusion models, however with the standard diffusion process the problems are not adequately addressed. We propose a novel transparency-based diffusion process, where the transparency of anomalous regions is progressively increased, restoring their normal appearance accurately and maintaining the appearance of anomaly-free regions without loss of detail. We propose TRANSparency DifFUSION (TransFusion), a discriminative anomaly detection method that implements the proposed diffusion process, enabling accurate downstream anomaly detection. TransFusion achieves state-of-the-art performance on both the VisA and the MVTec AD datasets, with an image-level AUROC of 98.5% and 99.2%, respectively.
Keyword: adaptive
Toward Ultra-Low-Power Remote Health Monitoring: An Optimal and Adaptive Compressed Sensing Framework for Activity Recognition
Authors: J. Pagan, R. Fallahzadeh, M. Pedram, José L. Risco-Martín, J. M. Moya, J. L. Ayala, H. Ghasemzadeh
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
Activity recognition, as an important component of behavioral monitoring and intervention, has attracted enormous attention, especially in Mobile Cloud Computing (MCC) and Remote Health Monitoring (RHM) paradigms. While recently resource constrained wearable devices have been gaining popularity, their battery life is limited and constrained by the frequent wireless transmission of data to more computationally powerful back-ends. This paper proposes an ultra-low power activity recognition system using a novel adaptive compressed sensing technique that aims to minimize transmission costs. Coarse-grained on-body sensor localization and unsupervised clustering modules are devised to autonomously reconfigure the compressed sensing module for further power saving. We perform a thorough heuristic optimization using Grammatical Evolution (GE) to ensure minimal computation overhead of the proposed methodology. Our evaluation on a real-world dataset and a low power wearable sensing node demonstrates that our approach can reduce the energy consumption of the wireless data transmission up to $81.2\%$ and $61.5\%$, with up to $60.6\%$ and $35.0\%$ overall power savings in comparison with baseline and a naive state-of-the-art approaches, respectively. These solutions lead to an average activity recognition accuracy of $89.0\%$ -- only $4.8\%$ less than the baseline accuracy -- while having a negligible energy overhead of on-node computation.
Adversarially Robust Spiking Neural Networks Through Conversion
Abstract
Spiking neural networks (SNNs) provide an energy-efficient alternative to a variety of artificial neural network (ANN) based AI applications. As the progress in neuromorphic computing with SNNs expands their use in applications, the problem of adversarial robustness of SNNs becomes more pronounced. To the contrary of the widely explored end-to-end adversarial training based solutions, we address the limited progress in scalable robust SNN training methods by proposing an adversarially robust ANN-to-SNN conversion algorithm. Our method provides an efficient approach to embrace various computationally demanding robust learning objectives that have been proposed for ANNs. During a post-conversion robust finetuning phase, our method adversarially optimizes both layer-wise firing thresholds and synaptic connectivity weights of the SNN to maintain transferred robustness gains from the pre-trained ANN. We perform experimental evaluations in numerous adaptive adversarial settings that account for the spike-based operation dynamics of SNNs, and show that our approach yields a scalable state-of-the-art solution for adversarially robust deep SNNs with low-latency.
Flexible and Adaptive Manufacturing by Complementing Knowledge Representation, Reasoning and Planning with Reinforcement Learning
Abstract
This paper describes a novel approach to adaptive manufacturing in the context of small batch production and customization. It focuses on integrating task-level planning and reasoning with reinforcement learning (RL) in the SkiROS2 skill-based robot control platform. This integration enhances the efficiency and adaptability of robotic systems in manufacturing, enabling them to adjust to task variations and learn from interaction data. The paper highlights the architecture of SkiROS2, particularly its world model, skill libraries, and task management. It demonstrates how combining RL with robotic manipulators can learn and improve the execution of industrial tasks. It advocates a multi-objective learning model that eases the learning problem design. The approach can incorporate user priors or previous experiences to accelerate learning and increase safety. Spotlight video: https://youtu.be/H5PmZl2rRbs?si=8wmZ-gbwuSJRxe3S&t=1422 SkiROS2 code: https://github.com/RVMI/skiros2 SkiROS2 talk at ROSCon: https://vimeo.com/879001825/2a0e9d5412 SkiREIL code: https://github.com/matthias-mayr/SkiREIL
Local spline refinement driven by fault jump estimates for scattered data approximation
Authors: Cesare Bracco, Carlotta Giannelli, Francesco Patrizi, Alessandra Sestini
Abstract
We present a new fault (jump) indicator to guide local refinement in surface approximation schemes with adaptive spline constructions. The proposed approach is based on the idea that, since discontinuities in the data should naturally correspond to sharp variations in the reconstructed surface, the location and size of jumps detected in the input point cloud should drive the mesh refinement algorithm. To exploit the possibility of inserting local meshlines in one or the other coordinate direction, as suggested by the jump estimates, we propose a quasi-interpolation (QI) scheme based on locally refined B-splines (LR B-splines). Particular attention is devoted to the construction of the local operator of the LR B-spline QI scheme, which properly adapts the spline approximation according to the nature and density of the scattered data configuration. A selection of numerical examples outlines the performance of the method on synthetic and real datasets characterized by different geographical features.
Abstract
Synchronous condensers (SCs) play important roles in integrating wind energy into relatively weak power grids. However, the design of SCs usually depends on specific application requirements and may not be adaptive enough to the frequently-changing grid conditions caused by the transition from conventional to renewable power generation. This paper devises a software-defined virtual synchronous condenser (SDViSC) method to address the challenges. Our contributions are fourfold: 1) design of a virtual synchronous condenser (ViSC) to enable full converter wind turbines to provide built-in SC functionalities; 2) engineering SDViSCs to transfer hardware-based ViSC controllers into software services, where a Tustin transformation-based software-defined control algorithm guarantees accurate tracking of fast dynamics under limited communication bandwidth; 3) a software-defined networking-enhanced SDViSC communication scheme to allow enhanced communication reliability and reduced communication bandwidth occupation; and 4) Prototype of SDViSC on our real-time, cyber-in-the-loop digital twin of large-wind-farm in an RTDS environment. Extensive test results validate the excellent performance of SDViSC to support reliable and resilient operations of wind farms under various physical and cyber conditions.
Adaptive Interventions with User-Defined Goals for Health Behavior Change
Authors: Aishwarya Mandyam, Matthew Joerke, Barbara E. Engelhardt, Emma Brunskill
Abstract
Physical inactivity remains a major public health concern, having associations with adverse health outcomes such as cardiovascular disease and type-2 diabetes. Mobile health applications present a promising avenue for low-cost, scalable physical activity promotion, yet often suffer from small effect sizes and low adherence rates, particularly in comparison to human coaching. Goal-setting is a critical component of health coaching that has been underutilized in adaptive algorithms for mobile health interventions. This paper introduces a modification to the Thompson sampling algorithm that places emphasis on individualized goal-setting by optimizing personalized reward functions. As a step towards supporting goal-setting, this paper offers a balanced approach that can leverage shared structure while optimizing individual preferences and goals. We prove that our modification incurs only a constant penalty on the cumulative regret while preserving the sample complexity benefits of data sharing. In a physical activity simulator, we demonstrate that our algorithm achieves substantial improvements in cumulative regret compared to baselines that do not share data or do not optimize for individualized rewards.
Real-Time Adaptive Neural Network on FPGA: Enhancing Adaptability through Dynamic Classifier Selection
Authors: Achraf El Bouazzaoui, Abdelkader Hadjoudja, Omar Mouhib
Abstract
This research studies an adaptive neural network with a Dynamic Classifier Selection framework on Field-Programmable Gate Arrays (FPGAs). The evaluations are conducted across three different datasets. By adjusting parameters, the architecture surpasses all models in the ensemble set in accuracy and shows an improvement of up to 8% compared to a singular neural network implementation. The research also emphasizes considerable resource savings of up to 109.28%, achieved via partial reconfiguration rather than a traditional fixed approach. Such improved efficiency suggests that the architecture is ideal for settings limited by computational capacity, like in edge computing scenarios. The collected data highlights the architecture's two main benefits: high performance and real-world application, signifying a notable input to FPGA-based ensemble learning methods.
FunctionMarker: Watermarking Language Datasets via Knowledge Injection
Abstract
Large Language Models (LLMs) have demonstrated superior performance in various natural language processing tasks. Meanwhile, they require extensive training data, raising concerns related to dataset copyright protection. Backdoor-based watermarking is a viable approach to protect the copyright of classification datasets. However, these methods may introduce malicious misclassification behaviors into watermarked LLMs by attackers and also affect the semantic information of the watermarked text. To address these issues, we propose FunctionMarker, a novel copyright protection method for language datasets via knowledge injection. FunctionMarker enables LLMs to learn specific knowledge through fine-tuning on watermarked datasets, and we can extract the embedded watermark by obtaining the responses of LLMs to specific knowledge-related queries. Considering watermark capacity and stealthness, we select customizable functions as specific knowledge for LLMs to learn and embed the watermark into them. Moreover, FunctionMarker can embed multi-bit watermarks while preserving the original semantic information, thereby increasing the difficulty of adaptive attacks. We take mathematical functions as an instance to evaluate the effectiveness of FunctionMarker, and experiments show that only 0.3% of watermarked text achieves a 90% watermark extraction accuracy in most cases, validating our method's effectiveness.
Scalable and Adaptively Secure Any-Trust Distributed Key Generation and All-hands Checkpointing
Authors: Hanwen Feng, Tiancheng Mai, Qiang Tang
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
The classical distributed key generation protocols (DKG) are resurging due to their widespread applications in blockchain. While efforts have been made to improve DKG communication, practical large scale deployments are still yet to come, due to various challenges including broadcast channel scalability and worst-case complaint phase. In this paper, we propose a practical DKG for DL-based cryptosystems, with only (quasi-)linear computation/communication cost per participant, with the help of a public ledger, and beacon; Notably, our DKG only incurs constant-size blockchain storage cost for broadcast, even in the face of worst-case complaints. Moreover, our protocol satisfies adaptive security. The key to our improvements lies in delegating the most costly operations to an Any-Trust group. This group is randomly sampled and consists of a small number of individuals. The population only trusts that at least one member in the group is honest, without knowing which one. Additionally, we introduce an extended broadcast channel based on a blockchain and data dispersal network (such as IPFS), enabling reliable broadcasting of arbitrary-size messages at the cost of constant-size blockchain storage, which may be of independent interest. Our DKG leads to a fully practical instantiation of Filecoin's checkpointing mechanism, in which all validators of a Proof-of-Stake (PoS) blockcahin periodically run DKG and threshold signing to create checkpoints on Bitcoin, thereby enhancing the security of the PoS chain. In comparison with another checkpointing approach of Babylon (Oakland, 2023), ours enjoys a significally smaller monetary cost of Bitcoin transaction fees. For a PoS chain with $2^{12}$ validators, our cost is merely 0.6\% of that incurred by Babylon's approach.
Evolving Domain Adaptation of Pretrained Language Models for Text Classification
Authors: Yun-Shiuan Chuang, Yi Wu, Dhruv Gupta, Rheeya Uppaal, Ananya Kumar, Luhang Sun, Makesh Narsimhan Sreedhar, Sijia Yang, Timothy T. Rogers, Junjie Hu
Abstract
Adapting pre-trained language models (PLMs) for time-series text classification amidst evolving domain shifts (EDS) is critical for maintaining accuracy in applications like stance detection. This study benchmarks the effectiveness of evolving domain adaptation (EDA) strategies, notably self-training, domain-adversarial training, and domain-adaptive pretraining, with a focus on an incremental self-training method. Our analysis across various datasets reveals that this incremental method excels at adapting PLMs to EDS, outperforming traditional domain adaptation techniques. These findings highlight the importance of continually updating PLMs to ensure their effectiveness in real-world applications, paving the way for future research into PLM robustness against the natural temporal evolution of language.
Gradient-Map-Guided Adaptive Domain Generalization for Cross Modality MRI Segmentation
Authors: Bingnan Li, Zhitong Gao, Xuming He
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Cross-modal MRI segmentation is of great value for computer-aided medical diagnosis, enabling flexible data acquisition and model generalization. However, most existing methods have difficulty in handling local variations in domain shift and typically require a significant amount of data for training, which hinders their usage in practice. To address these problems, we propose a novel adaptive domain generalization framework, which integrates a learning-free cross-domain representation based on image gradient maps and a class prior-informed test-time adaptation strategy for mitigating local domain shift. We validate our approach on two multi-modal MRI datasets with six cross-modal segmentation tasks. Across all the task settings, our method consistently outperforms competing approaches and shows a stable performance even with limited training data.
MEGA: A Memory-Efficient GNN Accelerator Exploiting Degree-Aware Mixed-Precision Quantization
Abstract
Graph Neural Networks (GNNs) are becoming a promising technique in various domains due to their excellent capabilities in modeling non-Euclidean data. Although a spectrum of accelerators has been proposed to accelerate the inference of GNNs, our analysis demonstrates that the latency and energy consumption induced by DRAM access still significantly impedes the improvement of performance and energy efficiency. To address this issue, we propose a Memory-Efficient GNN Accelerator (MEGA) through algorithm and hardware co-design in this work. Specifically, at the algorithm level, through an in-depth analysis of the node property, we observe that the data-independent quantization in previous works is not optimal in terms of accuracy and memory efficiency. This motivates us to propose the Degree-Aware mixed-precision quantization method, in which a proper bitwidth is learned and allocated to a node according to its in-degree to compress GNNs as much as possible while maintaining accuracy. At the hardware level, we employ a heterogeneous architecture design in which the aggregation and combination phases are implemented separately with different dataflows. In order to boost the performance and energy efficiency, we also present an Adaptive-Package format to alleviate the storage overhead caused by the fine-grained bitwidth and diverse sparsity, and a Condense-Edge scheduling method to enhance the data locality and further alleviate the access irregularity induced by the extremely sparse adjacency matrix in the graph. We implement our MEGA accelerator in a 28nm technology node. Extensive experiments demonstrate that MEGA can achieve an average speedup of 38.3x, 7.1x, 4.0x, 3.6x and 47.6x, 7.2x, 5.4x, 4.5x energy savings over four state-of-the-art GNN accelerators, HyGCN, GCNAX, GROW, and SGCN, respectively, while retaining task accuracy.
Fuel Saving Effect and Performance of Velocity Control for Modern Combustion-Powered Scooters
Abstract
This paper investigates the performance and fuel-saving effect of a velocity control algorithm on modern 50 cc scooters (Euro 5). The European Parliament has adopted major CO$_2$ emission reductions by 2030. But modern combustion- powered scooters are inefficiently restricted and emit unnecessary amounts of CO$_2$. Replacing the original restriction method with the system presented in this paper, the engine's operating point is being improved significantly. Therefore, a Throttle-by-Wire-System senses the rider's throttle command and manipulates the throttle valve. A redundant wheel speed sensor measures the precise vehicle velocity using the magneto-resistive principle. The entire system is managed by a central ECU, executing the actual velocity control, fail-safe functions, power supply and handling inputs/outputs. For velocity control, an adaptive PI-controller has been simulated, virtually tuned and implemented, limiting the max. velocity regulated by legal constraints (45 km/h). In this way, the environmentally harmful restrictors used today can be bypassed. By implementing a human-machine interface, including a virtual dashboard, the system is capable of interfacing with the rider. For evaluation purposes a measurement box has been developed, logging vehicle orientation, system/control variables and engine parameters. A Peugeot Kisbee 50 4T (Euro 5) is serving as test vehicle. Finally, the system has been evaluated regarding performance and fuel efficiency both through simulation and road testing. Fuel savings of 13.6 % in real-world test scenarios were achieved while maintaining vehicle performance.
LIO-EKF: High Frequency LiDAR-Inertial Odometry using Extended Kalman Filters
Authors: Yibin Wu, Tiziano Guadagnino, Louis Wiesmann, Lasse Klingbeil, Cyrill Stachniss, Heiner Kuhlmann
Abstract
Odometry estimation is a key element for every autonomous system requiring navigation in an unknown environment. In modern mobile robots, 3D LiDAR-inertial systems are often used for this task. By fusing LiDAR scans and IMU measurements, these systems can reduce the accumulated drift caused by sequentially registering individual LiDAR scans and provide a robust pose estimate. Although effective, LiDAR-inertial odometry systems require proper parameter tuning to be deployed. In this paper, we propose LIO-EKF, a tightly-coupled LiDAR-inertial odometry system based on point-to-point registration and the classical extended Kalman filter scheme. We propose an adaptive data association that considers the relative pose uncertainty, the map discretization errors, and the LiDAR noise. In this way, we can substantially reduce the parameters to tune for a given type of environment. The experimental evaluation suggests that the proposed system performs on par with the state-of-the-art LiDAR-inertial odometry pipelines, but is significantly faster in computing the odometry.
From Pretext to Purpose: Batch-Adaptive Self-Supervised Learning
Authors: Jiansong Zhang, Peizhong Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In recent years, self-supervised contrastive learning has emerged as a distinguished paradigm in the artificial intelligence landscape. It facilitates unsupervised feature learning through contrastive delineations at the instance level. However, crafting an effective self-supervised paradigm remains a pivotal challenge within this field. This paper delves into two crucial factors impacting self-supervised contrastive learning-bach size and pretext tasks, and from a data processing standpoint, proposes an adaptive technique of batch fusion. The proposed method, via dimensionality reduction and reconstruction of batch data, enables formerly isolated individual data to partake in intra-batch communication through the Embedding Layer. Moreover, it adaptively amplifies the self-supervised feature encoding capability as the training progresses. We conducted a linear classification test of this method based on the classic contrastive learning framework on ImageNet-1k. The empirical findings illustrate that our approach achieves state-of-the-art performance under equitable comparisons. Benefiting from its "plug-and-play" characteristics, we further explored other contrastive learning methods. On the ImageNet-100, compared to the original performance, the top1 has seen a maximum increase of 1.25%. We suggest that the proposed method may contribute to the advancement of data-driven self-supervised learning research, bringing a fresh perspective to this community.
Keyword: quantization
A Speed Odyssey for Deployable Quantization of LLMs
Authors: Qingyuan Li, Ran Meng, Yiduo Li, Bo Zhang, Liang Li, Yifan Lu, Xiangxiang Chu, Yerui Sun, Yuchen Xie
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Abstract
The large language model era urges faster and less costly inference. Prior model compression works on LLMs tend to undertake a software-centric approach primarily focused on the simulated quantization performance. By neglecting the feasibility of deployment, these approaches are typically disabled in real practice. They used to drastically push down the quantization bit range for a reduced computation which might not be supported by the mainstream hardware, or involve sophisticated algorithms that introduce extra computation or memory access overhead. We argue that pursuing a hardware-centric approach in the construction of quantization algorithms is crucial. In this regard, we are driven to build our compression method on top of hardware awareness, eliminating impractical algorithm choices while maximizing the benefit of hardware acceleration. Our method, OdysseyLLM, comes with a novel W4A8 kernel implementation called FastGEMM and a combined recipe of quantization strategies. Extensive experiments manifest the superiority of our W4A8 method which brings the actual speed boosting up to \textbf{4$\times$} compared to Hugging Face FP16 inference and \textbf{2.23$\times$} vs. the state-of-the-art inference engine TensorRT-LLM in FP16, and \textbf{1.45$\times$} vs. TensorRT-LLM in INT8, yet without substantially harming the performance.
How Does Calibration Data Affect the Post-training Pruning and Quantization of Large Language Models?
Abstract
Pruning and quantization form the foundation of model compression for neural networks, enabling efficient inference for large language models (LLMs). Recently, various quantization and pruning techniques have demonstrated state-of-the-art performance in a post-training setting. They rely upon calibration data, a small set of unlabeled examples, to generate layer activations. However, no prior work has systematically investigated how the calibration data impacts the effectiveness of model compression methods. In this paper, we present the first extensive empirical study on the effect of calibration data upon LLM performance. We trial a variety of pruning and quantization methods, tasks, models, and datasets. Surprisingly, we find substantial variations in downstream task performance, contrasting existing work that suggests a greater level of robustness to the calibration data. Finally, we make a series of recommendations for the effective use of calibration data in LLM quantization and pruning.
MEGA: A Memory-Efficient GNN Accelerator Exploiting Degree-Aware Mixed-Precision Quantization
Abstract
Graph Neural Networks (GNNs) are becoming a promising technique in various domains due to their excellent capabilities in modeling non-Euclidean data. Although a spectrum of accelerators has been proposed to accelerate the inference of GNNs, our analysis demonstrates that the latency and energy consumption induced by DRAM access still significantly impedes the improvement of performance and energy efficiency. To address this issue, we propose a Memory-Efficient GNN Accelerator (MEGA) through algorithm and hardware co-design in this work. Specifically, at the algorithm level, through an in-depth analysis of the node property, we observe that the data-independent quantization in previous works is not optimal in terms of accuracy and memory efficiency. This motivates us to propose the Degree-Aware mixed-precision quantization method, in which a proper bitwidth is learned and allocated to a node according to its in-degree to compress GNNs as much as possible while maintaining accuracy. At the hardware level, we employ a heterogeneous architecture design in which the aggregation and combination phases are implemented separately with different dataflows. In order to boost the performance and energy efficiency, we also present an Adaptive-Package format to alleviate the storage overhead caused by the fine-grained bitwidth and diverse sparsity, and a Condense-Edge scheduling method to enhance the data locality and further alleviate the access irregularity induced by the extremely sparse adjacency matrix in the graph. We implement our MEGA accelerator in a 28nm technology node. Extensive experiments demonstrate that MEGA can achieve an average speedup of 38.3x, 7.1x, 4.0x, 3.6x and 47.6x, 7.2x, 5.4x, 4.5x energy savings over four state-of-the-art GNN accelerators, HyGCN, GCNAX, GROW, and SGCN, respectively, while retaining task accuracy.
Keyword: efficient
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
Auto-ICL: In-Context Learning without Human Supervision
Adversarially Robust Spiking Neural Networks Through Conversion
Linear time Evidence Accumulation Clustering with KMeans
Lighter, yet More Faithful: Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization
Language and Task Arithmetic with Parameter-Efficient Layers for Zero-Shot Summarization
DISTA: Denoising Spiking Transformer with intrinsic plasticity and spatiotemporal attention
A Characteristic Mapping Method for Vlasov-Poisson with Extreme Resolution Properties
DeepMartNet -- A Martingale Based Deep Neural Network Learning Method for Dirichlet BVP and Eigenvalue Problems of Elliptic PDEs
Near-Optimal Streaming Ellipsoidal Rounding for General Convex Polytopes
A new stabilized time-spectral finite element solver for fast simulation of blood flow
Center Focusing Network for Real-Time LiDAR Panoptic Segmentation
SQATIN: Supervised Instruction Tuning Meets Question Answering for Improved Dialogue NLU
LightEMU: Hardware Assisted Fuzzing of Trusted Applications
Scaling User Modeling: Large-scale Online User Representations for Ads Personalization in Meta
Accelerating material discovery with a threshold-driven hybrid acquisition policy-based Bayesian optimization
Comprehensive Evaluation and Insights into the Use of Deep Neural Networks to Detect and Quantify Lymphoma Lesions in PET/CT Images
easy'' vs.
hard'' cases, to assist in the development of more resilient segmentation algorithms. Finally, we performed inter-observer agreement assessment underscoring the importance of a standardized ground truth segmentation protocol involving multiple expert annotators. Code is available at: https://github.com/microsoft/lymphoma-segmentation-dnnReconstructing Continuous Light Field From Single Coded Image
Zenkai -- Framework For Exploring Beyond Backpropagation
MacGyver: Are Large Language Models Creative Problem Solvers?
CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs
Outcome-supervised Verifiers for Planning in Mathematical Reasoning
Redefining the Laparoscopic Spatial Sense: AI-based Intra- and Postoperative Measurement from Stereoimages
DIFFNAT: Improving Diffusion Image Quality Using Natural Image Statistics
How Does Calibration Data Affect the Post-training Pruning and Quantization of Large Language Models?
Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in Dense Encoders
To be or not to be? an exploration of continuously controllable prompt engineering
MEGA: A Memory-Efficient GNN Accelerator Exploiting Degree-Aware Mixed-Precision Quantization
Model Checking for Closed-Loop Robot Reactive Planning
How Far Can We Extract Diverse Perspectives from Large Language Models? Criteria-Based Diversity Prompting!
EvaSurf: Efficient View-Aware Implicit Textured Surface Reconstruction on Mobile Devices
PixT3: Pixel-based Table To Text generation
Runtime Verification of Learning Properties for Reinforcement Learning Algorithms
Stacked Intelligent Metasurface-Aided MIMO Transceiver Design
Semantic-Relay-Aided Text Transmission: Placement Optimization and Bandwidth Allocation
Contribution Evaluation in Federated Learning: Examining Current Approaches
Fuel Saving Effect and Performance of Velocity Control for Modern Combustion-Powered Scooters
The Software Genome Project: Venture to the Genomic Pathways of Open Source Software and Its Applications
A Physics-Informed Neural Network approach for compartmental epidemiological models
Match and Locate: low-frequency monocular odometry based on deep feature matching
A characterization of efficiently compilable constraint languages
strong blockwise decomposability'' and show that if a constraint language has this property, we can compute DNNF representations of linear size. For all other constraint languages we construct families of CSP-instances that provably require DNNFs of exponential size. For a subclass of
strong uniformly blockwise decomposable'' constraint languages we obtain a similar dichotomy for structured DNNFs. In fact, strong (uniform) blockwise decomposability even allows efficient compilation into multi-valued analogs of OBDDs and FBDDs, respectively. Thus, we get complete characterizations for all knowledge compilation classes between O(B)DDs and DNNFs.Visual Environment Assessment for Safe Autonomous Quadrotor Landing
A Computationally Efficient Sparsified Online Newton Method
JaxMARL: Multi-Agent RL Environments in JAX
Adaptive Shells for Efficient Neural Radiance Field Rendering
Keyword: faster
Improved Sparse Ising Optimization
A new stabilized time-spectral finite element solver for fast simulation of blood flow
Think While You Write: Hypothesis Verification Promotes Faithful Knowledge-to-Text Generation
Center Focusing Network for Real-Time LiDAR Panoptic Segmentation
Future Full-Ocean Deep SSPs Prediction based on Hierarchical Long Short-Term Memory Neural Networks
A Speed Odyssey for Deployable Quantization of LLMs
GistScore: Learning Better Representations for In-Context Example Selection with Gist Bottlenecks
SynDiffix: More accurate synthetic structured data
LIO-EKF: High Frequency LiDAR-Inertial Odometry using Extended Kalman Filters
Fast multiplication by two's complement addition of numbers represented as a set of polynomial radix 2 indexes, stored as an integer list for massively parallel computation
A Computationally Efficient Sparsified Online Newton Method
JaxMARL: Multi-Agent RL Environments in JAX
Keyword: mobile
Toward Ultra-Low-Power Remote Health Monitoring: An Optimal and Adaptive Compressed Sensing Framework for Activity Recognition
FedCode: Communication-Efficient Federated Learning via Transferring Codebooks
GWP-ASan: Sampling-Based Detection of Memory-Safety Bugs in Production
Adaptive Interventions with User-Defined Goals for Health Behavior Change
Modelling daily mobility using mobile data traffic at fine spatiotemporal scale
Learning effects in variable autonomy human-robot systems: how much training is enough?
EvaSurf: Efficient View-Aware Implicit Textured Surface Reconstruction on Mobile Devices
Semantic-Relay-Aided Text Transmission: Placement Optimization and Bandwidth Allocation
LIO-EKF: High Frequency LiDAR-Inertial Odometry using Extended Kalman Filters
Dynamic modeling of wing-assisted inclined running with a morphing multi-modal robot
Keyword: pruning
FedCode: Communication-Efficient Federated Learning via Transferring Codebooks
Lighter, yet More Faithful: Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization
Investigating the Impact of Weight Sharing Decisions on Knowledge Transfer in Continual Learning
SparseAuto: An Auto-Scheduler for Sparse Tensor Computations Using Recursive Loop Nest Restructuring
CRISPR: Eliminating Bias Neurons from an Instruction-following Language Model
How Does Calibration Data Affect the Post-training Pruning and Quantization of Large Language Models?
Polynomially Over-Parameterized Convolutional Neural Networks Contain Structured Strong Winning Lottery Tickets
Keyword: diffusion
Scalable Diffusion for Materials Generation
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
Disentangling the Potential Impacts of Papers into Diffusion, Conformity, and Contribution Values
FastBlend: a Powerful Model-Free Toolkit Making Video Stylization Easier
Generative AI-Based Probabilistic Constellation Shaping With Diffusion Models
denoise-and-generate'' characteristics of denoising diffusion probabilistic models (DDPM) for probabilistic constellation shaping. The key idea is to learn generating constellation symbols out of noise,
mimicking'' the way the receiver performs symbol reconstruction. This way, we make the constellation symbols sent by the transmitter, and what is inferred (reconstructed) at the receiver become as similar as possible, resulting in as few mismatches as possible. Our results show that the generative AI-based scheme outperforms deep neural network (DNN)-based benchmark and uniform shaping, while providing network resilience as well as robust out-of-distribution performance under low-SNR regimes and non-Gaussian assumptions. Numerical evaluations highlight 30% improvement in terms of cosine similarity and a threefold improvement in terms of mutual information compared to DNN-based approach for 64-QAM geometry.Privacy Threats in Stable Diffusion Models
Synthetically Enhanced: Unveiling Synthetic Data's Potential in Medical Imaging Research
MDFL: Multi-domain Diffusion-driven Feature Learning
3D Paintbrush: Local Stylization of 3D Shapes with Cascaded Score Distillation
DECDM: Document Enhancement using Cycle-Consistent Diffusion Models
What Constitutes a Faithful Summary? Preserving Author Perspectives in News Summarization
DIFFNAT: Improving Diffusion Image Quality Using Natural Image Statistics
Scene Text Image Super-resolution based on Text-conditional Diffusion Models
Diffusion-Augmented Neural Processes
DSR-Diff: Depth Map Super-Resolution with Diffusion Model
TransFusion -- A Transparency-Based Diffusion Model for Anomaly Detection
Keyword: adaptive
Toward Ultra-Low-Power Remote Health Monitoring: An Optimal and Adaptive Compressed Sensing Framework for Activity Recognition
Adversarially Robust Spiking Neural Networks Through Conversion
Flexible and Adaptive Manufacturing by Complementing Knowledge Representation, Reasoning and Planning with Reinforcement Learning
Local spline refinement driven by fault jump estimates for scattered data approximation
Software-Defined Virtual Synchronous Condenser
Adaptive Interventions with User-Defined Goals for Health Behavior Change
Real-Time Adaptive Neural Network on FPGA: Enhancing Adaptability through Dynamic Classifier Selection
FunctionMarker: Watermarking Language Datasets via Knowledge Injection
Scalable and Adaptively Secure Any-Trust Distributed Key Generation and All-hands Checkpointing
Evolving Domain Adaptation of Pretrained Language Models for Text Classification
Gradient-Map-Guided Adaptive Domain Generalization for Cross Modality MRI Segmentation
MEGA: A Memory-Efficient GNN Accelerator Exploiting Degree-Aware Mixed-Precision Quantization
Fuel Saving Effect and Performance of Velocity Control for Modern Combustion-Powered Scooters
LIO-EKF: High Frequency LiDAR-Inertial Odometry using Extended Kalman Filters
From Pretext to Purpose: Batch-Adaptive Self-Supervised Learning
Keyword: quantization
A Speed Odyssey for Deployable Quantization of LLMs
How Does Calibration Data Affect the Post-training Pruning and Quantization of Large Language Models?
MEGA: A Memory-Efficient GNN Accelerator Exploiting Degree-Aware Mixed-Precision Quantization