Abstract
Stochastic computing is a paradigm in which logical operations are performed on randomly generated bit streams. Complex arithmetic operations can be executed by simple logic circuits, resulting in a much smaller area footprint compared to conventional binary counterparts. However, the random or pseudorandom sources required for generating the bit streams are costly in terms of area and offset the advantages. Additionally, due to the inherent randomness, the computation lacks precision, limiting the applicability of this paradigm. Importantly, achieving reasonable accuracy in stochastic computing involves high latency. Recently, deterministic approaches to stochastic computing have been proposed, demonstrating that randomness is not a requirement. By structuring the computation deterministically, exact results can be obtained, and the latency greatly reduced. The bit stream generated adheres to a "unary" encoding, retaining the non-positional nature of the bits while discarding the random bit generation of traditional stochastic computing. This deterministic approach overcomes many drawbacks of stochastic computing, although the latency increases quadratically with each level of logic, becoming unmanageable beyond a few levels. In this paper, we present a method for approximating the results of the deterministic method while maintaining low latency at each level. This improvement comes at the cost of additional logic, but we demonstrate that the increase in area scales with the square root of n, where n represents the equivalent number of binary bits of precision. Our new approach is general, efficient, composable, and applicable to all arithmetic operations performed with stochastic logic. We show that this approach outperforms other stochastic designs for matrix multiplication (dot-product), which is an integral step in nearly all machine learning algorithms.
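The core idea above can be illustrated with a small sketch (illustrative, not code from the paper; the stream length and the exhaustive bit-pairing scheme are assumptions): a single AND gate multiplies two values encoded as the fraction of 1s in a bit stream, and the deterministic unary encoding makes the result exact at the cost of quadratic latency in the stream length.

```python
def unary_stream(value, length):
    """Deterministic 'unary' encoding: value*length ones followed by zeros."""
    ones = round(value * length)
    return [1] * ones + [0] * (length - ones)

def stochastic_multiply(a, b, length=16):
    """Multiply two values in [0, 1] exactly: pair every bit of one unary
    stream with every bit of the other (the deterministic scheme), then
    AND the pairs. Note the output stream has length**2 bits -- this is
    the quadratic latency the abstract refers to."""
    sa = unary_stream(a, length)
    sb = unary_stream(b, length)
    out = [x & y for x in sa for y in sb]   # one AND gate, length**2 cycles
    return sum(out) / len(out)

print(stochastic_multiply(0.5, 0.25))  # exact: 0.125
```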
Joint Computing Offloading and Resource Allocation for Classification Intelligent Tasks in MEC Systems
Abstract
Mobile edge computing (MEC) enables low-latency and high-bandwidth applications by bringing computation and data storage closer to end-users. Intelligent computing is an important application of MEC, where computing resources are used to solve intelligent task-related problems based on task requirements. However, efficiently offloading computing and allocating resources for intelligent tasks in MEC systems is a challenging problem due to complex interactions between task requirements and MEC resources. To address this challenge, we investigate joint computing offloading and resource allocation for intelligent tasks in MEC systems. Our goal is to optimize system utility by jointly considering computing accuracy and task delay to achieve maximum system performance. We focus on classification intelligent tasks and formulate an optimization problem that considers both the accuracy requirements of tasks and the parallel computing capabilities of MEC systems. To solve the optimization problem, we decompose it into three subproblems: subcarrier allocation, computing capacity allocation, and compression offloading. We use convex optimization and successive convex approximation to derive closed-form expressions for the subcarrier allocation, offloading decisions, computing capacity, and compression ratio. Based on our solutions, we design an efficient computing offloading and resource allocation algorithm for intelligent tasks in MEC systems. Our simulation results demonstrate that, compared with the benchmarks, our proposed algorithm significantly improves the performance of intelligent tasks in MEC systems and achieves a flexible trade-off between system revenue and cost.
Sparse Graphical Linear Dynamical Systems
Authors: Emilie Chouzenoux, Victor Elvira
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Computation (stat.CO)
Abstract
Time-series datasets are central in numerous fields of science and engineering, such as biomedicine, Earth observation, and network analysis. Extensive research exists on state-space models (SSMs), which are powerful mathematical tools that allow for probabilistic and interpretable learning on time series. Estimating the model parameters in SSMs is arguably one of the most complicated tasks, and the inclusion of prior knowledge is known both to ease interpretation and to complicate the inferential tasks. Very recent works have attempted to incorporate a graphical perspective on some of those model parameters, but they present notable limitations that this work addresses. More generally, existing graphical modeling tools are designed to incorporate either static information, focusing on statistical dependencies among independent random variables (e.g., graphical Lasso approach), or dynamic information, emphasizing causal relationships among time series samples (e.g., graphical Granger approaches). However, there are no joint approaches combining static and dynamic graphical modeling within the context of SSMs. This work proposes a novel approach to fill this gap by introducing a joint graphical modeling framework that bridges the static graphical Lasso model and a causal-based graphical approach for the linear-Gaussian SSM. We present DGLASSO (Dynamic Graphical Lasso), a new inference method within this framework that implements an efficient block alternating majorization-minimization algorithm. The algorithm's convergence is established by building on modern tools from nonlinear analysis. Experimental validation on synthetic and real weather variability data showcases the effectiveness of the proposed model and inference algorithm.
LFA-tuned matrix-free multigrid method for the elastic Helmholtz equation
Abstract
We present an efficient matrix-free geometric multigrid method for the elastic Helmholtz equation, together with a suitable discretization. Many discretization methods have been considered in the literature for the Helmholtz equation, as well as many solvers and preconditioners, some of which have been adapted for the elastic version of the equation. However, there is very little work considering the interplay between the discretization and the solver. In this work, we aim to bridge this gap. By choosing an appropriate stencil for re-discretization of the equation on the coarse grid, we develop a multigrid method that can be easily implemented as matrix-free, relying on stencils rather than sparse matrices. This is crucial for efficient implementation on modern hardware. Using two-grid local Fourier analysis, we validate the compatibility of our discretization with our solver, and tune the stencil weights so that the convergence rate of the multigrid cycle is optimal. The result is a scalable multigrid preconditioner that can tackle large real-world 3D scenarios.
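The matrix-free idea — applying an operator through its stencil rather than a stored sparse matrix — can be sketched in a simple 1D setting (this illustrates the general technique only; it is not the paper's elastic Helmholtz stencil, whose weights are tuned via local Fourier analysis):

```python
import numpy as np

def apply_stencil_1d(u, stencil=(-1.0, 2.0, -1.0)):
    """Matrix-free application of a 3-point stencil (here the standard 1D
    negative Laplacian with homogeneous Dirichlet boundaries). No sparse
    matrix is ever assembled: only the three stencil weights are stored."""
    left, center, right = stencil
    v = center * u.copy()
    v[1:] += left * u[:-1]    # contribution from the left neighbor
    v[:-1] += right * u[1:]   # contribution from the right neighbor
    return v
```

The same vector of outputs would come from multiplying by the tridiagonal matrix with diagonals (-1, 2, -1); the stencil form avoids storing it, which is what makes coarse-grid re-discretization by stencils attractive on modern hardware.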
Recovery of Multiple Parameters in Subdiffusion from One Lateral Boundary Measurement
Abstract
This work is concerned with numerically recovering multiple parameters simultaneously in the subdiffusion model from a single lateral measurement on a part of the boundary, in an incompletely known medium. We prove that the boundary measurement corresponding to a fairly general boundary excitation uniquely determines the order of the fractional derivative and the polygonal support of the diffusion coefficient, without knowing either the initial condition or the source. The uniqueness analysis further inspires the development of a robust numerical algorithm for recovering the fractional order and the diffusion coefficient. The proposed algorithm combines a small-time asymptotic expansion, analytic continuation of the solution, and the level set method. We present extensive numerical experiments to illustrate the feasibility of the simultaneous recovery. In addition, we discuss the uniqueness of recovering general diffusion and potential coefficients from a single partial boundary measurement, when the boundary excitation is more specialized.
ADASSM: Adversarial Data Augmentation in Statistical Shape Models From Images
Authors: Mokshagna Sai Teja Karanam, Tushar Kataria, Shireen Elhabian
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Statistical shape models (SSM) have been well-established as an excellent tool for identifying variations in the morphology of anatomy across the underlying population. Shape models use a consistent shape representation across all the samples in a given cohort, which helps to compare shapes and identify the variations that can detect pathologies and help in formulating treatment plans. In medical imaging, computing these shape representations from CT/MRI scans requires time-intensive preprocessing operations, including but not limited to anatomy segmentation annotations, registration, and texture denoising. Deep learning models have demonstrated exceptional capabilities in learning shape representations directly from volumetric images, giving rise to highly effective and efficient Image-to-SSM networks. Nevertheless, these models are data-hungry, and due to the limited availability of medical data, deep learning models tend to overfit. Offline data augmentation techniques, which use kernel density estimation (KDE) based methods for generating shape-augmented samples, have successfully aided Image-to-SSM networks in achieving accuracy comparable to traditional SSM methods. However, these augmentation methods focus on shape augmentation, whereas deep learning models exhibit a texture bias that results in sub-optimal models. This paper introduces a novel strategy for on-the-fly data augmentation for the Image-to-SSM framework by leveraging data-dependent noise generation, or texture augmentation. The proposed framework is trained as an adversary to the Image-to-SSM network, generating diverse and challenging noisy samples. Our approach achieves improved accuracy by encouraging the model to focus on the underlying geometry rather than relying solely on pixel values.
Physics-Infused Machine Learning Based Prediction of VTOL Aerodynamics with Sparse Datasets
Authors: Manaswin Oddiraju, Divyang Amin, Michael Piedmonte, Souma Chowdhury
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Abstract
Complex optimal design and control processes often require repeated evaluations of expensive objective functions over large design spaces. Data-driven surrogates such as neural networks and Gaussian processes provide an attractive alternative to simulations and are frequently used to represent these objective functions in optimization. However, pure data-driven models, due to a lack of adherence to basic physics laws and constraints, are often poor at generalizing and extrapolating. This is particularly the case when training occurs over sparse high-fidelity datasets. A class of physics-infused machine learning (PIML) models integrates ML models with low-fidelity partial physics models to improve generalization performance while retaining computational efficiency. This paper presents two potential approaches for physics-infused modelling of aircraft aerodynamics, which combine artificial neural networks with a low-fidelity Vortex Lattice Method model with blown wing effects (BLOFI), to improve prediction performance while keeping the computational cost tractable. This paper also develops an end-to-end auto-differentiable open-source framework that enables efficient training of such hybrid models. These two PIML modelling approaches are then used to predict the aerodynamic coefficients of a six-rotor eVTOL aircraft given its control parameters and flight conditions. The models are trained on a sparse high-fidelity dataset generated using a CHARM model. The trained models are then compared against the vanilla low-fidelity model and a standard pure data-driven ANN. Our results show that one of the proposed architectures outperforms all the other models at a nominal increase in run time. These results are promising and pave the way for PIML frameworks that can generalize over different aircraft and configurations, thereby significantly reducing the costs of design and control.
Efficient parallel implementation of the multiplicative weight update method for graph-based linear programs
Abstract
Positive linear programs (LPs) model many graph and operations research problems. One can compute a $(1+\epsilon)$-approximation for positive LPs, for any selected $\epsilon$, in polylogarithmic depth and near-linear work via variations of the multiplicative weight update (MWU) method. Despite extensive theoretical work on these algorithms through the decades, their empirical performance is not well understood. In this work, we implement and test an efficient parallel algorithm for solving positive LP relaxations, and apply it to graph problems such as densest subgraph, bipartite matching, vertex cover, and dominating set. We accelerate the algorithm via a new step-size search heuristic. Our implementation uses sparse linear algebra optimization techniques such as fusion of vector operations and use of sparse formats. Furthermore, we devise an implicit representation for graph incidence constraints. We demonstrate parallel scalability using OpenMP threading and MPI on the Stampede2 supercomputer. We compare this implementation with exact and specialized libraries for the above problems in order to evaluate MWU's practical standing in both accuracy and performance. Our results show this implementation is faster than general-purpose LP solvers (IBM CPLEX, Gurobi) in all of our experiments, and in some instances outperforms state-of-the-art specialized parallel graph algorithms.
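The MWU primitive underlying such solvers is simple to state. The sketch below is the generic experts-style update with the classic regret bound, not the paper's LP solver or its step-size heuristic:

```python
import numpy as np

def mwu(loss_matrix, epsilon=0.1):
    """Multiplicative weight update over n 'experts' for T rounds.
    loss_matrix[t, i] is the loss of expert i at round t, in [0, 1].
    Returns the algorithm's total expected loss, which is guaranteed to be
    at most (ln n)/epsilon more than (1+epsilon) times the best expert's loss."""
    T, n = loss_matrix.shape
    w = np.ones(n)
    total = 0.0
    for t in range(T):
        p = w / w.sum()                            # play the normalized weights
        total += float(p @ loss_matrix[t])         # expected loss this round
        w *= (1.0 - epsilon) ** loss_matrix[t]     # penalize lossy experts
    return total
```

In LP solvers, the "experts" correspond to constraints, and the losses measure constraint slack; the update concentrates weight on the tightest constraints.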
Deciphering the Drivers of Smart Livestock Technology Adoption in Japan: A Scoping Review, Expert Interviews, and Grounded Theory Approach
Abstract
With global demand for animal products projected to increase significantly by 2050, understanding the factors that influence the adoption of smart livestock technologies has become increasingly crucial. Conducted within the unique agricultural context of Japan, our study builds upon traditional theoretical frameworks that often oversimplify farmers' decision-making processes. By employing a scoping review, expert interviews, and a Modified Grounded Theory Approach, our research uncovers the intricate interplay between individual farmer values, farm management policies, social relations, agricultural policies, and livestock industry trends. We particularly highlight the unique dynamics within family-owned businesses, noting the tension between an "advanced management mindset" and "conservatism." Our study underscores technology adoption's sequential and iterative nature, intricately tied to technology availability, farmers' digital literacy, technology implementation support, and observable technology impacts on animal health and productivity. Despite certain limitations, our findings carry profound implications for stakeholders, providing valuable insights to overcome adoption barriers and advocating for more sustainable, efficient, and animal welfare-oriented livestock production systems. This research establishes a solid foundation for future explorations into smart livestock technology adoption.
Point spread function approximation of high rank Hessians with locally supported non-negative integral kernels
Authors: Nick Alger, Tucker Hartland, Noemi Petra, Omar Ghattas
Abstract
We present an efficient matrix-free point spread function (PSF) method for approximating operators that have locally supported non-negative integral kernels. The method computes impulse responses of the operator at scattered points, and interpolates these impulse responses to approximate integral kernel entries. Impulse responses are computed by applying the operator to Dirac comb batches of point sources, which are chosen by solving an ellipsoid packing problem. Evaluation of kernel entries allows us to construct a hierarchical matrix (H-matrix) approximation of the operator. Further matrix computations are performed with H-matrix methods. We use the method to build preconditioners for the Hessian operator in two inverse problems governed by partial differential equations (PDEs): inversion for the basal friction coefficient in an ice sheet flow problem and for the initial condition in an advective-diffusive transport problem. While for many ill-posed inverse problems the Hessian of the data misfit term exhibits a low rank structure, and hence a low rank approximation is suitable, for many problems of practical interest the numerical rank of the Hessian is still large. But Hessian impulse responses typically become more local as the numerical rank increases, which benefits the PSF method. Numerical results reveal that the PSF preconditioner clusters the spectrum of the preconditioned Hessian near one, yielding roughly 5x-10x reductions in the required number of PDE solves, as compared to regularization preconditioning and no preconditioning. We also present a numerical study for the influence of various parameters (that control the shape of the impulse responses) on the effectiveness of the advection-diffusion Hessian approximation. The results show that the PSF-based preconditioners are able to form good approximations of high-rank Hessians using a small number of operator applications.
Distilled Pruning: Using Synthetic Data to Win the Lottery
Abstract
This work introduces a novel approach to pruning deep learning models by using distilled data. Unlike conventional strategies which primarily focus on architectural or algorithmic optimization, our method reconsiders the role of data in these scenarios. Distilled datasets capture essential patterns from larger datasets, and we demonstrate how to leverage this capability to enable a computationally efficient pruning process. Our approach can find sparse, trainable subnetworks (a.k.a. Lottery Tickets) up to 5x faster than Iterative Magnitude Pruning at comparable sparsity on CIFAR-10. The experimental results highlight the potential of using distilled data for resource-efficient neural network pruning, model compression, and neural architecture search.
All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment
Authors: Chunhui Zhang, Xin Sun, Li Liu, Yiqian Yang, Qiong Liu, Xi Zhou, Yanfeng Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
The current mainstream vision-language (VL) tracking framework consists of three parts, i.e., a visual feature extractor, a language feature extractor, and a fusion model. To pursue better performance, a natural modus operandi for VL tracking is to employ customized and heavier unimodal encoders and multi-modal fusion models. Albeit effective, existing VL trackers separate feature extraction and feature integration, resulting in extracted features that lack semantic guidance and have limited target-aware capability in complex scenarios, e.g., similar distractors and extreme illumination. In this work, inspired by the recent success of exploring foundation models with unified architecture for both natural language and computer vision tasks, we propose an All-in-One framework, which learns joint feature extraction and interaction by adopting a unified transformer backbone. Specifically, we mix raw vision and language signals to generate language-injected vision tokens, which we then concatenate before feeding into the unified backbone architecture. This approach achieves feature integration in a unified backbone, removing the need for carefully-designed fusion modules and resulting in a more effective and efficient VL tracking framework. To further improve the learning efficiency, we introduce a multi-modal alignment module based on cross-modal and intra-modal contrastive objectives, providing more reasonable representations for the unified All-in-One transformer backbone. Extensive experiments on five benchmarks, i.e., OTB99-L, TNL2K, LaSOT, LaSOT$_{\rm Ext}$, and WebUAV-3M, demonstrate the superiority of the proposed tracker over existing state-of-the-art methods on VL tracking. Codes will be made publicly available.
Efficient Ground Vehicle Path Following in Game AI
Authors: Rodrigue de Schaetzen, Alessandro Sestini
Abstract
This short paper presents an efficient path-following solution for ground vehicles tailored to game AI. Our focus is on adapting established techniques to design simple solutions with parameters that are easily tunable for an efficient benchmark path follower. Our solution pays particular attention to computing a target speed, using quadratic Bézier curves to estimate the path curvature. The performance of the proposed path follower is evaluated through a variety of test scenarios in a first-person shooter game, demonstrating its effectiveness and robustness in handling different types of paths and vehicles. We achieved a 70% decrease in the total number of stuck events compared to an existing path-following solution.
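The curvature-based target speed can be sketched as follows (a minimal illustration of the idea; the lateral-acceleration speed law and its limits `v_max` and `a_lat_max` are assumptions, not the paper's tuned parameters):

```python
import math

def bezier_curvature(p0, p1, p2, t):
    """Curvature of a quadratic Bezier curve at parameter t (2D points).
    Uses kappa = |B' x B''| / |B'|^3 with the closed-form derivatives."""
    # First derivative: B'(t) = 2((1-t)(P1-P0) + t(P2-P1))
    dx = 2 * ((1 - t) * (p1[0] - p0[0]) + t * (p2[0] - p1[0]))
    dy = 2 * ((1 - t) * (p1[1] - p0[1]) + t * (p2[1] - p1[1]))
    # Second derivative is constant: B''(t) = 2(P2 - 2P1 + P0)
    ddx = 2 * (p2[0] - 2 * p1[0] + p0[0])
    ddy = 2 * (p2[1] - 2 * p1[1] + p0[1])
    cross = dx * ddy - dy * ddx
    speed = math.hypot(dx, dy)
    return abs(cross) / speed ** 3

def target_speed(kappa, v_max=20.0, a_lat_max=5.0):
    """Cap speed so lateral acceleration v^2 * kappa stays within a_lat_max."""
    if kappa < 1e-9:
        return v_max
    return min(v_max, math.sqrt(a_lat_max / kappa))
```

A straight segment (zero curvature) yields the full `v_max`, while tighter bends lower the target speed in proportion to the square root of the allowed lateral acceleration over the curvature.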
On Formal Feature Attribution and Its Approximation
Authors: Jinqiang Yu, Alexey Ignatiev, Peter J. Stuckey
Abstract
Recent years have witnessed the widespread use of artificial intelligence (AI) algorithms and machine learning (ML) models. Despite their tremendous success, a number of vital problems like ML model brittleness, fairness, and the lack of interpretability warrant the need for active development in explainable artificial intelligence (XAI) and formal ML model verification. The two major lines of work in XAI include feature selection methods, e.g., Anchors, and feature attribution techniques, e.g., LIME and SHAP. Despite their promise, most of the existing feature selection and attribution approaches are susceptible to a range of critical issues, including explanation unsoundness and out-of-distribution sampling. A recent formal approach to XAI (FXAI), although serving as an alternative to the above and free of these issues, suffers from a few other limitations. For instance, besides its limited scalability, the formal approach is unable to tackle the feature attribution problem. Additionally, a formal explanation, despite being formally sound, is typically quite large, which hampers its applicability in practical settings. Motivated by the above, this paper proposes a way to apply the apparatus of formal XAI to the case of feature attribution based on formal explanation enumeration. Formal feature attribution (FFA) is argued to be advantageous over the existing methods, both formal and non-formal. Given the practical complexity of the problem, the paper then proposes an efficient technique for approximating exact FFA. Finally, it offers experimental evidence of the effectiveness of the proposed approximate FFA in comparison to the existing feature attribution algorithms, not only in terms of feature importance but also in terms of their relative order.
Teaching Arithmetic to Small Transformers
Authors: Nayoung Lee, Kartik Sreenivasan, Jason D. Lee, Kangwook Lee, Dimitris Papailiopoulos
Abstract
Large language models like GPT-4 exhibit emergent capabilities across general-purpose tasks, such as basic arithmetic, when trained on extensive text data, even though these tasks are not explicitly encoded by the unsupervised, next-token prediction objective. This study investigates how small transformers, trained from random initialization, can efficiently learn arithmetic operations such as addition, multiplication, and elementary functions like square root, using the next-token prediction objective. We first demonstrate that conventional training data is not the most effective for arithmetic learning, and simple formatting changes can significantly improve accuracy. This leads to sharp phase transitions as a function of training data scale, which, in some cases, can be explained through connections to low-rank matrix completion. Building on prior work, we then train on chain-of-thought style data that includes intermediate step results. Even in the complete absence of pretraining, this approach significantly and simultaneously improves accuracy, sample complexity, and convergence speed. We also study the interplay between arithmetic and text data during training and examine the effects of few-shot prompting, pretraining, and model scale. Additionally, we discuss length generalization challenges. Our work highlights the importance of high-quality, instructive data that considers the particular characteristics of the next-word prediction objective for rapidly eliciting arithmetic capabilities.
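The kind of formatting change studied can be illustrated with a toy example (a sketch of the idea of emitting the sum's least significant digit first; the exact sample templates used in the paper may differ):

```python
def plain_sample(a, b):
    """Conventional format: the most significant digit of the sum is emitted
    first, even though it depends on carries from digits generated last."""
    return f"{a}+{b}={a + b}"

def reversed_sample(a, b):
    """Reversed-output format: emitting the least significant digit first
    lets a left-to-right next-token predictor propagate carries naturally."""
    return f"{a}+{b}={str(a + b)[::-1]}"

print(plain_sample(58, 67))     # 58+67=125
print(reversed_sample(58, 67))  # 58+67=521
```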
A Network Resource Allocation Recommendation Method with An Improved Similarity Measure
Abstract
Recommender systems have been acknowledged as efficacious tools for managing information overload. Nevertheless, conventional algorithms adopted in such systems primarily emphasize precise recommendations and, consequently, overlook other vital aspects like the coverage, diversity, and novelty of items. This approach results in less exposure for long-tail items. In this paper, to personalize the recommendations and allocate recommendation resources more purposively, a method named PIM+RA is proposed. This method utilizes a bipartite network that incorporates self-connecting edges and weights. Furthermore, an improved Pearson correlation coefficient is employed for better redistribution. The evaluation of PIM+RA demonstrates a significant enhancement not only in accuracy but also in coverage, diversity, and novelty of the recommendation. It leads to a better balance in recommendation frequency by providing effective exposure to long-tail items, while allowing customized parameters to adjust the recommendation list bias.
Unsupervised Hyperspectral and Multispectral Images Fusion Based on the Cycle Consistency
Authors: Shuaikai Shi, Lijun Zhang, Yoann Altmann, Jie Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Hyperspectral images (HSI), whose abundant spectral information reflects material properties, usually have low spatial resolution due to hardware limits. Meanwhile, multispectral images (MSI), e.g., RGB images, have a high spatial resolution but deficient spectral signatures. Hyperspectral and multispectral image fusion can be a cost-effective and efficient way of acquiring images with both high spatial resolution and high spectral resolution. Many of the conventional HSI and MSI fusion algorithms rely on known spatial degradation parameters (i.e., the point spread function, PSF), known spectral degradation parameters (i.e., the spectral response function, SRF), or both. Another class of deep learning-based models relies on ground-truth high spatial resolution HSI and needs large amounts of paired training images when working in a supervised manner. Both of these classes of models are limited in practical fusion scenarios. In this paper, we propose an unsupervised HSI and MSI fusion model based on cycle consistency, called CycFusion. CycFusion learns the domain transformation between low spatial resolution HSI (LrHSI) and high spatial resolution MSI (HrMSI), and the desired high spatial resolution HSI (HrHSI) is treated as an intermediate feature map in the transformation networks. CycFusion can be trained with objective functions of marginal matching in single transforms and cycle consistency in double transforms. Moreover, the estimated PSF and SRF are embedded in the model as pre-training weights, which further enhances the practicality of our proposed model. Experiments conducted on several datasets show that our proposed model outperforms all compared unsupervised fusion methods. The code will be available at https://github.com/shuaikaishi/CycFusion for reproducibility.
Abstract
3D facial avatar reconstruction has been a significant research topic in computer graphics and computer vision, where photo-realistic rendering and flexible controls over poses and expressions are necessary for many related applications. Recently, its performance has been greatly improved with the development of neural radiance fields (NeRF). However, most existing NeRF-based facial avatars focus on subject-specific reconstruction and reenactment, requiring multi-shot images containing different views of the specific subject for training, and the learned model cannot generalize to new identities, limiting its further applications. In this work, we propose a one-shot 3D facial avatar reconstruction framework that only requires a single source image to reconstruct a high-fidelity 3D facial avatar. For the challenges of lacking generalization ability and missing multi-view information, we leverage the generative prior of 3D GAN and develop an efficient encoder-decoder network to reconstruct the canonical neural volume of the source image, and further propose a compensation network to complement facial details. To enable fine-grained control over facial dynamics, we propose a deformation field to warp the canonical volume into driven expressions. Through extensive experimental comparisons, we achieve superior synthesis results compared to several state-of-the-art methods.
A GPU-accelerated simulator for the DEM analysis of granular systems composed of clump-shaped elements
Authors: Ruochun Zhang, Colin Vanden Heuvel, Alexander Schepelmann, Arno Rogg, Dimitrios Apostolopoulos, Samuel Chandler, Radu Serban, Dan Negrut
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Abstract
We discuss the use of the Discrete Element Method (DEM) to simulate the dynamics of granular systems made up of elements with nontrivial geometries. The DEM simulator is GPU accelerated and can handle elements whose shape is defined as the union with overlap of diverse sets of spheres with user-specified radii. The simulator can also handle complex materials since each sphere in an element can have its own Young's modulus $E$, Poisson ratio $\nu$, friction coefficient $\mu$, and coefficient of restitution CoR. To demonstrate the simulator, we produce a "digital simulant" (DS), a replica of the GRC-1 lunar simulant. The DS follows an element size distribution similar but not identical to that of GRC-1. We validate the predictive attributes of the simulator via several numerical experiments: repose angle, cone penetration, drawbar pull, and rover incline-climbing tests. Subsequently, we carry out a sensitivity analysis to gauge how the slope vs. slip curves change when the element shape, element size, and friction coefficient change. The paper concludes with a VIPER rover simulation that confirms a recently proposed granular scaling law. The simulation involves more than 11 million elements composed of more than 34 million spheres of different radii. The simulator works in the Chrono framework and utilizes two GPUs concurrently. The GPU code for the simulator and all numerical experiments discussed are open-source and available on GitHub for reproducibility studies and unfettered use and distribution.
Discovering Hierarchical Achievements in Reinforcement Learning via Contrastive Learning
Authors: Seungyong Moon, Junyoung Yeom, Bumsoo Park, Hyun Oh Song
Abstract
Discovering achievements with a hierarchical structure on procedurally generated environments poses a significant challenge. This requires agents to possess a broad range of abilities, including generalization and long-term reasoning. Many prior methods are built upon model-based or hierarchical approaches, with the belief that an explicit module for long-term planning would be beneficial for learning hierarchical achievements. However, these methods require an excessive amount of environment interactions or large model sizes, limiting their practicality. In this work, we identify that proximal policy optimization (PPO), a simple and versatile model-free algorithm, outperforms the prior methods with recent implementation practices. Moreover, we find that the PPO agent can predict the next achievement to be unlocked to some extent, though with low confidence. Based on this observation, we propose a novel contrastive learning method, called achievement distillation, that strengthens the agent's capability to predict the next achievement. Our method exhibits a strong capacity for discovering hierarchical achievements and shows state-of-the-art performance on the challenging Crafter environment using fewer model parameters in a sample-efficient regime.
ITA: An Energy-Efficient Attention and Softmax Accelerator for Quantized Transformers
Authors: Gamze İslamoğlu (1), Moritz Scherer (1), Gianna Paulin (1), Tim Fischer (1), Victor J.B. Jung (1), Angelo Garofalo (1 and 2), Luca Benini (1 and 2) ((1) ETH Zürich, (2) University of Bologna)
Abstract
Transformer networks have emerged as the state-of-the-art approach for natural language processing tasks and are gaining popularity in other domains such as computer vision and audio processing. However, the efficient hardware acceleration of transformer models poses new challenges due to their high arithmetic intensities, large memory requirements, and complex dataflow dependencies. In this work, we propose ITA, a novel accelerator architecture for transformers and related models that targets efficient inference on embedded systems by exploiting 8-bit quantization and an innovative softmax implementation that operates exclusively on integer values. By computing on-the-fly in streaming mode, our softmax implementation minimizes data movement and energy consumption. ITA achieves competitive energy efficiency with respect to state-of-the-art transformer accelerators with 16.9 TOPS/W, while outperforming them in area efficiency with 5.93 TOPS/mm$^2$ in 22 nm fully-depleted silicon-on-insulator technology at 0.8 V.
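The abstract does not spell out ITA's exact softmax algorithm, but the general idea of an integer-only softmax can be illustrated with a lookup-table approach: the only floating-point math happens offline when the table is built, and inference needs just integer subtraction, table lookups, and one normalization. The parameter names and the quantization step below are hypothetical, not ITA's actual design:

```python
import numpy as np

def int_softmax(logits_q, lut_bits=16, scale=0.0625):
    """Integer-friendly softmax over quantized (e.g. int8) logits.

    The exp() lookup table is precomputed offline (the only float math);
    `scale` is an assumed quantization step mapping integer levels to reals.
    """
    # Offline: LUT[d] ~ 2^lut_bits * exp(-d * scale) for d = 0..255
    d = np.arange(256)
    lut = np.round((1 << lut_bits) * np.exp(-d * scale)).astype(np.int64)

    x = np.asarray(logits_q, dtype=np.int64)
    diff = np.clip(x.max() - x, 0, 255)   # subtract max: all diffs >= 0
    num = lut[diff]                       # integer exp approximations
    return num / num.sum()                # normalize (float only here)
```

Subtracting the running maximum keeps all table indices non-negative and bounded, which is what makes a streaming, on-the-fly evaluation possible; in hardware the final division would also be done in fixed point.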
Improving the accuracy of Raviart-Thomas mixed elements in two-dimensional smooth domains with straight-edged triangles
Abstract
Several physical problems modeled by second-order partial differential equations can be efficiently solved using mixed finite elements of the Raviart-Thomas family for N-simplexes, introduced in the seventies. When Neumann conditions are prescribed on a curvilinear boundary, the normal component of the flux variable should preferably not take up values at nodes shifted to the boundary of the approximating polytope in the corresponding normal direction, because the method's accuracy otherwise degrades, as shown in \cite{FBRT}. In that work an order-preserving technique was studied, based on a parametric version of these elements with curved simplexes. In this paper an alternative with straight-edged triangles for two-dimensional problems is proposed. The key point of this method is a Petrov-Galerkin formulation of the mixed problem, in which the test-flux space differs slightly from the shape-flux space. After carrying out a well-posedness and stability analysis, error estimates of optimal order are proven.
Derivative Free Weight-space Ensembling
Authors: Dean Ninalga
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
Recent work suggests that interpolating between the weights of two specialized language models can transfer knowledge between tasks in a way that multi-task learning cannot. However, very few have explored interpolation between more than two models, where each has a distinct knowledge base. In this paper, we introduce Derivative Free Weight-space Ensembling (DFWE), a new few-sample task transfer approach for open-domain dialogue. Our framework creates a set of diverse expert language models trained using a predefined set of source tasks. Next, we finetune each of the expert models on the target task, approaching the target task from several distinct knowledge bases. Finally, we linearly interpolate between the model weights using a gradient-free-optimization algorithm, to efficiently find a good interpolation weighting. We demonstrate the effectiveness of the method on FETA-Friends outperforming the standard pretrain-finetune approach.
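The paper's exact gradient-free optimizer and model format are not given in this abstract, but the interpolation step can be sketched minimally: sample mixing weights on the simplex, merge the expert weight vectors linearly, and keep the best-scoring mixture. Here `loss_fn` is a hypothetical evaluation callback standing in for a target-task evaluation:

```python
import numpy as np

def interpolate(weight_sets, alphas):
    """Linearly combine K expert weight vectors with simplex weights."""
    return sum(a * w for a, w in zip(alphas, weight_sets))

def dfwe_search(weight_sets, loss_fn, trials=200, seed=0):
    """Gradient-free search for a good interpolation of expert weights.

    Uses plain random sampling on the simplex as a stand-in for whatever
    derivative-free optimizer the method actually employs.
    """
    rng = np.random.default_rng(seed)
    best_a, best_loss = None, np.inf
    for _ in range(trials):
        a = rng.dirichlet(np.ones(len(weight_sets)))  # point on the simplex
        loss = loss_fn(interpolate(weight_sets, a))
        if loss < best_loss:
            best_a, best_loss = a, loss
    return best_a, best_loss
```

Because only forward evaluations of the merged model are needed, no gradients flow through the interpolation, which is what makes the approach "derivative free".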
Incentive Allocation in Vertical Federated Learning Based on Bankruptcy Problem
Authors: Afsana Khan, Marijn ten Thij, Frank Thuijsman, Anna Wilbik
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Computer Science and Game Theory (cs.GT)
Abstract
Vertical federated learning (VFL) is a promising approach for collaboratively training machine learning models using private data partitioned vertically across different parties. Ideally in a VFL setting, the active party (party possessing features of samples with labels) benefits by improving its machine learning model through collaboration with some passive parties (parties possessing additional features of the same samples without labels) in a privacy preserving manner. However, motivating passive parties to participate in VFL can be challenging. In this paper, we focus on the problem of allocating incentives to the passive parties by the active party based on their contributions to the VFL process. We formulate this problem as a variant of the Nucleolus game theory concept, known as the Bankruptcy Problem, and solve it using the Talmud's division rule. We evaluate our proposed method on synthetic and real-world datasets and show that it ensures fairness and stability in incentive allocation among passive parties who contribute their data to the federated model. Additionally, we compare our method to the existing solution of calculating Shapley values and show that our approach provides a more efficient solution with fewer computations.
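The Talmud division rule the authors invoke is a classical, fully specified algorithm (Aumann and Maschler's contested-garment-consistent rule, which coincides with the nucleolus of the bankruptcy game). A minimal sketch, independent of the paper's VFL pipeline:

```python
def talmud_rule(estate, claims, tol=1e-9):
    """Talmud (contested-garment-consistent) division of `estate`
    among creditors with the given `claims`."""
    half = [c / 2 for c in claims]

    def cea(amount):
        """Constrained equal awards on half-claims: find lambda with
        sum(min(h_i, lambda)) = amount by bisection."""
        lo, hi = 0.0, max(half)
        while hi - lo > tol:
            lam = (lo + hi) / 2
            if sum(min(h, lam) for h in half) < amount:
                lo = lam
            else:
                hi = lam
        return [min(h, lo) for h in half]

    total = sum(claims)
    if estate <= total / 2:
        return cea(estate)            # small estates: split half-claims equally, capped
    losses = cea(total - estate)      # large estates: apply the same rule to losses
    return [c - l for c, l in zip(claims, losses)]
```

This reproduces the famous Talmudic numbers: for claims (100, 200, 300), an estate of 200 is divided (50, 75, 75) and an estate of 300 is divided (50, 100, 150).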
Improved Algorithms for White-Box Adversarial Streams
Abstract
We study streaming algorithms in the white-box adversarial stream model, where the internal state of the streaming algorithm is revealed to an adversary who adaptively generates the stream updates, but the algorithm obtains fresh randomness unknown to the adversary at each time step. We incorporate cryptographic assumptions to construct robust algorithms against such adversaries. We propose efficient algorithms for sparse recovery of vectors, low rank recovery of matrices and tensors, as well as low rank plus sparse recovery of matrices, i.e., robust PCA. Unlike deterministic algorithms, our algorithms can report when the input is not sparse or low rank even in the presence of such an adversary. We use these recovery algorithms to improve upon and solve new problems in numerical linear algebra and combinatorial optimization on white-box adversarial streams. For example, we give the first efficient algorithm for outputting a matching in a graph with insertions and deletions to its edges provided the matching size is small, and otherwise we declare the matching size is large. We also improve the approximation versus memory tradeoff of previous work for estimating the number of non-zero elements in a vector and computing the matrix rank.
Joint Perceptual Learning for Enhancement and Object Detection in Underwater Scenarios
Abstract
Underwater degraded images greatly challenge existing algorithms to detect objects of interest. Recently, researchers have attempted to adopt attention mechanisms or composite connections to improve the feature representation of detectors. However, this solution does \textit{not} eliminate the impact of degradation on image content such as color and texture, achieving only minimal improvements. Another feasible solution for underwater object detection is to develop sophisticated deep architectures that enhance image quality or features. Nevertheless, the visually appealing output of these enhancement modules does \textit{not} necessarily yield high accuracy for deep detectors. More recently, some multi-task learning methods jointly learn underwater detection and image enhancement, achieving promising improvements. Typically, however, these methods involve large architectures and expensive computations, rendering inference inefficient. Underwater object detection and image enhancement are clearly two interrelated tasks, and leveraging information from one can benefit the other. Based on these observations, we propose a bilevel optimization formulation for jointly learning underwater object detection and image enhancement, which we then unroll into a dual perception network (DPNet) for the two tasks. DPNet, with one shared module and two task subnets, learns from the two different tasks, seeking a shared representation. The shared representation provides more structural details for image enhancement and richer content information for object detection. Finally, we derive a cooperative training strategy to optimize the parameters of DPNet. Extensive experiments on real-world and synthetic underwater datasets demonstrate that our method outputs visually favorable images and achieves higher detection accuracy.
Edge Element Approximation for the Spherical Interface Dynamo System
Abstract
Exploring the origin and properties of magnetic fields is crucial to the development of many fields such as physics, astronomy and meteorology. We focus on the edge element approximation and theoretical analysis of a celestial dynamo system with quasi-vacuum boundary conditions. The system not only ensures that the magnetic field on the spherical shell is generated by the dynamo model, but also lends itself to the application of the edge element method. We demonstrate the existence, uniqueness and stability of the solution to the system by the fixed point theorem. Then, we approximate the system using the edge element method, which is efficient in dealing with electromagnetic field problems, and discuss the stability of the corresponding discrete scheme; convergence is demonstrated by subsequent numerical tests. Finally, we simulate the three-dimensional time evolution of the spherical interface dynamo model, and the characteristics of the simulated magnetic field are consistent with existing work.
The impact of body and head dynamics on motion comfort assessment
Abstract
Head motion is a key determinant of motion comfort and differs substantially from seat motion due to seat and body compliance and dynamic postural stabilization. This paper compares human body models of different fidelities in transmitting seat accelerations to the head for the assessment of motion comfort through simulations. Six-degree-of-freedom dynamics were analyzed using frequency response functions derived from an advanced human model (AHM), a computationally efficient human model (EHM) and experimental studies. Simulations of dynamic driving show that the choice of human model strongly affected the predicted ride comfort (increases of up to a factor of 3). Furthermore, it modestly affected predicted sickness using the available filters from the literature and ISO-2631 (increases of up to 30%), but more strongly affected sickness predicted by the subjective vertical conflict (SVC) model (increases of up to 70%).
Online Network Source Optimization with Graph-Kernel MAB
Abstract
We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm that learns online the optimal source placement in large-scale networks, such that the reward obtained from a priori unknown network processes is maximized. The uncertainty calls for online learning, which however suffers from the curse of dimensionality. To achieve sample efficiency, we describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations. This enables a data-efficient learning framework whose learning rate scales with the dimension of the spectral representation model rather than that of the network. Grab-UCB is an online sequential decision strategy that learns the parameters of the spectral representation while optimizing the action strategy. We derive performance guarantees that depend on network parameters, which in turn influence the learning curve of the sequential decision strategy. We also introduce a computationally simplified solving method, Grab-arm-Light, an algorithm that walks along the edges of the polytope representing the objective function. Simulation results show that the proposed online learning algorithm outperforms baseline offline methods that typically separate the learning phase from the testing one. The results confirm the theoretical findings and further highlight the gain of the proposed online learning strategy in terms of cumulative regret, sample efficiency and computational complexity.
Simulation-free Schrödinger bridges via score and flow matching
Authors: Alexander Tong, Nikolay Malkin, Kilian Fatras, Lazar Atanackovic, Yanlei Zhang, Guillaume Huguet, Guy Wolf, Yoshua Bengio
Abstract
We present simulation-free score and flow matching ([SF]$^2$M), a simulation-free objective for inferring stochastic dynamics given unpaired source and target samples drawn from arbitrary distributions. Our method generalizes both the score-matching loss used in the training of diffusion models and the recently proposed flow matching loss used in the training of continuous normalizing flows. [SF]$^2$M interprets continuous-time stochastic generative modeling as a Schr\"odinger bridge (SB) problem. It relies on static entropy-regularized optimal transport, or a minibatch approximation, to efficiently learn the SB without simulating the learned stochastic process. We find that [SF]$^2$M is more efficient and gives more accurate solutions to the SB problem than simulation-based methods from prior work. Finally, we apply [SF]$^2$M to the problem of learning cell dynamics from snapshot data. Notably, [SF]$^2$M is the first method to accurately model cell dynamics in high dimensions and can recover known gene regulatory networks from simulated data.
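The static entropy-regularized optimal transport subproblem that [SF]$^2$M relies on can be solved on minibatches with standard Sinkhorn iterations. A minimal sketch with uniform marginals and a squared-distance cost (not the authors' implementation):

```python
import numpy as np

def sinkhorn(x, y, eps=0.1, iters=500):
    """Entropy-regularized OT coupling between minibatches x (n x d)
    and y (m x d). Returns the n x m coupling matrix, which can be used
    to pair source and target samples without simulating the SDE."""
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared-distance cost
    K = np.exp(-C / eps)                                # Gibbs kernel
    a = np.full(len(x), 1 / len(x))                     # uniform marginals
    b = np.full(len(y), 1 / len(y))
    u = np.ones_like(a)
    for _ in range(iters):                              # alternating scaling
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]
```

Sampling pairs from this coupling is what makes the objective "simulation-free": the stochastic process itself never has to be integrated during training.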
Undecimated Wavelet Transform for Word Embedded Semantic Marginal Autoencoder in Security improvement and Denoising different Languages
Authors: Shreyanth S
Subjects: Computation and Language (cs.CL); Cryptography and Security (cs.CR); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract
By combining the undecimated wavelet transform with a Word Embedded Semantic Marginal Autoencoder (WESMA), this research study provides a novel strategy for improving security measures and denoising multiple languages. The incorporation of these strategies is intended to address the issues of robustness, privacy, and multilingualism in data processing applications. The undecimated wavelet transform is used as a feature extraction tool to identify prominent language patterns and structural qualities in the input data. By employing this transform, the proposed system can capture significant information while preserving the temporal and spatial relationships within the data. This improves security measures by increasing the system's ability to detect abnormalities, discover hidden patterns, and distinguish between legitimate content and dangerous threats. The Word Embedded Semantic Marginal Autoencoder also functions as an intelligent framework for dimensionality and noise reduction. The autoencoder effectively learns the underlying semantics of the data and reduces noise components by exploiting word embeddings and semantic context. As a result, data quality and accuracy are improved in subsequent processing stages. The suggested methodology is tested on a diversified dataset that includes several languages and security scenarios. The experimental results show that the proposed approach is effective in attaining security enhancement and denoising capabilities across multiple languages. The system is robust to linguistic variation, producing consistent outcomes regardless of the language used. Furthermore, incorporating the undecimated wavelet transform considerably improves the system's ability to efficiently address complex security concerns.
Keyword: faster
Efficient parallel implementation of the multiplicative weight update method for graph-based linear programs
Abstract
Positive linear programs (LPs) model many graph and operations research problems. One can solve for a $(1+\epsilon)$-approximation for positive LPs, for any selected $\epsilon$, in polylogarithmic depth and near-linear work via variations of the multiplicative weight update (MWU) method. Despite extensive theoretical work on these algorithms through the decades, their empirical performance is not well understood. In this work, we implement and test an efficient parallel algorithm for solving positive LP relaxations, and apply it to graph problems such as densest subgraph, bipartite matching, vertex cover and dominating set. We accelerate the algorithm via a new step size search heuristic. Our implementation uses sparse linear algebra optimization techniques such as fusion of vector operations and use of sparse formats. Furthermore, we devise an implicit representation for graph incidence constraints. We demonstrate parallel scalability using OpenMP threading and MPI on the Stampede2 supercomputer. We compare this implementation with exact libraries and specialized libraries for the above problems in order to evaluate the practical standing of MWU in both accuracy and performance among other methods. Our results show this implementation is faster than general purpose LP solvers (IBM CPLEX, Gurobi) in all of our experiments, and in some instances, outperforms state-of-the-art specialized parallel graph algorithms.
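At its core, the MWU method maintains a weight per "expert" (in the LP setting, per constraint) and rescales the weights multiplicatively according to observed losses. A generic sketch of the update rule, not the paper's graph-specialized solver:

```python
import numpy as np

def mwu(losses, eta=0.1):
    """Multiplicative weight update over n experts.

    `losses` is a T x n array of per-round losses in [0, 1]; returns the
    final weight distribution. In the positive-LP setting the experts are
    the LP constraints and the loss measures each constraint's slack.
    """
    n = losses.shape[1]
    w = np.ones(n)
    for l in losses:
        w *= (1 - eta) ** l   # penalize experts with high loss
        w /= w.sum()          # renormalize to a distribution
    return w
```

After T rounds the mass concentrates on low-loss experts, and the algorithm's cumulative loss is within an additive O(sqrt(T log n)) of the best single expert, which is the regret bound underlying the $(1+\epsilon)$-approximation guarantees.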
Distilled Pruning: Using Synthetic Data to Win the Lottery
Abstract
This work introduces a novel approach to pruning deep learning models by using distilled data. Unlike conventional strategies which primarily focus on architectural or algorithmic optimization, our method reconsiders the role of data in these scenarios. Distilled datasets capture essential patterns from larger datasets, and we demonstrate how to leverage this capability to enable a computationally efficient pruning process. Our approach can find sparse, trainable subnetworks (a.k.a. Lottery Tickets) up to 5x faster than Iterative Magnitude Pruning at comparable sparsity on CIFAR-10. The experimental results highlight the potential of using distilled data for resource-efficient neural network pruning, model compression, and neural architecture search.
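For context, the Iterative Magnitude Pruning baseline that this work accelerates follows a standard prune-and-rewind loop. A toy sketch with training mocked out (`train_fn` is a hypothetical stand-in that would, in this paper's variant, train on distilled data):

```python
import numpy as np

def imp(init_w, train_fn, rounds=3, prune_frac=0.2):
    """Iterative Magnitude Pruning sketch (lottery-ticket procedure).

    Each round: train the masked network, prune the smallest fraction of
    surviving weights by magnitude, then rewind the rest to their
    initialization before the next round.
    """
    mask = np.ones_like(init_w)
    w = init_w.copy()
    for _ in range(rounds):
        w = train_fn(w * mask, mask)                 # (mock) training step
        alive = np.abs(w[mask == 1])
        thresh = np.quantile(alive, prune_frac)      # cut the bottom fraction
        mask[(np.abs(w) <= thresh) & (mask == 1)] = 0
        w = init_w.copy()                            # rewind to initialization
    return mask
```

Each round requires a full (re)training run, which is why replacing the training set with a much smaller distilled dataset can cut the end-to-end cost so sharply.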
RGB-D Mapping and Tracking in a Plenoxel Radiance Field
Authors: Andreas L. Teigen, Yeonsoo Park, Annette Stahl, Rudolf Mester
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
Building on the success of Neural Radiance Fields (NeRFs), recent years have seen significant advances in the domain of novel view synthesis. These models capture the scene's volumetric radiance field, creating highly convincing dense photorealistic models through the use of simple, differentiable rendering equations. Despite their popularity, these algorithms suffer from severe ambiguities in visual data inherent to the RGB sensor, which means that although images generated with view synthesis can visually appear very believable, the underlying 3D model will often be wrong. This considerably limits the usefulness of these models in practical applications like Robotics and Extended Reality (XR), where an accurate dense 3D reconstruction otherwise would be of significant value. In this technical report, we present the vital differences between view synthesis models and 3D reconstruction models. We also comment on why a depth sensor is essential for modeling accurate geometry in general outward-facing scenes using the current paradigm of novel view synthesis methods. Focusing on the structure-from-motion task, we practically demonstrate this need by extending the Plenoxel radiance field model: Presenting an analytical differential approach for dense mapping and tracking with radiance fields based on RGB-D data without a neural network. Our method achieves state-of-the-art results in both the mapping and tracking tasks while also being faster than competing neural network-based approaches.
Improving Bitswap Privacy with Forwarding and Source Obfuscation
Authors: Erik Daniel, Marcel Ebert, Florian Tschorsch
Abstract
IPFS is a content-addressed decentralized peer-to-peer data network, using the Bitswap protocol for exchanging data. The data exchange leaks information to all neighbors, compromising a user's privacy. This paper investigates the suitability of forwarding with source obfuscation techniques for improving the privacy of the Bitswap protocol. The use of forwarding can add plausible deniability, and the source obfuscation provides additional protection against passive observers. First results show that trickle-spreading can decrease the source prediction rate to 40%, at the cost of increased content fetching time. However, assuming short distances between content provider and consumer, content fetching can be faster even with the additional source obfuscation.
MALIBO: Meta-learning for Likelihood-free Bayesian Optimization
Authors: Jiarong Pan, Stefan Falkner, Felix Berkenkamp, Joaquin Vanschoren
Abstract
Bayesian optimization (BO) is a popular method to optimize costly black-box functions. While traditional BO optimizes each new target task from scratch, meta-learning has emerged as a way to leverage knowledge from related tasks to optimize new tasks faster. However, existing meta-learning BO methods rely on surrogate models that suffer from scalability issues and are sensitive to observations with different scales and noise types across tasks. Moreover, they often overlook the uncertainty associated with task similarity. This leads to unreliable task adaptation when only limited observations are obtained or when the new tasks differ significantly from the related tasks. To address these limitations, we propose a novel meta-learning BO approach that bypasses the surrogate model and directly learns the utility of queries across tasks. Our method explicitly models task uncertainty and includes an auxiliary model to enable robust adaptation to new tasks. Extensive experiments show that our method demonstrates strong anytime performance and outperforms state-of-the-art meta-learning BO methods in various benchmarks.
Keyword: mobile
Joint Computing Offloading and Resource Allocation for Classification Intelligent Tasks in MEC Systems
Abstract
Mobile edge computing (MEC) enables low-latency and high-bandwidth applications by bringing computation and data storage closer to end-users. Intelligent computing is an important application of MEC, where computing resources are used to solve intelligent task-related problems based on task requirements. However, efficiently offloading computing and allocating resources for intelligent tasks in MEC systems is a challenging problem due to complex interactions between task requirements and MEC resources. To address this challenge, we investigate joint computing offloading and resource allocation for intelligent tasks in MEC systems. Our goal is to optimize system utility by jointly considering computing accuracy and task delay to achieve maximum system performance. We focus on classification intelligent tasks and formulate an optimization problem that considers both the accuracy requirements of tasks and the parallel computing capabilities of MEC systems. To solve the optimization problem, we decompose it into three subproblems: subcarrier allocation, computing capacity allocation, and compression offloading. We use convex optimization and successive convex approximation to derive closed-form expressions for the subcarrier allocation, offloading decisions, computing capacity, and compression ratio. Based on our solutions, we design an efficient computing offloading and resource allocation algorithm for intelligent tasks in MEC systems. Our simulation results demonstrate that our proposed algorithm significantly improves the performance of intelligent tasks in MEC systems and achieves a flexible trade-off between system revenue and cost on intelligent tasks compared with the benchmarks.
Facial Landmark Detection Evaluation on MOBIO Database
Authors: Na Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
MOBIO is a bi-modal database that was captured almost exclusively on mobile phones. It aims to improve research into deploying biometric techniques on mobile devices. Research has shown that face and speaker recognition can be performed in a mobile environment. Facial landmark localization aims at finding the coordinates of a set of pre-defined key points in 2D face images. A facial landmark usually has a specific semantic meaning, e.g. nose tip or eye centre, which provides rich geometric information for other face analysis tasks such as face recognition, emotion estimation and 3D face reconstruction. Most facial landmark detection methods are evaluated on still face databases, such as 300W, AFW, AFLW, or COFW, but seldom use mobile data. Our work is the first to perform facial landmark detection evaluation on mobile still data, i.e., face images from the MOBIO database. About 20,600 face images have been extracted from this audio-visual database and manually labeled with 22 landmarks as the ground truth. Several state-of-the-art facial landmark detection methods are adopted to evaluate their performance on these data. The results show that the data from the MOBIO database are quite challenging, and the database can serve as a new challenging benchmark for facial landmark detection evaluation.
Metropolitan Scale and Longitudinal Dataset of Anonymized Human Mobility Trajectories
Authors: Takahiro Yabe, Kota Tsubouchi, Toru Shimizu, Yoshihide Sekimoto, Kaoru Sezaki, Esteban Moro, Alex Pentland
Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
Abstract
Modeling and predicting human mobility trajectories in urban areas is an essential task for various applications. The recent availability of large-scale human movement data collected from mobile devices have enabled the development of complex human mobility prediction models. However, human mobility prediction methods are often trained and tested on different datasets, due to the lack of open-source large-scale human mobility datasets amid privacy concerns, posing a challenge towards conducting fair performance comparisons between methods. To this end, we created an open-source, anonymized, metropolitan scale, and longitudinal (90 days) dataset of 100,000 individuals' human mobility trajectories, using mobile phone location data. The location pings are spatially and temporally discretized, and the metropolitan area is undisclosed to protect users' privacy. The 90-day period is composed of 75 days of business-as-usual and 15 days during an emergency. To promote the use of the dataset, we will host a human mobility prediction data challenge (`HuMob Challenge 2023') using the human mobility dataset, which will be held in conjunction with ACM SIGSPATIAL 2023.
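The spatial and temporal discretization of location pings can be illustrated with a simple grid-and-bin scheme. The grid origin, cell size, and bin width below are hypothetical choices for illustration, not the dataset's actual (deliberately undisclosed) parameters:

```python
import math
from datetime import datetime

def discretize_ping(lat, lon, ts, cell_deg=0.005, bin_minutes=30,
                    lat0=35.0, lon0=139.0):
    """Discretize a raw location ping into (cell_x, cell_y, time_bin).

    Coordinates are snapped to a regular lat/lon grid anchored at an
    arbitrary origin; timestamps are bucketed into fixed-width bins
    within the day.
    """
    cell_x = math.floor((lon - lon0) / cell_deg)   # grid column
    cell_y = math.floor((lat - lat0) / cell_deg)   # grid row
    minutes = ts.hour * 60 + ts.minute
    return cell_x, cell_y, minutes // bin_minutes  # 30-min bin index
```

Discretizing like this (rather than publishing raw coordinates) is what allows a metropolitan-scale trajectory dataset to be released while limiting re-identification risk.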
In A Society of Strangers, Kin Is Still Key: Identified Family Relations In Large-Scale Mobile Phone Data
Authors: Tamás Dávid-Barrett, Sebastian Diaz, Carlos Rodriguez-Sickert, Isabel Behncke, Anna Rotkirch, János Kertész, Loreto Bravo
Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph); Quantitative Methods (q-bio.QM)
Abstract
Mobile call networks have been widely used to investigate communication patterns and the network of interactions of humans at the societal scale. Yet, more detailed analysis is often hindered by having no information about the nature of the relationships, even if some metadata about the individuals are available. Using a unique, large mobile phone database with information about individual surnames in a population in which people inherit two surnames: one from their father, and one from their mother, we are able to differentiate among close kin relationship types. Here we focus on the difference between the most frequently called alters depending on whether they are family relationships or not. We find support in the data for two hypotheses: (1) phone calls between family members are more frequent and last longer than phone calls between non-kin, and (2) the phone call pattern between family members show a higher variation depending on the stage of life-course compared to non-family members. We give an interpretation of these findings within the framework of evolutionary anthropology: kinship matters even when demographic processes, such as low fertility, urbanisation and migration reduce the access to family members. Furthermore, our results provide tools for distinguishing between different kinds of kin relationships from mobile call data, when information about names are unavailable.
ContextLabeler Dataset: physical and virtual sensors data collected from smartphone usage in-the-wild
Authors: Mattia Giovanni Campana, Franca Delmastro
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Signal Processing (eess.SP)
Abstract
This paper describes a data collection campaign and the resulting dataset derived from smartphone sensors characterizing the daily life activities of 3 volunteers over a period of two weeks. The dataset is released as a collection of CSV files containing more than 45K data samples, where each sample is composed of 1332 features related to a heterogeneous set of physical and virtual sensors, including motion sensors, running applications, devices in proximity, and weather conditions. Moreover, each data sample is associated with a ground-truth label that describes the user's activity and the situation in which they were involved during the sensing experiment (e.g., working, at a restaurant, or doing sport activity). To avoid introducing any bias during the data collection, we performed the sensing experiment in the wild, that is, using the volunteers' own devices and without imposing any constraint on the users' behavior. For this reason, the collected dataset represents a useful source of real data both to define and to evaluate a broad set of novel context-aware solutions (both algorithms and protocols) that aim to adapt their behavior according to changes in the user's situation in a mobile environment.
LTE SFBC MIMO Transmitter Modelling and Performance Evaluation
Authors: Gabriela Morillo, John Cosmas
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
High data rates are one of the most prevalent requirements in current mobile communications. To meet this and other demanding standards regarding performance, coverage, capacity, and reliability, numerous works have proposed systems combining several techniques, such as Multiple Input Multiple Output (MIMO) wireless technologies with Orthogonal Frequency Division Multiplexing (OFDM), in the evolving 4G wireless communications. Our proposed system is based on the 2x2 MIMO antenna technique, which is defined to enhance the performance of radio communication systems in terms of capacity and spectral efficiency, and on the OFDM technique, which can be implemented using two types of sub-carrier mapping modes: Space-Time Block Coding (STBC) and Space-Frequency Block Coding (SFBC). SFBC has been considered in our developed model. The main advantage of SFBC over STBC is that SFBC encodes two modulated symbols over two subcarriers of the same OFDM symbol, whereas STBC encodes them over the same subcarrier of two consecutive OFDM symbols; thus, in SFBC the coding is performed in the frequency domain. Our solution analyzes the performance of the SFBC scheme, increasing the Signal-to-Noise Ratio (SNR) at the receiver and decreasing the Bit Error Rate (BER) through the use of 4-QAM, 16-QAM and 64-QAM modulation over a 2x2 MIMO channel for an LTE downlink transmission in different radio channel environments. In this work, an analytical tool to evaluate the performance of SFBC-OFDM using two transmit antennas and two receive antennas has been implemented, and the average SNR has been considered as a sufficient statistic to describe the performance of SFBC in the 3GPP Long Term Evolution system over MIMO channels.
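The SFBC scheme described here is the Alamouti code applied across adjacent subcarriers rather than across time slots. A minimal numpy sketch of the 2x2 encode/combine step, assuming a noise-free flat channel that is constant over the subcarrier pair:

```python
import numpy as np

def sfbc_encode(s1, s2):
    """Alamouti SFBC mapping of a symbol pair onto two adjacent
    subcarriers (k, k+1) of the same OFDM symbol; one row per Tx antenna."""
    return np.array([[s1, -np.conj(s2)],    # antenna 1: carriers k, k+1
                     [s2,  np.conj(s1)]])   # antenna 2: carriers k, k+1

def sfbc_decode(r1, r2, h1, h2):
    """Combine the two received subcarriers (r1, r2) given the flat
    channel gains h1, h2 from the two transmit antennas."""
    g = abs(h1) ** 2 + abs(h2) ** 2         # diversity gain
    s1 = (np.conj(h1) * r1 + h2 * np.conj(r2)) / g
    s2 = (np.conj(h2) * r1 - h1 * np.conj(r2)) / g
    return s1, s2
```

Because the two code columns land on neighboring subcarriers of one OFDM symbol, the orthogonal combining works within a single symbol period, which is exactly the frequency-domain coding advantage the abstract describes.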
Keyword: pruning
Distilled Pruning: Using Synthetic Data to Win the Lottery
Abstract
This work introduces a novel approach to pruning deep learning models by using distilled data. Unlike conventional strategies which primarily focus on architectural or algorithmic optimization, our method reconsiders the role of data in these scenarios. Distilled datasets capture essential patterns from larger datasets, and we demonstrate how to leverage this capability to enable a computationally efficient pruning process. Our approach can find sparse, trainable subnetworks (a.k.a. Lottery Tickets) up to 5x faster than Iterative Magnitude Pruning at comparable sparsity on CIFAR-10. The experimental results highlight the potential of using distilled data for resource-efficient neural network pruning, model compression, and neural architecture search.
Keyword: diffusion
Recovery of Multiple Parameters in Subdiffusion from One Lateral Boundary Measurement
Abstract
This work is concerned with numerically recovering multiple parameters simultaneously in the subdiffusion model from one single lateral measurement on a part of the boundary, while in an incompletely known medium. We prove that the boundary measurement corresponding to a fairly general boundary excitation uniquely determines the order of the fractional derivative and the polygonal support of the diffusion coefficient, without knowing either the initial condition or the source. The uniqueness analysis further inspires the development of a robust numerical algorithm for recovering the fractional order and diffusion coefficient. The proposed algorithm combines small-time asymptotic expansion, analytic continuation of the solution and the level set method. We present extensive numerical experiments to illustrate the feasibility of the simultaneous recovery. In addition, we discuss the uniqueness of recovering general diffusion and potential coefficients from one single partial boundary measurement, when the boundary excitation is more specialized.
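For reference, the subdiffusion model in question is commonly written with a Djrbashian-Caputo fractional derivative in time. This is the standard textbook formulation; the precise boundary excitation and measurement setup are as described in the abstract.

```latex
\partial_t^\alpha u - \nabla \cdot \bigl( a(x) \nabla u \bigr) = f
\quad \text{in } \Omega \times (0, T], \qquad \alpha \in (0, 1),
\qquad
\partial_t^\alpha u(t) = \frac{1}{\Gamma(1-\alpha)}
\int_0^t (t-s)^{-\alpha} \, \partial_s u(s) \, \mathrm{d}s ,
```

where $\alpha$ is the fractional order and $a(x)$ the diffusion coefficient to be recovered.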
Point spread function approximation of high rank Hessians with locally supported non-negative integral kernels
Authors: Nick Alger, Tucker Hartland, Noemi Petra, Omar Ghattas
Abstract
We present an efficient matrix-free point spread function (PSF) method for approximating operators that have locally supported non-negative integral kernels. The method computes impulse responses of the operator at scattered points, and interpolates these impulse responses to approximate integral kernel entries. Impulse responses are computed by applying the operator to Dirac comb batches of point sources, which are chosen by solving an ellipsoid packing problem. Evaluation of kernel entries allows us to construct a hierarchical matrix (H-matrix) approximation of the operator. Further matrix computations are performed with H-matrix methods. We use the method to build preconditioners for the Hessian operator in two inverse problems governed by partial differential equations (PDEs): inversion for the basal friction coefficient in an ice sheet flow problem and for the initial condition in an advective-diffusive transport problem. While for many ill-posed inverse problems the Hessian of the data misfit term exhibits a low rank structure, and hence a low rank approximation is suitable, for many problems of practical interest the numerical rank of the Hessian is still large. But Hessian impulse responses typically become more local as the numerical rank increases, which benefits the PSF method. Numerical results reveal that the PSF preconditioner clusters the spectrum of the preconditioned Hessian near one, yielding roughly 5x-10x reductions in the required number of PDE solves, as compared to regularization preconditioning and no preconditioning. We also present a numerical study for the influence of various parameters (that control the shape of the impulse responses) on the effectiveness of the advection-diffusion Hessian approximation. The results show that the PSF-based preconditioners are able to form good approximations of high-rank Hessians using a small number of operator applications.
Simulation-free Schrödinger bridges via score and flow matching
Authors: Alexander Tong, Nikolay Malkin, Kilian Fatras, Lazar Atanackovic, Yanlei Zhang, Guillaume Huguet, Guy Wolf, Yoshua Bengio
Abstract
We present simulation-free score and flow matching ([SF]$^2$M), a simulation-free objective for inferring stochastic dynamics given unpaired source and target samples drawn from arbitrary distributions. Our method generalizes both the score-matching loss used in the training of diffusion models and the recently proposed flow matching loss used in the training of continuous normalizing flows. [SF]$^2$M interprets continuous-time stochastic generative modeling as a Schrödinger bridge (SB) problem. It relies on static entropy-regularized optimal transport, or a minibatch approximation, to efficiently learn the SB without simulating the learned stochastic process. We find that [SF]$^2$M is more efficient and gives more accurate solutions to the SB problem than simulation-based methods from prior work. Finally, we apply [SF]$^2$M to the problem of learning cell dynamics from snapshot data. Notably, [SF]$^2$M is the first method to accurately model cell dynamics in high dimensions and can recover known gene regulatory networks from simulated data.
Keyword: adaptive
PREADD: Prefix-Adaptive Decoding for Controlled Text Generation
Abstract
We propose Prefix-Adaptive Decoding (PREADD), a flexible method for controlled text generation. Unlike existing methods that use auxiliary expert models to control for attributes, PREADD does not require an external model, instead relying on linearly combining output logits from multiple prompts. Specifically, PREADD contrasts the output logits generated using a raw prompt against those generated using a prefix-prepended prompt, enabling both positive and negative control with respect to any attribute encapsulated by the prefix. We evaluate PREADD on three tasks -- toxic output mitigation, gender bias reduction, and sentiment control -- and find that PREADD outperforms not only prompting baselines, but also an auxiliary-expert control method, by 12% or more in relative gain on our main metrics for each task.
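The logit-combination step admits a compact sketch. This is one plausible instantiation of "contrasting" the two logit vectors (the paper's exact normalization may differ), with toy tensors and an illustrative strength `alpha`.

```python
import numpy as np

def preadd_combine(raw_logits, prefix_logits, alpha):
    """Contrast logits from the raw prompt with logits from the
    prefix-prepended prompt: alpha > 0 amplifies the prefix attribute,
    alpha < 0 suppresses it, alpha = 0 recovers the raw distribution."""
    return raw_logits + alpha * (prefix_logits - raw_logits)

raw = np.array([2.0, 1.0, 0.0])   # toy next-token logits, raw prompt
pre = np.array([0.0, 1.0, 2.0])   # logits with control prefix prepended
steered = preadd_combine(raw, pre, alpha=-1.0)   # steer away from prefix
```

Sampling then proceeds from the softmax of the combined logits, so no auxiliary expert model is needed, only a second forward pass with the prefix prepended.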
Adaptive Generation of Privileged Intermediate Information for Visible-Infrared Person Re-Identification
Authors: Mahdi Alehdaghi, Arthur Josi, Pourya Shamsolmoali, Rafael M. O. Cruz, Eric Granger
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Visible-infrared person re-identification seeks to retrieve images of the same individual captured over a distributed network of RGB and IR sensors. Several V-I ReID approaches directly integrate both V and I modalities to discriminate persons within a shared representation space. However, given the significant gap in data distributions between V and I modalities, cross-modal V-I ReID remains challenging. Some recent approaches improve generalization by leveraging intermediate spaces that can bridge V and I modalities, yet effective methods are required to select or generate data for such informative domains. In this paper, the Adaptive Generation of Privileged Intermediate Information (AGPI^2) training approach is introduced to adapt and generate a virtual domain that bridges discriminant information between the V and I modalities. The key motivation behind AGPI^2 is to enhance the training of a deep V-I ReID backbone by generating privileged images that provide additional information. These privileged images capture shared discriminative features that are not easily accessible within the original V or I modalities alone. Towards this goal, a non-linear generative module is trained with an adversarial objective, translating V images into intermediate spaces with a smaller domain shift w.r.t. the I domain. Meanwhile, the embedding module within AGPI^2 aims to produce similar features for both V and generated images, encouraging the extraction of features that are common to all modalities. In addition to these contributions, AGPI^2 employs adversarial objectives for adapting the intermediate images, which play a crucial role in creating a non-modality-specific space to address the large domain shifts between V and I domains. Experimental results conducted on challenging V-I ReID datasets indicate that AGPI^2 increases matching accuracy without extra computational resources during inference.
CSCLog: A Component Subsequence Correlation-Aware Log Anomaly Detection Method
Abstract
Anomaly detection based on system logs plays an important role in intelligent operations, which is a challenging task due to the extremely complex log patterns. Existing methods detect anomalies by capturing the sequential dependencies in log sequences, which ignore the interactions of subsequences. To this end, we propose CSCLog, a Component Subsequence Correlation-Aware Log anomaly detection method, which not only captures the sequential dependencies in subsequences, but also models the implicit correlations of subsequences. Specifically, subsequences are extracted from log sequences based on components and the sequential dependencies in subsequences are captured by Long Short-Term Memory Networks (LSTMs). An implicit correlation encoder is introduced to model the implicit correlations of subsequences adaptively. In addition, Graph Convolution Networks (GCNs) are employed to accomplish the information interactions of subsequences. Finally, attention mechanisms are exploited to fuse the embeddings of all subsequences. Extensive experiments on four publicly available log datasets demonstrate the effectiveness of CSCLog, outperforming the best baseline by an average of 7.41% in Macro F1-Measure.
Swin Transformer-Based Dynamic Semantic Communication for Multi-User with Different Computing Capacity
Authors: Loc X. Nguyen, Ye Lin Tun, Yan Kyaw Tun, Minh N. H. Nguyen, Chaoning Zhang, Zhu Han, Choong Seon Hong
Abstract
Semantic communication has gained significant attention from researchers as a promising technique to replace conventional communication in the next generation of communication systems, primarily due to its ability to reduce communication costs. However, little literature has studied its effectiveness in multi-user scenarios, particularly when there are variations in the model architectures used by users and their computing capacities. To address this issue, we explore a semantic communication system that caters to multiple users with different model architectures by using a multi-purpose transmitter at the base station (BS). Specifically, the BS in the proposed framework employs semantic and channel encoders to encode the image for transmission, while the receiver utilizes its local channel and semantic decoder to reconstruct the original image. Our joint source-channel encoder at the BS can effectively extract and compress semantic features for specific users by considering the signal-to-noise ratio (SNR) and computing capacity of the user. Based on the network status, the joint source-channel encoder at the BS can adaptively adjust the length of the transmitted signal. A longer signal ensures more information for high-quality image reconstruction for the user, while a shorter signal helps avoid network congestion. In addition, we propose a hybrid loss function for training, which enhances the perceptual details of reconstructed images. Finally, we conduct a series of extensive evaluations and ablation studies to validate the effectiveness of the proposed system.
Anableps: Adapting Bitrate for Real-Time Communication Using VBR-encoded Video
Authors: Zicheng Zhang, Hao Chen, Xun Cao, Zhan Ma
Abstract
Content providers increasingly replace traditional constant bitrate with variable bitrate (VBR) encoding in real-time video communication systems for better video quality. However, VBR encoding often leads to large and frequent bitrate fluctuations, inevitably deteriorating the efficiency of existing adaptive bitrate (ABR) methods. To tackle this problem, we propose Anableps, which jointly considers network dynamics and VBR-encoding-induced video bitrate fluctuations to deploy the best ABR policy. With this aim, Anableps uses sender-side information from the past to predict the video bitrate range of upcoming frames. This bitrate range is then combined with receiver-side observations to set the proper bitrate target for video encoding using a reinforcement-learning-based ABR model. As revealed by extensive experiments on a real-world trace-driven testbed, Anableps outperforms GCC with significant improvements in quality of experience, e.g., 1.88x video quality, 57% less bitrate consumption, 85% less stalling, and 74% shorter interaction delay.
Large AI Model-Based Semantic Communications
Authors: Feibo Jiang, Yubo Peng, Li Dong, Kezhi Wang, Kun Yang, Cunhua Pan, Xiaohu You
Subjects: Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)
Abstract
Semantic communication (SC) is an emerging intelligent paradigm, offering solutions for various future applications such as the metaverse, mixed reality, and the Internet of Everything. However, in current SC systems, the construction of the knowledge base (KB) faces several issues, including limited knowledge representation, frequent knowledge updates, and insecure knowledge sharing. Fortunately, the development of large AI models provides new solutions to overcome the above issues. Here, we propose a large AI model-based SC framework (LAM-SC) specifically designed for image data, where we first design a segment anything model (SAM)-based KB (SKB) that can split the original image into different semantic segments using universal semantic knowledge. Then, we present an attention-based semantic integration (ASI) method to weigh the semantic segments generated by the SKB without human participation and integrate them into a semantic-aware image. Additionally, we propose an adaptive semantic compression (ASC) encoding to remove redundant information in semantic features, thereby reducing communication overhead. Finally, through simulations, we demonstrate the effectiveness of the LAM-SC framework and the significance of large AI model-based KB development in future SC paradigms.
Improved Algorithms for White-Box Adversarial Streams
Abstract
We study streaming algorithms in the white-box adversarial stream model, where the internal state of the streaming algorithm is revealed to an adversary who adaptively generates the stream updates, but the algorithm obtains fresh randomness unknown to the adversary at each time step. We incorporate cryptographic assumptions to construct robust algorithms against such adversaries. We propose efficient algorithms for sparse recovery of vectors, low rank recovery of matrices and tensors, as well as low rank plus sparse recovery of matrices, i.e., robust PCA. Unlike deterministic algorithms, our algorithms can report when the input is not sparse or low rank even in the presence of such an adversary. We use these recovery algorithms to improve upon and solve new problems in numerical linear algebra and combinatorial optimization on white-box adversarial streams. For example, we give the first efficient algorithm for outputting a matching in a graph with insertions and deletions to its edges provided the matching size is small, and otherwise we declare the matching size is large. We also improve the approximation versus memory tradeoff of previous work for estimating the number of non-zero elements in a vector and computing the matrix rank.
Online Network Source Optimization with Graph-Kernel MAB
Abstract
We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm that learns online the optimal source placement in large-scale networks, such that the reward obtained from a priori unknown network processes is maximized. The uncertainty calls for online learning, which however suffers from the curse of dimensionality. To achieve sample efficiency, we describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations. This enables a data-efficient learning framework whose learning rate scales with the dimension of the spectral representation model instead of that of the network. We then propose Grab-UCB, an online sequential decision strategy that learns the parameters of the spectral representation while optimizing the action strategy. We derive performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy. We also introduce a computationally simplified solving method, Grab-arm-Light, an algorithm that walks along the edges of the polytope representing the objective function. Simulation results show that the proposed online learning algorithm outperforms baseline offline methods that typically separate the learning phase from the testing one. The results confirm the theoretical findings and further highlight the gain of the proposed online learning strategy in terms of cumulative regret, sample efficiency, and computational complexity.
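The UCB-style action selection at the core of such algorithms can be illustrated with the textbook UCB1 rule. This is a generic sketch, not the paper's Grab-UCB (which operates on a learned spectral representation); the bonus constant `c` is illustrative.

```python
import numpy as np

def ucb1_select(counts, means, t, c=2.0):
    """Pick the arm maximizing empirical mean + exploration bonus;
    untried arms (count 0) are selected first."""
    bonus = np.sqrt(c * np.log(max(t, 2)) / np.maximum(counts, 1))
    scores = np.where(counts == 0, np.inf, means + bonus)
    return int(np.argmax(scores))

counts = np.array([0, 10, 10])
means = np.array([0.0, 0.9, 0.1])
arm = ucb1_select(counts, means, t=20)   # untried arm explored first
```

The bonus shrinks as an arm's pull count grows, so exploration naturally gives way to exploitation; graph-kernel variants replace the per-arm statistics with estimates shared through the network structure.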
INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers
Authors: Lakshmi Nair, Mikhail Bernadskiy, Arulselvan Madhavan, Craig Chan, Ayon Basumallik, Darius Bunandar
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Abstract
The recent rise of large language models (LLMs) has resulted in increased efforts towards running LLMs at reduced precision. Running LLMs at lower precision supports resource constraints and furthers their democratization, enabling users to run billion-parameter LLMs on their personal devices. To supplement this ongoing effort, we propose INT-FP-QSim: an open-source simulator that enables flexible evaluation of LLMs and vision transformers at various numerical precisions and formats. INT-FP-QSim leverages existing open-source repositories such as TensorRT, QPytorch and AIMET for a combined simulator that supports various floating point and integer formats. With the help of our simulator, we survey the impact of different numerical formats on the performance of LLMs and vision transformers at 4-bit weights and 4-bit or 8-bit activations. We also compare recently proposed methods like Adaptive Block Floating Point, SmoothQuant, GPTQ and RPTQ on the model performances. We hope INT-FP-QSim will enable researchers to flexibly simulate models at various precisions to support further research in quantization of LLMs and vision transformers.
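The kind of evaluation such simulators perform can be illustrated with a minimal symmetric "fake quantization" routine: quantize a tensor to a low-bit integer grid, then dequantize it so the rest of the model runs in floating point while seeing the quantization error. This is a generic sketch, not INT-FP-QSim's code; the per-tensor scaling policy and `bits` are illustrative.

```python
import numpy as np

def fake_quant(x, bits=4):
    """Quantize to signed `bits`-bit integers with a per-tensor symmetric
    scale, then dequantize back to float."""
    qmax = 2 ** (bits - 1) - 1
    amax = np.abs(x).max()
    scale = amax / qmax if amax > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

x = np.linspace(-1.0, 1.0, 101)
y = fake_quant(x, bits=4)        # at most 2**4 distinct values survive
```

Sweeping `bits` (and swapping the integer grid for a floating-point format) is how a simulator can compare, e.g., 4-bit weights against 8-bit activations without specialized hardware.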
Keyword: quantization
ITA: An Energy-Efficient Attention and Softmax Accelerator for Quantized Transformers
Authors: Gamze İslamoğlu (1), Moritz Scherer (1), Gianna Paulin (1), Tim Fischer (1), Victor J.B. Jung (1), Angelo Garofalo (1 and 2), Luca Benini (1 and 2) ((1) ETH Zürich, (2) University of Bologna)
Abstract
Transformer networks have emerged as the state-of-the-art approach for natural language processing tasks and are gaining popularity in other domains such as computer vision and audio processing. However, the efficient hardware acceleration of transformer models poses new challenges due to their high arithmetic intensities, large memory requirements, and complex dataflow dependencies. In this work, we propose ITA, a novel accelerator architecture for transformers and related models that targets efficient inference on embedded systems by exploiting 8-bit quantization and an innovative softmax implementation that operates exclusively on integer values. By computing on-the-fly in streaming mode, our softmax implementation minimizes data movement and energy consumption. ITA achieves competitive energy efficiency with respect to state-of-the-art transformer accelerators with 16.9 TOPS/W, while outperforming them in area efficiency with 5.93 TOPS/mm$^2$ in 22 nm fully-depleted silicon-on-insulator technology at 0.8 V.
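Integer-only softmax can be sketched with a precomputed exponential lookup table: the table is built offline in floating point, while the runtime path uses only integer subtraction, lookup, and division. This is a generic illustration of the idea, not ITA's hardware algorithm (which uses a streaming on-the-fly scheme); the table size and fixed-point scale are assumptions.

```python
import numpy as np

def int_softmax(logits_q, scale, frac_bits=8):
    """Softmax over quantized logits (real value = logits_q * scale).
    Returns fixed-point probabilities scaled by 2**frac_bits."""
    x = logits_q - logits_q.max()                      # entries <= 0
    vmin = int(x.min())
    # Offline: integer LUT of round(exp(v * scale) * 2**frac_bits)
    lut = np.round(np.exp(np.arange(vmin, 1) * scale)
                   * (1 << frac_bits)).astype(np.int64)
    e = lut[x - vmin]                                  # integer "exp"
    return (e * (1 << frac_bits)) // e.sum()           # Q(frac_bits) probs

q = np.array([8, 4, 0], dtype=np.int64)   # toy quantized logits
p = int_softmax(q, scale=0.25)            # real logits: [2, 1, 0]
```

Because the max is subtracted first, all table arguments are non-positive and the table stays small; the division at the end is the only wide integer operation.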
INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers
Authors: Lakshmi Nair, Mikhail Bernadskiy, Arulselvan Madhavan, Craig Chan, Ayon Basumallik, Darius Bunandar
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Abstract
The recent rise of large language models (LLMs) has resulted in increased efforts towards running LLMs at reduced precision. Running LLMs at lower precision supports resource constraints and furthers their democratization, enabling users to run billion-parameter LLMs on their personal devices. To supplement this ongoing effort, we propose INT-FP-QSim: an open-source simulator that enables flexible evaluation of LLMs and vision transformers at various numerical precisions and formats. INT-FP-QSim leverages existing open-source repositories such as TensorRT, QPytorch and AIMET for a combined simulator that supports various floating point and integer formats. With the help of our simulator, we survey the impact of different numerical formats on the performance of LLMs and vision transformers at 4-bit weights and 4-bit or 8-bit activations. We also compare recently proposed methods like Adaptive Block Floating Point, SmoothQuant, GPTQ and RPTQ on the model performances. We hope INT-FP-QSim will enable researchers to flexibly simulate models at various precisions to support further research in quantization of LLMs and vision transformers.
Keyword: efficient
A Scalable Approach to Performing Multiplication and Matrix Dot-Products in Unary
Joint Computing Offloading and Resource Allocation for Classification Intelligent Tasks in MEC Systems
Sparse Graphical Linear Dynamical Systems
LFA-tuned matrix-free multigrid method for the elastic Helmholtz equation
Recovery of Multiple Parameters in Subdiffusion from One Lateral Boundary Measurement
ADASSM: Adversarial Data Augmentation in Statistical Shape Models From Images
Physics-Infused Machine Learning Based Prediction of VTOL Aerodynamics with Sparse Datasets
Efficient parallel implementation of the multiplicative weight update method for graph-based linear programs
Deciphering the Drivers of Smart Livestock Technology Adoption in Japan: A Scoping Review, Expert Interviews, and Grounded Theory Approach
Point spread function approximation of high rank Hessians with locally supported non-negative integral kernels
Distilled Pruning: Using Synthetic Data to Win the Lottery
All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment
Efficient Ground Vehicle Path Following in Game AI
On Formal Feature Attribution and Its Approximation
Teaching Arithmetic to Small Transformers
A Network Resource Allocation Recommendation Method with An Improved Similarity Measure
Unsupervised Hyperspectral and Multispectral Images Fusion Based on the Cycle Consistency
NOFA: NeRF-based One-shot Facial Avatar Reconstruction
A GPU-accelerated simulator for the DEM analysis of granular systems composed of clump-shaped elements
Discovering Hierarchical Achievements in Reinforcement Learning via Contrastive Learning
ITA: An Energy-Efficient Attention and Softmax Accelerator for Quantized Transformers
Improving the accuracy of Raviart-Thomas mixed elements in two-dimensional smooth domains with straight-edged triangles
Derivative Free Weight-space Ensembling
Incentive Allocation in Vertical Federated Learning Based on Bankruptcy Problem
Improved Algorithms for White-Box Adversarial Streams
Joint Perceptual Learning for Enhancement and Object Detection in Underwater Scenarios
Edge Element Approximation for the Spherical Interface Dynamo System
The impact of body and head dynamics on motion comfort assessment
Online Network Source Optimization with Graph-Kernel MAB
Simulation-free Schrödinger bridges via score and flow matching
Undecimated Wavelet Transform for Word Embedded Semantic Marginal Autoencoder in Security improvement and Denoising different Languages
Keyword: faster
Efficient parallel implementation of the multiplicative weight update method for graph-based linear programs
Distilled Pruning: Using Synthetic Data to Win the Lottery
RGB-D Mapping and Tracking in a Plenoxel Radiance Field
Improving Bitswap Privacy with Forwarding and Source Obfuscation
MALIBO: Meta-learning for Likelihood-free Bayesian Optimization
Keyword: mobile
Joint Computing Offloading and Resource Allocation for Classification Intelligent Tasks in MEC Systems
Facial Landmark Detection Evaluation on MOBIO Database
Metropolitan Scale and Longitudinal Dataset of Anonymized Human Mobility Trajectories
In A Society of Strangers, Kin Is Still Key: Identified Family Relations In Large-Scale Mobile Phone Data
ContextLabeler Dataset: physical and virtual sensors data collected from smartphone usage in-the-wild
LTE SFBC MIMO Transmitter Modelling and Performance Evaluation
Keyword: pruning
Distilled Pruning: Using Synthetic Data to Win the Lottery
Keyword: diffusion
Recovery of Multiple Parameters in Subdiffusion from One Lateral Boundary Measurement
Point spread function approximation of high rank Hessians with locally supported non-negative integral kernels
Simulation-free Schrödinger bridges via score and flow matching
Keyword: adaptive
PREADD: Prefix-Adaptive Decoding for Controlled Text Generation
Adaptive Generation of Privileged Intermediate Information for Visible-Infrared Person Re-Identification
CSCLog: A Component Subsequence Correlation-Aware Log Anomaly Detection Method
Swin Transformer-Based Dynamic Semantic Communication for Multi-User with Different Computing Capacity
Anableps: Adapting Bitrate for Real-Time Communication Using VBR-encoded Video
Large AI Model-Based Semantic Communications
Improved Algorithms for White-Box Adversarial Streams
Online Network Source Optimization with Graph-Kernel MAB
INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers
Keyword: quantization
ITA: An Energy-Efficient Attention and Softmax Accelerator for Quantized Transformers
INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers