New submissions for Wed, 15 Nov 23

Keyword: efficient

Polarimetric PatchMatch Multi-View Stereo

Authors: Jinyu Zhao, Jumpei Oishi, Yusuke Monno, Masatoshi Okutomi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.07600
Pdf link: https://arxiv.org/pdf/2311.07600
Abstract PatchMatch Multi-View Stereo (PatchMatch MVS) is one of the popular MVS approaches, owing to its balanced accuracy and efficiency. In this paper, we propose Polarimetric PatchMatch Multi-View Stereo (PolarPMS), the first method to exploit polarization cues in PatchMatch MVS. The key idea of PatchMatch MVS is to generate depth and normal hypotheses, which form local 3D planes and slanted stereo matching windows, and to efficiently search for the best hypothesis based on the consistency among multi-view images. In addition to standard photometric consistency, our PolarPMS evaluates polarimetric consistency to assess the validity of a depth and normal hypothesis, motivated by the physical property that polarimetric information is related to the object's surface normal. Experimental results demonstrate that our PolarPMS can improve the accuracy and the completeness of reconstructed 3D models, especially for texture-less surfaces, compared with state-of-the-art PatchMatch MVS methods.
Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors
Authors: Sameer Deshmukh, Rio Yokota, George Bosilca
Subjects: Performance (cs.PF); Mathematical Software (cs.MS)
Arxiv link: https://arxiv.org/abs/2311.07602
Pdf link: https://arxiv.org/pdf/2311.07602
Abstract Factorization and multiplication of dense matrices and tensors are critical, yet extremely expensive pieces of the scientific toolbox. Careful use of low rank approximation can drastically reduce the computation and memory requirements of these operations. In addition to a lower arithmetic complexity, such methods can, by their structure, be designed to efficiently exploit modern hardware architectures. The majority of existing work relies on batched BLAS libraries to handle the computation of many small dense matrices. We show that through careful analysis of the cache utilization, register accumulation using SIMD registers and a redesign of the implementation, one can achieve significantly higher throughput for these types of batched low-rank matrices across a large range of block and batch sizes. We test our algorithm on three CPUs with diverse ISAs (the Fujitsu A64FX with ARM SVE, the Intel Xeon 6148 with AVX-512, and the AMD EPYC 7502 with AVX2) and show that our new batching methodology is able to obtain more than twice the throughput of vendor optimized libraries for all CPU architectures and problem sizes.
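To make the low-rank idea concrete: if each block is stored as a product of thin factors, a batch of block products can be formed without ever building the dense blocks. The sketch below is a minimal NumPy illustration under that assumption (factor shapes `(batch, b, r)`, hypothetical function name `batched_lowrank_matmul`); it relies on NumPy's batched `matmul` and does not reproduce the paper's cache-aware, SIMD-register kernels.

```python
import numpy as np


def batched_lowrank_matmul(U1, V1, U2, V2):
    """Multiply a batch of low-rank blocks A_i = U1_i V1_i^T and B_i = U2_i V2_i^T.

    All factors have shape (batch, b, r) with block size b and rank r.
    Returns factors (L, R) such that A_i @ B_i == L_i @ R_i^T.
    """
    # Small r x r core per block: V1_i^T U2_i, so no b x b matrix is ever formed.
    core = np.matmul(V1.transpose(0, 2, 1), U2)  # (batch, r, r)
    L = np.matmul(U1, core)                       # (batch, b, r)
    return L, V2                                  # A_i B_i = L_i V2_i^T


rng = np.random.default_rng(0)
batch, b, r = 64, 32, 4
U1, V1, U2, V2 = (rng.standard_normal((batch, b, r)) for _ in range(4))
L, R = batched_lowrank_matmul(U1, V1, U2, V2)

# Reference: build the dense blocks explicitly and multiply them.
dense = np.matmul(U1 @ V1.transpose(0, 2, 1), U2 @ V2.transpose(0, 2, 1))
print(np.allclose(L @ R.transpose(0, 2, 1), dense))
```

The rank-r core keeps the work per block at roughly O(b r^2) instead of O(b^3) for dense b x b blocks; the paper's gains come from how such small products are scheduled on the hardware, which this sketch does not model.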
PECoP: Parameter Efficient Continual Pretraining for Action Quality Assessment
Authors: Amirhossein Dadashzadeh, Shuchao Duan, Alan Whone, Majid Mirmehdi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.07603
Pdf link: https://arxiv.org/pdf/2311.07603
Abstract The limited availability of labelled data in Action Quality Assessment (AQA) has forced previous works to fine-tune models pretrained on large-scale domain-general datasets. This common approach results in weak generalisation, particularly when there is a significant domain shift. We propose a novel, parameter-efficient, continual pretraining framework, PECoP, to reduce such domain shift via an additional pretraining stage. In PECoP, we introduce 3D-Adapters, inserted into the pretrained model, to learn spatiotemporal, in-domain information via self-supervised learning where only the adapter modules' parameters are updated. We demonstrate PECoP's ability to enhance the performance of recent state-of-the-art methods (MUSDL, CoRe, and TSA) applied to AQA, leading to considerable improvements on the benchmark datasets JIGSAWS ($\uparrow6.0\%$), MTL-AQA ($\uparrow0.99\%$), and FineDiving ($\uparrow2.54\%$). We also present a new Parkinson's Disease dataset, PD4T, of real patients performing four different actions, on which we surpass the state of the art ($\uparrow3.56\%$). Our code, pretrained models, and the PD4T dataset are available at https://github.com/Plrbear/PECoP.
On Algorithmic Cache Optimization
Authors: Neil Bhavikatti
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2311.07615
Pdf link: https://arxiv.org/pdf/2311.07615
Abstract We study matrix-matrix multiplication of two matrices, $A$ and $B$, each of size $n \times n$. This operation results in a matrix $C$ of size $n\times n$. Our goal is to produce $C$ as efficiently as possible given a cache: a 1-D limited set of data values that we can work with to perform elementary operations (additions, multiplications, etc.). That is, we attempt to reuse the maximum amount of data from $A$, $B$ and $C$ during our computation (or equivalently, utilize data in the fast-access cache as often as possible). First, we introduce the matrix-matrix multiplication algorithm. Second, we present a standard two-memory model to simulate the architecture of a computer, and we explain the LRU (Least Recently Used) cache policy (which is standard in most computers). Third, we introduce a basic Cache Simulator model, which has $\mathcal{O}(M)$ time complexity (meaning we are limited to small $M$ values). Then we discuss and model the LFU (Least Frequently Used) cache policy and the explicit control cache policy. Finally, we introduce the main result of this paper, the $\mathcal{O}(1)$ Cache Simulator, and use it to experimentally compare the savings in time, energy, and communication achieved by the ideal cache-efficient algorithm for matrix-matrix multiplication. The Cache Simulator simulates the amount of data movement that occurs between the main memory and the cache of the computer. One of the findings of this project is that, in some cases, there is a significant discrepancy in communication values between an LRU cache algorithm and explicit cache control. We propose to alleviate this problem by "tricking" the LRU cache algorithm: we update the timestamp of the data we want to keep in cache (namely, entries of matrix $C$). This gives us the benefits of an explicit cache policy while being constrained by the LRU paradigm (the realistic policy on a CPU).
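A toy version of the kind of simulation described above: a fully associative, word-granular LRU cache that counts misses (words moved from main memory) for a naive and a tiled matrix multiplication. This is a minimal sketch under those simplifying assumptions, not the paper's $\mathcal{O}(1)$ Cache Simulator; the names `LRUCache` and `matmul_traffic` are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative LRU cache holding at most M single-word entries."""

    def __init__(self, M):
        self.M = M
        self.store = OrderedDict()
        self.misses = 0  # each miss models one word moved from main memory

    def access(self, key):
        if key in self.store:
            self.store.move_to_end(key)  # hit: refresh the entry's recency
        else:
            self.misses += 1
            if len(self.store) >= self.M:
                self.store.popitem(last=False)  # evict the least recently used entry
            self.store[key] = True


def matmul_traffic(n, M, block=None):
    """Count words loaded into an LRU cache while computing C = A @ B for n x n matrices.

    block=None gives the plain i-j-k loop; otherwise the loops are tiled with tile size `block`.
    """
    cache = LRUCache(M)
    bs = block or n
    for ii in range(0, n, bs):
        for jj in range(0, n, bs):
            for kk in range(0, n, bs):
                for i in range(ii, min(ii + bs, n)):
                    for j in range(jj, min(jj + bs, n)):
                        for k in range(kk, min(kk + bs, n)):
                            # C[i][j] += A[i][k] * B[k][j] touches three words
                            cache.access(("A", i, k))
                            cache.access(("B", k, j))
                            cache.access(("C", i, j))
    return cache.misses


print(matmul_traffic(32, 256), matmul_traffic(32, 256, block=8))
```

With n = 32 and M = 256, three 8 x 8 tiles fit in the cache, so the tiled loop should report far fewer misses than the naive loop, which streams all of B for every row of A.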
ReIDTracker Sea: the technical report of BoaTrack and SeaDronesSee-MOT challenge at MaCVi of WACV24
Authors: Kaer Huang, Weitu Chong
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.07616
Pdf link: https://arxiv.org/pdf/2311.07616
Abstract Multi-Object Tracking is one of the most important technologies in maritime computer vision. Our solution explores Multi-Object Tracking in maritime Unmanned Aerial Vehicle (UAV) and Unmanned Surface Vehicle (USV) usage scenarios. Most current Multi-Object Tracking algorithms require complex association strategies and association information (2D location and motion, 3D motion, 3D depth, 2D appearance) to achieve better performance, which makes the entire tracking system extremely complex and heavy. At the same time, most of them still require video annotation data for training, which is costly to obtain. Our solution explores Multi-Object Tracking in a completely unsupervised way. The scheme learns instance representations through self-supervision on ImageNet. Then, combined with high-quality detectors, it completes the multi-target tracking task simply and efficiently. The scheme achieved top-3 performance on both the UAV-based Multi-Object Tracking with Reidentification and the USV-based Multi-Object Tracking benchmarks, and the solution has won first place in multiple Multi-Object Tracking competitions, such as BDD100K MOT, MOTS, and Waymo 2D MOT.
EPIM: Efficient Processing-In-Memory Accelerators based on Epitome
Authors: Chenyu Wang, Zhen Dong, Daquan Zhou, Zhenhua Zhu, Yu Wang, Jiashi Feng, Kurt Keutzer
Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.07620
Pdf link: https://arxiv.org/pdf/2311.07620
Abstract The exploration of Processing-In-Memory (PIM) accelerators has garnered significant attention within the research community. However, deploying large-scale neural networks on PIM accelerators is challenging due to constrained on-chip memory capacity. To tackle this issue, current works explore model compression algorithms to reduce the size of Convolutional Neural Networks (CNNs). Most of these algorithms either aim to represent neural operators with reduced-size parameters (e.g., quantization) or search for the best combinations of neural operators (e.g., neural architecture search). Designing neural operators to align with PIM accelerators' specifications is an area that warrants further study. In this paper, we introduce the Epitome, a lightweight neural operator offering convolution-like functionality, to craft memory-efficient CNN operators for PIM accelerators (EPIM). On the software side, we evaluate epitomes' latency and energy on PIM accelerators and introduce a PIM-aware layer-wise design method to enhance their hardware efficiency. We apply epitome-aware quantization to further reduce the size of epitomes. On the hardware side, we modify the datapath of current PIM accelerators to accommodate epitomes and implement a feature map reuse technique to reduce computation cost. Experimental results reveal that our 3-bit quantized EPIM-ResNet50 attains 71.59% top-1 accuracy on ImageNet, reducing crossbar area by a factor of 30.65. EPIM surpasses the state-of-the-art pruning methods on PIM.
Activity Sparsity Complements Weight Sparsity for Efficient RNN Inference
Authors: Rishav Mukherji, Mark Schöne, Khaleelulla Khan Nazeer, Christian Mayr, Anand Subramoney
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.07625
Pdf link: https://arxiv.org/pdf/2311.07625
Abstract Artificial neural networks open up unprecedented machine learning capabilities at the cost of ever-growing computational requirements. Sparsifying the parameters, often achieved through weight pruning, has been identified as a powerful technique to reduce the number of model parameters and the computational operations of neural networks. Yet, sparse activations, while omnipresent in both biological neural networks and deep learning systems, have not been fully utilized as a compression technique in deep learning. Moreover, the interaction between sparse activations and weight pruning is not fully understood. In this work, we demonstrate that activity sparsity can compose multiplicatively with parameter sparsity in a recurrent neural network model based on the GRU that is designed to be activity sparse. We achieve up to $20\times$ reduction of computation while maintaining perplexities below $60$ on the Penn Treebank language modeling task. This magnitude of reduction has not been achieved previously with solely sparsely connected LSTMs, and the language modeling performance of our model has not been achieved previously with any sparsely activated recurrent neural networks or spiking neural networks. Neuromorphic computing devices are especially good at taking advantage of the dynamic activity sparsity, and our results provide strong evidence that making deep learning models activity sparse and porting them to neuromorphic devices can be a viable strategy that does not compromise on task performance. Our results also drive further convergence of methods from deep learning and neuromorphic computing for efficient machine learning.
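A back-of-the-envelope check of the multiplicative claim: if a matrix-vector product skips both zero weights and zero activations, and the two sparsity patterns are not correlated, the surviving multiply-accumulates scale with the product of the two densities. The sketch below (hypothetical helper `sparse_macs`, hand-built patterns) only counts operations; it does not model the activity-sparse GRU from the paper or any real speedup on hardware.

```python
def sparse_macs(W, x):
    """Count multiply-accumulates in y = W @ x when zero weights AND zero activations are skipped."""
    active = [j for j, v in enumerate(x) if v != 0.0]  # activity sparsity: skip zero inputs
    return sum(1 for row in W for j in active if row[j] != 0.0)  # weight sparsity: skip zero weights


n = 100
# Hypothetical structured patterns: 20% of weights and 25% of activations are nonzero.
W = [[1.0 if (i + j) % 5 == 0 else 0.0 for j in range(n)] for i in range(n)]
x = [1.0 if j % 4 == 0 else 0.0 for j in range(n)]

dense_macs = n * n
print(sparse_macs(W, x) / dense_macs)  # 0.05, i.e. 0.2 * 0.25
```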
Rethinking and Benchmarking Predict-then-Optimize Paradigm for Combinatorial Optimization Problems
Authors: Haoyu Geng, Han Ruan, Runzhong Wang, Yang Li, Yang Wang, Lei Chen, Junchi Yan
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2311.07633
Pdf link: https://arxiv.org/pdf/2311.07633
Abstract Numerous web applications rely on solving combinatorial optimization problems, such as energy cost-aware scheduling, budget allocation on web advertising, and graph matching on social networks. However, many optimization problems involve unknown coefficients, and improper predictions of these factors may lead to inferior decisions which may cause energy wastage, inefficient resource allocation, inappropriate matching in social networks, etc. This research topic is referred to as "Predict-Then-Optimize (PTO)", which considers the performance of prediction and decision-making in a unified system. A noteworthy recent development is end-to-end methods that directly optimize the final decision quality and are claimed to yield better results than the traditional two-stage approach. However, the evaluation benchmarks in this field are fragmented and the effectiveness of various models in different scenarios remains unclear, hindering the comprehensive assessment and fast deployment of these methods. To address these issues, we provide a comprehensive categorization of current approaches and integrate existing experimental scenarios to establish a unified benchmark, elucidating the circumstances under which end-to-end training yields improvements, as well as the contexts in which it performs ineffectively. We also introduce and open-source a new dataset for an industrial combinatorial advertising problem in inclusive finance. We hope the rethinking and benchmarking of PTO could facilitate more convenient evaluation and deployment, and inspire further improvements in both academia and industry within this field.
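For readers new to Predict-Then-Optimize, a common way to score a decision is regret: the true objective of the best decision minus the true objective of the decision made from predicted coefficients. The toy below uses a hypothetical top-k budget-allocation problem and made-up numbers; it is a sketch of the evaluation idea, not the benchmark's code or API.

```python
def top_k(values, k):
    """Decision rule: select the k items with the largest (true or predicted) value."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]


def regret(true_values, predicted_values, k):
    """Two-stage PTO: optimize on the predictions, then score the decision with the true coefficients."""
    best = sum(true_values[i] for i in top_k(true_values, k))
    chosen = sum(true_values[i] for i in top_k(predicted_values, k))
    return best - chosen


true_values = [5.0, 3.0, 8.0, 1.0]
predicted = [4.0, 6.0, 7.0, 2.0]  # hypothetical model predictions
print(regret(true_values, predicted, k=2))  # 2.0
```

A two-stage method fits its predictions to minimize prediction error alone, whereas end-to-end methods train against a decision-aware loss such as this regret. Here the predictions swap the order of items 0 and 1, which costs 2.0 units of true value.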
Estimating the matrix $p \rightarrow q$ norm
Authors: Larry Guth, Dominique Maldague, John Urschel
Subjects: Data Structures and Algorithms (cs.DS); Functional Analysis (math.FA)
Arxiv link: https://arxiv.org/abs/2311.07677
Pdf link: https://arxiv.org/pdf/2311.07677
Abstract The matrix $p \rightarrow q$ norm is a fundamental quantity appearing in a variety of areas of mathematics. This quantity is known to be efficiently computable in only a few special cases. The best known algorithms for approximately computing this quantity with theoretical guarantees essentially consist of computing the $p\to q$ norm for $p,q$ where this quantity can be computed exactly or up to a constant, and applying interpolation. We analyze the matrix $2 \to q$ norm problem and provide an improved approximation algorithm via a simple argument involving the rows of a given matrix. For example, we improve the best-known $2\to 4$ norm approximation from $m^{1/8}$ to $m^{1/12}$. This insight for the $2\to q$ norm improves the best known $p \to q$ approximation algorithm for the region $p \le 2 \le q$, and leads to an overall improvement in the best-known approximation for $p \to q$ norms from $m^{25/128}$ to $m^{3 - 2 \sqrt{2}}$.
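For reference, the quantity being approximated and the size of the improvements in the exponents quoted above (with $m$ as in the abstract):

```latex
\|A\|_{p \to q} = \max_{x \neq 0} \frac{\|Ax\|_q}{\|x\|_p},
\qquad
\tfrac{1}{12} \approx 0.0833 < \tfrac{1}{8} = 0.125,
\qquad
3 - 2\sqrt{2} \approx 0.1716 < \tfrac{25}{128} \approx 0.1953 .
```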
Matching aggregate posteriors in the variational autoencoder
Authors: Surojit Saha, Sarang Joshi, Ross Whitaker
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.07693
Pdf link: https://arxiv.org/pdf/2311.07693
Abstract The variational autoencoder (VAE) is a well-studied, deep, latent-variable model (DLVM) that efficiently optimizes the variational lower bound of the log marginal data likelihood and has a strong theoretical foundation. However, the VAE's known failure to match the aggregate posterior often results in pockets/holes in the latent distribution (i.e., a failure to match the prior) and/or posterior collapse, which is associated with a loss of information in the latent space. This paper addresses these shortcomings in VAEs by reformulating the objective function associated with VAEs in order to match the aggregate/marginal posterior distribution to the prior. We use kernel density estimation (KDE) to model the aggregate posterior in high dimensions. The proposed method is named the aggregate variational autoencoder (AVAE) and is built on the theoretical framework of the VAE. Empirical evaluation of the proposed method on multiple benchmark data sets demonstrates the effectiveness of the AVAE relative to state-of-the-art (SOTA) methods.
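The aggregate posterior can be approximated by a kernel density estimate over a batch of latent codes. The sketch below evaluates the log-density of an isotropic Gaussian KDE; the Gaussian kernel, the fixed `bandwidth`, and the function name are assumptions, and how the AVAE objective uses this estimate to match the prior is not shown.

```python
import math


def kde_log_density(z, samples, bandwidth):
    """Log-density at point z of an isotropic Gaussian KDE centred on `samples`.

    z: list of d floats; samples: list of d-dimensional points (e.g., a batch of latent codes).
    """
    d = len(z)
    log_norm = -0.5 * d * math.log(2.0 * math.pi * bandwidth ** 2)
    log_terms = [
        log_norm - sum((zi - si) ** 2 for zi, si in zip(z, s)) / (2.0 * bandwidth ** 2)
        for s in samples
    ]
    # log-sum-exp keeps the average of the kernel values numerically stable
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms)) - math.log(len(samples))


# Hypothetical batch of 2-D latent codes and a query point.
codes = [[0.0, 0.0], [1.0, 0.5], [-0.5, 1.0]]
print(kde_log_density([0.2, 0.3], codes, bandwidth=0.5))
```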
Chaotic dynamics of two-dimensional flows around a cylinder
Authors: L. Ridgway Scott, Rebecca Durst
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2311.07698
Pdf link: https://arxiv.org/pdf/2311.07698
Abstract We study flow around a cylinder from a dynamics perspective, using drag and lift as indicators. We observe that the mean drag coefficient bifurcates from the steady case when the Kármán vortex street emerges. We also find a jump in the dimension of the drag/lift attractor just above Reynolds number 100. We compare the simulated drag values with experimental data obtained over the last hundred years. Our simulations suggest that a vibrational resonance in the cylinder would be unlikely for Reynolds numbers greater than 1000, where the drag/lift behavior is fully chaotic.
AuthentiGPT: Detecting Machine-Generated Text via Black-Box Language Models Denoising
Authors: Zhen Guo, Shangdi Yu
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.07700
Pdf link: https://arxiv.org/pdf/2311.07700
Abstract Large language models (LLMs) have opened up enormous opportunities while simultaneously posing ethical dilemmas. One of the major concerns is their ability to create text that closely mimics human writing, which can lead to potential misuse, such as academic misconduct, disinformation, and fraud. To address this problem, we present AuthentiGPT, an efficient classifier that distinguishes between machine-generated and human-written texts. Under the assumption that human-written text resides outside the distribution of machine-generated text, AuthentiGPT leverages a black-box LLM to denoise input text with artificially added noise, and then semantically compares the denoised text with the original to determine if the content is machine-generated. With only one trainable parameter, AuthentiGPT eliminates the need for a large training dataset, watermarking the LLM's output, or computing the log-likelihood. Importantly, the detection capability of AuthentiGPT can be easily adapted to any generative language model. With a 0.918 AUROC score on a domain-specific dataset, AuthentiGPT demonstrates greater effectiveness than other commercial algorithms, highlighting its potential for detecting machine-generated text in academic settings.
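The denoise-and-compare pipeline can be sketched as follows, under loud assumptions: `denoise` is a hypothetical wrapper around a black-box LLM, the noise model masks random words, and lexical similarity from Python's `difflib` stands in for the semantic comparison the abstract describes. The decision `threshold` plays the role of the single trainable parameter.

```python
import random
from difflib import SequenceMatcher


def add_noise(text, mask_rate, rng):
    """Assumed noise model: replace a random fraction of words with a mask token."""
    return " ".join("<mask>" if rng.random() < mask_rate else w for w in text.split())


def is_machine_generated(text, denoise, threshold, mask_rate=0.15, seed=0):
    """Return True if the black-box denoiser restores `text` closely enough.

    denoise: hypothetical callable that sends the noisy text to an LLM and returns its output.
    threshold: the single tunable parameter of this sketch.
    """
    rng = random.Random(seed)
    restored = denoise(add_noise(text, mask_rate, rng))
    # Lexical similarity stands in for the paper's semantic comparison.
    similarity = SequenceMatcher(None, text, restored).ratio()
    return similarity >= threshold


text = "the quick brown fox jumps over the lazy dog"
print(is_machine_generated(text, lambda noisy: text, threshold=0.9))  # True
```

In practice the threshold would be fit on labelled examples; the identity "denoiser" in the usage line only exercises the control flow.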
Histopathologic Cancer Detection
Authors: Varan Singh Rohila, Neeraj Lalwani, Lochan Basyal
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.07711
Pdf link: https://arxiv.org/pdf/2311.07711
Abstract Early detection of cancer cells is essential for planning effective treatment and for the health and safety of the patient. Nowadays, doctors usually use a histological grade that pathologists determine by performing a semi-quantitative analysis of the histopathological and cytological features of hematoxylin-eosin (HE) stained histopathological images. This research contributes a potential classification model for cancer prognosis that efficiently utilizes the valuable information underlying HE-stained histopathological images. This work uses the PatchCamelyon benchmark dataset to train a multi-layer perceptron (MLP) and a convolutional model, and evaluates their performance in terms of precision, recall, F1 score, accuracy, and AUC score. The evaluation results show that the baseline convolutional model outperforms the baseline MLP model. The paper also applies ResNet50 and InceptionNet models with data augmentation, and ResNet50 is able to beat the state-of-the-art model. Furthermore, majority-vote and concatenation ensembles are evaluated, and future directions are outlined for using transfer learning and segmentation to understand the specific features.
Low-Cost Architecture for an Advanced Smart Shower System Using Internet of Things Platform
Authors: Shadeeb Hossain, Ahmed Abdelgawad
Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2311.07712
Pdf link: https://arxiv.org/pdf/2311.07712
Abstract Wastage of water is a critical issue amongst the various global crises. This paper proposes an architecture model for a low-cost, energy-efficient SMART Shower system that supports efficient water management and can reliably predict accidental falls in the shower space. The sensors in this prototype record the surrounding temperature and humidity in real time and thereby deliver water at the ideal temperature for the user, rather than relying on predictive values. Three different scenarios are discussed that allow reliable prediction of accidental falls in the shower vicinity. Motion sensors, sound sensors and gesture sensors can be used to complement the prediction of possible injuries in the shower. Integration with an Internet of Things (IoT) platform allows caretakers to monitor activities in the shower space, especially for elderly individuals, as casualties have been reported in slippery shower spaces. The proposed proof-of-concept prototype is cost-effective and can be incorporated into an existing system for added safety and convenience. The intelligent system conserves water by optimizing its flow temperature, and the IoT platform allows real-time monitoring for safety.
Sparse Regression LDPC Codes
Authors: Jamison R. Ebert, Jean-Francois Chamberland, Krishna R. Narayanan
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2311.07720
Pdf link: https://arxiv.org/pdf/2311.07720
Abstract This article introduces a novel concatenated coding scheme called sparse regression LDPC (SR-LDPC) codes. An SR-LDPC code consists of an outer non-binary LDPC code and an inner sparse regression code (SPARC) whose respective field size and section sizes are equal. For such codes, an efficient decoding algorithm is proposed based on approximate message passing (AMP) that dynamically shares soft information between inner and outer decoders. This dynamic exchange of information is facilitated by a denoiser that runs belief propagation (BP) on the factor graph of the outer LDPC code within each AMP iteration. It is shown that this denoiser falls within the class of non-separable pseudo-Lipschitz denoising functions and thus that state evolution holds for the proposed AMP-BP algorithm. Leveraging the rich structure of SR-LDPC codes, this article proposes an efficient low-dimensional approximate state evolution recursion that can be used for efficient hyperparameter tuning, thus paving the way for future work on optimal code design. Finally, numerical simulations demonstrate that SR-LDPC codes outperform contemporary codes over the AWGN channel for parameters of practical interest. SR-LDPC codes are shown to be a viable means of obtaining shaping gains over the AWGN channel.
Near-Field Integrated Sensing, Positioning, and Communication: A Downlink and Uplink Framework
Authors: Haochen Li, Zhaolin Wang, Xidong Mu, Zhiwen Pan, Yuanwei Liu
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2311.07722
Pdf link: https://arxiv.org/pdf/2311.07722
Abstract A near-field integrated sensing, positioning, and communication (ISPAC) framework is proposed, where a base station (BS) simultaneously serves multiple communication users and carries out target sensing and positioning. A novel double-array structure is proposed to enable the near-field ISPAC at the BS. Specifically, a small-scale assisting transceiver (AT) is attached to the large-scale main transceiver (MT) to empower the communication system with the ability of sensing and positioning. Based on the proposed framework, the joint angle and distance Cramér-Rao bound (CRB) is first derived. Then, the CRB is minimized subject to the minimum communication rate requirement in both downlink and uplink ISPAC scenarios: 1) For downlink ISPAC, a downlink target positioning algorithm is proposed and a penalty dual decomposition (PDD)-based double-loop algorithm is developed to tackle the non-convex optimization problem. 2) For uplink ISPAC, an uplink target positioning algorithm is proposed and an efficient alternating optimization algorithm is conceived to solve the non-convex CRB minimization problem with coupled user communication and target probing design. Both proposed optimization algorithms can converge to a stationary point of the CRB minimization problem. Numerical results show that: 1) The proposed ISPAC system can locate the target in both angle and distance domains relying only on a single BS and limited bandwidths; and 2) the positioning performance achieved by the hybrid-analog-and-digital ISPAC approaches that achieved by fully digital ISPAC when the communication rate requirement is not stringent.
Quality-Aware Prototype Memory for Face Representation Learning
Authors: Evgeny Smirnov, Vasiliy Galyuk, Evgeny Lukyanets
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.07734
Pdf link: https://arxiv.org/pdf/2311.07734
Abstract Prototype Memory is a powerful model for face representation learning. It enables the training of face recognition models using datasets of any size, with on-the-fly generation of prototypes (classifier weights) and efficient ways of utilizing them. Prototype Memory demonstrated strong results in many face recognition benchmarks. However, its prototype generation algorithm is prone to producing imperfect prototypes when the images selected for prototype creation contain low-quality or poorly recognizable faces. All images of the same person present in the mini-batch are used with equal weights, so the resulting averaged prototype can be contaminated by the imperfect embeddings of such face images. This can lead to misdirected training signals and impair the performance of the trained face recognition models. In this paper, we propose a simple and effective way to improve Prototype Memory with quality-aware prototype generation. Quality-Aware Prototype Memory uses different weights for images of different quality in the process of prototype generation. With this improvement, prototypes draw more valuable information from high-quality images and are less affected by low-quality ones. We propose and compare several methods of quality estimation and usage, perform extensive experiments on various face recognition benchmarks and demonstrate the advantages of the proposed model compared to the basic version of Prototype Memory.
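In its simplest form, quality-aware prototype generation replaces the plain average of a person's embeddings with a quality-weighted average. The sketch below assumes non-negative quality scores from some external estimator and unit-normalized prototypes; the function names are hypothetical, and the paper compares several quality estimation and weighting methods that are not reproduced here. Setting all weights equal recovers plain averaging.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def quality_weighted_prototype(embeddings, qualities):
    """Quality-weighted mean of one identity's embeddings, re-normalized to unit length.

    embeddings: list of equal-length vectors for the same person in the mini-batch.
    qualities: non-negative scores (assumed to come from a face-quality estimator).
    """
    total = sum(qualities)
    dim = len(embeddings[0])
    proto = [sum(q * e[d] for q, e in zip(qualities, embeddings)) / total for d in range(dim)]
    return l2_normalize(proto)


# A sharp image (quality 3.0) dominates a blurry one (quality 1.0).
print(quality_weighted_prototype([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0]))
```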
Modeling Sequences as Star Graphs to Address Over-smoothing in Self-attentive Sequential Recommendation
Authors: Bo Peng, Ziqi Chen, Srinivasan Parthasarathy, Xia Ning
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2311.07742
Pdf link: https://arxiv.org/pdf/2311.07742
Abstract Self-attention (SA) mechanisms have been widely used in developing sequential recommendation (SR) methods, and demonstrated state-of-the-art performance. However, in this paper, we show that self-attentive SR methods substantially suffer from the over-smoothing issue that item embeddings within a sequence become increasingly similar across attention blocks. As widely demonstrated in the literature, this issue could lead to a loss of information in individual items, and significantly degrade models' scalability and performance. To address the over-smoothing issue, in this paper, we view the items within a sequence as constituting a star graph and develop a method, denoted as MSSG, for SR. Different from existing self-attentive methods, MSSG introduces an additional internal node to specifically capture the global information within the sequence, and does not require information propagation among items. This design fundamentally addresses the over-smoothing issue and gives MSSG linear time complexity with respect to the sequence length. We compare MSSG with ten state-of-the-art baseline methods on six public benchmark datasets. Our experimental results demonstrate that MSSG significantly outperforms the baseline methods, with an improvement of as much as 10.10%. Our analysis shows the superior scalability of MSSG over the state-of-the-art self-attentive methods. Our complexity analysis and run-time performance comparison together show that MSSG is both theoretically and practically more efficient than self-attentive methods. Our analysis of the attention weights learned in SA-based methods indicates that on sparse recommendation data, modeling dependencies in all item pairs using the SA mechanism yields limited information gain, and thus, might not benefit the recommendation performance.
Size-Aware Hypergraph Motifs
Authors: Jason Niu, Ilya D. Amburg, Sinan G. Aksoy, Ahmet Erdem Sarıyüce
Subjects: Discrete Mathematics (cs.DM); Social and Information Networks (cs.SI); Data Analysis, Statistics and Probability (physics.data-an); Physics and Society (physics.soc-ph)
Arxiv link: https://arxiv.org/abs/2311.07783
Pdf link: https://arxiv.org/pdf/2311.07783
Abstract Complex systems frequently exhibit multi-way, rather than pairwise, interactions. These group interactions cannot be faithfully modeled as collections of pairwise interactions using graphs, and instead require hypergraphs. However, methods that analyze hypergraphs directly, rather than via lossy graph reductions, remain limited. Hypergraph motif mining holds promise in this regard, as motif patterns serve as building blocks for larger group interactions which are inexpressible by graphs. Recent work has focused on categorizing and counting hypergraph motifs based on the existence of nodes in hyperedge intersection regions. Here, we argue that the relative sizes of hyperedge intersections within motifs contain varied and valuable information. We propose a suite of efficient algorithms for finding triplets of hyperedges based on optimizing the sizes of these intersection patterns. This formulation uncovers interesting local patterns of interaction, finding hyperedge triplets that either (1) are the least correlated with each other, (2) have the highest pairwise but not groupwise correlation, or (3) are the most correlated with each other. We formalize this as a combinatorial optimization problem and design efficient algorithms based on filtering hyperedges. Our experimental evaluation shows that the resulting hyperedge triplets yield insightful information on real-world hypergraphs. Our approach is also orders of magnitude faster than a naive baseline implementation.
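The intersection regions referred to above are the seven regions of the Venn diagram of three hyperedges. The helper below (hypothetical name) computes their sizes for one triplet of node sets; existence-based motifs only record which of these counts are nonzero, while the size-aware formulation works with the counts themselves. The paper's filtering-based search over triplets is not shown.

```python
def triplet_region_sizes(e1, e2, e3):
    """Sizes of the seven Venn-diagram regions formed by three hyperedges (node sets)."""
    e1, e2, e3 = set(e1), set(e2), set(e3)
    return {
        "only_1": len(e1 - e2 - e3),
        "only_2": len(e2 - e1 - e3),
        "only_3": len(e3 - e1 - e2),
        "1&2_not_3": len((e1 & e2) - e3),
        "1&3_not_2": len((e1 & e3) - e2),
        "2&3_not_1": len((e2 & e3) - e1),
        "all_3": len(e1 & e2 & e3),
    }


sizes = triplet_region_sizes({1, 2, 3, 4}, {3, 4, 5}, {4, 6})
print(sizes)
```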
Leveraging Hamilton-Jacobi PDEs with time-dependent Hamiltonians for continual scientific machine learning
Authors: Paula Chen, Tingwei Meng, Zongren Zou, Jérôme Darbon, George Em Karniadakis
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2311.07790
Pdf link: https://arxiv.org/pdf/2311.07790
Abstract We address two major challenges in scientific machine learning (SciML): interpretability and computational efficiency. We increase the interpretability of certain learning processes by establishing a new theoretical connection between optimization problems arising from SciML and a generalized Hopf formula, which represents the viscosity solution to a Hamilton-Jacobi partial differential equation (HJ PDE) with time-dependent Hamiltonian. Namely, we show that when we solve certain regularized learning problems with integral-type losses, we actually solve an optimal control problem and its associated HJ PDE with time-dependent Hamiltonian. This connection allows us to reinterpret incremental updates to learned models as the evolution of an associated HJ PDE and optimal control problem in time, where all of the previous information is intrinsically encoded in the solution to the HJ PDE. As a result, existing HJ PDE solvers and optimal control algorithms can be reused to design new efficient training approaches for SciML that naturally coincide with the continual learning framework, while avoiding catastrophic forgetting. As a first exploration of this connection, we consider the special case of linear regression and leverage our connection to develop a new Riccati-based methodology for solving these learning problems that is amenable to continual learning applications. We also provide some corresponding numerical examples that demonstrate the potential computational and memory advantages our Riccati-based approach can provide.
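The abstract's Riccati-based method is derived from HJ PDE theory; as a related, standard point of comparison, recursive least squares updates a ridge-regression solution batch by batch with a Riccati-type recursion on the inverse Gram matrix and never revisits old data. The sketch below is that classical technique, not the authors' method; the class name and regularization strength are assumptions. It checks that three sequential updates match the ridge solution computed on all data at once.

```python
import numpy as np


class RecursiveRidge:
    """Ridge regression updated batch by batch without storing earlier data."""

    def __init__(self, dim, reg=1.0):
        self.P = np.eye(dim) / reg  # equals (reg*I + X^T X)^{-1} over all data seen so far
        self.w = np.zeros(dim)

    def update(self, X, y):
        # Riccati-type update of P via the Woodbury identity
        S = np.eye(X.shape[0]) + X @ self.P @ X.T
        K = np.linalg.solve(S, X @ self.P).T  # gain P X^T S^{-1} (P and S are symmetric)
        self.w = self.w + K @ (y - X @ self.w)
        self.P = self.P - K @ X @ self.P
        return self.w


rng = np.random.default_rng(1)
X_all = rng.standard_normal((60, 5))
y_all = X_all @ np.arange(1.0, 6.0) + 0.1 * rng.standard_normal(60)

model = RecursiveRidge(5, reg=1.0)
for start in range(0, 60, 20):  # three "tasks" arriving one after another
    model.update(X_all[start:start + 20], y_all[start:start + 20])

w_batch = np.linalg.solve(np.eye(5) + X_all.T @ X_all, X_all.T @ y_all)
print(np.allclose(model.w, w_batch))
```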
Explainable History Distillation by Marked Temporal Point Process
Authors: Sishun Liu, Ke Deng, Yan Wang, Xiuzhen Zhang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.07797
Pdf link: https://arxiv.org/pdf/2311.07797
Abstract Explainability of machine learning models is mandatory when researchers introduce these commonly believed black boxes to real-world tasks, especially high-stakes ones. In this paper, we build a machine learning system that automatically generates explanations of past events from history by counterfactual analysis (CA) based on the temporal point process (TPP). Specifically, we propose a new task called explainable history distillation (EHD). This task requires a model to distill as few events as possible from the observed history, such that the event distribution conditioned on the remaining events predicts the observed future noticeably worse. We then regard the distilled events as the explanation for the future. To efficiently solve EHD, we rewrite the task as a 0-1 integer program and directly estimate the solution to the program with a dedicated model. This work fills the gap between our task and existing works, which only spot the difference between factual and counterfactual worlds after applying a predefined modification to the environment. Experimental results on the Retweet and StackOverflow datasets show that our model significantly outperforms other EHD baselines and can reveal the rationale underpinning real-world processes.
Assessing Test-time Variability for Interactive 3D Medical Image Segmentation with Diverse Point Prompts
Authors: Hao Li, Han Liu, Dewei Hu, Jiacheng Wang, Ipek Oguz
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.07806
Pdf link: https://arxiv.org/pdf/2311.07806
Abstract Interactive segmentation models leverage prompts from users to produce robust segmentations. This advancement is facilitated by prompt engineering, where interactive prompts serve as strong priors at test time. However, prompt selection is an inherently subjective and hard-to-reproduce process. Variability in user expertise and inherently ambiguous boundaries in medical images can lead to inconsistent prompt selections, potentially affecting segmentation accuracy. This issue has not yet been extensively explored for medical imaging. In this paper, we assess the test-time variability of interactive medical image segmentation with diverse point prompts. For a given target region, each point is assigned to one of three sub-regions: boundary, margin, and center. Our goal is to identify a straightforward and efficient approach for optimal prompt selection at test time based on three considerations: (1) benefits of additional prompts, (2) effects of prompt placement, and (3) strategies for optimal prompt selection. We conduct extensive experiments on the public Medical Segmentation Decathlon dataset for the challenging colon tumor segmentation task. We suggest an optimal strategy for prompt selection at test time, supported by comprehensive results. The code is publicly available at https://github.com/MedICL-VU/variability
A novel and simple spectral method for nonlocal PDEs with the fractional Laplacian
Authors: Shiping Zhou, Yanzhi Zhang
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2311.07814
Pdf link: https://arxiv.org/pdf/2311.07814
Abstract We propose a novel and simple spectral method based on the semi-discrete Fourier transform to discretize the fractional Laplacian $(-\Delta)^{\alpha/2}$. Numerical analysis and experiments are provided to study its performance. Our method has the same symbol $|\xi|^\alpha$ as the fractional Laplacian $(-\Delta)^{\alpha/2}$ at the discrete level, and thus it can be viewed as the exact discrete analogue of the fractional Laplacian. This unique feature distinguishes our method from other existing methods for the fractional Laplacian. Note that our method is different from the Fourier pseudospectral methods in the literature, which are usually limited to periodic boundary conditions. Numerical analysis shows that our method achieves spectral accuracy. The stability and convergence of our method in solving fractional Poisson equations are analyzed. Our scheme yields a multilevel Toeplitz stiffness matrix, and thus fast algorithms can be developed for efficient matrix-vector products. The computational complexity is $\mathcal{O}(2N\log(2N))$, and the memory storage is $\mathcal{O}(N)$, where $N$ is the total number of points. Extensive numerical experiments verify our analytical results and demonstrate the effectiveness of our method in solving various problems.
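The $\mathcal{O}(2N\log(2N))$ cost quoted above comes from the standard trick for Toeplitz matrix-vector products. The $N\times N$ Toeplitz matrix is embedded in a $2N\times 2N$ circulant matrix, which the FFT diagonalizes. Below is a minimal 1D NumPy sketch, assuming a real Toeplitz matrix described by its first column and first row. The abstract does not give the actual stiffness-matrix entries, so the test uses random entries. The multilevel (2D/3D) case embeds into a multilevel circulant and uses a multidimensional FFT instead.

```python
import numpy as np

def toeplitz_matvec(col, row, x):
    """Compute T @ x for the Toeplitz T with first column `col` and first row `row` (row[0] == col[0])."""
    n = len(x)
    # First column of the 2n x 2n circulant whose top-left n x n block is T:
    # [t_0, t_1, ..., t_{n-1}, 0, t_{-(n-1)}, ..., t_{-1}]
    c = np.concatenate([col, [0.0], row[:0:-1]])
    x_pad = np.concatenate([x, np.zeros(n)])
    # Circulant products are circular convolutions, which the FFT turns into pointwise products.
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x_pad))
    return y[:n].real  # real inputs give a real result up to round-off
```

Only the $2N$ defining entries are stored, which matches the $\mathcal{O}(N)$ memory claim. The dense matrix is never formed.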
On the Analysis of Cross-Lingual Prompt Tuning for Decoder-based Multilingual Model
Authors: Nohil Park, Joonsuk Park, Kang Min Yoo, Sungroh Yoon
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2311.07820
Pdf link: https://arxiv.org/pdf/2311.07820
Abstract An exciting advancement in the field of multilingual models is the emergence of autoregressive models with zero- and few-shot capabilities, a phenomenon widely reported in large-scale language models. To further improve model adaptation to cross-lingual tasks, another trend is to further fine-tune the language models with either full fine-tuning or parameter-efficient tuning. However, the interaction between parameter-efficient fine-tuning (PEFT) and cross-lingual tasks in multilingual autoregressive models has yet to be studied. Specifically, we lack an understanding of how the linguistic distributions in multilingual models affect the effectiveness of token-based prompt tuning. To address this question, we conduct experiments comparing prompt tuning and fine-tuning on the decoder-based multilingual model, XGLM, with four cross-lingual tasks (XNLI, PAWS-X, POS, NER). According to our study, prompt tuning achieves performance on par with or better than fine-tuning across all languages while updating at most 0.13% of the model parameters. Moreover, we empirically show that prompt tuning is more effective than fine-tuning in enhancing the performance of low-resource languages. Our further analysis shows that this phenomenon is related to the tokenization scheme of the multilingual model.
Adaptive Search Optimization: Dynamic Algorithm Selection and Caching for Enhanced Database Performance
Authors: Hakikat Singh
Subjects: Databases (cs.DB); Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2311.07826
Pdf link: https://arxiv.org/pdf/2311.07826
Abstract Efficient search operations in databases are paramount for the timely retrieval of information in various applications. This research introduces a novel approach, combining dynamic algorithm selection and caching strategies, to optimize search performance. The proposed dynamic search algorithm intelligently switches between Binary and Interpolation Search based on dataset characteristics, significantly improving efficiency for non-uniformly distributed data. Additionally, a robust caching mechanism stores and retrieves previous search results, further enhancing computational efficiency. Theoretical analysis and extensive experiments demonstrate the effectiveness of the approach, showcasing its potential to revolutionize database performance in scenarios with diverse data distributions. This research contributes valuable insights and practical solutions to the realm of database optimization, offering a promising avenue for enhancing search operations in real-world applications.
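Below is a minimal sketch of the idea, assuming a sorted in-memory list of distinct numeric keys. The switching rule is a hypothetical choice for illustration: `is_roughly_uniform` samples a few positions and checks how far they deviate from a straight line between the first and last keys, and its threshold `tol` is also an assumption. The abstract does not specify the paper's selection criterion or cache policy. The unbounded `dict` cache used here would need eviction in practice.

```python
from bisect import bisect_left

def is_roughly_uniform(data, samples=16, tol=0.1):
    # Hypothetical heuristic: compare evenly spaced samples against the values a
    # perfectly linear (uniform) key distribution would have at those positions.
    n = len(data)
    if n < 3 or data[-1] == data[0]:
        return False
    lo, span = data[0], data[-1] - data[0]
    for k in range(1, samples):
        i = k * (n - 1) // samples
        expected = lo + span * i / (n - 1)
        if abs(data[i] - expected) > tol * span:
            return False
    return True

def binary_search(data, key):
    i = bisect_left(data, key)
    return i if i < len(data) and data[i] == key else -1

def interpolation_search(data, key):
    lo, hi = 0, len(data) - 1
    while lo <= hi and data[lo] <= key <= data[hi]:
        if data[hi] == data[lo]:
            return lo if data[lo] == key else -1
        # Probe where the key "should" be if keys grow linearly between lo and hi.
        pos = lo + int((key - data[lo]) * (hi - lo) / (data[hi] - data[lo]))
        if data[pos] == key:
            return pos
        if data[pos] < key:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1

class AdaptiveSearcher:
    def __init__(self, data):
        self.data = sorted(data)
        # Interpolation search is fast on uniform keys but degrades on skewed ones.
        self.use_interpolation = is_roughly_uniform(self.data)
        self.cache = {}  # key -> index (or -1); unbounded, for illustration only

    def search(self, key):
        if key in self.cache:
            return self.cache[key]
        find = interpolation_search if self.use_interpolation else binary_search
        idx = find(self.data, key)
        self.cache[key] = idx
        return idx
```

Evenly spaced keys select interpolation search. Exponentially spaced keys fall back to binary search, which keeps its $O(\log n)$ worst case.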
A Coding Scheme for Straggler Resilient Quantum $X$-Secure $T$-Private Information Retrieval
Authors: Yuxiang Lu, Syed A. Jafar
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2311.07829
Pdf link: https://arxiv.org/pdf/2311.07829
Abstract Building on recent constructions of Quantum Cross Subspace Alignment (QCSA) codes, this work develops a coding scheme for QEXSTPIR, i.e., classical private information retrieval with $X$-secure storage and $T$-private queries, over a quantum multiple access channel, that is resilient to any set of up to $E$ erased servers (equivalently known as unresponsive servers, or stragglers). The scheme is accordingly labeled QECSA, with the 'E' indicating resilience to erased servers. The novelty of QECSA lies in achieving efficient $E$-straggler resilience on top of existing QCSA codes that already achieve $X$-secure storage, $T$-private queries, and distributed superdense coding gains for communication efficient decoding. The QECSA code structure may be broadly useful for problems such as quantum coded secure distributed computation, where security, straggler resilience, and distributed superdense coding gains are simultaneously required.
Toward Efficient and Incremental Spectral Clustering via Parametric Spectral Clustering
Authors: Jo-Chun Chen, Hung-Hsuan Chen
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.07833
Pdf link: https://arxiv.org/pdf/2311.07833
Abstract Spectral clustering is a popular method for effectively clustering nonlinearly separable data. However, computational limitations, memory requirements, and the inability to perform incremental learning challenge its widespread application. To overcome these limitations, this paper introduces a novel approach called parametric spectral clustering (PSC). By extending the capabilities of spectral clustering, PSC addresses the challenges associated with big data and real-time scenarios and enables efficient incremental clustering with new data points. Experimental evaluations conducted on various open datasets demonstrate the superiority of PSC in terms of computational efficiency while achieving clustering quality mostly comparable to standard spectral clustering. The proposed approach has significant potential for incremental and real-time data analysis applications, facilitating timely and accurate clustering in dynamic and evolving datasets. The findings of this research contribute to the advancement of clustering techniques and open new avenues for efficient and effective data analysis. We publish the experimental code at https://github.com/109502518/PSC_BigData.
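One common way to get the incremental behavior described above is an out-of-sample extension. Spectral clustering is run once on a training set, a regression map from raw inputs to the spectral embedding is learned, and new points are then clustered by embedding them with that map instead of recomputing an eigendecomposition. The sketch below does this in NumPy with a Gaussian-affinity Ng-Jordan-Weiss embedding and k-means. It uses kernel ridge regression as the learned map. That choice is our assumption, since the abstract does not describe PSC's actual parametric model, and `ParametricSpectralSketch` is a hypothetical name.

```python
import numpy as np

def rbf(A, B, sigma):
    # Gaussian affinities between every row of A and every row of B.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def spectral_embedding(X, k, sigma):
    # Ng-Jordan-Weiss embedding: top-k eigenvectors of D^{-1/2} W D^{-1/2}, rows normalized.
    W = rbf(X, X, sigma)
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)      # eigenvalues in ascending order
    U = vecs[:, -k:]                 # eigenvectors of the k largest eigenvalues
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def kmeans(E, k, iters=50):
    # Farthest-point initialization followed by Lloyd iterations.
    centers = [E[0]]
    for _ in range(1, k):
        d = np.min([((E - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(E[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((E[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        centers = np.array([E[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

class ParametricSpectralSketch:
    """Spectral clustering plus a learned map x -> embedding for out-of-sample points."""

    def __init__(self, k, sigma=1.0, lam=1e-2):
        self.k, self.sigma, self.lam = k, sigma, lam

    def fit(self, X):
        E = spectral_embedding(X, self.k, self.sigma)
        # Kernel ridge regression from inputs to their spectral coordinates.
        K = rbf(X, X, self.sigma)
        self.alpha = np.linalg.solve(K + self.lam * np.eye(len(X)), E)
        self.X_train = X
        self.labels_, self.centers_ = kmeans(E, self.k)
        return self

    def predict(self, X_new):
        # Embed new points with the learned map, then assign each to the nearest cluster center.
        E_new = rbf(X_new, self.X_train, self.sigma) @ self.alpha
        dists = ((E_new[:, None, :] - self.centers_[None, :, :]) ** 2).sum(-1)
        return np.argmin(dists, axis=1)
```

Kernel ridge still keeps the training points. A model with a fixed parameter count, such as a small neural network, could replace it to make memory independent of the training-set size.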
Enabling Decision-Support Systems through Automated Cell Tower Detection
Authors: Natasha Krell, Will Gleave, Daniel Nakada, Justin Downes, Amanda Willet, Matthew Baran
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.07840
Pdf link: https://arxiv.org/pdf/2311.07840
Abstract Cell phone coverage and high-speed service gaps persist in rural areas in sub-Saharan Africa, impacting public access to mobile-based financial, educational, and humanitarian services. Improving maps of telecommunications infrastructure can help inform strategies to eliminate gaps in mobile coverage. Deep neural networks, paired with remote sensing images, can be used for object detection of cell towers and eliminate the need for inefficient and burdensome manual mapping to find objects over large geographic regions. In this study, we demonstrate a partially automated workflow to train an object detection model to locate cell towers using OpenStreetMap (OSM) features and high-resolution Maxar imagery. For model fine-tuning and evaluation, we curated a diverse dataset of over 6,000 unique images of cell towers in 26 countries in eastern, southern, and central Africa using automatically generated annotations from OSM points. Our model achieves an average precision at 50% Intersection over Union (IoU) (AP@50) of 81.2 with good performance across different geographies and out-of-sample testing. Accurate localization of cell towers can yield more accurate cell coverage maps, in turn enabling improved delivery of digital services for decision-support applications.
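For readers less familiar with the metric: at AP@50, a detection counts as correct when its box overlaps a not-yet-matched ground-truth box with IoU of at least 0.5. AP then summarizes the resulting precision-recall curve. Below is a minimal single-class, single-image sketch, assuming boxes given as `(x1, y1, x2, y2)` and the all-point interpolated variant of AP. The study's exact evaluation protocol (for example, COCO-style 101-point interpolation) is not stated in the abstract, so numbers from this sketch need not match the reported 81.2.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def average_precision(preds, gts, iou_thresh=0.5):
    """AP at IoU >= iou_thresh; preds is a list of (score, box), gts a list of boxes."""
    if not gts:
        return 0.0
    matched, flags = set(), []
    # Greedily match detections in descending score order to unmatched ground truths.
    for _, box in sorted(preds, key=lambda p: -p[0]):
        best_j, best_iou = None, iou_thresh
        for j, gt in enumerate(gts):
            if j not in matched:
                v = iou(box, gt)
                if v >= best_iou:
                    best_j, best_iou = j, v
        if best_j is None:
            flags.append(0)            # false positive
        else:
            matched.add(best_j)
            flags.append(1)            # true positive
    precisions, recalls, tp = [], [], 0
    for k, f in enumerate(flags, start=1):
        tp += f
        precisions.append(tp / k)
        recalls.append(tp / len(gts))
    # Monotone (non-increasing) precision envelope, then area under the step curve.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_r)
        prev_r = r
    return ap
```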
PEMS: Pre-trained Epidemic Time-series Models
Authors: Harshavardhan Kamarthi, B. Aditya Prakash
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2311.07841
Pdf link: https://arxiv.org/pdf/2311.07841
Abstract Providing accurate and reliable predictions about the future of an epidemic is an important problem for enabling informed public health decisions. Recent works have shown that data-driven solutions that use advances in deep learning to learn from past data of an epidemic often outperform traditional mechanistic models. However, in many cases, the past data is sparse and may not sufficiently capture the underlying dynamics. While there exists a large amount of data from past epidemics, leveraging prior knowledge from time-series data of other diseases is a non-trivial challenge. Motivated by the success of pre-trained models in language and vision tasks, we tackle the problem of pre-training epidemic time-series models to learn from multiple datasets from different diseases and epidemics. We introduce Pre-trained Epidemic Time-Series Models (PEMS) that learn from diverse time-series datasets of a variety of diseases by formulating pre-training as a set of self-supervised learning (SSL) tasks. We tackle various important challenges specific to pre-training for epidemic time-series, such as dealing with heterogeneous dynamics and efficiently capturing useful patterns from multiple epidemic datasets, by carefully designing the SSL tasks to learn important priors about epidemic dynamics that can be leveraged for fine-tuning to multiple downstream tasks. The resulting PEM outperforms previous state-of-the-art methods in various downstream time-series tasks across datasets of varying seasonal patterns, geography, and mechanism of contagion, including the novel Covid-19 pandemic unseen in the pre-training data, with better efficiency using a smaller fraction of the datasets.
Replay Clocks
Authors: Ishaan Lagwankar, Sandeep S Kulkarni
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2311.07842
Pdf link: https://arxiv.org/pdf/2311.07842
Abstract In this work, we focus on the problem of replay clocks (RepCL). The need for replay clocks arises from the observation that analyzing distributed computation for all desired properties of interest may not be feasible in an online environment. These properties can be analyzed by replaying the computation. However, to be beneficial, such replay must account for all the uncertainty that is possible in a distributed computation. Specifically, if event 'e' must occur before 'f', then the replay clock must ensure that 'e' is replayed before 'f'. On the other hand, if 'e' and 'f' could occur in any order, then replay should not force an order between them. After identifying the limitations of existing clocks in providing the replay primitive, we present RepCL and identify an efficient representation for it. We demonstrate that RepCL can be implemented with fewer than four integers for 64 processes for various system parameters if clocks are synchronized within 1 ms. Furthermore, the overhead of RepCL (for computing/comparing timestamps and message size) is proportional to the size of the clock. Using simulations, we identify the expected overhead of RepCL based on the given system settings. We also show how a user can identify the feasibility region for RepCL. Specifically, given the desired overhead of RepCL, it identifies the region where unabridged replay is possible.
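The replay requirement above can be illustrated with classic vector clocks, which capture Lamport's happened-before relation exactly. Event e must be replayed before f if and only if VC(e) < VC(f) componentwise. Events with incomparable clocks may be replayed in either order. This sketch uses full vector clocks purely to show the replay primitive. It is not the RepCL representation. A vector clock for 64 processes needs 64 integers, which is why the compact representation reported above matters. The function names are hypothetical.

```python
def local_event(vc, pid):
    # A local (or send) event on process `pid` increments that process's entry.
    vc = list(vc)
    vc[pid] += 1
    return vc

def receive_event(vc, msg_vc, pid):
    # On receive, take the componentwise max with the message clock, then increment.
    merged = [max(a, b) for a, b in zip(vc, msg_vc)]
    merged[pid] += 1
    return merged

def happened_before(vc_a, vc_b):
    # a -> b iff every entry of a is <= the matching entry of b and the clocks differ.
    return all(x <= y for x, y in zip(vc_a, vc_b)) and tuple(vc_a) != tuple(vc_b)

def replay_order(vc_e, vc_f):
    """Return the ordering constraint a faithful replay must respect for events e and f."""
    if happened_before(vc_e, vc_f):
        return "e before f"
    if happened_before(vc_f, vc_e):
        return "f before e"
    return "any order"  # concurrent events: replay must not force an order
```

For example, if process 0 performs a send event e and process 1 receives that message as event f, then e must be replayed before f. An event g that process 1 performed before receiving the message is concurrent with e.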
Mixture of Coupled HMMs for Robust Modeling of Multivariate Healthcare Time Series
Authors: Onur Poyraz, Pekka Marttinen
Subjects: Machine Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2311.07867
Pdf link: https://arxiv.org/pdf/2311.07867
Abstract Analysis of multivariate healthcare time series data is inherently challenging: irregular sampling, noisy and missing values, and heterogeneous patient groups with different dynamics violating exchangeability. In addition, interpretability and quantification of uncertainty are critically important. Here, we propose a novel class of models, a mixture of coupled hidden Markov models (M-CHMM), and demonstrate how it elegantly overcomes these challenges. To make the model learning feasible, we derive two algorithms to sample the sequences of the latent variables in the CHMM: samplers based on (i) particle filtering and (ii) factorized approximation. Compared to existing inference methods, our algorithms are computationally tractable, improve mixing, and allow for likelihood estimation, which is necessary to learn the mixture model. Experiments on challenging real-world epidemiological and semi-synthetic data demonstrate the advantages of the M-CHMM: improved data fit, capacity to efficiently handle missing and noisy measurements, improved prediction accuracy, and ability to identify interpretable subsets in the data.
AutoML for Large Capacity Modeling of Meta Ranking Systems
Authors: Hang Yin, Kuang-Hung Liu, Mengying Sun, Yuxin Chen, Buyun Zhang, Jiang Liu, Vivek Sehgal, Rudresh Rajnikant Panchal, Eugen Hotaj, Xi Liu, Daifeng Guo, Jamey Zhang, Zhou Wang, Shali Jiang, Huayu Li, Zhengxing Chen, Wen-Yen Chen, Jiyan Yang, Wei Wen
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.07870
Pdf link: https://arxiv.org/pdf/2311.07870
Abstract Web-scale ranking systems at Meta serving billions of users are complex. Improving ranking models is essential but engineering-heavy. Automated Machine Learning (AutoML) can relieve engineers of the labor-intensive work of tuning ranking models; however, it is unknown whether AutoML is efficient enough to meet tight real-world production timelines while also bringing additional improvements over strong baselines. Moreover, to achieve higher ranking performance, there is an ever-increasing demand to scale up ranking models to even larger capacity, which imposes further challenges on efficiency. The large scale of the models and the tight production schedule require AutoML to outperform human baselines using only a small number of model evaluation trials (around 100). We present a sampling-based AutoML method, focusing on neural architecture search and hyperparameter optimization, that addresses these challenges in Meta-scale production when building large capacity models. Our approach efficiently handles large-scale data demands. It leverages a lightweight predictor-based searcher and reinforcement learning to explore vast search spaces, significantly reducing the number of model evaluations. Through experiments in large capacity modeling for CTR and CVR applications, we show that our method achieves outstanding Return on Investment (ROI) versus human-tuned baselines, with up to 0.09% Normalized Entropy (NE) loss reduction or 25% Queries per Second (QPS) increase by sampling only one hundred models on average from a curated search space. The proposed AutoML method has already made real-world impact: a discovered Instagram CTR model with up to -0.36% NE gain (over the existing production baseline) was selected for a large-scale online A/B test and showed a statistically significant gain. These production results demonstrated the efficacy of AutoML and accelerated its adoption in ranking systems at Meta.
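Normalized Entropy, as commonly used in CTR prediction at Meta/Facebook, is the average log loss of the predictions divided by the entropy of the background (average) click-through rate. A model that always predicts the background CTR scores exactly 1, and lower is better. The sketch below implements that definition. The abstract's percentages are presumably relative changes of this quantity against the baseline, which the hypothetical helper `relative_ne_change` computes.

```python
import math

def normalized_entropy(labels, preds, eps=1e-12):
    """Average binary log loss divided by the entropy of the empirical CTR."""
    n = len(labels)
    log_loss = -sum(
        y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
        for y, p in zip(labels, preds)
    ) / n
    ctr = sum(labels) / n
    if ctr in (0.0, 1.0):
        raise ValueError("background entropy is zero when all labels are equal")
    background = -(ctr * math.log(ctr) + (1 - ctr) * math.log(1 - ctr))
    return log_loss / background

def relative_ne_change(ne_baseline, ne_new):
    # Negative values mean the new model reduced NE (an improvement).
    return (ne_new - ne_baseline) / ne_baseline
```

Dividing by the background entropy makes NE insensitive to the base rate. A model is therefore judged by how much better it does than simply predicting the average CTR.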
Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback
Authors: Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Xuezhou Zhang, Shuai Li
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2311.07876
Pdf link: https://arxiv.org/pdf/2311.07876
Abstract In this work, we study low-rank MDPs with adversarially changing losses in the full-information feedback setting. In particular, the unknown transition probability kernel admits a low-rank matrix decomposition \citep{REPUCB22}, and the loss functions may change adversarially but are revealed to the learner at the end of each episode. We propose a policy optimization-based algorithm, POLO, and we prove that it attains a $\widetilde{O}(K^{\frac{5}{6}}A^{\frac{1}{2}}d\ln(1+M)/(1-\gamma)^2)$ regret guarantee, where $d$ is the rank of the transition kernel (and hence the dimension of the unknown representations), $A$ is the cardinality of the action space, $M$ is the cardinality of the model class, and $\gamma$ is the discount factor. Notably, our algorithm is oracle-efficient and has a regret guarantee with no dependence on the size of the potentially arbitrarily large state space. Furthermore, we also prove an $\Omega(\frac{\gamma^2}{1-\gamma} \sqrt{d A K})$ regret lower bound for this problem, showing that low-rank MDPs are statistically more difficult to learn than linear MDPs in the regret minimization setting. To the best of our knowledge, we present the first algorithm that interleaves representation learning, exploration, and exploitation to achieve a sublinear regret guarantee for RL with nonlinear function approximation and adversarial losses.
bpftime: userspace eBPF Runtime for Uprobe, Syscall and Kernel-User Interactions
Authors: Yusheng Zheng, Tong Yu, Yiwei Yang, Yanpeng Hu, XiaoZheng Lai, Andrew Quinn
Subjects: Operating Systems (cs.OS)
Arxiv link: https://arxiv.org/abs/2311.07923
Pdf link: https://arxiv.org/pdf/2311.07923
Abstract In kernel-centric operations, the uprobe component of eBPF frequently encounters performance bottlenecks, largely attributed to the overheads borne by context switches. Transitioning eBPF operations to user space bypasses these hindrances, thereby optimizing performance. This also enhances configurability and obviates the necessity for root access or privileges for kernel eBPF, subsequently minimizing the kernel attack surface. This paper introduces bpftime, a novel user-space eBPF runtime, which leverages binary rewriting to implement uprobe and syscall hook capabilities. Through bpftime, userspace uprobes achieve a 10x speed enhancement compared to their kernel counterparts without requiring dual context switches. Additionally, this runtime facilitates the programmatic hooking of syscalls within a process, both safely and efficiently. Bpftime can be seamlessly attached to any running process, limiting the need for either a restart or manual recompilation. Our implementation also extends to interprocess eBPF Maps within shared memory, catering to summary aggregation or control plane communication requirements. Compatibility with existing eBPF toolchains such as clang and libbpf is maintained, not only simplifying the development of user-space eBPF without necessitating any modifications but also supporting CO-RE through BTF. Through bpftime, we not only enhance uprobe performance but also extend the versatility and user-friendliness of eBPF runtime in user space, paving the way for more efficient and secure kernel operations.
Cross-subject dual-domain fusion network with task-related and task-discriminant component analysis enhancing one-shot SSVEP classification
Authors: Yang Deng, Zhiwei Ji, Yijun Wang, S.Kevin Zhou
Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2311.07932
Pdf link: https://arxiv.org/pdf/2311.07932
Abstract This study addresses the significant challenge of developing efficient decoding algorithms for classifying steady-state visual evoked potentials (SSVEPs) in scenarios characterized by extreme scarcity of calibration data, where only one calibration trial is available for each stimulus target. To tackle this problem, we introduce a novel cross-subject dual-domain fusion network (CSDuDoFN) incorporating task-related and task-discriminant component analysis (TRCA and TDCA) for one-shot SSVEP classification. The CSDuDoFN framework is designed to comprehensively transfer information from source subjects, while TRCA and TDCA are employed to exploit the single available calibration of the target subject. Specifically, we develop multi-reference least-squares transformation (MLST) to map data from both source subjects and the target subject into the domain of sine-cosine templates, thereby mitigating inter-individual variability and benefiting transfer learning. Subsequently, the transformed data in the sine-cosine templates domain and the original domain data are separately utilized to train a convolutional neural network (CNN) model, with the adequate fusion of their feature maps occurring at distinct network layers. To further capitalize on the calibration of the target subject, source aliasing matrix estimation (SAME) data augmentation is incorporated into the training process of the ensemble TRCA (eTRCA) and TDCA models. Ultimately, the outputs of the CSDuDoFN, eTRCA, and TDCA are combined for SSVEP classification. The effectiveness of our proposed approach is comprehensively evaluated on three publicly available SSVEP datasets, achieving the best performance on two datasets and competitive performance on the third. This underscores the potential for integrating brain-computer interfaces (BCIs) into daily life.
The Impact of Adversarial Node Placement in Decentralized Federated Learning Networks
Authors: Adam Piaseczny, Eric Ruzomberka, Rohit Parasnis, Christopher G. Brinton
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2311.07946
Pdf link: https://arxiv.org/pdf/2311.07946
Abstract As Federated Learning (FL) grows in popularity, new decentralized frameworks are becoming widespread. These frameworks leverage the benefits of decentralized environments to enable fast and energy-efficient inter-device communication. However, this growing popularity also intensifies the need for robust security measures. While existing research has explored various aspects of FL security, the role of adversarial node placement in decentralized networks remains largely unexplored. This paper addresses this gap by analyzing the performance of decentralized FL for various adversarial placement strategies when adversaries can jointly coordinate their placement within a network. We establish two baseline strategies for placing adversarial nodes: random placement and network centrality-based placement. Building on this foundation, we propose a novel attack algorithm that prioritizes adversarial spread over adversarial centrality by maximizing the average network distance between adversaries. We show that the new attack algorithm significantly impacts key performance metrics such as testing accuracy, outperforming the baseline strategies by between 9% and 66.5% for the considered setups. Our findings provide valuable insights into the vulnerabilities of decentralized FL systems, setting the stage for future research aimed at developing more secure and robust decentralized FL frameworks.
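The abstract does not spell out the attack algorithm. The following is therefore only a hedged illustration of "maximizing the average network distance between adversaries". It uses a greedy farthest-point heuristic over hop distances, assuming the network is given as an adjacency list and is connected. The function names, the starting node, and the greedy strategy are our assumptions, not the authors' algorithm.

```python
from collections import deque

def bfs_distances(adj, source):
    # Hop distances from `source` to every reachable node.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def spread_out_placement(adj, k, start):
    """Greedily choose k nodes; each new node maximizes its mean hop distance to those already chosen."""
    chosen = [start]
    dists = {start: bfs_distances(adj, start)}
    while len(chosen) < k:
        candidates = [u for u in adj if u not in chosen]
        # Unreachable nodes count as distance 0, so this assumes a connected network.
        best = max(
            candidates,
            key=lambda u: sum(dists[c].get(u, 0) for c in chosen) / len(chosen),
        )
        chosen.append(best)
        dists[best] = bfs_distances(adj, best)
    return chosen
```

By contrast, a centrality-based baseline would pick the k most central nodes, which tend to cluster together. The greedy rule instead pushes adversaries toward different parts of the network.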
Finding Inductive Loop Invariants using Large Language Models
Authors: Adharsh Kamath, Aditya Senthilnathan, Saikat Chakraborty, Pantazis Deligiannis, Shuvendu K. Lahiri, Akash Lal, Aseem Rastogi, Subhajit Roy, Rahul Sharma
Subjects: Programming Languages (cs.PL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.07948
Pdf link: https://arxiv.org/pdf/2311.07948
Abstract Loop invariants are fundamental to reasoning about programs with loops. They establish properties about a given loop's behavior. When they are additionally inductive, they become useful for formal verification, which seeks to establish strong mathematical guarantees about a program's runtime behavior. Inductiveness ensures that the invariants can be checked locally without consulting the entire program, and thus they are indispensable artifacts in a formal proof of correctness. Finding inductive loop invariants is an undecidable problem, and despite a long history of research towards practical solutions, it remains far from solved. This paper investigates the capabilities of Large Language Models (LLMs) in offering a new solution to this old, yet important problem. To that end, we first curate a dataset of verification problems on programs with loops. Next, we design a prompt for using LLMs to obtain inductive loop invariants, which are checked for correctness using sound symbolic tools. Finally, we explore the effectiveness of an efficient combination of a symbolic tool and an LLM on our dataset and compare it against a purely symbolic baseline. Our results demonstrate that LLMs can help improve the state of the art in automated program verification.
Deep Learning-Based Object Detection in Maritime Unmanned Aerial Vehicle Imagery: Review and Experimental Comparisons
Authors: Chenjie Zhao, Ryan Wen Liu, Jingxiang Qu, Ruobin Gao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.07955
Pdf link: https://arxiv.org/pdf/2311.07955
Abstract With the advancement of maritime unmanned aerial vehicles (UAVs) and deep learning technologies, the application of UAV-based object detection has become increasingly significant in the maritime industry and ocean engineering. Endowed with intelligent sensing capabilities, maritime UAVs enable effective and efficient maritime surveillance. To further promote the development of maritime UAV-based object detection, this paper provides a comprehensive review of challenges, related methods, and UAV aerial datasets. Specifically, we first briefly summarize four challenges for object detection on maritime UAVs, i.e., object feature diversity, device limitation, maritime environment variability, and dataset scarcity. We then focus on computational methods to improve maritime UAV-based object detection performance in terms of scale-aware methods, small object detection, view-aware methods, rotated object detection, lightweight methods, and others. Next, we review the UAV aerial image/video datasets and propose a maritime UAV aerial dataset named MS2ship for ship detection. Furthermore, we conduct a series of experiments to present the performance evaluation and robustness analysis of object detection methods on maritime datasets. Finally, we discuss and give an outlook on future work for maritime UAV-based object detection. The MS2ship dataset is available at https://github.com/zcj234/MS2ship.
Configurable convolutional neural networks for real-time pedestrian-level wind prediction in urban environments
Authors: Alfredo Vicente Clemente, Knut Erik Teigen Giljarhus, Luca Oggiano, Massimiliano Ruocco
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2311.07985
Pdf link: https://arxiv.org/pdf/2311.07985
Abstract Urbanization has underscored the importance of understanding the pedestrian wind environment in urban and architectural design contexts. Pedestrian Wind Comfort (PWC) focuses on the effects of wind on the safety and comfort of pedestrians and cyclists, given the influence of urban structures on the local microclimate. Traditional Computational Fluid Dynamics (CFD) methods used for PWC analysis have limitations in computation, cost, and time. Deep-learning models have the potential to significantly speed up this process. The prevailing state-of-the-art methodologies largely rely on GAN-based models, such as pix2pix, which have exhibited training instability issues. In contrast, our work introduces a convolutional neural network (CNN) approach based on the U-Net architecture, offering a more stable and streamlined solution. The process of generating a wind flow prediction at pedestrian level is reformulated from a 3D CFD simulation into a 2D image-to-image translation task, using the projected building heights as input. Testing on standard consumer hardware shows that our model can efficiently predict wind velocities in urban settings in real time. Further tests on different configurations of the model, combined with a Pareto front analysis, helped identify the trade-off between accuracy and computational efficiency. This CNN-based approach provides a fast and efficient method for PWC analysis, potentially aiding in more efficient urban design processes.
On the View-and-Channel Aggregation Gain in Integrated Sensing and Edge AI
Authors: Xu Chen, Khaled B. Letaief, Kaibin Huang
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2311.07986
Pdf link: https://arxiv.org/pdf/2311.07986
Abstract Sensing and edge artificial intelligence (AI) are two key features of the sixth-generation (6G) mobile networks. Their natural integration, termed integrated sensing and edge AI (ISEA), is envisioned to automate wide-ranging Internet-of-Things (IoT) applications. To achieve a high sensing accuracy, multi-view features are uploaded to an edge server for aggregation and inference using an AI model. The view aggregation is realized efficiently using over-the-air computing (AirComp), which also aggregates channels to suppress channel noise. At its nascent stage, ISEA still lacks a characterization of the fundamental performance gains from view-and-channel aggregation, which motivates this work. Our framework leverages a well-established distribution model of multi-view sensing data where the classic Gaussian-mixture model is modified by adding subspace matrices to represent individual sensor observation perspectives. Based on the model, we study the end-to-end (E2E) sensing (inference) uncertainty, a popular measure of inference accuracy, of the said ISEA system by a novel approach that involves designing a scaling-tight uncertainty surrogate function, global discriminant gain, distribution of receive signal-to-noise ratio (SNR), and channel-induced discriminant loss. We prove that the E2E sensing uncertainty diminishes at an exponential rate as the number of views/sensors grows, where the rate is proportional to the global discriminant gain. Given channel distortion, we further show that the exponential scaling remains, with a reduced decay rate related to the channel-induced discriminant loss. Furthermore, we benchmark AirComp against equally fast, traditional analog orthogonal access, which reveals a sensing-accuracy crossing point between the schemes and leads to the proposal of adaptive access-mode switching. Last, the insights from our framework are validated by experiments using a real-world dataset.
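Over-the-air computing exploits waveform superposition: if each sensor pre-scales its feature by the inverse of its channel gain, the multiple-access channel itself outputs the sum of the features, and receiver noise is added only once; dividing by the number of sensors K then averages the views and scales that noise by 1/K. A minimal idealized sketch, assuming real-valued, perfectly known, non-zero channel gains (channel inversion); power constraints, fading statistics, and the paper's uncertainty analysis are not modeled, and aircomp_average is a hypothetical helper.

```python
import random
from typing import List, Sequence


def aircomp_average(features: Sequence[Sequence[float]],
                    gains: Sequence[float],
                    noise_std: float,
                    rng: random.Random) -> List[float]:
    """Idealized over-the-air average of per-sensor feature vectors.

    Assumes real, known, non-zero gains so each sensor can pre-scale by 1/h_k;
    the channel superimposes all transmissions and the receiver adds noise once.
    """
    k = len(features)
    dim = len(features[0])
    received = [0.0] * dim
    for x, h in zip(features, gains):
        precoded = [xi / h for xi in x]        # transmitter-side channel inversion
        for i in range(dim):
            received[i] += h * precoded[i]     # channel scales and superimposes
    received = [r + rng.gauss(0.0, noise_std) for r in received]  # receiver noise
    return [r / k for r in received]           # averaging also shrinks the noise by 1/K


if __name__ == "__main__":
    rng = random.Random(0)
    views = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]  # hypothetical per-sensor features
    print(aircomp_average(views, gains=[0.5, 2.0, 1.5], noise_std=0.1, rng=rng))
```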
Probable Object Location (POLo) Score Estimation for Efficient Object Goal Navigation
Authors: Jiaming Wang, Harold Soh
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.07992
Pdf link: https://arxiv.org/pdf/2311.07992
Abstract To advance the field of autonomous robotics, particularly in object search tasks within unexplored environments, we introduce a novel framework centered around the Probable Object Location (POLo) score. Utilizing a 3D object probability map, the POLo score allows the agent to make data-driven decisions for efficient object search. We further enhance the framework's practicality by introducing POLoNet, a neural network trained to approximate the computationally intensive POLo score. Our approach addresses critical limitations of both end-to-end reinforcement learning methods, which suffer from memory decay over long-horizon tasks, and traditional map-based methods that neglect visibility constraints. Our experiments, involving the first phase of the OVMM 2023 challenge, demonstrate that an agent equipped with POLoNet significantly outperforms a range of baseline methods, including end-to-end RL techniques and prior map-based strategies. To provide a comprehensive evaluation, we introduce new performance metrics that offer insights into the efficiency and effectiveness of various agents in object goal navigation.
Adversarial Preference Optimization
Authors: Pengyu Cheng, Yifan Yang, Jian Li, Yong Dai, Nan Du
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.08045
Pdf link: https://arxiv.org/pdf/2311.08045
Abstract Human preference alignment is a crucial training step to improve the interaction quality of large language models (LLMs). Existing alignment methods depend on manually annotated preference data to guide the LLM optimization directions. However, in practice, continuously updating LLMs creates a distribution gap between model-generated samples and human-preferred responses, which hinders model fine-tuning efficiency. To mitigate this issue, previous methods require additional preference annotation on generated samples to adapt to the shifted distribution, which consumes a large amount of annotation resources. Targeting more efficient human preference optimization, we propose an adversarial preference optimization (APO) framework, where the LLM agent and the preference model update alternately via a min-max game. Without additional annotation, our APO method can self-adapt to the generation distribution gap through the adversarial learning process. In experiments, we empirically verify the effectiveness of APO in improving LLMs' helpfulness and harmlessness compared with rejection sampling baselines.
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Authors: Peng Jin, Ryuichi Takanobu, Caiwan Zhang, Xiaochun Cao, Li Yuan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.08046
Pdf link: https://arxiv.org/pdf/2311.08046
Abstract Large language models have demonstrated impressive universal capabilities across a wide range of open-ended tasks and have extended their utility to encompass multimodal conversations. However, existing methods encounter challenges in effectively handling both image and video understanding, particularly with limited visual tokens. In this work, we introduce Chat-UniVi, a unified vision-language model capable of comprehending and engaging in conversations involving images and videos through a unified visual representation. Specifically, we employ a set of dynamic visual tokens to uniformly represent images and videos. This representation framework empowers the model to efficiently utilize a limited number of visual tokens to simultaneously capture the spatial details necessary for images and the comprehensive temporal relationship required for videos. Moreover, we leverage a multi-scale representation, enabling the model to perceive both high-level semantic concepts and low-level visual details. Notably, Chat-UniVi is trained on a mixed dataset containing both images and videos, allowing direct application to tasks involving both mediums without requiring any modifications. Extensive experimental results demonstrate that Chat-UniVi, as a unified model, consistently outperforms even existing methods exclusively designed for either images or videos.
Act-VIT: A Representationally Robust Attention Architecture for Skeleton Based Action Recognition Using Vision Transformer
Authors: Ozge Oztimur Karadag
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2311.08094
Pdf link: https://arxiv.org/pdf/2311.08094
Abstract Skeleton-based action recognition receives the attention of many researchers, as it is robust to viewpoint and illumination changes and its processing is much more efficient than that of video frames. With the emergence of deep learning models, it has become very popular to represent skeleton data in pseudo-image form and apply convolutional neural networks for action recognition. Thereafter, studies concentrated on finding effective methods for forming pseudo-images. Recently, attention networks, more specifically transformers, have provided promising results in various vision problems. In this study, the effectiveness of vision transformers for skeleton-based action recognition is examined, and their robustness to the pseudo-image representation scheme is investigated. To this end, a three-level architecture, Act-VIT, is proposed, which forms a set of pseudo-images, applies a classifier to each representation, and combines their results to find the final action class. The classifiers of Act-VIT are first realized by CNNs and then by VITs, and their performances are compared. Experimental studies reveal that the vision transformer is less sensitive to the initial pseudo-image representation than the CNN. Nevertheless, even with the vision transformer, the recognition performance can be further improved by a consensus of classifiers.
Smart Skin separation control using distributed-input distributed-output, multi-modal actuators, and machine learning
Authors: Songqi Li
Subjects: Systems and Control (eess.SY); Fluid Dynamics (physics.flu-dyn)
Arxiv link: https://arxiv.org/abs/2311.08116
Pdf link: https://arxiv.org/pdf/2311.08116
Abstract Efficient flow separation control offers significant economic benefits. This study applies a machine learning algorithm to minimize flow separation in Smart Skin, a flow control device that features distributed-input and distributed-output (DIDO). Smart Skin comprises 30 hybrid actuator units, each integrating a height-adjustable vortex generator and a mini-jet actuator. These units are deployed on a backward-facing ramp to reduce flow separation in a distributed manner. To monitor the flow state, distributed pressure taps are deployed around the multi-modal actuators. Parametric studies indicate that the mapping between control parameters and separation control performance is complex. To optimize separation control, a cutting-edge variant of particle swarm optimization (PSO-TPME) is used to tune the control parameters of the Smart Skin. This algorithm is capable of fast optimization in high-dimensional parameter spaces. The results demonstrate the efficiency of PSO-TPME, and the optimized solution significantly outperforms the best result from the parametric study. These findings point to a promising future for machine learning-based flow control using distributed actuators and sensors.
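PSO-TPME is a specific variant whose details are not given in the abstract; for orientation, a minimal sketch of the standard global-best particle swarm optimizer that such variants build on. The sphere function in the usage stands in for a hypothetical separation-control cost (in the experiment, each evaluation would be a measurement for one set of actuator parameters). The inertia w = 0.7 and acceleration coefficients c1 = c2 = 1.5 are common textbook defaults, not values from the paper.

```python
import random
from typing import Callable, List, Tuple


def pso_minimize(f: Callable[[List[float]], float],
                 dim: int,
                 bounds: Tuple[float, float],
                 n_particles: int = 30,
                 iters: int = 200,
                 w: float = 0.7,
                 c1: float = 1.5,
                 c2: float = 1.5,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Standard global-best particle swarm optimization (not PSO-TPME)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # each particle's best position so far
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


if __name__ == "__main__":
    # Hypothetical stand-in cost: minimum 0 at the origin.
    best, cost = pso_minimize(lambda x: sum(v * v for v in x), dim=3, bounds=(-5.0, 5.0))
    print(best, cost)
```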
Memory-efficient Stochastic methods for Memory-based Transformers
Authors: Vishwajit Kumar Vishnu, C. Chandra Sekhar
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2311.08123
Pdf link: https://arxiv.org/pdf/2311.08123
Abstract Training memory-based transformers can require a large amount of memory and can be quite inefficient. We propose a novel two-phase training mechanism and a novel regularization technique to improve the training efficiency of memory-based transformers, which are often used for long-range context problems. For our experiments, we consider Transformer-XL, a memory-based transformer model, as our baseline. We show that our resultant model, Skip Cross-head TransformerXL, outperforms the baseline on a character-level language modeling task with a similar number of parameters and outperforms the baseline on a word-level language modeling task with almost 20% fewer parameters. Our proposed methods do not require any additional memory. We also demonstrate the effectiveness of our regularization mechanism on BERT, which shows similar performance with a reduction of around 30% in the standard deviation of scores on multiple GLUE tasks.
TransformCode: A Contrastive Learning Framework for Code Embedding via Subtree transformation
Authors: Zixiang Xian, Rubing Huang, Dave Towey, Chunrong Fang, Zhenyu Chen
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.08157
Pdf link: https://arxiv.org/pdf/2311.08157
Abstract Large-scale language models have made great progress in the field of software engineering in recent years. They can be used for many code-related tasks such as code clone detection, code-to-code search, and method name prediction. However, these large-scale language models based on individual code tokens have several drawbacks: they are usually large in scale, heavily dependent on labels, and require a lot of computing power and time to fine-tune on new datasets. Furthermore, code embedding should be performed on the entire code snippet rather than by encoding each code token. The main reason for this is that encoding each code token would cause model parameter inflation, resulting in many parameters storing information that we are not very concerned about. In this paper, we propose a novel framework, called TransformCode, that learns code embeddings in a contrastive learning manner. The framework uses the Transformer encoder as an integral part of the model. We also introduce a novel data augmentation technique called abstract syntax tree transformation: this technique applies syntactic and semantic transformations to the original code snippets to generate more diverse and robust anchor samples. Our proposed framework is both flexible and adaptable: it can be easily extended to other downstream tasks that require code representation, such as code clone detection and classification. The framework is also very efficient and scalable: it does not require a large model or a large amount of training data, and it can support any programming language. Finally, our framework is not limited to unsupervised learning but can also be applied to some supervised learning tasks by incorporating task-specific labels or objectives. To explore the effectiveness of our framework, we conducted extensive experiments on different software engineering tasks using different programming languages and multiple datasets.
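As one concrete example of a semantics-preserving abstract syntax tree transformation, a minimal sketch that renames bound variables using Python's standard ast module (ast.unparse requires Python 3.9 or later). Variable renaming is an assumed example; the paper's actual transformation set and target languages may differ. The sketch does not rename function names, attributes, or keyword arguments at call sites, and generated names could collide with existing identifiers called v0, v1, and so on.

```python
import ast


def rename_variables(source: str) -> str:
    """Rename arguments and assigned names to v0, v1, ... via an AST rewrite.

    Illustrative only: function names, attributes, never-assigned globals, and
    builtins are left untouched.
    """
    tree = ast.parse(source)

    # Pass 1: collect every identifier bound locally (arguments or assignment targets).
    mapping = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.arg):
            name = node.arg
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            name = node.id
        else:
            continue
        mapping.setdefault(name, f"v{len(mapping)}")

    # Pass 2: rewrite every occurrence of a collected identifier, loads included.
    for node in ast.walk(tree):
        if isinstance(node, ast.arg) and node.arg in mapping:
            node.arg = mapping[node.arg]
        elif isinstance(node, ast.Name) and node.id in mapping:
            node.id = mapping[node.id]

    return ast.unparse(tree)


if __name__ == "__main__":
    snippet = (
        "def f(xs):\n"
        "    total = 0\n"
        "    for x in xs:\n"
        "        total = total + x\n"
        "    return [y * 2 for y in [total]]\n"
    )
    print(rename_variables(snippet))
```

Two passes are used so that names read before their binding in tree order (for example the element expression of a comprehension) are still renamed consistently.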
Channel Estimation with Dynamic Metasurface Antennas via Model-Based Learning
Authors: Xiangyu Zhang, Haiyang Zhang, Luxi Yang, Yonina C. Eldar
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2311.08158
Pdf link: https://arxiv.org/pdf/2311.08158
Abstract Dynamic Metasurface Antenna (DMA) is a cutting-edge antenna technology offering scalable and sustainable solutions for large antenna arrays. The effectiveness of DMAs stems from their inherent configurable analog signal processing capabilities, which facilitate cost-limited implementations. However, when DMAs are used in multiple-input multiple-output (MIMO) communication systems, their analog compression makes channel estimation challenging. In this paper, we propose two model-based learning methods to overcome this challenge. Our approach starts by casting channel estimation as a compressed sensing problem. Here, the sensing matrix is formed using a random DMA weighting matrix combined with a spatial gridding dictionary. We then employ the learned iterative shrinkage and thresholding algorithm (LISTA) to recover the sparse channel parameters. LISTA unfolds the iterative shrinkage and thresholding algorithm into a neural network and trains this network into a highly efficient channel estimator fitted to previously observed channels. As the sensing matrix is crucial to the accuracy of LISTA recovery, we introduce another data-aided method, LISTA-sensing matrix optimization (LISTA-SMO), to jointly optimize the sensing matrix. LISTA-SMO takes LISTA as a backbone and embeds sensing matrix optimization layers in LISTA's neural network, allowing the sensing matrix to be optimized along with the training of LISTA. Furthermore, we propose a self-supervised learning technique to tackle the difficulty of acquiring noise-free data. Our numerical results demonstrate that LISTA outperforms traditional sparse recovery methods in channel estimation accuracy and efficiency. Besides, LISTA-SMO achieves better channel estimation accuracy than LISTA, demonstrating the effectiveness of optimizing the sensing matrix.
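LISTA starts from the classical ISTA iteration x ← soft(x + A^T(y - Ax)/L, λ/L) for the sparse recovery problem min 0.5||y - Ax||^2 + λ||x||_1, unrolls a fixed number of iterations into network layers, and learns the matrices and thresholds from data. For reference, a minimal sketch of plain (unlearned) ISTA, assuming a small real-valued A and using the squared Frobenius norm as a safe upper bound on the Lipschitz constant L; the DMA sensing matrix, complex-valued channels, and learned parameters are not modeled, and the function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def soft_threshold(v: Vector, t: float) -> Vector:
    """Elementwise shrinkage: sign(v) * max(|v| - t, 0)."""
    return [x - t if x > t else x + t if x < -t else 0.0 for x in v]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Computes A x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def rmatvec(a: Matrix, r: Vector) -> Vector:
    """Computes A^T r."""
    return [sum(a[i][j] * r[i] for i in range(len(a))) for j in range(len(a[0]))]


def ista(a: Matrix, y: Vector, lam: float, iters: int = 100) -> Tuple[Vector, List[float]]:
    """ISTA for min_x 0.5*||y - A x||^2 + lam*||x||_1; returns x and the objective history."""
    # Step size 1/L with L = squared Frobenius norm, an upper bound on ||A^T A||_2.
    lip = sum(v * v for row in a for v in row)
    x = [0.0] * len(a[0])
    history = []
    for _ in range(iters):
        resid = [yi - ai for yi, ai in zip(y, matvec(a, x))]
        grad_step = [xi + gi / lip for xi, gi in zip(x, rmatvec(a, resid))]  # gradient step
        x = soft_threshold(grad_step, lam / lip)                             # proximal (shrink) step
        r = [yi - ai for yi, ai in zip(y, matvec(a, x))]
        history.append(0.5 * sum(v * v for v in r) + lam * sum(abs(v) for v in x))
    return x, history


if __name__ == "__main__":
    x_hat, hist = ista([[1.0, 0.5, 0.0], [0.0, 1.0, 2.0]], [1.0, 2.0], lam=0.1)
    print(x_hat, hist[-1])
```

In LISTA, the fixed quantities in each iteration (the matrices applied to x and y, and the threshold λ/L) become trainable per-layer parameters.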
Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning
Authors: Shengguang Wu, Keming Lu, Benfeng Xu, Junyang Lin, Qi Su, Chang Zhou
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.08182
Pdf link: https://arxiv.org/pdf/2311.08182
Abstract Enhancing the instruction-following ability of Large Language Models (LLMs) primarily demands substantial instruction-tuning datasets. However, the sheer volume of these datasets imposes a considerable computational burden and annotation cost. To investigate a label-efficient instruction tuning method that allows the model itself to actively sample subsets that are equally or even more effective, we introduce a self-evolving mechanism, DiverseEvol. In this process, a model iteratively augments its training subset to refine its own performance, without requiring any intervention from humans or more advanced LLMs. The key to our data sampling technique lies in enhancing the diversity of the chosen subsets, as the model selects new data points most distinct from any existing ones according to its current embedding space. Extensive experiments across three datasets and benchmarks demonstrate the effectiveness of DiverseEvol. Our models, trained on less than 8% of the original dataset, maintain or improve performance compared with fine-tuning on full data. We also provide empirical evidence to analyze the importance of diversity in instruction data and of the iterative scheme as opposed to one-time sampling. Our code is publicly available at https://github.com/OFA-Sys/DiverseEvol.git.
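The selection rule described above, picking the data point most distinct from those already chosen, is in spirit the greedy farthest-point (k-center) heuristic. A minimal sketch, assuming Euclidean distances between embedding vectors and brute-force search; DiverseEvol's exact distance measure and iteration schedule may differ, and farthest_point_selection is a hypothetical name.

```python
import math
from typing import List, Sequence


def farthest_point_selection(embeddings: Sequence[Sequence[float]],
                             selected: List[int],
                             k: int) -> List[int]:
    """Greedily add k indices, each time picking the point farthest from the current subset.

    A candidate's distance to the subset is its Euclidean distance to the nearest selected point.
    """
    chosen = list(selected)
    # Distance from each point to its nearest already-chosen point (inf if nothing chosen yet).
    nearest = [min((math.dist(e, embeddings[j]) for j in chosen), default=math.inf)
               for e in embeddings]
    for _ in range(k):
        best = max((i for i in range(len(embeddings)) if i not in chosen),
                   key=lambda i: nearest[i])
        chosen.append(best)
        # Refresh the nearest-distance cache with the newly added point.
        for i, e in enumerate(embeddings):
            nearest[i] = min(nearest[i], math.dist(e, embeddings[best]))
    return chosen


if __name__ == "__main__":
    # Hypothetical 1-D "embeddings"; start from the subset {0} and add two points.
    print(farthest_point_selection([[0.0], [1.0], [2.0], [10.0]], [0], 2))
```

Caching each point's distance to the nearest chosen point keeps every round linear in the pool size, which matters when the candidate pool is a full instruction dataset.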
Fast List Decoding of High-Rate Polar Codes
Authors: Yang Lu, Ming-Min Zhao, Ming Lei, Min-Jian Zhao
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2311.08188
Pdf link: https://arxiv.org/pdf/2311.08188
Abstract Due to its superior error-correction performance, the successive cancellation list (SCL) algorithm is widely regarded as one of the most promising decoding algorithms for polar codes with short-to-moderate code lengths. However, the application of SCL decoding in low-latency communication scenarios is limited due to its sequential nature. To reduce the decoding latency, developing tailored fast and efficient list decoding algorithms for specific polar constituent codes (special nodes) is a promising solution. Recently, fast list decoding algorithms have been proposed for special nodes with low code rates. Aiming to further speed up SCL decoding, this paper presents fast list decoding algorithms for two types of high-rate special nodes, namely single-parity-check (SPC) nodes and sequence rate one or single-parity-check (SR1/SPC) nodes. In particular, we develop two classes of fast list decoding algorithms for these nodes: the first class uses a sequential decoding procedure to yield a decoding latency that is linear in the list size, and the second further parallelizes the decoding process by pre-determining the redundant candidate paths offline. Simulation results show that the proposed list decoding algorithms achieve up to 70.7% lower decoding latency than state-of-the-art fast SCL decoders while exhibiting the same error-correction performance.
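A single-parity-check node constrains its bits to even parity, and its maximum-likelihood single-path decision is simple: take hard decisions from the LLRs and, if the parity check fails, flip the least reliable bit (Wagner decoding). Fast list decoders for SPC nodes build on this by also exploring flips of other low-reliability bits across list paths. The sketch below shows only the single-path rule, assuming the convention that a positive LLR favors bit 0; it is not the paper's list algorithm, and decode_spc_node is a hypothetical name.

```python
from typing import List, Sequence


def decode_spc_node(llrs: Sequence[float]) -> List[int]:
    """ML decoding of an even-parity single-parity-check node from its LLRs.

    Convention: a positive LLR favors bit 0.
    """
    bits = [0 if l >= 0 else 1 for l in llrs]   # hard decisions
    if sum(bits) % 2 == 1:
        # Parity violated: flip the least reliable bit (smallest |LLR|).
        weakest = min(range(len(llrs)), key=lambda i: abs(llrs[i]))
        bits[weakest] ^= 1
    return bits


if __name__ == "__main__":
    print(decode_spc_node([2.0, -0.5, 1.0, 3.0]))   # parity fails, weakest bit flipped
    print(decode_spc_node([-2.0, -1.0, 3.0, 0.5]))  # parity already satisfied
```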
Unlocking Science: Novel Dataset and Benchmark for Cross-Modality Scientific Information Extraction
Authors: Yuhan Li, Jian Wu, Zhiwei Yu, Börje F. Karlsson, Wei Shen, Manabu Okumura, Chin-Yew Lin
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2311.08189
Pdf link: https://arxiv.org/pdf/2311.08189
Abstract Extracting key information from scientific papers has the potential to help researchers work more efficiently and accelerate the pace of scientific progress. Over the last few years, research on Scientific Information Extraction (SciIE) has witnessed the release of several new systems and benchmarks. However, due to complex processing and expensive annotations, existing paper-focused datasets mostly cover only specific parts of a manuscript (e.g., abstracts) and are single-modality (i.e., text- or table-only). Moreover, core information can be present in text, in tables, or across both. To close this gap in data availability and enable cross-modality IE while alleviating labeling costs, we propose a semi-supervised pipeline for annotating entities in text, as well as entities and relations in tables, in an iterative procedure. Based on this pipeline, we release novel resources for the scientific community, including a high-quality benchmark, a large-scale corpus, and a semi-supervised annotation pipeline. We further report the performance of state-of-the-art IE models on the proposed benchmark dataset as a baseline. Lastly, we explore the potential capability of large language models such as ChatGPT for the current task. Our new dataset, results, and analysis validate the effectiveness and efficiency of our semi-supervised pipeline, and we discuss its remaining limitations.
Counterfactual Explanation for Regression via Disentanglement in Latent Space
Authors: Xuan Zhao, Klaus Broelemann, Gjergji Kasneci
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.08228
Pdf link: https://arxiv.org/pdf/2311.08228
Abstract Counterfactual Explanations (CEs) help address the question: how can the factors that influence the prediction of a predictive model be changed to achieve a more favorable outcome from a user's perspective? Thus, they bear the potential to guide the user's interaction with AI systems, since they represent easy-to-understand explanations. To be applicable, CEs need to be realistic and actionable. In the literature, various methods have been proposed to generate CEs. However, the majority of research on CEs focuses on classification problems, where questions like "What should I do to get my rejected loan approved?" are raised. In practice, questions like "What should I do to increase my salary?" are of a more regressive nature. In this paper, we introduce a novel method to generate CEs for a pre-trained regressor by first disentangling the label-relevant from the label-irrelevant dimensions in the latent space. CEs are then generated by combining the label-irrelevant dimensions and the predefined output. The intuition behind this approach is that the ideal counterfactual search should focus on the label-irrelevant characteristics of the input and suggest changes toward target-relevant characteristics. Searching in the latent space could help achieve this goal. We show that our method maintains the characteristics of the query sample during the counterfactual search. In various experiments, we demonstrate that the proposed method is competitive on different quality measures on image and tabular datasets in regression problem settings. It efficiently returns results closer to the original data manifold than three state-of-the-art methods, which is essential for realistic high-dimensional machine learning applications. Our code will be made available as an open-source package upon the publication of this work.
On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling
Authors: Nicholas E. Corrado, Josiah P. Hanna
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.08290
Pdf link: https://arxiv.org/pdf/2311.08290
Abstract On-policy reinforcement learning (RL) algorithms perform policy updates using i.i.d. trajectories collected by the current policy. However, after observing only a finite number of trajectories, on-policy sampling may produce data that fails to match the expected on-policy data distribution. This sampling error leads to noisy updates and data-inefficient on-policy learning. Recent work in the policy evaluation setting has shown that non-i.i.d., off-policy sampling can produce data with lower sampling error than on-policy sampling can produce. Motivated by this observation, we introduce an adaptive, off-policy sampling method to improve the data efficiency of on-policy policy gradient algorithms. Our method, Proximal Robust On-Policy Sampling (PROPS), reduces sampling error by collecting data with a behavior policy that increases the probability of sampling actions that are under-sampled with respect to the current policy. Rather than discarding data from old policies -- as is commonly done in on-policy algorithms -- PROPS uses data collection to adjust the distribution of previously collected data to be approximately on-policy. We empirically evaluate PROPS on both continuous-action MuJoCo benchmark tasks and discrete-action tasks and demonstrate that (1) PROPS decreases sampling error throughout training and (2) it improves the data efficiency of on-policy policy gradient algorithms. Our work improves the RL community's understanding of a nuance in the on-policy vs off-policy dichotomy: on-policy learning requires on-policy data, not on-policy sampling.
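The intuition that sampling under-sampled actions lowers sampling error can be seen in a toy, state-free setting. A minimal sketch, not PROPS itself (PROPS adapts a behavior policy inside a policy gradient loop): a deterministic rule that always picks the action whose count lags furthest behind its target probability. With K actions, every empirical frequency stays within (K - 1)/n of the target after n draws, whereas i.i.d. sampling errors shrink only on the order of 1/√n. The function name deficit_sampler is hypothetical.

```python
from typing import List, Sequence


def deficit_sampler(probs: Sequence[float], n: int) -> List[int]:
    """Deterministically choose n actions so empirical frequencies track `probs`.

    At step t, pick argmax_a probs[a] * (t + 1) - counts[a], i.e. the action that
    is most under-sampled relative to its target probability.
    """
    counts = [0] * len(probs)
    actions = []
    for t in range(n):
        a = max(range(len(probs)), key=lambda i: probs[i] * (t + 1) - counts[i])
        counts[a] += 1
        actions.append(a)
    return actions


if __name__ == "__main__":
    target = [0.5, 0.3, 0.2]  # hypothetical current-policy action probabilities
    acts = deficit_sampler(target, 200)
    print([acts.count(a) / len(acts) for a in range(len(target))])
```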
Resource Efficient Over-the-Air Fronthaul Signaling for Uplink Cell-Free Massive MIMO Systems
Authors: Zakir Hussain Shaik, Sai Subramanyam Thoota, Emil Björnson, Erik G. Larsson
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2311.08319
Pdf link: https://arxiv.org/pdf/2311.08319
Abstract We propose a novel, resource-efficient analog over-the-air (OTA) computation framework to address the demanding requirements of the uplink (UL) fronthaul between the access points (APs) and the central processing unit (CPU) in cell-free massive multiple-input multiple-output (MIMO) systems. We discuss the drawbacks of wired and wireless fronthaul solutions and show that our proposed mechanism is efficient and scalable as the number of APs increases. We present the transmit precoding and two-phase power assignment strategies at the APs to coherently combine the signals OTA in a spectrally efficient manner. We derive the statistics of the APs' locally available signals, which enable us to obtain analytical expressions for the Bayesian and classical estimators of the OTA combined signals. We empirically evaluate the normalized mean square error (NMSE), symbol error rate (SER), and coded bit error rate (BER) of our developed solution and benchmark it against a state-of-the-art wired fronthaul-based system.
GT4Py: High Performance Stencils for Weather and Climate Applications using Python
Authors: Enrique G. Paredes, Linus Groner, Stefano Ubbiali, Hannes Vogt, Alberto Madonna, Kean Mariotti, Felipe Cruz, Lucas Benedicic, Mauro Bianco, Joost VandeVondele, Thomas C. Schulthess
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Programming Languages (cs.PL)
Arxiv link: https://arxiv.org/abs/2311.08322
Pdf link: https://arxiv.org/pdf/2311.08322
Abstract All major weather and climate applications are currently developed using languages such as Fortran or C++. This is typical in the domain of high performance computing (HPC), where efficient execution is an important concern. Unfortunately, this approach leads to implementations that intermix optimizations for specific hardware architectures with the high-level numerical methods that are typical for the domain. This leads to code that is verbose, difficult to extend and maintain, and difficult to port to different hardware architectures. Here, we propose a different strategy based on GT4Py (GridTools for Python). GT4Py is a Python framework for writing weather and climate applications that includes a high-level embedded domain-specific language (DSL) for writing stencil computations. The toolchain integrated in GT4Py enables automatic code generation to obtain the performance of state-of-the-art C++ and CUDA implementations. The separation of concerns between the mathematical definitions and the actual implementations allows for performance portability of the computations on a wide range of computing architectures, while being embedded in Python allows easy access to the tools of the Python ecosystem to enhance the productivity of scientists and facilitate integration in complex workflows. This paper describes the initial release of GT4Py, providing an overview of the current state of the framework and performance results showing how GT4Py can outperform pure Python implementations by orders of magnitude.
KTRL+F: Knowledge-Augmented In-Document Search
Authors: Hanseok Oh, Haebin Shin, Miyoung Ko, Hyunji Lee, Minjoon Seo
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2311.08329
Pdf link: https://arxiv.org/pdf/2311.08329
Abstract We introduce a new problem, KTRL+F, a knowledge-augmented in-document search task that necessitates real-time identification of all semantic targets within a document, with awareness of external sources, through a single natural query. This task addresses the following unique challenges for in-document search: 1) utilizing knowledge outside the document for extended use of additional information about targets to bridge the semantic gap between the query and the targets, and 2) balancing real-time applicability with performance. We analyze various baselines in KTRL+F and find that existing models have limitations, such as hallucinations, high latency, or difficulties in leveraging external knowledge. Therefore, we propose a Knowledge-Augmented Phrase Retrieval model that shows a promising balance between speed and performance by simply augmenting phrase embeddings with external knowledge embeddings. Additionally, we conduct a user study to verify whether solving KTRL+F can enhance users' search experience. It demonstrates that even with our simple model, users can reduce the time spent searching, with fewer queries and fewer extra visits to other sources for collecting evidence. We encourage the research community to work on KTRL+F to enable more efficient in-document information access.
Calibration of an Elastic Humanoid Upper Body and Efficient Compensation for Motion Planning
Authors: Johannes Tenhumberg, Berthold Bäuml
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2311.08333
Pdf link: https://arxiv.org/pdf/2311.08333
Abstract High absolute accuracy is an essential prerequisite for a humanoid robot to autonomously and robustly perform manipulation tasks while avoiding obstacles. We present for the first time a kinematic model for a humanoid upper body incorporating joint and transversal elasticities. These elasticities lead to significant deformations due to the robot's own weight, and the resulting model is implicitly defined via a torque equilibrium. We successfully calibrate this model for DLR's humanoid Agile Justin, including all Denavit-Hartenberg parameters and elasticities. The calibration is formulated as a combined least-squares problem with priors and is based on measurements of the end effector positions of both arms via an external tracking system. The absolute position error is massively reduced from 21 mm to 3.1 mm on average in the whole workspace. Using this complex and implicit kinematic model in motion planning is challenging. We show that for optimization-based path planning, integrating the iterative solution of the implicit model into the optimization loop leads to an elegant and highly efficient solution. For mildly elastic robots like Agile Justin, there is no performance impact, and even for a simulated highly flexible robot with 20 times higher elasticities, the runtime increases by only 30%.
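The key difficulty is that the deformed pose is defined only implicitly by a torque equilibrium. For a hypothetical single elastic joint with stiffness k holding a link of mass m at distance l against gravity (angle measured from the horizontal), the equilibrium k(θ_m - θ) = m g l cos θ can be solved by fixed-point iteration starting from the rigid solution θ = θ_m; the update is a contraction whenever m g l / k < 1. This is a minimal sketch, not the authors' multi-joint model with transversal elasticities, and link_angle_under_gravity is a hypothetical name. Inside an optimization-based planner, the previous step's solution could serve as a warm start.

```python
import math


def link_angle_under_gravity(theta_motor: float, mass: float, length: float,
                             stiffness: float, tol: float = 1e-12,
                             max_iter: int = 100) -> float:
    """Solve k*(theta_m - theta) = m*g*l*cos(theta) for the link angle theta.

    Hypothetical single elastic joint; fixed-point iteration converges when
    m*g*l/k < 1, because the update map is then a contraction.
    """
    g = 9.81
    sag = mass * g * length / stiffness
    theta = theta_motor  # start from the rigid-robot solution
    for _ in range(max_iter):
        new_theta = theta_motor - sag * math.cos(theta)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    raise RuntimeError("fixed-point iteration did not converge")


if __name__ == "__main__":
    # Hypothetical parameters: 2 kg at 0.5 m on a 500 Nm/rad joint, commanded horizontal.
    print(link_angle_under_gravity(0.0, mass=2.0, length=0.5, stiffness=500.0))
```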
Speeding Up Optimization-based Motion Planning through Deep Learning
Authors: Johannes Tenhumberg, Darius Burschka, Berthold Bäuml
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2311.08345
Pdf link: https://arxiv.org/pdf/2311.08345
Abstract Planning collision-free motions for robots with many degrees of freedom is challenging in environments with complex obstacle geometries. Recent work introduced the idea of speeding up the planning by encoding prior experience of successful motion plans in a neural network. However, this "neural motion planning" did not scale to complex robots in unseen 3D environments, as needed for real-world applications. Here, we introduce the "basis point set", well known in computer vision, to neural motion planning as a modern, compact environment encoding that enables efficient supervised training of networks that generalize well over diverse 3D worlds. Combined with a new elaborate training scheme, we reach a planning success rate of 100%. We use the network to predict an educated initial guess for an optimization-based planner (OMP), which quickly converges to a feasible solution, massively outperforming random multi-starts when tested on previously unseen environments. For the DLR humanoid Agile Justin with 19 DoF and in challenging obstacle environments, optimal paths can be generated in 200 ms using only a single CPU core. We also show a first successful real-world experiment based on a high-resolution world model from an integrated 3D sensor.
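The basis point set encoding replaces a variable-size point cloud with a fixed-length vector: a set of basis points is fixed once, and each environment is encoded by the distance from every basis point to its nearest obstacle point. A minimal sketch, assuming uniformly random basis points in a unit cube and brute-force nearest-neighbor search; the paper's basis layout, point count, and any additional features are not reproduced, and make_basis and bps_encode are hypothetical names.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def make_basis(n: int, seed: int = 0) -> List[Point]:
    """Fixed random basis points in the unit cube, shared by every environment."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random(), rng.random()) for _ in range(n)]


def bps_encode(cloud: Sequence[Point], basis: Sequence[Point]) -> List[float]:
    """Fixed-length encoding: distance from each basis point to its nearest cloud point."""
    return [min(math.dist(b, p) for p in cloud) for b in basis]


if __name__ == "__main__":
    basis = make_basis(64)
    obstacles = [(0.2, 0.2, 0.2), (0.8, 0.5, 0.1), (0.5, 0.9, 0.7)]  # hypothetical cloud
    print(len(bps_encode(obstacles, basis)))  # always len(basis), whatever the cloud size
```

Because the output length depends only on the basis, clouds with different numbers of points map to inputs of the same size, which is what makes a standard feed-forward network applicable.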
Transformers can optimally learn regression mixture models
Authors: Reese Pathak, Rajat Sen, Weihao Kong, Abhimanyu Das
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2311.08362
Pdf link: https://arxiv.org/pdf/2311.08362
Abstract Mixture models arise in many regression problems, but most methods have seen limited adoption partly due to these algorithms' highly-tailored and model-specific nature. On the other hand, transformers are flexible, neural sequence models that present the intriguing possibility of providing general-purpose prediction methods, even in this mixture setting. In this work, we investigate the hypothesis that transformers can learn an optimal predictor for mixtures of regressions. We construct a generative process for a mixture of linear regressions for which the decision-theoretic optimal procedure is given by data-driven exponential weights on a finite set of parameters. We observe that transformers achieve low mean-squared error on data generated via this process. By probing the transformer's output at inference time, we also show that transformers typically make predictions that are close to the optimal predictor. Our experiments also demonstrate that transformers can learn mixtures of regressions in a sample-efficient fashion and are somewhat robust to distribution shifts. We complement our experimental observations by proving constructively that the decision-theoretic optimal procedure is indeed implementable by a transformer.
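The "data-driven exponential weights on a finite set of parameters" can be written out directly. The sketch below makes three assumptions: a uniform prior over a finite set of candidate weight vectors `W`, Gaussian noise with known standard deviation `sigma`, and in-context examples that all come from one candidate. Under those assumptions, each candidate is weighted by exp(-RSS / (2 sigma^2)), and the resulting posterior-mean prediction minimizes expected squared error. The function name and toy data are hypothetical, and the paper's exact generative process may differ.

```python
import numpy as np


def exp_weights_predict(X, y, x_query, W, sigma):
    """Posterior-mean prediction over a finite set of candidate regressors.

    X: (n, d) in-context inputs, y: (n,) targets, x_query: (d,) query input.
    W: (m, d) candidate weight vectors, uniform prior assumed.
    sigma: known Gaussian noise standard deviation (assumption).
    Returns (prediction, posterior weights over the m candidates).
    """
    residuals = y[None, :] - W @ X.T                        # (m, n)
    log_w = -np.sum(residuals ** 2, axis=1) / (2 * sigma ** 2)
    log_w -= log_w.max()                                     # numerical stability
    weights = np.exp(log_w)
    weights /= weights.sum()
    prediction = float(weights @ (W @ x_query))              # mix candidate predictions
    return prediction, weights


rng = np.random.default_rng(0)
W = rng.normal(size=(5, 4))            # hypothetical finite parameter set
w_true = W[2]
X = rng.normal(size=(20, 4))
y = X @ w_true + 0.1 * rng.normal(size=20)
x_query = rng.normal(size=4)
pred, weights = exp_weights_predict(X, y, x_query, W, sigma=0.1)
print(int(weights.argmax()), pred, float(x_query @ w_true))
```

With 20 examples and low noise, almost all posterior mass falls on the generating candidate. The prediction therefore nearly coincides with that candidate's output. With fewer examples, the prediction blends several candidates.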
Arboricity-Dependent Algorithms for Edge Coloring
Authors: Sayan Bhattacharya, Martín Costa, Nadav Panski, Shay Solomon
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2311.08367
Pdf link: https://arxiv.org/pdf/2311.08367
Abstract The problem of edge coloring has been extensively studied over the years. The main conceptual contribution of this work is in identifying a surprisingly simple connection between the problem of $(\Delta +O(\alpha))$-edge coloring and a certain canonical graph decomposition in graphs of arboricity $\alpha$, for which efficient algorithms are known across various computational models. We first leverage such graph decompositions to provide fast $(\Delta +O(\alpha))$-edge coloring algorithms in the standard static (sequential and distributed) settings. Further, as our main technical contribution, we show how to efficiently maintain a $(\Delta +O(\alpha))$-edge coloring in the standard dynamic model. Consequently, we improve over the state-of-the-art edge coloring algorithms in these models for graphs of sufficiently small arboricity.
Aid Nexus: A Blockchain Based Financial Distribution System
Authors: Md. Raisul Hasan Shahrukh, Md. Tabassinur Rahman, Nafees Mansoor
Subjects: Software Engineering (cs.SE); Computational Engineering, Finance, and Science (cs.CE); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2311.08372
Pdf link: https://arxiv.org/pdf/2311.08372
Abstract Blockchain technology has emerged as a disruptive force with transformative potential across numerous industries, promising efficient and automated solutions that can revolutionize traditional systems. By leveraging decentralized ledger systems, blockchain offers enhanced security, transparency, and transaction verification without the need for intermediaries. The finance sector is exploring blockchain-based solutions for payments, remittances, lending, and investments, while healthcare adopts the technology for medical record keeping, supply chain tracking, and data management. Similarly, supply chain management benefits from blockchain's ability to enhance transparency, traceability, and accountability from raw materials to finished products. Other sectors, including real estate, energy, and government, are also investigating blockchain-based solutions to improve efficiency, security, and transparency. Furthermore, smart contracts within the blockchain enable process automation, reducing manual intervention in distribution workflows. AidNexus, a consortium-based blockchain DApp, reimagines the distribution of financial assistance by addressing inefficiencies and opaqueness. Its use of smart contracts ensures the security and directness of money transfers. Its robust digital identity verification and real-time auditability reduce fraud risks and strengthen accountability, thereby presenting a scalable, transparent solution to problems inherent to conventional financial aid systems.
Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees
Authors: Yifei Zhou, Ayush Sekhari, Yuda Song, Wen Sun
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2311.08384
Pdf link: https://arxiv.org/pdf/2311.08384
Abstract Hybrid RL is the setting where an RL agent has access to both offline data and online data by interacting with the real-world environment. In this work, we propose a new hybrid RL algorithm that combines an on-policy actor-critic method with offline data. On-policy methods such as policy gradient and natural policy gradient (NPG) have been shown to be more robust to model misspecification, though they are sometimes less sample efficient than methods that rely on off-policy learning. On the other hand, offline methods that depend on off-policy training often require strong assumptions in theory and are less stable to train in practice. Our new approach integrates a procedure of off-policy training on the offline data into an on-policy NPG framework. We show that our approach, in theory, can obtain a best-of-both-worlds type of result -- it achieves the state-of-the-art theoretical guarantees of offline RL when offline RL-specific assumptions hold, while at the same time maintaining the theoretical guarantees of on-policy NPG regardless of the offline RL assumptions' validity. Experimentally, in challenging rich-observation environments, we show that our approach outperforms a state-of-the-art hybrid RL baseline which only relies on off-policy policy optimization, demonstrating the empirical benefit of combining on-policy and off-policy learning. Our code is publicly available at https://github.com/YifeiZhou02/HNPG.
Iterative Network Pricing for Ridesharing Platforms
Authors: Chenkai Yu, Hongyao Ma
Subjects: Computer Science and Game Theory (cs.GT); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2311.08392
Pdf link: https://arxiv.org/pdf/2311.08392
Abstract Ridesharing platforms match riders and drivers, using dynamic pricing to balance supply and demand. The origin-based "surge pricing", however, does not take into consideration market conditions at trip destinations, leading to inefficient driver flows in space and incentivizing drivers to strategize. In this work, we introduce the Iterative Network Pricing mechanism, which addresses a main challenge in the practical implementation of optimal origin-destination (OD) based prices: the model for rider demand is hard to estimate. Assuming that the platform's surge algorithm clears the market for each origin in real time, our mechanism updates the OD-based price adjustments week-over-week, using only information immediately observable during the same time window in the prior weeks. For stationary market conditions, we prove that our mechanism converges to an outcome that is approximately welfare-optimal. Using data from the City of Chicago, we illustrate (via simulation) the iterative updates under our mechanism for morning rush hours, demonstrating substantial welfare improvements despite significant fluctuations of market conditions from early 2019 through the end of 2020.
Keyword: faster

Size-Aware Hypergraph Motifs
Authors: Jason Niu, Ilya D. Amburg, Sinan G. Aksoy, Ahmet Erdem Sarıyüce
Subjects: Discrete Mathematics (cs.DM); Social and Information Networks (cs.SI); Data Analysis, Statistics and Probability (physics.data-an); Physics and Society (physics.soc-ph)
Arxiv link: https://arxiv.org/abs/2311.07783
Pdf link: https://arxiv.org/pdf/2311.07783
Abstract Complex systems frequently exhibit multi-way, rather than pairwise, interactions. These group interactions cannot be faithfully modeled as collections of pairwise interactions using graphs, and instead require hypergraphs. However, methods that analyze hypergraphs directly, rather than via lossy graph reductions, remain limited. Hypergraph motif mining holds promise in this regard, as motif patterns serve as building blocks for larger group interactions which are inexpressible by graphs. Recent work has focused on categorizing and counting hypergraph motifs based on the existence of nodes in hyperedge intersection regions. Here, we argue that the relative sizes of hyperedge intersections within motifs contain varied and valuable information. We propose a suite of efficient algorithms for finding triplets of hyperedges based on optimizing the sizes of these intersection patterns. This formulation uncovers interesting local patterns of interaction, finding hyperedge triplets that either (1) are the least correlated with each other, (2) have the highest pairwise but not groupwise correlation, or (3) are the most correlated with each other. We formalize this as a combinatorial optimization problem and design efficient algorithms based on filtering hyperedges. Our experimental evaluation shows that the resulting hyperedge triplets yield insightful information on real-world hypergraphs. Our approach is also orders of magnitude faster than a naive baseline implementation.
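The "hyperedge intersection regions" of a triplet are its seven Venn regions. The sketch below computes their sizes and the existence-only pattern that earlier motif definitions rely on, to illustrate how a size-aware view retains more information. It does not implement the paper's optimization objectives or filtering algorithms, and the function names are hypothetical.

```python
def venn_region_sizes(a, b, c):
    """Sizes of the seven Venn regions of three hyperedges (sets of node ids)."""
    a, b, c = set(a), set(b), set(c)
    return {
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
        "ab_only": len((a & b) - c),
        "ac_only": len((a & c) - b),
        "bc_only": len((b & c) - a),
        "abc": len(a & b & c),
    }


def existence_pattern(region_sizes):
    """Existence-only view: which regions are non-empty (sizes discarded)."""
    return tuple(size > 0 for size in region_sizes.values())


small = venn_region_sizes({1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6})
large = venn_region_sizes(set(range(31)), set(range(20, 32)), {30, 31, 32})
print(existence_pattern(small) == existence_pattern(large))  # True
print(small["ab_only"], large["ab_only"])                     # 1 10
```

Both triplets share the same existence pattern and would fall into the same existence-based motif class. Their region sizes, however, differ substantially (for example, 1 versus 10 nodes shared only by the first two hyperedges).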
Predicting the First Response Latency of Maintainers and Contributors in Pull Requests
Authors: SayedHassan Khatoonabadi, Ahmad Abdellatif, Diego Elias Costa, Emad Shihab
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2311.07786
Pdf link: https://arxiv.org/pdf/2311.07786
Abstract The success of a Pull Request (PR) depends on the responsiveness of the maintainers and the contributor during the review process. Being aware of the expected waiting times can lead to better interactions and managed expectations for both the maintainers and the contributor. In this paper, we propose a machine-learning approach to predict the first response latency of the maintainers following the submission of a PR, and the first response latency of the contributor after receiving the first response from the maintainers. We curate a dataset of 20 large and popular open-source projects on GitHub and extract 21 features to characterize projects, contributors, PRs, and review processes. Using these features, we then evaluate seven types of classifiers to identify the best-performing models. We also perform permutation feature importance and SHAP analyses to understand the importance and impact of different features on the predicted response latencies. Averaged across the projects, our best-performing models improve on a no-skill classifier by 33% in AUC-ROC and 58% in AUC-PR for maintainers, and by 42% in AUC-ROC and 95% in AUC-PR for contributors. Our findings indicate that PRs submitted earlier in the week, containing an average or slightly above-average number of commits, and with concise descriptions are more likely to receive faster first responses from the maintainers. Similarly, PRs that received a faster first response from the maintainers, received that response earlier in the week, and contain an average or slightly above-average number of commits tend to receive faster first responses from the contributors. Additionally, contributors with a higher acceptance rate and a history of timely responses in the project are likely to both obtain and provide faster first responses.
Container Resource Allocation versus Performance of Data-intensive Applications on Different Cloud Servers
Authors: Qing Wang, Snigdhaswin Kar, Prabodh Mishra, Caleb Linduff, Ryan Izard, Khayam Anjam, Geddings Barrineau, Junaid Zulfiqar, Kuang-Ching Wang
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2311.07818
Pdf link: https://arxiv.org/pdf/2311.07818
Abstract In recent years, data-intensive applications have been increasingly deployed on cloud systems. Such applications utilize significant compute, memory, and I/O resources to process large volumes of data. Optimizing the performance and cost-efficiency for such applications is a non-trivial problem. The problem becomes even more challenging with the increasing use of containers, which are popular due to their lower operational overheads and faster boot speed at the cost of weaker resource assurances for the hosted applications. In this paper, two containerized data-intensive applications with very different performance objectives and resource needs were studied on cloud servers with Docker containers running on Intel Xeon E5 and AMD EPYC Rome multi-core processors with a range of CPU, memory, and I/O configurations. Primary findings from our experiments include: 1) allocating multiple cores to a compute-intensive application can improve performance, but only if the cores do not contend for the same caches, and the optimal core count depends on the specific workload; 2) allocating more memory to a memory-intensive application than its deterministic data workload requires does not further improve performance; however, 3) having multiple such memory-intensive containers on the same server can lead to cache and memory bus contention, causing significant and volatile performance degradation. The comparative observations on Intel and AMD servers provided insights into trade-offs between larger numbers of distributed chiplets interconnected with higher-speed buses (AMD) and larger numbers of centrally integrated cores and caches with slower buses (Intel). For the two types of applications studied, the more distributed caches and faster data buses benefited the deployment of larger numbers of containers.
Quantum Algorithms for Graph Coloring and other Partitioning, Covering, and Packing Problems
Authors: Serge Gaspers, Jerry Zirui Li
Subjects: Data Structures and Algorithms (cs.DS); Quantum Physics (quant-ph)
Arxiv link: https://arxiv.org/abs/2311.08042
Pdf link: https://arxiv.org/pdf/2311.08042
Abstract Let $U$ be a universe of $n$ elements, let $k$ be a positive integer, and let $F$ be a family of (implicitly defined) subsets of $U$. We consider the problems of partitioning $U$ into $k$ sets from $F$, covering $U$ with $k$ sets from $F$, and packing $k$ non-intersecting sets from $F$ into $U$. Classically, these problems can be solved via inclusion-exclusion in $O(2^n)$ time [BjorklundHK09]. Quantumly, there are faster algorithms for graph coloring with running time $O(1.9140^n)$ [ShimizuM22] and for Set Cover with a small number of sets with running time $O(1.7274^n |F|^{O(1)})$ [AmbainisBIKPV19]. In this paper, we give a quantum speedup for Set Partition, Set Cover, and Set Packing whenever there is a classical enumeration algorithm that lends itself to a quadratic quantum speedup, which, for any subinstance on a subset $X$ of $U$, enumerates at least one member of a $k$-partition, $k$-cover, or $k$-packing (if one exists) restricted to (or projected onto, in the case of $k$-cover) the set $X$ in $O(c^{|X|})$ time with $c<2$. Our bounded-error quantum algorithm runs in $O((2+c)^{n/2})$ time for Set Partition, Set Cover, and Set Packing. When $c \le 1.147899$, our algorithm is slightly faster than $O((2+c)^{n/2})$; when $c$ approaches 1, it matches the running time of [AmbainisBIKPV19] for Set Cover when $|F|$ is subexponential in $n$. For Graph Coloring, we further improve the running time to $O(1.7956^n)$ by leveraging faster algorithms for coloring with a small number of colors to better balance our divide-and-conquer steps. For Domatic Number, we obtain an $O((2-\epsilon)^n)$ running time for some $\epsilon>0$.
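For reference, the classical inclusion-exclusion baseline cited in the abstract ([BjorklundHK09]) can be sketched for graph coloring, where $F$ is the family of independent sets. Let $s(X)$ be the number of independent sets inside a vertex subset $X$, counting the empty set. The number of $k$-tuples of independent sets whose union is the whole vertex set $V$ is then $\sum_{X \subseteq V} (-1)^{|V \setminus X|} s(X)^k$. The graph is $k$-colorable exactly when this count is positive, since any cover by independent sets can be shrunk to a partition. This is a classical sketch running in $O(2^n \mathrm{poly}(n))$ time and $O(2^n)$ memory, not the paper's quantum algorithm. The function name and bitmask representation are illustrative.

```python
def is_k_colorable(n, edges, k):
    """Decide k-colorability of a graph on vertices 0..n-1 by inclusion-exclusion."""
    nbr = [0] * n                      # neighbourhood bitmask of each vertex
    for u, v in edges:
        nbr[u] |= 1 << v
        nbr[v] |= 1 << u
    # s[X] = number of independent sets (including the empty one) contained in X
    s = [0] * (1 << n)
    s[0] = 1
    for X in range(1, 1 << n):
        v = (X & -X).bit_length() - 1  # lowest vertex in X
        without_v = X & ~(1 << v)
        # Independent sets in X either avoid v, or contain v and none of its neighbours.
        s[X] = s[without_v] + s[without_v & ~nbr[v]]
    # Count k-tuples of independent sets whose union is all of V.
    total = 0
    for X in range(1 << n):
        sign = -1 if (n - bin(X).count("1")) % 2 else 1
        total += sign * s[X] ** k
    return total > 0


triangle = [(0, 1), (1, 2), (0, 2)]
print(is_k_colorable(3, triangle, 2), is_k_colorable(3, triangle, 3))  # False True
```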
DynamicSurf: Dynamic Neural RGB-D Surface Reconstruction with an Optimizable Feature Grid
Authors: Mirgahney Mohamed, Lourdes Agapito
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.08159
Pdf link: https://arxiv.org/pdf/2311.08159
Abstract We propose DynamicSurf, a model-free neural implicit surface reconstruction method for high-fidelity 3D modelling of non-rigid surfaces from monocular RGB-D video. To cope with the lack of multi-view cues in monocular sequences of deforming surfaces, one of the most challenging settings for 3D reconstruction, DynamicSurf exploits depth, surface normals, and RGB losses to improve reconstruction fidelity and optimisation time. DynamicSurf learns a neural deformation field that maps a canonical representation of the surface geometry to the current frame. We depart from current neural non-rigid surface reconstruction models by designing the canonical representation as a learned feature grid, which leads to faster and more accurate surface reconstruction than competing approaches that use a single MLP. We demonstrate DynamicSurf on public datasets and show that it can optimize sequences with varying numbers of frames with a $6\times$ speedup over pure MLP-based approaches while achieving results comparable to state-of-the-art methods. The project page is available at https://mirgahney.github.io//DynamicSurf.io/.
REST: Retrieval-Based Speculative Decoding
Authors: Zhenyu He, Zexuan Zhong, Tianle Cai, Jason D Lee, Di He
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.08252
Pdf link: https://arxiv.org/pdf/2311.08252
Abstract We introduce Retrieval-Based Speculative Decoding (REST), a novel algorithm designed to speed up language model generation. The key insight driving the development of REST is the observation that the process of text generation often includes certain common phases and patterns. Unlike previous methods that rely on a draft language model for speculative decoding, REST harnesses the power of retrieval to generate draft tokens. This method draws from the reservoir of existing knowledge, retrieving and employing relevant tokens based on the current context. Its plug-and-play nature allows for seamless integration and acceleration of any language model, all without necessitating additional training. When benchmarked on 7B and 13B language models in a single-batch setting, REST achieves a significant speedup of 1.62X to 2.36X on code or text generation. The code of REST is available at https://github.com/FasterDecoding/REST.
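The retrieve-then-verify loop behind retrieval-based drafting can be illustrated with a toy token list. This sketch is a simplification under stated assumptions. It scans a hypothetical token datastore for the longest matching context suffix and proposes the tokens that follow it. It then keeps the longest draft prefix that agrees with the target model's own greedy tokens. A real implementation would use an index instead of a linear scan and would obtain `target_next_tokens` from a single forward pass of the language model. Both function names are hypothetical.

```python
def retrieve_draft(datastore, context, max_suffix=4, draft_len=3):
    """Propose draft tokens by matching the longest context suffix in the datastore."""
    for length in range(min(max_suffix, len(context)), 0, -1):
        suffix = context[-length:]
        # Stop one position early so at least one continuation token exists.
        for i in range(len(datastore) - length):
            if datastore[i:i + length] == suffix:
                return datastore[i + length:i + length + draft_len]
    return []


def verify_draft(draft, target_next_tokens):
    """Keep the longest draft prefix that matches the target model's greedy tokens."""
    accepted = []
    for proposed, actual in zip(draft, target_next_tokens):
        if proposed != actual:
            break
        accepted.append(proposed)
    return accepted


datastore = "for i in range ( n ) :".split()
context = "x = 0 for i".split()
draft = retrieve_draft(datastore, context)
print(draft)                                        # ['in', 'range', '(']
print(verify_draft(draft, ["in", "range", "len"]))  # ['in', 'range']
```

Here two of the three drafted tokens are accepted in one verification step. When no suffix matches, the draft is empty and decoding falls back to ordinary token-by-token generation.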
Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster
Authors: Hongxuan Zhang, Zhining Liu, Jiaqi Zheng, Chenyi Zhuang, Jinjie Gu, Guihai Chen
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2311.08263
Pdf link: https://arxiv.org/pdf/2311.08263
Abstract In this work, we propose FastCoT, a model-agnostic framework based on parallel decoding without any further training of an auxiliary model or modification to the LLM itself. FastCoT uses a size-varying context window whose size changes with position to conduct parallel decoding and autoregressive decoding simultaneously, thus fully utilizing GPU computation resources. In FastCoT, the parallel decoding part provides the LLM with a quick glance of the future composed of approximate tokens, which could lead to faster answers compared to regular autoregressive decoding used by causal transformers. We also provide an implementation of parallel decoding within the LLM, which supports KV-cache generation and batch processing. Through extensive experiments, we demonstrate that FastCoT saves inference time by nearly 20% with only a negligible performance drop compared to the regular approach. Additionally, we show that the context window size exhibits considerable robustness for different tasks.
Optimally Managing the Impacts of Convergence Tolerance for Distributed Optimal Power Flow
Authors: Rachel Harris, Mohannad Alkhraijah, Daniel K. Molzahn
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2311.08305
Pdf link: https://arxiv.org/pdf/2311.08305
Abstract The future power grid may rely on distributed optimization to determine the set-points for huge numbers of distributed energy resources. There has been significant work on applying distributed algorithms to optimal power flow (OPF) problems, which require separate computing agents to agree on shared boundary variable values. Looser tolerances for the mismatches in these shared variables generally yield faster convergence at the expense of exacerbating constraint violations, but there is little quantitative understanding of how the convergence tolerance affects solution quality. To address this gap, we first quantify how convergence tolerance impacts constraint violations when the distributed OPF generator dispatch is applied to the power system. Using insights from this analysis, we then develop a bound tightening algorithm which guarantees that operating points from distributed OPF algorithms will not result in violations despite the possibility of shared variable mismatches within the convergence tolerance. We also explore how bounding the cumulative shared variable mismatches can prevent unnecessary conservativeness in the bound tightening. The proposed approach enables control of the trade-off between computational speed, which improves as the convergence tolerance increases, and distributed OPF solution cost, which increases with convergence tolerance due to tightened constraints, while ensuring feasibility.
Keyword: mobile

Synchrophasor Data Anomaly Detection on Grid Edge by 5G Communication and Adjacent Compute
Authors: Chuan Qin, Dexin Wang, Kishan Prudhvi Guddanti, Xiaoyuan Fan, Zhangshuan Hou
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2311.07758
Pdf link: https://arxiv.org/pdf/2311.07758
Abstract The fifth-generation mobile communication (5G) technology offers opportunities to enhance the real-time monitoring of grids. The 5G-enabled phasor measurement units (PMUs) feature flexible positioning and cost-effective long-term maintenance without the constraints of fixed wiring. This paper is the first to demonstrate the applicability of 5G in PMU communication; the experiment was carried out at the Verizon non-standalone testbed at the Pacific Northwest National Laboratory (PNNL) Advanced Wireless Communication lab. The performance of the 5G-enabled PMU communication setup is reviewed and discussed in this paper, and a generalized dynamic linear model (GDLM) based real-time synchrophasor data anomaly detection use-case is presented. Finally, the practicability of implementing 5G for wide-area protection strategies is explored and discussed by analyzing the experimental results.
FedOpenHAR: Federated Multi-Task Transfer Learning for Sensor-Based Human Activity Recognition
Authors: Egemen İşgüder, Özlem Durmaz İncel
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.07765
Pdf link: https://arxiv.org/pdf/2311.07765
Abstract Motion sensors integrated into wearable and mobile devices provide valuable information about the device users. Machine learning and, recently, deep learning techniques have been used to characterize sensor data. Mostly, a single task, such as recognition of activities, is targeted, and the data is processed centrally at a server or in a cloud environment. However, the same sensor data can be utilized for multiple tasks, and distributed machine-learning techniques can be used without requiring the data to be transmitted to a centre. This paper explores Federated Transfer Learning in a Multi-Task manner for both sensor-based human activity recognition and device position identification tasks. The models are trained with the OpenHAR framework, which contains ten smaller datasets. The aim is to obtain model(s) applicable to both tasks across different datasets, some of which may include only some of the label types. Multiple experiments are carried out in the Flower federated learning environment using the DeepConvLSTM architecture. Results are presented for federated and centralized versions under different parameters and restrictions. By utilizing transfer learning and training a task-specific and personalized federated model, we obtained accuracy similar to training each client individually and higher than that of a fully centralized approach.
Enabling Decision-Support Systems through Automated Cell Tower Detection
Authors: Natasha Krell, Will Gleave, Daniel Nakada, Justin Downes, Amanda Willet, Matthew Baran
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.07840
Pdf link: https://arxiv.org/pdf/2311.07840
Abstract Cell phone coverage and high-speed service gaps persist in rural areas in sub-Saharan Africa, impacting public access to mobile-based financial, educational, and humanitarian services. Improving maps of telecommunications infrastructure can help inform strategies to eliminate gaps in mobile coverage. Deep neural networks, paired with remote sensing images, can be used for object detection of cell towers and eliminate the need for inefficient and burdensome manual mapping to find objects over large geographic regions. In this study, we demonstrate a partially automated workflow to train an object detection model to locate cell towers using OpenStreetMap (OSM) features and high-resolution Maxar imagery. For model fine-tuning and evaluation, we curated a diverse dataset of over 6,000 unique images of cell towers in 26 countries in eastern, southern, and central Africa using automatically generated annotations from OSM points. Our model achieves an average precision at 50% Intersection over Union (IoU) (AP@50) of 81.2 with good performance across different geographies and out-of-sample testing. Accurate localization of cell towers can yield more accurate cell coverage maps, in turn enabling improved delivery of digital services for decision-support applications.
On the IRS Deployment in Smart Factories Considering Blockage Effects: Collocated or Distributed?
Authors: Yixin Zhang, Saeed R. Khosravirad, Xiaoli Chu, Mikko A. Uusitalo
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2311.07843
Pdf link: https://arxiv.org/pdf/2311.07843
Abstract In this article, we study the collocated and distributed deployment of intelligent reflecting surfaces (IRS) for a fixed total number of IRS elements to support enhanced mobile broadband (eMBB) and ultra-reliable low-latency communication (URLLC) services inside a factory. We build a channel model that incorporates the line-of-sight (LOS) probability and power loss of each transmission path, and propose three metrics, namely, the expected received signal-to-noise ratio (SNR), expected finite-blocklength (FB) capacity, and expected outage probability, where the expectation is taken over the probability distributions of interior blockages and channel fading. The expected received SNR and expected FB capacity for extremely high blockage densities are derived in closed form as functions of the number and height of IRSs and the density, size, and penetration loss of blockages, which are verified by Monte Carlo simulations. Results show that deploying IRSs vertically higher leads to higher expected received SNR and expected FB capacity. By analysing the average/minimum/maximum of the three metrics versus the number of IRSs, we find that for high blockage densities, both eMBB and URLLC services benefit from distributed deployment; and for low blockage densities, URLLC services benefit from distributed deployment while eMBB services see limited difference between collocated and distributed deployment.
Cost-Efficient Computation Offloading and Service Chain Caching in LEO Satellite Networks
Authors: Yantong Wang, Chuanfen Feng, Jiande Sun
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2311.07872
Pdf link: https://arxiv.org/pdf/2311.07872
Abstract The ever-increasing demand for ubiquitous, continuous, and high-quality services poses a great challenge to the traditional terrestrial network. To mitigate this problem, the mobile-edge-computing-enhanced low earth orbit (LEO) satellite network, which provides both communication connectivity and on-board processing services, has emerged as an effective method. The main issues in LEO satellites include finding the optimal locations to host network functions (NFs) and then making offloading decisions. In this article, we jointly consider the problem of service chain caching and computation offloading to minimize the overall cost, which consists of task latency and energy consumption. In particular, the collaboration among satellites, the network resource limitations, and the specific operation order of NFs in service chains are taken into account. Then, the problem is formulated and linearized as an integer linear programming model. Moreover, to accelerate the solution, we provide a greedy algorithm with cubic time complexity. Numerical investigations demonstrate the effectiveness of the proposed scheme, which can reduce the overall cost by around 20% compared to the nominal case where NFs are served in data centers.
On the View-and-Channel Aggregation Gain in Integrated Sensing and Edge AI
Authors: Xu Chen, Khaled B. Letaief, Kaibin Huang
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2311.07986
Pdf link: https://arxiv.org/pdf/2311.07986
Abstract Sensing and edge artificial intelligence (AI) are two key features of the sixth-generation (6G) mobile networks. Their natural integration, termed integrated sensing and edge AI (ISEA), is envisioned to automate wide-ranging Internet-of-Things (IoT) applications. To achieve a high sensing accuracy, multi-view features are uploaded to an edge server for aggregation and inference using an AI model. The view aggregation is realized efficiently using over-the-air computing (AirComp), which also aggregates channels to suppress channel noise. At its nascent stage, ISEA still lacks a characterization of the fundamental performance gains from view-and-channel aggregation, which motivates this work. Our framework leverages a well-established distribution model of multi-view sensing data where the classic Gaussian-mixture model is modified by adding subspace matrices to represent individual sensor observation perspectives. Based on the model, we study the End-to-End (E2E) sensing (inference) uncertainty, a popular measure of inference accuracy, of the said ISEA system by a novel approach involving designing a scaling-tight uncertainty surrogate function, global discriminant gain, distribution of receive Signal-to-Noise Ratio (SNR), and channel induced discriminant loss. We prove that the E2E sensing uncertainty diminishes at an exponential rate as the number of views/sensors grows, where the rate is proportional to global discriminant gain. Given channel distortion, we further show that the exponential scaling remains with a reduced decay rate related to the channel induced discriminant loss. Furthermore, we benchmark AirComp against equally fast, traditional analog orthogonal access, which reveals a sensing-accuracy crossing point between the schemes, leading to the proposal of adaptive access-mode switching. Last, the insights from our framework are validated by experiments using a real-world dataset.
On The Evaluation of Collision Probability along a Path
Authors: Lorenzo Paiola, Giorgio Grioli, Antonio Bicchi
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2311.08204
Pdf link: https://arxiv.org/pdf/2311.08204
Abstract Characterizing the risk of operations is a fundamental requirement in robotics, and a crucial ingredient of safe planning. The problem is multifaceted, with multiple definitions arising in the vast recent literature fitting different application scenarios and leading to different computational approaches. A basic element shared by most frameworks is the definition and evaluation of the probability of collision for a mobile object in an environment with obstacles. We observe that, even in basic cases, different interpretations are possible. This paper proposes an index we call Risk Density, which offers a theoretical link between conceptually distant assumptions about the interplay of single collision events along a continuous path. We show how this index can be used to approximate the collision probability in the case where the robot evolves along a nominal continuous curve from random initial conditions. Under this hypothesis, the proposed approximation outperforms some well-established methods in either accuracy or computational cost.
Keyword: pruning

EPIM: Efficient Processing-In-Memory Accelerators based on Epitome
Authors: Chenyu Wang, Zhen Dong, Daquan Zhou, Zhenhua Zhu, Yu Wang, Jiashi Feng, Kurt Keutzer
Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.07620
Pdf link: https://arxiv.org/pdf/2311.07620
Abstract The exploration of Processing-In-Memory (PIM) accelerators has garnered significant attention within the research community. However, deploying large-scale neural networks on PIM accelerators is challenging due to constrained on-chip memory capacity. To tackle this issue, current works explore model compression algorithms to reduce the size of Convolutional Neural Networks (CNNs). Most of these algorithms either aim to represent neural operators with reduced-size parameters (e.g., quantization) or search for the best combinations of neural operators (e.g., neural architecture search). Designing neural operators to align with PIM accelerators' specifications is an area that warrants further study. In this paper, we introduce the Epitome, a lightweight neural operator offering convolution-like functionality, to craft memory-efficient CNN operators for PIM accelerators (EPIM). On the software side, we evaluate epitomes' latency and energy on PIM accelerators and introduce a PIM-aware layer-wise design method to enhance their hardware efficiency. We apply epitome-aware quantization to further reduce the size of epitomes. On the hardware side, we modify the datapath of current PIM accelerators to accommodate epitomes and implement a feature map reuse technique to reduce computation cost. Experimental results reveal that our 3-bit quantized EPIM-ResNet50 attains 71.59% top-1 accuracy on ImageNet while reducing crossbar area by 30.65 times. EPIM surpasses state-of-the-art pruning methods on PIM.
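The abstract does not specify how epitome-aware quantization works. For readers unfamiliar with low-bit weights, the sketch below shows a generic per-tensor symmetric uniform quantizer at 3 bits; the restricted signed range [-3, 3] and the max-abs scale are common choices assumed here, not EPIM's scheme.

```python
def quantize_symmetric(weights, bits=3):
    """Per-tensor symmetric uniform quantization to integer codes in [-(2**(bits-1) - 1), 2**(bits-1) - 1]."""
    qmax = 2 ** (bits - 1) - 1                    # 3 for 3-bit codes
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Map integer codes back to approximate real-valued weights."""
    return [c * scale for c in codes]

w = [0.9, -0.4, 0.1, -0.9, 0.35]
codes, scale = quantize_symmetric(w)
print(codes)                     # [3, -1, 0, -3, 1]
print(dequantize(codes, scale))  # approximately [0.9, -0.3, 0.0, -0.9, 0.3]
```

Storing 3-bit codes instead of 32-bit floats cuts weight storage by a factor of 32/3 ≈ 10.7, ignoring the single per-tensor scale.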
Activity Sparsity Complements Weight Sparsity for Efficient RNN Inference
Authors: Rishav Mukherji, Mark Schöne, Khaleelulla Khan Nazeer, Christian Mayr, Anand Subramoney
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.07625
Pdf link: https://arxiv.org/pdf/2311.07625
Abstract Artificial neural networks open up unprecedented machine learning capabilities at the cost of ever-growing computational requirements. Sparsifying the parameters, often achieved through weight pruning, has been identified as a powerful technique to compress the number of model parameters and reduce the computational operations of neural networks. Yet, sparse activations, while omnipresent in both biological neural networks and deep learning systems, have not been fully utilized as a compression technique in deep learning. Moreover, the interaction between sparse activations and weight pruning is not fully understood. In this work, we demonstrate that activity sparsity can compose multiplicatively with parameter sparsity in a recurrent neural network model based on the GRU that is designed to be activity sparse. We achieve up to $20\times$ reduction of computation while maintaining perplexities below $60$ on the Penn Treebank language modeling task. This magnitude of reduction has not been achieved previously with solely sparsely connected LSTMs, and the language modeling performance of our model has not been achieved previously with any sparsely activated recurrent neural networks or spiking neural networks. Neuromorphic computing devices are especially good at taking advantage of dynamic activity sparsity, and our results provide strong evidence that making deep learning models activity sparse and porting them to neuromorphic devices can be a viable strategy that does not compromise on task performance. Our results also drive further convergence of methods from deep learning and neuromorphic computing for efficient machine learning.
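What "compose multiplicatively" means can be seen by counting multiply-accumulates (MACs) in a matrix-vector product that skips both zero weights and zero activations. The masks below are hypothetical and chosen for a clean count; this is not the paper's GRU model.

```python
def count_macs(W, x):
    """Multiply-accumulates in y = W @ x when zero weights AND zero activations are skipped."""
    active = [j for j, xj in enumerate(x) if xj != 0.0]  # only nonzero activations are propagated
    return sum(1 for row in W for j in active if row[j] != 0.0)

n = 20
# Hypothetical weight mask keeping 1 of every 4 entries (75% weight sparsity).
W = [[1.0 if (i + j) % 4 == 0 else 0.0 for j in range(n)] for i in range(n)]
# Hypothetical activation vector with 1 of every 5 entries nonzero (80% activity sparsity).
x = [1.0 if j % 5 == 0 else 0.0 for j in range(n)]

dense_macs = n * n
sparse_macs = count_macs(W, x)
print(dense_macs, sparse_macs, dense_macs / sparse_macs)  # 400 20 20.0
```

A 4× saving from weights times a 5× saving from activations gives 20×. The product is exact here because every active column keeps the same fraction of its weights; with correlated masks the combined saving can differ.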
Lite it fly: An All-Deformable-Butterfly Network
Authors: Rui Lin, Jason Chun Lok Li, Jiajun Zhou, Binxiao Huang, Jie Ran, Ngai Wong
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.08125
Pdf link: https://arxiv.org/pdf/2311.08125
Abstract Most deep neural networks (DNNs) consist fundamentally of convolutional and/or fully connected layers, wherein the linear transform can be cast as the product between a filter matrix and a data matrix obtained by arranging feature tensors into columns. The recently proposed deformable butterfly (DeBut) decomposes the filter matrix into generalized, butterfly-like factors, thus achieving network compression orthogonal to the traditional approaches of pruning or low-rank decomposition. This work reveals an intimate link between DeBut and a systematic hierarchy of depthwise and pointwise convolutions, which explains the empirically good performance of DeBut layers. By developing an automated DeBut chain generator, we show for the first time the viability of homogenizing a DNN into all DeBut layers, thus achieving extreme sparsity and compression. Various examples and hardware benchmarks verify the advantages of All-DeBut networks. In particular, we show it is possible to compress a PointNet to fewer than 5% of its parameters with less than a 5% accuracy drop, a record not achievable by other compression schemes.
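To make "butterfly-like factors" concrete, the sketch below builds the classic (non-deformable) butterfly factorization that DeBut generalizes: an n × n matrix with n = 2^k written as a product of k sparse factors, each row holding 2 nonzeros that mix indices differing in one bit. The random values and n = 8 are illustrative assumptions; DeBut's generalized factor shapes are not reproduced here.

```python
import math
import random

def butterfly_factor(n, stage, rng):
    """Sparse butterfly factor: row r mixes index r with r XOR 2**stage (2 nonzeros per row)."""
    h = 1 << stage
    F = [[0.0] * n for _ in range(n)]
    for r in range(n):
        partner = r ^ h                      # flip bit `stage` of the row index
        F[r][r] = rng.uniform(-1, 1)
        F[r][partner] = rng.uniform(-1, 1)
    return F

def matmul(A, B):
    """Plain dense matrix product for small list-of-lists matrices."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

n = 8
k = int(math.log2(n))
rng = random.Random(0)
factors = [butterfly_factor(n, s, rng) for s in range(k)]

product = factors[0]
for F in factors[1:]:
    product = matmul(product, F)

params = sum(1 for F in factors for row in F for v in row if v != 0.0)
nonzeros = sum(1 for row in product for v in row if v != 0.0)
print(params, n * n, nonzeros)  # 48 64 64
```

The factors store 2n·log2(n) values yet their product is dense, because every input index reaches every output index through exactly one path of bit flips. For n = 1024 this is 2 · 1024 · 10 = 20,480 parameters instead of 1,048,576.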
Keyword: diffusion

Finetuning Text-to-Image Diffusion Models for Fairness
Authors: Xudong Shen, Chao Du, Tianyu Pang, Min Lin, Yongkang Wong, Mohan Kankanhalli
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2311.07604
Pdf link: https://arxiv.org/pdf/2311.07604
Abstract The rapid adoption of text-to-image (T2I) diffusion models in society underscores an urgent need to address their biases. Without interventions, these biases could propagate a distorted worldview and limit opportunities for minority groups. In this work, we frame fairness as a distributional alignment problem. Our solution consists of two main technical contributions: (1) a distributional alignment loss that steers specific characteristics of the generated images towards a user-defined target distribution, and (2) biased direct finetuning of the diffusion model's sampling process, which leverages a biased gradient to more effectively optimize losses defined on the generated images. Empirically, our method markedly reduces gender, racial, and intersectional biases for occupational prompts. Gender bias is significantly reduced even when finetuning just five soft tokens. Crucially, our method supports diverse perspectives of fairness beyond absolute equality, which is demonstrated by controlling age to a $75\%$ young and $25\%$ old distribution while simultaneously debiasing gender and race. Finally, our method is scalable: it can debias multiple concepts at once by simply including these prompts in the finetuning data. We hope our work facilitates the social alignment of T2I generative AI. We will share code and various debiased diffusion model adaptors.
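The abstract does not give the exact form of the distributional alignment loss. As one simple illustration, and explicitly not the paper's loss, a batch-level objective can compare the averaged attribute probabilities from a hypothetical attribute classifier with the user-defined target (here the 75% young / 25% old example) using KL divergence.

```python
import math

def batch_attribute_distribution(probs):
    """Average per-image class probabilities (from a hypothetical attribute classifier) over a batch."""
    k = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(k)]

def kl_to_target(target, predicted, eps=1e-12):
    """KL(target || predicted): zero exactly when the batch matches the target distribution."""
    return sum(t * math.log(t / max(p, eps)) for t, p in zip(target, predicted) if t > 0)

target = [0.75, 0.25]  # [young, old], the target distribution quoted in the abstract
# Hypothetical classifier outputs for four generated images that all skew young.
batch = [[0.9, 0.1], [0.8, 0.2], [0.95, 0.05], [0.85, 0.15]]
predicted = batch_attribute_distribution(batch)
print(predicted, round(kl_to_target(target, predicted), 4))
```

Because the penalty is defined on the batch average rather than on each image, it can target any mixture, not only a 50/50 split.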
CLAMP: A Contrastive Language And Molecule Pre-training Network
Authors: Neel Redkar
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2311.07617
Pdf link: https://arxiv.org/pdf/2311.07617
Abstract This paper highlights a shift in how to approach material generation. Instead of material-to-material, we propose a language-to-material generation architecture that utilizes millions of untapped data points. Using a web scraper to collect crystal-text pairs from open-source research papers, a contrastive model can be trained using a convolutional graph neural network encoder and a language encoder. This would allow unsupervised zero-shot classification, which can be trained by taking advantage of linguistic structure. Without any task-specific training data, ~82% accuracy was achieved, and ~75% accuracy was achieved for photocatalyst prediction with an extremely small dataset. This novel network could ideally be cross-applied to any reaction that can be described via text, opening completely new ways to think about 3D chemical framework generation. In the full experiment, diffusion models would likely be incorporated to fully exploit the latent space.
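A common way to train such a contrastive model is a CLIP-style symmetric InfoNCE loss over paired embeddings, with zero-shot classification done by cosine similarity against text embeddings of candidate labels. The sketch below assumes that setup; the toy vectors stand in for outputs of the graph and language encoders, and the temperature of 0.1 is an assumption rather than a value from the paper.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_matrix(A, B):
    A, B = [normalize(a) for a in A], [normalize(b) for b in B]
    return [[sum(x * y for x, y in zip(a, b)) for b in B] for a in A]

def log_softmax(row):
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def contrastive_loss(crystal_emb, text_emb, temperature=0.1):
    """Symmetric InfoNCE: crystal i should match text i, and text i should match crystal i."""
    S = [[s / temperature for s in row] for row in cosine_matrix(crystal_emb, text_emb)]
    n = len(S)
    loss_c2t = -sum(log_softmax(S[i])[i] for i in range(n)) / n
    cols = [[S[i][j] for i in range(n)] for j in range(n)]
    loss_t2c = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (loss_c2t + loss_t2c) / 2

def zero_shot_classify(crystal_vec, label_text_embs):
    """Pick the label whose text embedding is most cosine-similar to the crystal embedding."""
    sims = cosine_matrix([crystal_vec], label_text_embs)[0]
    return max(range(len(sims)), key=sims.__getitem__)

crystals = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
texts_aligned = [[0.9, 0.0, 0.1], [0.1, 1.0, 0.0], [0.0, 0.2, 1.0]]
texts_shuffled = [texts_aligned[1], texts_aligned[2], texts_aligned[0]]
print(contrastive_loss(crystals, texts_aligned), contrastive_loss(crystals, texts_shuffled))
print(zero_shot_classify([0.2, 0.9, 0.1], texts_aligned))  # 1
```

Correctly paired embeddings give a much lower loss than mismatched pairs, and no label-specific training is needed at inference time, which is what makes the zero-shot setting possible.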
A Consistent Diffusion-Based Algorithm for Semi-Supervised Graph Learning
Authors: Thomas Bonald (IP Paris), Nathan de Lara (IP Paris)
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2311.07627
Pdf link: https://arxiv.org/pdf/2311.07627
Abstract The task of semi-supervised classification aims at assigning labels to all nodes of a graph based on the labels known for a few nodes, called the seeds. One of the most popular algorithms relies on the principle of heat diffusion, where the labels of the seeds are spread by thermoconductance and the temperature of each node at equilibrium is used as a score function for each label. In this paper, we prove that this algorithm is not consistent unless the temperatures of the nodes at equilibrium are centered before scoring. This crucial step not only makes the algorithm provably consistent on a block model but also brings significant performance gains on real graphs.
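The algorithm described above can be sketched as follows: for each label, hold that label's seeds at temperature 1 and all other seeds at 0, iterate neighbour averaging on the remaining nodes until equilibrium, and optionally center each label's temperatures (here by subtracting their mean over all nodes, which is our reading of "centered") before taking the argmax. The toy graph is hypothetical and deliberately has unbalanced seeds.

```python
def heat_diffusion(adj, seeds, n_iter=500):
    """Equilibrium temperatures: seeds stay fixed (Dirichlet boundary) and every other node is
    repeatedly set to the mean temperature of its neighbours (Jacobi iteration)."""
    T = {v: seeds.get(v, 0.0) for v in adj}
    for _ in range(n_iter):
        T = {v: (seeds[v] if v in seeds else sum(T[w] for w in adj[v]) / len(adj[v])) for v in adj}
    return T

def classify(adj, seed_labels, labels, center=True):
    """One diffusion per label (its seeds at 1, other seeds at 0); optionally center before argmax."""
    scores = {}
    for lab in labels:
        seeds = {v: 1.0 if seed_lab == lab else 0.0 for v, seed_lab in seed_labels.items()}
        T = heat_diffusion(adj, seeds)
        mean = sum(T.values()) / len(T) if center else 0.0
        scores[lab] = {v: T[v] - mean for v in adj}
    return {v: max(labels, key=lambda lab: scores[lab][v]) for v in adj}

# Hypothetical toy graph: community A (1 seed) and community B (8 seeds), both cliques.
A = ["a", "u1", "u2", "u3", "u4"]
B = [f"b{i}" for i in range(8)]
edges = [(x, y) for grp in (A, B) for i, x in enumerate(grp) for y in grp[i + 1:]]
for k, u in enumerate(A[1:]):
    edges += [(u, B[2 * k]), (u, B[2 * k + 1])]  # each u_i: 4 neighbours in A, 2 in B
adj = {v: [] for v in A + B}
for x, y in edges:
    adj[x].append(y)
    adj[y].append(x)
seed_labels = {"a": "A", **{b: "B" for b in B}}

raw = classify(adj, seed_labels, ["A", "B"], center=False)
centered = classify(adj, seed_labels, ["A", "B"], center=True)
print([raw[u] for u in A[1:]], [centered[u] for u in A[1:]])  # ['B', 'B', 'B', 'B'] ['A', 'A', 'A', 'A']
```

With one A seed against eight B seeds, the raw B temperatures are high everywhere, so each u_i is pulled into B despite having most of its neighbours in A; centering removes that per-label offset. This only illustrates the effect on a toy graph; the paper's consistency result concerns a block model.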
Distributed pressure matching strategy using diffusion adaptation
Authors: Mengfei Zhang, Junqing Zhang, Jie Chen, Cédric Richard
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Applied Physics (physics.app-ph)
Arxiv link: https://arxiv.org/abs/2311.07729
Pdf link: https://arxiv.org/pdf/2311.07729
Abstract Personal sound zone (PSZ) systems, which aim to create listening (bright) and silent (dark) zones in neighboring regions of space, are often based on time-varying acoustics. Conventional adaptive methods for handling PSZ tasks require collecting and processing the acoustic transfer functions (ATFs) between all matching microphones and all loudspeakers in a centralized manner, resulting in high computational complexity and costly accuracy requirements. This paper presents a distributed pressure-matching (PM) method relying on diffusion adaptation (DPM-D) that spreads the computational load among nodes to overcome these issues. The global PM problem is defined as a sum of local costs, and the diffusion adaptation approach is then used to create a distributed solution that requires only local information exchange. Simulations over multiple frequency bins and a computational complexity analysis are conducted to evaluate the properties of the algorithm and to compare it with centralized counterparts.
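Diffusion adaptation in its common adapt-then-combine (ATC) form works like this: each node takes a local LMS gradient step on its own cost (adapt), then averages the intermediate estimates of its neighbours (combine). The sketch below applies ATC diffusion LMS to a toy linear regression whose global cost is the sum of the nodes' local mean-square errors. The ring topology, uniform combination weights, step size and noise level are assumptions, and the pressure-matching cost of DPM-D is not modelled.

```python
import random

def atc_diffusion_lms(neighbors, w_true, mu=0.05, n_iter=3000, noise=0.01, seed=0):
    """Adapt-then-combine diffusion LMS; node k only ever uses its own data and its neighbours' estimates."""
    rng = random.Random(seed)
    m = len(w_true)
    nodes = list(neighbors)
    w = {k: [0.0] * m for k in nodes}
    for _ in range(n_iter):
        # Adapt: each node takes one LMS step using only its own fresh measurement (u, d).
        psi = {}
        for k in nodes:
            u = [rng.gauss(0, 1) for _ in range(m)]
            d = sum(ui * wi for ui, wi in zip(u, w_true)) + rng.gauss(0, noise)
            err = d - sum(ui * wi for ui, wi in zip(u, w[k]))
            psi[k] = [wi + mu * err * ui for wi, ui in zip(w[k], u)]
        # Combine: uniform average over the closed neighbourhood (node itself plus its neighbours).
        for k in nodes:
            group = [k] + neighbors[k]
            w[k] = [sum(psi[l][i] for l in group) / len(group) for i in range(m)]
    return w

ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
w_true = [0.5, -1.0, 2.0]
est = atc_diffusion_lms(ring, w_true)
for k, wk in est.items():
    print(k, [round(x, 3) for x in wk])
```

Every node converges close to the global solution even though no node ever sees another node's raw measurements, which is the property that lets such schemes avoid centralized collection of all transfer functions.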
One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion
Authors: Minghua Liu, Ruoxi Shi, Linghao Chen, Zhuoyang Zhang, Chao Xu, Xinyue Wei, Hansheng Chen, Chong Zeng, Jiayuan Gu, Hao Su
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2311.07885
Pdf link: https://arxiv.org/pdf/2311.07885
Abstract Recent advancements in open-world 3D object generation have been remarkable, with image-to-3D methods offering superior fine-grained control over their text-to-3D counterparts. However, most existing models fall short in simultaneously providing rapid generation speeds and high fidelity to input images - two features essential for practical applications. In this paper, we present One-2-3-45++, an innovative method that transforms a single image into a detailed 3D textured mesh in approximately one minute. Our approach aims to fully harness the extensive knowledge embedded in 2D diffusion models and priors from valuable yet limited 3D data. This is achieved by initially finetuning a 2D diffusion model for consistent multi-view image generation, followed by elevating these images to 3D with the aid of multi-view conditioned 3D native diffusion models. Extensive experimental evaluations demonstrate that our method can produce high-quality, diverse 3D assets that closely mirror the original input image. Our project webpage: https://sudo-ai-3d.github.io/One2345plus_page.
Brain-Driven Representation Learning Based on Diffusion Model
Authors: Soowon Kim, Seo-Hyun Lee, Young-Eun Lee, Ji-Won Lee, Ji-Ha Park, Seong-Whan Lee
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.07925
Pdf link: https://arxiv.org/pdf/2311.07925
Abstract Interpreting EEG signals linked to spoken language presents a complex challenge, given the data's intricate temporal and spatial attributes, as well as the various noise factors. Denoising diffusion probabilistic models (DDPMs), which have recently gained prominence in diverse areas for their capabilities in representation learning, are explored in our research as a means to address this issue. Using DDPMs in conjunction with a conditional autoencoder, our new approach considerably outperforms traditional machine learning algorithms and established baseline models in accuracy. Our results highlight the potential of DDPMs as a sophisticated computational method for the analysis of speech-related EEG signals. This could lead to significant advances in brain-computer interfaces tailored for spoken communication.
Keyword: adaptive

Application of a Dense Fusion Attention Network in Fault Diagnosis of Centrifugal Fan
Authors: Ruijun Wang, Yuan Liu, Zhixia Fan, Xiaogang Xu, Huijie Wang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.07614
Pdf link: https://arxiv.org/pdf/2311.07614
Abstract Although deep learning recognition models have been widely used in the condition monitoring of rotating machinery, it is still a challenge to understand the correspondence between the structure and function of the model and the diagnosis process. Therefore, this paper discusses embedding distributed attention modules into dense connections instead of traditional dense cascading operations. This not only decouples the influence of space and channel on the adaptive recalibration of fault-feature weights, but also forms a fusion attention function. The proposed dense fusion focuses on visualizing the network's diagnosis process, which increases the interpretability of model diagnosis. The paper also addresses how to continuously and effectively integrate different functions to enhance both fault-feature extraction and noise resistance. Centrifugal fan fault data are used to verify this network. Experimental results show that the network has stronger diagnostic performance than other advanced fault diagnostic models.
Distributed pressure matching strategy using diffusion adaptation
Authors: Mengfei Zhang, Junqing Zhang, Jie Chen, Cédric Richard
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Applied Physics (physics.app-ph)
Arxiv link: https://arxiv.org/abs/2311.07729
Pdf link: https://arxiv.org/pdf/2311.07729
Abstract Personal sound zone (PSZ) systems, which aim to create listening (bright) and silent (dark) zones in neighboring regions of space, are often based on time-varying acoustics. Conventional adaptive methods for handling PSZ tasks require collecting and processing the acoustic transfer functions (ATFs) between all matching microphones and all loudspeakers in a centralized manner, resulting in high computational complexity and costly accuracy requirements. This paper presents a distributed pressure-matching (PM) method relying on diffusion adaptation (DPM-D) that spreads the computational load among nodes to overcome these issues. The global PM problem is defined as a sum of local costs, and the diffusion adaptation approach is then used to create a distributed solution that requires only local information exchange. Simulations over multiple frequency bins and a computational complexity analysis are conducted to evaluate the properties of the algorithm and to compare it with centralized counterparts.
On the View-and-Channel Aggregation Gain in Integrated Sensing and Edge AI
Authors: Xu Chen, Khaled B. Letaief, Kaibin Huang
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2311.07986
Pdf link: https://arxiv.org/pdf/2311.07986
Abstract Sensing and edge artificial intelligence (AI) are two key features of sixth-generation (6G) mobile networks. Their natural integration, termed integrated sensing and edge AI (ISEA), is envisioned to automate wide-ranging Internet-of-Things (IoT) applications. To achieve high sensing accuracy, multi-view features are uploaded to an edge server for aggregation and inference using an AI model. The view aggregation is realized efficiently using over-the-air computing (AirComp), which also aggregates channels to suppress channel noise. At its nascent stage, ISEA still lacks a characterization of the fundamental performance gains from view-and-channel aggregation, which motivates this work. Our framework leverages a well-established distribution model of multi-view sensing data in which the classic Gaussian-mixture model is modified by adding subspace matrices to represent individual sensor observation perspectives. Based on this model, we study the end-to-end (E2E) sensing (inference) uncertainty, a popular measure of inference accuracy, of the ISEA system using a novel approach that involves designing a scaling-tight uncertainty surrogate function, a global discriminant gain, the distribution of the received signal-to-noise ratio (SNR), and a channel-induced discriminant loss. We prove that the E2E sensing uncertainty diminishes at an exponential rate as the number of views/sensors grows, where the rate is proportional to the global discriminant gain. Given channel distortion, we further show that the exponential scaling remains, with a reduced decay rate related to the channel-induced discriminant loss. Furthermore, we benchmark AirComp against equally fast, traditional analog orthogonal access, which reveals a sensing-accuracy crossing point between the schemes and leads to the proposal of adaptive access-mode switching. Last, the insights from our framework are validated by experiments using a real-world dataset.
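A minimal sketch of the over-the-air aggregation step: each of K sensors pre-scales its analog feature by the inverse of its channel gain, the signals superpose in the air, receiver noise is added once, and the server divides the received sum by K. Assumed here: real-valued flat-fading gains known at the transmitters and bounded away from zero, perfect synchronization, and no transmit-power limit. Under these assumptions the residual error variance is σ²/K², which shows the channel-noise suppression; it does not reproduce the paper's exponential uncertainty analysis.

```python
import random

def aircomp_average(features, gains, noise_std, rng):
    """One AirComp round: sensor k sends x_k / h_k, the channel multiplies by h_k, the signals
    superpose, receiver noise is added once, and the server divides the sum by K."""
    K = len(features)
    dim = len(features[0])
    received = [sum(gains[k] * (features[k][i] / gains[k]) for k in range(K)) + rng.gauss(0, noise_std)
                for i in range(dim)]
    return [r / K for r in received]

def simulate(K, trials=2000, noise_std=0.1, seed=0):
    """Mean squared error between the AirComp estimate and the exact average of the sent features."""
    rng = random.Random(seed)
    err2 = 0.0
    for _ in range(trials):
        feats = [[rng.gauss(0, 1)] for _ in range(K)]      # one scalar feature per sensor
        gains = [rng.uniform(0.5, 1.5) for _ in range(K)]  # assumed known, bounded-away-from-zero gains
        est = aircomp_average(feats, gains, noise_std, rng)
        exact = sum(f[0] for f in feats) / K
        err2 += (est[0] - exact) ** 2
    return err2 / trials

for K in (1, 4, 16):
    print(K, simulate(K))
```

The printed errors should sit near 0.01, 0.01/16 and 0.01/256, because all K transmissions share a single noise realization at the receiver.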
Content-Adaptive Variable Framerate Encoding Scheme for Green Live Streaming
Authors: Vignesh V Menon, Samira Afzal, Prajit T Rajendran, Klaus Schoeffmann, Radu Prodan, Christian Timmerer
Subjects: Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2311.08074
Pdf link: https://arxiv.org/pdf/2311.08074
Abstract Adaptive live video streaming applications use a fixed, predefined bitrate-ladder configuration with a constant framerate and encoding preset throughout a session. However, selecting optimized framerates and presets for every bitrate-ladder representation can enhance perceptual quality, improve computational resource allocation, and thus improve streaming energy efficiency. In particular, low framerates for low-bitrate representations reduce compression artifacts and decrease encoding energy consumption. In addition, an optimized preset may lead to improved compression efficiency. In this light, this paper proposes a Content-adaptive Variable Framerate (CVFR) encoding scheme, which offers two modes of operation: ecological (ECO) and high-quality (HQ). CVFR-ECO optimizes for the highest encoding energy savings by predicting the optimized framerate for each representation in the bitrate ladder. CVFR-HQ goes further by predicting each representation's optimized framerate-encoding preset pair, using low-complexity discrete cosine transform (DCT) energy-based spatial and temporal features, for compression efficiency and sustainable storage. We demonstrate the advantage of CVFR using the open-source x264 video encoder. The results show that CVFR-ECO yields average PSNR and VMAF increases of 0.02 dB and 2.50 points, respectively, at the same bitrate, compared to encoding with the fastest preset at the highest framerate. CVFR-ECO also reduces average encoding and storage energy consumption by 34.54% and 76.24%, respectively, considering a just noticeable difference (JND) of six VMAF points. In comparison, CVFR-HQ yields average increases in PSNR and VMAF of 2.43 dB and 10.14 points, respectively, at the same bitrate. Finally, CVFR-HQ reduces storage energy consumption by 83.18% on average, considering a JND of six VMAF points.
High-order accurate well-balanced energy stable finite difference schemes for multi-layer shallow water equations on fixed and adaptive moving meshes
Authors: Zhihao Zhang, Huazhong Tang, Junming Duan
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2311.08124
Pdf link: https://arxiv.org/pdf/2311.08124
Abstract This paper develops high-order well-balanced (WB) energy stable (ES) finite difference schemes for multi-layer (the number of layers $M\geqslant 2$) shallow water equations (SWEs) on both fixed and adaptive moving meshes, extending our previous works [20,51]. To obtain an energy inequality, the convexity of an energy function for an arbitrary $M$ is proved by finding recurrence relations of the leading principal minors or the quadratic forms of the Hessian matrix of the energy function with respect to the conservative variables, which is more involved than the single-layer case due to the coupling between the layers in the energy function. An important ingredient in developing high-order semi-discrete ES schemes is the construction of a two-point energy conservative (EC) numerical flux. In pursuit of the WB property, a sufficient condition for such EC fluxes is given with compatible discretizations of the source terms similar to the single-layer case. It can be decoupled into $M$ identities individually for each layer, making it convenient to construct a two-point EC flux for the multi-layer system. To suppress possible oscillations near discontinuities, WENO-based dissipation terms are added to the high-order WB EC fluxes, which gives semi-discrete high-order WB ES schemes. Fully-discrete schemes are obtained by employing high-order explicit SSP-RK methods and proved to preserve the lake at rest. The schemes are further extended to moving meshes based on a modified energy function for a reformulated system, relying on the techniques proposed in [51]. Numerical experiments are conducted for some two- and three-layer cases to validate the high-order accuracy, WB and ES properties, and high efficiency of the schemes, with a suitable amount of dissipation chosen by estimating the maximal wave speed due to the lack of an analytical expression for the eigenstructure of the multi-layer system.
Unprecedented reach and recruitment paths for hate and extremism
Authors: Richard Sear, Neil F. Johnson
Subjects: Social and Information Networks (cs.SI); Human-Computer Interaction (cs.HC); Adaptation and Self-Organizing Systems (nlin.AO); Physics and Society (physics.soc-ph)
Arxiv link: https://arxiv.org/abs/2311.08258
Pdf link: https://arxiv.org/pdf/2311.08258
Abstract Analyzing a unique real-time dataset from across 26 social media platforms, we show why the hate-extremism ecosystem now has unprecedented reach and recruitment paths online; why it is now able to exert instant and massive global mainstream influence, e.g. following the October 7 Hamas attack; why it will become increasingly robust in 2024 and beyond; why recent E.U. laws fall short because the effect of many smaller, lesser-known platforms outstrips larger ones like Twitter; and why law enforcement should expect increasingly hard-to-understand paths ahead of offline mass attacks. This new picture of online hate and extremism challenges current notions of a niche activity at the 'fringe' of the Internet driven by specific news sources. But it also suggests a new opportunity for system-wide control akin to adaptive vs. extinction treatments for cancer.
Noise-Resilient Group Testing with Order-Optimal Tests and Fast-and-Reliable Decoding
Authors: Venkatesan Guruswami, Hsin-Po Wang
Subjects: Information Theory (cs.IT); Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2311.08283
Pdf link: https://arxiv.org/pdf/2311.08283
Abstract Group testing (GT) is the Boolean counterpart of compressed sensing and the marketplace of new ideas for related problems such as cognitive radio and heavy hitters. A GT scheme is considered good if it is nonadaptive, uses $O(k \log n)$ tests, resists noise, can be decoded in $O(k \operatorname{poly}(\log n))$ time, and makes nearly no mistakes. In this paper, we propose "Gacha GT", an elementary, self-contained, and unified randomized scheme that, for the first time, satisfies all criteria for a fairly large region of parameters, namely when $\log k < \log(n)^{1-1/O(1)}$. Outside this parameter region, Gacha can be specialized to outperform the state-of-the-art partial-recovery GTs, exact-recovery GTs, and worst-case GTs. The new idea that runs through this paper, using an analogy, is to ask every person to break her $9$-digit "phone number" into three $3$-digit numbers $x$, $y$, and $z$ and write $(b, x)$, $(b, y)$, and $(b, z)$ on three sticky notes, where $b$ is her "birthday". This way, one can sort the sticky notes by birthday to reassemble the phone numbers. This birthday-number code and other coded constructions can be stacked like a multipartite graph pyramid. Gacha's encoder synthesizes the test results from the bottom up, and Gacha's decoder reassembles the phone numbers from the top down.
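The birthday-number analogy can be run directly. The sketch below assumes that birthdays are distinct and that the three notes of each person go into three separate piles, one per chunk position; the abstract does not say how chunk order is preserved, so the piles are an assumption. It illustrates only the analogy, not Gacha GT's encoder or decoder.

```python
import random

def write_notes(people):
    """people maps a (distinct) birthday to a 9-digit phone string.
    Returns three piles of (birthday, 3-digit chunk) notes, one pile per chunk position x, y, z."""
    piles = ([], [], [])
    for birthday, phone in people.items():
        for pos in range(3):
            piles[pos].append((birthday, phone[3 * pos: 3 * pos + 3]))
    return piles

def reassemble(piles):
    """Sort each pile by birthday, then glue together the chunks that share a birthday."""
    sorted_piles = [sorted(pile) for pile in piles]
    phones = {}
    for notes in zip(*sorted_piles):
        birthday = notes[0][0]
        if any(b != birthday for b, _ in notes):
            raise ValueError("piles disagree on the set of birthdays")
        phones[birthday] = "".join(chunk for _, chunk in notes)
    return phones

people = {"0107": "555123456", "0923": "555987654", "0412": "212000111"}
piles = write_notes(people)
rng = random.Random(1)
for pile in piles:
    rng.shuffle(pile)  # notes arrive in arbitrary order
print(reassemble(piles) == people)  # True
```

Sorting is what makes decoding cheap: each pile is handled independently, and the shared birthday is the only key needed to line the chunks back up.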
On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling
Authors: Nicholas E. Corrado, Josiah P. Hanna
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.08290
Pdf link: https://arxiv.org/pdf/2311.08290
Abstract On-policy reinforcement learning (RL) algorithms perform policy updates using i.i.d. trajectories collected by the current policy. However, after observing only a finite number of trajectories, on-policy sampling may produce data that fails to match the expected on-policy data distribution. This sampling error leads to noisy updates and data-inefficient on-policy learning. Recent work in the policy evaluation setting has shown that non-i.i.d., off-policy sampling can produce data with lower sampling error than on-policy sampling can produce. Motivated by this observation, we introduce an adaptive, off-policy sampling method to improve the data efficiency of on-policy policy gradient algorithms. Our method, Proximal Robust On-Policy Sampling (PROPS), reduces sampling error by collecting data with a behavior policy that increases the probability of sampling actions that are under-sampled with respect to the current policy. Rather than discarding data from old policies, as is commonly done in on-policy algorithms, PROPS uses data collection to adjust the distribution of previously collected data to be approximately on-policy. We empirically evaluate PROPS on both continuous-action MuJoCo benchmark tasks and discrete-action tasks and demonstrate that PROPS (1) decreases sampling error throughout training and (2) improves the data efficiency of on-policy policy gradient algorithms. Our work improves the RL community's understanding of a nuance in the on-policy vs. off-policy dichotomy: on-policy learning requires on-policy data, not on-policy sampling.
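A toy illustration of sampling error and of why favoring under-sampled actions helps: compare the total-variation distance between empirical action frequencies and a fixed target policy π under i.i.d. sampling and under a deterministic rule that always takes the currently most under-sampled action. The deficit rule is a hypothetical stand-in chosen for simplicity; PROPS instead adapts a behavior policy, and this sketch does not reproduce it.

```python
import random

def tv_distance(counts, pi):
    """Total-variation distance between empirical action frequencies and the target policy."""
    n = sum(counts)
    return 0.5 * sum(abs(c / n - p) for c, p in zip(counts, pi))

def iid_counts(pi, n, rng):
    """Ordinary on-policy sampling: n i.i.d. draws from pi."""
    counts = [0] * len(pi)
    for a in rng.choices(range(len(pi)), weights=pi, k=n):
        counts[a] += 1
    return counts

def deficit_counts(pi, n):
    """Hypothetical adaptive rule: take the action most under-sampled relative to pi so far."""
    counts = [0] * len(pi)
    for t in range(n):
        a = max(range(len(pi)), key=lambda i: (t + 1) * pi[i] - counts[i])
        counts[a] += 1
    return counts

pi = [0.5, 0.3, 0.2]
n = 50
rng = random.Random(0)
iid_tv = sum(tv_distance(iid_counts(pi, n, rng), pi) for _ in range(200)) / 200
adaptive_tv = tv_distance(deficit_counts(pi, n), pi)
print(round(iid_tv, 3), round(adaptive_tv, 3))
```

For k actions the deficit rule keeps the distance below (k-1)/n, whereas i.i.d. sampling error shrinks only on the order of 1/√n, which is the gap that non-i.i.d. sampling exploits.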
VERVE: Template-based ReflectiVE Rewriting for MotiVational IntErviewing
Authors: Do June Min, Verónica Pérez-Rosas, Kenneth Resnicow, Rada Mihalcea
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.08299
Pdf link: https://arxiv.org/pdf/2311.08299
Abstract Reflective listening is a fundamental skill that counselors must acquire to achieve proficiency in motivational interviewing (MI). It involves responding in a manner that acknowledges and explores the meaning of what the client has expressed in the conversation. In this work, we introduce the task of counseling response rewriting, which transforms non-reflective statements into reflective responses. We introduce VERVE, a template-based rewriting system with paraphrase-augmented training and adaptive template updating. VERVE first creates a template by identifying and filtering out tokens that are not relevant to reflections and constructs a reflective response using the template. Paraphrase-augmented training allows the model to learn less-strict fillings of masked spans, and adaptive template updating helps discover effective templates for rewriting without significantly removing the original content. Using both automatic and human evaluations, we compare our method against text rewriting baselines and show that our framework is effective in turning non-reflective statements into more reflective responses while achieving a good content preservation-reflection style trade-off.
Instant3D: Instant Text-to-3D Generation
Authors: Ming Li, Pan Zhou, Jia-Wei Liu, Jussi Keppo, Min Lin, Shuicheng Yan, Xiangyu Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2311.08403
Pdf link: https://arxiv.org/pdf/2311.08403
Abstract Text-to-3D generation, which aims to synthesize vivid 3D objects from text prompts, has attracted much attention from the computer vision community. While several existing works have achieved impressive results for this task, they mainly rely on a time-consuming optimization paradigm. Specifically, these methods optimize a neural field from scratch for each text prompt, taking approximately one hour or more to generate one object. This heavy and repetitive training cost impedes their practical deployment. In this paper, we propose a novel framework for fast text-to-3D generation, dubbed Instant3D. Once trained, Instant3D is able to create a 3D object for an unseen text prompt in less than one second with a single run of a feedforward network. We achieve this remarkable speed by devising a new network that directly constructs a 3D triplane from a text prompt. The core innovation of our Instant3D lies in our exploration of strategies to effectively inject text conditions into the network. Furthermore, we propose a simple yet effective activation function, the scaled-sigmoid, to replace the original sigmoid function, which speeds up the training convergence by more than ten times. Finally, to address the Janus (multi-head) problem in 3D generation, we propose an adaptive Perp-Neg algorithm that can dynamically adjust its concept negation scales according to the severity of the Janus problem during training, effectively reducing the multi-head effect. Extensive experiments on a wide variety of benchmark datasets demonstrate that the proposed algorithm performs favorably against the state-of-the-art methods both qualitatively and quantitatively, while achieving significantly better efficiency. The project page is at https://ming1993li.github.io/Instant3DProj.
Keyword: quantization

EPIM: Efficient Processing-In-Memory Accelerators based on Epitome
Authors: Chenyu Wang, Zhen Dong, Daquan Zhou, Zhenhua Zhu, Yu Wang, Jiashi Feng, Kurt Keutzer
Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.07620
Pdf link: https://arxiv.org/pdf/2311.07620
Abstract The exploration of Processing-In-Memory (PIM) accelerators has garnered significant attention within the research community. However, deploying large-scale neural networks on PIM accelerators is challenging due to constrained on-chip memory capacity. To tackle this issue, current works explore model compression algorithms to reduce the size of Convolutional Neural Networks (CNNs). Most of these algorithms either aim to represent neural operators with reduced-size parameters (e.g., quantization) or search for the best combinations of neural operators (e.g., neural architecture search). Designing neural operators to align with PIM accelerators' specifications is an area that warrants further study. In this paper, we introduce the Epitome, a lightweight neural operator offering convolution-like functionality, to craft memory-efficient CNN operators for PIM accelerators (EPIM). On the software side, we evaluate epitomes' latency and energy on PIM accelerators and introduce a PIM-aware layer-wise design method to enhance their hardware efficiency. We apply epitome-aware quantization to further reduce the size of epitomes. On the hardware side, we modify the datapath of current PIM accelerators to accommodate epitomes and implement a feature map reuse technique to reduce computation cost. Experimental results reveal that our 3-bit quantized EPIM-ResNet50 attains 71.59% top-1 accuracy on ImageNet while reducing crossbar area by 30.65 times. EPIM surpasses state-of-the-art pruning methods on PIM.


New submissions for Wed, 15 Nov 23 #201

Keyword: efficient

Polarimetric PatchMatch Multi-View Stereo

Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors

PECoP: Parameter Efficient Continual Pretraining for Action Quality Assessment

On Algorithmic Cache Optimization

ReIDTracker Sea: the technical report of BoaTrack and SeaDronesSee-MOT challenge at MaCVi of WACV24

EPIM: Efficient Processing-In-Memory Accelerators based on Epitome

Activity Sparsity Complements Weight Sparsity for Efficient RNN Inference

Rethinking and Benchmarking Predict-then-Optimize Paradigm for Combinatorial Optimization Problems

Estimating the matrix $p \rightarrow q$ norm

Matching aggregate posteriors in the variational autoencoder

Chaotic dynamics of two-dimensional flows around a cylinder

AuthentiGPT: Detecting Machine-Generated Text via Black-Box Language Models Denoising

Histopathologic Cancer Detection

Low-Cost Architecture for an Advanced Smart Shower System Using Internet of Things Platform

Sparse Regression LDPC Codes

Near-Field Integrated Sensing, Positioning, and Communication: A Downlink and Uplink Framework

Quality-Aware Prototype Memory for Face Representation Learning

Modeling Sequences as Star Graphs to Address Over-smoothing in Self-attentive Sequential Recommendation

Size-Aware Hypergraph Motifs

Leveraging Hamilton-Jacobi PDEs with time-dependent Hamiltonians for continual scientific machine learning

Explainable History Distillation by Marked Temporal Point Process

Assessing Test-time Variability for Interactive 3D Medical Image Segmentation with Diverse Point Prompts

A novel and simple spectral method for nonlocal PDEs with the fractional Laplacian

On the Analysis of Cross-Lingual Prompt Tuning for Decoder-based Multilingual Model

Adaptive Search Optimization: Dynamic Algorithm Selection and Caching for Enhanced Database Performance

A Coding Scheme for Straggler Resilient Quantum $X$-Secure $T$-Private Information Retrieval

Toward Efficient and Incremental Spectral Clustering via Parametric Spectral Clustering

Enabling Decision-Support Systems through Automated Cell Tower Detection

PEMS: Pre-trained Epidemic Time-series Models

Replay Clocks

Mixture of Coupled HMMs for Robust Modeling of Multivariate Healthcare Time Series

AutoML for Large Capacity Modeling of Meta Ranking Systems

Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback

bpftime: userspace eBPF Runtime for Uprobe, Syscall and Kernel-User Interactions

Cross-subject dual-domain fusion network with task-related and task-discriminant component analysis enhancing one-shot SSVEP classification

The Impact of Adversarial Node Placement in Decentralized Federated Learning Networks

Finding Inductive Loop Invariants using Large Language Models

Deep Learning-Based Object Detection in Maritime Unmanned Aerial Vehicle Imagery: Review and Experimental Comparisons

Configurable convolutional neural networks for real-time pedestrian-level wind prediction in urban environments

On the View-and-Channel Aggregation Gain in Integrated Sensing and Edge AI

Probable Object Location (POLo) Score Estimation for Efficient Object Goal Navigation

Adversarial Preference Optimization

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Act-VIT: A Representationally Robust Attention Architecture for Skeleton Based Action Recognition Using Vision Transformer

Smart Skin separation control using distributed-input distributed-output, multi-modal actuators, and machine learning

Memory-efficient Stochastic methods for Memory-based Transformers

TransformCode: A Contrastive Learning Framework for Code Embedding via Subtree transformation

Channel Estimation with Dynamic Metasurface Antennas via Model-Based Learning

Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning

Fast List Decoding of High-Rate Polar Codes

Unlocking Science: Novel Dataset and Benchmark for Cross-Modality Scientific Information Extraction

Counterfactual Explanation for Regression via Disentanglement in Latent Space

On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling

Resource Efficient Over-the-Air Fronthaul Signaling for Uplink Cell-Free Massive MIMO Systems

GT4Py: High Performance Stencils for Weather and Climate Applications using Python

KTRL+F: Knowledge-Augmented In-Document Search

Calibration of an Elastic Humanoid Upper Body and Efficient Compensation for Motion Planning

Speeding Up Optimization-based Motion Planning through Deep Learning

Transformers can optimally learn regression mixture models

Arboricity-Dependent Algorithms for Edge Coloring

Aid Nexus: A Blockchain Based Financial Distribution System

Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees

Iterative Network Pricing for Ridesharing Platforms

Keyword: faster

Size-Aware Hypergraph Motifs

Predicting the First Response Latency of Maintainers and Contributors in Pull Requests

Container Resource Allocation versus Performance of Data-intensive Applications on Different Cloud Servers

Quantum Algorithms for Graph Coloring and other Partitioning, Covering, and Packing Problems

DynamicSurf: Dynamic Neural RGB-D Surface Reconstruction with an Optimizable Feature Grid

REST: Retrieval-Based Speculative Decoding

Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster

Optimally Managing the Impacts of Convergence Tolerance for Distributed Optimal Power Flow

Keyword: mobile

Synchrophasor Data Anomaly Detection on Grid Edge by 5G Communication and Adjacent Compute

FedOpenHAR: Federated Multi-Task Transfer Learning for Sensor-Based Human Activity Recognition

Enabling Decision-Support Systems through Automated Cell Tower Detection

On the IRS Deployment in Smart Factories Considering Blockage Effects: Collocated or Distributed?