Abstract
Weights are geometrical degrees of freedom that allow one to generalise Lagrangian finite elements. They are defined through integrals over specific supports, are well understood in terms of differential forms and integration, and fit within the framework of finite element exterior calculus. In this work we exploit this formalism with the aim of identifying supports that are appealing for finite element approximation. To do so, we study the related parametric matrix-sequences, with the matrix order tending to infinity as the mesh size tends to zero. We describe the conditioning and the global spectral behavior in terms of the standard Toeplitz machinery and GLT theory, leading to the identification of optimal choices of weights. Moreover, we propose and test ad hoc preconditioners, depending on the discretization parameters, in combination with the conjugate gradient method. The model problem we consider is a one-dimensional Laplacian, with both constant and non-constant coefficients. Numerical visualizations and experimental tests are reported and critically discussed, demonstrating the advantages of weight-induced bases over standard Lagrangian ones. Open problems and future steps are listed in the concluding section, especially regarding the multidimensional case.
Instruction Distillation Makes Large Language Models Efficient Zero-shot Rankers
Abstract
Recent studies have demonstrated the great potential of Large Language Models (LLMs) as zero-shot relevance rankers. The typical approach involves making comparisons between pairs or lists of documents. Although effective, these listwise and pairwise methods are not efficient and also heavily rely on intricate prompt engineering. To tackle this problem, we introduce a novel instruction distillation method. The key idea is to distill the pairwise ranking ability of open-source LLMs into a simpler but more efficient pointwise ranking. Specifically, given the same LLM, we first rank documents using the effective pairwise approach with complex instructions, and then distill the teacher predictions into the pointwise approach with simpler instructions. Evaluation results on the BEIR, TREC, and ReDial datasets demonstrate that instruction distillation can improve efficiency by 10 to 100x while also enhancing the ranking performance of LLMs. Furthermore, our approach surpasses the performance of existing supervised methods like monoT5 and is on par with the state-of-the-art zero-shot methods. The code to reproduce our results is available at www.github.com/sunnweiwei/RankGPT.
Exploring the Hyperparameter Space of Image Diffusion Models for Echocardiogram Generation
Authors: Hadrien Reynaud, Bernhard Kainz
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This work presents an extensive hyperparameter search on image diffusion models for echocardiogram generation. The objective is to establish foundational benchmarks and provide guidelines within the realm of ultrasound image and video generation. This study builds on the latest advancements, including cutting-edge model architectures and training methodologies. We also examine the distribution shift between real and generated samples and consider potential solutions, which is crucial for training efficient models on generated data. We determine an optimal FID score of $0.88$ for our research problem and achieve an FID of $2.60$. This work aims to contribute valuable insights and serve as a reference for further developments in the specialized field of ultrasound image and video generation.
Fast Many-to-Many Routing for Dynamic Taxi Sharing with Meeting Points
Abstract
We introduce an improved algorithm for the dynamic taxi sharing problem, i.e., a dispatcher that schedules a fleet of shared taxis, as used by services like UberXShare and Lyft Shared. We speed up the basic online algorithm that considers all possible insertions of a new customer into a set of existing routes, we generalize the objective function, and we efficiently support a large number of possible pick-up and drop-off locations. This lays an algorithmic foundation for taxi sharing systems with higher vehicle occupancy, enabling greatly reduced cost and ecological impact at comparable service quality. We find that our algorithm computes assignments between vehicles and riders several times faster than a previous state-of-the-art approach. Further, we observe that allowing meeting points for vehicles and riders can reduce the operating cost of vehicle fleets by up to 15% while also reducing rider wait and trip times.
Vertical Decomposition in 3D and 4D with Applications to Line Nearest-Neighbor Searching in 3D
Authors: Pankaj K. Agarwal, Esther Ezra, Micha Sharir
Abstract
Vertical decomposition is a widely used general technique for decomposing the cells of arrangements of semi-algebraic sets in $d$-space into constant-complexity subcells. In this paper, we settle in the affirmative a few long-standing open problems involving the vertical decomposition of substructures of arrangements for $d=3,4$: (i) Let $\mathcal{S}$ be a collection of $n$ semi-algebraic sets of constant complexity in 3D, and let $U(m)$ be an upper bound on the complexity of the union $\mathcal{U}(\mathcal{S}')$ of any subset $\mathcal{S}'\subseteq \mathcal{S}$ of size at most $m$. We prove that the complexity of the vertical decomposition of the complement of $\mathcal{U}(\mathcal{S})$ is $O^*(n^2+U(n))$ (where the $O^*(\cdot)$ notation hides subpolynomial factors). We also show that the complexity of the vertical decomposition of the entire arrangement $\mathcal{A}(\mathcal{S})$ is $O^*(n^2+X)$, where $X$ is the number of vertices in $\mathcal{A}(\mathcal{S})$. (ii) Let $\mathcal{F}$ be a collection of $n$ trivariate functions whose graphs are semi-algebraic sets of constant complexity. We show that the complexity of the vertical decomposition of the portion of the arrangement $\mathcal{A}(\mathcal{F})$ in 4D lying below the lower envelope of $\mathcal{F}$ is $O^*(n^3)$. These results lead to efficient algorithms for a variety of problems involving these decompositions, including algorithms for constructing the decompositions themselves, and for constructing $(1/r)$-cuttings of substructures of arrangements of the kinds considered above. One additional algorithm of interest is for output-sensitive point enclosure queries amid semi-algebraic sets in three or four dimensions. In addition, as a main domain of applications, we study various proximity problems involving points and lines in 3D.
DRNet: A Decision-Making Method for Autonomous Lane Changing with Deep Reinforcement Learning
Abstract
Machine learning techniques have outperformed numerous rule-based methods for decision-making in autonomous vehicles. Despite recent efforts, lane changing remains a major challenge due to complex driving scenarios and the changeable social behaviors of surrounding vehicles. To help improve the state of the art, we propose leveraging the emerging \underline{D}eep \underline{R}einforcement learning (DRL) approach for la\underline{NE} changing at the \underline{T}actical level. To this end, we present "DRNet", a novel and highly efficient DRL-based framework that enables a DRL agent to learn to drive by executing reasonable lane changes on simulated highways with an arbitrary number of lanes, while considering the driving styles of surrounding vehicles to make better decisions. Furthermore, to achieve a safe policy for decision-making, DRNet incorporates ideas from safety verification, an essential component of autonomous driving, to ensure that only safe actions are chosen at any time. Our state representation and reward function enable the trained agent to take appropriate actions in a real-world-like simulator. Our DRL agent learns the desired task without causing collisions and outperforms DDQN and other baseline models.
Abstract
We propose Fast Language-Audio Pre-training (FLAP), a self-supervised approach that efficiently and effectively learns aligned audio and language representations through masking, contrastive learning and reconstruction. For efficiency, FLAP randomly drops audio spectrogram tokens, focusing solely on the remaining ones for self-supervision. Through inter-modal contrastive learning, FLAP learns to align paired audio and text representations in a shared latent space. Notably, FLAP leverages multiple augmented views via masking for inter-modal contrast and learns to reconstruct the masked portion of audio tokens. Moreover, FLAP leverages large language models (LLMs) to augment the text inputs, contributing to improved performance. These approaches lead to more robust and informative audio-text representations, enabling FLAP to achieve state-of-the-art (SoTA) performance on audio-text retrieval tasks on AudioCaps (achieving 53.0% R@1) and Clotho (achieving 25.5% R@1).
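FLAP's two core training signals, random token dropping and inter-modal contrastive alignment, can be sketched as follows (an illustrative NumPy toy, not the paper's implementation; the symmetric InfoNCE form and the 25% keep ratio are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def drop_tokens(tokens, keep_ratio=0.25):
    """Randomly keep a subset of audio spectrogram tokens (masking for efficiency)."""
    n = tokens.shape[0]
    keep = max(1, int(n * keep_ratio))
    idx = np.sort(rng.choice(n, size=keep, replace=False))
    return tokens[idx], idx

def info_nce(audio_emb, text_emb, temperature=0.07):
    """Symmetric inter-modal contrastive loss: paired rows are positives,
    all other rows in the batch act as negatives."""
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = a @ t.T / temperature          # (B, B) cosine-similarity logits
    diag = np.arange(len(a))
    def xent(l):                            # -log softmax on the matched pairs
        return (np.log(np.exp(l).sum(axis=1)) - l[diag, diag]).mean()
    return (xent(logits) + xent(logits.T)) / 2
```

Minimizing this loss pulls each audio embedding toward its paired caption and away from the other captions in the batch, which is what places the two modalities in a shared latent space.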
Robust Adversarial Reinforcement Learning via Bounded Rationality Curricula
Authors: Aryaman Reddi, Maximilian Tölle, Jan Peters, Georgia Chalvatzaki, Carlo D'Eramo
Abstract
Robustness against adversarial attacks and distribution shifts is a long-standing goal of Reinforcement Learning (RL). To this end, Robust Adversarial Reinforcement Learning (RARL) trains a protagonist against destabilizing forces exercised by an adversary in a competitive zero-sum Markov game, whose optimal solution, i.e., rational strategy, corresponds to a Nash equilibrium. However, finding Nash equilibria requires facing complex saddle point optimization problems, which can be prohibitive to solve, especially for high-dimensional control. In this paper, we propose a novel approach for adversarial RL based on entropy regularization to ease the complexity of the saddle point optimization problem. We show that the solution of this entropy-regularized problem corresponds to a Quantal Response Equilibrium (QRE), a generalization of Nash equilibria that accounts for bounded rationality, i.e., agents sometimes play random actions instead of optimal ones. Crucially, the connection between the entropy-regularized objective and QRE enables free modulation of the rationality of the agents by simply tuning the temperature coefficient. We leverage this insight to propose our novel algorithm, Quantal Adversarial RL (QARL), which gradually increases the rationality of the adversary in a curriculum fashion until it is fully rational, easing the complexity of the optimization problem while retaining robustness. We provide extensive evidence of QARL outperforming RARL and recent baselines across several MuJoCo locomotion and navigation problems in overall performance and robustness.
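The bounded-rationality mechanism rests on temperature-controlled stochastic strategies; a minimal sketch, assuming a simple Boltzmann (quantal-response) policy over action values (the exact QARL parameterization is not shown here):

```python
import math

def quantal_response(q_values, tau):
    """Boltzmann (quantal-response) strategy over an agent's action values.

    With temperature tau > 0 the agent is boundedly rational and sometimes
    plays suboptimal actions; as tau -> 0 it recovers the rational best
    response. QARL anneals the adversary's tau in a curriculum fashion.
    """
    m = max(q_values)                       # subtract max for numerical stability
    exps = [math.exp((q - m) / tau) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]
```

Lowering `tau` sharpens the distribution toward the argmax action, which is the single knob the curriculum tunes to move the adversary from random play toward full rationality.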
Detecting Spurious Correlations via Robust Visual Concepts in Real and AI-Generated Image Classification
Authors: Preetam Prabhu Srikar Dammu, Chirag Shah
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Machine learning models often automatically learn associations present in the training data without questioning their validity or appropriateness. This undesirable property is the root cause of spurious correlations, which render models unreliable and prone to failure under distribution shifts. Research shows that most methods attempting to remedy spurious correlations are only effective for a model's known spurious associations. Current spurious correlation detection algorithms either rely on extensive human annotations or are too restrictive in their formulation. Moreover, they rely on strict definitions of visual artifacts that may not apply to data produced by generative models, which are known to hallucinate content that does not conform to standard specifications. In this work, we introduce a general-purpose method that efficiently detects potential spurious correlations and requires significantly less human interference than prior art. Additionally, the proposed method provides intuitive explanations while eliminating the need for pixel-level annotations. We demonstrate the proposed method's tolerance to the peculiarities of AI-generated images, a considerably challenging setting in which most existing methods fall short. Consequently, our method is also suitable for detecting spurious correlations originating from generative models that may propagate to downstream applications.
Comparing Routing Strategies in Opportunistic Quantum Networks
Authors: Diego Abreu, Alan Veloso, Antonio Abelém
Subjects: Networking and Internet Architecture (cs.NI); Emerging Technologies (cs.ET); Quantum Physics (quant-ph)
Abstract
This paper presents a comparative analysis of three routing strategies in opportunistic quantum networks. Quantum communication networks face unique challenges, such as the fragility of qubits and the need to create and maintain pairs of entangled states for reliable transmission. In this context, efficient and reliable routing is crucial to maximize the fidelity of the established routes, minimize the creation of new entangled pairs, and reduce the need for route recalculation. The routing strategies are compared based on the fidelity of the chosen routes, the number of entangled pairs created, and the number of route recalculations. The results obtained provide valuable information for the design and optimization of opportunistic quantum networks, contributing to advances in the efficiency and reliability of quantum communications.
CraterGrader: Autonomous Robotic Terrain Manipulation for Lunar Site Preparation and Earthmoving
Authors: Ryan Lee, Benjamin Younes, Alexander Pletta, John Harrington, Russell Q. Wong, William "Red" Whittaker
Abstract
Establishing lunar infrastructure is paramount to long-term habitation on the Moon. To meet the demand for future lunar infrastructure development, we present CraterGrader, a novel system for autonomous robotic earthmoving tasks within lunar constraints. In contrast to the current approaches to construction autonomy, CraterGrader uses online perception for dynamic mapping of deformable terrain, devises an energy-efficient material movement plan using an optimization-based transport planner, precisely localizes without GPS, and uses integrated drive and tool control to manipulate regolith with unknown and non-constant geotechnical parameters. We demonstrate CraterGrader's ability to achieve unprecedented performance in autonomous smoothing and grading within a lunar-like environment, showing that this framework is capable, robust, and a benchmark for future planetary site preparation robotics.
Physics-Informed Generator-Encoder Adversarial Networks with Latent Space Matching for Stochastic Differential Equations
Abstract
We propose a new class of physics-informed neural networks, called Physics-Informed Generator-Encoder Adversarial Networks, to effectively address the challenges posed by forward, inverse, and mixed problems in stochastic differential equations. In these scenarios, while the governing equations are known, the available data consist of only a limited set of snapshots for system parameters. Our model consists of two key components: the generator and the encoder, both updated alternately by gradient descent. In contrast to previous approaches of directly matching the approximated solutions with real snapshots, we employ an indirect matching that operates within the lower-dimensional latent feature space. This method circumvents challenges associated with high-dimensional inputs and complex data distributions, while yielding more accurate solutions compared to existing neural network solvers. In addition, the approach also mitigates the training instability issues encountered in previous adversarial frameworks in an efficient manner. Numerical results provide compelling evidence of the effectiveness of the proposed method in solving different types of stochastic differential equations.
Abstract
Finding robot poses and trajectories represents a foundational aspect of robot motion planning. Despite decades of research, efficiently and robustly addressing these challenges is still difficult. Existing approaches are often plagued by various limitations, such as intricate geometric approximations, violations of collision constraints, or slow first-order convergence. In this paper, we introduce two novel optimization formulations that offer provable robustness, achieving second-order convergence while requiring only a convex approximation of the robot's links and obstacles. Our first method, the Explicit Collision Barrier (ECB) method, employs a barrier function to guarantee separation between convex objects. ECB uses an efficient matrix factorization technique, enabling a second-order Newton's method with an iteration complexity linear in the number of separating planes. Our second method, the Implicit Collision Barrier (ICB) method, further transforms the separating planes into implicit functions of the robot poses. We show that such an implicit objective function is twice differentiable, with derivatives evaluated at linear complexity. To assess the effectiveness of our approaches, we conduct a comparative study with a first-order baseline algorithm across six testing scenarios. Our results show that our methods converge significantly faster than the baseline algorithm.
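The barrier idea behind ECB can be illustrated in one dimension: a log-barrier on the separation distance stays finite while the objects are apart and blows up at contact, so a line-searched descent step can never cross into collision. This is a minimal sketch, not the paper's formulation; the penalty weight `mu` is a hypothetical choice:

```python
import math

def barrier_objective(distance, mu=0.1):
    """Log-barrier on the separation distance between two convex objects:
    finite while distance > 0, unbounded as the objects approach contact,
    so any descent step validated by a line search stays collision-free."""
    if distance <= 0.0:
        return math.inf                     # infeasible: objects in contact
    return -mu * math.log(distance)
```

In the actual methods the distance argument is replaced by signed margins of separating planes (explicit variables in ECB, implicit functions of the robot pose in ICB), and the barrier term is added to the trajectory objective.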
Energy Efficiency Optimization for Subterranean LoRaWAN Using A Reinforcement Learning Approach: A Direct-to-Satellite Scenario
Authors: Kaiqiang Lin, Muhammad Asad Ullah, Hirley Alves, Konstantin Mikhaylov, Tong Hao
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Abstract
The integration of subterranean LoRaWAN and non-terrestrial networks (NTN) delivers substantial economic and societal benefits in remote agriculture and disaster rescue operations. The LoRa modulation leverages quasi-orthogonal spreading factors (SFs) to optimize data rates, airtime, coverage and energy consumption. However, it is still challenging to effectively assign SFs to end devices to minimize co-SF interference in massive subterranean LoRaWAN NTN. To address this, we investigate a reinforcement learning (RL)-based SF allocation scheme to optimize the system's energy efficiency (EE). To efficiently capture the device-to-environment interactions in dense networks, we propose an SF allocation technique using the multi-agent dueling double deep Q-network (MAD3QN) and the multi-agent advantage actor-critic (MAA2C) algorithms based on an analytical reward mechanism. Our proposed RL-based SF allocation approach achieves better performance compared to four benchmarks in the extreme underground direct-to-satellite scenario. Remarkably, MAD3QN shows promising potential to surpass MAA2C in terms of convergence rate and EE.
TinyFormer: Efficient Transformer Design and Deployment on Tiny Devices
Authors: Jianlei Yang, Jiacheng Liao, Fanding Lei, Meichen Liu, Junyi Chen, Lingkun Long, Han Wan, Bei Yu, Weisheng Zhao
Abstract
Developing deep learning models on tiny devices (e.g., microcontroller units, MCUs) has attracted much attention in various embedded IoT applications. However, it is challenging to efficiently design and deploy recent advanced models (e.g., transformers) on tiny devices due to their severe hardware resource constraints. In this work, we propose TinyFormer, a framework specifically designed to develop and deploy resource-efficient transformers on MCUs. TinyFormer mainly consists of SuperNAS, SparseNAS and SparseEngine. SuperNAS searches for an appropriate supernet in a vast search space. SparseNAS then identifies the best sparse single-path model, including the transformer architecture, within the identified supernet. Finally, SparseEngine efficiently deploys the searched sparse models onto MCUs. To the best of our knowledge, SparseEngine is the first deployment framework capable of performing inference of sparse transformer models on MCUs. Evaluation results on the CIFAR-10 dataset demonstrate that TinyFormer can develop efficient transformers with an accuracy of $96.1\%$ while adhering to hardware constraints of $1$MB storage and $320$KB memory. Additionally, TinyFormer achieves significant speedups in sparse inference, up to $12.2\times$, compared to the CMSIS-NN library. We believe TinyFormer can bring powerful transformers into TinyML scenarios and greatly expand the scope of deep learning applications.
Modeling the Uncertainty with Maximum Discrepant Students for Semi-supervised 2D Pose Estimation
Authors: Jiaqi Wu, Junbiao Pang, Qingming Huang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Semi-supervised pose estimation is a practically challenging task for computer vision. Although numerous excellent semi-supervised classification methods have emerged, they typically use confidence to evaluate the quality of pseudo-labels, which is difficult to do in pose estimation tasks: in pose estimation, confidence represents only the probability that a heatmap position is a keypoint, not the quality of that prediction. In this paper, we propose a simple yet efficient framework to estimate the quality of pseudo-labels in semi-supervised pose estimation by modeling the uncertainty of the pseudo-labels. Concretely, under the dual mean-teacher framework, we construct two maximum discrepant students (MDSs) to effectively push the two teachers to generate different decision boundaries for the same sample. Moreover, we create multiple uncertainties to assess the quality of the pseudo-labels. Experimental results demonstrate that our method improves the performance of semi-supervised pose estimation on three datasets.
Generating Unbiased Pseudo-labels via a Theoretically Guaranteed Chebyshev Constraint to Unify Semi-supervised Classification and Regression
Authors: Jiaqi Wu, Junbiao Pang, Qingming Huang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Both semi-supervised classification and regression are practically challenging tasks for computer vision. However, semi-supervised classification methods are rarely applied to regression tasks, because the threshold-to-pseudo-label process (T2L) in classification uses confidence to determine the quality of a label; this works for classification but is inefficient for regression. By nature, regression also requires unbiased methods to generate high-quality labels. On the other hand, T2L for classification often fails if the confidence is generated by a biased method. To address this issue, we propose a theoretically guaranteed constraint for generating unbiased labels based on Chebyshev's inequality, combining multiple predictions to generate superior-quality labels from several inferior ones. In terms of high-quality labels, the unbiased method naturally avoids the drawback of T2L. Specifically, we propose an Unbiased Pseudo-labels network (UBPL network) with multiple branches to combine multiple predictions as pseudo-labels, where a Feature Decorrelation loss (FD loss) is proposed based on the Chebyshev constraint. In principle, our method can be used for both classification and regression and can be easily extended to any semi-supervised framework, e.g. Mean Teacher, FixMatch, DualPose. Our approach achieves superior performance over SOTAs on the pose estimation datasets Mouse, FLIC and LSP, as well as the classification datasets CIFAR10/100 and SVHN.
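A minimal sketch of how Chebyshev's inequality can gate a combined pseudo-label, assuming the branches' predictions are simply averaged and the empirical variance feeds the bound (the paper's actual multi-branch network and FD-loss formulation differ):

```python
import statistics

def chebyshev_pseudo_label(predictions, eps=0.2, delta=0.05):
    """Accept the averaged prediction as a pseudo-label only when Chebyshev's
    inequality, P(|X - mu| >= eps) <= var / eps**2, bounds the probability of
    a large deviation below the risk budget delta; otherwise discard it.
    The thresholds eps and delta are illustrative choices."""
    mu = statistics.fmean(predictions)
    var = statistics.pvariance(predictions)
    bound = var / eps ** 2
    return (mu if bound <= delta else None), bound
```

Because the bound depends only on the spread of the combined predictions, not on a classifier's confidence score, the same gating rule applies unchanged to regression targets.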
TCM-GPT: Efficient Pre-training of Large Language Models for Domain Adaptation in Traditional Chinese Medicine
Abstract
Pre-training and fine-tuning have emerged as a promising paradigm across various natural language processing (NLP) tasks. The effectiveness of pretrained large language models (LLMs) has witnessed further enhancement, holding potential for applications in the field of medicine, particularly in the context of Traditional Chinese Medicine (TCM). However, the application of these general models to specific domains often yields suboptimal results, primarily due to challenges like lack of domain knowledge, unique objectives, and computational efficiency. Furthermore, their effectiveness in specialized domains, such as Traditional Chinese Medicine, requires comprehensive evaluation. To address the above issues, we propose a novel domain-specific approach, TCMDA (TCM Domain Adaptation): efficient pre-training with a domain-specific corpus. Specifically, we first construct a large TCM-specific corpus, TCM-Corpus-1B, by identifying domain keywords and retrieving relevant documents from a general corpus. Then, TCMDA leverages LoRA, which freezes the pretrained model's weights and uses rank decomposition matrices to efficiently train specific dense layers for pre-training and fine-tuning, efficiently aligning the model with TCM-related tasks; the resulting model is TCM-GPT-7B. We further conducted extensive experiments on two TCM tasks, TCM examination and TCM diagnosis. TCM-GPT-7B achieved the best performance on both datasets, outperforming other models by relative improvements of 17% and 12% in accuracy, respectively. To the best of our knowledge, our study represents the pioneering validation of domain adaptation of a large language model with 7 billion parameters in the TCM domain. We will release both TCM-Corpus-1B and TCM-GPT-7B once accepted to facilitate interdisciplinary development in TCM and NLP, serving as a foundation for further study.
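The LoRA mechanism that TCMDA relies on, freezing the pretrained weight W and training only a rank-r decomposition B A, can be sketched as follows (a NumPy toy; real setups use PyTorch/PEFT, and the rank, scaling, and init values here are assumptions):

```python
import numpy as np

class LoRALinear:
    """A frozen dense layer plus a trainable rank-r update: W + (alpha/r) * B @ A.

    Only A and B are trained, shrinking the tunable parameter count from
    d_out * d_in to r * (d_in + d_out). Since B starts at zero, the adapted
    layer initially reproduces the frozen pretrained layer exactly."""

    def __init__(self, W, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                      # frozen pretrained weights
        self.A = rng.normal(0.0, 0.02, size=(r, d_in))  # trainable down-projection
        self.B = np.zeros((d_out, r))                   # trainable up-projection
        self.scale = alpha / r

    def __call__(self, x):
        return x @ (self.W + self.scale * self.B @ self.A).T
```

During adaptation, gradient updates touch only `A` and `B`, which is what makes domain-adaptive pre-training on a 7B-parameter model computationally affordable.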
Near-Optimal Quantum Algorithms for Bounded Edit Distance and Lempel-Ziv Factorization
Authors: Daniel Gibney, Ce Jin, Tomasz Kociumaka, Sharma V. Thankachan
Subjects: Data Structures and Algorithms (cs.DS); Quantum Physics (quant-ph)
Abstract
Classically, the edit distance of two length-$n$ strings can be computed in $O(n^2)$ time, whereas an $O(n^{2-\epsilon})$-time procedure would falsify the Orthogonal Vectors Hypothesis. If the edit distance does not exceed $k$, the running time can be improved to $O(n+k^2)$, which is near-optimal (conditioned on OVH) as a function of $n$ and $k$. Our first main contribution is a quantum $\tilde{O}(\sqrt{nk}+k^2)$-time algorithm that uses $\tilde{O}(\sqrt{nk})$ queries, where $\tilde{O}(\cdot)$ hides polylogarithmic factors. This query complexity is unconditionally optimal, and any significant improvement in the time complexity would resolve a long-standing open question of whether edit distance admits an $O(n^{2-\epsilon})$-time quantum algorithm. Our divide-and-conquer quantum algorithm reduces the edit distance problem to a case where the strings have small Lempel-Ziv factorizations. Then, it combines a quantum LZ compression algorithm with a classical edit-distance subroutine for compressed strings. The LZ factorization problem can be classically solved in $O(n)$ time, which is unconditionally optimal in the quantum setting. We can, however, hope for a quantum speedup if we parameterize the complexity in terms of the factorization size $z$. Already a generic oracle identification algorithm yields the optimal query complexity of $\tilde{O}(\sqrt{nz})$ at the price of exponential running time. Our second main contribution is a quantum algorithm that achieves the optimal time complexity of $\tilde{O}(\sqrt{nz})$. The key tool is a novel LZ-like factorization of size $O(z\log^2n)$ whose subsequent factors can be efficiently computed through a combination of classical and quantum techniques. We can then obtain the string's run-length encoded Burrows-Wheeler Transform (BWT), construct the $r$-index, and solve many fundamental string processing problems in time $\tilde{O}(\sqrt{nz})$.
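For the classical starting point, restricting the DP table to the 2k+1 diagonals around the main one already yields a bounded-distance algorithm; a sketch follows (this banded variant runs in O(nk) time, and reaching the O(n+k^2) bound cited above additionally requires constant-time LCP queries as in Landau-Vishkin, omitted here):

```python
def bounded_edit_distance(s, t, k):
    """Return the edit distance of s and t if it is at most k, else None.

    Only cells within k of the main diagonal can hold values <= k, so the
    DP is restricted to that band; everything outside is treated as INF."""
    n, m = len(s), len(t)
    if abs(n - m) > k:
        return None                           # distance is at least |n - m|
    INF = k + 1
    prev = [j if j <= k else INF for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [INF] * (m + 1)
        if i <= k:
            cur[0] = i
        for j in range(max(1, i - k), min(m, i + k) + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,  # substitute or match
                         prev[j] + 1,         # delete s[i-1]
                         cur[j - 1] + 1)      # insert t[j-1]
        prev = cur
    return prev[m] if prev[m] <= k else None
```

The quantum algorithm in the paper speeds up a different bottleneck, but the banded structure above is the classical baseline its $\tilde{O}(\sqrt{nk}+k^2)$ bound is measured against.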
Large Language Models to the Rescue: Reducing the Complexity in Scientific Workflow Development Using ChatGPT
Authors: Mario Sänger, Ninon De Mecquenem, Katarzyna Ewa Lewińska, Vasilis Bountris, Fabian Lehmann, Ulf Leser, Thomas Kosch
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Abstract
Scientific workflow systems are increasingly popular for expressing and executing complex data analysis pipelines over large datasets, as they offer reproducibility, dependability, and scalability of analyses by automatic parallelization on large compute clusters. However, implementing workflows is difficult due to the involvement of many black-box tools and the deep infrastructure stack necessary for their execution. Simultaneously, user-supporting tools are rare, and the number of available examples is much lower than in classical programming languages. To address these challenges, we investigate the efficiency of Large Language Models (LLMs), specifically ChatGPT, to support users when dealing with scientific workflows. We performed three user studies in two scientific domains to evaluate ChatGPT for comprehending, adapting, and extending workflows. Our results indicate that LLMs efficiently interpret workflows but achieve lower performance for exchanging components or purposeful workflow extensions. We characterize their limitations in these challenging scenarios and suggest future research directions.
Abstract
There is rapidly growing research interest in modeling user preferences via pre-training on multi-domain interactions for recommender systems. However, existing pre-trained multi-domain recommendation models mostly select item texts as bridges across domains and simply explore user behaviors in the target domains. Hence, they ignore other informative multi-modal item contents (e.g., visual information) and also lack a thorough consideration of user behaviors from all interactive domains. To address these issues, in this paper we propose to pre-train universal multi-modal item content representations for multi-domain recommendation, called UniM^2Rec, which smoothly learns multi-modal item content representations and multi-modal user preferences from all domains. With the pre-trained multi-domain recommendation model, UniM^2Rec can be efficiently and effectively transferred to new target domains in practice. Extensive experiments on five real-world datasets in target domains demonstrate the superiority of the proposed method over existing competitive methods, especially for real-world recommendation scenarios that usually struggle with seriously missing or noisy item contents.
Efficient Black-Box Adversarial Attacks on Neural Text Detectors
Abstract
Neural text detectors are models trained to detect whether a given text was generated by a language model or written by a human. In this paper, we investigate three simple and resource-efficient strategies (parameter tweaking, prompt engineering, and character-level mutations) to alter texts generated by GPT-3.5 in ways that are inconspicuous to humans but cause misclassification by neural text detectors. The results show that parameter tweaking and character-level mutations in particular are effective strategies.
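As one concrete (hypothetical) instance of a character-level mutation, letters can be swapped for visually identical Unicode homoglyphs; the mutation rate and the glyph map below are illustrative assumptions, not the paper's exact strategy:

```python
import random

# Hypothetical glyph map: Latin letters and their Cyrillic look-alikes.
HOMOGLYPHS = {"a": "\u0430", "c": "\u0441", "e": "\u0435", "o": "\u043e", "p": "\u0440"}

def mutate(text, rate=0.05, seed=0):
    """Swap a fraction `rate` of mappable letters for homoglyphs, leaving the
    text visually unchanged for a human reader while altering its tokens."""
    rng = random.Random(seed)
    return "".join(
        HOMOGLYPHS[ch] if ch in HOMOGLYPHS and rng.random() < rate else ch
        for ch in text
    )
```

Because detectors operate on token sequences, even a small rate of such substitutions can shift the input far from the distribution the detector was trained on.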
Enhancing search engine precision and user experience through sentiment-based polysemy resolution
Abstract
With the proliferation of digital content and the need for efficient information retrieval, this study's insights can be applied to various domains, including news services, e-commerce, and digital marketing, to provide users with more meaningful and tailored experiences. The study addresses the common problem of polysemy in search engines, where the same keyword may have multiple meanings. It proposes a solution to this issue by embedding a smart search function into the search engine, which can differentiate between different meanings based on sentiment. The study leverages sentiment analysis, a powerful natural language processing (NLP) technique, to classify and categorize news articles based on their emotional tone. This can provide more insightful and nuanced search results. The article reports an impressive accuracy rate of 85% for the proposed smart search function, which outperforms conventional search engines. This indicates the effectiveness of the sentiment-based approach. The research explores multiple sentiment analysis models, including Sentistrength and Valence Aware Dictionary for Sentiment Reasoning (VADER), to determine the best-performing approach. The findings can be applied to enhance search engines, making them more capable of understanding the context and intent behind users' queries. This can lead to better search results that are more aligned with what users are looking for. The proposed smart search function can improve the user experience by reducing the need to sift through irrelevant search results. This is particularly important in an age where information overload is common.
GateLoop: Fully Data-Controlled Linear Recurrence for Sequence Modeling
Authors: Tobias Katsch
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Data Structures and Algorithms (cs.DS)
Abstract
Linear Recurrence has proven to be a powerful tool for modeling long sequences efficiently. In this work, we show that existing models fail to take full advantage of its potential. Motivated by this finding, we develop GateLoop, a foundational sequence model that generalizes linear recurrent models such as S4, S5, LRU and RetNet, by employing data-controlled state transitions. Utilizing this theoretical advance, GateLoop empirically outperforms existing models for auto-regressive language modeling. Our method comes with a low-cost $O(l)$ recurrent mode and an efficient $O(l \log_{2} l)$ parallel mode making use of highly optimized associative scan implementations. Furthermore, we derive an $O(l^2)$ surrogate attention mode, revealing remarkable implications for Transformer and recently proposed architectures. Specifically, we prove that our approach can be interpreted as providing data-controlled relative-positional information to Attention. While many existing models solely rely on data-controlled cumulative sums for context aggregation, our findings suggest that incorporating data-controlled complex cumulative products may be a crucial step towards more powerful sequence models.
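The data-controlled linear recurrence underlying such models can be sketched in scalar form: the state evolves as h_t = a_t * h_{t-1} + v_t, where in GateLoop the transition a_t is itself input-dependent (and complex-valued; this toy uses real scalars). Unrolling the recurrence makes the data-controlled cumulative products from the abstract explicit.

```python
def linear_recurrence(a, v):
    """O(l) recurrent mode: h_t = a_t * h_{t-1} + v_t, with h_{-1} = 0."""
    h, out = 0.0, []
    for a_t, v_t in zip(a, v):
        h = a_t * h + v_t
        out.append(h)
    return out

def unrolled(a, v):
    """Equivalent closed form: h_t = sum_s (prod_{r=s+1..t} a_r) * v_s,
    exposing the data-controlled cumulative products."""
    out = []
    for t in range(len(v)):
        h = 0.0
        for s in range(t + 1):
            prod = 1.0
            for r in range(s + 1, t + 1):
                prod *= a[r]
            h += prod * v[s]
        out.append(h)
    return out

a = [0.5, 0.9, 0.1, 0.7]   # gates; in GateLoop these depend on the input
v = [1.0, 2.0, 3.0, 4.0]
rec = linear_recurrence(a, v)
```

The closed form is associative in the pairs (a_t, v_t), which is what makes the O(l log l) parallel-scan mode possible.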
Trust-Preserved Human-Robot Shared Autonomy enabled by Bayesian Relational Event Modeling
Abstract
Shared autonomy functions as a flexible framework that empowers robots to operate across a spectrum of autonomy levels, allowing for efficient task execution with minimal human oversight. However, humans might be intimidated by the autonomous decision-making capabilities of robots due to perceived risks and a lack of trust. This paper proposes a trust-preserved shared autonomy strategy that allows robots to seamlessly adjust their autonomy level, striving to optimize team performance and enhance their acceptance among human collaborators. By enhancing the Relational Event Modeling framework with Bayesian learning techniques, this paper enables dynamic inference of human trust based solely on time-stamped relational events within human-robot teams. Adopting a longitudinal perspective on trust development and calibration in human-robot teams, the proposed shared autonomy strategy enables robots to preserve human trust by not only passively adapting to it but also actively participating in trust repair when violations occur. We validate the effectiveness of the proposed approach through a user study on human-robot collaborative search and rescue scenarios. The objective and subjective evaluations demonstrate its merits over teleoperation in both task execution and user acceptability.
DeliverAI: Reinforcement Learning Based Distributed Path-Sharing Network for Food Deliveries
Abstract
Delivery of items from the producer to the consumer has experienced significant growth over the past decade and has been greatly fueled by the recent pandemic. Amazon Fresh, Shopify, UberEats, InstaCart, and DoorDash are rapidly growing and share the same business model of consumer-item or food delivery. Existing food delivery methods are sub-optimal because each delivery is individually optimized to go directly from the producer to the consumer via the shortest-time path. We observe a significant scope for reducing the costs associated with completing deliveries under the current model. We model our food delivery problem as a multi-objective optimization, where consumer satisfaction and delivery costs both need to be optimized. Taking inspiration from the success of ride-sharing in the taxi industry, we propose DeliverAI - a reinforcement learning-based path-sharing algorithm. Unlike previous attempts at path-sharing, DeliverAI can provide real-time, time-efficient decision-making using a reinforcement learning-enabled agent system. Our novel agent interaction scheme leverages path-sharing among deliveries to reduce the total distance traveled while keeping the delivery completion time under check. We generate and test our methodology rigorously on a simulation setup using real data from the city of Chicago. Our results show that DeliverAI can reduce the delivery fleet size by 12%, the distance traveled by 13%, and achieve 50% higher fleet utilization compared to the baselines.
Fast Approximation Algorithms for Piercing Boxes by Points
Authors: Pankaj K. Agarwal, Sariel Har-Peled, Rahul Raychaudhury, Stavros Sintos
Abstract
$ \newcommand{\Re}{\mathbb{R}} \newcommand{\BX}{\mathcal{B}} \newcommand{\bb}{\mathsf{b}} \newcommand{\eps}{\varepsilon} \newcommand{\polylog}{\mathrm{polylog}} $ Let $\BX=\{\bb_1, \ldots ,\bb_n\}$ be a set of $n$ axis-aligned boxes in $\Re^d$ where $d\geq2$ is a constant. The piercing problem is to compute a smallest set of points $N \subset \Re^d$ that hits every box in $\BX$, i.e., $N\cap \bb_i\neq \emptyset$, for $i=1,\ldots, n$. The problem is known to be NP-Hard. Let $\psi:=\psi(\BX)$, the \emph{piercing number}, be the minimum size of a piercing set of $\BX$. We first present a randomized $O(\log\log \psi)$-approximation algorithm with expected running time $O(n^{d/2}\polylog (n))$. Next, we show that the expected running time can be improved to near-linear using a sampling-based technique, if $\psi = O(n^{1/(d-1)})$. Specifically, in the plane, the improved running time is $O(n \log \psi)$, assuming $\psi < n/\log^{\Omega(1)} n$. Finally, we study the dynamic version of the piercing problem where boxes can be inserted or deleted. For boxes in $\Re^2$, we obtain a randomized $O(\log\log\psi)$-approximation algorithm with $O(n^{1/2}\polylog (n))$ amortized expected update time for insertion or deletion of boxes. For squares in $\Re^2$, the update time can be improved to $O(n^{1/3}\polylog (n))$. Our algorithms are based on the multiplicative weight-update (MWU) method and require the construction of a weak $\eps$-net for a point set with respect to boxes. A key idea of our work is to exploit the duality between the piercing set and independent set (for boxes) to speed up our MWU. We also present a simpler and slightly more efficient algorithm for constructing a weak $\eps$-net than in [Ezr10], which is of independent interest. Our approach also yields a simpler algorithm for constructing (regular) $\eps$-nets with respect to boxes for $d=2,3$.
LOTUS: Continual Imitation Learning for Robot Manipulation Through Unsupervised Skill Discovery
Abstract
We introduce LOTUS, a continual imitation learning algorithm that empowers a physical robot to continuously and efficiently learn to solve new manipulation tasks throughout its lifespan. The core idea behind LOTUS is constructing an ever-growing skill library from a sequence of new tasks with a small number of human demonstrations. LOTUS starts with a continual skill discovery process using an open-vocabulary vision model, which extracts skills as recurring patterns presented in unsegmented demonstrations. Continual skill discovery updates existing skills to avoid catastrophic forgetting of previous tasks and adds new skills to solve novel tasks. LOTUS trains a meta-controller that flexibly composes various skills to tackle vision-based manipulation tasks in the lifelong learning process. Our comprehensive experiments show that LOTUS outperforms state-of-the-art baselines by over 11% in success rate, showing its superior knowledge transfer ability compared to prior methods. More results and videos can be found on the project website: https://ut-austin-rpl.github.io/Lotus/.
Active Learning-Based Species Range Estimation
Authors: Christian Lange, Elijah Cole, Grant Van Horn, Oisin Mac Aodha
Abstract
We propose a new active learning approach for efficiently estimating the geographic range of a species from a limited number of on-the-ground observations. We model the range of an unmapped species of interest as the weighted combination of estimated ranges obtained from a set of different species. We show that it is possible to generate this candidate set of ranges by using models that have been trained on large weakly supervised community collected observation data. From this, we develop a new active querying approach that sequentially selects geographic locations to visit that best reduce our uncertainty over an unmapped species' range. We conduct a detailed evaluation of our approach and compare it to existing active learning methods using an evaluation dataset containing expert-derived ranges for one thousand species. Our results demonstrate that our method outperforms alternative active learning methods and approaches the performance of end-to-end trained models, even when only using a fraction of the data. This highlights the utility of active learning via transfer learned spatial representations for species range estimation. It also emphasizes the value of leveraging emerging large-scale crowdsourced datasets, not only for modeling species' ranges, but also for actively discovering them.
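The querying idea can be sketched as follows: model the unmapped species' range as a weighted mixture of candidate range maps and visit the location where the candidates disagree most. Plain weighted variance serves as the uncertainty score here; the paper's actual acquisition function may differ, and the maps and weights below are made-up numbers.

```python
# Candidate range maps over 5 locations: probability the species is
# present at each location. Values are illustrative only.
candidates = [
    [0.9, 0.8, 0.1, 0.0, 0.5],
    [0.8, 0.7, 0.9, 0.0, 0.5],
    [0.9, 0.6, 0.1, 0.0, 0.5],
]
weights = [1 / 3, 1 / 3, 1 / 3]  # current posterior over candidates

def mixture(loc):
    """Weighted-combination range estimate at one location."""
    return sum(w * c[loc] for w, c in zip(weights, candidates))

def disagreement(loc):
    """Weighted variance of the candidate predictions at one location."""
    m = mixture(loc)
    return sum(w * (c[loc] - m) ** 2 for w, c in zip(weights, candidates))

# Query the location where a visit is most informative.
next_query = max(range(5), key=disagreement)
```

After visiting the chosen location, the weights would be re-fit to the observation and the loop repeated.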
Solving Woeginger's Hiking Problem: Wonderful Partitions in Anonymous Hedonic Games
Authors: Andrei Constantinescu, Pascal Lenzner, Rebecca Reiffenhäuser, Daniel Schmand, Giovanna Varricchio
Subjects: Computer Science and Game Theory (cs.GT); Data Structures and Algorithms (cs.DS)
Abstract
A decade ago, Gerhard Woeginger posed an open problem that became well-known as "Gerhard's Hiking Problem": Consider a group of $n$ people that want to go hiking; everyone expresses preferences over the size of their hiking group in the form of an interval between $1$ and $n$. Is it possible to efficiently assign the $n$ people to a set of hiking subgroups so that every person approves the size of their assigned subgroup? The problem is also known as efficiently deciding if an instance of an anonymous Hedonic Game with interval approval preferences admits a wonderful partition. We resolve the open problem in the affirmative by presenting an $O(n^5)$ time algorithm for Gerhard's Hiking Problem. Our solution is based on employing a dynamic programming approach for a specific rectangle stabbing problem from computational geometry. Moreover, we propose natural more demanding extensions of the problem, e.g., maximizing the number of satisfied people, and show that they are also efficiently solvable. Additionally, we precisely map the boundary of tractability for the wonderful partition problem by proving that finding such a partition becomes NP-hard if non-interval approval size sets of size two are allowed. This closes a gap in the complexity landscape, since hardness was only known for the case with non-interval approval size sets of size at most 3. Last but not least, we employ our solution to efficiently compute a partition that maximizes the egalitarian welfare for anonymous single-peaked Hedonic Games.
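The following is not the paper's O(n^5) dynamic program, but a brute-force checker for tiny instances that makes the problem statement concrete: person i approves group sizes in an interval [l_i, u_i], and we search for a partition into groups whose sizes everyone in them approves. Since the game is anonymous, only the multiset of intervals matters, which the memoization exploits.

```python
from functools import lru_cache
from itertools import combinations

def wonderful_partition_exists(intervals):
    """Brute force for tiny n. intervals[i] = (l_i, u_i): the group sizes
    person i approves. Memoizes on the sorted tuple of remaining intervals,
    since anonymity makes only the multiset matter."""
    @lru_cache(maxsize=None)
    def solve(remaining):
        if not remaining:
            return True
        (l0, u0), rest = remaining[0], remaining[1:]
        for s in range(l0, u0 + 1):      # size of the first person's group
            # candidates who also approve a group of size s
            ok = [i for i, (l, u) in enumerate(rest) if l <= s <= u]
            for group in combinations(ok, s - 1):
                left = tuple(iv for i, iv in enumerate(rest)
                             if i not in group)
                if solve(left):
                    return True
        return False

    return solve(tuple(sorted(intervals)))
```

For example, two people who both approve only size 2 can hike together, but pairing a size-2-only person with a size-1-only person is impossible.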
Envy-Free Cake-Cutting for Four Agents
Authors: Alexandros Hollender, Aviad Rubinstein
Subjects: Computer Science and Game Theory (cs.GT); Computational Complexity (cs.CC)
Abstract
In the envy-free cake-cutting problem we are given a resource, usually called a cake and represented as the $[0,1]$ interval, and a set of $n$ agents with heterogeneous preferences over pieces of the cake. The goal is to divide the cake among the $n$ agents such that no agent is envious of any other agent. Even under a very general preferences model, this fundamental fair division problem is known to always admit an exact solution where each agent obtains a connected piece of the cake; we study the complexity of finding an approximate solution, i.e., a connected $\varepsilon$-envy-free allocation. For monotone valuations of cake pieces, Deng, Qi, and Saberi (2012) gave an efficient ($\textsf{poly}(\log(1/\varepsilon))$ queries) algorithm for three agents and posed the open problem of four (or more) monotone agents. Even for the special case of additive valuations, Br\^anzei and Nisan (2022) conjectured an $\Omega(1/\varepsilon)$ lower bound on the number of queries for four agents. We provide the first efficient algorithm for finding a connected $\varepsilon$-envy-free allocation with four monotone agents. We also prove that as soon as valuations are allowed to be non-monotone, the problem becomes hard: it becomes PPAD-hard, requires $\textsf{poly}(1/\varepsilon)$ queries in the black-box model, and even $\textsf{poly}(1/\varepsilon)$ communication complexity. This constitutes, to the best of our knowledge, the first intractability result for any version of the cake-cutting problem in the communication complexity model.
Keyword: faster
Fast Many-to-Many Routing for Dynamic Taxi Sharing with Meeting Points
Abstract
We introduce an improved algorithm for the dynamic taxi sharing problem, i.e. a dispatcher that schedules a fleet of shared taxis as it is used by services like UberXShare and Lyft Shared. We speed up the basic online algorithm that looks for all possible insertions of a new customer into a set of existing routes, we generalize the objective function, and we efficiently support a large number of possible pick-up and drop-off locations. This lays an algorithmic foundation for taxi sharing systems with higher vehicle occupancy - enabling greatly reduced cost and ecological impact at comparable service quality. We find that our algorithm computes assignments between vehicles and riders several times faster than a previous state-of-the-art approach. Further, we observe that allowing meeting points for vehicles and riders can reduce the operating cost of vehicle fleets by up to 15% while also reducing rider wait and trip times.
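The basic insertion step that the abstract speeds up can be sketched as follows: for a new rider with a pickup and a dropoff, try every pair of positions in an existing route and keep the insertion with the smallest added distance. Euclidean distance stands in for the real system's road-network travel times, and feasibility constraints (capacity, wait-time limits) are omitted.

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def route_length(stops):
    return sum(dist(stops[i], stops[i + 1]) for i in range(len(stops) - 1))

def best_insertion(route, pickup, dropoff):
    """Try all positions i <= j (after the vehicle's current stop at
    index 0); return (cost_increase, new_route) for the cheapest one."""
    base = route_length(route)
    best = (float("inf"), None)
    for i in range(1, len(route) + 1):
        for j in range(i, len(route) + 1):
            cand = route[:i] + [pickup] + route[i:j] + [dropoff] + route[j:]
            best = min(best, (route_length(cand) - base, cand))
    return best

route = [(0, 0), (4, 0)]                 # vehicle heading straight east
cost, new_route = best_insertion(route, (1, 0), (3, 0))
# This rider lies on the existing path, so the detour cost is ~0.
```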
Generalizations of Matrix Multiplication can solve the Light Bulb Problem
Abstract
In the light bulb problem, one is given uniformly random vectors $x_1, \ldots, x_n, y_1, \ldots, y_n \in \{-1,1\}^d$. They are all chosen independently except for a planted pair $(x_{i^*}, y_{j^*})$, which is chosen with correlation $\rho>0$. The goal is to find the planted pair. This problem was introduced over 30 years ago by L.~Valiant, and is known to have many applications in data analysis, statistics, and learning theory. The naive algorithm runs in $\Omega(n^2)$ time, and algorithms based on Locality-Sensitive Hashing approach quadratic time as $\rho \to 0$. In 2012, G.~Valiant gave a breakthrough algorithm using fast matrix multiplication that runs in time $O(n^{(5-\omega)/(4-\omega)}) < O(n^{1.615})$, no matter how small $\rho>0$ is. This was subsequently refined by Karppa, Kaski, and Kohonen in 2016 to $O(n^{2 \omega / 3}) < O(n^{1.582})$. In this paper, we propose a new approach which can replace the matrix multiplication tensor with other tensors. Those tensors can omit some terms one is supposed to compute, and include additional error terms. Our new approach can make use of tensors which previously had no known algorithmic applications, including tensors which arise naturally as intermediate steps in border rank methods and in the Laser method. We further show that our approach can be combined with locality-sensitive hashing to design an algorithm whose running time improves as $\rho$ gets larger. To our knowledge, this is the first algorithm which combines fast matrix multiplication with hashing for the light bulb problem or any closest pair problem, and it leads to faster algorithms for small $\rho>0$. We also introduce a new tensor $T_{2112}$, which has the same size as the $2 \times 2$ matrix multiplication tensor but yields a faster algorithm for the light bulb problem than Strassen's algorithm.
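For concreteness, the naive Θ(n²d) baseline the abstract improves on simply scores every (i, j) pair by inner product and returns the arg-max; the fast algorithms replace this all-pairs computation with (generalizations of) fast matrix multiplication. The toy instance generator below is an illustrative construction, not taken from the paper.

```python
import random

def plant_instance(n, d, rho, i_star, j_star, seed=1):
    """Random ±1 vectors with one planted correlated pair: y_{j*} copies
    each coordinate of x_{i*} with probability (1 + rho) / 2."""
    rng = random.Random(seed)
    xs = [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(n)]
    ys = [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(n)]
    ys[j_star] = [x if rng.random() < (1 + rho) / 2 else -x
                  for x in xs[i_star]]
    return xs, ys

def naive_light_bulb(xs, ys):
    """O(n^2 d): return the pair with maximum inner product."""
    def ip(u, v):
        return sum(a * b for a, b in zip(u, v))
    return max(((i, j) for i in range(len(xs)) for j in range(len(ys))),
               key=lambda p: ip(xs[p[0]], ys[p[1]]))

xs, ys = plant_instance(n=20, d=400, rho=0.6, i_star=3, j_star=7)
found = naive_light_bulb(xs, ys)
```

With d large relative to log n, the planted pair's inner product (≈ ρd) towers over the ~√d fluctuations of random pairs, so the naive search recovers it reliably.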
Abstract
Finding robot poses and trajectories represents a foundational aspect of robot motion planning. Despite decades of research, efficiently and robustly addressing these challenges is still difficult. Existing approaches are often plagued by various limitations, such as intricate geometric approximations, violations of collision constraints, or slow first-order convergence. In this paper, we introduce two novel optimization formulations that offer provable robustness, achieving second-order convergence while requiring only a convex approximation of the robot's links and obstacles. Our first method, known as the Explicit Collision Barrier (ECB) method, employs a barrier function to guarantee separation between convex objects. ECB uses an efficient matrix factorization technique, enabling a second-order Newton's method with an iterative complexity linear in the number of separating planes. Our second method, referred to as the Implicit Collision Barrier (ICB) method, further transforms the separating planes into implicit functions of robot poses. We show such an implicit objective function is twice-differentiable, with derivatives evaluated at a linear complexity. To assess the effectiveness of our approaches, we conduct a comparative study with a first-order baseline algorithm across six testing scenarios. Our results demonstrate that our method exhibits significantly faster convergence rates than the baseline algorithm.
Efficient Algorithms for Monte Carlo Particle Transport on AI Accelerator Hardware
Authors: John Tramm, Bryce Allen, Kazutomo Yoshii, Andrew Siegel, Leighton Wilson
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
Abstract
The recent trend toward deep learning has led to the development of a variety of highly innovative AI accelerator architectures. One such architecture, the Cerebras Wafer-Scale Engine 2 (WSE-2), features 40 GB of on-chip SRAM, making it a potentially attractive platform for latency- or bandwidth-bound HPC simulation workloads. In this study, we examine the feasibility of performing continuous energy Monte Carlo (MC) particle transport on the WSE-2 by porting a key kernel from the MC transport algorithm to Cerebras's CSL programming model. New algorithms for minimizing communication costs and for handling load balancing are developed and tested. The WSE-2 is found to run \SPEEDUP~times faster than a highly optimized CUDA version of the kernel run on an NVIDIA A100 GPU -- significantly outpacing the expected performance increase given the difference in transistor counts between the architectures.
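As context for the kind of kernel being ported, here is a minimal (and greatly simplified) Monte Carlo transport loop for a purely absorbing 1D slab: sample exponential free-flight distances and tally transmission. The real continuous-energy cross-section lookup kernel the paper ports to CSL is far more involved; this is only a sketch of the method's structure.

```python
import math
import random

def slab_transmission(sigma_t, thickness, n_particles, seed=42):
    """Monte Carlo estimate of exp(-sigma_t * thickness) for a purely
    absorbing slab: a particle is transmitted if its sampled free-flight
    distance exceeds the slab thickness."""
    rng = random.Random(seed)
    transmitted = 0
    for _ in range(n_particles):
        # Free-flight distance ~ Exponential(sigma_t); use 1 - U to avoid
        # log(0) since random() is in [0, 1).
        d = -math.log(1.0 - rng.random()) / sigma_t
        if d > thickness:
            transmitted += 1
    return transmitted / n_particles

est = slab_transmission(sigma_t=1.0, thickness=2.0, n_particles=200_000)
exact = math.exp(-2.0)
```

The per-particle loop is embarrassingly parallel, which is why load balancing and communication, rather than the physics, dominate the porting effort on wafer-scale hardware.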
Abstract
A simple design recipe for deep Transformers is to compose identical building blocks. But standard transformer blocks are far from simple, interweaving attention and MLP sub-blocks with skip connections & normalisation layers in precise arrangements. This complexity leads to brittle architectures, where seemingly minor changes can significantly reduce training speed, or render models untrainable. In this work, we ask how far the standard transformer block can be simplified. Combining signal propagation theory and empirical observations, we motivate modifications that allow many block components to be removed with no loss of training speed, including skip connections, projection or value parameters, sequential sub-blocks and normalisation layers. In experiments on both autoregressive decoder-only and BERT encoder-only models, our simplified transformers emulate the per-update training speed and performance of standard transformers, while enjoying 15% faster training throughput, and using 15% fewer parameters.
Abstract
The vast majority of time-series forecasting approaches require a substantial training dataset. However, many real-life forecasting applications have very little initial observations, sometimes just 40 or fewer. Thus, the applicability of most forecasting methods is restricted in data-sparse commercial applications. While there is recent work in the setting of very limited initial data (so-called `zero-shot' forecasting), its performance is inconsistent depending on the data used for pretraining. In this work, we take a different approach and devise ForecastPFN, the first zero-shot forecasting model trained purely on a novel synthetic data distribution. ForecastPFN is a prior-data fitted network, trained to approximate Bayesian inference, which can make predictions on a new time series dataset in a single forward pass. Through extensive experiments, we show that zero-shot predictions made by ForecastPFN are more accurate and faster compared to state-of-the-art forecasting methods, even when the other methods are allowed to train on hundreds of additional in-distribution data points.
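The synthetic-pretraining idea can be sketched as a generator of series composed of trend, seasonality, and noise. The functional forms and parameter ranges below are illustrative guesses, not ForecastPFN's actual prior.

```python
import math
import random

def synthetic_series(length, seed):
    """Draw one series from a toy prior: linear trend + sinusoidal
    seasonality + Gaussian noise. Parameter ranges are illustrative."""
    rng = random.Random(seed)
    slope = rng.uniform(-0.1, 0.1)
    level = rng.uniform(0.0, 10.0)
    amp = rng.uniform(0.0, 2.0)
    period = rng.choice([7, 12, 24])
    noise = rng.uniform(0.01, 0.5)
    return [level + slope * t
            + amp * math.sin(2 * math.pi * t / period)
            + rng.gauss(0.0, noise)
            for t in range(length)]

# A pretraining corpus is just many such draws; a prior-data fitted
# network is then trained to predict the tail of each series from its
# head, so at test time a single forward pass yields a forecast.
corpus = [synthetic_series(60, seed=s) for s in range(100)]
```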
Streaming Algorithms for Weighted $k$-Disjoint Matchings
Authors: S M Ferdous, Bhargav Samineni, Alex Pothen, Mahantesh Halappanavar, Bala Krishnamoorthy
Abstract
We design and implement two single-pass semi-streaming algorithms for the maximum weight $k$-disjoint matching ($k$-DM) problem. Given an integer $k$, the $k$-DM problem is to find $k$ pairwise edge-disjoint matchings such that the sum of the weights of the matchings is maximized. For $k \geq 2$, this problem is NP-hard. Our first algorithm is based on the primal-dual framework of a linear programming relaxation of the problem and is $\frac{1}{3+\varepsilon}$-approximate. We also develop an approximation-preserving reduction from $k$-DM to the maximum weight $b$-matching problem. Leveraging this reduction and an existing semi-streaming $b$-matching algorithm, we design a $\frac{k}{(2+\varepsilon)(k+1)}$-approximate semi-streaming algorithm for $k$-DM. For any constant $\varepsilon > 0$, both of these algorithms require $O(nk \log_{1+\varepsilon}^2 n)$ bits of space. To the best of our knowledge, this is the first study of semi-streaming algorithms for the $k$-DM problem. We compare our two algorithms to state-of-the-art offline algorithms on 82 real-world and synthetic test problems. On the smaller instances, our streaming algorithms used significantly less memory (ranging from 6$\times$ to 114$\times$ less) and were faster in runtime than the offline algorithms. Our solutions were often within 5\% of the best weights from the offline algorithms. On a collection of six large graphs with a memory limit of 1 TB and with $k=8$, the offline algorithms terminated only on one graph (mycielskian20). The best offline algorithm on this instance required 640 GB of memory and 20 minutes to complete. In contrast, our slowest streaming algorithm for this instance took under four minutes and produced a matching that was 18\% better in weight, using only 1.4 GB of memory.
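The following is not the paper's semi-streaming algorithms, but a simple offline baseline that makes the k-DM objective concrete: peel off k edge-disjoint matchings by running a greedy ½-approximate maximum-weight matching k times, removing used edges between rounds.

```python
def greedy_matching(edges):
    """Greedy 1/2-approx max-weight matching: scan edges by decreasing
    weight, take an edge if both endpoints are still free.
    Each edge is a (weight, u, v) tuple."""
    matched, used = [], set()
    for w, u, v in sorted(edges, reverse=True):
        if u not in used and v not in used:
            matched.append((w, u, v))
            used.update((u, v))
    return matched

def k_disjoint_matchings(edges, k):
    """Baseline for k-DM: k edge-disjoint greedy matchings."""
    remaining, rounds = list(edges), []
    for _ in range(k):
        m = greedy_matching(remaining)
        rounds.append(m)
        taken = set(m)
        remaining = [e for e in remaining if e not in taken]
    return rounds

edges = [(5, "a", "b"), (4, "b", "c"), (3, "c", "d"), (2, "a", "d")]
rounds = k_disjoint_matchings(edges, k=2)
total = sum(w for m in rounds for (w, _, _) in m)
```

On this 4-cycle the first round takes edges of weight 5 and 3, the second the remaining 4 and 2, for a total weight of 14.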
Keyword: mobile
Dispersion, Capacitated Nodes, and the Power of a Trusted Shepherd
Authors: William K. Moses Jr., Amanda Redlich
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
In this paper, we look at and expand the problems of dispersion and Byzantine dispersion of mobile robots on a graph, introduced by Augustine and Moses~Jr.~[ICDCN~2018] and by Molla, Mondal, and Moses~Jr.~[ALGOSENSORS~2020], respectively, to graphs where nodes have variable capacities. We use the idea of a single shepherd, a more powerful robot that will never act in a Byzantine manner, to achieve fast Byzantine dispersion, even when other robots may be strong Byzantine in nature. We also show the benefit of a shepherd for dispersion on capacitated graphs when no Byzantine robots are present.
Abstract
Federated optimization studies the problem of collaborative function optimization among multiple clients (e.g. mobile devices or organizations) under the coordination of a central server. Since the data is collected separately by each client and always remains decentralized, federated optimization preserves data privacy and allows for large-scale computing, which makes it a promising decentralized machine learning paradigm. Though it is often deployed for tasks that are online in nature, e.g., next-word prediction on keyboard apps, most works formulate it as an offline problem. The few exceptions that consider federated bandit optimization are limited to very simplistic function classes, e.g., linear, generalized linear, or non-parametric function class with bounded RKHS norm, which severely hinders its practical usage. In this paper, we propose a new algorithm, named Fed-GO-UCB, for federated bandit optimization with generic non-linear objective function. Under some mild conditions, we rigorously prove that Fed-GO-UCB is able to achieve sub-linear rate for both cumulative regret and communication cost. At the heart of our theoretical analysis are distributed regression oracle and individual confidence set construction, which can be of independent interests. Empirical evaluations also demonstrate the effectiveness of the proposed algorithm.
A Neural Radiance Field-Based Architecture for Intelligent Multilayered View Synthesis
Authors: D. Dhinakaran, S. M. Udhaya Sankar, G. Elumalai, N. Jagadish kumar
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)
Abstract
A mobile ad hoc network (MANET) is made up of a number of wireless portable nodes that spontaneously come together to establish a temporary network with no need for any central management. It consists of a sizable and reasonably dense community of mobile nodes that travel across any terrain and rely solely on wireless interfaces for communication, without any pre-existing centralized management. Furthermore, routing should offer a method for instantly delivering data across the network between any two nodes. The major issue, however, is finding the best packet route across the infrastructure. The proposed protocol's main goal is to identify the least-expensive route with nominal capacity that assures the delivery of realistic traffic and remains durable in the event of any node failure. This study proposes an Optimized Route Selection strategy via Red Imported Fire Ants (RIFA) as a way to improve on-demand source routing systems. Predicted route failure and energy utilization are used to select the path during the routing phase. We assess the results of comparisons based on performance parameters such as energy usage, packet delivery rate (PDR), and end-to-end (E2E) delay. The outcome demonstrates that the proposed strategy is preferable, increasing network lifetime while lowering node energy consumption and average E2E delay under the majority of network performance measures and factors.
Leveraging Mobile Learning Platforms for Flexible Education Delivery: Bridging Educational Gaps in Afghanistan
Authors: Mursal Dawodi, Jawid Ahmad Baktash, Sayed Mohammad Reza Dawodi
Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Abstract
The educational landscape of Afghanistan, besieged by infrastructural inadequacies and socio-political tribulations, presents a compelling case for the integration of mobile learning platforms. This article embarks on an exploratory voyage into the realms of mobile learning as a potential harbinger of educational transformation in Afghanistan. It delineates the pervasive educational challenges, underscores the technological innovations powering mobile learning platforms, and illuminates the pathways through which mobile learning can transcend the extant barriers to education. Enriched by real-world case studies, the narrative unravels the pragmatic lessons that can be harnessed to tailor mobile learning solutions to Afghanistan's unique context. The discussion further traverses the collaborative horizon, elucidating the synergistic interplay among academia, government, the private sector, and international bodies essential for the successful implementation of mobile learning platforms. The article also furnishes pragmatic recommendations, emphasizing the triad of policy formulation, infrastructure enhancement, and capacity building as cornerstone imperatives. The envisioned integration of mobile learning platforms augurs a paradigmatic shift towards a more accessible, inclusive, and resilient educational framework in Afghanistan, with far-reaching implications for socio-economic development. Through a meticulous amalgamation of technology, policy, and collaborative endeavors, this article posits that Afghanistan stands on the cusp of an educational renaissance, with mobile learning platforms serving as a pivotal conduit toward this envisioned horizon.
Control Design for Trajectory Tracking and Stabilization of Sensor LOS in an Inertially Stabilized Platform
Abstract
Optical sensors are often mounted on moving platforms to aid in a variety of tasks like data collection, surveillance and navigation. This necessitates the precise control of the inertial orientation of the optical sensor line-of-sight (LOS) towards a desired stationary or mobile target. A two-axis gimbal assembly is considered to achieve this control objective, which can be broken into two parts - stabilization and tracking. The dynamics of a two-axis gimbal system is considered under a few design assumptions. Based on this dynamics, a novel state space model is proposed. Using a suitable change of variables, this state space model can be transformed into an LTI system. Feedback linearization based control laws are proposed that achieve the desired objectives of stabilization and tracking. The effectiveness of these control laws is demonstrated via simulation in MATLAB based on typical data of a two-axis gimbal system.
Keyword: pruning
CoPriv: Network/Protocol Co-Optimization for Communication-Efficient Private Inference
Abstract
Deep neural network (DNN) inference based on secure 2-party computation (2PC) can offer cryptographically-secure privacy protection but suffers from orders of magnitude latency overhead due to enormous communication. Previous works heavily rely on a proxy metric of ReLU counts to approximate the communication overhead and focus on reducing the ReLUs to improve the communication efficiency. However, we observe these works achieve limited communication reduction for state-of-the-art (SOTA) 2PC protocols due to the ignorance of other linear and non-linear operations, which now contribute to the majority of communication. In this work, we present CoPriv, a framework that jointly optimizes the 2PC inference protocol and the DNN architecture. CoPriv features a new 2PC protocol for convolution based on Winograd transformation and develops DNN-aware optimization to significantly reduce the inference communication. CoPriv further develops a 2PC-aware network optimization algorithm that is compatible with the proposed protocol and simultaneously reduces the communication for all the linear and non-linear operations. We compare CoPriv with the SOTA 2PC protocol, CrypTFlow2, and demonstrate 2.1x communication reduction for both ResNet-18 and ResNet-32 on CIFAR-100. We also compare CoPriv with SOTA network optimization methods, including SNL, MetaPruning, etc. CoPriv achieves 9.98x and 3.88x online and total communication reduction with higher accuracy compared to SNL, respectively. CoPriv also achieves 3.87x online communication reduction with more than 3% higher accuracy compared to MetaPruning.
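The Winograd transformation the protocol builds on trades multiplications for additions. The classic F(2, 3) instance below computes two outputs of a 3-tap 1D convolution with 4 multiplications instead of 6, written out scalar-wise from the standard Lavin-Gray transform matrices; CoPriv's actual 2PC protocol wraps such transforms in secret sharing, which this sketch does not attempt.

```python
def winograd_f23(d, g):
    """F(2,3): two outputs of convolving 4 inputs with a 3-tap filter
    using 4 multiplications (Winograd minimal filtering)."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Input transform B^T d and filter transform G g, fused into the
    # four elementwise products:
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    # Output transform A^T m:
    return [m1 + m2 + m3, m2 - m3 - m4]

def direct(d, g):
    """Reference: direct sliding-window convolution (6 multiplications)."""
    return [sum(d[i + j] * g[j] for j in range(3)) for i in range(2)]

d, g = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 0.25]
```

In 2PC, fewer multiplications means fewer expensive interactive operations, which is why the transform reduces communication.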
Keyword: diffusion
Exploring the Hyperparameter Space of Image Diffusion Models for Echocardiogram Generation
Authors: Hadrien Reynaud, Bernhard Kainz
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This work presents an extensive hyperparameter search on Image Diffusion Models for Echocardiogram generation. The objective is to establish foundational benchmarks and provide guidelines within the realm of ultrasound image and video generation. This study builds over the latest advancements, including cutting-edge model architectures and training methodologies. We also examine the distribution shift between real and generated samples and consider potential solutions, crucial to train efficient models on generated data. We determine an Optimal FID score of $0.88$ for our research problem and achieve an FID of $2.60$. This work is aimed at contributing valuable insights and serving as a reference for further developments in the specialized field of ultrasound image and video generation.
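For reference, the FID scores quoted above measure the Fréchet distance between Gaussian fits to real and generated feature distributions. Under a simplifying diagonal-covariance assumption the distance decomposes per dimension, as in this sketch; full FID uses the complete covariance matrices of Inception-style embeddings, and the toy feature vectors here are made up.

```python
import math

def diagonal_fid(feats_a, feats_b):
    """Fréchet distance between diagonal-Gaussian fits of two feature
    sets: sum over dims of (mu_a - mu_b)^2 + var_a + var_b
    - 2 * sqrt(var_a * var_b). A simplification of full FID."""
    def stats(feats):
        n, d = len(feats), len(feats[0])
        mu = [sum(f[i] for f in feats) / n for i in range(d)]
        var = [sum((f[i] - mu[i]) ** 2 for f in feats) / n
               for i in range(d)]
        return mu, var

    mu_a, var_a = stats(feats_a)
    mu_b, var_b = stats(feats_b)
    return sum((ma - mb) ** 2 + va + vb - 2 * math.sqrt(va * vb)
               for ma, mb, va, vb in zip(mu_a, mu_b, var_a, var_b))

real = [[0.0, 1.0], [2.0, 3.0]]
fake = [[0.0, 1.0], [2.0, 3.0]]
# Identical feature distributions score 0; the score grows as the
# generated features drift from the real ones.
```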
Improving Fairness using Vision-Language Driven Image Augmentation
Authors: Moreno D'Incà, Christos Tzelepis, Ioannis Patras, Nicu Sebe
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Abstract
Fairness is crucial when training a deep-learning discriminative model, especially in the facial domain. Models tend to correlate specific characteristics (such as age and skin color) with unrelated attributes (downstream tasks), resulting in biases which do not correspond to reality. It is common knowledge that these correlations are present in the data and are then transferred to the models during training. This paper proposes a method to mitigate these correlations to improve fairness. To do so, we learn interpretable and meaningful paths lying in the semantic space of a pre-trained diffusion model (DiffAE) -- such paths being supervised by contrastive text dipoles. That is, we learn to edit protected characteristics (age and skin color). These paths are then applied to augment images to improve the fairness of a given dataset. We test the proposed method on CelebA-HQ and UTKFace on several downstream tasks with age and skin color as protected characteristics. As a proxy for fairness, we compute the difference in accuracy with respect to the protected characteristics. Quantitative results show how the augmented images help the model improve the overall accuracy, the aforementioned metric, and the disparity of equal opportunity. Code is available at: https://github.com/Moreno98/Vision-Language-Bias-Control.
CDGraph: Dual Conditional Social Graph Synthesizing via Diffusion Model
Authors: Jui-Yi Tsai, Ya-Wen Teng, Ho Chiok Yew, De-Nian Yang, Lydia Y. Chen
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG)
Abstract
Social graphs synthesized by generative models are increasingly in demand due to data scarcity and concerns over user privacy. One of the key performance criteria for generating social networks is fidelity to specified conditions, such as users with certain membership and financial status. While recent diffusion models have shown remarkable performance in generating images, their effectiveness in synthesizing graphs has not yet been explored in the context of conditional social graphs. In this paper, we propose the first conditional diffusion model for social networks, CDGraph, which trains and synthesizes graphs based on two specified conditions. We propose a co-evolution dependency in the denoising process of CDGraph to capture the mutual dependencies between the dual conditions and further incorporate social homophily and social contagion to preserve the connectivity between nodes while satisfying the specified conditions. Moreover, we introduce a novel classifier loss, which guides the training of the diffusion process through the mutual dependency of the dual conditions. We evaluate CDGraph against four existing graph generative methods, i.e., SPECTRE, GSM, EDGE, and DiGress, on four datasets. Our results show that the graphs generated by CDGraph achieve much higher dual-conditional validity and lower discrepancy in various social network metrics than the baselines, demonstrating its proficiency in generating dual-conditional social graphs.
PDF: Point Diffusion Implicit Function for Large-scale Scene Neural Representation
Authors: Yuhan Ding, Fukun Yin, Jiayuan Fan, Hui Li, Xin Chen, Wen Liu, Chongshan Lu, Gang YU, Tao Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recent advances in implicit neural representations have achieved impressive results by sampling and fusing individual points along sampling rays in the sampling space. However, due to the explosively growing sampling space, finely representing and synthesizing detailed textures remains a challenge for unbounded large-scale outdoor scenes. To alleviate the dilemma of using individual points to perceive the entire colossal space, we explore learning the surface distribution of the scene to provide structural priors and reduce the samplable space, and propose a Point Diffusion implicit Function, PDF, for large-scale scene neural representation. The core of our method is a large-scale point cloud super-resolution diffusion module that enhances the sparse point cloud reconstructed from several training images into a dense point cloud as an explicit prior. In the rendering stage, only sampling points that lie within the sampling radius of a prior point are retained; that is, the sampling space is reduced from the unbounded space to the scene surface. Meanwhile, to fill in the background of the scene that cannot be provided by point clouds, region sampling based on Mip-NeRF 360 is employed to model the background representation. Extensive experiments have demonstrated the effectiveness of our method for large-scale scene novel view synthesis, where it outperforms relevant state-of-the-art baselines.
On the Generalization Properties of Diffusion Models
Abstract
Diffusion models are a class of generative models that serve to establish a stochastic transport map between an empirically observed, yet unknown, target distribution and a known prior. Despite their remarkable success in real-world applications, a theoretical understanding of their generalization capabilities remains underdeveloped. This work embarks on a comprehensive theoretical exploration of the generalization attributes of diffusion models. We establish theoretical estimates of the generalization gap that evolves in tandem with the training dynamics of score-based diffusion models, suggesting a polynomially small generalization error ($O(n^{-2/5}+m^{-4/5})$) on both the sample size $n$ and the model capacity $m$, evading the curse of dimensionality (i.e., not exponentially large in the data dimension) when early-stopped. Furthermore, we extend our quantitative analysis to a data-dependent scenario, wherein target distributions are portrayed as a succession of densities with progressively increasing distances between modes. This precisely elucidates the adverse effect of "modes shift" in ground truths on the model generalization. Moreover, these estimates are not solely theoretical constructs but have also been confirmed through numerical simulations. Our findings contribute to the rigorous understanding of diffusion models' generalization properties and provide insights that may guide practical applications.
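The stated rate can be sanity-checked numerically. A toy evaluation of the bound's shape follows; the constant `c` is an arbitrary placeholder, not a quantity from the paper.

```python
def gen_gap_bound(n, m, c=1.0):
    """Polynomially small generalization bound c * (n**(-2/5) + m**(-4/5)),
    following the stated rate in sample size n and model capacity m."""
    return c * (n ** (-2 / 5) + m ** (-4 / 5))

# Quadrupling the sample size shrinks the sample term by 4**(2/5) ~ 1.74x.
# The decay is polynomial in n and m, with no exponential blow-up in the
# data dimension (the "evading the curse of dimensionality" claim).
print(gen_gap_bound(1_000, 1_000))
print(gen_gap_bound(4_000, 1_000))
```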
DiffDub: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder
Authors: Tao Liu, Chenpeng Du, Shuai Fan, Feilong Chen, Kai Yu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Generating high-quality, person-generic visual dubbing remains a challenge. Recent innovations have introduced a two-stage paradigm that decouples rendering from the lip synchronization process, using an intermediate representation as a conduit. Still, previous methodologies rely on rough landmarks or are confined to a single speaker, thus limiting their performance. In this paper, we propose DiffDub: Diffusion-based dubbing. We first craft a Diffusion auto-encoder with an inpainting renderer incorporating a mask to delineate editable zones and unaltered regions. This allows for seamless filling of the lower-face region while preserving the remaining parts. Throughout our experiments, we encountered several challenges. Primarily, the semantic encoder lacks robustness, constricting its ability to capture high-level features. Besides, the modeling ignored facial positioning, causing mouth or nose jitters across frames. To tackle these issues, we employ versatile strategies, including data augmentation and supplementary eye guidance. Moreover, we encapsulated a conformer-based reference encoder and motion generator fortified by a cross-attention mechanism. This enables our model to learn person-specific textures from varying references and reduces reliance on paired audio-visual data. Our rigorous experiments comprehensively highlight that our ground-breaking approach outpaces existing methods by considerable margins and delivers seamless, intelligible videos in person-generic and multilingual scenarios.
Keyword: adaptive
Sequential Subset Matching for Dataset Distillation
Authors: Jiawei Du, Qin Shi, Joey Tianyi Zhou
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Dataset distillation is a newly emerging task that synthesizes a small dataset for training deep neural networks (DNNs), reducing data storage and model training costs. The synthetic datasets are expected to capture the essence of the knowledge contained in real-world datasets such that the former yields a performance similar to the latter. Recent advancements in distillation methods have produced notable improvements in generating synthetic datasets. However, current state-of-the-art methods treat the entire synthetic dataset as a unified entity and optimize each synthetic instance equally. This static optimization approach may lead to performance degradation in dataset distillation. Specifically, we argue that static optimization can give rise to a coupling issue within the synthetic data, particularly when a larger amount of synthetic data is being optimized. This coupling issue, in turn, leads to the failure of the distilled dataset to extract the high-level features learned by the deep neural network (DNN) in the later epochs. In this study, we propose a new dataset distillation strategy called Sequential Subset Matching (SeqMatch), which tackles this problem by adaptively optimizing the synthetic data to encourage sequential acquisition of knowledge during dataset distillation. Our analysis indicates that SeqMatch effectively addresses the coupling issue by sequentially generating the synthetic instances, thereby enhancing its performance significantly. Our proposed SeqMatch outperforms state-of-the-art methods on various datasets, including SVHN, CIFAR-10, CIFAR-100, and Tiny ImageNet. Our code is available at https://github.com/shqii1j/seqmatch.
"Close...but not as good as an educator." -- Using ChatGPT to provide formative feedback in large-class collaborative learning
Authors: Cory Dal Ponte, Sathana Dushyanthen, Kayley Lyons
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract
Delivering personalised, formative feedback to multiple problem-based learning groups in a short time period can be almost impossible. We employed ChatGPT to provide personalised formative feedback in a one-hour Zoom break-out room activity that taught practicing health professionals how to formulate evaluation plans for digital health initiatives. Learners completed an evaluation survey that included Likert scales and open-ended questions that were analysed. Half of the 44 survey respondents had never used ChatGPT before. Overall, respondents found the feedback favourable, described a wide range of group dynamics, and had adaptive responses to the feedback, yet only three groups used the feedback loop to improve their evaluation plans. Future educators can learn from our experience including engineering prompts, providing instructions on how to use ChatGPT, and scaffolding optimal group interactions with ChatGPT. Future researchers should explore the influence of ChatGPT on group dynamics and derive design principles for the use of ChatGPT in collaborative learning.
Distributed Multi-Robot Multi-Target Tracking Using Heterogeneous Limited-Range Sensors
Authors: Jun Chen, Mohammed Abugurain, Philip Dames, Shinkyu Park
Abstract
This paper presents a cooperative multi-robot multi-target tracking framework aimed at enhancing the efficiency of the heterogeneous sensor network and, consequently, improving overall target tracking accuracy. The concept of normalized unused sensing capacity is introduced to quantify the information a sensor is currently gathering relative to its theoretical maximum. This measurement can be computed using entirely local information and is applicable to various sensor models, distinguishing it from previous literature on the subject. It is then utilized to develop a distributed coverage control strategy for a heterogeneous sensor network, adaptively balancing the workload based on each sensor's current unused capacity. The algorithm is validated through a series of ROS and MATLAB simulations, demonstrating superior results compared to standard approaches that do not account for heterogeneity or current usage rates.
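The abstract's "normalized unused sensing capacity" suggests a simple, locally computable quantity that can drive workload balancing. A hypothetical sketch follows; the field names and the proportional-weight rule are illustrative assumptions, not the paper's equations.

```python
def unused_capacity(current_info, max_info):
    """Fraction of a sensor's theoretical sensing capacity not in use, in [0, 1].
    Computable from local information only."""
    return max(0.0, (max_info - current_info) / max_info)

def balance_weights(sensors):
    """Assign each sensor a coverage weight proportional to its unused capacity,
    so lightly loaded sensors take on a larger share of the tracking workload."""
    caps = [unused_capacity(s["info"], s["max_info"]) for s in sensors]
    total = sum(caps) or 1.0
    return [c / total for c in caps]

sensors = [{"info": 2.0, "max_info": 10.0},   # lightly loaded -> bigger share
           {"info": 9.0, "max_info": 10.0}]   # nearly saturated -> smaller share
print(balance_weights(sensors))
```

Because `max_info` can differ per sensor model, the normalization is what lets a heterogeneous network compare sensors on a common scale.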
Low Overhead Beam Alignment for Mobile Millimeter Channel Based on Continuous-Time Prediction
Abstract
In millimeter-wave (mmWave) communications, directional transmission based on beamforming is important to compensate for high pathloss. To maintain the desired directional transmission gain, beam scanning, in which the transmitter sends the pilot signal over all available beam directions to find the optimal beam, is often considered. Alternatively, beam tracking using partial beams can save beam training overhead through algorithms such as statistical analysis models and the Kalman filter (KF). Unfortunately, existing beam tracking solutions are limited to a fixed beam variation pattern. In this work, we propose a beam alignment scheme called adaptive online beam alignment (AOBA), which aims to reduce training overhead and achieve accurate beam alignment for any movement profile. The proposed AOBA periodically performs beam tracking using a small but carefully selected set of candidate beams and switches to beam scanning using all available beams based on a given switching rule. During intervals without the pilot signal, the optimal beam at an arbitrary time instant is predicted with the aid of the recently proposed ordinary differential equation (ODE)-long short-term memory (LSTM) model. Extensive simulations are conducted to evaluate the performance of the proposed AOBA in comparison with several existing beam alignment schemes.
Mix-ME: Quality-Diversity for Multi-Agent Learning
Authors: Garðar Ingvarsson, Mikayel Samvelyan, Bryan Lim, Manon Flageat, Antoine Cully, Tim Rocktäschel
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA); Neural and Evolutionary Computing (cs.NE)
Abstract
In many real-world systems, such as adaptive robotics, achieving a single, optimised solution may be insufficient. Instead, a diverse set of high-performing solutions is often required to adapt to varying contexts and requirements. This is the realm of Quality-Diversity (QD), which aims to discover a collection of high-performing solutions, each with their own unique characteristics. QD methods have recently seen success in many domains, including robotics, where they have been used to discover damage-adaptive locomotion controllers. However, most existing work has focused on single-agent settings, despite many tasks of interest being multi-agent. To this end, we introduce Mix-ME, a novel multi-agent variant of the popular MAP-Elites algorithm that forms new solutions using a crossover-like operator by mixing together agents from different teams. We evaluate the proposed methods on a variety of partially observable continuous control tasks. Our evaluation shows that these multi-agent variants obtained by Mix-ME not only compete with single-agent baselines but also often outperform them in multi-agent settings under partial observability.
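The mixing operator can be pictured as team-wise crossover: each agent role in the child team is filled from one of two parent teams. A toy sketch, under the assumption that a team is simply a list of per-role agent parameters (the actual MAP-Elites archive bookkeeping is omitted):

```python
import random

def mix_teams(team_a, team_b, rng):
    """Crossover-like operator: for each agent role, pick the agent
    from one of the two parent teams."""
    assert len(team_a) == len(team_b), "teams must share the same roles"
    return [rng.choice([a, b]) for a, b in zip(team_a, team_b)]

rng = random.Random(0)
parents = (["a0", "a1", "a2"], ["b0", "b1", "b2"])  # two elite teams of 3 agents
child = mix_teams(*parents, rng=rng)
print(child)  # each role is filled by the corresponding agent of some parent
```

In the full algorithm the child team would then be evaluated and inserted into the MAP-Elites archive cell matching its behavioural descriptor.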
Adaptive Assistance with an Active and Soft Back-Support Exosuit to Unknown External Loads via Model-Based Estimates of Internal Lumbosacral Moments
Authors: Alejandro Moya-Esteban, Saivimal Sridar, Mohamed Irfan Mohamed Refai, Herman van der Kooij, Massimo Sartori
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
State-of-the-art controllers for back exoskeletons largely rely on body kinematics. This results in control strategies that cannot provide adaptive support under unknown external loads. We developed a neuromechanical model-based controller (NMBC) for a soft back exosuit, wherein assistive forces were proportional to the active component of lumbosacral joint moments, derived from real-time electromyography-driven models. The exosuit provided adaptive assistance forces with no a priori information on the external loading conditions. Across 10 participants, who stoop-lifted 5 and 15 kg boxes, our NMBC was compared to a non-adaptive virtual spring-based control (VSBC), in which exosuit forces were proportional to trunk inclination. Peak cable assistive forces were modulated across weight conditions for NMBC (5 kg: 2.13 N/kg; 15 kg: 2.82 N/kg) but not for VSBC (5 kg: 1.92 N/kg; 15 kg: 2.00 N/kg). The proposed NMBC strategy resulted in larger reductions of cumulative compression forces for the 5 kg (NMBC: 18.2%; VSBC: 10.7%) and 15 kg conditions (NMBC: 21.3%; VSBC: 10.2%). Our proposed methodology may facilitate the adoption of non-hindering wearable robotics in real-life scenarios.
SortNet: Learning To Rank By a Neural-Based Sorting Algorithm
Authors: Leonardo Rigutini, Tiziano Papini, Marco Maggini, Franco Scarselli
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
Abstract
The problem of relevance ranking consists of sorting a set of objects with respect to a given criterion. Since users may prefer different relevance criteria, ranking algorithms should be adaptable to user needs. Two main approaches exist in the literature for the task of learning to rank: 1) a score function, learned from examples, which evaluates the properties of each object, yielding an absolute relevance value that can be used to order the objects; or 2) a pairwise approach, where a "preference function" is learned from pairs of objects to define which one has to be ranked first. In this paper, we present SortNet, an adaptive ranking algorithm that orders objects using a neural network as a comparator. The neural network training set provides examples of the desired ordering between pairs of items and is constructed by an iterative procedure which, at each iteration, adds the most informative training examples. Moreover, the comparator adopts a connectionist architecture that is particularly suited to implementing a preference function. We also prove that such an architecture has the universal approximation property and can implement a wide class of functions. Finally, the proposed algorithm is evaluated on the LETOR dataset, showing promising performance in comparison with other state-of-the-art algorithms.
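Once a preference function is learned, ranking reduces to sorting with it as a comparator. A minimal sketch, where a hand-crafted preference function on a toy `relevance` feature stands in for SortNet's neural comparator:

```python
from functools import cmp_to_key

def preference(x, y):
    """Stand-in for the learned comparator: negative if x should precede y,
    positive if y should precede x, zero if indifferent."""
    if x["relevance"] > y["relevance"]:
        return -1
    if x["relevance"] < y["relevance"]:
        return 1
    return 0

docs = [{"id": "d1", "relevance": 0.2},
        {"id": "d2", "relevance": 0.9},
        {"id": "d3", "relevance": 0.5}]
ranked = sorted(docs, key=cmp_to_key(preference))
print([d["id"] for d in ranked])  # ['d2', 'd3', 'd1']
```

With a neural comparator, `preference` would run a forward pass on the feature vectors of the pair; the sorting machinery around it is unchanged.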
Domain Randomization via Entropy Maximization
Authors: Gabriele Tiboni, Pascal Klink, Jan Peters, Tatiana Tommasi, Carlo D'Eramo, Georgia Chalvatzaki
Abstract
Varying dynamics parameters in simulation is a popular Domain Randomization (DR) approach for overcoming the reality gap in Reinforcement Learning (RL). Nevertheless, DR heavily hinges on the choice of the sampling distribution of the dynamics parameters, since high variability is crucial to regularize the agent's behavior but notoriously leads to overly conservative policies when randomizing excessively. In this paper, we propose a novel approach to address sim-to-real transfer, which automatically shapes dynamics distributions during training in simulation without requiring real-world data. We introduce DOmain RAndomization via Entropy MaximizatiON (DORAEMON), a constrained optimization problem that directly maximizes the entropy of the training distribution while retaining generalization capabilities. In achieving this, DORAEMON gradually increases the diversity of sampled dynamics parameters as long as the probability of success of the current policy is sufficiently high. We empirically validate the consistent benefits of DORAEMON in obtaining highly adaptive and generalizable policies, i.e. solving the task at hand across the widest range of dynamics parameters, as opposed to representative baselines from the DR literature. Notably, we also demonstrate the Sim2Real applicability of DORAEMON through its successful zero-shot transfer in a robotic manipulation setup under unknown real-world parameters.
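The constrained widening can be caricatured with a uniform sampling distribution whose entropy grows with its width, expanded only while the policy's success probability stays above a threshold. This is a toy stand-in for DORAEMON's constrained optimization, not the paper's algorithm; the multiplicative `step` and the threshold `alpha` are illustrative choices.

```python
import math

def widen_while_successful(success_prob, width=0.1, alpha=0.9,
                           step=1.25, max_iters=50):
    """Grow the dynamics-sampling width (entropy of a uniform distribution
    is log(width)) as long as success_prob(wider) stays >= alpha."""
    for _ in range(max_iters):
        if success_prob(width * step) < alpha:
            break  # constraint would be violated: stop widening
        width *= step
    return width, math.log(width)

# Toy policy: succeeds as long as the dynamics are not randomized too wildly.
success = lambda w: 1.0 if w <= 2.0 else 0.5
final_width, entropy = widen_while_successful(success)
print(final_width, entropy)
```

The loop stops just below the width where success collapses, mirroring the idea of maximizing training-distribution entropy subject to a success-probability constraint.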
Does Difficulty even Matter? Investigating Difficulty Adjustment and Practice Behavior in an Open-Ended Learning Task
Authors: Anan Schütt, Tobias Huber, Jauwairia Nasir, Cristina Conati, Elisabeth André
Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY)
Abstract
Difficulty adjustment in practice exercises has been shown to be beneficial for learning. However, previous research has mostly investigated closed-ended tasks, which do not offer students multiple ways to reach a valid solution. In contrast, to learn in an open-ended learning task, students need to effectively explore the solution space, as there are multiple ways to reach a solution. For this reason, the effects of difficulty adjustment could be different for open-ended tasks. To investigate this, as our first contribution, we compare different methods of difficulty adjustment in a user study conducted with 86 participants. Furthermore, as the practice behavior of the students is expected to influence how well they learn, we additionally look at their practice behavior as a post-hoc analysis. Therefore, as a second contribution, we identify different types of practice behavior and how they link to students' learning outcomes and subjective evaluation measures, as well as explore the influence the difficulty adjustment methods have on the practice behaviors. Our results suggest the usefulness of taking into account the practice behavior, in addition to only the practice performance, to inform adaptive intervention and difficulty adjustment methods.
Keyword: quantization
Divergent Token Metrics: Measuring degradation to prune away LLM components -- and optimize quantization
Authors: Björn Deiseroth, Max Meuer, Nikolas Gritsch, Constantin Eichenberg, Patrick Schramowski, Matthias Aßenmacher, Kristian Kersting
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
Large Language Models (LLMs) have reshaped natural language processing with their impressive capabilities. Their ever-increasing size, however, has raised concerns about their effective deployment and the need for LLM compression. This study introduces the Divergent Token Metrics (DTMs), a novel approach for assessing compressed LLMs, addressing the limitations of traditional measures like perplexity that fail to accurately reflect text generation quality. DTMs focus on token divergence, providing deeper insights into the subtleties of model compression. Our results indicate that significant levels of precision and sparsity can be achieved without compromising text generation quality. Moreover, DTMs offer a more precise evaluation of each component's impact individually. Utilizing the First Divergent Token Metric (FDTM) in model sparsification reveals that nearly 20% of all components can be pruned by over 90%. In terms of quantization, the FDTM suggests that over 80% of parameters can be straightforwardly transformed to int8 without special outlier management.
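The first-divergent-token idea can be pictured as the position where a compressed model's greedy decoding first departs from the original model's. A minimal sketch on raw token-id sequences; the actual DTMs aggregate such positions over many prompts and model variants.

```python
def first_divergent_token(base_tokens, compressed_tokens):
    """Index of the first position where the two generations disagree.
    Returns -1 if the sequences are identical; a length mismatch after a
    common prefix counts as divergence at the shorter length."""
    for i, (a, b) in enumerate(zip(base_tokens, compressed_tokens)):
        if a != b:
            return i
    if len(base_tokens) == len(compressed_tokens):
        return -1
    return min(len(base_tokens), len(compressed_tokens))

print(first_divergent_token([5, 17, 9, 3], [5, 17, 2, 3]))  # 2: third token differs
print(first_divergent_token([5, 17], [5, 17]))              # -1: identical
```

A larger first-divergence index means the compressed model tracks the original for longer, which is a more direct proxy for generation quality than an averaged loss like perplexity.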
AFPQ: Asymmetric Floating Point Quantization for LLMs
Abstract
Large language models (LLMs) show great performance in various tasks, but face deployment challenges from limited memory capacity and bandwidth. Low-bit weight quantization can save memory and accelerate inference. Although floating-point (FP) formats perform well in LLM quantization, they tend to perform poorly with small group sizes or at sub-4-bit precision. We find the reason is that the absence of asymmetry in previous FP quantization makes it unsuitable for handling the asymmetric value distributions of LLM weight tensors. In this work, we propose asymmetric FP quantization (AFPQ), which sets separate scales for positive and negative values. Our method leads to large accuracy improvements and can be easily plugged into other quantization methods, including GPTQ and AWQ, for better performance. Besides, no additional storage is needed compared with asymmetric integer (INT) quantization. The code is available at https://github.com/zhangsichengsjtu/AFPQ.
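The core idea, separate scales for the positive and negative halves of a weight group, can be sketched with round-to-nearest signed-integer grids standing in for the paper's FP grids (the actual FP4 format and the GPTQ/AWQ integration are out of scope here). On a skewed group, the per-sign scales fit the distribution visibly better than one symmetric scale.

```python
import numpy as np

def quantize(w, scale, qmax=7):
    """Round-to-nearest onto a signed grid {-qmax..qmax} * scale."""
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def asymmetric_quantize(w, qmax=7):
    """AFPQ-style idea: one scale fitted to the positive values,
    another fitted to the negative values."""
    pos_scale = max(w.max(), 1e-12) / qmax
    neg_scale = max(-w.min(), 1e-12) / qmax
    return np.where(w >= 0,
                    quantize(w, pos_scale, qmax),
                    quantize(w, neg_scale, qmax))

# A skewed weight group: small negatives, large positives.
w = np.array([-0.10, -0.05, 0.30, 0.60, 1.00])
sym_err = np.abs(w - quantize(w, np.abs(w).max() / 7)).sum()
asym_err = np.abs(w - asymmetric_quantize(w)).sum()
print(sym_err, asym_err)  # the asymmetric variant has lower reconstruction error
```

The symmetric scale is forced to cover the large positive outlier, wasting grid points on the negative side; the separate negative scale recovers that resolution at no extra storage beyond the second scale.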
Keyword: efficient
The numerical linear algebra of weights: from the spectral analysis to conditioning and preconditioning in the Laplacian case
Instruction Distillation Makes Large Language Models Efficient Zero-shot Rankers
Exploring the Hyperparameter Space of Image Diffusion Models for Echocardiogram Generation
Fast Many-to-Many Routing for Dynamic Taxi Sharing with Meeting Points
Vertical Decomposition in 3D and 4D with Applications to Line Nearest-Neighbor Searching in 3D
DRNet: A Decision-Making Method for Autonomous Lane Changing with Deep Reinforcement Learning
FLAP: Fast Language-Audio Pre-training
Robust Adversarial Reinforcement Learning via Bounded Rationality Curricula
Detecting Spurious Correlations via Robust Visual Concepts in Real and AI-Generated Image Classification
Comparing Routing Strategies in Opportunistic Quantum Networks
CraterGrader: Autonomous Robotic Terrain Manipulation for Lunar Site Preparation and Earthmoving
Physics-Informed Generator-Encoder Adversarial Networks with Latent Space Matching for Stochastic Differential Equations
Second-Order Convergent Collision-Constrained Optimization-Based Planner
Energy Efficiency Optimization for Subterranean LoRaWAN Using A Reinforcement Learning Approach: A Direct-to-Satellite Scenario
TinyFormer: Efficient Transformer Design and Deployment on Tiny Devices
Modeling the Uncertainty with Maximum Discrepant Students for Semi-supervised 2D Pose Estimation
Generating Unbiased Pseudo-labels via a Theoretically Guaranteed Chebyshev Constraint to Unify Semi-supervised Classification and Regression
TCM-GPT: Efficient Pre-training of Large Language Models for Domain Adaptation in Traditional Chinese Medicine
Near-Optimal Quantum Algorithms for Bounded Edit Distance and Lempel-Ziv Factorization
Large Language Models to the Rescue: Reducing the Complexity in Scientific Workflow Development Using ChatGPT
Universal Multi-modal Multi-domain Pre-trained Recommendation
Efficient Black-Box Adversarial Attacks on Neural Text Detectors
Enhancing search engine precision and user experience through sentiment-based polysemy resolution
GateLoop: Fully Data-Controlled Linear Recurrence for Sequence Modeling
Trust-Preserved Human-Robot Shared Autonomy enabled by Bayesian Relational Event Modeling
DeliverAI: Reinforcement Learning Based Distributed Path-Sharing Network for Food Deliveries
Fast Approximation Algorithms for Piercing Boxes by Points
LOTUS: Continual Imitation Learning for Robot Manipulation Through Unsupervised Skill Discovery
Active Learning-Based Species Range Estimation
Solving Woeginger's Hiking Problem: Wonderful Partitions in Anonymous Hedonic Games
Envy-Free Cake-Cutting for Four Agents
Keyword: faster
Fast Many-to-Many Routing for Dynamic Taxi Sharing with Meeting Points
Generalizations of Matrix Multiplication can solve the Light Bulb Problem
Second-Order Convergent Collision-Constrained Optimization-Based Planner
Efficient Algorithms for Monte Carlo Particle Transport on AI Accelerator Hardware
Simplifying Transformer Blocks
ForecastPFN: Synthetically-Trained Zero-Shot Forecasting
Streaming Algorithms for Weighted $k$-Disjoint Matchings
Keyword: mobile
Dispersion, Capacitated Nodes, and the Power of a Trusted Shepherd
Communication-Efficient Federated Non-Linear Bandit Optimization
A Neural Radiance Field-Based Architecture for Intelligent Multilayered View Synthesis
Leveraging Mobile Learning Platforms for Flexible Education Delivery: Bridging Educational Gaps in Afghanistan
Control Design for Trajectory Tracking and Stabilization of Sensor LOS in an Inertially Stabilized Platform
Keyword: pruning
CoPriv: Network/Protocol Co-Optimization for Communication-Efficient Private Inference
Keyword: diffusion
Exploring the Hyperparameter Space of Image Diffusion Models for Echocardiogram Generation
Improving Fairness using Vision-Language Driven Image Augmentation
CDGraph: Dual Conditional Social Graph Synthesizing via Diffusion Model
PDF: Point Diffusion Implicit Function for Large-scale Scene Neural Representation
On the Generalization Properties of Diffusion Models
DiffDub: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder
Keyword: adaptive
Sequential Subset Matching for Dataset Distillation
"Close...but not as good as an educator." -- Using ChatGPT to provide formative feedback in large-class collaborative learning
Distributed Multi-Robot Multi-Target Tracking Using Heterogeneous Limited-Range Sensors
Low Overhead Beam Alignment for Mobile Millimeter Channel Based on Continuous-Time Prediction
Mix-ME: Quality-Diversity for Multi-Agent Learning
Adaptive Assistance with an Active and Soft Back-Support Exosuit to Unknown External Loads via Model-Based Estimates of Internal Lumbosacral Moments
SortNet: Learning To Rank By a Neural-Based Sorting Algorithm
Domain Randomization via Entropy Maximization
Does Difficulty even Matter? Investigating Difficulty Adjustment and Practice Behavior in an Open-Ended Learning Task
Keyword: quantization
Divergent Token Metrics: Measuring degradation to prune away LLM components -- and optimize quantization
AFPQ: Asymmetric Floating Point Quantization for LLMs