New submissions for Fri, 6 Oct 23

Keyword: efficient

A quantum system control method based on enhanced reinforcement learning

Authors: Wenjie Liu, Bosi Wang, Jihao Fan, Yebo Ge, Mohammed Zidan
Subjects: Emerging Technologies (cs.ET); Artificial Intelligence (cs.AI); Quantum Physics (quant-ph)
Arxiv link: https://arxiv.org/abs/2310.03036
Pdf link: https://arxiv.org/pdf/2310.03036
Abstract Traditional quantum system control methods often face different constraints, and tend to cause both leakage and stochastic control errors under the condition of limited resources. Reinforcement learning has proved to be an efficient way to complete the quantum system control task. To learn a satisfactory control strategy under the condition of limited resources, a quantum system control method based on enhanced reinforcement learning (QSC-ERL) is proposed. The states and actions in reinforcement learning are mapped to quantum states and control operations in quantum systems. By using new enhanced neural networks, reinforcement learning can quickly achieve the maximization of long-term cumulative rewards, and a quantum state can be evolved accurately from an initial state to a target state. According to the number of candidate unitary operations, the three-switch control is used for simulation experiments. Compared with other methods, the QSC-ERL achieves learning control of quantum systems with fidelity close to 1, and takes fewer episodes to evolve the quantum state under the condition of limited resources.
A Deep Reinforcement Learning Approach for Interactive Search with Sentence-level Feedback
Authors: Jianghong Zhou, Joyce C. Ho, Chen Lin, Eugene Agichtein
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2310.03043
Pdf link: https://arxiv.org/pdf/2310.03043
Abstract Interactive search can provide a better experience by incorporating interaction feedback from the users. This can significantly improve search accuracy as it helps avoid irrelevant information and captures the users' search intents. Existing state-of-the-art (SOTA) systems use reinforcement learning (RL) models to incorporate the interactions but focus on item-level feedback, ignoring the fine-grained information found in sentence-level feedback. Yet such feedback requires extensive RL action space exploration and large amounts of annotated data. This work addresses these challenges by proposing a new deep Q-learning (DQ) approach, DQrank. DQrank adapts BERT-based models, the SOTA in natural language processing, to select crucial sentences based on users' engagement and rank the items to obtain more satisfactory responses. We also propose two mechanisms to better explore optimal actions. DQrank further utilizes the experience replay mechanism in DQ to store the feedback sentences to obtain a better initial ranking performance. We validate the effectiveness of DQrank on three search datasets. The results show that DQrank performs at least 12% better than the previous SOTA RL approaches. We also conduct detailed ablation studies. The ablation results demonstrate that each model component can efficiently extract and accumulate long-term engagement effects from the users' sentence-level feedback. This structure offers new technologies with promising performance to construct a search system with sentence-level interaction.
Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models
Authors: Ivan Tang, Eric Zhang, Ray Gu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.03059
Pdf link: https://arxiv.org/pdf/2310.03059
Abstract The popularity of pre-trained large models has revolutionized downstream tasks across diverse fields, such as language, vision, and multi-modality. To minimize the adaption cost for downstream tasks, many Parameter-Efficient Fine-Tuning (PEFT) techniques are proposed for language and 2D image pre-trained models. However, the specialized PEFT method for 3D pre-trained models is still under-explored. To this end, we introduce Point-PEFT, a novel framework for adapting point cloud pre-trained models with minimal learnable parameters. Specifically, for a pre-trained 3D model, we freeze most of its parameters, and only tune the newly added PEFT modules on downstream tasks, which consist of a Point-prior Prompt and a Geometry-aware Adapter. The Point-prior Prompt adopts a set of learnable prompt tokens, for which we propose to construct a memory bank with domain-specific knowledge, and utilize a parameter-free attention to enhance the prompt tokens. The Geometry-aware Adapter aims to aggregate point cloud features within spatial neighborhoods to capture fine-grained geometric information through local interactions. Extensive experiments indicate that our Point-PEFT can achieve better performance than the full fine-tuning on various downstream tasks, while using only 5% of the trainable parameters, demonstrating the efficiency and effectiveness of our approach. Code will be released at https://github.com/EvenJoker/Point-PEFT.
Batch-less stochastic gradient descent for compressive learning of deep regularization for image denoising
Authors: Hui Shi (IMB), Yann Traonmilin (IMB), J-F Aujol (IMB)
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2310.03085
Pdf link: https://arxiv.org/pdf/2310.03085
Abstract We consider the problem of denoising with the help of prior information taken from a database of clean signals or images. Denoising with variational methods is very efficient if a regularizer well adapted to the nature of the data is available. Thanks to the maximum a posteriori Bayesian framework, such regularizer can be systematically linked with the distribution of the data. With deep neural networks (DNN), complex distributions can be recovered from a large training database. To reduce the computational burden of this task, we adapt the compressive learning framework to the learning of regularizers parametrized by DNN. We propose two variants of stochastic gradient descent (SGD) for the recovery of deep regularization parameters from a heavily compressed database. These algorithms outperform the initially proposed method that was limited to low-dimensional signals, each iteration using information from the whole database. They also benefit from classical SGD convergence guarantees. Thanks to these improvements we show that this method can be applied for patch based image denoising.
Privacy-preserving Multi-biometric Indexing based on Frequent Binary Patterns
Authors: Daile Osorio-Roig, Lazaro J. Gonzalez-Soler, Christian Rathgeb, Christoph Busch
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.03091
Pdf link: https://arxiv.org/pdf/2310.03091
Abstract The development of large-scale identification systems that ensure the privacy protection of enrolled subjects represents a major challenge. Biometric deployments that provide interoperability and usability by including efficient multi-biometric solutions are a recent requirement. In the context of privacy protection, several template protection schemes have been proposed in the past. However, these schemes seem inadequate for indexing (workload reduction) in biometric identification systems. More specifically, they have been used in identification systems that perform exhaustive searches, leading to a degradation of computational efficiency. To overcome these limitations, we propose an efficient privacy-preserving multi-biometric identification system that retrieves protected deep cancelable templates and is agnostic with respect to biometric characteristics and biometric template protection schemes. To this end, a multi-biometric binning scheme is designed to exploit the low intra-class variation properties contained in the frequent binary patterns extracted from different types of biometric characteristics. Experimental results reported on publicly available databases using state-of-the-art Deep Neural Network (DNN)-based embedding extractors show that the protected multi-biometric identification system can reduce the computational workload to approximately 57% (indexing up to three types of biometric characteristics) and 53% (indexing up to two types of biometric characteristics), while simultaneously improving the biometric performance of the baseline biometric system at the high-security thresholds. The source code of the proposed multi-biometric indexing approach, together with the composed multi-biometric dataset, will be made available to the research community once the article is accepted.
NOCAP: Near-Optimal Correlation-Aware Partitioning Joins
Authors: Zichen Zhu, Xiao Hu, Manos Athanassoulis
Subjects: Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2310.03098
Pdf link: https://arxiv.org/pdf/2310.03098
Abstract Storage-based joins are still commonly used today because the memory budget does not always scale with the data size. One of the many join algorithms developed that has been widely deployed and proven to be efficient is the Hybrid Hash Join (HHJ), which is designed to exploit any available memory to maximize the data that is joined directly in memory. However, HHJ cannot fully exploit detailed knowledge of the join attribute correlation distribution. In this paper, we show that given a correlation skew in the join attributes, HHJ partitions data in a suboptimal way. To do that, we derive the optimal partitioning using a new cost-based analysis of partitioning-based joins that is tailored for primary key - foreign key (PK-FK) joins, one of the most common join types. This optimal partitioning strategy has a high memory cost, thus, we further derive an approximate algorithm that has tunable memory cost and leads to near-optimal results. Our algorithm, termed NOCAP (Near-Optimal Correlation-Aware Partitioning) join, outperforms the state-of-the-art for skewed correlations by up to 30%, and the textbook Grace Hash Join by up to 4x. Further, for a limited memory budget, NOCAP outperforms HHJ by up to 10%, even for uniform correlation. Overall, NOCAP dominates state-of-the-art algorithms and mimics the best algorithm for a memory budget varying from below $\sqrt{|\text{relation}|}$ to more than $|\text{relation}|$.
Reinforcement Learning-based Mixture of Vision Transformers for Video Violence Recognition
Authors: Hamid Mohammadi, Ehsan Nazerfard, Tahereh Firoozi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.03108
Pdf link: https://arxiv.org/pdf/2310.03108
Abstract Video violence recognition based on deep learning concerns accurate yet scalable human violence recognition. Currently, most state-of-the-art video violence recognition studies use CNN-based models to represent and categorize videos. However, recent studies suggest that pre-trained transformers are more accurate than CNN-based models on various video analysis benchmarks. Yet these models are not thoroughly evaluated for video violence recognition. This paper introduces a novel transformer-based Mixture of Experts (MoE) video violence recognition system. Through an intelligent combination of large vision transformers and efficient transformer architectures, the proposed system not only takes advantage of the vision transformer architecture but also reduces the cost of utilizing large vision transformers. The proposed architecture maximizes violence recognition system accuracy while actively reducing computational costs through a reinforcement learning-based router. The empirical results show the proposed MoE architecture's superiority over CNN-based models by achieving 92.4% accuracy on the RWF dataset.
Efficient Federated Prompt Tuning for Black-box Large Pre-trained Models
Authors: Zihao Lin, Yan Sun, Yifan Shi, Xueqian Wang, Lifu Huang, Li Shen, Dacheng Tao
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.03123
Pdf link: https://arxiv.org/pdf/2310.03123
Abstract With the rapid development of pre-trained models (PTMs), the efficient tuning of these models for diverse downstream applications has emerged as a pivotal research concern. Although recent investigations into prompt tuning have provided promising avenues, three salient challenges persist: (1) memory constraint: the continuous growth in the size of open-source PTMs renders fine-tuning, even a fraction of their parameters, challenging for many practitioners. (2) model privacy: existing PTMs often function as public API services, with their parameters inaccessible for effective or tailored fine-tuning. (3) data privacy: the fine-tuning of PTMs necessitates high-quality datasets, which are typically localized and not shared publicly. To optimally harness each local dataset while navigating memory constraints and preserving privacy, we propose Federated Black-Box Prompt Tuning (Fed-BBPT). This innovative approach eschews reliance on parameter architectures and private dataset access, instead capitalizing on a central server that aids local users in collaboratively training a prompt generator through regular aggregation. Local users leverage API-driven learning via a zero-order optimizer, obviating the need for PTM deployment. Relative to extensive fine-tuning, Fed-BBPT proficiently sidesteps memory challenges tied to PTM storage and fine-tuning on local machines, tapping into comprehensive, high-quality, yet private training datasets. A thorough evaluation across 40 datasets spanning CV and NLP tasks underscores the robustness of our proposed model.
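Digest note: the abstract above relies on a zero-order optimizer, which tunes parameters using only loss values returned by a black-box API. The sketch below illustrates the general idea with a generic two-point random-direction estimator on a toy loss. It is not Fed-BBPT's actual optimizer, which the abstract does not specify, and the function name and the parameters `lr`, `mu`, `steps`, and `seed` are hypothetical.

```python
import numpy as np

def zeroth_order_minimize(loss, x0, steps=500, lr=0.05, mu=1e-3, seed=0):
    """Minimize a black-box loss using only function evaluations.

    Assumption: a generic two-point random-direction estimator, used here
    purely for illustration. No gradients of `loss` are ever requested.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        # Random probing direction with the same shape as the parameters.
        u = rng.standard_normal(x.shape)
        # Finite-difference estimate of the directional derivative along u.
        slope = (loss(x + mu * u) - loss(x - mu * u)) / (2.0 * mu)
        # Step against the estimated gradient, slope * u.
        x -= lr * slope * u
    return x

# Toy stand-in for "query the API and read back a loss value".
target = np.array([1.0, -2.0, 0.5])
toy_loss = lambda p: float(np.sum((p - target) ** 2))
tuned = zeroth_order_minimize(toy_loss, np.zeros(3))
```

Each step costs two loss queries and needs no access to model parameters, which is why this family of methods suits models exposed only through an API.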
Application-Oriented Co-Design of Motors and Motions for a 6DOF Robot Manipulator
Authors: Adrian Stein, Yebin Wang, Yusuke Sakamoto, Bingnan Wang, Huazhen Fang
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.03132
Pdf link: https://arxiv.org/pdf/2310.03132
Abstract This work investigates an application-driven co-design problem where the motion and motors of a six degrees of freedom robotic manipulator are optimized simultaneously, and the application is characterized by a set of tasks. Unlike the state-of-the-art which selects motors from a product catalogue and performs co-design for a single task, this work designs the motor geometry as well as motion for a specific application. Contributions are made towards solving the proposed co-design problem in a computationally-efficient manner. First, a two-step process is proposed, where multiple motor designs are identified by optimizing motions and motors for multiple tasks one by one, and then are reconciled to determine the final motor design. Second, magnetic equivalent circuit modeling is exploited to establish the analytic mapping from motor design parameters to dynamic models and objective functions to facilitate the subsequent differentiable simulation. Third, a direct-collocation-based differentiable simulator of motor and robotic arm dynamics is developed to balance the computational complexity and numerical stability. Simulation verifies that higher performance for a specific application can be achieved with the multi-task method, compared to several benchmark co-design methods.
Design and Optimization of Heterogeneous Coded Distributed Computing with Nonuniform File Popularity
Authors: Yong Deng, Min Dong
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2310.03142
Pdf link: https://arxiv.org/pdf/2310.03142
Abstract This paper studies MapReduce-based heterogeneous coded distributed computing (CDC) where, besides different computing capabilities at workers, input files to be accessed by computing jobs have nonuniform popularity. We propose a file placement strategy that can handle an arbitrary number of input files. Furthermore, we design a nested coded shuffling strategy that can efficiently manage the nonuniformity of file popularity to maximize the coded multicasting opportunity. We then formulate the joint optimization of the proposed file placement and nested shuffling design variables to optimize the proposed CDC scheme. To reduce the high computational complexity in solving the resulting mixed-integer linear programming (MILP) problem, we propose a simple two-file-group-based file placement approach to obtain an approximate solution. Numerical results show that the optimized CDC scheme outperforms other alternatives. Also, the proposed two-file-group-based approach achieves nearly the same performance as the conventional branch-and-cut method in solving the MILP problem but with substantially lower computational complexity that is scalable over the number of files and workers. For computing jobs with aggregate target functions that commonly appear in machine learning applications, we propose a heterogeneous compressed CDC (C-CDC) scheme to further improve the shuffling efficiency. The C-CDC scheme uses a local data aggregation technique to compress the data to be shuffled for the shuffling load reduction. We again optimize the proposed C-CDC scheme and explore the two-file-group-based low-complexity approach for an approximate solution. Numerical results show the proposed C-CDC scheme provides a considerable shuffling load reduction over the CDC scheme, and also, the two-file-group-based file placement approach maintains good performance.
New Auction Algorithms for the Assignment Problem and Extensions
Authors: Dimitri Bertsekas
Subjects: Computer Science and Game Theory (cs.GT)
Arxiv link: https://arxiv.org/abs/2310.03159
Pdf link: https://arxiv.org/pdf/2310.03159
Abstract We consider the classical linear assignment problem, and we introduce new auction algorithms for its optimal and suboptimal solution. The algorithms are founded on duality theory, and are related to ideas of competitive bidding by persons for objects and the attendant market equilibrium, which underlie real-life auction processes. We distinguish between two fundamentally different types of bidding mechanisms: aggressive and cooperative. Mathematically, aggressive bidding relies on a notion of approximate coordinate descent in dual space, an epsilon-complementary slackness condition to regulate the amount of descent approximation, and the idea of epsilon-scaling to resolve efficiently the price wars that occur naturally as multiple bidders compete for a smaller number of valuable objects. Cooperative bidding avoids price wars through detection and cooperative resolution of any competitive impasse that involves a group of persons. We discuss the relations between the aggressive and the cooperative bidding approaches, we derive new algorithms and variations that combine ideas from both of them, and we also make connections with other primal-dual methods, including the Hungarian method. Furthermore, our discussion points the way to algorithmic extensions that apply more broadly to network optimization, including shortest path, max-flow, transportation, and minimum cost flow problems with both linear and convex cost functions.
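Digest note: for readers unfamiliar with the aggressive bidding idea mentioned above, here is a minimal sketch of the classical forward auction algorithm for a square assignment problem. It shows the textbook method only, not the new algorithms introduced in the paper, and it omits epsilon-scaling. The function name and the data layout (`benefit[i][j]` is the value of object j to person i) are assumptions made for this illustration.

```python
def auction_assignment(benefit, eps):
    """Classical forward auction for an n-by-n assignment problem.

    Assumption: textbook version with a fixed eps. For integer benefits
    and eps < 1/n, the resulting assignment is optimal.
    Returns (assigned, prices), where assigned[i] is person i's object.
    """
    n = len(benefit)
    prices = [0.0] * n
    owner = [None] * n       # owner[j]: person currently holding object j
    assigned = [None] * n    # assigned[i]: object currently held by person i
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()
        # Net value of every object to person i at the current prices.
        values = [benefit[i][j] - prices[j] for j in range(n)]
        best = max(range(n), key=values.__getitem__)
        best_val = values[best]
        second_val = max(
            (v for j, v in enumerate(values) if j != best), default=best_val
        )
        # Bid: raise the price by the value gap plus eps, which keeps
        # epsilon-complementary slackness and rules out endless price wars.
        prices[best] += best_val - second_val + eps
        previous = owner[best]
        if previous is not None:
            # The outbid person loses the object and must bid again.
            assigned[previous] = None
            unassigned.append(previous)
        owner[best] = i
        assigned[i] = best
    return assigned, prices

assignment, final_prices = auction_assignment([[10, 9], [10, 1]], eps=0.25)
```

In the two-person example both persons prefer object 0. The price of that object rises until person 0 switches to object 1, which is the price war that epsilon-scaling is meant to resolve more efficiently on large instances.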
Enhancing Accuracy in Deep Learning Using Random Matrix Theory
Authors: Leonid Berlyand, Etienne Sandier, Yitzchak Shmalo, Lei Zhang
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2310.03165
Pdf link: https://arxiv.org/pdf/2310.03165
Abstract In this study, we explore the applications of random matrix theory (RMT) in the training of deep neural networks (DNNs), focusing on layer pruning to simplify DNN architecture and loss landscape. RMT, recently used to address overfitting in deep learning, enables the examination of DNN's weight layer spectra. We use these techniques to optimally determine the number of singular values to be removed from the weight layers of a DNN during training via singular value decomposition (SVD). This process aids in DNN simplification and accuracy enhancement, as evidenced by training simple DNN models on the MNIST and Fashion MNIST datasets. Our method can be applied to any fully connected or convolutional layer of a pretrained DNN, decreasing the layer's parameters and simplifying the DNN architecture while preserving or even enhancing the model's accuracy. By discarding small singular values based on RMT criteria, the accuracy of the test set remains consistent, facilitating more efficient DNN training without compromising performance. We provide both theoretical and empirical evidence supporting our claim that the elimination of small singular values based on RMT does not negatively impact the DNN's accuracy. Our results offer valuable insights into the practical application of RMT for the creation of more efficient and accurate deep-learning models.
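Digest note: the abstract above describes discarding small singular values of weight layers based on RMT criteria. The sketch below shows one simple way such a rule could look. It keeps only singular values above the asymptotic largest singular value of a pure-noise matrix, sigma * (sqrt(n) + sqrt(m)). The paper's actual criterion may differ; the function name, the assumption that the noise scale `sigma` is known, and the `slack` safety factor are all hypothetical.

```python
import numpy as np

def rmt_truncate(weight, sigma, slack=1.1):
    """Drop singular values that are consistent with pure noise.

    Assumption: entries of the noise part are i.i.d. with standard
    deviation `sigma`, so its largest singular value is close to
    sigma * (sqrt(n) + sqrt(m)). `slack` is a hypothetical safety factor
    for finite-size fluctuations around that edge.
    Returns (low_rank_weight, number_of_kept_singular_values).
    """
    n, m = weight.shape
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    edge = slack * sigma * (np.sqrt(n) + np.sqrt(m))
    keep = s > edge
    # Rebuild the layer from the retained singular triplets only.
    low_rank = (u[:, keep] * s[keep]) @ vt[keep, :]
    return low_rank, int(keep.sum())

# Toy layer: a rank-2 signal buried in Gaussian noise.
rng = np.random.default_rng(0)
left, _ = np.linalg.qr(rng.standard_normal((200, 2)))
right, _ = np.linalg.qr(rng.standard_normal((100, 2)))
signal = left @ np.diag([100.0, 80.0]) @ right.T
noisy = signal + 0.1 * rng.standard_normal((200, 100))
pruned, rank = rmt_truncate(noisy, sigma=0.1)
```

Storing the retained factors instead of the dense matrix is what reduces the layer's parameter count. The dense product is returned here only to keep the example short.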
Raze to the Ground: Query-Efficient Adversarial HTML Attacks on Machine-Learning Phishing Webpage Detectors
Authors: Biagio Montaruli, Luca Demetrio, Maura Pintor, Luca Compagna, Davide Balzarotti, Battista Biggio
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.03166
Pdf link: https://arxiv.org/pdf/2310.03166
Abstract Machine-learning phishing webpage detectors (ML-PWD) have been shown to suffer from adversarial manipulations of the HTML code of the input webpage. Nevertheless, the attacks recently proposed have demonstrated limited effectiveness because they do not optimize the usage of the adopted manipulations, and they focus solely on specific elements of the HTML code. In this work, we overcome these limitations by first designing a novel set of fine-grained manipulations that allow modifying the HTML code of the input phishing webpage without compromising its maliciousness and visual appearance, i.e., the manipulations are functionality- and rendering-preserving by design. We then select which manipulations should be applied to bypass the target detector by a query-efficient black-box optimization algorithm. Our experiments show that our attacks are able to raze to the ground the performance of current state-of-the-art ML-PWD using just 30 queries, thus overcoming the weaker attacks developed in previous work, and enabling a much fairer robustness evaluation of ML-PWD.
Talking Models: Distill Pre-trained Knowledge to Downstream Models via Interactive Communication
Authors: Zhe Zhao, Qingyun Liu, Huan Gui, Bang An, Lichan Hong, Ed H. Chi
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.03188
Pdf link: https://arxiv.org/pdf/2310.03188
Abstract Many recent breakthroughs in machine learning have been enabled by the pre-trained foundation models. By scaling up model parameters, training data, and computation resources, foundation models have significantly advanced the state-of-the-art in many applications. However, it is still an open question of how to use these models to perform downstream tasks efficiently. Knowledge distillation (KD) has been explored to tackle this challenge. KD transfers knowledge from a large teacher model to a smaller student model. While KD has been successful in improving student model performance, recent research has discovered that a powerful teacher does not necessarily lead to a powerful student, due to their huge capacity gap. In addition, the potential distribution shifts between the pre-training data and downstream tasks can make knowledge transfer in KD sub-optimal for improving downstream task performance. In this paper, we extend KD with an interactive communication process to help students of downstream tasks learn effectively from pre-trained foundation models. Our design is inspired by the way humans learn from teachers who can explain knowledge in a way that meets the students' needs. Specifically, we let each model (i.e., student and teacher) train two components: (1) an encoder encoding the model's hidden states to a message and (2) a decoder decoding any messages to its own hidden states. With encoder and decoder, not only can the teacher transfer rich information by encoding its hidden states, but also the student can send messages with information of downstream tasks to the teacher. Therefore, knowledge passing from teacher to student can be tailored to the student's capacity and downstream tasks' distributions. We conducted experiments on benchmark datasets to show that our communication mechanism outperforms state-of-the-art distillation techniques.
PDR-CapsNet: an Energy-Efficient Parallel Approach to Dynamic Routing in Capsule Networks
Authors: Samaneh Javadinia, Amirali Baniasadi
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.03212
Pdf link: https://arxiv.org/pdf/2310.03212
Abstract Convolutional Neural Networks (CNNs) have produced state-of-the-art results for image classification tasks. However, they are limited in their ability to handle rotational and viewpoint variations due to information loss in max-pooling layers. Capsule Networks (CapsNets) employ a computationally-expensive iterative process referred to as dynamic routing to address these issues. CapsNets, however, often fall short on complex datasets and require more computational resources than CNNs. To overcome these challenges, we introduce the Parallel Dynamic Routing CapsNet (PDR-CapsNet), a deeper and more energy-efficient alternative to CapsNet that offers superior performance, less energy consumption, and lower overfitting rates. By leveraging a parallelization strategy, PDR-CapsNet mitigates the computational complexity of CapsNet and increases throughput, efficiently using hardware resources. As a result, on the CIFAR-10 dataset we achieve 83.55% accuracy while requiring 87.26% fewer parameters and 32.27% and 47.40% fewer MACs and FLOPs, respectively, with 3x faster inference and 7.29 J less energy consumption than CapsNet on a 2080Ti GPU with 11GB VRAM.
TacoGFN: Target Conditioned GFlowNet for Structure-Based Drug Design
Authors: Tony Shen, Mohit Pandey, Martin Ester
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.03223
Pdf link: https://arxiv.org/pdf/2310.03223
Abstract We seek to automate the generation of drug-like compounds conditioned to specific protein pocket targets. Most current methods approximate the protein-molecule distribution of a finite dataset and therefore struggle to generate molecules with significant binding improvement over the training dataset. We instead frame the pocket-conditioned molecular generation task as an RL problem and develop TacoGFN, a target conditional Generative Flow Network model. Our method is explicitly encouraged to generate molecules with desired properties as opposed to fitting on a pre-existing data distribution. To this end, we develop transformer-based docking score prediction to speed up docking score computation and propose TacoGFN to explore molecule space efficiently. Furthermore, we incorporate several rounds of active learning where generated samples are queried using a docking oracle to improve the docking score prediction. This approach allows us to accurately explore as much of the molecule landscape as we can afford computationally. Empirically, molecules generated using TacoGFN and its variants significantly outperform all baseline methods across every property (Docking score, QED, SA, Lipinski), while being orders of magnitude faster.
History Matching for Geological Carbon Storage using Data-Space Inversion with Spatio-Temporal Data Parameterization
Authors: Su Jiang, Louis J. Durlofsky
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.03228
Pdf link: https://arxiv.org/pdf/2310.03228
Abstract History matching based on monitoring data will enable uncertainty reduction, and thus improved aquifer management, in industrial-scale carbon storage operations. In traditional model-based data assimilation, geomodel parameters are modified to force agreement between flow simulation results and observations. In data-space inversion (DSI), history-matched quantities of interest, e.g., posterior pressure and saturation fields conditioned to observations, are inferred directly, without constructing posterior geomodels. This is accomplished efficiently using a set of O(1000) prior simulation results, data parameterization, and posterior sampling within a Bayesian setting. In this study, we develop and implement (in DSI) a deep-learning-based parameterization to represent spatio-temporal pressure and CO2 saturation fields at a set of time steps. The new parameterization uses an adversarial autoencoder (AAE) for dimension reduction and a convolutional long short-term memory (convLSTM) network to represent the spatial distribution and temporal evolution of the pressure and saturation fields. This parameterization is used with an ensemble smoother with multiple data assimilation (ESMDA) in the DSI framework to enable posterior predictions. A realistic 3D system characterized by prior geological realizations drawn from a range of geological scenarios is considered. A local grid refinement procedure is introduced to estimate the error covariance term that appears in the history matching formulation. Extensive history matching results are presented for various quantities, for multiple synthetic true models. Substantial uncertainty reduction in posterior pressure and saturation fields is achieved in all cases. The framework is applied to efficiently provide posterior predictions for a range of error covariance specifications. Such an assessment would be expensive using a model-based approach.
${\tt MORALS}$: Analysis of High-Dimensional Robot Controllers via Topological Tools in a Latent Space
Authors: Ewerton R. Vieira, Aravind Sivaramakrishnan, Sumanth Tangirala, Edgar Granados, Konstantin Mischaikow, Kostas E. Bekris
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.03246
Pdf link: https://arxiv.org/pdf/2310.03246
Abstract Estimating the region of attraction (${\tt RoA}$) for a robotic system's controller is essential for safe application and controller composition. Many existing methods require access to a closed-form expression, which limits applicability to data-driven controllers. Methods that operate only over trajectory rollouts tend to be data-hungry. In prior work, we have demonstrated that topological tools based on Morse Graphs offer data-efficient ${\tt RoA}$ estimation without needing an analytical model. They struggle, however, with high-dimensional systems as they operate over a discretization of the state space. This paper presents ${\it Mo}$rse Graph-aided discovery of ${\it R}$egions of ${\it A}$ttraction in a learned ${\it L}$atent ${\it S}$pace (${\tt MORALS}$). The approach combines autoencoding neural networks with Morse Graphs. ${\tt MORALS}$ shows promising predictive capabilities in estimating attractors and their ${\tt RoA}$s for data-driven controllers operating over high-dimensional systems, including a 67-dim humanoid robot and a 96-dim 3-fingered manipulator. It first projects the dynamics of the controlled system into a learned latent space. Then, it constructs a reduced form of Morse Graphs representing the bistability of the underlying dynamics, i.e., detecting when the controller results in a desired versus an undesired behavior. The evaluation on high-dimensional robotic datasets indicates the data efficiency of the approach in ${\tt RoA}$ estimation.
Xcrum: A Synergistic Approach Integrating Extreme Programming with Scrum
Authors: Siavash Hosseini
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2310.03248
Pdf link: https://arxiv.org/pdf/2310.03248
Abstract In today's modern world, software plays a pivotal role. Software development is a highly complex and time-consuming process, demanding multidimensional efforts. Companies continually adapt their requirements to align with the evolving environment, with a specific emphasis on rapid delivery and the acceptance of changing requirements. Traditional models, such as plan-driven development, often fall short in meeting these demands. In the realm of software development, Agile has been the focal point of global discourse for both researchers and developers. Agile development is better suited to customize and streamline the development process, offering a highly flexible, early, and rapid delivery lifecycle conducive to efficient software development. This article aims to provide an overview of two prominent Agile methodologies: Scrum and Extreme Programming (XP). It achieves this by reviewing relevant publications, analyzing their impact on software development, exploring the distinctive features of each methodology, and conducting a comparative assessment. Furthermore, the article offers personal insights and recommendations. Notably, the integration of XP practices into Scrum has given rise to a novel hybrid methodology known as "Xcrum," which retains its agility. It should be highlighted that, given this new approach's incorporation of the strengths of both methods, it holds the potential to outperform the original frameworks.
EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models
Authors: Yefei He, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.03270
Pdf link: https://arxiv.org/pdf/2310.03270
Abstract Diffusion models have demonstrated remarkable capabilities in image synthesis and related generative tasks. Nevertheless, their practicality for low-latency real-world applications is constrained by substantial computational costs and latency issues. Quantization is a dominant way to compress and accelerate diffusion models, where post-training quantization (PTQ) and quantization-aware training (QAT) are two main approaches, each bearing its own properties. While PTQ exhibits efficiency in terms of both time and data usage, it may lead to diminished performance in low bit-width. On the other hand, QAT can alleviate performance degradation but comes with substantial demands on computational and data resources. To capitalize on the advantages while avoiding their respective drawbacks, we introduce a data-free and parameter-efficient fine-tuning framework for low-bit diffusion models, dubbed EfficientDM, to achieve QAT-level performance with PTQ-like efficiency. Specifically, we propose a quantization-aware variant of the low-rank adapter (QALoRA) that can be merged with model weights and jointly quantized to low bit-width. The fine-tuning process distills the denoising capabilities of the full-precision model into its quantized counterpart, eliminating the requirement for training data. We also introduce scale-aware optimization and employ temporal learned step-size quantization to further enhance performance. Extensive experimental results demonstrate that our method significantly outperforms previous PTQ-based diffusion models while maintaining similar time and data efficiency. Specifically, there is only a marginal 0.05 sFID increase when quantizing both weights and activations of LDM-4 to 4-bit on ImageNet 256x256. Compared to QAT-based methods, our EfficientDM also boasts a 16.2x faster quantization speed with comparable generation quality.
Network Alignment with Transferable Graph Autoencoders
Authors: Jiashu He, Charilaos I. Kanatsoulis, Alejandro Ribeiro
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.03272
Pdf link: https://arxiv.org/pdf/2310.03272
Abstract Network alignment is the task of establishing one-to-one correspondences between the nodes of different graphs and finds a plethora of applications in high-impact domains. However, this task is known to be NP-hard in its general form, and existing algorithms do not scale up as the size of the graphs increases. To tackle both challenges, we propose a novel generalized graph autoencoder architecture, designed to extract powerful and robust node embeddings that are tailored to the alignment task. We prove that the generated embeddings are associated with the eigenvalues and eigenvectors of the graphs and can achieve more accurate alignment compared to classical spectral methods. Our proposed framework also leverages transfer learning and data augmentation to achieve efficient network alignment at a very large scale without retraining. Extensive experiments on both network and sub-network alignment with real-world graphs provide corroborating evidence supporting the effectiveness and scalability of the proposed approach.
LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers
Authors: Dacheng Li, Rulin Shao, Anze Xie, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica, Xuezhe Ma, Hao Zhang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2310.03294
Pdf link: https://arxiv.org/pdf/2310.03294
Abstract Increasing the context length of large language models (LLMs) unlocks fundamentally new capabilities, but also significantly increases the memory footprints of training. Previous model-parallel systems such as Megatron-LM partition and compute different attention heads in parallel, resulting in large communication volumes, so they cannot scale beyond the number of attention heads, thereby hindering their adoption. In this paper, we introduce a new approach, LightSeq, for long-context LLM training. LightSeq has many notable advantages. First, LightSeq partitions over the sequence dimension, hence is agnostic to model architectures and readily applicable for models with varying numbers of attention heads, such as Multi-Head, Multi-Query and Grouped-Query attention. Second, LightSeq not only requires up to 4.7x less communication than Megatron-LM on popular LLMs but also overlaps the communication with computation. To further reduce the training time, LightSeq features a novel gradient checkpointing scheme to bypass a forward computation for memory-efficient attention. We evaluate LightSeq on Llama-7B and its variants with sequence lengths from 32K to 512K. Through comprehensive experiments on single and cross-node training, we show that LightSeq achieves up to 1.24-2.01x end-to-end speedup, and a 2-8x longer sequence length on models with fewer heads, compared to Megatron-LM. Codes will be available at https://github.com/RulinShao/LightSeq.
Can pre-trained models assist in dataset distillation?
Authors: Yao Lu, Xuguang Chen, Yuchen Zhang, Jianyang Gu, Tianle Zhang, Yifan Zhang, Xiaoniu Yang, Qi Xuan, Kai Wang, Yang You
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.03295
Pdf link: https://arxiv.org/pdf/2310.03295
Abstract Dataset Distillation (DD) is a prominent technique that encapsulates knowledge from a large-scale original dataset into a small synthetic dataset for efficient training. Meanwhile, Pre-trained Models (PTMs) function as knowledge repositories, containing extensive information from the original dataset. This naturally raises a question: Can PTMs effectively transfer knowledge to synthetic datasets, guiding DD accurately? To this end, we conduct preliminary experiments, confirming the contribution of PTMs to DD. Afterwards, we systematically study different options in PTMs, including initialization parameters, model architecture, training epoch and domain knowledge, revealing that: 1) Increasing model diversity enhances the performance of synthetic datasets; 2) Sub-optimal models can also assist in DD and outperform well-trained ones in certain cases; 3) Domain-specific PTMs are not mandatory for DD, but a reasonable domain match is crucial. Finally, by selecting optimal options, we significantly improve the cross-architecture generalization over baseline DD methods. We hope our work will help researchers develop better DD techniques. Our code is available at https://github.com/yaolu-zjut/DDInterpreter.
Concise and Organized Perception Facilitates Large Language Models for Deductive Reasoning
Authors: Shaotian Yan, Chen Shen, Junjie Liu, Jieping Ye
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.03309
Pdf link: https://arxiv.org/pdf/2310.03309
Abstract Exploiting large language models (LLMs) to tackle deductive reasoning has garnered growing attention. It still remains highly challenging to achieve satisfactory results in complex deductive problems, characterized by plenty of premises (i.e., facts or rules) entailing intricate relationships among entities and requiring multi-hop reasoning. One intuitive solution is to decompose the original task into smaller sub-tasks, and then chain the multiple causal reasoning steps together in a forward (e.g., Selection-Inference) or backward (e.g., LAMBADA) direction. However, these techniques inevitably necessitate a large number of overall stages, leading to computationally expensive operations and a higher possibility of making misleading steps. In addition to stage-by-stage decomposition, we draw inspiration from another aspect of human problem-solving. Humans tend to distill the most relevant information and organize their thoughts systematically (e.g., creating mind maps), which assists them in answering questions or drawing conclusions precisely and quickly. In light of this, we propose a novel reasoning approach named Concise and Organized Perception (COP). COP carefully analyzes the given statements to efficiently identify the most pertinent information while eliminating redundancy. It then prompts the LLMs in a more organized form that adapts to the model's inference process. By perceiving concise and organized proofs, the deductive reasoning abilities of LLMs can be better elicited, and the risk of acquiring errors caused by excessive reasoning stages is mitigated. Furthermore, our approach can be combined with the aforementioned ones to further boost their performance. Extensive experimental results on three popular deductive benchmarks (i.e., ProofWriter, PrOntoQA and PrOntoQA-OOD) show that COP significantly outperforms previous state-of-the-art methods.
Enhanced Human-Robot Collaboration using Constrained Probabilistic Human-Motion Prediction
Authors: Aadi Kothari, Tony Tohme, Xiaotong Zhang, Kamal Youcef-Toumi
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.03314
Pdf link: https://arxiv.org/pdf/2310.03314
Abstract Human motion prediction is an essential step for efficient and safe human-robot collaboration. Current methods either purely rely on representing the human joints in some form of neural network-based architecture or use regression models offline to fit hyper-parameters in the hope of capturing a model encompassing human motion. While these methods provide good initial results, they are missing out on leveraging well-studied human body kinematic models as well as body and scene constraints which can help boost the efficacy of these prediction frameworks while also explicitly avoiding implausible human joint configurations. We propose a novel human motion prediction framework that incorporates human joint constraints and scene constraints in a Gaussian Process Regression (GPR) model to predict human motion over a set time horizon. This formulation is combined with an online context-aware constraints model to leverage task-dependent motions. It is tested on a human arm kinematic model and implemented on a human-robot collaborative setup with a UR5 robot arm to demonstrate the real-time capability of our approach. Simulations were also performed on datasets like HA4M and ANDY. The simulation and experimental results demonstrate considerable improvements in a Gaussian Process framework when these constraints are explicitly considered.
BioBridge: Bridging Biomedical Foundation Models via Knowledge Graph
Authors: Zifeng Wang, Zichen Wang, Balasubramaniam Srinivasan, Vassilis N. Ioannidis, Huzefa Rangwala, Rishita Anubhai
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.03320
Pdf link: https://arxiv.org/pdf/2310.03320
Abstract Foundation models (FMs) are able to leverage large volumes of unlabeled data to demonstrate superior performance across a wide range of tasks. However, FMs developed for biomedical domains have largely remained unimodal, i.e., independently trained and used for tasks on protein sequences alone, small molecule structures alone, or clinical data alone. To overcome this limitation of biomedical FMs, we present BioBridge, a novel parameter-efficient learning framework, to bridge independently trained unimodal FMs to establish multimodal behavior. BioBridge achieves this by utilizing Knowledge Graphs (KG) to learn transformations between one unimodal FM and another without fine-tuning any underlying unimodal FMs. Our empirical results demonstrate that BioBridge can beat the best baseline KG embedding methods (on average by around 76.3%) in cross-modal retrieval tasks. We also find that BioBridge demonstrates out-of-domain generalization ability by extrapolating to unseen modalities or relations. Additionally, we show that BioBridge presents itself as a general purpose retriever that can aid biomedical multimodal question answering as well as enhance the guided generation of novel drugs.
Investigating the Limitation of CLIP Models: The Worst-Performing Categories
Authors: Jie-Jing Shao, Jiang-Xin Shi, Xiao-Wen Yang, Lan-Zhe Guo, Yu-Feng Li
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.03324
Pdf link: https://arxiv.org/pdf/2310.03324
Abstract Contrastive Language-Image Pre-training (CLIP) provides a foundation model by integrating natural language into visual concepts, enabling zero-shot recognition on downstream tasks. It is usually expected that satisfactory overall accuracy can be achieved across numerous domains through well-designed textual prompts. However, we found that their performance in the worst categories is significantly inferior to the overall performance. For example, on ImageNet, there are a total of 10 categories with class-wise accuracy as low as 0%, even though the overall performance has achieved 64.1%. This phenomenon reveals the potential risks associated with using CLIP models, particularly in risk-sensitive applications where specific categories hold significant importance. To address this issue, we investigate the alignment between the two modalities in the CLIP model and propose the Class-wise Matching Margin (CMM) to measure the inference confusion. CMM can effectively identify the worst-performing categories and estimate the potential performance of the candidate prompts. We further query large language models to enrich descriptions of worst-performing categories and build a weighted ensemble to highlight the efficient prompts. Experimental results clearly verify the effectiveness of our proposal, where the accuracy on the worst-10 categories on ImageNet is boosted to 5.2%, without manual prompt engineering, laborious optimization, or access to labeled validation data.
CSI: Enhancing the Robustness of 3D Point Cloud Recognition against Corruption
Authors: Zhuoyuan Wu, Jiachen Sun, Chaowei Xiao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.03360
Pdf link: https://arxiv.org/pdf/2310.03360
Abstract Despite recent advancements in deep neural networks for point cloud recognition, real-world safety-critical applications present challenges due to unavoidable data corruption. Current models often fall short in generalizing to unforeseen distribution shifts. In this study, we harness the inherent set property of point cloud data to introduce a novel critical subset identification (CSI) method, aiming to bolster recognition robustness in the face of data corruption. Our CSI framework integrates two pivotal components: density-aware sampling (DAS) and self-entropy minimization (SEM), which cater to static and dynamic CSI, respectively. DAS ensures efficient robust anchor point sampling by factoring in local density, while SEM is employed during training to accentuate the most salient point-to-point attention. Evaluations reveal that our CSI approach yields error rates of 18.4% and 16.3% on ModelNet40-C and PointCloud-C, respectively, marking a notable improvement over state-of-the-art methods by margins of 5.2% and 4.2% on the respective benchmarks. Code is available at https://github.com/masterwu2115/CSI/tree/main.
Motivating Next-Generation OS Physical Memory Management for Terabyte-Scale NVMMs
Authors: Shivank Garg, Aravinda Prasad, Debadatta Mishra, Sreenivas Subramoney
Subjects: Operating Systems (cs.OS)
Arxiv link: https://arxiv.org/abs/2310.03370
Pdf link: https://arxiv.org/pdf/2310.03370
Abstract Software-managed byte-addressable hybrid memory systems consisting of DRAMs and NVMMs offer a lot of flexibility to design efficient large-scale data processing applications. Operating systems (OS) play an important role in enabling the applications to realize the integrated benefits of DRAMs' low access latency and NVMMs' large capacity along with their persistent characteristics. In this paper, we comprehensively analyze the performance of conventional OS physical memory management subsystems that were designed only based on the DRAM memory characteristics in the context of modern hybrid byte-addressable memory systems. To study the impact of high access latency and large capacity of NVMMs on physical memory management, we perform an extensive evaluation on Linux with Intel's Optane NVMM. We observe that the core memory management functionalities such as page allocation are negatively impacted by high NVMM media latency, while functionalities such as conventional fragmentation management are rendered inadequate. We also demonstrate that certain traditional memory management functionalities are affected by neither aspect of modern NVMMs. We conclusively motivate the need to overhaul fundamental aspects of traditional OS physical memory management in order to fully exploit terabyte-scale NVMMs.
Design Optimizer for Planar Soft-Growing Robot Manipulators
Authors: Fabio Stroppa
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.03374
Pdf link: https://arxiv.org/pdf/2310.03374
Abstract Soft-growing robots are innovative devices that feature plant-inspired growth to navigate environments. Thanks to their embodied intelligence of adapting to their surroundings and the latest innovation in actuation and manufacturing, it is possible to employ them for specific manipulation tasks. The applications of these devices include exploration of delicate/dangerous environments, manipulation of items, or assistance in domestic environments. This work presents a novel approach for design optimization of soft-growing robots, which will be used prior to manufacturing to suggest to engineers -- or robot designer enthusiasts -- the optimal dimension of the robot to be built for solving a specific task. I modeled the design process as a multi-objective optimization problem, in which I optimize the kinematic chain of a soft manipulator to reach targets and avoid unnecessary overuse of material and resources. The method exploits the advantages of population-based optimization algorithms, in particular evolutionary algorithms, to transform the problem from multi-objective into single-objective thanks to an efficient mathematical formulation, the novel rank-partitioning algorithm, and obstacle avoidance integrated within the optimizer operators. I tested the proposed method on different tasks to assess its optimality, which showed significant performance in solving the problem. Finally, comparative experiments showed that the proposed method works better than the existing one in the literature in terms of precision, resource consumption, and run time.
Progressive Adaptive Chance-Constrained Safeguards for Reinforcement Learning
Authors: Zhaorun Chen, Binhao Chen, Tairan He, Liang Gong, Chengliang Liu
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.03379
Pdf link: https://arxiv.org/pdf/2310.03379
Abstract Safety assurance of Reinforcement Learning (RL) is critical for exploration in real-world scenarios. In handling the Constrained Markov Decision Process, current approaches experience intrinsic difficulties in trading off optimality and feasibility. Direct optimization methods cannot strictly guarantee state-wise in-training safety, while projection-based methods are usually inefficient and correct actions through lengthy iterations. To address these two challenges, this paper proposes an adaptive surrogate chance constraint for the safety cost, and a hierarchical architecture that corrects actions produced by the upper policy layer via a fast Quasi-Newton method. Theoretical analysis indicates that the relaxed probabilistic constraint can sufficiently guarantee forward invariance to the safe set. We validate the proposed method on 4 simulated and real-world safety-critical robotic tasks. Results indicate that the proposed method can efficiently enforce safety (nearly zero-violation), while preserving optimality (+23.8%), robustness and generalizability to stochastic real-world settings.
Uncertainty quantification for deep learning-based schemes for solving high-dimensional backward stochastic differential equations
Authors: Lorenc Kapllani, Long Teng, Matthias Rottmann
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.03393
Pdf link: https://arxiv.org/pdf/2310.03393
Abstract Deep learning-based numerical schemes for solving high-dimensional backward stochastic differential equations (BSDEs) have recently raised plenty of scientific interest. While they enable numerical methods to approximate very high-dimensional BSDEs, their reliability has not been studied and is thus not understood. In this work, we study uncertainty quantification (UQ) for a class of deep learning-based BSDE schemes. More precisely, we review the sources of uncertainty involved in the schemes and numerically study the impact of different sources. Usually, the standard deviation (STD) of the approximate solutions obtained from multiple runs of the algorithm with different datasets is calculated to address the uncertainty. This approach is computationally quite expensive, especially for high-dimensional problems. Hence, we develop a UQ model that efficiently estimates the STD of the approximate solution using only a single run of the algorithm. The model also estimates the mean of the approximate solution, which can be leveraged to initialize the algorithm and improve the optimization process. Our numerical experiments show that the UQ model produces reliable estimates of the mean and STD of the approximate solution for the considered class of deep learning-based BSDE schemes. The estimated STD captures multiple sources of uncertainty, demonstrating its effectiveness in quantifying the uncertainty. Additionally, the model illustrates the improved performance when comparing different schemes based on the estimated STD values. Furthermore, it can identify hyperparameter values for which the scheme achieves good approximations.
Learning to Simplify Spatial-Temporal Graphs in Gait Analysis
Authors: Adrian Cosma, Emilian Radoi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.03396
Pdf link: https://arxiv.org/pdf/2310.03396
Abstract Gait analysis leverages unique walking patterns for person identification and assessment across multiple domains. Among the methods used for gait analysis, skeleton-based approaches have shown promise due to their robust and interpretable features. However, these methods often rely on hand-crafted spatial-temporal graphs that are based on human anatomy disregarding the particularities of the dataset and task. This paper proposes a novel method to simplify the spatial-temporal graph representation for gait-based gender estimation, improving interpretability without losing performance. Our approach employs two models, an upstream and a downstream model, that can adjust the adjacency matrix for each walking instance, thereby removing the fixed nature of the graph. By employing the Straight-Through Gumbel-Softmax trick, our model is trainable end-to-end. We demonstrate the effectiveness of our approach on the CASIA-B dataset for gait-based gender estimation. The resulting graphs are interpretable and differ qualitatively from fixed graphs used in existing models. Our research contributes to enhancing the explainability and task-specific adaptability of gait recognition, promoting more efficient and reliable gait-based biometrics.
IoTScent: Enhancing Forensic Capabilities in Internet of Things Gateways
Authors: Antonio Boiano, Alessandro Enrico Cesare Redondi, Matteo Cesana
Subjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2310.03401
Pdf link: https://arxiv.org/pdf/2310.03401
Abstract The widespread deployment of Consumer Internet of Things devices in proximity to human activities makes them digital observers of our daily actions. This has led to a new field of digital forensics, known as IoT Forensics, where digital traces generated by IoT devices can serve as key evidence for forensic investigations. Thus, there is a need to develop tools that can efficiently acquire and store network traces from IoT ecosystems. This paper presents IoTScent, an open-source IoT forensic tool that enables IoT gateways and Home Automation platforms to perform IoT traffic capture and analysis. Unlike other works focusing on IP-based protocols, IoTScent is specifically designed to operate over IEEE 802.15.4-based traffic, which is the basis for many IoT-specific protocols such as Zigbee, 6LoWPAN and Thread. IoTScent offers live traffic capture and feature extraction capabilities, providing a framework for forensic data collection that simplifies the task of setting up a data collection pipeline, automating the data collection process, and providing ready-made features that can be used for forensic evidence extraction. This work provides a comprehensive description of the IoTScent tool, including a practical use case that demonstrates the use of the tool to perform device identification from Zigbee traffic. The study presented here significantly contributes to the ongoing research in IoT Forensics by addressing the challenges faced in the field and publicly releasing the IoTScent tool.
RUSOpt: Robotic UltraSound Probe Normalization with Bayesian Optimization for In-plane and Out-plane Scanning
Authors: Deepak Raina, Abhishek Mathur, Richard M. Voyles, Juan Wachs, SH Chandrashekhara, Subir Kumar Saha
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.03406
Pdf link: https://arxiv.org/pdf/2310.03406
Abstract One of the significant challenges faced by autonomous robotic ultrasound systems is acquiring high-quality images across different patients. The proper orientation of the robotized probe plays a crucial role in governing the quality of ultrasound images. To address this challenge, we propose a sample-efficient method to automatically adjust the orientation of the ultrasound probe normal to the point of contact on the scanning surface, thereby improving the acoustic coupling of the probe and resulting image quality. Our method utilizes Bayesian Optimization (BO) based search on the scanning surface to efficiently search for the normalized probe orientation. We formulate a novel objective function for BO that leverages the contact force measurements and underlying mechanics to identify the normal. We further incorporate a regularization scheme in BO to handle the noisy objective function. The performance of the proposed strategy has been assessed through experiments on urinary bladder phantoms. These phantoms included planar, tilted, and rough surfaces, and were examined using both linear and convex probes with varying search space limits. Further, simulation-based studies have been carried out using 3D human mesh models. The results demonstrate that the mean ($\pm$SD) absolute angular error averaged over all phantoms and 3D models is $\boldsymbol{2.4\pm0.7^\circ}$ and $\boldsymbol{2.1\pm1.3^\circ}$, respectively.
Pre-Training and Fine-Tuning Generative Flow Networks
Authors: Ling Pan, Moksh Jain, Kanika Madan, Yoshua Bengio
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.03419
Pdf link: https://arxiv.org/pdf/2310.03419
Abstract Generative Flow Networks (GFlowNets) are amortized samplers that learn stochastic policies to sequentially generate compositional objects from a given unnormalized reward distribution. They can generate diverse sets of high-reward objects, which is an important consideration in scientific discovery tasks. However, as they are typically trained from a given extrinsic reward function, how to leverage the power of pre-training and train GFlowNets in an unsupervised fashion for efficient adaptation to downstream tasks remains an important open challenge. Inspired by recent successes of unsupervised pre-training in various domains, we introduce a novel approach for reward-free pre-training of GFlowNets. By framing the training as a self-supervised problem, we propose an outcome-conditioned GFlowNet (OC-GFN) that learns to explore the candidate space. Specifically, OC-GFN learns to reach any targeted outcomes, akin to goal-conditioned policies in reinforcement learning. We show that the pre-trained OC-GFN model can allow for a direct extraction of a policy capable of sampling from any new reward functions in downstream tasks. Nonetheless, adapting OC-GFN on a downstream task-specific reward involves an intractable marginalization over possible outcomes. We propose a novel way to approximate this marginalization by learning an amortized predictor enabling efficient fine-tuning. Extensive experimental results validate the efficacy of our approach, demonstrating the effectiveness of pre-training the OC-GFN, and its ability to swiftly adapt to downstream tasks and discover modes more efficiently. This work may serve as a foundation for further exploration of pre-training strategies in the context of GFlowNets.
Which mode is better for federated learning? Centralized or Decentralized
Authors: Yan Sun, Li Shen, Dacheng Tao
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2310.03461
Pdf link: https://arxiv.org/pdf/2310.03461
Abstract Both centralized and decentralized approaches have shown excellent performance and great application value in federated learning (FL). However, current studies do not provide sufficient evidence to show which one performs better. Although, from the optimization perspective, decentralized methods can achieve convergence comparable to that of centralized methods with less communication, their test performance has always been inefficient in empirical studies. To comprehensively explore their behaviors in FL, we study their excess risks, including the joint analysis of both optimization and generalization. We prove that on smooth non-convex objectives, 1) centralized FL (CFL) always generalizes better than decentralized FL (DFL); 2) from perspectives of the excess risk and test error in CFL, adopting partial participation is superior to full participation; and, 3) there is a necessary requirement for the topology in DFL to avoid performance collapse as the training scale increases. Based on some simple hardware metrics, we could evaluate which framework is better in practice. Extensive experiments are conducted on common setups in FL to validate that our theoretical analysis is contextually valid in practical scenarios.
Controllable Multi-document Summarization: Coverage & Coherence Intuitive Policy with Large Language Model Based Rewards
Authors: Litton J Kurisinkel, Nancy F Chen
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2310.03473
Pdf link: https://arxiv.org/pdf/2310.03473
Abstract Memory-efficient large language models are good at refining text input for better readability. However, controllability is a matter of concern when it comes to text generation tasks with long inputs, such as multi-document summarization. In this work, we investigate a generic controllable approach for multi-document summarization that leverages the capabilities of LLMs to refine the text. In particular, we train a controllable content extraction scheme to extract the text that will be refined by an LLM. The scheme is designed with a novel coverage and coherence intuitive policy, which is duly rewarded by a passively trained LLM. Our approach yields competitive results in the evaluation using ROUGE metrics and outperforms potential baselines in coherence, as per human evaluation.
Fair Division with Allocator's Preference
Authors: Xiaolin Bu, Zihao Li, Shengxin Liu, Jiaxin Song, Biaoshuai Tao
Subjects: Computer Science and Game Theory (cs.GT)
Arxiv link: https://arxiv.org/abs/2310.03475
Pdf link: https://arxiv.org/pdf/2310.03475
Abstract We consider the fair allocation problem of indivisible items. Most previous work focuses on fairness and/or efficiency among agents given agents' preferences. However, besides the agents, the allocator as the resource owner may also be involved in many real-world scenarios, e.g., heritage division. The allocator has the inclination to obtain a fair or efficient allocation based on her own preference over the items and to whom each item is allocated. In this paper, we propose a new model and focus on the following two problems: 1) Is it possible to find an allocation that is fair for both the agents and the allocator? 2) What is the complexity of maximizing the allocator's social welfare while satisfying the agents' fairness? We consider the two fundamental fairness criteria: envy-freeness and proportionality. For the first problem, we study the existence of an allocation that is envy-free up to $c$ goods (EF-$c$) or proportional up to $c$ goods (PROP-$c$) from both the agents' and the allocator's perspectives, in which such an allocation is called doubly EF-$c$ or doubly PROP-$c$ respectively. When the allocator's utility depends exclusively on the items (but not to whom an item is allocated), we prove that a doubly EF-$1$ allocation always exists. For the general setting where the allocator has a preference over the items and to whom each item is allocated, we prove that a doubly EF-$1$ allocation always exists for two agents, a doubly PROP-$2$ allocation always exists for binary valuations, and a doubly PROP-$O(\log n)$ allocation always exists in general. For the second problem, we provide various (in)approximability results in which the gaps between approximation and inapproximation ratios are asymptotically closed under most settings. Most results are based on novel technical tools including the chromatic numbers of the Kneser graphs and linear programming-based analysis.
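The EF-$c$ criterion in the abstract above can be made concrete with a small check. What follows is a minimal sketch, not code from the paper: it assumes additive, non-negative valuations, and it assumes the allocator's preference can be written as a second valuation matrix `allocator_vals[i][g]` (her utility when good `g` goes to agent `i`). The function names `is_ef_c` and `is_doubly_ef_c` are hypothetical.

```python
from typing import Sequence


def is_ef_c(valuations: Sequence[Sequence[float]],
            bundles: Sequence[Sequence[int]],
            c: int = 1) -> bool:
    """Return True if the allocation is envy-free up to c goods (EF-c).

    valuations[i][g] is the (additive, non-negative) value of good g
    as judged from agent i's position; bundles[i] lists the goods of agent i.
    """
    n = len(bundles)
    for i in range(n):
        own_value = sum(valuations[i][g] for g in bundles[i])
        for j in range(n):
            if i == j:
                continue
            # Values that position i assigns to the goods held by agent j,
            # largest first.
            vals = sorted((valuations[i][g] for g in bundles[j]), reverse=True)
            # Removing the c most valued goods is the best case for
            # eliminating envy under additive, non-negative valuations.
            if own_value < sum(vals[c:]):
                return False
    return True


def is_doubly_ef_c(agent_vals, allocator_vals, bundles, c: int = 1) -> bool:
    """EF-c from both perspectives (assumed modelling of the allocator)."""
    return is_ef_c(agent_vals, bundles, c) and is_ef_c(allocator_vals, bundles, c)


# Two agents, three goods; agent 0 holds good 0, agent 1 holds goods 1 and 2.
agent_vals = [[3, 1, 1], [1, 3, 1]]
allocator_vals = [[1, 1, 1], [1, 1, 1]]
bundles = [[0], [1, 2]]
print(is_doubly_ef_c(agent_vals, allocator_vals, bundles, c=1))
```

In this toy instance the agents are content without removing any good, while the allocator's view only becomes envy-free after one good is dropped from the larger bundle, which is why the check is run with `c=1`.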
Supervising Smart Home Device Interactions: A Profile-Based Firewall Approach
Authors: François De Keersmaeker, Ramin Sadre, Cristel Pelsser
Subjects: Networking and Internet Architecture (cs.NI); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2310.03510
Pdf link: https://arxiv.org/pdf/2310.03510
Abstract Internet of Things devices can now be found everywhere, including in our households in the form of Smart Home networks. Despite their ubiquity, their security is unsatisfactory, as demonstrated by recent attacks. The IETF's MUD standard aims to simplify and automate the secure deployment of end devices in networks. A MUD file contains a device-specific description of allowed network activities (e.g., allowed IP ports or host addresses) and can be used to configure, for example, a firewall. A major weakness of MUD is that it is not expressive enough to describe traffic patterns representing device interactions, which often occur in modern Smart Home platforms. In this article, we present a new language for describing such traffic patterns. The language allows writing device profiles that are more expressive than MUD files and take into account the interdependencies of traffic connections. We show how these profiles can be translated to efficient code for a lightweight firewall leveraging NFTables to block non-conforming traffic. We evaluate our approach on traffic generated by various Smart Home devices, and show that our system can accurately block unwanted traffic while inducing negligible latency.
High-dimensional Bayesian Optimization with Group Testing
Authors: Erik Orm Hellsten, Carl Hvarfner, Leonard Papenmeier, Luigi Nardi
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.03515
Pdf link: https://arxiv.org/pdf/2310.03515
Abstract Bayesian optimization is an effective method for optimizing expensive-to-evaluate black-box functions. High-dimensional problems are particularly challenging as the surrogate model of the objective suffers from the curse of dimensionality, which makes accurate modeling difficult. We propose a group testing approach to identify active variables to facilitate efficient optimization in these domains. The proposed algorithm, Group Testing Bayesian Optimization (GTBO), first runs a testing phase where groups of variables are systematically selected and tested on whether they influence the objective. To that end, we extend the well-established theory of group testing to functions of continuous ranges. In the second phase, GTBO guides optimization by placing more importance on the active dimensions. By exploiting the axis-aligned subspace assumption, GTBO is competitive against state-of-the-art methods on several synthetic and real-world high-dimensional optimization tasks. Furthermore, GTBO aids in the discovery of active parameters in applications, thereby enhancing practitioners' understanding of the problem at hand.
Large Language Models for Software Engineering: Survey and Open Problems
Authors: Angela Fan, Beliz Gokkaya, Mark Harman, Mitya Lyubarskiy, Shubho Sengupta, Shin Yoo, Jie M. Zhang
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2310.03533
Pdf link: https://arxiv.org/pdf/2310.03533
Abstract This paper provides a survey of the emerging area of Large Language Models (LLMs) for Software Engineering (SE). It also sets out open research challenges for the application of LLMs to technical problems faced by software engineers. LLMs' emergent properties bring novelty and creativity with applications right across the spectrum of Software Engineering activities including coding, design, requirements, repair, refactoring, performance improvement, documentation and analytics. However, these very same emergent properties also pose significant technical challenges; we need techniques that can reliably weed out incorrect solutions, such as hallucinations. Our survey reveals the pivotal role that hybrid techniques (traditional SE plus LLMs) have to play in the development and deployment of reliable, efficient and effective LLM-based SE.
Reverse-Mode AD of Reduce-by-Index and Scan in Futhark
Authors: Lotte Maria Bruun, Ulrik Stuhr Larsen, Nikolaj Hinnerskov, Cosmin Oancea
Subjects: Programming Languages (cs.PL); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
Arxiv link: https://arxiv.org/abs/2310.03568
Pdf link: https://arxiv.org/pdf/2310.03568
Abstract We present and evaluate the Futhark implementation of reverse-mode automatic differentiation (AD) for the basic blocks of parallel programming: reduce, prefix sum (scan), and reduce by index. We first present derivations of general-case algorithms and then discuss several specializations that result in efficient differentiation of most cases of practical interest. We report an experiment that evaluates the performance of the differentiated code in the context of GPU execution and highlights the impact of the proposed specializations as well as the strengths and weaknesses of differentiating at high level vs. low level (i.e., "differentiating the memory").
Liquid Cooling System for a High Power, Medium Frequency, and Medium Voltage Isolated Power Converter
Authors: Hooman Taghavi, Ahmad El Shafei, Adel Nasiri
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.03577
Pdf link: https://arxiv.org/pdf/2310.03577
Abstract Power electronics systems, widely used in various applications such as industrial automation, electric cars, and renewable energy, have the primary function of converting and controlling electrical power to the desired type of load. Despite their reliability and efficiency, power losses in these systems generate significant heat that must be dissipated to maintain performance and prevent damage. Cooling systems play a crucial role in ensuring safe operating temperatures for system components. Air and liquid cooling are the leading technologies used in the power electronics world. Air cooling is simple and cost-effective but is limited by ambient temperature and component thermal resistance. While more efficient, liquid cooling requires more maintenance and has higher upfront costs. Water-cooling systems have become popular for regulating thermal loads as they can effectively remove heat from localized high-temperature areas, such as the challenging hotspots in power electronics systems. In addition to designing a cooling system for a power electronic system, this study investigated the impact of three major parameters: cold plate material, channel shape/size, and coolant inlet velocity. The research examined these factors and their trade-offs to obtain cooling system design and optimization insights. This study might improve power electronics system performance, reliability, and durability by improving heat dissipation and thermal management.
Smoothing Methods for Automatic Differentiation Across Conditional Branches
Authors: Justin N. Kreikemeyer, Philipp Andelfinger
Subjects: Machine Learning (cs.LG); Mathematical Software (cs.MS); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2310.03585
Pdf link: https://arxiv.org/pdf/2310.03585
Abstract Programs involving discontinuities introduced by control flow constructs such as conditional branches pose challenges to mathematical optimization methods that assume a degree of smoothness in the objective function's response surface. Smooth interpretation (SI) is a form of abstract interpretation that approximates the convolution of a program's output with a Gaussian kernel, thus smoothing its output in a principled manner. Here, we combine SI with automatic differentiation (AD) to efficiently compute gradients of smoothed programs. In contrast to AD across a regular program execution, these gradients also capture the effects of alternative control flow paths. The combination of SI with AD enables the direct gradient-based parameter synthesis for branching programs, allowing for instance the calibration of simulation models or their combination with neural network models in machine learning pipelines. We detail the effects of the approximations made for tractability in SI and propose a novel Monte Carlo estimator that avoids the underlying assumptions by estimating the smoothed programs' gradients through a combination of AD and sampling. Using DiscoGrad, our tool for automatically translating simple C++ programs to a smooth differentiable form, we perform an extensive evaluation. We compare the combination of SI with AD and our Monte Carlo estimator to existing gradient-free and stochastic methods on four non-trivial and originally discontinuous problems ranging from classical simulation-based optimization to neural network-driven control. While the optimization progress with the SI-based estimator depends on the complexity of the programs' control flow, our Monte Carlo estimator is competitive in all problems, exhibiting the fastest convergence by a substantial margin in our highest-dimensional problem.
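To illustrate what "smoothing a branching program with a Gaussian kernel" means in the abstract above, here is a minimal sketch. It is not DiscoGrad and not the paper's estimator: it uses the standard score-function (likelihood-ratio) Monte Carlo estimator, and the toy `program`, the function name `smoothed_value_and_grad`, and all parameter defaults are assumptions made for illustration.

```python
import random


def program(x: float) -> float:
    """A toy discontinuous program: a single conditional branch."""
    if x > 0.0:
        return 1.0
    return 0.0


def smoothed_value_and_grad(f, x: float, sigma: float = 1.0,
                            samples: int = 200_000, seed: int = 0):
    """Estimate E[f(x + sigma*eps)] and its derivative w.r.t. x, eps ~ N(0, 1).

    The derivative uses the identity
        d/dx E[f(x + sigma*eps)] = E[f(x + sigma*eps) * eps] / sigma,
    so it never differentiates through the branch itself.
    """
    rng = random.Random(seed)
    value_sum = 0.0
    grad_sum = 0.0
    for _ in range(samples):
        eps = rng.gauss(0.0, 1.0)
        y = f(x + sigma * eps)
        value_sum += y
        grad_sum += y * eps / sigma
    return value_sum / samples, grad_sum / samples


value, grad = smoothed_value_and_grad(program, x=0.0)
print(value, grad)
```

For this step function the smoothed output is the Gaussian CDF of x/sigma, so at x = 0 with sigma = 1 the estimates should land near 0.5 for the value and near 1/sqrt(2*pi) for the gradient, whereas ordinary AD on the unsmoothed program would report a zero gradient almost everywhere.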
Solving a Class of Non-Convex Minimax Optimization in Federated Learning
Authors: Xidong Wu, Jianhui Sun, Zhengmian Hu, Aidong Zhang, Heng Huang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.03613
Pdf link: https://arxiv.org/pdf/2310.03613
Abstract Minimax problems arise throughout machine learning applications, ranging from adversarial training and policy evaluation in reinforcement learning to AUROC maximization. To address the large-scale data challenges across multiple clients with communication-efficient distributed training, federated learning (FL) is gaining popularity. Many optimization algorithms for minimax problems have been developed in the centralized setting (i.e., single-machine). Nonetheless, algorithms for minimax problems under FL remain underexplored. In this paper, we study a class of federated nonconvex minimax optimization problems. We propose FL algorithms (FedSGDA+ and FedSGDA-M) and reduce existing complexity results for the most common minimax problems. For nonconvex-concave problems, we propose FedSGDA+ and reduce the communication complexity to $O(\varepsilon^{-6})$. Under nonconvex-strongly-concave and nonconvex-PL minimax settings, we prove that FedSGDA-M has the best-known sample complexity of $O(\kappa^{3} N^{-1}\varepsilon^{-3})$ and the best-known communication complexity of $O(\kappa^{2}\varepsilon^{-2})$. FedSGDA-M is the first algorithm to match the best sample complexity $O(\varepsilon^{-3})$ achieved by the single-machine method under the nonconvex-strongly-concave setting. Extensive experimental results on fair classification and AUROC maximization show the efficiency of our algorithms.
Animatable Virtual Humans: Learning pose-dependent human representations in UV space for interactive performance synthesis
Authors: Wieland Morgenstern, Milena T. Bagdasarian, Anna Hilsmann, Peter Eisert
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2310.03615
Pdf link: https://arxiv.org/pdf/2310.03615
Abstract We propose a novel representation of virtual humans for highly realistic real-time animation and rendering in 3D applications. We learn pose-dependent appearance and geometry from highly accurate dynamic mesh sequences obtained from state-of-the-art multiview-video reconstruction. Learning pose-dependent appearance and geometry from mesh sequences poses significant challenges, as it requires the network to learn the intricate shape and articulated motion of a human body. However, statistical body models like SMPL provide valuable a-priori knowledge, which we leverage to constrain the dimension of the search space, enabling more efficient and targeted learning, and to define pose-dependency. Instead of directly learning absolute pose-dependent geometry, we learn the difference between the observed geometry and the fitted SMPL model. This allows us to encode both pose-dependent appearance and geometry in the consistent UV space of the SMPL model. This approach not only ensures a high level of realism but also facilitates streamlined processing and rendering of virtual humans in real-time scenarios.
RouteKG: A knowledge graph-based framework for route prediction on road networks
Authors: Yihong Tang, Weipeng Deng, Shuyu Lei, Yuebing Liang, Zhenliang Ma, Zhan Zhao
Subjects: Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2310.03617
Pdf link: https://arxiv.org/pdf/2310.03617
Abstract Short-term route prediction on road networks allows us to anticipate the future trajectories of road users, enabling a plethora of intelligent transportation applications such as dynamic traffic control or personalized route recommendation. Despite the recent advances in this area, existing methods focus primarily on learning sequential patterns, neglecting the inherent spatial structure in road networks that can affect human routing decisions. To fill the gap, this paper introduces RouteKG, a novel Knowledge Graph-based framework for route prediction. Specifically, we construct a Knowledge Graph on the road network, thereby learning and leveraging spatial relations, especially moving directions, which are crucial for human navigation. Moreover, an n-ary tree-based algorithm is introduced to efficiently generate top-K routes in a batch mode, enhancing scalability and computational efficiency. To further optimize the prediction performance, a rank refinement module is incorporated to fine-tune the candidate route rankings. The model performance is evaluated using two real-world vehicle trajectory datasets from two Chinese cities, Chengdu and Shanghai, under various practical scenarios. The results demonstrate a significant improvement in accuracy over baseline methods, with an average increase of 6.2%, 7.8%, and 6.1% in top-1, top-5, and top-10 route predictions, respectively. We further validate our model through a case study that utilizes the pretrained model as a simulator for real-time traffic flow estimation at the link level. The proposed RouteKG promises wide-ranging applications in vehicle navigation, traffic management, and other intelligent transportation tasks.
Distributional PAC-Learning from Nisan's Natural Proofs
Authors: Ari Karchmer
Subjects: Computational Complexity (cs.CC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.03641
Pdf link: https://arxiv.org/pdf/2310.03641
Abstract (Abridged) Carmosino et al. (2016) demonstrated that natural proofs of circuit lower bounds for \Lambda imply efficient algorithms for learning \Lambda-circuits, but only over the uniform distribution, with membership queries, and provided \AC^0[p] \subseteq \Lambda. We consider whether this implication can be generalized to \Lambda \not\supseteq \AC^0[p], and to learning algorithms in Valiant's PAC model, which use only random examples and learn over arbitrary example distributions. We give results of both positive and negative flavor. On the negative side, we observe that if, for every circuit class \Lambda, the implication from natural proofs for \Lambda to learning \Lambda-circuits in Valiant's PAC model holds, then there is a polynomial time solution to O(n^{1.5})-uSVP (unique Shortest Vector Problem), and polynomial time quantum solutions to O(n^{1.5})-SVP (Shortest Vector Problem) and O(n^{1.5})-SIVP (Shortest Independent Vector Problem). This indicates that whether natural proofs for \Lambda imply efficient learning algorithms for \Lambda in Valiant's PAC model may depend on \Lambda. On the positive side, our main result is that specific natural proofs arising from a type of communication complexity argument (e.g., Nisan (1993), for depth-2 majority circuits) imply PAC-learning algorithms in a new distributional variant of Valiant's model. Our distributional PAC model is stronger than the average-case prediction model of Blum et al. (1993) and the heuristic PAC model of Nanashima (2021), and has several important properties which make it of independent interest, such as being boosting-friendly. The main applications of our result are new distributional PAC-learning algorithms for depth-2 majority circuits, polytopes and DNFs over natural target distributions, as well as the nonexistence of encoded-input weak PRFs that can be evaluated by depth-2 majority circuits.
Deep surrogate model for learning Green's function associated with linear reaction-diffusion operator
Authors: Junqing Ji, Lili Ju, Xiaoping Zhang
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2310.03642
Pdf link: https://arxiv.org/pdf/2310.03642
Abstract In this paper, we present a deep surrogate model for learning the Green's function associated with the reaction-diffusion operator in a rectangular domain. The U-Net architecture is utilized to effectively capture the mapping from source to solution of the target partial differential equations (PDEs). To enable efficient training of the model without relying on labeled data, we propose a novel loss function that draws inspiration from traditional numerical methods used for solving PDEs. Furthermore, a hard encoding mechanism is employed to ensure that the predicted Green's function is perfectly matched with the boundary conditions. Based on the learned Green's function from the trained deep surrogate model, a fast solver is developed to solve the corresponding PDEs with different sources and boundary conditions. Various numerical examples are also provided to demonstrate the effectiveness of the proposed model.
Regress Before Construct: Regress Autoencoder for Point Cloud Self-supervised Learning
Authors: Yang Liu, Chen Chen, Can Wang, Xulin King, Mengyuan Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.03670
Pdf link: https://arxiv.org/pdf/2310.03670
Abstract Masked Autoencoders (MAE) have demonstrated promising performance in self-supervised learning for both 2D and 3D computer vision. Nevertheless, existing MAE-based methods still have certain drawbacks. Firstly, the functional decoupling between the encoder and decoder is incomplete, which limits the encoder's representation learning ability. Secondly, downstream tasks solely utilize the encoder, failing to fully leverage the knowledge acquired through the encoder-decoder architecture in the pre-text task. In this paper, we propose Point Regress AutoEncoder (Point-RAE), a new scheme for regressive autoencoders for point cloud self-supervised learning. The proposed method decouples functions between the decoder and the encoder by introducing a mask regressor, which predicts the masked patch representation from the visible patch representation encoded by the encoder; the decoder then reconstructs the target from the predicted masked patch representation. By doing so, we minimize the impact of decoder updates on the representation space of the encoder. Moreover, we introduce an alignment constraint to ensure that the representations for masked patches, predicted from the encoded representations of visible patches, are aligned with the masked patch representations computed from the encoder. To make full use of the knowledge learned in the pre-training stage, we design a new finetune mode for the proposed Point-RAE. Extensive experiments demonstrate that our approach is efficient during pre-training and generalizes well on various downstream tasks. Specifically, our pre-trained models achieve a high accuracy of 90.28% on the ScanObjectNN hardest split and 94.1% accuracy on ModelNet40, surpassing all the other self-supervised learning methods. Our code and pretrained model are publicly available at: https://github.com/liuyyy111/Point-RAE.
PV-OSIMr: A Lowest Order Complexity Algorithm for Computing the Delassus Matrix
Authors: Ajay Suresha Sathya, Wilm Decre, Jan Swevers
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.03676
Pdf link: https://arxiv.org/pdf/2310.03676
Abstract We present PV-OSIMr, an efficient algorithm for computing the Delassus matrix (also known as the inverse operational space inertia matrix) for a kinematic tree, with the lowest order computational complexity known in the literature. PV-OSIMr is derived by optimizing the Popov-Vereshchagin (PV) solver computations using the compositionality of the force and motion propagators. It has a computational complexity of O(n + m^2) compared to O(n + m^2 d) of the original PV-OSIM algorithm and O(n + md + m^2) of the extended force propagator algorithm (EFPA), where n is the number of joints, m is the number of constraints and d is the depth of the kinematic tree. Since Delassus matrix computation requires constructing an m x m matrix and must consider all the n joints at least once, the asymptotic computational complexity of PV-OSIMr is optimal. We further benchmark our algorithm and find it to be often more efficient than the PV-OSIM and EFPA in practice.
Multimarginal generative modeling with stochastic interpolants
Authors: Michael S. Albergo, Nicholas M. Boffi, Michael Lindsey, Eric Vanden-Eijnden
Subjects: Machine Learning (cs.LG); Probability (math.PR)
Arxiv link: https://arxiv.org/abs/2310.03695
Pdf link: https://arxiv.org/pdf/2310.03695
Abstract Given a set of $K$ probability densities, we consider the multimarginal generative modeling problem of learning a joint distribution that recovers these densities as marginals. The structure of this joint distribution should identify multi-way correspondences among the prescribed marginals. We formalize an approach to this task within a generalization of the stochastic interpolant framework, leading to efficient learning algorithms built upon dynamical transport of measure. Our generative models are defined by velocity and score fields that can be characterized as the minimizers of simple quadratic objectives, and they are defined on a simplex that generalizes the time variable in the usual dynamical transport framework. The resulting transport on the simplex is influenced by all marginals, and we show that multi-way correspondences can be extracted. The identification of such correspondences has applications to style transfer, algorithmic fairness, and data decorruption. In addition, the multimarginal perspective enables an efficient algorithm for reducing the dynamical transport cost in the ordinary two-marginal setting. We demonstrate these capacities with several numerical examples.
Beyond One-Preference-for-All: Multi-Objective Direct Preference Optimization
Authors: Zhanhui Zhou, Jie Liu, Chao Yang, Jing Shao, Yu Liu, Xiangyu Yue, Wanli Ouyang, Yu Qiao
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.03708
Pdf link: https://arxiv.org/pdf/2310.03708
Abstract Language models (LMs), despite aligning well with an average labeler through reinforcement learning from human feedback (RLHF), may not universally suit diverse human preferences. Recent approaches therefore opt for customization by collecting multi-dimensional feedback and creating distinct rewards for each dimension (e.g., helpfulness, harmlessness, honesty). LMs can then be tailored to different preferences using multi-objective RL (MORL) with different reward weightings. Yet, RL fine-tuning is unstable and resource-heavy, especially for MORLHF with diverse and usually conflicting objectives. In this paper, we present Multi-Objective Direct Preference Optimization (MODPO), an RL-free algorithm that extends Direct Preference Optimization (DPO) for multiple alignment objectives. Essentially, MODPO trains different LMs to represent different collective reward models that combine all objectives with specific weightings. With a simple cross-entropy loss, the LMs optimized against the MODPO objective are analytically the exact solutions of the original MORLHF objective. Empirical results in safety alignment and long-form question answering confirm that MODPO matches or outperforms existing methods, efficiently producing a Pareto-optimal set of LMs that cater to diverse preferences with 3 times less computational resources compared with MORLHF.
Constraint-Conditioned Policy Optimization for Versatile Safe Reinforcement Learning
Authors: Yihang Yao, Zuxin Liu, Zhepeng Cen, Jiacheng Zhu, Wenhao Yu, Tingnan Zhang, Ding Zhao
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.03718
Pdf link: https://arxiv.org/pdf/2310.03718
Abstract Safe reinforcement learning (RL) focuses on training reward-maximizing agents subject to pre-defined safety constraints. Yet, learning versatile safe policies that can adapt to varying safety constraint requirements during deployment without retraining remains a largely unexplored and challenging area. In this work, we formulate the versatile safe RL problem and consider two primary requirements: training efficiency and zero-shot adaptation capability. To address them, we introduce the Conditioned Constrained Policy Optimization (CCPO) framework, consisting of two key modules: (1) Versatile Value Estimation (VVE) for approximating value functions under unseen threshold conditions, and (2) Conditioned Variational Inference (CVI) for encoding arbitrary constraint thresholds during policy optimization. Our extensive experiments demonstrate that CCPO outperforms the baselines in terms of safety and task performance while preserving zero-shot adaptation capabilities to different constraint thresholds data-efficiently. This makes our approach suitable for real-world dynamic applications.
Improved Baselines with Visual Instruction Tuning
Authors: Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.03744
Pdf link: https://arxiv.org/pdf/2310.03744
Abstract Large multimodal models (LMM) have recently shown encouraging progress with visual instruction tuning. In this note, we show that the fully-connected vision-language cross-modal connector in LLaVA is surprisingly powerful and data-efficient. With simple modifications to LLaVA, namely, using CLIP-ViT-L-336px with an MLP projection and adding academic-task-oriented VQA data with simple response formatting prompts, we establish stronger baselines that achieve state-of-the-art across 11 benchmarks. Our final 13B checkpoint uses merely 1.2M publicly available data, and finishes full training in ~1 day on a single 8-A100 node. We hope this can make state-of-the-art LMM research more accessible. Code and model will be publicly available.
Keyword: faster

Physics-Informed Neural Networks for Accelerating Power System State Estimation
Authors: Solon Falas, Markos Asprou, Charalambos Konstantinou, Maria K. Michael
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.03088
Pdf link: https://arxiv.org/pdf/2310.03088
Abstract State estimation is the cornerstone of the power system control center since it provides the operating condition of the system in consecutive time intervals. This work investigates the application of physics-informed neural networks (PINNs) for accelerating power systems state estimation in monitoring the operation of power systems. Traditional state estimation techniques often rely on iterative algorithms that can be computationally intensive, particularly for large-scale power systems. In this paper, a novel approach that leverages the inherent physical knowledge of power systems through the integration of PINNs is proposed. By incorporating physical laws as prior knowledge, the proposed method significantly reduces the computational complexity associated with state estimation while maintaining high accuracy. The proposed method achieves up to 11% increase in accuracy, 75% reduction in standard deviation of results, and 30% faster convergence, as demonstrated by comprehensive experiments on the IEEE 14-bus system.
Speech-Based Human-Exoskeleton Interaction for Lower Limb Motion Planning
Authors: Eddie Guo, Christopher Perlette, Mojtaba Sharifi, Lukas Grasse, Matthew Tata, Vivian K. Mushahwar, Mahdi Tavakoli
Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2310.03137
Pdf link: https://arxiv.org/pdf/2310.03137
Abstract This study presents a speech-based motion planning strategy (SBMP) developed for lower limb exoskeletons to facilitate safe and compliant human-robot interaction. A speech processing system, finite state machine, and central pattern generator are the building blocks of the proposed strategy for online planning of the exoskeleton's trajectory. According to experimental evaluations, this speech-processing system achieved low levels of word and intent errors. Regarding locomotion, the completion time for users with voice commands was 54% faster than that using a mobile app interface. With the proposed SBMP, users are able to maintain their postural stability with both hands free. This supports its use as an effective motion planning method for the assistance and rehabilitation of individuals with lower-limb impairments.
Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly
Authors: Herbert Woisetschläger, Alexander Isenko, Shiqiang Wang, Ruben Mayer, Hans-Arno Jacobsen
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
Arxiv link: https://arxiv.org/abs/2310.03150
Pdf link: https://arxiv.org/pdf/2310.03150
Abstract Large Language Models (LLM) and foundation models are popular as they offer new opportunities for individuals and businesses to improve natural language processing, interact with data, and retrieve information faster. However, training or fine-tuning LLMs requires a vast amount of data, which can be challenging to access due to legal or technical restrictions and may require private computing resources. Federated Learning (FL) is a solution designed to overcome these challenges and expand data access for deep learning applications. This paper takes a hardware-centric approach to explore how LLMs can be brought to modern edge computing systems. Our study fine-tunes the FLAN-T5 model family, ranging from 80M to 3B parameters, using FL for a text summarization task. We provide a micro-level hardware benchmark, compare the model FLOP utilization to a state-of-the-art data center GPU, and study the network utilization in realistic conditions. Our contribution is twofold: First, we evaluate the current capabilities of edge computing systems and their potential for LLM FL workloads. Second, by comparing these systems with a data-center GPU, we demonstrate the potential for improvement and the next steps toward achieving greater computational efficiency at the edge.
FedHyper: A Universal and Robust Learning Rate Scheduler for Federated Learning with Hypergradient Descent
Authors: Ziyao Wang, Jianyu Wang, Ang Li
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2310.03156
Pdf link: https://arxiv.org/pdf/2310.03156
Abstract The theoretical landscape of federated learning (FL) undergoes rapid evolution, but its practical application encounters a series of intricate challenges, and hyperparameter optimization is one of these critical challenges. Amongst the diverse adjustments in hyperparameters, the adaptation of the learning rate emerges as a crucial component, holding the promise of significantly enhancing the efficacy of FL systems. In response to this critical need, this paper presents FedHyper, a novel hypergradient-based learning rate adaptation algorithm specifically designed for FL. FedHyper serves as a universal learning rate scheduler that can adapt both global and local rates as the training progresses. In addition, FedHyper not only showcases unparalleled robustness to a spectrum of initial learning rate configurations but also significantly alleviates the necessity for laborious empirical learning rate adjustments. We provide a comprehensive theoretical analysis of FedHyper's convergence rate and conduct extensive experiments on vision and language benchmark datasets. The results demonstrate that FedHyper consistently converges 1.1-3x faster than FedAvg and the competing baselines while achieving superior final accuracy. Moreover, FedHyper catalyzes a remarkable surge in accuracy, augmenting it by up to 15% compared to FedAvg under suboptimal initial learning rate settings.
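The idea of hypergradient-based learning rate adaptation can be sketched on a toy problem. The code below is not the FedHyper algorithm (the abstract does not give its update rules, and there is no federated setting here); it is generic single-node hypergradient descent on a small quadratic, where the learning rate is nudged by the dot product of consecutive gradients. The function names, the curvature values, and the step size `beta` are all illustrative assumptions.

```python
def grad(x, curv):
    # Gradient of f(x) = 0.5 * sum(c_i * x_i^2).
    return [c * xi for c, xi in zip(curv, x)]


def loss(x, curv):
    return 0.5 * sum(c * xi * xi for c, xi in zip(curv, x))


def hypergradient_descent(x0, curv, lr0, beta, steps):
    """Gradient descent whose learning rate is itself adapted by a gradient step.

    The derivative of the current loss with respect to the previous learning
    rate is -(g_t . g_{t-1}), so the rate grows while consecutive gradients
    agree and shrinks when they point in opposing directions.
    """
    x, lr = list(x0), lr0
    prev_g = [0.0] * len(x)
    lrs = []
    for _ in range(steps):
        g = grad(x, curv)
        lr += beta * sum(a * b for a, b in zip(g, prev_g))
        x = [xi - lr * gi for xi, gi in zip(x, g)]
        prev_g = g
        lrs.append(lr)
    return x, lrs


curv = [1.0, 2.0]
x_fixed, _ = hypergradient_descent([1.0, 1.0], curv, 0.01, 0.0, 100)   # beta=0: fixed rate
x_adapt, lrs = hypergradient_descent([1.0, 1.0], curv, 0.01, 0.01, 100)
print("fixed-rate loss:", loss(x_fixed, curv))
print("adaptive loss:  ", loss(x_adapt, curv), "final lr:", lrs[-1])
```

Starting from a deliberately small initial rate, the adaptive run raises the rate on its own and reaches a lower loss in the same number of steps, which illustrates why such schemes can reduce sensitivity to the initial learning rate.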
Neural architecture impact on identifying temporally extended Reinforcement Learning tasks
Authors: Victor Vadakechirayath George
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.03161
Pdf link: https://arxiv.org/pdf/2310.03161
Abstract Inspired by recent developments in attention models for image classification and natural language processing, we present various attention-based architectures in the reinforcement learning (RL) domain, capable of performing well on the OpenAI Gym Atari-2600 game suite. Despite the recent success of deep reinforcement learning techniques in fields such as robotics, gaming, and healthcare, they suffer from a major drawback: neural networks are difficult to interpret. We try to get around this problem with the help of attention-based models. In attention-based models, extracting the attention map and overlaying it onto images allows direct observation of the information the agent uses to select actions and easier interpretation of the logic behind the chosen actions. In addition to playing well on gym-Atari environments, our models provide insights into how the agent perceives its environment. Furthermore, motivated by recent developments in attention-based video-classification models using the Vision Transformer, we propose an architecture based on the Vision Transformer for the image-based RL domain as well. Compared to previous works on the Vision Transformer, our model is faster to train and requires fewer computational resources.
PDR-CapsNet: an Energy-Efficient Parallel Approach to Dynamic Routing in Capsule Networks
Authors: Samaneh Javadinia, Amirali Baniasadi
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.03212
Pdf link: https://arxiv.org/pdf/2310.03212
Abstract Convolutional Neural Networks (CNNs) have produced state-of-the-art results for image classification tasks. However, they are limited in their ability to handle rotational and viewpoint variations due to information loss in max-pooling layers. Capsule Networks (CapsNets) employ a computationally-expensive iterative process referred to as dynamic routing to address these issues. CapsNets, however, often fall short on complex datasets and require more computational resources than CNNs. To overcome these challenges, we introduce the Parallel Dynamic Routing CapsNet (PDR-CapsNet), a deeper and more energy-efficient alternative to CapsNet that offers superior performance, less energy consumption, and lower overfitting rates. By leveraging a parallelization strategy, PDR-CapsNet mitigates the computational complexity of CapsNet and increases throughput, efficiently using hardware resources. As a result, we achieve 83.55% accuracy while requiring 87.26% fewer parameters, 32.27% fewer MACs, and 47.40% fewer FLOPs, achieving 3x faster inference and 7.29J less energy consumption on a 2080Ti GPU with 11GB VRAM compared to CapsNet on the CIFAR-10 dataset.
TacoGFN: Target Conditioned GFlowNet for Structure-Based Drug Design
Authors: Tony Shen, Mohit Pandey, Martin Ester
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.03223
Pdf link: https://arxiv.org/pdf/2310.03223
Abstract We seek to automate the generation of drug-like compounds conditioned to specific protein pocket targets. Most current methods approximate the protein-molecule distribution of a finite dataset and therefore struggle to generate molecules with significant binding improvement over the training dataset. We instead frame the pocket-conditioned molecular generation task as an RL problem and develop TacoGFN, a target conditional Generative Flow Network model. Our method is explicitly encouraged to generate molecules with desired properties as opposed to fitting on a pre-existing data distribution. To this end, we develop transformer-based docking score prediction to speed up docking score computation and propose TacoGFN to explore molecule space efficiently. Furthermore, we incorporate several rounds of active learning where generated samples are queried using a docking oracle to improve the docking score prediction. This approach allows us to accurately explore as much of the molecule landscape as we can afford computationally. Empirically, molecules generated using TacoGFN and its variants significantly outperform all baseline methods across every property (Docking score, QED, SA, Lipinski), while being orders of magnitude faster.
EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models
Authors: Yefei He, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.03270
Pdf link: https://arxiv.org/pdf/2310.03270
Abstract Diffusion models have demonstrated remarkable capabilities in image synthesis and related generative tasks. Nevertheless, their practicality for low-latency real-world applications is constrained by substantial computational costs and latency issues. Quantization is a dominant way to compress and accelerate diffusion models, where post-training quantization (PTQ) and quantization-aware training (QAT) are two main approaches, each bearing its own properties. While PTQ exhibits efficiency in terms of both time and data usage, it may lead to diminished performance in low bit-width. On the other hand, QAT can alleviate performance degradation but comes with substantial demands on computational and data resources. To capitalize on the advantages while avoiding their respective drawbacks, we introduce a data-free and parameter-efficient fine-tuning framework for low-bit diffusion models, dubbed EfficientDM, to achieve QAT-level performance with PTQ-like efficiency. Specifically, we propose a quantization-aware variant of the low-rank adapter (QALoRA) that can be merged with model weights and jointly quantized to low bit-width. The fine-tuning process distills the denoising capabilities of the full-precision model into its quantized counterpart, eliminating the requirement for training data. We also introduce scale-aware optimization and employ temporal learned step-size quantization to further enhance performance. Extensive experimental results demonstrate that our method significantly outperforms previous PTQ-based diffusion models while maintaining similar time and data efficiency. Specifically, there is only a marginal 0.05 sFID increase when quantizing both weights and activations of LDM-4 to 4-bit on ImageNet 256x256. Compared to QAT-based methods, our EfficientDM also boasts a 16.2x faster quantization speed with comparable generation quality.
Generalized Benders Decomposition with Continual Learning for Hybrid Model Predictive Control in Dynamic Environment
Authors: Lin Xuan
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.03344
Pdf link: https://arxiv.org/pdf/2310.03344
Abstract Hybrid model predictive control (MPC) with both continuous and discrete variables is widely applicable to robotic control tasks, especially those involving contact with the environment. Due to the combinatorial complexity, the solving speed of hybrid MPC can be insufficient for real-time applications. In this paper, we propose a hybrid MPC solver based on Generalized Benders Decomposition (GBD) with continual learning. The algorithm accumulates cutting planes from the invariant dual space of the subproblems. After a short cold-start phase, the accumulated cuts provide warm-starts for the new problem instances to increase the solving speed. Despite the randomly changing environment that the control is unprepared for, the solving speed is maintained. We verify our solver on controlling a cart-pole system with randomly moving soft contact walls and show that the solving speed is 2-3 times faster than the off-the-shelf solver Gurobi.
DirectGPT: A Direct Manipulation Interface to Interact with Large Language Models
Authors: Damien Masson, Sylvain Malacria, Géry Casiez, Daniel Vogel
Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2310.03691
Pdf link: https://arxiv.org/pdf/2310.03691
Abstract We characterize and demonstrate how the principles of direct manipulation can improve interaction with large language models. This includes: continuous representation of generated objects of interest; reuse of prompt syntax in a toolbar of commands; manipulable outputs to compose or control the effect of prompts; and undo mechanisms. This idea is exemplified in DirectGPT, a user interface layer on top of ChatGPT that works by transforming direct manipulation actions to engineered prompts. A study shows participants were 50% faster and relied on 50% fewer and 72% shorter prompts to edit text, code, and vector images compared to baseline ChatGPT. Our work contributes a validated approach to integrate LLMs into traditional software using direct manipulation.
Keyword: mobile

Speech-Based Human-Exoskeleton Interaction for Lower Limb Motion Planning
Authors: Eddie Guo, Christopher Perlette, Mojtaba Sharifi, Lukas Grasse, Matthew Tata, Vivian K. Mushahwar, Mahdi Tavakoli
Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2310.03137
Pdf link: https://arxiv.org/pdf/2310.03137
Abstract This study presents a speech-based motion planning strategy (SBMP) developed for lower limb exoskeletons to facilitate safe and compliant human-robot interaction. A speech processing system, finite state machine, and central pattern generator are the building blocks of the proposed strategy for online planning of the exoskeleton's trajectory. According to experimental evaluations, this speech-processing system achieved low levels of word and intent errors. Regarding locomotion, the completion time for users with voice commands was 54% faster than that using a mobile app interface. With the proposed SBMP, users are able to maintain their postural stability with both hands free. This supports its use as an effective motion planning method for the assistance and rehabilitation of individuals with lower-limb impairments.
Roadmaps with Gaps over Controllers: Achieving Efficiency in Planning under Dynamics
Authors: Aravind Sivaramakrishnan, Noah R. Carver, Sumanth Tangirala, Kostas E. Bekris
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.03239
Pdf link: https://arxiv.org/pdf/2310.03239
Abstract This paper aims to improve the computational efficiency of motion planning for mobile robots with non-trivial dynamics by taking advantage of learned controllers. It adopts a decoupled strategy, where a system-specific controller is first trained offline in an empty environment to deal with the system's dynamics. For an environment, the proposed approach constructs offline a data structure, a "Roadmap with Gaps," to approximately learn how to solve planning queries in this environment using the learned controller. Its nodes correspond to local regions and edges correspond to applications of the learned control policy that approximately connect these regions. Gaps arise due to the controller not perfectly connecting pairs of individual states along edges. Online, given a query, a tree sampling-based motion planner uses the roadmap so that the tree's expansion is informed towards the goal region. The tree expansion selects local subgoals given a wavefront on the roadmap that guides towards the goal. When the controller cannot reach a subgoal region, the planner resorts to random exploration to maintain probabilistic completeness and asymptotic optimality. The experimental evaluation shows that the approach significantly improves the computational efficiency of motion planning on various benchmarks, including physics-based vehicular models on uneven and varying friction terrains as well as a quadrotor under air pressure effects.
Non-coresident family as a driver of migration change in a crisis: The case of the COVID-19 pandemic
Authors: Unchitta Kan, Jericho McLeod, Eduardo López
Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
Arxiv link: https://arxiv.org/abs/2310.03254
Pdf link: https://arxiv.org/pdf/2310.03254
Abstract Changes in U.S. migration trends during the COVID-19 pandemic show that many moved to less populated cities from larger cities, deviating from previous trends. In this study, building on prior work in the literature showing that the abundance of family ties is inversely related to population size, we analyze these migration changes with a focus on the crucial, yet overlooked factor of extended family. Employing two large-scale data sets, census microdata and mobile phone GPS relocation data, we show a collection of empirical results that paint a picture of migration change affected by family. Namely, we establish that people migrated closer to family at higher rates after the COVID-19 pandemic started. Moreover, even controlling for factors such as population density and costs of living, we find that changes in net in-migration tended to be larger and positive in cities with larger proportions of people who can be parents to adult children, our proxy for parental family availability. Our study suggests an underexplored explanation for internal migration patterns during a crisis and advances the demography-disaster nexus.
RadaRays: Real-time Simulation of Rotating FMCW Radar for Mobile Robotics via Hardware-accelerated Ray Tracing
Authors: Alexander Mock, Martin Magnusson, Joachim Hertzberg
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.03505
Pdf link: https://arxiv.org/pdf/2310.03505
Abstract RadaRays allows for the accurate modeling and simulation of rotating FMCW radar sensors in complex environments, including the simulation of reflection, refraction, and scattering of radar waves. Our software is able to handle large numbers of objects and materials, making it suitable for use in a variety of mobile robotics applications. We demonstrate the effectiveness of RadaRays through a series of experiments and show that it can more accurately reproduce the behavior of FMCW radar sensors in a variety of environments, compared to the ray casting-based lidar-like simulations that are commonly used in simulators for autonomous driving such as CARLA. Our experiments additionally serve as a valuable reference point for researchers to evaluate their own radar simulations. By using RadaRays, developers can significantly reduce the time and cost associated with prototyping and testing FMCW radar-based algorithms. We also provide a Gazebo plugin that makes our work accessible to the mobile robotics community.
Open RAN for 5G Supply Chain Diversification: The BEACON-5G Approach and Key Achievements
Authors: Adnan Aijaz, Sajida Gufran, Tim Farnham, Sita Chintalapati, Adrián Sánchez-Mompó, Peizheng Li
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2310.03580
Pdf link: https://arxiv.org/pdf/2310.03580
Abstract Open RAN brings multi-vendor diversity and interoperability to mobile/cellular networks. It is becoming part of governmental strategies for diversifying telecoms supply chains. This paper describes the approach and key achievements of the BEACON-5G project, jointly funded by the UK government and industry. The BEACON-5G project aims at developing a competitive edge for 5G Open RAN and contributing toward its maturity. It addresses some of the key challenges in this respect and provides various innovations for system integration, network slicing, marketplace integration, cyber security, and white-box RAN. It also conducts real-world technology trials for urban use-cases. The paper also captures some of the key lessons learned during delivery, the main outcomes, and highlights potential impact on the wider UK 5G diversification strategy.
Keyword: pruning

Enhancing Accuracy in Deep Learning Using Random Matrix Theory
Authors: Leonid Berlyand, Etienne Sandier, Yitzchak Shmalo, Lei Zhang
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2310.03165
Pdf link: https://arxiv.org/pdf/2310.03165
Abstract In this study, we explore the applications of random matrix theory (RMT) in the training of deep neural networks (DNNs), focusing on layer pruning to simplify DNN architecture and loss landscape. RMT, recently used to address overfitting in deep learning, enables the examination of DNN's weight layer spectra. We use these techniques to optimally determine the number of singular values to be removed from the weight layers of a DNN during training via singular value decomposition (SVD). This process aids in DNN simplification and accuracy enhancement, as evidenced by training simple DNN models on the MNIST and Fashion MNIST datasets. Our method can be applied to any fully connected or convolutional layer of a pretrained DNN, decreasing the layer's parameters and simplifying the DNN architecture while preserving or even enhancing the model's accuracy. By discarding small singular values based on RMT criteria, the accuracy of the test set remains consistent, facilitating more efficient DNN training without compromising performance. We provide both theoretical and empirical evidence supporting our claim that the elimination of small singular values based on RMT does not negatively impact the DNN's accuracy. Our results offer valuable insights into the practical application of RMT for the creation of more efficient and accurate deep-learning models.
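A rough illustration of choosing which singular values to discard with a random-matrix criterion follows. The abstract does not state the paper's exact rule, so this sketch assumes a Marchenko-Pastur style bulk edge: for an n_rows x n_cols matrix of i.i.d. zero-mean noise with standard deviation sigma, singular values are asymptotically bounded by about sigma * (sqrt(n_rows) + sqrt(n_cols)). The function names, the value of `sigma`, and the sample singular values are hypothetical.

```python
import math


def bulk_edge(n_rows, n_cols, sigma):
    """Approximate largest singular value of an i.i.d. noise matrix.

    Assumption: entries have zero mean and standard deviation sigma; the
    bound is asymptotic and only a heuristic for finite layers.
    """
    return sigma * (math.sqrt(n_rows) + math.sqrt(n_cols))


def split_singular_values(singular_values, n_rows, n_cols, sigma):
    """Separate values above the noise bulk (kept) from those inside it (dropped)."""
    edge = bulk_edge(n_rows, n_cols, sigma)
    keep = [s for s in singular_values if s > edge]
    drop = [s for s in singular_values if s <= edge]
    return keep, drop


# Hypothetical 100 x 64 weight layer with an assumed noise level of 0.1.
keep, drop = split_singular_values([5.0, 3.2, 1.7, 0.9], 100, 64, 0.1)
print("keep:", keep)
print("drop:", drop)
```

In a real layer the singular values would come from an SVD of the weight matrix, and the dropped components would be zeroed before reconstructing the layer.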
StegGuard: Fingerprinting Self-supervised Pre-trained Encoders via Secrets Embeder and Extractor
Authors: Xingdong Ren, Tianxing Zhang, Hanzhou Wu, Xinpeng Zhang, Yinggui Wang, Guangling Sun
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2310.03380
Pdf link: https://arxiv.org/pdf/2310.03380
Abstract In this work, we propose StegGuard, a novel fingerprinting mechanism to verify the ownership of a suspect pre-trained encoder using steganography. A critical perspective in StegGuard is that the unique characteristic of the transformation from an image to an embedding, conducted by the pre-trained encoder, can be equivalently exposed by how an embeder embeds secrets into images and how an extractor extracts the secrets from the encoder's embeddings with a tolerable error after the secrets are subjected to the encoder's transformation. While each independent encoder has a distinct transformation, a pirated encoder has a transformation similar to the victim's. Based on these observations, we learn a pair of secrets embeder and extractor as the fingerprint for the victim encoder. We introduce a frequency-domain channel attention embedding block into the embeder to adaptively embed secrets into suitable frequency bands. During verification, if the secrets embedded into the query images can be extracted with an acceptable error from the suspect encoder's embeddings, the suspect encoder is deemed pirated; otherwise, it is deemed independent. Extensive experiments demonstrate that, depending on a very limited number of query images, StegGuard can reliably distinguish pirated encoders from varied independent encoders, and is robust against model stealing related attacks including model extraction, fine-tuning, pruning, embedding noising and shuffle.
Neural Language Model Pruning for Automatic Speech Recognition
Authors: Leonardo Emili, Thiago Fraga-Silva, Ernest Pusateri, Markus Nußbaum-Thom, Youssef Oualil
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2310.03424
Pdf link: https://arxiv.org/pdf/2310.03424
Abstract We study model pruning methods applied to Transformer-based neural network language models for automatic speech recognition. We explore three aspects of the pruning framework, namely criterion, method and scheduler, analyzing their contribution in terms of accuracy and inference speed. To the best of our knowledge, such in-depth analyses on large-scale recognition systems have not been reported in the literature. In addition, we propose a variant of low-rank approximation suitable for incrementally compressing models, and delivering multiple models with varied target sizes. Among other results, we show that a) data-driven pruning outperforms magnitude-driven in several scenarios; b) incremental pruning achieves higher accuracy compared to one-shot pruning, especially when targeting smaller sizes; and c) low-rank approximation presents the best trade-off between size reduction and inference speed-up for moderate compression.
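The size trade-off behind low-rank approximation can be made concrete with parameter counts. This sketch covers only the generic factorization of an m x n weight matrix into an m x r and an r x n factor; the incremental variant proposed in the paper is not described in the abstract, and the layer shape and target ratio below are hypothetical.

```python
def dense_params(m, n):
    # Parameters in a full m x n weight matrix.
    return m * n


def low_rank_params(m, n, r):
    # Parameters in the two factors (m x r) and (r x n).
    return r * (m + n)


def max_rank_for_target(m, n, target_ratio):
    """Largest rank r such that r * (m + n) <= target_ratio * m * n."""
    return int(target_ratio * m * n // (m + n))


m, n = 1024, 4096  # hypothetical feed-forward layer shape
r = max_rank_for_target(m, n, 0.5)
print("rank:", r)
print("dense:", dense_params(m, n), "low-rank:", low_rank_params(m, n, r))
```

The factorization only saves parameters when r is below m * n / (m + n), so a family of models with different target sizes corresponds to a family of ranks under that bound.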
Keyword: diffusion

Learning Energy-Based Prior Model with Diffusion-Amortized MCMC
Authors: Peiyu Yu, Yaxuan Zhu, Sirui Xie, Xiaojian Ma, Ruiqi Gao, Song-Chun Zhu, Ying Nian Wu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.03218
Pdf link: https://arxiv.org/pdf/2310.03218
Abstract Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interest in the field of generative modeling due to their flexibility in formulation and the strong modeling power of the latent space. However, the common practice of learning latent space EBMs with non-convergent short-run MCMC for prior and posterior sampling is hindering the model from further progress; the degenerate MCMC sampling quality in practice often leads to degraded generation quality and instability in training, especially with highly multi-modal and/or high-dimensional target distributions. To remedy this sampling issue, in this paper we introduce a simple but effective diffusion-based amortization method for long-run MCMC sampling and develop a novel learning algorithm for the latent space EBM based on it. We provide theoretical evidence that the learned amortization of MCMC is a valid long-run MCMC sampler. Experiments on several image modeling benchmark datasets demonstrate the superior performance of our method compared with strong counterparts.
EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models
Authors: Yefei He, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.03270
Pdf link: https://arxiv.org/pdf/2310.03270
Abstract Diffusion models have demonstrated remarkable capabilities in image synthesis and related generative tasks. Nevertheless, their practicality for low-latency real-world applications is constrained by substantial computational costs and latency issues. Quantization is a dominant way to compress and accelerate diffusion models, where post-training quantization (PTQ) and quantization-aware training (QAT) are two main approaches, each bearing its own properties. While PTQ exhibits efficiency in terms of both time and data usage, it may lead to diminished performance in low bit-width. On the other hand, QAT can alleviate performance degradation but comes with substantial demands on computational and data resources. To capitalize on the advantages while avoiding their respective drawbacks, we introduce a data-free and parameter-efficient fine-tuning framework for low-bit diffusion models, dubbed EfficientDM, to achieve QAT-level performance with PTQ-like efficiency. Specifically, we propose a quantization-aware variant of the low-rank adapter (QALoRA) that can be merged with model weights and jointly quantized to low bit-width. The fine-tuning process distills the denoising capabilities of the full-precision model into its quantized counterpart, eliminating the requirement for training data. We also introduce scale-aware optimization and employ temporal learned step-size quantization to further enhance performance. Extensive experimental results demonstrate that our method significantly outperforms previous PTQ-based diffusion models while maintaining similar time and data efficiency. Specifically, there is only a marginal 0.05 sFID increase when quantizing both weights and activations of LDM-4 to 4-bit on ImageNet 256x256. Compared to QAT-based methods, our EfficientDM also boasts a 16.2x faster quantization speed with comparable generation quality.
Denoising Diffusion Step-aware Models
Authors: Shuai Yang, Yukang Chen, Luozhou Wang, Shu Liu, Yingcong Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.03337
Pdf link: https://arxiv.org/pdf/2310.03337
Abstract Denoising Diffusion Probabilistic Models (DDPMs) have garnered popularity for data generation across various domains. However, a significant bottleneck is the necessity for whole-network computation during every step of the generative process, leading to high computational overheads. This paper presents a novel framework, Denoising Diffusion Step-aware Models (DDSM), to address this challenge. Unlike conventional approaches, DDSM employs a spectrum of neural networks whose sizes are adapted according to the importance of each generative step, as determined through evolutionary search. This step-wise network variation effectively circumvents redundant computational efforts, particularly in less critical steps, thereby enhancing the efficiency of the diffusion model. Furthermore, the step-aware design can be seamlessly integrated with other efficiency-geared diffusion models such as DDIMs and latent diffusion, thus broadening the scope of computational savings. Empirical evaluations demonstrate that DDSM achieves computational savings of 49% for CIFAR-10, 61% for CelebA-HQ, 59% for LSUN-bedroom, 71% for AFHQ, and 76% for ImageNet, all without compromising the generation quality. Our code and models will be publicly available.
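The source of the reported savings can be illustrated with simple arithmetic: if each denoising step uses a network sized to that step's importance, the total cost is the sum of per-step costs rather than the full network's cost times the number of steps. The schedule below is a made-up example, not one found by the paper's evolutionary search, and the cost units are arbitrary.

```python
def relative_cost(step_costs, full_cost):
    """Total cost of a step-aware schedule relative to running the full
    network at every step."""
    return sum(step_costs) / (full_cost * len(step_costs))


# Hypothetical 10-step schedule: full network for 3 steps, a half-cost
# network for 4 steps, and a quarter-cost network for 3 steps.
schedule = [1.0] * 3 + [0.5] * 4 + [0.25] * 3
savings = 1.0 - relative_cost(schedule, 1.0)
print(f"compute saved: {savings:.1%}")
```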
Realistic Speech-to-Face Generation with Speech-Conditioned Latent Diffusion Model with Face Prior
Authors: Jinting Wang, Li Liu, Jun Wang, Hei Victor Cheng
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.03363
Pdf link: https://arxiv.org/pdf/2310.03363
Abstract Speech-to-face generation is an intriguing area of research that focuses on generating realistic facial images based on a speaker's audio speech. However, state-of-the-art methods employing GAN-based architectures lack stability and cannot generate realistic face images. To fill this gap, we propose a novel speech-to-face generation framework, which leverages a Speech-Conditioned Latent Diffusion Model, called SCLDM. To the best of our knowledge, this is the first work to harness the exceptional modeling capabilities of diffusion models for speech-to-face generation. Preserving the shared identity information between speech and face is crucial in generating realistic results. Therefore, we employ contrastive pre-training for both the speech encoder and the face encoder. This pre-training strategy facilitates effective alignment between the attributes of speech, such as age and gender, and the corresponding facial characteristics in the face images. Furthermore, we tackle the challenge posed by excessive diversity in the synthesis process caused by the diffusion model. To overcome this challenge, we introduce the concept of residuals by integrating a statistical face prior into the diffusion process. This addition helps to eliminate the shared component across the faces and enhances the subtle variations captured by the speech condition. Extensive quantitative, qualitative, and user study experiments demonstrate that our method can produce more realistic face images while preserving the identity of the speaker better than state-of-the-art methods. Our method demonstrates significant gains in all metrics on the AVSpeech dataset and Voxceleb dataset; particularly noteworthy are the improvements of 32.17 and 32.72 on the cosine distance metric for the two datasets, respectively.
ACT-Net: Anchor-context Action Detection in Surgery Videos
Authors: Luoying Hao, Yan Hu, Wenjun Lin, Qun Wang, Heng Li, Huazhu Fu, Jinming Duan, Jiang Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.03377
Pdf link: https://arxiv.org/pdf/2310.03377
Abstract Recognition and localization of detailed surgical actions is an essential component of developing a context-aware decision support system. However, most existing detection algorithms fail to provide high-accuracy action classes even when they locate the actions, as they do not consider the surgery procedure's regularity in the whole video. This limitation hinders their application. Moreover, using the predictions in clinical applications requires conveying model confidence to earn trust, which is unexplored in surgical action prediction. In this paper, to accurately detect fine-grained actions that happen at every moment, we propose an anchor-context action detection network (ACTNet), including an anchor-context detection (ACD) module and a class conditional diffusion (CCD) module, to answer the following questions: 1) where the actions happen; 2) what the actions are; 3) how confident the predictions are. Specifically, the proposed ACD module spatially and temporally highlights the regions interacting with the extracted anchor in the surgery video, and outputs the action location and its class distribution based on anchor-context interactions. Considering the full distribution of action classes in videos, the CCD module adopts a denoising diffusion-based generative model conditioned on our ACD estimator to further reconstruct the action predictions accurately. Moreover, we utilize the stochastic nature of the diffusion model outputs to assess model confidence for each prediction. Our method reports state-of-the-art performance, with improvements of 4.0% mAP against the baseline on the surgical video dataset.
FreeReg: Image-to-Point Cloud Registration Leveraging Pretrained Diffusion Models and Monocular Depth Estimators
Authors: Haiping Wang, Yuan Liu, Bing Wang, Yujing Sun, Zhen Dong, Wenping Wang, Bisheng Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.03420
Pdf link: https://arxiv.org/pdf/2310.03420
Abstract Matching cross-modality features between images and point clouds is a fundamental problem for image-to-point cloud registration. However, due to the modality difference between images and points, it is difficult to learn robust and discriminative cross-modality features by existing metric learning methods for feature matching. Instead of applying metric learning on cross-modality data, we propose to unify the modality between images and point clouds by pretrained large-scale models first, and then establish robust correspondence within the same modality. We show that the intermediate features, called diffusion features, extracted by depth-to-image diffusion models are semantically consistent between images and point clouds, which enables the building of coarse but robust cross-modality correspondences. We further extract geometric features on depth maps produced by the monocular depth estimator. By matching such geometric features, we significantly improve the accuracy of the coarse correspondences produced by diffusion features. Extensive experiments demonstrate that without any task-specific training, direct utilization of both features produces accurate image-to-point cloud registration. On three public indoor and outdoor benchmarks, the proposed method achieves on average a 20.6 percent improvement in Inlier Ratio, a three-fold higher Inlier Number, and a 48.6 percent improvement in Registration Recall over existing state-of-the-art methods.
Deep Generative Models of Music Expectation
Authors: Ninon Lizé Masclef, T. Anderson Keller
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2310.03500
Pdf link: https://arxiv.org/pdf/2310.03500
Abstract A prominent theory of affective response to music revolves around the concepts of surprisal and expectation. In prior work, this idea has been operationalized in the form of probabilistic models of music which allow for precise computation of song (or note-by-note) probabilities, conditioned on a 'training set' of prior musical or cultural experiences. To date, however, these models have been limited to computing exact probabilities through hand-crafted features or restricted to linear models which are likely not sufficient to represent the complex conditional distributions present in music. In this work, we propose to use modern deep probabilistic generative models in the form of a Diffusion Model to compute an approximate likelihood of a musical input sequence. Unlike prior work, such a generative model parameterized by deep neural networks is able to learn complex non-linear features directly from a training set itself. In doing so, we expect to find that such models are able to more accurately represent the 'surprisal' of music for human listeners. From the literature, it is known that there is an inverted U-shaped relationship between surprisal and the amount human subjects 'like' a given song. In this work we show that pre-trained diffusion models indeed yield musical surprisal values which exhibit a negative quadratic relationship with measured subject 'liking' ratings, and that the quality of this relationship is competitive with state-of-the-art methods such as IDyOM. We therefore present this model as a preliminary step in developing modern deep generative models of music expectation and subjective likability.
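The inverted-U relationship described in this abstract can be written as a small regression sketch. The symbols $s$, $L$, $p_\theta$ and $\beta_i$ are notation introduced here for illustration and are not taken from the paper; $p_\theta$ stands for the (approximate) likelihood assigned by the generative model.

```latex
s(x) = -\log p_\theta(x), \qquad
L(x) \approx \beta_0 + \beta_1\, s(x) + \beta_2\, s(x)^2, \quad \beta_2 < 0
```

Under this assumed quadratic form, a negative $\beta_2$ gives the inverted U, and the predicted liking peaks at the intermediate surprisal $s^\ast = -\beta_1 / (2\beta_2)$.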
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion
Authors: Anton Razzhigaev, Arseniy Shakhmatov, Anastasia Maltseva, Vladimir Arkhipkin, Igor Pavlov, Ilya Ryabov, Angelina Kuts, Alexander Panchenko, Andrey Kuznetsov, Denis Dimitrov
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.03502
Pdf link: https://arxiv.org/pdf/2310.03502
Abstract Text-to-image generation is a significant domain in modern computer vision and has achieved substantial improvements through the evolution of generative architectures. Among these, there are diffusion-based models that have demonstrated essential quality enhancements. These models are generally split into two categories: pixel-level and latent-level approaches. We present Kandinsky, a novel exploration of latent diffusion architecture, combining the principles of the image prior models with latent diffusion techniques. The image prior model is trained separately to map text embeddings to image embeddings of CLIP. Another distinct feature of the proposed model is the modified MoVQ implementation, which serves as the image autoencoder component. Overall, the designed model contains 3.3B parameters. We also deployed a user-friendly demo system that supports diverse generative modes such as text-to-image generation, image fusion, text and image fusion, image variations generation, and text-guided inpainting/outpainting. Additionally, we released the source code and checkpoints for the Kandinsky models. Experimental evaluations demonstrate a FID score of 8.03 on the COCO-30K dataset, marking our model as the top open-source performer in terms of measurable image generation quality.
Ctrl-Room: Controllable Text-to-3D Room Meshes Generation with Layout Constraints
Authors: Chuan Fang, Xiaotao Hu, Kunming Luo, Ping Tan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.03602
Pdf link: https://arxiv.org/pdf/2310.03602
Abstract Text-driven 3D indoor scene generation could be useful for gaming, film industry, and AR/VR applications. However, existing methods cannot faithfully capture the room layout, nor do they allow flexible editing of individual objects in the room. To address these problems, we present Ctrl-Room, which is able to generate convincing 3D rooms with designer-style layouts and high-fidelity textures from just a text prompt. Moreover, Ctrl-Room enables versatile interactive editing operations such as resizing or moving individual furniture items. Our key insight is to separate the modeling of layouts and appearance. To this end, our proposed method consists of two stages, a 'Layout Generation Stage' and an 'Appearance Generation Stage'. The 'Layout Generation Stage' trains a text-conditional diffusion model to learn the layout distribution with our holistic scene code parameterization. Next, the 'Appearance Generation Stage' employs a fine-tuned ControlNet to produce a vivid panoramic image of the room guided by the 3D scene layout and text prompt. In this way, we achieve a high-quality 3D room with convincing layouts and lively textures. Benefiting from the scene code parameterization, we can easily edit the generated room model through our mask-guided editing module, without expensive editing-specific training. Extensive experiments on the Structured3D dataset demonstrate that our method outperforms existing methods in producing more reasonable, view-consistent, and editable 3D rooms from natural language prompts.
Deep surrogate model for learning Green's function associated with linear reaction-diffusion operator
Authors: Junqing Ji, Lili Ju, Xiaoping Zhang
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2310.03642
Pdf link: https://arxiv.org/pdf/2310.03642
Abstract In this paper, we present a deep surrogate model for learning the Green's function associated with the reaction-diffusion operator in a rectangular domain. The U-Net architecture is utilized to effectively capture the mapping from source to solution of the target partial differential equations (PDEs). To enable efficient training of the model without relying on labeled data, we propose a novel loss function that draws inspiration from traditional numerical methods used for solving PDEs. Furthermore, a hard encoding mechanism is employed to ensure that the predicted Green's function is perfectly matched with the boundary conditions. Based on the learned Green's function from the trained deep surrogate model, a fast solver is developed to solve the corresponding PDEs with different sources and boundary conditions. Various numerical examples are also provided to demonstrate the effectiveness of the proposed model.
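To illustrate why a known Green's function yields a fast solver, here is a minimal sketch that is not the paper's method: it swaps the learned Green's function of a reaction-diffusion operator on a rectangle for the closed-form Green's function of the 1D problem -u'' = f on (0, 1) with zero Dirichlet boundary values. The function names `green_1d` and `solve_with_green` are hypothetical.

```python
def green_1d(x, y):
    """Closed-form Green's function of -u'' = f on (0, 1) with u(0) = u(1) = 0."""
    return x * (1.0 - y) if x <= y else y * (1.0 - x)


def solve_with_green(f, x, n=1000):
    """Approximate u(x) = integral over (0, 1) of G(x, y) f(y) dy by the midpoint rule."""
    h = 1.0 / n
    total = 0.0
    for j in range(n):
        y = (j + 0.5) * h  # midpoint of the j-th quadrature cell
        total += green_1d(x, y) * f(y)
    return total * h


# For the constant source f = 1 the exact solution is u(x) = x (1 - x) / 2.
u_mid = solve_with_green(lambda y: 1.0, 0.5)
print(u_mid)
```

Once the Green's function is available, each new source term only costs one quadrature (a matrix-vector product in the discretized setting), which is the reason a learned Green's function can be reused across different sources.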
Stochastic interpolants with data-dependent couplings
Authors: Michael S. Albergo, Mark Goldstein, Nicholas M. Boffi, Rajesh Ranganath, Eric Vanden-Eijnden
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.03725
Pdf link: https://arxiv.org/pdf/2310.03725
Abstract Generative models inspired by dynamical transport of measure -- such as flows and diffusions -- construct a continuous-time map between two probability densities. Conventionally, one of these is the target density, only accessible through samples, while the other is taken as a simple base density that is data-agnostic. In this work, using the framework of stochastic interpolants, we formalize how to couple the base and the target densities. This enables us to incorporate information about class labels or continuous embeddings to construct dynamical transport maps that serve as conditional generative models. We show that these transport maps can be learned by solving a simple square loss regression problem analogous to the standard independent setting. We demonstrate the usefulness of constructing dependent couplings in practice through experiments in super-resolution and in-painting.
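The square-loss regression mentioned in this abstract can be sketched as follows. The notation is an assumption borrowed from the general stochastic-interpolant literature (a linear interpolant with coefficients $\alpha_t$, $\beta_t$, a joint density $\rho$ over base and target samples, and a learned velocity $\hat b_t$); the paper's exact formulation may differ.

```latex
x_t = \alpha_t x_0 + \beta_t x_1, \qquad (x_0, x_1) \sim \rho(x_0, x_1), \qquad
\alpha_0 = \beta_1 = 1, \quad \alpha_1 = \beta_0 = 0,
\\[4pt]
\mathcal{L}(\hat b) = \int_0^1 \mathbb{E}_{(x_0, x_1) \sim \rho}
\bigl\| \hat b_t(x_t) - \bigl(\dot\alpha_t x_0 + \dot\beta_t x_1\bigr) \bigr\|^2 \, dt
```

In the independent setting $\rho$ factorizes as a product of the base and target densities; a data-dependent coupling instead draws $x_0$ conditionally on $x_1$, while the regression objective keeps the same form.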
Aligning Text-to-Image Diffusion Models with Reward Backpropagation
Authors: Mihir Prabhudesai, Anirudh Goyal, Deepak Pathak, Katerina Fragkiadaki
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.03739
Pdf link: https://arxiv.org/pdf/2310.03739
Abstract Text-to-image diffusion models have recently emerged at the forefront of image generation, powered by very large-scale unsupervised or weakly supervised text-to-image training datasets. Due to their unsupervised training, controlling their behavior in downstream tasks, such as maximizing human-perceived image quality, image-text alignment, or ethical image generation, is difficult. Recent works finetune diffusion models to downstream reward functions using vanilla reinforcement learning, notorious for the high variance of the gradient estimators. In this paper, we propose AlignProp, a method that aligns diffusion models to downstream reward functions using end-to-end backpropagation of the reward gradient through the denoising process. While naive implementation of such backpropagation would require prohibitive memory resources for storing the partial derivatives of modern text-to-image models, AlignProp finetunes low-rank adapter weight modules and uses gradient checkpointing, to render its memory usage viable. We test AlignProp in finetuning diffusion models to various objectives, such as image-text semantic alignment, aesthetics, compressibility and controllability of the number of objects present, as well as their combinations. We show AlignProp achieves higher rewards in fewer training steps than alternatives, while being conceptually simpler, making it a straightforward choice for optimizing diffusion models for differentiable reward functions of interest. Code and Visualization results are available at https://align-prop.github.io/.
Keyword: adaptive

An adaptive stabilized trace finite element method for surface PDEs
Authors: Timo Heister, Maxim A. Olshanskii, Vladimir Yushutin
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2310.03089
Pdf link: https://arxiv.org/pdf/2310.03089
Abstract The paper introduces an adaptive version of the stabilized Trace Finite Element Method (TraceFEM) designed to solve low-regularity elliptic problems on level-set surfaces using a shape-regular bulk mesh in the embedding space. Two stabilization variants, gradient-jump face and normal-gradient volume, are considered for continuous trace spaces of the first and second degrees, based on the polynomial families $Q_1$ and $Q_2$. We propose a practical error indicator that estimates the 'jumps' of finite element solution derivatives across background mesh faces and avoids integration of any quantities along implicitly defined curvilinear edges of the discrete surface elements. For the $Q_1$ family of piecewise trilinear polynomials on bulk cells, the solve-estimate-mark-refine strategy, combined with the suggested error indicator, achieves optimal convergence rates typical of two-dimensional problems. We also provide a posteriori error estimates, establishing the reliability of the error indicator for the $Q_1$ and $Q_2$ elements and for two types of stabilization. In numerical experiments, we assess the reliability and efficiency of the error indicator. While both stabilizations are found to deliver comparable performance, the lowest-degree finite element space appears to be the more robust choice for the adaptive TraceFEM framework.
Multi-Task Learning For Reduced Popularity Bias In Multi-Territory Video Recommendations
Authors: Phanideep Gampa, Farnoosh Javadi, Belhassen Bayar, Ainur Yessenalina
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.03148
Pdf link: https://arxiv.org/pdf/2310.03148
Abstract Various data imbalances that naturally arise in a multi-territory personalized recommender system can lead to a significant item bias for globally prevalent items. A locally popular item can be overshadowed by a globally prevalent item. Moreover, users' viewership patterns/statistics can drastically change from one geographic location to another, which may suggest learning specific user embeddings. In this paper, we propose a multi-task learning (MTL) technique, along with an adaptive upsampling method to reduce popularity bias in multi-territory recommendations. Our proposed framework is designed to enrich training examples with active users' representation through upsampling, and is capable of learning geographic-based user embeddings by leveraging MTL. Through experiments, we demonstrate the effectiveness of our framework in multiple territories compared to a baseline not incorporating our proposed techniques. Notably, we show an improved relative gain of up to 65.27% in the PR-AUC metric. A case study is presented to demonstrate the advantages of our methods in attenuating the popularity bias of global items.
Toward One-Second Latency: Evolution of Live Media Streaming
Authors: Abdelhak Bentaleb, May Lim, Mehmet N. Akcay, Ali C. Begen, Sarra Hammoudi, Roger Zimmermann
Subjects: Networking and Internet Architecture (cs.NI); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2310.03256
Pdf link: https://arxiv.org/pdf/2310.03256
Abstract This survey presents the evolution of live media streaming and the technological developments behind today's IP-based low-latency live streaming systems. Live streaming primarily involves capturing, encoding, packaging and delivering real-time events such as live sports, live news, personal broadcasts and surveillance videos. Live streaming also involves concurrent streaming of linear TV programming off the satellite, cable, over-the-air or IPTV broadcast, where the programming is not necessarily a real-time event. The survey starts with a discussion on the latency and latency continuum in streaming applications. Then, it lays out the existing live streaming workflows and protocols, followed by an in-depth analysis of the latency sources in these workflows and protocols. The survey continues with the technology enablers, low-latency extensions for the popular HTTP adaptive streaming methods and enhancements for robust low-latency playback. An entire section is dedicated to the detailed summary and findings of Twitch's grand challenge on low-latency live streaming. The survey concludes with a discussion of ongoing research problems in this space.
LESSON: Learning to Integrate Exploration Strategies for Reinforcement Learning via an Option Framework
Authors: Woojun Kim, Jeonghye Kim, Youngchul Sung
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.03342
Pdf link: https://arxiv.org/pdf/2310.03342
Abstract In this paper, a unified framework for exploration in reinforcement learning (RL) is proposed based on an option-critic model. The proposed framework learns to integrate a set of diverse exploration strategies so that the agent can adaptively select the most effective exploration strategy over time to realize a relevant exploration-exploitation trade-off for each given task. The effectiveness of the proposed exploration framework is demonstrated by various experiments in the MiniGrid and Atari environments.
Progressive Adaptive Chance-Constrained Safeguards for Reinforcement Learning
Authors: Zhaorun Chen, Binhao Chen, Tairan He, Liang Gong, Chengliang Liu
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.03379
Pdf link: https://arxiv.org/pdf/2310.03379
Abstract Safety assurance of Reinforcement Learning (RL) is critical for exploration in real-world scenarios. In handling the Constrained Markov Decision Process, current approaches experience intrinsic difficulties in trading-off between optimality and feasibility. Direct optimization methods cannot strictly guarantee state-wise in-training safety while projection-based methods are usually inefficient and correct actions through lengthy iterations. To address these two challenges, this paper proposes an adaptive surrogate chance constraint for the safety cost, and a hierarchical architecture that corrects actions produced by the upper policy layer via a fast Quasi-Newton method. Theoretical analysis indicates that the relaxed probabilistic constraint can sufficiently guarantee forward invariance to the safe set. We validate the proposed method on 4 simulated and real-world safety-critical robotic tasks. Results indicate that the proposed method can efficiently enforce safety (nearly zero-violation), while preserving optimality (+23.8%), robustness and generalizability to stochastic real-world settings.
StegGuard: Fingerprinting Self-supervised Pre-trained Encoders via Secrets Embeder and Extractor
Authors: Xingdong Ren, Tianxing Zhang, Hanzhou Wu, Xinpeng Zhang, Yinggui Wang, Guangling Sun
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2310.03380
Pdf link: https://arxiv.org/pdf/2310.03380
Abstract In this work, we propose StegGuard, a novel fingerprinting mechanism to verify the ownership of the suspect pre-trained encoder using steganography. A critical perspective in StegGuard is that the unique characteristic of the transformation from an image to an embedding, conducted by the pre-trained encoder, can be equivalently exposed by how an embeder embeds secrets into images and how an extractor extracts the secrets from the encoder's embeddings with a tolerable error after the secrets are subjected to the encoder's transformation. While each independent encoder has a distinct transformation, the piracy encoder has a similar transformation to the victim. Based on these, we learn a pair of secrets embeder and extractor as the fingerprint for the victim encoder. We introduce a frequency-domain channel attention embedding block into the embeder to adaptively embed secrets into suitable frequency bands. During verification, if the secrets embedded into the query images can be extracted with an acceptable error from the suspect encoder's embeddings, the suspect encoder is determined to be a piracy encoder; otherwise it is determined to be independent. Extensive experiments demonstrate that, depending on a very limited number of query images, StegGuard can reliably identify across varied independent encoders, and is robust against model stealing related attacks including model extraction, fine-tuning, pruning, embedding noising and shuffle.
GRAPES: Learning to Sample Graphs for Scalable Graph Neural Networks
Authors: Taraneh Younesian, Thiviyan Thanapalasingam, Emile van Krieken, Daniel Daza, Peter Bloem
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.03399
Pdf link: https://arxiv.org/pdf/2310.03399
Abstract Graph neural networks (GNNs) learn the representation of nodes in a graph by aggregating the neighborhood information in various ways. As these networks grow in depth, their receptive field grows exponentially due to the increase in neighborhood sizes, resulting in high memory costs. Graph sampling solves memory issues in GNNs by sampling a small ratio of the nodes in the graph. This way, GNNs can scale to much larger graphs. Most sampling methods focus on fixed sampling heuristics, which may not generalize to different structures or tasks. We introduce GRAPES, an adaptive graph sampling method that learns to identify sets of influential nodes for training a GNN classifier. GRAPES uses a GFlowNet to learn node sampling probabilities given the classification objectives. We evaluate GRAPES across several small- and large-scale graph benchmarks and demonstrate its effectiveness in accuracy and scalability. In contrast to existing sampling methods, GRAPES maintains high accuracy even with small sample sizes and, therefore, can scale to very large graphs. Our code is publicly available at https://github.com/dfdazac/grapes.
How the level sampling process impacts zero-shot generalisation in deep reinforcement learning
Authors: Samuel Garcin, James Doran, Shangmin Guo, Christopher G. Lucas, Stefano V. Albrecht
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.03494
Pdf link: https://arxiv.org/pdf/2310.03494
Abstract A key limitation preventing the wider adoption of autonomous agents trained via deep reinforcement learning (RL) is their limited ability to generalise to new environments, even when these share similar characteristics with environments encountered during training. In this work, we investigate how a non-uniform sampling strategy of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents, considering two failure modes: overfitting and over-generalisation. As a first step, we measure the mutual information (MI) between the agent's internal representation and the set of training levels, which we find to be well-correlated to instance overfitting. In contrast to uniform sampling, adaptive sampling strategies prioritising levels based on their value loss are more effective at maintaining lower MI, which provides a novel theoretical justification for this class of techniques. We then turn our attention to unsupervised environment design (UED) methods, which adaptively generate new training levels and minimise MI more effectively than methods sampling from a fixed set. However, we find UED methods significantly shift the training distribution, resulting in over-generalisation and worse ZSG performance over the distribution of interest. To prevent both instance overfitting and over-generalisation, we introduce self-supervised environment design (SSED). SSED generates levels using a variational autoencoder, effectively reducing MI while minimising the shift with the distribution of interest, and leads to statistically significant improvements in ZSG over fixed-set level sampling strategies and UED methods.
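As a rough illustration of sampling levels adaptively by value loss, the sketch below uses rank-based prioritisation. This is an assumed scheme chosen for illustration; the abstract only states that levels are prioritised by value loss. The function name `rank_prioritised_probs` and the `temperature` parameter are hypothetical.

```python
import random


def rank_prioritised_probs(value_losses, temperature=0.3):
    """Map per-level value losses to sampling probabilities via rank weighting.

    The level with the highest loss gets rank 1 and the largest weight.
    """
    order = sorted(range(len(value_losses)),
                   key=lambda i: value_losses[i], reverse=True)
    weights = [0.0] * len(value_losses)
    for rank, idx in enumerate(order, start=1):
        weights[idx] = (1.0 / rank) ** (1.0 / temperature)
    total = sum(weights)
    return [w / total for w in weights]


losses = [0.1, 0.9, 0.4]           # hypothetical value losses for three levels
probs = rank_prioritised_probs(losses)
level = random.choices(range(len(losses)), weights=probs, k=1)[0]
print(probs, level)
```

A uniform sampler would correspond to equal weights for every level; lowering `temperature` concentrates sampling on the highest-loss levels.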
Keyword: quantization

QuATON: Quantization Aware Training of Optical Neurons
Authors: Hasindu Kariyawasam, Ramith Hettiarachchi, Dushan Wadduwage
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV); Optics (physics.optics)
Arxiv link: https://arxiv.org/abs/2310.03049
Pdf link: https://arxiv.org/pdf/2310.03049
Abstract Optical neural architectures (ONAs) use coding elements with optimized physical parameters to perform intelligent measurements. However, fabricating ONAs while maintaining design performances is challenging. Limitations in fabrication techniques often limit the realizable precision of the trained parameters. Physical constraints may also limit the range of values the physical parameters can hold. Thus, ONAs should be trained within the implementable constraints. However, such physics-based constraints reduce the training objective to a constrained optimization problem, making it harder to optimize with existing gradient-based methods. To alleviate these critical issues that degrade performance from simulation to realization we propose a physics-informed quantization-aware training framework. Our approach accounts for the physical constraints during the training process, leading to robust designs. We evaluate our approach on an ONA proposed in the literature, named a diffractive deep neural network (D2NN), for all-optical phase imaging and for classification of phase objects. With extensive experiments on different quantization levels and datasets, we show that our approach leads to ONA designs that are robust to quantization noise.
Matrix Completion from One-Bit Dither Samples
Authors: Arian Eamaz, Farhang Yeganegi, Mojtaba Soltanalian
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2310.03224
Pdf link: https://arxiv.org/pdf/2310.03224
Abstract We explore the impact of coarse quantization on matrix completion in the extreme scenario of dithered one-bit sensing, where the matrix entries are compared with time-varying threshold levels. In particular, instead of observing a subset of high-resolution entries of a low-rank matrix, we have access to a small number of one-bit samples, generated as a result of these comparisons. In order to recover the low-rank matrix using its coarsely quantized known entries, we begin by transforming the problem of one-bit matrix completion (one-bit MC) with time-varying thresholds into a nuclear norm minimization problem. The one-bit sampled information is represented as linear inequality feasibility constraints. We then develop the popular singular value thresholding (SVT) algorithm to accommodate these inequality constraints, resulting in the creation of the One-Bit SVT (OB-SVT). Our findings demonstrate that incorporating multiple time-varying sampling threshold sequences in one-bit MC can significantly improve the performance of the matrix completion algorithm. In pursuit of achieving this objective, we utilize diverse thresholding schemes, namely uniform, Gaussian, and discrete thresholds. To accelerate the convergence of our proposed algorithm, we introduce three variants of the OB-SVT algorithm. Among these variants is the randomized sketched OB-SVT, which departs from using the entire information at each iteration, opting instead to utilize sketched data. This approach effectively reduces the dimension of the operational space and accelerates the convergence. We perform numerical evaluations comparing our proposed algorithm with the maximum likelihood estimation method previously employed for one-bit MC, and demonstrate that our approach can achieve a better recovery performance.
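The step of representing one-bit samples as linear inequality constraints can be sketched for a single matrix entry. This is an illustrative sketch only, covering the sampling and feasibility check and not the OB-SVT algorithm itself; the function names and the example thresholds are hypothetical.

```python
def one_bit_samples(x, thresholds):
    """Compare a scalar entry x with each dither threshold; return +1 / -1 signs."""
    return [1 if x >= tau else -1 for tau in thresholds]


def is_feasible(x_hat, thresholds, signs):
    """Check the linear inequalities r_k * (x_hat - tau_k) >= 0 for every sample k."""
    return all(r * (x_hat - tau) >= 0 for r, tau in zip(signs, thresholds))


x_true = 0.37                          # hypothetical matrix entry
taus = [-0.8, -0.1, 0.2, 0.5, 0.9]     # hypothetical time-varying thresholds
signs = one_bit_samples(x_true, taus)

print(signs)
print(is_feasible(x_true, taus, signs))   # the true entry always satisfies its own constraints
print(is_feasible(0.6, taus, signs))      # a candidate on the wrong side of a threshold does not
```

Each additional threshold narrows the interval of values consistent with the observed signs, which gives some intuition for why multiple threshold sequences can help recovery.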
EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models
Authors: Yefei He, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.03270
Pdf link: https://arxiv.org/pdf/2310.03270
Abstract Diffusion models have demonstrated remarkable capabilities in image synthesis and related generative tasks. Nevertheless, their practicality for low-latency real-world applications is constrained by substantial computational costs and latency issues. Quantization is a dominant way to compress and accelerate diffusion models, where post-training quantization (PTQ) and quantization-aware training (QAT) are two main approaches, each bearing its own properties. While PTQ exhibits efficiency in terms of both time and data usage, it may lead to diminished performance in low bit-width. On the other hand, QAT can alleviate performance degradation but comes with substantial demands on computational and data resources. To capitalize on the advantages while avoiding their respective drawbacks, we introduce a data-free and parameter-efficient fine-tuning framework for low-bit diffusion models, dubbed EfficientDM, to achieve QAT-level performance with PTQ-like efficiency. Specifically, we propose a quantization-aware variant of the low-rank adapter (QALoRA) that can be merged with model weights and jointly quantized to low bit-width. The fine-tuning process distills the denoising capabilities of the full-precision model into its quantized counterpart, eliminating the requirement for training data. We also introduce scale-aware optimization and employ temporal learned step-size quantization to further enhance performance. Extensive experimental results demonstrate that our method significantly outperforms previous PTQ-based diffusion models while maintaining similar time and data efficiency. Specifically, there is only a marginal 0.05 sFID increase when quantizing both weights and activations of LDM-4 to 4-bit on ImageNet 256x256. Compared to QAT-based methods, our EfficientDM also boasts a 16.2x faster quantization speed with comparable generation quality.
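The idea of merging a low-rank adapter into the weights and quantizing the result jointly can be sketched as below. This is a generic illustration with symmetric uniform quantization, not the paper's QALoRA; the function names, the per-tensor scale, and the example matrices are assumptions.

```python
def matmul(B, A):
    """Multiply two list-of-lists matrices."""
    return [[sum(b * a for b, a in zip(row, col)) for col in zip(*A)] for row in B]


def merge_and_quantize(W, B, A, bits=4):
    """Merge the low-rank update B @ A into W, then quantize the merged weights.

    Returns the signed integer codes and the per-tensor scale.
    """
    delta = matmul(B, A)
    merged = [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for row in merged for v in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [[max(-qmax - 1, min(qmax, round(v / scale))) for v in row] for row in merged]
    return codes, scale


W = [[1.0, 0.0], [0.0, -1.0]]   # hypothetical full-precision weights
B = [[1.0], [0.0]]              # rank-1 adapter factors
A = [[0.0, 0.4]]
codes, scale = merge_and_quantize(W, B, A, bits=4)
print(codes, scale)
```

Because the adapter is folded in before quantization, inference only needs the single low-bit weight tensor rather than a quantized weight plus a separate full-precision adapter.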
Robustness-Guided Image Synthesis for Data-Free Quantization
Authors: Jianhong Bai, Yuchen Yang, Huanpeng Chu, Hualiang Wang, Zuozhu Liu, Ruizhe Chen, Xiaoxuan He, Lianrui Mu, Chengfei Cai, Haoji Hu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.03661
Pdf link: https://arxiv.org/pdf/2310.03661
Abstract Quantization has emerged as a promising direction for model compression. Recently, data-free quantization has been widely studied as a promising method to avoid privacy concerns, which synthesizes images as an alternative to real training data. Existing methods use classification loss to ensure the reliability of the synthesized images. Unfortunately, even if these images are well-classified by the pre-trained model, they still suffer from low semantics and homogenization issues. Intuitively, these low-semantic images are sensitive to perturbations, and the pre-trained model tends to have inconsistent output when the generator synthesizes an image with poor semantics. To this end, we propose Robustness-Guided Image Synthesis (RIS), a simple but effective method to enrich the semantics of synthetic images and improve image diversity, further boosting the performance of downstream data-free compression tasks. Concretely, we first introduce perturbations on input and model weight, then define the inconsistency metrics at feature and prediction levels before and after perturbations. On the basis of inconsistency on two levels, we design a robustness optimization objective to enhance the semantics of synthetic images. Moreover, we also make our approach diversity-aware by forcing the generator to synthesize images with small correlations in the label space. With RIS, we achieve state-of-the-art performance for various settings on data-free quantization, and the method can be extended to other data-free compression tasks.
Hadamard Domain Training with Integers for Class Incremental Quantized Learning
Authors: Martin Schiemer, Clemens JS Schaefer, Jayden Parker Vap, Mark James Horeni, Yu Emma Wang, Juan Ye, Siddharth Joshi
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.03675
Pdf link: https://arxiv.org/pdf/2310.03675
Abstract Continual learning is a desirable feature in many modern machine learning applications, which allows in-field adaptation and updating, ranging from accommodating distribution shift, to fine-tuning, and to learning new tasks. For applications with privacy and low latency requirements, the compute and memory demands imposed by continual learning can be cost-prohibitive for resource-constrained edge platforms. Reducing computational precision through fully quantized training (FQT) simultaneously reduces memory footprint and increases compute efficiency for both training and inference. However, aggressive quantization, especially integer FQT, typically degrades model accuracy to unacceptable levels. In this paper, we propose a technique that leverages inexpensive Hadamard transforms to enable low-precision training with only integer matrix multiplications. We further determine which tensors need stochastic rounding and propose tiled matrix multiplication to enable low-bit-width accumulators. We demonstrate the effectiveness of our technique on several human activity recognition datasets and CIFAR100 in a class incremental learning setting. We achieve less than 0.5% and 3% accuracy degradation while quantizing all matrix multiplication inputs down to 4 bits with 8-bit accumulators.
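Two building blocks named in this abstract, the Hadamard transform and stochastic rounding, can be sketched in plain Python. This is a generic illustration and not the paper's training scheme; the function names are hypothetical. The example relies on the standard identity that, for the unnormalised Hadamard matrix H of size n, (H a) . (H b) = n (a . b), so inner products can be computed in the transformed domain.

```python
import random


def fwht(x):
    """Unnormalised fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = list(x)
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b   # butterfly: additions and subtractions only
        h *= 2
    return x


def stochastic_round(v, rng=random):
    """Round v up with probability equal to its fractional part, else down (unbiased)."""
    lo = int(v // 1)
    return lo + (1 if rng.random() < v - lo else 0)


a = [1, 2, 3, 4]
b = [4, 3, 2, 1]
dot_plain = sum(p * q for p, q in zip(a, b))
dot_hadamard = sum(p * q for p, q in zip(fwht(a), fwht(b))) // len(a)
print(dot_plain, dot_hadamard)
print(stochastic_round(2.3))
```

The transform uses only integer additions and subtractions on integer inputs, which is what makes it inexpensive relative to the matrix multiplications it surrounds.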

A-suozhang / GetArxivDaily

New submissions for Fri, 6 Oct 23 #169

Keyword: efficient

A quantum system control method based on enhanced reinforcement learning

A Deep Reinforcement Learning Approach for Interactive Search with Sentence-level Feedback

Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models

Batch-less stochastic gradient descent for compressive learning of deep regularization for image denoising

Privacy-preserving Multi-biometric Indexing based on Frequent Binary Patterns

NOCAP: Near-Optimal Correlation-Aware Partitioning Joins

Reinforcement Learning-based Mixture of Vision Transformers for Video Violence Recognition

Efficient Federated Prompt Tuning for Black-box Large Pre-trained Models

Application-Oriented Co-Design of Motors and Motions for a 6DOF Robot Manipulator

Design and Optimization of Heterogeneous Coded Distributed Computing with Nonuniform File Popularity

New Auction Algorithms for the Assignment Problem and Extensions

Enhancing Accuracy in Deep Learning Using Random Matrix Theory

Raze to the Ground: Query-Efficient Adversarial HTML Attacks on Machine-Learning Phishing Webpage Detectors

Talking Models: Distill Pre-trained Knowledge to Downstream Models via Interactive Communication

PDR-CapsNet: an Energy-Efficient Parallel Approach to Dynamic Routing in Capsule Networks

TacoGFN: Target Conditioned GFlowNet for Structure-Based Drug Design

History Matching for Geological Carbon Storage using Data-Space Inversion with Spatio-Temporal Data Parameterization

MORALS: Analysis of High-Dimensional Robot Controllers via Topological Tools in a Latent Space

Xcrum: A Synergistic Approach Integrating Extreme Programming with Scrum

EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models

Network Alignment with Transferable Graph Autoencoders

LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers

Can pre-trained models assist in dataset distillation?

Concise and Organized Perception Facilitates Large Language Models for Deductive Reasoning

Enhanced Human-Robot Collaboration using Constrained Probabilistic Human-Motion Prediction

BioBridge: Bridging Biomedical Foundation Models via Knowledge Graph

Investigating the Limitation of CLIP Models: The Worst-Performing Categories

CSI: Enhancing the Robustness of 3D Point Cloud Recognition against Corruption

Motivating Next-Generation OS Physical Memory Management for Terabyte-Scale NVMMs

Design Optimizer for Planar Soft-Growing Robot Manipulators

Progressive Adaptive Chance-Constrained Safeguards for Reinforcement Learning

Uncertainty quantification for deep learning-based schemes for solving high-dimensional backward stochastic differential equations

Learning to Simplify Spatial-Temporal Graphs in Gait Analysis

IoTScent: Enhancing Forensic Capabilities in Internet of Things Gateways

RUSOpt: Robotic UltraSound Probe Normalization with Bayesian Optimization for In-plane and Out-plane Scanning

Pre-Training and Fine-Tuning Generative Flow Networks

Which mode is better for federated learning? Centralized or Decentralized

Controllable Multi-document Summarization: Coverage & Coherence Intuitive Policy with Large Language Model Based Rewards

Fair Division with Allocator's Preference

Supervising Smart Home Device Interactions: A Profile-Based Firewall Approach

High-dimensional Bayesian Optimization with Group Testing

Large Language Models for Software Engineering: Survey and Open Problems

Reverse-Mode AD of Reduce-by-Index and Scan in Futhark

Liquid Cooling System for a High Power, Medium Frequency, and Medium Voltage Isolated Power Converter

Smoothing Methods for Automatic Differentiation Across Conditional Branches

Solving a Class of Non-Convex Minimax Optimization in Federated Learning

Animatable Virtual Humans: Learning pose-dependent human representations in UV space for interactive performance synthesis

RouteKG: A knowledge graph-based framework for route prediction on road networks

Distributional PAC-Learning from Nisan's Natural Proofs

Deep surrogate model for learning Green's function associated with linear reaction-diffusion operator

Regress Before Construct: Regress Autoencoder for Point Cloud Self-supervised Learning

PV-OSIMr: A Lowest Order Complexity Algorithm for Computing the Delassus Matrix

Multimarginal generative modeling with stochastic interpolants

Beyond One-Preference-for-All: Multi-Objective Direct Preference Optimization

Constraint-Conditioned Policy Optimization for Versatile Safe Reinforcement Learning

Improved Baselines with Visual Instruction Tuning

Keyword: faster

Physics-Informed Neural Networks for Accelerating Power System State Estimation

Speech-Based Human-Exoskeleton Interaction for Lower Limb Motion Planning

Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly

FedHyper: A Universal and Robust Learning Rate Scheduler for Federated Learning with Hypergradient Descent

Neural architecture impact on identifying temporally extended Reinforcement Learning tasks

PDR-CapsNet: an Energy-Efficient Parallel Approach to Dynamic Routing in Capsule Networks

TacoGFN: Target Conditioned GFlowNet for Structure-Based Drug Design

EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models

Generalized Benders Decomposition with Continual Learning for Hybrid Model Predictive Control in Dynamic Environment

DirectGPT: A Direct Manipulation Interface to Interact with Large Language Models

Keyword: mobile

Speech-Based Human-Exoskeleton Interaction for Lower Limb Motion Planning

Roadmaps with Gaps over Controllers: Achieving Efficiency in Planning under Dynamics

Non-coresident family as a driver of migration change in a crisis: The case of the COVID-19 pandemic

RadaRays: Real-time Simulation of Rotating FMCW Radar for Mobile Robotics via Hardware-accelerated Ray Tracing

Open RAN for 5G Supply Chain Diversification: The BEACON-5G Approach and Key Achievements

Keyword: pruning

Enhancing Accuracy in Deep Learning Using Random Matrix Theory

StegGuard: Fingerprinting Self-supervised Pre-trained Encoders via Secrets Embeder and Extractor

Neural Language Model Pruning for Automatic Speech Recognition