New submissions for Fri, 4 Aug 23

Keyword: efficient

DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales

Authors: Zhewei Yao, Reza Yazdani Aminabadi, Olatunji Ruwase, Samyam Rajbhandari, Xiaoxia Wu, Ammar Ahmad Awan, Jeff Rasley, Minjia Zhang, Conglong Li, Connor Holmes, Zhongzhu Zhou, Michael Wyatt, Molly Smith, Lev Kurilenko, Heyang Qin, Masahiro Tanaka, Shuai Che, Shuaiwen Leon Song, Yuxiong He
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2308.01320
Pdf link: https://arxiv.org/pdf/2308.01320
Abstract ChatGPT-like models have revolutionized various applications in artificial intelligence, from summarization and coding to translation, matching or even surpassing human performance. However, the current landscape lacks an accessible, efficient, and cost-effective end-to-end RLHF (Reinforcement Learning with Human Feedback) training pipeline for these powerful models, particularly when training at the scale of billions of parameters. This paper introduces DeepSpeed-Chat, a novel system that democratizes RLHF training, making it accessible to the AI community. DeepSpeed-Chat offers three key capabilities: an easy-to-use training and inference experience for ChatGPT-like models, a DeepSpeed-RLHF pipeline that replicates the training pipeline from InstructGPT, and a robust DeepSpeed-RLHF system that combines various optimizations for training and inference in a unified way. The system delivers unparalleled efficiency and scalability, enabling training of models with hundreds of billions of parameters in record time and at a fraction of the cost. With this development, DeepSpeed-Chat paves the way for broader access to advanced RLHF training, even for data scientists with limited resources, thereby fostering innovation and further development in the field of AI.
Well-posedness and error estimates for coupled systems of nonlocal conservation laws
Authors: Aekta Aggarwal, Helge Holden, Ganesh Vaidya
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
Arxiv link: https://arxiv.org/abs/2308.01411
Pdf link: https://arxiv.org/pdf/2308.01411
Abstract This article deals with the error estimates for numerical approximations of the entropy solutions of coupled systems of nonlocal hyperbolic conservation laws. The systems can be strongly coupled through the nonlocal coefficient present in the convection term. A fairly general class of fluxes is being considered, where the local part of the flux can be discontinuous at infinitely many points, with possible accumulation points. The aims of the paper are threefold: 1. Establishing existence of entropy solutions with rough local flux for such systems, by deriving a uniform BV bound on the numerical approximations; 2. Deriving a general Kuznetsov-type lemma (and hence uniqueness) for such systems with both smooth and rough local fluxes; 3. Proving the convergence rate of the finite volume approximations to the entropy solutions of the system as $1/2$ and $1/3$, with homogeneous (in any dimension) and rough local parts (in one dimension), respectively. Numerical experiments are included to illustrate the convergence rates.
Novel Physics-Based Machine-Learning Models for Indoor Air Quality Approximations
Authors: Ahmad Mohammadshirazi, Aida Nadafian, Amin Karimi Monsefi, Mohammad H. Rafiei, Rajiv Ramnath
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Data Analysis, Statistics and Probability (physics.data-an)
Arxiv link: https://arxiv.org/abs/2308.01438
Pdf link: https://arxiv.org/pdf/2308.01438
Abstract Cost-effective sensors are capable of real-time capturing a variety of air quality-related modalities from different pollutant concentrations to indoor/outdoor humidity and temperature. Machine learning (ML) models are capable of performing air-quality "ahead-of-time" approximations. Undoubtedly, accurate indoor air quality approximation significantly helps provide a healthy indoor environment, optimize associated energy consumption, and offer human comfort. However, it is crucial to design an ML architecture to capture the domain knowledge, so-called problem physics. In this study, we propose six novel physics-based ML models for accurate indoor pollutant concentration approximations. The proposed models include an adroit combination of state-space concepts in physics, Gated Recurrent Units, and Decomposition techniques. The proposed models were illustrated using data collected from five offices in a commercial building in California. The proposed models are shown to be less complex, computationally more efficient, and more accurate than similar state-of-the-art transformer-based models. The superiority of the proposed models is due to their relatively light architecture (computational efficiency) and, more importantly, their ability to capture the underlying highly nonlinear patterns embedded in the often contaminated sensor-collected indoor air quality temporal data.
Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving
Authors: Ben Agro, Quinlan Sykora, Sergio Casas, Raquel Urtasun
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2308.01471
Pdf link: https://arxiv.org/pdf/2308.01471
Abstract A self-driving vehicle (SDV) must be able to perceive its surroundings and predict the future behavior of other traffic participants. Existing works either perform object detection followed by trajectory forecasting of the detected objects, or predict dense occupancy and flow grids for the whole scene. The former poses a safety concern as the number of detections needs to be kept low for efficiency reasons, sacrificing object recall. The latter is computationally expensive due to the high-dimensionality of the output grid, and suffers from the limited receptive field inherent to fully convolutional networks. Furthermore, both approaches employ many computational resources predicting areas or objects that might never be queried by the motion planner. This motivates our unified approach to perception and future prediction that implicitly represents occupancy and flow over time with a single neural network. Our method avoids unnecessary computation, as it can be directly queried by the motion planner at continuous spatio-temporal locations. Moreover, we design an architecture that overcomes the limited receptive field of previous explicit occupancy prediction methods by adding an efficient yet effective global attention mechanism. Through extensive experiments in both urban and highway settings, we demonstrate that our implicit model outperforms the current state-of-the-art. For more information, visit the project website: https://waabi.ai/research/implicito.
Decentralized Translator of Trust: Supporting Heterogeneous TEE for Critical Infrastructure Protection
Authors: Rabimba Karanjai, Rowan Collier, Zhimin Gao, Lin Chen, Xinxin Fan, Taeweon Suh, Weidong Shi, Lei Xu
Subjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR)
Arxiv link: https://arxiv.org/abs/2308.01474
Pdf link: https://arxiv.org/pdf/2308.01474
Abstract Trusted execution environment (TEE) technology has found many applications in mitigating various security risks in an efficient manner, which is attractive for critical infrastructure protection. First, the natural of critical infrastructure requires it to be well protected from various cyber attacks. Second, performance is usually important for critical infrastructure and it cannot afford an expensive protection mechanism. While a large number of TEE-based critical infrastructure protection systems have been proposed to address various security challenges (e.g., secure sensing and reliable control), most existing works ignore one important feature, i.e., devices comprised the critical infrastructure may be equipped with multiple incompatible TEE technologies and belongs to different owners. This feature makes it hard for these devices to establish mutual trust and form a unified TEE environment. To address these challenges and fully unleash the potential of TEE technology for critical infrastructure protection, we propose DHTee, a decentralized coordination mechanism. DHTee uses blockchain technology to support key TEE functions in a heterogeneous TEE environment, especially the attestation service. A Device equipped with one TEE can interact securely with the blockchain to verify whether another potential collaborating device claiming to have a different TEE meets the security requirements. DHTee is also flexible and can support new TEE schemes without affecting devices using existing TEEs that have been supported by the system.
Efficient neural supersampling on a novel gaming dataset
Authors: Antoine Mercier, Ruan Erasmus, Yashesh Savani, Manik Dhingra, Fatih Porikli, Guillaume Berger
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2308.01483
Pdf link: https://arxiv.org/pdf/2308.01483
Abstract Real-time rendering for video games has become increasingly challenging due to the need for higher resolutions, framerates and photorealism. Supersampling has emerged as an effective solution to address this challenge. Our work introduces a novel neural algorithm for supersampling rendered content that is 4 times more efficient than existing methods while maintaining the same level of accuracy. Additionally, we introduce a new dataset which provides auxiliary modalities such as motion vectors and depth generated using graphics rendering features like viewport jittering and mipmap biasing at different resolutions. We believe that this dataset fills a gap in the current dataset landscape and can serve as a valuable resource to help measure progress in the field and advance the state-of-the-art in super-resolution techniques for gaming content.
Minimax Optimal $Q$ Learning with Nearest Neighbors
Authors: Puning Zhao, Lifeng Lai
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2308.01490
Pdf link: https://arxiv.org/pdf/2308.01490
Abstract $Q$ learning is a popular model free reinforcement learning method. Most of existing works focus on analyzing $Q$ learning for finite state and action spaces. If the state space is continuous, then the original $Q$ learning method can not be directly used. A modification of the original $Q$ learning method was proposed in (Shah and Xie, 2018), which estimates $Q$ values with nearest neighbors. Such modification makes $Q$ learning suitable for continuous state space. (Shah and Xie, 2018) shows that the convergence rate of estimated $Q$ function is $\tilde{O}(T^{-1/(d+3)})$, which is slower than the minimax lower bound $\tilde{\Omega}(T^{-1/(d+2)})$, indicating that this method is not efficient. This paper proposes two new $Q$ learning methods to bridge the gap of convergence rates in (Shah and Xie, 2018), with one of them being offline, while the other is online. Despite that we still use nearest neighbor approach to estimate $Q$ function, the algorithms are crucially different from (Shah and Xie, 2018). In particular, we replace the kernel nearest neighbor in discretized region with a direct nearest neighbor approach. Consequently, our approach significantly improves the convergence rate. Moreover, the time complexity is also significantly improved in high dimensional state spaces. Our analysis shows that both offline and online methods are minimax rate optimal.
Multi-Objective Optimization for UAV Swarm-Assisted IoT with Virtual Antenna Arrays
Authors: Jiahui Li, Geng Sun, Lingjie Duan, Qingqing Wu
Subjects: Neural and Evolutionary Computing (cs.NE); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2308.01511
Pdf link: https://arxiv.org/pdf/2308.01511
Abstract Unmanned aerial vehicle (UAV) network is a promising technology for assisting Internet-of-Things (IoT), where a UAV can use its limited service coverage to harvest and disseminate data from IoT devices with low transmission abilities. The existing UAV-assisted data harvesting and dissemination schemes largely require UAVs to frequently fly between the IoTs and access points, resulting in extra energy and time costs. To reduce both energy and time costs, a key way is to enhance the transmission performance of IoT and UAVs. In this work, we introduce collaborative beamforming into IoTs and UAVs simultaneously to achieve energy and time-efficient data harvesting and dissemination from multiple IoT clusters to remote base stations (BSs). Except for reducing these costs, another non-ignorable threat lies in the existence of the potential eavesdroppers, whereas the handling of eavesdroppers often increases the energy and time costs, resulting in a conflict with the minimization of the costs. Moreover, the importance of these goals may vary relatively in different applications. Thus, we formulate a multi-objective optimization problem (MOP) to simultaneously minimize the mission completion time, signal strength towards the eavesdropper, and total energy cost of the UAVs. We prove that the formulated MOP is an NP-hard, mixed-variable optimization, and large-scale optimization problem. Thus, we propose a swarm intelligence-based algorithm to find a set of candidate solutions with different trade-offs which can meet various requirements in a low computational complexity. We also show that swarm intelligence methods need to enhance solution initialization, solution update, and algorithm parameter update phases when dealing with mixed-variable optimization and large-scale problems. Simulation results demonstrate the proposed algorithm outperforms state-of-the-art swarm intelligence algorithms.
Erase and Repair: An Efficient Box-Free Removal Attack on High-Capacity Deep Hiding
Authors: Hangcheng Liu, Tao Xiang, Shangwei Guo, Han Li, Tianwei Zhang, Xiaofeng Liao
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2308.01512
Pdf link: https://arxiv.org/pdf/2308.01512
Abstract Deep hiding, embedding images with others using deep neural networks, has demonstrated impressive efficacy in increasing the message capacity and robustness of secret sharing. In this paper, we challenge the robustness of existing deep hiding schemes by preventing the recovery of secret images, building on our in-depth study of state-of-the-art deep hiding schemes and their vulnerabilities. Leveraging our analysis, we first propose a simple box-free removal attack on deep hiding that does not require any prior knowledge of the deep hiding schemes. To improve the removal performance on the deep hiding schemes that may be enhanced by adversarial training, we further design a more powerful removal attack, efficient box-free removal attack (EBRA), which employs image inpainting techniques to remove secret images from container images. In addition, to ensure the effectiveness of our attack and preserve the fidelity of the processed container images, we design an erasing phase based on the locality of deep hiding to remove secret information and then make full use of the visual information of container images to repair the erased visual content. Extensive evaluations show our method can completely remove secret images from container images with negligible impact on the quality of container images.
Quantum Multi-Agent Reinforcement Learning for Autonomous Mobility Cooperation
Authors: Soohyun Park, Jae Pyoung Kim, Chanyoung Park, Soyi Jung, Joongheon Kim
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2308.01519
Pdf link: https://arxiv.org/pdf/2308.01519
Abstract For Industry 4.0 Revolution, cooperative autonomous mobility systems are widely used based on multi-agent reinforcement learning (MARL). However, the MARL-based algorithms suffer from huge parameter utilization and convergence difficulties with many agents. To tackle these problems, a quantum MARL (QMARL) algorithm based on the concept of actor-critic network is proposed, which is beneficial in terms of scalability, to deal with the limitations in the noisy intermediate-scale quantum (NISQ) era. Additionally, our QMARL is also beneficial in terms of efficient parameter utilization and fast convergence due to quantum supremacy. Note that the reward in our QMARL is defined as task precision over computation time in multiple agents, thus, multi-agent cooperation can be realized. For further improvement, an additional technique for scalability is proposed, which is called projection value measure (PVM). Based on PVM, our proposed QMARL can achieve the highest reward, by reducing the action dimension into a logarithmic-scale. Finally, we can conclude that our proposed QMARL with PVM outperforms the other algorithms in terms of efficient parameter utilization, fast convergence, and scalability.
PPI-NET: End-to-End Parametric Primitive Inference
Authors: Liang Wang, Xiaogang Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2308.01521
Pdf link: https://arxiv.org/pdf/2308.01521
Abstract In engineering applications, line, circle, arc, and point are collectively referred to as primitives, and they play a crucial role in path planning, simulation analysis, and manufacturing. When designing CAD models, engineers typically start by sketching the model's orthographic view on paper or a whiteboard and then translate the design intent into a CAD program. Although this design method is powerful, it often involves challenging and repetitive tasks, requiring engineers to perform numerous similar operations in each design. To address this conversion process, we propose an efficient and accurate end-to-end method that avoids the inefficiency and error accumulation issues associated with using auto-regressive models to infer parametric primitives from hand-drawn sketch images. Since our model samples match the representation format of standard CAD software, they can be imported into CAD software for solving, editing, and applied to downstream design tasks.
Multimodal Adaptation of CLIP for Few-Shot Action Recognition
Authors: Jiazheng Xing, Mengmeng Wang, Xiaojun Hou, Guang Dai, Jingdong Wang, Yong Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2308.01532
Pdf link: https://arxiv.org/pdf/2308.01532
Abstract Applying large-scale pre-trained visual models like CLIP to few-shot action recognition tasks can benefit performance and efficiency. Utilizing the "pre-training, fine-tuning" paradigm makes it possible to avoid training a network from scratch, which can be time-consuming and resource-intensive. However, this method has two drawbacks. First, limited labeled samples for few-shot action recognition necessitate minimizing the number of tunable parameters to mitigate over-fitting, also leading to inadequate fine-tuning that increases resource consumption and may disrupt the generalized representation of models. Second, the video's extra-temporal dimension challenges few-shot recognition's effective temporal modeling, while pre-trained visual models are usually image models. This paper proposes a novel method called Multimodal Adaptation of CLIP (MA-CLIP) to address these issues. It adapts CLIP for few-shot action recognition by adding lightweight adapters, which can minimize the number of learnable parameters and enable the model to transfer across different tasks quickly. The adapters we design can combine information from video-text multimodal sources for task-oriented spatiotemporal modeling, which is fast, efficient, and has low training costs. Additionally, based on the attention mechanism, we design a text-guided prototype construction module that can fully utilize video-text information to enhance the representation of video prototypes. Our MA-CLIP is plug-and-play, which can be used in any different few-shot action recognition temporal alignment metric.
Avoidance Navigation Based on Offline Pre-Training Reinforcement Learning
Authors: Yang Wenkai Ji Ruihang Zhang Yuxiang Lei Hao, Zhao Zijie
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2308.01551
Pdf link: https://arxiv.org/pdf/2308.01551
Abstract This paper presents a Pre-Training Deep Reinforcement Learning(DRL) for avoidance navigation without map for mobile robots which map raw sensor data to control variable and navigate in an unknown environment. The efficient offline training strategy is proposed to speed up the inefficient random explorations in early stage and we also collect a universal dataset including expert experience for offline training, which is of some significance for other navigation training work. The pre-training and prioritized expert experience are proposed to reduce 80\% training time and has been verified to improve the 2 times reward of DRL. The advanced simulation gazebo with real physical modelling and dynamic equations reduce the gap between sim-to-real. We train our model a corridor environment, and evaluate the model in different environment getting the same effect. Compared to traditional method navigation, we can confirm the trained model can be directly applied into different scenarios and have the ability to no collision navigate. It was demonstrated that our DRL model have universal general capacity in different environment.
Fast Slate Policy Optimization: Going Beyond Plackett-Luce
Authors: Otmane Sakhi, David Rohde, Nicolas Chopin
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2308.01566
Pdf link: https://arxiv.org/pdf/2308.01566
Abstract An increasingly important building block of large scale machine learning systems is based on returning slates; an ordered lists of items given a query. Applications of this technology include: search, information retrieval and recommender systems. When the action space is large, decision systems are restricted to a particular structure to complete online queries quickly. This paper addresses the optimization of these large scale decision systems given an arbitrary reward function. We cast this learning problem in a policy optimization framework and propose a new class of policies, born from a novel relaxation of decision functions. This results in a simple, yet efficient learning algorithm that scales to massive action spaces. We compare our method to the commonly adopted Plackett-Luce policy class and demonstrate the effectiveness of our approach on problems with action space sizes in the order of millions.
Another Hamiltonian Cycle in Bipartite Pfaffian Graphs
Authors: Andreas Björklund, Petteri Kaski, Jesper Nederlof
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2308.01574
Pdf link: https://arxiv.org/pdf/2308.01574
Abstract We present a linear-time algorithm that, given as input (i) a bipartite Pfaffian graph $G$ of minimum degree three, (ii) a Hamiltonian cycle $H$ in $G$, and (iii) an edge $e$ in $H$, outputs at least three other Hamiltonian cycles through the edge $e$ in $G$. This linear-time complexity of finding another Hamiltonian cycle given one is in sharp contrast to the problem of deciding the existence of a Hamiltonian cycle, which is NP-complete already for cubic bipartite planar graphs; such graphs are Pfaffian. Also, without the degree requirement, we show that it is NP-hard to find another Hamiltonian cycle in a bipartite Pfaffian graph. We present further improved algorithms for finding optimal traveling salesperson tours and counting Hamiltonian cycles in bipartite planar graphs with running times that are not known to hold in general planar graphs. We prove our results by a new structural technique that efficiently witnesses each Hamiltonian cycle $H$ through an arbitrary fixed anchor edge $e$ in a bipartite Pfaffian graph using a two-coloring of the vertices as advice that is unique to $H$. Previous techniques -- the Cut&Count technique of Cygan et al. [FOCS'11, TALG'22] in particular -- were able to reduce the Hamiltonian cycle problem only to essentially counting problems; our results show that counting can be avoided by leveraging properties of bipartite Pfaffian graphs. Our technique also has purely graph-theoretical consequences; for example, we show that every cubic bipartite Pfaffian graph has either zero or at least six distinct Hamiltonian cycles; the latter case is tight for the cube graph.
Analyzing Bank Account Information of Nominees and Scammers
Authors: Patsita Sirawongphatsara, Phisit Pornpongtechavanich, Pakkasit Sriamorntrakul, Therdpong Daengsi
Subjects: Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2308.01586
Pdf link: https://arxiv.org/pdf/2308.01586
Abstract Nowadays, people heavily rely on the Internet for various activities, such as e-commerce (e.g., online shopping) and online banking. While online transactions are practical, they also provide scammers with a new way to exploit unsuspecting individuals. This study and investigation utilized data from ChaladOhn, a website designed and developed by academics and policemen. The data covered the period from February 2022 to January 2023. After analyzing and investigating, the results reveal that the total losses amounted to over 3,100 million Thai Baht, with each case incurring losses of less than 10 million. Furthermore, the investigation discovered the involvement of the top two banks in the market, KB** and BB, in the fraud. These banks accounted for: 1) 28.2% and 16.0% of the total number of scam accounts, 2) 25.6% and 20.5% of the total transactions, and 3) 35.7% and 14.9% of the total losses from the victims as recorded in the database, respectively. Considering the anticipated deterioration of this issue, it is crucial to inform regulators and relevant organizations about the investigation's findings. This will enable the development, suggestion, and implementation of an efficient solution to address the rapidly increasing number of online scam cases.
Deep Learning-based surrogate models for parametrized PDEs: handling geometric variability through graph neural networks
Authors: Nicola Rares Franco, Stefania Fresca, Filippo Tombari, Andrea Manzoni
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2308.01602
Pdf link: https://arxiv.org/pdf/2308.01602
Abstract Mesh-based simulations play a key role when modeling complex physical systems that, in many disciplines across science and engineering, require the solution of parametrized time-dependent nonlinear partial differential equations (PDEs). In this context, full order models (FOMs), such as those relying on the finite element method, can reach high levels of accuracy, however often yielding intensive simulations to run. For this reason, surrogate models are developed to replace computationally expensive solvers with more efficient ones, which can strike favorable trade-offs between accuracy and efficiency. This work explores the potential usage of graph neural networks (GNNs) for the simulation of time-dependent PDEs in the presence of geometrical variability. In particular, we propose a systematic strategy to build surrogate models based on a data-driven time-stepping scheme where a GNN architecture is used to efficiently evolve the system. With respect to the majority of surrogate models, the proposed approach stands out for its ability of tackling problems with parameter dependent spatial domains, while simultaneously generalizing to different geometries and mesh resolutions. We assess the effectiveness of the proposed approach through a series of numerical experiments, involving both two- and three-dimensional problems, showing that GNNs can provide a valid alternative to traditional surrogate models in terms of computational efficiency and generalization to new scenarios. We also assess, from a numerical standpoint, the importance of using GNNs, rather than classical dense deep neural networks, for the proposed framework.
Unsupervised Multiplex Graph Learning with Complementary and Consistent Information
Authors: Liang Peng, Xin Wang, Xiaofeng Zhu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2308.01606
Pdf link: https://arxiv.org/pdf/2308.01606
Abstract Unsupervised multiplex graph learning (UMGL) has been shown to achieve significant effectiveness for different downstream tasks by exploring both complementary information and consistent information among multiple graphs. However, previous methods usually overlook the issues in practical applications, i.e., the out-of-sample issue and the noise issue. To address the above issues, in this paper, we propose an effective and efficient UMGL method to explore both complementary and consistent information. To do this, our method employs multiple MLP encoders rather than graph convolutional network (GCN) to conduct representation learning with two constraints, i.e., preserving the local graph structure among nodes to handle the out-of-sample issue, and maximizing the correlation of multiple node representations to handle the noise issue. Comprehensive experiments demonstrate that our proposed method achieves superior effectiveness and efficiency over the comparison methods and effectively tackles those two issues. Code is available at https://github.com/LarryUESTC/CoCoMG.
DaphneSched: A Scheduler for Integrated Data Analysis Pipelines
Authors: Ahmed Eleliemy, Florina M. Ciorba
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2308.01607
Pdf link: https://arxiv.org/pdf/2308.01607
Abstract DAPHNE is a new open-source software infrastructure designed to address the increasing demands of integrated data analysis (IDA) pipelines, comprising data management (DM), high performance computing (HPC), and machine learning (ML) systems. Efficiently executing IDA pipelines is challenging due to their diverse computing characteristics and demands. Therefore, IDA pipelines executed with the DAPHNE infrastructure require an efficient and versatile scheduler to support these demands. This work introduces DaphneSched, the task-based scheduler at the core of DAPHNE. DaphneSched is versatile by incorporating eleven task partitioning and three task assignment techniques, bringing the state-of-the-art closer to the state-of-the-practice task scheduling. To showcase DaphneSched's effectiveness in scheduling IDA pipelines, we evaluate its performance on two applications: a product recommendation system and a linear regression model training. We conduct performance experiments on multicore platforms with 20 and 56 cores. The results show that the versatility of DaphneSched enabled combinations of scheduling strategies that outperform commonly used scheduling techniques by up to 13%. This work confirms the benefits of employing DaphneSched for the efficient execution of applications with IDA pipelines.
Interactive High-Resolution Simulation of Granular Material
Authors: Alexander Sommer, Ulrich Schwanecke, Elmar Schömer
Subjects: Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2308.01629
Pdf link: https://arxiv.org/pdf/2308.01629
Abstract We introduce a particle-based simulation method for granular material in interactive frame rates. We divide the simulation into two decoupled steps. In the first step, a relatively small number of particles is accurately simulated with a constraint-based method. Here, all collisions and the resulting friction between the particles are taken into account. In the second step, the small number of particles is significantly increased by an efficient sampling algorithm without creating additional artifacts. The method is particularly robust and allows relatively large time steps, which makes it well suited for real-time applications. With our method, up to 500k particles can be computed in interactive frame rates on consumer CPUs without relying on GPU support for massive parallel computing. This makes it well suited for applications where a lot of GPU power is already needed for render tasks.
Improving Wind Resistance Performance of Cascaded PID Controlled Quadcopters using Residual Reinforcement Learning
Authors: Yu Ishihara, Yuichi Hazama, Kousuke Suzuki, Jerry Jun Yokono, Kohtaro Sabe, Kenta Kawamoto
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2308.01648
Pdf link: https://arxiv.org/pdf/2308.01648
Abstract Wind resistance control is an essential feature for quadcopters to maintain their position to avoid deviation from target position and prevent collisions with obstacles. Conventionally, cascaded PID controller is used for the control of quadcopters for its simplicity and ease of tuning its parameters. However, it is weak against wind disturbances and the quadcopter can easily deviate from target position. In this work, we propose a residual reinforcement learning based approach to build a wind resistance controller of a quadcopter. By learning only the residual that compensates the disturbance, we can continue using the cascaded PID controller as the base controller of the quadcopter but improve its performance against wind disturbances. To avoid unexpected crashes and destructions of quadcopters, our method does not require real hardware for data collection and training. The controller is trained only on a simulator and directly applied to the target hardware without extra finetuning process. We demonstrate the effectiveness of our approach through various experiments including an experiment in an outdoor scene with wind speed greater than 13 m/s. Despite its simplicity, our controller reduces the position deviation by approximately 50% compared to the quadcopter controlled with the conventional cascaded PID controller. Furthermore, trained controller is robust and preserves its performance even though the quadcopter's mass and propeller's lift coefficient is changed between 50% to 150% from original training time.
UniG-Encoder: A Universal Feature Encoder for Graph and Hypergraph Node Classification
Authors: Minhao Zou, Zhongxue Gan, Yutong Wang, Junheng Zhang, Dongyan Sui, Chun Guan, Siyang Leng
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2308.01650
Pdf link: https://arxiv.org/pdf/2308.01650
Abstract Graph and hypergraph representation learning has attracted increasing attention from various research fields. Despite the decent performance and fruitful applications of Graph Neural Networks (GNNs), Hypergraph Neural Networks (HGNNs), and their well-designed variants, on some commonly used benchmark graphs and hypergraphs, they are outperformed by even a simple Multi-Layer Perceptron. This observation motivates a reexamination of the design paradigm of the current GNNs and HGNNs and poses challenges of extracting graph features effectively. In this work, a universal feature encoder for both graph and hypergraph representation learning is designed, called UniG-Encoder. The architecture starts with a forward transformation of the topological relationships of connected nodes into edge or hyperedge features via a normalized projection matrix. The resulting edge/hyperedge features, together with the original node features, are fed into a neural network. The encoded node embeddings are then derived from the reversed transformation, described by the transpose of the projection matrix, of the network's output, which can be further used for tasks such as node classification. The proposed architecture, in contrast to the traditional spectral-based and/or message passing approaches, simultaneously and comprehensively exploits the node features and graph/hypergraph topologies in an efficient and unified manner, covering both heterophilic and homophilic graphs. The designed projection matrix, encoding the graph features, is intuitive and interpretable. Extensive experiments are conducted and demonstrate the superior performance of the proposed framework on twelve representative hypergraph datasets and six real-world graph datasets, compared to the state-of-the-art methods. Our implementation is available online at https://github.com/MinhZou/UniG-Encoder.
lifex-ep: a robust and efficient software for cardiac electrophysiology simulations
Authors: Pasquale C. Africa, Roberto Piersanti, Francesco Regazzoni, Michele Bucelli, Matteo Salvador, Marco Fedele, Stefano Pagani, Luca Dede', Alfio Quarteroni
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2308.01651
Pdf link: https://arxiv.org/pdf/2308.01651
Abstract Simulating the cardiac function requires the numerical solution of multi-physics and multi-scale mathematical models. This underscores the need for streamlined, accurate, and high-performance computational tools. Despite the dedicated endeavors of various research teams, comprehensive and user-friendly software programs for cardiac simulations are still in the process of achieving full maturity within the scientific community. This work introduces lifex-ep, a publicly available software for numerical simulations of the electrophysiology activity of the cardiac muscle, under both physiological and pathological conditions. lifex-ep employs the monodomain equation to model the heart's electrical activity. It incorporates both phenomenological and second-generation ionic models. These models are discretized using the Finite Element method on tetrahedral or hexahedral meshes. Additionally, lifex-ep integrates the generation of myocardial fibers based on Laplace-Dirichlet Rule-Based Methods, previously released in Africa et al., 2023, within lifex-fiber. This paper provides a concise overview of the mathematical models and numerical methods underlying lifex-ep, along with comprehensive implementation details and instructions for users. lifex-ep features exceptional parallel speedup, scaling efficiently when using up to thousands of cores, and its implementation has been verified against an established benchmark problem for computational electrophysiology. We showcase the key features of lifex-ep through various idealized and realistic simulations. lifex-ep offers a user-friendly and flexible interface. lifex-ep provides easy access to cardiac electrophysiology simulations for a wide user community. It offers a computational tool that integrates models and accurate methods for simulating cardiac electrophysiology within a high-performance framework, while maintaining a user-friendly interface.
Towards a Safe Real-Time Motion Planning Framework for Autonomous Driving Systems: An MPPI Approach
Authors: Mehdi Testouri, Gamal Elghazaly, Raphael Frank
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2308.01654
Pdf link: https://arxiv.org/pdf/2308.01654
Abstract Planning safe trajectories in Autonomous Driving Systems (ADS) is a complex problem to solve in real-time. The main challenge to solve this problem arises from the various conditions and constraints imposed by road geometry, semantics and traffic rules, as well as the presence of dynamic agents. Recently, Model Predictive Path Integral (MPPI) has shown to be an effective framework for optimal motion planning and control in robot navigation in unstructured and highly uncertain environments. In this paper, we formulate the motion planning problem in ADS as a nonlinear stochastic dynamic optimization problem that can be solved using an MPPI strategy. The main technical contribution of this work is a method to handle obstacles within the MPPI formulation safely. In this method, obstacles are approximated by circles that can be easily integrated into the MPPI cost formulation while considering safety margins. The proposed MPPI framework has been efficiently implemented in our autonomous vehicle and experimentally validated using three different primitive scenarios. Experimental results show that generated trajectories are safe, feasible and perfectly achieve the planning objective. The video results as well as the open-source implementation are available at: https://gitlab.uni.lu/360lab-public/mppi
Baby's CoThought: Leveraging Large Language Models for Enhanced Reasoning in Compact Models
Authors: Zheyu Zhang, Han Yang, Bolei Ma, David Rügamer, Ercong Nie
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2308.01684
Pdf link: https://arxiv.org/pdf/2308.01684
Abstract Large Language Models (LLMs) demonstrate remarkable performance on a variety of Natural Language Understanding (NLU) tasks, primarily due to their in-context learning ability. This ability is utilized in our proposed "CoThought" pipeline, which efficiently trains smaller "baby" language models (BabyLMs) by leveraging the Chain of Thought (CoT) prompting of LLMs. Our pipeline restructures a dataset of less than 100M in size using GPT-3.5-turbo, transforming it into task-oriented, human-readable texts that are comparable to the school texts for language learners. The BabyLM is then pretrained on this restructured dataset in a RoBERTa (Liu et al., 2019) fashion. In evaluations across 4 benchmarks, our BabyLM outperforms the RoBERTa-base in 10 linguistic, NLU, and question answering tasks by more than 3 points, showing superior ability to extract contextual information. These results suggest that compact LMs pretrained on small, LLM-restructured data can better understand tasks and achieve improved performance. The code for data processing and model training is available at: https://github.com/oooranz/Baby-CoThought.
Finding the Optimum Design of Large Gas Engines Prechambers Using CFD and Bayesian Optimization
Authors: Stefan Posch, Clemens Gößnitzer, Franz Rohrhofer, Bernhard C. Geiger, Andreas Wimmer
Subjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2308.01743
Pdf link: https://arxiv.org/pdf/2308.01743
Abstract The turbulent jet ignition concept using prechambers is a promising solution to achieve stable combustion at lean conditions in large gas engines, leading to high efficiency at low emission levels. Due to the wide range of design and operating parameters for large gas engine prechambers, the preferred method for evaluating different designs is computational fluid dynamics (CFD), as testing in test bed measurement campaigns is time-consuming and expensive. However, the significant computational time required for detailed CFD simulations due to the complexity of solving the underlying physics also limits its applicability. In optimization settings similar to the present case, i.e., where the evaluation of the objective function(s) is computationally costly, Bayesian optimization has largely replaced classical design-of-experiment. Thus, the present study deals with the computationally efficient Bayesian optimization of large gas engine prechambers design using CFD simulation. Reynolds-averaged-Navier-Stokes simulations are used to determine the target values as a function of the selected prechamber design parameters. The results indicate that the chosen strategy is effective to find a prechamber design that achieves the desired target values.
Bag of Policies for Distributional Deep Exploration
Authors: Asen Nachkov, Luchen Li, Giulia Luise, Filippo Valdettaro, Aldo Faisal
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2308.01759
Pdf link: https://arxiv.org/pdf/2308.01759
Abstract Efficient exploration in complex environments remains a major challenge for reinforcement learning (RL). Compared to previous Thompson sampling-inspired mechanisms that enable temporally extended exploration, i.e., deep exploration, we focus on deep exploration in distributional RL. We develop here a general purpose approach, Bag of Policies (BoP), that can be built on top of any return distribution estimator by maintaining a population of its copies. BoP consists of an ensemble of multiple heads that are updated independently. During training, each episode is controlled by only one of the heads and the collected state-action pairs are used to update all heads off-policy, leading to distinct learning signals for each head which diversify learning and behaviour. To test whether optimistic ensemble method can improve on distributional RL as did on scalar RL, by e.g. Bootstrapped DQN, we implement the BoP approach with a population of distributional actor-critics using Bayesian Distributional Policy Gradients (BDPG). The population thus approximates a posterior distribution of return distributions along with a posterior distribution of policies. Another benefit of building upon BDPG is that it allows to analyze global posterior uncertainty along with local curiosity bonus simultaneously for exploration. As BDPG is already an optimistic method, this pairing helps to investigate if optimism is accumulatable in distributional RL. Overall BoP results in greater robustness and speed during learning as demonstrated by our experimental results on ALE Atari games.
PoissonNet: Resolution-Agnostic 3D Shape Reconstruction using Fourier Neural Operators
Authors: Hector Andrade-Loarca, Aras Bacho, Julius Hege, Gitta Kutyniok
Subjects: Computer Vision and Pattern Recognition (cs.CV); Analysis of PDEs (math.AP)
Arxiv link: https://arxiv.org/abs/2308.01766
Pdf link: https://arxiv.org/pdf/2308.01766
Abstract We introduce PoissonNet, an architecture for shape reconstruction that addresses the challenge of recovering 3D shapes from points. Traditional deep neural networks face challenges with common 3D shape discretization techniques due to their computational complexity at higher resolutions. To overcome this, we leverage Fourier Neural Operators (FNOs) to solve the Poisson equation and reconstruct a mesh from oriented point cloud measurements. PoissonNet exhibits two main advantages. First, it enables efficient training on low-resolution data while achieving comparable performance at high-resolution evaluation, thanks to the resolution-agnostic nature of FNOs. This feature allows for one-shot super-resolution. Second, our method surpasses existing approaches in reconstruction quality while being differentiable. Overall, our proposed method not only improves upon the limitations of classical deep neural networks in shape reconstruction but also achieves superior results in terms of reconstruction quality, running time, and resolution flexibility. Furthermore, we demonstrate that the Poisson surface reconstruction problem is well-posed in the limit case by showing a universal approximation theorem for the solution operator of the Poisson equation with distributional data utilizing the Fourier Neuronal Operator, which provides a theoretical foundation for our numerical results. The code to reproduce the experiments is available on: \url{https://github.com/arsenal9971/PoissonNet}.
Deep Learning-based Prediction of Stress and Strain Maps in Arterial Walls for Improved Cardiovascular Risk Assessment
Authors: Yasin Shokrollahi1, Pengfei Dong1, Xianqi Li, Linxia Gu
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2308.01771
Pdf link: https://arxiv.org/pdf/2308.01771
Abstract This study investigated the potential of end-to-end deep learning tools as a more effective substitute for FEM in predicting stress-strain fields within 2D cross sections of arterial wall. We first proposed a U-Net based fully convolutional neural network (CNN) to predict the von Mises stress and strain distribution based on the spatial arrangement of calcification within arterial wall cross-sections. Further, we developed a conditional generative adversarial network (cGAN) to enhance, particularly from the perceptual perspective, the prediction accuracy of stress and strain field maps for arterial walls with various calcification quantities and spatial configurations. On top of U-Net and cGAN, we also proposed their ensemble approaches, respectively, to further improve the prediction accuracy of field maps. Our dataset, consisting of input and output images, was generated by implementing boundary conditions and extracting stress-strain field maps. The trained U-Net models can accurately predict von Mises stress and strain fields, with structural similarity index scores (SSIM) of 0.854 and 0.830 and mean squared errors of 0.017 and 0.018 for stress and strain, respectively, on a reserved test set. Meanwhile, the cGAN models in a combination of ensemble and transfer learning techniques demonstrate high accuracy in predicting von Mises stress and strain fields, as evidenced by SSIM scores of 0.890 for stress and 0.803 for strain. Additionally, mean squared errors of 0.008 for stress and 0.017 for strain further support the model's performance on a designated test set. Overall, this study developed a surrogate model for finite element analysis, which can accurately and efficiently predict stress-strain fields of arterial walls regardless of complex geometries and boundary conditions.
Bayesian parameter identification in impedance boundary conditions for Helmholtz problems
Authors: Nick Wulbusch, Reinhild Roden, Matthias Blau, Alexey Chernov
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2308.01788
Pdf link: https://arxiv.org/pdf/2308.01788
Abstract We consider the problem of identifying the acoustic impedance of a wall surface from noisy pressure measurements in a closed room using a Bayesian approach. The room acoustics is modeled by the interior Helmholtz equation with impedance boundary conditions. The aim is to compute moments of the acoustic impedance to estimate a suitable density function of the impedance coefficient. For the computation of moments we use ratio estimators and Monte-Carlo sampling. We consider two different experimental scenarios. In the first scenario, the noisy measurements correspond to a wall modeled by impedance boundary conditions. In this case, the Bayesian algorithm uses a model that is (up to the noise) consistent with the measurements and our algorithm is able to identify acoustic impedance with high accuracy. In the second scenario, the noisy measurements come from a coupled acoustic-structural problem, modeling a wall made of glass, whereas the Bayesian algorithm still uses a model with impedance boundary conditions. In this case, the parameter identification model is inconsistent with the measurements and therefore is not capable to represent them well. Nonetheless, for particular frequency bands the Bayesian algorithm identifies estimates with high likelihood. Outside these frequency bands the algorithm fails. We discuss the results of both examples and possible reasons for the failure of the latter case for particular frequency values.
Fundamental Data Structures for Matrix-Free Finite Elements on Hybrid Tetrahedral Grids
Authors: Nils Kohl, Daniel Bauer, Fabian Böhm, Ulrich Rüde
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2308.01792
Pdf link: https://arxiv.org/pdf/2308.01792
Abstract This paper presents efficient data structures for the implementation of matrix-free finite element methods on block-structured, hybrid tetrahedral grids. It provides a complete categorization of all geometric sub-objects that emerge from the regular refinement of the unstructured, tetrahedral coarse grid and describes efficient iteration patterns and analytical linearization functions for the mapping of coefficients to memory addresses. This foundation enables the implementation of fast, extreme-scalable, matrix-free, iterative solvers, and in particular geometric multigrid methods by design. Their application to the variable-coefficient Stokes system subject to an enriched Galerkin discretization and to the curl-curl problem discretized with N\'ed\'elec edge elements showcases the flexibility of the implementation. Eventually, the solution of a curl-curl problem with $1.6 \cdot 10^{11}$ (more than one hundred billion) unknowns on more than $32000$ processes with a matrix-free full multigrid solver demonstrates its extreme-scalability.
Deep Neural Networks Fused with Textures for Image Classification
Authors: Asish Bera, Debotosh Bhattacharjee, Mita Nasipuri
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2308.01813
Pdf link: https://arxiv.org/pdf/2308.01813
Abstract Fine-grained image classification (FGIC) is a challenging task in computer vision for due to small visual differences among inter-subcategories, but, large intra-class variations. Deep learning methods have achieved remarkable success in solving FGIC. In this paper, we propose a fusion approach to address FGIC by combining global texture with local patch-based information. The first pipeline extracts deep features from various fixed-size non-overlapping patches and encodes features by sequential modelling using the long short-term memory (LSTM). Another path computes image-level textures at multiple scales using the local binary patterns (LBP). The advantages of both streams are integrated to represent an efficient feature vector for image classification. The method is tested on eight datasets representing the human faces, skin lesions, food dishes, marine lives, etc. using four standard backbone CNNs. Our method has attained better classification accuracy over existing methods with notable margins.
Hard Adversarial Example Mining for Improving Robust Fairness
Authors: Chenhao Lin, Xiang Ji, Yulong Yang, Qian Li, Chao Shen, Run Wang, Liming Fang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2308.01823
Pdf link: https://arxiv.org/pdf/2308.01823
Abstract Adversarial training (AT) is widely considered the state-of-the-art technique for improving the robustness of deep neural networks (DNNs) against adversarial examples (AE). Nevertheless, recent studies have revealed that adversarially trained models are prone to unfairness problems, restricting their applicability. In this paper, we empirically observe that this limitation may be attributed to serious adversarial confidence overfitting, i.e., certain adversarial examples with overconfidence. To alleviate this problem, we propose HAM, a straightforward yet effective framework via adaptive Hard Adversarial example Mining.HAM concentrates on mining hard adversarial examples while discarding the easy ones in an adaptive fashion. Specifically, HAM identifies hard AEs in terms of their step sizes needed to cross the decision boundary when calculating loss value. Besides, an early-dropping mechanism is incorporated to discard the easy examples at the initial stages of AE generation, resulting in efficient AT. Extensive experimental results on CIFAR-10, SVHN, and Imagenette demonstrate that HAM achieves significant improvement in robust fairness while reducing computational cost compared to several state-of-the-art adversarial training methods. The code will be made publicly available.
Wider and Deeper LLM Networks are Fairer LLM Evaluators
Authors: Xinghua Zhang, Bowen Yu, Haiyang Yu, Yangyu Lv, Tingwen Liu, Fei Huang, Hongbo Xu, Yongbin Li
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2308.01862
Pdf link: https://arxiv.org/pdf/2308.01862
Abstract Measuring the quality of responses generated by LLMs is a challenging task, particularly when it comes to evaluating whether the response is aligned with human preference. A novel approach involves using the LLM itself to make evaluation and stabilizing the results through multiple independent evaluations, similar to a single-layer narrow LLM network. This network consists of a fixed number of neurons, with each neuron being the same LLM. In this paper, we draw upon the extensive research on deep neural networks to explore whether deeper and wider networks can lead to fairer evaluations. Specifically, inspired by the observation that different neurons in a neural network are responsible for detecting different concepts, we first adaptively generate as many neuron roles as possible for each evaluation sample. Each perspective corresponds to the role of a specific LLM neuron in the first layer. In subsequent layers, we follow the idea that higher layers in deep networks are responsible for more comprehensive features, each layer receives representations from all neurons in the previous layer, integrating the locally learned evaluation information to obtain a more comprehensive evaluation result. Interestingly, this network design resembles the process of academic paper reviewing. To validate the effectiveness of our method, we construct the largest and most diverse English evaluation benchmark LLMEval$^2$ for LLM evaluators, comprising 15 tasks, 8 abilities, and 2,553 samples. Experimental results demonstrate that a wider network (involving many reviewers) with 2 layers (one round of discussion) performs the best, improving kappa correlation coefficient from 0.28 to 0.34. We also leverage WideDeep to aid in the assessment of Chinese LLMs, which has accelerated the evaluation time by 4.6 times, resulting in a 60% cost saving. WideDeep achieves a remarkable 93% agreement level among humans.
DualCoOp++: Fast and Effective Adaptation to Multi-Label Recognition with Limited Annotations
Authors: Ping Hu, Ximeng Sun, Stan Sclaroff, Kate Saenko
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2308.01890
Pdf link: https://arxiv.org/pdf/2308.01890
Abstract Multi-label image recognition in the low-label regime is a task of great challenge and practical significance. Previous works have focused on learning the alignment between textual and visual spaces to compensate for limited image labels, yet may suffer from reduced accuracy due to the scarcity of high-quality multi-label annotations. In this research, we leverage the powerful alignment between textual and visual features pretrained with millions of auxiliary image-text pairs. We introduce an efficient and effective framework called Evidence-guided Dual Context Optimization (DualCoOp++), which serves as a unified approach for addressing partial-label and zero-shot multi-label recognition. In DualCoOp++ we separately encode evidential, positive, and negative contexts for target classes as parametric components of the linguistic input (i.e., prompts). The evidential context aims to discover all the related visual content for the target class, and serves as guidance to aggregate positive and negative contexts from the spatial domain of the image, enabling better distinguishment between similar categories. Additionally, we introduce a Winner-Take-All module that promotes inter-class interaction during training, while avoiding the need for extra parameters and costs. As DualCoOp++ imposes minimal additional learnable overhead on the pretrained vision-language framework, it enables rapid adaptation to multi-label recognition tasks with limited annotations and even unseen classes. Experiments on standard multi-label recognition benchmarks across two challenging low-label settings demonstrate the superior performance of our approach compared to state-of-the-art methods.
Exact identification of nonlinear dynamical systems by Trimmed Lasso
Authors: Shawn L. Kiser, Mikhail Guskov, Marc Rébillat, Nicolas Ranc
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Dynamical Systems (math.DS); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2308.01891
Pdf link: https://arxiv.org/pdf/2308.01891
Abstract Identification of nonlinear dynamical systems has been popularized by sparse identification of the nonlinear dynamics (SINDy) via the sequentially thresholded least squares (STLS) algorithm. Many extensions SINDy have emerged in the literature to deal with experimental data which are finite in length and noisy. Recently, the computationally intensive method of ensembling bootstrapped SINDy models (E-SINDy) was proposed for model identification, handling finite, highly noisy data. While the extensions of SINDy are numerous, their sparsity-promoting estimators occasionally provide sparse approximations of the dynamics as opposed to exact recovery. Furthermore, these estimators suffer under multicollinearity, e.g. the irrepresentable condition for the Lasso. In this paper, we demonstrate that the Trimmed Lasso for robust identification of models (TRIM) can provide exact recovery under more severe noise, finite data, and multicollinearity as opposed to E-SINDy. Additionally, the computational cost of TRIM is asymptotically equal to STLS since the sparsity parameter of the TRIM can be solved efficiently by convex solvers. We compare these methodologies on challenging nonlinear systems, specifically the Lorenz 63 system, the Bouc Wen oscillator from the nonlinear dynamics benchmark of No\"el and Schoukens, 2016, and a time delay system describing tool cutting dynamics. This study emphasizes the comparisons between STLS, reweighted $\ell_1$ minimization, and Trimmed Lasso in identification with respect to problems faced by practitioners: the problem of finite and noisy data, the performance of the sparse regression of when the library grows in dimension (multicollinearity), and automatic methods for choice of regularization parameters.
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
Authors: Weiyun Wang, Min Shi, Qingyun Li, Wenhai Wang, Zhenhang Huang, Linjie Xing, Zhe Chen, Hao Li, Xizhou Zhu, Zhiguo Cao, Yushi Chen, Tong Lu, Jifeng Dai, Yu Qiao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2308.01907
Pdf link: https://arxiv.org/pdf/2308.01907
Abstract We present the All-Seeing (AS) project: a large-scale data and model for recognizing and understanding everything in the open world. Using a scalable data engine that incorporates human feedback and efficient models in the loop, we create a new dataset (AS-1B) with over 1 billion regions annotated with semantic tags, question-answering pairs, and detailed captions. It covers a wide range of 3.5 million common and rare concepts in the real world, and has 132.2 billion tokens that describe the concepts and their attributes. Leveraging this new dataset, we develop the All-Seeing model (ASM), a unified framework for panoptic visual recognition and understanding. The model is trained with open-ended language prompts and locations, which allows it to generalize to various vision and language tasks with remarkable zero-shot performance, including region-text retrieval, region recognition, captioning, and question-answering. We hope that this project can serve as a foundation for vision-language artificial general intelligence research. Models and the dataset shall be released at https://github.com/OpenGVLab/All-Seeing, and demo can be seen at https://huggingface.co/spaces/OpenGVLab/all-seeing.
Keyword: faster

One Partition Approximating All $\ell_p$-norm Objectives in Correlation Clustering
Authors: Sami Davies, Benjamin Moseley, Heather Newman
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)
Arxiv link: https://arxiv.org/abs/2308.01534
Pdf link: https://arxiv.org/pdf/2308.01534
Abstract This paper considers correlation clustering on unweighted complete graphs. We give a combinatorial algorithm that returns a single clustering solution that is simultaneously $O(1)$-approximate for all $\ell_p$-norms of the disagreement vector. This proves that minimal sacrifice is needed in order to optimize different norms of the disagreement vector. Our algorithm is the first combinatorial approximation algorithm for the $\ell_2$-norm objective, and more generally the first combinatorial algorithm for the $\ell_p$-norm objective when $2 \leq p < \infty$. It is also faster than all previous algorithms that minimize the $\ell_p$-norm of the disagreement vector, with run-time $O(n^\omega)$, where $O(n^\omega)$ is the time for matrix multiplication on $n \times n$ matrices. When the maximum positive degree in the graph is at most $\Delta$, this can be improved to a run-time of $O(n\Delta^2 \log n)$.
Finite element approximation of the Hardy constant
Authors: Francesco Della Pietra, Giovanni Fantuzzi, Liviu I. Ignat, Alba Lia Masiello, Gloria Paoli, Enrique Zuazua
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
Arxiv link: https://arxiv.org/abs/2308.01580
Pdf link: https://arxiv.org/pdf/2308.01580
Abstract We consider finite element approximations to the optimal constant for the Hardy inequality with exponent $p=2$ in bounded domains of dimension $n=1$ or $n\geq 3$. For finite element spaces of piecewise linear and continuous functions on a mesh of size $h$, we prove that the approximate Hardy constant, $S_h^n$, converges to the optimal Hardy constant $S^n$ no slower than $O(1/\vert \log h \vert)$. We also show that the convergence is no faster than $O(1/\vert \log h \vert^2)$ if $n=1$ or if $n\geq 3$, the domain is the unit ball, and the finite element discretization exploits the rotational symmetry of the problem. Our estimates are compared to exact values for $S_h^n$ obtained computationally.
Keyword: mobile

Avoidance Navigation Based on Offline Pre-Training Reinforcement Learning
Authors: Yang Wenkai Ji Ruihang Zhang Yuxiang Lei Hao, Zhao Zijie
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2308.01551
Pdf link: https://arxiv.org/pdf/2308.01551
Abstract This paper presents a Pre-Training Deep Reinforcement Learning(DRL) for avoidance navigation without map for mobile robots which map raw sensor data to control variable and navigate in an unknown environment. The efficient offline training strategy is proposed to speed up the inefficient random explorations in early stage and we also collect a universal dataset including expert experience for offline training, which is of some significance for other navigation training work. The pre-training and prioritized expert experience are proposed to reduce 80\% training time and has been verified to improve the 2 times reward of DRL. The advanced simulation gazebo with real physical modelling and dynamic equations reduce the gap between sim-to-real. We train our model a corridor environment, and evaluate the model in different environment getting the same effect. Compared to traditional method navigation, we can confirm the trained model can be directly applied into different scenarios and have the ability to no collision navigate. It was demonstrated that our DRL model have universal general capacity in different environment.
Real-time Light Estimation and Neural Soft Shadows for AR Indoor Scenarios
Authors: Alexander Sommer, Ulrich Schwanecke, Elmar Schömer
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2308.01613
Pdf link: https://arxiv.org/pdf/2308.01613
Abstract We present a pipeline for realistic embedding of virtual objects into footage of indoor scenes with focus on real-time AR applications. Our pipeline consists of two main components: A light estimator and a neural soft shadow texture generator. Our light estimation is based on deep neural nets and determines the main light direction, light color, ambient color and an opacity parameter for the shadow texture. Our neural soft shadow method encodes object-based realistic soft shadows as light direction dependent textures in a small MLP. We show that our pipeline can be used to integrate objects into AR scenes in a new level of realism in real-time. Our models are small enough to run on current mobile devices. We achieve runtimes of 9ms for light estimation and 5ms for neural shadows on an iPhone 11 Pro.
Time-optimal geodesic mutual visibility of robots on grids within minimum area
Authors: Serafino Cicerone, Alessia Di Fonso, Gabriele Di Stefano, Alfredo Navarra
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Combinatorics (math.CO)
Arxiv link: https://arxiv.org/abs/2308.01855
Pdf link: https://arxiv.org/pdf/2308.01855
Abstract The \textsc{Mutual Visibility} is a well-known problem in the context of mobile robots. For a set of $n$ robots disposed in the Euclidean plane, it asks for moving the robots without collisions so as to achieve a placement ensuring that no three robots are collinear. For robots moving on graphs, we consider the \textsc{Geodesic Mutual Visibility} ($\GMV$) problem. Robots move along the edges of the graph, without collisions, so as to occupy some vertices that guarantee they become pairwise geodesic mutually visible. This means that there is a shortest path (i.e., a "geodesic") between each pair of robots along which no other robots reside. We study this problem in the context of finite and infinite square grids, for robots operating under the standard Look-Compute-Move model. In both scenarios, we provide resolution algorithms along with formal correctness proofs, highlighting the most relevant peculiarities arising within the different contexts, while optimizing the time complexity.
MRQ:Support Multiple Quantization Schemes through Model Re-Quantization
Authors: Manasa Manohara, Sankalp Dayal, Tarqi Afzal, Rahul Bakshi, Kahkuen Fu
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2308.01867
Pdf link: https://arxiv.org/pdf/2308.01867
Abstract Despite the proliferation of diverse hardware accelerators (e.g., NPU, TPU, DPU), deploying deep learning models on edge devices with fixed-point hardware is still challenging due to complex model quantization and conversion. Existing model quantization frameworks like Tensorflow QAT [1], TFLite PTQ [2], and Qualcomm AIMET [3] supports only a limited set of quantization schemes (e.g., only asymmetric per-tensor quantization in TF1.x QAT [4]). Accordingly, deep learning models cannot be easily quantized for diverse fixed-point hardwares, mainly due to slightly different quantization requirements. In this paper, we envision a new type of model quantization approach called MRQ (model re-quantization), which takes existing quantized models and quickly transforms the models to meet different quantization requirements (e.g., asymmetric -> symmetric, non-power-of-2 scale -> power-of-2 scale). Re-quantization is much simpler than quantizing from scratch because it avoids costly re-training and provides support for multiple quantization schemes simultaneously. To minimize re-quantization error, we developed a new set of re-quantization algorithms including weight correction and rounding error folding. We have demonstrated that MobileNetV2 QAT model [7] can be quickly re-quantized into two different quantization schemes (i.e., symmetric and symmetric+power-of-2 scale) with less than 0.64 units of accuracy loss. We believe our work is the first to leverage this concept of re-quantization for model quantization and models obtained from the re-quantization process have been successfully deployed on NNA in the Echo Show devices.
Keyword: pruning

Hierarchical Federated Learning in Wireless Networks: Pruning Tackles Bandwidth Scarcity and System Heterogeneity
Authors: Md Ferdous Pervej, Richeng Jin, Huaiyu Dai
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2308.01562
Pdf link: https://arxiv.org/pdf/2308.01562
Abstract While a practical wireless network has many tiers where end users do not directly communicate with the central server, the users' devices have limited computation and battery powers, and the serving base station (BS) has a fixed bandwidth. Owing to these practical constraints and system models, this paper leverages model pruning and proposes a pruning-enabled hierarchical federated learning (PHFL) in heterogeneous networks (HetNets). We first derive an upper bound of the convergence rate that clearly demonstrates the impact of the model pruning and wireless communications between the clients and the associated BS. Then we jointly optimize the model pruning ratio, central processing unit (CPU) frequency and transmission power of the clients in order to minimize the controllable terms of the convergence bound under strict delay and energy constraints. However, since the original problem is not convex, we perform successive convex approximation (SCA) and jointly optimize the parameters for the relaxed convex problem. Through extensive simulation, we validate the effectiveness of our proposed PHFL algorithm in terms of test accuracy, wall clock time, energy consumption and bandwidth requirement.
Keyword: diffusion

Reverse Stable Diffusion: What prompt was used to generate this image?
Authors: Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Mubarak Shah
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2308.01472
Pdf link: https://arxiv.org/pdf/2308.01472
Abstract Text-to-image diffusion models such as Stable Diffusion have recently attracted the interest of many researchers, and inverting the diffusion process can play an important role in better understanding the generative process and how to engineer prompts in order to obtain the desired images. To this end, we introduce the new task of predicting the text prompt given an image generated by a generative diffusion model. We combine a series of white-box and black-box models (with and without access to the weights of the diffusion network) to deal with the proposed task. We propose a novel learning framework comprising of a joint prompt regression and multi-label vocabulary classification objective that generates improved prompts. To further improve our method, we employ a curriculum learning procedure that promotes the learning of image-prompt pairs with lower labeling noise (i.e. that are better aligned), and an unsupervised domain-adaptive kernel learning method that uses the similarities between samples in the source and target domains as extra features. We conduct experiments on the DiffusionDB data set, predicting text prompts from images generated by Stable Diffusion. Our novel learning framework produces excellent results on the aforementioned task, yielding the highest gains when applied on the white-box model. In addition, we make an interesting discovery: training a diffusion model on the prompt generation task can make the model generate images that are much better aligned with the input prompts, when the model is directly reused for text-to-image generation.
MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies
Authors: Ke Chen, Yusong Wu, Haohe Liu, Marianna Nezhurina, Taylor Berg-Kirkpatrick, Shlomo Dubnov
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2308.01546
Pdf link: https://arxiv.org/pdf/2308.01546
Abstract Diffusion models have shown promising results in cross-modal generation tasks, including text-to-image and text-to-audio generation. However, generating music, as a special type of audio, presents unique challenges due to limited availability of music data and sensitive issues related to copyright and plagiarism. In this paper, to tackle these challenges, we first construct a state-of-the-art text-to-music model, MusicLDM, that adapts Stable Diffusion and AudioLDM architectures to the music domain. We achieve this by retraining the contrastive language-audio pretraining model (CLAP) and the Hifi-GAN vocoder, as components of MusicLDM, on a collection of music data samples. Then, to address the limitations of training data and to avoid plagiarism, we leverage a beat tracking model and propose two different mixup strategies for data augmentation: beat-synchronous audio mixup and beat-synchronous latent mixup, which recombine training audio directly or via a latent embeddings space, respectively. Such mixup strategies encourage the model to interpolate between musical training samples and generate new music within the convex hull of the training data, making the generated music more diverse while still staying faithful to the corresponding style. In addition to popular evaluation metrics, we design several new evaluation metrics based on CLAP score to demonstrate that our proposed MusicLDM and beat-synchronous mixup strategies improve both the quality and novelty of generated music, as well as the correspondence between input text and generated music.
Motion Planning Diffusion: Learning and Planning of Robot Motions with Diffusion Models
Authors: Joao Carvalho, An T. Le, Mark Baierl, Dorothea Koert, Jan Peters
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2308.01557
Pdf link: https://arxiv.org/pdf/2308.01557
Abstract Learning priors on trajectory distributions can help accelerate robot motion planning optimization. Given previously successful plans, learning trajectory generative models as priors for a new planning problem is highly desirable. Prior works propose several ways on utilizing this prior to bootstrapping the motion planning problem. Either sampling the prior for initializations or using the prior distribution in a maximum-a-posterior formulation for trajectory optimization. In this work, we propose learning diffusion models as priors. We then can sample directly from the posterior trajectory distribution conditioned on task goals, by leveraging the inverse denoising process of diffusion models. Furthermore, diffusion has been recently shown to effectively encode data multimodality in high-dimensional settings, which is particularly well-suited for large trajectory dataset. To demonstrate our method efficacy, we compare our proposed method - Motion Planning Diffusion - against several baselines in simulated planar robot and 7-dof robot arm manipulator environments. To assess the generalization capabilities of our method, we test it in environments with previously unseen obstacles. Our experiments show that diffusion models are strong priors to encode high-dimensional trajectory distributions of robot motions.
Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS
Authors: Myeongjin Ko, Yong-Hoon Choi
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2308.01573
Pdf link: https://arxiv.org/pdf/2308.01573
Abstract The diffusion model is capable of generating high-quality data through a probabilistic approach. However, it suffers from the drawback of slow generation speed due to the requirement of a large number of time steps. To address this limitation, recent models such as denoising diffusion implicit models (DDIM) focus on generating samples without directly modeling the probability distribution, while models like denoising diffusion generative adversarial networks (GAN) combine diffusion processes with GANs. In the field of speech synthesis, a recent diffusion speech synthesis model called DiffGAN-TTS, utilizing the structure of GANs, has been introduced and demonstrates superior performance in both speech quality and generation speed. In this paper, to further enhance the performance of DiffGAN-TTS, we propose a speech synthesis model with two discriminators: a diffusion discriminator for learning the distribution of the reverse process and a spectrogram discriminator for learning the distribution of the generated data. Objective metrics such as structural similarity index measure (SSIM), mel-cepstral distortion (MCD), F0 root mean squared error (F0 RMSE), short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ), as well as subjective metrics like mean opinion score (MOS), are used to evaluate the performance of the proposed model. The evaluation results show that the proposed model outperforms recent state-of-the-art models such as FastSpeech2 and DiffGAN-TTS in various metrics. Our implementation and audio samples are located on GitHub.
Reference-Free Isotropic 3D EM Reconstruction using Diffusion Models
Authors: Kyungryun Lee, Won-Ki Jeong
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2308.01594
Pdf link: https://arxiv.org/pdf/2308.01594
Abstract Electron microscopy (EM) images exhibit anisotropic axial resolution due to the characteristics inherent to the imaging modality, presenting challenges in analysis and downstream tasks.In this paper, we propose a diffusion-model-based framework that overcomes the limitations of requiring reference data or prior knowledge about the degradation process. Our approach utilizes 2D diffusion models to consistently reconstruct 3D volumes and is well-suited for highly downsampled data. Extensive experiments conducted on two public datasets demonstrate the robustness and superiority of leveraging the generative prior compared to supervised learning methods. Additionally, we demonstrate our method's feasibility for self-supervised reconstruction, which can restore a single anisotropic volume without any training data.
DiffColor: Toward High Fidelity Text-Guided Image Colorization with Diffusion Models
Authors: Jianxin Lin, Peng Xiao, Yijun Wang, Rongju Zhang, Xiangxiang Zeng
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2308.01655
Pdf link: https://arxiv.org/pdf/2308.01655
Abstract Recent data-driven image colorization methods have enabled automatic or reference-based colorization, while still suffering from unsatisfactory and inaccurate object-level color control. To address these issues, we propose a new method called DiffColor that leverages the power of pre-trained diffusion models to recover vivid colors conditioned on a prompt text, without any additional inputs. DiffColor mainly contains two stages: colorization with generative color prior and in-context controllable colorization. Specifically, we first fine-tune a pre-trained text-to-image model to generate colorized images using a CLIP-based contrastive loss. Then we try to obtain an optimized text embedding aligning the colorized image and the text prompt, and a fine-tuned diffusion model enabling high-quality image reconstruction. Our method can produce vivid and diverse colors with a few iterations, and keep the structure and background intact while having colors well-aligned with the target language guidance. Moreover, our method allows for in-context colorization, i.e., producing different colorization results by modifying prompt texts without any fine-tuning, and can achieve object-level controllable colorization results. Extensive experiments and user studies demonstrate that DiffColor outperforms previous works in terms of visual quality, color fidelity, and diversity of colorization options.
Synthesizing Long-Term Human Motions with Diffusion Models via Coherent Sampling
Authors: Zhao Yang, Bing Su, Ji-Rong Wen
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2308.01850
Pdf link: https://arxiv.org/pdf/2308.01850
Abstract Text-to-motion generation has gained increasing attention, but most existing methods are limited to generating short-term motions that correspond to a single sentence describing a single action. However, when a text stream describes a sequence of continuous motions, the generated motions corresponding to each sentence may not be coherently linked. Existing long-term motion generation methods face two main issues. Firstly, they cannot directly generate coherent motions and require additional operations such as interpolation to process the generated actions. Secondly, they generate subsequent actions in an autoregressive manner without considering the influence of future actions on previous ones. To address these issues, we propose a novel approach that utilizes a past-conditioned diffusion model with two optional coherent sampling methods: Past Inpainting Sampling and Compositional Transition Sampling. Past Inpainting Sampling completes subsequent motions by treating previous motions as conditions, while Compositional Transition Sampling models the distribution of the transition as the composition of two adjacent motions guided by different text prompts. Our experimental results demonstrate that our proposed method is capable of generating compositional and coherent long-term 3D human motions controlled by a user-instructed long text stream. The code is available at \href{https://github.com/yangzhao1230/PCMDM}{https://github.com/yangzhao1230/PCMDM}.
Keyword: adaptive

Reverse Stable Diffusion: What prompt was used to generate this image?
Authors: Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Mubarak Shah
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2308.01472
Pdf link: https://arxiv.org/pdf/2308.01472
Abstract Text-to-image diffusion models such as Stable Diffusion have recently attracted the interest of many researchers, and inverting the diffusion process can play an important role in better understanding the generative process and how to engineer prompts in order to obtain the desired images. To this end, we introduce the new task of predicting the text prompt given an image generated by a generative diffusion model. We combine a series of white-box and black-box models (with and without access to the weights of the diffusion network) to deal with the proposed task. We propose a novel learning framework comprising of a joint prompt regression and multi-label vocabulary classification objective that generates improved prompts. To further improve our method, we employ a curriculum learning procedure that promotes the learning of image-prompt pairs with lower labeling noise (i.e. that are better aligned), and an unsupervised domain-adaptive kernel learning method that uses the similarities between samples in the source and target domains as extra features. We conduct experiments on the DiffusionDB data set, predicting text prompts from images generated by Stable Diffusion. Our novel learning framework produces excellent results on the aforementioned task, yielding the highest gains when applied on the white-box model. In addition, we make an interesting discovery: training a diffusion model on the prompt generation task can make the model generate images that are much better aligned with the input prompts, when the model is directly reused for text-to-image generation.
Randomized approximation of summable sequences -- adaptive and non-adaptive
Authors: Robert Kunsch, Erich Novak, Marcin Wnuk
Subjects: Numerical Analysis (math.NA); Functional Analysis (math.FA); Probability (math.PR)
Arxiv link: https://arxiv.org/abs/2308.01705
Pdf link: https://arxiv.org/pdf/2308.01705
Abstract We prove lower bounds for the randomized approximation of the embedding $\ell1^m \rightarrow \ell\infty^m$ based on algorithms that use arbitrary linear (hence non-adaptive) information provided by a (randomized) measurement matrix $N \in \mathbb{R}^{n \times m}$. These lower bounds reflect the increasing difficulty of the problem for $m \to \infty$, namely, a term $\sqrt{\log m}$ in the complexity $n$. This result implies that non-compact operators between arbitrary Banach spaces are not approximable using non-adaptive Monte Carlo methods. We also compare these lower bounds for non-adaptive methods with upper bounds based on adaptive, randomized methods for recovery for which the complexity $n$ only exhibits a $(\log\log m)$-dependence. In doing so we give an example of linear problems where the error for adaptive vs. non-adaptive Monte Carlo methods shows a gap of order $n^{1/2} ( \log n)^{-1/2}$.
NeuroSwarm: Multi-Agent Neural 3D Scene Reconstruction and Segmentation with UAV for Optimal Navigation of Quadruped Robot
Authors: Iana Zhura, Denis Davletshin, Nipun Dhananjaya Weerakkodi Mudalige, Aleksey Fedoseev, Robinroy Peter, Dzmitry Tsetserukou
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2308.01725
Pdf link: https://arxiv.org/pdf/2308.01725
Abstract Quadruped robots have the distinct ability to adapt their body and step height to navigate through cluttered environments. Nonetheless, for these robots to utilize their full potential in real-world scenarios, they require awareness of their environment and obstacle geometry. We propose a novel multi-agent robotic system that incorporates cutting-edge technologies. The proposed solution features a 3D neural reconstruction algorithm that enables navigation of a quadruped robot in both static and semi-static environments. The prior areas of the environment are also segmented according to the quadruped robots' abilities to pass them. Moreover, we have developed an adaptive neural field optimal motion planner (ANFOMP) that considers both collision probability and obstacle height in 2D space.Our new navigation and mapping approach enables quadruped robots to adjust their height and behavior to navigate under arches and push through obstacles with smaller dimensions. The multi-agent mapping operation has proven to be highly accurate, with an obstacle reconstruction precision of 82%. Moreover, the quadruped robot can navigate with 3D obstacle information and the ANFOMP system, resulting in a 33.3% reduction in path length and a 70% reduction in navigation time.
Enhancing Visibility in Nighttime Haze Images Using Guided APSF and Gradient Adaptive Convolution
Authors: Yeying Jin, Beibei Lin, Wending Yan, Wei Ye, Yuan Yuan, Robby T. Tan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2308.01738
Pdf link: https://arxiv.org/pdf/2308.01738
Abstract Visibility in hazy nighttime scenes is frequently reduced by multiple factors, including low light, intense glow, light scattering, and the presence of multicolored light sources. Existing nighttime dehazing methods often struggle with handling glow or low-light conditions, resulting in either excessively dark visuals or unsuppressed glow outputs. In this paper, we enhance the visibility from a single nighttime haze image by suppressing glow and enhancing low-light regions. To handle glow effects, our framework learns from the rendered glow pairs. Specifically, a light source aware network is proposed to detect light sources of night images, followed by the APSF (Angular Point Spread Function)-guided glow rendering. Our framework is then trained on the rendered images, resulting in glow suppression. Moreover, we utilize gradient-adaptive convolution, to capture edges and textures in hazy scenes. By leveraging extracted edges and textures, we enhance the contrast of the scene without losing important structural details. To boost low-light intensity, our network learns an attention map, then adjusted by gamma correction. This attention has high values on low-light regions and low values on haze and glow regions. Extensive evaluation on real nighttime haze images, demonstrates the effectiveness of our method. Our experiments demonstrate that our method achieves a PSNR of 30.72dB, outperforming state-of-the-art methods by 14$\%$ on GTA5 nighttime haze dataset. Our data and code is available at: \url{https://github.com/jinyeying/nighttime_dehaze}.
Tensor Programs IVb: Adaptive Optimization in the Infinite-Width Limit
Authors: Greg Yang, Etai Littwin
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Neural and Evolutionary Computing (cs.NE); Probability (math.PR)
Arxiv link: https://arxiv.org/abs/2308.01814
Pdf link: https://arxiv.org/pdf/2308.01814
Abstract Going beyond stochastic gradient descent (SGD), what new phenomena emerge in wide neural networks trained by adaptive optimizers like Adam? Here we show: The same dichotomy between feature learning and kernel behaviors (as in SGD) holds for general optimizers as well, including Adam -- albeit with a nonlinear notion of "kernel." We derive the corresponding "neural tangent" and "maximal update" limits for any architecture. Two foundational advances underlie the above results: 1) A new Tensor Program language, NEXORT, that can express how adaptive optimizers process gradients into updates. 2) The introduction of bra-ket notation to drastically simplify expressions and calculations in Tensor Programs. This work summarizes and generalizes all previous results in the Tensor Programs series of papers.
Hard Adversarial Example Mining for Improving Robust Fairness
Authors: Chenhao Lin, Xiang Ji, Yulong Yang, Qian Li, Chao Shen, Run Wang, Liming Fang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2308.01823
Pdf link: https://arxiv.org/pdf/2308.01823
Abstract Adversarial training (AT) is widely considered the state-of-the-art technique for improving the robustness of deep neural networks (DNNs) against adversarial examples (AE). Nevertheless, recent studies have revealed that adversarially trained models are prone to unfairness problems, restricting their applicability. In this paper, we empirically observe that this limitation may be attributed to serious adversarial confidence overfitting, i.e., certain adversarial examples with overconfidence. To alleviate this problem, we propose HAM, a straightforward yet effective framework via adaptive Hard Adversarial example Mining.HAM concentrates on mining hard adversarial examples while discarding the easy ones in an adaptive fashion. Specifically, HAM identifies hard AEs in terms of their step sizes needed to cross the decision boundary when calculating loss value. Besides, an early-dropping mechanism is incorporated to discard the easy examples at the initial stages of AE generation, resulting in efficient AT. Extensive experimental results on CIFAR-10, SVHN, and Imagenette demonstrate that HAM achieves significant improvement in robust fairness while reducing computational cost compared to several state-of-the-art adversarial training methods. The code will be made publicly available.
Wider and Deeper LLM Networks are Fairer LLM Evaluators
Authors: Xinghua Zhang, Bowen Yu, Haiyang Yu, Yangyu Lv, Tingwen Liu, Fei Huang, Hongbo Xu, Yongbin Li
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2308.01862
Pdf link: https://arxiv.org/pdf/2308.01862
Abstract Measuring the quality of responses generated by LLMs is a challenging task, particularly when it comes to evaluating whether the response is aligned with human preference. A novel approach involves using the LLM itself to make evaluation and stabilizing the results through multiple independent evaluations, similar to a single-layer narrow LLM network. This network consists of a fixed number of neurons, with each neuron being the same LLM. In this paper, we draw upon the extensive research on deep neural networks to explore whether deeper and wider networks can lead to fairer evaluations. Specifically, inspired by the observation that different neurons in a neural network are responsible for detecting different concepts, we first adaptively generate as many neuron roles as possible for each evaluation sample. Each perspective corresponds to the role of a specific LLM neuron in the first layer. In subsequent layers, we follow the idea that higher layers in deep networks are responsible for more comprehensive features, each layer receives representations from all neurons in the previous layer, integrating the locally learned evaluation information to obtain a more comprehensive evaluation result. Interestingly, this network design resembles the process of academic paper reviewing. To validate the effectiveness of our method, we construct the largest and most diverse English evaluation benchmark LLMEval$^2$ for LLM evaluators, comprising 15 tasks, 8 abilities, and 2,553 samples. Experimental results demonstrate that a wider network (involving many reviewers) with 2 layers (one round of discussion) performs the best, improving kappa correlation coefficient from 0.28 to 0.34. We also leverage WideDeep to aid in the assessment of Chinese LLMs, which has accelerated the evaluation time by 4.6 times, resulting in a 60% cost saving. WideDeep achieves a remarkable 93% agreement level among humans.
Keyword: quantization

Optimizing Cellular Networks for UAV Corridors via Quantization Theory
Authors: Saeed Karimi-Bidhendi, Giovanni Geraci, Hamid Jafarkhani
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2308.01440
Pdf link: https://arxiv.org/pdf/2308.01440
Abstract We present a new framework based on quantization theory to design cellular networks optimized for both legacy ground users and uncrewed aerial vehicle (UAV) corridors, dedicated aerial highways for safe UAV flights. Our framework leverages antenna tilts and transmit power at each base station to enhance coverage and quality of service among users. We develop a comprehensive mathematical analysis and optimization algorithms for multiple system-level performance metrics, including received signal strength and signal-to-interference-plus-noise ratio. Realistic antenna radiation patterns and propagation channel models are considered, alongside a generic 3D user distribution that allows for performance prioritization on the ground, along UAV corridors, or a desired tradeoff between the two. We demonstrate the efficacy of the proposed framework through case studies, showcasing the non-trivial combinations of antenna tilts and power levels that improve coverage and signal quality along UAV corridors while incurring only a marginal impact on the ground user performance compared to scenarios without UAVs.
Bees Local Phase Quantization Feature Selection for RGB-D Facial Expressions Recognition
Authors: Seyed Muhammad Hossein Mousavi, Atiye Ilanloo
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2308.01700
Pdf link: https://arxiv.org/pdf/2308.01700
Abstract Feature selection could be defined as an optimization problem and solved by bio-inspired algorithms. Bees Algorithm (BA) shows decent performance in feature selection optimization tasks. On the other hand, Local Phase Quantization (LPQ) is a frequency domain feature which has excellent performance on Depth images. Here, after extracting LPQ features out of RGB (colour) and Depth images from the Iranian Kinect Face Database (IKFDB), the Bees feature selection algorithm applies to select the desired number of features for final classification tasks. IKFDB is recorded with Kinect sensor V.2 and contains colour and depth images for facial and facial micro-expressions recognition purposes. Here five facial expressions of Anger, Joy, Surprise, Disgust and Fear are used for final validation. The proposed Bees LPQ method is compared with Particle Swarm Optimization (PSO) LPQ, PCA LPQ, Lasso LPQ, and just LPQ features for classification tasks with Support Vector Machines (SVM), K-Nearest Neighbourhood (KNN), Shallow Neural Network and Ensemble Subspace KNN. Returned results, show a decent performance of the proposed algorithm (99 % accuracy) in comparison with others.
MRQ:Support Multiple Quantization Schemes through Model Re-Quantization
Authors: Manasa Manohara, Sankalp Dayal, Tarqi Afzal, Rahul Bakshi, Kahkuen Fu
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2308.01867
Pdf link: https://arxiv.org/pdf/2308.01867
Abstract Despite the proliferation of diverse hardware accelerators (e.g., NPU, TPU, DPU), deploying deep learning models on edge devices with fixed-point hardware is still challenging due to complex model quantization and conversion. Existing model quantization frameworks like Tensorflow QAT [1], TFLite PTQ [2], and Qualcomm AIMET [3] supports only a limited set of quantization schemes (e.g., only asymmetric per-tensor quantization in TF1.x QAT [4]). Accordingly, deep learning models cannot be easily quantized for diverse fixed-point hardwares, mainly due to slightly different quantization requirements. In this paper, we envision a new type of model quantization approach called MRQ (model re-quantization), which takes existing quantized models and quickly transforms the models to meet different quantization requirements (e.g., asymmetric -> symmetric, non-power-of-2 scale -> power-of-2 scale). Re-quantization is much simpler than quantizing from scratch because it avoids costly re-training and provides support for multiple quantization schemes simultaneously. To minimize re-quantization error, we developed a new set of re-quantization algorithms including weight correction and rounding error folding. We have demonstrated that MobileNetV2 QAT model [7] can be quickly re-quantized into two different quantization schemes (i.e., symmetric and symmetric+power-of-2 scale) with less than 0.64 units of accuracy loss. We believe our work is the first to leverage this concept of re-quantization for model quantization and models obtained from the re-quantization process have been successfully deployed on NNA in the Echo Show devices.

A-suozhang / GetArxivDaily

New submissions for Fri, 4 Aug 23 #120

Keyword: efficient

DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales

Well-posedness and error estimates for coupled systems of nonlocal conservation laws

Novel Physics-Based Machine-Learning Models for Indoor Air Quality Approximations

Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving

Decentralized Translator of Trust: Supporting Heterogeneous TEE for Critical Infrastructure Protection

Efficient neural supersampling on a novel gaming dataset

Minimax Optimal $Q$ Learning with Nearest Neighbors

Multi-Objective Optimization for UAV Swarm-Assisted IoT with Virtual Antenna Arrays

Erase and Repair: An Efficient Box-Free Removal Attack on High-Capacity Deep Hiding

Quantum Multi-Agent Reinforcement Learning for Autonomous Mobility Cooperation

PPI-NET: End-to-End Parametric Primitive Inference

Multimodal Adaptation of CLIP for Few-Shot Action Recognition

Avoidance Navigation Based on Offline Pre-Training Reinforcement Learning

Fast Slate Policy Optimization: Going Beyond Plackett-Luce

Another Hamiltonian Cycle in Bipartite Pfaffian Graphs

Analyzing Bank Account Information of Nominees and Scammers

Deep Learning-based surrogate models for parametrized PDEs: handling geometric variability through graph neural networks

Unsupervised Multiplex Graph Learning with Complementary and Consistent Information

DaphneSched: A Scheduler for Integrated Data Analysis Pipelines

Interactive High-Resolution Simulation of Granular Material

Improving Wind Resistance Performance of Cascaded PID Controlled Quadcopters using Residual Reinforcement Learning

UniG-Encoder: A Universal Feature Encoder for Graph and Hypergraph Node Classification

lifex-ep: a robust and efficient software for cardiac electrophysiology simulations

Towards a Safe Real-Time Motion Planning Framework for Autonomous Driving Systems: An MPPI Approach

Baby's CoThought: Leveraging Large Language Models for Enhanced Reasoning in Compact Models

Finding the Optimum Design of Large Gas Engines Prechambers Using CFD and Bayesian Optimization

Bag of Policies for Distributional Deep Exploration

PoissonNet: Resolution-Agnostic 3D Shape Reconstruction using Fourier Neural Operators

Deep Learning-based Prediction of Stress and Strain Maps in Arterial Walls for Improved Cardiovascular Risk Assessment

Bayesian parameter identification in impedance boundary conditions for Helmholtz problems

Fundamental Data Structures for Matrix-Free Finite Elements on Hybrid Tetrahedral Grids

Deep Neural Networks Fused with Textures for Image Classification

Hard Adversarial Example Mining for Improving Robust Fairness

Wider and Deeper LLM Networks are Fairer LLM Evaluators

DualCoOp++: Fast and Effective Adaptation to Multi-Label Recognition with Limited Annotations

Exact identification of nonlinear dynamical systems by Trimmed Lasso

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

Keyword: faster

One Partition Approximating All $\ell_p$-norm Objectives in Correlation Clustering

Finite element approximation of the Hardy constant

Keyword: mobile

Avoidance Navigation Based on Offline Pre-Training Reinforcement Learning

Real-time Light Estimation and Neural Soft Shadows for AR Indoor Scenarios

Time-optimal geodesic mutual visibility of robots on grids within minimum area

MRQ:Support Multiple Quantization Schemes through Model Re-Quantization

Keyword: pruning

Hierarchical Federated Learning in Wireless Networks: Pruning Tackles Bandwidth Scarcity and System Heterogeneity

Keyword: diffusion

Reverse Stable Diffusion: What prompt was used to generate this image?

MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies

Motion Planning Diffusion: Learning and Planning of Robot Motions with Diffusion Models

Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS

Reference-Free Isotropic 3D EM Reconstruction using Diffusion Models

DiffColor: Toward High Fidelity Text-Guided Image Colorization with Diffusion Models

Synthesizing Long-Term Human Motions with Diffusion Models via Coherent Sampling

Keyword: adaptive

Reverse Stable Diffusion: What prompt was used to generate this image?

Randomized approximation of summable sequences -- adaptive and non-adaptive

NeuroSwarm: Multi-Agent Neural 3D Scene Reconstruction and Segmentation with UAV for Optimal Navigation of Quadruped Robot

Enhancing Visibility in Nighttime Haze Images Using Guided APSF and Gradient Adaptive Convolution

Tensor Programs IVb: Adaptive Optimization in the Infinite-Width Limit

Hard Adversarial Example Mining for Improving Robust Fairness

Wider and Deeper LLM Networks are Fairer LLM Evaluators

Keyword: quantization

Optimizing Cellular Networks for UAV Corridors via Quantization Theory

Bees Local Phase Quantization Feature Selection for RGB-D Facial Expressions Recognition

MRQ:Support Multiple Quantization Schemes through Model Re-Quantization