New submissions for Mon, 25 Sep 23

Keyword: efficient

Stochastic scheduling of autonomous mobile robots at hospitals

Authors: Lulu Cheng, Ning Zhao
Subjects: Robotics (cs.RO); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.12318
Pdf link: https://arxiv.org/pdf/2309.12318
Abstract The outbreak of the New Coronavirus has significantly increased the vulnerability of medical staff. This paper addresses the safety and stress relief of medical personnel by proposing a solution to the scheduling problem of autonomous mobile robots (AMRs) in a stochastic environment. Considering the stochastic nature of travel and service times for AMRs affected by the surrounding environment, the routes of AMRs are planned to minimize the daily cost of the hospital (including the AMR fixed cost, penalty cost of violating the time window, and transportation cost). To efficiently generate high-quality solutions, we identify several properties and incorporate them into an improved Tabu Search (I-TS) algorithm for problem-solving. Experimental evaluations demonstrate that the I-TS algorithm outperforms existing methods by producing higher-quality solutions. By leveraging the characteristics of medical request environments, we intelligently allocate an appropriate number of AMRs to efficiently provide services, resulting in substantial cost reductions for hospitals and enhanced utilization of medical resources. These findings confirm the effectiveness of the proposed stochastic programming model in determining the optimal number of AMRs and their corresponding service routes across various environmental settings.
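For readers unfamiliar with the base technique, the following is a minimal sketch of a plain Tabu Search over a visiting order. It is not the paper's I-TS algorithm. The swap neighborhood, the `tenure` parameter, and the toy `displacement` cost (which stands in for the hospital cost terms) are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def displacement(order: Sequence[int]) -> int:
    """Toy cost (hypothetical): how far each item sits from its sorted position."""
    return sum(abs(value - index) for index, value in enumerate(order))


def tabu_search(
    start: Sequence[int],
    cost: Callable[[Sequence[int]], float],
    iterations: int = 20,
    tenure: int = 3,
) -> Tuple[List[int], float]:
    """Generic Tabu Search with a pairwise-swap neighborhood."""
    current = list(start)
    best, best_cost = list(current), cost(current)
    tabu = {}  # swap move -> last iteration at which it is still forbidden

    for it in range(iterations):
        chosen, chosen_cost, chosen_move = None, None, None
        for i, j in combinations(range(len(current)), 2):
            neighbor = list(current)
            neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
            c = cost(neighbor)
            is_tabu = tabu.get((i, j), -1) >= it
            # Aspiration rule: a tabu move is allowed only if it beats the best so far.
            if is_tabu and c >= best_cost:
                continue
            if chosen_cost is None or c < chosen_cost:
                chosen, chosen_cost, chosen_move = neighbor, c, (i, j)
        if chosen is None:
            break  # every move is tabu and none improves on the best
        # Move to the best admissible neighbor even if it is worse than current.
        current = chosen
        tabu[chosen_move] = it + tenure
        if chosen_cost < best_cost:
            best, best_cost = list(chosen), chosen_cost
    return best, best_cost


best_order, best_value = tabu_search([4, 3, 2, 1, 0], displacement)
```

Accepting the best admissible neighbor even when it worsens the cost is what lets the search leave local optima; the tabu list keeps it from immediately undoing the move.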
Aviation Safety Risk Analysis and Flight Technology Assessment Issues
Authors: Shuanghe Liu
Subjects: Computers and Society (cs.CY); Machine Learning (cs.LG); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.12324
Pdf link: https://arxiv.org/pdf/2309.12324
Abstract This text highlights the significance of flight safety in China's civil aviation industry and emphasizes the need for comprehensive research. It focuses on two main areas: analyzing exceedance events and statistically evaluating non-exceedance data. The challenges of current approaches lie in insufficient cause analysis for exceedances. The proposed solutions involve data preprocessing, reliability assessment, quantifying flight control using neural networks, exploratory data analysis, flight personnel skill evaluation with machine learning, and establishing real-time automated warnings. These endeavors aim to enhance flight safety, personnel assessment, and warning mechanisms, contributing to a safer and more efficient civil aviation sector.
Onchain Sports Betting using UBET Automated Market Maker
Authors: Daniel Jiwoong Im, Alexander Kondratskiy, Vincent Harvey, Hsuan-Wei Fu
Subjects: Computers and Society (cs.CY); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.12333
Pdf link: https://arxiv.org/pdf/2309.12333
Abstract The paper underscores how decentralization in sports betting addresses the drawbacks of traditional centralized platforms, ensuring transparency, security, and lower fees. Non-custodial solutions empower bettors with ownership of funds, bypassing geographical restrictions. Decentralized platforms enhance security, privacy, and democratic decision-making. However, decentralized sports betting necessitates automated market makers (AMMs) for efficient liquidity provision. Existing AMMs like Uniswap lack alignment with fair odds, creating risks for liquidity providers. To mitigate this, the paper introduces UBET AMM (UAMM), utilizing smart contracts and algorithms to price sports odds fairly. It establishes an on-chain betting framework, detailing market creation, UAMM application, collateral liquidity pools, and experiments that exhibit positive outcomes. UAMM enhances decentralized sports betting by ensuring liquidity, decentralized pricing, and global accessibility, promoting trustless and efficient betting.
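As background for the contrast drawn with Uniswap-style AMMs, here is a minimal sketch of a constant-product pool, where the reserves satisfy x * y = k. It is not UAMM; the function names are hypothetical and trading fees are omitted. The point it illustrates is that the quoted price depends only on pool reserves, not on any external estimate of fair odds.

```python
def constant_product_swap(reserve_in: float, reserve_out: float, amount_in: float) -> float:
    """Output amount that keeps reserve_in * reserve_out constant (no fee)."""
    if reserve_in <= 0 or reserve_out <= 0 or amount_in <= 0:
        raise ValueError("reserves and input amount must be positive")
    k = reserve_in * reserve_out
    new_reserve_out = k / (reserve_in + amount_in)
    return reserve_out - new_reserve_out


def marginal_price(reserve_in: float, reserve_out: float) -> float:
    """Units of the output asset per unit of input for an infinitesimal trade."""
    return reserve_out / reserve_in


# Doubling the input-side reserve halves the output-side reserve.
received = constant_product_swap(100.0, 100.0, 100.0)
```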
How Beaufort, Neumann and Gates met? Subject integration with spreadsheeting
Authors: Maria Csernoch, Julia Csernoch
Subjects: Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2309.12353
Pdf link: https://arxiv.org/pdf/2309.12353
Abstract Computational thinking should be the fourth fundamental skill, along with reading, writing, and arithmetic (3R). To reach the level where computational thinking skills, especially digital problem solving, have their own schemata, there is a long way to go. In the present paper, a novel approach is detailed to support subject integration and building digital schemata, on the well-known Beaufort scale. The conversion of a traditional, paper-based problem and a data retrieval process are presented within the frame of a Grade 8 action research study. It is found that both students' content knowledge and their digital skills developed more efficiently than in traditional course book and decontextualized digital environments. Furthermore, the method presented here can be adapted to any paper-based problems whose solutions would be more effective in a digital environment and which offer various forms for building schemata both in the subject matter and informatics.
An Efficient Intelligent Semi-Automated Warehouse Inventory Stocktaking System
Authors: Chunan Tong
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.12365
Pdf link: https://arxiv.org/pdf/2309.12365
Abstract In the context of evolving supply chain management, the significance of efficient inventory management has grown substantially for businesses. However, conventional manual and experience-based approaches often struggle to meet the complexities of modern market demands. This research introduces an intelligent inventory management system to address challenges related to inaccurate data, delayed monitoring, and overreliance on subjective experience in forecasting. The proposed system integrates bar code and distributed flutter application technologies for intelligent perception, alongside comprehensive big data analytics to enable data-driven decision-making. Through meticulous analysis, system design, critical technology exploration, and simulation validation, the effectiveness of the proposed system is successfully demonstrated. The intelligent system facilitates second-level monitoring, high-frequency checks, and artificial intelligence-driven forecasting, consequently enhancing the automation, precision, and intelligence of inventory management. This system contributes to cost reduction and optimized inventory sizes through accurate predictions and informed decisions, ultimately achieving a mutually beneficial scenario. The outcomes of this research offer
Conversational Swarm Intelligence (CSI) Enhances Groupwise Deliberation
Authors: Louis Rosenberg, Gregg Willcox, Hans Schumann, Ganesh Mani
Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2309.12366
Pdf link: https://arxiv.org/pdf/2309.12366
Abstract Real-time conversational deliberation is a critical groupwise method for reaching decisions, solving problems, evaluating priorities, generating ideas, and producing insights. Unfortunately, real-time conversations are difficult to scale, losing effectiveness as groups grow above 5 to 7 members. Conversational Swarm Intelligence (CSI) is a new technology modeled on the dynamics of biological swarms. It aims to enable networked groups of any size to hold productive real-time deliberations that converge on unified solutions. CSI leverages the power of Large Language Models (LLMs) in a unique and powerful way, allowing real-time dialog among small local groups while simultaneously enabling efficient content propagation across much larger populations. In this way, CSI combines the benefits of small-scale deliberative reasoning and large-scale collective intelligence. In this study, we compare deliberative groups of 48 people using standard online chat to the same sized groups using a prototype chat-based CSI system called Thinkscape. Results show that participants using CSI contributed 51% more content (p<0.001) than those using standard chat, and the deliberations using CSI showed 37% less difference in contribution quantity between the most active vs least active members, indicating more balanced dialog. And finally, a large majority of participants preferred deliberating using the CSI system over standard chat (p<0.05) and reported feeling more impactful when doing so (p<0.01). These results suggest that Conversational Swarm Intelligence is a promising technology for enabling large-scale deliberation.
DualToken-ViT: Position-aware Efficient Vision Transformer with Dual Token Fusion
Authors: Zhenzhen Chu, Jiayu Chen, Cen Chen, Chengyu Wang, Ziheng Wu, Jun Huang, Weining Qian
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.12424
Pdf link: https://arxiv.org/pdf/2309.12424
Abstract Self-attention-based vision transformers (ViTs) have emerged as a highly competitive architecture in computer vision. Unlike convolutional neural networks (CNNs), ViTs are capable of global information sharing. With the development of various structures of ViTs, ViTs are increasingly advantageous for many vision tasks. However, the quadratic complexity of self-attention renders ViTs computationally intensive, and their lack of inductive biases of locality and translation equivariance demands larger model sizes compared to CNNs to effectively learn visual features. In this paper, we propose a light-weight and efficient vision transformer model called DualToken-ViT that leverages the advantages of CNNs and ViTs. DualToken-ViT effectively fuses the token with local information obtained by convolution-based structure and the token with global information obtained by self-attention-based structure to achieve an efficient attention structure. In addition, we use position-aware global tokens throughout all stages to enrich the global information, which further strengthens the effect of DualToken-ViT. Position-aware global tokens also contain the position information of the image, which makes our model better for vision tasks. We conducted extensive experiments on image classification, object detection and semantic segmentation tasks to demonstrate the effectiveness of DualToken-ViT. On the ImageNet-1K dataset, our models of different scales achieve accuracies of 75.4% and 79.4% with only 0.5G and 1.0G FLOPs, respectively, and our model with 1.0G FLOPs outperforms LightViT-T using global tokens by 0.7%.
Foundation Metrics: Quantifying Effectiveness of Healthcare Conversations powered by Generative AI
Authors: Mahyar Abbasian, Elahe Khatibi, Iman Azimi, David Oniani, Zahra Shakeri Hossein Abad, Alexander Thieme, Zhongqi Yang, Yanshan Wang, Bryant Lin, Olivier Gevaert, Li-Jia Li, Ramesh Jain, Amir M. Rahmani
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2309.12444
Pdf link: https://arxiv.org/pdf/2309.12444
Abstract Generative Artificial Intelligence is set to revolutionize healthcare delivery by transforming traditional patient care into a more personalized, efficient, and proactive process. Chatbots, serving as interactive conversational models, will probably drive this patient-centered transformation in healthcare. Through the provision of various services, including diagnosis, personalized lifestyle recommendations, and mental health support, the objective is to substantially augment patient health outcomes, all the while mitigating the workload burden on healthcare providers. The life-critical nature of healthcare applications necessitates establishing a unified and comprehensive set of evaluation metrics for conversational models. Existing evaluation metrics proposed for various generic large language models (LLMs) demonstrate a lack of comprehension regarding medical and health concepts and their significance in promoting patients' well-being. Moreover, these metrics neglect pivotal user-centered aspects, including trust-building, ethics, personalization, empathy, user comprehension, and emotional support. The purpose of this paper is to explore state-of-the-art LLM-based evaluation metrics that are specifically applicable to the assessment of interactive conversational models in healthcare. Subsequently, we present a comprehensive set of evaluation metrics designed to thoroughly assess the performance of healthcare chatbots from an end-user perspective. These metrics encompass an evaluation of language processing abilities, impact on real-world clinical tasks, and effectiveness in user-interactive conversations. Finally, we engage in a discussion concerning the challenges associated with defining and implementing these metrics, with particular emphasis on confounding factors such as the target audience, evaluation methods, and prompt techniques involved in the evaluation process.
Knowledge Base Aware Semantic Communication in Vehicular Networks
Authors: Le Xia, Yao Sun, Dusit Niyato, Kairong Ma, Jiawen Kang, Muhammad Ali Imran
Subjects: Systems and Control (eess.SY); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2309.12461
Pdf link: https://arxiv.org/pdf/2309.12461
Abstract Semantic communication (SemCom) has recently been considered a promising solution for the inevitable crisis of scarce communication resources. This trend stimulates us to explore the potential of applying SemCom to vehicular networks, which normally consume a tremendous amount of resources to achieve stringent requirements on high reliability and low latency. Unfortunately, the unique background knowledge matching mechanism in SemCom makes it challenging to realize efficient vehicle-to-vehicle service provisioning for multiple users at the same time. To this end, this paper identifies and jointly addresses two fundamental problems of knowledge base construction (KBC) and vehicle service pairing (VSP) inherently existing in SemCom-enabled vehicular networks. Concretely, we first derive the knowledge matching based queuing latency specific for semantic data packets, and then formulate a latency-minimization problem subject to several KBC and VSP related reliability constraints. Afterward, a SemCom-empowered Service Supplying Solution (S$^{\text{4}}$) is proposed along with the theoretical analysis of its optimality guarantee. Simulation results demonstrate the superiority of S$^{\text{4}}$ in terms of average queuing latency, semantic data packet throughput, and user knowledge preference satisfaction compared with two different benchmarks.
Robust Energy Consumption Prediction with a Missing Value-Resilient Metaheuristic-based Neural Network in Mobile App Development
Authors: Seyed Jalaleddin Mousavirad, Luís A. Alexandre
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.12484
Pdf link: https://arxiv.org/pdf/2309.12484
Abstract Energy consumption is a fundamental concern in mobile application development, bearing substantial significance for both developers and end-users. Moreover, it is a critical determinant in the consumer's decision-making process when considering a smartphone purchase. From the sustainability perspective, it becomes imperative to explore approaches aimed at mitigating the energy consumption of mobile devices, given the significant global consequences arising from the extensive utilisation of billions of smartphones, which imparts a profound environmental impact. Despite the existence of various energy-efficient programming practices within the Android platform, the dominant mobile ecosystem, there remains a need for documented machine learning-based energy prediction algorithms tailored explicitly for mobile app development. Hence, the main objective of this research is to propose a novel neural network-based framework, enhanced by a metaheuristic approach, to achieve robust energy prediction in the context of mobile app development. The metaheuristic approach here plays a crucial role in not only identifying suitable learning algorithms and their corresponding parameters but also determining the optimal number of layers and neurons within each layer. To the best of our knowledge, prior studies have yet to employ any metaheuristic algorithm to address all these hyperparameters simultaneously. Moreover, due to limitations in accessing certain aspects of a mobile phone, there might be missing data in the data set, and the proposed framework can handle this. In addition, we conducted an optimal algorithm selection strategy, employing 13 metaheuristic algorithms, to identify the best algorithm based on accuracy and resistance to missing values. The comprehensive experiments demonstrate that our proposed approach yields significant outcomes for energy consumption prediction.
High-Dimensional Controller Tuning through Latent Representations
Authors: Alireza Sarmadi, Prashanth Krishnamurthy, Farshad Khorrami
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.12487
Pdf link: https://arxiv.org/pdf/2309.12487
Abstract In this paper, we propose a method to automatically and efficiently tune high-dimensional vectors of controller parameters. The proposed method first learns a mapping from the high-dimensional controller parameter space to a lower dimensional space using a machine learning-based algorithm. This mapping is then utilized in an actor-critic framework using Bayesian optimization (BO). The proposed approach is applicable to complex systems (such as quadruped robots). In addition, the proposed approach also enables efficient generalization to different control tasks while also reducing the number of evaluations required while tuning the controller parameters. We evaluate our method on a legged locomotion application. We show the efficacy of the algorithm in tuning the high-dimensional controller parameters and also reducing the number of evaluations required for the tuning. Moreover, it is shown that the method is successful in generalizing to new tasks and is also transferable to other robot dynamics.
Trip Planning for Autonomous Vehicles with Wireless Data Transfer Needs Using Reinforcement Learning
Authors: Yousef AlSaqabi, Bhaskar Krishnamachari
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.12534
Pdf link: https://arxiv.org/pdf/2309.12534
Abstract With recent advancements in the field of communications and the Internet of Things, vehicles are becoming more aware of their environment and are evolving towards full autonomy. Vehicular communication opens up the possibility for vehicle-to-infrastructure interaction, where vehicles could share information with components such as cameras, traffic lights, and signage that support a country's road system. As a result, vehicles are becoming more than just a means of transportation; they are collecting, processing, and transmitting massive amounts of data used to make driving safer and more convenient. With 5G cellular networks and beyond, there is going to be more data bandwidth available on our roads, but it may be heterogeneous because of limitations like line of sight, infrastructure, and heterogeneous traffic on the road. This paper addresses the problem of route planning for autonomous vehicles in urban areas accounting for both driving time and data transfer needs. We propose a novel reinforcement learning solution that prioritizes high bandwidth roads to meet a vehicle's data transfer requirement, while also minimizing driving time. We compare this approach to traffic-unaware and bandwidth-unaware baselines to show how much better it performs under heterogeneous traffic. This solution could be used as a starting point to understand what good policies look like, which could potentially yield faster, more efficient heuristics in the future.
Adaptive Model Predictive Control for Engine-Driven Ducted Fan Lift Systems using an Associated Linear Parameter Varying Model
Authors: Hanjie Jiang, Ye Zhou, Hann Woei Ho, Wenjie Hu
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.12552
Pdf link: https://arxiv.org/pdf/2309.12552
Abstract Ducted fan lift systems (DFLSs) powered by two-stroke aviation piston engines present a challenging control problem due to their complex multivariable dynamics. Current controllers for these systems typically rely on proportional-integral algorithms combined with data tables, which rely on accurate models and are not adaptive to handle time-varying dynamics or system uncertainties. This paper proposes a novel adaptive model predictive control (AMPC) strategy with an associated linear parameter varying (LPV) model for controlling the engine-driven DFLS. This LPV model is derived from a global network model, which is trained off-line with data obtained from a general mean value engine model for two-stroke aviation engines. Different network models, including multi-layer perceptron, Elman, and radial basis function (RBF), are evaluated and compared in this study. The results demonstrate that the RBF model exhibits higher prediction accuracy and robustness in the DFLS application. Based on the trained RBF model, the proposed AMPC approach constructs an associated network that directly outputs the LPV model parameters as an adaptive, robust, and efficient prediction model. The efficiency of the proposed approach is demonstrated through numerical simulations of a vertical take-off thrust preparation process for the DFLS. The simulation results indicate that the proposed AMPC method can effectively control the DFLS thrust with a relative error below 3.5%.
Machine Learning Meets Advanced Robotic Manipulation
Authors: Saeid Nahavandi, Roohallah Alizadehsani, Darius Nahavandi, Chee Peng Lim, Kevin Kelly, Fernando Bello
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.12560
Pdf link: https://arxiv.org/pdf/2309.12560
Abstract Automated industries lead to high quality production, lower manufacturing cost and better utilization of human resources. Robotic manipulator arms have a major role in the automation process. However, for complex manipulation tasks, hard coding efficient and safe trajectories is challenging and time consuming. Machine learning methods have the potential to learn such controllers based on expert demonstrations. Despite promising advances, better approaches must be developed to improve safety, reliability, and efficiency of ML methods in both training and deployment phases. This survey aims to review cutting edge technologies and recent trends on ML methods applied to real-world manipulation tasks. After reviewing the related background on ML, the rest of the paper is devoted to ML applications in different domains such as industry, healthcare, agriculture, space, military, and search and rescue. The paper is closed with important research directions for future works.
Cognitive Approach to Hierarchical Task Selection for Human-Robot Interaction in Dynamic Environments
Authors: Syed T. Bukhari, Bashira Akter Anima, David Feil-Seifer, Wajahat M. Qazi
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.12562
Pdf link: https://arxiv.org/pdf/2309.12562
Abstract In an efficient and flexible human-robot collaborative work environment, a robot team member must be able to recognize both explicit requests and implied actions from human users. Identifying "what to do" in such cases requires an agent to have the ability to construct associations between objects, their actions, and the effect of actions on the environment. In this regard, semantic memory is being introduced to understand the explicit cues and their relationships with available objects and required skills to make "tea" and "sandwich". We have extended our previous hierarchical robot control architecture to add the capability to execute the most appropriate task based on both feedback from the user and the environmental context. To validate this system, two types of skills were implemented in the hierarchical task tree: 1) Tea making skills and 2) Sandwich making skills. During the conversation between the robot and the human, the robot was able to determine the hidden context using ontology and began to act accordingly. For instance, if the person says "I am thirsty" or "It is cold outside" the robot will start to perform the tea-making skill. In contrast, if the person says, "I am hungry" or "I need something to eat", the robot will make the sandwich. A humanoid robot Baxter was used for this experiment. We tested three scenarios with objects at different positions on the table for each skill. We observed that in all cases, the robot used only objects that were relevant to the skill.
Passive Reflection Codebook Design for IRS-Integrated Access Point
Authors: Yuwei Huang, Lipeng Zhu, Rui Zhang
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2309.12563
Pdf link: https://arxiv.org/pdf/2309.12563
Abstract Intelligent reflecting surface (IRS) has emerged as a promising technique to extend the wireless signal coverage of access point (AP) and improve the communication performance cost-effectively. In order to reduce the path-loss of the cascaded user-IRS-AP channels, the IRS-integrated AP architecture has been proposed to deploy the IRSs and the antenna array of the AP within the same antenna radome. To reduce the pilot overhead for estimating all IRS-involved channels, in this paper, we propose a novel codebook-based IRS reflection design for the IRS-integrated AP to enhance the coverage performance in a given area. In particular, the codebook consisting of a small number of codewords is designed offline by employing an efficient sector division strategy based on the azimuth angle. To ensure the performance of each sector, we optimize its corresponding codeword for IRS reflection pattern to maximize the sector-min-average-effective-channel-power (SMAECP) by applying the alternating optimization (AO) and semidefinite relaxation (SDR) methods. With the designed codebook, the AP performs the IRS reflection training by sequentially applying all codewords and selects the one achieving the best communication performance for data transmission. Numerical results show that our proposed codebook design can enhance the average channel power of the whole coverage area, as compared to the system without IRS. Moreover, our proposed codebook-based IRS reflection design is shown to achieve significant performance gain over other benchmark schemes in both single-user and multi-user transmissions.
Recent Advances in Path Integral Control for Trajectory Optimization: An Overview in Theoretical and Algorithmic Perspectives
Authors: Muhammad Kazim, JunGee Hong, Min-Gyeom Kim, Kwang-Ki K. Kim
Subjects: Robotics (cs.RO); Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.12566
Pdf link: https://arxiv.org/pdf/2309.12566
Abstract This paper presents a tutorial overview of path integral (PI) control approaches for stochastic optimal control and trajectory optimization. We concisely summarize the theoretical development of path integral control to compute a solution for stochastic optimal control and provide algorithmic descriptions of the cross-entropy (CE) method, an open-loop controller using the receding horizon scheme known as the model predictive path integral (MPPI), and a parameterized state feedback controller based on the path integral control theory. We discuss policy search methods based on path integral control, efficient and stable sampling strategies, extensions to multi-agent decision-making, and MPPI for the trajectory optimization on manifolds. For tutorial demonstrations, some PI-based controllers are implemented in MATLAB and ROS2/Gazebo simulations for trajectory optimization. The simulation frameworks and source codes are publicly available at https://github.com/INHA-Autonomous-Systems-Laboratory-ASL/An-Overview-on-Recent-Advances-in-Path-Integral-Control.
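The sampling-based update at the core of MPPI-style controllers can be sketched as follows. This is a minimal illustration, not the code from the linked repository: the function names, the temperature `lam`, and the toy cost are assumptions, and the noise samples are passed in explicitly so that the result is deterministic.

```python
import math
from typing import Callable, List, Sequence


def path_integral_weights(costs: Sequence[float], lam: float) -> List[float]:
    """Normalized weights proportional to exp(-(S_k - min S) / lam)."""
    best = min(costs)  # shifting by the minimum keeps exp() numerically safe
    raw = [math.exp(-(c - best) / lam) for c in costs]
    total = sum(raw)
    return [r / total for r in raw]


def mppi_update(
    u_nominal: Sequence[float],
    noises: Sequence[Sequence[float]],
    rollout_cost: Callable[[Sequence[float]], float],
    lam: float = 1.0,
) -> List[float]:
    """One update: add the cost-weighted average of sampled perturbations."""
    # Score each perturbed control sequence with the user-supplied rollout cost.
    costs = [
        rollout_cost([u + e for u, e in zip(u_nominal, eps)]) for eps in noises
    ]
    weights = path_integral_weights(costs, lam)
    return [
        u + sum(w * eps[t] for w, eps in zip(weights, noises))
        for t, u in enumerate(u_nominal)
    ]


# Toy example (hypothetical): one-step control whose cost is (u - 1)^2.
updated = mppi_update([0.0], [[1.0], [-1.0]], lambda u: (u[0] - 1.0) ** 2)
example_weights = path_integral_weights([0.0, math.log(3.0)], lam=1.0)
```

In a receding-horizon loop, the first element of the updated sequence would be applied, the sequence shifted by one step, and the update repeated with fresh noise samples.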
SPION: Layer-Wise Sparse Training of Transformer via Convolutional Flood Filling
Authors: Bokyeong Yoon, Yoonsang Han, Gordon Euhyun Moon
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2309.12578
Pdf link: https://arxiv.org/pdf/2309.12578
Abstract Sparsifying the Transformer has garnered considerable interest, as training the Transformer is very computationally demanding. Prior efforts to sparsify the Transformer have either used a fixed pattern or data-driven approach to reduce the number of operations involving the computation of multi-head attention, which is the main bottleneck of the Transformer. However, existing methods suffer from inevitable problems, such as the potential loss of essential sequence features due to the uniform fixed pattern applied across all layers, and an increase in the model size resulting from the use of additional parameters to learn sparsity patterns in attention operations. In this paper, we propose a novel sparsification scheme for the Transformer that integrates convolution filters and the flood filling method to efficiently capture the layer-wise sparse pattern in attention operations. Our sparsification approach reduces the computational complexity and memory footprint of the Transformer during training. Efficient implementations of the layer-wise sparsified attention algorithm on GPUs are developed, demonstrating a new SPION that achieves up to 3.08X speedup over existing state-of-the-art sparse Transformer models, with better evaluation quality.
A Multi-Robot Task Assignment Framework for Search and Rescue with Heterogeneous Teams
Authors: Hamid Osooli, Paul Robinette, Kshitij Jerath, S. Reza Ahmadzadeh
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.12589
Pdf link: https://arxiv.org/pdf/2309.12589
Abstract In post-disaster scenarios, efficient search and rescue operations involve collaborative efforts between robots and humans. Existing planning approaches focus on specific aspects but overlook crucial elements like information gathering, task assignment, and planning. Furthermore, previous methods considering robot capabilities and victim requirements suffer from time complexity due to repetitive planning steps. To overcome these challenges, we introduce a comprehensive framework, the Multi-Stage Multi-Robot Task Assignment. This framework integrates scouting, task assignment, and path-planning stages, optimizing task allocation based on robot capabilities, victim requirements, and past robot performance. Our iterative approach ensures objective fulfillment within problem constraints. Evaluation across four maps, comparing with a state-of-the-art baseline, demonstrates our algorithm's superiority with a remarkable 97 percent performance increase. Our code is open-sourced to enable result replication.
Stable Reconstruction of Anisotropic Objects from Near-Field Electromagnetic Data
Authors: Tran H. Lan, Dinh-Liem Nguyen
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2309.12606
Pdf link: https://arxiv.org/pdf/2309.12606
Abstract This paper addresses the electromagnetic inverse scattering problem of determining the location and shape of anisotropic objects from near-field data. We investigate both cases involving the Helmholtz equation and Maxwell's equations for this inverse problem. Our study focuses on developing efficient imaging functionals that enable a fast and stable recovery of the anisotropic object. The implementation of the imaging functionals is simple and avoids the need to solve an ill-posed problem. The resolution analysis of the imaging functionals is conducted using the Green representation formula. Furthermore, we establish stability estimates for these imaging functionals when noise is present in the data. To illustrate the effectiveness of the methods, we present numerical examples showcasing their performance.
Data-driven Preference Learning Methods for Multiple Criteria Sorting with Temporal Criteria
Authors: Li Yijun, Guo Mengzhuo, Zhang Qingpeng
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.12620
Pdf link: https://arxiv.org/pdf/2309.12620
Abstract The advent of predictive methodologies has catalyzed the emergence of data-driven decision support across various domains. However, developing models capable of effectively handling input time series data presents an enduring challenge. This study presents novel preference learning approaches to multiple criteria sorting (MCS) problems in the presence of temporal criteria. We first formulate a convex quadratic programming model characterized by fixed time discount factors, operating within a regularization framework. Additionally, we propose an ensemble learning algorithm designed to consolidate the outputs of multiple, potentially weaker, optimizers, a process executed efficiently through parallel computation. To enhance scalability and accommodate learnable time discount factors, we introduce a novel monotonic Recurrent Neural Network (mRNN). It is designed to capture the evolving dynamics of preferences over time while upholding critical properties inherent to MCS problems, including criteria monotonicity, preference independence, and the natural ordering of classes. The proposed mRNN can describe the preference dynamics by depicting marginal value functions and personalized time discount factors over time, effectively amalgamating the interpretability of traditional MCS methods with the predictive potential offered by deep preference learning models. Comprehensive assessments of the proposed models are conducted, encompassing synthetic data scenarios and a real-case study centered on classifying valuable users within a mobile gaming app based on their historical in-app behavioral sequences. Empirical findings underscore the notable performance improvements achieved by the proposed models when compared to a spectrum of baseline methods, spanning machine learning, deep learning, and conventional multiple criteria sorting approaches.
A Detailed Analysis of the SpaceSaving$\pm$ Family of Algorithms with Bounded Deletions
Authors: Fuheng Zhao, Divyakant Agrawal, Amr El Abbadi, Claire Mathieu, Ahmed Metwally, Michel de Rougemont
Subjects: Databases (cs.DB); Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2309.12623
Pdf link: https://arxiv.org/pdf/2309.12623
Abstract In this paper, we present an advanced analysis of near-optimal deterministic algorithms using a small space budget to solve the frequency estimation, heavy hitters, frequent items, and top-k approximation problems in the bounded deletion model. We define the family of SpaceSaving$\pm$ algorithms and explain why the original SpaceSaving$\pm$ algorithm only works when insertions and deletions are not interleaved. Next, we introduce the new DoubleSpaceSaving$\pm$ and IntegratedSpaceSaving$\pm$ algorithms and prove their correctness. They show similar characteristics and both extend the popular space-efficient SpaceSaving algorithm. However, these two algorithms represent different trade-offs: DoubleSpaceSaving$\pm$ distributes the operations to two independent summaries, while IntegratedSpaceSaving$\pm$ fully synchronizes deletions with insertions. Since data streams are often skewed, we present an improved analysis of these two algorithms and show that errors do not depend on the hot items and are only dependent on the cold and warm items. We also demonstrate how to achieve the relative error guarantee under mild assumptions. Moreover, we establish that these two algorithms have the important mergeability property, which is desirable in distributed settings.
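As background for the summaries this abstract extends, the following is a minimal sketch of the original insertion-only SpaceSaving algorithm. It is not the DoubleSpaceSaving$\pm$ or IntegratedSpaceSaving$\pm$ algorithm from the paper, and it does not handle deletions. The function name `space_saving` and the dictionary-based counters are illustrative assumptions.

```python
def space_saving(stream, k):
    """Insertion-only SpaceSaving summary holding at most k counters.

    Illustrative sketch only: a plain dict with a linear-time minimum
    search stands in for the optimized data structure used in practice.
    """
    counts = {}
    for item in stream:
        if item in counts:
            # Monitored item: increment its counter.
            counts[item] += 1
        elif len(counts) < k:
            # Free slot: start monitoring the item.
            counts[item] = 1
        else:
            # Summary is full: evict an item with the minimum count and
            # let the new item inherit that count plus one.
            victim = min(counts, key=counts.get)
            min_count = counts.pop(victim)
            counts[item] = min_count + 1
    return counts


summary = space_saving(["a"] * 5 + ["b"] * 3 + ["c", "d"], k=3)
```

In this sketch the counters always sum to the stream length, and a reported count can overestimate but never underestimate the true frequency of a monitored item.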
Quark: A High-Performance Secure Container Runtime for Serverless Computing
Authors: Chenxingyu Zhao, Yulin Sun, Arvind Krishnamurthy
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2309.12624
Pdf link: https://arxiv.org/pdf/2309.12624
Abstract Secure container runtimes serve as the foundational layer for creating and running containers, which is the bedrock of emerging computing paradigms like microservices and serverless computing. Although existing secure container runtimes indeed enhance security by running containers over a guest kernel and a Virtual Machine Monitor (VMM or Hypervisor), they incur performance penalties in critical areas such as networking, container startup, and I/O system calls. In our practice of operating microservices and serverless computing, we build a high-performance secure container runtime named Quark. Unlike existing solutions that rely on traditional VM technologies by importing Linux for the guest kernel and QEMU for the VMM, we take a different approach to building Quark from the ground up, paving the way for extreme customization to unlock high performance. Our development centers on co-designing a custom guest kernel and a VMM for secure containers. To this end, we build a lightweight guest OS kernel named QKernel and a specialized VMM named QVisor. The QKernel-QVisor codesign allows us to deliver three key advancements: high-performance RDMA-based container networking, fast container startup mode, and efficient mechanisms for executing I/O syscalls. In our practice with real-world apps like Redis, Quark cuts down P95 latency by 79.3% and increases throughput by 2.43x compared to Kata. Moreover, Quark container startup achieves 96.5% lower latency than the cold-start mode while saving 81.3% of the memory cost compared to the keep-warm mode. Quark is open-source with an industry-standard codebase in Rust.
Heterogeneous Rank Beamforming for Industrial Communications
Authors: Andrea Bedin, Akshay Jain, Andrea Zanella, Karthik Upadhya
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2309.12636
Pdf link: https://arxiv.org/pdf/2309.12636
Abstract This paper proposes a novel hardware beamforming architecture, which is capable of utilizing a different number of Radio Frequency (RF) chains in different parts of the bandwidth. It also shows that a proportional fairness scheduler will effectively utilize the high rank part of the bandwidth in a multi-user setting, thus operating more efficiently and effectively than classical beamforming schemes.
MEV Makes Everyone Happy under Greedy Sequencing Rule
Authors: Yuhao Li, Mengqian Zhang, Jichen Li, Elynn Chen, Xi Chen, Xiaotie Deng
Subjects: Computer Science and Game Theory (cs.GT)
Arxiv link: https://arxiv.org/abs/2309.12640
Pdf link: https://arxiv.org/pdf/2309.12640
Abstract Trading through decentralized exchanges (DEXs) has become crucial in today's blockchain ecosystem, enabling users to swap tokens efficiently and automatically. However, the capacity of miners to strategically order transactions has led to exploitative practices (e.g., front-running attacks, sandwich attacks) and allowed them to gain substantial Maximal Extractable Value (MEV) for their own advantage. To mitigate such manipulation, Ferreira and Parkes recently proposed a greedy sequencing rule such that the execution price of transactions in a block moves back and forth around the starting price. Utilizing this sequencing rule makes it impossible for miners to conduct sandwich attacks, consequently mitigating the MEV problem. However, no sequencing rule can prevent miners from obtaining risk-free profits. This paper systematically studies the computation of a miner's optimal strategy for maximizing MEV under the greedy sequencing rule, where the utility of miners is measured by the overall value of their token holdings. Our results unveil a dichotomy between the no trading fee scenario, which can be optimally strategized in polynomial time, and the scenario with a constant fraction of trading fee, where finding the optimal strategy is proven NP-hard. The latter represents a significant challenge for miners seeking optimal MEV. Following the computation results, we further show a remarkable phenomenon: a miner's optimal MEV also benefits users. Precisely, in the scenarios without trading fees, when miners adopt the optimal strategy given by our algorithm, all users' transactions will be executed, and each user will receive profits equal to or exceeding their expectations. This outcome provides further support for the study and design of sequencing rules in decentralized exchanges.
OneNet: Enhancing Time Series Forecasting Models under Concept Drift by Online Ensembling
Authors: Yi-Fan Zhang, Qingsong Wen, Xue Wang, Weiqi Chen, Liang Sun, Zhang Zhang, Liang Wang, Rong Jin, Tieniu Tan
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2309.12659
Pdf link: https://arxiv.org/pdf/2309.12659
Abstract Online updating of time series forecasting models aims to address the concept drifting problem by efficiently updating forecasting models based on streaming data. Many algorithms are designed for online time series forecasting, with some exploiting cross-variable dependency while others assume independence among variables. Given every data assumption has its own pros and cons in online time series modeling, we propose Online ensembling Network (OneNet). It dynamically updates and combines two models, with one focusing on modeling the dependency across the time dimension and the other on cross-variate dependency. Our method incorporates a reinforcement learning-based approach into the traditional online convex programming framework, allowing for the linear combination of the two models with dynamically adjusted weights. OneNet addresses the main shortcoming of classical online learning methods that tend to be slow in adapting to the concept drift. Empirical results show that OneNet reduces online forecasting error by more than 50% compared to the State-Of-The-Art (SOTA) method. The code is available at https://github.com/yfzhang114/OneNet.
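To illustrate what "linear combination of the two models with dynamically adjusted weights" can look like in a classical online convex programming setting, here is a minimal sketch of one exponentiated-gradient weight update. It covers only a textbook baseline and omits the reinforcement learning component; the abstract does not state that OneNet uses exactly this update. The function name `egd_update`, the squared-error loss, and the learning rate are assumptions.

```python
import math


def egd_update(weights, preds, target, lr=0.5):
    """One exponentiated-gradient step on the squared error of a combined forecast.

    weights: current non-negative combination weights that sum to 1.
    preds:   forecasts of the individual models for the current step.
    target:  the value observed afterwards.
    """
    combined = sum(w * p for w, p in zip(weights, preds))
    # Gradient of (combined - target)^2 with respect to each weight.
    grads = [2.0 * (combined - target) * p for p in preds]
    # Multiplicative update followed by renormalization onto the simplex.
    unnormalized = [w * math.exp(-lr * g) for w, g in zip(weights, grads)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# The first model predicted the target exactly, so its weight should grow.
new_weights = egd_update([0.5, 0.5], preds=[1.0, 3.0], target=1.0)
```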
PointSSC: A Cooperative Vehicle-Infrastructure Point Cloud Benchmark for Semantic Scene Completion
Authors: Yuxiang Yan, Boda Liu, Jianfei Ai, Qinbu Li, Ru Wan, Jian Pu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.12708
Pdf link: https://arxiv.org/pdf/2309.12708
Abstract Semantic Scene Completion (SSC) aims to jointly generate space occupancies and semantic labels for complex 3D scenes. Most existing SSC models focus on volumetric representations, which are memory-inefficient for large outdoor spaces. Point clouds provide a lightweight alternative but existing benchmarks lack outdoor point cloud scenes with semantic labels. To address this, we introduce PointSSC, the first cooperative vehicle-infrastructure point cloud benchmark for semantic scene completion. These scenes exhibit long-range perception and minimal occlusion. We develop an automated annotation pipeline leveraging Segment Anything to efficiently assign semantics. To benchmark progress, we propose a LiDAR-based model with a Spatial-Aware Transformer for global and local feature extraction and a Completion and Segmentation Cooperative Module for joint completion and segmentation. PointSSC provides a challenging testbed to drive advances in semantic point cloud completion for real-world navigation.
Direct Learning for Parameter-Varying Feedforward Control: A Neural-Network Approach
Authors: Johan Kon, Jeroen van de Wijdeven, Dennis Bruijnen, Roland Tóth, Marcel Heertjes, Tom Oomen
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.12722
Pdf link: https://arxiv.org/pdf/2309.12722
Abstract The performance of a feedforward controller is primarily determined by the extent to which it can capture the relevant dynamics of a system. The aim of this paper is to develop an input-output linear parameter-varying (LPV) feedforward parameterization and a corresponding data-driven estimation method in which the dependency of the coefficients on the scheduling signal is learned by a neural network. The use of a neural network enables the parameterization to compensate for a wide class of constant relative degree LPV systems. Efficient optimization of the neural-network-based controller is achieved through a Levenberg-Marquardt approach with analytic gradients and a pseudolinear approach generalizing Sanathanan-Koerner to the LPV case. The performance of the developed feedforward learning method is validated in a simulation study of an LPV system showing excellent performance.
Optimal Dynamic Fees for Blockchain Resources
Authors: Davide Crapis, Ciamac C. Moallemi, Shouqiao Wang
Subjects: Computer Science and Game Theory (cs.GT); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.12735
Pdf link: https://arxiv.org/pdf/2309.12735
Abstract We develop a general and practical framework to address the problem of the optimal design of dynamic fee mechanisms for multiple blockchain resources. Our framework allows computing policies that optimally trade off between adjusting resource prices to handle persistent demand shifts versus being robust to local noise in the observed block demand. In the general case with more than one resource, our optimal policies correctly handle cross-effects (complementarity and substitutability) in resource demands. We also show how these cross-effects can be used to inform resource design, i.e. combining resources into bundles that have low demand-side cross-effects can yield simpler and more efficient price-update rules. Our framework is also practical: we demonstrate how it can be used to refine or inform the design of heuristic fee update rules such as EIP-1559 or EIP-4844 with two case studies. We then estimate a uni-dimensional version of our model using real market data from the Ethereum blockchain and empirically compare the performance of our optimal policies to EIP-1559.
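For readers unfamiliar with the heuristic rule this abstract uses as its point of comparison, here is a simplified sketch of the EIP-1559 base fee update. It uses floating-point arithmetic rather than the integer arithmetic of the actual specification, and it does not represent the optimal policies derived in the paper. The function name `next_base_fee` and the example values are assumptions; the maximum per-block change of one eighth follows the EIP.

```python
def next_base_fee(base_fee, gas_used, gas_target, max_change_denominator=8):
    """Simplified EIP-1559 base fee update (float sketch, not the exact integer rule).

    The fee rises when a block uses more gas than the target and falls when
    it uses less, by at most 1/max_change_denominator per block.
    """
    # Relative deviation of observed demand from the target, in [-1, 1]
    # when blocks are capped at twice the target.
    deviation = (gas_used - gas_target) / gas_target
    return base_fee * (1.0 + deviation / max_change_denominator)


full_block = next_base_fee(100.0, gas_used=30_000_000, gas_target=15_000_000)
empty_block = next_base_fee(100.0, gas_used=0, gas_target=15_000_000)
```

A block at twice the target raises the fee by 12.5%, an empty block lowers it by 12.5%, and a block exactly at the target leaves it unchanged.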
Towards an MLOps Architecture for XAI in Industrial Applications
Authors: Leonhard Faubel, Thomas Woudsma, Leila Methnani, Amir Ghorbani Ghezeljhemeidan, Fabian Buelow, Klaus Schmid, Willem D. van Driel, Benjamin Kloepper, Andreas Theodorou, Mohsen Nosratinia, Magnus Bång
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.12756
Pdf link: https://arxiv.org/pdf/2309.12756
Abstract Machine learning (ML) has become a popular tool in the industrial sector as it helps to improve operations, increase efficiency, and reduce costs. However, deploying and managing ML models in production environments can be complex. This is where Machine Learning Operations (MLOps) comes in. MLOps aims to streamline this deployment and management process. One of the remaining MLOps challenges is the need for explanations. These explanations are essential for understanding how ML models reason, which is key to trust and acceptance. Better identification of errors and improved model accuracy are only two resulting advantages. An often neglected fact is that deployed models are bypassed in practice when accuracy and especially explainability do not meet user expectations. We developed a novel MLOps software architecture to address the challenge of integrating explanations and feedback capabilities into the ML development and deployment processes. In the project EXPLAIN, our architecture is implemented in a series of industrial use cases. The proposed MLOps software architecture has several advantages. It provides an efficient way to manage ML models in production environments. Further, it allows for integrating explanations into the development and deployment processes.
AgentChat: Multi-Agent Collaborative Logistics for Carbon Reduction
Authors: Liming Xu, Stephen Mak, Stefan Schoepf, Michael Ostroumov, Alexandra Brintrup
Subjects: Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2309.12781
Pdf link: https://arxiv.org/pdf/2309.12781
Abstract Heavy Goods Vehicles (HGVs) are the second largest source of greenhouse gas emissions in transportation, after cars and taxis. However, HGVs are inefficiently utilised, with more than one-third of their weight capacity not being used during travel. In this paper, we therefore address collaborative logistics, an effective pathway to enhance HGVs' utilisation and reduce carbon emissions. We investigate a multi-agent system approach to facilitate collaborative logistics, particularly carrier collaboration. We propose a simple yet effective multi-agent collaborative logistics (MACL) framework, representing key stakeholders as intelligent agents. Furthermore, we utilise the MACL framework in conjunction with a proposed system architecture to create an integrated collaborative logistics testbed. This testbed, consisting of a physical system and its digital replica, is a tailored cyber-physical system or digital twin for collaborative logistics. Through a demonstration, we show the utility of the testbed for studying collaborative logistics.
CloudGripper: An Open Source Cloud Robotics Testbed for Robotic Manipulation Research, Benchmarking and Data Collection at Scale
Authors: Muhammad Zahid, Florian T. Pokorny
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.12786
Pdf link: https://arxiv.org/pdf/2309.12786
Abstract We present CloudGripper, an open source cloud robotics testbed, consisting of a scalable, space and cost-efficient design constructed as a rack of 32 small robot arm work cells. Each robot work cell is fully enclosed and features individual lighting, a low-cost custom 5-degree-of-freedom Cartesian robot arm with an attached parallel jaw gripper and a dual camera setup for experimentation. The system design is focused on continuous operation and features 10 Gbit/s network connectivity allowing for high throughput remote-controlled experimentation and data collection for robotic manipulation. CloudGripper furthermore is intended to form a community testbed to study the challenges of large scale machine learning and cloud and edge-computing in the context of robotic manipulation. In this work, we describe the mechanical design of the system, its initial software stack and evaluate the repeatability of motions executed by the proposed robot arm design. A local network API throughput and latency analysis is also provided. CloudGripper-Rope-100, a dataset of more than a hundred hours of randomized rope pushing interactions and approximately 4 million camera images, is collected and serves as a proof of concept demonstrating data collection capabilities. A project website with more information is available at https://cloudgripper.org.
Scalable Semantic 3D Mapping of Coral Reefs with Deep Learning
Authors: Jonathan Sauder, Guilhem Banc-Prandi, Anders Meibom, Devis Tuia
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.12804
Pdf link: https://arxiv.org/pdf/2309.12804
Abstract Coral reefs are among the most diverse ecosystems on our planet, and are depended on by hundreds of millions of people. Unfortunately, most coral reefs are existentially threatened by global climate change and local anthropogenic pressures. To better understand the dynamics underlying deterioration of reefs, monitoring at high spatial and temporal resolution is key. However, conventional monitoring methods for quantifying coral cover and species abundance are limited in scale due to the extensive manual labor required. Although computer vision tools have been employed to aid in this process, in particular SfM photogrammetry for 3D mapping and deep neural networks for image segmentation, analysis of the data products creates a bottleneck, effectively limiting their scalability. This paper presents a new paradigm for mapping underwater environments from ego-motion video, unifying 3D mapping systems that use machine learning to adapt to challenging conditions under water, combined with a modern approach for semantic segmentation of images. The method is exemplified on coral reefs in the northern Gulf of Aqaba, Red Sea, demonstrating high-precision 3D semantic mapping at unprecedented scale with significantly reduced required labor costs: a 100 m video transect acquired within 5 minutes of diving with a cheap consumer-grade camera can be fully automatically analyzed within 5 minutes. Our approach significantly scales up coral reef monitoring by taking a leap towards fully automatic analysis of video transects. The method democratizes coral reef transects by reducing the labor, equipment, logistics, and computing cost. This can help to inform conservation policies more efficiently. The underlying computational method of learning-based Structure-from-Motion has broad implications for fast low-cost mapping of underwater environments other than coral reefs.
Improving Generalization in Game Agents with Data Augmentation in Imitation Learning
Authors: Derek Yadgaroff, Alessandro Sestini, Konrad Tollmar, Linus Gisslén
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.12815
Pdf link: https://arxiv.org/pdf/2309.12815
Abstract Imitation learning is an effective approach for training game-playing agents and, consequently, for efficient game production. However, generalization - the ability to perform well in related but unseen scenarios - is an essential requirement that remains an unsolved challenge for game AI. Generalization is difficult for imitation learning agents because it requires the algorithm to take meaningful actions outside of the training distribution. In this paper we propose a solution to this challenge. Inspired by the success of data augmentation in supervised learning, we augment the training data so the distribution of states and actions in the dataset better represents the real state-action distribution. This study evaluates methods for combining and applying data augmentations to observations, to improve generalization of imitation learning agents. It also provides a performance benchmark of these augmentations across several 3D environments. These results demonstrate that data augmentation is a promising framework for improving generalization in imitation learning agents.
OmniDrones: An Efficient and Flexible Platform for Reinforcement Learning in Drone Control
Authors: Botian Xu, Feng Gao, Chao Yu, Ruize Zhang, Yi Wu, Yu Wang
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.12825
Pdf link: https://arxiv.org/pdf/2309.12825
Abstract In this work, we introduce OmniDrones, an efficient and flexible platform tailored for reinforcement learning in drone control, built on Nvidia's Omniverse Isaac Sim. It employs a bottom-up design approach that allows users to easily design and experiment with various application scenarios on top of GPU-parallelized simulations. It also offers a range of benchmark tasks, presenting challenges ranging from single-drone hovering to over-actuated system tracking. In summary, we propose an open-sourced drone simulation platform, equipped with an extensive suite of tools for drone learning. It includes 4 drone models, 5 sensor modalities, 4 control modes, over 10 benchmark tasks, and a selection of widely used RL baselines. To showcase the capabilities of OmniDrones and to support future research, we also provide preliminary results on these benchmark tasks. We hope this platform will encourage further studies on applying RL to practical drone systems.
Reward Function Design for Crowd Simulation via Reinforcement Learning
Authors: Ariel Kwiatkowski, Vicky Kalogeiton, Julien Pettré, Marie-Paule Cani
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.12841
Pdf link: https://arxiv.org/pdf/2309.12841
Abstract Crowd simulation is important for video game design, since it enables populating virtual worlds with autonomous avatars that navigate in a human-like manner. Reinforcement learning has shown great potential in simulating virtual crowds, but the design of the reward function is critical to achieving effective and efficient results. In this work, we explore the design of reward functions for reinforcement learning-based crowd simulation. We provide theoretical insights on the validity of certain reward functions according to their analytical properties, and evaluate them empirically using a range of scenarios, using the energy efficiency as the metric. Our experiments show that directly minimizing the energy usage is a viable strategy as long as it is paired with an appropriately scaled guiding potential, and enable us to study the impact of the different reward components on the behavior of the simulated crowd. Our findings can inform the development of new crowd simulation techniques, and contribute to the wider study of human-like navigation.
Accurate and Fast Compressed Video Captioning
Authors: Yaojie Shen, Xin Gu, Kai Xu, Heng Fan, Longyin Wen, Libo Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.12867
Pdf link: https://arxiv.org/pdf/2309.12867
Abstract Existing video captioning approaches typically first sample video frames from a decoded video and then conduct a subsequent process (e.g., feature extraction and/or captioning model learning). In this pipeline, manual frame sampling may ignore key information in videos and thus degrade performance. Additionally, redundant information in the sampled frames may result in low efficiency in the inference of video captioning. Addressing this, we study video captioning from a different perspective in the compressed domain, which brings multi-fold advantages over the existing pipeline: 1) Compared to raw images from the decoded video, the compressed video, consisting of I-frames, motion vectors and residuals, is highly distinguishable, which allows us to leverage the entire video for learning without manual sampling through a specialized model design; 2) The captioning model is more efficient in inference as smaller and less redundant information is processed. We propose a simple yet effective end-to-end transformer in the compressed domain for video captioning that enables learning from the compressed video for captioning. We show that even with a simple design, our method can achieve state-of-the-art performance on different benchmarks while running almost 2x faster than existing approaches. Code is available at https://github.com/acherstyx/CoCap.
OptCtrlPoints: Finding the Optimal Control Points for Biharmonic 3D Shape Deformation
Authors: Kunho Kim, Mikaela Angelina Uy, Despoina Paschalidou, Alec Jacobson, Leonidas J. Guibas, Minhyuk Sung
Subjects: Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2309.12899
Pdf link: https://arxiv.org/pdf/2309.12899
Abstract We propose OptCtrlPoints, a data-driven framework designed to identify the optimal sparse set of control points for reproducing target shapes using biharmonic 3D shape deformation. Control-point-based 3D deformation methods are widely utilized for interactive shape editing, and their usability is enhanced when the control points are sparse yet strategically distributed across the shape. With this objective in mind, we introduce a data-driven approach that can determine the most suitable set of control points, assuming that we have a given set of possible shape variations. The challenges associated with this task primarily stem from the computationally demanding nature of the problem. Two main factors contribute to this complexity: solving a large linear system for the biharmonic weight computation and addressing the combinatorial problem of finding the optimal subset of mesh vertices. To overcome these challenges, we propose a reformulation of the biharmonic computation that reduces the matrix size, making it dependent on the number of control points rather than the number of vertices. Additionally, we present an efficient search algorithm that significantly reduces the time complexity while still delivering a nearly optimal solution. Experiments on SMPL, SMAL, and DeformingThings4D datasets demonstrate the efficacy of our method. Our control points achieve better template-to-target fit than FPS, random search, and neural-network-based prediction. We also highlight the significant reduction in computation time from days to approximately 3 minutes.
Evolving Spiking Neural Networks to Mimic PID Control for Autonomous Blimps
Authors: Tim Burgers, Stein Stroobants, Guido de Croon
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.12937
Pdf link: https://arxiv.org/pdf/2309.12937
Abstract In recent years, Artificial Neural Networks (ANN) have become a standard in robotic control. However, a significant drawback of large-scale ANNs is their increased power consumption. This becomes a critical concern when designing autonomous aerial vehicles, given the stringent constraints on power and weight. Especially in the case of blimps, known for their extended endurance, power-efficient control methods are essential. Spiking neural networks (SNN) can provide a solution, facilitating energy-efficient and asynchronous event-driven processing. In this paper, we have evolved SNNs for accurate altitude control of a non-neutrally buoyant indoor blimp, relying solely on onboard sensing and processing power. The blimp's altitude tracking performance significantly improved compared to prior research, showing reduced oscillations and a minimal steady-state error. The parameters of the SNNs were optimized via an evolutionary algorithm, using a Proportional-Integral-Derivative (PID) controller as the target signal. We developed two complementary SNN controllers while examining various hidden layer structures. The first controller responds swiftly to control errors, mitigating overshooting and oscillations, while the second minimizes steady-state errors due to non-neutral buoyancy-induced drift. Despite the blimp's drivetrain limitations, our SNN controllers ensured stable altitude control, employing only 160 spiking neurons.
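The target signal mentioned in this abstract comes from a PID controller. The following is a minimal sketch of a textbook discrete-time PID controller of the kind that could generate such a signal. It is not the authors' controller or their spiking network; the class name `PID`, the gains, and the time step are hypothetical.

```python
class PID:
    """Textbook discrete-time PID controller (illustrative sketch)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        # Integral term accumulates error and removes steady-state offset.
        self.integral += error * self.dt
        # Derivative term reacts to the rate of change of the error.
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical gains; altitude setpoint of 1.0 m with the vehicle at 0.0 m.
controller = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.1)
command = controller.step(setpoint=1.0, measurement=0.0)
```

The split into a fast error-driven part and a slow drift-correcting part described in the abstract loosely mirrors the proportional-derivative and integral terms above.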
Performance Evaluation for Subarray-based Reconfigurable Intelligent Surface-Aided Wireless Communication Systems
Authors: Xinyi Yang, Weicong Chen, Xiao Li, Shi Jin
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2309.12977
Pdf link: https://arxiv.org/pdf/2309.12977
Abstract Reconfigurable intelligent surfaces (RISs) have received extensive attention for improving the performance of wireless communication systems. In this paper, a subarray-based scheme is investigated in terms of its effects on ergodic spectral efficiency (SE) and energy efficiency (EE) in RIS-assisted systems. In this scheme, the adjacent elements divided into a subarray are controlled by one signal and share the same reflection coefficient. An upper bound of ergodic SE is derived and an optimal phase shift design is proposed for the subarray-based RIS. Based on the upper bound and optimal design, we obtain the maximum of the upper bound. In particular, we analytically evaluate the effect of the subarray-based RIS on EE since it reduces SE and power consumption simultaneously. Numerical results verify the tightness of the upper bound, demonstrate the effectiveness of the optimal phase shift design for the subarray-based RIS, and reveal the effects of the subarray-based scheme on SE and EE.
Deep3DSketch+: Rapid 3D Modeling from Single Free-hand Sketches
Authors: Tianrun Chen, Chenglong Fu, Ying Zang, Lanyun Zhu, Jia Zhang, Papa Mao, Lingyun Sun
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.13006
Pdf link: https://arxiv.org/pdf/2309.13006
Abstract The rapid development of AR/VR brings tremendous demands for 3D content. While the widely-used Computer-Aided Design (CAD) method requires a time-consuming and labor-intensive modeling process, sketch-based 3D modeling offers a potential solution as a natural form of computer-human interaction. However, the sparsity and ambiguity of sketches make it challenging to generate high-fidelity content reflecting creators' ideas. Precise drawing from multiple views or strategic step-by-step drawings is often required to tackle the challenge but is not friendly to novice users. In this work, we introduce a novel end-to-end approach, Deep3DSketch+, which performs 3D modeling using only a single free-hand sketch without inputting multiple sketches or view information. Specifically, we introduce a lightweight generation network for efficient inference in real-time and a structural-aware adversarial training approach with a Stroke Enhancement Module (SEM) to capture the structural information to facilitate learning of the realistic and fine-detailed shape structures for high-fidelity performance. Extensive experiments demonstrated the effectiveness of our approach with the state-of-the-art (SOTA) performance on both synthetic and real datasets.
Efficient N:M Sparse DNN Training Using Algorithm, Architecture, and Dataflow Co-Design
Authors: Chao Fang, Wei Sun, Aojun Zhou, Zhongfeng Wang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR)
Arxiv link: https://arxiv.org/abs/2309.13015
Pdf link: https://arxiv.org/pdf/2309.13015
Abstract Sparse training is one of the promising techniques to reduce the computational cost of DNNs while retaining high accuracy. In particular, N:M fine-grained structured sparsity, where only N out of consecutive M elements can be nonzero, has attracted attention due to its hardware-friendly pattern and capability of achieving a high sparse ratio. However, the potential to accelerate N:M sparse DNN training has not been fully exploited, and there is a lack of efficient hardware supporting N:M sparse training. To tackle these challenges, this paper presents a computation-efficient training scheme for N:M sparse DNNs using algorithm, architecture, and dataflow co-design. At the algorithm level, a bidirectional weight pruning method, dubbed BDWP, is proposed to leverage the N:M sparsity of weights during both forward and backward passes of DNN training, which can significantly reduce the computational cost while maintaining model accuracy. At the architecture level, a sparse accelerator for DNN training, namely SAT, is developed to neatly support both the regular dense operations and the computation-efficient N:M sparse operations. At the dataflow level, multiple optimization methods, including interleave mapping, pre-generation of N:M sparse weights, and offline scheduling, are proposed to boost the computational efficiency of SAT. Finally, the effectiveness of our training scheme is evaluated on a Xilinx VCU1525 FPGA card using various DNN models and datasets. Experimental results show the SAT accelerator with the BDWP sparse training method under 2:8 sparse ratio achieves an average speedup of 1.75x over the same accelerator with dense training, accompanied by a negligible accuracy loss of 0.56% on average. Furthermore, our proposed training scheme significantly improves the training throughput by 2.97~25.22x and the energy efficiency by 1.36~3.58x over prior FPGA-based accelerators.
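To make the N:M pattern in the abstract above concrete, here is a minimal sketch of pruning a flat weight list so that at most N entries in every group of M consecutive elements stay nonzero. The function name `nm_prune` and the choice to keep the largest-magnitude entries are assumptions for illustration only; the abstract does not say how BDWP selects which weights to keep.

```python
def nm_prune(weights, n, m):
    """Return a copy of `weights` with at most n nonzeros per group of m.

    Hypothetical helper: selection by largest magnitude is an assumed
    criterion, not necessarily the one used by BDWP.
    """
    if len(weights) % m != 0:
        raise ValueError("len(weights) must be a multiple of m")
    pruned = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Indices of the n largest-magnitude entries in this group.
        ranked = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)
        keep = set(ranked[:n])
        # Keep the selected entries, zero out the rest.
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


weights = [0.1, -0.9, 0.3, 0.05, 0.7, -0.2, 0.0, 0.4]
print(nm_prune(weights, 2, 4))  # 2:4 pattern
print(nm_prune(weights, 2, 8))  # 2:8 pattern, the ratio quoted in the abstract
```

Because the positions of the surviving weights are constrained within each group, hardware can store a small per-group index instead of a general sparse format, which is presumably what makes the pattern hardware-friendly.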
A Hybrid Deep Learning-based Approach for Optimal Genotype by Environment Selection
Authors: Zahra Khalilzadeh, Motahareh Kashanian, Saeed Khaki, Lizhi Wang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2309.13021
Pdf link: https://arxiv.org/pdf/2309.13021
Abstract Precise crop yield prediction is essential for improving agricultural practices and ensuring crop resilience in varying climates. Integrating weather data across the growing season, especially for different crop varieties, is crucial for understanding their adaptability in the face of climate change. In the MLCAS2021 Crop Yield Prediction Challenge, we utilized a dataset comprising 93,028 training records to forecast yields for 10,337 test records, covering 159 locations across 28 U.S. states and Canadian provinces over 13 years (2003-2015). This dataset included details on 5,838 distinct genotypes and daily weather data for a 214-day growing season, enabling comprehensive analysis. As one of the winning teams, we developed two novel convolutional neural network (CNN) architectures: the CNN-DNN model, combining CNN and fully-connected networks, and the CNN-LSTM-DNN model, with an added LSTM layer for weather variables. Leveraging the Generalized Ensemble Method (GEM), we determined optimal model weights, resulting in superior performance compared to baseline models. The GEM model achieved lower RMSE (5.55% to 39.88%), reduced MAE (5.34% to 43.76%), and higher correlation coefficients (1.1% to 10.79%) when evaluated on test data. We applied the CNN-DNN model to identify top-performing genotypes for various locations and weather conditions, aiding genotype selection based on weather variables. Our data-driven approach is valuable for scenarios with limited testing years. Additionally, a feature importance analysis using RMSE change highlighted the significance of location, MG, year, and genotype, along with the importance of weather variables MDNI and AP.
Graph Neural Network for Stress Predictions in Stiffened Panels Under Uniform Loading
Authors: Yuecheng Cai, Jasmin Jelovica
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.13022
Pdf link: https://arxiv.org/pdf/2309.13022
Abstract Machine learning (ML) and deep learning (DL) techniques have gained significant attention as reduced order models (ROMs) for computationally expensive structural analysis methods, such as finite element analysis (FEA). A graph neural network (GNN) is a particular type of neural network which processes data that can be represented as graphs. This allows for efficient representation of complex geometries that can change during conceptual design of a structure or a product. In this study, we propose a novel graph embedding technique for efficient representation of 3D stiffened panels by considering separate plate domains as vertices. This approach is considered using Graph Sampling and Aggregation (GraphSAGE) to predict stress distributions in stiffened panels with varying geometries. A comparison with a finite-element-vertex graph representation is conducted to demonstrate the effectiveness of the proposed approach. A comprehensive parametric study is performed to examine the effect of structural geometry on the prediction performance. Our results demonstrate the immense potential of graph neural networks with the proposed graph embedding method as robust reduced-order models for 3D structures.
Minimization of energy functionals via FEM: implementation of hp-FEM
Authors: Miroslav Frost, Alexej Moskovka, Jan Valdman
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2309.13028
Pdf link: https://arxiv.org/pdf/2309.13028
Abstract Many problems in science and engineering can be rigorously recast into minimizing a suitable energy functional. We have been developing efficient and flexible solution strategies to tackle various minimization problems by employing finite element discretization with P1 triangular elements [1,2]. An extension to rectangular hp-finite elements in 2D is introduced in this contribution.
A numerical framework for simulating progressive failure in composite laminates under high-cycle fatigue loading
Authors: Pieter Hofman, Frans Paul van der Meer, Lambertus Johannes Sluys
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2309.13030
Pdf link: https://arxiv.org/pdf/2309.13030
Abstract In this work, a recently proposed high-cycle fatigue cohesive zone model, which covers crack initiation and propagation with limited input parameters, is embedded in a robust and efficient numerical framework for simulating progressive failure in composite laminates under fatigue loading. The fatigue cohesive zone model is enhanced with an implicit time integration scheme of the fatigue damage variable which allows for larger cycle increments and more efficient analyses. The method is combined with an adaptive strategy for determining the cycle increment based on global convergence rates. Moreover, a consistent material tangent stiffness matrix has been derived by fully linearizing the underlying mixed-mode quasi-static model and the fatigue damage update. The enhanced fatigue cohesive zone model is used to describe matrix cracking and delamination in laminates. In order to allow for matrix cracks to initiate at arbitrary locations and to avoid complex and costly mesh generation, the phantom node version of the eXtended finite element method (XFEM) is employed. For the insertion of new crack segments, an XFEM fatigue crack insertion criterion is presented, which is consistent with the fatigue cohesive zone formulation. It is shown with numerical examples that the improved fatigue damage update enhances the accuracy, efficiency and robustness of the numerical simulations significantly. The numerical framework is applied to the simulation of progressive fatigue failure in an open-hole [$\pm$45]-laminate. It is demonstrated that the numerical model is capable of accurately and efficiently simulating the complete failure process from distributed damage to localized failure.
PyPose v0.6: The Imperative Programming Interface for Robotics
Authors: Zitong Zhan, Xiangfu Li, Qihang Li, Haonan He, Abhinav Pandey, Haitao Xiao, Yangmengfei Xu, Xiangyu Chen, Kuan Xu, Kun Cao, Zhipeng Zhao, Zihan Wang, Huan Xu, Zihang Fang, Yutian Chen, Wentao Wang, Xu Fang, Yi Du, Tianhao Wu, Xiao Lin, Yuheng Qiu, Fan Yang, Jingnan Shi, Shaoshu Su, Yiren Lu, Taimeng Fu, Karthik Dantu, Jiajun Wu, Lihua Xie, Marco Hutter, Luca Carlone, Sebastian Scherer, Daning Huang, Yaoyu Hu, Junyi Geng, Chen Wang
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.13035
Pdf link: https://arxiv.org/pdf/2309.13035
Abstract PyPose is an open-source library for robot learning. It combines a learning-based approach with physics-based optimization, which enables seamless end-to-end robot learning. It has been used in many tasks due to its meticulously designed application programming interface (API) and efficient implementation. Since its initial launch in early 2022, PyPose has experienced significant enhancements, incorporating a wide variety of new features into its platform. To satisfy the growing demand for understanding and utilizing the library and to flatten the learning curve for new users, we present the fundamental design principle of the imperative programming interface, and showcase the flexible usage of diverse functionalities and modules using an extremely simple Dubins car example. We also demonstrate that PyPose can be easily used to navigate a real quadruped robot with a few lines of code.
GELLO: A General, Low-Cost, and Intuitive Teleoperation Framework for Robot Manipulators
Authors: Philipp Wu, Yide Shentu, Zhongke Yi, Xingyu Lin, Pieter Abbeel
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.13037
Pdf link: https://arxiv.org/pdf/2309.13037
Abstract Imitation learning from human demonstrations is a powerful framework to teach robots new skills. However, the performance of the learned policies is bottlenecked by the quality, scale, and variety of the demonstration data. In this paper, we aim to lower the barrier to collecting large and high-quality human demonstration data by proposing GELLO, a general framework for building low-cost and intuitive teleoperation systems for robotic manipulation. Given a target robot arm, we build a GELLO controller that has the same kinematic structure as the target arm, leveraging 3D-printed parts and off-the-shelf motors. GELLO is easy to build and intuitive to use. Through an extensive user study, we show that GELLO enables more reliable and efficient demonstration collection compared to commonly used teleoperation devices in the imitation learning literature such as VR controllers and 3D spacemouses. We further demonstrate the capabilities of GELLO for performing complex bi-manual and contact-rich manipulation tasks. To make GELLO accessible to everyone, we have designed and built GELLO systems for 3 commonly used robotic arms: Franka, UR5, and xArm. All software and hardware are open-sourced and can be found on our website: https://wuphilipp.github.io/gello/.
NeRRF: 3D Reconstruction and View Synthesis for Transparent and Specular Objects with Neural Refractive-Reflective Fields
Authors: Xiaoxue Chen, Junchen Liu, Hao Zhao, Guyue Zhou, Ya-Qin Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.13039
Pdf link: https://arxiv.org/pdf/2309.13039
Abstract Neural radiance fields (NeRF) have revolutionized the field of image-based view synthesis. However, NeRF uses straight rays and fails to deal with complicated light path changes caused by refraction and reflection. This prevents NeRF from successfully synthesizing transparent or specular objects, which are ubiquitous in real-world robotics and AR/VR applications. In this paper, we introduce the refractive-reflective field. Taking the object silhouette as input, we first utilize marching tetrahedra with a progressive encoding to reconstruct the geometry of non-Lambertian objects and then model refraction and reflection effects of the object in a unified framework using Fresnel terms. Meanwhile, to achieve efficient and effective anti-aliasing, we propose a virtual cone supersampling technique. We benchmark our method on different shapes, backgrounds and Fresnel terms on both real-world and synthetic datasets. We also qualitatively and quantitatively benchmark the rendering results of various editing applications, including material editing, object replacement/insertion, and environment illumination estimation. Codes and data are publicly available at https://github.com/dawning77/NeRRF.
E(2)-Equivariant Graph Planning for Navigation
Authors: Linfeng Zhao, Hongyu Li, Taskin Padir, Huaizu Jiang, Lawson L.S. Wong
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.13043
Pdf link: https://arxiv.org/pdf/2309.13043
Abstract Learning for robot navigation presents a critical and challenging task. The scarcity and costliness of real-world datasets necessitate efficient learning approaches. In this letter, we exploit Euclidean symmetry in planning for 2D navigation, which originates from Euclidean transformations between reference frames and enables parameter sharing. To address the challenges of unstructured environments, we formulate the navigation problem as planning on a geometric graph and develop an equivariant message passing network to perform value iteration. Furthermore, to handle multi-camera input, we propose a learnable equivariant layer to lift features to a desired space. We conduct comprehensive evaluations across five diverse tasks encompassing structured and unstructured environments, with known and unknown maps, and with point goals or semantic goals. Our experiments confirm substantial benefits in training efficiency, stability, and generalization.
Keyword: faster

Memory Efficient Mixed-Precision Optimizers
Authors: Basile Lewandowski, Atli Kosson
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.12381
Pdf link: https://arxiv.org/pdf/2309.12381
Abstract Traditional optimization methods rely on the use of single-precision floating point arithmetic, which can be costly in terms of memory size and computing power. However, mixed precision optimization techniques leverage the use of both single and half-precision floating point arithmetic to reduce memory requirements while maintaining model accuracy. We provide here an algorithm to further reduce memory usage during the training of a model by getting rid of the floating point copy of the parameters, virtually keeping only half-precision numbers. We also explore the benefits of getting rid of the gradient's value by executing the optimizer step during the back-propagation. In practice, we achieve up to 25% lower peak memory use and 15% faster training while maintaining the same level of accuracy.
Rapidash: Efficient Constraint Discovery via Rapid Verification
Authors: Zifan Liu, Shaleen Deep, Anna Fariha, Fotis Psallidas, Ashish Tiwari, Avrilia Floratou
Subjects: Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2309.12436
Pdf link: https://arxiv.org/pdf/2309.12436
Abstract Denial Constraint (DC) is a well-established formalism that captures a wide range of integrity constraints commonly encountered, including candidate keys, functional dependencies, and ordering constraints, among others. Given their significance, there has been considerable research interest in achieving fast verification and discovery of exact DCs within the database community. Despite the significant advancements in the field, prior work exhibits notable limitations when confronted with large-scale datasets. The current state-of-the-art exact DC verification algorithm demonstrates a quadratic (worst-case) time complexity relative to the dataset's number of rows. In the context of DC discovery, existing methodologies rely on a two-step algorithm that commences with an expensive data structure-building phase, often requiring hours to complete even for datasets containing only a few million rows. Consequently, users are left without any insights into the DCs that hold on their dataset until this lengthy building phase concludes. In this paper, we introduce Rapidash, a comprehensive framework for DC verification and discovery. Our work makes a dual contribution. First, we establish a connection between orthogonal range search and DC verification. We introduce a novel exact DC verification algorithm that demonstrates near-linear time complexity, representing a theoretical improvement over prior work. Second, we propose an anytime DC discovery algorithm that leverages our novel verification algorithm to gradually provide DCs to users, eliminating the need for the time-intensive building phase observed in prior work. To validate the effectiveness of our algorithms, we conduct extensive evaluations on four large-scale production datasets. Our results reveal that our DC verification algorithm achieves up to 40 times faster performance compared to state-of-the-art approaches.
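As a point of reference for the abstract above, the sketch below shows what exact DC verification means using a naive pairwise check, which is quadratic in the number of rows. It is not Rapidash's near-linear algorithm (the abstract only says that one builds on orthogonal range search). The function name `find_violation` and the sample `zip`/`city` columns are hypothetical.

```python
from itertools import permutations


def find_violation(rows, predicates):
    """Return the first ordered pair (t, s) satisfying every predicate.

    A denial constraint forbids any such pair, so a non-None result
    means the constraint does not hold. Naive O(n^2) baseline only.
    """
    for t, s in permutations(rows, 2):
        if all(pred(t, s) for pred in predicates):
            return t, s
    return None


# The functional dependency zip -> city written as a denial constraint:
# no two rows may share a zip code yet disagree on the city.
zip_determines_city = [
    lambda t, s: t["zip"] == s["zip"],
    lambda t, s: t["city"] != s["city"],
]

clean = [
    {"zip": "10001", "city": "New York"},
    {"zip": "02108", "city": "Boston"},
]
dirty = clean + [{"zip": "10001", "city": "Boston"}]

print(find_violation(clean, zip_determines_city))  # None: constraint holds
print(find_violation(dirty, zip_determines_city))  # a violating pair
```

Ordered pairs are used so that asymmetric predicates, such as the ordering constraints mentioned in the abstract, are checked in both directions.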
SAVME: Efficient Safety Validation for Autonomous Systems Using Meta-Learning
Authors: Marc R. Schlichting, Nina V. Board, Anthony L. Corso, Mykel J. Kochenderfer
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Emerging Technologies (cs.ET); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.12474
Pdf link: https://arxiv.org/pdf/2309.12474
Abstract Discovering potential failures of an autonomous system is important prior to deployment. Falsification-based methods are often used to assess the safety of such systems, but the cost of running many accurate simulations can be high. The validation can be accelerated by identifying critical failure scenarios for the system under test and by reducing the simulation runtime. We propose a Bayesian approach that integrates meta-learning strategies with a multi-armed bandit framework. Our method involves learning distributions over scenario parameters that are prone to triggering failures in the system under test, as well as a distribution over fidelity settings that enable fast and accurate simulations. In the spirit of meta-learning, we also assess whether the learned fidelity settings distribution facilitates faster learning of the scenario parameter distributions for new scenarios. We showcase our methodology using a cutting-edge 3D driving simulator, incorporating 16 fidelity settings for an autonomous vehicle stack that includes camera and lidar sensors. We evaluate various scenarios based on an autonomous vehicle pre-crash typology. As a result, our approach achieves a significant speedup, up to 18 times faster compared to traditional methods that solely rely on a high-fidelity simulator.
Trip Planning for Autonomous Vehicles with Wireless Data Transfer Needs Using Reinforcement Learning
Authors: Yousef AlSaqabi, Bhaskar Krishnamachari
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.12534
Pdf link: https://arxiv.org/pdf/2309.12534
Abstract With recent advancements in the field of communications and the Internet of Things, vehicles are becoming more aware of their environment and are evolving towards full autonomy. Vehicular communication opens up the possibility for vehicle-to-infrastructure interaction, where vehicles could share information with components such as cameras, traffic lights, and signage that support a country's road system. As a result, vehicles are becoming more than just a means of transportation; they are collecting, processing, and transmitting massive amounts of data used to make driving safer and more convenient. With 5G cellular networks and beyond, there is going to be more data bandwidth available on our roads, but it may be heterogeneous because of limitations like line of sight, infrastructure, and heterogeneous traffic on the road. This paper addresses the problem of route planning for autonomous vehicles in urban areas accounting for both driving time and data transfer needs. We propose a novel reinforcement learning solution that prioritizes high bandwidth roads to meet a vehicle's data transfer requirement, while also minimizing driving time. We compare this approach to traffic-unaware and bandwidth-unaware baselines to show how much better it performs under heterogeneous traffic. This solution could be used as a starting point to understand what good policies look like, which could potentially yield faster, more efficient heuristics in the future.
From Text to Trends: A Unique Garden Analytics Perspective on the Future of Modern Agriculture
Authors: Parag Saxena
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.12579
Pdf link: https://arxiv.org/pdf/2309.12579
Abstract Data-driven insights are essential for modern agriculture. This research paper introduces a machine learning framework designed to improve how we educate and reach out to people in the field of horticulture. The framework relies on data from the Horticulture Online Help Desk (HOHD), which is like a big collection of questions from people who love gardening and are part of the Extension Master Gardener Program (EMGP). This framework has two main parts. First, it uses special computer programs (machine learning models) to sort questions into categories. This helps us quickly send each question to the right expert, so we can answer it faster. Second, it looks at when questions are asked and uses that information to guess how many questions we might get in the future and what they will be about. This helps us plan for topics that will be really important. It's like knowing what questions will be popular in the coming months. We also take into account where the questions come from by looking at the Zip Code. This helps us design research that fits the challenges faced by gardeners in different places. In this paper, we demonstrate the potential of machine learning techniques to predict trends in horticulture by analyzing textual queries from homeowners. We show that NLP, classification, and time series analysis can be used to identify patterns in homeowners' queries and predict future trends in horticulture. Our results suggest that machine learning could be used to predict trends in other agricultural sectors as well. If large-scale agriculture industries curate and maintain a comparable repository of textual data, the potential for trend prediction and strategic agricultural planning could be revolutionized. This convergence of technology and agriculture offers a promising pathway for the future of sustainable farming and data-informed agricultural practices.
Neural Operator Variational Inference based on Regularized Stein Discrepancy for Deep Gaussian Processes
Authors: Jian Xu, Shian Du, Junmei Yang, Qianli Ma, Delu Zeng
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2309.12658
Pdf link: https://arxiv.org/pdf/2309.12658
Abstract Deep Gaussian Process (DGP) models offer a powerful nonparametric approach for Bayesian inference, but exact inference is typically intractable, motivating the use of various approximations. However, existing approaches, such as mean-field Gaussian assumptions, limit the expressiveness and efficacy of DGP models, while stochastic approximation can be computationally expensive. To tackle these challenges, we introduce Neural Operator Variational Inference (NOVI) for Deep Gaussian Processes. NOVI uses a neural generator to obtain a sampler and minimizes the Regularized Stein Discrepancy in L2 space between the generated distribution and true posterior. We solve the minimax problem using Monte Carlo estimation and subsampling stochastic optimization techniques. We demonstrate that the bias introduced by our method can be controlled by multiplying the Fisher divergence with a constant, which leads to robust error control and ensures the stability and precision of the algorithm. Our experiments on datasets ranging from hundreds to tens of thousands demonstrate the effectiveness and the faster convergence rate of the proposed method. We achieve a classification accuracy of 93.56 on the CIFAR10 dataset, outperforming SOTA Gaussian process methods. Furthermore, our method guarantees theoretically controlled prediction error for DGP models and demonstrates remarkable performance on various datasets. We are optimistic that NOVI has the potential to enhance the performance of deep Bayesian nonparametric models and could have significant implications for various practical applications.
eWand: A calibration framework for wide baseline frame-based and event-based camera systems
Authors: Thomas Gossard, Andreas Ziegler, Levin Kolmar, Jonas Tebbe, Andreas Zell
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.12685
Pdf link: https://arxiv.org/pdf/2309.12685
Abstract Accurate calibration is crucial for using multiple cameras to triangulate the position of objects precisely. However, it is also a time-consuming process that needs to be repeated for every displacement of the cameras. The standard approach is to use a printed pattern with known geometry to estimate the intrinsic and extrinsic parameters of the cameras. The same idea can be applied to event-based cameras, though it requires extra work. By using frame reconstruction from events, a printed pattern can be detected. A blinking pattern can also be displayed on a screen. Then, the pattern can be directly detected from the events. Such calibration methods can provide accurate intrinsic calibration for both frame- and event-based cameras. However, using 2D patterns has several limitations for multi-camera extrinsic calibration, with cameras possessing highly different points of view and a wide baseline. The 2D pattern can only be detected from one direction and needs to be of significant size to compensate for its distance to the camera. This makes the extrinsic calibration time-consuming and cumbersome. To overcome these limitations, we propose eWand, a new method that uses blinking LEDs inside opaque spheres instead of a printed or displayed pattern. Our method provides a faster, easier-to-use extrinsic calibration approach that maintains high accuracy for both event- and frame-based cameras.
Synthetic Boost: Leveraging Synthetic Data for Enhanced Vision-Language Segmentation in Echocardiography
Authors: Rabin Adhikari, Manish Dhakal, Safal Thapaliya, Kanchan Poudel, Prasiddha Bhandari, Bishesh Khanal
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.12829
Pdf link: https://arxiv.org/pdf/2309.12829
Abstract Accurate segmentation is essential for echocardiography-based assessment of cardiovascular diseases (CVDs). However, the variability among sonographers and the inherent challenges of ultrasound images hinder precise segmentation. By leveraging the joint representation of image and text modalities, Vision-Language Segmentation Models (VLSMs) can incorporate rich contextual information, potentially aiding in accurate and explainable segmentation. However, the lack of readily available data in echocardiography hampers the training of VLSMs. In this study, we explore using synthetic datasets from Semantic Diffusion Models (SDMs) to enhance VLSMs for echocardiography segmentation. We evaluate results for two popular VLSMs (CLIPSeg and CRIS) using seven different kinds of language prompts derived from several attributes, automatically extracted from echocardiography images, segmentation masks, and their metadata. Our results show improved metrics and faster convergence when pretraining VLSMs on SDM-generated synthetic images before finetuning on real images. The code, configs, and prompts are available at https://github.com/naamiinepal/synthetic-boost.
Accurate and Fast Compressed Video Captioning
Authors: Yaojie Shen, Xin Gu, Kai Xu, Heng Fan, Longyin Wen, Libo Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.12867
Pdf link: https://arxiv.org/pdf/2309.12867
Abstract Existing video captioning approaches typically first sample video frames from a decoded video and then run a subsequent process (e.g., feature extraction and/or captioning model learning). In this pipeline, manual frame sampling may ignore key information in videos and thus degrade performance. Additionally, redundant information in the sampled frames may result in low efficiency in the inference of video captioning. To address this, we study video captioning from a different perspective, in the compressed domain, which brings multi-fold advantages over the existing pipeline: 1) Compared to raw images from the decoded video, the compressed video, consisting of I-frames, motion vectors and residuals, is highly distinguishable, which allows us to leverage the entire video for learning without manual sampling through a specialized model design; 2) The captioning model is more efficient in inference as smaller and less redundant information is processed. We propose a simple yet effective end-to-end transformer in the compressed domain for video captioning that enables learning from the compressed video for captioning. We show that even with a simple design, our method can achieve state-of-the-art performance on different benchmarks while running almost 2x faster than existing approaches. Code is available at https://github.com/acherstyx/CoCap.
Boosting Studies of Multi-Agent Reinforcement Learning on Google Research Football Environment: the Past, Present, and Future
Authors: Yan Song, He Jiang, Haifeng Zhang, Zheng Tian, Weinan Zhang, Jun Wang
Subjects: Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2309.12951
Pdf link: https://arxiv.org/pdf/2309.12951
Abstract Even though Google Research Football (GRF) was initially benchmarked and studied as a single-agent environment in its original paper, recent years have witnessed an increasing focus on its multi-agent nature by researchers utilizing it as a testbed for Multi-Agent Reinforcement Learning (MARL). However, the absence of standardized environment settings and unified evaluation metrics for multi-agent scenarios hampers the consistent understanding of various studies. Furthermore, the challenging 5-vs-5 and 11-vs-11 full-game scenarios have received limited thorough examination due to their substantial training complexities. To address these gaps, this paper extends the original environment by not only standardizing the environment settings and benchmarking cooperative learning algorithms across different scenarios, including the most challenging full-game scenarios, but also by discussing approaches to enhance football AI from diverse perspectives and introducing related research tools. Specifically, we provide a distributed and asynchronous population-based self-play framework with diverse pre-trained policies for faster training, two football-specific analytical tools for deeper investigation, and an online leaderboard for broader evaluation. The overall expectation of this work is to advance the study of Multi-Agent Reinforcement Learning on Google Research Football environment, with the ultimate goal of benefiting real-world sports beyond virtual games.
ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs
Authors: Justin Chih-Yao Chen, Swarnadeep Saha, Mohit Bansal
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.13007
Pdf link: https://arxiv.org/pdf/2309.13007
Abstract Large Language Models (LLMs) still struggle with complex reasoning tasks. Motivated by the society of minds (Minsky, 1988), we propose ReConcile, a multi-model multi-agent framework designed as a round table conference among diverse LLM agents to foster diverse thoughts and discussion for improved consensus. ReConcile enhances the reasoning capabilities of LLMs by holding multiple rounds of discussion, learning to convince other agents to improve their answers, and employing a confidence-weighted voting mechanism. In each round, ReConcile initiates discussion between agents via a 'discussion prompt' that consists of (a) grouped answers and explanations generated by each agent in the previous round, (b) their uncertainties, and (c) demonstrations of answer-rectifying human explanations, used for convincing other agents. This discussion prompt enables each agent to revise their responses in light of insights from other agents. Once a consensus is reached and the discussion ends, ReConcile determines the final answer by leveraging the confidence of each agent in a weighted voting scheme. We implement ReConcile with ChatGPT, Bard, and Claude2 as the three agents. Our experimental results on various benchmarks demonstrate that ReConcile significantly enhances the reasoning performance of the agents (both individually and as a team), surpassing prior single-agent and multi-agent baselines by 7.7% and also outperforming GPT-4 on some of these datasets. We also experiment with GPT-4 itself as one of the agents in ReConcile and demonstrate that its initial performance also improves by absolute 10.0% through discussion and feedback from other agents. Finally, we also analyze the accuracy after every round and observe that ReConcile achieves better and faster consensus between agents, compared to a multi-agent debate baseline. Our code is available at: https://github.com/dinobby/ReConcile
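Note: the confidence-weighted voting step mentioned in this abstract can be illustrated with a minimal sketch. The function name `weighted_vote` and the use of raw confidences as weights are assumptions made for illustration; the abstract does not specify how confidences are calibrated or combined, so this is not the authors' implementation.

```python
from collections import defaultdict


def weighted_vote(answers):
    """Return the answer with the largest summed confidence.

    `answers` is a list of (answer, confidence) pairs, one per agent.
    Using the raw confidence as the vote weight is an assumption.
    """
    if not answers:
        raise ValueError("need at least one (answer, confidence) pair")
    totals = defaultdict(float)
    for answer, confidence in answers:
        totals[answer] += confidence
    # Ties resolve to the answer that was seen first.
    return max(totals, key=totals.get)


# Three hypothetical agents: two moderately confident votes for "B"
# outweigh one highly confident vote for "A" (1.1 vs 0.9).
print(weighted_vote([("A", 0.9), ("B", 0.6), ("B", 0.5)]))
```

With plain majority voting the same example would also pick "B"; the two schemes differ when a single confident agent outweighs several unsure ones.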
Keyword: mobile

Stochastic scheduling of autonomous mobile robots at hospitals
Authors: Lulu Cheng, Ning Zhao
Subjects: Robotics (cs.RO); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.12318
Pdf link: https://arxiv.org/pdf/2309.12318
Abstract The outbreak of the New Coronavirus has significantly increased the vulnerability of medical staff. This paper addresses the safety and stress relief of medical personnel by proposing a solution to the scheduling problem of autonomous mobile robots (AMRs) in a stochastic environment. Considering the stochastic nature of travel and service times for AMRs affected by the surrounding environment, the routes of AMRs are planned to minimize the daily cost of the hospital (including the AMR fixed cost, penalty cost of violating the time window, and transportation cost). To efficiently generate high-quality solutions, we identify several properties and incorporate them into an improved Tabu Search (I-TS) algorithm for problem-solving. Experimental evaluations demonstrate that the I-TS algorithm outperforms existing methods by producing higher-quality solutions. By leveraging the characteristics of medical request environments, we intelligently allocate an appropriate number of AMRs to efficiently provide services, resulting in substantial cost reductions for hospitals and enhanced utilization of medical resources. These findings confirm the effectiveness of the proposed stochastic programming model in determining the optimal number of AMRs and their corresponding service routes across various environmental settings.
Human Following in Mobile Platforms with Person Re-Identification
Authors: Mario Srouji, Yao-Hung Hubert Tsai, Hugues Thomas, Jian Zhang
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.12479
Pdf link: https://arxiv.org/pdf/2309.12479
Abstract Human following is a crucial feature of human-robot interaction, yet it poses numerous challenges to mobile agents in real-world scenarios. Some major hurdles are that the target person may be in a crowd, obstructed by others, or facing away from the agent. To tackle these challenges, we present a novel person re-identification module composed of three parts: a 360-degree visual registration, a neural-based person re-identification using human faces and torsos, and a motion tracker that records and predicts the target person's future position. Our human-following system also addresses other challenges, including identifying fast-moving targets with low latency, searching for targets that move out of the camera's sight, collision avoidance, and adaptively choosing different following mechanisms based on the distance between the target person and the mobile agent. Extensive experiments show that our proposed person re-identification module significantly enhances the human-following feature compared to other baseline variants.
Robust Energy Consumption Prediction with a Missing Value-Resilient Metaheuristic-based Neural Network in Mobile App Development
Authors: Seyed Jalaleddin Mousavirad, Luís A. Alexandre
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.12484
Pdf link: https://arxiv.org/pdf/2309.12484
Abstract Energy consumption is a fundamental concern in mobile application development, bearing substantial significance for both developers and end-users. Moreover, it is a critical determinant in the consumer's decision-making process when considering a smartphone purchase. From the sustainability perspective, it becomes imperative to explore approaches aimed at mitigating the energy consumption of mobile devices, given the significant global consequences arising from the extensive utilisation of billions of smartphones, which imparts a profound environmental impact. Despite the existence of various energy-efficient programming practices within the Android platform, the dominant mobile ecosystem, there remains a need for documented machine learning-based energy prediction algorithms tailored explicitly for mobile app development. Hence, the main objective of this research is to propose a novel neural network-based framework, enhanced by a metaheuristic approach, to achieve robust energy prediction in the context of mobile app development. The metaheuristic approach here plays a crucial role in not only identifying suitable learning algorithms and their corresponding parameters but also determining the optimal number of layers and neurons within each layer. To the best of our knowledge, prior studies have yet to employ any metaheuristic algorithm to address all these hyperparameters simultaneously. Moreover, due to limitations in accessing certain aspects of a mobile phone, there might be missing data in the data set, and the proposed framework can handle this. In addition, we conducted an optimal algorithm selection strategy, employing 13 metaheuristic algorithms, to identify the best algorithm based on accuracy and resistance to missing values. The comprehensive experiments demonstrate that our proposed approach yields significant outcomes for energy consumption prediction.
A Study on Learning Social Robot Navigation with Multimodal Perception
Authors: Bhabaranjan Panigrahi, Amir Hossain Raj, Mohammad Nazeri, Xuesu Xiao
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.12568
Pdf link: https://arxiv.org/pdf/2309.12568
Abstract Autonomous mobile robots need to perceive the environments with their onboard sensors (e.g., LiDARs and RGB cameras) and then make appropriate navigation decisions. In order to navigate human-inhabited public spaces, such a navigation task becomes more than only obstacle avoidance, but also requires considering surrounding humans and their intentions to somewhat change the navigation behavior in response to the underlying social norms, i.e., being socially compliant. Machine learning methods are shown to be effective in capturing those complex and subtle social interactions in a data-driven manner, without explicitly hand-crafting simplified models or cost functions. Considering multiple available sensor modalities and the efficiency of learning methods, this paper presents a comprehensive study on learning social robot navigation with multimodal perception using a large-scale real-world dataset. The study investigates social robot navigation decision making on both the global and local planning levels and contrasts unimodal and multimodal learning against a set of classical navigation approaches in different social scenarios, while also analyzing the training and generalizability performance from the learning perspective. We also conduct a human study on how learning with multimodal perception affects the perceived social compliance. The results show that multimodal learning has a clear advantage over unimodal learning in both dataset and human studies. We open-source our code for the community's future use to study multimodal perception for learning social robot navigation.
Data-driven Preference Learning Methods for Multiple Criteria Sorting with Temporal Criteria
Authors: Li Yijun, Guo Mengzhuo, Zhang Qingpeng
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.12620
Pdf link: https://arxiv.org/pdf/2309.12620
Abstract The advent of predictive methodologies has catalyzed the emergence of data-driven decision support across various domains. However, developing models capable of effectively handling input time series data presents an enduring challenge. This study presents novel preference learning approaches to multiple criteria sorting problems in the presence of temporal criteria. We first formulate a convex quadratic programming model characterized by fixed time discount factors, operating within a regularization framework. Additionally, we propose an ensemble learning algorithm designed to consolidate the outputs of multiple, potentially weaker, optimizers, a process executed efficiently through parallel computation. To enhance scalability and accommodate learnable time discount factors, we introduce a novel monotonic Recurrent Neural Network (mRNN). It is designed to capture the evolving dynamics of preferences over time while upholding critical properties inherent to MCS problems, including criteria monotonicity, preference independence, and the natural ordering of classes. The proposed mRNN can describe the preference dynamics by depicting marginal value functions and personalized time discount factors along with time, effectively amalgamating the interpretability of traditional MCS methods with the predictive potential offered by deep preference learning models. Comprehensive assessments of the proposed models are conducted, encompassing synthetic data scenarios and a real-case study centered on classifying valuable users within a mobile gaming app based on their historical in-app behavioral sequences. Empirical findings underscore the notable performance improvements achieved by the proposed models when compared to a spectrum of baseline methods, spanning machine learning, deep learning, and conventional multiple criteria sorting approaches.
Learning Actions and Control of Focus of Attention with a Log-Polar-like Sensor
Authors: Robin Göransson, Volker Krueger
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.12634
Pdf link: https://arxiv.org/pdf/2309.12634
Abstract With the long-term goal of reducing the image processing time on an autonomous mobile robot in mind, we explore in this paper the use of log-polar-like image data with gaze control. The gaze control is not done on the Cartesian image but on the log-polar-like image data. For this, we start out from the classic deep reinforcement learning approach for Atari games. We extend an A3C deep RL approach with an LSTM network, and we learn the policy for playing three Atari games and a policy for gaze control. While the Atari games already use low-resolution images of 80 by 80 pixels, we are able to further reduce the number of image pixels by a factor of 5 without losing any gaming performance.
Open Source Robot Localization for Non-Planar Environments
Authors: Francisco Martín Rico, José Miguel Guerrero Hernández, Rodrigo Pérez Rodríguez, Juan Diego Peña Narváez, Alberto García Gómez-Jacinto
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.12744
Pdf link: https://arxiv.org/pdf/2309.12744
Abstract The operational environments in which a mobile robot executes its missions often exhibit non-flat terrain characteristics, encompassing outdoor and indoor settings featuring ramps and slopes. In such scenarios, the conventional methodologies employed for localization encounter novel challenges and limitations. This study delineates a localization framework incorporating ground elevation and inclination considerations, deviating from traditional 2D localization paradigms that may falter in such contexts. In our proposed approach, the map encompasses elevation and spatial occupancy information, employing Gridmaps and Octomaps. At the same time, the perception model is designed to accommodate the robot's inclined orientation and the potential presence of ground as an obstacle, besides usual structural and dynamic obstacles. We have developed and rigorously validated our approach within Nav2, an esteemed open-source framework renowned for robot navigation. Our findings demonstrate that our methodology represents a viable and effective alternative for mobile robots operating in challenging outdoor environments or intricate terrains.
Keyword: pruning

ThinResNet: A New Baseline for Structured Convolutional Networks Pruning
Authors: Hugo Tessier, Ghouti Boukli Hacene, Vincent Gripon
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2309.12854
Pdf link: https://arxiv.org/pdf/2309.12854
Abstract Pruning is a compression method which aims to improve the efficiency of neural networks by reducing their number of parameters while maintaining a good performance, thus enhancing the performance-to-cost ratio in nontrivial ways. Of particular interest are structured pruning techniques, in which whole portions of parameters are removed altogether, resulting in easier-to-leverage shrunk architectures. Since its growth in popularity in recent years, pruning has given birth to countless papers and contributions, resulting first in critical inconsistencies in the way results are compared, and then in a collective effort to establish standardized benchmarks. However, said benchmarks are based on training practices that date from several years ago and do not align with current practices. In this work, we verify how results in the recent literature of pruning hold up against networks that underwent both state-of-the-art training methods and trivial model scaling. We find that the latter clearly and utterly outperform all the literature we compared to, proving that updating standard pruning benchmarks and re-evaluating classical methods in their light is an absolute necessity. We thus introduce a new challenging baseline to compare structured pruning to: ThinResNet.
Efficient N:M Sparse DNN Training Using Algorithm, Architecture, and Dataflow Co-Design
Authors: Chao Fang, Wei Sun, Aojun Zhou, Zhongfeng Wang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR)
Arxiv link: https://arxiv.org/abs/2309.13015
Pdf link: https://arxiv.org/pdf/2309.13015
Abstract Sparse training is one of the promising techniques to reduce the computational cost of DNNs while retaining high accuracy. In particular, N:M fine-grained structured sparsity, where only N out of consecutive M elements can be nonzero, has attracted attention due to its hardware-friendly pattern and capability of achieving a high sparse ratio. However, the potential to accelerate N:M sparse DNN training has not been fully exploited, and there is a lack of efficient hardware supporting N:M sparse training. To tackle these challenges, this paper presents a computation-efficient training scheme for N:M sparse DNNs using algorithm, architecture, and dataflow co-design. At the algorithm level, a bidirectional weight pruning method, dubbed BDWP, is proposed to leverage the N:M sparsity of weights during both forward and backward passes of DNN training, which can significantly reduce the computational cost while maintaining model accuracy. At the architecture level, a sparse accelerator for DNN training, namely SAT, is developed to neatly support both the regular dense operations and the computation-efficient N:M sparse operations. At the dataflow level, multiple optimization methods ranging from interleave mapping, pre-generation of N:M sparse weights, and offline scheduling, are proposed to boost the computational efficiency of SAT. Finally, the effectiveness of our training scheme is evaluated on a Xilinx VCU1525 FPGA card using various DNN models and datasets. Experimental results show the SAT accelerator with the BDWP sparse training method under 2:8 sparse ratio achieves an average speedup of 1.75x over that with the dense training, accompanied by a negligible accuracy loss of 0.56% on average. Furthermore, our proposed training scheme significantly improves the training throughput by 2.97~25.22x and the energy efficiency by 1.36~3.58x over prior FPGA-based accelerators.
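Note: the N:M pattern described in this abstract (at most N nonzero values in every consecutive group of M) can be sketched as follows. The helper name `nm_prune` and the choice to keep the largest-magnitude entries are assumptions made for illustration; the sketch does not reproduce the paper's BDWP method or its hardware dataflow.

```python
def nm_prune(weights, n, m):
    """Zero all but the n largest-magnitude values in each group of m.

    `weights` is a flat list whose length must be a multiple of m.
    Magnitude-based selection is an assumption, not taken from the abstract.
    """
    if m <= 0 or not 0 <= n <= m:
        raise ValueError("require 0 <= n <= m and m > 0")
    if len(weights) % m != 0:
        raise ValueError("len(weights) must be a multiple of m")
    pruned = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Indices of the n entries with the largest absolute value.
        by_magnitude = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)
        keep = set(by_magnitude[:n])
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


# 2:4 sparsity on two groups of four weights.
print(nm_prune([0.1, -0.9, 0.3, 0.05, 0.7, -0.2, 0.0, 0.4], n=2, m=4))
```

The 2:8 ratio reported in the abstract corresponds to `n=2, m=8`, i.e. at least 75% of the weights in each group are zero.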
Keyword: diffusion

Antagonising explanation and revealing bias directly through sequencing and multimodal inference
Authors: Luís Arandas, Mick Grierson, Miguel Carvalhais
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.12345
Pdf link: https://arxiv.org/pdf/2309.12345
Abstract Deep generative models produce data according to a learned representation, e.g. diffusion models, through a process of approximation computing possible samples. Approximation can be understood as reconstruction and the large datasets used to train models as sets of records in which we represent the physical world with some data structure (photographs, audio recordings, manuscripts). During the process of reconstruction, e.g., image frames develop each timestep towards a textual input description. While moving forward in time, frame sets are shaped according to learned bias and their production, we argue here, can be considered as going back in time; not by inspiration on the backward diffusion process but acknowledging culture is specifically marked in the records. Futures of generative modelling, namely in film and audiovisual arts, can benefit by dealing with diffusion systems as a process to compute the future by inevitably being tied to the past, if acknowledging the records as to capture fields of view at a specific time, and to correlate with our own finite memory ideals. Models generating new data distributions can target video production as signal processors and by developing sequences through timelines we ourselves also go back to decade-old algorithmic and multi-track methodologies revealing the actual predictive failure of contemporary approaches to synthesis in moving image, both as relevant to composition and not explanatory.
Synthetic Image Detection: Highlights from the IEEE Video and Image Processing Cup 2022 Student Competition
Authors: Davide Cozzolino, Koki Nagano, Lucas Thomaz, Angshul Majumdar, Luisa Verdoliva
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.12428
Pdf link: https://arxiv.org/pdf/2309.12428
Abstract The Video and Image Processing (VIP) Cup is a student competition that takes place each year at the IEEE International Conference on Image Processing. The 2022 IEEE VIP Cup asked undergraduate students to develop a system capable of distinguishing pristine images from generated ones. The interest in this topic stems from the incredible advances in the AI-based generation of visual data, with tools that allow the synthesis of highly realistic images and videos. While this opens up a large number of new opportunities, it also undermines the trustworthiness of media content and fosters the spread of disinformation on the internet. Recently there was strong concern about the generation of extremely realistic images by means of editing software that includes the recent technology on diffusion models. In this context, there is a need to develop robust and automatic tools for synthetic image detection.
License Plate Super-Resolution Using Diffusion Models
Authors: Sawsan AlHalawani, Bilel Benjdira, Adel Ammar, Anis Koubaa, Anas M. Ali
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.12506
Pdf link: https://arxiv.org/pdf/2309.12506
Abstract In surveillance, accurately recognizing license plates is hindered by their often low quality and small dimensions, compromising recognition precision. Despite advancements in AI-based image super-resolution, methods like Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs) still fall short in enhancing license plate images. This study leverages the cutting-edge diffusion model, which has consistently outperformed other deep learning techniques in image restoration. By training this model using a curated dataset of Saudi license plates, both in low and high resolutions, we discovered the diffusion model's superior efficacy. The method achieves a 12.55% and 37.32% improvement in Peak Signal-to-Noise Ratio (PSNR) over SwinIR and ESRGAN, respectively. Moreover, our method surpasses these techniques in terms of Structural Similarity Index (SSIM), registering a 4.89% and 17.66% improvement over SwinIR and ESRGAN, respectively. Furthermore, 92% of human evaluators preferred our images over those from other algorithms. In essence, this research presents a pioneering solution for license plate super-resolution, with tangible potential for surveillance systems.
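Note: for readers unfamiliar with the PSNR metric cited in this abstract, the standard definition is 10 * log10(MAX^2 / MSE). Below is a minimal sketch on flat pixel lists. The function name `psnr` and the default peak value of 255 (8-bit images) are assumptions for illustration; the abstract does not state the evaluation settings used in the paper.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equal-length pixel lists."""
    if not reference or len(reference) != len(test):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Every pixel off by one grey level gives MSE = 1.
print(psnr([10, 20, 30, 40], [11, 21, 31, 41]))
```

Higher values mean the restored image is closer to the reference.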
A Diffusion-Model of Joint Interactive Navigation
Authors: Matthew Niedoba, Jonathan Wilder Lavington, Yunpeng Liu, Vasileios Lioutas, Justice Sefas, Xiaoxuan Liang, Dylan Green, Setareh Dabiri, Berend Zwartsenberg, Adam Scibior, Frank Wood
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.12508
Pdf link: https://arxiv.org/pdf/2309.12508
Abstract Simulation of autonomous vehicle systems requires that simulated traffic participants exhibit diverse and realistic behaviors. The use of prerecorded real-world traffic scenarios in simulation ensures realism but the rarity of safety critical events makes large scale collection of driving scenarios expensive. In this paper, we present DJINN - a diffusion based method of generating traffic scenarios. Our approach jointly diffuses the trajectories of all agents, conditioned on a flexible set of state observations from the past, present, or future. On popular trajectory forecasting datasets, we report state of the art performance on joint trajectory metrics. In addition, we demonstrate how DJINN flexibly enables direct test-time sampling from a variety of valuable conditional distributions including goal-based sampling, behavior-class sampling, and scenario editing.
Semantic Change Driven Generative Semantic Communication Framework
Authors: Wanting Yang, Zehui Xiong, Yanli Yuan, Tony Q. S. Quek
Subjects: Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2309.12775
Pdf link: https://arxiv.org/pdf/2309.12775
Abstract The burgeoning generative artificial intelligence technology offers novel insights into the development of semantic communication (SemCom) frameworks. These frameworks hold the potential to address the challenges associated with the black-box nature inherent in the end-to-end training of existing SemCom frameworks, as well as the deterioration of the user experience caused by the inevitable error floor in deep learning-based semantic communication. In this paper, we focus on the widespread remote monitoring scenario, and propose a semantic change driven generative SemCom framework. Therein, the semantic encoder and semantic decoder can be optimized independently. Specifically, we develop a modular semantic encoder with a value-of-information-based semantic sampling function. In addition, we propose a conditional denoising diffusion probabilistic model-assisted semantic decoder that relies on received semantic information from the source, namely, the semantic map, and the local static scene information to remotely regenerate scenes. Moreover, we demonstrate the effectiveness of the proposed semantic encoder and decoder as well as the considerable potential in reducing energy consumption through simulation. The code is available at https://github.com/wty2011jl/SCDGSC.git
Synthetic Boost: Leveraging Synthetic Data for Enhanced Vision-Language Segmentation in Echocardiography
Authors: Rabin Adhikari, Manish Dhakal, Safal Thapaliya, Kanchan Poudel, Prasiddha Bhandari, Bishesh Khanal
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.12829
Pdf link: https://arxiv.org/pdf/2309.12829
Abstract Accurate segmentation is essential for echocardiography-based assessment of cardiovascular diseases (CVDs). However, the variability among sonographers and the inherent challenges of ultrasound images hinder precise segmentation. By leveraging the joint representation of image and text modalities, Vision-Language Segmentation Models (VLSMs) can incorporate rich contextual information, potentially aiding in accurate and explainable segmentation. However, the lack of readily available data in echocardiography hampers the training of VLSMs. In this study, we explore using synthetic datasets from Semantic Diffusion Models (SDMs) to enhance VLSMs for echocardiography segmentation. We evaluate results for two popular VLSMs (CLIPSeg and CRIS) using seven different kinds of language prompts derived from several attributes, automatically extracted from echocardiography images, segmentation masks, and their metadata. Our results show improved metrics and faster convergence when pretraining VLSMs on SDM-generated synthetic images before finetuning on real images. The code, configs, and prompts are available at https://github.com/naamiinepal/synthetic-boost.
Diffusion Augmentation for Sequential Recommendation
Authors: Qidong Liu, Fan Yan, Xiangyu Zhao, Zhaocheng Du, Huifeng Guo, Ruiming Tang, Feng Tian
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.12858
Pdf link: https://arxiv.org/pdf/2309.12858
Abstract Sequential recommendation (SRS), which aims to recommend the next item based on the user's historical interactions, has recently become the technical foundation in many applications. However, sequential recommendation often faces the problem of data sparsity, which widely exists in recommender systems. Besides, most users only interact with a few items, but existing SRS models often underperform for these users. Such a problem, named the long-tail user problem, is still to be resolved. Data augmentation is a distinct way to alleviate these two problems, but existing augmentation methods often need fabricated training strategies or are hindered by poor-quality generated interactions. To address these problems, we propose a Diffusion Augmentation for Sequential Recommendation (DiffuASR) for higher-quality generation. The dataset augmented by DiffuASR can be used to train the sequential recommendation models directly, free from complex training procedures. To make the best of the generation ability of the diffusion model, we first propose a diffusion-based pseudo sequence generation framework to fill the gap between image and sequence generation. Then, a sequential U-Net is designed to adapt the diffusion noise prediction model U-Net to the discrete sequence generation task. Finally, we develop two guide strategies to assimilate the preference between generated and original sequences. To validate the proposed DiffuASR, we conduct extensive experiments on three real-world datasets with three sequential recommendation models. The experimental results illustrate the effectiveness of DiffuASR. As far as we know, DiffuASR is among the first to introduce the diffusion model to recommendation.
MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation
Authors: Jiahao Xie, Wei Li, Xiangtai Li, Ziwei Liu, Yew Soon Ong, Chen Change Loy
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.13042
Pdf link: https://arxiv.org/pdf/2309.13042
Abstract We present MosaicFusion, a simple yet effective diffusion-based data augmentation approach for large vocabulary instance segmentation. Our method is training-free and does not rely on any label supervision. Two key designs enable us to employ an off-the-shelf text-to-image diffusion model as a useful dataset generator for object instances and mask annotations. First, we divide an image canvas into several regions and perform a single round of diffusion process to generate multiple instances simultaneously, conditioning on different text prompts. Second, we obtain corresponding instance masks by aggregating cross-attention maps associated with object prompts across layers and diffusion time steps, followed by simple thresholding and edge-aware refinement processing. Without bells and whistles, our MosaicFusion can produce a significant amount of synthetic labeled data for both rare and novel categories. Experimental results on the challenging LVIS long-tailed and open-vocabulary benchmarks demonstrate that MosaicFusion can significantly improve the performance of existing instance segmentation models, especially for rare and novel categories. Code will be released at https://github.com/Jiahao000/MosaicFusion.
Keyword: adaptive

Human Following in Mobile Platforms with Person Re-Identification
Authors: Mario Srouji, Yao-Hung Hubert Tsai, Hugues Thomas, Jian Zhang
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.12479
Pdf link: https://arxiv.org/pdf/2309.12479
Abstract Human following is a crucial feature of human-robot interaction, yet it poses numerous challenges to mobile agents in real-world scenarios. Some major hurdles are that the target person may be in a crowd, obstructed by others, or facing away from the agent. To tackle these challenges, we present a novel person re-identification module composed of three parts: a 360-degree visual registration, a neural-based person re-identification using human faces and torsos, and a motion tracker that records and predicts the target person's future position. Our human-following system also addresses other challenges, including identifying fast-moving targets with low latency, searching for targets that move out of the camera's sight, collision avoidance, and adaptively choosing different following mechanisms based on the distance between the target person and the mobile agent. Extensive experiments show that our proposed person re-identification module significantly enhances the human-following feature compared to other baseline variants.
CodePlan: Repository-level Coding using LLMs and Planning
Authors: Ramakrishna Bairi, Atharv Sonwane, Aditya Kanade, Vageesh D C, Arun Iyer, Suresh Parthasarathy, Sriram Rajamani, B. Ashok, Shashank Shet
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2309.12499
Pdf link: https://arxiv.org/pdf/2309.12499
Abstract Software engineering activities such as package migration, fixing error reports from static analysis or testing, and adding type annotations or other specifications to a codebase, involve pervasively editing the entire repository of code. We formulate these activities as repository-level coding tasks. Recent tools like GitHub Copilot, which are powered by Large Language Models (LLMs), have succeeded in offering high-quality solutions to localized coding problems. Repository-level coding tasks are more involved and cannot be solved directly using LLMs, since code within a repository is inter-dependent and the entire repository may be too large to fit into the prompt. We frame repository-level coding as a planning problem and present a task-agnostic framework, called CodePlan, to solve it. CodePlan synthesizes a multi-step chain of edits (plan), where each step results in a call to an LLM on a code location with context derived from the entire repository, previous code changes and task-specific instructions. CodePlan is based on a novel combination of an incremental dependency analysis, a change may-impact analysis and an adaptive planning algorithm. We evaluate the effectiveness of CodePlan on two repository-level tasks: package migration (C#) and temporal code edits (Python). Each task is evaluated on multiple code repositories, each of which requires inter-dependent changes to many files (between 2 and 97 files). Coding tasks of this level of complexity have not been automated using LLMs before. Our results show that CodePlan has better match with the ground truth compared to baselines. CodePlan is able to get 5/6 repositories to pass the validity checks (e.g., to build without errors and make correct code edits) whereas the baselines (without planning but with the same type of contextual information as CodePlan) cannot get any of the repositories to pass them.
Mildly Exponential Lower Bounds on Tolerant Testers for Monotonicity, Unateness, and Juntas
Authors: Xi Chen, Anindya De, Yuhao Li, Shivam Nadimpalli, Rocco A. Servedio
Subjects: Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2309.12513
Pdf link: https://arxiv.org/pdf/2309.12513
Abstract We give the first super-polynomial (in fact, mildly exponential) lower bounds for tolerant testing (equivalently, distance estimation) of monotonicity, unateness, and juntas with a constant separation between the "yes" and "no" cases. Specifically, we give
- A $2^{\Omega(n^{1/4}/\sqrt{\varepsilon})}$-query lower bound for non-adaptive, two-sided tolerant monotonicity testers and unateness testers when the "gap" parameter $\varepsilon_2-\varepsilon_1$ is equal to $\varepsilon$, for any $\varepsilon \geq 1/\sqrt{n}$;
- A $2^{\Omega(k^{1/2})}$-query lower bound for non-adaptive, two-sided tolerant junta testers when the gap parameter is an absolute constant.
In the constant-gap regime no non-trivial prior lower bound was known for monotonicity, the best prior lower bound known for unateness was $\tilde{\Omega}(n^{3/2})$ queries, and the best prior lower bound known for juntas was $\mathrm{poly}(k)$ queries.
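As a quick sanity check on the first bound, one can substitute the two ends of the allowed range of $\varepsilon$. This is a worked substitution into the expression quoted in the abstract, not a statement taken from the paper:

```latex
\[
\varepsilon = \Theta(1):\quad
2^{\Omega\left(n^{1/4}/\sqrt{\varepsilon}\right)} = 2^{\Omega\left(n^{1/4}\right)},
\qquad
\varepsilon = \frac{1}{\sqrt{n}}:\quad
\frac{n^{1/4}}{\sqrt{\varepsilon}} = n^{1/4}\cdot n^{1/4} = n^{1/2}
\;\Longrightarrow\;
2^{\Omega\left(\sqrt{n}\right)}.
\]
```

So the exponent grows from $n^{1/4}$ at constant gap to $\sqrt{n}$ at the smallest gap the statement covers.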
Curriculum Reinforcement Learning via Morphology-Environment Co-Evolution
Authors: Shuang Ao, Tianyi Zhou, Guodong Long, Xuan Song, Jing Jiang
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.12529
Pdf link: https://arxiv.org/pdf/2309.12529
Abstract Throughout their long history, natural species have learned to survive by evolving their physical structures to adapt to environmental changes. In contrast, current reinforcement learning (RL) studies mainly focus on training an agent with a fixed morphology (e.g., skeletal structure and joint attributes) in a fixed environment, which can hardly generalize to changing environments or new tasks. In this paper, we optimize an RL agent and its morphology through "morphology-environment co-evolution (MECE)", in which the morphology keeps being updated to adapt to the changing environment, while the environment is modified progressively to bring new challenges and stimulate the improvement of the morphology. This leads to a curriculum to train generalizable RL, whose morphology and policy are optimized for different environments. Instead of hand-crafting the curriculum, we train two policies to automatically change the morphology and the environment. To this end, (1) we develop two novel and effective rewards for the two policies, which are solely based on the learning dynamics of the RL agent; (2) we design a scheduler to automatically determine when to change the environment and the morphology. In experiments on two classes of tasks, the morphology and RL policies trained via MECE exhibit significantly better generalization performance in unseen test environments than SOTA morphology optimization methods. Our ablation studies on the two MECE policies further show that the co-evolution between the morphology and environment is the key to the success.
Adaptive Model Predictive Control for Engine-Driven Ducted Fan Lift Systems using an Associated Linear Parameter Varying Model
Authors: Hanjie Jiang, Ye Zhou, Hann Woei Ho, Wenjie Hu
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.12552
Pdf link: https://arxiv.org/pdf/2309.12552
Abstract Ducted fan lift systems (DFLSs) powered by two-stroke aviation piston engines present a challenging control problem due to their complex multivariable dynamics. Current controllers for these systems typically rely on proportional-integral algorithms combined with data tables, which rely on accurate models and are not adaptive to handle time-varying dynamics or system uncertainties. This paper proposes a novel adaptive model predictive control (AMPC) strategy with an associated linear parameter varying (LPV) model for controlling the engine-driven DFLS. This LPV model is derived from a global network model, which is trained off-line with data obtained from a general mean value engine model for two-stroke aviation engines. Different network models, including multi-layer perceptron, Elman, and radial basis function (RBF), are evaluated and compared in this study. The results demonstrate that the RBF model exhibits higher prediction accuracy and robustness in the DFLS application. Based on the trained RBF model, the proposed AMPC approach constructs an associated network that directly outputs the LPV model parameters as an adaptive, robust, and efficient prediction model. The efficiency of the proposed approach is demonstrated through numerical simulations of a vertical take-off thrust preparation process for the DFLS. The simulation results indicate that the proposed AMPC method can effectively control the DFLS thrust with a relative error below 3.5%.
Driving with Guidance: Exploring the Trade-Off Between GPS Utility and Privacy Concerns Among Drivers
Authors: Yousef AlSaqabi, Souti Chattopadhyay
Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2309.12601
Pdf link: https://arxiv.org/pdf/2309.12601
Abstract As the reliance on GPS technology for navigation grows, so does the ethical dilemma of balancing its indispensable utility with the escalating concerns over user privacy. This study investigates the trade-offs between GPS utility and privacy among drivers, using a mixed-method approach that includes a survey of 151 participants and 10 follow-up interviews. We examine usage patterns, feature preferences, and comfort levels with location tracking and destination prediction. Our findings demonstrate that users tend to overlook potential privacy risks in favor of the utility the technology provides. We also find that users do not mind sharing inaccurate or obfuscated location data as long as their frequently visited locations aren't identified, and their full driving routes can't be recreated. Based on our findings, we explore design opportunities for enhancing privacy and utility, including adaptive interfaces, personalized profiles, and technological innovations like blockchain.
Zero-Regret Performative Prediction Under Inequality Constraints
Authors: Wenjing Yan, Xuanyu Cao
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.12618
Pdf link: https://arxiv.org/pdf/2309.12618
Abstract Performative prediction is a recently proposed framework where predictions guide decision-making and hence influence future data distributions. Such performative phenomena are ubiquitous in various areas, such as transportation, finance, public policy, and recommendation systems. To date, work on performative prediction has only focused on unconstrained scenarios, neglecting the fact that many real-world learning problems are subject to constraints. This paper bridges this gap by studying performative prediction under inequality constraints. Unlike most existing work that provides only performative stable points, we aim to find the optimal solutions. Anticipating performative gradients is a challenging task, due to the agnostic performative effect on data distributions. To address this issue, we first develop a robust primal-dual framework that requires only approximate gradients up to a certain accuracy, yet delivers the same order of performance as the stochastic primal-dual algorithm without performativity. Based on this framework, we then propose an adaptive primal-dual algorithm for location families. Our analysis demonstrates that the proposed adaptive primal-dual algorithm attains $\mathcal{O}(\sqrt{T})$ regret and constraint violations, using only $\sqrt{T} + 2T$ samples, where $T$ is the time horizon. To the best of our knowledge, this is the first study and analysis on the optimality of the performative prediction problem under inequality constraints. Finally, we validate the effectiveness of our algorithm and theoretical results through numerical simulations.
Global Context Aggregation Network for Lightweight Saliency Detection of Surface Defects
Authors: Feng Yan, Xiaoheng Jiang, Yang Lu, Lisha Cui, Shupan Li, Jiale Cao, Mingliang Xu, Dacheng Tao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.12641
Pdf link: https://arxiv.org/pdf/2309.12641
Abstract Surface defect inspection is a very challenging task in which surface defects usually show weak appearances or exist under complex backgrounds. Most high-accuracy defect detection methods require expensive computation and storage overhead, making them less practical in some resource-constrained defect detection applications. Although some lightweight methods have achieved real-time inference speed with fewer parameters, they show poor detection accuracy in complex defect scenarios. To this end, we develop a Global Context Aggregation Network (GCANet) for lightweight saliency detection of surface defects on the encoder-decoder structure. First, we introduce a novel transformer encoder on the top layer of the lightweight backbone, which captures global context information through a novel Depth-wise Self-Attention (DSA) module. The proposed DSA performs element-wise similarity in channel dimension while maintaining linear complexity. In addition, we introduce a novel Channel Reference Attention (CRA) module before each decoder block to strengthen the representation of multi-level features in the bottom-up path. The proposed CRA exploits the channel correlation between features at different layers to adaptively enhance feature representation. The experimental results on three public defect datasets demonstrate that the proposed network achieves a better trade-off between accuracy and running efficiency compared with 17 other state-of-the-art methods. Specifically, GCANet achieves competitive accuracy (91.79% $F_{\beta}^{w}$, 93.55% $S_\alpha$, and 97.35% $E_\phi$) on SD-saliency-900 while running at 272 fps on a single GPU.
Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding
Authors: Jiazhen Wang, Bin Liu, Changtao Miao, Zhiwei Zhao, Wanyi Zhuang, Qi Chu, Nenghai Yu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.12657
Pdf link: https://arxiv.org/pdf/2309.12657
Abstract AI-synthesized text and images have gained significant attention, particularly due to the widespread dissemination of multi-modal manipulations on the internet, which has resulted in numerous negative impacts on society. Existing methods for multi-modal manipulation detection and grounding primarily focus on fusing vision-language features to make predictions, while overlooking the importance of modality-specific features, leading to sub-optimal results. In this paper, we construct a simple and novel transformer-based framework for multi-modal manipulation detection and grounding tasks. Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment. To achieve this, we introduce visual/language pre-trained encoders and dual-branch cross-attention (DCA) to extract and fuse modality-unique features. Furthermore, we design decoupled fine-grained classifiers (DFC) to enhance modality-specific feature mining and mitigate modality competition. Moreover, we propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality using learnable queries, thereby improving the discovery of forged details. Extensive experiments on the $\rm DGM^4$ dataset demonstrate the superior performance of our proposed model compared to state-of-the-art approaches.
Disturbance Rejection Control for Autonomous Trolley Collection Robots with Prescribed Performance
Authors: Rui-Dong Xi, Liang Lu, Xue Zhang, Xiao Xiao, Bingyi Xia, Jiankun Wang, Max Q.-H. Meng
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.12660
Pdf link: https://arxiv.org/pdf/2309.12660
Abstract Trajectory tracking control of autonomous trolley collection robots (ATCR) is a challenging task due to the complex environment, severe noise and external disturbances. This work investigates a control scheme for ATCR subject to severe environmental interference. A kinematics model based adaptive sliding mode disturbance observer with fast convergence is first proposed to estimate the lumped disturbances. On this basis, a robust controller with prescribed performance is proposed using a backstepping technique, which improves the transient performance and guarantees fast convergence. Simulation results are provided to illustrate the effectiveness of the proposed control scheme.
How to Fine-tune the Model: Unified Model Shift and Model Bias Policy Optimization
Authors: Hai Zhang, Hang Yu, Junqiao Zhao, Di Zhang, Chang Huang, Hongtu Zhou, Xiao Zhang, Chen Ye
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.12671
Pdf link: https://arxiv.org/pdf/2309.12671
Abstract Designing and deriving effective model-based reinforcement learning (MBRL) algorithms with a performance improvement guarantee is challenging, mainly attributed to the high coupling between model learning and policy optimization. Many prior methods that rely on return discrepancy to guide model learning ignore the impacts of model shift, which can lead to performance deterioration due to excessive model updates. Other methods use performance difference bound to explicitly consider model shift. However, these methods rely on a fixed threshold to constrain model shift, resulting in a heavy dependence on the threshold and a lack of adaptability during the training process. In this paper, we theoretically derive an optimization objective that can unify model shift and model bias and then formulate a fine-tuning process. This process adaptively adjusts the model updates to get a performance improvement guarantee while avoiding model overfitting. Based on these, we develop a straightforward algorithm USB-PO (Unified model Shift and model Bias Policy Optimization). Empirical results show that USB-PO achieves state-of-the-art performance on several challenging benchmark tasks.
mixed attention auto encoder for multi-class industrial anomaly detection
Authors: Jiangqi Liu, Feng Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.12700
Pdf link: https://arxiv.org/pdf/2309.12700
Abstract Most existing methods for unsupervised industrial anomaly detection train a separate model for each object category. This kind of approach can easily capture the category-specific feature distributions, but results in high storage cost and low training efficiency. In this paper, we propose a unified mixed-attention auto encoder (MAAE) to implement multi-class anomaly detection with a single model. To alleviate the performance degradation due to the diverse distribution patterns of different categories, we employ spatial attention and channel attention to effectively capture the global category information and model the feature distributions of multiple classes. Furthermore, to simulate realistic noise on features and preserve the surface semantics of objects from different categories, which are essential for detecting subtle anomalies, we propose an adaptive noise generator and a multi-scale fusion module for the pre-trained features. MAAE delivers remarkable performance on the benchmark dataset compared with the state-of-the-art methods.
Transformer-based Image Compression with Variable Image Quality Objectives
Authors: Chia-Hao Kao, Yi-Hsin Chen, Cheng Chien, Wei-Chen Chiu, Wen-Hsiao Peng
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2309.12717
Pdf link: https://arxiv.org/pdf/2309.12717
Abstract This paper presents a Transformer-based image compression system that allows for a variable image quality objective according to the user's preference. Optimizing a learned codec for different quality objectives leads to reconstructed images with varying visual characteristics. Our method provides the user with the flexibility to choose a trade-off between two image quality objectives using a single, shared model. Motivated by the success of prompt-tuning techniques, we introduce prompt tokens to condition our Transformer-based autoencoder. These prompt tokens are generated adaptively based on the user's preference and input image through learning a prompt generation network. Extensive experiments on commonly used quality metrics demonstrate the effectiveness of our method in adapting the encoding and/or decoding processes to a variable quality objective. While offering the additional flexibility, our proposed method performs comparably to the single-objective methods in terms of rate-distortion performance.
Domain Adaptive Few-Shot Open-Set Learning
Authors: Debabrata Pal, Deeptej More, Sai Bhargav, Dipesh Tamboli, Vaneet Aggarwal, Biplab Banerjee
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.12814
Pdf link: https://arxiv.org/pdf/2309.12814
Abstract Few-shot learning has made impressive strides in addressing the crucial challenges of recognizing unknown samples from novel classes in target query sets and managing visual shifts between domains. However, existing techniques fall short when it comes to identifying target outliers under domain shifts by learning to reject pseudo-outliers from the source domain, resulting in an incomplete solution to both problems. To address these challenges comprehensively, we propose a novel approach called Domain Adaptive Few-Shot Open Set Recognition (DA-FSOS) and introduce a meta-learning-based architecture named DAFOS-NET. During training, our model learns a shared and discriminative embedding space while creating a pseudo open-space decision boundary, given a fully-supervised source domain and a label-disjoint few-shot target domain. To enhance data density, we use a pair of conditional adversarial networks with tunable noise variances to augment both domains' closed and pseudo-open spaces. Furthermore, we propose a domain-specific batch-normalized class prototypes alignment strategy to align both domains globally while ensuring class-discriminativeness through novel metric objectives. Our training approach ensures that DAFOS-NET can generalize well to new scenarios in the target domain. We present three benchmarks for DA-FSOS based on the Office-Home, mini-ImageNet/CUB, and DomainNet datasets and demonstrate the efficacy of DAFOS-NET through extensive experimentation.
A numerical framework for simulating progressive failure in composite laminates under high-cycle fatigue loading
Authors: Pieter Hofman, Frans Paul van der Meer, Lambertus Johannes Sluys
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2309.13030
Pdf link: https://arxiv.org/pdf/2309.13030
Abstract In this work, a recently proposed high-cycle fatigue cohesive zone model, which covers crack initiation and propagation with limited input parameters, is embedded in a robust and efficient numerical framework for simulating progressive failure in composite laminates under fatigue loading. The fatigue cohesive zone model is enhanced with an implicit time integration scheme of the fatigue damage variable which allows for larger cycle increments and more efficient analyses. The method is combined with an adaptive strategy for determining the cycle increment based on global convergence rates. Moreover, a consistent material tangent stiffness matrix has been derived by fully linearizing the underlying mixed-mode quasi-static model and the fatigue damage update. The enhanced fatigue cohesive zone model is used to describe matrix cracking and delamination in laminates. In order to allow for matrix cracks to initiate at arbitrary locations and to avoid complex and costly mesh generation, the phantom node version of the eXtended finite element method (XFEM) is employed. For the insertion of new crack segments, an XFEM fatigue crack insertion criterion is presented, which is consistent with the fatigue cohesive zone formulation. It is shown with numerical examples that the improved fatigue damage update enhances the accuracy, efficiency and robustness of the numerical simulations significantly. The numerical framework is applied to the simulation of progressive fatigue failure in an open-hole [$\pm$45]-laminate. It is demonstrated that the numerical model is capable of accurately and efficiently simulating the complete failure process from distributed damage to localized failure.
Keyword: quantization

There is no result

New submissions for Mon, 25 Sep 23 #159

Keyword: efficient

Stochastic scheduling of autonomous mobile robots at hospitals

Aviation Safety Risk Analysis and Flight Technology Assessment Issues

Onchain Sports Betting using UBET Automated Market Maker

How Beaufort, Neumann and Gates met? Subject integration with spreadsheeting

An Efficient Intelligent Semi-Automated Warehouse Inventory Stocktaking System

Conversational Swarm Intelligence (CSI) Enhances Groupwise Deliberation

DualToken-ViT: Position-aware Efficient Vision Transformer with Dual Token Fusion

Foundation Metrics: Quantifying Effectiveness of Healthcare Conversations powered by Generative AI

Knowledge Base Aware Semantic Communication in Vehicular Networks

Robust Energy Consumption Prediction with a Missing Value-Resilient Metaheuristic-based Neural Network in Mobile App Development

High-Dimensional Controller Tuning through Latent Representations

Trip Planning for Autonomous Vehicles with Wireless Data Transfer Needs Using Reinforcement Learning

Adaptive Model Predictive Control for Engine-Driven Ducted Fan Lift Systems using an Associated Linear Parameter Varying Model

Machine Learning Meets Advanced Robotic Manipulation

Cognitive Approach to Hierarchical Task Selection for Human-Robot Interaction in Dynamic Environments

Passive Reflection Codebook Design for IRS-Integrated Access Point

Recent Advances in Path Integral Control for Trajectory Optimization: An Overview in Theoretical and Algorithmic Perspectives

SPION: Layer-Wise Sparse Training of Transformer via Convolutional Flood Filling

A Multi-Robot Task Assignment Framework for Search and Rescue with Heterogeneous Teams

Stable Reconstruction of Anisotropic Objects from Near-Field Electromagnetic Data

Data-driven Preference Learning Methods for Multiple Criteria Sorting with Temporal Criteria

A Detailed Analysis of the SpaceSaving$\pm$ Family of Algorithms with Bounded Deletions

Quark: A High-Performance Secure Container Runtime for Serverless Computing

Heterogeneous Rank Beamforming for Industrial Communications

MEV Makes Everyone Happy under Greedy Sequencing Rule

OneNet: Enhancing Time Series Forecasting Models under Concept Drift by Online Ensembling

PointSSC: A Cooperative Vehicle-Infrastructure Point Cloud Benchmark for Semantic Scene Completion

Direct Learning for Parameter-Varying Feedforward Control: A Neural-Network Approach

Optimal Dynamic Fees for Blockchain Resources

Towards an MLOps Architecture for XAI in Industrial Applications

AgentChat: Multi-Agent Collaborative Logistics for Carbon Reduction

CloudGripper: An Open Source Cloud Robotics Testbed for Robotic Manipulation Research, Benchmarking and Data Collection at Scale

Scalable Semantic 3D Mapping of Coral Reefs with Deep Learning

Improving Generalization in Game Agents with Data Augmentation in Imitation Learning

OmniDrones: An Efficient and Flexible Platform for Reinforcement Learning in Drone Control

Reward Function Design for Crowd Simulation via Reinforcement Learning

Accurate and Fast Compressed Video Captioning

OptCtrlPoints: Finding the Optimal Control Points for Biharmonic 3D Shape Deformation

Evolving Spiking Neural Networks to Mimic PID Control for Autonomous Blimps

Performance Evaluation for Subarray-based Reconfigurable Intelligent Surface-Aided Wireless Communication Systems

Deep3DSketch+: Rapid 3D Modeling from Single Free-hand Sketches

Efficient N:M Sparse DNN Training Using Algorithm, Architecture, and Dataflow Co-Design

A Hybrid Deep Learning-based Approach for Optimal Genotype by Environment Selection

Graph Neural Network for Stress Predictions in Stiffened Panels Under Uniform Loading

Minimization of energy functionals via FEM: implementation of hp-FEM

A numerical framework for simulating progressive failure in composite laminates under high-cycle fatigue loading

PyPose v0.6: The Imperative Programming Interface for Robotics

GELLO: A General, Low-Cost, and Intuitive Teleoperation Framework for Robot Manipulators

NeRRF: 3D Reconstruction and View Synthesis for Transparent and Specular Objects with Neural Refractive-Reflective Fields

E(2)-Equivariant Graph Planning for Navigation

Keyword: faster

Memory Efficient Mixed-Precision Optimizers

Rapidash: Efficient Constraint Discovery via Rapid Verification

SAVME: Efficient Safety Validation for Autonomous Systems Using Meta-Learning

Trip Planning for Autonomous Vehicles with Wireless Data Transfer Needs Using Reinforcement Learning

From Text to Trends: A Unique Garden Analytics Perspective on the Future of Modern Agriculture

Neural Operator Variational Inference based on Regularized Stein Discrepancy for Deep Gaussian Processes

eWand: A calibration framework for wide baseline frame-based and event-based camera systems

Synthetic Boost: Leveraging Synthetic Data for Enhanced Vision-Language Segmentation in Echocardiography

Accurate and Fast Compressed Video Captioning

Boosting Studies of Multi-Agent Reinforcement Learning on Google Research Football Environment: the Past, Present, and Future

ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs

Keyword: mobile

Stochastic scheduling of autonomous mobile robots at hospitals

Human Following in Mobile Platforms with Person Re-Identification

Robust Energy Consumption Prediction with a Missing Value-Resilient Metaheuristic-based Neural Network in Mobile App Development

A Study on Learning Social Robot Navigation with Multimodal Perception

Data-driven Preference Learning Methods for Multiple Criteria Sorting with Temporal Criteria

Learning Actions and Control of Focus of Attention with a Log-Polar-like Sensor

Open Source Robot Localization for Non-Planar Environments

Keyword: pruning

ThinResNet: A New Baseline for Structured Convolutional Networks Pruning

Efficient N:M Sparse DNN Training Using Algorithm, Architecture, and Dataflow Co-Design

Keyword: diffusion

Antagonising explanation and revealing bias directly through sequencing and multimodal inference

Synthetic Image Detection: Highlights from the IEEE Video and Image Processing Cup 2022 Student Competition

License Plate Super-Resolution Using Diffusion Models