New submissions for Thu, 20 Apr 23

Keyword: efficient

Memento: Facilitating Effortless, Efficient, and Reliable ML Experiments

Authors: Zac Pullar-Strecker, Xinglong Chang, Liam Brydon, Ioannis Ziogas, Katharina Dost, Jörg Wicker
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.09175
Pdf link: https://arxiv.org/pdf/2304.09175
Abstract Running complex sets of machine learning experiments is challenging and time-consuming due to the lack of a unified framework. This forces researchers to spend time implementing necessary features such as parallelization, caching, and checkpointing themselves instead of focusing on their project. To simplify this process, we introduce Memento, a Python package designed to help researchers and data scientists efficiently manage and execute computationally intensive experiments. Memento can streamline any experimental pipeline by providing a straightforward configuration matrix and the ability to run experiments concurrently across multiple threads. A demonstration of Memento is available at: https://wickerlab.org/publication/memento.
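The configuration-matrix idea can be illustrated with a short sketch: every combination of parameter values becomes one experiment, results are cached by configuration so completed runs are skipped, and pending runs execute on a thread pool. This is not Memento's API. `expand_matrix`, `config_key`, `run_all`, and the in-memory `cache` dict are hypothetical names chosen for illustration, and a real tool would also persist results and checkpoints to disk.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def expand_matrix(matrix):
    """Turn {"param": [values, ...]} into one config dict per combination (hypothetical helper)."""
    keys = sorted(matrix)
    return [dict(zip(keys, combo)) for combo in itertools.product(*(matrix[k] for k in keys))]


def config_key(config):
    # Hashable cache key; assumes every parameter value is hashable.
    return tuple(sorted(config.items()))


def run_all(experiment, matrix, cache, workers=4):
    """Run experiment(**config) for every config not already cached, using a thread pool."""
    configs = expand_matrix(matrix)
    pending = [c for c in configs if config_key(c) not in cache]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda c: experiment(**c), pending)
        for config, result in zip(pending, results):
            cache[config_key(config)] = result
    # Return results in matrix order, whether freshly computed or cached.
    return [cache[config_key(c)] for c in configs]
```

Calling `run_all` a second time with the same `cache` returns the stored results without rerunning any experiment, which is the behavior caching is meant to provide.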
Generative models improve fairness of medical classifiers under distribution shifts
Authors: Ira Ktena, Olivia Wiles, Isabela Albuquerque, Sylvestre-Alvise Rebuffi, Ryutaro Tanno, Abhijit Guha Roy, Shekoofeh Azizi, Danielle Belgrave, Pushmeet Kohli, Alan Karthikesalingam, Taylan Cemgil, Sven Gowal
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.09218
Pdf link: https://arxiv.org/pdf/2304.09218
Abstract A ubiquitous challenge in machine learning is the problem of domain generalisation. This can exacerbate bias against groups or labels that are underrepresented in the datasets used for model development. Model bias can lead to unintended harms, especially in safety-critical applications like healthcare. Furthermore, the challenge is compounded by the difficulty of obtaining labelled data due to high cost or lack of readily available domain expertise. In our work, we show that learning realistic augmentations automatically from data is possible in a label-efficient manner using generative models. In particular, we leverage the higher abundance of unlabelled data to capture the underlying data distribution of different conditions and subgroups for an imaging modality. By conditioning generative models on appropriate labels, we can steer the distribution of synthetic examples according to specific requirements. We demonstrate that these learned augmentations can surpass heuristic ones by making models more robust and statistically fair in- and out-of-distribution. To evaluate the generality of our approach, we study 3 distinct medical imaging contexts of varying difficulty: (i) histopathology images from a publicly available generalisation benchmark, (ii) chest X-rays from publicly available clinical datasets, and (iii) dermatology images characterised by complex shifts and imaging conditions. Complementing real training samples with synthetic ones improves the robustness of models in all three medical tasks and increases fairness by improving the accuracy of diagnosis within underrepresented groups. This approach leads to stark out-of-distribution (OOD) improvements across modalities: a 7.7% improvement in prediction accuracy in histopathology, 5.2% in chest radiology with a 44.6% lower fairness gap, and a striking 63.5% improvement in high-risk sensitivity for dermatology with a 7.5x reduction in the fairness gap.
A Data Driven Sequential Learning Framework to Accelerate and Optimize Multi-Objective Manufacturing Decisions
Authors: Hamed Khosravi, Taofeeq Olajire, Ahmed Shoyeb Raihan, Imtiaz Ahmed
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2304.09278
Pdf link: https://arxiv.org/pdf/2304.09278
Abstract Manufacturing advanced materials and products with a specific property or combination of properties is often required. To achieve this, it is crucial to find the optimal recipe or processing conditions that generate the ideal combination of these properties. Generating a Pareto front typically requires a sufficient number of experiments. However, manufacturing experiments are usually costly, and even a single experiment can be time-consuming. It is therefore critical to determine the optimal locations for data collection to gain the most comprehensive understanding of the process. Sequential learning is a promising approach to actively learn from ongoing experiments, iteratively update the underlying optimization routine, and adapt the data collection process on the go. This paper presents a novel data-driven Bayesian optimization framework that uses sequential learning to efficiently optimize complex systems with multiple conflicting objectives. Additionally, this paper proposes a novel metric for evaluating multi-objective data-driven optimization approaches. This metric considers both the quality of the Pareto front and the amount of data used to generate it. The proposed framework is particularly beneficial in practical applications where acquiring data can be expensive and resource-intensive. To demonstrate the effectiveness of the proposed algorithm and metric, the algorithm is evaluated on a manufacturing dataset. The results indicate that the proposed algorithm can recover the actual Pareto front while processing significantly less data. This implies that the proposed data-driven framework can lead to similar manufacturing decisions with reduced cost and time.
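For readers unfamiliar with the term, a Pareto front is the set of non-dominated outcomes: no other outcome is at least as good in every objective and strictly better in at least one. A minimal O(n²) filter, assuming every objective is to be maximized, is sketched below. It illustrates the concept only and is not the paper's Bayesian optimization framework or its proposed metric.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly better in one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates, in input order (O(n^2) comparisons)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For example, among `(1, 5), (2, 4), (3, 3), (2, 2), (1, 1), (3, 1)`, only the first three survive, because each of the others is beaten on both objectives or tied on one and beaten on the other.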
Leveraging Deep Learning Techniques on Collaborative Filtering Recommender Systems
Authors: Ali Fallahi RahmatAbadi, Javad Mohammadzadeh
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2304.09282
Pdf link: https://arxiv.org/pdf/2304.09282
Abstract With the exponentially increasing volume of online data, searching for and finding required information has become an extensive and time-consuming task. Recommender systems, a subclass of information retrieval and decision support systems, provide personalized suggestions that help users access what they need more efficiently. Among the different techniques for building a recommender system, Collaborative Filtering (CF) is the most popular and widespread approach. However, cold start and data sparsity are the fundamental challenges in implementing an effective CF-based recommender. Recent successful developments in enhancing and implementing deep learning architectures have motivated many studies to propose deep learning-based solutions to the recommenders' weak points. In this research, unlike similar past work on deep learning architectures in recommender systems, which covered different techniques in general terms, we specifically provide a comprehensive review of deep learning-based collaborative filtering recommender systems. This focused review gives a clear overview of the popularity, gaps, and neglected areas in leveraging deep learning techniques to build CF-based systems, the most influential type of recommender.
Integrity and Junkiness Failure Handling for Embedding-based Retrieval: A Case Study in Social Network Search
Authors: Wenping Wang, Yunxi Guo, Chiyao Shen, Shuai Ding, Guangdeng Liao, Hao Fu, Pramodh Karanth Prabhakar
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.09287
Pdf link: https://arxiv.org/pdf/2304.09287
Abstract Embedding-based retrieval has been used in a variety of search applications, such as e-commerce and social network search. While the approach has demonstrated its efficacy in tasks like semantic matching and contextual search, it is plagued by the problem of uncontrollable relevance. In this paper, we analyze the embedding-based retrieval system launched in early 2021 on our social network search engine and define two main categories of failures it introduced: integrity and junkiness. The former refers to issues such as hate speech and offensive content that can severely harm user experience, while the latter includes irrelevant results such as fuzzy text matches or language mismatches. We further propose efficient methods applied during model inference to resolve these issues, including indexing treatments and targeted user cohort treatments. Though simple, these methods yield good gains in offline NDCG and online A/B test metrics in practice. We analyze the reasons for the improvements, noting that our methods are only preliminary attempts at this important but challenging problem, and we put forward potential future directions to explore.
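NDCG, the offline metric cited above, discounts each result's graded relevance by the logarithm of its rank and normalizes by the best achievable ordering of the same results. The sketch below uses the common linear-gain form; some systems use a gain of 2^rel - 1 instead, and the abstract does not say which variant or rank cutoff the authors used.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain: sum of rel / log2(rank + 1), ranks from 1."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances, k=None):
    """DCG of the ranking as returned, divided by the DCG of the ideal (sorted) ranking."""
    ranked = relevances if k is None else relevances[:k]
    ideal = sorted(relevances, reverse=True)[:len(ranked)]
    best = dcg(ideal)
    # A result list with no relevant items scores 0 by convention here.
    return dcg(ranked) / best if best > 0 else 0.0
```

A perfectly ordered list such as `[3, 2, 1]` scores 1.0, while swapping a highly relevant result below a weaker one lowers the score.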
From RSSE to BotSE: Potentials and Challenges Revisited after 15 Years
Authors: Walid Maalej
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2304.09308
Pdf link: https://arxiv.org/pdf/2304.09308
Abstract Both recommender systems and bots should proactively and smartly answer the questions of software developers or other project stakeholders to help them perform their tasks more efficiently. This paper reflects on the achievements of the more mature area of Recommendation Systems in Software Engineering (RSSE) as well as the rising area of Bots in Software Engineering (BotSE). We discuss the similarities and differences, briefly review the current state of the art, and highlight three particular areas in which the full potential is yet to be tapped: a more socio-technical context awareness, assisting knowledge sharing in addition to knowledge access, and covering repetitive or stimulative scenarios related to requirements and user-developer interaction.
Application of genetic algorithm to load balancing in networks with a homogeneous traffic flow
Authors: Marek Bolanowski (1), Alicja Gerka, Andrzej Paszkiewicz (1), Maria Ganzha (2), Marcin Paprzycki (2) ((1) Rzeszow University of Technology, (2) Systems Research Institute Polish Academy of Sciences)
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2304.09313
Pdf link: https://arxiv.org/pdf/2304.09313
Abstract The concept of the extended cloud requires an efficient network infrastructure to support ecosystems reaching from the edge to the cloud(s). Standard approaches to network load balancing deliver static solutions that are insufficient for extended clouds, where network loads change often. To address this issue, a genetic algorithm-based load optimizer is proposed and implemented. Its performance is then evaluated experimentally, showing that it outperforms existing solutions.
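A genetic algorithm for this kind of problem can be sketched as follows, under assumptions the abstract does not state: each flow has a demand (equal demands model a homogeneous traffic flow) and a list of candidate paths, each a list of link identifiers; a chromosome assigns one path index per flow; and fitness is the load on the most heavily used link, where lower is better. Tournament selection, uniform crossover, per-gene mutation, and two-member elitism are generic choices, not necessarily the authors' operators, and `genetic_balance` is a hypothetical name.

```python
import random


def max_link_load(assignment, demands, paths):
    """Load on the busiest link when flow i is routed over paths[i][assignment[i]]."""
    load = {}
    for i, choice in enumerate(assignment):
        for link in paths[i][choice]:
            load[link] = load.get(link, 0) + demands[i]
    return max(load.values())


def genetic_balance(demands, paths, pop_size=30, generations=100, mutation=0.1, seed=0):
    """Search path assignments minimising the maximum link load (hypothetical GA sketch)."""
    rng = random.Random(seed)
    n = len(demands)

    def fitness(assignment):
        return max_link_load(assignment, demands, paths)

    pop = [[rng.randrange(len(paths[i])) for i in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        nxt = pop[:2]  # elitism: carry the two best assignments over unchanged
        while len(nxt) < pop_size:
            # Tournament selection: the fitter (lower-load) of three random individuals.
            p1 = min(rng.sample(pop, 3), key=fitness)
            p2 = min(rng.sample(pop, 3), key=fitness)
            # Uniform crossover: each flow's path index comes from either parent.
            child = [rng.choice(pair) for pair in zip(p1, p2)]
            # Mutation: occasionally reroute a flow onto a random candidate path.
            child = [rng.randrange(len(paths[i])) if rng.random() < mutation else gene
                     for i, gene in enumerate(child)]
            nxt.append(child)
        pop = nxt
    best = min(pop, key=fitness)
    return best, fitness(best)
```

With two equal flows that can each use link `"A"` or link `"B"`, the search should place them on different links, giving a maximum link load equal to a single flow's demand.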
Provably-Efficient and Internally-Deterministic Parallel Union-Find
Authors: Alexander Fedorov, Diba Hashemi, Giorgi Nadiradze, Dan Alistarh
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2304.09331
Pdf link: https://arxiv.org/pdf/2304.09331
Abstract Determining the degree of inherent parallelism in classical sequential algorithms and leveraging it for fast parallel execution is a key topic in parallel computing, and detailed analyses are known for a wide range of classical algorithms. In this paper, we perform the first such analysis for the fundamental Union-Find problem, in which we are given a graph as a sequence of edges and must maintain its connectivity structure under edge additions. We prove that classic sequential algorithms for this problem are well-parallelizable under reasonable assumptions, addressing a conjecture by [Blelloch, 2017]. More precisely, we show via a new potential argument that, under uniform random edge ordering, parallel union-find operations are unlikely to interfere: $T$ concurrent threads processing the graph in parallel will encounter memory contention $O(T^2 \cdot \log |V| \cdot \log |E|)$ times in expectation, where $|E|$ and $|V|$ are the number of edges and nodes in the graph, respectively. We leverage this result to design a new parallel Union-Find algorithm that is both internally deterministic, i.e., its results are guaranteed to match those of a sequential execution, and work-efficient and scalable, as long as the number of threads $T$ is $O(|E|^{\frac{1}{3} - \varepsilon})$, for an arbitrarily small constant $\varepsilon > 0$, which holds for most large real-world graphs. We present lower bounds showing that our analysis is close to optimal, and experimental results suggesting that the performance cost of internal determinism is limited.
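For reference, one standard form of the classic sequential Union-Find algorithms mentioned above uses union by rank and path halving, processing edges one at a time to maintain connectivity. This sketch is sequential only; it does not implement the paper's parallel, internally deterministic algorithm, and the exact sequential variant the authors analyze may differ.

```python
class UnionFind:
    """Sequential disjoint-set forest with union by rank and path halving."""

    def __init__(self, n):
        self.parent = list(range(n))  # each node starts as its own root
        self.rank = [0] * n           # upper bound on tree height

    def find(self, x):
        # Path halving: point every other node on the path to its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Add edge (a, b); return False if a and b were already connected."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the shallower tree under the deeper one
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True
```

Feeding the edges `(0, 1), (1, 2), (3, 4), (2, 0)` into `UnionFind(5)` merges components for the first three edges and reports the last one as redundant, leaving two connected components.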
BIM-GPT: a Prompt-Based Virtual Assistant Framework for BIM Information Retrieval
Authors: Junwen Zheng, Martin Fischer
Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2304.09333
Pdf link: https://arxiv.org/pdf/2304.09333
Abstract Efficient information retrieval (IR) from building information models (BIMs) poses significant challenges due to the need for deep BIM knowledge or extensive engineering effort for automation. We introduce BIM-GPT, a prompt-based virtual assistant (VA) framework integrating BIM and generative pre-trained transformer (GPT) technologies to support natural language (NL)-based IR. A prompt manager and dynamic template generate prompts for GPT models, enabling interpretation of NL queries, summarization of retrieved information, and answering of BIM-related questions. In tests on a BIM IR dataset, our approach achieved accuracy rates of 83.5% and 99.5% for classifying NL queries with no data and with 2% of the data incorporated in prompts, respectively. Additionally, we validated the functionality of BIM-GPT through a VA prototype for a hospital building. This research contributes to the development of effective and versatile VAs for BIM IR in the construction industry, significantly enhancing BIM accessibility and reducing the engineering effort and training data required to process NL queries.
Perception Imitation: Towards Synthesis-free Simulator for Autonomous Vehicles
Authors: Xiaoliang Ju, Yiyang Sun, Yiming Hao, Yikang Li, Yu Qiao, Hongsheng Li
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.09365
Pdf link: https://arxiv.org/pdf/2304.09365
Abstract We propose a perception imitation method to simulate the results of a given perception model, and discuss a new heuristic route for autonomous driving simulators without data synthesis. The motivation is that original sensor data is not always necessary for tasks such as planning and control once semantic perception results are available, so simulating perception directly is more economical and efficient. In this work, a series of evaluation methods, such as a matching metric and downstream task performance, are used to examine the simulation quality. Experiments show that our method effectively models the behavior of learning-based perception models and can be applied smoothly in the proposed simulation route.
SP-BatikGAN: An Efficient Generative Adversarial Network for Symmetric Pattern Generation
Authors: Chrystian, Wahyono
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2304.09384
Pdf link: https://arxiv.org/pdf/2304.09384
Abstract Amid the controversy over AI art, our research focuses on bringing AI to everyone, particularly artists, so they can create AI art with limited data and settings. We are interested in geometrically symmetric pattern generation, which appears in many artworks such as Portuguese and Moroccan tiles and Batik, a cultural heritage of Southeast Asia. Symmetric pattern generation is a complex problem, and prior research has created models too specific to certain patterns. We publicly provide the first-ever dataset for this task: 1,216 high-quality symmetric patterns taken straight from design files. We then formulate a symmetric pattern enforcement (SPE) loss to leverage the underlying symmetric structures present in current image distributions. Our SPE loss improves and accelerates training on any GAN configuration, and, with efficient attention, SP-BatikGAN improves on FastGAN, the state-of-the-art GAN for limited settings, reducing the FID score from 110.11 to 90.76 (an 18% decrease) and increasing the model diversity recall score from 0.047 to 0.204 (a 334% increase).
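The abstract does not define the SPE loss. As an illustration only, a generic mirror-symmetry penalty can measure how far an image is from being symmetric about its spatial axes, and a term like this could be added to a generator's loss. The function `symmetry_penalty` is hypothetical; it assumes reflective symmetry across the listed axes of an image array whose first two axes are height and width, which covers only some of the symmetry groups found in tile and Batik patterns.

```python
import numpy as np


def symmetry_penalty(img, axes=(0, 1)):
    """Mean absolute difference between img and its mirror image along each axis in axes.

    Hypothetical illustration, not the paper's SPE loss: zero for an image that is
    mirror-symmetric along every listed axis, larger the further it is from that symmetry.
    """
    diffs = [np.abs(img - np.flip(img, axis=a)).mean() for a in axes]
    return float(np.mean(diffs))
```

In a GAN, a penalty of this kind would typically be weighted and added to the generator's adversarial loss; the weighting and the choice of symmetries are design decisions the sketch leaves open.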
Information Geometrically Generalized Covariate Shift Adaptation
Authors: Masanari Kimura, Hideitsu Hino
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.09387
Pdf link: https://arxiv.org/pdf/2304.09387
Abstract Many machine learning methods assume that the training and test data follow the same distribution. However, in the real world, this assumption is very often violated. In particular, a change in the marginal distribution of the input data is called covariate shift, which is one of the most important research topics in machine learning. We show that the well-known family of covariate shift adaptation methods is unified in the framework of information geometry. Furthermore, we show that the parameter search for the geometrically generalized covariate shift adaptation method can be performed efficiently. Numerical experiments show that our generalization can achieve better performance than the existing methods it encompasses.
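A concrete member of the covariate shift adaptation family is importance-weighted least squares with Shimodaira's exponentially flattened weights w(x)^λ, where w(x) = p_test(x) / p_train(x): λ = 0 recovers ordinary least squares and λ = 1 gives full importance weighting. The sketch below assumes the density ratio is already known and λ is given. It illustrates one method of the kind the generalization is said to encompass, not the information-geometric generalization itself or its parameter search; `flattened_iw_least_squares` is a hypothetical name.

```python
import numpy as np


def flattened_iw_least_squares(X, y, density_ratio, lam):
    """Linear least squares with per-sample weights w(x)**lam, w = p_test(x) / p_train(x).

    lam = 0 gives ordinary least squares; lam = 1 gives full importance weighting.
    Returns [coefficients..., intercept]. The density ratio is assumed known here.
    """
    w = np.asarray(density_ratio, dtype=float) ** lam
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append an intercept column
    XtW = Xb.T * w  # same as Xb.T @ diag(w), without building the diagonal matrix
    # Solve the weighted normal equations (X^T W X) beta = X^T W y.
    return np.linalg.solve(XtW @ Xb, XtW @ y)
```

Intermediate values of λ trade the bias of unweighted fitting against the variance of full importance weighting, which is why choosing λ well matters.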
Inferring High-level Geographical Concepts via Knowledge Graph and Multi-scale Data Integration: A Case Study of C-shaped Building Pattern Recognition
Authors: Zhiwei Wei, Yi Xiao, Wenjia Xu, Mi Shu, Lu Cheng, Yang Wang, Chunbo Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.09391
Pdf link: https://arxiv.org/pdf/2304.09391
Abstract Effective building pattern recognition is critical for understanding urban form, automating map generalization, and visualizing 3D city models. Most existing studies use object-independent methods based on visual perception rules and proximity graph models to extract patterns. However, because human vision is a part-based system, pattern recognition may require decomposing shapes into parts or grouping them into clusters. Existing methods may not recognize all visually perceptible patterns, and the proximity graph model can be inefficient. To improve efficiency and effectiveness, we integrate multi-scale data using a knowledge graph, focusing on the recognition of C-shaped building patterns. First, we use a property graph to represent the relationships between buildings within and across different scales involved in C-shaped building pattern recognition. Next, we store this knowledge graph in a graph database and convert the rules for C-shaped pattern recognition and enrichment into query conditions. Finally, we recognize and enrich C-shaped building patterns using rule-based reasoning in the built knowledge graph. We verify the effectiveness of our method using multi-scale data with three levels of detail (LODs) collected from the Gaode Map. Our results show that our method achieves a higher recall rate of 26.4% for LOD1, 20.0% for LOD2, and 9.1% for LOD3 compared to existing approaches. We also achieve recognition efficiency improvements of 0.91, 1.37, and 9.35 times, respectively.
On the Capacity Region of Reconfigurable Intelligent Surface Assisted Symbiotic Radios
Authors: Qianqian Zhang, Hu Zhou, Ying-Chang Liang, Sumei Sun, Wei Zhang, H. Vincent Poor
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2304.09400
Pdf link: https://arxiv.org/pdf/2304.09400
Abstract In this paper, we are interested in reconfigurable intelligent surface (RIS)-assisted symbiotic radio (SR) systems, where an RIS assists a primary transmission by passive beamforming and simultaneously acts as an information transmitter by periodically adjusting its reflecting coefficients. The above modulation scheme innately enables a new multiplicative multiple access channel (M-MAC), where the primary and secondary signals are superposed in a multiplicative and additive manner. To pursue the fundamental performance limits of the M-MAC, we focus on the characterization of the capacity region of such systems. Due to the passive nature of RISs, the transmitted signal of the RIS should satisfy a peak power constraint. Under this constraint at the RIS as well as the average power constraint at the primary transmitter (PTx), we analyze the capacity-achieving distributions of the transmitted signals and characterize the capacity region of the M-MAC. Then, theoretical analysis is performed to reveal insights into the RIS-assisted SR. It is observed that: 1) the capacity region of the M-MAC is strictly convex and larger than that of the conventional TDMA scheme; 2) the secondary transmission achieves its maximum rate when the PTx transmits constant envelope signals; and 3) the sum rate is maximized when the PTx transmits Gaussian signals and the RIS transmits constant envelope signals. Finally, extensive numerical results are provided to evaluate the performance of the RIS-assisted SR and verify the accuracy of our theoretical analysis.
Torque-based Deep Reinforcement Learning for Task-and-Robot Agnostic Learning on Bipedal Robots Using Sim-to-Real Transfer
Authors: Donghyeon Kim, Glen Berseth, Mathew Schwartz, Jaeheung Park
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.09434
Pdf link: https://arxiv.org/pdf/2304.09434
Abstract In this paper, we revisit the question of which action space is best suited for controlling a real biped robot in combination with Sim2Real training. Position control has been popular, as it has been shown to be more sample-efficient and more intuitive to combine with other planning algorithms. However, position control requires gain tuning to achieve the best possible policy performance. We show that using a torque-based action space instead enables task-and-robot agnostic learning with less parameter tuning and mitigates the sim-to-reality gap by taking advantage of torque control's inherent compliance. We also accelerate the torque-based policy training process by pre-training the policy to remain upright by compensating for gravity. The paper showcases the first successful sim-to-real transfer of a torque-based deep reinforcement learning policy on a real human-sized biped robot. The video is available at https://youtu.be/CR6pTS39VRE.
Local object crop collision network for efficient simulation of non-convex objects in GPU-based simulators
Authors: Dongwon Son, Beomjoon Kim
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.09439
Pdf link: https://arxiv.org/pdf/2304.09439
Abstract Our goal is to develop an efficient contact detection algorithm for large-scale GPU-based simulation of non-convex objects. Current GPU-based simulators such as IsaacGym and Brax must trade off speed against fidelity, generality, or both when simulating non-convex objects. Their main issue lies in contact detection (CD): existing CD algorithms, such as Gilbert-Johnson-Keerthi (GJK), must trade off computational speed against accuracy, which becomes expensive as the number of collisions among non-convex objects increases. We propose a data-driven approach to CD whose accuracy depends only on the quality and quantity of the offline dataset rather than on online computation time. Unlike GJK, our method inherently has a uniform computational flow, which facilitates efficient GPU usage with advanced compilers such as XLA (Accelerated Linear Algebra). Further, we offer a data-efficient solution by learning the patterns of colliding local object crops rather than global object shapes, which are harder to learn. We demonstrate that our approach improves the efficiency of existing CD methods by a factor of 5-10 for non-convex objects with comparable accuracy. Building on previous work on contact resolution for neural-network-based contact detectors, we integrate our CD algorithm into the open-source GPU-based simulator Brax and show that we can improve efficiency over IsaacGym and generality over standard Brax. We highly recommend the videos of our simulator included in the supplementary materials.
Enhancing Multi-Camera People Tracking with Anchor-Guided Clustering and Spatio-Temporal Consistency ID Re-Assignment
Authors: Hsiang-Wei Huang, Cheng-Yen Yang, Zhongyu Jiang, Pyong-Kun Kim, Kyoungoh Lee, Kwangju Kim, Samartha Ramkumar, Chaitanya Mullapudi, In-Su Jang, Chung-I Huang, Jenq-Neng Hwang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.09471
Pdf link: https://arxiv.org/pdf/2304.09471
Abstract Multi-camera multiple people tracking has become an increasingly important area of research due to the growing demand for accurate and efficient indoor people tracking systems, particularly in settings such as retail, healthcare centers, and transit hubs. We propose a novel multi-camera multiple people tracking method that uses anchor-guided clustering for cross-camera re-identification and spatio-temporal consistency for geometry-based cross-camera ID reassignment. Our approach aims to improve tracking accuracy by identifying key features that are unique to each individual and by using the overlap of views between cameras to predict accurate trajectories without needing the actual camera parameters. The method has demonstrated robustness and effectiveness on both synthetic and real-world data. The proposed method is evaluated on the CVPR AI City Challenge 2023 dataset, achieving an IDF1 of 95.36% and first place in the challenge. The code is available at: https://github.com/ipl-uw/AIC23_Track1_UWIPL_ETRI.
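IDF1, the metric reported above, is the F1 score of identity-level matches: IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN), where IDTP, IDFP, and IDFN count identity true positives, false positives, and false negatives after predicted and ground-truth trajectories are matched one-to-one. The sketch takes those counts as given; computing the optimal identity matching is out of scope here, and `id_scores` is a hypothetical helper name.

```python
def id_scores(idtp, idfp, idfn):
    """Identity precision, recall, and F1 from counts produced by a one-to-one identity matching."""
    idp = idtp / (idtp + idfp)  # fraction of predicted detections with the correct identity
    idr = idtp / (idtp + idfn)  # fraction of ground-truth detections correctly identified
    idf1 = 2 * idtp / (2 * idtp + idfp + idfn)  # harmonic mean of idp and idr
    return idp, idr, idf1
```

Because IDF1 is the harmonic mean of identity precision and recall, a tracker cannot score highly by keeping identities consistent on only a few long tracks or by fragmenting tracks into many short ones.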
Learning Resource Scheduling with High Priority Users using Deep Deterministic Policy Gradients
Authors: Steffen Gracla, Edgar Beck, Carsten Bockelmann, Armin Dekorsy
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.09488
Pdf link: https://arxiv.org/pdf/2304.09488
Abstract Advances in mobile communication capabilities open the door for closer integration of pre-hospital and in-hospital care processes. For example, medical specialists can be enabled to guide on-site paramedics and can, in turn, be supplied with live vitals or visuals. Consolidating such performance-critical applications with the highly complex workings of mobile communications requires solutions that are both reliable and efficient, yet easy to integrate with existing systems. This paper explores the application of Deep Deterministic Policy Gradient (DDPG) methods for learning a communications resource scheduling algorithm with special regard to priority users. Unlike the popular Deep-Q-Network methods, DDPG is able to produce continuous-valued output. With light post-processing, the resulting scheduler is able to achieve high performance on a flexible sum-utility goal.
Neural Network Quantisation for Faster Homomorphic Encryption
Authors: Wouter Legiest, Jan-Pieter D'Anvers, Furkan Turan, Michiel Van Beirendonck, Ingrid Verbauwhede
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2304.09490
Pdf link: https://arxiv.org/pdf/2304.09490
Abstract Homomorphic encryption (HE) enables computation on encrypted data, which makes it possible to perform privacy-preserving neural network inference. One disadvantage of this technique is that it is several orders of magnitude slower than computation on unencrypted data. Neural networks are commonly trained using floating-point arithmetic, while most homomorphic encryption libraries compute on integers, thus requiring quantisation of the neural network. A straightforward approach would be to quantise to large integer sizes (e.g., 32 bits) to avoid large quantisation errors. In this work, we reduce the integer sizes of the networks using quantisation-aware training to allow more efficient computation. For the targeted MNIST architecture proposed by Badawi et al., we reduce the integer sizes by 33% without significant loss of accuracy, while for the CIFAR architecture we can reduce the integer sizes by 43%. Implementing the resulting networks under the BFV homomorphic encryption scheme using SEAL, we reduced the execution time of an MNIST neural network by 80% and that of a CIFAR neural network by 40%.
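The integer-size reduction rests on uniform quantisation. A minimal symmetric, per-tensor sketch is shown below; the paper's exact scheme (per-channel scales, rounding mode, how activations are handled) is not stated in the abstract, so treat these choices as assumptions. During quantisation-aware training, `fake_quantise` would be applied in the forward pass with a straight-through gradient estimate, which this NumPy sketch does not model.

```python
import numpy as np


def quantise(x, bits):
    """Symmetric per-tensor quantisation of a float array to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1
    scale = float(np.max(np.abs(x))) / qmax
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any positive scale works
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int64)
    return q, scale


def fake_quantise(x, bits):
    """Quantise then dequantise, so a float model sees the rounding error it will face."""
    q, scale = quantise(x, bits)
    return q * scale
```

Fewer bits mean a coarser grid: the rounding error per value is bounded by half the scale, and the scale grows roughly twofold for every bit removed, which is the accuracy cost that quantisation-aware training tries to absorb.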
Sampling is Matter: Point-guided 3D Human Mesh Reconstruction
Authors: Jeonghwan Kim (1), Mi-Gyeong Gwon (1), Hyunwoo Park (1), Hyukmin Kwon (2), Gi-Mun Um (2), Wonjun Kim (1) ((1) Konkuk University, (2) Electronics and Telecommunications Research Institute)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.09502
Pdf link: https://arxiv.org/pdf/2304.09502
Abstract This paper presents a simple yet powerful method for 3D human mesh reconstruction from a single RGB image. Recently, the non-local interactions of all mesh vertices have been effectively estimated in the transformer, while the relationships between body parts have also begun to be handled via the graph model. Even though those approaches have shown remarkable progress in 3D human mesh reconstruction, it is still difficult to directly infer the relationship between features, which are encoded from the 2D input image, and the 3D coordinates of each vertex. To resolve this problem, we propose a simple feature sampling scheme. The key idea is to sample features in the embedded space following the guidance of points, which are estimated as projection results of 3D mesh vertices (i.e., ground truth). This helps the model concentrate more on vertex-relevant features in the 2D space, thus leading to the reconstruction of natural human poses. Furthermore, we apply progressive attention masking to precisely estimate local interactions between vertices even under severe occlusions. Experimental results on benchmark datasets show that the proposed method efficiently improves the performance of 3D human mesh reconstruction. The code and model are publicly available at: https://github.com/DCVL-3D/PointHMR_release.
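The point-guided sampling step can be illustrated with bilinear interpolation: given a 2D feature map and the projected 2D positions of mesh vertices, each vertex receives the interpolated feature vector at its point. Bilinear interpolation, border clamping, and the (x, y) pixel-coordinate convention are assumptions for this sketch, similar in spirit to PyTorch's `grid_sample`; the abstract does not specify how the authors sample, and `sample_features` is a hypothetical name.

```python
import numpy as np


def sample_features(feat, points):
    """Bilinearly sample an (H, W, C) feature map at N points given as (x, y) pixel coordinates.

    Points outside the map are clamped to the border. Returns an (N, C) array.
    """
    H, W, _ = feat.shape
    x = np.clip(points[:, 0], 0, W - 1)
    y = np.clip(points[:, 1], 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = (x - x0)[:, None]  # horizontal interpolation weight, shape (N, 1)
    wy = (y - y0)[:, None]  # vertical interpolation weight, shape (N, 1)
    top = (1 - wx) * feat[y0, x0] + wx * feat[y0, x1]
    bottom = (1 - wx) * feat[y1, x0] + wx * feat[y1, x1]
    return (1 - wy) * top + wy * bottom
```

The resulting (N, C) array gives one feature vector per vertex, which a downstream transformer could then consume as vertex tokens.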
Progressive Transfer Learning for Dexterous In-Hand Manipulation with Multi-Fingered Anthropomorphic Hand
Authors: Yongkang Luo, Wanyi Li, Peng Wang, Haonan Duan, Wei Wei, Jia Sun
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.09526
Pdf link: https://arxiv.org/pdf/2304.09526
Abstract Dexterous in-hand manipulation with a multi-fingered anthropomorphic hand is extremely difficult because of the high-dimensional state and action spaces and the rich contact patterns between the fingers and objects. Even though deep reinforcement learning has made moderate progress and demonstrated its strong potential for manipulation, it still faces challenges such as large-scale data collection and high sample complexity. In particular, even slight changes to the scene typically require re-collecting vast amounts of data and carrying out numerous iterations of fine-tuning. Remarkably, humans can quickly transfer learned manipulation skills to different scenarios with little supervision. Inspired by this flexible human transfer learning capability, we propose a novel progressive transfer learning framework (PTL) for dexterous in-hand manipulation that efficiently utilizes the collected trajectories and the source-trained dynamics model. This framework adopts progressive neural networks for dynamics model transfer learning on samples chosen by a new sample selection method based on the dynamics properties, rewards, and scores of the trajectories. Experimental results on contact-rich anthropomorphic hand manipulation tasks show that our method can efficiently and effectively learn in-hand manipulation skills with a few online attempts and adjustment learning in the new scene. Compared to learning from scratch, our method can reduce training time costs by 95%.
SelfAct: Personalized Activity Recognition based on Self-Supervised and Active Learning
Authors: Luca Arrotta, Gabriele Civitarese, Samuele Valente, Claudio Bettini
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2304.09530
Pdf link: https://arxiv.org/pdf/2304.09530
Abstract Supervised Deep Learning (DL) models are currently the leading approach for sensor-based Human Activity Recognition (HAR) on wearable and mobile devices. However, training them requires large amounts of labeled data whose collection is often time-consuming, expensive, and error-prone. At the same time, due to the intra- and inter-user variability of activity execution, activity models should be personalized for each user. In this work, we propose SelfAct, a novel framework for HAR that combines self-supervised and active learning to mitigate these problems. SelfAct leverages a large pool of unlabeled data collected from many users to pre-train a DL model through self-supervision, with the goal of learning a meaningful and efficient latent representation of sensor data. The resulting pre-trained model can then be used locally by new users, who fine-tune it using a novel unsupervised active learning strategy. Our experiments on two publicly available HAR datasets demonstrate that SelfAct achieves results close to, or even better than, those of fully supervised approaches with only a small number of active learning queries.
Graph Exploration for Effective Multi-agent Q-Learning
Authors: Ainur Zhaikhan, Ali H. Sayed
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2304.09547
Pdf link: https://arxiv.org/pdf/2304.09547
Abstract This paper proposes an exploration technique for multi-agent reinforcement learning (MARL) with graph-based communication among agents. We assume that the individual rewards received by the agents are independent of the actions of the other agents, while their policies are coupled. In the proposed framework, neighbouring agents collaborate to estimate the uncertainty about the state-action space in order to explore more efficiently. Unlike existing works, the proposed algorithm does not require counting mechanisms and can be applied to continuous-state environments without complex conversion techniques. Moreover, the proposed scheme allows agents to communicate in a fully decentralized manner with minimal information exchange: in continuous-state scenarios, each agent needs to exchange only a single parameter vector. The performance of the algorithm is verified with theoretical results for discrete-state scenarios and with experiments for continuous ones.
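The abstract does not give the update rules. As a generic illustration of what fully decentralized exchange of a single parameter vector can look like, the sketch below implements one neighbour-averaging (consensus) round with Metropolis weights. The weighting scheme, the graph encoding, and all names are assumptions, not the paper's algorithm.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]     # undirected adjacency lists
Params = Dict[int, List[float]]  # agent id -> parameter vector


def metropolis_weights(graph: Graph) -> Dict[int, Dict[int, float]]:
    """Symmetric, row-stochastic combination weights built from local degrees only."""
    deg = {k: len(nbrs) for k, nbrs in graph.items()}
    weights = {}
    for k, nbrs in graph.items():
        w = {l: 1.0 / (1 + max(deg[k], deg[l])) for l in nbrs}
        w[k] = 1.0 - sum(w.values())  # self-weight keeps each row summing to 1
        weights[k] = w
    return weights


def combine(params: Params, weights: Dict[int, Dict[int, float]]) -> Params:
    """One round: each agent averages its own and its neighbours' vectors."""
    dim = len(next(iter(params.values())))
    return {
        k: [sum(a * params[l][i] for l, a in w.items()) for i in range(dim)]
        for k, w in weights.items()
    }
```

Because Metropolis weights are symmetric, non-negative, and each row sums to one, repeated rounds on a connected graph drive every agent's vector towards the network-wide average while each agent only communicates with its neighbours.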
The State-of-the-Art in Air Pollution Monitoring and Forecasting Systems using IoT, Big Data, and Machine Learning
Authors: Amisha Gangwar, Sudhakar Singh, Richa Mishra, Shiv Prakash
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2304.09574
Pdf link: https://arxiv.org/pdf/2304.09574
Abstract Air quality is closely linked to the quality of life of humans, plants, and wildlife, and it needs to be monitored and preserved continuously. Transportation, industry, construction sites, generators, fireworks, and waste burning account for a major share of air-quality degradation, so these sources must be used in a safe and controlled manner. Relying on traditional laboratory analysis or installing bulky and expensive monitoring units every few miles is no longer efficient; smart devices are needed to collect and analyze air data. Air quality depends on various factors, including location, traffic, and time. Recent research uses machine learning algorithms, big data technologies, and the Internet of Things to propose stable and efficient models for this purpose. This review paper studies and compiles recent research in this field, with emphasis on data sources, monitoring, and forecasting models. Its main objective is to provide insight into ongoing research aimed at improving various aspects of air pollution models. It also sheds light on open research issues and challenges.
DADFNet: Dual Attention and Dual Frequency-Guided Dehazing Network for Video-Empowered Intelligent Transportation
Authors: Yu Guo, Ryan Wen Liu, Jiangtian Nie, Lingjuan Lyu, Zehui Xiong, Jiawen Kang, Han Yu, Dusit Niyato
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.09588
Pdf link: https://arxiv.org/pdf/2304.09588
Abstract Visual surveillance technology is an indispensable component of advanced traffic management systems. It has been applied to traffic supervision tasks such as object detection, tracking, and recognition. However, adverse weather conditions, e.g., fog, haze, and mist, pose severe challenges for video-based transportation surveillance. To mitigate the influence of adverse weather, we propose a dual attention and dual frequency-guided dehazing network (termed DADFNet) for real-time visibility enhancement. It consists of a dual attention module (DAM) and a high-low frequency-guided sub-net (HLFN), which jointly exploit attention and frequency mapping to guide haze-free scene reconstruction. Extensive experiments on both synthetic and real-world images demonstrate the superiority of DADFNet over state-of-the-art methods in terms of visibility enhancement and improvement in detection accuracy. Furthermore, DADFNet takes only 6.3 ms to process a $1920 \times 1080$ image on a 2080 Ti GPU, making it highly efficient for deployment in intelligent transportation systems.
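The HLFN architecture is not described in the abstract. To illustrate only the general idea of separating an image into low- and high-frequency parts, the sketch below uses a box blur as the low-pass filter and takes the residual as the high-frequency component. The filter choice, the radius, and the function names are assumptions.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def box_blur(img: Image, radius: int = 1) -> Image:
    """Low-pass filter: mean over a (2r+1) x (2r+1) window, truncated at borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            out[y][x] = sum(window) / len(window)
    return out


def split_frequencies(img: Image, radius: int = 1) -> Tuple[Image, Image]:
    """Return (low, high) with high = img - low, so low + high reproduces img."""
    low = box_blur(img, radius)
    high = [[p - q for p, q in zip(row, low_row)] for row, low_row in zip(img, low)]
    return low, high
```

By construction `low + high` reproduces the input up to floating-point rounding. For scale, the reported 6.3 ms per $1920 \times 1080$ frame corresponds to roughly 1000 / 6.3 ≈ 159 frames per second.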
Efficient High-Order Space-Angle-Energy Polytopic Discontinuous Galerkin Finite Element Methods for Linear Boltzmann Transport
Authors: Paul Houston, Matthew E. Hubbard, Thomas J. Radley, Oliver J. Sutton, Richard S.J. Widdowson
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2304.09592
Pdf link: https://arxiv.org/pdf/2304.09592
Abstract We introduce an $hp$-version discontinuous Galerkin finite element method (DGFEM) for the linear Boltzmann transport problem. A key feature of this new method is that, while offering arbitrary-order convergence rates, it may be implemented in an almost identical form to standard multigroup discrete ordinates methods, meaning that solutions can be computed efficiently, with high accuracy and in parallel, within existing software. The method provides a unified discretisation of the space, angle, and energy domains of the underlying integro-differential equation and naturally incorporates both local mesh and local polynomial degree variation within each of these computational domains. Moreover, the method can handle general polytopic elements, enabling efficient discretisations of problems posed on complicated spatial geometries. We study the stability of the proposed method and carry out its $hp$-version a priori error analysis by deriving suitable $hp$-approximation estimates together with a novel inf-sup bound. Numerical experiments highlighting the performance of the method for both polyenergetic and monoenergetic problems are presented.
AdapterGNN: Efficient Delta Tuning Improves Generalization Ability in Graph Neural Networks
Authors: Shengrui Li, Xueting Han, Jing Bai
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.09595
Pdf link: https://arxiv.org/pdf/2304.09595
Abstract Fine-tuning pre-trained models has recently yielded remarkable performance gains in graph neural networks (GNNs). Beyond pre-training techniques, and inspired by the latest work in natural language processing, more recent work has shifted towards effective fine-tuning approaches such as parameter-efficient tuning (delta tuning). However, given the substantial differences between GNNs and transformer-based models, applying such approaches directly to GNNs has proved less effective. In this paper, we present a comprehensive comparison of delta tuning techniques for GNNs and propose a novel delta tuning method specifically designed for GNNs, called AdapterGNN. AdapterGNN preserves the knowledge of the large pre-trained model and leverages highly expressive adapters for GNNs, which can adapt to downstream tasks effectively with only a few parameters while also improving the model's generalization ability on those tasks. Extensive experiments show that AdapterGNN achieves higher evaluation performance (outperforming full fine-tuning by 1.4% and 5.5% in the chemistry and biology domains, respectively, with only 5% of its parameters tuned) and lower generalization gaps compared to full fine-tuning. Moreover, we empirically show that a larger GNN model can have a worse generalization ability, which differs from the trend observed in large language models. We also provide a theoretical justification, based on generalization bounds, for why delta tuning can improve the generalization ability of GNNs.
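The abstract does not give AdapterGNN's exact adapter design. To make the "few parameters" claim concrete, the sketch below shows a generic residual bottleneck adapter (down-projection, ReLU, up-projection, skip connection) in plain Python. The structure, the zero initialization, and the dimensions are assumptions, not the paper's architecture.

```python
import random
from typing import List


class BottleneckAdapter:
    """Generic adapter: h + W_up · relu(W_down · h + b_down) + b_up (hypothetical design)."""

    def __init__(self, dim: int, bottleneck: int, seed: int = 0):
        rng = random.Random(seed)
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(bottleneck)]
        self.b_down = [0.0] * bottleneck
        # Zero-initialised up-projection: the adapter starts as the identity map,
        # so the frozen pre-trained model's behaviour is unchanged before tuning.
        self.w_up = [[0.0] * bottleneck for _ in range(dim)]
        self.b_up = [0.0] * dim

    def num_params(self) -> int:
        """Trainable parameters: 2 * dim * bottleneck + bottleneck + dim."""
        return (sum(map(len, self.w_down)) + len(self.b_down)
                + sum(map(len, self.w_up)) + len(self.b_up))

    def __call__(self, h: List[float]) -> List[float]:
        z = [max(0.0, sum(w * x for w, x in zip(row, h)) + b)
             for row, b in zip(self.w_down, self.b_down)]
        delta = [sum(w * x for w, x in zip(row, z)) + b
                 for row, b in zip(self.w_up, self.b_up)]
        return [x + d for x, d in zip(h, delta)]  # residual connection
```

With hypothetical sizes d = 300 and r = 15, `BottleneckAdapter(300, 15).num_params()` is 2·300·15 + 15 + 300 = 9315, versus 300·300 + 300 = 90,300 for a single full d × d linear layer.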
LEA: Beyond Evolutionary Algorithms via Learned Optimization Strategy
Authors: Kai Wu, Penghui Liu, Jing Liu
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.09599
Pdf link: https://arxiv.org/pdf/2304.09599
Abstract Evolutionary algorithms (EAs) have emerged as a powerful framework for expensive black-box optimization. Obtaining better solutions at lower computational cost is both essential and challenging for black-box optimization. The most critical obstacle is figuring out how to effectively use target-task information to form an efficient optimization strategy. However, current methods are limited by a poor representation of the optimization strategy and an inefficient interaction between the optimization strategy and the target task. To overcome these limitations, we design a learned EA (LEA) that moves from hand-designed to learned optimization strategies, covering not only hyperparameters but also update rules. Unlike traditional EAs, LEA adapts well to the target task and can obtain better solutions at lower computational cost. LEA is also able to effectively utilize low-fidelity information about the target task to form an efficient optimization strategy. Experimental results on one synthetic case, CEC 2013, and two real-world cases show the advantages of learned optimization strategies over human-designed baselines. In addition, LEA is well suited to acceleration on Graphics Processing Units and runs 102 times faster than an unaccelerated EA when evolving 32 populations, each containing 6400 individuals.
StyleDEM: a Versatile Model for Authoring Terrains
Authors: Simon Perche, Adrien Peytavie, Bedrich Benes, Eric Galin, Eric Guérin
Subjects: Graphics (cs.GR); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.09626
Pdf link: https://arxiv.org/pdf/2304.09626
Abstract Many terrain modelling methods have been proposed over the past decades, providing efficient and often interactive authoring tools. However, they generally do not include any notion of style, which is a critical aspect for designers in the entertainment industry. We introduce StyleDEM, a new generative adversarial network method for terrain synthesis and authoring, together with a versatile toolbox of style-aware authoring methods. The method starts from an input sketch or an existing terrain and outputs a terrain whose features can be authored using interactive brushes and enhanced with additional tools such as style manipulation or super-resolution. The strength of our approach lies in the versatility and interoperability of the toolbox.
Integrated Ray-Tracing and Coverage Planning Control using Reinforcement Learning
Authors: Savvas Papaioannou, Panayiotis Kolios, Theocharis Theocharides, Christos G. Panayiotou, Marios M. Polycarpou
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.09631
Pdf link: https://arxiv.org/pdf/2304.09631
Abstract In this work, we propose a coverage planning control approach that allows a mobile agent, equipped with a controllable sensor (i.e., a camera) with a limited sensing domain (i.e., finite sensing range and angle of view), to cover the surface area of an object of interest. The proposed approach integrates ray-tracing into the coverage planning process, allowing the agent to identify which parts of the scene are visible at any point in time. The problem of integrated ray-tracing and coverage planning control is first formulated as a constrained optimal control problem (OCP), which aims to determine the agent's optimal control inputs over a finite planning horizon that minimize the coverage time. However, efficiently solving the resulting OCP is very challenging due to non-convex and nonlinear visibility constraints. To overcome this limitation, the problem is converted into a Markov decision process (MDP), which is then solved using reinforcement learning. In particular, we show that a controller that follows an optimal control law can be learned using off-policy temporal-difference control (i.e., Q-learning). Extensive numerical experiments demonstrate the effectiveness of the proposed approach for various configurations of the agent and the object of interest.
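The abstract does not give the MDP's state, action, or reward design. The sketch below shows only the generic off-policy temporal-difference control update (tabular Q-learning) on a hypothetical five-cell corridor in which the agent is rewarded for reaching the last cell. The environment and hyperparameters are illustrative assumptions.

```python
import random
from typing import List, Tuple


def step(state: int, action: int, n_states: int) -> Tuple[int, float, bool]:
    """Hypothetical corridor: action 1 moves right, 0 moves left; the last cell is the goal."""
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = nxt == n_states - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(n_states: int = 5, episodes: int = 500, alpha: float = 0.5,
               gamma: float = 0.9, eps: float = 0.2, seed: int = 0) -> List[List[float]]:
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy behaviour policy (ties broken towards "right").
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s2, r, done = step(s, a, n_states)
            # Off-policy TD target: bootstrap from the greedy value of s2.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q
```

The update bootstraps from `max(q[s2])`, the greedy value of the next state, regardless of which action the ε-greedy behaviour policy takes next; this is what makes Q-learning off-policy.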
Resource Allocation in the RIS Assisted SCMA Cellular Network Coexisting with D2D Communications
Authors: Yukai Liu, Wen Chen, Kunlun Wang
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2304.09646
Pdf link: https://arxiv.org/pdf/2304.09646
Abstract Cellular networks coexisting with device-to-device (D2D) communications have been studied extensively. Reconfigurable intelligent surfaces (RIS) and non-orthogonal multiple access (NOMA) are promising technologies for the evolution of 5G, 6G, and beyond. In addition, sparse code multiple access (SCMA), a code-domain NOMA scheme, is considered suitable for next-generation wireless networks. In this paper, we consider an RIS-aided uplink SCMA cellular network coexisting with D2D users. We formulate an optimization problem that aims to maximize the cellular sum-rate by jointly designing the resource block (RB) association of the D2D users, the transmit power of both cellular and D2D users, and the phase shifts at the RIS, subject to power limits and users' communication requirements. The problem is non-convex and therefore challenging to solve directly. To handle it, we propose an efficient iterative algorithm based on the block coordinate descent (BCD) method, which decouples the original problem into three subproblems that are solved separately. Simulation results demonstrate that the proposed scheme significantly improves the sum-rate performance compared with various schemes.
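The three subproblems (RB association, transmit power, RIS phase shifts) are not detailed in the abstract. The sketch below only illustrates the block coordinate descent pattern itself on a hypothetical two-block convex quadratic, where each block update is an exact minimization with the other block held fixed.

```python
from typing import Tuple


def objective(x: float, y: float) -> float:
    """Hypothetical coupled convex objective; its unique minimiser is (1, 1)."""
    return x * x + y * y + x * y - 3 * x - 3 * y


def bcd(x: float = 0.0, y: float = 0.0, tol: float = 1e-12,
        max_iter: int = 1000) -> Tuple[float, float]:
    """Alternate exact minimisation over block x and block y."""
    for _ in range(max_iter):
        # Block 1: solve d/dx = 2x + y - 3 = 0 with y fixed.
        x_new = (3 - y) / 2
        # Block 2: solve d/dy = 2y + x - 3 = 0 with the updated x fixed.
        y_new = (3 - x_new) / 2
        converged = abs(x_new - x) < tol and abs(y_new - y) < tol
        x, y = x_new, y_new
        if converged:
            break
    return x, y
```

Because each block step minimizes the objective exactly, the objective never increases from one iteration to the next. For non-convex problems such as the one in the paper, this monotonicity makes the objective values converge when the objective is bounded, but it does not guarantee a globally optimal limit.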
List Defective Colorings: Distributed Algorithms and Applications
Authors: Marc Fuchs, Fabian Kuhn
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2304.09666
Pdf link: https://arxiv.org/pdf/2304.09666
Abstract The distributed coloring problem is at the core of the area of distributed graph algorithms and it is a problem that has seen tremendous progress over the last few years. Much of the remarkable recent progress on deterministic distributed coloring algorithms is based on two main tools: a) defective colorings in which every node of a given color can have a limited number of neighbors of the same color and b) list coloring, a natural generalization of the standard coloring problem that naturally appears when colorings are computed in different stages and one has to extend a previously computed partial coloring to a full coloring. In this paper, we introduce list defective colorings, which can be seen as a generalization of these two coloring variants. Essentially, in a list defective coloring instance, each node $v$ is given a list of colors $x_{v,1},\dots,x_{v,p}$ together with a list of defects $d_{v,1},\dots,d_{v,p}$ such that if $v$ is colored with color $x_{v,i}$, it is allowed to have at most $d_{v,i}$ neighbors with color $x_{v,i}$. We highlight the important role of list defective colorings by showing that faster list defective coloring algorithms would directly lead to faster deterministic $(\Delta+1)$-coloring algorithms in the LOCAL model. Further, we extend a recent distributed list coloring algorithm by Maus and Tonoyan [DISC '20]. Slightly simplified, we show that if for each node $v$ it holds that $\sum_{i=1}^{p} (d_{v,i}+1)^2 > \deg_G^2(v)\cdot \mathrm{polylog}\,\Delta$, then this list defective coloring instance can be solved in a communication-efficient way in only $O(\log\Delta)$ communication rounds. This leads to the first deterministic $(\Delta+1)$-coloring algorithm in the standard CONGEST model with a time complexity of $O(\sqrt{\Delta}\cdot \mathrm{polylog}\,\Delta+\log^* n)$, matching the best time complexity in the LOCAL model up to a $\mathrm{polylog}\,\Delta$ factor.
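The definition of a list defective coloring can be checked directly. The sketch below verifies whether a given coloring is valid; it assumes each node's list holds distinct colors paired with their defects, and the function name and input encoding are hypothetical.

```python
from typing import Dict, Hashable, List, Set, Tuple


def is_list_defective_coloring(
    adj: Dict[Hashable, Set[Hashable]],
    lists: Dict[Hashable, List[Tuple[int, int]]],
    coloring: Dict[Hashable, int],
) -> bool:
    """Check a coloring against per-node lists of (color x_{v,i}, defect d_{v,i}) pairs."""
    for v, nbrs in adj.items():
        allowed = dict(lists[v])  # color -> maximum number of same-colored neighbors
        c = coloring[v]
        if c not in allowed:
            return False          # v must use a color from its own list
        same = sum(1 for u in nbrs if coloring[u] == c)
        if same > allowed[c]:
            return False          # defect for the chosen color exceeded
    return True
```

Setting every defect to 0 recovers ordinary list coloring, and giving every node the same list recovers a standard defective coloring.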
Operations for D-algebraic Functions
Authors: Bertrand Teguia Tabuguia
Subjects: Symbolic Computation (cs.SC)
Arxiv link: https://arxiv.org/abs/2304.09675
Pdf link: https://arxiv.org/pdf/2304.09675
Abstract A function is differentially algebraic (or simply D-algebraic) if there is a polynomial relationship between some of its derivatives and the independent variable. Many functions in the sciences, such as Mathieu functions, the Weierstrass elliptic functions, and holonomic or D-finite functions, are D-algebraic. These functions form a field and are closed under composition, functional inversion, and differentiation. We present an implementation of each of these operations. We also give a systematic method for computing an algebraic differential equation from a linear differential equation with D-finite function coefficients. Each command is available in our Maple package NLDE at https://mathrepo.mis.mpg.de/OperationsForDAlgebraicFunctions.
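As a worked example of the definition, $\tan x$ is D-algebraic because its first derivative is a polynomial in the function itself:

$$y = \tan x \quad\Longrightarrow\quad y' = \sec^2 x = 1 + \tan^2 x \quad\Longrightarrow\quad y' - y^2 - 1 = 0.$$

The last equation is a polynomial relation between $y$ and $y'$ (the independent variable $x$ happens not to appear), which is exactly the kind of algebraic differential equation the definition asks for.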
GeoGauss: Strongly Consistent and Light-Coordinated OLTP for Geo-Replicated SQL Database
Authors: Weixing Zhou, Qi Peng, Zijie Zhang, Yanfeng Zhang, Yang Ren, Sihao Li, Guo Fu, Yulong Cui, Qiang Li, Caiyi Wu, Shangjun Han, Shengyi Wang, Guoliang Li, Ge Yu
Subjects: Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2304.09692
Pdf link: https://arxiv.org/pdf/2304.09692
Abstract Multinational enterprises conduct global business that demands geo-distributed transactional databases. Existing state-of-the-art databases adopt a sharded master-follower replication architecture. However, the single-master serving mode incurs massive cross-region writes from clients, and the sharded architecture requires multiple round-trip acknowledgments (e.g., 2PC) to ensure atomicity for cross-shard transactions. These limitations drive us to seek a different design choice. In this paper, we propose GeoGauss, a strongly consistent OLTP database with a full-replica multi-master architecture. To efficiently merge updates from different master nodes, we propose a multi-master OCC that unifies data replication and concurrent transaction processing. By leveraging an epoch-based delta state merge rule and optimistic asynchronous execution, GeoGauss ensures strong consistency with a light-coordinated protocol and allows more concurrency under weak isolation, which is sufficient for our needs. Our geo-distributed experimental results show that GeoGauss achieves 7.06× higher throughput and 17.41× lower latency than the state-of-the-art geo-distributed database CockroachDB on the TPC-C benchmark.
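The abstract does not spell out GeoGauss's merge rule. The sketch below is a hypothetical illustration of why a deterministic, order-independent epoch merge lets every replica reach the same state without per-transaction round trips: each replica sorts all transactions of an epoch by a globally unique id, lets the smallest id claim each written key, and aborts any transaction that lost a key. The id format, the conflict rule, and the names are assumptions.

```python
from typing import Dict, List, Set, Tuple

TxnId = Tuple[int, int]             # hypothetical (sequence number, master id)
Txn = Tuple[TxnId, Dict[str, str]]  # (id, write set: key -> new value)


def merge_epoch(batches: List[List[Txn]]) -> Tuple[Dict[str, str], Set[TxnId]]:
    """Merge one epoch's transactions from all masters into a single delta."""
    # A total order on ids makes the result independent of batch arrival order.
    txns = sorted((t for batch in batches for t in batch), key=lambda t: t[0])
    owner: Dict[str, TxnId] = {}
    for tid, writes in txns:
        for key in writes:
            owner.setdefault(key, tid)  # smallest id claims each key
    # A transaction commits only if it owns every key it wrote; others abort.
    committed = {tid for tid, writes in txns if all(owner[k] == tid for k in writes)}
    delta = {k: v for tid, writes in txns if tid in committed for k, v in writes.items()}
    return delta, committed
```

Every replica that calls `merge_epoch` on the same set of batches obtains the same delta and commit set, regardless of the order in which the batches arrived.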
Learnable Earth Parser: Discovering 3D Prototypes in Aerial Scans
Authors: Romain Loiseau, Elliot Vincent, Mathieu Aubry, Loic Landrieu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.09704
Pdf link: https://arxiv.org/pdf/2304.09704
Abstract We propose an unsupervised method for parsing large 3D scans of real-world scenes into interpretable parts. Our goal is to provide a practical tool for analyzing 3D scenes with unique characteristics in the context of aerial surveying and mapping, without relying on application-specific user annotations. Our approach is based on a probabilistic reconstruction model that decomposes an input 3D point cloud into a small set of learned prototypical shapes. Our model provides an interpretable reconstruction of complex scenes and leads to relevant instance and semantic segmentations. To demonstrate the usefulness of our results, we introduce a novel dataset of seven diverse aerial LiDAR scans. We show that our method outperforms state-of-the-art unsupervised methods in terms of decomposition accuracy while remaining visually interpretable. Our method offers a significant advantage over existing approaches, as it does not require any manual annotations, making it a practical and efficient tool for 3D scene analysis. Our code and dataset are available at https://imagine.enpc.fr/~loiseaur/learnable-earth-parser
Grooming Connectivity Intents in IP-Optical Networks Using Directed Acyclic Graphs
Authors: Filippos Christou, Andreas Kirstädter
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2304.09711
Pdf link: https://arxiv.org/pdf/2304.09711
Abstract During the last few years, there have been concentrated efforts toward intent-driven networking. Building on Software-Defined Networking (SDN), Intent-Based Networking (IBN) pushes the frontiers of efficient networking by decoupling the intentions of a network operator (i.e., what is to be done) from the implementation (i.e., how it is achieved). The advantages of this paradigm have long been argued and include, but are not limited to, fewer human errors, reduced expertise requirements for operator personnel, and faster business plan adaptation. In previous work, we showed that incorporating IBN in multi-domain networks can have a significantly positive impact, as it can enable decentralized operation, accountability, and confidentiality. The pillar of that contribution is the compilation of intents using system-generated intent trees. In this work, we extend the architecture to enable grooming among user intents, so that separate intents can now end up using the same network resources. While this makes the intent system more complex, it clearly improves resource allocation. To represent the intent relationships of the enhanced architecture, we use Directed Acyclic Graphs (DAGs). Furthermore, we adapt an advanced, established technique from the literature to solve the Routing, Modulation, and Spectrum Assignment (RMSA) problem for intent compilation. We demonstrate a realistic scenario in which we evaluate our architecture and the intent compilation strategy. Our current approach successfully combines the advantages of an intent-driven architecture with the flexibility to choose among advanced resource allocation techniques.
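Grooming means that one lower-level intent can serve several parent intents, which a tree cannot express but a DAG can. The sketch below compiles a hypothetical intent DAG in parent-before-child order using Kahn's topological sort and rejects cycles. The node names and adjacency encoding are illustrative, and the sketch does not perform RMSA.

```python
from collections import deque
from typing import Dict, List


def compile_order(children: Dict[str, List[str]]) -> List[str]:
    """Topologically sort an intent DAG (parents before children); reject cycles."""
    nodes = set(children) | {c for cs in children.values() for c in cs}
    indeg = {n: 0 for n in nodes}
    for cs in children.values():
        for c in cs:
            indeg[c] += 1
    # Sorting makes the order deterministic among intents that are ready together.
    queue = deque(sorted(n for n in nodes if indeg[n] == 0))
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for c in sorted(children.get(n, [])):
            indeg[c] -= 1
            if indeg[c] == 0:  # all parents compiled, child is ready
                queue.append(c)
    if len(order) != len(nodes):
        raise ValueError("intent graph contains a cycle")
    return order
```

For example, `compile_order({"intent_A": ["lightpath_1"], "intent_B": ["lightpath_1"], "lightpath_1": ["spectrum_slot_3"]})` returns `['intent_A', 'intent_B', 'lightpath_1', 'spectrum_slot_3']`: both user intents are groomed onto the same lightpath, which has two parents.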
A compact simple HWENO scheme with ADER time discretization for hyperbolic conservation laws I: structured meshes
Authors: Dongmi Luo, Shiyi Li, Jianxian Qiu, Jun Zhu, Yibing Chen
Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)
Arxiv link: https://arxiv.org/abs/2304.09724
Pdf link: https://arxiv.org/pdf/2304.09724
Abstract In this paper, a compact, high-order ADER (Arbitrary high order using DERivatives) scheme using the simple HWENO method (ADER-SHWENO) is proposed for hyperbolic conservation laws. The new method employs the Lax-Wendroff procedure to convert time derivatives into spatial derivatives, which provides the time evolution of the variables at the cell interfaces. This information is required for the simple HWENO reconstructions, which take advantage of both the simple WENO and the classic HWENO methods. Compared with the original Runge-Kutta HWENO method (RK-HWENO), the new method has two advantages. First, the RK-HWENO method must solve additional equations for the reconstructions, which the new method avoids. Second, the SHWENO reconstruction is performed once with a single stencil, unlike classic HWENO methods, in which the function and its derivative values are reconstructed with two different stencils. Thus, the new method is more efficient than the RK-HWENO method. Moreover, the new method is more compact than the existing ADER-WENO method. In addition, the new method makes the best use of the information in the ADER method, so the time evolution of the cell averages of the derivatives is simpler than that developed in [Li et al., 447 (2021), 110661]. Numerical tests indicate that the new method achieves high order for smooth solutions in both space and time and remains non-oscillatory at discontinuities.
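For a scalar one-dimensional conservation law $u_t + f(u)_x = 0$, the standard Lax-Wendroff (Cauchy-Kovalevskaya) procedure replaces time derivatives with spatial ones by repeatedly differentiating the equation; the paper's high-order variant may differ in detail:

$$u_t = -f(u)_x = -f'(u)\,u_x,$$
$$u_{tt} = -\big(f'(u)\,u_t\big)_x = \big(f'(u)\,f(u)_x\big)_x,$$
$$u(x,t+\Delta t) = u + \Delta t\,u_t + \frac{\Delta t^2}{2}\,u_{tt} + \mathcal{O}(\Delta t^3).$$

For linear advection, $f(u) = a\,u$, this gives $u_t = -a\,u_x$ and $u_{tt} = a^2 u_{xx}$, so the time evolution at a cell interface is expressed entirely through reconstructed spatial derivatives.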
A Multi-robot Coverage Path Planning Algorithm Based on Improved DARP Algorithm
Authors: Yufan Huang, Man Li, Tao Zhao
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.09741
Pdf link: https://arxiv.org/pdf/2304.09741
Abstract Research on multi-robot coverage path planning (CPP) has attracted increasing attention. To achieve efficient coverage, this paper proposes an improved DARP coverage algorithm. The improved DARP algorithm, based on the A* algorithm, assigns tasks to the robots and is then combined with an STC algorithm based on the Up-First algorithm to achieve full coverage of the task area. Compared with the original DARP algorithm, the proposed algorithm achieves higher efficiency and a higher coverage rate.
Amplifying Sine Unit: An Oscillatory Activation Function for Deep Neural Networks to Recover Nonlinear Oscillations Efficiently
Authors: Jamshaid Ul Rahman, Faiza Makhdoom, Dianchen Lu
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Dynamical Systems (math.DS)
Arxiv link: https://arxiv.org/abs/2304.09759
Pdf link: https://arxiv.org/pdf/2304.09759
Abstract Many industrial and real-life problems exhibit highly nonlinear periodic behaviors, and conventional methods may fall short of finding their analytical or closed-form solutions. Such problems demand cutting-edge computational tools with increased functionality and reduced cost. Recently, deep neural networks have gained massive research interest due to their ability to handle large data and their universality in learning complex functions. In this work, we put forward a methodology based on deep neural networks with a responsive layer structure to deal with nonlinear oscillations in microelectromechanical systems. We incorporated oscillatory and non-oscillatory activation functions, such as the growing cosine unit (GCU), Sine, Mish, and Tanh, into our network to comprehensively analyze their performance on highly nonlinear and vibrational problems. Deep neural networks with oscillatory activation functions performed better at predicting the periodic patterns of the underlying systems. To support oscillatory actuation for nonlinear systems, we propose a novel oscillatory activation function, the Amplifying Sine Unit (ASU), which is more efficient than GCU for complex vibratory systems such as microelectromechanical systems. Experimental results show that the designed network with the proposed ASU activation is more reliable and robust in handling the challenges posed by nonlinearity and oscillations. To validate the proposed methodology, the outputs of our networks are compared with the results of LSODA, the Livermore Solver for Ordinary Differential Equations. Graphical illustrations of the incurred errors are also presented.
Nearly Work-Efficient Parallel DFS in Undirected Graphs
Authors: Mohsen Ghaffari, Christoph Grunau, Jiahao Qu
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2304.09774
Pdf link: https://arxiv.org/pdf/2304.09774
Abstract We present the first parallel depth-first search algorithm for undirected graphs that has near-linear work and sublinear depth. Concretely, in any $n$-node, $m$-edge undirected graph, our algorithm computes a DFS in $\tilde{O}(\sqrt{n})$ depth using $\tilde{O}(m+n)$ work. All prior algorithms either required $\Omega(n)$ depth, and were thus essentially sequential, or needed a high $\mathrm{poly}(n)$ work, and were thus far from work-efficient.
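In an undirected graph, a rooted spanning tree is a DFS tree exactly when every graph edge joins an ancestor and a descendant (there are no cross edges). The sketch below is only the sequential baseline: an iterative DFS that records entry and exit times, plus a checker for that property, which is what any parallel DFS algorithm must output. It assumes a connected graph given as adjacency lists and does not reproduce the paper's parallel algorithm.

```python
from typing import Dict, List, Optional, Tuple


def dfs_tree(adj: Dict[int, List[int]], root: int
             ) -> Tuple[Dict[int, Optional[int]], Dict[int, int], Dict[int, int]]:
    """Iterative sequential DFS; returns parent pointers and entry/exit times."""
    parent: Dict[int, Optional[int]] = {root: None}
    tin, tout = {root: 0}, {}
    clock = 1
    stack = [(root, iter(adj[root]))]  # each frame keeps its own neighbour iterator
    while stack:
        v, it = stack[-1]
        for u in it:
            if u not in parent:        # first discovery of u: descend into it
                parent[u] = v
                tin[u] = clock
                clock += 1
                stack.append((u, iter(adj[u])))
                break
        else:                          # all neighbours of v explored: finish v
            tout[v] = clock
            clock += 1
            stack.pop()
    return parent, tin, tout


def is_dfs_tree(adj: Dict[int, List[int]], tin: Dict[int, int], tout: Dict[int, int]) -> bool:
    """True iff every edge joins an ancestor and a descendant of the rooted tree."""
    def anc(a: int, b: int) -> bool:   # a is an ancestor of b (or a == b)
        return tin[a] <= tin[b] and tout[b] <= tout[a]
    return all(anc(u, v) or anc(v, u) for u in adj for v in adj[u])
```

For any root of a connected graph, the times returned by `dfs_tree` pass `is_dfs_tree`, whereas the entry/exit times of a BFS tree on the same graph generally do not.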
Post-Training Quantization for Object Detection
Authors: Lin Niu, Jiawei Liu, Zhihang Yuan, Dawei Yang, Xinggang Wang, Wenyu Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.09785
Pdf link: https://arxiv.org/pdf/2304.09785
Abstract Efficient inference for object detection networks is a major challenge on edge devices. Post-Training Quantization (PTQ), which directly transforms a full-precision model into a low bit-width one, is an effective and convenient approach to reducing model inference complexity. However, it suffers a severe accuracy drop when applied to complex tasks such as object detection. PTQ optimizes the quantization parameters under different metrics to minimize the perturbation caused by quantization. The p-norm distance between feature maps before and after quantization, Lp, is widely used as the metric to evaluate this perturbation. Given the particular characteristics of object detection networks, we observe that the parameter p in the Lp metric significantly influences their quantization performance, and we find that a fixed hyper-parameter p does not achieve optimal quantization performance. To mitigate this problem, we propose a framework, DetPTQ, that assigns different p values for quantizing different layers using an Object Detection Output Loss (ODOL), which represents the task loss of object detection. DetPTQ employs the ODOL-based adaptive Lp metric to select the optimal quantization parameters. Experiments show that DetPTQ outperforms state-of-the-art PTQ methods by a significant margin on both 2D and 3D object detectors. For example, we achieve 31.1/31.7 (quantized/full-precision) mAP on RetinaNet-ResNet18 with 4-bit weights and 4-bit activations.
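DetPTQ's ODOL-based choice of p is not reproduced here. The sketch below shows only the standard step the abstract describes: searching a symmetric, per-tensor quantization scale that minimizes the Lp distance between a tensor and its quantized version, with p as a free parameter. The signed 4-bit integer grid, the linear search grid, and the names are assumptions.

```python
from typing import List, Sequence


def quantize(xs: Sequence[float], scale: float, bits: int = 4) -> List[float]:
    """Symmetric uniform quantize-dequantize onto a signed `bits`-bit integer grid."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    # Python's round() rounds halves to even; values beyond the grid are clipped.
    return [max(qmin, min(qmax, round(x / scale))) * scale for x in xs]


def lp_distance(a: Sequence[float], b: Sequence[float], p: float) -> float:
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def search_scale(xs: Sequence[float], p: float, bits: int = 4, n_grid: int = 100) -> float:
    """Grid-search the scale (i.e. the clipping range) that minimises the Lp error."""
    max_abs = max(abs(x) for x in xs)
    if max_abs == 0:
        raise ValueError("cannot choose a scale for an all-zero tensor")
    qmax = 2 ** (bits - 1) - 1
    best_scale, best_err = max_abs / qmax, float("inf")
    for k in range(1, n_grid + 1):
        scale = (max_abs * k / n_grid) / qmax  # clip at k/n_grid of the max magnitude
        err = lp_distance(xs, quantize(xs, scale, bits), p)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale
```

Larger p weights large errors more heavily, so when outliers are present it tends to favour a wider clipping range (a larger scale); this illustrates why the choice of p can change the selected quantization parameters.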
NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models
Authors: Seung Wook Kim, Bradley Brown, Kangxue Yin, Karsten Kreis, Katja Schwarz, Daiqing Li, Robin Rombach, Antonio Torralba, Sanja Fidler
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.09787
Pdf link: https://arxiv.org/pdf/2304.09787
Abstract Automatically generating high-quality real-world 3D scenes is of enormous interest for applications such as virtual reality and robotics simulation. Towards this goal, we introduce NeuralField-LDM, a generative model capable of synthesizing complex 3D environments. We leverage Latent Diffusion Models, which have been successfully used for efficient, high-quality 2D content creation. We first train a scene auto-encoder to express a set of image and pose pairs as a neural field, represented as density and feature voxel grids that can be projected to produce novel views of the scene. To further compress this representation, we train a latent-autoencoder that maps the voxel grids to a set of latent representations. A hierarchical diffusion model is then fit to the latents to complete the scene generation pipeline. We achieve a substantial improvement over existing state-of-the-art scene generation models. Additionally, we show how NeuralField-LDM can be used for a variety of 3D content creation applications, including conditional scene generation, scene inpainting, and scene style manipulation.
Progressive-Hint Prompting Improves Reasoning in Large Language Models
Authors: Chuanyang Zheng, Zhengying Liu, Enze Xie, Zhenguo Li, Yu Li
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.09797
Pdf link: https://arxiv.org/pdf/2304.09797
Abstract The performance of Large Language Models (LLMs) in reasoning tasks depends heavily on prompt design, with Chain-of-Thought (CoT) and self-consistency being critical methods that enhance this ability. However, these methods do not fully exploit the answers generated by the LLM to guide subsequent responses. This paper proposes a new prompting method, named Progressive-Hint Prompting (PHP), that enables automatic multiple interactions between users and LLMs by using previously generated answers as hints to progressively guide toward the correct answers. PHP is orthogonal to CoT and self-consistency, making it easy to combine with state-of-the-art techniques to further improve performance. We conducted an extensive and comprehensive evaluation to demonstrate the effectiveness of the proposed method. Our experimental results on six benchmarks show that combining CoT and self-consistency with PHP significantly improves accuracy while remaining highly efficient. For instance, with text-davinci-003, we observed a 4.2% improvement on GSM8K with greedy decoding compared to Complex CoT, and a 46.17% reduction in sample paths with self-consistency. With GPT-4 and PHP, we achieve state-of-the-art performances on SVAMP (91.9%), GSM8K (95.5%) and AQuA (79.9%).
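The PHP interaction loop can be sketched as follows, assuming a generic `ask(prompt) -> answer` callable in place of a real LLM API. The stopping rule (stop once two consecutive answers match) and the hint wording are modeled loosely on the description above and should be checked against the paper; `ScriptedLLM` is a hypothetical stand-in used only to make the example runnable.

```python
from typing import Callable, List


def progressive_hint_prompting(question: str, ask: Callable[[str], str],
                               max_rounds: int = 10) -> List[str]:
    """Re-ask `question`, appending all previous answers as a hint, until two
    consecutive answers agree or `max_rounds` is reached. Returns all answers."""
    answers: List[str] = []
    prompt = question
    for _ in range(max_rounds):
        answers.append(ask(prompt))
        if len(answers) >= 2 and answers[-1] == answers[-2]:
            break                                    # answer has stabilized
        hint = ", ".join(answers)
        prompt = f"{question} (Hint: The answer is near to {hint}.)"
    return answers


class ScriptedLLM:
    """Hypothetical stand-in for a real model: replays canned replies and
    records every prompt it receives."""

    def __init__(self, replies: List[str]):
        self.replies = replies
        self.prompts: List[str] = []

    def __call__(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return self.replies[len(self.prompts) - 1]


llm = ScriptedLLM(["41", "51", "51"])
print(progressive_hint_prompting("What is 17 * 3?", llm))  # ['41', '51', '51']
```

Because the loop only needs a callable, the same function can wrap a CoT or self-consistency prompt, which mirrors the claim that PHP is orthogonal to those methods.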
VMA: Divide-and-Conquer Vectorized Map Annotation System for Large-Scale Driving Scene
Authors: Shaoyu Chen, Yunchi Zhang, Bencheng Liao, Jiafeng Xie, Tianheng Cheng, Wei Sui, Qian Zhang, Chang Huang, Wenyu Liu, Xinggang Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.09807
Pdf link: https://arxiv.org/pdf/2304.09807
Abstract A high-definition (HD) map serves as essential infrastructure for autonomous driving. In this work, we build a systematic vectorized map annotation framework (termed VMA) for efficiently generating HD maps of large-scale driving scenes. We design a divide-and-conquer annotation scheme to solve the spatial extensibility problem of HD map generation, and abstract map elements with a variety of geometric patterns into a unified point-sequence representation, which can be extended to most map elements in the driving scene. VMA is highly efficient and extensible, requires negligible human effort, and is flexible in terms of spatial scale and element type. We quantitatively and qualitatively validate the annotation performance on real-world urban and highway scenes, as well as the NYC Planimetric Database. VMA can significantly improve map generation efficiency and requires little human effort. On average, VMA takes 160 min to annotate a scene spanning hundreds of meters and reduces human cost by 52.3%, showing great application value.
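One common way to obtain a unified point-sequence representation is to resample every polyline-like map element (lane divider, road boundary, crosswalk outline) to a fixed number of points equally spaced by arc length. The sketch below shows that idea; the function name and the choice of arc-length resampling are assumptions, not VMA's documented procedure.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def resample_polyline(points: List[Point], n: int) -> List[Point]:
    """Resample a polyline to n points equally spaced along its arc length."""
    if n < 2 or len(points) < 2:
        raise ValueError("need n >= 2 and at least two input points")
    # Cumulative arc length at each input vertex.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    out: List[Point] = []
    seg = 0
    for k in range(n):
        target = total * k / (n - 1)
        # Advance to the segment that contains the target distance.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


print(resample_polyline([(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)], 5))
```

A fixed-length sequence lets elements of different shapes share one representation, which is convenient for models that predict a fixed number of points per element.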
FastRLAP: A System for Learning High-Speed Driving via Deep RL and Autonomous Practicing
Authors: Kyle Stachowicz, Dhruv Shah, Arjun Bhorkar, Ilya Kostrikov, Sergey Levine
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.09831
Pdf link: https://arxiv.org/pdf/2304.09831
Abstract We present a system that enables an autonomous small-scale RC car to drive aggressively from visual observations using reinforcement learning (RL). Our system, FastRLAP (faster lap), trains autonomously in the real world, without human interventions, and without requiring any simulation or expert demonstrations. Our system integrates a number of important components to make this possible: we initialize the representations for the RL policy and value function from a large prior dataset of other robots navigating in other environments (at low speed), which provides a navigation-relevant representation. From here, a sample-efficient online RL method uses a single low-speed user-provided demonstration to determine the desired driving course, extracts a set of navigational checkpoints, and autonomously practices driving through these checkpoints, resetting automatically on collision or failure. Perhaps surprisingly, we find that with appropriate initialization and choice of algorithm, our system can learn to drive over a variety of racing courses with less than 20 minutes of online training. The resulting policies exhibit emergent aggressive driving skills, such as timing braking and acceleration around turns and avoiding areas which impede the robot's motion, approaching the performance of a human driver using a similar first-person interface over the course of training.
Optimal Codes Detecting Deletions in Concatenated Binary Strings Applied to Trace Reconstruction
Authors: Serge Kas Hanna
Subjects: Information Theory (cs.IT); Discrete Mathematics (cs.DM)
Arxiv link: https://arxiv.org/abs/2304.09839
Pdf link: https://arxiv.org/pdf/2304.09839
Abstract Consider two or more strings $\mathbf{x}^1,\mathbf{x}^2,\ldots,$ that are concatenated to form $\mathbf{x}=\langle \mathbf{x}^1,\mathbf{x}^2,\ldots \rangle$. Suppose that up to $\delta$ deletions occur in each of the concatenated strings. Since deletions alter the lengths of the strings, a fundamental question to ask is: how much redundancy do we need to introduce in $\mathbf{x}$ in order to recover the boundaries of $\mathbf{x}^1,\mathbf{x}^2,\ldots$? This boundary problem is equivalent to the problem of designing codes that can detect the exact number of deletions in each concatenated string. In this work, we answer the question above by first deriving converse results that give lower bounds on the redundancy of deletion-detecting codes. Then, we present a marker-based code construction whose redundancy is asymptotically optimal in $\delta$ among all families of deletion-detecting codes, and exactly optimal among all block-by-block decodable codes. To exemplify the usefulness of such deletion-detecting codes, we apply our code to trace reconstruction and design an efficient coded reconstruction scheme that requires a constant number of traces.
Transformer-Based Visual Segmentation: A Survey
Authors: Xiangtai Li, Henghui Ding, Wenwei Zhang, Haobo Yuan, Jiangmiao Pang, Guangliang Cheng, Kai Chen, Ziwei Liu, Chen Change Loy
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.09854
Pdf link: https://arxiv.org/pdf/2304.09854
Abstract Visual segmentation seeks to partition images, video frames, or point clouds into multiple segments or groups. This technique has numerous real-world applications, such as autonomous driving, image editing, robot sensing, and medical analysis. Over the past decade, deep learning-based methods have made remarkable strides in this area. Recently, transformers, a type of neural network based on self-attention originally designed for natural language processing, have considerably surpassed previous convolutional or recurrent approaches in various vision processing tasks. Specifically, vision transformers offer robust, unified, and even simpler solutions for various segmentation tasks. This survey provides a thorough overview of transformer-based visual segmentation, summarizing recent advancements. We first review the background, encompassing problem definitions, datasets, and prior convolutional methods. Next, we summarize a meta-architecture that unifies all recent transformer-based approaches. Based on this meta-architecture, we examine various method designs, including modifications to the meta-architecture and associated applications. We also present several closely related settings, including 3D point cloud segmentation, foundation model tuning, domain-aware segmentation, efficient segmentation, and medical segmentation. Additionally, we compile and re-evaluate the reviewed methods on several well-established datasets. Finally, we identify open challenges in this field and propose directions for future research. The project page can be found at https://github.com/lxtGH/Awesome-Segmenation-With-Transformer. We will also continually monitor developments in this rapidly evolving field.
Keyword: faster

LEA: Beyond Evolutionary Algorithms via Learned Optimization Strategy
Authors: Kai Wu, Penghui Liu, Jing Liu
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.09599
Pdf link: https://arxiv.org/pdf/2304.09599
Abstract Evolutionary algorithms (EAs) have emerged as a powerful framework for expensive black-box optimization. Obtaining better solutions with less computational cost is essential and challenging for black-box optimization. The most critical obstacle is figuring out how to effectively use the target task information to form an efficient optimization strategy. However, current methods are weak due to the poor representation of the optimization strategy and the inefficient interaction between the optimization strategy and the target task. To overcome the above limitations, we design a learned EA (LEA) to realize the move from hand-designed optimization strategies to learned optimization strategies, including not only hyperparameters but also update rules. Unlike traditional EAs, LEA has high adaptability to the target task and can obtain better solutions with less computational cost. LEA is also able to effectively utilize the low-fidelity information of the target task to form an efficient optimization strategy. The experimental results on one synthetic case, CEC 2013, and two real-world cases show the advantages of learned optimization strategies over human-designed baselines. In addition, LEA is friendly to the acceleration provided by Graphics Processing Units and runs 102 times faster than unaccelerated EA when evolving 32 populations, each containing 6400 individuals.
List Defective Colorings: Distributed Algorithms and Applications
Authors: Marc Fuchs, Fabian Kuhn
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2304.09666
Pdf link: https://arxiv.org/pdf/2304.09666
Abstract The distributed coloring problem is at the core of the area of distributed graph algorithms and it is a problem that has seen tremendous progress over the last few years. Much of the remarkable recent progress on deterministic distributed coloring algorithms is based on two main tools: a) defective colorings in which every node of a given color can have a limited number of neighbors of the same color and b) list coloring, a natural generalization of the standard coloring problem that naturally appears when colorings are computed in different stages and one has to extend a previously computed partial coloring to a full coloring. In this paper, we introduce \emph{list defective colorings}, which can be seen as a generalization of these two coloring variants. Essentially, in a list defective coloring instance, each node $v$ is given a list of colors $x_{v,1},\dots,x_{v,p}$ together with a list of defects $d_{v,1},\dots,d_{v,p}$ such that if $v$ is colored with color $x_{v,i}$, it is allowed to have at most $d_{v,i}$ neighbors with color $x_{v,i}$. We highlight the important role of list defective colorings by showing that faster list defective coloring algorithms would directly lead to faster deterministic $(\Delta+1)$-coloring algorithms in the LOCAL model. Further, we extend a recent distributed list coloring algorithm by Maus and Tonoyan [DISC '20]. Slightly simplified, we show that if for each node $v$ it holds that $\sum_{i=1}^{p} (d_{v,i}+1)^2 > \deg_G^2(v)\cdot \mathrm{polylog}\,\Delta$, then this list defective coloring instance can be solved in a communication-efficient way in only $O(\log\Delta)$ communication rounds. This leads to the first deterministic $(\Delta+1)$-coloring algorithm in the standard CONGEST model with a time complexity of $O(\sqrt{\Delta}\cdot \mathrm{polylog}\,\Delta+\log^* n)$, matching the best time complexity in the LOCAL model up to a $\mathrm{polylog}\,\Delta$ factor.
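The definition can be checked mechanically. The hypothetical helper below verifies that a given coloring is a valid list defective coloring: every node uses a color from its own list, and the number of same-colored neighbors does not exceed the defect attached to that color. Setting all defects to 0 recovers ordinary list coloring, and giving every node the same list recovers defective coloring. It is only a sequential validity check, not the distributed algorithm from the paper.

```python
from typing import Dict, Hashable, List, Tuple

Node = Hashable


def is_list_defective_coloring(
    adj: Dict[Node, List[Node]],
    lists: Dict[Node, List[Tuple[int, int]]],   # v -> [(x_{v,i}, d_{v,i}), ...]
    coloring: Dict[Node, int],
) -> bool:
    """Return True iff each node v has a color x_{v,i} from its list and at most
    d_{v,i} neighbors share that color."""
    for v, neighbors in adj.items():
        allowed = dict(lists[v])                 # color -> defect (colors assumed distinct)
        c = coloring[v]
        if c not in allowed:
            return False                         # color not in v's list
        same = sum(1 for u in neighbors if coloring[u] == c)
        if same > allowed[c]:
            return False                         # too many same-colored neighbors
    return True


# Triangle a-b-c: a and b may share color 1 because both allow defect 1.
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
lists = {"a": [(1, 1), (2, 0)], "b": [(1, 1)], "c": [(2, 0), (3, 0)]}
print(is_list_defective_coloring(adj, lists, {"a": 1, "b": 1, "c": 2}))  # True
```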
Grooming Connectivity Intents in IP-Optical Networks Using Directed Acyclic Graphs
Authors: Filippos Christou, Andreas Kirstädter
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2304.09711
Pdf link: https://arxiv.org/pdf/2304.09711
Abstract During the last few years, there have been concentrated efforts toward intent-driven networking. While relying upon Software-Defined Networking (SDN), Intent-Based Networking (IBN) pushes the frontiers of efficient networking by decoupling the intentions of a network operator (i.e., what is desired to be done) from the implementation (i.e., how it is achieved). The advantages of such a paradigm have long been argued and include, but are not limited to, the reduction of human errors, reduced expertise requirements among operator personnel, and faster business plan adaptation. In previous work, we have shown how incorporating IBN in multi-domain networks can have a significantly positive impact, as it can enable decentralized operation, accountability, and confidentiality. The pillar of our previous contribution is the compilation of intents using system-generated intent trees. In this work, we extend the architecture to enable grooming among the user intents. As a result, separate intents can now end up using the same network resources. While this makes the intent system somewhat more complex, it indisputably improves resource allocation. To represent the intent relationships of the newly enhanced architecture, we use Directed Acyclic Graphs (DAGs). Furthermore, we appropriately adapt an advanced established technique from the literature to solve the Routing, Modulation, and Spectrum Assignment (RMSA) problem for the intent compilation. We demonstrate a realistic scenario in which we evaluate our architecture and the intent compilation strategy. Our current approach successfully consolidates the advantages of having an intent-driven architecture and, at the same time, flexibly choosing among advanced resource allocation techniques.
Comma Selection Outperforms Plus Selection on OneMax with Randomly Planted Optima
Authors: Joost Jorritsma, Johannes Lengler, Dirk Sudholt
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2304.09712
Pdf link: https://arxiv.org/pdf/2304.09712
Abstract It is an ongoing debate whether and how comma selection in evolutionary algorithms helps to escape local optima. We propose a new benchmark function to investigate the benefits of comma selection: OneMax with randomly planted local optima, generated by frozen noise. We show that comma selection (the $(1,\lambda)$ EA) is faster than plus selection (the $(1+\lambda)$ EA) on this benchmark, in a fixed-target scenario, and for offspring population sizes $\lambda$ for which both algorithms behave differently. For certain parameters, the $(1,\lambda)$ EA finds the target in $\Theta(n \ln n)$ evaluations, with high probability (w.h.p.), while the $(1+\lambda)$ EA w.h.p. requires almost $\Theta((n\ln n)^2)$ evaluations. We further show that the advantage of comma selection is not arbitrarily large: w.h.p. comma selection outperforms plus selection at most by a factor of $O(n \ln n)$ for most reasonable parameter choices. We develop novel methods for analysing frozen noise and give powerful and general fixed-target results with tail bounds that are of independent interest.
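The difference between the two selection schemes comes down to one line of the update rule. The sketch below runs the $(1,\lambda)$ and $(1+\lambda)$ EA with standard bit mutation (rate $1/n$) on plain OneMax and reports the fixed-target runtime in fitness evaluations. It does not reproduce the paper's benchmark with randomly planted optima (frozen noise); function names and parameters are illustrative assumptions.

```python
import random
from typing import List


def onemax(x: List[int]) -> int:
    return sum(x)


def mutate(x: List[int], rng: random.Random) -> List[int]:
    """Standard bit mutation: flip each bit independently with probability 1/n."""
    n = len(x)
    return [1 - b if rng.random() < 1.0 / n else b for b in x]


def run_ea(n: int, lam: int, comma: bool, target: int,
           seed: int = 0, max_evals: int = 10**6) -> int:
    """Return the number of evaluations until fitness >= target
    (fixed-target runtime), or -1 if the budget is exhausted."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(n)]
    f_parent = onemax(parent)
    evals = 1
    while f_parent < target:
        if evals >= max_evals:
            return -1
        offspring = [mutate(parent, rng) for _ in range(lam)]
        evals += lam
        best = max(offspring, key=onemax)
        f_best = onemax(best)
        # (1,lambda): the best child always replaces the parent, even if worse.
        # (1+lambda): the parent survives unless a child is at least as good.
        if comma or f_best >= f_parent:
            parent, f_parent = best, f_best
    return evals


for comma in (False, True):
    print("comma" if comma else "plus", run_ea(30, 12, comma, target=30))
```

Because comma selection may accept a worse child, it can leave a local optimum that plus selection would only escape through a rare multi-bit jump; on plain OneMax, which has no local optima, this only shows the mechanics.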
FastRLAP: A System for Learning High-Speed Driving via Deep RL and Autonomous Practicing
Authors: Kyle Stachowicz, Dhruv Shah, Arjun Bhorkar, Ilya Kostrikov, Sergey Levine
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.09831
Pdf link: https://arxiv.org/pdf/2304.09831
Abstract We present a system that enables an autonomous small-scale RC car to drive aggressively from visual observations using reinforcement learning (RL). Our system, FastRLAP (faster lap), trains autonomously in the real world, without human interventions, and without requiring any simulation or expert demonstrations. Our system integrates a number of important components to make this possible: we initialize the representations for the RL policy and value function from a large prior dataset of other robots navigating in other environments (at low speed), which provides a navigation-relevant representation. From here, a sample-efficient online RL method uses a single low-speed user-provided demonstration to determine the desired driving course, extracts a set of navigational checkpoints, and autonomously practices driving through these checkpoints, resetting automatically on collision or failure. Perhaps surprisingly, we find that with appropriate initialization and choice of algorithm, our system can learn to drive over a variety of racing courses with less than 20 minutes of online training. The resulting policies exhibit emergent aggressive driving skills, such as timing braking and acceleration around turns and avoiding areas which impede the robot's motion, approaching the performance of a human driver using a similar first-person interface over the course of training.
LipsFormer: Introducing Lipschitz Continuity to Vision Transformers
Authors: Xianbiao Qi, Jianan Wang, Yihao Chen, Yukai Shi, Lei Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.09856
Pdf link: https://arxiv.org/pdf/2304.09856
Abstract We present a Lipschitz continuous Transformer, called LipsFormer, to pursue training stability both theoretically and empirically for Transformer-based models. In contrast to previous practical tricks that address training instability by learning rate warmup, layer normalization, attention formulation, and weight initialization, we show that Lipschitz continuity is a more essential property to ensure training stability. In LipsFormer, we replace unstable Transformer component modules with Lipschitz continuous counterparts: CenterNorm instead of LayerNorm, spectral initialization instead of Xavier initialization, scaled cosine similarity attention instead of dot-product attention, and weighted residual shortcut. We prove that these introduced modules are Lipschitz continuous and derive an upper bound on the Lipschitz constant of LipsFormer. Our experiments show that LipsFormer allows stable training of deep Transformer architectures without the need for careful learning rate tuning such as warmup, yielding faster convergence and better generalization. As a result, on the ImageNet 1K dataset, LipsFormer-Swin-Tiny, based on Swin Transformer and trained for 300 epochs, obtains 82.7% top-1 accuracy without any learning rate warmup. Moreover, LipsFormer-CSwin-Tiny, based on CSwin and trained for 300 epochs, achieves a top-1 accuracy of 83.5% with 4.7G FLOPs and 24M parameters. The code will be released at https://github.com/IDEA-Research/LipsFormer.
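Of the Lipschitz continuous components listed above, scaled cosine similarity attention is the easiest to illustrate: the logits are $\tau \cos(q_i, k_j)$ instead of $q_i \cdot k_j$, so they stay within $[-\tau, \tau]$ however large the inputs grow. The sketch below is a generic single-head version in pure Python; the temperature `tau`, the `eps` term and all names are assumptions, and LipsFormer's exact formulation may differ.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector, eps: float = 1e-6) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def softmax(logits: Vector) -> Vector:
    m = max(logits)                              # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_cosine_attention(queries: List[Vector], keys: List[Vector],
                            values: List[Vector], tau: float = 10.0) -> List[Vector]:
    """Single-head attention with logits tau * cos(q_i, k_j)."""
    qn = [l2_normalize(q) for q in queries]
    kn = [l2_normalize(k) for k in keys]
    dim = len(values[0])
    outputs = []
    for q in qn:
        logits = [tau * sum(a * b for a, b in zip(q, k)) for k in kn]
        weights = softmax(logits)                # each row of weights sums to 1
        outputs.append([sum(w * v[d] for w, v in zip(weights, values))
                        for d in range(dim)])
    return outputs


Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
print(scaled_cosine_attention(Q, K, V))
```

Rescaling the keys (for example by 100) leaves the output essentially unchanged, whereas dot-product logits would grow by the same factor; this bounded sensitivity is the intuition behind replacing dot-product attention.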
Keyword: mobile

Heterogeneous Integration of In-Memory Analog Computing Architectures with Tensor Processing Units
Authors: Mohammed E. Elbtity, Brendan Reidy, Md Hasibul Amin, Ramtin Zand
Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.09258
Pdf link: https://arxiv.org/pdf/2304.09258
Abstract Tensor processing units (TPUs), specialized hardware accelerators for machine learning tasks, have shown significant performance improvements when executing convolutional layers in convolutional neural networks (CNNs). However, they struggle to maintain the same efficiency in fully connected (FC) layers, leading to suboptimal hardware utilization. In-memory analog computing (IMAC) architectures, on the other hand, have demonstrated notable speedup in executing FC layers. This paper introduces a novel, heterogeneous, mixed-signal, and mixed-precision architecture that integrates an IMAC unit with an edge TPU to enhance mobile CNN performance. To leverage the strengths of TPUs for convolutional layers and IMAC circuits for dense layers, we propose a unified learning algorithm that incorporates mixed-precision training techniques to mitigate potential accuracy drops when deploying models on the TPU-IMAC architecture. The simulations demonstrate that the TPU-IMAC configuration achieves up to $2.59\times$ performance improvements, and $88\%$ memory reductions compared to conventional TPU architectures for various CNN models while maintaining comparable accuracy. The TPU-IMAC architecture shows potential for various applications where energy efficiency and high performance are essential, such as edge computing and real-time processing in mobile devices. The unified training algorithm and the integration of IMAC and TPU architectures contribute to the potential impact of this research on the broader machine learning landscape.
Secure Mobile Payment Architecture Enabling Multi-factor Authentication
Authors: Hosam Alamleh, Ali Abdullah S. AlQahtani, Baker Al Smadi
Subjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2304.09468
Pdf link: https://arxiv.org/pdf/2304.09468
Abstract The rise of smartphones has led to a significant increase in the usage of mobile payments. Mobile payments allow individuals to access financial resources and make transactions through their mobile devices while on the go. However, the current mobile payment systems were designed to align with traditional payment structures, which limits the full potential of smartphones, including their security features. This has become a major concern in the rapidly growing mobile payment market. To address these security concerns, in this paper we propose a new mobile payment architecture. This architecture leverages the advanced capabilities of modern smartphones to verify various aspects of a payment, such as funds, biometrics, location, and others. The proposed system aims to guarantee the legitimacy of transactions and protect against identity theft by verifying multiple elements of a payment. The security of mobile payment systems is crucial, given the rapid growth of the market. Evaluating mobile payment systems based on their authentication, encryption, and fraud detection capabilities is of utmost importance. The proposed architecture provides a secure mobile payment solution that enhances the overall payment experience by taking advantage of the advanced capabilities of modern smartphones. This will not only improve the security of mobile payments but also offer a more user-friendly payment experience for consumers.
Learning Resource Scheduling with High Priority Users using Deep Deterministic Policy Gradients
Authors: Steffen Gracla, Edgar Beck, Carsten Bockelmann, Armin Dekorsy
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.09488
Pdf link: https://arxiv.org/pdf/2304.09488
Abstract Advances in mobile communication capabilities open the door for closer integration of pre-hospital and in-hospital care processes. For example, medical specialists can be enabled to guide on-site paramedics and can, in turn, be supplied with live vitals or visuals. Consolidating such performance-critical applications with the highly complex workings of mobile communications requires solutions both reliable and efficient, yet easy to integrate with existing systems. This paper explores the application of Deep Deterministic Policy Gradient (DDPG) methods for learning a communications resource scheduling algorithm with special regard to priority users. Unlike the popular Deep Q-Network methods, DDPG is able to produce continuous-valued output. With light post-processing, the resulting scheduler is able to achieve high performance on a flexible sum-utility goal.
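The abstract leaves the post-processing step unspecified. As one plausible illustration (an assumption, not the authors' method), the continuous actor output can be turned into an integer allocation of resource blocks by clipping negative values, normalizing to shares, and applying largest-remainder rounding so that exactly all blocks are assigned. The function name `allocate_blocks` is hypothetical.

```python
import math
from typing import List


def allocate_blocks(action: List[float], num_blocks: int) -> List[int]:
    """Map a continuous actor output (one value per user) to an integer
    allocation of num_blocks resource blocks via largest-remainder rounding."""
    weights = [max(a, 0.0) for a in action]      # negative outputs get no share
    total = sum(weights)
    if total == 0.0:
        weights = [1.0] * len(action)            # fall back to an equal split
        total = float(len(action))
    quotas = [w / total * num_blocks for w in weights]
    alloc = [math.floor(q) for q in quotas]
    leftover = num_blocks - sum(alloc)
    # Hand the remaining blocks to the users with the largest fractional parts.
    order = sorted(range(len(quotas)), key=lambda i: quotas[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


print(allocate_blocks([0.7, -1.0, 0.2], 10))  # [8, 0, 2]
```

Largest-remainder rounding keeps each user within one block of its exact proportional share while guaranteeing that the total matches the available blocks.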
SelfAct: Personalized Activity Recognition based on Self-Supervised and Active Learning
Authors: Luca Arrotta, Gabriele Civitarese, Samuele Valente, Claudio Bettini
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2304.09530
Pdf link: https://arxiv.org/pdf/2304.09530
Abstract Supervised Deep Learning (DL) models are currently the leading approach for sensor-based Human Activity Recognition (HAR) on wearable and mobile devices. However, training them requires large amounts of labeled data whose collection is often time-consuming, expensive, and error-prone. At the same time, due to the intra- and inter-variability of activity execution, activity models should be personalized for each user. In this work, we propose SelfAct: a novel framework for HAR combining self-supervised and active learning to mitigate these problems. SelfAct leverages a large pool of unlabeled data collected from many users to pre-train a DL model through self-supervision, with the goal of learning a meaningful and efficient latent representation of sensor data. The resulting pre-trained model can be used locally by new users, who fine-tune it using a novel unsupervised active learning strategy. Our experiments on two publicly available HAR datasets demonstrate that SelfAct achieves results that are close to or even better than the ones of fully supervised approaches with a small number of active learning queries.
DynamicRead: Exploring Robust Gaze Interaction Methods for Reading on Handheld Mobile Devices under Dynamic Conditions
Authors: Yaxiong Lei, Yuheng Wang, Tyler Caslin, Alexander Wisowaty, Xu Zhu, Mohamed Khamis, Juan Ye
Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2304.09584
Pdf link: https://arxiv.org/pdf/2304.09584
Abstract Enabling gaze interaction in real time on handheld mobile devices has attracted significant attention in recent years. An increasing number of research projects have focused on sophisticated appearance-based deep learning models to enhance the precision of gaze estimation on smartphones. This inspires important research questions, including how the gaze can be used in a real-time application, and what type of gaze interaction methods are preferable under dynamic conditions in terms of both user acceptance and delivering reliable performance. To address these questions, we design four types of gaze scrolling techniques: three explicit techniques based on Gaze Gesture, Dwell time, and Pursuit; and one implicit technique based on reading speed to support touch-free page scrolling in a reading application. We conduct a 20-participant user study under both sitting and walking settings, and our results reveal that Gaze Gesture and Dwell time-based interfaces are more robust while walking, and that Gaze Gesture achieved consistently good usability scores without causing a high cognitive workload.
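Dwell-time scrolling, one of the explicit techniques above, reduces to a small state machine over gaze samples. The sketch assumes screen coordinates with y growing downward, a trigger zone covering the bottom 20% of the screen and a 600 ms dwell threshold; these values and the class name are hypothetical, not parameters reported by the study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DwellScroller:
    screen_height: float
    zone_fraction: float = 0.2      # bottom 20% of the screen (assumed)
    dwell_ms: float = 600.0         # dwell threshold in milliseconds (assumed)
    _enter_ms: Optional[float] = None

    def update(self, t_ms: float, gaze_y: float) -> bool:
        """Feed one gaze sample (y grows downward); return True to scroll."""
        in_zone = gaze_y >= self.screen_height * (1.0 - self.zone_fraction)
        if not in_zone:
            self._enter_ms = None            # gaze left the zone: reset the timer
            return False
        if self._enter_ms is None:
            self._enter_ms = t_ms            # gaze just entered the zone
        if t_ms - self._enter_ms >= self.dwell_ms:
            self._enter_ms = None            # re-arm so each scroll needs a new dwell
            return True
        return False


scroller = DwellScroller(screen_height=1000.0)
samples = [(0, 900), (300, 900), (700, 900), (800, 100)]
print([scroller.update(t, y) for t, y in samples])   # [False, False, True, False]
```

Resetting the timer whenever gaze leaves the zone is what makes noisy gaze estimates during walking a practical concern for this technique: jitter across the zone boundary delays the trigger.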
Integrated Ray-Tracing and Coverage Planning Control using Reinforcement Learning
Authors: Savvas Papaioannou, Panayiotis Kolios, Theocharis Theocharides, Christos G. Panayiotou, Marios M. Polycarpou
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.09631
Pdf link: https://arxiv.org/pdf/2304.09631
Abstract In this work we propose a coverage planning control approach which allows a mobile agent, equipped with a controllable sensor (i.e., a camera) with limited sensing domain (i.e., finite sensing range and angle of view), to cover the surface area of an object of interest. The proposed approach integrates ray-tracing into the coverage planning process, thus allowing the agent to identify which parts of the scene are visible at any point in time. The problem of integrated ray-tracing and coverage planning control is first formulated as a constrained optimal control problem (OCP), which aims at determining the agent's optimal control inputs over a finite planning horizon, that minimize the coverage time. Efficiently solving the resulting OCP is however very challenging due to non-convex and non-linear visibility constraints. To overcome this limitation, the problem is converted into a Markov decision process (MDP) which is then solved using reinforcement learning. In particular, we show that a controller which follows an optimal control law can be learned using off-policy temporal-difference control (i.e., Q-learning). Extensive numerical experiments demonstrate the effectiveness of the proposed approach for various configurations of the agent and the object of interest.
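For readers unfamiliar with off-policy temporal-difference control, the tabular Q-learning update is sketched below on a toy five-state chain where moving right eventually reaches a goal. The environment is a deliberately tiny stand-in: the paper's MDP over agent state, ray-traced visibility and coverage is not modeled, and all names and hyper-parameters are assumptions.

```python
import random
from typing import List, Tuple

N_STATES = 5          # states 0..4; state 4 is the terminal goal
ACTIONS = (-1, +1)    # move left / move right


def step(s: int, a: int) -> Tuple[int, float, bool]:
    s2 = min(max(s + a, 0), N_STATES - 1)
    reward = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, reward, s2 == N_STATES - 1


def q_learning(episodes: int = 500, alpha: float = 0.5, gamma: float = 0.9,
               epsilon: float = 0.2, seed: int = 0) -> List[List[float]]:
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(100):                         # step cap per episode
            # epsilon-greedy behaviour policy; ties are broken at random
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                ai = rng.randrange(2)
            else:
                ai = 0 if q[s][0] > q[s][1] else 1
            s2, r, done = step(s, ACTIONS[ai])
            # Off-policy TD target: greedy (max) value of the next state.
            target = r if done else r + gamma * max(q[s2])
            q[s][ai] += alpha * (target - q[s][ai])
            s = s2
            if done:
                break
    return q


q = q_learning()
greedy = ["right" if q[s][1] > q[s][0] else "left" for s in range(N_STATES - 1)]
print(greedy)
```

The TD target uses the maximum over next-state actions while exploration follows an epsilon-greedy policy; learning the greedy policy from exploratory behaviour is what makes the method off-policy.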
Keyword: pruning

Network Pruning Spaces
Authors: Xuanyu He, Yu-I Yang, Ran Song, Jiachen Pu, Conggang Hu, Feijun Jiang, Wei Zhang, Huanghao Ding
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.09453
Pdf link: https://arxiv.org/pdf/2304.09453
Abstract Network pruning techniques, including weight pruning and filter pruning, reveal that most state-of-the-art neural networks can be accelerated without a significant performance drop. This work focuses on filter pruning, which enables accelerated inference with any off-the-shelf deep learning library and hardware. We propose the concept of \emph{network pruning spaces} that parametrize populations of subnetwork architectures. Based on this concept, we explore the structural aspects of subnetworks that result in minimal loss of accuracy in different pruning regimes and arrive at a series of observations by comparing subnetwork distributions. We conjecture through empirical studies that there exists an optimal FLOPs-to-parameter-bucket ratio related to the design of the original network in a pruning regime. Statistically, the structure of a winning subnetwork guarantees an approximately optimal ratio in this regime. Based on our conjectures, we further refine the initial pruning space to reduce the cost of searching for a good subnetwork architecture. Our experimental results on ImageNet show that the subnetwork we found is superior to those from the state-of-the-art pruning methods under comparable FLOPs.
Biologically inspired structure learning with reverse knowledge distillation for spiking neural networks
Authors: Qi Xu, Yaxin Li, Xuanye Fang, Jiangrong Shen, Jian K. Liu, Huajin Tang, Gang Pan
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.09500
Pdf link: https://arxiv.org/pdf/2304.09500
Abstract Spiking neural networks (SNNs) have superb characteristics in sensory information recognition tasks due to their biological plausibility. However, the performance of some current spiking-based models is limited by their structures: fully connected or overly deep structures introduce too much redundancy. This redundancy in both connections and neurons is one of the key factors hindering the practical application of SNNs. Although some pruning methods have been proposed to tackle this problem, they normally ignore the fact that the neural topology in the human brain can be adjusted dynamically. Inspired by this, this paper proposes an evolutionary-based structure construction method for constructing more reasonable SNNs. By integrating knowledge distillation and connection pruning, the synaptic connections in SNNs can be optimized dynamically to reach an optimal state. As a result, the structure of SNNs can not only absorb knowledge from the teacher model but also search for a deep but sparse network topology. Experimental results on CIFAR100 and DVS-Gesture show that the proposed structure learning method achieves good performance while reducing connection redundancy. The proposed method explores a novel dynamic approach to structure learning from scratch in SNNs, which could help close the gap between deep learning and bio-inspired neural dynamics.
Single-View View Synthesis with Self-Rectified Pseudo-Stereo
Authors: Zhou Yang, Wu Hanjie, Liu Wenxi, Xiong Zheng, Qin Jing, He Shengfeng
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.09527
Pdf link: https://arxiv.org/pdf/2304.09527
Abstract Synthesizing novel views from a single view image is a highly ill-posed problem. We discover an effective solution to reduce the learning ambiguity by expanding the single-view view synthesis problem to a multi-view setting. Specifically, we leverage the reliable and explicit stereo prior to generate a pseudo-stereo viewpoint, which serves as an auxiliary input to construct the 3D space. In this way, the challenging novel view synthesis process is decoupled into two simpler problems of stereo synthesis and 3D reconstruction. In order to synthesize a structurally correct and detail-preserved stereo image, we propose a self-rectified stereo synthesis to amend erroneous regions in an identify-rectify manner. Hard-to-train and incorrect warping samples are first discovered by two strategies: 1) pruning the network to reveal low-confidence predictions; and 2) bidirectionally matching between stereo images to allow the discovery of improper mapping. These regions are then inpainted to form the final pseudo-stereo. With the aid of this extra input, a preferable 3D reconstruction can be easily obtained, and our method can work with arbitrary 3D representations. Extensive experiments show that our method outperforms state-of-the-art single-view view synthesis methods and stereo synthesis methods.
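The bidirectional matching strategy is reminiscent of the classic left-right consistency check used in stereo matching. The sketch below implements that classic check on a single scanline; it is offered as an illustration of the general idea, not as the paper's exact procedure, and the threshold and function name are assumptions. Pixels marked `False` would be candidates for inpainting.

```python
from typing import List


def left_right_consistency(disp_left: List[float], disp_right: List[float],
                           threshold: float = 1.0) -> List[bool]:
    """For one scanline, mark left-image pixels whose disparity agrees with the
    right-image disparity at the matched location x - d_L(x)."""
    width = len(disp_left)
    valid = []
    for x, d in enumerate(disp_left):
        xr = int(round(x - d))                   # matched column in the right image
        if xr < 0 or xr >= width:
            valid.append(False)                  # match falls outside the image
            continue
        valid.append(abs(d - disp_right[xr]) <= threshold)
    return valid


print(left_right_consistency([2.0] * 5, [2.0, 2.0, 9.0, 2.0, 2.0]))
# [False, False, True, True, False]
```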
Keyword: voxel

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models
Authors: Seung Wook Kim, Bradley Brown, Kangxue Yin, Karsten Kreis, Katja Schwarz, Daiqing Li, Robin Rombach, Antonio Torralba, Sanja Fidler
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.09787
Pdf link: https://arxiv.org/pdf/2304.09787
Abstract Automatically generating high-quality real-world 3D scenes is of enormous interest for applications such as virtual reality and robotics simulation. Towards this goal, we introduce NeuralField-LDM, a generative model capable of synthesizing complex 3D environments. We leverage Latent Diffusion Models that have been successfully utilized for efficient high-quality 2D content creation. We first train a scene auto-encoder to express a set of image and pose pairs as a neural field, represented as density and feature voxel grids that can be projected to produce novel views of the scene. To further compress this representation, we train a latent-autoencoder that maps the voxel grids to a set of latent representations. A hierarchical diffusion model is then fit to the latents to complete the scene generation pipeline. We achieve a substantial improvement over existing state-of-the-art scene generation models. Additionally, we show how NeuralField-LDM can be used for a variety of 3D content creation applications, including conditional scene generation, scene inpainting and scene style manipulation.
Keyword: lidar

Density-Insensitive Unsupervised Domain Adaption on 3D Object Detection
Authors: Qianjiang Hu, Daizong Liu, Wei Hu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.09446
Pdf link: https://arxiv.org/pdf/2304.09446
Abstract 3D object detection from point clouds is crucial in safety-critical autonomous driving. Although many works have made great efforts and achieved significant progress on this task, most of them suffer from expensive annotation costs and poor transferability to unknown data due to the domain gap. Recently, a few works have attempted to tackle the domain gap in objects, but they still fail to adapt to the gap in beam density between two domains, which is critical for mitigating the characteristic differences between LiDAR collectors. To this end, we propose a density-insensitive domain adaption framework to address the density-induced domain gap. In particular, we first introduce Random Beam Re-Sampling (RBRS) to make 3D detectors trained on the source domain robust to varying beam density. Then, we take this pre-trained detector as the backbone model and feed the unlabeled target domain data into our newly designed task-specific teacher-student framework to predict high-quality pseudo labels. To further transfer density insensitivity to the target domain, we feed the teacher and student branches with the same sample at different densities, and propose an Object Graph Alignment (OGA) module that constructs two object graphs between the two branches to enforce consistency in both the attributes and relations of cross-density objects. Experimental results on three widely adopted 3D object detection datasets demonstrate that our proposed domain adaption method outperforms state-of-the-art methods, especially on varying-density data. Code is available at https://github.com/WoodwindHu/DTS.
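The abstract does not say how RBRS chooses beams. The sketch below assumes each LiDAR point carries a beam (ring) index and that a random subset of beams, with a keep ratio drawn uniformly from a range, is retained to mimic a sparser sensor; `random_beam_resample` and its parameters are hypothetical.

```python
import random


def random_beam_resample(points, num_beams, keep_ratio_range=(0.5, 1.0), rng=None):
    """Simulate a lower-beam LiDAR by keeping only a random subset of beams.

    points: list of (x, y, z, beam_id) tuples with 0 <= beam_id < num_beams.
    Returns (kept_points, kept_beam_ids).
    """
    rng = rng or random.Random()
    ratio = rng.uniform(*keep_ratio_range)          # how dense the simulated sensor is
    n_keep = max(1, int(round(num_beams * ratio)))  # number of beams to retain
    kept_beams = set(rng.sample(range(num_beams), n_keep))
    kept_points = [p for p in points if p[3] in kept_beams]
    return kept_points, kept_beams
```

Applying such a transform as training-time augmentation is one plausible way to expose a detector to many beam densities from a single source sensor.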
CrossFusion: Interleaving Cross-modal Complementation for Noise-resistant 3D Object Detection
Authors: Yang Yang, Weijie Ma, Hao Chen, Linlin Ou, Xinyi Yu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.09694
Pdf link: https://arxiv.org/pdf/2304.09694
Abstract According to recent studies, combining LiDAR and camera modalities has proven necessary for 3D object detection and is now standard practice. However, existing fusion strategies rely too heavily on the LiDAR modality and underuse the abundant semantics from the camera sensor. As a result, when LiDAR features are corrupted, existing methods cannot fall back on information from the other modality, because the corruption produces a large domain gap. To address this, we propose CrossFusion, a more robust and noise-resistant scheme that makes full use of camera and LiDAR features through a dedicated cross-modal complementation strategy. Extensive experiments show that our method not only outperforms state-of-the-art methods in the setting without an extra depth estimation network, but also resists noise without re-training for specific malfunction scenarios, improving mAP by 5.2% and NDS by 2.4%.
Learnable Earth Parser: Discovering 3D Prototypes in Aerial Scans
Authors: Romain Loiseau, Elliot Vincent, Mathieu Aubry, Loic Landrieu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.09704
Pdf link: https://arxiv.org/pdf/2304.09704
Abstract We propose an unsupervised method for parsing large 3D scans of real-world scenes into interpretable parts. Our goal is to provide a practical tool for analyzing 3D scenes with unique characteristics in the context of aerial surveying and mapping, without relying on application-specific user annotations. Our approach is based on a probabilistic reconstruction model that decomposes an input 3D point cloud into a small set of learned prototypical shapes. Our model provides an interpretable reconstruction of complex scenes and leads to relevant instance and semantic segmentations. To demonstrate the usefulness of our results, we introduce a novel dataset of seven diverse aerial LiDAR scans. We show that our method outperforms state-of-the-art unsupervised methods in terms of decomposition accuracy while remaining visually interpretable. Our method offers a significant advantage over existing approaches, as it does not require any manual annotations, making it a practical and efficient tool for 3D scene analysis. Our code and dataset are available at https://imagine.enpc.fr/~loiseaur/learnable-earth-parser
UniCal: a Single-Branch Transformer-Based Model for Camera-to-LiDAR Calibration and Validation
Authors: Mathieu Cocheteux, Aaron Low, Marius Bruehlmeier
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.09715
Pdf link: https://arxiv.org/pdf/2304.09715
Abstract We introduce a novel architecture, UniCal, for Camera-to-LiDAR (C2L) extrinsic calibration, which leverages self-attention through a Transformer-based backbone network to infer the 6-degree-of-freedom (6-DoF) relative transformation between the sensors. Unlike previous methods, UniCal performs early fusion of the input camera and LiDAR data by aggregating camera image channels and LiDAR mappings into a multi-channel unified representation before extracting their features jointly with a single-branch architecture. This single-branch architecture makes UniCal lightweight, which is desirable in resource-constrained applications such as autonomous driving. Through experiments, we show that UniCal achieves state-of-the-art results compared to existing methods. We also show that, through transfer learning, weights learned on the calibration task can be applied to a calibration validation task without re-training the backbone.
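UniCal's early fusion stacks camera channels and a LiDAR mapping into one multi-channel input. The sketch below assumes the LiDAR mapping is a sparse depth image obtained by projecting points (already transformed into the camera frame with an initial extrinsic guess) through a pinhole model with intrinsics fx, fy, cx, cy; the paper's exact projection and channel layout are not given in the abstract, and the function names are hypothetical.

```python
def project_lidar_to_depth(points_cam, fx, fy, cx, cy, height, width):
    """Project 3D points (in the camera frame) into a sparse H x W depth map."""
    depth = [[0.0] * width for _ in range(height)]
    for X, Y, Z in points_cam:
        if Z <= 0:
            continue  # point is behind the camera
        u = int(round(fx * X / Z + cx))
        v = int(round(fy * Y / Z + cy))
        if 0 <= u < width and 0 <= v < height:
            # Keep the nearest return when several points land on one pixel.
            if depth[v][u] == 0.0 or Z < depth[v][u]:
                depth[v][u] = Z
    return depth


def early_fuse(image_channels, depth):
    """Stack C image channels and one depth channel into a (C + 1)-channel input."""
    return [*image_channels, depth]
```

A single backbone would then consume the fused array directly, which is what allows one branch instead of separate camera and LiDAR encoders.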
MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation
Authors: Chongjian Ge, Junsong Chen, Enze Xie, Zhongdao Wang, Lanqing Hong, Huchuan Lu, Zhenguo Li, Ping Luo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.09801
Pdf link: https://arxiv.org/pdf/2304.09801
Abstract Perception systems in modern autonomous driving vehicles typically take inputs from complementary multi-modal sensors, e.g., LiDAR and cameras. However, in real-world applications, sensor corruptions and failures degrade performance, thus compromising autonomous safety. In this paper, we propose a robust framework, called MetaBEV, to address extreme real-world environments involving six sensor corruptions and two extreme sensor-missing situations in total. In MetaBEV, signals from multiple sensors are first processed by modality-specific encoders. Subsequently, a set of dense BEV queries, termed meta-BEV, is initialized. These queries are then processed iteratively by a BEV-Evolving decoder, which selectively aggregates deep features from LiDAR, cameras, or both modalities. The updated BEV representations are further leveraged for multiple 3D prediction tasks. Additionally, we introduce a new M2oE structure to alleviate the performance drop on distinct tasks in multi-task joint learning. Finally, MetaBEV is evaluated on the nuScenes dataset with 3D object detection and BEV map segmentation tasks. Experiments show that MetaBEV outperforms prior art by a large margin with both full and corrupted modalities. For instance, when the LiDAR signal is missing, MetaBEV improves detection NDS by 35.5% and segmentation mIoU by 17.7% over the vanilla BEVFusion model; and when the camera signal is absent, MetaBEV still achieves 69.2% NDS and 53.7% mIoU, which is even higher than previous works that use full modalities. Moreover, MetaBEV performs competitively with previous methods in both canonical perception and multi-task learning settings, setting a new state of the art for nuScenes BEV map segmentation with 70.4% mIoU.
Keyword: diffusion

A structure-preserving upwind DG scheme for a degenerate phase-field tumor model
Authors: Daniel Acosta-Soba, Francisco Guillén-González, J. Rafael Rodríguez Galván
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2304.09257
Pdf link: https://arxiv.org/pdf/2304.09257
Abstract In this work, we present a modification of the phase-field tumor growth model given in [26] that leads to bounded and more physically meaningful volume-fraction variables. In addition, we develop an upwind discontinuous Galerkin (DG) scheme that preserves the mass conservation, pointwise bounds and energy stability of the continuous model. Finally, we present computational tests that agree with the theoretical results. In the first test, we compare our DG scheme with the finite element (FE) scheme based on the same time approximation. The DG scheme behaves well even for strong cross-diffusion effects, in contrast with FE, where spurious numerical oscillations appear. Moreover, the second test exhibits the behavior of the tumor-growth model under different choices of parameters and of mobility and proliferation functions.
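The paper's DG scheme targets a degenerate cross-diffusion tumor model and cannot be reconstructed from the abstract. As a far simpler illustration of why upwinding helps with mass conservation and pointwise bounds, the sketch below implements the lowest-order (piecewise-constant, P0) upwind DG scheme, which coincides with first-order upwind finite volumes, for linear advection on a periodic 1D grid. Under the CFL condition each update is a convex combination of neighboring cell values, so bounds are preserved, and the conservative flux form keeps the total mass constant. This is not the authors' scheme.

```python
def upwind_step(u, velocity, dx, dt):
    """One explicit step of the P0 upwind scheme for u_t + (velocity * u)_x = 0.

    u holds cell averages on a periodic 1D grid; velocity is a constant.
    """
    n = len(u)
    if abs(velocity) * dt / dx > 1.0:
        raise ValueError("CFL condition |velocity| * dt / dx <= 1 is violated")
    # Numerical flux at interface i+1/2 takes u from the upwind cell.
    flux = [velocity * (u[i] if velocity >= 0 else u[(i + 1) % n]) for i in range(n)]
    # Conservative update; flux[i - 1] is the flux at interface i-1/2
    # (Python's index -1 wraps around, giving periodic boundaries).
    return [u[i] - dt / dx * (flux[i] - flux[i - 1]) for i in range(n)]
```

Because the interface fluxes telescope over a periodic grid, the cell sum is unchanged at every step, mirroring (in a much simpler setting) the mass conservation property claimed for the DG scheme.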
DiFaReli: Diffusion Face Relighting
Authors: Puntawat Ponglertnapakorn, Nontawat Tritrong, Supasorn Suwajanakorn
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.09479
Pdf link: https://arxiv.org/pdf/2304.09479
Abstract We present a novel approach to single-view face relighting in the wild. Handling non-diffuse effects, such as global illumination or cast shadows, has long been a challenge in face relighting. Prior work often assumes Lambertian surfaces, simplified lighting models or involves estimating 3D shape, albedo, or a shadow map. This estimation, however, is error-prone and requires many training examples with lighting ground truth to generalize well. Our work bypasses the need for accurate estimation of intrinsic components and can be trained solely on 2D images without any light stage data, multi-view images, or lighting ground truth. Our key idea is to leverage a conditional diffusion implicit model (DDIM) for decoding a disentangled light encoding along with other encodings related to 3D shape and facial identity inferred from off-the-shelf estimators. We also propose a novel conditioning technique that eases the modeling of the complex interaction between light and geometry by using a rendered shading reference to spatially modulate the DDIM. We achieve state-of-the-art performance on standard benchmark Multi-PIE and can photorealistically relight in-the-wild images. Please visit our page: https://diffusion-face-relighting.github.io
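DiFaReli decodes its light, shape and identity encodings with a conditional DDIM. The conditioning network cannot be reproduced from the abstract, so the sketch below shows only the generic deterministic DDIM update (eta = 0), with the noise prediction supplied by the caller; `alpha_bar_t` and `alpha_bar_prev` denote the cumulative noise-schedule products at the current and previous steps, and the function name is hypothetical.

```python
import math


def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """Deterministic DDIM update (eta = 0) from step t to the previous step.

    x_t: current noisy sample (list of floats).
    eps_pred: the model's noise prediction for x_t (here supplied by the caller).
    """
    # Estimate the clean sample implied by the noise prediction.
    x0_pred = [(x - math.sqrt(1.0 - alpha_bar_t) * e) / math.sqrt(alpha_bar_t)
               for x, e in zip(x_t, eps_pred)]
    # Re-noise the estimate to the previous noise level along the same direction.
    return [math.sqrt(alpha_bar_prev) * x0 + math.sqrt(1.0 - alpha_bar_prev) * e
            for x0, e in zip(x0_pred, eps_pred)]
```

Because this update is deterministic, a DDIM can also be run in reverse to encode an image into noise, which is what makes decoding with modified conditions (such as a new light encoding) possible.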
Realistic Data Enrichment for Robust Image Segmentation in Histopathology
Authors: Sarah Cechnicka, James Ball, Callum Arthurs, Candice Roufosse, Bernhard Kainz
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.09534
Pdf link: https://arxiv.org/pdf/2304.09534
Abstract Poor performance of quantitative analysis in histopathological Whole Slide Images (WSI) has been a significant obstacle in clinical practice. Annotating large-scale WSIs manually is a demanding and time-consuming task, unlikely to yield the expected results when used for fully supervised learning systems. Rarely observed disease patterns and large differences in object scales are difficult to model through conventional patient intake. Prior methods either fall back to direct disease classification, which only requires learning a few factors per image, or report on average image segmentation performance, which is highly biased towards majority observations. Geometric image augmentation is commonly used to improve robustness for average-case predictions and to enrich limited datasets. So far, no method has provided sampling from a realistic posterior distribution to improve stability, e.g., for the segmentation of imbalanced objects within images. Therefore, we propose a new approach, based on diffusion models, which can enrich an imbalanced dataset with plausible examples from underrepresented groups by conditioning on segmentation maps. Our method can readily expand limited clinical datasets, making them suitable for training machine learning pipelines, and provides an interpretable and human-controllable way of generating histopathology images that are indistinguishable from real ones to human experts. We validate our findings on two datasets, one from the public domain and one from a Kidney Transplant study.
Reference-based Image Composition with Sketch via Structure-aware Diffusion Model
Authors: Kangyeol Kim, Sunghyun Park, Junsoo Lee, Jaegul Choo
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.09748
Pdf link: https://arxiv.org/pdf/2304.09748
Abstract Recent remarkable improvements in large-scale text-to-image generative models have shown promising results in generating high-fidelity images. To further enhance editability and enable fine-grained generation, we introduce a multi-input-conditioned image composition model that incorporates a sketch as a novel modality, alongside a reference image. Thanks to the edge-level controllability that sketches provide, our method enables a user to edit or complete an image sub-part with a desired structure (i.e., sketch) and content (i.e., reference image). Our framework fine-tunes a pre-trained diffusion model to complete missing regions using the reference image while maintaining sketch guidance. Although simple, this approach opens up wide opportunities for users to obtain the images they need. Through extensive experiments, we demonstrate that our proposed method offers unique use cases for image manipulation, enabling user-driven modifications of arbitrary scenes.
Attributing Image Generative Models using Latent Fingerprints
Authors: Guangyu Nie, Changhoon Kim, Yezhou Yang, Yi Ren
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.09752
Pdf link: https://arxiv.org/pdf/2304.09752
Abstract Generative models have enabled the creation of content that is indistinguishable from content taken from nature. Open-source development of such models has raised concerns about the risks of their misuse for malicious purposes. One potential risk mitigation strategy is to attribute generative models via fingerprinting. Current fingerprinting methods exhibit a significant tradeoff between robust attribution accuracy and generation quality, and also lack design principles to improve this tradeoff. This paper investigates the use of latent semantic dimensions as fingerprints, which allows us to analyze the effects of design variables, including the choice of fingerprinting dimensions, strength, and capacity, on the accuracy-quality tradeoff. Compared with the previous SOTA, our method requires minimal computation and is more applicable to large-scale models. We use StyleGAN2 and the latent diffusion model to demonstrate the efficacy of our method.
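To make the design variables named in the abstract concrete (choice of dimensions, strength, capacity), the toy sketch below writes a binary key into selected latent coordinates as plus or minus a fixed strength and attributes a latent by matching decoded bits against a registry of keys. It assumes direct access to the latent code; in practice a latent would have to be recovered from the generated image. The functions are hypothetical and not the paper's method.

```python
def embed_fingerprint(latent, dims, key_bits, strength):
    """Write a binary key into the chosen latent dimensions as +/- strength."""
    marked = list(latent)
    for d, bit in zip(dims, key_bits):
        marked[d] = strength if bit else -strength
    return marked


def decode_fingerprint(latent, dims):
    """Read the key back from the signs of the chosen dimensions."""
    return [1 if latent[d] > 0 else 0 for d in dims]


def attribute(latent, dims, registry):
    """Return the registered model whose key agrees with the most decoded bits."""
    bits = decode_fingerprint(latent, dims)

    def agreement(key):
        return sum(b == k for b, k in zip(bits, key))

    return max(registry, key=lambda name: agreement(registry[name]))
```

In this toy, capacity corresponds to the number of fingerprinted dimensions, and a larger strength makes the signs harder to flip with noise at the cost of moving the latent further from its original value, which is one way to picture the accuracy-quality tradeoff.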
NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models
Authors: Seung Wook Kim, Bradley Brown, Kangxue Yin, Karsten Kreis, Katja Schwarz, Daiqing Li, Robin Rombach, Antonio Torralba, Sanja Fidler
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.09787
Pdf link: https://arxiv.org/pdf/2304.09787
Abstract Automatically generating high-quality real-world 3D scenes is of enormous interest for applications such as virtual reality and robotics simulation. Towards this goal, we introduce NeuralField-LDM, a generative model capable of synthesizing complex 3D environments. We leverage Latent Diffusion Models that have been successfully utilized for efficient high-quality 2D content creation. We first train a scene auto-encoder to express a set of image and pose pairs as a neural field, represented as density and feature voxel grids that can be projected to produce novel views of the scene. To further compress this representation, we train a latent-autoencoder that maps the voxel grids to a set of latent representations. A hierarchical diffusion model is then fit to the latents to complete the scene generation pipeline. We achieve a substantial improvement over existing state-of-the-art scene generation models. Additionally, we show how NeuralField-LDM can be used for a variety of 3D content creation applications, including conditional scene generation, scene inpainting and scene style manipulation.
Keyword: dynamic

A Deep Learning Framework for Traffic Data Imputation Considering Spatiotemporal Dependencies
Authors: Li Jiang, Ting Zhang, Qiruyi Zuo, Chenyu Tian, George P. Chan, Wai Kin (Victor) Chan
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.09182
Pdf link: https://arxiv.org/pdf/2304.09182
Abstract Spatiotemporal (ST) data collected by sensors can be represented as multivariate time series, i.e., sequences of data points ordered in time. Despite the vast amount of useful information they contain, ST data usually suffer from missing or incomplete values, which limits their applications. Imputation is one viable solution and is often used to preprocess the data for further applications. However, in practice, spatiotemporal data imputation is quite difficult because of the complex spatiotemporal dependencies and the dynamic changes in the traffic network. Existing approaches mostly capture only the temporal dependencies in time series or static spatial dependencies. They fail to model spatiotemporal dependencies directly, and the representational ability of these models is relatively limited.
Token Imbalance Adaptation for Radiology Report Generation
Authors: Yuexin Wu, I-Chan Huang, Xiaolei Huang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.09185
Pdf link: https://arxiv.org/pdf/2304.09185
Abstract Imbalanced token distributions naturally exist in text documents, leading neural language models to overfit on frequent tokens. The token imbalance may dampen the robustness of radiology report generators, as complex medical terms appear less frequently but carry more medical information. In this study, we demonstrate how current state-of-the-art models fail to generate infrequent tokens on two standard benchmark datasets (IU X-RAY and MIMIC-CXR) of radiology report generation. Moreover, no prior study has proposed methods to adapt infrequent tokens for text generators fed with medical images. To solve this challenge, we propose the Token Imbalance Adapter (TIMER), aiming to improve generation robustness on infrequent tokens. The model automatically leverages token imbalance through an unlikelihood loss and dynamically optimizes generation processes to augment infrequent tokens. We compare our approach with multiple state-of-the-art methods on the two benchmarks. Experiments demonstrate the effectiveness of our approach in enhancing model robustness both overall and on infrequent tokens. Our ablation analysis shows that our reinforcement learning method has a major effect in adapting to token imbalance for radiology report generation.
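The abstract states that TIMER uses an unlikelihood loss but not how negative candidates are chosen. The sketch below shows the standard token-level unlikelihood objective (a maximum-likelihood term plus a penalty of -log(1 - p(c)) for each negative candidate c), assuming, hypothetically, that over-generated frequent tokens serve as candidates; the reinforcement learning component is not modeled, and the function name is hypothetical.

```python
import math


def likelihood_unlikelihood_loss(probs, target, negative_candidates, alpha=1.0):
    """Token-level MLE loss plus an unlikelihood penalty on negative candidates.

    probs: next-token probability distribution (list summing to 1).
    target: index of the reference token.
    negative_candidates: token indices whose probability should be pushed down
        (hypothetically, frequent tokens the model over-generates).
    alpha: weight of the unlikelihood term.
    """
    mle = -math.log(probs[target])
    # Clamp to avoid log(0) when a candidate already has probability ~1.
    unlikelihood = -sum(math.log(max(1.0 - probs[c], 1e-12))
                        for c in negative_candidates if c != target)
    return mle + alpha * unlikelihood
```

Penalizing probability mass on frequent candidates indirectly frees mass for rarer tokens, which is the intuition behind using unlikelihood training against token imbalance.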
Towards Spatio-temporal Sea Surface Temperature Forecasting via Static and Dynamic Learnable Personalized Graph Convolution Network
Authors: Xiaohan Li, Gaowei Zhang, Kai Huang, Zhaofeng He
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Atmospheric and Oceanic Physics (physics.ao-ph)
Arxiv link: https://arxiv.org/abs/2304.09290
Pdf link: https://arxiv.org/pdf/2304.09290
Abstract Sea surface temperature (SST) is uniquely important to the Earth's atmosphere, since its dynamics are a major force in shaping local and global climate and profoundly affect our ecosystems. Accurate forecasting of SST has significant economic and social implications, for example, better preparation for extreme weather such as severe droughts or tropical cyclones months ahead. However, the task faces unique challenges due to the intrinsic complexity and uncertainty of ocean systems. Recently, deep learning techniques, such as graph neural networks (GNNs), have been applied to this task. Although these methods have had some success, they often have serious drawbacks when it comes to capturing dynamic spatiotemporal dependencies between signals. To solve this problem, this paper proposes a novel static and dynamic learnable personalized graph convolution network (SD-LPGC). Specifically, two graph learning layers are first constructed to model, respectively, the stable long-term and short-term evolutionary patterns hidden in the multivariate SST signals. Then, a learnable personalized convolution layer is designed to fuse this information. Our experiments on real SST datasets demonstrate the state-of-the-art performance of the proposed approach on the forecasting task.
Deep Dynamic Cloud Lighting
Authors: Pinar Satilmis, Thomas Bashford-Rogers
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.09317
Pdf link: https://arxiv.org/pdf/2304.09317
Abstract Sky illumination is a core source of lighting in rendering, and a substantial amount of work has been developed to simulate lighting from clear skies. However, in reality, clouds substantially alter the appearance of the sky and subsequently change the scene's illumination. While there have been recent advances in developing sky models which include clouds, these all neglect cloud movement, which is a crucial component of cloudy sky appearance. In any sort of video or interactive environment, it can be expected that clouds will move, sometimes quite substantially in a short period of time. Our work proposes a solution to this which enables whole-sky dynamic cloud synthesis for the first time. We achieve this by proposing a multi-timescale sky appearance model which learns to predict the sky illumination over various timescales, and can be used to add dynamism to previously static cloudy-sky lighting approaches.
A New Deterministic Algorithm for Fully Dynamic All-Pairs Shortest Paths
Authors: Julia Chuzhoy, Ruimin Zhang
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2304.09321
Pdf link: https://arxiv.org/pdf/2304.09321
Abstract We study the fully dynamic All-Pairs Shortest Paths (APSP) problem in undirected edge-weighted graphs. Given an $n$-vertex graph $G$ with non-negative edge lengths, that undergoes an online sequence of edge insertions and deletions, the goal is to support approximate distance queries and shortest-path queries. We provide a deterministic algorithm for this problem, that, for a given precision parameter $\epsilon$, achieves approximation factor $(\log\log n)^{2^{O(1/\epsilon^3)}}$, and has amortized update time $O(n^{\epsilon}\log L)$ per operation, where $L$ is the ratio of longest to shortest edge length. Query time for distance-query is $O(2^{O(1/\epsilon)}\cdot \log n\cdot \log\log L)$, and query time for shortest-path query is $O(|E(P)|+2^{O(1/\epsilon)}\cdot \log n\cdot \log\log L)$, where $P$ is the path that the algorithm returns. To the best of our knowledge, even allowing any $o(n)$-approximation factor, no adaptive-update algorithms with better than $\Theta(m)$ amortized update time and better than $\Theta(n)$ query time were known prior to this work. We also note that our guarantees are stronger than the best current guarantees for APSP in decremental graphs in the adaptive-adversary setting.
BIM-GPT: a Prompt-Based Virtual Assistant Framework for BIM Information Retrieval
Authors: Junwen Zheng, Martin Fischer
Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2304.09333
Pdf link: https://arxiv.org/pdf/2304.09333
Abstract Efficient information retrieval (IR) from building information models (BIMs) poses significant challenges due to the necessity for deep BIM knowledge or extensive engineering efforts for automation. We introduce BIM-GPT, a prompt-based virtual assistant (VA) framework integrating BIM and generative pre-trained transformer (GPT) technologies to support NL-based IR. A prompt manager and dynamic template generate prompts for GPT models, enabling interpretation of NL queries, summarization of retrieved information, and answering BIM-related questions. In tests on a BIM IR dataset, our approach achieved 83.5% and 99.5% accuracy rates for classifying NL queries with no data and 2% data incorporated in prompts, respectively. Additionally, we validated the functionality of BIM-GPT through a VA prototype for a hospital building. This research contributes to the development of effective and versatile VAs for BIM IR in the construction industry, significantly enhancing BIM accessibility and reducing engineering efforts and training data requirements for processing NL queries.
BioThings Explorer: a query engine for a federated knowledge graph of biomedical APIs
Authors: Jackson Callaghan, Colleen H. Xu, Jiwen Xin, Marco Alvarado Cano, Anders Riutta, Eric Zhou, Rohan Juneja, Yao Yao, Madhumita Narayan, Kristina Hanspers, Ayushi Agrawal, Alexander R. Pico, Chunlei Wu, Andrew I. Su
Subjects: Databases (cs.DB); Quantitative Methods (q-bio.QM)
Arxiv link: https://arxiv.org/abs/2304.09344
Pdf link: https://arxiv.org/pdf/2304.09344
Abstract Knowledge graphs are an increasingly common data structure for representing biomedical information. These knowledge graphs can easily represent heterogeneous types of information, and many algorithms and tools exist for querying and analyzing graphs. Biomedical knowledge graphs have been used in a variety of applications, including drug repurposing, identification of drug targets, prediction of drug side effects, and clinical decision support. Typically, knowledge graphs are constructed by centralization and integration of data from multiple disparate sources. Here, we describe BioThings Explorer, an application that can query a virtual, federated knowledge graph derived from the aggregated information in a network of biomedical web services. BioThings Explorer leverages semantically precise annotations of the inputs and outputs for each resource, and automates the chaining of web service calls to execute multi-step graph queries. Because there is no large, centralized knowledge graph to maintain, BioThings Explorer is distributed as a lightweight application that dynamically retrieves information at query time. More information can be found at https://explorer.biothings.io, and code is available at https://github.com/biothings/biothings_explorer.
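BioThings Explorer chains web service calls to answer multi-step graph queries. The sketch below illustrates the idea with local Python functions standing in for web APIs: each hop along a typed path (for example Gene, then Disease, then Drug) calls every service registered for that type pair and unions the results. The service registry and function name are hypothetical and do not reflect the tool's actual API or code.

```python
def execute_multihop(start_ids, path, services):
    """Answer a multi-hop query by chaining service calls along a typed path.

    start_ids: identifiers of the starting entities (of type path[0]).
    path: entity types to traverse, e.g. ["Gene", "Disease", "Drug"].
    services: dict mapping (source_type, target_type) to a list of callables;
        each callable takes one identifier and returns a set of identifiers.
    """
    frontier = set(start_ids)
    for source_type, target_type in zip(path, path[1:]):
        next_frontier = set()
        # Query every service that can answer this hop and merge the results.
        for service in services.get((source_type, target_type), []):
            for node in frontier:
                next_frontier.update(service(node))
        frontier = next_frontier
    return frontier
```

Because results are fetched hop by hop at query time, no central graph has to be materialized, which matches the federated design described above.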
LLM as A Robotic Brain: Unifying Egocentric Memory and Control
Authors: Jinjie Mai, Jun Chen, Bing Li, Guocheng Qian, Mohamed Elhoseiny, Bernard Ghanem
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.09349
Pdf link: https://arxiv.org/pdf/2304.09349
Abstract Embodied AI focuses on the study and development of intelligent systems that possess a physical or virtual embodiment (i.e. robots) and are able to dynamically interact with their environment. Memory and control are the two essential parts of an embodied system and usually require separate frameworks to model each of them. In this paper, we propose a novel and generalizable framework called LLM-Brain: using Large-scale Language Model as a robotic brain to unify egocentric memory and control. The LLM-Brain framework integrates multiple multimodal language models for robotic tasks, utilizing a zero-shot learning approach. All components within LLM-Brain communicate using natural language in closed-loop multi-round dialogues that encompass perception, planning, control, and memory. The core of the system is an embodied LLM to maintain egocentric memory and control the robot. We demonstrate LLM-Brain by examining two downstream tasks: active exploration and embodied question answering. The active exploration tasks require the robot to extensively explore an unknown environment within a limited number of actions. Meanwhile, the embodied question answering tasks necessitate that the robot answers questions based on observations acquired during prior explorations.
Optimizing Carbon Storage Operations for Long-Term Safety
Authors: Yizheng Wang, Markus Zechner, Gege Wen, Anthony Louis Corso, John Michael Mern, Mykel J. Kochenderfer, Jef Karel Caers
Subjects: Artificial Intelligence (cs.AI); Systems and Control (eess.SY); Fluid Dynamics (physics.flu-dyn)
Arxiv link: https://arxiv.org/abs/2304.09352
Pdf link: https://arxiv.org/pdf/2304.09352
Abstract To combat global warming and mitigate the risks associated with climate change, carbon capture and storage (CCS) has emerged as a crucial technology. However, safely sequestering CO2 in geological formations for long-term storage presents several challenges. In this study, we address these issues by modeling the decision-making process for carbon storage operations as a partially observable Markov decision process (POMDP). We solve the POMDP using belief state planning to optimize injector and monitoring well locations, with the goal of maximizing stored CO2 while maintaining safety. Empirical results in simulation demonstrate that our approach is effective in ensuring safe long-term carbon storage operations. We showcase the flexibility of our approach by introducing three different monitoring strategies and examining their impact on decision quality. Additionally, we introduce a neural network surrogate model for the POMDP decision-making process to handle the complex dynamics of the multi-phase flow. We also investigate the effects of different fidelity levels of the surrogate model on decision quality.
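Belief-state planning over a POMDP relies on a Bayesian belief update after each action and observation. The sketch below implements the standard discrete update, where the new belief b'(s') is proportional to O(o | s', a) times the sum over s of T(s' | s, a) * b(s); the function name and the two-state example are hypothetical toys, not the paper's reservoir model.

```python
def belief_update(belief, action, observation, transition, observation_model):
    """Discrete Bayes filter update for a POMDP belief.

    belief: dict state -> probability.
    transition(s, a): dict next_state -> probability T(s' | s, a).
    observation_model(s_next, a): dict observation -> probability O(o | s', a).
    """
    # Prediction step: push the belief through the transition model.
    predicted = {}
    for s, p in belief.items():
        for s_next, t in transition(s, action).items():
            predicted[s_next] = predicted.get(s_next, 0.0) + t * p
    # Correction step: weight by the likelihood of the received observation.
    unnormalized = {s: observation_model(s, action).get(observation, 0.0) * p
                    for s, p in predicted.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return {s: p / total for s, p in unnormalized.items()}
```

For instance, with a static state, a uniform prior over hypothetical `leak` and `sealed` states, and a monitoring observation that is correct 90% of the time, a single detection raises the leak probability to 0.9.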
Long-Term Fairness with Unknown Dynamics
Authors: Tongxin Yin, Reilly Raab, Mingyan Liu, Yang Liu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.09362
Pdf link: https://arxiv.org/pdf/2304.09362
Abstract While machine learning can myopically reinforce social inequalities, it may also be used to dynamically seek equitable outcomes. In this paper, we formalize long-term fairness in the context of online reinforcement learning. This formulation can accommodate dynamical control objectives, such as driving equity inherent in the state of a population, that cannot be incorporated into static formulations of fairness. We demonstrate that this framing allows an algorithm to adapt to unknown dynamics by sacrificing short-term incentives to drive a classifier-population system towards more desirable equilibria. For the proposed setting, we develop an algorithm that adapts recent work in online learning. We prove that this algorithm achieves simultaneous probabilistic bounds on cumulative loss and cumulative violations of fairness (as statistical regularities between demographic groups). We compare our proposed algorithm to the repeated retraining of myopic classifiers, as a baseline, and to a deep reinforcement learning algorithm that lacks safety guarantees. Our experiments model human populations according to evolutionary game theory and integrate real-world datasets.
Physical Knowledge Enhanced Deep Neural Network for Sea Surface Temperature Prediction
Authors: Yuxin Meng, Feng Gao, Eric Rigall, Ran Dong, Junyu Dong, Qian Du
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2304.09376
Pdf link: https://arxiv.org/pdf/2304.09376
Abstract Traditionally, numerical models have been deployed in oceanography studies to simulate ocean dynamics by representing physical equations. However, many factors pertaining to ocean dynamics seem to be ill-defined. We argue that transferring physical knowledge from observed data could further improve the accuracy of numerical models when predicting Sea Surface Temperature (SST). Recently, the advances in earth observation technologies have yielded a monumental growth of data. Consequently, it is imperative to explore ways in which to improve and supplement numerical models utilizing the ever-increasing amounts of historical observational data. To this end, we introduce a method for SST prediction that transfers physical knowledge from historical observations to numerical models. Specifically, we use a combination of an encoder and a generative adversarial network (GAN) to capture physical knowledge from the observed data. The numerical model data is then fed into the pre-trained model to generate physics-enhanced data, which can then be used for SST prediction. Experimental results demonstrate that the proposed method considerably enhances SST prediction performance when compared to several state-of-the-art baselines.
Analytical Large-Signal Modeling of Inverter-based Microgrids with Koopman Operator Theory for Autonomous Control
Authors: Zixiao Ma, Zhaoyu Wang
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.09378
Pdf link: https://arxiv.org/pdf/2304.09378
Abstract The microgrid (MG) plays a crucial role in the energy transition, but its nonlinearity presents a significant challenge for large-signal power systems studies in the electromagnetic transient (EMT) time scale. In this paper, we develop a large-signal linear MG model that considers the detailed dynamics of the primary and zero-control levels based on the Koopman operator (KO) theory. Firstly, a set of observable functions is carefully designed to capture the nonlinear dynamics of the MG. The corresponding linear KO is then analytically derived based on these observables, resulting in the linear representation of the original nonlinear MG with observables as the new coordinate. The influence of external input on the system dynamics is also considered during the derivation, enabling control of the MG. We solve the voltage control problem using the traditional linear quadratic integrator (LQI) method to demonstrate that textbook linear control techniques can accurately control the original nonlinear MG via the developed KO linearized MG model. Our proposed KO linearization method is generic and can be easily extended for different control objectives and MG structures using our analytical derivation procedure. We validate the effectiveness of our methodology through various case studies.
Learning Robust Visual-Semantic Embedding for Generalizable Person Re-identification
Authors: Suncheng Xiang, Jingsheng Gao, Mengyuan Guan, Jiacheng Ruan, Chengfeng Zhou, Ting Liu, Dahong Qian, Yuzhuo Fu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.09498
Pdf link: https://arxiv.org/pdf/2304.09498
Abstract Generalizable person re-identification (Re-ID) is an active research topic in machine learning and computer vision and plays a significant role in realistic scenarios owing to its applications in public security and video surveillance. However, previous methods mainly focus on visual representation learning while neglecting the potential of semantic features during training, which easily leads to poor generalization when adapting to a new domain. In this paper, we propose a Multi-Modal Equivalent Transformer called MMET for more robust visual-semantic embedding learning on visual, textual and visual-textual tasks respectively. To further enhance robust feature learning in the transformer, a dynamic masking mechanism called the Masked Multimodal Modeling strategy (MMM) is introduced to mask both image patches and text tokens; it works jointly on multimodal or unimodal data and significantly boosts the performance of generalizable person Re-ID. Extensive experiments on benchmark datasets demonstrate the competitive performance of our method over previous approaches. We hope this method can advance research on visual-semantic representation learning. Our source code is publicly available at https://github.com/JeremyXSC/MMET.
Biologically inspired structure learning with reverse knowledge distillation for spiking neural networks
Authors: Qi Xu, Yaxin Li, Xuanye Fang, Jiangrong Shen, Jian K. Liu, Huajin Tang, Gang Pan
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.09500
Pdf link: https://arxiv.org/pdf/2304.09500
Abstract Spiking neural networks (SNNs) have superb characteristics in sensory information recognition tasks due to their biological plausibility. However, the performance of some current spiking-based models is limited by their structures: fully connected or overly deep structures introduce too much redundancy. This redundancy in both connections and neurons is one of the key factors hindering the practical application of SNNs. Although some pruning methods have been proposed to tackle this problem, they normally ignore the fact that the neural topology in the human brain can be adjusted dynamically. Inspired by this, this paper proposes an evolutionary-based structure construction method for constructing more reasonable SNNs. By integrating knowledge distillation and connection pruning, the synaptic connections in SNNs can be optimized dynamically to reach an optimal state. As a result, the SNN structure can not only absorb knowledge from the teacher model but also search for a deep but sparse network topology. Experimental results on CIFAR100 and DVS-Gesture show that the proposed structure learning method achieves good performance while reducing connection redundancy. The proposed method explores a novel dynamic way of learning structure from scratch in SNNs, which could help close the gap between deep learning and bio-inspired neural dynamics.
Progressive Transfer Learning for Dexterous In-Hand Manipulation with Multi-Fingered Anthropomorphic Hand
Authors: Yongkang Luo, Wanyi Li, Peng Wang, Haonan Duan, Wei Wei, Jia Sun
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.09526
Pdf link: https://arxiv.org/pdf/2304.09526
Abstract Dexterous in-hand manipulation with a multi-fingered anthropomorphic hand is extremely difficult because of the high-dimensional state and action spaces and the rich contact patterns between the fingers and objects. Even though deep reinforcement learning has made moderate progress and demonstrated strong potential for manipulation, it still faces challenges such as large-scale data collection and high sample complexity. In particular, even slight changes to a scene usually require re-collecting vast amounts of data and carrying out numerous iterations of fine-tuning. Remarkably, humans can quickly transfer learned manipulation skills to different scenarios with little supervision. Inspired by this flexible human transfer-learning capability, we propose a novel progressive transfer learning framework (PTL) for dexterous in-hand manipulation that efficiently utilizes the collected trajectories and the source-trained dynamics model. The framework adopts progressive neural networks for dynamics-model transfer learning on samples chosen by a new sample selection method based on the dynamics properties, rewards and scores of the trajectories. Experimental results on contact-rich anthropomorphic hand manipulation tasks show that our method can efficiently and effectively learn in-hand manipulation skills in a new scene with only a few online attempts and adjustment learning. Compared to learning from scratch, our method reduces training time costs by 95%.
Network Algebraization and Port Relationship for Power-Electronic-Dominated Power Systems
Authors: Rui Ma, Xiaowen Yang, Meng Zhan
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.09528
Pdf link: https://arxiv.org/pdf/2304.09528
Abstract Unlike the quasi-static network in a traditional power system, the network in a power-electronic-dominated power system must be treated as dynamic because of the rapid response of converter controls. In this paper, a nonlinear differential-algebraic model framework is established, with algebraic equations for the dynamic electrical network and differential equations for the (source) nodes, by generalizing the Kron reduction. The internal and terminal voltages of source nodes, including converters, are chosen as the ports of nodes and networks. Correspondingly, the impact of the dynamic network becomes clear: it serves as a voltage divider and generates the terminal voltage instantaneously from the internal voltages of the sources, even when the dynamics of inductances are included. With this simplest model, the roles of both the nodes and the network become apparent. Simulations on a modified 9-bus system verify the proposed model framework.
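For reference, the classical (quasi-static) Kron reduction that the paper generalizes eliminates interior nodes of a network admittance matrix Y, giving Y_red = Y_bb - Y_bi Y_ii^(-1) Y_ib. The sketch below performs it one node at a time (a Schur-complement step per eliminated node), which yields the same result. It covers only the textbook static case, not the authors' dynamic generalization; the function names and the three-bus example are illustrative.

```python
from typing import List

Matrix = List[List[complex]]


def eliminate_node(Y: Matrix, n: int) -> Matrix:
    """Remove node n from admittance matrix Y via one Schur-complement step."""
    pivot = Y[n][n]
    if pivot == 0:
        raise ZeroDivisionError(f"node {n} has zero self-admittance")
    keep = [i for i in range(len(Y)) if i != n]
    return [[Y[j][k] - Y[j][n] * Y[n][k] / pivot for k in keep] for j in keep]


def kron_reduce(Y: Matrix, interior: List[int]) -> Matrix:
    """Eliminate all interior nodes, keeping boundary nodes in their original order."""
    # Eliminate from the highest index down so the remaining indices stay valid.
    for n in sorted(interior, reverse=True):
        Y = eliminate_node(Y, n)
    return Y


# Two unit-admittance branches in series: bus 0 - bus 1 - bus 2, with bus 1 interior.
Y = [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]
print(kron_reduce(Y, interior=[1]))  # → [[0.5, -0.5], [-0.5, 0.5]]
```

The reduced 2x2 matrix is that of a single branch with admittance 0.5, as expected for two unit admittances in series. Entries may be complex (R + jX branches); Python's built-in `complex` type works unchanged.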
Decadal Temperature Prediction via Chaotic Behavior Tracking
Authors: Jinfu Ren, Yang Liu, Jiming Liu
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2304.09536
Pdf link: https://arxiv.org/pdf/2304.09536
Abstract Decadal temperature prediction provides crucial information for quantifying the expected effects of future climate changes and thus informs strategic planning and decision-making in various domains. However, such long-term predictions are extremely challenging, due to the chaotic nature of temperature variations. Moreover, the usefulness of existing simulation-based and machine learning-based methods for this task is limited because initial simulation or prediction errors increase exponentially over time. To address this challenging task, we devise a novel prediction method involving an information tracking mechanism that aims to track and adapt to changes in temperature dynamics during the prediction phase by providing probabilistic feedback on the prediction error of the next step based on the current prediction. We integrate this information tracking mechanism, which can be considered as a model calibrator, into the objective function of our method to obtain the corrections needed to avoid error accumulation. Our results show the ability of our method to accurately predict global land-surface temperatures over a decadal range. Furthermore, we demonstrate that our results are meaningful in a real-world context: the temperatures predicted using our method are consistent with and can be used to explain the well-known teleconnections within and between different continents.
SLIC: Self-Conditioned Adaptive Transform with Large-Scale Receptive Fields for Learned Image Compression
Authors: Wei Jiang, Peirong Ning, Ronggang Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2304.09571
Pdf link: https://arxiv.org/pdf/2304.09571
Abstract Learned image compression has achieved remarkable performance. The transform plays an important role in boosting rate-distortion (RD) performance: the analysis transform converts the input image into a compact latent representation, and the more compact the latent representation, the fewer bits are needed to compress it. When designing better transforms, some previous works adopt the Swin-Transformer, whose success in image compression can be attributed to its dynamic weights and large receptive field. However, the LayerNorm adopted in transformers is not suitable for image compression. We find that CNN-based modules can also be dynamic and have large receptive fields, and that they can work with GDN/IGDN. To make the CNN-based modules dynamic, we generate the kernel weights conditioned on the input feature; we scale up each kernel for larger receptive fields; and to reduce complexity, we make the module channel-wise connected. We call this module Dynamic Depth-wise convolution. We replace the self-attention module with the proposed Dynamic Depth-wise convolution, replace the embedding layer with a depth-wise residual bottleneck for non-linearity, and replace the FFN layer with an inverted residual bottleneck for more interactions in the spatial domain. Because the interactions among channels in dynamic depth-wise convolution are limited, we design another block that replaces the dynamic depth-wise convolution with channel attention. We equip the analysis and synthesis transforms with the proposed modules, obtain a more compact latent representation, and propose the learned image compression model SLIC, meaning Self-Conditioned Adaptive Transform with Large-Scale Receptive Fields for Learned Image Compression. Thanks to the proposed transform modules, SLIC achieves a 6.35% BD-rate reduction over VVC, measured in PSNR on the Kodak dataset.
Learning controllers from data via kernel-based interpolation
Authors: Zhongjie Hu, Claudio De Persis, Pietro Tesi
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.09577
Pdf link: https://arxiv.org/pdf/2304.09577
Abstract We propose a data-driven control design method for nonlinear systems that builds on kernel-based interpolation. Under some assumptions on the system dynamics, kernel-based functions are built from data and a model of the system, along with deterministic model error bounds, is determined. Then, we derive a controller design method that aims at stabilizing the closed-loop system by cancelling out the system nonlinearities. The proposed method can be implemented using semidefinite programming and returns positively invariant sets for the closed-loop system.
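Kernel-based interpolation fits a function through data points (x_i, y_i) as s(x) = sum_i alpha_i k(x, x_i), with the weights alpha obtained by solving K alpha = y, where K_ij = k(x_i, x_j). The sketch below shows only that basic interpolation step in one dimension; the Gaussian kernel, the length scale, and the function names are assumptions, and it omits the paper's error bounds and semidefinite-programming controller design.

```python
import math
from typing import Callable, List, Sequence


def gaussian_kernel(a: float, b: float, length_scale: float = 1.0) -> float:
    return math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))


def solve(A: List[List[float]], y: List[float]) -> List[float]:
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, y)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("kernel matrix is (numerically) singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_interpolant(
    xs: Sequence[float], ys: Sequence[float], length_scale: float = 1.0
) -> Callable[[float], float]:
    """Return s(x) = sum_i alpha_i k(x, x_i), which passes through every (x_i, y_i)."""
    K = [[gaussian_kernel(a, b, length_scale) for b in xs] for a in xs]
    alpha = solve(K, list(ys))
    return lambda x: sum(w * gaussian_kernel(x, xi, length_scale) for w, xi in zip(alpha, xs))


# Interpolate a hypothetical nonlinearity, sin(x), from five samples.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [math.sin(x) for x in xs]
s = fit_interpolant(xs, ys, length_scale=0.5)
print(s(0.25))
```

The interpolant reproduces the data exactly at the sample points; how well it matches the true function between them depends on the kernel and on assumptions about the dynamics, which is what deterministic error bounds of the kind the abstract mentions must quantify.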
DynamicRead: Exploring Robust Gaze Interaction Methods for Reading on Handheld Mobile Devices under Dynamic Conditions
Authors: Yaxiong Lei, Yuheng Wang, Tyler Caslin, Alexander Wisowaty, Xu Zhu, Mohamed Khamis, Juan Ye
Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2304.09584
Pdf link: https://arxiv.org/pdf/2304.09584
Abstract Enabling real-time gaze interaction on handheld mobile devices has attracted significant attention in recent years. An increasing number of research projects have focused on sophisticated appearance-based deep learning models to enhance the precision of gaze estimation on smartphones. This raises important research questions, including how gaze can be used in a real-time application, and which gaze interaction methods are preferable under dynamic conditions in terms of both user acceptance and reliable performance. To address these questions, we design four gaze scrolling techniques: three explicit techniques based on Gaze Gesture, Dwell time, and Pursuit, and one implicit technique based on reading speed, to support touch-free page scrolling in a reading application. We conduct a 20-participant user study under both sitting and walking settings; our results reveal that the Gaze Gesture and Dwell time-based interfaces are more robust while walking, and that Gaze Gesture achieved consistently good usability scores without causing a high cognitive workload.
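Of the explicit techniques listed, dwell-time selection is the simplest to illustrate: an action fires once the estimated gaze has stayed inside a target region for a minimum duration. The sketch below is a generic, hypothetical dwell detector over timestamped gaze samples; the 600 ms threshold, the rectangular region, the screen coordinates, and all names are assumptions, not the study's implementation.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Region:
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def dwell_triggers(
    samples: Iterable[Tuple[float, float, float]], region: Region, dwell_ms: float = 600.0
) -> List[float]:
    """Return the timestamps (ms) at which a dwell on `region` completes.

    samples: (timestamp_ms, x, y) gaze estimates in chronological order.
    After a trigger, the gaze must leave the region before it can trigger again.
    """
    triggers: List[float] = []
    entered_at: Optional[float] = None
    fired = False
    for t, x, y in samples:
        if region.contains(x, y):
            if entered_at is None:
                entered_at = t  # gaze just entered the region
            if not fired and t - entered_at >= dwell_ms:
                triggers.append(t)
                fired = True
        else:
            entered_at = None  # leaving the region resets the dwell timer
            fired = False
    return triggers


# Hypothetical "scroll down" strip at the bottom of a 1080 x 2340 px screen.
scroll_zone = Region(0, 1800, 1080, 2340)
samples = (
    [(t, 540, 2000) for t in range(0, 1100, 100)]
    + [(1100, 540, 500)]
    + [(t, 540, 2000) for t in range(1200, 2100, 100)]
)
print(dwell_triggers(samples, scroll_zone))  # → [600, 1800]
```

Requiring the gaze to leave the region before re-arming avoids repeated scrolls during one long fixation; a walking user's noisier gaze estimates would argue for a tolerance margin around the region, which this sketch omits.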
On countings and enumerations of block-parallel automata networks
Authors: Kévin Perrot, Sylvain Sené, Léah Tapin
Subjects: Discrete Mathematics (cs.DM); Formal Languages and Automata Theory (cs.FL)
Arxiv link: https://arxiv.org/abs/2304.09664
Pdf link: https://arxiv.org/pdf/2304.09664
Abstract When we focus on finite dynamical systems from both the computability/complexity and the modelling standpoints, automata networks seem to be a particularly appropriate mathematical model on which to develop theory. In this paper, automata networks are finite collections of entities (the automata), each with its own set of possible states, which interact with each other over discrete time; interactions are defined as local functions that allow the automata to change their state according to the states of their neighbourhoods. Studies of this model of computation have underlined the importance of the way (i.e. the schedule) in which the automata update their states, namely the update modes, which can be deterministic, periodic, fair, or not. Indeed, a given network may admit numerous underlying dynamics, the latter depending strongly on the update mode under which the network evolves. In this paper, we focus on a new family of deterministic, periodic and fair update modes introduced recently in a modelling framework, called block-parallel update modes by duality with the well-known and well-studied block-sequential update modes. More precisely, in the general context of automata networks, this work aims to present what distinguishes block-parallel update modes from block-sequential ones, and to count and enumerate them: in absolute terms, keeping only representatives leading to distinct dynamics, and keeping only representatives giving rise to distinct isomorphic limit dynamics. Taken together, this paper constitutes a first theoretical analysis of these update modes and their impact on automata network dynamics.
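To see why the update mode matters, the sketch below applies one step of a Boolean automata network under a block-sequential mode: blocks are updated one after another, and the automata within a block are updated in parallel from the states available at the start of that block. A single block containing every automaton gives the parallel mode. It illustrates only the classical block-sequential family that the abstract contrasts with, not the block-parallel modes the paper studies; the two-automaton network and the function names are hypothetical.

```python
from typing import Callable, Sequence, Tuple

State = Tuple[int, ...]
LocalFunction = Callable[[State], int]


def block_sequential_step(
    state: State,
    local_functions: Sequence[LocalFunction],
    blocks: Sequence[Sequence[int]],  # should partition the automata indices
) -> State:
    """Apply one full step of a block-sequential update mode."""
    current = list(state)
    for block in blocks:
        snapshot = tuple(current)  # all automata in this block read the same states
        for i in block:
            current[i] = local_functions[i](snapshot)
    return tuple(current)


# Hypothetical "swap" network: automaton 0 copies automaton 1 and vice versa.
swap = [lambda x: x[1], lambda x: x[0]]
print(block_sequential_step((0, 1), swap, [[0, 1]]))    # parallel → (1, 0)
print(block_sequential_step((0, 1), swap, [[0], [1]]))  # sequential → (1, 1)
```

Under the parallel mode the network oscillates between (0, 1) and (1, 0), while the sequential order [[0], [1]] reaches the fixed point (1, 1) in one step: the same local functions yield different dynamics.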
State estimation of an electrochemical lithium-ion battery model: improved observer performance by hybrid redesign
Authors: E. Petri, T. Reynaudo, R. Postoyan, D. Astolfi, D. Nesic, S. Rael
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.09680
Pdf link: https://arxiv.org/pdf/2304.09680
Abstract Effective management and just-in-time maintenance of lithium-ion batteries require knowledge of unmeasured (internal) variables that need to be estimated. Observers are designed for this purpose using a mathematical model of the battery's internal dynamics. However, it is often difficult to tune these observers to obtain good estimation performance in terms of both convergence speed and accuracy, although both are essential in practice. In this context, we demonstrate how a recently developed hybrid multi-observer can be used to improve the performance of a given observer designed for an electrochemical model of a lithium-ion battery. Simulation results, obtained with standard parameter values, show the improvement in estimation performance achieved with the proposed method.
Analysing Equilibrium States for Population Diversity
Authors: Johannes Lengler, Andre Opris, Dirk Sudholt
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2304.09690
Pdf link: https://arxiv.org/pdf/2304.09690
Abstract Population diversity is crucial in evolutionary algorithms as it helps with global exploration and facilitates the use of crossover. Despite many runtime analyses showing advantages of population diversity, we have no clear picture of how diversity evolves over time. We study how population diversity of $(\mu+1)$ algorithms, measured by the sum of pairwise Hamming distances, evolves in a fitness-neutral environment. We give an exact formula for the drift of population diversity and show that it is driven towards an equilibrium state. Moreover, we bound the expected time for getting close to the equilibrium state. We find that these dynamics, including the location of the equilibrium, are unaffected by surprisingly many algorithmic choices. All unbiased mutation operators with the same expected number of bit flips have the same effect on the expected diversity. Many crossover operators have no effect at all, including all binary unbiased, respectful operators. We review crossover operators from the literature and identify crossovers that are neutral towards the evolution of diversity and crossovers that are not.
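The diversity measure used here, the sum of pairwise Hamming distances, need not be computed pair by pair: if c_i of the m individuals in the population have a one at bit position i, that position contributes c_i * (m - c_i) to the sum. The sketch below computes the measure both ways on a toy bit-string population (the function names and the population are illustrative); the counting form takes O(m n) time for n bits instead of O(m^2 n).

```python
from itertools import combinations
from typing import Sequence

Bits = Sequence[int]


def diversity_pairwise(population: Sequence[Bits]) -> int:
    """Sum of Hamming distances over all unordered pairs (quadratic in population size)."""
    return sum(
        sum(a != b for a, b in zip(x, y)) for x, y in combinations(population, 2)
    )


def diversity_by_counts(population: Sequence[Bits]) -> int:
    """Same quantity via per-position one-counts: sum_i c_i * (m - c_i)."""
    m = len(population)
    n = len(population[0]) if population else 0
    total = 0
    for i in range(n):
        ones = sum(ind[i] for ind in population)  # c_i
        total += ones * (m - ones)  # pairs that differ at position i
    return total


pop = [[0, 0, 1, 1], [0, 1, 1, 0], [1, 1, 1, 1]]
print(diversity_pairwise(pop), diversity_by_counts(pop))  # → 6 6
```

The counting form also makes clear why the drift analysis can be carried out position by position: each bit contributes independently through its one-count.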
Guidance of the resonance energy flow in the mechanism of coupled magnetic pendulums
Authors: Valery N. Pilipchuk, Krystian Polczyński, Maksymilian Bednarek, Jan Awrejcewicz
Subjects: Systems and Control (eess.SY); Dynamical Systems (math.DS); Adaptation and Self-Organizing Systems (nlin.AO)
Arxiv link: https://arxiv.org/abs/2304.09755
Pdf link: https://arxiv.org/pdf/2304.09755
Abstract This paper presents a methodology for controlling the resonance energy exchange in a mechanical system consisting of two weakly coupled magnetic pendulums interacting with the magnetic field generated by coils placed underneath. It is shown that properly guided magnetic fields can effectively change the mechanical potentials so that the energy flow between the oscillators takes the desired direction. The analysis uses a specific set of descriptive functions characterizing the total excitation level, its distribution between the pendulums, and the phase shift. The developed control strategies are based on the observation that, in the case of antiphase oscillations, energy moves from the pendulum subjected to the repelling magnetic field to the oscillator under the attracting field, whereas during in-phase oscillations the energy flow is reversed. Therefore, the closed-loop controller requires only information about the phase shift, which is easily estimated from dynamic state signals through the coherency index. An advantage of the suggested control strategy is that the temporal rate of the inputs is dictated by the speed of beating, which is relatively slow compared to the carrying oscillations.
Contactless Human Activity Recognition using Deep Learning with Flexible and Scalable Software Define Radio
Authors: Muhammad Zakir Khan, Jawad Ahmad, Wadii Boulila, Matthew Broadbent, Syed Aziz Shah, Anis Koubaa, Qammer H. Abbasi
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2304.09756
Pdf link: https://arxiv.org/pdf/2304.09756
Abstract Ambient computing is gaining popularity as a major technological advancement for the future. The modern era has witnessed a surge of advances in healthcare systems, with viable radio frequency solutions proposed for remote and unobtrusive human activity recognition (HAR). Specifically, this study investigates the use of Wi-Fi channel state information (CSI) as a novel method of ambient sensing that can be employed as a contactless means of recognizing human activity in indoor environments. By (re)using Wi-Fi CSI for various safety and security applications, these methods avoid the additional costly hardware required by vision-based systems, which are also privacy-intrusive. In an experiment using a Universal Software Radio Peripheral (USRP) to collect CSI samples, a subject performed six distinct activities, including no activity, standing, sitting, and leaning forward, across different areas of the room. Additional CSI samples were collected while the subject walked in two different directions. This study presents a Wi-Fi CSI-based HAR system that assesses and contrasts deep learning approaches, namely a convolutional neural network (CNN), long short-term memory (LSTM), and a hybrid (LSTM+CNN), for accurate activity recognition. The experimental results indicate that LSTM surpasses current models, achieving an average accuracy of 95.3% in multi-activity classification and outperforming the CNN and hybrid techniques. Future research should study robustness in diverse and dynamic environments in order to identify the activities of multiple users.
K-means Clustering Based Feature Consistency Alignment for Label-free Model Evaluation
Authors: Shuyu Miao, Lin Zheng, Jingjing Liu, and Hong Jin
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.09758
Pdf link: https://arxiv.org/pdf/2304.09758
Abstract Label-free model evaluation aims to predict model performance on various test sets without relying on ground truths. Unlike classical supervised model evaluation, the main challenge of this task is the absence of labels in the test data. This paper presents our solutions for the 1st DataCV Challenge of the Visual Dataset Understanding workshop at CVPR 2023. First, we propose a novel method called K-means Clustering Based Feature Consistency Alignment (KCFCA), which is tailored to handle the distribution shifts of various datasets. KCFCA uses the K-means algorithm to cluster labeled training sets and unlabeled test sets, and then aligns the cluster centers with feature consistency. Second, we develop a dynamic regression model to capture the relationship between distribution shifts and model accuracy. Third, we design an algorithm to discover outlier model factors, eliminate the outlier models, and combine the strengths of multiple autoeval models. On the DataCV Challenge leaderboard, our approach secured 2nd place with an RMSE of 6.8526. Our method improved significantly over the best baseline method, by 36% (6.8526 vs. 10.7378). Furthermore, our method achieves relatively robust and strong single-model performance on the validation dataset.
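The abstract does not specify how the cluster centers are aligned. As a rough, hypothetical illustration of the clustering step only, the sketch runs plain K-means (Lloyd's algorithm) on training-set and test-set features and reports the mean distance from each test center to its nearest training center as a crude shift score. The score, the deterministic first-k initialization, and the names are assumptions, not KCFCA itself.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 50) -> List[Point]:
    """Plain Lloyd's algorithm, initialised with the first k points for determinism."""
    if len(points) < k:
        raise ValueError("need at least k points")
    centers = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        new_centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def center_shift(train: Sequence[Point], test: Sequence[Point], k: int) -> float:
    """Mean distance from each test-set center to its nearest training-set center."""
    c_train = kmeans(train, k)
    c_test = kmeans(test, k)
    return sum(min(math.dist(ct, cr) for cr in c_train) for ct in c_test) / k


train = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
shifted = [(x + 1.0, y) for x, y in train]  # same structure, moved +1 along the first axis
print(center_shift(train, shifted, k=2))  # → 1.0
print(center_shift(train, train, k=2))    # → 0.0
```

A regression model, like the one the abstract describes, could then map such shift features to predicted accuracy; this sketch stops at the feature.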
Advances on Concept Drift Detection in Regression Tasks using Social Networks Theory
Authors: Jean Paul Barddal, Heitor Murilo Gomes, Fabrício Enembreck
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.09788
Pdf link: https://arxiv.org/pdf/2304.09788
Abstract Mining data streams is one of the main topics of study in machine learning due to its applications in many knowledge areas. One of the major challenges in mining data streams is concept drift, which requires the learner to discard the current concept and adapt to a new one. Ensemble-based drift detection algorithms have been used successfully for classification tasks but usually maintain a fixed-size ensemble of learners, running the risk of needlessly spending processing time and memory. In this paper, we present improvements to the Scale-free Network Regressor (SFNR), a dynamic ensemble-based method for regression that employs social networks theory. To detect concept drifts, SFNR uses the Adaptive Window (ADWIN) algorithm. Results on both real and synthetic data show improvements in accuracy, especially in concept drift situations, and better performance compared to other state-of-the-art algorithms.
Event-based Simultaneous Localization and Mapping: A Comprehensive Survey
Authors: Kunping Huang, Sen Zhang, Jing Zhang, Dacheng Tao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.09793
Pdf link: https://arxiv.org/pdf/2304.09793
Abstract In recent decades, visual simultaneous localization and mapping (vSLAM) has gained significant interest in both academia and industry. It estimates camera motion and reconstructs the environment concurrently using visual sensors on a moving robot. However, conventional cameras are limited by hardware, including motion blur and low dynamic range, which can negatively impact performance in challenging scenarios like high-speed motion and high dynamic range illumination. Recent studies have demonstrated that event cameras, a new type of bio-inspired visual sensor, offer advantages such as high temporal resolution, high dynamic range, low power consumption, and low latency. This paper presents a timely and comprehensive review of event-based vSLAM algorithms that exploit the benefits of asynchronous and irregular event streams for localization and mapping tasks. The review covers the working principle of event cameras and various event representations for preprocessing event data. It also categorizes event-based vSLAM methods into four main categories: feature-based, direct, motion-compensation, and deep learning methods, with detailed discussions and practical guidance for each approach. Furthermore, the paper evaluates the state-of-the-art methods on various benchmarks, highlighting current challenges and future opportunities in this emerging research area. A public repository will be maintained to keep track of the rapid developments in this field at https://github.com/kun150kun/ESLAM-survey.
Leveraging Deep Reinforcement Learning for Metacognitive Interventions across Intelligent Tutoring Systems
Authors: Mark Abdelshiheed, John Wesley Hostetter, Tiffany Barnes, Min Chi
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
Arxiv link: https://arxiv.org/abs/2304.09821
Pdf link: https://arxiv.org/pdf/2304.09821
Abstract This work compares two approaches to provide metacognitive interventions and their impact on preparing students for future learning across Intelligent Tutoring Systems (ITSs). In two consecutive semesters, we conducted two classroom experiments: Exp. 1 used a classic artificial intelligence approach to classify students into different metacognitive groups and provide static interventions based on their classified groups. In Exp. 2, we leveraged Deep Reinforcement Learning (DRL) to provide adaptive interventions that consider the dynamic changes in the student's metacognitive levels. In both experiments, students received these interventions that taught how and when to use a backward-chaining (BC) strategy on a logic tutor that supports a default forward-chaining strategy. Six weeks later, we trained students on a probability tutor that only supports BC without interventions. Our results show that adaptive DRL-based interventions closed the metacognitive skills gap between students. In contrast, static classifier-based interventions only benefited a subset of students who knew how to use BC in advance. Additionally, our DRL agent prepared the experimental students for future learning by significantly surpassing their control peers on both ITSs.
Learning and Adapting Agile Locomotion Skills by Transferring Experience
Authors: Laura Smith, J. Chase Kew, Tianyu Li, Linda Luu, Xue Bin Peng, Sehoon Ha, Jie Tan, Sergey Levine
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.09834
Pdf link: https://arxiv.org/pdf/2304.09834
Abstract Legged robots have enormous potential in their range of capabilities, from navigating unstructured terrains to high-speed running. However, designing robust controllers for highly agile dynamic motions remains a substantial challenge for roboticists. Reinforcement learning (RL) offers a promising data-driven approach for automatically training such controllers. However, exploration in these high-dimensional, underactuated systems remains a significant hurdle for enabling legged robots to learn performant, naturalistic, and versatile agility skills. We propose a framework for training complex robotic skills by transferring experience from existing controllers to jumpstart learning new tasks. To leverage controllers we can acquire in practice, we design this framework to be flexible in terms of their source -- that is, the controllers may have been optimized for a different objective under different dynamics, or may require different knowledge of the surroundings -- and thus may be highly suboptimal for the target task. We show that our method enables learning complex agile jumping behaviors, navigating to goal locations while walking on hind legs, and adapting to new environments. We also demonstrate that the agile behaviors learned in this way are graceful and safe enough to deploy in the real world.
Evaluating Verifiability in Generative Search Engines
Authors: Nelson F. Liu, Tianyi Zhang, Percy Liang
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2304.09848
Pdf link: https://arxiv.org/pdf/2304.09848
Abstract Generative search engines directly generate responses to user queries, along with in-line citations. A prerequisite trait of a trustworthy generative search engine is verifiability, i.e., systems should cite comprehensively (high citation recall; all statements are fully supported by citations) and accurately (high citation precision; every citation supports its associated statement). We conduct human evaluation to audit four popular generative search engines -- Bing Chat, NeevaAI, perplexity.ai, and YouChat -- across a diverse set of queries from a variety of sources (e.g., historical Google user queries, dynamically collected open-ended questions on Reddit, etc.). We find that responses from existing generative search engines are fluent and appear informative, but frequently contain unsupported statements and inaccurate citations: on average, a mere 51.5% of generated sentences are fully supported by citations and only 74.5% of citations support their associated sentence. We believe that these results are concerningly low for systems that may serve as a primary tool for information-seeking users, especially given their facade of trustworthiness. We hope that our results further motivate the development of trustworthy generative search engines and help researchers and users better understand the shortcomings of existing commercial systems.
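The two verifiability metrics can be made concrete with a small calculation. This is a simplified sketch following only the definitions stated in the abstract (recall: share of generated sentences fully supported by their citations; precision: share of citations that support their associated sentence); it assumes the human judgments are already available as booleans, and the `Sentence` type and function names are hypothetical. The paper's full protocol may differ in details.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sentence:
    """A generated sentence with (hypothetical) human judgments."""
    fully_supported: bool  # do its citations, taken together, fully support it?
    citation_supports: List[bool] = field(default_factory=list)  # one flag per citation


def citation_recall(sentences: List[Sentence]) -> float:
    """Fraction of generated sentences that are fully supported by citations."""
    if not sentences:
        return 0.0
    return sum(s.fully_supported for s in sentences) / len(sentences)


def citation_precision(sentences: List[Sentence]) -> float:
    """Fraction of all citations that support their associated sentence."""
    judgments = [ok for s in sentences for ok in s.citation_supports]
    if not judgments:
        return 0.0
    return sum(judgments) / len(judgments)


# Toy response: 4 sentences, 4 citations in total.
sents = [
    Sentence(True, [True, True]),  # supported, both citations relevant
    Sentence(False, []),           # uncited, hence unsupported
    Sentence(True, [True]),
    Sentence(False, [False]),      # cited, but the citation does not support it
]
print(citation_recall(sents), citation_precision(sents))
```

In this toy response, 2 of 4 sentences are fully supported (recall 0.5) and 3 of 4 citations are supportive (precision 0.75), showing that the two metrics can diverge: an uncited sentence hurts recall without affecting precision.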
Patching Neural Barrier Functions Using Hamilton-Jacobi Reachability
Authors: Sander Tonkens, Alex Toofanian, Zhizhen Qin, Sicun Gao, Sylvia Herbert
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.09850
Pdf link: https://arxiv.org/pdf/2304.09850
Abstract Learning-based control algorithms have led to major advances in robotics at the cost of weaker safety guarantees. Recently, neural networks have also been used to characterize safety through barrier functions for complex nonlinear systems. Learned barrier functions approximately encode and enforce a desired safety constraint through a value function, but do not provide any formal guarantees. In this paper, we propose a local dynamic programming (DP)-based approach to "patch" an almost-safe learned barrier at potentially unsafe points in the state space. This algorithm, HJ-Patch, obtains a new barrier that provides formal safety guarantees yet retains the global structure of the learned barrier. HJ-Patch updates the barrier function "minimally" at points that both (a) neighbor the barrier's safety boundary and (b) do not satisfy the safety condition. We view this as a key step toward bridging the gap between learning-based barrier functions and Hamilton-Jacobi reachability analysis, providing a framework for further integration of these approaches. We demonstrate that for well-trained barriers we reduce the computational load by two orders of magnitude relative to standard DP-based reachability, and we demonstrate scalability to a 6-dimensional system, which is at the limit of standard DP-based reachability.
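The selection rule described above (patch only points that both neighbor the safety boundary and violate the safety condition) can be illustrated on a grid. This is a minimal sketch of the selection step only, under several assumptions not taken from the paper: the state space is discretized into a 2-D grid, the safe set is {B >= 0}, "neighboring the boundary" means a 4-neighbor lies on the other side of the zero level set, and the per-cell violation flags are precomputed. The local DP update that HJ-Patch then performs is not shown, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Grid = Sequence[Sequence[float]]


def is_safe(value: float) -> bool:
    # Assumed convention: the safe set is {x : B(x) >= 0}.
    return value >= 0.0


def near_boundary(values: Grid) -> List[List[bool]]:
    """Mark cells whose safe/unsafe label differs from at least one
    4-neighbour, i.e. cells adjacent to the zero level set of B."""
    rows, cols = len(values), len(values[0])
    marks = [[False] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            here = is_safe(values[i][j])
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols and is_safe(values[ni][nj]) != here:
                    marks[i][j] = True
                    break
    return marks


def cells_to_patch(values: Grid, violates: Sequence[Sequence[bool]]) -> List[Tuple[int, int]]:
    """Cells that (a) neighbour the safety boundary and (b) violate the
    barrier condition; only these would be passed to a local DP update."""
    boundary = near_boundary(values)
    return [
        (i, j)
        for i in range(len(values))
        for j in range(len(values[0]))
        if boundary[i][j] and violates[i][j]
    ]


values = [[-1.0, -1.0, -1.0],
          [-1.0, 1.0, 1.0],
          [-1.0, 1.0, 1.0]]
violates = [[False, True, False],
            [False, False, False],
            [True, False, True]]
print(cells_to_patch(values, violates))
```

Here cell (2, 2) violates the condition but lies in the interior of the safe set, so it is not selected; only the boundary-adjacent violators (0, 1) and (2, 0) are. Restricting the update to such cells is what keeps the patch local and leaves the global structure of the learned barrier intact.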

A-suozhang / GetArxivDaily

New submissions for Thu, 20 Apr 23 #37

Keyword: efficient

Memento: Facilitating Effortless, Efficient, and Reliable ML Experiments

Generative models improve fairness of medical classifiers under distribution shifts

A Data Driven Sequential Learning Framework to Accelerate and Optimize Multi-Objective Manufacturing Decisions

Leveraging Deep Learning Techniques on Collaborative Filtering Recommender Systems

Integrity and Junkiness Failure Handling for Embedding-based Retrieval: A Case Study in Social Network Search

From RSSE to BotSE: Potentials and Challenges Revisited after 15 Years

Application of genetic algorithm to load balancing in networks with a homogeneous traffic flow

Provably-Efficient and Internally-Deterministic Parallel Union-Find

BIM-GPT: a Prompt-Based Virtual Assistant Framework for BIM Information Retrieval

Perception Imitation: Towards Synthesis-free Simulator for Autonomous Vehicles

SP-BatikGAN: An Efficient Generative Adversarial Network for Symmetric Pattern Generation

Information Geometrically Generalized Covariate Shift Adaptation

Inferring High-level Geographical Concepts via Knowledge Graph and Multi-scale Data Integration: A Case Study of C-shaped Building Pattern Recognition

On the Capacity Region of Reconfigurable Intelligent Surface Assisted Symbiotic Radios

Torque-based Deep Reinforcement Learning for Task-and-Robot Agnostic Learning on Bipedal Robots Using Sim-to-Real Transfer

Local object crop collision network for efficient simulation of non-convex objects in GPU-based simulators

Enhancing Multi-Camera People Tracking with Anchor-Guided Clustering and Spatio-Temporal Consistency ID Re-Assignment

Learning Resource Scheduling with High Priority Users using Deep Deterministic Policy Gradients

Neural Network Quantisation for Faster Homomorphic Encryption

Sampling is Matter: Point-guided 3D Human Mesh Reconstruction

Progressive Transfer Learning for Dexterous In-Hand Manipulation with Multi-Fingered Anthropomorphic Hand

SelfAct: Personalized Activity Recognition based on Self-Supervised and Active Learning

Graph Exploration for Effective Multi-agent Q-Learning

The State-of-the-Art in Air Pollution Monitoring and Forecasting Systems using IoT, Big Data, and Machine Learning

DADFNet: Dual Attention and Dual Frequency-Guided Dehazing Network for Video-Empowered Intelligent Transportation

Efficient High-Order Space-Angle-Energy Polytopic Discontinuous Galerkin Finite Element Methods for Linear Boltzmann Transport

AdapterGNN: Efficient Delta Tuning Improves Generalization Ability in Graph Neural Networks

LEA: Beyond Evolutionary Algorithms via Learned Optimization Strategy

StyleDEM: a Versatile Model for Authoring Terrains

Integrated Ray-Tracing and Coverage Planning Control using Reinforcement Learning

Resource Allocation in the RIS Assisted SCMA Cellular Network Coexisting with D2D Communications

List Defective Colorings: Distributed Algorithms and Applications

Operations for D-algebraic Functions

GeoGauss: Strongly Consistent and Light-Coordinated OLTP for Geo-Replicated SQL Database

Learnable Earth Parser: Discovering 3D Prototypes in Aerial Scans

Grooming Connectivity Intents in IP-Optical Networks Using Directed Acyclic Graphs

A compact simple HWENO scheme with ADER time discretization for hyperbolic conservation laws I: structured meshes

A Multi-robot Coverage Path Planning Algorithm Based on Improved DARP Algorithm

Amplifying Sine Unit: An Oscillatory Activation Function for Deep Neural Networks to Recover Nonlinear Oscillations Efficiently

Nearly Work-Efficient Parallel DFS in Undirected Graphs

Post-Training Quantization for Object Detection

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

Progressive-Hint Prompting Improves Reasoning in Large Language Models

VMA: Divide-and-Conquer Vectorized Map Annotation System for Large-Scale Driving Scene

FastRLAP: A System for Learning High-Speed Driving via Deep RL and Autonomous Practicing

Optimal Codes Detecting Deletions in Concatenated Binary Strings Applied to Trace Reconstruction

Transformer-Based Visual Segmentation: A Survey

Keyword: faster

LEA: Beyond Evolutionary Algorithms via Learned Optimization Strategy

List Defective Colorings: Distributed Algorithms and Applications

Grooming Connectivity Intents in IP-Optical Networks Using Directed Acyclic Graphs

Comma Selection Outperforms Plus Selection on OneMax with Randomly Planted Optima

FastRLAP: A System for Learning High-Speed Driving via Deep RL and Autonomous Practicing

LipsFormer: Introducing Lipschitz Continuity to Vision Transformers

Keyword: mobile

Heterogeneous Integration of In-Memory Analog Computing Architectures with Tensor Processing Units

Secure Mobile Payment Architecture Enabling Multi-factor Authentication

Learning Resource Scheduling with High Priority Users using Deep Deterministic Policy Gradients

SelfAct: Personalized Activity Recognition based on Self-Supervised and Active Learning

DynamicRead: Exploring Robust Gaze Interaction Methods for Reading on Handheld Mobile Devices under Dynamic Conditions

Integrated Ray-Tracing and Coverage Planning Control using Reinforcement Learning

Keyword: pruning

Network Pruning Spaces

Biologically inspired structure learning with reverse knowledge distillation for spiking neural networks

Single-View View Synthesis with Self-Rectified Pseudo-Stereo

Keyword: voxel

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

Keyword: lidar

Density-Insensitive Unsupervised Domain Adaption on 3D Object Detection

CrossFusion: Interleaving Cross-modal Complementation for Noise-resistant 3D Object Detection

Learnable Earth Parser: Discovering 3D Prototypes in Aerial Scans

UniCal: a Single-Branch Transformer-Based Model for Camera-to-LiDAR Calibration and Validation

MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation

Keyword: diffusion

A structure-preserving upwind DG scheme for a degenerate phase-field tumor model

DiFaReli : Diffusion Face Relighting

Realistic Data Enrichment for Robust Image Segmentation in Histopathology