Abstract
In the context of neuroevolution, Quality-Diversity algorithms have proven effective in generating repertoires of diverse and efficient policies by relying on the definition of a behavior space. A natural goal induced by the creation of such a repertoire is to achieve behaviors on demand, which can be done by running the corresponding policy from the repertoire. However, in uncertain environments, two problems arise. First, policies can lack robustness and repeatability, meaning that multiple episodes under slightly different conditions often result in very different behaviors. Second, due to the discrete nature of the repertoire, solutions vary discontinuously. Here we present a new approach to behavior-conditioned trajectory generation based on two mechanisms: first, MAP-Elites Low-Spread (ME-LS), which constrains the selection of solutions to those that are the most consistent in the behavior space; second, the Quality-Diversity Transformer (QDT), a Transformer-based model conditioned on continuous behavior descriptors, which trains on a dataset generated by policies from a ME-LS repertoire and learns to autoregressively generate sequences of actions that achieve target behaviors. Results show that ME-LS produces consistent and robust policies, and that its combination with the QDT yields a single policy capable of achieving diverse behaviors on demand with high accuracy.
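The low-spread criterion is simple to state: among solutions competing for the same repertoire cell, prefer the one whose behavior descriptor varies least across repeated episodes. A minimal sketch of that selection rule, with hypothetical candidate and descriptor data (the abstract does not specify the exact spread measure, so mean distance to the centroid is used here as an illustrative choice):

```python
import numpy as np

def behavior_spread(descriptors):
    """Mean distance of each episode's behavior descriptor to their centroid."""
    descriptors = np.asarray(descriptors)
    centroid = descriptors.mean(axis=0)
    return float(np.linalg.norm(descriptors - centroid, axis=1).mean())

def select_low_spread(candidates):
    """Among candidates competing for the same repertoire cell, keep the one
    whose behavior is most repeatable across evaluations.
    Each candidate is (policy_id, list_of_descriptors_over_episodes)."""
    return min(candidates, key=lambda c: behavior_spread(c[1]))

# Candidate "b" behaves almost identically across episodes; "a" does not.
candidates = [("a", [[0.1, 0.1], [0.9, 0.9]]),
              ("b", [[0.5, 0.5], [0.52, 0.5]])]
chosen = select_low_spread(candidates)
```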
ytopt: Autotuning Scientific Applications for Energy Efficiency at Large Scales
Authors: Xingfu Wu, Prasanna Balaprakash, Michael Kruse, Jaehoon Koo, Brice Videau, Paul Hovland, Valerie Taylor, Brad Geltz, Siddhartha Jana, Mary Hall
Abstract
As we enter the exascale computing era, efficiently utilizing power and optimizing the performance of scientific applications under power and energy constraints has become critical and challenging. We propose a low-overhead autotuning framework to autotune performance and energy for various hybrid MPI/OpenMP scientific applications at large scales and to explore the tradeoffs between application runtime and power/energy for energy-efficient application execution. We then use this framework to autotune four ECP proxy applications -- XSBench, AMG, SWFFT, and SW4lite. Our approach uses Bayesian optimization with a Random Forest surrogate model to effectively search parameter spaces with up to 6 million different configurations on two large-scale production systems, Theta at Argonne National Laboratory and Summit at Oak Ridge National Laboratory. The experimental results show that our autotuning framework has low overhead at large scales and achieves good scalability. Using the proposed autotuning framework to identify the best configurations, we achieve up to 91.59% performance improvement, up to 21.2% energy savings, and up to 37.84% EDP improvement on up to 4,096 nodes.
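The core loop of surrogate-based autotuning (fit a surrogate on measured configurations, then evaluate the configuration the surrogate predicts to be best) can be sketched as follows. The two-parameter cost surface and the greedy acquisition are illustrative stand-ins for the framework's real search space and acquisition function:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def runtime(cfg):
    """Stand-in for one application run: a hypothetical noisy 2-parameter
    cost surface (e.g. normalized thread count and block size)."""
    return (cfg[0] - 0.3) ** 2 + (cfg[1] - 0.7) ** 2 + rng.normal(0, 0.01)

# Initial random evaluations of the configuration space.
X = rng.random((10, 2))
y = np.array([runtime(c) for c in X])

for _ in range(20):
    # Fit the Random Forest surrogate on all measurements so far.
    surrogate = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
    # Greedy acquisition: evaluate the candidate predicted to be fastest.
    candidates = rng.random((1000, 2))
    best = candidates[np.argmin(surrogate.predict(candidates))]
    X = np.vstack([X, best])
    y = np.append(y, runtime(best))

best_cfg = X[np.argmin(y)]   # best configuration found
```

A real acquisition function would also weigh the surrogate's predictive uncertainty rather than its mean alone.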
Communication-Efficient Vertical Federated Learning with Limited Overlapping Samples
Authors: Jingwei Sun, Ziyue Xu, Dong Yang, Vishwesh Nath, Wenqi Li, Can Zhao, Daguang Xu, Yiran Chen, Holger R. Roth
Abstract
Federated learning is a popular collaborative learning approach that enables clients to train a global model without sharing their local data. Vertical federated learning (VFL) deals with scenarios in which the data on clients have different feature spaces but share some overlapping samples. Existing VFL approaches suffer from high communication costs and cannot deal efficiently with limited overlapping samples commonly seen in the real world. We propose a practical vertical federated learning (VFL) framework called \textbf{one-shot VFL} that can solve the communication bottleneck and the problem of limited overlapping samples simultaneously based on semi-supervised learning. We also propose \textbf{few-shot VFL} to improve the accuracy further with just one more communication round between the server and the clients. In our proposed framework, the clients only need to communicate with the server once or only a few times. We evaluate the proposed VFL framework on both image and tabular datasets. Our methods can improve the accuracy by more than 46.5\% and reduce the communication cost by more than 330$\times$ compared with state-of-the-art VFL methods when evaluated on CIFAR-10. Our code will be made publicly available at \url{https://nvidia.github.io/NVFlare/research/one-shot-vfl}.
A Machine Learning Outlook: Post-processing of Global Medium-range Forecasts
Authors: Shreya Agrawal, Rob Carver, Cenk Gazen, Eric Maddy, Vladimir Krasnopolsky, Carla Bromberg, Zack Ontiveros, Tyler Russell, Jason Hickey, Sid Boukabara
Subjects: Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph)
Abstract
Post-processing typically takes the outputs of a Numerical Weather Prediction (NWP) model and applies linear statistical techniques to produce improved localized forecasts, by including additional observations or by determining systematic errors at a finer scale. In this pilot study, we investigate the benefits and challenges of using non-linear neural network (NN) based methods to post-process multiple weather features -- temperature, moisture, wind, geopotential height, precipitable water -- at 30 vertical levels, globally and at lead times up to 7 days. We show that we can achieve accuracy improvements of up to 12% (RMSE) in a field such as temperature at 850hPa for a 7-day forecast. However, we recognize the need to strengthen foundational work on objectively measuring a sharp and correct forecast. We discuss the challenges of using standard metrics such as root mean squared error (RMSE) or anomaly correlation coefficient (ACC) as we move from linear statistical models to more complex non-linear machine learning approaches for post-processing global weather forecasts.
Operator learning with PCA-Net: upper and lower complexity bounds
Abstract
Neural operators are gaining attention in computational science and engineering. PCA-Net is a recently proposed neural operator architecture which combines principal component analysis (PCA) with neural networks to approximate an underlying operator. The present work develops approximation theory for this approach, improving and significantly extending previous work in this direction. In terms of qualitative bounds, this paper derives a novel universal approximation result, under minimal assumptions on the underlying operator and the data-generating distribution. In terms of quantitative bounds, two potential obstacles to efficient operator learning with PCA-Net are identified, and made rigorous through the derivation of lower complexity bounds; the first relates to the complexity of the output distribution, measured by a slow decay of the PCA eigenvalues. The other obstacle relates to the inherent complexity of the space of operators between infinite-dimensional input and output spaces, resulting in a rigorous and quantifiable statement of the curse of dimensionality. In addition to these lower bounds, upper complexity bounds are derived; first, a suitable smoothness criterion is shown to ensure an algebraic decay of the PCA eigenvalues. Then, it is shown that PCA-Net can overcome the general curse of dimensionality for specific operators of interest, arising from the Darcy flow and Navier-Stokes equations.
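A minimal sketch of the PCA-Net idea: project inputs and outputs onto PCA bases and learn a map between the coefficient spaces. Here the operator is a toy smoothing operator, and ordinary least squares stands in for the neural network that PCA-Net would actually use:

```python
import numpy as np

rng = np.random.default_rng(0)

def operator(u):
    """Toy operator to learn: a moving-average smoothing of the input."""
    return np.convolve(u, np.ones(5) / 5, mode="same")

# Sample input functions (random sine sums on a 64-point grid) and images.
x = np.linspace(0, 1, 64)
def sample_u():
    return sum(rng.normal() * np.sin((k + 1) * np.pi * x) / (k + 1)
               for k in range(8))

U = np.stack([sample_u() for _ in range(200)])
V = np.stack([operator(u) for u in U])

def pca_basis(A, d):
    """Mean and top-d principal directions of the rows of A."""
    _, _, Vt = np.linalg.svd(A - A.mean(axis=0), full_matrices=False)
    return A.mean(axis=0), Vt[:d]

mu_u, P_u = pca_basis(U, 10)   # input PCA basis
mu_v, P_v = pca_basis(V, 10)   # output PCA basis

# Map input PCA coefficients to output PCA coefficients.
A_coef = (U - mu_u) @ P_u.T
B_coef = (V - mu_v) @ P_v.T
W, *_ = np.linalg.lstsq(A_coef, B_coef, rcond=None)

def predict(u):
    return mu_v + ((u - mu_u) @ P_u.T) @ W @ P_v

u_test = sample_u()
err = (np.linalg.norm(predict(u_test) - operator(u_test))
       / np.linalg.norm(operator(u_test)))
```

Because this toy operator is linear and the inputs lie in a low-dimensional subspace, the PCA truncation is essentially lossless; the paper's lower bounds concern exactly the regimes where the PCA eigenvalues decay slowly and such truncation is costly.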
A Unified Single-stage Learning Model for Estimating Fiber Orientation Distribution Functions on Heterogeneous Multi-shell Diffusion-weighted MRI
Authors: Tianyuan Yao, Nancy Newlin, Praitayini Kanakaraj, Vishwesh Nath, Leon Y Cai, Karthik Ramadass, Kurt Schilling, Bennett A. Landman, Yuankai Huo
Abstract
Diffusion-weighted (DW) MRI measures the direction and scale of the local diffusion process in every voxel through its spectrum in q-space, typically acquired in one or more shells. Recent developments in micro-structure imaging and multi-tissue decomposition have sparked renewed attention to the radial b-value dependence of the signal. Applications in tissue classification and micro-architecture estimation, therefore, require a signal representation that extends over the radial as well as angular domain. Multiple approaches have been proposed that can model the non-linear relationship between the DW-MRI signal and biological microstructure. In the past few years, many deep learning-based methods have been developed towards faster inference speed and higher inter-scan consistency compared with traditional model-based methods (e.g., multi-shell multi-tissue constrained spherical deconvolution). However, a multi-stage learning strategy is typically required, since the learning process relies on intermediate representations such as the simple harmonic oscillator reconstruction (SHORE) representation. In this work, we present a unified dynamic network with a single-stage spherical convolutional neural network, which allows efficient fiber orientation distribution function (fODF) estimation through heterogeneous multi-shell diffusion MRI sequences. We study the Human Connectome Project (HCP) young adults with test-retest scans. The experimental results show that the proposed single-stage method outperforms prior multi-stage approaches in repeated fODF estimation with shell dropoff and single-shell DW-MRI sequences.
ProductAE: Toward Deep Learning Driven Error-Correction Codes of Large Dimensions
Authors: Mohammad Vahid Jamali, Hamid Saber, Homayoon Hatami, Jung Hyun Bae
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
While decades of theoretical research have led to the invention of several classes of error-correction codes, the design of such codes is an extremely challenging task, mostly driven by human ingenuity. Recent studies demonstrate that such designs can be effectively automated and accelerated via tools from machine learning (ML), thus enabling ML-driven classes of error-correction codes with promising performance gains compared to classical designs. A fundamental challenge, however, is that it is prohibitively complex, if not impossible, to design and train fully ML-driven encoder and decoder pairs for large code dimensions. In this paper, we propose Product Autoencoder (ProductAE) -- a computationally efficient family of deep learning driven (encoder, decoder) pairs -- aimed at enabling the training of relatively large codes (both encoder and decoder) with a manageable training complexity. We build upon ideas from classical product codes and propose constructing large neural codes using smaller code components. ProductAE boils down the complex problem of training the encoder and decoder for a large code dimension $k$ and blocklength $n$ to less-complex sub-problems of training encoders and decoders for smaller dimensions and blocklengths. Our training results show successful training of ProductAEs of dimensions as large as $k = 300$ bits with meaningful performance gains compared to state-of-the-art classical and neural designs. Moreover, we demonstrate excellent robustness and adaptivity of ProductAEs to channel models different from the ones used for training.
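The classical product construction that ProductAE builds on is easy to illustrate: arrange the message bits in a grid, encode every row with one component code, then every column with another. The sketch below uses a trivial single-parity-check component code in place of the paper's learned neural encoders, purely to show how a large codeword is assembled from small components:

```python
import numpy as np

def parity_encode(bits):
    """Toy component code: append a single even-parity bit (rate k/(k+1))."""
    bits = np.asarray(bits)
    return np.append(bits, bits.sum() % 2)

def product_encode(msg, k1, k2):
    """Classical product code: reshape a k1*k2-bit message into a grid,
    encode every row, then every column of the resulting array."""
    grid = np.asarray(msg).reshape(k1, k2)
    rows = np.stack([parity_encode(r) for r in grid])       # k1 x (k2+1)
    cols = np.stack([parity_encode(c) for c in rows.T]).T   # (k1+1) x (k2+1)
    return cols

cw = product_encode([1, 0, 1, 1, 0, 0], 2, 3)   # 6 info bits -> 3x4 codeword
```

Every row and every column of the resulting array is itself a valid component codeword, which is what lets the two small codes be trained (or decoded) separately.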
Learning Excavation of Rigid Objects with Offline Reinforcement Learning
Abstract
Autonomous excavation is a challenging task. The unknown contact dynamics between the excavator bucket and the terrain could easily result in large contact forces and jamming problems during excavation. Traditional model-based methods struggle to handle such problems due to complex dynamic modeling. In this paper, we formulate the excavation skills with three novel manipulation primitives. We propose to learn the manipulation primitives with offline reinforcement learning (RL) to avoid large amounts of online robot interactions. The proposed method can learn efficient penetration skills from sub-optimal demonstrations, which contain sub-trajectories that can be ``stitched'' together to formulate an optimal trajectory without causing jamming. We evaluate the proposed method with extensive experiments on excavating a variety of rigid objects and demonstrate that the learned policy outperforms the demonstrations. We also show that the learned policy can quickly adapt to unseen and challenging fragmented rocks with online fine-tuning.
Learning Complicated Manipulation Skills via Deterministic Policy with Limited Demonstrations
Authors: Liu Haofeng, Chen Yiwen, Tan Jiayi, Marcelo H Ang
Abstract
Combined with demonstrations, deep reinforcement learning can efficiently develop policies for manipulators. However, collecting sufficient high-quality demonstrations takes time in practice, and demonstrations from humans may be unsuitable for robots. The non-Markovian process and over-reliance on demonstrations are further challenges. For example, we found that RL agents are sensitive to demonstration quality in manipulation tasks and struggle to adapt to demonstrations given directly by humans. It is therefore challenging to leverage low-quality and insufficient demonstrations to help reinforcement learning train better policies; sometimes, limited demonstrations even lead to worse performance. We propose a new algorithm named TD3fG (TD3 learning from a generator) to solve these problems. It forms a smooth transition from learning from experts to learning from experience, which helps agents extract prior knowledge while reducing the detrimental effects of the demonstrations. Our algorithm performs well on Adroit manipulator and MuJoCo tasks with limited demonstrations.
Point2Pix: Photo-Realistic Point Cloud Rendering via Neural Radiance Fields
Authors: Tao Hu, Xiaogang Xu, Shu Liu, Jiaya Jia
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Synthesizing photo-realistic images from a point cloud is challenging because of the sparsity of point cloud representation. Neural Radiance Fields (NeRF) and its extensions have recently been proposed to synthesize realistic images from 2D input. In this paper, we present Point2Pix as a novel point renderer to link the 3D sparse point clouds with 2D dense image pixels. Taking advantage of the point cloud 3D prior and NeRF rendering pipeline, our method can synthesize high-quality images from colored point clouds, generally for novel indoor scenes. To improve the efficiency of ray sampling, we propose point-guided sampling, which focuses on valid samples. Also, we present Point Encoding to build Multi-scale Radiance Fields that provide discriminative 3D point features. Finally, we propose Fusion Encoding to efficiently synthesize high-quality images. Extensive experiments on the ScanNet and ArkitScenes datasets demonstrate the effectiveness and generalization of our method.
TriVol: Point Cloud Rendering via Triple Volumes
Authors: Tao Hu, Xiaogang Xu, Ruihang Chu, Jiaya Jia
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Existing learning-based methods for point cloud rendering adopt various 3D representations and feature querying mechanisms to alleviate the sparsity problem of point clouds. However, artifacts still appear in rendered images, due to the challenges in extracting continuous and discriminative 3D features from point clouds. In this paper, we present a dense yet lightweight 3D representation, named TriVol, that can be combined with NeRF to render photo-realistic images from point clouds. Our TriVol consists of triple slim volumes, each of which is encoded from the point cloud. TriVol has two advantages. First, it fuses the respective fields at different scales and thus extracts local and non-local features for a discriminative representation. Second, since the volume size is greatly reduced, inference with our 3D decoder is efficient, allowing us to increase the resolution of the 3D space to render more point details. Extensive experiments on different benchmarks with varying kinds of scenes/objects demonstrate our framework's effectiveness compared with current approaches. Moreover, our framework has excellent generalization ability to render a category of scenes/objects without fine-tuning.
Development of a deep learning-based tool to assist wound classification
Abstract
This paper presents a deep learning-based wound classification tool that can assist medical personnel not specialized in wound care to classify five key wound conditions, namely deep wound, infected wound, arterial wound, venous wound, and pressure wound, given color images captured using readily available cameras. The accuracy of the classification is vital for appropriate wound management. The proposed wound classification method adopts a multi-task deep learning framework that leverages the relationships among the five key wound conditions for a unified wound classification architecture. Using differences in Cohen's kappa coefficients as the metric to compare our proposed model with humans, the performance of our model was better than or non-inferior to that of all human medical personnel. Our convolutional neural network-based model is the first to classify the five tasks of deep, infected, arterial, venous, and pressure wounds simultaneously with good accuracy. The proposed model is compact and matches or exceeds the performance of human doctors and nurses. Medical personnel who do not specialize in wound care can potentially benefit from an app equipped with the proposed deep learning model.
RusTitW: Russian Language Text Dataset for Visual Text in-the-Wild Recognition
Authors: Igor Markov, Sergey Nesteruk, Andrey Kuznetsov, Denis Dimitrov
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Information surrounds people in modern life, and text is a very efficient medium that people have used for communication for centuries. However, automated text-in-the-wild recognition remains a challenging problem. The major limitation for a DL system is the lack of training data: for competitive performance, the training set must contain many samples that replicate real-world cases. While there are many high-quality datasets for English text recognition, no datasets are available for the Russian language. In this paper, we present a large-scale human-labeled dataset for Russian in-the-wild text recognition. We also publish a synthetic dataset and code to reproduce the generation process.
Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks
Authors: Haoqi Yuan, Chi Zhang, Hongcheng Wang, Feiyang Xie, Penglin Cai, Hao Dong, Zongqing Lu
Abstract
We study building a multi-task agent in Minecraft. Without human demonstrations, solving long-horizon tasks in this open-ended environment with reinforcement learning (RL) is extremely sample inefficient. To tackle the challenge, we decompose solving Minecraft tasks into learning basic skills and planning over the skills. We propose three types of fine-grained basic skills in Minecraft, and use RL with intrinsic rewards to accomplish basic skills with high success rates. For skill planning, we use Large Language Models to find the relationships between skills and build a skill graph in advance. When the agent is solving a task, our skill search algorithm walks on the skill graph and generates the proper skill plans for the agent. In experiments, our method accomplishes 24 diverse Minecraft tasks, many of which require sequentially executing more than 10 skills. Our method outperforms baselines in most tasks by a large margin. The project's website and code can be found at https://sites.google.com/view/plan4mc.
Efficient and Reconfigurable Optimal Planning in Large-Scale Systems Using Hierarchical Finite State Machines
Abstract
In this paper, we consider a planning problem for a large-scale system modelled as a hierarchical finite state machine (HFSM) and develop a control algorithm for computing optimal plans between any two states. The control algorithm consists of two steps: a preprocessing step computing optimal exit costs for each machine in the HFSM, with time complexity scaling linearly with the number of machines, and a query step that rapidly computes optimal plans, truncating irrelevant parts of the HFSM using the optimal exit costs, with time complexity scaling near-linearly with the depth of the HFSM. The control algorithm is reconfigurable in the sense that a change in the HFSM is efficiently handled, updating only needed parts in the preprocessing step to account for the change, with time complexity linear in the depth of the HFSM. We validate our algorithm on a robotic application, comparing with Dijkstra's algorithm and Contraction Hierarchies. Our algorithm outperforms both.
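The preprocessing/query split can be illustrated on a tiny two-level example: a shortest-path pass first computes the optimal exit cost of an inner machine, and the query then treats that whole machine as a single edge of that cost. This is a drastically simplified sketch of the idea with hypothetical state names, not the paper's full algorithm:

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path costs from source in a weighted transition graph
    given as {state: [(next_state, cost), ...]}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, s = heapq.heappop(heap)
        if d > dist.get(s, float("inf")):
            continue
        for t, w in graph.get(s, []):
            if d + w < dist.get(t, float("inf")):
                dist[t] = d + w
                heapq.heappush(heap, (d + w, t))
    return dist

# Preprocessing: optimal exit cost of an inner machine (entry -> exit).
inner = {"in": [("a", 1), ("b", 4)], "a": [("exit", 2)], "b": [("exit", 1)]}
exit_cost = dijkstra(inner, "in")["exit"]

# Query: the parent machine treats the inner machine as one edge of that
# cost, so planning never descends into the inner machine's states.
outer = {"start": [("inner", 2)], "inner": [("goal", exit_cost)]}
plan_cost = dijkstra(outer, "start")["goal"]
```

The payoff of the precomputation is that a change inside one inner machine only requires recomputing that machine's exit costs, not replanning over the flattened state space.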
Satisfiability of Non-Linear Transcendental Arithmetic as a Certificate Search Problem
Abstract
For typical first-order logical theories, satisfying assignments have a straightforward finite representation that can directly serve as a certificate that a given assignment satisfies the given formula. For non-linear real arithmetic with transcendental functions, however, no general finite representation of satisfying assignments is available. Hence, in this paper, we introduce a different form of satisfiability certificate for this theory, formulate the satisfiability verification problem as the problem of searching for such a certificate, and show how to perform this search in a systematic fashion. This not only eases the independent verification of results, but also allows the systematic design of new, efficient search techniques. Computational experiments document that the resulting method is able to prove satisfiability of a substantially higher number of benchmark problems than existing methods.
Abstract
Facial expression generation is one of the most challenging and long-sought aspects of character animation, with many interesting applications. This challenging task has traditionally relied heavily on digital craftspersons and remains largely unexplored. In this paper, we introduce a generative framework for generating 3D facial expression sequences (i.e. 4D faces) that can be conditioned on different inputs to animate an arbitrary 3D face mesh. It is composed of two tasks: (1) learning the generative model, which is trained over a set of 3D landmark sequences, and (2) generating 3D mesh sequences of an input facial mesh driven by the generated landmark sequences. The generative model is based on a Denoising Diffusion Probabilistic Model (DDPM), which has achieved remarkable success in generative tasks of other domains. While it can be trained unconditionally, its reverse process can still be conditioned by various condition signals. This allows us to efficiently develop several downstream tasks involving various forms of conditional generation, using expression labels, text, partial sequences, or simply a facial geometry. To obtain the full mesh deformation, we then develop a landmark-guided encoder-decoder to apply the geometric deformation embedded in the landmarks to a given facial mesh. Experiments show that our model has learned to generate realistic, high-quality expressions solely from a dataset of relatively small size, improving over the state-of-the-art methods. Videos and qualitative comparisons with other methods can be found at https://github.com/ZOUKaifeng/4DFM. Code and models will be made available upon acceptance.
AraSpot: Arabic Spoken Command Spotting
Authors: Mahmoud Salhab, Haidar Harmanani
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
Spoken keyword spotting (KWS) is the task of identifying a keyword in an audio stream and is widely used in smart devices at the edge to activate voice assistants and perform hands-free tasks. The task is daunting as there is a need, on the one hand, to achieve high accuracy, while at the same time ensuring that such systems continue to run efficiently on low-power devices with possibly limited computational capabilities. This work presents AraSpot for Arabic keyword spotting, trained on 40 Arabic keywords, using different online data augmentations and introducing the ConformerGRU model architecture. Finally, we further improve the performance of the model by training a text-to-speech model for synthetic data generation. AraSpot achieves a state-of-the-art (SOTA) 99.59% accuracy, outperforming previous approaches.
Optimizing Reconfigurable Intelligent Surfaces for Small Data Packets: A Subarray Approach
Authors: Anders Enqvist, Özlem Tuğfe Demir, Cicek Cavdar, Emil Björnson
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
In this paper, we examine the energy consumption of a user equipment (UE) when it transmits a finite-sized data packet. The receiving base station (BS) controls a reconfigurable intelligent surface (RIS) that can be utilized to improve the channel conditions, if additional pilot signals are transmitted to configure the RIS. We derive a formula for the energy consumption taking both the pilot and data transmission powers into account. By dividing the RIS into subarrays consisting of multiple RIS elements using the same reflection coefficient, the pilot overhead can be tuned to minimize the energy consumption while maintaining parts of the aperture gain. Our analytical results show that there exists an energy-minimizing subarray size. For small data blocks and when the channel conditions between the BS and UE are favorable compared to the path to the RIS, the energy consumption is minimized using large subarrays. When the channel conditions to the RIS are better and the data blocks are large, it is preferable to use fewer elements per subarray and potentially configure the elements individually.
Modeling online adaptive navigation in virtual environments based on PID control
Abstract
It is well known that locomotion-dominated navigation tasks can strongly provoke cybersickness. Past research has proposed numerous approaches to tackle this issue based on offline considerations. In this work, a novel approach to mitigate cybersickness is presented based on online adaptive navigation. Building on the Proportional-Integral-Derivative (PID) control method, we propose a mathematical model for online adaptive navigation, parameterized with several parameters, that takes as input the user's electro-dermal activity (EDA), an efficient indicator of the cybersickness level, and provides as output adapted navigation accelerations. Minimizing the cybersickness level is thus regarded as an optimization problem: finding the PID model parameters that reduce the severity of cybersickness. User studies were organized to collect non-adapted navigation accelerations and the corresponding EDA signals. A deep neural network was then trained to learn the correlation between EDA and navigation accelerations. The hyperparameters of the network were obtained through the Optuna open-source framework. To validate the performance of the optimized online adaptive navigation developed through PID control, we performed an analysis in a simulated user study based on the pre-trained deep neural network. Results indicate a significant reduction of cybersickness in terms of EDA signal analysis and motion sickness dose value. This is pioneering work that presents a systematic strategy for adaptive navigation settings from a theoretical standpoint.
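The underlying PID mechanism is standard: the control output combines a term proportional to the error, its integral, and its derivative. A textbook discrete-time sketch, driving a toy first-order signal toward a setpoint (the gains, time step, and plant are illustrative, not the paper's fitted parameters):

```python
class PID:
    """Textbook discrete PID controller:
    u = Kp*e + Ki*integral(e) + Kd*de/dt."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)

# Drive a measured signal (e.g. an EDA-based sickness indicator)
# toward a target level by adapting the control output each step.
pid = PID(kp=0.8, ki=0.2, kd=0.05, dt=0.1)
level, setpoint = 1.0, 0.0
for _ in range(200):
    level += pid.step(setpoint - level) * 0.1   # toy first-order plant
```

In the paper's setting, the "plant" is the user's physiological response and the controller output modulates navigation accelerations rather than a scalar signal.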
Learning Augmented, Multi-Robot Long-Horizon Navigation in Partially Mapped Environments
Abstract
We present a novel approach for efficient and reliable goal-directed long-horizon navigation for a multi-robot team in a structured, unknown environment by predicting statistics of unknown space. Building on recent work in learning-augmented model-based planning under uncertainty, we introduce a high-level state and action abstraction that lets us approximate the challenging Dec-POMDP as a tractable stochastic MDP. Our Multi-Robot Learning over Subgoals Planner (MR-LSP) guides agents towards coordinated exploration of regions more likely to reach the unseen goal. We demonstrate improvement in cost against other multi-robot strategies; in simulated office-like environments, we show that our approach saves 13.29% (2 robots) and 4.6% (3 robots) in average cost versus standard non-learned optimistic planning and a learning-informed baseline.
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Abstract
Scale is the primary factor for building a powerful foundation model that could well generalize to a variety of downstream tasks. However, it is still challenging to train video foundation models with billions of parameters. This paper shows that video masked autoencoder (VideoMAE) is a scalable and general self-supervised pre-trainer for building video foundation models. We scale the VideoMAE in both model and data with a core design. Specifically, we present a dual masking strategy for efficient pre-training, with an encoder operating on a subset of video tokens and a decoder processing another subset of video tokens. Although VideoMAE is already very efficient due to the high masking ratio in the encoder, masking the decoder can still further reduce the overall computational cost. This enables the efficient pre-training of billion-level models in video. We also use a progressive training paradigm that involves an initial pre-training on a diverse multi-sourced unlabeled dataset, followed by a post-pre-training on a mixed labeled dataset. Finally, we successfully train a video ViT model with a billion parameters, which achieves new state-of-the-art performance on the datasets of Kinetics (90.0% on K400 and 89.9% on K600) and Something-Something (68.7% on V1 and 77.0% on V2). In addition, we extensively verify the pre-trained video ViT models on a variety of downstream tasks, demonstrating their effectiveness as general video representation learners.
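The dual masking idea is mostly about index bookkeeping: the encoder sees only a small visible subset of tokens, and the decoder is supervised on a subset of the masked tokens rather than all of them. A sketch with random subsets as a stand-in for the paper's actual masking strategies (the keep ratios below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def dual_masking(num_tokens, encoder_keep=0.1, decoder_keep=0.5):
    """Pick the token subsets for dual masking: the encoder sees a small
    visible subset; the decoder reconstructs only a subset of the masked
    tokens instead of all of them, cutting decoder compute."""
    perm = rng.permutation(num_tokens)
    n_enc = int(num_tokens * encoder_keep)
    visible = perm[:n_enc]            # tokens fed to the encoder
    masked = perm[n_enc:]             # everything else is masked out
    n_dec = int(num_tokens * decoder_keep)
    decoder_targets = rng.permutation(masked)[:n_dec]
    return visible, decoder_targets

# e.g. a 16-frame clip tokenized into 1568 space-time tokens
visible, targets = dual_masking(1568)
```

The compute saving comes directly from the sizes of these index sets: attention cost grows with the number of tokens each module actually processes.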
Predictive Resource Allocation in mmWave Systems with Rotation Detection
Authors: Yifei Sun, Bojie Lv, Rui Wang, Haisheng Tan, Francis C. M. Lau
Abstract
Millimeter wave (mmWave) has been regarded as a promising technology to support high-capacity communications in the 5G era. However, its high-layer performance, such as long-term latency and packet drop rate, highly depends on resource allocation, because the mmWave channel suffers significant fluctuation for rotating users due to the sparse mmWave channel and the limited field-of-view (FoV) of antenna arrays. In this paper, downlink transmission scheduling considering the rotation of user equipment (UE) and the limited antenna FoV in an mmWave system is optimized via a novel approximate Markov decision process (MDP) method. Specifically, we consider the joint downlink UE selection and power allocation over a number of frames in which the future orientations of rotating UEs can be predicted via embedded motion sensors. The problem is formulated as a finite-horizon MDP with non-stationary state transition probabilities. A novel low-complexity solution framework is proposed via one iteration step over a base policy whose average future cost can be predicted with analytical expressions. Simulations demonstrate that, compared with existing benchmarks, the proposed scheme can schedule the downlink transmission and suppress the packet drop rate efficiently over non-stationary mmWave links.
Improving Code Generation by Training with Natural Language Feedback
Authors: Angelica Chen, Jérémy Scheurer, Tomasz Korbak, Jon Ander Campos, Jun Shern Chan, Samuel R. Bowman, Kyunghyun Cho, Ethan Perez
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
The potential for pre-trained large language models (LLMs) to use natural language feedback at inference time has been an exciting recent development. We build upon this observation by formalizing an algorithm for learning from natural language feedback at training time instead, which we call Imitation learning from Language Feedback (ILF). ILF requires only a small amount of human-written feedback during training and does not require the same feedback at test time, making it both user-friendly and sample-efficient. We further show that ILF can be seen as a form of minimizing the KL divergence to the ground truth distribution and demonstrate a proof-of-concept on a neural program synthesis task. We use ILF to improve a Codegen-Mono 6.1B model's pass@1 rate by 38% relative (and 10% absolute) on the Mostly Basic Python Problems (MBPP) benchmark, outperforming both fine-tuning on MBPP and fine-tuning on repaired programs written by humans. Overall, our results suggest that learning from human-written natural language feedback is both more effective and sample-efficient than training exclusively on demonstrations for improving an LLM's performance on code generation tasks.
Judicial Intelligent Assistant System: Extracting Events from Divorce Cases to Detect Disputes for the Judge
Authors: Yuan Zhang, Chuanyi Li, Yu Sheng, Jidong Ge, Bin Luo
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract
In the formal procedure of civil cases, the textual materials provided by the different parties describe the development of the case. Extracting the key information from these materials and clarifying the dispute focus of the related parties is a difficult but necessary task. Currently, officers read the materials manually and use methods such as keyword search and regular-expression matching to find the target information. These approaches are time-consuming and depend heavily on the prior knowledge and carefulness of the officers. To help officers improve their working efficiency and accuracy, we propose an approach to detect disputes in divorce cases based on a two-round-labeling event extraction technique. We implement the Judicial Intelligent Assistant (JIA) system according to the proposed approach to 1) automatically extract focus events from divorce case materials, 2) align events by identifying co-references among them, and 3) detect conflicts between events brought by the plaintiff and the defendant. With the JIA system, it is convenient for judges to determine the disputed issues. Experimental results demonstrate that the proposed approach and system identify the focus of cases and detect conflicts more effectively and efficiently than existing methods.
Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture
Authors: Peiyu Liu, Ze-Feng Gao, Wayne Xin Zhao, Ji-Rong Wen
Abstract
In this paper, we propose a highly parameter-efficient approach to scaling pre-trained language models (PLMs) to a deeper model depth. Unlike prior work that shares all parameters or uses extra blocks, we design a more capable parameter-sharing architecture based on the matrix product operator (MPO). MPO decomposition reorganizes and factorizes the information of a parameter matrix into two parts: a major part that contains the principal information (the central tensor) and a supplementary part that holds only a small proportion of the parameters (the auxiliary tensors). Based on this decomposition, our architecture shares the central tensor across all layers to reduce the model size, while keeping layer-specific auxiliary tensors (complemented by adapters) to enhance adaptation flexibility. To improve model training, we further propose a stable initialization algorithm tailored to the MPO-based architecture. Extensive experiments demonstrate the effectiveness of the proposed model in reducing the model size while achieving highly competitive performance.
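The parameter-accounting benefit of sharing a central component across layers can be sketched numerically. The snippet below is a simplified stand-in (a shared dense matrix plus per-layer low-rank corrections), not the paper's actual MPO tensor algebra; dimensions and rank are illustrative assumptions.

```python
import numpy as np

# Schematic of cross-layer parameter sharing: each layer's weight is rebuilt
# from one shared central component plus a small layer-specific auxiliary
# component. Hypothetical dimensions, not the paper's MPO decomposition.
D, RANK, N_LAYERS = 256, 8, 12

rng = np.random.default_rng(0)
central = rng.standard_normal((D, D))            # shared across all layers
aux = [(rng.standard_normal((D, RANK)),          # per-layer low-rank factors
        rng.standard_normal((RANK, D))) for _ in range(N_LAYERS)]

def layer_weight(l):
    u, v = aux[l]
    return central + u @ v                       # shared core + small correction

shared_params = central.size + sum(u.size + v.size for u, v in aux)
unshared_params = N_LAYERS * D * D               # baseline: independent layers
print(shared_params, unshared_params)
```

With these toy numbers the shared scheme stores well under a fifth of the baseline's parameters while still producing a distinct full-size weight per layer.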
ACO-tagger: A Novel Method for Part-of-Speech Tagging using Ant Colony Optimization
Authors: Amirhossein Mohammadi, Sara Hajiaghajani, Mohammad Bahrani
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Abstract
Swarm intelligence algorithms have gained significant attention in recent years as a means of solving complex and non-deterministic problems. These algorithms are inspired by the collective behavior of natural creatures, which they simulate to develop intelligent agents for computational tasks. One such algorithm is Ant Colony Optimization (ACO), inspired by the foraging behavior of ants and their pheromone-laying mechanism; it is used to solve difficult problems that are discrete and combinatorial in nature. Part-of-speech (POS) tagging is a fundamental task in natural language processing that aims to assign a part-of-speech role to each word in a sentence. In this paper, we propose a high-performance ACO-based POS tagging method called ACO-tagger. It achieved a high accuracy rate of 96.867%, outperforming several state-of-the-art methods. The proposed method is fast and efficient, making it a viable option for practical applications.
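A minimal sketch of how ACO can be applied to tagging: ants walk the sentence left to right, choosing a tag per word with probability proportional to pheromone times a heuristic score, and the best path is reinforced. The lexicon, scores, and parameters below are toy assumptions, not ACO-tagger's actual design.

```python
import random

# Toy ACO for POS tagging. LEXICON acts as the heuristic: how plausible each
# tag is for each word (hypothetical values for illustration only).
LEXICON = {
    "the":   {"DET": 0.9, "NOUN": 0.1},
    "dog":   {"NOUN": 0.8, "VERB": 0.2},
    "barks": {"VERB": 0.7, "NOUN": 0.3},
}
TAGS = ["DET", "NOUN", "VERB"]

def aco_tag(words, n_ants=20, n_iters=30, rho=0.1, seed=0):
    rng = random.Random(seed)
    pher = {(i, t): 1.0 for i in range(len(words)) for t in TAGS}
    best, best_score = None, -1.0
    for _ in range(n_iters):
        for _ in range(n_ants):
            path, score = [], 1.0
            for i, w in enumerate(words):
                weights = [pher[(i, t)] * LEXICON[w].get(t, 1e-3) for t in TAGS]
                tag = rng.choices(TAGS, weights=weights)[0]
                path.append(tag)
                score *= LEXICON[w].get(tag, 1e-3)
            if score > best_score:
                best, best_score = path, score
        for k in pher:                        # pheromone evaporation
            pher[k] *= (1.0 - rho)
        for i, t in enumerate(best):          # reinforce the best path found
            pher[(i, t)] += best_score
    return best

print(aco_tag(["the", "dog", "barks"]))
```

On this toy sentence the colony converges to the highest-scoring tag sequence.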
Computationally Efficient Labeling of Cancer Related Forum Posts by Non-Clinical Text Information Retrieval
Authors: Jimmi Agerskov, Kristian Nielsen, Christian Marius Lillelund, Christian Fischer Pedersen
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Abstract
An abundance of information about cancer exists online, but categorizing and extracting useful information from it is difficult. Almost all research on healthcare data processing concerns formal clinical data, but there is valuable information in non-clinical data too. The present study combines methods from distributed computing, text retrieval, clustering, and classification into a coherent and computationally efficient system that can clarify cancer patient trajectories based on non-clinical, freely available information. We produce a fully functional prototype that can retrieve, cluster, and present information about cancer trajectories from non-clinical forum posts. We evaluate three clustering algorithms (MR-DBSCAN, DBSCAN, and HDBSCAN) and compare them in terms of Adjusted Rand Index and total run time as a function of the number of posts retrieved and the neighborhood radius. Clustering results show that the neighborhood radius has the most significant impact on clustering performance: for small values the data set is split accordingly, while high values produce a large number of possible partitions, making the search for the best partition time-consuming. With a properly estimated radius, MR-DBSCAN can cluster 50,000 forum posts in 46.1 seconds, compared to DBSCAN (143.4 s) and HDBSCAN (282.3 s). We conducted an interview with the Danish Cancer Society and presented our software prototype. The organization sees potential in software that can democratize online information about cancer and foresees that such systems will be required in the future.
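The study's central observation, that the neighborhood radius (eps) dominates how DBSCAN partitions the data, can be reproduced on a handful of 2D points with a compact DBSCAN. The points and parameters below are toy assumptions, not the forum-post corpus.

```python
import math

# Compact DBSCAN illustrating the effect of the neighborhood radius (eps).
def dbscan(points, eps, min_pts=2):
    labels = [None] * len(points)            # None = unvisited, -1 = noise
    def neighbors(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1                   # noise for now (may become border)
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:              # noise reached from a core point
                labels[j] = cluster          # relabel as border, don't expand
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors(j)) >= min_pts:  # j is itself a core point
                queue.extend(neighbors(j))
    return labels

pts = [(0, 0), (0.2, 0), (0.1, 0.2), (5, 5), (5.1, 5.2)]
print(dbscan(pts, eps=0.5))    # small radius: two tight clusters
print(dbscan(pts, eps=10.0))   # large radius: everything merges into one
```

The same data yield two clusters or one depending solely on eps, mirroring the radius sensitivity reported in the abstract.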
Maximin Headway Control of Automated Vehicles for System Optimal Dynamic Traffic Assignment in General Networks
Abstract
This study develops a headway control framework for a fully automated road network, as we believe the headway of Automated Vehicles (AVs) is a factor influencing traffic dynamics in addition to conventional vehicle behaviors (e.g., route and departure time choices). Specifically, we aim to find the optimal time headway between AVs on each link that achieves network-wide system optimal dynamic traffic assignment (SO-DTA). To this end, a headway-dependent fundamental diagram (HFD) and a headway-dependent double queue model (HDQ) are developed to model the effect of dynamic headway on roads, and a dynamic network model is built. It is rigorously proved that the minimum headway always achieves SO-DTA, yet the optimal headway is non-unique. Motivated by these two findings, this study defines the novel concept of maximin headway, the largest headway that still achieves SO-DTA in the network. Mathematical properties of the maximin headway are analyzed and an efficient solution algorithm is developed. Numerical experiments on both a small and a large network verify the effectiveness of the maximin headway control framework as well as the properties of the maximin headway. This study sheds light on selecting the desired solution among the non-unique solutions of SO-DTA and provides implications for the safety margin of AVs under SO-DTA.
Dispersion relation reconstruction for 2D Photonic Crystals based on polynomial interpolation
Authors: Yueqi Wang, Guanglian Li, Richard Craster
Abstract
The dispersion relation describes how a wave's frequency depends on its wave vector as the wave passes through a material. It characterizes the properties of the material and is therefore critical, but reconstructing it is time-consuming and expensive. To address this bottleneck, we propose an efficient dispersion relation reconstruction scheme based on global polynomial interpolation for approximating 2D photonic band functions. Our method relies on the fact that the band functions are piecewise analytic with respect to the wave vector in the first Brillouin zone. We select suitable sampling points in the first Brillouin zone, solve the eigenvalue problem involved in the band function calculation at those points, and then employ Lagrange interpolation to approximate the band functions over the whole first Brillouin zone. Numerical results show that the proposed method significantly improves computational efficiency.
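The core numerical idea, sampling an analytic function at a few well-chosen points and rebuilding it everywhere by Lagrange interpolation, can be demonstrated in 1D. A smooth toy function stands in for a band function here, and Chebyshev nodes are our own choice for stability, not necessarily the paper's sampling strategy.

```python
import numpy as np

# Lagrange interpolation of a smooth (analytic) function from a few samples,
# as a 1D stand-in for approximating a photonic band function.
def lagrange_eval(xs, ys, x):
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        w = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                w *= (x - xj) / (xi - xj)
        total += yi * w
    return total

band = np.cos                                   # toy "band function"
n = 12
nodes = np.cos((2 * np.arange(n) + 1) / (2 * n) * np.pi)  # Chebyshev on [-1,1]
samples = band(nodes)                           # "eigenvalue solves" at nodes

grid = np.linspace(-1, 1, 200)
err = max(abs(lagrange_eval(nodes, samples, x) - band(x)) for x in grid)
print(err)
```

Because the function is analytic, a dozen samples already give near machine-precision accuracy everywhere, which is exactly why few eigenvalue solves suffice on each analytic piece.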
On real and observable realizations of input-output equations
Authors: Sebastian Falkensteiner, Dmitrii Pavlov, Rafael Sendra
Subjects: Symbolic Computation (cs.SC); Algebraic Geometry (math.AG); Dynamical Systems (math.DS); Optimization and Control (math.OC)
Abstract
Given a single algebraic input-output equation, we present a method for finding different representations of the associated system in the form of rational realizations; these are dynamical systems with rational right-hand sides. It has been shown that if the input-output equation is of order one, rational realizations can be computed, if they exist. In this work, we focus first on the existence and actual computation of so-called observable rational realizations, and second on rational realizations with real coefficients. The study of observable realizations allows us to find every rational realization of a given first-order input-output equation, along with the field extensions necessary in this process. We show that for first-order input-output equations the existence of a rational realization is equivalent to the existence of an observable rational realization. Moreover, we give a criterion for deciding the existence of real rational realizations. The computation of observable and real realizations of first-order input-output equations is fully algorithmic. We also present partial results for higher-order input-output equations.
Adaptive Superpixel for Active Learning in Semantic Segmentation
Authors: Hoyoung Kim, Minhyeon Oh, Sehyun Hwang, Suha Kwak, Jungseul Ok
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Learning semantic segmentation requires pixel-wise annotations, which can be time-consuming and expensive. To reduce the annotation cost, we propose a superpixel-based active learning (AL) framework, which collects a dominant label per superpixel instead. To be specific, it consists of adaptive superpixel and sieving mechanisms, fully dedicated to AL. At each round of AL, we adaptively merge neighboring pixels of similar learned features into superpixels. We then query a selected subset of these superpixels using an acquisition function assuming no uniform superpixel size. This approach is more efficient than existing methods, which rely only on innate features such as RGB color and assume uniform superpixel sizes. Obtaining a dominant label per superpixel drastically reduces annotators' burden as it requires fewer clicks. However, it inevitably introduces noisy annotations due to mismatches between superpixel and ground truth segmentation. To address this issue, we further devise a sieving mechanism that identifies and excludes potentially noisy annotations from learning. Our experiments on both Cityscapes and PASCAL VOC datasets demonstrate the efficacy of adaptive superpixel and sieving mechanisms.
Multi-View Keypoints for Reliable 6D Object Pose Estimation
Authors: Alan Li, Angela P. Schoellig
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
6D object pose estimation is a fundamental component in robotics, enabling efficient interaction with the environment. It is particularly challenging in bin-picking applications, where many objects are low-feature and reflective, and self-occlusion between objects of the same type is common. We propose a novel multi-view approach leveraging known camera transformations from an eye-in-hand setup to combine heatmap and keypoint estimates into a probability density map over 3D space. The result is a robust approach that is scalable in the number of views. It relies on a confidence score composed of keypoint probabilities and point-cloud alignment error, which allows reliable rejection of false positives. We demonstrate an average pose estimation error of approximately 0.5 mm and 2 degrees across a variety of difficult low-feature and reflective objects in the ROBI dataset, while also surpassing the state-of-the-art correct detection rate, measured using the 10% object diameter threshold on ADD error.
Full-Range Approximation for the Theis Well Function Using Ramanujan's Series and Bounds for the Exponential Integral
Authors: Manotosh Kumbhakar, Vijay P. Singh
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Abstract
The solution of the governing equation for drawdown in a horizontal confined aquifer with unsteady groundwater flow is given in terms of the exponential integral, famously known as the Well function. For computing this function in practical applications, it is important to develop an approximation that is not only accurate but also simple, requiring the evaluation of as few terms as possible. To that end, this work proposes a full-range approximation to the exponential integral, using Ramanujan's series for small arguments (u \leq 1) and an approximation based on bounds of the integral for the remaining range (u \in (1,100]). The proposed approximation yields more accurate formulae than those in existing studies, with a maximum percentage error of 0.05\%. Further, the proposed formula is much simpler to apply, as it contains just a product of exponential and logarithm functions. To further assess the efficiency of the proposed approximation, we consider a practical example of evaluating the discrete pumping kernel, which demonstrates the superiority of this approximation over the others. The authors hope that the proposed efficient approximation will be useful for groundwater and hydrogeological applications.
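For context, the Well function is the exponential integral, W(u) = E1(u). As a hedged illustration of series-based approximation for small arguments, here is the classical convergent series (not the paper's Ramanujan-based or bound-based formulae): E1(u) = -γ - ln u + Σ (-1)^{n+1} u^n / (n · n!).

```python
import math

# Classical small-argument series for the Well function W(u) = E1(u).
EULER_GAMMA = 0.5772156649015329

def well_function(u, terms=30):
    s = -EULER_GAMMA - math.log(u)
    sign = 1.0
    for n in range(1, terms + 1):
        s += sign * u**n / (n * math.factorial(n))
        sign = -sign
    return s

# Tabulated values: E1(1) ~= 0.219384, E1(0.5) ~= 0.559774.
print(well_function(1.0), well_function(0.5))
```

The series converges quickly for u <= 1, which is why a different treatment (such as the bound-based formula in the abstract) is needed for larger arguments.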
CheckerPose: Progressive Dense Keypoint Localization for Object Pose Estimation with Graph Neural Network
Authors: Ruyi Lian, Haibin Ling
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Estimating the 6-DoF pose of a rigid object from a single RGB image is a crucial yet challenging task. Recent studies have shown the great potential of dense correspondence-based solutions, yet improvements are still needed to reach practical deployment. In this paper, we propose a novel pose estimation algorithm named CheckerPose, which improves on three main aspects. First, CheckerPose densely samples 3D keypoints from the surface of the 3D object and finds their 2D correspondences progressively in the 2D image. Compared to previous solutions that conduct dense sampling in the image space, our strategy enables correspondence search in a 2D grid (i.e., pixel coordinates). Second, for our 3D-to-2D correspondences, we design a compact binary code representation for 2D image locations. This representation not only allows for progressive correspondence refinement but also converts correspondence regression into a more efficient classification problem. Third, we adopt a graph neural network to explicitly model the interactions among the sampled 3D keypoints, further boosting the reliability and accuracy of the correspondences. Together, these novel components make CheckerPose a strong pose estimation algorithm. When evaluated on the popular Linemod, Linemod-O, and YCB-V object pose estimation benchmarks, CheckerPose clearly boosts the accuracy of correspondence-based methods and achieves state-of-the-art performance.
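The idea of representing a 2D image location as a compact binary code, so that regression becomes a sequence of binary classifications refinable coarse-to-fine, can be sketched as follows. The grid size and the plain per-axis encoding are our own illustrative assumptions, not CheckerPose's exact representation.

```python
# Encode a pixel coordinate as a fixed-length binary code, most significant
# bit first, so early bits locate the point coarsely and later bits refine it.
BITS = 6                       # hypothetical grid of 2**6 = 64 cells per axis

def encode(x, y):
    bx = [(x >> (BITS - 1 - i)) & 1 for i in range(BITS)]
    by = [(y >> (BITS - 1 - i)) & 1 for i in range(BITS)]
    return bx + by             # BITS bits for x followed by BITS bits for y

def decode(code):
    bx, by = code[:BITS], code[BITS:]
    x = sum(b << (BITS - 1 - i) for i, b in enumerate(bx))
    y = sum(b << (BITS - 1 - i) for i, b in enumerate(by))
    return x, y

print(decode(encode(37, 52)))  # round-trip is exact
```

Predicting each bit is a binary classification, and truncating the code to its leading bits yields a progressively coarser cell containing the target location.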
Keyword: faster
An EMO Joint Pruning with Multiple Sub-networks: Fast and Effect
Abstract
The network pruning algorithm based on evolutionary multi-objective optimization (EMO) can balance the pruning rate and the performance of the network. However, its population-based nature often suffers from the complex pruning optimization space and the highly resource-consuming process of verifying pruned structures, which limits its application. To this end, this paper proposes an EMO joint pruning with multiple sub-networks (EMO-PMS) to reduce space complexity and resource consumption. First, a divide-and-conquer EMO network pruning framework is proposed, which decomposes the complex EMO pruning task on the whole network into easier sub-tasks on multiple sub-networks. On the one hand, this decomposition shrinks the pruning optimization space and decreases the optimization difficulty; on the other hand, the smaller network structures converge faster, so the computational resource consumption of the proposed algorithm is lower. Second, a sub-network training method based on cross-network constraints is designed so that each sub-network can process the features generated by the previous one through feature constraints. This method allows independently optimized sub-networks to collaborate better and improves the overall performance of the pruned network. Finally, a multiple sub-network joint pruning method based on EMO is proposed. For one thing, it can accurately measure the feature processing capability of the sub-networks with the pre-trained feature selector; for another, it can combine multi-objective pruning results on multiple sub-networks through global performance-impairment ranking to design a joint pruning scheme. The proposed algorithm is validated on three datasets of differing difficulty. Compared with fifteen advanced pruning algorithms, the experimental results demonstrate the effectiveness and efficiency of the proposed algorithm.
Accelerated wind farm yaw and layout optimisation with multi-fidelity deep transfer learning wake models
Authors: Sokratis Anagnostopoulos, Jens Bauer, Mariana C. A. Clare, Matthew D. Piggott
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Atmospheric and Oceanic Physics (physics.ao-ph)
Abstract
Wind farm modelling has been an area of rapidly increasing interest, with numerous analytical as well as computational approaches developed to extend the margins of wind farm efficiency and maximise power production. In this work, we present the novel ML framework WakeNet, which can reproduce generalised 2D turbine wake velocity fields at hub height over a wide range of yaw angles, wind speeds and turbulence intensities (TIs), with a mean accuracy of 99.8% compared to the solution calculated using the state-of-the-art wind farm modelling software FLORIS. As the generation of sufficient high-fidelity data for network training can be cost-prohibitive, the utility of multi-fidelity transfer learning has also been investigated. Specifically, a network pre-trained on the low-fidelity Gaussian wake model is fine-tuned to obtain accurate wake results for the mid-fidelity Curl wake model. The robustness and overall performance of WakeNet on various wake-steering control and layout optimisation scenarios has been validated through power-gain heatmaps, obtaining at least 90% of the power gained through optimisation performed with FLORIS directly. We also demonstrate that when utilising the Curl model, WakeNet is able to provide similar power gains to FLORIS, two orders of magnitude faster (e.g. 10 minutes vs 36 hours per optimisation case). The wake evaluation time of WakeNet when trained on a high-fidelity CFD dataset is expected to be similar, further increasing the computational time gains. These promising results show that generalised wake modelling with ML tools can be accurate enough to contribute towards active yaw and layout optimisation, while producing realistic optimised configurations at a fraction of the computational cost, hence making it feasible to perform real-time active yaw control as well as robust optimisation under uncertainty.
FMAS: Fast Multi-Objective SuperNet Architecture Search for Semantic Segmentation
Authors: Zhuoran Xiong, Marihan Amein, Olivier Therrien, Warren J. Gross, Brett H. Meyer
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
We present FMAS, a fast multi-objective neural architecture search framework for semantic segmentation. FMAS subsamples the structure and pre-trained parameters of DeepLabV3+, without fine-tuning, dramatically reducing training time during search. To further reduce candidate evaluation time, we use a subset of the validation dataset during the search. Only the final, Pareto non-dominated candidates are ultimately fine-tuned using the complete training set. We evaluate FMAS by searching for models that effectively trade accuracy against computational cost on the PASCAL VOC 2012 dataset. FMAS finds competitive designs quickly, e.g., taking just 0.5 GPU-days to discover a DeepLabV3+ variant that reduces FLOPs and parameters by 10$\%$ and 20$\%$, respectively, for less than 3$\%$ increased error. We also search on an edge device called GAP8, using its latency as the metric, where FMAS finds a 2.2$\times$ faster network with a 7.61$\%$ mIoU loss.
A Unified Single-stage Learning Model for Estimating Fiber Orientation Distribution Functions on Heterogeneous Multi-shell Diffusion-weighted MRI
Authors: Tianyuan Yao, Nancy Newlin, Praitayini Kanakaraj, Vishwesh Nath, Leon Y Cai, Karthik Ramadass, Kurt Schilling, Bennett A. Landman, Yuankai Huo
Abstract
Diffusion-weighted (DW) MRI measures the direction and scale of the local diffusion process in every voxel through its spectrum in q-space, typically acquired in one or more shells. Recent developments in microstructure imaging and multi-tissue decomposition have sparked renewed attention to the radial b-value dependence of the signal. Applications in tissue classification and micro-architecture estimation therefore require a signal representation that extends over the radial as well as the angular domain. Multiple approaches have been proposed that can model the non-linear relationship between the DW-MRI signal and biological microstructure. In the past few years, many deep learning-based methods have been developed with faster inference speed and higher inter-scan consistency than traditional model-based methods (e.g., multi-shell multi-tissue constrained spherical deconvolution). However, a multi-stage learning strategy is typically required, since the learning process relies on various intermediate representations, such as the simple harmonic oscillator reconstruction (SHORE) representation. In this work, we present a unified dynamic network with a single-stage spherical convolutional neural network, which allows efficient fiber orientation distribution function (fODF) estimation through heterogeneous multi-shell diffusion MRI sequences. We study Human Connectome Project (HCP) young adults with test-retest scans. The experimental results show that the proposed single-stage method outperforms prior multi-stage approaches in repeated fODF estimation with shell dropoff and single-shell DW-MRI sequences.
Hard Regularization to Prevent Collapse in Online Deep Clustering without Data Augmentation
Abstract
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed. While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster. Successful existing models have employed various techniques to avoid this problem, most of which require data augmentation or which aim to make the average soft assignment across the dataset the same for each cluster. We propose a method that does not require data augmentation, and that, differently from existing methods, regularizes the hard assignments. Using a Bayesian framework, we derive an intuitive optimization objective that can be straightforwardly included in the training of the encoder network. Tested on four image datasets, we show that it consistently avoids collapse more robustly than other methods and that it leads to more accurate clustering. We also conduct further experiments and analyses justifying our choice to regularize the hard cluster assignments.
Runtime Verification of Self-Adaptive Systems with Changing Requirements
Authors: Marc Carwehl, Thomas Vogel, Genaína Nunes Rodrigues, Lars Grunske
Abstract
To accurately make adaptation decisions, a self-adaptive system needs precise means to analyze itself at runtime. To this end, runtime verification can be used in the feedback loop to check that the managed system satisfies its requirements formalized as temporal-logic properties. These requirements, however, may change due to system evolution or uncertainty in the environment, managed system, and requirements themselves. Thus, the properties under investigation by the runtime verification have to be dynamically adapted to represent the changing requirements while preserving the knowledge about requirements satisfaction gathered thus far, all with minimal latency. To address this need, we present a runtime verification approach for self-adaptive systems with changing requirements. Our approach uses property specification patterns to automatically obtain automata with precise semantics that are the basis for runtime verification. The automata can be safely adapted during runtime verification while preserving intermediate verification results to seamlessly reflect requirement changes and enable continuous verification. We evaluate our approach on an Arduino prototype of the Body Sensor Network and the Timescales benchmark. Results show that our approach is over five times faster than the typical approach of redeploying and restarting runtime monitors to reflect requirements changes, while improving the system's trustworthiness by avoiding interruptions of verification.
Cyber Security aboard Micro Aerial Vehicles: An OpenTitan-based Visual Communication Use Case
Abstract
Autonomous Micro Aerial Vehicles (MAVs), with a form factor of 10 cm in diameter, are an emerging technology thanks to the broad applicability enabled by their onboard intelligence. However, these platforms are strongly limited in their onboard power envelope for processing, i.e., less than a few hundred mW, which confines the onboard processors to the class of simple microcontroller units (MCUs). These MCUs lack advanced security features, opening the way to a wide range of cyber security vulnerabilities, from the communication between agents of the same fleet to the onboard execution of malicious code. This work presents an open-source System on Chip (SoC) design that integrates a 64-bit Linux-capable host processor accelerated by an 8-core 32-bit parallel programmable accelerator. The heterogeneous system architecture is coupled with a security enclave based on an open-source OpenTitan root of trust. To demonstrate our design, we propose a use case in which OpenTitan detects a security breach on the SoC aboard the MAV and drives its exclusive GPIOs to start an LED-blinking routine. This procedure embodies an unconventional visual communication channel between two palm-sized MAVs: the receiver MAV classifies the LED state of the sender (on or off) with an onboard convolutional neural network running on the parallel accelerator, then reconstructs a high-level message in 1.3 s, 2.3 times faster than current commercial solutions.
An Efficient Online Prediction of Host Workloads Using Pruned GRU Neural Nets
Authors: Amin Setayesh, Hamid Hadian, Radu Prodan
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Host load prediction is essential for dynamic resource scaling and job scheduling in a cloud computing environment. In this context, workload prediction is challenging for several reasons. First, it must be accurate to enable precise scheduling decisions. Second, it must be fast so that scheduling happens at the right time. Third, the model must account for new workload patterns so that it performs well on both recent and older patterns. Failing to make accurate and fast predictions, or being unable to predict new usage patterns, can result in severe outcomes such as service level agreement (SLA) misses. Our research trains a fast model capable of online adaptation, based on the gated recurrent unit (GRU), to mitigate these issues. We take a multivariate approach using several features, such as memory usage, CPU usage, disk I/O usage, and disk space, to make accurate predictions. Moreover, we predict multiple steps ahead, which is essential for making scheduling decisions in advance. Furthermore, we use two pruning methods, L1-norm and random pruning, to produce a sparse model for faster forecasts. Finally, online learning is used to create a model that can adapt over time to new workload patterns.
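The L1-norm (magnitude) pruning step mentioned above can be sketched generically: zero out the smallest-magnitude weights of a layer to obtain a sparse matrix. The weight matrix and sparsity level below are illustrative, not the paper's GRU.

```python
import numpy as np

# Magnitude pruning: keep only the largest |w| entries, zero the rest.
def l1_prune(weights, sparsity):
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    threshold = np.sort(flat)[k]          # k-th smallest magnitude
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))         # hypothetical recurrent weight matrix
pruned, mask = l1_prune(w, sparsity=0.9)
print(mask.mean())                        # fraction of surviving weights
```

At 90% sparsity only about a tenth of the weights survive, and sparse matrix-vector products over the surviving entries are what make forecasts faster.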
Keyword: mobile
Assessing the Impact of Mobile Attackers on RPL-based Internet of Things
Abstract
The Internet of Things (IoT) is becoming ubiquitous in our daily life. IoT networks, made up of devices with low power, low memory, and low computing capability, appear in many applications such as healthcare, smart homes, and agriculture. The IPv6 Routing Protocol for Low-Power and Lossy Networks (RPL) has become a standardized routing protocol for such networks in the IoT. RPL establishes the best routes between devices according to the requirements of the application, which is achieved by the Objective Function (OF). Even though its RFC defines some security mechanisms against external attackers, RPL is vulnerable to attacks coming from inside the network. Moreover, the same attacks can have different impacts on networks with different OFs. An analysis of such attacks therefore becomes important for developing suitable security solutions for RPL. This study analyzes RPL-specific attacks on networks using RPL's default OFs, namely Objective Function Zero (OF0) and the Minimum Rank with Hysteresis Objective Function (MRHOF). Moreover, mobile attackers can affect more nodes in a network due to their mobility. While the security solutions proposed in the literature assume that the network is static, this study also takes mobile attackers into account.
Multi-Agent Reinforcement Learning with Action Masking for UAV-enabled Mobile Communications
Authors: Danish Rizvi, David Boyle
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
Unmanned Aerial Vehicles (UAVs) are increasingly used as aerial base stations to provide ad hoc communications infrastructure. Building upon prior research efforts which consider either static nodes, 2D trajectories or single UAV systems, this paper focuses on the use of multiple UAVs for providing wireless communication to mobile users in the absence of terrestrial communications infrastructure. In particular, we jointly optimize UAV 3D trajectory and NOMA power allocation to maximize system throughput. Firstly, a weighted K-means-based clustering algorithm establishes UAV-user associations at regular intervals. The efficacy of training a novel Shared Deep Q-Network (SDQN) with action masking is then explored. Unlike training each UAV separately using DQN, the SDQN reduces training time by using the experiences of multiple UAVs instead of a single agent. We also show that SDQN can be used to train a multi-agent system with differing action spaces. Simulation results confirm that: 1) training a shared DQN outperforms a conventional DQN in terms of maximum system throughput (+20%) and training time (-10%); 2) it can converge for agents with different action spaces, yielding a 9% increase in throughput compared to mutual learning algorithms; and 3) combining NOMA with an SDQN architecture enables the network to achieve a better sum rate compared with existing baseline schemes.
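The action-masking mechanism that lets one shared Q-network serve agents with different action spaces can be sketched as follows: the network's output covers the union of all actions, and each agent masks out the actions it cannot take before the greedy argmax. Shapes and values are illustrative, not the paper's UAV setup.

```python
import numpy as np

# Greedy action selection with an action mask over a shared Q-head.
N_ACTIONS = 6                                   # union of all agents' actions

def masked_greedy_action(q_values, valid_mask):
    q = np.where(valid_mask, q_values, -np.inf)  # invalid actions cannot win
    return int(np.argmax(q))

q = np.array([0.3, 0.9, 0.1, 0.7, 0.2, 0.5])    # one forward pass's Q-values
agent_a_mask = np.array([1, 0, 1, 1, 0, 0], dtype=bool)  # restricted agent
agent_b_mask = np.ones(N_ACTIONS, dtype=bool)            # full action space

print(masked_greedy_action(q, agent_a_mask))
print(masked_greedy_action(q, agent_b_mask))
```

Agent A's best unmasked action (index 1) is invalid for it, so the mask redirects it to its best valid action, while agent B with the full space still picks index 1; the same network weights serve both.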
Active Implicit Object Reconstruction using Uncertainty-guided Next-Best-View Optimization
Abstract
Actively planning sensor views during object reconstruction is essential for autonomous mobile robots. This task is usually performed by evaluating the information gain from an explicit uncertainty map. Existing algorithms compare options among a set of preset candidate views and select the next best view from them. In contrast, we take the emerging implicit representation as the object model and seamlessly combine it with the active reconstruction task. To fully integrate observation information into the model, we propose a supervision method specifically for object-level reconstruction that considers both valid and free space. Additionally, to directly evaluate view information from the implicit object model, we introduce a sample-based uncertainty evaluation method. It samples points on rays directly from the object model and uses the variation of implicit function inferences as the uncertainty metric, with no need for voxel traversal or an additional information map. Leveraging the differentiability of our metric, the next best view can be optimized by maximizing the uncertainty continuously, doing away with the traditional candidate-view setting, which may yield sub-optimal results. Experiments in simulated and real-world scenes show that our method effectively improves the reconstruction accuracy and view-planning efficiency of active reconstruction tasks. The proposed system will be open-sourced at https://github.com/HITSZ-NRSL/ActiveImplicitRecon.git.
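The sample-based uncertainty idea can be illustrated with a small sketch: sample points along a candidate ray, query the implicit model at those points, and score the view by the variation of its predictions. This is a hedged NumPy illustration with a hypothetical `implicit_fn` occupancy field, not the authors' code.

```python
import numpy as np

def ray_uncertainty(implicit_fn, origin, direction, near, far, n_samples=32):
    """Score a viewing ray by the variation of implicit-model inferences
    along it. `implicit_fn` is a stand-in occupancy field mapping (n, 3)
    points to (n,) values in [0, 1]."""
    t = np.linspace(near, far, n_samples)
    pts = origin[None, :] + t[:, None] * direction[None, :]
    vals = implicit_fn(pts)
    return float(np.var(vals))  # high variance -> uncertain region

# Toy field: a soft occupancy sphere of radius 1 centered at the origin.
soft_sphere = lambda p: 1.0 / (1.0 + np.exp(10.0 * (np.linalg.norm(p, axis=1) - 1.0)))

# A ray crossing the object boundary is more uncertain than one that
# misses the object entirely.
crossing = ray_uncertainty(soft_sphere, np.array([-2.0, 0.0, 0.0]),
                           np.array([1.0, 0.0, 0.0]), 0.0, 4.0)
missing = ray_uncertainty(soft_sphere, np.array([-2.0, 5.0, 0.0]),
                          np.array([1.0, 0.0, 0.0]), 0.0, 4.0)
```

Because the score is a differentiable function of the ray, a gradient step on the view pose can increase it continuously, which is the property the paper exploits to drop preset candidate views.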
Keyword: pruning
An EMO Joint Pruning with Multiple Sub-networks: Fast and Effect
Abstract
The network pruning algorithm based on evolutionary multi-objective (EMO) optimization can balance a network's pruning rate and performance. However, its population-based nature often suffers from the complex pruning optimization space and the highly resource-consuming process of verifying pruned structures, which limits its application. To this end, this paper proposes an EMO joint pruning with multiple sub-networks (EMO-PMS) to reduce space complexity and resource consumption. First, a divide-and-conquer EMO network pruning framework is proposed, which decomposes the complex EMO pruning task on the whole network into easier sub-tasks on multiple sub-networks. On the one hand, this decomposition reduces the pruning optimization space and decreases the optimization difficulty; on the other hand, the smaller network structures converge faster, so the computational resource consumption of the proposed algorithm is lower. Second, a sub-network training method based on cross-network constraints is designed so that each sub-network can process the features generated by the previous one through feature constraints. This method allows sub-networks optimized independently to collaborate better and improves the overall performance of the pruned network. Finally, a multiple sub-networks joint pruning method based on EMO is proposed. It can accurately measure the feature processing capability of the sub-networks with the pre-trained feature selector, and it can combine multi-objective pruning results on multiple sub-networks through global performance impairment ranking to design a joint pruning scheme. The proposed algorithm is validated on three datasets of differing difficulty. Comparisons with fifteen advanced pruning algorithms demonstrate the effectiveness and efficiency of the proposed algorithm.
Tetra-AML: Automatic Machine Learning via Tensor Networks
Authors: A. Naumov, Ar. Melnikov, V. Abronin, F. Oxanichenko, K. Izmailov, M. Pflitsch, A. Melnikov, M. Perelshtein
Abstract
Neural networks have revolutionized many aspects of society but in the era of huge models with billions of parameters, optimizing and deploying them for commercial applications can require significant computational and financial resources. To address these challenges, we introduce the Tetra-AML toolbox, which automates neural architecture search and hyperparameter optimization via a custom-developed black-box Tensor train Optimization algorithm, TetraOpt. The toolbox also provides model compression through quantization and pruning, augmented by compression using tensor networks. Here, we analyze a unified benchmark for optimizing neural networks in computer vision tasks and show the superior performance of our approach compared to Bayesian optimization on the CIFAR-10 dataset. We also demonstrate the compression of ResNet-18 neural networks, where we use 14.5 times less memory while losing just 3.2% of accuracy. The presented framework is generic, not limited by computer vision problems, supports hardware acceleration (such as with GPUs and TPUs) and can be further extended to quantum hardware and to hybrid quantum machine learning models.
An Efficient Online Prediction of Host Workloads Using Pruned GRU Neural Nets
Authors: Amin Setayesh, Hamid Hadian, Radu Prodan
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Host load prediction is essential for dynamic resource scaling and job scheduling in a cloud computing environment. In this context, workload prediction is challenging for several reasons. First, it must be accurate to enable precise scheduling decisions. Second, it must be fast to schedule at the right time. Third, a model must be able to account for new workload patterns so it performs well on both recent and older patterns. Failure to make accurate and fast predictions, or to anticipate new usage patterns, can result in severe outcomes such as service level agreement (SLA) misses. Our research trains a fast model based on the gated recurrent unit (GRU) with the ability to adapt online, mitigating these issues. We take a multivariate approach using several features, such as memory usage, CPU usage, disk I/O usage, and disk space, to make accurate predictions. Moreover, we predict multiple steps ahead, which is essential for making scheduling decisions in advance. Furthermore, we use two pruning methods, L1-norm and random pruning, to produce a sparse model for faster forecasts. Finally, online learning is used to create a model that can adapt over time to new workload patterns.
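Of the two pruning schemes, L1-norm pruning can be sketched as plain magnitude pruning: zero out the fraction of weights with the smallest absolute values. This is an illustrative NumPy sketch on a generic weight matrix, not the paper's GRU-specific code; the random scheme would instead choose the zeroed entries uniformly.

```python
import numpy as np

def l1_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of entries with the smallest
    absolute value (L1-norm magnitude pruning)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(weights) > threshold            # keep strictly larger entries
    return weights * mask

# Prune half of a tiny weight matrix: the two smallest-magnitude
# entries (0.05 and -0.02) are zeroed, the rest survive.
w = np.array([[0.05, -0.8],
              [0.30, -0.02]])
pruned = l1_prune(w, 0.5)
```

In a real GRU, the same rule would be applied to the recurrent and input weight matrices, and the resulting sparsity is what speeds up forecasting.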
Keyword: voxel
SnakeVoxFormer: Transformer-based Single Image Voxel Reconstruction with Run Length Encoding
Authors: Jae Joong Lee, Bedrich Benes
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Deep learning-based 3D object reconstruction has achieved unprecedented results. Among deep neural models, the transformer has shown outstanding performance in many computer vision applications. We introduce SnakeVoxFormer, a novel transformer-based method for 3D object reconstruction in voxel space from a single image. The input to SnakeVoxFormer is a 2D image, and the result is a 3D voxel model. The key novelty of our approach is the use of run-length encoding that traverses the voxel space like a snake and encodes wide spatial differences into a 1D structure suitable for transformer encoding. We then use dictionary encoding to convert the discovered RLE blocks into tokens for the transformer. This 1D representation is a lossless compression of the 3D shape that uses only about 1% of the original data size. We show how different voxel traversal strategies affect encoding and reconstruction quality. We compare our method with the state of the art for 3D voxel reconstruction from images and improve on state-of-the-art methods by at least 2.8% and up to 19.8%.
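The run-length idea can be sketched in a few lines: traverse the voxel grid in a snake (boustrophedon) order so that spatially adjacent voxels stay adjacent in 1D, then collapse the bit stream into (value, run-length) pairs. This is an illustrative sketch; the paper's exact traversal order and dictionary tokenization are not reproduced here.

```python
import numpy as np

def snake_flatten(vox):
    """Flatten a 3D boolean voxel grid with a snake traversal:
    alternate the scan direction of each plane and row so consecutive
    cells in the 1D stream stay spatially adjacent."""
    out = []
    for z in range(vox.shape[0]):
        plane = vox[z] if z % 2 == 0 else vox[z, ::-1]
        for y in range(plane.shape[0]):
            row = plane[y] if y % 2 == 0 else plane[y, ::-1]
            out.extend(row.tolist())
    return out

def run_length_encode(bits):
    """Encode a 1D bit sequence as (value, run_length) pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return [(v, n) for v, n in runs]

# A solid slab over empty space compresses into just two runs.
vox = np.zeros((2, 2, 2), dtype=bool)
vox[0] = True
runs = run_length_encode(snake_flatten(vox))
```

Long runs like these are what make the 1D representation so compact before the dictionary step turns runs into transformer tokens.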
A Unified Single-stage Learning Model for Estimating Fiber Orientation Distribution Functions on Heterogeneous Multi-shell Diffusion-weighted MRI
Authors: Tianyuan Yao, Nancy Newlin, Praitayini Kanakaraj, Vishwesh nath, Leon Y Cai, Karthik Ramadass, Kurt Schilling, Bennett A. Landman, Yuankai Huo
Abstract
Diffusion-weighted (DW) MRI measures the direction and scale of the local diffusion process in every voxel through its spectrum in q-space, typically acquired in one or more shells. Recent developments in microstructure imaging and multi-tissue decomposition have sparked renewed attention to the radial b-value dependence of the signal. Applications in tissue classification and micro-architecture estimation therefore require a signal representation that extends over the radial as well as the angular domain. Multiple approaches have been proposed to model the non-linear relationship between the DW-MRI signal and biological microstructure. In the past few years, many deep learning-based methods have been developed for faster inference and higher inter-scan consistency compared with traditional model-based methods (e.g., multi-shell multi-tissue constrained spherical deconvolution). However, a multi-stage learning strategy is typically required, since the learning process relies on intermediate representations such as the simple harmonic oscillator reconstruction (SHORE) representation. In this work, we present a unified dynamic network with a single-stage spherical convolutional neural network, which allows efficient fiber orientation distribution function (fODF) estimation from heterogeneous multi-shell diffusion MRI sequences. We study the Human Connectome Project (HCP) young adults with test-retest scans. The experimental results show that the proposed single-stage method outperforms prior multi-stage approaches in repeated fODF estimation with shell dropoff and single-shell DW-MRI sequences.
Active Implicit Object Reconstruction using Uncertainty-guided Next-Best-View Optimization
Abstract
Actively planning sensor views during object reconstruction is essential for autonomous mobile robots. This task is usually performed by evaluating the information gain from an explicit uncertainty map. Existing algorithms compare options among a set of preset candidate views and select the next best view from them. In contrast, we take the emerging implicit representation as the object model and seamlessly combine it with the active reconstruction task. To fully integrate observation information into the model, we propose a supervision method specifically for object-level reconstruction that considers both valid and free space. Additionally, to directly evaluate view information from the implicit object model, we introduce a sample-based uncertainty evaluation method. It samples points on rays directly from the object model and uses the variation of implicit function inferences as the uncertainty metric, with no need for voxel traversal or an additional information map. Leveraging the differentiability of our metric, the next best view can be optimized by maximizing the uncertainty continuously, doing away with the traditional candidate-view setting, which may yield sub-optimal results. Experiments in simulated and real-world scenes show that our method effectively improves the reconstruction accuracy and view-planning efficiency of active reconstruction tasks. The proposed system will be open-sourced at https://github.com/HITSZ-NRSL/ActiveImplicitRecon.git.
Instant Neural Radiance Fields Stylization
Authors: Shaoxu Li, Ye Pan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
We present Instant Neural Radiance Fields Stylization, a novel approach to multi-view image stylization for 3D scenes. Our approach models a neural radiance field based on neural graphics primitives, which use a hash-table-based position encoder for position embedding. We split the position encoder into two parts, the content and style sub-branches, and train the network for normal novel-view image synthesis with the content and style targets. In the inference stage, we apply AdaIN to the output features of the position encoder, with content and style voxel grid features as reference. With the adjusted features, stylized novel-view images can be obtained. Our method extends the style target from style images to image sets of scenes and does not require additional network training for stylization. Given a set of images of a 3D scene and a style target (a style image or another set of 3D scenes), our method can generate stylized novel views with a consistent appearance at various view angles in less than 10 minutes on modern GPU hardware. Extensive experimental results demonstrate the validity and superiority of our method.
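AdaIN itself is compact enough to sketch: content features are normalized per channel and then rescaled and shifted with the style features' channel statistics. This is a generic NumPy sketch of AdaIN on toy (channels, n) feature matrices, not the paper's hash-grid features.

```python
import numpy as np

def adain(content_feat, style_feat, eps=1e-5):
    """Adaptive Instance Normalization: make the content features carry
    the channel-wise mean/std of the style features."""
    c_mean = content_feat.mean(axis=1, keepdims=True)
    c_std = content_feat.std(axis=1, keepdims=True) + eps
    s_mean = style_feat.mean(axis=1, keepdims=True)
    s_std = style_feat.std(axis=1, keepdims=True) + eps
    return s_std * (content_feat - c_mean) / c_std + s_mean

rng = np.random.default_rng(0)
content = rng.normal(0.0, 1.0, size=(4, 256))   # content statistics
style = rng.normal(3.0, 2.0, size=(4, 256))     # style statistics
out = adain(content, style)
```

After the transfer, `out` matches the style features' per-channel mean and standard deviation while keeping the content features' spatial structure, which is exactly the adjustment applied to the position-encoder outputs above.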
Keyword: lidar
Spatiotemporal Self-supervised Learning for Point Clouds in the Wild
Abstract
Self-supervised learning (SSL) has the potential to benefit many applications, particularly those where manually annotating data is cumbersome. One such situation is the semantic segmentation of point clouds. In this context, existing methods employ contrastive learning strategies and define positive pairs by performing various augmentations of point clusters in a single frame. As such, these methods do not exploit the temporal nature of LiDAR data. In this paper, we introduce an SSL strategy that leverages positive pairs in both the spatial and temporal domains. To this end, we design (i) a point-to-cluster learning strategy that aggregates spatial information to distinguish objects; and (ii) a cluster-to-cluster learning strategy based on unsupervised object tracking that exploits temporal correspondences. We demonstrate the benefits of our approach via extensive experiments: self-supervised training on two large-scale LiDAR datasets and transfer of the resulting models to other point cloud segmentation benchmarks. Our results show that our method outperforms state-of-the-art point cloud SSL methods.
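The cluster-to-cluster idea can be illustrated with a generic contrastive objective: each cluster feature should match its temporal correspondent (the same tracked object in another frame) against all other clusters in the batch. The following is a standard InfoNCE sketch on random features, not the paper's exact loss.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """Generic InfoNCE: row i of `anchors` should be most similar to
    row i of `positives`; all other rows act as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # (n, n) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # correct pairs on the diagonal

# Correctly matched temporal pairs yield a much lower loss than
# mismatched (shuffled) ones.
rng = np.random.default_rng(1)
feats = rng.normal(size=(8, 16))
aligned = info_nce(feats, feats + 0.01 * rng.normal(size=(8, 16)))
shuffled = info_nce(feats, np.roll(feats, 1, axis=0))
```

Minimizing such a loss is what pulls the features of the same object across frames together while pushing different objects apart.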
BEVSimDet: Simulated Multi-modal Distillation in Bird's-Eye View for Multi-view 3D Object Detection
Abstract
Multi-view camera-based 3D object detection has gained popularity due to its low cost. But accurately inferring 3D geometry solely from camera data remains challenging, which impacts model performance. One promising approach to address this issue is to distill precise 3D geometry knowledge from LiDAR data. However, transferring knowledge between different sensor modalities is hindered by the significant modality gap. In this paper, we approach this challenge from the perspective of both architecture design and knowledge distillation and present a new simulated multi-modal 3D object detection method named BEVSimDet. We first introduce a novel framework that includes a LiDAR and camera fusion-based teacher and a simulated multi-modal student, where the student simulates multi-modal features with image-only input. To facilitate effective distillation, we propose a simulated multi-modal distillation scheme that supports intra-modal, cross-modal, and multi-modal distillation simultaneously. By combining them together, BEVSimDet can learn better feature representations for 3D object detection while enjoying cost-effective camera-only deployment. Experimental results on the challenging nuScenes benchmark demonstrate the effectiveness and superiority of BEVSimDet over recent representative methods. The source code will be released.
Photometric LiDAR and RGB-D Bundle Adjustment
Authors: Luca Di Giammarino, Emanuele Giacomini, Leonardo Brizi, Omar Salem, Giorgio Grisetti
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
The joint optimization of the sensor trajectory and the 3D map is a crucial characteristic of Simultaneous Localization and Mapping (SLAM) systems. To achieve this, the gold standard is Bundle Adjustment (BA). Modern 3D LiDARs now offer resolutions high enough to create point cloud images resembling those taken by conventional cameras. Nevertheless, the effective global refinement techniques typically employed for RGB-D sensors are not widely applied to LiDARs. This paper presents a novel photometric BA strategy that treats both RGB-D and LiDAR data in the same way. Our work can be used on top of any SLAM/GNSS estimate to improve and refine the initial trajectory. We conducted different experiments using these two depth sensors on public benchmarks. Our results show that our system performs on par with or better than other state-of-the-art ad-hoc SLAM/BA strategies, free from data association and without making assumptions about the environment. In addition, we present the benefit of jointly using RGB-D and LiDAR within our unified method. We finally release an open-source CUDA/C++ implementation.
Keyword: diffusion
Rethinking CycleGAN: Improving Quality of GANs for Unpaired Image-to-Image Translation
Authors: Dmitrii Torbunov, Yi Huang, Huan-Hsin Tseng, Haiwang Yu, Jin Huang, Shinjae Yoo, Meifeng Lin, Brett Viren, Yihui Ren
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
An unpaired image-to-image (I2I) translation technique seeks to find a mapping between two domains of data in a fully unsupervised manner. While the initial solutions to the I2I problem were provided by the generative adversarial neural networks (GANs), currently, diffusion models (DM) hold the state-of-the-art status on the I2I translation benchmarks in terms of FID. Yet, they suffer from some limitations, such as not using data from the source domain during the training, or maintaining consistency of the source and translated images only via simple pixel-wise errors. This work revisits the classic CycleGAN model and equips it with recent advancements in model architectures and model training procedures. The revised model is shown to significantly outperform other advanced GAN- and DM-based competitors on a variety of benchmarks. In the case of Male2Female translation of CelebA, the model achieves over 40% improvement in FID score compared to the state-of-the-art results. This work also demonstrates the ineffectiveness of the pixel-wise I2I translation faithfulness metrics and suggests their revision. The code and trained models are available at https://github.com/LS4GAN/uvcgan2
A Unified Single-stage Learning Model for Estimating Fiber Orientation Distribution Functions on Heterogeneous Multi-shell Diffusion-weighted MRI
Authors: Tianyuan Yao, Nancy Newlin, Praitayini Kanakaraj, Vishwesh nath, Leon Y Cai, Karthik Ramadass, Kurt Schilling, Bennett A. Landman, Yuankai Huo
Abstract
Diffusion-weighted (DW) MRI measures the direction and scale of the local diffusion process in every voxel through its spectrum in q-space, typically acquired in one or more shells. Recent developments in microstructure imaging and multi-tissue decomposition have sparked renewed attention to the radial b-value dependence of the signal. Applications in tissue classification and micro-architecture estimation therefore require a signal representation that extends over the radial as well as the angular domain. Multiple approaches have been proposed to model the non-linear relationship between the DW-MRI signal and biological microstructure. In the past few years, many deep learning-based methods have been developed for faster inference and higher inter-scan consistency compared with traditional model-based methods (e.g., multi-shell multi-tissue constrained spherical deconvolution). However, a multi-stage learning strategy is typically required, since the learning process relies on intermediate representations such as the simple harmonic oscillator reconstruction (SHORE) representation. In this work, we present a unified dynamic network with a single-stage spherical convolutional neural network, which allows efficient fiber orientation distribution function (fODF) estimation from heterogeneous multi-shell diffusion MRI sequences. We study the Human Connectome Project (HCP) young adults with test-retest scans. The experimental results show that the proposed single-stage method outperforms prior multi-stage approaches in repeated fODF estimation with shell dropoff and single-shell DW-MRI sequences.
A Pilot Study of Query-Free Adversarial Attack against Stable Diffusion
Authors: Haomin Zhuang, Yihua Zhang, Sijia Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Despite the record-breaking performance of Stable Diffusion in Text-to-Image (T2I) generation, less research attention has been paid to its adversarial robustness. In this work, we study adversarial attack generation for Stable Diffusion and ask whether an adversarial text prompt can be obtained even in the absence of end-to-end model queries. We call the resulting problem 'query-free attack generation'. To resolve this problem, we show that the vulnerability of T2I models is rooted in the lack of robustness of text encoders, e.g., the CLIP text encoder used by Stable Diffusion. Based on this insight, we propose both untargeted and targeted query-free attacks, where the former is built on the most influential dimensions in the text embedding space, which we call steerable key dimensions. By leveraging the proposed attacks, we empirically show that a perturbation of only five characters to the text prompt is able to cause a significant content shift in images synthesized with Stable Diffusion. Moreover, we show that the proposed targeted attack can precisely steer the diffusion model to scrub the targeted image content without causing much change in untargeted image content.
Implicit Diffusion Models for Continuous Super-Resolution
Abstract
Image super-resolution (SR) has attracted increasing attention due to its wide applications. However, current SR methods generally suffer from over-smoothing and artifacts, and most work only with fixed magnifications. This paper introduces an Implicit Diffusion Model (IDM) for high-fidelity continuous image super-resolution. IDM integrates an implicit neural representation and a denoising diffusion model in a unified end-to-end framework, where the implicit neural representation is adopted in the decoding process to learn continuous-resolution representation. Furthermore, we design a scale-controllable conditioning mechanism that consists of a low-resolution (LR) conditioning network and a scaling factor. The scaling factor regulates the resolution and accordingly modulates the proportion of the LR information and generated features in the final output, which enables the model to accommodate the continuous-resolution requirement. Extensive experiments validate the effectiveness of our IDM and demonstrate its superior performance over prior arts.
HOLODIFFUSION: Training a 3D Diffusion Model using 2D Images
Authors: Animesh Karnewar, Andrea Vedaldi, David Novotny, Niloy Mitra
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Abstract
Diffusion models have emerged as the best approach for generative modeling of 2D images. Part of their success is due to the possibility of training them on millions if not billions of images with a stable learning objective. However, extending these models to 3D remains difficult for two reasons. First, finding a large quantity of 3D training data is much more complex than for 2D images. Second, while it is conceptually trivial to extend the models to operate on 3D rather than 2D grids, the associated cubic growth in memory and compute complexity makes this infeasible. We address the first challenge by introducing a new diffusion setup that can be trained, end-to-end, with only posed 2D images for supervision; and the second challenge by proposing an image formation model that decouples model memory from spatial memory. We evaluate our method on real-world data, using the CO3D dataset which has not been used to train 3D generative models before. We show that our diffusion models are scalable, train robustly, and are competitive in terms of sample quality and fidelity to existing approaches for 3D generative modeling.
WordStylist: Styled Verbatim Handwritten Text Generation with Latent Diffusion Models
Authors: Konstantina Nikolaidou, George Retsinas, Vincent Christlein, Mathias Seuret, Giorgos Sfikas, Elisa Barney Smith, Hamam Mokayed, Marcus Liwicki
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Text-to-Image synthesis is the task of generating an image according to a specific text description. Generative Adversarial Networks have been considered the standard method for image synthesis virtually since their introduction; today, Denoising Diffusion Probabilistic Models are setting a new baseline, with remarkable results in Text-to-Image synthesis, among other fields. Aside from its usefulness per se, such synthesis can also be particularly relevant as a tool for data augmentation to aid in training models for other document image processing tasks. In this work, we present a latent diffusion-based method for styled text-to-text-content-image generation at the word level. Our proposed method generates realistic word image samples in different writer styles, using class-index styles and text content prompts, without the need for adversarial training, writer recognition, or text recognition. We gauge system performance with Frechet Inception Distance, writer recognition accuracy, and writer retrieval. We show that the proposed model produces samples that are aesthetically pleasing, help boost text recognition performance, and achieve a writer retrieval score similar to that of real data.
Abstract
Facial expression generation is one of the most challenging and long-sought aspects of character animation, with many interesting applications. This challenging task has traditionally relied heavily on digital craftspersons and remains largely unexplored. In this paper, we introduce a generative framework for generating 3D facial expression sequences (i.e., 4D faces) that can be conditioned on different inputs to animate an arbitrary 3D face mesh. It is composed of two tasks: (1) learning a generative model trained over a set of 3D landmark sequences, and (2) generating 3D mesh sequences of an input facial mesh driven by the generated landmark sequences. The generative model is based on a Denoising Diffusion Probabilistic Model (DDPM), which has achieved remarkable success in generative tasks in other domains. While it can be trained unconditionally, its reverse process can still be conditioned on various signals. This allows us to efficiently develop several downstream tasks involving various conditional generation, using expression labels, text, partial sequences, or simply a facial geometry. To obtain the full mesh deformation, we then develop a landmark-guided encoder-decoder to apply the geometric deformation embedded in the landmarks to a given facial mesh. Experiments show that our model learns to generate realistic, high-quality expressions solely from a relatively small dataset, improving over state-of-the-art methods. Videos and qualitative comparisons with other methods can be found at https://github.com/ZOUKaifeng/4DFM. Code and models will be made available upon acceptance.
MDP: A Generalized Framework for Text-Guided Image Editing by Manipulating the Diffusion Path
Authors: Qian Wang, Biao Zhang, Michael Birsak, Peter Wonka
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Image generation using diffusion can be controlled in multiple ways. In this paper, we systematically analyze the equations of modern generative diffusion networks to propose a framework, called MDP, that explains the design space of suitable manipulations. We identify 5 different manipulations, including intermediate latent, conditional embedding, cross attention maps, guidance, and predicted noise. We analyze the corresponding parameters of these manipulations and the manipulation schedule. We show that some previous editing methods fit nicely into our framework. Particularly, we identified one specific configuration as a new type of control by manipulating the predicted noise, which can perform higher-quality edits than previous work for a variety of local and global edits.
Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos
Authors: Kun Su, Kaizhi Qian, Eli Shlizerman, Antonio Torralba, Chuang Gan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract
Modeling sounds emitted from physical object interactions is critical for immersive perceptual experiences in real and virtual worlds. Traditional methods of impact sound synthesis use physics simulation to obtain a set of physics parameters that can represent and synthesize the sound. However, they require fine details of both the object geometries and impact locations, which are rarely available in the real world, and they cannot be applied to synthesize impact sounds from common videos. On the other hand, existing video-driven deep learning-based approaches can only capture the weak correspondence between visual content and impact sounds since they lack physics knowledge. In this work, we propose a physics-driven diffusion model that can synthesize high-fidelity impact sounds for a silent video clip. In addition to the video content, we propose to use additional physics priors to guide the impact sound synthesis procedure. The physics priors include both physics parameters that are directly estimated from noisy real-world impact sound examples without sophisticated setup, and learned residual parameters that interpret the sound environment via neural networks. We further implement a novel diffusion model with specific training and inference strategies to combine physics priors and visual information for impact sound synthesis. Experimental results show that our model outperforms several existing systems in generating realistic impact sounds. More importantly, the physics-based representations are fully interpretable and transparent, enabling flexible sound editing.
Keyword: dynamic
Towards Quantifying Calibrated Uncertainty via Deep Ensembles in Multi-output Regression Task
Authors: Sunwoong Yang, Kwanjung Yee
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)
Abstract
Deep ensemble is a simple and straightforward approach for approximating Bayesian inference and has been successfully applied to many classification tasks. This study aims to comprehensively investigate this approach in the multi-output regression task to predict the aerodynamic performance of a missile configuration. By scrutinizing the effect of the number of neural networks used in the ensemble, an obvious trend toward underconfidence in estimated uncertainty is observed. In this context, we propose the deep ensemble framework that applies the post-hoc calibration method, and its improved uncertainty quantification performance is demonstrated. It is compared with Gaussian process regression, the most prevalent model for uncertainty quantification in engineering, and is proven to have superior performance in terms of regression accuracy, reliability of estimated uncertainty, and training efficiency. Finally, the impact of the suggested framework on the results of Bayesian optimization is examined, showing that whether or not the deep ensemble is calibrated can result in completely different exploration characteristics. This framework can be seamlessly applied and extended to any regression task, as no special assumptions have been made for the specific problem used in this study.
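The deep-ensemble recipe for regression is simple to sketch: run every member network, report the mean as the prediction and the spread across members as the (epistemic) uncertainty. Below is a toy NumPy sketch with stand-in "models"; the post-hoc calibration step the study proposes is omitted.

```python
import numpy as np

def ensemble_predict(models, x):
    """Deep-ensemble-style prediction: mean over members as the point
    estimate, standard deviation across members as the uncertainty.
    `models` is any list of callables; real neural networks would
    replace the toy regressors below."""
    preds = np.stack([m(x) for m in models])  # (n_models, n_outputs)
    return preds.mean(axis=0), preds.std(axis=0)

# Toy "ensemble": three regressors that agree near x=0 and diverge
# away from it, mimicking growing uncertainty off the training data.
models = [lambda x, a=a: a * x for a in (0.9, 1.0, 1.1)]
mean_near, std_near = ensemble_predict(models, np.array([0.1]))
mean_far, std_far = ensemble_predict(models, np.array([10.0]))
```

The disagreement-based `std` is exactly the quantity whose calibration the study evaluates: an underconfident ensemble reports a spread systematically larger than the true error.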
TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition
Abstract
Semi-Supervised Learning can be more beneficial for the video domain compared to images because of its higher annotation cost and dimensionality. Besides, any video understanding task requires reasoning over both spatial and temporal dimensions. In order to learn both the static and motion related features for the semi-supervised action recognition task, existing methods rely on hard input inductive biases like using two-modalities (RGB and Optical-flow) or two-stream of different playback rates. Instead of utilizing unlabeled videos through diverse input streams, we rely on self-supervised video representations, particularly, we utilize temporally-invariant and temporally-distinctive representations. We observe that these representations complement each other depending on the nature of the action. Based on this observation, we propose a student-teacher semi-supervised learning framework, TimeBalance, where we distill the knowledge from a temporally-invariant and a temporally-distinctive teacher. Depending on the nature of the unlabeled video, we dynamically combine the knowledge of these two teachers based on a novel temporal similarity-based reweighting scheme. Our method achieves state-of-the-art performance on three action recognition benchmarks: UCF101, HMDB51, and Kinetics400. Code: https://github.com/DAVEISHAN/TimeBalance
On the Local Cache Update Rules in Streaming Federated Learning
Abstract
In this study, we address the emerging field of Streaming Federated Learning (SFL) and propose local cache update rules to manage dynamic data distributions and limited cache capacity. Traditional federated learning relies on fixed data sets, whereas in SFL, data is streamed, and its distribution changes over time, leading to discrepancies between the local training dataset and long-term distribution. To mitigate this problem, we propose three local cache update rules - First-In-First-Out (FIFO), Static Ratio Selective Replacement (SRSR), and Dynamic Ratio Selective Replacement (DRSR) - that update the local cache of each client while considering the limited cache capacity. Furthermore, we derive a convergence bound for our proposed SFL algorithm as a function of the distribution discrepancy between the long-term data distribution and the client's local training dataset. We then evaluate our proposed algorithm on two datasets: a network traffic classification dataset and an image classification dataset. Our experimental results demonstrate that our proposed local cache update rules significantly reduce the distribution discrepancy and outperform the baseline methods. Our study advances the field of SFL and provides practical cache management solutions in federated learning.
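Of the three cache update rules, FIFO is the simplest to sketch. The class below is a hypothetical minimal illustration of a client-side cache with bounded capacity, not the paper's implementation:

```python
from collections import deque

class FIFOCache:
    # minimal FIFO local cache for a streaming client; holds at most
    # `capacity` samples, evicting the oldest when new data arrives
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)

    def update(self, stream_batch):
        # deque with maxlen drops the oldest samples automatically
        self.buf.extend(stream_batch)

    def training_set(self):
        # snapshot of the cache used for the next local training round
        return list(self.buf)
```

The selective-replacement rules (SRSR, DRSR) would instead choose which cached samples to evict based on a (static or dynamic) ratio rather than pure arrival order.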
EJ-FAT Joint ESnet JLab FPGA Accelerated Transport Load Balancer
Authors: Stacey Sheldon, Yatish Kumar, Michael Goodrich, Graham Heyes
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
To increase the science rate at high data rates and volumes, Thomas Jefferson National Accelerator Facility (JLab) has partnered with Energy Sciences Network (ESnet) to define an edge-to-data-center traffic shaping/steering transport capability featuring data-event-aware network shaping and forwarding. The keystone of this ESnet JLab FPGA Accelerated Transport (EJFAT) is the joint development of a dynamic compute-work Load Balancer (LB) for UDP-streamed data. The LB is a suite consisting of a Field Programmable Gate Array (FPGA) executing the dynamically configurable, low fixed-latency LB data plane, which features real-time packet redirection at high throughput, and a control plane running on the FPGA host computer that monitors network and compute-farm telemetry in order to make dynamic decisions for destination compute host redirection/load balancing.
A Unified Single-stage Learning Model for Estimating Fiber Orientation Distribution Functions on Heterogeneous Multi-shell Diffusion-weighted MRI
Authors: Tianyuan Yao, Nancy Newlin, Praitayini Kanakaraj, Vishwesh Nath, Leon Y Cai, Karthik Ramadass, Kurt Schilling, Bennett A. Landman, Yuankai Huo
Abstract
Diffusion-weighted (DW) MRI measures the direction and scale of the local diffusion process in every voxel through its spectrum in q-space, typically acquired in one or more shells. Recent developments in microstructure imaging and multi-tissue decomposition have sparked renewed attention to the radial b-value dependence of the signal. Applications in tissue classification and micro-architecture estimation therefore require a signal representation that extends over the radial as well as the angular domain. Multiple approaches have been proposed that can model the non-linear relationship between the DW-MRI signal and biological microstructure. In the past few years, many deep learning-based methods have been developed for faster inference speed and higher inter-scan consistency compared with traditional model-based methods (e.g., multi-shell multi-tissue constrained spherical deconvolution). However, a multi-stage learning strategy is typically required, since the learning process relies on various intermediate representations, such as the simple harmonic oscillator reconstruction (SHORE) representation. In this work, we present a unified dynamic network with a single-stage spherical convolutional neural network, which allows efficient fiber orientation distribution function (fODF) estimation from heterogeneous multi-shell diffusion MRI sequences. We study Human Connectome Project (HCP) young adults with test-retest scans. Experimental results show that the proposed single-stage method outperforms prior multi-stage approaches in repeated fODF estimation with shell dropoff and single-shell DW-MRI sequences.
ARMBench: An Object-centric Benchmark Dataset for Robotic Manipulation
Authors: Chaitanya Mitash, Fan Wang, Shiyang Lu, Vikedo Terhuja, Tyler Garaas, Felipe Polido, Manikantan Nambi
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Abstract
This paper introduces the Amazon Robotic Manipulation Benchmark (ARMBench), a large-scale, object-centric benchmark dataset for robotic manipulation in the context of a warehouse. Automation of operations in modern warehouses requires a robotic manipulator to deal with a wide variety of objects, unstructured storage, and dynamically changing inventory. Such settings pose challenges in perceiving the identity, physical characteristics, and state of objects during manipulation. Existing datasets for robotic manipulation consider a limited set of objects or utilize 3D models to generate synthetic scenes, with limitations in capturing the variety of object properties, clutter, and interactions. We present a large-scale dataset collected in an Amazon warehouse using a robotic manipulator performing object singulation from containers with heterogeneous contents. ARMBench contains images, videos, and metadata that correspond to 235K+ pick-and-place activities on 190K+ unique objects. The data are captured at different stages of manipulation, i.e., pre-pick, during transfer, and after placement. Benchmark tasks are proposed by virtue of the high-quality annotations, and baseline performance evaluations are presented for three visual perception challenges, namely 1) object segmentation in clutter, 2) object identification, and 3) defect detection. ARMBench can be accessed at this http URL
Learning Excavation of Rigid Objects with Offline Reinforcement Learning
Abstract
Autonomous excavation is a challenging task. The unknown contact dynamics between the excavator bucket and the terrain could easily result in large contact forces and jamming problems during excavation. Traditional model-based methods struggle to handle such problems due to complex dynamic modeling. In this paper, we formulate the excavation skills with three novel manipulation primitives. We propose to learn the manipulation primitives with offline reinforcement learning (RL) to avoid large amounts of online robot interactions. The proposed method can learn efficient penetration skills from sub-optimal demonstrations, which contain sub-trajectories that can be "stitched" together to formulate an optimal trajectory without causing jamming. We evaluate the proposed method with extensive experiments on excavating a variety of rigid objects and demonstrate that the learned policy outperforms the demonstrations. We also show that the learned policy can quickly adapt to unseen and challenging fragmented rocks with online fine-tuning.
Ordinary Differential Equation-based Sparse Signal Recovery
Abstract
This study investigates the use of continuous-time dynamical systems for sparse signal recovery. The proposed dynamical system is in the form of a nonlinear ordinary differential equation (ODE) derived from the gradient flow of the Lasso objective function. The sparse signal recovery process of this ODE-based approach is demonstrated by numerical simulations using the Euler method. The state of the continuous-time dynamical system eventually converges to the equilibrium point corresponding to the minimum of the objective function. To gain insight into the local convergence properties of the system, a linear approximation around the equilibrium point is applied, yielding a closed-form error evolution ODE. This analysis shows the behavior of convergence to the equilibrium point. In addition, a variational optimization problem is proposed to optimize a time-dependent regularization parameter in order to improve both convergence speed and solution quality. The deep unfolded-variational optimization method is introduced as a means of solving this optimization problem, and its effectiveness is validated through numerical experiments.
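A minimal sketch of the Euler-discretized gradient flow is shown below, assuming the standard Lasso objective 0.5*||Ax - y||^2 + lam*||x||_1 and using the sign subgradient of the l1 term; the paper's exact formulation (e.g., its time-dependent regularization parameter) may differ.

```python
import numpy as np

def lasso_gradient_flow(A, y, lam, h=0.01, steps=5000):
    # Euler integration of the ODE dx/dt = -(A^T (A x - y) + lam * sign(x)),
    # i.e., the (sub)gradient flow of 0.5*||A x - y||^2 + lam*||x||_1;
    # the state converges toward the minimizer of the Lasso objective
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        x = x - h * (A.T @ (A @ x - y) + lam * np.sign(x))
    return x
```

With a small regularization parameter and a step size below the stability limit of the quadratic term, the state recovers a sparse signal from noiseless measurements up to a small shrinkage bias.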
Towards Understanding the Endemic Behavior of a Competitive Tri-Virus SIS Networked Model
Authors: Sebin Gracy, Mengbin Ye, Brian D.O. Anderson, Cesar A. Uribe
Abstract
This paper studies the endemic behavior of a multi-competitive networked susceptible-infected-susceptible (SIS) model. Specifically, the paper deals with three competing virus systems (i.e., tri-virus systems). First, we show that a tri-virus system, unlike a bi-virus system, is not a monotone dynamical system. Using the Parametric Transversality Theorem, we show that, generically, a tri-virus system has a finite number of equilibria and that the Jacobian matrices associated with each equilibrium are nonsingular. The endemic equilibria of this system can be classified as follows: a) single-virus endemic equilibria (also referred to as the boundary equilibria), where precisely one of the three viruses is alive; b) 2-coexistence equilibria, where exactly two of the three viruses are alive; and c) 3-coexistence equilibria, where all three viruses survive in the network. We provide a necessary and sufficient condition that guarantees local exponential convergence to a boundary equilibrium. Further, we secure conditions for the nonexistence of 3-coexistence equilibria (resp. for various forms of 2-coexistence equilibria). We also identify sufficient conditions for the existence of a 2-coexistence (resp. 3-coexistence) equilibrium. We identify conditions on the model parameters that give rise to a continuum of coexistence equilibria. More specifically, we establish i) a scenario that admits the existence and local exponential attractivity of a line of coexistence equilibria; and ii) scenarios that admit the existence of, and, in the case of one such scenario, global convergence to, a plane of 3-coexistence equilibria.
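An illustrative Euler step for a networked multi-virus SIS model of this general form is sketched below. The notation (infection rates beta, healing rates delta, shared contact graph A) is an assumption for illustration; the paper's precise parameterization, with per-virus graphs and heterogeneous rates, may differ.

```python
import numpy as np

def multi_virus_sis_step(x, A, beta, delta, h=0.01):
    # x: (3, n) infection fractions of three competing viruses at n nodes;
    # a node infected by one virus is unavailable to the others (competition)
    healthy = 1.0 - x.sum(axis=0)                  # susceptible fraction per node
    dx = beta[:, None] * healthy * (x @ A.T) - delta[:, None] * x
    return np.clip(x + h * dx, 0.0, 1.0)
```

Below the epidemic threshold (infection rate times the graph's spectral radius smaller than the healing rate), all three viruses die out, matching the intuition behind the boundary equilibria discussed above.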
Runtime Verification of Self-Adaptive Systems with Changing Requirements
Authors: Marc Carwehl, Thomas Vogel, Genaína Nunes Rodrigues, Lars Grunske
Abstract
To accurately make adaptation decisions, a self-adaptive system needs precise means to analyze itself at runtime. To this end, runtime verification can be used in the feedback loop to check that the managed system satisfies its requirements formalized as temporal-logic properties. These requirements, however, may change due to system evolution or uncertainty in the environment, managed system, and requirements themselves. Thus, the properties under investigation by the runtime verification have to be dynamically adapted to represent the changing requirements while preserving the knowledge about requirements satisfaction gathered thus far, all with minimal latency. To address this need, we present a runtime verification approach for self-adaptive systems with changing requirements. Our approach uses property specification patterns to automatically obtain automata with precise semantics that are the basis for runtime verification. The automata can be safely adapted during runtime verification while preserving intermediate verification results to seamlessly reflect requirement changes and enable continuous verification. We evaluate our approach on an Arduino prototype of the Body Sensor Network and the Timescales benchmark. Results show that our approach is over five times faster than the typical approach of redeploying and restarting runtime monitors to reflect requirements changes, while improving the system's trustworthiness by avoiding interruptions of verification.
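As a toy illustration of the kind of monitor that property specification patterns compile to, consider the "absence" pattern (an event p must never occur). The class below is a hypothetical minimal three-valued monitor, not the paper's automaton construction:

```python
class AbsenceMonitor:
    # minimal 3-valued runtime monitor for the "absence" pattern G(not p):
    # the verdict stays "inconclusive" until p is observed, after which it
    # becomes "violated" and remains so (violations are irrevocable)
    def __init__(self):
        self.verdict = "inconclusive"

    def step(self, p_observed):
        if p_observed:
            self.verdict = "violated"
        return self.verdict
```

Adapting such a monitor at runtime, as the paper proposes, amounts to swapping the automaton while carrying over state that encodes verdicts gathered so far.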
Futures Quantitative Investment with Heterogeneous Continual Graph Neural Network
Authors: Zhizhong Tan, Min Hu, Yixuan Wang, Lu Wei, Bin Liu
Abstract
Predicting trends in futures prices with traditional econometric models is challenging, as one must consider not only a future's historical data but also correlations among different futures. Spatial-temporal graph neural networks (STGNNs) have great advantages in handling this kind of spatial-temporal data. However, we cannot directly apply STGNNs to high-frequency futures data, because futures investors must consider both long-term and short-term characteristics when making decisions. To capture both long-term and short-term features, we exploit more label information by designing four heterogeneous tasks: price regression, price moving-average regression, price gap regression (within a short interval), and change-point detection, which involve both long-term and short-term scenes. To make full use of these labels, we train our model in a continual manner. Traditional continual GNNs define the gradient of prices as the parameter importance used to overcome catastrophic forgetting (CF). Unfortunately, the losses of the four heterogeneous tasks lie in different spaces, so it is improper to calculate parameter importance from their losses. We propose instead to calculate parameter importance from the mutual information between the original observations and the extracted features. Empirical results on 49 commodity futures demonstrate that our model has higher prediction performance in capturing long-term and short-term dynamic changes.
On the use of chaotic dynamics for mobile network design and analysis: towards a trace data generator
Authors: Martin Rosalie, Serge Chaumette
Subjects: Multiagent Systems (cs.MA); Dynamical Systems (math.DS); Chaotic Dynamics (nlin.CD)
Abstract
With the constant increase in the number of autonomous vehicles and connected objects, tools to understand and reproduce their mobility models are required. We focus on chaotic dynamics and review their applications in the design of mobility models. We also review the nonlinear tools used in the literature to characterize mobility models. Finally, we propose a method to generate traces for a given scenario involving moving people, using tools from the nonlinear analysis domain usually dedicated to topological analysis of chaotic attractors.
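As a toy illustration of driving mobility traces with chaotic dynamics, two independent logistic maps in the chaotic regime can generate waypoints in the unit square. This sketch is an assumption for illustration only, not the paper's generator:

```python
def logistic_trace(x0, y0, steps, r=3.99):
    # iterate two logistic maps with r close to 4 (chaotic regime) and read
    # the pair of states off as a sequence of 2-D waypoints in [0, 1]^2
    trace, x, y = [], x0, y0
    for _ in range(steps):
        x = r * x * (1.0 - x)
        y = r * y * (1.0 - y)
        trace.append((x, y))
    return trace
```

The trace is fully reproducible from its initial condition, yet tiny perturbations of that initial condition quickly produce a completely different trajectory, which is the sensitivity property that makes chaotic maps attractive for generating diverse but deterministic mobility scenarios.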
An Efficient Online Prediction of Host Workloads Using Pruned GRU Neural Nets
Authors: Amin Setayesh, Hamid Hadian, Radu Prodan
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Host load prediction is essential for dynamic resource scaling and job scheduling in a cloud computing environment. In this context, workload prediction is challenging for several reasons. First, it must be accurate to enable precise scheduling decisions. Second, it must be fast to schedule at the right time. Third, the model must be able to account for new workload patterns so that it performs well on both new and old patterns. Failing to make accurate and fast predictions, or being unable to predict new usage patterns, can result in severe outcomes such as service level agreement (SLA) misses. Our research trains a fast model, based on the gated recurrent unit (GRU), that can adapt online to mitigate these issues. We take a multivariate approach using several features, such as memory usage, CPU usage, disk I/O usage, and disk space, to make accurate predictions. Moreover, we predict multiple steps ahead, which is essential for making scheduling decisions in advance. Furthermore, we use two pruning methods, L1-norm and random pruning, to produce a sparse model for faster forecasts. Finally, online learning is used to create a model that can adapt over time to new workload patterns.
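The L1-norm pruning variant can be sketched as simple magnitude pruning of a weight matrix; `l1_prune` is a hypothetical helper for illustration, not the paper's code, and real GRU pruning would apply it to the recurrent and input weight matrices of the network.

```python
import numpy as np

def l1_prune(weights, sparsity):
    # zero out the fraction `sparsity` of entries with the smallest
    # absolute value, keeping the largest-magnitude weights intact
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    return weights * (np.abs(weights) > threshold)
```

The resulting sparse matrix reduces the multiply-accumulate work per forecast step, which is what makes the pruned model faster at inference time.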
DORT: Modeling Dynamic Objects in Recurrent for Multi-Camera 3D Object Detection and Tracking
Authors: Qing Lian, Tai Wang, Dahua Lin, Jiangmiao Pang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recent multi-camera 3D object detectors usually leverage temporal information to construct multi-view stereo that alleviates the ill-posed depth estimation problem. However, they typically assume all objects are static and directly aggregate features across frames. This work begins with a theoretical and empirical analysis revealing that ignoring the motion of moving objects can result in serious localization bias. Therefore, we propose to model Dynamic Objects in RecurrenT (DORT) to tackle this problem. In contrast to previous global Bird-Eye-View (BEV) methods, DORT extracts object-wise local volumes for motion estimation, which also alleviates the heavy computational burden. By iteratively refining the estimated object motion and location, the preceding features can be precisely aggregated to the current frame to mitigate the aforementioned adverse effects. This simple framework has two appealing properties. It is flexible and practical, and can be plugged into most camera-based 3D object detectors. Because predictions of object motion are in the loop, it can easily track objects across frames according to their nearest center distances. Without bells and whistles, DORT outperforms all previous methods on the nuScenes detection and tracking benchmarks, with 62.5% NDS and 57.6% AMOTA, respectively. The source code will be released.
Learning Flow Functions from Data with Applications to Nonlinear Oscillators
Authors: Miguel Aguiar, Amritam Das, Karl H. Johansson
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Abstract
We describe a recurrent neural network (RNN) based architecture to learn the flow function of a causal, time-invariant and continuous-time control system from trajectory data. By restricting the class of control inputs to piecewise constant functions, we show that learning the flow function is equivalent to learning the input-to-state map of a discrete-time dynamical system. This motivates the use of an RNN together with encoder and decoder networks which map the state of the system to the hidden state of the RNN and back. We show that the proposed architecture is able to approximate the flow function by exploiting the system's causality and time-invariance. The output of the learned flow function model can be queried at any time instant. We experimentally validate the proposed method using models of the Van der Pol and FitzHugh-Nagumo oscillators. In both cases, the results demonstrate that the architecture is able to closely reproduce the trajectories of these two systems. For the Van der Pol oscillator, we further show that the trained model generalises to the system's response with a prolonged prediction time horizon as well as control inputs outside the training distribution. For the FitzHugh-Nagumo oscillator, we show that the model accurately captures the input-dependent phenomena of excitability.
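The equivalence between flows under piecewise-constant inputs and a discrete-time system can be checked on a linear example, where the discrete map has a closed form via the matrix exponential. This is illustrative only: the paper learns this map with an RNN for nonlinear systems, and the helper name is an assumption.

```python
import numpy as np
from scipy.linalg import expm

def discrete_map(A, B, dt):
    # exact flow of xdot = A x + B u over a step of length dt with the
    # input u held constant (zero-order hold), via the augmented exponential
    n, m = A.shape[0], B.shape[1]
    M = expm(np.block([[A, B], [np.zeros((m, n + m))]]) * dt)
    return M[:n, :n], M[:n, n:]
```

Time-invariance shows up as the semigroup property: composing two half-length steps with the same held input reproduces one full-length step, which is exactly the structure an RNN unrolled over input switching times can exploit.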
Who You Play Affects How You Play: Predicting Sports Performance Using Graph Attention Networks With Temporal Convolution
Authors: Rui Luo, Vikram Krishnamurthy
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Abstract
This study presents a novel deep learning method, called GATv2-GCN, for predicting player performance in sports. To construct a dynamic player interaction graph, we leverage player statistics and their interactions during gameplay. We use a graph attention network to capture the attention that each player pays to every other player, allowing for more accurate modeling of the dynamic player interactions. To handle the multivariate time series of player statistics, we incorporate a temporal convolution layer, which provides the model with temporal predictive power. We evaluate the performance of our model using real-world sports data, demonstrating its effectiveness in predicting player performance. Furthermore, we explore the potential use of our model in a sports betting context, providing insights into profitable strategies that leverage our predictive power. The proposed method has the potential to advance the state-of-the-art in player performance prediction and to provide valuable insights for the sports analytics and betting industries.
Maximin Headway Control of Automated Vehicles for System Optimal Dynamic Traffic Assignment in General Networks
Abstract
This study develops a headway control framework for a fully automated road network, as we believe the headway of Automated Vehicles (AVs) is another factor influencing traffic dynamics, in addition to conventional vehicle behaviors (e.g., route and departure time choices). Specifically, we aim to find the optimal time headway between AVs on each link that achieves the network-wide system optimal dynamic traffic assignment (SO-DTA). To this end, a headway-dependent fundamental diagram (HFD) and a headway-dependent double queue model (HDQ) are developed to model the effect of dynamic headway on roads, and a dynamic network model is built. It is rigorously proved that the minimum headway always achieves SO-DTA, yet the optimal headway is non-unique. Motivated by these two findings, this study defines a novel concept, the maximin headway, which is the largest headway that still achieves SO-DTA in the network. Mathematical properties of the maximin headway are analyzed and an efficient solution algorithm is developed. Numerical experiments on both a small and a large network verify the effectiveness of the maximin headway control framework as well as the properties of the maximin headway. This study sheds light on selecting the desired solution among the non-unique solutions of SO-DTA and provides implications for the safety margin of AVs under SO-DTA.
On real and observable realizations of input-output equations
Authors: Sebastian Falkensteiner, Dmitrii Pavlov, Rafael Sendra
Subjects: Symbolic Computation (cs.SC); Algebraic Geometry (math.AG); Dynamical Systems (math.DS); Optimization and Control (math.OC)
Abstract
Given a single algebraic input-output equation, we present a method for finding different representations of the associated system in the form of rational realizations; these are dynamical systems with rational right-hand sides. It has been shown that in the case where the input-output equation is of order one, rational realizations can be computed, if they exist. In this work, we focus first on the existence and actual computation of so-called observable rational realizations, and second on rational realizations with real coefficients. The study of observable realizations allows us to find every rational realization of a given first-order input-output equation, along with the field extensions necessary in this process. We show that for first-order input-output equations the existence of a rational realization is equivalent to the existence of an observable rational realization. Moreover, we give a criterion to decide the existence of real rational realizations. The computation of observable and real realizations of first-order input-output equations is fully algorithmic. We also present partial results for the case of higher-order input-output equations.
Legged Robots for Object Manipulation: A Review
Authors: Yifeng Gong, Ge Sun, Aditya Nair, Aditya Bidwai, Raghuram CS, John Grezmak, Guillaume Sartoretti, Kathryn A. Daltorio
Abstract
Legged robots can have a unique role in manipulating objects in dynamic, human-centric, or otherwise inaccessible environments. Although most legged robotics research to date typically focuses on traversing these challenging environments, many legged platform demonstrations have also included "moving an object" as a way of doing tangible work. Legged robots can be designed to manipulate a particular type of object (e.g., a cardboard box, a soccer ball, or a larger piece of furniture), by themselves or collaboratively. The objective of this review is to collect and learn from these examples, to both organize the work done so far in the community and highlight interesting open avenues for future work. This review categorizes existing works into four main manipulation methods: object interactions without grasping, manipulation with walking legs, dedicated non-locomotive arms, and legged teams. Each method has different design and autonomy features, which are illustrated by available examples in the literature. Based on a few simplifying assumptions, we further provide quantitative comparisons for the range of possible relative sizes of the manipulated object with respect to the robot. Taken together, these examples suggest new directions for research in legged robot manipulation, such as multifunctional limbs, terrain modeling, or learning-based control, to support a number of new deployments in challenging indoor/outdoor scenarios in warehouses/construction sites, preserved natural areas, and especially for home robotics.
End-to-End $n$-ary Relation Extraction for Combination Drug Therapies
Abstract
Combination drug therapies are treatment regimens that involve two or more drugs, most commonly administered to patients with cancer, HIV, malaria, or tuberculosis. There are currently over 350K articles in PubMed that use the "combination drug therapy" MeSH heading, with at least 10K articles published per year over the past two decades. Extracting combination therapies from scientific literature inherently constitutes an $n$-ary relation extraction problem. Unlike the general $n$-ary setting where $n$ is fixed (e.g., drug-gene-mutation relations where $n=3$), extracting combination therapies is a special setting where $n \geq 2$ is dynamic, depending on each instance. Recently, Tiktinsky et al. (NAACL 2022) introduced a first-of-its-kind dataset, CombDrugExt, for extracting such therapies from literature. Here, we use a sequence-to-sequence style end-to-end extraction method to achieve an F1-score of $66.7\%$ on the CombDrugExt test set for positive (or effective) combinations. This is an absolute $\approx 5\%$ F1-score improvement even over the prior best relation classification score with spotted drug entities (hence, not end-to-end). Our effort thus introduces a state-of-the-art first model for end-to-end extraction that is already superior to the best prior non-end-to-end model for this task. Our model seamlessly extracts all drug entities and relations in a single pass and is highly suitable for dynamic $n$-ary extraction scenarios.
Keyword: efficient
The Quality-Diversity Transformer: Generating Behavior-Conditioned Trajectories with Decision Transformers
ytopt: Autotuning Scientific Applications for Energy Efficiency at Large Scales
Communication-Efficient Vertical Federated Learning with Limited Overlapping Samples
A Machine Learning Outlook: Post-processing of Global Medium-range Forecasts
Operator learning with PCA-Net: upper and lower complexity bounds
A Unified Single-stage Learning Model for Estimating Fiber Orientation Distribution Functions on Heterogeneous Multi-shell Diffusion-weighted MRI
ProductAE: Toward Deep Learning Driven Error-Correction Codes of Large Dimensions
Learning Excavation of Rigid Objects with Offline Reinforcement Learning
Learning Complicated Manipulation Skills via Deterministic Policy with Limited Demonstrations
Point2Pix: Photo-Realistic Point Cloud Rendering via Neural Radiance Fields
TriVol: Point Cloud Rendering via Triple Volumes
Development of a deep learning-based tool to assist wound classification
RusTitW: Russian Language Text Dataset for Visual Text in-the-Wild Recognition
Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks
Efficient and Reconfigurable Optimal Planning in Large-Scale Systems Using Hierarchical Finite State Machines
Satisfiability of Non-Linear Transcendental Arithmetic as a Certificate Search Problem
4D Facial Expression Diffusion Model
AraSpot: Arabic Spoken Command Spotting
Optimizing Reconfigurable Intelligent Surfaces for Small Data Packets: A Subarray Approach
Modeling online adaptive navigation in virtual environments based on PID control
Learning Augmented, Multi-Robot Long-Horizon Navigation in Partially Mapped Environments
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Predictive Resource Allocation in mmWave Systems with Rotation Detection
Improving Code Generation by Training with Natural Language Feedback
Judicial Intelligent Assistant System: Extracting Events from Divorce Cases to Detect Disputes for the Judge
Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture
ACO-tagger: A Novel Method for Part-of-Speech Tagging using Ant Colony Optimization
Computationally Efficient Labeling of Cancer Related Forum Posts by Non-Clinical Text Information Retrieval
Maximin Headway Control of Automated Vehicles for System Optimal Dynamic Traffic Assignment in General Networks
Dispersion relation reconstruction for 2D Photonic Crystals based on polynomial interpolation
On real and observable realizations of input-output equations
Adaptive Superpixel for Active Learning in Semantic Segmentation
Multi-View Keypoints for Reliable 6D Object Pose Estimation
Full-Range Approximation for the Theis Well Function Using Ramanujan's Series and Bounds for the Exponential Integral
CheckerPose: Progressive Dense Keypoint Localization for Object Pose Estimation with Graph Neural Network
Keyword: faster
An EMO Joint Pruning with Multiple Sub-networks: Fast and Effect
Accelerated wind farm yaw and layout optimisation with multi-fidelity deep transfer learning wake models
FMAS: Fast Multi-Objective SuperNet Architecture Search for Semantic Segmentation
A Unified Single-stage Learning Model for Estimating Fiber Orientation Distribution Functions on Heterogeneous Multi-shell Diffusion-weighted MRI
Hard Regularization to Prevent Collapse in Online Deep Clustering without Data Augmentation
Runtime Verification of Self-Adaptive Systems with Changing Requirements
Cyber Security aboard Micro Aerial Vehicles: An OpenTitan-based Visual Communication Use Case
An Efficient Online Prediction of Host Workloads Using Pruned GRU Neural Nets
Keyword: mobile
Assessing the Impact of Mobile Attackers on RPL-based Internet of Things
Multi-Agent Reinforcement Learning with Action Masking for UAV-enabled Mobile Communications
Active Implicit Object Reconstruction using Uncertainty-guided Next-Best-View Optimization
Keyword: pruning
An EMO Joint Pruning with Multiple Sub-networks: Fast and Effect
Tetra-AML: Automatic Machine Learning via Tensor Networks
An Efficient Online Prediction of Host Workloads Using Pruned GRU Neural Nets
Keyword: voxel
SnakeVoxFormer: Transformer-based Single Image\Voxel Reconstruction with Run Length Encoding
A Unified Single-stage Learning Model for Estimating Fiber Orientation Distribution Functions on Heterogeneous Multi-shell Diffusion-weighted MRI
Active Implicit Object Reconstruction using Uncertainty-guided Next-Best-View Optimization
Instant Neural Radiance Fields Stylization
Keyword: lidar
Spatiotemporal Self-supervised Learning for Point Clouds in the Wild
BEVSimDet: Simulated Multi-modal Distillation in Bird's-Eye View for Multi-view 3D Object Detection
Photometric LiDAR and RGB-D Bundle Adjustment
Keyword: diffusion
Rethinking CycleGAN: Improving Quality of GANs for Unpaired Image-to-Image Translation
A Unified Single-stage Learning Model for Estimating Fiber Orientation Distribution Functions on Heterogeneous Multi-shell Diffusion-weighted MRI
A Pilot Study of Query-Free Adversarial Attack against Stable Diffusion
Implicit Diffusion Models for Continuous Super-Resolution
HOLODIFFUSION: Training a 3D Diffusion Model using 2D Images
WordStylist: Styled Verbatim Handwritten Text Generation with Latent Diffusion Models
4D Facial Expression Diffusion Model
MDP: A Generalized Framework for Text-Guided Image Editing by Manipulating the Diffusion Path
Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos
Keyword: dynamic
Towards Quantifying Calibrated Uncertainty via Deep Ensembles in Multi-output Regression Task
TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition
On the Local Cache Update Rules in Streaming Federated Learning
EJ-FAT Joint ESnet JLab FPGA Accelerated Transport Load Balancer
A Unified Single-stage Learning Model for Estimating Fiber Orientation Distribution Functions on Heterogeneous Multi-shell Diffusion-weighted MRI
ARMBench: An Object-centric Benchmark Dataset for Robotic Manipulation
Learning Excavation of Rigid Objects with Offline Reinforcement Learning
Ordinary Differential Equation-based Sparse Signal Recovery
Towards Understanding the Endemic Behavior of a Competitive Tri-Virus SIS Networked Model
Runtime Verification of Self-Adaptive Systems with Changing Requirements
Futures Quantitative Investment with Heterogeneous Continual Graph Neural Network
On the use of chaotic dynamics for mobile network design and analysis: towards a trace data generator
An Efficient Online Prediction of Host Workloads Using Pruned GRU Neural Nets
DORT: Modeling Dynamic Objects in Recurrent for Multi-Camera 3D Object Detection and Tracking
Learning Flow Functions from Data with Applications to Nonlinear Oscillators
Who You Play Affects How You Play: Predicting Sports Performance Using Graph Attention Networks With Temporal Convolution
Maximin Headway Control of Automated Vehicles for System Optimal Dynamic Traffic Assignment in General Networks
On real and observable realizations of input-output equations
Legged Robots for Object Manipulation: A Review
End-to-End $n$-ary Relation Extraction for Combination Drug Therapies