New submissions for Thu, 13 Apr 23

Keyword: efficient

PixelRNN: In-pixel Recurrent Neural Networks for End-to-end-optimized Perception with Neural Sensors

Authors: Haley M. So, Laurie Bose, Piotr Dudek, Gordon Wetzstein
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05440
Pdf link: https://arxiv.org/pdf/2304.05440
Abstract Conventional image sensors digitize high-resolution images at fast frame rates, producing a large amount of data that needs to be transmitted off the sensor for further processing. This is challenging for perception systems operating on edge devices, because communication is power inefficient and induces latency. Fueled by innovations in stacked image sensor fabrication, emerging sensor-processors offer programmability and minimal processing capabilities directly on the sensor. We exploit these capabilities by developing an efficient recurrent neural network architecture, PixelRNN, that encodes spatio-temporal features on the sensor using purely binary operations. PixelRNN reduces the amount of data to be transmitted off the sensor by a factor of 64x compared to conventional systems while offering competitive accuracy for hand gesture recognition and lip reading tasks. We experimentally validate PixelRNN using a prototype implementation on the SCAMP-5 sensor-processor platform.
Probabilistic Reasoning at Scale: Trigger Graphs to the Rescue
Authors: Efthymia Tsamoura, Jaehun Lee, Jacopo Urbani
Subjects: Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2304.05459
Pdf link: https://arxiv.org/pdf/2304.05459
Abstract The role of uncertainty in data management has become more prominent than ever before, especially because of the growing importance of machine learning-driven applications that produce large uncertain databases. A well-known approach to querying such databases is to blend rule-based reasoning with uncertainty. However, techniques proposed so far struggle with large databases. In this paper, we address this problem by presenting a new technique for probabilistic reasoning that exploits Trigger Graphs (TGs) -- a notion recently introduced for the non-probabilistic setting. The intuition is that TGs can effectively store a probabilistic model by avoiding an explicit materialization of the lineage and by grouping together similar derivations of the same fact. Firstly, we show how TGs can be adapted to support the possible world semantics. Then, we describe techniques for efficiently computing a probabilistic model, and formally establish the correctness of our approach. We also present an extensive empirical evaluation using a prototype called LTGs. Our comparison against other leading engines shows that LTGs is not only faster, even against approximate reasoning techniques, but can also reason over probabilistic databases that existing engines cannot scale to.
An Adaptive Factorized Nyström Preconditioner for Regularized Kernel Matrices
Authors: Shifan Zhao, Tianshi Xu, Edmond Chow, Yuanzhe Xi
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2304.05460
Pdf link: https://arxiv.org/pdf/2304.05460
Abstract The spectrum of a kernel matrix significantly depends on the parameter values of the kernel function used to define the kernel matrix. This makes it challenging to design a preconditioner for a regularized kernel matrix that is robust across different parameter values. This paper proposes the Adaptive Factorized Nystr\"om (AFN) preconditioner. The preconditioner is designed for the case where the rank k of the Nystr\"om approximation is large, i.e., for kernel function parameters that lead to kernel matrices with eigenvalues that decay slowly. AFN deliberately chooses a well-conditioned submatrix to solve with and corrects a Nystr\"om approximation with a factorized sparse approximate matrix inverse. This makes AFN efficient for kernel matrices with large numerical ranks. AFN also adaptively chooses the size of this submatrix to balance accuracy and cost.
CamDiff: Camouflage Image Augmentation via Diffusion Model
Authors: Xue-Jing Luo, Shuo Wang, Zongwei Wu, Christos Sakaridis, Yun Cheng, Deng-Ping Fan, Luc Van Gool
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05469
Pdf link: https://arxiv.org/pdf/2304.05469
Abstract The burgeoning field of camouflaged object detection (COD) seeks to identify objects that blend into their surroundings. Despite the impressive performance of recent models, we have identified a limitation in their robustness, where existing methods may misclassify salient objects as camouflaged ones, despite these two characteristics being contradictory. This limitation may stem from lacking multi-pattern training images, leading to less saliency robustness. To address this issue, we introduce CamDiff, a novel approach inspired by AI-Generated Content (AIGC) that overcomes the scarcity of multi-pattern training images. Specifically, we leverage the latent diffusion model to synthesize salient objects in camouflaged scenes, while using the zero-shot image classification ability of the Contrastive Language-Image Pre-training (CLIP) model to prevent synthesis failures and ensure the synthesized object aligns with the input prompt. Consequently, the synthesized image retains its original camouflage label while incorporating salient objects, yielding camouflage samples with richer characteristics. The results of user studies show that the salient objects in the scenes synthesized by our framework attract the user's attention more; thus, such samples pose a greater challenge to the existing COD models. Our approach enables flexible editing and efficient large-scale dataset generation at a low cost. It significantly enhances COD baselines' training and testing phases, emphasizing robustness across diverse domains. Our newly-generated datasets and source code are available at https://github.com/drlxj/CamDiff.
Contingency Games for Multi-Agent Interaction
Authors: Lasse Peters, Andrea Bajcsy, Chih-Yuan Chiu, David Fridovich-Keil, Forrest Laine, Laura Ferranti, Javier Alonso-Mora
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.05483
Pdf link: https://arxiv.org/pdf/2304.05483
Abstract Contingency planning, wherein an agent generates a set of possible plans conditioned on the outcome of an uncertain event, is an increasingly popular way for robots to act under uncertainty. In this work, we take a game-theoretic perspective on contingency planning which is tailored to multi-agent scenarios in which a robot's actions impact the decisions of other agents and vice versa. The resulting contingency game allows the robot to efficiently coordinate with other agents by generating strategic motion plans conditioned on multiple possible intents for other actors in the scene. Contingency games are parameterized via a scalar variable which represents a future time at which intent uncertainty will be resolved. Varying this parameter enables a designer to easily adjust how conservatively the robot behaves in the game. Interestingly, we also find that existing variants of game-theoretic planning under uncertainty are readily obtained as special cases of contingency games. Lastly, we offer an efficient method for solving N-player contingency games with nonlinear dynamics and non-convex costs and constraints. Through a series of simulated autonomous driving scenarios, we demonstrate that plans generated via contingency games provide quantitative performance gains over game-theoretic motion plans that do not account for future uncertainty reduction.
Communication Efficient DNN Partitioning-based Federated Learning
Authors: Di Wu, Rehmat Ullah, Philip Rodgers, Peter Kilpatrick, Ivor Spence, Blesson Varghese
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2304.05495
Pdf link: https://arxiv.org/pdf/2304.05495
Abstract Efficiently running federated learning (FL) on resource-constrained devices is challenging since they are required to train computationally intensive deep neural networks (DNN) independently. DNN partitioning-based FL (DPFL) has been proposed as one mechanism to accelerate training where the layers of a DNN (or computation) are offloaded from the device to an edge server. However, this creates significant communication overheads since the activation and gradient need to be transferred between the device and the edge server during training. Current techniques reduce the communication introduced by DNN partitioning using local loss-based methods. We demonstrate that these methods adversely impact accuracy and ignore the communication costs incurred when transmitting the activation from the device to the server. This paper proposes ActionFed - a communication efficient framework for DPFL to accelerate training on resource-constrained devices. ActionFed eliminates the transmission of the gradient by developing pre-trained initialization of the DNN model on the device for the first time. This reduces the accuracy degradation seen in local loss-based methods. In addition, ActionFed proposes a novel replay buffer mechanism and implements a quantization-based compression technique to reduce the transmission of the activation. It is experimentally demonstrated that ActionFed can reduce the communication cost by up to 15.77x and accelerates training by up to 3.87x when compared to vanilla DPFL.
Revisiting Single-gated Mixtures of Experts
Authors: Amelie Royer, Ilia Karmanov, Andrii Skliar, Babak Ehteshami Bejnordi, Tijmen Blankevoort
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.05497
Pdf link: https://arxiv.org/pdf/2304.05497
Abstract Mixture of Experts (MoE) are rising in popularity as a means to train extremely large-scale models, yet allowing for a reasonable computational cost at inference time. Recent state-of-the-art approaches usually assume a large number of experts, and require training all experts jointly, which often lead to training instabilities such as the router collapsing In contrast, in this work, we propose to revisit the simple single-gate MoE, which allows for more practical training. Key to our work are (i) a base model branch acting both as an early-exit and an ensembling regularization scheme, (ii) a simple and efficient asynchronous training pipeline without router collapse issues, and finally (iii) a per-sample clustering-based initialization. We show experimentally that the proposed model obtains efficiency-to-accuracy trade-offs comparable with other more complex MoE, and outperforms non-mixture baselines. This showcases the merits of even a simple single-gate MoE, and motivates further exploration in this area.
GraphGANFed: A Federated Generative Framework for Graph-Structured Molecules Towards Efficient Drug Discovery
Authors: Daniel Manu, Jingjing Yao, Wuji Liu, Xiang Sun
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.05498
Pdf link: https://arxiv.org/pdf/2304.05498
Abstract Recent advances in deep learning have accelerated its use in various applications, such as cellular image analysis and molecular discovery. In molecular discovery, a generative adversarial network (GAN), which comprises a discriminator to distinguish generated molecules from existing molecules and a generator to generate new molecules, is one of the premier technologies due to its ability to learn from a large molecular data set efficiently and generate novel molecules that preserve similar properties. However, different pharmaceutical companies may be unwilling or unable to share their local data sets due to the geo-distributed and sensitive nature of molecular data sets, making it impossible to train GANs in a centralized manner. In this paper, we propose a Graph convolutional network in Generative Adversarial Networks via Federated learning (GraphGANFed) framework, which integrates graph convolutional neural Network (GCN), GAN, and federated learning (FL) as a whole system to generate novel molecules without sharing local data sets. In GraphGANFed, the discriminator is implemented as a GCN to better capture features from molecules represented as molecular graphs, and FL is used to train both the discriminator and generator in a distributive manner to preserve data privacy. Extensive simulations are conducted based on the three bench-mark data sets to demonstrate the feasibility and effectiveness of GraphGANFed. The molecules generated by GraphGANFed can achieve high novelty (=100) and diversity (> 0.9). The simulation results also indicate that 1) a lower complexity discriminator model can better avoid mode collapse for a smaller data set, 2) there is a tradeoff among different evaluation metrics, and 3) having the right dropout ratio of the generator and discriminator can avoid mode collapse.
L3MVN: Leveraging Large Language Models for Visual Target Navigation
Authors: Bangguo Yu, Hamidreza Kasaei, Ming Cao
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.05501
Pdf link: https://arxiv.org/pdf/2304.05501
Abstract Visual target navigation in unknown environments is a crucial problem in robotics. Despite extensive investigation of classical and learning-based approaches in the past, robots lack common-sense knowledge about household objects and layouts. Prior state-of-the-art approaches to this task rely on learning the priors during the training and typically require significant expensive resources and time for learning. To address this, we propose a new framework for visual target navigation that leverages Large Language Models (LLM) to impart common sense for object searching. Specifically, we introduce two paradigms: (i) zero-shot and (ii) feed-forward approaches that use language to find the relevant frontier from the semantic map as a long-term goal and explore the environment efficiently. Our analysis demonstrates the notable zero-shot generalization and transfer capabilities from the use of language. Experiments on Gibson and Habitat-Matterport 3D (HM3D) demonstrate that the proposed framework significantly outperforms existing map-based methods in terms of success rate and generalization. Ablation analysis also indicates that the common-sense knowledge from the language model leads to more efficient semantic exploration. Finally, we provide a real robot experiment to verify the applicability of our framework in real-world scenarios. The supplementary video and code can be accessed via the following link: https://sites.google.com/view/l3mvn.
Frontier Semantic Exploration for Visual Target Navigation
Authors: Bangguo Yu, Hamidreza Kasaei, Ming Cao
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.05506
Pdf link: https://arxiv.org/pdf/2304.05506
Abstract This work focuses on the problem of visual target navigation, which is very important for autonomous robots as it is closely related to high-level tasks. To find a special object in unknown environments, classical and learning-based approaches are fundamental components of navigation that have been investigated thoroughly in the past. However, due to the difficulty in the representation of complicated scenes and the learning of the navigation policy, previous methods are still not adequate, especially for large unknown scenes. Hence, we propose a novel framework for visual target navigation using the frontier semantic policy. In this proposed framework, the semantic map and the frontier map are built from the current observation of the environment. Using the features of the maps and object category, deep reinforcement learning enables to learn a frontier semantic policy which can be used to select a frontier cell as a long-term goal to explore the environment efficiently. Experiments on Gibson and Habitat-Matterport 3D (HM3D) demonstrate that the proposed framework significantly outperforms existing map-based methods in terms of success rate and efficiency. Ablation analysis also indicates that the proposed approach learns a more efficient exploration policy based on the frontiers. A demonstration is provided to verify the applicability of applying our model to real-world transfer. The supplementary video and code can be accessed via the following link: https://sites.google.com/view/fsevn.
Training Large Language Models Efficiently with Sparsity and Dataflow
Authors: Venkat Srinivasan, Darshan Gandhi, Urmish Thakker, Raghu Prabhakar
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.05511
Pdf link: https://arxiv.org/pdf/2304.05511
Abstract Large foundation language models have shown their versatility in being able to be adapted to perform a wide variety of downstream tasks, such as text generation, sentiment analysis, semantic search etc. However, training such large foundational models is a non-trivial exercise that requires a significant amount of compute power and expertise from machine learning and systems experts. As models get larger, these demands are only increasing. Sparsity is a promising technique to relieve the compute requirements for training. However, sparsity introduces new challenges in training the sparse model to the same quality as the dense counterparts. Furthermore, sparsity drops the operation intensity and introduces irregular memory access patterns that makes it challenging to efficiently utilize compute resources. This paper demonstrates an end-to-end training flow on a large language model - 13 billion GPT - using sparsity and dataflow. The dataflow execution model and architecture enables efficient on-chip irregular memory accesses as well as native kernel fusion and pipelined parallelism that helps recover device utilization. We show that we can successfully train GPT 13B to the same quality as the dense GPT 13B model, while achieving an end-end speedup of 4.5x over dense A100 baseline.
State estimation of a carbon capture process through POD model reduction and neural network approximation
Authors: Siyu Liu, Xunyuan Yin, Jinfeng Liu (University of Alberta)
Subjects: Systems and Control (eess.SY); Dynamical Systems (math.DS)
Arxiv link: https://arxiv.org/abs/2304.05514
Pdf link: https://arxiv.org/pdf/2304.05514
Abstract This paper presents an efficient approach for state estimation of post-combustion CO2 capture plants (PCCPs) by using reduced-order neural network models. The method involves extracting lower-dimensional feature vectors from high-dimensional operational data of the PCCP and constructing a reduced-order process model using proper orthogonal decomposition (POD). Multi-layer perceptron (MLP) neural networks capture the dominant dynamics of the process and train the network parameters with low-dimensional data obtained from open-loop simulations. The proposed POD-MLP model can be used as the basis for estimating the states of PCCPs at a significantly decreased computational cost. For state estimation, a reduced-order extended Kalman filtering (EKF) scheme based on the POD-MLP model is developed. Our simulations demonstrate that the proposed POD-MLP modeling approach reduces computational complexity compared to the POD-only model for nonlinear systems. Additionally, the POD-MLP-EKF algorithm can accurately reconstruct the full state information of PCCPs while significantly improving computational efficiency compared to the EKF based on the original PCCP model.
MoMo: A shared encoder Model for text, image and multi-Modal representations
Authors: Rakesh Chada, Zhaoheng Zheng, Pradeep Natarajan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2304.05523
Pdf link: https://arxiv.org/pdf/2304.05523
Abstract We propose a self-supervised shared encoder model that achieves strong results on several visual, language and multimodal benchmarks while being data, memory and run-time efficient. We make three key contributions. First, in contrast to most existing works, we use a single transformer with all the encoder layers processing both the text and the image modalities. Second, we propose a stage-wise training strategy where the model is first trained on images, then jointly with unimodal text and image datasets and finally jointly with text and text-image datasets. Third, to preserve information across both the modalities, we propose a training pipeline that learns simultaneously from gradient updates of different modalities at each training update step. The results on downstream text-only, image-only and multimodal tasks show that our model is competitive with several strong models while using fewer parameters and lesser pre-training data. For example, MoMo performs competitively with FLAVA on multimodal (+3.1), image-only (+1.1) and text-only (-0.1) tasks despite having 2/5th the number of parameters and using 1/3rd the image-text training pairs. Finally, we ablate various design choices and further show that increasing model size produces significant performance gains indicating potential for substantial improvements with larger models using our approach.
Understanding Causality with Large Language Models: Feasibility and Opportunities
Authors: Cheng Zhang, Stefan Bauer, Paul Bennett, Jiangfeng Gao, Wenbo Gong, Agrin Hilmkil, Joel Jennings, Chao Ma, Tom Minka, Nick Pawlowski, James Vaughan
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2304.05524
Pdf link: https://arxiv.org/pdf/2304.05524
Abstract We assess the ability of large language models (LLMs) to answer causal questions by analyzing their strengths and weaknesses against three types of causal question. We believe that current LLMs can answer causal questions with existing causal knowledge as combined domain experts. However, they are not yet able to provide satisfactory answers for discovering new knowledge or for high-stakes decision-making tasks with high precision. We discuss possible future directions and opportunities, such as enabling explicit and implicit causal modules as well as deep causal-aware LLMs. These will not only enable LLMs to answer many different types of causal questions for greater impact but also enable LLMs to be more trustworthy and efficient in general.
Encrypted Price-based Market Mechanism for Optimal Load Frequency Control
Authors: Jihoon Suh, Takashi Tanaka
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.05525
Pdf link: https://arxiv.org/pdf/2304.05525
Abstract The global trend of energy deregulation has led to the market mechanism replacing some functionality of load frequency control (LFC). Accordingly, information exchange among participating generators and the market operator plays a crucial role in optimizing social utility. However, privacy has been an equally pressing concern in such settings. This conflict between individuals' privacy and social utility has been a long-standing challenge in market mechanism literature as well as in Cyber-Physical Systems (CPSs). In this paper, we propose a novel encrypted market architecture that leverages a hybrid encryption method and two-party computation protocols, enabling the secure synthesis and implementation of an optimal price-based market mechanism. This work spotlights the importance of secure and efficient outsourcing of controller synthesis, which is a critical element within the proposed framework. A two-area LFC model is used to conduct a case study.
Group projected Subspace Pursuit for Identification of variable coefficient differential equations (GP-IDENT)
Authors: Yuchen He, Sung-Ha Kang, Wenjing Liao, Hao Liu, Yingjie Liu
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2304.05543
Pdf link: https://arxiv.org/pdf/2304.05543
Abstract We propose an effective and robust algorithm for identifying partial differential equations (PDEs) with space-time varying coefficients from a single trajectory of noisy observations. Identifying unknown differential equations from noisy observations is a difficult task, and it is even more challenging with space and time varying coefficients in the PDE. The proposed algorithm, GP-IDENT, has three ingredients: (i) we use B-spline bases to express the unknown space and time varying coefficients, (ii) we propose Group Projected Subspace Pursuit (GPSP) to find a sequence of candidate PDEs with various levels of complexity, and (iii) we propose a new criterion for model selection using the Reduction in Residual (RR) to choose an optimal one among the pool of candidates. The new GPSP considers group projected subspaces which is more robust than existing methods in distinguishing correlated group features. We test GP-IDENT on a variety of PDEs and PDE systems, and compare it with the state-of-the-art parametric PDE identification algorithms under different settings to illustrate its outstanding performance. Our experiments show that GP-IDENT is effective in identifying the correct terms from a large dictionary and the model selection scheme is robust to noise.
MEMA Runtime Framework: Minimizing External Memory Accesses for TinyML on Microcontrollers
Authors: Andrew Sabot, Vikas Natesh, H.T. Kung, Wei-Te Ting
Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR); Performance (cs.PF); Programming Languages (cs.PL)
Arxiv link: https://arxiv.org/abs/2304.05544
Pdf link: https://arxiv.org/pdf/2304.05544
Abstract We present the MEMA framework for the easy and quick derivation of efficient inference runtimes that minimize external memory accesses for matrix multiplication on TinyML systems. The framework accounts for hardware resource constraints and problem sizes in analytically determining optimized schedules and kernels that minimize memory accesses. MEMA provides a solution to a well-known problem in the current practice, that is, optimal schedules tend to be found only through a time consuming and heuristic search of a large scheduling space. We compare the performance of runtimes derived from MEMA to existing state-of-the-art libraries on ARM-based TinyML systems. For example, for neural network benchmarks on the ARM Cortex-M4, we achieve up to a 1.8x speedup and 44% energy reduction over CMSIS-NN.
A Predictive Model using Machine Learning Algorithm in Identifying Students Probability on Passing Semestral Course
Authors: Anabella C. Doctor
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2304.05565
Pdf link: https://arxiv.org/pdf/2304.05565
Abstract This study aims to determine a predictive model to learn students probability to pass their courses taken at the earliest stage of the semester. To successfully discover a good predictive model with high acceptability, accurate, and precision rate which delivers a useful outcome for decision making in education systems, in improving the processes of conveying knowledge and uplifting students academic performance, the proponent applies and strictly followed the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology. This study employs classification for data mining techniques, and decision tree for algorithm. With the utilization of the newly discovered predictive model, the prediction of students probabilities to pass the current courses they take gives 0.7619 accuracy, 0.8333 precision, 0.8823 recall, and 0.8571 f1 score, which shows that the model used in the prediction is reliable, accurate, and recommendable. Considering the indicators and the results, it can be noted that the prediction model used in this study is highly acceptable. The data mining techniques provides effective and efficient innovative tools in analyzing and predicting student performances. The model used in this study will greatly affect the way educators understand and identify the weakness of their students in the class, the way they improved the effectiveness of their learning processes gearing to their students, bring down academic failure rates, and help institution administrators modify their learning system outcomes. Further study for the inclusion of some students demographic information, vast amount of data within the dataset, automated and manual process of predictive criteria indicators where the students can regulate to which criteria, they must improve more for them to pass their courses taken at the end of the semester as early as midterm period are highly needed.
Distributed Compressed Sparse Row Format for Spiking Neural Network Simulation, Serialization, and Interoperability
Authors: Felix Wang
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2304.05587
Pdf link: https://arxiv.org/pdf/2304.05587
Abstract With the increasing development of neuromorphic platforms and their related software tools as well as the increasing scale of spiking neural network (SNN) models, there is a pressure for interoperable and scalable representations of network state. In response to this, we discuss a parallel extension of a widely used format for efficiently representing sparse matrices, the compressed sparse row (CSR), in the context of supporting the simulation and serialization of large-scale SNNs. Sparse matrices for graph adjacency structure provide a natural fit for describing the connectivity of an SNN, and prior work in the area of parallel graph partitioning has developed the distributed CSR (dCSR) format for storing and ingesting large graphs. We contend that organizing additional network information, such as neuron and synapse state, in alignment with its adjacency as dCSR provides a straightforward partition-based distribution of network state. For large-scale simulations, this means each parallel process is only responsible for its own partition of state, which becomes especially useful when the size of an SNN exceeds the memory resources of a single compute node. For potentially long-running simulations, this also enables network serialization to and from disk (e.g. for checkpoint/restart fault-tolerant computing) to be performed largely independently between parallel processes. We also provide a potential implementation, and put it forward for adoption within the neural computing community.
Zero-Knowledge Proof-based Practical Federated Learning on Blockchain
Authors: Zhibo Xing, Zijian Zhang, Meng Li, Jiamou Liu, Liehuang Zhu, Giovanni Russello, Muhammad Rizwan Asghar
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2304.05590
Pdf link: https://arxiv.org/pdf/2304.05590
Abstract Since the concern of privacy leakage extremely discourages user participation in sharing data, federated learning has gradually become a promising technique for both academia and industry for achieving collaborative learning without leaking information about the local data. Unfortunately, most federated learning solutions cannot efficiently verify the execution of each participant's local machine learning model and protect the privacy of user data, simultaneously. In this article, we first propose a Zero-Knowledge Proof-based Federated Learning (ZKP-FL) scheme on blockchain. It leverages zero-knowledge proof for both the computation of local data and the aggregation of local model parameters, aiming to verify the computation process without requiring the plaintext of the local data. We further propose a Practical ZKP-FL (PZKP-FL) scheme to support fraction and non-linear operations. Specifically, we explore a Fraction-Integer mapping function, and use Taylor expansion to efficiently handle non-linear operations while maintaining the accuracy of the federated learning model. We also analyze the security of PZKP-FL. Performance analysis demonstrates that the whole running time of the PZKP-FL scheme is approximately less than one minute in parallel execution.
Vehicle Trajectory Prediction based Predictive Collision Risk Assessment for Autonomous Driving in Highway Scenarios
Authors: Dejian Meng, Wei Xiao, Lijun Zhang, Zhuang Zhang, Zihao Liu
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.05610
Pdf link: https://arxiv.org/pdf/2304.05610
Abstract For driving safely and efficiently in highway scenarios, autonomous vehicles (AVs) must be able to predict future behaviors of surrounding object vehicles (OVs), and assess collision risk accurately for reasonable decision-making. Aiming at autonomous driving in highway scenarios, a predictive collision risk assessment method based on trajectory prediction of OVs is proposed in this paper. Firstly, the vehicle trajectory prediction is formulated as a sequence generation task with long short-term memory (LSTM) encoder-decoder framework. Convolutional social pooling (CSP) and graph attention network (GAN) are adopted for extracting local spatial vehicle interactions and distant spatial vehicle interactions, respectively. Then, two basic risk metrics, time-to-collision (TTC) and minimal distance margin (MDM), are calculated between the predicted trajectory of OV and the candidate trajectory of AV. Consequently, a time-continuous risk function is constructed with temporal and spatial risk metrics. Finally, the vehicle trajectory prediction model CSP-GAN-LSTM is evaluated on two public highway datasets. The quantitative results indicate that the proposed CSP-GAN-LSTM model outperforms the existing state-of-the-art (SOTA) methods in terms of position prediction accuracy. Besides, simulation results in typical highway scenarios further validate the feasibility and effectiveness of the proposed predictive collision risk assessment method.
NutritionVerse-3D: A 3D Food Model Dataset for Nutritional Intake Estimation
Authors: Chi-en Amy Tai, Matthew Keller, Mattie Kerrigan, Yuhao Chen, Saeejith Nair, Pengcheng Xi, Alexander Wong
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05619
Pdf link: https://arxiv.org/pdf/2304.05619
Abstract 77% of adults over 50 want to age in place today, presenting a major challenge to ensuring adequate nutritional intake. It has been reported that one in four older adults that are 65 years or older are malnourished and given the direct link between malnutrition and decreased quality of life, there have been numerous studies conducted on how to efficiently track nutritional intake of food. Recent advancements in machine learning and computer vision show promise of automated nutrition tracking methods of food, but require a large high-quality dataset in order to accurately identify the nutrients from the food on the plate. Unlike existing datasets, a collection of 3D models with nutritional information allow for view synthesis to create an infinite number of 2D images for any given viewpoint/camera angle along with the associated nutritional information. In this paper, we develop a methodology for collecting high-quality 3D models for food items with a particular focus on speed and consistency, and introduce NutritionVerse-3D, a large-scale high-quality high-resolution dataset of 105 3D food models, in conjunction with their associated weight, food name, and nutritional value. These models allow for large quantity food intake scenes, diverse and customizable scene layout, and an infinite number of camera settings and lighting conditions. NutritionVerse-3D is publicly available as a part of an open initiative to accelerate machine learning for nutrition sensing.
Constructing Deep Spiking Neural Networks from Artificial Neural Networks with Knowledge Distillation
Authors: Qi Xu, Yaxin Li, Jiangrong Shen, Jian K Liu, Huajin Tang, Gang Pan
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.05627
Pdf link: https://arxiv.org/pdf/2304.05627
Abstract Spiking neural networks (SNNs) are well known as the brain-inspired models with high computing efficiency, due to a key component that they utilize spikes as information units, close to the biological neural systems. Although spiking based models are energy efficient by taking advantage of discrete spike signals, their performance is limited by current network structures and their training methods. As discrete signals, typical SNNs cannot apply the gradient descent rules directly into parameters adjustment as artificial neural networks (ANNs). Aiming at this limitation, here we propose a novel method of constructing deep SNN models with knowledge distillation (KD) that uses ANN as teacher model and SNN as student model. Through ANN-SNN joint training algorithm, the student SNN model can learn rich feature information from the teacher ANN model through the KD method, yet it avoids training SNN from scratch when communicating with non-differentiable spikes. Our method can not only build a more efficient deep spiking structure feasibly and reasonably, but use few time steps to train whole model compared to direct training or ANN to SNN methods. More importantly, it has a superb ability of noise immunity for various types of artificial noises and natural signals. The proposed novel method provides efficient ways to improve the performance of SNN through constructing deeper structures in a high-throughput fashion, with potential usage for light and efficient brain-inspired computing of practical scenarios.
DOSM: Demand-Prediction based Online Service Management for Vehicular Edge Computing Networks
Authors: Anum Talpur, Mohan Gurusamy
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2304.05637
Pdf link: https://arxiv.org/pdf/2304.05637
Abstract In this work, we investigate an online service management problem in vehicular edge computing networks. To satisfy the varying service demands of mobile vehicles, a service management framework is required to make decisions on the service lifecycle to maintain good network performance. We describe the service lifecycle consists of creating an instance of a given service (\textit{scale-out}), moving an instance to a different edge node (\textit{migration}), and/or termination of an underutilized instance (\textit{scale-in}). In this paper, we propose an efficient online algorithm to perform service management in each time slot, where performance quality in the current time slot, the service demand in future time slots, and the minimal observed delay by vehicles and the minimal migration delay are considered while making the decisions on service lifecycle. Here, the future service demand is computed from a gated recurrent unit (GRU)-based prediction model, and the network performance quality is estimated using a deep reinforcement learning (DRL) model which has the ability to interact with the vehicular environment in real-time. The choice of optimal edge location to deploy a service instance at different times is based on our proposed optimization formulations. Simulation experiments using real-world vehicle trajectories are carried out to evaluate the performance of our proposed demand-prediction based online service management (DOSM) framework against different state-of-the-art solutions using several performance metrics.
An Optimal SVC Bitstream Schema for Viewport-dependent 360-degree Video Streaming
Authors: Gang Shen, Mingyang Ma, Guangxin Xu
Subjects: Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2304.05654
Pdf link: https://arxiv.org/pdf/2304.05654
Abstract To deliver ultra-high resolution 360-degree video (such as 8K, 12K, or even higher) across the internet, viewport-dependent streaming becomes necessary to save bandwidth. During viewport switches, clients and servers will instantly exchange coordination info and contents for the given viewports. However, those viewport switches pose a serious challenge for video encoding because the temporal dependency between contents within changing viewports is unpredictable. In existing practices, it is commonly noted that GOP (Group of Pictures) size in a bitstream intrinsically prohibits the reduction of the viewport switch latency, such as Motion-to-photon (MTP) latency, or motion-to-high-quality (MTHQ) latency. In this paper, we presented a Scalable Video Coding (SVC) based bitstream schema, which can structurally remove the impacts of GOP in viewport-dependent streaming and provide instant viewport switches within one-frame time (the best possible). In addition, combined with tiling, this new coding schema allows an efficient packing of the non-adjacent regions within a viewport of 360-degree video. Our experiments also show that the overall encoding with this SVC-based approach is faster than with multi-stream approaches. Compared with current 360-degree video streaming solutions based on MPEG-I OMAF, our approach is superior in terms of viewport switch latency, simplicity of viewport packing, and encoding performance.
RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer
Authors: Jiahao Wang, Songyang Zhang, Yong Liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai Chen, Ping Luo, Dahua Lin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.05659
Pdf link: https://arxiv.org/pdf/2304.05659
Abstract This paper studies how to keep a vision backbone effective while removing token mixers in its basic building blocks. Token mixers, as self-attention for vision transformers (ViTs), are intended to perform information communication between different spatial tokens but suffer from considerable computational cost and latency. However, directly removing them will lead to an incomplete model structure prior, and thus brings a significant accuracy drop. To this end, we first develop an RepIdentityFormer base on the re-parameterizing idea, to study the token mixer free model architecture. And we then explore the improved learning paradigm to break the limitation of simple token mixer free backbone, and summarize the empirical practice into 5 guidelines. Equipped with the proposed optimization strategy, we are able to build an extremely simple vision backbone with encouraging performance, while enjoying the high efficiency during inference. Extensive experiments and ablative analysis also demonstrate that the inductive bias of network architecture, can be incorporated into simple network structure with appropriate optimization strategy. We hope this work can serve as a starting point for the exploration of optimization-driven efficient network design. Project page: https://techmonsterwang.github.io/RIFormer/.
A parallel rank-adaptive integrator for dynamical low-rank approximation
Authors: Gianluca Ceruti, Jonas Kusch, Christian Lubich
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2304.05660
Pdf link: https://arxiv.org/pdf/2304.05660
Abstract This work introduces a parallel and rank-adaptive matrix integrator for dynamical low-rank approximation. The method is related to the previously proposed rank-adaptive basis update & Galerkin (BUG) integrator but differs significantly in that all arising differential equations, both for the basis and the Galerkin coefficients, are solved in parallel. Moreover, this approach eliminates the need for a potentially costly coefficient update with augmented basis matrices. The integrator also incorporates a new step rejection strategy that enhances the robustness of both the parallel integrator and the BUG integrator. By construction, the parallel integrator inherits the robust error bound of the BUG and projector-splitting integrators. Comparisons of the parallel and BUG integrators are presented by a series of numerical experiments which demonstrate the efficiency of the proposed method, for problems from radiative transfer and radiation therapy.
SuperpixelGraph: Semi-automatic generation of building footprint through semantic-sensitive superpixel and neural graph networks
Authors: Haojia Yu, Han Hu, Bo Xu, Qisen Shang, Zhendong Wang, Qing Zhu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05661
Pdf link: https://arxiv.org/pdf/2304.05661
Abstract Most urban applications necessitate building footprints in the form of concise vector graphics with sharp boundaries rather than pixel-wise raster images. This need contrasts with the majority of existing methods, which typically generate over-smoothed footprint polygons. Editing these automatically produced polygons can be inefficient, if not more time-consuming than manual digitization. This paper introduces a semi-automatic approach for building footprint extraction through semantically-sensitive superpixels and neural graph networks. Drawing inspiration from object-based classification techniques, we first learn to generate superpixels that are not only boundary-preserving but also semantically-sensitive. The superpixels respond exclusively to building boundaries rather than other natural objects, while simultaneously producing semantic segmentation of the buildings. These intermediate superpixel representations can be naturally considered as nodes within a graph. Consequently, graph neural networks are employed to model the global interactions among all superpixels and enhance the representativeness of node features for building segmentation. Classical approaches are utilized to extract and regularize boundaries for the vectorized building footprints. Utilizing minimal clicks and straightforward strokes, we efficiently accomplish accurate segmentation outcomes, eliminating the necessity for editing polygon vertices. Our proposed approach demonstrates superior precision and efficacy, as validated by experimental assessments on various public benchmark datasets. We observe a 10\% enhancement in the metric for superpixel clustering and an 8\% increment in vector graphics evaluation, when compared with established techniques. Additionally, we have devised an optimized and sophisticated pipeline for interactive editing, poised to further augment the overall quality of the results.
Rail Detection: An Efficient Row-based Network and A New Benchmark
Authors: Xinpeng Li, Xiaojiang Peng
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05667
Pdf link: https://arxiv.org/pdf/2304.05667
Abstract Rail detection, essential for railroad anomaly detection, aims to identify the railroad region in video frames. Although various studies on rail detection exist, neither an open benchmark nor a high-speed network is available in the community, making algorithm comparison and development difficult. Inspired by the growth of lane detection, we propose a rail database and a row-based rail detection method. In detail, we make several contributions: (i) We present a real-world railway dataset, Rail-DB, with 7432 pairs of images and annotations. The images are collected from different situations in lighting, road structures, and views. The rails are labeled with polylines, and the images are categorized into nine scenes. The Rail-DB is expected to facilitate the improvement of rail detection algorithms. (ii) We present an efficient row-based rail detection method, Rail-Net, containing a lightweight convolutional backbone and an anchor classifier. Specifically, we formulate the process of rail detection as a row-based selecting problem. This strategy reduces the computational cost compared to alternative segmentation methods. (iii) We evaluate the Rail-Net on Rail-DB with extensive experiments, including cross-scene settings and network backbones ranging from ResNet to Vision Transformers. Our method achieves promising performance in terms of both speed and accuracy. Notably, a lightweight version could achieve 92.77% accuracy and 312 frames per second. The Rail-Net outperforms the traditional method by 50.65% and the segmentation one by 5.86%. The database and code are available at: https://github.com/Sampson-Lee/Rail-Detection.
Real-time Trajectory-based Social Group Detection
Authors: Simindokht Jahangard, Munawar Hayat, Hamid Rezatofighi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05678
Pdf link: https://arxiv.org/pdf/2304.05678
Abstract Social group detection is a crucial aspect of various robotic applications, including robot navigation and human-robot interactions. To date, a range of model-based techniques have been employed to address this challenge, such as the F-formation and trajectory similarity frameworks. However, these approaches often fail to provide reliable results in crowded and dynamic scenarios. Recent advancements in this area have mainly focused on learning-based methods, such as deep neural networks that use visual content or human pose. Although visual content-based methods have demonstrated promising performance on large-scale datasets, their computational complexity poses a significant barrier to their practical use in real-time applications. To address these issues, we propose a simple and efficient framework for social group detection. Our approach explores the impact of motion trajectory on social grouping and utilizes a novel, reliable, and fast data-driven method. We formulate the individuals in a scene as a graph, where the nodes are represented by LSTM-encoded trajectories and the edges are defined by the distances between each pair of tracks. Our framework employs a modified graph transformer module and graph clustering losses to detect social groups. Our experiments on the popular JRDBAct dataset reveal noticeable improvements in performance, with relative improvements ranging from 2% to 11%. Furthermore, our framework is significantly faster, with up to 12x faster inference times compared to state-of-the-art methods under the same computation resources. These results demonstrate that our proposed method is suitable for real-time robotic applications.
Fully Conservative Difference Schemes for the Rotation-Two-Component Camassa-Holm System with Smooth/Nonsmooth Initial Data
Authors: Tong Yan, Jiwei Zhang, Qifeng Zhang
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2304.05679
Pdf link: https://arxiv.org/pdf/2304.05679
Abstract The rotation-two-component Camassa--Holm system, which possesses strongly nonlinear coupled terms and high-order differential terms, tends to have continuous nonsmooth solitary wave solutions, such as peakons, stumpons, composite waves and even chaotic waves. In this paper an accurate semi-discrete conservative difference scheme for the system is derived by taking advantage of its Hamiltonian invariants. We show that the semi-discrete numerical scheme preserves at least three discrete conservative laws: mass, momentum and energy. Furthermore, a fully discrete finite difference scheme is proposed without destroying anyone of the conservative laws. Combining a nonlinear iteration process and an efficient threshold strategy, the accuracy of the numerical scheme can be guaranteed. Meanwhile, the difference scheme can capture the formation and propagation of solitary wave solutions with satisfying long time behavior under the smooth/nonsmooth initial data. The numerical results reveal a new type of asymmetric wave breaking phenomenon under the nonzero rotational parameter.
Human-Robot Skill Transfer with Enhanced Compliance via Dynamic Movement Primitives
Authors: Jayden Hong, Zengjie Zhang, Amir M. Soufi Enayati, Homayoun Najjaran
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.05703
Pdf link: https://arxiv.org/pdf/2304.05703
Abstract Finding an efficient way to adapt robot trajectory is a priority to improve overall performance of robots. One approach for trajectory planning is through transferring human-like skills to robots by Learning from Demonstrations (LfD). The human demonstration is considered the target motion to mimic. However, human motion is typically optimal for human embodiment but not for robots because of the differences between human biomechanics and robot dynamics. The Dynamic Movement Primitives (DMP) framework is a viable solution for this limitation of LfD, but it requires tuning the second-order dynamics in the formulation. Our contribution is introducing a systematic method to extract the dynamic features from human demonstration to auto-tune the parameters in the DMP framework. In addition to its use with LfD, another utility of the proposed method is that it can readily be used in conjunction with Reinforcement Learning (RL) for robot training. In this way, the extracted features facilitate the transfer of human skills by allowing the robot to explore the possible trajectories more efficiently and increasing robot compliance significantly. We introduced a methodology to extract the dynamic features from multiple trajectories based on the optimization of human-likeness and similarity in the parametric space. Our method was implemented into an actual human-robot setup to extract human dynamic features and used to regenerate the robot trajectories following both LfD and RL with DMP. It resulted in a stable performance of the robot, maintaining a high degree of human-likeness based on accumulated distance error as good as the best heuristic tuning.
Stochastic Domain Decomposition Based on Variable-Separation Method
Authors: Liang Chen, Yaru Chen, Qiuqi Li
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2304.05708
Pdf link: https://arxiv.org/pdf/2304.05708
Abstract Uncertainty propagation across different domains is of fundamental importance in stochastic simulations. In this work, we develop a novel stochastic domain decomposition method for steady-state partial differential equations (PDEs) with random inputs. The Variable-separation (VS) method is one of the most accurate and efficient approaches to solving the stochastic partial differential equation (SPDE). We extend the VS method to stochastic algebraic systems, and then integrate its essence with the deterministic domain decomposition method (DDM). It leads to the stochastic domain decomposition based on the Variable-separation method (SDD-VS) that we investigate in this paper. A significant merit of the proposed SDD-VS method is that it is competent to alleviate the "curse of dimensionality", thanks to the explicit representation of stochastic functions deduced by physical systems. The SDD-VS method aims to get a separated representation of the solution to the stochastic interface problem. To this end, an offline-online computational decomposition is introduced to improve efficiency. The main challenge in the offline phase is to obtain the affine representation of stochastic algebraic systems, which is crucial to the SDD-VS method. This is accomplished through the successive and flexible applications of the VS method. In the online phase, the interface unknowns of SPDEs are estimated using the quasi-optimal separated representation, making it easier to construct efficient surrogate models of subproblems. At last, three concrete examples are presented to illustrate the effectiveness of the proposed method.
Dynamic Graph Representation Learning with Neural Networks: A Survey
Authors: Leshanshui Yang, Sébastien Adam, Clément Chatelain
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.05729
Pdf link: https://arxiv.org/pdf/2304.05729
Abstract In recent years, Dynamic Graph (DG) representations have been increasingly used for modeling dynamic systems due to their ability to integrate both topological and temporal information in a compact representation. Dynamic graphs allow to efficiently handle applications such as social network prediction, recommender systems, traffic forecasting or electroencephalography analysis, that can not be adressed using standard numeric representations. As a direct consequence of the emergence of dynamic graph representations, dynamic graph learning has emerged as a new machine learning problem, combining challenges from both sequential/temporal data processing and static graph learning. In this research area, Dynamic Graph Neural Network (DGNN) has became the state of the art approach and plethora of models have been proposed in the very recent years. This paper aims at providing a review of problems and models related to dynamic graph learning. The various dynamic graph supervised learning settings are analysed and discussed. We identify the similarities and differences between existing models with respect to the way time information is modeled. Finally, general guidelines for a DGNN designer when faced with a dynamic graph learning problem are provided.
A Novel Hybrid Post-Weighting Digital Predistortion in mMIMO Under Crosstalk
Authors: Ganesh Prasad, Håkan Johansson
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2304.05795
Pdf link: https://arxiv.org/pdf/2304.05795
Abstract In a hybrid beamforming, a single digital predistortion (DPD) is inefficient to address all the nonlinearities over a subarray of power amplifiers (PAs) with underlying crosstalk in a massive multiple-input multiple-output (mMIMO) transmitter. In this context, the proposed work describes a novel hybrid post-weighting (PW) scheme. Here, it extends the competence of one trained DPD to all PAs exclusively via following PW block associated with optimal coefficients along the basis functions of the DPD. Consequently, it reduces the nonlinear radiation significantly in a wide range of azimuth directions to the transmitter.
Proximity Forest 2.0: A new effective and scalable similarity-based classifier for time series
Authors: Matthieu Herrmann, Chang Wei Tan, Mahsa Salehi, Geoffrey I. Webb
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.05800
Pdf link: https://arxiv.org/pdf/2304.05800
Abstract Time series classification (TSC) is a challenging task due to the diversity of types of feature that may be relevant for different classification tasks, including trends, variance, frequency, magnitude, and various patterns. To address this challenge, several alternative classes of approach have been developed, including similarity-based, features and intervals, shapelets, dictionary, kernel, neural network, and hybrid approaches. While kernel, neural network, and hybrid approaches perform well overall, some specialized approaches are better suited for specific tasks. In this paper, we propose a new similarity-based classifier, Proximity Forest version 2.0 (PF 2.0), which outperforms previous state-of-the-art similarity-based classifiers across the UCR benchmark and outperforms state-of-the-art kernel, neural network, and hybrid methods on specific datasets in the benchmark that are best addressed by similarity-base methods. PF 2.0 incorporates three recent advances in time series similarity measures -- (1) computationally efficient early abandoning and pruning to speedup elastic similarity computations; (2) a new elastic similarity measure, Amerced Dynamic Time Warping (ADTW); and (3) cost function tuning. It rationalizes the set of similarity measures employed, reducing the eight base measures of the original PF to three and using the first derivative transform with all similarity measures, rather than a limited subset. We have implemented both PF 1.0 and PF 2.0 in a single C++ framework, making the PF framework more efficient.
EgoDist: Comparing networks via distributions of egonet features
Authors: Carlo Piccardi
Subjects: Social and Information Networks (cs.SI); Data Analysis, Statistics and Probability (physics.data-an); Physics and Society (physics.soc-ph)
Arxiv link: https://arxiv.org/abs/2304.05801
Pdf link: https://arxiv.org/pdf/2304.05801
Abstract Identifying networks with similar characteristics in a given ensemble, or detecting pattern discontinuities in a temporal sequence of networks, are two examples of tasks that require an effective metric capable of quantifying network (dis)similarity. Here we propose a method based on a global portrait of graph properties built by processing local nodes features. More precisely, a set of dissimilarity measures is defined by elaborating the distributions, over the network, of a few egonet features, namely the degree, the clustering coefficient, and the egonet persistence. The method, which does not require the alignment of the two networks being compared, exploits the statistics of the three features to define one- or multi-dimensional distribution functions, which are then compared to define a distance between the networks. The effectiveness of the method is evaluated using a standard classification test, i.e., recognizing the graphs originating from the same synthetic model. Overall, the proposed distances have performances comparable to the best state-of-the-art techniques (graphlet-based methods) with similar computational requirements. Given its simplicity and flexibility, the method is proposed as a viable approach for network comparison tasks.
DUFormer: A Novel Architecture for Power Line Segmentation of Aerial Images
Authors: Deyu An, Qiang Zhang, Jianshu Chao, Ting Li, Feng Qiao, Yong Deng, Zhenpeng Bian, Jia Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05821
Pdf link: https://arxiv.org/pdf/2304.05821
Abstract Power lines pose a significant safety threat to unmanned aerial vehicles (UAVs) operating at low altitudes. However, detecting power lines in aerial images is challenging due to the small size of the foreground data (i.e., power lines) and the abundance of background information. To address this challenge, we propose DUFormer, a semantic segmentation algorithm designed specifically for power line detection in aerial images. We assume that performing sufficient feature extraction with a convolutional neural network (CNN) that has a strong inductive bias is beneficial for training an efficient Transformer model. To this end, we propose a heavy token encoder responsible for overlapping feature re-mining and tokenization. The encoder comprises a pyramid CNN feature extraction module and a power line feature enhancement module. Following sufficient feature extraction for power lines, the feature fusion is carried out, and then the Transformer block is used for global modeling. The final segmentation result is obtained by fusing local and global features in the decode head. Additionally, we demonstrate the significance of the joint multi-weight loss function in power line segmentation. The experimental results demonstrate that our proposed method achieves the state-of-the-art performance in power line segmentation on the publicly available TTPLA dataset.
Data-Driven Response Regime Exploration and Identification for Dynamical Systems
Authors: Maor Farid
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Dynamical Systems (math.DS)
Arxiv link: https://arxiv.org/abs/2304.05822
Pdf link: https://arxiv.org/pdf/2304.05822
Abstract Data-Driven Response Regime Exploration and Identification (DR$^2$EI) is a novel and fully data-driven method for identifying and classifying response regimes of a dynamical system without requiring human intervention. This approach is a valuable tool for exploring and discovering response regimes in complex dynamical systems, especially when the governing equations and the number of response regimes are unknown, and the system is expensive to sample. Additionally, the method is useful for order reduction, as it can be used to identify the most dominant response regimes of a given dynamical system. DR$^2$EI utilizes unsupervised learning algorithms to transform the system's response into an embedding space that facilitates regime classification. An active sequential sampling approach based on Gaussian Process Regression (GPR) is used to efficiently sample the parameter space, quantify uncertainty, and provide optimal trade-offs between exploration and exploitation. The performance of the DR$^2$EI method was evaluated by analyzing three established dynamical systems: the mathematical pendulum, the Lorenz system, and the Duffing oscillator. The method was shown to effectively identify a variety of response regimes with both similar and distinct topological features and frequency content, demonstrating its versatility in capturing a wide range of behaviors. While it may not be possible to guarantee that all possible regimes will be identified, the method provides an automated and efficient means for exploring the parameter space of a dynamical system and identifying its underlying "sufficiently dominant" response regimes without prior knowledge of the system's equations or behavior.
FedTrip: A Resource-Efficient Federated Learning Method with Triplet Regularization
Authors: Xujing Li, Min Liu, Sheng Sun, Yuwei Wang, Hui Jiang, Xuefeng Jiang
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2304.05824
Pdf link: https://arxiv.org/pdf/2304.05824
Abstract In the federated learning scenario, geographically distributed clients collaboratively train a global model. Data heterogeneity among clients significantly results in inconsistent model updates, which evidently slow down model convergence. To alleviate this issue, many methods employ regularization terms to narrow the discrepancy between client-side local models and the server-side global model. However, these methods impose limitations on the ability to explore superior local models and ignore the valuable information in historical models. Besides, although the up-to-date representation method simultaneously concerns the global and historical local models, it suffers from unbearable computation cost. To accelerate convergence with low resource consumption, we innovatively propose a model regularization method named FedTrip, which is designed to restrict global-local divergence and decrease current-historical correlation for alleviating the negative effects derived from data heterogeneity. FedTrip helps the current local model to be close to the global model while keeping away from historical local models, which contributes to guaranteeing the consistency of local updates among clients and efficiently exploring superior local models with negligible additional computation cost on attaching operations. Empirically, we demonstrate the superiority of FedTrip via extensive evaluations. To achieve the target accuracy, FedTrip outperforms the state-of-the-art baselines in terms of significantly reducing the total overhead of client-server communication and local computation.
RESET: Revisiting Trajectory Sets for Conditional Behavior Prediction
Authors: Julian Schmidt, Pascal Huissel, Julian Wiederer, Julian Jordan, Vasileios Belagiannis, Klaus Dietmayer
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.05856
Pdf link: https://arxiv.org/pdf/2304.05856
Abstract It is desirable to predict the behavior of traffic participants conditioned on different planned trajectories of the autonomous vehicle. This allows the downstream planner to estimate the impact of its decisions. Recent approaches for conditional behavior prediction rely on a regression decoder, meaning that coordinates or polynomial coefficients are regressed. In this work we revisit set-based trajectory prediction, where the probability of each trajectory in a predefined trajectory set is determined by a classification model, and first-time employ it to the task of conditional behavior prediction. We propose RESET, which combines a new metric-driven algorithm for trajectory set generation with a graph-based encoder. For unconditional prediction, RESET achieves comparable performance to a regression-based approach. Due to the nature of set-based approaches, it has the advantageous property of being able to predict a flexible number of trajectories without influencing runtime or complexity. For conditional prediction, RESET achieves reasonable results with late fusion of the planned trajectory, which was not observed for regression-based approaches before. This means that RESET is computationally lightweight to combine with a planner that proposes multiple future plans of the autonomous vehicle, as large parts of the forward pass can be reused.
Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL
Authors: Zakaria Mhammedi, Dylan J. Foster, Alexander Rakhlin
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.05889
Pdf link: https://arxiv.org/pdf/2304.05889
Abstract We study the design of sample-efficient algorithms for reinforcement learning in the presence of rich, high-dimensional observations, formalized via the Block MDP problem. Existing algorithms suffer from either 1) computational intractability, 2) strong statistical assumptions that are not necessarily satisfied in practice, or 3) suboptimal sample complexity. We address these issues by providing the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level, with minimal statistical assumptions. Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics, a learning objective in which the aim is to predict the learner's own action from the current observation and observations in the (potentially distant) future. MusIK is simple and flexible, and can efficiently take advantage of general-purpose function approximation. Our analysis leverages several new techniques tailored to non-optimistic exploration algorithms, which we anticipate will find broader use.
Node-Differentially Private Estimation of the Number of Connected Components
Authors: Iden Kalemaj, Sofya Raskhodnikova, Adam Smith, Charalampos E. Tsourakakis
Subjects: Data Structures and Algorithms (cs.DS); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2304.05890
Pdf link: https://arxiv.org/pdf/2304.05890
Abstract We design the first node-differentially private algorithm for approximating the number of connected components in a graph. Given a database representing an $n$-vertex graph $G$ and a privacy parameter $\varepsilon$, our algorithm runs in polynomial time and, with probability $1-o(1)$, has additive error $\widetilde{O}(\frac{\Delta^\ln\ln n}{\varepsilon}),$ where $\Delta^$ is the smallest possible maximum degree of a spanning forest of $G.$ Node-differentially private algorithms are known only for a small number of database analysis tasks. A major obstacle for designing such an algorithm for the number of connected components is that this graph statistic is not robust to adding one node with arbitrary connections (a change that node-differential privacy is designed to hide): {\em every} graph is a neighbor of a connected graph. We overcome this by designing a family of efficiently computable Lipschitz extensions of the number of connected components or, equivalently, the size of a spanning forest. The construction of the extensions, which is at the core of our algorithm, is based on the forest polytope of $G.$ We prove several combinatorial facts about spanning forests, in particular, that a graph with no induced $\Delta$-stars has a spanning forest of degree at most $\Delta$. With this fact, we show that our Lipschitz extensions for the number of connected components equal the true value of the function for the largest possible monotone families of graphs. More generally, on all monotone sets of graphs, the $\ell_\infty$ error of our Lipschitz extensions is nearly optimal.
Localizing Model Behavior with Path Patching
Authors: Nicholas Goldowsky-Dill, Chris MacLeod, Lucas Sato, Aryaman Arora
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.05969
Pdf link: https://arxiv.org/pdf/2304.05969
Abstract Localizing behaviors of neural networks to a subset of the network's components or a subset of interactions between components is a natural first step towards analyzing network mechanisms and possible failure modes. Existing work is often qualitative and ad-hoc, and there is no consensus on the appropriate way to evaluate localization claims. We introduce path patching, a technique for expressing and quantitatively testing a natural class of hypotheses expressing that behaviors are localized to a set of paths. We refine an explanation of induction heads, characterize a behavior of GPT-2, and open source a framework for efficiently running similar experiments.
HiPrompt: Few-Shot Biomedical Knowledge Fusion via Hierarchy-Oriented Prompting
Authors: Jiaying Lu, Jiaming Shen, Bo Xiong, Wenjing Ma, Steffen Staab, Carl Yang
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2304.05973
Pdf link: https://arxiv.org/pdf/2304.05973
Abstract Medical decision-making processes can be enhanced by comprehensive biomedical knowledge bases, which require fusing knowledge graphs constructed from different sources via a uniform index system. The index system often organizes biomedical terms in a hierarchy to provide the aligned entities with fine-grained granularity. To address the challenge of scarce supervision in the biomedical knowledge fusion (BKF) task, researchers have proposed various unsupervised methods. However, these methods heavily rely on ad-hoc lexical and structural matching algorithms, which fail to capture the rich semantics conveyed by biomedical entities and terms. Recently, neural embedding models have proved effective in semantic-rich tasks, but they rely on sufficient labeled data to be adequately trained. To bridge the gap between the scarce-labeled BKF and neural embedding models, we propose HiPrompt, a supervision-efficient knowledge fusion framework that elicits the few-shot reasoning ability of large language models through hierarchy-oriented prompts. Empirical results on the collected KG-Hi-BKF benchmark datasets demonstrate the effectiveness of HiPrompt.
GPr-Net: Geometric Prototypical Network for Point Cloud Few-Shot Learning
Authors: Tejas Anvekar, Dena Bazazian
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06007
Pdf link: https://arxiv.org/pdf/2304.06007
Abstract In the realm of 3D-computer vision applications, point cloud few-shot learning plays a critical role. However, it poses an arduous challenge due to the sparsity, irregularity, and unordered nature of the data. Current methods rely on complex local geometric extraction techniques such as convolution, graph, and attention mechanisms, along with extensive data-driven pre-training tasks. These approaches contradict the fundamental goal of few-shot learning, which is to facilitate efficient learning. To address this issue, we propose GPr-Net (Geometric Prototypical Network), a lightweight and computationally efficient geometric prototypical network that captures the intrinsic topology of point clouds and achieves superior performance. Our proposed method, IGI++ (Intrinsic Geometry Interpreter++) employs vector-based hand-crafted intrinsic geometry interpreters and Laplace vectors to extract and evaluate point cloud morphology, resulting in improved representations for FSL (Few-Shot Learning). Additionally, Laplace vectors enable the extraction of valuable features from point clouds with fewer points. To tackle the distribution drift challenge in few-shot metric learning, we leverage hyperbolic space and demonstrate that our approach handles intra and inter-class variance better than existing point cloud few-shot learning methods. Experimental results on the ModelNet40 dataset show that GPr-Net outperforms state-of-the-art methods in few-shot learning on point clouds, achieving utmost computational efficiency that is $170\times$ better than all existing works. The code is publicly available at https://github.com/TejasAnvekar/GPr-Net.
An Improved Heart Disease Prediction Using Stacked Ensemble Method
Authors: Md. Maidul Islam, Tanzina Nasrin Tania, Sharmin Akter, Kazi Hassan Shakib
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.06015
Pdf link: https://arxiv.org/pdf/2304.06015
Abstract Heart disorder has just overtaken cancer as the world's biggest cause of mortality. Several cardiac failures, heart disease mortality, and diagnostic costs can all be reduced with early identification and treatment. Medical data is collected in large quantities by the healthcare industry, but it is not well mined. The discovery of previously unknown patterns and connections in this information can help with an improved decision when it comes to forecasting heart disorder risk. In the proposed study, we constructed an ML-based diagnostic system for heart illness forecasting, using a heart disorder dataset. We used data preprocessing techniques like outlier detection and removal, checking and removing missing entries, feature normalization, cross-validation, nine classification algorithms like RF, MLP, KNN, ETC, XGB, SVC, ADB, DT, and GBM, and eight classifier measuring performance metrics like ramification accuracy, precision, F1 score, specificity, ROC, sensitivity, log-loss, and Matthews' correlation coefficient, as well as eight classification performance evaluations. Our method can easily differentiate between people who have cardiac disease and those are normal. Receiver optimistic curves and also the region under the curves were determined by every classifier. Most of the classifiers, pretreatment strategies, validation methods, and performance assessment metrics for classification models have been discussed in this study. The performance of the proposed scheme has been confirmed, utilizing all of its capabilities. In this work, the impact of clinical decision support systems was evaluated using a stacked ensemble approach that included these nine algorithms
RECLIP: Resource-efficient CLIP by Training with Small Images
Authors: Runze Li, Dahun Kim, Bir Bhanu, Weicheng Kuo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06028
Pdf link: https://arxiv.org/pdf/2304.06028
Abstract We present RECLIP (Resource-efficient CLIP), a simple method that minimizes computational resource footprint for CLIP (Contrastive Language Image Pretraining). Inspired by the notion of coarse-to-fine in computer vision, we leverage small images to learn from large-scale language supervision efficiently, and finetune the model with high-resolution data in the end. Since the complexity of the vision transformer heavily depends on input image size, our approach significantly reduces the training resource requirements both in theory and in practice. Using the same batch size and training epoch, RECLIP achieves highly competitive zero-shot classification and image text retrieval accuracy with 6 to 8$\times$ less computational resources and 7 to 9$\times$ fewer FLOPs than the baseline. Compared to the state-of-the-art contrastive learning methods, RECLIP demonstrates 5 to 59$\times$ training resource savings while maintaining highly competitive zero-shot classification and retrieval performance. We hope this work will pave the path for the broader research community to explore language supervised pretraining in more resource-friendly settings.
Keyword: faster

Efficient Automation of Neural Network Design: A Survey on Differentiable Neural Architecture Search
Authors: Alexandre Heuillet, Ahmad Nasser, Hichem Arioui, Hedi Tabia
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05405
Pdf link: https://arxiv.org/pdf/2304.05405
Abstract In the past few years, Differentiable Neural Architecture Search (DNAS) rapidly imposed itself as the trending approach to automate the discovery of deep neural network architectures. This rise is mainly due to the popularity of DARTS, one of the first major DNAS methods. In contrast with previous works based on Reinforcement Learning or Evolutionary Algorithms, DNAS is faster by several orders of magnitude and uses fewer computational resources. In this comprehensive survey, we focus specifically on DNAS and review recent approaches in this field. Furthermore, we propose a novel challenge-based taxonomy to classify DNAS methods. We also discuss the contributions brought to DNAS in the past few years and its impact on the global NAS field. Finally, we conclude by giving some insights into future research directions for the DNAS field.
Probabilistic Reasoning at Scale: Trigger Graphs to the Rescue
Authors: Efthymia Tsamoura, Jaehun Lee, Jacopo Urbani
Subjects: Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2304.05459
Pdf link: https://arxiv.org/pdf/2304.05459
Abstract The role of uncertainty in data management has become more prominent than ever before, especially because of the growing importance of machine learning-driven applications that produce large uncertain databases. A well-known approach to querying such databases is to blend rule-based reasoning with uncertainty. However, techniques proposed so far struggle with large databases. In this paper, we address this problem by presenting a new technique for probabilistic reasoning that exploits Trigger Graphs (TGs) -- a notion recently introduced for the non-probabilistic setting. The intuition is that TGs can effectively store a probabilistic model by avoiding an explicit materialization of the lineage and by grouping together similar derivations of the same fact. Firstly, we show how TGs can be adapted to support the possible world semantics. Then, we describe techniques for efficiently computing a probabilistic model, and formally establish the correctness of our approach. We also present an extensive empirical evaluation using a prototype called LTGs. Our comparison against other leading engines shows that LTGs is not only faster, even against approximate reasoning techniques, but can also reason over probabilistic databases that existing engines cannot scale to.
Black Box Variational Inference with a Deterministic Objective: Faster, More Accurate, and Even More Black Box
Authors: Ryan Giordano, Martin Ingram, Tamara Broderick
Subjects: Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2304.05527
Pdf link: https://arxiv.org/pdf/2304.05527
Abstract Automatic differentiation variational inference (ADVI) offers fast and easy-to-use posterior approximation in multiple modern probabilistic programming languages. However, its stochastic optimizer lacks clear convergence criteria and requires tuning parameters. Moreover, ADVI inherits the poor posterior uncertainty estimates of mean-field variational Bayes (MFVB). We introduce deterministic ADVI'' (DADVI) to address these issues. DADVI replaces the intractable MFVB objective with a fixed Monte Carlo approximation, a technique known in the stochastic optimization literature as thesample average approximation'' (SAA). By optimizing an approximate but deterministic objective, DADVI can use off-the-shelf second-order optimization, and, unlike standard mean-field ADVI, is amenable to more accurate posterior linear response (LR) covariance estimates. In contrast to existing worst-case theory, we show that, on certain classes of common statistical problems, DADVI and the SAA can perform well with relatively few samples even in very high dimensions, though we also show that such favorable results cannot extend to variational approximations that are too expressive relative to mean-field ADVI. We show on a variety of real-world problems that DADVI reliably finds good solutions with default settings (unlike ADVI) and, together with LR covariances, is typically faster and more accurate than standard ADVI.
Zoom is what you need: An empirical study of the power of zoom and spatial biases in image classification
Authors: Mohammad Reza Taesiri, Giang Nguyen, Sarra Habchi, Cor-Paul Bezemer, Anh Nguyen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05538
Pdf link: https://arxiv.org/pdf/2304.05538
Abstract Image classifiers are information-discarding machines, by design. Yet, how these models discard information remains mysterious. We hypothesize that one way for image classifiers to reach high accuracy is to first zoom to the most discriminative region in the image and then extract features from there to predict image labels. We study six popular networks ranging from AlexNet to CLIP and find that proper framing of the input image can lead to the correct classification of 98.91% of ImageNet images. Furthermore, we explore the potential and limits of zoom transforms in image classification and uncover positional biases in various datasets, especially a strong center bias in two popular datasets: ImageNet-A and ObjectNet. Finally, leveraging our insights into the potential of zoom, we propose a state-of-the-art test-time augmentation (TTA) technique that improves classification accuracy by forcing models to explicitly perform zoom-in operations before making predictions. Our method is more interpretable, accurate, and faster than MEMO, a state-of-the-art TTA method. Additionally, we propose ImageNet-Hard, a new benchmark where zooming in alone often does not help state-of-the-art models better label images.
An Optimal SVC Bitstream Schema for Viewport-dependent 360-degree Video Streaming
Authors: Gang Shen, Mingyang Ma, Guangxin Xu
Subjects: Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2304.05654
Pdf link: https://arxiv.org/pdf/2304.05654
Abstract To deliver ultra-high resolution 360-degree video (such as 8K, 12K, or even higher) across the internet, viewport-dependent streaming becomes necessary to save bandwidth. During viewport switches, clients and servers will instantly exchange coordination info and contents for the given viewports. However, those viewport switches pose a serious challenge for video encoding because the temporal dependency between contents within changing viewports is unpredictable. In existing practices, it is commonly noted that GOP (Group of Pictures) size in a bitstream intrinsically prohibits the reduction of the viewport switch latency, such as Motion-to-photon (MTP) latency, or motion-to-high-quality (MTHQ) latency. In this paper, we presented a Scalable Video Coding (SVC) based bitstream schema, which can structurally remove the impacts of GOP in viewport-dependent streaming and provide instant viewport switches within one-frame time (the best possible). In addition, combined with tiling, this new coding schema allows an efficient packing of the non-adjacent regions within a viewport of 360-degree video. Our experiments also show that the overall encoding with this SVC-based approach is faster than with multi-stream approaches. Compared with current 360-degree video streaming solutions based on MPEG-I OMAF, our approach is superior in terms of viewport switch latency, simplicity of viewport packing, and encoding performance.
Factorized Inverse Path Tracing for Efficient and Accurate Material-Lighting Estimation
Authors: Liwen Wu, Rui Zhu, Mustafa B. Yaldiz, Yinhao Zhu, Hong Cai, Janarbek Matai, Fatih Porikli, Tzu-Mao Li, Manmohan Chandraker, Ravi Ramamoorthi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2304.05669
Pdf link: https://arxiv.org/pdf/2304.05669
Abstract Inverse path tracing has recently been applied to joint material and lighting estimation, given geometry and multi-view HDR observations of an indoor scene. However, it has two major limitations: path tracing is expensive to compute, and ambiguities exist between reflection and emission. We propose a novel Factorized Inverse Path Tracing (FIPT) method which utilizes a factored light transport formulation and finds emitters driven by rendering errors. Our algorithm enables accurate material and lighting optimization faster than previous work, and is more effective at resolving ambiguities. The exhaustive experiments on synthetic scenes show that our method (1) outperforms state-of-the-art indoor inverse rendering and relighting methods particularly in the presence of complex illumination effects; (2) speeds up inverse path tracing optimization to less than an hour. We further demonstrate robustness to noisy inputs through material and lighting estimates that allow plausible relighting in a real scene. The source code is available at: https://github.com/lwwu2/fipt
Real-time Trajectory-based Social Group Detection
Authors: Simindokht Jahangard, Munawar Hayat, Hamid Rezatofighi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05678
Pdf link: https://arxiv.org/pdf/2304.05678
Abstract Social group detection is a crucial aspect of various robotic applications, including robot navigation and human-robot interactions. To date, a range of model-based techniques have been employed to address this challenge, such as the F-formation and trajectory similarity frameworks. However, these approaches often fail to provide reliable results in crowded and dynamic scenarios. Recent advancements in this area have mainly focused on learning-based methods, such as deep neural networks that use visual content or human pose. Although visual content-based methods have demonstrated promising performance on large-scale datasets, their computational complexity poses a significant barrier to their practical use in real-time applications. To address these issues, we propose a simple and efficient framework for social group detection. Our approach explores the impact of motion trajectory on social grouping and utilizes a novel, reliable, and fast data-driven method. We formulate the individuals in a scene as a graph, where the nodes are represented by LSTM-encoded trajectories and the edges are defined by the distances between each pair of tracks. Our framework employs a modified graph transformer module and graph clustering losses to detect social groups. Our experiments on the popular JRDBAct dataset reveal noticeable improvements in performance, with relative improvements ranging from 2% to 11%. Furthermore, our framework is significantly faster, with up to 12x faster inference times compared to state-of-the-art methods under the same computation resources. These results demonstrate that our proposed method is suitable for real-time robotic applications.
Cost-damage analysis of attack trees
Authors: Milan Lopuhaä-Zwakenberg, Mariëlle Stoelinga
Subjects: Cryptography and Security (cs.CR); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2304.05812
Pdf link: https://arxiv.org/pdf/2304.05812
Abstract Attack trees (ATs) are a widely deployed modelling technique to categorize potential attacks on a system. An attacker of such a system aims at doing as much damage as possible, but might be limited by a cost budget. The maximum possible damage for a given cost budget is an important security metric of a system. In this paper, we find the maximum damage given a cost budget by modelling this problem with ATs, both in deterministic and probabilistic settings. We show that the general problem is NP-complete, and provide heuristics to solve it. For general ATs these are based on integer linear programming. However when the AT is tree-structured, then one can instead use a faster bottom-up approach. We also extend these methods to other problems related to the cost-damage tradeoff, such as the cost-damage Pareto front.
Keyword: mobile

DOSM: Demand-Prediction based Online Service Management for Vehicular Edge Computing Networks
Authors: Anum Talpur, Mohan Gurusamy
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2304.05637
Pdf link: https://arxiv.org/pdf/2304.05637
Abstract In this work, we investigate an online service management problem in vehicular edge computing networks. To satisfy the varying service demands of mobile vehicles, a service management framework is required to make decisions on the service lifecycle to maintain good network performance. We describe the service lifecycle consists of creating an instance of a given service (\textit{scale-out}), moving an instance to a different edge node (\textit{migration}), and/or termination of an underutilized instance (\textit{scale-in}). In this paper, we propose an efficient online algorithm to perform service management in each time slot, where performance quality in the current time slot, the service demand in future time slots, and the minimal observed delay by vehicles and the minimal migration delay are considered while making the decisions on service lifecycle. Here, the future service demand is computed from a gated recurrent unit (GRU)-based prediction model, and the network performance quality is estimated using a deep reinforcement learning (DRL) model which has the ability to interact with the vehicular environment in real-time. The choice of optimal edge location to deploy a service instance at different times is based on our proposed optimization formulations. Simulation experiments using real-world vehicle trajectories are carried out to evaluate the performance of our proposed demand-prediction based online service management (DOSM) framework against different state-of-the-art solutions using several performance metrics.
5Greplay: a 5G Network Traffic Fuzzer -- Application to Attack Injection
Authors: Zujany Salazar, Huu Nghia Nguyen, Wissam Mallouli, Ana R Cavalli, Edgardo Montes de Oca
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2304.05719
Pdf link: https://arxiv.org/pdf/2304.05719
Abstract The fifth generation of mobile broadband is more than just an evolution to provide more mobile bandwidth, massive machine-type communications, and ultra-reliable and low-latency communications. It relies on a complex, dynamic and heterogeneous environment that implies addressing numerous testing and security challenges. In this paper we present 5Greplay, an open-source 5G network traffic fuzzer that enables the evaluation of 5G components by replaying and modifying 5G network traffic by creating and injecting network scenarios into a target that can be a 5G core service (e.g., AMF, SMF) or a RAN network (e.g., gNodeB). The tool provides the ability to alter network packets online or offline in both control and data planes in a very flexible manner. The experimental evaluation conducted against open-source based 5G platforms, showed that the target services accept traffic being altered by the tool, and that it can reach up to 9.56 Gbps using only 1 processor core to replay 5G traffic.
Stand-Up Indulgent Gathering on Lines
Authors: Quentin Bramas (ICube, ICUBE-Réseaux, UNISTRA), Sayaka Kamei, Anissa Lamani (ICube, ICUBE-Réseaux, UNISTRA), Sébastien Tixeuil (SU)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2304.05722
Pdf link: https://arxiv.org/pdf/2304.05722
Abstract We consider a variant of the crash-fault gathering problem called stand-up indulgent gathering (SUIG). In this problem, a group of mobile robots must eventually gather at a single location, which is not known in advance. If no robots crash, they must all meet at the same location. However, if one or more robots crash at a single location, all non-crashed robots must eventually gather at that location. The SUIG problem was first introduced for robots operating in a two-dimensional continuous Euclidean space, with most solutions relying on the ability of robots to move a prescribed (real) distance at each time instant. In this paper, we investigate the SUIG problem for robots operating in a discrete universe (i.e., a graph) where they can only move one unit of distance (i.e., to an adjacent node) at each time instant. Specifically, we focus on line-shaped networks and characterize the solvability of the SUIG problem for oblivious robots without multiplicity detection.
Fast vehicle detection algorithm based on lightweight YOLO7-tiny
Authors: Bo Li, YiHua Chen, Hao Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06002
Pdf link: https://arxiv.org/pdf/2304.06002
Abstract The swift and precise detection of vehicles holds significant research significance in intelligent transportation systems (ITS). However, current vehicle detection algorithms encounter challenges such as high computational complexity, low detection rate, and limited feasibility on mobile devices. To address these issues, this paper proposes a lightweight vehicle detection algorithm for YOLOv7-tiny called Ghost-YOLOv7. The model first scales the width multiple to 0.5 and replaces the standard convolution of the backbone network with Ghost convolution to achieve a lighter network and improve the detection speed; secondly, a Ghost bi-directional feature pyramid network (Ghost-BiFPN) neck network is designed to enhance feature extraction capability of the algorithm and enrich semantic information; thirdly, a Ghost Decouoled Head (GDH) is employed for accurate prediction of vehicle location and class, enhancing model accuracy; finally, a coordinate attention mechanism is introduced in the output layer to suppress environmental interference, and the WIoU loss function is employed to enhance the detection accuracy further. Experimental results on the PASCAL VOC dataset demonstrate that Ghost-YOLOv7 outperforms the original YOLOv7-tiny model, achieving a 29.8% reduction in computation, 37.3% reduction in the number of parameters, 35.1% reduction in model weights, and 1.1% higher mean average precision (mAP), while achieving a detection speed of 428 FPS. These results validate the effectiveness of the proposed method.
Keyword: pruning

Distilling Token-Pruned Pose Transformer for 2D Human Pose Estimation
Authors: Feixiang Ren
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05548
Pdf link: https://arxiv.org/pdf/2304.05548
Abstract Human pose estimation has seen widespread use of transformer models in recent years. Pose transformers benefit from the self-attention map, which captures the correlation between human joint tokens and the image. However, training such models is computationally expensive. The recent token-Pruned Pose Transformer (PPT) solves this problem by pruning the background tokens of the image, which are usually less informative. However, although it improves efficiency, PPT inevitably leads to worse performance than TokenPose due to the pruning of tokens. To overcome this problem, we present a novel method called Distilling Pruned-Token Transformer for human pose estimation (DPPT). Our method leverages the output of a pre-trained TokenPose to supervise the learning process of PPT. We also establish connections between the internal structure of pose transformers and PPT, such as attention maps and joint features. Our experimental results on the MPII datasets show that our DPPT can significantly improve PCK compared to previous PPT models while still reducing computational complexity.
Proximity Forest 2.0: A new effective and scalable similarity-based classifier for time series
Authors: Matthieu Herrmann, Chang Wei Tan, Mahsa Salehi, Geoffrey I. Webb
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.05800
Pdf link: https://arxiv.org/pdf/2304.05800
Abstract Time series classification (TSC) is a challenging task due to the diversity of types of feature that may be relevant for different classification tasks, including trends, variance, frequency, magnitude, and various patterns. To address this challenge, several alternative classes of approach have been developed, including similarity-based, features and intervals, shapelets, dictionary, kernel, neural network, and hybrid approaches. While kernel, neural network, and hybrid approaches perform well overall, some specialized approaches are better suited for specific tasks. In this paper, we propose a new similarity-based classifier, Proximity Forest version 2.0 (PF 2.0), which outperforms previous state-of-the-art similarity-based classifiers across the UCR benchmark and outperforms state-of-the-art kernel, neural network, and hybrid methods on specific datasets in the benchmark that are best addressed by similarity-base methods. PF 2.0 incorporates three recent advances in time series similarity measures -- (1) computationally efficient early abandoning and pruning to speedup elastic similarity computations; (2) a new elastic similarity measure, Amerced Dynamic Time Warping (ADTW); and (3) cost function tuning. It rationalizes the set of similarity measures employed, reducing the eight base measures of the original PF to three and using the first derivative transform with all similarity measures, rather than a limited subset. We have implemented both PF 1.0 and PF 2.0 in a single C++ framework, making the PF framework more efficient.
Keyword: voxel

There is no result

Keyword: lidar

SceneCalib: Automatic Targetless Calibration of Cameras and Lidars in Autonomous Driving
Authors: Ayon Sen, Gang Pan, Anton Mitrokhin, Ashraful Islam
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05530
Pdf link: https://arxiv.org/pdf/2304.05530
Abstract Accurate camera-to-lidar calibration is a requirement for sensor data fusion in many 3D perception tasks. In this paper, we present SceneCalib, a novel method for simultaneous self-calibration of extrinsic and intrinsic parameters in a system containing multiple cameras and a lidar sensor. Existing methods typically require specially designed calibration targets and human operators, or they only attempt to solve for a subset of calibration parameters. We resolve these issues with a fully automatic method that requires no explicit correspondences between camera images and lidar point clouds, allowing for robustness to many outdoor environments. Furthermore, the full system is jointly calibrated with explicit cross-camera constraints to ensure that camera-to-camera and camera-to-lidar extrinsic parameters are consistent.
WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language
Authors: Zhenxiang Lin, Xidong Peng, Peishan Cong, Yuenan Hou, Xinge Zhu, Sibei Yang, Yuexin Ma
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05645
Pdf link: https://arxiv.org/pdf/2304.05645
Abstract We introduce the task of 3D visual grounding in large-scale dynamic scenes based on natural linguistic descriptions and online captured multi-modal visual data, including 2D images and 3D LiDAR point clouds. We present a novel method, WildRefer, for this task by fully utilizing the appearance features in images, the location and geometry features in point clouds, and the dynamic features in consecutive input frames to match the semantic features in language. In particular, we propose two novel datasets, STRefer and LifeRefer, which focus on large-scale human-centric daily-life scenarios with abundant 3D object and natural language annotations. Our datasets are significant for the research of 3D visual grounding in the wild and has huge potential to boost the development of autonomous driving and service robots. Extensive comparisons and ablation studies illustrate that our method achieves state-of-the-art performance on two proposed datasets. Code and dataset will be released when the paper is published.
Keyword: diffusion

CamDiff: Camouflage Image Augmentation via Diffusion Model
Authors: Xue-Jing Luo, Shuo Wang, Zongwei Wu, Christos Sakaridis, Yun Cheng, Deng-Ping Fan, Luc Van Gool
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05469
Pdf link: https://arxiv.org/pdf/2304.05469
Abstract The burgeoning field of camouflaged object detection (COD) seeks to identify objects that blend into their surroundings. Despite the impressive performance of recent models, we have identified a limitation in their robustness, where existing methods may misclassify salient objects as camouflaged ones, despite these two characteristics being contradictory. This limitation may stem from lacking multi-pattern training images, leading to less saliency robustness. To address this issue, we introduce CamDiff, a novel approach inspired by AI-Generated Content (AIGC) that overcomes the scarcity of multi-pattern training images. Specifically, we leverage the latent diffusion model to synthesize salient objects in camouflaged scenes, while using the zero-shot image classification ability of the Contrastive Language-Image Pre-training (CLIP) model to prevent synthesis failures and ensure the synthesized object aligns with the input prompt. Consequently, the synthesized image retains its original camouflage label while incorporating salient objects, yielding camouflage samples with richer characteristics. The results of user studies show that the salient objects in the scenes synthesized by our framework attract the user's attention more; thus, such samples pose a greater challenge to the existing COD models. Our approach enables flexible editing and efficient large-scale dataset generation at a low cost. It significantly enhances COD baselines' training and testing phases, emphasizing robustness across diverse domains. Our newly-generated datasets and source code are available at https://github.com/drlxj/CamDiff.
Improving Diffusion Models for Scene Text Editing with Dual Encoders
Authors: Jiabao Ji, Guanhua Zhang, Zhaowen Wang, Bairu Hou, Zhifei Zhang, Brian Price, Shiyu Chang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.05568
Pdf link: https://arxiv.org/pdf/2304.05568
Abstract Scene text editing is a challenging task that involves modifying or inserting specified texts in an image while maintaining its natural and realistic appearance. Most previous approaches to this task rely on style-transfer models that crop out text regions and feed them into image transfer models, such as GANs. However, these methods are limited in their ability to change text style and are unable to insert texts into images. Recent advances in diffusion models have shown promise in overcoming these limitations with text-conditional image editing. However, our empirical analysis reveals that state-of-the-art diffusion models struggle with rendering correct text and controlling text style. To address these problems, we propose DIFFSTE to improve pre-trained diffusion models with a dual encoder design, which includes a character encoder for better text legibility and an instruction encoder for better style control. An instruction tuning framework is introduced to train our model to learn the mapping from the text instruction to the corresponding image with either the specified style or the style of the surrounding texts in the background. Such a training method further brings our method the zero-shot generalization ability to the following three scenarios: generating text with unseen font variation, e.g., italic and bold, mixing different fonts to construct a new font, and using more relaxed forms of natural language as the instructions to guide the generation task. We evaluate our approach on five datasets and demonstrate its superior performance in terms of text correctness, image naturalness, and style controllability. Our code is publicly available. https://github.com/UCSB-NLP-Chang/DiffSTE
InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions
Authors: Han Liang, Wenqian Zhang, Wenxuan Li, Jingyi Yu, Lan Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05684
Pdf link: https://arxiv.org/pdf/2304.05684
Abstract We have recently seen tremendous progress in diffusion advances for generating realistic human motions. Yet, they largely disregard the rich multi-human interactions. In this paper, we present InterGen, an effective diffusion-based approach that incorporates human-to-human interactions into the motion diffusion process, which enables layman users to customize high-quality two-person interaction motions, with only text guidance. We first contribute a multimodal dataset, named InterHuman. It consists of about 107M frames for diverse two-person interactions, with accurate skeletal motions and 16,756 natural language descriptions. For the algorithm side, we carefully tailor the motion diffusion model to our two-person interaction setting. To handle the symmetry of human identities during interactions, we propose two cooperative transformer-based denoisers that explicitly share weights, with a mutual attention mechanism to further connect the two denoising processes. Then, we propose a novel representation for motion input in our interaction diffusion model, which explicitly formulates the global relations between the two performers in the world frame. We further introduce two novel regularization terms to encode spatial relations, equipped with a corresponding damping scheme during the training of our interaction diffusion model. Extensive experiments validate the effectiveness and generalizability of InterGen. Notably, it can generate more diverse and compelling two-person motions than previous methods and enables various downstream applications for human interactions.
Exploring Diffusion Models for Unsupervised Video Anomaly Detection
Authors: Anil Osman Tur, Nicola Dall'Asen, Cigdem Beyan, Elisa Ricci
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05841
Pdf link: https://arxiv.org/pdf/2304.05841
Abstract This paper investigates the performance of diffusion models for video anomaly detection (VAD) within the most challenging but also the most operational scenario in which the data annotations are not used. As being sparse, diverse, contextual, and often ambiguous, detecting abnormal events precisely is a very ambitious task. To this end, we rely only on the information-rich spatio-temporal data, and the reconstruction power of the diffusion models such that a high reconstruction error is utilized to decide the abnormality. Experiments performed on two large-scale video anomaly detection datasets demonstrate the consistent improvement of the proposed method over the state-of-the-art generative models while in some cases our method achieves better scores than the more complex models. This is the first study using a diffusion model and examining its parameters' influence to present guidance for VAD in surveillance scenarios.
A quadrature scheme for steady-state diffusion equations involving fractional power of regularly accretive operator
Authors: Beiping Duan, Zongze Yang
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2304.05848
Pdf link: https://arxiv.org/pdf/2304.05848
Abstract In this paper, we construct a quadrature scheme to numerically solve the nonlocal diffusion equation $(\mathcal{A}^\alpha+b\mathcal{I})u=f$ with $\mathcal{A}^\alpha$ the $\alpha$-th power of the regularly accretive operator $\mathcal{A}$. Rigorous error analysis is carried out and sharp error bounds (up to some negligible constants) are obtained. The error estimates include a wide range of cases in which the regularity index and spectral angle of $\mathcal{A}$, the smoothness of $f$, the size of $b$ and $\alpha$ are all involved. The quadrature scheme is exponentially convergent with respect to the step size and is root-exponentially convergent with respect to the number of solves. Some numerical tests are presented in the last section to verify the sharpness of our estimates. Furthermore, both the scheme and the error bounds can be utilized directly to solve and analyze time-dependent problems.
Cancer-Net BCa-S: Breast Cancer Grade Prediction using Volumetric Deep Radiomic Features from Synthetic Correlated Diffusion Imaging
Authors: Chi-en Amy Tai, Hayden Gunraj, Alexander Wong
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05899
Pdf link: https://arxiv.org/pdf/2304.05899
Abstract The prevalence of breast cancer continues to grow, affecting about 300,000 females in the United States in 2023. However, there are different levels of severity of breast cancer requiring different treatment strategies, and hence, grading breast cancer has become a vital component of breast cancer diagnosis and treatment planning. Specifically, the gold-standard Scarff-Bloom-Richardson (SBR) grade has been shown to consistently indicate a patient's response to chemotherapy. Unfortunately, the current method to determine the SBR grade requires removal of some cancer cells from the patient which can lead to stress and discomfort along with costly expenses. In this paper, we study the efficacy of deep learning for breast cancer grading based on synthetic correlated diffusion (CDI$^s$) imaging, a new magnetic resonance imaging (MRI) modality and found that it achieves better performance on SBR grade prediction compared to those learnt using gold-standard imaging modalities. Hence, we introduce Cancer-Net BCa-S, a volumetric deep radiomics approach for predicting SBR grade based on volumetric CDI$^s$ data. Given the promising results, this proposed method to identify the severity of the cancer would allow for better treatment decisions without the need for a biopsy. Cancer-Net BCa-S has been made publicly available as part of a global open-source initiative for advancing machine learning for cancer care.
Diffusion models with location-scale noise
Authors: Alexia Jolicoeur-Martineau, Kilian Fatras, Ke Li, Tal Kachman
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2304.05907
Pdf link: https://arxiv.org/pdf/2304.05907
Abstract Diffusion Models (DMs) are powerful generative models that add Gaussian noise to the data and learn to remove it. We wanted to determine which noise distribution (Gaussian or non-Gaussian) led to better generated data in DMs. Since DMs do not work by design with non-Gaussian noise, we built a framework that allows reversing a diffusion process with non-Gaussian location-scale noise. We use that framework to show that the Gaussian distribution performs the best over a wide range of other distributions (Laplace, Uniform, t, Generalized-Gaussian).
SpectralDiff: Hyperspectral Image Classification with Spectral-Spatial Diffusion Models
Authors: Ning Chen, Jun Yue, Leyuan Fang, Shaobo Xia
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05961
Pdf link: https://arxiv.org/pdf/2304.05961
Abstract Hyperspectral image (HSI) classification is an important topic in the field of remote sensing, and has a wide range of applications in Earth science. HSIs contain hundreds of continuous bands, which are characterized by high dimension and high correlation between adjacent bands. The high dimension and redundancy of HSI data bring great difficulties to HSI classification. In recent years, a large number of HSI feature extraction and classification methods based on deep learning have been proposed. However, their ability to model the global relationships among samples in both spatial and spectral domains is still limited. In order to solve this problem, an HSI classification method with spectral-spatial diffusion models is proposed. The proposed method realizes the reconstruction of spectral-spatial distribution of the training samples with the forward and reverse spectral-spatial diffusion process, thus modeling the global spatial-spectral relationship between samples. Then, we use the spectral-spatial denoising network of the reverse process to extract the unsupervised diffusion features. Features extracted by the spectral-spatial diffusion models can achieve cross-sample perception from the reconstructed distribution of the training samples, thus obtaining better classification performance. Experiments on three public HSI datasets show that the proposed method can achieve better performance than the state-of-the-art methods. The source code and the pre-trained spectral-spatial diffusion model will be publicly available at https://github.com/chenning0115/SpectralDiff.
Probabilistic Human Mesh Recovery in 3D Scenes from Egocentric Views
Authors: Siwei Zhang, Qianli Ma, Yan Zhang, Sadegh Aliakbarian, Darren Cosker, Siyu Tang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.06024
Pdf link: https://arxiv.org/pdf/2304.06024
Abstract Automatic perception of human behaviors during social interactions is crucial for AR/VR applications, and an essential component is estimation of plausible 3D human pose and shape of our social partners from the egocentric view. One of the biggest challenges of this task is severe body truncation due to close social distances in egocentric scenarios, which brings large pose ambiguities for unseen body parts. To tackle this challenge, we propose a novel scene-conditioned diffusion method to model the body pose distribution. Conditioned on the 3D scene geometry, the diffusion model generates bodies in plausible human-scene interactions, with the sampling guided by a physics-based collision score to further resolve human-scene inter-penetrations. The classifier-free training enables flexible sampling with different conditions and enhanced diversity. A visibility-aware graph convolution model guided by per-joint visibility serves as the diffusion denoiser to incorporate inter-joint dependencies and per-body-part control. Extensive evaluations show that our method generates bodies in plausible interactions with 3D scenes, achieving both superior accuracy for visible joints and diversity for invisible body parts. The code will be available at https://sanweiliti.github.io/egohmr/egohmr.html.
DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion
Authors: Johanna Karras, Aleksander Holynski, Ting-Chun Wang, Ira Kemelmacher-Shlizerman
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06025
Pdf link: https://arxiv.org/pdf/2304.06025
Abstract We present DreamPose, a diffusion-based method for generating animated fashion videos from still images. Given an image and a sequence of human body poses, our method synthesizes a video containing both human and fabric motion. To achieve this, we transform a pretrained text-to-image model (Stable Diffusion) into a pose-and-image guided video synthesis model, using a novel finetuning strategy, a set of architectural changes to support the added conditioning signals, and techniques to encourage temporal consistency. We fine-tune on a collection of fashion videos from the UBC Fashion dataset. We evaluate our method on a variety of clothing styles and poses, and demonstrate that our method produces state-of-the-art results on fashion video animation. Video results are available on our project page.
Continual Diffusion: Continual Customization of Text-to-Image Diffusion with C-LoRA
Authors: James Seale Smith, Yen-Chang Hsu, Lingyu Zhang, Ting Hua, Zsolt Kira, Yilin Shen, Hongxia Jin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.06027
Pdf link: https://arxiv.org/pdf/2304.06027
Abstract Recent works demonstrate a remarkable ability to customize text-to-image diffusion models while only providing a few example images. What happens if you try to customize such models using multiple, fine-grained concepts in a sequential (i.e., continual) manner? In our work, we show that recent state-of-the-art customization of text-to-image models suffer from catastrophic forgetting when new concepts arrive sequentially. Specifically, when adding a new concept, the ability to generate high quality images of past, similar concepts degrade. To circumvent this forgetting, we propose a new method, C-LoRA, composed of a continually self-regularized low-rank adaptation in cross attention layers of the popular Stable Diffusion model. Furthermore, we use customization prompts which do not include the word of the customized object (i.e., "person" for a human face dataset) and are initialized as completely random embeddings. Importantly, our method induces only marginal additional parameter costs and requires no storage of user data for replay. We show that C-LoRA not only outperforms several baselines for our proposed setting of text-to-image continual customization, which we refer to as Continual Diffusion, but that we achieve a new state-of-the-art in the well-established rehearsal-free continual learning setting for image classification. The high achieving performance of C-LoRA in two separate domains positions it as a compelling solution for a wide range of applications, and we believe it has significant potential for practical impact.
Keyword: dynamic

Global QoS Policy Optimization in SD-WAN
Authors: Pham Tran Anh Quang, Jérémie Leguay, Xu Gong, Xu Huiying
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2304.05473
Pdf link: https://arxiv.org/pdf/2304.05473
Abstract In modern SD-WAN networks, a global controller is able to steer traffic on different paths based on application requirements and global intents. However, existing solutions cannot dynamically tune the way bandwidth is shared between flows inside each overlay link, in particular when the available capacity is uncertain due to cross traffic. In this context, we propose a global QoS (Quality of Service) policy optimization model that dynamically adjusts rate limits of applications based on their requirements to follow the evolution of network conditions. It relies on a novel cross-traffic estimator for the available bandwidth of overlay links that only exploits already available measurements. We propose two local search algorithms, one centralized and one distributed, that leverage cross-traffic estimation. We show in packet-level simulations a significant performance improvement in terms of SLA (Service Level Agreement) satisfaction. For instance, the adaptive tuning of load balancing and QoS policies based on cross-traffic estimation can improve SLA satisfaction by $40\%$ compared to static policies.
Contingency Games for Multi-Agent Interaction
Authors: Lasse Peters, Andrea Bajcsy, Chih-Yuan Chiu, David Fridovich-Keil, Forrest Laine, Laura Ferranti, Javier Alonso-Mora
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.05483
Pdf link: https://arxiv.org/pdf/2304.05483
Abstract Contingency planning, wherein an agent generates a set of possible plans conditioned on the outcome of an uncertain event, is an increasingly popular way for robots to act under uncertainty. In this work, we take a game-theoretic perspective on contingency planning which is tailored to multi-agent scenarios in which a robot's actions impact the decisions of other agents and vice versa. The resulting contingency game allows the robot to efficiently coordinate with other agents by generating strategic motion plans conditioned on multiple possible intents for other actors in the scene. Contingency games are parameterized via a scalar variable which represents a future time at which intent uncertainty will be resolved. Varying this parameter enables a designer to easily adjust how conservatively the robot behaves in the game. Interestingly, we also find that existing variants of game-theoretic planning under uncertainty are readily obtained as special cases of contingency games. Lastly, we offer an efficient method for solving N-player contingency games with nonlinear dynamics and non-convex costs and constraints. Through a series of simulated autonomous driving scenarios, we demonstrate that plans generated via contingency games provide quantitative performance gains over game-theoretic motion plans that do not account for future uncertainty reduction.
DistHD: A Learner-Aware Dynamic Encoding Method for Hyperdimensional Classification
Authors: Junyao Wang, Sitao Huang, Mohsen Imani
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.05503
Pdf link: https://arxiv.org/pdf/2304.05503
Abstract Brain-inspired hyperdimensional computing (HDC) has been recently considered a promising learning approach for resource-constrained devices. However, existing approaches use static encoders that are never updated during the learning process. Consequently, it requires a very high dimensionality to achieve adequate accuracy, severely lowering the encoding and training efficiency. In this paper, we propose DistHD, a novel dynamic encoding technique for HDC adaptive learning that effectively identifies and regenerates dimensions that mislead the classification and compromise the learning quality. Our proposed algorithm DistHD successfully accelerates the learning process and achieves the desired accuracy with considerably lower dimensionality.
State estimation of a carbon capture process through POD model reduction and neural network approximation
Authors: Siyu Liu, Xunyuan Yin, Jinfeng Liu (University of Alberta)
Subjects: Systems and Control (eess.SY); Dynamical Systems (math.DS)
Arxiv link: https://arxiv.org/abs/2304.05514
Pdf link: https://arxiv.org/pdf/2304.05514
Abstract This paper presents an efficient approach for state estimation of post-combustion CO2 capture plants (PCCPs) by using reduced-order neural network models. The method involves extracting lower-dimensional feature vectors from high-dimensional operational data of the PCCP and constructing a reduced-order process model using proper orthogonal decomposition (POD). Multi-layer perceptron (MLP) neural networks capture the dominant dynamics of the process and train the network parameters with low-dimensional data obtained from open-loop simulations. The proposed POD-MLP model can be used as the basis for estimating the states of PCCPs at a significantly decreased computational cost. For state estimation, a reduced-order extended Kalman filtering (EKF) scheme based on the POD-MLP model is developed. Our simulations demonstrate that the proposed POD-MLP modeling approach reduces computational complexity compared to the POD-only model for nonlinear systems. Additionally, the POD-MLP-EKF algorithm can accurately reconstruct the full state information of PCCPs while significantly improving computational efficiency compared to the EKF based on the original PCCP model.
Necessary and Sufficient Conditions for Simultaneous State and Input Recovery of Linear Systems with Sparse Inputs by $\ell_1$-Minimization
Authors: Kyle Poe, Enrique Mallada, René Vidal
Subjects: Systems and Control (eess.SY); Information Theory (cs.IT); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2304.05526
Pdf link: https://arxiv.org/pdf/2304.05526
Abstract The study of theoretical conditions for recovering sparse signals from compressive measurements has received a lot of attention in the research community. In parallel, there has been a great amount of work characterizing conditions for the recovery both the state and the input to a linear dynamical system (LDS), including a handful of results on recovering sparse inputs. However, existing sufficient conditions for recovering sparse inputs to an LDS are conservative and hard to interpret, while necessary and sufficient conditions have not yet appeared in the literature. In this work, we provide (1) the first characterization of necessary and sufficient conditions for the existence and uniqueness of sparse inputs to an LDS, (2) the first necessary and sufficient conditions for a linear program to recover both an unknown initial state and a sparse input, and (3) simple, interpretable recovery conditions in terms of the LDS parameters. We conclude with a numerical validation of these claims and discuss implications and future directions.
CLCLSA: Cross-omics Linked embedding with Contrastive Learning and Self Attention for multi-omics integration with incomplete multi-omics data
Authors: Chen Zhao, Anqi Liu, Xiao Zhang, Xuewei Cao, Zhengming Ding, Qiuying Sha, Hui Shen, Hong-Wen Deng, Weihua Zhou
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Genomics (q-bio.GN)
Arxiv link: https://arxiv.org/abs/2304.05542
Pdf link: https://arxiv.org/pdf/2304.05542
Abstract Integration of heterogeneous and high-dimensional multi-omics data is becoming increasingly important in understanding genetic data. Each omics technique only provides a limited view of the underlying biological process and integrating heterogeneous omics layers simultaneously would lead to a more comprehensive and detailed understanding of diseases and phenotypes. However, one obstacle faced when performing multi-omics data integration is the existence of unpaired multi-omics data due to instrument sensitivity and cost. Studies may fail if certain aspects of the subjects are missing or incomplete. In this paper, we propose a deep learning method for multi-omics integration with incomplete data by Cross-omics Linked unified embedding with Contrastive Learning and Self Attention (CLCLSA). Utilizing complete multi-omics data as supervision, the model employs cross-omics autoencoders to learn the feature representation across different types of biological data. The multi-omics contrastive learning, which is used to maximize the mutual information between different types of omics, is employed before latent feature concatenation. In addition, the feature-level self-attention and omics-level self-attention are employed to dynamically identify the most informative features for multi-omics data integration. Extensive experiments were conducted on four public multi-omics datasets. The experimental results indicated that the proposed CLCLSA outperformed the state-of-the-art approaches for multi-omics data classification using incomplete multi-omics data.
DynamicDet: A Unified Dynamic Architecture for Object Detection
Authors: Zhihao Lin, Yongtao Wang, Jinhe Zhang, Xiaojie Chu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.05552
Pdf link: https://arxiv.org/pdf/2304.05552
Abstract Dynamic neural network is an emerging research topic in deep learning. With adaptive inference, dynamic models can achieve remarkable accuracy and computational efficiency. However, it is challenging to design a powerful dynamic detector, because of no suitable dynamic architecture and exiting criterion for object detection. To tackle these difficulties, we propose a dynamic framework for object detection, named DynamicDet. Firstly, we carefully design a dynamic architecture based on the nature of the object detection task. Then, we propose an adaptive router to analyze the multi-scale information and to decide the inference route automatically. We also present a novel optimization strategy with an exiting criterion based on the detection losses for our dynamic detectors. Last, we present a variable-speed inference strategy, which helps to realize a wide range of accuracy-speed trade-offs with only one dynamic detector. Extensive experiments conducted on the COCO benchmark demonstrate that the proposed DynamicDet achieves new state-of-the-art accuracy-speed trade-offs. For instance, with comparable accuracy, the inference speed of our dynamic detector Dy-YOLOv7-W6 surpasses YOLOv7-E6 by 12%, YOLOv7-D6 by 17%, and YOLOv7-E6E by 39%. The code is available at https://github.com/VDIGPKU/DynamicDet.
Towards Large-Scale Simulations of Open-Ended Evolution in Continuous Cellular Automata
Authors: Bert Wang-Chak Chan
Subjects: Neural and Evolutionary Computing (cs.NE); Cellular Automata and Lattice Gases (nlin.CG)
Arxiv link: https://arxiv.org/abs/2304.05639
Pdf link: https://arxiv.org/pdf/2304.05639
Abstract Inspired by biological and cultural evolution, there have been many attempts to explore and elucidate the necessary conditions for open-endedness in artificial intelligence and artificial life. Using a continuous cellular automata called Lenia as the base system, we built large-scale evolutionary simulations using parallel computing framework JAX, in order to achieve the goal of never-ending evolution of self-organizing patterns. We report a number of system design choices, including (1) implicit implementation of genetic operators, such as reproduction by pattern self-replication, and selection by differential existential success; (2) localization of genetic information; and (3) algorithms for dynamically maintenance of the localized genotypes and translation to phenotypes. Simulation results tend to go through a phase of diversity and creativity, gradually converge to domination by fast expanding patterns, presumably a optimal solution under the current design. Based on our experimentation, we propose several factors that may further facilitate open-ended evolution, such as virtual environment design, mass conservation, and energy constraints.
Instance-Aware Domain Generalization for Face Anti-Spoofing
Authors: Qianyu Zhou, Ke-Yue Zhang, Taiping Yao, Xuequan Lu, Ran Yi, Shouhong Ding, Lizhuang Ma
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05640
Pdf link: https://arxiv.org/pdf/2304.05640
Abstract Face anti-spoofing (FAS) based on domain generalization (DG) has been recently studied to improve the generalization on unseen scenarios. Previous methods typically rely on domain labels to align the distribution of each domain for learning domain-invariant representations. However, artificial domain labels are coarse-grained and subjective, which cannot reflect real domain distributions accurately. Besides, such domain-aware methods focus on domain-level alignment, which is not fine-grained enough to ensure that learned representations are insensitive to domain styles. To address these issues, we propose a novel perspective for DG FAS that aligns features on the instance level without the need for domain labels. Specifically, Instance-Aware Domain Generalization framework is proposed to learn the generalizable feature by weakening the features' sensitivity to instance-specific styles. Concretely, we propose Asymmetric Instance Adaptive Whitening to adaptively eliminate the style-sensitive feature correlation, boosting the generalization. Moreover, Dynamic Kernel Generator and Categorical Style Assembly are proposed to first extract the instance-specific features and then generate the style-diversified features with large style shifts, respectively, further facilitating the learning of style-insensitive features. Extensive experiments and analysis demonstrate the superiority of our method over state-of-the-art competitors. Code will be publicly available at https://github.com/qianyuzqy/IADG.
WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language
Authors: Zhenxiang Lin, Xidong Peng, Peishan Cong, Yuenan Hou, Xinge Zhu, Sibei Yang, Yuexin Ma
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05645
Pdf link: https://arxiv.org/pdf/2304.05645
Abstract We introduce the task of 3D visual grounding in large-scale dynamic scenes based on natural linguistic descriptions and online captured multi-modal visual data, including 2D images and 3D LiDAR point clouds. We present a novel method, WildRefer, for this task by fully utilizing the appearance features in images, the location and geometry features in point clouds, and the dynamic features in consecutive input frames to match the semantic features in language. In particular, we propose two novel datasets, STRefer and LifeRefer, which focus on large-scale human-centric daily-life scenarios with abundant 3D object and natural language annotations. Our datasets are significant for the research of 3D visual grounding in the wild and has huge potential to boost the development of autonomous driving and service robots. Extensive comparisons and ablation studies illustrate that our method achieves state-of-the-art performance on two proposed datasets. Code and dataset will be released when the paper is published.
A parallel rank-adaptive integrator for dynamical low-rank approximation
Authors: Gianluca Ceruti, Jonas Kusch, Christian Lubich
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2304.05660
Pdf link: https://arxiv.org/pdf/2304.05660
Abstract This work introduces a parallel and rank-adaptive matrix integrator for dynamical low-rank approximation. The method is related to the previously proposed rank-adaptive basis update & Galerkin (BUG) integrator but differs significantly in that all arising differential equations, both for the basis and the Galerkin coefficients, are solved in parallel. Moreover, this approach eliminates the need for a potentially costly coefficient update with augmented basis matrices. The integrator also incorporates a new step rejection strategy that enhances the robustness of both the parallel integrator and the BUG integrator. By construction, the parallel integrator inherits the robust error bound of the BUG and projector-splitting integrators. Comparisons of the parallel and BUG integrators are presented by a series of numerical experiments which demonstrate the efficiency of the proposed method, for problems from radiative transfer and radiation therapy.
Real-time Trajectory-based Social Group Detection
Authors: Simindokht Jahangard, Munawar Hayat, Hamid Rezatofighi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05678
Pdf link: https://arxiv.org/pdf/2304.05678
Abstract Social group detection is a crucial aspect of various robotic applications, including robot navigation and human-robot interactions. To date, a range of model-based techniques have been employed to address this challenge, such as the F-formation and trajectory similarity frameworks. However, these approaches often fail to provide reliable results in crowded and dynamic scenarios. Recent advancements in this area have mainly focused on learning-based methods, such as deep neural networks that use visual content or human pose. Although visual content-based methods have demonstrated promising performance on large-scale datasets, their computational complexity poses a significant barrier to their practical use in real-time applications. To address these issues, we propose a simple and efficient framework for social group detection. Our approach explores the impact of motion trajectory on social grouping and utilizes a novel, reliable, and fast data-driven method. We formulate the individuals in a scene as a graph, where the nodes are represented by LSTM-encoded trajectories and the edges are defined by the distances between each pair of tracks. Our framework employs a modified graph transformer module and graph clustering losses to detect social groups. Our experiments on the popular JRDBAct dataset reveal noticeable improvements in performance, with relative improvements ranging from 2% to 11%. Furthermore, our framework is significantly faster, with up to 12x faster inference times compared to state-of-the-art methods under the same computation resources. These results demonstrate that our proposed method is suitable for real-time robotic applications.
A Persistent-Excitation-Free Method for System Disturbance Estimation Using Concurrent Learning
Authors: Zengjie Zhang, Fangzhou Liu, Tong Liu, Jianbin Qiu, Martin Buss
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.05693
Pdf link: https://arxiv.org/pdf/2304.05693
Abstract Observer-based methods are widely used to estimate the disturbances of different dynamic systems. However, a drawback of the conventional disturbance observers is that they all assume persistent excitation (PE) of the systems. As a result, they may lead to poor estimation precision when PE is not ensured, for instance, when the disturbance gain of the system is close to the singularity. In this paper, we propose a novel disturbance observer based on concurrent learning (CL) with time-variant history stacks, which ensures high estimation precision even in PE-free cases. The disturbance observer is designed in both continuous and discrete time. The estimation errors of the proposed method are proved to converge to a bounded set using the Lyapunov method. A history-sample-selection procedure is proposed to reduce the estimation error caused by the accumulation of old history samples. A simulation study on epidemic control shows that the proposed method produces higher estimation precision than the conventional disturbance observer when PE is not satisfied. This justifies the correctness of the proposed CL-based disturbance observer and verifies its applicability to solving practical problems.
Multi-scale Geometry-aware Transformer for 3D Point Cloud Classification
Authors: Xian Wei, Muyu Wang, Shing-Ho Jonathan Lin, Zhengyu Li, Jian Yang, Arafat Al-Jawari, Xuan Tang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05694
Pdf link: https://arxiv.org/pdf/2304.05694
Abstract Self-attention modules have demonstrated remarkable capabilities in capturing long-range relationships and improving the performance of point cloud tasks. However, point cloud objects are typically characterized by complex, disordered, and non-Euclidean spatial structures with multiple scales, and their behavior is often dynamic and unpredictable. The current self-attention modules mostly rely on dot product multiplication and dimension alignment among query-key-value features, which cannot adequately capture the multi-scale non-Euclidean structures of point cloud objects. To address these problems, this paper proposes a self-attention plug-in module with its variants, Multi-scale Geometry-aware Transformer (MGT). MGT processes point cloud data with multi-scale local and global geometric information in the following three aspects. At first, the MGT divides point cloud data into patches with multiple scales. Secondly, a local feature extractor based on sphere mapping is proposed to explore the geometry inner each patch and generate a fixed-length representation for each patch. Thirdly, the fixed-length representations are fed into a novel geodesic-based self-attention to capture the global non-Euclidean geometry between patches. Finally, all the modules are integrated into the framework of MGT with an end-to-end training scheme. Experimental results demonstrate that the MGT vastly increases the capability of capturing multi-scale geometry using the self-attention mechanism and achieves strong competitive performance on mainstream point cloud benchmarks.
Human-Robot Skill Transfer with Enhanced Compliance via Dynamic Movement Primitives
Authors: Jayden Hong, Zengjie Zhang, Amir M. Soufi Enayati, Homayoun Najjaran
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.05703
Pdf link: https://arxiv.org/pdf/2304.05703
Abstract Finding an efficient way to adapt robot trajectory is a priority to improve overall performance of robots. One approach for trajectory planning is through transferring human-like skills to robots by Learning from Demonstrations (LfD). The human demonstration is considered the target motion to mimic. However, human motion is typically optimal for human embodiment but not for robots because of the differences between human biomechanics and robot dynamics. The Dynamic Movement Primitives (DMP) framework is a viable solution for this limitation of LfD, but it requires tuning the second-order dynamics in the formulation. Our contribution is introducing a systematic method to extract the dynamic features from human demonstration to auto-tune the parameters in the DMP framework. In addition to its use with LfD, another utility of the proposed method is that it can readily be used in conjunction with Reinforcement Learning (RL) for robot training. In this way, the extracted features facilitate the transfer of human skills by allowing the robot to explore the possible trajectories more efficiently and increasing robot compliance significantly. We introduced a methodology to extract the dynamic features from multiple trajectories based on the optimization of human-likeness and similarity in the parametric space. Our method was implemented into an actual human-robot setup to extract human dynamic features and used to regenerate the robot trajectories following both LfD and RL with DMP. It resulted in a stable performance of the robot, maintaining a high degree of human-likeness based on accumulated distance error as good as the best heuristic tuning.
5Greplay: a 5G Network Traffic Fuzzer -- Application to Attack Injection
Authors: Zujany Salazar, Huu Nghia Nguyen, Wissam Mallouli, Ana R Cavalli, Edgardo Montes de Oca
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2304.05719
Pdf link: https://arxiv.org/pdf/2304.05719
Abstract The fifth generation of mobile broadband is more than just an evolution to provide more mobile bandwidth, massive machine-type communications, and ultra-reliable and low-latency communications. It relies on a complex, dynamic and heterogeneous environment that implies addressing numerous testing and security challenges. In this paper we present 5Greplay, an open-source 5G network traffic fuzzer that enables the evaluation of 5G components by replaying and modifying 5G network traffic by creating and injecting network scenarios into a target that can be a 5G core service (e.g., AMF, SMF) or a RAN network (e.g., gNodeB). The tool provides the ability to alter network packets online or offline in both control and data planes in a very flexible manner. The experimental evaluation conducted against open-source based 5G platforms, showed that the target services accept traffic being altered by the tool, and that it can reach up to 9.56 Gbps using only 1 processor core to replay 5G traffic.
Towards a more comprehensive open-source model for interdisciplinary smart integrated energy systems
Authors: Béla Wiegel, Tom Steffen, Davood Babazadeh, Christian Becker
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.05720
Pdf link: https://arxiv.org/pdf/2304.05720
Abstract The energy transition has recently experienced a further acceleration. In order to make the integration of renewable energies as cost-effective, secure and sustainable as possible and to develop new paradigms for the energy system, many energy system models have been developed in research in the past to evaluate the solutions. While model identification and dissemination of results are widely discussed in the literature, a detailed view of the methodology is often missing. This paper addresses this topic and proposes a methodology to build a comprehensive, publicly accessible database for modeling a multi-modal integrated energy system. The focus hereby is dynamic modeling of low- and medium-voltage grids consisting of prosumers, battery storages, heat pumps and electric cars. In addition, a district heating network is parameterized to match the electricity grid. Modelica and the TransiEnt-Library serves as the modeling tool. The methodology for creating the grid models is available via GitLab. A study case that uses the methodology to analyze the congestion situation within a medium-voltage distribution grid is presented.
Distributed Coverage Control of Constrained Constant-Speed Unicycle Multi-Agent Systems
Authors: Qingchen Liu, Zengjie Zhang, Nhan Khanh Le, Jiahu Qin, Fangzhou Liu, Sandra Hirche
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.05723
Pdf link: https://arxiv.org/pdf/2304.05723
Abstract This paper proposes a novel distributed coverage controller for a multi-agent system with constant-speed unicycle robots (CSUR). The work is motivated by the limitation of the conventional method that does not ensure the satisfaction of hard state- and input-dependent constraints and leads to feasibility issues for multi-CSUR systems. In this paper, we solve these problems by designing a novel coverage cost function and a saturated gradient-search-based control law. Invariant set theory and Lyapunov-based techniques are used to prove the state-dependent confinement and the convergence of the system state to the optimal coverage configuration, respectively. The controller is implemented in a distributed manner based on a novel communication standard among the agents. A series of simulation case studies are conducted to validate the effectiveness of the proposed coverage controller in different initial conditions and with control parameters. A comparison study in simulation reveals the advantage of the proposed method in terms of avoiding infeasibility. The experiment study verifies the applicability of the method to real robots with uncertainties. The development procedure of the method from theoretical analysis to experimental validation provides a novel framework for multi-agent system coordinate control with complex agent dynamics.
Dynamic Graph Representation Learning with Neural Networks: A Survey
Authors: Leshanshui Yang, Sébastien Adam, Clément Chatelain
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.05729
Pdf link: https://arxiv.org/pdf/2304.05729
Abstract In recent years, Dynamic Graph (DG) representations have been increasingly used for modeling dynamic systems due to their ability to integrate both topological and temporal information in a compact representation. Dynamic graphs allow to efficiently handle applications such as social network prediction, recommender systems, traffic forecasting or electroencephalography analysis, that can not be adressed using standard numeric representations. As a direct consequence of the emergence of dynamic graph representations, dynamic graph learning has emerged as a new machine learning problem, combining challenges from both sequential/temporal data processing and static graph learning. In this research area, Dynamic Graph Neural Network (DGNN) has became the state of the art approach and plethora of models have been proposed in the very recent years. This paper aims at providing a review of problems and models related to dynamic graph learning. The various dynamic graph supervised learning settings are analysed and discussed. We identify the similarities and differences between existing models with respect to the way time information is modeled. Finally, general guidelines for a DGNN designer when faced with a dynamic graph learning problem are provided.
RO-MAP: Real-Time Multi-Object Mapping with Neural Radiance Fields
Authors: Xiao Han, Houxuan Liu, Yunchao Ding, Lu Yang
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.05735
Pdf link: https://arxiv.org/pdf/2304.05735
Abstract Accurate perception of objects in the environment is important for improving the scene understanding capability of SLAM systems. In robotic and augmented reality applications, object maps with semantic and metric information show attractive advantages. In this paper, we present RO-MAP, a novel multi-object mapping pipeline that does not rely on 3D priors. Given only monocular input, we use neural radiance fields to represent objects and couple them with a lightweight object SLAM based on multi-view geometry, to simultaneously localize objects and implicitly learn their dense geometry. We create separate implicit models for each detected object and train them dynamically and in parallel as new observations are added. Experiments on synthetic and real-world datasets demonstrate that our method can generate semantic object map with shape reconstruction, and be competitive with offline methods while achieving real-time performance (25Hz). The code and dataset will be available at: https://github.com/XiaoHan-Git/RO-MAP
Boosting long-term forecasting performance for continuous-time dynamic graph networks via data augmentation
Authors: Yuxing Tian, Mingjie Zhu, Jiachi Luo, Song Li
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.05749
Pdf link: https://arxiv.org/pdf/2304.05749
Abstract This study focuses on long-term forecasting (LTF) on continuous-time dynamic graph networks (CTDGNs), which is important for real-world modeling. Existing CTDGNs are effective for modeling temporal graph data due to their ability to capture complex temporal dependencies but perform poorly on LTF due to the substantial requirement for historical data, which is not practical in most cases. To relieve this problem, a most intuitive way is data augmentation. In this study, we propose \textbf{\underline{U}ncertainty \underline{M}asked \underline{M}ix\underline{U}p (UmmU)}: a plug-and-play module that conducts uncertainty estimation to introduce uncertainty into the embedding of intermediate layer of CTDGNs, and perform masked mixup to further enhance the uncertainty of the embedding to make it generalize to more situations. UmmU can be easily inserted into arbitrary CTDGNs without increasing the number of parameters. We conduct comprehensive experiments on three real-world dynamic graph datasets, the results demonstrate that UmmU can effectively improve the long-term forecasting performance for CTDGNs.
Self-Supervised Learning with Cluster-Aware-DINO for High-Performance Robust Speaker Verification
Authors: Bing Han, Zhengyang Chen, Yanmin Qian
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2304.05754
Pdf link: https://arxiv.org/pdf/2304.05754
Abstract Automatic speaker verification task has made great achievements using deep learning approaches with the large-scale manually annotated dataset. However, it's very difficult and expensive to collect a large amount of well-labeled data for system building. In this paper, we propose a novel and advanced self-supervised learning framework which can construct a high performance speaker verification system without using any labeled data. To avoid the impact of false negative pairs, we adopt the self-distillation with no labels (DINO) framework as the initial model, which can be trained without exploiting negative pairs. Then, we introduce a cluster-aware training strategy for DINO to improve the diversity of data. In the iteration learning stage, due to a mass of unreliable labels from clustering, the quality of pseudo labels is important for the system training. This motivates us to propose dynamic loss-gate and label correction (DLG-LC) methods to alleviate the performance degradation caused by unreliable labels. More specifically, we model the loss distribution with GMM and obtain the loss-gate threshold dynamically to distinguish the reliable and unreliable labels. Besides, we adopt the model predictions to correct the unreliable label, for better utilizing the unreliable data rather than dropping them directly. Moreover, we extend the DLG-LC to multi-modality to further improve the performance. The experiments are performed on the commonly used Voxceleb dataset. Compared to the best-known self-supervised speaker verification system, our proposed method obtain 22.17%, 27.94% and 25.56% relative EER improvement on Vox-O, Vox-E and Vox-H test sets, even with fewer iterations, smaller models, and simpler clustering methods. More importantly, the newly proposed system even achieves comparable results with the fully supervised system, but without using any human labeled data.
Learning coordination through new actions
Authors: Sofia B.S.D. Castro
Subjects: Computer Science and Game Theory (cs.GT); Dynamical Systems (math.DS)
Arxiv link: https://arxiv.org/abs/2304.05763
Pdf link: https://arxiv.org/pdf/2304.05763
Abstract We provide a novel approach to achieving a desired outcome in a coordination game: the original 2x2 game is embedded in a 2x3 game where one of the players may use a third action. For a large set of payoff values only one of the Nash equilibria of the original 2x2 game is stable under replicator dynamics. We show that this Nash equilibrium is the {\omega}-limit of all initial conditions in the interior of the state space for the modified 2x3 game. Thus, the existence of a third action for one of the players, although not used, allows both players to coordinate on one Nash equilibrium. This Nash equilibrium is the one preferred by, at least, the player with access to the new action. This approach deals with both coordination failure (players choose the payoff-dominant Nash equilibrium, if such a Nash equilibrium exists) and miscoordination (players do not use mixed strategies).
Model Reduction of Linear Stochastic Systems with Preservation of sc-LTL Specifications
Authors: Maico Hendrikus Wilhelmus Engelaar, Licio Romao, Yulong Gao, Mircea Lazar, Alessandro Abate, Sofie Haesaert
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.05770
Pdf link: https://arxiv.org/pdf/2304.05770
Abstract We propose a correct-by-design controller synthesis framework for discrete-time linear stochastic systems that provides more flexibility to the overall abstraction framework of stochastic systems. Rather than directly abstracting the original dynamics, which can be large-scale and complex, we propose an intermediate step that leverages weak Gaussian realization theory and Kalman filtering techniques to obtain a related, discrete-time stochastic dynamical system that is simpler, and more prone to abstraction methods. We also propose a controller refinement algorithm and show correctness of the overall approach in enforcing synthetically co-safe Linear Temporal Logic properties. In general, the generated simplified stochastic dynamical systems are time-varying, but, under some technical conditions, will become time-invariant. We illustrate our theoretical findings with an example that supports the proposed correct-by-design framework and that illustrates how model reduction of stochastic models can be achieved.
A Security Evaluation Framework for Software-Defined Network Architectures in Data Center Environments
Authors: Igor Ivkić, Dominik Thiede, Nicholas Race, Matthew Broadbent, Antonios Gouglidis
Subjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2304.05776
Pdf link: https://arxiv.org/pdf/2304.05776
Abstract The importance of cloud computing has grown over the last years, which resulted in a significant increase of Data Center (DC) network requirements. Virtualisation is one of the key drivers of that transformation and enables a massive deployment of computing resources, which exhausts server capacity limits. Furthermore, the increased network endpoints need to be handled dynamically and centrally to facilitate cloud computing functionalities. Traditional DCs barely satisfy those demands because of their inherent limitations based on the network topology. Software-Defined Networks (SDN) promise to meet the increasing network requirements for cloud applications by decoupling control functionalities from data forwarding. Although SDN solutions add more flexibility to DC networks, they also pose new vulnerabilities with a high impact due to the centralised architecture. In this paper we propose an evaluation framework for assessing the security level of SDN architectures in four different stages. Furthermore, we show in an experimental study, how the framework can be used for mapping SDN threats with associated vulnerabilities and necessary mitigations in conjunction with risk and impact classification. The proposed framework helps administrators to evaluate the network security level, to apply countermeasures for identified SDN threats, and to meet the networks security requirements.
Micromagnetics simulations and phase transitions of ferromagnetics with Dzyaloshinskii-Moriya interaction
Authors: Panchi Li, Shuting Gu, Jin Lan, Jingrun Chen, Weiqing Ren, Rui Du
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2304.05789
Pdf link: https://arxiv.org/pdf/2304.05789
Abstract Magnetic skyrmions widely exist in a diverse range of magnetic systems, including chiral magnets with a non-centrosymmetric structure characterized by Dzyaloshinkii-Moriya interaction~(DMI). In this study, we propose a generalized semi-implicit backward differentiation formula projection method, enabling the simulations of the Landau-Lifshitz~(LL) equation in chiral magnets in a typical time step-size of $1$ ps, markedly exceeding the limit subjected by existing numerical methods of typically $0.1$ ps. Using micromagnetics simulations, we show that the LL equation with DMI reveals an intriguing dynamic instability in magnetization configurations as the damping varies. Both the isolated skyrmionium and skyrmionium clusters can be consequently produced using a simple initialization strategy and a specific damping parameter. Assisted by the string method, the transition path between skyrmion and skyrmionium, along with the escape of a skyrmion from the skyrmion clusters, are then thoroughly examined. The numerical methods developed in this work not only provide a reliable paradigm to investigate the skyrmion-based textures and their transition paths, but also facilitate the understandings for magnetization dynamics in complex magnetic systems.
Proximity Forest 2.0: A new effective and scalable similarity-based classifier for time series
Authors: Matthieu Herrmann, Chang Wei Tan, Mahsa Salehi, Geoffrey I. Webb
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.05800
Pdf link: https://arxiv.org/pdf/2304.05800
Abstract Time series classification (TSC) is a challenging task due to the diversity of types of feature that may be relevant for different classification tasks, including trends, variance, frequency, magnitude, and various patterns. To address this challenge, several alternative classes of approach have been developed, including similarity-based, features and intervals, shapelets, dictionary, kernel, neural network, and hybrid approaches. While kernel, neural network, and hybrid approaches perform well overall, some specialized approaches are better suited for specific tasks. In this paper, we propose a new similarity-based classifier, Proximity Forest version 2.0 (PF 2.0), which outperforms previous state-of-the-art similarity-based classifiers across the UCR benchmark and outperforms state-of-the-art kernel, neural network, and hybrid methods on specific datasets in the benchmark that are best addressed by similarity-base methods. PF 2.0 incorporates three recent advances in time series similarity measures -- (1) computationally efficient early abandoning and pruning to speedup elastic similarity computations; (2) a new elastic similarity measure, Amerced Dynamic Time Warping (ADTW); and (3) cost function tuning. It rationalizes the set of similarity measures employed, reducing the eight base measures of the original PF to three and using the first derivative transform with all similarity measures, rather than a limited subset. We have implemented both PF 1.0 and PF 2.0 in a single C++ framework, making the PF framework more efficient.
Data-Driven Response Regime Exploration and Identification for Dynamical Systems
Authors: Maor Farid
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Dynamical Systems (math.DS)
Arxiv link: https://arxiv.org/abs/2304.05822
Pdf link: https://arxiv.org/pdf/2304.05822
Abstract Data-Driven Response Regime Exploration and Identification (DR$^2$EI) is a novel and fully data-driven method for identifying and classifying response regimes of a dynamical system without requiring human intervention. This approach is a valuable tool for exploring and discovering response regimes in complex dynamical systems, especially when the governing equations and the number of response regimes are unknown, and the system is expensive to sample. Additionally, the method is useful for order reduction, as it can be used to identify the most dominant response regimes of a given dynamical system. DR$^2$EI utilizes unsupervised learning algorithms to transform the system's response into an embedding space that facilitates regime classification. An active sequential sampling approach based on Gaussian Process Regression (GPR) is used to efficiently sample the parameter space, quantify uncertainty, and provide optimal trade-offs between exploration and exploitation. The performance of the DR$^2$EI method was evaluated by analyzing three established dynamical systems: the mathematical pendulum, the Lorenz system, and the Duffing oscillator. The method was shown to effectively identify a variety of response regimes with both similar and distinct topological features and frequency content, demonstrating its versatility in capturing a wide range of behaviors. While it may not be possible to guarantee that all possible regimes will be identified, the method provides an automated and efficient means for exploring the parameter space of a dynamical system and identifying its underlying "sufficiently dominant" response regimes without prior knowledge of the system's equations or behavior.
When Should You Wait Before Updating? Toward a Robustness Refinement
Authors: Swan Dubois, Laurent Feuilloley, Franck Petit, Mikaël Rabie
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2304.05831
Pdf link: https://arxiv.org/pdf/2304.05831
Abstract Consider a dynamic network and a given distributed problem. At any point in time, there might exist several solutions that are equally good with respect to the problem specification, but that are different from an algorithmic perspective, because some could be easier to update than others when the network changes. In other words, one would prefer to have a solution that is more robust to topological changes in the network; and in this direction the best scenario would be that the solution remains correct despite the dynamic of the network. In~\cite{CasteigtsDPR20}, the authors introduced a very strong robustness criterion: they required that for any removal of edges that maintain the network connected, the solution remains valid. They focus on the maximal independent set problem, and their approach consists in characterizing the graphs in which there exists a robust solution (the existential problem), or even stronger, where any solution is robust (the universal problem). As the robustness criteria is very demanding, few graphs have a robust solution, and even fewer are such that all of their solutions are robust. In this paper, we ask the following question: \textit{Can we have robustness for a larger class of networks, if we bound the number $k$ of edge removals allowed}? (See the full paper for the full abstract.)
Dynamic Mixed Membership Stochastic Block Model for Weighted Labeled Networks
Authors: Gaël Poux-Médard, Julien Velcin, Sabine Loudcher
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2304.05894
Pdf link: https://arxiv.org/pdf/2304.05894
Abstract Most real-world networks evolve over time. Existing literature proposes models for dynamic networks that are either unlabeled or assumed to have a single membership structure. On the other hand, a new family of Mixed Membership Stochastic Block Models (MMSBM) allows to model static labeled networks under the assumption of mixed-membership clustering. In this work, we propose to extend this later class of models to infer dynamic labeled networks under a mixed membership assumption. Our approach takes the form of a temporal prior on the model's parameters. It relies on the single assumption that dynamics are not abrupt. We show that our method significantly differs from existing approaches, and allows to model more complex systems --dynamic labeled networks. We demonstrate the robustness of our method with several experiments on both synthetic and real-world datasets. A key interest of our approach is that it needs very few training data to yield good results. The performance gain under challenging conditions broadens the variety of possible applications of automated learning tools --as in social sciences, which comprise many fields where small datasets are a major obstacle to the introduction of machine learning methods.
A Phoneme-Informed Neural Network Model for Note-Level Singing Transcription
Authors: Sangeon Yong, Li Su, Juhan Nam
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2304.05917
Pdf link: https://arxiv.org/pdf/2304.05917
Abstract Note-level automatic music transcription is one of the most representative music information retrieval (MIR) tasks and has been studied for various instruments to understand music. However, due to the lack of high-quality labeled data, transcription of many instruments is still a challenging task. In particular, in the case of singing, it is difficult to find accurate notes due to its expressiveness in pitch, timbre, and dynamics. In this paper, we propose a method of finding note onsets of singing voice more accurately by leveraging the linguistic characteristics of singing, which are not seen in other instruments. The proposed model uses mel-scaled spectrogram and phonetic posteriorgram (PPG), a frame-wise likelihood of phoneme, as an input of the onset detection network while PPG is generated by the pre-trained network with singing and speech data. To verify how linguistic features affect onset detection, we compare the evaluation results through the dataset with different languages and divide onset types for detailed analysis. Our approach substantially improves the performance of singing transcription and therefore emphasizes the importance of linguistic features in singing analysis.
Unified Numerical Stability and Accuracy Analysis of the Partitioned-Solution Approach
Authors: Georgios Tzounas, Gabriela Hug
Subjects: Numerical Analysis (math.NA); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.05955
Pdf link: https://arxiv.org/pdf/2304.05955
Abstract This paper focuses on the Partitioned-Solution Approach (PSA) employed for the Time-Domain Simulation (TDS) of dynamic power system models. In PSA, differential equations are solved at each step of the TDS for state variables, whereas algebraic equations are solved separately. The goal of this paper is to propose a novel, matrix-pencil based technique to study numerical stability and accuracy of PSA in a unified way. The proposed technique quantifies the numerical deformation that PSA-based methods introduce to the dynamics of the power system model, and allows estimating useful upper time step bounds that achieve prescribed simulation accuracy criteria. The family of Predictor-Corrector (PC) methods, which is commonly applied in practical implementations of PSA, is utilized to illustrate the proposed technique. Simulations are carried out on the IEEE 39-bus system, as well as on a 1479-bus model of the All-Island Irish Transmission System (AIITS).
UAV Obstacle Avoidance by Human-in-the-Loop Reinforcement in Arbitrary 3D Environment
Authors: Xuyang Li, Jianwu Fang, Kai Du, Kuizhi Mei, Jianru Xue
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.05959
Pdf link: https://arxiv.org/pdf/2304.05959
Abstract This paper focuses on the continuous control of the unmanned aerial vehicle (UAV) based on a deep reinforcement learning method for a large-scale 3D complex environment. The purpose is to make the UAV reach any target point from a certain starting point, and the flying height and speed are variable during navigation. In this work, we propose a deep reinforcement learning (DRL)-based method combined with human-in-the-loop, which allows the UAV to avoid obstacles automatically during flying. We design multiple reward functions based on the relevant domain knowledge to guide UAV navigation. The role of human-in-the-loop is to dynamically change the reward function of the UAV in different situations to suit the obstacle avoidance of the UAV better. We verify the success rate and average step size on urban, rural, and forest scenarios, and the experimental results show that the proposed method can reduce the training convergence time and improve the efficiency and accuracy of navigation tasks. The code is available on the website https://github.com/Monnalo/UAV_navigation.
An information-theoretic evolutionary algorithm
Authors: Arnaud Berny
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2304.05963
Pdf link: https://arxiv.org/pdf/2304.05963
Abstract We propose a novel evolutionary algorithm on bit vectors which derives from the principles of information theory. The information-theoretic evolutionary algorithm (it-EA) iteratively updates a search distribution with two parameters, the center, that is the bit vector at which standard bit mutation is applied, and the mutation rate. The mutation rate is updated by means of information-geometric optimization and the center is updated by means of a maximum likelihood principle. Standard elitist and non elitist updates of the center are also considered. Experiments illustrate the dynamics of the mutation rate and the influence of hyperparameters. In an empirical runtime analysis, on OneMax and LeadingOnes, the elitist and non elitist it-EAs obtain promising results.
Traffic Modeling with SUMO: a Tutorial
Authors: Davide Andrea Guastella, Gianluca Bontempi
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2304.05982
Pdf link: https://arxiv.org/pdf/2304.05982
Abstract This paper presents a step-by-step guide to generating and simulating a traffic scenario using the open-source simulation tool SUMO. It introduces the common pipeline used to generate a synthetic traffic model for SUMO, how to import existing traffic data into a model to achieve accuracy in traffic simulation (that is, producing a traffic model which dynamics is similar to the real one). It also describes how SUMO outputs information from simulation that can be used for data analysis purposes.
Astrocytic gliotransmission as a pathway for stable stimulation of post-synaptic spiking: Implications for working memory
Authors: Valentin Würzbauer, Kerstin Lenk, Matin Jafarian
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.06004
Pdf link: https://arxiv.org/pdf/2304.06004
Abstract The brain consists not only of neurons but also of non-neuronal cells, including astrocytes. Recent discoveries in neuroscience suggest that astrocytes directly regulate neuronal activity by releasing gliotransmitters such as glutamate. In this paper, we consider a biologically plausible mathematical model of a tripartite neuron-astrocyte network. We study the stability of the nonlinear astrocyte dynamics, as well as its role in regulating the firing rate of the post-synaptic neuron. We show that astrocytes enable storing neuronal information temporarily. Motivated by recent findings on the role of astrocytes in explaining mechanisms of working memory, we numerically verify the utility of our analysis in showing the possibility of two competing theories of persistent and sparse neuronal activity of working memory.
Adaptive Human Matting for Dynamic Videos
Authors: Chung-Ching Lin, Jiang Wang, Kun Luo, Kevin Lin, Linjie Li, Lijuan Wang, Zicheng Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06018
Pdf link: https://arxiv.org/pdf/2304.06018
Abstract The most recent efforts in video matting have focused on eliminating trimap dependency since trimap annotations are expensive and trimap-based methods are less adaptable for real-time applications. Despite the latest tripmap-free methods showing promising results, their performance often degrades when dealing with highly diverse and unstructured videos. We address this limitation by introducing Adaptive Matting for Dynamic Videos, termed AdaM, which is a framework designed for simultaneously differentiating foregrounds from backgrounds and capturing alpha matte details of human subjects in the foreground. Two interconnected network designs are employed to achieve this goal: (1) an encoder-decoder network that produces alpha mattes and intermediate masks which are used to guide the transformer in adaptively decoding foregrounds and backgrounds, and (2) a transformer network in which long- and short-term attention combine to retain spatial and temporal contexts, facilitating the decoding of foreground details. We benchmark and study our methods on recently introduced datasets, showing that our model notably improves matting realism and temporal coherence in complex real-world videos and achieves new best-in-class generalizability. Further details and examples are available at https://github.com/microsoft/AdaM.
VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs
Authors: Moayed Haji Ali, Andrew Bond, Tolga Birdal, Duygu Ceylan, Levent Karacan, Erkut Erdem, Aykut Erdem
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06020
Pdf link: https://arxiv.org/pdf/2304.06020
Abstract We propose $\textbf{VidStyleODE}$, a spatiotemporally continuous disentangled $\textbf{Vid}$eo representation based upon $\textbf{Style}$GAN and Neural-$\textbf{ODE}$s. Effective traversal of the latent space learned by Generative Adversarial Networks (GANs) has been the basis for recent breakthroughs in image editing. However, the applicability of such advancements to the video domain has been hindered by the difficulty of representing and controlling videos in the latent space of GANs. In particular, videos are composed of content (i.e., appearance) and complex motion components that require a special mechanism to disentangle and control. To achieve this, VidStyleODE encodes the video content in a pre-trained StyleGAN $\mathcal{W}_+$ space and benefits from a latent ODE component to summarize the spatiotemporal dynamics of the input video. Our novel continuous video generation process then combines the two to generate high-quality and temporally consistent videos with varying frame rates. We show that our proposed method enables a variety of applications on real videos: text-guided appearance manipulation, motion manipulation, image animation, and video interpolation and extrapolation. Project website: https://cyberiada.github.io/VidStyleODE

A-suozhang / GetArxivDaily

New submissions for Thu, 13 Apr 23 #31

Keyword: efficient

PixelRNN: In-pixel Recurrent Neural Networks for End-to-end-optimized Perception with Neural Sensors

Probabilistic Reasoning at Scale: Trigger Graphs to the Rescue

An Adaptive Factorized Nyström Preconditioner for Regularized Kernel Matrices

CamDiff: Camouflage Image Augmentation via Diffusion Model

Contingency Games for Multi-Agent Interaction

Communication Efficient DNN Partitioning-based Federated Learning

Revisiting Single-gated Mixtures of Experts

GraphGANFed: A Federated Generative Framework for Graph-Structured Molecules Towards Efficient Drug Discovery

L3MVN: Leveraging Large Language Models for Visual Target Navigation

Frontier Semantic Exploration for Visual Target Navigation

Training Large Language Models Efficiently with Sparsity and Dataflow

State estimation of a carbon capture process through POD model reduction and neural network approximation

MoMo: A shared encoder Model for text, image and multi-Modal representations

Understanding Causality with Large Language Models: Feasibility and Opportunities

Encrypted Price-based Market Mechanism for Optimal Load Frequency Control

Group projected Subspace Pursuit for Identification of variable coefficient differential equations (GP-IDENT)

MEMA Runtime Framework: Minimizing External Memory Accesses for TinyML on Microcontrollers

A Predictive Model using Machine Learning Algorithm in Identifying Students Probability on Passing Semestral Course

Distributed Compressed Sparse Row Format for Spiking Neural Network Simulation, Serialization, and Interoperability

Zero-Knowledge Proof-based Practical Federated Learning on Blockchain

Vehicle Trajectory Prediction based Predictive Collision Risk Assessment for Autonomous Driving in Highway Scenarios

NutritionVerse-3D: A 3D Food Model Dataset for Nutritional Intake Estimation

Constructing Deep Spiking Neural Networks from Artificial Neural Networks with Knowledge Distillation

DOSM: Demand-Prediction based Online Service Management for Vehicular Edge Computing Networks

An Optimal SVC Bitstream Schema for Viewport-dependent 360-degree Video Streaming

RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer

A parallel rank-adaptive integrator for dynamical low-rank approximation

SuperpixelGraph: Semi-automatic generation of building footprint through semantic-sensitive superpixel and neural graph networks

Rail Detection: An Efficient Row-based Network and A New Benchmark

Real-time Trajectory-based Social Group Detection

Fully Conservative Difference Schemes for the Rotation-Two-Component Camassa-Holm System with Smooth/Nonsmooth Initial Data

Human-Robot Skill Transfer with Enhanced Compliance via Dynamic Movement Primitives

Stochastic Domain Decomposition Based on Variable-Separation Method

Dynamic Graph Representation Learning with Neural Networks: A Survey

A Novel Hybrid Post-Weighting Digital Predistortion in mMIMO Under Crosstalk

Proximity Forest 2.0: A new effective and scalable similarity-based classifier for time series

EgoDist: Comparing networks via distributions of egonet features

DUFormer: A Novel Architecture for Power Line Segmentation of Aerial Images

Data-Driven Response Regime Exploration and Identification for Dynamical Systems

FedTrip: A Resource-Efficient Federated Learning Method with Triplet Regularization

RESET: Revisiting Trajectory Sets for Conditional Behavior Prediction

Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL

Node-Differentially Private Estimation of the Number of Connected Components

Localizing Model Behavior with Path Patching

HiPrompt: Few-Shot Biomedical Knowledge Fusion via Hierarchy-Oriented Prompting

GPr-Net: Geometric Prototypical Network for Point Cloud Few-Shot Learning

An Improved Heart Disease Prediction Using Stacked Ensemble Method

RECLIP: Resource-efficient CLIP by Training with Small Images

Keyword: faster

Efficient Automation of Neural Network Design: A Survey on Differentiable Neural Architecture Search

Probabilistic Reasoning at Scale: Trigger Graphs to the Rescue

Black Box Variational Inference with a Deterministic Objective: Faster, More Accurate, and Even More Black Box

Zoom is what you need: An empirical study of the power of zoom and spatial biases in image classification

An Optimal SVC Bitstream Schema for Viewport-dependent 360-degree Video Streaming

Factorized Inverse Path Tracing for Efficient and Accurate Material-Lighting Estimation

Real-time Trajectory-based Social Group Detection

Cost-damage analysis of attack trees

Keyword: mobile

DOSM: Demand-Prediction based Online Service Management for Vehicular Edge Computing Networks

5Greplay: a 5G Network Traffic Fuzzer -- Application to Attack Injection

Stand-Up Indulgent Gathering on Lines

Fast vehicle detection algorithm based on lightweight YOLO7-tiny

Keyword: pruning

Distilling Token-Pruned Pose Transformer for 2D Human Pose Estimation

Proximity Forest 2.0: A new effective and scalable similarity-based classifier for time series

Keyword: voxel

Keyword: lidar

SceneCalib: Automatic Targetless Calibration of Cameras and Lidars in Autonomous Driving

WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language

Keyword: diffusion

CamDiff: Camouflage Image Augmentation via Diffusion Model

Improving Diffusion Models for Scene Text Editing with Dual Encoders

InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions

Exploring Diffusion Models for Unsupervised Video Anomaly Detection

A quadrature scheme for steady-state diffusion equations involving fractional power of regularly accretive operator

Cancer-Net BCa-S: Breast Cancer Grade Prediction using Volumetric Deep Radiomic Features from Synthetic Correlated Diffusion Imaging

Diffusion models with location-scale noise