New submissions for Tue, 2 May 23

Keyword: efficient

Fair Distribution of Delivery Orders

Authors: Hadi Hosseini, Shivika Narang, Tomasz Wąs
Subjects: Computer Science and Game Theory (cs.GT)
Arxiv link: https://arxiv.org/abs/2305.00040
Pdf link: https://arxiv.org/pdf/2305.00040
Abstract We initiate the study of fair distribution of delivery tasks among a set of agents wherein delivery jobs are placed along the vertices of a graph. Our goal is to fairly distribute delivery costs (modeled as a submodular function) among a fixed set of agents while satisfying some desirable notions of economic efficiency. We adopt well-established fairness concepts$\unicode{x2014}$such as envy-freeness up to one item (EF1) and minimax share (MMS)$\unicode{x2014}$to our setting and show that fairness is often incompatible with the efficiency notion of social optimality. Yet, we characterize instances that admit fair and socially optimal solutions by exploiting graph structures. We further show that achieving fairness along with Pareto optimality is computationally intractable. Nonetheless, we design an XP algorithm (parameterized by the number of agents) for finding MMS and Pareto optimal solutions on every instance, and show that the same algorithm can be modified to find efficient solutions along with EF1, when such solutions exist. We complement our theoretical results by experimentally analyzing the price of fairness on randomly generated graph structures.
Click-Feedback Retrieval
Authors: Zeyu Wang, Yu Wu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.00052
Pdf link: https://arxiv.org/pdf/2305.00052
Abstract Retrieving target information based on input query is of fundamental importance in many real-world applications. In practice, it is not uncommon for the initial search to fail, where additional feedback information is needed to guide the searching process. In this work, we study a setting where the feedback is provided through users clicking liked and disliked searching results. We believe this form of feedback is of great practical interests for its convenience and efficiency. To facilitate future work in this direction, we construct a new benchmark termed click-feedback retrieval based on a large-scale dataset in fashion domain. We demonstrate that incorporating click-feedback can drastically improve the retrieval performance, which validates the value of the proposed setting. We also introduce several methods to utilize click-feedback during training, and show that click-feedback-guided training can significantly enhance the retrieval quality. We hope further exploration in this direction can bring new insights on building more efficient and user-friendly search engines.
The Kolmogorov N-width for linear transport: Exact representation and the influence of the data
Authors: Florian Arbes, Constantin Greif, Karsten Urban
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2305.00066
Pdf link: https://arxiv.org/pdf/2305.00066
Abstract The Kolmogorov $N$-width describes the best possible error one can achieve by elements of an $N$-dimensional linear space. Its decay has extensively been studied in Approximation Theory and for the solution of Partial Differential Equations (PDEs). Particular interest has occurred within Model Order Reduction (MOR) of parameterized PDEs e.g.\ by the Reduced Basis Method (RBM). While it is known that the $N$-width decays exponentially fast (and thus admits efficient MOR) for certain problems, there are examples of the linear transport and the wave equation, where the decay rate deteriorates to $N^{-1/2}$. On the other hand, it is widely accepted that a smooth parameter dependence admits a fast decay of the $N$-width. However, a detailed analysis of the influence of properties of the data (such as regularity or slope) on the rate of the $N$-width seems to lack. In this paper, we use techniques from Fourier Analysis to derive exact representations of the $N$-width in terms of initial and boundary conditions of the linear transport equation modeled by some function $g$ for half-wave symmetric data. For arbitrary functions $g$, we derive bounds and prove that these bounds are sharp. In particular, we prove that the $N$-width decays as $c_r N^{-(r+1/2)}$ for functions in the Sobolev space, $g\in H^r$. Our theoretical investigations are complemented by numerical experiments which confirm the sharpness of our bounds and give additional quantitative insight.
CarGameAR: An Integrated AR Car Game Authoring Interface for Custom-Built Car Programed on Arduino Board
Authors: Dang Bui, Wanwan Li, Hong Huang
Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2305.00084
Pdf link: https://arxiv.org/pdf/2305.00084
Abstract In this paper, we present CarGameAR: An Integrated AR Car Game Authoring Interface for Custom-Built Car Programed on Arduino Board. The car consists of an Arduino board, an H-bridge, and motors. The objective of the project is to create a system that can move a car in different directions using a computer application. The system uses Unity software to create a virtual environment where the user can control the car using keyboard commands. The car's motion is achieved by sending signals from the computer to the Arduino board, which then drives the motors through the H-bridge. The project provides a cost-effective and efficient way to build a car, which can be used for educational purposes, such as teaching programming. Moreover, this project is not limited to the control of the car through keyboard commands in a virtual environment. The system can be adapted to support augmented reality (AR) technology, providing an even more immersive and engaging user experience. By integrating the car with AR, the user can control the car's motion using physical gestures and movements, adding an extra layer of interactivity to the system. This makes the car an ideal platform for game development in AR, allowing the user to create driving games that blend the physical and virtual worlds seamlessly. Additionally, the car's affordability and ease of construction make it an accessible and valuable tool for teaching programming and principles in a fun and interactive way. Overall, this project demonstrates the versatility and potential of the car system, highlighting the various applications and possibilities it offers for both education and entertainment.
Space reduction techniques for the $3$-wise Kemeny problem
Authors: Xuan Kien Phung, Sylvie Hamel
Subjects: Discrete Mathematics (cs.DM); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2305.00140
Pdf link: https://arxiv.org/pdf/2305.00140
Abstract Kemeny's rule is one of the most studied and well-known voting schemes with various important applications in computational social choice and biology. Recently, Kemeny's rule was generalized via a set-wise approach by Gilbert et. al. Following this paradigm, we have shown in \cite{Phung-Hamel-2023} that the $3$-wise Kemeny voting scheme induced by the $3$-wise Kendall-tau distance presents interesting advantages in comparison with the classical Kemeny rule. While the $3$-wise Kemeny problem, which consists of computing the set of $3$-wise consensus rankings of a voting profile, is NP-hard, we establish in this paper several generalizations of the Major Order Theorems, as obtained in \cite{Milosz-Hamel-2020} for the classical Kemeny rule, for the $3$-wise Kemeny voting scheme to achieve a substantial search space reduction by efficiently determining in polynomial time the relative orders of pairs of alternatives. Essentially, our theorems quantify precisely the non-trivial property that if the preference for an alternative over another one in an election is strong enough, not only in the head-to-head competition but even when taking into consideration one or two more alternatives, then the relative order of these two alternatives in every $3$-wise consensus ranking must be as expected. Moreover, we show that the well-known $3/4$-majority rule of Betzler et al. for the classical Kemeny rule is only valid for elections with no more than $5$ alternatives with respect to the $3$-wise Kemeny scheme. Examples are also provided to show that the $3$-wise Kemeny rule is more resistant to manipulation than the classical one.
Learning to Seek: Multi-Agent Online Source Seeking Against Non-Stochastic Disturbances
Authors: Bin Du, Kun Qian, Christian Claudel, Dengfeng Sun
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2305.00154
Pdf link: https://arxiv.org/pdf/2305.00154
Abstract This paper proposes to leverage the emerging~learning techniques and devise a multi-agent online source {seeking} algorithm under unknown environment. Of particular significance in our problem setups are: i) the underlying environment is not only unknown, but dynamically changing and also perturbed by two types of non-stochastic disturbances; and ii) a group of agents is deployed and expected to cooperatively seek as many sources as possible. Correspondingly, a new technique of discounted Kalman filter is developed to tackle with the non-stochastic disturbances, and a notion of confidence bound in polytope nature is utilized~to aid the computation-efficient cooperation among~multiple agents. With standard assumptions on the unknown environment as well as the disturbances, our algorithm is shown to achieve sub-linear regrets under the two~types of non-stochastic disturbances; both results are comparable to the state-of-the-art. Numerical examples on a real-world pollution monitoring application are provided to demonstrate the effectiveness of our algorithm.
Beyond Prediction: On-street Parking Recommendation using Heterogeneous Graph-based List-wise Ranking
Authors: Hanyu Sun, Xiao Huang, Wei Ma
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2305.00162
Pdf link: https://arxiv.org/pdf/2305.00162
Abstract To provide real-time parking information, existing studies focus on predicting parking availability, which seems an indirect approach to saving drivers' cruising time. In this paper, we first time propose an on-street parking recommendation (OPR) task to directly recommend a parking space for a driver. To this end, a learn-to-rank (LTR) based OPR model called OPR-LTR is built. Specifically, parking recommendation is closely related to the "turnover events" (state switching between occupied and vacant) of each parking space, and hence we design a highly efficient heterogeneous graph called ESGraph to represent historical and real-time meters' turnover events as well as geographical relations; afterward, a convolution-based event-then-graph network is used to aggregate and update representations of the heterogeneous graph. A ranking model is further utilized to learn a score function that helps recommend a list of ranked parking spots for a specific on-street parking query. The method is verified using the on-street parking meter data in Hong Kong and San Francisco. By comparing with the other two types of methods: prediction-only and prediction-then-recommendation, the proposed direct-recommendation method achieves satisfactory performance in different metrics. Extensive experiments also demonstrate that the proposed ESGraph and the recommendation model are more efficient in terms of computational efficiency as well as saving drivers' on-street parking time.
Asynchronous Distributed Protocol for Service Provisioning in the Edge-Cloud Continuum
Authors: Itamar Cohen, Paolo Giaccone, Carla Fabiana Chiasserini
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2305.00184
Pdf link: https://arxiv.org/pdf/2305.00184
Abstract In the edge-cloud continuum, datacenters provide microservices (MSs) to mobile users, with each MS having specific latency constraints and computational requirements. Deploying such a variety of MSs matching their requirements with the available computing resources is challenging. In addition, time-critical MSs may have to be migrated as the users move, to keep meeting their latency constraints. Unlike previous work relying on a central orchestrator with an always-updated global view of the available resources and of the users' locations, this work envisions a distributed solution to the above issues. In particular, we propose a distributed asynchronous protocol for MS deployment in the cloud-edge continuum that (i) dramatically reduces the system overhead compared to a centralized approach, and (ii) increases the system stability by avoiding having a single point of failure as in the case of a central orchestrator. Our solution ensures cost-efficient feasible placement of MSs, while using negligible bandwidth.
Distributed State Estimation for Linear Time-Varying Systems with Sensor Network Delays
Authors: Sanjay Chandrasekaran, Vishnu Varadan, Siva Vignesh Krishnan, Florian Dörfler, Mohammad H. Mamduhi
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2305.00190
Pdf link: https://arxiv.org/pdf/2305.00190
Abstract Distributed sensor networks often include a multitude of sensors, each measuring parts of a process state space or observing the operations of a system. Communication of measurements between the sensor nodes and estimator(s) cannot realistically be considered delay-free due to communication errors and transmission latency in the channels. We propose a novel stability-based method that mitigates the influence of sensor network delays in distributed state estimation for linear time-varying systems. Our proposed algorithm efficiently selects a subset of sensors from the entire sensor nodes in the network based on the desired stability margins of the distributed Kalman filter estimates, after which, the state estimates are computed only using the measurements of the selected sensors. We provide comparisons between the estimation performance of our proposed algorithm and a greedy algorithm that exhaustively selects an optimal subset of nodes. We then apply our method to a simulative scenario for estimating the states of a linear time-varying system using a sensor network including 2000 sensor nodes. Simulation results demonstrate the performance efficiency of our algorithm and show that it closely follows the achieved performance by the optimal greedy search algorithm.
Data-Driven Subgroup Identification for Linear Regression
Authors: Zachary Izzo, Ruishan Liu, James Zou
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2305.00195
Pdf link: https://arxiv.org/pdf/2305.00195
Abstract Medical studies frequently require to extract the relationship between each covariate and the outcome with statistical confidence measures. To do this, simple parametric models are frequently used (e.g. coefficients of linear regression) but usually fitted on the whole dataset. However, it is common that the covariates may not have a uniform effect over the whole population and thus a unified simple model can miss the heterogeneous signal. For example, a linear model may be able to explain a subset of the data but fail on the rest due to the nonlinearity and heterogeneity in the data. In this paper, we propose DDGroup (data-driven group discovery), a data-driven method to effectively identify subgroups in the data with a uniform linear relationship between the features and the label. DDGroup outputs an interpretable region in which the linear model is expected to hold. It is simple to implement and computationally tractable for use. We show theoretically that, given a large enough sample, DDGroup recovers a region where a single linear model with low variance is well-specified (if one exists), and experiments on real-world medical datasets confirm that it can discover regions where a local linear model has improved performance. Our experiments also show that DDGroup can uncover subgroups with qualitatively different relationships which are missed by simply applying parametric approaches to the whole dataset.
Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming
Authors: Vignesh V Menon, Jingwen Zhu, Prajit T Rajendran, Hadi Amirpour, Patrick Le Callet, Christian Timmerer
Subjects: Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2305.00225
Pdf link: https://arxiv.org/pdf/2305.00225
Abstract In video streaming applications, a fixed set of bitrate-resolution pairs (known as a bitrate ladder) is typically used during the entire streaming session. However, an optimized bitrate ladder per scene may result in (i) decreased storage or delivery costs or/and (ii) increased Quality of Experience. This paper introduces a Just Noticeable Difference (JND)-aware per-scene bitrate ladder prediction scheme (JASLA) for adaptive video-on-demand streaming applications. JASLA predicts jointly optimized resolutions and corresponding constant rate factors (CRFs) using spatial and temporal complexity features for a given set of target bitrates for every scene, which yields an efficient constrained Variable Bitrate encoding. Moreover, bitrate-resolution pairs that yield distortion lower than one JND are eliminated. Experimental results show that, on average, JASLA yields bitrate savings of 34.42% and 42.67% to maintain the same PSNR and VMAF, respectively, compared to the reference HTTP Live Streaming (HLS) bitrate ladder Constant Bitrate encoding using x265 HEVC encoder, where the maximum resolution of streaming is Full HD (1080p). Moreover, a 54.34% average cumulative decrease in storage space is observed.
ZIRCON: Zero-watermarking-based approach for data integrity and secure provenance in IoT networks
Authors: Omair Faraj, David Megías, Joaquin Garcia-Alfaro
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2305.00266
Pdf link: https://arxiv.org/pdf/2305.00266
Abstract The Internet of Things (IoT) is integrating the Internet and smart devices in almost every domain such as home automation, e-healthcare systems, vehicular networks, industrial control and military applications. In these sectors, sensory data, which is collected from multiple sources and managed through intermediate processing by multiple nodes, is used for decision-making processes. Ensuring data integrity and keeping track of data provenance is a core requirement in such a highly dynamic context, since data provenance is an important tool for the assurance of data trustworthiness. Dealing with such requirements is challenging due to the limited computational and energy resources in IoT networks. This requires addressing several challenges such as processing overhead, secure provenance, bandwidth consumption and storage efficiency. In this paper, we propose ZIRCON, a novel zero-watermarking approach to establish end-to-end data trustworthiness in an IoT network. In ZIRCON, provenance information is stored in a tamper-proof centralized network database through watermarks, generated at source node before transmission. We provide an extensive security analysis showing the resilience of our scheme against passive and active attacks. We also compare our scheme with existing works based on performance metrics such as computational time, energy utilization and cost analysis. The results show that ZIRCON is robust against several attacks, lightweight, storage efficient, and better in energy utilization and bandwidth consumption, compared to prior art.
Path Planning for Multiple Tethered Robots Using Topological Braids
Authors: Muqing Cao, Kun Cao, Shenghai Yuan, Kangcheng Liu, Yan Loi Wong, Lihua Xie
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2305.00271
Pdf link: https://arxiv.org/pdf/2305.00271
Abstract Path planning for multiple tethered robots is a challenging problem due to the complex interactions among the cables and the possibility of severe entanglements. Previous works on this problem either consider idealistic cable models or provide no guarantee for entanglement-free paths. In this work, we present a new approach to address this problem using the theory of braids. By establishing a topological equivalence between the physical cables and the space-time trajectories of the robots, and identifying particular braid patterns that emerge from the entangled trajectories, we obtain the key finding that all complex entanglements stem from a finite number of interaction patterns between 2 or 3 robots. Hence, non-entanglement can be guaranteed by avoiding these interaction patterns in the trajectories of the robots. Based on this finding, we present a graph search algorithm using the permutation grid to efficiently search for a feasible topology of paths and reject braid patterns that result in an entanglement. We demonstrate that the proposed algorithm can achieve 100% goal-reaching capability without entanglement for up to 10 drones with a slack cable model in a high-fidelity simulation platform. The practicality of the proposed approach is verified using three small tethered UAVs in indoor flight experiments.
A spectral method for a Fokker-Planck equation in neuroscience with applications in neural networks with learning rules
Authors: Pei Zhang, Yanli Wang, Zhennan Zhou
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2305.00275
Pdf link: https://arxiv.org/pdf/2305.00275
Abstract In this work, we consider the Fokker-Planck equation of the Nonlinear Noisy Leaky Integrate-and-Fire (NNLIF) model for neuron networks. Due to the firing events of neurons at the microscopic level, this Fokker-Planck equation contains dynamic boundary conditions involving specific internal points. To efficiently solve this problem and explore the properties of the unknown, we construct a flexible numerical scheme for the Fokker-Planck equation in the framework of spectral methods that can accurately handle the dynamic boundary condition. This numerical scheme is stable with suitable choices of test function spaces, and asymptotic preserving, and it is easily extendable to variant models with multiple time scales. We also present extensive numerical examples to verify the scheme properties, including order of convergence and time efficiency, and explore unique properties of the model, including blow-up phenomena for the NNLIF model and learning and discriminative properties for the NNLIF model with learning rules.
NSLF-OL: Online Learning of Neural Surface Light Fields alongside Real-time Incremental 3D Reconstruction
Authors: Yijun Yuan, Andreas Nuchter
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2305.00282
Pdf link: https://arxiv.org/pdf/2305.00282
Abstract Immersive novel view generation is an important technology in the field of graphics and has recently also received attention for operator-based human-robot interaction. However, the involved training is time-consuming, and thus the current test scope is majorly on object capturing. This limits the usage of related models in the robotics community for 3D reconstruction since robots (1) usually only capture a very small range of view directions to surfaces that cause arbitrary predictions on unseen, novel direction, (2) requires real-time algorithms, and (3) work with growing scenes, e.g., in robotic exploration. The paper proposes a novel Neural Surface Light Fields model that copes with the small range of view directions while producing a good result in unseen directions. Exploiting recent encoding techniques, the training of our model is highly efficient. In addition, we design Multiple Asynchronous Neural Agents (MANA), a universal framework to learn each small region in parallel for large-scale growing scenes. Our model learns online the Neural Surface Light Fields (NSLF) aside from real-time 3D reconstruction with a sequential data stream as the shared input. In addition to online training, our model also provides real-time rendering after completing the data stream for visualization. We implement experiments using well-known RGBD indoor datasets, showing the high flexibility to embed our model into real-time 3D reconstruction and demonstrating high-fidelity view synthesis for these scenes. The code is available on github.
Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning
Authors: Mingyang Wang, Zhenshan Bing, Xiangtong Yao, Shuai Wang, Hang Su, Chenguang Yang, Kai Huang, Alois Knoll
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2305.00286
Pdf link: https://arxiv.org/pdf/2305.00286
Abstract Meta-reinforcement learning enables artificial agents to learn from related training tasks and adapt to new tasks efficiently with minimal interaction data. However, most existing research is still limited to narrow task distributions that are parametric and stationary, and does not consider out-of-distribution tasks during the evaluation, thus, restricting its application. In this paper, we propose MoSS, a context-based Meta-reinforcement learning algorithm based on Self-Supervised task representation learning to address this challenge. We extend meta-RL to broad non-parametric task distributions which have never been explored before, and also achieve state-of-the-art results in non-stationary and out-of-distribution tasks. Specifically, MoSS consists of a task inference module and a policy module. We utilize the Gaussian mixture model for task representation to imitate the parametric and non-parametric task variations. Additionally, our online adaptation strategy enables the agent to react at the first sight of a task change, thus being applicable in non-stationary tasks. MoSS also exhibits strong generalization robustness in out-of-distributions tasks which benefits from the reliable and robust task representation. The policy is built on top of an off-policy RL algorithm and the entire network is trained completely off-policy to ensure high sample efficiency. On MuJoCo and Meta-World benchmarks, MoSS outperforms prior works in terms of asymptotic performance, sample efficiency (3-50x faster), adaptation efficiency, and generalization robustness on broad and diverse task distributions.
An Efficient Plane Extraction Approach for Bundle Adjustment on LiDAR Point clouds
Authors: Zheng Liu, Fu Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2305.00287
Pdf link: https://arxiv.org/pdf/2305.00287
Abstract Bundle adjustment (BA) on LiDAR point clouds has been extensively investigated in recent years due to its ability to optimize multiple poses together, resulting in high accuracy and global consistency for point cloud. However, the accuracy and speed of LiDAR bundle adjustment depend on the quality of plane extraction, which provides point association for LiDAR BA. In this study, we propose a novel and efficient voxel-based approach for plane extraction that is specially designed to provide point association for LiDAR bundle adjustment. To begin, we partition the space into multiple voxels of a fixed size and then split these root voxels based on whether the points are on the same plane, using an octree structure. We also design a novel plane determination method based on principle component analysis (PCA), which segments the points into four even quarters and compare their minimum eigenvalues with that of the initial point cloud. Finally, we adopt a plane merging method to prevent too many small planes from being in a single voxel, which can increase the optimization time required for BA. Our experimental results on HILTI demonstrate that our approach achieves the best precision and least time cost compared to other plane extraction methods.
Patent Mining by Extracting Functional Analysis Information Modelled As Graph Structure: A Patent Knowledge-base Collaborative Building Approach
Authors: Manal E. Helal
Subjects: Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2305.00309
Pdf link: https://arxiv.org/pdf/2305.00309
Abstract Patents provide a rich source of information about design innovations. Patent mining techniques employ various technologies, such as text mining, machine learning, natural language processing, and ontology-building techniques. An automated graph data modelling method is proposed for extracting functional representations for building a semantic database of patents of mechanical designs. The method has several benefits: The schema-free characteristic of the proposed graph modelling enables the ontology it is based on to evolve and generalise to upper ontologies across technology domains and to specify lower ontologies to more specific domains. Graph modelling benefits from enhanced performance of deep queries across many levels of relationships and interactions and provides efficient storage. Graph modelling also enables visualisation libraries to use the graph data structure immediately, avoiding the need for graph extraction programs from relational databases. Patent/Design comparisons are computed by search queries using counting of overlaps of different levels and weights. This work has produced the PatMine SolidWorks Add-in \c{opyright}, which compares annotated CAD designs with patents and highlights overlapping design concepts. The patent annotation extracts its functional analysis, representing its structure as geometric feature interactions. Additional features such as full-text search and semantic search of the PatMine patents database are available, and graph analytic methods and machine learning algorithms are enabled and can be implemented as plug-ins in future work. Keywords: Patent Mining; Semantic Analysis; Functional Analysis Diagrams; Graph Data Modelling; Visualisation; Similarity Scoring; Big Data Analytics; Machine Learning; Artificial Intelligence; Natural Language Processing
Optimizing Privacy, Utility and Efficiency in Constrained Multi-Objective Federated Learning
Authors: Yan Kang, Hanlin Gu, Xingxing Tang, Yuanqin He, Yuzhu Zhang, Jinnan He, Yuxing Han, Lixin Fan, Qiang Yang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.00312
Pdf link: https://arxiv.org/pdf/2305.00312
Abstract Conventionally, federated learning aims to optimize a single objective, typically the utility. However, for a federated learning system to be trustworthy, it needs to simultaneously satisfy multiple/many objectives, such as maximizing model performance, minimizing privacy leakage and training cost, and being robust to malicious attacks. Multi-Objective Optimization (MOO) aiming to optimize multiple conflicting objectives at the same time is quite suitable for solving the optimization problem of Trustworthy Federated Learning (TFL). In this paper, we unify MOO and TFL by formulating the problem of constrained multi-objective federated learning (CMOFL). Under this formulation, existing MOO algorithms can be adapted to TFL straightforwardly. Different from existing CMOFL works focusing on utility, efficiency, fairness, and robustness, we consider optimizing privacy leakage along with utility loss and training cost, the three primary objectives of a TFL system. We develop two improved CMOFL algorithms based on NSGA-II and PSL, respectively, for effectively and efficiently finding Pareto optimal solutions, and we provide theoretical analysis on their convergence. We design specific measurements of privacy leakage, utility loss, and training cost for three privacy protection mechanisms: Randomization, BatchCrypt (An efficient version of homomorphic encryption), and Sparsification. Empirical experiments conducted under each of the three protection mechanisms demonstrate the effectiveness of our proposed algorithms.
Fusion for Visual-Infrared Person ReID in Real-World Surveillance Using Corrupted Multimodal Data
Authors: Arthur Josi, Mahdi Alehdaghi, Rafael M. O. Cruz, Eric Granger
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.00320
Pdf link: https://arxiv.org/pdf/2305.00320
Abstract Visible-infrared person re-identification (V-I ReID) seeks to match images of individuals captured over a distributed network of RGB and IR cameras. The task is challenging due to the significant differences between V and I modalities, especially under real-world conditions, where images are corrupted by, e.g, blur, noise, and weather. Indeed, state-of-art V-I ReID models cannot leverage corrupted modality information to sustain a high level of accuracy. In this paper, we propose an efficient model for multimodal V-I ReID -- named Multimodal Middle Stream Fusion (MMSF) -- that preserves modality-specific knowledge for improved robustness to corrupted multimodal images. In addition, three state-of-art attention-based multimodal fusion models are adapted to address corrupted multimodal data in V-I ReID, allowing to dynamically balance each modality importance. Recently, evaluation protocols have been proposed to assess the robustness of ReID models under challenging real-world scenarios. However, these protocols are limited to unimodal V settings. For realistic evaluation of multimodal (and cross-modal) V-I person ReID models, we propose new challenging corrupted datasets for scenarios where V and I cameras are co-located (CL) and not co-located (NCL). Finally, the benefits of our Masking and Local Multimodal Data Augmentation (ML-MDA) strategy are explored to improve the robustness of ReID models to multimodal corruption. Our experiments on clean and corrupted versions of the SYSU-MM01, RegDB, and ThermalWORLD datasets indicate the multimodal V-I ReID models that are more likely to perform well in real-world operational conditions. In particular, our ML-MDA is an important strategy for a V-I person ReID system to sustain high accuracy and robustness when processing corrupted multimodal images. Also, our multimodal ReID model MMSF outperforms every method under CL and NCL camera scenarios.
Leveraging Data Mining Algorithms to Recommend Source Code Changes
Authors: AmirHossein Naghshzan, Saeed Khalilazar, Pierre Poilane, Olga Baysal, Latifa Guerrouj, Foutse Khomh
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.00323
Pdf link: https://arxiv.org/pdf/2305.00323
Abstract Context: Recent research has used data mining to develop techniques that can guide developers through source code changes. To the best of our knowledge, very few studies have investigated data mining techniques and--or compared their results with other algorithms or a baseline. Objectives: This paper proposes an automatic method for recommending source code changes using four data mining algorithms. We not only use these algorithms to recommend source code changes, but we also conduct an empirical evaluation. Methods: Our investigation includes seven open-source projects from which we extracted source change history at the file level. We used four widely data mining algorithms \ie{} Apriori, FP-Growth, Eclat, and Relim to compare the algorithms in terms of performance (Precision, Recall and F-measure) and execution time. Results: Our findings provide empirical evidence that while some Frequent Pattern Mining algorithms, such as Apriori may outperform other algorithms in some cases, the results are not consistent throughout all the software projects, which is more likely due to the nature and characteristics of the studied projects, in particular their change history. Conclusion: Apriori seems appropriate for large-scale projects, whereas Eclat appears to be suitable for small-scale projects. Moreover, FP-Growth seems an efficient approach in terms of execution time.
MinMaxLTTB: Leveraging MinMax-Preselection to Scale LTTB
Authors: Jeroen Van Der Donckt, Jonas Van Der Donckt, Michael Rademaker, Sofie Van Hoecke
Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2305.00332
Pdf link: https://arxiv.org/pdf/2305.00332
Abstract Visualization plays an important role in analyzing and exploring time series data. To facilitate efficient visualization of large datasets, downsampling has emerged as a well-established approach. This work concentrates on LTTB (Largest-Triangle-Three-Buckets), a widely adopted downsampling algorithm for time series data point selection. Specifically, we propose MinMaxLTTB, a two-step algorithm that marks a significant enhancement in the scalability of LTTB. MinMaxLTTB entails the following two steps: (i) the MinMax algorithm preselects a certain ratio of minimum and maximum data points, followed by (ii) applying the LTTB algorithm on only these preselected data points, effectively reducing LTTB's time complexity. The low computational cost of the MinMax algorithm, along with its parallelization capabilities, facilitates efficient preselection of data points. Additionally, the competitive performance of MinMax in terms of visual representativeness also makes it an effective reduction method. Experiments show that MinMaxLTTB outperforms LTTB by more than an order of magnitude in terms of computation time. Furthermore, preselecting a small multiple of the desired output size already provides similar visual representativeness compared to LTTB. In summary, MinMaxLTTB leverages the computational efficiency of MinMax to scale LTTB, without compromising on LTTB's favored visualization properties. The accompanying code and experiments of this paper can be found at https://github.com/predict-idlab/MinMaxLTTB.
MH-DETR: Video Moment and Highlight Detection with Cross-modal Transformer
Authors: Yifang Xu, Yunzhuo Sun, Yang Li, Yilei Shi, Xiaoxiang Zhu, Sidan Du
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2305.00355
Pdf link: https://arxiv.org/pdf/2305.00355
Abstract With the increasing demand for video understanding, video moment and highlight detection (MHD) has emerged as a critical research topic. MHD aims to localize all moments and predict clip-wise saliency scores simultaneously. Despite progress made by existing DETR-based methods, we observe that these methods coarsely fuse features from different modalities, which weakens the temporal intra-modal context and results in insufficient cross-modal interaction. To address this issue, we propose MH-DETR (Moment and Highlight Detection Transformer) tailored for MHD. Specifically, we introduce a simple yet efficient pooling operator within the uni-modal encoder to capture global intra-modal context. Moreover, to obtain temporally aligned cross-modal features, we design a plug-and-play cross-modal interaction module between the encoder and decoder, seamlessly integrating visual and textual features. Comprehensive experiments on QVHighlights, Charades-STA, Activity-Net, and TVSum datasets show that MH-DETR outperforms existing state-of-the-art methods, demonstrating its effectiveness and superiority. Our code is available at https://github.com/YoucanBaby/MH-DETR.
Electricity Price Prediction for Energy Storage System Arbitrage: A Decision-focused Approach
Authors: Linwei Sang, Yinliang Xu, Huan Long, Qinran Hu, Hongbin Sun
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2305.00362
Pdf link: https://arxiv.org/pdf/2305.00362
Abstract Electricity price prediction plays a vital role in energy storage system (ESS) management. Current prediction models focus on reducing prediction errors but overlook their impact on downstream decision-making. So this paper proposes a decision-focused electricity price prediction approach for ESS arbitrage to bridge the gap from the downstream optimization model to the prediction model. The decision-focused approach aims at utilizing the downstream arbitrage model for training prediction models. It measures the difference between actual decisions under the predicted price and oracle decisions under the true price, i.e., decision error, by regret, transforms it into the tractable surrogate regret, and then derives the gradients to predicted price for training prediction models. Based on the prediction and decision errors, this paper proposes the hybrid loss and corresponding stochastic gradient descent learning method to learn prediction models for prediction and decision accuracy. The case study verifies that the proposed approach can efficiently bring more economic benefits and reduce decision errors by flattening the time distribution of prediction errors, compared to prediction models for only minimizing prediction errors.
Edge Learning for Large-Scale Internet of Things With Task-Oriented Efficient Communication
Authors: Haihui Xie, Minghua Xia, Peiran Wu, Shuai Wang, H. Vincent Poor
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2305.00383
Pdf link: https://arxiv.org/pdf/2305.00383
Abstract In the Internet of Things (IoT) networks, edge learning for data-driven tasks provides intelligent applications and services. As the network size becomes large, different users may generate distinct datasets. Thus, to suit multiple edge learning tasks for large-scale IoT networks, this paper performs efficient communication under the task-oriented principle by using the collaborative design of wireless resource allocation and edge learning error prediction. In particular, we start with multi-user scheduling to alleviate co-channel interference in dense networks. Then, we perform optimal power allocation in parallel for different learning tasks. Thanks to the high parallelization of the designed algorithm, extensive experimental results corroborate that the multi-user scheduling and task-oriented power allocation improve the performance of distinct edge learning tasks efficiently compared with the state-of-the-art benchmark algorithms.
Alternately denoising and reconstructing unoriented point sets
Authors: Dong Xiao, Zuoqiang Shi, Bin Wang
Subjects: Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2305.00391
Pdf link: https://arxiv.org/pdf/2305.00391
Abstract We propose a new strategy to bridge point cloud denoising and surface reconstruction by alternately updating the denoised point clouds and the reconstructed surfaces. In Poisson surface reconstruction, the implicit function is generated by a set of smooth basis functions centered at the octnodes. When the octree depth is properly selected, the reconstructed surface is a good smooth approximation of the noisy point set. Our method projects the noisy points onto the surface and alternately reconstructs and projects the point set. We use the iterative Poisson surface reconstruction (iPSR) to support unoriented surface reconstruction. Our method iteratively performs iPSR and acts as an outer loop of iPSR. Considering that the octree depth significantly affects the reconstruction results, we propose an adaptive depth selection strategy to ensure an appropriate depth choice. To manage the oversmoothing phenomenon near the sharp features, we propose a $\lambda$-projection method, which means to project the noisy points onto the surface with an individual control coefficient $\lambda_{i}$ for each point. The coefficients are determined through a Voronoi-based feature detection method. Experimental results show that our method achieves high performance in point cloud denoising and unoriented surface reconstruction within different noise scales, and exhibits well-rounded performance in various types of inputs.
Transformer-based Sequence Labeling for Audio Classification based on MFCCs
Authors: C. S. Sonali, Chinmayi B S, Ahana Balasubramanian
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2305.00417
Pdf link: https://arxiv.org/pdf/2305.00417
Abstract Audio classification is vital in areas such as speech and music recognition. Feature extraction from the audio signal, such as Mel-Spectrograms and MFCCs, is a critical step in audio classification. These features are transformed into spectrograms for classification. Researchers have explored various techniques, including traditional machine and deep learning methods to classify spectrograms, but these can be computationally expensive. To simplify this process, a more straightforward approach inspired by sequence classification in NLP can be used. This paper proposes a Transformer-encoder-based model for audio classification using MFCCs. The model was benchmarked against the ESC-50, Speech Commands v0.02 and UrbanSound8k datasets and has shown strong performance, with the highest accuracy of 95.2% obtained upon training the model on the UrbanSound8k dataset. The model consisted of a mere 127,544 total parameters, making it light-weight yet highly efficient at the audio classification task.
Ortho-Radial Drawing in Near-Linear Time
Authors: Yi-Jun Chang
Subjects: Computational Geometry (cs.CG); Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2305.00425
Pdf link: https://arxiv.org/pdf/2305.00425
Abstract An orthogonal drawing is an embedding of a plane graph into a grid. In a seminal work of Tamassia (SIAM Journal on Computing 1987), a simple combinatorial characterization of angle assignments that can be realized as bend-free orthogonal drawings was established, thereby allowing an orthogonal drawing to be described combinatorially by listing the angles of all corners. The characterization reduces the need to consider certain geometric aspects, such as edge lengths and vertex coordinates, and simplifies the task of graph drawing algorithm design. Barth, Niedermann, Rutter, and Wolf (SoCG 2017) established an analogous combinatorial characterization for ortho-radial drawings, which are a generalization of orthogonal drawings to cylindrical grids. The proof of the characterization is existential and does not result in an efficient algorithm. Niedermann, Rutter, and Wolf (SoCG 2019) later addressed this issue by developing quadratic-time algorithms for both testing the realizability of a given angle assignment as an ortho-radial drawing without bends and constructing such a drawing. In this paper, we further improve the time complexity of these tasks to near-linear time. We establish a new characterization for ortho-radial drawings based on the concept of a good sequence. Using the new characterization, we design a simple greedy algorithm for constructing ortho-radial drawings.
STAR-RIS-Aided Mobile Edge Computing: Computation Rate Maximization with Binary Amplitude Coefficients
Authors: Zhenrong Liu, Zongze Li, Miaowen Wen, Yi Gong, Yik-Chung Wu
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2305.00428
Pdf link: https://arxiv.org/pdf/2305.00428
Abstract In this paper, simultaneously transmitting and reflecting (STAR) reconfigurable intelligent surface (RIS) is investigated in the multi-user mobile edge computing (MEC) system to improve the computation rate. Compared with traditional RIS-aided MEC, STAR-RIS extends the service coverage from half-space to full-space and provides new flexibility for improving the computation rate for end users. However, the STAR-RIS-aided MEC system design is a challenging problem due to the non-smooth and non-convex binary amplitude coefficients with coupled phase shifters. To fill this gap, this paper formulates a computation rate maximization problem via the joint design of the STAR-RIS phase shifts, reflection and transmission amplitude coefficients, the receive beamforming vectors, and energy partition strategies for local computing and offloading. To tackle the discontinuity caused by binary variables, we propose an efficient smoothing-based method to decrease convergence error, in contrast to the conventional penalty-based method, which brings many undesired stationary points and local optima. Furthermore, a fast iterative algorithm is proposed to obtain a stationary point for the joint optimization problem, with each subproblem solved by a low-complexity algorithm, making the proposed design scalable to a massive number of users and STAR-RIS elements. Simulation results validate the strength of the proposed smoothing-based method and show that the proposed fast iterative algorithm achieves a higher computation rate than the conventional method while saving the computation time by at least an order of magnitude. Moreover, the resultant STAR-RIS-aided MEC system significantly improves the computation rate compared to other baseline schemes with conventional reflect-only/transmit-only RIS.
TALLRec: An Effective and Efficient Tuning Framework to Align Large Language Model with Recommendation
Authors: Keqin Bao, Jizhi Zhang, Yang Zhang, Wenjie Wang, Fuli Feng, Xiangnan He
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2305.00447
Pdf link: https://arxiv.org/pdf/2305.00447
Abstract Large Language Models (LLMs) have demonstrated remarkable performance across diverse domains, thereby prompting researchers to explore their potential for use in recommendation systems. Initial attempts have leveraged the exceptional capabilities of LLMs, such as rich knowledge and strong generalization through In-context Learning, which involves phrasing the recommendation task as prompts. Nevertheless, the performance of LLMs in recommendation tasks remains suboptimal due to a substantial disparity between the training tasks for LLMs and recommendation tasks, as well as inadequate recommendation data during pre-training. To bridge the gap, we consider building a Large Recommendation Language Model by tunning LLMs with recommendation data. To this end, we propose an efficient and effective Tuning framework for Aligning LLMs with Recommendation, namely TALLRec. We have demonstrated that the proposed TALLRec framework can significantly enhance the recommendation capabilities of LLMs in the movie and book domains, even with a limited dataset of fewer than 100 samples. Additionally, the proposed framework is highly efficient and can be executed on a single RTX 3090 with LLaMA-7B. Furthermore, the fine-tuned LLM exhibits robust cross-domain generalization. Our code and data are available at https://github.com/SAI990323/TALLRec.
Hypergraphs with Edge-Dependent Vertex Weights: Spectral Clustering based on the 1-Laplacian
Authors: Yu Zhu, Boning Li, Santiago Segarra
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2305.00462
Pdf link: https://arxiv.org/pdf/2305.00462
Abstract We propose a flexible framework for defining the 1-Laplacian of a hypergraph that incorporates edge-dependent vertex weights. These weights are able to reflect varying importance of vertices within a hyperedge, thus conferring the hypergraph model higher expressivity than homogeneous hypergraphs. We then utilize the eigenvector associated with the second smallest eigenvalue of the hypergraph 1-Laplacian to cluster the vertices. From a theoretical standpoint based on an adequately defined normalized Cheeger cut, this procedure is expected to achieve higher clustering accuracy than that based on the traditional Laplacian. Indeed, we confirm that this is the case using real-world datasets to demonstrate the effectiveness of the proposed spectral clustering approach. Moreover, we show that for a special case within our framework, the corresponding hypergraph 1-Laplacian is equivalent to the 1-Laplacian of a related graph, whose eigenvectors can be computed more efficiently, facilitating the adoption on larger datasets.
Unified high-order multi-scale method for mechanical behavior simulation and strength prediction of composite plate and shell structures
Authors: Ge Bu-Feng, Gao Ming-Yuan, Dong Hao
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2305.00464
Pdf link: https://arxiv.org/pdf/2305.00464
Abstract The complicated mesoscopic configurations of composite plate and shell structures requires a huge amount of computational overhead for directly simulating their mechanical problems. In this paper, a unified high-order multi-scale method, which can effectively simulate the mechanical behavior and predict yield strength of composite plates and shells, is developed. Firstly, through the multiscale asymptotic analysis of multi-scale elastic equations in the orthogonal curvilinear coordinate system, a high-order multi-scale model is established, which can uniformly and effectively analyze the mechanical behavior of composite plate and shell structures. Moreover, the error estimation of the high-order multi-scale solutions is derived. Then, combining with the material strength theory, a high-order multi-scale model for the strength prediction of composite plate and shell structures is established. Next, based on the established high-order multi-scale model, a multi-scale algorithm is developed which can not only efficiently and accurately simulate the mechanical behaviors of composite plate and shell structures, but also predict their yield strength. Finally, the effectiveness of the established high-order multi-scale method is verified by extensive numerical experiments. The numerical experimental results indicate that the high-order multi-scale method can more accurately capture the meso-scale oscillatory behaviors of composite plate and shell structures. The unified high-order multi-scale method established in this paper is not only suitable for the prediction of mechanical properties of composite plate and shell structures, but also can be further extended to the prediction of multi-field coupling properties of composite plate and shell structures.
Efficient and accurate nonlinear model reduction via first-order empirical interpolation
Authors: Ngoc Cuong Nguyen, Jaime Peraire
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
Arxiv link: https://arxiv.org/abs/2305.00466
Pdf link: https://arxiv.org/pdf/2305.00466
Abstract We present a model reduction approach that extends the original empirical interpolation method to enable accurate and efficient reduced basis approximation of parametrized nonlinear partial differential equations (PDEs). In the presence of nonlinearity, the Galerkin reduced basis approximation remains computationally expensive due to the high complexity of evaluating the nonlinear terms, which depends on the dimension of the truth approximation. The empirical interpolation method (EIM) was proposed as a nonlinear model reduction technique to render the complexity of evaluating the nonlinear terms independent of the dimension of the truth approximation. The main idea is to replace any nonlinear term with a reduced basis expansion expressed as a linear combination of pre-computed basis functions and parameter-dependent coefficients. The coefficients are determined efficiently by an inexpensive and stable interpolation procedure. In order to improve the approximation accuracy, we propose a first-order empirical interpolation method (FOEIM) that employs both the nonlinear function and its partial derivatives at selected parameter points to construct the reduced basis expansion of the nonlinear term. Our approach is applied to nonlinear elliptic PDEs and compared to the Galerkin reduced basis approximation and the EIM. Numerical results are presented to demonstrate the performance of the three reduced basis approaches.
Posterior Sampling for Deep Reinforcement Learning
Authors: Remo Sasso, Michelangelo Conserva, Paulo Rauber
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2305.00477
Pdf link: https://arxiv.org/pdf/2305.00477
Abstract Despite remarkable successes, deep reinforcement learning algorithms remain sample inefficient: they require an enormous amount of trial and error to find good policies. Model-based algorithms promise sample efficiency by building an environment model that can be used for planning. Posterior Sampling for Reinforcement Learning is such a model-based algorithm that has attracted significant interest due to its performance in the tabular setting. This paper introduces Posterior Sampling for Deep Reinforcement Learning (PSDRL), the first truly scalable approximation of Posterior Sampling for Reinforcement Learning that retains its model-based essence. PSDRL combines efficient uncertainty quantification over latent state space models with a specially tailored continual planning algorithm based on value-function approximation. Extensive experiments on the Atari benchmark show that PSDRL significantly outperforms previous state-of-the-art attempts at scaling up posterior sampling while being competitive with a state-of-the-art (model-based) reinforcement learning method, both in sample efficiency and computational efficiency.
Learned Focused Plenoptic Image Compression with Microimage Preprocessing and Global Attention
Authors: Kedeng Tong, Xin Jin, Yuqing Yang, Chen Wang, Jinshi Kang, Fan Jiang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2305.00489
Pdf link: https://arxiv.org/pdf/2305.00489
Abstract Focused plenoptic cameras can record spatial and angular information of the light field (LF) simultaneously with higher spatial resolution relative to traditional plenoptic cameras, which facilitate various applications in computer vision. However, the existing plenoptic image compression methods present ineffectiveness to the captured images due to the complex micro-textures generated by the microlens relay imaging and long-distance correlations among the microimages. In this paper, a lossy end-to-end learning architecture is proposed to compress the focused plenoptic images efficiently. First, a data preprocessing scheme is designed according to the imaging principle to remove the sub-aperture image ineffective pixels in the recorded light field and align the microimages to the rectangular grid. Then, the global attention module with large receptive field is proposed to capture the global correlation among the feature maps using pixel-wise vector attention computed in the resampling process. Also, a new image dataset consisting of 1910 focused plenoptic images with content and depth diversity is built to benefit training and testing. Extensive experimental evaluations demonstrate the effectiveness of the proposed approach. It outperforms intra coding of HEVC and VVC by an average of 62.57% and 51.67% bitrate reduction on the 20 preprocessed focused plenoptic images, respectively. Also, it achieves 18.73% bitrate saving and generates perceptually pleasant reconstructions compared to the state-of-the-art end-to-end image compression methods, which benefits the applications of focused plenoptic cameras greatly. The dataset and code are publicly available at https://github.com/VincentChandelier/GACN.
Deep Learning-based Spatio Temporal Facial Feature Visual Speech Recognition
Authors: Pangoth Santhosh Kumar, Garika Akshay
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2305.00552
Pdf link: https://arxiv.org/pdf/2305.00552
Abstract In low-resource computing contexts, such as smartphones and other tiny devices, Both deep learning and machine learning are being used in a lot of identification systems. as authentication techniques. The transparent, contactless, and non-invasive nature of these face recognition technologies driven by AI has led to their meteoric rise in popularity in recent years. While they are mostly successful, there are still methods to get inside without permission by utilising things like pictures, masks, glasses, etc. In this research, we present an alternate authentication process that makes use of both facial recognition and the individual's distinctive temporal facial feature motions while they speak a password. Because the suggested methodology allows for a password to be specified in any language, it is not limited by language. The suggested model attained an accuracy of 96.1% when tested on the industry-standard MIRACL-VC1 dataset, demonstrating its efficacy as a reliable and powerful solution. In addition to being data-efficient, the suggested technique shows promising outcomes with as little as 10 positive video examples for training the model. The effectiveness of the network's training is further proved via comparisons with other combined facial recognition and lip reading models.
Collective Relational Inference for learning physics-consistent heterogeneous particle interactions
Authors: Zhichao Han, Olga Fink, David S. Kammer
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.00557
Pdf link: https://arxiv.org/pdf/2305.00557
Abstract Interacting particle systems are ubiquitous in nature and engineering. Revealing particle interaction laws is of fundamental importance but also particularly challenging due to underlying configurational complexities. Recently developed machine learning methods show great potential in discovering pairwise interactions from particle trajectories in homogeneous systems. However, they fail to reveal interactions in heterogeneous systems that are prevalent in reality, where multiple interaction types coexist simultaneously and relational inference is required. Here, we propose a novel probabilistic method for relational inference, which possesses two distinctive characteristics compared to existing methods. First, it infers the interaction types of different edges collectively, and second, it uses a physics-induced graph neural network to learn physics-consistent pairwise interactions. We evaluate the proposed methodology across several benchmark datasets and demonstrate that it is consistent with the underlying physics. Furthermore, we showcase its ability to outperform existing methods in accurately inferring interaction types. In addition, the proposed model is data-efficient and generalizable to large systems when trained on smaller ones, which contrasts with previously proposed solutions. The developed methodology constitutes a key element for the discovery of the fundamental laws that determine macroscopic mechanical properties of particle systems.
Scaling Pareto-Efficient Decision Making Via Offline Multi-Objective RL
Authors: Baiting Zhu, Meihua Dang, Aditya Grover
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2305.00567
Pdf link: https://arxiv.org/pdf/2305.00567
Abstract The goal of multi-objective reinforcement learning (MORL) is to learn policies that simultaneously optimize multiple competing objectives. In practice, an agent's preferences over the objectives may not be known apriori, and hence, we require policies that can generalize to arbitrary preferences at test time. In this work, we propose a new data-driven setup for offline MORL, where we wish to learn a preference-agnostic policy agent using only a finite dataset of offline demonstrations of other agents and their preferences. The key contributions of this work are two-fold. First, we introduce D4MORL, (D)atasets for MORL that are specifically designed for offline settings. It contains 1.8 million annotated demonstrations obtained by rolling out reference policies that optimize for randomly sampled preferences on 6 MuJoCo environments with 2-3 objectives each. Second, we propose Pareto-Efficient Decision Agents (PEDA), a family of offline MORL algorithms that builds and extends Decision Transformers via a novel preference-and-return-conditioned policy. Empirically, we show that PEDA closely approximates the behavioral policy on the D4MORL benchmark and provides an excellent approximation of the Pareto-front with appropriate conditioning, as measured by the hypervolume and sparsity metrics.
RAPID: Autonomous Multi-Agent Racing using Constrained Potential Dynamic Games
Authors: Yixuan Jia, Maulik Bhatt, Negar Mehr
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2305.00579
Pdf link: https://arxiv.org/pdf/2305.00579
Abstract In this work, we consider the problem of autonomous racing with multiple agents where agents must interact closely and influence each other to compete. We model interactions among agents through a game-theoretical framework and propose an efficient algorithm for tractably solving the resulting game in real time. More specifically, we capture interactions among multiple agents through a constrained dynamic game. We show that the resulting dynamic game is an instance of a simple-to-analyze class of games. Namely, we show that our racing game is an instance of a constrained dynamic potential game. An important and appealing property of dynamic potential games is that a generalized Nash equilibrium of the underlying game can be computed by solving a single constrained optimal control problem instead of multiple coupled constrained optimal control problems. Leveraging this property, we show that the problem of autonomous racing is greatly simplified and develop RAPID (autonomous multi-agent RAcing using constrained PotentIal Dynamic games), a racing algorithm that can be solved tractably in real-time. Through simulation studies, we demonstrate that our algorithm outperforms the state-of-the-art approach. We further show the real-time capabilities of our algorithm in hardware experiments.
The MCC approaches the geometric mean of precision and recall as true negatives approach infinity
Authors: Jon Crall
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.00594
Pdf link: https://arxiv.org/pdf/2305.00594
Abstract The performance of a binary classifier is described by a confusion matrix with four entries: the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The Matthew's Correlation Coefficient (MCC), F1, and Fowlkes--Mallows (FM) scores are scalars that summarize a confusion matrix. Both the F1 and FM scores are based on only three of the four entries in the confusion matrix (they ignore TN). In contrast, the MCC takes into account all four entries of the confusion matrix and thus can be seen as providing a more representative picture. However, in object detection problems, measuring the number of true negatives is so large it is often intractable. Thus we ask, what happens to the MCC as the number of true negatives approaches infinity? This paper provides insight into the relationship between the MCC and FM score by proving that the FM-measure is equal to the limit of the MCC as the number of true negatives approaches infinity.
Containerization of a polyglot microservice application using Docker and Kubernetes
Authors: Vamsi Krishna Yepuri, Venkata Kalyan Polamarasetty, Shivani Donthi, Ajay Kumar Reddy Gondi
Subjects: Software Engineering (cs.SE); Performance (cs.PF)
Arxiv link: https://arxiv.org/abs/2305.00600
Pdf link: https://arxiv.org/pdf/2305.00600
Abstract This project investigates the benefits of containerization technology in modern software development and deployment. The study emphasizes the advantages of using Kubernetes and Docker in the development process, including the easy packaging and deployment of microservices, efficient resource utilization, faster startup times, and greater scalability and flexibility. The project concludes by proposing a study that involves creating a polyglot microservice application using Java, Python, and JavaScript, containerizing it with Docker, and deploying it in Kubernetes. The study aims to evaluate service discovery and auto-scaling in distributed mode and compare the performance metrics with virtual machines and containers. The results of this study can inform software development teams about the benefits of containerization in modern software development and deployment.
Consolidator: Mergeable Adapter with Grouped Connections for Visual Adaptation
Authors: Tianxiang Hao, Hui Chen, Yuchen Guo, Guiguang Ding
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.00603
Pdf link: https://arxiv.org/pdf/2305.00603
Abstract Recently, transformers have shown strong ability as visual feature extractors, surpassing traditional convolution-based models in various scenarios. However, the success of vision transformers largely owes to their capacity to accommodate numerous parameters. As a result, new challenges for adapting large models to downstream tasks arise. On the one hand, classic fine-tuning tunes all parameters in a huge model for every task and thus easily falls into overfitting, leading to inferior performance. On the other hand, on resource-limited devices, fine-tuning stores a full copy of parameters and thus is usually impracticable for the shortage of storage space. However, few works have focused on how to efficiently and effectively transfer knowledge in a vision transformer. Existing methods did not dive into the properties of visual features, leading to inferior performance. Moreover, some of them bring heavy inference cost though benefiting storage. To tackle these problems, we propose consolidator to modify the pre-trained model with the addition of a small set of tunable parameters to temporarily store the task-specific knowledge while freezing the backbone model. Motivated by the success of group-wise convolution, we adopt grouped connections across the features extracted by fully connected layers to construct tunable parts in a consolidator. To further enhance the model's capacity to transfer knowledge under a constrained storage budget and keep inference efficient, we consolidate the parameters in two stages: 1. between adaptation and storage, and 2. between loading and inference. On a series of downstream visual tasks, our consolidator can reach up to 7.56 better accuracy than full fine-tuning with merely 0.35% parameters, and outperform state-of-the-art parameter-efficient tuning methods by a clear margin. Code is available at https://github.com/beyondhtx/Consolidator.
Decomposition Enhances Reasoning via Self-Evaluation Guided Decoding
Authors: Yuxi Xie, Kenji Kawaguchi, Yiran Zhao, Xu Zhao, Min-Yen Kan, Junxian He, Qizhe Xie
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.00633
Pdf link: https://arxiv.org/pdf/2305.00633
Abstract We propose an effective prompting approach that integrates self-evaluation guidance through stochastic beam search. Our approach explores the reasoning search space using a well-calibrated automatic criterion. This enables an efficient search to produce higher-quality final predictions. With the self-evaluation guided stochastic beam search, we also balance the quality--diversity trade-off in the generation of reasoning chains. This allows our approach to adapt well with majority voting and surpass the corresponding Codex-backboned baselines by $6.34\%$, $9.56\%$, and $5.46\%$ on the GSM8K, AQUA, and StrategyQA benchmarks, respectively, in few-shot accuracy. Analysis of our decompositional reasoning finds it pinpoints logic failures and leads to higher consistency and robustness.
GTree: GPU-Friendly Privacy-preserving Decision Tree Training and Inference
Authors: Qifan Wang, Shujie Cui, Lei Zhou, Ye Dong, Jianli Bai, Yun Sing Koh, Giovanni Russello
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2305.00645
Pdf link: https://arxiv.org/pdf/2305.00645
Abstract Decision tree (DT) is a widely used machine learning model due to its versatility, speed, and interpretability. However, for privacy-sensitive applications, outsourcing DT training and inference to cloud platforms raise concerns about data privacy. Researchers have developed privacy-preserving approaches for DT training and inference using cryptographic primitives, such as Secure Multi-Party Computation (MPC). While these approaches have shown progress, they still suffer from heavy computation and communication overheads. Few recent works employ Graphical Processing Units (GPU) to improve the performance of MPC-protected deep learning. This raises a natural question: \textit{can MPC-protected DT training and inference be accelerated by GPU?} We present GTree, the first scheme that uses GPU to accelerate MPC-protected secure DT training and inference. GTree is built across 3 parties who securely and jointly perform each step of DT training and inference with GPU. Each MPC protocol in GTree is designed in a GPU-friendly version. The performance evaluation shows that GTree achieves ${\thicksim}11{\times}$ and ${\thicksim}21{\times}$ improvements in training SPECT and Adult datasets, compared to the prior most efficient CPU-based work. For inference, GTree shows its superior efficiency when the DT has less than 10 levels, which is $126\times$ faster than the prior most efficient work when inferring $10^4$ instances with a tree of 7 levels. GTree also achieves a stronger security guarantee than prior solutions, which only leaks the tree depth and size of data samples while prior solutions also leak the tree structure. With \textit{oblivious array access}, the access pattern on GPU is also protected.
Dynamic Transfer Learning across Graphs
Authors: Haohui Wang, Yuzhen Mao, Jianhui Sun, Si Zhang, Dawei Zhou
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.00664
Pdf link: https://arxiv.org/pdf/2305.00664
Abstract Transferring knowledge across graphs plays a pivotal role in many high-stake domains, ranging from transportation networks to e-commerce networks, from neuroscience to finance. To date, the vast majority of existing works assume both source and target domains are sampled from a universal and stationary distribution. However, many real-world systems are intrinsically dynamic, where the underlying domains are evolving over time. To bridge the gap, we propose to shift the problem to the dynamic setting and ask: given the label-rich source graphs and the label-scarce target graphs observed in previous T timestamps, how can we effectively characterize the evolving domain discrepancy and optimize the generalization performance of the target domain at the incoming T+1 timestamp? To answer the question, for the first time, we propose a generalization bound under the setting of dynamic transfer learning across graphs, which implies the generalization performance is dominated by domain evolution and domain discrepancy between source and target domains. Inspired by the theoretical results, we propose a novel generic framework DyTrans to improve knowledge transferability across dynamic graphs. In particular, we start with a transformer-based temporal encoding module to model temporal information of the evolving domains; then, we further design a dynamic domain unification module to efficiently learn domain-invariant representations across the source and target domains. Finally, extensive experiments on various real-world datasets demonstrate the effectiveness of DyTrans in transferring knowledge from dynamic source domains to dynamic target domains.
On the Complexity of Multi-Agent Decision Making: From Learning in Games to Partial Monitoring
Authors: Dylan J. Foster, Dean P. Foster, Noah Golowich, Alexander Rakhlin
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2305.00684
Pdf link: https://arxiv.org/pdf/2305.00684
Abstract A central problem in the theory of multi-agent reinforcement learning (MARL) is to understand what structural conditions and algorithmic principles lead to sample-efficient learning guarantees, and how these considerations change as we move from few to many agents. We study this question in a general framework for interactive decision making with multiple agents, encompassing Markov games with function approximation and normal-form games with bandit feedback. We focus on equilibrium computation, in which a centralized learning algorithm aims to compute an equilibrium by controlling multiple agents that interact with an unknown environment. Our main contributions are: - We provide upper and lower bounds on the optimal sample complexity for multi-agent decision making based on a multi-agent generalization of the Decision-Estimation Coefficient, a complexity measure introduced by Foster et al. (2021) in the single-agent counterpart to our setting. Compared to the best results for the single-agent setting, our bounds have additional gaps. We show that no "reasonable" complexity measure can close these gaps, highlighting a striking separation between single and multiple agents. - We show that characterizing the statistical complexity for multi-agent decision making is equivalent to characterizing the statistical complexity of single-agent decision making, but with hidden (unobserved) rewards, a framework that subsumes variants of the partial monitoring problem. As a consequence, we characterize the statistical complexity for hidden-reward interactive decision making to the best extent possible. Building on this development, we provide several new structural results, including 1) conditions under which the statistical complexity of multi-agent decision making can be reduced to that of single-agent, and 2) conditions under which the so-called curse of multiple agents can be avoided.
Efficient dynamic model based testing using greedy test case selection
Authors: P.H.M. van Spaendonck
Subjects: Software Engineering (cs.SE); Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2305.00705
Pdf link: https://arxiv.org/pdf/2305.00705
Abstract Model-based testing (MBT) provides an automated approach for finding discrepancies between software models and their implementation. If we want to incorporate MBT into the fast and iterative software development process that is Continuous Integration Continuous Deployment, then MBT must be able to test the entire model in as little time as possible. However, current academic MBT tools either traverse models at random, which we show to be ineffective for this purpose, or use precalculated optimal paths which can not be efficiently calculated for large industrial models. We provide a new traversal strategy that provides an improvement in error-detection rate comparable to using recalculated paths. We show that the new strategy is able to be applied efficiently to large models. The benchmarks are performed on a mix of real-world and pseudo-randomly generated models. We observe no significant difference between these two types of models.
ZeroSearch: Local Image Search from Text with Zero Shot Learning
Authors: Jatin Nainani, Abhishek Mazumdar, Viraj Sheth
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.00715
Pdf link: https://arxiv.org/pdf/2305.00715
Abstract The problem of organizing and finding images in a user's directory has become increasingly challenging due to the rapid growth in the number of images captured on personal devices. This paper presents a solution that utilizes zero shot learning to create image queries with only user provided text descriptions. The paper's primary contribution is the development of an algorithm that utilizes pre-trained models to extract features from images. The algorithm uses OWL to check for the presence of bounding boxes and sorts images based on cosine similarity scores. The algorithm's output is a list of images sorted in descending order of similarity, helping users to locate specific images more efficiently. The paper's experiments were conducted using a custom dataset to simulate a user's image directory and evaluated the accuracy, inference time, and size of the models. The results showed that the VGG model achieved the highest accuracy, while the Resnet50 and InceptionV3 models had the lowest inference time and size. The papers proposed algorithm provides an effective and efficient solution for organizing and finding images in a users local directory. The algorithm's performance and flexibility make it suitable for various applications, including personal image organization and search engines. Code and dataset for zero-search are available at: https://github.com/NainaniJatinZ/zero-search
Adaptively Topological Tensor Network for Multi-view Subspace Clustering
Authors: Yipeng Liu, Yingcong Lu, Weiting Ou, Zhen Long, Ce Zhu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.00716
Pdf link: https://arxiv.org/pdf/2305.00716
Abstract Multi-view subspace clustering methods have employed learned self-representation tensors from different tensor decompositions to exploit low rank information. However, the data structures embedded with self-representation tensors may vary in different multi-view datasets. Therefore, a pre-defined tensor decomposition may not fully exploit low rank information for a certain dataset, resulting in sub-optimal multi-view clustering performance. To alleviate the aforementioned limitations, we propose the adaptively topological tensor network (ATTN) by determining the edge ranks from the structural information of the self-representation tensor, and it can give a better tensor representation with the data-driven strategy. Specifically, in multi-view tensor clustering, we analyze the higher-order correlations among different modes of a self-representation tensor, and prune the links of the weakly correlated ones from a fully connected tensor network. Therefore, the newly obtained tensor networks can efficiently explore the essential clustering information with self-representation with different tensor structures for various datasets. A greedy adaptive rank-increasing strategy is further applied to improve the capture capacity of low rank structure. We apply ATTN on multi-view subspace clustering and utilize the alternating direction method of multipliers to solve it. Experimental results show that multi-view subspace clustering based on ATTN outperforms the counterparts on six multi-view datasets.
Emotions Beyond Words: Non-Speech Audio Emotion Recognition With Edge Computing
Authors: Ibrahim Malik, Siddique Latif, Sanaullah Manzoor, Muhammad Usama, Junaid Qadir, Raja Jurdak
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2305.00725
Pdf link: https://arxiv.org/pdf/2305.00725
Abstract Non-speech emotion recognition has a wide range of applications including healthcare, crime control and rescue, and entertainment, to name a few. Providing these applications using edge computing has great potential, however, recent studies are focused on speech-emotion recognition using complex architectures. In this paper, a non-speech-based emotion recognition system is proposed, which can rely on edge computing to analyse emotions conveyed through non-speech expressions like screaming and crying. In particular, we explore knowledge distillation to design a computationally efficient system that can be deployed on edge devices with limited resources without degrading the performance significantly. We comprehensively evaluate our proposed framework using two publicly available datasets and highlight its effectiveness by comparing the results with the well-known MobileNet model. Our results demonstrate the feasibility and effectiveness of using edge computing for non-speech emotion detection, which can potentially improve applications that rely on emotion detection in communication networks. To the best of our knowledge, this is the first work on an edge-computing-based framework for detecting emotions in non-speech audio, offering promising directions for future research.
Breaks and Code Quality: Investigating the Impact of Forgetting on Software Development. A Registered Report
Authors: Dario Amoroso d'Aragona, Luca Pascarella, Andrea Janes, Valentina Lenarduzzi, Rafael Penaloza, Davide Taibi
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2305.00760
Pdf link: https://arxiv.org/pdf/2305.00760
Abstract Developers interrupting their participation in a project might slowly forget critical information about the code, such as its intended purpose, structure, the impact of external dependencies, and the approach used for implementation. Forgetting the implementation details can have detrimental effects on software maintenance, comprehension, knowledge sharing, and developer productivity, resulting in bugs, and other issues that can negatively influence the software development process. Therefore, it is crucial to ensure that developers have a clear understanding of the codebase and can work efficiently and effectively even after long interruptions. This registered report seeks to investigate the relationship between a developer's commits break and different code quality properties, so as to understand if the amount of activity in a project impact the code quality, and if developers with different activity profiles show different impacts on code quality. The results might be useful to understand if it is beneficial to promote the practice of developing multiple projects in parallel, or if it is more beneficial to reduce the number of projects each developer contributes.
SGX Switchless Calls Made Configless
Authors: Peterson Yuhala, Michael Paper, Timothée Zerbib, Pascal Felber, Valerio Schiavoni, Alain Tchana
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2305.00763
Pdf link: https://arxiv.org/pdf/2305.00763
Abstract Intel's software guard extensions (SGX) provide hardware enclaves to guarantee confidentiality and integrity for sensitive code and data. However, systems leveraging such security mechanisms must often pay high performance overheads. A major source of this overhead is SGX enclave transitions which induce expensive cross-enclave context switches. The Intel SGX SDK mitigates this with a switchless call mechanism for transitionless cross-enclave calls using worker threads. Intel's SGX switchless call implementation improves performance but provides limited flexibility: developers need to statically fix the system configuration at build time, which is error-prone and misconfigurations lead to performance degradations and waste of CPU resources. ZC-SWITCHLESS is a configless and efficient technique to drive the execution of SGX switchless calls. Its dynamic approach optimises the total switchless worker threads at runtime to minimise CPU waste. The experimental evaluation shows that ZC-SWITCHLESS obviates the performance penalty of misconfigured switchless systems while minimising CPU waste.
Montsalvat: Intel SGX Shielding for GraalVM Native Images
Authors: Peterson Yuhala, Jämes Ménétrey, Pascal Felber, Valerio Schiavoni, Alain Tchana, Gaël Thomas, Hugo Guiroux, Jean-Pierre Lozi
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2305.00766
Pdf link: https://arxiv.org/pdf/2305.00766
Abstract The popularity of the Java programming language has led to its wide adoption in cloud computing infrastructures. However, Java applications running in untrusted clouds are vulnerable to various forms of privileged attacks. The emergence of trusted execution environments (TEEs) such as Intel SGX mitigates this problem. TEEs protect code and data in secure enclaves inaccessible to untrusted software, including the kernel and hypervisors. To efficiently use TEEs, developers must manually partition their applications into trusted and untrusted parts, in order to reduce the size of the trusted computing base (TCB) and minimise the risks of security vulnerabilities. However, partitioning applications poses two important challenges: (i) ensuring efficient object communication between the partitioned components, and (ii) ensuring the consistency of garbage collection between the parts, especially with memory-managed languages such as Java. We present Montsalvat, a tool which provides a practical and intuitive annotation-based partitioning approach for Java applications destined for secure enclaves. Montsalvat provides an RMI-like mechanism to ensure inter-object communication, as well as consistent garbage collection across the partitioned components. We implement Montsalvat with GraalVM native-image, a tool for compiling Java applications ahead-of-time into standalone native executables that do not require a JVM at runtime. Our extensive evaluation with micro- and macro-benchmarks shows our partitioning approach to boost performance in real-world applications
RViDeformer: Efficient Raw Video Denoising Transformer with a Larger Benchmark Dataset
Authors: Huanjing Yue, Cong Cao, Lei Liao, Jingyu Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.00767
Pdf link: https://arxiv.org/pdf/2305.00767
Abstract In recent years, raw video denoising has garnered increased attention due to the consistency with the imaging process and well-studied noise modeling in the raw domain. However, two problems still hinder the denoising performance. Firstly, there is no large dataset with realistic motions for supervised raw video denoising, as capturing noisy and clean frames for real dynamic scenes is difficult. To address this, we propose recapturing existing high-resolution videos displayed on a 4K screen with high-low ISO settings to construct noisy-clean paired frames. In this way, we construct a video denoising dataset (named as ReCRVD) with 120 groups of noisy-clean videos, whose ISO values ranging from 1600 to 25600. Secondly, while non-local temporal-spatial attention is beneficial for denoising, it often leads to heavy computation costs. We propose an efficient raw video denoising transformer network (RViDeformer) that explores both short and long-distance correlations. Specifically, we propose multi-branch spatial and temporal attention modules, which explore the patch correlations from local window, local low-resolution window, global downsampled window, and neighbor-involved window, and then they are fused together. We employ reparameterization to reduce computation costs. Our network is trained in both supervised and unsupervised manners, achieving the best performance compared with state-of-the-art methods. Additionally, the model trained with our proposed dataset (ReCRVD) outperforms the model trained with previous benchmark dataset (CRVD) when evaluated on the real-world outdoor noisy videos. Our code and dataset will be released after the acceptance of this work.
GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation
Authors: Zhenhui Ye, Jinzheng He, Ziyue Jiang, Rongjie Huang, Jiawei Huang, Jinglin Liu, Yi Ren, Xiang Yin, Zejun Ma, Zhou Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.00787
Pdf link: https://arxiv.org/pdf/2305.00787
Abstract Generating talking person portraits with arbitrary speech audio is a crucial problem in the field of digital human and metaverse. A modern talking face generation method is expected to achieve the goals of generalized audio-lip synchronization, good video quality, and high system efficiency. Recently, neural radiance field (NeRF) has become a popular rendering technique in this field since it could achieve high-fidelity and 3D-consistent talking face generation with a few-minute-long training video. However, there still exist several challenges for NeRF-based methods: 1) as for the lip synchronization, it is hard to generate a long facial motion sequence of high temporal consistency and audio-lip accuracy; 2) as for the video quality, due to the limited data used to train the renderer, it is vulnerable to out-of-domain input condition and produce bad rendering results occasionally; 3) as for the system efficiency, the slow training and inference speed of the vanilla NeRF severely obstruct its usage in real-world applications. In this paper, we propose GeneFace++ to handle these challenges by 1) utilizing the pitch contour as an auxiliary feature and introducing a temporal loss in the facial motion prediction process; 2) proposing a landmark locally linear embedding method to regulate the outliers in the predicted motion sequence to avoid robustness issues; 3) designing a computationally efficient NeRF-based motion-to-video renderer to achieves fast training and real-time inference. With these settings, GeneFace++ becomes the first NeRF-based method that achieves stable and real-time talking face generation with generalized audio-lip synchronization. Extensive experiments show that our method outperforms state-of-the-art baselines in terms of subjective and objective evaluation. Video samples are available at https://genefaceplusplus.github.io .
Automated Paper Screening for Clinical Reviews Using Large Language Models
Authors: Eddie Guo, Mehul Gupta, Jiawen Deng, Ye-Jean Park, Mike Paget, Christopher Naugler
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2305.00844
Pdf link: https://arxiv.org/pdf/2305.00844
Abstract Objective: To assess the performance of the OpenAI GPT API in accurately and efficiently identifying relevant titles and abstracts from real-world clinical review datasets and compare its performance against ground truth labelling by two independent human reviewers. Methods: We introduce a novel workflow using the OpenAI GPT API for screening titles and abstracts in clinical reviews. A Python script was created to make calls to the GPT API with the screening criteria in natural language and a corpus of title and abstract datasets that have been filtered by a minimum of two human reviewers. We compared the performance of our model against human-reviewed papers across six review papers, screening over 24,000 titles and abstracts. Results: Our results show an accuracy of 0.91, a sensitivity of excluded papers of 0.91, and a sensitivity of included papers of 0.76. On a randomly selected subset of papers, the GPT API demonstrated the ability to provide reasoning for its decisions and corrected its initial decision upon being asked to explain its reasoning for a subset of incorrect classifications. Conclusion: The GPT API has the potential to streamline the clinical review process, save valuable time and effort for researchers, and contribute to the overall quality of clinical reviews. By prioritizing the workflow and acting as an aid rather than a replacement for researchers and reviewers, the GPT API can enhance efficiency and lead to more accurate and reliable conclusions in medical research.
(1+1)-CMA-ES with Margin for Discrete and Mixed-Integer Problems
Authors: Yohei Watanabe, Kento Uchida, Ryoki Hamano, Shota Saito, Masahiro Nomura, Shinichi Shirakawa
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2305.00849
Pdf link: https://arxiv.org/pdf/2305.00849
Abstract The covariance matrix adaptation evolution strategy (CMA-ES) is an efficient continuous black-box optimization method. The CMA-ES possesses many attractive features, including invariance properties and a well-tuned default hyperparameter setting. Moreover, several components to specialize the CMA-ES have been proposed, such as noise handling and constraint handling. To utilize these advantages in mixed-integer optimization problems, the CMA-ES with margin has been proposed. The CMA-ES with margin prevents the premature convergence of discrete variables by the margin correction, in which the distribution parameters are modified to leave the generation probability for changing the discrete variable. The margin correction has been applied to ($\mu/\mu_\mathrm{w}$,$\lambda$)-CMA-ES, while this paper introduces the margin correction into (1+1)-CMA-ES, an elitist version of CMA-ES. The (1+1)-CMA-ES is often advantageous for unimodal functions and can be computationally less expensive. To tackle the performance deterioration on mixed-integer optimization, we use the discretized elitist solution as the mean of the sampling distribution and modify the margin correction not to move the elitist solution. The numerical simulation using benchmark functions on mixed-integer, integer, and binary domains shows that (1+1)-CMA-ES with margin outperforms the CMA-ES with margin and is better than or comparable with several specialized methods to a particular search domain.
Multi-Agent Systems with Quantitative Satisficing Goals
Authors: Senthil Rajasekaran, Suguman Bansal, Moshe Y. Vardi
Subjects: Computer Science and Game Theory (cs.GT); Formal Languages and Automata Theory (cs.FL)
Arxiv link: https://arxiv.org/abs/2305.00953
Pdf link: https://arxiv.org/pdf/2305.00953
Abstract In the study of reactive systems, qualitative properties are usually easier to model and analyze than quantitative properties. This is especially true in systems where mutually beneficial cooperation between agents is possible, such as multi-agent systems. The large number of possible payoffs available to agents in reactive systems with quantitative properties means that there are many scenarios in which agents deviate from mutually beneficial outcomes in order to gain negligible payoff improvements. This behavior often leads to less desirable outcomes for all agents involved. For this reason we study satisficing goals, derived from a decision-making approach aimed at meeting a good-enough outcome instead of pure optimization. By considering satisficing goals, we are able to employ efficient automata-based algorithms to find pure-strategy Nash equilibria. We then show that these algorithms extend to scenarios in which agents have multiple thresholds, providing an approximation of optimization while still retaining the possibility of mutually beneficial cooperation and efficient automata-based algorithms. Finally, we demonstrate a one-way correspondence between the existence of $\epsilon$-equilibria and the existence of equilibria in games where agents have multiple thresholds.
A Spectral Algorithm for List-Decodable Covariance Estimation in Relative Frobenius Norm
Authors: Ilias Diakonikolas, Daniel M. Kane, Jasper C. H. Lee, Ankit Pensia, Thanasis Pittas
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2305.00966
Pdf link: https://arxiv.org/pdf/2305.00966
Abstract We study the problem of list-decodable Gaussian covariance estimation. Given a multiset $T$ of $n$ points in $\mathbb R^d$ such that an unknown $\alpha<1/2$ fraction of points in $T$ are i.i.d. samples from an unknown Gaussian $\mathcal{N}(\mu, \Sigma)$, the goal is to output a list of $O(1/\alpha)$ hypotheses at least one of which is close to $\Sigma$ in relative Frobenius norm. Our main result is a $\mathrm{poly}(d,1/\alpha)$ sample and time algorithm for this task that guarantees relative Frobenius norm error of $\mathrm{poly}(1/\alpha)$. Importantly, our algorithm relies purely on spectral techniques. As a corollary, we obtain an efficient spectral algorithm for robust partial clustering of Gaussian mixture models (GMMs) -- a key ingredient in the recent work of [BDJ+22] on robustly learning arbitrary GMMs. Combined with the other components of [BDJ+22], our new method yields the first Sum-of-Squares-free algorithm for robustly learning GMMs. At the technical level, we develop a novel multi-filtering method for list-decodable covariance estimation that may be useful in other settings.
Keyword: faster

Neural Network Accelerated Process Design of Polycrystalline Microstructures
Authors: Junrong Lin, Mahmudul Hasan, Pinar Acarb, Jose Blanchet, Vahid Tarokh
Subjects: Computational Engineering, Finance, and Science (cs.CE); Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.00003
Pdf link: https://arxiv.org/pdf/2305.00003
Abstract Computational experiments are exploited in finding a well-designed processing path to optimize material structures for desired properties. This requires understanding the interplay between the processing-(micro)structure-property linkages using a multi-scale approach that connects the macro-scale (process parameters) to meso (homogenized properties) and micro (crystallographic texture) scales. Due to the nature of the problem's multi-scale modeling setup, possible processing path choices could grow exponentially as the decision tree becomes deeper, and the traditional simulators' speed reaches a critical computational threshold. To lessen the computational burden for predicting microstructural evolution under given loading conditions, we develop a neural network (NN)-based method with physics-infused constraints. The NN aims to learn the evolution of microstructures under each elementary process. Our method is effective and robust in finding optimal processing paths. In this study, our NN-based method is applied to maximize the homogenized stiffness of a Copper microstructure, and it is found to be 686 times faster while achieving 0.053% error in the resulting homogenized stiffness compared to the traditional finite element simulator on a 10-process experiment.
LAVA: Data Valuation without Pre-Specified Learning Algorithms
Authors: Hoang Anh Just, Feiyang Kang, Jiachen T. Wang, Yi Zeng, Myeongseob Ko, Ming Jin, Ruoxi Jia
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2305.00054
Pdf link: https://arxiv.org/pdf/2305.00054
Abstract Traditionally, data valuation is posed as a problem of equitably splitting the validation performance of a learning algorithm among the training data. As a result, the calculated data values depend on many design choices of the underlying learning algorithm. However, this dependence is undesirable for many use cases of data valuation, such as setting priorities over different data sources in a data acquisition process and informing pricing mechanisms in a data marketplace. In these scenarios, data needs to be valued before the actual analysis and the choice of the learning algorithm is still undetermined then. Another side-effect of the dependence is that to assess the value of individual points, one needs to re-run the learning algorithm with and without a point, which incurs a large computation burden. This work leapfrogs over the current limits of data valuation methods by introducing a new framework that can value training data in a way that is oblivious to the downstream learning algorithm. (1) We develop a proxy for the validation performance associated with a training set based on a non-conventional class-wise Wasserstein distance between the training and the validation set. We show that the distance characterizes the upper bound of the validation performance for any given model under certain Lipschitz conditions. (2) We develop a novel method to value individual data based on the sensitivity analysis of the class-wise Wasserstein distance. Importantly, these values can be directly obtained for free from the output of off-the-shelf optimization solvers when computing the distance. (3) We evaluate our new data valuation framework over various use cases related to detecting low-quality data and show that, surprisingly, the learning-agnostic feature of our framework enables a significant improvement over the state-of-the-art performance while being orders of magnitude faster.
Instruction-ViT: Multi-Modal Prompts for Instruction Learning in ViT
Authors: Zhenxiang Xiao, Yuzhong Chen, Lu Zhang, Junjie Yao, Zihao Wu, Xiaowei Yu, Yi Pan, Lin Zhao, Chong Ma, Xinyu Liu, Wei Liu, Xiang Li, Yixuan Yuan, Dinggang Shen, Dajiang Zhu, Tianming Liu, Xi Jiang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.00201
Pdf link: https://arxiv.org/pdf/2305.00201
Abstract Prompts have been proven to play a crucial role in large language models, and in recent years, vision models have also been using prompts to improve scalability for multiple downstream tasks. In this paper, we focus on adapting prompt design based on instruction tuning into a visual transformer model for image classification which we called Instruction-ViT. The key idea is to implement multi-modal prompts (text or image prompt) related to category information to guide the fine-tuning of the model. Based on the experiments of several image captionining tasks, the performance and domain adaptability were improved. Our work provided an innovative strategy to fuse multi-modal prompts with better performance and faster adaptability for visual classification models.
Physics-Guided Graph Neural Networks for Real-time AC/DC Power Flow Analysis
Authors: Mei Yang, Gao Qiu, Yong Wu, Junyong Liu, Nina Dai, Yue Shui, Kai Liu, Lijie Ding
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.00216
Pdf link: https://arxiv.org/pdf/2305.00216
Abstract The increasing scale of alternating current and direct current (AC/DC) hybrid systems necessitates a faster power flow analysis tool than ever. This letter thus proposes a specific physics-guided graph neural network (PG-GNN). The tailored graph modelling of AC and DC grids is firstly advanced to enhance the topology adaptability of the PG-GNN. To eschew unreliable experience emulation from data, AC/DC physics are embedded in the PG-GNN using duality. Augmented Lagrangian method-based learning scheme is then presented to help the PG-GNN better learn nonconvex patterns in an unsupervised label-free manner. Multi-PG-GNN is finally conducted to master varied DC control modes. Case study shows that, relative to the other 7 data-driven rivals, only the proposed method matches the performance of the model-based benchmark, also beats it in computational efficiency beyond 10 times.
Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning
Authors: Mingyang Wang, Zhenshan Bing, Xiangtong Yao, Shuai Wang, Hang Su, Chenguang Yang, Kai Huang, Alois Knoll
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2305.00286
Pdf link: https://arxiv.org/pdf/2305.00286
Abstract Meta-reinforcement learning enables artificial agents to learn from related training tasks and adapt to new tasks efficiently with minimal interaction data. However, most existing research is still limited to narrow task distributions that are parametric and stationary, and does not consider out-of-distribution tasks during the evaluation, thus, restricting its application. In this paper, we propose MoSS, a context-based Meta-reinforcement learning algorithm based on Self-Supervised task representation learning to address this challenge. We extend meta-RL to broad non-parametric task distributions which have never been explored before, and also achieve state-of-the-art results in non-stationary and out-of-distribution tasks. Specifically, MoSS consists of a task inference module and a policy module. We utilize the Gaussian mixture model for task representation to imitate the parametric and non-parametric task variations. Additionally, our online adaptation strategy enables the agent to react at the first sight of a task change, thus being applicable in non-stationary tasks. MoSS also exhibits strong generalization robustness in out-of-distributions tasks which benefits from the reliable and robust task representation. The policy is built on top of an off-policy RL algorithm and the entire network is trained completely off-policy to ensure high sample efficiency. On MuJoCo and Meta-World benchmarks, MoSS outperforms prior works in terms of asymptotic performance, sample efficiency (3-50x faster), adaptation efficiency, and generalization robustness on broad and diverse task distributions.
A Simulation-Augmented Benchmarking Framework for Automatic RSO Streak Detection in Single-Frame Space Images
Authors: Zhe Chen, Yang Yang, Anne Bettens, Youngho Eun, Xiaofeng Wu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.00412
Pdf link: https://arxiv.org/pdf/2305.00412
Abstract Detecting Resident Space Objects (RSOs) and preventing collisions with other satellites is crucial. Recently, deep convolutional neural networks (DCNNs) have shown superior performance in object detection when large-scale datasets are available. However, collecting rich data of RSOs is difficult due to very few occurrences in the space images. Without sufficient data, it is challenging to comprehensively train DCNN detectors and make them effective for detecting RSOs in space images, let alone to estimate whether a detector is sufficiently robust. The lack of meaningful evaluation of different detectors could further affect the design and application of detection methods. To tackle this issue, we propose that the space images containing RSOs can be simulated to complement the shortage of raw data for better benchmarking. Accordingly, we introduce a novel simulation-augmented benchmarking framework for RSO detection (SAB-RSOD). In our framework, by making the best use of the hardware parameters of the sensor that captures real-world space images, we first develop a high-fidelity RSO simulator that can generate various realistic space images. Then, we use this simulator to generate images that contain diversified RSOs in space and annotate them automatically. Later, we mix the synthetic images with the real-world images, obtaining around 500 images for training with only the real-world images for evaluation. Under SAB-RSOD, we can train different popular object detectors like Yolo and Faster RCNN effectively, enabling us to evaluate their performance thoroughly. The evaluation results have shown that the amount of available data and image resolution are two key factors for robust RSO detection. Moreover, if using a lower resolution for higher efficiency, we demonstrated that a simple UNet-based detection method can already access high detection accuracy.
Guaranteed Evader Detection in Multi-Agent Search Tasks using Pincer Trajectories
Authors: Roee M. Francos, Alfred M. Bruckstein
Subjects: Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2305.00533
Pdf link: https://arxiv.org/pdf/2305.00533
Abstract Assume that inside an initial planar area there are smart mobile evaders attempting to avoid detection by a team of sweeping searching agents. All sweepers detect evaders with fan-shaped sensors, modeling the field of view of real cameras. Detection of all evaders is guaranteed with cooperative sweeping strategies, by setting requirements on sweepers' speed, and by carefully designing their trajectories. Assume the smart evaders have an upper limit on their speed which is a-priori known to the sweeping team. An easier task for the team of sweepers is to confine evaders to the domain in which they are initially located. The sweepers accomplish the confinement task if they move sufficiently fast and detect evaders by applying an appropriate search strategy. Any given search strategy results in a minimal sweeper's speed in order to be able to detect all evaders. The minimal speed guarantees the ability of the sweeping team to confine evaders to their original domain, and if the sweepers move faster they are able to detect all evaders that are present in the region. We present results on the total search time for a novel pincer-movement based search protocol that utilizes complementary trajectories along with adaptive sensor geometries for any even number of pursuers.
Containerization of a polyglot microservice application using Docker and Kubernetes
Authors: Vamsi Krishna Yepuri, Venkata Kalyan Polamarasetty, Shivani Donthi, Ajay Kumar Reddy Gondi
Subjects: Software Engineering (cs.SE); Performance (cs.PF)
Arxiv link: https://arxiv.org/abs/2305.00600
Pdf link: https://arxiv.org/pdf/2305.00600
Abstract This project investigates the benefits of containerization technology in modern software development and deployment. The study emphasizes the advantages of using Kubernetes and Docker in the development process, including the easy packaging and deployment of microservices, efficient resource utilization, faster startup times, and greater scalability and flexibility. The project concludes by proposing a study that involves creating a polyglot microservice application using Java, Python, and JavaScript, containerizing it with Docker, and deploying it in Kubernetes. The study aims to evaluate service discovery and auto-scaling in distributed mode and compare the performance metrics with virtual machines and containers. The results of this study can inform software development teams about the benefits of containerization in modern software development and deployment.
GTree: GPU-Friendly Privacy-preserving Decision Tree Training and Inference
Authors: Qifan Wang, Shujie Cui, Lei Zhou, Ye Dong, Jianli Bai, Yun Sing Koh, Giovanni Russello
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2305.00645
Pdf link: https://arxiv.org/pdf/2305.00645
Abstract Decision tree (DT) is a widely used machine learning model due to its versatility, speed, and interpretability. However, for privacy-sensitive applications, outsourcing DT training and inference to cloud platforms raise concerns about data privacy. Researchers have developed privacy-preserving approaches for DT training and inference using cryptographic primitives, such as Secure Multi-Party Computation (MPC). While these approaches have shown progress, they still suffer from heavy computation and communication overheads. Few recent works employ Graphical Processing Units (GPU) to improve the performance of MPC-protected deep learning. This raises a natural question: \textit{can MPC-protected DT training and inference be accelerated by GPU?} We present GTree, the first scheme that uses GPU to accelerate MPC-protected secure DT training and inference. GTree is built across 3 parties who securely and jointly perform each step of DT training and inference with GPU. Each MPC protocol in GTree is designed in a GPU-friendly version. The performance evaluation shows that GTree achieves ${\thicksim}11{\times}$ and ${\thicksim}21{\times}$ improvements in training SPECT and Adult datasets, compared to the prior most efficient CPU-based work. For inference, GTree shows its superior efficiency when the DT has less than 10 levels, which is $126\times$ faster than the prior most efficient work when inferring $10^4$ instances with a tree of 7 levels. GTree also achieves a stronger security guarantee than prior solutions, which only leaks the tree depth and size of data samples while prior solutions also leak the tree structure. With \textit{oblivious array access}, the access pattern on GPU is also protected.
File Fragment Classification using Light-Weight Convolutional Neural Networks
Authors: Mustafa Ghaleb, Kunwar Saaim, Muhamad Felemban, Saleh Al-Saleh, Ahmad Al-Mulhem
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2305.00656
Pdf link: https://arxiv.org/pdf/2305.00656
Abstract In digital forensics, file fragment classification is an important step toward completing file carving process. There exist several techniques to identify the type of file fragments without relying on meta-data, such as using features like header/footer and N-gram to identify the fragment type. Recently, convolutional neural network (CNN) models have been used to build classification models to achieve this task. However, the number of parameters in CNNs tends to grow exponentially as the number of layers increases. This results in a dramatic increase in training and inference time. In this paper, we propose light-weight file fragment classification models based on depthwise separable CNNs. The evaluation results show that our proposed models provide faster inference time with comparable accuracy as compared to the state-of-art CNN based models. In particular, our models were able to achieve an accuracy of 79\% on the FFT-75 dataset with nearly 100K parameters and 164M FLOPs, which is 4x smaller and 6x faster than the state-of-the-art classifier in the literature.
Event Camera as Region Proposal Network
Authors: Shrutarv Awasthi, Anas Gouda, Richard Julian Lodenkaemper, Moritz Roidl
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.00718
Pdf link: https://arxiv.org/pdf/2305.00718
Abstract The human eye consists of two types of photoreceptors, rods and cones. Rods are responsible for monochrome vision, and cones for color vision. The number of rods is much higher than the cones, which means that most human vision processing is done in monochrome. An event camera reports the change in pixel intensity and is analogous to rods. Event and color cameras in computer vision are like rods and cones in human vision. Humans can notice objects moving in the peripheral vision (far right and left), but we cannot classify them (think of someone passing by on your far left or far right, this can trigger your attention without knowing who they are). Thus, rods act as a region proposal network (RPN) in human vision. Therefore, an event camera can act as a region proposal network in deep learning Two-stage object detectors in deep learning, such as Mask R-CNN, consist of a backbone for feature extraction and a RPN. Currently, RPN uses the brute force method by trying out all the possible bounding boxes to detect an object. This requires much computation time to generate region proposals making two-stage detectors inconvenient for fast applications. This work replaces the RPN in Mask-RCNN of detectron2 with an event camera for generating proposals for moving objects. Thus, saving time and being computationally less expensive. The proposed approach is faster than the two-stage detectors with comparable accuracy
DNS Privacy with Speed? Evaluating DNS over QUIC and its Impact on Web Performance
Authors: Mike Kosek, Luca Schumann, Robin Marx, Trinh Viet Doan, Vaibhav Bajpai
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2305.00790
Pdf link: https://arxiv.org/pdf/2305.00790
Abstract Over the last decade, Web traffic has significantly shifted towards HTTPS due to an increased awareness for privacy. However, DNS traffic is still largely unencrypted, which allows user profiles to be derived from plaintext DNS queries. While DNS over TLS (DoT) and DNS over HTTPS (DoH) address this problem by leveraging transport encryption for DNS, both protocols are constrained by the underlying transport (TCP) and encryption (TLS) protocols, requiring multiple round-trips to establish a secure connection. In contrast, QUIC combines the transport and cryptographic handshake into a single round-trip, which allows the recently standardized DNS over QUIC (DoQ) to provide DNS privacy with minimal latency. In the first study of its kind, we perform distributed DoQ measurements across multiple vantage points to evaluate the impact of DoQ on Web performance. We find that DoQ excels over DoH, leading to significant improvements with up to 10% faster loads for simple webpages. With increasing complexity of webpages, DoQ even catches up to DNS over UDP (DoUDP) as the cost of encryption amortizes: With DoQ being only ~2% slower than DoUDP, encrypted DNS becomes much more appealing for the Web.
A comparison of methods to eliminate regularization weight tuning from data-enabled predictive control
Authors: Manuel Koch, Colin N. Jones
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2305.00807
Pdf link: https://arxiv.org/pdf/2305.00807
Abstract Data-enabled predictive control (DeePC) is a recently established form of Model Predictive Control (MPC), based on behavioral systems theory. While eliminating the need to explicitly identify a model, it requires an additional regularization with a corresponding weight to function well with noisy data. The tuning of this weight is non-trivial and has a significant impact on performance. In this paper, we compare three reformulations of DeePC that either eliminate the regularization, or simplify the tuning to a trivial point. A building simulation study shows a comparable performance for all three reformulations of DeePC. However, a conventional MPC with a black-box model slightly outperforms them, while solving much faster, and yielding smoother optimal trajectories. Two of the DeePC variants also show sensitivity to an unobserved biased input noise, which is not present in the conventional MPC.
StyleAvatar: Real-time Photo-realistic Portrait Avatar from a Single Video
Authors: Lizhen Wang, Xiaochen Zhao, Jingxiang Sun, Yuxiang Zhang, Hongwen Zhang, Tao Yu, Yebin Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.00942
Pdf link: https://arxiv.org/pdf/2305.00942
Abstract Face reenactment methods attempt to restore and re-animate portrait videos as realistically as possible. Existing methods face a dilemma in quality versus controllability: 2D GAN-based methods achieve higher image quality but suffer in fine-grained control of facial attributes compared with 3D counterparts. In this work, we propose StyleAvatar, a real-time photo-realistic portrait avatar reconstruction method using StyleGAN-based networks, which can generate high-fidelity portrait avatars with faithful expression control. We expand the capabilities of StyleGAN by introducing a compositional representation and a sliding window augmentation method, which enable faster convergence and improve translation generalization. Specifically, we divide the portrait scenes into three parts for adaptive adjustments: facial region, non-facial foreground region, and the background. Besides, our network leverages the best of UNet, StyleGAN and time coding for video learning, which enables high-quality video generation. Furthermore, a sliding window augmentation method together with a pre-training strategy are proposed to improve translation generalization and training performance, respectively. The proposed network can converge within two hours while ensuring high image quality and a forward rendering time of only 20 milliseconds. Furthermore, we propose a real-time live system, which further pushes research into applications. Results and experiments demonstrate the superiority of our method in terms of image quality, full portrait video generation, and real-time re-animation compared to existing facial reenactment methods. Training and inference code for this paper are at https://github.com/LizhenWangT/StyleAvatar.
Keyword: mobile

Wearing face mask detection using deep learning through COVID-19 pandemic
Authors: Javad Khoramdel, Soheila Hatami, Majid Sadedel
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.00068
Pdf link: https://arxiv.org/pdf/2305.00068
Abstract During the COVID-19 pandemic, wearing a face mask has been known to be an effective way to prevent the spread of COVID-19. In lots of monitoring tasks, humans have been replaced with computers thanks to the outstanding performance of the deep learning models. Monitoring the wearing of a face mask is another task that can be done by deep learning models with acceptable accuracy. The main challenge of this task is the limited amount of data because of the quarantine. In this paper, we did an investigation on the capability of three state-of-the-art object detection neural networks on face mask detection for real-time applications. As mentioned, here are three models used, Single Shot Detector (SSD), two versions of You Only Look Once (YOLO) i.e., YOLOv4-tiny, and YOLOv4-tiny-3l from which the best was selected. In the proposed method, according to the performance of different models, the best model that can be suitable for use in real-world and mobile device applications in comparison to other recent studies was the YOLOv4-tiny model, with 85.31% and 50.66 for mean Average Precision (mAP) and Frames Per Second (FPS), respectively. These acceptable values were achieved using two datasets with only 1531 images in three separate classes.
Asynchronous Distributed Protocol for Service Provisioning in the Edge-Cloud Continuum
Authors: Itamar Cohen, Paolo Giaccone, Carla Fabiana Chiasserini
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2305.00184
Pdf link: https://arxiv.org/pdf/2305.00184
Abstract In the edge-cloud continuum, datacenters provide microservices (MSs) to mobile users, with each MS having specific latency constraints and computational requirements. Deploying such a variety of MSs matching their requirements with the available computing resources is challenging. In addition, time-critical MSs may have to be migrated as the users move, to keep meeting their latency constraints. Unlike previous work relying on a central orchestrator with an always-updated global view of the available resources and of the users' locations, this work envisions a distributed solution to the above issues. In particular, we propose a distributed asynchronous protocol for MS deployment in the cloud-edge continuum that (i) dramatically reduces the system overhead compared to a centralized approach, and (ii) increases the system stability by avoiding having a single point of failure as in the case of a central orchestrator. Our solution ensures cost-efficient feasible placement of MSs, while using negligible bandwidth.
STAR-RIS-Aided Mobile Edge Computing: Computation Rate Maximization with Binary Amplitude Coefficients
Authors: Zhenrong Liu, Zongze Li, Miaowen Wen, Yi Gong, Yik-Chung Wu
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2305.00428
Pdf link: https://arxiv.org/pdf/2305.00428
Abstract In this paper, simultaneously transmitting and reflecting (STAR) reconfigurable intelligent surface (RIS) is investigated in the multi-user mobile edge computing (MEC) system to improve the computation rate. Compared with traditional RIS-aided MEC, STAR-RIS extends the service coverage from half-space to full-space and provides new flexibility for improving the computation rate for end users. However, the STAR-RIS-aided MEC system design is a challenging problem due to the non-smooth and non-convex binary amplitude coefficients with coupled phase shifters. To fill this gap, this paper formulates a computation rate maximization problem via the joint design of the STAR-RIS phase shifts, reflection and transmission amplitude coefficients, the receive beamforming vectors, and energy partition strategies for local computing and offloading. To tackle the discontinuity caused by binary variables, we propose an efficient smoothing-based method to decrease convergence error, in contrast to the conventional penalty-based method, which brings many undesired stationary points and local optima. Furthermore, a fast iterative algorithm is proposed to obtain a stationary point for the joint optimization problem, with each subproblem solved by a low-complexity algorithm, making the proposed design scalable to a massive number of users and STAR-RIS elements. Simulation results validate the strength of the proposed smoothing-based method and show that the proposed fast iterative algorithm achieves a higher computation rate than the conventional method while saving the computation time by at least an order of magnitude. Moreover, the resultant STAR-RIS-aided MEC system significantly improves the computation rate compared to other baseline schemes with conventional reflect-only/transmit-only RIS.
Guaranteed Evader Detection in Multi-Agent Search Tasks using Pincer Trajectories
Authors: Roee M. Francos, Alfred M. Bruckstein
Subjects: Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2305.00533
Pdf link: https://arxiv.org/pdf/2305.00533
Abstract Assume that inside an initial planar area there are smart mobile evaders attempting to avoid detection by a team of sweeping searching agents. All sweepers detect evaders with fan-shaped sensors, modeling the field of view of real cameras. Detection of all evaders is guaranteed with cooperative sweeping strategies, by setting requirements on sweepers' speed, and by carefully designing their trajectories. Assume the smart evaders have an upper limit on their speed which is a-priori known to the sweeping team. An easier task for the team of sweepers is to confine evaders to the domain in which they are initially located. The sweepers accomplish the confinement task if they move sufficiently fast and detect evaders by applying an appropriate search strategy. Any given search strategy results in a minimal sweeper's speed in order to be able to detect all evaders. The minimal speed guarantees the ability of the sweeping team to confine evaders to their original domain, and if the sweepers move faster they are able to detect all evaders that are present in the region. We present results on the total search time for a novel pincer-movement based search protocol that utilizes complementary trajectories along with adaptive sensor geometries for any even number of pursuers.
Self-supervised Activity Representation Learning with Incremental Data: An Empirical Study
Authors: Jason Liu, Shohreh Deldari, Hao Xue, Van Nguyen, Flora D. Salim
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2305.00619
Pdf link: https://arxiv.org/pdf/2305.00619
Abstract In the context of mobile sensing environments, various sensors on mobile devices continually generate a vast amount of data. Analyzing this ever-increasing data presents several challenges, including limited access to annotated data and a constantly changing environment. Recent advancements in self-supervised learning have been utilized as a pre-training step to enhance the performance of conventional supervised models to address the absence of labelled datasets. This research examines the impact of using a self-supervised representation learning model for time series classification tasks in which data is incrementally available. We proposed and evaluated a workflow in which a model learns to extract informative features using a corpus of unlabeled time series data and then conducts classification on labelled data using features extracted by the model. We analyzed the effect of varying the size, distribution, and source of the unlabeled data on the final classification performance across four public datasets, including various types of sensors in diverse applications.
Emotions Beyond Words: Non-Speech Audio Emotion Recognition With Edge Computing
Authors: Ibrahim Malik, Siddique Latif, Sanaullah Manzoor, Muhammad Usama, Junaid Qadir, Raja Jurdak
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2305.00725
Pdf link: https://arxiv.org/pdf/2305.00725
Abstract Non-speech emotion recognition has a wide range of applications including healthcare, crime control and rescue, and entertainment, to name a few. Providing these applications using edge computing has great potential, however, recent studies are focused on speech-emotion recognition using complex architectures. In this paper, a non-speech-based emotion recognition system is proposed, which can rely on edge computing to analyse emotions conveyed through non-speech expressions like screaming and crying. In particular, we explore knowledge distillation to design a computationally efficient system that can be deployed on edge devices with limited resources without degrading the performance significantly. We comprehensively evaluate our proposed framework using two publicly available datasets and highlight its effectiveness by comparing the results with the well-known MobileNet model. Our results demonstrate the feasibility and effectiveness of using edge computing for non-speech emotion detection, which can potentially improve applications that rely on emotion detection in communication networks. To the best of our knowledge, this is the first work on an edge-computing-based framework for detecting emotions in non-speech audio, offering promising directions for future research.
AI-based Radio and Computing Resource Allocation and Path Planning in NOMA NTNs: AoI Minimization under CSI Uncertainty
Authors: Maryam Ansarifard, Nader Mokari, Mohammadreza Javan, Hamid Saeedi, Eduard A. Jorswieck
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2305.00780
Pdf link: https://arxiv.org/pdf/2305.00780
Abstract In this paper, we develop a hierarchical aerial computing framework composed of high altitude platform (HAP) and unmanned aerial vehicles (UAVs) to compute the fully offloaded tasks of terrestrial mobile users which are connected through an uplink non-orthogonal multiple access (UL-NOMA). In particular, the problem is formulated to minimize the AoI of all users with elastic tasks, by adjusting UAVs trajectory and resource allocation on both UAVs and HAP, which is restricted by the channel state information (CSI) uncertainty and multiple resource constraints of UAVs and HAP. In order to solve this non-convex optimization problem, two methods of multi-agent deep deterministic policy gradient (MADDPG) and federated reinforcement learning (FRL) are proposed to design the UAVs trajectory and obtain channel, power, and CPU allocations. It is shown that task scheduling significantly reduces the average AoI. This improvement is more pronounced for larger task sizes. On the one hand, it is shown that power allocation has a marginal effect on the average AoI compared to using full transmission power for all users. On the other hand, compared with traditional transmissions (fixed method) simulation result shows that our scheduling scheme has a lower average AoI.
Performance and Energy Consumption of Parallel Machine Learning Algorithms
Authors: Xidong Wu, Preston Brazzle, Stephen Cahoon
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.00798
Pdf link: https://arxiv.org/pdf/2305.00798
Abstract Machine learning models have achieved remarkable success in various real-world applications such as data science, computer vision, and natural language processing. However, model training in machine learning requires large-scale data sets and multiple iterations before it can work properly. Parallelization of training algorithms is a common strategy to speed up the process of training. However, many studies on model training and inference focus only on aspects of performance. Power consumption is also an important metric for any type of computation, especially high-performance applications. Machine learning algorithms that can be used on low-power platforms such as sensors and mobile devices have been researched, but less power optimization is done for algorithms designed for high-performance computing. In this paper, we present a C++ implementation of logistic regression and the genetic algorithm, and a Python implementation of neural networks with stochastic gradient descent (SGD) algorithm on classification tasks. We will show the impact that the complexity of the model and the size of the training data have on the parallel efficiency of the algorithm in terms of both power and performance. We also tested these implementations using shard-memory parallelism, distributed memory parallelism, and GPU acceleration to speed up machine learning model training.
Population Protocols with Unordered Data
Authors: Michael Blondin, François Ladouceur
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2305.00872
Pdf link: https://arxiv.org/pdf/2305.00872
Abstract Population protocols form a well-established model of computation of passively mobile anonymous agents with constant-size memory. It is well known that population protocols compute Presburger-definable predicates, such as absolute majority and counting predicates. In this work, we initiate the study of population protocols operating over arbitrarily large data domains. More precisely, we introduce population protocols with unordered data as a formalism to reason about anonymous crowd computing over unordered sequences of data. We first show that it is possible to determine whether an unordered sequence from an infinite data domain has a datum with absolute majority. We then establish the expressive power of the immediate observation restriction of our model, namely where, in each interaction, an agent observes another agent who is unaware of the interaction.
Analysis of reward mechanism for quizmarket
Authors: Noorul Ali
Subjects: Computer Science and Game Theory (cs.GT)
Arxiv link: https://arxiv.org/abs/2305.00915
Pdf link: https://arxiv.org/pdf/2305.00915
Abstract A reward algorithm is needed for games which rewards risk, i.e. early play, and extends the longevity of a reward pool. This would allow a higher number of players and greater engagement. I created a reward mechanism that rewards risk, lasts longer, and is more profitable than existing mechanisms. I also implemented an algorithm within the mechanism to self-correct in outlier performance. This reward mechanism was used in TURBLAZE, a mobile game designed for high school students. The game has quizzes. Gamers pay a fixed fee to participate in a quiz and win a reward if their score is above a certain threshold.
Keyword: pruning

There is no result

Keyword: voxel

An Efficient Plane Extraction Approach for Bundle Adjustment on LiDAR Point clouds
Authors: Zheng Liu, Fu Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2305.00287
Pdf link: https://arxiv.org/pdf/2305.00287
Abstract Bundle adjustment (BA) on LiDAR point clouds has been extensively investigated in recent years due to its ability to optimize multiple poses together, resulting in high accuracy and global consistency for point cloud. However, the accuracy and speed of LiDAR bundle adjustment depend on the quality of plane extraction, which provides point association for LiDAR BA. In this study, we propose a novel and efficient voxel-based approach for plane extraction that is specially designed to provide point association for LiDAR bundle adjustment. To begin, we partition the space into multiple voxels of a fixed size and then split these root voxels based on whether the points are on the same plane, using an octree structure. We also design a novel plane determination method based on principle component analysis (PCA), which segments the points into four even quarters and compare their minimum eigenvalues with that of the initial point cloud. Finally, we adopt a plane merging method to prevent too many small planes from being in a single voxel, which can increase the optimization time required for BA. Our experimental results on HILTI demonstrate that our approach achieves the best precision and least time cost compared to other plane extraction methods.
Object-Centric Voxelization of Dynamic Scenes via Inverse Neural Rendering
Authors: Siyu Gao, Yanpeng Zhao, Yunbo Wang, Xiaokang Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.00393
Pdf link: https://arxiv.org/pdf/2305.00393
Abstract Understanding the compositional dynamics of the world in unsupervised 3D scenarios is challenging. Existing approaches either fail to make effective use of time cues or ignore the multi-view consistency of scene decomposition. In this paper, we propose DynaVol, an inverse neural rendering framework that provides a pilot study for learning time-varying volumetric representations for dynamic scenes with multiple entities (like objects). It has two main contributions. First, it maintains a time-dependent 3D grid, which dynamically and flexibly binds the spatial locations to different entities, thus encouraging the separation of information at a representational level. Second, our approach jointly learns grid-level local dynamics, object-level global dynamics, and the compositional neural radiance fields in an end-to-end architecture, thereby enhancing the spatiotemporal consistency of object-centric scene voxelization. We present a two-stage training scheme for DynaVol and validate its effectiveness on various benchmarks with multiple objects, diverse dynamics, and real-world shapes and textures. We present visualization at https://sites.google.com/view/dynavol-visual.
Towards Computational Architecture of Liberty: A Comprehensive Survey on Deep Learning for Generating Virtual Architecture in the Metaverse
Authors: Anqi Wang, Jiahua Dong, Jiachuan Shen, Lik-Hang Lee, Pan Hui
Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.00510
Pdf link: https://arxiv.org/pdf/2305.00510
Abstract 3D shape generation techniques utilizing deep learning are increasing attention from both computer vision and architectural design. This survey focuses on investigating and comparing the current latest approaches to 3D object generation with deep generative models (DGMs), including Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), 3D-aware images, and diffusion models. We discuss 187 articles (80.7% of articles published between 2018-2022) to review the field of generated possibilities of architecture in virtual environments, limited to the architecture form. We provide an overview of architectural research, virtual environment, and related technical approaches, followed by a review of recent trends in discrete voxel generation, 3D models generated from 2D images, and conditional parameters. We highlight under-explored issues in 3D generation and parameterized control that is worth further investigation. Moreover, we speculate that four research agendas including data limitation, editability, evaluation metrics, and human-computer interaction are important enablers of ubiquitous interaction with immersive systems in architecture for computer-aided design Our work contributes to researchers' understanding of the current potential and future needs of deep learnings in generating virtual architecture.
Learning Self-Prior for Mesh Inpainting Using Self-Supervised Graph Convolutional Networks
Authors: Shota Hattori, Tatsuya Yatagawa, Yutaka Ohtake, Hiromasa Suzuki
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.00635
Pdf link: https://arxiv.org/pdf/2305.00635
Abstract This study presents a self-prior-based mesh inpainting framework that requires only an incomplete mesh as input, without the need for any training datasets. Additionally, our method maintains the polygonal mesh format throughout the inpainting process without converting the shape format to an intermediate, such as a voxel grid, a point cloud, or an implicit function, which are typically considered easier for deep neural networks to process. To achieve this goal, we introduce two graph convolutional networks (GCNs): single-resolution GCN (SGCN) and multi-resolution GCN (MGCN), both trained in a self-supervised manner. Our approach refines a watertight mesh obtained from the initial hole filling to generate a completed output mesh. Specifically, we train the GCNs to deform an oversmoothed version of the input mesh into the expected completed shape. To supervise the GCNs for accurate vertex displacements, despite the unknown correct displacements at real holes, we utilize multiple sets of meshes with several connected regions marked as fake holes. The correct displacements are known for vertices in these fake holes, enabling network training with loss functions that assess the accuracy of displacement vectors estimated by the GCNs. We demonstrate that our method outperforms traditional dataset-independent approaches and exhibits greater robustness compared to other deep-learning-based methods for shapes that less frequently appear in shape datasets.
Keyword: lidar

DSEC-MOS: Segment Any Moving Object with Moving Ego Vehicle
Authors: Zhuyun Zhou, Zongwei Wu, Rémi Boutteau, Fan Yang, Dominique Ginhac
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2305.00126
Pdf link: https://arxiv.org/pdf/2305.00126
Abstract Moving Object Segmentation (MOS), a crucial task in computer vision, has numerous applications such as surveillance, autonomous driving, and video analytics. Existing datasets for moving object segmentation mainly focus on RGB or Lidar videos, but lack additional event information that can enhance the understanding of dynamic scenes. To address this limitation, we propose a novel dataset, called DSEC-MOS. Our dataset includes frames captured by RGB cameras embedded on moving vehicules and incorporates event data, which provide high temporal resolution and low-latency information about changes in the scenes. To generate accurate segmentation mask annotations for moving objects, we apply the recently emerged large model SAM - Segment Anything Model - with moving object bounding boxes from DSEC-MOD serving as prompts and calibrated RGB frames, then further revise the results. Our DSEC-MOS dataset contains in total 16 sequences (13314 images). To the best of our knowledge, DSEC-MOS is also the first moving object segmentation dataset that includes event camera in autonomous driving. Project Page: https://github.com/ZZY-Zhou/DSEC-MOS.
Sensor Equivariance by LiDAR Projection Images
Authors: Hannes Reichert, Manuel Hetzel, Steven Schreck, Konrad Doll, Bernhard Sick
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2305.00221
Pdf link: https://arxiv.org/pdf/2305.00221
Abstract In this work, we propose an extension of conventional image data by an additional channel in which the associated projection properties are encoded. This addresses the issue of sensor-dependent object representation in projection-based sensors, such as LiDAR, which can lead to distorted physical and geometric properties due to variations in sensor resolution and field of view. To that end, we propose an architecture for processing this data in an instance segmentation framework. We focus specifically on LiDAR as a key sensor modality for machine vision tasks and highly automated driving (HAD). Through an experimental setup in a controlled synthetic environment, we identify a bias on sensor resolution and field of view and demonstrate that our proposed method can reduce said bias for the task of LiDAR instance segmentation. Furthermore, we define our method such that it can be applied to other projection-based sensors, such as cameras. To promote transparency, we make our code and dataset publicly available. This method shows the potential to improve performance and robustness in various machine vision tasks that utilize projection-based sensors.
An Efficient Plane Extraction Approach for Bundle Adjustment on LiDAR Point clouds
Authors: Zheng Liu, Fu Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2305.00287
Pdf link: https://arxiv.org/pdf/2305.00287
Abstract Bundle adjustment (BA) on LiDAR point clouds has been extensively investigated in recent years due to its ability to optimize multiple poses together, resulting in high accuracy and global consistency for point cloud. However, the accuracy and speed of LiDAR bundle adjustment depend on the quality of plane extraction, which provides point association for LiDAR BA. In this study, we propose a novel and efficient voxel-based approach for plane extraction that is specially designed to provide point association for LiDAR bundle adjustment. To begin, we partition the space into multiple voxels of a fixed size and then split these root voxels based on whether the points are on the same plane, using an octree structure. We also design a novel plane determination method based on principle component analysis (PCA), which segments the points into four even quarters and compare their minimum eigenvalues with that of the initial point cloud. Finally, we adopt a plane merging method to prevent too many small planes from being in a single voxel, which can increase the optimization time required for BA. Our experimental results on HILTI demonstrate that our approach achieves the best precision and least time cost compared to other plane extraction methods.
InfraDet3D: Multi-Modal 3D Object Detection based on Roadside Infrastructure Camera and LiDAR Sensors
Authors: Walter Zimmer, Joseph Birkner, Marcel Brucker, Huu Tung Nguyen, Stefan Petrovski, Bohan Wang, Alois C. Knoll
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.00314
Pdf link: https://arxiv.org/pdf/2305.00314
Abstract Current multi-modal object detection approaches focus on the vehicle domain and are limited in the perception range and the processing capabilities. Roadside sensor units (RSUs) introduce a new domain for perception systems and leverage altitude to observe traffic. Cameras and LiDARs mounted on gantry bridges increase the perception range and produce a full digital twin of the traffic. In this work, we introduce InfraDet3D, a multi-modal 3D object detector for roadside infrastructure sensors. We fuse two LiDARs using early fusion and further incorporate detections from monocular cameras to increase the robustness and to detect small objects. Our monocular 3D detection module uses HD maps to ground object yaw hypotheses, improving the final perception results. The perception framework is deployed on a real-world intersection that is part of the A9 Test Stretch in Munich, Germany. We perform several ablation studies and experiments and show that fusing two LiDARs with two cameras leads to an improvement of +1.90 mAP compared to a camera-only solution. We evaluate our results on the A9 infrastructure dataset and achieve 68.48 mAP on the test set. The dataset and code will be available at https://a9-dataset.com to allow the research community to further improve the perception results and make autonomous driving safer.
TransCAR: Transformer-based Camera-And-Radar Fusion for 3D Object Detection
Authors: Su Pang, Daniel Morris, Hayder Radha
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.00397
Pdf link: https://arxiv.org/pdf/2305.00397
Abstract Despite radar's popularity in the automotive industry, for fusion-based 3D object detection, most existing works focus on LiDAR and camera fusion. In this paper, we propose TransCAR, a Transformer-based Camera-And-Radar fusion solution for 3D object detection. Our TransCAR consists of two modules. The first module learns 2D features from surround-view camera images and then uses a sparse set of 3D object queries to index into these 2D features. The vision-updated queries then interact with each other via transformer self-attention layer. The second module learns radar features from multiple radar scans and then applies transformer decoder to learn the interactions between radar features and vision-updated queries. The cross-attention layer within the transformer decoder can adaptively learn the soft-association between the radar features and vision-updated queries instead of hard-association based on sensor calibration only. Finally, our model estimates a bounding box per query using set-to-set Hungarian loss, which enables the method to avoid non-maximum suppression. TransCAR improves the velocity estimation using the radar scans without temporal information. The superior experimental results of our TransCAR on the challenging nuScenes datasets illustrate that our TransCAR outperforms state-of-the-art Camera-Radar fusion-based 3D object detection approaches.
LIMOT: A Tightly-Coupled System for LiDAR-Inertial Odometry and Multi-Object Tracking
Authors: Zhongyang Zhu, Junqiao Zhao, Xuebo Tian, Kai Huang, Chen Ye
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2305.00406
Pdf link: https://arxiv.org/pdf/2305.00406
Abstract Simultaneous localization and mapping (SLAM) is critical to the implementation of autonomous driving. Most LiDAR-inertial SLAM algorithms assume a static environment, leading to unreliable localization in dynamic environments. Furthermore, accurate tracking of moving objects is of great significance for the control and planning of autonomous vehicle operation. This study proposes LIMOT, a tightly-coupled multi-object tracking and LiDAR-inertial SLAM system capable of accurately estimating the poses of both ego-vehicle and objects. First, we use 3D bounding boxes generated by an object detector to represent all movable objects and perform LiDAR odometry using inertial measurement unit (IMU) pre-integration result. Based on the historical trajectories of tracked objects in a sliding window, we perform robust object association. We propose a trajectory-based dynamic feature filtering method, which filters out features belonging to moving objects by leveraging tracking results. Factor graph-based optimization is then conducted to optimize the bias of the IMU and the poses of both the ego-vehicle and surrounding objects in a sliding window. Experiments conducted on KITTI datasets show that our method achieves better pose and tracking accuracy than our previous work DL-SLOT and other SLAM and multi-object tracking baseline methods.
Keyword: diffusion

Unsupervised Discovery of 3D Hierarchical Structure with Generative Diffusion Features
Authors: Nurislam Tursynbek, Marc Niethammer
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.00067
Pdf link: https://arxiv.org/pdf/2305.00067
Abstract Inspired by recent findings that generative diffusion models learn semantically meaningful representations, we use them to discover the intrinsic hierarchical structure in biomedical 3D images using unsupervised segmentation. We show that features of diffusion models from different stages of a U-Net-based ladder-like architecture capture different hierarchy levels in 3D biomedical images. We design three losses to train a predictive unsupervised segmentation network that encourages the decomposition of 3D volumes into meaningful nested subvolumes that represent a hierarchy. First, we pretrain 3D diffusion models and use the consistency of their features across subvolumes. Second, we use the visual consistency between subvolumes. Third, we use the invariance to photometric augmentations as a regularizer. Our models achieve better performance than prior unsupervised structure discovery approaches on challenging biologically-inspired synthetic datasets and on a real-world brain tumor MRI dataset.
Temporal Subsampling Diminishes Small Spatial Scales in Recurrent Neural Network Emulators of Geophysical Turbulence
Authors: Timothy A. Smith, Stephen G. Penny, Jason A. Platt, Tse-Chun Chen
Subjects: Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph); Fluid Dynamics (physics.flu-dyn)
Arxiv link: https://arxiv.org/abs/2305.00100
Pdf link: https://arxiv.org/pdf/2305.00100
Abstract The immense computational cost of traditional numerical weather and climate models has sparked the development of machine learning (ML) based emulators. Because ML methods benefit from long records of training data, it is common to use datasets that are temporally subsampled relative to the time steps required for the numerical integration of differential equations. Here, we investigate how this often overlooked processing step affects the quality of an emulator's predictions. We implement two ML architectures from a class of methods called reservoir computing: (1) a form of Nonlinear Vector Autoregression (NVAR), and (2) an Echo State Network (ESN). Despite their simplicity, it is well documented that these architectures excel at predicting low dimensional chaotic dynamics. We are therefore motivated to test these architectures in an idealized setting of predicting high dimensional geophysical turbulence as represented by Surface Quasi-Geostrophic dynamics. In all cases, subsampling the training data consistently leads to an increased bias at small spatial scales that resembles numerical diffusion. Interestingly, the NVAR architecture becomes unstable when the temporal resolution is increased, indicating that the polynomial based interactions are insufficient at capturing the detailed nonlinearities of the turbulent flow. The ESN architecture is found to be more robust, suggesting a benefit to the more expensive but more general structure. Spectral errors are reduced by including a penalty on the kinetic energy density spectrum during training, although the subsampling related errors persist. Future work is warranted to understand how the temporal resolution of training data affects other ML architectures.
Towards Computational Architecture of Liberty: A Comprehensive Survey on Deep Learning for Generating Virtual Architecture in the Metaverse
Authors: Anqi Wang, Jiahua Dong, Jiachuan Shen, Lik-Hang Lee, Pan Hui
Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.00510
Pdf link: https://arxiv.org/pdf/2305.00510
Abstract 3D shape generation techniques utilizing deep learning are increasing attention from both computer vision and architectural design. This survey focuses on investigating and comparing the current latest approaches to 3D object generation with deep generative models (DGMs), including Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), 3D-aware images, and diffusion models. We discuss 187 articles (80.7% of articles published between 2018-2022) to review the field of generated possibilities of architecture in virtual environments, limited to the architecture form. We provide an overview of architectural research, virtual environment, and related technical approaches, followed by a review of recent trends in discrete voxel generation, 3D models generated from 2D images, and conditional parameters. We highlight under-explored issues in 3D generation and parameterized control that is worth further investigation. Moreover, we speculate that four research agendas including data limitation, editability, evaluation metrics, and human-computer interaction are important enablers of ubiquitous interaction with immersive systems in architecture for computer-aided design Our work contributes to researchers' understanding of the current potential and future needs of deep learnings in generating virtual architecture.
Class-Balancing Diffusion Models
Authors: Yiming Qin, Huangjie Zheng, Jiangchao Yao, Mingyuan Zhou, Ya Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.00562
Pdf link: https://arxiv.org/pdf/2305.00562
Abstract Diffusion-based models have shown the merits of generating high-quality visual data while preserving better diversity in recent studies. However, such observation is only justified with curated data distribution, where the data samples are nicely pre-processed to be uniformly distributed in terms of their labels. In practice, a long-tailed data distribution appears more common and how diffusion models perform on such class-imbalanced data remains unknown. In this work, we first investigate this problem and observe significant degradation in both diversity and fidelity when the diffusion model is trained on datasets with class-imbalanced distributions. Especially in tail classes, the generations largely lose diversity and we observe severe mode-collapse issues. To tackle this problem, we set from the hypothesis that the data distribution is not class-balanced, and propose Class-Balancing Diffusion Models (CBDM) that are trained with a distribution adjustment regularizer as a solution. Experiments show that images generated by CBDM exhibit higher diversity and quality in both quantitative and qualitative ways. Our method benchmarked the generation results on CIFAR100/CIFAR100LT dataset and shows outstanding performance on the downstream recognition task.
Diffusion Models for Time Series Applications: A Survey
Authors: Lequan Lin, Zhengkun Li, Ruikun Li, Xuliang Li, Junbin Gao
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.00624
Pdf link: https://arxiv.org/pdf/2305.00624
Abstract Diffusion models, a family of generative models based on deep learning, have become increasingly prominent in cutting-edge machine learning research. With a distinguished performance in generating samples that resemble the observed data, diffusion models are widely used in image, video, and text synthesis nowadays. In recent years, the concept of diffusion has been extended to time series applications, and many powerful models have been developed. Considering the deficiency of a methodical summary and discourse on these models, we provide this survey as an elementary resource for new researchers in this area and also an inspiration to motivate future research. For better understanding, we include an introduction about the basics of diffusion models. Except for this, we primarily focus on diffusion-based methods for time series forecasting, imputation, and generation, and present them respectively in three individual sections. We also compare different methods for the same application and highlight their connections if applicable. Lastly, we conclude the common limitation of diffusion-based methods and highlight potential future research directions.
Quality of approximating a mass-emitting object by a point source in a diffusion model
Authors: Qiyao Peng, Sander C. Hille
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2305.00717
Pdf link: https://arxiv.org/pdf/2305.00717
Abstract For the sake of computational efficiency and for theoretical purposes, in mathematical modelling, the Dirac Delta distributions are often utilized as a replacement for cells or vesicles, since the size of cells or vesicles is much smaller than the size of the surrounding tissues. Here, we consider the scenario that the cell or the vesicle releases the diffusive compounds to the immediate environment, which is modelled by the diffusion equation. Typically, one separates the intracellular and extracellular environment and uses homogeneous Neumann boundary condition for the cell boundary (so-called spatial exclusion approach), while the point source approach neglects the intracellular environment. We show that extra conditions are needed such that the solutions to the two approaches are consistent. We prove a necessary and sufficient condition for the consistency. Suggested by the numerical results, we conclude that an initial condition in the form of Gaussian kernel in the point source approach compensates for a time-delay discrepancy between the solutions to the two approaches in the numerical solutions. Various approaches determining optimal amplitude and variance of the Gaussian kernel have been discussed.
Keyword: dynamic

HermesBDD: A Multi-Core and Multi-Platform Binary Decision Diagram Package
Authors: Luigi Capogrosso, Luca Geretti, Marco Cristani, Franco Fummi, Tiziano Villa
Subjects: Logic in Computer Science (cs.LO)
Arxiv link: https://arxiv.org/abs/2305.00039
Pdf link: https://arxiv.org/pdf/2305.00039
Abstract BDDs are representations of a Boolean expression in the form of a directed acyclic graph. BDDs are widely used in several fields, particularly in model checking and hardware verification. There are several implementations for BDD manipulation, where each package differs depending on the application. This paper presents HermesBDD: a novel multi-core and multi-platform binary decision diagram package focused on high performance and usability. HermesBDD supports a static and dynamic memory management mechanism, the possibility to exploit lock-free hash tables, and a simple parallel implementation of the If-Then-Else procedure based on a higher-level wrapper for threads and futures. HermesBDD is completely written in C++ with no need to rely on external libraries and is developed according to software engineering principles for reliability and easy maintenance over time. We provide experimental results on the n-Queens problem, the de-facto SAT solver benchmark for BDDs, demonstrating a significant speedup of 18.73x over our non-parallel baselines, and a remarkable performance boost w.r.t. other state-of-the-art BDDs packages.
An Integrated System Dynamics and Discrete Event Supply Chain Simulation Framework for Supply Chain Resilience with Non-Stationary Pandemic Demand
Authors: Mustafa Can Camur, Chin-Yuan Tseng, Aristotelis E. Thanos, Chelsea C. White, Walter Yund, Eleftherios Iakovou
Subjects: Multiagent Systems (cs.MA); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2305.00086
Pdf link: https://arxiv.org/pdf/2305.00086
Abstract COVID-19 resulted in some of the largest supply chain disruptions in recent history. To mitigate the impact of future disruptions, we propose an integrated hybrid simulation framework to couple nonstationary demand signals from an event like COVID-19 with a model of an end-to-end supply chain. We first create a system dynamics susceptible-infected-recovered (SIR) model, augmenting a classic epidemiological model to create a realistic portrayal of demand patterns for oxygen concentrators (OC). Informed by this granular demand signal, we then create a supply chain discrete event simulation model of OC sourcing, manufacturing, and distribution to test production augmentation policies to satisfy this increased demand. This model utilizes publicly available data, engineering teardowns of OCs, and a supply chain illumination to identify suppliers. Our findings indicate that this coupled approach can use realistic demand during a disruptive event to enable rapid recommendations of policies for increased supply chain resilience with controlled cost.
Improving Gradient Computation for Differentiable Physics Simulation with Contacts
Authors: Yaofeng Desmond Zhong, Jiequn Han, Biswadip Dey, Georgia Olympia Brikis
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2305.00092
Pdf link: https://arxiv.org/pdf/2305.00092
Abstract Differentiable simulation enables gradients to be back-propagated through physics simulations. In this way, one can learn the dynamics and properties of a physics system by gradient-based optimization or embed the whole differentiable simulation as a layer in a deep learning model for downstream tasks, such as planning and control. However, differentiable simulation at its current stage is not perfect and might provide wrong gradients that deteriorate its performance in learning tasks. In this paper, we study differentiable rigid-body simulation with contacts. We find that existing differentiable simulation methods provide inaccurate gradients when the contact normal direction is not fixed - a general situation when the contacts are between two moving objects. We propose to improve gradient computation by continuous collision detection and leverage the time-of-impact (TOI) to calculate the post-collision velocities. We demonstrate our proposed method, referred to as TOI-Velocity, on two optimal control problems. We show that with TOI-Velocity, we are able to learn an optimal control sequence that matches the analytical solution, while without TOI-Velocity, existing differentiable simulation methods fail to do so.
Latent Dynamics Networks (LDNets): learning the intrinsic dynamics of spatio-temporal processes
Authors: Francesco Regazzoni, Stefano Pagani, Matteo Salvador, Luca Dede', Alfio Quarteroni
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2305.00094
Pdf link: https://arxiv.org/pdf/2305.00094
Abstract Predicting the evolution of systems that exhibit spatio-temporal dynamics in response to external stimuli is a key enabling technology fostering scientific innovation. Traditional equations-based approaches leverage first principles to yield predictions through the numerical approximation of high-dimensional systems of differential equations, thus calling for large-scale parallel computing platforms and requiring large computational costs. Data-driven approaches, instead, enable the description of systems evolution in low-dimensional latent spaces, by leveraging dimensionality reduction and deep learning algorithms. We propose a novel architecture, named Latent Dynamics Network (LDNet), which is able to discover low-dimensional intrinsic dynamics of possibly non-Markovian dynamical systems, thus predicting the time evolution of space-dependent fields in response to external inputs. Unlike popular approaches, in which the latent representation of the solution manifold is learned by means of auto-encoders that map a high-dimensional discretization of the system state into itself, LDNets automatically discover a low-dimensional manifold while learning the latent dynamics, without ever operating in the high-dimensional space. Furthermore, LDNets are meshless algorithms that do not reconstruct the output on a predetermined grid of points, but rather at any point of the domain, thus enabling weight-sharing across query-points. These features make LDNets lightweight and easy-to-train, with excellent accuracy and generalization properties, even in time-extrapolation regimes. We validate our method on several test cases and we show that, for a challenging highly-nonlinear problem, LDNets outperform state-of-the-art methods in terms of accuracy (normalized error 5 times smaller), by employing a dramatically smaller number of trainable parameters (more than 10 times fewer).
Temporal Subsampling Diminishes Small Spatial Scales in Recurrent Neural Network Emulators of Geophysical Turbulence
Authors: Timothy A. Smith, Stephen G. Penny, Jason A. Platt, Tse-Chun Chen
Subjects: Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph); Fluid Dynamics (physics.flu-dyn)
Arxiv link: https://arxiv.org/abs/2305.00100
Pdf link: https://arxiv.org/pdf/2305.00100
Abstract The immense computational cost of traditional numerical weather and climate models has sparked the development of machine learning (ML) based emulators. Because ML methods benefit from long records of training data, it is common to use datasets that are temporally subsampled relative to the time steps required for the numerical integration of differential equations. Here, we investigate how this often overlooked processing step affects the quality of an emulator's predictions. We implement two ML architectures from a class of methods called reservoir computing: (1) a form of Nonlinear Vector Autoregression (NVAR), and (2) an Echo State Network (ESN). Despite their simplicity, it is well documented that these architectures excel at predicting low dimensional chaotic dynamics. We are therefore motivated to test these architectures in an idealized setting of predicting high dimensional geophysical turbulence as represented by Surface Quasi-Geostrophic dynamics. In all cases, subsampling the training data consistently leads to an increased bias at small spatial scales that resembles numerical diffusion. Interestingly, the NVAR architecture becomes unstable when the temporal resolution is increased, indicating that the polynomial based interactions are insufficient at capturing the detailed nonlinearities of the turbulent flow. The ESN architecture is found to be more robust, suggesting a benefit to the more expensive but more general structure. Spectral errors are reduced by including a penalty on the kinetic energy density spectrum during training, although the subsampling related errors persist. Future work is warranted to understand how the temporal resolution of training data affects other ML architectures.
Faster Submodular Maximization for Several Classes of Matroids
Authors: Monika Henzinger, Paul Liu, Jan Vondrak, Da Wei Zheng
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2305.00122
Pdf link: https://arxiv.org/pdf/2305.00122
Abstract The maximization of submodular functions have found widespread application in areas such as machine learning, combinatorial optimization, and economics, where practitioners often wish to enforce various constraints; the matroid constraint has been investigated extensively due to its algorithmic properties and expressive power. Recent progress has focused on fast algorithms for important classes of matroids given in explicit form. Currently, nearly-linear time algorithms only exist for graphic and partition matroids [ICALP '19]. In this work, we develop algorithms for monotone submodular maximization constrained by graphic, transversal matroids, or laminar matroids in time near-linear in the size of their representation. Our algorithms achieve an optimal approximation of $1-1/e-\epsilon$ and both generalize and accelerate the results of Ene and Nguyen [ICALP '19]. In fact, the running time of our algorithm cannot be improved within the fast continuous greedy framework of Badanidiyuru and Vondr\'ak [SODA '14]. To achieve near-linear running time, we make use of dynamic data structures that maintain bases with approximate maximum cardinality and weight under certain element updates. These data structures need to support a weight decrease operation and a novel FREEZE operation that allows the algorithm to freeze elements (i.e. force to be contained) in its basis regardless of future data structure operations. For the laminar matroid, we present a new dynamic data structure using the top tree interface of Alstrup, Holm, de Lichtenberg, and Thorup [TALG '05] that maintains the maximum weight basis under insertions and deletions of elements in $O(\log n)$ time. For the transversal matroid the FREEZE operation corresponds to requiring the data structure to keep a certain set $S$ of vertices matched, a property that we call $S$-stability.
DSEC-MOS: Segment Any Moving Object with Moving Ego Vehicle
Authors: Zhuyun Zhou, Zongwei Wu, Rémi Boutteau, Fan Yang, Dominique Ginhac
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2305.00126
Pdf link: https://arxiv.org/pdf/2305.00126
Abstract Moving Object Segmentation (MOS), a crucial task in computer vision, has numerous applications such as surveillance, autonomous driving, and video analytics. Existing datasets for moving object segmentation mainly focus on RGB or Lidar videos, but lack additional event information that can enhance the understanding of dynamic scenes. To address this limitation, we propose a novel dataset, called DSEC-MOS. Our dataset includes frames captured by RGB cameras embedded on moving vehicules and incorporates event data, which provide high temporal resolution and low-latency information about changes in the scenes. To generate accurate segmentation mask annotations for moving objects, we apply the recently emerged large model SAM - Segment Anything Model - with moving object bounding boxes from DSEC-MOD serving as prompts and calibrated RGB frames, then further revise the results. Our DSEC-MOS dataset contains in total 16 sequences (13314 images). To the best of our knowledge, DSEC-MOS is also the first moving object segmentation dataset that includes event camera in autonomous driving. Project Page: https://github.com/ZZY-Zhou/DSEC-MOS.
Optimal Scheduling in IoT-Driven Smart Isolated Microgrids Based on Deep Reinforcement Learning
Authors: Jiaju Qi, Lei Lei, Kan Zheng, Simon X. Yang, Xuemin (Sherman)Shen
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2305.00127
Pdf link: https://arxiv.org/pdf/2305.00127
Abstract In this paper, we investigate the scheduling issue of diesel generators (DGs) in an Internet of Things (IoT)-Driven isolated microgrid (MG) by deep reinforcement learning (DRL). The renewable energy is fully exploited under the uncertainty of renewable generation and load demand. The DRL agent learns an optimal policy from history renewable and load data of previous days, where the policy can generate real-time decisions based on observations of past renewable and load data of previous hours collected by connected sensors. The goal is to reduce operating cost on the premise of ensuring supply-demand balance. In specific, a novel finite-horizon partial observable Markov decision process (POMDP) model is conceived considering the spinning reserve. In order to overcome the challenge of discrete-continuous hybrid action space due to the binary DG switching decision and continuous energy dispatch (ED) decision, a DRL algorithm, namely the hybrid action finite-horizon RDPG (HAFH-RDPG), is proposed. HAFH-RDPG seamlessly integrates two classical DRL algorithms, i.e., deep Q-network (DQN) and recurrent deterministic policy gradient (RDPG), based on a finite-horizon dynamic programming (DP) framework. Extensive experiments are performed with real-world data in an IoT-driven MG to evaluate the capability of the proposed algorithm in handling the uncertainty due to inter-hour and inter-day power fluctuation and to compare its performance with those of the benchmark algorithms.
Learning to Seek: Multi-Agent Online Source Seeking Against Non-Stochastic Disturbances
Authors: Bin Du, Kun Qian, Christian Claudel, Dengfeng Sun
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2305.00154
Pdf link: https://arxiv.org/pdf/2305.00154
Abstract This paper proposes to leverage the emerging~learning techniques and devise a multi-agent online source {seeking} algorithm under unknown environment. Of particular significance in our problem setups are: i) the underlying environment is not only unknown, but dynamically changing and also perturbed by two types of non-stochastic disturbances; and ii) a group of agents is deployed and expected to cooperatively seek as many sources as possible. Correspondingly, a new technique of discounted Kalman filter is developed to tackle with the non-stochastic disturbances, and a notion of confidence bound in polytope nature is utilized~to aid the computation-efficient cooperation among~multiple agents. With standard assumptions on the unknown environment as well as the disturbances, our algorithm is shown to achieve sub-linear regrets under the two~types of non-stochastic disturbances; both results are comparable to the state-of-the-art. Numerical examples on a real-world pollution monitoring application are provided to demonstrate the effectiveness of our algorithm.
Uniqueness and Rapid Mixing in the Bipartite Hardcore Model
Authors: Xiaoyu Chen, Jingcheng Liu, Yitong Yin
Subjects: Data Structures and Algorithms (cs.DS); Probability (math.PR)
Arxiv link: https://arxiv.org/abs/2305.00186
Pdf link: https://arxiv.org/pdf/2305.00186
Abstract We characterize the uniqueness condition in the hardcore model for bipartite graphs with degree bounds only on one side, and provide a nearly linear time sampling algorithm that works up to the uniqueness threshold. We show that the uniqueness threshold for bipartite graph has almost the same form of the tree uniqueness threshold for general graphs, except with degree bounds only on one side of the bipartition. The hardcore model from statistical physics can be seen as a weighted enumeration of independent sets. Its bipartite version (#BIS) is a central open problem in approximate counting. Compared to the same problem in a general graph, surprising tractable regime have been identified that are believed to be hard in general. This is made possible by two lines of algorithmic approach: the high-temperature algorithms starting from Liu and Lu (STOC 2015), and the low-temperature algorithms starting from Helmuth, Perkins, and Regts (STOC 2019). In this work, we study the limit of these algorithms in the high-temperature case. Our characterization of the uniqueness condition is obtained by proving decay of correlations for arguably the best possible regime, which involves locating fixpoints of multivariate iterative rational maps and showing their contraction. We also give a nearly linear time sampling algorithm based on simulating field dynamics only on one side of the bipartite graph that works up to the uniqueness threshold. Our algorithm is very different from the original high-temperature algorithm of Liu and Lu, and it makes use of a connection between correlation decay and spectral independence of Markov chains. Last but not the least, we are able to show that the standard Glauber dynamics on both side of the bipartite graph mixes in polynomial time up to the uniqueness.
Large-Scale Assessment of Labour Market Dynamics in China during the COVID-19 Pandemic
Authors: Ying Sun, Hengshu Zhu, Hui Xiong
Subjects: Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2305.00199
Pdf link: https://arxiv.org/pdf/2305.00199
Abstract The outbreak of the COVID-19 pandemic has had an unprecedented impact on China's labour market, and has largely changed the structure of labour supply and demand in different regions. It becomes critical for policy makers to understand the emerging dynamics of the post-pandemic labour market and provide the right policies for supporting the sustainable development of regional economies. To this end, in this paper, we provide a data-driven approach to assess and understand the evolving dynamics in regions' labour markets with large-scale online job search queries and job postings. In particular, we model the spatial-temporal patterns of labour flow and labour demand which reflect the attractiveness of regional labour markets. Our analysis shows that regional labour markets suffered from dramatic changes and demonstrated unusual signs of recovery during the pandemic. Specifically, the intention of labour flow quickly recovered with a trend of migrating from large to small cities and from northern to southern regions, respectively. Meanwhile, due to the pandemic, the demand of blue-collar workers has been substantially reduced compared to that of white-collar workers. In addition, the demand structure of blue-collar jobs also changed from manufacturing to service industries. Our findings reveal that the pandemic can cause varied impacts on regions with different structures of labour demand and control policies. This analysis provides timely information for both individuals and organizations in confronting the dynamic change in job markets during the extreme events, such as pandemics. Also, the governments can be better assisted for providing the right policies on job markets in facilitating the sustainable development of regions' economies.
Deep Learning Based Channel Estimation in High Mobility Communications Using Bi-RNN Networks
Authors: Abdul Karim Gizzini, Marwa Chafii
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2305.00208
Pdf link: https://arxiv.org/pdf/2305.00208
Abstract Doubly-selective channel estimation represents a key element in ensuring communication reliability in wireless systems. Due to the impact of multi-path propagation and Doppler interference in dynamic environments, doubly-selective channel estimation becomes challenging. Conventional channel estimation schemes encounter performance degradation in high mobility scenarios due to the usage of limited training pilots. Recently, deep learning (DL) has been utilized for doubly-selective channel estimation, where convolutional neural network (CNN) networks are employed in the frame-by-frame (FBF) channel estimation. However, CNN-based estimators require high complexity, making them impractical in real-case scenarios. For this purpose, we overcome this issue by proposing an optimized and robust bi-directional recurrent neural network (Bi-RNN) based channel estimator to accurately estimate the doubly-selective channel, especially in high mobility scenarios. The proposed estimator is based on performing end-to-end interpolation using gated recurrent unit (GRU) unit. Extensive numerical experiments demonstrate that the developed Bi-GRU estimator significantly outperforms the recently proposed CNN-based estimators in different mobility scenarios, while substantially reducing the overall computational complexity.
Hierarchical Dialogue Understanding with Special Tokens and Turn-level Attention
Authors: Xiao Liu, Jian Zhang, Heng Zhang, Fuzhao Xue, Yang You
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2305.00262
Pdf link: https://arxiv.org/pdf/2305.00262
Abstract Compared with standard text, understanding dialogue is more challenging for machines as the dynamic and unexpected semantic changes in each turn. To model such inconsistent semantics, we propose a simple but effective Hierarchical Dialogue Understanding model, HiDialog. Specifically, we first insert multiple special tokens into a dialogue and propose the turn-level attention to learn turn embeddings hierarchically. Then, a heterogeneous graph module is leveraged to polish the learned embeddings. We evaluate our model on various dialogue understanding tasks including dialogue relation extraction, dialogue emotion recognition, and dialogue act classification. Results show that our simple approach achieves state-of-the-art performance on all three tasks above. All our source code is publicly available at https://github.com/ShawX825/HiDialog.
ZIRCON: Zero-watermarking-based approach for data integrity and secure provenance in IoT networks
Authors: Omair Faraj, David Megías, Joaquin Garcia-Alfaro
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2305.00266
Pdf link: https://arxiv.org/pdf/2305.00266
Abstract The Internet of Things (IoT) is integrating the Internet and smart devices in almost every domain such as home automation, e-healthcare systems, vehicular networks, industrial control and military applications. In these sectors, sensory data, which is collected from multiple sources and managed through intermediate processing by multiple nodes, is used for decision-making processes. Ensuring data integrity and keeping track of data provenance is a core requirement in such a highly dynamic context, since data provenance is an important tool for the assurance of data trustworthiness. Dealing with such requirements is challenging due to the limited computational and energy resources in IoT networks. This requires addressing several challenges such as processing overhead, secure provenance, bandwidth consumption and storage efficiency. In this paper, we propose ZIRCON, a novel zero-watermarking approach to establish end-to-end data trustworthiness in an IoT network. In ZIRCON, provenance information is stored in a tamper-proof centralized network database through watermarks, generated at source node before transmission. We provide an extensive security analysis showing the resilience of our scheme against passive and active attacks. We also compare our scheme with existing works based on performance metrics such as computational time, energy utilization and cost analysis. The results show that ZIRCON is robust against several attacks, lightweight, storage efficient, and better in energy utilization and bandwidth consumption, compared to prior art.
A spectral method for a Fokker-Planck equation in neuroscience with applications in neural networks with learning rules
Authors: Pei Zhang, Yanli Wang, Zhennan Zhou
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2305.00275
Pdf link: https://arxiv.org/pdf/2305.00275
Abstract In this work, we consider the Fokker-Planck equation of the Nonlinear Noisy Leaky Integrate-and-Fire (NNLIF) model for neuron networks. Due to the firing events of neurons at the microscopic level, this Fokker-Planck equation contains dynamic boundary conditions involving specific internal points. To efficiently solve this problem and explore the properties of the unknown, we construct a flexible numerical scheme for the Fokker-Planck equation in the framework of spectral methods that can accurately handle the dynamic boundary condition. This numerical scheme is stable with suitable choices of test function spaces, and asymptotic preserving, and it is easily extendable to variant models with multiple time scales. We also present extensive numerical examples to verify the scheme properties, including order of convergence and time efficiency, and explore unique properties of the model, including blow-up phenomena for the NNLIF model and learning and discriminative properties for the NNLIF model with learning rules.
Improving Classification of Retinal Fundus Image Using Flow Dynamics Optimized Deep Learning Methods
Authors: V. Banupriya, S. Anusuya
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2305.00294
Pdf link: https://arxiv.org/pdf/2305.00294
Abstract Diabetic Retinopathy (DR) refers to a barrier that takes place in diabetes mellitus damaging the blood vessel network present in the retina. This may endanger the subjects' vision if they have diabetes. It can take some time to perform a DR diagnosis using color fundus pictures because experienced clinicians are required to identify the tumors in the imagery used to identify the illness. Automated detection of the DR can be an extremely challenging task. Convolutional Neural Networks (CNN) are also highly effective at classifying images when applied in the present situation, particularly compared to the handmade and functionality methods employed. In order to guarantee high results, the researchers also suggested a cutting-edge CNN model that might determine the characteristics of the fundus images. The features of the CNN output were employed in various classifiers of machine learning for the proposed system. This model was later evaluated using different forms of deep learning methods and Visual Geometry Group (VGG) networks). It was done by employing the images from a generic KAGGLE dataset. Here, the River Formation Dynamics (RFD) algorithm proposed along with the FUNDNET to detect retinal fundus images has been employed. The investigation's findings demonstrated that the approach performed better than alternative approaches.
Fusion for Visual-Infrared Person ReID in Real-World Surveillance Using Corrupted Multimodal Data
Authors: Arthur Josi, Mahdi Alehdaghi, Rafael M. O. Cruz, Eric Granger
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.00320
Pdf link: https://arxiv.org/pdf/2305.00320
Abstract Visible-infrared person re-identification (V-I ReID) seeks to match images of individuals captured over a distributed network of RGB and IR cameras. The task is challenging due to the significant differences between V and I modalities, especially under real-world conditions, where images are corrupted by, e.g, blur, noise, and weather. Indeed, state-of-art V-I ReID models cannot leverage corrupted modality information to sustain a high level of accuracy. In this paper, we propose an efficient model for multimodal V-I ReID -- named Multimodal Middle Stream Fusion (MMSF) -- that preserves modality-specific knowledge for improved robustness to corrupted multimodal images. In addition, three state-of-art attention-based multimodal fusion models are adapted to address corrupted multimodal data in V-I ReID, allowing to dynamically balance each modality importance. Recently, evaluation protocols have been proposed to assess the robustness of ReID models under challenging real-world scenarios. However, these protocols are limited to unimodal V settings. For realistic evaluation of multimodal (and cross-modal) V-I person ReID models, we propose new challenging corrupted datasets for scenarios where V and I cameras are co-located (CL) and not co-located (NCL). Finally, the benefits of our Masking and Local Multimodal Data Augmentation (ML-MDA) strategy are explored to improve the robustness of ReID models to multimodal corruption. Our experiments on clean and corrupted versions of the SYSU-MM01, RegDB, and ThermalWORLD datasets indicate the multimodal V-I ReID models that are more likely to perform well in real-world operational conditions. In particular, our ML-MDA is an important strategy for a V-I person ReID system to sustain high accuracy and robustness when processing corrupted multimodal images. Also, our multimodal ReID model MMSF outperforms every method under CL and NCL camera scenarios.
Maximum Match Subsequence Alignment Algorithm Finely Grained (MMSAA FG)
Authors: Bharath Reddy, Richard Fields
Subjects: Information Theory (cs.IT); Genomics (q-bio.GN)
Arxiv link: https://arxiv.org/abs/2305.00329
Pdf link: https://arxiv.org/pdf/2305.00329
Abstract Sequence alignment is common nowadays as it is used in many fields to determine how closely two sequences are related and at times to see how little they differ. In computational biology / Bioinformatics, there are many algorithms developed over the course of time to not only align two sequences quickly but also get good laboratory results from these alignments. The first algorithms developed were based of a technique called Dynamic Programming, which were very slow but were optimal when it comes to sensitivity. To improve speed, more algorithms today are based of heuristic approach, by sacrificing sensitivity. In this paper, we are going to improve on a heuristic algorithm called MASAA (Multiple Anchor Staged Local Sequence Alignment Algorithm) and MASAA Sensitive which we published previously. This new algorithm appropriately called Maximum Match Subsequence Alignment Algorithm Finely Grained. The algorithm is based on suffix tree data structure like our previous algorithms, but to improve sensitivity, we employ adaptive seeds, and finely grained perfect match seeds in between the already identified anchors. We tested this algorithm on a randomly generated sequences, and Rosetta dataset where the sequence length ranged up to 500 thousand.
Critical Scenario Generation for Developing Trustworthy Autonomy
Authors: Wenhao Ding
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2305.00339
Pdf link: https://arxiv.org/pdf/2305.00339
Abstract Autonomous systems, such as self-driving vehicles, quadrupeds, and robot manipulators, are largely enabled by the rapid development of artificial intelligence. However, such systems involve several trustworthy challenges such as safety, robustness, and generalization, due to their deployment in open-ended and real-time environments. To evaluate and improve trustworthiness, simulations or so-called digital twins are largely utilized for system development with low cost and high efficiency. One important thing in virtual simulations is scenarios that consist of static and dynamic objects, specific tasks, and evaluation metrics. However, designing diverse, realistic, and effective scenarios is still a challenging problem. One straightforward way is creating scenarios through human design, which is time-consuming and limited by the experience of experts. Another method commonly used in self-driving areas is log replay. This method collects scenario data in the real world and then replays it in simulations or adds random perturbations. Although the replay scenarios are realistic, most of the collected scenarios are redundant since they are all ordinary scenarios that only consider a small portion of critical cases. The desired scenarios should cover all cases in the real world, especially rare but critical events with extremely low probability. Critical scenarios are rare but important to test autonomous systems under risky conditions and unpredictable perturbations, which reveal their trustworthiness.
Neural Radiance Fields (NeRFs): A Review and Some Recent Developments
Authors: Mohamed Debbagh
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2305.00375
Pdf link: https://arxiv.org/pdf/2305.00375
Abstract Neural Radiance Field (NeRF) is a framework that represents a 3D scene in the weights of a fully connected neural network, known as the Multi-Layer Perception(MLP). The method was introduced for the task of novel view synthesis and is able to achieve state-of-the-art photorealistic image renderings from a given continuous viewpoint. NeRFs have become a popular field of research as recent developments have been made that expand the performance and capabilities of the base framework. Recent developments include methods that require less images to train the model for view synthesis as well as methods that are able to generate views from unconstrained and dynamic scene representations.
Image Completion via Dual-path Cooperative Filtering
Authors: Pourya Shamsolmoali, Masoumeh Zareapoor, Eric Granger
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.00379
Pdf link: https://arxiv.org/pdf/2305.00379
Abstract Given the recent advances with image-generating algorithms, deep image completion methods have made significant progress. However, state-of-art methods typically provide poor cross-scene generalization, and generated masked areas often contain blurry artifacts. Predictive filtering is a method for restoring images, which predicts the most effective kernels based on the input scene. Motivated by this approach, we address image completion as a filtering problem. Deep feature-level semantic filtering is introduced to fill in missing information, while preserving local structure and generating visually realistic content. In particular, a Dual-path Cooperative Filtering (DCF) model is proposed, where one path predicts dynamic kernels, and the other path extracts multi-level features by using Fast Fourier Convolution to yield semantically coherent reconstructions. Experiments on three challenging image completion datasets show that our proposed DCF outperforms state-of-art methods.
Object-Centric Voxelization of Dynamic Scenes via Inverse Neural Rendering
Authors: Siyu Gao, Yanpeng Zhao, Yunbo Wang, Xiaokang Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.00393
Pdf link: https://arxiv.org/pdf/2305.00393
Abstract Understanding the compositional dynamics of the world in unsupervised 3D scenarios is challenging. Existing approaches either fail to make effective use of time cues or ignore the multi-view consistency of scene decomposition. In this paper, we propose DynaVol, an inverse neural rendering framework that provides a pilot study for learning time-varying volumetric representations for dynamic scenes with multiple entities (like objects). It has two main contributions. First, it maintains a time-dependent 3D grid, which dynamically and flexibly binds the spatial locations to different entities, thus encouraging the separation of information at a representational level. Second, our approach jointly learns grid-level local dynamics, object-level global dynamics, and the compositional neural radiance fields in an end-to-end architecture, thereby enhancing the spatiotemporal consistency of object-centric scene voxelization. We present a two-stage training scheme for DynaVol and validate its effectiveness on various benchmarks with multiple objects, diverse dynamics, and real-world shapes and textures. We present visualization at https://sites.google.com/view/dynavol-visual.
LIMOT: A Tightly-Coupled System for LiDAR-Inertial Odometry and Multi-Object Tracking
Authors: Zhongyang Zhu, Junqiao Zhao, Xuebo Tian, Kai Huang, Chen Ye
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2305.00406
Pdf link: https://arxiv.org/pdf/2305.00406
Abstract Simultaneous localization and mapping (SLAM) is critical to the implementation of autonomous driving. Most LiDAR-inertial SLAM algorithms assume a static environment, leading to unreliable localization in dynamic environments. Furthermore, accurate tracking of moving objects is of great significance for the control and planning of autonomous vehicle operation. This study proposes LIMOT, a tightly-coupled multi-object tracking and LiDAR-inertial SLAM system capable of accurately estimating the poses of both ego-vehicle and objects. First, we use 3D bounding boxes generated by an object detector to represent all movable objects and perform LiDAR odometry using inertial measurement unit (IMU) pre-integration result. Based on the historical trajectories of tracked objects in a sliding window, we perform robust object association. We propose a trajectory-based dynamic feature filtering method, which filters out features belonging to moving objects by leveraging tracking results. Factor graph-based optimization is then conducted to optimize the bias of the IMU and the poses of both the ego-vehicle and surrounding objects in a sliding window. Experiments conducted on KITTI datasets show that our method achieves better pose and tracking accuracy than our previous work DL-SLOT and other SLAM and multi-object tracking baseline methods.
Dynamic Obstacles Tracking in mmWave Networks
Authors: Rathindra Nath Dutta, Subhojit Sarkar, Sasthi C. Ghosh
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2305.00429
Pdf link: https://arxiv.org/pdf/2305.00429
Abstract The advent of fifth generation communication networks has led to novel opportunities and problems that were absent in legacy networks. Stringent line-of-sight demands necessitated by fast attenuating nature of millimeter waves (mmWave) through obstacles, pose to be one of the central problems of the field. mmWave links are easily disrupted due to obstacles, both static and dynamic. Handling static obstacles is easy, while dynamic obstacles are usually tracked by expensive additional hardware like cameras and radars, which undoubtedly lead to increased deployment costs. In this manuscript, we propose a novel approach to estimate the trajectories of multiple dynamic obstacles in an ultra dense mmWave network, solely based on link failure information, without resorting to any specialized tracking hardware. We keep a track of link failures over a short window of time and use that knowledge to extrapolate the trajectories of dynamic obstacles. After proving its NP-completeness, we employ a greedy set cover based approach for this. We then use the obtained trajectories to tag upcoming links as per their blockage possibility. We simulate on real world data to validate our approach based on its accuracy, sensitivity, and precision. Our approach is also shown to outperform an existing one.
EVREAL: Towards a Comprehensive Benchmark and Analysis Suite for Event-based Video Reconstruction
Authors: Burak Ercan, Onur Eker, Aykut Erdem, Erkut Erdem
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.00434
Pdf link: https://arxiv.org/pdf/2305.00434
Abstract Event cameras are a new type of vision sensor that incorporates asynchronous and independent pixels, offering advantages over traditional frame-based cameras such as high dynamic range and minimal motion blur. However, their output is not easily understandable by humans, making the reconstruction of intensity images from event streams a fundamental task in event-based vision. While recent deep learning-based methods have shown promise in video reconstruction from events, this problem is not completely solved yet. To facilitate comparison between different approaches, standardized evaluation protocols and diverse test datasets are essential. This paper proposes a unified evaluation methodology and introduces an open-source framework called EVREAL to comprehensively benchmark and analyze various event-based video reconstruction methods from the literature. Using EVREAL, we give a detailed analysis of the state-of-the-art methods for event-based video reconstruction, and provide valuable insights into the performance of these methods under varying settings, challenging scenarios, and downstream tasks.
Learning, Diversity and Adaptation in Changing Environments: The Role of Weak Links
Authors: Daron Acemoglu, Asuman Ozdaglar, Sarath Pattathil
Subjects: Social and Information Networks (cs.SI); Theoretical Economics (econ.TH)
Arxiv link: https://arxiv.org/abs/2305.00474
Pdf link: https://arxiv.org/pdf/2305.00474
Abstract Adaptation to dynamic conditions requires a certain degree of diversity. If all agents take the best current action, learning that the underlying state has changed and behavior should adapt will be slower. Diversity is harder to maintain when there is fast communication between agents, because they tend to find out and pursue the best action rapidly. We explore these issues using a model of (Bayesian) learning over a social network. Agents learn rapidly from and may also have incentives to coordinate with others to whom they are connected via strong links. We show, however, that when the underlying environment changes sufficiently rapidly, any network consisting of just strong links will do only a little better than random choice in the long run. In contrast, networks combining strong and weak links, whereby the latter type of links transmit information only slowly, can achieve much higher long-run average payoffs. The best social networks are those that combine a large fraction of agents into a strongly-connected component, while still maintaining a sufficient number of smaller communities that make diverse choices and communicate with this component via weak links.
Fixed-time safe tracking control of uncertain high-order nonlinear pure-feedback systems via unified transformation functions
Authors: Chaoqun Guo, Jiangping Hu, Jiasheng Hao, Sergej Celikovsky, Xiaoming Hu
Subjects: Systems and Control (eess.SY); Mathematical Physics (math-ph)
Arxiv link: https://arxiv.org/abs/2305.00505
Pdf link: https://arxiv.org/pdf/2305.00505
Abstract In this paper, a fixed-time safe control problem is investigated for an uncertain high-order nonlinear pure-feedback system with state constraints. A new nonlinear transformation function is firstly proposed to handle both the constrained and unconstrained cases in a unified way. Further, a radial basis function neural network is constructed to approximate the unknown dynamics in the system and a fixed-time dynamic surface control (FDSC) technique is developed to facilitate the fixed-time control design for the uncertain high-order pure-feedback system. Combined with the proposed unified transformation function and the FDSC technique, an adaptive fixed-time control strategy is proposed to guarantee the fixed-time tracking. The proposed fixed-time control strategy can guarantee uniform control structure when addressing both constrained and unconstrained situations. Numerical examples are presented to demonstrate the proposed fixed-time tracking control strategy.
StyleLipSync: Style-based Personalized Lip-sync Video Generation
Authors: Taekyung Ki, Dongchan Min
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.00521
Pdf link: https://arxiv.org/pdf/2305.00521
Abstract In this paper, we present StyleLipSync, a style-based personalized lip-sync video generative model that can generate identity-agnostic lip-synchronizing video from arbitrary audio. To generate a video of arbitrary identities, we leverage expressive lip prior from the semantically rich latent space of a pre-trained StyleGAN, where we can also design a video consistency with a linear transformation. In contrast to the previous lip-sync methods, we introduce pose-aware masking that dynamically locates the mask to improve the naturalness over frames by utilizing a 3D parametric mesh predictor frame by frame. Moreover, we propose a few-shot lip-sync adaptation method for an arbitrary person by introducing a sync regularizer that preserves lips-sync generalization while enhancing the person-specific visual information. Extensive experiments demonstrate that our model can generate accurate lip-sync videos even with the zero-shot setting and enhance characteristics of an unseen face using a few seconds of target video through the proposed adaptation method. Please refer to our project page.
MD-Manifold: A Medical-Distance-Based Representation Learning Approach for Medical Concept and Patient Representation
Authors: Shaodong Wang, Qing Li, Wenli Zhang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.00553
Pdf link: https://arxiv.org/pdf/2305.00553
Abstract Effectively representing medical concepts and patients is important for healthcare analytical applications. Representing medical concepts for healthcare analytical tasks requires incorporating medical domain knowledge and prior information from patient description data. Current methods, such as feature engineering and mapping medical concepts to standardized terminologies, have limitations in capturing the dynamic patterns from patient description data. Other embedding-based methods have difficulties in incorporating important medical domain knowledge and often require a large amount of training data, which may not be feasible for most healthcare systems. Our proposed framework, MD-Manifold, introduces a novel approach to medical concept and patient representation. It includes a new data augmentation approach, concept distance metric, and patient-patient network to incorporate crucial medical domain knowledge and prior data information. It then adapts manifold learning methods to generate medical concept-level representations that accurately reflect medical knowledge and patient-level representations that clearly identify heterogeneous patient cohorts. MD-Manifold also outperforms other state-of-the-art techniques in various downstream healthcare analytical tasks. Our work has significant implications in information systems research in representation learning, knowledge-driven machine learning, and using design science as middle-ground frameworks for downstream explorative and predictive analyses. Practically, MD-Manifold has the potential to create effective and generalizable representations of medical concepts and patients by incorporating medical domain knowledge and prior data information. It enables deeper insights into medical data and facilitates the development of new analytical applications for better healthcare outcomes.
RAPID: Autonomous Multi-Agent Racing using Constrained Potential Dynamic Games
Authors: Yixuan Jia, Maulik Bhatt, Negar Mehr
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2305.00579
Pdf link: https://arxiv.org/pdf/2305.00579
Abstract In this work, we consider the problem of autonomous racing with multiple agents where agents must interact closely and influence each other to compete. We model interactions among agents through a game-theoretical framework and propose an efficient algorithm for tractably solving the resulting game in real time. More specifically, we capture interactions among multiple agents through a constrained dynamic game. We show that the resulting dynamic game is an instance of a simple-to-analyze class of games. Namely, we show that our racing game is an instance of a constrained dynamic potential game. An important and appealing property of dynamic potential games is that a generalized Nash equilibrium of the underlying game can be computed by solving a single constrained optimal control problem instead of multiple coupled constrained optimal control problems. Leveraging this property, we show that the problem of autonomous racing is greatly simplified and develop RAPID (autonomous multi-agent RAcing using constrained PotentIal Dynamic games), a racing algorithm that can be solved tractably in real-time. Through simulation studies, we demonstrate that our algorithm outperforms the state-of-the-art approach. We further show the real-time capabilities of our algorithm in hardware experiments.
MAMBO-V: Dynamic Side-Channel Leakage Analysis on RISC-V
Authors: Jan Wichelmann, Christopher Peredy, Florian Sieck, Anna Pätschke, Thomas Eisenbarth
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2305.00584
Pdf link: https://arxiv.org/pdf/2305.00584
Abstract RISC-V is an emerging technology, with applications ranging from embedded devices to high-performance servers. Therefore, more and more security-critical workloads will be conducted with code that is compiled for RISC-V. Well-known microarchitectural side-channel attacks against established platforms like x86 apply to RISC-V CPUs as well. As RISC-V does not mandate any hardware-based side-channel countermeasures, a piece of code compiled for a generic RISC-V CPU in a cloud server cannot make safe assumptions about the microarchitecture on which it is running. Existing tools for aiding software-level precautions by checking side-channel vulnerabilities on source code or x86 binaries are not compatible with RISC-V machine code. In this work, we study the requirements and goals of architecture-specific leakage analysis for RISC-V and illustrate how to achieve these goals with the help of fast and precise dynamic binary analysis. We implement all necessary building blocks for finding side-channel leakages on RISC-V, while relying on existing mature solutions when possible. Our leakage analysis builds upon the modular side-channel analysis framework Microwalk, that examines execution traces for leakage through secret-dependent memory accesses or branches. To provide suitable traces, we port the ARM dynamic binary instrumentation tool MAMBO to RISC-V. Our port named MAMBO-V can instrument arbitrary binaries which use the 64-bit general purpose instruction set. We evaluate our toolchain on several cryptographic libraries with RISC-V support and identify multiple exploitable leakages.
Modeling and Analysis of Analog Non-Volatile Devices for Compute-In-Memory Applications
Authors: Carl Brando, Minseong Park, Sayma Nowshin Chowdhury, Matthew Chen, Kyusang Lee, Sahil Shah
Subjects: Hardware Architecture (cs.AR)
Arxiv link: https://arxiv.org/abs/2305.00618
Pdf link: https://arxiv.org/pdf/2305.00618
Abstract This paper introduces a novel simulation tool for analyzing and training neural network models tailored for compute-in-memory hardware. The tool leverages physics-based device models to enable the design of neural network models and their parameters that are more hardware-accurate. The initial study focused on modeling a CMOS-based floating-gate transistor and memristor device using measurement data from a fabricated device. Additionally, the tool incorporates hardware constraints, such as the dynamic range of data converters, and allows users to specify circuit-level constraints. A case study using the MNIST dataset and LeNet-5 architecture demonstrates the tool's capability to estimate area, power, and accuracy. The results showcase the potential of the proposed tool to optimize neural network models for compute-in-memory hardware.
Dynamic Transfer Learning across Graphs
Authors: Haohui Wang, Yuzhen Mao, Jianhui Sun, Si Zhang, Dawei Zhou
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.00664
Pdf link: https://arxiv.org/pdf/2305.00664
Abstract Transferring knowledge across graphs plays a pivotal role in many high-stake domains, ranging from transportation networks to e-commerce networks, from neuroscience to finance. To date, the vast majority of existing works assume both source and target domains are sampled from a universal and stationary distribution. However, many real-world systems are intrinsically dynamic, where the underlying domains are evolving over time. To bridge the gap, we propose to shift the problem to the dynamic setting and ask: given the label-rich source graphs and the label-scarce target graphs observed in previous T timestamps, how can we effectively characterize the evolving domain discrepancy and optimize the generalization performance of the target domain at the incoming T+1 timestamp? To answer the question, for the first time, we propose a generalization bound under the setting of dynamic transfer learning across graphs, which implies the generalization performance is dominated by domain evolution and domain discrepancy between source and target domains. Inspired by the theoretical results, we propose a novel generic framework DyTrans to improve knowledge transferability across dynamic graphs. In particular, we start with a transformer-based temporal encoding module to model temporal information of the evolving domains; then, we further design a dynamic domain unification module to efficiently learn domain-invariant representations across the source and target domains. Finally, extensive experiments on various real-world datasets demonstrate the effectiveness of DyTrans in transferring knowledge from dynamic source domains to dynamic target domains.
PRSeg: A Lightweight Patch Rotate MLP Decoder for Semantic Segmentation
Authors: Yizhe Ma, Fangjian Lin, Sitong Wu, Shengwei Tian, Long Yu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.00671
Pdf link: https://arxiv.org/pdf/2305.00671
Abstract The lightweight MLP-based decoder has become increasingly promising for semantic segmentation. However, the channel-wise MLP cannot expand the receptive fields, lacking the context modeling capacity, which is critical to semantic segmentation. In this paper, we propose a parametric-free patch rotate operation to reorganize the pixels spatially. It first divides the feature map into multiple groups and then rotates the patches within each group. Based on the proposed patch rotate operation, we design a novel segmentation network, named PRSeg, which includes an off-the-shelf backbone and a lightweight Patch Rotate MLP decoder containing multiple Dynamic Patch Rotate Blocks (DPR-Blocks). In each DPR-Block, the fully connected layer is performed following a Patch Rotate Module (PRM) to exchange spatial information between pixels. Specifically, in PRM, the feature map is first split into the reserved part and rotated part along the channel dimension according to the predicted probability of the Dynamic Channel Selection Module (DCSM), and our proposed patch rotate operation is only performed on the rotated part. Extensive experiments on ADE20K, Cityscapes and COCO-Stuff 10K datasets prove the effectiveness of our approach. We expect that our PRSeg can promote the development of MLP-based decoder in semantic segmentation.
End to End Lane detection with One-to-Several Transformer
Authors: Kunyang Zhou, Rui Zhou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.00675
Pdf link: https://arxiv.org/pdf/2305.00675
Abstract Although lane detection methods have shown impressive performance in real-world scenarios, most of methods require post-processing which is not robust enough. Therefore, end-to-end detectors like DEtection TRansformer(DETR) have been introduced in lane detection. However, one-to-one label assignment in DETR can degrade the training efficiency due to label semantic conflicts. Besides, positional query in DETR is unable to provide explicit positional prior, making it difficult to be optimized. In this paper, we present the One-to-Several Transformer(O2SFormer). We first propose the one-to-several label assignment, which combines one-to-one and one-to-many label assignments to improve the training efficiency while keeping end-to-end detection. To overcome the difficulty in optimizing one-to-one assignment. We further propose the layer-wise soft label which adjusts the positive weight of positive lane anchors across different decoder layers. Finally, we design the dynamic anchor-based positional query to explore positional prior by incorporating lane anchors into positional query. Experimental results show that O2SFormer significantly speeds up the convergence of DETR and outperforms Transformer-based and CNN-based detectors on the CULane dataset. Code will be available athttps://github.com/zkyseu/O2SFormer.
Learning Terrain-Aware Kinodynamic Model for Autonomous Off-Road Rally Driving With Model Predictive Path Integral Control
Authors: Hojin Lee, Taekyung Kim, Jungwi Mun, Wonsuk Lee
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2305.00676
Pdf link: https://arxiv.org/pdf/2305.00676
Abstract High-speed autonomous driving in off-road environments has immense potential for various applications, but it also presents challenges due to the complexity of vehicle-terrain interactions. In such environments, it is crucial for the vehicle to predict its motion and adjust its controls proactively in response to environmental changes, such as variations in terrain elevation. To this end, we propose a method for learning terrain-aware kinodynamic model which is conditioned on both proprioceptive and exteroceptive information. The proposed model generates reliable predictions of 6-degree-of-freedom motion and can even estimate contact interactions without requiring ground truth force data during training. This enables the design of a safe and robust model predictive controller through appropriate cost function design which penalizes sampled trajectories with unstable motion, unsafe interactions, and high levels of uncertainty derived from the model. We demonstrate the effectiveness of our approach through experiments on a simulated off-road track, showing that our proposed model-controller pair outperforms the baseline and ensures robust high-speed driving performance without control failure.
Joint tone mapping and denoising of thermal infrared images via multi-scale Retinex and multi-task learning
Authors: Axel Gödrich, Daniel König, Gabriel Eilertsen, Michael Teutsch
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.00691
Pdf link: https://arxiv.org/pdf/2305.00691
Abstract Cameras digitize real-world scenes as pixel intensity values with a limited value range given by the available bits per pixel (bpp). High Dynamic Range (HDR) cameras capture those luminance values in higher resolution through an increase in the number of bpp. Most displays, however, are limited to 8 bpp. Naive HDR compression methods lead to a loss of the rich information contained in those HDR images. In this paper, tone mapping algorithms for thermal infrared images with 16 bpp are investigated that can preserve this information. An optimized multi-scale Retinex algorithm sets the baseline. This algorithm is then approximated with a deep learning approach based on the popular U-Net architecture. The remaining noise in the images after tone mapping is reduced implicitly by utilizing a self-supervised deep learning approach that can be jointly trained with the tone mapping approach in a multi-task learning scheme. Further discussions are provided on denoising and deflickering for thermal infrared video enhancement in the context of tone mapping. Extensive experiments on the public FLIR ADAS Dataset prove the effectiveness of our proposed method in comparison with the state-of-the-art.
Full Scaling Automation for Sustainable Development of Green Data Centers
Authors: Shiyu Wang, Yinbo Sun, Xiaoming Shi, Shiyi Zhu, Lin-Tao Ma, James Zhang, Yifei Zheng, Jian Liu
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.00706
Pdf link: https://arxiv.org/pdf/2305.00706
Abstract The rapid rise in cloud computing has resulted in an alarming increase in data centers' carbon emissions, which now accounts for >3% of global greenhouse gas emissions, necessitating immediate steps to combat their mounting strain on the global climate. An important focus of this effort is to improve resource utilization in order to save electricity usage. Our proposed Full Scaling Automation (FSA) mechanism is an effective method of dynamically adapting resources to accommodate changing workloads in large-scale cloud computing clusters, enabling the clusters in data centers to maintain their desired CPU utilization target and thus improve energy efficiency. FSA harnesses the power of deep representation learning to accurately predict the future workload of each service and automatically stabilize the corresponding target CPU usage level, unlike the previous autoscaling methods, such as Autopilot or FIRM, that need to adjust computing resources with statistical models and expert knowledge. Our approach achieves significant performance improvement compared to the existing work in real-world datasets. We also deployed FSA on large-scale cloud computing clusters in industrial data centers, and according to the certification of the China Environmental United Certification Center (CEC), a reduction of 947 tons of carbon dioxide, equivalent to a saving of 1538,000 kWh of electricity, was achieved during the Double 11 shopping festival of 2022, marking a critical step for our company's strategic goal towards carbon neutrality by 2030.
SGX Switchless Calls Made Configless
Authors: Peterson Yuhala, Michael Paper, Timothée Zerbib, Pascal Felber, Valerio Schiavoni, Alain Tchana
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2305.00763
Pdf link: https://arxiv.org/pdf/2305.00763
Abstract Intel's software guard extensions (SGX) provide hardware enclaves to guarantee confidentiality and integrity for sensitive code and data. However, systems leveraging such security mechanisms must often pay high performance overheads. A major source of this overhead is SGX enclave transitions which induce expensive cross-enclave context switches. The Intel SGX SDK mitigates this with a switchless call mechanism for transitionless cross-enclave calls using worker threads. Intel's SGX switchless call implementation improves performance but provides limited flexibility: developers need to statically fix the system configuration at build time, which is error-prone and misconfigurations lead to performance degradations and waste of CPU resources. ZC-SWITCHLESS is a configless and efficient technique to drive the execution of SGX switchless calls. Its dynamic approach optimises the total switchless worker threads at runtime to minimise CPU waste. The experimental evaluation shows that ZC-SWITCHLESS obviates the performance penalty of misconfigured switchless systems while minimising CPU waste.
RViDeformer: Efficient Raw Video Denoising Transformer with a Larger Benchmark Dataset
Authors: Huanjing Yue, Cong Cao, Lei Liao, Jingyu Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.00767
Pdf link: https://arxiv.org/pdf/2305.00767
Abstract In recent years, raw video denoising has garnered increased attention due to the consistency with the imaging process and well-studied noise modeling in the raw domain. However, two problems still hinder the denoising performance. Firstly, there is no large dataset with realistic motions for supervised raw video denoising, as capturing noisy and clean frames for real dynamic scenes is difficult. To address this, we propose recapturing existing high-resolution videos displayed on a 4K screen with high-low ISO settings to construct noisy-clean paired frames. In this way, we construct a video denoising dataset (named as ReCRVD) with 120 groups of noisy-clean videos, whose ISO values ranging from 1600 to 25600. Secondly, while non-local temporal-spatial attention is beneficial for denoising, it often leads to heavy computation costs. We propose an efficient raw video denoising transformer network (RViDeformer) that explores both short and long-distance correlations. Specifically, we propose multi-branch spatial and temporal attention modules, which explore the patch correlations from local window, local low-resolution window, global downsampled window, and neighbor-involved window, and then they are fused together. We employ reparameterization to reduce computation costs. Our network is trained in both supervised and unsupervised manners, achieving the best performance compared with state-of-the-art methods. Additionally, the model trained with our proposed dataset (ReCRVD) outperforms the model trained with previous benchmark dataset (CRVD) when evaluated on the real-world outdoor noisy videos. Our code and dataset will be released after the acceptance of this work.
Higher-order time domain boundary elements for elastodynamics - graded meshes and hp versions
Authors: Alessandra Aimi, Giulia Di Credico, Heiko Gimperlein, Ernst P. Stephan
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
Arxiv link: https://arxiv.org/abs/2305.00772
Pdf link: https://arxiv.org/pdf/2305.00772
Abstract The solution to the elastodynamic equation in the exterior of a polyhedral domain or a screen exhibits singular behavior from the corners and edges. The detailed expansion of the singularities implies quasi-optimal estimates for piecewise polynomial approximations of the Dirichlet trace of the solution and the traction. The results are applied to hp and graded versions of the time domain boundary element method for the weakly singular and the hypersingular integral equations. Numerical examples confirm the theoretical results for the Dirichlet and Neumann problems for screens and for polygonal domains in 2d. They exhibit the expected quasi-optimal convergence rates and the singular behavior of the solutions.
Explicit Knowledge Graph Reasoning for Conversational Recommendation
Authors: Xuhui Ren, Tong Chen, Quoc Viet Hung Nguyen, Lizhen Cui, Zi Huang, Hongzhi Yin
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2305.00783
Pdf link: https://arxiv.org/pdf/2305.00783
Abstract Traditional recommender systems estimate user preference on items purely based on historical interaction records, thus failing to capture fine-grained yet dynamic user interests and letting users receive recommendation only passively. Recent conversational recommender systems (CRSs) tackle those limitations by enabling recommender systems to interact with the user to obtain her/his current preference through a sequence of clarifying questions. Despite the progress achieved in CRSs, existing solutions are far from satisfaction in the following two aspects: 1) current CRSs usually require each user to answer a quantity of clarifying questions before reaching the final recommendation, which harms the user experience; 2) there is a semantic gap between the learned representations of explicitly mentioned attributes and items. To address these drawbacks, we introduce the knowledge graph (KG) as the auxiliary information for comprehending and reasoning a user's preference, and propose a new CRS framework, namely Knowledge Enhanced Conversational Reasoning (KECR) system. As a user can reflect her/his preference via both attribute- and item-level expressions, KECR closes the semantic gap between two levels by embedding the structured knowledge in the KG. Meanwhile, KECR utilizes the connectivity within the KG to conduct explicit reasoning of the user demand, making the model less dependent on the user's feedback to clarifying questions. KECR can find a prominent reasoning chain to make the recommendation explainable and more rationale, as well as smoothen the conversation process, leading to better user experience and conversational recommendation accuracy. Extensive experiments on two real-world datasets demonstrate our approach's superiority over state-of-the-art baselines in both automatic evaluations and human judgments.
Empowering Learner-Centered Instruction: Integrating ChatGPT Python API and Tinker Learning for Enhanced Creativity and Problem-Solving Skills
Authors: Yun-Cheng Tsai
Subjects: Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2305.00821
Pdf link: https://arxiv.org/pdf/2305.00821
Abstract The ChatGPT Python API plays a crucial role in promoting Learner-Centered Instruction (LCI) and aligns with the principles of Tinker Learning, allowing students to discover their learning strategies. LCI emphasizes the importance of active, hands-on learning experiences and encourages students to take responsibility for their learning journey. By integrating the ChatGPT Python API into the educational process, students can explore various resources, generate new ideas, and create content in a more personalized manner. This innovative approach enables students to engage with the learning material deeper, fostering a sense of ownership and motivation. As they work through the Creative Learning Spiral, students develop essential skills such as critical thinking, problem-solving, and creativity. The ChatGPT Python API is a valuable tool for students to explore different solutions, evaluate alternatives, and make informed decisions, all while encouraging self-directed learning. In Tinker Learning environments, the integration of ChatGPT Python API empowers students to experiment and iterate, allowing them to find the most effective learning strategies that cater to their individual needs and preferences. This personalized approach helps students to become more confident in their abilities, leading to tremendous academic success and long-term skill development. By leveraging the capabilities of the ChatGPT Python API, educational institutions can create a more engaging, supportive, and dynamic learning environment. This approach aligns with the principles of Learner-Centered Instruction and Tinker Learning, promoting a culture of curiosity, exploration, and creativity among students while preparing them for the challenges of the fast-paced, ever-changing world.
Jointly Managing Electrical and Thermal Energy in Solar- and Battery-powered Computer Systems
Authors: Noman Bashir, Yasra Chandio, David Irwin, Fatima M. Anwar, Jeremy Gummeson, Prashant Shenoy
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computers and Society (cs.CY); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2305.00855
Pdf link: https://arxiv.org/pdf/2305.00855
Abstract Environmentally-powered computer systems operate on renewable energy harvested from their environment, such as solar or wind, and stored in batteries. While harvesting environmental energy has long been necessary for small-scale embedded systems without access to external power sources, it is also increasingly important in designing sustainable larger-scale systems for edge applications. For sustained operations, such systems must consider not only the electrical energy but also the thermal energy available in the environment in their design and operation. Unfortunately, prior work generally ignores the impact of thermal effects, and instead implicitly assumes ideal temperatures. To address the problem, we develop a thermodynamic model that captures the interplay of electrical and thermal energy in environmentally-powered computer systems. The model captures the effect of environmental conditions, the system's physical properties, and workload scheduling on performance. In evaluating our model, we distill the thermal effects that impact these systems using a small-scale prototype and a programmable incubator. We then leverage our model to show how considering these thermal effects in designing and operating environmentally-powered computer systems of varying scales can improve their energy-efficiency, performance, and availability.
Supporting Contextual Conversational Agent-Based Software Development
Authors: Glaucia Melo, Luis Fernando Lins, Paulo Alencar, Donald Cowan
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2305.00885
Pdf link: https://arxiv.org/pdf/2305.00885
Abstract Software Development (SD) is remarkably dynamic and is critically dependent on the knowledge acquired by the project's software developers as the project progresses. Software developers need to understand large amounts of information related to the tasks at hand. This information (context) is often not explicit, as it can be lost in large documentation repositories, a team member's brain, or beyond their cognitive memory capacity. These contexts include tool features, integration strategies, data structures, code syntax, approaches to tasks, project definitions, and even implicit or tacit contexts, which add significant complexity to the SD process. Current software development practices still lack sufficient techniques using the existing SD execution information and context to provide developers with relevant process guidance, augmenting their capacity to do their job using available applicable information. This paper presents ongoing and future research on an approach to support conversational agent-based knowledge-augmented software development. Developers benefit by receiving recommendations about task-related information and workflows they need to execute. This work advances human-computer interaction patterns in workflow engines, from graphical user interfaces to conversational patterns in software engineering.
Learning Flight Control Systems from Human Demonstrations and Real-Time Uncertainty-Informed Interventions
Authors: Prashant Ganesh, J. Humberto Ramos, Vinicius G. Goecks, Jared Paquet, Matthew Longmire, Nicholas R. Waytowich, Kevin Brink
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2305.00929
Pdf link: https://arxiv.org/pdf/2305.00929
Abstract This paper describes a methodology for learning flight control systems from human demonstrations and interventions while considering the estimated uncertainty in the learned models. The proposed approach uses human demonstrations to train an initial model via imitation learning and then iteratively, improve its performance by using real-time human interventions. The aim of the interventions is to correct undesired behaviors and adapt the model to changes in the task dynamics. The learned model uncertainty is estimated in real-time via Monte Carlo Dropout and the human supervisor is cued for intervention via an audiovisual signal when this uncertainty exceeds a predefined threshold. This proposed approach is validated in an autonomous quadrotor landing task on both fixed and moving platforms. It is shown that with this algorithm, a human can rapidly teach a flight task to an unmanned aerial vehicle via demonstrating expert trajectories and then adapt the learned model by intervening when the learned controller performs any undesired maneuver, the task changes, and/or the model uncertainty exceeds a threshold
A Comparison of Pneumatic Actuators for Soft Growing Vine Robots
Authors: Alexander M. Kübler, Cosima du Pasquier, Andrew Low, Betim Djambazi, Nicolas Aymon, Julian Förster, Nathaniel Agharese, Roland Siegwart, Allison M. Okamura
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2305.00967
Pdf link: https://arxiv.org/pdf/2305.00967
Abstract Soft pneumatic actuators are used to steer soft growing "vine" robots while being flexible enough to undergo the tip eversion required for growth. They also meet the requirements to steer soft growing vine robots through challenging terrain. In this study, we compared the performance of three types of pneumatic actuators in terms of their ability to perform eversion, bending, dynamic motion, and force: the pouch motor, the cylindrical pneumatic artificial muscle (cPAM), and the fabric pneumatic artificial muscle (fPAM). The pouch motor is advantageous for prototyping due to its simple manufacturing process. The cPAM exhibits superior bending behavior and produces the highest forces, while the fPAM actuates fastest and everts at the lowest pressure. We evaluated a similar range of dimensions for each actuator type. Larger actuators can produce more significant deformations and forces, but smaller actuators inflate more quickly and require a lower eversion pressure. Since vine robots are lightweight, the effect of gravity on the functionality of different actuators is minimal. We developed a new analytical model that predicts the pressure-to-bending behavior of vine robot actuators. Using the actuator results, we designed and demonstrated a 4.8 m long vine robot equipped with highly maneuverable 60x60 mm cPAMs in a three-dimensional obstacle course. The vine robot was able to move around sharp turns, travel through a passage smaller than its diameter, and lift itself against gravity.

A-suozhang / GetArxivDaily

New submissions for Tue, 2 May 23 #48

Keyword: efficient

Fair Distribution of Delivery Orders

Click-Feedback Retrieval

The Kolmogorov N-width for linear transport: Exact representation and the influence of the data

CarGameAR: An Integrated AR Car Game Authoring Interface for Custom-Built Car Programed on Arduino Board

Space reduction techniques for the $3$-wise Kemeny problem

Learning to Seek: Multi-Agent Online Source Seeking Against Non-Stochastic Disturbances

Beyond Prediction: On-street Parking Recommendation using Heterogeneous Graph-based List-wise Ranking

Asynchronous Distributed Protocol for Service Provisioning in the Edge-Cloud Continuum

Distributed State Estimation for Linear Time-Varying Systems with Sensor Network Delays

Data-Driven Subgroup Identification for Linear Regression

Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming

ZIRCON: Zero-watermarking-based approach for data integrity and secure provenance in IoT networks

Path Planning for Multiple Tethered Robots Using Topological Braids

A spectral method for a Fokker-Planck equation in neuroscience with applications in neural networks with learning rules

NSLF-OL: Online Learning of Neural Surface Light Fields alongside Real-time Incremental 3D Reconstruction

Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning

An Efficient Plane Extraction Approach for Bundle Adjustment on LiDAR Point clouds

Patent Mining by Extracting Functional Analysis Information Modelled As Graph Structure: A Patent Knowledge-base Collaborative Building Approach

Optimizing Privacy, Utility and Efficiency in Constrained Multi-Objective Federated Learning

Fusion for Visual-Infrared Person ReID in Real-World Surveillance Using Corrupted Multimodal Data

Leveraging Data Mining Algorithms to Recommend Source Code Changes

MinMaxLTTB: Leveraging MinMax-Preselection to Scale LTTB

MH-DETR: Video Moment and Highlight Detection with Cross-modal Transformer

Electricity Price Prediction for Energy Storage System Arbitrage: A Decision-focused Approach

Edge Learning for Large-Scale Internet of Things With Task-Oriented Efficient Communication

Alternately denoising and reconstructing unoriented point sets

Transformer-based Sequence Labeling for Audio Classification based on MFCCs

Ortho-Radial Drawing in Near-Linear Time

STAR-RIS-Aided Mobile Edge Computing: Computation Rate Maximization with Binary Amplitude Coefficients

TALLRec: An Effective and Efficient Tuning Framework to Align Large Language Model with Recommendation

Hypergraphs with Edge-Dependent Vertex Weights: Spectral Clustering based on the 1-Laplacian

Unified high-order multi-scale method for mechanical behavior simulation and strength prediction of composite plate and shell structures

Efficient and accurate nonlinear model reduction via first-order empirical interpolation

Posterior Sampling for Deep Reinforcement Learning

Learned Focused Plenoptic Image Compression with Microimage Preprocessing and Global Attention

Deep Learning-based Spatio Temporal Facial Feature Visual Speech Recognition

Collective Relational Inference for learning physics-consistent heterogeneous particle interactions

Scaling Pareto-Efficient Decision Making Via Offline Multi-Objective RL

RAPID: Autonomous Multi-Agent Racing using Constrained Potential Dynamic Games

The MCC approaches the geometric mean of precision and recall as true negatives approach infinity

Containerization of a polyglot microservice application using Docker and Kubernetes

Consolidator: Mergeable Adapter with Grouped Connections for Visual Adaptation

Decomposition Enhances Reasoning via Self-Evaluation Guided Decoding

GTree: GPU-Friendly Privacy-preserving Decision Tree Training and Inference

Dynamic Transfer Learning across Graphs

On the Complexity of Multi-Agent Decision Making: From Learning in Games to Partial Monitoring

Efficient dynamic model based testing using greedy test case selection

ZeroSearch: Local Image Search from Text with Zero Shot Learning

Adaptively Topological Tensor Network for Multi-view Subspace Clustering

Emotions Beyond Words: Non-Speech Audio Emotion Recognition With Edge Computing

Breaks and Code Quality: Investigating the Impact of Forgetting on Software Development. A Registered Report

SGX Switchless Calls Made Configless

Montsalvat: Intel SGX Shielding for GraalVM Native Images

RViDeformer: Efficient Raw Video Denoising Transformer with a Larger Benchmark Dataset

GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation

Automated Paper Screening for Clinical Reviews Using Large Language Models

(1+1)-CMA-ES with Margin for Discrete and Mixed-Integer Problems

Multi-Agent Systems with Quantitative Satisficing Goals

A Spectral Algorithm for List-Decodable Covariance Estimation in Relative Frobenius Norm

Keyword: faster

Neural Network Accelerated Process Design of Polycrystalline Microstructures

LAVA: Data Valuation without Pre-Specified Learning Algorithms

Instruction-ViT: Multi-Modal Prompts for Instruction Learning in ViT

Physics-Guided Graph Neural Networks for Real-time AC/DC Power Flow Analysis

Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning

A Simulation-Augmented Benchmarking Framework for Automatic RSO Streak Detection in Single-Frame Space Images

Guaranteed Evader Detection in Multi-Agent Search Tasks using Pincer Trajectories

Containerization of a polyglot microservice application using Docker and Kubernetes

GTree: GPU-Friendly Privacy-preserving Decision Tree Training and Inference

File Fragment Classification using Light-Weight Convolutional Neural Networks

Event Camera as Region Proposal Network

DNS Privacy with Speed? Evaluating DNS over QUIC and its Impact on Web Performance

A comparison of methods to eliminate regularization weight tuning from data-enabled predictive control

StyleAvatar: Real-time Photo-realistic Portrait Avatar from a Single Video

Keyword: mobile

Wearing face mask detection using deep learning through COVID-19 pandemic

Asynchronous Distributed Protocol for Service Provisioning in the Edge-Cloud Continuum