Abstract
Increased capabilities such as recognition and self-adaptability are now required from IoT applications. While IoT node power consumption is a major concern for these applications, cloud-based processing is becoming unsustainable due to continuous sensor or image data transmission over the wireless network. Thus optimized ML capabilities and data transfers should be integrated in the IoT node. Moreover, IoT applications are torn between sporadic data-logging and energy-hungry data processing (e.g. image classification). Thus, the versatility of the node is key in addressing this wide diversity of energy and processing needs. This paper presents SamurAI, a versatile IoT node bridging this gap in processing and in energy by leveraging two on-chip sub-systems: a low power, clock-less, event-driven Always-Responsive (AR) part and an energy-efficient On-Demand (OD) part. AR contains a 1.7MOPS event-driven, asynchronous Wake-up Controller (WuC) with a 207ns wake-up time optimized for sporadic computing, while OD combines a deep-sleep RISC-V CPU and 1.3TOPS/W Machine Learning (ML) for more complex tasks up to 36GOPS. This architecture partitioning achieves best in class versatility metrics such as peak performance to idle power ratio. On an applicative classification scenario, it demonstrates system power gains, up to 3.5x compared to cloud-based processing, and thus extended battery lifetime.
A Unified Approach to Lane Change Intention Recognition and Driving Status Prediction through TCN-LSTM and Multi-Task Learning Models
Abstract
Lane change (LC) is a continuous and complex operation process. Accurately detecting and predicting LC processes can help traffic participants better understand their surrounding environment, recognize potential LC safety hazards, and improve traffic safety. The present paper focuses on LC processes, developing an LC intention recognition (LC-IR) model and an LC status prediction (LC-SP) model. A novel ensemble temporal convolutional network with Long Short-Term Memory units (TCN-LSTM) is first proposed to capture long-range dependencies in sequential data. Then, three multi-task models (MTL-LSTM, MTL-TCN, MTL-TCN-LSTM) are developed to capture the intrinsic relationship among output indicators. Furthermore, a unified modeling framework for LC intention recognition and driving status prediction (LC-IR-SP) is developed. To validate the performance of the proposed models, a total of 1023 vehicle trajectories are extracted from the CitySim dataset. The Pearson coefficient is employed to determine the related indicators. The results indicate that, using 150 frames as the input length, the TCN-LSTM model with 96.67% accuracy outperforms the TCN and LSTM models in LC intention classification and provides more balanced results for each class. The three proposed multi-task learning models provide markedly increased performance compared to the corresponding single-task models, with average reductions of 24.24% and 22.86% in the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), respectively. The developed LC-IR-SP model has promising applications for autonomous vehicles to identify lane change behaviors, calculate a real-time traffic conflict index, and improve vehicle control strategies.
Surrogate Assisted Generation of Human-Robot Interaction Scenarios
Abstract
As human-robot interaction (HRI) systems advance, so does the difficulty of evaluating and understanding the strengths and limitations of these systems in different environments and with different users. To this end, previous methods have algorithmically generated diverse scenarios that reveal system failures in a shared control teleoperation task. However, these methods require directly evaluating generated scenarios by simulating robot policies and human actions. The computational cost of these evaluations limits their applicability in more complex domains. Thus, we propose augmenting scenario generation systems with surrogate models that predict both human and robot behaviors. In the shared control teleoperation domain and a more complex shared workspace collaboration task, we show that surrogate assisted scenario generation efficiently synthesizes diverse datasets of challenging scenarios. We demonstrate that these failures are reproducible in real-world interactions.
A Data-Driven Hybrid Automaton Framework to Modeling Complex Dynamical Systems
Authors: Yejiang Yang, Zihao Mo, Weiming Xiang
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Dynamical Systems (math.DS)
Abstract
In this paper, a computationally efficient data-driven hybrid automaton model is proposed to capture unknown complex dynamical system behaviors using multiple neural networks. The sampled data of the system is divided by valid partitions into groups corresponding to their topologies, based on which transition guards are defined. Then, a collection of small-scale, computationally efficient neural networks is trained as the local dynamical description for their corresponding topologies. After modeling the system with a neural-network-based hybrid automaton, set-valued reachability analysis with low computation cost is provided based on interval analysis and a split and combined process. Finally, a numerical example of the limit cycle is presented to illustrate that the developed models can significantly reduce the computational cost in reachable set computation without sacrificing any modeling precision.
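As a rough illustration of the interval analysis mentioned above, the following minimal sketch propagates an axis-aligned box through one affine layer followed by a ReLU. The weights, the function names, and the single-layer setup are illustrative assumptions; this is not the authors' networks, nor their split and combined process.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)


def affine_interval(box: List[Interval],
                    weights: List[List[float]],
                    bias: List[float]) -> List[Interval]:
    """Bound W x + b for every x inside the input box."""
    out = []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, (l, u) in zip(row, box):
            # A positive weight maps lower to lower; a negative weight swaps them.
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                lo += w * u
                hi += w * l
        out.append((lo, hi))
    return out


def relu_interval(box: List[Interval]) -> List[Interval]:
    """ReLU is monotone, so it can be applied to each bound separately."""
    return [(max(0.0, l), max(0.0, u)) for l, u in box]


# Hypothetical input set and layer parameters, chosen only for illustration.
box = [(-1.0, 1.0), (0.0, 2.0)]
W = [[1.0, -1.0], [0.5, 0.5]]
b = [0.0, -1.0]
reach = relu_interval(affine_interval(box, W, b))
print(reach)
```

The resulting box over-approximates the layer's output set. Splitting the input box into smaller pieces and combining the results is one common way to tighten such bounds, at extra computational cost.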
Abstract
Robots operating in the real world require both rich manipulation skills and the ability to semantically reason about when to apply those skills. Towards this goal, recent works have integrated semantic representations from large-scale pretrained vision-language (VL) models into manipulation models, imparting them with more general reasoning capabilities. However, we show that the conventional pretraining-finetuning pipeline for integrating such representations entangles the learning of domain-specific action information and domain-general visual information, leading to less data-efficient training and poor generalization to unseen objects and tasks. To this end, we propose ProgramPort, a modular approach to better leverage pretrained VL models by exploiting the syntactic and semantic structures of language instructions. Our framework uses a semantic parser to recover an executable program, composed of functional modules grounded on vision and action across different modalities. Each functional module is realized as a combination of deterministic computation and learnable neural networks. Program execution produces parameters to general manipulation primitives for a robotic end-effector. The entire modular network can be trained with end-to-end imitation learning objectives. Experiments show that our model successfully disentangles action and perception, translating to improved zero-shot and compositional generalization in a variety of manipulation behaviors. Project webpage at: https://progport.github.io.
Physics-informed Data-driven Discovery of Constitutive Models with Application to Strain-Rate-sensitive Soft Materials
Authors: Kshitiz Upadhyay, Jan N. Fuhg, Nikolaos Bouklas, K.T. Ramesh
Abstract
A novel data-driven constitutive modeling approach is proposed, which combines the physics-informed nature of modeling based on continuum thermodynamics with the benefits of machine learning. This approach is demonstrated on strain-rate-sensitive soft materials. This model is based on the viscous dissipation-based visco-hyperelasticity framework where the total stress is decomposed into volumetric, isochoric hyperelastic, and isochoric viscous overstress contributions. It is shown that each of these stress components can be written as linear combinations of the components of an irreducible integrity basis. Three Gaussian process regression-based surrogate models are trained (one per stress component) between principal invariants of strain and strain rate tensors and the corresponding coefficients of the integrity basis components. It is demonstrated that this type of model construction enforces key physics-based constraints on the predicted responses: the second law of thermodynamics, the principles of local action and determinism, objectivity, the balance of angular momentum, an assumed reference state, isotropy, and limited memory. The three surrogate models that constitute our constitutive model are evaluated by training them on small-size numerically generated data sets corresponding to a single deformation mode and then analyzing their predictions over a much wider testing regime comprising multiple deformation modes. Our physics-informed data-driven constitutive model predictions are compared with the corresponding predictions of classical continuum thermodynamics-based and purely data-driven models. It is shown that our surrogate models can reasonably capture the stress-strain-strain rate responses in both training and testing regimes, and provide improvements in terms of prediction accuracy, generalizability to multiple deformation modes, and compatibility with limited data.
MIPI 2023 Challenge on RGB+ToF Depth Completion: Methods and Results
Abstract
Depth completion from RGB images and sparse Time-of-Flight (ToF) measurements is an important problem in computer vision and robotics. While traditional methods for depth completion have relied on stereo vision or structured light techniques, recent advances in deep learning have enabled more accurate and efficient completion of depth maps from RGB images and sparse ToF measurements. To evaluate the performance of different depth completion methods, we organized an RGB+sparse ToF depth completion competition. The competition aimed to encourage research in this area by providing a standardized dataset and evaluation metrics to compare the accuracy of different approaches. In this report, we present the results of the competition and analyze the strengths and weaknesses of the top-performing methods. We also discuss the implications of our findings for future research in RGB+sparse ToF depth completion. We hope that this competition and report will help to advance the state-of-the-art in this important area of research. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2023.
Proportionally Representative Clustering
Authors: Haris Aziz, Barton E. Lee, Sean Morota Chu
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT)
Abstract
In recent years, there has been a surge in effort to formalize notions of fairness in machine learning. We focus on clustering -- one of the fundamental tasks in unsupervised machine learning. We propose a new axiom that captures proportional representation fairness (PRF). We make a case that the concept achieves the raison d'être of several existing concepts in the literature in an arguably more convincing manner. Our fairness concept is not satisfied by existing fair clustering algorithms. We design efficient algorithms to achieve PRF both for unconstrained and discrete clustering problems.
SkinSAM: Empowering Skin Cancer Segmentation with Segment Anything Model
Authors: Mingzhe Hu, Yuheng Li, Xiaofeng Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Skin cancer is a prevalent and potentially fatal disease that requires accurate and efficient diagnosis and treatment. Although manual tracing is the current standard in clinics, automated tools are desired to reduce human labor and improve accuracy. However, developing such tools is challenging due to the highly variable appearance of skin cancers and complex objects in the background. In this paper, we present SkinSAM, a fine-tuned model based on the Segment Anything Model that showed outstanding segmentation performance. The models are validated on the HAM10000 dataset, which includes 10015 dermatoscopic images. While larger models (ViT_L, ViT_H) performed better than the smaller one (ViT_b), the finetuned model (ViT_b_finetuned) exhibited the greatest improvement, with a Mean pixel accuracy of 0.945, Mean dice score of 0.8879, and Mean IoU score of 0.7843. Among the lesion types, vascular lesions showed the best segmentation results. Our research demonstrates the great potential of adapting SAM to medical image segmentation tasks.
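For readers unfamiliar with the three reported metrics, here is a minimal sketch of how pixel accuracy, the Dice score, and IoU are commonly computed for one pair of binary masks. The function name and the toy masks are assumptions for illustration; the abstract does not describe the authors' evaluation code, and the reported values are means over many images.

```python
from typing import List, Tuple


def segmentation_scores(pred: List[List[int]],
                        target: List[List[int]]) -> Tuple[float, float, float]:
    """Return (pixel accuracy, Dice, IoU) for two binary masks of equal shape."""
    tp = fp = fn = tn = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1      # lesion pixel predicted as lesion
            elif p:
                fp += 1      # background predicted as lesion
            elif t:
                fn += 1      # lesion pixel missed
            else:
                tn += 1      # background predicted as background
    total = tp + fp + fn + tn
    pixel_acc = (tp + tn) / total
    union = tp + fp + fn
    # Convention assumed here: two empty masks count as a perfect match.
    dice = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    iou = tp / union if union else 1.0
    return pixel_acc, dice, iou


# Toy 2x3 masks, purely illustrative.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
acc, dice, iou = segmentation_scores(pred, target)
print(acc, dice, iou)
```

On a single mask pair the two overlap scores are related by Dice = 2 IoU / (1 + IoU), so Dice is never smaller than IoU.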
An FPTAS for Budgeted Laminar Matroid Independent Set
Authors: Ilan Doron-Arad, Ariel Kulik, Hadas Shachnai
Abstract
We study the budgeted laminar matroid independent set problem. The input is a ground set, where each element has a cost and a non-negative profit, along with a laminar matroid over the elements and a budget. The goal is to select a maximum profit independent set of the matroid whose total cost is bounded by the budget. Several well known special cases, where we have, e.g., no matroid constraint (the classic knapsack problem) or a uniform matroid constraint (knapsack with a cardinality constraint), admit a fully polynomial-time approximation scheme (FPTAS). In contrast, the budgeted matroid independent set (BMI) problem with a general matroid has an efficient polynomial-time approximation scheme (EPTAS) but does not admit an FPTAS. This implies an EPTAS for our problem, which is the best known result prior to this work. We present an FPTAS for budgeted laminar matroid independent set, improving the previous EPTAS for this matroid family and generalizing the FPTAS known for knapsack with a cardinality constraint and multiple-choice knapsack. Our scheme is based on a simple dynamic program which utilizes the tree-like structure of laminar matroids.
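To make one of the special cases above concrete, the sketch below solves knapsack with a cardinality constraint (a uniform matroid) exactly with a dynamic program over integer costs. It is a pseudo-polynomial exact method written for illustration, with hypothetical names and data; it is not the FPTAS of the paper and does not handle general laminar matroids.

```python
from typing import List


def knapsack_with_cardinality(costs: List[int], profits: List[int],
                              budget: int, k: int) -> int:
    """Max total profit using at most k items with total cost at most budget."""
    # best[j][c] = max profit with at most j items and total cost at most c.
    best = [[0] * (budget + 1) for _ in range(k + 1)]
    for cost, profit in zip(costs, profits):
        # Iterate j downwards so each item is used at most once:
        # best[j - 1] still refers to the state before this item.
        for j in range(k, 0, -1):
            for c in range(budget, cost - 1, -1):
                candidate = best[j - 1][c - cost] + profit
                if candidate > best[j][c]:
                    best[j][c] = candidate
    return best[k][budget]


# Illustrative instance: three cheap items and one expensive item.
costs = [1, 1, 1, 5]
profits = [3, 3, 3, 8]
print(knapsack_with_cardinality(costs, profits, budget=5, k=2))
print(knapsack_with_cardinality(costs, profits, budget=5, k=3))
```

With at most two items the expensive item is the best choice; allowing three items makes the three cheap ones better, which shows how the cardinality constraint interacts with the budget.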
Mimic-IV-ICD: A new benchmark for eXtreme MultiLabel Classification
Authors: Thanh-Tung Nguyen, Viktor Schlegel, Abhinav Kashyap, Stefan Winkler, Shao-Syuan Huang, Jie-Jyun Liu, Chih-Jen Lin
Abstract
Clinical notes are assigned ICD codes - sets of codes for diagnoses and procedures. In recent years, predictive machine learning models have been built for automatic ICD coding. However, there is a lack of widely accepted benchmarks for automated ICD coding models based on large-scale public EHR data. This paper proposes a public benchmark suite for ICD-10 coding using a large EHR dataset derived from MIMIC-IV, the most recent public EHR dataset. We implement and compare several popular methods for ICD coding prediction tasks to standardize data preprocessing and establish a comprehensive ICD coding benchmark dataset. This approach fosters reproducibility and model comparison, accelerating progress toward employing automated ICD coding in future studies. Furthermore, we create a new ICD-9 benchmark using MIMIC-IV data, providing more data points and a higher number of ICD codes than MIMIC-III. Our open-source code offers easy access to data processing steps, benchmark creation, and experiment replication for those with MIMIC-IV access, providing insights, guidance, and protocols to efficiently develop ICD coding models.
A Supervised Machine Learning Approach to Operator Intent Recognition for Teleoperated Mobile Robot Navigation
Abstract
In applications that involve human-robot interaction (HRI), human-robot teaming (HRT), and cooperative human-machine systems, the inference of the human partner's intent is of critical importance. This paper presents a method for the inference of the human operator's navigational intent, in the context of mobile robots that provide full or partial (e.g., shared control) teleoperation. We propose the Machine Learning Operator Intent Inference (MLOII) method, which a) processes spatial data collected by the robot's sensors; b) utilizes a supervised machine learning algorithm to estimate the operator's most probable navigational goal online. The proposed method's ability to reliably and efficiently infer the intent of the human operator is experimentally evaluated in realistically simulated exploration and remote inspection scenarios. The results in terms of accuracy and uncertainty indicate that the proposed method is comparable to another state-of-the-art method found in the literature.
Diagonalization Based Parallel-in-Time Method for a Class of Fourth Order Time Dependent PDEs
Abstract
In this paper, we design, analyze and implement an efficient time-parallel method for a class of fourth-order time-dependent partial differential equations (PDEs), namely the biharmonic heat equation, the linearized Cahn-Hilliard (CH) equation and the nonlinear CH equation. We apply a diagonalization technique to the all-at-once system to develop efficient iterative time-parallel methods for investigating the solution behaviour of these equations. We present the convergence analysis of the Parallel-in-Time (PinT) algorithms. We verify our findings by presenting numerical results.
Attacks on Robust Distributed Learning Schemes via Sensitivity Curve Maximization
Authors: Christian A. Schroth, Stefan Vlaski, Abdelhak M. Zoubir
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Signal Processing (eess.SP)
Abstract
Distributed learning paradigms, such as federated or decentralized learning, allow a collection of agents to solve global learning and optimization problems through limited local interactions. Most such strategies rely on a mixture of local adaptation and aggregation steps, either among peers or at a central fusion center. Classically, aggregation in distributed learning is based on averaging, which is statistically efficient, but susceptible to attacks by even a small number of malicious agents. This observation has motivated a number of recent works, which develop robust aggregation schemes by employing robust variations of the mean. We present a new attack based on sensitivity curve maximization (SCM), and demonstrate that it is able to disrupt existing robust aggregation schemes by injecting small, but effective perturbations.
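The motivation for robust aggregation can be seen in a toy example: a single large malicious update drags the average far from the honest values, while a robust variation of the mean such as the median barely moves. The sketch below shows only this contrast, with made-up scalar updates and a hypothetical `aggregate` helper; it does not implement the SCM attack, which by the abstract's account defeats robust schemes using small perturbations.

```python
from statistics import mean, median
from typing import List


def aggregate(updates: List[float], rule: str = "mean") -> float:
    """Combine scalar agent updates with the chosen aggregation rule."""
    if rule == "mean":
        return mean(updates)
    if rule == "median":
        return median(updates)
    raise ValueError(f"unknown rule: {rule}")


# Five honest agents report values near 1.0; one malicious agent reports 1000.
honest = [1.0, 1.1, 0.9, 1.05, 0.95]
malicious = [1000.0]
updates = honest + malicious

avg = aggregate(updates, "mean")      # pulled far away by the outlier
med = aggregate(updates, "median")    # stays close to the honest values
print(avg, med)
```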
COSST: Multi-organ Segmentation with Partially Labeled Datasets Using Comprehensive Supervisions and Self-training
Authors: Han Liu, Zhoubing Xu, Riqiang Gao, Hao Li, Jianing Wang, Guillaume Chabin, Ipek Oguz, Sasa Grbic
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Deep learning models have demonstrated remarkable success in multi-organ segmentation but typically require large-scale datasets with all organs of interest annotated. However, medical image datasets are often low in sample size and only partially labeled, i.e., only a subset of organs are annotated. Therefore, it is crucial to investigate how to learn a unified model on the available partially labeled datasets to leverage their synergistic potential. In this paper, we empirically and systematically study the partial-label segmentation with in-depth analyses on the existing approaches and identify three distinct types of supervision signals, including two signals derived from ground truth and one from pseudo label. We propose a novel training framework termed COSST, which effectively and efficiently integrates comprehensive supervision signals with self-training. Concretely, we first train an initial unified model using two ground truth-based signals and then iteratively incorporate the pseudo label signal into the initial model using self-training. To mitigate performance degradation caused by unreliable pseudo labels, we assess the reliability of pseudo labels via outlier detection in latent space and exclude the most unreliable pseudo labels from each self-training iteration. Extensive experiments are conducted on six CT datasets for three partial-label segmentation tasks. Experimental results show that our proposed COSST achieves significant improvement over the baseline method, i.e., individual networks trained on each partially labeled dataset. Compared to the state-of-the-art partial-label segmentation methods, COSST demonstrates consistently superior performance on various segmentation tasks and with different training data sizes.
A Parameterized Theory of PAC Learning
Authors: Cornelius Brand, Robert Ganian, Kirill Simonov
Abstract
Probably Approximately Correct (i.e., PAC) learning is a core concept of sample complexity theory, and efficient PAC learnability is often seen as a natural counterpart to the class P in classical computational complexity. But while the nascent theory of parameterized complexity has allowed us to push beyond the P-NP "dichotomy" in classical computational complexity and identify the exact boundaries of tractability for numerous problems, there is no analogue in the domain of sample complexity that could push beyond efficient PAC learnability. As our core contribution, we fill this gap by developing a theory of parameterized PAC learning which allows us to shed new light on several recent PAC learning results that incorporated elements of parameterized complexity. Within the theory, we identify not one but two notions of fixed-parameter learnability that both form distinct counterparts to the class FPT -- the core concept at the center of the parameterized complexity paradigm -- and develop the machinery required to exclude fixed-parameter learnability. We then showcase the applications of this theory to identify refined boundaries of tractability for CNF and DNF learning as well as for a range of learning problems on graphs.
Fourier-Gegenbauer Pseudospectral Method for Solving Time-Dependent One-Dimensional Fractional Partial Differential Equations with Variable Coefficients and Periodic Solutions
Authors: Kareem T. Elgindy
Subjects: Numerical Analysis (math.NA); Dynamical Systems (math.DS)
Abstract
In this paper, we present a novel pseudospectral (PS) method for solving a new class of initial-value problems (IVPs) of time-dependent one-dimensional fractional partial differential equations (FPDEs) with variable coefficients and periodic solutions. A main ingredient of our work is the use of the recently developed periodic RL/Caputo fractional derivative (FD) operators with sliding positive fixed memory length of Bourafa et al. [1] or their reduced forms obtained by Elgindy [2] as the natural FD operators to accurately model FPDEs with periodic solutions. The proposed method converts the IVP into a well-conditioned linear system of equations using the PS method based on Fourier collocations and Gegenbauer quadratures. The reduced linear system has a simple special structure and can be solved accurately and rapidly by using standard linear system solvers. A rigorous study of the error and convergence of the proposed method is presented. The idea and results presented in this paper are expected to be useful in the future to address more general problems involving FPDEs with periodic solutions.
Lightweight, Pre-trained Transformers for Remote Sensing Timeseries
Authors: Gabriel Tseng, Ivan Zvonkov, Mirali Purohit, David Rolnick, Hannah Kerner
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Machine learning algorithms for parsing remote sensing data have a wide range of societally relevant applications, but labels used to train these algorithms can be difficult or impossible to acquire. This challenge has spurred research into self-supervised learning for remote sensing data aiming to unlock the use of machine learning in geographies or application domains where labelled datasets are small. Current self-supervised learning approaches for remote sensing data draw significant inspiration from techniques applied to natural images. However, remote sensing data has important differences from natural images -- for example, the temporal dimension is critical for many tasks and data is collected from many complementary sensors. We show that designing models and self-supervised training techniques specifically for remote sensing data results in both smaller and more performant models. We introduce the Pretrained Remote Sensing Transformer (Presto), a transformer-based model pre-trained on remote sensing pixel-timeseries data. Presto excels at a wide variety of globally distributed remote sensing tasks and outperforms much larger models. Presto can be used for transfer learning or as a feature extractor for simple models, enabling efficient deployment at scale.
Linear and Nonlinear Parareal Methods for the Cahn-Hilliard Equation
Abstract
In this paper, we propose, analyze and implement efficient time parallel methods for the Cahn-Hilliard (CH) equation. It is of great importance to develop efficient numerical methods for the CH equation, given its range of applicability. The CH equation generally needs to be simulated for a very long time to reach the solution of the phase coarsening stage. Therefore, it is desirable to accelerate the computation using parallel-in-time methods. We present linear and nonlinear Parareal methods for the CH equation, depending on the choice of fine approximation. We illustrate our results by numerical experiments.
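The basic Parareal iteration can be sketched on a much simpler problem than the CH equation. The example below applies it to the scalar test equation y' = lam * y, using one backward Euler step per time slice as the coarse propagator and many substeps as the fine propagator. The test problem, the propagator choices, and all names are assumptions made for illustration; they are not the linear or nonlinear schemes of the paper.

```python
from typing import List


def euler_steps(y: float, lam: float, dt: float, steps: int) -> float:
    """Advance y' = lam * y over an interval dt with `steps` backward Euler steps."""
    h = dt / steps
    for _ in range(steps):
        y = y / (1.0 - lam * h)
    return y


def parareal(y0: float, lam: float, T: float, n_slices: int,
             n_iters: int, fine_steps: int = 100) -> List[float]:
    """Return the Parareal approximation at the n_slices + 1 slice boundaries."""
    dt = T / n_slices

    def coarse(y: float) -> float:
        return euler_steps(y, lam, dt, 1)

    def fine(y: float) -> float:
        return euler_steps(y, lam, dt, fine_steps)

    # Initial guess: one cheap serial sweep with the coarse propagator.
    u = [y0]
    for n in range(n_slices):
        u.append(coarse(u[n]))

    for _ in range(n_iters):
        # These propagations use only the previous iterate, so in a real
        # implementation each slice could be handled by a separate processor.
        f_old = [fine(u[n]) for n in range(n_slices)]
        g_old = [coarse(u[n]) for n in range(n_slices)]
        # Serial correction sweep: U_{n+1} = G(U_n^new) + F(U_n^old) - G(U_n^old).
        new = [y0]
        for n in range(n_slices):
            new.append(coarse(new[n]) + f_old[n] - g_old[n])
        u = new
    return u


approx = parareal(y0=1.0, lam=-1.0, T=2.0, n_slices=8, n_iters=2)
print(approx[-1])
```

After k iterations the first k slice values agree with the serial fine solution up to rounding, so the method is only worthwhile when it converges in far fewer iterations than there are slices.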
Lowering the Entry Bar to HPC-Scale Uncertainty Quantification
Authors: Linus Seelinger, Anne Reinarz, Jean Benezech, Mikkel Bue Lykkegaard, Lorenzo Tamellini, Robert Scheichl
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Numerical Analysis (math.NA)
Abstract
Treating uncertainties in models is essential in many fields of science and engineering. Uncertainty quantification (UQ) on complex and computationally costly numerical models necessitates a combination of efficient model solvers, advanced UQ methods and HPC-scale resources. The resulting technical complexities as well as the lack of separation of concerns between UQ and model experts are holding back many interesting UQ applications. The aim of this paper is to close the gap between advanced UQ methods and advanced models by removing the hurdle of complex software stack integration, which in turn will offer a straightforward way to scale even prototype-grade UQ applications to high-performance resources. We achieve this goal by introducing a parallel software architecture based on UM-Bridge, a universal interface for linking UQ and models. We present three realistic applications from different areas of science and engineering, scaling from single machines to large clusters on the Google Cloud Platform.
Securing Autonomous Air Traffic Management: Blockchain Networks Driven by Explainable AI
Authors: Louise Axon, Dimitrios Panagiotakopoulos, Samuel Ayo, Carolina Sanchez-Hernandez, Yan Zong, Simon Brown, Lei Zhang, Michael Goldsmith, Sadie Creese, Weisi Guo
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
Air Traffic Management data systems today are inefficient and not scalable to enable future unmanned systems. Current data is fragmented, siloed, and not easily accessible. There is data conflict, misuse, and eroding levels of trust in provenance and accuracy. With increased autonomy in aviation, Artificially Intelligent (AI) enabled unmanned traffic management (UTM) will be more reliant on secure data from diverse stakeholders. There is an urgent need to develop a secure network that has trustworthy data chains and works with the requirements generated by UTM. Here, we review existing research in 3 key interconnected areas: (1) blockchain development for secure data transfer between competing aviation stakeholders, (2) self-learning networking architectures that distribute consensus to achieve secure air traffic control, (3) explainable AI to build trust with human stakeholders and backpropagate requirements for blockchain and network optimisation. When connected together, this new digital ecosystem blueprint is tailored for safety critical UTM sectors. We motivate the readers with a case study, where a federated learning UTM that uses real air traffic and weather data is secured and explained to human operators. This emerging area still requires significant research and development by the community to ensure it can enable future autonomous air mobility.
Learning Neural PDE Solvers with Parameter-Guided Channel Attention
Authors: Makoto Takamoto, Francesco Alesiani, Mathias Niepert
Abstract
Scientific Machine Learning (SciML) is concerned with the development of learned emulators of physical systems governed by partial differential equations (PDE). In application domains such as weather forecasting, molecular dynamics, and inverse design, ML-based surrogate models are increasingly used to augment or replace inefficient and often non-differentiable numerical simulation algorithms. While a number of ML-based methods for approximating the solutions of PDEs have been proposed in recent years, they typically do not adapt to the parameters of the PDEs, making it difficult to generalize to PDE parameters not seen during training. We propose a Channel Attention mechanism guided by PDE Parameter Embeddings (CAPE) component for neural surrogate models and a simple yet effective curriculum learning strategy. The CAPE module can be combined with neural PDE solvers allowing them to adapt to unseen PDE parameters. The curriculum learning strategy provides a seamless transition between teacher-forcing and fully auto-regressive training. We compare CAPE in conjunction with the curriculum learning strategy using a popular PDE benchmark and obtain consistent and significant improvements over the baseline models. The experiments also show several advantages of CAPE, such as its increased ability to generalize to unseen PDE parameters without large increases in inference time and parameter count.
Exploiting Inductive Bias in Transformer for Point Cloud Classification and Segmentation
Authors: Zihao Li, Pan Gao, Hui Yuan, Ran Wei, Manoranjan Paul
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Discovering inter-point connections for efficient high-dimensional feature extraction from point coordinates is a key challenge in processing point clouds. Most existing methods focus on designing efficient local feature extractors while ignoring global connections, or vice versa. In this paper, we design a new Inductive Bias-aided Transformer (IBT) method to learn 3D inter-point relations, which considers both local and global attention. Specifically, considering local spatial coherence, local feature learning is performed through Relative Position Encoding and Attentive Feature Pooling. We incorporate the learned locality into the Transformer module. The local feature affects the value component in the Transformer to modulate the relationship between channels of each point, which can enhance the self-attention mechanism with locality-based channel interaction. We demonstrate its superiority experimentally on classification and segmentation tasks. The code is available at: https://github.com/jiamang/IBT
Human Semantic Segmentation using Millimeter-Wave Radar Sparse Point Clouds
Authors: Pengfei Song, Luoyu MEI, Han Cheng
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); General Topology (math.GN)
Abstract
This paper presents a framework for semantic segmentation on sparse sequential point clouds of millimeter-wave radar. Compared with cameras and lidars, millimeter-wave radars have the advantages of preserving privacy, strong anti-interference ability, and long detection distance. However, the sparsity of mmWave data and the difficulty of capturing its temporal-topological coupling features in the human semantic segmentation task remain open problems and prevent previous advanced segmentation methods (e.g., PointNet, PointCNN, Point Transformer) from being well utilized in practical scenarios. To address the challenge caused by the sparsity and temporal-topological features of the data, we (i) introduce graph structure and topological features to the point cloud, and (ii) propose a semantic segmentation framework including a global feature-extracting module and a sequential feature-extracting module. In addition, we design an efficient and better-fitting loss function for a better training process and segmentation results based on graph clustering. Experimentally, we deploy representative semantic segmentation algorithms (Transformer, GCNN, etc.) on a custom dataset. Experimental results indicate that our model achieves a mean accuracy of 82.31% on the custom dataset and outperforms the state-of-the-art algorithms. Moreover, to validate the model's robustness, we deploy our model on the well-known S3DIS dataset. On the S3DIS dataset, our model achieves a mean accuracy of 92.6%, outperforming baseline algorithms.
Multiplicity Problems on Algebraic Series and Context-Free Grammars
Authors: Nikhil Balaji, Lorenzo Clemente, Klara Nosan, Mahsa Shirmohammadi, James Worrell
Subjects: Formal Languages and Automata Theory (cs.FL); Computational Complexity (cs.CC)
Abstract
In this paper we obtain complexity bounds for computational problems on algebraic power series over several commuting variables. The power series are specified by systems of polynomial equations: a formalism closely related to weighted context-free grammars. We focus on three problems -- decide whether a given algebraic series is identically zero, determine whether all but finitely many coefficients are zero, and compute the coefficient of a specific monomial. We relate these questions to well-known computational problems on arithmetic circuits and thereby show that all three problems lie in the counting hierarchy. Our main result improves the best known complexity bound on deciding zeroness of an algebraic series. This problem is known to lie in PSPACE by reduction to the decision problem for the existential fragment of the theory of real closed fields. Here we show that the problem lies in the counting hierarchy by reduction to the problem of computing the degree of a polynomial given by an arithmetic circuit. As a corollary we obtain new complexity bounds on multiplicity equivalence of context-free grammars restricted to a bounded language, language inclusion of a nondeterministic finite automaton in an unambiguous context-free grammar, and language inclusion of a non-deterministic context-free grammar in an unambiguous finite automaton.
Tractability of sampling recovery on unweighted function classes
Abstract
It is well-known that the problem of sampling recovery in the $L_2$-norm on unweighted Korobov spaces (Sobolev spaces with mixed smoothness) as well as classical smoothness classes such as H\"older classes suffers from the curse of dimensionality. We show that the problem is tractable for those classes if they are intersected with the Wiener algebra of functions with summable Fourier coefficients. In fact, this is a relatively simple implication of powerful results by Rauhut and Ward [Appl. Comput. Harmon. Anal. 40 (2016), pp. 321--351]. Tractability is achieved by the use of non-linear algorithms, while linear algorithms cannot do the job.
The Mutual Information In The Vicinity of Capacity-Achieving Input Distributions
Abstract
The mutual information is analyzed as a function of the input distribution using an identity due to Tops\o{e} for channels with (possibly multiple) linear cost constraints and finite input and output sets. The mutual information is bounded above by a function decreasing quadratically with the distance to the set of all capacity-achieving input distributions for the case when the distance is less than a certain threshold. The closed-form expressions for the threshold and the coefficient of the quadratic decrease are derived. A counter-example demonstrating the non-existence of such a quadratic bound in the case of infinitely many linear cost constraints is provided. Implications of these observations for the channel coding problem and applications of the proof technique to related problems are discussed.
Developing Distributed High-performance Computing Capabilities of an Open Science Platform for Robust Epidemic Analysis
Authors: Nicholson Collier, Justin M. Wozniak, Abby Stevens, Yadu Babuji, Mickaël Binois, Ardindam Fadikar, Alexandra Würth, Kyle Chard, Jonathan Ozik
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and the broad response to it from the scientific community have forged new relationships among domain experts, mathematical modelers, and scientific computing specialists. Computationally, however, it also revealed critical gaps in the ability of researchers to exploit advanced computing systems. These challenging areas include gaining access to scalable computing systems, porting models and workflows to new systems, sharing data of varying sizes, and producing results that can be reproduced and validated by others. Informed by our team's work in supporting public health decision makers during the COVID-19 pandemic and by the identified capability gaps in applying high-performance computing (HPC) to the modeling of complex social systems, we present the goals, requirements, and initial implementation of OSPREY, an open science platform for robust epidemic analysis. The prototype implementation demonstrates an integrated, algorithm-driven HPC workflow architecture, coordinating tasks across federated HPC resources, with robust, secure and automated access to each of the resources. We demonstrate scalable and fault-tolerant task execution, an asynchronous API to support fast time-to-solution algorithms, an inclusive, multi-language approach, and efficient wide-area data management. The example OSPREY code is made available on a public repository.
Evaluating the Impact of Pair Documentation on Requirements Quality and Team Productivity
Authors: Nosheen Qamar, Nosheen Sabahat, Amir Mashmool, Amir Mosavi
Abstract
The most important deliverable of the requirements engineering process is the software requirements specification (SRS) document. Requirements documentation is important throughout the complete software development lifecycle for sharing the vision and enabling effective communication between major stakeholders. The Standish Group reported that the top factors behind project failures are related to requirements. By giving the right level of attention to key requirements, good-quality software can be produced. Therefore, more research is needed in this area, and this study tries to fill this gap. This empirical study aims to examine the impact of pair documentation on requirements quality and team productivity. Pair documentation is an unconventional approach in which two persons work collaboratively on the same requirements document, just like pair programming. Twenty pairs of documentation writers worked in two groups: one group using pair documentation, i.e., the experimental group, and the other using conventional documentation, i.e., the control group. The resulting requirements documents for the same project, produced by both groups, were then compared. It is observed that there is a significant improvement in the quality and productivity of the experimental group using pair documentation. The findings of this study may assist requirement engineers in forming efficient teams that can create high-quality SRS documents.
A Survey on Approximate Edge AI for Energy Efficient Autonomous Driving Services
Authors: Dewant Katare, Diego Perino, Jari Nurmi, Martijn Warnier, Marijn Janssen, Aaron Yi Ding
Abstract
Autonomous driving services rely heavily on sensors such as cameras, LiDAR, radar, and communication modules. A common practice of processing the sensed data is using a high-performance computing unit placed inside the vehicle, which deploys AI models and algorithms to act as the brain or administrator of the vehicle. The vehicular data generated from average hours of driving can be up to 20 Terabytes depending on the data rate and specification of the sensors. Given the scale and fast growth of services for autonomous driving, it is essential to improve the overall energy and environmental efficiency, especially in the trend towards vehicular electrification (e.g., battery-powered). Although the areas have seen significant advancements in sensor technologies, wireless communications, computing and AI/ML algorithms, the challenge still exists in how to apply and integrate those technology innovations to achieve energy efficiency. This survey reviews and compares the connected vehicular applications, vehicular communications, approximation and Edge AI techniques. The focus is on energy efficiency by covering newly proposed approximation and enabling frameworks. To the best of our knowledge, this survey is the first to review the latest approximate Edge AI frameworks and publicly available datasets in energy-efficient autonomous driving. The insights and vision from this survey can be beneficial for the collaborative driving service development on low-power and memory-constrained systems and also for the energy optimization of autonomous vehicles.
On Solution Discovery via Reconfiguration
Authors: Michael R. Fellows, Mario Grobler, Nicole Megow, Amer E. Mouawad, Vijayaragunathan Ramamoorthi, Frances A. Rosamond, Daniel Schmand, Sebastian Siebertz
Subjects: Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
Abstract
The dynamics of real-world applications and systems require efficient methods for improving infeasible solutions or restoring corrupted ones by making modifications to the current state of a system in a restricted way. We propose a new framework of solution discovery via reconfiguration for constructing a feasible solution for a given problem by executing a sequence of small modifications starting from a given state. Our framework integrates and formalizes different aspects of classical local search, reoptimization, and combinatorial reconfiguration. We exemplify our framework on a multitude of fundamental combinatorial problems, namely Vertex Cover, Independent Set, Dominating Set, and Coloring. We study the classical as well as the parameterized complexity of the solution discovery variants of those problems and explore the boundary between tractable and intractable instances.
Incremental Generalized Category Discovery
Authors: Bingchen Zhao, Oisin Mac Aodha
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
We explore the problem of Incremental Generalized Category Discovery (IGCD). This is a challenging category incremental learning setting where the goal is to develop models that can correctly categorize images from previously seen categories, in addition to discovering novel ones. Learning is performed over a series of time steps where the model obtains new labeled and unlabeled data, and discards old data, at each iteration. The difficulty of the problem is compounded in our generalized setting as the unlabeled data can contain images from categories that may or may not have been observed before. We present a new method for IGCD which combines non-parametric categorization with efficient image sampling to mitigate catastrophic forgetting. To quantify performance, we propose a new benchmark dataset named iNatIGCD that is motivated by a real-world fine-grained visual categorization task. In our experiments we outperform existing related methods.
Empirical Individual State Observability
Authors: Benjamin Cellini, Burak Boyacıoğlu, Floris van Breugel
Abstract
A dynamical system is observable if there is a one-to-one mapping from the system's measured outputs and inputs to all of the system's states. Analytical and empirical tools exist for quantifying the (full state) observability of linear and nonlinear systems; however, empirical tools for evaluating the observability of individual state variables are lacking. Here, a new empirical approach termed Empirical Individual State Observability (E-ISO) is developed to quantify the level of observability of individual state variables. E-ISO first builds an empirical observability matrix via simulation, then applies convex optimization to efficiently determine the subset of its rows required to estimate each state variable individually. Finally, (un)observability measures for these subsets are calculated to provide independent estimates of the observability of each state variable. Multiple example applications of E-ISO on linear and nonlinear systems are shown to be consistent with analytical results. Broadly, E-ISO will be an invaluable tool both for designing active sensing control laws or optimizing sensor placement to increase the observability of individual state variables in engineered systems, and for analyzing the trajectory decisions made by organisms.
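The first E-ISO step, building an empirical observability matrix via simulation, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy two-state system, the function names, and the central-difference perturbation size `eps` are all assumptions made for illustration, and the convex-optimization row selection described in the abstract is omitted.

```python
def simulate(x0, steps):
    """Hypothetical toy system: only x1 is measured, and x2 never influences x1."""
    x1, x2 = x0
    outputs = []
    for _ in range(steps):
        outputs.append(x1)            # measurement y = x1
        x1, x2 = 0.9 * x1, 0.5 * x2   # decoupled linear dynamics
    return outputs


def empirical_observability_matrix(simulate_fn, x0, steps, eps=1e-4):
    """Central-difference sensitivity of the output trajectory to each initial state."""
    columns = []
    for j in range(len(x0)):
        plus = list(x0)
        minus = list(x0)
        plus[j] += eps
        minus[j] -= eps
        y_plus = simulate_fn(plus, steps)
        y_minus = simulate_fn(minus, steps)
        columns.append([(p - m) / (2 * eps) for p, m in zip(y_plus, y_minus)])
    # Transpose so that rows index time steps and columns index state variables.
    return [list(row) for row in zip(*columns)]


if __name__ == "__main__":
    for row in empirical_observability_matrix(simulate, [1.0, 2.0], steps=3):
        print(row)
```

In this toy case the column for `x2` is identically zero, which matches the intuition that a state with no influence on any measurement is unobservable, while the column for `x1` follows the powers of 0.9.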
SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection
Authors: Yichen Xie, Chenfeng Xu, Marie-Julie Rakotosaona, Patrick Rim, Federico Tombari, Kurt Keutzer, Masayoshi Tomizuka, Wei Zhan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
By identifying four important components of existing LiDAR-camera 3D object detection methods (LiDAR and camera candidates, transformation, and fusion outputs), we observe that all existing methods either find dense candidates or yield dense representations of scenes. However, given that objects occupy only a small part of a scene, finding dense candidates and generating dense representations is noisy and inefficient. We propose SparseFusion, a novel multi-sensor 3D detection method that exclusively uses sparse candidates and sparse representations. Specifically, SparseFusion utilizes the outputs of parallel detectors in the LiDAR and camera modalities as sparse candidates for fusion. We transform the camera candidates into the LiDAR coordinate space by disentangling the object representations. Then, we can fuse the multi-modality candidates in a unified 3D space by a lightweight self-attention module. To mitigate negative transfer between modalities, we propose novel semantic and geometric cross-modality transfer modules that are applied prior to the modality-specific detectors. SparseFusion achieves state-of-the-art performance on the nuScenes benchmark while also running at the fastest speed, even outperforming methods with stronger backbones. We perform extensive experiments to demonstrate the effectiveness and efficiency of our modules and overall method pipeline. Our code will be made publicly available at https://github.com/yichen928/SparseFusion.
$π$-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation
Abstract
Foundation models have achieved great advances in multi-task learning with a unified interface of unimodal and multimodal tasks. However, the potential of such multi-task learners has not been exploited during transfer learning. In this work, we present a universal parameter-efficient transfer learning method, termed Predict-Interpolate Tuning ($\pi$-Tuning), for vision, language, and vision-language tasks. It aggregates the parameters of lightweight task-specific experts learned from similar tasks to aid the target downstream task. The task similarities are predicted in a unified modality-independent space, yielding a scalable graph to demonstrate task relationships. $\pi$-Tuning has several appealing benefits. First, it flexibly explores both intra- and inter-modal transferability between similar tasks to improve the accuracy and robustness of transfer learning, especially in data-scarce scenarios. Second, it offers a systematic solution for transfer learning with multi-task prediction-and-then-interpolation, compatible with diverse types of parameter-efficient experts, such as prompt and adapter. Third, an extensive study of task-level mutual benefits on 14 unimodal and 6 multimodal datasets shows that $\pi$-Tuning surpasses fine-tuning and other parameter-efficient transfer learning methods both in full-shot and low-shot regimes. The task graph also enables an in-depth interpretable analysis of task transferability across modalities.
Dynamic Pricing and Learning with Bayesian Persuasion
Authors: Shipra Agrawal, Yiding Feng, Wei Tang
Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
Abstract
We consider a novel dynamic pricing and learning setting where in addition to setting prices of products in sequential rounds, the seller also ex-ante commits to 'advertising schemes'. That is, at the beginning of each round the seller can decide what kind of signal they will provide to the buyer about the product's quality upon realization. Using the popular Bayesian persuasion framework to model the effect of these signals on the buyers' valuation and purchase responses, we formulate the problem of finding an optimal design of the advertising scheme along with a pricing scheme that maximizes the seller's expected revenue. Without any a priori knowledge of the buyers' demand function, our goal is to design an online algorithm that can use past purchase responses to adaptively learn the optimal pricing and advertising strategy. We study the regret of the algorithm when compared to the optimal clairvoyant price and advertising scheme. Our main result is a computationally efficient online algorithm that achieves an $O(T^{2/3}(m\log T)^{1/3})$ regret bound when the valuation function is linear in the product quality. Here $m$ is the cardinality of the discrete product quality domain and $T$ is the time horizon. This result requires some natural monotonicity and Lipschitz assumptions on the valuation function, but no Lipschitz or smoothness assumption on the buyers' demand function. For constant $m$, our result matches the regret lower bound for dynamic pricing within logarithmic factors, which is a special case of our problem. We also obtain several improved results for the widely considered special case of additive valuations, including an $\tilde{O}(T^{2/3})$ regret bound independent of $m$ when $m\le T^{1/3}$.
string2string: A Modern Python Library for String-to-String Algorithms
Authors: Mirac Suzgun, Stuart M. Shieber, Dan Jurafsky
Subjects: Computation and Language (cs.CL); Digital Libraries (cs.DL)
Abstract
We introduce string2string, an open-source library that offers a comprehensive suite of efficient algorithms for a broad range of string-to-string problems. It includes traditional algorithmic solutions as well as recent advanced neural approaches to tackle various problems in string alignment, distance measurement, lexical and semantic search, and similarity analysis -- along with several helpful visualization tools and metrics to facilitate the interpretation and analysis of these methods. Notable algorithms featured in the library include the Smith-Waterman algorithm for pairwise local alignment, the Hirschberg algorithm for global alignment, the Wagner-Fischer algorithm for edit distance, BARTScore and BERTScore for similarity analysis, the Knuth-Morris-Pratt algorithm for lexical search, and Faiss for semantic search. In addition, it wraps existing efficient and widely used implementations of certain frameworks and metrics, such as sacreBLEU and ROUGE, wherever appropriate. Overall, the library aims to provide extensive coverage and increased flexibility in comparison to existing libraries for strings. It can be used for many downstream applications, tasks, and problems in natural-language processing, bioinformatics, and computational social sciences. It is implemented in Python, easily installable via pip, and accessible through a simple API. Source code, documentation, and tutorials are all available on our GitHub page: https://github.com/stanfordnlp/string2string.
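For readers unfamiliar with the edit-distance algorithm named in the abstract, the following is a minimal standalone sketch of the Wagner-Fischer dynamic program with unit costs. It deliberately does not use the string2string API; the function name `edit_distance` is an assumption chosen for illustration.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance via the Wagner-Fischer dynamic program (unit costs)."""
    # previous[j] is the distance between the processed prefix of source and target[:j].
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # turning source[:i] into the empty string takes i deletions
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))  # prints 3
```

Keeping only two rows of the table reduces memory to O(len(target)) while the running time stays proportional to the product of the two string lengths.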
Maximizing Model Generalization for Manufacturing with Self-Supervised Learning and Federated Learning
Authors: Matthew Russell, Peng Wang
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Abstract
Deep Learning (DL) can diagnose faults and assess machine health from raw condition monitoring data without manually designed statistical features. However, practical manufacturing applications remain extremely difficult for existing DL methods. Machine data is often unlabeled and from very few health conditions (e.g., only normal operating data). Furthermore, models often encounter shifts in domain as process parameters change and new categories of faults emerge. Traditional supervised learning may struggle to learn compact, discriminative representations that generalize to these unseen target domains since it depends on having plentiful classes to partition the feature space with decision boundaries. Transfer Learning (TL) with domain adaptation attempts to adapt these models to unlabeled target domains but assumes similar underlying structure that may not be present if new faults emerge. This study proposes focusing on maximizing the feature generality on the source domain and applying TL via weight transfer to copy the model to the target domain. Specifically, Self-Supervised Learning (SSL) with Barlow Twins may produce more discriminative features for monitoring health condition than supervised learning by focusing on semantic properties of the data. Furthermore, Federated Learning (FL) for distributed training may also improve generalization by efficiently expanding the effective size and diversity of training data by sharing information across multiple client machines. Results show that Barlow Twins outperforms supervised learning in an unlabeled target domain with emerging motor faults when the source training data contains very few distinct categories. Incorporating FL may also provide a slight advantage by diffusing knowledge of health conditions between machines.
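The Barlow Twins objective mentioned in the abstract can be sketched in a few lines. The version below is a plain-Python illustration of the published loss (push the cross-correlation matrix of two embedding views toward the identity), not the code used in this study; the function names and the default weight `lam` are assumptions.

```python
import math


def standardize(batch):
    """Normalize each feature to zero mean and unit variance across the batch."""
    n, d = len(batch), len(batch[0])
    columns = []
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        std = std if std > 0 else 1.0  # guard against constant features
        columns.append([(v - mean) / std for v in col])
    return columns  # d feature columns, each of length n


def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Invariance term on the diagonal plus weighted redundancy term off the diagonal."""
    cols_a, cols_b = standardize(z_a), standardize(z_b)
    n = len(z_a)
    loss = 0.0
    for i, col_a in enumerate(cols_a):
        for j, col_b in enumerate(cols_b):
            c_ij = sum(a * b for a, b in zip(col_a, col_b)) / n  # cross-correlation entry
            if i == j:
                loss += (1.0 - c_ij) ** 2   # the two views should agree per feature
            else:
                loss += lam * c_ij ** 2     # different features should be decorrelated
    return loss


if __name__ == "__main__":
    view = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
    print(barlow_twins_loss(view, view))
```

With identical views and already decorrelated features, as in the demo, the cross-correlation matrix is the identity and the loss vanishes; no labels enter the computation, which is why the objective suits unlabeled machine data.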
Keyword: faster
Physics-informed neural networks for predicting gas flow dynamics and unknown parameters in diesel engines
Authors: Kamaljyoti Nath, Xuhui Meng, Daniel J Smith, George Em Karniadakis
Abstract
This paper presents a physics-informed neural network (PINN) approach for monitoring the health of diesel engines. The aim is to evaluate the engine dynamics, identify unknown parameters in a "mean value" model, and anticipate maintenance requirements. The PINN model is applied to diesel engines with a variable-geometry turbocharger and exhaust gas recirculation, using measurement data of selected state variables. The results demonstrate the ability of the PINN model to predict simultaneously both unknown parameters and dynamics accurately with both clean and noisy data, and the importance of the self-adaptive weight in the loss function for faster convergence. The input data for these simulations are derived from actual engine running conditions, while the outputs are simulated data, making this a practical case study of PINN's ability to predict real-world dynamical systems. The mean value model of the diesel engine incorporates empirical formulae to represent certain states, but these formulae may not be generalizable to other engines. To address this, the study considers the use of deep neural networks (DNNs) in addition to the PINN model. The DNNs are trained using laboratory test data and are used to model the engine-specific empirical formulae in the mean value model, allowing for a more flexible and adaptive representation of the engine's states. In other words, the mean value model uses both the PINN model and the DNNs to represent the engine's states, with the PINN providing a physics-based understanding of the engine's overall dynamics and the DNNs offering a more engine-specific and adaptive representation of the empirical formulae. By combining these two approaches, the study aims to offer a comprehensive and versatile approach to monitoring the health and performance of diesel engines.
A Survey on Solving and Discovering Differential Equations Using Deep Neural Networks
Authors: Hyeonjung (Tari) Jung, Jayant Gupta, Bharat Jayaprakash, Matthew Eagon, Harish Panneer Selvam, Carl Molnar, William Northrop, Shashi Shekhar
Subjects: Neural and Evolutionary Computing (cs.NE)
Abstract
Ordinary and partial differential equations (DE) are used extensively in scientific and mathematical domains to model physical systems. Current literature has focused primarily on deep neural network (DNN) based methods for solving a specific DE or a family of DEs. Research communities with a history of using DE models may view DNN-based differential equation solvers (DNN-DEs) as a faster and transferable alternative to current numerical methods. However, there is a lack of systematic surveys detailing the use of DNN-DE methods across physical application domains and a generalized taxonomy to guide future research. This paper surveys and classifies previous works and provides an educational tutorial for senior practitioners, professionals, and graduate students in engineering and computer science. First, we propose a taxonomy to navigate domains of DE systems studied under the umbrella of DNN-DE. Second, we examine the theory and performance of the Physics Informed Neural Network (PINN) to demonstrate how the influential DNN-DE architecture mathematically solves a system of equations. Third, to reinforce the key ideas of solving and discovery of DEs using DNN, we provide a tutorial using DeepXDE, a Python package for developing PINNs, to develop DNN-DEs for solving and discovering a classic DE, the linear transport equation.
Abstract
Variational Bayes is a popular method for approximate inference but its derivation can be cumbersome. To simplify the process, we give a 3-step recipe to identify the posterior form by explicitly looking for linearity with respect to expectations of well-known distributions. We can then directly write the update by simply ``reading-off'' the terms in front of those expectations. The recipe makes the derivation easier, faster, shorter, and more general.
Keyword: mobile
AI-based Predictive Analytic Approaches for safeguarding the Future of Electric/Hybrid Vehicles
Authors: Ishan Shivansh Bangroo
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract
In response to the global need for sustainable energy, green technology may help fight climate change. Before green infrastructure can be easily integrated into the world's energy system, it needs upgrading. By improving energy infrastructure and decision-making, artificial intelligence (AI) may help solve this challenge. Electric and hybrid vehicles (EHVs) have grown in popularity due to concerns about global warming and the need for more ecologically friendly transportation. EHVs may work better with cutting-edge technologies like AI. Electric vehicles (EVs) reduce greenhouse gas emissions and promote sustainable mobility, which explains their growing popularity. Unfortunately, EV production consumes a lot of energy and materials, which may harm nature. EV production is being improved using green technologies like artificial intelligence and predictive analysis. EHVs may help meet the need for ecologically friendly transportation. However, the Battery Management System (BMS) controls EHV performance and longevity. AI may improve EHV energy efficiency, emissions reduction, and sustainability. Remote hijacking, security breaches, and unauthorized access are EHV cybersecurity vulnerabilities addressed in the article. AI research and development may help make transportation more sustainable, as may optimizing EHVs and charging infrastructure.
Detecting inner-LAN anomalies using hierarchical forecasting
Abstract
Increasing activity and the growing number of devices online are leading to more frequent and more diverse cyber attacks. This continuously evolving attack activity makes signature-based detection methods ineffective. Once malware has infiltrated a LAN, bypassing an external gateway or entering via an unsecured mobile device, it can potentially infect all nodes in the LAN as well as carry out nefarious activities such as stealing valuable data, leading to financial damage and loss of reputation. Such infiltration could be viewed as an insider attack, increasing the need for LAN monitoring and security. In this paper we aim to detect such inner-LAN activity by studying the variations in Address Resolution Protocol (ARP) calls within the LAN. We find anomalous nodes by modelling inner-LAN traffic using hierarchical forecasting methods. We substantially reduce the false positives ever present in anomaly detection by using an extreme value theory based method. We use a dataset from a real inner-LAN monitoring project, containing over 10M ARP calls from 362 nodes. Furthermore, the small number of false positives generated using our methods is a potential solution to the "alert fatigue" commonly reported by security experts.
A Review of Panoptic Segmentation for Mobile Mapping Point Clouds
Authors: Binbin Xiang, Yuanwen Yue, Torben Peters, Konrad Schindler
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
3D point cloud panoptic segmentation is the combined task to (i) assign each point to a semantic class and (ii) separate the points in each class into object instances. Recently there has been an increased interest in such comprehensive 3D scene understanding, building on the rapid advances of semantic segmentation due to the advent of deep 3D neural networks. Yet, to date there is very little work about panoptic segmentation of outdoor mobile-mapping data, and no systematic comparisons. The present paper tries to close that gap. It reviews the building blocks needed to assemble a panoptic segmentation pipeline and the related literature. Moreover, a modular pipeline is set up to perform comprehensive, systematic experiments to assess the state of panoptic segmentation in the context of street mapping. As a byproduct, we also provide the first public dataset for that task, by extending the NPM3D dataset to include instance labels.
A Supervised Machine Learning Approach to Operator Intent Recognition for Teleoperated Mobile Robot Navigation
Abstract
In applications that involve human-robot interaction (HRI), human-robot teaming (HRT), and cooperative human-machine systems, the inference of the human partner's intent is of critical importance. This paper presents a method for the inference of the human operator's navigational intent, in the context of mobile robots that provide full or partial (e.g., shared control) teleoperation. We propose the Machine Learning Operator Intent Inference (MLOII) method, which a) processes spatial data collected by the robot's sensors; b) utilizes a supervised machine learning algorithm to estimate the operator's most probable navigational goal online. The proposed method's ability to reliably and efficiently infer the intent of the human operator is experimentally evaluated in realistically simulated exploration and remote inspection scenarios. The results in terms of accuracy and uncertainty indicate that the proposed method is comparable to another state-of-the-art method found in the literature.
MCLFIQ: Mobile Contactless Fingerprint Image Quality
Authors: Jannis Priesnitz, Axel Weißenfeld, Christian Rathgeb, Bernhard Strobl, Ralph Lessmann, Christoph Busch
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
We propose MCLFIQ: Mobile Contactless Fingerprint Image Quality, the first quality assessment algorithm for mobile contactless fingerprint samples. To this end, we retrained the NIST Fingerprint Image Quality (NFIQ) 2 method, which was originally designed for contact-based fingerprints, with a synthetic contactless fingerprint database. We evaluate the predictive performance of the resulting MCLFIQ model in terms of Error-vs.-Discard Characteristic (EDC) curves on three real-world contactless fingerprint databases using two recognition algorithms. In experiments, the MCLFIQ method is compared against the original NFIQ 2 method and a sharpness-based quality assessment algorithm developed for contactless fingerprint images. The results show that re-training NFIQ 2 on synthetic data is a viable alternative to training on real databases. Moreover, the evaluation shows that our MCLFIQ method is more accurate and robust than NFIQ 2 and the sharpness-based quality assessment. We suggest considering the proposed MCLFIQ method as a candidate for a new standard algorithm for contactless fingerprint quality assessment.
Combining HoloLens with Instant-NeRFs: Advanced Real-Time 3D Mobile Mapping
Authors: Dennis Haitz, Boris Jutzi, Markus Ulrich, Miriam Jaeger, Patrick Huebner
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This work represents a large step into modern ways of fast 3D reconstruction based on RGB camera images. Utilizing a Microsoft HoloLens 2 as a multisensor platform that includes an RGB camera and an inertial measurement unit for SLAM-based camera-pose determination, we train a Neural Radiance Field (NeRF) as a neural scene representation in real-time with the acquired data from the HoloLens. The HoloLens is connected via Wi-Fi to a high-performance PC that is responsible for the training and 3D reconstruction. After the data stream ends, the training is stopped and the 3D reconstruction is initiated, which extracts a point cloud of the scene. With our specialized inference algorithm, five million scene points can be extracted within 1 second. In addition, the point cloud also includes radiometry per point. Our method of 3D reconstruction outperforms grid point sampling with NeRFs by multiple orders of magnitude and can be regarded as a complete real-time 3D reconstruction method in a mobile mapping setup.
A Versatile Low-Complexity Feedback Scheme for FDD Systems via Generative Modeling
Authors: Nurettin Turan, Benedikt Fesl, Michael Koller, Michael Joham, Wolfgang Utschick
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
In this work, we propose a versatile feedback scheme which can be deployed for both single- and multi-user multiple-input multiple-output (MIMO) frequency division duplex (FDD) systems. Particularly, we propose to use a Gaussian mixture model (GMM) with a reduced number of parameters for codebook construction, feedback encoding, and precoder design. The GMM is fitted offline at the base station (BS) to uplink (UL) training samples to approximate the channel distribution of all possible mobile terminals (MTs) located inside the BS cell. Afterwards, a codebook is constructed, where each codebook entry is based on one GMM component. By extracting directional information of the constructed codebook, the proposed GMM-based feedback approach allows the precoders of a multi-user MIMO (MU-MIMO) system to be designed jointly using common precoding algorithms. Alternatively, the GMM's sample generation ability can be utilized to design the precoders using a state-of-the-art stochastic iterative algorithm. After offloading the GMM to the MTs, they determine their feedback simply as the index of the GMM component with the highest responsibility for their received pilot signal. This strategy exhibits low complexity and allows for parallelization. Simulation results show that the proposed approach outperforms conventional methods, especially for a reduced number of pilots.
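The feedback rule described in this abstract (report the index of the GMM component with the highest responsibility for the received pilot signal) can be illustrated with a minimal sketch. It assumes a real-valued observation and a GMM whose parameters are already expressed in the observation domain; the function name `feedback_index` and the toy parameters are hypothetical and are not taken from the paper.

```python
import numpy as np

def feedback_index(y, weights, means, covs):
    """Return the index of the GMM component with the highest responsibility for y.

    Assumption: the GMM (weights, means, covs) already models the observation y.
    """
    log_post = []
    for w, mu, cov in zip(weights, means, covs):
        diff = y - mu
        _, logdet = np.linalg.slogdet(cov)
        maha = diff @ np.linalg.solve(cov, diff)
        # log(weight) + log N(y; mu, cov), dropping the constant shared by all components
        log_post.append(np.log(w) - 0.5 * (logdet + maha))
    # Responsibilities are a monotone function of these scores, so argmax is enough
    return int(np.argmax(log_post))

# Toy two-component GMM in two dimensions (illustrative values only)
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [4.0, 4.0]])
covs = np.array([np.eye(2), np.eye(2)])

idx_far = feedback_index(np.array([3.5, 4.2]), weights, means, covs)
idx_near = feedback_index(np.array([0.1, -0.2]), weights, means, covs)
```

Each component is scored independently, which is consistent with the abstract's remark that the strategy allows for parallelization.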
Abstract
Given the prevalence of crowdsourced labor in creating natural language processing datasets, these datasets have become increasingly large. For instance, the SQuAD dataset currently sits at over 80,000 records. However, because the English language is rather repetitive in structure, the distribution of word frequencies in the SQuAD dataset's contexts is relatively unchanged. By measuring each sentence's distance from the co-variate distance of frequencies of all sentences in the dataset, we identify 10,500 examples that create a more uniform distribution for training. Fine-tuning ELECTRA [4] on this subset of examples reaches better performance than a model trained on all 87,000 examples. Herein we introduce a methodology for systematically pruning datasets for fine-tuning that reaches better out-of-sample performance.
JaxPruner: A concise library for sparsity research
Authors: Joo Hyung Lee, Wonpyo Park, Nicole Mitchell, Jonathan Pilault, Johan Obando-Ceron, Han-Byul Kim, Namhoon Lee, Elias Frantar, Yun Long, Amir Yazdanbakhsh, Shivani Agrawal, Suvinay Subramanian, Xin Wang, Sheng-Chun Kao, Xingyao Zhang, Trevor Gale, Aart Bik, Woohyun Han, Milen Ferev, Zhonglin Han, Hong-Seok Kim, Yann Dauphin, Karolina Dziugaite, Pablo Samuel Castro, Utku Evci
Abstract
This paper introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research. JaxPruner aims to accelerate research on sparse neural networks by providing concise implementations of popular pruning and sparse training algorithms with minimal memory and latency overhead. Algorithms implemented in JaxPruner use a common API and work seamlessly with the popular optimization library Optax, which, in turn, enables easy integration with existing JAX-based libraries. We demonstrate this ease of integration by providing examples in four different codebases (Scenic, t5x, Dopamine and FedJAX) and provide baseline experiments on popular benchmarks.
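For readers unfamiliar with pruning, the sketch below shows plain magnitude pruning, one of the simplest algorithms in this family. It uses NumPy only and does not use or reproduce JaxPruner's API; the function name `magnitude_prune` is hypothetical.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude entries so that about `sparsity` of them are zero.

    Returns the pruned weights and the boolean keep-mask. Ties at the threshold
    are pruned as well, so slightly more than the requested fraction can be removed.
    """
    k = int(round(sparsity * weights.size))
    if k == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    # The k-th smallest absolute value acts as the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask

w = np.array([[0.1, -2.0], [0.5, 3.0]])
pruned, mask = magnitude_prune(w, sparsity=0.5)
```

In sparse training, a mask of this kind is typically recomputed on a schedule and re-applied after each optimizer update.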
Keyword: voxel
There is no result
Keyword: lidar
Human Semantic Segmentation using Millimeter-Wave Radar Sparse Point Clouds
Authors: Pengfei Song, Luoyu MEI, Han Cheng
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); General Topology (math.GN)
Abstract
This paper presents a framework for semantic segmentation on sparse sequential point clouds of millimeter-wave radar. Compared with cameras and lidars, millimeter-wave radars have the advantages of not revealing privacy, having a strong anti-interference ability, and having a long detection distance. Handling the sparsity of mmWave data and capturing its temporal-topological features remain open problems. In particular, the difficulty of capturing the temporal-topological coupling features in the human semantic segmentation task prevents previous advanced segmentation methods (e.g., PointNet, PointCNN, Point Transformer) from being well utilized in practical scenarios. To address the challenge caused by the sparsity and temporal-topological features of the data, we (i) introduce graph structure and topological features to the point cloud, and (ii) propose a semantic segmentation framework including a global feature-extracting module and a sequential feature-extracting module. In addition, we design an efficient and better-fitting loss function based on graph clustering for a better training process and segmentation results. Experimentally, we deploy representative semantic segmentation algorithms (Transformer, GCNN, etc.) on a custom dataset. Experimental results indicate that our model achieves a mean accuracy of $\mathbf{82.31}\%$ on the custom dataset and outperforms the state-of-the-art algorithms. Moreover, to validate the model's robustness, we deploy our model on the well-known S3DIS dataset. On the S3DIS dataset, our model achieves a mean accuracy of $\mathbf{92.6}\%$, outperforming baseline algorithms.
Quadric Representations for LiDAR Odometry, Mapping and Localization
Authors: Chao Xia, Chenfeng Xu, Patrick Rim, Mingyu Ding, Nanning Zheng, Kurt Keutzer, Masayoshi Tomizuka, Wei Zhan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Current LiDAR odometry, mapping and localization methods leverage point-wise representations of 3D scenes and achieve high accuracy in autonomous driving tasks. However, the space-inefficiency of methods that use point-wise representations limits their development and usage in practical applications. In particular, scan-submap matching and global map representation methods are restricted by the inefficiency of nearest neighbor searching (NNS) for large-volume point clouds. To improve space-time efficiency, we propose a novel method of describing scenes using quadric surfaces, which are far more compact representations of 3D objects than conventional point clouds. In contrast to point cloud-based methods, our quadric representation-based method decomposes a 3D scene into a collection of sparse quadric patches, which improves storage efficiency and avoids the slow point-wise NNS process. Our method first segments a given point cloud into patches and fits each of them to a quadric implicit function. Each function is then coupled with other geometric descriptors of the patch, such as its center position and covariance matrix. Collectively, these patch representations fully describe a 3D scene, which can be used in place of the original point cloud and employed in LiDAR odometry, mapping and localization algorithms. We further design a novel incremental growing method for quadric representations, which eliminates the need to repeatedly re-fit quadric surfaces from the original point cloud. Extensive odometry, mapping and localization experiments on large-volume point clouds in the KITTI and UrbanLoco datasets demonstrate that our method maintains low latency and memory usage while achieving competitive, and even superior, accuracy.
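The patch description in this abstract (a quadric implicit function coupled with the patch's center position and covariance matrix) can be sketched as follows. This is a minimal illustration that uses an algebraic least-squares fit via SVD; the abstract does not state which fitting procedure the authors use, so the fitting method and the names `fit_quadric` and `quadric_residuals` are assumptions.

```python
import numpy as np

def _design_matrix(points):
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # Monomials of a general quadric: q . [x^2, y^2, z^2, xy, xz, yz, x, y, z, 1] = 0
    return np.column_stack([x * x, y * y, z * z, x * y, x * z, y * z,
                            x, y, z, np.ones_like(x)])

def fit_quadric(points):
    """Fit quadric coefficients to an (N, 3) patch, N >= 10, plus center and covariance."""
    design = _design_matrix(points)
    # The unit-norm q minimizing ||design @ q|| is the last right singular vector
    _, _, vt = np.linalg.svd(design, full_matrices=False)
    coeffs = vt[-1]
    center = points.mean(axis=0)
    covariance = np.cov(points, rowvar=False)
    return coeffs, center, covariance

def quadric_residuals(points, coeffs):
    """Algebraic residual of each point with respect to the fitted quadric."""
    return _design_matrix(points) @ coeffs

# Toy patch: points sampled on the unit sphere x^2 + y^2 + z^2 - 1 = 0
rng = np.random.default_rng(0)
directions = rng.standard_normal((200, 3))
sphere = directions / np.linalg.norm(directions, axis=1, keepdims=True)
coeffs, center, covariance = fit_quadric(sphere)
residuals = quadric_residuals(sphere, coeffs)
```

Storing ten coefficients plus a center and a 3x3 covariance per patch, instead of every point, is what makes such a representation compact.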
A Survey on Approximate Edge AI for Energy Efficient Autonomous Driving Services
Authors: Dewant Katare, Diego Perino, Jari Nurmi, Martijn Warnier, Marijn Janssen, Aaron Yi Ding
Abstract
Autonomous driving services rely heavily on sensors such as cameras, LiDAR, radar, and communication modules. A common practice of processing the sensed data is using a high-performance computing unit placed inside the vehicle, which deploys AI models and algorithms to act as the brain or administrator of the vehicle. The vehicular data generated from average hours of driving can be up to 20 Terabytes depending on the data rate and specification of the sensors. Given the scale and fast growth of services for autonomous driving, it is essential to improve the overall energy and environmental efficiency, especially in the trend towards vehicular electrification (e.g., battery-powered). Although the areas have seen significant advancements in sensor technologies, wireless communications, computing and AI/ML algorithms, the challenge still exists in how to apply and integrate those technology innovations to achieve energy efficiency. This survey reviews and compares the connected vehicular applications, vehicular communications, approximation and Edge AI techniques. The focus is on energy efficiency by covering newly proposed approximation and enabling frameworks. To the best of our knowledge, this survey is the first to review the latest approximate Edge AI frameworks and publicly available datasets in energy-efficient autonomous driving. The insights and vision from this survey can be beneficial for the collaborative driving service development on low-power and memory-constrained systems and also for the energy optimization of autonomous vehicles.
SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection
Authors: Yichen Xie, Chenfeng Xu, Marie-Julie Rakotosaona, Patrick Rim, Federico Tombari, Kurt Keutzer, Masayoshi Tomizuka, Wei Zhan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
By identifying four important components of existing LiDAR-camera 3D object detection methods (LiDAR and camera candidates, transformation, and fusion outputs), we observe that all existing methods either find dense candidates or yield dense representations of scenes. However, given that objects occupy only a small part of a scene, finding dense candidates and generating dense representations is noisy and inefficient. We propose SparseFusion, a novel multi-sensor 3D detection method that exclusively uses sparse candidates and sparse representations. Specifically, SparseFusion utilizes the outputs of parallel detectors in the LiDAR and camera modalities as sparse candidates for fusion. We transform the camera candidates into the LiDAR coordinate space by disentangling the object representations. Then, we can fuse the multi-modality candidates in a unified 3D space by a lightweight self-attention module. To mitigate negative transfer between modalities, we propose novel semantic and geometric cross-modality transfer modules that are applied prior to the modality-specific detectors. SparseFusion achieves state-of-the-art performance on the nuScenes benchmark while also running at the fastest speed, even outperforming methods with stronger backbones. We perform extensive experiments to demonstrate the effectiveness and efficiency of our modules and overall method pipeline. Our code will be made publicly available at https://github.com/yichen928/SparseFusion.
SMAT: A Self-Reinforcing Framework for Simultaneous Mapping and Tracking in Unbounded Urban Environments
Authors: Tingxiang Fan, Bowen Shen, Yinqiang Zhang, Chuye Zhang, Lei Yang, Hua Chen, Wei Zhang, Jia Pan
Abstract
With the increasing prevalence of robots in daily life, it is crucial to enable robots to construct a reliable map online to navigate in unbounded and changing environments. Although existing methods can individually achieve the goals of spatial mapping and dynamic object detection and tracking, limited research has been conducted on an effective combination of these two important abilities. The proposed framework, SMAT (Simultaneous Mapping and Tracking), integrates the front-end dynamic object detection and tracking module with the back-end static mapping module using a self-reinforcing mechanism, which promotes mutual improvement of mapping and tracking performance. The conducted experiments demonstrate the framework's effectiveness in real-world applications, achieving successful long-range navigation and mapping in multiple urban environments using only one LiDAR, a CPU-only onboard computer, and a consumer-level GPS receiver.
Abstract
The impact of artificial intelligence systems on our society is increasing at an unprecedented speed. For instance, ChatGPT is being tested in mental health treatment applications such as Koko, Stable Diffusion generates pieces of art competitive with (or outperforming) human artists, and so on. Ethical concerns regarding the behavior and applications of generative AI systems have been increasing over the past years, and the field of AI alignment - steering the behavior of AI systems towards being aligned with human values - is a rapidly growing subfield of modern AI. In this paper, we address the challenges involved in ethical evaluation of a multimodal artificial intelligence system. The multimodal systems we focus on take both text and an image as input and output text, completing the sentence or answering the question asked as input. We perform the evaluation of these models in two steps: we first discuss the creation of a multimodal ethical database and then use this database to construct morality-evaluating algorithms. The creation of the multimodal ethical database is done interactively through human feedback. Users are presented with multiple examples and vote on whether they are ethical or not. Once these answers have been aggregated into a dataset, we build and test different algorithms to automatically evaluate the morality of multimodal systems. These algorithms aim to classify the answers as ethical or not. The models we tested are a RoBERTa-large classifier and a multilayer perceptron classifier.
Preserving Superconvergence of Spectral Elements for Curved Domains via $h$ and $p$-Geometric Refinement
Authors: Jacob Jones, Rebecca Conley, Xiangmin Jiao
Abstract
Spectral element methods (SEM), which are extensions of finite element methods (FEM), are important emerging techniques for solving partial differential equations in physics and engineering. SEM can potentially deliver better accuracy due to the potential superconvergence for well-shaped tensor-product elements. However, for complex geometries, the accuracy of SEM often degrades due to a combination of geometric inaccuracies near curved boundaries and the loss of superconvergence with simplicial or non-tensor-product elements. We propose to overcome the first issue by using $h$- and $p$-geometric refinement, to refine the mesh near high-curvature regions and increase the degree of geometric basis functions, respectively. We show that when using mixed-meshes with tensor-product elements in the interior of the domain, curvature-based geometric refinement near boundaries can improve the accuracy of the interior elements by reducing pollution errors and preserving the superconvergence. To overcome the second issue, we apply a post-processing technique to recover the accuracy near the curved boundaries by using the adaptive extended stencil finite element method (AES-FEM). The combination of curvature-based geometric refinement and accurate post-processing delivers an effective and easier-to-implement alternative to other methods based on exact geometries. We demonstrate our techniques by solving the convection-diffusion equation in 2D and show one to two orders of magnitude of improvement in the solution accuracy, even when the elements are poorly shaped near boundaries.
Multimodal Composite Association Score: Measuring Gender Bias in Generative Multimodal Models
Authors: Abhishek Mandal, Susan Leavy, Suzanne Little
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Abstract
Generative multimodal models based on diffusion models have seen tremendous growth and advances in recent years. Models such as DALL-E and Stable Diffusion have become increasingly popular and successful at creating images from texts, often combining abstract ideas. However, like other deep learning models, they also reflect social biases they inherit from their training data, which is often crawled from the internet. Manually auditing models for biases can be very time and resource consuming and is further complicated by the unbounded and unconstrained nature of inputs these models can take. Research into bias measurement and quantification has generally focused on small single-stage models working on a single modality. Thus the emergence of multistage multimodal models requires a different approach. In this paper, we propose Multimodal Composite Association Score (MCAS) as a new method of measuring gender bias in multimodal generative models. Evaluating both DALL-E 2 and Stable Diffusion using this approach uncovered the presence of gendered associations of concepts embedded within the models. We propose MCAS as an accessible and scalable method of quantifying potential bias for models with different modalities and a range of potential biases.
Two kinds of numerical algorithms for ultra-slow diffusion equations
Abstract
In this article, two kinds of numerical algorithms are derived for the ultra-slow (or superslow) diffusion equation in one and two space dimensions, where the ultra-slow diffusion is characterized by the Caputo-Hadamard fractional derivative of order $\alpha \in (0,1)$. To describe the spatial interaction, the Riesz fractional derivative and the fractional Laplacian are used in one and two space dimensions, respectively. The Caputo-Hadamard derivative is discretized by two typical approximate formulae, i.e., the L2-1$_{\sigma}$ and L1-2 methods. The spatial fractional derivatives are discretized by second-order finite difference methods. When the L2-1$_{\sigma}$ discretization is used, the derived numerical scheme is unconditionally stable with error estimate $\mathcal{O}(\tau^{2}+h^{2})$ for all $\alpha \in (0, 1)$, in which $\tau$ and $h$ are temporal and spatial stepsizes, respectively. When the L1-2 discretization is used, the derived numerical scheme is stable with error estimate $\mathcal{O}(\tau^{3-\alpha}+h^{2})$ for $\alpha \in (0, 0.3738)$. The illustrative examples displayed are in line with the theoretical analysis.
Edit Everything: A Text-Guided Generative System for Images Editing
Abstract
We introduce a new generative system called Edit Everything, which can take image and text inputs and produce image outputs. Edit Everything allows users to edit images using simple text instructions. Our system designs prompts to guide the visual module in generating requested images. Experiments demonstrate that Edit Everything facilitates the implementation of the visual aspects of Stable Diffusion with the use of Segment Anything model and CLIP. Our system is publicly available at https://github.com/DefengXie/Edit_Everything.
Localized orthogonal decomposition for a multiscale parabolic stochastic partial differential equation
Abstract
A multiscale method is proposed for a parabolic stochastic partial differential equation with additive noise and highly oscillatory diffusion. The framework is based on the localized orthogonal decomposition (LOD) method and computes a coarse-scale representation of the elliptic operator, enriched by fine-scale information on the diffusion. Optimal order strong convergence is derived. The LOD technique is combined with a (multilevel) Monte-Carlo estimator and the weak error is analyzed. Numerical examples that confirm the theoretical findings are provided, and the computational efficiency of the method is highlighted.
DataComp: In search of the next generation of multimodal datasets
Authors: Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song, Hannaneh Hajishirzi, Ali Farhadi, Romain Beaumont, Sewoong Oh, Alex Dimakis, Jenia Jitsev, Yair Carmon, Vaishaal Shankar, Ludwig Schmidt
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
Large multimodal datasets have been instrumental in recent breakthroughs such as CLIP, Stable Diffusion, and GPT-4. At the same time, datasets rarely receive the same research attention as model architectures or training algorithms. To address this shortcoming in the machine learning ecosystem, we introduce DataComp, a benchmark where the training code is fixed and researchers innovate by proposing new training sets. We provide a testbed for dataset experiments centered around a new candidate pool of 12.8B image-text pairs from Common Crawl. Participants in our benchmark design new filtering techniques or curate new data sources and then evaluate their new dataset by running our standardized CLIP training code and testing on 38 downstream test sets. Our benchmark consists of multiple scales, with four candidate pool sizes and associated compute budgets ranging from 12.8M to 12.8B samples seen during training. This multi-scale design facilitates the study of scaling trends and makes the benchmark accessible to researchers with varying resources. Our baseline experiments show that the DataComp workflow is a promising way of improving multimodal datasets. We introduce DataComp-1B, a dataset created by applying a simple filtering algorithm to the 12.8B candidate pool. The resulting 1.4B subset enables training a CLIP ViT-L/14 from scratch to 79.2% zero-shot accuracy on ImageNet. Our new ViT-L/14 model outperforms a larger ViT-g/14 trained on LAION-2B by 0.7 percentage points while requiring 9x less training compute. We also outperform OpenAI's CLIP ViT-L/14 by 3.7 percentage points, which is trained with the same compute budget as our model. These gains highlight the potential for improving model performance by carefully curating training sets. We view DataComp-1B as only the first step and hope that DataComp paves the way toward the next generation of multimodal datasets.
Functional Diffusion Maps
Authors: María Barroso, Carlos María Alaíz, Ángela Fernández, Jose Luis Torrecilla
Abstract
Nowadays many real-world datasets can be considered as functional, in the sense that the processes which generate them are continuous. A fundamental property of this type of data is that in theory they belong to an infinite-dimensional space. Although in practice we usually receive finite observations, they are still high-dimensional and hence dimensionality reduction methods are crucial. In this vein, the main state-of-the-art method for functional data analysis is Functional PCA. Nevertheless, this classic technique assumes that the data lie in a linear manifold, and hence it could have problems when this hypothesis is not fulfilled. In this research, attention has been placed on a non-linear manifold learning method: Diffusion Maps. The article explains how to extend this multivariate method to functional data and compares its behavior against Functional PCA over different simulated and real examples.
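A minimal sketch of the standard Diffusion Maps construction applied to curves sampled on a common grid may help make the idea concrete. It assumes a Gaussian kernel on an L2 distance approximated by a Riemann sum; the kernel, the distance, and the names `diffusion_map`, `epsilon`, and `t` are illustrative assumptions and may differ from the choices made in the article.

```python
import numpy as np

def diffusion_map(curves, grid, epsilon, n_components=2, t=1):
    """Embed curves (n_curves, n_grid) sampled on a uniform grid with diffusion maps."""
    dx = grid[1] - grid[0]
    # Squared L2 distance between curves, approximated by a Riemann sum
    diffs = curves[:, None, :] - curves[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2) * dx
    kernel = np.exp(-sq_dists / epsilon)
    degree = kernel.sum(axis=1)
    # Symmetric matrix D^{-1/2} K D^{-1/2} shares eigenvalues with P = D^{-1} K
    sym = kernel / np.sqrt(np.outer(degree, degree))
    eigvals, eigvecs = np.linalg.eigh(sym)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Right eigenvectors of the Markov matrix P
    psi = eigvecs / np.sqrt(degree)[:, None]
    # Skip the trivial first eigenpair (eigenvalue 1, constant eigenvector)
    embedding = psi[:, 1:n_components + 1] * eigvals[1:n_components + 1] ** t
    return embedding, eigvals

# Toy functional data: sine curves with different phases
grid = np.linspace(0.0, 1.0, 50)
phases = np.linspace(0.0, np.pi, 20)
curves = np.sin(2 * np.pi * grid[None, :] + phases[:, None])
embedding, eigvals = diffusion_map(curves, grid, epsilon=0.5)
```

Unlike Functional PCA, the embedding depends on the data only through pairwise distances, which is why the method does not require the data to lie in a linear manifold.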
Motion-Conditioned Diffusion Model for Controllable Video Synthesis
Abstract
Recent advancements in diffusion models have greatly improved the quality and diversity of synthesized content. To harness the expressive power of diffusion models, researchers have explored various controllable mechanisms that allow users to intuitively guide the content synthesis process. Although the latest efforts have primarily focused on video synthesis, there has been a lack of effective methods for controlling and describing desired content and motion. In response to this gap, we introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes, which allow users to specify the intended content and dynamics for synthesis. To tackle the ambiguity of sparse motion inputs and achieve better synthesis quality, MCDiff first utilizes a flow completion model to predict the dense video motion based on the semantic understanding of the video frame and the sparse motion control. Then, the diffusion model synthesizes high-quality future frames to form the output video. We qualitatively and quantitatively show that MCDiff achieves state-of-the-art visual quality in stroke-guided controllable video synthesis. Additional experiments on MPII Human Pose further exhibit the capability of our model on diverse content and motion synthesis.
Putting People in Their Place: Affordance-Aware Human Insertion into Scenes
Authors: Sumith Kulal, Tim Brooks, Alex Aiken, Jiajun Wu, Jimei Yang, Jingwan Lu, Alexei A. Efros, Krishna Kumar Singh
Abstract
We study the problem of inferring scene affordances by presenting a method for realistically inserting people into scenes. Given a scene image with a marked region and an image of a person, we insert the person into the scene while respecting the scene affordances. Our model can infer the set of realistic poses given the scene context, re-pose the reference person, and harmonize the composition. We set up the task in a self-supervised fashion by learning to re-pose humans in video clips. We train a large-scale diffusion model on a dataset of 2.4M video clips that produces diverse plausible poses while respecting the scene context. Given the learned human-scene composition, our model can also hallucinate realistic people and scenes when prompted without conditioning and also enables interactive editing. A quantitative evaluation shows that our method synthesizes more realistic human appearance and more natural human-scene interactions than prior work.
Keyword: dynamic
TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation
Authors: Zhaoyan Liu, Noel Vouitsis, Satya Krishna Gorti, Jimmy Ba, Gabriel Loaiza-Ganem
Abstract
We propose TR0N, a highly general framework to turn pre-trained unconditional generative models, such as GANs and VAEs, into conditional models. The conditioning can be highly arbitrary, and requires only a pre-trained auxiliary model. For example, we show how to turn unconditional models into class-conditional ones with the help of a classifier, and also into text-to-image models by leveraging CLIP. TR0N learns a lightweight stochastic mapping which "translates" between the space of conditions and the latent space of the generative model, in such a way that the generated latent corresponds to a data sample satisfying the desired condition. The translated latent samples are then further improved upon through Langevin dynamics, enabling us to obtain higher-quality data samples. TR0N requires no training data nor fine-tuning, yet can achieve a zero-shot FID of 10.9 on MS-COCO, outperforming competing alternatives not only on this metric, but also in sampling speed -- all while retaining a much higher level of generality. Our code is available at https://github.com/layer6ai-labs/tr0n.
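The Langevin refinement step mentioned in this abstract can be illustrated with a generic unadjusted Langevin update on a toy target. This is a sketch under stated assumptions: the target here is a unit-variance Gaussian, whereas TR0N would use a score derived from its auxiliary model; `langevin_refine`, `grad_log_density`, and the step size are hypothetical names and values.

```python
import numpy as np

def langevin_refine(z0, grad_log_density, step_size, n_steps, rng):
    """Run unadjusted Langevin dynamics starting from the latents z0.

    Update: z <- z + (step/2) * grad log p(z) + sqrt(step) * standard normal noise
    """
    z = np.array(z0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(z.shape)
        z = z + 0.5 * step_size * grad_log_density(z) + np.sqrt(step_size) * noise
    return z

# Toy target: unit-variance Gaussian centered at `target` (illustrative only)
target = np.array([2.0, -1.0])

def grad_log_density(z):
    return -(z - target)

rng = np.random.default_rng(0)
initial_latents = np.zeros((100, 2))  # 100 chains, all started at the origin
refined = langevin_refine(initial_latents, grad_log_density,
                          step_size=0.05, n_steps=300, rng=rng)
```

The chains drift from their starting point toward regions of high target density while the injected noise keeps the samples diverse.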
Physics-informed neural networks for predicting gas flow dynamics and unknown parameters in diesel engines
Authors: Kamaljyoti Nath, Xuhui Meng, Daniel J Smith, George Em Karniadakis
Abstract
This paper presents a physics-informed neural network (PINN) approach for monitoring the health of diesel engines. The aim is to evaluate the engine dynamics, identify unknown parameters in a "mean value" model, and anticipate maintenance requirements. The PINN model is applied to diesel engines with a variable-geometry turbocharger and exhaust gas recirculation, using measurement data of selected state variables. The results demonstrate the ability of the PINN model to predict simultaneously both unknown parameters and dynamics accurately with both clean and noisy data, and the importance of the self-adaptive weight in the loss function for faster convergence. The input data for these simulations are derived from actual engine running conditions, while the outputs are simulated data, making this a practical case study of PINN's ability to predict real-world dynamical systems. The mean value model of the diesel engine incorporates empirical formulae to represent certain states, but these formulae may not be generalizable to other engines. To address this, the study considers the use of deep neural networks (DNNs) in addition to the PINN model. The DNNs are trained using laboratory test data and are used to model the engine-specific empirical formulae in the mean value model, allowing for a more flexible and adaptive representation of the engine's states. In other words, the mean value model uses both the PINN model and the DNNs to represent the engine's states, with the PINN providing a physics-based understanding of the engine's overall dynamics and the DNNs offering a more engine-specific and adaptive representation of the empirical formulae. By combining these two approaches, the study aims to offer a comprehensive and versatile approach to monitoring the health and performance of diesel engines.
A Data-Driven Hybrid Automaton Framework to Modeling Complex Dynamical Systems
Authors: Yejiang Yang, Zihao Mo, Weiming Xiang
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Dynamical Systems (math.DS)
Abstract
In this paper, a computationally efficient data-driven hybrid automaton model is proposed to capture unknown complex dynamical system behaviors using multiple neural networks. The sampled data of the system is divided by valid partitions into groups corresponding to their topologies, based on which transition guards are defined. Then, a collection of small-scale, computationally efficient neural networks is trained as the local dynamical description for their corresponding topologies. After modeling the system with a neural-network-based hybrid automaton, a set-valued reachability analysis with low computation cost is provided based on interval analysis and a split and combined process. Finally, a numerical example of the limit cycle is presented to illustrate that the developed models can significantly reduce the computational cost in reachable set computation without sacrificing any modeling precision.
Controlled density transport using Perron Frobenius generators
Authors: Jake Buzhardt, Phanindra Tallapragada
Subjects: Systems and Control (eess.SY); Robotics (cs.RO); Fluid Dynamics (physics.flu-dyn)
Abstract
We consider the problem of the transport of a density of states from an initial state distribution to a desired final state distribution through a dynamical system with actuation. In particular, we consider the case where the control signal is a function of time, but not space; that is, the same actuation is applied at every point in the state space. This is motivated by several problems in fluid mechanics, such as mixing and manipulation of a collection of particles by a global control input such as a uniform magnetic field, as well as by more general control problems where a density function describes an uncertainty distribution or a distribution of agents in a multi-agent system. We formulate this problem using the generators of the Perron-Frobenius operator associated with the drift and control vector fields of the system. By considering finite-dimensional approximations of these operators, the density transport problem can be expressed as a control problem for a bilinear system in a high-dimensional, lifted state. With this system, we frame the density control problem as a problem of driving moments of the density function to the moments of a desired density function, where the moments of the density can be expressed as an output which is linear in the lifted state. This output tracking problem for the lifted bilinear system is then solved using differential dynamic programming, an iterative trajectory optimization scheme.
Understand the Dynamic World: An End-to-End Knowledge Informed Framework for Open Domain Entity State Tracking
Abstract
Open domain entity state tracking aims to predict reasonable state changes of entities (i.e., [attribute] of [entity] was [before_state] and [after_state] afterwards) given the action descriptions. It is important to many reasoning tasks that support everyday human activities. However, it is challenging because the model needs to predict an arbitrary number of entity state changes caused by the action, while most of the entities are only implicitly relevant to the actions and their attributes and states come from open vocabularies. To tackle these challenges, we propose a novel end-to-end Knowledge Informed framework for open domain Entity State Tracking, namely KIEST, which explicitly retrieves the relevant entities and attributes from an external knowledge graph (i.e., ConceptNet) and incorporates them to autoregressively generate all the entity state changes with a novel dynamic knowledge grained encoder-decoder framework. To enforce the logical coherence among the predicted entities, attributes, and states, we design a new constraint decoding strategy and employ a coherence reward to improve the decoding process. Experimental results show that our proposed KIEST framework significantly outperforms the strong baselines on the public benchmark dataset OpenPI.
Ensoul: A framework for the creation of self organizing intelligent ultra low power systems (SOULS) through evolutionary enerstatic networks
Authors: Ty Roachford
Subjects: Artificial Intelligence (cs.AI); Adaptation and Self-Organizing Systems (nlin.AO)
Abstract
Ensoul is a framework proposed for the purpose of creating technologies that create more technologies through the combined use of networks, and nests, of energy homeostatic (enerstatic) loops and open-ended evolutionary techniques. Generative technologies developed by such an approach serve as both simple, yet insightful models of thermodynamically driven complex systems and as powerful sources of novel technologies. "Self Organizing intelligent Ultra Low power Systems" (SOULS) is a term that well describes the technologies produced by such a generative technology, as well as the generative technology itself. The term is meant to capture the abstract nature of such technologies as being independent of the substrate in which they are embedded. In other words, SOULS can be biological, artificial or hybrid in form.
Physics-informed Data-driven Discovery of Constitutive Models with Application to Strain-Rate-sensitive Soft Materials
Authors: Kshitiz Upadhyay, Jan N. Fuhg, Nikolaos Bouklas, K.T. Ramesh
Abstract
A novel data-driven constitutive modeling approach is proposed, which combines the physics-informed nature of modeling based on continuum thermodynamics with the benefits of machine learning. This approach is demonstrated on strain-rate-sensitive soft materials. This model is based on the viscous dissipation-based visco-hyperelasticity framework where the total stress is decomposed into volumetric, isochoric hyperelastic, and isochoric viscous overstress contributions. It is shown that each of these stress components can be written as linear combinations of the components of an irreducible integrity basis. Three Gaussian process regression-based surrogate models are trained (one per stress component) between principal invariants of strain and strain rate tensors and the corresponding coefficients of the integrity basis components. It is demonstrated that this type of model construction enforces key physics-based constraints on the predicted responses: the second law of thermodynamics, the principles of local action and determinism, objectivity, the balance of angular momentum, an assumed reference state, isotropy, and limited memory. The three surrogate models that constitute our constitutive model are evaluated by training them on small-size numerically generated data sets corresponding to a single deformation mode and then analyzing their predictions over a much wider testing regime comprising multiple deformation modes. Our physics-informed data-driven constitutive model predictions are compared with the corresponding predictions of classical continuum thermodynamics-based and purely data-driven models. It is shown that our surrogate models can reasonably capture the stress-strain-strain rate responses in both training and testing regimes, and provide improvements in terms of prediction accuracy, generalizability to multiple deformation modes, and compatibility with limited data.
Conditional dominance in games with unawareness
Authors: Martin Meier, Burkhard C. Schipper
Subjects: Computer Science and Game Theory (cs.GT)
Abstract
Heifetz, Meier, and Schipper (2013) introduced dynamic games with unawareness consisting of a partially ordered set of games in extensive form. Here, we study the normal form of dynamic games with unawareness. The generalized normal form associated with a dynamic game with unawareness consists of a partially ordered set of games in normal form. We use the generalized normal form to characterize extensive-form rationalizability (resp., prudent rationalizability) in dynamic games with unawareness by iterated conditional strict (resp., weak) dominance in the associated generalized normal form. We also show that the analogue to iterated admissibility for dynamic games with unawareness depends on extensive-form structure. This is because under unawareness, a player's information set determines not only which nodes she considers possible but also which game tree(s) she is aware of.
Abstract
Many games feature a progression of levels that doesn't adapt to the player. This can be problematic because some players may get stuck if the progression is too difficult, while others may find it boring if the progression is too slow to get to more challenging levels. This can be addressed by building levels based on the player's performance and preferences. In this work, we formulate the problem of generating levels for a player as a Markov Decision Process (MDP) and use adaptive dynamic programming (ADP) to solve the MDP before assembling a level. We tested with two case studies and found that using an ADP outperforms two baselines. Furthermore, we experimented with player proxies and switched them in the middle of play, and we show that a simple modification prior to running ADP results in quick adaptation. By using ADP, which searches the entire MDP, we produce a dynamic progression of levels that adapts to the player.
A One-Dimensional Symmetric Force-Based Blending Method for Atomistic-to-Continuum Coupling
Abstract
Inspired by the blending method developed by [P. Seleson, S. Beneddine, and S. Prudhomme, \emph{A Force-Based Coupling Scheme for Peridynamics and Classical Elasticity}, (2013)] for nonlocal-to-local coupling, we create a symmetric and consistent blended force-based Atomistic-to-Continuum (a/c) scheme for the atomistic chain in one-dimensional space. The conditions for the well-posedness of the underlying model are established by analyzing an optimal blending size and blending type to ensure the $H^1$ semi-norm stability for the blended force-based operator. We present several numerical experiments to test and confirm the theoretical findings.
Provably Stabilizing Global-Position Tracking Control for Hybrid Models of Multi-Domain Bipedal Walking via Multiple Lyapunov Analysis
Authors: Yuan Gao, Kentaro Barhydt, Christopher Niezrecki, Yan Gu
Abstract
Accurate control of a humanoid robot's global position (i.e., its three-dimensional position in the world) is critical to the reliable execution of high-risk tasks such as avoiding collision with pedestrians in a crowded environment. This paper introduces a time-based nonlinear control method that achieves accurate global-position tracking (GPT) for multi-domain bipedal walking. Deriving a tracking controller for bipedal robots is challenging due to the highly complex robot dynamics that are time-varying and hybrid, especially for multi-domain walking that involves multiple phases/domains of full actuation, over actuation, and underactuation. To tackle this challenge, we introduce a continuous-phase GPT control law for multi-domain walking, which provably ensures the exponential convergence of the entire error state within the full and over actuation domains and that of the directly regulated error state within the underactuation domain. We then construct sufficient multiple-Lyapunov stability conditions for the hybrid multi-domain tracking error system under the proposed GPT control law. We illustrate the proposed controller design through both three-domain walking with all motors activated and two-domain gait with inactive ankle motors. Simulations of a ROBOTIS OP3 bipedal humanoid robot demonstrate the satisfactory accuracy and convergence rate of the proposed control approach under two different cases of multi-domain walking as well as various walking speeds and desired paths.
A central scheme for coupled hyperbolic systems
Authors: Michael Herty, Niklas Kolbe, Siegfried Müller
Abstract
A novel numerical scheme to solve coupled systems of conservation laws is introduced. The scheme is derived based on a relaxation approach and does not require information on the Lax curves of the coupled systems, which simplifies the computation of suitable coupling data. The coupling condition for the underlying relaxation system plays a crucial role as it determines the behavior of the scheme in the zero relaxation limit. The role of this condition is discussed, a consistency concept with respect to the original problem is introduced, well-posedness is analyzed and explicit, nodal Riemann solvers are provided. Based on a case study considering the p-system of gas dynamics a strategy for the design of the relaxation coupling condition within the new scheme is provided.
Data-driven time-scale separation of ODE right-hand sides using dynamic mode decomposition and time delay embedding
Abstract
Multi-physics simulations often involve multiple different scales. The ARKODE ODE solver package in the SUNDIALS library addresses multi-scale problems with a multi-rate time-integrator that can work with a right-hand side that has fast scale and slow scale components. In this report, we use dynamic mode decomposition and time delay embedding to extract the fast and slow components of the right-hand sides of a simple ODE from data. We then use the extracted components to solve the ODE with ARKODE. Finally, to move towards a real-world use case, we attempt to extract fast and slow scale dynamics from synthetic seismic modeling data.
An FPTAS for Budgeted Laminar Matroid Independent Set
Authors: Ilan Doron-Arad, Ariel Kulik, Hadas Shachnai
Abstract
We study the budgeted laminar matroid independent set problem. The input is a ground set, where each element has a cost and a non-negative profit, along with a laminar matroid over the elements and a budget. The goal is to select a maximum profit independent set of the matroid whose total cost is bounded by the budget. Several well known special cases, where we have, e.g., no matroid constraint (the classic knapsack problem) or a uniform matroid constraint (knapsack with a cardinality constraint), admit a fully polynomial-time approximation scheme (FPTAS). In contrast, the budgeted matroid independent set (BMI) problem with a general matroid has an efficient polynomial-time approximation scheme (EPTAS) but does not admit an FPTAS. This implies an EPTAS for our problem, which is the best known result prior to this work. We present an FPTAS for budgeted laminar matroid independent set, improving the previous EPTAS for this matroid family and generalizing the FPTAS known for knapsack with a cardinality constraint and multiple-choice knapsack. Our scheme is based on a simple dynamic program which utilizes the tree-like structure of laminar matroids.
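As a point of reference for the special cases named in this abstract, the following is a minimal sketch of an exact dynamic program for knapsack with a cardinality constraint (the uniform matroid case). It is not the paper's FPTAS and does not handle laminar matroids; it assumes non-negative integer costs, and the function name and argument layout are hypothetical, chosen only for illustration.

```python
def knapsack_with_cardinality(items, budget, k):
    """Max profit picking at most k items with total cost <= budget.

    items: list of (cost, profit) pairs; costs are non-negative integers
    (an assumption of this sketch). Exact pseudo-polynomial DP.
    """
    NEG = float("-inf")
    # best[j][c] = max profit using exactly j items with total cost exactly c
    best = [[NEG] * (budget + 1) for _ in range(k + 1)]
    best[0][0] = 0
    for cost, profit in items:
        # Count j runs downward so row j-1 still excludes the current item,
        # which guarantees each item is used at most once.
        for j in range(k, 0, -1):
            for c in range(budget, cost - 1, -1):
                prev = best[j - 1][c - cost]
                if prev != NEG and prev + profit > best[j][c]:
                    best[j][c] = prev + profit
    # Any count up to k and any cost up to the budget is feasible.
    return max(v for row in best for v in row if v != NEG)
```

The table has (k+1)(budget+1) entries, so the running time is pseudo-polynomial in the budget. An FPTAS for this special case is typically obtained by rounding profits and indexing the table by profit instead of cost; that step is omitted here.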
Communication of information in systems of heterogeneous agents and systems' dynamics
Authors: Inga Ivanova
Subjects: Computers and Society (cs.CY); Information Theory (cs.IT); Social and Information Networks (cs.SI)
Abstract
Communication of information in complex systems can be considered a major driver of systems evolution. What matters is not the communicated information by itself but rather the meaning that is supplied to the information. However, informational exchange in a system of heterogeneous agents, which code and decode information with different meaning-processing structures, is more complex than a simple input-output model. The structural difference of coding and decoding algorithms in a system of three or more groups of agents, entertaining different sets of communication codes, provides a source of additional options which has an impact on the system's dynamics. The mechanisms of meaning and information processing can be evaluated analytically in a model framework. The results show that model predictions accurately fit empirically observed data in systems of different origins.
Unification of Lagrangian staggered-grid hydrodynamics and cell-centered hydrodynamics in one dimension
Abstract
This paper presents a novel scheme that unifies the Lagrangian staggered-grid and cell-centered hydrodynamic methods in one dimension. The scheme neither contains empirical parameters nor solves the Riemann problem. It rests on two key points: one is the relationship between pressure and velocity, and the other is Newton's second law. The two methods that make use of this scheme satisfy the entropy condition and are conservative in total mass, momentum, and energy. Numerical results show the robustness and accuracy of both methods.
Comparison of Optimization-Based Methods for Energy-Optimal Quadrotor Motion Planning
Authors: Welf Rehberg, Joaquim Ortiz-Haro, Marc Toussaint, Wolfgang Hönig
Abstract
Quadrotors are agile flying robots that are challenging to control. Considering the full dynamics of quadrotors during motion planning is crucial to achieving good solution quality and small tracking errors during flight. Optimization-based methods scale well with high-dimensional state spaces and can handle dynamic constraints directly, therefore they are often used in these scenarios. The resulting optimization problem is notoriously difficult to solve due to its nonconvex constraints. In this work, we present an analysis of four solvers for nonlinear trajectory optimization (KOMO, direct collocation with SCvx, direct collocation with CasADi, Crocoddyl) and evaluate their performance in scenarios where the solvers are tasked to find minimum-effort solutions to geometrically complex problems and problems requiring highly dynamic solutions. Benchmarking these methods helps to determine the best algorithm structures for these kinds of problems.
Abstract
Human-object interactions (HOIs) are crucial for human-centric scene understanding applications such as human-centric visual generation, AR/VR, and robotics. Since existing methods mainly explore capturing HOIs, rendering HOIs remains less investigated. In this paper, we address this challenge in HOI animation from a compositional perspective, i.e., animating novel HOIs including a novel interaction, a novel human, and/or a novel object driven by a novel pose sequence. Specifically, we adopt neural human-object deformation to model and render HOI dynamics based on implicit neural representations. To enable interaction pose transfer among different persons and objects, we then devise a new compositional conditional neural radiance field (or CC-NeRF), which decomposes the interdependence between human and object using latent codes to enable compositional animation control of novel HOIs. Experiments show that the proposed method can generalize well to various novel HOI animation settings. Our project page is https://zhihou7.github.io/CHONA/
Inferring Preferences from Demonstrations in Multi-objective Reinforcement Learning: A Dynamic Weight-based Approach
Abstract
Many decision-making problems feature multiple objectives. In such problems, it is not always possible to know the preferences of a decision-maker for different objectives. However, it is often possible to observe the behavior of decision-makers. In multi-objective decision-making, preference inference is the process of inferring the preferences of a decision-maker for different objectives. This research proposes a Dynamic Weight-based Preference Inference (DWPI) algorithm that can infer the preferences of agents acting in multi-objective decision-making problems, based on observed behavior trajectories in the environment. The proposed method is evaluated on three multi-objective Markov decision processes: Deep Sea Treasure, Traffic, and Item Gathering. The performance of the proposed DWPI approach is compared to two existing preference inference methods from the literature, and empirical results demonstrate significant improvements compared to the baseline algorithms, in terms of both time requirements and accuracy of the inferred preferences. The DWPI algorithm also maintains its performance when inferring preferences for sub-optimal behavior demonstrations. In addition, the DWPI algorithm does not require any interactions during training with the agent whose preferences are inferred; all that is required is a trajectory of observed behavior.
Learning Neural PDE Solvers with Parameter-Guided Channel Attention
Authors: Makoto Takamoto, Francesco Alesiani, Mathias Niepert
Abstract
Scientific Machine Learning (SciML) is concerned with the development of learned emulators of physical systems governed by partial differential equations (PDEs). In application domains such as weather forecasting, molecular dynamics, and inverse design, ML-based surrogate models are increasingly used to augment or replace inefficient and often non-differentiable numerical simulation algorithms. While a number of ML-based methods for approximating the solutions of PDEs have been proposed in recent years, they typically do not adapt to the parameters of the PDEs, making it difficult to generalize to PDE parameters not seen during training. We propose a Channel Attention mechanism guided by PDE Parameter Embeddings (CAPE) component for neural surrogate models and a simple yet effective curriculum learning strategy. The CAPE module can be combined with neural PDE solvers, allowing them to adapt to unseen PDE parameters. The curriculum learning strategy provides a seamless transition between teacher-forcing and fully auto-regressive training. We compare CAPE in conjunction with the curriculum learning strategy using a popular PDE benchmark and obtain consistent and significant improvements over the baseline models. The experiments also show several advantages of CAPE, such as its increased ability to generalize to unseen PDE parameters without large increases in inference time and parameter count.
A particle method for non-local advection-selection-mutation equations
Abstract
The well-posedness of a non-local advection-selection-mutation problem deriving from adaptive dynamics models is shown for a wide family of initial data. A particle method is then developed, in order to approximate the solution of such problem by a regularised sum of weighted Dirac masses whose characteristics solve a suitably defined ODE system. The convergence of the particle method over any finite interval is shown and an explicit rate of convergence is given. Furthermore, we investigate the asymptotic-preserving properties of the method in large times, providing sufficient conditions for it to hold true as well as examples and counter-examples. Finally, we illustrate the method in two cases taken from the literature.
Some of the variables, some of the parameters, some of the times, with some physics known: Identification with partial information
Authors: Saurabh Malani, Tom S. Bertalan, Tianqi Cui, Jose L. Avalos, Michael Betenbaugh, Ioannis G. Kevrekidis
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Systems and Control (eess.SY)
Abstract
Experimental data is often comprised of variables measured independently, at different sampling rates (non-uniform ${\Delta}$t between successive measurements); and at a specific time point only a subset of all variables may be sampled. Approaches to identifying dynamical systems from such data typically use interpolation, imputation or subsampling to reorganize or modify the training data $\textit{prior}$ to learning. Partial physical knowledge may also be available $\textit{a priori}$ (accurately or approximately), and data-driven techniques can complement this knowledge. Here we exploit neural network architectures based on numerical integration methods and $\textit{a priori}$ physical knowledge to identify the right-hand side of the underlying governing differential equations. Iterates of such neural-network models allow for learning from data sampled at arbitrary time points $\textit{without}$ data modification. Importantly, we integrate the network with available partial physical knowledge in "physics informed gray-boxes"; this enables learning unknown kinetic rates or microbial growth functions while simultaneously estimating experimental parameters.
Fast Sampling of $b$-Matchings and $b$-Edge Covers
Authors: Zongchen Chen, Yuzhou Gu
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM); Combinatorics (math.CO); Probability (math.PR)
Abstract
For integer $b \ge 1$, a $b$-matching (resp. $b$-edge cover) of a graph $G=(V,E)$ is a subset $S\subseteq E$ of edges such that every vertex is incident with at most (resp. at least) $b$ edges from $S$. We prove that for any $b \ge 1$ the simple Glauber dynamics for sampling (weighted) $b$-matchings and $b$-edge covers mixes in $O(n\log n)$ time on all $n$-vertex bounded-degree graphs. This significantly improves upon previous results, which have worse running time and only work for $b$-matchings with $b \le 7$ and for $b$-edge covers with $b \le 2$. More generally, we prove spectral independence for a broad class of binary symmetric Holant problems with log-concave signatures, including $b$-matchings, $b$-edge covers, and antiferromagnetic $2$-spin edge models. We hence deduce optimal mixing time of Glauber dynamics from spectral independence.
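To make the object of study concrete, here is a minimal sketch of a single-edge heat-bath Glauber dynamics on $b$-matchings, where a matching $S$ has weight $\lambda^{|S|}$. This is one standard formulation and may differ in detail from the chain analyzed in the paper; the function names, the parameter `lam`, and the assumption of a simple graph without self-loops are choices made for this illustration only.

```python
import random


def glauber_step(edges, b, in_S, degree, lam, rng):
    """Resample one uniformly random edge conditioned on all other edges."""
    i = rng.randrange(len(edges))
    u, v = edges[i]
    # Remove edge i from the current state (if present) before resampling.
    if in_S[i]:
        in_S[i] = False
        degree[u] -= 1
        degree[v] -= 1
    # The edge may be included only if both endpoints still have capacity.
    can_add = degree[u] < b and degree[v] < b
    # Heat-bath rule: include with probability lam / (1 + lam) when allowed.
    if can_add and rng.random() < lam / (1.0 + lam):
        in_S[i] = True
        degree[u] += 1
        degree[v] += 1


def sample_b_matching(n, edges, b, steps, lam=1.0, seed=0):
    """Run the chain from the empty matching and return the selected edges."""
    rng = random.Random(seed)
    in_S = [False] * len(edges)
    degree = [0] * n  # degree[v] = number of selected edges incident to v
    for _ in range(steps):
        glauber_step(edges, b, in_S, degree, lam, rng)
    return [e for e, keep in zip(edges, in_S) if keep]
```

Every intermediate state is a valid $b$-matching by construction. The number of steps needed for the output to be close to the target distribution is exactly what a mixing-time bound such as the one in the abstract quantifies; this sketch makes no claim about that number.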
Structured interpolation for multivariate transfer functions of quadratic-bilinear systems
Authors: Peter Benner, Serkan Gugercin, Steffen W. R. Werner
Subjects: Numerical Analysis (math.NA); Systems and Control (eess.SY); Dynamical Systems (math.DS); Optimization and Control (math.OC)
Abstract
High-dimensional/high-fidelity nonlinear dynamical systems appear naturally when the goal is to accurately model real-world phenomena. Many physical properties are thereby encoded in the internal differential structure of these resulting large-scale nonlinear systems. The high-dimensionality of the dynamics causes computational bottlenecks, especially when these large-scale systems need to be simulated for a variety of situations such as different forcing terms. This motivates model reduction where the goal is to replace the full-order dynamics with accurate reduced-order surrogates. Interpolation-based model reduction has been proven to be an effective tool for the construction of cheap-to-evaluate surrogate models that preserve the internal structure in the case of weak nonlinearities. In this paper, we consider the construction of multivariate interpolants in frequency domain for structured quadratic-bilinear systems. We propose definitions for structured variants of the symmetric subsystem and generalized transfer functions of quadratic-bilinear systems and provide conditions for structure-preserving interpolation by projection. The theoretical results are illustrated using two numerical examples including the simulation of molecular dynamics in crystal structures.
On Solution Discovery via Reconfiguration
Authors: Michael R. Fellows, Mario Grobler, Nicole Megow, Amer E. Mouawad, Vijayaragunathan Ramamoorthi, Frances A. Rosamond, Daniel Schmand, Sebastian Siebertz
Subjects: Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
Abstract
The dynamics of real-world applications and systems require efficient methods for improving infeasible solutions or restoring corrupted ones by making modifications to the current state of a system in a restricted way. We propose a new framework of solution discovery via reconfiguration for constructing a feasible solution for a given problem by executing a sequence of small modifications starting from a given state. Our framework integrates and formalizes different aspects of classical local search, reoptimization, and combinatorial reconfiguration. We exemplify our framework on a multitude of fundamental combinatorial problems, namely Vertex Cover, Independent Set, Dominating Set, and Coloring. We study the classical as well as the parameterized complexity of the solution discovery variants of those problems and explore the boundary between tractable and intractable instances.
Learning Absorption Rates in Glucose-Insulin Dynamics from Meal Covariates
Authors: Ke Alexander Wang, Matthew E. Levine, Jiaxin Shi, Emily B. Fox
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Quantitative Methods (q-bio.QM)
Abstract
Traditional models of glucose-insulin dynamics rely on heuristic parameterizations chosen to fit observations within a laboratory setting. However, these models cannot describe glucose dynamics in daily life. One source of failure is in their descriptions of glucose absorption rates after meal events. A meal's macronutritional content has nuanced effects on the absorption profile, which is difficult to model mechanistically. In this paper, we propose to learn the effects of macronutrition content from glucose-insulin data and meal covariates. Given macronutrition information and meal times, we use a neural network to predict an individual's glucose absorption rate. We use this neural rate function as the control function in a differential equation of glucose dynamics, enabling end-to-end training. On simulated data, our approach is able to closely approximate true absorption rates, resulting in better forecast than heuristic parameterizations, despite only observing glucose, insulin, and macronutritional information. Our work readily generalizes to meal events with higher-dimensional covariates, such as images, setting the stage for glucose dynamics models that are personalized to each individual's daily life.
Empirical Individual State Observability
Authors: Benjamin Cellini, Burak Boyacıoğlu, Floris van Breugel
Abstract
A dynamical system is observable if there is a one-to-one mapping from the system's measured outputs and inputs to all of the system's states. Analytical and empirical tools exist for quantifying the (full state) observability of linear and nonlinear systems; however, empirical tools for evaluating the observability of individual state variables are lacking. Here, a new empirical approach termed Empirical Individual State Observability (E-ISO) is developed to quantify the level of observability of individual state variables. E-ISO first builds an empirical observability matrix via simulation, then applies convex optimization to efficiently determine the subset of its rows required to estimate each state variable individually. Finally, (un)observability measures for these subsets are calculated to provide independent estimates of the observability of each state variable. Multiple example applications of E-ISO on linear and nonlinear systems are shown to be consistent with analytical results. Broadly, E-ISO will be an invaluable tool both for designing active sensing control laws or optimizing sensor placement to increase the observability of individual state variables for engineered systems, and for analyzing the trajectory decisions made by organisms.
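The first step described above, building an empirical observability matrix via simulation, can be sketched with central finite differences on the initial state. This is only an illustration of that first step under stated assumptions: an input-free system $\dot{x} = f(x)$ with output $y = h(x)$, forward-Euler integration, and hypothetical function names. The convex-optimization row selection and the (un)observability measures of E-ISO are not shown.

```python
def simulate_outputs(f, h, x0, dt, steps):
    """Forward-Euler simulation of x' = f(x); returns stacked outputs h(x_k)."""
    x = list(x0)
    ys = []
    for _ in range(steps):
        ys.extend(h(x))  # record the output before advancing the state
        dx = f(x)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return ys


def empirical_observability_matrix(f, h, x0, dt, steps, eps=1e-4):
    """Rows index (time step, output); columns index initial state variables."""
    columns = []
    for i in range(len(x0)):
        x_plus = list(x0)
        x_plus[i] += eps
        x_minus = list(x0)
        x_minus[i] -= eps
        y_plus = simulate_outputs(f, h, x_plus, dt, steps)
        y_minus = simulate_outputs(f, h, x_minus, dt, steps)
        # Central difference of the whole output trajectory w.r.t. state i.
        columns.append([(a - b) / (2 * eps) for a, b in zip(y_plus, y_minus)])
    return [list(row) for row in zip(*columns)]
```

For a double integrator with position measured, `f = lambda x: [x[1], 0.0]` and `h = lambda x: [x[0]]`, the rows come out as `[1, k * dt]` up to rounding, so both columns are nonzero. A column that is identically zero indicates a state variable that never influences the simulated outputs.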
An Audit Framework for Adopting AI-Nudging on Children
Authors: Marianna Ganapini, Enrico Panai
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
Abstract
This is an audit framework for AI-nudging. Unlike the static form of nudging usually discussed in the literature, we focus here on a type of nudging that uses large amounts of data to provide personalized, dynamic feedback and interfaces. We call this AI-nudging (Lanzing, 2019, p. 549; Yeung, 2017). The ultimate goal of the audit outlined here is to ensure that an AI system that uses nudges will maintain a level of moral inertia and neutrality by complying with the recommendations, requirements, or suggestions of the audit (in other words, the criteria of the audit). In the case of unintended negative consequences, the audit suggests risk mitigation mechanisms that can be put in place. In the case of unintended positive consequences, it suggests some reinforcement mechanisms. Sponsored by the IBM-Notre Dame Tech Ethics Lab
SMAT: A Self-Reinforcing Framework for Simultaneous Mapping and Tracking in Unbounded Urban Environments
Authors: Tingxiang Fan, Bowen Shen, Yinqiang Zhang, Chuye Zhang, Lei Yang, Hua Chen, Wei Zhang, Jia Pan
Abstract
With the increasing prevalence of robots in daily life, it is crucial to enable robots to construct a reliable map online to navigate in unbounded and changing environments. Although existing methods can individually achieve the goals of spatial mapping and dynamic object detection and tracking, limited research has been conducted on an effective combination of these two important abilities. The proposed framework, SMAT (Simultaneous Mapping and Tracking), integrates the front-end dynamic object detection and tracking module with the back-end static mapping module using a self-reinforcing mechanism, which promotes mutual improvement of mapping and tracking performance. The conducted experiments demonstrate the framework's effectiveness in real-world applications, achieving successful long-range navigation and mapping in multiple urban environments using only one LiDAR, a CPU-only onboard computer, and a consumer-level GPS receiver.
Measuring and Modeling the Free Content Web
Authors: Abdulrahman Alabduljabbar, Runyu Ma, Ahmed Abusnaina, Rhongho Jang, Songqing Chen, DaeHun Nyang, and David Mohaisen
Subjects: Computers and Society (cs.CY); Cryptography and Security (cs.CR); Performance (cs.PF)
Abstract
Free content websites that provide free books, music, games, movies, etc., have existed on the Internet for many years. While it is a common belief that such websites might be different from premium websites providing the same content types, an analysis that supports this belief is lacking in the literature. In particular, it is unclear if those websites are as safe as their premium counterparts. In this paper, we set out to investigate, by analysis and quantification, the similarities and differences between free content and premium websites, including their risk profiles. To conduct this analysis, we assembled a list of 834 free content websites offering books, games, movies, music, and software, and 728 premium websites offering content of the same type. We then contribute domain-, content-, and risk-level analysis, examining and contrasting the websites' domain names, creation times, SSL certificates, HTTP requests, page size, average load time, and content type. For risk analysis, we consider and examine the maliciousness of these websites at the website- and component-level. Among other interesting findings, we show that free content websites tend to be vastly distributed across the TLDs and exhibit more dynamics with an upward trend for newly registered domains. Moreover, the free content websites are 4.5 times more likely to utilize an expired certificate, 19 times more likely to be malicious at the website level, and 2.64 times more likely to be malicious at the component level. Encouraged by the clear differences between the two types of websites, we explore the automation and generalization of the risk modeling of the free content risky websites, showing that a simple machine learning-based technique can produce 86.81% accuracy in identifying them.
Learning Neural Constitutive Laws From Motion Observations for Generalizable PDE Dynamics
Authors: Pingchuan Ma, Peter Yichen Chen, Bolei Deng, Joshua B. Tenenbaum, Tao Du, Chuang Gan, Wojciech Matusik
Abstract
We propose a hybrid neural network (NN) and PDE approach for learning generalizable PDE dynamics from motion observations. Many NN approaches learn an end-to-end model that implicitly models both the governing PDE and constitutive models (or material models). Without explicit PDE knowledge, these approaches cannot guarantee physical correctness and have limited generalizability. We argue that the governing PDEs are often well-known and should be explicitly enforced rather than learned. Instead, constitutive models are particularly suitable for learning due to their data-fitting nature. To this end, we introduce a new framework termed "Neural Constitutive Laws" (NCLaw), which utilizes a network architecture that strictly guarantees standard constitutive priors, including rotation equivariance and undeformed state equilibrium. We embed this network inside a differentiable simulation and train the model by minimizing a loss function based on the difference between the simulation and the motion observation. We validate NCLaw on various large-deformation dynamical systems, ranging from solids to fluids. After training on a single motion trajectory, our method generalizes to new geometries, initial/boundary conditions, temporal ranges, and even multi-physics systems. On these extremely out-of-distribution generalization tasks, NCLaw is orders-of-magnitude more accurate than previous NN approaches. Real-world experiments demonstrate our method's ability to learn constitutive laws from videos.
Pseudo-Hamiltonian neural networks for learning partial differential equations
Abstract
Pseudo-Hamiltonian neural networks (PHNN) were recently introduced for learning dynamical systems that can be modelled by ordinary differential equations. In this paper, we extend the method to partial differential equations. The resulting model is comprised of up to three neural networks, modelling terms representing conservation, dissipation and external forces, and discrete convolution operators that can either be learned or be prior knowledge. We demonstrate numerically the superior performance of PHNN compared to a baseline model that models the full dynamics by a single neural network. Moreover, since the PHNN model consists of three parts with different physical interpretations, these can be studied separately to gain insight into the system, and the learned model is applicable also if external forces are removed or changed.
Dynamic Pricing and Learning with Bayesian Persuasion
Authors: Shipra Agrawal, Yiding Feng, Wei Tang
Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
Abstract
We consider a novel dynamic pricing and learning setting where, in addition to setting prices of products in sequential rounds, the seller also ex-ante commits to 'advertising schemes'. That is, at the beginning of each round the seller can decide what kind of signal they will provide to the buyer about the product's quality upon realization. Using the popular Bayesian persuasion framework to model the effect of these signals on the buyers' valuation and purchase responses, we formulate the problem of finding an optimal design of the advertising scheme along with a pricing scheme that maximizes the seller's expected revenue. Without any a priori knowledge of the buyers' demand function, our goal is to design an online algorithm that can use past purchase responses to adaptively learn the optimal pricing and advertising strategy. We study the regret of the algorithm when compared to the optimal clairvoyant price and advertising scheme. Our main result is a computationally efficient online algorithm that achieves an $O(T^{2/3}(m\log T)^{1/3})$ regret bound when the valuation function is linear in the product quality. Here $m$ is the cardinality of the discrete product quality domain and $T$ is the time horizon. This result requires some natural monotonicity and Lipschitz assumptions on the valuation function, but no Lipschitz or smoothness assumption on the buyers' demand function. For constant $m$, our result matches the regret lower bound for dynamic pricing within logarithmic factors, which is a special case of our problem. We also obtain several improved results for the widely considered special case of additive valuations, including an $\tilde{O}(T^{2/3})$ regret bound independent of $m$ when $m\le T^{1/3}$.
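As a rough illustration of what the $O(T^{2/3}(m\log T)^{1/3})$ rate in the abstract above implies, the sketch below evaluates the expression $T^{2/3}(m\log T)^{1/3}$ for a few horizons and divides by $T$ to show that the per-round regret shrinks as $T$ grows. This is not the paper's algorithm: the function names `regret_bound` and `average_regret` are hypothetical, the hidden constant in the big-O is assumed to be 1, and the value $m=5$ is chosen only for illustration.

```python
import math


def regret_bound(T: int, m: int) -> float:
    """Evaluate T^(2/3) * (m * log T)^(1/3).

    Assumption: the constant hidden by the big-O notation is taken to be 1,
    so only the growth rate is meaningful, not the absolute value.
    """
    return T ** (2 / 3) * (m * math.log(T)) ** (1 / 3)


def average_regret(T: int, m: int) -> float:
    """Per-round regret implied by the bound (bound divided by horizon T)."""
    return regret_bound(T, m) / T


if __name__ == "__main__":
    m = 5  # hypothetical number of discrete quality levels
    for T in (10**3, 10**5, 10**7):
        print(f"T={T:>8}  bound={regret_bound(T, m):12.1f}  "
              f"per-round={average_regret(T, m):.4f}")
```

Because the bound grows more slowly than $T$, the per-round column decreases with the horizon, which is the sense in which the regret is sublinear.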
SLoMo: A General System for Legged Robot Motion Imitation from Casual Videos
Authors: John Z. Zhang, Shuo Yang, Gengshan Yang, Arun L. Bishop, Deva Ramanan, Zachary Manchester
Abstract
We present SLoMo: a first-of-its-kind framework for transferring skilled motions from casually captured "in the wild" video footage of humans and animals to legged robots. SLoMo works in three stages: 1) synthesize a physically plausible reconstructed key-point trajectory from monocular videos; 2) optimize a dynamically feasible reference trajectory for the robot offline that includes body and foot motion, as well as contact sequences that closely track the key points; 3) track the reference trajectory online using a general-purpose model-predictive controller on robot hardware. Traditional motion imitation for legged motor skills often requires expert animators, collaborative demonstrations, and/or expensive motion capture equipment, all of which limit scalability. Instead, SLoMo only relies on easy-to-obtain monocular video footage, readily available in online repositories such as YouTube. It converts videos into motion primitives that can be executed reliably by real-world robots. We demonstrate our approach by transferring the motions of cats, dogs, and humans to example robots including a quadruped (on hardware) and a humanoid (in simulation). To the best knowledge of the authors, this is the first attempt at a general-purpose motion transfer framework that imitates animal and human motions on legged robots directly from casual videos without artificial markers or labels.
Motion-Conditioned Diffusion Model for Controllable Video Synthesis
Abstract
Recent advancements in diffusion models have greatly improved the quality and diversity of synthesized content. To harness the expressive power of diffusion models, researchers have explored various controllable mechanisms that allow users to intuitively guide the content synthesis process. Although the latest efforts have primarily focused on video synthesis, there has been a lack of effective methods for controlling and describing desired content and motion. In response to this gap, we introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes, which allow users to specify the intended content and dynamics for synthesis. To tackle the ambiguity of sparse motion inputs and achieve better synthesis quality, MCDiff first utilizes a flow completion model to predict the dense video motion based on the semantic understanding of the video frame and the sparse motion control. Then, the diffusion model synthesizes high-quality future frames to form the output video. We qualitatively and quantitatively show that MCDiff achieves state-of-the-art visual quality in stroke-guided controllable video synthesis. Additional experiments on MPII Human Pose further exhibit the capability of our model on diverse content and motion synthesis.
Keyword: efficient
SamurAI: A Versatile IoT Node With Event-Driven Wake-Up and Embedded ML Acceleration
A Unified Approach to Lane Change Intention Recognition and Driving Status Prediction through TCN-LSTM and Multi-Task Learning Models
Surrogate Assisted Generation of Human-Robot Interaction Scenarios
A Data-Driven Hybrid Automaton Framework to Modeling Complex Dynamical Systems
Programmatically Grounded, Compositionally Generalizable Robotic Manipulation
Physics-informed Data-driven Discovery of Constitutive Models with Application to Strain-Rate-sensitive Soft Materials
MIPI 2023 Challenge on RGB+ToF Depth Completion: Methods and Results
Proportionally Representative Clustering
SkinSAM: Empowering Skin Cancer Segmentation with Segment Anything Model
An FPTAS for Budgeted Laminar Matroid Independent Set
Mimic-IV-ICD: A new benchmark for eXtreme MultiLabel Classification
A Supervised Machine Learning Approach to Operator Intent Recognition for Teleoperated Mobile Robot Navigation
Diagonalization Based Parallel-in-Time Method for a Class of Fourth Order Time Dependent PDEs
Attacks on Robust Distributed Learning Schemes via Sensitivity Curve Maximization
COSST: Multi-organ Segmentation with Partially Labeled Datasets Using Comprehensive Supervisions and Self-training
A Parameterized Theory of PAC Learning
Fourier-Gegenbauer Pseudospectral Method for Solving Time-Dependent One-Dimensional Fractional Partial Differential Equations with Variable Coefficients and Periodic Solutions
Lightweight, Pre-trained Transformers for Remote Sensing Timeseries
Linear and Nonlinear Parareal Methods for the Cahn-Hilliard Equation
Lowering the Entry Bar to HPC-Scale Uncertainty Quantification
Securing Autonomous Air Traffic Management: Blockchain Networks Driven by Explainable AI
Learning Neural PDE Solvers with Parameter-Guided Channel Attention
Exploiting Inductive Bias in Transformer for Point Cloud Classification and Segmentation
Human Semantic Segmentation using Millimeter-Wave Radar Sparse Point Clouds
Multiplicity Problems on Algebraic Series and Context-Free Grammars
Tractability of sampling recovery on unweighted function classes
The Mutual Information In The Vicinity of Capacity-Achieving Input Distributions
Developing Distributed High-performance Computing Capabilities of an Open Science Platform for Robust Epidemic Analysis
Evaluating the Impact of Pair Documentation on Requirements Quality and Team Productivity
A Survey on Approximate Edge AI for Energy Efficient Autonomous Driving Services
On Solution Discovery via Reconfiguration
Incremental Generalized Category Discovery
Empirical Individual State Observability
SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection
$π$-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation
Dynamic Pricing and Learning with Bayesian Persuasion
string2string: A Modern Python Library for String-to-String Algorithms
Maximizing Model Generalization for Manufacturing with Self-Supervised Learning and Federated Learning
Keyword: faster
Physics-informed neural networks for predicting gas flow dynamics and unknown parameters in diesel engines
A Survey on Solving and Discovering Differential Equations Using Deep Neural Networks
Variational Bayes Made Easy
Keyword: mobile
AI-based Predictive Analytic Approaches for safeguarding the Future of Electric/Hybrid Vehicles
Detecting inner-LAN anomalies using hierarchical forecasting
A Review of Panoptic Segmentation for Mobile Mapping Point Clouds
A Supervised Machine Learning Approach to Operator Intent Recognition for Teleoperated Mobile Robot Navigation
MCLFIQ: Mobile Contactless Fingerprint Image Quality
Combining HoloLens with Instant-NeRFs: Advanced Real-Time 3D Mobile Mapping
A Versatile Low-Complexity Feedback Scheme for FDD Systems via Generative Modeling
Keyword: pruning
Fine Tuning with Abnormal Examples
JaxPruner: A concise library for sparsity research
Keyword: voxel
There is no result
Keyword: lidar
Human Semantic Segmentation using Millimeter-Wave Radar Sparse Point Clouds
Quadric Representations for LiDAR Odometry, Mapping and Localization
A Survey on Approximate Edge AI for Energy Efficient Autonomous Driving Services
SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection
SMAT: A Self-Reinforcing Framework for Simultaneous Mapping and Tracking in Unbounded Urban Environments
Keyword: diffusion
Towards ethical multimodal systems
Preserving Superconvergence of Spectral Elements for Curved Domains via $h$ and $p$-Geometric Refinement
Multimodal Composite Association Score: Measuring Gender Bias in Generative Multimodal Models
Two kinds of numerical algorithms for ultra-slow diffusion equations
Edit Everything: A Text-Guided Generative System for Images Editing
Localized orthogonal decomposition for a multiscale parabolic stochastic partial differential equation
DataComp: In search of the next generation of multimodal datasets
Functional Diffusion Maps
Motion-Conditioned Diffusion Model for Controllable Video Synthesis
Putting People in Their Place: Affordance-Aware Human Insertion into Scenes
Keyword: dynamic
TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation
Physics-informed neural networks for predicting gas flow dynamics and unknown parameters in diesel engines
A Data-Driven Hybrid Automaton Framework to Modeling Complex Dynamical Systems
Controlled density transport using Perron Frobenius generators
Understand the Dynamic World: An End-to-End Knowledge Informed Framework for Open Domain Entity State Tracking
Ensoul: A framework for the creation of self organizing intelligent ultra low power systems (SOULS) through evolutionary enerstatic networks
Physics-informed Data-driven Discovery of Constitutive Models with Application to Strain-Rate-sensitive Soft Materials
Conditional dominance in games with unawareness
Level Assembly as a Markov Decision Process
A One-Dimensional Symmetric Force-Based Blending Method for Atomistic-to-Continuum Coupling
Provably Stabilizing Global-Position Tracking Control for Hybrid Models of Multi-Domain Bipedal Walking via Multiple Lyapunov Analysis
A central scheme for coupled hyperbolic systems
Data-driven time-scale separation of ODE right-hand sides using dynamic mode decomposition and time delay embedding
An FPTAS for Budgeted Laminar Matroid Independent Set
communication of information in systems of heterogenious agents and systems' dynamics
Unification of Lagrangian staggered-grid hydrodynamics and cell-centered hydrodynamics in one dimension
Comparison of Optimization-Based Methods for Energy-Optimal Quadrotor Motion Planning
Compositional 3D Human-Object Neural Animation
Inferring Preferences from Demonstrations in Multi-objective Reinforcement Learning: A Dynamic Weight-based Approach
Learning Neural PDE Solvers with Parameter-Guided Channel Attention
A particle method for non-local advection-selection-mutation equations
Some of the variables, some of the parameters, some of the times, with some physics known: Identification with partial information
Fast Sampling of $b$-Matchings and $b$-Edge Covers
Structured interpolation for multivariate transfer functions of quadratic-bilinear systems
On Solution Discovery via Reconfiguration
Learning Absorption Rates in Glucose-Insulin Dynamics from Meal Covariates
Empirical Individual State Observability
An Audit Framework for Adopting AI-Nudging on Children
SMAT: A Self-Reinforcing Framework for Simultaneous Mapping and Tracking in Unbounded Urban Environments
Measuring and Modeling the Free Content Web
Learning Neural Constitutive Laws From Motion Observations for Generalizable PDE Dynamics
Pseudo-Hamiltonian neural networks for learning partial differential equations
Dynamic Pricing and Learning with Bayesian Persuasion
SLoMo: A General System for Legged Robot Motion Imitation from Casual Videos
Motion-Conditioned Diffusion Model for Controllable Video Synthesis