Abstract
This paper proposes a technique for efficiently modeling dynamic humans by explicifying the implicit neural fields via a Neural Explicit Surface (NES). Implicit neural fields have advantages over traditional explicit representations in modeling dynamic 3D content from sparse observations and effectively representing complex geometries and appearances. Implicit neural fields defined in 3D space, however, are expensive to render due to the need for dense sampling during volumetric rendering. Moreover, their memory efficiency can be further optimized when modeling sparse 3D space. To overcome these issues, the paper proposes utilizing Neural Explicit Surface (NES) to explicitly represent implicit neural fields, facilitating memory and computational efficiency. To achieve this, the paper creates a fully differentiable conversion between the implicit neural fields and the explicit rendering interface of NES, leveraging the strengths of both implicit and explicit approaches. This conversion enables effective training of the hybrid representation using implicit methods and efficient rendering by integrating the explicit rendering interface with a newly proposed rasterization-based neural renderer that only incurs a texture color query once for the initial ray interaction with the explicit surface, resulting in improved inference efficiency. NES describes dynamic human geometries with pose-dependent neural implicit surface deformation fields and their dynamic neural textures both in 2D space, which is a more memory-efficient alternative to traditional 3D methods, reducing redundancy and computational load. The comprehensive experiments show that NES performs similarly to previous 3D approaches, with greatly improved rendering speed and reduced memory cost.
High-Level Features Parallelization for Inference Cost Reduction Through Selective Attention
Authors: André Peter Kelm, Lucas Schmidt, Tim Rolff, Christian Wilms, Ehsan Yaghoubi, Simone Frintrop
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this work, we parallelize high-level features in deep networks to selectively skip or select class-specific features to reduce inference costs. This challenges most deep learning methods due to their limited ability to efficiently and effectively focus on selected class-specific features without retraining. We propose a serial-parallel hybrid architecture with serial generic low-level features and parallel high-level features. This accounts for the fact that many high-level features are class-specific rather than generic, and has connections to recent neuroscientific findings that observe spatially and contextually separated neural activations in the human brain. Our approach provides the unique functionality of cutouts: selecting parts of the network to focus on only relevant subsets of classes without requiring retraining. High performance is maintained, but the cost of inference can be significantly reduced. In some of our examples, up to $75\,\%$ of parameters are skipped and $35\,\%$ fewer GMACs (Giga multiply-accumulate) operations are used as the approach adapts to a change in task complexity. This is important for mobile, industrial, and robotic applications where reducing the number of parameters, the computational complexity, and thus the power consumption can be paramount. Another unique functionality is that it allows processing to be directly influenced by enhancing or inhibiting high-level class-specific features, similar to the mechanism of selective attention in the human brain. This can be relevant for cross-modal applications, the use of semantic prior knowledge, and/or context-aware processing.
Federated Online/Offline Remote Data Inspection for Distributed Edge Computing
Authors: Mohammad Ali, Ximeng Liu
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
In edge computing environments, app vendors can cache their data to be shared with their users in many geographically distributed edge servers. However, the cached data is particularly vulnerable to several intentional attacks or unintentional events. Given the limited resources of edge servers and prohibitive storage costs incurred by app vendors, designing an efficient approach to inspect and maintain the data over tremendous edge servers is a critical issue. To tackle the problem, we design a novel data inspection approach, named ${\text{O}^2\text{DI}}$, that provides the following services: i) using ${\text{O}^2\text{DI}}$, app vendors can inspect the data cached in edge servers without having the original data, which reduces the incurred I/O and storage overhead significantly; ii) computational operations conducted by both edge servers and app vendors are highly efficient because of a novel online/offline technique; iii) many data files cached in different edge servers can be verified quickly and at once by using a novel batch verification method; iv) corrupted data in edge servers can be localized and repaired efficiently. We analyze the security and performance of ${\text{O}^2\text{DI}}$. We see that it is secure in the random oracle model, much faster, and more cost-effective compared to state-of-the-art approaches.
Decoding Layer Saliency in Language Transformers
Authors: Elizabeth M. Hou, Gregory Castanon
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
In this paper, we introduce a strategy for identifying textual saliency in large-scale language models applied to classification tasks. In visual networks where saliency is more well-studied, saliency is naturally localized through the convolutional layers of the network; however, the same is not true in modern transformer-stack networks used to process natural language. We adapt gradient-based saliency methods for these networks, propose a method for evaluating the degree of semantic coherence of each layer, and demonstrate consistent improvement over numerous other methods for textual saliency on multiple benchmark classification datasets. Our approach requires no additional training or access to labelled data, and is comparatively very computationally efficient.
Spatial Gated Multi-Layer Perceptron for Land Use and Land Cover Mapping
Authors: Ali Jamali, Swalpa Kumar Roy, Danfeng Hong, Peter M Atkinson, Pedram Ghamisi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Convolutional Neural Networks (CNNs) are models that are utilized extensively for the hierarchical extraction of features. Vision transformers (ViTs), through the use of a self-attention mechanism, have recently achieved superior modeling of global contextual information compared to CNNs. However, to realize their image classification strength, ViTs require substantial training datasets. Where the available training data are limited, current advanced multi-layer perceptrons (MLPs) can provide viable alternatives to both deep CNNs and ViTs. In this paper, we developed the SGU-MLP, a learning algorithm that effectively uses both MLPs and spatial gating units (SGUs) for precise land use land cover (LULC) mapping. Results illustrated the superiority of the developed SGU-MLP classification algorithm over several CNN and CNN-ViT-based models, including HybridSN, ResNet, iFormer, EfficientFormer and CoAtNet. The proposed SGU-MLP algorithm was tested through three experiments in Houston, USA, Berlin, Germany and Augsburg, Germany. The SGU-MLP classification model was found to consistently outperform the benchmark CNN and CNN-ViT-based algorithms. For example, for the Houston experiment, SGU-MLP significantly outperformed HybridSN, CoAtNet, Efficientformer, iFormer and ResNet by approximately 15%, 19%, 20%, 21%, and 25%, respectively, in terms of average accuracy. The code will be made publicly available at https://github.com/aj1365/SGUMLP
AI-Enabled Software and System Architecture Frameworks: Focusing on smart Cyber-Physical Systems (CPS)
Authors: Armin Moin, Atta Badii, Stephan Günnemann, Moharram Challenger
Abstract
Several architecture frameworks for software, systems, and enterprises have been proposed in the literature. They identified various stakeholders and defined architecture viewpoints and views to frame and address stakeholder concerns. However, the stakeholders with data science and Machine Learning (ML) related concerns, such as data scientists and data engineers, are yet to be included in existing architecture frameworks. Therefore, they failed to address the architecture viewpoints and views responsive to the concerns of the data science community. In this paper, we address this gap by establishing the architecture frameworks adapted to meet the requirements of modern applications and organizations where ML artifacts are both prevalent and crucial. In particular, we focus on ML-enabled Cyber-Physical Systems (CPSs) and propose two sets of merit criteria for their efficient development and performance assessment, namely the criteria for evaluating and benchmarking ML-enabled CPSs, and the criteria for evaluation and benchmarking of the tools intended to support users through the modeling and development pipeline. In this study, we deploy multiple empirical and qualitative research methods based on literature review and survey instruments including expert interviews and an online questionnaire. We collect, analyze, and integrate the opinions of 77 experts from more than 25 organizations in over 10 countries to devise and validate the proposed framework.
Advancing Early Detection of Virus Yellows: Developing a Hybrid Convolutional Neural Network for Automatic Aphid Counting in Sugar Beet Fields
Abstract
Aphids are efficient vectors to transmit virus yellows in sugar beet fields. Timely monitoring and control of their populations are thus critical to prevent the large-scale outbreak of virus yellows. However, the manual counting of aphids, which is the most common practice, is labor-intensive and time-consuming. Additionally, two of the biggest challenges in aphid counting are that aphids are small objects and their density distributions are varied in different areas of the field. To address these challenges, we proposed a hybrid automatic aphid counting network architecture which integrates the detection network and the density map estimation network. When the distribution density of aphids is low, it utilizes an improved Yolov5 to count aphids. Conversely, when the distribution density of aphids is high, its witches to CSRNet to count aphids. To the best of our knowledge, this is the first framework integrating the detection network and the density map estimation network for counting tasks. Through comparison experiments of counting aphids, it verified that our proposed approach outperforms all other methods in counting aphids. It achieved the lowest MAE and RMSE values for both the standard and high-density aphid datasets: 2.93 and 4.01 (standard), and 34.19 and 38.66 (high-density), respectively. Moreover, the AP of the improved Yolov5 is 5% higher than that of the original Yolov5. Especially for extremely small aphids and densely distributed aphids, the detection performance of the improved Yolov5 is significantly better than the original Yolov5. This work provides an effective early warning for the virus yellows risk caused by aphids in sugar beet fields, offering protection for sugar beet growth and ensuring sugar beet yield. The datasets and project code are released at: https://github.com/JunfengGaolab/Counting-Aphids.
Output-feedback model predictive and model-free control for ramp metering
Abstract
We study the stability of freeway traffic flow under output-feedback ramp metering in cases of incomplete information about the traffic flow state. We propose a set-membership estimation method for the cell transmission model with capacity drops and design a model predictive controller accordingly. This controller has linear running and terminal costs in cell densities, and its output comprises density measurements from a subset of the cells, e.g., through loop detectors or connected vehicles. For a line network, we provide sufficient conditions under which the traffic system is input-to-state stable, meaning the ramp queue length remains bounded, on the control horizons, cost coefficients, and the inflows at the ramps. To further relax the requirement on exact knowledge of model parameters, we design a model-free controller that computes metering rates directly from measurement data. In addition to the proposed controllers, we provide extensive simulations of other ramp metering algorithms in the literature as baselines for evaluating performance. Simulation results show that traffic flow is unstable without ramp metering or with other ramp metering methods under high inflows, congested initial conditions, and incomplete state measurements. In contrast, the designed model predictive and model-free controllers stabilize traffic flow and provide higher throughput with measurements from an arbitrary subset of the cells.
A general fourth-order mesoscopic multiple-relaxation-time lattice Boltzmann model and equivalent macroscopic finite-difference scheme for two-dimensional diffusion equations
Authors: Ying Chen, Zhenhua Chai, Baochang Shi
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
Abstract
In this work, we first develop a general mesoscopic multiple-relaxation-time lattice Boltzmann (MRT-LB) model for the two-dimensional diffusion equation with the constant diffusion coefficient and source term, where the D2Q5 (five discrete velocities in two-dimensional space) lattice structure is considered. Then we exactly derive the equivalent macroscopic finite-difference scheme of the MRT-LB model. Additionally, we also propose a proper MRT-LB model for the diffusion equation with a linear source term, and obtain an equivalent macroscopic six-level finite-difference scheme. After that, we conduct the accuracy and stability analysis of the finite-difference scheme and the mesoscopic MRT-LB model. It is found that at the diffusive scaling, both of them can achieve a fourth-order accuracy in space based on the Taylor expansion. The stability analysis also shows that they are both unconditionally stable. Finally, some numerical experiments are conducted, and the numerical results are also consistent with our theoretical analysis.
Bayesian Meta-Learning on Control Barrier Functions with Data from On-Board Sensors
Abstract
In this paper, we consider a way to safely navigate the robots in unknown environments using measurement data from sensory devices. The control barrier function (CBF) is one of the promising approaches to encode safety requirements of the system and the recent progress on learning-based approaches for CBF realizes online synthesis of CBF-based safe controllers with sensor measurements. However, the existing methods are inefficient in the sense that the trained CBF cannot be generalized to different environments and the re-synthesis of the controller is necessary when changes in the environment occur. Thus, this paper considers a way to learn CBF that can quickly adapt to a new environment with few amount of data by utilizing the currently developed Bayesian meta-learning framework. The proposed scheme realizes efficient online synthesis of the controller as shown in the simulation study and provides probabilistic safety guarantees on the resulting controller.
Abstract
The analysis of mapping relationships and distortions in multidimensional data poses a significant challenge in contemporary research. While Beltrami coefficients offer a precise description of distortions in two-dimensional mappings, current tools lack this capability in the context of three-dimensional space. This paper presents a novel approach: a 3D quasiconformal representation that captures the local dilation of 3D mappings, along with an algorithm that establishes a connection between this representation and the corresponding mapping. Experimental results showcase the algorithm's effectiveness in eliminating foldings in 3D mappings, as well as in mapping reconstruction and generation. These features bear a resemblance to the 2D Linear Beltrami Solver technique. The work presented in this paper offers a promising solution for the precise analysis and adjustment of distortions in 3D data and mappings.
FocusFlow: Leveraging Focal Depth for Gaze Interaction in Virtual Reality
Abstract
Current gaze input methods for VR headsets predominantly utilize the gaze ray as a pointing cursor, often neglecting depth information in it. This study introduces FocusFlow, a novel gaze interaction technique that integrates focal depth into gaze input dimensions, facilitating users to actively shift their focus along the depth dimension for interaction. A detection algorithm to identify the user's focal depth is developed. Based on this, a layer-based UI is proposed, which uses focal depth changes to enable layer switch operations, offering an intuitive hands-free selection method. We also designed visual cues to guide users to adjust focal depth accurately and get familiar with the interaction process. Preliminary evaluations demonstrate the system's usability, and several potential applications are discussed. Through FocusFlow, we aim to enrich the input dimensions of gaze interaction, achieving more intuitive and efficient human-computer interactions on headset devices.
Machine Learning aided Computer Architecture Design for CNN Inferencing Systems
Abstract
Efficient and timely calculations of Machine Learning (ML) algorithms are essential for emerging technologies like autonomous driving, the Internet of Things (IoT), and edge computing. One of the primary ML algorithms used in such systems is Convolutional Neural Networks (CNNs), which demand high computational resources. This requirement has led to the use of ML accelerators like GPGPUs to meet design constraints. However, selecting the most suitable accelerator involves Design Space Exploration (DSE), a process that is usually time-consuming and requires significant manual effort. Our work presents approaches to expedite the DSE process by identifying the most appropriate GPGPU for CNN inferencing systems. We have developed a quick and precise technique for forecasting the power and performance of CNNs during inference, with a MAPE of 5.03% and 5.94%, respectively. Our approach empowers computer architects to estimate power and performance in the early stages of development, reducing the necessity for numerous prototypes. This saves time and money while also improving the time-to-market period.
Product Review Image Ranking for Fashion E-commerce
Abstract
In a fashion e-commerce platform where customers can't physically examine the products on their own, being able to see other customers' text and image reviews of the product is critical while making purchase decisions. Given the high reliance on these reviews, over the years we have observed customers proactively sharing their reviews. With an increase in the coverage of User Generated Content (UGC), there has been a corresponding increase in the number of customer images. It is thus imperative to display the most relevant images on top as it may influence users' online shopping choices and behavior. In this paper, we propose a simple yet effective training procedure for ranking customer images. We created a dataset consisting of Myntra (A Major Indian Fashion e-commerce company) studio posts and highly engaged (upvotes/downvotes) UGC images as our starting point and used selected distortion techniques on the images of the above dataset to bring their quality at par with those of bad UGC images. We train our network to rank bad-quality images lower than high-quality ones. Our proposed method outperforms the baseline models on two metrics, namely correlation coefficient, and accuracy, by substantial margins.
Learning Gabor Texture Features for Fine-Grained Recognition
Authors: Lanyun Zhu, Tianrun Chen, Jianxiong Yin, Simon See, Jun Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Extracting and using class-discriminative features is critical for fine-grained recognition. Existing works have demonstrated the possibility of applying deep CNNs to exploit features that distinguish similar classes. However, CNNs suffer from problems including frequency bias and loss of detailed local information, which restricts the performance of recognizing fine-grained categories. To address the challenge, we propose a novel texture branch as complimentary to the CNN branch for feature extraction. We innovatively utilize Gabor filters as a powerful extractor to exploit texture features, motivated by the capability of Gabor filters in effectively capturing multi-frequency features and detailed local information. We implement several designs to enhance the effectiveness of Gabor filters, including imposing constraints on parameter values and developing a learning method to determine the optimal parameters. Moreover, we introduce a statistical feature extractor to utilize informative statistical information from the signals captured by Gabor filters, and a gate selection mechanism to enable efficient computation by only considering qualified regions as input for texture extraction. Through the integration of features from the Gabor-filter-based texture branch and CNN-based semantic branch, we achieve comprehensive information extraction. We demonstrate the efficacy of our method on multiple datasets, including CUB-200-2011, NA-bird, Stanford Dogs, and GTOS-mobile. State-of-the-art performance is achieved using our approach.
Progressive Spatio-temporal Perception for Audio-Visual Question Answering
Authors: Guangyao Li, Wenxuan Hou, Di Hu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Abstract
Audio-Visual Question Answering (AVQA) task aims to answer questions about different visual objects, sounds, and their associations in videos. Such naturally multi-modal videos are composed of rich and complex dynamic audio-visual components, where most of which could be unrelated to the given questions, or even play as interference in answering the content of interest. Oppositely, only focusing on the question-aware audio-visual content could get rid of influence, meanwhile enabling the model to answer more efficiently. In this paper, we propose a Progressive Spatio-Temporal Perception Network (PSTP-Net), which contains three modules that progressively identify key spatio-temporal regions w.r.t. questions. Specifically, a temporal segment selection module is first introduced to select the most relevant audio-visual segments related to the given question. Then, a spatial region selection module is utilized to choose the most relevant regions associated with the question from the selected temporal segments. To further refine the selection of features, an audio-guided visual attention module is employed to perceive the association between auido and selected spatial regions. Finally, the spatio-temporal features from these modules are integrated for answering the question. Extensive experimental results on the public MUSIC-AVQA and AVQA datasets provide compelling evidence of the effectiveness and efficiency of PSTP-Net. Code is available at: \href{https://github.com/GeWu-Lab/PSTP-Net}{https://github.com/GeWu-Lab/PSTP-Net}
Speech-Driven 3D Face Animation with Composite and Regional Facial Movements
Abstract
Speech-driven 3D face animation poses significant challenges due to the intricacy and variability inherent in human facial movements. This paper emphasizes the importance of considering both the composite and regional natures of facial movements in speech-driven 3D face animation. The composite nature pertains to how speech-independent factors globally modulate speech-driven facial movements along the temporal dimension. Meanwhile, the regional nature alludes to the notion that facial movements are not globally correlated but are actuated by local musculature along the spatial dimension. It is thus indispensable to incorporate both natures for engendering vivid animation. To address the composite nature, we introduce an adaptive modulation module that employs arbitrary facial movements to dynamically adjust speech-driven facial movements across frames on a global scale. To accommodate the regional nature, our approach ensures that each constituent of the facial features for every frame focuses on the local spatial movements of 3D faces. Moreover, we present a non-autoregressive backbone for translating audio to 3D facial movements, which maintains high-frequency nuances of facial movements and facilitates efficient inference. Comprehensive experiments and user studies demonstrate that our method surpasses contemporary state-of-the-art approaches both qualitatively and quantitatively.
Deep Fusion Transformer Network with Weighted Vector-Wise Keypoints Voting for Robust 6D Object Pose Estimation
Authors: Jun Zhou, Kai Chen, Linlin Xu, Qi Dou, Jing Qin
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
One critical challenge in 6D object pose estimation from a single RGBD image is efficient integration of two different modalities, i.e., color and depth. In this work, we tackle this problem by a novel Deep Fusion Transformer~(DFTr) block that can aggregate cross-modality features for improving pose estimation. Unlike existing fusion methods, the proposed DFTr can better model cross-modality semantic correlation by leveraging their semantic similarity, such that globally enhanced features from different modalities can be better integrated for improved information extraction. Moreover, to further improve robustness and efficiency, we introduce a novel weighted vector-wise voting algorithm that employs a non-iterative global optimization strategy for precise 3D keypoint localization while achieving near real-time inference. Extensive experiments show the effectiveness and strong generalization capability of our proposed 3D keypoint voting algorithm. Results on four widely used benchmarks also demonstrate that our method outperforms the state-of-the-art methods by large margins.
Occupancy Grid Map to Pose Graph-based Map: Robust BIM-based 2D-LiDAR Localization for Lifelong Indoor Navigation in Changing and Dynamic Environments
Authors: Miguel Arturo Vega Torres, Alexander Braun, André Borrmann
Abstract
Several studies rely on the de facto standard Adaptive Monte Carlo Localization (AMCL) method to localize a robot in an Occupancy Grid Map (OGM) extracted from a building information model (BIM model). However, most of these studies assume that the BIM model precisely represents the real world, which is rarely true. Discrepancies between the reference BIM model and the real world (Scan-BIM deviations) are not only due to furniture or clutter but also the usual as-planned and as-built deviations that exist with any model created in the design phase. These deviations affect the accuracy of AMCL drastically. This paper proposes an open-source method to generate appropriate Pose Graph-based maps from BIM models for robust 2D-LiDAR localization in changing and dynamic environments. First, 2D OGMs are automatically generated from complex BIM models. These OGMs only represent structural elements allowing indoor autonomous robot navigation. Then, an efficient technique converts these 2D OGMs into Pose Graph-based maps enabling more accurate robot pose tracking. Finally, we leverage the different map representations for accurate, robust localization with a combination of state-of-the-art algorithms. Moreover, we provide a quantitative comparison of various state-of-the-art localization algorithms in three simulated scenarios with varying levels of Scan-BIM deviations and dynamic agents. More precisely, we compare two Particle Filter (PF) algorithms: AMCL and General Monte Carlo Localization (GMCL); and two Graph-based Localization (GBL) methods: Google's Cartographer and SLAM Toolbox, solving the global localization and pose tracking problems. The numerous experiments demonstrate that the proposed method contributes to a robust localization with an as-designed BIM model or a sparse OGM in changing and dynamic environments, outperforming the conventional AMCL in accuracy and robustness.
Scheduling for Periodic Multi-Source Systems with Peak-Age Violation Guarantees
Abstract
Age of information (AoI) is an effective performance metric measuring the freshness of information and is particularly suitable for applications involving status update. In this paper, using the age violation probability as the metric, scheduling for heterogeneous multi-source systems is studied. Two queueing disciplines, namely the infinite packet queueing discipline and the single packet queueing discipline, are considered for scheduling packets within each source. A generalized round-robin (GRR) scheduling policy is then proposed to schedule the sources. Bounds on the exponential decay rate of the age violation probability for the proposed GRR scheduling policy under each queueing discipline are rigorously analyzed. Simulation results are provided, which show that the proposed GRR scheduling policy can efficiently serve many sources with heterogeneous arrivals and that our bounds can capture the true decay rate quite accurately. When specialized to the homogeneous source setting, the analysis concretizes the common belief that the single packet queueing discipline has a better AoI performance than the infinite packet queueing discipline. Moreover, simulations on this special case reveals that under the proposed scheduling policy, the two disciplines would have similar asymptotic performance when the inter-arrival time is much larger than the total transmission time.
$\mathcal{G}^2Pxy$: Generative Open-Set Node Classification on Graphs with Proxy Unknowns
Authors: Qin Zhang, Zelin Shi, Xiaolin Zhang, Xiaojun Chen, Philippe Fournier-Viger, Shirui Pan
Abstract
Node classification is the task of predicting the labels of unlabeled nodes in a graph. State-of-the-art methods based on graph neural networks achieve excellent performance when all labels are available during training. But in real-life, models are often applied on data with new classes, which can lead to massive misclassification and thus significantly degrade performance. Hence, developing open-set classification methods is crucial to determine if a given sample belongs to a known class. Existing methods for open-set node classification generally use transductive learning with part or all of the features of real unseen class nodes to help with open-set classification. In this paper, we propose a novel generative open-set node classification method, i.e. $\mathcal{G}^2Pxy$, which follows a stricter inductive learning setting where no information about unknown classes is available during training and validation. Two kinds of proxy unknown nodes, inter-class unknown proxies and external unknown proxies are generated via mixup to efficiently anticipate the distribution of novel classes. Using the generated proxies, a closed-set classifier can be transformed into an open-set one, by augmenting it with an extra proxy classifier. Under the constraints of both cross entropy loss and complement entropy loss, $\mathcal{G}^2Pxy$ achieves superior effectiveness for unknown class detection and known class classification, which is validated by experiments on benchmark graph datasets. Moreover, $\mathcal{G}^2Pxy$ does not have specific requirement on the GNN architecture and shows good generalizations.
Provably Efficient Algorithm for Nonstationary Low-Rank MDPs
Abstract
Reinforcement learning (RL) under changing environment models many real-world applications via nonstationary Markov Decision Processes (MDPs), and hence gains considerable interest. However, theoretical studies on nonstationary MDPs in the literature have mainly focused on tabular and linear (mixture) MDPs, which do not capture the nature of unknown representation in deep RL. In this paper, we make the first effort to investigate nonstationary RL under episodic low-rank MDPs, where both transition kernels and rewards may vary over time, and the low-rank model contains unknown representation in addition to the linear state embedding function. We first propose a parameter-dependent policy optimization algorithm called PORTAL, and further improve PORTAL to its parameter-free version of Ada-PORTAL, which is able to tune its hyper-parameters adaptively without any prior knowledge of nonstationarity. For both algorithms, we provide upper bounds on the average dynamic suboptimality gap, which show that as long as the nonstationarity is not significantly large, PORTAL and Ada-PORTAL are sample-efficient and can achieve arbitrarily small average dynamic suboptimality gap with polynomial sample complexity.
YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-time Object Detection
Abstract
We aim at providing the object detection community with an efficient and performant object detector, termed YOLO-MS. The core design is based on a series of investigations on how convolutions with different kernel sizes affect the detection performance of objects at different scales. The outcome is a new strategy that can strongly enhance multi-scale feature representations of real-time object detectors. To verify the effectiveness of our strategy, we build a network architecture, termed YOLO-MS. We train our YOLO-MS on the MS COCO dataset from scratch without relying on any other large-scale datasets, like ImageNet, or pre-trained weights. Without bells and whistles, our YOLO-MS outperforms the recent state-of-the-art real-time object detectors, including YOLO-v7 and RTMDet, when using a comparable number of parameters and FLOPs. Taking the XS version of YOLO-MS as an example, with only 4.5M learnable parameters and 8.7G FLOPs, it can achieve an AP score of 43%+ on MS COCO, which is about 2%+ higher than RTMDet with the same model size. Moreover, our work can also be used as a plug-and-play module for other YOLO models. Typically, our method significantly improves the AP of YOLOv8 from 37%+ to 40%+ with even fewer parameters and FLOPs. Code is available at https://github.com/FishAndWasabi/YOLO-MS.
LLM As DBA
Authors: Xuanhe Zhou, Guoliang Li, Zhiyuan Liu
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
Database administrators (DBAs) play a crucial role in managing, maintaining and optimizing a database system to ensure data availability, performance, and reliability. However, it is hard and tedious for DBAs to manage a large number of database instances (e.g., millions of instances on the cloud databases). Recently large language models (LLMs) have shown great potential to understand valuable documents and accordingly generate reasonable answers. Thus, we propose D-Bot, a LLM-based database administrator that can continuously acquire database maintenance experience from textual sources, and provide reasonable, well-founded, in-time diagnosis and optimization advice for target databases. This paper presents a revolutionary LLM-centric framework for database maintenance, including (i) database maintenance knowledge detection from documents and tools, (ii) tree of thought reasoning for root cause analysis, and (iii) collaborative diagnosis among multiple LLMs. Our preliminary experimental results that D-Bot can efficiently and effectively diagnose the root causes and our code is available at github.com/TsinghuaDatabaseGroup/DB-GPT.
Quality Diversity under Sparse Reward and Sparse Interaction: Application to Grasping in Robotics
Authors: J. Huber, F. Hélénon, M. Coninx, F. Ben Amar, S. Doncieux
Abstract
Quality-Diversity (QD) methods are algorithms that aim to generate a set of diverse and high-performing solutions to a given problem. Originally developed for evolutionary robotics, most QD studies are conducted on a limited set of domains - mainly applied to locomotion, where the fitness and the behavior signal are dense. Grasping is a crucial task for manipulation in robotics. Despite the efforts of many research communities, this task is yet to be solved. Grasping cumulates unprecedented challenges in QD literature: it suffers from reward sparsity, behavioral sparsity, and behavior space misalignment. The present work studies how QD can address grasping. Experiments have been conducted on 15 different methods on 10 grasping domains, corresponding to 2 different robot-gripper setups and 5 standard objects. An evaluation framework that distinguishes the evaluation of an algorithm from its internal components has also been proposed for a fair comparison. The obtained results show that MAP-Elites variants that select successful solutions in priority outperform all the compared methods on the studied metrics by a large margin. We also found experimental evidence that sparse interaction can lead to deceptive novelty. To our knowledge, the ability to efficiently produce examples of grasping trajectories demonstrated in this work has no precedent in the literature.
Look at the Neighbor: Distortion-aware Unsupervised Domain Adaptation for Panoramic Semantic Segmentation
Authors: Xu Zheng, Tianbo Pan, Yunhao Luo, Lin Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Endeavors have been recently made to transfer knowledge from the labeled pinhole image domain to the unlabeled panoramic image domain via Unsupervised Domain Adaptation (UDA). The aim is to tackle the domain gaps caused by the style disparities and distortion problem from the non-uniformly distributed pixels of equirectangular projection (ERP). Previous works typically focus on transferring knowledge based on geometric priors with specially designed multi-branch network architectures. As a result, considerable computational costs are induced, and meanwhile, their generalization abilities are profoundly hindered by the variation of distortion among pixels. In this paper, we find that the pixels' neighborhood regions of the ERP indeed introduce less distortion. Intuitively, we propose a novel UDA framework that can effectively address the distortion problems for panoramic semantic segmentation. In comparison, our method is simpler, easier to implement, and more computationally efficient. Specifically, we propose distortion-aware attention (DA) capturing the neighboring pixel distribution without using any geometric constraints. Moreover, we propose a class-wise feature aggregation (CFA) module to iteratively update the feature representations with a memory bank. As such, the feature similarity between two domains can be consistently optimized. Extensive experiments show that our method achieves new state-of-the-art performance while remarkably reducing 80% parameters.
An Adaptive Algorithm Based on Stochastic Discontinuous Galerkin for Convection Dominated Equations with Random Data
Abstract
In this paper, we propose an adaptive approach, based on mesh refinement or parametric enrichment, for convection diffusion equations containing randomness in their coefficients. A parametric system of convection diffusion equations obtained by an application of stochastic Galerkin approach is discretized by using a symmetric interior penalty Galerkin (SIPG) method with upwinding for the convection term in the spatial domain. We show the reliability of the proposed residual-based error estimator in the energy norm contributed by the error due to the SIPG discretization, the error due to the data oscillations, and the error due to the (generalized) polynomial chaos discretization in the parametric space. To illustrate the performance of the proposed estimator, several benchmark examples including a random diffusivity parameter, a random velocity parameter, random diffusivity/velocity parameters, and a random (jump) discontinuous diffusivity parameter, are tested.
Effects of Dynamic and Stochastic Travel Times on the Operation of Mobility-on-Demand Services
Authors: Fynn Wolf, Roman Engelhardt, Yunfei Zhang, Florian Dandl, Klaus Bogenberger
Abstract
Mobility-on-Demand (MoD) services have been an active research topic in recent years. Many studies focused on developing control algorithms to supply efficient services. To cope with a large search space to solve the underlying vehicle routing problem, studies usually apply hard time-constraints on pick-up and drop-off while considering static network travel times to reduce computational time. As travel times in real street networks are dynamic and stochastic, assigned routes considered feasible by the control algorithm in one time step might become infeasible in the next. Since once assigned and confirmed, customers should still be part of the solution, damage control is necessary to counteract this effect. In this study, a detailed simulation framework for MoD services is coupled with a microscopic traffic simulation to create dynamic and stochastic travel times, and tested in a case study for Munich, Germany. Results showed that the combination of inaccurate travel time estimation and damage control strategies for infeasible routes deteriorates the performance of MoD services -- hailing and pooling -- significantly. Moreover, customers suffer from unreliable pick-up time and travel time estimations. Allowing re-assignments of initial vehicle schedules according to updated system states helps to restore system efficiency and reliability, but only to a minor extent.
Abstract
Although App updates are frequent and software engineers would like to verify updated features only, automated testing techniques verify entire Apps and are thus wasting resources. We present Continuous Adaptation of Learned Models (CALM), an automated App testing approach that efficiently tests App updates by adapting App models learned when automatically testing previous App versions. CALM focuses on functional testing. Since functional correctness can be mainly verified through the visual inspection of App screens, CALM minimizes the number of App screens to be visualized by software testers while maximizing the percentage of updated methods and instructions exercised. Our empirical evaluation shows that CALM exercises a significantly higher proportion of updated methods and instructions than six state-of-the-art approaches, for the same maximum number of App screens to be visually inspected. Further, in common update scenarios, where only a small fraction of methods are updated, CALM is even quicker to outperform all competing approaches in a more significant way.
Accountability of Things: Large-Scale Tamper-Evident Logging for Smart Devices
Abstract
Our modern world relies on a growing number of interconnected and interacting devices, leading to a plethora of logs establishing audit trails for all kinds of events. Simultaneously, logs become increasingly important for forensic investigations, and thus, an adversary will aim to alter logs to avoid culpability, e.g., by compromising devices that generate and store logs. Thus, it is essential to ensure that no one can tamper with any logs without going undetected. However, existing approaches to establish tamper evidence of logs do not scale and cannot protect the increasingly large number of devices found today, as they impose large storage or network overheads. Additionally, most schemes do not provide an efficient mechanism to prove that individual events have been logged to establish accountability when different devices interact. This paper introduces a novel scheme for practical large-scale tamper-evident logging with the help of a trusted third party. To achieve this, we present a new binary hash tree construction designed around timestamps to achieve constant storage overhead with a configured temporal resolution. Additionally, our design enables the efficient construction of shareable proofs, proving that an event was indeed logged. Our evaluation shows that - using practical parameters - our scheme can localize any tampering of logs with a sub-second resolution, with a constant overhead of ~8KB per hour per device.
Using Machine Learning To Identify Software Weaknesses From Software Requirement Specifications
Authors: Mounika Vanamala, Sean Loesch, Alexander Caravella
Abstract
Secure software engineering is crucial but can be time-consuming; therefore, methods that could expedite the identification of software weaknesses without reducing the process efficacy would benefit the software engineering industry and thus benefit modern life. This research focuses on finding an efficient machine learning algorithm to identify software weaknesses from requirement specifications. The research uses the CWE repository and PROMISE exp dataset for training. Keywords extracted using latent semantic analysis help map the CWE categories to PROMISE_exp. Naive Bayes, support vector machine (SVM), decision trees, neural network, and convolutional neural network (CNN) algorithms were tested, with SVM and neural network producing reliable results. The research is unique contribution lies in the mapping technique and algorithm selection. It serves as a valuable reference for the secure software engineering community seeking to expedite the development lifecycle without compromising efficacy. Future work involves testing more algorithms, optimizing existing ones, and improving the training sets accuracy.
Bijective Density-Equalizing Quasiconformal Map for Multiply-Connected Open Surfaces
Authors: Zhiyuan Lyu, Gary P. T. Choi, Lok Ming Lui
Abstract
This paper proposes a novel method for computing bijective density-equalizing quasiconformal (DEQ) flattening maps for multiply-connected open surfaces. In conventional density-equalizing maps, shape deformations are solely driven by prescribed constraints on the density distribution, defined as the population per unit area, while the bijectivity and local geometric distortions of the mappings are uncontrolled. Also, prior methods have primarily focused on simply-connected open surfaces but not surfaces with more complicated topologies. Our proposed method overcomes these issues by formulating the density diffusion process as a quasiconformal flow, which allows us to effectively control the local geometric distortion and guarantee the bijectivity of the mapping by solving an energy minimization problem involving the Beltrami coefficient of the mapping. To achieve an optimal parameterization of multiply-connected surfaces, we develop an iterative scheme that optimizes both the shape of the target planar circular domain and the density-equalizing quasiconformal map onto it. In addition, landmark constraints can be incorporated into our proposed method for consistent feature alignment. The method can also be naturally applied to simply-connected open surfaces. By changing the prescribed population, a large variety of surface flattening maps with different desired properties can be achieved. The method is tested on both synthetic and real examples, demonstrating its efficacy in various applications in computer graphics and medical imaging.
Banzhaf Values for Facts in Query Answering
Authors: Omer Abramovich, Daniel Deutch, Nave Frost, Ahmet Kara, Dan Olteanu
Abstract
Quantifying the contribution of database facts to query answers has been studied as means of explanation. The Banzhaf value, originally developed in Game Theory, is a natural measure of fact contribution, yet its efficient computation for select-project-join-union queries is challenging. In this paper, we introduce three algorithms to compute the Banzhaf value of database facts: an exact algorithm, an anytime deterministic approximation algorithm with relative error guarantees, and an algorithm for ranking and top-$k$. They have three key building blocks: compilation of query lineage into an equivalent function that allows efficient Banzhaf value computation; dynamic programming computation of the Banzhaf values of variables in a Boolean function using the Banzhaf values for constituent functions; and a mechanism to compute efficiently lower and upper bounds on Banzhaf values for any positive DNF function. We complement the algorithms with a dichotomy for the Banzhaf-based ranking problem: given two facts, deciding whether the Banzhaf value of one is greater than of the other is tractable for hierarchical queries and intractable for non-hierarchical queries. We show experimentally that our algorithms significantly outperform exact and approximate algorithms from prior work, most times up to two orders of magnitude. Our algorithms can also cover challenging problem instances that are beyond reach for prior work.
Optimizing Cache Content Placement in Integrated Terrestrial and Non-terrestrial Networks
Authors: Feng Wang, Giovanni Geraci, Tony Q. S. Quek
Subjects: Systems and Control (eess.SY); Networking and Internet Architecture (cs.NI)
Abstract
Non-terrestrial networks (NTN) offer potential for efficient content broadcast in remote regions, thereby extending the reach of digital services. In this paper, we introduce a novel approach to optimize wireless edge content placement using NTN. Specifically, we dynamically select content for placement via NTN links based on popularity and suitability for delivery through NTN, while considering the orbital motion of LEO satellites. Our comprehensive system-level case studies, based on a practical LEO constellation, demonstrate the significant improvement in placement speed compared to existing methods that neglect network mobility. We further show that the advantages of NTN links over standalone wireless TN solutions are more pronounced in the early stages of content delivery and are amplified by higher content popularity correlation across geographical regions.
LASIGE and UNICAGE solution to the NASA LitCoin NLP Competition
Authors: Pedro Ruas, Diana F. Sousa, André Neves, Carlos Cruz, Francisco M. Couto
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Performance (cs.PF)
Abstract
Biomedical Natural Language Processing (NLP) tends to become cumbersome for most researchers, frequently due to the amount and heterogeneity of text to be processed. To address this challenge, the industry is continuously developing highly efficient tools and creating more flexible engineering solutions. This work presents the integration between industry data engineering solutions for efficient data processing and academic systems developed for Named Entity Recognition (LasigeUnicage_NER) and Relation Extraction (BiOnt). Our design reflects an integration of those components with external knowledge in the form of additional training data from other datasets and biomedical ontologies. We used this pipeline in the 2022 LitCoin NLP Challenge, where our team LasigeUnicage was awarded the 7th Prize out of approximately 200 participating teams, reflecting a successful collaboration between the academia (LASIGE) and the industry (Unicage). The software supporting this work is available at \url{https://github.com/lasigeBioTM/Litcoin-Lasige_Unicage}.
CoBaIR: A Python Library for Context-Based Intention Recognition in Human-Robot-Interaction
Authors: Adrian Lubitz, Lisa Gutzeit, Frank Kirchner
Abstract
Human-Robot Interaction (HRI) becomes more and more important in a world where robots integrate fast in all aspects of our lives but HRI applications depend massively on the utilized robotic system as well as the deployment environment and cultural differences. Because of these variable dependencies it is often not feasible to use a data-driven approach to train a model for human intent recognition. Expert systems have been proven to close this gap very efficiently. Furthermore, it is important to support understandability in HRI systems to establish trust in the system. To address the above-mentioned challenges in HRI we present an adaptable python library in which current state-of-the-art Models for context recognition can be integrated. For Context-Based Intention Recognition a two-layer Bayesian Network (BN) is used. The bayesian approach offers explainability and clarity in the creation of scenarios and is easily extendable with more modalities. Additionally, it can be used as an expert system if no data is available but can as well be fine-tuned when data becomes available.
ReLU and Addition-based Gated RNN
Authors: Rickard Brännvall, Henrik Forsgren, Fredrik Sandin, Marcus Liwicki
Abstract
We replace the multiplication and sigmoid function of the conventional recurrent gate with addition and ReLU activation. This mechanism is designed to maintain long-term memory for sequence processing but at a reduced computational cost, thereby opening up for more efficient execution or larger models on restricted hardware. Recurrent Neural Networks (RNNs) with gating mechanisms such as LSTM and GRU have been widely successful in learning from sequential data due to their ability to capture long-term dependencies. Conventionally, the update based on current inputs and the previous state history is each multiplied with dynamic weights and combined to compute the next state. However, multiplication can be computationally expensive, especially for certain hardware architectures or alternative arithmetic systems such as homomorphic encryption. It is demonstrated that the novel gating mechanism can capture long-term dependencies for a standard synthetic sequence learning task while significantly reducing computational costs such that execution time is reduced by half on CPU and by one-third under encryption. Experimental results on handwritten text recognition tasks furthermore show that the proposed architecture can be trained to achieve comparable accuracy to conventional GRU and LSTM baselines. The gating mechanism introduced in this paper may enable privacy-preserving AI applications operating under homomorphic encryption by avoiding the multiplication of encrypted variables. It can also support quantization in (unencrypted) plaintext applications, with the potential for substantial performance gains since the addition-based formulation can avoid the expansion to double precision often required for multiplication.
The Fast and the Private: Task-based Dataset Search
Abstract
Recent dataset search platforms use ML task-based utility measures rather than metadata-based keywords, to search large dataset corpora. Requesters provide an initial dataset, and the platform seeks additional datasets that augment -- join or union -- requester's dataset to most improve the model (e.g., linear regression) performance. Although effective, current task-based data searches are stymied by (1) high latency which deters users, (2) privacy concerns for regulatory standards, and (3) low data quality which provides low utility. We introduce Mileena, a fast, private, and high-quality task-based dataset search platform. At its heart, Mileena is built on pre-computed semi-ring sketches for efficient ML training and evaluation. Based on semi-ring, we develop a novel Factorized Privacy Mechanism that makes the search differentially private and scales to arbitrary corpus sizes and numbers of requests without major quality degradation. We also demonstrate the early promise in using LLM-based agents for automatic data transformation and applying semi-rings to support causal discovery and treatment effect estimation.
AST-MHSA : Code Summarization using Multi-Head Self-Attention
Authors: Yeshwanth Nagaraj, Ujjwal Gupta
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Programming Languages (cs.PL); Software Engineering (cs.SE)
Abstract
Code summarization aims to generate concise natural language descriptions for source code. The prevailing approaches adopt transformer-based encoder-decoder architectures, where the Abstract Syntax Tree (AST) of the source code is utilized for encoding structural information. However, ASTs are much longer than the corresponding source code, and existing methods ignore this size constraint by directly feeding the entire linearized AST into the encoders. This simplistic approach makes it challenging to extract truly valuable dependency relations from the overlong input sequence and leads to significant computational overhead due to self-attention applied to all nodes in the AST. To address this issue effectively and efficiently, we present a model, AST-MHSA that uses multi-head attention to extract the important semantic information from the AST. The model consists of two main components: an encoder and a decoder. The encoder takes as input the abstract syntax tree (AST) of the code and generates a sequence of hidden states. The decoder then takes these hidden states as input and generates a natural language summary of the code. The multi-head attention mechanism allows the model to learn different representations of the input code, which can be combined to generate a more comprehensive summary. The model is trained on a dataset of code and summaries, and the parameters of the model are optimized to minimize the loss between the generated summaries and the ground-truth summaries.
ESBMC v7.3: Model Checking C++ Programs using Clang AST
Authors: Kunjian Song, Mikhail R. Gadelha, Franz Brauße, Rafael S. Menezes, Lucas C. Cordeiro
Abstract
This paper introduces ESBMC v7.3, the latest Efficient SMT-Based Context-Bounded Model Checker version, which now incorporates a new clang-based C++ front-end. While the previous CPROVER-based front-end served well for handling C++03 programs, it encountered challenges keeping up with the evolving C++ language. As new language and library features were added in each C++ version, the limitations of the old front-end became apparent, leading to difficult-to-maintain code. Consequently, modern C++ programs were challenging to verify. To overcome this obstacle, we redeveloped the front-end, opting for a more robust approach using clang. The new front-end efficiently traverses the Abstract Syntax Tree (AST) in-memory using clang APIs and transforms each AST node into ESBMC's Intermediate Representation. Through extensive experimentation, our results demonstrate that ESBMC v7.3 with the new front-end significantly reduces parse and conversion errors, enabling successful verification of a wide range of C++ programs, thereby outperforming previous ESBMC versions.
Efficient Function Approximation in Enriched Approximation Spaces
Abstract
An enriched approximation space is the span of a conventional basis with a few extra functions included, for example to capture known features of the solution to a computational problem. Adding functions to a basis makes it overcomplete and, consequently, the corresponding discretized approximation problem may require solving an ill-conditioned system. Recent research indicates that these systems can still provide highly accurate numerical approximations under reasonable conditions. In this paper we propose an efficient algorithm to compute such approximations. It is based on the AZ algorithm for overcomplete sets and frames, which simplifies in the case of an enriched basis. In addition, analysis of the original AZ algorithm and of the proposed variant gives constructive insights on how to achieve optimal and stable discretizations using enriched bases. We apply the algorithm to examples of enriched approximation spaces in literature, including a few non-standard approximation problems and an enriched spectral method for a 2D boundary value problem, and show that the simplified AZ algorithm is indeed stable, accurate and efficient.
Automatic Extraction of Relevant Road Infrastructure using Connected vehicle data and Deep Learning Model
Authors: Adu-Gyamfi Kojo, Kandiboina Raghupathi, Ravichandra-Mouli Varsha, Knickerbocker Skylar, Hans Zachary N, Hawkins, Neal R, Sharma Anuj
Abstract
In today's rapidly evolving urban landscapes, efficient and accurate mapping of road infrastructure is critical for optimizing transportation systems, enhancing road safety, and improving the overall mobility experience for drivers and commuters. Yet, a formidable bottleneck obstructs progress - the laborious and time-intensive manual identification of intersections. Simply considering the shear number of intersections that need to be identified, and the labor hours required per intersection, the need for an automated solution becomes undeniable. To address this challenge, we propose a novel approach that leverages connected vehicle data and cutting-edge deep learning techniques. By employing geohashing to segment vehicle trajectories and then generating image representations of road segments, we utilize the YOLOv5 (You Only Look Once version 5) algorithm for accurate classification of both straight road segments and intersections. Experimental results demonstrate an impressive overall classification accuracy of 95%, with straight roads achieving a remarkable 97% F1 score and intersections reaching a 90% F1 score. This approach not only saves time and resources but also enables more frequent updates and a comprehensive understanding of the road network. Our research showcases the potential impact on traffic management, urban planning, and autonomous vehicle navigation systems. The fusion of connected vehicle data and deep learning models holds promise for a transformative shift in road infrastructure mapping, propelling us towards a smarter, safer, and more connected transportation ecosystem.
A Comparison of Classical and Deep Reinforcement Learning Methods for HVAC Control
Authors: Marshall Wang, John Willes, Thomas Jiralerspong, Matin Moezzi
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract
Reinforcement learning (RL) is a promising approach for optimizing HVAC control. RL offers a framework for improving system performance, reducing energy consumption, and enhancing cost efficiency. We benchmark two popular classical and deep RL methods (Q-Learning and Deep-Q-Networks) across multiple HVAC environments and explore the practical consideration of model hyper-parameter selection and reward tuning. The findings provide insight for configuring RL agents in HVAC systems, promoting energy-efficient and cost-effective operation.
Deformable Mixer Transformer with Gating for Multi-Task Learning of Dense Prediction
Authors: Yangyang Xu, Yibo Yang, Bernard Ghanemm, Lefei Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
CNNs and Transformers have their own advantages and both have been widely used for dense prediction in multi-task learning (MTL). Most of the current studies on MTL solely rely on CNN or Transformer. In this work, we present a novel MTL model by combining both merits of deformable CNN and query-based Transformer with shared gating for multi-task learning of dense prediction. This combination may offer a simple and efficient solution owing to its powerful and flexible task-specific learning and advantages of lower cost, less complexity and smaller parameters than the traditional MTL methods. We introduce deformable mixer Transformer with gating (DeMTG), a simple and effective encoder-decoder architecture up-to-date that incorporates the convolution and attention mechanism in a unified network for MTL. It is exquisitely designed to use advantages of each block, and provide deformable and comprehensive features for all tasks from local and global perspective. First, the deformable mixer encoder contains two types of operators: the channel-aware mixing operator leveraged to allow communication among different channels, and the spatial-aware deformable operator with deformable convolution applied to efficiently sample more informative spatial locations. Second, the task-aware gating transformer decoder is used to perform the task-specific predictions, in which task interaction block integrated with self-attention is applied to capture task interaction features, and the task query block integrated with gating attention is leveraged to select corresponding task-specific features. Further, the experiment results demonstrate that the proposed DeMTG uses fewer GFLOPs and significantly outperforms current Transformer-based and CNN-based competitive models on a variety of metrics on three dense prediction datasets. Our code and models are available at https://github.com/yangyangxu0/DeMTG.
Rethinking Integration of Prediction and Planning in Deep Learning-Based Automated Driving Systems: A Review
Authors: Steffen Hagedorn, Marcel Hallgarten, Martin Stoll, Alexandru Condurache
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Abstract
Automated driving has the potential to revolutionize personal, public, and freight mobility. Besides the enormous challenge of perception, i.e. accurately perceiving the environment using available sensor data, automated driving comprises planning a safe, comfortable, and efficient motion trajectory. To promote safety and progress, many works rely on modules that predict the future motion of surrounding traffic. Modular automated driving systems commonly handle prediction and planning as sequential separate tasks. While this accounts for the influence of surrounding traffic on the ego-vehicle, it fails to anticipate the reactions of traffic participants to the ego-vehicle's behavior. Recent works suggest that integrating prediction and planning in an interdependent joint step is necessary to achieve safe, efficient, and comfortable driving. While various models implement such integrated systems, a comprehensive overview and theoretical understanding of different principles are lacking. We systematically review state-of-the-art deep learning-based prediction, planning, and integrated prediction and planning models. Different facets of the integration ranging from model architecture and model design to behavioral aspects are considered and related to each other. Moreover, we discuss the implications, strengths, and limitations of different integration methods. By pointing out research gaps, describing relevant future challenges, and highlighting trends in the research field, we identify promising directions for future research.
PDE-Refiner: Achieving Accurate Long Rollouts with Neural PDE Solvers
Authors: Phillip Lippe, Bastiaan S. Veeling, Paris Perdikaris, Richard E. Turner, Johannes Brandstetter
Abstract
Time-dependent partial differential equations (PDEs) are ubiquitous in science and engineering. Recently, mostly due to the high computational cost of traditional solution techniques, deep neural network based surrogates have gained increased interest. The practical utility of such neural PDE solvers relies on their ability to provide accurate, stable predictions over long time horizons, which is a notoriously hard problem. In this work, we present a large-scale analysis of common temporal rollout strategies, identifying the neglect of non-dominant spatial frequency information, often associated with high frequencies in PDE solutions, as the primary pitfall limiting stable, accurate rollout performance. Based on these insights, we draw inspiration from recent advances in diffusion models to introduce PDE-Refiner; a novel model class that enables more accurate modeling of all frequency components via a multistep refinement process. We validate PDE-Refiner on challenging benchmarks of complex fluid dynamics, demonstrating stable and accurate rollouts that consistently outperform state-of-the-art models, including neural, numerical, and hybrid neural-numerical architectures. We further demonstrate that PDE-Refiner greatly enhances data efficiency, since the denoising objective implicitly induces a novel form of spectral data augmentation. Finally, PDE-Refiner's connection to diffusion models enables an accurate and efficient assessment of the model's predictive uncertainty, allowing us to estimate when the surrogate becomes inaccurate.
Zero Grads Ever Given: Learning Local Surrogate Losses for Non-Differentiable Graphics
Abstract
Gradient-based optimization is now ubiquitous across graphics, but unfortunately can not be applied to problems with undefined or zero gradients. To circumvent this issue, the loss function can be manually replaced by a "surrogate" that has similar minima but is differentiable. Our proposed framework, ZeroGrads, automates this process by learning a neural approximation of the objective function, the surrogate, which in turn can be used to differentiate through arbitrary black-box graphics pipelines. We train the surrogate on an actively smoothed version of the objective and encourage locality, focusing the surrogate's capacity on what matters at the current training episode. The fitting is performed online, alongside the parameter optimization, and self-supervised, without pre-computed data or pre-trained models. As sampling the objective is expensive (it requires a full rendering or simulator run), we devise an efficient sampling scheme that allows for tractable run-times and competitive performance at little overhead. We demonstrate optimizing diverse non-convex, non-differentiable black-box problems in graphics, such as visibility in rendering, discrete parameter spaces in procedural modelling or optimal control in physics-driven animation. In contrast to more traditional algorithms, our approach scales well to higher dimensions, which we demonstrate on problems with up to 35k interlinked variables.
Neural Progressive Meshes
Authors: Yun-Chun Chen, Vladimir G. Kim, Noam Aigerman, Alec Jacobson
Abstract
The recent proliferation of 3D content that can be consumed on hand-held devices necessitates efficient tools for transmitting large geometric data, e.g., 3D meshes, over the Internet. Detailed high-resolution assets can pose a challenge to storage as well as transmission bandwidth, and level-of-detail techniques are often used to transmit an asset using an appropriate bandwidth budget. It is especially desirable for these methods to transmit data progressively, improving the quality of the geometry with more data. Our key insight is that the geometric details of 3D meshes often exhibit similar local patterns even across different shapes, and thus can be effectively represented with a shared learned generative space. We learn this space using a subdivision-based encoder-decoder architecture trained in advance on a large collection of surfaces. We further observe that additional residual features can be transmitted progressively between intermediate levels of subdivision that enable the client to control the tradeoff between bandwidth cost and quality of reconstruction, providing a neural progressive mesh representation. We evaluate our method on a diverse set of complex 3D shapes and demonstrate that it outperforms baselines in terms of compression ratio and reconstruction quality.
Iterative Reweighted Least Squares Networks With Convergence Guarantees for Solving Inverse Imaging Problems
Abstract
In this work we present a novel optimization strategy for image reconstruction tasks under analysis-based image regularization, which promotes sparse and/or low-rank solutions in some learned transform domain. We parameterize such regularizers using potential functions that correspond to weighted extensions of the $\ell_p^p$-vector and $\mathcal{S}_p^p$ Schatten-matrix quasi-norms with $0 < p \le 1$. Our proposed minimization strategy extends the Iteratively Reweighted Least Squares (IRLS) method, typically used for synthesis-based $\ell_p$ and $\mathcal{S}_p$ norm and analysis-based $\ell_1$ and nuclear norm regularization. We prove that under mild conditions our minimization algorithm converges linearly to a stationary point, and we provide an upper bound for its convergence rate. Further, to select the parameters of the regularizers that deliver the best results for the problem at hand, we propose to learn them from training data by formulating the supervised learning process as a stochastic bilevel optimization problem. We show that thanks to the convergence guarantees of our proposed minimization strategy, such optimization can be successfully performed with a memory-efficient implicit back-propagation scheme. We implement our learned IRLS variants as recurrent networks and assess their performance on the challenging image reconstruction tasks of non-blind deblurring, super-resolution and demosaicking. The comparisons against other existing learned reconstruction approaches demonstrate that our overall method is very competitive and in many cases outperforms existing unrolled networks, whose number of parameters is orders of magnitude higher than in our case.
Keyword: faster
FPGA Resource-aware Structured Pruning for Real-Time Neural Networks
Authors: Benjamin Ramhorst (Imperial College London), George A. Constantinides (Imperial College London), Vladimir Loncar (Massachusetts Institute of Technology)
Abstract
Neural networks achieve state-of-the-art performance in image classification, speech recognition, scientific analysis and many more application areas. With the ever-increasing need for faster computation and lower power consumption, driven by real-time systems and Internet-of-Things (IoT) devices, FPGAs have emerged as suitable devices for deep learning inference. Due to the high computational complexity and memory footprint of neural networks, various compression techniques, such as pruning, quantization and knowledge distillation, have been proposed in literature. Pruning sparsifies a neural network, reducing the number of multiplications and memory. However, pruning often fails to capture properties of the underlying hardware, causing unstructured sparsity and load-balance inefficiency, thus bottlenecking resource improvements. We propose a hardware-centric formulation of pruning, by formulating it as a knapsack problem with resource-aware tensor structures. The primary emphasis is on real-time inference, with latencies in the order of 1$\mu$s, accelerated with hls4ml, an open-source framework for deep learning inference on FPGAs. Evaluated on a range of tasks, including real-time particle classification at CERN's Large Hadron Collider and fast image classification, the proposed method achieves a reduction ranging between 55% and 92% in the utilization of digital signal processing blocks (DSP) and up to 81% in block memory (BRAM) utilization.
Federated Online/Offline Remote Data Inspection for Distributed Edge Computing
Authors: Mohammad Ali, Ximeng Liu
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
In edge computing environments, app vendors can cache their data to be shared with their users in many geographically distributed edge servers. However, the cached data is particularly vulnerable to several intentional attacks or unintentional events. Given the limited resources of edge servers and prohibitive storage costs incurred by app vendors, designing an efficient approach to inspect and maintain the data over tremendous edge servers is a critical issue. To tackle the problem, we design a novel data inspection approach, named ${\text{O}^2\text{DI}}$, that provides the following services: i) using ${\text{O}^2\text{DI}}$, app vendors can inspect the data cached in edge servers without having the original data, which reduces the incurred I/O and storage overhead significantly; ii) computational operations conducted by both edge servers and app vendors are highly efficient because of a novel online/offline technique; iii) many data files cached in different edge servers can be verified quickly and at once by using a novel batch verification method; iv) corrupted data in edge servers can be localized and repaired efficiently. We analyze the security and performance of ${\text{O}^2\text{DI}}$. We see that it is secure in the random oracle model, much faster, and more cost-effective compared to state-of-the-art approaches.
Co-movement Pattern Mining from Videos
Authors: Dongxiang Zhang, Teng Ma, Junnan Hu, Yijun Bei, Kian-Lee Tan, Gang Chen
Abstract
Co-movement pattern mining from GPS trajectories has been an intriguing subject in spatial-temporal data mining. In this paper, we extend this research line by migrating the data source from GPS sensors to surveillance cameras, and presenting the first investigation into co-movement pattern mining from videos. We formulate the new problem, re-define the spatial-temporal proximity constraints from cameras deployed in a road network, and theoretically prove its hardness. Due to the lack of readily applicable solutions, we adapt existing techniques and propose two competitive baselines using Apriori-based enumerator and CMC algorithm, respectively. As the principal technical contributions, we introduce a novel index called temporal-cluster suffix tree (TCS-tree), which performs two-level temporal clustering within each camera and constructs a suffix tree from the resulting clusters. Moreover, we present a sequence-ahead pruning framework based on TCS-tree, which allows for the simultaneous leverage of all pattern constraints to filter candidate paths. Finally, to reduce verification cost on the candidate paths, we propose a sliding-window based co-movement pattern enumeration strategy and a hashing-based dominance eliminator, both of which are effective in avoiding redundant operations. We conduct extensive experiments for scalability and effectiveness analysis. Our results validate the efficiency of the proposed index and mining algorithm, which runs remarkably faster than the two baseline methods. Additionally, we construct a video database with 1169 cameras and perform an end-to-end pipeline analysis to study the performance gap between GPS-driven and video-driven methods. Our results demonstrate that the derived patterns from the video-driven approach are similar to those derived from groundtruth trajectories, providing evidence of its effectiveness.
Isolated Scheduling for Distributed Training Tasks in GPU Clusters
Abstract
Distributed machine learning (DML) technology makes it possible to train large neural networks in a reasonable amount of time. Meanwhile, as the computing power grows much faster than network capacity, network communication has gradually become the bottleneck of DML. Current multi-tenant GPU clusters face network contention caused by hash-collision problem which not only further increases the overhead of communication, but also creates unfairness and affects the user experience. In this paper, we firstly analyse how network contention affects the training time in a cluster with 32 NVIDIA V100 GPUs. Then we propose vClos to eliminate network contention by jointly optimizing network topology and communication pattern in distributed training. An OCS-vClos which introduces a layer of optical circuit switches (OCSs) in the leaf-spine network is also proposed to reduce potential network resource fragmentation caused by resource allocation strategy in vClos. Testbed experiments and real-trace-based large-scale simulations are conducted to demonstrate the superiority of vClos over existing network resource scheduling strategies.
Keyword: mobile
High-Level Features Parallelization for Inference Cost Reduction Through Selective Attention
Authors: André Peter Kelm, Lucas Schmidt, Tim Rolff, Christian Wilms, Ehsan Yaghoubi, Simone Frintrop
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this work, we parallelize high-level features in deep networks to selectively skip or select class-specific features to reduce inference costs. This challenges most deep learning methods due to their limited ability to efficiently and effectively focus on selected class-specific features without retraining. We propose a serial-parallel hybrid architecture with serial generic low-level features and parallel high-level features. This accounts for the fact that many high-level features are class-specific rather than generic, and has connections to recent neuroscientific findings that observe spatially and contextually separated neural activations in the human brain. Our approach provides the unique functionality of cutouts: selecting parts of the network to focus on only relevant subsets of classes without requiring retraining. High performance is maintained, but the cost of inference can be significantly reduced. In some of our examples, up to $75\,\%$ of parameters are skipped and $35\,\%$ fewer GMACs (Giga multiply-accumulate) operations are used as the approach adapts to a change in task complexity. This is important for mobile, industrial, and robotic applications where reducing the number of parameters, the computational complexity, and thus the power consumption can be paramount. Another unique functionality is that it allows processing to be directly influenced by enhancing or inhibiting high-level class-specific features, similar to the mechanism of selective attention in the human brain. This can be relevant for cross-modal applications, the use of semantic prior knowledge, and/or context-aware processing.
JutePestDetect: An Intelligent Approach for Jute Pest Identification Using Fine-Tuned Transfer Learning
Authors: Md. Simul Hasan Talukder, Mohammad Raziuddin Chowdhury, Md Sakib Ullah Sourav, Abdullah Al Rakin, Shabbir Ahmed Shuvo, Rejwan Bin Sulaiman, Musarrat Saberin Nipun, Muntarin Islam, Mst Rumpa Islam, Md Aminul Islam, Zubaer Haque
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In certain Asian countries, Jute is one of the primary sources of income and Gross Domestic Product (GDP) for the agricultural sector. Like many other crops, Jute is prone to pest infestations, and its identification is typically made visually in countries like Bangladesh, India, Myanmar, and China. In addition, this method is time-consuming, challenging, and somewhat imprecise, which poses a substantial financial risk. To address this issue, the study proposes a high-performing and resilient transfer learning (TL) based JutePestDetect model to identify jute pests at the early stage. Firstly, we prepared jute pest dataset containing 17 classes and around 380 photos per pest class, which were evaluated after manual and automatic pre-processing and cleaning, such as background removal and resizing. Subsequently, five prominent pre-trained models -DenseNet201, InceptionV3, MobileNetV2, VGG19, and ResNet50 were selected from a previous study to design the JutePestDetect model. Each model was revised by replacing the classification layer with a global average pooling layer and incorporating a dropout layer for regularization. To evaluate the models performance, various metrics such as precision, recall, F1 score, ROC curve, and confusion matrix were employed. These analyses provided additional insights for determining the efficacy of the models. Among them, the customized regularized DenseNet201-based proposed JutePestDetect model outperformed the others, achieving an impressive accuracy of 99%. As a result, our proposed method and strategy offer an enhanced approach to pest identification in the case of Jute, which can significantly benefit farmers worldwide.
DCM: A Developers Certification Model for Mobile Ecosystems
Authors: Paulo Trezentos, Ricardo Capote, Tiago Teodoro, João Carneiro
Abstract
This article introduces a distributed model of trust for app developers in Android and iOS mobile ecosystems. The model aims to allow the co-existence of multiple app stores and distribution channels while retaining a high level of safety for mobile device users and minimum changes to current mobile operating systems. The Developers Certification Model (DCM) is a trust model for Android and iOS that aims to distinguish legit applications from security threats to user safeness by answering the question: "is the developer of this app trustable"? It proposes security by design, where safety relies on a chain of trust mapping real-world levels of trust across organizations. For the technical implementation, DCM is heavily inspired by SSL/TLS certification protocol, as a proven model that has been working for over 30 years.
Robust Localization with Visual-Inertial Odometry Constraints for Markerless Mobile AR
Authors: Changkun Liu, Yukun Zhao, Tristan Braud
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Visual Inertial Odometry (VIO) is an essential component of modern Augmented Reality (AR) applications. However, VIO only tracks the relative pose of the device, leading to drift over time. Absolute pose estimation methods infer the device's absolute pose, but their accuracy depends on the input quality. This paper introduces VIO-APR, a new framework for markerless mobile AR that combines an absolute pose regressor (APR) with a local VIO tracking system. VIO-APR uses VIO to assess the reliability of the APR and the APR to identify and compensate for VIO drift. This feedback loop results in more accurate positioning and more stable AR experiences. To evaluate VIO-APR, we created a dataset that combines camera images with ARKit's VIO system output for six indoor and outdoor scenes of various scales. Over this dataset, VIO-APR improves the median accuracy of popular APR by up to 36\% in position and 29\% in orientation, increases the percentage of frames in the high ($0.25 m, 2^{\circ}$) accuracy level by up to 112\% and reduces the percentage of frames predicted below the low ($5 m, 10^\circ$) accuracy greatly. We implement VIO-APR into a mobile AR application using Unity to demonstrate its capabilities. VIO-APR results in noticeably more accurate localization and a more stable overall experience.
Learning Gabor Texture Features for Fine-Grained Recognition
Authors: Lanyun Zhu, Tianrun Chen, Jianxiong Yin, Simon See, Jun Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Extracting and using class-discriminative features is critical for fine-grained recognition. Existing works have demonstrated the possibility of applying deep CNNs to exploit features that distinguish similar classes. However, CNNs suffer from problems including frequency bias and loss of detailed local information, which restricts the performance of recognizing fine-grained categories. To address the challenge, we propose a novel texture branch as complimentary to the CNN branch for feature extraction. We innovatively utilize Gabor filters as a powerful extractor to exploit texture features, motivated by the capability of Gabor filters in effectively capturing multi-frequency features and detailed local information. We implement several designs to enhance the effectiveness of Gabor filters, including imposing constraints on parameter values and developing a learning method to determine the optimal parameters. Moreover, we introduce a statistical feature extractor to utilize informative statistical information from the signals captured by Gabor filters, and a gate selection mechanism to enable efficient computation by only considering qualified regions as input for texture extraction. Through the integration of features from the Gabor-filter-based texture branch and CNN-based semantic branch, we achieve comprehensive information extraction. We demonstrate the efficacy of our method on multiple datasets, including CUB-200-2011, NA-bird, Stanford Dogs, and GTOS-mobile. State-of-the-art performance is achieved using our approach.
KS-APR: Keyframe Selection for Robust Absolute Pose Regression
Authors: Changkun Liu, Yukun Zhao, Tristan Braud
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Markerless Mobile Augmented Reality (AR) aims to anchor digital content in the physical world without using specific 2D or 3D objects. Absolute Pose Regressors (APR) are end-to-end machine learning solutions that infer the device's pose from a single monocular image. Thanks to their low computation cost, they can be directly executed on the constrained hardware of mobile AR devices. However, APR methods tend to yield significant inaccuracies for input images that are too distant from the training set. This paper introduces KS-APR, a pipeline that assesses the reliability of an estimated pose with minimal overhead by combining the inference results of the APR and the prior images in the training set. Mobile AR systems tend to rely upon visual-inertial odometry to track the relative pose of the device during the experience. As such, KS-APR favours reliability over frequency, discarding unreliable poses. This pipeline can integrate most existing APR methods to improve accuracy by filtering unreliable images with their pose estimates. We implement the pipeline on three types of APR models on indoor and outdoor datasets. The median error on position and orientation is reduced for all models, and the proportion of large errors is minimized across datasets. Our method enables state-of-the-art APRs such as DFNetdm to outperform single-image and sequential APR methods. These results demonstrate the scalability and effectiveness of KS-APR for visual localization tasks that do not require one-shot decisions.
Robust Lifelong Indoor LiDAR Localization using the Area Graph
Abstract
Lifelong indoor localization in a given map is the basis for navigation of autonomous mobile robots. In this letter, we address the problem of robust localization in cluttered indoor environments like office spaces and corridors using 3D LiDAR point clouds in a given Area Graph, which is a hierarchical, topometric semantic map representation that uses polygons to demark areas such as rooms, corridors or buildings. This representation is very compact, can represent different floors of buildings through its hierarchy and provides semantic information that helps with localization, like poses of doors and glass. In contrast to this, commonly used map representations, such as occupancy grid maps or point clouds, lack these features and require frequent updates in response to environmental changes (e.g. moved furniture), unlike our approach, which matches against lifelong architectural features such as walls and doors. For that we apply filtering to remove clutter from the 3D input point cloud and then employ further scoring and weight functions for localization. Given a broad initial guess from WiFi localization, our experiments show that our global localization and the weighted point to line ICP pose tracking perform very well, even when compared to localization and SLAM algorithms that use the current, feature-rich cluttered map for localization.
MobiScout: A Scalable Cloud-Based Driving and Activity Monitoring Platform Featuring an IOS App and a WatchOS Extension
Abstract
MobiScout is an iOS software that monitors users' driving habits and physiological conditions while on the road. The Mobiscout app was created to provide a low-cost next-generation data collection and analysis solution for naturalistic driving studies. MobiScout collects real-time data, including physiological information from drivers in their normal driving conditions using sensors and cameras on mobile phones, smartwatches, and Bluetooth-enabled OBD equipment. The MobiScout software captures vehicle and driving data, including speed, braking, pulse rate, and acceleration, while the phone's camera captures everything inside and outside the car. Data captured can be streamed to cloud storage in real-time or persisted in local storage in WIFI dead zones. The information gathered will be studied further to better understand typical traffic behavior, performance, surroundings, and driving context among drivers.
Keyword: pruning
FPGA Resource-aware Structured Pruning for Real-Time Neural Networks
Authors: Benjamin Ramhorst (Imperial College London), George A. Constantinides (Imperial College London), Vladimir Loncar (Massachusetts Institute of Technology)
Abstract
Neural networks achieve state-of-the-art performance in image classification, speech recognition, scientific analysis and many more application areas. With the ever-increasing need for faster computation and lower power consumption, driven by real-time systems and Internet-of-Things (IoT) devices, FPGAs have emerged as suitable devices for deep learning inference. Due to the high computational complexity and memory footprint of neural networks, various compression techniques, such as pruning, quantization and knowledge distillation, have been proposed in literature. Pruning sparsifies a neural network, reducing the number of multiplications and memory. However, pruning often fails to capture properties of the underlying hardware, causing unstructured sparsity and load-balance inefficiency, thus bottlenecking resource improvements. We propose a hardware-centric formulation of pruning, by formulating it as a knapsack problem with resource-aware tensor structures. The primary emphasis is on real-time inference, with latencies in the order of 1$\mu$s, accelerated with hls4ml, an open-source framework for deep learning inference on FPGAs. Evaluated on a range of tasks, including real-time particle classification at CERN's Large Hadron Collider and fast image classification, the proposed method achieves a reduction ranging between 55% and 92% in the utilization of digital signal processing blocks (DSP) and up to 81% in block memory (BRAM) utilization.
Co-movement Pattern Mining from Videos
Authors: Dongxiang Zhang, Teng Ma, Junnan Hu, Yijun Bei, Kian-Lee Tan, Gang Chen
Abstract
Co-movement pattern mining from GPS trajectories has been an intriguing subject in spatial-temporal data mining. In this paper, we extend this research line by migrating the data source from GPS sensors to surveillance cameras, and presenting the first investigation into co-movement pattern mining from videos. We formulate the new problem, re-define the spatial-temporal proximity constraints from cameras deployed in a road network, and theoretically prove its hardness. Due to the lack of readily applicable solutions, we adapt existing techniques and propose two competitive baselines using Apriori-based enumerator and CMC algorithm, respectively. As the principal technical contributions, we introduce a novel index called temporal-cluster suffix tree (TCS-tree), which performs two-level temporal clustering within each camera and constructs a suffix tree from the resulting clusters. Moreover, we present a sequence-ahead pruning framework based on TCS-tree, which allows for the simultaneous leverage of all pattern constraints to filter candidate paths. Finally, to reduce verification cost on the candidate paths, we propose a sliding-window based co-movement pattern enumeration strategy and a hashing-based dominance eliminator, both of which are effective in avoiding redundant operations. We conduct extensive experiments for scalability and effectiveness analysis. Our results validate the efficiency of the proposed index and mining algorithm, which runs remarkably faster than the two baseline methods. Additionally, we construct a video database with 1169 cameras and perform an end-to-end pipeline analysis to study the performance gap between GPS-driven and video-driven methods. Our results demonstrate that the derived patterns from the video-driven approach are similar to those derived from groundtruth trajectories, providing evidence of its effectiveness.
Keyword: diffusion
PromptPaint: Steering Text-to-Image Generation Through Paint Medium-like Interactions
Abstract
While diffusion-based text-to-image (T2I) models provide a simple and powerful way to generate images, guiding this generation remains a challenge. For concepts that are difficult to describe through language, users may struggle to create prompts. Moreover, many of these models are built as end-to-end systems, lacking support for iterative shaping of the image. In response, we introduce PromptPaint, which combines T2I generation with interactions that model how we use colored paints. PromptPaint allows users to go beyond language to mix prompts that express challenging concepts. Just as we iteratively tune colors through layered placements of paint on a physical canvas, PromptPaint similarly allows users to apply different prompts to different canvas areas and times of the generative process. Through a set of studies, we characterize different approaches for mixing prompts, design trade-offs, and socio-technical challenges for generative models. With PromptPaint we provide insight into future steerable generative tools.
A general fourth-order mesoscopic multiple-relaxation-time lattice Boltzmann model and equivalent macroscopic finite-difference scheme for two-dimensional diffusion equations
Authors: Ying Chen, Zhenhua Chai, Baochang Shi
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
Abstract
In this work, we first develop a general mesoscopic multiple-relaxation-time lattice Boltzmann (MRT-LB) model for the two-dimensional diffusion equation with the constant diffusion coefficient and source term, where the D2Q5 (five discrete velocities in two-dimensional space) lattice structure is considered. Then we exactly derive the equivalent macroscopic finite-difference scheme of the MRT-LB model. Additionally, we also propose a proper MRT-LB model for the diffusion equation with a linear source term, and obtain an equivalent macroscopic six-level finite-difference scheme. After that, we conduct the accuracy and stability analysis of the finite-difference scheme and the mesoscopic MRT-LB model. It is found that at the diffusive scaling, both of them can achieve a fourth-order accuracy in space based on the Taylor expansion. The stability analysis also shows that they are both unconditionally stable. Finally, some numerical experiments are conducted, and the numerical results are also consistent with our theoretical analysis.
Investigating disaster response through social media data and the Susceptible-Infected-Recovered (SIR) model: A case study of 2020 Western U.S. wildfire season
Authors: Zihui Ma, Lingyao Li, Libby Hemphill, Gregory B. Baecher
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract
Effective disaster response is critical for affected communities. Responders and decision-makers would benefit from reliable, timely measures of the issues impacting their communities during a disaster, and social media offers a potentially rich data source. Social media can reflect public concerns and demands during a disaster, offering valuable insights for decision-makers to understand evolving situations and optimize resource allocation. We used Bidirectional Encoder Representations from Transformers (BERT) topic modeling to cluster topics from Twitter data. Then, we conducted a temporal-spatial analysis to examine the distribution of these topics across different regions during the 2020 western U.S. wildfire season. Our results show that Twitter users mainly focused on three topics:"health impact," "damage," and "evacuation." We used the Susceptible-Infected-Recovered (SIR) theory to explore the magnitude and velocity of topic diffusion on Twitter. The results displayed a clear relationship between topic trends and wildfire propagation patterns. The estimated parameters obtained from the SIR model in selected cities revealed that residents exhibited a high level of several concerns during the wildfire. Our study details how the SIR model and topic modeling using social media data can provide decision-makers with a quantitative approach to measure disaster response and support their decision-making processes.
Beyond Deep Reinforcement Learning: A Tutorial on Generative Diffusion Models in Network Optimization
Authors: Hongyang Du, Ruichen Zhang, Yinqiu Liu, Jiacheng Wang, Yijing Lin, Zonghang Li, Dusit Niyato, Jiawen Kang, Zehui Xiong, Shuguang Cui, Bo Ai, Haibo Zhou, Dong In Kim
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
Generative Diffusion Models (GDMs) have emerged as a transformative force in the realm of Generative Artificial Intelligence (GAI), demonstrating their versatility and efficacy across a variety of applications. The ability to model complex data distributions and generate high-quality samples has made GDMs particularly effective in tasks such as image generation and reinforcement learning. Furthermore, their iterative nature, which involves a series of noise addition and denoising steps, is a powerful and unique approach to learning and generating data. This paper serves as a comprehensive tutorial on applying GDMs in network optimization tasks. We delve into the strengths of GDMs, emphasizing their wide applicability across various domains, such as vision, text, and audio generation.We detail how GDMs can be effectively harnessed to solve complex optimization problems inherent in networks. The paper first provides a basic background of GDMs and their applications in network optimization. This is followed by a series of case studies, showcasing the integration of GDMs with Deep Reinforcement Learning (DRL), incentive mechanism design, Semantic Communications (SemCom), Internet of Vehicles (IoV) networks, etc. These case studies underscore the practicality and efficacy of GDMs in real-world scenarios, offering insights into network design. We conclude with a discussion on potential future directions for GDM research and applications, providing major insights into how they can continue to shape the future of network optimization.
An Adaptive Algorithm Based on Stochastic Discontinuous Galerkin for Convection Dominated Equations with Random Data
Abstract
In this paper, we propose an adaptive approach, based on mesh refinement or parametric enrichment, for convection diffusion equations containing randomness in their coefficients. A parametric system of convection diffusion equations obtained by an application of stochastic Galerkin approach is discretized by using a symmetric interior penalty Galerkin (SIPG) method with upwinding for the convection term in the spatial domain. We show the reliability of the proposed residual-based error estimator in the energy norm contributed by the error due to the SIPG discretization, the error due to the data oscillations, and the error due to the (generalized) polynomial chaos discretization in the parametric space. To illustrate the performance of the proposed estimator, several benchmark examples including a random diffusivity parameter, a random velocity parameter, random diffusivity/velocity parameters, and a random (jump) discontinuous diffusivity parameter, are tested.
Bijective Density-Equalizing Quasiconformal Map for Multiply-Connected Open Surfaces
Authors: Zhiyuan Lyu, Gary P. T. Choi, Lok Ming Lui
Abstract
This paper proposes a novel method for computing bijective density-equalizing quasiconformal (DEQ) flattening maps for multiply-connected open surfaces. In conventional density-equalizing maps, shape deformations are solely driven by prescribed constraints on the density distribution, defined as the population per unit area, while the bijectivity and local geometric distortions of the mappings are uncontrolled. Also, prior methods have primarily focused on simply-connected open surfaces but not surfaces with more complicated topologies. Our proposed method overcomes these issues by formulating the density diffusion process as a quasiconformal flow, which allows us to effectively control the local geometric distortion and guarantee the bijectivity of the mapping by solving an energy minimization problem involving the Beltrami coefficient of the mapping. To achieve an optimal parameterization of multiply-connected surfaces, we develop an iterative scheme that optimizes both the shape of the target planar circular domain and the density-equalizing quasiconformal map onto it. In addition, landmark constraints can be incorporated into our proposed method for consistent feature alignment. The method can also be naturally applied to simply-connected open surfaces. By changing the prescribed population, a large variety of surface flattening maps with different desired properties can be achieved. The method is tested on both synthetic and real examples, demonstrating its efficacy in various applications in computer graphics and medical imaging.
Generative Diffusion Models for Radio Wireless Channel Modelling and Sampling
Abstract
Channel modelling is essential to designing modern wireless communication systems. The increasing complexity of channel modelling and the cost of collecting high-quality wireless channel data have become major challenges. In this paper, we propose a diffusion model based channel sampling approach for rapidly synthesizing channel realizations from limited data. We use a diffusion model with a U Net based architecture operating in the frequency space domain. To evaluate how well the proposed model reproduces the true distribution of channels in the training dataset, two evaluation metrics are used: $i)$ the approximate $2$-Wasserstein distance between real and generated distributions of the normalized power spectrum in the antenna and frequency domains and $ii)$ precision and recall metric for distributions. We show that, compared to existing GAN based approaches which suffer from mode collapse and unstable training, our diffusion based approach trains stably and generates diverse and high-fidelity samples from the true channel distribution. We also show that we can pretrain the model on a simulated urban macro-cellular channel dataset and fine-tune it on a smaller, out-of-distribution urban micro-cellular dataset, therefore showing that it is feasible to model real world channels using limited data with this approach.
Masked Diffusion as Self-supervised Representation Learner
Authors: Zixuan Pan, Jianxu Chen, Yiyu Shi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Denoising diffusion probabilistic models have recently demonstrated state-of-the-art generative performance and been used as strong pixel-level representation learners. This paper decomposes the interrelation between the generative capability and representation learning ability inherent in diffusion models. We present masked diffusion model (MDM), a scalable self-supervised representation learner that substitutes the conventional additive Gaussian noise of traditional diffusion with a masking mechanism. Our proposed approach convincingly surpasses prior benchmarks, demonstrating remarkable advancements in both medical and natural image semantic segmentation tasks, particularly within the context of few-shot scenario.
PDE-Refiner: Achieving Accurate Long Rollouts with Neural PDE Solvers
Authors: Phillip Lippe, Bastiaan S. Veeling, Paris Perdikaris, Richard E. Turner, Johannes Brandstetter
Abstract
Time-dependent partial differential equations (PDEs) are ubiquitous in science and engineering. Recently, mostly due to the high computational cost of traditional solution techniques, deep neural network based surrogates have gained increased interest. The practical utility of such neural PDE solvers relies on their ability to provide accurate, stable predictions over long time horizons, which is a notoriously hard problem. In this work, we present a large-scale analysis of common temporal rollout strategies, identifying the neglect of non-dominant spatial frequency information, often associated with high frequencies in PDE solutions, as the primary pitfall limiting stable, accurate rollout performance. Based on these insights, we draw inspiration from recent advances in diffusion models to introduce PDE-Refiner; a novel model class that enables more accurate modeling of all frequency components via a multistep refinement process. We validate PDE-Refiner on challenging benchmarks of complex fluid dynamics, demonstrating stable and accurate rollouts that consistently outperform state-of-the-art models, including neural, numerical, and hybrid neural-numerical architectures. We further demonstrate that PDE-Refiner greatly enhances data efficiency, since the denoising objective implicitly induces a novel form of spectral data augmentation. Finally, PDE-Refiner's connection to diffusion models enables an accurate and efficient assessment of the model's predictive uncertainty, allowing us to estimate when the surrogate becomes inaccurate.
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Authors: Haohe Liu, Qiao Tian, Yi Yuan, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Yuping Wang, Wenwu Wang, Yuxuan Wang, Mark D. Plumbley
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
Abstract
Although audio generation shares commonalities across different types of audio, such as speech, music, and sound effects, designing models for each type requires careful consideration of specific objectives and biases that can significantly differ from those of other types. To bring us closer to a unified perspective of audio generation, this paper proposes a framework that utilizes the same learning method for speech, music, and sound effect generation. Our framework introduces a general representation of audio, called language of audio (LOA). Any audio can be translated into LOA based on AudioMAE, a self-supervised pre-trained representation learning model. In the generation process, we translate any modalities into LOA by using a GPT-2 model, and we perform self-supervised audio generation learning with a latent diffusion model conditioned on LOA. The proposed framework naturally brings advantages such as in-context learning abilities and reusable self-supervised pretrained AudioMAE and latent diffusion models. Experiments on the major benchmarks of text-to-audio, text-to-music, and text-to-speech demonstrate new state-of-the-art or competitive performance to previous approaches. Our demo and code are available at https://audioldm.github.io/audioldm2.
Keyword: adaptive
Adv-Inpainting: Generating Natural and Transferable Adversarial Patch via Attention-guided Feature Fusion
Authors: Yanjie Li, Mingxing Duan, Bin Xiao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
The rudimentary adversarial attacks utilize additive noise to attack facial recognition (FR) models. However, because manipulating the total face is impractical in the physical setting, most real-world FR attacks are based on adversarial patches, which limit perturbations to a small area. Previous adversarial patch attacks often resulted in unnatural patterns and clear boundaries that were easily noticeable. In this paper, we argue that generating adversarial patches with plausible content can result in stronger transferability than using additive noise or directly sampling from the latent space. To generate natural-looking and highly transferable adversarial patches, we propose an innovative two-stage coarse-to-fine attack framework called Adv-Inpainting. In the first stage, we propose an attention-guided StyleGAN (Att-StyleGAN) that adaptively combines texture and identity features based on the attention map to generate high-transferable and natural adversarial patches. In the second stage, we design a refinement network with a new boundary variance loss to further improve the coherence between the patch and its surrounding area. Experiment results demonstrate that Adv-Inpainting is stealthy and can produce adversarial patches with stronger transferability and improved visual quality than previous adversarial patch attacks.
Match-based solution of general parametric eigenvalue problems
Authors: Davide Pradovera, Alessandro Borghi
Subjects: Numerical Analysis (math.NA); Systems and Control (eess.SY)
Abstract
We describe a novel algorithm for solving general parametric (nonlinear) eigenvalue problems. Our method has two steps: first, high-accuracy solutions of non-parametric versions of the problem are gathered at some values of the parameters; these are then combined to obtain global approximations of the parametric eigenvalues. To gather the non-parametric data, we use non-intrusive contour-integration-based methods, which, however, cannot track eigenvalues that migrate into/out of the contour as the parameter changes. Special strategies are described for performing the combination-over-parameter step despite having only partial information on such "migrating" eigenvalues. Moreover, we dedicate a special focus to the approximation of eigenvalues that undergo bifurcations. Finally, we propose an adaptive strategy that allows one to effectively apply our method even without any a priori information on the behavior of the sought-after eigenvalues. Numerical tests are performed, showing that our algorithm can achieve remarkably high approximation accuracy.
Flexible Isosurface Extraction for Gradient-Based Mesh Optimization
Authors: Tianchang Shen, Jacob Munkberg, Jon Hasselgren, Kangxue Yin, Zian Wang, Wenzheng Chen, Zan Gojcic, Sanja Fidler, Nicholas Sharp, Jun Gao
Abstract
This work considers gradient-based mesh optimization, where we iteratively optimize for a 3D surface mesh by representing it as the isosurface of a scalar field, an increasingly common paradigm in applications including photogrammetry, generative modeling, and inverse physics. Existing implementations adapt classic isosurface extraction algorithms like Marching Cubes or Dual Contouring; these techniques were designed to extract meshes from fixed, known fields, and in the optimization setting they lack the degrees of freedom to represent high-quality feature-preserving meshes, or suffer from numerical instabilities. We introduce FlexiCubes, an isosurface representation specifically designed for optimizing an unknown mesh with respect to geometric, visual, or even physical objectives. Our main insight is to introduce additional carefully-chosen parameters into the representation, which allow local flexible adjustments to the extracted mesh geometry and connectivity. These parameters are updated along with the underlying scalar field via automatic differentiation when optimizing for a downstream task. We base our extraction scheme on Dual Marching Cubes for improved topological properties, and present extensions to optionally generate tetrahedral and hierarchically-adaptive meshes. Extensive experiments validate FlexiCubes on both synthetic benchmarks and real-world applications, showing that it offers significant improvements in mesh quality and geometric fidelity.
Adaptive Taxonomy Learning and Historical Patterns Modelling for Patent Classification
Authors: Tao Zou, Le Yu, Leilei Sun, Bowen Du, Deqing Wang, Fuzhen Zhuang
Abstract
Patent classification aims to assign multiple International Patent Classification (IPC) codes to a given patent. Recent methods for automatically classifying patents mainly focus on analyzing the text descriptions of patents. However, apart from the texts, each patent is also associated with some assignees, and the knowledge of their applied patents is often valuable for classification. Furthermore, the hierarchical taxonomy formulated by the IPC system provides important contextual information and enables models to leverage the correlations between IPC codes for more accurate classification. However, existing methods fail to incorporate the above aspects. In this paper, we propose an integrated framework that comprehensively considers the information on patents for patent classification. To be specific, we first present an IPC codes correlations learning module to derive their semantic representations via adaptively passing and aggregating messages within the same level and across different levels along the hierarchical taxonomy. Moreover, we design a historical application patterns learning component to incorporate the corresponding assignee's previous patents by a dual channel aggregation mechanism. Finally, we combine the contextual information of patent texts that contains the semantics of IPC codes, and assignees' sequential preferences to make predictions. Experiments on real-world datasets demonstrate the superiority of our approach over the existing methods. Besides, we present the model's ability to capture the temporal patterns of assignees and the semantic dependencies among IPC codes.
Adaptive Low Rank Adaptation of Segment Anything to Salient Object Detection
Authors: Ruikai Cui, Siyuan He, Shi Qiu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Foundation models, such as OpenAI's GPT-3 and GPT-4, Meta's LLaMA, and Google's PaLM2, have revolutionized the field of artificial intelligence. A notable paradigm shift has been the advent of the Segment Anything Model (SAM), which has exhibited a remarkable capability to segment real-world objects, trained on 1 billion masks and 11 million images. Although SAM excels in general object segmentation, it lacks the intrinsic ability to detect salient objects, resulting in suboptimal performance in this domain. To address this challenge, we present the Segment Salient Object Model (SSOM), an innovative approach that adaptively fine-tunes SAM for salient object detection by harnessing the low-rank structure inherent in deep learning. Comprehensive qualitative and quantitative evaluations across five challenging RGB benchmark datasets demonstrate the superior performance of our approach, surpassing state-of-the-art methods.
Speech-Driven 3D Face Animation with Composite and Regional Facial Movements
Abstract
Speech-driven 3D face animation poses significant challenges due to the intricacy and variability inherent in human facial movements. This paper emphasizes the importance of considering both the composite and regional natures of facial movements in speech-driven 3D face animation. The composite nature pertains to how speech-independent factors globally modulate speech-driven facial movements along the temporal dimension. Meanwhile, the regional nature alludes to the notion that facial movements are not globally correlated but are actuated by local musculature along the spatial dimension. It is thus indispensable to incorporate both natures for engendering vivid animation. To address the composite nature, we introduce an adaptive modulation module that employs arbitrary facial movements to dynamically adjust speech-driven facial movements across frames on a global scale. To accommodate the regional nature, our approach ensures that each constituent of the facial features for every frame focuses on the local spatial movements of 3D faces. Moreover, we present a non-autoregressive backbone for translating audio to 3D facial movements, which maintains high-frequency nuances of facial movements and facilitates efficient inference. Comprehensive experiments and user studies demonstrate that our method surpasses contemporary state-of-the-art approaches both qualitatively and quantitatively.
Occupancy Grid Map to Pose Graph-based Map: Robust BIM-based 2D-LiDAR Localization for Lifelong Indoor Navigation in Changing and Dynamic Environments
Authors: Miguel Arturo Vega Torres, Alexander Braun, André Borrmann
Abstract
Several studies rely on the de facto standard Adaptive Monte Carlo Localization (AMCL) method to localize a robot in an Occupancy Grid Map (OGM) extracted from a building information model (BIM model). However, most of these studies assume that the BIM model precisely represents the real world, which is rarely true. Discrepancies between the reference BIM model and the real world (Scan-BIM deviations) are not only due to furniture or clutter but also the usual as-planned and as-built deviations that exist with any model created in the design phase. These deviations affect the accuracy of AMCL drastically. This paper proposes an open-source method to generate appropriate Pose Graph-based maps from BIM models for robust 2D-LiDAR localization in changing and dynamic environments. First, 2D OGMs are automatically generated from complex BIM models. These OGMs only represent structural elements allowing indoor autonomous robot navigation. Then, an efficient technique converts these 2D OGMs into Pose Graph-based maps enabling more accurate robot pose tracking. Finally, we leverage the different map representations for accurate, robust localization with a combination of state-of-the-art algorithms. Moreover, we provide a quantitative comparison of various state-of-the-art localization algorithms in three simulated scenarios with varying levels of Scan-BIM deviations and dynamic agents. More precisely, we compare two Particle Filter (PF) algorithms: AMCL and General Monte Carlo Localization (GMCL); and two Graph-based Localization (GBL) methods: Google's Cartographer and SLAM Toolbox, solving the global localization and pose tracking problems. The numerous experiments demonstrate that the proposed method contributes to a robust localization with an as-designed BIM model or a sparse OGM in changing and dynamic environments, outperforming the conventional AMCL in accuracy and robustness.
A Generalized Physical-knowledge-guided Dynamic Model for Underwater Image Enhancement
Authors: Pan Mu, Hanning Xu, Zheyuan Liu, Zheng Wang, Sixian Chan, Cong Bai
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Underwater images often suffer from color distortion and low contrast resulting in various image types, due to the scattering and absorption of light by water. While it is difficult to obtain high-quality paired training samples with a generalized model. To tackle these challenges, we design a Generalized Underwater image enhancement method via a Physical-knowledge-guided Dynamic Model (short for GUPDM), consisting of three parts: Atmosphere-based Dynamic Structure (ADS), Transmission-guided Dynamic Structure (TDS), and Prior-based Multi-scale Structure (PMS). In particular, to cover complex underwater scenes, this study changes the global atmosphere light and the transmission to simulate various underwater image types (e.g., the underwater image color ranging from yellow to blue) through the formation model. We then design ADS and TDS that use dynamic convolutions to adaptively extract prior information from underwater images and generate parameters for PMS. These two modules enable the network to select appropriate parameters for various water types adaptively. Besides, the multi-scale feature extraction module in PMS uses convolution blocks with different kernel sizes and obtains weights for each feature map via channel attention block and fuses them to boost the receptive field of the network. The source code will be available at \href{https://github.com/shiningZZ/GUPDM}{https://github.com/shiningZZ/GUPDM}.
Provably Efficient Algorithm for Nonstationary Low-Rank MDPs
Abstract
Reinforcement learning (RL) under changing environment models many real-world applications via nonstationary Markov Decision Processes (MDPs), and hence gains considerable interest. However, theoretical studies on nonstationary MDPs in the literature have mainly focused on tabular and linear (mixture) MDPs, which do not capture the nature of unknown representation in deep RL. In this paper, we make the first effort to investigate nonstationary RL under episodic low-rank MDPs, where both transition kernels and rewards may vary over time, and the low-rank model contains unknown representation in addition to the linear state embedding function. We first propose a parameter-dependent policy optimization algorithm called PORTAL, and further improve PORTAL to its parameter-free version of Ada-PORTAL, which is able to tune its hyper-parameters adaptively without any prior knowledge of nonstationarity. For both algorithms, we provide upper bounds on the average dynamic suboptimality gap, which show that as long as the nonstationarity is not significantly large, PORTAL and Ada-PORTAL are sample-efficient and can achieve arbitrarily small average dynamic suboptimality gap with polynomial sample complexity.
An Adaptive Algorithm Based on Stochastic Discontinuous Galerkin for Convection Dominated Equations with Random Data
Abstract
In this paper, we propose an adaptive approach, based on mesh refinement or parametric enrichment, for convection diffusion equations containing randomness in their coefficients. A parametric system of convection diffusion equations obtained by an application of stochastic Galerkin approach is discretized by using a symmetric interior penalty Galerkin (SIPG) method with upwinding for the convection term in the spatial domain. We show the reliability of the proposed residual-based error estimator in the energy norm contributed by the error due to the SIPG discretization, the error due to the data oscillations, and the error due to the (generalized) polynomial chaos discretization in the parametric space. To illustrate the performance of the proposed estimator, several benchmark examples including a random diffusivity parameter, a random velocity parameter, random diffusivity/velocity parameters, and a random (jump) discontinuous diffusivity parameter, are tested.
Critical Points ++: An Agile Point Cloud Importance Measure for Robust Classification, Adversarial Defense and Explainable AI
Authors: Meir Yossef Levi, Guy Gilboa
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
The ability to cope accurately and fast with Out-Of-Distribution (OOD) samples is crucial in real-world safety demanding applications. In this work we first study the interplay between critical points of 3D point clouds and OOD samples. Our findings are that common corruptions and outliers are often interpreted as critical points. We generalize the notion of critical points into importance measures. We show that training a classification network based only on less important points dramatically improves robustness, at a cost of minor performance loss on the clean set. We observe that normalized entropy is highly informative for corruption analysis. An adaptive threshold based on normalized entropy is suggested for selecting the set of uncritical points. Our proposed importance measure is extremely fast to compute. We show it can be used for a variety of applications, such as Explainable AI (XAI), Outlier Removal, Uncertainty Estimation, Robust Classification and Adversarial Defense. We reach SOTA results on the two latter tasks.
Keyword: quantization
FPGA Resource-aware Structured Pruning for Real-Time Neural Networks
Authors: Benjamin Ramhorst (Imperial College London), George A. Constantinides (Imperial College London), Vladimir Loncar (Massachusetts Institute of Technology)
Abstract
Neural networks achieve state-of-the-art performance in image classification, speech recognition, scientific analysis and many more application areas. With the ever-increasing need for faster computation and lower power consumption, driven by real-time systems and Internet-of-Things (IoT) devices, FPGAs have emerged as suitable devices for deep learning inference. Due to the high computational complexity and memory footprint of neural networks, various compression techniques, such as pruning, quantization and knowledge distillation, have been proposed in literature. Pruning sparsifies a neural network, reducing the number of multiplications and memory. However, pruning often fails to capture properties of the underlying hardware, causing unstructured sparsity and load-balance inefficiency, thus bottlenecking resource improvements. We propose a hardware-centric formulation of pruning, by formulating it as a knapsack problem with resource-aware tensor structures. The primary emphasis is on real-time inference, with latencies in the order of 1$\mu$s, accelerated with hls4ml, an open-source framework for deep learning inference on FPGAs. Evaluated on a range of tasks, including real-time particle classification at CERN's Large Hadron Collider and fast image classification, the proposed method achieves a reduction ranging between 55% and 92% in the utilization of digital signal processing blocks (DSP) and up to 81% in block memory (BRAM) utilization.
Vector quantization loss analysis in VQGANs: a single-GPU ablation study for image-to-image synthesis
Authors: Luv Verma, Varun Mohan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
This study performs an ablation analysis of Vector Quantized Generative Adversarial Networks (VQGANs), concentrating on image-to-image synthesis utilizing a single NVIDIA A100 GPU. The current work explores the nuanced effects of varying critical parameters including the number of epochs, image count, and attributes of codebook vectors and latent dimensions, specifically within the constraint of limited resources. Notably, our focus is pinpointed on the vector quantization loss, keeping other hyperparameters and loss components (GAN loss) fixed. This was done to delve into a deeper understanding of the discrete latent space, and to explore how varying its size affects the reconstruction. Though, our results do not surpass the existing benchmarks, however, our findings shed significant light on VQGAN's behaviour for a smaller dataset, particularly concerning artifacts, codebook size optimization, and comparative analysis with Principal Component Analysis (PCA). The study also uncovers the promising direction by introducing 2D positional encodings, revealing a marked reduction in artifacts and insights into balancing clarity and overfitting.
NUPES : Non-Uniform Post-Training Quantization via Power Exponent Search
Authors: Edouard Yvinec, Arnaud Dapogny, Kevin Bailly
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Deep neural network (DNN) deployment has been confined to larger hardware devices due to their expensive computational requirements. This challenge has recently reached another scale with the emergence of large language models (LLMs). In order to reduce both their memory footprint and latency, a promising technique is quantization. It consists in converting floating point representations to low bit-width fixed point representations, usually by assuming a uniform mapping onto a regular grid. This process, referred to in the literature as uniform quantization, may however be ill-suited as most DNN weights and activations follow a bell-shaped distribution. This is even worse on LLMs whose weight distributions are known to exhibit large, high impact, outlier values. In this work, we propose an improvement over the most commonly adopted way to tackle this limitation in deep learning models quantization, namely, non-uniform quantization. NUPES leverages automorphisms to preserve the scalar multiplications. Such transformations are derived from power functions. However, the optimization of the exponent parameter and weight values remains a challenging and novel problem which could not be solved with previous post training optimization techniques which only learn to round up or down weight values in order to preserve the predictive function. We circumvent this limitation with a new paradigm: learning new quantized weights over the entire quantized space. Similarly, we enable the optimization of the power exponent, i.e. the optimization of the quantization operator itself during training by alleviating all the numerical instabilities. The resulting predictive function is compatible with integer-only low-bit inference. We show the ability of the method to achieve state-of-the-art compression rates in both, data-free and data-driven configurations.
ReLU and Addition-based Gated RNN
Authors: Rickard Brännvall, Henrik Forsgren, Fredrik Sandin, Marcus Liwicki
Abstract
We replace the multiplication and sigmoid function of the conventional recurrent gate with addition and ReLU activation. This mechanism is designed to maintain long-term memory for sequence processing but at a reduced computational cost, thereby opening up for more efficient execution or larger models on restricted hardware. Recurrent Neural Networks (RNNs) with gating mechanisms such as LSTM and GRU have been widely successful in learning from sequential data due to their ability to capture long-term dependencies. Conventionally, the update based on current inputs and the previous state history is each multiplied with dynamic weights and combined to compute the next state. However, multiplication can be computationally expensive, especially for certain hardware architectures or alternative arithmetic systems such as homomorphic encryption. It is demonstrated that the novel gating mechanism can capture long-term dependencies for a standard synthetic sequence learning task while significantly reducing computational costs such that execution time is reduced by half on CPU and by one-third under encryption. Experimental results on handwritten text recognition tasks furthermore show that the proposed architecture can be trained to achieve comparable accuracy to conventional GRU and LSTM baselines. The gating mechanism introduced in this paper may enable privacy-preserving AI applications operating under homomorphic encryption by avoiding the multiplication of encrypted variables. It can also support quantization in (unencrypted) plaintext applications, with the potential for substantial performance gains since the addition-based formulation can avoid the expansion to double precision often required for multiplication.
Keyword: efficient
Explicifying Neural Implicit Fields for Efficient Dynamic Human Avatar Modeling via a Neural Explicit Surface
High-Level Features Parallelization for Inference Cost Reduction Through Selective Attention
Federated Online/Offline Remote Data Inspection for Distributed Edge Computing
Decoding Layer Saliency in Language Transformers
Spatial Gated Multi-Layer Perceptron for Land Use and Land Cover Mapping
AI-Enabled Software and System Architecture Frameworks: Focusing on smart Cyber-Physical Systems (CPS)
Advancing Early Detection of Virus Yellows: Developing a Hybrid Convolutional Neural Network for Automatic Aphid Counting in Sugar Beet Fields
Output-feedback model predictive and model-free control for ramp metering
A general fourth-order mesoscopic multiple-relaxation-time lattice Boltzmann model and equivalent macroscopic finite-difference scheme for two-dimensional diffusion equations
Bayesian Meta-Learning on Control Barrier Functions with Data from On-Board Sensors
3D Quasiconformal Representation and Solver
FocusFlow: Leveraging Focal Depth for Gaze Interaction in Virtual Reality
Machine Learning aided Computer Architecture Design for CNN Inferencing Systems
Product Review Image Ranking for Fashion E-commerce
Learning Gabor Texture Features for Fine-Grained Recognition
Progressive Spatio-temporal Perception for Audio-Visual Question Answering
Speech-Driven 3D Face Animation with Composite and Regional Facial Movements
Deep Fusion Transformer Network with Weighted Vector-Wise Keypoints Voting for Robust 6D Object Pose Estimation
Occupancy Grid Map to Pose Graph-based Map: Robust BIM-based 2D-LiDAR Localization for Lifelong Indoor Navigation in Changing and Dynamic Environments
Scheduling for Periodic Multi-Source Systems with Peak-Age Violation Guarantees
$\mathcal{G}^2Pxy$: Generative Open-Set Node Classification on Graphs with Proxy Unknowns
Provably Efficient Algorithm for Nonstationary Low-Rank MDPs
YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-time Object Detection
LLM As DBA
Quality Diversity under Sparse Reward and Sparse Interaction: Application to Grasping in Robotics
Look at the Neighbor: Distortion-aware Unsupervised Domain Adaptation for Panoramic Semantic Segmentation
An Adaptive Algorithm Based on Stochastic Discontinuous Galerkin for Convection Dominated Equations with Random Data
Effects of Dynamic and Stochastic Travel Times on the Operation of Mobility-on-Demand Services
Testing Updated Apps by Adapting Learned Models
Accountability of Things: Large-Scale Tamper-Evident Logging for Smart Devices
Using Machine Learning To Identify Software Weaknesses From Software Requirement Specifications
Bijective Density-Equalizing Quasiconformal Map for Multiply-Connected Open Surfaces
Banzhaf Values for Facts in Query Answering
Optimizing Cache Content Placement in Integrated Terrestrial and Non-terrestrial Networks
LASIGE and UNICAGE solution to the NASA LitCoin NLP Competition
CoBaIR: A Python Library for Context-Based Intention Recognition in Human-Robot-Interaction
ReLU and Addition-based Gated RNN
The Fast and the Private: Task-based Dataset Search
AST-MHSA : Code Summarization using Multi-Head Self-Attention
ESBMC v7.3: Model Checking C++ Programs using Clang AST
Efficient Function Approximation in Enriched Approximation Spaces
Automatic Extraction of Relevant Road Infrastructure using Connected vehicle data and Deep Learning Model
A Comparison of Classical and Deep Reinforcement Learning Methods for HVAC Control
Deformable Mixer Transformer with Gating for Multi-Task Learning of Dense Prediction
Rethinking Integration of Prediction and Planning in Deep Learning-Based Automated Driving Systems: A Review
PDE-Refiner: Achieving Accurate Long Rollouts with Neural PDE Solvers
Zero Grads Ever Given: Learning Local Surrogate Losses for Non-Differentiable Graphics
Neural Progressive Meshes
Iterative Reweighted Least Squares Networks With Convergence Guarantees for Solving Inverse Imaging Problems
Keyword: faster
FPGA Resource-aware Structured Pruning for Real-Time Neural Networks
Federated Online/Offline Remote Data Inspection for Distributed Edge Computing
Co-movement Pattern Mining from Videos
Isolated Scheduling for Distributed Training Tasks in GPU Clusters
Keyword: mobile
High-Level Features Parallelization for Inference Cost Reduction Through Selective Attention
JutePestDetect: An Intelligent Approach for Jute Pest Identification Using Fine-Tuned Transfer Learning
DCM: A Developers Certification Model for Mobile Ecosystems
Robust Localization with Visual-Inertial Odometry Constraints for Markerless Mobile AR
Learning Gabor Texture Features for Fine-Grained Recognition
KS-APR: Keyframe Selection for Robust Absolute Pose Regression
Robust Lifelong Indoor LiDAR Localization using the Area Graph
MobiScout: A Scalable Cloud-Based Driving and Activity Monitoring Platform Featuring an IOS App and a WatchOS Extension
Keyword: pruning
FPGA Resource-aware Structured Pruning for Real-Time Neural Networks
Co-movement Pattern Mining from Videos
Keyword: diffusion
PromptPaint: Steering Text-to-Image Generation Through Paint Medium-like Interactions
A general fourth-order mesoscopic multiple-relaxation-time lattice Boltzmann model and equivalent macroscopic finite-difference scheme for two-dimensional diffusion equations
Investigating disaster response through social media data and the Susceptible-Infected-Recovered (SIR) model: A case study of 2020 Western U.S. wildfire season
Beyond Deep Reinforcement Learning: A Tutorial on Generative Diffusion Models in Network Optimization
An Adaptive Algorithm Based on Stochastic Discontinuous Galerkin for Convection Dominated Equations with Random Data
Bijective Density-Equalizing Quasiconformal Map for Multiply-Connected Open Surfaces
Generative Diffusion Models for Radio Wireless Channel Modelling and Sampling
Masked Diffusion as Self-supervised Representation Learner
PDE-Refiner: Achieving Accurate Long Rollouts with Neural PDE Solvers
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Keyword: adaptive
Adv-Inpainting: Generating Natural and Transferable Adversarial Patch via Attention-guided Feature Fusion
Match-based solution of general parametric eigenvalue problems
Flexible Isosurface Extraction for Gradient-Based Mesh Optimization
Adaptive Taxonomy Learning and Historical Patterns Modelling for Patent Classification
Adaptive Low Rank Adaptation of Segment Anything to Salient Object Detection
Speech-Driven 3D Face Animation with Composite and Regional Facial Movements
Occupancy Grid Map to Pose Graph-based Map: Robust BIM-based 2D-LiDAR Localization for Lifelong Indoor Navigation in Changing and Dynamic Environments
A Generalized Physical-knowledge-guided Dynamic Model for Underwater Image Enhancement
Provably Efficient Algorithm for Nonstationary Low-Rank MDPs
An Adaptive Algorithm Based on Stochastic Discontinuous Galerkin for Convection Dominated Equations with Random Data
Critical Points ++: An Agile Point Cloud Importance Measure for Robust Classification, Adversarial Defense and Explainable AI
Keyword: quantization
FPGA Resource-aware Structured Pruning for Real-Time Neural Networks
Vector quantization loss analysis in VQGANs: a single-GPU ablation study for image-to-image synthesis
NUPES : Non-Uniform Post-Training Quantization via Power Exponent Search
ReLU and Addition-based Gated RNN