Abstract
This paper addresses the evacuation of hazardous places, a task of great importance for coordinators, event hosts, and authorities. To facilitate the development of effective solutions, the paper employs Artificial Intelligence (AI) techniques, specifically Multi-Agent Systems (MAS), to construct a simulation model for evacuation. NetLogo is selected as the simulation tool because it provides a comprehensive view of human behaviour in distressing situations within hazardous environments. The primary objective of this paper is to improve our understanding of how individuals react and respond during such situations. By leveraging AI and MAS, the simulation model aims to capture the complex dynamics of evacuation scenarios, enabling policymakers and emergency planners to make informed decisions and implement more efficient and effective evacuation strategies. This paper thereby contributes to the advancement of evacuation planning and ultimately to the safety and well-being of individuals in hazardous places.
PLiNIO: A User-Friendly Library of Gradient-based Methods for Complexity-aware DNN Optimization
Abstract
Accurate yet efficient Deep Neural Networks (DNNs) are in high demand, especially for applications that require their execution on constrained edge devices. Finding such DNNs in a reasonable time for new applications requires automated optimization pipelines since the huge space of hyper-parameter combinations is impossible to explore extensively by hand. In this work, we propose PLiNIO, an open-source library implementing a comprehensive set of state-of-the-art DNN design automation techniques, all based on lightweight gradient-based optimization, under a unified and user-friendly interface. With experiments on several edge-relevant tasks, we show that combining the various optimizations available in PLiNIO leads to rich sets of solutions that Pareto-dominate the considered baselines in terms of accuracy vs model size. Notably, PLiNIO achieves up to 94.34% memory reduction for a <1% accuracy drop compared to a baseline architecture.
Variable Independence in Linear Real Arithmetic
Authors: Alexander Mayorov
Subjects: Logic in Computer Science (cs.LO); Databases (cs.DB)
Abstract
Variable independence and decomposability are algorithmic techniques for simplifying logical formulas by tearing apart connections between free variables. These techniques were originally proposed to speed up query evaluation in constraint databases, in particular by representing the query as a Boolean combination of formulas with no interconnected variables. They also have many other applications in SMT, string analysis, databases, automata theory and other areas. However, the precise complexity of variable independence and decomposability has been left open, especially for the quantifier-free theory of linear real arithmetic (LRA), which is central in database applications. We introduce a novel characterization of formulas admitting decompositions and use it to show that it is coNP-complete to decide variable decomposability over LRA. As a corollary, we obtain that deciding variable independence is in $ \Sigma_2^p $. These results substantially improve the best known double-exponential time algorithms for variable decomposability and independence. In many practical applications, it is crucial to be able to efficiently eliminate connections between variables whenever possible. We design and implement an algorithm for this problem, which is optimal in theory, exponentially faster than the current state-of-the-art algorithm, and efficient on various microbenchmarks. In particular, our algorithm is the first one to overcome a fundamental barrier between non-discrete and discrete first-order theories. Formulas arising in practice often have few or even no free variables that are perfectly independent. In this case, our algorithm can compute a best-possible approximation of a decomposition, which can be used to optimize database queries by exploiting the partial variable independence present in almost every logical formula or database query constraint.
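As a toy illustration (ours, not the paper's) of what a variable decomposition over LRA looks like: a formula may connect its free variables syntactically and still be decomposable, provided it is equivalent to a Boolean combination of single-variable formulas.

    $\varphi(x,y) := x + y \le 1 \land x + y \ge 1 \land x \le 0 \land x \ge 0 \;\equiv\; (x = 0) \land (y = 1)$

Here the connected constraint $x + y = 1$ dissolves because $x = 0$ pins $y$ down. By contrast, $x \le y$ admits no such decomposition: its solution set is not a Boolean combination of sets each constraining only one variable.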
Efficient Inverse-designed Structural Infill for Complex Engineering Structures
Authors: Peter Dørffler Ladegaard Jensen, Tim Felle Olsen, J. Andreas Bærentzen, Niels Aage, Ole Sigmund
Subjects: Graphics (cs.GR); Computational Engineering, Finance, and Science (cs.CE); Numerical Analysis (math.NA); Optimization and Control (math.OC)
Abstract
Inverse design of high-resolution and fine-detailed 3D lightweight mechanical structures is notoriously expensive due to the need for vast computational resources and the use of very fine-scaled complex meshes. Furthermore, in designing for additive manufacturing, infill is often neglected as a component of the optimized structure. In this paper, both concerns are addressed using a de-homogenization topology optimization procedure on complex engineering structures discretized by 3D unstructured hexahedrals. Using a rectangular-hole microstructure (reminiscent of the stiffness-optimal orthogonal rank-3 multi-scale) as a base material for the multi-scale optimization, a coarse-scale optimized geometry can be obtained using homogenization-based topology optimization. Due to the microstructure periodicity, this coarse-scale geometry can be up-sampled to a fine physical geometry with optimized infill, with minor loss in structural performance and at a fraction of the cost of a fine-scale solution. The upsampling on 3D unstructured grids is achieved through stream surface tracing which aligns with the optimized local orientation. The periodicity of the physical geometry can be tuned, such that the material serves as a structural component and also as an efficient infill for additive manufacturing designs. The method is demonstrated through three examples. It achieves structural performance comparable to state-of-the-art methods while requiring significantly less computational time than the baseline method. By allowing multiple active layers, the mapped solution becomes more mechanically stable, leading to an increased critical buckling load factor without additional computational expense. The proposed approach achieves promising results: benchmarking against large-scale SIMP models demonstrates computational efficiency improvements of up to 250 times.
LOG-LIO: A LiDAR-Inertial Odometry with Efficient Local Geometric Information Estimation
Authors: Kai Huang, Junqiao Zhao, Tiantian Feng, Zhongyang Zhu, Chen Ye
Abstract
Local geometric information, i.e. normals and point distributions, is crucial for LiDAR-based simultaneous localization and mapping (SLAM) because it provides constraints for data association, which further determine the direction of optimization and ultimately affect the accuracy of the poses. However, estimating normals and point distributions is time-consuming even with the assistance of a KD-tree or volumetric maps. To achieve fast normal estimation, we look into the structural information of the LiDAR scan and propose a novel fast approximate least squares (FALS) method. With pre-computed bearing information, estimating a normal requires only the range information of the points when a new scan arrives. To efficiently estimate the distribution of points, we extend the ikd-tree to manage the map in voxels and update its point cloud distribution incrementally while maintaining its consistency with the normals. For scan points that satisfy visibility and consistency checks based on the normals, we devise a robust and accurate hierarchical data association scheme that considers the distribution, in which point-to-surfel association is prioritized over point-to-plane. We further fix voxels after the distribution converges to balance time consumption against the correctness of the representation. Extensive experiments on diverse public datasets demonstrate the advantages of our system compared to other state-of-the-art methods.
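For intuition, a minimal numpy sketch of a fast approximate least-squares normal estimate in the spirit described above: fitting a plane $v \cdot n \approx 1/r$ over each neighborhood, so that the system matrix depends on the bearing vectors alone and can be precomputed, leaving only the right-hand side to be assembled from the incoming ranges. The names and the neighbor structure are our own, not the authors' code.

    import numpy as np

    def fals_normals(ranges, bearings, neighbors):
        # bearings:  (N, 3) unit viewing directions, fixed by the scan pattern
        # ranges:    (N,)   per-point range measurements from the new scan
        # neighbors: list of index arrays, one neighborhood per point
        normals = np.zeros((len(ranges), 3))
        for i, nb in enumerate(neighbors):
            V = bearings[nb]                    # (k, 3)
            M = V.T @ V                         # bearings only: precomputable offline
            b = (V / ranges[nb, None]).sum(0)   # the only term needing the new scan
            n = np.linalg.solve(M, b)           # least squares for v . n ~ 1/r
            normals[i] = n / np.linalg.norm(n)
        return normals

Since the bearings are fixed by the sensor, each neighborhood's M (or even its inverse) can be cached, which is what makes the per-scan cost depend on the range values alone.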
Approximately counting independent sets in dense bipartite graphs via subspace enumeration
Authors: Charlie Carlson, Ewan Davies, Alexandra Kolla, Aditya Potukuchi
Subjects: Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)
Abstract
We give a randomized algorithm that approximates the number of independent sets in a dense, regular bipartite graph -- in the language of approximate counting, we give an FPRAS for #BIS on the class of dense, regular bipartite graphs. Efficient counting algorithms typically apply to ``high-temperature'' problems on bounded-degree graphs, and our contribution is a notable exception as it applies to dense graphs in a low-temperature setting. Our methods give a counting-focused complement to the long line of work in combinatorial optimization showing that CSPs such as Max-Cut and Unique Games are easy on dense graphs via spectral arguments. The proof exploits the fact that dense, regular graphs exhibit a kind of small-set expansion (i.e. bounded threshold rank), which via subspace enumeration lets us enumerate small cuts efficiently.
Guided Linear Upsampling
Authors: Shuangbing Song, Fan Zhong, Tianju Wang, Xueying Qin, Changhe Tu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Abstract
Guided upsampling is an effective approach for accelerating high-resolution image processing. In this paper, we propose a simple yet effective guided upsampling method. Each pixel in the high-resolution image is represented as a linear interpolation of two low-resolution pixels, whose indices and weights are optimized to minimize the upsampling error. The downsampling can be jointly optimized in order to prevent missing small isolated regions. Our method can be derived from the color line model and local color transformations. Compared to previous methods, our method can better preserve detail effects while suppressing artifacts such as bleeding and blurring. It is efficient, easy to implement, and free of sensitive parameters. We evaluate the proposed method with a wide range of image operators, and show its advantages through quantitative and qualitative analysis. We demonstrate the advantages of our method for both interactive image editing and real-time high-resolution video processing. In particular, for interactive editing, the joint optimization can be precomputed, thus allowing for instant feedback without hardware acceleration.
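The reconstruction step is simple enough to sketch: given the optimized per-pixel source indices and blending weights, upsampling is two gathers and a linear blend. The optimization that produces the indices and weights is the paper's contribution and is omitted here; the random values below are mere stand-ins.

    import numpy as np

    def apply_glu(low, idx_a, idx_b, w):
        # low: (h, w, 3) low-res result; idx_a, idx_b: (H, W) flat indices into it
        # w:   (H, W) per-pixel blending weights in [0, 1]
        flat = low.reshape(-1, low.shape[-1])
        return w[..., None] * flat[idx_a] + (1.0 - w[..., None]) * flat[idx_b]

    low = np.random.rand(4, 4, 3)
    idx_a = np.random.randint(0, 16, (16, 16))    # stand-ins for optimized indices
    idx_b = np.random.randint(0, 16, (16, 16))
    weights = np.random.rand(16, 16)
    high = apply_glu(low, idx_a, idx_b, weights)  # (16, 16, 3)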
Gradient strikes back: How filtering out high frequencies improves explanations
Authors: Sabine Muzellec, Leo Andeol, Thomas Fel, Rufin VanRullen, Thomas Serre
Abstract
Recent years have witnessed an explosion in the development of novel prediction-based attribution methods, which have slowly been supplanting older gradient-based methods to explain the decisions of deep neural networks. However, it is still not clear why prediction-based methods outperform gradient-based ones. Here, we start with an empirical observation: these two approaches yield attribution maps with very different power spectra, with gradient-based methods revealing more high-frequency content than prediction-based methods. This observation raises multiple questions: What is the source of this high-frequency information, and does it truly reflect decisions made by the system? Finally, why would the absence of high-frequency information in prediction-based methods yield better explainability scores along multiple metrics? We analyze the gradient of three representative visual classification models and observe that it contains noisy information emanating from high frequencies. Furthermore, our analysis reveals that the operations used in Convolutional Neural Networks (CNNs) for downsampling appear to be a significant source of this high-frequency content -- suggesting aliasing as a possible underlying basis. We then apply an optimal low-pass filter for attribution maps and demonstrate that it improves gradient-based attribution methods. We show that (i) removing high-frequency noise yields significant improvements in the explainability scores obtained with gradient-based methods across multiple models -- leading to (ii) a novel ranking of state-of-the-art methods with gradient-based methods at the top. We believe that our results will spur renewed interest in simpler and computationally more efficient gradient-based methods for explainability.
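To make the filtering step concrete, here is a hedged numpy sketch of low-pass filtering an attribution map in the Fourier domain. The paper derives an optimal filter; the ideal circular cutoff below only illustrates the mechanism.

    import numpy as np

    def lowpass(attribution, cutoff=0.1):
        # attribution: 2D map; cutoff: fraction of the Nyquist frequency to keep
        F = np.fft.fftshift(np.fft.fft2(attribution))
        H, W = attribution.shape
        yy, xx = np.mgrid[-(H // 2):(H + 1) // 2, -(W // 2):(W + 1) // 2]
        radius = np.sqrt((yy / (H / 2)) ** 2 + (xx / (W / 2)) ** 2)
        F[radius > cutoff] = 0.0                  # discard high-frequency content
        return np.real(np.fft.ifft2(np.fft.ifftshift(F)))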
AMRIC: A Novel In Situ Lossy Compression Framework for Efficient I/O in Adaptive Mesh Refinement Applications
Authors: Daoce Wang, Jesus Pulido, Pascal Grosset, Jiannan Tian, Sian Jin, Houjun Tang, Jean Sexton, Sheng Di, Zarija Lukić, Kai Zhao, Bo Fang, Franck Cappello, James Ahrens, Dingwen Tao
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
As supercomputers advance towards exascale capabilities, computational intensity increases significantly, and the volume of data requiring storage and transmission experiences exponential growth. Adaptive Mesh Refinement (AMR) has emerged as an effective solution to address these two challenges. Concurrently, error-bounded lossy compression is recognized as one of the most efficient approaches to tackle the latter issue. Despite their respective advantages, few attempts have been made to investigate how AMR and error-bounded lossy compression can function together. To this end, this study presents AMRIC, a novel in-situ lossy compression framework that employs the HDF5 filter to both reduce I/O costs and boost compression quality for AMR applications. We implement our solution into the AMReX framework and evaluate it on two real-world AMR applications, Nyx and WarpX, on the Summit supercomputer. Experiments with 4096 CPU cores demonstrate that AMRIC improves the compression ratio by up to 81X and the I/O performance by up to 39X over AMReX's original compression solution.
VISER: A Tractable Solution Concept for Games with Information Asymmetry
Authors: Jeremy McMahan, Young Wu, Yudong Chen, Xiaojin Zhu, Qiaomin Xie
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Multiagent Systems (cs.MA); Systems and Control (eess.SY)
Abstract
Many real-world games suffer from information asymmetry: one player is only aware of their own payoffs while the other player has the full game information. Examples include the critical domain of security games and adversarial multi-agent reinforcement learning. Information asymmetry renders traditional solution concepts such as Strong Stackelberg Equilibrium (SSE) and Robust-Optimization Equilibrium (ROE) inoperative. We propose a novel solution concept called VISER (Victim Is Secure, Exploiter best-Responds). VISER enables an external observer to predict the outcome of such games. In particular, for security applications, VISER allows the victim to better defend itself while characterizing the most damaging attacks available to the attacker. We show that each player's VISER strategy can be computed independently in polynomial time using linear programming (LP). We also extend VISER to its Markov-perfect counterpart for Markov games, which can be solved efficiently using a series of LPs.
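The paper gives the exact LPs behind VISER; as a generic illustration of why the secure side reduces to a single LP (our standard textbook formulation, not the VISER one), here is the classical maximin security strategy in scipy:

    import numpy as np
    from scipy.optimize import linprog

    def security_strategy(R):
        # R: (m, n) payoff matrix for the row player (the "victim")
        # maximize v subject to x^T R[:, j] >= v for all j, x in the simplex
        m, n = R.shape
        c = np.r_[np.zeros(m), -1.0]            # variables (x, v); minimize -v
        A_ub = np.c_[-R.T, np.ones(n)]          # v - x^T R[:, j] <= 0 per column j
        A_eq = np.r_[np.ones(m), 0.0][None, :]  # probabilities sum to 1
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, None)] * m + [(None, None)])
        return res.x[:m], -res.fun              # mixed strategy, guaranteed value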
Towards A Unified Agent with Foundation Models
Authors: Norman Di Palo, Arunkumar Byravan, Leonard Hasenclever, Markus Wulfmeier, Nicolas Heess, Martin Riedmiller
Abstract
Language Models and Vision Language Models have recently demonstrated unprecedented capabilities in terms of understanding human intentions, reasoning, scene understanding, and planning-like behaviour, in text form, among many others. In this work, we investigate how to embed and leverage such abilities in Reinforcement Learning (RL) agents. We design a framework that uses language as the core reasoning tool, exploring how this enables an agent to tackle a series of fundamental RL challenges, such as efficient exploration, reusing experience data, scheduling skills, and learning from observations, which traditionally require separate, vertically designed algorithms. We test our method on a sparse-reward simulated robotic manipulation environment, where a robot needs to stack a set of objects. We demonstrate substantial performance improvements over baselines in exploration efficiency and ability to reuse data from offline datasets, and illustrate how to reuse learned skills to solve novel tasks or imitate videos of human experts.
PubMed and Beyond: Recent Advances and Best Practices in Biomedical Literature Search
Authors: Qiao Jin, Robert Leaman, Zhiyong Lu
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Digital Libraries (cs.DL)
Abstract
Biomedical research yields a wealth of information, much of which is only accessible through the literature. Consequently, literature search is an essential tool for building on prior knowledge in clinical and biomedical research. Although recent improvements in artificial intelligence have expanded functionality beyond keyword-based search, these advances may be unfamiliar to clinicians and researchers. In response, we present a survey of literature search tools tailored to both general and specific information needs in biomedicine, with the objective of helping readers efficiently fulfill their information needs. We first examine the widely used PubMed search engine, discussing recent improvements and continued challenges. We then describe literature search tools catering to five specific information needs: 1. Identifying high-quality clinical research for evidence-based medicine. 2. Retrieving gene-related information for precision medicine and genomics. 3. Searching by meaning, including natural language questions. 4. Locating related articles with literature recommendation. 5. Mining literature to discover associations between concepts such as diseases and genetic variants. Additionally, we cover practical considerations and best practices for choosing and using these tools. Finally, we provide a perspective on the future of literature search engines, considering recent breakthroughs in large language models such as ChatGPT. In summary, our survey provides a comprehensive view of biomedical literature search functionalities with 36 publicly available tools.
Efficiency Pentathlon: A Standardized Arena for Efficiency Evaluation
Authors: Hao Peng, Qingqing Cao, Jesse Dodge, Matthew E. Peters, Jared Fernandez, Tom Sherborne, Kyle Lo, Sam Skjonsberg, Emma Strubell, Darrell Plessas, Iz Beltagy, Evan Pete Walsh, Noah A. Smith, Hannaneh Hajishirzi
Abstract
Rising computational demands of modern natural language processing (NLP) systems have increased the barrier to entry for cutting-edge research while posing serious environmental concerns. Yet, progress on model efficiency has been impeded by practical challenges in model evaluation and comparison. For example, hardware is challenging to control due to disparate levels of accessibility across different institutions. Moreover, improvements in metrics such as FLOPs often fail to translate to progress in real-world applications. In response, we introduce Pentathlon, a benchmark for holistic and realistic evaluation of model efficiency. Pentathlon focuses on inference, which accounts for a majority of the compute in a model's lifecycle. It offers a strictly-controlled hardware platform, and is designed to mirror real-world application scenarios. It incorporates a suite of metrics that target different aspects of efficiency, including latency, throughput, memory overhead, and energy consumption. Pentathlon also comes with a software library that can be seamlessly integrated into any codebase and enables evaluation. As a standardized and centralized evaluation platform, Pentathlon can drastically reduce the workload required to make fair and reproducible efficiency comparisons. While initially focused on natural language processing (NLP) models, Pentathlon is designed to allow flexible extension to other fields. We envision that Pentathlon will stimulate algorithmic innovations in building efficient models and foster an increased awareness of the social and environmental implications in the development of future-generation NLP models.
Efficient Guided Generation for LLMs
Authors: Brandon T. Willard, Rémi Louf
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
In this article we describe an efficient approach to guiding language model text generation with regular expressions and context-free grammars. Our approach adds little to no overhead to the token sequence generation process, and makes guided generation feasible in practice. An implementation is provided in the open source Python library Outlines.
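To convey the core idea without leaning on the Outlines API, here is a toy character-level sketch: a hand-written DFA for the regex [0-9]+\.[0-9]+ masks disallowed tokens at each decoding step. Outlines compiles arbitrary regular expressions into such automata over the model's actual token vocabulary; everything below, including the uniform stand-in for the model, is our simplification.

    import numpy as np

    VOCAB = list("0123456789.")
    # DFA for [0-9]+\.[0-9]+  (states 0-3, accepting state 3)
    DELTA = {(s, d): t for s, t in [(0, 1), (1, 1), (2, 3), (3, 3)]
             for d in "0123456789"}
    DELTA[(1, ".")] = 2

    def guided_sample(logits_fn, max_len=12, rng=np.random.default_rng(0)):
        state, out = 0, []
        for _ in range(max_len):
            if state == 3 and rng.random() < 0.3:    # a real decoder would allow
                break                                # EOS only in accepting states
            logits = logits_fn(out)                  # stand-in for the model
            mask = np.array([(state, t) in DELTA for t in VOCAB])
            p = np.where(mask, np.exp(logits), 0.0)  # zero out illegal tokens
            tok = int(rng.choice(len(VOCAB), p=p / p.sum()))
            out.append(VOCAB[tok])
            state = DELTA[(state, VOCAB[tok])]
        return "".join(out)

    print(guided_sample(lambda out: np.zeros(len(VOCAB))))

The masking itself is a dictionary lookup per step, which is why the overhead on top of ordinary sampling is negligible.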
Two Tales of Platoon Intelligence for Autonomous Mobility Control: Enabling Deep Learning Recipes
Authors: Soohyun Park, Haemin Lee, Chanyoung Park, Soyi Jung, Minseok Choi, Joongheon Kim
Abstract
This paper presents recent deep learning-based achievements for resolving the problem of autonomous mobility control and efficient resource management of autonomous vehicles and UAVs, i.e., (i) multi-agent reinforcement learning (MARL) and (ii) the neural Myerson auction. Representatively, the communication network (CommNet), one of the most popular MARL algorithms, is introduced to enable multiple agents to take actions in a distributed manner towards their shared goals by training all agents' states and actions in a single neural network. Moreover, the neural Myerson auction guarantees truthfulness among multiple agents and achieves the optimal revenue in highly dynamic systems. Therefore, we survey the recent studies on autonomous mobility control based on MARL and the neural Myerson auction. Furthermore, we emphasize that the integration of MARL and the neural Myerson auction is expected to be critical for efficient and trustworthy autonomous mobility services.
A simple and efficient convex optimization based bound-preserving high order accurate limiter for Cahn-Hilliard-Navier-Stokes system
Authors: Chen Liu, Beatrice Riviere, Jie Shen, Xiangxiong Zhang
Abstract
For time-dependent PDEs, numerical schemes can be rendered bound-preserving without losing conservation and accuracy by a post-processing procedure that solves a constrained minimization in each time step. Such a constrained optimization can be formulated as a nonsmooth convex minimization, which can be solved efficiently by first-order optimization methods when optimal algorithm parameters are used. By analyzing the asymptotic linear convergence rate of the generalized Douglas-Rachford splitting method, optimal algorithm parameters can be approximately expressed as a simple function of the number of out-of-bounds cells. We demonstrate the efficiency of this simple choice of algorithm parameters by applying such a limiter to the cell averages of a discontinuous Galerkin scheme solving phase field equations for demanding 3D problems. Numerical tests on a sophisticated 3D Cahn-Hilliard-Navier-Stokes system indicate that the limiter is high-order accurate, very efficient, and well-suited for large-scale simulations. For each time step, it takes at most $20$ iterations for the Douglas-Rachford splitting to enforce bounds and conservation up to round-off error, for which the computational cost is at most $80N$, with $N$ being the total number of cells.
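As a hedged miniature of the splitting at work, consider the feasibility version on cell averages: Douglas-Rachford alternating between the projection onto the bound box and the projection onto the conservation hyperplane. The paper's actual limiter solves a constrained minimization and derives the optimal parameters; this sketch only shows the iteration's shape.

    import numpy as np

    def dr_limit(u, lo, hi, iters=100):
        # u: cell averages; assumes lo <= mean(u) <= hi so the two sets intersect
        target = u.sum()
        proj_box = lambda v: np.clip(v, lo, hi)
        proj_sum = lambda v: v + (target - v.sum()) / v.size
        z = u.copy()
        for _ in range(iters):
            x = proj_box(z)
            z = z + proj_sum(2.0 * x - z) - x     # Douglas-Rachford update
        return proj_box(z)                        # bounds exact, sum ~= target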
Uncertainty-Driven Multi-Scale Feature Fusion Network for Real-time Image Deraining
Authors: Ming Tong, Xuefeng Yan, Yongzhen Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Visual-based measurement systems are frequently affected by rainy weather due to the degradation caused by rain streaks in captured images, and existing imaging devices struggle to address this issue in real-time. While most efforts leverage deep networks for image deraining and have made progress, their large parameter sizes hinder deployment on resource-constrained devices. Additionally, these data-driven models often produce deterministic results, without considering their inherent epistemic uncertainty, which can lead to undesired reconstruction errors. Well-calibrated uncertainty can help alleviate prediction errors and assist measurement devices in mitigating risks and improving usability. Therefore, we propose an Uncertainty-Driven Multi-Scale Feature Fusion Network (UMFFNet) that learns the probability mapping distribution between paired images to estimate uncertainty. Specifically, we introduce an uncertainty feature fusion block (UFFB) that utilizes uncertainty information to dynamically enhance acquired features and focus on blurry regions obscured by rain streaks, reducing prediction errors. In addition, to further boost the performance of UMFFNet, we fuse feature information from multiple scales to guide the network for efficient collaborative rain removal. Extensive experiments demonstrate that UMFFNet achieves significant performance improvements with few parameters, surpassing state-of-the-art image deraining methods.
Improved Distribution Matching for Dataset Condensation
Abstract
Dataset Condensation aims to condense a large dataset into a smaller one while maintaining its ability to train a well-performing model, thus reducing the storage cost and training effort in deep learning applications. However, conventional dataset condensation methods are optimization-oriented and condense the dataset by performing gradient or parameter matching during model optimization, which is computationally intensive even on small datasets and models. In this paper, we propose a novel dataset condensation method based on distribution matching, which is more efficient and promising. Specifically, we identify two important shortcomings of naive distribution matching (i.e., imbalanced feature numbers and unvalidated embeddings for distance computation) and address them with three novel techniques (i.e., partitioning and expansion augmentation, efficient and enriched model sampling, and class-aware distribution regularization). Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources, thereby scaling data condensation to larger datasets and models. Extensive experiments demonstrate the effectiveness of our method. Code is available at https://github.com/uitrbn/IDM
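For orientation, the naive distribution-matching objective that the paper builds on can be written in a few lines: pull the synthetic set's mean embedding toward that of real data of the same class, under randomly sampled encoders. A hedged PyTorch sketch, with the paper's three techniques deliberately omitted:

    import torch

    def dm_step(syn_imgs, real_imgs, encoders, opt):
        # syn_imgs: learnable tensor of synthetic images for one class
        # real_imgs: a batch of real images of the same class
        loss = 0.0
        for f in encoders:                        # randomly initialized networks
            with torch.no_grad():
                target = f(real_imgs).mean(0)     # real mean embedding
            loss = loss + (f(syn_imgs).mean(0) - target).pow(2).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()                                # updates the synthetic images only
        return float(loss)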
Space Engage: Collaborative Space Supervision for Contrastive-based Semi-Supervised Semantic Segmentation
Abstract
Semi-Supervised Semantic Segmentation (S4) aims to train a segmentation model with limited labeled images and a substantial volume of unlabeled images. To improve the robustness of representations, powerful methods introduce a pixel-wise contrastive learning approach in latent space (i.e., representation space) that aggregates the representations to their prototypes in a fully supervised manner. However, previous contrastive-based S4 methods merely rely on the supervision from the model's output (logits) in logit space during unlabeled training. In contrast, we utilize the outputs in both logit space and representation space to obtain supervision in a collaborative way. The supervision from the two spaces plays two roles: 1) it reduces the risk of over-fitting to incorrect semantic information in the logits with the help of representations; 2) it enhances the knowledge exchange between the two spaces. Furthermore, unlike previous approaches, we use the similarity between representations and prototypes as a new indicator to tilt training toward under-performing representations, achieving a more efficient contrastive learning process. Results on two public benchmarks demonstrate the competitive performance of our method compared with state-of-the-art methods.
DVPT: Dynamic Visual Prompt Tuning of Large Pre-trained Models for Medical Image Analysis
Authors: Along He, Kai Wang, Zhihong Wang, Tao Li, Huazhu Fu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Limited labeled data makes it hard to train models from scratch in the medical domain, so an important paradigm is pre-training followed by fine-tuning. Large pre-trained models contain rich representations that can be adapted to downstream medical tasks. However, existing methods either tune all the parameters or only the task-specific layers of the pre-trained models, ignoring the input variations of medical images, and thus they are not efficient or effective. In this work, we aim to study parameter-efficient fine-tuning (PEFT) for medical image analysis and propose a dynamic visual prompt tuning method, named DVPT. It can extract knowledge beneficial to downstream tasks from large models with only a few trainable parameters. First, the frozen features are transformed by a lightweight bottleneck layer to learn the domain-specific distribution of downstream medical tasks; then a few learnable visual prompts are used as dynamic queries that conduct cross-attention with the transformed features, attempting to acquire sample-specific knowledge suitable for each sample. Finally, the features are projected back to the original feature dimension and aggregated with the frozen features. This DVPT module can be shared between different Transformer layers, further reducing the number of trainable parameters. To validate DVPT, we conduct extensive experiments with different pre-trained models on medical classification and segmentation tasks. We find that such a PEFT method can not only efficiently adapt pre-trained models to the medical domain but also brings data efficiency with partially labeled data. For example, with 0.5\% extra trainable parameters, our method not only outperforms state-of-the-art PEFT methods but even surpasses full fine-tuning by more than 2.20\% Kappa score on the medical classification task. It can save up to 60\% of the labeled data and 99\% of the storage cost of ViT-B/16.
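A hedged PyTorch sketch of the described flow, reflecting our reading of the abstract; the dimensions, prompt count, and final aggregation are guesses rather than the authors' configuration.

    import torch
    import torch.nn as nn

    class DVPTBlock(nn.Module):
        def __init__(self, dim=768, bottleneck=64, n_prompts=8, heads=4):
            super().__init__()
            self.down = nn.Linear(dim, bottleneck)   # lightweight bottleneck layer
            self.prompts = nn.Parameter(torch.randn(n_prompts, bottleneck))
            self.attn = nn.MultiheadAttention(bottleneck, heads, batch_first=True)
            self.up = nn.Linear(bottleneck, dim)

        def forward(self, frozen_feats):             # (B, N, dim), from a frozen ViT
            h = self.down(frozen_feats)              # domain-specific distribution
            q = self.prompts.expand(h.size(0), -1, -1)
            out, _ = self.attn(q, h, h)              # prompts as dynamic queries
            return frozen_feats + self.up(out).mean(1, keepdim=True)

Only down, prompts, attn and up are trainable here, and the same block could be shared across Transformer layers, which is where the parameter savings would come from.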
A Low-Complexity Beamforming Design for Beyond-Diagonal RIS aided Multi-User Networks
Authors: Tianyu Fang, Yijie Mao
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
Beyond-diagonal reconfigurable intelligent surface (BD-RIS) has been proposed recently as a novel and generalized RIS architecture that offers enhanced wave manipulation flexibility and large coverage expansion. However, the beyond-diagonal mathematical model in BD-RIS inevitably introduces additional optimization challenges in beamforming design. In this letter, we derive a closed-form solution for the BD-RIS passive beamforming matrix that maximizes the sum of the effective channel gains among users. We further propose a computationally efficient two-stage beamforming framework to jointly design the active beamforming at the base station and the passive beamforming at the BD-RIS to enhance the sum-rate of a BD-RIS aided multi-user multi-antenna network. Numerical results show that our proposed algorithm achieves a higher sum-rate while requiring less computation time compared to state-of-the-art algorithms. The proposed algorithm paves the way for practical beamforming design in BD-RIS aided wireless networks.
On the optimality of target-data-dependent kernel greedy interpolation in Sobolev Reproducing Kernel Hilbert Spaces
Authors: Gabriele Santin, Tizian Wenzel, Bernard Haasdonk
Abstract
Kernel interpolation is a versatile tool for the approximation of functions from data, and it can be proven to have some optimality properties when used with kernels related to certain Sobolev spaces. In the context of interpolation, the selection of optimal function sampling locations is a central problem, both from a practical perspective, and as an interesting theoretical question. Greedy interpolation algorithms provide a viable solution for this task, being efficient to run and provably accurate in their approximation. In this paper we close a gap that is present in the convergence theory for these algorithms by employing a recent result on general greedy algorithms. This modification leads to new convergence rates which match the optimal ones when restricted to the $P$-greedy target-data-independent selection rule, and can additionally be proven to be optimal when they fully exploit adaptivity ($f$-greedy). Besides closing this gap, the new results have some significance in the broader setting of the optimality of general approximation algorithms in Reproducing Kernel Hilbert Spaces, as they allow us to compare adaptive interpolation with non-adaptive best nonlinear approximation.
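For reference, the target-data-independent rule mentioned above is easy to state in code: P-greedy repeatedly selects the candidate point where the power function of the current interpolation set is largest. A small numpy sketch with a Gaussian kernel (the names and the choice of first point are ours):

    import numpy as np

    def p_greedy(candidates, kernel, n_points):
        X = [candidates[0]]                        # arbitrary first point
        for _ in range(n_points - 1):
            Xa = np.array(X)
            K = kernel(Xa, Xa)
            Kc = kernel(candidates, Xa)            # (n_cand, n_sel)
            w = np.linalg.solve(K, Kc.T)
            # squared power function: k(x, x) - k(x, X) K^{-1} k(X, x)
            p2 = kernel(candidates, candidates).diagonal() \
                 - np.einsum('ij,ji->i', Kc, w)
            X.append(candidates[int(np.argmax(p2))])
        return np.array(X)

    gauss = lambda A, B: np.exp(-((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
    points = p_greedy(np.random.rand(200, 2), gauss, 10)

The $f$-greedy variant replaces this criterion with one that also weighs the current residual on the target data, which is exactly the adaptivity that the new rates reward.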
Online Continual Learning for Robust Indoor Object Recognition
Abstract
Vision systems mounted on home robots need to interact with unseen classes in changing environments. Robots have limited computational resources, labelled data and storage capability. These requirements pose some unique challenges: models should adapt without forgetting past knowledge in a data- and parameter-efficient way. We characterize the problem as few-shot (FS) online continual learning (OCL), where robotic agents learn from a non-repeated stream of few-shot data, updating only a few model parameters. Additionally, such models experience variable conditions at test time, where objects may appear in different poses (e.g., horizontal or vertical) and environments (e.g., day or night). To improve the robustness of CL agents, we propose RobOCLe, which: 1) constructs an enriched feature space by computing high-order statistical moments from the embedded features of samples; and 2) computes the similarity between the high-order statistics of samples in the enriched feature space and predicts their class labels. We evaluate the robustness of CL models to train/test augmentations in various cases. We show that different moments allow RobOCLe to capture different properties of deformations, providing higher robustness with no decrease in inference speed.
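A minimal numpy sketch of the two ingredients as we read them: the enriched moment descriptor and nearest-prototype prediction. The abstract does not pin down the similarity measure, so the Euclidean distance below is an assumption.

    import numpy as np

    def moment_features(emb, orders=(2, 3, 4)):
        # emb: (n, d) embedded features of one sample; concatenate the mean
        # with higher-order central moments into one enriched descriptor
        mu = emb.mean(0)
        feats = [mu] + [((emb - mu) ** p).mean(0) for p in orders]
        return np.concatenate(feats)

    def predict(emb, prototypes):
        # prototypes: dict mapping class label -> descriptor in the enriched space
        z = moment_features(emb)
        return min(prototypes, key=lambda c: np.linalg.norm(z - prototypes[c]))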
A Fast and Map-Free Model for Trajectory Prediction in Traffics
Authors: Junhong Xiang, Jingmin Zhang, Zhixiong Nan
Abstract
Existing trajectory prediction methods have two shortcomings: (i) nearly all models rely on high-definition (HD) maps, yet map information is not always available in real traffic scenes, and building HD maps is expensive and time-consuming; and (ii) existing models usually improve prediction accuracy at the expense of computing efficiency, yet efficiency is crucial for many real applications. To handle both, this paper proposes an efficient trajectory prediction model that does not depend on traffic maps. The core idea of our model is to encode each agent's spatial-temporal information in the first stage and to explore multi-agent spatial-temporal interactions in the second stage. By combining an attention mechanism, LSTM, a graph convolution network, and a temporal transformer across the two stages, our model learns rich dynamic and interaction information for all agents. Our model achieves the highest performance among map-free methods and also exceeds most map-based state-of-the-art methods on the Argoverse dataset. In addition, it exhibits a faster inference speed than the baseline methods.
Near-Linear Time Projection onto the $\ell_{1,\infty}$ Ball; Application to Sparse Autoencoders
Authors: Guillaume Perez, Laurent Condat, Michel Barlaud
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Abstract
Looking for sparsity is nowadays crucial to speed up the training of large-scale neural networks. Projections onto the $\ell_{1,2}$ and $\ell_{1,\infty}$ balls are among the most efficient techniques to sparsify and reduce the overall cost of neural networks. In this paper, we introduce a new projection algorithm for the $\ell_{1,\infty}$ norm ball. The worst-case time complexity of this algorithm is $\mathcal{O}\big(nm+J\log(nm)\big)$ for a matrix in $\mathbb{R}^{n\times m}$. $J$ is a term that tends to 0 when the sparsity is high, and to $nm$ when the sparsity is low. The algorithm is easy to implement and guaranteed to converge to the exact solution in finite time. Moreover, we propose to incorporate the $\ell_{1,\infty}$ ball projection while training an autoencoder to enforce feature selection and sparsity of the weights. Sparsification is applied in the encoder primarily to perform feature selection, motivated by our application in biology, where only a very small part ($<2\%$) of the data is relevant. We show that, both in the biological case and in the general sparse setting, our method is the fastest.
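For readers unfamiliar with the norm (under the row-wise convention common in this line of work): the $\ell_{1,\infty}$ norm of a matrix sums, over the rows, the largest absolute entry of each row, so projecting onto its ball tends to zero out entire rows at once, which is what makes it a feature selector. A small check of the definition, ours:

    import numpy as np

    def l1_inf_norm(W):
        # sum over rows of the largest absolute entry in each row
        return np.abs(W).max(axis=1).sum()

    W = np.array([[0.5, -2.0, 0.1],
                  [0.0,  0.0, 0.0],
                  [1.0,  0.3, -1.5]])
    print(l1_inf_norm(W))   # 2.0 + 0.0 + 1.5 = 3.5

The projection algorithm itself, with its $\mathcal{O}\big(nm+J\log(nm)\big)$ bound, is the paper's contribution and is not reproduced here.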
Nonlinear Model Predictive Control with Obstacle Avoidance Constraints for Autonomous Navigation in a Canal Environment
Authors: Changyu Lee, Dongha Chung, Jonghwi Kim, Jinwhan Kim
Abstract
In this paper, we describe the development process of autonomous navigation capabilities of a small cruise boat operating in a canal environment and present the results of a field experiment conducted in the Pohang Canal, South Korea. Nonlinear model predictive control (NMPC) was used for the online trajectory planning and tracking control of the cruise boat in a narrow passage in the canal. To consider the nonlinear characteristics of boat dynamics, system identification was performed using experimental data from various test maneuvers, such as acceleration-deceleration and zigzag trials. To efficiently represent the obstacle structures in the canal environment, we parameterized the canal walls as line segments with point cloud data, captured by an onboard LiDAR sensor, and considered them as constraints for obstacle avoidance. The proposed method was implemented in a single NMPC layer, and its real-world performance was verified through experimental runs in the Pohang Canal.
Transmitter Side Beyond-Diagonal Reconfigurable Intelligent Surface for Massive MIMO Networks
Authors: Anup Mishra, Yijie Mao, Carmen D'Andrea, Stefano Buzzi, Bruno Clerckx
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
This letter focuses on a transmitter or base station (BS) side beyond-diagonal reflecting intelligent surface (BD-RIS) deployment strategy to enhance the spectral efficiency (SE) of a time-division-duplex massive multiple-input multiple-output (MaMIMO) network. In this strategy, the active antenna array utilizes a BD-RIS at the BS to serve multiple users in the downlink. Based on the knowledge of statistical channel state information (CSI), the BD-RIS coefficients matrix is optimized by employing a novel manifold algorithm, and the power control coefficients are then optimized with the objective of maximizing the minimum SE. Through numerical results we illustrate the SE performance of the proposed transmission framework and compare it with that of a conventional MaMIMO transmission for different network settings.
Cross-thread critical sections and efficient dynamic race prediction methods
Abstract
The lock set method and the partial order method are the two main approaches to guarantee that dynamic data race prediction remains efficient. There are many variations of these ideas. Common to all of them is the assumption that the events in a critical section belong to the same thread. We have evidence that critical sections in the wild do extend across thread boundaries, even if the surrounding acquire and release events occur in the same thread. We introduce the novel concept of a cross-thread critical section to capture such situations, offer a comprehensive theoretical framework, and study their impact on state-of-the-art data race analyses. For sound partial order relations such as WCP, SDP, and DCtp, the occurrence of cross-thread critical sections negatively impacts their precision. For complete partial order relations such as WDP and PWR, cross-thread critical sections help to eliminate more false positives. The same (positive) impact applies to the lock set construction. Our experimental evaluation confirms that cross-thread critical sections arise in practice. For the complete relation PWR, we are able to reduce the number of false positives. The performance overhead incurred by tracking cross-thread critical sections slows down the analysis by 10\%-20\%, on average.
Towards a population-informed approach to the definition of data-driven models for structural dynamics
Authors: G. Tsialiamanis, N. Dervilis, D.J. Wagg, K. Worden
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Abstract
Machine learning has affected the way in which many phenomena in various domains are modelled, one of these domains being structural dynamics. However, because machine-learning algorithms are problem-specific, they often fail to perform efficiently in cases of data scarcity. To deal with such issues, combinations of physics-based approaches and machine-learning algorithms have been developed. Although such methods are effective, they also require the analyst's understanding of the underlying physics of the problem. The current work is aimed at motivating the use of models which learn such relationships from a population of phenomena whose underlying physics are similar. The development of such models is motivated by the way that physics-based models, and more specifically finite element models, work. Such models are considered transferable, explainable and trustworthy, attributes which are not trivially imposed or achieved for machine-learning models. For this reason, machine-learning approaches are less trusted by industry, and forming validated models from them is often considered more difficult. To achieve such data-driven models, a population-based scheme is followed here and two different machine-learning algorithms from the meta-learning domain are used. The two algorithms are the model-agnostic meta-learning (MAML) algorithm and the conditional neural processes (CNP) model. The algorithms seem to perform as intended and outperform a traditional machine-learning algorithm at approximating the quantities of interest. Moreover, they exhibit behaviour similar to traditional machine-learning algorithms (e.g. neural networks or Gaussian processes) concerning their performance as a function of the available structures in the training population.
Agricultural Robotic System: The Automation of Detection and Speech Control
Authors: Yang Wenkai, Ji Ruihang, Yue Yiran, Gu Zhonghan, Shu Wanyang, Sam Ge Shuzhi
Abstract
Agriculture industries often face challenges in manual tasks such as planting, harvesting, fertilizing, and detection, which can be time-consuming and prone to errors. The "Agricultural Robotic System" project addresses these issues through a modular design that integrates advanced visual, speech recognition, and robotic technologies. The system comprises separate but interconnected modules for vision detection and speech recognition, creating a flexible and adaptable solution. The vision detection module uses computer vision techniques, trained on YOLOv5 and deployed on the Jetson Nano in TensorRT format, to accurately detect and identify different items. A robotic arm module then precisely controls the picking up of seedlings or seeds and arranges them in specific locations. The speech recognition module enables intelligent human-robot interaction, allowing for efficient and intuitive control of the system. This modular approach improves the efficiency and accuracy of agricultural tasks, demonstrating the potential of robotics in the agricultural industry.
Amortised Experimental Design and Parameter Estimation for User Models of Pointing
Authors: Antti Keurulainen, Isak Westerlund, Oskar Keurulainen, Andrew Howes
Abstract
User models play an important role in interaction design, supporting automation of interaction design choices. In order to do so, model parameters must be estimated from user data. While very large amounts of user data are sometimes required, recent research has shown how experiments can be designed so as to gather data and infer parameters as efficiently as possible, thereby minimising the data requirement. In the current article, we investigate a variant of these methods that amortises the computational cost of designing experiments by training a policy for choosing experimental designs with simulated participants. Our solution learns which experiments provide the most useful data for parameter estimation by interacting with in-silico agents sampled from the model space, thereby using synthetic data rather than vast amounts of human data. The approach is demonstrated for three progressively complex models of pointing.
Amortised Design Optimization for Item Response Theory
Authors: Antti Keurulainen, Isak Westerlund, Oskar Keurulainen, Andrew Howes
Abstract
Item Response Theory (IRT) is a well known method for assessing responses from humans in education and psychology. In education, IRT is used to infer student abilities and characteristics of test items from student responses. Interactions with students are expensive, calling for methods that efficiently gather information for inferring student abilities. Methods based on Optimal Experimental Design (OED) are computationally costly, making them inapplicable for interactive applications. In response, we propose incorporating amortised experimental design into IRT. Here, the computational cost is shifted to a precomputing phase by training a Deep Reinforcement Learning (DRL) agent with synthetic data. The agent is trained to select optimally informative test items for the distribution of students, and to conduct amortised inference conditioned on the experiment outcomes. During deployment the agent estimates parameters from data, and suggests the next test item for the student, in close to real-time, by taking into account the history of experiments and outcomes.
Chit-Chat or Deep Talk: Prompt Engineering for Process Mining
Authors: Urszula Jessen, Michal Sroka, Dirk Fahland
Abstract
This research investigates the application of Large Language Models (LLMs) to augment conversational agents in process mining, aiming to tackle its inherent complexity and diverse skill requirements. While LLM advancements present novel opportunities for conversational process mining, generating efficient outputs is still a hurdle. We propose an innovative approach that amends many issues in existing solutions, informed by prior research on Natural Language Processing (NLP) for conversational agents. Leveraging LLMs, our framework improves both accessibility and agent performance, as demonstrated by experiments on public question and data sets. Our research sets the stage for future explorations into LLMs' role in process mining and concludes with propositions for enhancing LLM memory, implementing real-time user testing, and examining diverse data sets.
TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic Tree-Based Memory Network
Abstract
Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment. In recent years, machine learning models have been proposed for speeding up patient recruitment via automatically matching patients with clinical trials based on longitudinal patient electronic health records (EHR) data and the eligibility criteria of clinical trials. However, they either depend on trial-specific expert rules that cannot expand to other trials or perform matching at a very general level with a black-box model whose lack of interpretability makes the results difficult to adopt. To provide accurate and interpretable patient-trial matching, we introduce a personalized dynamic tree-based memory network model named TREEMENT. It utilizes hierarchical clinical ontologies to expand the personalized patient representation learned from sequential EHR data, and then uses an attentional beam-search query learned from eligibility criteria embedding to offer a granular level of alignment for improved performance and interpretability. We evaluated TREEMENT against existing models on real-world datasets and demonstrated that TREEMENT outperforms the best baseline by 7% in terms of error reduction in criteria-level matching and achieves state-of-the-art results in its trial-level matching ability. Furthermore, we also show that TREEMENT offers good interpretability, making the model results easier to adopt.
ProtoCaps: A Fast and Non-Iterative Capsule Network Routing Method
Authors: Miles Everett, Mingjun Zhong, Georgios Leontidis
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Capsule Networks have emerged as a powerful class of deep learning architectures, known for robust performance with relatively few parameters compared to Convolutional Neural Networks (CNNs). However, their inherent efficiency is often overshadowed by their slow, iterative routing mechanisms, which establish connections between Capsule layers and pose computational challenges that result in an inability to scale. In this paper, we introduce a novel, non-iterative routing mechanism inspired by trainable prototype clustering. This approach aims to mitigate computational complexity while retaining, if not enhancing, performance efficacy. Furthermore, we harness a shared Capsule subspace, negating the need to project each lower-level Capsule to each higher-level Capsule, thereby significantly reducing memory requirements during training. Our approach demonstrates superior results compared to the current best non-iterative Capsule Network, and we also test on the Imagewoof dataset, which is too computationally demanding for iterative approaches to handle efficiently. Our findings underscore the potential of the proposed methodology in enhancing the operational efficiency and performance of Capsule Networks, paving the way for their application in increasingly complex computational scenarios.
Fine-grained Text-Video Retrieval with Frozen Image Encoders
Abstract
State-of-the-art text-video retrieval (TVR) methods typically utilize CLIP and cosine similarity for efficient retrieval. Meanwhile, cross-attention methods, which employ a transformer decoder to compute attention between each text query and all frames in a video, offer a more comprehensive interaction between text and videos. However, these methods lack important fine-grained spatial information as they directly compute attention between text and video-level tokens. To address this issue, we propose CrossTVR, a two-stage text-video retrieval architecture. In the first stage, we leverage existing TVR methods with a cosine similarity network for efficient text/video candidate selection. In the second stage, we propose a novel decoupled video-text cross-attention module to capture fine-grained multimodal information in spatial and temporal dimensions. Additionally, we employ the frozen CLIP model strategy in fine-grained retrieval, enabling scalability to larger pre-trained vision models like ViT-G, resulting in improved retrieval performance. Experiments on text-video retrieval datasets demonstrate the effectiveness and scalability of our proposed CrossTVR compared to state-of-the-art approaches.
TinyTrain: Deep Neural Network Training at the Extreme Edge
Authors: Young D. Kwon, Rui Li, Stylianos I. Venieris, Jagmohan Chauhan, Nicholas D. Lane, Cecilia Mascolo
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
On-device training is essential for user personalisation and privacy. With the pervasiveness of IoT devices and microcontroller units (MCU), this task becomes more challenging due to the constrained memory and compute resources, and the limited availability of labelled user data. Nonetheless, prior works neglect the data scarcity issue, require excessively long training time (e.g. a few hours), or induce substantial accuracy loss ($\geq$10\%). We propose TinyTrain, an on-device training approach that drastically reduces training time by selectively updating parts of the model and explicitly coping with data scarcity. TinyTrain introduces a task-adaptive sparse-update method that dynamically selects the layer/channel based on a multi-objective criterion that jointly captures user data, the memory, and the compute capabilities of the target device, leading to high accuracy on unseen tasks with reduced computation and memory footprint. TinyTrain outperforms vanilla fine-tuning of the entire network by 3.6-5.0\% in accuracy, while reducing the backward-pass memory and computation cost by up to 2,286$\times$ and 7.68$\times$, respectively. Targeting broadly used real-world edge devices, TinyTrain achieves 9.5$\times$ faster and 3.5$\times$ more energy-efficient training over status-quo approaches, and 2.8$\times$ smaller memory footprint than SOTA approaches, while remaining within the 1 MB memory envelope of MCU-grade platforms.
Impact of Disentanglement on Pruning Neural Networks
Authors: Carl Shneider, Peyman Rostami, Anis Kacem, Nilotpal Sinha, Abd El Rahman Shabayek, Djamila Aouada
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)
Abstract
Deploying deep learning neural networks on edge devices, to accomplish task specific objectives in the real-world, requires a reduction in their memory footprint, power consumption, and latency. This can be realized via efficient model compression. Disentangled latent representations produced by variational autoencoder (VAE) networks are a promising approach for achieving model compression because they mainly retain task-specific information, discarding useless information for the task at hand. We make use of the Beta-VAE framework combined with a standard criterion for pruning to investigate the impact of forcing the network to learn disentangled representations on the pruning process for the task of classification. In particular, we perform experiments on MNIST and CIFAR10 datasets, examine disentanglement challenges, and propose a path forward for future works.
TUNeS: A Temporal U-Net with Self-Attention for Video-based Surgical Phase Recognition
Abstract
To enable context-aware computer assistance in the operating room of the future, cognitive systems need to understand automatically which surgical phase is being performed by the medical team. The primary source of information for surgical phase recognition is typically video, which presents two challenges: extracting meaningful features from the video stream and effectively modeling temporal information in the sequence of visual features. For temporal modeling, attention mechanisms have gained popularity due to their ability to capture long-range dependencies. In this paper, we explore design choices for attention in existing temporal models for surgical phase recognition and propose a novel approach that does not resort to local attention or regularization of attention weights: TUNeS is an efficient and simple temporal model that incorporates self-attention at the coarsest stage of a U-Net-like structure. In addition, we propose to train the feature extractor, a standard CNN, together with an LSTM on preferably long video segments, i.e., with long temporal context. In our experiments, all temporal models performed better on top of feature extractors that were trained with longer temporal context. On top of these contextualized features, TUNeS achieves state-of-the-art results on Cholec80.
As large as it gets: Learning infinitely large Filters via Neural Implicit Functions in the Fourier Domain
Authors: Julia Grabinski, Janis Keuper, Margret Keuper
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Motivated by the recent trend towards the usage of larger receptive fields for more context-aware neural networks in vision applications, we aim to investigate how large these receptive fields really need to be. To facilitate such a study, several challenges need to be addressed, most importantly: (i) we need to provide an effective way for models to learn large filters (potentially as large as the input data) without increasing their memory consumption during training or inference, (ii) the study of filter sizes has to be decoupled from other effects such as the network width or the number of learnable parameters, and (iii) the employed convolution operation should be a plug-and-play module that can replace any conventional convolution in a Convolutional Neural Network (CNN) and allow for an efficient implementation in current frameworks. To this end, we propose to learn not spatial but frequency representations of filter weights as neural implicit functions, such that even infinitely large filters can be parameterized by only a few learnable weights. The resulting neural implicit frequency CNNs are the first models to achieve results on par with the state-of-the-art on large image classification benchmarks while executing convolutions solely in the frequency domain, and they can be employed within any CNN architecture. They allow us to provide an extensive analysis of the learned receptive fields. Interestingly, our analysis shows that, although the proposed networks could learn very large convolution kernels, the learned filters practically translate into well-localized and relatively small convolution kernels in the spatial domain.
6G Network Business Support System
Authors: Ye Ouyang, Yaqin Zhang, Peng Wang, Yunxin Liu, Wen Qiao, Jun Zhu, Yang Liu, Feng Zhang, Shuling Wang, Xidong Wang
Abstract
6G is the next-generation intelligent and integrated digital information infrastructure, characterized by ubiquitous interconnection, native intelligence, multi-dimensional perception, global coverage, green and low-carbon operation, native network security, etc. 6G will realize the transition from serving people and people-thing communication to supporting the efficient connection of intelligent agents, comprehensively leading the digital, intelligent, and green transformation of the economy and society. As the core support system for mobile communication networks, the 6G BSS needs to integrate with the new business models brought about by the development of the next-generation Internet and IT, upgrading from "network-centric" to "business- and service-centric" and "customer-centric". 6G OSS and BSS systems need to strengthen their integration to improve operational efficiency and benefits for customers by connecting the digital intelligence support capabilities on both the supply and demand sides. This paper provides a detailed introduction to the overall vision, potential key technologies, and functional architecture of 6G BSS systems. It also presents an evolutionary roadmap and technological prospects for the BSS systems from 5G to 6G.
Optimizing the extended Fourier Mellin Transformation Algorithm
Abstract
With the increasing application of robots, stable and efficient Visual Odometry (VO) algorithms are becoming increasingly important. Based on the Fourier Mellin Transformation (FMT) algorithm, the extended Fourier Mellin Transformation (eFMT) is an image registration approach that can be applied to downward-looking cameras, for example on aerial and underwater vehicles. eFMT extends FMT to multi-depth scenes and thus to more application scenarios. It is a visual odometry method that estimates the pose transformation among three overlapping images. On this basis, we develop an optimized eFMT algorithm that improves certain aspects of the method and combines it with back-end optimization for the small loop of three consecutive frames. To this end, we investigate the extraction of uncertainty information from the eFMT registration, the related objective function, and the graph-based optimization. Finally, we design a series of experiments to investigate the properties of this approach and compare it with other VO and SLAM (Simultaneous Localization and Mapping) algorithms. The results show the superior accuracy and speed of our o-eFMT approach, which is published as open source.
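The registration core that FMT-style methods build on is phase correlation; a minimal numpy sketch of the translation-estimation step is given below. Rotation and scale, which FMT recovers via an additional log-polar resampling of the spectrum, and eFMT's multi-depth handling are omitted.

```python
import numpy as np

def phase_correlation(img_a, img_b):
    """Estimate the integer (dy, dx) translation between two equally
    sized grayscale images via the normalized cross-power spectrum."""
    Fa = np.fft.fft2(img_a)
    Fb = np.fft.fft2(img_b)
    cross_power = Fa * np.conj(Fb)
    cross_power /= np.abs(cross_power) + 1e-12   # keep phase only
    corr = np.fft.ifft2(cross_power).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # map peaks in the upper half-range to negative shifts
    if dy > img_a.shape[0] // 2:
        dy -= img_a.shape[0]
    if dx > img_a.shape[1] // 2:
        dx -= img_a.shape[1]
    return dy, dx
```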
Validation of Modern JSON Schema: Formalization and Complexity
Authors: Lyes Attouche, Mohamed-Amine Baazizi, Dario Colazzo, Giorgio Ghelli, Carlo Sartiani, Stefanie Scherzinger
Subjects: Databases (cs.DB); Programming Languages (cs.PL)
Abstract
JSON Schema is the de-facto standard schema language for JSON data. The language went through many minor revisions, but the most recent versions of the language added two novel features, dynamic references and annotation-dependent validation, that change the evaluation model. Modern JSON Schema is the name used to indicate all versions from Draft 2019-09, which are characterized by these new features, while Classical JSON Schema is used to indicate the previous versions. These new "modern" features make the schema language quite difficult to understand, and have generated many discussions about the correct interpretation of their official specifications; for this reason we undertook the task of their formalization. During this process, we also analyzed the complexity of data validation in Modern JSON Schema, with the idea of confirming the PTIME complexity of Classical JSON Schema validation, and we were surprised to discover a completely different truth: data validation, which is expected to be an extremely efficient process, acquires, with Modern JSON Schema features, a PSPACE complexity. In this paper, we give the first formal description of Modern JSON Schema, which we consider a central contribution of the work that we present here. We then prove that its data validation problem is PSPACE-complete. We prove that the origin of the problem lies in dynamic references, and not in annotation-dependent validation. We study the schema and data complexities, showing that the problem is PSPACE-complete with respect to the schema size even with a fixed instance, but is in PTIME when the schema is fixed and only the instance size is allowed to vary. Finally, we run experiments that show that there are families of schemas where the difference in asymptotic complexity between dynamic and static references is extremely visible, even with small schemas.
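To make the dynamic-reference feature concrete, below is a minimal schema pair in the spirit of the specification's extensible-tree example, written as Python dicts; `$dynamicRef`/`$dynamicAnchor` is the Draft 2020-12 spelling of the feature (Draft 2019-09 used `$recursiveRef`/`$recursiveAnchor`). The URIs are illustrative.

```python
# A schema for trees whose node type can be re-bound by an extending schema.
tree = {
    "$id": "https://example.com/tree",
    "$dynamicAnchor": "node",
    "type": "object",
    "properties": {
        "data": True,
        "children": {"type": "array", "items": {"$dynamicRef": "#node"}},
    },
}

# The extending schema's $dynamicAnchor re-binds "node" for the *entire*
# recursion, so every nested node is validated as a strict node. This
# runtime re-binding is what a static $ref cannot express, and it is the
# source of the PSPACE-hardness shown in the paper.
strict_tree = {
    "$id": "https://example.com/strict-tree",
    "$dynamicAnchor": "node",
    "$ref": "https://example.com/tree",
    "unevaluatedProperties": False,
}
```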
Surface and Underwater Trash Detection in Video Using Robust and Efficient Post-Processing and Tubelet-Level Bounding Box Linking
Authors: Bryan Tjandra, Made S. N. Negara, Nyoo S. C. Handoko
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
Indonesia, as a maritime country, has a significant portion of its territory covered by water. Ineffective waste management has resulted in a considerable amount of trash in Indonesian waters, leading to various issues. The development of an automated trash-collecting robot can be a solution to address this problem. The robot requires a system capable of detecting objects in motion, such as in videos. However, using naive object detection methods in videos has limitations, particularly when image focus is reduced and the target object is obstructed by other objects. This paper explains methods that can be applied to perform video object detection for an automated trash-collecting robot. The study utilizes the YOLOv5 model and the Robust & Efficient Post Processing (REPP) method, along with tubelet-level bounding box linking, on the FloW and Roboflow datasets. The combination of these methods enhances the performance of naive object detection from YOLOv5 by considering the detection results in adjacent frames. The results show that the post-processing stage and tubelet-level bounding box linking can improve the quality of detection, achieving approximately 3% better performance compared to YOLOv5 alone. The use of these methods has the potential to detect surface and underwater trash and can be applied to a real-time image-based trash-collecting robot. Implementing this system is expected to mitigate the damage already caused by trash and to improve Indonesia's waste management system in the future.
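A bare-bones sketch of the tubelet-level linking idea follows: detections in adjacent frames are greedily chained by IoU into tubelets, whose per-frame scores can then be smoothed jointly (the REPP re-scoring itself is omitted). The threshold is illustrative, and a real implementation would use a proper assignment step rather than greedy first-match.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def link_tubelets(frames, iou_thr=0.5):
    """Greedily link per-frame detections into tubelets.
    frames: list (one entry per video frame) of lists of boxes."""
    tubelets = []
    for t, dets in enumerate(frames):
        for box in dets:
            best, best_iou = None, iou_thr
            for tube in tubelets:
                last_t, last_box = tube[-1]
                # a tubelet already extended at frame t cannot match again
                if last_t == t - 1 and iou(last_box, box) >= best_iou:
                    best, best_iou = tube, iou(last_box, box)
            if best is not None:
                best.append((t, box))
            else:
                tubelets.append([(t, box)])
    return tubelets
```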
Scientific Exploration of Challenging Planetary Analog Environments with a Team of Legged Robots
Authors: Philip Arm, Gabriel Waibel, Jan Preisig, Turcan Tuna, Ruyi Zhou, Valentin Bickel, Gabriela Ligeza, Takahiro Miki, Florian Kehl, Hendrik Kolvenbach, Marco Hutter
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
The interest in exploring planetary bodies for scientific investigation and in-situ resource utilization is ever-rising. Yet, many sites of interest are inaccessible to state-of-the-art planetary exploration robots because of the robots' inability to traverse steep slopes, unstructured terrain, and loose soil. Additionally, current single-robot approaches only allow a limited exploration speed and a single set of skills. Here, we present a team of legged robots with complementary skills for exploration missions in challenging planetary analog environments. We equipped the robots with an efficient locomotion controller, a mapping pipeline for online and post-mission visualization, instance segmentation to highlight scientific targets, and scientific instruments for remote and in-situ investigation. Furthermore, we integrated a robotic arm on one of the robots to enable high-precision measurements. Legged robots can swiftly navigate representative terrains, such as granular slopes beyond 25 degrees, loose soil, and unstructured terrain, highlighting their advantages compared to wheeled rover systems. We successfully verified the approach in analog deployments at the BeyondGravity ExoMars rover testbed, in a quarry in Switzerland, and at the Space Resources Challenge in Luxembourg. Our results show that a team of legged robots with advanced locomotion, perception, and measurement skills, as well as task-level autonomy, can conduct successful, effective missions in a short time. Our approach enables the scientific exploration of planetary target sites that are currently out of human and robotic reach.
An efficient displacement-based isogeometric formulation for geometrically exact viscoelastic beams
Abstract
We propose a novel approach to the linear viscoelastic problem of shear-deformable geometrically exact beams. The generalized Maxwell model for one-dimensional solids is here efficiently extended to the case of arbitrarily curved beams undergoing finite displacements and rotations. High efficiency is achieved by combining a series of distinguishing features, namely: i) the formulation is displacement-based, therefore no additional unknowns, other than incremental displacements and rotations, are needed for the internal variables associated with the rate-dependent material; ii) the governing equations are discretized in space using the isogeometric collocation method, meaning that element integration is bypassed entirely; iii) finite rotations are updated using the incremental rotation vector, leading to two main benefits: a minimum number of rotation unknowns (the three components of the incremental rotation vector) and no singularity problems; iv) the same $\rm SO(3)$-consistent linearization of the governing equations and update procedures as for non-rate-dependent linear elastic materials can be used; v) a standard second-order accurate time integration scheme is made consistent with the underlying geometric structure of the kinematic problem. Moreover, taking full advantage of the isogeometric analysis features, the formulation permits accurately representing beams and beam structures with highly complex initial shape and topology, paving the way for a large number of potential applications in the field of architectured materials, meta-materials, morphing/programmable objects, topological optimization, etc. Numerical applications are finally presented to demonstrate the attributes and potential of the proposed formulation.
Curvature-based Clustering on Graphs
Authors: Yu Tian, Zachary Lubberts, Melanie Weber
Subjects: Social and Information Networks (cs.SI); Discrete Mathematics (cs.DM); Machine Learning (cs.LG); Combinatorics (math.CO); Machine Learning (stat.ML)
Abstract
Unsupervised node clustering (or community detection) is a classical graph learning task. In this paper, we study algorithms that exploit the geometry of the graph to identify densely connected substructures, which form clusters or communities. Our method implements discrete Ricci curvatures and their associated geometric flows, under which the edge weights of the graph evolve to reveal its community structure. We consider several discrete curvature notions and analyze the utility of the resulting algorithms. In contrast to prior literature, we study not only single-membership community detection, where each node belongs to exactly one community, but also mixed-membership community detection, where communities may overlap. For the latter, we argue that it is beneficial to perform community detection on the line graph, i.e., the graph's dual. We provide both theoretical and empirical evidence for the utility of our curvature-based clustering algorithms. In addition, we give several results on the relationship between the curvature of a graph and that of its dual, which enable the efficient implementation of our proposed mixed-membership community detection approach and which may be of independent interest for curvature-based network analysis.
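As a concrete instance of the general recipe (compute a discrete curvature per edge, then evolve the edge weights under the associated flow), the sketch below uses the simple augmented Forman-Ricci curvature, one of several curvature notions of this kind, with a naive explicit flow step; it is an illustration of the mechanism, not the paper's exact algorithm. After a few flow steps, intra-community edges (positive curvature) shrink while bridges between communities (negative curvature) stretch, so thresholding the weights reveals the communities.

```python
import networkx as nx

def forman_curvature(G, u, v):
    """Augmented Forman-Ricci curvature of edge (u, v) on an unweighted
    graph: 4 - deg(u) - deg(v) + 3 * (#triangles through the edge)."""
    triangles = len(set(G[u]) & set(G[v]))
    return 4 - G.degree(u) - G.degree(v) + 3 * triangles

def curvature_flow_step(G, step=0.05):
    """One explicit flow step: shrink weights on positively curved
    (intra-community) edges, stretch negatively curved bridges."""
    for u, v in G.edges():
        kappa = forman_curvature(G, u, v)
        w = G[u][v].get("weight", 1.0)
        G[u][v]["weight"] = max(w * (1.0 - step * kappa), 1e-3)
```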
Robust Driving Policy Learning with Guided Meta Reinforcement Learning
Authors: Kanghoon Lee, Jiachen Li, David Isele, Jinkyoo Park, Kikuo Fujimura, Mykel J. Kochenderfer
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Abstract
Although deep reinforcement learning (DRL) has shown promising results for autonomous navigation in interactive traffic scenarios, existing work typically adopts a fixed behavior policy to control social vehicles in the training environment. This may cause the learned driving policy to overfit the environment, making it difficult to interact well with vehicles with different, unseen behaviors. In this work, we introduce an efficient method to train diverse driving policies for social vehicles as a single meta-policy. By randomizing the interaction-based reward functions of social vehicles, we can generate diverse objectives and efficiently train the meta-policy through guiding policies that achieve specific objectives. We further propose a training strategy to enhance the robustness of the ego vehicle's driving policy using the environment where social vehicles are controlled by the learned meta-policy. Our method successfully learns an ego driving policy that generalizes well to unseen situations with out-of-distribution (OOD) social agents' behaviors in a challenging uncontrolled T-intersection scenario.
Adversarial Latent Autoencoder with Self-Attention for Structural Image Synthesis
Authors: Jiajie Fan, Laure Vuaille, Hao Wang, Thomas Bäck
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computational Engineering, Finance, and Science (cs.CE); Image and Video Processing (eess.IV)
Abstract
Generative Engineering Design approaches driven by Deep Generative Models (DGM) have been proposed to facilitate industrial engineering processes. In such processes, designs often come in the form of images, such as blueprints, engineering drawings, and CAD models, depending on the level of detail. DGMs have been successfully employed for the synthesis of natural images, e.g., displaying animals, human faces and landscapes. However, industrial design images are fundamentally different from natural scenes in that they contain rich structural patterns and long-range dependencies, which are challenging for convolution-based DGMs to generate. Moreover, the DGM-driven generation process is typically triggered by random noise inputs, which yields unpredictable samples and thus precludes efficient industrial design exploration. We tackle these challenges by proposing a novel model, Self-Attention Adversarial Latent Autoencoder (SA-ALAE), which allows generating feasible design images of complex engineering parts. With SA-ALAE, users can not only explore novel variants of an existing design, but also control the generation process by operating in latent space. The potential of SA-ALAE is shown by generating engineering blueprints in a real automotive design task.
LightPath: Lightweight and Scalable Path Representation Learning
Authors: Sean Bin Yang, Jilin Hu, Chenjuan Guo, Bin Yang, Christian S. Jensen
Abstract
Movement paths are used widely in intelligent transportation and smart city applications. To serve such applications, path representation learning aims to provide compact representations of paths that enable efficient and accurate operations when used for different downstream tasks such as path ranking and travel cost estimation. In many cases, it is attractive that the path representation learning is lightweight and scalable; in resource-limited environments and under green computing limitations, it is essential. Yet, existing path representation learning studies focus on accuracy and pay at most secondary attention to resource consumption and scalability. We propose a lightweight and scalable path representation learning framework, termed LightPath, that aims to reduce resource consumption and achieve scalability without affecting accuracy, thus enabling broader applicability. More specifically, we first propose a sparse auto-encoder that ensures that the framework achieves good scalability with respect to path length. Next, we propose a relational reasoning framework to enable faster training of more robust sparse path encoders. We also propose global-local knowledge distillation to further reduce the size and improve the performance of sparse path encoders. Finally, we report extensive experiments on two real-world datasets to offer insight into the efficiency, scalability, and effectiveness of the proposed framework.
Keyword: faster
Variable Independence in Linear Real Arithmetic
Authors: Alexander Mayorov
Subjects: Logic in Computer Science (cs.LO); Databases (cs.DB)
Abstract
Variable independence and decomposability are algorithmic techniques for simplifying logical formulas by tearing apart connections between free variables. These techniques were originally proposed to speed up query evaluation in constraint databases, in particular by representing the query as a Boolean combination of formulas with no interconnected variables. They also have many other applications in SMT, string analysis, databases, automata theory and other areas. However, the precise complexity of variable independence and decomposability has been left open especially for the quantifier-free theory of linear real arithmetic (LRA), which is central in database applications. We introduce a novel characterization of formulas admitting decompositions and use it to show that it is coNP-complete to decide variable decomposability over LRA. As a corollary, we obtain that deciding variable independence is in $ \Sigma_2^p $. These results substantially improve the best known double-exponential time algorithms for variable decomposability and independence. In many practical applications, it is crucial to be able to efficiently eliminate connections between variables whenever possible. We design and implement an algorithm for this problem, which is optimal in theory, exponentially faster compared to the current state-of-the-art algorithm and efficient on various microbenchmarks. In particular, our algorithm is the first one to overcome a fundamental barrier between non-discrete and discrete first-order theories. Formulas arising in practice often have few or even no free variables that are perfectly independent. In this case, our algorithm can compute a best-possible approximation of a decomposition, which can be used to optimize database queries by exploiting partial variable independence, which is present in almost every logical formula or database query constraint.
Efficient Inverse-designed Structural Infill for Complex Engineering Structures
Authors: Peter Dørffler Ladegaard Jensen, Tim Felle Olsen, J. Andreas Bærentzen, Niels Aage, Ole Sigmund
Subjects: Graphics (cs.GR); Computational Engineering, Finance, and Science (cs.CE); Numerical Analysis (math.NA); Optimization and Control (math.OC)
Abstract
Inverse design of high-resolution and fine-detailed 3D lightweight mechanical structures is notoriously expensive due to the need for vast computational resources and the use of very fine-scaled complex meshes. Furthermore, in designing for additive manufacturing, infill is often neglected as a component of the optimized structure. In this paper, both concerns are addressed using a de-homogenization topology optimization procedure on complex engineering structures discretized by 3D unstructured hexahedrals. Using a rectangular-hole microstructure (reminiscent of the stiffness-optimal orthogonal rank-3 multi-scale) as a base material for the multi-scale optimization, a coarse-scale optimized geometry can be obtained using homogenization-based topology optimization. Due to the microstructure periodicity, this coarse-scale geometry can be up-sampled to a fine physical geometry with optimized infill, with minor loss in structural performance and at a fraction of the cost of a fine-scale solution. The upsampling on 3D unstructured grids is achieved through stream surface tracing, which aligns with the optimized local orientation. The periodicity of the physical geometry can be tuned such that the material serves as a structural component and also as an efficient infill for additive manufacturing designs. The method is demonstrated through three examples. It achieves structural performance comparable to state-of-the-art methods while reducing computational time substantially relative to the baseline method. By allowing multiple active layers, the mapped solution becomes more mechanically stable, leading to an increased critical buckling load factor without additional computational expense. The proposed approach achieves promising results: benchmarking against large-scale SIMP models demonstrates computational efficiency improvements of up to 250 times.
Generating Redstone Style Cities in Minecraft
Authors: Shuo Huang, Chengpeng Hu, Julian Togelius, Jialin Liu
Abstract
Procedurally generating cities in Minecraft provides players with more diverse scenarios and could help understand and improve the design of cities in other digital worlds and the real world. This paper presents a city generator that was submitted as an entry to the 2023 edition of the Minecraft Settlement Generation Competition. The generation procedure is composed of six main steps, namely vegetation clearing, terrain reshaping, building layout generation, route planning, streetlight placement, and wall construction. Three algorithms, a heuristic-based algorithm, an evolving layout algorithm, and a random one, are applied to generate the building layout, thus determining where to place different redstone-style buildings, and are tested by generating cities on random maps in limited time. Experimental results show that the heuristic-based algorithm is capable of finding an acceptable building layout faster on flat maps, while the evolving layout algorithm performs better on rugged maps. A user study is conducted to compare our generator with outstanding entries of the competition's 2022 edition using the competition's evaluation criteria, and it shows that our generator performs well in the adaptation and functionality criteria.
Fix your downsampling ASAP! Be natively more robust via Aliasing and Spectral Artifact free Pooling
Authors: Julia Grabinski, Janis Keuper, Margret Keuper
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Convolutional neural networks encode images through a sequence of convolutions, normalizations and non-linearities, as well as downsampling operations, into potentially strong semantic embeddings. Yet, previous work showed that even slight mistakes during sampling, leading to aliasing, can be directly attributed to the networks' lack of robustness. To address such issues and facilitate simpler and faster adversarial training, [12] recently proposed FLC pooling, a method for provably alias-free downsampling - in theory. In this work, we conduct a further analysis through the lens of signal processing and find that such current pooling methods, which address aliasing in the frequency domain, are still prone to spectral leakage artifacts. Hence, we propose aliasing and spectral artifact-free pooling, short ASAP. While ASAP introduces only a few modifications to FLC pooling, networks using it as their downsampling method exhibit higher native robustness against common corruptions, a property that FLC pooling was missing. ASAP also increases native robustness against adversarial attacks on high and low resolution data while maintaining similar clean accuracy or even outperforming the baseline.
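The following numpy sketch illustrates the underlying idea on a single channel: downsampling by cropping the low-frequency part of the spectrum (the FLC-pooling principle), with an optional window that tapers the spectrum in the spirit of ASAP's fix for spectral leakage. It is a simplified illustration, not the paper's exact method, and assumes the side lengths are divisible by 4.

```python
import numpy as np

def frequency_downsample(x, window=True):
    """Downsample a 2D signal by 2x by cropping its central (low)
    frequencies, which is alias-free by construction; the optional
    Hamming window tapers the spectrum to reduce spectral leakage."""
    H, W = x.shape  # assumed divisible by 4
    X = np.fft.fftshift(np.fft.fft2(x))
    h, w = H // 2, W // 2
    crop = X[H // 4: H // 4 + h, W // 4: W // 4 + w]
    if window:
        crop = crop * np.outer(np.hamming(h), np.hamming(w))
    # the factor 4 compensates the smaller inverse-FFT normalization
    return np.fft.ifft2(np.fft.ifftshift(crop)).real / 4
```

In a CNN, the same crop would be applied channel-wise to feature maps in place of strided convolution or pooling.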
A Fast and Map-Free Model for Trajectory Prediction in Traffics
Authors: Junhong Xiang, Jingmin Zhang, Zhixiong Nan
Abstract
Existing trajectory prediction methods have two shortcomings: (i) nearly all models rely on high-definition (HD) maps, yet map information is not always available in real traffic scenes, and HD map-building is expensive and time-consuming; and (ii) existing models usually improve prediction accuracy at the expense of computing efficiency, yet efficiency is crucial for many real applications. To address both shortcomings, this paper proposes an efficient trajectory prediction model that does not depend on traffic maps. The core idea of our model is to encode each agent's spatial-temporal information in the first stage and to explore multi-agent spatial-temporal interactions in the second stage. By comprehensively utilizing attention mechanisms, LSTMs, graph convolution networks, and a temporal transformer across the two stages, our model is able to learn rich dynamic and interaction information for all agents. Our model achieves the highest performance among existing map-free methods and also exceeds most map-based state-of-the-art methods on the Argoverse dataset. In addition, our model exhibits faster inference speed than the baseline methods.
Adversarial Likelihood Estimation with One-way Flows
Authors: Omri Ben-Dov, Pravir Singh Gupta, Victoria Abrevaya, Michael J. Black, Partha Ghosh
Abstract
Generative Adversarial Networks (GANs) can produce high-quality samples, but do not provide an estimate of the probability density around the samples. However, it has been noted that maximizing the log-likelihood within an energy-based setting can lead to an adversarial framework where the discriminator provides unnormalized density (often called energy). We further develop this perspective, incorporate importance sampling, and show that 1) Wasserstein GAN performs a biased estimate of the partition function, and we propose instead to use an unbiased estimator; and 2) when optimizing for likelihood, one must maximize generator entropy, which is hypothesized to provide better mode coverage. Different from previous works, we explicitly compute the density of the generated samples. This is the key enabler for designing an unbiased estimator of the partition function and computing the generator entropy term. The generator density is obtained via a new type of flow network, called a one-way flow network, which is less constrained in terms of architecture, as it does not require a tractable inverse function. Our experimental results show that we converge faster, produce sample quality comparable to GANs with similar architecture, successfully avoid over-fitting to commonly used datasets, and produce smooth low-dimensional latent representations of the training data.
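The importance-sampling identity behind the unbiased partition-function estimator is $Z = \mathbb{E}_{x \sim q}[e^{-E(x)}/q(x)]$, which requires the density $q$ of the generated samples, exactly what the one-way flow supplies. A numpy toy with a known answer (the energy and proposal here are illustrative, not the paper's model):

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(x):
    return 0.5 * x ** 2          # toy energy; true Z = sqrt(2 * pi)

# Gaussian proposal q = N(0, s^2) whose density we know explicitly,
# playing the role of the flow-provided generator density.
s = 1.5
x = rng.normal(0.0, s, size=100_000)
log_q = -0.5 * (x / s) ** 2 - np.log(s * np.sqrt(2 * np.pi))

# unbiased estimator: Z_hat = mean over samples of exp(-E(x)) / q(x)
Z_hat = np.mean(np.exp(-energy(x) - log_q))
print(Z_hat, np.sqrt(2 * np.pi))  # both approximately 2.5066
```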
DISA: DIfferentiable Similarity Approximation for Universal Multimodal Registration
Authors: Matteo Ronchetti, Wolfgang Wein, Nassir Navab, Oliver Zettinig, Raphael Prevost
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Abstract
Multimodal image registration is a challenging but essential step for numerous image-guided procedures. Most registration algorithms rely on the computation of complex, frequently non-differentiable similarity metrics to deal with the appearance discrepancy of anatomical structures between imaging modalities. Recent Machine Learning based approaches are limited to specific anatomy-modality combinations and do not generalize to new settings. We propose a generic framework for creating expressive cross-modal descriptors that enable fast deformable global registration. We achieve this by approximating existing metrics with a dot-product in the feature space of a small convolutional neural network (CNN), which is inherently differentiable and can be trained without registered data. Our method is several orders of magnitude faster than local patch-based metrics and can be directly applied in clinical settings by replacing the similarity measure with the proposed one. Experiments on three different datasets demonstrate that our approach generalizes well beyond the training data, yielding a broad capture range even on unseen anatomies and modality pairs, without the need for specialized retraining. We make our training code and data publicly available.
EPUF: A Novel Scheme Based on Entropy Features of Latency-based DRAM PUFs Providing Lightweight Authentication in IoT Networks
Abstract
Physical unclonable functions (PUFs) are hardware-oriented primitives that exploit manufacturing variations to generate a unique identity for a physical system. Recent advancements showed how DRAM can be exploited to implement PUFs. DRAM PUFs require no additional circuits for PUF operations and can be used in most applications with resource-constrained nodes, such as Internet of Things (IoT) networks. However, existing DRAM PUF solutions either require interrupting other functions in the host system or provide unreliable responses due to their sensitivity to environmental conditions. In this paper, we propose EPUF, a novel strategy to extract random and unique features from DRAM cells to generate reliable PUF responses. In particular, we use the bitmap images of the binary DRAM values and their entropy features. We show via real device experiments that EPUF is approximately $1.7$ times faster than other state-of-the-art solutions, achieves $100\%$ reliability, generates features with $47.79\%$ uniqueness, and supports a large set of challenge-response pairs (CRPs), leading to new potentials for DRAM PUF-based authentication. We also propose a lightweight authentication protocol based on EPUF, which not only provides far better security guarantees but also outperforms the state-of-the-art in terms of communication overhead and computational cost.
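A minimal numpy sketch of the feature-extraction idea: split a binary DRAM readout bitmap into patches and compute each patch's Shannon entropy; the patch size is an illustrative choice, not the paper's parameterization.

```python
import numpy as np

def patch_entropy_features(bitmap, patch=16):
    """For each patch of a binary bitmap, compute the Shannon entropy
    H = -p*log2(p) - (1-p)*log2(1-p), where p is the fraction of set
    bits; high-entropy patches are the most device-unique regions."""
    H, W = bitmap.shape
    feats = []
    for i in range(0, H - patch + 1, patch):
        for j in range(0, W - patch + 1, patch):
            p = float(bitmap[i:i + patch, j:j + patch].mean())
            if p in (0.0, 1.0):
                feats.append(0.0)  # a uniform patch carries no entropy
            else:
                feats.append(-p * np.log2(p) - (1 - p) * np.log2(1 - p))
    return np.array(feats)
```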
TinyTrain: Deep Neural Network Training at the Extreme Edge
Authors: Young D. Kwon, Rui Li, Stylianos I. Venieris, Jagmohan Chauhan, Nicholas D. Lane, Cecilia Mascolo
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
On-device training is essential for user personalisation and privacy. With the pervasiveness of IoT devices and microcontroller units (MCU), this task becomes more challenging due to the constrained memory and compute resources, and the limited availability of labelled user data. Nonetheless, prior works neglect the data scarcity issue, require excessively long training time (e.g. a few hours), or induce substantial accuracy loss ($\geq$10\%). We propose TinyTrain, an on-device training approach that drastically reduces training time by selectively updating parts of the model and explicitly coping with data scarcity. TinyTrain introduces a task-adaptive sparse-update method that dynamically selects the layer/channel based on a multi-objective criterion that jointly captures user data, the memory, and the compute capabilities of the target device, leading to high accuracy on unseen tasks with reduced computation and memory footprint. TinyTrain outperforms vanilla fine-tuning of the entire network by 3.6-5.0\% in accuracy, while reducing the backward-pass memory and computation cost by up to 2,286$\times$ and 7.68$\times$, respectively. Targeting broadly used real-world edge devices, TinyTrain achieves 9.5$\times$ faster and 3.5$\times$ more energy-efficient training over status-quo approaches, and 2.8$\times$ smaller memory footprint than SOTA approaches, while remaining within the 1 MB memory envelope of MCU-grade platforms.
Fast Algorithms for a New Relaxation of Optimal Transport
Authors: Moses Charikar, Beidi Chen, Christopher Re, Erik Waingarten
Abstract
We introduce a new class of objectives for optimal transport computations of datasets in high-dimensional Euclidean spaces. The new objectives are parametrized by $\rho \geq 1$, and provide a metric space $\mathcal{R}_{\rho}(\cdot, \cdot)$ for discrete probability distributions in $\mathbb{R}^d$. As $\rho$ approaches $1$, the metric approaches the Earth Mover's distance, but for $\rho$ larger than (but close to) $1$, it admits significantly faster algorithms. Namely, for distributions $\mu$ and $\nu$ supported on $n$ and $m$ vectors in $\mathbb{R}^d$ of norm at most $r$ and any $\epsilon > 0$, we give an algorithm which outputs an additive $\epsilon r$-approximation to $\mathcal{R}_{\rho}(\mu, \nu)$ in time $(n+m) \cdot \mathrm{poly}((nm)^{(\rho-1)/\rho} \cdot 2^{\rho / (\rho-1)} / \epsilon)$.
Benchmarking Potential Based Rewards for Learning Humanoid Locomotion
Authors: Se Hwan Jeon, Steve Heim, Charles Khazoom, Sangbae Kim
Abstract
The main challenge in developing effective reinforcement learning (RL) pipelines is often the design and tuning of reward functions. A well-designed shaping reward can lead to significantly faster learning. Naively formulated rewards, however, can conflict with the desired behavior and result in overfitting or even erratic performance if not properly tuned. In theory, the broad class of potential based reward shaping (PBRS) can help guide the learning process without affecting the optimal policy. Although several studies have explored the use of potential based reward shaping to accelerate learning convergence, most have been limited to grid-worlds and low-dimensional systems, and RL in robotics has predominantly relied on standard forms of reward shaping. In this paper, we benchmark standard forms of shaping with PBRS for a humanoid robot. We find that in this high-dimensional system, PBRS has only marginal benefits in convergence speed. However, the PBRS reward terms are significantly more robust to scaling than typical reward shaping approaches, and thus easier to tune.
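The defining transform of PBRS is $r' = r + \gamma \Phi(s') - \Phi(s)$ for a potential function $\Phi$ over states; because the shaping terms telescope along trajectories, the optimal policy is provably unchanged. A minimal sketch, with a purely illustrative potential for locomotion:

```python
def shaped_reward(r, s, s_next, potential, gamma=0.99):
    """Potential-based reward shaping (Ng et al., 1999):
    r' = r + gamma * Phi(s') - Phi(s); the telescoping sum guarantees
    the optimal policy is preserved."""
    return r + gamma * potential(s_next) - potential(s)

# illustrative potential: reward progress towards a target base height
target_height = 1.0
potential = lambda s: -abs(s["base_height"] - target_height)
```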
LightPath: Lightweight and Scalable Path Representation Learning
Authors: Sean Bin Yang, Jilin Hu, Chenjuan Guo, Bin Yang, Christian S. Jensen
Abstract
Movement paths are used widely in intelligent transportation and smart city applications. To serve such applications, path representation learning aims to provide compact representations of paths that enable efficient and accurate operations when used for different downstream tasks such as path ranking and travel cost estimation. In many cases, it is attractive that the path representation learning is lightweight and scalable; in resource-limited environments and under green computing limitations, it is essential. Yet, existing path representation learning studies focus on accuracy and pay at most secondary attention to resource consumption and scalability. We propose a lightweight and scalable path representation learning framework, termed LightPath, that aims to reduce resource consumption and achieve scalability without affecting accuracy, thus enabling broader applicability. More specifically, we first propose a sparse auto-encoder that ensures that the framework achieves good scalability with respect to path length. Next, we propose a relational reasoning framework to enable faster training of more robust sparse path encoders. We also propose global-local knowledge distillation to further reduce the size and improve the performance of sparse path encoders. Finally, we report extensive experiments on two real-world datasets to offer insight into the efficiency, scalability, and effectiveness of the proposed framework.
Keyword: mobile
6G Network Business Support System
Authors: Ye Ouyang, Yaqin Zhang, Peng Wang, Yunxin Liu, Wen Qiao, Jun Zhu, Yang Liu, Feng Zhang, Shuling Wang, Xidong Wang
Abstract
6G is the next-generation intelligent and integrated digital information infrastructure, characterized by ubiquitous interconnection, native intelligence, multi-dimensional perception, global coverage, green and low-carbon operation, native network security, etc. 6G will realize the transition from serving people and people-thing communication to supporting the efficient connection of intelligent agents, comprehensively leading the digital, intelligent, and green transformation of the economy and society. As the core support system for mobile communication networks, the 6G BSS needs to integrate with the new business models brought about by the development of the next-generation Internet and IT, upgrading from "network-centric" to "business- and service-centric" and "customer-centric". 6G OSS and BSS systems need to strengthen their integration to improve operational efficiency and benefits for customers by connecting the digital intelligence support capabilities on both the supply and demand sides. This paper provides a detailed introduction to the overall vision, potential key technologies, and functional architecture of 6G BSS systems. It also presents an evolutionary roadmap and technological prospects for the BSS systems from 5G to 6G.
Keyword: pruning
Impact of Disentanglement on Pruning Neural Networks
Authors: Carl Shneider, Peyman Rostami, Anis Kacem, Nilotpal Sinha, Abd El Rahman Shabayek, Djamila Aouada
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)
Abstract
Deploying deep neural networks on edge devices, to accomplish task-specific objectives in the real world, requires a reduction in their memory footprint, power consumption, and latency. This can be realized via efficient model compression. Disentangled latent representations produced by variational autoencoder (VAE) networks are a promising approach for achieving model compression because they mainly retain task-specific information, discarding information that is useless for the task at hand. We use the Beta-VAE framework, combined with a standard pruning criterion, to investigate the impact of forcing the network to learn disentangled representations on the pruning process for the task of classification. In particular, we perform experiments on the MNIST and CIFAR10 datasets, examine disentanglement challenges, and propose a path forward for future work.
Keyword: diffusion
Leveraging Mixed Precision in Exponential Time Integration Methods
Authors: Cody J. Balos, Steven Roberts, David J. Gardner
Abstract
The machine learning explosion has created a prominent trend in modern computer hardware towards low precision floating-point operations. In response, there have been growing efforts to use low and mixed precision in general scientific computing. One important area that has received limited exploration is time-integration methods, which are used for solving differential equations that are ubiquitous in science and engineering applications. In this work, we develop two new approaches for leveraging mixed precision in exponential time integration methods. The first approach is based on a reformulation of the exponential Rosenbrock--Euler method allowing for low precision computations in matrix exponentials independent of the particular algorithm for matrix exponentiation. The second approach is based on an inexact and incomplete Arnoldi procedure in Krylov approximation methods for computing matrix exponentials and is agnostic to the chosen integration method. We show that both approaches improve accuracy compared to using purely low precision and offer better efficiency than using only double precision when solving an advection-diffusion-reaction partial differential equation.
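A minimal numpy/scipy sketch of the first approach's flavor: one exponential Rosenbrock-Euler step, $u_{n+1} = u_n + h\,\varphi_1(hJ)f(u_n)$ with $\varphi_1(A) = A^{-1}(e^A - I)$, where the matrix exponential, the expensive kernel, is evaluated in single precision while the rest of the update stays in double precision. This is an illustration under simplifying assumptions (dense solve, nonsingular $hJ$), not the authors' algorithm.

```python
import numpy as np
from scipy.linalg import expm, solve

def rosenbrock_euler_step(u, f, jac, h, low_precision=True):
    """One exponential Rosenbrock-Euler step with the matrix
    exponential computed in float32 and everything else in float64."""
    J = jac(u)
    A = h * J
    if low_precision:
        E = expm(A.astype(np.float32)).astype(np.float64)
    else:
        E = expm(A)
    phi1_f = solve(A, (E - np.eye(len(u))) @ f(u))  # phi_1(A) f(u)
    return u + h * phi1_f
```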
With Flying Colors: Predicting Community Success in Large-scale Collaborative Campaigns
Authors: Abraham Israeli, Oren Tsur
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)
Abstract
Online communities develop unique characteristics, establish social norms, and exhibit distinct dynamics among their members. Activity in online communities often results in concrete 'off-line' actions with a broad societal impact (e.g., political street protests and norms related to sexual misconduct). While community dynamics, information diffusion, and online collaborations have been widely studied in the past two decades, quantitative studies that measure the effectiveness of online communities in promoting their agenda are scarce. In this work, we study the correspondence between the effectiveness of a community, measured by its success level in a competitive online campaign, and the underlying dynamics between its members. To this end, we define a novel task: predicting the success level of online communities in Reddit's r/place - a large-scale distributed experiment that required collaboration between community members. We consider an array of definitions for success level; each is geared toward different aspects of collaborative achievement. We experiment with several hybrid models, combining various types of features. Our models significantly outperform all baseline models over all definitions of 'success level'. Analysis of the results and the factors that contribute to the success of coordinated campaigns can provide a better understanding of the resilience or the vulnerability of communities to online social threats such as election interference or anti-science trends. We make all data used for this study publicly available for further research.
Unmaking AI Imagemaking: A Methodological Toolkit for Critical Investigation
Abstract
AI image models are rapidly evolving, disrupting aesthetic production in many industries. However, understanding of their underlying archives, their logic of image reproduction, and their persistent biases remains limited. What kind of methods and approaches could open up these black boxes? In this paper, we provide three methodological approaches for investigating AI image models and apply them to Stable Diffusion as a case study. Unmaking the ecosystem analyzes the values, structures, and incentives surrounding the model's production. Unmaking the data analyzes the images and text the model draws upon, with their attendant particularities and biases. Unmaking the output analyzes the model's generative results, revealing its logics through prompting, reflection, and iteration. Each mode of inquiry highlights particular ways in which the image model captures, "understands," and recreates the world. This accessible framework supports the work of critically investigating generative AI image models and paves the way for more socially and politically attuned analyses of their impacts in the world.
Text2Layer: Layered Image Generation using Latent Diffusion Model
Authors: Xinyang Zhang, Wentian Zhao, Xin Lu, Jeff Chien
Abstract
Layer compositing is one of the most popular image editing workflows among both amateurs and professionals. Motivated by the success of diffusion models, we explore layer compositing from a layered image generation perspective. Instead of generating an image, we propose to generate background, foreground, layer mask, and the composed image simultaneously. To achieve layered image generation, we train an autoencoder that is able to reconstruct layered images and train diffusion models on the latent representation. One benefit of the proposed problem is to enable better compositing workflows in addition to the high-quality image output. Another benefit is producing higher-quality layer masks compared to masks produced by a separate step of image segmentation. Experimental results show that the proposed method is able to generate high-quality layered images and initiates a benchmark for future work.
A Siamese-based Verification System for Open-set Architecture Attribution of Synthetic Images
Authors: Lydia Abady, Jun Wang, Benedetta Tondi, Mauro Barni
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Despite the wide variety of methods developed for synthetic image attribution, most of them can only attribute images generated by models or architectures included in the training set and do not work with unknown architectures, hindering their applicability in real-world scenarios. In this paper, we propose a verification framework that relies on a Siamese Network to address the problem of open-set attribution of synthetic images to the architecture that generated them. We consider two different settings. In the first setting, the system determines whether two images have been produced by the same generative architecture or not. In the second setting, the system verifies a claim about the architecture used to generate a synthetic image, utilizing one or multiple reference images generated by the claimed architecture. The main strength of the proposed system is its ability to operate in both closed and open-set scenarios, so that the input images, both query and reference, may or may not belong to the architectures considered during training. Experimental evaluations encompassing various generative architectures such as GANs, diffusion models, and transformers, focusing on synthetic face image generation, confirm the excellent performance of our method in both closed and open-set settings, as well as its strong generalization capabilities.
BSDM: Background Suppression Diffusion Model for Hyperspectral Anomaly Detection
Abstract
Hyperspectral anomaly detection (HAD) is widely used in Earth observation and deep space exploration. A major challenge for HAD is the complex background of the input hyperspectral images (HSIs), resulting in anomalies confused in the background. On the other hand, the lack of labeled samples for HSIs leads to poor generalization of existing HAD methods. This paper starts the first attempt to study a new and generalizable background learning problem without labeled samples. We present a novel solution BSDM (background suppression diffusion model) for HAD, which can simultaneously learn latent background distributions and generalize to different datasets for suppressing complex background. It is featured in three aspects: (1) For the complex background of HSIs, we design pseudo background noise and learn the potential background distribution in it with a diffusion model (DM). (2) For the generalizability problem, we apply a statistical offset module so that the BSDM adapts to datasets of different domains without labeling samples. (3) For achieving background suppression, we innovatively improve the inference process of DM by feeding the original HSIs into the denoising network, which removes the background as noise. Our work paves a new background suppression way for HAD that can improve HAD performance without the prerequisite of manually labeled data. Assessments and generalization experiments of four HAD methods on several real HSI datasets demonstrate the above three unique properties of the proposed method. The code is available at https://github.com/majitao-xd/BSDM-HAD.
XSkill: Cross Embodiment Skill Discovery
Abstract
Human demonstration videos are a widely available data source for robot learning and an intuitive user interface for expressing desired behavior. However, directly extracting reusable robot manipulation skills from unstructured human videos is challenging due to the large embodiment difference and unobserved action parameters. To bridge this embodiment gap, this paper introduces XSkill, an imitation learning framework that 1) discovers a cross-embodiment representation called skill prototypes purely from unlabeled human and robot manipulation videos, 2) transfers the skill representation to robot actions using a conditional diffusion policy, and finally, 3) composes the learned skills to accomplish unseen tasks specified by a human prompt video. Our experiments in simulation and real-world environments show that the discovered skill prototypes facilitate both skill transfer and composition for unseen tasks, resulting in a more general and scalable imitation learning framework. The performance of XSkill is best understood from the anonymous website: https://xskillcorl.github.io.
FABRIC: Personalizing Diffusion Models with Iterative Feedback
Authors: Dimitri von Rütte, Elisabetta Fedele, Jonathan Thomm, Lukas Wolf
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In an era where visual content generation is increasingly driven by machine learning, the integration of human feedback into generative models presents significant opportunities for enhancing user experience and output quality. This study explores strategies for incorporating iterative human feedback into the generative process of diffusion-based text-to-image models. We propose FABRIC, a training-free approach applicable to a wide range of popular diffusion models, which exploits the self-attention layer present in the most widely used architectures to condition the diffusion process on a set of feedback images. To ensure a rigorous assessment of our approach, we introduce a comprehensive evaluation methodology, offering a robust mechanism to quantify the performance of generative visual models that integrate human feedback. We show that generation results improve over multiple rounds of iterative feedback through exhaustive analysis, implicitly optimizing arbitrary user preferences. The potential applications of these findings extend to fields such as personalized content creation and customization.
Keyword: adaptive
Rethinking Intersection Over Union for Small Object Detection in Few-Shot Regime
Authors: Pierre Le Jeune, Anissa Mokraoui
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In Few-Shot Object Detection (FSOD), detecting small objects is extremely difficult. The limited supervision cripples the localization capabilities of the models, and a few pixels of shift can dramatically reduce the Intersection over Union (IoU) between the ground truth and predicted boxes for small objects. To this end, we propose Scale-adaptive Intersection over Union (SIoU), a novel box similarity measure. SIoU changes with the objects' size: it is more lenient with small object shifts. In a user study, SIoU aligned better with human judgment than IoU. Employing SIoU as an evaluation criterion helps to build more user-oriented models. SIoU can also be used as a loss function to prioritize small objects during training, outperforming existing loss functions. SIoU improves small object detection in the non-few-shot regime, but this setting is unrealistic in industry, as annotated detection datasets are often too expensive to acquire. Hence, our experiments mainly focus on the few-shot regime to demonstrate the superiority and versatility of SIoU loss. SIoU significantly improves FSOD performance on small objects in both natural (Pascal VOC and COCO datasets) and aerial images (DOTA and DIOR). In aerial imagery, small objects are critical, and SIoU loss achieves new state-of-the-art FSOD results on DOTA and DIOR.
Group Testing in Arbitrary Hypergraphs and Related Combinatorial Structures
Authors: Annalisa De Bonis
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)
Abstract
We consider a generalization of group testing where the potentially contaminated sets are the members of a given hypergraph ${\cal F}=(V,E)$. This generalization finds application in contexts where contaminations can be conditioned by certain kinds of social and geographical clustering. We study non-adaptive algorithms, two-stage algorithms, and three-stage algorithms. Non-adaptive group testing algorithms are algorithms in which all tests are decided beforehand and can therefore be performed in parallel, whereas two-stage and three-stage group testing algorithms consist of two and three stages, respectively, with each stage being a completely non-adaptive algorithm. In classical group testing, the potentially infected sets are all subsets of up to a certain number of elements of the given input set. For classical group testing, it is known that there exists a correspondence between classical superimposed codes and non-adaptive algorithms, and between two-stage algorithms and selectors. Bounds on the number of tests for those algorithms are derived from the bounds on the dimensions of the corresponding combinatorial structures. Obviously, the upper bounds for the classical case apply also to our group testing model. In the present paper, we aim to improve on those upper bounds by leveraging the characteristics of the particular hypergraph at hand. In order to cope with our version of the problem, we introduce new combinatorial structures that generalize the notions of classical selectors and superimposed codes.
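For background, the classical non-adaptive setting that these structures generalize can be decoded with the simple COMP rule: every item appearing in a negative test is clean, and everything else is flagged. A numpy sketch follows; the paper's contribution lies in designing the test matrix for the given hypergraph, which is not shown here.

```python
import numpy as np

def comp_decode(tests, outcomes):
    """COMP decoding for non-adaptive group testing.
    tests:    boolean matrix (n_tests, n_items); tests[t, i] is True
              if item i participates in test t.
    outcomes: boolean vector (n_tests,); True means the test is positive.
    Returns a boolean mask of candidate contaminated items, guaranteed
    to contain every truly contaminated item (false positives possible)."""
    outcomes = np.asarray(outcomes, dtype=bool)
    clean = tests[~outcomes].any(axis=0)  # seen in some negative test
    return ~clean
```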
AMRIC: A Novel In Situ Lossy Compression Framework for Efficient I/O in Adaptive Mesh Refinement Applications
Authors: Daoce Wang, Jesus Pulido, Pascal Grosset, Jiannan Tian, Sian Jin, Houjun Tang, Jean Sexton, Sheng Di, Zarija Lukić, Kai Zhao, Bo Fang, Franck Cappello, James Ahrens, Dingwen Tao
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
As supercomputers advance towards exascale capabilities, computational intensity increases significantly, and the volume of data requiring storage and transmission grows exponentially. Adaptive Mesh Refinement (AMR) has emerged as an effective solution to address these two challenges. Concurrently, error-bounded lossy compression is recognized as one of the most efficient approaches to tackle the latter issue. Despite their respective advantages, few attempts have been made to investigate how AMR and error-bounded lossy compression can function together. To this end, this study presents AMRIC, a novel in-situ lossy compression framework that employs the HDF5 filter to both reduce I/O costs and boost compression quality for AMR applications. We implement our solution into the AMReX framework and evaluate it on two real-world AMR applications, Nyx and WarpX, on the Summit supercomputer. Experiments with 4096 CPU cores demonstrate that AMRIC improves the compression ratio by up to 81X and the I/O performance by up to 39X over AMReX's original compression solution.
Promoting Exploration in Memory-Augmented Adam using Critical Momenta
Authors: Pranshu Malviya, Gonçalo Mordido, Aristide Baratin, Reza Babanezhad Harikandeh, Jerry Huang, Simon Lacoste-Julien, Razvan Pascanu, Sarath Chandar
Abstract
Adaptive gradient-based optimizers, particularly Adam, have left their mark in training large-scale deep learning models. The strength of such optimizers is that they exhibit fast convergence while being more robust to hyperparameter choice. However, they often generalize worse than non-adaptive methods. Recent studies have tied this performance gap to flat minima selection: adaptive methods tend to find solutions in sharper basins of the loss landscape, which in turn hurts generalization. To overcome this issue, we propose a new memory-augmented version of Adam that promotes exploration towards flatter minima by using a buffer of critical momentum terms during training. Intuitively, the use of the buffer makes the optimizer overshoot outside the basin of attraction if it is not wide enough. We empirically show that our method improves the performance of several variants of Adam on standard supervised language modelling and image classification tasks.
Physics-based Reduced Order Modeling for Uncertainty Quantification of Guided Wave Propagation using Bayesian Optimization
Authors: G. I. Drakoulas, T. V. Gortsas, D. Polyzos
Abstract
In the context of digital twins, structural health monitoring (SHM) constitutes the backbone of condition-based maintenance, facilitating the interconnection between virtual and physical assets. Guided wave propagation (GWP) is commonly employed for the inspection of structures in SHM. However, GWP is sensitive to variations in the material properties of the structure, leading to false alarms. In this direction, uncertainty quantification (UQ) is regularly applied to improve the reliability of predictions. Computational mechanics is a useful tool for the simulation of GWP, and is often applied for UQ. Even so, the application of UQ methods requires numerous simulations, while large-scale, transient numerical GWP solutions increase the computational cost. Reduced order models (ROMs) are commonly employed to provide numerical results in a limited amount of time. In this paper, we propose a machine learning (ML)-based ROM, referred to as BO-ML-ROM, to decrease the computational time related to the simulation of GWP. The ROM is integrated with a Bayesian optimization (BO) framework to adaptively sample the parameters for the ROM training. The finite element method is used for the simulation of the high-fidelity models. The formulated ROM is used for forward UQ of the GWP in an aluminum plate with varying material properties. To determine the influence of each parameter perturbation, a global, variance-based sensitivity analysis is implemented based on Sobol' indices. It is shown that Bayesian optimization outperforms one-shot sampling methods, both in terms of accuracy and speed-up. The predicted results reveal the efficiency of BO-ML-ROM for GWP and demonstrate its value for UQ.
Domain Adaptation for Enhanced Object Detection in Foggy and Rainy Weather for Autonomous Driving
Abstract
Most object detection models for autonomous driving may experience a significant drop in performance when deployed in real-world applications, due to the well-known domain shift issue. Supervised object detection methods for autonomous driving usually assume a consistent feature distribution between training and testing data; however, such an assumption may not hold when weather conditions differ significantly. For example, an object detection model trained under clear weather may not perform well in foggy or rainy weather, due to the domain gap. Overcoming detection bottlenecks in foggy or rainy weather scenarios is a significant challenge for autonomous vehicles deployed in the wild. To address the domain gap under different weather conditions, this paper proposes a novel domain adaptive object detection framework for autonomous driving in foggy and rainy weather. Our method leverages both image-level and object-level adaptation to reduce the domain discrepancy in image style and object appearance. Additionally, to enhance the model's performance under challenging samples, we introduce a new adversarial gradient reversal layer that performs adversarial mining on hard examples alongside domain adaptation. Moreover, we propose to generate an auxiliary domain by data augmentation to enforce a new domain-level metric regularization. Experimental results on public benchmarks demonstrate that object detection performance is significantly improved when using our proposed method in domain shift scenarios for autonomous driving applications.
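For context, the adversarial gradient reversal layer builds on the standard gradient reversal trick from domain-adversarial training, a minimal PyTorch sketch of which is given below; the paper's addition of hard-example mining on top of it is not shown.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips and scales gradients in the
    backward pass, so the feature extractor learns to confuse a domain
    classifier that is itself trained to succeed."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)
```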
Towards Robust Scene Text Image Super-resolution via Explicit Location Enhancement
Authors: Hang Guo, Tao Dai, Guanghao Meng, Shu-Tao Xia
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Scene text image super-resolution (STISR), aiming to improve image quality while boosting downstream scene text recognition accuracy, has recently achieved great success. However, most existing methods treat the foreground (character regions) and background (non-character regions) equally in the forward process, and neglect the disturbance from the complex background, thus limiting the performance. To address these issues, in this paper, we propose a novel method LEMMA that explicitly models character regions to produce high-level text-specific guidance for super-resolution. To model the location of characters effectively, we propose the location enhancement module to extract character region features based on the attention map sequence. Besides, we propose the multi-modal alignment module to perform bidirectional visual-semantic alignment to generate high-quality prior guidance, which is then incorporated into the super-resolution branch in an adaptive manner using the proposed adaptive fusion module. Experiments on TextZoom and four scene text recognition benchmarks demonstrate the superiority of our method over other state-of-the-art methods. Code is available at https://github.com/csguoh/LEMMA.
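To make the location-enhancement idea concrete, the sketch below pools image features with per-character attention maps so that each character region yields its own feature vector; the attention maps would come from a text recognizer, and all shapes and names here are illustrative assumptions rather than LEMMA's actual interface.

import torch

def character_region_features(feat, attn):
    # feat: (B, C, H, W) image features; attn: (B, T, H, W) per-character maps
    attn = attn / attn.sum(dim=(-2, -1), keepdim=True).clamp_min(1e-6)
    return torch.einsum("bchw,bthw->btc", feat, attn)   # (B, T, C) region features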
Towards Building More Robust Models with Frequency Bias
Authors: Qingwen Bu, Dong Huang, Heming Cui
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
The vulnerability of deep neural networks to adversarial samples has been a major impediment to their broad applications, despite their success in various fields. Recently, several works have suggested that adversarially-trained models emphasize low-frequency information to achieve higher robustness. While several attempts have been made to leverage this frequency characteristic, they have all faced the issue that applying low-pass filters directly to input images leads to irreversible loss of discriminative information and poor generalizability to datasets with distinct frequency features. This paper presents a plug-and-play module called the Frequency Preference Control Module that adaptively reconfigures the low- and high-frequency components of intermediate feature representations, enabling better use of frequency information in robust learning. Empirical studies show that our proposed module can be easily incorporated into any adversarial training framework, further improving model robustness across different architectures and datasets. Additionally, experiments were conducted to examine how the frequency bias of robust models impacts the adversarial training process and its final robustness, revealing interesting insights.
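To illustrate the kind of reconfiguration involved, the hedged PyTorch sketch below splits an intermediate feature map into low- and high-frequency components with a radial FFT mask and recombines them with learnable scalar weights; the mask construction and weighting scheme are assumptions, not the paper's exact module.

import torch
import torch.nn as nn

class FreqReweight(nn.Module):
    # Assumed design: reweight low/high frequencies of a (B, C, H, W) feature map.
    def __init__(self, cutoff=0.25):
        super().__init__()
        self.cutoff = cutoff
        self.w_low = nn.Parameter(torch.ones(1))
        self.w_high = nn.Parameter(torch.ones(1))

    def forward(self, x):
        f = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
        H, W = x.shape[-2:]
        yy, xx = torch.meshgrid(torch.linspace(-0.5, 0.5, H),
                                torch.linspace(-0.5, 0.5, W), indexing="ij")
        low = ((xx**2 + yy**2).sqrt() <= self.cutoff).to(x.device)
        f = f * (self.w_low * low + self.w_high * (~low))
        return torch.fft.ifft2(torch.fft.ifftshift(f, dim=(-2, -1))).real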
On the optimality of target-data-dependent kernel greedy interpolation in Sobolev Reproducing Kernel Hilbert Spaces
Authors: Gabriele Santin, Tizian Wenzel, Bernard Haasdonk
Abstract
Kernel interpolation is a versatile tool for the approximation of functions from data, and it can be proven to have some optimality properties when used with kernels related to certain Sobolev spaces. In the context of interpolation, the selection of optimal function sampling locations is a central problem, both from a practical perspective, and as an interesting theoretical question. Greedy interpolation algorithms provide a viable solution for this task, being efficient to run and provably accurate in their approximation. In this paper we close a gap that is present in the convergence theory for these algorithms by employing a recent result on general greedy algorithms. This modification leads to new convergence rates which match the optimal ones when restricted to the $P$-greedy target-data-independent selection rule, and can additionally be proven to be optimal when they fully exploit adaptivity ($f$-greedy). Other than closing this gap, the new results have some significance in the broader setting of the optimality of general approximation algorithms in Reproducing Kernel Hilbert Spaces, as they allow us to compare adaptive interpolation with non-adaptive best nonlinear approximation.
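For readers less familiar with the greedy selection rules, the numpy sketch below implements P-greedy for a Gaussian kernel: at each step the candidate maximizing the power function P_n(x)^2 = K(x,x) - k_n(x)^T K_n^{-1} k_n(x) is added to the interpolation set. The f-greedy variant discussed in the paper would instead rank candidates by the current residual.

import numpy as np

def gauss_kernel(X, Y, eps=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-eps * d2)

def p_greedy(candidates, n_points, eps=1.0):
    chosen = [0]    # K(x,x) is constant for the Gaussian kernel, so the
                    # first point is arbitrary
    for _ in range(n_points - 1):
        Xn = candidates[chosen]
        Kn = gauss_kernel(Xn, Xn, eps)
        kx = gauss_kernel(candidates, Xn, eps)
        # squared power function at every candidate point
        p2 = 1.0 - np.einsum("ij,jk,ik->i", kx, np.linalg.inv(Kn), kx)
        chosen.append(int(np.argmax(p2)))
    return candidates[chosen]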
Hierarchical Spatio-Temporal Representation Learning for Gait Recognition
Authors: Lei Wang, Bo Liu, Fangfang Liang, Bincheng Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Gait recognition is a biometric technique that identifies individuals by their unique walking styles, which is suitable for unconstrained environments and has a wide range of applications. While current methods focus on exploiting body part-based representations, they often neglect the hierarchical dependencies between local motion patterns. In this paper, we propose a hierarchical spatio-temporal representation learning (HSTL) framework for extracting gait features from coarse to fine. Our framework starts with a hierarchical clustering analysis to recover multi-level body structures from the whole body to local details. Next, an adaptive region-based motion extractor (ARME) is designed to learn region-independent motion features. The proposed HSTL then stacks multiple ARMEs in a top-down manner, with each ARME corresponding to a specific partition level of the hierarchy. An adaptive spatio-temporal pooling (ASTP) module is used to capture gait features at different levels of detail to perform hierarchical feature mapping. Finally, a frame-level temporal aggregation (FTA) module is employed to reduce redundant information in gait sequences through multi-scale temporal downsampling. Extensive experiments on CASIA-B, OUMVLP, GREW, and Gait3D datasets demonstrate that our method outperforms the state-of-the-art while maintaining a reasonable balance between model accuracy and complexity.
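As a small illustration of the coarse-to-fine idea, hierarchical clustering of joint coordinates yields multi-level body partitions analogous to the hierarchy that HSTL's stacked ARMEs operate on; the Python sketch below uses illustrative inputs and is not the paper's clustering procedure.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def body_partitions(joint_xy, levels=(1, 2, 4, 8)):
    # joint_xy: (n_joints, 2) mean joint positions; returns one labeling per level
    Z = linkage(joint_xy, method="ward")
    return {k: fcluster(Z, t=k, criterion="maxclust") for k in levels}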
AutoAMG($\theta$): An Auto-tuned AMG Method Based on Deep Learning for Strong Threshold
Authors: Haifeng Zou, Xiaowen Xu, Chen-Song Zhang, Zeyao Mo
Abstract
Algebraic Multigrid (AMG) is one of the most widely used iterative algorithms for solving large sparse linear equations $Ax=b$. In AMG, the coarse grid is a key component that affects the efficiency of the algorithm, the construction of which relies on the strong threshold parameter $\theta$. This parameter is generally chosen empirically, with a default value in many current AMG solvers of 0.25 for 2D problems and 0.5 for 3D problems. However, for many practical problems, the quality of the coarse grid and the efficiency of the AMG algorithm are sensitive to $\theta$; the default value is rarely optimal, and sometimes is far from it. Therefore, how to choose a better $\theta$ is an important question. In this paper, we propose a deep learning based auto-tuning method, AutoAMG($\theta$) for multiscale sparse linear equations, which are widely used in practical problems. The method uses Graph Neural Networks (GNNs) to extract matrix features, and a Multilayer Perceptron (MLP) to build the mapping between matrix features and the optimal $\theta$, which can adaptively output $\theta$ values for different matrices. Numerical experiments show that AutoAMG($\theta$) can achieve significant speedup compared to the default $\theta$ value.
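To show where $\theta$ enters, recall the classical AMG strength-of-connection test: variable $i$ strongly depends on $j$ when $-a_{ij} \geq \theta \max_{k \neq i}(-a_{ik})$. A minimal numpy sketch of this test follows; the GNN/MLP pipeline of AutoAMG($\theta$) itself is not reproduced here.

import numpy as np

def strong_connections(A, theta):
    A = np.asarray(A, dtype=float)
    S = np.zeros(A.shape, dtype=bool)
    for i in range(A.shape[0]):
        off = -A[i].copy()
        off[i] = -np.inf                 # exclude the diagonal entry
        m = off.max()
        if m > 0:
            S[i] = off >= theta * m      # S[i, j]: i strongly depends on j
    return S

Varying $\theta$ changes which couplings count as strong, and hence the quality and cost of the coarse grid; AutoAMG($\theta$) learns to predict a good value per matrix.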
A3D: Adaptive, Accurate, and Autonomous Navigation for Edge-Assisted Drones
Subjects: Networking and Internet Architecture (cs.NI); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Robotics (cs.RO)
Abstract
Accurate navigation is of paramount importance to ensure flight safety and efficiency for autonomous drones. Recent research has started to use Deep Neural Networks (DNNs) to enhance drone navigation, given their remarkable predictive capability for visual perception. However, existing solutions either run DNN inference tasks on drones in situ, impeded by limited onboard resources, or offload the computation to external servers, which may incur large network latency. Few works consider jointly optimizing the offloading decisions along with image transmission configurations and adapting them on the fly. In this paper, we propose A3D, an edge-server-assisted drone navigation framework that can dynamically adjust task execution location, input resolution, and image compression ratio in order to achieve low inference latency, high prediction accuracy, and long flight distances. Specifically, we first augment state-of-the-art convolutional neural networks for drone navigation and define a novel metric called Quality of Navigation as our optimization objective, which can effectively capture the above goals. We then design a deep reinforcement learning based neural scheduler at the drone side, for which an information encoder is devised to reshape the state features and thus improve its learning ability. To further support simultaneous multi-drone serving, we extend the edge server design by developing a network-aware resource allocation algorithm, which allows provisioning containerized resources aligned with drones' demand. We finally implement a proof-of-concept prototype with realistic devices and validate its performance in a real-world campus scene, as well as in an AirSim simulation environment for thorough evaluation. Extensive experimental results show that A3D can reduce end-to-end latency by 28.06% and extend the flight distance by up to 27.28% compared with non-adaptive solutions.
A reinforcement learning approach for VQA validation: an application to diabetic macular edema grading
Authors: Tatiana Fountoukidou, Raphael Sznitman
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Recent advances in machine learning models have greatly increased the performance of automated methods in medical image analysis. However, the internal functioning of such models is largely hidden, which hinders their integration in clinical practice. Explainability and trust are viewed as important aspects of modern methods and prerequisites for their widespread use in clinical communities. As such, validation of machine learning models represents an important aspect, and yet most methods are only validated in a limited way. In this work, we focus on providing a richer and more appropriate validation approach for highly capable Visual Question Answering (VQA) algorithms. To better understand the performance of these methods, which answer arbitrary questions related to images, this work focuses on an automatic visual Turing test (VTT). That is, we propose an automatic adaptive questioning method that aims to expose the reasoning behavior of a VQA algorithm. Specifically, we introduce a reinforcement learning (RL) agent that observes the history of previously asked questions and uses it to select the next question to pose. We demonstrate our approach in the context of evaluating algorithms that automatically answer questions related to diabetic macular edema (DME) grading. The experiments show that such an agent behaves similarly to a clinician, asking questions that are relevant to key clinical concepts.
Source-Free Domain Adaptive Fundus Image Segmentation with Class-Balanced Mean Teacher
Authors: Longxiang Tang, Kai Li, Chunming He, Yulun Zhang, Xiu Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This paper studies source-free domain adaptive fundus image segmentation, which aims to adapt a pretrained fundus segmentation model to a target domain using unlabeled images. This is a challenging task because it is highly risky to adapt a model using only unlabeled data. Most existing methods tackle this task mainly by designing techniques to carefully generate pseudo labels from the model's predictions and use the pseudo labels to train the model. While often obtaining positive adaptation effects, these methods suffer from two major issues. First, they tend to be fairly unstable: incorrect pseudo labels that abruptly emerge may have a catastrophic impact on the model. Second, they fail to consider the severe class imbalance of fundus images, where the foreground (e.g., cup) region is usually very small. This paper aims to address these two issues by proposing the Class-Balanced Mean Teacher (CBMT) model. CBMT addresses the instability issue with a weak-strong augmented mean teacher learning scheme, where only the teacher model generates pseudo labels from weakly augmented images to train a student model that takes strongly augmented images as input. The teacher is updated as the moving average of the instantly trained student, which could be noisy; this prevents the teacher model from being abruptly impacted by incorrect pseudo labels. For the class imbalance issue, CBMT proposes a novel loss calibration approach to highlight foreground classes according to global statistics. Experiments show that CBMT addresses these two issues well and outperforms existing methods on multiple benchmarks.
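The weak-strong mean-teacher core can be stated compactly; below is a minimal PyTorch sketch of the EMA teacher update and pseudo-labeling step, with CBMT's class-balanced loss calibration omitted.

import torch

@torch.no_grad()
def ema_update(teacher, student, momentum=0.99):
    # the teacher follows the student as an exponential moving average
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(momentum).add_(ps, alpha=1.0 - momentum)

def mean_teacher_step(teacher, student, weak_img, strong_img, criterion):
    with torch.no_grad():
        pseudo = teacher(weak_img).argmax(dim=1)    # pseudo-labels from weak view
    return criterion(student(strong_img), pseudo)   # student trains on strong view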
Learner Referral for Cost-Effective Federated Learning Over Hierarchical IoT Networks
Authors: Yulan Gao, Ziqiang Ye, Yue Xiao, Wei Xiang
Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Abstract
The paradigm of federated learning (FL), which addresses data privacy concerns by training parameters locally on resource-constrained clients in a distributed manner, has garnered significant attention. Nonetheless, FL is not applicable when not all clients within the coverage of the FL server are registered with the FL network. To bridge this gap, this paper proposes joint learner-referral-aided federated client selection (LRef-FedCS), along with communications and computing resource scheduling and local model accuracy optimization (LMAO) methods. These methods are designed to minimize the cost incurred by the worst-case participant and ensure the long-term fairness of FL in hierarchical Internet of Things (HieIoT) networks. Utilizing the Lyapunov optimization technique, we reformulate the original problem into a stepwise joint optimization problem (JOP). Subsequently, to tackle the mixed-integer non-convex JOP, we separately and iteratively address LRef-FedCS and LMAO through the centralized method and a self-adaptive global best harmony search (SGHS) algorithm, respectively. To enhance scalability, we further propose a distributed LRef-FedCS approach based on a matching game to replace the centralized method described above. Numerical simulations and experimental results on the MNIST/CIFAR-10 datasets demonstrate that our proposed LRef-FedCS approach achieves a good balance between pursuing high global accuracy and reducing cost.
TinyTrain: Deep Neural Network Training at the Extreme Edge
Authors: Young D. Kwon, Rui Li, Stylianos I. Venieris, Jagmohan Chauhan, Nicholas D. Lane, Cecilia Mascolo
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
On-device training is essential for user personalisation and privacy. With the pervasiveness of IoT devices and microcontroller units (MCU), this task becomes more challenging due to the constrained memory and compute resources, and the limited availability of labelled user data. Nonetheless, prior works neglect the data scarcity issue, require excessively long training time (e.g. a few hours), or induce substantial accuracy loss ($\geq$10\%). We propose TinyTrain, an on-device training approach that drastically reduces training time by selectively updating parts of the model and explicitly coping with data scarcity. TinyTrain introduces a task-adaptive sparse-update method that dynamically selects the layer/channel based on a multi-objective criterion that jointly captures user data, the memory, and the compute capabilities of the target device, leading to high accuracy on unseen tasks with reduced computation and memory footprint. TinyTrain outperforms vanilla fine-tuning of the entire network by 3.6-5.0\% in accuracy, while reducing the backward-pass memory and computation cost by up to 2,286$\times$ and 7.68$\times$, respectively. Targeting broadly used real-world edge devices, TinyTrain achieves 9.5$\times$ faster and 3.5$\times$ more energy-efficient training over status-quo approaches, and 2.8$\times$ smaller memory footprint than SOTA approaches, while remaining within the 1 MB memory envelope of MCU-grade platforms.
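As a rough illustration of sparse updating, the PyTorch sketch below freezes the whole network and re-enables gradients only for the top-k scoring layers; the scores dictionary is a placeholder for TinyTrain's multi-objective criterion, which also accounts for memory and compute budgets.

import torch.nn as nn

def select_layers_to_update(model: nn.Module, scores: dict, k: int):
    for p in model.parameters():
        p.requires_grad_(False)          # freeze everything by default
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    for name, module in model.named_modules():
        if name in top:
            for p in module.parameters():
                p.requires_grad_(True)   # update only the selected layers
    return top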
Non-Parametric Self-Identification and Model Predictive Control of Dexterous In-Hand Manipulation
Authors: Podshara Chanrungmaneekul, Kejia Ren, Joshua T. Grace, Aaron M. Dollar, Kaiyu Hang
Abstract
Building hand-object models for dexterous in-hand manipulation remains a crucial and open problem. Major challenges include the difficulty of obtaining the geometric and dynamical models of the hand, object, and time-varying contacts, as well as the inevitable physical and perception uncertainties. Instead of building accurate models to map between the actuation inputs and the object motions, this work proposes to enable the hand-object systems to continuously approximate their local models via a self-identification process, where an underlying manipulation model is estimated online through a small number of exploratory actions and non-parametric learning, in contrast to most data-driven methods that require large amounts of data. By integrating the self-identified hand-object model into a model predictive control framework, the proposed system closes the control loop to provide high-accuracy in-hand manipulation. Furthermore, the proposed self-identification is able to adaptively trigger online updates through additional exploratory actions as soon as the self-identified local models show large discrepancies against the observed manipulation outcomes. We implemented the proposed approach on a sensorless, underactuated Yale Model O hand with a single external camera to observe the object's motion. With extensive experiments, we show that the proposed self-identification approach can enable accurate and robust dexterous manipulation without requiring an accurate system model or a large amount of data for offline training.
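The non-parametric flavor of the approach can be sketched with k-nearest-neighbour regression: a handful of exploratory (action, object-motion) pairs define a local model, and the action whose predicted motion best tracks the goal is selected, as a one-step stand-in for the paper's MPC loop. Everything below is an illustrative assumption, not the authors' implementation.

import numpy as np

def knn_predict(actions, motions, a_query, k=3):
    d = np.linalg.norm(actions - a_query, axis=1)
    idx = np.argsort(d)[:k]
    return motions[idx].mean(axis=0)     # locally averaged object motion

def best_action(actions, motions, candidates, goal_motion, k=3):
    preds = np.array([knn_predict(actions, motions, a, k) for a in candidates])
    return candidates[np.argmin(np.linalg.norm(preds - goal_motion, axis=1))]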
Divert More Attention to Vision-Language Object Tracking
Abstract
Multimodal vision-language (VL) learning has noticeably advanced the trend toward generic intelligence owing to emerging large foundation models. However, tracking, as a fundamental vision problem, has surprisingly benefited less from the recent flourishing of VL learning. We argue that the reasons are two-fold: the lack of large-scale vision-language annotated videos, and the ineffective vision-language interaction learning of current works. These obstacles motivate us to design a more effective vision-language representation for tracking, and meanwhile to construct a large database with language annotation for model learning. Particularly, in this paper, we first propose a general attribute annotation strategy to decorate videos in six popular tracking benchmarks, which contributes a large-scale vision-language tracking database with more than 23,000 videos. We then introduce a novel framework to improve tracking by learning a unified-adaptive VL representation, whose cores are the proposed asymmetric architecture search and modality mixer (ModaMixer). To further improve the VL representation, we introduce a contrastive loss to align the different modalities. To thoroughly demonstrate the effectiveness of our method, we integrate the proposed framework into three tracking methods with different designs, i.e., the CNN-based SiamCAR, the Transformer-based OSTrack, and the hybrid-structure TransT. The experiments demonstrate that our framework can significantly improve all baselines on six benchmarks. Besides empirical results, we theoretically analyze our approach to show its rationality. By revealing the potential of VL representation, we expect the community to divert more attention to VL tracking and hope to open more possibilities for future tracking with diversified multimodal messages.
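The contrastive alignment step can be illustrated with a symmetric InfoNCE loss over paired visual and language embeddings; the exact loss used in the paper is not given here, so the CLIP-style formulation below is an assumption.

import torch
import torch.nn.functional as F

def vl_contrastive_loss(vis, txt, tau=0.07):
    vis = F.normalize(vis, dim=-1)       # (B, D) visual embeddings
    txt = F.normalize(txt, dim=-1)       # (B, D) language embeddings
    logits = vis @ txt.t() / tau         # temperature-scaled cosine similarities
    target = torch.arange(vis.size(0), device=vis.device)
    return 0.5 * (F.cross_entropy(logits, target) +
                  F.cross_entropy(logits.t(), target))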
Unsupervised Accuracy Estimation of Deep Visual Models using Domain-Adaptive Adversarial Perturbation without Source Samples
Authors: JoonHo Lee, Jae Oh Woo, Hankyu Moon, Kwonho Lee
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Deploying deep visual models can lead to performance drops due to discrepancies between source and target distributions. Several approaches leverage labeled source data to estimate target domain accuracy, but accessing labeled source data is often prohibitively difficult due to data confidentiality or resource limitations on serving devices. Our work proposes a new framework to estimate model accuracy on unlabeled target data without access to source data. We investigate the feasibility of using pseudo-labels for accuracy estimation and extend this idea by adopting recent advances in source-free domain adaptation algorithms. Our approach measures the disagreement rate between the source hypothesis and the target pseudo-labeling function adapted from the source hypothesis. We mitigate the impact of erroneous pseudo-labels that may arise due to a high ideal joint hypothesis risk by employing adaptive adversarial perturbation on the input of the target model. Our proposed source-free framework effectively addresses challenging distribution shift scenarios and outperforms existing methods that require source data and labels for training.
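At its core, the estimator reduces to a disagreement rate between the frozen source model and its source-free-adapted counterpart; a minimal PyTorch sketch follows, with the adaptive adversarial perturbation step omitted.

import torch

@torch.no_grad()
def estimated_accuracy(source_model, adapted_model, loader, device="cpu"):
    agree, total = 0, 0
    for x in loader:                     # unlabeled target batches
        x = x.to(device)
        pred_src = source_model(x).argmax(dim=1)
        pred_ada = adapted_model(x).argmax(dim=1)
        agree += (pred_src == pred_ada).sum().item()
        total += x.size(0)
    return agree / total                 # 1 - disagreement rate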
Boundary-Refined Prototype Generation: A General End-to-End Paradigm for Semi-Supervised Semantic Segmentation
Abstract
Prototype-based classification is a classical method in machine learning, and it has recently achieved remarkable success in semi-supervised semantic segmentation. However, the current approach isolates the prototype initialization process from the main training framework, which appears to be unnecessary. Furthermore, while the direct use of the K-Means algorithm for prototype generation captures rich intra-class variance, it may not be the optimal solution for the classification task. To tackle these problems, we propose a novel boundary-refined prototype generation (BRPG) method, which is incorporated into the whole training framework. Specifically, our approach samples and clusters high- and low-confidence features separately based on a confidence threshold, aiming to generate prototypes closer to the class boundaries. Moreover, an adaptive prototype optimization strategy is introduced to augment the prototypes of categories with scattered feature distributions. Extensive experiments on the PASCAL VOC 2012 and Cityscapes datasets demonstrate the superiority and scalability of the proposed method, outperforming the current state-of-the-art approaches. The code is available at xxxxxxxxxxxxxx.
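The confidence-split clustering idea can be sketched with scikit-learn's KMeans: high- and low-confidence features of a class are clustered separately so that some prototypes land near the decision boundary. The threshold and cluster counts below are illustrative assumptions, and BRPG's adaptive refinement is omitted.

import numpy as np
from sklearn.cluster import KMeans

def class_prototypes(feats, confs, tau=0.8, k_high=2, k_low=2):
    hi, lo = feats[confs >= tau], feats[confs < tau]
    protos = []
    for sub, k in ((hi, k_high), (lo, k_low)):
        if len(sub) >= k:                # skip subsets too small to cluster
            protos.append(KMeans(n_clusters=k, n_init=10).fit(sub).cluster_centers_)
    return np.vstack(protos)             # prototypes for a single class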
An Improved NeuMIP with Better Accuracy
Authors: Bowen Xue, Shuang Zhao, Henrik Wann Jensen, Zahra Montazeri
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Neural reflectance models are capable of accurately reproducing the spatially-varying appearance of many real-world materials at different scales. However, existing methods have difficulty handling highly glossy materials. To address this problem, we introduce a new neural reflectance model which, compared with existing methods, better preserves not only specular highlights but also fine-grained details. To this end, inspired by NeRF, we enhance network performance by encoding the input data in frequency space, which better preserves details. Furthermore, we introduce a gradient-based loss and employ it in multiple stages, adapted to the progress of the learning phase. Lastly, we utilize an optional extension to the decoder network using the Inception module for more accurate, albeit costlier, results. We demonstrate the effectiveness of our method using a variety of synthetic and real examples.
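The frequency-space encoding referenced above is the standard NeRF-style positional encoding, lifting each input coordinate p to sin/cos bands (sin(2^k π p), cos(2^k π p)); a minimal PyTorch sketch:

import torch

def freq_encode(p: torch.Tensor, n_bands: int = 6) -> torch.Tensor:
    # p: (..., D) coordinates in roughly [-1, 1]
    bands = 2.0 ** torch.arange(n_bands, dtype=torch.float32) * torch.pi
    ang = p[..., None] * bands           # (..., D, n_bands)
    return torch.cat([ang.sin(), ang.cos()], dim=-1).flatten(-2)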
Keyword: quantization
ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats
Authors: Xiaoxia Wu, Zhewei Yao, Yuxiong He
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract
In the complex domain of large language models (LLMs), striking a balance between computational efficiency and model quality is a formidable challenge. Navigating the inherent limitations of uniform quantization, particularly when dealing with outliers, and motivated by the launch of NVIDIA's H100 hardware, this study delves into the viability of floating-point (FP) quantization, particularly focusing on FP8 and FP4, as a potential solution. Our comprehensive investigation reveals that for LLMs, FP8 activation quantization consistently outshines its integer (INT8) equivalent, with the performance edge becoming more noticeable in models with more than one billion parameters. For weight quantization, our findings indicate that FP4 exhibits comparable, if not superior, performance to INT4, simplifying deployment on FP-supported hardware such as the H100. To mitigate the overhead of precision alignment caused by the disparity between weights and activations, we propose two scaling constraints for weight quantization that negligibly impact performance compared to the standard W4A8 model. We additionally enhance our quantization methods by integrating the Low Rank Compensation (LoRC) strategy, yielding improvements especially in smaller models. The results of our investigation emphasize the immense potential of FP quantization for LLMs, paving the way for high-efficiency deployment in resource-limited settings.
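To make the FP4 weight side of W4A8 concrete, the sketch below rounds weights to the E2M1 FP4 magnitude grid {0, 0.5, 1, 1.5, 2, 3, 4, 6} with a per-output-channel scale; the paper's proposed scaling constraints and the LoRC strategy are not reproduced here.

import torch

FP4_GRID = torch.tensor([0., 0.5, 1., 1.5, 2., 3., 4., 6.])   # E2M1 magnitudes

def quantize_fp4(w: torch.Tensor) -> torch.Tensor:
    # per-output-channel symmetric scale mapping max |w| to the largest grid value
    scale = (w.abs().amax(dim=1, keepdim=True) / FP4_GRID.max()).clamp_min(1e-12)
    ws = w / scale
    # snap each scaled weight to the nearest signed grid value
    idx = (ws.abs().unsqueeze(-1) - FP4_GRID).abs().argmin(dim=-1)
    return ws.sign() * FP4_GRID[idx] * scale

w = torch.randn(8, 16)
print((quantize_fp4(w) - w).abs().mean())   # average quantization error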
Keyword: efficient
Enhancing Evacuation Planning through Multi-Agent Simulation and Artificial Intelligence: Understanding Human Behavior in Hazardous Environments
PLiNIO: A User-Friendly Library of Gradient-based Methods for Complexity-aware DNN Optimization
Variable Independence in Linear Real Arithmetic
Efficient Inverse-designed Structural Infill for Complex Engineering Structures
LOG-LIO: A LiDAR-Inertial Odometry with Efficient Local Geometric Information Estimation
Approximately counting independent sets in dense bipartite graphs via subspace enumeration
Guided Linear Upsampling
Gradient strikes back: How filtering out high frequencies improves explanations
AMRIC: A Novel In Situ Lossy Compression Framework for Efficient I/O in Adaptive Mesh Refinement Applications
VISER: A Tractable Solution Concept for Games with Information Asymmetry
Towards A Unified Agent with Foundation Models
PubMed and Beyond: Recent Advances and Best Practices in Biomedical Literature Search
Efficiency Pentathlon: A Standardized Arena for Efficiency Evaluation
Efficient Guided Generation for LLMs
Two Tales of Platoon Intelligence for Autonomous Mobility Control: Enabling Deep Learning Recipes
A simple and efficient convex optimization based bound-preserving high order accurate limiter for Cahn-Hilliard-Navier-Stokes system
Uncertainty-Driven Multi-Scale Feature Fusion Network for Real-time Image Deraining
Improved Distribution Matching for Dataset Condensation
Space Engage: Collaborative Space Supervision for Contrastive-based Semi-Supervised Semantic Segmentation
DVPT: Dynamic Visual Prompt Tuning of Large Pre-trained Models for Medical Image Analysis
A Low-Complexity Beamforming Design for Beyond-Diagonal RIS aided Multi-User Networks
On the optimality of target-data-dependent kernel greedy interpolation in Sobolev Reproducing Kernel Hilbert Spaces
Online Continual Learning for Robust Indoor Object Recognition
A Fast and Map-Free Model for Trajectory Prediction in Traffics
Near-Linear Time Projection onto the $\ell_{1,\infty}$ Ball; Application to Sparse Autoencoders
Nonlinear Model Predictive Control with Obstacle Avoidance Constraints for Autonomous Navigation in a Canal Environment
Transmitter Side Beyond-Diagonal Reconfigurable Intelligent Surface for Massive MIMO Networks
Cross-thread critical sections and efficient dynamic race prediction methods
Towards a population-informed approach to the definition of data-driven models for structural dynamics
Agricultural Robotic System: The Automation of Detection and Speech Control
Amortised Experimental Design and Parameter Estimation for User Models of Pointing
Amortised Design Optimization for Item Response Theory
Chit-Chat or Deep Talk: Prompt Engineering for Process Mining
TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic Tree-Based Memory Network
ProtoCaps: A Fast and Non-Iterative Capsule Network Routing Method
Fine-grained Text-Video Retrieval with Frozen Image Encoders
TinyTrain: Deep Neural Network Training at the Extreme Edge
Impact of Disentanglement on Pruning Neural Networks
TUNeS: A Temporal U-Net with Self-Attention for Video-based Surgical Phase Recognition
As large as it gets: Learning infinitely large Filters via Neural Implicit Functions in the Fourier Domain
6G Network Business Support System
Optimizing the extended Fourier Mellin Transformation Algorithm
Validation of Modern JSON Schema: Formalization and Complexity
Detection of Surface and Underwater Waste in Video Objects Using Robust and Efficient Post-Processing and Tubelet-Level Bounding Box Linking
Scientific Exploration of Challenging Planetary Analog Environments with a Team of Legged Robots
An efficient displacement-based isogeometric formulation for geometrically exact viscoelastic beams
Curvature-based Clustering on Graphs
Robust Driving Policy Learning with Guided Meta Reinforcement Learning
Adversarial Latent Autoencoder with Self-Attention for Structural Image Synthesis
LightPath: Lightweight and Scalable Path Representation Learning
Keyword: faster
Variable Independence in Linear Real Arithmetic
Efficient Inverse-designed Structural Infill for Complex Engineering Structures
Generating Redstone Style Cities in Minecraft
Fix your downsampling ASAP! Be natively more robust via Aliasing and Spectral Artifact free Pooling
A Fast and Map-Free Model for Trajectory Prediction in Traffics
Adversarial Likelihood Estimation with One-way Flows
DISA: DIfferentiable Similarity Approximation for Universal Multimodal Registration
EPUF: A Novel Scheme Based on Entropy Features of Latency-based DRAM PUFs Providing Lightweight Authentication in IoT Networks
TinyTrain: Deep Neural Network Training at the Extreme Edge
Fast Algorithms for a New Relaxation of Optimal Transport
Benchmarking Potential Based Rewards for Learning Humanoid Locomotion
LightPath: Lightweight and Scalable Path Representation Learning
Keyword: mobile
6G Network Business Support System
Keyword: pruning
Impact of Disentanglement on Pruning Neural Networks
Keyword: diffusion
Leveraging Mixed Precision in Exponential Time Integration Methods
With Flying Colors: Predicting Community Success in Large-scale Collaborative Campaigns
Unmaking AI Imagemaking: A Methodological Toolkit for Critical Investigation
Text2Layer: Layered Image Generation using Latent Diffusion Model
A Siamese-based Verification System for Open-set Architecture Attribution of Synthetic Images
BSDM: Background Suppression Diffusion Model for Hyperspectral Anomaly Detection
XSkill: Cross Embodiment Skill Discovery
FABRIC: Personalizing Diffusion Models with Iterative Feedback
Keyword: adaptive
Rethinking Intersection Over Union for Small Object Detection in Few-Shot Regime
Group Testing in Arbitrary Hypergraphs and Related Combinatorial Structures
AMRIC: A Novel In Situ Lossy Compression Framework for Efficient I/O in Adaptive Mesh Refinement Applications
Promoting Exploration in Memory-Augmented Adam using Critical Momenta
Physics-based Reduced Order Modeling for Uncertainty Quantification of Guided Wave Propagation using Bayesian Optimization
Domain Adaptation for Enhanced Object Detection in Foggy and Rainy Weather for Autonomous Driving
Towards Robust Scene Text Image Super-resolution via Explicit Location Enhancement
Towards Building More Robust Models with Frequency Bias
On the optimality of target-data-dependent kernel greedy interpolation in Sobolev Reproducing Kernel Hilbert Spaces
Hierarchical Spatio-Temporal Representation Learning for Gait Recognition
AutoAMG($\theta$): An Auto-tuned AMG Method Based on Deep Learning for Strong Threshold
A3D: Adaptive, Accurate, and Autonomous Navigation for Edge-Assisted Drones
A reinforcement learning approach for VQA validation: an application to diabetic macular edema grading
Source-Free Domain Adaptive Fundus Image Segmentation with Class-Balanced Mean Teacher
Learner Referral for Cost-Effective Federated Learning Over Hierarchical IoT Networks
TinyTrain: Deep Neural Network Training at the Extreme Edge
Non-Parametric Self-Identification and Model Predictive Control of Dexterous In-Hand Manipulation
Divert More Attention to Vision-Language Object Tracking
Unsupervised Accuracy Estimation of Deep Visual Models using Domain-Adaptive Adversarial Perturbation without Source Samples
Boundary-Refined Prototype Generation: A General End-to-End Paradigm for Semi-Supervised Semantic Segmentation
An Improved NeuMIP with Better Accuracy
Keyword: quantization
ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats