Abstract
We study a class of nonlinear nonlocal conservation laws with discontinuous flux, modeling crowd dynamics and traffic flow, without any additional conditions on finiteness/discreteness of the set of discontinuities or on the monotonicity of the kernel/the discontinuous coefficient. Strong compactness of the Godunov and Lax-Friedrichs type approximations is proved, providing the existence of entropy solutions. A proof of the uniqueness of the adapted entropy solutions is provided, establishing the convergence of the entire sequence of finite volume approximations to the adapted entropy solution. As per the current literature, this is the first well-posedness result for the aforesaid class and connects the theory of nonlocal conservation laws (with discontinuous flux), with its local counterpart in a generic setup. Some numerical examples are presented to display the performance of the schemes and explore the limiting behavior of these nonlocal conservation laws to their local counterparts.
PEANUT: A Human-AI Collaborative Tool for Annotating Audio-Visual Data
Abstract
Audio-visual learning seeks to enhance the computer's multi-modal perception leveraging the correlation between the auditory and visual modalities. Despite their many useful downstream tasks, such as video retrieval, AR/VR, and accessibility, the performance and adoption of existing audio-visual models have been impeded by the availability of high-quality datasets. Annotating audio-visual datasets is laborious, expensive, and time-consuming. To address this challenge, we designed and developed an efficient audio-visual annotation tool called Peanut. Peanut's human-AI collaborative pipeline separates the multi-modal task into two single-modal tasks, and utilizes state-of-the-art object detection and sound-tagging models to reduce the annotators' effort to process each frame and the number of manually-annotated frames needed. A within-subject user study with 20 participants found that Peanut can significantly accelerate the audio-visual data annotation process while maintaining high annotation accuracy.
Small, but important: Traffic light proposals for detecting small traffic lights and beyond
Authors: Tom Sanitz, Christian Wilms, Simone Frintrop
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Traffic light detection is a challenging problem in the context of self-driving cars and driver assistance systems. While most existing systems produce good results on large traffic lights, detecting small and tiny ones is often overlooked. A key problem here is the inherent downsampling in CNNs, leading to low-resolution features for detection. To mitigate this problem, we propose a new traffic light detection system, comprising a novel traffic light proposal generator that utilizes findings from general object proposal generation, fine-grained multi-scale features, and attention for efficient processing. Moreover, we design a new detection head for classifying and refining our proposals. We evaluate our system on three challenging, publicly available datasets and compare it against six methods. The results show substantial improvements of at least $12.6\%$ on small and tiny traffic lights, as well as strong results across all sizes of traffic lights.
BOURNE: Bootstrapped Self-supervised Learning Framework for Unified Graph Anomaly Detection
Authors: Jie Liu, Mengting He, Xuequn Shang, Jieming Shi, Bin Cui, Hongzhi Yin
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)
Abstract
Graph anomaly detection (GAD) has gained increasing attention in recent years due to its critical application in a wide range of domains, such as social networks, financial risk management, and traffic analysis. Existing GAD methods can be categorized into node and edge anomaly detection models based on the type of graph objects being detected. However, these methods typically treat node and edge anomalies as separate tasks, overlooking their associations and frequent co-occurrences in real-world graphs. As a result, they fail to leverage the complementary information provided by node and edge anomalies for mutual detection. Additionally, state-of-the-art GAD methods, such as CoLA and SL-GAD, heavily rely on negative pair sampling in contrastive learning, which incurs high computational costs, hindering their scalability to large graphs. To address these limitations, we propose a novel unified graph anomaly detection framework based on bootstrapped self-supervised learning (named BOURNE). We extract a subgraph (graph view) centered on each target node as node context and transform it into a dual hypergraph (hypergraph view) as edge context. These views are encoded using graph and hypergraph neural networks to capture the representations of nodes, edges, and their associated contexts. By swapping the context embeddings between nodes and edges and measuring the agreement in the embedding space, we enable the mutual detection of node and edge anomalies. Furthermore, we adopt a bootstrapped training strategy that eliminates the need for negative sampling, enabling BOURNE to handle large graphs efficiently. Extensive experiments conducted on six benchmark datasets demonstrate the superior effectiveness and efficiency of BOURNE in detecting both node and edge anomalies.
A Solution to Co-occurrence Bias: Attributes Disentanglement via Mutual Information Minimization for Pedestrian Attribute Recognition
Abstract
Recent studies on pedestrian attribute recognition progress with either explicit or implicit modeling of the co-occurrence among attributes. Considering that this known a prior is highly variable and unforeseeable regarding the specific scenarios, we show that current methods can actually suffer in generalizing such fitted attributes interdependencies onto scenes or identities off the dataset distribution, resulting in the underlined bias of attributes co-occurrence. To render models robust in realistic scenes, we propose the attributes-disentangled feature learning to ensure the recognition of an attribute not inferring on the existence of others, and which is sequentially formulated as a problem of mutual information minimization. Rooting from it, practical strategies are devised to efficiently decouple attributes, which substantially improve the baseline and establish state-of-the-art performance on realistic datasets like PETAzs and RAPzs. Code is released on https://github.com/SDret/A-Solution-to-Co-occurence-Bias-in-Pedestrian-Attribute-Recognition.
Higher-order multi-scale deep Ritz method for multi-scale problems of authentic composite materials
Authors: Jiale Linghu, Hao Dong, Junzhi Cui, Yufeng Nie
Abstract
The direct deep learning simulation for multi-scale problems remains a challenging issue. In this work, a novel higher-order multi-scale deep Ritz method (HOMS-DRM) is developed for thermal transfer equation of authentic composite materials with highly oscillatory and discontinuous coefficients. In this novel HOMS-DRM, higher-order multi-scale analysis and modeling are first employed to overcome limitations of prohibitive computation and Frequency Principle when direct deep learning simulation. Then, improved deep Ritz method are designed to high-accuracy and mesh-free simulation for macroscopic homogenized equation without multi-scale property and microscopic lower-order and higher-order cell problems with highly discontinuous coefficients. Moreover, the theoretical convergence of the proposed HOMS-DRM is rigorously demonstrated under appropriate assumptions. Finally, extensive numerical experiments are presented to show the computational accuracy of the proposed HOMS-DRM. This study offers a robust and high-accuracy multi-scale deep learning framework that enables the effective simulation and analysis of multi-scale problems of authentic composite materials.
Learning with Constraint Learning: New Perspective, Solution Strategy and Various Applications
Authors: Risheng Liu, Jiaxin Gao, Xuan Liu, Xin Fan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The complexity of learning problems, such as Generative Adversarial Network (GAN) and its variants, multi-task and meta-learning, hyper-parameter learning, and a variety of real-world vision applications, demands a deeper understanding of their underlying coupling mechanisms. Existing approaches often address these problems in isolation, lacking a unified perspective that can reveal commonalities and enable effective solutions. Therefore, in this work, we proposed a new framework, named Learning with Constraint Learning (LwCL), that can holistically examine challenges and provide a unified methodology to tackle all the above-mentioned complex learning and vision problems. Specifically, LwCL is designed as a general hierarchical optimization model that captures the essence of these diverse learning and vision problems. Furthermore, we develop a gradient-response based fast solution strategy to overcome optimization challenges of the LwCL framework. Our proposed framework efficiently addresses a wide range of applications in learning and vision, encompassing three categories and nine different problem types. Extensive experiments on synthetic tasks and real-world applications verify the effectiveness of our approach. The LwCL framework offers a comprehensive solution for tackling complex machine learning and computer vision problems, bridging the gap between theory and practice.
Efficient Multiuser AI Downloading via Reusable Knowledge Broadcasting
Authors: Hai Wu, Qunsong Zeng, Kaibin Huang
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI)
Abstract
For the 6G mobile networks, in-situ model downloading has emerged as an important use case to enable real-time adaptive artificial intelligence on edge devices. However, the simultaneous downloading of diverse and high-dimensional models to multiple devices over wireless links presents a significant communication bottleneck. To overcome the bottleneck, we propose the framework of model broadcasting and assembling (MBA), which represents the first attempt on leveraging reusable knowledge, referring to shared parameters among tasks, to enable parameter broadcasting to reduce communication overhead. The MBA framework comprises two key components. The first, the MBA protocol, defines the system operations including parameter selection from a model library, power control for broadcasting, and model assembling at devices. The second component is the joint design of parameter-selection-and-power-control (PS-PC), which provides guarantees on devices' model performance and minimizes the downloading latency. The corresponding optimization problem is simplified by decomposition into the sequential PS and PC sub-problems without compromising its optimality. The PS sub-problem is solved efficiently by designing two efficient algorithms. On one hand, the low-complexity algorithm of greedy parameter selection features the construction of candidate model sets and a selection metric, both of which are designed under the criterion of maximum reusable knowledge among tasks. On the other hand, the optimal tree-search algorithm gains its efficiency via the proposed construction of a compact binary tree pruned using model architecture constraints and an intelligent branch-and-bound search. Given optimal PS, the optimal PC policy is derived in closed form. Extensive experiments demonstrate the substantial reduction in downloading latency achieved by the proposed MBA compared to traditional model downloading.
Dynamic PlenOctree for Adaptive Sampling Refinement in Explicit NeRF
Authors: Haotian Bai, Yiqi Lin, Yize Chen, Lin Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The explicit neural radiance field (NeRF) has gained considerable interest for its efficient training and fast inference capabilities, making it a promising direction such as virtual reality and gaming. In particular, PlenOctree (POT)[1], an explicit hierarchical multi-scale octree representation, has emerged as a structural and influential framework. However, POT's fixed structure for direct optimization is sub-optimal as the scene complexity evolves continuously with updates to cached color and density, necessitating refining the sampling distribution to capture signal complexity accordingly. To address this issue, we propose the dynamic PlenOctree DOT, which adaptively refines the sample distribution to adjust to changing scene complexity. Specifically, DOT proposes a concise yet novel hierarchical feature fusion strategy during the iterative rendering process. Firstly, it identifies the regions of interest through training signals to ensure adaptive and efficient refinement. Next, rather than directly filtering out valueless nodes, DOT introduces the sampling and pruning operations for octrees to aggregate features, enabling rapid parameter learning. Compared with POT, our DOT outperforms it by enhancing visual quality, reducing over $55.15$/$68.84\%$ parameters, and providing 1.7/1.9 times FPS for NeRF-synthetic and Tanks $\&$ Temples, respectively. Project homepage:https://vlislab22.github.io/DOT. [1] Yu, Alex, et al. "Plenoctrees for real-time rendering of neural radiance fields." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
Learning Compliant Stiffness by Impedance Control-Aware Task Segmentation and Multi-objective Bayesian Optimization with Priors
Abstract
Rather than traditional position control, impedance control is preferred to ensure the safe operation of industrial robots programmed from demonstrations. However, variable stiffness learning studies have focused on task performance rather than safety (or compliance). Thus, this paper proposes a novel stiffness learning method to satisfy both task performance and compliance requirements. The proposed method optimizes the task and compliance objectives (T/C objectives) simultaneously via multi-objective Bayesian optimization. We define the stiffness search space by segmenting a demonstration into task phases, each with constant responsible stiffness. The segmentation is performed by identifying impedance control-aware switching linear dynamics (IC-SLD) from the demonstration. We also utilize the stiffness obtained by proposed IC-SLD as priors for efficient optimization. Experiments on simulated tasks and a real robot demonstrate that IC-SLD-based segmentation and the use of priors improve the optimization efficiency compared to existing baseline methods.
Prompt Guided Transformer for Multi-Task Dense Prediction
Abstract
Task-conditional architecture offers advantage in parameter efficiency but falls short in performance compared to state-of-the-art multi-decoder methods. How to trade off performance and model parameters is an important and difficult problem. In this paper, we introduce a simple and lightweight task-conditional model called Prompt Guided Transformer (PGT) to optimize this challenge. Our approach designs a Prompt-conditioned Transformer block, which incorporates task-specific prompts in the self-attention mechanism to achieve global dependency modeling and parameter-efficient feature adaptation across multiple tasks. This block is integrated into both the shared encoder and decoder, enhancing the capture of intra- and inter-task features. Moreover, we design a lightweight decoder to further reduce parameter usage, which accounts for only 2.7% of the total model parameters. Extensive experiments on two multi-task dense prediction benchmarks, PASCAL-Context and NYUD-v2, demonstrate that our approach achieves state-of-the-art results among task-conditional methods while using fewer parameters, and maintains a significant balance between performance and parameter size.
Leveraging Optical Communication Fiber and AI for Distributed Water Pipe Leak Detection
Authors: Huan Wu, Huan-Feng Duan, Wallace W. L. Lai, Kun Zhu, Xin Cheng, Hao Yin, Bin Zhou, Chun-Cheung Lai, Chao Lu, Xiaoli Ding
Abstract
Detecting leaks in water networks is a costly challenge. This article introduces a practical solution: the integration of optical network with water networks for efficient leak detection. Our approach uses a fiber-optic cable to measure vibrations, enabling accurate leak identification and localization by an intelligent algorithm. We also propose a method to access leak severity for prioritized repairs. Our solution detects even small leaks with flow rates as low as 0.027 L/s. It offers a cost-effective way to improve leak detection, enhance water management, and increase operational efficiency.
Co-attention Graph Pooling for Efficient Pairwise Graph Interaction Learning
Authors: Junhyun Lee, Bumsoo Kim, Minji Jeon, Jaewoo Kang
Abstract
Graph Neural Networks (GNNs) have proven to be effective in processing and learning from graph-structured data. However, previous works mainly focused on understanding single graph inputs while many real-world applications require pair-wise analysis for graph-structured data (e.g., scene graph matching, code searching, and drug-drug interaction prediction). To this end, recent works have shifted their focus to learning the interaction between pairs of graphs. Despite their improved performance, these works were still limited in that the interactions were considered at the node-level, resulting in high computational costs and suboptimal performance. To address this issue, we propose a novel and efficient graph-level approach for extracting interaction representations using co-attention in graph pooling. Our method, Co-Attention Graph Pooling (CAGPool), exhibits competitive performance relative to existing methods in both classification and regression tasks using real-world datasets, while maintaining lower computational complexity.
Improving Social Media Popularity Prediction with Multiple Post Dependencies
Authors: Zhizhen Zhang, Xiaohui Xie, Mengyu Yang, Ye Tian, Yong Jiang, Yong Cui
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract
Social Media Popularity Prediction has drawn a lot of attention because of its profound impact on many different applications, such as recommendation systems and multimedia advertising. Despite recent efforts to leverage the content of social media posts to improve prediction accuracy, many existing models fail to fully exploit the multiple dependencies between posts, which are important to comprehensively extract content information from posts. To tackle this problem, we propose a novel prediction framework named Dependency-aware Sequence Network (DSN) that exploits both intra- and inter-post dependencies. For intra-post dependency, DSN adopts a multimodal feature extractor with an efficient fine-tuning strategy to obtain task-specific representations from images and textual information of posts. For inter-post dependency, DSN uses a hierarchical information propagation method to learn category representations that could better describe the difference between posts. DSN also exploits recurrent networks with a series of gating layers for more flexible local temporal processing abilities and multi-head attention for long-term dependencies. The experimental results on the Social Media Popularity Dataset demonstrate the superiority of our method compared to existing state-of-the-art models.
Nonlinear reduced basis using mixture Wasserstein barycenters: application to an eigenvalue problem inspired from quantum chemistry
Authors: Maxime Dalery, Genevieve Dusson, Virginie Ehrlacher, Alexei Lozinski
Abstract
The aim of this article is to propose a new reduced-order modelling approach for parametric eigenvalue problems arising in electronic structure calculations. Namely, we develop nonlinear reduced basis techniques for the approximation of parametric eigenvalue problems inspired from quantum chemistry applications. More precisely, we consider here a one-dimensional model which is a toy model for the computation of the electronic ground state wavefunction of a system of electrons within a molecule, solution to the many-body electronic Schr\"odinger equation, where the varying parameters are the positions of the nuclei in the molecule. We estimate the decay rate of the Kolmogorov n-width of the set of solutions for this parametric problem in several settings, including the standard L2-norm as well as with distances based on optimal transport. The fact that the latter decays much faster than in the traditional L2-norm setting motivates us to propose a practical nonlinear reduced basis method, which is based on an offline greedy algorithm, and an efficient stochastic energy minimization in the online phase. We finally provide numerical results illustrating the capabilities of the method and good approximation properties, both in the offline and the online phase.
Abstract
In multi-task learning (MTL), gradient balancing has recently attracted more research interest than loss balancing since it often leads to better performance. However, loss balancing is much more efficient than gradient balancing, and thus it is still worth further exploration in MTL. Note that prior studies typically ignore that there exist varying improvable gaps across multiple tasks, where the improvable gap per task is defined as the distance between the current training progress and desired final training progress. Therefore, after loss balancing, the performance imbalance still arises in many cases. In this paper, following the loss balancing framework, we propose two novel improvable gap balancing (IGB) algorithms for MTL: one takes a simple heuristic, and the other (for the first time) deploys deep reinforcement learning for MTL. Particularly, instead of directly balancing the losses in MTL, both algorithms choose to dynamically assign task weights for improvable gap balancing. Moreover, we combine IGB and gradient balancing to show the complementarity between the two types of algorithms. Extensive experiments on two benchmark datasets demonstrate that our IGB algorithms lead to the best results in MTL via loss balancing and achieve further improvements when combined with gradient balancing. Code is available at https://github.com/YanqiDai/IGB4MTL.
On the Design of Region-Avoiding Metrics for Collision-Safe Motion Generation on Riemannian Manifolds
Authors: Holger Klein, Noémie Jaquier, Andre Meixner, Tamim Asfour
Abstract
The generation of energy-efficient and dynamic-aware robot motions that satisfy constraints such as joint limits, self-collisions, and collisions with the environment remains a challenge. In this context, Riemannian geometry offers promising solutions by identifying robot motions with geodesics on the so-called configuration space manifold. While this manifold naturally considers the intrinsic robot dynamics, constraints such as joint limits, self-collisions, and collisions with the environment remain overlooked. In this paper, we propose a modification of the Riemannian metric of the configuration space manifold allowing for the generation of robot motions as geodesics that efficiently avoid given regions. We introduce a class of Riemannian metrics based on barrier functions that guarantee strict region avoidance by systematically generating accelerations away from no-go regions in joint and task space. We evaluate the proposed Riemannian metric to generate energy-efficient, dynamic-aware, and collision-free motions of a humanoid robot as geodesics and sequences thereof.
Framework to Automatically Determine the Quality of Open Data Catalogs
Abstract
Data catalogs play a crucial role in modern data-driven organizations by facilitating the discovery, understanding, and utilization of diverse data assets. However, ensuring their quality and reliability is complex, especially in open and large-scale data environments. This paper proposes a framework to automatically determine the quality of open data catalogs, addressing the need for efficient and reliable quality assessment mechanisms. Our framework can analyze various core quality dimensions, such as accuracy, completeness, consistency, scalability, and timeliness, offer several alternatives for the assessment of compatibility and similarity across such catalogs as well as the implementation of a set of non-core quality dimensions such as provenance, readability, and licensing. The goal is to empower data-driven organizations to make informed decisions based on trustworthy and well-curated data assets. The source code that illustrates our approach can be downloaded from https://www.github.com/jorge-martinez-gil/dataq/.
From continuous-time formulations to discretization schemes: tensor trains and robust regression for BSDEs and parabolic PDEs
Authors: Lorenz Richter, Leon Sallandt, Nikolas Nüsken
Abstract
The numerical approximation of partial differential equations (PDEs) poses formidable challenges in high dimensions since classical grid-based methods suffer from the so-called curse of dimensionality. Recent attempts rely on a combination of Monte Carlo methods and variational formulations, using neural networks for function approximation. Extending previous work (Richter et al., 2021), we argue that tensor trains provide an appealing framework for parabolic PDEs: The combination of reformulations in terms of backward stochastic differential equations and regression-type methods holds the promise of leveraging latent low-rank structures, enabling both compression and efficient computation. Emphasizing a continuous-time viewpoint, we develop iterative schemes, which differ in terms of computational efficiency and robustness. We demonstrate both theoretically and numerically that our methods can achieve a favorable trade-off between accuracy and computational efficiency. While previous methods have been either accurate or fast, we have identified a novel numerical strategy that can often combine both of these aspects.
Improving Image Quality of Sparse-view Lung Cancer CT Images with a Convolutional Neural Network
Authors: Annika Ries, Tina Dorosti, Johannes Thalhammer, Daniel Sasse, Andreas Sauter, Felix Meurer, Ashley Benne, Franz Pfeiffer, Daniela Pfeiffer
Subjects: Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)
Abstract
Purpose: To improve the image quality of sparse-view computed tomography (CT) images with a U-Net for lung cancer detection and to determine the best trade-off between number of views, image quality, and diagnostic confidence. Methods: CT images from 41 subjects (34 with lung cancer, seven healthy) were retrospectively selected (01.2016-12.2018) and forward projected onto 2048-view sinograms. Six corresponding sparse-view CT data subsets at varying levels of undersampling were reconstructed from sinograms using filtered backprojection with 16, 32, 64, 128, 256, and 512 views, respectively. A dual-frame U-Net was trained and evaluated for each subsampling level on 8,658 images from 22 diseased subjects. A representative image per scan was selected from 19 subjects (12 diseased, seven healthy) for a single-blinded reader study. The selected slices, for all levels of subsampling, with and without post-processing by the U-Net model, were presented to three readers. Image quality and diagnostic confidence were ranked using pre-defined scales. Subjective nodule segmentation was evaluated utilizing sensitivity (Se) and Dice Similarity Coefficient (DSC) with 95% confidence intervals (CI). Results: The 64-projection sparse-view images resulted in Se = 0.89 and DSC = 0.81 [0.75,0.86] while their counterparts, post-processed with the U-Net, had improved metrics (Se = 0.94, DSC = 0.85 [0.82,0.87]). Fewer views lead to insufficient quality for diagnostic purposes. For increased views, no substantial discrepancies were noted between the sparse-view and post-processed images. Conclusion: Projection views can be reduced from 2048 to 64 while maintaining image quality and the confidence of the radiologists on a satisfactory level.
Fast Prototyping Next-Generation Accelerators for New ML Models using MASE: ML Accelerator System Exploration
Authors: Jianyi Cheng, Cheng Zhang, Zhewen Yu, Alex Montgomerie-Corcoran, Can Xiao, Christos-Savvas Bouganis, Yiren Zhao
Abstract
Machine learning (ML) accelerators have been studied and used extensively to compute ML models with high performance and low power. However, designing such accelerators normally takes a long time and requires significant effort. Unfortunately, the pace of development of ML software models is much faster than the accelerator design cycle, leading to frequent and drastic modifications in the model architecture, thus rendering many accelerators obsolete. Existing design tools and frameworks can provide quick accelerator prototyping, but only for a limited range of models that can fit into a single hardware device, such as an FPGA. Furthermore, with the emergence of large language models, such as GPT-3, there is an increased need for hardware prototyping of these large models within a many-accelerator system to ensure the hardware can scale with the ever-growing model sizes. In this paper, we propose an efficient and scalable approach for exploring accelerator systems to compute large ML models. We developed a tool named MASE that can directly map large ML models onto an efficient streaming accelerator system. Over a set of ML models, we show that MASE can achieve better energy efficiency to GPUs when computing inference for recent transformer models. Our tool will open-sourced upon publication.
Abstract
In this paper we give the first efficient algorithms for the $k$-center problem on dynamic graphs undergoing edge updates. In this problem, the goal is to partition the input into $k$ sets by choosing $k$ centers such that the maximum distance from any data point to the closest center is minimized. It is known that it is NP-hard to get a better than $2$ approximation for this problem. While in many applications the input may naturally be modeled as a graph, all prior works on $k$-center problem in dynamic settings are on metrics. In this paper, we give a deterministic decremental $(2+\epsilon)$-approximation algorithm and a randomized incremental $(4+\epsilon)$-approximation algorithm, both with amortized update time $kn^{o(1)}$ for weighted graphs. Moreover, we show a reduction that leads to a fully dynamic $(2+\epsilon)$-approximation algorithm for the $k$-center problem, with worst-case update time that is within a factor $k$ of the state-of-the-art upper bound for maintaining $(1+\epsilon)$-approximate single-source distances in graphs. Matching this bound is a natural goalpost because the approximate distances of each vertex to its center can be used to maintain a $(2+\epsilon)$-approximation of the graph diameter and the fastest known algorithms for such a diameter approximation also rely on maintaining approximate single-source distances.
Swiper and Dora: efficient solutions to weighted distributed problems
Authors: Luciano Freitas, Andrei Tonkikh
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)
Abstract
The majority of fault-tolerant distributed algorithms are designed assuming a nominal corruption model, in which at most a fraction $f_n$ of parties can be corrupted by the adversary. However, due to the infamous Sybil attack, nominal models are not sufficient to express the trust assumptions in open (i.e., permissionless) settings. Instead, permissionless systems typically operate in a weighted model, where each participant is associated with a weight and the adversary can corrupt a set of parties holding at most a fraction $f_w$ of total weight. In this paper, we suggest a simple way to transform a large class of protocols designed for the nominal model into the weighted model. To this end, we formalize and solve three novel optimization problems, which we collectively call the weight reduction problems, that allow us to map large real weights into small integer weights while preserving the properties necessary for the correctness of the protocols. In all cases, we manage to keep the sum of the integer weights to be at most linear in the number of parties, resulting in extremely efficient protocols for the weighted model. Moreover, we demonstrate that, on weight distributions that emerge in practice, the sum of the integer weights tends to be far from the theoretical worst-case and, often even smaller than the number of participants. While, for some protocols, our transformation requires an arbitrarily small reduction in resilience (i.e., $f_w = f_n - \epsilon$), surprisingly, for many important problems we manage to obtain weighted solutions with the same resilience ($f_w = f_n$) as nominal ones. Notable examples include asynchronous consensus, verifiable secret sharing, erasure-coded distributed storage and broadcast protocols.
Learning to Open Doors with an Aerial Manipulator
Authors: Eugenio Cuniato, Ismail Geles, Weixuan Zhang, Olov Andersson, Marco Tognon, Roland Siegwart
Abstract
The field of aerial manipulation has seen rapid advances, transitioning from push-and-slide tasks to interaction with articulated objects. So far, when more complex actions are performed, the motion trajectory is usually handcrafted or a result of online optimization methods like Model Predictive Control (MPC) or Model Predictive Path Integral (MPPI) control. However, these methods rely on heuristics or model simplifications to efficiently run on onboard hardware, producing results in acceptable amounts of time. Moreover, they can be sensitive to disturbances and differences between the real environment and its simulated counterpart. In this work, we propose a Reinforcement Learning (RL) approach to learn motion behaviors for a manipulation task while producing policies that are robust to disturbances and modeling errors. Specifically, we train a policy to perform a door-opening task with an Omnidirectional Micro Aerial Vehicle (OMAV). The policy is trained in a physics simulator and experiments are presented both in simulation and running onboard the real platform, investigating the simulation to real world transfer. We compare our method against a state-of-the-art MPPI solution, showing a considerable increase in robustness and speed.
Quasi-Monte Carlo Algorithms (not only) for Graphics Software
Authors: Alexander Keller, Carsten Wächter, Nikolaus Binder
Abstract
Quasi-Monte Carlo methods have become the industry standard in computer graphics. For that purpose, efficient algorithms for low discrepancy sequences are discussed. In addition, numerical pitfalls encountered in practice are revealed. We then take a look at massively parallel quasi-Monte Carlo integro-approximation for image synthesis by light transport simulation. Beyond superior uniformity, low discrepancy points may be optimized with respect to additional criteria, such as noise characteristics at low sampling rates or the quality of low-dimensional projections.
Shrink-Perturb Improves Architecture Mixing during Population Based Training for Neural Architecture Search
Authors: Alexander Chebykin, Arkadiy Dushatskiy, Tanja Alderliesten, Peter A. N. Bosman
Abstract
In this work, we show that simultaneously training and mixing neural networks is a promising way to conduct Neural Architecture Search (NAS). For hyperparameter optimization, reusing the partially trained weights allows for efficient search, as was previously demonstrated by the Population Based Training (PBT) algorithm. We propose PBT-NAS, an adaptation of PBT to NAS where architectures are improved during training by replacing poorly-performing networks in a population with the result of mixing well-performing ones and inheriting the weights using the shrink-perturb technique. After PBT-NAS terminates, the created networks can be directly used without retraining. PBT-NAS is highly parallelizable and effective: on challenging tasks (image generation and reinforcement learning) PBT-NAS achieves superior performance compared to baselines (random search and mutation-based PBT).
Two-level Continuous Topology Optimization in Structural Mechanics
Authors: Rafael Merli, Antolín Martínez-Martínez, Juan José Ródenas, Marc Bosch-Galera, Enrique Nadal
Subjects: Computational Engineering, Finance, and Science (cs.CE); Numerical Analysis (math.NA)
Abstract
In the current industry, the development of optimized mechanical components able to satisfy the customer requirements evolves quickly. Therefore, companies are asked for efficient solutions to improve their products in terms of stiffness and strength. In this sense, Topology Optimization has been extensively used to determine the best topology of structural components from the mechanical point of view. Its main objective is to distribute a given amount of material into a predefined domain to reach the maximum overall stiffness of the component. Besides, high-resolution solutions are essential to define the final distribution of material. Standard Topological Optimization tools are able to propose an optimal topology for the whole component, but when small topological details are required (i.e. trabecular-type structures) the computational cost is prohibitive. In order to mitigate this issue, the present work proposes a two-level topology optimization method to solve high-resolution problems by using density-based methods. The proposed methodology includes three steps: The first one subdivides the whole component in cells and generates a coarse optimized low-definition material distribution assigning one different density to each cell. The second one uses an equilibrating technique that provides tractions continuity between adjacent cells, thus ensuring the material inter-cell continuity after the cells optimization process. Finally, each cell is optimized at fine scale taking as input data the densities and the equilibrated tractions obtained from the macro problem. The main goal of this work is to efficiently solve high-resolution topology optimization problems using density-based methods, which would be unaffordable with standard computing facilities and the current methodologies.
Benchmarking Offline Reinforcement Learning on Real-Robot Hardware
Authors: Nico Gürtler, Sebastian Blaes, Pavel Kolev, Felix Widmaier, Manuel Wüthrich, Stefan Bauer, Bernhard Schölkopf, Georg Martius
Abstract
Learning policies from previously recorded data is a promising direction for real-world robotics tasks, as online learning is often infeasible. Dexterous manipulation in particular remains an open problem in its general form. The combination of offline reinforcement learning with large diverse datasets, however, has the potential to lead to a breakthrough in this challenging domain analogously to the rapid progress made in supervised learning in recent years. To coordinate the efforts of the research community toward tackling this problem, we propose a benchmark including: i) a large collection of data for offline learning from a dexterous manipulation platform on two tasks, obtained with capable RL agents trained in simulation; ii) the option to execute learned policies on a real-world robotic system and a simulation for efficient debugging. We evaluate prominent open-sourced offline reinforcement learning algorithms on the datasets and provide a reproducible experimental setup for offline reinforcement learning on real systems.
SimDETR: Simplifying self-supervised pretraining for DETR
Authors: Ioannis Maniadis Metaxas, Adrian Bulat, Ioannis Patras, Brais Martinez, Georgios Tzimiropoulos
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
DETR-based object detectors have achieved remarkable performance but are sample-inefficient and exhibit slow convergence. Unsupervised pretraining has been found to be helpful to alleviate these impediments, allowing training with large amounts of unlabeled data to improve the detector's performance. However, existing methods have their own limitations, like keeping the detector's backbone frozen in order to avoid performance degradation and utilizing pretraining objectives misaligned with the downstream task. To overcome these limitations, we propose a simple pretraining framework for DETR-based detectors that consists of three simple yet key ingredients: (i) richer, semantics-based initial proposals derived from high-level feature maps, (ii) discriminative training using object pseudo-labels produced via clustering, (iii) self-training to take advantage of the improved object proposals learned by the detector. We report two main findings: (1) Our pretraining outperforms prior DETR pretraining works on both the full and low data regimes by significant margins. (2) We show we can pretrain DETR from scratch (including the backbone) directly on complex image datasets like COCO, paving the path for unsupervised representation learning directly using DETR.
The Strong Maximum Circulation Algorithm: A New Method for Aggregating Preference Rankings
Authors: Nathan Atkinson, Scott C. Ganz, Dorit S. Hochbaum, James B. Orlin
Subjects: Social and Information Networks (cs.SI); Theoretical Economics (econ.TH); Optimization and Control (math.OC)
Abstract
We present a new optimization-based method for aggregating preferences in settings where each decision maker, or voter, expresses preferences over pairs of alternatives. The challenge is to come up with a ranking that agrees as much as possible with the votes cast in cases when some of the votes conflict. Only a collection of votes that contains no cycles is non-conflicting and can induce a partial order over alternatives. Our approach is motivated by the observation that a collection of votes that form a cycle can be treated as ties. The method is then to remove unions of cycles of votes, or circulations, from the vote graph and determine aggregate preferences from the remainder. We introduce the strong maximum circulation which is formed by a union of cycles, the removal of which guarantees a unique outcome in terms of the induced partial order. Furthermore, it contains all the aggregate preferences remaining following the elimination of any maximum circulation. In contrast, the well-known, optimization-based, Kemeny method has non-unique output and can return multiple, conflicting rankings for the same input. In addition, Kemeny's method requires solving an NP-hard problem, whereas our algorithm is efficient, based on network flow techniques, and runs in strongly polynomial time, independent of the number of votes. We address the construction of a ranking from the partial order and show that rankings based on a convex relaxation of Kemeny's model are consistent with our partial order. We then study the properties of removing a maximal circulation versus a maximum circulation and establish that, while maximal circulations will in general identify a larger number of aggregate preferences, the partial orders induced by the removal of different maximal circulations are not unique and may be conflicting. Moreover, finding a minimum maximal circulation is an NP-hard problem.
Keyword: faster
Nonlinear reduced basis using mixture Wasserstein barycenters: application to an eigenvalue problem inspired from quantum chemistry
Authors: Maxime Dalery, Genevieve Dusson, Virginie Ehrlacher, Alexei Lozinski
Abstract
The aim of this article is to propose a new reduced-order modelling approach for parametric eigenvalue problems arising in electronic structure calculations. Namely, we develop nonlinear reduced basis techniques for the approximation of parametric eigenvalue problems inspired from quantum chemistry applications. More precisely, we consider here a one-dimensional model which is a toy model for the computation of the electronic ground state wavefunction of a system of electrons within a molecule, solution to the many-body electronic Schr\"odinger equation, where the varying parameters are the positions of the nuclei in the molecule. We estimate the decay rate of the Kolmogorov n-width of the set of solutions for this parametric problem in several settings, including the standard L2-norm as well as with distances based on optimal transport. The fact that the latter decays much faster than in the traditional L2-norm setting motivates us to propose a practical nonlinear reduced basis method, which is based on an offline greedy algorithm, and an efficient stochastic energy minimization in the online phase. We finally provide numerical results illustrating the capabilities of the method and good approximation properties, both in the offline and the online phase.
Fast Prototyping Next-Generation Accelerators for New ML Models using MASE: ML Accelerator System Exploration
Authors: Jianyi Cheng, Cheng Zhang, Zhewen Yu, Alex Montgomerie-Corcoran, Can Xiao, Christos-Savvas Bouganis, Yiren Zhao
Abstract
Machine learning (ML) accelerators have been studied and used extensively to compute ML models with high performance and low power. However, designing such accelerators normally takes a long time and requires significant effort. Unfortunately, the pace of development of ML software models is much faster than the accelerator design cycle, leading to frequent and drastic modifications in the model architecture, thus rendering many accelerators obsolete. Existing design tools and frameworks can provide quick accelerator prototyping, but only for a limited range of models that can fit into a single hardware device, such as an FPGA. Furthermore, with the emergence of large language models, such as GPT-3, there is an increased need for hardware prototyping of these large models within a many-accelerator system to ensure the hardware can scale with the ever-growing model sizes. In this paper, we propose an efficient and scalable approach for exploring accelerator systems to compute large ML models. We developed a tool named MASE that can directly map large ML models onto an efficient streaming accelerator system. Over a set of ML models, we show that MASE can achieve better energy efficiency to GPUs when computing inference for recent transformer models. Our tool will open-sourced upon publication.
Keyword: mobile
From the Desks of ROS Maintainers: A Survey of Modern & Capable Mobile Robotics Algorithms in the Robot Operating System 2
Authors: Steve Macenski, Tom Moore, David Lu, Alexey Merzlyakov
Abstract
The Robot Operating System 2 (ROS 2) is rapidly impacting the intelligent machines sector -- on space missions, large agriculture equipment, multi-robot fleets, and more. Its success derives from its focused design and improved capabilities targeting product-grade and modern robotic systems. Following ROS 2's example, the mobile robotics ecosystem has been fully redesigned based on the transformed needs of modern robots and is experiencing active development not seen since its inception. This paper comes from the desks of the key ROS Navigation maintainers to review and analyze the state of the art of robotics navigation in ROS 2. This includes new systems without parallel in ROS 1 or other similar mobile robotics frameworks. We discuss current research products and historically robust methods that provide differing behaviors and support for most every robot type. This survey consists of overviews, comparisons, and expert insights organized by the fundamental problems in the field. Some of these implementations have yet to be described in literature and many have not been benchmarked relative to others. We end by providing a glimpse into the future of the ROS 2 mobile robotics ecosystem.
Efficient Multiuser AI Downloading via Reusable Knowledge Broadcasting
Authors: Hai Wu, Qunsong Zeng, Kaibin Huang
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI)
Abstract
For the 6G mobile networks, in-situ model downloading has emerged as an important use case to enable real-time adaptive artificial intelligence on edge devices. However, the simultaneous downloading of diverse and high-dimensional models to multiple devices over wireless links presents a significant communication bottleneck. To overcome the bottleneck, we propose the framework of model broadcasting and assembling (MBA), which represents the first attempt on leveraging reusable knowledge, referring to shared parameters among tasks, to enable parameter broadcasting to reduce communication overhead. The MBA framework comprises two key components. The first, the MBA protocol, defines the system operations including parameter selection from a model library, power control for broadcasting, and model assembling at devices. The second component is the joint design of parameter-selection-and-power-control (PS-PC), which provides guarantees on devices' model performance and minimizes the downloading latency. The corresponding optimization problem is simplified by decomposition into the sequential PS and PC sub-problems without compromising its optimality. The PS sub-problem is solved efficiently by designing two efficient algorithms. On one hand, the low-complexity algorithm of greedy parameter selection features the construction of candidate model sets and a selection metric, both of which are designed under the criterion of maximum reusable knowledge among tasks. On the other hand, the optimal tree-search algorithm gains its efficiency via the proposed construction of a compact binary tree pruned using model architecture constraints and an intelligent branch-and-bound search. Given optimal PS, the optimal PC policy is derived in closed form. Extensive experiments demonstrate the substantial reduction in downloading latency achieved by the proposed MBA compared to traditional model downloading.
Non-invasive Diabetes Detection using Gabor Filter: A Comparative Analysis of Different Cameras
Authors: Christina A. Garcia, Patricia Angela R. Abu, Rosula SJ. Reyes
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
This paper compares and explores the performance of both mobile device camera and laptop camera as convenient tool for capturing images for non-invasive detection of Diabetes Mellitus (DM) using facial block texture features. Participants within age bracket 20 to 79 years old were chosen for the dataset. 12mp and 7mp mobile cameras, and a laptop camera were used to take the photo under normal lighting condition. Extracted facial blocks were classified using k-Nearest Neighbors (k-NN) and Support Vector Machine (SVM). 100 images were captured, preprocessed, filtered using Gabor, and iterated. Performance of the system was measured in terms of accuracy, specificity, and sensitivity. Best performance of 96.7% accuracy, 100% sensitivity, and 93% specificity were achieved from 12mp back camera using SVM with 100 images.
DISCO: Achieving Low Latency and High Reliability in Scheduling of Graph-Structured Tasks over Mobile Vehicular Cloud
Authors: Minghui Liwang, Bingshuo Guo, Zhanxi Ma, Yuhan Su, Seyyedali Hosseinalipour, Xianbin Wang, Huaiyu Dai
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI); Social and Information Networks (cs.SI)
Abstract
To effectively process data across a fleet of dynamic and distributed vehicles, it is crucial to implement resource provisioning techniques that provide reliable, cost-effective, and real-time computing services. This article explores resource provisioning for computation-intensive tasks over mobile vehicular clouds (MVCs). We use undirected weighted graphs (UWGs) to model both the execution of tasks and communication patterns among vehicles in a MVC. We then study low-latency and reliable scheduling of UWG asks through a novel methodology named double-plan-promoted isomorphic subgraph search and optimization (DISCO). In DISCO, two complementary plans are envisioned to ensure effective task completion: Plan A and Plan B.Plan A analyzes the past data to create an optimal mapping ($\alpha$) between tasks and the MVC in advance to the practical task scheduling. Plan B serves as a dependable backup, designed to find a feasible mapping ($\beta$) in case $\alpha$ fails during task scheduling due to unpredictable nature of the network.We delve into into DISCO's procedure and key factors that contribute to its success. Additionally, we provide a case study that includes comprehensive comparisons to demonstrate DISCO's exceptional performance in regards to time efficiency and overhead. We further discuss a series of open directions for future research.
The Applicability of Federated Learning to Official Statistics
Authors: Joshua Stock, Oliver Hauke, Julius Weißmann, Hannes Federrath
Abstract
This work investigates the potential of Federated Learning (FL) for official statistics and shows how well the performance of FL models can keep up with centralized learning methods. At the same time, its utilization can safeguard the privacy of data holders, thus facilitating access to a broader range of data and ultimately enhancing official statistics. By simulating three different use cases, important insights on the applicability of the technology are gained. The use cases are based on a medical insurance data set, a fine dust pollution data set and a mobile radio coverage data set - all of which are from domains close to official statistics. We provide a detailed analysis of the results, including a comparison of centralized and FL algorithm performances for each simulation. In all three use cases, we were able to train models via FL which reach a performance very close to the centralized model benchmarks. Our key observations and their implications for transferring the simulations into practice are summarized. We arrive at the conclusion that FL has the potential to emerge as a pivotal technology in future use cases of official statistics.
Keyword: pruning
Dynamic PlenOctree for Adaptive Sampling Refinement in Explicit NeRF
Authors: Haotian Bai, Yiqi Lin, Yize Chen, Lin Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The explicit neural radiance field (NeRF) has gained considerable interest for its efficient training and fast inference capabilities, making it a promising direction such as virtual reality and gaming. In particular, PlenOctree (POT)[1], an explicit hierarchical multi-scale octree representation, has emerged as a structural and influential framework. However, POT's fixed structure for direct optimization is sub-optimal as the scene complexity evolves continuously with updates to cached color and density, necessitating refining the sampling distribution to capture signal complexity accordingly. To address this issue, we propose the dynamic PlenOctree DOT, which adaptively refines the sample distribution to adjust to changing scene complexity. Specifically, DOT proposes a concise yet novel hierarchical feature fusion strategy during the iterative rendering process. Firstly, it identifies the regions of interest through training signals to ensure adaptive and efficient refinement. Next, rather than directly filtering out valueless nodes, DOT introduces the sampling and pruning operations for octrees to aggregate features, enabling rapid parameter learning. Compared with POT, our DOT outperforms it by enhancing visual quality, reducing over $55.15$/$68.84\%$ parameters, and providing 1.7/1.9 times FPS for NeRF-synthetic and Tanks $\&$ Temples, respectively. Project homepage:https://vlislab22.github.io/DOT. [1] Yu, Alex, et al. "Plenoctrees for real-time rendering of neural radiance fields." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
Information-based Preprocessing of PLC Data for Automatic Behavior Modeling
Authors: Brandon K. Sai, Jonas Gram, Thomas Bauernhansl
Subjects: Systems and Control (eess.SY); Computational Engineering, Finance, and Science (cs.CE); Methodology (stat.ME)
Abstract
Cyber-physical systems (CPS) offer immense optimization potential for manufacturing processes through the availability of multivariate time series data of actors and sensors. Based on automated analysis software, the deployment of adaptive and responsive measures is possible for time series data. Due to the complex and dynamic nature of modern manufacturing, analysis and modeling often cannot be entirely automated. Even machine- or deep learning approaches often depend on a priori expert knowledge and labelling. In this paper, an information-based data preprocessing approach is proposed. By applying statistical methods including variance and correlation analysis, an approximation of the sampling rate in event-based systems and the utilization of spectral analysis, knowledge about the underlying manufacturing processes can be gained prior to modeling. The paper presents, how statistical analysis enables the pruning of a dataset's least important features and how the sampling rate approximation approach sets the base for further data analysis and modeling. The data's underlying periodicity, originating from the cyclic nature of an automated manufacturing process, will be detected by utilizing the fast Fourier transform. This information-based preprocessing method will then be validated for process time series data of cyber-physical systems' programmable logic controllers (PLC).
Keyword: diffusion
Recovering high-quality FODs from a reduced number of diffusion-weighted images using a model-driven deep learning architecture
Authors: J Bartlett, C E Davey, L A Johnston, J Duan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Abstract
Fibre orientation distribution (FOD) reconstruction using deep learning has the potential to produce accurate FODs from a reduced number of diffusion-weighted images (DWIs), decreasing total imaging time. Diffusion acquisition invariant representations of the DWI signals are typically used as input to these methods to ensure that they can be applied flexibly to data with different b-vectors and b-values; however, this means the network cannot condition its output directly on the DWI signal. In this work, we propose a spherical deconvolution network, a model-driven deep learning FOD reconstruction architecture, that ensures intermediate and output FODs produced by the network are consistent with the input DWI signals. Furthermore, we implement a fixel classification penalty within our loss function, encouraging the network to produce FODs that can subsequently be segmented into the correct number of fixels and improve downstream fixel-based analysis. Our results show that the model-based deep learning architecture achieves competitive performance compared to a state-of-the-art FOD super-resolution network, FOD-Net. Moreover, we show that the fixel classification penalty can be tuned to offer improved performance with respect to metrics that rely on accurately segmented of FODs. Our code is publicly available at https://github.com/Jbartlett6/SDNet .
Mixbiotic society measures: Comparison of organizational structures based on communication simulation
Abstract
The philosophical world has proposed the concept of "mixbiotic society," in which individuals with freedom and diverse values mix and mingle to recognize their respective "fundamental incapability" each other and sublimate into solidarity, toward solving the issues of social isolation and fragmentation. Based on this concept, the mixbiotic society measures have been proposed to evaluate dynamic communication patterns with reference to classification in cellular automata and particle reaction-diffusion that simulate living phenomena. In this paper, we applied these measures to five typologies of organizational structure (Red: impulsive, Amber: adaptive, Orange: achievement, Green: pluralistic, and Teal: evolutionary) and evaluated their features. Specifically, we formed star, tree, tree+jumpers, tree+more jumpers, and small-world type networks corresponding to each of five typologies, conducted communication simulations on these networks, and calculated values for mixbiotic society measures. The results showed that Teal organization has the highest value of the mixism measure among mixbiotic society measures, i.e., it balances similarity (mixing) and dissimilarity (mingling) in communication, and is living and mixbiotic between order and chaos. Measures other than mixism showed that in Teal organization, information is not concentrated in a central leader and that communication takes place among various members. This evaluation of organizational structures shows that the mixbiotic society measures is also useful for assessing organizational change. In the future, these measures will be used not only in business organizations, but also in digital democratic organizations and platform cooperatives in conjunction with information technology.
Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding
Authors: Chunyu Qiang, Hao Li, Hao Ni, He Qu, Ruibo Fu, Tao Wang, Longbiao Wang, Jianwu Dang
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Abstract
Recently, there has been a growing interest in text-to-speech (TTS) methods that can be trained with minimal supervision by combining two types of discrete speech representations and using two sequence-to-sequence tasks to decouple TTS. To address the challenges associated with high dimensionality and waveform distortion in discrete representations, we propose Diff-LM-Speech, which models semantic embeddings into mel-spectrogram based on diffusion models and introduces a prompt encoder structure based on variational autoencoders and prosody bottlenecks to improve prompt representation capabilities. Autoregressive language models often suffer from missing and repeated words, while non-autoregressive frameworks face expression averaging problems due to duration prediction models. To address these issues, we propose Tetra-Diff-Speech, which designs a duration diffusion model to achieve diverse prosodic expressions. While we expect the information content of semantic coding to be between that of text and acoustic coding, existing models extract semantic coding with a lot of redundant information and dimensionality explosion. To verify that semantic coding is not necessary, we propose Tri-Diff-Speech. Experimental results show that our proposed methods outperform baseline methods. We provide a website with audio samples.
Keyword: adaptive
Writer adaptation for offline text recognition: An exploration of neural network-based methods
Authors: Tobias van der Werff, Maruf A. Dhali, Lambert Schomaker
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Abstract
Handwriting recognition has seen significant success with the use of deep learning. However, a persistent shortcoming of neural networks is that they are not well-equipped to deal with shifting data distributions. In the field of handwritten text recognition (HTR), this shows itself in poor recognition accuracy for writers that are not similar to those seen during training. An ideal HTR model should be adaptive to new writing styles in order to handle the vast amount of possible writing styles. In this paper, we explore how HTR models can be made writer adaptive by using only a handful of examples from a new writer (e.g., 16 examples) for adaptation. Two HTR architectures are used as base models, using a ResNet backbone along with either an LSTM or Transformer sequence decoder. Using these base models, two methods are considered to make them writer adaptive: 1) model-agnostic meta-learning (MAML), an algorithm commonly used for tasks such as few-shot classification, and 2) writer codes, an idea originating from automatic speech recognition. Results show that an HTR-specific version of MAML known as MetaHTR improves performance compared to the baseline with a 1.4 to 2.0 improvement in word error rate (WER). The improvement due to writer adaptation is between 0.2 and 0.7 WER, where a deeper model seems to lend itself better to adaptation using MetaHTR than a shallower model. However, applying MetaHTR to larger HTR models or sentence-level HTR may become prohibitive due to its high computational and memory requirements. Lastly, writer codes based on learned features or Hinge statistical features did not lead to improved recognition performance.
A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity
Abstract
We investigate the fixed-budget best-arm identification (BAI) problem for linear bandits in a potentially non-stationary environment. Given a finite arm set $\mathcal{X}\subset\mathbb{R}^d$, a fixed budget $T$, and an unpredictable sequence of parameters $\left\lbrace\thetat\right\rbrace{t=1}^{T}$, an algorithm will aim to correctly identify the best arm $x^ := \arg\max{x\in\mathcal{X}}x^\top\sum{t=1}^{T}\theta_t$ with probability as high as possible. Prior work has addressed the stationary setting where $\theta_t = \theta_1$ for all $t$ and demonstrated that the error probability decreases as $\exp(-T /\rho^)$ for a problem-dependent constant $\rho^$. But in many real-world $A/B/n$ multivariate testing scenarios that motivate our work, the environment is non-stationary and an algorithm expecting a stationary setting can easily fail. For robust identification, it is well-known that if arms are chosen randomly and non-adaptively from a G-optimal design over $\mathcal{X}$ at each time then the error probability decreases as $\exp(-T\Delta^2{(1)}/d)$, where $\Delta{(1)} = \min_{x \neq x^} (x^ - x)^\top \frac{1}{T}\sum_{t=1}^T \thetat$. As there exist environments where $\Delta{(1)}^2/ d \ll 1/ \rho^$, we are motivated to propose a novel algorithm $\mathsf{P1}$-$\mathsf{RAGE}$ that aims to obtain the best of both worlds: robustness to non-stationarity and fast rates of identification in benign settings. We characterize the error probability of $\mathsf{P1}$-$\mathsf{RAGE}$ and demonstrate empirically that the algorithm indeed never performs worse than G-optimal design but compares favorably to the best algorithms in the stationary setting.
Fast Dust Sand Image Enhancement Based on Color Correction and New Membership Function
Authors: Ali Hakem Alsaeedi, Suha Mohammed Hadi, Yarub Alazzawi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Images captured in dusty environments suffering from poor visibility and quality. Enhancement of these images such as sand dust images plays a critical role in various atmospheric optics applications. In this work, proposed a new model based on Color Correction and new membership function to enhance san dust images. The proposed model consists of three phases: correction of color shift, removal of haze, and enhancement of contrast and brightness. The color shift is corrected using a new membership function to adjust the values of U and V in the YUV color space. The Adaptive Dark Channel Prior (A-DCP) is used for haze removal. The stretching contrast and improving image brightness are based on Contrast Limited Adaptive Histogram Equalization (CLAHE). The proposed model tests and evaluates through many real sand dust images. The experimental results show that the proposed solution is outperformed the current studies in terms of effectively removing the red and yellow cast and provides high quality and quantity dust images.
Is this model reliable for everyone? Testing for strong calibration
Authors: Jean Feng, Alexej Gossmann, Romain Pirracchio, Nicholas Petrick, Gene Pennello, Berkman Sahiner
Abstract
In a well-calibrated risk prediction model, the average predicted probability is close to the true event rate for any given subgroup. Such models are reliable across heterogeneous populations and satisfy strong notions of algorithmic fairness. However, the task of auditing a model for strong calibration is well-known to be difficult -- particularly for machine learning (ML) algorithms -- due to the sheer number of potential subgroups. As such, common practice is to only assess calibration with respect to a few predefined subgroups. Recent developments in goodness-of-fit testing offer potential solutions but are not designed for settings with weak signal or where the poorly calibrated subgroup is small, as they either overly subdivide the data or fail to divide the data at all. We introduce a new testing procedure based on the following insight: if we can reorder observations by their expected residuals, there should be a change in the association between the predicted and observed residuals along this sequence if a poorly calibrated subgroup exists. This lets us reframe the problem of calibration testing into one of changepoint detection, for which powerful methods already exist. We begin with introducing a sample-splitting procedure where a portion of the data is used to train a suite of candidate models for predicting the residual, and the remaining data are used to perform a score-based cumulative sum (CUSUM) test. To further improve power, we then extend this adaptive CUSUM test to incorporate cross-validation, while maintaining Type I error control under minimal assumptions. Compared to existing methods, the proposed procedure consistently achieved higher power in simulation studies and more than doubled the power when auditing a mortality risk prediction model.
On averaging block Kaczmarz methods for solving nonlinear systems of equations
Abstract
A class of averaging block nonlinear Kaczmarz methods is developed for the solution of the nonlinear system of equations. The convergence theory of the proposed method is established under suitable assumptions and the upper bounds of the convergence rate for the proposed method with both constant stepsize and adaptive stepsize are derived. Numerical experiments are presented to verify the efficiency of the proposed method, which outperforms the existing nonlinear Kaczmarz methods in terms of the number of iteration steps and computational costs.
ChatHome: Development and Evaluation of a Domain-Specific Language Model for Home Renovation
Abstract
This paper presents the development and evaluation of ChatHome, a domain-specific language model (DSLM) designed for the intricate field of home renovation. Considering the proven competencies of large language models (LLMs) like GPT-4 and the escalating fascination with home renovation, this study endeavors to reconcile these aspects by generating a dedicated model that can yield high-fidelity, precise outputs relevant to the home renovation arena. ChatHome's novelty rests on its methodology, fusing domain-adaptive pretraining and instruction-tuning over an extensive dataset. This dataset includes professional articles, standard documents, and web content pertinent to home renovation. This dual-pronged strategy is designed to ensure that our model can assimilate comprehensive domain knowledge and effectively address user inquiries. Via thorough experimentation on diverse datasets, both universal and domain-specific, including the freshly introduced "EvalHome" domain dataset, we substantiate that ChatHome not only amplifies domain-specific functionalities but also preserves its versatility.
Mixbiotic society measures: Comparison of organizational structures based on communication simulation
Abstract
The philosophical world has proposed the concept of "mixbiotic society," in which individuals with freedom and diverse values mix and mingle to recognize their respective "fundamental incapability" each other and sublimate into solidarity, toward solving the issues of social isolation and fragmentation. Based on this concept, the mixbiotic society measures have been proposed to evaluate dynamic communication patterns with reference to classification in cellular automata and particle reaction-diffusion that simulate living phenomena. In this paper, we applied these measures to five typologies of organizational structure (Red: impulsive, Amber: adaptive, Orange: achievement, Green: pluralistic, and Teal: evolutionary) and evaluated their features. Specifically, we formed star, tree, tree+jumpers, tree+more jumpers, and small-world type networks corresponding to each of five typologies, conducted communication simulations on these networks, and calculated values for mixbiotic society measures. The results showed that Teal organization has the highest value of the mixism measure among mixbiotic society measures, i.e., it balances similarity (mixing) and dissimilarity (mingling) in communication, and is living and mixbiotic between order and chaos. Measures other than mixism showed that in Teal organization, information is not concentrated in a central leader and that communication takes place among various members. This evaluation of organizational structures shows that the mixbiotic society measures is also useful for assessing organizational change. In the future, these measures will be used not only in business organizations, but also in digital democratic organizations and platform cooperatives in conjunction with information technology.
Efficient Multiuser AI Downloading via Reusable Knowledge Broadcasting
Authors: Hai Wu, Qunsong Zeng, Kaibin Huang
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI)
Abstract
For the 6G mobile networks, in-situ model downloading has emerged as an important use case to enable real-time adaptive artificial intelligence on edge devices. However, the simultaneous downloading of diverse and high-dimensional models to multiple devices over wireless links presents a significant communication bottleneck. To overcome the bottleneck, we propose the framework of model broadcasting and assembling (MBA), which represents the first attempt on leveraging reusable knowledge, referring to shared parameters among tasks, to enable parameter broadcasting to reduce communication overhead. The MBA framework comprises two key components. The first, the MBA protocol, defines the system operations including parameter selection from a model library, power control for broadcasting, and model assembling at devices. The second component is the joint design of parameter-selection-and-power-control (PS-PC), which provides guarantees on devices' model performance and minimizes the downloading latency. The corresponding optimization problem is simplified by decomposition into the sequential PS and PC sub-problems without compromising its optimality. The PS sub-problem is solved efficiently by designing two efficient algorithms. On one hand, the low-complexity algorithm of greedy parameter selection features the construction of candidate model sets and a selection metric, both of which are designed under the criterion of maximum reusable knowledge among tasks. On the other hand, the optimal tree-search algorithm gains its efficiency via the proposed construction of a compact binary tree pruned using model architecture constraints and an intelligent branch-and-bound search. Given optimal PS, the optimal PC policy is derived in closed form. Extensive experiments demonstrate the substantial reduction in downloading latency achieved by the proposed MBA compared to traditional model downloading.
Dynamic PlenOctree for Adaptive Sampling Refinement in Explicit NeRF
Authors: Haotian Bai, Yiqi Lin, Yize Chen, Lin Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The explicit neural radiance field (NeRF) has gained considerable interest for its efficient training and fast inference capabilities, making it a promising direction such as virtual reality and gaming. In particular, PlenOctree (POT)[1], an explicit hierarchical multi-scale octree representation, has emerged as a structural and influential framework. However, POT's fixed structure for direct optimization is sub-optimal as the scene complexity evolves continuously with updates to cached color and density, necessitating refining the sampling distribution to capture signal complexity accordingly. To address this issue, we propose the dynamic PlenOctree DOT, which adaptively refines the sample distribution to adjust to changing scene complexity. Specifically, DOT proposes a concise yet novel hierarchical feature fusion strategy during the iterative rendering process. Firstly, it identifies the regions of interest through training signals to ensure adaptive and efficient refinement. Next, rather than directly filtering out valueless nodes, DOT introduces the sampling and pruning operations for octrees to aggregate features, enabling rapid parameter learning. Compared with POT, our DOT outperforms it by enhancing visual quality, reducing over $55.15$/$68.84\%$ parameters, and providing 1.7/1.9 times FPS for NeRF-synthetic and Tanks $\&$ Temples, respectively. Project homepage:https://vlislab22.github.io/DOT. [1] Yu, Alex, et al. "Plenoctrees for real-time rendering of neural radiance fields." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
Information-based Preprocessing of PLC Data for Automatic Behavior Modeling
Authors: Brandon K. Sai, Jonas Gram, Thomas Bauernhansl
Subjects: Systems and Control (eess.SY); Computational Engineering, Finance, and Science (cs.CE); Methodology (stat.ME)
Abstract
Cyber-physical systems (CPS) offer immense optimization potential for manufacturing processes through the availability of multivariate time series data of actors and sensors. Based on automated analysis software, the deployment of adaptive and responsive measures is possible for time series data. Due to the complex and dynamic nature of modern manufacturing, analysis and modeling often cannot be entirely automated. Even machine- or deep learning approaches often depend on a priori expert knowledge and labelling. In this paper, an information-based data preprocessing approach is proposed. By applying statistical methods including variance and correlation analysis, an approximation of the sampling rate in event-based systems and the utilization of spectral analysis, knowledge about the underlying manufacturing processes can be gained prior to modeling. The paper presents, how statistical analysis enables the pruning of a dataset's least important features and how the sampling rate approximation approach sets the base for further data analysis and modeling. The data's underlying periodicity, originating from the cyclic nature of an automated manufacturing process, will be detected by utilizing the fast Fourier transform. This information-based preprocessing method will then be validated for process time series data of cyber-physical systems' programmable logic controllers (PLC).
Panoptic Scene Graph Generation with Semantics-prototype Learning
Authors: Li Li, Wei Ji, Yiming Wu, Mengze Li, You Qin, Lina Wei, Roger Zimmermann
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Panoptic Scene Graph Generation (PSG) parses objects and predicts their relationships (predicate) to connect human language and visual scenes. However, different language preferences of annotators and semantic overlaps between predicates lead to biased predicate annotations in the dataset, i.e. different predicates for same object pairs. Biased predicate annotations make PSG models struggle in constructing a clear decision plane among predicates, which greatly hinders the real application of PSG models. To address the intrinsic bias above, we propose a novel framework named ADTrans to adaptively transfer biased predicate annotations to informative and unified ones. To promise consistency and accuracy during the transfer process, we propose to measure the invariance of representations in each predicate class, and learn unbiased prototypes of predicates with different intensities. Meanwhile, we continuously measure the distribution changes between each presentation and its prototype, and constantly screen potential biased data. Finally, with the unbiased predicate-prototype representation embedding space, biased annotations are easily identified. Experiments show that ADTrans significantly improves the performance of benchmark models, achieving a new state-of-the-art performance, and shows great generalization and effectiveness on multiple datasets.
Universal Recurrent Event Memories for Streaming Data
Abstract
In this paper, we propose a new event memory architecture (MemNet) for recurrent neural networks, which is universal for different types of time series data such as scalar, multivariate or symbolic. Unlike other external neural memory architectures, it stores key-value pairs, which separate the information for addressing and for content to improve the representation, as in the digital archetype. Moreover, the key-value pairs also avoid the compromise between memory depth and resolution that applies to memories constructed by the model state. One of the MemNet key characteristics is that it requires only linear adaptive mapping functions while implementing a nonlinear operation on the input data. MemNet architecture can be applied without modifications to scalar time series, logic operators on strings, and also to natural language processing, providing state-of-the-art results in all application domains such as the chaotic time series, the symbolic operation tasks, and the question-answering tasks (bAbI). Finally, controlled by five linear layers, MemNet requires a much smaller number of training parameters than other external memory networks as well as the transformer network. The space complexity of MemNet equals a single self-attention layer. It greatly improves the efficiency of the attention mechanism and opens the door for IoT applications.
Keyword: efficient
Convergence of the numerical approximations and well-posedness: Nonlocal conservation laws with rough flux
PEANUT: A Human-AI Collaborative Tool for Annotating Audio-Visual Data
Small, but important: Traffic light proposals for detecting small traffic lights and beyond
BOURNE: Bootstrapped Self-supervised Learning Framework for Unified Graph Anomaly Detection
A Solution to Co-occurrence Bias: Attributes Disentanglement via Mutual Information Minimization for Pedestrian Attribute Recognition
Higher-order multi-scale deep Ritz method for multi-scale problems of authentic composite materials
Learning with Constraint Learning: New Perspective, Solution Strategy and Various Applications
Efficient Multiuser AI Downloading via Reusable Knowledge Broadcasting
Dynamic PlenOctree for Adaptive Sampling Refinement in Explicit NeRF
Learning Compliant Stiffness by Impedance Control-Aware Task Segmentation and Multi-objective Bayesian Optimization with Priors
Prompt Guided Transformer for Multi-Task Dense Prediction
Leveraging Optical Communication Fiber and AI for Distributed Water Pipe Leak Detection
Co-attention Graph Pooling for Efficient Pairwise Graph Interaction Learning
Improving Social Media Popularity Prediction with Multiple Post Dependencies
Nonlinear reduced basis using mixture Wasserstein barycenters: application to an eigenvalue problem inspired from quantum chemistry
Improvable Gap Balancing for Multi-Task Learning
On the Design of Region-Avoiding Metrics for Collision-Safe Motion Generation on Riemannian Manifolds
Framework to Automatically Determine the Quality of Open Data Catalogs
From continuous-time formulations to discretization schemes: tensor trains and robust regression for BSDEs and parabolic PDEs
Improving Image Quality of Sparse-view Lung Cancer CT Images with a Convolutional Neural Network
Fast Prototyping Next-Generation Accelerators for New ML Models using MASE: ML Accelerator System Exploration
Dynamic algorithms for k-center on graphs
Swiper and Dora: efficient solutions to weighted distributed problems
Learning to Open Doors with an Aerial Manipulator
Quasi-Monte Carlo Algorithms (not only) for Graphics Software
Shrink-Perturb Improves Architecture Mixing during Population Based Training for Neural Architecture Search
Two-level Continuous Topology Optimization in Structural Mechanics
Benchmarking Offline Reinforcement Learning on Real-Robot Hardware
SimDETR: Simplifying self-supervised pretraining for DETR
The Strong Maximum Circulation Algorithm: A New Method for Aggregating Preference Rankings
Keyword: faster
Nonlinear reduced basis using mixture Wasserstein barycenters: application to an eigenvalue problem inspired from quantum chemistry
Fast Prototyping Next-Generation Accelerators for New ML Models using MASE: ML Accelerator System Exploration
Keyword: mobile
From the Desks of ROS Maintainers: A Survey of Modern & Capable Mobile Robotics Algorithms in the Robot Operating System 2
Efficient Multiuser AI Downloading via Reusable Knowledge Broadcasting
Non-invasive Diabetes Detection using Gabor Filter: A Comparative Analysis of Different Cameras
DISCO: Achieving Low Latency and High Reliability in Scheduling of Graph-Structured Tasks over Mobile Vehicular Cloud
The Applicability of Federated Learning to Official Statistics
Keyword: pruning
Dynamic PlenOctree for Adaptive Sampling Refinement in Explicit NeRF
Information-based Preprocessing of PLC Data for Automatic Behavior Modeling
Keyword: diffusion
Recovering high-quality FODs from a reduced number of diffusion-weighted images using a model-driven deep learning architecture
Mixbiotic society measures: Comparison of organizational structures based on communication simulation
Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding
Keyword: adaptive
Writer adaptation for offline text recognition: An exploration of neural network-based methods
A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity
Fast Dust Sand Image Enhancement Based on Color Correction and New Membership Function
Is this model reliable for everyone? Testing for strong calibration
On averaging block Kaczmarz methods for solving nonlinear systems of equations
ChatHome: Development and Evaluation of a Domain-Specific Language Model for Home Renovation
Mixbiotic society measures: Comparison of organizational structures based on communication simulation
Efficient Multiuser AI Downloading via Reusable Knowledge Broadcasting
Dynamic PlenOctree for Adaptive Sampling Refinement in Explicit NeRF
Information-based Preprocessing of PLC Data for Automatic Behavior Modeling
Panoptic Scene Graph Generation with Semantics-prototype Learning
Universal Recurrent Event Memories for Streaming Data
Keyword: quantization
There is no result