Abstract
Entity Matching (EM), which aims to identify all entity pairs referring to the same real-world entity from relational tables, is one of the most important tasks in real-world data management systems. Due to the labeling process of EM being extremely labor-intensive, unsupervised EM is more applicable than supervised EM in practical scenarios. Traditional unsupervised EM assumes that all entities come from two tables; however, it is more common to match entities from multiple tables in practical applications, that is, multi-table entity matching (multi-table EM). Unfortunately, effective and efficient unsupervised multi-table EM remains under-explored. To fill this gap, this paper formally studies the problem of unsupervised multi-table entity matching and proposes an effective and efficient solution, termed as MultiEM. MultiEM is a parallelable pipeline of enhanced entity representation, table-wise hierarchical merging, and density-based pruning. Extensive experimental results on six real-world benchmark datasets demonstrate the superiority of MultiEM in terms of effectiveness and efficiency.
Domain specificity and data efficiency in typo tolerant spell checkers: the case of search in online marketplaces
Authors: Dayananda Ubrangala, Juhi Sharma, Ravi Prasad Kondapalli, Kiran R, Amit Agarwala, Laurent Boué
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
Abstract
Typographical errors are a major source of frustration for visitors of online marketplaces. Because of the domain-specific nature of these marketplaces and the very short queries users tend to search for, traditional spell cheking solutions do not perform well in correcting typos. We present a data augmentation method to address the lack of annotated typo data and train a recurrent neural network to learn context-limited domain-specific embeddings. Those embeddings are deployed in a real-time inferencing API for the Microsoft AppSource marketplace to find the closest match between a misspelled user query and the available product names. Our data efficient solution shows that controlled high quality synthetic data may be a powerful tool especially considering the current climate of large language models which rely on prohibitively huge and often uncontrolled datasets.
Optimal Dynamic Reconfiguration of Distribution Networks
Authors: Rida Fatima, Hassan Zahid Butt, Xingpeng Li
Abstract
The aim of distribution networks is to meet their local area power demand with maximum reliability. As the electricity consumption tends to increase every year, limited line thermal capacity can lead to network congestion. Continuous development and upgradation of the distribution network is thus required to meet the energy demand, which poses a significant increase in cost. The objective of this research is to analyze distribution network topologies and introduce a topology reconfiguration scheme based on the cost and demand of electricity. Traditional electrical distribution networks are static and inefficient. To make the network active, an optimal dynamic network topology reconfiguration (DNTR) is proposed to control line switching and reconnect some loads to different substations such that the cost of electricity can be minimized. The proposed DNTR strategy was tested on a synthetic radial distribution network with three substations each connecting to an IEEE 13-bus system. Simulation results demonstrated significant cost saving in daily operations of this distribution system.
An equilibrated estimator for mixed finite element discretizations of the curl-curl problem
Abstract
We propose a new a posteriori error estimator for mixed finite element discretizations of the curl-curl problem. This estimator relies on a Prager--Synge inequality, and therefore leads to fully guaranteed constant-free upper bounds on the error. The estimator is also locally efficient and polynomial-degree-robust. The construction is based on patch-wise divergence-constrained minimization problems, leading to a cheap embarrassingly parallel algorithm. Crucially, the estimator operates without any assumption on the topology of the domain, and unconventional arguments are required to establish the reliability estimate. Numerical examples illustrate the key theoretical results, and suggest that the estimator is suited for mesh adaptivity purposes.
Evaluation of STT-MRAM as a Scratchpad for Training in ML Accelerators
Abstract
Progress in artificial intelligence and machine learning over the past decade has been driven by the ability to train larger deep neural networks (DNNs), leading to a compute demand that far exceeds the growth in hardware performance afforded by Moore's law. Training DNNs is an extremely memory-intensive process, requiring not just the model weights but also activations and gradients for an entire minibatch to be stored. The need to provide high-density and low-leakage on-chip memory motivates the exploration of emerging non-volatile memory for training accelerators. Spin-Transfer-Torque MRAM (STT-MRAM) offers several desirable properties for training accelerators, including 3-4x higher density than SRAM, significantly reduced leakage power, high endurance and reasonable access time. On the one hand, MRAM write operations require high write energy and latency due to the need to ensure reliable switching. In this study, we perform a comprehensive device-to-system evaluation and co-optimization of STT-MRAM for efficient ML training accelerator design. We devised a cross-layer simulation framework to evaluate the effectiveness of STT-MRAM as a scratchpad replacing SRAM in a systolic-array-based DNN accelerator. To address the inefficiency of writes in STT-MRAM, we propose to reduce write voltage and duration. To evaluate the ensuing accuracy-efficiency trade-off, we conduct a thorough analysis of the error tolerance of input activations, weights, and errors during the training. We propose heterogeneous memory configurations that enable training convergence with good accuracy. We show that MRAM provide up to 15-22x improvement in system level energy across a suite of DNN benchmarks under iso-capacity and iso-area scenarios. Further optimizing STT-MRAM write operations can provide over 2x improvement in write energy for minimal degradation in application-level training accuracy.
Artificial Intelligence in archival and historical scholarship workflow: HTS and ChatGPT
Authors: Salvatore Spina
Subjects: Digital Libraries (cs.DL); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract
This article examines the impact of Artificial Intelligence on the archival heritage digitization processes, specifically regarding the manuscripts' automatic transcription, their correction, and normalization. It highlights how digitality has compelled scholars to redefine Archive and History field and has facilitated the accessibility of analogue sources through digitization and integration into big data. The study focuses on two AI systems, namely Transkribus and ChatGPT, which enable efficient analysis and transcription of digitized sources. The article presents a test of ChatGPT, which was utilized to normalize the text of 366 letters stored in the Correspondence section of the Biscari Archive (Catania). Although the AI exhibited some limitations that resulted in inaccuracies, the corrected texts met expectations. Overall, the article concludes that digitization and AI can significantly enhance archival and historical research by allowing the analysis of vast amounts of data and the application of computational linguistic tools.
FuNToM: Functional Modeling of RF Circuits Using a Neural Network Assisted Two-Port Analysis Method
Abstract
Automatic synthesis of analog and Radio Frequency (RF) circuits is a trending approach that requires an efficient circuit modeling method. This is due to the expensive cost of running a large number of simulations at each synthesis cycle. Artificial intelligence methods are promising approaches for circuit modeling due to their speed and relative accuracy. However, existing approaches require a large amount of training data, which is still collected using simulation runs. In addition, such approaches collect a whole separate dataset for each circuit topology even if a single element is added or removed. These matters are only exacerbated by the need for post-layout modeling simulations, which take even longer. To alleviate these drawbacks, in this paper, we present FuNToM, a functional modeling method for RF circuits. FuNToM leverages the two-port analysis method for modeling multiple topologies using a single main dataset and multiple small datasets. It also leverages neural networks which have shown promising results in predicting the behavior of circuits. Our results show that for multiple RF circuits, in comparison to the state-of-the-art works, while maintaining the same accuracy, the required training data is reduced by 2.8x - 10.9x. In addition, FuNToM needs 176.8x - 188.6x less time for collecting the training set in post-layout modeling.
A Graphical Approach to Document Layout Analysis
Authors: Jilin Wang, Michael Krumdick, Baojia Tong, Hamima Halim, Maxim Sokolov, Vadym Barda, Delphine Vendryes, Chris Tanner
Abstract
Document layout analysis (DLA) is the task of detecting the distinct, semantic content within a document and correctly classifying these items into an appropriate category (e.g., text, title, figure). DLA pipelines enable users to convert documents into structured machine-readable formats that can then be used for many useful downstream tasks. Most existing state-of-the-art (SOTA) DLA models represent documents as images, discarding the rich metadata available in electronically generated PDFs. Directly leveraging this metadata, we represent each PDF page as a structured graph and frame the DLA problem as a graph segmentation and classification problem. We introduce the Graph-based Layout Analysis Model (GLAM), a lightweight graph neural network competitive with SOTA models on two challenging DLA datasets - while being an order of magnitude smaller than existing models. In particular, the 4-million parameter GLAM model outperforms the leading 140M+ parameter computer vision-based model on 5 of the 11 classes on the DocLayNet dataset. A simple ensemble of these two models achieves a new state-of-the-art on DocLayNet, increasing mAP from 76.8 to 80.8. Overall, GLAM is over 5 times more efficient than SOTA models, making GLAM a favorable engineering choice for DLA tasks.
Accurate Neural Network Pruning Requires Rethinking Sparse Optimization
Authors: Denis Kuznedelev, Eldar Kurtic, Eugenia Iofinova, Elias Frantar, Alexandra Peste, Dan Alistarh
Abstract
Obtaining versions of deep neural networks that are both highly-accurate and highly-sparse is one of the main challenges in the area of model compression, and several high-performance pruning techniques have been investigated by the community. Yet, much less is known about the interaction between sparsity and the standard stochastic optimization techniques used for training sparse networks, and most existing work uses standard dense schedules and hyperparameters for training sparse networks. In this work, we examine the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks. We begin by showing that using standard dense training recipes for sparse training is suboptimal, and results in under-training. We provide new approaches for mitigating this issue for both sparse pre-training of vision models (e.g. ResNet50/ImageNet) and sparse fine-tuning of language models (e.g. BERT/GLUE), achieving state-of-the-art results in both settings in the high-sparsity regime, and providing detailed analyses for the difficulty of sparse training in both scenarios. Our work sets a new threshold in terms of the accuracies that can be achieved under high sparsity, and should inspire further research into improving sparse model training, to reach higher accuracies under high sparsity, but also to do so efficiently.
BEAM: The Modeling Framework for Behavior, Energy, Autonomy & Mobility
Abstract
This report outlines the concepts, mechanisms and inner dynamics of the BEAM (Behavior, Energy, Autonomy, and Mobility) modeling framework. BEAM is an open-source large-scale high-resolution transportation model that harnesses the principles of the actor model of computation to build a powerful and efficient agent-based model of travel behavior. It allows a detailed microscopic view of how people make travel choices and interact with the transportation system, enabling more accurate simulations of human mobility and urban transport networks. It also allows the analysis of numerous spatially defined but interacting layers, and integrates them into a cohesive representation of a regional transportation system. This integrated picture provides invaluable insights to policy makers and other stakeholders about how changes to the transportation system result in changes to traffic congestion, mode share, energy use, and emissions throughout a modeled region. These capabilities are demonstrated with a case study of New York City that showcase BEAM's application in a very large and intricate urban transportation system, without relying on existing travel demand models. The unique ability of BEAM to simulate individual behaviors, integrate with other models, and adapt to different real-world scenarios underscores its importance in the rapidly evolving field of transportation and emphasizes its potential as a valuable proof-of-concept tool to contribute to more informed and effective policy and planning decisions.
Target specification bias, counterfactual prediction, and algorithmic fairness in healthcare
Authors: Eran Tal
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Methodology (stat.ME)
Abstract
Bias in applications of machine learning (ML) to healthcare is usually attributed to unrepresentative or incomplete data, or to underlying health disparities. This article identifies a more pervasive source of bias that affects the clinical utility of ML-enabled prediction tools: target specification bias. Target specification bias arises when the operationalization of the target variable does not match its definition by decision makers. The mismatch is often subtle, and stems from the fact that decision makers are typically interested in predicting the outcomes of counterfactual, rather than actual, healthcare scenarios. Target specification bias persists independently of data limitations and health disparities. When left uncorrected, it gives rise to an overestimation of predictive accuracy, to inefficient utilization of medical resources, and to suboptimal decisions that can harm patients. Recent work in metrology - the science of measurement - suggests ways of counteracting target specification bias and avoiding its harmful consequences.
Efficient Model Adaptation for Continual Learning at the Edge
Authors: Zachary A. Daniels, Jun Hu, Michael Lomnitz, Phil Miller, Aswin Raghavan, Joe Zhang, Michael Piacentino, David Zhang
Abstract
Most machine learning (ML) systems assume stationary and matching data distributions during training and deployment. This is often a false assumption. When ML models are deployed on real devices, data distributions often shift over time due to changes in environmental factors, sensor characteristics, and task-of-interest. While it is possible to have a human-in-the-loop to monitor for distribution shifts and engineer new architectures in response to these shifts, such a setup is not cost-effective. Instead, non-stationary automated ML (AutoML) models are needed. This paper presents the Encoder-Adaptor-Reconfigurator (EAR) framework for efficient continual learning under domain shifts. The EAR framework uses a fixed deep neural network (DNN) feature encoder and trains shallow networks on top of the encoder to handle novel data. The EAR framework is capable of 1) detecting when new data is out-of-distribution (OOD) by combining DNNs with hyperdimensional computing (HDC), 2) identifying low-parameter neural adaptors to adapt the model to the OOD data using zero-shot neural architecture search (ZS-NAS), and 3) minimizing catastrophic forgetting on previous tasks by progressively growing the neural architecture as needed and dynamically routing data through the appropriate adaptors and reconfigurators for handling domain-incremental and class-incremental continual learning. We systematically evaluate our approach on several benchmark datasets for domain adaptation and demonstrate strong performance compared to state-of-the-art algorithms for OOD detection and few-/zero-shot NAS.
Strong convergence of multiscale truncated Euler-Maruyama method for super-linear slow-fast stochastic differential equations
Abstract
This work focuses on solving super-linear stochastic differential equations (SDEs) involving different time scales numerically. Taking advantages of being explicit and easily implementable, a multiscale truncated Euler-Maruyama scheme is proposed for slow-fast SDEs with local Lipschitz coefficients. By virtue of the averaging principle, the strong convergence of its numerical solutions to the exact ones in pth moment is obtained. Furthermore, under mild conditions on the coefficients, the corresponding strong error estimate is also provided. Finally, two examples and some numerical simulations are given to verify the theoretical results.
VQGraph: Graph Vector-Quantization for Bridging GNNs and MLPs
Abstract
Graph Neural Networks (GNNs) conduct message passing which aggregates local neighbors to update node representations. Such message passing leads to scalability issues in practical latency-constrained applications. To address this issue, recent methods adopt knowledge distillation (KD) to learn computationally-efficient multi-layer perceptron (MLP) by mimicking the output of GNN. However, the existing GNN representation space may not be expressive enough for representing diverse local structures of the underlying graph, which limits the knowledge transfer from GNN to MLP. Here we present a novel framework VQGraph to learn a powerful graph representation space for bridging GNNs and MLPs. We adopt the encoder of a variant of a vector-quantized variational autoencoder (VQ-VAE) as a structure-aware graph tokenizer, which explicitly represents the nodes of diverse local structures as numerous discrete tokens and constitutes a meaningful codebook. Equipped with the learned codebook, we propose a new token-based distillation objective based on soft token assignments to sufficiently transfer the structural knowledge from GNN to MLP. Extensive experiments and analyses demonstrate the strong performance of VQGraph, where we achieve new state-of-the-art performance on GNN-MLP distillation in both transductive and inductive settings across seven graph datasets. We show that VQGraph with better performance infers faster than GNNs by 828x, and also achieves accuracy improvement over GNNs and stand-alone MLPs by 3.90% and 28.05% on average, respectively. Code: https://github.com/YangLing0818/VQGraph.
Abstract
Understanding the life cycle of the machine learning (ML) model is an intriguing area of research (e.g., understanding where the model comes from, how it is trained, and how it is used). This paper focuses on a novel problem within this field, namely Model Provenance (MP), which concerns the relationship between a target model and its pre-training model and aims to determine whether a source model serves as the provenance for a target model. This is an important problem that has significant implications for ensuring the security and intellectual property of machine learning models but has not received much attention in the literature. To fill in this gap, we introduce a novel concept of Model DNA which represents the unique characteristics of a machine learning model. We utilize a data-driven and model-driven representation learning method to encode the model's training data and input-output information as a compact and comprehensive representation (i.e., DNA) of the model. Using this model DNA, we develop an efficient framework for model provenance identification, which enables us to identify whether a source model is a pre-training model of a target model. We conduct evaluations on both computer vision and natural language processing tasks using various models, datasets, and scenarios to demonstrate the effectiveness of our approach in accurately identifying model provenance.
Eva: A General Vectorized Approximation Framework for Second-order Optimization
Authors: Lin Zhang, Shaohuai Shi, Bo Li
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Abstract
Second-order optimization algorithms exhibit excellent convergence properties for training deep learning models, but often incur significant computation and memory overheads. This can result in lower training efficiency than the first-order counterparts such as stochastic gradient descent (SGD). In this work, we present a memory- and time-efficient second-order algorithm named Eva with two novel techniques: 1) we construct the second-order information with the Kronecker factorization of small stochastic vectors over a mini-batch of training data to reduce memory consumption, and 2) we derive an efficient update formula without explicitly computing the inverse of matrices using the Sherman-Morrison formula. We further extend Eva to a general vectorized approximation framework to improve the compute and memory efficiency of two existing second-order algorithms (FOOF and Shampoo) without affecting their convergence performance. Extensive experimental results on different models and datasets show that Eva reduces the end-to-end training time up to 2.05x and 2.42x compared to first-order SGD and second-order algorithms (K-FAC and Shampoo), respectively.
Artificial intelligence based load balancing in SDN: A comprehensive survey
Authors: Ahmed Hazim Alhilali, Ahmadreza Montazerolghaem
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
In the future, it is anticipated that software-defined networking (SDN) will become the preferred platform for deploying diverse networks. Compared to traditional networks, SDN separates the control and data planes for efficient domain-wide traffic routing and management. The controllers in the control plane are responsible for programming data plane forwarding devices, while the top layer, the application plane, enforces policies and programs the network. The different levels of the SDN use interfaces for communication. However, SDN faces challenges with traffic distribution, such as load imbalance, which can negatively affect the network performance. Consequently, developers have developed various SDN load-balancing solutions to enhance SDN effectiveness. In addition, researchers are considering the potential of implementing some artificial intelligence (AI) approaches into SDN to improve network resource usage and overall performance due to the fast growth of the AI field. This survey focuses on the following: Firstly, analyzing the SDN architecture and investigating the problem of load balancing in SDN. Secondly, categorizing AI-based load balancing methods and thoroughly assessing these mechanisms from various perspectives, such as the algorithm/technique employed, the tackled problem, and their strengths and weaknesses. Thirdly, summarizing the metrics utilized to measure the effectiveness of these techniques. Finally, identifying the trends and challenges of AI-based load balancing for future research.
Abstract
Autonomous vehicles and robots need to operate over a wide variety of scenarios in order to complete tasks efficiently and safely. Multi-camera self-supervised monocular depth estimation from videos is a promising way to reason about the environment, as it generates metrically scaled geometric predictions from visual data without requiring additional sensors. However, most works assume well-calibrated extrinsics to fully leverage this multi-camera setup, even though accurate and efficient calibration is still a challenging problem. In this work, we introduce a novel method for extrinsic calibration that builds upon the principles of self-supervised monocular depth and ego-motion learning. Our proposed curriculum learning strategy uses monocular depth and pose estimators with velocity supervision to estimate extrinsics, and then jointly learns extrinsic calibration along with depth and pose for a set of overlapping cameras rigidly attached to a moving vehicle. Experiments on a benchmark multi-camera dataset (DDAD) demonstrate that our method enables self-calibration in various scenes robustly and efficiently compared to a traditional vision-based pose estimation pipeline. Furthermore, we demonstrate the benefits of extrinsics self-calibration as a way to improve depth prediction via joint optimization.
Improved Order Analysis and Design of Exponential Integrator for Diffusion Models Sampling
Abstract
Efficient differential equation solvers have significantly reduced the sampling time of diffusion models (DMs) while retaining high sampling quality. Among these solvers, exponential integrators (EI) have gained prominence by demonstrating state-of-the-art performance. However, existing high-order EI-based sampling algorithms rely on degenerate EI solvers, resulting in inferior error bounds and reduced accuracy in contrast to the theoretically anticipated results under optimal settings. This situation makes the sampling quality extremely vulnerable to seemingly innocuous design choices such as timestep schedules. For example, an inefficient timestep scheduler might necessitate twice the number of steps to achieve a quality comparable to that obtained through carefully optimized timesteps. To address this issue, we reevaluate the design of high-order differential solvers for DMs. Through a thorough order analysis, we reveal that the degeneration of existing high-order EI solvers can be attributed to the absence of essential order conditions. By reformulating the differential equations in DMs and capitalizing on the theory of exponential integrators, we propose refined EI solvers that fulfill all the order conditions, which we designate as Refined Exponential Solver (RES). Utilizing these improved solvers, RES exhibits more favorable error bounds theoretically and achieves superior sampling efficiency and stability in practical applications. For instance, a simple switch from the single-step DPM-Solver++ to our order-satisfied RES solver when Number of Function Evaluations (NFE) $=9$, results in a reduction of numerical defects by $25.2\%$ and FID improvement of $25.4\%$ (16.77 vs 12.51) on a pre-trained ImageNet diffusion model.
AutoML4ETC: Automated Neural Architecture Search for Real-World Encrypted Traffic Classification
Authors: Navid Malekghaini, Elham Akbari, Mohammad A. Salahuddin, Noura Limam, Raouf Boutaba, Bertrand Mathieu, Stephanie Moteau, Stephane Tuffin
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Abstract
Deep learning (DL) has been successfully applied to encrypted network traffic classification in experimental settings. However, in production use, it has been shown that a DL classifier's performance inevitably decays over time. Re-training the model on newer datasets has been shown to only partially improve its performance. Manually re-tuning the model architecture to meet the performance expectations on newer datasets is time-consuming and requires domain expertise. We propose AutoML4ETC, a novel tool to automatically design efficient and high-performing neural architectures for encrypted traffic classification. We define a novel, powerful search space tailored specifically for the near real-time classification of encrypted traffic using packet header bytes. We show that with different search strategies over our search space, AutoML4ETC generates neural architectures that outperform the state-of-the-art encrypted traffic classifiers on several datasets, including public benchmark datasets and real-world TLS and QUIC traffic collected from the Orange mobile network. In addition to being more accurate, AutoML4ETC's architectures are significantly more efficient and lighter in terms of the number of parameters. Finally, we make AutoML4ETC publicly available for future research.
ES-MVSNet: Efficient Framework for End-to-end Self-supervised Multi-View Stereo
Abstract
Compared to the multi-stage self-supervised multi-view stereo (MVS) method, the end-to-end (E2E) approach has received more attention due to its concise and efficient training pipeline. Recent E2E self-supervised MVS approaches have integrated third-party models (such as optical flow models, semantic segmentation models, NeRF models, etc.) to provide additional consistency constraints, which grows GPU memory consumption and complicates the model's structure and training pipeline. In this work, we propose an efficient framework for end-to-end self-supervised MVS, dubbed ES-MVSNet. To alleviate the high memory consumption of current E2E self-supervised MVS frameworks, we present a memory-efficient architecture that reduces memory usage by 43% without compromising model performance. Furthermore, with the novel design of asymmetric view selection policy and region-aware depth consistency, we achieve state-of-the-art performance among E2E self-supervised MVS methods, without relying on third-party models for additional consistency signals. Extensive experiments on DTU and Tanks&Temples benchmarks demonstrate that the proposed ES-MVSNet approach achieves state-of-the-art performance among E2E self-supervised MVS methods and competitive performance to many supervised and multi-stage self-supervised methods.
Edge Dynamic Map architecture for C-ITS applications
Abstract
Cooperative Intelligent Transport Systems (C-ITS) create, share and process massive amounts of data which needs to be real-time managed to enable new cooperative and autonomous driving applications. Vehicle-to-Everything (V2X) communications facilitate information exchange among vehicles and infrastructures using various protocols. By providing computer power, data storage, and low latency capabilities, Multi-access Edge Computing (MEC) has become a key enabling technology in the transport industry. The Local Dynamic Map (LDM) concept has consequently been extended to its utilisation in MECs, into an efficient, collaborative, and centralised Edge Dynamic Map (EDM) for C-ITS applications. This research presents an EDM architecture for V2X communications and implements a real-time proof-of-concept using a Time-Series Database (TSDB) engine to store vehicular message information. The performance evaluation includes data insertion and querying, assessing the system's capacity and scale for low-latency Cooperative Awareness Message (CAM) applications. Traffic simulations using SUMO have been employed to generate virtual routes for thousands of vehicles, demonstrating the transmission of virtual CAM messages to the EDM.
ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation
Authors: Chenglong Wang, Hang Zhou, Yimin Hu, Yifu Huo, Bei Li, Tongran Liu, Tong Xiao, Jingbo Zhu
Abstract
Applying Reinforcement Learning (RL) to sequence generation models enables the direct optimization of long-term rewards (\textit{e.g.,} BLEU and human feedback), but typically requires large-scale sampling over a space of action sequences. This is a computational challenge as presented by the practice of sequence generation problems, such as machine translation, where we often deal with a large action space (\textit{e.g.,} a vocabulary) and a long action sequence (\textit{e.g.,} a translation). In this work, we introduce two-stage sampling and dynamic sampling approaches to improve the sampling efficiency during training sequence generation models via RL. We experiment with our approaches on the traditional sequence generation tasks, including machine translation and abstractive summarization. Furthermore, we evaluate our approaches in RL from human feedback (RLHF) through training a large language model using the reward model. Experimental results show that the efficient sampling-based RL, referred to as ESRL, can outperform all baselines in terms of both training efficiency and memory consumption. Notably, ESRL yields consistent performance gains over the strong REINFORCE, minimum risk training, and proximal policy optimization methods.
Deep Semantic Model Fusion for Ancient Agricultural Terrace Detection
Authors: Yi Wang, Chenying Liu, Arti Tiwari, Micha Silver, Arnon Karnieli, Xiao Xiang Zhu, Conrad M Albrecht
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Discovering ancient agricultural terraces in desert regions is important for the monitoring of long-term climate changes on the Earth's surface. However, traditional ground surveys are both costly and limited in scale. With the increasing accessibility of aerial and satellite data, machine learning techniques bear large potential for the automatic detection and recognition of archaeological landscapes. In this paper, we propose a deep semantic model fusion method for ancient agricultural terrace detection. The input data includes aerial images and LiDAR generated terrain features in the Negev desert. Two deep semantic segmentation models, namely DeepLabv3+ and UNet, with EfficientNet backbone, are trained and fused to provide segmentation maps of ancient terraces and walls. The proposed method won the first prize in the International AI Archaeology Challenge. Codes are available at https://github.com/wangyi111/international-archaeology-ai-challenge.
Robust Discontinuity Indicators for High-Order Reconstruction of Piecewise Smooth Functions
Abstract
In many applications, piecewise continuous functions are commonly interpolated over meshes. However, accurate high-order manipulations of such functions can be challenging due to potential spurious oscillations known as the Gibbs phenomena. To address this challenge, we propose a novel approach, Robust Discontinuity Indicators (RDI), which can efficiently and reliably detect both C^{0} and C^{1} discontinuities for node-based and cell-averaged values. We present a detailed analysis focusing on its derivation and the dual-thresholding strategy. A key advantage of RDI is its ability to handle potential inaccuracies associated with detecting discontinuities on non-uniform meshes, thanks to its innovative discontinuity indicators. We also extend the applicability of RDI to handle general surfaces with boundaries, features, and ridge points, thereby enhancing its versatility and usefulness in various scenarios. To demonstrate the robustness of RDI, we conduct a series of experiments on non-uniform meshes and general surfaces, and compare its performance with some alternative methods. By addressing the challenges posed by the Gibbs phenomena and providing reliable detection of discontinuities, RDI opens up possibilities for improved approximation and analysis of piecewise continuous functions, such as in data remap.
Efficient Monaural Speech Enhancement using Spectrum Attention Fusion
Abstract
Speech enhancement is a demanding task in automated speech processing pipelines, focusing on separating clean speech from noisy channels. Transformer based models have recently bested RNN and CNN models in speech enhancement, however at the same time they are much more computationally expensive and require much more high quality training data, which is always hard to come by. In this paper, we present an improvement for speech enhancement models that maintains the expressiveness of self-attention while significantly reducing model complexity, which we have termed Spectrum Attention Fusion. We carefully construct a convolutional module to replace several self-attention layers in a speech Transformer, allowing the model to more efficiently fuse spectral features. Our proposed model is able to achieve comparable or better results against SOTA models but with significantly smaller parameters (0.58M) on the Voice Bank + DEMAND dataset.
Krylov Subspace Recycling With Randomized Sketching For Matrix Functions
Abstract
A Krylov subspace recycling method for the efficient evaluation of a sequence of matrix functions acting on a set of vectors is developed. The method improves over the recycling methods presented in [Burke et al., arXiv:2209.14163, 2022] in that it uses a closed-form expression for the augmented FOM approximants and hence circumvents the use of numerical quadrature. We further extend our method to use randomized sketching in order to avoid the arithmetic cost of orthogonalizing a full Krylov basis, offering an attractive solution to the fact that recycling algorithms built from shifted augmented FOM cannot easily be restarted. The efficacy of the proposed algorithms is demonstrated with numerical experiments.
Efficient Spectrum Sharing Between Coexisting OFDM Radar and Downlink Multiuser Communication Systems
Authors: Jia Zhu, Yifeng Xiong, Xiaojun Jing
Subjects: Information Theory (cs.IT); Systems and Control (eess.SY)
Abstract
This paper investigates the problem of joint subcarrier and power allocation in the coexistence of radar and multi-user communication systems. Specifically, in our research scenario, the base station (BS) provides information transmission services for multiple users while ensuring that its interference to a separate radar system will not affect the radar's normal function. To this end, we propose a subcarrier and power allocation scheme based on orthogonal frequency division multiple access (OFDM). The original problem consisting involving multivariate fractional programming and binary variables is highly non-convex. Due to its complexity, we relax the binary constraint by introducing a penalty term, provided that the optimal solution is not affected. Then, by integrating multiple power variables into one matrix, the original problem is reformulated as a multi-ratio fractional programming (FP) problem, and finally a quadratic transform is employed to make the non-convex problem a sequence of convex problems. The numerical results indicate the performance trade-off between the multi-user communication system and the radar system, and notably that the performance of the communication system is not improved with power increase in the presence of radar interference beyond a certain threshold. This provides a useful insight for the energy-efficient design of the system.
Improving the Security of United States Elections with Robust Optimization
Authors: Braden L. Crimmins, J. Alex Halderman, Bradley Sturt
Subjects: Cryptography and Security (cs.CR); Optimization and Control (math.OC)
Abstract
For more than a century, election officials across the United States have inspected voting machines before elections using a procedure called Logic and Accuracy Testing (LAT). This procedure consists of election officials casting a test deck of ballots into each voting machine and confirming the machine produces the expected vote total for each candidate. We bring a scientific perspective to LAT by introducing the first formal approach to designing test decks with rigorous security guarantees. Specifically, our approach employs robust optimization to find test decks that are guaranteed to detect any voting machine misconfiguration that would cause votes to be swapped across candidates. Out of all the test decks with this security guarantee, our robust optimization problem yields the test deck with the minimum number of ballots, thereby minimizing implementation costs for election officials. To facilitate deployment at scale, we develop a practically efficient exact algorithm for solving our robust optimization problems based on the cutting plane method. In partnership with the Michigan Bureau of Elections, we retrospectively applied our approach to all 6928 ballot styles from Michigan's November 2022 general election; this retrospective study reveals that the test decks with rigorous security guarantees obtained by our approach require, on average, only 1.2% more ballots than current practice. Our approach has since been piloted in real-world elections by the Michigan Bureau of Elections as a low-cost way to improve election security and increase public trust in democratic institutions.
Communication-Efficient Decentralized Multi-Agent Reinforcement Learning for Cooperative Adaptive Cruise Control
Abstract
Connected and autonomous vehicles (CAVs) promise next-gen transportation systems with enhanced safety, energy efficiency, and sustainability. One typical control strategy for CAVs is the so-called cooperative adaptive cruise control (CACC) where vehicles drive in platoons and cooperate to achieve safe and efficient transportation. In this study, we formulate CACC as a multi-agent reinforcement learning (MARL) problem. Diverging from existing MARL methods that use centralized training and decentralized execution which require not only a centralized communication mechanism but also dense inter-agent communication, we propose a fully-decentralized MARL framework for enhanced efficiency and scalability. In addition, a quantization-based communication scheme is proposed to reduce the communication overhead without significantly degrading the control performance. This is achieved by employing randomized rounding numbers to quantize each piece of communicated information and only communicating non-zero components after quantization. Extensive experimentation in two distinct CACC settings reveals that the proposed MARL framework consistently achieves superior performance over several contemporary benchmarks in terms of both communication efficiency and control efficacy.
Scaling Survival Analysis in Healthcare with Federated Survival Forests: A Comparative Study on Heart Failure and Breast Cancer Genomics
Authors: Alberto Archetti, Francesca Ieva, Matteo Matteucci
Abstract
Survival analysis is a fundamental tool in medicine, modeling the time until an event of interest occurs in a population. However, in real-world applications, survival data are often incomplete, censored, distributed, and confidential, especially in healthcare settings where privacy is critical. The scarcity of data can severely limit the scalability of survival models to distributed applications that rely on large data pools. Federated learning is a promising technique that enables machine learning models to be trained on multiple datasets without compromising user privacy, making it particularly well-suited for addressing the challenges of survival data and large-scale survival applications. Despite significant developments in federated learning for classification and regression, many directions remain unexplored in the context of survival analysis. In this work, we propose an extension of the Federated Survival Forest algorithm, called FedSurF++. This federated ensemble method constructs random survival forests in heterogeneous federations. Specifically, we investigate several new tree sampling methods from client forests and compare the results with state-of-the-art survival models based on neural networks. The key advantage of FedSurF++ is its ability to achieve comparable performance to existing methods while requiring only a single communication round to complete. The extensive empirical investigation results in a significant improvement from the algorithmic and privacy preservation perspectives, making the original FedSurF algorithm more efficient, robust, and private. We also present results on two real-world datasets demonstrating the success of FedSurF++ in real-world healthcare studies. Our results underscore the potential of FedSurF++ to improve the scalability and effectiveness of survival analysis in distributed settings while preserving user privacy.
Learning Optimal Admission Control in Partially Observable Queueing Networks
Authors: Jonatha Anselmi, Bruno Gaujal, Louis-Sébastien Rebuffi
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Abstract
We present an efficient reinforcement learning algorithm that learns the optimal admission control policy in a partially observable queueing network. Specifically, only the arrival and departure times from the network are observable, and optimality refers to the average holding/rejection cost in infinite horizon. While reinforcement learning in Partially Observable Markov Decision Processes (POMDP) is prohibitively expensive in general, we show that our algorithm has a regret that only depends sub-linearly on the maximal number of jobs in the network, $S$. In particular, in contrast with existing regret analyses, our regret bound does not depend on the diameter of the underlying Markov Decision Process (MDP), which in most queueing systems is at least exponential in $S$. The novelty of our approach is to leverage Norton's equivalent theorem for closed product-form queueing networks and an efficient reinforcement learning algorithm for MDPs with the structure of birth-and-death processes.
Work-in-Progress: A Universal Instrumentation Platform for Non-Volatile Memories
Authors: Felix Staudigl, Mohammed Hossein, Tobias Ziegler, Hazem Al Indari, Rebecca Pelke, Sebastian Siegel, Dirk J. Wouters, Dominik Sisejkovic, Jan Moritz Joseph, Rainer Leupers
Subjects: Hardware Architecture (cs.AR); Computers and Society (cs.CY)
Abstract
Emerging non-volatile memories (NVMs) represent a disruptive technology that allows a paradigm shift from the conventional von Neumann architecture towards more efficient computing-in-memory (CIM) architectures. Several instrumentation platforms have been proposed to interface NVMs allowing the characterization of single cells and crossbar structures. However, these platforms suffer from low flexibility and are not capable of performing CIM operations on NVMs. Therefore, we recently designed and built the NeuroBreakoutBoard, a highly versatile instrumentation platform capable of executing CIM on NVMs. We present our preliminary results demonstrating a relative error < 5% in the range of 1 k$\Omega$ to 1 M$\Omega$ and showcase the switching behavior of a HfO$_2$/Ti-based memristive cell.
From Military to Healthcare: Adopting and Expanding Ethical Principles for Generative Artificial Intelligence
Authors: David Oniani, Jordan Hilsman, Yifan Peng, COL (Ret.)Ronald K. Poropatich, COL Jeremy C. Pamplin, LTC Gary L. Legault, Yanshan Wang
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
In 2020, the U.S. Department of Defense officially disclosed a set of ethical principles to guide the use of Artificial Intelligence (AI) technologies on future battlefields. Despite stark differences, there are core similarities between the military and medical service. Warriors on battlefields often face life-altering circumstances that require quick decision-making. Medical providers experience similar challenges in a rapidly changing healthcare environment, such as in the emergency department or during surgery treating a life-threatening condition. Generative AI, an emerging technology designed to efficiently generate valuable information, holds great promise. As computing power becomes more accessible and the abundance of health data, such as electronic health records, electrocardiograms, and medical images, increases, it is inevitable that healthcare will be revolutionized by this technology. Recently, generative AI has captivated the research community, leading to debates about its application in healthcare, mainly due to concerns about transparency and related issues. Meanwhile, concerns about the potential exacerbation of health disparities due to modeling biases have raised notable ethical concerns regarding the use of this technology in healthcare. However, the ethical principles for generative AI in healthcare have been understudied, and decision-makers often fail to consider the significance of generative AI. In this paper, we propose GREAT PLEA ethical principles, encompassing governance, reliability, equity, accountability, traceability, privacy, lawfulness, empathy, and autonomy, for generative AI in healthcare. We aim to proactively address the ethical dilemmas and challenges posed by the integration of generative AI in healthcare.
Abstract
Federated learning enables collaborative training of machine learning models by keeping the raw data of the involved workers private. One of its main objectives is to improve the models' privacy, security, and scalability. Vertical Federated Learning (VFL) offers an efficient cross-silo setting where a few parties collaboratively train a model without sharing the same features. In such a scenario, classification labels are commonly considered sensitive information held exclusively by one (active) party, while other (passive) parties use only their local information. Recent works have uncovered important flaws of VFL, leading to possible label inference attacks under the assumption that the attacker has some, even limited, background knowledge on the relation between labels and data. In this work, we are the first (to the best of our knowledge) to investigate label inference attacks on VFL using a zero-background knowledge strategy. To concretely formulate our proposal, we focus on Graph Neural Networks (GNNs) as a target model for the underlying VFL. In particular, we refer to node classification tasks, which are widely studied, and GNNs have shown promising results. Our proposed attack, BlindSage, provides impressive results in the experiments, achieving nearly 100% accuracy in most cases. Even when the attacker has no information about the used architecture or the number of classes, the accuracy remained above 85% in most instances. Finally, we observe that well-known defenses cannot mitigate our attack without affecting the model's performance on the main classification task.
Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP
Abstract
Open-vocabulary segmentation is a challenging task requiring segmenting and recognizing objects from an open set of categories. One way to address this challenge is to leverage multi-modal models, such as CLIP, to provide image and text features in a shared embedding space, which bridges the gap between closed-vocabulary and open-vocabulary recognition. Hence, existing methods often adopt a two-stage framework to tackle the problem, where the inputs first go through a mask generator and then through the CLIP model along with the predicted masks. This process involves extracting features from images multiple times, which can be ineffective and inefficient. By contrast, we propose to build everything into a single-stage framework using a shared Frozen Convolutional CLIP backbone, which not only significantly simplifies the current two-stage pipeline, but also remarkably yields a better accuracy-cost trade-off. The proposed FC-CLIP, benefits from the following observations: the frozen CLIP backbone maintains the ability of open-vocabulary classification and can also serve as a strong mask generator, and the convolutional CLIP generalizes well to a larger input resolution than the one used during contrastive image-text pretraining. When training on COCO panoptic data only and testing in a zero-shot manner, FC-CLIP achieve 26.8 PQ, 16.8 AP, and 34.1 mIoU on ADE20K, 18.2 PQ, 27.9 mIoU on Mapillary Vistas, 44.0 PQ, 26.8 AP, 56.2 mIoU on Cityscapes, outperforming the prior art by +4.2 PQ, +2.4 AP, +4.2 mIoU on ADE20K, +4.0 PQ on Mapillary Vistas and +20.1 PQ on Cityscapes, respectively. Additionally, the training and testing time of FC-CLIP is 7.5x and 6.6x significantly faster than the same prior art, while using 5.9x fewer parameters. FC-CLIP also sets a new state-of-the-art performance across various open-vocabulary semantic segmentation datasets. Code at https://github.com/bytedance/fc-clip
Keyword: faster
VQGraph: Graph Vector-Quantization for Bridging GNNs and MLPs
Abstract
Graph Neural Networks (GNNs) conduct message passing which aggregates local neighbors to update node representations. Such message passing leads to scalability issues in practical latency-constrained applications. To address this issue, recent methods adopt knowledge distillation (KD) to learn computationally-efficient multi-layer perceptron (MLP) by mimicking the output of GNN. However, the existing GNN representation space may not be expressive enough for representing diverse local structures of the underlying graph, which limits the knowledge transfer from GNN to MLP. Here we present a novel framework VQGraph to learn a powerful graph representation space for bridging GNNs and MLPs. We adopt the encoder of a variant of a vector-quantized variational autoencoder (VQ-VAE) as a structure-aware graph tokenizer, which explicitly represents the nodes of diverse local structures as numerous discrete tokens and constitutes a meaningful codebook. Equipped with the learned codebook, we propose a new token-based distillation objective based on soft token assignments to sufficiently transfer the structural knowledge from GNN to MLP. Extensive experiments and analyses demonstrate the strong performance of VQGraph, where we achieve new state-of-the-art performance on GNN-MLP distillation in both transductive and inductive settings across seven graph datasets. We show that VQGraph with better performance infers faster than GNNs by 828x, and also achieves accuracy improvement over GNNs and stand-alone MLPs by 3.90% and 28.05% on average, respectively. Code: https://github.com/YangLing0818/VQGraph.
Balanced Classification: A Unified Framework for Long-Tailed Object Detection
Abstract
Conventional detectors suffer from performance degradation when dealing with long-tailed data due to a classification bias towards the majority head categories. In this paper, we contend that the learning bias originates from two factors: 1) the unequal competition arising from the imbalanced distribution of foreground categories, and 2) the lack of sample diversity in tail categories. To tackle these issues, we introduce a unified framework called BAlanced CLassification (BACL), which enables adaptive rectification of inequalities caused by disparities in category distribution and dynamic intensification of sample diversities in a synchronized manner. Specifically, a novel foreground classification balance loss (FCBL) is developed to ameliorate the domination of head categories and shift attention to difficult-to-differentiate categories by introducing pairwise class-aware margins and auto-adjusted weight terms, respectively. This loss prevents the over-suppression of tail categories in the context of unequal competition. Moreover, we propose a dynamic feature hallucination module (FHM), which enhances the representation of tail categories in the feature space by synthesizing hallucinated samples to introduce additional data variances. In this divide-and-conquer approach, BACL sets a new state-of-the-art on the challenging LVIS benchmark with a decoupled training pipeline, surpassing vanilla Faster R-CNN with ResNet-50-FPN by 5.8% AP and 16.1% AP for overall and tail categories. Extensive experiments demonstrate that BACL consistently achieves performance improvements across various datasets with different backbones and architectures. Code and models are available at https://github.com/Tianhao-Qi/BACL.
MSECNet: Accurate and Robust Normal Estimation for 3D Point Clouds by Multi-Scale Edge Conditioning
Abstract
Estimating surface normals from 3D point clouds is critical for various applications, including surface reconstruction and rendering. While existing methods for normal estimation perform well in regions where normals change slowly, they tend to fail where normals vary rapidly. To address this issue, we propose a novel approach called MSECNet, which improves estimation in normal varying regions by treating normal variation modeling as an edge detection problem. MSECNet consists of a backbone network and a multi-scale edge conditioning (MSEC) stream. The MSEC stream achieves robust edge detection through multi-scale feature fusion and adaptive edge detection. The detected edges are then combined with the output of the backbone network using the edge conditioning module to produce edge-aware representations. Extensive experiments show that MSECNet outperforms existing methods on both synthetic (PCPNet) and real-world (SceneNN) datasets while running significantly faster. We also conduct various analyses to investigate the contribution of each component in the MSEC stream. Finally, we demonstrate the effectiveness of our approach in surface reconstruction.
Fast and Accurate Reduced-Order Modeling of a MOOSE-based Additive Manufacturing Model with Operator Learning
Authors: Mahmoud Yaseen, Dewen Yushu, Peter German, Xu Wu
Abstract
One predominant challenge in additive manufacturing (AM) is to achieve specific material properties by manipulating manufacturing process parameters during the runtime. Such manipulation tends to increase the computational load imposed on existing simulation tools employed in AM. The goal of the present work is to construct a fast and accurate reduced-order model (ROM) for an AM model developed within the Multiphysics Object-Oriented Simulation Environment (MOOSE) framework, ultimately reducing the time/cost of AM control and optimization processes. Our adoption of the operator learning (OL) approach enabled us to learn a family of differential equations produced by altering process variables in the laser's Gaussian point heat source. More specifically, we used the Fourier neural operator (FNO) and deep operator network (DeepONet) to develop ROMs for time-dependent responses. Furthermore, we benchmarked the performance of these OL methods against a conventional deep neural network (DNN)-based ROM. Ultimately, we found that OL methods offer comparable performance and, in terms of accuracy and generalizability, even outperform DNN at predicting scalar model responses. The DNN-based ROM afforded the fastest training time. Furthermore, all the ROMs were faster than the original MOOSE model yet still provided accurate predictions. FNO had a smaller mean prediction error than DeepONet, with a larger variance for time-dependent responses. Unlike DNN, both FNO and DeepONet were able to simulate time series data without the need for dimensionality reduction techniques. The present work can help facilitate the AM optimization process by enabling faster execution of simulation tools while still preserving evaluation accuracy.
Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP
Abstract
Open-vocabulary segmentation is a challenging task requiring segmenting and recognizing objects from an open set of categories. One way to address this challenge is to leverage multi-modal models, such as CLIP, to provide image and text features in a shared embedding space, which bridges the gap between closed-vocabulary and open-vocabulary recognition. Hence, existing methods often adopt a two-stage framework to tackle the problem, where the inputs first go through a mask generator and then through the CLIP model along with the predicted masks. This process involves extracting features from images multiple times, which can be ineffective and inefficient. By contrast, we propose to build everything into a single-stage framework using a shared Frozen Convolutional CLIP backbone, which not only significantly simplifies the current two-stage pipeline, but also remarkably yields a better accuracy-cost trade-off. The proposed FC-CLIP, benefits from the following observations: the frozen CLIP backbone maintains the ability of open-vocabulary classification and can also serve as a strong mask generator, and the convolutional CLIP generalizes well to a larger input resolution than the one used during contrastive image-text pretraining. When training on COCO panoptic data only and testing in a zero-shot manner, FC-CLIP achieve 26.8 PQ, 16.8 AP, and 34.1 mIoU on ADE20K, 18.2 PQ, 27.9 mIoU on Mapillary Vistas, 44.0 PQ, 26.8 AP, 56.2 mIoU on Cityscapes, outperforming the prior art by +4.2 PQ, +2.4 AP, +4.2 mIoU on ADE20K, +4.0 PQ on Mapillary Vistas and +20.1 PQ on Cityscapes, respectively. Additionally, the training and testing time of FC-CLIP is 7.5x and 6.6x significantly faster than the same prior art, while using 5.9x fewer parameters. FC-CLIP also sets a new state-of-the-art performance across various open-vocabulary semantic segmentation datasets. Code at https://github.com/bytedance/fc-clip
Keyword: mobile
Disease Insight through Digital Biomarkers Developed by Remotely Collected Wearables and Smartphone Data
Abstract
Digital Biomarkers and remote patient monitoring can provide valuable and timely insights into how a patient is coping with their condition (disease progression, treatment response, etc.), complementing treatment in traditional healthcare settings.Smartphones with embedded and connected sensors have immense potential for improving healthcare through various apps and mHealth (mobile health) platforms. This capability could enable the development of reliable digital biomarkers from long-term longitudinal data collected remotely from patients. We built an open-source platform, RADAR-base, to support large-scale data collection in remote monitoring studies. RADAR-base is a modern remote data collection platform built around Confluent's Apache Kafka, to support scalability, extensibility, security, privacy and quality of data. It provides support for study design and set-up, active (eg PROMs) and passive (eg. phone sensors, wearable devices and IoT) remote data collection capabilities with feature generation (eg. behavioural, environmental and physiological markers). The backend enables secure data transmission, and scalable solutions for data storage, management and data access. The platform has successfully collected longitudinal data for various cohorts in a number of disease areas including Multiple Sclerosis, Depression, Epilepsy, ADHD, Alzheimer, Autism and Lung diseases. Digital biomarkers developed through collected data are providing useful insights into different diseases. RADAR-base provides a modern open-source, community-driven solution for remote monitoring, data collection, and digital phenotyping of physical and mental health diseases. Clinicians can use digital biomarkers to augment their decision making for the prevention, personalisation and early intervention of disease.
A Survey of Beam Management for mmWave and THz Communications Towards 6G
Abstract
Communication in millimeter wave (mmWave) and even terahertz (THz) frequency bands is ushering in a new era of wireless communications. Beam management, namely initial access and beam tracking, has been recognized as an essential technique to ensure robust mmWave/THz communications, especially for mobile scenarios. However, narrow beams at higher carrier frequency lead to huge beam measurement overhead, which has a negative impact on beam acquisition and tracking. In addition, the beam management process is further complicated by the fluctuation of mmWave/THz channels, the random movement patterns of users, and the dynamic changes in the environment. For mmWave and THz communications toward 6G, we have witnessed a substantial increase in research and industrial attention on artificial intelligence (AI), reconfigurable intelligent surface (RIS), and integrated sensing and communications (ISAC). The introduction of these enabling technologies presents both open opportunities and unique challenges for beam management. In this paper, we present a comprehensive survey on mmWave and THz beam management. Further, we give some insights on technical challenges and future research directions in this promising area.
AutoML4ETC: Automated Neural Architecture Search for Real-World Encrypted Traffic Classification
Authors: Navid Malekghaini, Elham Akbari, Mohammad A. Salahuddin, Noura Limam, Raouf Boutaba, Bertrand Mathieu, Stephanie Moteau, Stephane Tuffin
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Abstract
Deep learning (DL) has been successfully applied to encrypted network traffic classification in experimental settings. However, in production use, it has been shown that a DL classifier's performance inevitably decays over time. Re-training the model on newer datasets has been shown to only partially improve its performance. Manually re-tuning the model architecture to meet the performance expectations on newer datasets is time-consuming and requires domain expertise. We propose AutoML4ETC, a novel tool to automatically design efficient and high-performing neural architectures for encrypted traffic classification. We define a novel, powerful search space tailored specifically for the near real-time classification of encrypted traffic using packet header bytes. We show that with different search strategies over our search space, AutoML4ETC generates neural architectures that outperform the state-of-the-art encrypted traffic classifiers on several datasets, including public benchmark datasets and real-world TLS and QUIC traffic collected from the Orange mobile network. In addition to being more accurate, AutoML4ETC's architectures are significantly more efficient and lighter in terms of the number of parameters. Finally, we make AutoML4ETC publicly available for future research.
Keyword: pruning
MultiEM: Efficient and Effective Unsupervised Multi-Table Entity Matching
Abstract
Entity Matching (EM), which aims to identify all entity pairs referring to the same real-world entity from relational tables, is one of the most important tasks in real-world data management systems. Due to the labeling process of EM being extremely labor-intensive, unsupervised EM is more applicable than supervised EM in practical scenarios. Traditional unsupervised EM assumes that all entities come from two tables; however, it is more common to match entities from multiple tables in practical applications, that is, multi-table entity matching (multi-table EM). Unfortunately, effective and efficient unsupervised multi-table EM remains under-explored. To fill this gap, this paper formally studies the problem of unsupervised multi-table entity matching and proposes an effective and efficient solution, termed as MultiEM. MultiEM is a parallelable pipeline of enhanced entity representation, table-wise hierarchical merging, and density-based pruning. Extensive experimental results on six real-world benchmark datasets demonstrate the superiority of MultiEM in terms of effectiveness and efficiency.
Accurate Neural Network Pruning Requires Rethinking Sparse Optimization
Authors: Denis Kuznedelev, Eldar Kurtic, Eugenia Iofinova, Elias Frantar, Alexandra Peste, Dan Alistarh
Abstract
Obtaining versions of deep neural networks that are both highly-accurate and highly-sparse is one of the main challenges in the area of model compression, and several high-performance pruning techniques have been investigated by the community. Yet, much less is known about the interaction between sparsity and the standard stochastic optimization techniques used for training sparse networks, and most existing work uses standard dense schedules and hyperparameters for training sparse networks. In this work, we examine the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks. We begin by showing that using standard dense training recipes for sparse training is suboptimal, and results in under-training. We provide new approaches for mitigating this issue for both sparse pre-training of vision models (e.g. ResNet50/ImageNet) and sparse fine-tuning of language models (e.g. BERT/GLUE), achieving state-of-the-art results in both settings in the high-sparsity regime, and providing detailed analyses for the difficulty of sparse training in both scenarios. Our work sets a new threshold in terms of the accuracies that can be achieved under high sparsity, and should inspire further research into improving sparse model training, to reach higher accuracies under high sparsity, but also to do so efficiently.
Keyword: diffusion
Training Data Protection with Compositional Diffusion Models
Abstract
We introduce Compartmentalized Diffusion Models (CDM), a method to train different diffusion models (or prompts) on distinct data sources and arbitrarily compose them at inference time. The individual models can be trained in isolation, at different times, and on different distributions and domains and can be later composed to achieve performance comparable to a paragon model trained on all data simultaneously. Furthermore, each model only contains information about the subset of the data it was exposed to during training, enabling several forms of training data protection. In particular, CDMs are the first method to enable both selective forgetting and continual learning for large-scale diffusion models, as well as allowing serving customized models based on the user's access rights. CDMs also allow determining the importance of a subset of the data in generating particular samples.
A Multidimensional Analysis of Social Biases in Vision Transformers
Authors: Jannik Brinkmann, Paul Swoboda, Christian Bartelt
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The embedding spaces of image models have been shown to encode a range of social biases such as racism and sexism. Here, we investigate specific factors that contribute to the emergence of these biases in Vision Transformers (ViT). Therefore, we measure the impact of training data, model architecture, and training objectives on social biases in the learned representations of ViTs. Our findings indicate that counterfactual augmentation training using diffusion-based image editing can mitigate biases, but does not eliminate them. Moreover, we find that larger models are less biased than smaller models, and that models trained using discriminative objectives are less biased than those trained using generative objectives. In addition, we observe inconsistencies in the learned social biases. To our surprise, ViTs can exhibit opposite biases when trained on the same data set using different self-supervised objectives. Our findings give insights into the factors that contribute to the emergence of social biases and suggests that we could achieve substantial fairness improvements based on model design choices.
On the Transition from Neural Representation to Symbolic Knowledge
Abstract
Bridging the huge disparity between neural and symbolic representation can potentially enable the incorporation of symbolic thinking into neural networks from essence. Motivated by how human gradually builds complex symbolic representation from the prototype symbols that are learned through perception and environmental interactions. We propose a Neural-Symbolic Transitional Dictionary Learning (TDL) framework that employs an EM algorithm to learn a transitional representation of data that compresses high-dimension information of visual parts of an input into a set of tensors as neural variables and discover the implicit predicate structure in a self-supervised way. We implement the framework with a diffusion model by regarding the decomposition of input as a cooperative game, then learn predicates by prototype clustering. We additionally use RL enabled by the Markovian of diffusion models to further tune the learned prototypes by incorporating subjective factors. Extensive experiments on 3 abstract compositional visual objects datasets that require the model to segment parts without any visual features like texture, color, or shadows apart from shape and 3 neural/symbolic downstream tasks demonstrate the learned representation enables interpretable decomposition of visual input and smooth adaption to downstream tasks which are not available by existing methods.
On the Biometric Capacity of Generative Face Models
Authors: Vishnu Naresh Boddeti, Gautam Sreekumar, Arun Ross
Abstract
There has been tremendous progress in generating realistic faces with high fidelity over the past few years. Despite this progress, a crucial question remains unanswered: "Given a generative face model, how many unique identities can it generate?" In other words, what is the biometric capacity of the generative face model? A scientific basis for answering this question will benefit evaluating and comparing different generative face models and establish an upper bound on their scalability. This paper proposes a statistical approach to estimate the biometric capacity of generated face images in a hyperspherical feature space. We employ our approach on multiple generative models, including unconditional generators like StyleGAN, Latent Diffusion Model, and "Generated Photos," as well as DCFace, a class-conditional generator. We also estimate capacity w.r.t. demographic attributes such as gender and age. Our capacity estimates indicate that (a) under ArcFace representation at a false acceptance rate (FAR) of 0.1%, StyleGAN3 and DCFace have a capacity upper bound of $1.43\times10^6$ and $1.190\times10^4$, respectively; (b) the capacity reduces drastically as we lower the desired FAR with an estimate of $1.796\times10^4$ and $562$ at FAR of 1% and 10%, respectively, for StyleGAN3; (c) there is no discernible disparity in the capacity w.r.t gender; and (d) for some generative models, there is an appreciable disparity in the capacity w.r.t age. Code is available at https://github.com/human-analysis/capacity-generative-face-models.
SDDM: Score-Decomposed Diffusion Models on Manifolds for Unpaired Image-to-Image Translation
Abstract
Recent score-based diffusion models (SBDMs) show promising results in unpaired image-to-image translation (I2I). However, existing methods, either energy-based or statistically-based, provide no explicit form of the interfered intermediate generative distributions. This work presents a new score-decomposed diffusion model (SDDM) on manifolds to explicitly optimize the tangled distributions during image generation. SDDM derives manifolds to make the distributions of adjacent time steps separable and decompose the score function or energy guidance into an image denoising" part and a contentrefinement" part. To refine the image in the same noise level, we equalize the refinement parts of the score function and energy guidance, which permits multi-objective optimization on the manifold. We also leverage the block adaptive instance normalization module to construct manifolds with lower dimensions but still concentrated with the perturbed reference image. SDDM outperforms existing SBDM-based methods with much fewer diffusion steps on several I2I benchmarks.
Improved Order Analysis and Design of Exponential Integrator for Diffusion Models Sampling
Abstract
Efficient differential equation solvers have significantly reduced the sampling time of diffusion models (DMs) while retaining high sampling quality. Among these solvers, exponential integrators (EI) have gained prominence by demonstrating state-of-the-art performance. However, existing high-order EI-based sampling algorithms rely on degenerate EI solvers, resulting in inferior error bounds and reduced accuracy in contrast to the theoretically anticipated results under optimal settings. This situation makes the sampling quality extremely vulnerable to seemingly innocuous design choices such as timestep schedules. For example, an inefficient timestep scheduler might necessitate twice the number of steps to achieve a quality comparable to that obtained through carefully optimized timesteps. To address this issue, we reevaluate the design of high-order differential solvers for DMs. Through a thorough order analysis, we reveal that the degeneration of existing high-order EI solvers can be attributed to the absence of essential order conditions. By reformulating the differential equations in DMs and capitalizing on the theory of exponential integrators, we propose refined EI solvers that fulfill all the order conditions, which we designate as Refined Exponential Solver (RES). Utilizing these improved solvers, RES exhibits more favorable error bounds theoretically and achieves superior sampling efficiency and stability in practical applications. For instance, a simple switch from the single-step DPM-Solver++ to our order-satisfied RES solver when Number of Function Evaluations (NFE) $=9$, results in a reduction of numerical defects by $25.2\%$ and FID improvement of $25.4\%$ (16.77 vs 12.51) on a pre-trained ImageNet diffusion model.
Abstract
The crystal diffusion variational autoencoder (CDVAE) is a machine learning model that leverages score matching to generate realistic crystal structures that preserve crystal symmetry. In this study, we leverage novel diffusion probabilistic (DP) models to denoise atomic coordinates rather than adopting the standard score matching approach in CDVAE. Our proposed DP-CDVAE model can reconstruct and generate crystal structures whose qualities are statistically comparable to those of the original CDVAE. Furthermore, notably, when comparing the carbon structures generated by the DP-CDVAE model with relaxed structures obtained from density functional theory calculations, we find that the DP-CDVAE generated structures are remarkably closer to their respective ground states. The energy differences between these structures and the true ground states are, on average, 68.1 meV/atom lower than those generated by the original CDVAE. This significant improvement in the energy accuracy highlights the effectiveness of the DP-CDVAE model in generating crystal structures that better represent their ground-state configurations.
Towards Personalized Prompt-Model Retrieval for Generative Recommendation
Authors: Yuanhe Guo, Haoming Liu, Hongyi Wen
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Abstract
Recommender Systems are built to retrieve relevant items to satisfy users' information needs. The candidate corpus usually consists of a finite set of items that are ready to be served, such as videos, products, or articles. With recent advances in Generative AI such as GPT and Diffusion models, a new form of recommendation task is yet to be explored where items are to be created by generative models with personalized prompts. Taking image generation as an example, with a single prompt from the user and access to a generative model, it is possible to generate hundreds of new images in a few minutes. How shall we attain personalization in the presence of "infinite" items? In this preliminary study, we propose a two-stage framework, namely Prompt-Model Retrieval and Generated Item Ranking, to approach this new task formulation. We release GEMRec-18K, a prompt-model interaction dataset with 18K images generated by 200 publicly-available generative models paired with a diverse set of 90 textual prompts. Our findings demonstrate the promise of generative model recommendation as a novel personalization problem and the limitations of existing evaluation metrics. We highlight future directions for the RecSys community to advance towards generative recommender systems. Our code and dataset are available at https://github.com/MAPS-research/GEMRec.
Painterly Image Harmonization using Diffusion Model
Abstract
Painterly image harmonization aims to insert photographic objects into paintings and obtain artistically coherent composite images. Previous methods for this task mainly rely on inference optimization or generative adversarial network, but they are either very time-consuming or struggling at fine control of the foreground objects (e.g., texture and content details). To address these issues, we propose a novel Painterly Harmonization stable Diffusion model (PHDiffusion), which includes a lightweight adaptive encoder and a Dual Encoder Fusion (DEF) module. Specifically, the adaptive encoder and the DEF module first stylize foreground features within each encoder. Then, the stylized foreground features from both encoders are combined to guide the harmonization process. During training, besides the noise loss in diffusion model, we additionally employ content loss and two style losses, i.e., AdaIN style loss and contrastive style loss, aiming to balance the trade-off between style migration and content preservation. Compared with the state-of-the-art models from related fields, our PHDiffusion can stylize the foreground more sufficiently and simultaneously retain finer content. Our code and model are available at https://github.com/bcmi/PHDiffusion-Painterly-Image-Harmonization.
Maximum-norm a posteriori error bounds for an extrapolated upwind scheme applied to a singularly perturbed convection-diffusion problem
Abstract
Richardson extrapolation is applied to a simple first-order upwind difference scheme for the approximation of solutions of singularly perturbed convection-diffusion problems in one dimension. Robust a posteriori error bounds are derived for the proposed method on arbitrary meshes. It is shown that the resulting error estimator can be used to stear an adaptive mesh algorithm that generates meshes resolving layers and singularities. Numerical results are presented that illustrate the theoretical findings.
Diffusion-Augmented Depth Prediction with Sparse Annotations
Abstract
Depth estimation aims to predict dense depth maps. In autonomous driving scenes, sparsity of annotations makes the task challenging. Supervised models produce concave objects due to insufficient structural information. They overfit to valid pixels and fail to restore spatial structures. Self-supervised methods are proposed for the problem. Their robustness is limited by pose estimation, leading to erroneous results in natural scenes. In this paper, we propose a supervised framework termed Diffusion-Augmented Depth Prediction (DADP). We leverage the structural characteristics of diffusion model to enforce depth structures of depth models in a plug-and-play manner. An object-guided integrality loss is also proposed to further enhance regional structure integrality by fetching objective information. We evaluate DADP on three driving benchmarks and achieve significant improvements in depth structures and robustness. Our work provides a new perspective on depth estimation with sparse annotations in autonomous driving scenes.
Keyword: adaptive
PePNet: A Periodicity-Perceived Workload Prediction Network Supporting Rare Occurrence of Heavy Workload
Abstract
Cloud providers can greatly benefit from accurate workload prediction. However, the workload of cloud servers is highly variable, with occasional heavy workload bursts. This makes workload prediction challenging. There are mainly two categories of workload prediction methods: statistical methods and neural-network-based ones. The former ones rely on strong mathematical assumptions and have reported low accuracy when predicting highly variable workload. The latter ones offer higher overall accuracy, yet they are vulnerable to data imbalance between heavy workload and common one. This impairs the prediction accuracy of neural network-based models on heavy workload. Either the overall inaccuracy of statistic methods or the heavy-workload inaccuracy of neural-network-based models can cause service level agreement violations. Thus, we propose PePNet to improve overall especially heavy workload prediction accuracy. It has two distinctive characteristics: (i) A Periodicity-Perceived Mechanism to detect the existence of periodicity and the length of one period automatically, without any priori knowledge. Furthermore, it fuses periodic information adaptively, which is suitable for periodic, lax periodic and aperiodic time series. (ii) An Achilles' Heel Loss Function iteratively optimizing the most under-fitting part in predicting sequence for each step, which significantly improves the prediction accuracy of heavy load. Extensive experiments conducted on Alibaba2018, SMD dataset and Dinda's dataset demonstrate that PePNet improves MAPE for overall workload by 20.0% on average, compared with state-of-the-art methods. Especially, PePNet improves MAPE for heavy workload by 23.9% on average.
Dynamic Token-Pass Transformers for Semantic Segmentation
Authors: Yuang Liu, Qiang Zhou, Jing Wang, Fan Wang, Jun Wang, Wei Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Vision transformers (ViT) usually extract features via forwarding all the tokens in the self-attention layers from top to toe. In this paper, we introduce dynamic token-pass vision transformers (DoViT) for semantic segmentation, which can adaptively reduce the inference cost for images with different complexity. DoViT gradually stops partial easy tokens from self-attention calculation and keeps the hard tokens forwarding until meeting the stopping criteria. We employ lightweight auxiliary heads to make the token-pass decision and divide the tokens into keeping/stopping parts. With a token separate calculation, the self-attention layers are speeded up with sparse tokens and still work friendly with hardware. A token reconstruction module is built to collect and reset the grouped tokens to their original position in the sequence, which is necessary to predict correct semantic masks. We conduct extensive experiments on two common semantic segmentation tasks, and demonstrate that our method greatly reduces about 40% $\sim$ 60% FLOPs and the drop of mIoU is within 0.8% for various segmentation transformers. The throughput and inference speed of ViT-L/B are increased to more than 2$\times$ on Cityscapes.
A Taxonomy of Adaptive Traffic Signal Control
Authors: Andalib Shams, A. M. Tahsin Emtenan, Christopher M. Day
Abstract
Research on adaptive traffic signal control (ATSC) extends back to at least the 1960s, and many ATSC methods have been proposed over the years. This paper provides a review of this research and proposes a taxonomy for organizing it, accompanied by a consistent vocabulary for discussing the control concepts. We begin from the well-established concept of control generations. Next, we classify the ATSC methods according to their topographic structure (local-only, system/hierarchical), time resolution of decision-making (continuous versus planning-horizon), type of decision (rule-based or optimization), objective function, cyclic/acyclic nature,and additional subcategories relevant to certain "families" of methods. These various elements of system control are organized into a taxonomy of ATSC to help future researchers understand the wide diversity of algorithmic approaches to the signal control problem that have been proposed to date, and which can be updated or expanded to incorporate future research.
Towards an Uncertainty-Aware Adaptive Decision Engine for Self-Protecting Software: an POMDP-based Approach
Authors: Ryan Liu, Ladan Tahvildari
Subjects: Software Engineering (cs.SE); Systems and Control (eess.SY)
Abstract
The threats posed by evolving cyberattacks have led to increased research related to software systems that can self-protect. One topic in this domain is Moving Target Defense (MTD), which changes software characteristics in the protected system to make it harder for attackers to exploit vulnerabilities. However, MTD implementation and deployment are often impacted by run-time uncertainties, and existing MTD decision-making solutions have neglected uncertainty in model parameters and lack self-adaptation. This paper aims to address this gap by proposing an approach for an uncertainty-aware and self-adaptive MTD decision engine based on Partially Observable Markov Decision Process and Bayesian Learning techniques. The proposed approach considers uncertainty in both state and model parameters; thus, it has the potential to better capture environmental variability and improve defense strategies. A preliminary study is presented to highlight the potential effectiveness and challenges of the proposed approach.
SDDM: Score-Decomposed Diffusion Models on Manifolds for Unpaired Image-to-Image Translation
Abstract
Recent score-based diffusion models (SBDMs) show promising results in unpaired image-to-image translation (I2I). However, existing methods, either energy-based or statistically-based, provide no explicit form of the interfered intermediate generative distributions. This work presents a new score-decomposed diffusion model (SDDM) on manifolds to explicitly optimize the tangled distributions during image generation. SDDM derives manifolds to make the distributions of adjacent time steps separable and decompose the score function or energy guidance into an image denoising" part and a contentrefinement" part. To refine the image in the same noise level, we equalize the refinement parts of the score function and energy guidance, which permits multi-objective optimization on the manifold. We also leverage the block adaptive instance normalization module to construct manifolds with lower dimensions but still concentrated with the perturbed reference image. SDDM outperforms existing SBDM-based methods with much fewer diffusion steps on several I2I benchmarks.
Emo-DNA: Emotion Decoupling and Alignment Learning for Cross-Corpus Speech Emotion Recognition
Abstract
Cross-corpus speech emotion recognition (SER) seeks to generalize the ability of inferring speech emotion from a well-labeled corpus to an unlabeled one, which is a rather challenging task due to the significant discrepancy between two corpora. Existing methods, typically based on unsupervised domain adaptation (UDA), struggle to learn corpus-invariant features by global distribution alignment, but unfortunately, the resulting features are mixed with corpus-specific features or not class-discriminative. To tackle these challenges, we propose a novel Emotion Decoupling aNd Alignment learning framework (EMO-DNA) for cross-corpus SER, a novel UDA method to learn emotion-relevant corpus-invariant features. The novelties of EMO-DNA are two-fold: contrastive emotion decoupling and dual-level emotion alignment. On one hand, our contrastive emotion decoupling achieves decoupling learning via a contrastive decoupling loss to strengthen the separability of emotion-relevant features from corpus-specific ones. On the other hand, our dual-level emotion alignment introduces an adaptive threshold pseudo-labeling to select confident target samples for class-level alignment, and performs corpus-level alignment to jointly guide model for learning class-discriminative corpus-invariant features across corpora. Extensive experimental results demonstrate the superior performance of EMO-DNA over the state-of-the-art methods in several cross-corpus scenarios. Source code is available at https://github.com/Jiaxin-Ye/Emo-DNA.
Balanced Classification: A Unified Framework for Long-Tailed Object Detection
Abstract
Conventional detectors suffer from performance degradation when dealing with long-tailed data due to a classification bias towards the majority head categories. In this paper, we contend that the learning bias originates from two factors: 1) the unequal competition arising from the imbalanced distribution of foreground categories, and 2) the lack of sample diversity in tail categories. To tackle these issues, we introduce a unified framework called BAlanced CLassification (BACL), which enables adaptive rectification of inequalities caused by disparities in category distribution and dynamic intensification of sample diversities in a synchronized manner. Specifically, a novel foreground classification balance loss (FCBL) is developed to ameliorate the domination of head categories and shift attention to difficult-to-differentiate categories by introducing pairwise class-aware margins and auto-adjusted weight terms, respectively. This loss prevents the over-suppression of tail categories in the context of unequal competition. Moreover, we propose a dynamic feature hallucination module (FHM), which enhances the representation of tail categories in the feature space by synthesizing hallucinated samples to introduce additional data variances. In this divide-and-conquer approach, BACL sets a new state-of-the-art on the challenging LVIS benchmark with a decoupled training pipeline, surpassing vanilla Faster R-CNN with ResNet-50-FPN by 5.8% AP and 16.1% AP for overall and tail categories. Extensive experiments demonstrate that BACL consistently achieves performance improvements across various datasets with different backbones and architectures. Code and models are available at https://github.com/Tianhao-Qi/BACL.
Painterly Image Harmonization using Diffusion Model
Abstract
Painterly image harmonization aims to insert photographic objects into paintings and obtain artistically coherent composite images. Previous methods for this task mainly rely on inference optimization or generative adversarial network, but they are either very time-consuming or struggling at fine control of the foreground objects (e.g., texture and content details). To address these issues, we propose a novel Painterly Harmonization stable Diffusion model (PHDiffusion), which includes a lightweight adaptive encoder and a Dual Encoder Fusion (DEF) module. Specifically, the adaptive encoder and the DEF module first stylize foreground features within each encoder. Then, the stylized foreground features from both encoders are combined to guide the harmonization process. During training, besides the noise loss in diffusion model, we additionally employ content loss and two style losses, i.e., AdaIN style loss and contrastive style loss, aiming to balance the trade-off between style migration and content preservation. Compared with the state-of-the-art models from related fields, our PHDiffusion can stylize the foreground more sufficiently and simultaneously retain finer content. Our code and model are available at https://github.com/bcmi/PHDiffusion-Painterly-Image-Harmonization.
MSECNet: Accurate and Robust Normal Estimation for 3D Point Clouds by Multi-Scale Edge Conditioning
Abstract
Estimating surface normals from 3D point clouds is critical for various applications, including surface reconstruction and rendering. While existing methods for normal estimation perform well in regions where normals change slowly, they tend to fail where normals vary rapidly. To address this issue, we propose a novel approach called MSECNet, which improves estimation in normal varying regions by treating normal variation modeling as an edge detection problem. MSECNet consists of a backbone network and a multi-scale edge conditioning (MSEC) stream. The MSEC stream achieves robust edge detection through multi-scale feature fusion and adaptive edge detection. The detected edges are then combined with the output of the backbone network using the edge conditioning module to produce edge-aware representations. Extensive experiments show that MSECNet outperforms existing methods on both synthetic (PCPNet) and real-world (SceneNN) datasets while running significantly faster. We also conduct various analyses to investigate the contribution of each component in the MSEC stream. Finally, we demonstrate the effectiveness of our approach in surface reconstruction.
Maximum-norm a posteriori error bounds for an extrapolated upwind scheme applied to a singularly perturbed convection-diffusion problem
Abstract
Richardson extrapolation is applied to a simple first-order upwind difference scheme for the approximation of solutions of singularly perturbed convection-diffusion problems in one dimension. Robust a posteriori error bounds are derived for the proposed method on arbitrary meshes. It is shown that the resulting error estimator can be used to stear an adaptive mesh algorithm that generates meshes resolving layers and singularities. Numerical results are presented that illustrate the theoretical findings.
Communication-Efficient Decentralized Multi-Agent Reinforcement Learning for Cooperative Adaptive Cruise Control
Abstract
Connected and autonomous vehicles (CAVs) promise next-gen transportation systems with enhanced safety, energy efficiency, and sustainability. One typical control strategy for CAVs is the so-called cooperative adaptive cruise control (CACC) where vehicles drive in platoons and cooperate to achieve safe and efficient transportation. In this study, we formulate CACC as a multi-agent reinforcement learning (MARL) problem. Diverging from existing MARL methods that use centralized training and decentralized execution which require not only a centralized communication mechanism but also dense inter-agent communication, we propose a fully-decentralized MARL framework for enhanced efficiency and scalability. In addition, a quantization-based communication scheme is proposed to reduce the communication overhead without significantly degrading the control performance. This is achieved by employing randomized rounding numbers to quantize each piece of communicated information and only communicating non-zero components after quantization. Extensive experimentation in two distinct CACC settings reveals that the proposed MARL framework consistently achieves superior performance over several contemporary benchmarks in terms of both communication efficiency and control efficacy.
Flexible Differentially Private Vertical Federated Learning with Adaptive Feature Embeddings
Abstract
The emergence of vertical federated learning (VFL) has stimulated concerns about the imperfection in privacy protection, as shared feature embeddings may reveal sensitive information under privacy attacks. This paper studies the delicate equilibrium between data privacy and task utility goals of VFL under differential privacy (DP). To address the generality issue of prior arts, this paper advocates a flexible and generic approach that decouples the two goals and addresses them successively. Specifically, we initially derive a rigorous privacy guarantee by applying norm clipping on shared feature embeddings, which is applicable across various datasets and models. Subsequently, we demonstrate that task utility can be optimized via adaptive adjustments on the scale and distribution of feature embeddings in an accuracy-appreciative way, without compromising established DP mechanisms. We concretize our observation into the proposed VFL-AFE framework, which exhibits effectiveness against privacy attacks and the capacity to retain favorable task utility, as substantiated by extensive experiments.
Adaptive Preferential Attached kNN Graph With Distribution-Awareness
Authors: Shaojie Min, Ji Liu
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Social and Information Networks (cs.SI)
Abstract
Graph-based kNN algorithms have garnered widespread popularity for machine learning tasks, due to their simplicity and effectiveness. However, the conventional kNN graph's reliance on a fixed value of k can hinder its performance, especially in scenarios involving complex data distributions. Moreover, like other classification models, the presence of ambiguous samples along decision boundaries often presents a challenge, as they are more prone to incorrect classification. To address these issues, we propose the Preferential Attached k-Nearest Neighbors Graph (paNNG), which combines adaptive kNN with distribution-based graph construction. By incorporating distribution information, paNNG can significantly improve performance for ambiguous samples by "pulling" them towards their original classes and hence enable enhanced overall accuracy and generalization capability. Through rigorous evaluations on diverse benchmark datasets, paNNG outperforms state-of-the-art algorithms, showcasing its adaptability and efficacy across various real-world scenarios.
Discrete-Time Adaptive State Tracking Control Schemes Using Gradient Algorithms
Abstract
This paper conducts a comprehensive study of a classical adaptive control problem: adaptive control of a state-space plant model: $\dot{x}(t) = A x(t) + B u(t)$ in continuous time, or $x(t+1) = A x(t) + B u(t)$ in discrete time, for state tracking of a chosen stable reference model system: $\dot{x}_m(t) = A_m x_m(t) + B_m r(t)$ in continuous time, or $x_m(t+1) = A_m x_m(t) + B_m r(t)$ in discrete time. Adaptive state tracking control schemes for continuous-time systems have been reported in the literature, using a Lyapunov design and analysis method which has not been successfully applied to discrete-time systems, so that the discrete-time adaptive state tracking problem has remained to be open. In this paper, new adaptive state tracking control schemes are developed for discrete-time systems, using a gradient method for the design of adaptive laws for updating the controller parameters. Both direct and indirect adaptive designs are presented, which have the standard and desired adaptive law properties. Such a new gradient algorithm based framework is also developed for adaptive state tracking control of continuous-time systems, as compared with the Lyapunov method based framework.
Keyword: quantization
Communication-Efficient Decentralized Multi-Agent Reinforcement Learning for Cooperative Adaptive Cruise Control
Abstract
Connected and autonomous vehicles (CAVs) promise next-gen transportation systems with enhanced safety, energy efficiency, and sustainability. One typical control strategy for CAVs is the so-called cooperative adaptive cruise control (CACC) where vehicles drive in platoons and cooperate to achieve safe and efficient transportation. In this study, we formulate CACC as a multi-agent reinforcement learning (MARL) problem. Diverging from existing MARL methods that use centralized training and decentralized execution which require not only a centralized communication mechanism but also dense inter-agent communication, we propose a fully-decentralized MARL framework for enhanced efficiency and scalability. In addition, a quantization-based communication scheme is proposed to reduce the communication overhead without significantly degrading the control performance. This is achieved by employing randomized rounding numbers to quantize each piece of communicated information and only communicating non-zero components after quantization. Extensive experimentation in two distinct CACC settings reveals that the proposed MARL framework consistently achieves superior performance over several contemporary benchmarks in terms of both communication efficiency and control efficacy.
RobustMQ: Benchmarking Robustness of Quantized Models
Abstract
Quantization has emerged as an essential technique for deploying deep neural networks (DNNs) on devices with limited resources. However, quantized models exhibit vulnerabilities when exposed to various noises in real-world applications. Despite the importance of evaluating the impact of quantization on robustness, existing research on this topic is limited and often disregards established principles of robustness evaluation, resulting in incomplete and inconclusive findings. To address this gap, we thoroughly evaluated the robustness of quantized models against various noises (adversarial attacks, natural corruptions, and systematic noises) on ImageNet. The comprehensive evaluation results empirically provide valuable insights into the robustness of quantized models in various scenarios, for example: (1) quantized models exhibit higher adversarial robustness than their floating-point counterparts, but are more vulnerable to natural corruptions and systematic noises; (2) in general, increasing the quantization bit-width results in a decrease in adversarial robustness, an increase in natural robustness, and an increase in systematic robustness; (3) among corruption methods, \textit{impulse noise} and \textit{glass blur} are the most harmful to quantized models, while \textit{brightness} has the least impact; (4) among systematic noises, the \textit{nearest neighbor interpolation} has the highest impact, while bilinear interpolation, cubic interpolation, and area interpolation are the three least harmful. Our research contributes to advancing the robust quantization of models and their deployment in real-world scenarios.
Keyword: efficient
MultiEM: Efficient and Effective Unsupervised Multi-Table Entity Matching
Domain specificity and data efficiency in typo tolerant spell checkers: the case of search in online marketplaces
Optimal Dynamic Reconfiguration of Distribution Networks
An equilibrated estimator for mixed finite element discretizations of the curl-curl problem
Evaluation of STT-MRAM as a Scratchpad for Training in ML Accelerators
Artificial Intelligence in archival and historical scholarship workflow: HTS and ChatGPT
FuNToM: Functional Modeling of RF Circuits Using a Neural Network Assisted Two-Port Analysis Method
A Graphical Approach to Document Layout Analysis
Accurate Neural Network Pruning Requires Rethinking Sparse Optimization
BEAM: The Modeling Framework for Behavior, Energy, Autonomy & Mobility
Target specification bias, counterfactual prediction, and algorithmic fairness in healthcare
Efficient Model Adaptation for Continual Learning at the Edge
Strong convergence of multiscale truncated Euler-Maruyama method for super-linear slow-fast stochastic differential equations
VQGraph: Graph Vector-Quantization for Bridging GNNs and MLPs
Model Provenance via Model DNA
Eva: A General Vectorized Approximation Framework for Second-order Optimization
Artificial intelligence based load balancing in SDN: A comprehensive survey
Robust Self-Supervised Extrinsic Self-Calibration
Improved Order Analysis and Design of Exponential Integrator for Diffusion Models Sampling
AutoML4ETC: Automated Neural Architecture Search for Real-World Encrypted Traffic Classification
ES-MVSNet: Efficient Framework for End-to-end Self-supervised Multi-View Stereo
Edge Dynamic Map architecture for C-ITS applications
ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation
Deep Semantic Model Fusion for Ancient Agricultural Terrace Detection
Robust Discontinuity Indicators for High-Order Reconstruction of Piecewise Smooth Functions
Efficient Monaural Speech Enhancement using Spectrum Attention Fusion
Krylov Subspace Recycling With Randomized Sketching For Matrix Functions
Efficient Spectrum Sharing Between Coexisting OFDM Radar and Downlink Multiuser Communication Systems
Improving the Security of United States Elections with Robust Optimization
Communication-Efficient Decentralized Multi-Agent Reinforcement Learning for Cooperative Adaptive Cruise Control
Scaling Survival Analysis in Healthcare with Federated Survival Forests: A Comparative Study on Heart Failure and Breast Cancer Genomics
Learning Optimal Admission Control in Partially Observable Queueing Networks
Work-in-Progress: A Universal Instrumentation Platform for Non-Volatile Memories
From Military to Healthcare: Adopting and Expanding Ethical Principles for Generative Artificial Intelligence
BlindSage: Label Inference Attacks against Node-level Vertical Federated Graph Neural Networks
Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP
Keyword: faster
VQGraph: Graph Vector-Quantization for Bridging GNNs and MLPs
Balanced Classification: A Unified Framework for Long-Tailed Object Detection
MSECNet: Accurate and Robust Normal Estimation for 3D Point Clouds by Multi-Scale Edge Conditioning
Fast and Accurate Reduced-Order Modeling of a MOOSE-based Additive Manufacturing Model with Operator Learning
Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP
Keyword: mobile
Disease Insight through Digital Biomarkers Developed by Remotely Collected Wearables and Smartphone Data
A Survey of Beam Management for mmWave and THz Communications Towards 6G
AutoML4ETC: Automated Neural Architecture Search for Real-World Encrypted Traffic Classification
Keyword: pruning
MultiEM: Efficient and Effective Unsupervised Multi-Table Entity Matching
Accurate Neural Network Pruning Requires Rethinking Sparse Optimization
Keyword: diffusion
Training Data Protection with Compositional Diffusion Models
A Multidimensional Analysis of Social Biases in Vision Transformers
On the Transition from Neural Representation to Symbolic Knowledge
On the Biometric Capacity of Generative Face Models
SDDM: Score-Decomposed Diffusion Models on Manifolds for Unpaired Image-to-Image Translation
denoising" part and a content
refinement" part. To refine the image in the same noise level, we equalize the refinement parts of the score function and energy guidance, which permits multi-objective optimization on the manifold. We also leverage the block adaptive instance normalization module to construct manifolds with lower dimensions but still concentrated with the perturbed reference image. SDDM outperforms existing SBDM-based methods with much fewer diffusion steps on several I2I benchmarks.Improved Order Analysis and Design of Exponential Integrator for Diffusion Models Sampling
Diffusion probabilistic models enhance variational autoencoder for crystal structure generative modeling
Towards Personalized Prompt-Model Retrieval for Generative Recommendation
Painterly Image Harmonization using Diffusion Model
Maximum-norm a posteriori error bounds for an extrapolated upwind scheme applied to a singularly perturbed convection-diffusion problem
Diffusion-Augmented Depth Prediction with Sparse Annotations
Keyword: adaptive
PePNet: A Periodicity-Perceived Workload Prediction Network Supporting Rare Occurrence of Heavy Workload
Dynamic Token-Pass Transformers for Semantic Segmentation
A Taxonomy of Adaptive Traffic Signal Control
Towards an Uncertainty-Aware Adaptive Decision Engine for Self-Protecting Software: an POMDP-based Approach
SDDM: Score-Decomposed Diffusion Models on Manifolds for Unpaired Image-to-Image Translation
denoising" part and a content
refinement" part. To refine the image in the same noise level, we equalize the refinement parts of the score function and energy guidance, which permits multi-objective optimization on the manifold. We also leverage the block adaptive instance normalization module to construct manifolds with lower dimensions but still concentrated with the perturbed reference image. SDDM outperforms existing SBDM-based methods with much fewer diffusion steps on several I2I benchmarks.Emo-DNA: Emotion Decoupling and Alignment Learning for Cross-Corpus Speech Emotion Recognition
Balanced Classification: A Unified Framework for Long-Tailed Object Detection
Painterly Image Harmonization using Diffusion Model
MSECNet: Accurate and Robust Normal Estimation for 3D Point Clouds by Multi-Scale Edge Conditioning
Maximum-norm a posteriori error bounds for an extrapolated upwind scheme applied to a singularly perturbed convection-diffusion problem
Communication-Efficient Decentralized Multi-Agent Reinforcement Learning for Cooperative Adaptive Cruise Control
Flexible Differentially Private Vertical Federated Learning with Adaptive Feature Embeddings
Adaptive Preferential Attached kNN Graph With Distribution-Awareness
Discrete-Time Adaptive State Tracking Control Schemes Using Gradient Algorithms
Keyword: quantization
Communication-Efficient Decentralized Multi-Agent Reinforcement Learning for Cooperative Adaptive Cruise Control
RobustMQ: Benchmarking Robustness of Quantized Models