Abstract
External sorting is at the core of many operations in large-scale database systems, such as ordering and aggregation queries for large result sets, building indexes, sort-merge joins, duplicate removal, sharding, and record clustering. Unlike in-memory sorting, these algorithms need to work together with the OS and the filesystem to efficiently utilize system resources and minimize disk I/O. In this paper we describe ELSAR: a parallel external sorting algorithm that uses an innovative paradigm based on a learned data distribution model. The algorithm leverages the model to arrange the input records into mutually exclusive, monotonic, and equi-depth partitions that, once sorted, can simply be concatenated to form the output. This method completely eliminates the need for multi-way file merging, which is typically used in external sorting. We present thorough benchmarks for uniform and skewed datasets on various storage media, where we measure the sorting rates, size scalability, and energy efficiency of ELSAR and other sorting algorithms. We observed that ELSAR has up to 1.65x higher sorting rates than the next-best external sort (Nsort) on SSD drives and 5.31x higher than the GNU coreutils' sort utility on Intel Optane non-volatile memory. In addition, ELSAR surpasses the current winner of the SortBenchmark for the most energy-efficient external string sorting algorithm by an impressive margin of 41%. These results reinforce the premise that novel learning-enhanced algorithms can provide remarkable performance benefits over traditional ones.
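The partition-then-concatenate paradigm admits a compact illustration. Below is a minimal, single-threaded Python sketch (the splitter-based "CDF model", partition count, and sampling here are simplifications; the real system learns a distribution model and operates on external storage in parallel):

```python
import bisect
import random

def build_cdf_model(sample, k):
    """Approximate the key distribution from a sample: k-1 splitters
    define k (roughly) equi-depth partitions."""
    s = sorted(sample)
    return [s[(i * len(s)) // k] for i in range(1, k)]

def learned_sort(records, k=4):
    splitters = build_cdf_model(random.sample(records, min(1000, len(records))), k)
    partitions = [[] for _ in range(k)]
    for r in records:  # one pass: route each record to its monotonic partition
        partitions[bisect.bisect_right(splitters, r)].append(r)
    out = []
    for p in partitions:       # each partition is sorted independently (parallelizable)
        out.extend(sorted(p))  # concatenation yields the output: no multi-way merge
    return out

assert learned_sort([5, 3, 9, 1, 7, 2]) == [1, 2, 3, 5, 7, 9]
```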
$2 * n$ is better than $n^2$: Decomposing Event Coreference Resolution into Two Tractable Problems
Authors: Shafiuddin Rehan Ahmed, Abhijnan Nath, James H. Martin, Nikhil Krishnaswamy
Abstract
Event Coreference Resolution (ECR) is the task of linking mentions of the same event either within or across documents. Most mention pairs are not coreferent, yet many that are coreferent can be identified through simple techniques such as lemma matching of the event triggers or the sentences in which they appear. Existing methods for training coreference systems sample from a largely skewed distribution, making it difficult for the algorithm to learn coreference beyond surface matching. Additionally, these methods are intractable because of the quadratic number of operations required. To address these challenges, we break the problem of ECR into two parts: a) a heuristic to efficiently filter out a large number of non-coreferent pairs, and b) a training approach on a balanced set of coreferent and non-coreferent mention pairs. By following this approach, we show that we get comparable results to the state of the art on two popular ECR datasets while significantly reducing compute requirements. We also analyze the mention pairs that are "hard" to accurately classify as coreferent or non-coreferent. Code at https://github.com/ahmeshaf/lemma_ce_coref
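As an illustration, the filtering heuristic in step (a) can be as simple as the following sketch (the dictionary fields `trigger_lemma` and `sentence_lemmas` are hypothetical stand-ins; the paper's heuristic and thresholds may differ):

```python
from itertools import combinations

def lemma_filter(mentions):
    """Keep only mention pairs whose trigger lemmas match, or whose host
    sentences share a lemma; all other pairs are heuristically treated as
    non-coreferent and never shown to the trained classifier."""
    candidates = []
    for m1, m2 in combinations(mentions, 2):
        if (m1["trigger_lemma"] == m2["trigger_lemma"]
                or set(m1["sentence_lemmas"]) & set(m2["sentence_lemmas"])):
            candidates.append((m1, m2))
    return candidates
```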
DOCTOR: A Multi-Disease Detection Continual Learning Framework Based on Wearable Medical Sensors
Authors: Chia-Hao Li, Niraj K. Jha
Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC); Signal Processing (eess.SP)
Abstract
Modern advances in machine learning (ML) and wearable medical sensors (WMSs) in edge devices have enabled ML-driven disease detection for smart healthcare. Conventional ML-driven disease detection methods rely on customizing individual models for each disease and its corresponding WMS data. However, such methods lack adaptability to distribution shifts and new classification classes and tasks. Also, they need to be rearchitected and retrained from scratch for each new disease. Moreover, installing multiple ML models in an edge device consumes excessive memory, drains the battery faster, and complicates the detection process. To address these challenges, we propose DOCTOR, a multi-disease detection continual learning (CL) framework based on WMSs. It employs a multi-headed deep neural network (DNN) and an exemplar-replay-style CL algorithm. The CL algorithm enables the framework to continually learn new missions where different data distributions, classification classes, and disease detection tasks are introduced sequentially. It counteracts catastrophic forgetting with a data preservation method and a synthetic data generation module. The data preservation method efficiently preserves the most informative subset of training data from previous missions based on the average training loss of each data instance. The synthetic data generation module models the probability distribution of the real training data and then generates as much synthetic data as needed for replays while maintaining data privacy. The multi-headed DNN enables DOCTOR to detect multiple diseases simultaneously based on user WMS data. We demonstrate DOCTOR's efficacy in maintaining high multi-disease classification accuracy with a single DNN model in various CL experiments. DOCTOR achieves very competitive performance across all CL scenarios relative to the ideal joint-training framework while maintaining a small model size.
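A hedged sketch of the data preservation step follows (the function name and the closest-to-mean selection criterion are illustrative assumptions about "most informative based on average training loss", not the paper's exact rule):

```python
import numpy as np

def preserve_exemplars(features, losses, budget):
    """Keep the `budget` training instances whose average training loss is
    closest to the mission mean, i.e. the most representative instances,
    to be replayed when learning later missions."""
    losses = np.asarray(losses)
    ranks = np.argsort(np.abs(losses - losses.mean()))
    keep = ranks[:budget]
    return features[keep], losses[keep]
```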
Graph-Based Reductions for Parametric and Weighted MDPs
Authors: Kasper Engelen, Guillermo A. Pérez, Shrisha Rao
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI)
Abstract
We study the complexity of reductions for weighted reachability in parametric Markov decision processes. That is, we say a state p is never worse than q if, for all valuations of the polynomial indeterminates, the maximal expected weight that can be reached from p is at least as large as the same value from q. In terms of computational complexity, we establish that determining whether p is never worse than q is coETR-complete. On the positive side, we give a polynomial-time algorithm to compute the equivalence classes of the order we study for Markov chains. Additionally, we describe and implement two inference rules to under-approximate the never-worse relation and empirically show that it can be used as an efficient preprocessing step for the analysis of large Markov decision processes.
Abstract
Depth cameras are frequently used in robotic manipulation, e.g. for visual servoing. However, the quality of small and compact depth cameras is often insufficient for depth reconstruction, which is required for precise tracking in and perception of the robot's working space. Building on the work of Shabanov et al. (2021), we present a self-supervised multi-object depth denoising pipeline that uses depth maps from higher-quality sensors as close-to-ground-truth supervisory signals to denoise depth maps coming from a lower-quality sensor. We present a computationally efficient way to align pairs of frames from the two sensors in space and retrieve a frame-based multi-object mask, in order to obtain a clean labeled dataset on which to train a denoising neural network. The implementation of our presented work can be found at https://github.com/alr-internship/self-supervised-depth-denoising.
Stochastic Texture Filtering
Authors: Marcos Fajardo, Bartlomiej Wronski, Marco Salvi, Matt Pharr
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
Abstract
2D texture maps and 3D voxel arrays are widely used to add rich detail to the surfaces and volumes of rendered scenes, and filtered texture lookups are integral to producing high-quality imagery. We show that filtering textures after evaluating lighting, rather than before BSDF evaluation as is current practice, gives a more accurate solution to the rendering equation. These benefits are not merely theoretical, but are apparent in common cases. We further show that stochastically sampling texture filters is crucial for enabling this approach, which has not been possible previously except in limited cases. Stochastic texture filtering offers additional benefits, including efficient implementation of high-quality texture filters and efficient filtering of textures stored in compressed and sparse data structures, including neural representations. We demonstrate applications in both real-time and offline rendering and show that the additional stochastic error is minimal. Furthermore, this error is handled well by either spatiotemporal denoising or moderate pixel sampling rates.
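To make the sampling idea concrete, here is a minimal Python sketch contrasting deterministic filtering with a one-tap stochastic estimate (assuming non-negative filter weights that sum to one, as with bilinear filtering; the names are illustrative, not the paper's implementation):

```python
import random

def filtered_lookup(texels, weights):
    """Deterministic filtering: weighted sum over all taps."""
    return sum(w * t for w, t in zip(weights, texels))

def stochastic_lookup(texels, weights):
    """Stochastic filtering: sample ONE tap with probability equal to its
    weight. Unbiased: E[stochastic_lookup] == filtered_lookup."""
    u, acc = random.random(), 0.0
    for w, t in zip(weights, texels):
        acc += w
        if u < acc:
            return t
    return texels[-1]  # guard against floating-point round-off
```

Because only one texel is touched per lookup, the (possibly nonlinear) shading that follows sees a single texture value, which is what allows filtering to be moved after lighting while keeping the estimator unbiased.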
Sketching the Future (STF): Applying Conditional Control Techniques to Text-to-Video Models
Authors: Rohan Dhesikan, Vignesh Rajmohan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
The proliferation of video content demands efficient and flexible neural network based approaches for generating new video content. In this paper, we propose a novel approach that combines zero-shot text-to-video generation with ControlNet to improve the output of these models. Our method takes multiple sketched frames as input and generates video output that matches the flow of these frames, building upon the Text-to-Video Zero architecture and incorporating ControlNet to enable additional input conditions. By first interpolating frames between the inputted sketches and then running Text-to-Video Zero using the new interpolated frames video as the control technique, we leverage the benefits of both zero-shot text-to-video generation and the robust control provided by ControlNet. Experiments demonstrate that our method excels at producing high-quality and remarkably consistent video content that more accurately aligns with the user's intended motion for the subject within the video. We provide a comprehensive resource package, including a demo video, project website, open-source GitHub repository, and a Colab playground to foster further research and application of our proposed method.
Singularity swapping method for nearly singular integrals based on trapezoidal rule
Authors: Gang Bao, Wenmao Hua, Jun Lai, Jinrui Zhang
Abstract
Accurate evaluation of nearly singular integrals plays an important role in many boundary integral equation based numerical methods. In this paper, we propose a variant of singularity swapping method to accurately evaluate the layer potentials for arbitrarily close targets. Our method is based on the global trapezoidal rule and trigonometric interpolation, resulting in an explicit quadrature formula. The method achieves spectral accuracy for nearly singular integrals on closed analytic curves. In order to extract the singularity from the complexified distance function, an efficient root finding method is proposed based on contour integration. Through the change of variables, we also extend the quadrature method to integrals on the piecewise analytic curves. Numerical examples for Laplace's and Helmholtz equations show that high order accuracy can be achieved for arbitrarily close field evaluation.
Hybrid hyperinterpolation over general regions
Authors: Congpei An, Jia-Shu Ran, Alvise Sommariva
Abstract
We present an $\ell_2^2+\ell_1$-regularized discrete least squares approximation over general regions under assumptions of hyperinterpolation, named hybrid hyperinterpolation. Hybrid hyperinterpolation uses a soft thresholding operator as well as a filter function to shrink the Fourier coefficients of a given continuous function, approximated by a high-order quadrature rule with respect to some orthonormal basis; it is thus a combination of lasso and filtered hyperinterpolations, and it inherits features of both to deal with noisy data once the regularization parameter and filter function are chosen well. We not only provide theoretical $L_2$ error bounds for hybrid hyperinterpolation approximating continuous functions with and without noise, but also decompose the $L_2$ error into three exactly computed terms with the aid of an a priori regularization parameter choice rule. This rule, which makes full use of the hyperinterpolation coefficients to choose a regularization parameter, reveals that $L_2$ errors for hybrid hyperinterpolation sharply decline and then slowly increase as the sparsity of the coefficients ranges from one to large values. Numerical examples show the enhanced performance of hybrid hyperinterpolation as regularization parameters and noise vary. Theoretical $L_2$ error bounds are verified in numerical examples on the interval, the unit disk, the unit sphere, the unit cube, and the union of disks.
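For orientation, the soft-thresholding operator itself is standard; below is that operator together with a hedged sketch of how it combines with a filter $h$ in the hybrid construction (the paper's exact filter, normalization, and quadrature weights should be taken from the text):

```latex
% Soft thresholding with parameter \lambda (the standard lasso shrinkage operator):
S_\lambda(c) = \operatorname{sign}(c)\,\max\{|c| - \lambda,\, 0\}
% Sketch of the hybrid operator: filter and shrink the quadrature-approximated
% coefficients \hat{c}_\ell before re-expanding in the orthonormal basis p_\ell:
\mathcal{H}_n f = \sum_{\ell} h\!\left(\tfrac{\ell}{n}\right) S_\lambda\!\left(\hat{c}_\ell\right) p_\ell
```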
Finding Meaningful Distributions of ML Black-boxes under Forensic Investigation
Authors: Jiyi Zhang, Han Fang, Hwee Kuan Lee, Ee-Chien Chang
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Given a poorly documented neural network model, we take the perspective of a forensic investigator who wants to find out the model's data domain (e.g. whether it was trained on face images or traffic signs). Although existing methods such as membership inference and model inversion can be used to uncover some information about an unknown model, they still require knowledge of the data domain to start with. In this paper, we propose solving this problem by leveraging a comprehensive corpus such as ImageNet to select a meaningful distribution that is close to the original training distribution and leads to high performance in follow-up investigations. The corpus comprises two components: a large dataset of samples, and meta information such as hierarchical structure and textual information on the samples. Our goal is to select a set of samples from the corpus for the given model. The core of our method is an objective function that considers two criteria on the selected samples: the model functional properties (derived from the dataset), and semantics (derived from the metadata). We also give an algorithm to efficiently search the large space of all possible subsets w.r.t. the objective function. Experimental results show that the proposed method is effective. For example, cloning a given model (originally trained with CIFAR-10) by using Caltech 101 can achieve 45.5% accuracy. By using datasets selected by our method, the accuracy is improved to 72.0%.
P4SGD: Programmable Switch Enhanced Model-Parallel Training on Generalized Linear Models on Distributed FPGAs
Authors: Hongjing Huang, Yingtao Li, Jie Sun, Xueying Zhu, Jie Zhang, Liang Luo, Jialin Li, Zeke Wang
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Generalized linear models (GLMs) are a widely utilized family of machine learning models in real-world applications. As data size increases, it is essential to perform efficient distributed training for these models. However, existing systems for distributed training incur high communication costs and often use large batch sizes to balance computation and communication, which negatively affects convergence. Therefore, we argue for an efficient distributed GLM training system that strives to achieve linear scalability, while keeping batch size reasonably low. As a start, we propose P4SGD, a distributed heterogeneous training system that efficiently trains GLMs through model parallelism between distributed FPGAs and through forward-communication-backward pipeline parallelism within an FPGA. Moreover, we propose a light-weight, latency-centric in-switch aggregation protocol to minimize the latency of the AllReduce operation between distributed FPGAs, powered by a programmable switch. As such, to our knowledge, P4SGD is the first solution that achieves almost linear scalability between distributed accelerators through model parallelism. We implement P4SGD on eight Xilinx U280 FPGAs and a Tofino P4 switch. Our experiments show P4SGD converges up to 6.5X faster than the state-of-the-art GPU counterpart.
RNNS: Representation Nearest Neighbor Search Black-Box Attack on Code Models
Authors: Jie Zhang, Wei Ma, Qiang Hu, Xiaofei Xie, Yves Le Traon, Yang Liu
Abstract
Pre-trained code models are mainly evaluated using in-distribution test data. The robustness of models, i.e., the ability to handle hard unseen data, still lacks evaluation. In this paper, we propose a novel search-based black-box adversarial attack guided by model behaviours for pre-trained programming language (PL) models, named Representation Nearest Neighbor Search (RNNS), to evaluate their robustness. Unlike other black-box adversarial attacks, RNNS uses the model-change signal to guide the search in the space of variable names collected from real-world projects. Specifically, RNNS contains two main steps: 1) identifying which variable (the attack position) to attack based on model uncertainty, and 2) searching for which adversarial tokens to use for variable renaming according to the model behaviour observations. We evaluate RNNS on 6 code tasks (e.g., clone detection), 3 programming languages (Java, Python, and C), and 3 pre-trained code models: CodeBERT, GraphCodeBERT, and CodeT5. The results demonstrate that RNNS outperforms the state-of-the-art black-box attacking methods (MHM and ALERT) in terms of attack success rate (ASR) and query times (QT). The perturbation of adversarial examples generated by RNNS is smaller than that of the baselines with respect to the number of replaced variables and the variable length change. Our experiments also show that RNNS is efficient in attacking defended models and is useful for adversarial training.
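A hedged sketch of one RNNS-style search step follows (all of `model.uncertainty`, `code.rename`, `model.predict`, the `embed` function, and the candidate budget are hypothetical stand-ins for the paper's components):

```python
import numpy as np

def rnns_step(model, code, variables, name_bank, name_embs, embed):
    """One illustrative attack step:
    1) pick the variable whose occurrences make the model most uncertain,
    2) try the nearest neighbors of its name in representation space."""
    target = max(variables, key=lambda v: model.uncertainty(code, v))  # step 1
    q = embed(target)
    dists = np.linalg.norm(name_embs - q, axis=1)                      # step 2
    for idx in np.argsort(dists)[:10]:          # closest real-world variable names
        candidate = code.rename(target, name_bank[idx])
        if model.predict(candidate) != model.predict(code):  # behaviour flipped
            return candidate
    return None
```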
Mixture of personality improved Spiking actor network for efficient multi-agent cooperation
Abstract
Adaptive human-agent and agent-agent cooperation are becoming more and more critical in the area of multi-agent reinforcement learning (MARL), where remarkable progress has been made with the help of deep neural networks. However, many established algorithms perform well only during training and generalize poorly when cooperating with unseen partners. The personality theory in cognitive psychology describes how humans handle this cooperation challenge: by first predicting others' personalities and then their complex actions. Inspired by this two-step psychology theory, we propose a biologically plausible mixture of personality (MoP) improved spiking actor network (SAN), whereby a determinantal point process is used to simulate the complex formation and integration of different types of personality in MoP, and dynamic and spiking neurons are incorporated into the SAN for efficient reinforcement learning. The benchmark Overcooked task, which demands strong cooperative cooking, is selected to test the proposed MoP-SAN. The experimental results show that MoP-SAN achieves high performance not only during learning but also in the generalization test (i.e., cooperation with other unseen agents), where most counterpart deep actor networks fail. Necessary ablation experiments and visualization analyses were conducted to explain why MoP and SAN are effective in multi-agent reinforcement learning scenarios while DNNs perform poorly in the generalization test.
Text-guided High-definition Consistency Texture Model
Authors: Zhibin Tang, Tiantong He
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
With the advent of depth-to-image diffusion models, text-guided generation, editing, and transfer of realistic textures are no longer difficult. However, due to the limitations of pre-trained diffusion models, they can only create low-resolution, inconsistent textures. To address this issue, we present the High-definition Consistency Texture Model (HCTM), a novel method that can generate high-definition and consistent textures for 3D meshes according to text prompts. We achieve this by leveraging a pre-trained depth-to-image diffusion model to generate single-viewpoint results based on the text prompt and a depth map. We fine-tune the diffusion model with Parameter-Efficient Fine-Tuning to quickly learn the style of the generated result, and apply the multi-diffusion strategy to produce high-resolution and consistent results from different viewpoints. Furthermore, we propose a strategy that prevents the appearance of noise on the textures caused by backpropagation. Our proposed approach has shown promising results in generating high-definition and consistent textures for 3D meshes, as demonstrated through a series of experiments.
Abstract
Homomorphic encryption is a sophisticated encryption technique that allows computations on encrypted data to be performed without the requirement for decryption. This trait makes homomorphic encryption appropriate for secure computation in sensitive data scenarios, such as cloud computing, medical data exchange, and financial transactions. In homomorphic encryption, the data is encrypted using a public key, and the computation is conducted on the encrypted data using an algorithm that retains the encryption. The computed result is then decrypted with a private key to acquire the final output. This approach protects data while allowing complicated computations to be performed on the encrypted data, resulting in a secure and efficient approach to analysing sensitive information. This article is intended to give a clear idea about the various fully homomorphic encryption (FHE) schemes present in the literature, and to analyse and compare their results. Further, we also cover applications and open-source tools of homomorphic encryption schemes.
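The encrypt-compute-decrypt flow described above is easy to demonstrate with the classic Paillier scheme, which is additively homomorphic (not fully homomorphic); this toy Python sketch uses insecurely small primes purely for illustration:

```python
from math import gcd
import random

# Toy Paillier cryptosystem: additively homomorphic, for illustration only.
p, q = 293, 433                    # real deployments use large random primes
n, n2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lambda = lcm(p-1, q-1)
mu = pow(lam, -1, n)               # with g = n + 1, mu = lambda^{-1} mod n

def encrypt(m):                    # public-key encryption with fresh randomness r
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):                    # private-key decryption: L(c^lam mod n^2) * mu mod n
    return ((pow(c, lam, n2) - 1) // n) * mu % n

c1, c2 = encrypt(20), encrypt(22)
assert decrypt((c1 * c2) % n2) == 42   # product of ciphertexts = sum of plaintexts
```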
Robust multi-agent coordination via evolutionary generation of auxiliary adversarial attackers
Authors: Lei Yuan, Zi-Qian Zhang, Ke Xue, Hao Yin, Feng Chen, Cong Guan, Li-He Li, Chao Qian, Yang Yu
Subjects: Multiagent Systems (cs.MA); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Abstract
Cooperative multi-agent reinforcement learning (CMARL) has shown promise in many real-world applications. Previous works mainly focus on improving coordination ability via solving MARL-specific challenges (e.g., non-stationarity, credit assignment, scalability), but ignore the policy perturbation issue when testing in a different environment. This issue has not been considered in problem formulation or efficient algorithm design. To address this issue, we first model the problem as a limited policy adversary Dec-POMDP (LPA-Dec-POMDP), where some coordinators from a team might accidentally and unpredictably encounter a limited number of malicious action attacks, but the regular coordinators still strive for the intended goal. Then, we propose Robust Multi-Agent Coordination via Evolutionary Generation of Auxiliary Adversarial Attackers (ROMANCE), which exposes the trained policy to diversified and strong auxiliary adversarial attacks during training, thus achieving high robustness under various policy perturbations. Concretely, to avoid the ego-system overfitting to a specific attacker, we maintain a set of attackers, optimized to guarantee both high attack quality and behavioral diversity among the attackers. The goal of quality is to minimize the ego-system coordination effect, and a novel diversity regularizer based on sparse action is applied to diversify the behaviors among attackers. The ego-system is then paired with a population of attackers selected from the maintained attacker set, and alternately trained against the constantly evolving attackers. Extensive experiments on multiple scenarios from SMAC indicate that ROMANCE provides comparable or better robustness and generalization ability than other baselines.
Fast Distributed Inference Serving for Large Language Models
Authors: Bingyang Wu, Yinmin Zhong, Zili Zhang, Gang Huang, Xuanzhe Liu, Xin Jin
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Large language models (LLMs) power a new generation of interactive AI applications exemplified by ChatGPT. The interactive nature of these applications demands low job completion time (JCT) for model inference. Existing LLM serving systems use run-to-completion processing for inference jobs, which suffers from head-of-line blocking and long JCT. We present FastServe, a distributed inference serving system for LLMs. FastServe exploits the autoregressive pattern of LLM inference to enable preemption at the granularity of each output token. FastServe uses preemptive scheduling to minimize JCT with a novel skip-join Multi-Level Feedback Queue scheduler. Based on the new semi-information-agnostic setting of LLM inference, the scheduler leverages the input length information to assign an appropriate initial queue for each arriving job to join. Queues of higher priority than the joined queue are skipped, reducing demotions. We design an efficient GPU memory management mechanism that proactively offloads and uploads intermediate states between GPU memory and host memory for LLM inference. We build a system prototype of FastServe based on NVIDIA FasterTransformer. Experimental results show that compared to the state-of-the-art solution Orca, FastServe improves the average and tail JCT by up to 5.1$\times$ and 6.4$\times$, respectively.
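The skip-join idea can be sketched in a few lines (the quanta, cost model, and function name below are illustrative assumptions; FastServe's scheduler additionally handles demotion, preemption, and memory management):

```python
QUANTA = [1, 2, 4, 8]  # per-queue execution quanta, doubling per priority level

def initial_queue(input_len, time_per_token=1.0):
    """Skip-join assignment: a longer prompt implies a longer first iteration,
    so the job joins the first queue whose quantum covers that iteration,
    skipping higher-priority queues it would immediately be demoted from."""
    first_iter_cost = input_len * time_per_token
    for level, quantum in enumerate(QUANTA):
        if first_iter_cost <= quantum:
            return level
    return len(QUANTA) - 1  # longest prompts start in the lowest-priority queue
```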
Fast Event-based Double Integral for Real-time Robotics
Authors: Shijie Lin, Yingqiang Zhang, Dongyue Huang, Bin Zhou, Xiaowei Luo, Jia Pan
Abstract
Motion deblurring is a critical ill-posed problem that is important in many vision-based robotics applications. The recently proposed event-based double integral (EDI) provides a theoretical framework for solving the deblurring problem with the event camera and generating clear images at high frame rates. However, the original EDI is mainly designed for offline computation and does not meet the real-time requirements of many robotics applications. In this paper, we propose the fast EDI, an efficient implementation of EDI that can achieve real-time online computation on single-core CPU devices, which is common for physical robotic platforms used in practice. In experiments, our method can handle event rates as high as 13 million events per second in a wide variety of challenging lighting conditions. We demonstrate the benefits on multiple downstream real-time applications, including localization, visual tag detection, and feature matching.
Multi-Path Transformer is Better: A Case Study on Neural Machine Translation
Abstract
For years, model performance in machine learning has obeyed a power-law relationship with model size. For parameter efficiency, recent studies focus on increasing model depth rather than width to achieve better performance. In this paper, we study how model width affects the Transformer model through a parameter-efficient multi-path structure. To better fuse features extracted from different paths, we add three additional operations to each sublayer: a normalization at the end of each path, a cheap operation to produce more features, and a learnable weighted mechanism to fuse all features flexibly. Extensive experiments on 12 WMT machine translation tasks show that, with the same number of parameters, the shallower multi-path model can achieve similar or even better performance than the deeper model. This reveals that we should pay more attention to the multi-path structure, and that there should be a balance between model depth and width to train a better large-scale Transformer.
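A hedged PyTorch-style sketch of such a sublayer follows (the particular cheap operation, gating, and path contents are illustrative assumptions; the paper's exact design may differ):

```python
import torch
import torch.nn as nn

class MultiPathSublayer(nn.Module):
    """Several parallel paths, each followed by its own normalization, plus a
    cheap linear op producing extra features and a learnable softmax-weighted
    fusion of all features."""
    def __init__(self, d_model, n_paths, make_path):
        super().__init__()
        self.paths = nn.ModuleList(make_path() for _ in range(n_paths))
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(n_paths))
        self.cheap = nn.Linear(d_model, d_model)        # cheap op: one extra feature set
        self.gate = nn.Parameter(torch.zeros(n_paths + 1))

    def forward(self, x):
        feats = [norm(path(x)) for path, norm in zip(self.paths, self.norms)]
        feats.append(self.cheap(feats[0]))              # extra features from the cheap op
        w = torch.softmax(self.gate, dim=0)             # learnable fusion weights
        return sum(wi * f for wi, f in zip(w, feats))

layer = MultiPathSublayer(512, n_paths=2, make_path=lambda: nn.Linear(512, 512))
out = layer(torch.randn(8, 512))
```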
MDD-Enabled Two-Tier Terahertz Fronthaul in Indoor Industrial Cell-Free Massive MIMO
Authors: Bohan Li, Diego Dupleich, Guoqing Xia, Huiyu Zhou, Yue Zhang, Pei Xiao, Lie-Liang Yang
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
To make indoor industrial cell-free massive multiple-input multiple-output (CF-mMIMO) networks free from wired fronthaul, this paper studies a multicarrier-division duplex (MDD)-enabled two-tier terahertz (THz) fronthaul scheme. More specifically, the two layers of fronthaul links rely on mutually orthogonal subcarrier sets in the same THz band, while access links are implemented over the sub-6 GHz band. The proposed scheme leads to a complicated mixed-integer nonconvex optimization problem incorporating access point (AP) clustering, device selection, the assignment of subcarrier sets between the two fronthaul links, and the resource allocation at both the central processing unit (CPU) and APs. In order to address the formulated problem, we first resort to low-complexity but efficient heuristic methods, thereby relaxing the binary variables. Then, the overall end-to-end rate is obtained by iteratively optimizing the assignment of subcarrier sets and the number of AP clusters. Furthermore, an advanced MDD frame structure consisting of three parallel data streams is tailored for the proposed scheme. Simulation results demonstrate the effectiveness of the proposed dynamic AP clustering approach in dealing with varying network sizes. Moreover, benefiting from the well-designed frame structure, MDD is capable of outperforming TDD in the two-tier fronthaul networks. Additionally, the effect of the THz bandwidth on system performance is analyzed, and it is shown that with sufficient frequency resources, our proposed two-tier fully-wireless fronthaul scheme can achieve a performance comparable to fiber-optic-based systems. Finally, the superiority of the proposed MDD-enabled fronthaul scheme is verified in a practical scenario with realistic ray-tracing simulations.
Improving the performance of classical linear algebra iterative methods via hybrid parallelism
Authors: Pedro J. Martinez-Ferrer, Tufan Arslan, Vicenç Beltran
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Performance (cs.PF)
Abstract
We propose fork-join and task-based hybrid implementations of four classical linear algebra iterative methods (Jacobi, Gauss-Seidel, conjugate gradient and biconjugate gradient stabilised) as well as variations of them. Algorithms are duly documented and the corresponding source code is made publicly available for reproducibility. Both weak and strong scalability benchmarks are conducted to statistically analyse their relative efficiencies. The weak scalability results assert the superiority of a task-based hybrid parallelisation over MPI-only and fork-join hybrid implementations. Indeed, the task-based model is able to achieve speedups up to 25% larger than its MPI-only counterpart, depending on the numerical method and the computational resources used. For strong scalability scenarios, hybrid methods based on tasks remain more efficient with moderate computational resources where data locality does not play an important role. Fork-join hybridisation often yields mixed results and hence does not present a competitive advantage over a much simpler MPI approach.
Safe motion planning with environment uncertainty
Authors: Antony Thomas, Fulvio Mastrogiovanni, Marco Baglietto
Abstract
We present an approach for safe motion planning under robot state and environment (obstacle and landmark location) uncertainties. To this end, we first develop an approach that accounts for the landmark uncertainties during robot localization. Existing planning approaches assume that the landmark locations are well known or are known with little uncertainty. However, this might not be true in practice. Noisy sensors and imperfect motions compound the errors originating from estimates of environment features. Moreover, possible occlusions and dynamic objects in the environment render landmark estimation imperfect. Consequently, not considering this uncertainty can wrongly localize the robot, leading to inefficient plans. Our approach thus incorporates the landmark uncertainty within the Bayes filter estimation framework. We also analyze the effect of considering this uncertainty and delineate the conditions under which it can be ignored. Second, we extend the state of the art by computing an exact expression for the collision probability under Gaussian distributed robot motion, perception and obstacle location uncertainties. We formulate the collision probability as a quadratic form in random variables. Under Gaussian distribution assumptions, an exact expression for the collision probability is thus obtained which is computable in real time. In contrast, existing approaches approximate the collision probability using upper bounds that can lead to overly conservative estimates and thereby suboptimal plans. We demonstrate and evaluate our approach using a theoretical example and simulations. We also present a comparison of our approach to different state-of-the-art methods.
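In symbols, the quadratic-form formulation reads roughly as follows (a sketch consistent with the description above; the paper's exact expression and notation govern):

```latex
% With robot position x ~ N(\mu_x, \Sigma_x) and obstacle position o ~ N(\mu_o, \Sigma_o)
% independent, the relative position is d = x - o ~ N(\mu_x - \mu_o, \Sigma_x + \Sigma_o), so
P(\text{collision}) = P\!\left( d^{\top} d \le r^2 \right),
% i.e. the CDF of a quadratic form in Gaussian random variables (a generalized
% chi-square distribution), for which exact, efficiently computable series
% expressions exist; r denotes the combined robot--obstacle safety radius.
```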
Stochastic Chemical Reaction Networks for MAP Detection in Cellular Receivers
Authors: Bastian Heinlein, Lukas Brand, Malcolm Egan, Maximilian Schäfer, Robert Schober, Sebastian Lotter
Abstract
In order to fully exploit the potential of molecular communication (MC) for intra-body communication, practically implementable cellular receivers are an important long-term goal. A variety of receiver architectures based on chemical reaction networks (CRNs) and gene-regulatory networks (GRNs) have been introduced in the literature, because cells use these concepts to perform computations in nature. However, practical feasibility is still limited by stochastic fluctuations of chemical reactions and long computation times in GRNs. Therefore, in this paper, we propose two receiver designs based on stochastic CRNs, i.e., CRNs that perform computations by exploiting the intrinsic fluctuations of chemical reactions with very low molecule counts. The first CRN builds on a recent result from chemistry that showed how Boltzmann machines (BMs), a commonly used machine learning model, can be implemented with CRNs. We show that BMs with optimal parameter values and their CRN implementations can act as maximum-a-posteriori (MAP) detectors. Furthermore, we show that BMs can be efficiently trained from simulation data to achieve close-to-MAP performance. While this approach yields a fixed CRN once deployed, our second approach based on a manually designed CRN can be trained with pilot symbols even within the cell and thus adapt to changing channel conditions. We extend the literature by showing that practical robust detectors can achieve close-to-MAP performance even without explicit channel knowledge.
The Robustness of Computer Vision Models against Common Corruptions: a Survey
Authors: Shunxin Wang, Raymond Veldhuis, Nicola Strisciuglio
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The performance of computer vision models is susceptible to unexpected changes in input images when deployed in real scenarios. These changes are referred to as common corruptions. While they can hinder the applicability of computer vision models in real-world scenarios, they are not always considered a testbed for model generalization and robustness. In this survey, we present a comprehensive and systematic overview of methods that improve the corruption robustness of computer vision models. Unlike existing surveys that focus on adversarial attacks and label noise, we cover extensively the study of robustness to common corruptions that can occur when deploying computer vision models in practical applications. We describe different types of image corruption and provide the definition of corruption robustness. We then introduce relevant evaluation metrics and benchmark datasets. We categorize methods into four groups. We also cover indirect methods that show improvements in generalization and may improve corruption robustness as a byproduct. We report benchmark results collected from the literature and find that they are not evaluated in a unified manner, making it difficult to compare and analyze them. We thus build a unified benchmark framework to obtain directly comparable results on benchmark datasets. Furthermore, we evaluate relevant backbone networks pre-trained on ImageNet using our framework, providing an overview of the base corruption robustness of existing models to help choose appropriate backbones for computer vision tasks. We identify that developing methods to handle a wide range of corruptions and efficiently learn with limited data and computational resources is crucial for future development. Additionally, we highlight the need for further investigation into the relationship among corruption robustness, OOD generalization, and shortcut learning.
Brain Tumor Detection using Swin Transformers
Authors: Prateek A. Meshram, Suraj Joshi, Devarshi Mahajan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The first MRI scan was done in the year 1978 by researchers at EML Laboratories. As per one estimate, approximately 251,329 people died due to primary cancerous brain and CNS (Central Nervous System) tumors in the year 2020. Various medical professionals have recommended that brain tumor detection at an early stage would help save many lives. Whenever radiologists deal with a brain MRI, they try to diagnose it with the histological subtype, which is quite subjective, and herein lies the major issue. Moreover, in developing countries like India, where there is 1 doctor for every 1,151 people, the need for efficient diagnosis to help radiologists and doctors comes into the picture. In our approach, we aim to solve the problem using Swin Transformers and deep learning to detect, classify, locate, and provide the size of the tumor in a particular MRI scan, which would assist doctors and radiologists in increasing their efficiency. In the end, the clinicians would be able to download the predictions and measures in a PDF (Portable Document Format). Keywords: brain tumor, transformers, classification, medical, deep learning, detection
Abstract
We propose a new signaling scheme for on-chip optical-electrical-optical artificial neural networks that utilizes orthogonal delay division multiplexing and pilot-tone based self-homodyne detection. This scheme offers a more efficient scaling of the optical power budget with increasing network complexity. Our simulations, based on a 220 nm SOI silicon photonics technology, suggest that the network can support 31 x 31 neurons, with 961 links and freely programmable weights, using a single 500 mW optical comb and an SNR of 21.3 dB per neuron. Moreover, it features a low sensitivity to temperature fluctuations, ensuring that it can be operated outside of a laboratory environment. We demonstrate the network's effectiveness in nonlinear equalization tasks by training it to equalize a time-interleaved ADC architecture, achieving an ENOB over 4 over the entire 75 GHz ADC bandwidth. We anticipate that this network architecture will enable broadband and low latency nonlinear signal processing in practical settings such as ultra-broadband data converters and real-time control systems.
Blockwise Principal Component Analysis for monotone missing data imputation and dimensionality reduction
Authors: Tu T. Do, Mai Anh Vu, Hoang Thien Ly, Thu Nguyen, Steven A. Hicks, Michael A. Riegler, Pål Halvorsen, Binh T. Nguyen
Abstract
Monotone missing data is a common problem in data analysis. However, imputation combined with dimensionality reduction can be computationally expensive, especially with the increasing size of datasets. To address this issue, we propose a Blockwise principal component analysis Imputation (BPI) framework for dimensionality reduction and imputation of monotone missing data. The framework conducts Principal Component Analysis (PCA) on the observed part of each monotone block of the data, and then imputes the merged principal components using a chosen imputation technique. BPI can work with various imputation techniques and can significantly reduce imputation time compared to conducting dimensionality reduction after imputation. This makes it a practical and efficient approach for large datasets with monotone missing data. Our experiments validate the improvement in speed. In addition, our experiments also show that while applying MICE imputation directly on the missing data may not yield convergence, applying BPI with MICE on the data may lead to convergence.
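A hedged numpy sketch of the blockwise idea follows (column-mean imputation stands in for MICE or another chosen imputer; `blocks`, the centering, and the component count are illustrative):

```python
import numpy as np

def blockwise_pca_impute(X, blocks, n_comp):
    """Run PCA on the fully observed part of each monotone block, merge the
    per-block principal components, then impute in the reduced space.
    `blocks` lists the column indices belonging to each monotone block."""
    parts = []
    for cols in blocks:
        B = X[:, cols]
        rows = ~np.isnan(B).any(axis=1)           # observed part of this block
        obs = B[rows] - B[rows].mean(axis=0)
        _, _, Vt = np.linalg.svd(obs, full_matrices=False)
        scores = np.full((X.shape[0], n_comp), np.nan)
        scores[rows] = obs @ Vt[:n_comp].T        # project observed rows only
        parts.append(scores)
    Z = np.hstack(parts)                          # merged components, still monotone-missing
    col_means = np.nanmean(Z, axis=0)             # stand-in for MICE or another imputer
    return np.where(np.isnan(Z), col_means, Z)
```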
Correlation visualization under missing values: a comparison between imputation and direct parameter estimation methods
Authors: Nhat-Hao Pham, Khanh-Linh Vo, Mai Anh Vu, Thu Nguyen, Michael A. Riegler, Pål Halvorsen, Binh T. Nguyen
Abstract
Correlation matrix visualization is essential for understanding the relationships between variables in a dataset, but missing data can pose a significant challenge in estimating correlation coefficients. In this paper, we compare the effects of various missing data methods on the correlation plot, focusing on two common missing patterns: random and monotone. We aim to provide practical strategies and recommendations for researchers and practitioners in creating and analyzing the correlation plot. Our experimental results suggest that while imputation is commonly used for missing data, using imputed data for plotting the correlation matrix may lead to a significantly misleading inference of the relation between the features. We recommend using DPER, a direct parameter estimation approach, for plotting the correlation matrix based on its performance in the experiments.
Toward Open Integrated Access and Backhaul with O-RAN
Abstract
Millimeter wave (mmWave) communications have recently been standardized for use in the fifth generation (5G) of cellular networks, fulfilling the promise of multi-gigabit mobile throughput of current and future mobile radio network generations. In this context, the network densification required to overcome the difficult mmWave propagation will result in increased deployment costs. Integrated Access and Backhaul (IAB) has been proposed as an effective means of reducing densification costs by deploying a wireless mesh network of base stations, where backhaul and access transmissions share the same radio technology. However, IAB requires sophisticated control mechanisms to operate efficiently and address the increased complexity. The Open Radio Access Network (RAN) paradigm represents the ideal enabler of RAN intelligent control, but its current specifications are not compatible with IAB. In this work, we discuss the challenges of integrating IAB into the Open RAN ecosystem, detailing the required architectural extensions that will enable dynamic control of 5G IAB networks. We implement the proposed integrated architecture into the first publicly-available Open-RAN-enabled experimental framework, which allows prototyping and testing Open-RAN-based solutions over end-to-end 5G IAB networks. Finally, we validate the framework with both ideal and realistic deployment scenarios, exploiting the large-scale testing capabilities of publicly available experimental platforms.
Compressing neural network by tensor network with exponentially fewer variational parameters
Authors: Yong Qing, Peng-Fei Zhou, Ke Li, Shi-Ju Ran
Abstract
A neural network (NN) designed for challenging machine learning tasks is in general a highly nonlinear mapping that contains massive variational parameters. The high complexity of NNs, if unbounded or unconstrained, might unpredictably cause severe issues including over-fitting, loss of generalization power, and unbearable hardware cost. In this work, we propose a general compression scheme that significantly reduces the variational parameters of NNs by encoding them to multi-layer tensor networks (TNs) that contain exponentially fewer free parameters. Superior compression performance of our scheme is demonstrated on several widely-recognized NNs (FC-2, LeNet-5, and VGG-16) and datasets (MNIST and CIFAR-10), surpassing the state-of-the-art method based on shallow tensor networks. For instance, about 10 million parameters in the three convolutional layers of VGG-16 are compressed into TNs with just $632$ parameters, while the testing accuracy on CIFAR-10 is surprisingly improved from $81.14\%$ with the original NN to $84.36\%$ after compression. Our work suggests the TN as an exceptionally efficient mathematical structure for representing the variational parameters of NNs, one that exploits compressibility far better than simple multi-way arrays.
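The encoding step can be illustrated with a plain tensor-train factorization via sequential truncated SVDs (a generic sketch, not the paper's multi-layer TN construction; the shapes and ranks below are illustrative):

```python
import numpy as np

def tensor_train(W, shape, rank):
    """Encode a weight matrix as a tensor network (tensor train): reshape it
    into a multi-way array and split it with truncated SVDs."""
    T = W.reshape(shape)
    cores, r_prev = [], 1
    for k in range(len(shape) - 1):
        M = T.reshape(r_prev * shape[k], -1)
        U, S, Vt = np.linalg.svd(M, full_matrices=False)
        r = min(rank, len(S))                          # truncate to the target rank
        cores.append(U[:, :r].reshape(r_prev, shape[k], r))
        T = np.diag(S[:r]) @ Vt[:r]                    # carry the remainder forward
        r_prev = r
    cores.append(T.reshape(r_prev, shape[-1], 1))
    return cores

W = np.random.randn(64, 64)
cores = tensor_train(W, (8, 8, 8, 8), rank=4)
print(sum(c.size for c in cores), "TN parameters vs", W.size, "original")
```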
Computation-Efficient Backscatter-Blessed MEC with User Reciprocity
Authors: Bowen Gu, Hao Xie, Dong Li
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
This letter proposes a new user cooperative offloading protocol called user reciprocity in backscatter communication (BackCom)-aided mobile edge computing systems with efficient computation, whose quintessence is that each user can switch alternately between the active and the BackCom modes in different slots, such that in each time slot one user works in the active mode and the other works in the BackCom mode. In particular, the user in the BackCom mode can always use the signal transmitted by the user in the active mode for more data transmission in a spectrum-sharing manner. To evaluate the proposed protocol, a computation efficiency (CE) maximization-based optimization problem is formulated by jointly optimizing power control, time scheduling, reflection coefficient (RC) adjustment, and computing frequency allocation, while satisfying various physical constraints on the maximum energy budget, the computing frequency threshold, the minimum computed bits, and the harvested energy threshold. To solve this non-convex problem, Dinkelbach's method and the quadratic transform are first employed to transform the complex fractional forms into linear ones. Then, an iterative algorithm is designed by decomposing the resulting problem to obtain a suboptimal solution. Closed-form solutions for the transmit power, the RC, and the local computing frequency are provided for more insights. Besides, the analytical performance gain of the reciprocal mode is also derived. Simulation results demonstrate that the proposed scheme outperforms benchmark schemes regarding the CE.
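Dinkelbach's method itself is standard and easy to sketch: it replaces the ratio objective with a sequence of parametric subproblems (the toy grid problem below is purely illustrative; in the paper the subproblem is the decomposed resource-allocation problem):

```python
def dinkelbach(N, D, argmax, tol=1e-8, max_iter=100):
    """Maximize a ratio N(x)/D(x) with D > 0 by iterating on the parametric
    subproblem max_x N(x) - lmbda * D(x), which `argmax(lmbda)` must solve.
    At convergence, lmbda equals the optimal ratio."""
    lmbda = 0.0
    for _ in range(max_iter):
        x = argmax(lmbda)
        if abs(N(x) - lmbda * D(x)) < tol:   # zero gap => lmbda is optimal
            return x, lmbda
        lmbda = N(x) / D(x)                  # update the ratio parameter
    return x, lmbda

# Toy check: maximize (4x - x^2) / (1 + x) over a grid.
grid = [i / 100 for i in range(301)]
N = lambda x: 4 * x - x * x
D = lambda x: 1 + x
argmax = lambda l: max(grid, key=lambda x: N(x) - l * D(x))
x_star, ratio = dinkelbach(N, D, argmax)     # x_star near -1 + sqrt(5) ~ 1.24
```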
Access-Redundancy Tradeoffs in Quantized Linear Computations
Authors: Vinayak Ramkumar, Netanel Raviv, Itzhak Tamo
Abstract
Linear real-valued computations over distributed datasets are common in many applications, most notably as part of machine learning inference. In particular, linear computations which are quantized, i.e., where the coefficients are restricted to a predetermined set of values (such as $\pm 1$), gained increasing interest lately due to their role in efficient, robust, or private machine learning models. Given a dataset to store in a distributed system, we wish to encode it so that all such computations could be conducted by accessing a small number of servers, called the access parameter of the system. Doing so relieves the remaining servers to execute other tasks, and reduces the overall communication in the system. Minimizing the access parameter gives rise to an access-redundancy tradeoff, where smaller access parameter requires more redundancy in the system, and vice versa. In this paper we study this tradeoff, and provide several explicit code constructions based on covering codes in a novel way. While the connection to covering codes has been observed in the past, our results strictly outperform the state-of-the-art, and extend the framework to new families of computations.
Joint Falsification and Fidelity Settings Optimization for Validation of Safety-Critical Systems: A Theoretical Analysis
Abstract
Safety validation is a crucial component in the development and deployment of autonomous systems, such as self-driving vehicles and robotic systems. Ensuring safe operation necessitates extensive testing and verification of control policies, typically conducted in simulation environments. High-fidelity simulators accurately model real-world dynamics but entail high computational costs, limiting their scalability for exhaustive testing. Conversely, low-fidelity simulators offer efficiency but may not capture the intricacies of high-fidelity simulators, potentially yielding false conclusions. We propose a joint falsification and fidelity optimization framework for safety validation of autonomous systems. Our mathematical formulation combines counterexample searches with simulator fidelity improvement, facilitating more efficient exploration of the critical environmental configurations challenging the control system. Our contributions encompass a set of theorems addressing counterexample sensitivity analysis, sample complexity, convergence, the interplay between the outer and inner optimization loops, and regret bound analysis. The proposed joint optimization approach enables a more targeted and efficient testing process, optimizes the use of available computational resources, and enhances confidence in autonomous system safety validation.
Feature Expansion for Graph Neural Networks
Authors: Jiaqi Sun, Lin Zhang, Guangyi Chen, Kun Zhang, Peng XU, Yujiu Yang
Abstract
Graph neural networks aim to learn representations for graph-structured data and show impressive performance, particularly in node classification. Recently, many methods have studied the representations of GNNs from the perspective of optimization goals and spectral graph theory. However, the feature space that dominates representation learning has not been systematically studied in graph neural networks. In this paper, we propose to fill this gap by analyzing the feature space of both spatial and spectral models. We decompose graph neural networks into determined feature spaces and trainable weights, providing the convenience of studying the feature space explicitly using matrix space analysis. In particular, we theoretically find that the feature space tends to be linearly correlated due to repeated aggregations. Motivated by these findings, we propose 1) feature subspaces flattening and 2) structural principal components to expand the feature space. Extensive experiments verify the effectiveness of our proposed more comprehensive feature space, with comparable inference time to the baseline, and demonstrate its efficient convergence capability.
Abstract
Locating a specific mobile application screen from existing repositories is restricted to basic keyword searches, such as Google Image Search, or necessitates a complete query screen image, as in the case of Swire. However, interactive partial sketch-based solutions like PSDoodle have limitations, including inaccuracy and an inability to consider text appearing on the screen. A potentially effective solution involves implementing a system that provides interactive partial sketching functionality for efficiently structuring user interface elements and incorporates text queries to enhance its capabilities further. Our approach, TpD, represents the pioneering effort to enable an iterative search of screens by combining interactive sketching and keyword search techniques. TpD is built on a combination of the Rico repository of approximately 58k Android app screens and PSDoodle. Our evaluation with third-party software developers showed that TpD provides higher top-10 screen retrieval accuracy than the state-of-the-art Swire and requires less time to complete a query than other interactive solutions.
Concentric Tube Robot Redundancy Resolution via Velocity/Compliance Manipulability Optimization
Abstract
Concentric Tube Robots (CTR) have the potential to enable effective minimally invasive surgeries. While extensive modeling and control schemes have been proposed in the past decade, limited efforts have been made to improve the trajectory tracking performance from the perspective of manipulability, which can be critical to generating safe motion and feasible actuator commands. In this paper, we propose a gradient-based redundancy resolution framework that optimizes velocity/compliance manipulability-based performance indices during trajectory tracking for a kinematically redundant CTR. We efficiently calculate the gradients of manipulabilities by propagating the first- and second-order derivatives of the state variables of the Cosserat rod model along the CTR arc length, reducing the gradient computation time by 68% compared to the finite difference method. Task-specific performance indices are optimized by projecting the gradient into the null space of trajectory tracking. The proposed method is validated in three exemplary scenarios that involve trajectory tracking, obstacle avoidance, and external load compensation, respectively. Simulation results show that the proposed method is able to accomplish the required tasks while commonly used redundancy resolution approaches underperform or even fail.
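The projection step is the classical gradient-projection scheme for redundant manipulators; a minimal numpy sketch (here `grad_H` would be the manipulability-index gradient that the paper computes by propagating Cosserat-rod derivatives, and the gain `k` is illustrative):

```python
import numpy as np

def redundancy_resolution(J, x_dot_des, grad_H, k=1.0):
    """Classic gradient-projection step: track the task velocity with the
    pseudoinverse while pushing a secondary index H uphill inside the task
    null space. J is the task Jacobian, x_dot_des the desired task velocity."""
    J_pinv = np.linalg.pinv(J)
    N = np.eye(J.shape[1]) - J_pinv @ J          # null-space projector of the task
    return J_pinv @ x_dot_des + k * (N @ grad_H) # joint rates: tracking + optimization
```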
Privacy-Preserving Prompt Tuning for Large Language Model Services
Authors: Yansong Li, Zhixing Tan, Yang Liu
Subjects: Computation and Language (cs.CL); Cryptography and Security (cs.CR)
Abstract
Prompt tuning provides an efficient way for users to customize Large Language Models (LLMs) with their private data in the emerging LLM service scenario. However, the sensitive nature of private data brings the need for privacy preservation in LLM service customization. Based on prompt tuning, we propose Privacy-Preserving Prompt Tuning (RAPT), a framework that provides privacy guarantees for LLM services. RAPT adopts a local privacy setting, allowing users to privatize their data locally with local differential privacy. As prompt tuning performs poorly when directly trained on privatized data, we introduce a novel privatized token reconstruction task that is trained jointly with the downstream task, allowing LLMs to learn better task-dependent representations. Despite the simplicity of our framework, experiments show that RAPT achieves competitive performance across tasks while providing privacy guarantees against adversaries.
A Joint Python/C++ Library for Efficient yet Accessible Black-Box and Gray-Box Optimization with GOMEA
Authors: Anton Bouter, Peter A.N. Bosman
Subjects: Neural and Evolutionary Computing (cs.NE)
Abstract
Exploiting knowledge about the structure of a problem can greatly benefit the efficiency and scalability of an Evolutionary Algorithm (EA). Model-Based EAs (MBEAs) are capable of doing this by explicitly modeling the problem structure. The Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA) is among the state-of-the-art of MBEAs due to its use of a linkage model and the optimal mixing variation operator. Especially in a Gray-Box Optimization (GBO) setting that allows for partial evaluations, i.e., the relatively efficient evaluation of a partial modification of a solution, GOMEA is known to excel. Such GBO settings are known to exist in various real-world applications to which GOMEA has successfully been applied. In this work, we introduce the GOMEA library, making existing GOMEA code in C++ accessible through Python, which serves as a centralized way of maintaining and distributing code of GOMEA for various optimization domains. Moreover, it allows for the straightforward definition of Black-Box Optimization (BBO) as well as GBO fitness functions within Python, which are called from the C++ optimization code for each required (partial) evaluation. We describe the structure of the GOMEA library and how it can be used, and we show its performance in both GBO and BBO settings.
Embedded Feature Correlation Optimization with Specific Parameter Initialization for 2D/3D Registration
Authors: Minheng Chen, Zhirun Zhang, Shuheng Gu, Youyong Kong
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
We present a novel deep learning-based framework: Embedded Feature Correlation Optimization with Specific Parameter Initialization (COSPI) for 2D/3D registration, one of the most challenging problems in the field due to difficulties such as dimensional mismatch, heavy computational load, and the lack of a gold evaluation standard. The framework we designed includes a parameter specification module to efficiently choose the initialization pose parameter and a fine-registration network to align images. The proposed framework takes extracting multi-scale features into consideration, using a novel composite connection encoder with special training techniques. The method is compared with both learning-based methods and optimization-based methods to further evaluate its performance. Our experiments demonstrate that the method in this paper improves registration performance, thereby outperforming existing methods in terms of accuracy and running time. We also show the potential of the proposed method as an initial pose estimator.
Uncertainty Quantification of a Wind Tunnel-Informed Stochastic Wind Load Model for Wind Engineering Applications
Authors: Thays Guerra Araujo Duarte, Srinivasan Arunachalam, Arthriya Subgranon, Seymour M J Spence
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Abstract
The simulation of stochastic wind loads is necessary for many applications in wind engineering. The proper orthogonal decomposition (POD)-based spectral representation method is a popular approach used for this purpose due to its computational efficiency. For general wind directions and building configurations, the data-driven POD-based stochastic model is an alternative that uses wind tunnel smoothed auto- and cross-spectral density as input to calibrate the eigenvalues and eigenvectors of the target load process. Even though this method is straightforward and presents advantages compared to using empirical target auto- and cross-spectral density, the limitations and errors associated with this model have not been investigated. To this end, an extensive experimental study on a rectangular building model considering multiple wind directions and configurations was conducted to allow the quantification of uncertainty related to the use of wind tunnel data for calibration and validation of the data-driven POD-based stochastic model. Errors associated with the use of typical wind tunnel records for model calibration, the model itself, and the truncation of modes were quantified. Results demonstrate that the data-driven model can efficiently simulate stochastic wind loads with negligible model errors, while the errors associated with calibration to typical wind tunnel data can be important.
Pseudo-reversing and its application for multiscaling of manifold-valued data
Abstract
The well-known Wiener's lemma is a valuable statement in harmonic analysis: in the Banach space of functions with absolutely convergent Fourier series, the lemma gives a sufficient condition for the existence of a pointwise multiplicative inverse. We call functions that admit such an inverse \emph{reversible}. In this paper, we introduce a simple and efficient method for approximating the inverses of functions, not necessarily reversible ones, with elements from the space. We term this process \emph{pseudo-reversing}. In addition, we define a condition number to measure the reversibility of functions and study reversibility under pseudo-reversing. We then exploit pseudo-reversing to construct a multiscale pyramid transform based on a refinement operator and its pseudo-reverse for analyzing real and manifold-valued data. Finally, we present the properties of the resulting multiscale methods and numerically illustrate different aspects of pseudo-reversing, including applications of the resulting multiscale transform to data compression and contrast enhancement of manifold-valued sequences.
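One plausible way to picture pseudo-reversing numerically (an illustrative reading, not the paper's construction) is a regularized pointwise reciprocal: where $|f|$ is large the approximation behaves like $1/f$, and where $f$ nearly vanishes it stays bounded, so its Fourier coefficients remain well behaved.

```python
import numpy as np

def pseudo_reverse_samples(f_vals, eps=1e-2):
    # Tikhonov-regularized pointwise reciprocal: close to 1/f where |f|
    # is large, bounded where f nearly vanishes (a non-reversible case).
    return np.conj(f_vals) / (np.abs(f_vals) ** 2 + eps)

# Sample a 2*pi-periodic function that vanishes at t = pi (not reversible).
t = np.linspace(0.0, 2.0 * np.pi, 512, endpoint=False)
f = 1.0 + np.cos(t)                       # zero at t = pi
g = pseudo_reverse_samples(f, eps=1e-2)

# Fourier coefficients of the approximate inverse; absolute summability
# here plays the role of membership in the Wiener algebra.
coeffs = np.fft.fft(g) / t.size
print("residual away from the zero:",
      np.max(np.abs(f * g - 1.0)[np.abs(f) > 0.5]))
```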
Optimal Eventual Byzantine Agreement Protocols with Omission Failures
Authors: Kaya Alpturer, Joseph Y. Halpern, Ron van der Meyden
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Work on \emph{optimal} protocols for \emph{Eventual Byzantine Agreement} (EBA) -- protocols that, in a precise sense, decide as soon as possible in every run and guarantee that all nonfaulty agents decide on the same value -- has focused on \emph{full-information protocols} (FIPs), where agents repeatedly send messages that completely describe their past observations to every other agent. While it can be shown that, without loss of generality, we can take an optimal protocol to be an FIP, full information exchange is impractical to implement for many applications due to the required message size. We separate protocols into two parts, the \emph{information-exchange protocol} and the \emph{action protocol}, so as to be able to examine the effects of more limited information exchange. We then define a notion of optimality with respect to an information-exchange protocol. Roughly speaking, an action protocol $P$ is optimal with respect to an information-exchange protocol $\mathcal{E}$ if, with $P$, agents decide as soon as possible among action protocols that exchange information according to $\mathcal{E}$. We present a knowledge-based EBA program for omission failures all of whose implementations are guaranteed to be correct and are optimal if the information exchange satisfies a certain safety condition. We then construct concrete programs that implement this knowledge-based program in two settings of interest that are shown to satisfy the safety condition. Finally, we show that a small modification of our program results in an FIP that is both optimal and efficiently implementable, settling an open problem posed by Halpern, Moses, and Waarts (SIAM J. Comput., 2001).
FedPDD: A Privacy-preserving Double Distillation Framework for Cross-silo Federated Recommendation
Authors: Sheng Wan, Dashan Gao, Hanlin Gu, Daning Hu
Subjects: Information Retrieval (cs.IR); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Abstract
Cross-platform recommendation aims to improve recommendation accuracy by gathering heterogeneous features from different platforms. However, such cross-silo collaborations between platforms are restricted by increasingly stringent privacy protection regulations; thus, data cannot be aggregated for training. Federated learning (FL) is a practical solution to the data silo problem in recommendation scenarios. Existing cross-silo FL methods transmit model information to collaboratively build a global model by leveraging the data of overlapped users. However, in reality, the number of overlapped users is often very small, largely limiting the performance of such approaches. Moreover, transmitting model information during training incurs high communication costs and may cause serious privacy leakage. In this paper, we propose a novel privacy-preserving double distillation framework named FedPDD for cross-silo federated recommendation, which efficiently transfers knowledge when overlapped users are limited. Specifically, our double distillation strategy enables local models to learn not only explicit knowledge from the other party but also implicit knowledge from their own past predictions. Moreover, to ensure privacy and high efficiency, we employ an offline training scheme to reduce communication needs and privacy leakage risk. In addition, we adopt differential privacy to further protect the transmitted information. Experiments on two real-world recommendation datasets, HetRec-MovieLens and Criteo, demonstrate the effectiveness of FedPDD compared to state-of-the-art approaches.
Vertical Federated Learning over Cloud-RAN: Convergence Analysis and System Optimization
Abstract
Vertical federated learning (FL) is a collaborative machine learning framework that enables devices to learn a global model from feature-partitioned datasets without sharing local raw data. However, as the number of local intermediate outputs is proportional to the number of training samples, it is critical to develop communication-efficient techniques for wireless vertical FL to support high-dimensional model aggregation with full device participation. In this paper, we propose a novel cloud radio access network (Cloud-RAN) based vertical FL system that enables fast and accurate model aggregation by leveraging over-the-air computation (AirComp) and alleviates the communication straggler issue with cooperative model aggregation among geographically distributed edge servers. However, the model aggregation errors caused by AirComp and the quantization errors caused by the limited fronthaul capacity degrade the learning performance of vertical FL. To address these issues, we characterize the convergence behavior of the vertical FL algorithm, considering both uplink and downlink transmissions. To improve the learning performance, we establish a system optimization framework based on joint transceiver and fronthaul quantization design, for which system optimization algorithms based on successive convex approximation and alternate convex search are developed. We conduct extensive simulations to demonstrate the effectiveness of the proposed system architecture and optimization framework for vertical FL.
SoGAR: Self-supervised Spatiotemporal Attention-based Social Group Activity Recognition
Authors: Naga VS Raviteja Chappa, Pha Nguyen, Alexander H Nelson, Han-Seok Seo, Xin Li, Page Daniel Dobbs, Khoa Luu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This paper introduces a novel approach to Social Group Activity Recognition (SoGAR) using a self-supervised transformer network that can effectively utilize unlabeled video data. To extract spatio-temporal information, we create local and global views with varying frame rates. Our self-supervised objective ensures that features extracted from contrasting views of the same video are consistent across spatio-temporal domains. The proposed approach uses transformer-based encoders efficiently, alleviating the weakly supervised setting of group activity recognition. By leveraging the benefits of transformer models, our approach can model long-term relationships along spatio-temporal dimensions. Our proposed SoGAR method achieves state-of-the-art results on three group activity recognition benchmarks, namely the JRDB-PAR, NBA, and Volleyball datasets, surpassing the previous state of the art in terms of the F1-score, MCA, and MPCA metrics.
Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception
Authors: Hassan Akbari, Dan Kondratyuk, Yin Cui, Rachel Hornung, Huisheng Wang, Hartwig Adam
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Image and Video Processing (eess.IV)
Abstract
We present Integrated Multimodal Perception (IMP), a simple and scalable multimodal multi-task training and modeling approach. IMP integrates multimodal inputs including image, video, text, and audio into a single Transformer encoder with minimal modality-specific components. IMP makes use of a novel design that combines Alternating Gradient Descent (AGD) and Mixture-of-Experts (MoE) for efficient model \& task scaling. We conduct extensive empirical studies of IMP and reveal the following key insights: 1) performing gradient descent updates by alternating on diverse heterogeneous modalities, loss functions, and tasks, while also varying input resolutions, efficiently improves multimodal understanding; 2) model sparsification with MoE on a single modality-agnostic encoder substantially improves the performance, outperforming dense models that use modality-specific encoders or additional fusion layers and greatly mitigating the conflicts between modalities. IMP achieves competitive performance on a wide range of downstream tasks including image classification, video classification, image-text, and video-text retrieval. Most notably, we train a sparse IMP-MoE-L focusing on video tasks that achieves a new state of the art in zero-shot video classification: 77.0% on Kinetics-400, 76.8% on Kinetics-600, and 76.8% on Kinetics-700, improving the previous state of the art by +5%, +6.7%, and +5.8%, respectively, while using only 15% of their total training computational cost.
Korean Named Entity Recognition Based on Language-Specific Features
Abstract
In this paper, we propose a novel way of improving named entity recognition in the Korean language using its language-specific features. While the field of named entity recognition has been studied extensively in recent years, the mechanism of efficiently recognizing named entities in Korean has hardly been explored. This is because the Korean language has distinct linguistic properties that prevent models from achieving their best performance. We therefore propose an annotation scheme for Korean corpora that adopts the CoNLL-U format, decomposing Korean words into morphemes and reducing the ambiguity of named entities in the original segmentation, which may contain functional morphemes such as postpositions and particles. We investigate how named entity tags are best represented in this morpheme-based scheme and implement an algorithm to convert word-based and syllable-based Korean corpora with named entities into the proposed morpheme-based format. Analyses of the results of statistical and neural models reveal that the proposed morpheme-based format is feasible, and we demonstrate how the models' performance varies under the influence of various additional language-specific features. Extrinsic conditions were also considered to observe the variance in the performance of the proposed models given different types of data, including the original segmentation and different tagging formats.
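As a rough illustration of the word-to-morpheme tag conversion described above, the sketch below re-anchors a word-level entity tag on morphemes and leaves a functional morpheme outside the span. The segmentation, tag set, and the rule for functional morphemes are hypothetical examples, not drawn from the paper's corpora or algorithm.

```python
def to_morpheme_rows(morphemes, tag, functional):
    # Re-anchor a word-level entity tag on the word's morphemes,
    # tagging functional morphemes (postpositions, particles) as O.
    rows, began = [], False
    for m in morphemes:
        if tag == "O" or m in functional:
            rows.append((m, "O"))
        else:
            rows.append((m, ("I-" if began else "B-") + tag))
            began = True
    return rows

# Hypothetical example: "서울에" = "서울" (Seoul) + locative particle "에".
for morpheme, label in to_morpheme_rows(["서울", "에"], "LOC", {"에"}):
    print(morpheme, label)
# 서울 B-LOC
# 에 O
```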
Prior Global Search Stability on Finite Graphs with Uncertainty. May Greedy Search Win?
Authors: Andrey Ananev, Aleksey Khlyupin
Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
Abstract
This research paper addresses the stability of search algorithms in complex networks when dealing with incomplete information or uncertainty. We propose a theoretical model to investigate whether a global search algorithm with incomplete prior information can be outperformed by a stochastic greedy search on average. The model incorporates random variables to perturb edge weights in the graph, thus capturing the uncertainty of available information. Our findings indicate that some graphs and uncertainty model parameters exist where the global search algorithm fails under uncertainty conditions, while the random greedy search performs better. We derive a critical curve that separates stable from unstable graphs for global search with incomplete information. Interestingly, the critical curve's behavior changes from monotonic to bell-shaped depending on the uncertainty parameters. We test our proposed model through numerical simulations on various synthetic and real-world graphs with different structures. Our results offer insights into the design and optimization of search algorithms for network-based applications, such as communication networks, social networks, and biological networks. We also discuss the study of memory and associative learning in miniature insects, highlighting the potential of efficient search and walking strategies for small robots or devices that operate in a limited area in space.
Generalized Stratified Sampling for Efficient Reliability Assessment of Structures Against Natural Hazards
Abstract
Performance-based engineering for natural hazards facilitates the design and appraisal of structures with rigorous evaluation of their uncertain structural behavior under potentially extreme stochastic loads expressed in terms of failure probabilities against stated criteria. As a result, efficient stochastic simulation schemes are central to computational frameworks that aim to estimate failure probabilities associated with multiple limit states using limited sample sets. In this work, a generalized stratified sampling scheme is proposed in which two phases of sampling are involved: the first is devoted to the generation of strata-wise samples and the estimation of strata probabilities whereas the second aims at the estimation of strata-wise failure probabilities. Phase-I sampling enables the selection of a generalized stratification variable (i.e., not necessarily belonging to the input set of random variables) for which the probability distribution is not known a priori. To improve the efficiency, Markov Chain Monte Carlo Phase-I sampling is proposed when Monte Carlo simulation is deemed infeasible and optimal Phase-II sampling is implemented based on user-specified target coefficients of variation for the limit states of interest. The expressions for these coefficients are derived with due regard to the sample correlations induced by the Markov chains and the uncertainty in the estimated strata probabilities. The proposed stochastic simulation scheme reaps the benefits of near-optimal stratified sampling for a broader choice of stratification variables in high-dimensional reliability problems with a mechanism to approximately control the accuracy of the failure probability estimators. The practicality of the scheme is demonstrated using two examples involving the estimation of failure probabilities associated with highly nonlinear responses induced by wind and seismic excitations.
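The two-phase structure can be illustrated with a toy estimator. In the sketch below, plain Monte Carlo stands in for the paper's MCMC Phase-I sampling and equal allocation stands in for its optimal Phase-II allocation; the stratification variable is a function of the input rather than an input variable itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def two_phase_pf(limit_state, sample_input, stratum_of, n_strata,
                 n_phase1=200_000):
    """Toy two-phase stratified estimator of a failure probability."""
    # Phase I: estimate the strata probabilities and bank the samples.
    x = sample_input(n_phase1)
    strata = stratum_of(x)
    p_strata = np.bincount(strata, minlength=n_strata) / n_phase1

    # Phase II: strata-wise failure probabilities from the banked samples.
    pf_strata = np.zeros(n_strata)
    for i in range(n_strata):
        bank = x[strata == i]
        if bank.size:
            pf_strata[i] = np.mean(limit_state(bank) < 0)

    # Total failure probability: sum_i P(stratum i) * P(failure | i).
    return float(np.sum(p_strata * pf_strata))

pf = two_phase_pf(
    limit_state=lambda x: 2.5 - x,                   # failure when x > 2.5
    sample_input=lambda n: rng.standard_normal(n),
    stratum_of=lambda x: np.minimum(np.abs(x).astype(int), 3),
    n_strata=4,
)
print(pf)   # close to P(X > 2.5), roughly 0.0062
```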
Non-Euclidean Motion Planning with Graphs of Geodesically-Convex Sets
Authors: Thomas Cohn, Mark Petersen, Max Simchowitz, Russ Tedrake
Abstract
Computing optimal, collision-free trajectories for high-dimensional systems is a challenging problem. Sampling-based planners struggle with the dimensionality, whereas trajectory optimizers may get stuck in local minima due to inherent nonconvexities in the optimization landscape. The use of mixed-integer programming to encapsulate these nonconvexities and find globally optimal trajectories has recently shown great promise, thanks in part to tight convex relaxations and efficient approximation strategies that greatly reduce runtimes. These approaches were previously limited to Euclidean configuration spaces, precluding their use with mobile bases or continuous revolute joints. In this paper, we handle such scenarios by modeling configuration spaces as Riemannian manifolds, and we describe a reduction procedure for the zero-curvature case to a mixed-integer convex optimization problem. We demonstrate our results on various robot platforms, including producing efficient collision-free trajectories for a PR2 bimanual mobile manipulator.
RECKONING: Reasoning through Dynamic Knowledge Encoding
Authors: Zeming Chen, Gail Weiss, Eric Mitchell, Asli Celikyilmaz, Antoine Bosselut
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
Recent studies on transformer-based language models show that they can answer questions by reasoning over knowledge provided as part of the context (i.e., in-context reasoning). However, since the available knowledge is often not filtered for a particular question, in-context reasoning can be sensitive to distractor facts, additional content that is irrelevant to a question but that may be relevant for a different question (i.e., not necessarily random noise). In these situations, the model fails to distinguish the knowledge that is necessary to answer the question, leading to spurious reasoning and degraded performance. This reasoning failure contrasts with the model's apparent ability to distinguish its contextual knowledge from all the knowledge it has memorized during pre-training. Following this observation, we propose teaching the model to reason more robustly by folding the provided contextual knowledge into the model's parameters before presenting it with a question. Our method, RECKONING, is a bi-level learning algorithm that teaches language models to reason by updating their parametric knowledge through back-propagation, allowing them to then answer questions using the updated parameters. During training, the inner loop rapidly adapts a copy of the model weights to encode contextual knowledge into its parameters. In the outer loop, the model learns to use the updated weights to reproduce and answer reasoning questions about the memorized knowledge. Our experiments on two multi-hop reasoning datasets show that RECKONING's performance improves over the in-context reasoning baseline (by up to 4.5%). We also find that compared to in-context reasoning, RECKONING generalizes better to longer reasoning chains unseen during training, is more robust to distractors in the context, and is more computationally efficient when multiple questions are asked about the same knowledge.
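A minimal bi-level sketch of this idea (illustrative; not the paper's implementation) is shown below in PyTorch: the inner loop folds "knowledge" into a copy of the weights, and the outer loss backpropagates through that update. The tiny linear model and MSE losses are placeholders for the language model and its objectives.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(8, 8)
inner_lr = 0.1
outer_opt = torch.optim.Adam(model.parameters(), lr=1e-3)

knowledge_x = torch.randn(4, 8)   # stands in for contextual facts
question_x = torch.randn(2, 8)    # stands in for questions
answer_y = torch.randn(2, 8)      # stands in for target answers

for step in range(100):
    # Inner loop: one gradient step on a knowledge-encoding loss, keeping
    # the graph so the outer loss can differentiate through the update.
    params = dict(model.named_parameters())
    inner_loss = torch.nn.functional.mse_loss(
        torch.func.functional_call(model, params, (knowledge_x,)),
        knowledge_x)
    grads = torch.autograd.grad(inner_loss, list(params.values()),
                                create_graph=True)
    fast = {k: v - inner_lr * g
            for (k, v), g in zip(params.items(), grads)}

    # Outer loop: answer questions with the updated ("fast") weights.
    outer_loss = torch.nn.functional.mse_loss(
        torch.func.functional_call(model, fast, (question_x,)), answer_y)
    outer_opt.zero_grad()
    outer_loss.backward()
    outer_opt.step()
```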
Keyword: faster
DOCTOR: A Multi-Disease Detection Continual Learning Framework Based on Wearable Medical Sensors
Authors: Chia-Hao Li, Niraj K. Jha
Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC); Signal Processing (eess.SP)
Abstract
Modern advances in machine learning (ML) and wearable medical sensors (WMSs) in edge devices have enabled ML-driven disease detection for smart healthcare. Conventional ML-driven disease detection methods rely on customizing individual models for each disease and its corresponding WMS data. However, such methods lack adaptability to distribution shifts and new task classification classes. Also, they need to be rearchitected and retrained from scratch for each new disease. Moreover, installing multiple ML models in an edge device consumes excessive memory, drains the battery faster, and complicates the detection process. To address these challenges, we propose DOCTOR, a multi-disease detection continual learning (CL) framework based on WMSs. It employs a multi-headed deep neural network (DNN) and an exemplar-replay-style CL algorithm. The CL algorithm enables the framework to continually learn new missions where different data distributions, classification classes, and disease detection tasks are introduced sequentially. It counteracts catastrophic forgetting with a data preservation method and a synthetic data generation module. The data preservation method efficiently preserves the most informative subset of training data from previous missions based on the average training loss of each data instance. The synthetic data generation module models the probability distribution of the real training data and then generates as much synthetic data as needed for replays while maintaining data privacy. The multi-headed DNN enables DOCTOR to detect multiple diseases simultaneously based on user WMS data. We demonstrate DOCTOR's efficacy in maintaining high multi-disease classification accuracy with a single DNN model in various CL experiments. DOCTOR achieves very competitive performance across all CL scenarios relative to the ideal joint-training framework while maintaining a small model size.
Universal Matrix Sparsifiers and Fast Deterministic Algorithms for Linear Algebra
Authors: Rajarshi Bhattacharjee, Gregory Dexter, Cameron Musco, Archan Ray, David P Woodruff
Subjects: Data Structures and Algorithms (cs.DS); Numerical Analysis (math.NA)
Abstract
Given $\mathbf A \in \mathbb{R}^{n \times n}$ with entries bounded in magnitude by $1$, it is well-known that if $S \subset [n] \times [n]$ is a uniformly random subset of $s = \tilde{O}(n/\epsilon^2)$ entries, and if ${\mathbf A}_S$ equals $\mathbf A$ on the entries in $S$ and is zero elsewhere, then $|\mathbf A - \frac{n^2}{s} \cdot {\mathbf A}_S|_2 \le \epsilon n$ with high probability, where $|\cdot|_2$ is the spectral norm. We show that for positive semidefinite (PSD) matrices, no randomness is needed at all in this statement. Namely, there exists a fixed subset $S$ of $\tilde{O}(n/\epsilon^2)$ entries that acts as a universal sparsifier: the above error bound holds simultaneously for every bounded-entry PSD matrix $\mathbf A \in \mathbb{R}^{n \times n}$. One can view this result as a significant extension of a Ramanujan expander graph, which sparsifies any bounded-entry PSD matrix, not just the all-ones matrix. We leverage the existence of such universal sparsifiers to give the first deterministic algorithms for several central problems related to singular value computation that run in faster than matrix multiplication time. We also prove universal sparsification bounds for non-PSD matrices, showing that $\tilde{O}(n/\epsilon^4)$ entries suffice to achieve error $\epsilon \cdot \max(n, |\mathbf A|_1)$, where $|\mathbf A|_1$ is the trace norm. We prove that this is optimal up to an $\tilde{O}(1/\epsilon^2)$ factor. Finally, we give an improved deterministic spectral approximation algorithm for PSD $\mathbf A$ with entries lying in $\{-1,0,1\}$, which we show is nearly information-theoretically optimal.
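The randomized baseline in the first sentence is easy to check empirically. The sketch below samples entries with replacement for simplicity and compares the rescaled spectral error against $\epsilon n$; the paper's contribution, the deterministic choice of $S$, is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Randomized sparsification baseline: sample s entries of a bounded-entry
# PSD matrix, rescale by n^2/s, and compare spectral norms.
n, eps = 400, 0.5
B = rng.uniform(-1.0, 1.0, (n, 20))
A = B @ B.T
A /= np.abs(A).max()                 # PSD, entries bounded by 1

s = int(4 * n / eps**2)              # on the order of n / eps^2 entries
rows = rng.integers(0, n, s)
cols = rng.integers(0, n, s)
A_s = np.zeros_like(A)
A_s[rows, cols] = A[rows, cols]      # keep sampled entries, zero the rest

err = np.linalg.norm(A - (n**2 / s) * A_s, 2)
print(f"spectral error {err:.1f} vs eps*n = {eps * n:.1f}")
```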
P4SGD: Programmable Switch Enhanced Model-Parallel Training on Generalized Linear Models on Distributed FPGAs
Authors: Hongjing Huang, Yingtao Li, Jie Sun, Xueying Zhu, Jie Zhang, Liang Luo, Jialin Li, Zeke Wang
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Generalized linear models (GLMs) are a widely utilized family of machine learning models in real-world applications. As data sizes increase, it is essential to perform efficient distributed training for these models. However, existing systems for distributed training incur high communication costs and often use large batch sizes to balance computation and communication, which negatively affects convergence. Therefore, we argue for an efficient distributed GLM training system that strives to achieve linear scalability while keeping the batch size reasonably low. As a start, we propose P4SGD, a distributed heterogeneous training system that efficiently trains GLMs through model parallelism between distributed FPGAs and through forward-communication-backward pipeline parallelism within an FPGA. Moreover, we propose a lightweight, latency-centric in-switch aggregation protocol, powered by a programmable switch, to minimize the latency of the AllReduce operation between distributed FPGAs. As such, to our knowledge, P4SGD is the first solution that achieves almost linear scalability between distributed accelerators through model parallelism. We implement P4SGD on eight Xilinx U280 FPGAs and a Tofino P4 switch. Our experiments show P4SGD converges up to 6.5X faster than the state-of-the-art GPU counterpart.
Acceleration of FM-index Queries Through Prefix-free Parsing
Authors: Aaron Hong, Marco Oliva, Dominik Köppl, Hideo Bannai, Christina Boucher, Travis Gagie
Abstract
FM-indexes are a crucial data structure in DNA alignment, for example, but searching with them usually takes at least one random access per character in the query pattern. Ferragina and Fischer observed in 2007 that word-based indexes often use fewer random accesses than character-based indexes and thus support faster searches. Since DNA lacks natural word boundaries, however, it is necessary to parse it somehow before applying word-based FM-indexing. Last year, Deng et al. proposed parsing genomic data by induced suffix sorting and showed that the resulting word-based FM-indexes support faster counting queries than standard FM-indexes when patterns are a few thousand characters or longer. In this paper we show that using prefix-free parsing -- which takes parameters that let us tune the average length of the phrases -- instead of induced suffix sorting gives a significant speedup for patterns of only a few hundred characters. We implement our method and demonstrate that it is between 3 and 18 times faster than competing methods on queries to GRCh38, and consistently faster on queries made to 25,000, 50,000, and 100,000 SARS-CoV-2 genomes. Our method thus accelerates counting over all state-of-the-art methods with only a minor increase in memory. Our source code is available at https://github.com/marco-oliva/afm .
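For readers unfamiliar with FM-index counting, the sketch below implements the classic character-based backward search that word-based indexes like the one above accelerate. The naive BWT construction is only suitable for small demo strings, not genomes.

```python
from bisect import bisect_left

def bwt(text):
    # Naive BWT via sorted rotations -- fine for a small demo, not genomes.
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)

def fm_count(text, pattern):
    last = bwt(text)
    first = sorted(last)
    # C[c]: number of characters in the text strictly smaller than c.
    C = {c: bisect_left(first, c) for c in set(last)}
    # occ[c][i]: occurrences of c in last[:i] (a real index samples this).
    occ = {c: [0] for c in C}
    for ch in last:
        for c in occ:
            occ[c].append(occ[c][-1] + (ch == c))
    # Backward search: two occ/C lookups per pattern character -- the
    # random accesses that word-based indexing reduces.
    lo, hi = 0, len(last)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo

print(fm_count("ACGTACGTACGA", "ACG"))   # -> 3
```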
Fast Distributed Inference Serving for Large Language Models
Authors: Bingyang Wu, Yinmin Zhong, Zili Zhang, Gang Huang, Xuanzhe Liu, Xin Jin
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Large language models (LLMs) power a new generation of interactive AI applications exemplified by ChatGPT. The interactive nature of these applications demands low job completion time (JCT) for model inference. Existing LLM serving systems use run-to-completion processing for inference jobs, which suffers from head-of-line blocking and long JCT. We present FastServe, a distributed inference serving system for LLMs. FastServe exploits the autoregressive pattern of LLM inference to enable preemption at the granularity of each output token. FastServe uses preemptive scheduling to minimize JCT with a novel skip-join Multi-Level Feedback Queue scheduler. Based on the new semi-information-agnostic setting of LLM inference, the scheduler leverages the input length information to assign an appropriate initial queue for each arriving job to join. Queues with higher priority than the joined queue are skipped to reduce demotions. We design an efficient GPU memory management mechanism that proactively offloads and uploads intermediate states between GPU memory and host memory for LLM inference. We build a system prototype of FastServe based on NVIDIA FasterTransformer. Experimental results show that compared to the state-of-the-art solution Orca, FastServe improves the average and tail JCT by up to 5.1$\times$ and 6.4$\times$, respectively.
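The skip-join idea can be sketched as a toy scheduler. The queue count, quanta, and length-to-level rule below are assumed demo values, not FastServe's configuration, and job "completion" is simulated rather than tied to real token generation.

```python
# Toy skip-join multi-level feedback queue (illustrative; not the
# FastServe implementation). Longer prompts start at lower-priority
# queues, skipping levels they would immediately be demoted from.

NUM_QUEUES = 4
QUANTA = [2, 4, 8, 16]            # tokens a job may generate per turn

queues = [[] for _ in range(NUM_QUEUES)]

def initial_level(input_len):
    # Skip-join rule (assumed for the demo): map prompt length to level.
    for level, quantum in enumerate(QUANTA):
        if input_len <= quantum * 8:
            return level
    return NUM_QUEUES - 1

def submit(job_id, output_len, input_len):
    queues[initial_level(input_len)].append([job_id, output_len, 0])

def schedule_step():
    # Serve the head of the highest-priority non-empty queue for one
    # quantum; the real system preempts at output-token granularity.
    for level, queue in enumerate(queues):
        if queue:
            job = queue.pop(0)
            job[2] += QUANTA[level]            # tokens generated so far
            if job[2] >= job[1]:               # job finished
                return job[0], True
            queues[min(level + 1, NUM_QUEUES - 1)].append(job)  # demote
            return job[0], False
    return None, False

submit("chat-1", output_len=10, input_len=12)
submit("doc-2", output_len=60, input_len=120)
trace = []
while True:
    jid, done = schedule_step()
    if jid is None:
        break
    trace.append((jid, done))
print(trace)   # the short job finishes before the long one
```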
Scalable Demand-Driven Call Graph Generation for Python
Abstract
Call graph generation is the foundation of inter-procedural static analysis. PyCG is the state-of-the-art approach for generating call graphs for Python programs. Unfortunately, PyCG does not scale to large programs when adapted to whole-program analysis, where dependent libraries are also analyzed. Further, PyCG does not support demand-driven analysis, where only the functions reachable from given entry functions are analyzed. Moreover, PyCG is flow-insensitive and does not fully support Python's features, hindering its accuracy. To overcome these drawbacks, we propose a scalable demand-driven approach for generating call graphs for Python programs, and implement it as a prototype tool, Jarvis. Jarvis maintains an assignment graph (i.e., points-to relations between program identifiers) for each function in a program to allow reuse and improve scalability. Given a set of entry functions as the demands, Jarvis generates the call graph on the fly, where flow-sensitive intra-procedural analysis and inter-procedural analysis are conducted in turn. Our evaluation on a micro-benchmark of 135 small Python programs and a macro-benchmark of 6 real-world Python applications demonstrates that Jarvis significantly improves on PyCG: it is at least 67% faster, with 84% higher precision and at least 10% higher recall.
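At its core, the demand-driven aspect means resolving call edges only for functions reachable from the entries. The sketch below illustrates that reachability-driven construction over a precomputed edge map, which stands in for the per-function assignment-graph analysis a tool like Jarvis actually performs.

```python
from collections import deque

def build_call_graph(call_edges, entries):
    """Demand-driven sketch: call_edges maps a function to the callees it
    may invoke; only functions reachable from the entries are analyzed."""
    graph, seen, work = {}, set(entries), deque(entries)
    while work:
        fn = work.popleft()
        callees = call_edges.get(fn, set())
        graph[fn] = callees
        for callee in callees:
            if callee not in seen:
                seen.add(callee)
                work.append(callee)
    return graph

edges = {
    "main": {"parse", "run"},
    "run": {"step"},
    "unused": {"helper"},     # never analyzed on demand
}
print(build_call_graph(edges, ["main"]))
# {'main': {'parse', 'run'}, 'parse': set(), 'run': {'step'}, 'step': set()}
```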
A Neural Emulator for Uncertainty Estimation of Fire Propagation
Authors: Andrew Bolt, Conrad Sanderson, Joel Janek Dabrowski, Carolyn Huston, Petra Kuhnert
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Abstract
Wildfire propagation is a highly stochastic process where small changes in environmental conditions (such as wind speed and direction) can lead to large changes in observed behaviour. A traditional approach to quantify uncertainty in fire-front progression is to generate probability maps via ensembles of simulations. However, use of ensembles is typically computationally expensive, which can limit the scope of uncertainty analysis. To address this, we explore the use of a spatio-temporal neural-based modelling approach to directly estimate the likelihood of fire propagation given uncertainty in input parameters. The uncertainty is represented by deliberately perturbing the input weather forecast during model training. The computational load is concentrated in the model training process, which allows larger probability spaces to be explored during deployment. Empirical evaluations indicate that the proposed model achieves comparable fire boundaries to those produced by the traditional SPARK simulation platform, with an overall Jaccard index (similarity score) of 67.4% on a set of 35 simulated fires. When compared to a related neural model (emulator) which was employed to generate probability maps via ensembles of emulated fires, the proposed approach produces competitive Jaccard similarity scores while being approximately an order of magnitude faster.
Keyword: mobile
QF-Geo: Capacity Aware Geographic Routing using Bounded Regions of Wireless Meshes
Authors: Yung-Fu Chen, Kenneth W. Parker, Anish Arora
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
Routing in wireless meshes must detour around holes. Extant routing protocols often underperform in minimally connected networks where holes are larger and more frequent. Minimal density networks are common in practice due to deployment cost constraints, mobility dynamics, and/or adversarial jamming. Protocols that use global search to determine optimal paths incur search overhead that limits scaling. Conversely, protocols that use local search tend to find approximately optimal paths at higher densities due to the existence of geometrically direct routes but underperform as the connectivity lowers and regional (or global) information is required to address holes. Designing a routing protocol to achieve high throughput-latency performance across network densities, mobility, and interference dynamics remains challenging. This paper shows that, in a probabilistic setting, bounded exploration can be leveraged to mitigate this challenge. We show, first, that the length of shortest paths in networks with uniform random node distribution can, with high probability (whp), be bounded. Thus, whp a shortest path may be found by limiting exploration to an elliptic region whose size is a function of the network density and the Euclidean distance between the two endpoints. Second, we propose a geographic routing protocol that achieves high reliability and throughput-latency performance by forwarding packets within an ellipse whose size is bounded similarly and by an estimate of the available capacity. Our protocol, QF-Geo, selects forwarding relays within the elliptic region, prioritizing those with sufficient capacity to avoid bottlenecks. Our simulation results show that QF-Geo achieves high goodput efficiency and reliability in both static and mobile networks across both low and high densities, at large scales, with a wide range of concurrent flows, and in the presence of adversarial jamming.
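The geometric core of bounded exploration is a simple membership test: a relay is a candidate only if it lies inside an ellipse whose foci are the source and destination. The sketch below illustrates that test; the stretch factor is an assumed stand-in for the density-derived bound, not QF-Geo's actual parameterization.

```python
import math

def in_forwarding_ellipse(node, src, dst, stretch=1.25):
    # A point lies inside the ellipse iff the sum of its distances to the
    # two foci does not exceed the major-axis length (stretch * d).
    d = math.dist(src, dst)
    return math.dist(node, src) + math.dist(node, dst) <= stretch * d

src, dst = (0.0, 0.0), (10.0, 0.0)
print(in_forwarding_ellipse((5.0, 1.0), src, dst))   # True: near the line
print(in_forwarding_ellipse((5.0, 4.0), src, dst))   # False: too far off
```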
Robot Gaze During Autonomous Navigation and its Effect on Social Presence
Authors: Kerry He, Wesley P. Chan, Akansel Cosgun, Albin Joy, Elizabeth A. Croft
Abstract
As robots have become increasingly common in human-rich environments, it is critical that they are able to exhibit social cues to be perceived as a cooperative and socially-conformant team member. We investigate the effect of robot gaze cues on people's subjective perceptions of a mobile robot as a socially present entity in three common hallway navigation scenarios. The tested robot gaze behaviors were path-oriented (looking at its own future path), or person-oriented (looking at the nearest person), with fixed-gaze as the control. We conduct a real-world study with 36 participants who walked through the hallway, and an online study with 233 participants who were shown simulated videos of the same scenarios. Our results suggest that the preferred gaze behavior is scenario-dependent. Person-oriented gaze behaviors which acknowledge the presence of the human are generally preferred when the robot and human cross paths. However, this benefit is diminished in scenarios that involve less implicit interaction between the robot and the human.
Optical Aberration Correction in Postprocessing using Imaging Simulation
Abstract
As the popularity of mobile photography continues to grow, considerable effort is being invested in the reconstruction of degraded images. Due to the spatial variation in optical aberrations, which cannot be avoided during the lens design process, recent commercial cameras have shifted some of these correction tasks from optical design to postprocessing systems. However, without engaging with the optical parameters, these systems achieve only limited correction of aberrations. In this work, we propose a practical method for recovering the degradation caused by optical aberrations. Specifically, we establish an imaging simulation system based on our proposed optical point spread function model. Given the optical parameters of the camera, it generates the imaging results of these specific devices. To perform the restoration, we design a spatial-adaptive network model trained on synthetic data pairs generated by the imaging simulation system, eliminating the overhead of capturing training data through extensive shooting and registration. Moreover, we comprehensively evaluate the proposed method in simulations and experimentally with a customized digital single-lens reflex (DSLR) camera lens and a HUAWEI HONOR 20, respectively. The experiments demonstrate that our solution successfully removes spatially variant blur and color dispersion. When compared with state-of-the-art deblurring methods, the proposed approach achieves better results with a lower computational overhead. Moreover, the reconstruction technique does not introduce artificial texture and is convenient to transfer to current commercial cameras. Project Page: \url{https://github.com/TanGeeGo/ImagingSimulation}.
Computational Optics for Mobile Terminals in Mass Production
Abstract
Correcting the optical aberrations and manufacturing deviations of cameras is a challenging task. Due to the limitation on volume and the demand for mass production, existing mobile terminals cannot rectify optical degradation. In this work, we systematically construct a perturbed lens system model to illustrate the relationship between deviated system parameters and the spatial frequency response measured from photographs. To further address this issue, an optimization framework is proposed based on this model to build proxy cameras from the machining samples' SFRs. Engaging with the proxy cameras, we synthesize data pairs, which encode the optical aberrations and the random manufacturing biases, for training learning-based algorithms. Although convolutional neural networks have recently shown promising results in correcting aberrations, they are hard to generalize to stochastic machining biases. Therefore, we propose a dilated Omni-dimensional dynamic convolution and implement it in post-processing to account for the manufacturing degradation. Extensive experiments evaluating multiple samples of two representative devices demonstrate that the proposed optimization framework accurately constructs the proxy camera. The dynamic processing model also adapts well to the manufacturing deviations of different cameras, realizing perfect computational photography. The evaluation shows that the proposed method bridges the gap between optical design, system machining, and the post-processing pipeline, shedding light on the joint treatment of image signal reception (lens and sensor) and image signal processing.
Spectrum Breathing: Protecting Over-the-Air Federated Learning Against Interference
Authors: Zhanwei Wang, Kaibin Huang, Yonina C. Eldar
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Information Theory (cs.IT)
Abstract
Federated Learning (FL) is a widely embraced paradigm for distilling artificial intelligence from distributed mobile data. However, the deployment of FL in mobile networks can be compromised by exposure to interference from neighboring cells or jammers. Existing interference mitigation techniques require multi-cell cooperation or at least interference channel state information, which is expensive in practice. On the other hand, power control that treats interference as noise may not be effective due to limited power budgets, and this mechanism can also trigger countermeasures by interference sources. As a practical approach for protecting FL against interference, we propose Spectrum Breathing, which cascades stochastic-gradient pruning and spread spectrum to suppress interference without bandwidth expansion. The cost is higher learning latency, exploiting the graceful degradation of learning speed due to pruning. We synchronize the two operations such that their levels are controlled by the same parameter, the Breathing Depth. To optimally control this parameter, we develop a martingale-based approach to the convergence analysis of Over-the-Air FL with spectrum breathing, termed AirBreathing FL. We show a performance tradeoff between gradient-pruning and interference-induced errors as regulated by the breathing depth. Given the receive SIR and model size, optimizing this tradeoff yields two schemes for controlling the breathing depth, either fixed or adaptive to the channels and the learning process. As shown by experiments, in scenarios where traditional Over-the-Air FL fails to converge in the presence of strong interference, AirBreathing FL with either fixed or adaptive breathing depth ensures convergence, with the adaptive scheme achieving close-to-ideal performance.
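The two cascaded operations can be sketched in a few lines (illustrative; not the paper's system model): prune the gradient to its top entries, spread the survivors with a pseudo-random chip sequence, and despread at the receiver. The coupling here, keeping 1/depth of the entries while spreading by a factor of depth, mirrors the idea of controlling both operations with a single parameter.

```python
import numpy as np

rng = np.random.default_rng(0)

def breathe(grad, breathing_depth):
    k = max(1, grad.size // breathing_depth)       # keep 1/depth of entries
    keep = np.argsort(np.abs(grad))[-k:]           # gradient pruning
    kept_vals = grad[keep]
    # Spread spectrum: each kept value is repeated over `depth` chips with
    # pseudo-random +/-1 signs, so narrow-band interference averages out.
    chips = rng.choice([-1.0, 1.0], size=(breathing_depth, k))
    spread = kept_vals[None, :] * chips
    return keep, kept_vals, spread, chips

grad = rng.standard_normal(64)
keep, kept_vals, spread, chips = breathe(grad, breathing_depth=4)
# Despreading (multiply by the same chips, then average) recovers the
# pruned gradient; an interferer's energy would be averaged down instead.
recovered = (spread * chips).mean(axis=0)
print(np.allclose(recovered, kept_vals))           # True
```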
A Comprehensive Picture of Factors Affecting User Willingness to Use Mobile Health Applications
Authors: Shaojing Fan, Ramesh C. Jain, Mohan S. Kankanhalli
Abstract
Mobile health (mHealth) applications have become increasingly valuable in preventive healthcare and in reducing the burden on healthcare organizations. The aim of this paper is to investigate the factors that influence user acceptance of mHealth apps and identify the underlying structure that shapes users' behavioral intention. An online study that employed factorial survey design with vignettes was conducted, and a total of 1,669 participants from eight countries across four continents were included in the study. Structural equation modeling was employed to quantitatively assess how various factors collectively contribute to users' willingness to use mHealth apps. The results indicate that users' digital literacy has the strongest impact on their willingness to use them, followed by their online habit of sharing personal information. Users' concerns about personal privacy only had a weak impact. Furthermore, users' demographic background, such as their country of residence, age, ethnicity, and education, has a significant moderating effect. Our findings have implications for app designers, healthcare practitioners, and policymakers. Efforts are needed to regulate data collection and sharing and promote digital literacy among the general population to facilitate the widespread adoption of mHealth apps.
Integrated Access and Backhaul in 5G with Aerial Distributed Unit using OpenAirInterface
Authors: Rakesh Mundlamuri, Omid Esrafilian, Rajeev Gangula, Rohan Kharade, Cedric Roux, Florian Kaltenberger, Raymond Knopp, David Gesbert
Subjects: Information Theory (cs.IT); Robotics (cs.RO)
Abstract
In this work, we demonstrate the Integrated Access and Backhaul (IAB) capabilities of an aerial robot offering 5G connectivity to ground users. The robot is integrated with a distributed unit (DU) and has 5G wireless backhaul access to a terrestrial central unit (CU). The CU-DU interface fully complies with the 3GPP defined F1 application protocol (F1AP). Such aerial robots can be instantiated and configured dynamically tailoring to the network demands. The complete radio and access network solution is based on open-source software from OpenAirInterface, and off-the-shelf commercial 5G mobile terminals. Experimental results illustrate throughput gains, coverage extension and dynamic adaptability nature of the aerial DU.
Autonomous Stabilization of Retinal Videos for Streamlining Assessment of Spontaneous Venous Pulsations
Abstract
Spontaneous retinal Venous Pulsations (SVP) are rhythmic changes in the caliber of the central retinal vein and are observed in the optic disc region (ODR) of the retina. Their absence is a critical indicator of various ocular or neurological abnormalities. Recent advances in imaging technology have enabled the development of portable smartphone-based devices for observing the retina and assessing SVPs. However, the quality of smartphone-based retinal videos is often poor due to noise and image jitter, which, in turn, can severely obstruct the observation of SVPs. In this work, we developed a fully automated retinal video stabilization method that enables the examination of SVPs captured by various mobile devices. Specifically, we first propose an ODR Spatio-Temporal Localization (ODR-STL) module to localize the visible ODR and remove noisy and jittering frames. Then, we introduce a Noise-Aware Template Matching (NATM) module to stabilize high-quality video segments at a fixed position in the field of view. After this processing, SVPs can be easily observed in the stabilized videos, significantly facilitating user observations. Furthermore, our method is cost-effective and has been tested in both subjective and objective evaluations. Both evaluations support its effectiveness in facilitating the observation of SVPs. This can improve the timely diagnosis and treatment of associated diseases, making it a valuable tool for eye health professionals.
Toward Open Integrated Access and Backhaul with O-RAN
Abstract
Millimeter wave (mmWave) communication has recently been standardized for use in the fifth generation (5G) of cellular networks, fulfilling the promise of multi-gigabit mobile throughput for current and future mobile radio network generations. In this context, the network densification required to overcome the difficult mmWave propagation will result in increased deployment costs. Integrated Access and Backhaul (IAB) has been proposed as an effective means of reducing densification costs by deploying a wireless mesh network of base stations, where backhaul and access transmissions share the same radio technology. However, IAB requires sophisticated control mechanisms to operate efficiently and address the increased complexity. The Open Radio Access Network (RAN) paradigm represents the ideal enabler of RAN intelligent control, but its current specifications are not compatible with IAB. In this work, we discuss the challenges of integrating IAB into the Open RAN ecosystem, detailing the required architectural extensions that will enable dynamic control of 5G IAB networks. We implement the proposed integrated architecture into the first publicly available Open-RAN-enabled experimental framework, which allows prototyping and testing of Open-RAN-based solutions over end-to-end 5G IAB networks. Finally, we validate the framework in both ideal and realistic deployment scenarios, exploiting the large-scale testing capabilities of publicly available experimental platforms.
Post-training Model Quantization Using GANs for Synthetic Data Generation
Authors: Athanasios Masouris, Mansi Sharma, Adrian Boguszewski, Alexander Kozlov, Zhuo Wu, Raymond Lo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Quantization is a widely adopted technique for deep neural networks to reduce the memory and computational resources required. However, when quantized, most models would need a suitable calibration process to keep their performance intact, which requires data from the target domain, such as a fraction of the dataset used in model training and model validation (i.e. calibration dataset). In this study, we investigate the use of synthetic data as a substitute for the calibration with real data for the quantization method. We propose a data generation method based on Generative Adversarial Networks that are trained prior to the model quantization step. We compare the performance of models quantized using data generated by StyleGAN2-ADA and our pre-trained DiStyleGAN, with quantization using real data and an alternative data generation method based on fractal images. Overall, the results of our experiments demonstrate the potential of leveraging synthetic data for calibration during the quantization process. In our experiments, the percentage of accuracy degradation of the selected models was less than 0.6%, with our best performance achieved on MobileNetV2 (0.05%). The code is available at: https://github.com/ThanosM97/gsoc2022-openvino
Abstract
We study the problem of treasure hunt in a Euclidean plane by a mobile agent with the guidance of pebbles. The initial position of the agent and the position of the treasure are modeled as special points in the Euclidean plane. The treasure is situated at a distance at most $D>0$ from the initial position of the agent. The agent has a perfect compass, but an adversary controls its speed; hence, the agent cannot measure how much distance it has traveled in a given time. The agent finds the treasure only when it reaches the treasure's exact position. The cost of the treasure hunt is defined as the total distance traveled by the agent before it finds the treasure. The agent has no prior knowledge of the position of the treasure or the value of $D$. An Oracle, which knows the treasure's position and the agent's initial location, places some pebbles to guide the agent towards the treasure. Once it has decided to move along some specified angular direction, the agent can change its direction only when it encounters a pebble or a special point. We ask the following central question in this paper: ``For a given $k \ge 0$, what is the cheapest treasure hunt algorithm if at most $k$ pebbles are placed by the Oracle?'' We show that for $k=1$, there does not exist any treasure hunt algorithm that finds the treasure with finite cost. We show the existence of an algorithm with cost $O(D)$ for $k=2$. For $k>8$ we design an algorithm that uses $k$ pebbles to find the treasure with cost $O(k^{2}) + D(\sin\theta' + \cos\theta')$, where $\theta'=\frac{\pi}{2^{k-8}}$. The latter result shows the existence of algorithms with cost arbitrarily close to $D$ for sufficiently large values of $D$.
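The $k>8$ bound is easy to evaluate directly. In the sketch below, the constant hidden in the $O(k^2)$ term is unknown, so it is exposed as a parameter with an arbitrary placeholder value.

```python
import math

def cost_bound(k, D, c=1.0):
    # cost(k, D) = c * k^2 + D * (sin(theta') + cos(theta')),
    # theta' = pi / 2^(k - 8); c is an arbitrary placeholder for the
    # unknown constant in the O(k^2) term.
    assert k > 8
    theta = math.pi / 2 ** (k - 8)
    return c * k * k + D * (math.sin(theta) + math.cos(theta))

# As k grows, theta' -> 0 and the D-term approaches D, at the price of a
# quadratic pebble overhead.
for k in (9, 12, 20):
    print(k, round(cost_bound(k, D=1000.0), 2))
```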
Computation-Efficient Backscatter-Blessed MEC with User Reciprocity
Authors: Bowen Gu, Hao Xie, Dong Li
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
This letter proposes a new user-cooperative offloading protocol, called user reciprocity, in backscatter communication (BackCom)-aided mobile edge computing systems with efficient computation. Its quintessence is that each user can alternate between the active and BackCom modes in different time slots, such that in each slot one user works in the active mode and the other in the BackCom mode. In particular, the user in the BackCom mode can always use the signal transmitted by the user in the active mode for more data transmission in a spectrum-sharing manner. To evaluate the proposed protocol, a computation-efficiency (CE) maximization problem is formulated by jointly optimizing power control, time scheduling, reflection-coefficient adjustment, and computing-frequency allocation, while satisfying physical constraints on the maximum energy budget, the computing-frequency threshold, the minimum computed bits, and the harvested-energy threshold. To solve this non-convex problem, Dinkelbach's method and the quadratic transform are first employed to transform the complex fractional forms into linear ones. Then, an iterative algorithm is designed by decomposing the resulting problem to obtain a suboptimal solution. Closed-form solutions for the transmit power, the reflection coefficient, and the local computing frequency are provided for more insight. Besides, the analytical performance gain of the reciprocal mode is also derived. Simulation results demonstrate that the proposed scheme outperforms benchmark schemes in terms of the CE.
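Dinkelbach's method, which the letter uses to linearize the fractional objective, is a standard fractional-programming tool worth seeing in isolation. The sketch below applies it to a toy one-dimensional ratio, not the paper's resource-allocation model.

```python
def dinkelbach(f, g, argmax_aux, lam=0.0, tol=1e-9, max_iter=100):
    """Maximize f(x)/g(x) with g > 0 via Dinkelbach's iteration."""
    for _ in range(max_iter):
        # Solve the auxiliary (linearized) problem max_x f(x) - lam*g(x).
        x = argmax_aux(lam)
        F = f(x) - lam * g(x)
        if abs(F) < tol:            # F(lam*) = 0 at the optimal ratio
            return x, lam
        lam = f(x) / g(x)           # update the ratio estimate
    return x, lam

# Toy example: maximize (1 + x) / (1 + x^2) over a grid of x in [0, 2].
grid = [i / 1000 for i in range(2001)]
f = lambda x: 1.0 + x
g = lambda x: 1.0 + x * x
argmax_aux = lambda lam: max(grid, key=lambda x: f(x) - lam * g(x))

x_star, ratio = dinkelbach(f, g, argmax_aux)
print(x_star, ratio)   # optimum near x = sqrt(2) - 1, ratio ~ 1.207
```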
Pavlok-Nudge: A Feedback Mechanism for Atomic Behaviour Modification
Abstract
This paper proposes a feedback mechanism to 'break bad habits' using the Pavlok device. Pavlok utilises beeps, vibration, and shocks as modes of aversion to help individuals with behaviour modification. While the device can be useful in certain periodic daily-life situations, such as alarms and exercise notifications, it relies on manual operation, which limits its usage. To this end, we design a user interface that generates an automatic feedback mechanism by integrating Pavlok with a deep learning-based model to detect certain behaviours via an integrated user interface, i.e., a mobile or desktop application. Our proposed solution is implemented and verified in the context of snoring: it first captures audio from the environment and then predicts whether the audio content is a snore or not. Based on the prediction of the deep learning model, we use Pavlok to alert users to take preventive measures. We believe that this simple solution can help people change their atomic habits, which may lead to long-term benefits.
Transformer-based model for monocular visual odometry: a video understanding approach
Authors: André O. Françani, Marcos R. O. A. Maximo
Abstract
Estimating the camera pose from images of a single camera is a traditional task in mobile robots and autonomous vehicles. This problem is called monocular visual odometry, and it often relies on geometric approaches that require engineering effort for each specific scenario. Deep learning methods have been shown to be generalizable after proper training and with a considerable amount of available data. Transformer-based architectures have dominated the state of the art in natural language processing and computer vision tasks, such as image and video understanding. In this work, we treat monocular visual odometry as a video understanding task and estimate the camera's 6-DoF pose. We contribute by presenting the TSformer-VO model, based on spatio-temporal self-attention mechanisms, to extract features from clips and estimate the motions in an end-to-end manner. Our approach achieved competitive state-of-the-art performance compared with geometry-based and deep learning-based methods on the KITTI visual odometry dataset, outperforming the DeepVO implementation that is highly accepted in the visual odometry community.
Abstract
Locating a specific mobile application screen in existing repositories is restricted to basic keyword searches, such as Google Image Search, or necessitates a complete query screen image, as in the case of Swire. Interactive partial sketch-based solutions like PSDoodle have limitations, however, including inaccuracy and an inability to consider text appearing on the screen. A potentially effective solution is a system that provides interactive partial sketching functionality for efficiently structuring user interface elements and additionally incorporates text queries. Our approach, TpD, represents the pioneering effort to enable an iterative search of screens by combining interactive sketching and keyword search techniques. TpD is built on a combination of the Rico repository of approximately 58k Android app screens and PSDoodle. Our evaluation with third-party software developers showed that TpD provided higher top-10 screen retrieval accuracy than the state-of-the-art Swire and required less time to complete a query than other interactive solutions.
Enabling Technologies for Programmable and Software-Defined Networks: Bolstering the Path Towards 6G
Authors: David Carrascal, Elisa Rojas, Diego Lopez-Pajares
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
Although the complete scope of the sixth generation of mobile technologies (6G) is still unclear, the prominence of the Internet of Things (IoT) and Artificial Intelligence (AI) / Machine Learning (ML) in the networking field is undeniable. In this regard, key technology enablers of the previous generation, 5G, such as software-defined networking and network function virtualization, fall short of accomplishing the stringent requirements envisioned for 6G verticals. This PhD thesis goes back to basics by exploring missing functionality gaps in relation to these technologies, in order to provide the ''glue'' for holistic and fully-fledged networking solutions for 6G, aligned with standards and industry recommendations. Although ambitious and in a very early stage, this PhD thesis illustrates an initial design for in-band control in Software-Defined Networking (SDN) that could facilitate interoperability among constrained IoT devices. The current design demonstrates promising results in terms of resource usage and robustness, which are pivotal features for constrained networks. Next steps include the integration of the approach with a real testbed comprised of constrained IoT devices and the implementation of a federated learning environment at the edge.
Waterberry Farms: A Novel Benchmark For Informative Path Planning
Authors: Samuel Matloob, Partha P. Datta, O. Patrick Kreidl, Ayan Dutta, Swapnoneel Roy, Ladislau Bölöni
Abstract
Recent developments in robotic and sensor hardware make data collection with mobile robots (ground or aerial) feasible and affordable to a wide population of users. The newly emergent applications, such as precision agriculture, weather damage assessment, or personal home security often do not satisfy the simplifying assumptions made by previous research: the explored areas have complex shapes and obstacles, multiple phenomena need to be sensed and estimated simultaneously and the measured quantities might change during observations. The future progress of path planning and estimation algorithms requires a new generation of benchmarks that provide representative environments and scoring methods that capture the demands of these applications. This paper describes the Waterberry Farms benchmark (WBF) that models a precision agriculture application at a Florida farm growing multiple crop types. The benchmark captures the dynamic nature of the spread of plant diseases and variations of soil humidity while the scoring system measures the performance of a given combination of a movement policy and an information model estimator. By benchmarking several examples of representative path planning and estimator algorithms, we demonstrate WBF's ability to provide insight into their properties and quantify future progress.
Non-Euclidean Motion Planning with Graphs of Geodesically-Convex Sets
Authors: Thomas Cohn, Mark Petersen, Max Simchowitz, Russ Tedrake
Abstract
Computing optimal, collision-free trajectories for high-dimensional systems is a challenging problem. Sampling-based planners struggle with the dimensionality, whereas trajectory optimizers may get stuck in local minima due to inherent nonconvexities in the optimization landscape. The use of mixed-integer programming to encapsulate these nonconvexities and find globally optimal trajectories has recently shown great promise, thanks in part to tight convex relaxations and efficient approximation strategies that greatly reduce runtimes. These approaches were previously limited to Euclidean configuration spaces, precluding their use with mobile bases or continuous revolute joints. In this paper, we handle such scenarios by modeling configuration spaces as Riemannian manifolds, and we describe a reduction procedure for the zero-curvature case to a mixed-integer convex optimization problem. We demonstrate our results on various robot platforms, including producing efficient collision-free trajectories for a PR2 bimanual mobile manipulator.
Keyword: pruning
Spectrum Breathing: Protecting Over-the-Air Federated Learning Against Interference
Authors: Zhanwei Wang, Kaibin Huang, Yonina C. Eldar
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Information Theory (cs.IT)
Abstract
Federated Learning (FL) is a widely embraced paradigm for distilling artificial intelligence from distributed mobile data. However, the deployment of FL in mobile networks can be compromised by exposure to interference from neighboring cells or jammers. Existing interference mitigation techniques require multi-cell cooperation or at least interference channel state information, which is expensive in practice. On the other hand, power control that treats interference as noise may not be effective due to limited power budgets, and this mechanism can trigger countermeasures by interference sources. As a practical approach for protecting FL against interference, we propose Spectrum Breathing, which cascades stochastic-gradient pruning and spread spectrum to suppress interference without bandwidth expansion. The cost is a higher learning latency, which the approach trades away by exploiting the graceful degradation of learning speed under pruning. We synchronize the two operations such that their levels are controlled by the same parameter, the Breathing Depth. To optimally control this parameter, we develop a martingale-based approach to the convergence analysis of Over-the-Air FL with spectrum breathing, termed AirBreathing FL. We show a performance tradeoff between gradient-pruning error and interference-induced error as regulated by the breathing depth. Given the receive SIR and model size, optimizing this tradeoff yields two schemes for controlling the breathing depth, either fixed or adaptive to channels and the learning process. As shown by experiments, in scenarios where traditional Over-the-Air FL fails to converge in the presence of strong interference, AirBreathing FL with either fixed or adaptive breathing depth ensures convergence, with the adaptive scheme achieving close-to-ideal performance.
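A toy numpy sketch of the cascade described above; the top-k pruner and the repetition-code spreader are our simplifications, with `depth` standing in for the paper's Breathing Depth:

```python
import numpy as np

def spectrum_breathe(grad, depth):
    """Toy sketch: prune the gradient by a factor `depth` (top-k by magnitude),
    then spend the freed bandwidth spreading the survivors by the same factor
    (a simple repetition code), so total bandwidth is unchanged. The actual
    pruning and spreading operators in AirBreathing FL may differ."""
    k = max(1, grad.size // depth)
    keep = np.argsort(np.abs(grad))[-k:]          # indices of the survivors
    spread = np.repeat(grad[keep], depth)         # processing gain ~ depth
    return keep, spread

g = np.random.default_rng(0).normal(size=1000)
keep, tx = spectrum_breathe(g, depth=4)
print(tx.size)  # (1000 // 4) * 4 = 1000: no bandwidth expansion
```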
Keyword: voxel
Stochastic Texture Filtering
Authors: Marcos Fajardo, Bartlomiej Wronski, Marco Salvi, Matt Pharr
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
Abstract
2D texture maps and 3D voxel arrays are widely used to add rich detail to the surfaces and volumes of rendered scenes, and filtered texture lookups are integral to producing high-quality imagery. We show that filtering textures after evaluating lighting, rather than before BSDF evaluation as is current practice, gives a more accurate solution to the rendering equation. These benefits are not merely theoretical, but are apparent in common cases. We further show that stochastically sampling texture filters is crucial for enabling this approach, which has not been possible previously except in limited cases. Stochastic texture filtering offers additional benefits, including efficient implementation of high-quality texture filters and efficient filtering of textures stored in compressed and sparse data structures, including neural representations. We demonstrate applications in both real-time and offline rendering and show that the additional stochastic error is minimal. Furthermore, this error is handled well by either spatiotemporal denoising or moderate pixel sampling rates.
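A classic instance of stochastically sampling a texture filter, shown here for bilinear lookup: select a single texel with probability equal to its filter weight, which matches the filtered result in expectation. This is a generic sketch, not the paper's implementation:

```python
import numpy as np

def stochastic_bilinear(tex, u, v, rng):
    """Unbiased one-tap bilinear lookup: stochastically round each texel
    coordinate, which selects one of the four neighbors with probability
    equal to its bilinear weight."""
    h, w = tex.shape[:2]
    x, y = u * (w - 1), v * (h - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    fx, fy = x - x0, y - y0
    xi = min(x0 + (rng.random() < fx), w - 1)
    yi = min(y0 + (rng.random() < fy), h - 1)
    return tex[yi, xi]

rng = np.random.default_rng(0)
tex = np.arange(16.0).reshape(4, 4)
samples = [stochastic_bilinear(tex, 0.4, 0.7, rng) for _ in range(10000)]
print(np.mean(samples))  # converges to the deterministic bilinear result
```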
Abstract
Recently, Transformer-based methods for point cloud learning have achieved good results on various point cloud learning benchmarks. However, since the attention mechanism needs to generate query, key, and value feature vectors to compute attention features, most existing Transformer-based point cloud learning methods consume a large amount of computational time and memory when calculating global attention. To address this problem, we propose a Voxel-Transformer-Point (VTP) block for extracting local and global features of point clouds. VTP combines the advantages of voxel-based, point-based, and Transformer-based methods, and consists of a Voxel-Based Branch (V branch), a Point-Based Transformer Branch (PT branch), and a Point-Based Branch (P branch). The V branch extracts coarse-grained features of the point cloud through low voxel resolution; the PT branch obtains fine-grained features by computing self-attention within the local neighborhood and cross-attention between neighborhoods; the P branch uses a simplified MLP network to generate the global location information of the point cloud. In addition, to enrich the local features of point clouds at different scales, we pair the voxel scale in the V branch and the neighborhood sphere scale in the PT branch as one large and one small (large voxel scale \& small neighborhood sphere scale, or small voxel scale \& large neighborhood sphere scale). Finally, we use VTP as the feature extraction network to construct a VTPNet for point cloud learning, and perform shape classification, part segmentation, and semantic segmentation tasks on the ModelNet40, ShapeNet Part, and S3DIS datasets. The experimental results indicate that VTPNet performs well in 3D point cloud learning.
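A toy PyTorch rendition of the three-branch idea (not the authors' code; the layer sizes, k-NN attention, and mean-pooled voxel branch are simplified stand-ins):

```python
import torch
import torch.nn as nn

class VTPSketch(nn.Module):
    """Illustrative three-branch block in the spirit of VTP: a coarse voxel
    branch, a local-attention branch, and a global point-wise MLP branch,
    concatenated per point."""
    def __init__(self, dim=32, k=16):
        super().__init__()
        self.k = k
        self.p_mlp = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.v_mlp = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.pre = nn.Linear(3, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, xyz, voxel_size=0.5):
        # P branch: global per-point location features via a plain MLP.
        p_feat = self.p_mlp(xyz)
        # V branch: coarse features from voxel centroids, broadcast to points.
        vox = torch.div(xyz, voxel_size, rounding_mode="floor")
        uniq, inv = torch.unique(vox, dim=0, return_inverse=True)
        cent = torch.zeros(len(uniq), 3).index_add_(0, inv, xyz)
        cnt = torch.zeros(len(uniq)).index_add_(0, inv, torch.ones(len(xyz)))
        v_feat = self.v_mlp((cent / cnt[:, None])[inv])
        # PT branch: attention over each point's k nearest neighbours.
        nn_idx = torch.cdist(xyz, xyz).topk(self.k, largest=False).indices
        q = self.pre(xyz)[:, None]          # (N, 1, dim)
        kv = self.pre(xyz)[nn_idx]          # (N, k, dim)
        pt_feat = self.attn(q, kv, kv)[0][:, 0]
        return torch.cat([v_feat, pt_feat, p_feat], dim=-1)

pts = torch.randn(128, 3)
print(VTPSketch()(pts).shape)  # torch.Size([128, 96])
```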
Keyword: lidar
DMNR: Unsupervised De-noising of Point Clouds Corrupted by Airborne Particles
Authors: Chu Chen, Yanqi Ma, Bingcheng Dong, Junjie Cao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
LiDAR sensors are critical for autonomous driving and robotics applications due to their ability to provide accurate range measurements and their robustness to lighting conditions. However, airborne particles such as fog, rain, snow, and dust degrade their performance, and encountering these inclement environmental conditions outdoors is inevitable. A straightforward approach would be to remove such points via supervised semantic segmentation, but annotating these particles point-wise is too laborious. To address this problem and enhance perception under inclement conditions, we develop two dynamic filtering methods, Dynamic Multi-threshold Noise Removal (DMNR) and DMNR-H, based on careful analysis of the position distribution and intensity characteristics of noisy and clean points in the publicly available WADS and DENSE datasets. Both DMNR and DMNR-H outperform state-of-the-art unsupervised methods by a significant margin on the two datasets and are slightly better than supervised deep learning-based methods. Furthermore, our methods are more robust to different LiDAR sensors and airborne particles, such as snow and fog.
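Schematically, filtering of this kind keys on the observation that airborne particles tend to return weak intensities close to the sensor. The sketch below uses placeholder thresholds, not the data-derived values of DMNR:

```python
import numpy as np

def multi_threshold_filter(points, intensity, r_max=8.0, i_min=0.05):
    """Schematic noise filter: flag points that are both close to the sensor
    (range < r_max) and weak (intensity < i_min), which is characteristic of
    airborne particles. Thresholds here are placeholders."""
    rng_ = np.linalg.norm(points, axis=1)          # range of each point
    noise = (rng_ < r_max) & (intensity < i_min)
    return points[~noise], points[noise]

pts = np.random.default_rng(0).uniform(-20, 20, size=(1000, 3))
inten = np.random.default_rng(1).uniform(0, 1, size=1000)
clean, removed = multi_threshold_filter(pts, inten)
print(len(clean), len(removed))
```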
A Multi-modal Approach to Single-modal Visual Place Classification
Authors: Tomoya Iwasaki, Kanji Tanaka, Kenta Tsukahara
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
Visual place classification from a first-person-view monocular RGB image is a fundamental problem in long-term robot navigation. A difficulty arises from the fact that RGB image classifiers are often vulnerable to spatial and appearance changes and degrade under domain shifts, such as seasonal, weather, and lighting differences. To address this issue, multi-sensor fusion approaches combining RGB and depth (D) (e.g., LIDAR, radar, stereo) have gained popularity in recent years. Inspired by these efforts in multimodal RGB-D fusion, we explore the use of pseudo-depth measurements from recently developed techniques of ``domain-invariant'' monocular depth estimation as an additional pseudo-depth modality, by reformulating the single-modal RGB image classification task as a pseudo multi-modal RGB-D classification problem. Specifically, we describe a practical, fully self-supervised framework for training, appropriately processing, fusing, and classifying these two modalities, RGB and pseudo-D. Experiments on challenging cross-domain scenarios using public NCLT datasets validate the effectiveness of the proposed framework.
Keyword: diffusion
DifFIQA: Face Image Quality Assessment Using Denoising Diffusion Probabilistic Models
Authors: Žiga Babnik, Peter Peer, Vitomir Štruc
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Modern face recognition (FR) models excel in constrained scenarios, but often suffer from decreased performance when deployed in unconstrained (real-world) environments due to uncertainties surrounding the quality of the captured facial data. Face image quality assessment (FIQA) techniques aim to mitigate these performance degradations by providing FR models with sample-quality predictions that can be used to reject low-quality samples and reduce false match errors. However, despite steady improvements, ensuring reliable quality estimates across facial images with diverse characteristics remains challenging. In this paper, we present a powerful new FIQA approach, named DifFIQA, which relies on denoising diffusion probabilistic models (DDPM) and ensures highly competitive results. The main idea behind the approach is to utilize the forward and backward processes of DDPMs to perturb facial images and quantify the impact of these perturbations on the corresponding image embeddings for quality prediction. Because the diffusion-based perturbations are computationally expensive, we also distill the knowledge encoded in DifFIQA into a regression-based quality predictor, called DifFIQA(R), that balances performance and execution time. We evaluate both models in comprehensive experiments on 7 datasets, with 4 target FR models and against 10 state-of-the-art FIQA techniques with highly encouraging results. The source code will be made publicly available.
A positivity-preserving implicit-explicit scheme with high order polynomial basis for compressible Navier-Stokes equations
Abstract
In this paper, we are interested in constructing a scheme for solving compressible Navier--Stokes equations, with desired properties including high order spatial accuracy, conservation, and positivity-preservation of density and internal energy under a standard hyperbolic-type CFL constraint on the time step size, e.g., $\Delta t=\mathcal O(\Delta x)$. Strang splitting is used to approximate convection and diffusion operators separately. For the convection part, i.e., the compressible Euler equation, the high order accurate positivity-preserving Runge--Kutta discontinuous Galerkin method can be used. For the diffusion part, the equation of internal energy instead of the total energy is considered, and a first order semi-implicit time discretization is used for the ease of achieving positivity. A suitable interior penalty discontinuous Galerkin method for the stress tensor can ensure the conservation of momentum and total energy for any high order polynomial basis. In particular, positivity can be proven with $\Delta t=\mathcal{O}(\Delta x)$ if the Laplacian operator of internal energy is approximated by the $\mathbb{Q}^k$ spectral element method with $k=1,2,3$. So the full scheme with $\mathbb{Q}^k$ ($k=1,2,3$) basis is conservative and positivity-preserving with $\Delta t=\mathcal{O}(\Delta x)$, which is robust for demanding problems such as solutions with low density and low pressure induced by high-speed shock diffraction. Even though the full scheme is only first order accurate in time, numerical tests indicate that higher order polynomial bases produce much better numerical solutions, e.g., better resolution for capturing the roll-ups during shock reflection.
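For concreteness, writing $\mathcal{S}_C^{\tau}$ for the convection (compressible Euler) solve and $\mathcal{S}_D^{\tau}$ for the diffusion solve over a step of size $\tau$, one standard arrangement of Strang splitting is

$$U^{n+1} \;=\; \mathcal{S}_D^{\Delta t/2}\, \mathcal{S}_C^{\Delta t}\, \mathcal{S}_D^{\Delta t/2}\, U^{n},$$

which is formally second order in time; as the abstract notes, the first-order semi-implicit treatment of the diffusion part limits the full scheme to first-order temporal accuracy.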
Comprehensive Dataset of Synthetic and Manipulated Overhead Imagery for Development and Evaluation of Forensic Tools
Authors: Brandon B. May, Kirill Trapeznikov, Shengbang Fang, Matthew C. Stamm
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
Abstract
We present a first-of-its-kind dataset of overhead imagery for the development and evaluation of forensic tools. Our dataset consists of real, fully synthetic, and partially manipulated overhead imagery generated from a custom diffusion model trained on two different zoom levels and two sources of pristine data. We developed our model to support controllable generation across multiple manipulation categories, including fully synthetic imagery conditioned on real and generated base maps and on location. We also support partially in-painted imagery with the same conditioning options and with several types of manipulated content. The data consist of raw images and ground truth annotations describing the manipulation parameters. We also report benchmark performance on several tasks supported by our dataset, including detection of fully and partially manipulated imagery, manipulation localization, and classification.
Text-guided High-definition Consistency Texture Model
Authors: Zhibin Tang, Tiantong He
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
With the advent of depth-to-image diffusion models, text-guided generation, editing, and transfer of realistic textures are no longer difficult. However, due to the limitations of pre-trained diffusion models, they can only create low-resolution, inconsistent textures. To address this issue, we present the High-definition Consistency Texture Model (HCTM), a novel method that can generate high-definition and consistent textures for 3D meshes according to the text prompts. We achieve this by leveraging a pre-trained depth-to-image diffusion model to generate single viewpoint results based on the text prompt and a depth map. We fine-tune the diffusion model with Parameter-Efficient Fine-Tuning to quickly learn the style of the generated result, and apply the multi-diffusion strategy to produce high-resolution and consistent results from different viewpoints. Furthermore, we propose a strategy that prevents the appearance of noise on the textures caused by backpropagation. Our proposed approach has demonstrated promising results in generating high-definition and consistent textures for 3D meshes, as demonstrated through a series of experiments.
iEdit: Localised Text-guided Image Editing with Weak Supervision
Authors: Rumeysa Bodur, Erhan Gundogdu, Binod Bhattarai, Tae-Kyun Kim, Michael Donoser, Loris Bazzani
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Diffusion models (DMs) can generate realistic images with text guidance using large-scale datasets. However, they demonstrate limited controllability in the output space of the generated images. We propose a novel learning method for text-guided image editing, namely \texttt{iEdit}, that generates images conditioned on a source image and a textual edit prompt. As a fully-annotated dataset with target images does not exist, previous approaches perform subject-specific fine-tuning at test time or adopt contrastive learning without a target image, leading to issues on preserving the fidelity of the source image. We propose to automatically construct a dataset derived from LAION-5B, containing pseudo-target images with their descriptive edit prompts given input image-caption pairs. This dataset gives us the flexibility of introducing a weakly-supervised loss function to generate the pseudo-target image from the latent noise of the source image conditioned on the edit prompt. To encourage localised editing and preserve or modify spatial structures in the image, we propose a loss function that uses segmentation masks to guide the editing during training and optionally at inference. Our model is trained on the constructed dataset with 200K samples and constrained GPU resources. It shows favourable results against its counterparts in terms of image fidelity, CLIP alignment score and qualitatively for editing both generated and real images.
Relightify: Relightable 3D Faces from a Single Image via Diffusion Models
Abstract
Following the remarkable success of diffusion models on image generation, recent works have also demonstrated their impressive ability to address a number of inverse problems in an unsupervised way, by properly constraining the sampling process based on a conditioning input. Motivated by this, in this paper, we present the first approach to use diffusion models as a prior for highly accurate 3D facial BRDF reconstruction from a single image. We start by leveraging a high-quality UV dataset of facial reflectance (diffuse and specular albedo and normals), which we render under varying illumination settings to simulate natural RGB textures and, then, train an unconditional diffusion model on concatenated pairs of rendered textures and reflectance components. At test time, we fit a 3D morphable model to the given image and unwrap the face in a partial UV texture. By sampling from the diffusion model, while retaining the observed texture part intact, the model inpaints not only the self-occluded areas but also the unknown reflectance components, in a single sequence of denoising steps. In contrast to existing methods, we directly acquire the observed texture from the input image, thus, resulting in more faithful and consistent reflectance estimation. Through a series of qualitative and quantitative comparisons, we demonstrate superior performance in both texture completion as well as reflectance reconstruction tasks.
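The sampling strategy described here (keep the observed UV texture fixed while denoising fills in the rest) follows the familiar masked reverse-diffusion pattern. In the sketch below, `denoise_step` and `forward_noise` are stand-in stubs, not the authors' model:

```python
import numpy as np

def masked_diffusion_inpaint(x_obs, mask, T, denoise_step, forward_noise, rng):
    """Sketch of inpainting by diffusion: at every reverse step, the known
    region (mask == 1) is overwritten with a forward-noised copy of the
    observation, so only the unknown region is actually generated."""
    x = rng.normal(size=x_obs.shape)               # start from pure noise
    for t in reversed(range(T)):
        x = denoise_step(x, t)                     # model's reverse step
        x_known = forward_noise(x_obs, t)          # observation at noise level t
        x = mask * x_known + (1 - mask) * x
    return x

# Stub dynamics so the sketch runs end-to-end; a real DDPM replaces these.
rng = np.random.default_rng(0)
denoise = lambda x, t: 0.9 * x
noise_to = lambda x, t: x + 0.01 * t * rng.normal(size=x.shape)
out = masked_diffusion_inpaint(np.ones((8, 8)), np.eye(8), T=50,
                               denoise_step=denoise, forward_noise=noise_to, rng=rng)
print(out.shape)
```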
Evaluating Twitter's Algorithmic Amplification of Low-Trust Content: An Observational Study
Abstract
Artificial intelligence powered recommender systems play a crucial role in determining the content that users are exposed to on social media platforms. However, the behavioural patterns of these systems are often opaque and non-replicable, complicating the evaluation of their impact on the dissemination and consumption of disinformation and misinformation. To begin addressing this evidence gap, this study presents a measurement approach that uses observed digital traces to infer the current status of algorithmic amplification of low-trust content on Twitter. Using an original dataset of 2.7 million posts on COVID-19 and climate change published on the platform in a 14-day period in January 2023, this study identifies tweets sharing information from domains rated as low-trust by IffyNews (n=74,467), and for each author, it compares the impressions of tweets containing low-trust information against an impressions benchmark generated from the author's timeline. To minimise the influence of factors traditionally understood to influence impressions, the data is grouped by tweet author, and the sample is stratified twice, controlling for the effect of tweet-level sentiment and engagement. Through this approach, it is possible to assess whether, for any given user, Twitter's recommender system amplifies the impressions - and as a consequence, the visibility - of potentially misleading information regardless of critical confounding factors. This analysis provides observational evidence on whether the Twitter algorithm favours the visibility of low-trust content, with results indicating that tweets containing low-trust URL domains perform significantly better than tweets that do not, across all sample stratifications. This suggests that in its current form, Twitter's recommender system may be facilitating the diffusion of false content, even when originating from notoriously low-trust sources.
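The per-author benchmarking step, before the sentiment and engagement stratification, can be pictured with a small pandas sketch; the column names and numbers are hypothetical:

```python
import pandas as pd

# Hypothetical schema: one row per tweet.
df = pd.DataFrame({
    "author":      ["a", "a", "a", "b", "b", "b"],
    "impressions": [120, 80, 400, 50, 60, 300],
    "low_trust":   [False, False, True, False, False, True],
})

# Benchmark: each author's mean impressions over non-low-trust tweets.
bench = (df[~df.low_trust]
         .groupby("author", as_index=False)["impressions"].mean()
         .rename(columns={"impressions": "benchmark"}))

# Compare each low-trust tweet against its author's own benchmark.
lt = df[df.low_trust].merge(bench, on="author")
lt["amplification"] = lt["impressions"] / lt["benchmark"]
print(lt[["author", "impressions", "benchmark", "amplification"]])
```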
Crank-Nicolson schemes for sub-diffusion equations with nonsingular and singular source terms in time
Abstract
In this work, two Crank-Nicolson schemes without corrections are developed for sub-diffusion equations. First, we propose a Crank-Nicolson scheme without correction for problems with regularity assumptions only on the source term. Second, since existing Crank-Nicolson schemes suffer a severe reduction of convergence order when solving sub-diffusion equations with source terms singular in time, we extend our scheme and propose a new Crank-Nicolson scheme for such problems. Second-order error estimates for both Crank-Nicolson schemes are rigorously established by a Laplace transform technique and are verified numerically on several examples.
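For reference, the model problem behind such schemes is the sub-diffusion equation with a Caputo derivative of order $\alpha \in (0,1)$,

$$\partial_t^{\alpha} u - \Delta u = f, \qquad \partial_t^{\alpha} u(t) \;=\; \frac{1}{\Gamma(1-\alpha)} \int_0^{t} (t-s)^{-\alpha}\, u'(s)\, \mathrm{d}s,$$

where the singular case refers to source terms $f$ that blow up as $t \to 0^{+}$; the paper's precise regularity assumptions may differ from this generic statement.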
Keyword: dynamic
QF-Geo: Capacity Aware Geographic Routing using Bounded Regions of Wireless Meshes
Authors: Yung-Fu Chen, Kenneth W. Parker, Anish Arora
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
Routing in wireless meshes must detour around holes. Extant routing protocols often underperform in minimally connected networks where holes are larger and more frequent. Minimal density networks are common in practice due to deployment cost constraints, mobility dynamics, and/or adversarial jamming. Protocols that use global search to determine optimal paths incur search overhead that limits scaling. Conversely, protocols that use local search tend to find approximately optimal paths at higher densities due to the existence of geometrically direct routes but underperform as the connectivity lowers and regional (or global) information is required to address holes. Designing a routing protocol to achieve high throughput-latency performance across network densities, mobility, and interference dynamics remains challenging. This paper shows that, in a probabilistic setting, bounded exploration can be leveraged to mitigate this challenge. We show, first, that the length of shortest paths in networks with uniform random node distribution can, with high probability (whp), be bounded. Thus, whp a shortest path may be found by limiting exploration to an elliptic region whose size is a function of the network density and the Euclidean distance between the two endpoints. Second, we propose a geographic routing protocol that achieves high reliability and throughput-latency performance by forwarding packets within an ellipse whose size is bounded similarly and by an estimate of the available capacity. Our protocol, QF-Geo, selects forwarding relays within the elliptic region, prioritizing those with sufficient capacity to avoid bottlenecks. Our simulation results show that QF-Geo achieves high goodput efficiency and reliability in both static and mobile networks across both low and high densities, at large scales, with a wide range of concurrent flows, and in the presence of adversarial jamming.
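The geometric test behind the bounded exploration region is simple: a relay q lies inside the ellipse with foci at source s and destination d iff the sum of its distances to the endpoints is at most the major-axis length. A minimal sketch, where the slack factor is a made-up stand-in for the paper's density-dependent whp bound:

```python
import math

def in_forwarding_ellipse(q, s, d, slack=1.3):
    """Keep relay q iff dist(s,q) + dist(q,d) <= slack * dist(s,d).
    `slack` (>1) stands in for the density-dependent whp bound on the
    shortest-path stretch derived in the paper."""
    dist = lambda p, r: math.hypot(p[0] - r[0], p[1] - r[1])
    return dist(s, q) + dist(q, d) <= slack * dist(s, d)

print(in_forwarding_ellipse((5, 1), (0, 0), (10, 0)))   # True
print(in_forwarding_ellipse((5, 8), (0, 0), (10, 0)))   # False
```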
Constrained reaction wheel desaturation and attitude control of spacecraft with four reaction wheels
Authors: Miguel Castroviejo-Fernandez, Ilya Kolmanovsky
Abstract
The paper addresses a problem of constrained spacecraft attitude stabilization with simultaneous reaction wheel (RW) desaturation. The spacecraft has a reaction wheel array (RWA) consisting of four RWs in a pyramidal configuration. The developments exploit a spacecraft dynamics model with gravity gradient torques. The linearized dynamics are shown to be controllable at almost all RWA configurations. Configurations that result in the highest Degree of Controllability are elucidated. A strategy that combines an incremental reference governor and time-distributed model predictive control is proposed to perform constrained RW desaturation at low computational cost. Simulation results of successful RW desaturation maneuvers subject to spacecraft pointing constraints, RW zero-speed crossing avoidance and limits on control moments are reported.
Constant Approximation for Network Revenue Management with Markovian-Correlated Customer Arrivals
Authors: Jiashuo Jiang
Subjects: Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC)
Abstract
The Network Revenue Management (NRM) problem is a well-known challenge in dynamic decision-making under uncertainty. In this problem, fixed resources must be allocated to serve customers over a finite horizon, while customers arrive according to a stochastic process. The typical NRM model assumes that customer arrivals are independent over time. In this paper, we explore a more general setting where customer arrivals over different periods can be correlated. We propose a new model that assumes the existence of a system state, which determines customer arrivals for the current period and evolves over time according to a time-inhomogeneous Markov chain. Our model can represent correlation in various settings and synthesizes previous literature on correlation models. To solve the NRM problem under our correlated model, we derive a new linear programming (LP) approximation of the optimal policy. Our approximation provides a tighter upper bound on the total expected value collected by the optimal policy than existing upper bounds. We use our LP to develop a new bid price policy, which computes bid prices for each system state and time period in a backward induction manner; the decision is then made by comparing the reward of the customer against the associated bid prices. Our policy is guaranteed to collect at least a $1/(1+L)$ fraction of the total reward collected by the optimal policy, where $L$ denotes the maximum number of resources required by a customer. In summary, our work presents a new model for correlated customer arrivals in the NRM problem, provides an LP approximation for solving the problem under this model, and derives a new bid price policy with a theoretical performance guarantee.
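In bid-price form (notation ours, not the paper's), with $\lambda_i(s,t)$ denoting the bid price of resource $i$ in system state $s$ at period $t$, a customer with reward $r$ and resource-usage vector $a$ over $m$ resources is accepted iff

$$r \;\ge\; \sum_{i=1}^{m} a_i\, \lambda_i(s,t),$$

and the stated guarantee reads $\mathbb{E}[\mathrm{ALG}] \ge \tfrac{1}{1+L}\,\mathbb{E}[\mathrm{OPT}]$, where $L$ bounds the number of resources any single customer uses.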
MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks
Authors: Seah Kim, Hasan Genc, Vadim Vadimovich Nikiforov, Krste Asanović, Borivoje Nikolić, Yakun Sophia Shao
Abstract
Driven by the wide adoption of deep neural networks (DNNs) across different application domains, multi-tenancy execution, where multiple DNNs are deployed simultaneously on the same hardware, has been proposed to satisfy the latency requirements of different applications while improving the overall system utilization. However, multi-tenancy execution could lead to undesired system-level resource contention, causing quality-of-service (QoS) degradation for latency-critical applications. To address this challenge, we propose MoCA, an adaptive multi-tenancy system for DNN accelerators. Unlike existing solutions that focus on compute resource partition, MoCA dynamically manages shared memory resources of co-located applications to meet their QoS targets. Specifically, MoCA leverages the regularities in both DNN operators and accelerators to dynamically modulate memory access rates based on their latency targets and user-defined priorities so that co-located applications get the resources they demand without significantly starving their co-runners. We demonstrate that MoCA improves the satisfaction rate of the service level agreement (SLA) up to 3.9x (1.8x average), system throughput by 2.3x (1.7x average), and fairness by 1.3x (1.2x average), compared to prior work.
Computational Optics for Mobile Terminals in Mass Production
Abstract
Correcting the optical aberrations and the manufacturing deviations of cameras is a challenging task. Due to volume limitations and the demands of mass production, existing mobile terminals cannot rectify optical degradation. In this work, we systematically construct a perturbed lens system model to illustrate the relationship between deviated system parameters and the spatial frequency response (SFR) measured from photographs. To further address this issue, an optimization framework is proposed based on this model to build proxy cameras from the machining samples' SFRs. Engaging with the proxy cameras, we synthesize data pairs, which encode the optical aberrations and the random manufacturing biases, for training learning-based algorithms. In correcting aberration, although convolutional neural networks have recently shown promising results, they are hard to generalize to stochastic machining biases. Therefore, we propose a dilated omni-dimensional dynamic convolution and implement it in post-processing to account for the manufacturing degradation. Extensive experiments evaluating multiple samples of two representative devices demonstrate that the proposed optimization framework accurately constructs the proxy camera, and that the dynamic processing model adapts well to the manufacturing deviations of different cameras, realizing perfect computational photography. The evaluation shows that the proposed method bridges the gap between optical design, system machining, and the post-processing pipeline, shedding light on the joint design of image signal reception (lens and sensor) and image signal processing.
Mixture of personality improved Spiking actor network for efficient multi-agent cooperation
Abstract
Adaptive human-agent and agent-agent cooperation are becoming more and more critical in multi-agent reinforcement learning (MARL), where remarkable progress has been made with the help of deep neural networks. However, many established algorithms perform well only during the learning paradigm and exhibit poor generalization when cooperating with unseen partners. Personality theory in cognitive psychology describes how humans handle this cooperation challenge by first predicting others' personalities and then their complex actions. Inspired by this two-step psychology theory, we propose a biologically plausible mixture of personality (MoP) improved spiking actor network (SAN), whereby a determinantal point process is used to simulate the complex formation and integration of different personality types in the MoP, and dynamic and spiking neurons are incorporated into the SAN for efficient reinforcement learning. The benchmark Overcooked task, which requires strong cooperative cooking, is selected to test the proposed MoP-SAN. The experimental results show that MoP-SAN achieves high performance not only in the learning paradigm but also in the generalization test (i.e., cooperation with unseen agents), where most counterpart deep actor networks fail. Ablation experiments and visualization analyses explain why MoP and SAN are effective in multi-agent reinforcement learning scenarios while DNNs perform poorly in the generalization test.
Collaborative Learning-Based Scheduling for Kubernetes-Oriented Edge-Cloud Network
Abstract
Kubernetes (k8s) has the potential to coordinate distributed edge resources and centralized cloud resources, but it currently lacks a specialized scheduling framework for edge-cloud networks. Besides, the hierarchical distribution of heterogeneous resources makes the modeling and scheduling of k8s-oriented edge-cloud networks particularly challenging. In this paper, we introduce KaiS, a learning-based scheduling framework for such edge-cloud networks that aims to improve the long-term throughput rate of request processing. First, we design a coordinated multi-agent actor-critic algorithm to cater to decentralized request dispatch and dynamic dispatch spaces within the edge cluster. Second, for diverse system scales and structures, we use graph neural networks to embed system state information, and combine the embedding results with multiple policy networks to reduce the orchestration dimensionality through stepwise scheduling. Finally, we adopt a two-time-scale scheduling mechanism to harmonize request dispatch and service orchestration, and present an implementation design that deploys the above algorithms compatibly with native k8s components. Experiments using real workload traces show that KaiS can successfully learn appropriate scheduling policies, irrespective of request arrival patterns and system scales. Moreover, KaiS can enhance the average system throughput rate by 15.9% while reducing scheduling cost by 38.4% compared to baselines.
MDD-Enabled Two-Tier Terahertz Fronthaul in Indoor Industrial Cell-Free Massive MIMO
Authors: Bohan Li, Diego Dupleich, Guoqing Xia, Huiyu Zhou, Yue Zhang, Pei Xiao, Lie-Liang Yang
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
To make indoor industrial cell-free massive multiple-input multiple-output (CF-mMIMO) networks free from wired fronthaul, this paper studies a multicarrier-division duplex (MDD)-enabled two-tier terahertz (THz) fronthaul scheme. More specifically, the two layers of fronthaul links rely on mutually orthogonal subcarrier sets in the same THz band, while access links are implemented over the sub-6G band. The proposed scheme leads to a complicated mixed-integer nonconvex optimization problem incorporating access point (AP) clustering, device selection, the assignment of subcarrier sets between the two fronthaul links, and the resource allocation at both the central processing unit (CPU) and the APs. To address the formulated problem, we first resort to low-complexity but efficient heuristic methods, thereby relaxing the binary variables. Then, the overall end-to-end rate is obtained by iteratively optimizing the assignment of subcarrier sets and the number of AP clusters. Furthermore, an advanced MDD frame structure consisting of three parallel data streams is tailored for the proposed scheme. Simulation results demonstrate the effectiveness of the proposed dynamic AP clustering approach in dealing with varying network sizes. Moreover, benefiting from the well-designed frame structure, MDD is capable of outperforming TDD in the two-tier fronthaul networks. Additionally, the effect of the THz bandwidth on system performance is analyzed, and it is shown that with sufficient frequency resources, our proposed two-tier fully-wireless fronthaul scheme can achieve performance comparable to fiber-optic based systems. Finally, the superiority of the proposed MDD-enabled fronthaul scheme is verified in a practical scenario with realistic ray-tracing simulations.
Generating medically-accurate summaries of patient-provider dialogue: A multi-stage approach using large language models
Abstract
A medical provider's summary of a patient visit serves several critical purposes, including clinical decision-making, facilitating hand-offs between providers, and as a reference for the patient. An effective summary is required to be coherent and accurately capture all the medically relevant information in the dialogue, despite the complexity of patient-generated language. Even minor inaccuracies in visit summaries (for example, summarizing "patient does not have a fever" when a fever is present) can be detrimental to the outcome of care for the patient. This paper tackles the problem of medical conversation summarization by discretizing the task into several smaller dialogue-understanding tasks that are sequentially built upon. First, we identify medical entities and their affirmations within the conversation to serve as building blocks. We study dynamically constructing few-shot prompts for tasks by conditioning on relevant patient information and use GPT-3 as the backbone for our experiments. We also develop GPT-derived summarization metrics to measure performance against reference summaries quantitatively. Both our human evaluation study and metrics for medical correctness show that summaries generated using this approach are clinically accurate and outperform the baseline approach of summarizing the dialog in a zero-shot, single-prompt setting.
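To make the staged, dynamically constructed prompting concrete, here is an illustrative sketch; the template, field names, and exemplar format are ours, not the paper's:

```python
def build_summary_prompt(entities, dialogue, exemplars, k=2):
    """Illustrative prompt builder: condition the summarization request on
    medical entities (and their affirmation status) extracted in earlier
    stages, plus k exemplars relevant to this patient."""
    shots = "\n\n".join(
        f"Dialogue: {ex['dialogue']}\nEntities: {ex['entities']}\n"
        f"Summary: {ex['summary']}"
        for ex in exemplars[:k]
    )
    entity_str = "; ".join(f"{e} ({status})" for e, status in entities)
    return (f"{shots}\n\nDialogue: {dialogue}\n"
            f"Entities: {entity_str}\nSummary:")

exemplars = [{"dialogue": "...", "entities": "fever (affirmed)",
              "summary": "Patient reports fever."}]
print(build_summary_prompt([("fever", "negated")], "...", exemplars))
```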
Integrated Access and Backhaul in 5G with Aerial Distributed Unit using OpenAirInterface
Authors: Rakesh Mundlamuri, Omid Esrafilian, Rajeev Gangula, Rohan Kharade, Cedric Roux, Florian Kaltenberger, Raymond Knopp, David Gesbert
Subjects: Information Theory (cs.IT); Robotics (cs.RO)
Abstract
In this work, we demonstrate the Integrated Access and Backhaul (IAB) capabilities of an aerial robot offering 5G connectivity to ground users. The robot is integrated with a distributed unit (DU) and has 5G wireless backhaul access to a terrestrial central unit (CU). The CU-DU interface fully complies with the 3GPP-defined F1 application protocol (F1AP). Such aerial robots can be instantiated and configured dynamically, tailored to network demands. The complete radio and access network solution is based on open-source software from OpenAirInterface and off-the-shelf commercial 5G mobile terminals. Experimental results illustrate the throughput gains, coverage extension, and dynamic adaptability of the aerial DU.
On Riccati contraction in time-varying linear-quadratic control
Authors: Jintao Sun, Michael Cantoni
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract
Contraction properties of the Riccati operator are studied within the context of non-stationary linear-quadratic optimal control. A lifting approach is used to obtain a bound on the rate of strict contraction, with respect to the Riemannian metric, across a sufficient number of iterations. This number of iterations is related to an assumed uniform controllability and observability property of the dynamics and stage-cost in the original formulation of the problem.
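For reference (our indexing), the Riccati operator iterated here is the standard time-varying discrete-time map with stage data $(A_k, B_k, Q_k, R_k)$,

$$\mathcal{R}_k(P) \;=\; Q_k + A_k^{\top} P A_k - A_k^{\top} P B_k \left(R_k + B_k^{\top} P B_k\right)^{-1} B_k^{\top} P A_k,$$

and the contraction result concerns compositions $\mathcal{R}_{k+N-1} \circ \cdots \circ \mathcal{R}_k$ in the Riemannian metric on positive definite matrices, with the horizon $N$ tied to the assumed uniform controllability and observability properties.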
Safe motion planning with environment uncertainty
Authors: Antony Thomas, Fulvio Mastrogiovanni, Marco Baglietto
Abstract
We present an approach for safe motion planning under robot state and environment (obstacle and landmark location) uncertainties. To this end, we first develop an approach that accounts for landmark uncertainties during robot localization. Existing planning approaches assume that the landmark locations are known exactly or with little uncertainty, which might not hold in practice: noisy sensors and imperfect motions compound the errors originating from estimates of environment features, and possible occlusions and dynamic objects in the environment further impair landmark estimation. Consequently, ignoring this uncertainty can wrongly localize the robot, leading to inefficient plans. Our approach therefore incorporates the landmark uncertainty within the Bayes filter estimation framework. We also analyze the effect of considering this uncertainty and delineate the conditions under which it can be ignored. Second, we extend the state of the art by computing an exact expression for the collision probability under Gaussian distributed robot motion, perception, and obstacle location uncertainties. We formulate the collision probability as a quadratic form in random variables; under Gaussian distribution assumptions, an exact expression for the collision probability is obtained which is computable in real time. In contrast, existing approaches approximate the collision probability using upper bounds that can lead to overly conservative estimates and thereby suboptimal plans. We demonstrate and evaluate our approach using a theoretical example and simulations, and present a comparison to different state-of-the-art methods.
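In the quadratic-form view (our notation), with the relative robot-obstacle position $\mathbf{v} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma)$ aggregating motion, perception, and obstacle-location uncertainty, and $r$ a combined safety radius, the collision probability is

$$P_{\mathrm{coll}} \;=\; \Pr\!\left(\mathbf{v}^{\top}\mathbf{v} \le r^{2}\right),$$

the CDF of a quadratic form in Gaussian variables (a generalized noncentral chi-squared distribution), for which exact series expressions are computable in real time, in contrast to the conservative upper bounds of prior approaches.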
Toward Open Integrated Access and Backhaul with O-RAN
Abstract
Millimeter wave (mmWave) communication has recently been standardized for use in the fifth generation (5G) of cellular networks, fulfilling the promise of multi-gigabit mobile throughput for current and future mobile radio network generations. In this context, the network densification required to overcome the difficult mmWave propagation will result in increased deployment costs. Integrated Access and Backhaul (IAB) has been proposed as an effective means of reducing densification costs by deploying a wireless mesh network of base stations, where backhaul and access transmissions share the same radio technology. However, IAB requires sophisticated control mechanisms to operate efficiently and address the increased complexity. The Open Radio Access Network (RAN) paradigm represents the ideal enabler of RAN intelligent control, but its current specifications are not compatible with IAB. In this work, we discuss the challenges of integrating IAB into the Open RAN ecosystem, detailing the required architectural extensions that will enable dynamic control of 5G IAB networks. We implement the proposed integrated architecture in the first publicly available Open-RAN-enabled experimental framework, which allows prototyping and testing of Open-RAN-based solutions over end-to-end 5G IAB networks. Finally, we validate the framework in both ideal and realistic deployment scenarios, exploiting the large-scale testing capabilities of publicly available experimental platforms.
A Classification of Feedback Loops and Their Relation to Biases in Automated Decision-Making Systems
Authors: Nicolò Pagan, Joachim Baumann, Ezzat Elokda, Giulia De Pasquale, Saverio Bolognani, Anikó Hannák
Subjects: Computers and Society (cs.CY); Machine Learning (cs.LG)
Abstract
Prediction-based decision-making systems are becoming increasingly prevalent in various domains. Previous studies have demonstrated that such systems are vulnerable to runaway feedback loops, e.g., when police are repeatedly sent back to the same neighborhoods regardless of the actual rate of criminal activity, which exacerbates existing biases. In practice, automated decisions have dynamic feedback effects on the system itself that can perpetuate over time, making it difficult for short-sighted design choices to control the system's evolution. While researchers have started proposing longer-term solutions to prevent adverse outcomes (such as bias towards certain groups), these interventions largely depend on ad hoc modeling assumptions, and a rigorous theoretical understanding of the feedback dynamics in ML-based decision-making systems is currently missing. In this paper, we use the language of dynamical systems theory, a branch of applied mathematics that deals with the analysis of the interconnection of systems with dynamic behaviors, to rigorously classify the different types of feedback loops in the ML-based decision-making pipeline. By reviewing existing scholarly work, we show that this classification covers many examples discussed in the algorithmic fairness community, thereby providing a unifying and principled framework for studying feedback loops. Through qualitative analysis, and through a simulation example of recommender systems, we show which specific types of ML biases are affected by each type of feedback loop. We find that the existence of feedback loops in the ML-based decision-making pipeline can perpetuate, reinforce, or even reduce ML biases.
XMI-ICU: Explainable Machine Learning Model for Pseudo-Dynamic Prediction of Mortality in the ICU for Heart Attack Patients
Authors: Munib Mesinovic, Peter Watkinson, Tingting Zhu
Abstract
Heart attacks remain one of the greatest contributors to mortality in the United States and globally. Patients admitted to the intensive care unit (ICU) with a diagnosed heart attack (myocardial infarction, or MI) are at higher risk of death. In this study, we use two retrospective cohorts, extracted from the eICU and MIMIC-IV databases, to develop a novel pseudo-dynamic machine learning framework for mortality prediction in the ICU with interpretability and clinical risk analysis. The method provides accurate predictions for ICU patients up to 24 hours before the event and provides time-resolved interpretability results. The performance of the framework, which relies on extreme gradient boosting, was evaluated on a held-out test set from eICU and externally validated on the MIMIC-IV cohort using the most important features identified by time-resolved Shapley values, achieving an AUC of 91.0 (balanced accuracy of 82.3) for 6-hour prediction of mortality. We show that our framework successfully leverages time-series physiological measurements by translating them into stacked static prediction problems, remains robustly predictive throughout the ICU stay, and can offer clinical insight from time-resolved interpretability.
Joint Falsification and Fidelity Settings Optimization for Validation of Safety-Critical Systems: A Theoretical Analysis
Abstract
Safety validation is a crucial component in the development and deployment of autonomous systems, such as self-driving vehicles and robotic systems. Ensuring safe operation necessitates extensive testing and verification of control policies, typically conducted in simulation environments. High-fidelity simulators accurately model real-world dynamics but entail high computational costs, limiting their scalability for exhaustive testing. Conversely, low-fidelity simulators offer efficiency but may not capture the intricacies of high-fidelity simulators, potentially yielding false conclusions. We propose a joint falsification and fidelity optimization framework for safety validation of autonomous systems. Our mathematical formulation combines counterexample searches with simulator fidelity improvement, facilitating more efficient exploration of the critical environmental configurations challenging the control system. Our contributions encompass a set of theorems addressing counterexample sensitivity analysis, sample complexity, convergence, the interplay between the outer and inner optimization loops, and regret bound analysis. The proposed joint optimization approach enables a more targeted and efficient testing process, optimizes the use of available computational resources, and enhances confidence in autonomous system safety validation.
FedDWA: Personalized Federated Learning with Online Weight Adjustment
Abstract
Different from conventional federated learning, personalized federated learning (PFL) is able to train a customized model for each individual client according to its unique requirements. The mainstream approach is to adopt a weighted aggregation method to generate personalized models, in which weights are determined by the loss values or model parameters of different clients. However, such methods require clients to download others' models, which not only sharply increases communication traffic but also potentially infringes data privacy. In this paper, we propose a new PFL algorithm called \emph{FedDWA (Federated Learning with Dynamic Weight Adjustment)} to address this problem, which leverages the parameter server (PS) to compute personalized aggregation weights based on models collected from clients. In this way, FedDWA can capture similarities between clients with much less communication overhead. More specifically, we formulate the PFL problem as an optimization that minimizes the distance between personalized models and guidance models, so as to customize aggregation weights for each client. The guidance models are obtained by one-step-ahead local adaptation on individual clients. Finally, we conduct extensive experiments on five real datasets and the results demonstrate that FedDWA significantly reduces communication traffic and achieves higher model accuracy than state-of-the-art approaches.
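As a rough illustration of server-side weight computation in this spirit (the actual FedDWA weights come from the optimization above; the softmax-over-distances rule below is our stand-in):

```python
import numpy as np

def personalized_weights(models, guidance, temp=1.0):
    """For each client i, weight every client j's model by how close it is
    to client i's guidance model, then aggregate. `models` and `guidance`
    are (n_clients, n_params) arrays of flattened parameters."""
    d2 = ((guidance[:, None, :] - models[None, :, :]) ** 2).sum(-1)  # (n, n)
    w = np.exp(-d2 / temp)
    w /= w.sum(axis=1, keepdims=True)              # rows sum to 1
    return w @ models                              # one personalized model per client

rng = np.random.default_rng(0)
models = rng.normal(size=(4, 10))
guidance = models + 0.1 * rng.normal(size=(4, 10))
print(personalized_weights(models, guidance).shape)  # (4, 10)
```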
Learning Semi-supervised Gaussian Mixture Models for Generalized Category Discovery
Authors: Bingchen Zhao, Xin Wen, Kai Han
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this paper, we address the problem of generalized category discovery (GCD): given a set of images of which part are labelled and the rest are not, the task is to automatically cluster the images in the unlabelled data, leveraging the information from the labelled data, while the unlabelled data contain images from the labelled classes as well as new ones. GCD is similar to semi-supervised learning (SSL) but is more realistic and challenging, as SSL assumes all the unlabelled images are from the same classes as the labelled ones. We also do not assume that the class number in the unlabelled data is known a priori, making the GCD problem even harder. To tackle GCD without knowing the class number, we propose an EM-like framework that alternates between representation learning and class number estimation. We propose a semi-supervised variant of the Gaussian Mixture Model (GMM) with a stochastic splitting and merging mechanism to dynamically determine the prototypes by examining cluster compactness and separability. With these prototypes, we leverage prototypical contrastive learning for representation learning on the partially labelled data, subject to the constraints imposed by the labelled data. Our framework alternates between these two steps until convergence. The cluster assignment for an unlabelled instance can then be retrieved by identifying its nearest prototype. We comprehensively evaluate our framework on both generic image classification datasets and challenging fine-grained object recognition datasets, achieving state-of-the-art performance.
Conversational Semantic Parsing using Dynamic Context Graphs
Authors: Parag Jain, Mirella Lapata
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
In this paper we consider the task of conversational semantic parsing over general purpose knowledge graphs (KGs) with millions of entities and thousands of relation types. We are interested in developing models capable of interactively mapping user utterances into executable logical forms (e.g., SPARQL) in the context of the conversational history. Our key idea is to represent information about an utterance and its context via a subgraph which is created dynamically, i.e., the number of nodes varies per utterance. Moreover, rather than treating the subgraph as a sequence, we exploit its underlying structure and encode it using a graph neural network, which further allows us to represent a large number of (unseen) nodes. Experimental results show that modeling context dynamically is superior to static approaches, delivering performance improvements across the board (i.e., for both simple and complex questions). Our results further confirm that modeling the structure of context is better at processing discourse information (i.e., at handling ellipsis and resolving coreference) and longer interactions.
Abstract
The Multi-Object Navigation (MultiON) task requires a robot to localize an instance of each of multiple object classes. It is a fundamental task for an assistive robot in a home or a factory. Existing methods for MultiON have viewed it as a direct extension of Object Navigation (ON), the task of localising an instance of one object class, and are pre-sequenced, i.e., the sequence in which the object classes are to be explored is provided in advance. This is a strong limitation in practical applications characterized by dynamic changes. This paper describes a deep reinforcement learning framework for sequence-agnostic MultiON based on an actor-critic architecture and a suitable reward specification. Our framework leverages past experiences and seeks to reward progress toward individual as well as multiple target object classes. We use photo-realistic scenes from the Gibson benchmark dataset in the AI Habitat 3D simulation environment to experimentally show that our method performs better than a pre-sequenced approach and a state-of-the-art ON method extended to MultiON.
Parametric Dynamic Mode Decomposition for nonlinear parametric dynamical systems
Authors: Shuwen Sun, Lihong Feng, Hoon Seng Chan, Tamara Miličić, Tanja Vidaković-Koch, Peter Benner
Abstract
A non-intrusive model order reduction (MOR) method that combines features of the dynamic mode decomposition (DMD) and the radial basis function (RBF) network is proposed to predict the dynamics of parametric nonlinear systems. In many applications, we have limited access to the information of the whole system, which motivates non-intrusive model reduction. One bottleneck is capturing the dynamics of the solution without knowing the physics inside the ``black-box'' system. DMD is a powerful tool to mimic the dynamics of the system and give a reliable approximation of the solution in the time domain using only the dominant DMD modes. However, DMD cannot reproduce the parametric behavior of the dynamics. Our contribution focuses on extending DMD to parametric DMD by RBF interpolation. Specifically, an RBF network is first trained using snapshot matrices at limited parameter samples. The snapshot matrix at any new parameter sample can be quickly learned from the RBF network. DMD will use the newly generated snapshot matrix at the online stage to predict the time patterns of the dynamics corresponding to the new parameter sample. The proposed framework and algorithm are tested and validated by numerical examples including models with parametrized and time-varying inputs.
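A compact sketch of the offline/online split, using standard exact DMD and scipy's RBFInterpolator as the snapshot surrogate; the toy one-dimensional snapshots and parameter grid below are ours, for illustration only:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def dmd_predict(X, steps):
    """Exact DMD: fit x_{k+1} ~ A x_k from a snapshot matrix X (n x m)
    and roll the last snapshot forward `steps` times."""
    X1, X2 = X[:, :-1], X[:, 1:]
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    A = X2 @ Vh.conj().T @ np.diag(1.0 / s) @ U.conj().T
    preds, x = [], X[:, -1]
    for _ in range(steps):
        x = A @ x
        preds.append(x)
    return np.column_stack(preds)

# Offline: snapshot matrices at a few parameter samples, flattened for the RBF.
params = np.array([[0.5], [1.0], [1.5]])
snaps = [np.array([[np.exp(-p * k) for k in range(10)]]) for (p,) in params]
rbf = RBFInterpolator(params, np.stack([S.ravel() for S in snaps]))

# Online: learn the snapshot matrix at a new parameter, then run DMD on it.
S_new = rbf(np.array([[0.8]]))[0].reshape(1, 10)
print(dmd_predict(S_new, steps=3))
```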
Learning in a Single Domain for Non-Stationary Multi-Texture Synthesis
Abstract
This paper aims at a new generation task: non-stationary multi-texture synthesis, which unifies the synthesis of multiple non-stationary textures in a single model. Most non-stationary textures have large-scale variance and can hardly be synthesized through one model. To combat this, we propose a multi-scale generator to capture structural patterns at various scales and effectively synthesize textures at a minor cost. However, it is still hard to handle textures of different categories with different texture patterns. Therefore, we present a category-specific training strategy to focus on learning the texture patterns of a specific domain. Interestingly, once trained, our model is able to produce multi-pattern generations with dynamic variations without the need to finetune for different styles. Moreover, an objective evaluation metric is designed to assess the quality of texture expansion and global structure consistency. To our knowledge, ours is the first scheme for this challenging task, covering model, training, and evaluation. Experimental results demonstrate that the proposed method achieves superior performance and time efficiency. The code will be available after publication.
Deep Reinforcement Learning Based Resource Allocation for Cloud Native Wireless Network
Authors: Lin Wang, Jiasheng Wu, Yue Gao, Jingjing Zhang
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
Abstract
Cloud native technology has revolutionized 5G-and-beyond and 6G communication networks, offering unprecedented levels of operational automation, flexibility, and adaptability. However, the vast array of cloud native services and applications presents a new challenge for resource allocation in dynamic cloud computing environments. To tackle this challenge, we investigate a cloud native wireless architecture that employs container-based virtualization to enable flexible service deployment. We then study two representative use cases: network slicing and Multi-Access Edge Computing. To optimize resource allocation in these scenarios, we leverage deep reinforcement learning techniques and introduce two model-free algorithms capable of monitoring the network state and dynamically training allocation policies. We validate the effectiveness of our algorithms in a testbed developed using Free5gc. Our findings demonstrate significant improvements in network efficiency, underscoring the potential of our proposed techniques for unlocking the full potential of cloud native wireless networks.
HumanRF: High-Fidelity Neural Radiance Fields for Humans in Motion
Authors: Mustafa Işık, Martin Rünz, Markos Georgopoulos, Taras Khakhulin, Jonathan Starck, Lourdes Agapito, Matthias Nießner
Abstract
Representing human performance at high fidelity is an essential building block in diverse applications, such as film production, computer games, or videoconferencing. To close the gap to production-level quality, we introduce HumanRF, a 4D dynamic neural scene representation that captures full-body appearance in motion from multi-view video input and enables playback from novel, unseen viewpoints. Our novel representation acts as a dynamic video encoding that captures fine details at high compression rates by factorizing space-time into a temporal matrix-vector decomposition. This allows us to obtain temporally coherent reconstructions of human actors for long sequences, while representing high-resolution details even in the context of challenging motion. While most research focuses on synthesizing at resolutions of 4MP or lower, we address the challenge of operating at 12MP. To this end, we introduce ActorsHQ, a novel multi-view dataset that provides 12MP footage from 160 cameras for 16 sequences with high-fidelity, per-frame mesh reconstructions. We demonstrate challenges that emerge from using such high-resolution data and show that our newly introduced HumanRF effectively leverages this data, making a significant step towards production-level quality novel view synthesis.
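A minimal worked sketch of the space-time matrix-vector idea: approximate a 4D feature volume F[x, y, z, t] as a sum of R spatial volumes modulated by per-component temporal weights, F ≈ Σ_r V_r[x, y, z] · w_r[t]. This is a generic low-rank space-time factorization for illustration, not HumanRF's exact decomposition.

```python
import numpy as np

X = Y = Z = 16
T, R = 30, 8
rng = np.random.default_rng(2)

# Synthetic 4D features that are genuinely low-rank across time.
V_true = rng.normal(size=(R, X, Y, Z))
w_true = rng.normal(size=(R, T))
F = np.einsum("rxyz,rt->xyzt", V_true, w_true)

# Recover the factors via truncated SVD of the (space x time) matrix.
M = F.reshape(-1, T)                       # rows: voxels, cols: frames
U, s, Vt = np.linalg.svd(M, full_matrices=False)
V_hat = (U[:, :R] * s[:R]).reshape(X, Y, Z, R)   # spatial volumes
w_hat = Vt[:R]                                   # temporal vectors

F_hat = np.einsum("xyzr,rt->xyzt", V_hat, w_hat)
print("reconstruction error:", np.abs(F - F_hat).max())   # ~machine eps
print("compression ratio:", F.size / (V_hat.size + w_hat.size))
```

Storing R spatial factors plus R length-T temporal vectors instead of T full volumes is what yields the high compression rates and temporally coherent reconstructions the abstract describes.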
Keyword: efficient
Parallel External Sorting of ASCII Records Using Learned Models
$2 * n$ is better than $n^2$: Decomposing Event Coreference Resolution into Two Tractable Problems
DOCTOR: A Multi-Disease Detection Continual Learning Framework Based on Wearable Medical Sensors
Graph-Based Reductions for Parametric and Weighted MDPs
Multi-Object Self-Supervised Depth Denoising
Stochastic Texture Filtering
Sketching the Future (STF): Applying Conditional Control Techniques to Text-to-Video Models
Singularity swapping method for nearly singular integrals based on trapezoidal rule
Hybrid hyperinterpolation over general regions
Finding Meaningful Distributions of ML Black-boxes under Forensic Investigation
P4SGD: Programmable Switch Enhanced Model-Parallel Training on Generalized Linear Models on Distributed FPGAs
RNNS: Representation Nearest Neighbor Search Black-Box Attack on Code Models
Mixture of personality improved Spiking actor network for efficient multi-agent cooperation
Text-guided High-definition Consistency Texture Model
Revisiting Fully Homomorphic Encryption Schemes
Robust multi-agent coordination via evolutionary generation of auxiliary adversarial attackers
Fast Distributed Inference Serving for Large Language Models
Fast Event-based Double Integral for Real-time Robotics
Multi-Path Transformer is Better: A Case Study on Neural Machine Translation
MDD-Enabled Two-Tier Terahertz Fronthaul in Indoor Industrial Cell-Free Massive MIMO
Improving the performance of classical linear algebra iterative methods via hybrid parallelism
Safe motion planning with environment uncertainty
Stochastic Chemical Reaction Networks for MAP Detection in Cellular Receivers
The Robustness of Computer Vision Models against Common Corruptions: a Survey
Brain Tumor Detection using Swin Transformers
Scalable orthogonal delay-division multiplexed OEO artificial neural network
Blockwise Principal Component Analysis for monotone missing data imputation and dimensionality reduction
Correlation visualization under missing values: a comparison between imputation and direct parameter estimation methods
Toward Open Integrated Access and Backhaul with O-RAN
Compressing neural network by tensor network with exponentially fewer variational parameters
Computation-Efficient Backscatter-Blessed MEC with User Reciprocity
Access-Redundancy Tradeoffs in Quantized Linear Computations
Joint Falsification and Fidelity Settings Optimization for Validation of Safety-Critical Systems: A Theoretical Analysis
Feature Expansion for Graph Neural Networks
Searching Mobile App Screens via Text + Doodle
Concentric Tube Robot Redundancy Resolution via Velocity/Compliance Manipulability Optimization
Privacy-Preserving Prompt Tuning for Large Language Model Services
A Joint Python/C++ Library for Efficient yet Accessible Black-Box and Gray-Box Optimization with GOMEA
Embedded Feature Correlation Optimization with Specific Parameter Initialization for 2D/3D Registration
Uncertainty Quantification of a Wind Tunnel-Informed Stochastic Wind Load Model for Wind Engineering Applications
Pseudo-reversing and its application for multiscaling of manifold-valued data
Optimal Eventual Byzantine Agreement Protocols with Omission Failures
FedPDD: A Privacy-preserving Double Distillation Framework for Cross-silo Federated Recommendation
Vertical Federated Learning over Cloud-RAN: Convergence Analysis and System Optimization
SoGAR: Self-supervised Spatiotemporal Attention-based Social Group Activity Recognition
Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception
Korean Named Entity Recognition Based on Language-Specific Features
Prior Global Search Stability on Finite Graphs with Uncertainty. May Greedy Search Win?
Generalized Stratified Sampling for Efficient Reliability Assessment of Structures Against Natural Hazards
Non-Euclidean Motion Planning with Graphs of Geodesically-Convex Sets
RECKONING: Reasoning through Dynamic Knowledge Encoding
Keyword: faster
DOCTOR: A Multi-Disease Detection Continual Learning Framework Based on Wearable Medical Sensors
Universal Matrix Sparsifiers and Fast Deterministic Algorithms for Linear Algebra
P4SGD: Programmable Switch Enhanced Model-Parallel Training on Generalized Linear Models on Distributed FPGAs
Acceleration of FM-index Queries Through Prefix-free Parsing
Fast Distributed Inference Serving for Large Language Models
Scalable Demand-Driven Call Graph Generation for Python
A Neural Emulator for Uncertainty Estimation of Fire Propagation
Keyword: mobile
QF-Geo: Capacity Aware Geographic Routing using Bounded Regions of Wireless Meshes
Robot Gaze During Autonomous Navigation and its Effect on Social Presence
Optical Aberration Correction in Postprocessing using Imaging Simulation
Computational Optics for Mobile Terminals in Mass Production
Spectrum Breathing: Protecting Over-the-Air Federated Learning Against Interference
A Comprehensive Picture of Factors Affecting User Willingness to Use Mobile Health Applications
Integrated Access and Backhaul in 5G with Aerial Distributed Unit using OpenAirInterface
Autonomous Stabilization of Retinal Videos for Streamlining Assessment of Spontaneous Venous Pulsations
Toward Open Integrated Access and Backhaul with O-RAN
Post-training Model Quantization Using GANs for Synthetic Data Generation
Pebble guided Treasure Hunt in Plane
Computation-Efficient Backscatter-Blessed MEC with User Reciprocity
Pavlok-Nudge: A Feedback Mechanism for Atomic Behaviour Modification
Transformer-based model for monocular visual odometry: a video understanding approach
Searching Mobile App Screens via Text + Doodle
Enabling Technologies for Programmable and Software-Defined Networks: Bolstering the Path Towards 6G
Waterberry Farms: A Novel Benchmark For Informative Path Planning
Non-Euclidean Motion Planning with Graphs of Geodesically-Convex Sets
Keyword: pruning
Spectrum Breathing: Protecting Over-the-Air Federated Learning Against Interference
Keyword: voxel
Stochastic Texture Filtering
VTPNet for 3D deep learning on point cloud
Keyword: lidar
DMNR: Unsupervised De-noising of Point Clouds Corrupted by Airborne Particles
A Multi-modal Approach to Single-modal Visual Place Classification
Keyword: diffusion
DifFIQA: Face Image Quality Assessment Using Denoising Diffusion Probabilistic Models
A positivity-preserving implicit-explicit scheme with high order polynomial basis for compressible Navier-Stokes equations
Comprehensive Dataset of Synthetic and Manipulated Overhead Imagery for Development and Evaluation of Forensic Tools
Text-guided High-definition Consistency Texture Model
iEdit: Localised Text-guided Image Editing with Weak Supervision
Relightify: Relightable 3D Faces from a Single Image via Diffusion Models
Evaluating Twitter's Algorithmic Amplification of Low-Trust Content: An Observational Study
Crank-Nicolson schemes for sub-diffusion equations with nonsingular and singular source terms in time
Keyword: dynamic
QF-Geo: Capacity Aware Geographic Routing using Bounded Regions of Wireless Meshes
Constrained reaction wheel desaturation and attitude control of spacecraft with four reaction wheels
Constant Approximation for Network Revenue Management with Markovian-Correlated Customer Arrivals
MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks
Computational Optics for Mobile Terminals in Mass Production
Mixture of personality improved Spiking actor network for efficient multi-agent cooperation
Collaborative Learning-Based Scheduling for Kubernetes-Oriented Edge-Cloud Network
MDD-Enabled Two-Tier Terahertz Fronthaul in Indoor Industrial Cell-Free Massive MIMO
Generating medically-accurate summaries of patient-provider dialogue: A multi-stage approach using large language models
Integrated Access and Backhaul in 5G with Aerial Distributed Unit using OpenAirInterface
DMNR: Unsupervised De-noising of Point Clouds Corrupted by Airborne Particles
On Riccati contraction in time-varying linear-quadratic control
Safe motion planning with environment uncertainty
Toward Open Integrated Access and Backhaul with O-RAN
A Classification of Feedback Loops and Their Relation to Biases in Automated Decision-Making Systems
XMI-ICU: Explainable Machine Learning Model for Pseudo-Dynamic Prediction of Mortality in the ICU for Heart Attack Patients
Joint Falsification and Fidelity Settings Optimization for Validation of Safety-Critical Systems: A Theoretical Analysis
FedDWA: Personalized Federated Learning with Online Weight Adjustment
Learning Semi-supervised Gaussian Mixture Models for Generalized Category Discovery
Conversational Semantic Parsing using Dynamic Context Graphs
Sequence-Agnostic Multi-Object Navigation
Parametric Dynamic Mode Decomposition for nonlinear parametric dynamical systems
Learning in a Single Domain for Non-Stationary Multi-Texture Synthesis
Waterberry Farms: A Novel Benchmark For Informative Path Planning
Deep Reinforcement Learning Based Resource Allocation for Cloud Native Wireless Network
HumanRF: High-Fidelity Neural Radiance Fields for Humans in Motion