Abstract
Attackers are now using sophisticated techniques, such as polymorphism, to change the attack pattern for each new attack. Thus, the detection of novel attacks has become the biggest challenge for cyber experts and researchers. Recently, anomaly-based and hybrid approaches have been used for the detection of network attacks. Detecting novel attacks is also a key enabler for a wide range of IoT applications. Novel attacks can easily evade existing signature-based detection methods and are extremely difficult to detect, sometimes going undetected for years. Existing machine learning models have also failed to detect such attacks and have a high rate of false positives. In this paper, a rule-based deep neural network technique is proposed as a framework for addressing the problem of detecting novel attacks. The designed framework significantly improves on the respective benchmark results, including on the CICIDS 2017 dataset. The experimental results show that the proposed model keeps a good balance between attack detection, false positive rates, and false negative rates. For novel attacks, the model has an accuracy of more than 99%. During automatic interaction between networked (IoT) devices, security and privacy are the primary obstacles. Our proposed method can handle these obstacles efficiently and identify and classify the different levels of threats.
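As a reading aid, the three quantities this abstract balances (accuracy, false positive rate, false negative rate) can be computed from confusion-matrix counts as in the minimal sketch below. The function name and the counts are hypothetical illustrations, not results or code from the paper.

```python
def detection_rates(tp, fp, tn, fn):
    """Return (accuracy, false positive rate, false negative rate)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    fpr = fp / (fp + tn)  # share of benign traffic wrongly flagged as an attack
    fnr = fn / (fn + tp)  # share of attacks that were missed
    return accuracy, fpr, fnr


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
acc, fpr, fnr = detection_rates(tp=90, fp=5, tn=95, fn=10)
print(acc, fpr, fnr)
```

A detector can score high accuracy while still missing many attacks when attacks are rare, which is why the two error rates are reported separately.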
Monaural Multi-Speaker Speech Separation Using Efficient Transformer Model
Authors: S. Rijal, R. Neupane, S. P. Mainali, S. K. Regmi, S. Maharjan
Abstract
The cocktail party problem is the scenario in which it is difficult to separate or distinguish an individual speaker from the mixed speech of several speakers. There has been considerable research in this field, but the size and complexity of the model are traded off against the accuracy and robustness of speech separation. "Monaural multi-speaker speech separation" presents a speech-separation model based on the Transformer architecture and its efficient forms. The model has been trained on the LibriMix dataset, which contains utterances from diverse speakers. The model separates 2 distinct speaker sources from a mixed audio input. The developed model aims to reduce the computational complexity of speech separation with minimal loss of performance relative to prevalent speech separation models, and it has shown significant progress towards that goal. This project anticipates a rise in contributions to the ongoing research in the field of speech separation with computational efficiency at its core.
An Efficient Recommendation System in E-commerce using Passer learning optimization based on Bi-LSTM
Abstract
Recommendation system services have become crucial for users to access personalized goods or services as the global e-commerce market expands. They can increase business sales growth and lower the cost of user information exploration. Recent years have seen a significant increase in researchers actively using user reviews to solve standard recommender system research issues. Reviews may, however, contain information that does not help consumers decide what to buy, such as advertising or fictitious or fake reviews. Using such reviews to offer suggestion services may reduce the effectiveness of those recommendations. In this research, the recommendation in e-commerce is developed using passer learning optimization based on Bi-LSTM (PL optimized Bi-LSTM) to solve that issue. Data is first obtained from the product recommendation dataset and pre-processed to remove any values that are missing or inconsistent. Then, feature extraction is performed using TF-IDF features and features that support graph embedding. Before submitting numerous features with the same dimensions to the Bi-LSTM classifier for analysis, they are integrated using the feature concatenation approach. The Collaborative Bi-LSTM method employs these features to determine whether a product should be recommended. The PL optimization approach, which efficiently adjusts the classifier's parameters and produces an extract output that measures the f1-score, MSE, precision, and recall, is the basis of this research's contributions. As compared to earlier methods, the proposed PL-optimized Bi-LSTM achieved values of 88.58%, 1.24%, 92.69%, and 92.69% for dataset 1, 88.46%, 0.48%, 92.43%, and 93.47% for dataset 2, and 92.51%, 1.58%, 91.90%, and 90.76% for dataset 3.
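The feature pipeline described above (TF-IDF text features concatenated with graph-embedding features) can be illustrated with a small pure-Python sketch. It uses one common TF-IDF variant (relative term frequency times log of inverse document frequency); the reviews and the two-dimensional "graph embeddings" are hypothetical placeholders, and the Bi-LSTM classifier and PL optimizer are not shown.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, one TF-IDF vector per document)."""
    tokenized = [d.split() for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(w for toks in tokenized for w in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(
            [tf[w] / len(toks) * math.log(n_docs / df[w]) for w in vocab]
        )
    return vocab, vectors


reviews = ["great phone great battery", "bad battery", "great price"]
vocab, text_feats = tfidf(reviews)

# Hypothetical graph-embedding features, one row per review.
graph_feats = [[0.1, 0.3], [0.7, 0.2], [0.4, 0.4]]

# Feature concatenation: append the graph features to the text features.
combined = [t + g for t, g in zip(text_feats, graph_feats)]
print(len(vocab), len(combined[0]))
```

Each combined row has one entry per vocabulary word plus the embedding dimensions, so every sample reaches the classifier with the same length.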
Semi-Supervised Laplacian Learning on Stiefel Manifolds
Authors: Chester Holtz, Pengwen Chen, Alexander Cloninger, Chung-Kuan Cheng, Gal Mishne
Abstract
Motivated by the need to address the degeneracy of canonical Laplace learning algorithms in low label rates, we propose to reformulate graph-based semi-supervised learning as a nonconvex generalization of a Trust-Region Subproblem (TRS). This reformulation is motivated by the well-posedness of Laplacian eigenvectors in the limit of infinite unlabeled data. To solve this problem, we first show that a first-order condition implies the solution of a manifold alignment problem and that solutions to the classical Orthogonal Procrustes problem can be used to efficiently find good classifiers that are amenable to further refinement. Next, we address the criticality of selecting supervised samples at low-label rates. We characterize informative samples with a novel measure of centrality derived from the principal eigenvectors of a certain submatrix of the graph Laplacian. We demonstrate that our framework achieves lower classification error compared to recent state-of-the-art and classical semi-supervised learning methods at extremely low, medium, and high label rates. Our code is available on GitHub (anonymized for submission).
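For readers unfamiliar with the classical Orthogonal Procrustes problem mentioned above, the sketch below shows its standard closed-form solution via the singular value decomposition: the orthogonal matrix Q that minimizes ||AQ - B||_F is U V^T, where U S V^T is the SVD of A^T B. This illustrates only the classical problem, not the paper's semi-supervised algorithm; the data are random and the variable names are assumptions.

```python
import numpy as np


def orthogonal_procrustes(A, B):
    """Return the orthogonal Q minimizing the Frobenius norm of A @ Q - B."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt


rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))

# Build a random orthogonal matrix and use it to rotate A.
Q_true, _ = np.linalg.qr(rng.standard_normal((3, 3)))
B = A @ Q_true

Q = orthogonal_procrustes(A, B)
print(np.allclose(Q, Q_true))
```

Because B is an exact orthogonal transform of a full-rank A, the recovered Q matches the matrix used to generate the data up to floating-point error.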
Formally Explaining Neural Networks within Reactive Systems
Authors: Shahaf Bassan, Guy Amir, Davide Corsi, Idan Refaeli, Guy Katz
Abstract
Deep neural networks (DNNs) are increasingly being used as controllers in reactive systems. However, DNNs are highly opaque, which renders it difficult to explain and justify their actions. To mitigate this issue, there has been a surge of interest in explainable AI (XAI) techniques, capable of pinpointing the input features that caused the DNN to act as it did. Existing XAI techniques typically face two limitations: (i) they are heuristic, and do not provide formal guarantees that the explanations are correct; and (ii) they often apply to "one-shot" systems (where the DNN is invoked independently of past invocations), as opposed to reactive systems. Here, we begin bridging this gap, and propose a formal DNN-verification-based XAI technique for reasoning about multi-step, reactive systems. We suggest methods for efficiently calculating succinct explanations, by exploiting the system's transition constraints in order to curtail the search space explored by the underlying verifier. We evaluate our approach on two popular benchmarks from the domain of automated navigation; and observe that our methods allow the efficient computation of minimal and minimum explanations, while significantly outperforming the state of the art. We also demonstrate that our method produces formal explanations that are more reliable than competing, non-verification-based XAI techniques.
DiffusAL: Coupling Active Learning with Graph Diffusion for Label-Efficient Node Classification
Authors: Sandra Gilhuber, Julian Busch, Daniel Rotthues, Christian M. M. Frey, Thomas Seidl
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Abstract
Node classification is one of the core tasks on attributed graphs, but successful graph learning solutions require sufficiently labeled data. To keep annotation costs low, active graph learning focuses on selecting the most qualitative subset of nodes that maximizes label efficiency. However, deciding which heuristic is best suited for an unlabeled graph to increase label efficiency is a persistent challenge. Existing solutions either neglect aligning the learned model and the sampling method or focus only on limited selection aspects. They are thus sometimes worse than, or only as good as, random sampling. In this work, we introduce a novel active graph learning approach called DiffusAL, showing significant robustness in diverse settings. Toward better transferability between different graph structures, we combine three independent scoring functions to identify the most informative node samples for labeling in a parameter-free way: i) Model Uncertainty, ii) Diversity Component, and iii) Node Importance computed via graph diffusion heuristics. Most of our calculations for acquisition and training can be pre-processed, making DiffusAL more efficient than approaches combining diverse selection criteria and as fast as simpler heuristics. Our experiments on various benchmark datasets show that, unlike previous methods, our approach significantly outperforms random selection in 100% of all datasets and labeling budgets tested.
Attribution-Scores in Data Management and Explainable Machine Learning
Abstract
We describe recent research on the use of actual causality in the definition of responsibility scores as explanations for query answers in databases, and for outcomes from classification models in machine learning. In the case of databases, useful connections with database repairs are illustrated and exploited. Repairs are also used to give a quantitative measure of the consistency of a database. For classification models, the responsibility score is properly extended and illustrated. The efficient computation of Shap-score is also analyzed and discussed. The emphasis is placed on work done by the author and collaborators.
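Since the Shap-score is built on the Shapley value, a brute-force computation of Shapley values for a tiny cooperative game may help make the definition concrete. The sketch below enumerates all coalitions, so it is exponential in the number of players and is not the efficient computation the abstract refers to; the toy game is a hypothetical example.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of each player for the set function `value`."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when joining coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical toy game: a coalition is worth 1 only if it has both "a" and "b".
def value(coalition):
    return 1.0 if {"a", "b"} <= coalition else 0.0


phi = shapley_values(["a", "b", "c"], value)
print(phi)
```

In this game "a" and "b" share the credit equally and "c" receives none, and the scores add up to the value of the full coalition.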
Robust Single-view Cone-beam X-ray Pose Estimation with Neural Tuned Tomography (NeTT) and Masked Neural Radiance Fields (mNeRF)
Authors: Chaochao Zhou, Syed Hasib Akhter Faruqui, Abhinav Patel, Ramez N. Abdalla, Michael C. Hurley, Ali Shaibani, Matthew B. Potts, Babak S. Jahromi, Leon Cho, Sameer A. Ansari, Donald R. Cantrell
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Abstract
Many tasks performed in image-guided, minimally invasive medical procedures can be cast as pose estimation problems, where an X-ray projection is utilized to reach a target in 3D space. Recent advances in the differentiable rendering of optically reflective materials have enabled state-of-the-art performance in RGB camera view synthesis and pose estimation. Expanding on these prior works, we introduce new methods for pose estimation of radiolucent objects using X-ray projections, and we demonstrate the critical role of optimal view synthesis in performing this task. We first develop an algorithm (DiffDRR) that efficiently computes Digitally Reconstructed Radiographs (DRRs) and leverages automatic differentiation within TensorFlow. In conjunction with classic CBCT reconstruction algorithms, we perform pose estimation by gradient descent using a loss function that quantifies the similarity of the DRR synthesized from a randomly initialized pose and the true fluoroscopic image at the target pose. We propose two novel methods for high-fidelity view synthesis, Neural Tuned Tomography (NeTT) and masked Neural Radiance Fields (mNeRF). Both methods rely on classic CBCT; NeTT directly optimizes the CBCT densities, while the non-zero values of mNeRF are constrained by a 3D mask of the anatomic region segmented from CBCT. We demonstrate that both NeTT and mNeRF distinctly improve pose estimation within our framework. By defining a successful pose estimate to be a 3D angle error of less than 3 degrees, we find that NeTT and mNeRF can achieve similar results, both with overall success rates of more than 93%. Furthermore, we show that a NeTT trained for a single subject can generalize to synthesize high-fidelity DRRs and ensure robust pose estimations for all other subjects. Therefore, we suggest that NeTT is an attractive option for robust pose estimation using fluoroscopic projections.
Experiments on Generative AI-Powered Parametric Modeling and BIM for Architectural Design
Abstract
This paper introduces a new architectural design framework that utilizes generative AI tools including ChatGPT and Veras with parametric modeling and Building Information Modeling (BIM) to enhance the design process. The study experiments with the potential of ChatGPT and generative AI in 3D architectural design, extending beyond its use in text and 2D image generation. The proposed framework promotes collaboration between architects and AI, facilitating a quick exploration of design ideas and producing context-sensitive, creative design generation. By integrating ChatGPT for scripting and Veras for generating design ideas with widely used parametric modeling and BIM tools, the framework provides architects with an intuitive and powerful method to convey design intent, leading to more efficient, creative, and collaborative design processes.
Demonstrating Autonomous 3D Path Planning on a Novel Scalable UGV-UAV Morphing Robot
Authors: Eric Sihite, Filip Slezak, Ioannis Mandralis, Adarsh Salagame, Milad Ramezani, Arash Kalantari, Alireza Ramezani, Morteza Gharib
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
Some animals exhibit multi-modal locomotion capability to traverse a wide range of terrains and environments, such as amphibians that can swim and walk or birds that can fly and walk. This capability is extremely beneficial for expanding the animal's habitat range, and it lets the animal choose the most energy-efficient mode of locomotion in a given environment. The robotic biomimicry of this multi-modal locomotion capability can be very challenging but offers the same advantages. However, the expanded range of locomotion also increases the complexity of performing localization and path planning. In this work, we present our morphing multi-modal robot, which is capable of ground and aerial locomotion, and the implementation of readily available SLAM and path planning solutions to navigate a complex indoor environment.
LGViT: Dynamic Early Exiting for Accelerating Vision Transformer
Authors: Guanyu Xu, Jiawei Hao, Li Shen, Han Hu, Yong Luo, Hui Lin, Jialie Shen
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Recently, the efficient deployment and acceleration of powerful vision transformers (ViTs) on resource-limited edge devices for providing multimedia services have become attractive tasks. Although early exiting is a feasible solution for accelerating inference, most works focus on convolutional neural networks (CNNs) and transformer models in natural language processing (NLP). Moreover, the direct application of early exiting methods to ViTs may result in substantial performance degradation. To tackle this challenge, we systematically investigate the efficacy of early exiting in ViTs and point out that the insufficient feature representations in shallow internal classifiers and the limited ability to capture target semantic information in deep internal classifiers restrict the performance of these methods. We then propose an early exiting framework for general ViTs termed LGViT, which incorporates heterogeneous exiting heads, namely, local perception head and global aggregation head, to achieve an efficiency-accuracy trade-off. In particular, we develop a novel two-stage training scheme, including end-to-end training and self-distillation with the backbone frozen to generate early exiting ViTs, which facilitates the fusion of global and local information extracted by the two types of heads. We conduct extensive experiments using three popular ViT backbones on three vision datasets. Results demonstrate that our LGViT can achieve competitive performance with approximately 1.8$\times$ speed-up.
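The general idea of early exiting can be sketched as follows: internal classifiers are evaluated from shallow to deep, and inference stops at the first one that is confident enough. The confidence criterion used here (maximum softmax probability against a fixed threshold) is a common choice and an assumption of this sketch; the abstract does not state LGViT's exit criterion, and the logits below are made up.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(logits_per_exit, threshold=0.9):
    """Return (predicted class, index of the exit that produced it).

    `logits_per_exit` yields one logit vector per exit head, ordered from
    shallow to deep. It may be a generator, so deeper heads are only
    computed when the shallower ones are not confident enough.
    """
    pred, used = None, -1
    for i, logits in enumerate(logits_per_exit):
        probs = softmax(logits)
        conf = max(probs)
        pred, used = probs.index(conf), i
        if conf >= threshold:
            break  # confident enough: skip the remaining, deeper heads
    return pred, used


# Hypothetical logits from two exit heads for an easy and a hard input.
easy = early_exit_predict([[4.0, 0.0, 0.0], [0.0, 5.0, 0.0]])
hard = early_exit_predict([[0.2, 0.1, 0.0], [0.0, 5.0, 0.0]])
print(easy, hard)
```

The easy input leaves at the first head, while the hard input falls through to the deeper one; if no head reaches the threshold, the last head's prediction is returned.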
Neural approximation of Wasserstein distance via a universal architecture for symmetric and factorwise group invariant functions
Abstract
Learning distance functions between complex objects, such as the Wasserstein distance to compare point sets, is a common goal in machine learning applications. However, functions on such complex objects (e.g., point sets and graphs) are often required to be invariant to a wide variety of group actions (e.g., permutation or rigid transformation). Therefore, continuous and symmetric product functions (such as distance functions) on such complex objects must also be invariant to the product of such group actions. We call these functions symmetric and factor-wise group invariant (SFGI functions for short). In this paper, we first present a general neural network architecture for approximating SFGI functions. The main contribution of this paper combines this general neural network with a sketching idea to develop a specific and efficient neural network which can approximate the $p$-th Wasserstein distance between point sets. Importantly, the required model complexity is independent of the sizes of input point sets. On the theoretical front, to the best of our knowledge, this is the first result showing that there exists a neural network with the capacity to approximate Wasserstein distance with bounded model complexity. Our work provides an interesting integration of sketching ideas for geometric problems with universal approximation of symmetric functions. On the empirical front, we present a range of results showing that our newly proposed neural network architecture performs comparably to or better than other models (including a SOTA Siamese Autoencoder based approach). In particular, our neural network generalizes significantly better and trains much faster than the SOTA Siamese AE. Finally, this line of investigation could be useful in exploring effective neural network design for solving a broad range of geometric optimization problems (e.g., $k$-means in a metric space).
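To make the target function concrete, the sketch below computes the exact $p$-th Wasserstein distance between two small point sets of equal size with uniform weights, where the optimal transport plan is a one-to-one matching. It searches all matchings by brute force, so it is only usable for tiny inputs and is not the neural approximation proposed in the paper; the point sets are hypothetical.

```python
import math
from itertools import permutations


def wasserstein_p(X, Y, p=2):
    """Exact p-Wasserstein distance between equal-size, uniformly weighted point sets."""
    assert len(X) == len(Y), "this sketch assumes equal-size point sets"
    n = len(X)
    # With uniform weights and equal sizes, an optimal plan is a permutation.
    best = min(
        sum(math.dist(X[i], Y[j]) ** p for i, j in enumerate(perm))
        for perm in permutations(range(n))
    )
    return (best / n) ** (1.0 / p)


X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
Y = [(x + 3.0, y + 4.0) for x, y in X]  # X translated by (3, 4)

d = wasserstein_p(X, Y)
d_shuffled = wasserstein_p(list(reversed(X)), Y)  # reorder the points of X
print(d, d_shuffled)
```

Reordering the points of either set leaves the distance unchanged, which is the permutation invariance an SFGI function is required to have.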
Exploiting Sparsity for Localization of Large-Scale Wireless Sensor Networks
Abstract
Wireless Sensor Network (WSN) localization refers to the problem of determining the position of each of the agents in a WSN using noisy measurement information. In many cases, such as in distance and bearing-based localization, the measurement model is a nonlinear function of the agents' positions, leading to pairwise interconnections between the agents. As the optimal solution for the WSN localization problem is known to be computationally expensive in these cases, an efficient approximation is desired. In this paper, we show that the inherent sparsity in this problem can be exploited to greatly reduce the computational effort of using an Extended Kalman Filter (EKF) for large-scale WSN localization. In the proposed method, which we call the Low-Bandwidth Extended Kalman Filter (LB-EKF), the measurement information matrix is converted into a banded matrix by relabeling (permuting the order of) the vertices of the graph. Using a combination of theoretical analysis and numerical simulations, it is shown that typical WSN configurations (which can be modeled as random geometric graphs) can be localized in a scalable manner using the proposed LB-EKF approach.
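The effect of relabeling vertices on matrix bandwidth can be seen in a small sketch. It uses a Cuthill-McKee style breadth-first ordering, a standard bandwidth-reduction heuristic; whether the LB-EKF uses this particular ordering is not stated in the abstract, so treat the choice as an assumption. The graph is a hypothetical chain of six sensors with scrambled labels.

```python
from collections import deque


def bandwidth(edges, order):
    """Largest index distance between neighbours under the given vertex order."""
    pos = {v: i for i, v in enumerate(order)}
    return max(abs(pos[u] - pos[v]) for u, v in edges)


def cuthill_mckee(n, edges):
    """Breadth-first vertex ordering that visits low-degree vertices first."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    order, seen = [], set()
    # Start every connected component from one of its lowest-degree vertices.
    for start in sorted(range(n), key=lambda v: len(adj[v])):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in sorted(adj[v] - seen, key=lambda x: len(adj[x])):
                seen.add(w)
                queue.append(w)
    return order


# A chain of sensors whose labels do not follow the chain.
edges = [(3, 0), (0, 5), (5, 1), (1, 4), (4, 2)]
before = bandwidth(edges, list(range(6)))
order = cuthill_mckee(6, edges)
after = bandwidth(edges, order)
print(before, after, order)
```

With the original labels the nonzero entries reach far from the diagonal, while the reordered chain has bandwidth 1, which is what makes banded linear algebra applicable.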
CLAMS: A Cluster Ambiguity Measure for Estimating Perceptual Variability in Visual Clustering
Authors: Hyeon Jeon, Ghulam Jilani Quadri, Hyunwook Lee, Paul Rosen, Danielle Albers Szafir, Jinwook Seo
Abstract
Visual clustering is a common perceptual task in scatterplots that supports diverse analytics tasks (e.g., cluster identification). However, even with the same scatterplot, the ways of perceiving clusters (i.e., conducting visual clustering) can differ due to the differences among individuals and ambiguous cluster boundaries. Although such perceptual variability casts doubt on the reliability of data analysis based on visual clustering, we lack a systematic way to efficiently assess this variability. In this research, we study perceptual variability in conducting visual clustering, which we call Cluster Ambiguity. To this end, we introduce CLAMS, a data-driven visual quality measure for automatically predicting cluster ambiguity in monochrome scatterplots. We first conduct a qualitative study to identify key factors that affect the visual separation of clusters (e.g., proximity or size difference between clusters). Based on study findings, we deploy a regression module that estimates the human-judged separability of two clusters. Then, CLAMS predicts cluster ambiguity by analyzing the aggregated results of all pairwise separability between clusters that are generated by the module. CLAMS outperforms widely-used clustering techniques in predicting ground truth cluster ambiguity. Meanwhile, CLAMS exhibits performance on par with human annotators. We conclude our work by presenting two applications for optimizing and benchmarking data mining techniques using CLAMS. The interactive demo of CLAMS is available at clusterambiguity.dev.
Predictive Modeling through Hyper-Bayesian Optimization
Authors: Manisha Senadeera, Santu Rana, Sunil Gupta, Svetha Venkatesh
Abstract
Model selection is an integral problem of model based optimization techniques such as Bayesian optimization (BO). Current approaches often treat model selection as an estimation problem, to be periodically updated with observations coming from the optimization iterations. In this paper, we propose an alternative way to achieve both efficiently. Specifically, we propose a novel way of integrating model selection and BO for the single goal of reaching the function optima faster. The algorithm moves back and forth between BO in the model space and BO in the function space, where the goodness of the recommended model is captured by a score function and fed back, capturing how well the model helped convergence in the function space. The score function is derived in such a way that it neutralizes the effect of the moving nature of the BO in the function space, thus keeping the model selection problem stationary. This back and forth leads to quick convergence for both model selection and BO in the function space. In addition to improved sample efficiency, the framework outputs information about the black-box function. Convergence is proved, and experimental results show significant improvement compared to standard BO.
Patch Space Exploration using Static Analysis Feedback
Abstract
Automated Program Repair (APR) techniques typically rely on a given test-suite to guide the repair process. Apart from the need to provide test oracles, this makes the produced patches prone to test data over-fitting. In this work, instead of relying on test cases, we show how to automatically repair memory safety issues, by leveraging static analysis (specifically Incorrectness Separation Logic) to guide repair. Our proposed approach learns what a desirable patch is by inspecting how close a patch is to fixing the bug based on the feedback from incorrectness separation logic based static analysis (specifically the Pulse analyser), and turning this information into a distribution of probabilities over context free grammars. Furthermore, instead of focusing on heuristics for reducing the search space of patches, we make repair scalable by creating classes of equivalent patches according to the effect they have on the symbolic heap, and then invoking the validation oracle only once per class of patch equivalence. This allows us to efficiently discover repairs even in the presence of a large pool of patch candidates offered by our generic patch synthesis mechanism. Experimental evaluation of our approach was conducted by repairing real world memory errors in OpenSSL, swoole and other subjects. The evaluation results show the scalability and efficacy of our approach in automatically producing high quality patches.
Domain Adaptation based on Human Feedback for Enhancing Generative Model Denoising Abilities
Authors: Hyun-Cheol Park, Sung Ho Kang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
How can we apply human feedback to a generative model? To answer this question, in this paper we present a method that applies human feedback to the denoising problem and to domain adaptation. Deep generative models have demonstrated impressive results in image denoising. However, current image denoising models often produce inappropriate results when applied to domains different from the ones they were trained on. If there are 'Good' and 'Bad' results for unseen data, how can the quality of the 'Bad' results be raised? Most methods use an approach based on model generalization. However, these methods require target images for training or for adapting to the unseen domain. In this paper, to adapt to the domain, we work without target images for the unseen domain and improve specific failed images. To address this, we propose a method for fine-tuning inappropriate results generated in a different domain by utilizing human feedback. First, we train a generator to denoise images using only the noisy MNIST digit '0' images. The denoising generator trained on the source domain leads to unintended results when applied to target domain images. To achieve domain adaptation, we construct a noise-image denoising generated image data set and train a reward model to predict human feedback. Finally, we fine-tune the generator on the different domain using the reward model with an auxiliary loss function, aiming to transfer denoising capabilities to the target domain. Our approach demonstrates the potential to efficiently fine-tune a generator trained on one domain using human feedback from another domain, thereby enhancing denoising abilities in different domains.
Pixel to policy: DQN Encoders for within & cross-game reinforcement learning
Abstract
Reinforcement Learning can be applied to various tasks and environments. Many of these environments have a similar shared structure, which can be exploited to improve RL performance on other tasks. Transfer learning can be used to take advantage of this shared structure by learning policies that are transferable across different tasks and environments, which can lead to more efficient learning as well as improved performance on a wide range of tasks. This work explores and compares the performance of RL models trained from scratch and with different transfer learning approaches. Additionally, the study explores the performance of a model trained on multiple game environments, with the goal of developing a universal game-playing agent, as well as transfer learning with a pre-trained encoder using DQN, trained on the same game or a different game. Our DQN model achieves a mean episode reward of 46.16, which even beats human-level performance, with merely 20k episodes, significantly fewer than DeepMind's 1M episodes. The achieved mean rewards of 533.42 and 402.17 on the Assault and Space Invaders environments, respectively, represent noteworthy performance on these challenging environments.
Informative Path Planning of Autonomous Vehicle for Parking Occupancy Estimation
Authors: Yunze Hu, Jiaao Chen, Kangjie Zhou, Han Gao, Yutong Li, Chang Liu
Abstract
Parking occupancy estimation holds significant potential in facilitating parking resource management and mitigating traffic congestion. Existing approaches employ robotic systems to detect the occupancy status of individual parking spaces and primarily focus on enhancing detection accuracy through perception pipelines. However, these methods often overlook the crucial aspect of robot path planning, which can hinder the accurate estimation of the entire parking area. In light of these limitations, we introduce the problem of informative path planning for parking occupancy estimation using autonomous vehicles and formulate it as a Partially Observable Markov Decision Process (POMDP) task. Then, we develop an occupancy state transition model and introduce a Bayes filter to estimate occupancy based on noisy sensor measurements. Subsequently, we propose the Monte Carlo Bayes Filter Tree, a computationally efficient algorithm that leverages progressive widening to generate informative paths. We demonstrate that the proposed approach outperforms the benchmark methods in diverse simulation environments, effectively striking a balance between optimality and computational efficiency.
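The Bayes filter step for a single parking space can be sketched as a two-state filter: a prediction step applies an occupancy transition model, and an update step folds in a noisy sensor reading. All probabilities below are hypothetical parameters chosen for illustration; the paper's transition and sensor models, and the Monte Carlo Bayes Filter Tree planner, are not reproduced here.

```python
def predict(p_occ, p_arrive=0.05, p_leave=0.1):
    """Transition step: probability the space is occupied one time step later."""
    return p_occ * (1 - p_leave) + (1 - p_occ) * p_arrive


def update(p_occ, z, p_hit=0.9, p_false=0.2):
    """Measurement step for a binary sensor reading z (True = 'occupied').

    p_hit is P(z = True | occupied); p_false is P(z = True | free).
    """
    like_occ = p_hit if z else 1 - p_hit
    like_free = p_false if z else 1 - p_false
    numerator = like_occ * p_occ
    return numerator / (numerator + like_free * (1 - p_occ))


# Start undecided, then process a short sequence of hypothetical readings.
belief = 0.5
for z in [True, True, False, True]:
    belief = update(predict(belief), z)
print(belief)
```

An 'occupied' reading raises the belief and a 'free' reading lowers it, while the prediction step slowly pulls the belief back toward the long-run occupancy rate between observations.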
Reconstruction Harmonic Balance Method and its Application in Solving Complex Nonlinear Dynamical Systems
Authors: Dai Honghua, Wang Qisi, Yan Zipu, Yue Xiaokui
Abstract
The harmonic balance method is the most commonly used method for computing periodic solutions of nonlinear dynamical systems, but the high-order approximation of nonlinear terms requires sophisticated formula derivation, which limits the accuracy it can reach. The authors' team previously proposed the reconstruction harmonic balance (RHB) method, which equivalently reconstructs the frequency-domain nonlinear quantities in the time domain and thereby resolves the difficulty of ultra-high-order calculation in the classical harmonic balance method. However, both methods require the dynamical system to have polynomial nonlinearity, and neither can be directly used to compute quasi-periodic solutions of nonlinear systems. To address these problems, this paper proposes a computational method that combines the RHB method and the recast technique for complex nonlinear systems. First, the general nonlinear problem is recast, without loss, into a polynomial nonlinear system, and then the RHB method is used to obtain high-precision solutions. For computing quasi-periodic responses, an RHB method based on the idea of "supplemental frequency" is derived. By optimizing and selecting the base frequencies, fast and accurate capture of the quasi-periodic response is achieved. Typical systems such as the nonlinear pendulum, the relativistic harmonic oscillator, and the nonlinear coupled asymmetric pendulum are selected for simulation. The simulation results show that the accuracy of the RHB-recast method for solving nonpolynomial nonlinear systems is on the order of 10^(-12), reaching machine precision and far exceeding state-of-the-art methods. The supplemental frequency RHB method solves quasi-periodic problems efficiently.
VideoPro: A Visual Analytics Approach for Interactive Video Programming
Authors: Jianben He, Xingbo Wang, Kam Kwai Wong, Xijie Huang, Changjian Chen, Zixin Chen, Fengjie Wang, Min Zhu, Huamin Qu
Abstract
Constructing supervised machine learning models for real-world video analysis requires substantial labeled data, which is costly to acquire due to scarce domain expertise and laborious manual inspection. While data programming shows promise in generating labeled data at scale with user-defined labeling functions, the high-dimensional and complex temporal information in videos poses additional challenges for effectively composing and evaluating labeling functions. In this paper, we propose VideoPro, a visual analytics approach to support flexible and scalable video data programming for model steering with reduced human effort. We first extract human-understandable events from videos using computer vision techniques and treat them as atomic components of labeling functions. We further propose a two-stage template mining algorithm that characterizes the sequential patterns of these events to serve as labeling function templates for efficient data labeling. The visual interface of VideoPro facilitates multifaceted exploration, examination, and application of the labeling templates, allowing for effective programming of video data at scale. Moreover, users can monitor the impact of programming on model performance and make informed adjustments during the iterative programming process. We demonstrate the efficiency and effectiveness of our approach with two case studies and expert interviews.
Coded Modulation Schemes for Voronoi Constellations
Abstract
Multidimensional Voronoi constellations (VCs) are shown to be more power-efficient than quadrature amplitude modulation (QAM) formats given the same uncoded bit error rate, and also have higher achievable information rates. However, a coded modulation scheme to sustain these gains after forward error correction (FEC) coding is still lacking. This paper designs coded modulation schemes with soft-decision FEC codes for VCs, including bit-interleaved coded modulation (BICM) and multilevel coded modulation (MLCM), together with three bit-to-integer mapping algorithms and log-likelihood ratio calculation algorithms. Simulation results show that VCs can achieve up to 1.84 dB signal-to-noise ratio (SNR) gains over QAM with BICM, and up to 0.99 dB SNR gains over QAM with MLCM for the additive white Gaussian noise channel, with a surprisingly low complexity.
Patch-wise Auto-Encoder for Visual Anomaly Detection
Authors: Yajie Cui, Zhaoxiang Liu, Shiguo Lian
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Anomaly detection without priors of the anomalies is challenging. In the field of unsupervised anomaly detection, traditional auto-encoder (AE) methods rely on the assumption that a model trained only on normal images will not be able to reconstruct abnormal images correctly, and they tend to fail. In contrast, we propose a novel patch-wise auto-encoder (Patch AE) framework, which aims at enhancing the reconstruction ability of the AE on anomalies instead of weakening it. Each image patch is reconstructed from the corresponding spatially distributed feature vector of the learned feature representation, i.e., patch-wise reconstruction, which ensures the anomaly-sensitivity of the AE. Our method is simple and efficient. It advances the state-of-the-art performance on the MVTec AD benchmark, which demonstrates the effectiveness of our model. It shows great potential in practical industrial application scenarios.
FLatten Transformer: Vision Transformer using Focused Linear Attention
Abstract
The quadratic computation complexity of self-attention has been a persistent challenge when applying Transformer models to vision tasks. Linear attention, on the other hand, offers a much more efficient alternative with its linear complexity by approximating the Softmax operation through carefully designed mapping functions. However, current linear attention approaches either suffer from significant performance degradation or introduce additional computation overhead from the mapping functions. In this paper, we propose a novel Focused Linear Attention module to achieve both high efficiency and expressiveness. Specifically, we first analyze the factors contributing to the performance degradation of linear attention from two perspectives: the focus ability and feature diversity. To overcome these limitations, we introduce a simple yet effective mapping function and an efficient rank restoration module to enhance the expressiveness of self-attention while maintaining low computation complexity. Extensive experiments show that our linear attention module is applicable to a variety of advanced vision Transformers, and achieves consistently improved performances on multiple benchmarks. Code is available at https://github.com/LeapLabTHU/FLatten-Transformer.
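To make the complexity argument concrete, the following is a minimal pure-Python sketch of generic kernel-based linear attention. The feature map `phi` (ReLU plus a small constant) is an assumption chosen for illustration only; it is not the focused mapping function proposed in the paper, and the rank restoration module is omitted. The sketch only shows why reordering the matrix products removes the quadratic dependence on the number of tokens.

```python
def phi(x):
    # Hypothetical non-negative feature map (ReLU plus a small constant);
    # an illustrative stand-in, not the paper's focused mapping function.
    return [max(v, 0.0) + 1e-6 for v in x]


def linear_attention(Q, K, V):
    """Linear-time attention: out_i = phi(q_i)^T S / (phi(q_i)^T z)."""
    d, dv = len(K[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]  # S = sum_j phi(k_j) v_j^T
    z = [0.0] * d                       # z = sum_j phi(k_j)
    for k, v in zip(K, V):              # one pass over keys/values
        fk = phi(k)
        for a in range(d):
            z[a] += fk[a]
            for b in range(dv):
                S[a][b] += fk[a] * v[b]
    out = []
    for q in Q:                         # one pass over queries
        fq = phi(q)
        denom = sum(fq[a] * z[a] for a in range(d))
        out.append([sum(fq[a] * S[a][b] for a in range(d)) / denom
                    for b in range(dv)])
    return out


def pairwise_reference(Q, K, V):
    """Same kernel attention computed pairwise, i.e. quadratic in N."""
    out = []
    for q in Q:
        fq = phi(q)
        w = [sum(a * b for a, b in zip(fq, phi(k))) for k in K]
        total = sum(w)
        out.append([sum(wj * v[b] for wj, v in zip(w, V)) / total
                    for b in range(len(V[0]))])
    return out
```

Both functions return the same values; the first touches each token a constant number of times, while the second forms all N x N query-key weights.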
Complexity evaluation of network configurations and abstractions
Authors: Jose Moreno
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
Computer networks have been traditionally configured by humans using command-line interfaces. Some network abstractions have emerged in the last 10 years, but there is no easy way of comparing them to each other objectively. Therefore, there is no consensus in the industry of what direction modern network abstractions should take, and the adoption of these abstractions lags as a consequence. In this paper I propose a comparison framework using metrics derived from graph structures to evaluate the simplicity, efficiency, and effectiveness of different network abstraction models. The result of this comparison is that while some of the existing network abstractions are quite efficient to store network policy (such as the Kubernetes or the Cisco Application Centric Infrastructure models), others (notably public cloud) are still very infrastructure-centric and suffer from excessive complexity.
Context-Aware Talking-Head Video Editing
Authors: Songlin Yang, Wei Wang, Jun Ling, Bo Peng, Xu Tan, Jing Dong
Abstract
Talking-head video editing aims to efficiently insert, delete, and substitute words in a pre-recorded video through a text transcript editor. The key challenge for this task is obtaining an editing model that generates new talking-head video clips which simultaneously have accurate lip synchronization and motion smoothness. Previous approaches, including 3DMM-based (3D Morphable Model) methods and NeRF-based (Neural Radiance Field) methods, are sub-optimal in that they either require minutes of source videos and days of training time or lack the disentangled control of verbal (e.g., lip motion) and non-verbal (e.g., head pose and expression) representations for video clip insertion. In this work, we fully utilize the video context to design a novel framework for talking-head video editing, which achieves efficiency, disentangled motion control, and sequential smoothness. Specifically, we decompose this framework into motion prediction and motion-conditioned rendering: (1) We first design an animation prediction module that efficiently obtains smooth and lip-sync motion sequences conditioned on the driven speech. This module adopts a non-autoregressive network to obtain context prior and improve the prediction efficiency, and it learns a speech-animation mapping prior with better generalization to novel speech from a multi-identity video dataset. (2) We then introduce a neural rendering module to synthesize the photo-realistic and full-head video frames given the predicted motion sequence. This module adopts a pre-trained head topology and uses only a few frames for efficient fine-tuning to obtain a person-specific rendering model. Extensive experiments demonstrate that our method efficiently achieves smoother editing results with higher image quality and lip accuracy using less data than previous methods.
Leveraging MLIR for Loop Vectorization and GPU Porting of FFT Libraries
Authors: Yifei He, Artur Podobas, Stefano Markidis
Abstract
FFTc is a Domain-Specific Language (DSL) for designing and generating Fast Fourier Transform (FFT) libraries. The uniqueness of FFTc is that it leverages and extends Multi-Level Intermediate Representation (MLIR) dialects to optimize FFT code generation. In this work, we present FFTc extensions and improvements such as the possibility of using different data layouts for complex-valued arrays, sparsification to enable efficient vectorization, and a seamless porting of FFT libraries to GPU systems. We show that, on CPUs, thanks to vectorization, the performance of the FFTc-generated FFT is comparable to the performance of FFTW, a state-of-the-art FFT library. We also present the initial performance results for FFTc on Nvidia GPUs.
Massively Parallel Algorithms for High-Dimensional Euclidean Minimum Spanning Tree
Abstract
We study the classic Euclidean Minimum Spanning Tree (MST) problem in the Massively Parallel Computation (MPC) model. Given a set $X \subset \mathbb{R}^d$ of $n$ points, the goal is to produce a spanning tree for $X$ with weight within a small factor of optimal. Euclidean MST is one of the most fundamental hierarchical geometric clustering algorithms, and with the proliferation of enormous high-dimensional data sets, such as massive transformer-based embeddings, there is now a critical demand for efficient distributed algorithms to cluster such data sets. In low-dimensional space, where $d = O(1)$, Andoni, Nikolov, Onak, and Yaroslavtsev [STOC '14] gave a constant round MPC algorithm that obtains a high accuracy $(1+\epsilon)$-approximate solution. However, the situation is much more challenging for high-dimensional spaces: the best-known algorithm to obtain a constant approximation requires $O(\log n)$ rounds. Recently Chen, Jayaram, Levi, and Waingarten [STOC '22] gave a $\tilde{O}(\log n)$ approximation algorithm in a constant number of rounds based on embeddings into tree metrics. However, to date, no known algorithm achieves both a constant number of rounds and a constant approximation. In this paper, we make strong progress on this front by giving a constant factor approximation in $\tilde{O}(\log \log n)$ rounds of the MPC model. In contrast to tree-embedding-based approaches, which necessarily must pay $\Omega(\log n)$-distortion, our algorithm is based on a new combination of graph-based distributed MST algorithms and geometric space partitions. Additionally, although the approximate MST we return can have a large depth, we show that it can be modified to obtain a $\tilde{O}(\log \log n)$-round constant factor approximation to the Euclidean Traveling Salesman Problem (TSP) in the MPC model. Previously, only an $O(\log n)$-round algorithm was known for the problem.
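For reference, the quantity being approximated can be computed exactly on a single machine. The sketch below is a plain sequential Prim's algorithm over all pairwise Euclidean distances, running in quadratic time; it is a baseline that defines the target weight, not the MPC algorithm from the paper, and the function name `euclidean_mst` is hypothetical.

```python
import math


def euclidean_mst(points):
    """Exact Euclidean MST via Prim's algorithm in O(n^2) time.

    Returns (total_weight, edges), where edges are (parent, child) index pairs.
    """
    n = len(points)
    if n == 0:
        return 0.0, []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection cost to the tree
    parent = [-1] * n
    best[0] = 0.0
    weight, edges = 0.0, []
    for _ in range(n):
        # Pick the cheapest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        weight += best[u]
        if parent[u] >= 0:
            edges.append((parent[u], u))
        # Relax connection costs of the remaining vertices.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return weight, edges
```

A constant-factor approximation in the sense of the abstract is any spanning tree whose weight is at most a constant times the weight returned here.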
Enhancing Sample Efficiency and Uncertainty Compensation in Learning-based Model Predictive Control for Aerial Robots
Authors: Kong Yao Chee, Thales C. Silva, M. Ani Hsieh, George J. Pappas
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
The recent increase in data availability and reliability has led to a surge in the development of learning-based model predictive control (MPC) frameworks for robot systems. Despite attaining substantial performance improvements over their non-learning counterparts, many of these frameworks rely on an offline learning procedure to synthesize a dynamics model. This implies that uncertainties encountered by the robot during deployment are not accounted for in the learning process. On the other hand, learning-based MPC methods that learn dynamics models online are computationally expensive and often require a significant amount of data. To alleviate these shortcomings, we propose a novel learning-enhanced MPC framework that incorporates components from $\mathcal{L}_1$ adaptive control into learning-based MPC. This integration enables the accurate compensation of both matched and unmatched uncertainties in a sample-efficient way, enhancing the control performance during deployment. In our proposed framework, we present two variants and apply them to the control of a quadrotor system. Through simulations and physical experiments, we demonstrate that the proposed framework not only allows the synthesis of an accurate dynamics model on-the-fly, but also significantly improves the closed-loop control performance under a wide range of spatio-temporal uncertainties.
Sliding Touch-based Exploration for Modeling Unknown Object Shape with Multi-fingered Hands
Authors: Yiting Chen, Ahmet Ercan Tekden, Marc Peter Deisenroth, Yasemin Bekiroglu
Abstract
Efficient and accurate 3D object shape reconstruction contributes significantly to the success of a robot's physical interaction with its environment. Acquiring accurate shape information about unknown objects is challenging, especially in unstructured environments, e.g. the vision sensors may only be able to provide a partial view. To address this issue, tactile sensors could be employed to extract local surface information for more robust unknown object shape estimation. In this paper, we propose a novel approach for efficient unknown 3D object shape exploration and reconstruction using a multi-fingered hand equipped with tactile sensors and a depth camera only providing a partial view. We present a multi-finger sliding touch strategy for efficient shape exploration using a Bayesian Optimization approach and a single-leader-multi-follower strategy for multi-finger smooth local surface perception. We evaluate our proposed method by estimating the 3D shape of objects from the YCB and OCRTOC datasets based on simulation and real robot experiments. The proposed approach yields successful reconstruction results relying on only a few continuous sliding touches. Experimental results demonstrate that our method is able to model unknown objects in an efficient and accurate way.
Explainable Cost-Sensitive Deep Neural Networks for Brain Tumor Detection from Brain MRI Images considering Data Imbalance
Authors: Md Tanvir Rouf Shawon, G. M. Shahariar Shibli, Farzad Ahmed, Sajib Kumar Saha Joy
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This paper presents a research study on the use of Convolutional Neural Network (CNN), ResNet50, InceptionV3, EfficientNetB0 and NASNetMobile models to efficiently detect brain tumors in order to reduce the time required for manual review of the report and create an automated system for classifying brain tumors. An automated pipeline is proposed, which encompasses five models: CNN, ResNet50, InceptionV3, EfficientNetB0 and NASNetMobile. The performance of the proposed architecture is evaluated on a balanced dataset and found to yield an accuracy of 99.33% for fine-tuned InceptionV3 model. Furthermore, Explainable AI approaches are incorporated to visualize the model's latent behavior in order to understand its black box behavior. To further optimize the training process, a cost-sensitive neural network approach has been proposed in order to work with imbalanced datasets which has achieved almost 4% more accuracy than the conventional models used in our experiments. The cost-sensitive InceptionV3 (CS-InceptionV3) and CNN (CS-CNN) show a promising accuracy of 92.31% and a recall value of 1.00 respectively on an imbalanced dataset. The proposed models have shown great potential in improving tumor detection accuracy and must be further developed for application in practical solutions. We have provided the datasets and made our implementations publicly available at - https://github.com/shahariar-shibli/Explainable-Cost-Sensitive-Deep-Neural-Networks-for-Brain-Tumor-Detection-from-Brain-MRI-Images
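The idea of cost-sensitive training can be illustrated with a class-weighted cross-entropy, sketched below in pure Python. The weighting rule (inverse class frequency, normalized by the number of classes) is a common heuristic assumed here for illustration; the abstract does not state which cost scheme the authors use, and the function names are hypothetical.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Assumed heuristic: weight_c = n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class); rare classes count for more."""
    total, weight_sum = 0.0, 0.0
    for p, y in zip(probs, labels):
        w = weights[y]
        total += -w * math.log(max(p[y], 1e-12))  # clamp to avoid log(0)
        weight_sum += w
    return total / weight_sum
```

With labels `[0, 0, 0, 1]`, the minority class receives weight 2.0 and the majority class about 0.67, so a misclassified minority sample contributes three times as much to the loss as a misclassified majority sample.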
Hessian-Aware Bayesian Optimization for Decision Making Systems
Abstract
Many approaches for optimizing decision making systems rely on gradient based methods requiring informative feedback from the environment. However, in the case where such feedback is sparse or uninformative, such approaches may result in poor performance. Derivative-free approaches such as Bayesian Optimization mitigate the dependency on the quality of gradient feedback, but are known to scale poorly in the high-dimension setting of complex decision making systems. This problem is exacerbated if the system requires interactions between several actors cooperating to accomplish a shared goal. To address the dimensionality challenge, we propose a compact multi-layered architecture modeling the dynamics of actor interactions through the concept of role. Additionally, we introduce Hessian-aware Bayesian Optimization to efficiently optimize the multi-layered architecture parameterized by a large number of parameters. Experimental results demonstrate that our method (HA-GP-UCB) works effectively on several benchmarks under resource constraints and malformed feedback settings.
Arithmetic Deduction Model for High Performance Computing: A Comparative Exploration of Computational Models Paradigms
Authors: Patrick Mukala
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
A myriad of applications ranging from engineering and scientific simulations, image and signal processing as well as highly sensitive data retrieval demand high processing power reaching up to teraflops for their efficient execution. While a standard serial computer would require clock-cycles of less than one per second in this instance, parallel computing is a viable alternative. In adopting parallelism, multiple architectural models such as the PRAM, BSP and DataFlow Models have been proposed and implemented with some limitations due to a number of factors. Perhaps one of the predominant causes is the presence of sequential execution to some extent in these models. This status has triggered the need for improved alternatives. Hence, the Arithmetic Deduction Model has been introduced and its peculiarity can be seen through its use of natural arithmetic concepts to perform computation, and the remarkable substitution or elimination of dependency on variables and states in distributed data processing. Although some initial results about its performance have been published, it is critical to contextualize its genesis. Hence, in this paper we explore the importance of high performance computing and conduct a comparative study of some models of computation in terms of their strengths and limitations and accordingly highlight the need for a new model of computation.
An Empirical Evaluation of AriDeM using Matrix Multiplication
Abstract
For a long time, the von Neumann model has been a successful model of computation for sequential computing. Many models, including the dataflow model, have been unsuccessfully developed to emulate the same results in parallel computing. It is widely accepted that high performance computation is better achieved using parallel architectures, which are seen as the basis for future computational architectures given the ever-increasing need for high performance computation. We describe a new model of parallel computation known as the Arithmetic Deduction Model (AriDeM), which has some similarities with the von Neumann model. A theoretical evaluation conducted on this model in comparison with the predominant von Neumann model indicated AriDeM to be more efficient in resource utilization. In this paper, we conduct an empirical evaluation of the model and the results reflect the output of the theoretical evaluation.
Orthonormal eigenfunction expansions for sixth-order boundary value problems
Abstract
Sixth-order boundary value problems (BVPs) arise in thin-film flows with a surface that has elastic bending resistance. To solve such problems, we first derive a complete set of odd and even orthonormal eigenfunctions -- resembling trigonometric sines and cosines, as well as the so-called ``beam'' functions. These functions intrinsically satisfy boundary conditions (BCs) of relevance to thin-film flows, since they are the solutions of a self-adjoint sixth-order Sturm--Liouville BVP with the same BCs. Next, we propose a Galerkin spectral approach for sixth-order problems; namely the sought function as well as all its derivatives and terms appearing in the differential equation are expanded into an infinite series with respect to the derived complete orthonormal (CON) set of eigenfunctions. The unknown coefficients in the series expansion are determined by solving the algebraic system derived by taking successive inner products with each member of the CON set of eigenfunctions. The proposed method and its convergence are demonstrated by solving two model sixth-order BVPs.
CodeBPE: Investigating Subtokenization Options for Large Language Model Pretraining on Source Code
Authors: Nadezhda Chirkova, Sergey Troshin
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Software Engineering (cs.SE)
Abstract
Recent works have widely adopted large language model pretraining for source code, suggested source code-specific pretraining objectives and investigated the applicability of various Transformer-based language model architectures for source code. This work investigates another important aspect of such models, namely the effect of different subtokenization options, and aims at identifying the most effective and length-efficient subtokenizations, taking into account code specifics. We propose a subtokenization that reduces average length by 17% without downstream performance drop, and show that a carefully chosen subtokenization may improve quality by 0.5-2%, possibly with some length increase.
Learning from Hypervectors: A Survey on Hypervector Encoding
Authors: Sercan Aygun, Mehran Shoushtari Moghadam, M. Hassan Najafi, Mohsen Imani
Abstract
Hyperdimensional computing (HDC) is an emerging computing paradigm that imitates the brain's structure to offer a powerful and efficient processing and learning model. In HDC, the data are encoded with long vectors, called hypervectors, typically with a length of 1K to 10K. The literature provides several encoding techniques to generate orthogonal or correlated hypervectors, depending on the intended application. The existing surveys in the literature often focus on the overall aspects of HDC systems, including system inputs, primary computations, and final outputs. However, this study takes a more specific approach. It zeroes in on the HDC system input and the generation of hypervectors, directly influencing the hypervector encoding process. This survey brings together various methods for hypervector generation from different studies and explores the limitations, challenges, and potential benefits they entail. Through a comprehensive exploration of this survey, readers will acquire a profound understanding of various encoding types in HDC and gain insights into the intricate process of hypervector generation for diverse applications.
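As a small illustration of why long random vectors are useful, the sketch below generates bipolar hypervectors and applies two operations commonly used in HDC, binding (element-wise multiplication) and bundling (element-wise majority). This is one simple encoding among the many the survey covers; the function names and the choice of bipolar values are assumptions for illustration.

```python
import random


def random_hypervector(dim, rng):
    """Random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def bind(a, b):
    """Element-wise multiplication; binding twice with b recovers a."""
    return [x * y for x, y in zip(a, b)]


def bundle(vectors):
    """Element-wise majority vote; use an odd number of vectors to avoid ties."""
    return [1 if sum(col) > 0 else -1 for col in zip(*vectors)]


def similarity(a, b):
    """Cosine similarity, which for bipolar vectors is dot(a, b) / dim."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
a, b, c = (random_hypervector(10_000, rng) for _ in range(3))
```

Independently drawn hypervectors of this length are nearly orthogonal (similarity close to 0), while a bundle of `a`, `b`, and `c` stays clearly similar to each of its members, which is what makes correlated and orthogonal encodings distinguishable.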
Mining Reviews in Open Source Code for Developers Trail: A Process Mining Approach
Authors: Patrick Mukala
Subjects: Software Engineering (cs.SE); Information Retrieval (cs.IR)
Abstract
Audit trails are evidential indications of the activities of performers recorded in logs. Modern reactive systems such as transaction processing systems, management information systems, decision support systems and even executive management systems log activities of users as they perform their daily tasks for a number of reasons, and perhaps one of the most important is security. The trails captured and recorded in these logs play a pivotal role in efficiently monitoring and managing privacy and access to information. In the open source realm, however, this is not the case. Although the objective of free software is to allow for access, free distribution and the right to modify code, having such audit trails can help to trace and understand how active members of these communities are and the type of activities they perform. In this paper, we propose using process mining to construct logs using as much data as can be found in open source repositories in order to produce a process model, also called a workflow net, that graphically depicts the sequential occurrence of developers' activities. Our method is exhibited through a simple algorithm called Act-Trace.
Keyword: faster
Multilevel well modeling in aggregation-based nonlinear multigrid for multiphase flow in porous media
Authors: Chak Shing Lee, François P. Hamon, Nicola Castelletto, Panayot S. Vassilevski, Joshua A. White
Abstract
A full approximation scheme (FAS) nonlinear multigrid solver for two-phase flow and transport problems driven by wells with multiple perforations is developed. It is an extension to our previous work on FAS solvers for diffusion and transport problems. The solver is applicable to discrete problems defined on unstructured grids as the coarsening algorithm is aggregation-based and algebraic. To construct coarse basis that can better capture the radial flow near wells, coarse grids in which perforated well cells are not near the coarse-element interface are desired. This is achieved by an aggregation algorithm proposed in this paper that makes use of the location of well cells in the cell-connectivity graph. Numerical examples in which the FAS solver is compared against Newton's method on benchmark problems are given. In particular, for a refined version of the SAIGUP model, the FAS solver is at least 35% faster than Newton's method for time steps with a CFL number greater than 10.
Neural approximation of Wasserstein distance via a universal architecture for symmetric and factorwise group invariant functions
Abstract
Learning distance functions between complex objects, such as the Wasserstein distance to compare point sets, is a common goal in machine learning applications. However, functions on such complex objects (e.g., point sets and graphs) are often required to be invariant to a wide variety of group actions, e.g., permutation or rigid transformation. Therefore, continuous and symmetric product functions (such as distance functions) on such complex objects must also be invariant to the product of such group actions. We call these functions symmetric and factor-wise group invariant (or SFGI functions in short). In this paper, we first present a general neural network architecture for approximating SFGI functions. The main contribution of this paper combines this general neural network with a sketching idea to develop a specific and efficient neural network which can approximate the $p$-th Wasserstein distance between point sets. Importantly, the required model complexity is independent of the sizes of input point sets. On the theoretical front, to the best of our knowledge, this is the first result showing that there exists a neural network with the capacity to approximate Wasserstein distance with bounded model complexity. Our work provides an interesting integration of sketching ideas for geometric problems with universal approximation of symmetric functions. On the empirical front, we present a range of results showing that our newly proposed neural network architecture performs comparably to or better than other models (including a SOTA Siamese Autoencoder based approach). In particular, our neural network generalizes significantly better and trains much faster than the SOTA Siamese AE. Finally, this line of investigation could be useful in exploring effective neural network design for solving a broad range of geometric optimization problems (e.g., $k$-means in a metric space).
Predictive Modeling through Hyper-Bayesian Optimization
Authors: Manisha Senadeera, Santu Rana, Sunil Gupta, Svetha Venkatesh
Abstract
Model selection is an integral problem of model based optimization techniques such as Bayesian optimization (BO). Current approaches often treat model selection as an estimation problem, to be periodically updated with observations coming from the optimization iterations. In this paper, we propose an alternative way to achieve both efficiently. Specifically, we propose a novel way of integrating model selection and BO for the single goal of reaching the function optima faster. The algorithm moves back and forth between BO in the model space and BO in the function space, where the goodness of the recommended model is captured by a score function and fed back, capturing how well the model helped convergence in the function space. The score function is derived in such a way that it neutralizes the effect of the moving nature of the BO in the function space, thus keeping the model selection problem stationary. This back and forth leads to quick convergence for both model selection and BO in the function space. In addition to improved sample efficiency, the framework outputs information about the black-box function. Convergence is proved, and experimental results show significant improvement compared to standard BO.
Center Contrastive Loss for Metric Learning
Authors: Bolun Cai, Pengfei Xiong, Shangxuan Tian
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Contrastive learning is a widely studied topic in metric learning. However, sampling effective contrastive pairs remains a challenge due to factors such as limited batch size, imbalanced data distribution, and the risk of overfitting. In this paper, we propose a novel metric learning function called Center Contrastive Loss, which maintains a class-wise center bank and compares the category centers with the query data points using a contrastive loss. The center bank is updated in real-time to boost model convergence without the need for well-designed sample mining. The category centers are well-optimized classification proxies to re-balance the supervisory signal of each class. Furthermore, the proposed loss combines the advantages of both contrastive and classification methods by reducing intra-class variations and enhancing inter-class differences to improve the discriminative power of embeddings. Our experimental results, as shown in Figure 1, demonstrate that a standard network (ResNet50) trained with our loss achieves state-of-the-art performance and faster convergence.
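The general shape of a loss that contrasts a query embedding against a bank of class centers can be sketched as follows. This is an illustrative guess at the structure described in the abstract, not the authors' formulation: the cosine similarity, the temperature value, the softmax-style normalization, and the moving-average center update are all assumptions, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def center_contrastive_loss(embedding, label, centers, temperature=0.1):
    """-log softmax of the similarity to the true class center.

    `centers` maps class id -> center vector (the assumed center bank).
    """
    logits = {c: cosine(embedding, v) / temperature for c, v in centers.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits.values()))
    return log_denom - logits[label]


def update_center(centers, label, embedding, momentum=0.9):
    """Assumed online update: exponential moving average of class embeddings."""
    centers[label] = [momentum * c + (1.0 - momentum) * e
                      for c, e in zip(centers[label], embedding)]
```

In this sketch each query is compared against one center per class rather than against sampled pairs, which is the property the abstract credits for removing the need for sample mining.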
Reinforcement Learning-based Non-Autoregressive Solver for Traveling Salesman Problems
Authors: Yubin Xiao, Di Wang, Huanhuan Chen, Boyang Li, Wei Pang, Xuan Wu, Hao Li, Dong Xu, Yanchun Liang, You Zhou
Abstract
The Traveling Salesman Problem (TSP) is a well-known problem in combinatorial optimization with applications in various domains. However, existing TSP solvers face challenges in producing high-quality solutions with low latency. To address this issue, we propose NAR4TSP, which produces TSP solutions in a Non-Autoregressive (NAR) manner using a specially designed Graph Neural Network (GNN), achieving faster inference speed. Moreover, NAR4TSP is trained using an enhanced Reinforcement Learning (RL) strategy, eliminating the dependency on costly labels used to train conventional supervised learning-based NAR models. To the best of our knowledge, NAR4TSP is the first TSP solver that successfully combines RL and NAR decoding. The experimental results on both synthetic and real-world TSP instances demonstrate that NAR4TSP outperforms four state-of-the-art models in terms of solution quality, inference latency, and generalization ability. Lastly, we present visualizations of NAR4TSP's decoding process and its overall path planning to showcase the feasibility of implementing NAR4TSP in an end-to-end manner and its effectiveness, respectively.
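One way to picture non-autoregressive decoding is that the network scores all edges in a single forward pass, and a tour is then read off those scores without calling the network again. The sketch below assumes such an N x N score matrix (`heat`) is already available and decodes it greedily; the matrix, the greedy rule, and the function names are assumptions for illustration, and NAR4TSP's actual decoding procedure may differ.

```python
import math


def greedy_decode(heat, start=0):
    """Build a tour by repeatedly moving to the highest-scoring unvisited node."""
    n = len(heat)
    tour, visited = [start], {start}
    while len(tour) < n:
        cur = tour[-1]
        nxt = max((j for j in range(n) if j not in visited),
                  key=lambda j: heat[cur][j])
        tour.append(nxt)
        visited.add(nxt)
    return tour


def tour_length(points, tour):
    """Length of the closed tour, including the edge back to the start."""
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


# Hypothetical scores standing in for a single forward pass of a model.
heat = [[0.0, 0.9, 0.1, 0.2],
        [0.1, 0.0, 0.8, 0.3],
        [0.2, 0.1, 0.0, 0.9],
        [0.9, 0.1, 0.2, 0.0]]
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
tour = greedy_decode(heat)
```

Because the scores are produced once, decoding cost does not include one network call per visited city, which is the source of the latency advantage that non-autoregressive solvers aim for.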
MonoNext: A 3D Monocular Object Detection with ConvNext
Authors: Marcelo Eduardo Pederiva, José Mario De Martino, Alessandro Zimmer
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Autonomous driving perception tasks rely heavily on cameras as the primary sensor for Object Detection, Semantic Segmentation, Instance Segmentation, and Object Tracking. However, RGB images captured by cameras lack depth information, which poses a significant challenge in 3D detection tasks. To supplement this missing data, mapping sensors such as LIDAR and RADAR are used for accurate 3D Object Detection. Despite their high accuracy, multi-sensor models are expensive and computationally demanding. In contrast, Monocular 3D Object Detection models are becoming increasingly popular, offering a faster, cheaper, and easier-to-implement solution for 3D detections. This paper introduces a different Multi-Tasking Learning approach called MonoNext that utilizes a spatial grid to map objects in the scene. MonoNext employs a straightforward approach based on the ConvNext network and requires only 3D bounding box annotated data. In our experiments with the KITTI dataset, MonoNext achieved high precision and performance competitive with state-of-the-art approaches. Furthermore, with additional training data, MonoNext achieved higher accuracy.
Anderson Accelerated PMHSS for Complex-Symmetric Linear Systems
Authors: Måns I. Andersson, Felix Liu, Stefano Markidis
Abstract
This paper presents the design and development of an Anderson Accelerated Preconditioned Modified Hermitian and Skew-Hermitian Splitting (AA-PMHSS) method for solving complex-symmetric linear systems with application to electromagnetics problems, such as wave scattering and eddy currents. While it has been shown that the Anderson Acceleration of real linear systems is essentially equivalent to GMRES, we show here that the formulation using Anderson acceleration leads to a more performant method. We show relatively good robustness compared to existing preconditioned GMRES methods and significantly better performance due to the faster evaluation of the preconditioner. In particular, AA-PMHSS can be applied to solve problems and equations arising from electromagnetics, such as time-harmonic eddy current simulations discretized with the Finite Element Method. We also evaluate three test systems present in previous literature. We show that the method is competitive with two types of preconditioned GMRES. One of the significant advantages of these methods is that the convergence rate is independent of the discretization size.
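The acceleration step itself can be shown on a generic fixed-point iteration x = g(x). The sketch below implements Anderson acceleration with a history depth of one and no damping, in pure Python; the map `g` is a stand-in for a single sweep of an underlying iterative method. The PMHSS sweep and the complex-symmetric systems from the paper are not implemented here, and the function name is hypothetical.

```python
import math


def anderson_depth1(g, x0, tol=1e-10, max_iter=200):
    """Anderson acceleration (depth 1) for x = g(x) on lists of floats.

    Each step mixes g(x_k) and g(x_{k-1}) with the weight alpha that
    minimizes the norm of the mixed residual, where f(x) = g(x) - x.
    Returns (x, iterations_used).
    """
    x_prev = list(x0)
    g_prev = g(x_prev)
    f_prev = [a - b for a, b in zip(g_prev, x_prev)]
    x = list(g_prev)  # the first step is a plain fixed-point step
    for it in range(1, max_iter + 1):
        gx = g(x)
        f = [a - b for a, b in zip(gx, x)]
        if max(abs(v) for v in f) < tol:
            return x, it
        df = [a - b for a, b in zip(f, f_prev)]
        denom = sum(v * v for v in df)
        # alpha = argmin || f - alpha * (f - f_prev) ||_2
        alpha = sum(a * b for a, b in zip(f, df)) / denom if denom > 0 else 0.0
        x_new = [(1.0 - alpha) * a + alpha * b for a, b in zip(gx, g_prev)]
        x_prev, g_prev, f_prev, x = x, gx, f, x_new
    return x, max_iter


# Usage on a scalar toy problem: solve x = cos(x).
root, iters = anderson_depth1(lambda v: [math.cos(v[0])], [1.0])
```

For a scalar problem this update reduces to the secant method applied to g(x) - x, which is why it needs far fewer evaluations of `g` than plain iteration on the same map.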
Keyword: mobile
Crowd Safety Manager: Towards Data-Driven Active Decision Support for Planning and Control of Crowd Events
Abstract
This paper presents novel technology and methodology aimed at enhancing crowd management in both the planning and operational phases. The approach encompasses innovative data collection techniques, data integration, and visualization using a 3D Digital Twin, along with the incorporation of artificial intelligence (AI) tools for risk identification. The paper introduces the Bowtie model, a comprehensive framework designed to assess and predict risk levels. The model combines objective estimations and predictions, such as traffic flow operations and crowdedness levels, with various aggravating factors like weather conditions, sentiments, and the purpose of visitors, to evaluate the expected risk of incidents. The proposed framework is applied to the Crowd Safety Manager project in Scheveningen, where the DigiTwin is developed based on a wealth of real-time data sources. One noteworthy data source is Resono, offering insights into the number of visitors and their movements, leveraging a mobile phone panel of over 2 million users in the Netherlands. Particular attention is given to the left-hand side of the Bowtie, which includes state estimation, prediction, and forecasting. Notably, the focus is on generating multi-day ahead forecasts for event-planning purposes using Resono data. Advanced machine learning techniques, including the XGBoost framework, are compared, with XGBoost demonstrating the most accurate forecasts. The results indicate that the predictions are adequately accurate. However, certain locations may benefit from additional input data to further enhance prediction quality. Despite these limitations, this work contributes to a more effective crowd management system and opens avenues for further advancements in this critical field.
Mobile Apps for Children's Health and Wellbeing: Design Features and Future Opportunities
Abstract
Mobile health apps hold great potential for promoting children's health and wellbeing. However, there is limited understanding of how these technologies are currently designed to support children with their health concerns or wellness goals. To gain insight into the current landscape of mobile apps designed for children's health, we retrieved and reviewed 43 apps from the iOS and Google Play stores that are specifically marketed for children. Our qualitative analysis identified the dominant health focuses and goals of children's mobile health apps. We analyzed the primary users and their expectations as well as the methods of engagement and involvement adopted. Based on our findings, we discussed the opportunities to support children with chronic illnesses through mobile apps, design for dual use, and design for age appropriateness and digital health safety. This study provides insights and recommendations for app designers, health researchers, and policymakers on strategies for engaging children and parents while also promoting children's health and wellbeing through mobile technology.
DPBERT: Efficient Inference for BERT based on Dynamic Planning
Authors: Weixin Wu, Hankz Hankui Zhuo
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
Large-scale pre-trained language models such as BERT have contributed significantly to the development of NLP. However, these models require large computational resources, making them difficult to deploy on mobile devices where computing power is limited. In this paper we aim to address the weakness of existing input-adaptive inference methods, which fail to take full advantage of the structure of BERT. We propose Dynamic Planning in BERT, a novel fine-tuning strategy that accelerates BERT inference by selecting a subsequence of the backbone's transformer layers as the computational path for an input sample. To do this, our approach adds a planning module to the original BERT model to determine whether a layer is included or bypassed during inference. Experimental results on the GLUE benchmark show that our method reduces latency to 75% while maintaining 98% accuracy, yielding a better accuracy-speed trade-off compared to state-of-the-art input-adaptive methods.
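The include-or-bypass decision can be pictured with a toy sketch like the one below. It only shows how a per-layer keep/skip plan turns a layer stack into a shorter computational path. The learned planning module is replaced by a hard-coded plan, and `run_with_plan`, `make_toy_layer`, and the additive "layers" are hypothetical stand-ins, not the authors' implementation.

```python
from typing import Callable, List, Sequence

Layer = Callable[[List[float]], List[float]]


def run_with_plan(layers: Sequence[Layer], plan: Sequence[bool], hidden: List[float]) -> List[float]:
    """Apply only the layers whose plan entry is True; bypassed layers act as identity."""
    if len(layers) != len(plan):
        raise ValueError("plan must have one entry per layer")
    for layer, keep in zip(layers, plan):
        if keep:
            hidden = layer(hidden)
    return hidden


def make_toy_layer(shift: float) -> Layer:
    """Stand-in for a transformer layer: adds a constant so the chosen path is visible."""
    return lambda h: [v + shift for v in h]


layers = [make_toy_layer(s) for s in (1.0, 10.0, 100.0, 1000.0)]
plan = [True, False, True, False]  # assumed output of a planning module for one input
skipped_path = run_with_plan(layers, plan, [0.0])
full_path = run_with_plan(layers, [True] * len(layers), [0.0])
print(skipped_path, full_path)
```

In this toy setting the plan halves the number of layer evaluations. In the real method the plan would depend on the input sample.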
A Cyber-Physical Routing Protocol Exploiting Trajectory Dynamics for Mission-Oriented Flying Ad Hoc Networks
Authors: Die Hu, Shaoshi Yang, Min Gong, Zhiyong Feng, Xuejun Zhu
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
As a special type of mobile ad hoc network (MANET), the flying ad hoc network (FANET) has the potential to enable a variety of emerging applications in both civilian wireless communications (e.g., 5G and 6G) and the defense industry. The routing protocol plays a pivotal role in FANET. However, when designing the routing protocol for FANET, it is conventionally assumed that the aerial nodes move randomly. This is clearly inappropriate for a mission-oriented FANET (MO-FANET), in which the aerial nodes typically move toward a given destination from given departure point(s), possibly along a roughly deterministic flight path while maintaining a well-established formation, in order to carry out certain missions. In this paper, a novel cyber-physical routing protocol exploiting the particular mobility pattern of an MO-FANET is proposed based on cross-disciplinary integration, which makes full use of the mission-determined trajectory dynamics to construct the time sequence of rejoining and separating, as well as the adjacency matrix for each node, as prior information. Compared with the existing representative routing protocols used in FANETs, our protocol achieves a higher packet-delivery ratio (PDR) at the cost of even lower overhead and lower average end-to-end latency, while maintaining a reasonably moderate and stable network jitter, as demonstrated by extensive ns-3-based simulations assuming realistic configurations in an MO-FANET.
A First Look at Digital Rights Management Systems for Secure Mobile Content Delivery
Authors: Amir Rafi, Carlton Shepherd, Konstantinos Markantonakis
Abstract
Digital rights management (DRM) solutions aim to prevent the copying or distribution of copyrighted material. On mobile devices, a variety of DRM technologies have become widely deployed. However, a detailed security study comparing their internal workings, and their strengths and weaknesses, remains missing in the existing literature. In this paper, we present the first detailed security analysis of mobile DRM systems, addressing the modern paradigm of cloud-based content delivery followed by major platforms, such as Netflix, Disney+, and Amazon Prime. We extensively analyse the security of three widely used DRM solutions -- Google Widevine, Apple FairPlay, and Microsoft PlayReady -- deployed on billions of devices worldwide. We then consolidate their features and capabilities, deriving common features and security properties for their evaluation. Furthermore, we identify some design-level shortcomings that render them vulnerable to emerging attacks within the state of the art, including micro-architectural side-channel vulnerabilities and an absence of post-quantum security. Lastly, we propose mitigations and suggest future directions of research.
Explainable Cost-Sensitive Deep Neural Networks for Brain Tumor Detection from Brain MRI Images considering Data Imbalance
Authors: Md Tanvir Rouf Shawon, G. M. Shahariar Shibli, Farzad Ahmed, Sajib Kumar Saha Joy
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This paper presents a research study on the use of Convolutional Neural Network (CNN), ResNet50, InceptionV3, EfficientNetB0 and NASNetMobile models to efficiently detect brain tumors in order to reduce the time required for manual review of the report and create an automated system for classifying brain tumors. An automated pipeline is proposed, which encompasses five models: CNN, ResNet50, InceptionV3, EfficientNetB0 and NASNetMobile. The performance of the proposed architecture is evaluated on a balanced dataset and found to yield an accuracy of 99.33% for the fine-tuned InceptionV3 model. Furthermore, Explainable AI approaches are incorporated to visualize the model's latent behavior in order to understand its black box behavior. To further optimize the training process, a cost-sensitive neural network approach has been proposed in order to work with imbalanced datasets, which has achieved almost 4% more accuracy than the conventional models used in our experiments. The cost-sensitive InceptionV3 (CS-InceptionV3) and CNN (CS-CNN) show a promising accuracy of 92.31% and a recall value of 1.00 respectively on an imbalanced dataset. The proposed models have shown great potential in improving tumor detection accuracy and must be further developed for application in practical solutions. We have provided the datasets and made our implementations publicly available at https://github.com/shahariar-shibli/Explainable-Cost-Sensitive-Deep-Neural-Networks-for-Brain-Tumor-Detection-from-Brain-MRI-Images
Keyword: pruning
There is no result
Keyword: diffusion
DAVIS: High-Quality Audio-Visual Separation with Generative Diffusion Models
Abstract
We propose DAVIS, a Diffusion model-based Audio-VIsual Separation framework that solves the audio-visual sound source separation task in a generative manner. While existing discriminative methods that perform mask regression have made remarkable progress in this field, they face limitations in capturing the complex data distribution required for high-quality separation of sounds from diverse categories. In contrast, DAVIS leverages a generative diffusion model and a Separation U-Net to synthesize separated magnitudes starting from Gaussian noise, conditioned on both the audio mixture and the visual footage. With its generative objective, DAVIS is better suited to achieving the goal of high-quality sound separation across diverse categories. We compare DAVIS to existing state-of-the-art discriminative audio-visual separation methods on the domain-specific MUSIC dataset and the open-domain AVE dataset, and results show that DAVIS outperforms other methods in separation quality, demonstrating the advantages of our framework for tackling the audio-visual source separation task.
Multilevel well modeling in aggregation-based nonlinear multigrid for multiphase flow in porous media
Authors: Chak Shing Lee, François P. Hamon, Nicola Castelletto, Panayot S. Vassilevski, Joshua A. White
Abstract
A full approximation scheme (FAS) nonlinear multigrid solver for two-phase flow and transport problems driven by wells with multiple perforations is developed. It is an extension of our previous work on FAS solvers for diffusion and transport problems. The solver is applicable to discrete problems defined on unstructured grids, as the coarsening algorithm is aggregation-based and algebraic. To construct a coarse basis that can better capture the radial flow near wells, coarse grids in which perforated well cells are not near the coarse-element interface are desired. This is achieved by an aggregation algorithm proposed in this paper that makes use of the location of well cells in the cell-connectivity graph. Numerical examples in which the FAS solver is compared against Newton's method on benchmark problems are given. In particular, for a refined version of the SAIGUP model, the FAS solver is at least 35% faster than Newton's method for time steps with a CFL number greater than 10.
InFusion: Inject and Attention Fusion for Multi Concept Zero Shot Text based Video Editing
Authors: Anant Khandelwal
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Large text-to-image diffusion models have achieved remarkable success in generating diverse, high-quality images that are closely aligned with the text prompt. However, when these models are applied to video, the main challenge is to ensure temporal consistency and coherent editing. In this paper, we propose InFusion, a framework for zero-shot text-based video editing that leverages large pre-trained image diffusion models. Our framework supports editing of multiple concepts with pixel-level control over the diverse concepts mentioned in the editing prompt. Specifically, we inject the difference of features from U-Net residual blocks in decoder layers for the source and edit prompts; combined with injected attention features, this makes it feasible to query the source contents and scale the edited concepts along with injection of unedited parts. The editing is further controlled in a fine-grained manner with a mask extraction and attention fusion strategy, which cuts the edited part from the source and pastes it from the denoising pipeline for the target prompt. Our framework is a low-cost alternative to one-shot tuned models for editing, since it does not require training. We demonstrate complex concept editing with a generalised image model (Stable Diffusion v1.5) using LoRA. Adaptation is compatible with all the existing image diffusion techniques. Extensive experimental results demonstrate the effectiveness over existing methods in rendering high-quality and temporally consistent videos.
DiffusAL: Coupling Active Learning with Graph Diffusion for Label-Efficient Node Classification
Authors: Sandra Gilhuber, Julian Busch, Daniel Rotthues, Christian M. M. Frey, Thomas Seidl
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Abstract
Node classification is one of the core tasks on attributed graphs, but successful graph learning solutions require sufficiently labeled data. To keep annotation costs low, active graph learning focuses on selecting the most qualitative subset of nodes that maximizes label efficiency. However, deciding which heuristic is best suited for an unlabeled graph to increase label efficiency is a persistent challenge. Existing solutions either neglect aligning the learned model and the sampling method or focus only on limited selection aspects. They are thus sometimes worse or only equally good as random sampling. In this work, we introduce a novel active graph learning approach called DiffusAL, showing significant robustness in diverse settings. Toward better transferability between different graph structures, we combine three independent scoring functions to identify the most informative node samples for labeling in a parameter-free way: i) Model Uncertainty, ii) Diversity Component, and iii) Node Importance computed via graph diffusion heuristics. Most of our calculations for acquisition and training can be pre-processed, making DiffusAL more efficient compared to approaches combining diverse selection criteria and similarly fast as simpler heuristics. Our experiments on various benchmark datasets show that, unlike previous methods, our approach significantly outperforms random selection in 100% of all datasets and labeling budgets tested.
Diffusion Model for Camouflaged Object Detection
Authors: Zhennan Chen, Rongrong Gao, Tian-Zhu Xiang, Fan Lin
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Camouflaged object detection is a challenging task that aims to identify objects that are highly similar to their background. Due to the powerful noise-to-image denoising capability of denoising diffusion models, in this paper, we propose a diffusion-based framework for camouflaged object detection, termed diffCOD, a new framework that considers the camouflaged object segmentation task as a denoising diffusion process from noisy masks to object masks. Specifically, the object mask diffuses from the ground-truth masks to a random distribution, and the designed model learns to reverse this noising process. To strengthen the denoising learning, the input image prior is encoded and integrated into the denoising diffusion model to guide the diffusion process. Furthermore, we design an injection attention module (IAM) to interact conditional semantic features extracted from the image with the diffusion noise embedding via the cross-attention mechanism to enhance denoising learning. Extensive experiments on four widely used COD benchmark datasets demonstrate that the proposed method achieves favorable performance compared to the existing 11 state-of-the-art methods, especially in the detailed texture segmentation of camouflaged objects. Our code will be made publicly available at: https://github.com/ZNan-Chen/diffCOD.
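The noising direction described above, from a ground-truth mask toward a random distribution, can be sketched as follows. The sketch assumes the standard DDPM closed form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps with a linear beta schedule. The abstract does not state the schedule or parameterisation diffCOD uses, so `linear_alpha_bars`, `q_sample`, and all constants are illustrative assumptions.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(mask, alpha_bar_t, noise):
    """Forward (noising) step: blend the clean mask with Gaussian noise."""
    signal = math.sqrt(alpha_bar_t)
    spread = math.sqrt(1.0 - alpha_bar_t)
    return [signal * m + spread * e for m, e in zip(mask, noise)]


alpha_bars = linear_alpha_bars(1000)
rng = random.Random(0)
mask = [0.0, 1.0, 1.0, 0.0]  # flattened toy ground-truth mask
noise = [rng.gauss(0.0, 1.0) for _ in mask]
slightly_noisy = q_sample(mask, alpha_bars[10], noise)
almost_pure_noise = q_sample(mask, alpha_bars[-1], noise)
print(slightly_noisy, almost_pure_noise)
```

A denoising network would be trained to invert this process, with the image features acting as the condition.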
Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models
Authors: Cheng-Yu Hsieh, Si-An Chen, Chun-Liang Li, Yasuhisa Fujii, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Today, large language models (LLMs) are taught to use new tools by providing a few demonstrations of the tool's usage. Unfortunately, demonstrations are hard to acquire, and can result in undesirable biased usage if the wrong demonstration is chosen. Even in the rare scenario that demonstrations are readily available, there is no principled selection protocol to determine how many and which ones to provide. As tasks grow more complex, the selection search grows combinatorially and invariably becomes intractable. Our work provides an alternative to demonstrations: tool documentation. We advocate the use of tool documentation, descriptions for the individual tool usage, over demonstrations. We substantiate our claim through three main empirical findings on 6 tasks across both vision and language modalities. First, on existing benchmarks, zero-shot prompts with only tool documentation are sufficient for eliciting proper tool usage, achieving performance on par with few-shot prompts. Second, on a newly collected realistic tool-use dataset with hundreds of available tool APIs, we show that tool documentation is significantly more valuable than demonstrations, with zero-shot documentation significantly outperforming few-shot without documentation. Third, we highlight the benefits of tool documentations by tackling image generation and video tracking using just-released unseen state-of-the-art models as tools. Finally, we highlight the possibility of using tool documentation to automatically enable new applications: by using nothing more than the documentation of GroundingDino, Stable Diffusion, XMem, and SAM, LLMs can re-invent the functionalities of the just-released Grounded-SAM and Track Anything models.
Keyword: adaptive
Unsupervised machine learning shock capturing for High-Order CFD solvers
Abstract
We present a novel unsupervised machine learning shock capturing algorithm based on Gaussian Mixture Models (GMMs). The proposed GMM sensor demonstrates remarkable accuracy in detecting shocks and is robust across diverse test cases without the need for parameter tuning. We compare the GMM-based sensor with state-of-the-art alternatives. All methods are integrated into a high-order compressible discontinuous Galerkin solver where artificial viscosity can be modulated to capture shocks. Supersonic test cases, including high Reynolds numbers, showcase the sensor's performance, demonstrating the same effectiveness as fine-tuned state-of-the-art sensors. The adaptive nature and ability to function without extensive training datasets make this GMM-based sensor suitable for complex geometries and varied flow configurations. Our study reveals the potential of unsupervised machine learning methods, exemplified by the GMM sensor, to improve the robustness and efficiency of advanced CFD codes.
Task-Oriented Channel Attention for Fine-Grained Few-Shot Classification
Abstract
The difficulty of fine-grained image classification mainly comes from a shared overall appearance across classes. Thus, recognizing discriminative details, such as eyes and beaks for birds, is key to the task. However, this is particularly challenging when training data is limited. To address this, we propose Task Discrepancy Maximization (TDM), a task-oriented channel attention method tailored for fine-grained few-shot classification with two novel modules: the Support Attention Module (SAM) and the Query Attention Module (QAM). SAM highlights channels encoding class-wise discriminative features, while QAM assigns higher weights to object-relevant channels of the query. Based on these submodules, TDM produces task-adaptive features by focusing on channels that encode class-discriminative details and are also possessed by the query, for an accurate class-sensitive similarity measure between support and query instances. While TDM influences high-level feature maps by task-adaptive calibration of channel-wise importance, we further introduce the Instance Attention Module (IAM), which extends QAM and operates in intermediate layers of feature extractors to highlight object-relevant channels instance-wise. The merits of TDM and IAM and their complementary benefits are experimentally validated in fine-grained few-shot classification tasks. Moreover, IAM is also shown to be effective in coarse-grained and cross-domain few-shot classifications.
DPBERT: Efficient Inference for BERT based on Dynamic Planning
Authors: Weixin Wu, Hankz Hankui Zhuo
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
Large-scale pre-trained language models such as BERT have contributed significantly to the development of NLP. However, these models require large computational resources, making them difficult to deploy on mobile devices where computing power is limited. In this paper we aim to address the weakness of existing input-adaptive inference methods, which fail to take full advantage of the structure of BERT. We propose Dynamic Planning in BERT, a novel fine-tuning strategy that accelerates BERT inference by selecting a subsequence of the backbone's transformer layers as the computational path for an input sample. To do this, our approach adds a planning module to the original BERT model to determine whether a layer is included or bypassed during inference. Experimental results on the GLUE benchmark show that our method reduces latency to 75% while maintaining 98% accuracy, yielding a better accuracy-speed trade-off compared to state-of-the-art input-adaptive methods.
AQUILA: Communication Efficient Federated Learning with Adaptive Quantization of Lazily-Aggregated Gradients
Abstract
The widespread adoption of Federated Learning (FL), a privacy-preserving distributed learning methodology, has been impeded by the challenge of high communication overheads, typically arising from the transmission of large-scale models. Existing adaptive quantization methods, designed to mitigate these overheads, operate under the impractical assumption of uniform device participation in every training round. Additionally, these methods are limited in their adaptability due to the necessity of manual quantization level selection and often overlook biases inherent in local devices' data, thereby affecting the robustness of the global model. In response, this paper introduces AQUILA (adaptive quantization of lazily-aggregated gradients), a novel adaptive framework devised to effectively handle these issues, enhancing the efficiency and robustness of FL. AQUILA integrates a sophisticated device selection method that prioritizes the quality and usefulness of device updates. Utilizing the exact global model stored by devices, it enables a more precise device selection criterion, reduces model deviation, and limits the need for hyperparameter adjustments. Furthermore, AQUILA presents an innovative quantization criterion, optimized to improve communication efficiency while assuring model convergence. Our experiments demonstrate that AQUILA significantly decreases communication costs compared to existing methods, while maintaining comparable model performance across diverse non-homogeneous FL settings, such as Non-IID data and heterogeneous model architectures.
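For readers unfamiliar with gradient quantization, the sketch below shows a generic uniform quantizer together with a simple rule that picks the smallest bit width meeting an error budget. It is not AQUILA's quantization criterion or device selection method, which the abstract does not spell out. `quantize`, `dequantize`, `pick_bits`, and the error budget are hypothetical names and values chosen for illustration.

```python
def quantize(vec, bits):
    """Map each entry to an integer code on a uniform grid with 2**bits levels."""
    levels = (1 << bits) - 1
    lo, hi = min(vec), max(vec)
    if hi == lo:
        return [0] * len(vec), lo, 0.0
    step = (hi - lo) / levels
    codes = [min(levels, max(0, round((v - lo) / step))) for v in vec]
    return codes, lo, step


def dequantize(codes, lo, step):
    """Reconstruct approximate values from integer codes."""
    return [lo + c * step for c in codes]


def pick_bits(vec, max_error, max_bits=16):
    """Assumed adaptive rule: smallest bit width whose half-step fits the error budget."""
    span = max(vec) - min(vec)
    for bits in range(1, max_bits + 1):
        if span / ((1 << bits) - 1) / 2.0 <= max_error:
            return bits
    return max_bits


gradient = [-0.8, -0.1, 0.0, 0.3, 1.2]  # toy local update
bits = pick_bits(gradient, max_error=0.05)
codes, lo, step = quantize(gradient, bits)
restored = dequantize(codes, lo, step)
worst_error = max(abs(a - b) for a, b in zip(gradient, restored))
print(bits, codes, worst_error)
```

Only the integer codes plus the two floats `lo` and `step` would need to be transmitted, which is where the communication saving comes from.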
A Dual-space Multilevel Kernel-splitting Framework for Discrete and Continuous Convolution
Abstract
We introduce a new class of multilevel, adaptive, dual-space methods for computing fast convolutional transforms. These methods can be applied to a broad class of kernels, from the Green's functions for classical partial differential equations (PDEs) to power functions and radial basis functions such as those used in statistics and machine learning. The DMK (dual-space multilevel kernel-splitting) framework uses a hierarchy of grids, computing a smoothed interaction at the coarsest level, followed by a sequence of corrections at finer and finer scales until the problem is entirely local, at which point direct summation is applied. The main novelty of DMK is that the interaction at each scale is diagonalized by a short Fourier transform, permitting the use of separation of variables, but without requiring the FFT for its asymptotic performance. The DMK framework substantially simplifies the algorithmic structure of the fast multipole method (FMM) and unifies the FMM, Ewald summation, and multilevel summation, achieving speeds comparable to the FFT in work per gridpoint, even in a fully adaptive context. For continuous source distributions, the evaluation of local interactions is further accelerated by approximating the kernel at the finest level as a sum of Gaussians with a highly localized remainder. The Gaussian convolutions are calculated using tensor product transforms, and the remainder term is calculated using asymptotic methods. We illustrate the performance of DMK for both continuous and discrete sources with extensive numerical examples in two and three dimensions.
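The basic idea of splitting a kernel into a smooth part and a localized remainder can be seen in the classical single-level Ewald-type split of the 3D Laplace kernel, 1/r = erf(r/sigma)/r + erfc(r/sigma)/r. The sketch below checks this identity numerically. It is a one-level illustration only, not the multilevel DMK hierarchy, and the function names and the value of `sigma` are assumptions.

```python
import math


def smooth_part(r, sigma):
    """Long-range piece erf(r/sigma)/r, which stays finite as r -> 0."""
    if r == 0.0:
        return 2.0 / (sigma * math.sqrt(math.pi))  # limit of erf(r/sigma)/r at r = 0
    return math.erf(r / sigma) / r


def local_part(r, sigma):
    """Short-range remainder erfc(r/sigma)/r, negligible once r is several sigma."""
    return math.erfc(r / sigma) / r


sigma = 0.5
for r in (0.1, 0.5, 2.0, 5.0):
    total = smooth_part(r, sigma) + local_part(r, sigma)
    print(r, smooth_part(r, sigma), local_part(r, sigma), total - 1.0 / r)
```

The smooth part is well suited to Fourier-type treatment on a grid, while the remainder only couples nearby points and can be summed directly.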
Online Prototype Learning for Online Continual Learning
Abstract
Online continual learning (CL) studies the problem of learning continuously from a single-pass data stream while adapting to new data and mitigating catastrophic forgetting. Recently, by storing a small subset of old data, replay-based methods have shown promising performance. Unlike previous methods that focus on sample storage or knowledge distillation against catastrophic forgetting, this paper aims to understand why the online learning models fail to generalize well from a new perspective of shortcut learning. We identify shortcut learning as the key limiting factor for online CL, where the learned features may be biased, not generalizable to new tasks, and may have an adverse impact on knowledge distillation. To tackle this issue, we present the online prototype learning (OnPro) framework for online CL. First, we propose online prototype equilibrium to learn representative features against shortcut learning and discriminative features to avoid class confusion, ultimately achieving an equilibrium status that separates all seen classes well while learning new classes. Second, with the feedback of online prototypes, we devise a novel adaptive prototypical feedback mechanism to sense the classes that are easily misclassified and then enhance their boundaries. Extensive experimental results on widely-used benchmark datasets demonstrate the superior performance of OnPro over the state-of-the-art baseline methods. Source code is available at https://github.com/weilllllls/OnPro.
Deep Image Harmonization with Learnable Augmentation
Authors: Li Niu, Junyan Cao, Wenyan Cong, Liqing Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The goal of image harmonization is adjusting the foreground appearance in a composite image to make the whole image harmonious. To construct paired training images, existing datasets adopt different ways to adjust the illumination statistics of foregrounds of real images to produce synthetic composite images. However, different datasets have considerable domain gap and the performances on small-scale datasets are limited by insufficient training data. In this work, we explore learnable augmentation to enrich the illumination diversity of small-scale datasets for better harmonization performance. In particular, our designed SYthetic COmposite Network (SycoNet) takes in a real image with foreground mask and a random vector to learn suitable color transformation, which is applied to the foreground of this real image to produce a synthetic composite image. Comprehensive experiments demonstrate the effectiveness of our proposed learnable augmentation for image harmonization. The code of SycoNet is released at https://github.com/bcmi/SycoNet-Adaptive-Image-Harmonization.
Artificial-Intelligence-Based Hybrid Extended Phase Shift Modulation for the Dual Active Bridge Converter with Full ZVS Range and Optimal Efficiency
Abstract
The dual active bridge (DAB) converter is a key enabler in many popular applications such as wireless charging, electric vehicles, and renewable energy. ZVS range and efficiency are two significant performance indicators for the DAB converter. To obtain the desired ZVS and efficiency performance, modulation should be carefully designed. Hybrid modulation considers several single modulation strategies to achieve good comprehensive performance. Conventionally, a harmonic approach or a piecewise approach is used to design a hybrid modulation, but both suffer from a time-consuming model-building process and inaccuracy. Therefore, an artificial-intelligence-based hybrid extended phase shift (HEPS) modulation is proposed. Generally, the HEPS modulation is developed in an automated fashion, which alleviates the cumbersome model-building process while keeping high model accuracy. In HEPS modulation, two EPS strategies are considered to realize optimal efficiency with full ZVS operation over the entire operating range. Specifically, to build data-driven models of ZVS and efficiency performance, extreme gradient boosting (XGBoost), which is a state-of-the-art ensemble learning algorithm, is adopted. Afterwards, particle swarm optimization with state-based adaptive velocity limit (PSO-SAVL) is utilized to select the best EPS strategy and optimize modulation parameters. With 1 kW hardware experiments, the feasibility of HEPS has been verified, achieving optimal efficiency with a maximum of 97.1% and full-range ZVS operation.
Efficient Federated Learning via Local Adaptive Amended Optimizer with Linear Speedup
Authors: Yan Sun, Li Shen, Hao Sun, Liang Ding, Dacheng Tao
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
Abstract
Adaptive optimization has achieved notable success for distributed learning, while extending adaptive optimizers to federated learning (FL) suffers from severe inefficiency, including (i) rugged convergence due to inaccurate gradient estimation in the global adaptive optimizer; (ii) client drifts exacerbated by local over-fitting with the local adaptive optimizer. In this work, we propose a novel momentum-based algorithm that utilizes global gradient descent and a locally adaptive amended optimizer to tackle these difficulties. Specifically, we incorporate a locally amended technique into the adaptive optimizer, named Federated Local ADaptive Amended optimizer (FedLADA), which estimates the global average offset in the previous communication round and corrects the local offset through a momentum-like term to further improve the empirical training speed and mitigate the heterogeneous over-fitting. Theoretically, we establish the convergence rate of FedLADA with a linear speedup property in the non-convex case under partial participation settings. Moreover, we conduct extensive experiments on the real-world dataset to demonstrate the efficacy of our proposed FedLADA, which greatly reduces the communication rounds and achieves higher accuracy than several baselines.
Point Annotation Probability Map: Towards Dense Object Counting by Tolerating Annotation Noise
Authors: Yuehai Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Counting objects in crowded scenes remains a challenge in computer vision. Current deep-learning-based approaches often formulate it as a Gaussian density regression problem. Such a brute-force regression, though effective, may not properly account for the annotation noise that arises from the human annotation process and may lead to different distributions. We conjecture that it would be beneficial to consider the annotation noise in the dense object counting task. To obtain strong robustness against annotation noise, a generalized Gaussian distribution (GGD) function with tunable bandwidth and shape parameters is exploited to form the learning target, the point annotation probability map (PAPM). Specifically, we first present a hand-designed PAPM method (HD-PAPM), in which we design a function based on GGD to tolerate the annotation noise. For end-to-end training, however, the hand-designed PAPM may not be optimal for the particular network and dataset, so an adaptively learned PAPM method (AL-PAPM) is proposed. To improve the robustness to annotation noise, we design an effective transport cost function based on GGD. With such transport cost constraints, a better PAPM representation can be adaptively learned with an optimal transport framework from point annotations in an end-to-end manner. Extensive experiments show the superiority of our proposed methods.
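For reference, the one-dimensional generalized Gaussian density with location mu, scale (bandwidth) alpha, and shape beta is f(x) = beta / (2 * alpha * Gamma(1/beta)) * exp(-(|x - mu| / alpha)^beta). The sketch below evaluates it and checks two familiar special cases. How the paper turns a GGD into a 2D probability map around each point annotation is not given in the abstract, so `ggd_pdf` is only a hypothetical building block.

```python
import math


def ggd_pdf(x, mu=0.0, alpha=1.0, beta=2.0):
    """Generalized Gaussian density: alpha sets the bandwidth, beta the shape."""
    coeff = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return coeff * math.exp(-((abs(x - mu) / alpha) ** beta))


# beta = 2 with alpha = sqrt(2) * sigma recovers a normal density with std sigma.
normal_peak = ggd_pdf(0.0, alpha=math.sqrt(2.0), beta=2.0)
# beta = 1 recovers a Laplace density with scale alpha.
laplace_peak = ggd_pdf(0.0, alpha=1.0, beta=1.0)
# Larger beta gives a flatter top, so small annotation offsets change the value less.
flat_top_ratio = ggd_pdf(0.3, beta=8.0) / ggd_pdf(0.0, beta=8.0)
print(normal_peak, laplace_peak, flat_top_ratio)
```

The flat top for large beta gives some intuition for why a tunable shape parameter could tolerate small displacements in point annotations.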
Adaptive Bitrate Video Semantic Communication over Wireless Networks
Abstract
This paper investigates adaptive bitrate (ABR) video semantic communication over wireless networks. In the considered model, video sensing devices must transmit video semantic information to an edge server to facilitate ubiquitous video sensing services, such as road environment monitoring at the edge server in autonomous driving scenarios. However, due to the varying wireless network conditions, it is challenging to guarantee both low transmission delay and high semantic accuracy at the same time if devices continuously transmit video semantic information at a fixed bitrate. To address this challenge, we develop an adaptive bitrate video semantic communication (ABRVSC) system, in which devices adaptively adjust the bitrate of video semantic information according to network conditions. Specifically, we first define the quality of experience (QoE) for video semantic communication. Subsequently, a Swin Transformer-based semantic codec is proposed to extract semantic information while considering the influence of QoE. Then, we propose an Actor-Critic-based ABR algorithm for the semantic codec to enhance the robustness of the proposed ABRVSC scheme against network variations. Simulation results demonstrate that at low bitrates, the mean intersection over union (MIoU) of the proposed ABRVSC scheme is nearly twice that of the traditional scheme. Moreover, the proposed ABRVSC scheme, which increases the QoE in video semantic communication by 36.57%, is more robust against network variations than both the fixed bitrate schemes and traditional ABR schemes.
Enhancing Sample Efficiency and Uncertainty Compensation in Learning-based Model Predictive Control for Aerial Robots
Authors: Kong Yao Chee, Thales C. Silva, M. Ani Hsieh, George J. Pappas
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
The recent increase in data availability and reliability has led to a surge in the development of learning-based model predictive control (MPC) frameworks for robot systems. Despite attaining substantial performance improvements over their non-learning counterparts, many of these frameworks rely on an offline learning procedure to synthesize a dynamics model. This implies that uncertainties encountered by the robot during deployment are not accounted for in the learning process. On the other hand, learning-based MPC methods that learn dynamics models online are computationally expensive and often require a significant amount of data. To alleviate these shortcomings, we propose a novel learning-enhanced MPC framework that incorporates components from $\mathcal{L}_1$ adaptive control into learning-based MPC. This integration enables the accurate compensation of both matched and unmatched uncertainties in a sample-efficient way, enhancing the control performance during deployment. In our proposed framework, we present two variants and apply them to the control of a quadrotor system. Through simulations and physical experiments, we demonstrate that the proposed framework not only allows the synthesis of an accurate dynamics model on-the-fly, but also significantly improves the closed-loop control performance under a wide range of spatio-temporal uncertainties.
Adaptive Sliding Mode Controller and Observer for Altitude and Attitude Control of a Quadrotor
Authors: Telli Khaled, M. Boumehraz
Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP)
Abstract
In this paper, an adaptive sliding mode control approach for quadrotor stabilization and trajectory tracking is presented. The closed-loop control consists of three parts: the first handles quadrotor altitude and attitude stabilization and trajectory tracking; the second performs parameter estimation, with a focus on mass estimation; and the third provides full-state observation. Disturbances, sensor noise, and parameter uncertainties are taken into consideration. The sliding mode control law and the observer adaptation are developed based on the Lyapunov stability principle. Numerical simulations show the effectiveness of the proposed control technique.
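As a rough illustration of the sliding mode idea, the sketch below stabilizes altitude only, for a simplified vertical model m*z'' = u - m*g with a constant reference. It assumes a known mass and perfect state measurements, so the mass adaptation, the observer, the attitude loop, and the disturbances described in the abstract are all left out. The gains `lam` and `k`, the function names, and the Euler integration are assumptions chosen for the example. With the surface s = e' + lam*e, the control law below gives s' = -k*sign(s), so V = s^2/2 satisfies V' = -k*|s| <= 0.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def smc_thrust(z, vz, z_ref, mass, lam=2.0, k=2.0, g=9.81):
    """Thrust for m*z'' = u - m*g using the sliding surface s = e' + lam*e."""
    e = z - z_ref
    s = vz + lam * e  # the reference is constant, so e' equals vz
    return mass * (g - lam * vz - k * sign(s))


def simulate(z_ref=1.0, mass=0.5, dt=0.001, steps=5000, g=9.81):
    """Explicit Euler simulation of the closed loop, starting at rest at z = 0."""
    z, vz = 0.0, 0.0
    for _ in range(steps):
        u = smc_thrust(z, vz, z_ref, mass, g=g)
        acc = u / mass - g
        z, vz = z + dt * vz, vz + dt * acc
    return z, vz
```

Calling `simulate()` returns the final altitude and vertical speed after five simulated seconds. The discontinuous `sign` term causes the small chattering around the surface that is typical of basic sliding mode laws.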
Keyword: quantization
AQUILA: Communication Efficient Federated Learning with Adaptive Quantization of Lazily-Aggregated Gradients
Abstract
The widespread adoption of Federated Learning (FL), a privacy-preserving distributed learning methodology, has been impeded by the challenge of high communication overheads, typically arising from the transmission of large-scale models. Existing adaptive quantization methods, designed to mitigate these overheads, operate under the impractical assumption of uniform device participation in every training round. Additionally, these methods are limited in their adaptability due to the necessity of manual quantization level selection and often overlook biases inherent in local devices' data, thereby affecting the robustness of the global model. In response, this paper introduces AQUILA (adaptive quantization of lazily-aggregated gradients), a novel adaptive framework devised to effectively handle these issues, enhancing the efficiency and robustness of FL. AQUILA integrates a sophisticated device selection method that prioritizes the quality and usefulness of device updates. Utilizing the exact global model stored by devices, it enables a more precise device selection criterion, reduces model deviation, and limits the need for hyperparameter adjustments. Furthermore, AQUILA presents an innovative quantization criterion, optimized to improve communication efficiency while assuring model convergence. Our experiments demonstrate that AQUILA significantly decreases communication costs compared to existing methods, while maintaining comparable model performance across diverse non-homogeneous FL settings, such as Non-IID data and heterogeneous model architectures.
Asynchronous Federated Learning with Bidirectional Quantized Communications and Buffered Aggregation
Authors: Tomas Ortega, Hamid Jafarkhani
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Optimization and Control (math.OC)
Abstract
Asynchronous Federated Learning with Buffered Aggregation (FedBuff) is a state-of-the-art algorithm known for its efficiency and high scalability. However, it has a high communication cost, which has not been examined with quantized communications. To tackle this problem, we present a new algorithm (QAFeL), with a quantization scheme that establishes a shared "hidden" state between the server and clients to avoid the error propagation caused by direct quantization. This approach allows for high precision while significantly reducing the data transmitted during client-server interactions. We provide theoretical convergence guarantees for QAFeL and corroborate our analysis with experiments on a standard benchmark.
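The shared hidden state can be pictured with a small sketch: instead of quantizing the model itself, the sender quantizes the gap between the model and a state that both sides update identically. The abstract does not specify the QAFeL quantizer or update rules, so the uniform quantizer, the class `HiddenState`, and the function `broadcast` below are hypothetical and show only the general idea for the server-to-client direction.

```python
def quantize(vec, step=0.1):
    """Uniform quantizer: round every entry to the nearest multiple of step."""
    return [step * round(v / step) for v in vec]


class HiddenState:
    """Copy of the shared state; the server and each client hold one."""

    def __init__(self, dim):
        self.h = [0.0] * dim

    def apply(self, q_delta):
        """Advance the state by a quantized message."""
        self.h = [h + q for h, q in zip(self.h, q_delta)]


def broadcast(model, server_state, client_states, step=0.1):
    """Send only the quantized gap between the model and the shared state."""
    delta = [m - h for m, h in zip(model, server_state.h)]
    message = quantize(delta, step)
    server_state.apply(message)
    for client in client_states:
        client.apply(message)
    return message
```

Because every copy applies the same quantized messages in the same order, the copies stay identical. Under this uniform quantizer, the gap between the model and the hidden state after each broadcast is at most half a quantization step and does not accumulate over rounds.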
Harnessing the Power of Sample Abundance: Theoretical Guarantees and Algorithms for Accelerated One-Bit Sensing
Authors: Arian Eamaz, Farhang Yeganegi, Deanna Needell, Mojtaba Soltanalian
Abstract
One-bit quantization with time-varying sampling thresholds (also known as random dithering) has recently found significant utilization potential in statistical signal processing applications due to its relatively low power consumption and low implementation cost. In addition to such advantages, an attractive feature of one-bit analog-to-digital converters (ADCs) is their superior sampling rates as compared to their conventional multi-bit counterparts. This characteristic endows one-bit signal processing frameworks with what one may refer to as sample abundance. We show that sample abundance plays a pivotal role in many signal recovery and optimization problems that are formulated as (possibly non-convex) quadratic programs with linear feasibility constraints. Of particular interest to our work are low-rank matrix recovery and compressed sensing applications that take advantage of one-bit quantization. We demonstrate that the sample abundance paradigm allows for the transformation of such problems to merely linear feasibility problems by forming large-scale overdetermined linear systems -- thus removing the need for handling costly optimization constraints and objectives. To make the proposed computational cost savings achievable, we offer enhanced randomized Kaczmarz algorithms to solve these highly overdetermined feasibility problems and provide theoretical guarantees in terms of their convergence, sample size requirements, and overall performance. Several numerical results are presented to illustrate the effectiveness of the proposed methodologies.
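The reduction to a linear feasibility problem can be sketched as follows. Each one-bit sample records only whether a linear measurement a_i . x lies above or below its threshold tau_i, which yields the inequality r_i * (a_i . x - tau_i) >= 0. A plain randomized Kaczmarz iteration then projects onto one violated half-space at a time. This is the textbook variant for linear inequalities, not the enhanced algorithms of the paper; the function names, the uniform row sampling, and the zero initial point are assumptions for the example.

```python
import random


def one_bit_measure(rows, x_true, thresholds):
    """Compare each linear measurement with its threshold; keep only the sign."""
    signs = []
    for a, tau in zip(rows, thresholds):
        value = sum(ai * xi for ai, xi in zip(a, x_true))
        signs.append(1.0 if value >= tau else -1.0)
    return signs


def kaczmarz_feasibility(rows, thresholds, signs, iters=5000, seed=0):
    """Randomized Kaczmarz for the system sign_i * (a_i . x - tau_i) >= 0."""
    rng = random.Random(seed)
    x = [0.0] * len(rows[0])
    for _ in range(iters):
        i = rng.randrange(len(rows))
        a, tau, r = rows[i], thresholds[i], signs[i]
        residual = r * (tau - sum(ai * xi for ai, xi in zip(a, x)))
        if residual > 0.0:  # constraint violated: project onto its half-space
            scale = r * residual / sum(ai * ai for ai in a)
            x = [xi + scale * ai for xi, ai in zip(x, a)]
    return x
```

Since the true signal satisfies every inequality, each projection moves the iterate no farther from it. With many samples and well-spread thresholds, the feasible region shrinks around the signal, which is the intuition behind sample abundance.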
Keyword: efficient
Detection and Classification of Novel Attacks and Anomaly in IoT Network using Rule based Deep Learning Model
Monaural Multi-Speaker Speech Separation Using Efficient Transformer Model
An Efficient Recommendation System in E-commerce using Passer learning optimization based on Bi-LSTM
Semi-Supervised Laplacian Learning on Stiefel Manifolds
Formally Explaining Neural Networks within Reactive Systems
DiffusAL: Coupling Active Learning with Graph Diffusion for Label-Efficient Node Classification
Attribution-Scores in Data Management and Explainable Machine Learning
Robust Single-view Cone-beam X-ray Pose Estimation with Neural Tuned Tomography (NeTT) and Masked Neural Radiance Fields (mNeRF)
Experiments on Generative AI-Powered Parametric Modeling and BIM for Architectural Design
Demonstrating Autonomous 3D Path Planning on a Novel Scalable UGV-UAV Morphing Robot
LGViT: Dynamic Early Exiting for Accelerating Vision Transformer
Neural approximation of Wasserstein distance via a universal architecture for symmetric and factorwise group invariant functions
Exploiting Sparsity for Localization of Large-Scale Wireless Sensor Networks
CLAMS: A Cluster Ambiguity Measure for Estimating Perceptual Variability in Visual Clustering
Predictive Modeling through Hyper-Bayesian Optimization
Patch Space Exploration using Static Analysis Feedback
Domain Adaptation based on Human Feedback for Enhancing Generative Model Denoising Abilities
'Good' and 'Bad' results for unseen data, how can the quality of the 'Bad' results be raised? Most methods use an approach based on generalization of the model. However, these methods require target images for training or for adapting to an unseen domain. In this paper, to adapt to a new domain, we work with non-target images from the unseen domain and improve specific failed images. To address this, we propose a method for fine-tuning inappropriate results generated in a different domain by utilizing human feedback. First, we train a generator to denoise images using only the noisy MNIST digit '0' images. The denoising generator trained on the source domain leads to unintended results when applied to target domain images. To achieve domain adaptation, we construct a data set of noisy images paired with generated denoised images and train a reward model to predict human feedback. Finally, we fine-tune the generator on the different domain using the reward model with an auxiliary loss function, aiming to transfer denoising capabilities to the target domain. Our approach demonstrates the potential to efficiently fine-tune a generator trained on one domain using human feedback from another domain, thereby enhancing denoising abilities in different domains.
Pixel to policy: DQN Encoders for within & cross-game reinforcement learning
Informative Path Planning of Autonomous Vehicle for Parking Occupancy Estimation
Reconstruction Harmonic Balance Method and its Application in Solving Complex Nonlinear Dynamical Systems
VideoPro: A Visual Analytics Approach for Interactive Video Programming
Coded Modulation Schemes for Voronoi Constellations
Patch-wise Auto-Encoder for Visual Anomaly Detection
FLatten Transformer: Vision Transformer using Focused Linear Attention
Complexity evaluation of network configurations and abstractions
Context-Aware Talking-Head Video Editing
Leveraging MLIR for Loop Vectorization and GPU Porting of FFT Libraries
Massively Parallel Algorithms for High-Dimensional Euclidean Minimum Spanning Tree
Enhancing Sample Efficiency and Uncertainty Compensation in Learning-based Model Predictive Control for Aerial Robots
Sliding Touch-based Exploration for Modeling Unknown Object Shape with Multi-fingered Hands
Explainable Cost-Sensitive Deep Neural Networks for Brain Tumor Detection from Brain MRI Images considering Data Imbalance
Hessian-Aware Bayesian Optimization for Decision Making Systems
Arithmetic Deduction Model for High Performance Computing: A Comparative Exploration of Computational Models Paradigms
An Empirical Evaluation of AriDeM using Matrix Multiplication
Orthonormal eigenfunction expansions for sixth-order boundary value problems
CodeBPE: Investigating Subtokenization Options for Large Language Model Pretraining on Source Code
Learning from Hypervectors: A Survey on Hypervector Encoding
Mining Reviews in Open Source Code for Developers Trail: A Process Mining Approach
Keyword: faster
Multilevel well modeling in aggregation-based nonlinear multigrid for multiphase flow in porous media
Neural approximation of Wasserstein distance via a universal architecture for symmetric and factorwise group invariant functions
Predictive Modeling through Hyper-Bayesian Optimization
Center Contrastive Loss for Metric Learning
Reinforcement Learning-based Non-Autoregressive Solver for Traveling Salesman Problems
MonoNext: A 3D Monocular Object Detection with ConvNext
Anderson Accelerated PMHSS for Complex-Symmetric Linear Systems
Keyword: mobile
Crowd Safety Manager: Towards Data-Driven Active Decision Support for Planning and Control of Crowd Events
Mobile Apps for Children's Health and Wellbeing: Design Features and Future Opportunities
DPBERT: Efficient Inference for BERT based on Dynamic Planning
A Cyber-Physical Routing Protocol Exploiting Trajectory Dynamics for Mission-Oriented Flying Ad Hoc Networks
A First Look at Digital Rights Management Systems for Secure Mobile Content Delivery
Explainable Cost-Sensitive Deep Neural Networks for Brain Tumor Detection from Brain MRI Images considering Data Imbalance
Keyword: pruning
There is no result
Keyword: diffusion
DAVIS: High-Quality Audio-Visual Separation with Generative Diffusion Models
Multilevel well modeling in aggregation-based nonlinear multigrid for multiphase flow in porous media
InFusion: Inject and Attention Fusion for Multi Concept Zero Shot Text based Video Editing
DiffusAL: Coupling Active Learning with Graph Diffusion for Label-Efficient Node Classification
Diffusion Model for Camouflaged Object Detection
Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models
Keyword: adaptive
Unsupervised machine learning shock capturing for High-Order CFD solvers
Task-Oriented Channel Attention for Fine-Grained Few-Shot Classification
DPBERT: Efficient Inference for BERT based on Dynamic Planning
AQUILA: Communication Efficient Federated Learning with Adaptive Quantization of Lazily-Aggregated Gradients
A Dual-space Multilevel Kernel-splitting Framework for Discrete and Continuous Convolution
Online Prototype Learning for Online Continual Learning
Deep Image Harmonization with Learnable Augmentation
Artificial-Intelligence-Based Hybrid Extended Phase Shift Modulation for the Dual Active Bridge Converter with Full ZVS Range and Optimal Efficiency
Efficient Federated Learning via Local Adaptive Amended Optimizer with Linear Speedup
Point Annotation Probability Map: Towards Dense Object Counting by Tolerating Annotation Noise
Adaptive Bitrate Video Semantic Communication over Wireless Networks
Enhancing Sample Efficiency and Uncertainty Compensation in Learning-based Model Predictive Control for Aerial Robots
Adaptive Sliding Mode Controller and Observer for Altitude and Attitude Control of a Quadrotor
Keyword: quantization
AQUILA: Communication Efficient Federated Learning with Adaptive Quantization of Lazily-Aggregated Gradients
Asynchronous Federated Learning with Bidirectional Quantized Communications and Buffered Aggregation
Harnessing the Power of Sample Abundance: Theoretical Guarantees and Algorithms for Accelerated One-Bit Sensing