Abstract
Reinforcement learning (RL) has shown great potential for solving complex tasks in a variety of domains. However, applying RL to safety-critical systems in the real world is difficult: many algorithms are sample-inefficient, and maximising the standard RL objective comes with no guarantees on worst-case performance. In this paper we propose approximate model-based shielding (AMBS), a principled look-ahead shielding algorithm for verifying the performance of learned RL policies w.r.t. a given set of safety constraints. Our algorithm differs from other shielding approaches in that it does not require prior knowledge of the safety-relevant dynamics of the system. We provide a strong theoretical justification for AMBS and demonstrate superior performance to other safety-aware approaches on a set of Atari games with state-dependent safety labels.
An ensemble of online estimation methods for one degree-of-freedom models of unmanned surface vehicles: applied theory and preliminary field results with eight vehicles
Abstract
In this paper we report an experimental evaluation of three popular methods for online system identification of unmanned surface vehicles (USVs), implemented as an ensemble: a certifiably stable shallow recurrent neural network (RNN), adaptive identification (AID), and recursive least squares (RLS). The algorithms were deployed on eight USVs for a total of 30 hours of online estimation. During online training, the loss function for the RNN was augmented with a cost for violating a sufficient condition for the RNN to be stable in the sense of contraction stability. Additionally, we describe an efficient method to calculate the equilibrium points of the RNN and to classify the stability properties of these points. We found that the AID method had the lowest mean absolute error in the online prediction setting, but a weighted ensemble had lower error in offline processing.
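As background for the third estimator in the ensemble, a minimal recursive least squares update with forgetting can be sketched in a few lines. The one-degree-of-freedom model form `v_next = a*v + b*u`, the forgetting factor, and the initialization below are illustrative assumptions, not the exact formulation or gains used in the paper:

```python
# Recursive least squares with forgetting for v_next = a*v + b*u.
# Toy model and parameters; pure-Python for clarity.

def rls_update(theta, P, x, y, lam=0.99):
    """One RLS step: parameter vector theta, covariance P, regressor x, target y."""
    n = len(x)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(x[i] * Px[i] for i in range(n))
    k = [Pxi / denom for Pxi in Px]                      # Kalman-style gain
    err = y - sum(theta[i] * x[i] for i in range(n))     # prediction error
    theta = [theta[i] + k[i] * err for i in range(n)]
    P = [[(P[i][j] - k[i] * Px[j]) / lam for j in range(n)] for i in range(n)]
    return theta, P

theta, P = [0.0, 0.0], [[1000.0, 0.0], [0.0, 1000.0]]
v = 1.0
for t in range(200):
    u = 1.0 if t % 3 else -1.0            # persistently exciting input
    v_next = 0.9 * v + 0.1 * u            # "true" vehicle: a=0.9, b=0.1
    theta, P = rls_update(theta, P, [v, u], v_next)
    v = v_next
print([round(p, 3) for p in theta])       # converges toward [0.9, 0.1]
```

With noiseless data and a persistently exciting input, the estimate recovers the true parameters; in the field-data setting of the paper the same recursion runs on noisy measurements.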
Hybrid-SORT: Weak Cues Matter for Online Multi-Object Tracking
Authors: Mingzhan Yang, Guangxin Han, Bin Yan, Wenhua Zhang, Jinqing Qi, Huchuan Lu, Dong Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Multi-Object Tracking (MOT) aims to detect and associate all desired objects across frames. Most methods accomplish the task by explicitly or implicitly leveraging strong cues (i.e., spatial and appearance information), which exhibit powerful instance-level discrimination. However, when object occlusion and clustering occur, both spatial and appearance information become ambiguous simultaneously due to the high overlap between objects. In this paper, we demonstrate that this long-standing challenge in MOT can be efficiently and effectively resolved by incorporating weak cues to compensate for strong cues. Along with velocity direction, we introduce the confidence state and height state as potential weak cues. While achieving superior performance, our method still maintains the Simple, Online and Real-Time (SORT) characteristics. Furthermore, our method shows strong generalization for diverse trackers and scenarios in a plug-and-play and training-free manner. Significant and consistent improvements are observed when applying our method to 5 different representative trackers. Further, by leveraging both strong and weak cues, our method Hybrid-SORT achieves superior performance on diverse benchmarks, including MOT17, MOT20, and especially DanceTrack, where interaction and occlusion are frequent and severe. The code and models are available at https://github.com/ymzis69/HybirdSORT.
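To illustrate the idea of weak cues compensating for strong cues, the sketch below fuses an IoU (strong) cost with a detection-confidence-similarity (weak) term in an association cost; when two detections overlap a track equally, the weak cue breaks the tie. The cost form and weight are hypothetical illustrations, not Hybrid-SORT's actual formulation:

```python
def fused_cost(iou, track_conf, det_conf, w_conf=0.5):
    """Association cost: strong cue (1 - IoU) plus weak confidence-similarity cue."""
    return (1.0 - iou) + w_conf * abs(track_conf - det_conf)

# Two candidate detections with identical IoU to a track of confidence 0.9:
# IoU alone cannot rank them, but the weak cue can.
c1 = fused_cost(0.8, 0.9, 0.88)   # detection with similar confidence
c2 = fused_cost(0.8, 0.9, 0.45)   # detection with very different confidence
print(c1 < c2)  # → True
```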
Nearly Optimal Dynamic Set Cover: Breaking the Quadratic-in-$f$ Time Barrier
Abstract
The dynamic set cover problem has been subject to extensive research since the pioneering works of [Bhattacharya et al, 2015] and [Gupta et al, 2017]. The input is a set system $(U, S)$ on a fixed collection $S$ of sets and a dynamic universe of elements, where each element appears in at most $f$ sets and the cost of each set lies in the range $[1/C, 1]$; the goal is to efficiently maintain an approximately minimum set cover under insertions and deletions of elements. Most previous work considers the low-frequency regime, namely $f = O(\log n)$, and this line of work has culminated in a deterministic $(1+\epsilon)f$-approximation algorithm with amortized update time $O(\frac{f^2}{\epsilon^3} + \frac{f}{\epsilon^2}\log C)$ [Bhattacharya et al, 2021]. In the high-frequency regime of $f = \Omega(\log n)$, an $O(\log n)$-approximation algorithm with amortized update time $O(f\log n)$ was given by [Gupta et al, 2017]. Interestingly, at the intersection of the two regimes, i.e., $f = \Theta(\log n)$, the state-of-the-art results coincide: approximation $\Theta(f) = \Theta(\log n)$ with amortized update time $O(f^2) = O(f \log n) = O(\log^2 n)$. To date, no previous work has achieved an update time of $o(f^2)$. In this paper we break the $\Omega(f^2)$ update time barrier via the following results: (1) A $(1+\epsilon)f$-approximation can be maintained in $O\left(\frac{f}{\epsilon^3}\log^2 f + \frac{f}{\epsilon^3}\log C\right) = O_{\epsilon,C}(f \log^2 f)$ expected amortized update time; our algorithm works against an adaptive adversary. (2) A $(1+\epsilon)f$-approximation can be maintained deterministically in $O\left(\frac{1}{\epsilon}f\log f + \frac{f}{\epsilon^3} + \frac{f\log C}{\epsilon^2}\right) = O_{\epsilon,C}(f \log f)$ amortized update time.
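For context, the classical static $f$-approximation that this dynamic line of work refines fits in a few lines: repeatedly pick an uncovered element and buy every set containing it. Since each element lies in at most $f$ sets, the cover costs at most $f$ times the optimum (unit costs assumed in this sketch; this is textbook background, not the paper's dynamic algorithm):

```python
def f_approx_set_cover(universe, sets):
    """Buy every set containing each uncovered element, in one pass."""
    cover, covered = set(), set()
    for e in universe:
        if e not in covered:
            for name, members in sets.items():
                if e in members:
                    cover.add(name)
                    covered |= members
    return cover

sets = {"A": {1, 2}, "B": {2, 3}, "C": {3, 4}}   # f = 2: each element in <= 2 sets
print(sorted(f_approx_set_cover([1, 2, 3, 4], sets)))  # → ['A', 'B', 'C']
```

Here the optimum {A, C} has cost 2 and the output has cost 3, within the $f \cdot \mathrm{OPT} = 4$ bound; the dynamic problem asks to maintain such a cover as elements arrive and depart.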
Addressing Uncertainty in Imbalanced Histopathology Image Classification of HER2 Breast Cancer: An interpretable Ensemble Approach with Threshold Filtered Single Instance Evaluation (SIE)
Authors: Md Sakib Hossain Shovon, M. F. Mridha, Khan Md Hasib, Sultan Alfarhood, Mejdl Safran, Dunren Che
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Breast Cancer (BC) is among women's most lethal health concerns. Early diagnosis can reduce the mortality rate by helping patients make efficient treatment decisions. Human Epidermal Growth Factor Receptor 2 (HER2) positive BC has become one of the most lethal subtypes of BC. According to the College of American Pathologists/American Society of Clinical Oncology (CAP/ASCO), the severity of HER2 expression can be classified in the range 0 to 3+. HER2 can be detected effectively from immunohistochemical (IHC) and hematoxylin \& eosin (H\&E) images of the different classes 0, 1+, 2+, and 3+. An ensemble approach integrated with a threshold-filtered single instance evaluation (SIE) technique is proposed in this study to diagnose BC from the multi-categorical expression of HER2 subtypes. First, DenseNet201 and Xception are ensembled into a single classifier as feature extractors, combined with global average pooling, a dropout layer, a dense layer with a swish activation function, an l2 regularizer, and batch normalization. The extracted features are then processed through single instance evaluation (SIE) to determine different confidence levels and adjust the decision boundary among the imbalanced classes. The study is conducted on the BC immunohistochemical (BCI) dataset, which has been classified by pathologists into four stages of HER2 BC. The proposed approach, called DenseNet201-Xception-SIE, with a threshold value of 0.7, surpasses existing state-of-the-art models with an accuracy of 97.12\%, precision of 97.15\%, and recall of 97.68\% on H\&E data, and an accuracy of 97.56\%, precision of 97.57\%, and recall of 98.00\% on IHC data. Finally, Grad-CAM and Guided Grad-CAM are employed to interpret how the transfer-learning-based model works on the histopathology dataset and makes its decisions.
Deep Learning Approaches in Pavement Distress Identification: A Review
Authors: Sizhe Guan, Haolan Liu, Hamid R. Pourreza, Hamidreza Mahyar
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This paper presents a comprehensive review of recent advancements in image processing and deep learning techniques for pavement distress detection and classification, a critical aspect in modern pavement management systems. The conventional manual inspection process conducted by human experts is gradually being superseded by automated solutions, leveraging machine learning and deep learning algorithms to enhance efficiency and accuracy. The ability of these algorithms to discern patterns and make predictions based on extensive datasets has revolutionized the domain of pavement distress identification. The paper investigates the integration of unmanned aerial vehicles (UAVs) for data collection, offering unique advantages such as aerial perspectives and efficient coverage of large areas. By capturing high-resolution images, UAVs provide valuable data that can be processed using deep learning algorithms to detect and classify various pavement distresses effectively. While the primary focus is on 2D image processing, the paper also acknowledges the challenges associated with 3D images, such as sensor limitations and computational requirements. Understanding these challenges is crucial for further advancements in the field. The findings of this review significantly contribute to the evolution of pavement distress detection, fostering the development of efficient pavement management systems. As automated approaches continue to mature, the implementation of deep learning techniques holds great promise in ensuring safer and more durable road infrastructure for the benefit of society.
Microfluidic Molecular Communication Transmitter Based on Hydrodynamic Gating
Authors: Iman Mokari Bolhassan, Ali Abdali, Murat Kuscu
Abstract
Molecular Communications (MC) is a bio-inspired paradigm for transmitting information using chemical signals, which can enable novel applications at the junction of biotechnology, nanotechnology, and information and communication technologies. However, designing efficient and reliable MC systems poses significant challenges due to the complex nature of the physical channel and the limitations of the micro/nanoscale transmitter and receiver devices. In this paper, we propose a practical microfluidic transmitter architecture for MC based on hydrodynamic gating, a widely utilized technique for generating chemical waveforms in microfluidic channels with high spatiotemporal resolution. We develop an approximate analytical model that can capture the fundamental characteristics of the generated molecular pulses, such as pulse width, pulse amplitude, and pulse delay, as functions of main system parameters, such as flow velocity and gating duration. We validate the accuracy of our model by comparing it with finite element simulations using COMSOL Multiphysics under various system settings. Our analytical model can enable the optimization of microfluidic transmitters for MC applications in terms of minimizing intersymbol interference and maximizing data transmission rate.
Factor Graph Neural Networks
Authors: Zhen Zhang, Mohammed Haroon Dupty, Fan Wu, Javen Qinfeng Shi, Wee Sun Lee
Abstract
In recent years, we have witnessed a surge of Graph Neural Networks (GNNs), most of which can learn powerful representations in an end-to-end fashion, with great success in many real-world applications. They bear a resemblance to Probabilistic Graphical Models (PGMs) but break free from some of their limitations. By aiming to provide expressive methods for representation learning instead of computing marginals or most likely configurations, GNNs provide flexibility in the choice of information-flow rules while maintaining good performance. Despite these successes, GNNs lack efficient ways to represent and learn higher-order relations among variables/nodes. More expressive higher-order GNNs, which operate on k-tuples of nodes, need increased computational resources to process higher-order tensors. We propose Factor Graph Neural Networks (FGNNs) to effectively capture higher-order relations for inference and learning. To do so, we first derive an efficient approximate Sum-Product loopy belief propagation inference algorithm for discrete higher-order PGMs. We then neuralize this novel message passing scheme into a Factor Graph Neural Network (FGNN) module by allowing richer representations of the message update rules; this facilitates both efficient inference and powerful end-to-end learning. We further show that with a suitable choice of message aggregation operators, our FGNN can also represent Max-Product belief propagation, providing a single family of architectures that can represent both Max- and Sum-Product loopy belief propagation. Our extensive experimental evaluation on synthetic as well as real datasets demonstrates the potential of the proposed model.
Tango: rethinking quantization for graph neural network training on GPUs
Authors: Shiyang Chen, Da Zheng, Caiwen Ding, Chengying Huan, Yuede Ji, Hang Liu
Abstract
Graph Neural Networks (GNNs) are becoming increasingly popular due to their superior performance in critical graph-related tasks. While quantization is widely used to accelerate GNN computation, quantized training faces unprecedented challenges. Current quantized GNN training systems often have longer training times than their full-precision counterparts for two reasons: (i) addressing the accuracy challenge leads to excessive overhead, and (ii) the optimization potential exposed by quantization is not adequately leveraged. This paper introduces Tango which re-thinks quantization challenges and opportunities for graph neural network training on GPUs with three contributions: Firstly, we introduce efficient rules to maintain accuracy during quantized GNN training. Secondly, we design and implement quantization-aware primitives and inter-primitive optimizations that can speed up GNN training. Finally, we integrate Tango with the popular Deep Graph Library (DGL) system and demonstrate its superior performance over state-of-the-art approaches on various GNN models and datasets.
A Mini Immersed Finite Element Method for Two-Phase Stokes Problems on Cartesian Meshes
Abstract
This paper presents a mini immersed finite element (IFE) method for solving two- and three-dimensional two-phase Stokes problems on Cartesian meshes. The IFE space is constructed from the conventional mini element with shape functions modified on interface elements according to interface jump conditions, while keeping the degrees of freedom unchanged. Both discontinuous viscosity coefficients and surface forces are considered in the construction. The interface is approximated via discrete level set functions and explicit formulas of IFE basis functions and correction functions are derived, which make the IFE method easy to implement. The optimal approximation capabilities of the IFE space and the inf-sup stability and the optimal a priori error estimate of the IFE method are derived rigorously with constants independent of the mesh size and how the interface cuts the mesh. It is also proved that the condition number has the usual bound independent of the interface. Numerical experiments are provided to confirm the theoretical results.
WaterFlow: Heuristic Normalizing Flow for Underwater Image Enhancement and Beyond
Abstract
Underwater images suffer from light refraction and absorption, which impair visibility and interfere with subsequent applications. Existing underwater image enhancement methods mainly focus on image quality improvement, ignoring their effect on downstream tasks. To balance visual quality and applicability, we propose a heuristic normalizing flow for detection-driven underwater image enhancement, dubbed WaterFlow. Specifically, we first develop an invertible mapping between the degraded image and its clear counterpart. For differentiability and interpretability, we incorporate a heuristic prior into the data-driven mapping procedure, where the ambient light and medium transmission coefficient support credible generation. Furthermore, we introduce a detection perception module that transmits implicit semantic guidance into the enhancement procedure, so that the enhanced images hold more detection-favorable features and can promote detection performance. Extensive experiments demonstrate the superiority of WaterFlow over state-of-the-art methods, both quantitatively and qualitatively.
IIDS: Design of Intelligent Intrusion Detection System for Internet-of-Things Applications
Authors: KG Raghavendra Narayan, Srijanee Mookherji, Vanga Odelu, Rajendra Prasath, Anish Chand Turlapaty, Ashok Kumar Das
Abstract
With rapid technological growth, security attacks are increasing drastically. In many crucial Internet-of-Things (IoT) applications such as healthcare and defense, early detection of security attacks plays a significant role in protecting critical resources. An intrusion detection system is used to address this problem. Signature-based approaches fail to detect zero-day attacks, so anomaly-based detection, particularly with AI tools, is becoming popular. In addition, imbalanced datasets lead to biased results. In Machine Learning (ML) models, the F1 score is an important metric for measuring class-level correct predictions. If the F1 score is considerably low, the model may fail to detect the target samples, which can lead to unrecoverable consequences in sensitive applications such as healthcare and defense. Thus, any improvement in the F1 score has a significant impact on resource protection. In this paper, we present a framework for an ML-based intrusion detection system on an imbalanced dataset. The most recent dataset, CICIoT2023, is considered, and the random forest (RF) algorithm is used in the proposed framework. The proposed approach improves precision, recall, and F1 score by 3.72%, 3.75%, and 4.69%, respectively, over the existing method. Additionally, for unsaturated classes (i.e., classes with F1 score < 0.99), the F1 score improves significantly, by 7.9%. As a result, the proposed approach is well suited to IoT security applications for efficient intrusion detection and is useful for further studies.
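Why per-class F1 matters on imbalanced traffic can be seen in a toy sketch: with 9 benign flows and 1 missed attack, accuracy is 90% while the attack class's F1 is zero. The labels are made up for illustration; this is not the CICIoT2023 pipeline:

```python
def per_class_f1(y_true, y_pred):
    """F1 per class: harmonic mean of precision and recall."""
    scores = {}
    for c in set(y_true):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        scores[c] = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return scores

y_true = ["benign"] * 9 + ["attack"]      # imbalanced: one rare attack flow
y_pred = ["benign"] * 10                  # classifier misses the attack
acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
scores = per_class_f1(y_true, y_pred)
print(acc, scores["attack"])  # → 0.9 0.0
```

High overall accuracy hides the failure on the minority class, which is exactly the gap a class-level F1 target exposes.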
Reward Shaping for Building Trustworthy Robots in Sequential Human-Robot Interaction
Abstract
Trust-aware human-robot interaction (HRI) has received increasing research attention, as trust has been shown to be a crucial factor for effective HRI. Research in trust-aware HRI discovered a dilemma -- maximizing task rewards often leads to decreased human trust, while maximizing human trust would compromise task performance. In this work, we address this dilemma by formulating the HRI process as a two-player Markov game and utilizing the reward-shaping technique to improve human trust while limiting performance loss. Specifically, we show that when the shaping reward is potential-based, the performance loss can be bounded by the potential functions evaluated at the final states of the Markov game. We apply the proposed framework to the experience-based trust model, resulting in a linear program that can be efficiently solved and deployed in real-world applications. We evaluate the proposed framework in a simulation scenario where a human-robot team performs a search-and-rescue mission. The results demonstrate that the proposed framework successfully modifies the robot's optimal policy, enabling it to increase human trust at a minimal task performance cost.
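The potential-based shaping result invoked here is the classic one: adding $F(s, s') = \gamma\Phi(s') - \Phi(s)$ changes any trajectory's return only by a telescoping boundary term, so optimal policies are preserved and the performance change is bounded by the potentials at the endpoint states. A toy numerical check, with a made-up trust potential rather than the paper's experience-based trust model:

```python
gamma = 0.9
phi = {"low_trust": 0.0, "mid_trust": 1.0, "high_trust": 2.0}   # made-up potential

def shaped_reward(r, s, s_next):
    """Original reward plus potential-based shaping term."""
    return r + gamma * phi[s_next] - phi[s]

traj = ["low_trust", "mid_trust", "high_trust"]
rewards = [1.0, 2.0]

true_ret = rewards[0] + gamma * rewards[1]
shaped_ret = (shaped_reward(rewards[0], traj[0], traj[1])
              + gamma * shaped_reward(rewards[1], traj[1], traj[2]))

# shaping changes the return only by the telescoped boundary term
boundary = gamma ** 2 * phi[traj[-1]] - phi[traj[0]]
print(round(shaped_ret - true_ret, 6), round(boundary, 6))  # → 1.62 1.62
```

Because the difference depends only on the potentials at the first and last states, bounding those potentials bounds the task-performance loss, as the abstract states.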
Training-Free Instance Segmentation from Semantic Image Segmentation Masks
Abstract
In recent years, the development of instance segmentation has garnered significant attention in a wide range of applications. However, training a fully-supervised instance segmentation model requires both costly instance-level and pixel-level annotations. In contrast, weakly-supervised instance segmentation methods (i.e., with image-level class labels or point labels) struggle to satisfy the accuracy and recall requirements of practical scenarios. In this paper, we propose a novel paradigm for instance segmentation called training-free instance segmentation (TFISeg), which obtains instance segmentation results from image masks predicted by off-the-shelf semantic segmentation models. TFISeg does not require training a semantic and/or instance segmentation model and avoids the need for instance-level image annotations; it is therefore highly efficient. Specifically, we first obtain a semantic segmentation mask of the input image via a trained semantic segmentation model. Then, we calculate a displacement field vector for each pixel based on the segmentation mask, which distinguishes representations belonging to the same class but different instances, i.e., recovering instance-level object information. Finally, instance segmentation results are obtained after refinement by a learnable category-agnostic object boundary branch. Extensive experimental results on two challenging datasets and representative semantic segmentation baselines (including CNNs and Transformers) demonstrate that TFISeg achieves competitive results compared to state-of-the-art fully-supervised instance segmentation methods without additional human resources or increased computational costs. The code is available at: TFISeg
The evolution of Complexity co-occurring keywords: bibliometric analysis and network approach
Authors: Tanya Araújo, Alexandre Abreu, Francisco Louçã
Subjects: Digital Libraries (cs.DL); Social and Information Networks (cs.SI)
Abstract
Bibliometric studies based on the Web of Science (WOS) database have become an increasingly popular method for analysing the structure of scientific research. So do network approaches, which, based on empirical data, make it possible to characterize the emergence of topological structures over time and across multiple research areas. Our paper is a contribution to interweaving these two lines of research that have progressed in separate ways but whose common applications have been increasingly more frequent. Among other attributes, Author Keywords and Keywords Plus are used as units of analysis that enable us to identify changes in the topics of interest and related bibliography. By considering the co-occurrence of those keywords with the Author Keyword \texttt{Complexity}, we provide an overview of the evolution of studies on Complexity Sciences, and compare this evolution in seven scientific fields. The results show a considerable increase in the number of papers dealing with complexity, as well as a general tendency across different disciplines for this literature to move from a more foundational, general and conceptual to a more applied and specific set of co-occurring keywords. Moreover, we provide evidence of changing topologies of networks of co-occurring keywords, which are described through the computation of some topological coefficients. In so doing, we emphasize the distinguishing structures that characterize the networks of the seven research areas.
Push to know! -- Visuo-Tactile based Active Object Parameter Inference with Dual Differentiable Filtering
Abstract
For robotic systems to interact with objects in dynamic environments, it is essential to perceive physical properties of the objects such as shape, friction coefficient, mass, center of mass, and inertia. This not only eases the selection of manipulation actions but also ensures the task is performed as desired. However, estimating the physical properties of novel objects in particular is a challenging problem, using either vision or tactile sensing. In this work, we propose a novel framework that estimates key object parameters through non-prehensile manipulation with vision and tactile sensing. Our active dual differentiable filtering (ADDF) approach learns the object-robot interaction during non-prehensile object pushes to infer the object's parameters, enabling the robotic system to employ vision and tactile information to interactively explore a novel object. The proposed N-step active formulation within the differentiable filter facilitates efficient learning of the object-robot interaction model and, during inference, selects the next best exploratory push actions (where to push, and how to push). We extensively evaluated our framework in simulated and real-robot scenarios, yielding superior performance to the state-of-the-art baseline.
Dual-Matrix Domain-Wall: A Novel Technique for Generating Permutations by QUBO and Ising Models with Quadratic Sizes
Abstract
The Ising model is defined by an objective function that is a quadratic formula in qubit variables. Solving an Ising model means determining the qubit values that minimize the objective function, and many optimization problems can be reduced to this task. In this paper, we focus on optimization problems related to permutations, where the goal is to find the optimal permutation out of the $n!$ possible permutations of $n$ elements. To represent these problems as Ising models, a commonly employed approach is a kernel that uses one-hot encoding so that any one of the $n!$ permutations can be found as the optimal solution. However, this kernel contains a large number of quadratic terms and high absolute coefficient values. The main contribution of this paper is a novel permutation encoding technique called dual-matrix domain-wall, which significantly reduces the number of quadratic terms and the maximum absolute coefficient values in the kernel. Surprisingly, our dual-matrix domain-wall encoding reduces the quadratic term count from $n^3-n^2$ to $6n^2-12n+4$ and the maximum absolute coefficient value from $2n-4$ to $2$. We also demonstrate the applicability of our encoding technique to partial permutations and Quadratic Unconstrained Binary Optimization (QUBO) models. Furthermore, we discuss a family of permutation problems that can be efficiently implemented using Ising/QUBO models with our dual-matrix domain-wall encoding.
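The baseline one-hot kernel that dual-matrix domain-wall improves on can be sketched directly: with binary $x_{ij} = 1$ iff element $i$ is placed at position $j$, the penalty $\sum_i(\sum_j x_{ij}-1)^2 + \sum_j(\sum_i x_{ij}-1)^2$ is zero exactly on permutation matrices. This sketch shows only the conventional encoding, not the paper's new one:

```python
def one_hot_penalty(x):
    """Zero iff x is a permutation matrix (every row and column sums to 1)."""
    n = len(x)
    rows = sum((sum(x[i][j] for j in range(n)) - 1) ** 2 for i in range(n))
    cols = sum((sum(x[i][j] for i in range(n)) - 1) ** 2 for j in range(n))
    return rows + cols

perm = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]   # valid permutation matrix
bad  = [[1, 1, 0], [0, 0, 0], [0, 0, 1]]   # violates the row constraints
print(one_hot_penalty(perm), one_hot_penalty(bad))  # → 0 2
```

Expanding the squares produces the many quadratic cross terms the abstract refers to; the paper's contribution is an encoding whose expansion has far fewer such terms and coefficients bounded by 2.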
WCCNet: Wavelet-integrated CNN with Crossmodal Rearranging Fusion for Fast Multispectral Pedestrian Detection
Authors: Xingjian Wang, Li Chai, Jiming Chen, Zhiguo Shi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Multispectral pedestrian detection achieves better visibility in challenging conditions and thus has broad application in various tasks, for which both accuracy and computational cost are of paramount importance. Most existing approaches treat the RGB and infrared modalities equally, typically adopting two symmetrical CNN backbones for multimodal feature extraction, which ignores the substantial differences between modalities and makes it difficult both to reduce the computational cost and to achieve effective crossmodal fusion. In this work, we propose a novel and efficient framework named WCCNet that differentially extracts rich features of the different spectra with lower computational complexity and semantically rearranges these features for effective crossmodal fusion. Specifically, the discrete wavelet transform (DWT), which allows fast inference and training, is embedded to construct a dual-stream backbone for efficient feature extraction. The DWT layers of WCCNet extract frequency components for the infrared modality, while the CNN layers extract spatial-domain features for the RGB modality. This methodology not only significantly reduces the computational complexity, but also improves the extraction of infrared features to facilitate the subsequent crossmodal fusion. Based on the well-extracted features, we elaborately design the crossmodal rearranging fusion module (CMRF), which can mitigate spatial misalignment and merge semantically complementary features of spatially-related local regions to amplify the crossmodal complementary information. We conduct comprehensive evaluations on the KAIST and FLIR benchmarks, on which WCCNet outperforms state-of-the-art methods with considerable computational efficiency and competitive accuracy. We also perform ablation studies and thoroughly analyze the impact of different components on the performance of WCCNet.
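To give a sense of why DWT layers are cheap, a one-level Haar transform splits a signal into low- and high-frequency halves in O(n) work, versus the much larger cost of a convolutional backbone. The Haar basis here is an assumption for illustration; the paper's wavelet choice may differ:

```python
import math

def haar_dwt_1d(x):
    """One-level Haar DWT: O(n) split into low/high frequency halves."""
    s = 1 / math.sqrt(2)
    approx = [s * (a + b) for a, b in zip(x[0::2], x[1::2])]  # low frequency
    detail = [s * (a - b) for a, b in zip(x[0::2], x[1::2])]  # high frequency
    return approx, detail

approx, detail = haar_dwt_1d([4.0, 4.0, 6.0, 2.0])
print(detail)  # the flat pair (4, 4) gives 0; the edge pair (6, 2) does not
```

The detail coefficients respond to edges, which is the kind of frequency content the infrared stream's DWT layers expose for fusion.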
Game-theoretical approach to decentralized multi-drone conflict resolution and emergent traffic flow operations
Authors: Serge Hoogendoorn, Victor Knoop, Hani Mahmassani, Sascha Hoogendoorn-Lanser
Subjects: Systems and Control (eess.SY); Adaptation and Self-Organizing Systems (nlin.AO)
Abstract
This paper introduces decentralized control concepts for drones using differential game theory. The approach optimizes the behavior of an ego drone, anticipating the behavior of the opponent drones using a receding horizon approach. At each control instant, the scheme computes the Nash equilibrium control signal, which is applied for the control period. This results in a multi-drone conflict resolution scheme that is applied to all drones considered. The paper discusses the approach and presents the numerical algorithm, showing several examples that illustrate the performance of the model. We examine the behavior of the ego drone and the resulting collective drone flow operations. The latter shows that while the approach aims to optimize the operating cost of the ego drone, the experiments provide evidence that the resulting flow operations are very efficient due to the self-organization of various flow patterns. The presented work contributes to the state of the art by providing a generic approach to multi-drone conflict resolution with good macroscopic flow performance characteristics. The approach enables relatively straightforward inclusion of errors due to sensing and communication. It also allows for including different risk levels (e.g., for malfunctioning of sensor and communication technology), priority rules, regulations, and higher-level control signals (e.g., routing, dynamic speed limits).
Towards Better Query Classification with Multi-Expert Knowledge Condensation in JD Ads Search
Abstract
Search query classification, as an effective way to understand user intents, is of great importance in real-world online ads systems. To ensure low latency, a shallow model (e.g. FastText) is widely used for efficient online inference. However, the representation ability of the FastText model is insufficient, resulting in poor classification performance, especially on low-frequency queries and tailed categories. Using a deeper and more complex model (e.g. BERT) is an effective solution, but it causes higher online inference latency and more expensive computing costs. Thus, balancing inference efficiency and classification performance is of great practical importance. To overcome this challenge, in this paper we propose knowledge condensation (KC), a simple yet effective knowledge distillation framework to boost the classification performance of the online FastText model under strict low-latency constraints. Specifically, we train an offline BERT model to retrieve more potentially relevant data. Benefiting from its powerful semantic representation, relevant labels not exposed in the historical data are added to the training set for better FastText model training. Moreover, a novel distribution-diverse multi-expert learning strategy is proposed to further improve the mining of relevant data. By training multiple BERT models on different data distributions, the experts perform better on high-, middle-, and low-frequency search queries, respectively. Ensembling models trained on multiple distributions makes the retrieval ability more powerful. We have deployed two versions of this framework in JD search, and both offline experiments and online A/B testing on multiple datasets have validated the effectiveness of the proposed approach.
UCDFormer: Unsupervised Change Detection Using a Transformer-driven Image Translation
Abstract
Change detection (CD) by comparing two bi-temporal images is a crucial task in remote sensing. Because it requires no cumbersome labeled change information, unsupervised CD has attracted extensive attention in the community. However, existing unsupervised CD approaches rarely consider the seasonal and style differences incurred by illumination and atmospheric conditions in multi-temporal images. To this end, we formulate a change detection setting with domain shift for remote sensing images. Furthermore, we present a novel unsupervised CD method using a light-weight transformer, called UCDFormer. Specifically, a transformer-driven image translation composed of a light-weight transformer and a domain-specific affinity weight is first proposed to mitigate domain shift between the two images with real-time efficiency. After image translation, we generate the difference map between the translated before-event image and the original after-event image. Then, a novel reliable pixel extraction module is proposed to select significantly changed/unchanged pixel positions by fusing the pseudo change maps of fuzzy c-means clustering and an adaptive threshold. Finally, a binary change map is obtained based on these selected pixel pairs and a binary classifier. Experimental results on different unsupervised CD tasks with seasonal and style changes demonstrate the effectiveness of the proposed UCDFormer. For example, compared with several other related methods, UCDFormer improves performance on the Kappa coefficient by more than 12\%. In addition, UCDFormer achieves excellent performance for earthquake-induced landslide detection when considering large-scale applications. The code is available at \url{https://github.com/zhu-xlab/UCDFormer}
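The reliable-pixel idea can be illustrated with plain thresholding standing in for the paper's fusion of fuzzy c-means and adaptive thresholds: only confidently changed or unchanged pixels are kept, and ambiguous ones are discarded before training the binary classifier. The thresholds and 1-D difference map below are made up:

```python
def reliable_pixels(diff, lo=0.2, hi=0.8):
    """Keep only confidently changed/unchanged pixel indices; skip ambiguous ones."""
    changed = [i for i, d in enumerate(diff) if d >= hi]
    unchanged = [i for i, d in enumerate(diff) if d <= lo]
    return changed, unchanged

diff_map = [0.05, 0.9, 0.5, 0.1, 0.95]   # toy 1-D difference map
changed, unchanged = reliable_pixels(diff_map)
print(changed, unchanged)  # → [1, 4] [0, 3]  (pixel 2 is ambiguous)
```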
Virtual Reality Based Robot Teleoperation via Human-Scene Interaction
Abstract
Robot teleoperation has achieved great success in various situations, including chemical pollution rescue, disaster relief, and long-distance manipulation. In this article, we propose a virtual reality (VR) based robot teleoperation system to achieve more efficient and natural interaction with humans in different scenes. A user-friendly VR interface is designed to help users interact with a desktop scene using their hands efficiently and intuitively. To improve user experience and reduce workload, we simulate the process in a physics engine to build a preview of the scene after manipulation in the virtual scene before execution. We conduct experiments with different users and compare our system with a direct control method across several teleoperation tasks. The user study demonstrates that the proposed system enables users to perform operations more instinctively with a lighter mental workload. Even beginners can complete pick-and-place and object-stacking tasks in a short time. Our code is available at https://github.com/lingxiaomeng/VR_Teleoperation_Gen3.
Abstract
Off-policy learning enables a reinforcement learning (RL) agent to reason counterfactually about policies that are not executed and is one of the most important ideas in RL. It, however, can lead to instability when combined with function approximation and bootstrapping, two arguably indispensable ingredients for large-scale reinforcement learning. This is the notorious deadly triad. Gradient Temporal Difference (GTD) is one powerful tool to solve the deadly triad. Its success results from solving a double sampling issue indirectly with weight duplication or Fenchel duality. In this paper, we instead propose a direct method to solve the double sampling issue by simply using two samples in a Markovian data stream with an increasing gap. The resulting algorithm is as computationally efficient as GTD but gets rid of GTD's extra weights. The only price we pay is a logarithmically increasing memory as time progresses. We provide both asymptotic and finite sample analysis, where the convergence rate is on par with the canonical on-policy temporal difference learning. Key to our analysis is a novel refined discretization of limiting ODEs.
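The double-sampling obstacle, and the gap trick for sidestepping it, can be illustrated outside RL entirely. The sketch below is an illustration, not the paper's algorithm: it estimates (E[z])^2 from a single Markovian stream. Squaring one sample is biased (it estimates E[z^2]), so two samples separated by a logarithmically growing gap are multiplied instead; the buffer here naively stores the whole stream, whereas the paper's bookkeeping keeps memory logarithmic.

```python
import numpy as np

rng = np.random.default_rng(1)

def squared_mean_from_stream(steps=20000):
    """Estimate (E[z])^2 from one Markovian stream. Multiplying two samples
    whose gap grows over time makes them nearly independent, so their
    product is nearly unbiased for (E[z])^2."""
    z = 0.0
    past = []                                   # illustrative: stores the whole stream
    total, count = 0.0, 0
    for t in range(1, steps + 1):
        z = 0.9 * z + rng.normal(1.0, 1.0)      # AR(1) chain, stationary mean 10
        past.append(z)
        gap = max(1, int(np.log(t + 1) ** 2))   # logarithmically growing gap
        if t > gap:
            total += z * past[t - 1 - gap]
            count += 1
    return total / count

estimate = squared_mean_from_stream()   # should be close to 10**2 = 100
```

The same product-of-two-delayed-samples structure is what replaces GTD's auxiliary weight vector in the paper's update.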
Generative Noisy-Label Learning by Implicit Discriminative Approximation with Partial Label Prior
Abstract
Learning with noisy labels has been addressed with both discriminative and generative models. Although discriminative models have dominated the field due to their simpler modeling and more efficient computational training processes, generative models offer a more effective means of disentangling clean and noisy labels and improving the estimation of the label transition matrix. However, generative approaches maximize the joint likelihood of noisy labels and data using a complex formulation that only indirectly optimizes the model of interest associating data and clean labels. Additionally, these approaches rely on generative models that are challenging to train and tend to use uninformative clean label priors. In this paper, we propose a new generative noisy-label learning approach that addresses these three issues. First, we propose a new model optimisation that directly associates data and clean labels. Second, the generative model is implicitly estimated using a discriminative model, eliminating the inefficient training of a generative model. Third, we propose a new informative label prior inspired by partial label learning as a supervision signal for noisy label learning. Extensive experiments on several noisy-label benchmarks demonstrate that our generative model provides state-of-the-art results while maintaining a similar computational complexity to discriminative models.
A Real-Time Robust Ecological-Adaptive Cruise Control Strategy for Battery Electric Vehicles
Authors: Sheng Yu, Xiao Pan, Anastasis Georgiou, Boli Chen, Imad M. Jaimoukha, Simos A. Evangelou
Abstract
This work addresses the ecological-adaptive cruise control problem for connected electric vehicles by a computationally efficient and robust control strategy. The problem is formulated in the space-domain with a realistic description of the nonlinear electric powertrain model and motion dynamics to yield a convex optimal control problem (OCP). The OCP is approached by a robust model predictive control (RMPC) method, which handles various uncertainties due to modelling mismatch and inaccurate information of the leading vehicle. The RMPC problem is solved by semi-definite programming relaxation and single linear matrix inequality (sLMI) techniques for further enhanced computational efficiency. The performance of the proposed real-time robust ecological-adaptive cruise control (REACC) method is evaluated by utilising an urban driving cycle experimentally collected on a real-world route in London, UK, with practical disturbances including modelling mismatches in air-drag coefficients, tyre-rolling resistance coefficients, and road slope angles. Its robustness is verified through comparison with a nominal MPC, which is shown to result in speed limit constraint violations. The energy economy of the proposed method outperforms a state-of-the-art time-domain RMPC scheme, as a more precisely fitted convex powertrain model can be integrated into the space-domain scheme. An additional comparison with a traditional constant distance following strategy (CDFS) further verifies the effectiveness of the proposed REACC. Finally, it is verified that the REACC can potentially be implemented in real-time owing to the sLMI and resulting convex algorithm.
TeachCLIP: Multi-Grained Teaching for Efficient Text-to-Video Retrieval
Authors: Kaibin Tian, Ruixiang Zhao, Hu Hu, Runquan Xie, Fengzong Lian, Zhanhui Kang, Xirong Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
For text-to-video retrieval (T2VR), which aims to retrieve unlabeled videos by ad-hoc textual queries, CLIP-based methods are dominating. Compared to CLIP4Clip, which is efficient and compact, the state-of-the-art models tend to compute video-text similarity by fine-grained cross-modal feature interaction and matching, putting their scalability for large-scale T2VR into doubt. For efficient T2VR, we propose TeachCLIP with multi-grained teaching to let a CLIP4Clip based student network learn from more advanced yet computationally heavy models such as X-CLIP, TS2-Net and X-Pool. To improve the student's learning capability, we add an Attentional frame-Feature Aggregation (AFA) block, which by design adds no extra storage/computation overhead at the retrieval stage. While attentive weights produced by AFA are commonly used for combining frame-level features, we propose a novel use of the weights to let them imitate frame-text relevance estimated by the teacher network. As such, AFA provides a fine-grained learning (teaching) channel for the student (teacher). Extensive experiments on multiple public datasets justify the viability of the proposed method.
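The AFA idea, pooling frame features with attention weights while simultaneously regressing those weights toward the teacher's frame-text relevance, can be sketched as follows. The function names and the KL form of the distillation term are illustrative assumptions, not the paper's exact loss.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def afa_pool(frame_feats, attn_logits):
    """Attention weights pool per-frame features into one video vector."""
    w = softmax(attn_logits)          # (n_frames,), sums to 1
    return w @ frame_feats, w         # (dim,) video vector, plus the weights

def relevance_distill_loss(student_w, teacher_relevance):
    """Fine-grained teaching signal: push the student's attention weights
    toward the teacher's normalized frame-text relevance (KL divergence)."""
    t = teacher_relevance / teacher_relevance.sum()
    return float(np.sum(t * np.log(t / student_w)))

feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])      # 3 frames, dim 2
video, w = afa_pool(feats, attn_logits=np.array([2.0, 0.0, 0.0]))
loss = relevance_distill_loss(w, teacher_relevance=w)       # identical -> 0
```

Because the same weights serve pooling at retrieval time and distillation at training time, the teaching channel adds no inference cost, matching the abstract's claim.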
Current Studies and Applications of Krill Herd and Gravitational Search Algorithms in Healthcare
Authors: Rebwar Khalid Hamad, Tarik A. Rashid
Subjects: Neural and Evolutionary Computing (cs.NE)
Abstract
Nature-Inspired Computing (NIC) is a relatively young field that seeks new methods of computing by studying how natural phenomena solve complicated problems in many contexts. This has led to ground-breaking research in a variety of domains, including artificial immune systems, neural networks, swarm intelligence, and evolutionary computing. NIC techniques are used in biology, physics, engineering, economics, and management. Meta-heuristic algorithms are successful, efficient, and resilient on real-world classification, optimization, forecasting, and clustering problems, as well as on engineering and science issues. Two active NIC algorithms are the Gravitational Search Algorithm (GSA) and the Krill Herd (KH) algorithm. This publication gives a broad, historical review of the use of KH and GSA in medicine and healthcare. Comprehensive surveys have been conducted on KH, GSA, and other nature-inspired algorithms, and the various versions of the KH and GSA algorithms and their applications in healthcare are thoroughly reviewed in the present article. Nonetheless, no survey focused on KH and GSA in the healthcare field had previously been undertaken. As a result, this work conducts a thorough review of KH and GSA to assist researchers in applying them in diverse domains or hybridizing them with other popular algorithms, providing an in-depth examination of the two algorithms in terms of application, modification, and hybridization. The goal of the study is to offer a viewpoint on GSA and KH, particularly for academics interested in investigating the capabilities and performance of these algorithms in the healthcare and medical domains.
Delegated Time-Lock Puzzle
Authors: Aydin Abadi, Dan Ristea, Steven J. Murdoch
Abstract
Time-Lock Puzzles (TLPs) are cryptographic protocols that enable a client to lock a message in such a way that a server can only unlock it after a specific time period. However, existing TLPs have certain limitations: (i) they assume that both the client and server always possess sufficient computational resources and (ii) they solely focus on the lower time bound for finding a solution, disregarding the upper bound that guarantees a regular server can find a solution within a certain time frame. Additionally, existing TLPs designed to handle multiple puzzles either (a) entail high verification costs or (b) lack generality, requiring identical time intervals between consecutive solutions. To address these limitations, this paper introduces, for the first time, the concept of a "Delegated Time-Lock Puzzle" and presents a protocol called "Efficient Delegated Time-Lock Puzzle" (ED-TLP) that realises this concept. ED-TLP allows the client and server to delegate their resource-demanding tasks to third-party helpers. It facilitates real-time verification of solution correctness and efficiently handles multiple puzzles with varying time intervals. ED-TLP ensures the delivery of solutions within predefined time limits by incorporating both an upper bound and a fair payment algorithm. We have implemented ED-TLP and conducted a comprehensive analysis of its overheads, demonstrating the efficiency of the construction.
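For background, the delay in essentially all TLPs of this family comes from Rivest-style repeated squaring: solving requires T inherently sequential squarings, while the puzzle generator uses the factorization of n as a trapdoor. A minimal sketch with toy parameters (real puzzles use primes of ~1024 bits):

```python
def solve_puzzle(a, T, n):
    """Solver side: T sequential modular squarings computing a^(2^T) mod n.
    The squarings cannot be parallelized, which enforces the time delay."""
    x = a % n
    for _ in range(T):
        x = (x * x) % n
    return x

# The generator knows the factorization of n, so it can reduce the exponent
# modulo phi(n) and build the puzzle in only O(log T) squarings.
p, q = 10007, 10009                 # toy primes for illustration only
n, phi = p * q, (p - 1) * (q - 1)
a, T = 2, 10_000
shortcut = pow(a, pow(2, T, phi), n)
assert solve_puzzle(a, T, n) == shortcut
```

The gap between the solver's O(T) work and the generator's O(log T) work is exactly the asymmetry that ED-TLP's delegation and upper-bound guarantees are built around.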
Straggler Mitigation and Latency Optimization in Blockchain-based Hierarchical Federated Learning
Abstract
Cloud-edge-device hierarchical federated learning (HFL) has been recently proposed to achieve communication-efficient and privacy-preserving distributed learning. However, there exist several critical challenges, such as the single point of failure and potential stragglers in both edge servers and local devices. To resolve these issues, we propose a decentralized and straggler-tolerant blockchain-based HFL (BHFL) framework. Specifically, a Raft-based consortium blockchain is deployed on edge servers to provide a distributed and trusted computing environment for global model aggregation in BHFL. To mitigate the influence of stragglers on learning, we propose a novel aggregation method, HieAvg, which utilizes the historical weights of stragglers to estimate the missing submissions. Furthermore, we optimize the overall latency of BHFL by jointly considering the constraints of global model convergence and blockchain consensus delay. Theoretical analysis and experimental evaluation show that our proposed BHFL based on HieAvg can converge in the presence of stragglers, which performs better than the traditional methods even when the loss function is non-convex and the data on local devices are non-independent and identically distributed (non-IID).
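One plausible reading of the HieAvg idea, estimating a straggler's missing submission from its historical weights, is sketched below; the linear extrapolation rule is an illustrative assumption, not the paper's exact estimator.

```python
import numpy as np

def hieavg_round(fresh, history):
    """Aggregate one round: clients in `fresh` submitted this round; a
    straggler's update is estimated by linearly extrapolating its last two
    historical weight vectors (simplified HieAvg-style aggregation)."""
    contributions = []
    for cid, past in history.items():
        if cid in fresh:
            contributions.append(fresh[cid])
        elif len(past) >= 2:
            contributions.append(2 * past[-1] - past[-2])   # linear extrapolation
        elif past:
            contributions.append(past[-1])                  # only one record
    return np.mean(contributions, axis=0)

history = {
    "a": [np.array([0.0, 0.0]), np.array([1.0, 1.0])],
    "b": [np.array([2.0, 2.0]), np.array([3.0, 3.0])],      # straggler this round
}
agg = hieavg_round({"a": np.array([2.0, 2.0])}, history)
# b is estimated as 2*[3,3] - [2,2] = [4,4]; mean of [2,2] and [4,4] = [3,3]
```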
ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders
Authors: Shawn Xu, Lin Yang, Christopher Kelly, Marcin Sieniek, Timo Kohlberger, Martin Ma, Wei-Hung Weng, Attila Kiraly, Sahar Kazemzadeh, Zakkai Melamed, Jungyeon Park, Patricia Strachan, Yun Liu, Chuck Lau, Preeti Singh, Christina Chen, Mozziyar Etemadi, Sreenivasa Raju Kalidindi, Yossi Matias, Katherine Chou, Greg S. Corrado, Shravya Shetty, Daniel Tse, Shruthi Prabhakara, Daniel Golden, Rory Pilgrim, Krish Eswaran, Andrew Sellergren
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Our approach, which we call Embeddings for Language/Image-aligned X-Rays, or ELIXR, leverages a language-aligned image encoder combined with, or grafted onto, a fixed LLM, PaLM 2, to perform a broad range of tasks. We train this lightweight adapter architecture using images paired with corresponding free-text radiology reports from the MIMIC-CXR dataset. ELIXR achieved state-of-the-art performance on zero-shot chest X-ray (CXR) classification (mean AUC of 0.850 across 13 findings), data-efficient CXR classification (mean AUCs of 0.893 and 0.898 across five findings (atelectasis, cardiomegaly, consolidation, pleural effusion, and pulmonary edema) for 1% (~2,200 images) and 10% (~22,000 images) training data), and semantic search (0.76 normalized discounted cumulative gain (NDCG) across nineteen queries, including perfect retrieval on twelve of them). Compared to existing data-efficient methods including supervised contrastive learning (SupCon), ELIXR required two orders of magnitude less data to reach similar performance. ELIXR also showed promise on CXR vision-language tasks, demonstrating overall accuracies of 58.7% and 62.5% on visual question answering and report quality assurance tasks, respectively. These results suggest that ELIXR is a robust and versatile approach to CXR AI.
Keyword: faster
Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks
Authors: Jun Guo, Aishan Liu, Xingyu Zheng, Siyuan Liang, Yisong Xiao, Yichao Wu, Xianglong Liu
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Abstract
Despite the broad application of Machine Learning models as a Service (MLaaS), they are vulnerable to model stealing attacks. These attacks can replicate the model functionality through the black-box query process without any prior knowledge of the target victim model. Existing stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attackers. However, these defenses now suffer from high inference computational overheads and unfavorable trade-offs between benign accuracy and stealing robustness, which challenges the feasibility of deployed models in practice. To address these problems, this paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses. Instead of deploying auxiliary defense modules that introduce redundant inference time, InI directly trains a defensive model by isolating the adversary's training gradient from the expected gradient, which can effectively reduce the inference computational cost. In contrast to adding perturbations over model predictions that harm the benign accuracy, we train models to produce uninformative outputs against stealing queries, which can induce the adversary to extract little useful knowledge from victim models with minimal impact on the benign performance. Extensive experiments on several visual classification datasets (e.g., MNIST and CIFAR10) demonstrate the superior robustness (up to 48% reduction on stealing accuracy) and speed (up to 25.4x faster) of our InI over other state-of-the-art methods. Our code can be found at https://github.com/DIG-Beihang/InI-Model-Stealing-Defense.
Boundary integrated neural networks (BINNs) for 2D elastostatic and piezoelectric problems: Theory and MATLAB code
Abstract
In this paper, we make the first attempt to apply boundary integrated neural networks (BINNs) to the numerical solution of two-dimensional (2D) elastostatic and piezoelectric problems. BINNs combine artificial neural networks with the well-established boundary integral equations (BIEs) to effectively solve partial differential equations (PDEs). The BIEs are utilized to map all the unknowns onto the boundary, after which these unknowns are approximated using artificial neural networks and resolved via a training process. In contrast to traditional neural network-based methods, the current BINNs offer several distinct advantages. First, by embedding BIEs into the learning procedure, BINNs only need to discretize the boundary of the solution domain, which can lead to a faster and more stable learning process (only the boundary conditions need to be fitted during the training). Second, the differential operator with respect to the PDEs is substituted by an integral operator, which effectively eliminates the need for additional differentiation of the neural networks (high-order derivatives of neural networks may lead to instability in learning). Third, the loss function of the BINNs only contains the residuals of the BIEs, as all the boundary conditions have been inherently incorporated within the formulation. Therefore, there is no necessity for employing any weighting functions, which are commonly used in traditional methods to balance the gradients among different objective functions. Moreover, BINNs possess the ability to tackle PDEs in unbounded domains since the integral representation remains valid for both bounded and unbounded domains. Extensive numerical experiments show that BINNs are much easier to train and usually give more accurate learning solutions compared to traditional neural network-based methods.
VAPI: Vectorization of Algorithm for Performance Improvement
Authors: Mahmood Yashar, Tarik A. Rashid
Subjects: Neural and Evolutionary Computing (cs.NE)
Abstract
This study presents the vectorization of metaheuristic algorithms as the first stage of vectorized optimization implementation. Vectorization is a technique for converting an algorithm that operates on a single value at a time into one that operates on a collection of values at a time, so that it executes rapidly. It works by replacing multiple iterations with a single operation, which improves the algorithm's speed and makes it simpler and easier to implement. Vectorizing an algorithm improves the program's performance: it requires less time, runs long-running test functions faster, executes test functions that are impractical in non-vectorized form, and reduces iterations and time complexity. Converting an algorithm to operate on several values at once is thus a remedy for long running times and complicated implementations. The objective of this study is to apply the vectorization technique to one of the metaheuristic algorithms and compare the results of the vectorized algorithm with its non-vectorized counterpart.
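As a concrete illustration of the technique (not taken from the paper), evaluating the sphere test function over a whole population shows how one NumPy reduction replaces nested Python loops:

```python
import numpy as np

def sphere_loop(population):
    """Non-vectorized: two nested Python loops, one candidate at a time."""
    fitness = []
    for candidate in population:
        total = 0.0
        for x in candidate:
            total += x * x
        fitness.append(total)
    return fitness

def sphere_vectorized(population):
    """Vectorized: a single NumPy reduction over the whole population."""
    return np.sum(population ** 2, axis=1)

pop = np.array([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]])
# both return the fitness of every candidate: [5, 25, 0]
```

In a population-based metaheuristic, the vectorized form evaluates every candidate of a generation in one call, which is where the reported speed-ups come from.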
BCDDO: Binary Child Drawing Development Optimization
Authors: Abubakr S. Issa, Yossra H. Ali, Tarik A. Rashid
Subjects: Neural and Evolutionary Computing (cs.NE)
BRNES: Enabling Security and Privacy-aware Experience Sharing in Multiagent Robotic and Autonomous Systems
Abstract
Although experience sharing (ES) accelerates multiagent reinforcement learning (MARL) in an advisor-advisee framework, attempts to apply ES to decentralized multiagent systems have so far relied on trusted environments and overlooked the possibility of adversarial manipulation and inference. Nevertheless, in a real-world setting, Byzantine attackers, disguised as advisors, may provide false advice to the advisee and catastrophically degrade the overall learning performance. Also, an inference attacker, disguised as an advisee, may conduct several queries to infer the advisors' private information and make the entire ES process questionable in terms of privacy leakage. To tackle these issues, we propose a novel MARL framework (BRNES) that heuristically selects a dynamic neighbor zone for each advisee at each learning step and adopts a weighted experience aggregation technique to reduce the impact of Byzantine attacks. Furthermore, to keep the agents' private information safe from adversarial inference attacks, we leverage local differential privacy (LDP)-induced noise during the ES process. Our experiments show that our framework outperforms the state of the art in terms of the steps-to-goal, obtained reward, and time-to-goal metrics. In particular, our evaluation shows that the proposed framework is 8.32x faster than the current non-private frameworks and 1.41x faster than the private frameworks in an adversarial setting.
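The two mechanisms named above, LDP-induced noise on shared experience and weighted aggregation of advice, can be sketched as follows. The Laplace mechanism and trust weighting are standard building blocks; the specific function shapes are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(7)

def ldp_perturb(q_values, epsilon=1.0, sensitivity=1.0):
    """Advisor adds Laplace noise (scale = sensitivity / epsilon) before
    sharing -- the standard mechanism for epsilon-local differential privacy."""
    return q_values + rng.laplace(0.0, sensitivity / epsilon, size=q_values.shape)

def weighted_aggregate(advice, trust):
    """Advisee combines noisy advice using trust weights, so suspected
    Byzantine advisors contribute little to the aggregate."""
    w = np.asarray(trust, dtype=float)
    w = w / w.sum()
    return w @ np.stack(advice)

honest = ldp_perturb(np.array([1.0, 2.0, 3.0]))
byzantine = np.array([100.0, -100.0, 100.0])        # poisoned advice
agg = weighted_aggregate([honest, byzantine], trust=[0.99, 0.01])
```

Down-weighting rather than hard exclusion lets the advisee keep learning from advisors whose trustworthiness is uncertain.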
Keyword: mobile
A Model Predictive Path Integral Method for Fast, Proactive, and Uncertainty-Aware UAV Planning in Cluttered Environments
Authors: Jacob Higgins, Nicholas Mohammad, Nicola Bezzo
Abstract
Current motion planning approaches for autonomous mobile robots often assume that the low level controller of the system is able to track the planned motion with very high accuracy. In practice, however, tracking error can be affected by many factors and could lead to potential collisions when the robot must traverse a cluttered environment. To address this problem, this paper proposes a novel receding-horizon motion planning approach based on Model Predictive Path Integral (MPPI) control theory -- a flexible sampling-based control technique that requires minimal assumptions on vehicle dynamics and cost functions. This flexibility is leveraged to propose a motion planning framework that also considers a data-informed risk function. Using the MPPI algorithm as a motion planner also reduces the number of samples required by the algorithm, relaxing the hardware requirements for implementation. The proposed approach is validated through trajectory generation for a quadrotor unmanned aerial vehicle (UAV), where fast motion increases trajectory tracking error and can lead to collisions with nearby obstacles. Simulations and hardware experiments demonstrate that the MPPI motion planner proactively adapts to the obstacles that the UAV must negotiate, slowing down when near obstacles and moving quickly when away from them, eliminating collisions entirely while still producing lively motion.
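A minimal sketch of one MPPI iteration on a toy 1D double integrator follows; the dynamics, cost, and parameter values are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def mppi_plan(u_nominal, state, n_samples=256, sigma=0.5, lam=1.0, dt=0.1):
    """One MPPI iteration: sample perturbed control sequences, roll each out
    through a toy double integrator, then softmin-weight the perturbations so
    low-cost rollouts dominate the updated plan."""
    horizon = len(u_nominal)
    noise = rng.normal(0.0, sigma, size=(n_samples, horizon))
    controls = u_nominal + noise
    costs = np.zeros(n_samples)
    for k in range(n_samples):
        pos, vel = state
        for t in range(horizon):
            vel += dt * controls[k, t]
            pos += dt * vel
            costs[k] += pos ** 2 + 0.01 * controls[k, t] ** 2   # drive pos -> 0
    weights = np.exp(-(costs - costs.min()) / lam)              # softmin weighting
    weights /= weights.sum()
    return u_nominal + weights @ noise                          # weighted update

plan = mppi_plan(np.zeros(20), state=(2.0, 0.0))   # start at pos=2, target pos=0
```

Because only rollouts of sampled controls are needed, no gradients of the dynamics or cost are required, which is the flexibility the abstract highlights.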
A Decision Tree-based Monitoring and Recovery Framework for Autonomous Robots with Decision Uncertainties
Abstract
Autonomous mobile robots (AMR) operating in the real world often need to make critical decisions that directly impact their own safety and the safety of their surroundings. Learning-based approaches for decision making have gained popularity in recent years, since decisions can be made very quickly and with reasonable levels of accuracy for many applications. These approaches, however, typically return only one decision, and if the learner is poorly trained or observations are noisy, the decision may be incorrect. This problem is further exacerbated when the robot is making decisions about its own failures, such as faulty actuators or sensors and external disturbances, when a wrong decision can immediately cause damage to the robot. In this paper, we consider this very case study: a robot dealing with such failures must quickly assess uncertainties and make safe decisions. We propose an uncertainty aware learning-based failure detection and recovery approach, in which we leverage Decision Tree theory along with Model Predictive Control to detect and explain which failure is compromising the system, assess uncertainties associated with the failure, and lastly, find and validate corrective controls to recover the system. Our approach is validated with simulations and real experiments on a faulty unmanned ground vehicle (UGV) navigation case study, demonstrating recovery to safety under uncertainties.
Keyword: pruning
Understanding Activation Patterns in Artificial Neural Networks by Exploring Stochastic Processes
Authors: Stephan Johann Lehmler, Muhammad Saif-ur-Rehman, Tobias Glasmachers, Ioannis Iossifidis
Abstract
To gain a deeper understanding of the behavior and learning dynamics of (deep) artificial neural networks, it is valuable to employ mathematical abstractions and models. These tools provide a simplified perspective on network performance and facilitate systematic investigations through simulations. In this paper, we propose utilizing the framework of stochastic processes, which has been underutilized thus far. Our approach models activation patterns of thresholded nodes in (deep) artificial neural networks as stochastic processes. We focus solely on activation frequency, leveraging neuroscience techniques used for real neuron spike trains. During a classification task, we extract spiking activity and use an arrival process following the Poisson distribution. We examine observed data from various artificial neural networks in image recognition tasks, fitting the proposed model's assumptions. Through this, we derive parameters describing activation patterns in each network. Our analysis covers randomly initialized, generalizing, and memorizing networks, revealing consistent differences across architectures and training sets. Calculating Mean Firing Rate, Mean Fano Factor, and Variances, we find stable indicators of memorization during learning, providing valuable insights into network behavior. The proposed model shows promise in describing activation patterns and could serve as a general framework for future investigations. It has potential applications in theoretical simulations, pruning, and transfer learning.
Dynamic Token Pruning in Plain Vision Transformers for Semantic Segmentation
Authors: Quan Tang, Bowen Zhang, Jiajun Liu, Fagui Liu, Yifan Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Vision transformers have achieved leading performance on various visual tasks yet still suffer from high computational complexity. The situation deteriorates in dense prediction tasks like semantic segmentation, as high-resolution inputs and outputs usually imply more tokens involved in computations. Directly removing the less attentive tokens has been discussed for the image classification task but can not be extended to semantic segmentation since a dense prediction is required for every patch. To this end, this work introduces a Dynamic Token Pruning (DToP) method based on the early exit of tokens for semantic segmentation. Motivated by the coarse-to-fine segmentation process by humans, we naturally split the widely adopted auxiliary-loss-based network architecture into several stages, where each auxiliary block grades every token's difficulty level. We can finalize the prediction of easy tokens in advance without completing the entire forward pass. Moreover, we keep $k$ highest confidence tokens for each semantic category to uphold the representative context information. Thus, computational complexity will change with the difficulty of the input, akin to the way humans do segmentation. Experiments suggest that the proposed DToP architecture reduces on average $20\% - 35\%$ of computational cost for current semantic segmentation methods based on plain vision transformers without accuracy degradation.
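The early-exit rule can be sketched as follows, assuming (as an illustration, not the paper's exact procedure) a confidence threshold for finalizing tokens and top-k retention per predicted class:

```python
import numpy as np

def stage_exit(logits, threshold=0.95, k=2):
    """At an auxiliary stage, finalize tokens whose confidence exceeds the
    threshold, but keep the k most confident tokens of each predicted class
    flowing forward as representative context."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = e / e.sum(axis=-1, keepdims=True)                 # per-token softmax
    conf, pred = probs.max(axis=-1), probs.argmax(axis=-1)
    finalize = conf >= threshold
    keep = np.zeros_like(finalize)
    for c in np.unique(pred):
        idx = np.flatnonzero(pred == c)
        keep[idx[np.argsort(conf[idx])[-k:]]] = True          # top-k per class
    forward = ~finalize | keep   # tokens still computed in later stages
    return finalize, forward

logits = np.array([[10.0, 0.0], [0.0, 10.0], [0.1, 0.0], [5.0, 0.0]])
finalize, forward = stage_exit(logits)
```

Easy inputs finalize most tokens at early stages, so the compute saved scales with input difficulty, which is how the 20-35% reduction arises without retraining a smaller model.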
Data-Centric Diet: Effective Multi-center Dataset Pruning for Medical Image Segmentation
Authors: Yongkang He, Mingjin Chen, Zhijing Yang, Yongyi Lu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
This paper seeks to address dense labeling problems where a significant fraction of the dataset can be pruned without sacrificing much accuracy. We observe that, on standard medical image segmentation benchmarks, the loss gradient norm-based metrics of individual training examples applied in image classification fail to identify the important samples. To address this issue, we propose a data pruning method that takes into consideration the training dynamics on target regions using a Dynamic Average Dice (DAD) score. To the best of our knowledge, we are among the first to address data importance in dense labeling tasks in the field of medical image analysis, making the following contributions: (1) investigating the underlying causes with rigorous empirical analysis, and (2) determining an effective data pruning approach in dense labeling problems. Our solution can be used as a strong yet simple baseline to select important examples for medical image segmentation with combined data sources.
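A sketch of how a DAD-style score might drive pruning follows, under the illustrative assumption that examples are ranked by their average Dice across training epochs and the hardest fraction is kept; the paper's exact selection criterion may differ.

```python
import numpy as np

def prune_by_dad(dice_history, keep_fraction=0.5):
    """Average each training example's per-epoch Dice score (a proxy for its
    training dynamics) and keep the lowest-scoring, i.e. hardest, fraction."""
    dad = np.asarray(dice_history).mean(axis=1)           # (n_examples,)
    n_keep = max(1, int(round(len(dad) * keep_fraction)))
    return np.argsort(dad)[:n_keep]                       # indices of kept examples

# per-example Dice over 3 epochs; example 1 is consistently hardest
history = [[0.9, 0.92, 0.95],
           [0.2, 0.25, 0.3],
           [0.6, 0.7, 0.8]]
kept = prune_by_dad(history, keep_fraction=2 / 3)
```

Tracking Dice on target regions over epochs, rather than a single-snapshot gradient norm, is the key difference from the classification-style metrics the abstract says fail here.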
Keyword: diffusion
The Bias Amplification Paradox in Text-to-Image Generation
Abstract
Bias amplification is a phenomenon in which models increase imbalances present in the training data. In this paper, we study bias amplification in the text-to-image domain using Stable Diffusion by comparing gender ratios in training vs. generated images. We find that the model appears to amplify gender-occupation biases found in the training data (LAION). However, we discover that amplification can largely be attributed to discrepancies between training captions and model prompts. For example, an inherent difference is that captions from the training data often contain explicit gender information while the prompts we use do not, which leads to a distribution shift and consequently impacts bias measures. Once we account for various distributional differences between texts used for training and generation, we observe that amplification decreases considerably. Our findings illustrate the challenges of comparing biases in models and the data they are trained on, and highlight confounding factors that contribute to bias amplification.
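The training-vs-generated ratio comparison at the heart of the study can be sketched with a toy amplification metric; the exact metric used in the paper may differ.

```python
def amplification(train_counts, gen_counts):
    """Per-occupation bias amplification: female ratio among generated images
    minus female ratio in the training data (positive = skew is amplified).
    Counts are (female, male) pairs; occupations and numbers are made up."""
    result = {}
    for job, (f_tr, m_tr) in train_counts.items():
        f_ge, m_ge = gen_counts[job]
        result[job] = f_ge / (f_ge + m_ge) - f_tr / (f_tr + m_tr)
    return result

train = {"nurse": (90, 10), "engineer": (20, 80)}
gen = {"nurse": (99, 1), "engineer": (5, 95)}
amp = amplification(train, gen)
# nurse: 0.99 - 0.90 = +0.09 (amplified); engineer: 0.05 - 0.20 = -0.15
```

The paper's point is that the `train` side of this comparison depends on which captions are counted: once prompts and captions are distribution-matched, the measured gap shrinks.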
ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation
Abstract
While language-guided image manipulation has made remarkable progress, the challenge of how to instruct the manipulation process faithfully reflecting human intentions persists. An accurate and comprehensive description of a manipulation task using natural language is laborious and sometimes even impossible, primarily due to the inherent uncertainty and ambiguity present in linguistic expressions. Is it feasible to accomplish image manipulation without resorting to external cross-modal language information? If this possibility exists, the inherent modality gap would be effortlessly eliminated. In this paper, we propose a novel manipulation methodology, dubbed ImageBrush, that learns visual instructions for more accurate image editing. Our key idea is to employ a pair of transformation images as visual instructions, which not only precisely captures human intention but also facilitates accessibility in real-world scenarios. Capturing visual instructions is particularly challenging because it involves extracting the underlying intentions solely from visual demonstrations and then applying this operation to a new image. To address this challenge, we formulate visual instruction learning as a diffusion-based inpainting problem, where the contextual information is fully exploited through an iterative process of generation. A visual prompting encoder is carefully devised to enhance the model's capacity in uncovering human intent behind the visual instructions. Extensive experiments show that our method generates engaging manipulation results conforming to the transformations entailed in demonstrations. Moreover, our model exhibits robust generalization capabilities on various downstream tasks such as pose transfer, image translation and video inpainting.
Exploiting Synthetic Data for Data Imbalance Problems: Baselines from a Data Perspective
Authors: Moon Ye-Bin, Nam Hyeon-Woo, Wonseok Choi, Nayeong Kim, Suha Kwak, Tae-Hyun Oh
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Deep neural networks are trained on a vast ocean of data, and this data exhibits an inherent phenomenon of imbalance. This imbalance poses a risk of deep neural networks producing biased predictions, leading to potentially severe ethical and social consequences. To address these challenges, we believe that generative models are a promising approach, given the remarkable advancements demonstrated by recent diffusion models in generating high-quality images. In this work, we propose a simple yet effective baseline, SYNAuG, that utilizes synthetic data as a preliminary step before employing task-specific algorithms to address data imbalance problems. This straightforward approach yields impressive performance on datasets such as CIFAR100-LT, ImageNet100-LT, UTKFace, and Waterbird, surpassing the performance of existing task-specific methods. While we do not claim that our approach serves as a complete solution to the problem of data imbalance, we argue that supplementing the existing data with synthetic data is an effective and crucial preliminary step in addressing data imbalance concerns.
DiffusePast: Diffusion-based Generative Replay for Class Incremental Semantic Segmentation
Abstract
Class Incremental Semantic Segmentation (CISS) extends the traditional segmentation task by incrementally learning newly added classes. Previous work has introduced generative replay, which involves replaying old class samples generated from a pre-trained GAN, to address the issues of catastrophic forgetting and privacy concerns. However, the generated images lack semantic precision and exhibit out-of-distribution characteristics, resulting in inaccurate masks that further degrade the segmentation performance. To tackle these challenges, we propose DiffusePast, a novel framework featuring a diffusion-based generative replay module that generates semantically accurate images with more reliable masks guided by different instructions (e.g., text prompts or edge maps). Specifically, DiffusePast introduces a dual-generator paradigm, which focuses on generating old class images that align with the distribution of downstream datasets while preserving the structure and layout of the original images, enabling more precise masks. To adapt to the novel visual concepts of newly added classes continuously, we incorporate class-wise token embedding when updating the dual-generator. Moreover, we assign adequate pseudo-labels of old classes to the background pixels in the new step images, further mitigating the forgetting of previously learned knowledge. Through comprehensive experiments, our method demonstrates competitive performance across mainstream benchmarks, striking a better balance between the performance of old and novel classes.
Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment for Markup-to-Image Generation
Authors: Guojin Zhong, Jin Yuan, Pan Wang, Kailun Yang, Weili Guan, Zhiyong Li
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)
Abstract
The recently emerging task of markup-to-image generation poses greater challenges than natural image generation, due to its low tolerance for errors as well as the complex sequence and context correlations between markup and rendered image. This paper proposes a novel model named "Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment" (FSA-CDM), which introduces contrastive positive/negative samples into the diffusion model to boost performance for markup-to-image generation. Technically, we design a fine-grained cross-modal alignment module to fully exploit the sequence similarity between the two modalities and learn robust feature representations. To improve the generalization ability, we propose a contrast-augmented diffusion model to explicitly explore positive and negative samples by maximizing a novel contrastive variational objective, which is mathematically inferred to provide a tighter bound for the model's optimization. Moreover, a context-aware cross attention module is developed to capture the contextual information within the markup language during the denoising process, yielding better noise prediction results. Extensive experiments are conducted on four benchmark datasets from different domains, and the experimental results demonstrate the effectiveness of the proposed components in FSA-CDM, significantly exceeding state-of-the-art performance by about 2%-12% in DTW. The code will be released at https://github.com/zgj77/FSACDM.
Patched Denoising Diffusion Models For High-Resolution Image Synthesis
Authors: Zheng Ding, Mengqi Zhang, Jiajun Wu, Zhuowen Tu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
We propose an effective denoising diffusion model for generating high-resolution images (e.g., 1024$\times$512), trained on small-size image patches (e.g., 64$\times$64). We name our algorithm Patch-DM, in which a new feature collage strategy is designed to avoid the boundary artifact when synthesizing large-size images. Feature collage systematically crops and combines partial features of the neighboring patches to predict the features of a shifted image patch, allowing the seamless generation of the entire image due to the overlap in the patch feature space. Patch-DM produces high-quality image synthesis results on our newly collected dataset of nature images (1024$\times$512), as well as on standard benchmarks of smaller sizes (256$\times$256), including LSUN-Bedroom, LSUN-Church, and FFHQ. We compare our method with previous patch-based generation methods and achieve state-of-the-art FID scores on all four datasets. Further, Patch-DM also reduces memory complexity compared to the classic diffusion models.
Keyword: adaptive
Adaptive Semantic Consistency for Cross-domain Few-shot Classification
Abstract
Cross-domain few-shot classification (CD-FSC) aims to identify novel target classes with a few samples, assuming that there exists a domain shift between source and target domains. Existing state-of-the-art practices typically pre-train on the source domain and then finetune on the few-shot target data to yield task-adaptive representations. Despite promising progress, these methods are prone to overfitting the limited target distribution due to data scarcity, and they ignore the transferable knowledge learned in the source domain. To alleviate this problem, we propose a simple plug-and-play Adaptive Semantic Consistency (ASC) framework, which improves cross-domain robustness by preserving source transfer capability during the finetuning stage. Concretely, we reuse the source images from the pretraining phase and design an adaptive weight assignment strategy to highlight the samples similar to the target domain, aiming to aggregate informative target-related knowledge from the source domain. Subsequently, a semantic consistency regularization is applied to constrain the consistency between the semantic features of the source images output by the source model and the target model. In this way, the proposed ASC enables explicit transfer of source domain knowledge to prevent the model from overfitting the target domain. Extensive experiments on multiple benchmarks demonstrate the effectiveness of the proposed ASC, which provides consistent improvements over the baselines. The source code will be released.
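The weighted consistency idea can be illustrated with a small numpy sketch. This is not the paper's exact formulation: the cosine-similarity weighting against a target prototype and the squared-error penalty are assumptions made for illustration.

```python
import numpy as np

def asc_consistency_loss(feat_src_model, feat_tgt_model, proto_tgt):
    """Sketch of an adaptive semantic-consistency loss (illustrative, not
    the paper's exact form): source images whose source-model features lie
    closer to a target-domain prototype receive larger weights, and the
    loss penalizes drift between source- and target-model features."""
    # Cosine similarity of each source feature to the target prototype.
    sim = feat_src_model @ proto_tgt / (
        np.linalg.norm(feat_src_model, axis=1) * np.linalg.norm(proto_tgt) + 1e-8)
    w = np.exp(sim) / np.exp(sim).sum()                 # adaptive weights
    per_sample = ((feat_src_model - feat_tgt_model) ** 2).sum(axis=1)
    return float((w * per_sample).sum())

# Sample 0 resembles the target prototype, so its drift is weighted more.
fs = np.array([[1.0, 0.0], [0.0, 1.0]])   # source-model features
ft = np.array([[1.0, 0.1], [0.0, 1.0]])   # target-model features
loss = asc_consistency_loss(fs, ft, proto_tgt=np.array([1.0, 0.0]))
```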
Ada-DQA: Adaptive Diverse Quality-aware Feature Acquisition for Video Quality Assessment
Authors: Hongbo Liu, Mingda Wu, Kun Yuan, Ming Sun, Yansong Tang, Chuanchuan Zheng, Xing Wen, Xiu Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Video quality assessment (VQA) has attracted growing attention in recent years. However, the great expense of annotating large-scale VQA datasets has become the main obstacle for current deep-learning methods. To surmount the constraint of insufficient training data, in this paper, we first consider the complete range of video distribution diversity (\ie content, distortion, motion) and employ diverse pretrained models (\eg architecture, pretext task, pre-training dataset) to benefit quality representation. An Adaptive Diverse Quality-aware feature Acquisition (Ada-DQA) framework is proposed to capture desired quality-related features generated by these frozen pretrained models. By leveraging the Quality-aware Acquisition Module (QAM), the framework is able to extract more essential and relevant features to represent quality. Finally, the learned quality representation is utilized as supplementary supervisory information, along with the supervision of the labeled quality score, to guide the training of a relatively lightweight VQA model in a knowledge distillation manner, which largely reduces the computational cost during inference. Experimental results on three mainstream no-reference VQA benchmarks clearly show the superior performance of Ada-DQA in comparison with current state-of-the-art approaches without using extra training data of VQA.
An ensemble of online estimation methods for one degree-of-freedom models of unmanned surface vehicles: applied theory and preliminary field results with eight vehicles
Abstract
In this paper we report an experimental evaluation of three popular methods for online system identification of unmanned surface vehicles (USVs) which were implemented as an ensemble: certifiably stable shallow recurrent neural network (RNN), adaptive identification (AID), and recursive least squares (RLS). The algorithms were deployed on eight USVs for a total of 30 hours of online estimation. During online training the loss function for the RNN was augmented to include a cost for violating a sufficient condition for the RNN to be stable in the sense of contraction stability. Additionally, we describe an efficient method to calculate the equilibrium points of the RNN and classify the associated stability properties about these points. We found the AID method had the lowest mean absolute error in the online prediction setting, but a weighted ensemble had lower error in offline processing.
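As a sketch of the recursive least squares component, the following estimates a discretized one-degree-of-freedom surge model online. The model structure (v[k+1] = a·v[k] + b·u[k]), the forgetting factor, and all parameter values are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def rls_step(theta, P, phi, y, lam=0.99):
    """One recursive-least-squares update with forgetting factor lam."""
    phi = phi.reshape(-1, 1)
    k = P @ phi / (lam + phi.T @ P @ phi)       # gain vector
    err = y - (phi.T @ theta).item()            # one-step prediction error
    theta = theta + k * err
    P = (P - k @ phi.T @ P) / lam               # covariance update
    return theta, P, err

# Simulated discretized 1-DOF surge model: v[k+1] = a*v[k] + b*u[k] + noise.
rng = np.random.default_rng(0)
a_true, b_true = 0.95, 0.08                     # hypothetical parameters
theta, P = np.zeros((2, 1)), 1e3 * np.eye(2)
v = 0.0
for _ in range(500):
    u = rng.uniform(-1, 1)                      # thrust input
    v_next = a_true * v + b_true * u + 1e-4 * rng.standard_normal()
    theta, P, _ = rls_step(theta, P, np.array([v, u]), v_next)
    v = v_next
print(theta.ravel())  # ≈ [0.95, 0.08]
```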
Nearly Optimal Dynamic Set Cover: Breaking the Quadratic-in-$f$ Time Barrier
Abstract
The dynamic set cover problem has been subject to extensive research since the pioneering works of [Bhattacharya et al, 2015] and [Gupta et al, 2017]. The input is a set system $(U, S)$ on a fixed collection $S$ of sets and a dynamic universe of elements, where each element appears in at most $f$ sets and the cost of each set lies in the range $[1/C, 1]$, and the goal is to efficiently maintain an approximately-minimum set cover under insertions and deletions of elements. Most previous work considers the low-frequency regime, namely $f = O(\log n)$, and this line of work has culminated with a deterministic $(1+\epsilon)f$-approximation algorithm with amortized update time $O(\frac{f^2}{\epsilon^3} + \frac{f}{\epsilon^2}\log C)$ [Bhattacharya et al, 2021]. In the high-frequency regime of $f = \Omega(\log n)$, an $O(\log n)$-approximation algorithm with amortized update time $O(f\log n)$ was given by [Gupta et al, 2017]. Interestingly, at the intersection of the two regimes, i.e., $f = \Theta(\log n)$, the state-of-the-art results coincide: approximation $\Theta(f) = \Theta(\log n)$ with amortized update time $O(f^2) = O(f \log n) = O(\log^2 n)$. Up to this date, no previous work achieved update time of $o(f^2)$. In this paper we break the $\Omega(f^2)$ update time barrier via the following results: (1) $(1+\epsilon)f$-approximation can be maintained in $O\left(\frac{f}{\epsilon^3}\log f + \frac{f}{\epsilon^3}\log C\right) = O_{\epsilon,C}(f \log f)$ expected amortized update time; our algorithm works against an adaptive adversary. (2) $(1+\epsilon)f$-approximation can be maintained deterministically in $O\left(\frac{1}{\epsilon}f\log f + \frac{f}{\epsilon^3} + \frac{f\log C}{\epsilon^2}\right) = O_{\epsilon,C}(f \log f)$ amortized update time.
Detection and Segmentation of Cosmic Objects Based on Adaptive Thresholding and Back Propagation Neural Network
Authors: Samia Sultana, Shyla Afroge
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Astronomical images provide information about the great variety of cosmic objects in the Universe. Due to the large volumes of data, the presence of innumerable bright point sources as well as noise within the frame and the spatial gap between objects and satellite cameras, it is a challenging task to classify and detect the celestial objects. We propose an Adaptive Thresholding Method (ATM) based segmentation and Back Propagation Neural Network (BPNN) based cosmic object detection including a well-structured series of pre-processing steps designed to enhance segmentation and detection.
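The paper's exact ATM is not detailed above; a generic local-mean adaptive threshold, computed efficiently with an integral image, illustrates the segmentation idea on a bright point source against a dark sky.

```python
import numpy as np

def adaptive_threshold(img, block=15, c=2.0):
    """Binarize by comparing each pixel to the mean of its local block.

    A generic local-mean scheme (not necessarily the paper's exact ATM):
    pixels brighter than their neighbourhood mean by more than `c` are
    marked as foreground."""
    h, w = img.shape
    pad = block // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    # Integral image gives O(1) block sums per pixel.
    ii = np.pad(padded.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    sums = (ii[block:, block:] - ii[:-block, block:]
            - ii[block:, :-block] + ii[:-block, :-block])
    local_mean = sums / (block * block)
    return (img.astype(float) > local_mean + c).astype(np.uint8)

# A bright 3x3 "star" on a dark background survives thresholding.
frame = np.zeros((32, 32))
frame[10:13, 10:13] = 200
mask = adaptive_threshold(frame)
```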
Particle swarm optimization with state-based adaptive velocity limit strategy
Abstract
Velocity limit (VL) has been widely adopted in many variants of particle swarm optimization (PSO) to prevent particles from searching outside the solution space. Several adaptive VL strategies have been introduced with which the performance of PSO can be improved. However, the existing adaptive VL strategies simply adjust their VL based on iterations, leading to unsatisfactory optimization results because of the incompatibility between VL and the current searching state of particles. To deal with this problem, a novel PSO variant with a state-based adaptive velocity limit strategy (PSO-SAVL) is proposed. In the proposed PSO-SAVL, VL is adaptively adjusted based on evolutionary state estimation (ESE), in which a high value of VL is set for the global searching state and a low value of VL is set for the local searching state. Besides that, limit handling strategies have been modified and adopted to improve the capability of avoiding local optima. The good performance of PSO-SAVL has been experimentally validated on a wide range of benchmark functions with 50 dimensions. The satisfactory scalability of PSO-SAVL in high-dimension and large-scale problems is also verified. Besides, the merits of the strategies in PSO-SAVL are verified in experiments. Sensitivity analysis for the relevant hyper-parameters in the state-based adaptive VL strategy is conducted, and insights into how to select these hyper-parameters are also discussed.
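A minimal numpy sketch of the state-based VL idea follows. Swarm diversity stands in for the paper's evolutionary state estimation, and all coefficients (inertia, acceleration, the VL range) are illustrative assumptions, not PSO-SAVL's actual settings.

```python
import numpy as np

def pso_savl_sketch(f, dim=5, n=30, iters=200, seed=0):
    """PSO with a velocity limit adapted to the swarm's search state.

    Simplified sketch: mean distance to the swarm centroid stands in for
    evolutionary state estimation; a diverse (global-search) swarm gets a
    high VL, a converged (local-search) swarm a low one."""
    rng = np.random.default_rng(seed)
    lo, hi = -5.0, 5.0
    x = rng.uniform(lo, hi, (n, dim))
    v = np.zeros((n, dim))
    pbest, pval = x.copy(), np.apply_along_axis(f, 1, x)
    gbest, d0 = pbest[pval.argmin()], None
    for _ in range(iters):
        d = np.linalg.norm(x - x.mean(0), axis=1).mean()   # diversity
        d0 = d if d0 is None else d0
        state = min(d / d0, 1.0)                           # 1 = exploring
        vmax = (0.05 + 0.45 * state) * (hi - lo)           # state-based VL
        r1, r2 = rng.random((2, n, dim))
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        v = np.clip(v, -vmax, vmax)
        x = np.clip(x + v, lo, hi)
        fx = np.apply_along_axis(f, 1, x)
        better = fx < pval
        pbest[better], pval[better] = x[better], fx[better]
        gbest = pbest[pval.argmin()]
    return gbest, pval.min()

best, val = pso_savl_sketch(lambda z: float(np.sum(z * z)))
```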
Scaling Data Science Solutions with Semantics and Machine Learning: Bosch Case
Authors: Baifan Zhou, Nikolay Nikolov, Zhuoxun Zheng, Xianghui Luo, Ognjen Savkovic, Dumitru Roman, Ahmet Soylu, Evgeny Kharlamov
Subjects: Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Industry 4.0 and Internet of Things (IoT) technologies unlock an unprecedented amount of data from factory production, posing big data challenges in volume and variety. In that context, distributed computing solutions such as cloud systems are leveraged to parallelise the data processing and reduce computation time. As cloud systems become increasingly popular, there is increasing demand for users who are not cloud experts (such as data scientists and domain experts) to deploy their solutions on cloud systems. However, it is non-trivial to address both the high demand for cloud system users and the excessive time required to train them. To this end, we propose SemCloud, a semantics-enhanced cloud system that couples cloud systems with semantic technologies and machine learning. SemCloud relies on domain ontologies and mappings for data integration, and parallelises the semantic data integration and data analysis on distributed computing nodes. Furthermore, SemCloud adopts adaptive Datalog rules and machine learning for automated resource configuration, allowing non-cloud experts to use the cloud system. The system has been evaluated in an industrial use case with millions of data records, thousands of repeated runs, and domain users, showing promising results.
UCDFormer: Unsupervised Change Detection Using a Transformer-driven Image Translation
Abstract
Change detection (CD) by comparing two bi-temporal images is a crucial task in remote sensing. With the advantage of requiring no cumbersome labeled change information, unsupervised CD has attracted extensive attention in the community. However, existing unsupervised CD approaches rarely consider the seasonal and style differences incurred by the illumination and atmospheric conditions in multi-temporal images. To this end, we propose a domain-shift setting for change detection in remote sensing images. Furthermore, we present a novel unsupervised CD method using a light-weight transformer, called UCDFormer. Specifically, a transformer-driven image translation composed of a light-weight transformer and a domain-specific affinity weight is first proposed to mitigate the domain shift between two images with real-time efficiency. After image translation, we can generate the difference map between the translated before-event image and the original after-event image. Then, a novel reliable pixel extraction module is proposed to select significantly changed/unchanged pixel positions by fusing the pseudo change maps of fuzzy c-means clustering and adaptive thresholding. Finally, a binary change map is obtained based on these selected pixel pairs and a binary classifier. Experimental results on different unsupervised CD tasks with seasonal and style changes demonstrate the effectiveness of the proposed UCDFormer. For example, compared with several other related methods, UCDFormer improves performance on the Kappa coefficient by more than 12\%. In addition, UCDFormer achieves excellent performance for earthquake-induced landslide detection when considering large-scale applications. The code is available at \url{https://github.com/zhu-xlab/UCDFormer}
Improving Generalization in Visual Reinforcement Learning via Conflict-aware Gradient Agreement Augmentation
Abstract
Learning a policy that generalizes well to unseen environments remains challenging but critical in visual reinforcement learning. Despite the success of augmentation combination in supervised-learning generalization, naively applying it to visual RL algorithms may damage training efficiency, resulting in severe performance degradation. In this paper, we first conduct a qualitative analysis and illuminate the main causes: (i) high-variance gradient magnitudes and (ii) gradient conflicts among various augmentation methods. To alleviate these issues, we propose a general policy gradient optimization framework, named Conflict-aware Gradient Agreement Augmentation (CG2A), to better integrate augmentation combination into visual RL algorithms and address the generalization bias. In particular, CG2A develops a Gradient Agreement Solver to adaptively balance the varying gradient magnitudes, and introduces a Soft Gradient Surgery strategy to alleviate the gradient conflicts. Extensive experiments demonstrate that CG2A significantly improves the generalization performance and sample efficiency of visual RL algorithms.
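The gradient-conflict issue can be made concrete with a PCGrad-style sketch. The partial projection and the `alpha` coefficient below are our assumptions for illustration, not CG2A's exact Soft Gradient Surgery update.

```python
import numpy as np

def soft_gradient_surgery(grads, alpha=0.5):
    """Soften conflicts among per-augmentation gradients (PCGrad-style
    sketch, not the paper's exact rule). When two gradients conflict
    (negative dot product), only a fraction `alpha` of the conflicting
    projection is removed instead of all of it."""
    out = [g.astype(float).copy() for g in grads]
    for gi in out:
        for gj in grads:
            dot = float(gi @ gj)
            if dot < 0:  # conflict: shave part of the projection onto gj
                gi -= alpha * dot / float(gj @ gj) * gj
    return out

# Two conflicting augmentation gradients; alpha=1 removes conflicts fully.
g1, g2 = np.array([1.0, 0.0]), np.array([-1.0, 1.0])
s1, s2 = soft_gradient_surgery([g1, g2], alpha=1.0)
```

With `alpha=1.0` this reduces to the full projection of PCGrad; smaller values trade conflict removal against preserving each task's original gradient.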
A Real-Time Robust Ecological-Adaptive Cruise Control Strategy for Battery Electric Vehicles
Authors: Sheng Yu, Xiao Pan, Anastasis Georgiou, Boli Chen, Imad M. Jaimoukha, Simos A. Evangelou
Abstract
This work addresses the ecological-adaptive cruise control problem for connected electric vehicles with a computationally efficient and robust control strategy. The problem is formulated in the space-domain with a realistic description of the nonlinear electric powertrain model and motion dynamics to yield a convex optimal control problem (OCP). The OCP is approached by a robust model predictive control (RMPC) method, which handles various uncertainties due to the modelling mismatch and inaccurate information of the leading vehicle. The RMPC problem is solved by semi-definite programming relaxation and single linear matrix inequality (sLMI) techniques for further enhanced computational efficiency. The performance of the proposed real-time robust ecological-adaptive cruise control (REACC) method is evaluated by utilising an urban driving cycle experimentally collected on a real-world route in London, UK, with practical disturbances including modelling mismatches on air-drag coefficients, tyre-rolling resistance coefficients, and road slope angles. Its robustness is verified through comparison with a nominal MPC, which is shown to result in speed limit constraint violations. The energy economy of the proposed method outperforms a state-of-the-art time-domain RMPC scheme, as a more precisely fitted convex powertrain model can be integrated into the space-domain scheme. An additional comparison with a traditional constant distance following strategy (CDFS) further verifies the effectiveness of the proposed REACC. Finally, it is verified that REACC can potentially be implemented in real-time owing to the sLMI and resulting convex algorithm.
Adaptive Collaborative Filtering with Personalized Time Decay Functions for Financial Product Recommendation
Abstract
Classical recommender systems often assume that historical data are stationary and fail to account for the dynamic nature of user preferences, limiting their ability to provide reliable recommendations in time-sensitive settings. This assumption is particularly problematic in finance, where financial products exhibit continuous changes in valuations, leading to frequent shifts in client interests. These evolving interests, summarized in the past client-product interactions, see their utility fade over time with a degree that might differ from one client to another. To address this challenge, we propose a time-dependent collaborative filtering algorithm that can adaptively discount distant client-product interactions using personalized decay functions. Our approach is designed to handle the non-stationarity of financial data and produce reliable recommendations by modeling the dynamic collaborative signals between clients and products. We evaluate our method using a proprietary dataset from BNP Paribas and demonstrate significant improvements over state-of-the-art benchmarks from relevant literature. Our findings emphasize the importance of incorporating time explicitly in the model to enhance the accuracy of financial product recommendation.
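The personalized-decay idea above can be sketched in a few lines. The exponential half-life form and the per-client parameters are illustrative assumptions; the paper's exact decay family is not specified here.

```python
import numpy as np

def decayed_interaction_matrix(interactions, half_lives, now,
                               n_clients, n_products):
    """Build a client-product matrix in which each past interaction is
    discounted by a per-client exponential decay (a sketch of the
    personalized time-decay idea, not the paper's exact function)."""
    R = np.zeros((n_clients, n_products))
    for client, product, t in interactions:
        age = now - t                                    # days since event
        R[client, product] += 0.5 ** (age / half_lives[client])
    return R

# Client 0 "forgets" fast (half-life 5 days), client 1 slowly (50 days);
# both interacted with product 0 twenty days ago.
events = [(0, 0, 0.0), (1, 0, 0.0)]
R = decayed_interaction_matrix(events, {0: 5.0, 1: 50.0}, now=20.0,
                               n_clients=2, n_products=1)
```

The decayed matrix `R` can then feed any standard collaborative-filtering step (e.g. item-item similarity), so that stale interactions of fast-moving clients contribute less to the collaborative signal.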
Keyword: quantization
Tango: rethinking quantization for graph neural network training on GPUs
Authors: Shiyang Chen, Da Zheng, Caiwen Ding, Chengying Huan, Yuede Ji, Hang Liu
Abstract
Graph Neural Networks (GNNs) are becoming increasingly popular due to their superior performance in critical graph-related tasks. While quantization is widely used to accelerate GNN computation, quantized training faces unprecedented challenges. Current quantized GNN training systems often have longer training times than their full-precision counterparts for two reasons: (i) addressing the accuracy challenge leads to excessive overhead, and (ii) the optimization potential exposed by quantization is not adequately leveraged. This paper introduces Tango which re-thinks quantization challenges and opportunities for graph neural network training on GPUs with three contributions: Firstly, we introduce efficient rules to maintain accuracy during quantized GNN training. Secondly, we design and implement quantization-aware primitives and inter-primitive optimizations that can speed up GNN training. Finally, we integrate Tango with the popular Deep Graph Library (DGL) system and demonstrate its superior performance over state-of-the-art approaches on various GNN models and datasets.
Error Analysis of CORDIC Processor with FPGA Implementation
Abstract
The coordinate rotation digital computer (CORDIC) is a shift-add-based fast computing algorithm found in many digital signal processing (DSP) applications. In this paper, a detailed error analysis based on the mean-square-error criterion, together with an FPGA implementation, is presented. The two error sources considered are the angle approximation error and the quantization error due to finite word length in a fixed-point number system. The error bound and variance are discussed in theory. The CORDIC algorithm is implemented on an FPGA using the Xilinx Zynq-7000 development board called ZedBoard, and the results of the theoretical error analysis are investigated in practice on this board. In addition, Matlab, set up in double-precision floating point, provides theoretical baseline values against which the practical errors of the FPGA implementation are compared.
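Both error sources can be reproduced in a small software model of fixed-point CORDIC in rotation mode; the iteration count and word length below are illustrative choices, not those of the paper's FPGA design.

```python
import math

def cordic_sin_cos(theta, n=16, frac_bits=14):
    """Fixed-point CORDIC (rotation mode) for cos/sin of |theta| < pi/2.

    Illustrates the two error sources discussed above: the angle
    approximation error from using only n micro-rotations, and the
    quantization error from keeping `frac_bits` fractional bits."""
    one = 1 << frac_bits
    fx = lambda v: int(round(v * one))                  # quantize to fixed point
    angles = [fx(math.atan(2.0 ** -i)) for i in range(n)]
    K = 1.0
    for i in range(n):                                  # CORDIC gain correction
        K /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = fx(K), 0, fx(theta)
    for i in range(n):
        d = 1 if z >= 0 else -1                         # rotation direction
        x, y, z = (x - d * (y >> i),                    # shift-add rotation
                   y + d * (x >> i),
                   z - d * angles[i])
    return x / one, y / one

c, s = cordic_sin_cos(0.5)
```

With 16 iterations and 14 fractional bits the result agrees with the true sine/cosine to roughly three decimal places; shrinking either parameter makes the corresponding error term dominate.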
Keyword: efficient
Approximate Model-Based Shielding for Safe Reinforcement Learning
An ensemble of online estimation methods for one degree-of-freedom models of unmanned surface vehicles: applied theory and preliminary field results with eight vehicles
Hybrid-SORT: Weak Cues Matter for Online Multi-Object Tracking
Nearly Optimal Dynamic Set Cover: Breaking the Quadratic-in-$f$ Time Barrier
Addressing Uncertainty in Imbalanced Histopathology Image Classification of HER2 Breast Cancer: An interpretable Ensemble Approach with Threshold Filtered Single Instance Evaluation (SIE)
Deep Learning Approaches in Pavement Distress Identification: A Review
Microfluidic Molecular Communication Transmitter Based on Hydrodynamic Gating
Factor Graph Neural Networks
Tango: rethinking quantization for graph neural network training on GPUs
A Mini Immersed Finite Element Method for Two-Phase Stokes Problems on Cartesian Meshes
WaterFlow: Heuristic Normalizing Flow for Underwater Image Enhancement and Beyond
IIDS: Design of Intelligent Intrusion Detection System for Internet-of-Things Applications
Reward Shaping for Building Trustworthy Robots in Sequential Human-Robot Interaction
Training-Free Instance Segmentation from Semantic Image Segmentation Masks
The evolution of Complexity co-occurring keywords: bibliometric analysis and network approach
Push to know! -- Visuo-Tactile based Active Object Parameter Inference with Dual Differentiable Filtering
Dual-Matrix Domain-Wall: A Novel Technique for Generating Permutations by QUBO and Ising Models with Quadratic Sizes
WCCNet: Wavelet-integrated CNN with Crossmodal Rearranging Fusion for Fast Multispectral Pedestrian Detection
Game-theoretical approach to decentralized multi-drone conflict resolution and emergent traffic flow operations
Towards Better Query Classification with Multi-Expert Knowledge Condensation in JD Ads Search
UCDFormer: Unsupervised Change Detection Using a Transformer-driven Image Translation
Virtual Reality Based Robot Teleoperation via Human-Scene Interaction
Direct Gradient Temporal Difference Learning
Generative Noisy-Label Learning by Implicit Dicriminative Approximation with Partial Label Prior
A Real-Time Robust Ecological-Adaptive Cruise Control Strategy for Battery Electric Vehicles
TeachCLIP: Multi-Grained Teaching for Efficient Text-to-Video Retrieval
Current Studies and Applications of Krill Herd and Gravitational Search Algorithms in Healthcare
Delegated Time-Lock Puzzle
Straggler Mitigation and Latency Optimization in Blockchain-based Hierarchical Federated Learning
ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders
Keyword: faster
Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks
Boundary integrated neural networks (BINNs) for 2D elastostatic and piezoelectric problems: Theory and MATLAB code
VAPI: Vectorization of Algorithm for Performance Improvement
BCDDO: Binary Child Drawing Development Optimization
BRNES: Enabling Security and Privacy-aware Experience Sharing in Multiagent Robotic and Autonomous Systems
Keyword: mobile
A Model Predictive Path Integral Method for Fast, Proactive, and Uncertainty-Aware UAV Planning in Cluttered Environments
A Decision Tree-based Monitoring and Recovery Framework for Autonomous Robots with Decision Uncertainties
Keyword: pruning
Understanding Activation Patterns in Artificial Neural Networks by Exploring Stochastic Processes
Dynamic Token Pruning in Plain Vision Transformers for Semantic Segmentation
Data-Centric Diet: Effective Multi-center Dataset Pruning for Medical Image Segmentation