Abstract
Change detection, a prominent research area in remote sensing, is pivotal in observing and analyzing surface transformations. Despite significant advancements achieved through deep learning-based methods, executing high-precision change detection in spatio-temporally complex remote sensing scenarios still presents a substantial challenge. The recent emergence of foundation models, with their powerful universality and generalization capabilities, offers potential solutions. However, bridging the gap between data and tasks remains a significant obstacle. In this paper, we introduce Time Travelling Pixels (TTP), a novel approach that integrates the latent knowledge of the SAM foundation model into change detection. This method effectively addresses the domain shift in general knowledge transfer and the challenge of expressing homogeneous and heterogeneous characteristics of multi-temporal images. The state-of-the-art results obtained on the LEVIR-CD dataset underscore the efficacy of TTP. The code is available at https://kychen.me/TTP.
Observable Propagation: A Data-Efficient Approach to Uncover Feature Vectors in Transformers
Authors: Jacob Dunefsky, Arman Cohan
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Abstract
A key goal of current mechanistic interpretability research in NLP is to find linear features (also called "feature vectors") for transformers: directions in activation space corresponding to concepts that are used by a given model in its computation. Present state-of-the-art methods for finding linear features require large amounts of labelled data -- both laborious to acquire and computationally expensive to utilize. In this work, we introduce a novel method, called "observable propagation" (in short: ObsProp), for finding linear features used by transformer language models in computing a given task -- using almost no data. Our paradigm centers on the concept of observables, linear functionals corresponding to given tasks. We then introduce a mathematical theory for the analysis of feature vectors: we provide theoretical motivation for why LayerNorm nonlinearities do not affect the direction of feature vectors; we also introduce a similarity metric between feature vectors called the coupling coefficient which estimates the degree to which one feature's output correlates with another's. We use ObsProp to perform extensive qualitative investigations into several tasks, including gendered occupational bias, political party prediction, and programming language detection. Our results suggest that ObsProp surpasses traditional approaches for finding feature vectors in the low-data regime, and that ObsProp can be used to better understand the mechanisms responsible for bias in large language models. Code for experiments can be found at github.com/jacobdunefsky/ObservablePropagation.
Nearly Tight Bounds For Differentially Private Min $s$-$t$ and Multiway Cut
Authors: Mina Dalirrooyfard, Slobodan Mitrović, Yuriy Nevmyvaka
Abstract
Finding min $s$-$t$ cuts in graphs is a basic algorithmic tool with applications in image segmentation, community detection, reinforcement learning, and data clustering. In this problem, we are given two nodes as terminals, and the goal is to remove the smallest number of edges from the graph so that these two terminals are disconnected. We study the complexity of differential privacy for the min $s$-$t$ cut problem and show nearly tight lower and upper bounds where we achieve privacy at no cost for running time efficiency. We also develop a differentially private algorithm for the multiway $k$-cut problem, in which we are given $k$ nodes as terminals that we would like to disconnect. As a function of $k$, we obtain privacy guarantees that are exponentially more efficient than applying the advanced composition theorem to known algorithms for multiway $k$-cut. Finally, we empirically evaluate the approximation of our differentially private min $s$-$t$ cut algorithm and show that it almost matches the quality of the output of non-private ones.
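For orientation, the non-private baseline problem described above can be sketched as follows. This is a minimal, standard max-flow (Edmonds-Karp) computation of a min s-t cut on an undirected, unweighted graph; it is not the differentially private algorithm of the paper, and the function name `min_st_cut` and the toy graph are illustrative assumptions.

```python
from collections import deque


def min_st_cut(n, edges, s, t):
    """Return (cut_size, cut_edges) for an undirected, unweighted graph.

    Non-private baseline only; names and structure are illustrative.
    """
    # Residual capacities: each undirected edge contributes capacity 1
    # in both directions.
    cap = [dict() for _ in range(n)]
    for u, v in edges:
        cap[u][v] = cap[u].get(v, 0) + 1
        cap[v][u] = cap[v].get(u, 0) + 1

    def bfs_parents():
        # Breadth-first search over edges with remaining capacity.
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = bfs_parents()
        if t not in parent:
            break  # no augmenting path left: the flow is maximum
        # Find the bottleneck capacity along the path from s to t.
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        # Push flow along the path and update residual capacities.
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck

    # Nodes still reachable from s form the source side of the cut.
    reachable = set(parent)
    cut_edges = [(u, v) for u, v in edges if (u in reachable) != (v in reachable)]
    return flow, cut_edges


# Toy graph: a 4-cycle with one chord; terminals are nodes 0 and 3.
toy_edges = [(0, 1), (0, 2), (1, 3), (2, 3), (1, 2)]
cut_size, cut = min_st_cut(4, toy_edges, 0, 3)
```

By the max-flow min-cut theorem, the number of returned cut edges equals the maximum flow value, which is the smallest number of edges whose removal disconnects the two terminals.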
LLM Polygraph: Uncovering LLMs' Factual Discernment through Intermediate Data Analysis
Abstract
Large Language Models (LLMs) have revolutionized various domains with extensive knowledge and creative capabilities. However, a critical issue with LLMs is their tendency to produce outputs that diverge from factual reality. This phenomenon is particularly concerning in sensitive applications such as medical consultation and legal advice, where accuracy is paramount. In this paper, we introduce the LLM factoscope, a novel Siamese network-based model that leverages the inner states of LLMs for factual detection. Our investigation reveals distinguishable patterns in LLMs' inner states when generating factual versus non-factual content. We demonstrate the LLM factoscope's effectiveness across various architectures, achieving over 96% accuracy in factual detection. Our work opens a new avenue for utilizing LLMs' inner states for factual detection and encourages further exploration into LLMs' inner workings for enhanced reliability and transparency.
Segment Change Model (SCM) for Unsupervised Change detection in VHR Remote Sensing Images: a Case Study of Buildings
Abstract
The field of Remote Sensing (RS) widely employs Change Detection (CD) on very-high-resolution (VHR) images. A majority of extant deep-learning-based methods hinge on annotated samples to complete the CD process. Recently, the emergence of Vision Foundation Models (VFMs) has enabled zero-shot predictions in particular vision tasks. In this work, we propose an unsupervised CD method named Segment Change Model (SCM), built upon the Segment Anything Model (SAM) and Contrastive Language-Image Pre-training (CLIP). Our method recalibrates features extracted at different scales and integrates them in a top-down manner to enhance discriminative change edges. We further design an innovative Piecewise Semantic Attention (PSA) scheme, which can offer semantic representation without training, thereby minimizing pseudo-change phenomena. In experiments on two public datasets, the proposed SCM increases the mIoU from 46.09% to 53.67% on the LEVIR-CD dataset, and from 47.56% to 52.14% on the WHU-CD dataset. Our code is available at https://github.com/StephenApX/UCD-SCM.
Soft Contrastive Learning for Time Series
Authors: Seunghan Lee, Taeyoung Park, Kibok Lee
Abstract
Contrastive learning has been shown to be effective for learning representations from time series in a self-supervised way. However, contrasting similar time series instances or values from adjacent timestamps within a time series ignores their inherent correlations, which degrades the quality of the learned representations. To address this issue, we propose SoftCLT, a simple yet effective soft contrastive learning strategy for time series. This is achieved by introducing instance-wise and temporal contrastive losses with soft assignments ranging from zero to one. Specifically, we define soft assignments for 1) the instance-wise contrastive loss by the distance between time series in the data space, and 2) the temporal contrastive loss by the difference of timestamps. SoftCLT is a plug-and-play method for time series contrastive learning that improves the quality of learned representations without bells and whistles. In experiments, we demonstrate that SoftCLT consistently improves the performance in various downstream tasks including classification, semi-supervised learning, transfer learning, and anomaly detection, showing state-of-the-art performance. Code is available at this repository: https://github.com/seunghan96/softclt.
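The two kinds of soft assignment described above can be illustrated with a small sketch. The abstract only states that the assignments lie between zero and one and depend on data-space distance (instance-wise) and timestamp difference (temporal); the sigmoid form, the Euclidean distance, and the parameters `tau` and `alpha` below are assumptions made for illustration, not necessarily the authors' exact definitions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def instance_soft_assignment(series_a, series_b, tau=0.5, alpha=0.5):
    """Soft assignment in [0, 1] that shrinks as two series move apart.

    Assumed form: 2 * alpha * sigmoid(-tau * distance), with Euclidean
    distance in the data space (hypothetical choice for this sketch).
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(series_a, series_b)))
    return 2.0 * alpha * sigmoid(-tau * dist)


def temporal_soft_assignment(t1, t2, tau=0.5):
    """Soft assignment in (0, 1] that shrinks with the timestamp gap.

    Assumed form: 2 * sigmoid(-tau * |t1 - t2|).
    """
    return 2.0 * sigmoid(-tau * abs(t1 - t2))


# Nearby timestamps get a larger weight than distant ones.
w_near = temporal_soft_assignment(10, 11)
w_far = temporal_soft_assignment(10, 20)

# Similar series get a larger weight than dissimilar ones.
w_similar = instance_soft_assignment([1.0, 2.0, 3.0], [1.1, 2.0, 2.9])
w_different = instance_soft_assignment([1.0, 2.0, 3.0], [5.0, -2.0, 9.0])
```

In a soft contrastive loss, weights of this kind would replace the hard 0/1 positive and negative labels, so that similar instances and adjacent timestamps are not pushed apart as strongly as dissimilar ones.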
ReSynthDetect: A Fundus Anomaly Detection Network with Reconstruction and Synthetic Features
Abstract
Detecting anomalies in fundus images through unsupervised methods is a challenging task due to the similarity between normal and abnormal tissues, as well as their indistinct boundaries. The current methods have limitations in accurately detecting subtle anomalies while avoiding false positives. To address these challenges, we propose the ReSynthDetect network which utilizes a reconstruction network for modeling normal images, and an anomaly generator that produces synthetic anomalies consistent with the appearance of fundus images. By combining the features of consistent anomaly generation and image reconstruction, our method is suited for detecting fundus abnormalities. The proposed approach has been extensively tested on benchmark datasets such as EyeQ and IDRiD, demonstrating state-of-the-art performance in both image-level and pixel-level anomaly detection. Our experiments indicate a substantial 9% improvement in AUROC on EyeQ and a significant 17.1% improvement in AUPR on IDRiD.
Source Code is a Graph, Not a Sequence: A Cross-Lingual Perspective on Code Clone Detection
Authors: Mohammed Ataaur Rahaman, Julia Ive
Abstract
Source code clone detection is the task of finding code fragments that have the same or similar functionality, but may differ in syntax or structure. This task is important for software maintenance, reuse, and quality assurance (Roy et al. 2009). However, code clone detection is challenging, as source code can be written in different languages, domains, and styles. In this paper, we argue that source code is inherently a graph, not a sequence, and that graph-based methods are more suitable for code clone detection than sequence-based methods. We compare the performance of two state-of-the-art models: CodeBERT (Feng et al. 2020), a sequence-based model, and CodeGraph (Yu et al. 2023), a graph-based model, on two benchmark data-sets: BCB (Svajlenko et al. 2014) and PoolC (PoolC no date). We show that CodeGraph outperforms CodeBERT on both data-sets, especially on cross-lingual code clones. To the best of our knowledge, this is the first work to demonstrate the superiority of graph-based methods over sequence-based methods on cross-lingual code clone detection.
Camera calibration for the surround-view system: a benchmark and dataset
Authors: L Qin, C Lin, S Huang, S Yang, Y Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Surround-view systems (SVS) are widely used in Advanced Driver Assistance Systems (ADAS). An SVS uses four fisheye lenses to monitor real-time scenes around the vehicle. However, accurate intrinsic and extrinsic parameter estimation is required for the proper functioning of the system. At present, intrinsic calibration can be pipelined using checkerboard algorithms, while extrinsic calibration is still immature. Therefore, we propose a specific calibration pipeline to estimate extrinsic parameters robustly. This scheme takes a driving sequence of four cameras as input. It first uses lane lines to roughly estimate each camera pose. Considering the differences in environmental conditions for each camera, we separately select strategies from two methods to accurately estimate the extrinsic parameters. To achieve accurate estimates for both the front and rear cameras, we propose a method that mutually iterates line detection and pose estimation. As for the bilateral cameras, we iteratively adjust the camera pose and position by minimizing texture and edge error between ground projections of adjacent cameras. After estimating the extrinsic parameters, the surround-view image can be synthesized by homography-based transformation. The proposed pipeline can robustly estimate the extrinsic parameters of the four SVS cameras in real driving environments. In addition, to evaluate the proposed scheme, we build a surround-view fisheye dataset, which contains 40 videos with 32,000 frames, acquired from different real traffic scenarios. All the frames in each video are manually labeled with lane annotations and ground-truth (GT) extrinsic parameters. Moreover, this surround-view dataset could be used by other researchers to evaluate their own methods. The dataset will be available soon.
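The homography-based transformation mentioned above maps each image point to the ground plane through a 3x3 matrix in homogeneous coordinates. The sketch below shows only that generic point mapping; the function name `apply_homography` and the example matrices are illustrative assumptions and do not come from the paper's pipeline.

```python
def apply_homography(H, point):
    """Map a 2-D point through a 3x3 homography H (list of three rows)."""
    x, y = point
    # Multiply H by the homogeneous point (x, y, 1).
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the homogeneous coordinate to return to 2-D.
    return (xh / w, yh / w)


# Hypothetical matrices for illustration: identity and a pure translation.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
shift = [[1, 0, 5], [0, 1, -2], [0, 0, 1]]

same_point = apply_homography(identity, (2, 3))
moved_point = apply_homography(shift, (2, 3))
```

In a surround-view setting, one such matrix per camera would typically be derived from that camera's intrinsic and extrinsic parameters, and the warped ground projections would then be stitched into a single top-down image.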
ConstScene: Dataset and Model for Advancing Robust Semantic Segmentation in Construction Environments
Authors: Maghsood Salimi, Mohammad Loni, Sara Afshar, Marjan Sirjani, Antonio Cicchetti
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The increasing demand for autonomous machines in construction environments necessitates the development of robust object detection algorithms that can perform effectively across various weather and environmental conditions. This paper introduces a new semantic segmentation dataset specifically tailored for construction sites, taking into account the diverse challenges posed by adverse weather and environmental conditions. The dataset is designed to enhance the training and evaluation of object detection models, fostering their adaptability and reliability in real-world construction applications. Our dataset comprises annotated images captured under a wide range of weather conditions, including but not limited to sunny days, rainy periods, foggy atmospheres, and low-light situations. Additionally, environmental factors such as the existence of dirt/mud on the camera lens are integrated into the dataset through actual captures and synthetic generation to simulate the complex conditions prevalent in construction sites. We also generate synthetic images of the annotations including precise semantic segmentation masks for various objects commonly found in construction environments, such as wheel loader machines, personnel, cars, and structural elements. To demonstrate the dataset's utility, we evaluate state-of-the-art object detection algorithms on our proposed benchmark. The results highlight the dataset's success in adversarial training of models across diverse conditions, showcasing its efficacy compared to existing datasets that lack such environmental variability.
Abstract
Background: Imagine a paper with n nodes on it where each pair undergoes a coin toss experiment; if heads we connect the pair with an undirected link, while tails maintain the disconnection. This procedure yields a random graph. Now consider duplicating this network onto another paper with a slight bias: a fraction of its links (approximately 1/10) undergo rearrangement. If we shuffle the two papers, how can we distinguish the pure random graph from the biased one? Results: In response to this challenge, we propose a novel metric called Randomness Index (RI). The closer the metric is to zero, the higher the degree of randomness in the graph. The RI can distinguish between dense small-world networks and dense random graphs, a distinction that is impossible with conventional small-world properties like clustering coefficient and average path length. To validate its effectiveness, we apply the RI to temporal correlation networks of stock indices. Our findings reveal a reduction in randomness during global economic recession periods. Conclusion: The RI emerges as a powerful metric capable of characterizing small-world topology, especially in scenarios where other network measures fail. Beyond its utility in network analysis, the RI is promising for change-point (anomaly) detection in dynamical systems studied by means of multivariate time series.
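The thought experiment in the Background can be reproduced with a short sketch: a coin-toss random graph, plus a copy in which roughly one tenth of the links are moved. The abstract does not specify how links are rearranged or how the RI is computed, so the rewiring rule below (move each selected link to a previously unconnected pair) and the function names are assumptions, and the RI itself is not implemented.

```python
import random
from itertools import combinations


def coin_toss_graph(n, rng):
    """Connect each pair of the n nodes with probability 1/2."""
    return {pair for pair in combinations(range(n), 2) if rng.random() < 0.5}


def rewire_fraction(n, edges, fraction, rng):
    """Return a copy of the graph with a fraction of its links moved.

    Assumed rule: each selected link is removed and a new link is placed
    on a pair of nodes that was unconnected in the original graph.
    """
    edges = set(edges)
    free_pairs = sorted(set(combinations(range(n), 2)) - edges)
    k = min(round(fraction * len(edges)), len(free_pairs))
    removed = rng.sample(sorted(edges), k)
    added = rng.sample(free_pairs, k)
    return (edges - set(removed)) | set(added)


rng = random.Random(0)  # fixed seed so the experiment is repeatable
pure = coin_toss_graph(30, rng)
biased = rewire_fraction(30, pure, 0.1, rng)
```

Both graphs have the same number of nodes and links, which is why simple density-based statistics cannot tell them apart and a dedicated measure is needed.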
GRSDet: Learning to Generate Local Reverse Samples for Few-shot Object Detection
Abstract
Few-shot object detection (FSOD) aims to detect objects using only a few training samples of novel classes. Most existing methods adopt a transfer-learning strategy to construct the novel class distribution by transferring the base class knowledge. However, this direct approach easily leads to confusion between the novel class and other similar categories in the decision space. To address the problem, we propose generating local reverse samples (LRSamples) in Prototype Reference Frames to adaptively adjust the center position and boundary range of the novel class distribution to learn more discriminative novel class samples for FSOD. Firstly, we propose a Center Calibration Variance Augmentation (CCVA) module, which contains the selection rule of LRSamples, the generator of LRSamples, and augmentation on the calibrated distribution centers. Specifically, we design an intra-class feature converter (IFC) as the generator of CCVA to learn the selecting rule. By transferring the knowledge of IFC from the base training to fine-tuning, the IFC generates plentiful novel samples to calibrate the novel class distribution. Moreover, we propose a Feature Density Boundary Optimization (FDBO) module to adaptively adjust the importance of samples depending on their distance from the decision boundary. It can emphasize the importance of the high-density area of the similar class (closer decision boundary area) and reduce the weight of the low-density area of the similar class (farther decision boundary area), thus optimizing a clearer decision boundary for each category. We conduct extensive experiments to demonstrate the effectiveness of our proposed method. Our method achieves consistent improvement on the Pascal VOC and MS COCO datasets based on DeFRCN and MFDC baselines.
Enhancing Traffic Flow Prediction using Outlier-Weighted AutoEncoders: Handling Real-Time Changes
Abstract
In today's urban landscape, traffic congestion poses a critical challenge, especially during outlier scenarios. These outliers can indicate abrupt traffic peaks, drops, or irregular trends, often arising from factors such as accidents, events, or roadwork. Moreover, given the dynamic nature of traffic, real-time traffic modeling becomes crucial to ensure accurate and up-to-date traffic predictions. To address these challenges, we introduce the Outlier Weighted Autoencoder Modeling (OWAM) framework. OWAM employs autoencoders for local outlier detection and generates correlation scores to assess neighboring traffic's influence. These scores serve as weighting factors for neighboring sensors before their data are fused into the model. This information enhances the traffic model's performance and supports effective real-time updates, a crucial aspect for capturing dynamic traffic patterns. OWAM demonstrates a favorable trade-off between accuracy and efficiency, rendering it highly suitable for real-world applications. The research findings contribute significantly to the development of more efficient and adaptive traffic prediction models, advancing the field of transportation management for the future. The code and datasets of our framework are publicly available at https://github.com/himanshudce/OWAM.
Self-supervised Pretraining for Robust Personalized Voice Activity Detection in Adverse Conditions
Authors: Holger Severin Bovbjerg (1), Jesper Jensen (1, 2), Jan Østergaard (1), Zheng-Hua Tan (1, 3) ((1) Aalborg University, (2) Oticon, (3) Pioneer Centre for AI, Denmark)
Abstract
In this paper, we propose the use of self-supervised pretraining on a large unlabelled data set to improve the performance of a personalized voice activity detection (VAD) model in adverse conditions. We pretrain a long short-term memory (LSTM)-encoder using the autoregressive predictive coding (APC) framework and fine-tune it for personalized VAD. We also propose a denoising variant of APC, with the goal of improving the robustness of personalized VAD. The trained models are systematically evaluated on both clean speech and speech contaminated by various types of noise at different SNR-levels and compared to a purely supervised model. Our experiments show that self-supervised pretraining not only improves performance in clean conditions, but also yields models which are more robust to adverse conditions compared to purely supervised learning.
Dual-Functional Artificial Noise (DFAN) Aided Robust Covert Communications in Integrated Sensing and Communications
Abstract
This paper investigates covert communications in an integrated sensing and communications system, where a dual-functional base station (called Alice) covertly transmits signals to a covert user (called Bob) while sensing multiple targets, with one of them acting as a potential watcher (called Willie) and maliciously eavesdropping on legitimate communications. To shelter the covert communications, Alice transmits additional dual-functional artificial noise (DFAN) with a varying power not only to create uncertainty at Willie's signal reception to confuse Willie but also to sense the targets simultaneously. Based on this framework, the weighted sum of the sensing beampattern mean square error (MSE) and cross correlation is minimized by jointly optimizing the covert communication and DFAN signals subject to the minimum covert rate requirement. The robust design considers both cases of imperfect Willie's CSI (WCSI) and statistical WCSI. Under the worst-case assumption that Willie can adaptively adjust the detection threshold to achieve the best detection performance, the minimum detection error probability (DEP) at Willie is analytically derived in closed form. The formulated covertness-constrained optimization problems are tackled by a feasibility-checking based difference-of-convex relaxation (DC) algorithm utilizing the S-procedure, Bernstein-type inequality, and the DC method. Simulation results validate the feasibility of the proposed scheme and demonstrate the covertness performance gains achieved by our proposed design over various benchmarks.
Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection
Authors: Huan Liu, Zichang Tan, Chuangchuang Tan, Yunchao Wei, Yao Zhao, Jingdong Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this paper, we study the problem of generalizable synthetic image detection, aiming to detect forgery images from diverse generative methods, e.g., GANs and diffusion models. Cutting-edge solutions have started to explore the benefits of pre-trained models, and mainly follow the fixed paradigm of solely training an attached classifier, e.g., combining frozen CLIP-ViT with a learnable linear layer in UniFD. However, our analysis shows that such a fixed paradigm is prone to yield detectors with insufficient learning regarding forgery representations. We attribute the key challenge to the lack of forgery adaptation, and present a novel forgery-aware adaptive transformer approach, namely FatFormer. Based on the pre-trained vision-language spaces of CLIP, FatFormer introduces two core designs for the adaptation to build generalized forgery representations. First, motivated by the fact that both image and frequency analysis are essential for synthetic image detection, we develop a forgery-aware adapter to adapt image features to discern and integrate local forgery traces within image and frequency domains. Second, we find that considering the contrastive objectives between adapted image features and text prompt embeddings, a previously overlooked aspect, results in a nontrivial generalization improvement. Accordingly, we introduce language-guided alignment to supervise the forgery adaptation with image and text prompts in FatFormer. Experiments show that, by coupling these two designs, our approach tuned on 4-class ProGAN data attains a remarkable detection performance, achieving an average of 98% accuracy on unseen GANs, and surprisingly generalizes to unseen diffusion models with 95% accuracy.
Landslide Detection and Segmentation Using Remote Sensing Images and Deep Neural Network
Authors: Cam Le, Lam Pham, Jasmin Lampert, Matthias Schlögl, Alexander Schindler
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Abstract
Knowledge about historic landslide event occurrence is important for supporting disaster risk reduction strategies. Building upon findings from the 2022 Landslide4Sense Competition, we propose a deep neural network based system for landslide detection and segmentation from multisource remote sensing image input. We use a U-Net trained with Cross Entropy loss as the baseline model. We then improve the U-Net baseline model by leveraging a wide range of deep learning techniques. In particular, we conduct feature engineering by generating new band data from the original bands, which helps to enhance the quality of remote sensing image input. Regarding the network architecture, we replace traditional convolutional layers in the U-Net baseline with residual-convolutional layers. We also propose an attention layer which leverages the multi-head attention scheme. Additionally, we generate multiple output masks with three different resolutions, which creates an ensemble of three outputs in the inference process to enhance the performance. Finally, we propose a combined loss function which leverages Focal loss and IoU loss to train the network. Our experiments on the development set of the Landslide4Sense challenge achieve an F1 score and an mIoU score of 84.07 and 76.07, respectively. Our best model setup outperforms the challenge baseline and the proposed U-Net baseline, improving the F1 score/mIoU score by 6.8/7.4 and 10.5/8.8, respectively.
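A combined Focal and IoU loss for binary segmentation can be sketched as below. The abstract does not state how the two terms are weighted or parameterized, so the weighted sum, the weights `focal_weight` and `iou_weight`, the focusing parameter `gamma`, and the soft (probability-based) IoU are assumptions made for illustration.

```python
import math


def focal_loss(probs, targets, gamma=2.0, eps=1e-7):
    """Mean binary focal loss over flattened pixel probabilities."""
    total = 0.0
    for p, y in zip(probs, targets):
        # Probability assigned to the true class, clipped for stability.
        p_t = p if y == 1 else 1.0 - p
        p_t = min(max(p_t, eps), 1.0 - eps)
        # Easy pixels (p_t near 1) are down-weighted by (1 - p_t)^gamma.
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)


def soft_iou_loss(probs, targets, eps=1e-7):
    """One minus a differentiable (soft) intersection over union."""
    intersection = sum(p * y for p, y in zip(probs, targets))
    union = sum(p + y - p * y for p, y in zip(probs, targets))
    return 1.0 - (intersection + eps) / (union + eps)


def combined_loss(probs, targets, focal_weight=1.0, iou_weight=1.0):
    """Weighted sum of the two terms (weights are hypothetical)."""
    return (focal_weight * focal_loss(probs, targets)
            + iou_weight * soft_iou_loss(probs, targets))


# Four pixels: the first and third belong to the landslide class.
mask = [1, 0, 1, 0]
good_prediction = [0.9, 0.1, 0.8, 0.2]
bad_prediction = [0.1, 0.9, 0.2, 0.8]

loss_good = combined_loss(good_prediction, mask)
loss_bad = combined_loss(bad_prediction, mask)
```

The focal term concentrates the gradient on hard, misclassified pixels, while the IoU term scores the overlap of the predicted mask as a whole, which is one common motivation for pairing the two when the positive class is rare.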
A pipeline for multiple orange detection and tracking with 3-D fruit relocalization and neural-net based yield regression in commercial citrus orchards
Authors: Thiago T. Santos, Kleber X. S. de Souza, João Camargo Neto, Luciano V. Koenigkan, Alécio S. Moreira, Sônia Ternes
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Traditionally, sweet orange crop forecasting has involved manually counting fruits from numerous trees, which is a labor-intensive process. Automatic systems for fruit counting, based on proximal imaging, computer vision, and machine learning, have been considered a promising alternative or complement to manual counting. These systems require data association components that prevent multiple counting of the same fruit observed in different images. However, there is a lack of work evaluating the accuracy of multiple fruit counting, especially considering (i) occluded and re-entering green fruits on leafy trees, and (ii) counting ground-truth data measured in the crop field. We propose a non-invasive alternative that utilizes fruit counting from videos, implemented as a pipeline. Firstly, we employ CNNs for the detection of visible fruits. Inter-frame association techniques are then applied to track the fruits across frames. To handle occluded and re-appeared fruit, we introduce a relocalization component that employs 3-D estimation of fruit locations. Finally, a neural network regressor is utilized to estimate the total number of fruit, integrating image-based fruit counting with other tree data such as crop variety and tree size. The results demonstrate that the performance of our approach is closely tied to the quality of the field-collected videos. By ensuring that at least 30% of the fruit is accurately detected, tracked, and counted, our yield regressor achieves an impressive coefficient of determination of 0.85. To the best of our knowledge, this study represents one of the few endeavors in fruit estimation that incorporates manual fruit counting as a reference point for evaluation. We also introduce annotated datasets for multiple orange tracking (MOrangeT) and detection (OranDet), publicly available to foster the development of novel methods for image-based fruit counting.
Bayesian Sensor Placement for Multi-source Localization of Pathogens in Wastewater Networks
Authors: Kalvik Jakkala, Srinivas Akella
Subjects: Social and Information Networks (cs.SI); Computational Engineering, Finance, and Science (cs.CE); Physics and Society (physics.soc-ph)
Abstract
Wastewater monitoring is an effective approach for the early detection of viral and bacterial disease outbreaks. It has recently been used to identify the presence of individuals infected with COVID-19. To monitor large communities and accurately localize buildings with infected individuals with a limited number of sensors, one must carefully choose the sampling locations in wastewater networks. We also have to account for concentration requirements on the collected wastewater samples to ensure reliable virus presence test results. We model this as a sensor placement problem. Although sensor placement for source localization arises in numerous problems, most approaches use application-specific heuristics and fail to consider multiple source scenarios. To address these limitations, we develop a novel approach that combines Bayesian networks and discrete optimization to efficiently identify informative sensor placements and accurately localize virus sources. Our approach also takes into account concentration requirements on wastewater samples during optimization. Our simulation experiments demonstrate the quality of our sensor placements and the accuracy of our source localization approach. Furthermore, we show the robustness of our approach to discrepancies between the virus outbreak model and the actual outbreak rates.
Graph Neural Networks for Antisocial Behavior Detection on Twitter
Authors: Martina Toshevska, Slobodan Kalajdziski, Sonja Gievska
Abstract
Social media resurgence of antisocial behavior has exerted a downward spiral on stereotypical beliefs, and hateful comments towards individuals and social groups, as well as false or distorted news. The advances in graph neural networks employed on massive quantities of graph-structured data raise high hopes for the future of mediating communication on social media platforms. An approach based on graph convolutional data was employed to better capture the dependencies between the heterogeneous types of data. Utilizing past and present experiences on the topic, we proposed and evaluated a graph-based approach for antisocial behavior detection, with general applicability that is both language- and context-independent. In this research, we carried out an experimental validation of our graph-based approach on several PAN datasets provided as part of their shared tasks, that enable the discussion of the results obtained by the proposed solution.
Study of Adaptive LLR-based AP selection for Grant-Free Random Access in Cell-Free Networks
Authors: R. Di Renna, R. C. de Lamare
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
This paper presents an iterative detection and decoding scheme along with an adaptive strategy to improve the selection of access points (APs) in a grant-free uplink cell-free scenario. With the requirement for the APs to have low-computational power in mind, we introduce a low-complexity scheme for local activity and data detection. At the central processing unit (CPU) level, we propose an adaptive technique based on local log-likelihood ratios (LLRs) to select the list of APs that should be considered for each device. Simulation results show that the proposed LLRs-based APs selection scheme outperforms the existing techniques in the literature in terms of bit error rate (BER) while requiring comparable fronthaul load.
Temporal Knowledge Distillation for Time-Sensitive Financial Services Applications
Abstract
Detecting anomalies has become an increasingly critical function in the financial service industry. Anomaly detection is frequently used in key compliance and risk functions such as financial crime detection, fraud, and cybersecurity. The dynamic nature of the underlying data patterns, especially in adversarial environments like fraud detection, poses serious challenges to machine learning models. Keeping up with the rapid changes by retraining the models with the latest data patterns introduces pressures in balancing the historical and current patterns while managing the training data size. Furthermore, the model retraining times raise problems in time-sensitive and high-volume deployment systems, where the retraining period directly impacts the model's ability to respond to ongoing attacks in a timely manner. In this study, we propose a temporal knowledge distillation-based label augmentation approach (TKD), which utilizes the learning from older models to rapidly boost the latest model and effectively reduces the model retraining times to achieve improved agility. Experimental results show that the proposed approach provides advantages in retraining times while improving the model performance.
Review of Machine Learning Approaches for Diagnostics and Prognostics of Industrial Systems Using Industrial Open Source Data
Abstract
In the field of Prognostics and Health Management (PHM), recent years have witnessed a significant surge in the application of machine learning (ML). Despite this growth, the field lacks unified guidelines and systematic approaches for effectively implementing these ML techniques, as well as comprehensive analyses of industrial open-source data across varied scenarios. To address these gaps, this paper provides a comprehensive review of machine learning approaches for diagnostics and prognostics of industrial systems using open-source datasets from PHM Data Challenge Competitions held between 2018 and 2023 by the PHM Society and the IEEE Reliability Society, and summarizes a unified ML framework. This review systematically categorizes and scrutinizes the problems, challenges, methodologies, and advancements demonstrated in these competitions, highlighting the evolving role of both conventional machine learning and deep learning in tackling complex industrial tasks related to detection, diagnosis, assessment, and prognosis. Moreover, this paper delves into the common challenges in PHM data challenge competitions by emphasizing both data-related and model-related issues and summarizes the solutions that have been employed to address these challenges. Finally, we identify key themes and potential directions for future research, providing opportunities and prospects for the further development of ML in PHM.
METER: A Dynamic Concept Adaptation Framework for Online Anomaly Detection
Abstract
Real-time analytics and decision-making require online anomaly detection (OAD) to handle drifts in data streams efficiently and effectively. Unfortunately, existing approaches are often constrained by their limited detection capacity and slow adaptation to evolving data streams, inhibiting their efficacy and efficiency in handling concept drift, which is a major challenge in evolving data streams. In this paper, we introduce METER, a novel dynamic concept adaptation framework that introduces a new paradigm for OAD. METER addresses concept drift by first training a base detection model on historical data to capture recurring central concepts, and then learning to dynamically adapt to new concepts in data streams upon detecting concept drift. Particularly, METER employs a novel dynamic concept adaptation technique that leverages a hypernetwork to dynamically generate the parameter shift of the base detection model, providing a more effective and efficient solution than conventional retraining or fine-tuning approaches. Further, METER incorporates a lightweight drift detection controller, underpinned by evidential deep learning, to support robust and interpretable concept drift detection. We conduct an extensive experimental evaluation, and the results show that METER significantly outperforms existing OAD approaches in various application scenarios.
Sensor Data Simulation for Anomaly Detection of the Elderly Living Alone
Authors: Kai Tanaka, Mineichi Kudo, Keigo Kimura
Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC); Signal Processing (eess.SP)
Abstract
With the number of elderly people living alone increasing around the world, there is a growing demand for sensor-based detection of anomalous behaviors. Although smart homes with ambient sensors could be useful for detecting such anomalies, there is not enough real data for developing detection algorithms. To cope with this problem, several sensor data simulators have been proposed, but they have not been able to appropriately model the long-term transitions and correlations between anomalies that exist in reality. In this paper, therefore, we propose a novel sensor data simulator that can model these factors when generating sensor data. The anomalies considered in this study are classified into three types: state anomalies, activity anomalies, and moving anomalies. The simulator produces 10 years of data in 100 minutes, including six anomalies, two of each type. Numerical evaluations show that this simulator is superior to past simulators in the sense that it simulates the day-to-day variations of real data well.
Chaurah: A Smart Raspberry Pi based Parking System
Abstract
The widespread usage of cars and other large, heavy vehicles necessitates the development of an effective parking infrastructure. Additionally, algorithms for detection and recognition of number plates are often used to identify automobiles all around the world where standardized plate sizes and fonts are enforced, making recognition an effortless task. As a result, both kinds of data can be combined to develop an intelligent parking system that focuses on the technology of Automatic Number Plate Recognition (ANPR). Retrieving characters from an input number plate image is the sole purpose of ANPR, which is a costly procedure. In this article, we propose Chaurah, a minimal-cost ANPR system, built on a Raspberry Pi 3, that was created specifically for parking facilities. The system employs a dual-stage methodology, with the first stage being an ANPR system that makes use of two convolutional neural networks (CNNs). The first network locates and recognises license plates in a vehicle image, while the second performs Optical Character Recognition (OCR) to identify individual numbers on the number plate. An application built with Flutter and Firebase for database administration and license plate record comparison makes up the second component of the overall solution. The application also acts as a user interface for the billing mechanism based on parking duration, resulting in an all-encompassing software deployment of the study.
DOEPatch: Dynamically Optimized Ensemble Model for Adversarial Patches Generation
Authors: Wenyi Tan, Yang Li, Chenxing Zhao, Zhunga Liu, Quan Pan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Object detection is a fundamental task in various applications ranging from autonomous driving to intelligent security systems. However, recognition of a person can be hindered when their clothing is decorated with carefully designed graffiti patterns, leading to the failure of object detection. To achieve greater attack potential against unknown black-box models, adversarial patches capable of affecting the outputs of multiple object detection models are required. While ensemble models have proven effective, current research in the field of object detection typically focuses on the simple fusion of the outputs of all models, with limited attention being given to developing general adversarial patches that can function effectively in the physical world. In this paper, we introduce the concept of energy and treat the adversarial patch generation process as an optimization of the adversarial patches to minimize the total energy of the "person" category. Additionally, by adopting adversarial training, we construct a dynamically optimized ensemble model. During training, the weight parameters of the attacked target models are adjusted to find the balance point at which the generated adversarial patches can effectively attack all target models. We carried out six sets of comparative experiments and tested our algorithm on five mainstream object detection models. The adversarial patches generated by our algorithm can reduce the recognition accuracy of YOLOv2 and YOLOv3 to 13.19% and 29.20%, respectively. In addition, we conducted experiments to test the effectiveness of T-shirts covered with our adversarial patches in the physical world and found that they could prevent people from being recognized by the object detection model. Finally, leveraging the Grad-CAM tool, we explored the attack mechanism of adversarial patches from an energetic perspective.
DeLR: Active Learning for Detection with Decoupled Localization and Recognition Query
Authors: Yuhang Zhang, Yuang Deng, Xiaopeng Zhang, Jie Li, Robert C. Qiu, Qi Tian
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Active learning has been shown to be effective in reducing labeling cost. However, most progress has been made in image recognition, and instance-level active learning for object detection is still lacking. In this paper, we rethink two key components of object detection, i.e., localization and recognition, and find that their correctness is highly correlated; therefore, it is not necessary to annotate both boxes and classes if we are given pseudo annotations provided by the trained model. Motivated by this, we propose an efficient query strategy, termed DeLR, that decouples localization and recognition for active queries. In this way, we can likely skip class annotations when the localization is correct, and can allocate the labeling budget to more informative samples. DeLR differs from previous methods in two main ways: 1) unlike previous methods, which mostly focus on image-level annotation, where the queried samples are selected and exhaustively annotated, DeLR queries at the region level and annotates only the queried object regions; 2) instead of directly providing both localization and recognition annotations, we query the two components separately, and thus reduce the recognition budget by using the pseudo class labels provided by the model. Experiments on several benchmarks demonstrate its superiority. We hope our proposed query strategy will shed light on research in active learning for object detection.
EvPlug: Learn a Plug-and-Play Module for Event and Image Fusion
Authors: Jianping Jiang, Xinyu Zhou, Peiqi Duan, Boxin Shi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Event cameras and RGB cameras exhibit complementary characteristics in imaging: the former possesses high dynamic range (HDR) and high temporal resolution, while the latter provides rich texture and color information. This makes the integration of event cameras into middle- and high-level RGB-based vision tasks highly promising. However, challenges arise in multi-modal fusion, data annotation, and model architecture design. In this paper, we propose EvPlug, which learns a plug-and-play event and image fusion module from the supervision of the existing RGB-based model. The learned fusion module integrates event streams with image features in the form of a plug-in, making the RGB-based model robust to HDR and fast-motion scenes while enabling high temporal resolution inference. Our method only requires unlabeled event-image pairs (no pixel-wise alignment required) and does not alter the structure or weights of the RGB-based model. We demonstrate the superiority of EvPlug in several vision tasks such as object detection, semantic segmentation, and 3D hand pose estimation.
SAR-Net: Multi-scale Direction-aware SAR Network via Global Information Fusion
Authors: Mingxiang Cao, Jie Lei, Weiying Xie, Jiaqing Zhang, Daixun Li, Yunsong Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Deep learning has driven significant progress in object detection using Synthetic Aperture Radar (SAR) imagery. Existing methods, while achieving promising results, often struggle to effectively integrate local and global information, particularly direction-aware features. This paper proposes SAR-Net, a novel framework specifically designed for global fusion of direction-aware information in SAR object detection. SAR-Net leverages two key innovations: the Unity Compensation Mechanism (UCM) and the Direction-aware Attention Module (DAM). UCM facilitates the establishment of complementary relationships among features across different scales, enabling efficient global information fusion. Among them, Multi-scale Alignment Module (MAM) and distinct Multi-level Fusion Module (MFM) enhance feature integration by capturing both texture detail and semantic information. Then, Multi-feature Embedding Module (MEM) feeds back global features into the primary branches, further improving information transmission. Additionally, DAM, through bidirectional attention polymerization, captures direction-aware information, effectively eliminating background interference. Extensive experiments demonstrate the effectiveness of SAR-Net, achieving state-of-the-art results on aircraft (SAR-AIRcraft-1.0) and ship datasets (SSDD, HRSID), confirming its generalization capability and robustness.
Reinforcement-based Display-size Selection for Frugal Satellite Image Change Detection
Authors: Hichem Sahbi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
We introduce a novel interactive satellite image change detection algorithm based on active learning. The proposed method is iterative and consists of frugally querying the user (oracle) for the labels of the most critical images and updating the change detection results according to the oracle's annotations. First, we consider a probabilistic framework that assigns to each unlabeled sample a relevance measure modeling how critical that sample is when training change detection functions. We obtain these relevance measures by minimizing an objective function mixing diversity, representativity and uncertainty. These criteria, when combined, allow exploring different data modes and also refining change detections. Then, we further explore the potential of this objective function by considering a reinforcement learning approach that finds the best combination of diversity, representativity and uncertainty as well as display sizes through active learning iterations, leading to better generalization as shown through experiments in interactive satellite image change detection.
AI Powered Road Network Prediction with Multi-Modal Data
Abstract
This study presents an innovative approach for automatic road detection with deep learning, by employing fusion strategies for utilizing both lower-resolution satellite imagery and GPS trajectory data, a concept never explored before. We rigorously investigate both early and late fusion strategies, and assess deep learning based road detection performance using different fusion settings. Our extensive ablation studies assess the efficacy of our framework under diverse model architectures, loss functions, and geographic domains (Istanbul and Montreal). For an unbiased and complete evaluation of road detection results, we use both region-based and boundary-based evaluation metrics for road segmentation. The outcomes reveal that the ResUnet model outperforms U-Net and D-Linknet in road extraction tasks, achieving superior results over the benchmark study using low-resolution Sentinel-2 data. This research not only contributes to the field of automatic road detection but also offers novel insights into the utilization of data fusion methods in diverse applications.
Multi-Attention Fusion Drowsy Driving Detection Model
Abstract
Drowsy driving is a major contributor to traffic accidents, and the implementation of drowsy driving detection systems has been proven to significantly reduce the occurrence of such accidents. Despite the development of numerous drowsy driving detection algorithms, many of them impose specific prerequisites such as the availability of complete facial images, optimal lighting conditions, and the use of RGB images. In our study, we introduce a novel approach called the Multi-Attention Fusion Drowsy Driving Detection Model (MAF). MAF is aimed at significantly enhancing classification performance, especially in scenarios involving partial facial occlusion and low lighting conditions. It accomplishes this by capitalizing on the local feature extraction capabilities provided by multi-attention fusion, thereby enhancing the algorithm's overall robustness. To enhance our dataset, we collected real-world data that includes both occluded and unoccluded faces captured under nighttime and daytime lighting conditions. We conducted a comprehensive series of experiments using both publicly available datasets and our self-built data. The results of these experiments demonstrate that our proposed model achieves a driver drowsiness detection accuracy of 96.8%.
TSPP: A Unified Benchmarking Tool for Time-series Forecasting
Abstract
Recently, there has been increasing interest in developing and deploying deep graph learning algorithms for many tasks, such as fraud detection and recommender systems. However, there is a limited number of publicly available graph-structured datasets, most of which are tiny compared to production-sized applications or are limited in their application domain. This work tackles this shortcoming by proposing a scalable synthetic graph generation tool that scales the datasets to production-size graphs with trillions of edges and billions of nodes. The tool learns a series of parametric models from proprietary datasets that can be released to researchers to study various graph methods on the synthetic data, increasing prototype development and novel applications. We demonstrate the generalizability of the framework across a series of datasets, mimicking structural and feature distributions, as well as the ability to scale them across varying sizes, demonstrating their usefulness for benchmarking and model development.
Geometry-Biased Transformer for Robust Multi-View 3D Human Pose Reconstruction
Authors: Olivier Moliner, Sangxia Huang, Kalle Åström
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
We address the challenges in estimating 3D human poses from multiple views under occlusion and with limited overlapping views. We approach multi-view, single-person 3D human pose reconstruction as a regression problem and propose a novel encoder-decoder Transformer architecture to estimate 3D poses from multi-view 2D pose sequences. The encoder refines 2D skeleton joints detected across different views and times, fusing multi-view and temporal information through global self-attention. We enhance the encoder by incorporating a geometry-biased attention mechanism, effectively leveraging geometric relationships between views. Additionally, we use detection scores provided by the 2D pose detector to further guide the encoder's attention based on the reliability of the 2D detections. The decoder subsequently regresses the 3D pose sequence from these refined tokens, using pre-defined queries for each joint. To enhance the generalization of our method to unseen scenes and improve resilience to missing joints, we implement strategies including scene centering, synthetic views, and token dropout. We conduct extensive experiments on three benchmark public datasets, Human3.6M, CMU Panoptic and Occlusion-Persons. Our results demonstrate the efficacy of our approach, particularly in occluded scenes and when few views are available, which are traditionally challenging scenarios for triangulation-based methods.
Generative AI for Math: Part I -- MathPile: A Billion-Token-Scale Pretraining Corpus for Math
Authors: Zengzhi Wang, Rui Xia, Pengfei Liu
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
High-quality, large-scale corpora are the cornerstone of building foundation models. In this work, we introduce MathPile, a diverse and high-quality math-centric corpus comprising about 9.5 billion tokens. Throughout its creation, we adhered to the principle of "less is more", firmly believing in the supremacy of data quality over quantity, even in the pre-training phase. Our meticulous data collection and processing efforts included a complex suite of preprocessing, prefiltering, language identification, cleaning, filtering, and deduplication, ensuring the high quality of our corpus. Furthermore, we performed data contamination detection on downstream benchmark test sets to eliminate duplicates. We hope our MathPile can help to enhance the mathematical reasoning abilities of language models. We plan to open-source different versions of MathPile with the scripts used for processing, to facilitate future developments in this field.
FENet: Focusing Enhanced Network for Lane Detection
Authors: Liman Wang, Hanyang Zhong
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Inspired by human driving focus, this research pioneers networks augmented with Focusing Sampling, Partial Field of View Evaluation, Enhanced FPN architecture and Directional IoU Loss - targeted innovations addressing obstacles to precise lane detection for autonomous driving. Experiments demonstrate our Focusing Sampling strategy, emphasizing vital distant details unlike uniform approaches, significantly boosts both benchmark and practical curved/distant lane recognition accuracy essential for safety. While FENetV1 achieves state-of-the-art conventional metric performance via enhancements isolating perspective-aware contexts mimicking driver vision, FENetV2 proves most reliable on the proposed Partial Field analysis. Hence we specifically recommend V2 for practical lane navigation despite fractional degradation on standard entire-image measures. Future directions include collecting on-road data and integrating complementary dual frameworks to further breakthroughs guided by human perception principles. Code will be made available.
Do Androids Know They're Only Dreaming of Electric Sheep?
Authors: Sky CH-Wang, Benjamin Van Durme, Jason Eisner, Chris Kedzie
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
We design probes trained on the internal representations of a transformer language model that are predictive of its hallucinatory behavior on in-context generation tasks. To facilitate this detection, we create a span-annotated dataset of organic and synthetic hallucinations over several tasks. We find that probes trained on the force-decoded states of synthetic hallucinations are generally ecologically invalid in organic hallucination detection. Furthermore, hidden state information about hallucination appears to be task and distribution-dependent. Intrinsic and extrinsic hallucination saliency varies across layers, hidden state types, and tasks; notably, extrinsic hallucinations tend to be more salient in a transformer's internal representations. Outperforming multiple contemporary baselines, we show that probing is a feasible and efficient alternative to language model hallucination evaluation when model states are available.
Keyword: face recognition
EFHQ: Multi-purpose ExtremePose-Face-HQ dataset
Authors: Trung Tuan Dao, Duc Hong Vu, Cuong Pham, Anh Tran
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The existing facial datasets, while having plentiful images at near-frontal views, lack images with extreme head poses, leading to degraded performance of deep learning models when dealing with profile or pitched faces. This work aims to address this gap by introducing a novel dataset named Extreme Pose Face High-Quality Dataset (EFHQ), which includes a maximum of 450k high-quality images of faces at extreme poses. To produce such a massive dataset, we utilize a novel and meticulous dataset processing pipeline to curate two publicly available datasets, VFHQ and CelebV-HQ, which contain many high-resolution face videos captured in various settings. Our dataset can complement existing datasets on various face-related tasks, such as facial synthesis with 2D/3D-aware GAN, diffusion-based text-to-image face generation, and face reenactment. Specifically, training with EFHQ helps models generalize well across diverse poses, significantly improving performance in scenarios involving extreme views, as confirmed by extensive experiments. Additionally, we utilize EFHQ to define a challenging cross-view face verification benchmark, in which the performance of SOTA face recognition models drops 5-37% compared to frontal-to-frontal scenarios, aiming to stimulate studies on face recognition under severe pose conditions in the wild.
Keyword: augmentation
LightGCN: Evaluated and Enhanced
Authors: Milena Kapralova, Luca Pantea, Andrei Blahovici
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract
This paper analyses LightGCN in the context of graph recommendation algorithms. Despite the initial design of Graph Convolutional Networks for graph classification, the non-linear operations are not always essential. LightGCN enables linear propagation of embeddings, enhancing performance. We reproduce the original findings, assess LightGCN's robustness on diverse datasets and metrics, and explore Graph Diffusion as an augmentation of signal propagation in LightGCN.
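The linear propagation mentioned in the abstract above can be sketched as follows. This is a minimal illustration, assuming the commonly cited LightGCN formulation (symmetrically normalized neighbor aggregation with no weight matrices or nonlinearities, followed by a uniform mean over layers); the toy graph, the function names, and the embedding values are hypothetical and are not taken from the paper or its code.

```python
import math


def normalized_adjacency(edges, num_nodes):
    """Build the symmetrically normalized adjacency D^-1/2 A D^-1/2 as a dense matrix."""
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        adj[u][v] = 1.0
        adj[v][u] = 1.0
    degree = [sum(row) for row in adj]
    norm = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        for j in range(num_nodes):
            if adj[i][j] > 0:
                norm[i][j] = 1.0 / math.sqrt(degree[i] * degree[j])
    return norm


def propagate(norm_adj, embeddings, num_layers):
    """Linear propagation: no weights, no activation; average embeddings over all layers."""
    n, dim = len(embeddings), len(embeddings[0])
    layers = [embeddings]
    current = embeddings
    for _ in range(num_layers):
        current = [
            [sum(norm_adj[i][k] * current[k][d] for k in range(n)) for d in range(dim)]
            for i in range(n)
        ]
        layers.append(current)
    # Final embedding is the uniform mean of layer 0 .. num_layers.
    return [
        [sum(layer[i][d] for layer in layers) / len(layers) for d in range(dim)]
        for i in range(n)
    ]


# Toy user-item graph: users are nodes 0-1, items are nodes 2-3 (hypothetical data).
edges = [(0, 2), (0, 3), (1, 3)]
initial = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
final = propagate(normalized_adjacency(edges, 4), initial, num_layers=2)
```

Because every step is a matrix product, scaling the input embeddings scales the output by the same factor, which is the sense in which the propagation is linear.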
Revisiting Knowledge Distillation under Distribution Shift
Abstract
Knowledge distillation transfers knowledge from large models into small models, and has recently made remarkable achievements. However, few studies have investigated the mechanism of knowledge distillation against distribution shift. Distribution shift refers to drift in the data distribution between the training and testing phases. In this paper, we reconsider the paradigm of knowledge distillation by reformulating the objective function in shift situations. Under real-world scenarios, we propose a unified and systematic framework to benchmark knowledge distillation against two general distributional shifts, namely diversity shift and correlation shift. The evaluation benchmark covers more than 30 methods from algorithmic, data-driven, and optimization perspectives on five benchmark datasets. Overall, we conduct extensive experiments on the student model. We reveal intriguing observations of poor teaching performance under distribution shifts; in particular, complex algorithms and data augmentation offer limited gains in many cases.
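For readers unfamiliar with the baseline objective, the sketch below shows the classic temperature-softened distillation loss (a hard-label cross-entropy term plus a KL term between softened teacher and student distributions). It is an assumption-labeled illustration of standard knowledge distillation only; the abstract does not give the paper's reformulated objective under shift, and the function names and default values of `temperature` and `alpha` here are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, temperature=2.0, alpha=0.5):
    """Classic KD loss: alpha * CE(label, student) + (1 - alpha) * T^2 * KL(teacher || student)."""
    # Hard-label term: cross-entropy of the student at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    # Soft-label term: KL divergence between temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    # The T^2 factor keeps the soft-term gradient scale comparable across temperatures.
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft


# Example call with made-up logits for a 3-class problem.
example = distillation_loss([1.0, 0.2, -0.5], [2.0, 0.5, -1.0], label=0)
```

When the student's logits equal the teacher's, the soft term vanishes and only the hard-label term remains.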
Are All Unseen Data Out-of-Distribution?
Authors: Songming Zhang, Yuxiao Luo, Qizhou Wang, Haoang Chi, Weikai Li, Bo Han, Jinyan Li
Abstract
The distributions of unseen data have all been treated as out-of-distribution (OOD), making their generalization a significant challenge. Much evidence suggests that increasing the size of the training data monotonically decreases generalization error on test data. However, other observations and analyses show that this does not always hold. In particular, when the training data come from multiple source domains and the test data contain distribution drifts, not all generalization errors on the test data decrease monotonically as the training data grow. Such a non-decreasing phenomenon is formally investigated under a linear setting with empirical verification across varying visual benchmarks. Motivated by these results, we redefine OOD data as data outside the convex hull of the training domains and prove a new generalization bound based on this new definition. It implies that the effectiveness of a well-trained model can be guaranteed for unseen data that lie within the convex hull of the training domains. For some data beyond the convex hull, however, a non-decreasing error trend can occur. Therefore, we investigate the performance of popular strategies such as data augmentation and pre-training to overcome this issue. Moreover, we propose a novel reinforcement learning selection algorithm that operates in the source domains only and can deliver superior performance over the baseline methods.
Domain Generalization with Vital Phase Augmentation
Abstract
Deep neural networks have shown remarkable performance in image classification. However, their performance significantly deteriorates with corrupted input data. Domain generalization methods have been proposed to train models that are robust against out-of-distribution data. Data augmentation in the frequency domain is one such approach; it enables a model to learn phase features and thereby establish domain-invariant representations. This approach changes the amplitudes of the input data while preserving the phases. However, using fixed phases leads to susceptibility to phase fluctuations, because both amplitude and phase fluctuations commonly occur in out-of-distribution data. In this study, to address this problem, we introduce an approach that applies a finite variation to the phases of the input data rather than keeping the phases fixed. Based on the assumption that the degree of domain-invariant features varies for each phase, we propose a method to distinguish phases based on this degree. In addition, we propose a method called vital phase augmentation (VIPAug) that applies the variation to the phases differently according to the degree of domain-invariant features of the given phases. The model depends more on the vital phases, which contain more domain-invariant features, to attain robustness to amplitude and phase fluctuations. We present experimental evaluations of our proposed approach, which exhibited improved performance for both clean and corrupted data. VIPAug achieved SOTA performance on the benchmark CIFAR-10 and CIFAR-100 datasets, as well as near-SOTA performance on the ImageNet-100 and ImageNet datasets. Our code is available at https://github.com/excitedkid/vipaug.
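The fixed-phase baseline that the abstract contrasts against (changing amplitudes while preserving phases) can be sketched on a 1-D signal as below. This is not VIPAug and not the authors' code: it is a hypothetical, dependency-free illustration that uses a naive discrete Fourier transform, and the function names, the `strength` parameter, and the choice of uniform random scaling are all assumptions made for the example.

```python
import cmath
import math
import random


def dft(signal):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive O(n^2) inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def amplitude_augment(signal, strength=0.5, seed=0):
    """Randomly rescale Fourier amplitudes of a real signal while keeping every phase fixed.

    Assumes 0 <= strength < 1 so that all scale factors stay positive.
    """
    rng = random.Random(seed)
    n = len(signal)
    spectrum = dft(signal)
    # Use the same positive factor for bins k and n-k so the result stays real-valued.
    factors = [1.0] * n
    for k in range(n // 2 + 1):
        factors[k] = factors[(n - k) % n] = 1.0 + strength * rng.uniform(-1.0, 1.0)
    augmented = []
    for coeff, factor in zip(spectrum, factors):
        amplitude, phase = cmath.polar(coeff)
        augmented.append(cmath.rect(amplitude * factor, phase))
    # The imaginary parts are rounding noise because the spectrum is conjugate-symmetric.
    return [value.real for value in idft(augmented)]


signal = [0.0, 1.0, 0.5, -0.3, 0.8, -1.0, 0.2, 0.4]
augmented_signal = amplitude_augment(signal)
```

An approach that also varies phases, as the abstract proposes, would perturb the `phase` value before calling `cmath.rect`; how strongly to perturb each phase is exactly what the paper's method decides and is not reproduced here.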
S2M: Converting Single-Turn to Multi-Turn Datasets for Conversational Question Answering
Authors: Baokui Li, Sen Zhang, Wangshu Zhang, Yicheng Chen, Changlin Yang, Sen Hu, Teng Xu, Siye liu, Jiwei Li
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
Supplying data augmentation to conversational question answering (CQA) can effectively improve model performance. However, single-turn datasets yield less improvement in CQA because of the distribution gap between single-turn and multi-turn datasets. Moreover, although numerous single-turn datasets are available, they have not been utilized effectively. To solve this problem, we propose a novel method to convert single-turn datasets to multi-turn datasets. The proposed method consists of three parts, namely, a QA pair Generator, a QA pair Reassembler, and a question Rewriter. Given a sample consisting of context and single-turn QA pairs, the Generator obtains candidate QA pairs and a knowledge graph based on the context. The Reassembler utilizes the knowledge graph to get sequential QA pairs, and the Rewriter rewrites questions from a conversational perspective to obtain a multi-turn dataset S2M. Our experiments show that our method can synthesize effective training resources for CQA. Notably, S2M ranked first on the QuAC leaderboard at the time of submission (Aug 24th, 2022).
RDGCL: Reaction-Diffusion Graph Contrastive Learning for Recommendation
Abstract
Contrastive learning (CL) has emerged as a promising technique for improving recommender systems, addressing the challenge of data sparsity by leveraging self-supervised signals from raw data. Integration of CL with graph convolutional network (GCN)-based collaborative filtering (CF) has been explored in recommender systems. However, current CL-based recommendation models rely heavily on low-pass filters and graph augmentations. In this paper, we propose a novel CL method for recommender systems called the reaction-diffusion graph contrastive learning model (RDGCL). We design our own GCN for CF based on both the diffusion equation, i.e., a low-pass filter, and the reaction equation, i.e., a high-pass filter. Our proposed CL-based training occurs between reaction-based and diffusion-based embeddings, so there is no need for graph augmentations. Experimental evaluation on 6 benchmark datasets demonstrates that our proposed method outperforms state-of-the-art CL-based recommendation models. By enhancing recommendation accuracy and diversity, our method brings an advancement in CL for recommender systems.
GRSDet: Learning to Generate Local Reverse Samples for Few-shot Object Detection
Abstract
Few-shot object detection (FSOD) aims to detect objects of novel classes using only a few training samples. Most existing methods adopt a transfer-learning strategy to construct the novel class distribution by transferring the base class knowledge. However, this direct approach easily results in confusion between the novel class and other similar categories in the decision space. To address the problem, we propose generating local reverse samples (LRSamples) in Prototype Reference Frames to adaptively adjust the center position and boundary range of the novel class distribution to learn more discriminative novel class samples for FSOD. Firstly, we propose a Center Calibration Variance Augmentation (CCVA) module, which contains the selection rule of LRSamples, the generator of LRSamples, and augmentation on the calibrated distribution centers. Specifically, we design an intra-class feature converter (IFC) as the generator of CCVA to learn the selection rule. By transferring the knowledge of IFC from the base training to fine-tuning, the IFC generates plentiful novel samples to calibrate the novel class distribution. Moreover, we propose a Feature Density Boundary Optimization (FDBO) module to adaptively adjust the importance of samples depending on their distance from the decision boundary. It can emphasize the importance of the high-density area of the similar class (closer decision boundary area) and reduce the weight of the low-density area of the similar class (farther decision boundary area), thus optimizing a clearer decision boundary for each category. We conduct extensive experiments to demonstrate the effectiveness of our proposed method. Our method achieves consistent improvement on the Pascal VOC and MS COCO datasets based on DeFRCN and MFDC baselines.
Mitigating Degree Biases in Message Passing Mechanism by Utilizing Community Structures
Authors: Van Thuy Hoang, O-Joun Lee
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Abstract
This study utilizes community structures to address node degree biases in message-passing (MP) via learnable graph augmentations and novel graph transformers. Recent augmentation-based methods showed that MP neural networks often perform poorly on low-degree nodes, leading to degree biases due to a lack of messages reaching low-degree nodes. Despite their success, most methods use heuristic or uniform random augmentations, which are non-differentiable and may not always generate valuable edges for learning representations. In this paper, we propose Community-aware Graph Transformers, namely CGT, to learn degree-unbiased representations based on learnable augmentations and graph transformers by extracting within-community structures. We first design a learnable graph augmentation to generate more within-community edges connecting low-degree nodes through edge perturbation. Second, we propose an improved self-attention to learn underlying proximity and the roles of nodes within the community. Third, we propose a self-supervised learning task that could learn the representations to preserve the global graph structure and regularize the graph augmentations. Extensive experiments on various benchmark datasets showed that CGT outperforms state-of-the-art baselines and significantly reduces node degree biases. The source code is available at https://github.com/NSLab-CUK/Community-aware-Graph-Transformer.
Temporal Knowledge Distillation for Time-Sensitive Financial Services Applications
Abstract
Detecting anomalies has become an increasingly critical function in the financial service industry. Anomaly detection is frequently used in key compliance and risk functions such as financial crime detection, fraud, and cybersecurity. The dynamic nature of the underlying data patterns, especially in adversarial environments like fraud detection, poses serious challenges to the machine learning models. Keeping up with the rapid changes by retraining the models with the latest data patterns introduces pressures in balancing the historical and current patterns while managing the training data size. Furthermore, the model retraining times raise problems in time-sensitive and high-volume deployment systems, where the retraining period directly impacts the model's ability to respond to ongoing attacks in a timely manner. In this study, we propose a temporal knowledge distillation-based label augmentation approach (TKD), which utilizes the learning from older models to rapidly boost the latest model and effectively reduces the model retraining times to achieve improved agility. Experimental results show that the proposed approach provides advantages in retraining times while improving the model performance.
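One common way to realize distillation-based label augmentation is to blend the ground-truth labels with the scores of an older model, so that the newest model trains on soft targets. The sketch below illustrates that generic idea only. The abstract does not specify TKD's actual formulation, so the blending rule, the function name `augment_labels`, and the weight `alpha` are all assumptions.

```python
def augment_labels(hard_labels, old_model_scores, alpha=0.3):
    """Blend 0/1 labels with an older model's scores into soft targets.

    alpha is a hypothetical mixing weight: 0.0 keeps the hard labels
    unchanged, 1.0 uses only the older model's scores.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(hard_labels) != len(old_model_scores):
        raise ValueError("labels and scores must have the same length")
    return [(1.0 - alpha) * y + alpha * p
            for y, p in zip(hard_labels, old_model_scores)]


# Two transactions: one labelled anomalous (1), one labelled normal (0).
soft_targets = augment_labels([1, 0], [0.8, 0.2], alpha=0.5)
```

With `alpha=0.5`, the soft targets land halfway between each hard label and the older model's score, which here is about 0.9 for the anomalous sample and about 0.1 for the normal one.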
DiffKG: Knowledge Graph Diffusion Model for Recommendation
Abstract
Knowledge Graphs (KGs) have emerged as invaluable resources for enriching recommendation systems by providing a wealth of factual information and capturing semantic relationships among items. Leveraging KGs can significantly enhance recommendation performance. However, not all relations within a KG are equally relevant or beneficial for the target recommendation task. In fact, certain item-entity connections may introduce noise or lack informative value, thus potentially misleading our understanding of user preferences. To bridge this research gap, we propose a novel knowledge graph diffusion model for recommendation, referred to as DiffKG. Our framework integrates a generative diffusion model with a data augmentation paradigm, enabling robust knowledge graph representation learning. This integration facilitates a better alignment between knowledge-aware item semantics and collaborative relation modeling. Moreover, we introduce a collaborative knowledge graph convolution mechanism that incorporates collaborative signals reflecting user-item interaction patterns, guiding the knowledge graph diffusion process. We conduct extensive experiments on three publicly available datasets, consistently demonstrating the superiority of our DiffKG compared to various competitive baselines. We provide the source code repository of our proposed DiffKG model at the following link: https://github.com/HKUDS/DiffKG.
3DTINC: Time-Equivariant Non-Contrastive Learning for Predicting Disease Progression from Longitudinal OCTs
Authors: Taha Emre, Arunava Chakravarty, Antoine Rivail, Dmitrii Lachinov, Oliver Leingang, Sophie Riedl, Julia Mai, Hendrik P.N. Scholl, Sobha Sivaprasad, Daniel Rueckert, Andrew Lotery, Ursula Schmidt-Erfurth, Hrvoje Bogunović
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Self-supervised learning (SSL) has emerged as a powerful technique for improving the efficiency and effectiveness of deep learning models. Contrastive methods are a prominent family of SSL that extract similar representations of two augmented views of an image while pushing away others in the representation space as negatives. However, the state-of-the-art contrastive methods require large batch sizes and augmentations designed for natural images that are impractical for 3D medical images. To address these limitations, we propose a new longitudinal SSL method, 3DTINC, based on non-contrastive learning. It is designed to learn perturbation-invariant features for 3D optical coherence tomography (OCT) volumes, using augmentations specifically designed for OCT. We introduce a new non-contrastive similarity loss term that learns temporal information implicitly from intra-patient scans acquired at different times. Our experiments show that this temporal information is crucial for predicting progression of retinal diseases, such as age-related macular degeneration (AMD). After pretraining with 3DTINC, we evaluated the learned representations and the prognostic models on two large-scale longitudinal datasets of retinal OCTs, where we predict the conversion to wet-AMD within a six-month interval. Our results demonstrate that each component of our contributions is crucial for learning meaningful representations useful in predicting disease progression from longitudinal volumetric scans.
Generalizable Visual Reinforcement Learning with Segment Anything Model
Abstract
Learning policies that can generalize to unseen environments is a fundamental challenge in visual reinforcement learning (RL). While most current methods focus on acquiring robust visual representations through auxiliary supervision, pre-training, or data augmentation, the potential of modern vision foundation models remains underleveraged. In this work, we introduce Segment Anything Model for Generalizable visual RL (SAM-G), a novel framework that leverages the promptable segmentation ability of Segment Anything Model (SAM) to enhance the generalization capabilities of visual RL agents. We utilize image features from DINOv2 and SAM to find correspondence as point prompts to SAM, and then SAM produces high-quality masked images for agents directly. Evaluated across 8 DMControl tasks and 3 Adroit tasks, SAM-G significantly improves the visual generalization ability without altering the RL agents' architecture but merely their observations. Notably, SAM-G achieves 44% and 29% relative improvements on the challenging video hard setting on DMControl and Adroit respectively, compared to state-of-the-art methods. Video and code: https://yanjieze.com/SAM-G/
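The correspondence step described above, in which image features are matched to obtain point prompts, can be illustrated with a nearest-neighbour search under cosine similarity. This is only a sketch of that generic matching idea, with plain Python lists standing in for feature maps. It does not call DINOv2 or SAM, and the helper names are hypothetical rather than part of SAM-G.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def best_matching_cell(reference, feature_grid):
    """Return ((row, col), score) of the grid cell most similar to reference."""
    best_cell, best_score = None, -2.0  # cosine similarity is never below -1
    for r, row in enumerate(feature_grid):
        for c, feature in enumerate(row):
            score = cosine_similarity(reference, feature)
            if score > best_score:
                best_cell, best_score = (r, c), score
    return best_cell, best_score


# A toy 2x2 grid of 2-D patch features and one reference feature.
grid = [[[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 1.0], [-1.0, 0.0]]]
cell, score = best_matching_cell([0.0, 2.0], grid)
```

In this toy grid the reference points in the same direction as the feature at row 0, column 1, so that cell is selected. In a real pipeline, the selected location would be mapped back to pixel coordinates before being used as a point prompt.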
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Authors: Jiasen Lu, Christopher Clark, Sangho Lee, Zichen Zhang, Savya Khosla, Ryan Marten, Derek Hoiem, Aniruddha Kembhavi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract
We present Unified-IO 2, the first autoregressive multimodal model that is capable of understanding and generating image, text, audio, and action. To unify different modalities, we tokenize inputs and outputs -- images, text, audio, action, bounding boxes, etc. -- into a shared semantic space and then process them with a single encoder-decoder transformer model. Since training with such diverse modalities is challenging, we propose various architectural improvements to stabilize model training. We train our model from scratch on a large multimodal pre-training corpus from diverse sources with a multimodal mixture of denoisers objective. To learn an expansive set of skills, such as following multimodal instructions, we construct and finetune on an ensemble of 120 datasets with prompts and augmentations. With a single unified model, Unified-IO 2 achieves state-of-the-art performance on the GRIT benchmark and strong results in more than 35 benchmarks, including image generation and understanding, natural language understanding, video and audio understanding, and robotic manipulation. We release all our models to the research community.
Keyword: detection
Time Travelling Pixels: Bitemporal Features Integration with Foundation Model for Remote Sensing Image Change Detection
Observable Propagation: A Data-Efficient Approach to Uncover Feature Vectors in Transformers
Nearly Tight Bounds For Differentially Private Min $s$-$t$ and Multiway Cut
LLM Polygraph: Uncovering LLMs' Factual Discernment through Intermediate Data Analysis
Segment Change Model (SCM) for Unsupervised Change detection in VHR Remote Sensing Images: a Case Study of Buildings
Soft Contrastive Learning for Time Series
ReSynthDetect: A Fundus Anomaly Detection Network with Reconstruction and Synthetic Features
Source Code is a Graph, Not a Sequence: A Cross-Lingual Perspective on Code Clone Detection
Camera calibration for the surround-view system: a benchmark and dataset
ConstScene: Dataset and Model for Advancing Robust Semantic Segmentation in Construction Environments
Diagnosis of Small-world Bias in Random Graphs
GRSDet: Learning to Generate Local Reverse Samples for Few-shot Object Detection
Enhancing Traffic Flow Prediction using Outlier-Weighted AutoEncoders: Handling Real-Time Changes
Self-supervised Pretraining for Robust Personalized Voice Activity Detection in Adverse Conditions
Dual-Functional Artificial Noise (DFAN) Aided Robust Covert Communications in Integrated Sensing and Communications
Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection
Landslide Detection and Segmentation Using Remote Sensing Images and Deep Neural Network
A pipeline for multiple orange detection and tracking with 3-D fruit relocalization and neural-net based yield regression in commercial citrus orchards
Bayesian Sensor Placement for Multi-source Localization of Pathogens in Wastewater Networks
Graph Neural Networks for Antisocial Behavior Detection on Twitter
Study of Adaptive LLR-based AP selection for Grant-Free Random Access in Cell-Free Networks
Temporal Knowledge Distillation for Time-Sensitive Financial Services Applications
Review of Machine Learning Approaches for Diagnostics and Prognostics of Industrial Systems Using Industrial Open Source Data
METER: A Dynamic Concept Adaptation Framework for Online Anomaly Detection
Sensor Data Simulation for Anomaly Detection of the Elderly Living Alone
Chaurah: A Smart Raspberry Pi based Parking System
DOEPatch: Dynamically Optimized Ensemble Model for Adversarial Patches Generation
DeLR: Active Learning for Detection with Decoupled Localization and Recognition Query
EvPlug: Learn a Plug-and-Play Module for Event and Image Fusion
SAR-Net: Multi-scale Direction-aware SAR Network via Global Information Fusion
Reinforcement-based Display-size Selection for Frugal Satellite Image Change Detection
AI Powered Road Network Prediction with Multi-Modal Data
Multi-Attention Fusion Drowsy Driving Detection Model
TSPP: A Unified Benchmarking Tool for Time-series Forecasting
Geometry-Biased Transformer for Robust Multi-View 3D Human Pose Reconstruction
Generative AI for Math: Part I -- MathPile: A Billion-Token-Scale Pretraining Corpus for Math
FENet: Focusing Enhanced Network for Lane Detection
Do Androids Know They're Only Dreaming of Electric Sheep?
Keyword: face recognition
EFHQ: Multi-purpose ExtremePose-Face-HQ dataset
Keyword: augmentation
LightGCN: Evaluated and Enhanced
Revisiting Knowledge Distillation under Distribution Shift
Are All Unseen Data Out-of-Distribution?
Domain Generalization with Vital Phase Augmentation
S2M: Converting Single-Turn to Multi-Turn Datasets for Conversational Question Answering
RDGCL: Reaction-Diffusion Graph Contrastive Learning for Recommendation
GRSDet: Learning to Generate Local Reverse Samples for Few-shot Object Detection
Mitigating Degree Biases in Message Passing Mechanism by Utilizing Community Structures
Temporal Knowledge Distillation for Time-Sensitive Financial Services Applications
DiffKG: Knowledge Graph Diffusion Model for Recommendation
3DTINC: Time-Equivariant Non-Contrastive Learning for Predicting Disease Progression from Longitudinal OCTs
Generalizable Visual Reinforcement Learning with Segment Anything Model
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action