Abstract
The connection between curvature and topology is a very well-studied theme in the subject of differential geometry. By suitably defining curvature on networks, the study of this theme has been extended into the domain of network analysis as well. In particular, this has led to curvature-based community detection algorithms. In this paper, we reveal the relation between community structure of a network and the curvature of its edges. In particular, we give apriori bounds on the curvature of intercommunity edges of a graph.
Title:
Bias Correction in Machine Learning-based Classification of Rare Events
Abstract
Online platform businesses can be identified by using web-scraped texts. This is a classification problem that combines elements of natural language processing and rare event detection. Because online platforms are rare, accurately identifying them with Machine Learning algorithms is challenging. Here, we describe the development of a Machine Learning-based text classification approach that reduces the number of false positives as much as possible. It greatly reduces the bias in the estimates obtained by using calibrated probabilities and ensembles.
Title:
Digital twin with automatic disturbance detection for real-time optimization of a semi-autogenous grinding (SAG) mill
Authors: Daniel Navia, Rodrigo Bruna, Francisco Fernández, Cristobal Mancilla, Matías Rojas, Mauricio Estrada, Paulina Quintanilla
Subjects: Subjects:
Systems and Control (eess.SY); Machine Learning (cs.LG); Applications (stat.AP)
Abstract
This work presents the development and validation of a digital twin for a semi-autogenous grinding (SAG) mill controlled by an expert control system. The digital twin consists of three interconnected modules that emulate the behavior of a closed-loop system: (1) fuzzy logic for the expert control system, (2) a state-space model for the regulatory control, and (3) a recurrent neural network (RNN) for the SAG mill process. The model was trained with data corresponding to 68 hours of operation and validated with 8 hours of test data. The digital twin predicts the dynamic behavior of the mill's bearing pressure, motor power, tonnage, solids percentage, and rotational speed within a 2.5-minute horizon with a 30-second sampling time. The RNN comprises two serial modules for detection and training. The disturbance detection evaluates the need for training by comparing the recent prediction error with the expected error using hypothesis tests for mean, variance, and probability distribution. If the detection module is activated, the parameters of the neural model are re-estimated with recent data. The detection module was configured with test data to eliminate false positives. Results indicate that the digital twin can satisfactorily supervise the SAG mill, which is operated with the expert control system. Future work will focus on integrating this digital twin into real-time optimization strategies with industrial validation.
Title:
Hybrid Machine Learning Approach For Real-Time Malicious Url Detection Using Som-Rmo And Rbfn With Tabu Search Optimization
Abstract
The proliferation of malicious URLs has become a significant threat to internet security, encompassing SPAM, phishing, malware, and defacement attacks. Traditional detection methods struggle to keep pace with the evolving nature of these threats. Detecting malicious URLs in real-time requires advanced techniques capable of handling large datasets and identifying novel attack patterns. The challenge lies in developing a robust model that combines efficient feature extraction with accurate classification. We propose a hybrid machine learning approach combining Self-Organizing Map based Radial Movement Optimization (SOM-RMO) for feature extraction and Radial Basis Function Network (RBFN) based Tabu Search for classification. SOM-RMO effectively reduces dimensionality and highlights significant features, while RBFN, optimized with Tabu Search, classifies URLs with high precision. The proposed model demonstrates superior performance in detecting various malicious URL attacks. On a benchmark dataset, our approach achieved an accuracy of 96.5%, precision of 95.2%, recall of 94.8%, and an F1-score of 95.0%, outperforming traditional methods significantly.
Title:
ORAN-Bench-13K: An Open Source Benchmark for Assessing LLMs in Open Radio Access Networks
Authors: Pranshav Gajjar, Vijay K. Shah
Subjects: Subjects:
Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
Large Language Models (LLMs) can revolutionize how we deploy and operate Open Radio Access Networks (O-RAN) by enhancing network analytics, anomaly detection, and code generation and significantly increasing the efficiency and reliability of a plethora of O-RAN tasks. In this paper, we present ORAN-Bench-13K, the first comprehensive benchmark designed to evaluate the performance of Large Language Models (LLMs) within the context of O-RAN. Our benchmark consists of 13,952 meticulously curated multiple-choice questions generated from 116 O-RAN specification documents. We leverage a novel three-stage LLM framework, and the questions are categorized into three distinct difficulties to cover a wide spectrum of ORAN-related knowledge. We thoroughly evaluate the performance of several state-of-the-art LLMs, including Gemini, Chat-GPT, and Mistral. Additionally, we propose ORANSight, a Retrieval-Augmented Generation (RAG)-based pipeline that demonstrates superior performance on ORAN-Bench-13K compared to other tested closed-source models. Our findings indicate that current popular LLM models are not proficient in O-RAN, highlighting the need for specialized models. We observed a noticeable performance improvement when incorporating the RAG-based ORANSight pipeline, with a Macro Accuracy of 0.784 and a Weighted Accuracy of 0.776, which was on average 21.55% and 22.59% better than the other tested LLMs.
Title:
Unsupervised Fault Detection using SAM with a Moving Window Approach
Abstract
Automated f ault detection and monitoring in engineering are critical but frequently difficult owing to the necessity for collecting and labeling large amounts of defective samples . We present an unsupervised method that uses the high end Segment Anything Model (SAM) and a moving window approach. SAM has gained recognition in AI image segmentation communities for its accuracy and versatility. However, its performance can be inconsistent when dealing with certain unexpected shapes , such as shadows and subtle surface irregularities. This limitation raise s concerns about its applicability for fault detection in real world scenarios We aim to overcome these challenges without requiring fine tun ing or labeled data. Our technique divides pictures into smaller windows, which are subsequently processed using SAM. This increases the accuracy of fault identification by focusing on localized details. We compute the sizes of the segmented sections and then us e a clustering technique to discover consistent fault areas while filtering out noise. To further improve the method's robustness , we propose adding the Exponentially Weighted Moving Average (EWMA) technique for continuous monitoring in industrial settings, which would improve the method's capacity to trace faults over time. We compare our method to various well established methods u sing a real case study where our model achieve s 0.96 accuracy compared to 0. 8 5 for the second best method. W e also compare our method us ing two open source datasets where our model attains a consistent 0. 86 accuracy across the datasets compared to 0.53 and 0.54 for second best model s.
Title:
GeoWATCH for Detecting Heavy Construction in Heterogeneous Time Series of Satellite Images
Authors: Jon Crall, Connor Greenwell, David Joy, Matthew Leotta, Aashish Chaudhary, Anthony Hoogs
Subjects: Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Abstract
Learning from multiple sensors is challenging due to spatio-temporal misalignment and differences in resolution and captured spectra. To that end, we introduce GeoWATCH, a flexible framework for training models on long sequences of satellite images sourced from multiple sensor platforms, which is designed to handle image classification, activity recognition, object detection, or object tracking tasks. Our system includes a novel partial weight loading mechanism based on sub-graph isomorphism which allows for continually training and modifying a network over many training cycles. This has allowed us to train a lineage of models over a long period of time, which we have observed has improved performance as we adjust configurations while maintaining a core backbone.
Title:
FORAY: Towards Effective Attack Synthesis against Deep Logical Vulnerabilities in DeFi Protocols
Abstract
Blockchain adoption has surged with the rise of Decentralized Finance (DeFi) applications. However, the significant value of digital assets managed by DeFi protocols makes them prime targets for attacks. Current smart contract vulnerability detection tools struggle with DeFi protocols due to deep logical bugs arising from complex financial interactions between multiple smart contracts. These tools primarily analyze individual contracts and resort to brute-force methods for DeFi protocols crossing numerous smart contracts, leading to inefficiency. We introduce Foray, a highly effective attack synthesis framework against deep logical bugs in DeFi protocols. Foray proposes a novel attack sketch generation and completion framework. Specifically, instead of treating DeFis as regular programs, we design a domain-specific language (DSL) to lift the low-level smart contracts into their high-level financial operations. Based on our DSL, we first compile a given DeFi protocol into a token flow graph, our graphical representation of DeFi protocols. Then, we design an efficient sketch generation method to synthesize attack sketches for a certain attack goal (e.g., price manipulation, arbitrage, etc.). This algorithm strategically identifies candidate sketches by finding reachable paths in TFG, which is much more efficient than random enumeration. For each candidate sketch written in our DSL, Foray designs a domain-specific symbolic compilation to compile it into SMT constraints. Our compilation simplifies the constraints by removing redundant smart contract semantics. It maintains the usability of symbolic compilation, yet scales to problems orders of magnitude larger. Finally, the candidates are completed via existing solvers and are transformed into concrete attacks via direct syntax transformation.
Title:
CodeCSE: A Simple Multilingual Model for Code and Comment Sentence Embeddings
Authors: Anthony Varkey, Siyuan Jiang, Weijing Huang
Abstract
Pretrained language models for code token embeddings are used in code search, code clone detection, and other code-related tasks. Similarly, code function embeddings are useful in such tasks. However, there are no out-of-box models for function embeddings in the current literature. So, this paper proposes CodeCSE, a contrastive learning model that learns embeddings for functions and their descriptions in one space. We evaluated CodeCSE using code search. CodeCSE's multi-lingual zero-shot approach is as efficient as the models finetuned from GraphCodeBERT for specific languages. CodeCSE is open source at this https URL and the pretrained model is available at the HuggingFace public hub: this https URL
Title:
Leveraging image captions for selective whole slide image annotation
Authors: Jingna Qiu, Marc Aubreville, Frauke Wilm, Mathias Öttl, Jonas Utz, Maja Schlereth, Katharina Breininger
Subjects: Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Abstract
Acquiring annotations for whole slide images (WSIs)-based deep learning tasks, such as creating tissue segmentation masks or detecting mitotic figures, is a laborious process due to the extensive image size and the significant manual work involved in the annotation. This paper focuses on identifying and annotating specific image regions that optimize model training, given a limited annotation budget. While random sampling helps capture data variance by collecting annotation regions throughout the WSIs, insufficient data curation may result in an inadequate representation of minority classes. Recent studies proposed diversity sampling to select a set of regions that maximally represent unique characteristics of the WSIs. This is done by pretraining on unlabeled data through self-supervised learning and then clustering all regions in the latent space. However, establishing the optimal number of clusters can be difficult and not all clusters are task-relevant. This paper presents prototype sampling, a new method for annotation region selection. It discovers regions exhibiting typical characteristics of each task-specific class. The process entails recognizing class prototypes from extensive histopathology image-caption databases and detecting unlabeled image regions that resemble these prototypes. Our results show that prototype sampling is more effective than random and diversity sampling in identifying annotation regions with valuable training information, resulting in improved model performance in semantic segmentation and mitotic figure detection tasks. Code is available at this https URL.
Title:
Stochastic Traveling Salesperson Problem with Neighborhoods for Object Detection
Authors: Cheng Peng, Minghan Wei, Volkan Isler
Subjects: Subjects:
Data Structures and Algorithms (cs.DS)
Abstract
We introduce a new route-finding problem which considers perception and travel costs simultaneously. Specifically, we consider the problem of finding the shortest tour such that all objects of interest can be detected successfully. To represent a viable detection region for each object, we propose to use an entropy-based viewing score that generates a diameter-bounded region as a viewing neighborhood. We formulate the detection-based trajectory planning problem as a stochastic traveling salesperson problem with neighborhoods and propose a center-visit method that obtains an approximation ratio of O(DmaxDmin) for disjoint regions. For non-disjoint regions, our method -provides a novel finite detour in 3D, which utilizes the region's minimum curvature property. Finally, we show that our method can generate efficient trajectories compared to a baseline method in a photo-realistic simulation environment.
Title:
Advanced Financial Fraud Detection Using GNN-CL Model
Authors: Yu Cheng, Junjie Guo, Shiqing Long, You Wu, Mengfang Sun, Rong Zhang
Abstract
The innovative GNN-CL model proposed in this paper marks a breakthrough in the field of financial fraud detection by synergistically combining the advantages of graph neural networks (gnn), convolutional neural networks (cnn) and long short-term memory (LSTM) networks. This convergence enables multifaceted analysis of complex transaction patterns, improving detection accuracy and resilience against complex fraudulent activities. A key novelty of this paper is the use of multilayer perceptrons (MLPS) to estimate node similarity, effectively filtering out neighborhood noise that can lead to false positives. This intelligent purification mechanism ensures that only the most relevant information is considered, thereby improving the model's understanding of the network structure. Feature weakening often plagues graph-based models due to the dilution of key signals. In order to further address the challenge of feature weakening, GNN-CL adopts reinforcement learning strategies. By dynamically adjusting the weights assigned to central nodes, it reinforces the importance of these influential entities to retain important clues of fraud even in less informative data. Experimental evaluations on Yelp datasets show that the results highlight the superior performance of GNN-CL compared to existing methods.
Title:
DriftGAN: Using historical data for Unsupervised Recurring Drift Detection
Authors: Christofer Fellicious, Sahib Julka, Lorenz Wendlinger, Michael Granitzer
Abstract
In real-world applications, input data distributions are rarely static over a period of time, a phenomenon known as concept drift. Such concept drifts degrade the model's prediction performance, and therefore we require methods to overcome these issues. The initial step is to identify concept drifts and have a training method in place to recover the model's performance. Most concept drift detection methods work on detecting concept drifts and signalling the requirement to retrain the model. However, in real-world cases, there could be concept drifts that recur over a period of time. In this paper, we present an unsupervised method based on Generative Adversarial Networks(GAN) to detect concept drifts and identify whether a specific concept drift occurred in the past. Our method reduces the time and data the model requires to get up to speed for recurring drifts. Our key results indicate that our proposed model can outperform the current state-of-the-art models in most datasets. We also test our method on a real-world use case from astrophysics, where we detect the bow shock and magnetopause crossings with better results than the existing methods in the domain.
Title:
Robust and Explainable Framework to Address Data Scarcity in Diagnostic Imaging
Abstract
Deep learning has significantly advanced automatic medical diagnostics and released the occupation of human resources to reduce clinical pressure, yet the persistent challenge of data scarcity in this area hampers its further improvements and applications. To address this gap, we introduce a novel ensemble framework called `Efficient Transfer and Self-supervised Learning based Ensemble Framework' (ETSEF). ETSEF leverages features from multiple pre-trained deep learning models to efficiently learn powerful representations from a limited number of data samples. To the best of our knowledge, ETSEF is the first strategy that combines two pre-training methodologies (Transfer Learning and Self-supervised Learning) with ensemble learning approaches. Various data enhancement techniques, including data augmentation, feature fusion, feature selection, and decision fusion, have also been deployed to maximise the efficiency and robustness of the ETSEF model. Five independent medical imaging tasks, including endoscopy, breast cancer, monkeypox, brain tumour, and glaucoma detection, were tested to demonstrate ETSEF's effectiveness and robustness. Facing limited sample numbers and challenging medical tasks, ETSEF has proved its effectiveness by improving diagnostics accuracies from 10\% to 13.3\% when compared to strong ensemble baseline models and up to 14.4\% improvements compared with published state-of-the-art methods. Moreover, we emphasise the robustness and trustworthiness of the ETSEF method through various vision-explainable artificial intelligence techniques, including Grad-CAM, SHAP, and t-SNE. Compared to those large-scale deep learning models, ETSEF can be deployed flexibly and maintain superior performance for challenging medical imaging tasks, showing the potential to be applied to more areas that lack training data
Title:
D-MASTER: Mask Annealed Transformer for Unsupervised Domain Adaptation in Breast Cancer Detection from Mammograms
Abstract
We focus on the problem of Unsupervised Domain Adaptation (\uda) for breast cancer detection from mammograms (BCDM) problem. Recent advancements have shown that masked image modeling serves as a robust pretext task for UDA. However, when applied to cross-domain BCDM, these techniques struggle with breast abnormalities such as masses, asymmetries, and micro-calcifications, in part due to the typically much smaller size of region of interest in comparison to natural images. This often results in more false positives per image (FPI) and significant noise in pseudo-labels typically used to bootstrap such techniques. Recognizing these challenges, we introduce a transformer-based Domain-invariant Mask Annealed Student Teacher autoencoder (D-MASTER) framework. D-MASTER adaptively masks and reconstructs multi-scale feature maps, enhancing the model's ability to capture reliable target domain features. D-MASTER also includes adaptive confidence refinement to filter pseudo-labels, ensuring only high-quality detections are considered. We also provide a bounding box annotated subset of 1000 mammograms from the RSNA Breast Screening Dataset (referred to as RSNA-BSD1K) to support further research in BCDM. We evaluate D-MASTER on multiple BCDM datasets acquired from diverse domains. Experimental results show a significant improvement of 9% and 13% in sensitivity at 0.3 FPI over state-of-the-art UDA techniques on publicly available benchmark INBreast and DDSM datasets respectively. We also report an improvement of 11% and 17% on In-house and RSNA-BSD1K datasets respectively. The source code, pre-trained D-MASTER model, along with RSNA-BSD1K dataset annotations is available at this https URL.
Title:
Ensembled Cold-Diffusion Restorations for Unsupervised Anomaly Detection
Authors: Sergio Naval Marimont, Vasilis Siomos, Matthew Baugh, Christos Tzelepis, Bernhard Kainz, Giacomo Tarroni
Abstract
Unsupervised Anomaly Detection (UAD) methods aim to identify anomalies in test samples comparing them with a normative distribution learned from a dataset known to be anomaly-free. Approaches based on generative models offer interpretability by generating anomaly-free versions of test images, but are typically unable to identify subtle anomalies. Alternatively, approaches using feature modelling or self-supervised methods, such as the ones relying on synthetically generated anomalies, do not provide out-of-the-box interpretability. In this work, we present a novel method that combines the strengths of both strategies: a generative cold-diffusion pipeline (i.e., a diffusion-like pipeline which uses corruptions not based on noise) that is trained with the objective of turning synthetically-corrupted images back to their normal, original appearance. To support our pipeline we introduce a novel synthetic anomaly generation procedure, called DAG, and a novel anomaly score which ensembles restorations conditioned with different degrees of abnormality. Our method surpasses the prior state-of-the art for unsupervised anomaly detection in three different Brain MRI datasets.
Title:
Studying the Degradation of Propagation Delay on FPGAs at the European XFEL
Authors: Leandro Lanzieri, Lukasz Butkowski, Jiri Kral, Goerschwin Fey, Holger Schlarb, Thomas C. Schmidt
Abstract
An increasing number of unhardened commercial-off-the-shelf embedded devices are deployed under harsh operating conditions and in highly-dependable systems. Due to the mechanisms of hardware degradation that affect these devices, ageing detection and monitoring are crucial to prevent critical failures. In this paper, we empirically study the propagation delay of 298 naturally-aged FPGA devices that are deployed in the European XFEL particle accelerator. Based on in-field measurements, we find that operational devices show significantly slower switching frequencies than unused chips, and that increased gamma and neutron radiation doses correlate with increased hardware degradation. Furthermore, we demonstrate the feasibility of developing machine learning models that estimate the switching frequencies of the devices based on historical and environmental data.
Title:
A Predictive Model Based on Transformer with Statistical Feature Embedding in Manufacturing Sensor Dataset
Abstract
In the manufacturing process, sensor data collected from equipment is crucial for building predictive models to manage processes and improve productivity. However, in the field, it is challenging to gather sufficient data to build robust models. This study proposes a novel predictive model based on the Transformer, utilizing statistical feature embedding and window positional encoding. Statistical features provide an effective representation of sensor data, and the embedding enables the Transformer to learn both time- and sensor-related information. Window positional encoding captures precise time details from the feature embedding. The model's performance is evaluated in two problems: fault detection and virtual metrology, showing superior results compared to baseline models. This improvement is attributed to the efficient use of parameters, which is particularly beneficial for sensor data that often has limited sample sizes. The results support the model's applicability across various manufacturing industries, demonstrating its potential for enhancing process management and yield.
Title:
Universal Multi-view Black-box Attack against Object Detectors via Layout Optimization
Abstract
Object detectors have demonstrated vulnerability to adversarial examples crafted by small perturbations that can deceive the object detector. Existing adversarial attacks mainly focus on white-box attacks and are merely valid at a specific viewpoint, while the universal multi-view black-box attack is less explored, limiting their generalization in practice. In this paper, we propose a novel universal multi-view black-box attack against object detectors, which optimizes a universal adversarial UV texture constructed by multiple image stickers for a 3D object via the designed layout optimization algorithm. Specifically, we treat the placement of image stickers on the UV texture as a circle-based layout optimization problem, whose objective is to find the optimal circle layout filled with image stickers so that it can deceive the object detector under the multi-view scenario. To ensure reasonable placement of image stickers, two constraints are elaborately devised. To optimize the layout, we adopt the random search algorithm enhanced by the devised important-aware selection strategy to find the most appropriate image sticker for each circle from the image sticker pools. Extensive experiments conducted on four common object detectors suggested that the detection performance decreases by a large magnitude of 74.29% on average in multi-view scenarios. Additionally, a novel evaluation tool based on the photo-realistic simulator is designed to assess the texture-based attack fairly.
Title:
PSPU: Enhanced Positive and Unlabeled Learning by Leveraging Pseudo Supervision
Abstract
Positive and Unlabeled (PU) learning, a binary classification model trained with only positive and unlabeled data, generally suffers from overfitted risk estimation due to inconsistent data distributions. To address this, we introduce a pseudo-supervised PU learning framework (PSPU), in which we train the PU model first, use it to gather confident samples for the pseudo supervision, and then apply these supervision to correct the PU model's weights by leveraging non-PU objectives. We also incorporate an additional consistency loss to mitigate noisy sample effects. Our PSPU outperforms recent PU learning methods significantly on MNIST, CIFAR-10, CIFAR-100 in both balanced and imbalanced settings, and enjoys competitive performance on MVTecAD for industrial anomaly detection.
Title:
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Authors: Yu-Guan Hsieh, Cheng-Yu Hsieh, Shih-Ying Yeh, Louis Béthune, Hadi Pour Ansari, Pavan Kumar Anasosalu Vasu, Chun-Liang Li, Ranjay Krishna, Oncel Tuzel, Marco Cuturi
Abstract
Humans describe complex scenes with compositionality, using simple text descriptions enriched with links and relationships. While vision-language research has aimed to develop models with compositional understanding capabilities, this is not reflected yet in existing datasets which, for the most part, still use plain text to describe images. In this work, we propose a new annotation strategy, graph-based captioning (GBC) that describes an image using a labelled graph structure, with nodes of various types. The nodes in GBC are created using, in a first stage, object detection and dense captioning tools nested recursively to uncover and describe entity nodes, further linked together in a second stage by highlighting, using new types of nodes, compositions and relations among entities. Since all GBC nodes hold plain text descriptions, GBC retains the flexibility found in natural language, but can also encode hierarchical information in its edges. We demonstrate that GBC can be produced automatically, using off-the-shelf multimodal LLMs and open-vocabulary detection models, by building a new dataset, GBC10M, gathering GBC annotations for about 10M images of the CC12M dataset. We use GBC10M to showcase the wealth of node captions uncovered by GBC, as measured with CLIP training. We show that using GBC nodes' annotations -- notably those stored in composition and relation nodes -- results in significant performance boost on downstream models when compared to other dataset formats. To further explore the opportunities provided by GBC, we also propose a new attention mechanism that can leverage the entire GBC graph, with encouraging experimental results that show the extra benefits of incorporating the graph structure. Our datasets are released at \url{this https URL}.
Title:
Using Pretrained Large Language Model with Prompt Engineering to Answer Biomedical Questions
Authors: Wenxin Zhou, Thuy Hang Ngo
Subjects: Subjects:
Computation and Language (cs.CL)
Abstract
Our team participated in the BioASQ 2024 Task12b and Synergy tasks to build a system that can answer biomedical questions by retrieving relevant articles and snippets from the PubMed database and generating exact and ideal answers. We propose a two-level information retrieval and question-answering system based on pre-trained large language models (LLM), focused on LLM prompt engineering and response post-processing. We construct prompts with in-context few-shot examples and utilize post-processing techniques like resampling and malformed response detection. We compare the performance of various pre-trained LLM models on this challenge, including Mixtral, OpenAI GPT and Llama2. Our best-performing system achieved 0.14 MAP score on document retrieval, 0.05 MAP score on snippet retrieval, 0.96 F1 score for yes/no questions, 0.38 MRR score for factoid questions and 0.50 F1 score for list questions in Task 12b.
Title:
CoLA: Conditional Dropout and Language-driven Robust Dual-modal Salient Object Detection
Authors: Shuang Hao, Chunlin Zhong, He Tang
Subjects: Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Abstract
The depth/thermal information is beneficial for detecting salient object with conventional RGB images. However, in dual-modal salient object detection (SOD) model, the robustness against noisy inputs and modality missing is crucial but rarely studied. To tackle this problem, we introduce \textbf{Co}nditional Dropout and \textbf{LA}nguage-driven(\textbf{CoLA}) framework comprising two core components. 1) Language-driven Quality Assessment (LQA): Leveraging a pretrained vision-language model with a prompt learner, the LQA recalibrates image contributions without requiring additional quality annotations. This approach effectively mitigates the impact of noisy inputs. 2) Conditional Dropout (CD): A learning method to strengthen the model's adaptability in scenarios with missing modalities, while preserving its performance under complete modalities. The CD serves as a plug-in training scheme that treats modality-missing as conditions, strengthening the overall robustness of various dual-modal SOD models. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art dual-modal SOD models, under both modality-complete and modality-missing conditions. We will release source code upon acceptance.
Title:
AstroSpy: On detecting Fake Images in Astronomy via Joint Image-Spectral Representations
Authors: Mohammed Talha Alam, Raza Imam, Mohsen Guizani, Fakhri Karray
Subjects: Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Abstract
The prevalence of AI-generated imagery has raised concerns about the authenticity of astronomical images, especially with advanced text-to-image models like Stable Diffusion producing highly realistic synthetic samples. Existing detection methods, primarily based on convolutional neural networks (CNNs) or spectral analysis, have limitations when used independently. We present AstroSpy, a hybrid model that integrates both spectral and image features to distinguish real from synthetic astronomical images. Trained on a unique dataset of real NASA images and AI-generated fakes (approximately 18k samples), AstroSpy utilizes a dual-pathway architecture to fuse spatial and spectral information. This approach enables AstroSpy to achieve superior performance in identifying authentic astronomical images. Extensive evaluations demonstrate AstroSpy's effectiveness and robustness, significantly outperforming baseline models in both in-domain and cross-domain tasks, highlighting its potential to combat misinformation in astronomy.
Title:
Cue Point Estimation using Object Detection
Authors: Giulia Argüello, Luca A. Lanzendörfer, Roger Wattenhofer
Abstract
Cue points indicate possible temporal boundaries in a transition between two pieces of music in DJ mixing and constitute a crucial element in autonomous DJ systems as well as for live mixing. In this work, we present a novel method for automatic cue point estimation, interpreted as a computer vision object detection task. Our proposed system is based on a pre-trained object detection transformer which we fine-tune on our novel cue point dataset. Our provided dataset contains 21k manually annotated cue points from human experts as well as metronome information for nearly 5k individual tracks, making this dataset 35x larger than the previously available cue point dataset. Unlike previous methods, our approach does not require low-level musical information analysis, while demonstrating increased precision in retrieving cue point positions. Moreover, our proposed method demonstrates high adherence to phrasing, a type of high-level music structure commonly emphasized in electronic dance music. The code, model checkpoints, and dataset are made publicly available.
Title:
A Mamba-based Siamese Network for Remote Sensing Change Detection
Authors: Jay N. Paranjape, Celso de Melo, Vishal M. Patel
Subjects: Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Abstract
Change detection in remote sensing images is an essential tool for analyzing a region at different times. It finds varied applications in monitoring environmental changes, man-made changes as well as corresponding decision-making and prediction of future trends. Deep learning methods like Convolutional Neural Networks (CNNs) and Transformers have achieved remarkable success in detecting significant changes, given two images at different times. In this paper, we propose a Mamba-based Change Detector (M-CD) that segments out the regions of interest even better. Mamba-based architectures demonstrate linear-time training capabilities and an improved receptive field over transformers. Our experiments on four widely used change detection datasets demonstrate significant improvements over existing state-of-the-art (SOTA) methods. Our code and pre-trained models are available at this https URL
Title:
HTD-Mamba: Efficient Hyperspectral Target Detection with Pyramid State Space Model
Authors: Dunbin Shen, Xuanbing Zhu, Jiacheng Tian, Jianjun Liu, Zhenrong Du, Hongyu Wang, Xiaorui Ma
Subjects: Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Abstract
Hyperspectral target detection (HTD) identifies objects of interest from complex backgrounds at the pixel level, playing a vital role in Earth observation. However, HTD faces challenges due to limited prior knowledge and spectral variations, leading to underfitting models and unreliable performance. To address these challenges, this paper proposes an efficient self-supervised HTD method with a pyramid state space model (SSM), named HTD-Mamba, which employs spectrally contrastive learning to distinguish between target and background based on the similarity measurement of intrinsic features. Specifically, to obtain sufficient training samples and leverage spatial contextual information, we propose a spatial-encoded spectral augmentation technique that encodes all surrounding pixels within a patch into a transformed view of the central pixel. Additionally, to explore global band correlations, we divide pixels into continuous group-wise spectral embeddings and introduce Mamba to HTD for the first time to model long-range dependencies of the spectral sequence with linear complexity. Furthermore, to alleviate spectral variation and enhance robust representation, we propose a pyramid SSM as a backbone to capture and fuse multiresolution spectral-wise intrinsic features. Extensive experiments conducted on four public datasets demonstrate that the proposed method outperforms state-of-the-art methods in both quantitative and qualitative evaluations. Code is available at \url{this https URL}.
Title:
TeVAE: A Variational Autoencoder Approach for Discrete Online Anomaly Detection in Variable-state Multivariate Time-series Data
Authors: Lucas Correia, Jan-Christoph Goos, Philipp Klein, Thomas Bäck, Anna V. Kononova
Abstract
As attention to recorded data grows in the realm of automotive testing and manual evaluation reaches its limits, there is a growing need for automatic online anomaly detection. This real-world data is complex in many ways and requires the modelling of testee behaviour. To address this, we propose a temporal variational autoencoder (TeVAE) that can detect anomalies with minimal false positives when trained on unlabelled data. Our approach also avoids the bypass phenomenon and introduces a new method to remap individual windows to a continuous time series. Furthermore, we propose metrics to evaluate the detection delay and root-cause capability of our approach and present results from experiments on a real-world industrial data set. When properly configured, TeVAE flags anomalies only 6% of the time wrongly and detects 65% of anomalies present. It also has the potential to perform well with a smaller training and validation subset but requires a more sophisticated threshold estimation method.
Title:
Coinductive Techniques for Checking Satisfiability of Generalized Nested Conditions
Authors: Lara Stoltenow, Barbara König, Sven Schneider, Andrea Corradini, Leen Lambers, Fernando Orejas
Subjects: Subjects:
Logic in Computer Science (cs.LO)
Abstract
We study nested conditions, a generalization of first-order logic to a categorical setting, and provide a tableau-based (semi-decision) procedure for checking (un)satisfiability and finite model generation. This generalizes earlier results on graph conditions. Furthermore we introduce a notion of witnesses, allowing the detection of infinite models in some cases. To ensure completeness, paths in a tableau must be fair, where fairness requires that all parts of a condition are processed eventually. Since the correctness arguments are non-trivial, we rely on coinductive proof methods and up-to techniques that structure the arguments. We distinguish between two types of categories: categories where all sections are isomorphisms, allowing for a simpler tableau calculus that includes finite model generation; in categories where this requirement does not hold, model generation does not work, but we still obtain a sound and complete calculus.
Title:
Integrating Ontology Design with the CRISP-DM in the context of Cyber-Physical Systems Maintenance
Authors: Milapji Singh Gill, Tom Westermann, Gernot Steindl, Felix Gehlhoff, Alexander Fay
Abstract
In the following contribution, a method is introduced that integrates domain expert-centric ontology design with the Cross-Industry Standard Process for Data Mining (CRISP-DM). This approach aims to efficiently build an application-specific ontology tailored to the corrective maintenance of Cyber-Physical Systems (CPS). The proposed method is divided into three phases. In phase one, ontology requirements are systematically specified, defining the relevant knowledge scope. Accordingly, CPS life cycle data is contextualized in phase two using domain-specific ontological artifacts. This formalized domain knowledge is then utilized in the CRISP-DM to efficiently extract new insights from the data. Finally, the newly developed data-driven model is employed to populate and expand the ontology. Thus, information extracted from this model is semantically annotated and aligned with the existing ontology in phase three. The applicability of this method has been evaluated in an anomaly detection case study for a modular process plant.
Title:
Towards Open-World Mobile Manipulation in Homes: Lessons from the Neurips 2023 HomeRobot Open Vocabulary Mobile Manipulation Challenge
Authors: Sriram Yenamandra, Arun Ramachandran, Mukul Khanna, Karmesh Yadav, Jay Vakil, Andrew Melnik, Michael Büttner, Leon Harz, Lyon Brown, Gora Chand Nandi, Arjun PS, Gaurav Kumar Yadav, Rahul Kala, Robert Haschke, Yang Luo, Jinxin Zhu, Yansen Han, Bingyi Lu, Xuan Gu, Qinyuan Liu, Yaping Zhao, Qiting Ye, Chenxiao Dou, Yansong Chua, Volodymyr Kuzma, Vladyslav Humennyy, Ruslan Partsey, Jonathan Francis, Devendra Singh Chaplot, Gunjan Chhablani, Alexander Clegg, Theophile Gervet, Vidhi Jain, Ram Ramrakhya, Andrew Szot, Austin Wang, Tsung-Yen Yang, Aaron Edsinger, Charlie Kemp, Binit Shah, Zsolt Kira, Dhruv Batra, Roozbeh Mottaghi, Yonatan Bisk, Chris Paxton
Subjects: Subjects:
Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Abstract
In order to develop robots that can effectively serve as versatile and capable home assistants, it is crucial for them to reliably perceive and interact with a wide variety of objects across diverse environments. To this end, we proposed Open Vocabulary Mobile Manipulation as a key benchmark task for robotics: finding any object in a novel environment and placing it on any receptacle surface within that environment. We organized a NeurIPS 2023 competition featuring both simulation and real-world components to evaluate solutions to this task. Our baselines on the most challenging version of this task, using real perception in simulation, achieved only an 0.8% success rate; by the end of the competition, the best participants achieved an 10.8\% success rate, a 13x improvement. We observed that the most successful teams employed a variety of methods, yet two common threads emerged among the best solutions: enhancing error detection and recovery, and improving the integration of perception with decision-making processes. In this paper, we detail the results and methodologies used, both in simulation and real-world settings. We discuss the lessons learned and their implications for future research. Additionally, we compare performance in real and simulated environments, emphasizing the necessity for robust generalization to novel settings.
Title:
Detection-Triggered Recursive Impact Mitigation against Secondary False Data Injection Attacks in Microgrids
Abstract
The cybersecurity of microgrid has received widespread attentions due to the frequently reported attack accidents against distributed energy resource (DER) manufactures. Numerous impact mitigation schemes have been proposed to reduce or eliminate the impacts of false data injection attacks (FDIAs). Nevertheless, the existing methods either requires at least one neighboring trustworthy agent or may bring in unacceptable cost burdens. This paper aims to propose a detection-triggered recursive impact mitigation scheme that can timely and precisely counter the secondary FDIAs (SFDIAs) against the communication links among DERs. Once triggering attack alarms, the power line current readings will be utilised to observe the voltage bias injections through the physical interconnections among DERs, based on which the current bias injections can be recursively reconstructed from the residuals generated by unknown input observers (UIOs). The attack impacts are eliminated by subtracting the reconstructed bias from the incoming compromised data. The proposed mitigation method can work even in the worst case where all communication links are under SFDIAs and only require extra current sensors. The bias reconstruction performance under initial errors and system noises is theoretically analysed and the reconstruction error is proved to be bounded regardless of the electrical parameters. To avoid deploying current sensors on all power lines, a cost-effective deployment strategy is presented to secure a spanning tree set of communication links that can guarantee the secondary control performance. Extensive validation studies are conducted in MATLAB/SIMULINK and hardware-in-the-loop (HIL) testbeds to validate the proposed method's effectiveness against single/multiple and continuous/discontinuous SFDIAs.
Title:
INTERACT: An authoring tool that facilitates the creation of human centric interaction with 3d objects in virtual reality
Authors: Rama Krishnan Gopal Ramasamy Thandapani, Benjamin Capel, Antoine Lasnier, Ioannis Chatzigiannakis
Abstract
A widespread adoption of Virtual, Augmented, and Mixed Reality (VR/AR/MR), collectively referred to as Extended Reality (XR), has become a tangible possibility to revolutionize educational and training scenarios by offering immersive, interactive experiences. In this paper we present \textsf{INTERACT}, an authoring tool for creating advanced 3D physics-based Intelligent Tutoring Systems (ITS) by individual developers or small-scale development teams. \textsf{INTERACT} is based on a cutting edge physics engine allowing realistic interactions such as collision detection and ergonomic evaluations. We demonstrate the benefits of \textsf{INTERACT} by developing a set of training scenarios for a use case of a Laser cutting machine. The use case illustrates the numerous possibilities such as creating interaction with objects, ease of configuring a scenario and how to design the visual effects to the machine.
Title:
Category-level Object Detection, Pose Estimation and Reconstruction from Stereo Images
Abstract
We study the 3D object understanding task for manipulating everyday objects with different material properties (diffuse, specular, transparent and mixed). Existing monocular and RGB-D methods suffer from scale ambiguity due to missing or imprecise depth measurements. We present CODERS, a one-stage approach for Category-level Object Detection, pose Estimation and Reconstruction from Stereo images. The base of our pipeline is an implicit stereo matching module that combines stereo image features with 3D position information. Concatenating this presented module and the following transform-decoder architecture leads to end-to-end learning of multiple tasks required by robot manipulation. Our approach significantly outperforms all competing methods in the public TOD dataset. Furthermore, trained on simulated data, CODERS generalize well to unseen category-level object instances in real-world robot manipulation experiments. Our dataset, code, and demos will be available on our project page.
Title:
Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps
Authors: Yung-Sung Chuang, Linlu Qiu, Cheng-Yu Hsieh, Ranjay Krishna, Yoon Kim, James Glass
Subjects: Subjects:
Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
When asked to summarize articles or answer questions given a passage, large language models (LLMs) can hallucinate details and respond with unsubstantiated answers that are inaccurate with respect to the input context. This paper describes a simple approach for detecting such contextual hallucinations. We hypothesize that contextual hallucinations are related to the extent to which an LLM attends to information in the provided context versus its own generations. Based on this intuition, we propose a simple hallucination detection model whose input features are given by the ratio of attention weights on the context versus newly generated tokens (for each attention head). We find that a linear classifier based on these lookback ratio features is as effective as a richer detector that utilizes the entire hidden states of an LLM or a text-based entailment model. The lookback ratio-based detector -- Lookback Lens -- is found to transfer across tasks and even models, allowing a detector that is trained on a 7B model to be applied (without retraining) to a larger 13B model. We further apply this detector to mitigate contextual hallucinations, and find that a simple classifier-guided decoding approach is able to reduce the amount of hallucination, for example by 9.6% in the XSum summarization task.
Keyword: face recognition
There is no result
Keyword: augmentation
Title:
Enhanced Model Robustness to Input Corruptions by Per-corruption Adaptation of Normalization Statistics
Authors: Elena Camuffo, Umberto Michieli, Simone Milani, Jijoong Moon, Mete Ozay
Subjects: Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Abstract
Developing a reliable vision system is a fundamental challenge for robotic technologies (e.g., indoor service robots and outdoor autonomous robots) which can ensure reliable navigation even in challenging environments such as adverse weather conditions (e.g., fog, rain), poor lighting conditions (e.g., over/under exposure), or sensor degradation (e.g., blurring, noise), and can guarantee high performance in safety-critical functions. Current solutions proposed to improve model robustness usually rely on generic data augmentation techniques or employ costly test-time adaptation methods. In addition, most approaches focus on addressing a single vision task (typically, image recognition) utilising synthetic data. In this paper, we introduce Per-corruption Adaptation of Normalization statistics (PAN) to enhance the model robustness of vision systems. Our approach entails three key components: (i) a corruption type identification module, (ii) dynamic adjustment of normalization layer statistics based on identified corruption type, and (iii) real-time update of these statistics according to input data. PAN can integrate seamlessly with any convolutional model for enhanced accuracy in several robot vision tasks. In our experiments, PAN obtains robust performance improvement on challenging real-world corrupted image datasets (e.g., OpenLoris, ExDark, ACDC), where most of the current solutions tend to fail. Moreover, PAN outperforms the baseline models by 20-30% on synthetic benchmarks in object recognition tasks.
Title:
Robust and Explainable Framework to Address Data Scarcity in Diagnostic Imaging
Abstract
Deep learning has significantly advanced automatic medical diagnostics and released the occupation of human resources to reduce clinical pressure, yet the persistent challenge of data scarcity in this area hampers its further improvements and applications. To address this gap, we introduce a novel ensemble framework called `Efficient Transfer and Self-supervised Learning based Ensemble Framework' (ETSEF). ETSEF leverages features from multiple pre-trained deep learning models to efficiently learn powerful representations from a limited number of data samples. To the best of our knowledge, ETSEF is the first strategy that combines two pre-training methodologies (Transfer Learning and Self-supervised Learning) with ensemble learning approaches. Various data enhancement techniques, including data augmentation, feature fusion, feature selection, and decision fusion, have also been deployed to maximise the efficiency and robustness of the ETSEF model. Five independent medical imaging tasks, including endoscopy, breast cancer, monkeypox, brain tumour, and glaucoma detection, were tested to demonstrate ETSEF's effectiveness and robustness. Facing limited sample numbers and challenging medical tasks, ETSEF has proved its effectiveness by improving diagnostics accuracies from 10\% to 13.3\% when compared to strong ensemble baseline models and up to 14.4\% improvements compared with published state-of-the-art methods. Moreover, we emphasise the robustness and trustworthiness of the ETSEF method through various vision-explainable artificial intelligence techniques, including Grad-CAM, SHAP, and t-SNE. Compared to those large-scale deep learning models, ETSEF can be deployed flexibly and maintain superior performance for challenging medical imaging tasks, showing the potential to be applied to more areas that lack training data
Abstract
Compressed sensing combines the power of convex optimization techniques with a sparsity-inducing prior on the signal space to solve an underdetermined system of equations. For many problems, the sparsifying dictionary is not directly given, nor its existence can be assumed. Besides, the sensing matrix can change across different scenarios. Addressing these issues requires solving a sparse representation learning problem, namely dictionary learning, taking into account the epistemic uncertainty of the learned dictionaries and, finally, jointly learning sparse representations and reconstructions under varying sensing matrix conditions. We address both concerns by proposing a variant of the LISTA architecture. First, we introduce Augmented Dictionary Learning ISTA (A-DLISTA), which incorporates an augmentation module to adapt parameters to the current measurement setup. Then, we propose to learn a distribution over dictionaries via a variational approach, dubbed Variational Learning ISTA (VLISTA). VLISTA exploits A-DLISTA as the likelihood model and approximates a posterior distribution over the dictionaries as part of an unfolded LISTA-based recovery algorithm. As a result, VLISTA provides a probabilistic way to jointly learn the dictionary distribution and the reconstruction algorithm with varying sensing matrices. We provide theoretical and experimental support for our architecture and show that our model learns calibrated uncertainties.
Title:
Improving the Transferability of Adversarial Examples by Feature Augmentation
Abstract
Despite the success of input transformation-based attacks on boosting adversarial transferability, the performance is unsatisfying due to the ignorance of the discrepancy across models. In this paper, we propose a simple but effective feature augmentation attack (FAUG) method, which improves adversarial transferability without introducing extra computation costs. Specifically, we inject the random noise into the intermediate features of the model to enlarge the diversity of the attack gradient, thereby mitigating the risk of overfitting to the specific model and notably amplifying adversarial transferability. Moreover, our method can be combined with existing gradient attacks to augment their performance further. Extensive experiments conducted on the ImageNet dataset across CNN and transformer models corroborate the efficacy of our method, e.g., we achieve improvement of +26.22% and +5.57% on input transformation-based attacks and combination methods, respectively.
Title:
HTD-Mamba: Efficient Hyperspectral Target Detection with Pyramid State Space Model
Authors: Dunbin Shen, Xuanbing Zhu, Jiacheng Tian, Jianjun Liu, Zhenrong Du, Hongyu Wang, Xiaorui Ma
Subjects: Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Abstract
Hyperspectral target detection (HTD) identifies objects of interest from complex backgrounds at the pixel level, playing a vital role in Earth observation. However, HTD faces challenges due to limited prior knowledge and spectral variations, leading to underfitting models and unreliable performance. To address these challenges, this paper proposes an efficient self-supervised HTD method with a pyramid state space model (SSM), named HTD-Mamba, which employs spectrally contrastive learning to distinguish between target and background based on the similarity measurement of intrinsic features. Specifically, to obtain sufficient training samples and leverage spatial contextual information, we propose a spatial-encoded spectral augmentation technique that encodes all surrounding pixels within a patch into a transformed view of the central pixel. Additionally, to explore global band correlations, we divide pixels into continuous group-wise spectral embeddings and introduce Mamba to HTD for the first time to model long-range dependencies of the spectral sequence with linear complexity. Furthermore, to alleviate spectral variation and enhance robust representation, we propose a pyramid SSM as a backbone to capture and fuse multiresolution spectral-wise intrinsic features. Extensive experiments conducted on four public datasets demonstrate that the proposed method outperforms state-of-the-art methods in both quantitative and qualitative evaluations. Code is available at \url{this https URL}.
Keyword: detection
Title:
Title:
Title:
Title:
Title:
Title:
Title:
Title:
Title:
Title:
Title:
Title:
Title:
Title:
Title:
Title:
Title:
Title:
Title:
Title:
Title:
Title:
Title:
Title:
Title:
Title:
Title:
Title:
Title:
Title:
Title:
Title:
Title:
Title:
Title:
Keyword: face recognition
There is no result
Keyword: augmentation
Title:
Title:
Title:
Title:
Title: