New submissions for Mon, 8 Apr 24

Keyword: detection

X-lifecycle Learning for Cloud Incident Management using LLMs

Authors: Authors: Drishti Goel, Fiza Husain, Aditya Singh, Supriyo Ghosh, Anjaly Parayil, Chetan Bansal, Xuchao Zhang, Saravan Rajmohan
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.03662
Pdf link: https://arxiv.org/pdf/2404.03662
Abstract Incident management for large cloud services is a complex and tedious process and requires significant amount of manual efforts from on-call engineers (OCEs). OCEs typically leverage data from different stages of the software development lifecycle [SDLC] (e.g., codes, configuration, monitor data, service properties, service dependencies, trouble-shooting documents, etc.) to generate insights for detection, root causing and mitigating of incidents. Recent advancements in large language models [LLMs] (e.g., ChatGPT, GPT-4, Gemini) created opportunities to automatically generate contextual recommendations to the OCEs assisting them to quickly identify and mitigate critical issues. However, existing research typically takes a silo-ed view for solving a certain task in incident management by leveraging data from a single stage of SDLC. In this paper, we demonstrate that augmenting additional contextual data from different stages of SDLC improves the performance of two critically important and practically challenging tasks: (1) automatically generating root cause recommendations for dependency failure related incidents, and (2) identifying ontology of service monitors used for automatically detecting incidents. By leveraging 353 incident and 260 monitor dataset from Microsoft, we demonstrate that augmenting contextual information from different stages of the SDLC improves the performance over State-of-The-Art methods.
Spike-driven Transformer V2: Meta Spiking Neural Network Architecture Inspiring the Design of Next-generation Neuromorphic Chips
Authors: Authors: Man Yao, Jiakui Hu, Tianxiang Hu, Yifan Xu, Zhaokun Zhou, Yonghong Tian, Bo Xu, Guoqi Li
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.03663
Pdf link: https://arxiv.org/pdf/2404.03663
Abstract Neuromorphic computing, which exploits Spiking Neural Networks (SNNs) on neuromorphic chips, is a promising energy-efficient alternative to traditional AI. CNN-based SNNs are the current mainstream of neuromorphic computing. By contrast, no neuromorphic chips are designed especially for Transformer-based SNNs, which have just emerged, and their performance is only on par with CNN-based SNNs, offering no distinct advantage. In this work, we propose a general Transformer-based SNN architecture, termed as ``Meta-SpikeFormer", whose goals are: 1) Lower-power, supports the spike-driven paradigm that there is only sparse addition in the network; 2) Versatility, handles various vision tasks; 3) High-performance, shows overwhelming performance advantages over CNN-based SNNs; 4) Meta-architecture, provides inspiration for future next-generation Transformer-based neuromorphic chip designs. Specifically, we extend the Spike-driven Transformer in \citet{yao2023spike} into a meta architecture, and explore the impact of structure, spike-driven self-attention, and skip connection on its performance. On ImageNet-1K, Meta-SpikeFormer achieves 80.0\% top-1 accuracy (55M), surpassing the current state-of-the-art (SOTA) SNN baselines (66M) by 3.7\%. This is the first direct training SNN backbone that can simultaneously supports classification, detection, and segmentation, obtaining SOTA results in SNNs. Finally, we discuss the inspiration of the meta SNN architecture for neuromorphic chip design. Source code and models are available at \url{https://github.com/BICLab/Spike-Driven-Transformer-V2}.
Securing Social Spaces: Harnessing Deep Learning to Eradicate Cyberbullying
Authors: Authors: Rohan Biswas, Kasturi Ganguly, Arijit Das, Diganta Saha
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2404.03686
Pdf link: https://arxiv.org/pdf/2404.03686
Abstract In today's digital world, cyberbullying is a serious problem that can harm the mental and physical health of people who use social media. This paper explains just how serious cyberbullying is and how it really affects indi-viduals exposed to it. It also stresses how important it is to find better ways to detect cyberbullying so that online spaces can be safer. Plus, it talks about how making more accurate tools to spot cyberbullying will be really helpful in the future. Our paper introduces a deep learning-based ap-proach, primarily employing BERT and BiLSTM architectures, to effective-ly address cyberbullying. This approach is designed to analyse large vol-umes of posts and predict potential instances of cyberbullying in online spaces. Our results demonstrate the superiority of the hateBERT model, an extension of BERT focused on hate speech detection, among the five mod-els, achieving an accuracy rate of 89.16%. This research is a significant con-tribution to "Computational Intelligence for Social Transformation," prom-ising a safer and more inclusive digital landscape.
Improvement of Performance in Freezing of Gait detection in Parkinsons Disease using Transformer networks and a single waist worn triaxial accelerometer
Authors: Authors: Luis Sigcha, Luigi Borzì, Ignacio Pavón, Nélson Costa, Susana Costa, Pedro Arezes, Juan-Manuel López, Guillermo De Arcas
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2404.03704
Pdf link: https://arxiv.org/pdf/2404.03704
Abstract Freezing of gait (FOG) is one of the most incapacitating symptoms in Parkinsons disease, affecting more than 50 percent of patients in advanced stages of the disease. The presence of FOG may lead to falls and a loss of independence with a consequent reduction in the quality of life. Wearable technology and artificial intelligence have been used for automatic FOG detection to optimize monitoring. However, differences between laboratory and daily-life conditions present challenges for the implementation of reliable detection systems. Consequently, improvement of FOG detection methods remains important to provide accurate monitoring mechanisms intended for free-living and real-time use. This paper presents advances in automatic FOG detection using a single body-worn triaxial accelerometer and a novel classification algorithm based on Transformers and convolutional networks. This study was performed with data from 21 patients who manifested FOG episodes while performing activities of daily living in a home setting. Results indicate that the proposed FOG-Transformer can bring a significant improvement in FOG detection using leave-one-subject-out cross-validation (LOSO CV). These results bring opportunities for the implementation of accurate monitoring systems for use in ambulatory or home settings.
SHROOM-INDElab at SemEval-2024 Task 6: Zero- and Few-Shot LLM-Based Classification for Hallucination Detection
Authors: Authors: Bradley P. Allen, Fina Polat, Paul Groth
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.03732
Pdf link: https://arxiv.org/pdf/2404.03732
Abstract We describe the University of Amsterdam Intelligent Data Engineering Lab team's entry for the SemEval-2024 Task 6 competition. The SHROOM-INDElab system builds on previous work on using prompt programming and in-context learning with large language models (LLMs) to build classifiers for hallucination detection, and extends that work through the incorporation of context-specific definition of task, role, and target concept, and automated generation of examples for use in a few-shot prompting approach. The resulting system achieved fourth-best and sixth-best performance in the model-agnostic track and model-aware tracks for Task 6, respectively, and evaluation using the validation sets showed that the system's classification decisions were consistent with those of the crowd-sourced human labellers. We further found that a zero-shot approach provided better accuracy than a few-shot approach using automatically generated examples. Code for the system described in this paper is available on Github.
Test Time Training for Industrial Anomaly Segmentation
Authors: Authors: Alex Costanzino, Pierluigi Zama Ramirez, Mirko Del Moro, Agostino Aiezzo, Giuseppe Lisanti, Samuele Salti, Luigi Di Stefano
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.03743
Pdf link: https://arxiv.org/pdf/2404.03743
Abstract Anomaly Detection and Segmentation (AD&S) is crucial for industrial quality control. While existing methods excel in generating anomaly scores for each pixel, practical applications require producing a binary segmentation to identify anomalies. Due to the absence of labeled anomalies in many real scenarios, standard practices binarize these maps based on some statistics derived from a validation set containing only nominal samples, resulting in poor segmentation performance. This paper addresses this problem by proposing a test time training strategy to improve the segmentation performance. Indeed, at test time, we can extract rich features directly from anomalous samples to train a classifier that can discriminate defects effectively. Our general approach can work downstream to any AD&S method that provides an anomaly score map as output, even in multimodal settings. We demonstrate the effectiveness of our approach over baselines through extensive experimentation and evaluation on MVTec AD and MVTec 3D-AD.
Fakes of Varying Shades: How Warning Affects Human Perception and Engagement Regarding LLM Hallucinations
Authors: Authors: Mahjabin Nahar, Haeseung Seo, Eun-Ju Lee, Aiping Xiong, Dongwon Lee
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.03745
Pdf link: https://arxiv.org/pdf/2404.03745
Abstract The widespread adoption and transformative effects of large language models (LLMs) have sparked concerns regarding their capacity to produce inaccurate and fictitious content, referred to as `hallucinations'. Given the potential risks associated with hallucinations, humans should be able to identify them. This research aims to understand the human perception of LLM hallucinations by systematically varying the degree of hallucination (genuine, minor hallucination, major hallucination) and examining its interaction with warning (i.e., a warning of potential inaccuracies: absent vs. present). Participants (N=419) from Prolific rated the perceived accuracy and engaged with content (e.g., like, dislike, share) in a Q/A format. Results indicate that humans rank content as truthful in the order genuine > minor hallucination > major hallucination and user engagement behaviors mirror this pattern. More importantly, we observed that warning improves hallucination detection without significantly affecting the perceived truthfulness of genuine content. We conclude by offering insights for future tools to aid human detection of hallucinations.
On Extending the Automatic Test Markup Language (ATML) for Machine Learning
Authors: Authors: Tyler Cody, Bingtong Li, Peter A. Beling
Subjects: Software Engineering (cs.SE); Machine Learning (cs.LG); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2404.03769
Pdf link: https://arxiv.org/pdf/2404.03769
Abstract This paper addresses the urgent need for messaging standards in the operational test and evaluation (T&E) of machine learning (ML) applications, particularly in edge ML applications embedded in systems like robots, satellites, and unmanned vehicles. It examines the suitability of the IEEE Standard 1671 (IEEE Std 1671), known as the Automatic Test Markup Language (ATML), an XML-based standard originally developed for electronic systems, for ML application testing. The paper explores extending IEEE Std 1671 to encompass the unique challenges of ML applications, including the use of datasets and dependencies on software. Through modeling various tests such as adversarial robustness and drift detection, this paper offers a framework adaptable to specific applications, suggesting that minor modifications to ATML might suffice to address the novelties of ML. This paper differentiates ATML's focus on testing from other ML standards like Predictive Model Markup Language (PMML) or Open Neural Network Exchange (ONNX), which concentrate on ML model specification. We conclude that ATML is a promising tool for effective, near real-time operational T&E of ML applications, an essential aspect of AI lifecycle management, safety, and governance.
R5Detect: Detecting Control-Flow Attacks from Standard RISC-V Enclaves
Authors: Authors: Davide Bove, Lukas Panzer
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2404.03771
Pdf link: https://arxiv.org/pdf/2404.03771
Abstract Embedded and Internet-of-Things (IoT) devices are ubiquitous today, and the uprising of several botnets based on them (e.g., Mirai, Ripple20) raises issues about the security of such devices. Especially low-power devices often lack support for modern system security measures, such as stack integrity, Non-eXecutable bits or strong cryptography. In this work, we present R5Detect, a security monitoring software that detects and prevents control-flow attacks on unmodified RISC-V standard architectures. With a novel combination of different protection techniques, it can run on embedded and low-power IoT devices, which may lack proper security features. R5Detect implements a memory-protected shadow stack to prevent runtime modifications, as well as a heuristics detection based on Hardware Performance Counters to detect control-flow integrity violations. Our results indicate that regular software can be protected against different degrees of control-flow manipulations with an average performance overhead of below 5 %. We implement and evaluate R5Detect on standard low-power RISC-V devices and show that such security features can be effectively used with minimal hardware support.
A Systems Theoretic Approach to Online Machine Learning
Authors: Authors: Anli du Preez, Peter A. Beling, Tyler Cody
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2404.03775
Pdf link: https://arxiv.org/pdf/2404.03775
Abstract The machine learning formulation of online learning is incomplete from a systems theoretic perspective. Typically, machine learning research emphasizes domains and tasks, and a problem solving worldview. It focuses on algorithm parameters, features, and samples, and neglects the perspective offered by considering system structure and system behavior or dynamics. Online learning is an active field of research and has been widely explored in terms of statistical theory and computational algorithms, however, in general, the literature still lacks formal system theoretical frameworks for modeling online learning systems and resolving systems-related concept drift issues. Furthermore, while the machine learning formulation serves to classify methods and literature, the systems theoretic formulation presented herein serves to provide a framework for the top-down design of online learning systems, including a novel definition of online learning and the identification of key design parameters. The framework is formulated in terms of input-output systems and is further divided into system structure and system behavior. Concept drift is a critical challenge faced in online learning, and this work formally approaches it as part of the system behavior characteristics. Healthcare provider fraud detection using machine learning is used as a case study throughout the paper to ground the discussion in a real-world online learning challenge.
Effective Lymph Nodes Detection in CT Scans Using Location Debiased Query Selection and Contrastive Query Representation in Transformer
Authors: Authors: Qinji Yu, Yirui Wang, Ke Yan, Haoshen Li, Dazhou Guo, Li Zhang, Le Lu, Na Shen, Qifeng Wang, Xiaowei Ding, Xianghua Ye, Dakai Jin
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.03819
Pdf link: https://arxiv.org/pdf/2404.03819
Abstract Lymph node (LN) assessment is a critical, indispensable yet very challenging task in the routine clinical workflow of radiology and oncology. Accurate LN analysis is essential for cancer diagnosis, staging, and treatment planning. Finding scatteredly distributed, low-contrast clinically relevant LNs in 3D CT is difficult even for experienced physicians under high inter-observer variations. Previous automatic LN detection works typically yield limited recall and high false positives (FPs) due to adjacent anatomies with similar image intensities, shapes, or textures (vessels, muscles, esophagus, etc). In this work, we propose a new LN DEtection TRansformer, named LN-DETR, to achieve more accurate performance. By enhancing the 2D backbone with a multi-scale 2.5D feature fusion to incorporate 3D context explicitly, more importantly, we make two main contributions to improve the representation quality of LN queries. 1) Considering that LN boundaries are often unclear, an IoU prediction head and a location debiased query selection are proposed to select LN queries of higher localization accuracy as the decoder query's initialization. 2) To reduce FPs, query contrastive learning is employed to explicitly reinforce LN queries towards their best-matched ground-truth queries over unmatched query predictions. Trained and tested on 3D CT scans of 1067 patients (with 10,000+ labeled LNs) via combining seven LN datasets from different body parts (neck, chest, and abdomen) and pathologies/cancers, our method significantly improves the performance of previous leading methods by > 4-5% average recall at the same FP rates in both internal and external testing. We further evaluate on the universal lesion detection task using NIH DeepLesion benchmark, and our method achieves the top performance of 88.46% averaged recall across 0.5 to 4 FPs per image, compared with other leading reported results.
Estimating mixed memberships in multi-layer networks
Authors: Authors: Huan Qing
Subjects: Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2404.03916
Pdf link: https://arxiv.org/pdf/2404.03916
Abstract Community detection in multi-layer networks has emerged as a crucial area of modern network analysis. However, conventional approaches often assume that nodes belong exclusively to a single community, which fails to capture the complex structure of real-world networks where nodes may belong to multiple communities simultaneously. To address this limitation, we propose novel spectral methods to estimate the common mixed memberships in the multi-layer mixed membership stochastic block model. The proposed methods leverage the eigen-decomposition of three aggregate matrices: the sum of adjacency matrices, the debiased sum of squared adjacency matrices, and the sum of squared adjacency matrices. We establish rigorous theoretical guarantees for the consistency of our methods. Specifically, we derive per-node error rates under mild conditions on network sparsity, demonstrating their consistency as the number of nodes and/or layers increases under the multi-layer mixed membership stochastic block model. Our theoretical results reveal that the method leveraging the sum of adjacency matrices generally performs poorer than the other two methods for mixed membership estimation in multi-layer networks. We conduct extensive numerical experiments to empirically validate our theoretical findings. For real-world multi-layer networks with unknown community information, we introduce two novel modularity metrics to quantify the quality of mixed membership community detection. Finally, we demonstrate the practical applications of our algorithms and modularity metrics by applying them to real-world multi-layer networks, demonstrating their effectiveness in extracting meaningful community structures.
Towards introspective loop closure in 4D radar SLAM
Authors: Authors: Maximilian Hilger, Vladimír Kubelka, Daniel Adolfsson, Henrik Andreasson, Achim J. Lilienthal
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2404.03940
Pdf link: https://arxiv.org/pdf/2404.03940
Abstract Imaging radar is an emerging sensor modality in the context of Localization and Mapping (SLAM), especially suitable for vision-obstructed environments. This article investigates the use of 4D imaging radars for SLAM and analyzes the challenges in robust loop closure. Previous work indicates that 4D radars, together with inertial measurements, offer ample information for accurate odometry estimation. However, the low field of view, limited resolution, and sparse and noisy measurements render loop closure a significantly more challenging problem. Our work builds on the previous work - TBV SLAM - which was proposed for robust loop closure with 360$^\circ$ spinning radars. This article highlights and addresses challenges inherited from a directional 4D radar, such as sparsity, noise, and reduced field of view, and discusses why the common definition of a loop closure is unsuitable. By combining multiple quality measures for accurate loop closure detection adapted to 4D radar data, significant results in trajectory estimation are achieved; the absolute trajectory error is as low as 0.46 m over a distance of 1.8 km, with consistent operation over multiple environments.
Investigating the Robustness of Modelling Decisions for Few-Shot Cross-Topic Stance Detection: A Preregistered Study
Authors: Authors: Myrthe Reuver, Suzan Verberne, Antske Fokkens
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.03987
Pdf link: https://arxiv.org/pdf/2404.03987
Abstract For a viewpoint-diverse news recommender, identifying whether two news articles express the same viewpoint is essential. One way to determine "same or different" viewpoint is stance detection. In this paper, we investigate the robustness of operationalization choices for few-shot stance detection, with special attention to modelling stance across different topics. Our experiments test pre-registered hypotheses on stance detection. Specifically, we compare two stance task definitions (Pro/Con versus Same Side Stance), two LLM architectures (bi-encoding versus cross-encoding), and adding Natural Language Inference knowledge, with pre-trained RoBERTa models trained with shots of 100 examples from 7 different stance detection datasets. Some of our hypotheses and claims from earlier work can be confirmed, while others give more inconsistent results. The effect of the Same Side Stance definition on performance differs per dataset and is influenced by other modelling choices. We found no relationship between the number of training topics in the training shots and performance. In general, cross-encoding out-performs bi-encoding, and adding NLI training to our models gives considerable improvement, but these results are not consistent across all datasets. Our results indicate that it is essential to include multiple datasets and systematic modelling experiments when aiming to find robust modelling choices for the concept `stance'.
Pros and Cons! Evaluating ChatGPT on Software Vulnerability
Authors: Authors: Xin Yin
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2404.03994
Pdf link: https://arxiv.org/pdf/2404.03994
Abstract This paper proposes a pipeline for quantitatively evaluating interactive LLMs such as ChatGPT using publicly available dataset. We carry out an extensive technical evaluation of ChatGPT using Big-Vul covering five different common software vulnerability tasks. We evaluate the multitask and multilingual aspects of ChatGPT based on this dataset. We found that the existing state-of-the-art methods are generally superior to ChatGPT in software vulnerability detection. Although ChatGPT improves accuracy when providing context information, it still has limitations in accurately predicting severity ratings for certain CWE types. In addition, ChatGPT demonstrates some ability in locating vulnerabilities for certain CWE types, but its performance varies among different CWE types. ChatGPT exhibits limited vulnerability repair capabilities in both providing and not providing context information. Finally, ChatGPT shows uneven performance in generating CVE descriptions for various CWE types, with limited accuracy in detailed information. Overall, though ChatGPT performs well in some aspects, it still needs improvement in understanding the subtle differences in code vulnerabilities and the ability to describe vulnerabilities in order to fully realize its potential. Our evaluation framework provides valuable insights for further enhancing ChatGPT' s software vulnerability handling capabilities.
A Flexible Evolutionary Algorithm With Dynamic Mutation Rate Archive
Authors: Authors: Martin S. Krejca, Carsten Witt
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2404.04015
Pdf link: https://arxiv.org/pdf/2404.04015
Abstract We propose a new, flexible approach for dynamically maintaining successful mutation rates in evolutionary algorithms using $k$-bit flip mutations. The algorithm adds successful mutation rates to an archive of promising rates that are favored in subsequent steps. Rates expire when their number of unsuccessful trials has exceeded a threshold, while rates currently not present in the archive can enter it in two ways: (i) via user-defined minimum selection probabilities for rates combined with a successful step or (ii) via a stagnation detection mechanism increasing the value for a promising rate after the current bit-flip neighborhood has been explored with high probability. For the minimum selection probabilities, we suggest different options, including heavy-tailed distributions. We conduct rigorous runtime analysis of the flexible evolutionary algorithm on the OneMax and Jump functions, on general unimodal functions, on minimum spanning trees, and on a class of hurdle-like functions with varying hurdle width that benefit particularly from the archive of promising mutation rates. In all cases, the runtime bounds are close to or even outperform the best known results for both stagnation detection and heavy-tailed mutations.
Fusing Dictionary Learning and Support Vector Machines for Unsupervised Anomaly Detection
Authors: Authors: Paul Irofti, Iulian-Andrei Hîji, Andrei Pătraşcu, Nicolae Cleju
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2404.04064
Pdf link: https://arxiv.org/pdf/2404.04064
Abstract We study in this paper the improvement of one-class support vector machines (OC-SVM) through sparse representation techniques for unsupervised anomaly detection. As Dictionary Learning (DL) became recently a common analysis technique that reveals hidden sparse patterns of data, our approach uses this insight to endow unsupervised detection with more control on pattern finding and dimensions. We introduce a new anomaly detection model that unifies the OC-SVM and DL residual functions into a single composite objective, subsequently solved through K-SVD-type iterative algorithms. A closed-form of the alternating K-SVD iteration is explicitly derived for the new composite model and practical implementable schemes are discussed. The standard DL model is adapted for the Dictionary Pair Learning (DPL) context, where the usual sparsity constraints are naturally eliminated. Finally, we extend both objectives to the more general setting that allows the use of kernel functions. The empirical convergence properties of the resulting algorithms are provided and an in-depth analysis of their parametrization is performed while also demonstrating their numerical performance in comparison with existing methods.
Designing Robots to Help Women
Authors: Authors: Martin Cooney, Lena Klasén, Fernando Alonso-Fernandez
Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2404.04123
Pdf link: https://arxiv.org/pdf/2404.04123
Abstract Robots are being designed to help people in an increasing variety of settings--but seemingly little attention has been given so far to the specific needs of women, who represent roughly half of the world's population but are highly underrepresented in robotics. Here we used a speculative prototyping approach to explore this expansive design space: First, we identified some potential challenges of interest, including crimes and illnesses that disproportionately affect women, as well as potential opportunities for designers, which were visualized in five sketches. Then, one of the sketched scenarios was further explored by developing a prototype, of a robotic helper drone equipped with computer vision to detect hidden cameras that could be used to spy on women. While object detection introduced some errors, hidden cameras were identified with a reasonable accuracy of 80\% (Intersection over Union (IoU) score: 0.40). Our aim is that the identified challenges and opportunities could help spark discussion and inspire designers, toward realizing a safer, more inclusive future through responsible use of technology.
Improving Detection in Aerial Images by Capturing Inter-Object Relationships
Authors: Authors: Botao Ren, Botian Xu, Yifan Pu, Jingyi Wang, Zhidong Deng
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.04140
Pdf link: https://arxiv.org/pdf/2404.04140
Abstract In many image domains, the spatial distribution of objects in a scene exhibits meaningful patterns governed by their semantic relationships. In most modern detection pipelines, however, the detection proposals are processed independently, overlooking the underlying relationships between objects. In this work, we introduce a transformer-based approach to capture these inter-object relationships to refine classification and regression outcomes for detected objects. Building on two-stage detectors, we tokenize the region of interest (RoI) proposals to be processed by a transformer encoder. Specific spatial and geometric relations are incorporated into the attention weights and adaptively modulated and regularized. Experimental results demonstrate that the proposed method achieves consistent performance improvement on three benchmarks including DOTA-v1.0, DOTA-v1.5, and HRSC 2016, especially ranking first on both DOTA-v1.5 and HRSC 2016. Specifically, our new method has an increase of 1.59 mAP on DOTA-v1.0, 4.88 mAP on DOTA-v1.5, and 2.1 mAP on HRSC 2016, respectively, compared to the baselines.
SCAResNet: A ResNet Variant Optimized for Tiny Object Detection in Transmission and Distribution Towers
Authors: Authors: Weile Li, Muqing Shi, Zhonghua Hong
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.04179
Pdf link: https://arxiv.org/pdf/2404.04179
Abstract Traditional deep learning-based object detection networks often resize images during the data preprocessing stage to achieve a uniform size and scale in the feature map. Resizing is done to facilitate model propagation and fully connected classification. However, resizing inevitably leads to object deformation and loss of valuable information in the images. This drawback becomes particularly pronounced for tiny objects like distribution towers with linear shapes and few pixels. To address this issue, we propose abandoning the resizing operation. Instead, we introduce Positional-Encoding Multi-head Criss-Cross Attention. This allows the model to capture contextual information and learn from multiple representation subspaces, effectively enriching the semantics of distribution towers. Additionally, we enhance Spatial Pyramid Pooling by reshaping three pooled feature maps into a new unified one while also reducing the computational burden. This approach allows images of different sizes and scales to generate feature maps with uniform dimensions and can be employed in feature map propagation. Our SCAResNet incorporates these aforementioned improvements into the backbone network ResNet. We evaluated our SCAResNet using the Electric Transmission and Distribution Infrastructure Imagery dataset from Duke University. Without any additional tricks, we employed various object detection models with Gaussian Receptive Field based Label Assignment as the baseline. When incorporating the SCAResNet into the baseline model, we achieved a 2.1% improvement in mAPs. This demonstrates the advantages of our SCAResNet in detecting transmission and distribution towers and its value in tiny object detection. The source code is available at https://github.com/LisavilaLee/SCAResNet_mmdet.
Reliable Feature Selection for Adversarially Robust Cyber-Attack Detection
Authors: Authors: João Vitorino, Miguel Silva, Eva Maia, Isabel Praça
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2404.04188
Pdf link: https://arxiv.org/pdf/2404.04188
Abstract The growing cybersecurity threats make it essential to use high-quality data to train Machine Learning (ML) models for network traffic analysis, without noisy or missing data. By selecting the most relevant features for cyber-attack detection, it is possible to improve both the robustness and computational efficiency of the models used in a cybersecurity system. This work presents a feature selection and consensus process that combines multiple methods and applies them to several network datasets. Two different feature sets were selected and were used to train multiple ML models with regular and adversarial training. Finally, an adversarial evasion robustness benchmark was performed to analyze the reliability of the different feature sets and their impact on the susceptibility of the models to adversarial examples. By using an improved dataset with more data diversity, selecting the best time-related features and a more specific feature set, and performing adversarial training, the ML models were able to achieve a better adversarially robust generalization. The robustness of the models was significantly improved without their generalization to regular traffic flows being affected, without increases of false alarms, and without requiring too many computational resources, which enables a reliable detection of suspicious activity and perturbed traffic flows in enterprise computer networks.
Watermark-based Detection and Attribution of AI-Generated Content
Authors: Authors: Zhengyuan Jiang, Moyang Guo, Yuepeng Hu, Neil Zhenqiang Gong
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.04254
Pdf link: https://arxiv.org/pdf/2404.04254
Abstract Several companies--such as Google, Microsoft, and OpenAI--have deployed techniques to watermark AI-generated content to enable proactive detection. However, existing literature mainly focuses on user-agnostic detection. Attribution aims to further trace back the user of a generative-AI service who generated a given content detected as AI-generated. Despite its growing importance, attribution is largely unexplored. In this work, we aim to bridge this gap by providing the first systematic study on watermark-based, user-aware detection and attribution of AI-generated content. Specifically, we theoretically study the detection and attribution performance via rigorous probabilistic analysis. Moreover, we develop an efficient algorithm to select watermarks for the users to enhance attribution performance. Both our theoretical and empirical results show that watermark-based detection and attribution inherit the accuracy and (non-)robustness properties of the watermarking method.
Keyword: face recognition

There is no result

Keyword: augmentation

Machine Learning in Proton Exchange Membrane Water Electrolysis -- Part I: A Knowledge-Integrated Framework
Authors: Authors: Xia Chen, Alexander Rex, Janis Woelke, Christoph Eckert, Boris Bensmann, Richard Hanke-Rauschenbach, Philipp Geyer
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2404.03660
Pdf link: https://arxiv.org/pdf/2404.03660
Abstract In this study, we propose to adopt a novel framework, Knowledge-integrated Machine Learning, for advancing Proton Exchange Membrane Water Electrolysis (PEMWE) development. Given the significance of PEMWE in green hydrogen production and the inherent challenges in optimizing its performance, our framework aims to meld data-driven models with domain-specific insights systematically to address the domain challenges. We first identify the uncertainties originating from data acquisition conditions, data-driven model mechanisms, and domain expertise, highlighting their complementary characteristics in carrying information from different perspectives. Building upon this foundation, we showcase how to adeptly decompose knowledge and extract unique information to contribute to the data augmentation, modeling process, and knowledge discovery. We demonstrate a hierarchical three-level framework, termed the "Ladder of Knowledge-integrated Machine Learning", in the PEMWE context, applying it to three case studies within a context of cell degradation analysis to affirm its efficacy in interpolation, extrapolation, and information representation. This research lays the groundwork for more knowledge-informed enhancements in ML applications in engineering.
JUICER: Data-Efficient Imitation Learning for Robotic Assembly
Authors: Authors: Lars Ankile, Anthony Simeonov, Idan Shenfeld, Pulkit Agrawal
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.03729
Pdf link: https://arxiv.org/pdf/2404.03729
Abstract While learning from demonstrations is powerful for acquiring visuomotor policies, high-performance imitation without large demonstration datasets remains challenging for tasks requiring precise, long-horizon manipulation. This paper proposes a pipeline for improving imitation learning performance with a small human demonstration budget. We apply our approach to assembly tasks that require precisely grasping, reorienting, and inserting multiple parts over long horizons and multiple task phases. Our pipeline combines expressive policy architectures and various techniques for dataset expansion and simulation-based data augmentation. These help expand dataset support and supervise the model with locally corrective actions near bottleneck regions requiring high precision. We demonstrate our pipeline on four furniture assembly tasks in simulation, enabling a manipulator to assemble up to five parts over nearly 2500 time steps directly from RGB images, outperforming imitation and data augmentation baselines.
Some observations regarding the RBF-FD approximation accuracy dependence on stencil size
Authors: Authors: Andrej Kolar-Požun, Mitja Jančič, Miha Rot, Gregor Kosec
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2404.03793
Pdf link: https://arxiv.org/pdf/2404.03793
Abstract When solving partial differential equations on scattered nodes using the Radial Basis Function-generated Finite Difference (RBF-FD) method, one of the parameters that must be chosen is the stencil size. Focusing on Polyharmonic Spline RBFs with monomial augmentation, we observe that it affects the approximation accuracy in a particularly interesting way - the solution error oscillates under increasing stencil size. We find that we can connect this behaviour with the spatial dependence of the signed approximation error. Based on this observation we are able to introduce a numerical quantity that could indicate whether a given stencil size is locally optimal. This work is an extension of our ICCS 2023 conference paper.
A proximal policy optimization based intelligent home solar management
Authors: Authors: Kode Creer, Imitiaz Parvez
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.03888
Pdf link: https://arxiv.org/pdf/2404.03888
Abstract In the smart grid, the prosumers can sell unused electricity back to the power grid, assuming the prosumers own renewable energy sources and storage units. The maximizing of their profits under a dynamic electricity market is a problem that requires intelligent planning. To address this, we propose a framework based on Proximal Policy Optimization (PPO) using recurrent rewards. By using the information about the rewards modeled effectively with PPO to maximize our objective, we were able to get over 30\% improvement over the other naive algorithms in accumulating total profits. This shows promise in getting reinforcement learning algorithms to perform tasks required to plan their actions in complex domains like financial markets. We also introduce a novel method for embedding longs based on soliton waves that outperformed normal embedding in our use case with random floating point data augmentation.
Enhancing Breast Cancer Diagnosis in Mammography: Evaluation and Integration of Convolutional Neural Networks and Explainable AI
Authors: Authors: Maryam Ahmed, Tooba Bibi, Rizwan Ahmed Khan, Sidra Nasir
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2404.03892
Pdf link: https://arxiv.org/pdf/2404.03892
Abstract The study introduces an integrated framework combining Convolutional Neural Networks (CNNs) and Explainable Artificial Intelligence (XAI) for the enhanced diagnosis of breast cancer using the CBIS-DDSM dataset. Utilizing a fine-tuned ResNet50 architecture, our investigation not only provides effective differentiation of mammographic images into benign and malignant categories but also addresses the opaque "black-box" nature of deep learning models by employing XAI methodologies, namely Grad-CAM, LIME, and SHAP, to interpret CNN decision-making processes for healthcare professionals. Our methodology encompasses an elaborate data preprocessing pipeline and advanced data augmentation techniques to counteract dataset limitations, and transfer learning using pre-trained networks, such as VGG-16, DenseNet and ResNet was employed. A focal point of our study is the evaluation of XAI's effectiveness in interpreting model predictions, highlighted by utilising the Hausdorff measure to assess the alignment between AI-generated explanations and expert annotations quantitatively. This approach plays a critical role for XAI in promoting trustworthiness and ethical fairness in AI-assisted diagnostics. The findings from our research illustrate the effective collaboration between CNNs and XAI in advancing diagnostic methods for breast cancer, thereby facilitating a more seamless integration of advanced AI technologies within clinical settings. By enhancing the interpretability of AI-driven decisions, this work lays the groundwork for improved collaboration between AI systems and medical practitioners, ultimately enriching patient care. Furthermore, the implications of our research extend well beyond the current methodologies, advocating for subsequent inquiries into the integration of multimodal data and the refinement of AI explanations to satisfy the needs of clinical practice.
Data Augmentation with In-Context Learning and Comparative Evaluation in Math Word Problem Solving
Authors: Authors: Gulsum Yigit, Mehmet Fatih Amasyali
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.03938
Pdf link: https://arxiv.org/pdf/2404.03938
Abstract Math Word Problem (MWP) solving presents a challenging task in Natural Language Processing (NLP). This study aims to provide MWP solvers with a more diverse training set, ultimately improving their ability to solve various math problems. We propose several methods for data augmentation by modifying the problem texts and equations, such as synonym replacement, rule-based: question replacement, and rule based: reversing question methodologies over two English MWP datasets. This study extends by introducing a new in-context learning augmentation method, employing the Llama-7b language model. This approach involves instruction-based prompting for rephrasing the math problem texts. Performance evaluations are conducted on 9 baseline models, revealing that augmentation methods outperform baseline models. Moreover, concatenating examples generated by various augmentation methods further improves performance.
Rolling the dice for better deep learning performance: A study of randomness techniques in deep neural networks
Authors: Authors: Mohammed Ghaith Altarabichi, Sławomir Nowaczyk, Sepideh Pashami, Peyman Sheikholharam Mashhadi, Julia Handl
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2404.03992
Pdf link: https://arxiv.org/pdf/2404.03992
Abstract This paper investigates how various randomization techniques impact Deep Neural Networks (DNNs). Randomization, like weight noise and dropout, aids in reducing overfitting and enhancing generalization, but their interactions are poorly understood. The study categorizes randomness techniques into four types and proposes new methods: adding noise to the loss function and random masking of gradient updates. Using Particle Swarm Optimizer (PSO) for hyperparameter optimization, it explores optimal configurations across MNIST, FASHION-MNIST, CIFAR10, and CIFAR100 datasets. Over 30,000 configurations are evaluated, revealing data augmentation and weight initialization randomness as main performance contributors. Correlation analysis shows different optimizers prefer distinct randomization types. The complete implementation and dataset are available on GitHub.
Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models
Authors: Authors: Sangwon Jang, Jaehyeong Jo, Kimin Lee, Sung Ju Hwang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.04243
Pdf link: https://arxiv.org/pdf/2404.04243
Abstract Text-to-image diffusion models have shown remarkable success in generating a personalized subject based on a few reference images. However, current methods struggle with handling multiple subjects simultaneously, often resulting in mixed identities with combined attributes from different subjects. In this work, we present MuDI, a novel framework that enables multi-subject personalization by effectively decoupling identities from multiple subjects. Our main idea is to utilize segmented subjects generated by the Segment Anything Model for both training and inference, as a form of data augmentation for training and initialization for the generation process. Our experiments demonstrate that MuDI can produce high-quality personalized images without identity mixing, even for highly similar subjects as shown in Figure 1. In human evaluation, MuDI shows twice as many successes for personalizing multiple subjects without identity mixing over existing baselines and is preferred over 70% compared to the strongest baseline. More results are available at https://mudi-t2i.github.io/.

LeeKyungwook / get-arxiv-noti

New submissions for Mon, 8 Apr 24 #1053

Keyword: detection

X-lifecycle Learning for Cloud Incident Management using LLMs

Spike-driven Transformer V2: Meta Spiking Neural Network Architecture Inspiring the Design of Next-generation Neuromorphic Chips

Securing Social Spaces: Harnessing Deep Learning to Eradicate Cyberbullying

Improvement of Performance in Freezing of Gait detection in Parkinsons Disease using Transformer networks and a single waist worn triaxial accelerometer

SHROOM-INDElab at SemEval-2024 Task 6: Zero- and Few-Shot LLM-Based Classification for Hallucination Detection

Test Time Training for Industrial Anomaly Segmentation

Fakes of Varying Shades: How Warning Affects Human Perception and Engagement Regarding LLM Hallucinations

On Extending the Automatic Test Markup Language (ATML) for Machine Learning

R5Detect: Detecting Control-Flow Attacks from Standard RISC-V Enclaves

A Systems Theoretic Approach to Online Machine Learning

Effective Lymph Nodes Detection in CT Scans Using Location Debiased Query Selection and Contrastive Query Representation in Transformer

Estimating mixed memberships in multi-layer networks

Towards introspective loop closure in 4D radar SLAM

Investigating the Robustness of Modelling Decisions for Few-Shot Cross-Topic Stance Detection: A Preregistered Study

Pros and Cons! Evaluating ChatGPT on Software Vulnerability

A Flexible Evolutionary Algorithm With Dynamic Mutation Rate Archive

Fusing Dictionary Learning and Support Vector Machines for Unsupervised Anomaly Detection

Designing Robots to Help Women

Improving Detection in Aerial Images by Capturing Inter-Object Relationships

SCAResNet: A ResNet Variant Optimized for Tiny Object Detection in Transmission and Distribution Towers

Reliable Feature Selection for Adversarially Robust Cyber-Attack Detection

Watermark-based Detection and Attribution of AI-Generated Content

Keyword: face recognition

Keyword: augmentation

Machine Learning in Proton Exchange Membrane Water Electrolysis -- Part I: A Knowledge-Integrated Framework

JUICER: Data-Efficient Imitation Learning for Robotic Assembly

Some observations regarding the RBF-FD approximation accuracy dependence on stencil size

A proximal policy optimization based intelligent home solar management

Enhancing Breast Cancer Diagnosis in Mammography: Evaluation and Integration of Convolutional Neural Networks and Explainable AI

Data Augmentation with In-Context Learning and Comparative Evaluation in Math Word Problem Solving

Rolling the dice for better deep learning performance: A study of randomness techniques in deep neural networks

Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models