Abstract
A self-healing software system is an advanced computer program or system designed to detect, diagnose, and automatically recover from faults or errors without human intervention. These systems are typically employed in mission-critical applications where downtime can have significant financial or operational consequences. Failure detection is a key step in a self-healing system. In this research, a method based on runtime verification is proposed to diagnose four types of errors at the component level. Simulations on mRUBIS show that the proposed method is sufficiently effective at detecting the occurrence of failures.
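The abstract does not name its four error types or its monitors, so the sketch below is only a hypothetical illustration of component-level runtime verification: a monitor consumes an event trace and checks three made-up properties (no activity after a crash, no unmatched responses, a response deadline). The component name `Inventory`, the event vocabulary, and `max_latency` are assumptions, not taken from mRUBIS.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ComponentMonitor:
    """Runtime monitor for one component (hypothetical, illustrative properties only)."""

    name: str
    max_latency: int = 3  # hypothetical deadline, in ticks
    pending: Dict[Optional[int], int] = field(default_factory=dict)  # request id -> tick issued
    crashed: bool = False
    violations: List[Tuple[int, str]] = field(default_factory=list)

    def observe(self, tick: int, event: str, req_id: Optional[int] = None) -> None:
        # Property 1: only a "restart" may follow a "crash".
        if self.crashed and event != "restart":
            self.violations.append((tick, "event_after_crash"))
            return
        if event == "crash":
            self.crashed = True
        elif event == "restart":
            self.crashed = False
            self.pending.clear()
        elif event == "request":
            self.pending[req_id] = tick
        elif event == "response":
            # Property 2: every response must match an open request.
            if self.pending.pop(req_id, None) is None:
                self.violations.append((tick, "unexpected_response"))
        # Property 3: open requests must be answered within max_latency ticks.
        for rid, issued in list(self.pending.items()):
            if tick - issued > self.max_latency:
                self.violations.append((tick, f"timeout:{rid}"))
                del self.pending[rid]


monitor = ComponentMonitor("Inventory", max_latency=2)
trace = [(0, "request", 1), (1, "response", 1), (2, "request", 2), (5, "crash", None), (6, "request", 3)]
for tick, event, req_id in trace:
    monitor.observe(tick, event, req_id)
print(monitor.violations)
```

Each violation label is a candidate diagnosis that a self-healing loop could map to a repair action; how the paper performs that mapping is not described in the abstract.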
Early-stage detection of cognitive impairment by hybrid quantum-classical algorithm using resting-state functional MRI time-series
Authors: Junggu Choi, Tak Hur, Daniel K. Park, Na-Young Shin, Seung-Koo Lee, Hakbae Lee, Sanghoon Han
Abstract
Following recent developments in quantum machine learning, several quantum machine learning algorithms for disease detection have been reported in the literature. Motivated by the importance of cognitive decline in dementia and aging, this study explores a hybrid quantum-classical algorithm for classifying region-of-interest time-series data obtained from resting-state functional magnetic resonance imaging in patients with early-stage cognitive impairment. Our hybrid algorithm combines classical one-dimensional convolutional layers with quantum convolutional neural networks. In classical simulation, the proposed hybrid algorithms achieved higher balanced accuracies than classical convolutional neural networks under similar training conditions. Moreover, based on model performance, nine of the 116 brain regions (left precentral gyrus, right superior temporal gyrus, left rolandic operculum, right rolandic operculum, left parahippocampus, right hippocampus, left medial frontal gyrus, right cerebellum crus, and cerebellar vermis) were found to be relatively effective for classification. The associations of these nine regions with cognitive decline, reported in previous studies, were further validated through seed-based functional connectivity analysis. We confirmed both the performance improvement brought by the quantum convolutional neural network and the neuroscientific validity of the brain regions identified by our hybrid quantum-classical model.
Software Mention Recognition with a Three-Stage Framework Based on BERTology Models at SOMD 2024
Authors: Thuy Nguyen Thi, Anh Nguyen Viet, Thin Dang Van, Ngan Nguyen Luu Thuy
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract
This paper describes our systems for Sub-task I of the Software Mention Detection in Scholarly Publications (SOMD) shared task. We propose three approaches leveraging different pre-trained language models (BERT, SciBERT, and XLM-R). Our best-performing system addresses the named entity recognition (NER) problem through a three-stage framework: (1) Entity Sentence Classification, which classifies sentences containing potential software mentions; (2) Entity Extraction, which detects mentions within the classified sentences; and (3) Entity Type Classification, which categorizes the detected mentions into specific software types. Experiments on the official dataset show that our three-stage framework achieves competitive performance, surpassing other participating teams and our alternative approaches. Our XLM-R-based framework achieves a weighted F1-score of 67.80%, placing our team 3rd in Sub-task I (Software Mention Recognition).
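The three-stage cascade can be sketched as plain control flow. This is a hypothetical skeleton, not the authors' code: the three stages are passed in as callables, and the toy lexicon lookups below stand in for the fine-tuned BERT/SciBERT/XLM-R models. The type labels `Application` and `ProgrammingEnvironment` are illustrative and not necessarily the shared task's label set.

```python
from typing import Callable, List, Tuple


def three_stage_ner(
    sentences: List[List[str]],
    has_mention: Callable[[List[str]], bool],
    extract: Callable[[List[str]], List[Tuple[int, int]]],
    classify: Callable[[str], str],
) -> List[Tuple[int, str, str]]:
    """Cascade: sentence filter -> span extraction -> type classification."""
    results = []
    for idx, tokens in enumerate(sentences):
        # Stage 1: drop sentences unlikely to contain a software mention.
        if not has_mention(tokens):
            continue
        # Stage 2: extract (start, end) token spans from the kept sentence.
        for start, end in extract(tokens):
            mention = " ".join(tokens[start:end])
            # Stage 3: assign a software type to each extracted mention.
            results.append((idx, mention, classify(mention)))
    return results


# Toy lexicon-based stand-ins for the three fine-tuned models (hypothetical).
LEXICON = {"SPSS": "Application", "R": "ProgrammingEnvironment"}
sentences = [["We", "used", "SPSS", "and", "R", "."], ["Results", "were", "robust", "."]]
mentions = three_stage_ner(
    sentences,
    has_mention=lambda toks: any(t in LEXICON for t in toks),
    extract=lambda toks: [(i, i + 1) for i, t in enumerate(toks) if t in LEXICON],
    classify=lambda m: LEXICON.get(m, "Unknown"),
)
print(mentions)
```

One practical benefit of the cascade is that stages 2 and 3 only run on sentences that pass stage 1, which matters when most sentences in a paper mention no software at all.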
HateTinyLLM : Hate Speech Detection Using Tiny Large Language Models
Authors: Tanmay Sen, Ansuman Das, Mrinmay Sen
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
Hate speech encompasses verbal, written, or behavioral communication that directs derogatory or discriminatory language at individuals or groups based on sensitive characteristics. Automated hate speech detection plays a crucial role in curbing its spread, especially on social media platforms. Various methods, including recent advances in deep learning, have been devised to address this challenge. In this study, we introduce HateTinyLLM, a novel framework based on fine-tuned decoder-only tiny large language models (tinyLLMs) for efficient hate speech detection. Our experiments show that the fine-tuned HateTinyLLM outperforms the pretrained mixtral-7b model by a significant margin. We explored several tiny LLMs, including PY007/TinyLlama-1.1B-step-50K-105b, Microsoft/phi-2, and facebook/opt-1.3b, and fine-tuned them using LoRA and adapter methods. All LoRA-based fine-tuned models achieved over 80% accuracy.
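For readers unfamiliar with LoRA, the following pure-Python sketch shows the generic low-rank adaptation of a single linear layer: the pretrained weight W stays frozen and only two small matrices A and B are trained, giving y = W x + (alpha / r) B A x. This is the standard LoRA formulation, not the authors' training code; the rank `r`, scaling `alpha`, and layer size are hypothetical, and real fine-tuning would use a framework such as Hugging Face PEFT.

```python
import random
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update: y = W x + (alpha / r) * B (A x)."""

    def __init__(self, weight: Matrix, r: int = 2, alpha: float = 4.0, seed: int = 0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # pretrained weights, never updated
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]  # r x d_in
        self.B = [[0.0] * r for _ in range(d_out)]  # d_out x r, zero-initialized
        self.scale = alpha / r

    def __call__(self, x: List[float]) -> List[float]:
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def merged_weight(self) -> Matrix:
        # After training, B @ A can be folded into W so inference has no extra cost.
        delta = matmul(self.B, self.A)
        return [[w + self.scale * d for w, d in zip(wr, dr)] for wr, dr in zip(self.weight, delta)]

    def trainable_params(self) -> int:
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


d = 16
identity = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
layer = LoRALinear(identity, r=2)
x = [float(i) for i in range(d)]
print(layer(x) == x)                    # B starts at zero, so the adapted layer equals W
print(layer.trainable_params(), d * d)  # trainable LoRA parameters vs. frozen parameters
```

Zero-initializing B means fine-tuning starts exactly from the pretrained model, and the parameter count grows with r (d_in + d_out) rather than d_in x d_out.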
Large Language Model Agent for Fake News Detection
Authors: Xinyi Li, Yongfeng Zhang, Edward C. Malthouse
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract
In the current digital era, the rapid spread of misinformation on online platforms presents significant challenges to societal well-being, public trust, and democratic processes, influencing critical decision making and public opinion. To address these challenges, there is a growing need for automated fake news detection mechanisms. Pre-trained large language models (LLMs) have demonstrated exceptional capabilities across various natural language processing (NLP) tasks, prompting exploration into their potential for verifying news claims. Instead of employing LLMs in a non-agentic way, where LLMs generate responses based on direct prompts in a single shot, our work introduces FactAgent, an agentic approach of utilizing LLMs for fake news detection. FactAgent enables LLMs to emulate human expert behavior in verifying news claims without any model training, following a structured workflow. This workflow breaks down the complex task of news veracity checking into multiple sub-steps, where LLMs complete simple tasks using their internal knowledge or external tools. At the final step of the workflow, LLMs integrate all findings throughout the workflow to determine the news claim's veracity. Compared to manual human verification, FactAgent offers enhanced efficiency. Experimental studies demonstrate the effectiveness of FactAgent in verifying claims without the need for any training process. Moreover, FactAgent provides transparent explanations at each step of the workflow and during final decision-making, offering insights into the reasoning process of fake news detection for end users. FactAgent is highly adaptable, allowing for straightforward updates to its tools that LLMs can leverage within the workflow, as well as updates to the workflow itself using domain knowledge. This adaptability enables FactAgent's application to news verification across various domains.
Improving Disease Detection from Social Media Text via Self-Augmentation and Contrastive Learning
Authors: Pervaiz Iqbal Khan, Andreas Dengel, Sheraz Ahmed
Abstract
Detecting diseases from social media has diverse applications, such as public health monitoring and disease spread detection. While language models (LMs) have shown promising performance in this domain, research on refining their discriminative representations is ongoing. In this paper, we propose a novel method that integrates Contrastive Learning (CL) with language modeling to address this challenge. Our approach introduces a self-augmentation method, wherein hidden representations of the model are augmented with their own representations. This method comprises two branches: the first branch, a traditional LM, learns features specific to the given data, while the second branch incorporates augmented representations from the first branch to encourage generalization. CL further refines these representations by pulling pairs of original and augmented versions closer while pushing other samples away. We evaluate our method on three NLP datasets encompassing binary, multi-label, and multi-class classification tasks involving social media posts related to various diseases. Our approach demonstrates notable improvements over traditional fine-tuning methods, achieving up to a 2.48% increase in F1-score compared to baseline approaches and a 2.1% improvement over state-of-the-art methods.
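The "pull pairs together, push other samples away" objective is commonly implemented as an InfoNCE loss with in-batch negatives. The abstract does not state which contrastive loss the authors use, so the pure-Python sketch below assumes InfoNCE with cosine similarity; the temperature and toy vectors are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(originals: List[Vector], augmented: List[Vector], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss: originals[i] should be closest to augmented[i] among all augmented[j]."""
    n = len(originals)
    total = 0.0
    for i in range(n):
        logits = [cosine(originals[i], augmented[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates (positive plus negatives).
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]  # -log softmax probability of the positive pair
    return total / n


orig = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # augmented views close to their own originals
swapped = [[0.1, 0.9], [0.9, 0.1]]  # augmented views closer to the other sample
print(info_nce(orig, aligned) < info_nce(orig, swapped))
```

Minimizing this loss raises the similarity of each original/augmented pair relative to every other pairing in the batch, which is the behavior the abstract describes.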
A probabilistic estimation of remaining useful life from censored time-to-event data
Authors: Christian Marius Lillelund, Fernando Pannullo, Morten Opprud Jakobsen, Manuel Morante, Christian Fischer Pedersen
Abstract
Predicting the remaining useful life (RUL) of ball bearings plays an important role in predictive maintenance. A common definition of the RUL is the time until a bearing is no longer functional, which we denote as an event, and many data-driven methods have been proposed to predict it. However, few studies have addressed the problem of censored data, where the event of interest is not observed, and simply ignoring these observations can lead to an overestimation of the failure risk. In this paper, we propose a probabilistic estimation of RUL using survival analysis that supports censored data. First, we analyze sensor readings from ball bearings in the frequency domain and annotate when a bearing starts to deteriorate by calculating the Kullback-Leibler (KL) divergence between the probability density function (PDF) of the current process and a reference PDF. Second, we train several survival models on the annotated bearing dataset that can predict the RUL over a finite time horizon using the survival function. This function is guaranteed to be strictly monotonically decreasing and gives an intuitive estimate of the remaining lifetime. We demonstrate our approach on the XJTU-SY dataset using cross-validation and find that Random Survival Forests consistently outperform both other non-neural models and neural networks in terms of mean absolute error (MAE). Our work encourages the inclusion of censored data in predictive maintenance models and highlights the unique advantages of survival analysis for probabilistic RUL estimation and early fault detection.
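The two ingredients above, a KL-divergence trigger for annotating deterioration onset and a survival function that keeps censored units, can be illustrated with a short stdlib sketch. The spectra, reference, and threshold are hypothetical toy values. The survival estimator here is Kaplan-Meier, a simple non-parametric method chosen only to show how censoring enters; it is not the Random Survival Forest or the other models the paper trains, and unlike those models it yields a non-increasing step function.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def normalize(xs: Sequence[float]) -> List[float]:
    total = sum(xs)
    return [x / total for x in xs]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """D_KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def deterioration_onset(spectra: List[List[float]], reference: List[float], threshold: float) -> Optional[int]:
    """First time index whose normalized spectrum diverges from the reference by more than threshold."""
    ref = normalize(reference)
    for t, spectrum in enumerate(spectra):
        if kl_divergence(normalize(spectrum), ref) > threshold:
            return t
    return None


def kaplan_meier(durations: List[float], observed: List[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier S(t); censored units (observed=False) stay in the risk set until they drop out."""
    deaths = Counter(t for t, event in zip(durations, observed) if event)
    curve, survival = [], 1.0
    for t in sorted(deaths):
        at_risk = sum(1 for d in durations if d >= t)
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve


reference = [1.0, 1.0, 1.0, 1.0]  # toy healthy-state spectrum
spectra = [[1.0, 1.0, 1.0, 1.0], [1.1, 1.0, 0.9, 1.0], [5.0, 1.0, 1.0, 1.0]]
print(deterioration_onset(spectra, reference, threshold=0.1))  # 2

durations = [2, 3, 3, 5, 8]                 # toy lifetimes
observed = [True, True, False, True, False]  # False = censored (no failure seen)
print(kaplan_meier(durations, observed))
print(kaplan_meier([2, 3, 5], [True, True, True]))  # censored units simply dropped
```

Keeping the two censored bearings gives S(5) = 0.3, while dropping them gives S(5) = 0, which is the kind of inflated failure risk the abstract warns about.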
Out-of-distribution detection based on subspace projection of high-dimensional features output by the last convolutional layer
Authors: Qiuyu Zhu, Yiwei He
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Out-of-distribution (OOD) detection, crucial for reliable pattern classification, determines whether a sample originates from outside the training distribution. This paper concentrates on the high-dimensional features output by the final convolutional layer, which contain rich image features. Our key idea is to project these high-dimensional features into two specific feature subspaces, leveraging the dimensionality reduction capacity of the network's linear layers, trained with Predefined Evenly-Distribution Class Centroids (PEDCC)-Loss. This involves calculating the cosines of three projection angles and the norms of the features, thereby identifying distinctive information for in-distribution (ID) and OOD data, which assists in OOD detection. Building on this, we modify the batch normalization (BN) and ReLU layers preceding the fully connected layer, diminishing their impact on the output feature distributions and thereby widening the distribution gap between ID and OOD data features. Our method requires only the training of the classification network model, with no need for input pre-processing or OOD-specific pre-tuning. Extensive experiments on several benchmark datasets demonstrate that our approach delivers state-of-the-art performance. Our code is available at https://github.com/Hewell0/ProjOOD.
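A minimal sketch of the two quantities mentioned above, the cosine of a projection angle and the feature norm, assuming an orthonormal basis for the subspace. In the paper the subspaces come from linear layers trained with PEDCC-Loss, whose weight rows would first need orthonormalizing (for example with Gram-Schmidt); the basis here is a hypothetical toy one, and how the paper combines its three angles and the norm into an OOD score is not specified in the abstract, so no score is computed.

```python
import math
from typing import List, Tuple

Vector = List[float]


def project(x: Vector, basis: List[Vector]) -> Vector:
    """Orthogonal projection of x onto the span of orthonormal basis vectors."""
    coeffs = [sum(b_k * x_k for b_k, x_k in zip(b, x)) for b in basis]
    return [sum(c * b[k] for c, b in zip(coeffs, basis)) for k in range(len(x))]


def norm(v: Vector) -> float:
    return math.sqrt(sum(a * a for a in v))


def projection_features(x: Vector, basis: List[Vector]) -> Tuple[float, float]:
    """(cosine of the angle between x and its projection, feature norm)."""
    nx = norm(x)
    if nx == 0.0:
        return 0.0, 0.0
    # For an orthogonal projection p, cos(angle) = (x . p) / (|x| |p|) = |p| / |x|.
    return norm(project(x, basis)) / nx, nx


basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # hypothetical 2-D subspace of a 3-D feature space
print(projection_features([3.0, 4.0, 0.0], basis))  # lies inside the subspace
print(projection_features([0.0, 0.0, 2.0], basis))  # orthogonal to the subspace
```

A feature that lies mostly inside the class subspace has a cosine near 1, whereas a feature with a large out-of-subspace component has a smaller cosine, which is the kind of ID/OOD contrast the method exploits.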
WitheredLeaf: Finding Entity-Inconsistency Bugs with LLMs
Abstract
Originating from semantic bugs, Entity-Inconsistency Bugs (EIBs) involve misuse of syntactically valid yet incorrect program entities, such as variable identifiers and function names, which often have security implications. Unlike straightforward syntactic vulnerabilities, EIBs are subtle and can remain undetected for years. Traditional detection methods, such as static analysis and dynamic testing, often fall short due to the versatile and context-dependent nature of EIBs. However, with advancements in Large Language Models (LLMs) like GPT-4, we believe LLM-powered automatic EIB detection is becoming increasingly feasible through these models' semantic understanding abilities. This research first undertakes a systematic measurement of LLMs' capabilities in detecting EIBs, revealing that GPT-4, while promising, shows limited recall and precision that hinder its practical application. The primary problem lies in the model's tendency to focus on irrelevant code snippets devoid of EIBs. To address this, we introduce a novel, cascaded EIB detection system named WitheredLeaf, which leverages smaller, code-specific language models to filter out most negative cases and mitigate the problem, thereby significantly enhancing overall precision and recall. We evaluated WitheredLeaf on 154 Python and C GitHub repositories, each with over 1,000 stars, identifying 123 new flaws, 45% of which can be exploited to disrupt the program's normal operations. Of 69 submitted fixes, 27 have been successfully merged.
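The cascade idea, where a cheap model discards most negatives so the expensive model only sees likely candidates, can be sketched generically. Both stages below are hypothetical placeholders: a toy regex that flags calls passing the same identifier twice stands in for the smaller code-specific models, and a lambda stands in for the GPT-4 verdict. Nothing here reproduces WitheredLeaf's actual models or prompts.

```python
import re
from typing import Callable, List, Tuple


def cascaded_detect(
    snippets: List[str],
    cheap_score: Callable[[str], float],
    expensive_check: Callable[[str], bool],
    threshold: float = 0.5,
) -> Tuple[List[str], int]:
    """Return (flagged snippets, number of calls made to the expensive stage)."""
    flagged: List[str] = []
    calls = 0
    for snippet in snippets:
        # Stage 1: a cheap filter discards snippets that look like clear negatives.
        if cheap_score(snippet) < threshold:
            continue
        # Stage 2: only the survivors reach the expensive model.
        calls += 1
        if expensive_check(snippet):
            flagged.append(snippet)
    return flagged, calls


def repeated_argument_score(line: str) -> float:
    """Toy filter: 1.0 if a call passes the same identifier twice, e.g. memcpy(dst, dst, n)."""
    match = re.search(r"\(([^)]*)\)", line)
    if not match:
        return 0.0
    args = [a.strip() for a in match.group(1).split(",")]
    return 1.0 if len(args) != len(set(args)) else 0.0


snippets = ["memcpy(dst, dst, n);", "memcpy(dst, src, n);", "x = max(a, a);"]
flagged, calls = cascaded_detect(
    snippets,
    cheap_score=repeated_argument_score,
    expensive_check=lambda s: "memcpy" in s,  # placeholder for an LLM verdict
)
print(flagged, calls)
```

The design trade-off is that stage 1 must keep recall high, because anything it discards never reaches the stronger model, while the saving comes from how many snippets it filters out.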
Generative AI in Cybersecurity
Authors: Shivani Metta, Isaac Chang, Jack Parker, Michael P. Roman, Arturo F. Ehuan
Abstract
The dawn of Generative Artificial Intelligence (GAI), characterized by advanced models such as Generative Pre-trained Transformers (GPT) and other Large Language Models (LLMs), has been pivotal in reshaping the field of data analysis, pattern recognition, and decision-making processes. This surge in GAI technology has ushered in not only innovative opportunities for data processing and automation but has also introduced significant cybersecurity challenges. As GAI rapidly progresses, it outstrips the current pace of cybersecurity protocols and regulatory frameworks, leading to a paradox wherein the same innovations meant to safeguard digital infrastructures also enhance the arsenal available to cyber criminals. These adversaries, adept at swiftly integrating and exploiting emerging technologies, may utilize GAI to develop malware that is both more covert and adaptable, thus complicating traditional cybersecurity efforts. The acceleration of GAI presents an ambiguous frontier for cybersecurity experts, offering potent tools for threat detection and response, while concurrently providing cyber attackers with the means to engineer more intricate and potent malware. Through the joint efforts of Duke Pratt School of Engineering, Coalfire, and Safebreach, this research undertakes a meticulous analysis of how malicious agents are exploiting GAI to augment their attack strategies, emphasizing a critical issue for the integrity of future cybersecurity initiatives. The study highlights the critical need for organizations to proactively identify and develop more complex defensive strategies to counter the sophisticated employment of GAI in malware creation.
Language-Enhanced Latent Representations for Out-of-Distribution Detection in Autonomous Driving
Authors: Zhenjiang Mao, Dong-You Jhong, Ao Wang, Ivan Ruchkin
Abstract
Out-of-distribution (OOD) detection is essential in autonomous driving to determine when learning-based components encounter unexpected inputs. Traditional detectors typically use encoder models with fixed settings and thus lack effective human interaction capabilities. With the rise of large foundation models, multimodal inputs make it possible to use human language as a latent representation, enabling language-defined OOD detection. In this paper, we use the cosine similarity between image and text representations encoded by the multimodal model CLIP as a new representation that improves the transparency and controllability of latent encodings used for visual anomaly detection. We compare our approach with existing pre-trained encoders, which can only produce latent representations that are meaningless from the user's standpoint. Our experiments on realistic driving data show that the language-based latent representation outperforms the traditional representation of the vision encoder and improves detection performance when combined with standard representations.
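The language-based latent representation can be illustrated in a few lines: each coordinate is the cosine similarity between the image embedding and one text-prompt embedding, so every dimension has a human-readable meaning. The prompts, the toy 3-D embeddings (stand-ins for real CLIP outputs, which are much larger), and the concatenation with vision-encoder features are all hypothetical; the abstract does not give the authors' prompt set or combination method.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def language_latent(image_emb: Vector, prompt_embs: List[Vector]) -> List[float]:
    """One interpretable coordinate per text prompt: its cosine similarity to the image."""
    return [cosine(image_emb, t) for t in prompt_embs]


# Toy 3-D stand-ins for CLIP text embeddings of hypothetical prompts.
prompts = {
    "a photo of a clear road": [1.0, 0.0, 0.0],
    "a photo of a road in heavy fog": [0.0, 1.0, 0.0],
}
image = [0.6, 0.8, 0.0]  # toy stand-in for a CLIP image embedding
latent = language_latent(image, list(prompts.values()))
vision_features = [0.2, -0.1]        # hypothetical output of a standard vision encoder
combined = latent + vision_features  # concatenation, one possible way to combine them
print(dict(zip(prompts, latent)))
```

Because a user can read off, for example, "similarity to heavy fog", thresholds or detector decisions built on this vector are easier to inspect and adjust than those built on opaque encoder features.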
SOAR: Advancements in Small Body Object Detection for Aerial Imagery Using State Space Models and Programmable Gradients
Abstract
Small object detection in aerial imagery poses significant challenges in computer vision because small objects carry minimal information and are prone to being obscured by larger objects and background noise. Traditional methods using transformer-based models often face limitations stemming from the lack of specialized databases, which degrades their performance on objects of varying orientations and scales. This underscores the need for more adaptable, lightweight models. In response, this paper introduces two innovative approaches that significantly enhance detection and segmentation of small aerial objects. First, we explore the use of the SAHI framework on the newly introduced lightweight YOLO v9 architecture, which uses Programmable Gradient Information (PGI) to reduce the substantial information loss typically encountered in sequential feature extraction. Second, the paper employs the Vision Mamba model, which incorporates position embeddings for precise location-aware visual understanding, combined with a novel bidirectional State Space Model (SSM) for effective visual context modeling. This SSM harnesses the linear complexity of CNNs and the global receptive field of Transformers, making it particularly effective for remote sensing image classification. Our experimental results demonstrate substantial improvements in detection accuracy and processing efficiency, validating the applicability of these approaches for real-time small object detection across diverse aerial scenarios. This paper also discusses how these methodologies could serve as foundational models for future advances in aerial object recognition. The source code will be made publicly available.
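SAHI (Slicing Aided Hyper Inference) runs a detector on overlapping tiles so small objects occupy more pixels per inference, then maps detections back to the full image. The sketch below is a self-written illustration of that slicing idea, not the `sahi` package's API; the tile size and overlap are hypothetical, and merging duplicate detections from overlapping tiles (for example with non-maximum suppression) is omitted.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def slice_grid(width: int, height: int, tile_w: int, tile_h: int, overlap: float = 0.2) -> List[Box]:
    """Overlapping tiles covering the whole image; edge tiles are shifted inward to keep full size."""
    step_x = max(1, int(tile_w * (1 - overlap)))
    step_y = max(1, int(tile_h * (1 - overlap)))
    tiles: List[Box] = []
    y = 0
    while True:
        y1 = min(y + tile_h, height)
        y0 = max(0, y1 - tile_h)
        x = 0
        while True:
            x1 = min(x + tile_w, width)
            x0 = max(0, x1 - tile_w)
            tiles.append((x0, y0, x1, y1))
            if x1 >= width:
                break
            x += step_x
        if y1 >= height:
            break
        y += step_y
    return tiles


def to_image_coords(det: Box, tile: Box) -> Box:
    """Shift a detection predicted inside a tile back to full-image pixel coordinates."""
    dx, dy = tile[0], tile[1]
    return (det[0] + dx, det[1] + dy, det[2] + dx, det[3] + dy)


tiles = slice_grid(1000, 600, 512, 512, overlap=0.2)
print(len(tiles))  # 6 tiles: 3 columns x 2 rows
print(to_image_coords((10, 10, 20, 20), tiles[-1]))
```

Shifting the last tile in each row and column inward keeps every tile at the detector's input size, at the cost of a larger overlap at the image borders.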
Explainability Guided Adversarial Evasion Attacks on Malware Detectors
Abstract
As the security of Artificial Intelligence (AI) becomes paramount, research on crafting and inserting optimal adversarial perturbations has become increasingly critical. In the malware domain, adversarial sample generation relies heavily on the accuracy and placement of crafted perturbations, with the goal of evading a trained classifier. This work applies explainability techniques to enhance adversarial evasion attacks on a machine-learning-based Windows PE malware detector. The explainability tool identifies the regions of PE malware files that most significantly influence the decision of a given malware detector, so these same regions can be leveraged to inject adversarial perturbations for maximum efficiency. Profiling all PE malware file regions by their impact on the detector's decision enables an efficient strategy for identifying the optimal location for perturbation injection. The strategy should account for both the region's influence on the detector's decision and the sensitivity of the PE malware file's integrity to modifications of that region. To assess the utility of explainable AI in crafting adversarial samples of Windows PE malware, we use the DeepExplainer module of SHAP to determine the contribution of each region of PE malware to its detection by MalConv, a CNN-based malware detector. Furthermore, we analyze SHAP values at a finer granularity by subdividing each section of the Windows PE file into small subsections. We then perform an adversarial evasion attack on the subsections based on the SHAP values of their byte sequences.
Diabetic Retinopathy Detection Using Quantum Transfer Learning
Authors: Ankush Jain, Rinav Gupta, Jai Singhal
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Diabetic Retinopathy (DR), a prevalent complication in diabetes patients, can lead to vision impairment due to lesions formed on the retina. Detecting DR at an advanced stage often results in irreversible blindness. The traditional process of diagnosing DR from retinal fundus images by ophthalmologists is not only time-intensive but also expensive. While classical transfer learning models have been widely adopted for computer-aided detection of DR, their high maintenance costs can hinder their detection efficiency. In contrast, Quantum Transfer Learning offers a more effective solution to this challenge. This approach is notably advantageous because it operates on heuristic principles, making it highly optimized for the task. Our proposed methodology leverages this hybrid quantum transfer learning technique to detect DR. To construct our model, we use the APTOS 2019 Blindness Detection dataset, available on Kaggle. We employ the pre-trained classical neural networks ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-152, and Inception V3 for initial feature extraction. For the classification stage, we use a Variational Quantum Classifier. Our hybrid quantum model achieves an accuracy of 97% with ResNet-18. This demonstrates that quantum computing, when integrated with quantum machine learning, can perform tasks with a level of power and efficiency unattainable by classical computers alone. By harnessing these advanced technologies, we can significantly improve the detection and diagnosis of Diabetic Retinopathy, potentially saving many from the risk of blindness. Keywords: Diabetic Retinopathy, Quantum Transfer Learning, Deep Learning
Hierarchical mixture of discriminative Generalized Dirichlet classifiers
Abstract
This paper presents a discriminative classifier for compositional data. The classifier is based on the posterior distribution of the Generalized Dirichlet distribution, making it the discriminative counterpart of the Generalized Dirichlet mixture model. Moreover, following the mixture-of-experts paradigm, we propose a hierarchical mixture of this classifier. To learn the models' parameters, we use a variational approximation by deriving an upper bound for the Generalized Dirichlet mixture. To the best of our knowledge, this is the first time this bound has been proposed in the literature. Experimental results are presented for spam detection and color space identification.
Towards Green Communication: Soft Decoding Scheme for OOK Signals in Zero-Energy Devices
Authors: Ticao Zhang, Dennis Hui, Mehrnaz Afshang, Mohammad Mozaffari
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
The rapid growth of the Internet of Things (IoT) is expected to provide more intelligent and reliable communication services, with wider network coverage, massive connectivity, and low-cost solutions for 6G services. However, frequent charging and battery replacement for such a massive number of IoT devices pose a series of challenges. Zero-energy devices, which rely on energy-harvesting technologies and can operate without battery replacement or charging, play a pivotal role in enabling the massive deployment of IoT devices. To enable reliable communication for such low-power devices, Manchester-coded on-off keying (OOK) modulation and non-coherent detection are attractive techniques owing to their energy efficiency, robustness in noisy environments, and simple receiver design. Moreover, to extend their communication range, employing channel coding along with enhanced detection schemes is crucial. In this paper, a novel soft-decision decoder is designed for OOK-based low-power receivers to enhance their detection performance. In addition, exact closed-form expressions and two simplified approximations are derived for the log-likelihood ratio (LLR), an essential metric for soft decoding. Numerical results demonstrate the significant coverage gain achieved through soft decoding for convolutional codes.
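To show what an LLR for Manchester-coded OOK looks like, the sketch below uses a deliberately simplified model, not the paper's non-coherent receiver, whose exact closed-form LLRs are not reproduced here. Assumptions: bit 0 is sent as chips (A, 0) and bit 1 as (0, A) (the polarity convention is assumed), the receiver sees real-valued samples y1, y2 in additive white Gaussian noise with known standard deviation sigma, and the amplitude A is known. Expanding the two Gaussian likelihoods, the A² terms cancel and LLR = ln p(y|0)/p(y|1) = A (y1 - y2) / sigma².

```python
from typing import List


def manchester_ook(bits: List[int], amplitude: float = 1.0) -> List[float]:
    """Two chips per bit; assumed convention: 0 -> (on, off), 1 -> (off, on)."""
    chips: List[float] = []
    for b in bits:
        chips.extend((0.0, amplitude) if b else (amplitude, 0.0))
    return chips


def symbol_llr(y_first: float, y_second: float, amplitude: float, sigma: float) -> float:
    """ln p(y | b=0) / p(y | b=1) for one symbol under real AWGN with known amplitude."""
    # The amplitude**2 terms of the two Gaussian exponents cancel, leaving a linear statistic.
    return amplitude * (y_first - y_second) / sigma ** 2


def soft_demodulate(chips: List[float], amplitude: float, sigma: float) -> List[float]:
    return [symbol_llr(chips[k], chips[k + 1], amplitude, sigma) for k in range(0, len(chips), 2)]


bits = [0, 1, 1, 0]
llrs = soft_demodulate(manchester_ook(bits), amplitude=1.0, sigma=0.5)
hard = [0 if value >= 0 else 1 for value in llrs]  # the sign gives the hard decision
print(llrs, hard)
```

The sign of each LLR gives the hard decision, while its magnitude encodes reliability; a soft-decision decoder for a convolutional code (for example a soft-input Viterbi decoder) uses that magnitude, which is where the coverage gain over hard decisions comes from.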
FER-YOLO-Mamba: Facial Expression Detection and Classification Based on Selective State Space
Authors: Hui Ma, Sen Lei, Turgay Celik, Heng-Chao Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Facial Expression Recognition (FER) plays a pivotal role in understanding human emotional cues. However, traditional FER methods based on visual information have limitations: they rely on separate preprocessing, feature extraction, and multi-stage classification procedures, which not only increase computational complexity but also require significant computing resources. Because Convolutional Neural Network (CNN)-based FER schemes frequently fail to capture the deep, long-distance dependencies embedded in facial expression images, and Transformers have inherently quadratic computational complexity, this paper presents the FER-YOLO-Mamba model, which integrates the principles of Mamba and YOLO to efficiently coordinate facial expression recognition and localization. Within the FER-YOLO-Mamba model, we further devise a FER-YOLO-VSS dual-branch module, which combines the strengths of convolutional layers in local feature extraction with the ability of State Space Models (SSMs) to capture long-distance dependencies. To the best of our knowledge, this is the first Vision Mamba model designed for facial expression detection and classification. To evaluate the proposed FER-YOLO-Mamba model, we conducted experiments on two benchmark datasets, RAF-DB and SFEW. The experimental results indicate that the FER-YOLO-Mamba model outperforms the compared models. The code is available from https://github.com/SwjtuMa/FER-YOLO-Mamba.
SGHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Singapore
Authors: Ri Chi Ng, Nirmalendu Prakash, Ming Shan Hee, Kenny Tsu Wei Choo, Roy Ka-Wei Lee
Abstract
To address the limitations of current hate speech detection models, we introduce SGHateCheck, a novel framework designed for the linguistic and cultural context of Singapore and Southeast Asia. It extends the functional testing approach of HateCheck and MHC, employing large language models to translate and paraphrase test cases into Singapore's main languages and refining them with native annotators. SGHateCheck reveals critical flaws in state-of-the-art models, highlighting their inadequacy for sensitive content moderation. This work aims to foster the development of more effective hate speech detection tools for diverse linguistic environments, particularly in Singapore and Southeast Asia.
Detecting and Deterring Manipulation in a Cognitive Hierarchy
Authors: Nitay Alon, Lion Schulz, Joseph M. Barnby, Jeffrey S. Rosenschein, Peter Dayan
Subjects: Multiagent Systems (cs.MA); Computer Science and Game Theory (cs.GT)
Abstract
Social agents with finitely nested opponent models are vulnerable to manipulation by agents with deeper reasoning and more sophisticated opponent modelling. This imbalance, rooted in logic and the theory of recursive modelling frameworks, cannot be solved directly. We propose a computational framework, ℵ-IPOMDP, which augments model-based RL agents' Bayesian inference with an anomaly detection algorithm and an out-of-belief policy. Our mechanism allows agents to realize they are being deceived, even if they cannot understand how, and to deter opponents via a credible threat. We test this framework in both a mixed-motive and a zero-sum game. Our results show the effectiveness of the ℵ mechanism, leading to more equitable outcomes and less exploitation by more sophisticated agents. We discuss implications for AI safety, cybersecurity, cognitive science, and psychiatry.
An Onboard Framework for Staircases Modeling Based on Point Clouds
Authors: Chun Qing, Rongxiang Zeng, Xuan Wu, Yongliang Shi, Gan Ma
Abstract
Detecting traversable regions on staircases and modeling their physical attributes are pivotal to the mobility of legged robots. This paper presents an onboard framework for detecting traversable regions and modeling the physical attributes of staircases from point cloud data. To mitigate the influence of illumination variations and overfitting related to dataset diversity, a series of data augmentations is introduced to enhance the training of the base network. A curvature suppression cross-entropy (CSCE) loss is proposed to reduce prediction ambiguity at the boundary between traversable and non-traversable regions. Moreover, a measurement correction based on stair pose estimation is introduced to calibrate the raw modeling output, which is affected by tilted viewing perspectives. Lastly, we collect a staircase dataset and introduce new evaluation criteria. Through a series of rigorous experiments on this dataset, we demonstrate the superior accuracy and generalization capability of the proposed method. Code, models, and datasets will be available at https://github.com/szturobotics/Stair-detection-and-modeling-project.
Lightweight Change Detection in Heterogeneous Remote Sensing Images with Online All-Integer Pruning Training
Authors: Chengyang Zhang, Weiming Li, Gang Li, Huina Song, Zhaohui Song, Xueqian Wang, Antonio Plaza
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Detecting changes in heterogeneous remote sensing images is vital, especially in responding to emergencies such as earthquakes and floods. Current homogeneous transformation-based change detection (CD) methods often suffer from high computation and memory costs, which make them ill-suited to edge-computing devices such as onboard CD devices on satellites. To address this issue, this paper proposes a new lightweight CD method for heterogeneous remote sensing images that employs an online all-integer pruning (OAIP) training strategy to efficiently fine-tune the CD network using the current test data. The proposed CD network uses two visual geometry group (VGG) subnetworks as its backbone. First, in the OAIP-based training process, all weights, gradients, and intermediate data are quantized to integers to speed up training and reduce memory usage, and a per-layer block exponentiation scaling scheme is used to reduce the computation errors in network parameters caused by quantization. Second, an adaptive filter-level pruning method based on the L1-norm criterion is employed to further lighten the fine-tuning of the CD network. Experimental results show that the proposed OAIP-based method attains detection performance similar to that of state-of-the-art CD methods, with significantly reduced computational complexity and memory usage.
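The two lightweighting steps can be illustrated with a small stdlib sketch. The "per-layer block exponentiation scaling" is interpreted here, as an assumption, as one shared power-of-two exponent per layer (a common block-floating-point scheme), so that integer weights times 2**exp approximate the real weights; the paper's exact scheme, and its quantization of gradients and intermediate data, are not shown. The pruning function ranks filters by L1 norm as the abstract states, but uses a fixed hypothetical `keep_ratio` in place of the paper's adaptive rule.

```python
import math
from typing import List, Tuple


def block_exponent_quantize(weights: List[float], bits: int = 8) -> Tuple[List[int], int]:
    """Quantize one layer to signed integers that share a single power-of-two scale 2**exp."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return [0] * len(weights), 0
    # Smallest exponent with max_abs / 2**exp <= qmax, so the largest weight still fits.
    exp = math.ceil(math.log2(max_abs / qmax))
    scale = 2.0 ** exp
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, exp


def dequantize(q: List[int], exp: int) -> List[float]:
    return [v * 2.0 ** exp for v in q]


def l1_filter_prune(filters: List[List[float]], keep_ratio: float = 0.5) -> List[int]:
    """Indices of filters to keep, ranked by L1 norm (largest first), returned in original order."""
    ranked = sorted(((sum(abs(w) for w in f), i) for i, f in enumerate(filters)), reverse=True)
    n_keep = max(1, round(len(filters) * keep_ratio))
    return sorted(i for _, i in ranked[:n_keep])


q, exp = block_exponent_quantize([0.5, -1.0, 0.25, 0.0])
print(q, exp, dequantize(q, exp))
print(l1_filter_prune([[1.0, -1.0], [0.1, 0.1], [3.0, 0.0], [-0.5, 0.5]], keep_ratio=0.5))
```

A power-of-two scale means rescaling reduces to bit shifts on integer hardware, which fits the abstract's goal of integer-only training on edge devices.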
An Attention Based Pipeline for Identifying Pre-Cancer Lesions in Head and Neck Clinical Images
Authors: Abdullah Alsalemi, Anza Shakeel, Mollie Clark, Syed Ali Khurram, Shan E Ahmed Raza
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Early detection of cancer can improve patient prognosis by enabling early intervention. Head and neck cancer is diagnosed in specialist centres after a surgical biopsy; however, lesions may be missed, leading to delayed diagnosis. To overcome these challenges, we present an attention-based pipeline that identifies suspected lesions, segments them, and classifies them as non-dysplastic, dysplastic, or cancerous. We propose (a) a vision transformer based Mask R-CNN network for lesion detection and segmentation of clinical images, and (b) a Multiple Instance Learning (MIL) based scheme for classification. Current results show that the segmentation model produces segmentation masks and bounding boxes with up to 82% overlap accuracy on unseen external test data, surpassing the reviewed segmentation benchmarks. The classification model achieves an F1-score of 85% on the internal cohort test set. An app has been developed to perform lesion segmentation on images taken with a smart device. Future work involves employing endoscopic video data for precise early detection and prognosis.
Adversarial Botometer: Adversarial Analysis for Social Bot Detection
Abstract
Social bots play a significant role in many online social networks (OSNs) as they imitate human behavior. This raises difficult questions about their capabilities and potential risks. Given the recent advances in generative AI (GenAI), social bots are capable of producing highly realistic and complex content that mimics human creativity. As malicious social bots emerge to deceive people with unrealistic content, identifying them and distinguishing the content they produce has become a real challenge for numerous social platforms. Several approaches to this problem have already been proposed in the literature, but the proposed solutions have not been widely evaluated. To address this issue, we evaluate the behavior of a text-based bot detector in a competitive environment under three scenarios. First, we examine the tug-of-war between a bot and a bot detector, analyzing which party is more likely to prevail and which circumstances influence the outcome. To this end, we model the problem as a synthetic adversarial game in which a conversational bot and a bot detector engage in strategic online interactions. Second, we evaluate the bot detection model under attack examples generated by a social bot: we poison the dataset with these attack examples and measure the model's performance under this condition. Finally, to investigate the impact of the dataset, we perform a cross-domain analysis. Through a comprehensive evaluation of different categories of social bots on two benchmark datasets, we obtain findings that could be utilized in future work.
Are We in The Zone? Exploring The Features and Method of Detecting Simultaneous Flow Experiences Based on EEG Signals
Abstract
When team members execute interdependent personal tasks for a shared team purpose, simultaneous individual flow (simultaneous flow) is the antecedent condition for achieving shared team flow. Detecting simultaneous flow helps to better understand the status of team members and is therefore important for optimizing multi-user interaction systems. However, objective features and methods for detecting simultaneous flow remain largely unexplored. Based on the brain mechanisms of flow in teamwork and previous studies on electroencephalogram (EEG)-based individual flow detection, this study aims to explore the significant EEG features related to simultaneous flow, as well as effective detection methods based on EEG signals. First, we design a two-player simultaneous flow task and use it to construct the first multi-EEG signal dataset of simultaneous flow. Then, we explore EEG signal features that may be related to individual and simultaneous flow and validate their effectiveness in simultaneous flow detection with various machine learning models. The results show that 1) inter-brain synchrony features are relevant to simultaneous flow, as they enhance the models' performance in detecting different types of simultaneous flow; 2) features from the frontal lobe area appear to deserve priority attention when detecting simultaneous flow; and 3) Random Forests performed best in binary classification, while Neural Network and Deep Neural Network3 performed best in ternary classification.
Advancing Pre-trained Teacher: Towards Robust Feature Discrepancy for Anomaly Detection
Authors: Canhui Tang, Sanping Zhou, Yizhe Li, Yonghao Dong, Le Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
With the wide application of knowledge distillation between an ImageNet pre-trained teacher model and a learnable student model, industrial anomaly detection has made significant progress in the past few years. The success of knowledge distillation mainly relies on maintaining the feature discrepancy between the teacher and student models, under two assumptions: (1) the teacher model can jointly represent two different distributions for the normal and abnormal patterns, while (2) the student model can only reconstruct the normal distribution. However, maintaining these ideal assumptions in practice remains challenging. In this paper, we propose a simple yet effective two-stage industrial anomaly detection framework, termed AAND, which sequentially performs Anomaly Amplification and Normality Distillation to obtain a robust feature discrepancy. In the first (anomaly amplification) stage, we propose a novel Residual Anomaly Amplification (RAA) module to advance the pre-trained teacher encoder. With exposure to synthetic anomalies, it amplifies anomalies via residual generation while maintaining the integrity of the pre-trained model. It mainly comprises a Matching-guided Residual Gate and an Attribute-scaling Residual Generator, which determine the residuals' proportion and characteristics, respectively. In the second (normality distillation) stage, we employ a reverse distillation paradigm to train a student decoder, in which a novel Hard Knowledge Distillation (HKD) loss is built to better facilitate the reconstruction of normal patterns. Comprehensive experiments on the MvTecAD, VisA, and MvTec3D-RGB datasets show that our method achieves state-of-the-art performance.
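In teacher-student anomaly detection, the feature discrepancy is commonly turned into an anomaly map. A typical choice is one minus the cosine similarity between teacher and student features at each spatial position. The following is a minimal standard-library sketch of that generic scoring step. It assumes the features arrive as one vector per position, and it takes the maximum as the image-level score. The layout and the function names are illustrative assumptions, not AAND's code.

```python
import math
from typing import List

Vector = List[float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; a tiny epsilon guards against zero-norm vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)

def anomaly_map(teacher: List[Vector], student: List[Vector]) -> List[float]:
    """Per-position discrepancy: about 0 where the student matches the teacher, up to 2 when opposite."""
    return [1.0 - cosine(t, s) for t, s in zip(teacher, student)]

def image_score(teacher: List[Vector], student: List[Vector]) -> float:
    """Image-level anomaly score, taken here as the maximum per-position discrepancy."""
    return max(anomaly_map(teacher, student))
```

A student that reconstructs only normal patterns should match the teacher on normal regions and diverge on anomalous ones. The high values of this map therefore localize the defects.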
Probablistic Restoration with Adaptive Noise Sampling for 3D Human Pose Estimation
Abstract
The accuracy and robustness of 3D human pose estimation (HPE) are limited by 2D pose detection errors and the ill-posed nature of 2D-to-3D lifting, which has drawn great attention to Multi-Hypothesis HPE (MH-HPE) research. Most existing MH-HPE methods are based on generative models, which are computationally expensive and difficult to train. In this study, we propose a Probabilistic Restoration 3D Human Pose Estimation framework (PRPose) that can be integrated with any lightweight single-hypothesis model. Specifically, PRPose employs a weakly supervised approach to fit the hidden probability distribution of the 2D-to-3D lifting process in the single-hypothesis HPE model. It then reverse-maps this distribution to the 2D pose input through an adaptive noise sampling strategy to effectively generate reasonable multi-hypothesis samples. Extensive experiments on 3D HPE benchmarks (Human3.6M and MPI-INF-3DHP) highlight the effectiveness and efficiency of PRPose. Code is available at: https://github.com/xzhouzeng/PRPose.
Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models
Authors: Alessandro Pianese, Davide Cozzolino, Giovanni Poggi, Luisa Verdoliva
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
Abstract
Generalization is a major issue for current audio deepfake detectors, which struggle to provide reliable results on out-of-distribution data. Given the speed at which increasingly accurate synthesis methods are developed, it is very important to design techniques that also work well on data they were not trained on. In this paper we study the potential of large-scale pre-trained models for audio deepfake detection, with special focus on generalization ability. To this end, the detection problem is reformulated in a speaker verification framework, and fake audio is exposed by the mismatch between the voice sample under test and the voice of the claimed identity. With this paradigm, no fake speech sample is needed in training. This cuts off any link with the generation method at the root and ensures full generalization ability. Features are extracted by general-purpose large pre-trained models, with no need for training or fine-tuning on specific fake detection or speaker verification datasets. At detection time, only a limited set of voice fragments of the identity under test is required. Experiments on several datasets widely used in the community show that detectors based on pre-trained models achieve excellent performance and strong generalization ability. They rival supervised methods on in-distribution data and largely outperform them on out-of-distribution data.
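The verification-based reformulation can be sketched as a simple decision rule. A frozen pre-trained model embeds the test clip and a few genuine reference clips of the claimed speaker, and the clip is flagged as fake when it is not similar enough to any reference. In this hypothetical sketch the embeddings are assumed to be precomputed and the score is the maximum cosine similarity to the references. The threshold of 0.7 is an arbitrary placeholder that would need calibration, and the paper's actual scoring rule may differ.

```python
import math
from typing import Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def identity_score(test_emb: Sequence[float],
                   reference_embs: Sequence[Sequence[float]]) -> float:
    """Highest similarity between the test clip and any genuine clip of the claimed speaker."""
    return max(cosine(test_emb, r) for r in reference_embs)

def is_fake(test_emb: Sequence[float],
            reference_embs: Sequence[Sequence[float]],
            threshold: float = 0.7) -> bool:
    # Hypothetical threshold: low similarity to the claimed identity suggests synthetic speech.
    return identity_score(test_emb, reference_embs) < threshold
```

This rule never uses fake speech at any point, which mirrors the point made in the abstract: the detector cannot overfit to the artifacts of any particular generator.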
Impact of emoji exclusion on the performance of Arabic sarcasm detection models
Authors: Ghalyah H. Aleryani, Wael Deabes, Khaled Albishre, Alaa E. Abdel-Hakim
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
Detecting sarcasm in Arabic speech on social media is a complex challenge, made harder by the diversity of the language and the nature of sarcastic expressions. Existing models fall significantly short in their ability to interpret sarcasm in Arabic, which calls for more sophisticated and precise detection methods. In this paper, we investigate the impact of a fundamental preprocessing component on sarcasm detection. Emojis play a crucial role in compensating for the absence of body language and facial expressions in modern communication. However, their impact on automated text analysis, particularly in sarcasm detection, remains underexplored. We investigate the impact of excluding emojis from datasets on the performance of sarcasm detection models for Arabic social media content, Arabic being a language with an exceptionally rich vocabulary. This investigation includes the adaptation and enhancement of pre-trained AraBERT models, specifically by excluding emojis, to improve sarcasm detection capabilities. We use AraBERT pre-training to refine the specified models, demonstrating that the removal of emojis can significantly boost the accuracy of sarcasm detection. This approach facilitates a more refined interpretation of language by eliminating the potential confusion introduced by non-textual elements. Through the focused strategy of emoji removal, the evaluated AraBERT models adeptly navigate the complexities of Arabic sarcasm. This study establishes new benchmarks in Arabic natural language processing and offers valuable insights for social media platforms.
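The preprocessing step studied here removes emojis before fine-tuning, and it can be sketched with a regular expression over common emoji code-point ranges. The abstract does not specify the authors' exact filtering. The ranges below are therefore an illustrative, incomplete assumption, and a production pipeline would more likely rely on a maintained emoji list. The Arabic script block (U+0600 to U+06FF) lies outside every range used, so Arabic letters are left untouched.

```python
import re

# Illustrative (not exhaustive) emoji-related Unicode ranges; an assumption, not the paper's list.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001F5FF"  # miscellaneous symbols and pictographs
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "\U0001F900-\U0001F9FF"  # supplemental symbols and pictographs
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F"                 # variation selector-16 (emoji presentation)
    "\u200D"                 # zero-width joiner inside multi-emoji sequences
    "]+"
)

def remove_emojis(text: str) -> str:
    """Replace emoji runs with a space, then collapse the leftover whitespace."""
    no_emoji = EMOJI_PATTERN.sub(" ", text)
    return " ".join(no_emoji.split())
```

Replacing each emoji run with a space rather than deleting it avoids gluing together the words on either side of it.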
Subgraph2vec: A random walk-based algorithm for embedding knowledge graphs
Abstract
Graphs are an important data representation that occurs naturally in real-world applications \cite{goyal2018graph}. Analyzing graphs therefore provides users with better insights in areas such as anomaly detection \cite{ma2021comprehensive}, decision making \cite{fan2023graph}, clustering \cite{tsitsulin2023graph}, and classification \cite{wang2021mixup}. However, most of these analysis methods require considerable computational time and space. Embedding is one way to reduce these costs. Knowledge graph (KG) embedding is a technique that aims to obtain a vector representation of a KG: it represents the entities and relations of a KG in a low-dimensional space while preserving their semantic meaning. There are different methods for embedding graphs, including random walk-based methods such as node2vec, metapath2vec, and regpattern2vec. However, most of these methods bias the walks according to a rigid pattern that is usually hard-coded in the algorithm. In this work, we introduce subgraph2vec for embedding KGs, in which walks are run inside a user-defined subgraph. We use this embedding for link prediction and show that our method outperforms the previous ones in most cases.
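The core idea is to run random walks only inside a user-defined subgraph of the KG. In this sketch the subgraph is assumed to be specified by a set of allowed relation types. Each walk alternates entity and relation tokens, as many KG walk-based embeddings do. The function names and this way of defining the subgraph are assumptions for illustration, not subgraph2vec's actual interface. As in node2vec, the resulting walks would then be fed to a skip-gram model to learn the embeddings.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)

def build_subgraph(triples: Iterable[Triple],
                   allowed_relations: Set[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Adjacency list keeping only edges whose relation belongs to the user-defined subgraph."""
    adj: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for head, rel, tail in triples:
        if rel in allowed_relations:
            adj[head].append((rel, tail))
    return adj

def random_walk(adj: Dict[str, List[Tuple[str, str]]], start: str,
                length: int, rng: random.Random) -> List[str]:
    """Walk of at most `length` hops; stops early at a node with no allowed out-edges."""
    walk = [start]
    node = start
    for _ in range(length):
        neighbors = adj.get(node)  # .get does not insert missing keys into the defaultdict
        if not neighbors:
            break
        rel, node = rng.choice(neighbors)
        walk.extend([rel, node])
    return walk
```

Restricting the edge set up front lets the user change the walk bias by changing data, not code. This contrasts with the hard-coded patterns the abstract criticizes.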
Keyword: face recognition
There is no result
Keyword: augmentation
CodeFort: Robust Training for Code Generation Models
Abstract
Code generation models are not robust to small perturbations, which often lead to inconsistent and incorrect generations and significantly degrade their performance. Improving the robustness of code generation models is crucial for a better user experience when these models are deployed in real-world applications. However, existing efforts have not addressed this issue for code generation models. To fill this gap, we propose CodeFort, a framework that improves the robustness of code generation models in two ways. It generalizes a large variety of code perturbations to enrich the training data. It also enables various robust training strategies, including data augmentation, batch augmentation, adversarial logits pairing, and contrastive learning, all carefully designed to support high-throughput training. Extensive evaluations show that we improve the average robust pass rate of baseline CodeGen models from 14.79 to 21.74. Notably, the improvement in robustness against code-syntax perturbations is evidenced by a significant decrease in the pass-rate drop, from 95.04% to 53.35%.
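The abstract does not spell out CodeFort's perturbations. As an assumed example of a semantics-preserving code perturbation that could enrich training data, the sketch below renames variables with Python's standard `ast` module (`ast.unparse` needs Python 3.9 or later). The class, the function, and the mapping are hypothetical. Renaming globals or builtins blindly would break the program, so the mapping should cover only local names.

```python
import ast
from typing import Dict

class RenameVariables(ast.NodeTransformer):
    """Rename variable uses and function parameters according to a mapping."""

    def __init__(self, mapping: Dict[str, str]):
        self.mapping = mapping

    def visit_Name(self, node: ast.Name) -> ast.Name:
        if node.id in self.mapping:
            node.id = self.mapping[node.id]
        self.generic_visit(node)
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        if node.arg in self.mapping:
            node.arg = self.mapping[node.arg]
        self.generic_visit(node)  # also visits any type annotation
        return node

def rename_perturbation(source: str, mapping: Dict[str, str]) -> str:
    """Return a perturbed but behaviorally equivalent version of `source`."""
    tree = RenameVariables(mapping).visit(ast.parse(source))
    return ast.unparse(tree)
```

A robust model should produce equivalent completions for the original and the perturbed prompt. Pairs like these are the raw material for the augmentation and contrastive strategies listed above.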
Tabular Embedding Model (TEM): Finetuning Embedding Models For Tabular RAG Applications
Authors: Sujit Khanna, Shishir Subedi
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
Abstract
Recently, Large Language Models have exhibited tremendous capabilities, especially in mathematics, code generation, and general-purpose reasoning. However, in specialized domains, especially in applications that require parsing and analyzing large chunks of numeric or tabular data, even state-of-the-art (SOTA) models struggle. In this paper, we introduce a new approach to solving domain-specific tabular data analysis tasks by presenting a unique RAG workflow that mitigates the scalability issues of existing tabular LLM solutions. Specifically, we present the Tabular Embedding Model (TEM), a novel approach to fine-tuning embedding models for tabular Retrieval-Augmented Generation (RAG) applications. Embedding models form a crucial component of the RAG workflow. Even current SOTA embedding models struggle here, because they are predominantly trained on textual datasets and thus underperform in scenarios involving complex tabular data. The evaluation results show that our approach not only outperforms current SOTA embedding models in this domain but also does so with a notably smaller and more efficient model structure.
Improving Disease Detection from Social Media Text via Self-Augmentation and Contrastive Learning
Authors: Pervaiz Iqbal Khan, Andreas Dengel, Sheraz Ahmed
Abstract
Detecting diseases from social media has diverse applications, such as public health monitoring and disease spread detection. While language models (LMs) have shown promising performance in this domain, there remains ongoing research aimed at refining their discriminating representations. In this paper, we propose a novel method that integrates Contrastive Learning (CL) with language modeling to address this challenge. Our approach introduces a self-augmentation method, wherein hidden representations of the model are augmented with their own representations. This method comprises two branches: the first branch, a traditional LM, learns features specific to the given data, while the second branch incorporates augmented representations from the first branch to encourage generalization. CL further refines these representations by pulling pairs of original and augmented versions closer while pushing other samples away. We evaluate our method on three NLP datasets encompassing binary, multi-label, and multi-class classification tasks involving social media posts related to various diseases. Our approach demonstrates notable improvements over traditional fine-tuning methods, achieving up to a 2.48% increase in F1-score compared to baseline approaches and a 2.1% enhancement over state-of-the-art methods.
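The contrastive step pulls each original representation toward its augmented counterpart and pushes it away from the other samples in the batch. Objectives of this kind are commonly implemented as an InfoNCE-style cross-entropy over similarity scores. The following is a minimal standard-library sketch that assumes cosine similarity, one-directional matching, and a temperature of 0.1. All three are illustrative choices, and the paper's exact loss may differ.

```python
import math
from typing import List

def _normalize(v: List[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def info_nce(originals: List[List[float]], augmented: List[List[float]],
             temperature: float = 0.1) -> float:
    """Mean cross-entropy of matching original i to its own augmented view i."""
    z1 = [_normalize(v) for v in originals]
    z2 = [_normalize(v) for v in augmented]
    total = 0.0
    for i, a in enumerate(z1):
        # Similarity of original i to every augmented view in the batch.
        logits = [sum(x * y for x, y in zip(a, b)) / temperature for b in z2]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the positive pair
    return total / len(z1)
```

The loss is small when each original is closest to its own augmented view and large when views are matched to the wrong samples. Minimizing it therefore encourages representations that survive the self-augmentation.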
ATNPA: A Unified View of Oversmoothing Alleviation in Graph Neural Networks
Abstract
Oversmoothing is a commonly observed challenge in graph neural network (GNN) learning: as layers increase, the embedding features learned by GNNs quickly become similar or indistinguishable, making them incapable of differentiating network proximity. A GNN with a shallow architecture can only learn short-range relations or localized structural information. This limits its power to learn long-range connections, as evidenced by its inferior performance on heterophilous graphs. Tackling oversmoothing is therefore crucial for harnessing deep architectures in GNNs. To date, many methods have been proposed to alleviate oversmoothing. The vast differences in their design principles, combined with the complexity of graphs, make it difficult to understand and even compare how they tackle oversmoothing. In this paper, we propose ATNPA, a unified view with five key steps (Augmentation, Transformation, Normalization, Propagation, and Aggregation) to summarize GNN oversmoothing alleviation approaches. We first outline three themes for tackling oversmoothing and then separate all methods into six categories. We follow this with detailed reviews of representative methods, covering their relation to ATNPA and discussing their niche, strengths, and weaknesses. The review not only provides an in-depth understanding of existing methods in the field but also lays out a clear road map for future study.
Adapting Self-Supervised Learning for Computational Pathology
Authors: Eric Zimmermann, Neil Tenenholtz, James Hall, George Shaikovski, Michal Zelechowski, Adam Casson, Fausto Milletari, Julian Viret, Eugene Vorontsov, Siqi Liu, Kristen Severson
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Self-supervised learning (SSL) has emerged as a key technique for training networks that can generalize well to diverse tasks without task-specific supervision. This property makes SSL desirable for computational pathology, the study of digitized images of tissues, as there are many target applications and often limited labeled training samples. However, SSL algorithms and models have been primarily developed in the field of natural images and whether their performance can be improved by adaptation to particular domains remains an open question. In this work, we present an investigation of modifications to SSL for pathology data, specifically focusing on the DINOv2 algorithm. We propose alternative augmentations, regularization functions, and position encodings motivated by the characteristics of pathology images. We evaluate the impact of these changes on several benchmarks to demonstrate the value of tailored approaches.
Long Tail Image Generation Through Feature Space Augmentation and Iterated Learning
Authors: Rafael Elberg, Denis Parra, Mircea Petrache
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Image and multimodal machine learning tasks are very challenging to solve in the case of poorly distributed data. In particular, data availability and privacy restrictions exacerbate these hurdles in the medical domain. The state of the art in image generation quality is held by Latent Diffusion models, making them prime candidates for tackling this problem. However, a few key issues still need to be solved, such as the difficulty in generating data from under-represented classes and a slow inference process. To mitigate these issues, we propose a new method for image augmentation in long-tailed data based on leveraging the rich latent space of pre-trained Stable Diffusion Models. We create a modified separable latent space to mix head and tail class examples. We build this space via Iterated Learning of underlying sparsified embeddings, which we apply to task-specific saliency maps via a K-NN approach. Code is available at https://github.com/SugarFreeManatee/Feature-Space-Augmentation-and-Iterated-Learning
Towards Neural Synthesis for SMT-Assisted Proof-Oriented Programming
Authors: Saikat Chakraborty, Gabriel Ebner, Siddharth Bhat, Sarah Fakhoury, Sakina Fatima, Shuvendu Lahiri, Nikhil Swamy
Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Abstract
Proof-oriented programs mix computational content with proofs of program correctness. However, the human effort involved in programming and proving is still substantial, despite the use of Satisfiability Modulo Theories (SMT) solvers to automate proofs in languages such as F*. Seeking to spur research on using AI to automate the construction of proof-oriented programs, we curate a dataset of 600K lines of open-source F* programs and proofs. These include software used in production systems ranging from Windows and Linux to Python and Firefox. Our dataset includes around 32K top-level F* definitions, each representing a type-directed program and proof synthesis problem: producing a definition given a formal specification expressed as an F* type. We provide a program-fragment checker that queries F* to check the correctness of candidate solutions. We believe this is the largest corpus of SMT-assisted program proofs coupled with a reproducible program-fragment checker. Grounded in this dataset, we investigate the use of AI to synthesize programs and their proofs in F*, with promising results. Our main finding is that the performance of fine-tuned smaller language models (such as Phi-2 or StarCoder) compares favorably with that of large language models (such as GPT-4), at a much lower computational cost. We also identify various type-based retrieval augmentation techniques and find that they boost performance significantly. With detailed error analysis and case studies, we identify potential strengths and weaknesses of models and techniques and suggest directions for future improvements.
Creation of Novel Soft Robot Designs using Generative AI
Abstract
Soft robotics has emerged as a promising field with the potential to revolutionize industries such as healthcare and manufacturing. However, designing effective soft robots presents challenges, particularly in managing the complex interplay of material properties, structural design, and control strategies. Traditional design methods are often time-consuming and may not yield optimal designs. In this paper, we explore the use of generative AI to create 3D models of soft actuators. We create a dataset of over 70 text-shape pairings of soft pneumatic robot actuator designs, and adapt a latent diffusion model (SDFusion) to learn the data distribution and generate novel designs from it. By employing transfer learning and data augmentation techniques, we significantly improve the performance of the diffusion model. These findings highlight the potential of generative AI in designing complex soft robotic systems, paving the way for future advancements in the field.
An Onboard Framework for Staircases Modeling Based on Point Clouds
Authors: Chun Qing, Rongxiang Zeng, Xuan Wu, Yongliang Shi, Gan Ma
Abstract
The detection of traversable regions on staircases and their physical modeling are pivotal to the mobility of legged robots. This paper presents an onboard framework tailored to detecting traversable regions and modeling the physical attributes of staircases from point cloud data. A series of data augmentations is introduced to enhance the training of the fundamental network. These augmentations mitigate the influence of illumination variations and the overfitting related to dataset diversity. A curvature suppression cross-entropy (CSCE) loss is proposed to reduce prediction ambiguity on the boundary between traversable and non-traversable regions. Moreover, a measurement correction based on the pose estimation of stairs is introduced to calibrate the raw modeling output, which is affected by tilted perspectives. Lastly, we collect a dataset of staircases and introduce new evaluation criteria. Through a series of rigorous experiments on this dataset, we demonstrate the superior accuracy and generalization capability of our proposed method. Code, models, and datasets will be available at https://github.com/szturobotics/Stair-detection-and-modeling-project.
On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning?
Authors: Maxime Zanella, Ismail Ben Ayed
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The development of large vision-language models, notably CLIP, has catalyzed research into effective adaptation techniques, with a particular focus on soft prompt tuning. In parallel, test-time augmentation, which uses multiple augmented views of a single image to enhance zero-shot generalization, is emerging as a significant area of interest. This has predominantly directed research efforts toward test-time prompt tuning. In contrast, we introduce a robust MeanShift for Test-time Augmentation (MTA), which surpasses prompt-based methods without requiring this intensive training procedure. This positions MTA as an ideal solution for both standalone and API-based applications. Additionally, our method does not rely on the ad hoc rules (e.g., confidence thresholds) used in some previous test-time augmentation techniques to filter the augmented views. Instead, MTA incorporates a quality-assessment variable for each view, termed the inlierness score, directly into its optimization process. This score is jointly optimized with a density mode-seeking process, leading to an efficient training- and hyperparameter-free approach. We extensively benchmark our method on 15 datasets and demonstrate MTA's superiority and computational efficiency. Easily deployed as a plug-and-play module on top of zero-shot models and state-of-the-art few-shot methods, MTA shows systematic and consistent improvements.
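MTA's central operation is density mode seeking over the embeddings of the augmented views. The sketch below shows only a plain weighted mean-shift iteration with a Gaussian kernel, in which fixed per-view weights stand in for the inlierness scores. MTA optimizes those scores jointly with the mode, and that joint optimization is not reproduced here. The bandwidth, the iteration count, and the function name are assumptions for illustration.

```python
import math
from typing import List

def weighted_mean_shift(views: List[List[float]], weights: List[float],
                        bandwidth: float = 1.0, iters: int = 50) -> List[float]:
    """Move an estimate toward the densest region of the weighted view embeddings."""
    dim = len(views[0])
    total_w = sum(weights)
    # Start from the weighted average of all views.
    mode = [sum(w * v[d] for w, v in zip(weights, views)) / total_w for d in range(dim)]
    for _ in range(iters):
        # Gaussian kernel weight of each view relative to the current mode estimate.
        k = [w * math.exp(-sum((a - b) ** 2 for a, b in zip(v, mode)) / (2 * bandwidth ** 2))
             for w, v in zip(weights, views)]
        s = sum(k)
        # New estimate: kernel-weighted average of the views.
        mode = [sum(ki * v[d] for ki, v in zip(k, views)) / s for d in range(dim)]
    return mode
```

Unlike a plain average of the views, the mode largely ignores an outlying augmented view. This is the behavior that lets MTA avoid ad hoc confidence thresholds.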
Keyword: detection
Analysing software failure using runtime verification and LTL
Early-stage detection of cognitive impairment by hybrid quantum-classical algorithm using resting-state functional MRI time-series
Software Mention Recognition with a Three-Stage Framework Based on BERTology Models at SOMD 2024
HateTinyLLM : Hate Speech Detection Using Tiny Large Language Models
Large Language Model Agent for Fake News Detection
Improving Disease Detection from Social Media Text via Self-Augmentation and Contrastive Learning
A probabilistic estimation of remaining useful life from censored time-to-event data
Out-of-distribution detection based on subspace projection of high-dimensional features output by the last convolutional layer
WitheredLeaf: Finding Entity-Inconsistency Bugs with LLMs
Generative AI in Cybersecurity
Language-Enhanced Latent Representations for Out-of-Distribution Detection in Autonomous Driving
SOAR: Advancements in Small Body Object Detection for Aerial Imagery Using State Space Models and Programmable Gradients
Explainability Guided Adversarial Evasion Attacks on Malware Detectors
Diabetic Retinopathy Detection Using Quantum Transfer Learning
Hierarchical mixture of discriminative Generalized Dirichlet classifiers
Towards Green Communication: Soft Decoding Scheme for OOK Signals in Zero-Energy Devices
FER-YOLO-Mamba: Facial Expression Detection and Classification Based on Selective State Space
SGHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Singapore
Detecting and Deterring Manipulation in a Cognitive Hierarchy
An Onboard Framework for Staircases Modeling Based on Point Clouds
Lightweight Change Detection in Heterogeneous Remote Sensing Images with Online All-Integer Pruning Training
An Attention Based Pipeline for Identifying Pre-Cancer Lesions in Head and Neck Clinical Images
Adversarial Botometer: Adversarial Analysis for Social Bot Detection
Are We in The Zone? Exploring The Features and Method of Detecting Simultaneous Flow Experiences Based on EEG Signals
Advancing Pre-trained Teacher: Towards Robust Feature Discrepancy for Anomaly Detection
Probablistic Restoration with Adaptive Noise Sampling for 3D Human Pose Estimation
Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models
Impact of emoji exclusion on the performance of Arabic sarcasm detection models
Subgraph2vec: A random walk-based algorithm for embedding knowledge graphs
Keyword: face recognition
There is no result
Keyword: augmentation
CodeFort: Robust Training for Code Generation Models
Tabular Embedding Model (TEM): Finetuning Embedding Models For Tabular RAG Applications
Improving Disease Detection from Social Media Text via Self-Augmentation and Contrastive Learning
ATNPA: A Unified View of Oversmoothing Alleviation in Graph Neural Networks
Adapting Self-Supervised Learning for Computational Pathology
Long Tail Image Generation Through Feature Space Augmentation and Iterated Learning
Towards Neural Synthesis for SMT-Assisted Proof-Oriented Programming
Creation of Novel Soft Robot Designs using Generative AI
An Onboard Framework for Staircases Modeling Based on Point Clouds
On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning?