Abstract
Detecting faults in steel plates is crucial for ensuring the safety and reliability of structures and industrial equipment. Early detection of faults can prevent further damage and costly repairs. This chapter aims to diagnose and predict the likelihood of steel plates developing faults using experimental text data. Various machine learning methods, such as GWO-based and FDO-based MLP and CMLP models, are tested to classify steel plates as either faulty or non-faulty. The experiments produced promising results for all models, with similar accuracy and performance. However, the FDO-based MLP and CMLP models consistently achieved the best results, with 100% accuracy on all tested datasets. The other models' outcomes varied from one experiment to another. The findings indicate that models that employed FDO as a learning algorithm have the potential to achieve higher accuracy, at the cost of a slightly longer runtime, compared with the other algorithms. In conclusion, early detection of faults in steel plates is critical for maintaining safety and reliability, and machine learning techniques can help predict and diagnose these faults accurately.
Enhancing Credit Card Fraud Detection: A Neural Network and SMOTE Integrated Approach
Abstract
Credit card fraud detection is a critical challenge in the financial sector, demanding sophisticated approaches to accurately identify fraudulent transactions. This research proposes an innovative methodology combining Neural Networks (NN) and Synthetic Minority Over-sampling Technique (SMOTE) to enhance the detection performance. The study addresses the inherent imbalance in credit card transaction data, focusing on technical advancements for robust and precise fraud detection. Results demonstrate that the integration of NN and SMOTE exhibits superior precision, recall, and F1-score compared to traditional models, highlighting its potential as an advanced solution for handling imbalanced datasets in credit card fraud detection scenarios. This research contributes to the ongoing efforts to develop effective and efficient mechanisms for safeguarding financial transactions from fraudulent activities.
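SMOTE balances a dataset by creating new minority-class points on the line segments between a minority sample and one of its k nearest minority-class neighbours. The sketch below is a minimal stdlib-only illustration of that interpolation step, not the paper's pipeline; the function name `smote` and its `k` and `seed` parameters are our own choices.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic minority-class points via SMOTE-style interpolation."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # k nearest minority-class neighbours of each minority sample, excluding itself
    neighbours = []
    for i, point in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(point, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # uniform in [0, 1): position along the segment i -> j
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


# Toy example: four fraud cases in a 2-D feature space
fraud = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_fraud = smote(fraud, n_new=20, k=2)
```

Oversampling is normally applied to the training split only, so that synthetic points do not leak into evaluation. For real workloads, the maintained `SMOTE` class in imbalanced-learn is preferable to a hand-rolled version.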
SegNet: A Segmented Deep Learning based Convolutional Neural Network Approach for Drones Wildfire Detection
Authors: Aditya V. Jonnalagadda, Hashim A. Hashim
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Abstract
This research addresses the pressing challenge of improving processing times and detection capabilities for global wildfire detection in Unmanned Aerial Vehicle (UAV)/drone imagery, despite limited datasets. We propose a Segmented Neural Network (SegNet) selection approach that reduces feature maps to improve both time resolution and accuracy, significantly advancing processing speed and accuracy in real-time wildfire detection. This paper contributes increased processing speeds that enable real-time wildfire detection, increased wildfire detection accuracy, and improved early wildfire detection, by proposing a new direction for image classification of amorphous objects such as fire, water, and smoke. We employ Convolutional Neural Networks (CNNs) for image classification, with an emphasis on reducing irrelevant features, which is vital for deep learning processes, especially on live feed data for fire detection. Amid the complexity of live feed data in fire detection, our study focuses on the image feed, highlighting the urgency of enhancing real-time processing. Our proposed algorithm combats feature overload through segmentation, addressing challenges arising from diverse features such as objects, colors, and textures. Notably, a delicate balance between feature map size and dataset adequacy is pivotal. Several research papers use smaller image sizes, which compromises feature richness and necessitates a new approach. We illuminate the critical role of pixel density in retaining essential details, especially for early wildfire detection. By carefully selecting the number of filters during training, we underscore the significance of higher pixel density for proper feature selection. The proposed SegNet approach is rigorously evaluated on a real-world dataset obtained from a drone flight and compared with the state-of-the-art literature.
Research and application of artificial intelligence based webshell detection model: A literature review
Abstract
Webshells, as the "culprit" behind numerous network attacks, are one of the research hotspots in the field of cybersecurity. However, the complexity, stealthiness, and confusing nature of webshells pose significant challenges to the corresponding detection schemes. With the rise of Artificial Intelligence (AI) technology, researchers have started to apply different intelligent algorithms and neural network architectures to the task of webshell detection. However, related research still lacks a systematic and standardized methodological process, which leaves the literature confusing and redundant. Therefore, following the development timeline, we carefully summarize the progress of relevant research in this field, dividing it into three stages: Start Stage, Initial Development Stage, and In-depth Development Stage. We further elaborate on the main characteristics and core algorithms of each stage. In addition, we analyze the pain points and challenges that still exist in this field and predict its future development trends from our point of view. To the best of our knowledge, this is the first review that details the research related to AI-based webshell detection. We also hope that this paper can provide detailed technical information for researchers interested in AI-based webshell detection tasks.
Successive Interference Cancellation for ISAC in a Large Full-Duplex Cellular Network
Abstract
To reuse scarce spectrum efficiently, we study a large full-duplex cellular network with integrated sensing and communication (ISAC). Monostatic detection at the base station (BS) is considered. The BS receives two signals: the communication-mode uplink signal to be decoded and the radar-mode signal to be detected. After self-interference cancellation (SIC), successive interference cancellation (SuIC), inspired by NOMA, is a natural strategy at the BS to retrieve both signals. However, the SuIC ordering, usually based on some measure of channel strength, is not clear because the radar-mode target is unknown. The detection signal suffers a double path loss, making it vulnerable, but the uplink signal to be decoded originates at a user with much lower transmit power than the BS, making it weak as well. Further, the intercell interference from a large network reduces the channel disparity between the two signals. We investigate the impact of both SuIC orders at the BS, i.e., decoding $1^{st}$ or detecting $1^{st}$, and highlight the importance of careful order selection. We find that there exists a threshold target distance below which detecting $1^{st}$ is superior and decoding $2^{nd}$ does not suffer much. Beyond this distance, decoding $1^{st}$ and detecting $2^{nd}$ is superior. Similarly, there exists a threshold UE power beyond which the optimum SuIC order changes. We also consider imperfect SIC, which highlights the vulnerability of decoding and detection in this setup.
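The trade-off between the two SuIC orders can be seen in a deliberately simplified, deterministic toy model: the radar echo decays with a double path loss, whichever signal is processed first treats the other as interference, and the second signal sees only intercell interference plus noise. This is not the paper's stochastic-geometry analysis. The perfect removal of the first signal, the constant `interference` term, the `residual_si` term for imperfect SIC, and the max-min selection rule in `better_order` are all our own illustrative assumptions.

```python
def echo_power(p_bs, target_distance, alpha=4.0, gain=1.0):
    """Radar echo received at the BS: double path loss, i.e. distance ** (-2 * alpha)."""
    return p_bs * gain * target_distance ** (-2 * alpha)


def suic_sinrs(p_ue, p_echo, interference, noise, residual_si=0.0):
    """Per-signal SINRs for both SuIC orders, assuming the first signal is removed perfectly."""
    # residual_si is a hypothetical stand-in for imperfect self-interference cancellation
    floor = interference + noise + residual_si
    return {
        # Decode the uplink first (echo acts as interference), then detect the echo
        "decode_first": {"comm": p_ue / (p_echo + floor), "radar": p_echo / floor},
        # Detect the echo first (uplink acts as interference), then decode the uplink
        "detect_first": {"radar": p_echo / (p_ue + floor), "comm": p_ue / floor},
    }


def better_order(p_ue, p_echo, interference, noise, residual_si=0.0):
    """Pick the order whose weaker signal has the higher SINR (an illustrative max-min rule)."""
    sinrs = suic_sinrs(p_ue, p_echo, interference, noise, residual_si)
    return max(sinrs, key=lambda order: min(sinrs[order].values()))


# Sweep the target distance: the preferred order flips once the echo becomes weak enough.
for d in (1.0, 2.0, 4.0, 8.0):
    print(d, better_order(1e-3, echo_power(1.0, d), 1e-4, 1e-4))
```

In this toy setting a nearby target produces a strong echo that swamps the weak uplink, so detecting first wins; as the target moves away the echo collapses quickly and decoding first becomes preferable, which qualitatively mirrors the threshold behaviour described above.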
Graph Neural Network Approach to Semantic Type Detection in Tables
Authors: Ehsan Hoseinzade, Ke Wang
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Abstract
This study addresses the challenge of detecting semantic column types in relational tables, a key task in many real-world applications. While language models like BERT have improved prediction accuracy, their token input constraints limit the simultaneous processing of intra-table and inter-table information. We propose a novel approach using Graph Neural Networks (GNNs) to model intra-table dependencies, allowing language models to focus on inter-table information. Our proposed method not only outperforms existing state-of-the-art algorithms but also offers novel insights into the utility and functionality of various GNN types for semantic type detection. The code is available at https://github.com/hoseinzadeehsan/GAIT
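The abstract does not spell out the GNN architecture. As a generic illustration only, the sketch below runs one round of mean-aggregation message passing over the columns of a single table, treating every column as connected to every other column of that table, so each column's representation absorbs intra-table context. The function `message_passing_step`, the scalar weights `w_self` and `w_neigh`, and the input column embeddings (for example, vectors produced by a language model) are hypothetical.

```python
def message_passing_step(column_embeddings, w_self=1.0, w_neigh=1.0):
    """One round of mean-aggregation message passing over a table's columns.

    Every column is linked to every other column of the same table, so each
    node receives the mean embedding of the remaining columns.
    """
    n = len(column_embeddings)
    dim = len(column_embeddings[0])
    updated = []
    for i, h in enumerate(column_embeddings):
        others = [column_embeddings[j] for j in range(n) if j != i]
        if others:
            mean = [sum(vec[d] for vec in others) / len(others) for d in range(dim)]
        else:
            mean = [0.0] * dim  # a single-column table has no neighbours
        # ReLU(w_self * h_i + w_neigh * mean of neighbours)
        updated.append([max(0.0, w_self * h[d] + w_neigh * mean[d]) for d in range(dim)])
    return updated


# Three columns of one table, each with a toy 2-D embedding
columns = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = message_passing_step(columns)
```

A trained model would replace the scalar weights with learned matrices and stack several rounds; the point here is only how intra-table information flows between columns.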
Greater benefits of deep learning-based computer-aided detection systems for finding small signals in 3D volumetric medical images
Authors: Devi Klein, Srijita Karmakar, Aditya Jonnalagadda, Craig K. Abbey, Miguel P. Eckstein
Abstract
Purpose: Radiologists are tasked with visually scrutinizing large amounts of data produced by 3D volumetric imaging modalities. Small signals can go unnoticed during the 3D search because they are hard to detect in the visual periphery. Recent advances in machine learning and computer vision have led to effective computer-aided detection (CADe) support systems with the potential to mitigate perceptual errors. Approach: Sixteen non-expert observers searched through digital breast tomosynthesis (DBT) phantoms and single cross-sectional slices of the DBT phantoms. The 3D/2D searches occurred with and without a convolutional neural network (CNN)-based CADe support system. The model provided observers with bounding boxes superimposed on the image stimuli while they looked for a small microcalcification signal and a large mass signal. Eye gaze positions were recorded and correlated with changes in the area under the ROC curve (AUC). Results: The CNN-CADe improved the 3D search for the small microcalcification signal (delta AUC = 0.098, p = 0.0002) and the 2D search for the large mass signal (delta AUC = 0.076, p = 0.002). The CNN-CADe benefit in 3D for the small signal was markedly greater than in 2D (delta delta AUC = 0.066, p = 0.035). Analysis of individual differences suggests that those who explored the least with eye movements benefited the most from the CNN-CADe (r = -0.528, p = 0.036). However, for the large signal, the 2D benefit was not significantly greater than the 3D benefit (delta delta AUC = 0.033, p = 0.133). Conclusion: The CNN-CADe brings unique performance benefits to the 3D (vs. 2D) search of small signals by reducing errors caused by the under-exploration of the volumetric data.
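A delta AUC is simply the AUC with the CADe aid minus the AUC without it. One common nonparametric way to compute an AUC from observer confidence ratings is the Wilcoxon-Mann-Whitney statistic: the probability that a signal-present trial receives a higher rating than a signal-absent trial, with ties counting one half. The sketch below assumes that estimator; the abstract does not state which AUC estimator the study used, and the ratings shown are made up.

```python
def auc_wmw(signal_scores, noise_scores):
    """Nonparametric AUC: P(rating on signal-present trial > rating on signal-absent trial)."""
    wins = 0.0
    for s in signal_scores:
        for n in noise_scores:
            # A strict win counts 1, a tie counts 1/2, a loss counts 0
            wins += 1.0 if s > n else 0.5 if s == n else 0.0
    return wins / (len(signal_scores) * len(noise_scores))


# Hypothetical ratings from one observer, with and without the CADe aid
auc_with = auc_wmw([0.9, 0.8, 0.4], [0.3, 0.5, 0.1])
auc_without = auc_wmw([0.6, 0.4, 0.3], [0.5, 0.2, 0.3])
delta_auc = auc_with - auc_without
```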
Logical analysis and contradiction detection in high-level requirements during the review process using sat-solver
Abstract
DO-178C stands out as a guiding standard for aviation system development processes. The standard requires that the consistency of requirements be ensured during the software verification process and treats this as a mandatory element. The main objective of this study is to introduce a method for analyzing and identifying inconsistencies between high-level requirements using information obtained from a data dictionary. The method transforms high-level requirements into logical expressions and then examines them thoroughly with a SAT solver to detect inconsistencies. While methods for identifying inconsistencies among requirements often appear in the literature, this study presents a novel approach to detecting contradictions between requirements that are not written in natural language but are systematically structured and language-independent. The goal of this approach is to significantly reduce the review time of high-level requirements in the software verification process. Evaluations indicate that the method yields substantial time savings in the inconsistency detection process.
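The core idea, turning requirements into Boolean formulas and asking whether they can all hold at once, can be sketched with a brute-force satisfiability check. One subtlety is that implications such as "if A then B" are vacuously satisfied when A is false, so the sketch checks each requirement with its trigger forced true. The requirement names, variables, and the `triggered_conflicts` helper are hypothetical, and enumerating all assignments is only viable for a handful of variables; a real tool would hand the formulas to a SAT solver instead.

```python
from itertools import product


def satisfiable(constraints, variables):
    """Brute-force SAT check: return a satisfying assignment, or None if none exists."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(c(env) for c in constraints):
            return env
    return None


def triggered_conflicts(requirements, variables):
    """Names of requirements whose trigger cannot occur without violating the whole set."""
    # Each requirement 'premise -> conclusion' is encoded as (not premise) or conclusion.
    encoded = [lambda e, p=p, c=c: (not p(e)) or c(e) for p, c in requirements.values()]
    conflicts = []
    for name, (premise, _) in requirements.items():
        if satisfiable(encoded + [premise], variables) is None:
            conflicts.append(name)
    return conflicts


# Hypothetical variables taken from a data dictionary, and three high-level requirements
variables = ["landing_mode", "speed_high", "flaps_extended"]
requirements = {
    "HLR-1": (lambda e: e["speed_high"], lambda e: not e["flaps_extended"]),
    "HLR-2": (lambda e: e["landing_mode"], lambda e: e["flaps_extended"]),
    "HLR-3": (lambda e: e["landing_mode"], lambda e: e["speed_high"]),
}
conflicting = triggered_conflicts(requirements, variables)
```

Here HLR-2 and HLR-3 are reported: once `landing_mode` holds, they force both extended flaps and high speed, which HLR-1 forbids.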
Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly
Authors: Hang Du, Sicheng Zhang, Binzhu Xie, Guoshun Nan, Jiayang Zhang, Junrui Xu, Hangyu Liu, Sicong Leng, Jiangming Liu, Hehe Fan, Dajiu Huang, Jing Feng, Linli Chen, Can Zhang, Xuhuan Li, Hao Zhang, Jianhang Chen, Qimei Cui, Xiaofeng Tao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Video anomaly understanding (VAU) aims to automatically comprehend unusual occurrences in videos, thereby enabling various applications such as traffic surveillance and industrial manufacturing. While existing VAU benchmarks primarily concentrate on anomaly detection and localization, we focus on practicality, which prompts us to raise the following crucial questions: "what anomaly occurred?", "why did it happen?", and "how severe is this abnormal event?". In pursuit of these answers, we present a comprehensive benchmark for Causation Understanding of Video Anomaly (CUVA). Specifically, each instance of the proposed benchmark involves three sets of human annotations to indicate the "what", "why" and "how" of an anomaly, including 1) anomaly type, start and end times, and event descriptions, 2) natural language explanations for the cause of an anomaly, and 3) free text reflecting the effect of the abnormality. In addition, we introduce MMEval, a novel evaluation metric designed to better align with human preferences for CUVA, facilitating the measurement of how well existing LLMs comprehend the underlying cause and corresponding effect of video anomalies. Finally, we propose a novel prompt-based method that can serve as a baseline approach for the challenging CUVA. We conduct extensive experiments to show the superiority of our evaluation metric and the prompt-based approach. Our code and dataset are available at https://github.com/fesvhtr/CUVA.
Towards End-to-End Semi-Supervised Table Detection with Semantic Aligned Matching Transformer
Abstract
Table detection within document images is a crucial task in document processing, involving the identification and localization of tables. Recent strides in deep learning have substantially improved the accuracy of this task, but it still heavily relies on large labeled datasets for effective training. Several semi-supervised approaches have emerged to overcome this challenge, often employing CNN-based detectors with anchor proposals and post-processing techniques like non-maximal suppression (NMS). However, recent advancements in the field have shifted the focus towards transformer-based techniques, eliminating the need for NMS and emphasizing object queries and attention mechanisms. Previous research has focused on two key areas to improve transformer-based detectors: refining the quality of object queries and optimizing attention mechanisms. However, increasing object queries can introduce redundancy, while adjustments to the attention mechanism can increase complexity. To address these challenges, we introduce a semi-supervised approach employing SAM-DETR, a novel approach for precise alignment between object queries and target features. Our approach demonstrates remarkable reductions in false positives and substantial enhancements in table detection performance, particularly in complex documents characterized by diverse table structures. This work provides more efficient and accurate table detection in semi-supervised settings.
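For readers unfamiliar with the post-processing step that transformer-based detectors remove, the sketch below shows standard greedy non-maximum suppression: keep the highest-scoring box, discard boxes that overlap it by more than an IoU threshold, and repeat. It is a generic illustration, not code from the paper; the `(x1, y1, x2, y2)` box format and the 0.5 threshold are assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two near-duplicate table proposals and one separate table
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

Because the threshold is a hand-tuned heuristic, set-prediction detectors in the DETR family aim to make this step unnecessary.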
Synthetic Image Verification in the Era of Generative AI: What Works and What Isn't There Yet
Authors: Diangarti Tariang, Riccardo Corvi, Davide Cozzolino, Giovanni Poggi, Koki Nagano, Luisa Verdoliva
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this work we present an overview of approaches for the detection and attribution of synthetic images and highlight their strengths and weaknesses. We also point out and discuss hot topics in this field and outline promising directions for future research.
STT: Stateful Tracking with Transformers for Autonomous Driving
Abstract
Tracking objects in three-dimensional space is critical for autonomous driving. To ensure safety while driving, the tracker must be able to reliably track objects across frames and accurately estimate their current states, such as velocity and acceleration. Existing works frequently focus on the association task while either neglecting the model performance on state estimation or deploying complex heuristics to predict the states. In this paper, we propose STT, a Stateful Tracking model built with Transformers, that can consistently track objects in the scenes while also predicting their states accurately. STT consumes rich appearance, geometry, and motion signals through a long-term history of detections and is jointly optimized for both data association and state estimation tasks. Since the standard tracking metrics like MOTA and MOTP do not capture the combined performance of the two tasks in the wider spectrum of object states, we extend them with new metrics called S-MOTA and MOTPS that address this limitation. STT achieves competitive real-time performance on the Waymo Open Dataset.
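For reference, the standard CLEAR-MOT accuracy that STT extends is MOTA = 1 - (FN + FP + IDSW) / GT, with misses, false positives, identity switches, and ground-truth objects each summed over all frames. The sketch below computes only this baseline metric; S-MOTA and MOTPS are not defined in the abstract and are not reproduced. The per-frame dictionary layout is our own assumption.

```python
def mota(frames):
    """CLEAR-MOT accuracy: 1 - (misses + false positives + ID switches) / ground-truth objects."""
    fn = sum(f["fn"] for f in frames)      # missed ground-truth objects
    fp = sum(f["fp"] for f in frames)      # spurious tracker outputs
    idsw = sum(f["idsw"] for f in frames)  # identity switches
    gt = sum(f["gt"] for f in frames)      # ground-truth objects
    return 1.0 - (fn + fp + idsw) / gt


# Two hypothetical frames with ten ground-truth objects each
frames = [
    {"gt": 10, "fn": 1, "fp": 2, "idsw": 0},
    {"gt": 10, "fn": 0, "fp": 1, "idsw": 1},
]
score = mota(frames)
```

Note that MOTA only counts association and detection errors; it says nothing about how accurately velocity or acceleration were estimated, which is the gap the abstract points to.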
CodeHalu: Code Hallucinations in LLMs Driven by Execution-based Verification
Authors: Yuchen Tian, Weixiang Yan, Qian Yang, Qian Chen, Wen Wang, Ziyang Luo, Lei Ma
Subjects: Computation and Language (cs.CL); Software Engineering (cs.SE)
Abstract
Large Language Models (LLMs) have made significant advancements in the field of code generation, offering unprecedented support for automated programming and assisting developers. However, LLMs sometimes generate code that appears plausible but fails to meet the expected requirements or executes incorrectly. This phenomenon of hallucination in code generation has not been explored. To advance the community's understanding of and research on code hallucinations in LLMs, we propose an execution-verification-based method for defining these hallucinations and introduce the concept of code hallucinations for the first time. We categorize code hallucinations into four main types: mapping, naming, resource, and logic hallucinations, each further divided into different subcategories to better understand and address the unique challenges faced by LLMs during code generation. To systematically evaluate code hallucinations, we propose a dynamic detection algorithm for code hallucinations and construct the CodeHalu benchmark, which includes 8,883 samples from 699 tasks, to actively detect hallucination phenomena in LLMs during programming. We tested 16 popular LLMs on this benchmark to evaluate the frequency and nature of their hallucinations during code generation. The findings reveal significant variations in the accuracy and reliability of LLMs in generating code, highlighting the urgent need to improve models and training methods to ensure the functional correctness and safety of automatically generated code. This study not only classifies and quantifies code hallucinations but also provides insights for future improvements in LLM-based code generation research. The CodeHalu benchmark and code are publicly available at https://github.com/yuchen814/CodeHalu.
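Execution-based verification means judging generated code by running it rather than by reading it. The sketch below runs a candidate function against test cases and records, per case, whether it passed, returned a wrong value, or raised an exception (with the exception type). It only illustrates the general idea; the helper name `verify_by_execution` and the outcome labels are hypothetical, and it does not reproduce CodeHalu's detection algorithm or its mapping of failures onto the four hallucination types.

```python
def verify_by_execution(source, func_name, test_cases):
    """Run generated code against (args, expected) test cases and record each outcome."""
    namespace = {}
    try:
        exec(source, namespace)  # WARNING: only execute untrusted code inside a sandbox
        func = namespace[func_name]
    except Exception as exc:  # syntax errors, missing function, import failures, ...
        return [("load_error", type(exc).__name__)] * len(test_cases)

    results = []
    for args, expected in test_cases:
        try:
            actual = func(*args)
        except Exception as exc:
            results.append(("runtime_error", type(exc).__name__))
            continue
        results.append(("pass", None) if actual == expected else ("wrong_output", repr(actual)))
    return results


# A plausible-looking but incomplete LLM answer for "average of a list, 0.0 if empty"
generated = "def mean(xs):\n    return sum(xs) / len(xs)\n"
report = verify_by_execution(
    generated, "mean", [(([1, 2, 3],), 2.0), (([1, 2],), 2.0), (([],), 0.0)]
)
```

The report separates a logically wrong result on the second case from a crash on the empty list, which is the kind of distinction a taxonomy of code hallucinations builds on.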
The Reversing Machine: Reconstructing Memory Assumptions
Authors: Mohammad Sina Karvandi, Soroush Meghdadizanjani, Sima Arasteh, Saleh Khalaj Monfared, Mohammad K. Fallah, Saeid Gorgin, Jeong-A Lee, Erik van der Kouwe
Abstract
Existing anti-malware software and reverse engineering toolkits struggle with stealthy sub-OS rootkits due to limitations of run-time kernel-level monitoring. A malicious kernel-level driver can bypass OS-level anti-virus mechanisms easily. Although static analysis of such malware is possible, obfuscation and packing techniques complicate offline analysis. Moreover, current dynamic analyzers suffer from virtualization performance overhead and create detectable traces that allow modern malware to evade them. To address these issues, we present The Reversing Machine (TRM), a new hypervisor-based memory introspection design for reverse engineering, reconstructing memory offsets, and fingerprinting evasive and obfuscated user-level and kernel-level malware. TRM proposes two novel techniques that enable efficient and transparent analysis of evasive malware: hooking a binary using suspended process creation for hypervisor-based memory introspection, and leveraging Mode-Based Execution Control (MBEC) to detect user/kernel mode transitions and memory access patterns. Unlike existing malware detection environments, TRM can extract full memory traces in user and kernel spaces and hook the entire target memory map to reconstruct arrays, structures within the operating system, and possible rootkits. We perform TRM-assisted reverse engineering of kernel-level structures and show that it can speed up manual reverse engineering by 75% on average. We obfuscate known malware with the latest packing tools and successfully perform similarity detection. Furthermore, we demonstrate a real-world attack by deploying a modified rootkit onto a driver that bypasses state-of-the-art security auditing tools. We show that TRM can detect each threat and that, out of 24 state-of-the-art AV solutions, only TRM can detect the most advanced threats.
Exploring Self-Supervised Vision Transformers for Deepfake Detection: A Comparative Analysis
Authors: Huy H. Nguyen, Junichi Yamagishi, Isao Echizen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This paper investigates the effectiveness of self-supervised pre-trained transformers compared to supervised pre-trained transformers and conventional neural networks (ConvNets) for detecting various types of deepfakes. We focus on their potential for improved generalization, particularly when training data is limited. Despite the notable success of large vision-language models utilizing transformer architectures in various tasks, including zero-shot and few-shot learning, the deepfake detection community has still shown some reluctance to adopt pre-trained vision transformers (ViTs), especially large ones, as feature extractors. One concern is their perceived excessive capacity, which often demands extensive data, and the resulting suboptimal generalization when training or fine-tuning data is small or less diverse. This compares unfavorably with ConvNets, which have already established themselves as robust feature extractors. Additionally, training and optimizing transformers from scratch requires significant computational resources, making this accessible primarily to large companies and hindering broader investigation within the academic community. Recent advancements in using self-supervised learning (SSL) in transformers, such as DINO and its derivatives, have showcased significant adaptability across diverse vision tasks and possess explicit semantic segmentation capabilities. By leveraging DINO for deepfake detection with modest training data and implementing partial fine-tuning, we observe comparable adaptability to the task and the natural explainability of the detection result via the attention mechanism. Moreover, partial fine-tuning of transformers for deepfake detection offers a more resource-efficient alternative, requiring significantly fewer computational resources.
Visual and audio scene classification for detecting discrepancies in video: a baseline method and experimental protocol
Authors: Konstantinos Apostolidis, Jakob Abesser, Luca Cuccovillo, Vasileios Mezaris
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract
This paper presents a baseline approach and an experimental protocol for a specific content verification problem: detecting discrepancies between the audio and video modalities in multimedia content. We first design and optimize an audio-visual scene classifier, to compare with existing classification baselines that use both modalities. Then, by applying this classifier separately to the audio and the visual modality, we can detect scene-class inconsistencies between them. To facilitate further research and provide a common evaluation platform, we introduce an experimental protocol and a benchmark dataset simulating such inconsistencies. Our approach achieves state-of-the-art results in scene classification and promising outcomes in audio-visual discrepancy detection, highlighting its potential in content verification applications.
CofiPara: A Coarse-to-fine Paradigm for Multimodal Sarcasm Target Identification with Large Multimodal Models
Abstract
Social media abounds with multimodal sarcasm, and identifying sarcasm targets is particularly challenging due to the implicit incongruity not directly evident in the text and image modalities. Current methods for Multimodal Sarcasm Target Identification (MSTI) predominantly focus on superficial indicators in an end-to-end manner, overlooking the nuanced understanding of multimodal sarcasm conveyed through both the text and image. This paper proposes a versatile MSTI framework with a coarse-to-fine paradigm, by augmenting sarcasm explainability with reasoning and pre-training knowledge. Inspired by the powerful capacity of Large Multimodal Models (LMMs) on multimodal reasoning, we first engage LMMs to generate competing rationales for coarser-grained pre-training of a small language model on multimodal sarcasm detection. We then propose fine-tuning the model for finer-grained sarcasm target identification. Our framework is thus empowered to adeptly unveil the intricate targets within multimodal sarcasm and mitigate the negative impact posed by potential noise inherent in LMMs. Experimental results demonstrate that our model far outperforms state-of-the-art MSTI methods and also exhibits marked explainability in deciphering sarcasm.
Certified Adversarial Robustness of Machine Learning-based Malware Detectors via (De)Randomized Smoothing
Authors: Daniel Gibert, Luca Demetrio, Giulio Zizzo, Quan Le, Jordi Planes, Battista Biggio
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Abstract
Deep learning-based malware detection systems are vulnerable to adversarial EXEmples: carefully crafted malicious programs that evade detection with minimal perturbation. As such, the community is dedicating effort to developing mechanisms to defend against adversarial EXEmples. However, current randomized smoothing-based defenses are still vulnerable to attacks that inject blocks of adversarial content. In this paper, we introduce a certifiable defense against patch attacks that guarantees, for a given executable and an adversarial patch size, that no adversarial EXEmple exists. Our method is inspired by (de)randomized smoothing, which provides deterministic robustness certificates. During training, a base classifier is trained using subsets of contiguous bytes. At inference time, our defense splits the executable into non-overlapping chunks, classifies each chunk independently, and computes the final prediction through majority voting to minimize the influence of injected content. Furthermore, we introduce a preprocessing step that fixes the size of the sections and headers to a multiple of the chunk size. As a consequence, the injected content is confined to an integer number of chunks without tampering with the other chunks containing the real bytes of the input examples, allowing us to extend our certified robustness guarantees to content insertion attacks. We perform an extensive ablation study by comparing our defense with randomized smoothing-based defenses against a plethora of content manipulation attacks and neural network architectures. Results show that our method exhibits unmatched robustness against strong content-insertion attacks, outperforming randomized smoothing-based defenses in the literature.
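The chunk-and-vote inference step, and why it yields a certificate, can be sketched as follows. A contiguous patch of p bytes overlaps at most ceil((p - 1) / c) + 1 non-overlapping chunks of size c, and each overlapped chunk can move at most one vote from the winning class to the runner-up, so the prediction cannot flip if the vote margin exceeds twice that number. This counting argument is the usual (de)randomized smoothing reasoning as we understand it; the paper's exact certification rule and tie-breaking may differ. The byte-level `toy_classifier` is a hypothetical stand-in for the trained base classifier.

```python
import math
from collections import Counter


def split_chunks(data: bytes, chunk_size: int):
    """Split an executable's bytes into non-overlapping chunks."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def max_chunks_touched(patch_size: int, chunk_size: int) -> int:
    """Upper bound on how many non-overlapping chunks a contiguous patch can overlap."""
    return math.ceil((patch_size - 1) / chunk_size) + 1


def certified_predict(data, classify_chunk, chunk_size, patch_size):
    """Majority vote over chunks, plus a flag saying whether the vote is certified."""
    votes = Counter(classify_chunk(c) for c in split_chunks(data, chunk_size))
    ranked = votes.most_common()
    top_label, top_count = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    delta = max_chunks_touched(patch_size, chunk_size)
    # Each touched chunk can move one vote from the winner to the runner-up.
    certified = top_count - runner_up > 2 * delta
    return top_label, certified


def toy_classifier(chunk: bytes) -> int:
    """Hypothetical base classifier: label 1 (malicious) if the marker byte 'X' appears."""
    return 1 if b"X" in chunk else 0


sample = b"X" * 40 + b"a" * 8  # 10 'malicious' chunks and 2 benign chunks of 4 bytes
label, is_certified = certified_predict(sample, toy_classifier, chunk_size=4, patch_size=4)
```

With a margin of 8 votes, the toy sample stays certified against 4-byte patches (at most 2 chunks touched) but not against 12-byte patches (up to 4 chunks touched).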
Abstract
As a natural extension to the standard conformal prediction method, several conformal risk control methods have recently been developed and applied to various learning problems. In this work, we seek to control the conformal risk in expectation for ordinal classification tasks, which have broad applications to many real problems. For this purpose, we first formulate the ordinal classification task in the conformal risk control framework and provide theoretical risk bounds for the risk control method. Then we propose two types of loss functions specifically designed for ordinal classification tasks and develop corresponding algorithms to determine the prediction set for each case so that their risks are controlled at a desired level. We demonstrate the effectiveness of our proposed methods and analyze the difference between the two types of risks on three different datasets: a simulated dataset, the UTKFace dataset, and the diabetic retinopathy detection dataset.
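Conformal risk control picks a threshold λ from a calibration set of size n so that the expected loss of the resulting prediction sets stays below a target α. For a loss bounded by B that is non-increasing in λ, the standard rule selects the smallest λ with (n / (n + 1)) · R̂(λ) + B / (n + 1) ≤ α, where R̂(λ) is the average calibration loss. The sketch below applies that rule to one possible ordinal-aware construction: a contiguous label interval grown from the predicted mode until its probability mass reaches λ, scored by a normalised distance from the true label to the interval (so B = 1). The interval rule, the distance loss, and the grid search are our illustrative choices, not the two losses proposed in the paper.

```python
def ordinal_interval(probs, lam):
    """Contiguous label interval grown from the mode until its probability mass reaches lam."""
    lo = hi = max(range(len(probs)), key=probs.__getitem__)
    mass = probs[lo]
    while mass < lam and (lo > 0 or hi < len(probs) - 1):
        left = probs[lo - 1] if lo > 0 else -1.0
        right = probs[hi + 1] if hi < len(probs) - 1 else -1.0
        if right >= left:  # extend towards the more probable neighbouring label
            hi += 1
            mass += probs[hi]
        else:
            lo -= 1
            mass += probs[lo]
    return lo, hi


def distance_loss(y, interval, num_classes):
    """Distance from the true label to the interval, normalised to [0, 1] (so B = 1)."""
    lo, hi = interval
    gap = lo - y if y < lo else (y - hi if y > hi else 0)
    return gap / (num_classes - 1)


def calibrate_lambda(cal_probs, cal_labels, alpha, grid_size=1000):
    """Smallest grid lambda satisfying the conformal risk control bound with B = 1."""
    n = len(cal_labels)
    k = len(cal_probs[0])
    for step in range(grid_size + 1):
        lam = step / grid_size
        risk = sum(
            distance_loss(y, ordinal_interval(p, lam), k) for p, y in zip(cal_probs, cal_labels)
        ) / n
        if n / (n + 1) * risk + 1 / (n + 1) <= alpha:
            return lam
    return 1.0  # fallback: widest sets; the bound may be unattainable for tiny n


# Hypothetical calibration set: 20 samples over 4 ordered classes
cal_probs = [[0.1, 0.6, 0.2, 0.1]] * 19 + [[0.7, 0.1, 0.1, 0.1]]
cal_labels = [1] * 18 + [2] + [3]
lam_hat = calibrate_lambda(cal_probs, cal_labels, alpha=0.1)
```

Because the intervals nest as λ grows, the loss is non-increasing in λ, which is the monotonicity the risk bound relies on.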
Detection of ransomware attacks using federated learning based on the CNN model
Abstract
Computing remains under significant threat from ransomware, which necessitates prompt action to prevent it. Ransomware attacks can negatively affect smart grids, particularly digital substations. In addition to examining a ransomware detection method using artificial intelligence (AI), this paper offers a technique for modeling ransomware attacks that disrupt the operation of a digital substation. First, binary data is transformed into image data and fed into a convolutional neural network (CNN) model trained using federated learning. The experimental findings demonstrate that the proposed technique detects ransomware with a high accuracy rate.
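The two ingredients named above can each be sketched in a few lines: reinterpreting a binary's bytes as rows of grayscale pixels (0 to 255), and federated averaging (FedAvg), where a server combines client model parameters weighted by each client's local dataset size. The row width of 256, the zero padding, flattening the model into one parameter vector, and the use of FedAvg specifically are our assumptions; the abstract does not state which aggregation rule or image layout the paper uses.

```python
def bytes_to_image(data: bytes, width: int = 256):
    """Reshape a binary's bytes into rows of `width` grayscale pixels, zero-padding the last row."""
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])  # each byte becomes a pixel in 0..255
        row += [0] * (width - len(row))
        rows.append(row)
    return rows


def fed_avg(client_weights, client_sizes):
    """FedAvg: average clients' flattened parameter vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# A tiny "binary" rendered with a narrow width so the result is easy to inspect
image = bytes_to_image(b"\x01\x02\x03\x04\x05", width=4)

# Two substations train locally, then the server merges their parameters
global_weights = fed_avg([[1.0, 2.0], [3.0, 6.0]], client_sizes=[1, 3])
```

Only model parameters travel to the server in this scheme, so raw binaries collected at each substation never leave the site.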
On the Potential of RIS in the Context of PLA in Wireless Communication Systems
Authors: Hama Amin, Waqas Aman, Saif Al-Kuwari
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Signal Processing (eess.SP)
Abstract
Reconfigurable Intelligent Surface (RIS) technology has proven itself a promising candidate for next-generation wireless networks through its enhanced performance in terms of throughput, spectral efficiency, and energy efficiency. However, the broadcast nature of RIS-assisted wireless communication makes it vulnerable to malicious attacks at the physical layer. On the other hand, physical layer authentication (PLA) is an emerging area in the security domain that thwarts attacks such as cloning, spoofing, and impersonation by exploiting the random features of the physical layer. In this paper, we investigate RIS-assisted wireless communication systems to unlock the potential of using RIS for PLA. Specifically, we exploit two distinct features of the physical layer, pathloss and channel impulse response (CIR), for PLA in RIS-assisted wireless communication. We construct hypothesis tests for the estimated features and derive closed-form expressions for the resulting errors. Further, we choose the critical error, i.e., missed detection, as our objective function and minimize it by optimizing the phase shift of the RIS panel. We compare the performance of our proposed mechanisms with baseline PLA schemes that use the same features but have no RIS assistance. Furthermore, we thoroughly evaluate our proposed schemes using performance metrics such as the probability of false alarm (PFA), the probability of missed detection (PMD), and receiver operating characteristic (ROC) curves. The results demonstrate the significant positive impact of RIS on PLA, as it effectively reduces PMD values to zero when the optimal phase shift is determined.
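A pathloss-based hypothesis test can be sketched under a simple Gaussian model: the receiver knows the legitimate transmitter's mean pathloss, each estimate carries zero-mean Gaussian error with standard deviation sigma, and a transmitter is accepted (H0) when its estimate lies within a threshold tau of the legitimate mean. Fixing tau for a target PFA then gives closed-form PFA and PMD. The Gaussian error model, the two-sided test, and all names and values below are our assumptions, not the paper's derivation.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal N(0, 1)


def pathloss_threshold(sigma, pfa_target):
    """Two-sided threshold so a legitimate transmitter is rejected with probability pfa_target."""
    return sigma * _STD.inv_cdf(1 - pfa_target / 2)


def authenticate(estimate, mu_legit, tau):
    """Accept H0 (legitimate) when the estimated pathloss is within tau of the legitimate profile."""
    return abs(estimate - mu_legit) <= tau


def error_probabilities(sigma, tau, mu_legit, mu_intruder):
    """Closed-form PFA and PMD for Gaussian estimation error with standard deviation sigma."""
    shift = mu_intruder - mu_legit
    pfa = 2 * (1 - _STD.cdf(tau / sigma))
    # Missed detection: the intruder's estimate also lands inside the acceptance region
    pmd = _STD.cdf((tau - shift) / sigma) - _STD.cdf((-tau - shift) / sigma)
    return pfa, pmd


# Hypothetical numbers (dB): 1 dB estimation noise, 5% false-alarm budget
tau = pathloss_threshold(sigma=1.0, pfa_target=0.05)
pfa, pmd = error_probabilities(sigma=1.0, tau=tau, mu_legit=100.0, mu_intruder=103.0)
```

In this model PMD shrinks as the gap between the legitimate and intruder pathloss grows relative to sigma, which is the lever an RIS phase-shift optimisation could plausibly act on.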
CC2Vec: Combining Typed Tokens with Contrastive Learning for Effective Code Clone Detection
Authors: Shihan Dou, Yueming Wu, Haoxiang Jia, Yuhao Zhou, Yan Liu, Yang Liu
Abstract
With the development of the open-source community, code is often copied, spread, and evolved across multiple software systems, which brings uncertainty and risk to those systems (e.g., bug propagation and copyright infringement). Therefore, it is important to conduct code clone detection to discover similar code pairs. Many approaches have been proposed to detect code clones, and token-based tools among them can scale to big code. However, due to the lack of program details, they cannot handle more complicated code clones, i.e., semantic code clones. In this paper, we introduce CC2Vec, a novel code encoding method designed to swiftly identify simple code clones while also enhancing the capability for semantic code clone detection. To retain the program details between tokens, CC2Vec divides them into different categories (i.e., typed tokens) according to their syntactic types and then applies two self-attention mechanism layers to encode them. To resist changes in the code structure of semantic code clones, CC2Vec performs contrastive learning to reduce the differences introduced by different code implementations. We evaluate CC2Vec on two widely used datasets (i.e., BigCloneBench and Google Code Jam), and the results show that our method can effectively detect simple code clones. In addition, CC2Vec not only attains performance comparable to widely used semantic code clone detection systems such as ASTNN, SCDetector, and FCCA by simply fine-tuning, but also significantly surpasses these methods in detection efficiency.
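To make "typed tokens" concrete, the sketch below lexes a snippet and sorts its tokens into coarse syntactic categories, so that two clones differing only in identifier names share the same keyword, operator, and literal streams. Python's standard `tokenize` module stands in for whatever language-specific lexer CC2Vec uses, and the four categories are illustrative, not the paper's actual type inventory.

```python
import io
import keyword
import tokenize


def typed_tokens(source: str):
    """Group a Python snippet's tokens by coarse syntactic category (illustrative categories)."""
    groups = {"keyword": [], "identifier": [], "operator": [], "literal": []}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            groups["keyword" if keyword.iskeyword(tok.string) else "identifier"].append(tok.string)
        elif tok.type == tokenize.OP:
            groups["operator"].append(tok.string)
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            groups["literal"].append(tok.string)
        # NEWLINE, INDENT, DEDENT, COMMENT and ENDMARKER tokens are ignored here
    return groups


original = typed_tokens("def f(x):\n    return x + 1\n")
renamed = typed_tokens("def g(y):\n    return y + 1\n")  # a simple clone with renamed identifiers
```

Everything except the identifier group matches between `original` and `renamed`, which is the kind of structure an encoder can exploit when identifier names are the only difference.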
In Anticipation of Perfect Deepfake: Identity-anchored Artifact-agnostic Detection under Rebalanced Deepfake Detection Protocol
Abstract
As deep generative models advance, we anticipate deepfakes achieving "perfection", that is, generating no discernible artifacts or noise. However, current deepfake detectors, intentionally or inadvertently, rely on such artifacts for detection, as they are exclusive to deepfakes and absent in genuine examples. To bridge this gap, we introduce the Rebalanced Deepfake Detection Protocol (RDDP) to stress-test detectors under balanced scenarios where genuine and forged examples bear similar artifacts. We offer two RDDP variants: RDDP-WHITEHAT uses white-hat deepfake algorithms to create 'self-deepfakes', genuine portrait videos that resemble the underlying identity yet carry artifacts similar to those of deepfake videos; RDDP-SURROGATE employs surrogate functions (e.g., Gaussian noise) to process both genuine and forged examples, introducing equivalent noise and thereby sidestepping the need for deepfake algorithms. Towards detecting perfect deepfake videos that align with genuine ones, we present ID-Miner, a detector that identifies the puppeteer behind the disguise by focusing on motion rather than artifacts or appearance. As an identity-based detector, it authenticates videos by comparing them with reference footage. Equipped with an artifact-agnostic loss at the frame level and an identity-anchored loss at the video level, ID-Miner effectively singles out identity signals amid distracting variations. Extensive experiments comparing ID-Miner with 12 baseline detectors under both conventional and RDDP evaluations on two deepfake datasets, along with additional qualitative studies, affirm the superiority of our method and the necessity for detectors designed to counter perfect deepfakes.
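The RDDP-SURROGATE idea, applying one surrogate function to both classes so that noise no longer separates them, can be sketched as follows. The noise level, the `[0, 1]` pixel range, the `(frames, height, width, channels)` layout, and the function names are assumptions for illustration, not the protocol's published settings.

```python
import numpy as np

def gaussian_surrogate(frames: np.ndarray, sigma: float, rng: np.random.Generator) -> np.ndarray:
    """Add i.i.d. Gaussian noise to a stack of frames with values in [0, 1] (assumed range)."""
    noisy = frames + rng.normal(0.0, sigma, size=frames.shape)
    return np.clip(noisy, 0.0, 1.0)

def rebalance(genuine: list, forged: list, sigma: float = 0.05, seed: int = 0):
    """Apply the same surrogate to genuine and forged videos so both carry comparable noise."""
    rng = np.random.default_rng(seed)
    genuine_out = [gaussian_surrogate(v, sigma, rng) for v in genuine]
    forged_out = [gaussian_surrogate(v, sigma, rng) for v in forged]
    return genuine_out, forged_out
```

Because both classes pass through the identical transform, a detector evaluated on the output cannot rely on "noise present means fake"; it has to use some other signal, such as the identity cues ID-Miner targets.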
Deep Metric Learning-Based Out-of-Distribution Detection with Synthetic Outlier Exposure
Authors: Assefa Seyoum Wahd
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this paper, we present a novel approach that combines deep metric learning and synthetic data generation using diffusion models for out-of-distribution (OOD) detection. One popular approach for OOD detection is outlier exposure, where models are trained using a mixture of in-distribution (ID) samples and "seen" OOD samples. For the OOD samples, the model is trained to minimize the KL divergence between the output probability and the uniform distribution, while correctly classifying the ID data. We propose a label-mixup approach to generate synthetic OOD data using Denoising Diffusion Probabilistic Models (DDPMs). Additionally, we explore recent advancements in metric learning to train our models. In our experiments, we found that metric learning-based loss functions perform better than the softmax loss. Furthermore, the baseline models (including softmax and metric learning) show a significant improvement when trained with the generated OOD data. Our approach outperforms strong baselines on conventional OOD detection metrics.
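The outlier-exposure objective described above (cross-entropy on ID samples plus a KL term pulling OOD predictions toward uniform) can be sketched in NumPy. The KL direction `KL(U || p)` and the weight `lam` are assumptions for illustration; the paper's metric-learning losses and DDPM label-mixup generator are not shown.

```python
import numpy as np

def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def outlier_exposure_loss(id_logits, id_labels, ood_logits, lam=0.5):
    """Cross-entropy on ID samples plus lam * KL(uniform || p) on OOD samples."""
    id_logp = log_softmax(id_logits)
    ce = -id_logp[np.arange(len(id_labels)), id_labels].mean()
    ood_logp = log_softmax(ood_logits)
    k = ood_logits.shape[1]
    # KL(U || p) = sum_c (1/k) * (log(1/k) - log p_c) = -log k - mean_c log p_c
    kl_uniform = (-np.log(k) - ood_logp.mean(axis=1)).mean()
    return ce + lam * kl_uniform
```

The KL term is zero exactly when the model is maximally uncertain on OOD inputs, so any confident prediction on synthetic outliers is penalized while the cross-entropy term keeps ID classification intact.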
Grains of Saliency: Optimizing Saliency-based Training of Biometric Attack Detection Models
Authors: Colton R. Crum, Samuel Webster, Adam Czajka
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Incorporating human-perceptual intelligence into model training has been shown to increase the generalization capability of models in several difficult biometric tasks, such as presentation attack detection (PAD) and detection of synthetic samples. After the initial collection phase, human visual saliency (e.g., eye-tracking data or handwritten annotations) can be integrated into model training through attention mechanisms, augmented training samples, or human perception-related components of loss functions. Despite these successes, a vital but seemingly neglected aspect of any saliency-based training is the level of saliency granularity (e.g., bounding boxes, single saliency maps, or saliency aggregated from multiple subjects) needed to balance reaping the full benefits of human saliency against the cost of collecting it. In this paper, we explore several levels of saliency granularity and demonstrate that increased generalization of PAD and synthetic face detection can be achieved by using simple yet effective saliency post-processing techniques across several different CNNs.
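The granularity levels named above (single maps, maps aggregated over subjects, bounding boxes) can be derived from per-subject maps as in the following sketch. The mean-then-min-max aggregation and the 0.5 box threshold are hypothetical choices; the abstract does not specify the paper's post-processing techniques.

```python
import numpy as np

def aggregate_saliency(subject_maps: np.ndarray) -> np.ndarray:
    """Average per-subject maps of shape (S, H, W) and rescale the result to [0, 1]."""
    mean_map = subject_maps.mean(axis=0)
    span = mean_map.max() - mean_map.min()
    return (mean_map - mean_map.min()) / span if span > 0 else np.zeros_like(mean_map)

def bounding_box_mask(saliency: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Coarsest granularity: one box enclosing every pixel above the threshold."""
    mask = np.zeros_like(saliency)
    rows, cols = np.nonzero(saliency > threshold)
    if rows.size:
        mask[rows.min():rows.max() + 1, cols.min():cols.max() + 1] = 1.0
    return mask
```

A single-subject map is simply `subject_maps[i]`; moving from that to the aggregate and then to a box trades spatial detail for cheaper annotation, which is the balance the abstract highlights.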
Keyword: face recognition
Synthetic Face Datasets Generation via Latent Space Exploration from Brownian Identity Diffusion
Authors: David Geissbühler, Hatef Otroshi Shahreza, Sébastien Marcel
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Face Recognition (FR) models are trained on large-scale datasets, which raise privacy and ethical concerns. Lately, the use of synthetic data to complement or replace genuine data for training FR models has been proposed. While promising results have been obtained, it remains unclear whether generative models can yield sufficiently diverse data for such tasks. In this work, we introduce a new method, inspired by the physical motion of soft particles subjected to stochastic Brownian forces, that allows us to sample identity distributions in a latent space under various constraints. With this in hand, we generate several face datasets and benchmark them by training FR models, showing that data generated with our method exceeds the performance of previous GAN-based datasets and achieves competitive performance with state-of-the-art diffusion-based synthetic datasets. We also show that this method can be used to mitigate leakage from the generator's training set and to explore the ability of generative models to generate data beyond it.
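The physical analogy can be sketched as overdamped Langevin dynamics: identity vectors act as soft particles that repel only when they overlap, receive small Brownian kicks, and are projected back onto a unit hypersphere. The linear soft-repulsion force, the hypersphere latent space, and every parameter value are assumptions made for illustration; the paper's constraints and generator are not modeled.

```python
import numpy as np

def brownian_identity_sampling(n, dim, steps=200, radius=1.0, dt=0.05, temperature=1e-3, seed=0):
    """Spread n identity vectors on the unit hypersphere with soft repulsion plus Brownian noise."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, dim))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    for _ in range(steps):
        diff = x[:, None, :] - x[None, :, :]          # pairwise displacements, shape (n, n, dim)
        dist = np.linalg.norm(diff, axis=-1)           # pairwise distances, shape (n, n)
        np.fill_diagonal(dist, np.inf)                 # ignore self-interaction
        # Soft-particle force: linear repulsion only when two particles overlap (dist < radius).
        overlap = np.clip(radius - dist, 0.0, None)
        force = (overlap / dist)[:, :, None] * diff
        x += dt * force.sum(axis=1)
        x += np.sqrt(2.0 * temperature * dt) * rng.normal(size=x.shape)  # Brownian kick
        x /= np.linalg.norm(x, axis=1, keepdims=True)  # project back onto the sphere
    return x
```

The repulsion radius acts as a minimum desired separation between identities, while the temperature controls how much randomness remains; in a full pipeline the resulting vectors would condition a face generator (not shown).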
Keyword: augmentation
Transforming Dutch: Debiasing Dutch Coreference Resolution Systems for Non-binary Pronouns
Authors: Goya van Boven, Yupei Du, Dong Nguyen
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
Gender-neutral pronouns are increasingly being introduced across Western languages. However, recent evaluations have demonstrated that English NLP systems are unable to correctly process gender-neutral pronouns, risking the erasure and misgendering of non-binary individuals. This paper examines a Dutch coreference resolution system's performance on gender-neutral pronouns, specifically hen and die. In Dutch, these pronouns were only introduced in 2016, compared to the longstanding existence of singular they in English. We additionally compare two debiasing techniques for coreference resolution systems in non-binary contexts: Counterfactual Data Augmentation (CDA) and delexicalisation. Moreover, because pronoun performance can be hard to interpret from a general evaluation metric like LEA, we introduce a new evaluation metric, the pronoun score, which directly represents the proportion of correctly processed pronouns. Our results reveal diminished performance on gender-neutral pronouns compared to their gendered counterparts. Nevertheless, although delexicalisation fails to yield improvements, CDA substantially reduces the performance gap between gendered and gender-neutral pronouns. We further show that CDA remains effective in low-resource settings, in which a limited set of debiasing documents is used. This efficacy extends to previously unseen neopronouns, which are currently rarely used but may gain popularity in the future, underscoring the viability of effective debiasing with minimal resources and low computational costs.
Re-visiting Skip-Gram Negative Sampling: Dimension Regularization for More Efficient Dissimilarity Preservation in Graph Embeddings
Authors: David Liu, Arjun Seshadri, Tina Eliassi-Rad, Johan Ugander
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Abstract
A wide range of graph embedding objectives decompose into two components: one that attracts the embeddings of nodes that are perceived as similar, and another that repels embeddings of nodes that are perceived as dissimilar. Because real-world graphs are sparse and the number of dissimilar pairs grows quadratically with the number of nodes, Skip-Gram Negative Sampling (SGNS) has emerged as a popular and efficient repulsion approach. SGNS repels each node from a sample of dissimilar nodes, as opposed to all dissimilar nodes. In this work, we show that node-wise repulsion is, in aggregate, an approximate re-centering of the node embedding dimensions. Such dimension operations are much more scalable than node operations. The dimension approach, in addition to being more efficient, yields a simpler geometric interpretation of the repulsion. Our result extends findings from the self-supervised learning literature to the skip-gram model, establishing a connection between skip-gram node contrast and dimension regularization. We show that in the limit of large graphs, under mild regularity conditions, the original node repulsion objective converges to optimization with dimension regularization. We use this observation to propose an algorithm augmentation framework that speeds up any existing algorithm, supervised or unsupervised, using SGNS. The framework prioritizes node attraction and replaces SGNS with dimension regularization. We instantiate this generic framework for LINE and node2vec and show that the augmented algorithms preserve downstream performance while dramatically increasing efficiency.
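The aggregate intuition can be checked with plain dot-product repulsion (a simplification; SGNS actually applies a log-sigmoid to sampled negative dot products). Summing dot products over all node pairs equals the squared norm of the summed embeddings, $\sum_{i,j} x_i^\top x_j = \lVert \sum_i x_i \rVert^2 = n^2 \lVert \bar{x} \rVert^2$, so an $O(n^2 d)$ pairwise term collapses into an $O(nd)$ per-dimension mean. The function names are hypothetical, and this does not reproduce the paper's convergence result.

```python
import numpy as np

def all_pairs_dot_repulsion(x: np.ndarray) -> float:
    """Naive O(n^2 d) repulsion: sum of dot products over all ordered node pairs (i, j), i == j included."""
    return float((x @ x.T).sum())

def dimension_regularizer(x: np.ndarray) -> float:
    """O(n d) equivalent: n^2 times the squared norm of the per-dimension means."""
    n = x.shape[0]
    return float(n ** 2 * np.sum(x.mean(axis=0) ** 2))

def recenter_step(x: np.ndarray, lr: float = 0.25) -> np.ndarray:
    """One gradient step on R(X) = n * ||mean(X)||^2; dR/dx_i = 2 * mean(X) for every node i."""
    grad = 2.0 * x.mean(axis=0)
    return x - lr * grad
```

Each gradient step subtracts the same vector from every node, so it simply shrinks each dimension's mean toward zero (by a factor of `1 - 2 * lr`), which is the "re-centering of the node embedding dimensions" described above.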
A Framework for Approximation Schemes on Knapsack and Packing Problems of Hyperspheres and Fat Objects
Abstract
Geometric packing problems have been investigated for centuries in mathematics. In contrast, works on sphere packing in the field of approximation algorithms are scarce. Most results are for squares and rectangles and their d-dimensional counterparts. To help fill this gap, we present a framework that yields approximation schemes for the geometric knapsack problem as well as other packing problems and some generalizations, and that supports not only hyperspheres but also a wide range of shapes for the items and the bins. Our first result is a PTAS for the hypersphere multiple knapsack problem. In fact, we can deal with a more general version of the problem that contains additional constraints on the items. Under some conditions, these constraints can encompass very common and pertinent ones such as conflict constraints, multiple-choice constraints, and capacity constraints. Our second result is a resource augmentation scheme for the multiple knapsack problem for a wide range of convex fat objects, which are not restricted to polygons and polytopes. Examples are ellipsoids, rhombi, hypercubes, hyperspheres under the Lp-norm, etc. Also, for the generalized version of the multiple knapsack problem, our technique still yields a PTAS under resource augmentation for these objects. Third, we improve the resource augmentation schemes for fat objects to allow rotation of the objects by any angle. This result in particular adds generality to our framework, since most results involving such general objects are limited to translations. Finally, our framework is able to handle other problems such as the cutting stock problem, the minimum-size bin packing problem, and the multiple strip packing problem.
Data Augmentation Policy Search for Long-Term Forecasting
Abstract
Data augmentation serves as a popular regularization technique to combat overfitting in neural networks. While automatic augmentation has demonstrated success in image classification tasks, its application to time-series problems, particularly long-term forecasting, has received comparatively little attention. To address this gap, we introduce a time-series automatic augmentation approach named TSAA, which is both efficient and easy to implement. The solution tackles the associated bilevel optimization problem through a two-step process: first training a non-augmented model for a limited number of epochs, then running an iterative split procedure. During this iterative process, we alternate between identifying a robust augmentation policy through Bayesian optimization and refining the model while discarding suboptimal runs. Extensive evaluations on challenging univariate and multivariate forecasting benchmark problems demonstrate that TSAA consistently outperforms several robust baselines, suggesting its potential for integration into prediction pipelines.
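The "discard suboptimal runs" loop can be sketched as successive halving over candidate augmentation policies. This is a simplified stand-in: random sampling replaces Bayesian optimization, a caller-supplied `score` function replaces refining the warm-started model and validating it, and `sample_policy`, `score`, and the policy format are all hypothetical.

```python
import random

def successive_halving_search(sample_policy, score, n_candidates=16, seed=0):
    """Sample candidate augmentation policies, then repeatedly keep the better half.

    sample_policy(rng) -> policy      (stand-in for a Bayesian-optimization proposal)
    score(policy, round_idx) -> float (stand-in for refining the warm-started model and validating)
    """
    rng = random.Random(seed)
    survivors = [sample_policy(rng) for _ in range(n_candidates)]
    round_idx = 0
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda p: score(p, round_idx), reverse=True)
        survivors = ranked[: max(1, len(ranked) // 2)]  # discard the weaker half of the runs
        round_idx += 1
    return survivors[0]
```

For example, a policy could be a dict such as `{"jitter": 0.05, "scale": 1.0}`. Halving keeps the total training budget roughly proportional to the number of rounds rather than to the number of candidates times the full training length.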
Distance Sampling-based Paraphraser Leveraging ChatGPT for Text Data Manipulation
Authors: Yoori Oh, Yoseob Han, Kyogu Lee
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract
There has been growing interest in audio-language retrieval research, where the objective is to establish the correlation between audio and text modalities. However, most audio-text paired datasets lack rich textual expression compared to the audio samples. One of the significant challenges facing audio-text datasets is the presence of similar or identical captions for different audio samples. Under such many-to-one mapping conditions, audio-text datasets lead to poor retrieval performance. In this paper, we propose a novel approach to tackle this data imbalance problem in the audio-language retrieval task. To overcome the limitation, we introduce a distance sampling-based paraphraser leveraging ChatGPT, which uses a distance function to generate a controllable distribution of manipulated text data. For a set of sentences with the same context, the distance is used to calculate a degree of manipulation for any two sentences, and ChatGPT's few-shot prompting is performed using a text cluster with a similar distance defined by the Jaccard similarity. As a result, when applied to few-shot prompting with text clusters, ChatGPT can adjust the diversity of the manipulated text based on the distance. The proposed approach is shown to significantly enhance performance in audio-text retrieval, outperforming conventional text augmentation techniques.
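The distance that drives the few-shot examples can be sketched as one minus the Jaccard similarity of word sets, with caption pairs grouped into distance bins. Whitespace tokenization, lower-casing, and half-open bins are assumptions; the ChatGPT prompt construction is not shown.

```python
def jaccard_distance(a: str, b: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over lower-cased whitespace-separated word sets."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)

def pairs_in_distance_bin(captions, low, high):
    """Caption pairs whose Jaccard distance lies in [low, high): candidate few-shot examples."""
    out = []
    for i in range(len(captions)):
        for j in range(i + 1, len(captions)):
            d = jaccard_distance(captions[i], captions[j])
            if low <= d < high:
                out.append((captions[i], captions[j], d))
    return out
```

For instance, "a dog barks" and "a dog growls" share two of four distinct words, giving a distance of 0.5. Choosing examples from a low or high bin would then steer how strongly the paraphrases diverge from the source caption.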
Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation
Abstract
Voice conversion is the task of transforming the voice characteristics of source speech while preserving its content information. Nowadays, self-supervised representation learning models are increasingly used for content extraction. However, these representations retain a lot of hidden speaker information, which leads to timbre leakage, while the prosodic information in the hidden units goes unused. To address these issues, we propose a novel framework for expressive voice conversion called "SAVC" based on soft speech units from HuBERT-Soft. Taking soft speech units as input, we design an attribute encoder to extract content and prosody features respectively. Specifically, we first introduce statistic perturbation imposed by adversarial style augmentation to eliminate speaker information. Then the prosody is implicitly modeled on soft speech units with knowledge distillation. Experimental results show that the intelligibility and naturalness of the converted speech outperform previous work.
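"Statistic perturbation" can be illustrated as re-normalizing features with shifted per-channel means and scaled standard deviations, since such first- and second-order statistics tend to carry speaker style. In SAVC the perturbation is chosen adversarially; this sketch takes it as an input (e.g. random), and the `(channels, frames)` layout and names are assumptions.

```python
import numpy as np

def perturb_statistics(features, mean_shift, std_scale, eps=1e-5):
    """Re-normalize (channels, frames) features with perturbed per-channel mean and std.

    mean_shift and std_scale have shape (channels, 1); zeros and ones leave features unchanged.
    """
    mu = features.mean(axis=1, keepdims=True)
    sigma = np.sqrt(features.var(axis=1, keepdims=True) + eps)
    normalized = (features - mu) / sigma
    return normalized * (sigma * std_scale) + (mu + mean_shift)
```

Training the content encoder on such perturbed inputs encourages it to ignore channel statistics, which is one way to reduce the speaker (timbre) information that leaks through.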
Self-Play Preference Optimization for Language Model Alignment
Abstract
Traditional reinforcement learning from human feedback (RLHF) approaches that rely on parametric models like the Bradley-Terry model fall short in capturing the intransitivity and irrationality in human preferences. Recent advancements suggest that directly working with preference probabilities can yield a more accurate reflection of human preferences, enabling more flexible and accurate language model alignment. In this paper, we propose a self-play-based method for language model alignment, which treats the problem as a constant-sum two-player game aimed at identifying the Nash equilibrium policy. Our approach, dubbed Self-Play Preference Optimization (SPPO), approximates the Nash equilibrium through iterative policy updates and enjoys a theoretical convergence guarantee. Our method can effectively increase the log-likelihood of the chosen response and decrease that of the rejected response, which cannot be trivially achieved by symmetric pairwise losses such as Direct Preference Optimization (DPO) and Identity Preference Optimization (IPO). In our experiments, using only 60k prompts (without responses) from the UltraFeedback dataset, without any prompt augmentation, and leveraging a pre-trained preference model, PairRM, with only 0.4B parameters, SPPO obtains a model from fine-tuning Mistral-7B-Instruct-v0.2 that achieves the state-of-the-art length-controlled win rate of 28.53% against GPT-4-Turbo on AlpacaEval 2.0. It also outperforms (iterative) DPO and IPO on MT-Bench and the Open LLM Leaderboard. Notably, the strong performance of SPPO is achieved without additional external supervision (e.g., responses, preferences, etc.) from GPT-4 or other stronger language models.
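The game-theoretic view can be illustrated on a finite set of responses with a multiplicative-weights self-play update: each response's probability is multiplied by the exponential of its win rate against the current policy. This tabular sketch, including `pref`, `eta`, and the iteration count, is an illustrative assumption; it shows the kind of iterative policy update the abstract describes, not the paper's LLM training loss.

```python
import numpy as np

def sppo_tabular(pref: np.ndarray, eta: float = 1.0, iters: int = 100) -> np.ndarray:
    """Multiplicative-weights self-play on a finite set of responses.

    pref[i, j] = P(response i is preferred over response j), with pref + pref.T == 1.
    Update: pi_{t+1}(y) ∝ pi_t(y) * exp(eta * P(y ≻ pi_t)).
    """
    k = pref.shape[0]
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        win_rate = pref @ pi                  # P(y ≻ pi_t) for every response y
        logits = np.log(pi) + eta * win_rate
        pi = np.exp(logits - logits.max())    # subtract the max for numerical stability
        pi /= pi.sum()
    return pi
```

With a preference matrix such as `[[0.5, 0.7, 0.7], [0.3, 0.5, 0.6], [0.3, 0.4, 0.5]]`, response 0 beats every mixture more often than the others do, so its probability grows every round and the policy concentrates on it. Because the update uses win rates against the whole current policy rather than a pairwise Bradley-Terry score, it does not assume the preferences are transitive.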
Keyword: detection
Steel Plate Fault Detection using the Fitness Dependent Optimizer and Neural Networks
Enhancing Credit Card Fraud Detection: A Neural Network and SMOTE Integrated Approach
SegNet: A Segmented Deep Learning based Convolutional Neural Network Approach for Drones Wildfire Detection
Research and application of artificial intelligence based webshell detection model: A literature review
Successive Interference Cancellation for ISAC in a Large Full-Duplex Cellular Network
Graph Neural Network Approach to Semantic Type Detection in Tables
Greater benefits of deep learning-based computer-aided detection systems for finding small signals in 3D volumetric medical images
Logical analysis and contradiction detection in high-level requirements during the review process using sat-solver
Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly
Towards End-to-End Semi-Supervised Table Detection with Semantic Aligned Matching Transformer
Synthetic Image Verification in the Era of Generative AI: What Works and What Isn't There Yet
STT: Stateful Tracking with Transformers for Autonomous Driving
CodeHalu: Code Hallucinations in LLMs Driven by Execution-based Verification
The Reversing Machine: Reconstructing Memory Assumptions
Exploring Self-Supervised Vision Transformers for Deepfake Detection: A Comparative Analysis
Visual and audio scene classification for detecting discrepancies in video: a baseline method and experimental protocol
CofiPara: A Coarse-to-fine Paradigm for Multimodal Sarcasm Target Identification with Large Multimodal Models
Certified Adversarial Robustness of Machine Learning-based Malware Detectors via (De)Randomized Smoothing
Conformal Risk Control for Ordinal Classification
Detection of ransomware attacks using federated learning based on the CNN model
On the Potential of RIS in the Context of PLA in Wireless Communication Systems
CC2Vec: Combining Typed Tokens with Contrastive Learning for Effective Code Clone Detection
In Anticipation of Perfect Deepfake: Identity-anchored Artifact-agnostic Detection under Rebalanced Deepfake Detection Protocol
Deep Metric Learning-Based Out-of-Distribution Detection with Synthetic Outlier Exposure
Grains of Saliency: Optimizing Saliency-based Training of Biometric Attack Detection Models
Keyword: face recognition
Synthetic Face Datasets Generation via Latent Space Exploration from Brownian Identity Diffusion
Keyword: augmentation
Transforming Dutch: Debiasing Dutch Coreference Resolution Systems for Non-binary Pronouns
Re-visiting Skip-Gram Negative Sampling: Dimension Regularization for More Efficient Dissimilarity Preservation in Graph Embeddings
A Framework for Approximation Schemes on Knapsack and Packing Problems of Hyperspheres and Fat Objects
Data Augmentation Policy Search for Long-Term Forecasting
Distance Sampling-based Paraphraser Leveraging ChatGPT for Text Data Manipulation
Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation
Self-Play Preference Optimization for Language Model Alignment