Abstract
We propose and analyze an adaptive adversary that can retrain a Trojaned DNN and is also aware of SOTA output-based Trojaned model detectors. We show that such an adversary can ensure (1) high accuracy on both trigger-embedded and clean samples and (2) bypass detection. Our approach is based on an observation that the high dimensionality of the DNN parameters provides sufficient degrees of freedom to simultaneously achieve these objectives. We also enable SOTA detectors to be adaptive by allowing retraining to recalibrate their parameters, thus modeling a co-evolution of parameters of a Trojaned model and detectors. We then show that this co-evolution can be modeled as an iterative game, and prove that the resulting (optimal) solution of this interactive game leads to the adversary successfully achieving the above objectives. In addition, we provide a greedy algorithm for the adversary to select a minimum number of input samples for embedding triggers. We show that for cross-entropy or log-likelihood loss functions used by the DNNs, the greedy algorithm provides provable guarantees on the needed number of trigger-embedded input samples. Extensive experiments on four diverse datasets -- MNIST, CIFAR-10, CIFAR-100, and SpeechCommand -- reveal that the adversary effectively evades four SOTA output-based Trojaned model detectors: MNTD, NeuralCleanse, STRIP, and TABOR.
Unveiling Hidden Energy Anomalies: Harnessing Deep Learning to Optimize Energy Management in Sports Facilities
Abstract
Anomaly detection in sport facilities has gained significant attention due to its potential to promote energy saving and optimizing operational efficiency. In this research article, we investigate the role of machine learning, particularly deep learning, in anomaly detection for sport facilities. We explore the challenges and perspectives of utilizing deep learning methods for this task, aiming to address the drawbacks and limitations of conventional approaches. Our proposed approach involves feature extraction from the data collected in sport facilities. We present a problem formulation using Deep Feedforward Neural Networks (DFNN) and introduce threshold estimation techniques to identify anomalies effectively. Furthermore, we propose methods to reduce false alarms, ensuring the reliability and accuracy of anomaly detection. To evaluate the effectiveness of our approach, we conduct experiments on aquatic center dataset at Qatar University. The results demonstrate the superiority of our deep learning-based method over conventional techniques, highlighting its potential in real-world applications. Typically, 94.33% accuracy and 92.92% F1-score have been achieved using the proposed scheme.
Automated detection of motion artifacts in brain MR images using deep learning and explainable artificial intelligence
Authors: Authors: Marina Manso Jimeno, Keerthi Sravan Ravi, Maggie Fung, John Thomas Vaughan, Jr., Sairam Geethanath
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Quality assessment, including inspecting the images for artifacts, is a critical step during MRI data acquisition to ensure data quality and downstream analysis or interpretation success. This study demonstrates a deep learning model to detect rigid motion in T1-weighted brain images. We leveraged a 2D CNN for three-class classification and tested it on publicly available retrospective and prospective datasets. Grad-CAM heatmaps enabled the identification of failure modes and provided an interpretation of the model's results. The model achieved average precision and recall metrics of 85% and 80% on six motion-simulated retrospective datasets. Additionally, the model's classifications on the prospective dataset showed a strong inverse correlation (-0.84) compared to average edge strength, an image quality metric indicative of motion. This model is part of the ArtifactID tool, aimed at inline automatic detection of Gibbs ringing, wrap-around, and motion artifacts. This tool automates part of the time-consuming QA process and augments expertise on-site, particularly relevant in low-resource settings where local MR knowledge is scarce.
Learning-based Bone Quality Classification Method for Spinal Metastasis
Abstract
Spinal metastasis is the most common disease in bone metastasis and may cause pain, instability and neurological injuries. Early detection of spinal metastasis is critical for accurate staging and optimal treatment. The diagnosis is usually facilitated with Computed Tomography (CT) scans, which requires considerable efforts from well-trained radiologists. In this paper, we explore a learning-based automatic bone quality classification method for spinal metastasis based on CT images. We simultaneously take the posterolateral spine involvement classification task into account, and employ multi-task learning (MTL) technique to improve the performance. MTL acts as a form of inductive bias which helps the model generalize better on each task by sharing representations between related tasks. Based on the prior knowledge that the mixed type can be viewed as both blastic and lytic, we model the task of bone quality classification as two binary classification sub-tasks, i.e., whether blastic and whether lytic, and leverage a multiple layer perceptron to combine their predictions. In order to make the model more robust and generalize better, self-paced learning is adopted to gradually involve from easy to more complex samples into the training process. The proposed learning-based method is evaluated on a proprietary spinal metastasis CT dataset. At slice level, our method significantly outperforms an 121-layer DenseNet classifier in sensitivities by $+12.54\%$, $+7.23\%$ and $+29.06\%$ for blastic, mixed and lytic lesions, respectively, meanwhile $+12.33\%$, $+23.21\%$ and $+34.25\%$ at vertebrae level.
Graph Inference Acceleration by Learning MLPs on Graphs without Supervision
Authors: Authors: Zehong Wang, Zheyuan Zhang, Chuxu Zhang, Yanfang Ye
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Abstract
Graph Neural Networks (GNNs) have demonstrated effectiveness in various graph learning tasks, yet their reliance on message-passing constraints their deployment in latency-sensitive applications such as financial fraud detection. Recent works have explored distilling knowledge from GNNs to Multi-Layer Perceptrons (MLPs) to accelerate inference. However, this task-specific supervised distillation limits generalization to unseen nodes, which are prevalent in latency-sensitive applications. To this end, we present \textbf{\textsc{SimMLP}}, a \textbf{\textsc{Sim}}ple yet effective framework for learning \textbf{\textsc{MLP}}s on graphs without supervision, to enhance generalization. \textsc{SimMLP} employs self-supervised alignment between GNNs and MLPs to capture the fine-grained and generalizable correlation between node features and graph structures, and proposes two strategies to alleviate the risk of trivial solutions. Theoretically, we comprehensively analyze \textsc{SimMLP} to demonstrate its equivalence to GNNs in the optimal case and its generalization capability. Empirically, \textsc{SimMLP} outperforms state-of-the-art baselines, especially in settings with unseen nodes. In particular, it obtains significant performance gains {\bf (7$\sim$26\%)} over MLPs and inference acceleration over GNNs {\bf (90$\sim$126$\times$)} on large-scale graph datasets. Our codes are available at: \url{https://github.com/Zehong-Wang/SimMLP}.
The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes
Abstract
Large-scale black-box models have become ubiquitous across numerous applications. Understanding the influence of individual training data sources on predictions made by these models is crucial for improving their trustworthiness. Current influence estimation techniques involve computing gradients for every training point or repeated training on different subsets. These approaches face obvious computational challenges when scaled up to large datasets and models. In this paper, we introduce and explore the Mirrored Influence Hypothesis, highlighting a reciprocal nature of influence between training and test data. Specifically, it suggests that evaluating the influence of training data on test predictions can be reformulated as an equivalent, yet inverse problem: assessing how the predictions for training samples would be altered if the model were trained on specific test samples. Through both empirical and theoretical validations, we demonstrate the wide applicability of our hypothesis. Inspired by this, we introduce a new method for estimating the influence of training data, which requires calculating gradients for specific test samples, paired with a forward pass for each training point. This approach can capitalize on the common asymmetry in scenarios where the number of test samples under concurrent examination is much smaller than the scale of the training dataset, thus gaining a significant improvement in efficiency compared to existing approaches. We demonstrate the applicability of our method across a range of scenarios, including data attribution in diffusion models, data leakage detection, analysis of memorization, mislabeled data detection, and tracing behavior in language models. Our code will be made available at https://github.com/ruoxi-jia-group/Forward-INF.
Evaluating DTW Measures via a Synthesis Framework for Time-Series Data
Abstract
Time-series data originate from various applications that describe specific observations or quantities of interest over time. Their analysis often involves the comparison across different time-series data sequences, which in turn requires the alignment of these sequences. Dynamic Time Warping (DTW) is the standard approach to achieve an optimal alignment between two temporal signals. Different variations of DTW have been proposed to address various needs for signal alignment or classifications. However, a comprehensive evaluation of their performance in these time-series data processing tasks is lacking. Most DTW measures perform well on certain types of time-series data without a clear explanation of the reason. To address that, we propose a synthesis framework to model the variation between two time-series data sequences for comparison. Our synthesis framework can produce a realistic initial signal and deform it with controllable variations that mimic real-world scenarios. With this synthesis framework, we produce a large number of time-series sequence pairs with different but known variations, which are used to assess the performance of a number of well-known DTW measures for the tasks of alignment and classification. We report their performance on different variations and suggest the proper DTW measure to use based on the type of variations between two time-series sequences. This is the first time such a guideline is presented for selecting a proper DTW measure. To validate our conclusion, we apply our findings to real-world applications, i.e., the detection of the formation top for the oil and gas industry and the pattern search in streamlines for flow visualization.
Research and application of Transformer based anomaly detection model: A literature review
Abstract
Transformer, as one of the most advanced neural network models in Natural Language Processing (NLP), exhibits diverse applications in the field of anomaly detection. To inspire research on Transformer-based anomaly detection, this review offers a fresh perspective on the concept of anomaly detection. We explore the current challenges of anomaly detection and provide detailed insights into the operating principles of Transformer and its variants in anomaly detection tasks. Additionally, we delineate various application scenarios for Transformer-based anomaly detection models and discuss the datasets and evaluation metrics employed. Furthermore, this review highlights the key challenges in Transformer-based anomaly detection research and conducts a comprehensive analysis of future research trends in this domain. The review includes an extensive compilation of over 100 core references related to Transformer-based anomaly detection. To the best of our knowledge, this is the first comprehensive review that focuses on the research related to Transformer in the context of anomaly detection. We hope that this paper can provide detailed technical information to researchers interested in Transformer-based anomaly detection tasks.
OmniBOR: A System for Automatic, Verifiable Artifact Resolution across Software Supply Chains
Authors: Authors: Bharathi Seshadri, Yongkui Han, Chris Olson, David Pollak, Vojislav Tomasevic
Subjects: Software Engineering (cs.SE); Cryptography and Security (cs.CR)
Abstract
Software supply chain attacks, which exploit the build process or artifacts used in the process of building a software product, are increasingly of concern. To combat these attacks, one must be able to check that every artifact that a software product depends on does not contain vulnerabilities. In this paper, we introduce OmniBOR, (Universal Bill of Receipts) a minimalistic scheme for build tools to create an artifact dependency graph which can be used to track every software artifact incorporated into a built software product. We present the architecture of OmniBOR, the underlying data representations, and two implementations that produce OmniBOR data and embed an OmniBOR Identifier into built software, including a compiler-based approach and one based on tracing the build process. We demonstrate the efficacy of this approach on benchmarks including a Linux distribution for applications such as Common Vulnerabilities and Exposures (CVE) detection and software bill of materials (SBOM) computation.
Detecting Adversarial Spectrum Attacks via Distance to Decision Boundary Statistics
Authors: Authors: Wenwei Zhao, Xiaowen Li, Shangqing Zhao, Jie Xu, Yao Liu, Zhuo Lu
Subjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)
Abstract
Machine learning has been adopted for efficient cooperative spectrum sensing. However, it incurs an additional security risk due to attacks leveraging adversarial machine learning to create malicious spectrum sensing values to deceive the fusion center, called adversarial spectrum attacks. In this paper, we propose an efficient framework for detecting adversarial spectrum attacks. Our design leverages the concept of the distance to the decision boundary (DDB) observed at the fusion center and compares the training and testing DDB distributions to identify adversarial spectrum attacks. We create a computationally efficient way to compute the DDB for machine learning based spectrum sensing systems. Experimental results based on realistic spectrum data show that our method, under typical settings, achieves a high detection rate of up to 99\% and maintains a low false alarm rate of less than 1\%. In addition, our method to compute the DDB based on spectrum data achieves 54\%--64\% improvements in computational efficiency over existing distance calculation methods. The proposed DDB-based detection framework offers a practical and efficient solution for identifying malicious sensing values created by adversarial spectrum attacks.
Comment-aided Video-Language Alignment via Contrastive Pre-training for Short-form Video Humor Detection
Abstract
The growing importance of multi-modal humor detection within affective computing correlates with the expanding influence of short-form video sharing on social media platforms. In this paper, we propose a novel two-branch hierarchical model for short-form video humor detection (SVHD), named Comment-aided Video-Language Alignment (CVLA) via data-augmented multi-modal contrastive pre-training. Notably, our CVLA not only operates on raw signals across various modal channels but also yields an appropriate multi-modal representation by aligning the video and language components within a consistent semantic space. The experimental results on two humor detection datasets, including DY11k and UR-FUNNY, demonstrate that CVLA dramatically outperforms state-of-the-art and several competitive baseline approaches. Our dataset, code and model release at https://github.com/yliu-cs/CVLA.
Solid Waste Detection in Remote Sensing Images: A Survey
Authors: Authors: Piero Fraternali, Luca Morandini, Sergio Luis Herrera González
Abstract
The detection and characterization of illegal solid waste disposal sites are essential for environmental protection, particularly for mitigating pollution and health hazards. Improperly managed landfills contaminate soil and groundwater via rainwater infiltration, posing threats to both animals and humans. Traditional landfill identification approaches, such as on-site inspections, are time-consuming and expensive. Remote sensing is a cost-effective solution for the identification and monitoring of solid waste disposal sites that enables broad coverage and repeated acquisitions over time. Earth Observation (EO) satellites, equipped with an array of sensors and imaging capabilities, have been providing high-resolution data for several decades. Researchers proposed specialized techniques that leverage remote sensing imagery to perform a range of tasks such as waste site detection, dumping site monitoring, and assessment of suitable locations for new landfills. This review aims to provide a detailed illustration of the most relevant proposals for the detection and monitoring of solid waste sites by describing and comparing the approaches, the implemented techniques, and the employed data. Furthermore, since the data sources are of the utmost importance for developing an effective solid waste detection model, a comprehensive overview of the satellites and publicly available data sets is presented. Finally, this paper identifies the open issues in the state-of-the-art and discusses the relevant research directions for reducing the costs and improving the effectiveness of novel solid waste detection methods.
Detection Latencies of Anomaly Detectors: An Overlooked Perspective ?
Authors: Authors: Tommaso Puccetti, Andrea Ceccarelli
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Abstract
The ever-evolving landscape of attacks, coupled with the growing complexity of ICT systems, makes crafting anomaly-based intrusion detectors (ID) and error detectors (ED) a difficult task: they must accurately detect attacks, and they should promptly perform detections. Although improving and comparing the detection capability is the focus of most research works, the timeliness of the detection is less considered and often insufficiently evaluated or discussed. In this paper, we argue the relevance of measuring the temporal latency of attacks and errors, and we propose an evaluation approach for detectors to ensure a pragmatic trade-off between correct and in-time detection. Briefly, the approach relates the false positive rate with the temporal latency of attacks and errors, and this ultimately leads to guidelines for configuring a detector. We apply our approach by evaluating different ED and ID solutions in two industrial cases: i) an embedded railway on-board system that optimizes public mobility, and ii) an edge device for the Industrial Internet of Things. Our results show that considering latency in addition to traditional metrics like the false positive rate, precision, and coverage gives an additional fundamental perspective on the actual performance of the detector and should be considered when assessing and configuring anomaly detectors.
Play Guessing Game with LLM: Indirect Jailbreak Attack with Implicit Clues
Authors: Authors: Zhiyuan Chang, Mingyang Li, Yi Liu, Junjie Wang, Qing Wang, Yang Liu
Abstract
With the development of LLMs, the security threats of LLMs are getting more and more attention. Numerous jailbreak attacks have been proposed to assess the security defense of LLMs. Current jailbreak attacks primarily utilize scenario camouflage techniques. However their explicitly mention of malicious intent will be easily recognized and defended by LLMs. In this paper, we propose an indirect jailbreak attack approach, Puzzler, which can bypass the LLM's defense strategy and obtain malicious response by implicitly providing LLMs with some clues about the original malicious query. In addition, inspired by the wisdom of ''When unable to attack, defend'' from Sun Tzu's Art of War, we adopt a defensive stance to gather clues about the original malicious query through LLMs. Extensive experimental results show that Puzzler achieves a query success rate of 96.6% on closed-source LLMs, which is 57.9%-82.7% higher than baselines. Furthermore, when tested against the state-of-the-art jailbreak detection approaches, Puzzler proves to be more effective at evading detection compared to baselines.
Unity is Strength: Enhancing Precision in Reentrancy Vulnerability Detection of Smart Contract Analysis Tools
Abstract
Reentrancy is one of the most notorious vulnerabilities in smart contracts, resulting in significant digital asset losses. However, many previous works indicate that current Reentrancy detection tools suffer from high false positive rates. Even worse, recent years have witnessed the emergence of new Reentrancy attack patterns fueled by intricate and diverse vulnerability exploit mechanisms. Unfortunately, current tools face a significant limitation in their capacity to adapt and detect these evolving Reentrancy patterns. Consequently, ensuring precise and highly extensible Reentrancy vulnerability detection remains critical challenges for existing tools. To address this issue, we propose a tool named ReEP, designed to reduce the false positives for Reentrancy vulnerability detection. Additionally, ReEP can integrate multiple tools, expanding its capacity for vulnerability detection. It evaluates results from existing tools to verify vulnerability likelihood and reduce false positives. ReEP also offers excellent extensibility, enabling the integration of different detection tools to enhance precision and cover different vulnerability attack patterns. We perform ReEP to eight existing state-of-the-art Reentrancy detection tools. The average precision of these eight tools increased from the original 0.5% to 73% without sacrificing recall. Furthermore, ReEP exhibits robust extensibility. By integrating multiple tools, the precision further improved to a maximum of 83.6%. These results demonstrate that ReEP effectively unites the strengths of existing works, enhances the precision of Reentrancy vulnerability detection tools.
Exploring the Adversarial Capabilities of Large Language Models
Authors: Authors: Lukas Struppek, Minh Hieu Le, Dominik Hintersdorf, Kristian Kersting
Abstract
The proliferation of large language models (LLMs) has sparked widespread and general interest due to their strong language generation capabilities, offering great potential for both industry and research. While previous research delved into the security and privacy issues of LLMs, the extent to which these models can exhibit adversarial behavior remains largely unexplored. Addressing this gap, we investigate whether common publicly available LLMs have inherent capabilities to perturb text samples to fool safety measures, so-called adversarial examples resp.~attacks. More specifically, we investigate whether LLMs are inherently able to craft adversarial examples out of benign samples to fool existing safe rails. Our experiments, which focus on hate speech detection, reveal that LLMs succeed in finding adversarial perturbations, effectively undermining hate speech detection systems. Our findings carry significant implications for (semi-)autonomous systems relying on LLMs, highlighting potential challenges in their interaction with existing systems and safety measures.
Advancing NLP Models with Strategic Text Augmentation: A Comprehensive Study of Augmentation Methods and Curriculum Strategies
Authors: Authors: Himmet Toprak Kesgin, Mehmet Fatih Amasyali
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
This study conducts a thorough evaluation of text augmentation techniques across a variety of datasets and natural language processing (NLP) tasks to address the lack of reliable, generalized evidence for these methods. It examines the effectiveness of these techniques in augmenting training sets to improve performance in tasks such as topic classification, sentiment analysis, and offensive language detection. The research emphasizes not only the augmentation methods, but also the strategic order in which real and augmented instances are introduced during training. A major contribution is the development and evaluation of Modified Cyclical Curriculum Learning (MCCL) for augmented datasets, which represents a novel approach in the field. Results show that specific augmentation methods, especially when integrated with MCCL, significantly outperform traditional training approaches in NLP model performance. These results underscore the need for careful selection of augmentation techniques and sequencing strategies to optimize the balance between speed and quality improvement in various NLP tasks. The study concludes that the use of augmentation methods, especially in conjunction with MCCL, leads to improved results in various classification tasks, providing a foundation for future advances in text augmentation strategies in NLP.
Wireless Crowd Detection for Smart Overtourism Mitigation
Authors: Authors: Tomás Mestre Santos, Rui Neto Marinheiro, Fernando Brito e Abreu
Subjects: Computers and Society (cs.CY); Networking and Internet Architecture (cs.NI)
Abstract
Overtourism occurs when the number of tourists exceeds the carrying capacity of a destination, leading to negative impacts on the environment, culture, and quality of life for residents. By monitoring overtourism, destination managers can identify areas of concern and implement measures to mitigate the negative impacts of tourism while promoting smarter tourism practices. This can help ensure that tourism benefits both visitors and residents while preserving the natural and cultural resources that make these destinations so appealing. This chapter describes a low-cost approach to monitoring overtourism based on mobile devices' wireless activity. A flexible architecture was designed for a smart tourism toolkit to be used by Small and Medium-sized Enterprises (SMEs) in crowding management solutions, to build better tourism services, improve efficiency and sustainability, and reduce the overwhelming feeling of pressure in critical hotspots. The crowding sensors count the number of surrounding mobile devices, by detecting trace elements of wireless technologies, mitigating the effect of MAC address randomization. They run detection programs for several technologies, and fingerprinting analysis results are only stored locally in an anonymized database, without infringing privacy rights. After that edge computing, sensors communicate the crowding information to a cloud server, by using a variety of uplink techniques to mitigate local connectivity limitations, something that has been often disregarded in alternative approaches. Field validation of sensors has been performed on Iscte's campus. Preliminary results show that these sensors can be deployed in multiple scenarios and provide a diversity of spatio-temporal crowding data that can scaffold tourism overcrowding management strategies.
Ten Words Only Still Help: Improving Black-Box AI-Generated Text Detection via Proxy-Guided Efficient Re-Sampling
Authors: Authors: Yuhui Shi, Qiang Sheng, Juan Cao, Hao Mi, Beizhe Hu, Danding Wang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
With the rapidly increasing application of large language models (LLMs), their abuse has caused many undesirable societal problems such as fake news, academic dishonesty, and information pollution. This makes AI-generated text (AIGT) detection of great importance. Among existing methods, white-box methods are generally superior to black-box methods in terms of performance and generalizability, but they require access to LLMs' internal states and are not applicable to black-box settings. In this paper, we propose to estimate word generation probabilities as pseudo white-box features via multiple re-sampling to help improve AIGT detection under the black-box setting. Specifically, we design POGER, a proxy-guided efficient re-sampling method, which selects a small subset of representative words (e.g., 10 words) for performing multiple re-sampling in black-box AIGT detection. Experiments on datasets containing texts from humans and seven LLMs show that POGER outperforms all baselines in macro F1 under black-box, partial white-box, and out-of-distribution settings and maintains lower re-sampling costs than its existing counterparts.
Switch EMA: A Free Lunch for Better Flatness and Sharpness
Authors: Authors: Siyuan Li, Zicheng Liu, Juanxi Tian, Ge Wang, Zedong Wang, Weiyang Jin, Di Wu, Cheng Tan, Tao Lin, Yang Liu, Baigui Sun, Stan Z. Li
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Exponential Moving Average (EMA) is a widely used weight averaging (WA) regularization to learn flat optima for better generalizations without extra cost in deep neural network (DNN) optimization. Despite achieving better flatness, existing WA methods might fall into worse final performances or require extra test-time computations. This work unveils the full potential of EMA with a single line of modification, i.e., switching the EMA parameters to the original model after each epoch, dubbed as Switch EMA (SEMA). From both theoretical and empirical aspects, we demonstrate that SEMA can help DNNs to reach generalization optima that better trade-off between flatness and sharpness. To verify the effectiveness of SEMA, we conduct comparison experiments with discriminative, generative, and regression tasks on vision and language datasets, including image classification, self-supervised learning, object detection and segmentation, image generation, video prediction, attribute regression, and language modeling. Comprehensive results with popular optimizers and networks show that SEMA is a free lunch for DNN training by improving performances and boosting convergence speeds.
Efficient One-stage Video Object Detection by Exploiting Temporal Consistency
Authors: Authors: Guanxiong Sun, Yang Hua, Guosheng Hu, Neil Robertson
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recently, one-stage detectors have achieved competitive accuracy and faster speed compared with traditional two-stage detectors on image data. However, in the field of video object detection (VOD), most existing VOD methods are still based on two-stage detectors. Moreover, directly adapting existing VOD methods to one-stage detectors introduces unaffordable computational costs. In this paper, we first analyse the computational bottlenecks of using one-stage detectors for VOD. Based on the analysis, we present a simple yet efficient framework to address the computational bottlenecks and achieve efficient one-stage VOD by exploiting the temporal consistency in video frames. Specifically, our method consists of a location-prior network to filter out background regions and a size-prior network to skip unnecessary computations on low-level feature maps for specific frames. We test our method on various modern one-stage detectors and conduct extensive experiments on the ImageNet VID dataset. Excellent experimental results demonstrate the superior effectiveness, efficiency, and compatibility of our method. The code is available at https://github.com/guanxiongsun/vfe.pytorch.
Synthesizing Knowledge-enhanced Features for Real-world Zero-shot Food Detection
Abstract
Food computing brings various perspectives to computer vision like vision-based food analysis for nutrition and health. As a fundamental task in food computing, food detection needs Zero-Shot Detection (ZSD) on novel unseen food objects to support real-world scenarios, such as intelligent kitchens and smart restaurants. Therefore, we first benchmark the task of Zero-Shot Food Detection (ZSFD) by introducing FOWA dataset with rich attribute annotations. Unlike ZSD, fine-grained problems in ZSFD like inter-class similarity make synthesized features inseparable. The complexity of food semantic attributes further makes it more difficult for current ZSD methods to distinguish various food categories. To address these problems, we propose a novel framework ZSFDet to tackle fine-grained problems by exploiting the interaction between complex attributes. Specifically, we model the correlation between food categories and attributes in ZSFDet by multi-source graphs to provide prior knowledge for distinguishing fine-grained features. Within ZSFDet, Knowledge-Enhanced Feature Synthesizer (KEFS) learns knowledge representation from multiple sources (e.g., ingredients correlation from knowledge graph) via the multi-source graph fusion. Conditioned on the fusion of semantic knowledge representation, the region feature diffusion model in KEFS can generate fine-grained features for training the effective zero-shot detector. Extensive evaluations demonstrate the superior performance of our method ZSFDet on FOWA and the widely-used food dataset UECFOOD-256, with significant improvements by 1.8% and 3.7% ZSD mAP compared with the strong baseline RRFS. Further experiments on PASCAL VOC and MS COCO prove that enhancement of the semantic knowledge can also improve the performance on general ZSD. Code and dataset are available at https://github.com/LanceZPF/KEFS.
TDViT: Temporal Dilated Video Transformer for Dense Video Tasks
Authors: Authors: Guanxiong Sun, Yang Hua, Guosheng Hu, Neil Robertson
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Deep video models, for example, 3D CNNs or video transformers, have achieved promising performance on sparse video tasks, i.e., predicting one result per video. However, challenges arise when adapting existing deep video models to dense video tasks, i.e., predicting one result per frame. Specifically, these models are expensive for deployment, less effective when handling redundant frames, and difficult to capture long-range temporal correlations. To overcome these issues, we propose a Temporal Dilated Video Transformer (TDViT) that consists of carefully designed temporal dilated transformer blocks (TDTB). TDTB can efficiently extract spatiotemporal representations and effectively alleviate the negative effect of temporal redundancy. Furthermore, by using hierarchical TDTBs, our approach obtains an exponentially expanded temporal receptive field and therefore can model long-range dynamics. Extensive experiments are conducted on two different dense video benchmarks, i.e., ImageNet VID for video object detection and YouTube VIS for video instance segmentation. Excellent experimental results demonstrate the superior efficiency, effectiveness, and compatibility of our method. The code is available at https://github.com/guanxiongsun/vfe.pytorch.
UR2M: Uncertainty and Resource-Aware Event Detection on Microcontrollers
Authors: Authors: Hong Jia, Young D. Kwon, Dong Ma, Nhat Pham, Lorena Qendro, Tam Vu, Cecilia Mascolo
Abstract
Traditional machine learning techniques are prone to generating inaccurate predictions when confronted with shifts in the distribution of data between the training and testing phases. This vulnerability can lead to severe consequences, especially in applications such as mobile healthcare. Uncertainty estimation has the potential to mitigate this issue by assessing the reliability of a model's output. However, existing uncertainty estimation techniques often require substantial computational resources and memory, making them impractical for implementation on microcontrollers (MCUs). This limitation hinders the feasibility of many important on-device wearable event detection (WED) applications, such as heart attack detection. In this paper, we present UR2M, a novel Uncertainty and Resource-aware event detection framework for MCUs. Specifically, we (i) develop an uncertainty-aware WED based on evidential theory for accurate event detection and reliable uncertainty estimation; (ii) introduce a cascade ML framework to achieve efficient model inference via early exits, by sharing shallower model layers among different event models; (iii) optimize the deployment of the model and MCU library for system efficiency. We conducted extensive experiments and compared UR2M to traditional uncertainty baselines using three wearable datasets. Our results demonstrate that UR2M achieves up to 864% faster inference speed, 857% energy-saving for uncertainty estimation, 55% memory saving on two popular MCUs, and a 22% improvement in uncertainty quantification performance. UR2M can be deployed on a wide range of MCUs, significantly expanding real-time and reliable WED applications.
Personalized Large Language Models
Authors: Authors: Stanisław Woźniak, Bartłomiej Koptyra, Arkadiusz Janz, Przemysław Kazienko, Jan Kocoń
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
Large language models (LLMs) have significantly advanced Natural Language Processing (NLP) tasks in recent years. However, their universal nature poses limitations in scenarios requiring personalized responses, such as recommendation systems and chatbots. This paper investigates methods to personalize LLMs, comparing fine-tuning and zero-shot reasoning approaches on subjective tasks. Results demonstrate that personalized fine-tuning improves model reasoning compared to non-personalized models. Experiments on datasets for emotion recognition and hate speech detection show consistent performance gains with personalized methods across different LLM architectures. These findings underscore the importance of personalization for enhancing LLM capabilities in subjective text perception tasks.
Hybrid Machine Learning techniques in the management of harmful algal blooms impact
Authors: Authors: Andres Molares-Ulloa, Daniel Rivero, Jesus Gil Ruiz, Enrique Fernandez-Blanco, Luis de-la-Fuente-Valentín
Abstract
Harmful algal blooms (HABs) are episodes of high concentrations of algae that are potentially toxic for human consumption. Mollusc farming can be affected by HABs because, as filter feeders, they can accumulate high concentrations of marine biotoxins in their tissues. To avoid the risk to human consumption, harvesting is prohibited when toxicity is detected. At present, the closure of production areas is based on expert knowledge and the existence of a predictive model would help when conditions are complex and sampling is not possible. Although the concentration of toxin in meat is the method most commonly used by experts in the control of shellfish production areas, it is rarely used as a target by automatic prediction models. This is largely due to the irregularity of the data due to the established sampling programs. As an alternative, the activity status of production areas has been proposed as a target variable based on whether mollusc meat has a toxicity level below or above the legal limit. This new option is the most similar to the actual functioning of the control of shellfish production areas. For this purpose, we have made a comparison between hybrid machine learning models like Neural-Network-Adding Bootstrap (BAGNET) and Discriminative Nearest Neighbor Classification (SVM-KNN) when estimating the state of production areas. The study has been carried out in several estuaries with different levels of complexity in the episodes of algal blooms to demonstrate the generalization capacity of the models in bloom detection. As a result, we could observe that, with an average recall value of 93.41% and without dropping below 90% in any of the estuaries, BAGNET outperforms the other models both in terms of results and robustness.
Insights and caveats from mining local and global temporal motifs in cryptocurrency transaction networks
Authors: Authors: Naomi A. Arnold, Peijie Zhong, Cheick Tidiane Ba, Ben Steer, Raul Mondragon, Felix Cuadrado, Renaud Lambiotte, Richard G. Clegg
Abstract
Distributed ledger technologies have opened up a wealth of fine-grained transaction data from cryptocurrencies like Bitcoin and Ethereum. This allows research into problems like anomaly detection, anti-money laundering, pattern mining and activity clustering (where data from traditional currencies is rarely available). The formalism of temporal networks offers a natural way of representing this data and offers access to a wealth of metrics and models. However, the large scale of the data presents a challenge using standard graph analysis techniques. We use temporal motifs to analyse two Bitcoin datasets and one NFT dataset, using sequences of three transactions and up to three users. We show that the commonly used technique of simply counting temporal motifs over all users and all time can give misleading conclusions. Here we also study the motifs contributed by each user and discover that the motif distribution is heavy-tailed and that the key players have diverse motif signatures. We study the motifs that occur in different time periods and find events and anomalous activity that cannot be seen just by a count on the whole dataset. Studying motif completion time reveals dynamics driven by human behaviour as well as algorithmic behaviour.
Trained Without My Consent: Detecting Code Inclusion In Language Models Trained on Code
Authors: Authors: Vahid Majdinasab, Amin Nikanjam, Foutse Khomh
Abstract
Code auditing ensures that the developed code adheres to standards, regulations, and copyright protection by verifying that it does not contain code from protected sources. The recent advent of Large Language Models (LLMs) as coding assistants in the software development process poses new challenges for code auditing. The dataset for training these models is mainly collected from publicly available sources. This raises the issue of intellectual property infringement as developers' codes are already included in the dataset. Therefore, auditing code developed using LLMs is challenging, as it is difficult to reliably assert if an LLM used during development has been trained on specific copyrighted codes, given that we do not have access to the training datasets of these models. Given the non-disclosure of the training datasets, traditional approaches such as code clone detection are insufficient for asserting copyright infringement. To address this challenge, we propose a new approach, TraWiC; a model-agnostic and interpretable method based on membership inference for detecting code inclusion in an LLM's training dataset. We extract syntactic and semantic identifiers unique to each program to train a classifier for detecting code inclusion. In our experiments, we observe that TraWiC is capable of detecting 83.87% of codes that were used to train an LLM. In comparison, the prevalent clone detection tool NiCad is only capable of detecting 47.64%. In addition to its remarkable performance, TraWiC has low resource overhead in contrast to pair-wise clone detection that is conducted during the auditing process of tools like CodeWhisperer reference tracker, across thousands of code snippets.
Few-Shot Object Detection with Sparse Context Transformers
Abstract
Few-shot detection is a major task in pattern recognition which seeks to localize objects using models trained with few labeled data. One of the mainstream few-shot methods is transfer learning which consists in pretraining a detection model in a source domain prior to its fine-tuning in a target domain. However, it is challenging for fine-tuned models to effectively identify new classes in the target domain, particularly when the underlying labeled training data are scarce. In this paper, we devise a novel sparse context transformer (SCT) that effectively leverages object knowledge in the source domain, and automatically learns a sparse context from only few training images in the target domain. As a result, it combines different relevant clues in order to enhance the discrimination power of the learned detectors and reduce class confusion. We evaluate the proposed method on two challenging few-shot object detection benchmarks, and empirical results show that the proposed method obtains competitive performance compared to the related state-of-the-art.
YOLOv8-AM: YOLOv8 with Attention Mechanisms for Pediatric Wrist Fracture Detection
Abstract
Wrist trauma and even fractures occur frequently in daily life, particularly among children who account for a significant proportion of fracture cases. Before performing surgery, surgeons often request patients to undergo X-ray imaging first and prepare for it based on the analysis of the radiologist. With the development of neural networks, You Only Look Once (YOLO) series models have been widely used in fracture detection as computer-assisted diagnosis (CAD). In 2023, Ultralytics presented the latest version of the YOLO models, which has been employed for detecting fractures across various parts of the body. Attention mechanism is one of the hottest methods to improve the model performance. This research work proposes YOLOv8-AM, which incorporates the attention mechanism into the original YOLOv8 architecture. Specifically, we respectively employ four attention modules, Convolutional Block Attention Module (CBAM), Global Attention Mechanism (GAM), Efficient Channel Attention (ECA), and Shuffle Attention (SA), to design the improved models and train them on GRAZPEDWRI-DX dataset. Experimental results demonstrate that the mean Average Precision at IoU 50 (mAP 50) of the YOLOv8-AM model based on ResBlock + CBAM (ResCBAM) increased from 63.6% to 65.8%, which achieves the state-of-the-art (SOTA) performance. Conversely, YOLOv8-AM model incorporating GAM obtains the mAP 50 value of 64.2%, which is not a satisfactory enhancement. Therefore, we combine ResBlock and GAM, introducing ResGAM to design another new YOLOv8-AM model, whose mAP 50 value is increased to 65.0%.
Mitigating Reward Hacking via Information-Theoretic Reward Modeling
Authors: Authors: Yuchun Miao, Sen Zhang, Liang Ding, Rong Bao, Lefei Zhang, Dacheng Tao
Abstract
Despite the success of reinforcement learning from human feedback (RLHF) in aligning language models with human values, reward hacking, also termed reward overoptimization, remains a critical challenge, which primarily stems from limitations in reward modeling, i.e., generalizability of the reward model and inconsistency in the preference dataset. In this work, we tackle this problem from an information theoretic-perspective, and propose a generalizable and robust framework for reward modeling, namely InfoRM, by introducing a variational information bottleneck objective to filter out irrelevant information and developing a mechanism for model complexity modulation. Notably, we further identify a correlation between overoptimization and outliers in the latent space, establishing InfoRM as a promising tool for detecting reward overoptimization. Inspired by this finding, we propose the Integrated Cluster Deviation Score (ICDS), which quantifies deviations in the latent space, as an indicator of reward overoptimization to facilitate the development of online mitigation strategies. Extensive experiments on a wide range of settings and model scales (70M, 440M, 1.4B, and 7B) support the effectiveness of InfoRM. Further analyses reveal that InfoRM's overoptimization detection mechanism is effective, potentially signifying a notable advancement in the field of RLHF. Code will be released upon acceptance.
GraSSRep: Graph-Based Self-Supervised Learning for Repeat Detection in Metagenomic Assembly
Authors: Authors: Ali Azizpour, Advait Balaji, Todd J. Treangen, Santiago Segarra
Abstract
Repetitive DNA (repeats) poses significant challenges for accurate and efficient genome assembly and sequence alignment. This is particularly true for metagenomic data, where genome dynamics such as horizontal gene transfer, gene duplication, and gene loss/gain complicate accurate genome assembly from metagenomic communities. Detecting repeats is a crucial first step in overcoming these challenges. To address this issue, we propose GraSSRep, a novel approach that leverages the assembly graph's structure through graph neural networks (GNNs) within a self-supervised learning framework to classify DNA sequences into repetitive and non-repetitive categories. Specifically, we frame this problem as a node classification task within a metagenomic assembly graph. In a self-supervised fashion, we rely on a high-precision (but low-recall) heuristic to generate pseudo-labels for a small proportion of the nodes. We then use those pseudo-labels to train a GNN embedding and a random forest classifier to propagate the labels to the remaining nodes. In this way, GraSSRep combines sequencing features with pre-defined and learned graph features to achieve state-of-the-art performance in repeat detection. We evaluate our method using simulated and synthetic metagenomic datasets. The results on the simulated data highlight our GraSSRep's robustness to repeat attributes, demonstrating its effectiveness in handling the complexity of repeated sequences. Additionally, our experiments with synthetic metagenomic datasets reveal that incorporating the graph structure and the GNN enhances our detection performance. Finally, in comparative analyses, GraSSRep outperforms existing repeat detection tools with respect to precision and recall.
Keyword: face recognition
Is my Data in your AI Model? Membership Inference Test with Application to Face Images
Authors: Authors: Daniel DeAlcala, Aythami Morales, Gonzalo Mancera, Julian Fierrez, Ruben Tolosana, Javier Ortega-Garcia
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
This paper introduces the Membership Inference Test (MINT), a novel approach that aims to empirically assess if specific data was used during the training of Artificial Intelligence (AI) models. Specifically, we propose two novel MINT architectures designed to learn the distinct activation patterns that emerge when an audited model is exposed to data used during its training process. The first architecture is based on a Multilayer Perceptron (MLP) network and the second one is based on Convolutional Neural Networks (CNNs). The proposed MINT architectures are evaluated on a challenging face recognition task, considering three state-of-the-art face recognition models. Experiments are carried out using six publicly available databases, comprising over 22 million face images in total. Also, different experimental scenarios are considered depending on the context available of the AI model to test. Promising results, up to 90% accuracy, are achieved using our proposed MINT approach, suggesting that it is possible to recognize if an AI model has been trained with specific data.
Only My Model On My Data: A Privacy Preserving Approach Protecting one Model and Deceiving Unauthorized Black-Box Models
Abstract
Deep neural networks are extensively applied to real-world tasks, such as face recognition and medical image classification, where privacy and data protection are critical. Image data, if not protected, can be exploited to infer personal or contextual information. Existing privacy preservation methods, like encryption, generate perturbed images that are unrecognizable to even humans. Adversarial attack approaches prohibit automated inference even for authorized stakeholders, limiting practical incentives for commercial and widespread adaptation. This pioneering study tackles an unexplored practical privacy preservation use case by generating human-perceivable images that maintain accurate inference by an authorized model while evading other unauthorized black-box models of similar or dissimilar objectives, and addresses the previous research gaps. The datasets employed are ImageNet, for image classification, Celeba-HQ dataset, for identity classification, and AffectNet, for emotion classification. Our results show that the generated images can successfully maintain the accuracy of a protected model and degrade the average accuracy of the unauthorized black-box models to 11.97%, 6.63%, and 55.51% on ImageNet, Celeba-HQ, and AffectNet datasets, respectively.
Abstract
The standard approach to modern self-supervised learning is to generate random views through data augmentations and minimise a loss computed from the representations of these views. This inherently encourages invariance to the transformations that comprise the data augmentation function. In this work, we show that adding a module to constrain the representations to be predictive of an affine transformation improves the performance and efficiency of the learning process. The module is agnostic to the base self-supervised model and manifests in the form of an additional loss term that encourages an aggregation of the encoder representations to be predictive of an affine transformation applied to the input images. We perform experiments in various modern self-supervised models and see a performance improvement in all cases. Further, we perform an ablation study on the components of the affine transformation to understand which of them is affecting performance the most, as well as on key architectural design decisions.
Advancing NLP Models with Strategic Text Augmentation: A Comprehensive Study of Augmentation Methods and Curriculum Strategies
Authors: Authors: Himmet Toprak Kesgin, Mehmet Fatih Amasyali
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
This study conducts a thorough evaluation of text augmentation techniques across a variety of datasets and natural language processing (NLP) tasks to address the lack of reliable, generalized evidence for these methods. It examines the effectiveness of these techniques in augmenting training sets to improve performance in tasks such as topic classification, sentiment analysis, and offensive language detection. The research emphasizes not only the augmentation methods, but also the strategic order in which real and augmented instances are introduced during training. A major contribution is the development and evaluation of Modified Cyclical Curriculum Learning (MCCL) for augmented datasets, which represents a novel approach in the field. Results show that specific augmentation methods, especially when integrated with MCCL, significantly outperform traditional training approaches in NLP model performance. These results underscore the need for careful selection of augmentation techniques and sequencing strategies to optimize the balance between speed and quality improvement in various NLP tasks. The study concludes that the use of augmentation methods, especially in conjunction with MCCL, leads to improved results in various classification tasks, providing a foundation for future advances in text augmentation strategies in NLP.
Unifying Invariance and Spuriousity for Graph Out-of-Distribution via Probability of Necessity and Sufficiency
Abstract
Graph Out-of-Distribution (OOD), requiring that models trained on biased data generalize to the unseen test data, has a massive of real-world applications. One of the most mainstream methods is to extract the invariant subgraph by aligning the original and augmented data with the help of environment augmentation. However, these solutions might lead to the loss or redundancy of semantic subgraph and further result in suboptimal generalization. To address this challenge, we propose a unified framework to exploit the Probability of Necessity and Sufficiency to extract the Invariant Substructure (PNSIS). Beyond that, this framework further leverages the spurious subgraph to boost the generalization performance in an ensemble manner to enhance the robustness on the noise data. Specificially, we first consider the data generation process for graph data. Under mild conditions, we show that the invariant subgraph can be extracted by minimizing an upper bound, which is built on the theoretical advance of probability of necessity and sufficiency. To further bridge the theory and algorithm, we devise the PNSIS model, which involves an invariant subgraph extractor for invariant graph learning as well invariant and spurious subgraph classifiers for generalization enhancement. Experimental results demonstrate that our \textbf{PNSIS} model outperforms the state-of-the-art techniques on graph OOD on several benchmarks, highlighting the effectiveness in real-world scenarios.
Domain-adaptive and Subgroup-specific Cascaded Temperature Regression for Out-of-distribution Calibration
Authors: Authors: Jiexin Wang, Jiahao Chen, Bing Su
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Although deep neural networks yield high classification accuracy given sufficient training data, their predictions are typically overconfident or under-confident, i.e., the prediction confidences cannot truly reflect the accuracy. Post-hoc calibration tackles this problem by calibrating the prediction confidences without re-training the classification model. However, current approaches assume congruence between test and validation data distributions, limiting their applicability to out-of-distribution scenarios. To this end, we propose a novel meta-set-based cascaded temperature regression method for post-hoc calibration. Our method tailors fine-grained scaling functions to distinct test sets by simulating various domain shifts through data augmentation on the validation set. We partition each meta-set into subgroups based on predicted category and confidence level, capturing diverse uncertainties. A regression network is then trained to derive category-specific and confidence-level-specific scaling, achieving calibration across meta-sets. Extensive experimental results on MNIST, CIFAR-10, and TinyImageNet demonstrate the effectiveness of the proposed method.
Prediction of Activated Sludge Settling Characteristics from Microscopy Images with Deep Convolutional Neural Networks and Transfer Learning
Authors: Authors: Sina Borzooei, Leonardo Scabini, Gisele Miranda, Saba Daneshgar, Lukas Deblieck, Piet De Langhe, Odemir Bruno, Bernard De Baets, Ingmar Nopens, Elena Torfs
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computational Engineering, Finance, and Science (cs.CE)
Abstract
Microbial communities play a key role in biological wastewater treatment processes. Activated sludge settling characteristics, for example, are affected by microbial community composition, varying by changes in operating conditions and influent characteristics of wastewater treatment plants (WWTPs). Timely assessment and prediction of changes in microbial composition leading to settling problems, such as filamentous bulking (FB), can prevent operational challenges, reductions in treatment efficiency, and adverse environmental impacts. This study presents an innovative computer vision-based approach to assess activated sludge-settling characteristics based on the morphological properties of flocs and filaments in microscopy images. Implementing the transfer learning of deep convolutional neural network (CNN) models, this approach aims to overcome the limitations of existing quantitative image analysis techniques. The offline microscopy image dataset was collected over two years, with weekly sampling at a full-scale industrial WWTP in Belgium. Multiple data augmentation techniques were employed to enhance the generalizability of the CNN models. Various CNN architectures, including Inception v3, ResNet18, ResNet152, ConvNeXt-nano, and ConvNeXt-S, were tested to evaluate their performance in predicting sludge settling characteristics. The sludge volume index was used as the final prediction variable, but the method can easily be adjusted to predict any other settling metric of choice. The results showed that the suggested CNN-based approach provides less labour-intensive, objective, and consistent assessments, while transfer learning notably minimises the training phase, resulting in a generalizable system that can be employed in real-time applications.
Keyword: detection
Game of Trojans: Adaptive Adversaries Against Output-based Trojaned-Model Detectors
Unveiling Hidden Energy Anomalies: Harnessing Deep Learning to Optimize Energy Management in Sports Facilities
Automated detection of motion artifacts in brain MR images using deep learning and explainable artificial intelligence
Learning-based Bone Quality Classification Method for Spinal Metastasis
Graph Inference Acceleration by Learning MLPs on Graphs without Supervision
The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes
Evaluating DTW Measures via a Synthesis Framework for Time-Series Data
Research and application of Transformer based anomaly detection model: A literature review
OmniBOR: A System for Automatic, Verifiable Artifact Resolution across Software Supply Chains
Detecting Adversarial Spectrum Attacks via Distance to Decision Boundary Statistics
Comment-aided Video-Language Alignment via Contrastive Pre-training for Short-form Video Humor Detection
Solid Waste Detection in Remote Sensing Images: A Survey
Detection Latencies of Anomaly Detectors: An Overlooked Perspective ?
Play Guessing Game with LLM: Indirect Jailbreak Attack with Implicit Clues
Unity is Strength: Enhancing Precision in Reentrancy Vulnerability Detection of Smart Contract Analysis Tools
Exploring the Adversarial Capabilities of Large Language Models
Advancing NLP Models with Strategic Text Augmentation: A Comprehensive Study of Augmentation Methods and Curriculum Strategies
Wireless Crowd Detection for Smart Overtourism Mitigation
Ten Words Only Still Help: Improving Black-Box AI-Generated Text Detection via Proxy-Guided Efficient Re-Sampling
Switch EMA: A Free Lunch for Better Flatness and Sharpness
Efficient One-stage Video Object Detection by Exploiting Temporal Consistency
Synthesizing Knowledge-enhanced Features for Real-world Zero-shot Food Detection
TDViT: Temporal Dilated Video Transformer for Dense Video Tasks
UR2M: Uncertainty and Resource-Aware Event Detection on Microcontrollers
Personalized Large Language Models
Hybrid Machine Learning techniques in the management of harmful algal blooms impact
Insights and caveats from mining local and global temporal motifs in cryptocurrency transaction networks
Trained Without My Consent: Detecting Code Inclusion In Language Models Trained on Code
Few-Shot Object Detection with Sparse Context Transformers
YOLOv8-AM: YOLOv8 with Attention Mechanisms for Pediatric Wrist Fracture Detection
Mitigating Reward Hacking via Information-Theoretic Reward Modeling
GraSSRep: Graph-Based Self-Supervised Learning for Repeat Detection in Metagenomic Assembly
Keyword: face recognition
Is my Data in your AI Model? Membership Inference Test with Application to Face Images
Only My Model On My Data: A Privacy Preserving Approach Protecting one Model and Deceiving Unauthorized Black-Box Models
Keyword: augmentation
Affine transformation estimation improves visual self-supervised learning
Advancing NLP Models with Strategic Text Augmentation: A Comprehensive Study of Augmentation Methods and Curriculum Strategies
Unifying Invariance and Spuriousity for Graph Out-of-Distribution via Probability of Necessity and Sufficiency
Domain-adaptive and Subgroup-specific Cascaded Temperature Regression for Out-of-distribution Calibration
Prediction of Activated Sludge Settling Characteristics from Microscopy Images with Deep Convolutional Neural Networks and Transfer Learning