New submissions for Tue, 6 Feb 24

Keyword: detection

Detection of Machine-Generated Text: Literature Survey

Authors: Authors: Dmytro Valiaiev
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.01642
Pdf link: https://arxiv.org/pdf/2402.01642
Abstract Since language models produce fake text quickly and easily, there is an oversupply of such content in the public domain. The degree of sophistication and writing style has reached a point where differentiating between human authored and machine-generated content is nearly impossible. As a result, works generated by language models rather than human authors have gained significant media attention and stirred controversy.Concerns regarding the possible influence of advanced language models on society have also arisen, needing a fuller knowledge of these processes. Natural language generation (NLG) and generative pre-trained transformer (GPT) models have revolutionized a variety of sectors: the scope not only permeated throughout journalism and customer service but also reached academia. To mitigate the hazardous implications that may arise from the use of these models, preventative measures must be implemented, such as providing human agents with the capacity to distinguish between artificially made and human composed texts utilizing automated systems and possibly reverse-engineered language models. Furthermore, to ensure a balanced and responsible approach, it is critical to have a full grasp of the socio-technological ramifications of these breakthroughs. This literature survey aims to compile and synthesize accomplishments and developments in the aforementioned work, while also identifying future prospects. It also gives an overview of machine-generated text trends and explores the larger societal implications. Ultimately, this survey intends to contribute to the development of robust and effective approaches for resolving the issues connected with the usage and detection of machine-generated text by exploring the interplay between the capabilities of language models and their possible implications.
Recent Innovations in Footwear Sensors: Role of Smart Footwear in Healthcare -- A Survey
Authors: Authors: Pradyumna G. R., Roopa B. Hegde, Bommegowda K. B., Anil Kumar Bhat, Amit N. Pujari, Ganesh R. Naik
Subjects: Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2402.01645
Pdf link: https://arxiv.org/pdf/2402.01645
Abstract Smart shoes have ushered in a new era of personalised health monitoring and assistive technology. The shoe leverages technologies such as Bluetooth for data collection and wireless transmission and incorporates features such as GPS tracking, obstacle detection, and fitness tracking. This article provides an overview of the current state of smart shoe technology, highlighting the integration of advanced sensors for health monitoring, energy harvesting, assistive features for the visually impaired, and deep learning for data analysis. The study discusses the potential of smart footwear in medical applications, particularly for patients with diabetes, and the ongoing research in this field. Current footwear challenges are also discussed, including complex construction, poor fit, comfort, and high cost.
Killer Apps: Low-Speed, Large-Scale AI Weapons
Authors: Authors: Philip Feldman, Aaron Dant, James R. Foulds
Subjects: Computers and Society (cs.CY); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.01663
Pdf link: https://arxiv.org/pdf/2402.01663
Abstract The accelerating advancements in Artificial Intelligence (AI) and Machine Learning (ML), highlighted by the development of cutting-edge Generative Pre-trained Transformer (GPT) models by organizations such as OpenAI, Meta, and Anthropic, present new challenges and opportunities in warfare and security. Much of the current focus is on AI's integration within weapons systems and its role in rapid decision-making in kinetic conflict. However, an equally important but often overlooked aspect is the potential of AI-based psychological manipulation at internet scales within the information domain. These capabilities could pose significant threats to individuals, organizations, and societies globally. This paper explores the concept of AI weapons, their deployment, detection, and potential countermeasures.
Edge Offloading in Smart Grid
Authors: Authors: Gabriel Ioan Arcas, Tudor Cioara, Ionut Anghel, Dragos Lazea, Anca Hangan
Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2402.01664
Pdf link: https://arxiv.org/pdf/2402.01664
Abstract The energy transition supports the shift towards more sustainable energy alternatives, paving towards decentralized smart grids, where the energy is generated closer to the point of use. The decentralized smart grids foresee novel data-driven low latency applications for improving resilience and responsiveness, such as peer-to-peer energy trading, microgrid control, fault detection, or demand response. However, the traditional cloud-based smart grid architectures are unable to meet the requirements of the new emerging applications such as low latency and high-reliability thus alternative architectures such as edge, fog, or hybrid need to be adopted. Moreover, edge offloading can play a pivotal role for the next-generation smart grid AI applications because it enables the efficient utilization of computing resources and addresses the challenges of increasing data generated by IoT devices, optimizing the response time, energy consumption, and network performance. However, a comprehensive overview of the current state of research is needed to support sound decisions regarding energy-related applications offloading from cloud to fog or edge, focusing on smart grid open challenges and potential impacts. In this paper, we delve into smart grid and computational distribution architec-tures, including edge-fog-cloud models, orchestration architecture, and serverless computing, and analyze the decision-making variables and optimization algorithms to assess the efficiency of edge offloading. Finally, the work contributes to a comprehensive understanding of the edge offloading in smart grid, providing a SWOT analysis to support decision making.
Resource-efficient In-orbit Detection of Earth Objects
Authors: Authors: Qiyang Zhang, Xin Yuan, Ruolin Xing, Yiran Zhang, Zimu Zheng, Xiao Ma, Mengwei Xu, Schahram Dustdar, Shangguang Wang
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2402.01675
Pdf link: https://arxiv.org/pdf/2402.01675
Abstract With the rapid proliferation of large Low Earth Orbit (LEO) satellite constellations, a huge amount of in-orbit data is generated and needs to be transmitted to the ground for processing. However, traditional LEO satellite constellations, which downlink raw data to the ground, are significantly restricted in transmission capability. Orbital edge computing (OEC), which exploits the computation capacities of LEO satellites and processes the raw data in orbit, is envisioned as a promising solution to relieve the downlink burden. Yet, with OEC, the bottleneck is shifted to the inelastic computation capacities. The computational bottleneck arises from two primary challenges that existing satellite systems have not adequately addressed: the inability to process all captured images and the limited energy supply available for satellite operations. In this work, we seek to fully exploit the scarce satellite computation and communication resources to achieve satellite-ground collaboration and present a satellite-ground collaborative system named TargetFuse for onboard object detection. TargetFuse incorporates a combination of techniques to minimize detection errors under energy and bandwidth constraints. Extensive experiments show that TargetFuse can reduce detection errors by 3.4 times on average, compared to onboard computing. TargetFuse achieves a 9.6 times improvement in bandwidth efficiency compared to the vanilla baseline under the limited bandwidth budget constraint.
A Systematic Mapping Study of Digital Twins for Diagnosis in Transportation
Authors: Authors: Liliana Marie Prikler, Franz Wotawa (Graz University of Technology, Institute for Software Technology)
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2402.01686
Pdf link: https://arxiv.org/pdf/2402.01686
Abstract In recent years, digital twins have been proposed and implemented in various fields with potential applications ranging from prototyping to maintenance. Going forward, they are to enable numerous efficient and sustainable technologies, among them autonomous cars. However, despite a large body of research in many fields, academics have yet to agree on what exactly a digital twin is -- and as a result, what its capabilities and limitations might be. To further our understanding, we explore the capabilities of digital twins concerning diagnosis in the field of transportation. We conduct a systematic mapping study including digital twins of vehicles and their components, as well as transportation infrastructure. We discovered that few papers on digital twins describe any diagnostic process. Furthermore, most existing approaches appear limited to system monitoring or fault detection. These findings suggest that we need more research for diagnostic reasoning utilizing digital twins.
Socially Aware Synthetic Data Generation for Suicidal Ideation Detection Using Large Language Models
Authors: Authors: Hamideh Ghanadian, Isar Nejadgholi, Hussein Al Osman
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.01712
Pdf link: https://arxiv.org/pdf/2402.01712
Abstract Suicidal ideation detection is a vital research area that holds great potential for improving mental health support systems. However, the sensitivity surrounding suicide-related data poses challenges in accessing large-scale, annotated datasets necessary for training effective machine learning models. To address this limitation, we introduce an innovative strategy that leverages the capabilities of generative AI models, such as ChatGPT, Flan-T5, and Llama, to create synthetic data for suicidal ideation detection. Our data generation approach is grounded in social factors extracted from psychology literature and aims to ensure coverage of essential information related to suicidal ideation. In our study, we benchmarked against state-of-the-art NLP classification models, specifically, those centered around the BERT family structures. When trained on the real-world dataset, UMD, these conventional models tend to yield F1-scores ranging from 0.75 to 0.87. Our synthetic data-driven method, informed by social factors, offers consistent F1-scores of 0.82 for both models, suggesting that the richness of topics in synthetic data can bridge the performance gap across different model complexities. Most impressively, when we combined a mere 30% of the UMD dataset with our synthetic data, we witnessed a substantial increase in performance, achieving an F1-score of 0.88 on the UMD test set. Such results underscore the cost-effectiveness and potential of our approach in confronting major challenges in the field, such as data scarcity and the quest for diversity in data representation.
Development and Testing of a Novel Large Language Model-Based Clinical Decision Support Systems for Medication Safety in 12 Clinical Specialties
Authors: Authors: Jasmine Chiat Ling Ong, Liyuan Jin, Kabilan Elangovan, Gilbert Yong San Lim, Daniel Yan Zheng Lim, Gerald Gui Ren Sng, Yuhe Ke, Joshua Yi Min Tung, Ryan Jian Zhong, Christopher Ming Yao Koh, Keane Zhi Hao Lee, Xiang Chen, Jack Kian Chng, Aung Than, Ken Junyang Goh, Daniel Shu Wei Ting
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2402.01741
Pdf link: https://arxiv.org/pdf/2402.01741
Abstract Importance: We introduce a novel Retrieval Augmented Generation (RAG)-Large Language Model (LLM) as a Clinical Decision Support System (CDSS) for safe medication prescription. This model addresses the limitations of traditional rule-based CDSS by providing relevant prescribing error alerts tailored to patient context and institutional guidelines. Objective: The study evaluates the efficacy of an LLM-based CDSS in identifying medication errors across various medical and surgical case vignettes, compared to a human expert panel. It also examines clinician preferences among different CDSS integration modalities: junior pharmacist, LLM-based CDSS alone, and a combination of both. Design, Setting, and Participants: Utilizing a RAG model with GPT-4.0, the study involved 61 prescribing error scenarios within 23 clinical vignettes across 12 specialties. An expert panel assessed these cases using the PCNE classification and NCC MERP index. Three junior pharmacists independently reviewed each vignette under simulated conditions. Main Outcomes and Measures: The study assesses the LLM-based CDSS's accuracy, precision, recall, and F1 scores in identifying Drug-Related Problems (DRPs), compared to junior pharmacists alone or in an assistive mode with the CDSS. Results: The co-pilot mode of RAG-LLM significantly improved DRP identification accuracy by 22% over solo pharmacists. It showed higher recall and F1 scores, indicating better detection of severe DRPs, despite a slight decrease in precision. Accuracy varied across categories when pharmacists had access to RAG-LLM responses. Conclusions: The RAG-LLM based CDSS enhances medication error identification accuracy when used with junior pharmacists, especially in detecting severe DRPs.
Systematic Literature Review: Computational Approaches for Humour Style Classification
Authors: Authors: Mary Ogbuka Kenneth, Foaad Khosmood, Abbas Edalat
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.01759
Pdf link: https://arxiv.org/pdf/2402.01759
Abstract Understanding various humour styles is essential for comprehending the multifaceted nature of humour and its impact on fields such as psychology and artificial intelligence. This understanding has revealed that humour, depending on the style employed, can either have therapeutic or detrimental effects on an individual's health and relationships. Although studies dedicated exclusively to computational-based humour style analysis remain somewhat rare, an expansive body of research thrives within related task, particularly binary humour and sarcasm recognition. In this systematic literature review (SLR), we survey the landscape of computational techniques applied to these related tasks and also uncover their fundamental relevance to humour style analysis. Through this study, we unveil common approaches, illuminate various datasets and evaluation metrics, and effectively navigate the complex terrain of humour research. Our efforts determine potential research gaps and outlined promising directions. Furthermore, the SLR identifies a range of features and computational models that can seamlessly transition from related tasks like binary humour and sarcasm detection to invigorate humour style classification. These features encompass incongruity, sentiment and polarity analysis, ambiguity detection, acoustic nuances, visual cues, contextual insights, and more. The computational models that emerge contain traditional machine learning paradigms, neural network architectures, transformer-based models, and specialised models attuned to the nuances of humour. Finally, the SLR provides access to existing datasets related to humour and sarcasm, facilitating the work of future researchers.
AOC-IDS: Autonomous Online Framework with Contrastive Learning for Intrusion Detection
Authors: Authors: Xinchen Zhang, Running Zhao, Zhihan Jiang, Zhicong Sun, Yulong Ding, Edith C.H. Ngai, Shuang-Hua Yang
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2402.01807
Pdf link: https://arxiv.org/pdf/2402.01807
Abstract The rapid expansion of the Internet of Things (IoT) has raised increasing concern about targeted cyber attacks. Previous research primarily focused on static Intrusion Detection Systems (IDSs), which employ offline training to safeguard IoT systems. However, such static IDSs struggle with real-world scenarios where IoT system behaviors and attack strategies can undergo rapid evolution, necessitating dynamic and adaptable IDSs. In response to this challenge, we propose AOC-IDS, a novel online IDS that features an autonomous anomaly detection module (ADM) and a labor-free online framework for continual adaptation. In order to enhance data comprehension, the ADM employs an Autoencoder (AE) with a tailored Cluster Repelling Contrastive (CRC) loss function to generate distinctive representation from limited or incrementally incoming data in the online setting. Moreover, to reduce the burden of manual labeling, our online framework leverages pseudo-labels automatically generated from the decision-making process in the ADM to facilitate periodic updates of the ADM. The elimination of human intervention for labeling and decision-making boosts the system's compatibility and adaptability in the online setting to remain synchronized with dynamic environments. Experimental validation using the NSL-KDD and UNSW-NB15 datasets demonstrates the superior performance and adaptability of AOC-IDS, surpassing the state-of-the-art solutions. The code is released at https://github.com/xinchen930/AOC-IDS.
S2malloc: Statistically Secure Allocator for Use-After-Free Protection And More
Authors: Authors: Ruizhe Wang, Meng Xu, N. Asokan
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2402.01894
Pdf link: https://arxiv.org/pdf/2402.01894
Abstract Attacks on heap memory, encompassing memory overflow, double and invalid free, use-after-free (UAF), and various heap spraying techniques are ever-increasing. Existing entropy-based secure memory allocators provide statistical defenses against virtually all of these attack vectors. Although they claim protections against UAF attacks, their designs are not tailored to detect (failed) attempts. Consequently, to beat this entropy-based protection, an attacker can simply launch the same attack repeatedly with the potential use of heap spraying to further improve their chance of success. We introduce S2malloc, aiming to enhance UAF-attempt detection without compromising other security guarantees or introducing significant performance overhead. To achieve this, we use three innovative constructs in secure allocator design: free block canaries (FBC) to detect UAF attempts, random in-block offset (RIO) to stop the attacker from accurately overwriting the victim object, and random bag layout (RBL) to impede attackers from estimating the block size based on its address. We show that (a) by reserving 25% of the object size for the RIO offset, an 8-byte canary offers a 69% protection rate if the attacker reuses the same pointer and 96% protection rate if the attacker does not, against UAF exploitation attempts targeting a 64 bytes object, with equal or higher security guarantees against all other attacks; and (b) S2malloc is practical, with only a 2.8% run-time overhead on PARSEC and an 11.5% overhead on SPEC. Compared to state-of-the-art entropy-based allocators, S2malloc improves UAF-protection without incurring additional performance overhead. Compared to UAF-mitigating allocators, S2malloc trades off a minuscule probability of failed protection for significantly lower overhead.
Are You a Real Software Engineer? Best Practices in Online Recruitment for Software Engineering Studies
Authors: Authors: Adam Alami, Mansooreh Zahedi, Neil Ernst
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2402.01925
Pdf link: https://arxiv.org/pdf/2402.01925
Abstract Online research platforms, such as Prolific, offer rapid access to diverse participant pools but also pose unique challenges in participant qualification and skill verification. Previous studies reported mixed outcomes and challenges in leveraging online platforms for the recruitment of qualified software engineers. Drawing from our experience in conducting three different studies using Prolific, we propose best practices for recruiting and screening participants to enhance the quality and relevance of both qualitative and quantitative software engineering (SE) research samples. We propose refined best practices for recruitment in SE research on Prolific. (1) Iterative and controlled prescreening, enabling focused and manageable assessment of submissions (2) task-oriented and targeted questions that assess technical skills, knowledge of basic SE concepts, and professional engagement. (3) AI detection to verify the authenticity of free-text responses. (4) Qualitative and manual assessment of responses, ensuring authenticity and relevance in participant answers (5) Additional layers of prescreening are necessary when necessary to collect data relevant to the topic of the study. (6) Fair or generous compensation post-qualification to incentivize genuine participation. By sharing our experiences and lessons learned, we contribute to the development of effective and rigorous methods for SE empirical research. particularly the ongoing effort to establish guidelines to ensure reliable data collection. These practices have the potential to transferability to other participant recruitment platforms.
MasonPerplexity at Multimodal Hate Speech Event Detection 2024: Hate Speech and Target Detection Using Transformer Ensembles
Authors: Authors: Amrita Ganguly, Al Nahian Bin Emran, Sadiya Sayara Chowdhury Puspo, Md Nishat Raihan, Dhiman Goswami, Marcos Zampieri
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2402.01967
Pdf link: https://arxiv.org/pdf/2402.01967
Abstract The automatic identification of offensive language such as hate speech is important to keep discussions civil in online communities. Identifying hate speech in multimodal content is a particularly challenging task because offensiveness can be manifested in either words or images or a juxtaposition of the two. This paper presents the MasonPerplexity submission for the Shared Task on Multimodal Hate Speech Event Detection at CASE 2024 at EACL 2024. The task is divided into two sub-tasks: sub-task A focuses on the identification of hate speech and sub-task B focuses on the identification of targets in text-embedded images during political events. We use an XLM-roBERTa-large model for sub-task A and an ensemble approach combining XLM-roBERTa-base, BERTweet-large, and BERT-base for sub-task B. Our approach obtained 0.8347 F1-score in sub-task A and 0.6741 F1-score in sub-task B ranking 3rd on both sub-tasks.
Hypergraph-Transformer (HGT) for Interactive Event Prediction in Laparoscopic and Robotic Surgery
Authors: Authors: Lianhao Yin, Yutong Ban, Jennifer Eckhoff, Ozanan Meireles, Daniela Rus, Guy Rosman
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.01974
Pdf link: https://arxiv.org/pdf/2402.01974
Abstract Understanding and anticipating intraoperative events and actions is critical for intraoperative assistance and decision-making during minimally invasive surgery. Automated prediction of events, actions, and the following consequences is addressed through various computational approaches with the objective of augmenting surgeons' perception and decision-making capabilities. We propose a predictive neural network that is capable of understanding and predicting critical interactive aspects of surgical workflow from intra-abdominal video, while flexibly leveraging surgical knowledge graphs. The approach incorporates a hypergraph-transformer (HGT) structure that encodes expert knowledge into the network design and predicts the hidden embedding of the graph. We verify our approach on established surgical datasets and applications, including the detection and prediction of action triplets, and the achievement of the Critical View of Safety (CVS). Moreover, we address specific, safety-related tasks, such as predicting the clipping of cystic duct or artery without prior achievement of the CVS. Our results demonstrate the superiority of our approach compared to unstructured alternatives.
MasonPerplexity at ClimateActivism 2024: Integrating Advanced Ensemble Techniques and Data Augmentation for Climate Activism Stance and Hate Event Identification
Authors: Authors: Al Nahian Bin Emran, Amrita Ganguly, Sadiya Sayara Chowdhury Puspo, Dhiman Goswami, Md Nishat Raihan
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2402.01976
Pdf link: https://arxiv.org/pdf/2402.01976
Abstract The task of identifying public opinions on social media, particularly regarding climate activism and the detection of hate events, has emerged as a critical area of research in our rapidly changing world. With a growing number of people voicing either to support or oppose to climate-related issues - understanding these diverse viewpoints has become increasingly vital. Our team, MasonPerplexity, participates in a significant research initiative focused on this subject. We extensively test various models and methods, discovering that our most effective results are achieved through ensemble modeling, enhanced by data augmentation techniques like back-translation. In the specific components of this research task, our team achieved notable positions, ranking 5th, 1st, and 6th in the respective sub-tasks, thereby illustrating the effectiveness of our approach in this important field of study.
SOCIALITE-LLAMA: An Instruction-Tuned Model for Social Scientific Tasks
Authors: Authors: Gourab Dey, Adithya V Ganesan, Yash Kumar Lal, Manal Shah, Shreyashee Sinha, Matthew Matero, Salvatore Giorgi, Vivek Kulkarni, H. Andrew Schwartz
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2402.01980
Pdf link: https://arxiv.org/pdf/2402.01980
Abstract Social science NLP tasks, such as emotion or humor detection, are required to capture the semantics along with the implicit pragmatics from text, often with limited amounts of training data. Instruction tuning has been shown to improve the many capabilities of large language models (LLMs) such as commonsense reasoning, reading comprehension, and computer programming. However, little is known about the effectiveness of instruction tuning on the social domain where implicit pragmatic cues are often needed to be captured. We explore the use of instruction tuning for social science NLP tasks and introduce Socialite-Llama -- an open-source, instruction-tuned Llama. On a suite of 20 social science tasks, Socialite-Llama improves upon the performance of Llama as well as matches or improves upon the performance of a state-of-the-art, multi-task finetuned model on a majority of them. Further, Socialite-Llama also leads to improvement on 5 out of 6 related social tasks as compared to Llama, suggesting instruction tuning can lead to generalized social understanding. All resources including our code, model and dataset can be found through bit.ly/socialitellama.
GenFace: A Large-Scale Fine-Grained Face Forgery Benchmark and Cross Appearance-Edge Learning
Authors: Authors: Yaning Zhang, Zitong Yu, Xiaobin Huang, Linlin Shen, Jianfeng Ren
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.02003
Pdf link: https://arxiv.org/pdf/2402.02003
Abstract The rapid advancement of photorealistic generators has reached a critical juncture where the discrepancy between authentic and manipulated images is increasingly indistinguishable. Thus, benchmarking and advancing techniques detecting digital manipulation become an urgent issue. Although there have been a number of publicly available face forgery datasets, the forgery faces are mostly generated using GAN-based synthesis technology, which does not involve the most recent technologies like diffusion. The diversity and quality of images generated by diffusion models have been significantly improved and thus a much more challenging face forgery dataset shall be used to evaluate SOTA forgery detection literature. In this paper, we propose a large-scale, diverse, and fine-grained high-fidelity dataset, namely GenFace, to facilitate the advancement of deepfake detection, which contains a large number of forgery faces generated by advanced generators such as the diffusion-based model and more detailed labels about the manipulation approaches and adopted generators. In addition to evaluating SOTA approaches on our benchmark, we design an innovative cross appearance-edge learning (CAEL) detector to capture multi-grained appearance and edge global representations, and detect discriminative and general forgery traces. Moreover, we devise an appearance-edge cross-attention (AECA) module to explore the various integrations across two domains. Extensive experiment results and visualizations show that our detection model outperforms the state of the arts on different settings like cross-generator, cross-forgery, and cross-dataset evaluations. Code and datasets will be available at \url{https://github.com/Jenine-321/GenFace
Understanding Time Series Anomaly State Detection through One-Class Classification
Authors: Authors: Hanxu Zhou, Yuan Zhang, Guangjie Leng, Ruofan Wang, Zhi-Qin John Xu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.02007
Pdf link: https://arxiv.org/pdf/2402.02007
Abstract For a long time, research on time series anomaly detection has mainly focused on finding outliers within a given time series. Admittedly, this is consistent with some practical problems, but in other practical application scenarios, people are concerned about: assuming a standard time series is given, how to judge whether another test time series deviates from the standard time series, which is more similar to the problem discussed in one-class classification (OCC). Therefore, in this article, we try to re-understand and define the time series anomaly detection problem through OCC, which we call 'time series anomaly state detection problem'. We first use stochastic processes and hypothesis testing to strictly define the 'time series anomaly state detection problem', and its corresponding anomalies. Then, we use the time series classification dataset to construct an artificial dataset corresponding to the problem. We compile 38 anomaly detection algorithms and correct some of the algorithms to adapt to handle this problem. Finally, through a large number of experiments, we fairly compare the actual performance of various time series anomaly detection algorithms, providing insights and directions for future research by researchers.
Multimodal-Enhanced Objectness Learner for Corner Case Detection in Autonomous Driving
Authors: Authors: Lixing Xiao, Ruixiao Shi, Xiaoyang Tang, Yi Zhou
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2402.02026
Pdf link: https://arxiv.org/pdf/2402.02026
Abstract Previous works on object detection have achieved high accuracy in closed-set scenarios, but their performance in open-world scenarios is not satisfactory. One of the challenging open-world problems is corner case detection in autonomous driving. Existing detectors struggle with these cases, relying heavily on visual appearance and exhibiting poor generalization ability. In this paper, we propose a solution by reducing the discrepancy between known and unknown classes and introduce a multimodal-enhanced objectness notion learner. Leveraging both vision-centric and image-text modalities, our semi-supervised learning framework imparts objectness knowledge to the student model, enabling class-aware detection. Our approach, Multimodal-Enhanced Objectness Learner (MENOL) for Corner Case Detection, significantly improves recall for novel classes with lower training costs. By achieving a 76.6% mAR-corner and 79.8% mAR-agnostic on the CODA-val dataset with just 5100 labeled training images, MENOL outperforms the baseline ORE by 71.3% and 60.6%, respectively. The code will be available at https://github.com/tryhiseyyysum/MENOL.
Universal Post-Training Reverse-Engineering Defense Against Backdoors in Deep Neural Networks
Authors: Authors: Xi Li, Hang Wang, David J. Miller, George Kesidis
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2402.02034
Pdf link: https://arxiv.org/pdf/2402.02034
Abstract A variety of defenses have been proposed against backdoors attacks on deep neural network (DNN) classifiers. Universal methods seek to reliably detect and/or mitigate backdoors irrespective of the incorporation mechanism used by the attacker, while reverse-engineering methods often explicitly assume one. In this paper, we describe a new detector that: relies on internal feature map of the defended DNN to detect and reverse-engineer the backdoor and identify its target class; can operate post-training (without access to the training dataset); is highly effective for various incorporation mechanisms (i.e., is universal); and which has low computational overhead and so is scalable. Our detection approach is evaluated for different attacks on a benchmark CIFAR-10 image classifier.
MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning
Authors: Authors: Zhe Li, Laurence T. Yang, Bocheng Ren, Xin Nie, Zhangyang Gao, Cheng Tan, Stan Z. Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.02045
Pdf link: https://arxiv.org/pdf/2402.02045
Abstract The scarcity of annotated data has sparked significant interest in unsupervised pre-training methods that leverage medical reports as auxiliary signals for medical visual representation learning. However, existing research overlooks the multi-granularity nature of medical visual representation and lacks suitable contrastive learning techniques to improve the models' generalizability across different granularities, leading to the underutilization of image-text information. To address this, we propose MLIP, a novel framework leveraging domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning. Our model includes global contrastive learning with our designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge. Experimental evaluations reveal the efficacy of our model in enhancing transfer performance for tasks such as image classification, object detection, and semantic segmentation. Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training in advancing medical representation learning.
TCI-Former: Thermal Conduction-Inspired Transformer for Infrared Small Target Detection
Authors: Authors: Tianxiang Chen, Zhentao Tan, Qi Chu, Yue Wu, Bin Liu, Nenghai Yu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.02046
Pdf link: https://arxiv.org/pdf/2402.02046
Abstract Infrared small target detection (ISTD) is critical to national security and has been extensively applied in military areas. ISTD aims to segment small target pixels from background. Most ISTD networks focus on designing feature extraction blocks or feature fusion modules, but rarely describe the ISTD process from the feature map evolution perspective. In the ISTD process, the network attention gradually shifts towards target areas. We abstract this process as the directional movement of feature map pixels to target areas through convolution, pooling and interactions with surrounding pixels, which can be analogous to the movement of thermal particles constrained by surrounding variables and particles. In light of this analogy, we propose Thermal Conduction-Inspired Transformer (TCI-Former) based on the theoretical principles of thermal conduction. According to thermal conduction differential equation in heat dynamics, we derive the pixel movement differential equation (PMDE) in the image domain and further develop two modules: Thermal Conduction-Inspired Attention (TCIA) and Thermal Conduction Boundary Module (TCBM). TCIA incorporates finite difference method with PMDE to reach a numerical approximation so that target body features can be extracted. To further remove errors in boundary areas, TCBM is designed and supervised by boundary masks to refine target body features with fine boundary details. Experiments on IRSTD-1k and NUAA-SIRST demonstrate the superiority of our method.
Feature Selection using the concept of Peafowl Mating in IDS
Authors: Authors: Partha Ghosh, Joy Sharma, Nilesh Pandey
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.02052
Pdf link: https://arxiv.org/pdf/2402.02052
Abstract Cloud computing has high applicability as an Internet based service that relies on sharing computing resources. Cloud computing provides services that are Infrastructure based, Platform based and Software based. The popularity of this technology is due to its superb performance, high level of computing ability, low cost of services, scalability, availability and flexibility. The obtainability and openness of data in cloud environment make it vulnerable to the world of cyber-attacks. To detect the attacks Intrusion Detection System is used, that can identify the attacks and ensure information security. Such a coherent and proficient Intrusion Detection System is proposed in this paper to achieve higher certainty levels regarding safety in cloud environment. In this paper, the mating behavior of peafowl is incorporated into an optimization algorithm which in turn is used as a feature selection algorithm. The algorithm is used to reduce the huge size of cloud data so that the IDS can work efficiently on the cloud to detect intrusions. The proposed model has been experimented with NSL-KDD dataset as well as Kyoto dataset and have proved to be a better as well as an efficient IDS.
RIDERS: Radar-Infrared Depth Estimation for Robust Sensing
Authors: Authors: Han Li, Yukai Ma, Yuehao Huang, Yaqing Gu, Weihua Xu, Yong Liu, Xingxing Zuo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.02067
Pdf link: https://arxiv.org/pdf/2402.02067
Abstract Dense depth recovery is crucial in autonomous driving, serving as a foundational element for obstacle avoidance, 3D object detection, and local path planning. Adverse weather conditions, including haze, dust, rain, snow, and darkness, introduce significant challenges to accurate dense depth estimation, thereby posing substantial safety risks in autonomous driving. These challenges are particularly pronounced for traditional depth estimation methods that rely on short electromagnetic wave sensors, such as visible spectrum cameras and near-infrared LiDAR, due to their susceptibility to diffraction noise and occlusion in such environments. To fundamentally overcome this issue, we present a novel approach for robust metric depth estimation by fusing a millimeter-wave Radar and a monocular infrared thermal camera, which are capable of penetrating atmospheric particles and unaffected by lighting conditions. Our proposed Radar-Infrared fusion method achieves highly accurate and finely detailed dense depth estimation through three stages, including monocular depth prediction with global scale alignment, quasi-dense Radar augmentation by learning Radar-pixels correspondences, and local scale refinement of dense depth using a scale map learner. Our method achieves exceptional visual quality and accurate metric estimation by addressing the challenges of ambiguity and misalignment that arise from directly fusing multi-modal long-wave features. We evaluate the performance of our approach on the NTU4DRadLM dataset and our self-collected challenging ZJU-Multispectrum dataset. Especially noteworthy is the unprecedented robustness demonstrated by our proposed method in smoky scenarios. Our code will be released at \url{https://github.com/MMOCKING/RIDERS}.
Exploring the Robustness of Task-oriented Dialogue Systems for Colloquial German Varieties
Authors: Authors: Ekaterina Artemova, Verena Blaschke, Barbara Plank
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2402.02078
Pdf link: https://arxiv.org/pdf/2402.02078
Abstract Mainstream cross-lingual task-oriented dialogue (ToD) systems leverage the transfer learning paradigm by training a joint model for intent recognition and slot-filling in English and applying it, zero-shot, to other languages. We address a gap in prior research, which often overlooked the transfer to lower-resource colloquial varieties due to limited test data. Inspired by prior work on English varieties, we craft and manually evaluate perturbation rules that transform German sentences into colloquial forms and use them to synthesize test sets in four ToD datasets. Our perturbation rules cover 18 distinct language phenomena, enabling us to explore the impact of each perturbation on slot and intent performance. Using these new datasets, we conduct an experimental evaluation across six different transformers. Here, we demonstrate that when applied to colloquial varieties, ToD systems maintain their intent recognition performance, losing 6% (4.62 percentage points) in accuracy on average. However, they exhibit a significant drop in slot detection, with a decrease of 31% (21 percentage points) in slot F1 score. Our findings are further supported by a transfer experiment from Standard American English to synthetic Urban African American Vernacular English.
DeCoF: Generated Video Detection via Frame Consistency
Authors: Authors: Long Ma, Jiajia Zhang, Hongping Deng, Ningyu Zhang, Yong Liao, Haiyang Yu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2402.02085
Pdf link: https://arxiv.org/pdf/2402.02085
Abstract The escalating quality of video generated by advanced video generation methods leads to new security challenges in society, which makes generated video detection an urgent research priority.To foster collaborative research in this area, we construct the first open-source dataset explicitly for generated video detection, providing a valuable resource for the community to benchmark and improve detection methodologies. Through a series of carefully designed probe experiments, our study explores the significance of temporal and spatial artifacts in developing general and robust detectors for generated video. Based on the principle of video frame consistency, we introduce a simple yet effective detection model (DeCoF) that eliminates the impact of spatial artifacts during generalizing feature learning. Our extensive experiments demonstrate the efficacy of DeCoF in detecting videos produced by unseen video generation models and confirm its powerful generalization capabilities across several commercial proprietary models.
Recent Advances in Digital Image and Video Forensics, Anti-forensics and Counter Anti-forensics
Authors: Authors: Maryam Al-Fehani, Saif Al-Kuwari
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2402.02089
Pdf link: https://arxiv.org/pdf/2402.02089
Abstract Image and video forensics have recently gained increasing attention due to the proliferation of manipulated images and videos, especially on social media platforms, such as Twitter and Instagram, which spread disinformation and fake news. This survey explores image and video identification and forgery detection covering both manipulated digital media and generative media. However, media forgery detection techniques are susceptible to anti-forensics; on the other hand, such anti-forensics techniques can themselves be detected. We therefore further cover both anti-forensics and counter anti-forensics techniques in image and video. Finally, we conclude this survey by highlighting some open problems in this domain.
Decomposition-based and Interference Perception for Infrared and Visible Image Fusion in Complex Scenes
Authors: Authors: Xilai Li, Xiaosong Li, Haishu Tan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.02096
Pdf link: https://arxiv.org/pdf/2402.02096
Abstract Infrared and visible image fusion has emerged as a prominent research in computer vision. However, little attention has been paid on complex scenes fusion, causing existing techniques to produce sub-optimal results when suffers from real interferences. To fill this gap, we propose a decomposition-based and interference perception image fusion method. Specifically, we classify the pixels of visible image from the degree of scattering of light transmission, based on which we then separate the detail and energy information of the image. This refined decomposition facilitates the proposed model in identifying more interfering pixels that are in complex scenes. To strike a balance between denoising and detail preservation, we propose an adaptive denoising scheme for fusing detail components. Meanwhile, we propose a new weighted fusion rule by considering the distribution of image energy information from the perspective of multiple directions. Extensive experiments in complex scenes fusions cover adverse weathers, noise, blur, overexposure, fire, as well as downstream tasks including semantic segmentation, object detection, salient object detection and depth estimation, consistently indicate the effectiveness and superiority of the proposed method compared with the recent representative methods.
Probing Critical Learning Dynamics of PLMs for Hate Speech Detection
Authors: Authors: Sarah Masud, Mohammad Aflah Khan, Vikram Goyal, Md Shad Akhtar, Tanmoy Chakraborty
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2402.02144
Pdf link: https://arxiv.org/pdf/2402.02144
Abstract Despite the widespread adoption, there is a lack of research into how various critical aspects of pretrained language models (PLMs) affect their performance in hate speech detection. Through five research questions, our findings and recommendations lay the groundwork for empirically investigating different aspects of PLMs' use in hate speech detection. We deep dive into comparing different pretrained models, evaluating their seed robustness, finetuning settings, and the impact of pretraining data collection time. Our analysis reveals early peaks for downstream tasks during pretraining, the limited benefit of employing a more recent pretraining corpus, and the significance of specific layers during finetuning. We further call into question the use of domain-specific models and highlight the need for dynamic datasets for benchmarking hate speech detection.
Collaborative Agents for Software Engineering
Authors: Authors: Daniel Tang, Zhenghan Chen, Kisub Kim, Yewei Song, Haoye Tian, Saad Ezzini, Yongfeng Huang, Jacques Klein Tegawende F. Bissyande
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2402.02172
Pdf link: https://arxiv.org/pdf/2402.02172
Abstract Code review is a heavily collaborative process, which aims at ensuring the overall quality and reliability of software. While it provides massive benefits, the implementation of code review in an organization faces several challenges that make its automation appealing. Automated code review tools have been around for a while and are now improving thanks to the adoption of novel AI models, which help can learn about standard practices and systematically check that the reviewed code adheres to them. Unfortunately, existing methods fall short: they often target a single input-output generative model, which cannot simulate the collaboration interactions in code review to account for various perspectives; they are also sub-performing on various critical code review sub-tasks. In this paper, we advance the state of the art in code review automation by introducing CodeAgent, a novel multi-agent-based system for code review. Fundamentally, CodeAgent is steered by QA-Checker (short for ``Question-Answer Checking"), a supervision agent, designed specifically to ensure that all agents' contributions remain relevant to the initial review question. CodeAgent is autonomous, multi-agent, and Large language model-driven. To demonstrate the effectiveness of CodeAgent, we performed experiments to assess its capabilities in various tasks including 1) detection of inconsistencies between code changes and commit messages, 2) detection of vulnerability introduction by commits, and 3) validation of adherence to code style. Our website is accessed in \url{https://code-agent-new.vercel.app/index.html}.
Detecting Respiratory Pathologies Using Convolutional Neural Networks and Variational Autoencoders for Unbalancing Data
Authors: Authors: María Teresa García-Ordás, José Alberto Benítez-Andrades, Isaías García-Rodríguez, Carmen Benavides, Héctor Alaiz-Moretón
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.02183
Pdf link: https://arxiv.org/pdf/2402.02183
Abstract The aim of this paper was the detection of pathologies through respiratory sounds. The ICBHI (International Conference on Biomedical and Health Informatics) Benchmark was used. This dataset is composed of 920 sounds of which 810 are of chronic diseases, 75 of non-chronic diseases and only 35 of healthy individuals. As more than 88% of the samples of the dataset are from the same class (Chronic), the use of a Variational Convolutional Autoencoder was proposed to generate new labeled data and other well known oversampling techniques after determining that the dataset classes are unbalanced. Once the preprocessing step was carried out, a Convolutional Neural Network (CNN) was used to classify the respiratory sounds into healthy, chronic, and non-chronic disease. In addition, we carried out a more challenging classification trying to distinguish between the different types of pathologies or healthy: URTI, COPD, Bronchiectasis, Pneumonia, and Bronchiolitis. We achieved results up to 0.993 F-Score in the three-label classification and 0.990 F-Score in the more challenging six-class classification.
Diabetes detection using deep learning techniques with oversampling and feature augmentation
Authors: Authors: María Teresa García-Ordás, Carmen Benavides, José Alberto Benítez-Andrades, Héctor Alaiz-Moretón, Isaías García-Rodríguez
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2402.02188
Pdf link: https://arxiv.org/pdf/2402.02188
Abstract Background and objective: Diabetes is a chronic pathology which is affecting more and more people over the years. It gives rise to a large number of deaths each year. Furthermore, many people living with the disease do not realize the seriousness of their health status early enough. Late diagnosis brings about numerous health problems and a large number of deaths each year so the development of methods for the early diagnosis of this pathology is essential. Methods: In this paper, a pipeline based on deep learning techniques is proposed to predict diabetic people. It includes data augmentation using a variational autoencoder (VAE), feature augmentation using an sparse autoencoder (SAE) and a convolutional neural network for classification. Pima Indians Diabetes Database, which takes into account information on the patients such as the number of pregnancies, glucose or insulin level, blood pressure or age, has been evaluated. Results: A 92.31% of accuracy was obtained when CNN classifier is trained jointly the SAE for featuring augmentation over a well balanced dataset. This means an increment of 3.17% of accuracy with respect the state-of-the-art. Conclusions: Using a full deep learning pipeline for data preprocessing and classification has demonstrate to be very promising in the diabetes detection field outperforming the state-of-the-art proposals.
CoFiNet: Unveiling Camouflaged Objects with Multi-Scale Finesse
Authors: Authors: Cunhan Guo, Heyan Huang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.02217
Pdf link: https://arxiv.org/pdf/2402.02217
Abstract Camouflaged Object Detection (COD) is a critical aspect of computer vision aimed at identifying concealed objects, with applications spanning military, industrial, medical and monitoring domains. To address the problem of poor detail segmentation effect, we introduce a novel method for camouflage object detection, named CoFiNet. Our approach primarily focuses on multi-scale feature fusion and extraction, with special attention to the model's segmentation effectiveness for detailed features, enhancing its ability to effectively detect camouflaged objects. CoFiNet adopts a coarse-to-fine strategy. A multi-scale feature integration module is laveraged to enhance the model's capability of fusing context feature. A multi-activation selective kernel module is leveraged to grant the model the ability to autonomously alter its receptive field, enabling it to selectively choose an appropriate receptive field for camouflaged objects of different sizes. During mask generation, we employ the dual-mask strategy for image segmentation, separating the reconstruction of coarse and fine masks, which significantly enhances the model's learning capacity for details. Comprehensive experiments were conducted on four different datasets, demonstrating that CoFiNet achieves state-of-the-art performance across all datasets. The experiment results of CoFiNet underscore its effectiveness in camouflage object detection and highlight its potential in various practical application scenarios.
Revisiting Generative Adversarial Networks for Binary Semantic Segmentation on Imbalanced Datasets
Authors: Authors: Lei Xu, Moncef Gabbouj
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.02245
Pdf link: https://arxiv.org/pdf/2402.02245
Abstract Anomalous pavement surface conditions detection aims to detect pixels representing anomalous states, such as cracks, on pavement surface images automatically by algorithms. Recently, deep learning models have been intensively applied to related topics with outstanding performance. However, most existing deep learning-related solutions rarely achieve a stable performance on diverse datasets. To address this issue, in this work, we propose a deep learning framework based on conditional Generative Adversarial Networks for anomalous region detection on pavement images at the pixel level. In particular, the proposed framework is developed to enhance the generator's ability to estimate the probability feature map from heterogeneous inputs with two training stages and multiscale feature representation. Moreover, several attention mechanisms are incorporated into the proposed framework to mitigate the performance deterioration of model training on severely imbalanced datasets. We implement experiments on six accessible pavement datasets. Extensive qualitative and quantitative experiments demonstrate that the proposed framework can achieve SOTA results on these datasets efficiently and robustly.
Data Quality Matters: Suicide Intention Detection on Social Media Posts Using a RoBERTa-CNN Model
Authors: Authors: Emily Lin, Jian Sun, Hsingyu Chen, Mohammad H. Mahoor
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2402.02262
Pdf link: https://arxiv.org/pdf/2402.02262
Abstract Suicide remains a global health concern for the field of health, which urgently needs innovative approaches for early detection and intervention. In this paper, we focus on identifying suicidal intentions in SuicideWatch Reddit posts and present a novel approach to suicide detection using the cutting-edge RoBERTa-CNN model, a variant of RoBERTa (Robustly optimized BERT approach). RoBERTa is used for various Natural Language Processing (NLP) tasks, including text classification and sentiment analysis. The effectiveness of the RoBERTa lies in its ability to capture textual information and form semantic relationships within texts. By adding the Convolution Neural Network (CNN) layer to the original model, the RoBERTa enhances its ability to capture important patterns from heavy datasets. To evaluate the RoBERTa-CNN, we experimented on the Suicide and Depression Detection dataset and obtained solid results. For example, RoBERTa-CNN achieves 98% mean accuracy with the standard deviation (STD) of 0.0009. It also reaches over 97.5% mean AUC value with an STD of 0.0013. In the meanwhile, RoBERTa-CNN outperforms competitive methods, demonstrating the robustness and ability to capture nuanced linguistic patterns for suicidal intentions. Therefore, RoBERTa-CNN can detect suicide intention on text data very well.
$\textit{A Contrario}$ Paradigm for YOLO-based Infrared Small Target Detection
Authors: Authors: Alina Ciocarlan, Sylvie Le Hégarat-Mascle, Sidonie Lefebvre, Arnaud Woiselle, Clara Barbanson
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.02288
Pdf link: https://arxiv.org/pdf/2402.02288
Abstract Detecting small to tiny targets in infrared images is a challenging task in computer vision, especially when it comes to differentiating these targets from noisy or textured backgrounds. Traditional object detection methods such as YOLO struggle to detect tiny objects compared to segmentation neural networks, resulting in weaker performance when detecting small targets. To reduce the number of false alarms while maintaining a high detection rate, we introduce an $\textit{a contrario}$ decision criterion into the training of a YOLO detector. The latter takes advantage of the $\textit{unexpectedness}$ of small targets to discriminate them from complex backgrounds. Adding this statistical criterion to a YOLOv7-tiny bridges the performance gap between state-of-the-art segmentation methods for infrared small target detection and object detection networks. It also significantly increases the robustness of YOLO towards few-shot settings.
Joint Activity and Data Detection for Massive Grant-Free Access Using Deterministic Non-Orthogonal Signatures
Authors: Authors: Nam Yul Yu, Wei Yu
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2402.02307
Pdf link: https://arxiv.org/pdf/2402.02307
Abstract Grant-free access is a key enabler for connecting wireless devices with low latency and low signaling overhead in massive machine-type communications (mMTC). For massive grant-free access, user-specific signatures are uniquely assigned to mMTC devices. In this paper, we first derive a sufficient condition for the successful identification of active devices through maximum likelihood (ML) estimation in massive grant-free access. The condition is represented by the coherence of a signature sequence matrix containing the signatures of all devices. Then, we present a design framework of non-orthogonal signature sequences in a deterministic fashion. The design principle relies on unimodular masking sequences with low correlation, which are applied as masking sequences to the columns of the discrete Fourier transform (DFT) matrix. For example constructions, we use four polyphase masking sequences represented by characters over finite fields. Leveraging algebraic techniques, we show that the signature sequence matrix of proposed non-orthogonal sequences has theoretically bounded low coherence. Simulation results demonstrate that the deterministic non-orthogonal signatures achieve the excellent performance of joint activity and data detection by ML- and approximate message passing (AMP)-based algorithms for massive grant-free access in mMTC.
Timer: Transformers for Time Series Analysis at Scale
Authors: Authors: Yong Liu, Haoran Zhang, Chenyu Li, Xiangdong Huang, Jianmin Wang, Mingsheng Long
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.02368
Pdf link: https://arxiv.org/pdf/2402.02368
Abstract Deep learning has contributed remarkably to the advancement of time series analysis. Still, deep models can encounter performance bottlenecks in real-world small-sample scenarios, which can be concealed due to the performance saturation with small models on current benchmarks. Meanwhile, large models have demonstrated great powers in these scenarios through large-scale pre-training. Continuous progresses have been achieved as the emergence of large language models, exhibiting unprecedented ability in few-shot generalization, scalability, and task generality, which is however absent in time series models. To change the current practices of training small models on specific datasets from scratch, this paper aims at an early development of large time series models (LTSM). During pre-training, we curate large-scale datasets with up to 1 billion time points, unify heterogeneous time series into single-series sequence (S3) format, and develop the GPT-style architecture toward LTSMs. To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task. The outcome of this study is a Time Series Transformer (Timer), that is pre-trained by autoregressive next token prediction on large multi-domain datasets, and is fine-tuned to downstream scenarios with promising abilities as an LTSM.
AI Art Neural Constellation: Revealing the Collective and Contrastive State of AI-Generated and Human Art
Authors: Authors: Faizan Farooq Khan, Diana Kim, Divyansh Jha, Youssef Mohamed, Hanna H Chang, Ahmed Elgammal, Luba Elliott, Mohamed Elhoseiny
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.02453
Pdf link: https://arxiv.org/pdf/2402.02453
Abstract Discovering the creative potentials of a random signal to various artistic expressions in aesthetic and conceptual richness is a ground for the recent success of generative machine learning as a way of art creation. To understand the new artistic medium better, we conduct a comprehensive analysis to position AI-generated art within the context of human art heritage. Our comparative analysis is based on an extensive dataset, dubbed ArtConstellation,'' consisting of annotations about art principles, likability, and emotions for 6,000 WikiArt and 3,200 AI-generated artworks. After training various state-of-the-art generative models, art samples are produced and compared with WikiArt data on the last hidden layer of a deep-CNN trained for style classification. We actively examined the various art principles to interpret the neural representations and used them to drive the comparative knowledge about human and AI-generated art. A key finding in the semantic analysis is that AI-generated artworks are visually related to the principle concepts for modern period art made in 1800-2000. In addition, through Out-Of-Distribution (OOD) and In-Distribution (ID) detection in CLIP space, we find that AI-generated artworks are ID to human art when they depict landscapes and geometric abstract figures, while detected as OOD when the machine art consists of deformed and twisted figures. We observe that machine-generated art is uniquely characterized by incomplete and reduced figuration. Lastly, we conducted a human survey about emotional experience. Color composition and familiar subjects are the key factors of likability and emotions in art appreciation. We propose our whole methodologies and collected dataset as our analytical framework to contrast human and AI-generated art, which we refer to asArtNeuralConstellation''. Code is available at: https://github.com/faixan-khan/ArtNeuralConstellation
DeSparsify: Adversarial Attack Against Token Sparsification Mechanisms in Vision Transformers
Authors: Authors: Oryan Yehezkel, Alon Zolfi, Amit Baras, Yuval Elovici, Asaf Shabtai
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.02554
Pdf link: https://arxiv.org/pdf/2402.02554
Abstract Vision transformers have contributed greatly to advancements in the computer vision domain, demonstrating state-of-the-art performance in diverse tasks (e.g., image classification, object detection). However, their high computational requirements grow quadratically with the number of tokens used. Token sparsification techniques have been proposed to address this issue. These techniques employ an input-dependent strategy, in which uninformative tokens are discarded from the computation pipeline, improving the model's efficiency. However, their dynamism and average-case assumption makes them vulnerable to a new threat vector - carefully crafted adversarial examples capable of fooling the sparsification mechanism, resulting in worst-case performance. In this paper, we present DeSparsify, an attack targeting the availability of vision transformers that use token sparsification mechanisms. The attack aims to exhaust the operating system's resources, while maintaining its stealthiness. Our evaluation demonstrates the attack's effectiveness on three token sparsification techniques and examines the attack's transferability between them and its effect on the GPU resources. To mitigate the impact of the attack, we propose various countermeasures.
Gazebo Plants: Simulating Plant-Robot Interaction with Cosserat Rods
Authors: Authors: Junchen Deng, Samhita Marri, Jonathan Klein, Wojtek Pałubicki, Sören Pirk, Girish Chowdhary, Dominik L. Michels
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.02570
Pdf link: https://arxiv.org/pdf/2402.02570
Abstract Robotic harvesting has the potential to positively impact agricultural productivity, reduce costs, improve food quality, enhance sustainability, and to address labor shortage. In the rapidly advancing field of agricultural robotics, the necessity of training robots in a virtual environment has become essential. Generating training data to automatize the underlying computer vision tasks such as image segmentation, object detection and classification, also heavily relies on such virtual environments as synthetic data is often required to overcome the shortage and lack of variety of real data sets. However, physics engines commonly employed within the robotics community, such as ODE, Simbody, Bullet, and DART, primarily support motion and collision interaction of rigid bodies. This inherent limitation hinders experimentation and progress in handling non-rigid objects such as plants and crops. In this contribution, we present a plugin for the Gazebo simulation platform based on Cosserat rods to model plant motion. It enables the simulation of plants and their interaction with the environment. We demonstrate that, using our plugin, users can conduct harvesting simulations in Gazebo by simulating a robotic arm picking fruits and achieve results comparable to real-world experiments.
Spatio-temporal Prompting Network for Robust Video Feature Extraction
Authors: Authors: Guanxiong Sun, Chi Wang, Zhaoyu Zhang, Jiankang Deng, Stefanos Zafeiriou, Yang Hua
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.02574
Pdf link: https://arxiv.org/pdf/2402.02574
Abstract Frame quality deterioration is one of the main challenges in the field of video understanding. To compensate for the information loss caused by deteriorated frames, recent approaches exploit transformer-based integration modules to obtain spatio-temporal information. However, these integration modules are heavy and complex. Furthermore, each integration module is specifically tailored for its target task, making it difficult to generalise to multiple tasks. In this paper, we present a neat and unified framework, called Spatio-Temporal Prompting Network (STPN). It can efficiently extract robust and accurate video features by dynamically adjusting the input features in the backbone network. Specifically, STPN predicts several video prompts containing spatio-temporal information of neighbour frames. Then, these video prompts are prepended to the patch embeddings of the current frame as the updated input for video feature extraction. Moreover, STPN is easy to generalise to various video tasks because it does not contain task-specific modules. Without bells and whistles, STPN achieves state-of-the-art performance on three widely-used datasets for different video understanding tasks, i.e., ImageNetVID for video object detection, YouTubeVIS for video instance segmentation, and GOT-10k for visual object tracking. Code is available at https://github.com/guanxiongsun/vfe.pytorch.
Evading Deep Learning-Based Malware Detectors via Obfuscation: A Deep Reinforcement Learning Approach
Authors: Authors: Brian Etter, James Lee Hu, Mohammedreza Ebrahimi, Weifeng Li, Xin Li, Hsinchun Chen
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.02600
Pdf link: https://arxiv.org/pdf/2402.02600
Abstract Adversarial Malware Generation (AMG), the generation of adversarial malware variants to strengthen Deep Learning (DL)-based malware detectors has emerged as a crucial tool in the development of proactive cyberdefense. However, the majority of extant works offer subtle perturbations or additions to executable files and do not explore full-file obfuscation. In this study, we show that an open-source encryption tool coupled with a Reinforcement Learning (RL) framework can successfully obfuscate malware to evade state-of-the-art malware detection engines and outperform techniques that use advanced modification methods. Our results show that the proposed method improves the evasion rate from 27%-49% compared to widely-used state-of-the-art reinforcement learning-based methods.
Learning with Mixture of Prototypes for Out-of-Distribution Detection
Authors: Authors: Haodong Lu, Dong Gong, Shuo Wang, Jason Xue, Lina Yao, Kristen Moore
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.02653
Pdf link: https://arxiv.org/pdf/2402.02653
Abstract Out-of-distribution (OOD) detection aims to detect testing samples far away from the in-distribution (ID) training data, which is crucial for the safe deployment of machine learning models in the real world. Distance-based OOD detection methods have emerged with enhanced deep representation learning. They identify unseen OOD samples by measuring their distances from ID class centroids or prototypes. However, existing approaches learn the representation relying on oversimplified data assumptions, e.g, modeling ID data of each class with one centroid class prototype or using loss functions not designed for OOD detection, which overlook the natural diversities within the data. Naively enforcing data samples of each class to be compact around only one prototype leads to inadequate modeling of realistic data and limited performance. To tackle these issues, we propose PrototypicAl Learning with a Mixture of prototypes (PALM) which models each class with multiple prototypes to capture the sample diversities, and learns more faithful and compact samples embeddings to enhance OOD detection. Our method automatically identifies and dynamically updates prototypes, assigning each sample to a subset of prototypes via reciprocal neighbor soft assignment weights. PALM optimizes a maximum likelihood estimation (MLE) loss to encourage the sample embeddings to be compact around the associated prototypes, as well as a contrastive loss on all prototypes to enhance intra-class compactness and inter-class discrimination at the prototype level. Moreover, the automatic estimation of prototypes enables our approach to be extended to the challenging OOD detection task with unlabelled ID data. Extensive experiments demonstrate the superiority of PALM, achieving state-of-the-art average AUROC performance of 93.82 on the challenging CIFAR-100 benchmark. Code is available at https://github.com/jeff024/PALM.
Innovative Cybersickness Detection: Exploring Head Movement Patterns in Virtual Reality
Authors: Authors: Masoud Salehi, Nikoo Javadpour, Brietta Beisner, Mohammadamin Sanaei, Stephen B. Gilbert
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2402.02725
Pdf link: https://arxiv.org/pdf/2402.02725
Abstract Despite the widespread adoption of Virtual Reality (VR) technology, cybersickness remains a barrier for some users. This research investigates head movement patterns as a novel physiological marker for cybersickness detection. Unlike traditional markers, head movements provide a continuous, non-invasive measure that can be easily captured through the sensors embedded in all commercial VR headsets. We used a publicly available dataset from a VR experiment involving 75 participants and analyzed head movements across six axes. An extensive feature extraction process was then performed on the head movement dataset and its derivatives, including velocity, acceleration, and jerk. Three categories of features were extracted, encompassing statistical, temporal, and spectral features. Subsequently, we employed the Recursive Feature Elimination method to select the most important and effective features. In a series of experiments, we trained a variety of machine learning algorithms. The results demonstrate a 76% accuracy and 83% precision in predicting cybersickness in the subjects based on the head movements. This study contribution to the cybersickness literature lies in offering a preliminary analysis of a new source of data and providing insight into the relationship of head movements and cybersickness.
Improving Robustness of LiDAR-Camera Fusion Model against Weather Corruption from Fusion Strategy Perspective
Authors: Authors: Yihao Huang, Kaiyuan Yu, Qing Guo, Felix Juefei-Xu, Xiaojun Jia, Tianlin Li, Geguang Pu, Yang Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.02738
Pdf link: https://arxiv.org/pdf/2402.02738
Abstract In recent years, LiDAR-camera fusion models have markedly advanced 3D object detection tasks in autonomous driving. However, their robustness against common weather corruption such as fog, rain, snow, and sunlight in the intricate physical world remains underexplored. In this paper, we evaluate the robustness of fusion models from the perspective of fusion strategies on the corrupted dataset. Based on the evaluation, we further propose a concise yet practical fusion strategy to enhance the robustness of the fusion models, namely flexibly weighted fusing features from LiDAR and camera sources to adapt to varying weather scenarios. Experiments conducted on four types of fusion models, each with two distinct lightweight implementations, confirm the broad applicability and effectiveness of the approach.
DisDet: Exploring Detectability of Backdoor Attack on Diffusion Models
Authors: Authors: Yang Sui, Huy Phan, Jinqi Xiao, Tianfang Zhang, Zijie Tang, Cong Shi, Yan Wang, Yingying Chen, Bo Yuan
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.02739
Pdf link: https://arxiv.org/pdf/2402.02739
Abstract In the exciting generative AI era, the diffusion model has emerged as a very powerful and widely adopted content generation and editing tool for various data modalities, making the study of their potential security risks very necessary and critical. Very recently, some pioneering works have shown the vulnerability of the diffusion model against backdoor attacks, calling for in-depth analysis and investigation of the security challenges of this popular and fundamental AI technique. In this paper, for the first time, we systematically explore the detectability of the poisoned noise input for the backdoored diffusion models, an important performance metric yet little explored in the existing works. Starting from the perspective of a defender, we first analyze the properties of the trigger pattern in the existing diffusion backdoor attacks, discovering the important role of distribution discrepancy in Trojan detection. Based on this finding, we propose a low-cost trigger detection mechanism that can effectively identify the poisoned input noise. We then take a further step to study the same problem from the attack side, proposing a backdoor attack strategy that can learn the unnoticeable trigger to evade our proposed detection scheme. Empirical evaluations across various diffusion models and datasets demonstrate the effectiveness of the proposed trigger detection and detection-evading attack strategy. For trigger detection, our distribution discrepancy-based solution can achieve a 100\% detection rate for the Trojan triggers used in the existing works. For evading trigger detection, our proposed stealthy trigger design approach performs end-to-end learning to make the distribution of poisoned noise input approach that of benign noise, enabling nearly 100\% detection pass rate with very high attack and benign performance for the backdoored diffusion models.
Transmission Line Detection Based on Improved Hough Transform
Authors: Authors: Wei Song, Pei Li, Man Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.02761
Pdf link: https://arxiv.org/pdf/2402.02761
Abstract To address the challenges of low detection accuracy and high false positive rates of transmission lines in UAV (Unmanned Aerial Vehicle) images, we explore the linear features and spatial distribution. We introduce an enhanced stochastic Hough transform technique tailored for detecting transmission lines in complex backgrounds. By employing the Hessian matrix for initial preprocessing of transmission lines, and utilizing boundary search and pixel row segmentation, our approach distinguishes transmission line areas from the background. We significantly reduce both false positives and missed detections, thereby improving the accuracy of transmission line identification. Experiments demonstrate that our method not only processes images more rapidly, but also yields superior detection results compared to conventional and random Hough transform methods.
Dual Knowledge Distillation for Efficient Sound Event Detection
Authors: Authors: Yang Xiao, Rohan Kumar Das
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.02781
Pdf link: https://arxiv.org/pdf/2402.02781
Abstract Sound event detection (SED) is essential for recognizing specific sounds and their temporal locations within acoustic signals. This becomes challenging particularly for on-device applications, where computational resources are limited. To address this issue, we introduce a novel framework referred to as dual knowledge distillation for developing efficient SED systems in this work. Our proposed dual knowledge distillation commences with temporal-averaging knowledge distillation (TAKD), utilizing a mean student model derived from the temporal averaging of the student model's parameters. This allows the student model to indirectly learn from a pre-trained teacher model, ensuring a stable knowledge distillation. Subsequently, we introduce embedding-enhanced feature distillation (EEFD), which involves incorporating an embedding distillation layer within the student model to bolster contextual learning. On DCASE 2023 Task 4A public evaluation dataset, our proposed SED system with dual knowledge distillation having merely one-third of the baseline model's parameters, demonstrates superior performance in terms of PSDS1 and PSDS2. This highlights the importance of proposed dual knowledge distillation for compact SED systems, which can be ideal for edge devices.
Joint Attention-Guided Feature Fusion Network for Saliency Detection of Surface Defects
Authors: Authors: Xiaoheng Jiang, Feng Yan, Yang Lu, Ke Wang, Shuai Guo, Tianzhu Zhang, Yanwei Pang, Jianwei Niu, Mingliang Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.02797
Pdf link: https://arxiv.org/pdf/2402.02797
Abstract Surface defect inspection plays an important role in the process of industrial manufacture and production. Though Convolutional Neural Network (CNN) based defect inspection methods have made huge leaps, they still confront a lot of challenges such as defect scale variation, complex background, low contrast, and so on. To address these issues, we propose a joint attention-guided feature fusion network (JAFFNet) for saliency detection of surface defects based on the encoder-decoder network. JAFFNet mainly incorporates a joint attention-guided feature fusion (JAFF) module into decoding stages to adaptively fuse low-level and high-level features. The JAFF module learns to emphasize defect features and suppress background noise during feature fusion, which is beneficial for detecting low-contrast defects. In addition, JAFFNet introduces a dense receptive field (DRF) module following the encoder to capture features with rich context information, which helps detect defects of different scales. The JAFF module mainly utilizes a learned joint channel-spatial attention map provided by high-level semantic features to guide feature fusion. The attention map makes the model pay more attention to defect features. The DRF module utilizes a sequence of multi-receptive-field (MRF) units with each taking as inputs all the preceding MRF feature maps and the original input. The obtained DRF features capture rich context information with a large range of receptive fields. Extensive experiments conducted on SD-saliency-900, Magnetic tile, and DAGM 2007 indicate that our method achieves promising performance in comparison with other state-of-the-art methods. Meanwhile, our method reaches a real-time defect detection speed of 66 FPS.
A Comprehensive Numerical Approach to Coil Placement in Cerebral Aneurysms: Mathematical Modeling and In Silico Occlusion Classification
Authors: Authors: Fabian Holzberger, Markus Muhr, Barbara Wohlmuth
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2402.02798
Pdf link: https://arxiv.org/pdf/2402.02798
Abstract Endovascular coil embolization is one of the primary treatment techniques for cerebral aneurysms. Although it is a well established and minimally invasive method, it bears the risk of sub-optimal coil placement which can lead to incomplete occlusion of the aneurysm possibly causing recurrence. One of the key features of coils is that they have an imprinted natural shape supporting the fixation within the aneurysm. For the spatial discretization our mathematical coil model is based on the Discrete Elastic Rod model which results in a dimension-reduced 1D system of differential equations. We include bending and twisting responses to account for the coils natural curvature. Collisions between coil segments and the aneurysm-wall are handled by an efficient contact algorithm that relies on an octree based collision detection. The numerical solution of the model is obtained by a symplectic semi-implicit Euler time stepping method. Our model can be easily incorporated into blood flow simulations of embolized aneurysms. In order to differentiate optimal from sub-optimal placements, we employ a suitable in silico Raymond-Roy type occlusion classification and measure the local packing density in the aneurysm at its neck, wall-region and core. We investigate the impact of uncertainties in the coil parameters and embolization procedure. To this end, we vary the position and the angle of insertion of the microcatheter, and approximate the local packing density distributions by evaluating sample statistics.
Are Sounds Sound for Phylogenetic Reconstruction?
Authors: Authors: Luise Häuser, Gerhard Jäger, Taraka Rama, Johann-Mattis List, Alexandros Stamatakis
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2402.02807
Pdf link: https://arxiv.org/pdf/2402.02807
Abstract In traditional studies on language evolution, scholars often emphasize the importance of sound laws and sound correspondences for phylogenetic inference of language family trees. However, to date, computational approaches have typically not taken this potential into account. Most computational studies still rely on lexical cognates as major data source for phylogenetic reconstruction in linguistics, although there do exist a few studies in which authors praise the benefits of comparing words at the level of sound sequences. Building on (a) ten diverse datasets from different language families, and (b) state-of-the-art methods for automated cognate and sound correspondence detection, we test, for the first time, the performance of sound-based versus cognate-based approaches to phylogenetic reconstruction. Our results show that phylogenies reconstructed from lexical cognates are topologically closer, by approximately one third with respect to the generalized quartet distance on average, to the gold standard phylogenies than phylogenies reconstructed from sound correspondences.
Revisiting VAE for Unsupervised Time Series Anomaly Detection: A Frequency Perspective
Authors: Authors: Zexin Wang, Changhua Pei, Minghua Ma, Xin Wang, Zhihan Li, Dan Pei, Saravan Rajmohan, Dongmei Zhang, Qingwei Lin, Haiming Zhang, Jianhui Li, Gaogang Xie
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.02820
Pdf link: https://arxiv.org/pdf/2402.02820
Abstract Time series Anomaly Detection (AD) plays a crucial role for web systems. Various web systems rely on time series data to monitor and identify anomalies in real time, as well as to initiate diagnosis and remediation procedures. Variational Autoencoders (VAEs) have gained popularity in recent decades due to their superior de-noising capabilities, which are useful for anomaly detection. However, our study reveals that VAE-based methods face challenges in capturing long-periodic heterogeneous patterns and detailed short-periodic trends simultaneously. To address these challenges, we propose Frequency-enhanced Conditional Variational Autoencoder (FCVAE), a novel unsupervised AD method for univariate time series. To ensure an accurate AD, FCVAE exploits an innovative approach to concurrently integrate both the global and local frequency features into the condition of Conditional Variational Autoencoder (CVAE) to significantly increase the accuracy of reconstructing the normal data. Together with a carefully designed "target attention" mechanism, our approach allows the model to pick the most useful information from the frequency domain for better short-periodic trend construction. Our FCVAE has been evaluated on public datasets and a large-scale cloud system, and the results demonstrate that it outperforms state-of-the-art methods. This confirms the practical applicability of our approach in addressing the limitations of current VAE-based anomaly detection models.
Evading Data Contamination Detection for Language Models is (too) Easy
Authors: Authors: Jasper Dekoninck, Mark Niklas Müller, Maximilian Baader, Marc Fischer, Martin Vechev
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2402.02823
Pdf link: https://arxiv.org/pdf/2402.02823
Abstract Large language models are widespread, with their performance on benchmarks frequently guiding user preferences for one model over another. However, the vast amount of data these models are trained on can inadvertently lead to contamination with public benchmarks, thus compromising performance measurements. While recently developed contamination detection methods try to address this issue, they overlook the possibility of deliberate contamination by malicious model providers aiming to evade detection. We argue that this setting is of crucial importance as it casts doubt on the reliability of public benchmarks. To more rigorously study this issue, we propose a categorization of both model providers and contamination detection methods. This reveals vulnerabilities in existing methods that we exploit with EAL, a simple yet effective contamination technique that significantly inflates benchmark performance while completely evading current detection methods.
SynthVision -- Harnessing Minimal Input for Maximal Output in Computer Vision Models using Synthetic Image data
Authors: Authors: Yudara Kularathne, Prathapa Janitha, Sithira Ambepitiya, Thanveer Ahamed, Dinuka Wijesundara, Prarththanan Sothyrajah
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.02826
Pdf link: https://arxiv.org/pdf/2402.02826
Abstract Rapid development of disease detection computer vision models is vital in response to urgent medical crises like epidemics or events of bioterrorism. However, traditional data gathering methods are too slow for these scenarios necessitating innovative approaches to generate reliable models quickly from minimal data. We demonstrate our new approach by building a comprehensive computer vision model for detecting Human Papilloma Virus Genital warts using only synthetic data. In our study, we employed a two phase experimental design using diffusion models. In the first phase diffusion models were utilized to generate a large number of diverse synthetic images from 10 HPV guide images explicitly focusing on accurately depicting genital warts. The second phase involved the training and testing vision model using this synthetic dataset. This method aimed to assess the effectiveness of diffusion models in rapidly generating high quality training data and the subsequent impact on the vision model performance in medical image recognition. The study findings revealed significant insights into the performance of the vision model trained on synthetic images generated through diffusion models. The vision model showed exceptional performance in accurately identifying cases of genital warts. It achieved an accuracy rate of 96% underscoring its effectiveness in medical image classification. For HPV cases the model demonstrated a high precision of 99% and a recall of 94%. In normal cases the precision was 95% with an impressive recall of 99%. These metrics indicate the model capability to correctly identify true positive cases and minimize false positives. The model achieved an F1 Score of 96% for HPV cases and 97% for normal cases. The high F1 Score across both categories highlights the balanced nature of the model precision and recall ensuring reliability and robustness in its predictions.
PowerGraph: A power grid benchmark dataset for graph neural networks
Authors: Authors: Anna Varbella, Kenza Amara, Blazhe Gjorgiev, Giovanni Sansavini
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.02827
Pdf link: https://arxiv.org/pdf/2402.02827
Abstract Public Graph Neural Networks (GNN) benchmark datasets facilitate the use of GNN and enhance GNN applicability to diverse disciplines. The community currently lacks public datasets of electrical power grids for GNN applications. Indeed, GNNs can potentially capture complex power grid phenomena over alternative machine learning techniques. Power grids are complex engineered networks that are naturally amenable to graph representations. Therefore, GNN have the potential for capturing the behavior of power grids over alternative machine learning techniques. To this aim, we develop a graph dataset for cascading failure events, which are the major cause of blackouts in electric power grids. Historical blackout datasets are scarce and incomplete. The assessment of vulnerability and the identification of critical components are usually conducted via computationally expensive offline simulations of cascading failures. Instead, we propose using machine learning models for the online detection of cascading failures leveraging the knowledge of the system state at the onset of the cascade. We develop PowerGraph, a graph dataset modeling cascading failures in power grids, designed for two purposes, namely, i) training GNN models for different graph-level tasks including multi-class classification, binary classification, and regression, and ii) explaining GNN models. The dataset generated via a physics-based cascading failure model ensures the generality of the operating and environmental conditions by spanning diverse failure scenarios. In addition, we foster the use of the dataset to benchmark GNN explainability methods by assigning ground-truth edge-level explanations. PowerGraph helps the development of better GNN models for graph-level tasks and explainability, critical in many domains ranging from chemistry to biology, where the systems and processes can be described as graphs.
On the Performance of RIS-Aided Spatial Modulation for Downlink Transmission
Authors: Authors: Xusheng Zhu, Qingqing Wu, Wen Chen
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2402.02893
Pdf link: https://arxiv.org/pdf/2402.02893
Abstract In this study, we explore the performance of a reconfigurable reflecting surface (RIS)-assisted transmit spatial modulation (SM) system for downlink transmission, wherein the deployment of RIS serves the purpose of blind area coverage within the channel. At the receiving end, we present three detectors, i.e., maximum likelihood (ML) detector, two-stage ML detection, and greedy detector to recover the transmitted signal. By utilizing the ML detector, we initially derive the conditional pair error probability expression for the proposed scheme. Subsequently, we leverage the central limit theorem (CLT) to obtain the probability density function of the combined channel. Following this, the Gaussian-Chebyshev quadrature method is applied to derive a closed-form expression for the unconditional pair error probability and establish the union tight upper bound for the average bit error probability (ABEP). Furthermore, we derive a closed-form expression for the ergodic capacity of the proposed RIS-SM scheme. Monte Carlo simulations are conducted not only to assess the complexity and reliability of the three detection algorithms but also to validate the results obtained through theoretical derivation results.
Dynamic Test Case Prioritization in Industrial Test Result Datasets
Authors: Authors: Alina Torbunova, Per Erik Strandberg, Ivan Porres
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2402.02925
Pdf link: https://arxiv.org/pdf/2402.02925
Abstract Regression testing in software development checks if new software features affect existing ones. Regression testing is a key task in continuous development and integration, where software is built in small increments and new features are integrated as soon as possible. It is therefore important that developers are notified about possible faults quickly. In this article, we propose a test case prioritization schema that combines the use of a static and a dynamic prioritization algorithm. The dynamic prioritization algorithm rearranges the order of execution of tests on the fly, while the tests are being executed. We propose to use a conditional probability dynamic algorithm for this. We evaluate our solution on three industrial datasets and utilize Average Percentage of Fault Detection for that. The main findings are that our dynamic prioritization algorithm can: a) be applied with any static algorithm that assigns a priority score to each test case b) can improve the performance of the static algorithm if there are failure correlations between test cases c) can also reduce the performance of the static algorithm, but only when the static scheduling is performed at a near optimal level.
Automated Cognate Detection as a Supervised Link Prediction Task with Cognate Transformer
Authors: Authors: V.S.D.S.Mahesh Akavarapu, Arnab Bhattacharya
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.02926
Pdf link: https://arxiv.org/pdf/2402.02926
Abstract Identification of cognates across related languages is one of the primary problems in historical linguistics. Automated cognate identification is helpful for several downstream tasks including identifying sound correspondences, proto-language reconstruction, phylogenetic classification, etc. Previous state-of-the-art methods for cognate identification are mostly based on distributions of phonemes computed across multilingual wordlists and make little use of the cognacy labels that define links among cognate clusters. In this paper, we present a transformer-based architecture inspired by computational biology for the task of automated cognate detection. Beyond a certain amount of supervision, this method performs better than the existing methods, and shows steady improvement with further increase in supervision, thereby proving the efficacy of utilizing the labeled information. We also demonstrate that accepting multiple sequence alignments as input and having an end-to-end architecture with link prediction head saves much computation time while simultaneously yielding superior performance.
Kernel PCA for Out-of-Distribution Detection
Authors: Authors: Kun Fang, Qinghua Tao, Kexin Lv, Mingzhen He, Xiaolin Huang, Jie Yang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.02949
Pdf link: https://arxiv.org/pdf/2402.02949
Abstract Out-of-Distribution (OoD) detection is vital for the reliability of Deep Neural Networks (DNNs). Existing works have shown the insufficiency of Principal Component Analysis (PCA) straightforwardly applied on the features of DNNs in detecting OoD data from In-Distribution (InD) data. The failure of PCA suggests that the network features residing in OoD and InD are not well separated by simply proceeding in a linear subspace, which instead can be resolved through proper nonlinear mappings. In this work, we leverage the framework of Kernel PCA (KPCA) for OoD detection, seeking subspaces where OoD and InD features are allocated with significantly different patterns. We devise two feature mappings that induce non-linear kernels in KPCA to advocate the separability between InD and OoD data in the subspace spanned by the principal components. Given any test sample, the reconstruction error in such subspace is then used to efficiently obtain the detection result with $\mathcal{O}(1)$ time complexity in inference. Extensive empirical results on multiple OoD data sets and network structures verify the superiority of our KPCA-based detector in efficiency and efficacy with state-of-the-art OoD detection performances.
Unraveling the Key of Machine Learning Solutions for Android Malware Detection
Authors: Authors: Jiahao Liu, Jun Zeng, Fabio Pierazzi, Lorenzo Cavallaro, Zhenkai Liang
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.02953
Pdf link: https://arxiv.org/pdf/2402.02953
Abstract Android malware detection serves as the front line against malicious apps. With the rapid advancement of machine learning (ML), ML-based Android malware detection has attracted increasing attention due to its capability of automatically capturing malicious patterns from Android APKs. These learning-driven methods have reported promising results in detecting malware. However, the absence of an in-depth analysis of current research progress makes it difficult to gain a holistic picture of the state of the art in this area. This paper presents a comprehensive investigation to date into ML-based Android malware detection with empirical and quantitative analysis. We first survey the literature, categorizing contributions into a taxonomy based on the Android feature engineering and ML modeling pipeline. Then, we design a general-propose framework for ML-based Android malware detection, re-implement 12 representative approaches from different research communities, and evaluate them from three primary dimensions, i.e., effectiveness, robustness, and efficiency. The evaluation reveals that ML-based approaches still face open challenges and provides insightful findings like more powerful ML models are not the silver bullet for designing better malware detectors. We further summarize our findings and put forth recommendations to guide future research.
Are We There Yet? Unraveling the State-of-the-Art Smart Contract Fuzzers
Authors: Authors: Shuohan Wu, Zihao Li, Luyi Yan, Weimin Chen, Muhui Jiang, Chenxu Wang, Xiapu Luo, Hao Zhou
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2402.02973
Pdf link: https://arxiv.org/pdf/2402.02973
Abstract Given the growing importance of smart contracts in various applications, ensuring their security and reliability is critical. Fuzzing, an effective vulnerability detection technique, has recently been widely applied to smart contracts. Despite numerous studies, a systematic investigation of smart contract fuzzing techniques remains lacking. In this paper, we fill this gap by: 1) providing a comprehensive review of current research in contract fuzzing, and 2) conducting an in-depth empirical study to evaluate state-of-the-art contract fuzzers' usability. To guarantee a fair evaluation, we employ a carefully-labeled benchmark and introduce a set of pragmatic performance metrics, evaluating fuzzers from five complementary perspectives. Based on our findings, we provide direction for the future research and development of contract fuzzers.
Putting Context in Context: the Impact of Discussion Structure on Text Classification
Authors: Authors: Nicolò Penzo, Antonio Longa, Bruno Lepri, Sara Tonelli, Marco Guerini
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2402.02975
Pdf link: https://arxiv.org/pdf/2402.02975
Abstract Current text classification approaches usually focus on the content to be classified. Contextual aspects (both linguistic and extra-linguistic) are usually neglected, even in tasks based on online discussions. Still in many cases the multi-party and multi-turn nature of the context from which these elements are selected can be fruitfully exploited. In this work, we propose a series of experiments on a large dataset for stance detection in English, in which we evaluate the contribution of different types of contextual information, i.e. linguistic, structural and temporal, by feeding them as natural language input into a transformer-based model. We also experiment with different amounts of training data and analyse the topology of local discussion networks in a privacy-compliant way. Results show that structural information can be highly beneficial to text classification but only under certain circumstances (e.g. depending on the amount of training data and on discussion chain complexity). Indeed, we show that contextual information on smaller datasets from other classification tasks does not yield significant improvements. Our framework, based on local discussion networks, allows the integration of structural information, while minimising user profiling, thus preserving their privacy.
A Safety-Adapted Loss for Pedestrian Detection in Automated Driving
Authors: Authors: Maria Lyssenko, Piyush Pimplikar, Maarten Bieshaar, Farzad Nozarian, Rudolph Triebel
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.02986
Pdf link: https://arxiv.org/pdf/2402.02986
Abstract In safety-critical domains like automated driving (AD), errors by the object detector may endanger pedestrians and other vulnerable road users (VRU). As common evaluation metrics are not an adequate safety indicator, recent works employ approaches to identify safety-critical VRU and back-annotate the risk to the object detector. However, those approaches do not consider the safety factor in the deep neural network (DNN) training process. Thus, state-of-the-art DNN penalizes all misdetections equally irrespective of their criticality. Subsequently, to mitigate the occurrence of critical failure cases, i.e., false negatives, a safety-aware training strategy might be required to enhance the detection performance for critical pedestrians. In this paper, we propose a novel safety-aware loss variation that leverages the estimated per-pedestrian criticality scores during training. We exploit the reachability set-based time-to-collision (TTC-RSB) metric from the motion domain along with distance information to account for the worst-case threat quantifying the criticality. Our evaluation results using RetinaNet and FCOS on the nuScenes dataset demonstrate that training the models with our safety-aware loss function mitigates the misdetection of critical pedestrians without sacrificing performance for the general case, i.e., pedestrians outside the safety-critical zone.
[Citation needed] Data usage and citation practices in medical imaging conferences
Authors: Authors: Théo Sourget, Ahmet Akkoç, Stinna Winther, Christine Lyngbye Galsgaard, Amelia Jiménez-Sánchez, Dovile Juodelyte, Caroline Petitjean, Veronika Cheplygina
Subjects: Computer Vision and Pattern Recognition (cs.CV); Digital Libraries (cs.DL)
Arxiv link: https://arxiv.org/abs/2402.03003
Pdf link: https://arxiv.org/pdf/2402.03003
Abstract Medical imaging papers often focus on methodology, but the quality of the algorithms and the validity of the conclusions are highly dependent on the datasets used. As creating datasets requires a lot of effort, researchers often use publicly available datasets, there is however no adopted standard for citing the datasets used in scientific papers, leading to difficulty in tracking dataset usage. In this work, we present two open-source tools we created that could help with the detection of dataset usage, a pipeline \url{https://github.com/TheoSourget/Public_Medical_Datasets_References} using OpenAlex and full-text analysis, and a PDF annotation software \url{https://github.com/TheoSourget/pdf_annotator} used in our study to manually label the presence of datasets. We applied both tools on a study of the usage of 20 publicly available medical datasets in papers from MICCAI and MIDL. We compute the proportion and the evolution between 2013 and 2023 of 3 types of presence in a paper: cited, mentioned in the full text, cited and mentioned. Our findings demonstrate the concentration of the usage of a limited set of datasets. We also highlight different citing practices, making the automation of tracking difficult.
Visual Text Meets Low-level Vision: A Comprehensive Survey on Visual Text Processing
Authors: Authors: Yan Shu, Weichao Zeng, Zhenhang Li, Fangmin Zhao, Yu Zhou
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.03082
Pdf link: https://arxiv.org/pdf/2402.03082
Abstract Visual text, a pivotal element in both document and scene images, speaks volumes and attracts significant attention in the computer vision domain. Beyond visual text detection and recognition, the field of visual text processing has experienced a surge in research, driven by the advent of fundamental generative models. However, challenges persist due to the unique properties and features that distinguish text from general objects. Effectively leveraging these unique textual characteristics is crucial in visual text processing, as observed in our study. In this survey, we present a comprehensive, multi-perspective analysis of recent advancements in this field. Initially, we introduce a hierarchical taxonomy encompassing areas ranging from text image enhancement and restoration to text image manipulation, followed by different learning paradigms. Subsequently, we conduct an in-depth discussion of how specific textual features such as structure, stroke, semantics, style, and spatial context are seamlessly integrated into various tasks. Furthermore, we explore available public datasets and benchmark the reviewed methods on several widely-used datasets. Finally, we identify principal challenges and potential avenues for future research. Our aim is to establish this survey as a fundamental resource, fostering continued exploration and innovation in the dynamic area of visual text processing.
Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector
Authors: Authors: Yuqian Fu, Yu Wang, Yixuan Pan, Lian Huai, Xingyu Qiu, Zeyu Shangguan, Tong Liu, Lingjie Kong, Yanwei Fu, Luc Van Gool, Xingqun Jiang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.03094
Pdf link: https://arxiv.org/pdf/2402.03094
Abstract This paper addresses the challenge of cross-domain few-shot object detection (CD-FSOD), aiming to develop an accurate object detector for novel domains with minimal labeled examples. While transformer-based open-set detectors e.g., DE-ViT~\cite{zhang2023detect} have excelled in both open-vocabulary object detection and traditional few-shot object detection, detecting categories beyond those seen during training, we thus naturally raise two key questions: 1) can such open-set detection methods easily generalize to CD-FSOD? 2) If no, how to enhance the results of open-set methods when faced with significant domain gaps? To address the first question, we introduce several metrics to quantify domain variances and establish a new CD-FSOD benchmark with diverse domain metric values. Some State-Of-The-Art (SOTA) open-set object detection methods are evaluated on this benchmark, with evident performance degradation observed across out-of-domain datasets. This indicates the failure of adopting open-set detectors directly for CD-FSOD. Sequentially, to overcome the performance degradation issue and also to answer the second proposed question, we endeavor to enhance the vanilla DE-ViT. With several novel components including finetuning, a learnable prototype module, and a lightweight attention module, we present an improved Cross-Domain Vision Transformer for CD-FSOD (CD-ViTO). Experiments show that our CD-ViTO achieves impressive results on both out-of-domain and in-domain target datasets, establishing new SOTAs for both CD-FSOD and FSOD. All the datasets, codes, and models will be released to the community.
Detecting Scams Using Large Language Models
Authors: Authors: Liming Jiang
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2402.03147
Pdf link: https://arxiv.org/pdf/2402.03147
Abstract Large Language Models (LLMs) have gained prominence in various applications, including security. This paper explores the utility of LLMs in scam detection, a critical aspect of cybersecurity. Unlike traditional applications, we propose a novel use case for LLMs to identify scams, such as phishing, advance fee fraud, and romance scams. We present notable security applications of LLMs and discuss the unique challenges posed by scams. Specifically, we outline the key steps involved in building an effective scam detector using LLMs, emphasizing data collection, preprocessing, model selection, training, and integration into target systems. Additionally, we conduct a preliminary evaluation using GPT-3.5 and GPT-4 on a duplicated email, highlighting their proficiency in identifying common signs of phishing or scam emails. The results demonstrate the models' effectiveness in recognizing suspicious elements, but we emphasize the need for a comprehensive assessment across various language tasks. The paper concludes by underlining the importance of ongoing refinement and collaboration with cybersecurity experts to adapt to evolving threats.
Towards mitigating uncann(eye)ness in face swaps via gaze-centric loss terms
Authors: Authors: Ethan Wilson, Frederick Shic, Sophie Jörg, Eakta Jain
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.03188
Pdf link: https://arxiv.org/pdf/2402.03188
Abstract Advances in face swapping have enabled the automatic generation of highly realistic faces. Yet face swaps are perceived differently than when looking at real faces, with key differences in viewer behavior surrounding the eyes. Face swapping algorithms generally place no emphasis on the eyes, relying on pixel or feature matching losses that consider the entire face to guide the training process. We further investigate viewer perception of face swaps, focusing our analysis on the presence of an uncanny valley effect. We additionally propose a novel loss equation for the training of face swapping models, leveraging a pretrained gaze estimation network to directly improve representation of the eyes. We confirm that viewed face swaps do elicit uncanny responses from viewers. Our proposed improvements significant reduce viewing angle errors between face swaps and their source material. Our method additionally reduces the prevalence of the eyes as a deciding factor when viewers perform deepfake detection tasks. Our findings have implications on face swapping for special effects, as digital avatars, as privacy mechanisms, and more; negative responses from users could limit effectiveness in said applications. Our gaze improvements are a first step towards alleviating negative viewer perceptions via a targeted approach.
Unified Hallucination Detection for Multimodal Large Language Models
Authors: Authors: Xiang Chen, Chenxi Wang, Yida Xue, Ningyu Zhang, Xiaoyan Yang, Qiang Li, Yue Shen, Jinjie Gu, Huajun Chen
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2402.03190
Pdf link: https://arxiv.org/pdf/2402.03190
Abstract Despite significant strides in multimodal tasks, Multimodal Large Language Models (MLLMs) are plagued by the critical issue of hallucination. The reliable detection of such hallucinations in MLLMs has, therefore, become a vital aspect of model evaluation and the safeguarding of practical application deployment. Prior research in this domain has been constrained by a narrow focus on singular tasks, an inadequate range of hallucination categories addressed, and a lack of detailed granularity. In response to these challenges, our work expands the investigative horizons of hallucination detection. We present a novel meta-evaluation benchmark, MHaluBench, meticulously crafted to facilitate the evaluation of advancements in hallucination detection methods. Additionally, we unveil a novel unified multimodal hallucination detection framework, UNIHD, which leverages a suite of auxiliary tools to validate the occurrence of hallucinations robustly. We demonstrate the effectiveness of UNIHD through meticulous evaluation and comprehensive analysis. We also provide strategic insights on the application of specific tools for addressing various categories of hallucinations.
"Define Your Terms" : Enhancing Efficient Offensive Speech Classification with Definition
Authors: Authors: Huy Nghiem, Umang Gupta, Fred Morstatter
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2402.03221
Pdf link: https://arxiv.org/pdf/2402.03221
Abstract The propagation of offensive content through social media channels has garnered attention of the research community. Multiple works have proposed various semantically related yet subtle distinct categories of offensive speech. In this work, we explore meta-earning approaches to leverage the diversity of offensive speech corpora to enhance their reliable and efficient detection. We propose a joint embedding architecture that incorporates the input's label and definition for classification via Prototypical Network. Our model achieves at least 75% of the maximal F1-score while using less than 10% of the available training data across 4 datasets. Our experimental findings also provide a case study of training strategies valuable to combat resource scarcity.
ActiveAnno3D -- An Active Learning Framework for Multi-Modal 3D Object Detection
Authors: Authors: Ahmed Ghita, Bjørk Antoniussen, Walter Zimmer, Ross Greer, Christian Creß, Andreas Møgelmose, Mohan M. Trivedi, Alois C. Knoll
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.03235
Pdf link: https://arxiv.org/pdf/2402.03235
Abstract The curation of large-scale datasets is still costly and requires much time and resources. Data is often manually labeled, and the challenge of creating high-quality datasets remains. In this work, we fill the research gap using active learning for multi-modal 3D object detection. We propose ActiveAnno3D, an active learning framework to select data samples for labeling that are of maximum informativeness for training. We explore various continuous training methods and integrate the most efficient method regarding computational demand and detection performance. Furthermore, we perform extensive experiments and ablation studies with BEVFusion and PV-RCNN on the nuScenes and TUM Traffic Intersection dataset. We show that we can achieve almost the same performance with PV-RCNN and the entropy-based query strategy when using only half of the training data (77.25 mAP compared to 83.50 mAP) of the TUM Traffic Intersection dataset. BEVFusion achieved an mAP of 64.31 when using half of the training data and 75.0 mAP when using the complete nuScenes dataset. We integrate our active learning framework into the proAnno labeling tool to enable AI-assisted data selection and labeling and minimize the labeling costs. Finally, we provide code, weights, and visualization results on our website: https://active3d-framework.github.io/active3d-framework.
Multiclass Classification Procedure for Detecting Attacks on MQTT-IoT Protocol
Authors: Authors: Hector Alaiz-Moreton (1), Jose Aveleira-Mata (2), Jorge Ondicol-Garcia (2), Angel Luis Muñoz-Castañeda (2), Isaías García (1), Carmen Benavides (1) ((1) Escuela de Ingenierías, Universidad de León, (2) Research Institute of Applied Sciences in Cybersecurity, Universidad de León)
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2402.03270
Pdf link: https://arxiv.org/pdf/2402.03270
Abstract The large number of sensors and actuators that make up the Internet of Things obliges these systems to use diverse technologies and protocols. This means that IoT networks are more heterogeneous than traditional networks. This gives rise to new challenges in cybersecurity to protect these systems and devices which are characterized by being connected continuously to the Internet. Intrusion detection systems (IDS) are used to protect IoT systems from the various anomalies and attacks at the network level. Intrusion Detection Systems (IDS) can be improved through machine learning techniques. Our work focuses on creating classification models that can feed an IDS using a dataset containing frames under attacks of an IoT system that uses the MQTT protocol. We have addressed two types of method for classifying the attacks, ensemble methods and deep learning models, more specifically recurrent networks with very satisfactory results.
Zero-shot Object-Level OOD Detection with Context-Aware Inpainting
Authors: Authors: Quang-Huy Nguyen, Jin Peng Zhou, Zhenzhen Liu, Khanh-Huyen Bui, Kilian Q. Weinberger, Dung D. Le
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.03292
Pdf link: https://arxiv.org/pdf/2402.03292
Abstract Machine learning algorithms are increasingly provided as black-box cloud services or pre-trained models, without access to their training data. This motivates the problem of zero-shot out-of-distribution (OOD) detection. Concretely, we aim to detect OOD objects that do not belong to the classifier's label set but are erroneously classified as in-distribution (ID) objects. Our approach, RONIN, uses an off-the-shelf diffusion model to replace detected objects with inpainting. RONIN conditions the inpainting process with the predicted ID label, drawing the input object closer to the in-distribution domain. As a result, the reconstructed object is very close to the original in the ID cases and far in the OOD cases, allowing RONIN to effectively distinguish ID and OOD samples. Throughout extensive experiments, we demonstrate that RONIN achieves competitive results compared to previous approaches across several datasets, both in zero-shot and non-zero-shot settings.
HASSOD: Hierarchical Adaptive Self-Supervised Object Detection
Authors: Authors: Shengcao Cao, Dhiraj Joshi, Liang-Yan Gui, Yu-Xiong Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.03311
Pdf link: https://arxiv.org/pdf/2402.03311
Abstract The human visual perception system demonstrates exceptional capabilities in learning without explicit supervision and understanding the part-to-whole composition of objects. Drawing inspiration from these two abilities, we propose Hierarchical Adaptive Self-Supervised Object Detection (HASSOD), a novel approach that learns to detect objects and understand their compositions without human supervision. HASSOD employs a hierarchical adaptive clustering strategy to group regions into object masks based on self-supervised visual representations, adaptively determining the number of objects per image. Furthermore, HASSOD identifies the hierarchical levels of objects in terms of composition, by analyzing coverage relations between masks and constructing tree structures. This additional self-supervised learning task leads to improved detection performance and enhanced interpretability. Lastly, we abandon the inefficient multi-round self-training process utilized in prior methods and instead adapt the Mean Teacher framework from semi-supervised learning, which leads to a smoother and more efficient training process. Through extensive experiments on prevalent image datasets, we demonstrate the superiority of HASSOD over existing methods, thereby advancing the state of the art in self-supervised object detection. Notably, we improve Mask AR from 20.2 to 22.5 on LVIS, and from 17.0 to 26.0 on SA-1B. Project page: https://HASSOD-NeurIPS23.github.io.
Keyword: face recognition

There is no result

Keyword: augmentation

Quality of Answers of Generative Large Language Models vs Peer Patients for Interpreting Lab Test Results for Lay Patients: Evaluation Study
Authors: Authors: Zhe He, Balu Bhasuran, Qiao Jin, Shubo Tian, Karim Hanna, Cindy Shavor, Lisbeth Garcia Arguello, Patrick Murray, Zhiyong Lu
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2402.01693
Pdf link: https://arxiv.org/pdf/2402.01693
Abstract Lab results are often confusing and hard to understand. Large language models (LLMs) such as ChatGPT have opened a promising avenue for patients to get their questions answered. We aim to assess the feasibility of using LLMs to generate relevant, accurate, helpful, and unharmful responses to lab test-related questions asked by patients and to identify potential issues that can be mitigated with augmentation approaches. We first collected lab test results related question and answer data from Yahoo! Answers and selected 53 QA pairs for this study. Using the LangChain framework and ChatGPT web portal, we generated responses to the 53 questions from four LLMs including GPT-4, Meta LLaMA 2, MedAlpaca, and ORCA_mini. We first assessed the similarity of their answers using standard QA similarity-based evaluation metrics including ROUGE, BLEU, METEOR, BERTScore. We also utilized an LLM-based evaluator to judge whether a target model has higher quality in terms of relevance, correctness, helpfulness, and safety than the baseline model. Finally, we performed a manual evaluation with medical experts for all the responses to seven selected questions on the same four aspects. The results of Win Rate and medical expert evaluation both showed that GPT-4's responses achieved better scores than all the other LLM responses and human responses on all four aspects (relevance, correctness, helpfulness, and safety). However, LLM responses occasionally also suffer from a lack of interpretation in one's medical context, incorrect statements, and lack of references. We find that compared to other three LLMs and human answer from the Q&A website, GPT-4's responses are more accurate, helpful, relevant, and safer. However, there are cases which GPT-4 responses are inaccurate and not individualized. We identified a number of ways to improve the quality of LLM responses.
3DG: A Framework for Using Generative AI for Handling Sparse Learner Performance Data From Intelligent Tutoring Systems
Authors: Authors: Liang Zhang, Jionghao Lin, Conrad Borchers, Meng Cao, Xiangen Hu
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.01746
Pdf link: https://arxiv.org/pdf/2402.01746
Abstract Learning performance data (e.g., quiz scores and attempts) is significant for understanding learner engagement and knowledge mastery level. However, the learning performance data collected from Intelligent Tutoring Systems (ITSs) often suffers from sparsity, impacting the accuracy of learner modeling and knowledge assessments. To address this, we introduce the 3DG framework (3-Dimensional tensor for Densification and Generation), a novel approach combining tensor factorization with advanced generative models, including Generative Adversarial Network (GAN) and Generative Pre-trained Transformer (GPT), for enhanced data imputation and augmentation. The framework operates by first representing the data as a three-dimensional tensor, capturing dimensions of learners, questions, and attempts. It then densifies the data through tensor factorization and augments it using Generative AI models, tailored to individual learning patterns identified via clustering. Applied to data from an AutoTutor lesson by the Center for the Study of Adult Literacy (CSAL), the 3DG framework effectively generated scalable, personalized simulations of learning performance. Comparative analysis revealed GAN's superior reliability over GPT-4 in this context, underscoring its potential in addressing data sparsity challenges in ITSs and contributing to the advancement of personalized educational technology.
Retrieval Augmented End-to-End Spoken Dialog Models
Authors: Authors: Mingqiu Wang, Izhak Shafran, Hagen Soltau, Wei Han, Yuan Cao, Dian Yu, Laurent El Shafey
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2402.01828
Pdf link: https://arxiv.org/pdf/2402.01828
Abstract We recently developed SLM, a joint speech and language model, which fuses a pretrained foundational speech model and a large language model (LLM), while preserving the in-context learning capability intrinsic to the pretrained LLM. In this paper, we apply SLM to speech dialog applications where the dialog states are inferred directly from the audio signal. Task-oriented dialogs often contain domain-specific entities, i.e., restaurants, hotels, train stations, and city names, which are difficult to recognize, however, critical for the downstream applications. Inspired by the RAG (retrieval-augmented generation) paradigm, we propose a retrieval augmented SLM (ReSLM) that overcomes this weakness. We first train a speech retriever to retrieve text entities mentioned in the audio. The retrieved entities are then added as text inputs to the underlying SLM to bias model predictions. We evaluated ReSLM on speech MultiWoz task (DSTC-11 challenge), and found that this retrieval augmentation boosts model performance, achieving joint goal accuracy (38.6% vs 32.7%), slot error rate (20.6% vs 24.8%) and ASR word error rate (5.5% vs 6.7%). While demonstrated on dialog state tracking, our approach is broadly applicable to other speech tasks requiring contextual information or domain-specific entities, such as contextual ASR with biasing capability.
Position Paper: Assessing Robustness, Privacy, and Fairness in Federated Learning Integrated with Foundation Models
Authors: Authors: Xi Li, Jiaqi Wang
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2402.01857
Pdf link: https://arxiv.org/pdf/2402.01857
Abstract Federated Learning (FL), while a breakthrough in decentralized machine learning, contends with significant challenges such as limited data availability and the variability of computational resources, which can stifle the performance and scalability of the models. The integration of Foundation Models (FMs) into FL presents a compelling solution to these issues, with the potential to enhance data richness and reduce computational demands through pre-training and data augmentation. However, this incorporation introduces novel issues in terms of robustness, privacy, and fairness, which have not been sufficiently addressed in the existing research. We make a preliminary investigation into this field by systematically evaluating the implications of FM-FL integration across these dimensions. We analyze the trade-offs involved, uncover the threats and issues introduced by this integration, and propose a set of criteria and strategies for navigating these challenges. Furthermore, we identify potential research directions for advancing this field, laying a foundation for future development in creating reliable, secure, and equitable FL systems.
Simulation-Enhanced Data Augmentation for Machine Learning Pathloss Prediction
Authors: Authors: Ahmed P. Mohamed, Byunghyun Lee, Yaguang Zhang, Max Hollingsworth, C. Robert Anderson, James V. Krogmeier, David J. Love
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2402.01969
Pdf link: https://arxiv.org/pdf/2402.01969
Abstract Machine learning (ML) offers a promising solution to pathloss prediction. However, its effectiveness can be degraded by the limited availability of data. To alleviate these challenges, this paper introduces a novel simulation-enhanced data augmentation method for ML pathloss prediction. Our method integrates synthetic data generated from a cellular coverage simulator and independently collected real-world datasets. These datasets were collected through an extensive measurement campaign in different environments, including farms, hilly terrains, and residential areas. This comprehensive data collection provides vital ground truth for model training. A set of channel features was engineered, including geographical attributes derived from LiDAR datasets. These features were then used to train our prediction model, incorporating the highly efficient and robust gradient boosting ML algorithm, CatBoost. The integration of synthetic data, as demonstrated in our study, significantly improves the generalizability of the model in different environments, achieving a remarkable improvement of approximately 12dB in terms of mean absolute error for the best-case scenario. Moreover, our analysis reveals that even a small fraction of measurements added to the simulation training set, with proper data balance, can significantly enhance the model's performance.
MasonPerplexity at ClimateActivism 2024: Integrating Advanced Ensemble Techniques and Data Augmentation for Climate Activism Stance and Hate Event Identification
Authors: Authors: Al Nahian Bin Emran, Amrita Ganguly, Sadiya Sayara Chowdhury Puspo, Dhiman Goswami, Md Nishat Raihan
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2402.01976
Pdf link: https://arxiv.org/pdf/2402.01976
Abstract The task of identifying public opinions on social media, particularly regarding climate activism and the detection of hate events, has emerged as a critical area of research in our rapidly changing world. With a growing number of people voicing either to support or oppose to climate-related issues - understanding these diverse viewpoints has become increasingly vital. Our team, MasonPerplexity, participates in a significant research initiative focused on this subject. We extensively test various models and methods, discovering that our most effective results are achieved through ensemble modeling, enhanced by data augmentation techniques like back-translation. In the specific components of this research task, our team achieved notable positions, ranking 5th, 1st, and 6th in the respective sub-tasks, thereby illustrating the effectiveness of our approach in this important field of study.
Online Uniform Risk Times Sampling: First Approximation Algorithms, Learning Augmentation with Full Confidence Interval Integration
Authors: Authors: Xueqing Liu, Kyra Gan, Esmaeil Keyvanshokooh, Susan Murphy
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2402.01995
Pdf link: https://arxiv.org/pdf/2402.01995
Abstract In digital health, the strategy of allocating a limited treatment budget across available risk times is crucial to reduce user fatigue. This strategy, however, encounters a significant obstacle due to the unknown actual number of risk times, a factor not adequately addressed by existing methods lacking theoretical guarantees. This paper introduces, for the first time, the online uniform risk times sampling problem within the approximation algorithm framework. We propose two online approximation algorithms for this problem, one with and one without learning augmentation, and provide rigorous theoretical performance guarantees for them using competitive ratio analysis. We assess the performance of our algorithms using both synthetic experiments and a real-world case study on HeartSteps mobile applications.
RIDERS: Radar-Infrared Depth Estimation for Robust Sensing
Authors: Authors: Han Li, Yukai Ma, Yuehao Huang, Yaqing Gu, Weihua Xu, Yong Liu, Xingxing Zuo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.02067
Pdf link: https://arxiv.org/pdf/2402.02067
Abstract Dense depth recovery is crucial in autonomous driving, serving as a foundational element for obstacle avoidance, 3D object detection, and local path planning. Adverse weather conditions, including haze, dust, rain, snow, and darkness, introduce significant challenges to accurate dense depth estimation, thereby posing substantial safety risks in autonomous driving. These challenges are particularly pronounced for traditional depth estimation methods that rely on short electromagnetic wave sensors, such as visible spectrum cameras and near-infrared LiDAR, due to their susceptibility to diffraction noise and occlusion in such environments. To fundamentally overcome this issue, we present a novel approach for robust metric depth estimation by fusing a millimeter-wave Radar and a monocular infrared thermal camera, which are capable of penetrating atmospheric particles and unaffected by lighting conditions. Our proposed Radar-Infrared fusion method achieves highly accurate and finely detailed dense depth estimation through three stages, including monocular depth prediction with global scale alignment, quasi-dense Radar augmentation by learning Radar-pixels correspondences, and local scale refinement of dense depth using a scale map learner. Our method achieves exceptional visual quality and accurate metric estimation by addressing the challenges of ambiguity and misalignment that arise from directly fusing multi-modal long-wave features. We evaluate the performance of our approach on the NTU4DRadLM dataset and our self-collected challenging ZJU-Multispectrum dataset. Especially noteworthy is the unprecedented robustness demonstrated by our proposed method in smoky scenarios. Our code will be released at \url{https://github.com/MMOCKING/RIDERS}.
Prototypical Contrastive Learning through Alignment and Uniformity for Recommendation
Authors: Authors: Yangxun Ou, Lei Chen, Fenglin Pan, Yupeng Wu
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2402.02079
Pdf link: https://arxiv.org/pdf/2402.02079
Abstract Graph Collaborative Filtering (GCF), one of the most widely adopted recommendation system methods, effectively captures intricate relationships between user and item interactions. Graph Contrastive Learning (GCL) based GCF has gained significant attention as it leverages self-supervised techniques to extract valuable signals from real-world scenarios. However, many methods usually learn the instances of discrimination tasks that involve the construction of contrastive pairs through random sampling. GCL approaches suffer from sampling bias issues, where the negatives might have a semantic structure similar to that of the positives, thus leading to a loss of effective feature representation. To address these problems, we present the \underline{Proto}typical contrastive learning through \underline{A}lignment and \underline{U}niformity for recommendation, which is called \textbf{ProtoAU}. Specifically, we first propose prototypes (cluster centroids) as a latent space to ensure consistency across different augmentations from the origin graph, aiming to eliminate the need for random sampling of contrastive pairs. Furthermore, the absence of explicit negatives means that directly optimizing the consistency loss between instance and prototype could easily result in dimensional collapse issues. Therefore, we propose aligning and maintaining uniformity in the prototypes of users and items as optimization objectives to prevent falling into trivial solutions. Finally, we conduct extensive experiments on four datasets and evaluate their performance on the task of link prediction. Experimental results demonstrate that the proposed ProtoAU outperforms other representative methods. The source codes of our proposed ProtoAU are available at \url{https://github.com/oceanlvr/ProtoAU}.
Rendering Graphs for Graph Reasoning in Multimodal Large Language Models
Authors: Authors: Yanbin Wei, Shuai Fu, Weisen Jiang, James T. Kwok, Yu Zhang
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2402.02130
Pdf link: https://arxiv.org/pdf/2402.02130
Abstract Large Language Models (LLMs) are increasingly used for various tasks with graph structures, such as robotic planning, knowledge graph completion, and common-sense reasoning. Though LLMs can comprehend graph information in a textual format, they overlook the rich visual modality, which is an intuitive way for humans to comprehend structural information and conduct graph reasoning. The potential benefits and capabilities of representing graph structures as visual images (i.e., visual graph) is still unexplored. In this paper, we take the first step in incorporating visual information into graph reasoning tasks and propose a new benchmark GITQA, where each sample is a tuple (graph, image, textual description). We conduct extensive experiments on the GITQA benchmark using state-of-the-art multimodal LLMs. Results on graph reasoning tasks show that combining textual and visual information together performs better than using one modality alone. Moreover, the LLaVA-7B/13B models finetuned on the training set achieve higher accuracy than the closed-source model GPT-4(V). We also study the effects of augmentations in graph reasoning.
Evolution Guided Generative Flow Networks
Authors: Authors: Zarif Ikram, Ling Pan, Dianbo Liu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2402.02186
Pdf link: https://arxiv.org/pdf/2402.02186
Abstract Generative Flow Networks (GFlowNets) are a family of probabilistic generative models that learn to sample compositional objects proportional to their rewards. One big challenge of GFlowNets is training them effectively when dealing with long time horizons and sparse rewards. To address this, we propose Evolution guided generative flow networks (EGFN), a simple but powerful augmentation to the GFlowNets training using Evolutionary algorithms (EA). Our method can work on top of any GFlowNets training objective, by training a set of agent parameters using EA, storing the resulting trajectories in the prioritized replay buffer, and training the GFlowNets agent using the stored trajectories. We present a thorough investigation over a wide range of toy and real-world benchmark tasks showing the effectiveness of our method in handling long trajectories and sparse rewards.
Diabetes detection using deep learning techniques with oversampling and feature augmentation
Authors: Authors: María Teresa García-Ordás, Carmen Benavides, José Alberto Benítez-Andrades, Héctor Alaiz-Moretón, Isaías García-Rodríguez
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2402.02188
Pdf link: https://arxiv.org/pdf/2402.02188
Abstract Background and objective: Diabetes is a chronic pathology which is affecting more and more people over the years. It gives rise to a large number of deaths each year. Furthermore, many people living with the disease do not realize the seriousness of their health status early enough. Late diagnosis brings about numerous health problems and a large number of deaths each year so the development of methods for the early diagnosis of this pathology is essential. Methods: In this paper, a pipeline based on deep learning techniques is proposed to predict diabetic people. It includes data augmentation using a variational autoencoder (VAE), feature augmentation using an sparse autoencoder (SAE) and a convolutional neural network for classification. Pima Indians Diabetes Database, which takes into account information on the patients such as the number of pregnancies, glucose or insulin level, blood pressure or age, has been evaluated. Results: A 92.31% of accuracy was obtained when CNN classifier is trained jointly the SAE for featuring augmentation over a well balanced dataset. This means an increment of 3.17% of accuracy with respect the state-of-the-art. Conclusions: Using a full deep learning pipeline for data preprocessing and classification has demonstrate to be very promising in the diabetes detection field outperforming the state-of-the-art proposals.
INViT: A Generalizable Routing Problem Solver with Invariant Nested View Transformer
Authors: Authors: Han Fang, Zhihao Song, Paul Weng, Yutong Ban
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.02317
Pdf link: https://arxiv.org/pdf/2402.02317
Abstract Recently, deep reinforcement learning has shown promising results for learning fast heuristics to solve routing problems. Meanwhile, most of the solvers suffer from generalizing to an unseen distribution or distributions with different scales. To address this issue, we propose a novel architecture, called Invariant Nested View Transformer (INViT), which is designed to enforce a nested design together with invariant views inside the encoders to promote the generalizability of the learned solver. It applies a modified policy gradient algorithm enhanced with data augmentations. We demonstrate that the proposed INViT achieves a dominant generalization performance on both TSP and CVRP problems with various distributions and different problem scales.
DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching
Authors: Authors: Guanghe Li, Yixiang Shan, Zhengbang Zhu, Ting Long, Weinan Zhang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2402.02439
Pdf link: https://arxiv.org/pdf/2402.02439
Abstract In offline reinforcement learning (RL), the performance of the learned policy highly depends on the quality of offline datasets. However, in many cases, the offline dataset contains very limited optimal trajectories, which poses a challenge for offline RL algorithms as agents must acquire the ability to transit to high-reward regions. To address this issue, we introduce Diffusion-based Trajectory Stitching (DiffStitch), a novel diffusion-based data augmentation pipeline that systematically generates stitching transitions between trajectories. DiffStitch effectively connects low-reward trajectories with high-reward trajectories, forming globally optimal trajectories to address the challenges faced by offline RL algorithms. Empirical experiments conducted on D4RL datasets demonstrate the effectiveness of DiffStitch across RL methodologies. Notably, DiffStitch demonstrates substantial enhancements in the performance of one-step methods (IQL), imitation learning methods (TD3+BC), and trajectory optimization methods (DT).
TimeSiam: A Pre-Training Framework for Siamese Time-Series Modeling
Authors: Authors: Jiaxiang Dong, Haixu Wu, Yuxuan Wang, Yunzhong Qiu, Li Zhang, Jianmin Wang, Mingsheng Long
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.02475
Pdf link: https://arxiv.org/pdf/2402.02475
Abstract Time series pre-training has recently garnered wide attention for its potential to reduce labeling expenses and benefit various downstream tasks. Prior methods are mainly based on pre-training techniques well-acknowledged in vision or language, such as masked modeling and contrastive learning. However, randomly masking time series or calculating series-wise similarity will distort or neglect inherent temporal correlations crucial in time series data. To emphasize temporal correlation modeling, this paper proposes TimeSiam as a simple but effective self-supervised pre-training framework for Time series based on Siamese networks. Concretely, TimeSiam pre-trains Siamese encoders to capture intrinsic temporal correlations between randomly sampled past and current subseries. With a simple data augmentation method (e.g.~masking), TimeSiam can benefit from diverse augmented subseries and learn internal time-dependent representations through a past-to-current reconstruction. Moreover, learnable lineage embeddings are also introduced to distinguish temporal distance between sampled series and further foster the learning of diverse temporal correlations. TimeSiam consistently outperforms extensive advanced pre-training baselines, demonstrating superior forecasting and classification capabilities across 13 standard benchmarks in both intra- and cross-domain scenarios.
Adversarial Data Augmentation for Robust Speaker Verification
Authors: Authors: Zhenyu Zhou, Junhui Chen, Namin Wang, Lantian Li, Dong Wang
Subjects: Sound (cs.SD); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.02699
Pdf link: https://arxiv.org/pdf/2402.02699
Abstract Data augmentation (DA) has gained widespread popularity in deep speaker models due to its ease of implementation and significant effectiveness. It enriches training data by simulating real-life acoustic variations, enabling deep neural networks to learn speaker-related representations while disregarding irrelevant acoustic variations, thereby improving robustness and generalization. However, a potential issue with the vanilla DA is augmentation residual, i.e., unwanted distortion caused by different types of augmentation. To address this problem, this paper proposes a novel approach called adversarial data augmentation (A-DA) which combines DA with adversarial learning. Specifically, it involves an additional augmentation classifier to categorize various augmentation types used in data augmentation. This adversarial learning empowers the network to generate speaker embeddings that can deceive the augmentation classifier, making the learned speaker embeddings more robust in the face of augmentation variations. Experiments conducted on VoxCeleb and CN-Celeb datasets demonstrate that our proposed A-DA outperforms standard DA in both augmentation matched and mismatched test conditions, showcasing its superior robustness and generalization against acoustic variations.
Teach Me How to ImproVISe: Co-Designing an Augmented Piano Training System for Improvisation
Authors: Authors: Jordan Aiko Deja, Sandi Štor, Ilonka Pucihar, Klen Čopič Pucihar, Matjaž Kljun
Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2402.02999
Pdf link: https://arxiv.org/pdf/2402.02999
Abstract Improvisation is a vital but often neglected aspect of traditional piano teaching. Challenges such as difficulty in assessment and subjectivity have hindered its effective instruction. Technological approaches, including augmentation, aim to enhance piano instruction, but the specific application of digital augmentation for piano improvisation is under-explored. This paper outlines a co-design process developing an Augmented Reality (AR) Piano Improvisation Training System, ImproVISe, involving improvisation teachers. The prototype, featuring basic improvisation concepts, was created and refined through expert interaction. Their insights guided the identification of objectives, tools, interaction metaphors, and software features. The findings offer design guidelines and recommendations to address challenges in assessing piano improvisation in a learning context.
UniMem: Towards a Unified View of Long-Context Large Language Models
Authors: Authors: Junjie Fang, Likai Tang, Hongzhe Bi, Yujia Qin, Si Sun, Zhenyu Li, Haolun Li, Yongjian Li, Xin Cong, Yukun Yan, Xiaodong Shi, Sen Song, Yankai Lin, Zhiyuan Liu, Maosong Sun
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2402.03009
Pdf link: https://arxiv.org/pdf/2402.03009
Abstract Long-context processing is a critical ability that constrains the applicability of large language models. Although there exist various methods devoted to enhancing the long-context processing ability of large language models (LLMs), they are developed in an isolated manner and lack systematic analysis and integration of their strengths, hindering further developments. In this paper, we introduce UniMem, a unified framework that reformulates existing long-context methods from the view of memory augmentation of LLMs. UniMem is characterized by four key dimensions: Memory Management, Memory Writing, Memory Reading, and Memory Injection, providing a systematic theory for understanding various long-context methods. We reformulate 16 existing methods based on UniMem and analyze four representative methods: Transformer-XL, Memorizing Transformer, RMT, and Longformer into equivalent UniMem forms to reveal their design principles and strengths. Based on these analyses, we propose UniMix, an innovative approach that integrates the strengths of these algorithms. Experimental results show that UniMix achieves superior performance in handling long contexts with significantly lower perplexity than baselines.
Boosting Long-Delayed Reinforcement Learning with Auxiliary Short-Delayed Task
Authors: Authors: Qingyuan Wu, Simon Sinong Zhan, Yixuan Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Chao Huang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2402.03141
Pdf link: https://arxiv.org/pdf/2402.03141
Abstract Reinforcement learning is challenging in delayed scenarios, a common real-world situation where observations and interactions occur with delays. State-of-the-art (SOTA) state-augmentation techniques either suffer from the state-space explosion along with the delayed steps, or performance degeneration in stochastic environments. To address these challenges, our novel Auxiliary-Delayed Reinforcement Learning (AD-RL) leverages an auxiliary short-delayed task to accelerate the learning on a long-delayed task without compromising the performance in stochastic environments. Specifically, AD-RL learns the value function in the short-delayed task and then employs it with the bootstrapping and policy improvement techniques in the long-delayed task. We theoretically show that this can greatly reduce the sample complexity compared to directly learning on the original long-delayed task. On deterministic and stochastic benchmarks, our method remarkably outperforms the SOTAs in both sample efficiency and policy performance.
Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion
Authors: Authors: Shiyuan Yang, Liang Hou, Haibin Huang, Chongyang Ma, Pengfei Wan, Di Zhang, Xiaodong Chen, Jing Liao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.03162
Pdf link: https://arxiv.org/pdf/2402.03162
Abstract Recent text-to-video diffusion models have achieved impressive progress. In practice, users often desire the ability to control object motion and camera movement independently for customized video creation. However, current methods lack the focus on separately controlling object motion and camera movement in a decoupled manner, which limits the controllability and flexibility of text-to-video models. In this paper, we introduce Direct-a-Video, a system that allows users to independently specify motions for one or multiple objects and/or camera movements, as if directing a video. We propose a simple yet effective strategy for the decoupled control of object motion and camera movement. Object motion is controlled through spatial cross-attention modulation using the model's inherent priors, requiring no additional optimization. For camera movement, we introduce new temporal cross-attention layers to interpret quantitative camera movement parameters. We further employ an augmentation-based approach to train these layers in a self-supervised manner on a small-scale dataset, eliminating the need for explicit motion annotation. Both components operate independently, allowing individual or combined control, and can generalize to open-domain scenarios. Extensive experiments demonstrate the superiority and effectiveness of our method. Project page: https://direct-a-video.github.io/.

LeeKyungwook / get-arxiv-noti

New submissions for Tue, 6 Feb 24 #964

Keyword: detection

Detection of Machine-Generated Text: Literature Survey

Recent Innovations in Footwear Sensors: Role of Smart Footwear in Healthcare -- A Survey

Killer Apps: Low-Speed, Large-Scale AI Weapons

Edge Offloading in Smart Grid

Resource-efficient In-orbit Detection of Earth Objects

A Systematic Mapping Study of Digital Twins for Diagnosis in Transportation

Socially Aware Synthetic Data Generation for Suicidal Ideation Detection Using Large Language Models

Development and Testing of a Novel Large Language Model-Based Clinical Decision Support Systems for Medication Safety in 12 Clinical Specialties

Systematic Literature Review: Computational Approaches for Humour Style Classification

AOC-IDS: Autonomous Online Framework with Contrastive Learning for Intrusion Detection

S2malloc: Statistically Secure Allocator for Use-After-Free Protection And More

Are You a Real Software Engineer? Best Practices in Online Recruitment for Software Engineering Studies

MasonPerplexity at Multimodal Hate Speech Event Detection 2024: Hate Speech and Target Detection Using Transformer Ensembles

Hypergraph-Transformer (HGT) for Interactive Event Prediction in Laparoscopic and Robotic Surgery

MasonPerplexity at ClimateActivism 2024: Integrating Advanced Ensemble Techniques and Data Augmentation for Climate Activism Stance and Hate Event Identification

SOCIALITE-LLAMA: An Instruction-Tuned Model for Social Scientific Tasks

GenFace: A Large-Scale Fine-Grained Face Forgery Benchmark and Cross Appearance-Edge Learning

Understanding Time Series Anomaly State Detection through One-Class Classification

Multimodal-Enhanced Objectness Learner for Corner Case Detection in Autonomous Driving

Universal Post-Training Reverse-Engineering Defense Against Backdoors in Deep Neural Networks

MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning

TCI-Former: Thermal Conduction-Inspired Transformer for Infrared Small Target Detection

Feature Selection using the concept of Peafowl Mating in IDS

RIDERS: Radar-Infrared Depth Estimation for Robust Sensing

Exploring the Robustness of Task-oriented Dialogue Systems for Colloquial German Varieties

DeCoF: Generated Video Detection via Frame Consistency

Recent Advances in Digital Image and Video Forensics, Anti-forensics and Counter Anti-forensics

Decomposition-based and Interference Perception for Infrared and Visible Image Fusion in Complex Scenes

Probing Critical Learning Dynamics of PLMs for Hate Speech Detection

Collaborative Agents for Software Engineering

Detecting Respiratory Pathologies Using Convolutional Neural Networks and Variational Autoencoders for Unbalancing Data

Diabetes detection using deep learning techniques with oversampling and feature augmentation

CoFiNet: Unveiling Camouflaged Objects with Multi-Scale Finesse

Revisiting Generative Adversarial Networks for Binary Semantic Segmentation on Imbalanced Datasets

Data Quality Matters: Suicide Intention Detection on Social Media Posts Using a RoBERTa-CNN Model

$\textit{A Contrario}$ Paradigm for YOLO-based Infrared Small Target Detection

Joint Activity and Data Detection for Massive Grant-Free Access Using Deterministic Non-Orthogonal Signatures

Timer: Transformers for Time Series Analysis at Scale

AI Art Neural Constellation: Revealing the Collective and Contrastive State of AI-Generated and Human Art

DeSparsify: Adversarial Attack Against Token Sparsification Mechanisms in Vision Transformers

Gazebo Plants: Simulating Plant-Robot Interaction with Cosserat Rods

Spatio-temporal Prompting Network for Robust Video Feature Extraction

Evading Deep Learning-Based Malware Detectors via Obfuscation: A Deep Reinforcement Learning Approach

Learning with Mixture of Prototypes for Out-of-Distribution Detection

Innovative Cybersickness Detection: Exploring Head Movement Patterns in Virtual Reality

Improving Robustness of LiDAR-Camera Fusion Model against Weather Corruption from Fusion Strategy Perspective

DisDet: Exploring Detectability of Backdoor Attack on Diffusion Models

Transmission Line Detection Based on Improved Hough Transform

Dual Knowledge Distillation for Efficient Sound Event Detection

Joint Attention-Guided Feature Fusion Network for Saliency Detection of Surface Defects

A Comprehensive Numerical Approach to Coil Placement in Cerebral Aneurysms: Mathematical Modeling and In Silico Occlusion Classification

Are Sounds Sound for Phylogenetic Reconstruction?

Revisiting VAE for Unsupervised Time Series Anomaly Detection: A Frequency Perspective

Evading Data Contamination Detection for Language Models is (too) Easy

SynthVision -- Harnessing Minimal Input for Maximal Output in Computer Vision Models using Synthetic Image data

PowerGraph: A power grid benchmark dataset for graph neural networks

On the Performance of RIS-Aided Spatial Modulation for Downlink Transmission

Dynamic Test Case Prioritization in Industrial Test Result Datasets

Automated Cognate Detection as a Supervised Link Prediction Task with Cognate Transformer

Kernel PCA for Out-of-Distribution Detection

Unraveling the Key of Machine Learning Solutions for Android Malware Detection

Are We There Yet? Unraveling the State-of-the-Art Smart Contract Fuzzers

Putting Context in Context: the Impact of Discussion Structure on Text Classification

A Safety-Adapted Loss for Pedestrian Detection in Automated Driving

[Citation needed] Data usage and citation practices in medical imaging conferences

Visual Text Meets Low-level Vision: A Comprehensive Survey on Visual Text Processing

Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector

Detecting Scams Using Large Language Models

Towards mitigating uncann(eye)ness in face swaps via gaze-centric loss terms

Unified Hallucination Detection for Multimodal Large Language Models

"Define Your Terms" : Enhancing Efficient Offensive Speech Classification with Definition

ActiveAnno3D -- An Active Learning Framework for Multi-Modal 3D Object Detection

Multiclass Classification Procedure for Detecting Attacks on MQTT-IoT Protocol

Zero-shot Object-Level OOD Detection with Context-Aware Inpainting

HASSOD: Hierarchical Adaptive Self-Supervised Object Detection

Keyword: face recognition

Keyword: augmentation