New submissions for Thu, 7 Mar 24

Keyword: detection

TTPXHunter: Actionable Threat Intelligence Extraction as TTPs form Finished Cyber Threat Reports

Authors: Authors: Nanda Rani, Bikash Saha, Vikas Maurya, Sandeep Kumar Shukla
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2403.03267
Pdf link: https://arxiv.org/pdf/2403.03267
Abstract Understanding the modus operandi of adversaries aids organizations in employing efficient defensive strategies and sharing intelligence in the community. This knowledge is often present in unstructured natural language text within threat analysis reports. A translation tool is needed to interpret the modus operandi explained in the sentences of the threat report and translate it into a structured format. This research introduces a methodology named TTPXHunter for the automated extraction of threat intelligence in terms of Tactics, Techniques, and Procedures (TTPs) from finished cyber threat reports. It leverages cyber domain-specific state-of-the-art natural language processing (NLP) to augment sentences for minority class TTPs and refine pinpointing the TTPs in threat analysis reports significantly. The knowledge of threat intelligence in terms of TTPs is essential for comprehensively understanding cyber threats and enhancing detection and mitigation strategies. We create two datasets: an augmented sentence-TTP dataset of 39,296 samples and a 149 real-world cyber threat intelligence report-to-TTP dataset. Further, we evaluate TTPXHunter on the augmented sentence dataset and the cyber threat reports. The TTPXHunter achieves the highest performance of 92.42% f1-score on the augmented dataset, and it also outperforms existing state-of-the-art solutions in TTP extraction by achieving an f1-score of 97.09% when evaluated over the report dataset. TTPXHunter significantly improves cybersecurity threat intelligence by offering quick, actionable insights into attacker behaviors. This advancement automates threat intelligence analysis, providing a crucial tool for cybersecurity professionals fighting cyber threats.
Beyond the Dashboard: Investigating Distracted Driver Communication Preferences for ADAS
Authors: Authors: Aamir Hasan, D. Livingston McPherson, Melissa Miles, Katherine Driggs-Campbell
Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2403.03312
Pdf link: https://arxiv.org/pdf/2403.03312
Abstract Distracted driving is a major cause of road fatalities. With improvements in driver (in)attention detection, these distracted situations can be caught early to alert drivers and improve road safety and comfort. However, drivers may have differing preferences for the modes of such communication based on the driving scenario and their current distraction state. To this end, we present a user study (N=147) where videos of simulated driving scenarios were utilized to learn drivers preferences for modes of communication and their evolution with the drivers changing attention. The survey queried participants preferred modes of communication for scenarios such as collisions or stagnation at a green light. We validate our hypotheses and provide key results that inform the future of communication between drivers and their vehicles. We showcase the different driver preferences based on the nature of the driving scenario and also show that they evolve as the drivers distraction state changes.
DIVERSE: Deciphering Internet Views on the U.S. Military Through Video Comment Stance Analysis, A Novel Benchmark Dataset for Stance Classification
Authors: Authors: Iain J. Cruickshank, Lynnette Hui Xian Ng
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2403.03334
Pdf link: https://arxiv.org/pdf/2403.03334
Abstract Stance detection of social media text is a key component of downstream tasks involving the identification of groups of users with opposing opinions on contested topics such as vaccination and within arguments. In particular, stance provides an indication of an opinion towards an entity. This paper introduces DIVERSE, a dataset of over 173,000 YouTube video comments annotated for their stance towards videos of the U.S. military. The stance is annotated through a human-guided, machine-assisted labeling methodology that makes use of weak signals of tone within the sentence as supporting indicators, as opposed to using manual annotations by humans. These weak signals consist of the presence of hate speech and sarcasm, the presence of specific keywords, the sentiment of the text, and the stance inference from two Large Language Models. The weak signals are then consolidated using a data programming model before each comment is annotated with a final stance label. On average, the videos have 200 comments each, and the stance of the comments skews slightly towards the "against" characterization for both the U.S. Army and the videos posted on the channel.
Scope of Large Language Models for Mining Emerging Opinions in Online Health Discourse
Authors: Authors: Joseph Gatto, Madhusudan Basak, Yash Srivastava, Philip Bohlman, Sarah M. Preum
Subjects: Computation and Language (cs.CL); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2403.03336
Pdf link: https://arxiv.org/pdf/2403.03336
Abstract In this paper, we develop an LLM-powered framework for the curation and evaluation of emerging opinion mining in online health communities. We formulate emerging opinion mining as a pairwise stance detection problem between (title, comment) pairs sourced from Reddit, where post titles contain emerging health-related claims on a topic that is not predefined. The claims are either explicitly or implicitly expressed by the user. We detail (i) a method of claim identification -- the task of identifying if a post title contains a claim and (ii) an opinion mining-driven evaluation framework for stance detection using LLMs. We facilitate our exploration by releasing a novel test dataset, Long COVID-Stance, or LC-stance, which can be used to evaluate LLMs on the tasks of claim identification and stance detection in online health communities. Long Covid is an emerging post-COVID disorder with uncertain and complex treatment guidelines, thus making it a suitable use case for our task. LC-Stance contains long COVID treatment related discourse sourced from a Reddit community. Our evaluation shows that GPT-4 significantly outperforms prior works on zero-shot stance detection. We then perform thorough LLM model diagnostics, identifying the role of claim type (i.e. implicit vs explicit claims) and comment length as sources of model error.
Enhancing Vision-Language Pre-training with Rich Supervisions
Authors: Authors: Yuan Gao, Kunyu Shi, Pengkai Zhu, Edouard Belval, Oren Nuriel, Srikar Appalaraju, Shabnam Ghadar, Vijay Mahadevan, Zhuowen Tu, Stefano Soatto
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2403.03346
Pdf link: https://arxiv.org/pdf/2403.03346
Abstract We propose Strongly Supervised pre-training with ScreenShots (S4) - a novel pre-training paradigm for Vision-Language Models using data from large-scale web screenshot rendering. Using web screenshots unlocks a treasure trove of visual and textual cues that are not present in using image-text pairs. In S4, we leverage the inherent tree-structured hierarchy of HTML elements and the spatial localization to carefully design 10 pre-training tasks with large scale annotated data. These tasks resemble downstream tasks across different domains and the annotations are cheap to obtain. We demonstrate that, compared to current screenshot pre-training objectives, our innovative pre-training method significantly enhances performance of image-to-text model in nine varied and popular downstream tasks - up to 76.1% improvements on Table Detection, and at least 1% on Widget Captioning.
Leveraging Federated Learning for Automatic Detection of Clopidogrel Treatment Failures
Authors: Authors: Samuel Kim, Min Sang Kim
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2403.03368
Pdf link: https://arxiv.org/pdf/2403.03368
Abstract The effectiveness of clopidogrel, a widely used antiplatelet medication, varies significantly among individuals, necessitating the development of precise predictive models to optimize patient care. In this study, we leverage federated learning strategies to address clopidogrel treatment failure detection. Our research harnesses the collaborative power of multiple healthcare institutions, allowing them to jointly train machine learning models while safeguarding sensitive patient data. Utilizing the UK Biobank dataset, which encompasses a vast and diverse population, we partitioned the data based on geographic centers and evaluated the performance of federated learning. Our results show that while centralized training achieves higher Area Under the Curve (AUC) values and faster convergence, federated learning approaches can substantially narrow this performance gap. Our findings underscore the potential of federated learning in addressing clopidogrel treatment failure detection, offering a promising avenue for enhancing patient care through personalized treatment strategies while respecting data privacy. This study contributes to the growing body of research on federated learning in healthcare and lays the groundwork for secure and privacy-preserving predictive models for various medical conditions.
Performance Evaluation of Semi-supervised Learning Frameworks for Multi-Class Weed Detection
Authors: Authors: Jiajia Li, Dong Chen, Xunyuan Yin, Zhaojian Li
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2403.03390
Pdf link: https://arxiv.org/pdf/2403.03390
Abstract Effective weed control plays a crucial role in optimizing crop yield and enhancing agricultural product quality. However, the reliance on herbicide application not only poses a critical threat to the environment but also promotes the emergence of resistant weeds. Fortunately, recent advances in precision weed management enabled by ML and DL provide a sustainable alternative. Despite great progress, existing algorithms are mainly developed based on supervised learning approaches, which typically demand large-scale datasets with manual-labeled annotations, which is time-consuming and labor-intensive. As such, label-efficient learning methods, especially semi-supervised learning, have gained increased attention in the broader domain of computer vision and have demonstrated promising performance. These methods aim to utilize a small number of labeled data samples along with a great number of unlabeled samples to develop high-performing models comparable to the supervised learning counterpart trained on a large amount of labeled data samples. In this study, we assess the effectiveness of a semi-supervised learning framework for multi-class weed detection, employing two well-known object detection frameworks, namely FCOS and Faster-RCNN. Specifically, we evaluate a generalized student-teacher framework with an improved pseudo-label generation module to produce reliable pseudo-labels for the unlabeled data. To enhance generalization, an ensemble student network is employed to facilitate the training process. Experimental results show that the proposed approach is able to achieve approximately 76\% and 96\% detection accuracy as the supervised methods with only 10\% of labeled data in CottenWeedDet3 and CottonWeedDet12, respectively. We offer access to the source code, contributing a valuable resource for ongoing semi-supervised learning research in weed detection and beyond.
Contrastive Learning of Person-independent Representations for Facial Action Unit Detection
Authors: Authors: Yong Li, Shiguang Shan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2403.03400
Pdf link: https://arxiv.org/pdf/2403.03400
Abstract Facial action unit (AU) detection, aiming to classify AU present in the facial image, has long suffered from insufficient AU annotations. In this paper, we aim to mitigate this data scarcity issue by learning AU representations from a large number of unlabelled facial videos in a contrastive learning paradigm. We formulate the self-supervised AU representation learning signals in two-fold: (1) AU representation should be frame-wisely discriminative within a short video clip; (2) Facial frames sampled from different identities but show analogous facial AUs should have consistent AU representations. As to achieve these goals, we propose to contrastively learn the AU representation within a video clip and devise a cross-identity reconstruction mechanism to learn the person-independent representations. Specially, we adopt a margin-based temporal contrastive learning paradigm to perceive the temporal AU coherence and evolution characteristics within a clip that consists of consecutive input facial frames. Moreover, the cross-identity reconstruction mechanism facilitates pushing the faces from different identities but show analogous AUs close in the latent embedding space. Experimental results on three public AU datasets demonstrate that the learned AU representation is discriminative for AU detection. Our method outperforms other contrastive learning methods and significantly closes the performance gap between the self-supervised and supervised AU detection approaches.
Advancing Out-of-Distribution Detection through Data Purification and Dynamic Activation Function Design
Authors: Authors: Yingrui Ji, Yao Zhu, Zhigang Li, Jiansheng Chen, Yunlong Kong, Jingbo Chen
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2403.03412
Pdf link: https://arxiv.org/pdf/2403.03412
Abstract In the dynamic realms of machine learning and deep learning, the robustness and reliability of models are paramount, especially in critical real-world applications. A fundamental challenge in this sphere is managing Out-of-Distribution (OOD) samples, significantly increasing the risks of model misclassification and uncertainty. Our work addresses this challenge by enhancing the detection and management of OOD samples in neural networks. We introduce OOD-R (Out-of-Distribution-Rectified), a meticulously curated collection of open-source datasets with enhanced noise reduction properties. In-Distribution (ID) noise in existing OOD datasets can lead to inaccurate evaluation of detection algorithms. Recognizing this, OOD-R incorporates noise filtering technologies to refine the datasets, ensuring a more accurate and reliable evaluation of OOD detection algorithms. This approach not only improves the overall quality of data but also aids in better distinguishing between OOD and ID samples, resulting in up to a 2.5\% improvement in model accuracy and a minimum 3.2\% reduction in false positives. Furthermore, we present ActFun, an innovative method that fine-tunes the model's response to diverse inputs, thereby improving the stability of feature extraction and minimizing specificity issues. ActFun addresses the common problem of model overconfidence in OOD detection by strategically reducing the influence of hidden units, which enhances the model's capability to estimate OOD uncertainty more accurately. Implementing ActFun in the OOD-R dataset has led to significant performance enhancements, including an 18.42\% increase in AUROC of the GradNorm method and a 16.93\% decrease in FPR95 of the Energy method. Overall, our research not only advances the methodologies in OOD detection but also emphasizes the importance of dataset integrity for accurate algorithm evaluation.
FLAME Diffuser: Grounded Wildfire Image Synthesis using Mask Guided Diffusion
Authors: Authors: Hao Wang, Sayed Pedram Haeri Boroujeni, Xiwen Chen, Ashish Bastola, Huayu Li, Abolfazl Razi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2403.03463
Pdf link: https://arxiv.org/pdf/2403.03463
Abstract The rise of machine learning in recent years has brought benefits to various research fields such as wide fire detection. Nevertheless, small object detection and rare object detection remain a challenge. To address this problem, we present a dataset automata that can generate ground truth paired datasets using diffusion models. Specifically, we introduce a mask-guided diffusion framework that can fusion the wildfire into the existing images while the flame position and size can be precisely controlled. In advance, to fill the gap that the dataset of wildfire images in specific scenarios is missing, we vary the background of synthesized images by controlling both the text prompt and input image. Furthermore, to solve the color tint problem or the well-known domain shift issue, we apply the CLIP model to filter the generated massive dataset to preserve quality. Thus, our proposed framework can generate a massive dataset of that images are high-quality and ground truth-paired, which well addresses the needs of the annotated datasets in specific tasks.
Multi-task Learning for Real-time Autonomous Driving Leveraging Task-adaptive Attention Generator
Authors: Authors: Wonhyeok Choi, Mingyu Shin, Hyukzae Lee, Jaehoon Cho, Jaehyeon Park, Sunghoon Im
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2403.03468
Pdf link: https://arxiv.org/pdf/2403.03468
Abstract Real-time processing is crucial in autonomous driving systems due to the imperative of instantaneous decision-making and rapid response. In real-world scenarios, autonomous vehicles are continuously tasked with interpreting their surroundings, analyzing intricate sensor data, and making decisions within split seconds to ensure safety through numerous computer vision tasks. In this paper, we present a new real-time multi-task network adept at three vital autonomous driving tasks: monocular 3D object detection, semantic segmentation, and dense depth estimation. To counter the challenge of negative transfer, which is the prevalent issue in multi-task learning, we introduce a task-adaptive attention generator. This generator is designed to automatically discern interrelations across the three tasks and arrange the task-sharing pattern, all while leveraging the efficiency of the hard-parameter sharing approach. To the best of our knowledge, the proposed model is pioneering in its capability to concurrently handle multiple tasks, notably 3D object detection, while maintaining real-time processing speeds. Our rigorously optimized network, when tested on the Cityscapes-3D datasets, consistently outperforms various baseline models. Moreover, an in-depth ablation study substantiates the efficacy of the methodologies integrated into our framework.
Global Geolocated Realtime Data of Interfleet Urban Transit Bus Idling
Authors: Authors: Nicholas Kunz, H. Oliver Gao
Subjects: Systems and Control (eess.SY); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2403.03489
Pdf link: https://arxiv.org/pdf/2403.03489
Abstract Urban transit bus idling is a contributor to ecological stress, economic inefficiency, and medically hazardous health outcomes due to emissions. The global accumulation of this frequent pattern of undesirable driving behavior is enormous. In order to measure its scale, we propose GRD-TRT- BUF-4I (Ground Truth Buffer for Idling) an extensible, realtime detection system that records the geolocation and idling duration of urban transit bus fleets internationally. Using live vehicle locations from General Transit Feed Specification (GTFS) Realtime, the system detects approximately 200,000 idling events per day from over 50 cities across North America, Europe, Oceania, and Asia. This realtime data was created to dynamically serve operational decision-making and fleet management to reduce the frequency and duration of idling events as they occur, as well as to capture its accumulative effects. Civil and Transportation Engineers, Urban Planners, Epidemiologists, Policymakers, and other stakeholders might find this useful for emissions modeling, traffic management, route planning, and other urban sustainability efforts at a variety of geographic and temporal scales.
Towards Detecting AI-Generated Text within Human-AI Collaborative Hybrid Texts
Authors: Authors: Zijie Zeng, Shiqi Liu, Lele Sha, Zhuang Li, Kaixun Yang, Sannyuya Liu, Dragan Gašević, Guanliang Chen
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2403.03506
Pdf link: https://arxiv.org/pdf/2403.03506
Abstract This study explores the challenge of sentence-level AI-generated text detection within human-AI collaborative hybrid texts. Existing studies of AI-generated text detection for hybrid texts often rely on synthetic datasets. These typically involve hybrid texts with a limited number of boundaries. We contend that studies of detecting AI-generated content within hybrid texts should cover different types of hybrid texts generated in realistic settings to better inform real-world applications. Therefore, our study utilizes the CoAuthor dataset, which includes diverse, realistic hybrid texts generated through the collaboration between human writers and an intelligent writing system in multi-turn interactions. We adopt a two-step, segmentation-based pipeline: (i) detect segments within a given hybrid text where each segment contains sentences of consistent authorship, and (ii) classify the authorship of each identified segment. Our empirical findings highlight (1) detecting AI-generated sentences in hybrid texts is overall a challenging task because (1.1) human writers' selecting and even editing AI-generated sentences based on personal preferences adds difficulty in identifying the authorship of segments; (1.2) the frequent change of authorship between neighboring sentences within the hybrid text creates difficulties for segment detectors in identifying authorship-consistent segments; (1.3) the short length of text segments within hybrid texts provides limited stylistic cues for reliable authorship determination; (2) before embarking on the detection process, it is beneficial to assess the average length of segments within the hybrid text. This assessment aids in deciding whether (2.1) to employ a text segmentation-based strategy for hybrid texts with longer segments, or (2.2) to adopt a direct sentence-by-sentence classification strategy for those with shorter segments.
Non-verbal information in spontaneous speech - towards a new framework of analysis
Authors: Authors: Tirza Biron, Moshe Barboy, Eran Ben-Artzy, Alona Golubchik, Yanir Marmor, Smadar Szekely, Yaron Winter, David Harel
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2403.03522
Pdf link: https://arxiv.org/pdf/2403.03522
Abstract Non-verbal signals in speech are encoded by prosody and carry information that ranges from conversation action to attitude and emotion. Despite its importance, the principles that govern prosodic structure are not yet adequately understood. This paper offers an analytical schema and a technological proof-of-concept for the categorization of prosodic signals and their association with meaning. The schema interprets surface-representations of multi-layered prosodic events. As a first step towards implementation, we present a classification process that disentangles prosodic phenomena of three orders. It relies on fine-tuning a pre-trained speech recognition model, enabling the simultaneous multi-class/multi-label detection. It generalizes over a large variety of spontaneous data, performing on a par with, or superior to, human annotation. In addition to a standardized formalization of prosody, disentangling prosodic patterns can direct a theory of communication and speech organization. A welcome by-product is an interpretation of prosody that will enhance speech- and language-related technologies.
RADIA -- Radio Advertisement Detection with Intelligent Analytics
Authors: Authors: Jorge Álvarez, Juan Carlos Armenteros, Camilo Torrón, Miguel Ortega-Martín, Alfonso Ardoiz, Óscar García, Ignacio Arranz, Íñigo Galdeano, Ignacio Garrido, Adrián Alonso, Fernando Bayón, Oleg Vorontsov
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2403.03538
Pdf link: https://arxiv.org/pdf/2403.03538
Abstract Radio advertising remains an integral part of modern marketing strategies, with its appeal and potential for targeted reach undeniably effective. However, the dynamic nature of radio airtime and the rising trend of multiple radio spots necessitates an efficient system for monitoring advertisement broadcasts. This study investigates a novel automated radio advertisement detection technique incorporating advanced speech recognition and text classification algorithms. RadIA's approach surpasses traditional methods by eliminating the need for prior knowledge of the broadcast content. This contribution allows for detecting impromptu and newly introduced advertisements, providing a comprehensive solution for advertisement detection in radio broadcasting. Experimental results show that the resulting model, trained on carefully segmented and tagged text data, achieves an F1-macro score of 87.76 against a theoretical maximum of 89.33. This paper provides insights into the choice of hyperparameters and their impact on the model's performance. This study demonstrates its potential to ensure compliance with advertising broadcast contracts and offer competitive surveillance. This groundbreaking research could fundamentally change how radio advertising is monitored and open new doors for marketing optimization.
Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem
Authors: Authors: Yuhong Sun, Zhangyue Yin, Qipeng Guo, Jiawen Wu, Xipeng Qiu, Hui Zhao
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2403.03558
Pdf link: https://arxiv.org/pdf/2403.03558
Abstract Large language models (LLMs) are highly effective in various natural language processing (NLP) tasks. However, they are susceptible to producing unreliable conjectures in ambiguous contexts called hallucination. This paper presents a new method for evaluating LLM hallucination in Question Answering (QA) based on the unanswerable math word problem (MWP). To support this approach, we innovatively develop a dataset called Unanswerable Math Word Problem (UMWP) which comprises 5200 questions across five categories. We developed an evaluation methodology combining text similarity and mathematical expression detection to determine whether LLM considers the question unanswerable. The results of extensive experiments conducted on 31 LLMs, including GPT-3, InstructGPT, LLaMA, and Claude, demonstrate that in-context learning and reinforcement learning with human feedback (RLHF) training significantly enhance the model's ability to avoid hallucination. We show that utilizing MWP is a reliable and effective approach to assess hallucination. Our code and data are available at https://github.com/Yuki-Asuuna/UMWP.
Multimodal Anomaly Detection based on Deep Auto-Encoder for Object Slip Perception of Mobile Manipulation Robots
Authors: Authors: Youngjae Yoo, Chung-Yeon Lee, Byoung-Tak Zhang
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2403.03563
Pdf link: https://arxiv.org/pdf/2403.03563
Abstract Object slip perception is essential for mobile manipulation robots to perform manipulation tasks reliably in the dynamic real-world. Traditional approaches to robot arms' slip perception use tactile or vision sensors. However, mobile robots still have to deal with noise in their sensor signals caused by the robot's movement in a changing environment. To solve this problem, we present an anomaly detection method that utilizes multisensory data based on a deep autoencoder model. The proposed framework integrates heterogeneous data streams collected from various robot sensors, including RGB and depth cameras, a microphone, and a force-torque sensor. The integrated data is used to train a deep autoencoder to construct latent representations of the multisensory data that indicate the normal status. Anomalies can then be identified by error scores measured by the difference between the trained encoder's latent values and the latent values of reconstructed input data. In order to evaluate the proposed framework, we conducted an experiment that mimics an object slip by a mobile service robot operating in a real-world environment with diverse household objects and different moving patterns. The experimental results verified that the proposed framework reliably detects anomalies in object slip situations despite various object types and robot behaviors, and visual and auditory noise in the environment.
Unsupervised Incremental Learning with Dual Concept Drift Detection for Identifying Anomalous Sequences
Authors: Authors: Jin Li, Kleanthis Malialis, Marios M. Polycarpou
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2403.03576
Pdf link: https://arxiv.org/pdf/2403.03576
Abstract In the contemporary digital landscape, the continuous generation of extensive streaming data across diverse domains has become pervasive. Yet, a significant portion of this data remains unlabeled, posing a challenge in identifying infrequent events such as anomalies. This challenge is further amplified in non-stationary environments, where the performance of models can degrade over time due to concept drift. To address these challenges, this paper introduces a new method referred to as VAE4AS (Variational Autoencoder for Anomalous Sequences). VAE4AS integrates incremental learning with dual drift detection mechanisms, employing both a statistical test and a distance-based test. The anomaly detection is facilitated by a Variational Autoencoder. To gauge the effectiveness of VAE4AS, a comprehensive experimental study is conducted using real-world and synthetic datasets characterized by anomalous rates below 10\% and recurrent drift. The results show that the proposed method surpasses both robust baselines and state-of-the-art techniques, providing compelling evidence for their efficacy in effectively addressing some of the challenges associated with anomalous sequence detection in non-stationary streaming data.
Enhancing ASD detection accuracy: a combined approach of machine learning and deep learning models with natural language processing
Authors: Authors: Sergio Rubio-Martín, María Teresa García-Ordás, Martín Bayón-Gutiérrez, Natalia Prieto-Fernández, José Alberto Benítez-Andrades
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2403.03581
Pdf link: https://arxiv.org/pdf/2403.03581
Abstract Purpose: Our study explored the use of artificial intelligence (AI) to diagnose autism spectrum disorder (ASD). It focused on machine learning (ML) and deep learning (DL) to detect ASD from text inputs on social media, addressing challenges in traditional ASD diagnosis. Methods: We used natural language processing (NLP), ML, and DL models (including decision trees, XGB, KNN, RNN, LSTM, Bi-LSTM, BERT, and BERTweet) to analyze 404,627 tweets, classifying them based on ASD or non-ASD authors. A subset of 90,000 tweets was used for model training and testing. Results: Our AI models showed high accuracy, with an 88% success rate in identifying texts from individuals with ASD. Conclusion: The study demonstrates AI's potential in improving ASD diagnosis, especially in children, highlighting the importance of early detection.
DeepEclipse: How to Break White-Box DNN-Watermarking Schemes
Authors: Authors: Alessandro Pegoraro, Carlotta Segna, Kavita Kumari, Ahmad-Reza Sadeghi
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2403.03590
Pdf link: https://arxiv.org/pdf/2403.03590
Abstract Deep Learning (DL) models have become crucial in digital transformation, thus raising concerns about their intellectual property rights. Different watermarking techniques have been developed to protect Deep Neural Networks (DNNs) from IP infringement, creating a competitive field for DNN watermarking and removal methods. The predominant watermarking schemes use white-box techniques, which involve modifying weights by adding a unique signature to specific DNN layers. On the other hand, existing attacks on white-box watermarking usually require knowledge of the specific deployed watermarking scheme or access to the underlying data for further training and fine-tuning. We propose DeepEclipse, a novel and unified framework designed to remove white-box watermarks. We present obfuscation techniques that significantly differ from the existing white-box watermarking removal schemes. DeepEclipse can evade watermark detection without prior knowledge of the underlying watermarking scheme, additional data, or training and fine-tuning. Our evaluation reveals that DeepEclipse excels in breaking multiple white-box watermarking schemes, reducing watermark detection to random guessing while maintaining a similar model accuracy as the original one. Our framework showcases a promising solution to address the ongoing DNN watermark protection and removal challenges.
Spectrum Occupancy Detection Supported by Federated Learning
Authors: Authors: Łukasz Kułacz
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2403.03617
Pdf link: https://arxiv.org/pdf/2403.03617
Abstract Dynamic spectrum access is essential for radiocommunication and its limited spectrum resources. The key element of dynamic spectrum access systems is effective spectrum occupancy detection. In many cases, machine learning algorithms improve detection effectiveness. Because of the recent trend of using federated learning, a federated learning algorithm is presented in the context of distributed spectrum occupancy detection. The results of the work presented in the paper are based on actual signal samples collected in the laboratory. The proposed algorithm is effective, especially in the context of a set of sensors with faulty sensors.
Processing Load Allocation of On-Board Multi-User Detection for Payload-Constrained Satellite Networks
Authors: Authors: Sirui Miao, Neng Ye, Peisen Wang, Qiaolin Ouyang
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2403.03635
Pdf link: https://arxiv.org/pdf/2403.03635
Abstract The rapid advance of mega-constellation facilitates the booming of direct-to-satellite massive access, where multi-user detection is critical to alleviate the induced inter-user interference. While centralized implementation of on-board detection induces unaffordable complexity for a single satellite, this paper proposes to allocate the processing load among cooperative satellites for finest exploitation of distributed processing power. Observing the inherent disparities among users, we first excavate the closed-form trade-offs between achievable sum-rate and the processing load corresponding to the satellite-user matchings, which leads to a system sum-rate maximization problem under stringent payload constraints. To address the non-trivial integer matching, we develop a quadratic transformation to the original problem, and prove it an equivalent conversion. The problem is further simplified into a series of subproblems employing successive lower bound approximation which obtains polynomial-time complexity and converges within a few iterations. Numerical results show remarkably complexity reduction compared with centralized processing, as well as around 20\% sum-rate gain compared with other allocation methods.
Portraying the Need for Temporal Data in Flood Detection via Sentinel-1
Authors: Authors: Xavier Bou, Thibaud Ehret, Rafael Grompone von Gioi, Jeremy Anger
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2403.03671
Pdf link: https://arxiv.org/pdf/2403.03671
Abstract Identifying flood affected areas in remote sensing data is a critical problem in earth observation to analyze flood impact and drive responses. While a number of methods have been proposed in the literature, there are two main limitations in available flood detection datasets: (1) a lack of region variability is commonly observed and/or (2) they require to distinguish permanent water bodies from flooded areas from a single image, which becomes an ill-posed setup. Consequently, we extend the globally diverse MMFlood dataset to multi-date by providing one year of Sentinel-1 observations around each flood event. To our surprise, we notice that the definition of flooded pixels in MMFlood is inconsistent when observing the entire image sequence. Hence, we re-frame the flood detection task as a temporal anomaly detection problem, where anomalous water bodies are segmented from a Sentinel-1 temporal sequence. From this definition, we provide a simple method inspired by the popular video change detector ViBe, results of which quantitatively align with the SAR image time series, providing a reasonable baseline for future works.
Adversarial Infrared Geometry: Using Geometry to Perform Adversarial Attack against Infrared Pedestrian Detectors
Authors: Authors: Kalibinuer Tiliwalidi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2403.03674
Pdf link: https://arxiv.org/pdf/2403.03674
Abstract Currently, infrared imaging technology enjoys widespread usage, with infrared object detection technology experiencing a surge in prominence. While previous studies have delved into physical attacks on infrared object detectors, the implementation of these techniques remains complex. For instance, some approaches entail the use of bulb boards or infrared QR suits as perturbations to execute attacks, which entail costly optimization and cumbersome deployment processes. Other methodologies involve the utilization of irregular aerogel as physical perturbations for infrared attacks, albeit at the expense of optimization expenses and perceptibility issues. In this study, we propose a novel infrared physical attack termed Adversarial Infrared Geometry (\textbf{AdvIG}), which facilitates efficient black-box query attacks by modeling diverse geometric shapes (lines, triangles, ellipses) and optimizing their physical parameters using Particle Swarm Optimization (PSO). Extensive experiments are conducted to evaluate the effectiveness, stealthiness, and robustness of AdvIG. In digital attack experiments, line, triangle, and ellipse patterns achieve attack success rates of 93.1\%, 86.8\%, and 100.0\%, respectively, with average query times of 71.7, 113.1, and 2.57, respectively, thereby confirming the efficiency of AdvIG. Physical attack experiments are conducted to assess the attack success rate of AdvIG at different distances. On average, the line, triangle, and ellipse achieve attack success rates of 61.1\%, 61.2\%, and 96.2\%, respectively. Further experiments are conducted to comprehensively analyze AdvIG, including ablation experiments, transfer attack experiments, and adversarial defense mechanisms. Given the superior performance of our method as a simple and efficient black-box adversarial attack in both digital and physical environments, we advocate for widespread attention to AdvIG.
Quantifying Media Influence on Covid-19 Mask-Wearing Beliefs
Authors: Authors: Nicholas Rabb, Nitya Nadgir, Jan P. de Ruiter, Lenore Cowen
Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY); Physics and Society (physics.soc-ph)
Arxiv link: https://arxiv.org/abs/2403.03684
Pdf link: https://arxiv.org/pdf/2403.03684
Abstract How political beliefs change in accordance with media exposure is a complicated matter. Some studies have been able to demonstrate that groups with different media diets in the aggregate (e.g., U.S. media consumers ingesting partisan news) arrive at different beliefs about policy issues, but proving this from data at a granular level -- at the level of attitudes expressed in news stories -- remains difficult. In contrast to existing opinion formation models that describe granular detail but are not data-driven, or data-driven studies that rely on simple keyword detection and miss linguistic nuances, being able to identify complicated attitudes in news text and use this data to drive models would enable more nuanced empirical study of opinion formation from media messaging. This study contributes a dataset as well as an analysis that allows the mapping of attitudes from individual news stories to aggregate changes of opinion over time for an important public health topic where opinion differed in the U.S. by partisan media diet: Covid mask-wearing beliefs. By gathering a dataset of U.S. news media stories, from April 6 to June 8, 2020, annotated according to Howard 2020's Face Mask Perception Scale for their statements regarding Covid-19 mask-wearing, we demonstrate fine-grained correlations between media messaging and empirical opinion polling data from a Gallup survey conducted during the same period. We also demonstrate that the data can be used for quantitative analysis of pro- and anti-mask sentiment throughout the period, identifying major events that drove opinion changes. This dataset is made publicly available and can be used by other researchers seeking to evaluate how mask-wearing attitudes were driven by news media content. Additionally, we hope that its general method can be used to enable other media researchers to conduct more detailed analyses of media effects on opinion.
CMDA: Cross-Modal and Domain Adversarial Adaptation for LiDAR-Based 3D Object Detection
Authors: Authors: Gyusam Chang, Wonseok Roh, Sujin Jang, Dongwook Lee, Daehyun Ji, Gyeongrok Oh, Jinsun Park, Jinkyu Kim, Sangpil Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2403.03721
Pdf link: https://arxiv.org/pdf/2403.03721
Abstract Recent LiDAR-based 3D Object Detection (3DOD) methods show promising results, but they often do not generalize well to target domains outside the source (or training) data distribution. To reduce such domain gaps and thus to make 3DOD models more generalizable, we introduce a novel unsupervised domain adaptation (UDA) method, called CMDA, which (i) leverages visual semantic cues from an image modality (i.e., camera images) as an effective semantic bridge to close the domain gap in the cross-modal Bird's Eye View (BEV) representations. Further, (ii) we also introduce a self-training-based learning strategy, wherein a model is adversarially trained to generate domain-invariant features, which disrupt the discrimination of whether a feature instance comes from a source or an unseen target domain. Overall, our CMDA framework guides the 3DOD model to generate highly informative and domain-adaptive features for novel data distributions. In our extensive experiments with large-scale benchmarks, such as nuScenes, Waymo, and KITTI, those mentioned above provide significant performance gains for UDA tasks, achieving state-of-the-art performance.
German also Hallucinates! Inconsistency Detection in News Summaries with the Absinth Dataset
Authors: Authors: Laura Mascarell, Ribin Chalumattu, Annette Rios
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2403.03750
Pdf link: https://arxiv.org/pdf/2403.03750
Abstract The advent of Large Language Models (LLMs) has led to remarkable progress on a wide range of natural language processing tasks. Despite the advances, these large-sized models still suffer from hallucinating information in their output, which poses a major issue in automatic text summarization, as we must guarantee that the generated summary is consistent with the content of the source document. Previous research addresses the challenging task of detecting hallucinations in the output (i.e. inconsistency detection) in order to evaluate the faithfulness of the generated summaries. However, these works primarily focus on English and recent multilingual approaches lack German data. This work presents absinth, a manually annotated dataset for hallucination detection in German news summarization and explores the capabilities of novel open-source LLMs on this task in both fine-tuning and in-context learning settings. We open-source and release the absinth dataset to foster further research on hallucination detection in German.
Popeye: A Unified Visual-Language Model for Multi-Source Ship Detection from Remote Sensing Imagery
Authors: Authors: Wei Zhang, Miaoxin Cai, Tong Zhang, Guoqiang Lei, Yin Zhuang, Xuerui Mao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2403.03790
Pdf link: https://arxiv.org/pdf/2403.03790
Abstract Ship detection needs to identify ship locations from remote sensing (RS) scenes. However, due to different imaging payloads, various appearances of ships, and complicated background interference from the bird's eye view, it is difficult to set up a unified paradigm for achieving multi-source ship detection. Therefore, in this article, considering that the large language models (LLMs) emerge the powerful generalization ability, a novel unified visual-language model called Popeye is proposed for multi-source ship detection from RS imagery. First, to bridge the interpretation gap between multi-source images for ship detection, a novel image-instruction-answer way is designed to integrate the various ship detection ways (e.g., horizontal bounding box (HBB), oriented bounding box (OBB)) into a unified labeling paradigm. Then, in view of this, a cross-modal image interpretation method is developed for the proposed Popeye to enhance interactive comprehension ability between visual and language content, which can be easily migrated into any multi-source ship detection task. Subsequently, owing to objective domain differences, a knowledge adaption mechanism is designed to adapt the pre-trained visual-language knowledge from the nature scene into the RS domain for multi-source ship detection. In addition, the segment anything model (SAM) is also seamlessly integrated into the proposed Popeye to achieve pixel-level ship segmentation without additional training costs. Finally, extensive experiments are conducted on the newly constructed instruction dataset named MMShip, and the results indicate that the proposed Popeye outperforms current specialist, open-vocabulary, and other visual-language models for zero-shot multi-source ship detection.
Neural Exec: Learning (and Learning from) Execution Triggers for Prompt Injection Attacks
Authors: Authors: Dario Pasquini, Martin Strohmeier, Carmela Troncoso
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2403.03792
Pdf link: https://arxiv.org/pdf/2403.03792
Abstract We introduce a new family of prompt injection attacks, termed Neural Exec. Unlike known attacks that rely on handcrafted strings (e.g., "Ignore previous instructions and..."), we show that it is possible to conceptualize the creation of execution triggers as a differentiable search problem and use learning-based methods to autonomously generate them. Our results demonstrate that a motivated adversary can forge triggers that are not only drastically more effective than current handcrafted ones but also exhibit inherent flexibility in shape, properties, and functionality. In this direction, we show that an attacker can design and generate Neural Execs capable of persisting through multi-stage preprocessing pipelines, such as in the case of Retrieval-Augmented Generation (RAG)-based applications. More critically, our findings show that attackers can produce triggers that deviate markedly in form and shape from any known attack, sidestepping existing blacklist-based detection and sanitation approaches.
Temporal Enhanced Floating Car Observers
Authors: Authors: Jeremias Gerner, Klaus Bogenberger, Stefanie Schmidtner
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2403.03825
Pdf link: https://arxiv.org/pdf/2403.03825
Abstract Floating Car Observers (FCOs) are an innovative method to collect traffic data by deploying sensor-equipped vehicles to detect and locate other vehicles. We demonstrate that even a small penetration rate of FCOs can identify a significant amount of vehicles at a given intersection. This is achieved through the emulation of detection within a microscopic traffic simulation. Additionally, leveraging data from previous moments can enhance the detection of vehicles in the current frame. Our findings indicate that, with a 20-second observation window, it is possible to recover up to 20\% of vehicles that are not visible by FCOs in the current timestep. To exploit this, we developed a data-driven strategy, utilizing sequences of Bird's Eye View (BEV) representations of detected vehicles and deep learning models. This approach aims to bring currently undetected vehicles into view in the present moment, enhancing the currently detected vehicles. Results of different spatiotemporal architectures show that up to 41\% of the vehicles can be recovered into the current timestep at their current position. This enhancement enriches the information initially available by the FCO, allowing an improved estimation of traffic states and metrics (e.g. density and queue length) for improved implementation of traffic management strategies.
Redefining cystoscopy with ai: bladder cancer diagnosis using an efficient hybrid cnn-transformer model
Authors: Authors: Meryem Amaouche, Ouassim Karrakchou, Mounir Ghogho, Anouar El Ghazzaly, Mohamed Alami, Ahmed Ameur
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2403.03879
Pdf link: https://arxiv.org/pdf/2403.03879
Abstract Bladder cancer ranks within the top 10 most diagnosed cancers worldwide and is among the most expensive cancers to treat due to the high recurrence rates which require lifetime follow-ups. The primary tool for diagnosis is cystoscopy, which heavily relies on doctors' expertise and interpretation. Therefore, annually, numerous cases are either undiagnosed or misdiagnosed and treated as urinary infections. To address this, we suggest a deep learning approach for bladder cancer detection and segmentation which combines CNNs with a lightweight positional-encoding-free transformer and dual attention gates that fuse self and spatial attention for feature enhancement. The architecture suggested in this paper is efficient making it suitable for medical scenarios that require real time inference. Experiments have proven that this model addresses the critical need for a balance between computational efficiency and diagnostic accuracy in cystoscopic imaging as despite its small size it rivals large models in performance.
DART: Implicit Doppler Tomography for Radar Novel View Synthesis
Authors: Authors: Tianshu Huang, John Miller, Akarsh Prabhakara, Tao Jin, Tarana Laroia, Zico Kolter, Anthony Rowe
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2403.03896
Pdf link: https://arxiv.org/pdf/2403.03896
Abstract Simulation is an invaluable tool for radio-frequency system designers that enables rapid prototyping of various algorithms for imaging, target detection, classification, and tracking. However, simulating realistic radar scans is a challenging task that requires an accurate model of the scene, radio frequency material properties, and a corresponding radar synthesis function. Rather than specifying these models explicitly, we propose DART - Doppler Aided Radar Tomography, a Neural Radiance Field-inspired method which uses radar-specific physics to create a reflectance and transmittance-based rendering pipeline for range-Doppler images. We then evaluate DART by constructing a custom data collection platform and collecting a novel radar dataset together with accurate position and instantaneous velocity measurements from lidar-based localization. In comparison to state-of-the-art baselines, DART synthesizes superior radar range-Doppler images from novel views across all datasets and additionally can be used to generate high quality tomographic images.
Fuzzing BusyBox: Leveraging LLM and Crash Reuse for Embedded Bug Unearthing
Authors: Authors: Asmita, Yaroslav Oliinyk, Michael Scott, Ryan Tsang, Chongzhou Fang, Houman Homayoun
Subjects: Software Engineering (cs.SE); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2403.03897
Pdf link: https://arxiv.org/pdf/2403.03897
Abstract BusyBox, an open-source software bundling over 300 essential Linux commands into a single executable, is ubiquitous in Linux-based embedded devices. Vulnerabilities in BusyBox can have far-reaching consequences, affecting a wide array of devices. This research, driven by the extensive use of BusyBox, delved into its analysis. The study revealed the prevalence of older BusyBox versions in real-world embedded products, prompting us to conduct fuzz testing on BusyBox. Fuzzing, a pivotal software testing method, aims to induce crashes that are subsequently scrutinized to uncover vulnerabilities. Within this study, we introduce two techniques to fortify software testing. The first technique enhances fuzzing by leveraging Large Language Models (LLM) to generate target-specific initial seeds. Our study showed a substantial increase in crashes when using LLM-generated initial seeds, highlighting the potential of LLM to efficiently tackle the typically labor-intensive task of generating target-specific initial seeds. The second technique involves repurposing previously acquired crash data from similar fuzzed targets before initiating fuzzing on a new target. This approach streamlines the time-consuming fuzz testing process by providing crash data directly to the new target before commencing fuzzing. We successfully identified crashes in the latest BusyBox target without conducting traditional fuzzing, emphasizing the effectiveness of LLM and crash reuse techniques in enhancing software testing and improving vulnerability detection in embedded systems. Additionally, manual triaging was performed to identify the nature of crashes in the latest BusyBox.
Challenges of Processing Data Clumps within Plugin Architectures of Integrated Development Environment
Authors: Authors: Nils Baumgartner, Elke Pulvermüller
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2403.03903
Pdf link: https://arxiv.org/pdf/2403.03903
Abstract In this study, we explore advanced strategies for enhancing software quality by detecting and refactoring data clumps, special types of code smells. Our approach transcends the capabilities of integrated development environments, utilizing a novel method that separates the detection of data clumps from the source access. This method facilitates data clump processing. We introduce a command-line interface plugin to support this novel method of processing data clumps. This research highlights the efficacy of modularized algorithms and advocates their integration into continuous workflows, promising enhanced code quality and efficient project management across various programming and integrated development environments.
Keyword: face recognition

There is no result

Keyword: augmentation

Mad Libs Are All You Need: Augmenting Cross-Domain Document-Level Event Argument Data
Authors: Authors: Joseph Gatto, Parker Seegmiller, Omar Sharif, Sarah M. Preum
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2403.03304
Pdf link: https://arxiv.org/pdf/2403.03304
Abstract Document-Level Event Argument Extraction (DocEAE) is an extremely difficult information extraction problem -- with significant limitations in low-resource cross-domain settings. To address this problem, we introduce Mad Lib Aug (MLA), a novel generative DocEAE data augmentation framework. Our approach leverages the intuition that Mad Libs, which are categorically masked documents used as a part of a popular game, can be generated and solved by LLMs to produce data for DocEAE. Using MLA, we achieve a 2.6-point average improvement in overall F1 score. Moreover, this approach achieves a 3.9 and 5.2 point average increase in zero and few-shot event roles compared to augmentation-free baselines across all experiments. To better facilitate analysis of cross-domain DocEAE, we additionally introduce a new metric, Role-Depth F1 (RDF1), which uses statistical depth to identify roles in the target domain which are semantic outliers with respect to roles observed in the source domain. Our experiments show that MLA augmentation can boost RDF1 performance by an average of 5.85 points compared to non-augmented datasets.
Task Attribute Distance for Few-Shot Learning: Theoretical Analysis and Applications
Authors: Authors: Minyang Hu, Hong Chang, Zong Guo, Bingpeng Ma, Shiguan Shan, Xilin Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2403.03535
Pdf link: https://arxiv.org/pdf/2403.03535
Abstract Few-shot learning (FSL) aims to learn novel tasks with very few labeled samples by leveraging experience from \emph{related} training tasks. In this paper, we try to understand FSL by delving into two key questions: (1) How to quantify the relationship between \emph{training} and \emph{novel} tasks? (2) How does the relationship affect the \emph{adaptation difficulty} on novel tasks for different models? To answer the two questions, we introduce Task Attribute Distance (TAD) built upon attributes as a metric to quantify the task relatedness. Unlike many existing metrics, TAD is model-agnostic, making it applicable to different FSL models. Then, we utilize TAD metric to establish a theoretical connection between task relatedness and task adaptation difficulty. By deriving the generalization error bound on a novel task, we discover how TAD measures the adaptation difficulty on novel tasks for FSL models. To validate our TAD metric and theoretical findings, we conduct experiments on three benchmarks. Our experimental results confirm that TAD metric effectively quantifies the task relatedness and reflects the adaptation difficulty on novel tasks for various FSL methods, even if some of them do not learn attributes explicitly or human-annotated attributes are not available. Finally, we present two applications of the proposed TAD metric: data augmentation and test-time intervention, which further verify its effectiveness and general applicability. The source code is available at https://github.com/hu-my/TaskAttributeDistance.
MolNexTR: A Generalized Deep Learning Model for Molecular Image Recognition
Authors: Authors: Yufan Chen, Ching Ting Leung, Yong Huang, Jianwei Sun, Hao Chen, Hanyu Gao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2403.03691
Pdf link: https://arxiv.org/pdf/2403.03691
Abstract In the field of chemical structure recognition, the task of converting molecular images into graph structures and SMILES string stands as a significant challenge, primarily due to the varied drawing styles and conventions prevalent in chemical literature. To bridge this gap, we proposed MolNexTR, a novel image-to-graph deep learning model that collaborates to fuse the strengths of ConvNext, a powerful Convolutional Neural Network variant, and Vision-TRansformer. This integration facilitates a more nuanced extraction of both local and global features from molecular images. MolNexTR can predict atoms and bonds simultaneously and understand their layout rules. It also excels at flexibly integrating symbolic chemistry principles to discern chirality and decipher abbreviated structures. We further incorporate a series of advanced algorithms, including improved data augmentation module, image contamination module, and a post-processing module to get the final SMILES output. These modules synergistically enhance the model's robustness against the diverse styles of molecular imagery found in real literature. In our test sets, MolNexTR has demonstrated superior performance, achieving an accuracy rate of 81-97%, marking a significant advancement in the domain of molecular structure recognition. Scientific contribution: MolNexTR is a novel image-to-graph model that incorporates a unique dual-stream encoder to extract complex molecular image features, and combines chemical rules to predict atoms and bonds while understanding atom and bond layout rules. In addition, it employs a series of novel augmentation algorithms to significantly enhance the robustness and performance of the model.
Does Documentation Matter? An Empirical Study of Practitioners' Perspective on Open-Source Software Adoption
Authors: Authors: Aaron Imani, Shiva Radmanesh, Iftekhar Ahmed, Mohammad Moshirpour
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2403.03819
Pdf link: https://arxiv.org/pdf/2403.03819
Abstract In recent years, open-source software (OSS) has become increasingly prevalent in developing software products. While OSS documentation is the primary source of information provided by the developers' community about a product, its role in the industry's adoption process has yet to be examined. We conducted semi-structured interviews and an online survey to provide insight into this area. Based on interviews and survey insights, we developed a topic model to collect relevant information from OSS documentation automatically. Additionally, according to our survey responses regarding challenges associated with OSS documentation, we propose a novel information augmentation approach, DocMentor, by combining OSS documentation corpus TF-IDF scores and ChatGPT. Through explaining technical terms and providing examples and references, our approach enhances the documentation context and improves practitioners' understanding. Our tool's effectiveness is assessed by surveying practitioners.
ECAP: Extensive Cut-and-Paste Augmentation for Unsupervised Domain Adaptive Semantic Segmentation
Authors: Authors: Erik Brorsson, Knut Åkesson, Lennart Svensson, Kristofer Bengtsson
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2403.03854
Pdf link: https://arxiv.org/pdf/2403.03854
Abstract We consider unsupervised domain adaptation (UDA) for semantic segmentation in which the model is trained on a labeled source dataset and adapted to an unlabeled target dataset. Unfortunately, current self-training methods are susceptible to misclassified pseudo-labels resulting from erroneous predictions. Since certain classes are typically associated with less reliable predictions in UDA, reducing the impact of such pseudo-labels without skewing the training towards some classes is notoriously difficult. To this end, we propose an extensive cut-and-paste strategy (ECAP) to leverage reliable pseudo-labels through data augmentation. Specifically, ECAP maintains a memory bank of pseudo-labeled target samples throughout training and cut-and-pastes the most confident ones onto the current training batch. We implement ECAP on top of the recent method MIC and boost its performance on two synthetic-to-real domain adaptation benchmarks. Notably, MIC+ECAP reaches an unprecedented performance of 69.1 mIoU on the Synthia->Cityscapes benchmark. Our code is available at https://github.com/ErikBrorsson/ECAP.
IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators
Authors: Authors: Indraneil Paul, Jun Luo, Goran Glavaš, Iryna Gurevych
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Programming Languages (cs.PL)
Arxiv link: https://arxiv.org/abs/2403.03894
Pdf link: https://arxiv.org/pdf/2403.03894
Abstract Code understanding and generation have fast become some of the most popular applications of language models (LMs). Nonetheless, research on multilingual aspects of Code-LMs (i.e., LMs for code generation) such as cross-lingual transfer between different programming languages, language-specific data augmentation, and post-hoc LM adaptation, alongside exploitation of data sources other than the original textual content, has been much sparser than for their natural language counterparts. In particular, most mainstream Code-LMs have been pre-trained on source code files alone. In this work, we investigate the prospect of leveraging readily available compiler intermediate representations - shared across programming languages - to improve the multilingual capabilities of Code-LMs and facilitate cross-lingual transfer. To this end, we first compile SLTrans, a parallel dataset consisting of nearly 4M self-contained source code files coupled with respective intermediate representations. Next, starting from various base Code-LMs (ranging in size from 1.1B to 7.3B parameters), we carry out continued causal language modelling training on SLTrans, forcing the Code-LMs to (1) learn the IR language and (2) align the IR constructs with respective constructs of various programming languages. Our resulting models, dubbed IRCoder, display sizeable and consistent gains across a wide variety of code generation tasks and metrics, including prompt robustness, multilingual code completion, code understanding, and instruction following.

LeeKyungwook / get-arxiv-noti

New submissions for Thu, 7 Mar 24 #1009

Keyword: detection

TTPXHunter: Actionable Threat Intelligence Extraction as TTPs form Finished Cyber Threat Reports

Beyond the Dashboard: Investigating Distracted Driver Communication Preferences for ADAS

DIVERSE: Deciphering Internet Views on the U.S. Military Through Video Comment Stance Analysis, A Novel Benchmark Dataset for Stance Classification

Scope of Large Language Models for Mining Emerging Opinions in Online Health Discourse

Enhancing Vision-Language Pre-training with Rich Supervisions

Leveraging Federated Learning for Automatic Detection of Clopidogrel Treatment Failures

Performance Evaluation of Semi-supervised Learning Frameworks for Multi-Class Weed Detection

Contrastive Learning of Person-independent Representations for Facial Action Unit Detection

Advancing Out-of-Distribution Detection through Data Purification and Dynamic Activation Function Design

FLAME Diffuser: Grounded Wildfire Image Synthesis using Mask Guided Diffusion

Multi-task Learning for Real-time Autonomous Driving Leveraging Task-adaptive Attention Generator

Global Geolocated Realtime Data of Interfleet Urban Transit Bus Idling

Towards Detecting AI-Generated Text within Human-AI Collaborative Hybrid Texts

Non-verbal information in spontaneous speech - towards a new framework of analysis

RADIA -- Radio Advertisement Detection with Intelligent Analytics

Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem

Multimodal Anomaly Detection based on Deep Auto-Encoder for Object Slip Perception of Mobile Manipulation Robots

Unsupervised Incremental Learning with Dual Concept Drift Detection for Identifying Anomalous Sequences

Enhancing ASD detection accuracy: a combined approach of machine learning and deep learning models with natural language processing

DeepEclipse: How to Break White-Box DNN-Watermarking Schemes

Spectrum Occupancy Detection Supported by Federated Learning

Processing Load Allocation of On-Board Multi-User Detection for Payload-Constrained Satellite Networks

Portraying the Need for Temporal Data in Flood Detection via Sentinel-1

Adversarial Infrared Geometry: Using Geometry to Perform Adversarial Attack against Infrared Pedestrian Detectors

Quantifying Media Influence on Covid-19 Mask-Wearing Beliefs

CMDA: Cross-Modal and Domain Adversarial Adaptation for LiDAR-Based 3D Object Detection

German also Hallucinates! Inconsistency Detection in News Summaries with the Absinth Dataset

Popeye: A Unified Visual-Language Model for Multi-Source Ship Detection from Remote Sensing Imagery

Neural Exec: Learning (and Learning from) Execution Triggers for Prompt Injection Attacks

Temporal Enhanced Floating Car Observers

Redefining cystoscopy with ai: bladder cancer diagnosis using an efficient hybrid cnn-transformer model

DART: Implicit Doppler Tomography for Radar Novel View Synthesis

Fuzzing BusyBox: Leveraging LLM and Crash Reuse for Embedded Bug Unearthing

Challenges of Processing Data Clumps within Plugin Architectures of Integrated Development Environment

Keyword: face recognition

Keyword: augmentation

Mad Libs Are All You Need: Augmenting Cross-Domain Document-Level Event Argument Data

Task Attribute Distance for Few-Shot Learning: Theoretical Analysis and Applications

MolNexTR: A Generalized Deep Learning Model for Molecular Image Recognition

Does Documentation Matter? An Empirical Study of Practitioners' Perspective on Open-Source Software Adoption

ECAP: Extensive Cut-and-Paste Augmentation for Unsupervised Domain Adaptive Semantic Segmentation

IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators