New submissions for Wed, 31 Jan 24

Keyword: detection

FaKnow: A Unified Library for Fake News Detection

Authors: Yiyuan Zhu, Yongjun Li, Jialiang Wang, Ming Gao, Jiali Wei
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2401.16441
Pdf link: https://arxiv.org/pdf/2401.16441
Abstract Over the past years, a large number of fake news detection algorithms based on deep learning have emerged. However, they are often developed under different frameworks, each mandating distinct utilization methodologies, consequently hindering reproducibility. Additionally, a substantial amount of redundancy characterizes the code development of such fake news detection models. To address these concerns, we propose FaKnow, a unified and comprehensive fake news detection algorithm library. It encompasses a variety of widely used fake news detection models, categorized as content-based and social context-based approaches. This library covers the full spectrum of the model training and evaluation process, effectively organizing the data, models, and training procedures within a unified framework. Furthermore, it furnishes a series of auxiliary functionalities and tools, including visualization and logging. Our work contributes to the standardization and unification of fake news detection research, concurrently facilitating the endeavors of researchers in this field. The open-source code and documentation can be accessed at https://github.com/NPURG/FaKnow and https://faknow.readthedocs.io, respectively.
Evaluating Deep Networks for Detecting User Familiarity with VR from Hand Interactions
Authors: Mingjun Li, Numan Zafar, Natasha Kholgade Banerjee, Sean Banerjee
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.16443
Pdf link: https://arxiv.org/pdf/2401.16443
Abstract As VR devices become more prevalent in the consumer space, VR applications are likely to be increasingly used by users unfamiliar with VR. Detecting the familiarity level of a user with VR as an interaction medium provides the potential of providing on-demand training for acclimatization and prevents the user from being burdened by the VR environment in accomplishing their tasks. In this work, we present preliminary results of using deep classifiers to conduct automatic detection of familiarity with VR by using hand tracking of the user as they interact with a numeric passcode entry panel to unlock a VR door. We use a VR door as we envision it to be the first point of entry to collaborative virtual spaces, such as meeting rooms, offices, or clinics. Users who are unfamiliar with VR will have used their hands to open doors with passcode entry panels in the real world. Thus, while the user may not be familiar with VR, they would be familiar with the task of opening the door. Using a pilot dataset consisting of 7 users familiar with VR, and 7 not familiar with VR, we achieve a highest accuracy of 88.03% when 6 test users, 3 familiar and 3 not familiar, are evaluated with classifiers trained using data from the remaining 8 users. Our results indicate potential for using user movement data to detect familiarity for the simple yet important task of secure passcode-based access.
SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design
Authors: Seokju Yun, Youngmin Ro
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.16456
Pdf link: https://arxiv.org/pdf/2401.16456
Abstract Recently, efficient Vision Transformers have shown great performance with low latency on resource-constrained devices. Conventionally, they use 4x4 patch embeddings and a 4-stage structure at the macro level, while utilizing sophisticated attention with multi-head configuration at the micro level. This paper aims to address computational redundancy at all design levels in a memory-efficient manner. We discover that using larger-stride patchify stem not only reduces memory access costs but also achieves competitive performance by leveraging token representations with reduced spatial redundancy from the early stages. Furthermore, our preliminary analyses suggest that attention layers in the early stages can be substituted with convolutions, and several attention heads in the latter stages are computationally redundant. To handle this, we introduce a single-head attention module that inherently prevents head redundancy and simultaneously boosts accuracy by parallelly combining global and local information. Building upon our solutions, we introduce SHViT, a Single-Head Vision Transformer that obtains the state-of-the-art speed-accuracy tradeoff. For example, on ImageNet-1k, our SHViT-S4 is 3.3x, 8.1x, and 2.4x faster than MobileViTv2 x1.0 on GPU, CPU, and iPhone12 mobile device, respectively, while being 1.3% more accurate. For object detection and instance segmentation on MS COCO using Mask-RCNN head, our model achieves performance comparable to FastViT-SA12 while exhibiting 3.8x and 2.0x lower backbone latency on GPU and mobile device, respectively.
Error detection using pneumatic logic
Authors: Shane Hoang, Mabel Shehada, Zinal Patel, Minh-Huy Tran, Konstantinos Karydis, Philip Brisk, William H. Grover
Subjects: Emerging Technologies (cs.ET); Instrumentation and Detectors (physics.ins-det)
Arxiv link: https://arxiv.org/abs/2401.16500
Pdf link: https://arxiv.org/pdf/2401.16500
Abstract Pneumatic systems are common in manufacturing, healthcare, transportation, robotics, and many other fields. Failures in these systems can have very serious consequences, particularly if they go undetected. In this work, we present an air-powered error detector device that can detect and respond to failures in pneumatically actuated systems. The device contains 21 monolithic membrane valves that act like transistors in a pneumatic logic "circuit" that uses vacuum to represent TRUE and atmospheric pressure as FALSE. Three pneumatic exclusive-OR (XOR) gates are used to calculate the parity bit corresponding to the values of several control bits. If the calculated value of the parity bit differs from the expected value, then an error (like a leak or a blocked air line) has been detected and the device outputs a pneumatic error signal which can in turn be used to alert a user, shut down the system, or take some other action. As a proof-of-concept, we used our pneumatic error detector to monitor the operation of a medical device, an intermittent pneumatic compression (IPC) device commonly used to prevent the formation of life-threatening blood clots in the wearer's legs. Experiments confirm that when the IPC device was damaged, the pneumatic error detector immediately recognized the error (a leak) and alerted the wearer using sound. By providing a simple and low-cost way to add fault detection to pneumatic actuation systems without using sensors, our pneumatic error detector can promote safety and reliability across the wide range of pneumatic systems.
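The parity check described in this abstract can be illustrated in software. The following is a minimal sketch, not the authors' pneumatic circuit: the abstract does not state how many control bits are monitored or how the three XOR gates are wired, so the three-bit example and the function names `parity_bit` and `error_detected` are assumptions made for illustration only.

```python
from functools import reduce
from operator import xor


def parity_bit(control_bits):
    """Chain XOR over all control bits; the result is 1 when an odd number of bits are 1."""
    return reduce(xor, control_bits, 0)


def error_detected(control_bits, expected_parity):
    """Report an error when the calculated parity differs from the expected value."""
    return parity_bit(control_bits) != expected_parity


# Hypothetical control word: 1 stands for TRUE (vacuum), 0 for FALSE (atmospheric pressure).
intended = [1, 0, 1]
expected = parity_bit(intended)

# Healthy system: the observed bits match the intended ones, so no error is raised.
healthy_ok = not error_detected(intended, expected)

# Simulated leak: one line that should hold vacuum reads as atmospheric pressure.
leaky = [1, 0, 0]
leak_flagged = error_detected(leaky, expected)
```

A single parity bit flags any odd number of flipped bits; two simultaneous flips cancel out and would pass unnoticed, which is a general property of parity checks rather than a statement about the device in the paper.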
Saccade-Contingent Rendering
Authors: Yuna Kwak, Eric Penner, Xuan Wang, Mohammad R. Saeedpour-Parizi, Olivier Mercier, Xiuyun Wu, T. Scott Murdison, Phillip Guan
Subjects: Graphics (cs.GR); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2401.16536
Pdf link: https://arxiv.org/pdf/2401.16536
Abstract Battery-constrained power consumption, compute limitations, and high frame rate requirements in head-mounted displays present unique challenges in the drive to present increasingly immersive and comfortable imagery in virtual reality. However, humans are not equally sensitive to all regions of the visual field, and perceptually-optimized rendering techniques are increasingly utilized to address these bottlenecks. Many of these techniques are gaze-contingent and often render reduced detail away from a user's fixation. Such techniques are dependent on spatio-temporally-accurate gaze tracking and can result in obvious visual artifacts when eye tracking is inaccurate. In this work we present a gaze-contingent rendering technique which only requires saccade detection, bypassing the need for highly-accurate eye tracking. In our first experiment, we show that visual acuity is reduced for several hundred milliseconds after a saccade. In our second experiment, we use these results to reduce the rendered image resolution after saccades in a controlled psychophysical setup, and find that observers cannot discriminate between saccade-contingent reduced-resolution rendering and full-resolution rendering. Finally, in our third experiment, we introduce a 90 pixels per degree headset and validate our saccade-contingent rendering method under typical VR viewing conditions.
The Why, When, and How to Use Active Learning in Large-Data-Driven 3D Object Detection for Safe Autonomous Driving: An Empirical Exploration
Authors: Ross Greer, Bjørk Antoniussen, Mathias V. Andersen, Andreas Møgelmose, Mohan M. Trivedi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.16634
Pdf link: https://arxiv.org/pdf/2401.16634
Abstract Active learning strategies for 3D object detection in autonomous driving datasets may help to address challenges of data imbalance, redundancy, and high-dimensional data. We demonstrate the effectiveness of entropy querying to select informative samples, aiming to reduce annotation costs and improve model performance. We experiment using the BEVFusion model for 3D object detection on the nuScenes dataset, comparing active learning to random sampling and demonstrating that entropy querying outperforms in most cases. The method is particularly effective in reducing the performance gap between majority and minority classes. Class-specific analysis reveals efficient allocation of annotated resources for limited data budgets, emphasizing the importance of selecting diverse and informative data for model training. Our findings suggest that entropy querying is a promising strategy for selecting data that enhances model learning in resource-constrained environments.
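Entropy querying, as named in this abstract, ranks unlabeled samples by the Shannon entropy of the model's predicted class distribution and sends the most uncertain ones for annotation. The sketch below assumes one class-probability vector per sample; how the paper aggregates entropy over the many detections in a nuScenes scene is not stated in the abstract, and the sample IDs and the helper `select_by_entropy` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_by_entropy(pool, budget):
    """Return the IDs of the `budget` samples whose predictions are most uncertain."""
    ranked = sorted(pool, key=lambda sample_id: entropy(pool[sample_id]), reverse=True)
    return ranked[:budget]


# Hypothetical unlabeled pool: sample ID -> predicted class probabilities.
pool = {
    "scene_a": [0.98, 0.01, 0.01],  # confident prediction, low entropy
    "scene_b": [0.34, 0.33, 0.33],  # near-uniform prediction, high entropy
    "scene_c": [0.70, 0.20, 0.10],  # moderately uncertain
}

# With an annotation budget of two, the two most uncertain scenes are queried.
queried = select_by_entropy(pool, budget=2)
```

Random sampling, the baseline mentioned in the abstract, would replace the sort with a uniform draw from the pool.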
Generalization of LiNGAM that allows confounding
Authors: Joe Suzuki, Tian-Le Yang
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Statistics Theory (math.ST)
Arxiv link: https://arxiv.org/abs/2401.16661
Pdf link: https://arxiv.org/pdf/2401.16661
Abstract LiNGAM determines the variable order from cause to effect using additive noise models, but it faces challenges with confounding. Previous methods maintained LiNGAM's fundamental structure while trying to identify and address variables affected by confounding. As a result, these methods required significant computational resources regardless of the presence of confounding, and they did not ensure the detection of all confounding types. In contrast, this paper enhances LiNGAM by introducing LiNGAM-MMI, a method that quantifies the magnitude of confounding using KL divergence and arranges the variables to minimize its impact. This method efficiently achieves a globally optimal variable order through the shortest path problem formulation. LiNGAM-MMI processes data as efficiently as traditional LiNGAM in scenarios without confounding while effectively addressing confounding situations. Our experimental results suggest that LiNGAM-MMI more accurately determines the correct variable order, both in the presence and absence of confounding.
The Detection and Understanding of Fictional Discourse
Authors: Andrew Piper, Haiqi Zhou
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.16678
Pdf link: https://arxiv.org/pdf/2401.16678
Abstract In this paper, we present a variety of classification experiments related to the task of fictional discourse detection. We utilize a diverse array of datasets, including contemporary professionally published fiction, historical fiction from the Hathi Trust, fanfiction, stories from Reddit, folk tales, GPT-generated stories, and anglophone world literature. Additionally, we introduce a new feature set of word "supersenses" that facilitate the goal of semantic generalization. The detection of fictional discourse can help enrich our knowledge of large cultural heritage archives and assist with the process of understanding the distinctive qualities of fictional storytelling more broadly.
Characterization of Magnetic Labyrinthine Structures through Junctions and Terminals Detection using Template Matching and CNN
Authors: Vinícius Yu Okubo, Kotaro Shimizu, B. S. Shivaram, Hae Yong Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.16688
Pdf link: https://arxiv.org/pdf/2401.16688
Abstract In material sciences, characterizing faults in periodic structures is vital for understanding material properties. To characterize magnetic labyrinthine patterns, it is necessary to accurately identify junctions and terminals, often featuring over a thousand closely packed defects per image. This study introduces a new technique called TM-CNN (Template Matching - Convolutional Neural Network) designed to detect a multitude of small objects in images, such as defects in magnetic labyrinthine patterns. TM-CNN was used to identify these structures in 444 experimental images, and the results were explored to deepen the understanding of magnetic materials. It employs a two-stage detection approach combining template matching, used in initial detection, with a convolutional neural network, used to eliminate incorrect identifications. To train a CNN classifier, it is necessary to create a large number of training images. This difficulty prevents the use of CNN in many practical applications. TM-CNN significantly reduces the manual workload for creating training images by automatically making most of the annotations and leaving only a small number of corrections to human reviewers. In testing, TM-CNN achieved an impressive F1 score of 0.988, far outperforming traditional template matching and CNN-based object detection algorithms.
Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers
Authors: Jianbin Jiao, Xina Cheng, Weijie Chen, Xiaoting Yin, Hao Shi, Kailun Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2401.16700
Pdf link: https://arxiv.org/pdf/2401.16700
Abstract 3D human pose estimation captures the human joint points in three-dimensional space while keeping the depth information and physical structure. That is essential for applications that require precise pose information, such as human-computer interaction, scene understanding, and rehabilitation training. Due to the challenges in data collection, mainstream datasets of 3D human pose estimation are primarily composed of multi-view video data collected in laboratory environments, which contains rich spatial-temporal correlation information besides the image frame content. Given the remarkable self-attention mechanism of transformers, capable of capturing the spatial-temporal correlation from multi-view video datasets, we propose a multi-stage framework for 3D sequence-to-sequence (seq2seq) human pose detection. Firstly, the spatial module represents the human pose feature by intra-image content, while the frame-image relation module extracts temporal relationships and 3D spatial positional relationship features between the multi-perspective images. Secondly, the self-attention mechanism is adopted to eliminate the interference from non-human body parts and reduce computing resources. Our method is evaluated on Human3.6M, a popular 3D human pose detection dataset. Experimental results demonstrate that our approach achieves state-of-the-art performance on this dataset.
LF Tracy: A Unified Single-Pipeline Approach for Salient Object Detection in Light Field Cameras
Authors: Fei Teng, Jiaming Zhang, Jiawei Liu, Kunyu Peng, Xina Cheng, Zhiyong Li, Kailun Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2401.16712
Pdf link: https://arxiv.org/pdf/2401.16712
Abstract Leveraging the rich information extracted from light field (LF) cameras is instrumental for dense prediction tasks. However, adapting light field data to enhance Salient Object Detection (SOD) still follows the traditional RGB methods and remains under-explored in the community. Previous approaches predominantly employ a custom two-stream design to discover the implicit angular feature within light field cameras, leading to significant information isolation between different LF representations. In this study, we propose an efficient paradigm (LF Tracy) to address this limitation. We eschew the conventional specialized fusion and decoder architecture for a dual-stream backbone in favor of a unified, single-pipeline approach. This comprises firstly a simple yet effective data augmentation strategy called MixLD to bridge the connection of spatial, depth, and implicit angular information under different LF representations. A highly efficient information aggregation (IA) module is then introduced to boost asymmetric feature-wise information fusion. Owing to this innovative approach, our model surpasses the existing state-of-the-art methods, particularly demonstrating a 23% improvement over previous results on the latest large-scale PKU dataset. By utilizing only 28.9M parameters, the model achieves a 10% increase in accuracy with 3M additional parameters compared to its backbone using RGB images and an 86% rise to its backbone using LF images. The source code will be made publicly available at https://github.com/FeiBryantkit/LF-Tracy.
Recent Advances in Hate Speech Moderation: Multimodality and the Role of Large Models
Authors: Ming Shan Hee, Shivam Sharma, Rui Cao, Palash Nandi, Preslav Nakov, Tanmoy Chakraborty, Roy Ka-Wei Lee
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2401.16727
Pdf link: https://arxiv.org/pdf/2401.16727
Abstract In the evolving landscape of online communication, moderating hate speech (HS) presents an intricate challenge, compounded by the multimodal nature of digital content. This comprehensive survey delves into the recent strides in HS moderation, spotlighting the burgeoning role of large language models (LLMs) and large multimodal models (LMMs). Our exploration begins with a thorough analysis of current literature, revealing the nuanced interplay between textual, visual, and auditory elements in propagating HS. We uncover a notable trend towards integrating these modalities, primarily due to the complexity and subtlety with which HS is disseminated. A significant emphasis is placed on the advances facilitated by LLMs and LMMs, which have begun to redefine the boundaries of detection and moderation capabilities. We identify existing gaps in research, particularly in the context of underrepresented languages and cultures, and the need for solutions to handle low-resource settings. The survey concludes with a forward-looking perspective, outlining potential avenues for future research, including the exploration of novel AI methodologies, the ethical governance of AI in moderation, and the development of more nuanced, context-aware systems. This comprehensive overview aims to catalyze further research and foster a collaborative effort towards more sophisticated, responsible, and human-centric approaches to HS moderation in the digital era. WARNING: This paper contains offensive examples.
Detecting Racist Text in Bengali: An Ensemble Deep Learning Framework
Authors: S. S. Saruar, Nusrat, Sadia
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2401.16748
Pdf link: https://arxiv.org/pdf/2401.16748
Abstract Racism is an alarming phenomenon in our country as well as all over the world. Every day we come across racist comments in our daily and virtual lives. However, this racism can be eradicated from virtual life (such as social media). In this paper, we have tried to detect those racist comments with NLP and deep learning techniques. We have built a novel dataset in the Bengali language. Further, we annotated the dataset and conducted data label validation. After extensive utilization of deep learning methodologies, we have successfully achieved text detection with an accuracy rate of 87.94% using the Ensemble approach. We have applied RNN and LSTM models using BERT embeddings. However, the MCNN-LSTM model performed highest among all those models. Lastly, the Ensemble approach has been followed to combine all the model results to increase overall performance.
Detection and Recovery Against Deep Neural Network Fault Injection Attacks Based on Contrastive Learning
Authors: Chenan Wang, Pu Zhao, Siyue Wang, Xue Lin
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.16766
Pdf link: https://arxiv.org/pdf/2401.16766
Abstract Deep Neural Network (DNN) models when implemented on executing devices as the inference engines are susceptible to Fault Injection Attacks (FIAs) that manipulate model parameters to disrupt inference execution with disastrous performance. This work introduces Contrastive Learning (CL) of visual representations i.e., a self-supervised learning approach into the deep learning training and inference pipeline to implement DNN inference engines with self-resilience under FIAs. Our proposed CL based FIA Detection and Recovery (CFDR) framework features (i) real-time detection with only a single batch of testing data and (ii) fast recovery effective even with only a small amount of unlabeled testing data. Evaluated with the CIFAR-10 dataset on multiple types of FIAs, our CFDR shows promising detection and recovery effectiveness.
Activity Detection for Massive Connectivity in Cell-free Networks with Unknown Large-scale Fading, Channel Statistics, Noise Variance, and Activity Probability: A Bayesian Approach
Authors: Hao Zhang, Qingfeng Lin, Yang Li, Lei Cheng, Yik-Chung Wu
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2401.16775
Pdf link: https://arxiv.org/pdf/2401.16775
Abstract Activity detection is an important task in the next generation grant-free multiple access. While there are a number of existing algorithms designed for this purpose, they mostly require precise information about the network, such as large-scale fading coefficients, small-scale fading channel statistics, noise variance at the access points, and user activity probability. Acquiring this information would incur significant overhead, and the estimated values might not be accurate. This problem is even more severe in cell-free networks as there are many of these parameters to be acquired. Therefore, this paper sets out to investigate the activity detection problem without the above-mentioned information. In order to handle so many unknown parameters, this paper employs the Bayesian approach, where the unknown variables are endowed with prior distributions which effectively act as regularizations. Together with the likelihood function, a maximum a posteriori (MAP) estimator and a variational inference algorithm are derived. Extensive simulations demonstrate that the proposed methods, even without the knowledge of these system parameters, perform better than existing state-of-the-art methods, such as covariance-based and approximate message passing methods.
Detecting LLM-Assisted Writing in Scientific Communication: Are We There Yet?
Authors: Teddy Lazebnik, Ariel Rosenfeld
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.16807
Pdf link: https://arxiv.org/pdf/2401.16807
Abstract Large Language Models (LLMs), exemplified by ChatGPT, have significantly reshaped text generation, particularly in the realm of writing assistance. While ethical considerations underscore the importance of transparently acknowledging LLM use, especially in scientific communication, genuine acknowledgment remains infrequent. A potential avenue to encourage accurate acknowledging of LLM-assisted writing involves employing automated detectors. Our evaluation of four cutting-edge LLM-generated text detectors reveals their suboptimal performance compared to a simple ad-hoc detector designed to identify abrupt writing style changes around the time of LLM proliferation. We contend that the development of specialized detectors exclusively dedicated to LLM-assisted writing detection is necessary. Such detectors could play a crucial role in fostering more authentic recognition of LLM involvement in scientific communication, addressing the current challenges in acknowledgment practices.
Provably Robust Multi-bit Watermarking for AI-generated Text via Error Correction Code
Authors: Wenjie Qu, Dong Yin, Zixin He, Wei Zou, Tianyang Tao, Jinyuan Jia, Jiaheng Zhang
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2401.16820
Pdf link: https://arxiv.org/pdf/2401.16820
Abstract Large Language Models (LLMs) have been widely deployed for their remarkable capability to generate texts resembling human language. However, they could be misused by criminals to create deceptive content, such as fake news and phishing emails, which raises ethical concerns. Watermarking is a key technique to mitigate the misuse of LLMs, which embeds a watermark (e.g., a bit string) into a text generated by an LLM. Consequently, this enables the detection of texts generated by an LLM as well as the tracing of generated texts to a specific user. The major limitation of existing watermark techniques is that they cannot accurately or efficiently extract the watermark from a text, especially when the watermark is a long bit string. This key limitation impedes their deployment for real-world applications, e.g., tracing generated texts to a specific user. This work introduces a novel watermarking method for LLM-generated text grounded in error-correction codes to address this challenge. We provide strong theoretical analysis, demonstrating that under bounded adversarial word/token edits (insertion, deletion, and substitution), our method can correctly extract watermarks, offering a provable robustness guarantee. This breakthrough is also evidenced by our extensive experimental results. The experiments show that our method substantially outperforms existing baselines in both accuracy and robustness on benchmark datasets. For instance, when embedding a bit string of length 12 into a 200-token generated text, our approach attains an impressive match rate of 98.4%, surpassing the performance of Yoo et al. (state-of-the-art baseline) at 85.6%. When subjected to a copy-paste attack involving the injection of 50 tokens to generated texts with 200 words, our method maintains a substantial match rate of 90.8%, while the match rate of Yoo et al. diminishes to below 65%.
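To make the role of an error-correction code concrete, here is a minimal sketch that protects a 12-bit string against bit flips. The abstract does not say which code the authors use or how bits are embedded into tokens, so a textbook Hamming(7,4) code is swapped in purely for illustration, and all function names are hypothetical. It only handles substituted bits (at most one per 7-bit block), not the insertions and deletions the paper also covers.

```python
def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit codeword laid out as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the flipped bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


def encode_message(bits):
    """Split the message into 4-bit blocks and encode each block separately."""
    if len(bits) % 4 != 0:
        raise ValueError("message length must be a multiple of 4")
    encoded = []
    for i in range(0, len(bits), 4):
        encoded.extend(hamming74_encode(bits[i:i + 4]))
    return encoded


def decode_message(code):
    """Decode consecutive 7-bit blocks back into the original message bits."""
    decoded = []
    for i in range(0, len(code), 7):
        decoded.extend(hamming74_decode(code[i:i + 7]))
    return decoded


# Hypothetical 12-bit watermark, matching the bit-string length quoted in the abstract.
message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
sent = encode_message(message)

# Simulate an adversarial edit that flips one bit in each of the first two blocks.
received = list(sent)
received[4] ^= 1
received[9] ^= 1

recovered = decode_message(received)
```

The redundancy is what buys the robustness: 12 message bits become 21 embedded bits, and in exchange each block tolerates one corrupted bit.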
Evaluating ML-Based Anomaly Detection Across Datasets of Varied Integrity: A Case Study
Authors: Adrian Pekar, Richard Jozsa
Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2401.16843
Pdf link: https://arxiv.org/pdf/2401.16843
Abstract Cybersecurity remains a critical challenge in the digital age, with network traffic flow anomaly detection being a pivotal instrument in the fight against cyber threats. In this study, we address the prevalent issue of data integrity in network traffic datasets, which are instrumental in developing machine learning (ML) models for anomaly detection. We introduce two refined versions of the CICIDS-2017 dataset, NFS-2023-nTE and NFS-2023-TE, processed using NFStream to ensure methodologically sound flow expiration and labeling. Our research contrasts the performance of the Random Forest (RF) algorithm across the original CICIDS-2017, its refined counterparts WTMC-2021 and CRiSIS-2022, and our NFStream-generated datasets, in both binary and multi-class classification contexts. We observe that the RF model exhibits exceptional robustness, achieving consistent high-performance metrics irrespective of the underlying dataset quality, which prompts a critical discussion on the actual impact of data integrity on ML efficacy. Our study underscores the importance of continual refinement and methodological rigor in dataset generation for network security research. As the landscape of network threats evolves, so must the tools and techniques used to detect and analyze them.
Segmentation and Characterization of Macerated Fibers and Vessels Using Deep Learning
Authors: Saqib Qamar, Abu Imran Baba, Stéphane Verger, Magnus Andersson
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.16937
Pdf link: https://arxiv.org/pdf/2401.16937
Abstract Purpose: Wood comprises different cell types, such as fibers and vessels, defining its properties. Studying their shape, size, and arrangement in microscopic images is crucial for understanding wood samples. Typically, this involves macerating (soaking) samples in a solution to separate cells, then spreading them on slides for imaging with a microscope that covers a wide area, capturing thousands of cells. However, these cells often cluster and overlap in images, making the segmentation difficult and time-consuming using standard image-processing methods. Results: In this work, we develop an automatic deep learning segmentation approach that utilizes the one-stage YOLOv8 model for fast and accurate fiber and vessel segmentation and characterization in microscopy images. The model can analyze images of 32640 x 25920 pixels and demonstrates effective cell detection and segmentation, achieving a mAP_0.5-0.95 of 78%. To assess the model's robustness, we examined fibers from a genetically modified tree line known for longer fibers. The outcomes were comparable to previous manual measurements. Additionally, we created a user-friendly web application for image analysis and provided the code for use on Google Colab. Conclusion: By leveraging YOLOv8's advances, this work provides a deep learning solution to enable efficient quantification and analysis of wood cells suitable for practical applications.
WGAN-AFL: Seed Generation Augmented Fuzzer with Wasserstein-GAN
Authors: Liqun Yang, Chunan Li, Yongxin Qiu, Chaoren Wei, Jian Yang, Hongcheng Guo, Jinxin Ma, Zhoujun Li
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2401.16947
Pdf link: https://arxiv.org/pdf/2401.16947
Abstract The importance of addressing security vulnerabilities is indisputable, with software becoming crucial in sectors such as national defense and finance. Consequently, the security issues caused by software vulnerabilities cannot be ignored. Fuzz testing is an automated software testing technology that can detect vulnerabilities in the software. However, most previous fuzzers face the challenge that fuzzing performance is sensitive to initial input seeds. In the absence of high-quality initial input seeds, fuzzers may expend significant resources on program path exploration, leading to a substantial decrease in the efficiency of vulnerability detection. To address this issue, we propose WGAN-AFL. By collecting high-quality testcases, we train a generative adversarial network (GAN) to learn their features, thereby obtaining high-quality initial input seeds. To overcome drawbacks like mode collapse and training instability inherent in GANs, we utilize the Wasserstein GAN (WGAN) architecture for training, further enhancing the quality of the generated seeds. Experimental results demonstrate that WGAN-AFL significantly outperforms the original AFL in terms of code coverage, new paths, and vulnerability discovery, demonstrating the effective enhancement of seed quality by WGAN-AFL.
Taxonomy of Mathematical Plagiarism
Authors: Ankit Satpute, Andre Greiner-Petter, Noah Gießing, Isabel Beckenbach, Moritz Schubotz, Olaf Teschke, Akiko Aizawa, Bela Gipp
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2401.16969
Pdf link: https://arxiv.org/pdf/2401.16969
Abstract Plagiarism is a pressing concern, even more so with the availability of large language models. Existing plagiarism detection systems reliably find copied and moderately reworded text but fail for idea plagiarism, especially in mathematical science, which heavily uses formal mathematical notation. We make two contributions. First, we establish a taxonomy of mathematical content reuse by annotating 122 potentially plagiarised scientific document pairs. Second, we analyze the best-performing approaches to detect plagiarism and mathematical content similarity on the newly established taxonomy. We found that the best-performing methods for plagiarism and math content similarity achieve an overall detection score (PlagDet) of 0.06 and 0.16, respectively. The best-performing methods failed to detect most cases from all seven newly established math similarity types. Outlined contributions will benefit research in plagiarism detection systems, recommender systems, question-answering systems, and search engines. We make our experiment's code and annotated dataset available to the community: https://github.com/gipplab/Taxonomy-of-Mathematical-Plagiarism
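For readers unfamiliar with the PlagDet score quoted above, the sketch below shows how it is commonly defined in the PAN plagiarism detection evaluations: the F1 of precision and recall, divided by log2(1 + granularity), where granularity penalizes reporting one plagiarised passage as several fragments. The function name and the example inputs are hypothetical, and the abstract does not state how the authors computed precision, recall, or granularity.

```python
import math


def plagdet(precision: float, recall: float, granularity: float) -> float:
    """PlagDet = F1(precision, recall) / log2(1 + granularity).

    granularity >= 1; a value of 1 means each plagiarism case was
    reported as exactly one detection.
    """
    if granularity < 1:
        raise ValueError("granularity must be at least 1")
    if precision + recall == 0:
        return 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return f1 / math.log2(1 + granularity)


# Hypothetical inputs, chosen only to show the effect of granularity.
print(plagdet(0.5, 0.5, 1.0))  # one detection per case
print(plagdet(0.5, 0.5, 3.0))  # each case split into three detections
```

Because the score multiplies the effect of low precision, low recall, and fragmentation, values such as 0.06 and 0.16 indicate that most annotated cases were missed.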
ActDroid: An active learning framework for Android malware detection
Authors: Ali Muzaffar, Hani Ragab Hassen, Hind Zantout, Michael A Lones
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.16982
Pdf link: https://arxiv.org/pdf/2401.16982
Abstract The growing popularity of Android requires malware detection systems that can keep up with the pace of new software being released. According to a recent study, a new piece of malware appears online every 12 seconds. To address this, we treat Android malware detection as a streaming data problem and explore the use of active online learning as a means of mitigating the problem of labelling applications in a timely and cost-effective manner. Our resulting framework achieves accuracies of up to 96%, requires as little as 24% of the training data to be labelled, and compensates for concept drift that occurs between the release and labelling of an application. We also consider the broader practicalities of online learning within Android malware detection, and systematically explore the trade-offs between using different static, dynamic and hybrid feature sets to classify malware.
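The idea of active online learning on a stream can be sketched as follows: the model predicts each incoming sample, and requests a label (and updates itself) only when its prediction is uncertain. This is a generic uncertainty-sampling sketch, not the ActDroid framework; the logistic-regression learner, the `margin` threshold, and the synthetic two-feature stream are all assumptions chosen for illustration.

```python
import math
import random
from typing import Iterable, List, Sequence, Tuple


class OnlineLogReg:
    """Tiny logistic regression trained one sample at a time with SGD."""

    def __init__(self, n_features: int, lr: float = 0.5) -> None:
        self.w: List[float] = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x: Sequence[float]) -> float:
        z = self.b + sum(w * v for w, v in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x: Sequence[float], y: int) -> None:
        err = self.predict_proba(x) - y
        self.w = [w - self.lr * err * v for w, v in zip(self.w, x)]
        self.b -= self.lr * err


def run_stream(stream: Iterable[Tuple[Sequence[float], int]],
               model: OnlineLogReg, margin: float = 0.2) -> Tuple[int, int]:
    """Process a labelled stream, querying a label only when |p - 0.5| < margin.

    Returns (number of labels queried, number of correct predictions).
    """
    queried = correct = 0
    for x, y in stream:
        p = model.predict_proba(x)
        correct += int((p >= 0.5) == bool(y))
        if abs(p - 0.5) < margin:      # uncertain: pay for a label and learn
            model.update(x, y)
            queried += 1
    return queried, correct


def make_stream(n: int, seed: int = 0) -> List[Tuple[List[float], int]]:
    """Synthetic stream: label 1 ("malware") shifts the first feature upward."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        data.append(([rng.gauss(2 * y - 1, 0.5), rng.gauss(0.0, 1.0)], y))
    return data


stream = make_stream(500)
queried, correct = run_stream(stream, OnlineLogReg(n_features=2))
print(f"labels queried: {queried}/500, correct predictions: {correct}/500")
```

A wider `margin` queries more labels, while a margin of zero never queries; handling concept drift, as the abstract describes, would need additional machinery that this sketch omits.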
Finetuning Large Language Models for Vulnerability Detection
Authors: Alexey Shestov, Anton Cheshkov, Rodion Levichev, Ravil Mussabayev, Pavel Zadorozhny, Evgeny Maslov, Chibirev Vadim, Egor Bulychev
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.17010
Pdf link: https://arxiv.org/pdf/2401.17010
Abstract This paper presents the results of finetuning large language models (LLMs) for the task of detecting vulnerabilities in source code. We leverage WizardCoder, a recent improvement of the state-of-the-art LLM StarCoder, and adapt it for vulnerability detection through further finetuning. To accelerate training, we modify WizardCoder's training procedure, and we investigate optimal training regimes. For the imbalanced dataset with many more negative examples than positive, we also explore different techniques to improve classification performance. The finetuned WizardCoder model achieves improvement in ROC AUC and F1 measures on balanced and imbalanced vulnerability datasets over a CodeBERT-like model, demonstrating the effectiveness of adapting pretrained LLMs for vulnerability detection in source code. The key contributions are finetuning the state-of-the-art code LLM, WizardCoder, increasing its training speed without harming performance, optimizing the training procedure and regimes, handling class imbalance, and improving performance on difficult vulnerability detection datasets. This demonstrates the potential for transfer learning by finetuning large pretrained language models for specialized source code analysis tasks.
Making Parametric Anomaly Detection on Tabular Data Non-Parametric Again
Authors: Hugo Thimonier, Fabrice Popineau, Arpad Rimmel, Bich-Liên Doan
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.17052
Pdf link: https://arxiv.org/pdf/2401.17052
Abstract Deep learning for tabular data has garnered increasing attention in recent years, yet employing deep models for structured data remains challenging. While these models excel with unstructured data, their efficacy with structured data has been limited. Recent research has introduced retrieval-augmented models to address this gap, demonstrating promising results in supervised tasks such as classification and regression. In this work, we investigate using retrieval-augmented models for anomaly detection on tabular data. We propose a reconstruction-based approach in which a transformer model learns to reconstruct masked features of normal samples. We test the effectiveness of KNN-based and attention-based modules to select relevant samples to help in the reconstruction process of the target sample. Our experiments on a benchmark of 31 tabular datasets reveal that augmenting this reconstruction-based anomaly detection (AD) method with non-parametric relationships via retrieval modules may significantly boost performance.
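The combination of masked-feature reconstruction and retrieval can be illustrated with a deliberately simplified sketch. Here the transformer is replaced by a plain k-nearest-neighbour average: each feature is masked in turn, the k most similar normal samples are retrieved using the remaining features, and the masked value is reconstructed as their mean. The anomaly score is the summed squared reconstruction error. This substitution, the function name, and the toy data are assumptions for illustration only and are not the authors' model.

```python
from typing import List, Sequence


def knn_reconstruction_score(normal: Sequence[Sequence[float]],
                             sample: Sequence[float], k: int = 3) -> float:
    """Sum of squared errors when each feature is masked and rebuilt from
    the k normal samples closest on the unmasked features."""
    n_features = len(sample)
    score = 0.0
    for masked in range(n_features):
        visible = [j for j in range(n_features) if j != masked]

        def distance(row: Sequence[float]) -> float:
            return sum((row[j] - sample[j]) ** 2 for j in visible)

        # Retrieval step: pick the k nearest normal samples (stable on ties).
        neighbours: List[Sequence[float]] = sorted(normal, key=distance)[:k]
        reconstruction = sum(row[masked] for row in neighbours) / len(neighbours)
        score += (sample[masked] - reconstruction) ** 2
    return score


# Toy "normal" data in which the two features are always equal.
normal_rows = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
typical_score = knn_reconstruction_score(normal_rows, [2.0, 2.0])
anomaly_score = knn_reconstruction_score(normal_rows, [2.0, -2.0])
print(typical_score, anomaly_score)
```

A sample that respects the relationship between features reconstructs well and scores low, while one that breaks it scores high; a learned model would replace the neighbour average with a trained reconstruction network.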
Joint Semantic Communication and Target Sensing for 6G Communication System
Authors: Yinchao Yang, Mohammad Shikh-Bahaei, Zhaohui Yang, Chongwen Huang, Wei Xu, Zhaoyang Zhang
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2401.17108
Pdf link: https://arxiv.org/pdf/2401.17108
Abstract This paper investigates the secure resource allocation for a downlink integrated sensing and communication system with multiple legal users and potential eavesdroppers. In the considered model, the base station (BS) simultaneously transmits sensing and communication signals through beamforming design, where the sensing signals can be viewed as artificial noise to enhance the security of communication signals. To further enhance the security in the semantic layer, the semantic information is extracted from the original information before transmission. The user side can only successfully recover the received information with the help of the knowledge base shared with the BS, which is stored in advance. Our aim is to maximize the sum semantic secrecy rate of all users while maintaining the minimum quality of service for each user and guaranteeing overall sensing performance. To solve this sum semantic secrecy rate maximization problem, an iterative algorithm is proposed using the alternating optimization method. The simulation results demonstrate the superiority of the proposed algorithm in terms of secure semantic communication and reliable detection.
A Bearing-Angle Approach for Unknown Target Motion Analysis Based on Visual Measurements
Authors: Zian Ning, Yin Zhang, Jianan Li, Zhang Chen, Shiyu Zhao
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2401.17117
Pdf link: https://arxiv.org/pdf/2401.17117
Abstract Vision-based estimation of the motion of a moving target is usually formulated as a bearing-only estimation problem where the visual measurement is modeled as a bearing vector. Although the bearing-only approach has been studied for decades, a fundamental limitation of this approach is that it requires extra lateral motion of the observer to enhance the target's observability. Unfortunately, the extra lateral motion conflicts with the desired motion of the observer in many tasks. It is well known that, once a target has been detected in an image, a bounding box that surrounds the target can be obtained. Surprisingly, this common visual measurement, especially its size information, has not been well explored up to now. In this paper, we propose a new bearing-angle approach to estimate the motion of a target by modeling its image bounding box as bearing-angle measurements. Both theoretical analysis and experimental results show that this approach can significantly enhance the observability without relying on additional lateral motion of the observer. The benefit of the bearing-angle approach comes with no additional cost because a bounding box is a standard output of object detection algorithms. The approach simply exploits the information that has not been fully exploited in the past. No additional sensing devices or special detection algorithms are required.
Enhanced Sound Event Localization and Detection in Real 360-degree audio-visual soundscapes
Authors: Adrian S. Roman, Baladithya Balamurugan, Rithik Pothuganti
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2401.17129
Pdf link: https://arxiv.org/pdf/2401.17129
Abstract This technical report details our work towards building an enhanced audio-visual sound event localization and detection (SELD) network. We build on top of the audio-only SELDnet23 model and adapt it to be audio-visual by merging both audio and video information prior to the gated recurrent unit (GRU) of the audio-only network. Our model leverages YOLO and DETIC object detectors. We also build a framework that implements audio-visual data augmentation and audio-visual synthetic data generation. We deliver an audio-visual SELDnet system that outperforms the existing audio-visual SELD baseline.
Optical Tactile Sensing for Aerial Multi-Contact Interaction: Design, Integration, and Evaluation
Authors: Emanuele Aucone, Carmelo Sferrazza, Manuel Gregor, Raffaello D'Andrea, Stefano Mintchev
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2401.17149
Pdf link: https://arxiv.org/pdf/2401.17149
Abstract Distributed tactile sensing for multi-force detection is crucial for various aerial robot interaction tasks. However, current contact sensing solutions on drones only exploit single end-effector sensors and cannot provide distributed multi-contact sensing. We propose an optical tactile sensor, designed to be easily mounted at the bottom of a drone, that features a large and curved soft sensing surface, a hollow structure and a new illumination system. Even when spaced only 2 cm apart, multiple contacts can be detected simultaneously using our software pipeline, which provides real-world quantities of 3D contact locations (mm) and 3D force vectors (N), with an accuracy of 1.5 mm and 0.17 N respectively. We demonstrate the sensor's applicability and reliability onboard and in real-time with two demos related to i) the estimation of the compliance of different perches and subsequent re-alignment and landing on the stiffer one, and ii) the mapping of sparse obstacles. The implementation of our distributed tactile sensor represents a significant step towards attaining the full potential of drones as versatile robots capable of interacting with and navigating within complex environments.
Proactive Detection of Voice Cloning with Localized Watermarking
Authors: Robin San Roman, Pierre Fernandez, Alexandre Défossez, Teddy Furon, Tuan Tran, Hady Elsahar
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2401.17264
Pdf link: https://arxiv.org/pdf/2401.17264
Abstract In the rapidly evolving field of speech generative models, there is a pressing need to ensure audio authenticity against the risks of voice cloning. We present AudioSeal, the first audio watermarking technique designed specifically for localized detection of AI-generated speech. AudioSeal employs a generator/detector architecture trained jointly with a localization loss to enable localized watermark detection up to the sample level, and a novel perceptual loss inspired by auditory masking, that enables AudioSeal to achieve better imperceptibility. AudioSeal achieves state-of-the-art performance in terms of robustness to real life audio manipulations and imperceptibility based on automatic and human evaluation metrics. Additionally, AudioSeal is designed with a fast, single-pass detector, that significantly surpasses existing models in speed - achieving detection up to two orders of magnitude faster, making it ideal for large-scale and real-time applications.
YOLO-World: Real-Time Open-Vocabulary Object Detection
Authors: Tianheng Cheng, Lin Song, Yixiao Ge, Wenyu Liu, Xinggang Wang, Ying Shan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.17270
Pdf link: https://arxiv.org/pdf/2401.17270
Abstract The You Only Look Once (YOLO) series of detectors has established itself as an efficient and practical tool. However, its reliance on predefined and trained object categories limits its applicability in open scenarios. Addressing this limitation, we introduce YOLO-World, an innovative approach that enhances YOLO with open-vocabulary detection capabilities through vision-language modeling and pre-training on large-scale datasets. Specifically, we propose a new Re-parameterizable Vision-Language Path Aggregation Network (RepVL-PAN) and region-text contrastive loss to facilitate the interaction between visual and linguistic information. Our method excels in detecting a wide range of objects in a zero-shot manner with high efficiency. On the challenging LVIS dataset, YOLO-World achieves 35.4 AP with 52.0 FPS on V100, which outperforms many state-of-the-art methods in terms of both accuracy and speed. Furthermore, the fine-tuned YOLO-World achieves remarkable performance on several downstream tasks, including object detection and open-vocabulary instance segmentation.
A simple, strong baseline for building damage detection on the xBD dataset
Authors: Sebastian Gerard, Paul Borne-Pons, Josephine Sullivan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.17271
Pdf link: https://arxiv.org/pdf/2401.17271
Abstract We construct a strong baseline method for building damage detection by starting with the highly-engineered winning solution of the xView2 competition, and gradually stripping away components. This way, we obtain a much simpler method, while retaining adequate performance. We expect the simplified solution to be more widely and easily applicable. This expectation is based on the reduced complexity, as well as the fact that we choose hyperparameters based on simple heuristics, that transfer to other datasets. We then re-arrange the xView2 dataset splits such that the test locations are not seen during training, contrary to the competition setup. In this setting, we find that both the complex and the simplified model fail to generalize to unseen locations. Analyzing the dataset indicates that this failure to generalize is not only a model-based problem, but that the difficulty might also be influenced by the unequal class distributions between events. Code, including the baseline model, is available under https://github.com/PaulBorneP/Xview2_Strong_Baseline
Keyword: face recognition

Optimal-Landmark-Guided Image Blending for Face Morphing Attacks
Authors: Qiaoyun He, Zongyong Deng, Zuyuan He, Qijun Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.16722
Pdf link: https://arxiv.org/pdf/2401.16722
Abstract In this paper, we propose a novel approach for conducting face morphing attacks, which utilizes optimal-landmark-guided image blending. Current face morphing attacks can be categorized into landmark-based and generation-based approaches. Landmark-based methods use geometric transformations to warp facial regions according to averaged landmarks but often produce morphed images with poor visual quality. Generation-based methods, which employ generation models to blend multiple face images, can achieve better visual quality but are often unsuccessful in generating morphed images that can effectively evade state-of-the-art face recognition systems (FRSs). Our proposed method overcomes the limitations of previous approaches by optimizing the morphing landmarks and using Graph Convolutional Networks (GCNs) to combine landmark and appearance features. We model facial landmarks as nodes in a bipartite graph that is fully connected and utilize GCNs to simulate their spatial and structural relationships. The aim is to capture variations in facial shape and enable accurate manipulation of facial appearance features during the warping process, resulting in morphed facial images that are highly realistic and visually faithful. Experiments on two public datasets prove that our method inherits the advantages of previous landmark-based and generation-based methods and generates morphed images with higher quality, posing a more significant threat to state-of-the-art FRSs.
Keyword: augmentation

Hybrid Transformer and Spatial-Temporal Self-Supervised Learning for Long-term Traffic Prediction
Authors: Wang Zhu, Doudou Zhang, Baichao Long, Jianli Xiao
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.16453
Pdf link: https://arxiv.org/pdf/2401.16453
Abstract Long-term traffic prediction has always been a challenging task due to its dynamic temporal dependencies and complex spatial dependencies. In this paper, we propose a model that combines hybrid Transformer and spatio-temporal self-supervised learning. The model enhances its robustness by applying adaptive data augmentation techniques at the sequence-level and graph-level of the traffic data. It utilizes Transformer to overcome the limitations of recurrent neural networks in capturing long-term sequences, and employs Chebyshev polynomial graph convolution to capture complex spatial dependencies. Furthermore, considering the impact of spatio-temporal heterogeneity on traffic speed, we design two self-supervised learning tasks to model the temporal and spatial heterogeneity, thereby improving the accuracy and generalization ability of the model. Experimental evaluations are conducted on two real-world datasets, PeMS04 and PeMS08, and the results are visualized and analyzed, demonstrating the superior performance of the proposed model.
KAUCUS: Knowledge Augmented User Simulators for Training Language Model Assistants
Authors: Kaustubh D. Dhole
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2401.16454
Pdf link: https://arxiv.org/pdf/2401.16454
Abstract An effective multi-turn instruction-following assistant can be developed by creating a simulator that can generate useful interaction data. Apart from relying on its intrinsic weights, an ideal user simulator should also be able to bootstrap external knowledge rapidly in its raw form to simulate the multifarious diversity of text available over the internet. Previous user simulators generally lacked diversity, were mostly closed domain, and necessitated rigid schema, making them inefficient to rapidly scale to incorporate external knowledge. In this regard, we introduce Kaucus, a Knowledge-Augmented User Simulator framework, to outline a process of creating diverse user simulators that can seamlessly exploit external knowledge as well as benefit downstream assistant model training. Through two GPT-J based simulators, viz. a Retrieval Augmented Simulator and a Summary Controlled Simulator, we generate diverse simulator-assistant interactions. Through reward and preference model-based evaluations, we find that these interactions serve as useful training data and create more helpful downstream assistants. We also find that incorporating knowledge through retrieval augmentation or summary control helps create better assistants.
Augmenting Replay in World Models for Continual Reinforcement Learning
Authors: Luke Yang, Levin Kuhlmann, Gideon Kowadlo
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.16650
Pdf link: https://arxiv.org/pdf/2401.16650
Abstract In continual RL, the environment of a reinforcement learning (RL) agent undergoes change. A successful system should appropriately balance the conflicting requirements of retaining agent performance on already learned tasks, stability, whilst learning new tasks, plasticity. The first-in-first-out buffer is commonly used to enhance learning in such settings but requires significant memory. We explore the application of an augmentation to this buffer which alleviates the memory constraints, and use it with a world model model-based reinforcement learning algorithm, to evaluate its effectiveness in facilitating continual learning. We evaluate the effectiveness of our method in Procgen and Atari RL benchmarks and show that the distribution matching augmentation to the replay-buffer used in the context of latent world models can successfully prevent catastrophic forgetting with significantly reduced computational overhead. Yet, we also find such a solution to not be entirely infallible, and other failure modes such as the opposite -- lacking plasticity and being unable to learn a new task -- to be a potential limitation in continual learning systems.
LF Tracy: A Unified Single-Pipeline Approach for Salient Object Detection in Light Field Cameras
Authors: Fei Teng, Jiaming Zhang, Jiawei Liu, Kunyu Peng, Xina Cheng, Zhiyong Li, Kailun Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2401.16712
Pdf link: https://arxiv.org/pdf/2401.16712
Abstract Leveraging the rich information extracted from light field (LF) cameras is instrumental for dense prediction tasks. However, adapting light field data to enhance Salient Object Detection (SOD) still follows the traditional RGB methods and remains under-explored in the community. Previous approaches predominantly employ a custom two-stream design to discover the implicit angular feature within light field cameras, leading to significant information isolation between different LF representations. In this study, we propose an efficient paradigm (LF Tracy) to address this limitation. We eschew the conventional specialized fusion and decoder architecture for a dual-stream backbone in favor of a unified, single-pipeline approach. This comprises firstly a simple yet effective data augmentation strategy called MixLD to bridge the connection of spatial, depth, and implicit angular information under different LF representations. A highly efficient information aggregation (IA) module is then introduced to boost asymmetric feature-wise information fusion. Owing to this innovative approach, our model surpasses the existing state-of-the-art methods, particularly demonstrating a 23% improvement over previous results on the latest large-scale PKU dataset. By utilizing only 28.9M parameters, the model achieves a 10% increase in accuracy with 3M additional parameters compared to its backbone using RGB images and an 86% rise to its backbone using LF images. The source code will be made publicly available at https://github.com/FeiBryantkit/LF-Tracy.
Encoding Temporal Statistical-space Priors via Augmented Representation
Authors: Insu Choi, Woosung Koh, Gimin Kang, Yuntae Jang, Woo Chang Kim
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.16808
Pdf link: https://arxiv.org/pdf/2401.16808
Abstract Modeling time series data remains a pervasive issue as the temporal dimension is inherent to numerous domains. Despite significant strides in time series forecasting, high noise-to-signal ratio, non-normality, non-stationarity, and lack of data continue to challenge practitioners. In response, we leverage a simple representation augmentation technique to overcome these challenges. Our augmented representation acts as a statistical-space prior encoded at each time step. Accordingly, we name our method Statistical-space Augmented Representation (SSAR). The underlying high-dimensional data-generating process inspires our representation augmentation. We rigorously examine the empirical generalization performance on two data sets with two downstream temporal learning algorithms. Our approach significantly beats all five up-to-date baselines. Moreover, the highly modular nature of our approach means it can easily be applied to various settings. Lastly, fully-fledged theoretical perspectives are available throughout the writing for a clear and rigorous understanding.
Active Generation Network of Human Skeleton for Action Recognition
Authors: Long Liu, Xin Wang, Fangming Li, Jiayu Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.17086
Pdf link: https://arxiv.org/pdf/2401.17086
Abstract Data generation is a data augmentation technique for enhancing the generalization ability of skeleton-based human action recognition. Most existing data generation methods struggle to ensure the temporal consistency of the dynamic information of an action. In addition, the data generated by these methods lack diversity when only a few training samples are available. To solve these problems, we propose a novel active generative network (AGN), which can adaptively learn various action categories by motion style transfer to generate new actions when the data for a particular action is only a single sample or a few samples. The AGN consists of an action generation network and an uncertainty metric network (UMN). The former, with ST-GCN as the backbone, can implicitly learn the morphological features of the target action while preserving the category features of the source action. The latter guides the generation of actions. Specifically, an action recognition model generates prediction vectors for each action, which are then scored using an uncertainty metric. Finally, the UMN provides the uncertainty sampling basis for the generated actions.
NNOSE: Nearest Neighbor Occupational Skill Extraction
Authors: Mike Zhang, Rob van der Goot, Min-Yen Kan, Barbara Plank
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2401.17092
Pdf link: https://arxiv.org/pdf/2401.17092
Abstract The labor market is changing rapidly, prompting increased interest in the automatic extraction of occupational skills from text. With the advent of English benchmark job description datasets, there is a need for systems that handle their diversity well. We tackle the complexity in occupational skill dataset tasks -- combining and leveraging multiple datasets for skill extraction, identifying rarely observed skills within a dataset, and overcoming the scarcity of skills across datasets. In particular, we investigate the retrieval-augmentation of language models, employing an external datastore for retrieving similar skills in a dataset-unifying manner. Our proposed method, Nearest Neighbor Occupational Skill Extraction (NNOSE), effectively leverages multiple datasets by retrieving neighboring skills from other datasets in the datastore. This improves skill extraction without additional fine-tuning. Crucially, we observe a performance gain in predicting infrequent patterns, with substantial gains of up to 30% span-F1 in cross-dataset settings.
Enhanced Sound Event Localization and Detection in Real 360-degree audio-visual soundscapes
Authors: Adrian S. Roman, Baladithya Balamurugan, Rithik Pothuganti
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2401.17129
Pdf link: https://arxiv.org/pdf/2401.17129
Abstract This technical report details our work towards building an enhanced audio-visual sound event localization and detection (SELD) network. We build on top of the audio-only SELDnet23 model and adapt it to be audio-visual by merging both audio and video information prior to the gated recurrent unit (GRU) of the audio-only network. Our model leverages YOLO and DETIC object detectors. We also build a framework that implements audio-visual data augmentation and audio-visual synthetic data generation. We deliver an audio-visual SELDnet system that outperforms the existing audio-visual SELD baseline.
Single Word Change is All You Need: Designing Attacks and Defenses for Text Classifiers
Authors: Lei Xu, Sarah Alnegheimish, Laure Berti-Equille, Alfredo Cuesta-Infante, Kalyan Veeramachaneni
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2401.17196
Pdf link: https://arxiv.org/pdf/2401.17196
Abstract In text classification, creating an adversarial example means subtly perturbing a few words in a sentence without changing its meaning, causing it to be misclassified by a classifier. A concerning observation is that a significant portion of adversarial examples generated by existing methods change only one word. This single-word perturbation vulnerability represents a significant weakness in classifiers, which malicious users can exploit to efficiently create a multitude of adversarial examples. This paper studies this problem and makes the following key contributions: (1) We introduce a novel metric ρ to quantitatively assess a classifier's robustness against single-word perturbation. (2) We present the SP-Attack, designed to exploit the single-word perturbation vulnerability, achieving a higher attack success rate, better preserving sentence meaning, while reducing computation costs compared to state-of-the-art adversarial methods. (3) We propose SP-Defense, which aims to improve ρ by applying data augmentation in learning. Experimental results on 4 datasets and BERT and distilBERT classifiers show that SP-Defense improves ρ by 14.6% and 13.9% and decreases the attack success rate of SP-Attack by 30.4% and 21.2% on two classifiers respectively, and decreases the attack success rate of existing attack methods that involve multiple-word perturbations.
Self-Supervised Representation Learning for Nerve Fiber Distribution Patterns in 3D-PLI
Authors: Alexander Oberstrass, Sascha E. A. Muenzing, Meiqi Niu, Nicola Palomero-Gallagher, Christian Schiffer, Markus Axer, Katrin Amunts, Timo Dickscheid
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.17207
Pdf link: https://arxiv.org/pdf/2401.17207
Abstract A comprehensive understanding of the organizational principles in the human brain requires, among other factors, well-quantifiable descriptors of nerve fiber architecture. Three-dimensional polarized light imaging (3D-PLI) is a microscopic imaging technique that enables insights into the fine-grained organization of myelinated nerve fibers with high resolution. Descriptors characterizing the fiber architecture observed in 3D-PLI would enable downstream analysis tasks such as multimodal correlation studies, clustering, and mapping. However, best practices for observer-independent characterization of fiber architecture in 3D-PLI are not yet available. To this end, we propose the application of a fully data-driven approach to characterize nerve fiber architecture in 3D-PLI images using self-supervised representation learning. We introduce a 3D-Context Contrastive Learning (CL-3D) objective that utilizes the spatial neighborhood of texture examples across histological brain sections of a 3D reconstructed volume to sample positive pairs for contrastive learning. We combine this sampling strategy with specifically designed image augmentations to gain robustness to typical variations in 3D-PLI parameter maps. The approach is demonstrated for the 3D reconstructed occipital lobe of a vervet monkey brain. We show that extracted features are highly sensitive to different configurations of nerve fibers, yet robust to variations between consecutive brain sections arising from histological processing. We demonstrate their practical applicability for retrieving clusters of homogeneous fiber architecture and performing data mining for interactively selected templates of specific components of fiber architecture such as U-fibers.

LeeKyungwook / get-arxiv-noti

New submissions for Wed, 31 Jan 24 #957

Keyword: detection

FaKnow: A Unified Library for Fake News Detection

Evaluating Deep Networks for Detecting User Familiarity with VR from Hand Interactions

SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design

Error detection using pneumatic logic

Saccade-Contingent Rendering

The Why, When, and How to Use Active Learning in Large-Data-Driven 3D Object Detection for Safe Autonomous Driving: An Empirical Exploration

Generalization of LiNGAM that allows confounding

The Detection and Understanding of Fictional Discourse

Characterization of Magnetic Labyrinthine Structures through Junctions and Terminals Detection using Template Matching and CNN

Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers

LF Tracy: A Unified Single-Pipeline Approach for Salient Object Detection in Light Field Cameras

Recent Advances in Hate Speech Moderation: Multimodality and the Role of Large Models

Detecting Racist Text in Bengali: An Ensemble Deep Learning Framework

Detection and Recovery Against Deep Neural Network Fault Injection Attacks Based on Contrastive Learning

Activity Detection for Massive Connectivity in Cell-free Networks with Unknown Large-scale Fading, Channel Statistics, Noise Variance, and Activity Probability: A Bayesian Approach

Detecting LLM-Assisted Writing in Scientific Communication: Are We There Yet?

Provably Robust Multi-bit Watermarking for AI-generated Text via Error Correction Code

Evaluating ML-Based Anomaly Detection Across Datasets of Varied Integrity: A Case Study

Segmentation and Characterization of Macerated Fibers and Vessels Using Deep Learning

WGAN-AFL: Seed Generation Augmented Fuzzer with Wasserstein-GAN

Taxonomy of Mathematical Plagiarism

ActDroid: An active learning framework for Android malware detection

Finetuning Large Language Models for Vulnerability Detection

Making Parametric Anomaly Detection on Tabular Data Non-Parametric Again

Joint Semantic Communication and Target Sensing for 6G Communication System

A Bearing-Angle Approach for Unknown Target Motion Analysis Based on Visual Measurements

Enhanced Sound Event Localization and Detection in Real 360-degree audio-visual soundscapes

Optical Tactile Sensing for Aerial Multi-Contact Interaction: Design, Integration, and Evaluation

Proactive Detection of Voice Cloning with Localized Watermarking

YOLO-World: Real-Time Open-Vocabulary Object Detection

A simple, strong baseline for building damage detection on the xBD dataset

Keyword: face recognition

Optimal-Landmark-Guided Image Blending for Face Morphing Attacks

Keyword: augmentation

Hybrid Transformer and Spatial-Temporal Self-Supervised Learning for Long-term Traffic Prediction

KAUCUS: Knowledge Augmented User Simulators for Training Language Model Assistants

Augmenting Replay in World Models for Continual Reinforcement Learning

LF Tracy: A Unified Single-Pipeline Approach for Salient Object Detection in Light Field Cameras

Encoding Temporal Statistical-space Priors via Augmented Representation

Active Generation Network of Human Skeleton for Action Recognition

NNOSE: Nearest Neighbor Occupational Skill Extraction

Enhanced Sound Event Localization and Detection in Real 360-degree audio-visual soundscapes

Single Word Change is All You Need: Designing Attacks and Defenses for Text Classifiers

Self-Supervised Representation Learning for Nerve Fiber Distribution Patterns in 3D-PLI