New submissions for Thu, 18 Apr 24

Keyword: detection

Authenticity in Authorship: The Writer's Integrity Framework for Verifying Human-Generated Text

Authors: Authors: Sanad Aburass, Maha Abu Rumman
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2404.10781
Pdf link: https://arxiv.org/pdf/2404.10781
Abstract The "Writer's Integrity" framework introduces a paradigm shift in maintaining the sanctity of human-generated text in the realms of academia, research, and publishing. This innovative system circumvents the shortcomings of current AI detection tools by monitoring the writing process, rather than the product, capturing the distinct behavioral footprint of human authorship. Here, we offer a comprehensive examination of the framework, its development, and empirical results. We highlight its potential in revolutionizing the validation of human intellectual work, emphasizing its role in upholding academic integrity and intellectual property rights in the face of sophisticated AI models capable of emulating human-like text. This paper also discusses the implementation considerations, addressing potential user concerns regarding ease of use and privacy, and outlines a business model for tech companies to monetize the framework effectively. Through licensing, partnerships, and subscriptions, companies can cater to universities, publishers, and independent writers, ensuring the preservation of original thought and effort in written content. This framework is open source and available here, https://github.com/sanadv/Integrity.github.io
Graph Vertex Embeddings: Distance, Regularization and Community Detection
Authors: Authors: Radosław Nowak, Adam Małkowski, Daniel Cieślak, Piotr Sokół, Paweł Wawrzyński
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.10784
Pdf link: https://arxiv.org/pdf/2404.10784
Abstract Graph embeddings have emerged as a powerful tool for representing complex network structures in a low-dimensional space, enabling the use of efficient methods that employ the metric structure in the embedding space as a proxy for the topological structure of the data. In this paper, we explore several aspects that affect the quality of a vertex embedding of graph-structured data. To this effect, we first present a family of flexible distance functions that faithfully capture the topological distance between different vertices. Secondly, we analyze vertex embeddings as resulting from a fitted transformation of the distance matrix rather than as a direct result of optimization. Finally, we evaluate the effectiveness of our proposed embedding constructions by performing community detection on a host of benchmark datasets. The reported results are competitive with classical algorithms that operate on the entire graph while benefitting from a substantially reduced computational complexity due to the reduced dimensionality of the representations.
Multimodal Attack Detection for Action Recognition Models
Authors: Authors: Furkan Mumcu, Yasin Yilmaz
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.10790
Pdf link: https://arxiv.org/pdf/2404.10790
Abstract Adversarial machine learning attacks on video action recognition models is a growing research area and many effective attacks were introduced in recent years. These attacks show that action recognition models can be breached in many ways. Hence using these models in practice raises significant security concerns. However, there are very few works which focus on defending against or detecting attacks. In this work, we propose a novel universal detection method which is compatible with any action recognition model. In our extensive experiments, we show that our method consistently detects various attacks against different target models with high true positive rates while satisfying very low false positive rates. Tested against four state-of-the-art attacks targeting four action recognition models, the proposed detector achieves an average AUC of 0.911 over 16 test cases while the best performance achieved by the existing detectors is 0.645 average AUC. This 41.2% improvement is enabled by the robustness of the proposed detector to varying attack methods and target models. The lowest AUC achieved by our detector across the 16 test cases is 0.837 while the competing detector's performance drops as low as 0.211. We also show that the proposed detector is robust to varying attack strengths. In addition, we analyze our method's real-time performance with different hardware setups to demonstrate its potential as a practical defense mechanism.
Reconfigurable Edge Hardware for Intelligent IDS: Systematic Approach
Authors: Authors: Wadid Foudhaili, Anouar Nechi, Celine Thermann, Mohammad Al Johmani, Rainer Buchty, Mladen Berekovic, Saleh Mulhem
Subjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2404.10792
Pdf link: https://arxiv.org/pdf/2404.10792
Abstract Intrusion detection systems (IDS) are crucial security measures nowadays to enforce network security. Their task is to detect anomalies in network communication and identify, if not thwart, possibly malicious behavior. Recently, machine learning has been deployed to construct intelligent IDS. This approach, however, is quite challenging particularly in distributed, highly dynamic, yet resource-constrained systems like Edge setups. In this paper, we tackle this issue from multiple angles by analyzing the concept of intelligent IDS (I-IDS) while addressing the specific requirements of Edge devices with a special focus on reconfigurability. Then, we introduce a systematic approach to constructing the I-IDS on reconfigurable Edge hardware. For this, we implemented our proposed IDS on state-of-the-art Field Programmable Gate Arrays (FPGAs) technology as (1) a purely FPGA-based dataflow processor (DFP) and (2) a co-designed approach featuring RISC-V soft-core as FPGA-based soft-core processor (SCP). We complete our paper with a comparison of the state of the art (SoA) in this domain. The results show that DFP and SCP are both suitable for Edge applications from hardware resource and energy efficiency perspectives. Our proposed DFP solution clearly outperforms the SoA and demonstrates that required high performance can be achieved without prohibitively high hardware costs. This makes our proposed DFP suitable for Edge-based high-speed applications like modern communication technology.
Black-box Adversarial Transferability: An Empirical Study in Cybersecurity Perspective
Authors: Authors: Khushnaseeb Roshan, Aasim Zafar
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.10796
Pdf link: https://arxiv.org/pdf/2404.10796
Abstract The rapid advancement of artificial intelligence within the realm of cybersecurity raises significant security concerns. The vulnerability of deep learning models in adversarial attacks is one of the major issues. In adversarial machine learning, malicious users try to fool the deep learning model by inserting adversarial perturbation inputs into the model during its training or testing phase. Subsequently, it reduces the model confidence score and results in incorrect classifications. The novel key contribution of the research is to empirically test the black-box adversarial transferability phenomena in cyber attack detection systems. It indicates that the adversarial perturbation input generated through the surrogate model has a similar impact on the target model in producing the incorrect classification. To empirically validate this phenomenon, surrogate and target models are used. The adversarial perturbation inputs are generated based on the surrogate-model for which the hacker has complete information. Based on these adversarial perturbation inputs, both surrogate and target models are evaluated during the inference phase. We have done extensive experimentation over the CICDDoS-2019 dataset, and the results are classified in terms of various performance metrics like accuracy, precision, recall, and f1-score. The findings indicate that any deep learning model is highly susceptible to adversarial attacks, even if the attacker does not have access to the internal details of the target model. The results also indicate that white-box adversarial attacks have a severe impact compared to black-box adversarial attacks. There is a need to investigate and explore adversarial defence techniques to increase the robustness of the deep learning models against adversarial attacks.
Advancing Network Intrusion Detection: Integrating Graph Neural Networks with Scattering Transform and Node2Vec for Enhanced Anomaly Detection
Authors: Authors: Abdeljalil Zoubir, Badr Missaoui
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.10800
Pdf link: https://arxiv.org/pdf/2404.10800
Abstract In this paper, we present two novel methods in Network Intrusion Detection Systems (NIDS) using Graph Neural Networks (GNNs). The first approach, Scattering Transform with E-GraphSAGE (STEG), utilizes the scattering transform to conduct multi-resolution analysis of edge feature vectors. This provides a detailed representation that is essential for identifying subtle anomalies in network traffic. The second approach improves node representation by initiating with Node2Vec, diverging from standard methods of using uniform values, thereby capturing a more accurate and holistic network picture. Our methods have shown significant improvements in performance compared to existing state-of-the-art methods in benchmark NIDS datasets.
Unsupervised Speaker Diarization in Distributed IoT Networks Using Federated Learning
Authors: Authors: Amit Kumar Bhuyan, Hrishikesh Dutta, Subir Biswas
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2404.10842
Pdf link: https://arxiv.org/pdf/2404.10842
Abstract This paper presents a computationally efficient and distributed speaker diarization framework for networked IoT-style audio devices. The work proposes a Federated Learning model which can identify the participants in a conversation without the requirement of a large audio database for training. An unsupervised online update mechanism is proposed for the Federated Learning model which depends on cosine similarity of speaker embeddings. Moreover, the proposed diarization system solves the problem of speaker change detection via. unsupervised segmentation techniques using Hotelling's t-squared Statistic and Bayesian Information Criterion. In this new approach, speaker change detection is biased around detected quasi-silences, which reduces the severity of the trade-off between the missed detection and false detection rates. Additionally, the computational overhead due to frame-by-frame identification of speakers is reduced via. unsupervised clustering of speech segments. The results demonstrate the effectiveness of the proposed training method in the presence of non-IID speech data. It also shows a considerable improvement in the reduction of false and missed detection at the segmentation stage, while reducing the computational overhead. Improved accuracy and reduced computational cost makes the mechanism suitable for real-time speaker diarization across a distributed IoT audio network.
UruDendro, a public dataset of cross-section images of Pinus taeda
Authors: Authors: Henry Marichal, Diego Passarella, Christine Lucas, Ludmila Profumo, Verónica Casaravilla, María Noel Rocha Galli, Serrana Ambite, Gregory Randall
Subjects: Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)
Arxiv link: https://arxiv.org/abs/2404.10856
Pdf link: https://arxiv.org/pdf/2404.10856
Abstract The automatic detection of tree-ring boundaries and other anatomical features using image analysis has progressed substantially over the past decade with advances in machine learning and imagery technology, as well as increasing demands from the dendrochronology community. This paper presents a publicly available database of 64 scanned images of transverse sections of commercially grown Pinus taeda trees from northern Uruguay, ranging from 17 to 24 years old. The collection contains several challenging features for automatic ring detection, including illumination and surface preparation variation, fungal infection (blue stains), knot formation, missing cortex or interruptions in outer rings, and radial cracking. This dataset can be used to develop and test automatic tree ring detection algorithms. This paper presents to the dendrochronology community one such method, Cross-Section Tree-Ring Detection (CS-TRD), which identifies and marks complete annual rings in cross-sections for tree species presenting a clear definition between early and latewood. We compare the CS-TRD performance against the ground truth manual delineation of all rings over the UruDendro dataset. The CS-TRD software identified rings with an average F-score of 89% and RMSE error of 5.27px for the entire database in less than 20 seconds per image. Finally, we propose a robust measure of the ring growth using the \emph{equivalent radius} of a circle having the same area enclosed by the detected tree ring. Overall, this study contributes to the dendrochronologist's toolbox of fast and low-cost methods to automatically detect rings in conifer species, particularly for measuring diameter growth rates and stem transverse area using entire cross-sections.
OSR-ViT: A Simple and Modular Framework for Open-Set Object Detection and Discovery
Authors: Authors: Matthew Inkawhich, Nathan Inkawhich, Hao Yang, Jingyang Zhang, Randolph Linderman, Yiran Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.10865
Pdf link: https://arxiv.org/pdf/2404.10865
Abstract An object detector's ability to detect and flag \textit{novel} objects during open-world deployments is critical for many real-world applications. Unfortunately, much of the work in open object detection today is disjointed and fails to adequately address applications that prioritize unknown object recall \textit{in addition to} known-class accuracy. To close this gap, we present a new task called Open-Set Object Detection and Discovery (OSODD) and as a solution propose the Open-Set Regions with ViT features (OSR-ViT) detection framework. OSR-ViT combines a class-agnostic proposal network with a powerful ViT-based classifier. Its modular design simplifies optimization and allows users to easily swap proposal solutions and feature extractors to best suit their application. Using our multifaceted evaluation protocol, we show that OSR-ViT obtains performance levels that far exceed state-of-the-art supervised methods. Our method also excels in low-data settings, outperforming supervised baselines using a fraction of the training data.
Neuromorphic Vision-based Motion Segmentation with Graph Transformer Neural Network
Authors: Authors: Yusra Alkendi, Rana Azzam, Sajid Javed, Lakmal Seneviratne, Yahya Zweiri
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.10940
Pdf link: https://arxiv.org/pdf/2404.10940
Abstract Moving object segmentation is critical to interpret scene dynamics for robotic navigation systems in challenging environments. Neuromorphic vision sensors are tailored for motion perception due to their asynchronous nature, high temporal resolution, and reduced power consumption. However, their unconventional output requires novel perception paradigms to leverage their spatially sparse and temporally dense nature. In this work, we propose a novel event-based motion segmentation algorithm using a Graph Transformer Neural Network, dubbed GTNN. Our proposed algorithm processes event streams as 3D graphs by a series of nonlinear transformations to unveil local and global spatiotemporal correlations between events. Based on these correlations, events belonging to moving objects are segmented from the background without prior knowledge of the dynamic scene geometry. The algorithm is trained on publicly available datasets including MOD, EV-IMO, and \textcolor{black}{EV-IMO2} using the proposed training scheme to facilitate efficient training on extensive datasets. Moreover, we introduce the Dynamic Object Mask-aware Event Labeling (DOMEL) approach for generating approximate ground-truth labels for event-based motion segmentation datasets. We use DOMEL to label our own recorded Event dataset for Motion Segmentation (EMS-DOMEL), which we release to the public for further research and benchmarking. Rigorous experiments are conducted on several unseen publicly-available datasets where the results revealed that GTNN outperforms state-of-the-art methods in the presence of dynamic background variations, motion patterns, and multiple dynamic objects with varying sizes and velocities. GTNN achieves significant performance gains with an average increase of 9.4% and 4.5% in terms of motion segmentation accuracy (IoU%) and detection rate (DR%), respectively.
Leveraging 3D LiDAR Sensors to Enable Enhanced Urban Safety and Public Health: Pedestrian Monitoring and Abnormal Activity Detection
Authors: Authors: Nawfal Guefrachi, Jian Shi, Hakim Ghazzai, Ahmad Alsharoa
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.10978
Pdf link: https://arxiv.org/pdf/2404.10978
Abstract The integration of Light Detection and Ranging (LiDAR) and Internet of Things (IoT) technologies offers transformative opportunities for public health informatics in urban safety and pedestrian well-being. This paper proposes a novel framework utilizing these technologies for enhanced 3D object detection and activity classification in urban traffic scenarios. By employing elevated LiDAR, we obtain detailed 3D point cloud data, enabling precise pedestrian activity monitoring. To overcome urban data scarcity, we create a specialized dataset through simulated traffic environments in Blender, facilitating targeted model training. Our approach employs a modified Point Voxel-Region-based Convolutional Neural Network (PV-RCNN) for robust 3D detection and PointNet for classifying pedestrian activities, significantly benefiting urban traffic management and public health by offering insights into pedestrian behavior and promoting safer urban environments. Our dual-model approach not only enhances urban traffic management but also contributes significantly to public health by providing insights into pedestrian behavior and promoting safer urban environment.
Pixel-Wise Symbol Spotting via Progressive Points Location for Parsing CAD Images
Authors: Authors: Junbiao Pang, Zailin Dong, Jiaxin Deng, Mengyuan Zhu, Yunwei Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2404.10985
Pdf link: https://arxiv.org/pdf/2404.10985
Abstract Parsing Computer-Aided Design (CAD) drawings is a fundamental step for CAD revision, semantic-based management, and the generation of 3D prototypes in both the architecture and engineering industries. Labeling symbols from a CAD drawing is a challenging yet notorious task from a practical point of view. In this work, we propose to label and spot symbols from CAD images that are converted from CAD drawings. The advantage of spotting symbols from CAD images lies in the low requirement of labelers and the low-cost annotation. However, pixel-wise spotting symbols from CAD images is challenging work. We propose a pixel-wise point location via Progressive Gaussian Kernels (PGK) to balance between training efficiency and location accuracy. Besides, we introduce a local offset to the heatmap-based point location method. Based on the keypoints detection, we propose a symbol grouping method to redraw the rectangle symbols in CAD images. We have released a dataset containing CAD images of equipment rooms from telecommunication industrial CAD drawings. Extensive experiments on this real-world dataset show that the proposed method has good generalization ability.
How to deal with glare for improved perception of Autonomous Vehicles
Authors: Authors: Muhammad Z. Alam, Zeeshan Kaleem, Sousso Kelouwani
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.10992
Pdf link: https://arxiv.org/pdf/2404.10992
Abstract Vision sensors are versatile and can capture a wide range of visual cues, such as color, texture, shape, and depth. This versatility, along with the relatively inexpensive availability of machine vision cameras, played an important role in adopting vision-based environment perception systems in autonomous vehicles (AVs). However, vision-based perception systems can be easily affected by glare in the presence of a bright source of light, such as the sun or the headlights of the oncoming vehicle at night or simply by light reflecting off snow or ice-covered surfaces; scenarios encountered frequently during driving. In this paper, we investigate various glare reduction techniques, including the proposed saturated pixel-aware glare reduction technique for improved performance of the computer vision (CV) tasks employed by the perception layer of AVs. We evaluate these glare reduction methods based on various performance metrics of the CV algorithms used by the perception layer. Specifically, we considered object detection, object recognition, object tracking, depth estimation, and lane detection which are crucial for autonomous driving. The experimental findings validate the efficacy of the proposed glare reduction approach, showcasing enhanced performance across diverse perception tasks and remarkable resilience against varying levels of glare.
Cross-Platform Hate Speech Detection with Weakly Supervised Causal Disentanglement
Authors: Authors: Paras Sheth, Tharindu Kumarage, Raha Moraffah, Aman Chadha, Huan Liu
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.11036
Pdf link: https://arxiv.org/pdf/2404.11036
Abstract Content moderation faces a challenging task as social media's ability to spread hate speech contrasts with its role in promoting global connectivity. With rapidly evolving slang and hate speech, the adaptability of conventional deep learning to the fluid landscape of online dialogue remains limited. In response, causality inspired disentanglement has shown promise by segregating platform specific peculiarities from universal hate indicators. However, its dependency on available ground truth target labels for discerning these nuances faces practical hurdles with the incessant evolution of platforms and the mutable nature of hate speech. Using confidence based reweighting and contrastive regularization, this study presents HATE WATCH, a novel framework of weakly supervised causal disentanglement that circumvents the need for explicit target labeling and effectively disentangles input features into invariant representations of hate. Empirical validation across platforms two with target labels and two without positions HATE WATCH as a novel method in cross platform hate speech detection with superior performance. HATE WATCH advances scalable content moderation techniques towards developing safer online communities.
Multilateral Temporal-view Pyramid Transformer for Video Inpainting Detection
Authors: Authors: Ying Zhang, Bo Peng, Jiaran Zhou, Huiyu Zhou, Junyu Dong, Yuezun Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.11054
Pdf link: https://arxiv.org/pdf/2404.11054
Abstract The task of video inpainting detection is to expose the pixel-level inpainted regions within a video sequence. Existing methods usually focus on leveraging spatial and temporal inconsistencies. However, these methods typically employ fixed operations to combine spatial and temporal clues, limiting their applicability in different scenarios. In this paper, we introduce a novel Multilateral Temporal-view Pyramid Transformer ({\em MumPy}) that collaborates spatial-temporal clues flexibly. Our method utilizes a newly designed multilateral temporal-view encoder to extract various collaborations of spatial-temporal clues and introduces a deformable window-based temporal-view interaction module to enhance the diversity of these collaborations. Subsequently, we develop a multi-pyramid decoder to aggregate the various types of features and generate detection maps. By adjusting the contribution strength of spatial and temporal clues, our method can effectively identify inpainted regions. We validate our method on existing datasets and also introduce a new challenging and large-scale Video Inpainting dataset based on the YouTube-VOS dataset, which employs several more recent inpainting methods. The results demonstrate the superiority of our method in both in-domain and cross-domain evaluation scenarios.
Sky-GVIO: an enhanced GNSS/INS/Vision navigation with FCN-based sky-segmentation in urban canyon
Authors: Authors: Jingrong Wang, Bo Xu, Ronghe Jin, Shoujian Zhang, Kefu Gao, Jingnan Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2404.11070
Pdf link: https://arxiv.org/pdf/2404.11070
Abstract Accurate, continuous, and reliable positioning is a critical component of achieving autonomous driving. However, in complex urban canyon environments, the vulnerability of a stand-alone sensor and non-line-of-sight (NLOS) caused by high buildings, trees, and elevated structures seriously affect positioning results. To address these challenges, a sky-view images segmentation algorithm based on Fully Convolutional Network (FCN) is proposed for GNSS NLOS detection. Building upon this, a novel NLOS detection and mitigation algorithm (named S-NDM) is extended to the tightly coupled Global Navigation Satellite Systems (GNSS), Inertial Measurement Units (IMU), and visual feature system which is called Sky-GVIO, with the aim of achieving continuous and accurate positioning in urban canyon environments. Furthermore, the system harmonizes Single Point Positioning (SPP) with Real-Time Kinematic (RTK) methodologies to bolster its operational versatility and resilience. In urban canyon environments, the positioning performance of S-NDM algorithm proposed in this paper is evaluated under different tightly coupled SPP-related and RTK-related models. The results exhibit that Sky-GVIO system achieves meter-level accuracy under SPP mode and sub-decimeter precision with RTK, surpassing the performance of GNSS/INS/Vision frameworks devoid of S-NDM. Additionally, the sky-view image dataset, inclusive of training and evaluation subsets, has been made publicly accessible for scholarly exploration at https://github.com/whuwangjr/sky-view-images .
KernJC: Automated Vulnerable Environment Generation for Linux Kernel Vulnerabilities
Authors: Authors: Bonan Ruan, Jiahao Liu, Chuqi Zhang, Zhenkai Liang
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2404.11107
Pdf link: https://arxiv.org/pdf/2404.11107
Abstract Linux kernel vulnerability reproduction is a critical task in system security. To reproduce a kernel vulnerability, the vulnerable environment and the Proof of Concept (PoC) program are needed. Most existing research focuses on the generation of PoC, while the construction of environment is overlooked. However, establishing an effective vulnerable environment to trigger a vulnerability is challenging. Firstly, it is hard to guarantee that the selected kernel version for reproduction is vulnerable, as the vulnerability version claims in online databases can occasionally be spurious. Secondly, many vulnerabilities can not be reproduced in kernels built with default configurations. Intricate non-default kernel configurations must be set to include and trigger a kernel vulnerability, but less information is available on how to recognize these configurations. To solve these challenges, we propose a patch-based approach to identify real vulnerable kernel versions and a graph-based approach to identify necessary configs for activating a specific vulnerability. We implement these approaches in a tool, KernJC, automating the generation of vulnerable environments for kernel vulnerabilities. To evaluate the efficacy of KernJC, we build a dataset containing 66 representative real-world vulnerabilities with PoCs from kernel vulnerability research in the past five years. The evaluation shows that KernJC builds vulnerable environments for all these vulnerabilities, 48.5% of which require non-default configs, and 4 have incorrect version claims in the National Vulnerability Database (NVD). Furthermore, we conduct large-scale spurious version detection on kernel vulnerabilities and identify 128 vulnerabilities which have spurious version claims in NVD. To foster future research, we release KernJC with the dataset in the community.
D-Aug: Enhancing Data Augmentation for Dynamic LiDAR Scenes
Authors: Authors: Jiaxing Zhao, Peng Zheng, Rui Ma
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.11127
Pdf link: https://arxiv.org/pdf/2404.11127
Abstract Creating large LiDAR datasets with pixel-level labeling poses significant challenges. While numerous data augmentation methods have been developed to reduce the reliance on manual labeling, these methods predominantly focus on static scenes and they overlook the importance of data augmentation for dynamic scenes, which is critical for autonomous driving. To address this issue, we propose D-Aug, a LiDAR data augmentation method tailored for augmenting dynamic scenes. D-Aug extracts objects and inserts them into dynamic scenes, considering the continuity of these objects across consecutive frames. For seamless insertion into dynamic scenes, we propose a reference-guided method that involves dynamic collision detection and rotation alignment. Additionally, we present a pixel-level road identification strategy to efficiently determine suitable insertion positions. We validated our method using the nuScenes dataset with various 3D detection and tracking methods. Comparative experiments demonstrate the superiority of D-Aug.
On the Performance of RIS-assisted Networks with HQAM
Authors: Authors: Thrassos K. Oikonomou, Dimitrios Tyrovolas, Sotiris A. Tegos, Panagiotis D. Diamantoulakis, Panagiotis Sarigiannidis, George K. Karagiannidis
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2404.11136
Pdf link: https://arxiv.org/pdf/2404.11136
Abstract In this paper, we investigate the application of hexagonal quadrature amplitude modulation (HQAM) in reconfigurable intelligent surface (RIS)-assisted networks, specifically focusing on its efficiency in reducing the number of required reflecting elements. Specifically, we present analytical expressions for the average symbol error probability (ASEP) and propose a new metric for conditioned energy efficiency, which assesses the network energy consumption while ensuring the ASEP remains below a certain threshold. Additionally, we introduce an innovative detection algorithm for HQAM constellations that implements sphere decoding in O(1) complexity. Finally, our study reveals that HQAM significantly enhances both the ASEP and energy efficiency compared to traditional quadrature amplitude modulation (QAM) schemes.
Personalized Heart Disease Detection via ECG Digital Twin Generation
Authors: Authors: Yaojun Hu, Jintai Chen, Lianting Hu, Dantong Li, Jiahuan Yan, Haochao Ying, Huiying Liang, Jian Wu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2404.11171
Pdf link: https://arxiv.org/pdf/2404.11171
Abstract Heart diseases rank among the leading causes of global mortality, demonstrating a crucial need for early diagnosis and intervention. Most traditional electrocardiogram (ECG) based automated diagnosis methods are trained at population level, neglecting the customization of personalized ECGs to enhance individual healthcare management. A potential solution to address this limitation is to employ digital twins to simulate symptoms of diseases in real patients. In this paper, we present an innovative prospective learning approach for personalized heart disease detection, which generates digital twins of healthy individuals' anomalous ECGs and enhances the model sensitivity to the personalized symptoms. In our approach, a vector quantized feature separator is proposed to locate and isolate the disease symptom and normal segments in ECG signals with ECG report guidance. Thus, the ECG digital twins can simulate specific heart diseases used to train a personalized heart disease detection model. Experiments demonstrate that our approach not only excels in generating high-fidelity ECG signals but also improves personalized heart disease detection. Moreover, our approach ensures robust privacy protection, safeguarding patient data in model development.
FIZZ: Factual Inconsistency Detection by Zoom-in Summary and Zoom-out Document
Authors: Authors: Joonho Yang, Seunghyun Yoon, Byeongjeong Kim, Hwanhee Lee
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.11184
Pdf link: https://arxiv.org/pdf/2404.11184
Abstract Through the advent of pre-trained language models, there have been notable advancements in abstractive summarization systems. Simultaneously, a considerable number of novel methods for evaluating factual consistency in abstractive summarization systems has been developed. But these evaluation approaches incorporate substantial limitations, especially on refinement and interpretability. In this work, we propose highly effective and interpretable factual inconsistency detection method metric Factual Inconsistency Detection by Zoom-in Summary and Zoom-out Document for abstractive summarization systems that is based on fine-grained atomic facts decomposition. Moreover, we align atomic facts decomposed from the summary with the source document through adaptive granularity expansion. These atomic facts represent a more fine-grained unit of information, facilitating detailed understanding and interpretability of the summary's factual inconsistency. Experimental results demonstrate that our proposed factual consistency checking system significantly outperforms existing systems. We release the code at https://github.com/plm3332/FIZZ.
GhostNetV3: Exploring the Training Strategies for Compact Models
Authors: Authors: Zhenhua Liu, Zhiwei Hao, Kai Han, Yehui Tang, Yunhe Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.11202
Pdf link: https://arxiv.org/pdf/2404.11202
Abstract Compact neural networks are specially designed for applications on edge devices with faster inference speed yet modest performance. However, training strategies of compact models are borrowed from that of conventional models at present, which ignores their difference in model capacity and thus may impede the performance of compact models. In this paper, by systematically investigating the impact of different training ingredients, we introduce a strong training strategy for compact models. We find that the appropriate designs of re-parameterization and knowledge distillation are crucial for training high-performance compact models, while some commonly used data augmentations for training conventional models, such as Mixup and CutMix, lead to worse performance. Our experiments on ImageNet-1K dataset demonstrate that our specialized training strategy for compact models is applicable to various architectures, including GhostNetV2, MobileNetV2 and ShuffleNetV2. Specifically, equipped with our strategy, GhostNetV3 1.3$\times$ achieves a top-1 accuracy of 79.1% with only 269M FLOPs and a latency of 14.46ms on mobile devices, surpassing its ordinarily trained counterpart by a large margin. Moreover, our observation can also be extended to object detection scenarios. PyTorch code and checkpoints can be found at https://github.com/huawei-noah/Efficient-AI-Backbones/tree/master/ghostnetv3_pytorch.
Prompt-tuning for Clickbait Detection via Text Summarization
Authors: Authors: Haoxiang Deng, Yi Zhu, Ye Wang, Jipeng Qiang, Yunhao Yuan, Yun Li, Runmei Zhang
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.11206
Pdf link: https://arxiv.org/pdf/2404.11206
Abstract Clickbaits are surprising social posts or deceptive news headlines that attempt to lure users for more clicks, which have posted at unprecedented rates for more profit or commercial revenue. The spread of clickbait has significant negative impacts on the users, which brings users misleading or even click-jacking attacks. Different from fake news, the crucial problem in clickbait detection is determining whether the headline matches the corresponding content. Most existing methods compute the semantic similarity between the headlines and contents for detecting clickbait. However, due to significant differences in length and semantic features between headlines and contents, directly calculating semantic similarity is often difficult to summarize the relationship between them. To address this problem, we propose a prompt-tuning method for clickbait detection via text summarization in this paper, text summarization is introduced to summarize the contents, and clickbait detection is performed based on the similarity between the generated summary and the contents. Specifically, we first introduce a two-stage text summarization model to produce high-quality news summaries based on pre-trained language models, and then both the headlines and new generated summaries are incorporated as the inputs for prompt-tuning. Additionally, a variety of strategies are conducted to incorporate external knowledge for improving the performance of clickbait detection. The extensive experiments on well-known clickbait detection datasets demonstrate that our method achieved state-of-the-art performance.
Feature Corrective Transfer Learning: End-to-End Solutions to Object Detection in Non-Ideal Visual Conditions
Authors: Authors: Chuheng Wei, Guoyuan Wu, Matthew J. Barth
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.11214
Pdf link: https://arxiv.org/pdf/2404.11214
Abstract A significant challenge in the field of object detection lies in the system's performance under non-ideal imaging conditions, such as rain, fog, low illumination, or raw Bayer images that lack ISP processing. Our study introduces "Feature Corrective Transfer Learning", a novel approach that leverages transfer learning and a bespoke loss function to facilitate the end-to-end detection of objects in these challenging scenarios without the need to convert non-ideal images into their RGB counterparts. In our methodology, we initially train a comprehensive model on a pristine RGB image dataset. Subsequently, non-ideal images are processed by comparing their feature maps against those from the initial ideal RGB model. This comparison employs the Extended Area Novel Structural Discrepancy Loss (EANSDL), a novel loss function designed to quantify similarities and integrate them into the detection loss. This approach refines the model's ability to perform object detection across varying conditions through direct feature map correction, encapsulating the essence of Feature Corrective Transfer Learning. Experimental validation on variants of the KITTI dataset demonstrates a significant improvement in mean Average Precision (mAP), resulting in a 3.8-8.1% relative enhancement in detection under non-ideal conditions compared to the baseline model, and a less marginal performance difference within 1.3% of the mAP@[0.5:0.95] achieved under ideal conditions by the standard Faster RCNN algorithm.
AndroLog: Android Instrumentation and Code Coverage Analysis
Authors: Authors: Jordan Samhi, Andreas Zeller
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2404.11223
Pdf link: https://arxiv.org/pdf/2404.11223
Abstract Dynamic analysis has emerged as a pivotal technique for testing Android apps, enabling the detection of bugs, malicious code, and vulnerabilities. A key metric in evaluating the efficacy of tools employed by both research and practitioner communities for this purpose is code coverage. Obtaining code coverage typically requires planting probes within apps to gather coverage data during runtime. Due to the general unavailability of source code to analysts, there is a necessity for instrumenting apps to insert these probes in black-box environments. However, the tools available for such instrumentation are limited in their reliability and require intrusive changes interfering with apps' functionalities. This paper introduces AndroLog a novel tool developed on top of the Soot framework, designed to provide fine-grained coverage information at multiple levels, including class, methods, statements, and Android components. In contrast to existing tools, AndroLog leaves the responsibility to test apps to analysts, and its motto is simplicity. As demonstrated in this paper, AndroLog can instrument up to 98% of recent Android apps compared to existing tools with 79% and 48% respectively for COSMO and ACVTool. AndroLog also stands out for its potential for future enhancements to increase granularity on demand. We make AndroLog available to the community and provide a video demonstration of AndroLog (see section 8).
Simple In-place Data Augmentation for Surveillance Object Detection
Authors: Authors: Munkh-Erdene Otgonbold, Ganzorig Batnasan, Munkhjargal Gochoo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.11226
Pdf link: https://arxiv.org/pdf/2404.11226
Abstract Motivated by the need to improve model performance in traffic monitoring tasks with limited labeled samples, we propose a straightforward augmentation technique tailored for object detection datasets, specifically designed for stationary camera-based applications. Our approach focuses on placing objects in the same positions as the originals to ensure its effectiveness. By applying in-place augmentation on objects from the same camera input image, we address the challenge of overlapping with original and previously selected objects. Through extensive testing on two traffic monitoring datasets, we illustrate the efficacy of our augmentation strategy in improving model performance, particularly in scenarios with limited labeled samples and imbalanced class distributions. Notably, our method achieves comparable performance to models trained on the entire dataset while utilizing only 8.5 percent of the original data. Moreover, we report significant improvements, with mAP@.5 increasing from 0.4798 to 0.5025, and the mAP@.5:.95 rising from 0.29 to 0.3138 on the FishEye8K dataset. These results highlight the potential of our augmentation approach in enhancing object detection models for traffic monitoring applications.
ONOT: a High-Quality ICAO-compliant Synthetic Mugshot Dataset
Authors: Authors: Nicolò Di Domenico, Guido Borghi, Annalisa Franco, Davide Maltoni
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.11236
Pdf link: https://arxiv.org/pdf/2404.11236
Abstract Nowadays, state-of-the-art AI-based generative models represent a viable solution to overcome privacy issues and biases in the collection of datasets containing personal information, such as faces. Following this intuition, in this paper we introduce ONOT, a synthetic dataset specifically focused on the generation of high-quality faces in adherence to the requirements of the ISO/IEC 39794-5 standards that, following the guidelines of the International Civil Aviation Organization (ICAO), defines the interchange formats of face images in electronic Machine-Readable Travel Documents (eMRTD). The strictly controlled and varied mugshot images included in ONOT are useful in research fields related to the analysis of face images in eMRTD, such as Morphing Attack Detection and Face Quality Assessment. The dataset is publicly released, in combination with the generation procedure details in order to improve the reproducibility and enable future extensions.
Optical Image-to-Image Translation Using Denoising Diffusion Models: Heterogeneous Change Detection as a Use Case
Authors: Authors: João Gabriel Vinholi, Marco Chini, Anis Amziane, Renato Machado, Danilo Silva, Patrick Matgen
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.11243
Pdf link: https://arxiv.org/pdf/2404.11243
Abstract We introduce an innovative deep learning-based method that uses a denoising diffusion-based model to translate low-resolution images to high-resolution ones from different optical sensors while preserving the contents and avoiding undesired artifacts. The proposed method is trained and tested on a large and diverse data set of paired Sentinel-II and Planet Dove images. We show that it can solve serious image generation issues observed when the popular classifier-free guided Denoising Diffusion Implicit Model (DDIM) framework is used in the task of Image-to-Image Translation of multi-sensor optical remote sensing images and that it can generate large images with highly consistent patches, both in colors and in features. Moreover, we demonstrate how our method improves heterogeneous change detection results in two urban areas: Beirut, Lebanon, and Austin, USA. Our contributions are: i) a new training and testing algorithm based on denoising diffusion models for optical image translation; ii) a comprehensive image quality evaluation and ablation study; iii) a comparison with the classifier-free guided DDIM framework; and iv) change detection experiments on heterogeneous data.
DACAD: Domain Adaptation Contrastive Learning for Anomaly Detection in Multivariate Time Series
Authors: Authors: Zahra Zamanzadeh Darban, Geoffrey I. Webb, Mahsa Salehi
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.11269
Pdf link: https://arxiv.org/pdf/2404.11269
Abstract Time series anomaly detection (TAD) faces a significant challenge due to the scarcity of labelled data, which hinders the development of accurate detection models. Unsupervised domain adaptation (UDA) addresses this challenge by leveraging a labelled dataset from a related domain to detect anomalies in a target dataset. Existing domain adaptation techniques assume that the number of anomalous classes does not change between the source and target domains. In this paper, we propose a novel Domain Adaptation Contrastive learning for Anomaly Detection in multivariate time series (DACAD) model to address this issue by combining UDA and contrastive representation learning. DACAD's approach includes an anomaly injection mechanism that introduces various types of synthetic anomalies, enhancing the model's ability to generalise across unseen anomalous classes in different domains. This method significantly broadens the model's adaptability and robustness. Additionally, we propose a supervised contrastive loss for the source domain and a self-supervised contrastive triplet loss for the target domain, improving comprehensive feature representation learning and extraction of domain-invariant features. Finally, an effective Centre-based Entropy Classifier (CEC) is proposed specifically for anomaly detection, facilitating accurate learning of normal boundaries in the source domain. Our extensive evaluation across multiple real-world datasets against leading models in time series anomaly detection and UDA underscores DACAD's effectiveness. The results validate DACAD's superiority in transferring knowledge across domains and its potential to mitigate the challenge of limited labelled data in time series anomaly detection.
LogSD: Detecting Anomalies from System Logs through Self-supervised Learning and Frequency-based Masking
Authors: Authors: Yongzheng Xie, Hongyu Zhang, Muhammad Ali Babar
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2404.11294
Pdf link: https://arxiv.org/pdf/2404.11294
Abstract Log analysis is one of the main techniques that engineers use for troubleshooting large-scale software systems. Over the years, many supervised, semi-supervised, and unsupervised log analysis methods have been proposed to detect system anomalies by analyzing system logs. Among these, semi-supervised methods have garnered increasing attention as they strike a balance between relaxed labeled data requirements and optimal detection performance, contrasting with their supervised and unsupervised counterparts. However, existing semi-supervised methods overlook the potential bias introduced by highly frequent log messages on the learned normal patterns, which leads to their less than satisfactory performance. In this study, we propose LogSD, a novel semi-supervised self-supervised learning approach. LogSD employs a dual-network architecture and incorporates a frequency-based masking scheme, a global-to-local reconstruction paradigm and three self-supervised learning tasks. These features enable LogSD to focus more on relatively infrequent log messages, thereby effectively learning less biased and more discriminative patterns from historical normal data. This emphasis ultimately leads to improved anomaly detection performance. Extensive experiments have been conducted on three commonly-used datasets and the results show that LogSD significantly outperforms eight state-of-the-art benchmark methods.
Achieving Rotation Invariance in Convolution Operations: Shifting from Data-Driven to Mechanism-Assured
Authors: Authors: Hanlin Mo, Guoying Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.11309
Pdf link: https://arxiv.org/pdf/2404.11309
Abstract Achieving rotation invariance in deep neural networks without relying on data has always been a hot research topic. Intrinsic rotation invariance can enhance the model's feature representation capability, enabling better performance in tasks such as multi-orientation object recognition and detection. Based on various types of non-learnable operators, including gradient, sort, local binary pattern, maximum, etc., this paper designs a set of new convolution operations that are natually invariant to arbitrary rotations. Unlike most previous studies, these rotation-invariant convolutions (RIConvs) have the same number of learnable parameters and a similar computational process as conventional convolution operations, allowing them to be interchangeable. Using the MNIST-Rot dataset, we first verify the invariance of these RIConvs under various rotation angles and compare their performance with previous rotation-invariant convolutional neural networks (RI-CNNs). Two types of RIConvs based on gradient operators achieve state-of-the-art results. Subsequently, we combine RIConvs with different types and depths of classic CNN backbones. Using the OuTex_00012, MTARSI, and NWPU-RESISC-45 datasets, we test their performance on texture recognition, aircraft type recognition, and remote sensing image classification tasks. The results show that RIConvs significantly improve the accuracy of these CNN backbones, especially when the training data is limited. Furthermore, we find that even with data augmentation, RIConvs can further enhance model performance.
Use of Parallel Explanatory Models to Enhance Transparency of Neural Network Configurations for Cell Degradation Detection
Authors: Authors: David Mulvey, Chuan Heng Foh, Muhammad Ali Imran, Rahim Tafazolli
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.11311
Pdf link: https://arxiv.org/pdf/2404.11311
Abstract In a previous paper, we have shown that a recurrent neural network (RNN) can be used to detect cellular network radio signal degradations accurately. We unexpectedly found, though, that accuracy gains diminished as we added layers to the RNN. To investigate this, in this paper, we build a parallel model to illuminate and understand the internal operation of neural networks, such as the RNN, which store their internal state in order to process sequential inputs. This model is widely applicable in that it can be used with any input domain where the inputs can be represented by a Gaussian mixture. By looking at the RNN processing from a probability density function perspective, we are able to show how each layer of the RNN transforms the input distributions to increase detection accuracy. At the same time we also discover a side effect acting to limit the improvement in accuracy. To demonstrate the fidelity of the model we validate it against each stage of RNN processing as well as the output predictions. As a result, we have been able to explain the reasons for the RNN performance limits with useful insights for future designs for RNNs and similar types of neural network.
Leveraging Fine-Grained Information and Noise Decoupling for Remote Sensing Change Detection
Authors: Authors: Qiangang Du, Jinlong Peng, Changan Wang, Xu Chen, Qingdong He, Wenbing Zhu, Mingmin Chi, Yabiao Wang, Chengjie Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.11318
Pdf link: https://arxiv.org/pdf/2404.11318
Abstract Change detection aims to identify remote sense object changes by analyzing data between bitemporal image pairs. Due to the large temporal and spatial span of data collection in change detection image pairs, there are often a significant amount of task-specific and task-agnostic noise. Previous effort has focused excessively on denoising, with this goes a great deal of loss of fine-grained information. In this paper, we revisit the importance of fine-grained features in change detection and propose a series of operations for fine-grained information compensation and noise decoupling (FINO). First, the context is utilized to compensate for the fine-grained information in the feature space. Next, a shape-aware and a brightness-aware module are designed to improve the capacity for representation learning. The shape-aware module guides the backbone for more precise shape estimation, guiding the backbone network in extracting object shape features. The brightness-aware module learns a overall brightness estimation to improve the model's robustness to task-agnostic noise. Finally, a task-specific noise decoupling structure is designed as a way to improve the model's ability to separate noise interference from feature similarity. With these training schemes, our proposed method achieves new state-of-the-art (SOTA) results in multiple change detection benchmarks. The code will be made available.
Single-temporal Supervised Remote Change Detection for Domain Generalization
Authors: Authors: Qiangang Du, Jinlong Peng, Xu Chen, Qingdong He, Qiang Nie, Wenbing Zhu, Mingmin Chi, Yabiao Wang, Chengjie Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.11326
Pdf link: https://arxiv.org/pdf/2404.11326
Abstract Change detection is widely applied in remote sensing image analysis. Existing methods require training models separately for each dataset, which leads to poor domain generalization. Moreover, these methods rely heavily on large amounts of high-quality pair-labelled data for training, which is expensive and impractical. In this paper, we propose a multimodal contrastive learning (ChangeCLIP) based on visual-language pre-training for change detection domain generalization. Additionally, we propose a dynamic context optimization for prompt learning. Meanwhile, to address the data dependency issue of existing methods, we introduce a single-temporal and controllable AI-generated training strategy (SAIN). This allows us to train the model using a large number of single-temporal images without image pairs in the real world, achieving excellent generalization. Extensive experiments on series of real change detection datasets validate the superiority and strong generalization of ChangeCLIP, outperforming state-of-the-art change detection methods. Code will be available.
The Causal Chambers: Real Physical Systems as a Testbed for AI Methodology
Authors: Authors: Juan L. Gamella, Jonas Peters, Peter Bühlmann
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2404.11341
Pdf link: https://arxiv.org/pdf/2404.11341
Abstract In some fields of AI, machine learning and statistics, the validation of new methods and algorithms is often hindered by the scarcity of suitable real-world datasets. Researchers must often turn to simulated data, which yields limited information about the applicability of the proposed methods to real problems. As a step forward, we have constructed two devices that allow us to quickly and inexpensively produce large datasets from non-trivial but well-understood physical systems. The devices, which we call causal chambers, are computer-controlled laboratories that allow us to manipulate and measure an array of variables from these physical systems, providing a rich testbed for algorithms from a variety of fields. We illustrate potential applications through a series of case studies in fields such as causal discovery, out-of-distribution generalization, change point detection, independent component analysis, and symbolic regression. For applications to causal inference, the chambers allow us to carefully perform interventions. We also provide and empirically validate a causal model of each chamber, which can be used as ground truth for different tasks. All hardware and software is made open source, and the datasets are publicly available at causalchamber.org or through the Python package causalchamber.
Calibrating Bayesian Learning via Regularization, Confidence Minimization, and Selective Inference
Authors: Authors: Jiayi Huang, Sangwoo Park, Osvaldo Simeone
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2404.11350
Pdf link: https://arxiv.org/pdf/2404.11350
Abstract The application of artificial intelligence (AI) models in fields such as engineering is limited by the known difficulty of quantifying the reliability of an AI's decision. A well-calibrated AI model must correctly report its accuracy on in-distribution (ID) inputs, while also enabling the detection of out-of-distribution (OOD) inputs. A conventional approach to improve calibration is the application of Bayesian ensembling. However, owing to computational limitations and model misspecification, practical ensembling strategies do not necessarily enhance calibration. This paper proposes an extension of variational inference (VI)-based Bayesian learning that integrates calibration regularization for improved ID performance, confidence minimization for OOD detection, and selective calibration to ensure a synergistic use of calibration regularization and confidence minimization. The scheme is constructed successively by first introducing calibration-regularized Bayesian learning (CBNN), then incorporating out-of-distribution confidence minimization (OCM) to yield CBNN-OCM, and finally integrating also selective calibration to produce selective CBNN-OCM (SCBNN-OCM). Selective calibration rejects inputs for which the calibration performance is expected to be insufficient. Numerical results illustrate the trade-offs between ID accuracy, ID calibration, and OOD calibration attained by both frequentist and Bayesian learning methods. Among the main conclusions, SCBNN-OCM is seen to achieve best ID and OOD performance as compared to existing state-of-the-art approaches at the cost of rejecting a sufficiently large number of inputs.
Consisaug: A Consistency-based Augmentation for Polyp Detection in Endoscopy Image Analysis
Authors: Authors: Ziyu Zhou, Wenyuan Shen, Chang Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.11355
Pdf link: https://arxiv.org/pdf/2404.11355
Abstract Colorectal cancer (CRC), which frequently originates from initially benign polyps, remains a significant contributor to global cancer-related mortality. Early and accurate detection of these polyps via colonoscopy is crucial for CRC prevention. However, traditional colonoscopy methods depend heavily on the operator's experience, leading to suboptimal polyp detection rates. Besides, the public database are limited in polyp size and shape diversity. To enhance the available data for polyp detection, we introduce Consisaug, an innovative and effective methodology to augment data that leverages deep learning. We utilize the constraint that when the image is flipped the class label should be equal and the bonding boxes should be consistent. We implement our Consisaug on five public polyp datasets and at three backbones, and the results show the effectiveness of our method.
Detector Collapse: Backdooring Object Detection to Catastrophic Overload or Blindness
Authors: Authors: Hangtao Zhang, Shengshan Hu, Yichen Wang, Leo Yu Zhang, Ziqi Zhou, Xianlong Wang, Yanjun Zhang, Chao Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.11357
Pdf link: https://arxiv.org/pdf/2404.11357
Abstract Object detection tasks, crucial in safety-critical systems like autonomous driving, focus on pinpointing object locations. These detectors are known to be susceptible to backdoor attacks. However, existing backdoor techniques have primarily been adapted from classification tasks, overlooking deeper vulnerabilities specific to object detection. This paper is dedicated to bridging this gap by introducing Detector Collapse} (DC), a brand-new backdoor attack paradigm tailored for object detection. DC is designed to instantly incapacitate detectors (i.e., severely impairing detector's performance and culminating in a denial-of-service). To this end, we develop two innovative attack schemes: Sponge for triggering widespread misidentifications and Blinding for rendering objects invisible. Remarkably, we introduce a novel poisoning strategy exploiting natural objects, enabling DC to act as a practical backdoor in real-world environments. Our experiments on different detectors across several benchmarks show a significant improvement ($\sim$10\%-60\% absolute and $\sim$2-7$\times$ relative) in attack efficacy over state-of-the-art attacks.
SERENE: A Collusion Resilient Replication-based Verification Framework
Authors: Authors: Amir Esmaeili, Abderrahmen Mtibaa
Subjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2404.11410
Pdf link: https://arxiv.org/pdf/2404.11410
Abstract The rapid advancement of autonomous driving technology is accompanied by substantial challenges, particularly the reliance on remote task execution without ensuring a reliable and accurate returned results. This reliance on external compute servers, which may be malicious or rogue, represents a major security threat. While researchers have been exploring verifiable computing, and replication-based task verification as a simple, fast, and dependable method to assess the correctness of results. However, colluding malicious workers can easily defeat this method. Existing collusion detection and mitigation solutions often require the use of a trusted third party server or verified tasks which may be hard to guarantee, or solutions that assume the presence of a minority of colluding servers. We propose SERENE a collusion resilient replication-based verification framework that detects, and mitigates colluding workers. Unlike state-of-the-art solutions, SERENE uses a lightweight detection algorithm that detects collusion based on a single verification task. Mitigation requires a two stage process to group the workers and identifying colluding from honest workers. We implement and compare SERENE's performance to Staab et. al, resulting in an average of 50\% and 60\% accuracy improvement in detection and mitigation accuracy respectively.
EcoMLS: A Self-Adaptation Approach for Architecting Green ML-Enabled Systems
Authors: Authors: Meghana Tedla, Shubham Kulkarni, Karthik Vaidhyanathan
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2404.11411
Pdf link: https://arxiv.org/pdf/2404.11411
Abstract The sustainability of Machine Learning-Enabled Systems (MLS), particularly with regard to energy efficiency, is an important challenge in their development and deployment. Self-adaptation techniques, recognized for their potential in energy savings within software systems, have yet to be extensively explored in Machine Learning-Enabled Systems (MLS), where runtime uncertainties can significantly impact model performance and energy consumption. This variability, alongside the fluctuating energy demands of ML models during operation, necessitates a dynamic approach. Addressing these challenges, we introduce EcoMLS approach, which leverages the Machine Learning Model Balancer concept to enhance the sustainability of MLS through runtime ML model switching. By adapting to monitored runtime conditions, EcoMLS optimally balances energy consumption with model confidence, demonstrating a significant advancement towards sustainable, energy-efficient machine learning solutions. Through an object detection exemplar, we illustrate the application of EcoMLS, showcasing its ability to reduce energy consumption while maintaining high model accuracy throughout its use. This research underscores the feasibility of enhancing MLS sustainability through intelligent runtime adaptations, contributing a valuable perspective to the ongoing discourse on energy-efficient machine learning.
SPAMming Labels: Efficient Annotations for the Trackers of Tomorrow
Authors: Authors: Orcun Cetintas, Tim Meinhardt, Guillem Brasó, Laura Leal-Taixé
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.11426
Pdf link: https://arxiv.org/pdf/2404.11426
Abstract Increasing the annotation efficiency of trajectory annotations from videos has the potential to enable the next generation of data-hungry tracking algorithms to thrive on large-scale datasets. Despite the importance of this task, there are currently very few works exploring how to efficiently label tracking datasets comprehensively. In this work, we introduce SPAM, a tracking data engine that provides high-quality labels with minimal human intervention. SPAM is built around two key insights: i) most tracking scenarios can be easily resolved. To take advantage of this, we utilize a pre-trained model to generate high-quality pseudo-labels, reserving human involvement for a smaller subset of more difficult instances; ii) handling the spatiotemporal dependencies of track annotations across time can be elegantly and efficiently formulated through graphs. Therefore, we use a unified graph formulation to address the annotation of both detections and identity association for tracks across time. Based on these insights, SPAM produces high-quality annotations with a fraction of ground truth labeling cost. We demonstrate that trackers trained on SPAM labels achieve comparable performance to those trained on human annotations while requiring only 3-20% of the human labeling effort. Hence, SPAM paves the way towards highly efficient labeling of large-scale tracking datasets. Our code and models will be available upon acceptance.
CarcassFormer: An End-to-end Transformer-based Framework for Simultaneous Localization, Segmentation and Classification of Poultry Carcass Defect
Authors: Authors: Minh Tran, Sang Truong, Arthur F. A. Fernandes, Michael T. Kidd, Ngan Le
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.11429
Pdf link: https://arxiv.org/pdf/2404.11429
Abstract In the food industry, assessing the quality of poultry carcasses during processing is a crucial step. This study proposes an effective approach for automating the assessment of carcass quality without requiring skilled labor or inspector involvement. The proposed system is based on machine learning (ML) and computer vision (CV) techniques, enabling automated defect detection and carcass quality assessment. To this end, an end-to-end framework called CarcassFormer is introduced. It is built upon a Transformer-based architecture designed to effectively extract visual representations while simultaneously detecting, segmenting, and classifying poultry carcass defects. Our proposed framework is capable of analyzing imperfections resulting from production and transport welfare issues, as well as processing plant stunner, scalder, picker, and other equipment malfunctions. To benchmark the framework, a dataset of 7,321 images was initially acquired, which contained both single and multiple carcasses per image. In this study, the performance of the CarcassFormer system is compared with other state-of-the-art (SOTA) approaches for both classification, detection, and segmentation tasks. Through extensive quantitative experiments, our framework consistently outperforms existing methods, demonstrating remarkable improvements across various evaluation metrics such as AP, AP@50, and AP@75. Furthermore, the qualitative results highlight the strengths of CarcassFormer in capturing fine details, including feathers, and accurately localizing and segmenting carcasses with high precision. To facilitate further research and collaboration, the pre-trained model and source code of CarcassFormer is available for research purposes at: \url{https://github.com/UARK-AICV/CarcassFormer}.
Multi-resolution Rescored ByteTrack for Video Object Detection on Ultra-low-power Embedded Systems
Authors: Authors: Luca Bompani, Manuele Rusci, Daniele Palossi, Francesco Conti, Luca Benini
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.11488
Pdf link: https://arxiv.org/pdf/2404.11488
Abstract This paper introduces Multi-Resolution Rescored Byte-Track (MR2-ByteTrack), a novel video object detection framework for ultra-low-power embedded processors. This method reduces the average compute load of an off-the-shelf Deep Neural Network (DNN) based object detector by up to 2.25$\times$ by alternating the processing of high-resolution images (320$\times$320 pixels) with multiple down-sized frames (192$\times$192 pixels). To tackle the accuracy degradation due to the reduced image input size, MR2-ByteTrack correlates the output detections over time using the ByteTrack tracker and corrects potential misclassification using a novel probabilistic Rescore algorithm. By interleaving two down-sized images for every high-resolution one as the input of different state-of-the-art DNN object detectors with our MR2-ByteTrack, we demonstrate an average accuracy increase of 2.16% and a latency reduction of 43% on the GAP9 microcontroller compared to a baseline frame-by-frame inference scheme using exclusively full-resolution images. Code available at: https://github.com/Bomps4/Multi_Resolution_Rescored_ByteTrack
Strategic Network Inspection with Location-Specific Detection Capabilities
Authors: Authors: Bastián Bahamondes, Mathieu Dahan
Subjects: Computer Science and Game Theory (cs.GT)
Arxiv link: https://arxiv.org/abs/2404.11545
Pdf link: https://arxiv.org/pdf/2404.11545
Abstract We consider a two-person network inspection game, in which a defender positions a limited number of detectors to detect multiple attacks caused by an attacker. We assume that detection is imperfect, and each detector location is associated with a probability of detecting attacks within its set of monitored network components. The objective of the defender (resp. attacker) is to minimize (resp. maximize) the expected number of undetected attacks. To compute Nash Equilibria (NE) for this large-scale zero-sum game, we formulate a linear program with a small number of constraints, which we solve via column generation. We provide an exact mixed-integer program for the pricing problem, which entails computing a defender's pure best response, and leverage its supermodular structure to derive two efficient approaches to obtain approximate NE with theoretical guarantees: A column generation and a multiplicative weights update (MWU) algorithm with approximate best responses. To address the computational challenges posed by combinatorial attacker strategies, each iteration of our MWU algorithm requires computing a projection under the unnormalized relative entropy. We provide a closed-form solution and a linear-time algorithm for the projection problem. Our computational results in real-world gas distribution networks illustrate the performance and scalability of our solution approaches.
Explainable Artificial Intelligence Techniques for Accurate Fault Detection and Diagnosis: A Review
Authors: Authors: Ahmed Maged, Salah Haridy, Herman Shen
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.11597
Pdf link: https://arxiv.org/pdf/2404.11597
Abstract As the manufacturing industry advances with sensor integration and automation, the opaque nature of deep learning models in machine learning poses a significant challenge for fault detection and diagnosis. And despite the related predictive insights Artificial Intelligence (AI) can deliver, advanced machine learning engines often remain a black box. This paper reviews the eXplainable AI (XAI) tools and techniques in this context. We explore various XAI methodologies, focusing on their role in making AI decision-making transparent, particularly in critical scenarios where humans are involved. We also discuss current limitations and potential future research that aims to balance explainability with model performance while improving trustworthiness in the context of AI applications for critical industrial use cases.
Variational Bayesian Last Layers
Authors: Authors: James Harrison, John Willes, Jasper Snoek
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2404.11599
Pdf link: https://arxiv.org/pdf/2404.11599
Abstract We introduce a deterministic variational formulation for training Bayesian last layer neural networks. This yields a sampling-free, single-pass model and loss that effectively improves uncertainty estimation. Our variational Bayesian last layer (VBLL) can be trained and evaluated with only quadratic complexity in last layer width, and is thus (nearly) computationally free to add to standard architectures. We experimentally investigate VBLLs, and show that they improve predictive accuracy, calibration, and out of distribution detection over baselines across both regression and classification. Finally, we investigate combining VBLL layers with variational Bayesian feature learning, yielding a lower variance collapsed variational inference method for Bayesian neural networks.
Keyword: face recognition

MHLR: Moving Haar Learning Rate Scheduler for Large-scale Face Recognition Training with One GPU
Authors: Authors: Xueyuan Gong, Yain-whar Si, Zheng Zhang, Xiaochen Yuan, Ke Wang, Xinyuan Zhang, Cong Lin, Xiaoxiang Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.11118
Pdf link: https://arxiv.org/pdf/2404.11118
Abstract Face recognition (FR) has seen significant advancements due to the utilization of large-scale datasets. Training deep FR models on large-scale datasets with multiple GPUs is now a common practice. In fact, computing power has evolved into a foundational and indispensable resource in the area of deep learning. It is nearly impossible to train a deep FR model without holding adequate hardware resources. Recognizing this challenge, some FR approaches have started exploring ways to reduce the time complexity of the fully-connected layer in FR models. Unlike other approaches, this paper introduces a simple yet highly effective approach, Moving Haar Learning Rate (MHLR) scheduler, for scheduling the learning rate promptly and accurately in the training process. MHLR supports large-scale FR training with only one GPU, which is able to accelerate the model to 1/4 of its original training time without sacrificing more than 1% accuracy. More specifically, MHLR only needs $30$ hours to train the model ResNet100 on the dataset WebFace12M containing more than 12M face images with 0.6M identities. Extensive experiments validate the efficiency and effectiveness of MHLR.
Keyword: augmentation

Incubating Text Classifiers Following User Instruction with Nothing but LLM
Authors: Authors: Letian Peng, Jingbo Shang
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.10877
Pdf link: https://arxiv.org/pdf/2404.10877
Abstract In this paper, we aim to generate text classification data given arbitrary class definitions (i.e., user instruction), so one can train a small text classifier without any human annotation or raw corpus. Compared with pioneer attempts, our proposed Incubator is the first framework that can handle complicated and even mutually dependent classes (e.g., "TED Talk given by Educator" and "Other"). Specifically, Incubator is an LLM firstly tuned on the instruction-to-data mappings that we obtained from classification datasets and descriptions on HuggingFace together with in-context augmentation by GPT-4. We then refine Incubator by learning on the cluster centers of semantic textual embeddings to emphasize the uniformity and semantic diversity in generations. We compare Incubator on various classification tasks with strong baselines such as direct LLM-based inference and training data generation by prompt engineering. Experiments show Incubator is able to (1) perform well on traditional benchmarks, (2) take label dependency and user preference into consideration, and (3) enable logical text mining by incubating multiple classifiers.
Exploring Augmentation and Cognitive Strategies for AI based Synthetic Personae
Authors: Authors: Rafael Arias Gonzalez, Steve DiPaola
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2404.10890
Pdf link: https://arxiv.org/pdf/2404.10890
Abstract Large language models (LLMs) hold potential for innovative HCI research, including the creation of synthetic personae. However, their black-box nature and propensity for hallucinations pose challenges. To address these limitations, this position paper advocates for using LLMs as data augmentation systems rather than zero-shot generators. We further propose the development of robust cognitive and memory frameworks to guide LLM responses. Initial explorations suggest that data enrichment, episodic memory, and self-reflection techniques can improve the reliability of synthetic personae and open up new avenues for HCI research.
A Concise Tiling Strategy for Preserving Spatial Context in Earth Observation Imagery
Authors: Authors: Ellianna Abrahams, Tasha Snow, Matthew R. Siegfried, Fernando Pérez
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2404.10927
Pdf link: https://arxiv.org/pdf/2404.10927
Abstract We propose a new tiling strategy, Flip-n-Slide, which has been developed for specific use with large Earth observation satellite images when the location of objects-of-interest (OoI) is unknown and spatial context can be necessary for class disambiguation. Flip-n-Slide is a concise and minimalistic approach that allows OoI to be represented at multiple tile positions and orientations. This strategy introduces multiple views of spatio-contextual information, without introducing redundancies into the training set. By maintaining distinct transformation permutations for each tile overlap, we enhance the generalizability of the training set without misrepresenting the true data distribution. Our experiments validate the effectiveness of Flip-n-Slide in the task of semantic segmentation, a necessary data product in geophysical studies. We find that Flip-n-Slide outperforms the previous state-of-the-art augmentation routines for tiled data in all evaluation metrics. For underrepresented classes, Flip-n-Slide increases precision by as much as 15.8%.
More Room for Language: Investigating the Effect of Retrieval on Language Models
Authors: Authors: David Samuel, Lucas Georges Gabriel Charpentier, Sondre Wold
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.10939
Pdf link: https://arxiv.org/pdf/2404.10939
Abstract Retrieval-augmented language models pose a promising alternative to standard language modeling. During pretraining, these models search in a corpus of documents for contextually relevant information that could aid the language modeling objective. We introduce an 'ideal retrieval' methodology to study these models in a fully controllable setting. We conduct an extensive evaluation to examine how retrieval augmentation affects the behavior of the underlying language model. Among other things, we observe that these models: i) save substantially less world knowledge in their weights, ii) are better at understanding local context and inter-word dependencies, but iii) are worse at comprehending global context.
Domain-Specific Block Selection and Paired-View Pseudo-Labeling for Online Test-Time Adaptation
Authors: Authors: Yeonguk Yu, Sungho Shin, Seunghyeok Back, Minhwan Ko, Sangjun Noh, Kyoobin Lee
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.10966
Pdf link: https://arxiv.org/pdf/2404.10966
Abstract Test-time adaptation (TTA) aims to adapt a pre-trained model to a new test domain without access to source data after deployment. Existing approaches typically rely on self-training with pseudo-labels since ground-truth cannot be obtained from test data. Although the quality of pseudo labels is important for stable and accurate long-term adaptation, it has not been previously addressed. In this work, we propose DPLOT, a simple yet effective TTA framework that consists of two components: (1) domain-specific block selection and (2) pseudo-label generation using paired-view images. Specifically, we select blocks that involve domain-specific feature extraction and train these blocks by entropy minimization. After blocks are adjusted for current test domain, we generate pseudo-labels by averaging given test images and corresponding flipped counterparts. By simply using flip augmentation, we prevent a decrease in the quality of the pseudo-labels, which can be caused by the domain gap resulting from strong augmentation. Our experimental results demonstrate that DPLOT outperforms previous TTA methods in CIFAR10-C, CIFAR100-C, and ImageNet-C benchmarks, reducing error by up to 5.4%, 9.1%, and 2.9%, respectively. Also, we provide an extensive analysis to demonstrate effectiveness of our framework. Code is available at https://github.com/gist-ailab/domain-specific-block-selection-and-paired-view-pseudo-labeling-for-online-TTA.
CORE: Data Augmentation for Link Prediction via Information Bottleneck
Authors: Authors: Kaiwen Dong, Zhichun Guo, Nitesh V. Chawla
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2404.11032
Pdf link: https://arxiv.org/pdf/2404.11032
Abstract Link prediction (LP) is a fundamental task in graph representation learning, with numerous applications in diverse domains. However, the generalizability of LP models is often compromised due to the presence of noisy or spurious information in graphs and the inherent incompleteness of graph data. To address these challenges, we draw inspiration from the Information Bottleneck principle and propose a novel data augmentation method, COmplete and REduce (CORE) to learn compact and predictive augmentations for LP models. In particular, CORE aims to recover missing edges in graphs while simultaneously removing noise from the graph structures, thereby enhancing the model's robustness and performance. Extensive experiments on multiple benchmark datasets demonstrate the applicability and superiority of CORE over state-of-the-art methods, showcasing its potential as a leading approach for robust LP in graph representation learning.
LADDER: An Efficient Framework for Video Frame Interpolation
Authors: Authors: Tong Shen, Dong Li, Ziheng Gao, Lu Tian, Emad Barsoum
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.11108
Pdf link: https://arxiv.org/pdf/2404.11108
Abstract Video Frame Interpolation (VFI) is a crucial technique in various applications such as slow-motion generation, frame rate conversion, video frame restoration etc. This paper introduces an efficient video frame interpolation framework that aims to strike a favorable balance between efficiency and quality. Our framework follows a general paradigm consisting of a flow estimator and a refinement module, while incorporating carefully designed components. First of all, we adopt depth-wise convolution with large kernels in the flow estimator that simultaneously reduces the parameters and enhances the receptive field for encoding rich context and handling complex motion. Secondly, diverging from a common design for the refinement module with a UNet-structure (encoder-decoder structure), which we find redundant, our decoder-only refinement module directly enhances the result from coarse to fine features, offering a more efficient process. In addition, to address the challenge of handling high-definition frames, we also introduce an innovative HD-aware augmentation strategy during training, leading to consistent enhancement on HD images. Extensive experiments are conducted on diverse datasets, Vimeo90K, UCF101, Xiph and SNU-FILM. The results demonstrate that our approach achieves state-of-the-art performance with clear improvement while requiring much less FLOPs and parameters, reaching to a better spot for balancing efficiency and quality.
Consistency Training by Synthetic Question Generation for Conversational Question Answering
Authors: Authors: Hamed Hematian Hemati, Hamid Beigy
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.11109
Pdf link: https://arxiv.org/pdf/2404.11109
Abstract Efficiently modeling historical information is a critical component in addressing user queries within a conversational question-answering (QA) context, as historical context plays a vital role in clarifying the user's questions. However, irrelevant history induces noise in the reasoning process, especially for those questions with a considerable historical context. In our novel model-agnostic approach, referred to as CoTaH (Consistency-Trained augmented History), we augment the historical information with synthetic questions and subsequently employ consistency training to train a model that utilizes both real and augmented historical data to implicitly make the reasoning robust to irrelevant history. To the best of our knowledge, this is the first instance of research using question generation as a form of data augmentation to model conversational QA settings. By citing a common modeling error prevalent in previous research, we introduce a new baseline model and compare our model's performance against it, demonstrating an improvement in results, particularly when dealing with questions that include a substantial amount of historical context. The source code can be found on our GitHub page.
D-Aug: Enhancing Data Augmentation for Dynamic LiDAR Scenes
Authors: Authors: Jiaxing Zhao, Peng Zheng, Rui Ma
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.11127
Pdf link: https://arxiv.org/pdf/2404.11127
Abstract Creating large LiDAR datasets with pixel-level labeling poses significant challenges. While numerous data augmentation methods have been developed to reduce the reliance on manual labeling, these methods predominantly focus on static scenes and they overlook the importance of data augmentation for dynamic scenes, which is critical for autonomous driving. To address this issue, we propose D-Aug, a LiDAR data augmentation method tailored for augmenting dynamic scenes. D-Aug extracts objects and inserts them into dynamic scenes, considering the continuity of these objects across consecutive frames. For seamless insertion into dynamic scenes, we propose a reference-guided method that involves dynamic collision detection and rotation alignment. Additionally, we present a pixel-level road identification strategy to efficiently determine suitable insertion positions. We validated our method using the nuScenes dataset with various 3D detection and tracking methods. Comparative experiments demonstrate the superiority of D-Aug.
GhostNetV3: Exploring the Training Strategies for Compact Models
Authors: Authors: Zhenhua Liu, Zhiwei Hao, Kai Han, Yehui Tang, Yunhe Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.11202
Pdf link: https://arxiv.org/pdf/2404.11202
Abstract Compact neural networks are specially designed for applications on edge devices with faster inference speed yet modest performance. However, training strategies of compact models are borrowed from that of conventional models at present, which ignores their difference in model capacity and thus may impede the performance of compact models. In this paper, by systematically investigating the impact of different training ingredients, we introduce a strong training strategy for compact models. We find that the appropriate designs of re-parameterization and knowledge distillation are crucial for training high-performance compact models, while some commonly used data augmentations for training conventional models, such as Mixup and CutMix, lead to worse performance. Our experiments on ImageNet-1K dataset demonstrate that our specialized training strategy for compact models is applicable to various architectures, including GhostNetV2, MobileNetV2 and ShuffleNetV2. Specifically, equipped with our strategy, GhostNetV3 1.3$\times$ achieves a top-1 accuracy of 79.1% with only 269M FLOPs and a latency of 14.46ms on mobile devices, surpassing its ordinarily trained counterpart by a large margin. Moreover, our observation can also be extended to object detection scenarios. PyTorch code and checkpoints can be found at https://github.com/huawei-noah/Efficient-AI-Backbones/tree/master/ghostnetv3_pytorch.
Simple In-place Data Augmentation for Surveillance Object Detection
Authors: Authors: Munkh-Erdene Otgonbold, Ganzorig Batnasan, Munkhjargal Gochoo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.11226
Pdf link: https://arxiv.org/pdf/2404.11226
Abstract Motivated by the need to improve model performance in traffic monitoring tasks with limited labeled samples, we propose a straightforward augmentation technique tailored for object detection datasets, specifically designed for stationary camera-based applications. Our approach focuses on placing objects in the same positions as the originals to ensure its effectiveness. By applying in-place augmentation on objects from the same camera input image, we address the challenge of overlapping with original and previously selected objects. Through extensive testing on two traffic monitoring datasets, we illustrate the efficacy of our augmentation strategy in improving model performance, particularly in scenarios with limited labeled samples and imbalanced class distributions. Notably, our method achieves comparable performance to models trained on the entire dataset while utilizing only 8.5 percent of the original data. Moreover, we report significant improvements, with mAP@.5 increasing from 0.4798 to 0.5025, and the mAP@.5:.95 rising from 0.29 to 0.3138 on the FishEye8K dataset. These results highlight the potential of our augmentation approach in enhancing object detection models for traffic monitoring applications.
The Victim and The Beneficiary: Exploiting a Poisoned Model to Train a Clean Model on Poisoned Data
Authors: Authors: Zixuan Zhu, Rui Wang, Cong Zou, Lihua Jing
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.11265
Pdf link: https://arxiv.org/pdf/2404.11265
Abstract Recently, backdoor attacks have posed a serious security threat to the training process of deep neural networks (DNNs). The attacked model behaves normally on benign samples but outputs a specific result when the trigger is present. However, compared with the rocketing progress of backdoor attacks, existing defenses are difficult to deal with these threats effectively or require benign samples to work, which may be unavailable in real scenarios. In this paper, we find that the poisoned samples and benign samples can be distinguished with prediction entropy. This inspires us to propose a novel dual-network training framework: The Victim and The Beneficiary (V&B), which exploits a poisoned model to train a clean model without extra benign samples. Firstly, we sacrifice the Victim network to be a powerful poisoned sample detector by training on suspicious samples. Secondly, we train the Beneficiary network on the credible samples selected by the Victim to inhibit backdoor injection. Thirdly, a semi-supervised suppression strategy is adopted for erasing potential backdoors and improving model performance. Furthermore, to better inhibit missed poisoned samples, we propose a strong data augmentation method, AttentionMix, which works well with our proposed V&B framework. Extensive experiments on two widely used datasets against 6 state-of-the-art attacks demonstrate that our framework is effective in preventing backdoor injection and robust to various attacks while maintaining the performance on benign samples. Our code is available at https://github.com/Zixuan-Zhu/VaB.
Achieving Rotation Invariance in Convolution Operations: Shifting from Data-Driven to Mechanism-Assured
Authors: Authors: Hanlin Mo, Guoying Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.11309
Pdf link: https://arxiv.org/pdf/2404.11309
Abstract Achieving rotation invariance in deep neural networks without relying on data has always been a hot research topic. Intrinsic rotation invariance can enhance the model's feature representation capability, enabling better performance in tasks such as multi-orientation object recognition and detection. Based on various types of non-learnable operators, including gradient, sort, local binary pattern, maximum, etc., this paper designs a set of new convolution operations that are natually invariant to arbitrary rotations. Unlike most previous studies, these rotation-invariant convolutions (RIConvs) have the same number of learnable parameters and a similar computational process as conventional convolution operations, allowing them to be interchangeable. Using the MNIST-Rot dataset, we first verify the invariance of these RIConvs under various rotation angles and compare their performance with previous rotation-invariant convolutional neural networks (RI-CNNs). Two types of RIConvs based on gradient operators achieve state-of-the-art results. Subsequently, we combine RIConvs with different types and depths of classic CNN backbones. Using the OuTex_00012, MTARSI, and NWPU-RESISC-45 datasets, we test their performance on texture recognition, aircraft type recognition, and remote sensing image classification tasks. The results show that RIConvs significantly improve the accuracy of these CNN backbones, especially when the training data is limited. Furthermore, we find that even with data augmentation, RIConvs can further enhance model performance.
JointViT: Modeling Oxygen Saturation Levels with Joint Supervision on Long-Tailed OCTA
Authors: Authors: Zeyu Zhang, Xuyin Qi, Mingxi Chen, Guangxi Li, Ryan Pham, Ayub Zuhair, Ella Berry, Zhibin Liao, Owen Siggs, Robert Mclaughlin, Jamie Craig, Minh-Son To
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2404.11525
Pdf link: https://arxiv.org/pdf/2404.11525
Abstract The oxygen saturation level in the blood (SaO2) is crucial for health, particularly in relation to sleep-related breathing disorders. However, continuous monitoring of SaO2 is time-consuming and highly variable depending on patients' conditions. Recently, optical coherence tomography angiography (OCTA) has shown promising development in rapidly and effectively screening eye-related lesions, offering the potential for diagnosing sleep-related disorders. To bridge this gap, our paper presents three key contributions. Firstly, we propose JointViT, a novel model based on the Vision Transformer architecture, incorporating a joint loss function for supervision. Secondly, we introduce a balancing augmentation technique during data preprocessing to improve the model's performance, particularly on the long-tail distribution within the OCTA dataset. Lastly, through comprehensive experiments on the OCTA dataset, our proposed method significantly outperforms other state-of-the-art methods, achieving improvements of up to 12.28% in overall accuracy. This advancement lays the groundwork for the future utilization of OCTA in diagnosing sleep-related disorders. See project website https://steve-zeyu-zhang.github.io/JointViT

LeeKyungwook / get-arxiv-noti

New submissions for Thu, 18 Apr 24 #1068

Keyword: detection

Authenticity in Authorship: The Writer's Integrity Framework for Verifying Human-Generated Text

Graph Vertex Embeddings: Distance, Regularization and Community Detection

Multimodal Attack Detection for Action Recognition Models

Reconfigurable Edge Hardware for Intelligent IDS: Systematic Approach

Black-box Adversarial Transferability: An Empirical Study in Cybersecurity Perspective

Advancing Network Intrusion Detection: Integrating Graph Neural Networks with Scattering Transform and Node2Vec for Enhanced Anomaly Detection

Unsupervised Speaker Diarization in Distributed IoT Networks Using Federated Learning

UruDendro, a public dataset of cross-section images of Pinus taeda

OSR-ViT: A Simple and Modular Framework for Open-Set Object Detection and Discovery

Neuromorphic Vision-based Motion Segmentation with Graph Transformer Neural Network

Leveraging 3D LiDAR Sensors to Enable Enhanced Urban Safety and Public Health: Pedestrian Monitoring and Abnormal Activity Detection

Pixel-Wise Symbol Spotting via Progressive Points Location for Parsing CAD Images

How to deal with glare for improved perception of Autonomous Vehicles

Cross-Platform Hate Speech Detection with Weakly Supervised Causal Disentanglement

Multilateral Temporal-view Pyramid Transformer for Video Inpainting Detection

Sky-GVIO: an enhanced GNSS/INS/Vision navigation with FCN-based sky-segmentation in urban canyon

KernJC: Automated Vulnerable Environment Generation for Linux Kernel Vulnerabilities

D-Aug: Enhancing Data Augmentation for Dynamic LiDAR Scenes

On the Performance of RIS-assisted Networks with HQAM

Personalized Heart Disease Detection via ECG Digital Twin Generation

FIZZ: Factual Inconsistency Detection by Zoom-in Summary and Zoom-out Document

GhostNetV3: Exploring the Training Strategies for Compact Models

Prompt-tuning for Clickbait Detection via Text Summarization

Feature Corrective Transfer Learning: End-to-End Solutions to Object Detection in Non-Ideal Visual Conditions

AndroLog: Android Instrumentation and Code Coverage Analysis

Simple In-place Data Augmentation for Surveillance Object Detection

ONOT: a High-Quality ICAO-compliant Synthetic Mugshot Dataset

Optical Image-to-Image Translation Using Denoising Diffusion Models: Heterogeneous Change Detection as a Use Case

DACAD: Domain Adaptation Contrastive Learning for Anomaly Detection in Multivariate Time Series

LogSD: Detecting Anomalies from System Logs through Self-supervised Learning and Frequency-based Masking

Achieving Rotation Invariance in Convolution Operations: Shifting from Data-Driven to Mechanism-Assured

Use of Parallel Explanatory Models to Enhance Transparency of Neural Network Configurations for Cell Degradation Detection

Leveraging Fine-Grained Information and Noise Decoupling for Remote Sensing Change Detection

Single-temporal Supervised Remote Change Detection for Domain Generalization

The Causal Chambers: Real Physical Systems as a Testbed for AI Methodology

Calibrating Bayesian Learning via Regularization, Confidence Minimization, and Selective Inference

Consisaug: A Consistency-based Augmentation for Polyp Detection in Endoscopy Image Analysis

Detector Collapse: Backdooring Object Detection to Catastrophic Overload or Blindness

SERENE: A Collusion Resilient Replication-based Verification Framework

EcoMLS: A Self-Adaptation Approach for Architecting Green ML-Enabled Systems

SPAMming Labels: Efficient Annotations for the Trackers of Tomorrow

CarcassFormer: An End-to-end Transformer-based Framework for Simultaneous Localization, Segmentation and Classification of Poultry Carcass Defect

Multi-resolution Rescored ByteTrack for Video Object Detection on Ultra-low-power Embedded Systems

Strategic Network Inspection with Location-Specific Detection Capabilities

Explainable Artificial Intelligence Techniques for Accurate Fault Detection and Diagnosis: A Review

Variational Bayesian Last Layers

Keyword: face recognition

MHLR: Moving Haar Learning Rate Scheduler for Large-scale Face Recognition Training with One GPU

Keyword: augmentation

Incubating Text Classifiers Following User Instruction with Nothing but LLM

Exploring Augmentation and Cognitive Strategies for AI based Synthetic Personae

A Concise Tiling Strategy for Preserving Spatial Context in Earth Observation Imagery

More Room for Language: Investigating the Effect of Retrieval on Language Models

Domain-Specific Block Selection and Paired-View Pseudo-Labeling for Online Test-Time Adaptation

CORE: Data Augmentation for Link Prediction via Information Bottleneck

LADDER: An Efficient Framework for Video Frame Interpolation

Consistency Training by Synthetic Question Generation for Conversational Question Answering

D-Aug: Enhancing Data Augmentation for Dynamic LiDAR Scenes

GhostNetV3: Exploring the Training Strategies for Compact Models

Simple In-place Data Augmentation for Surveillance Object Detection

The Victim and The Beneficiary: Exploiting a Poisoned Model to Train a Clean Model on Poisoned Data

Achieving Rotation Invariance in Convolution Operations: Shifting from Data-Driven to Mechanism-Assured

JointViT: Modeling Oxygen Saturation Levels with Joint Supervision on Long-Tailed OCTA