New submissions for Tue, 9 Jan 24

Keyword: detection

Forensic Video Analytic Software

Authors: Anton Jeran Ratnarajah, Sahani Goonetilleke, Dumindu Tissera, Kapilan Balagopalan, Ranga Rodrigo
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.02960
Pdf link: https://arxiv.org/pdf/2401.02960
Abstract Law enforcement officials depend heavily on Forensic Video Analytic (FVA) software in their evidence extraction process. However, present-day FVA software is complex, time-consuming, equipment-dependent and expensive. Developing countries struggle to gain access to this gateway to a secure haven. The term forensic pertains to the application of scientific methods to the investigation of crime through post-processing, whereas surveillance is the close monitoring of real-time feeds. The principal objective of this Final Year Project was to develop efficient and effective FVA software, addressing the shortcomings through a stringent and systematic review of scholarly research papers, online databases and legal documentation. The scope spans multiple object detection, multiple object tracking, anomaly detection, activity recognition, tampering detection, general and specific image enhancement and video synopsis. Methods employed include many machine learning techniques, GPU acceleration and efficient, integrated architecture development for both real-time and post-processing. For this, CNN, GMM, multithreading and OpenCV C++ coding were used. The proposed methodology would rapidly speed up the FVA process, especially through the novel video synopsis research arena. This project has resulted in three research outcomes: Moving Object Based Collision Free Video Synopsis, Forensic and Surveillance Analytic Tool Architecture, and Tampering Detection Inter-Frame Forgery. The results include forensic and surveillance panel outcomes with emphasis on video synopsis and the Sri Lankan context. Principal conclusions include the optimization and efficient algorithm integration to overcome limitations in processing power, memory and compromise between real-time performance and accuracy.
Deep Anomaly Detection in Text
Authors: Andrei Manolache
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.02971
Pdf link: https://arxiv.org/pdf/2401.02971
Abstract Deep anomaly detection methods have become increasingly popular in recent years, with methods like Stacked Autoencoders, Variational Autoencoders, and Generative Adversarial Networks greatly improving the state-of-the-art. Other methods rely on augmenting classical models (such as the One-Class Support Vector Machine), by learning an appropriate kernel function using Neural Networks. Recent developments in representation learning by self-supervision are proving to be very beneficial in the context of anomaly detection. Inspired by the advancements in anomaly detection using self-supervised learning in the field of computer vision, this thesis aims to develop a method for detecting anomalies by exploiting pretext tasks tailored for text corpora. This approach greatly improves the state-of-the-art on two datasets, 20Newsgroups, and AG News, for both semi-supervised and unsupervised anomaly detection, thus proving the potential for self-supervised anomaly detectors in the field of natural language processing.
CANAMRF: An Attention-Based Model for Multimodal Depression Detection
Authors: Yuntao Wei, Yuzhe Zhang, Shuyang Zhang, Hong Zhang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2401.02995
Pdf link: https://arxiv.org/pdf/2401.02995
Abstract Multimodal depression detection is an important research topic that aims to predict human mental states using multimodal data. Previous methods treat different modalities equally and fuse each modality by naive mathematical operations without measuring the relative importance between them, which cannot obtain well-performed multimodal representations for downstream depression tasks. In order to tackle the aforementioned concern, we present a Cross-modal Attention Network with Adaptive Multi-modal Recurrent Fusion (CANAMRF) for multimodal depression detection. CANAMRF is constructed by a multimodal feature extractor, an Adaptive Multimodal Recurrent Fusion module, and a Hybrid Attention Module. Through experimentation on two benchmark datasets, CANAMRF demonstrates state-of-the-art performance, underscoring the effectiveness of our proposed approach.
Advancing DDoS Attack Detection: A Synergistic Approach Using Deep Residual Neural Networks and Synthetic Oversampling
Authors: Ali Alfatemi, Mohamed Rahouti, Ruhul Amin, Sarah ALJamal, Kaiqi Xiong, Yufeng Xin
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.03116
Pdf link: https://arxiv.org/pdf/2401.03116
Abstract Distributed Denial of Service (DDoS) attacks pose a significant threat to the stability and reliability of online systems. Effective and early detection of such attacks is pivotal for safeguarding the integrity of networks. In this work, we introduce an enhanced approach for DDoS attack detection by leveraging the capabilities of Deep Residual Neural Networks (ResNets) coupled with synthetic oversampling techniques. Because of the inherent class imbalance in many cyber-security datasets, conventional methods often struggle with false negatives, misclassifying subtle DDoS patterns as benign. By applying the Synthetic Minority Over-sampling Technique (SMOTE) to the CICIDS dataset, we balance the representation of benign and malicious data points, enabling the model to better discern intricate patterns indicative of an attack. Our deep residual network, tailored for this specific task, further refines the detection process. Experimental results on a real-world dataset demonstrate that our approach achieves an accuracy of 99.98%, significantly outperforming traditional methods. This work underscores the potential of combining advanced data augmentation techniques with deep learning models to bolster cyber-security defenses.
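The oversampling step named in the abstract above can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic sample is a random interpolation between a minority-class point and one of its nearest minority-class neighbours. This is not the authors' pipeline; the function name `smote`, the toy two-feature data, and the parameters `k` and `seed` are assumptions made only for illustration.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical toy data: six benign flows, three attack flows (two features each).
benign = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.1, 0.1), (0.2, 0.3)]
attack = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]

# Oversample the minority class until both classes have the same size.
balanced_attack = attack + smote(attack, n_new=len(benign) - len(attack))
print(len(benign), len(balanced_attack))
```

Because every synthetic point lies on a segment between two real minority samples, the new data stays inside the region already occupied by the minority class. In practice, oversampling is normally applied to the training split only, so that evaluation data remains untouched.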
Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection
Authors: Yuanpeng Tu, Boshen Zhang, Liang Liu, Yuxi Li, Chenhai Xu, Jiangning Zhang, Yabiao Wang, Chengjie Wang, Cai Rong Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.03145
Pdf link: https://arxiv.org/pdf/2401.03145
Abstract Industrial anomaly detection is generally addressed as an unsupervised task that aims at locating defects with only normal training samples. Recently, numerous 2D anomaly detection methods have been proposed and have achieved promising results. However, using only 2D RGB data as input is not sufficient to identify imperceptible geometric surface anomalies. Hence, in this work, we focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets, i.e., ImageNet, to construct feature databases. We empirically find that directly using these pre-trained models is not optimal: it can either fail to detect subtle defects or mistake abnormal features for normal ones. This may be attributed to the domain gap between target industrial data and source data. To address this problem, we propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection. Both intra-modal adaptation and cross-modal alignment are optimized from a local-to-global perspective in LSFA to ensure the representation quality and consistency in the inference stage. Extensive experiments demonstrate that our method not only brings a significant performance boost to feature embedding based approaches, but also outperforms previous State-of-The-Art (SoTA) methods prominently on both MVTec-3D AD and Eyecandies datasets, e.g., LSFA achieves 97.1% I-AUROC on MVTec-3D, surpassing the previous SoTA by +3.4%.
Semi-supervised learning via DQN for log anomaly detection
Authors: Yingying He, Xiaobing Pei, Lihong Shen
Subjects: Software Engineering (cs.SE); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.03151
Pdf link: https://arxiv.org/pdf/2401.03151
Abstract Log anomaly detection plays a critical role in ensuring the security and maintenance of modern software systems. At present, the primary approach for detecting anomalies in log data is through supervised anomaly detection. Nonetheless, existing supervised methods heavily rely on labeled data, which is frequently limited in real-world scenarios. In this paper, we propose a semi-supervised log anomaly detection method, called DQNLog, that incorporates the DQN algorithm from deep reinforcement learning. DQNLog leverages a small amount of labeled data and a large-scale unlabeled dataset, effectively addressing the challenges of imbalanced data and limited labeling. This approach not only learns known anomalies by interacting with an environment biased towards anomalies but also discovers unknown anomalies by actively exploring the unlabeled dataset. Additionally, DQNLog incorporates a cross-entropy loss term to prevent model overestimation during Deep Reinforcement Learning (DRL). Our evaluation on three widely-used datasets demonstrates that DQNLog significantly improves recall rate and F1-score while maintaining precision, validating its practicality.
Controllable Image Synthesis of Industrial Data Using Stable Diffusion
Authors: Gabriele Valvano, Antonino Agostino, Giovanni De Magistris, Antonino Graziano, Giacomo Veneri
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.03152
Pdf link: https://arxiv.org/pdf/2401.03152
Abstract Training supervised deep neural networks that perform defect detection and segmentation requires large-scale fully-annotated datasets, which can be hard or even impossible to obtain in industrial environments. Generative AI offers opportunities to enlarge small industrial datasets artificially, thus enabling the usage of state-of-the-art supervised approaches in the industry. Unfortunately, even good generative models need a lot of data to train, while industrial datasets are often tiny. Here, we propose a new approach for reusing general-purpose pre-trained generative models on industrial data, ultimately allowing the generation of self-labelled defective images. First, we let the model learn the new concept, entailing the novel data distribution. Then, we force it to learn to condition the generative process, producing industrial images that satisfy well-defined topological characteristics and show defects with a given geometry and location. To highlight the advantage of our approach, we use the synthetic dataset to optimise a crack segmentor for a real industrial use case. When the available data is small, we observe a considerable performance increase under several metrics, showing the method's potential in production environments.
Learning Persistent Community Structures in Dynamic Networks via Topological Data Analysis
Authors: Dexu Kong, Anping Zhang, Yang Li
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.03194
Pdf link: https://arxiv.org/pdf/2401.03194
Abstract Dynamic community detection methods often lack effective mechanisms to ensure temporal consistency, hindering the analysis of network evolution. In this paper, we propose a novel deep graph clustering framework with temporal consistency regularization on inter-community structures, inspired by the concept of minimal network topological changes within short intervals. Specifically, to address the representation collapse problem, we first introduce MFC, a matrix factorization-based deep graph clustering algorithm that preserves node embedding. Based on static clustering results, we construct probabilistic community networks and compute their persistent homology, a robust topological measure, to assess structural similarity between them. Moreover, a novel neural network regularization, TopoReg, is introduced to ensure the preservation of topological similarity between inter-community structures over time intervals. Our approach enhances temporal consistency and clustering accuracy on real-world datasets with both fixed and varying numbers of communities. It is also a pioneering application of TDA in temporally persistent community detection, offering an insightful contribution to the field of network analysis. Code and data are available at the public git repository: https://github.com/kundtx/MFC_TopoReg
SecureReg: A Combined Framework for Proactively Exposing Malicious Domain Name Registrations
Authors: Furkan Çolhak, Mert İlhan Ecevit, Hasan Dağ, Reiner Creutzburg
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.03196
Pdf link: https://arxiv.org/pdf/2401.03196
Abstract Rising cyber threats, with miscreants registering thousands of new domains daily for Internet-scale attacks like spam, phishing, and drive-by downloads, emphasize the need for innovative detection methods. This paper introduces a cutting-edge approach for identifying suspicious domains at the onset of the registration process. The accompanying data pipeline generates crucial features by comparing new domains to registered domains, emphasizing the crucial similarity score. Leveraging a novel combination of Natural Language Processing (NLP) techniques, including a pretrained Canine model, and Multilayer Perceptron (MLP) models, our system analyzes semantic and numerical attributes, providing a robust solution for early threat detection. This integrated approach significantly reduces the window of vulnerability, fortifying defenses against potential threats. The findings demonstrate the effectiveness of the integrated approach and contribute to the ongoing efforts in developing proactive strategies to mitigate the risks associated with illicit online activities through the early identification of suspicious domain registrations.
The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models
Authors: Junyi Li, Jie Chen, Ruiyang Ren, Xiaoxue Cheng, Wayne Xin Zhao, Jian-Yun Nie, Ji-Rong Wen
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2401.03205
Pdf link: https://arxiv.org/pdf/2401.03205
Abstract In the era of large language models (LLMs), hallucination (i.e., the tendency to generate factually incorrect content) poses a great challenge to the trustworthy and reliable deployment of LLMs in real-world applications. To tackle LLM hallucination, three key questions should be well studied: how to detect hallucinations (detection), why LLMs hallucinate (source), and what can be done to mitigate them (mitigation). To address these challenges, this work presents a systematic empirical study on LLM hallucination, focused on the three aspects of hallucination detection, source and mitigation. Specifically, we construct a new hallucination benchmark, HaluEval 2.0, and design a simple yet effective detection method for LLM hallucination. Furthermore, we zoom into the different training or utilization stages of LLMs and extensively analyze the potential factors that lead to LLM hallucination. Finally, we implement and examine a series of widely used techniques to mitigate the hallucinations in LLMs. Our work has led to several important findings to understand the hallucination origin and mitigate the hallucinations in LLMs. Our code and data can be accessed at https://github.com/RUCAIBox/HaluEval-2.0.
SeqNAS: Neural Architecture Search for Event Sequence Classification
Authors: Igor Udovichenko, Egor Shvetsov, Denis Divitsky, Dmitry Osin, Ilya Trofimov, Anatoly Glushenko, Ivan Sukharev, Dmitry Berestenev, Evgeny Burnaev
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.03246
Pdf link: https://arxiv.org/pdf/2401.03246
Abstract Neural Architecture Search (NAS) methods are widely used in various industries to obtain high-quality task-specific solutions with minimal human intervention. Event sequences find widespread use in various industrial applications, including churn prediction, customer segmentation, fraud detection, and fault diagnosis, among others. Such data consist of categorical and real-valued components with irregular timestamps. Despite the usefulness of NAS methods, previous approaches have only been applied to other domains: images, texts, or time series. Our work addresses this limitation by introducing a novel NAS algorithm, SeqNAS, specifically designed for event sequence classification. We develop a simple yet expressive search space that leverages commonly used building blocks for event sequence classification, including multi-head self-attention, convolutions, and recurrent cells. To perform the search, we adopt sequential Bayesian Optimization and utilize previously trained models as an ensemble of teachers to augment knowledge distillation. As a result of our work, we demonstrate that our method surpasses state-of-the-art NAS methods and popular architectures suitable for sequence classification and holds great potential for various industrial applications.
Group Activity Recognition using Unreliable Tracked Pose
Authors: Haritha Thilakarathne, Aiden Nibali, Zhen He, Stuart Morgan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.03262
Pdf link: https://arxiv.org/pdf/2401.03262
Abstract Group activity recognition in video is a complex task due to the need for a model to recognise the actions of all individuals in the video and their complex interactions. Recent studies propose that optimal performance is achieved by individually tracking each person and subsequently inputting the sequence of poses or cropped images/optical flow into a model. This helps the model to recognise what actions each person is performing before they are merged to arrive at the group action class. However, all previous models are highly reliant on high quality tracking and have only been evaluated using ground truth tracking information. In practice it is almost impossible to achieve highly reliable tracking information for all individuals in a group activity video. We introduce an innovative deep learning-based group activity recognition approach called Rendered Pose based Group Activity Recognition System (RePGARS) which is designed to be tolerant of unreliable tracking and pose information. Experimental results confirm that RePGARS outperforms all existing group activity recognition algorithms tested which do not use ground truth detection and tracking information.
Real Time Human Detection by Unmanned Aerial Vehicles
Authors: Walid Guettala, Ali Sayah, Laid Kahloul, Ahmed Tibermacine
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.03275
Pdf link: https://arxiv.org/pdf/2401.03275
Abstract One of the most important problems in computer vision and remote sensing is object detection, which identifies particular categories of diverse things in pictures. Two crucial data sources for public security are the thermal infrared (TIR) remote sensing multi-scenario photos and videos produced by unmanned aerial vehicles (UAVs). Due to the small scale of the target, complex scene information, low resolution relative to the viewable videos, and dearth of publicly available labeled datasets and training models, their object detection procedure is still difficult. A UAV TIR object detection framework for pictures and videos is suggested in this study. Ground-based TIR photos and videos gathered with Forward-looking Infrared (FLIR) cameras are used to create the "You Only Look Once" (YOLO) model, which is based on CNN architecture. Results indicated that, in the validation task, human detection reached an average precision of 72.5% at IoU (Intersection over Union) = 0.5 using the state-of-the-art YOLOv7 (YOLO version 7) model [1], while the detection speed was around 161 frames per second (FPS). The usefulness of the YOLO architecture is demonstrated in the application, which evaluates the cross-detection performance of people in UAV TIR videos under a YOLOv7 model in terms of the various UAVs' observation angles. The qualitative and quantitative evaluation of object detection from TIR pictures and videos using deep-learning models is supported favorably by this work.
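The IoU = 0.5 criterion quoted in the abstract above can be made concrete with a short sketch. It assumes axis-aligned boxes given as `(x1, y1, x2, y2)` corner coordinates; the helper names `iou` and `is_true_positive` are illustrative and do not come from the paper or from any YOLO codebase.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(pred, truth, threshold=0.5):
    """A prediction counts as a match when its IoU with the ground-truth
    box reaches the threshold (0.5 in the abstract above)."""
    return iou(pred, truth) >= threshold

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))                # overlap 1, union 7
print(is_true_positive((0, 0, 10, 10), (0, 0, 10, 6)))  # IoU 0.6, counts as a match
```

Average precision at this threshold is then computed from the ranked list of detections, where each detection is labelled a true or false positive by a test of this kind.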
Multi-View 3D Instance Segmentation of Structural Anomalies for Enhanced Structural Inspection of Concrete Bridges
Authors: Christian Benz, Volker Rodehorst
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.03298
Pdf link: https://arxiv.org/pdf/2401.03298
Abstract For effective structural damage assessment, the instances of damages need to be localized in the world of a 3D model. Due to a lack of data, the detection of structural anomalies can currently not be directly learned and performed in 3D space. In this work, a three-stage approach is presented, which uses the good performance of detection models on image level to segment instances of anomalies in the 3D space. In the detection stage, semantic segmentation predictions are produced on image level. The mapping stage transfers the image-level prediction onto the respective point cloud. In the extraction stage, 3D anomaly instances are extracted from the segmented point cloud. Cloud contraction is used to transform cracks into their medial axis representation. For areal anomalies the bounding polygon is extracted by means of alpha shapes. The approach covers the classes crack, spalling, and corrosion and the three image-level segmentation models TopoCrack, nnU-Net, and DetectionHMA are compared. Granted a localization tolerance of 4cm, IoUs of over 90% can be achieved for crack and corrosion and 41% for spalling, which appears to be a specifically challenging class. Detection on instance-level measured in AP is about 45% for crack and spalling and 73% for corrosion.
CAVIAR: Co-simulation of 6G Communications, 3D Scenarios and AI for Digital Twins
Authors: João Borges, Felipe Bastos, Ilan Correa, Pedro Batista, Aldebaro Klautau
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2401.03310
Pdf link: https://arxiv.org/pdf/2401.03310
Abstract Digital twins are an important technology for advancing mobile communications, especially in use cases that require simultaneously simulating the wireless channel, 3D scenes and machine learning. Aiming at providing a solution to this demand, this work describes a modular co-simulation methodology called CAVIAR. Here, CAVIAR is upgraded to support a message passing library and enable the virtual counterpart of a digital twin system using different 6G-related simulators. The main contributions of this work are the detailed description of different CAVIAR architectures, the implementation of this methodology to assess a 6G use case of UAV-based search and rescue mission (SAR), and the generation of benchmarking data about the computational resource usage. For executing the SAR co-simulation we adopt five open-source solutions: the physical and link level network simulator Sionna, the simulator for autonomous vehicles AirSim, scikit-learn for training a decision tree for MIMO beam selection, YOLOv8 for the detection of rescue targets and NATS for message passing. Results for the implemented SAR use case suggest that the methodology can run on a single machine, with the main demanded resources being the CPU processing and the GPU memory.
Spatiotemporally adaptive compression for scientific dataset with feature preservation -- a case study on simulation data with extreme climate events analysis
Authors: Qian Gong, Chengzhu Zhang, Xin Liang, Viktor Reshniak, Jieyang Chen, Anand Rangarajan, Sanjay Ranka, Nicolas Vidal, Lipeng Wan, Paul Ullrich, Norbert Podhorszki, Robert Jacob, Scott Klasky
Subjects: Computer Vision and Pattern Recognition (cs.CV); Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2401.03317
Pdf link: https://arxiv.org/pdf/2401.03317
Abstract Scientific discoveries are increasingly constrained by limited storage space and I/O capacities. For time-series simulations and experiments, their data often need to be decimated over timesteps to accommodate storage and I/O limitations. In this paper, we propose a technique that addresses storage costs while improving post-analysis accuracy through spatiotemporal adaptive, error-controlled lossy compression. We investigate the trade-off between data precision and temporal output rates, revealing that reducing data precision and increasing timestep frequency lead to more accurate analysis outcomes. Additionally, we integrate spatiotemporal feature detection with data compression and demonstrate that performing adaptive error-bounded compression in higher dimensional space enables greater compression ratios, leveraging the error propagation theory of a transformation-based compressor. To evaluate our approach, we conduct experiments using the well-known E3SM climate simulation code and apply our method to compress variables used for cyclone tracking. Our results show a significant reduction in storage size while enhancing the quality of cyclone tracking analysis, both quantitatively and qualitatively, in comparison to the prevalent timestep decimation approach. Compared to three state-of-the-art lossy compressors lacking feature preservation capabilities, our adaptive compression framework improves perfectly matched cases in TC tracking by 26.4-51.3% at medium compression ratios and by 77.3-571.1% at large compression ratios, with a merely 5-11% computational overhead.
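The notion of error-controlled lossy compression in the abstract above can be illustrated with the simplest possible case: uniform scalar quantization with a bin width of twice the error bound, which guarantees that every reconstructed value lies within the bound. This is only a sketch of the error-bound guarantee; the paper's compressor is transformation-based and spatiotemporally adaptive, which this example does not attempt to reproduce. The names `compress`, `decompress`, and the toy sine series are assumptions.

```python
import math

def compress(values, error_bound):
    """Map each value to an integer bin index; bins are 2 * error_bound wide."""
    step = 2.0 * error_bound
    return [round(v / step) for v in values]

def decompress(codes, error_bound):
    """Reconstruct each value as the centre of its bin."""
    step = 2.0 * error_bound
    return [c * step for c in codes]

# Hypothetical time series standing in for one simulation variable.
series = [math.sin(0.1 * t) for t in range(50)]
bound = 0.01

codes = compress(series, bound)
restored = decompress(codes, bound)
max_error = max(abs(a - b) for a, b in zip(series, restored))
print(max_error <= bound + 1e-12)  # the pointwise bound holds for every timestep
```

The integer codes take far fewer distinct values than the original floats, so a lossless entropy coder applied afterwards can shrink them further. Unlike timestep decimation, every timestep is retained, only at reduced precision.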
Attention and Autoencoder Hybrid Model for Unsupervised Online Anomaly Detection
Authors: Seyed Amirhossein Najafi, Mohammad Hassan Asemani, Peyman Setoodeh
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2401.03322
Pdf link: https://arxiv.org/pdf/2401.03322
Abstract This paper introduces a hybrid attention and autoencoder (AE) model for unsupervised online anomaly detection in time series. The autoencoder captures local structural patterns in short embeddings, while the attention model learns long-term features, facilitating parallel computing with positional encoding. Unique in its approach, our proposed hybrid model combines attention and autoencoder for the first time in time series anomaly detection. It employs an attention-based mechanism, akin to the deep transformer model, with key architectural modifications for predicting the next time step window in the autoencoder's latent space. The model utilizes a threshold from the validation dataset for anomaly detection and introduces an alternative method based on analyzing the first statistical moment of error, improving accuracy without dependence on a validation dataset. Evaluation on diverse real-world benchmark datasets and comparison with other well-established models confirm the effectiveness of our proposed model in anomaly detection.
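The two thresholding ideas mentioned in the abstract above can be sketched as follows. The abstract does not give the exact rules, so both functions are assumptions: `validation_threshold` uses a common mean-plus-k-standard-deviations rule on validation errors, and `mean_ratio_flags` is one hypothetical way to use only the first moment (the mean) of the error stream without any validation data. The error values are made up for illustration.

```python
from statistics import mean, pstdev

def validation_threshold(val_errors, k=3.0):
    """Assumed rule: threshold = mean + k * std of prediction errors
    measured on anomaly-free validation data."""
    return mean(val_errors) + k * pstdev(val_errors)

def mean_ratio_flags(errors, factor=2.0):
    """Assumed validation-free rule: flag an error when it exceeds
    `factor` times the mean error of the evaluated stream itself."""
    m = mean(errors)
    return [e > factor * m for e in errors]

# Hypothetical prediction errors (e.g. per-window reconstruction error).
val_errors = [0.10, 0.12, 0.11, 0.09, 0.10]
test_errors = [0.10, 0.11, 0.45, 0.09]

threshold = validation_threshold(val_errors)
flags_val = [e > threshold for e in test_errors]
flags_mean = mean_ratio_flags(test_errors)
print(flags_val)
print(flags_mean)
```

The first rule depends on how representative the validation split is; the second adapts to the stream being scored, at the cost of being influenced by the anomalies it is trying to find.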
Walnut Detection Through Deep Learning Enhanced by Multispectral Synthetic Images
Authors: Kaiming Fu, Tong Lei, Maryia Halubok, Brian N. Bailey
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.03331
Pdf link: https://arxiv.org/pdf/2401.03331
Abstract The accurate identification of walnuts within orchards brings forth a plethora of advantages, profoundly amplifying the efficiency and productivity of walnut orchard management. Nevertheless, the close resemblance in shape, color, and texture between walnuts and leaves presents a formidable challenge in precisely distinguishing between them during the annotation process. In this study, we present a novel approach to improve walnut detection efficiency, utilizing YOLOv5 trained on an enriched image set that incorporates both real and synthetic RGB and NIR images. Our analysis comparing results from our original and augmented datasets shows clear improvements in detection when using the synthetic images.
3GPP Release 18 Wake-up Receiver: Feature Overview and Evaluations
Authors: Andreas Hoglund, Mohammad Mozaffari, Yanpeng Yang, Giuseppe Moschetti, Kittipong Kittichokechai, Ravikiran Nory
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2401.03333
Pdf link: https://arxiv.org/pdf/2401.03333
Abstract Enhancing the energy efficiency of devices stands as one of the key requirements in the fifth-generation (5G) cellular network and its evolutions toward the next generation wireless technology. Specifically, for battery-limited Internet-of-Things (IoT) devices where downlink monitoring significantly contributes to energy consumption, efficient solutions are required for power saving while addressing performance tradeoffs. In this regard, the use of a low-power wake-up receiver (WUR) and wake-up signal (WUS) is an attractive solution for reducing the energy consumption of devices without compromising the downlink latency. This paper provides an overview of the standardization study on the design of low-power WUR and WUS within Release 18 of the third-generation partnership project (3GPP). We describe design principles, receiver architectures, waveform characteristics, and device procedures upon detection of WUS. In addition, we provide representative results to show the performance of the WUR in terms of power saving, coverage, and network overhead along with highlighting design tradeoffs.
Classifying cow stall numbers using YOLO
Authors: Dheeraj Vajjarapu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.03340
Pdf link: https://arxiv.org/pdf/2401.03340
Abstract This paper introduces the CowStallNumbers dataset, a collection of images extracted from videos focusing on cow teats, designed to advance the field of cow stall number detection. The dataset comprises 1042 training images and 261 test images, featuring stall numbers ranging from 0 to 60. To enhance the dataset, we performed fine-tuning on a YOLO model and applied data augmentation techniques, including random crop, center crop, and random rotation. The experimental outcomes demonstrate a notable 95.4% accuracy in recognizing stall numbers.
Weakly Augmented Variational Autoencoder in Time Series Anomaly Detection
Authors: Zhangkai Wu, Longbing Cao, Qi Zhang, Junxian Zhou, Hui Chen
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2401.03341
Pdf link: https://arxiv.org/pdf/2401.03341
Abstract Due to their unsupervised training and uncertainty estimation, deep Variational Autoencoders (VAEs) have become powerful tools for reconstruction-based Time Series Anomaly Detection (TSAD). Existing VAE-based TSAD methods, either statistical or deep, tune meta-priors to estimate the likelihood probability for effectively capturing spatiotemporal dependencies in the data. However, these methods confront the challenge of inherent data scarcity, which is often the case in anomaly detection tasks. Such scarcity easily leads to latent holes, discontinuous regions in latent space, resulting in non-robust reconstructions on these discontinuous spaces. We propose a novel generative framework that combines VAEs with self-supervised learning (SSL) to address this issue.
An Investigation of Large Language Models for Real-World Hate Speech Detection
Authors: Keyan Guo, Alexander Hu, Jaden Mu, Ziheng Shi, Ziming Zhao, Nishant Vishwamitra, Hongxin Hu
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2401.03346
Pdf link: https://arxiv.org/pdf/2401.03346
Abstract Hate speech has emerged as a major problem plaguing our social spaces today. While there have been significant efforts to address this problem, existing methods are still significantly limited in effectively detecting hate speech online. A major limitation of existing methods is that hate speech detection is a highly contextual problem, and these methods cannot fully capture the context of hate speech to make accurate predictions. Recently, large language models (LLMs) have demonstrated state-of-the-art performance in several natural language tasks. LLMs have undergone extensive training using vast amounts of natural language data, enabling them to grasp intricate contextual details. Hence, they could be used as knowledge bases for context-aware hate speech detection. However, a fundamental problem with using LLMs to detect hate speech is that there are no studies on effectively prompting LLMs for context-aware hate speech detection. In this study, we conduct a large-scale study of hate speech detection, employing five established hate speech datasets. We discover that LLMs not only match but often surpass the performance of current benchmark machine learning models in identifying hate speech. By proposing four diverse prompting strategies that optimize the use of LLMs in detecting hate speech, our study reveals that a meticulously crafted reasoning prompt can effectively capture the context of hate speech by fully utilizing the knowledge base in LLMs, significantly outperforming existing techniques. Furthermore, although LLMs can provide a rich knowledge base for the contextual detection of hate speech, suitable prompting strategies play a crucial role in effectively leveraging this knowledge base for efficient detection.
Accurate and Scalable Estimation of Epistemic Uncertainty for Graph Neural Networks
Authors: Puja Trivedi, Mark Heimann, Rushil Anirudh, Danai Koutra, Jayaraman J. Thiagarajan
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2401.03350
Pdf link: https://arxiv.org/pdf/2401.03350
Abstract While graph neural networks (GNNs) are widely used for node and graph representation learning tasks, the reliability of GNN uncertainty estimates under distribution shifts remains relatively under-explored. Indeed, while post-hoc calibration strategies can be used to improve in-distribution calibration, they need not also improve calibration under distribution shift. However, techniques which produce GNNs with better intrinsic uncertainty estimates are particularly valuable, as they can always be combined with post-hoc strategies later. Therefore, in this work, we propose G-$\Delta$UQ, a novel training framework designed to improve intrinsic GNN uncertainty estimates. Our framework adapts the principle of stochastic data centering to graph data through novel graph anchoring strategies, and is able to support partially stochastic GNNs. While the prevalent wisdom is that fully stochastic networks are necessary to obtain reliable estimates, we find that the functional diversity induced by our anchoring strategies when sampling hypotheses renders this unnecessary and allows us to support G-$\Delta$UQ on pretrained models. Indeed, through extensive evaluation under covariate, concept and graph size shifts, we show that G-$\Delta$UQ leads to better calibrated GNNs for node and graph classification. Further, it also improves performance on the uncertainty-based tasks of out-of-distribution detection and generalization gap estimation. Overall, our work provides insights into uncertainty estimation for GNNs, and demonstrates the utility of G-$\Delta$UQ in obtaining reliable estimates.
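Calibration, as discussed in the abstract above, is often quantified with the expected calibration error (ECE). The following is a minimal sketch of that general metric, not of the G-$\Delta$UQ method itself; the function name, the bin count, and the toy inputs are illustrative assumptions that do not come from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Toy example (made-up values): a model that is 95% confident but 75% accurate.
overconfident = expected_calibration_error(
    [0.95, 0.95, 0.95, 0.95], [True, True, True, False]
)
```

A lower value means predicted confidence tracks observed accuracy more closely; a well-calibrated model would score near zero.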
Optimisation and Performance Computation of a Phase Frequency Detector Module for IoT Devices
Authors: Md. Shahriar Khan Hemel, Mamun Bin Ibne Reaz, Sawal Hamid Bin Md Ali, Mohammad Arif Sobhan Bhuiyan, Mahdi H. Miraz
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2401.03389
Pdf link: https://arxiv.org/pdf/2401.03389
Abstract The Internet of Things (IoT) is pivotal in transforming the way we live and interact with our surroundings. To cope with the advancement in technologies, it is vital to achieve accuracy along with speed. A phase frequency detector (PFD) is a critical device to regulate and provide accurate frequency in IoT devices. Designing a PFD poses challenges in achieving precise phase detection, minimising dead zones, optimising power consumption, and ensuring robust performance across various operational frequencies, necessitating complex engineering and innovative solutions. This study delves into optimising a PFD circuit, designed using 90 nm standard CMOS technology, aiming to achieve superior operational frequencies. An efficient and high-frequency PFD design is crafted and analysed using Cadence Virtuoso. The study focused on investigating the impact of optimising PFD design. With the optimised PFD, an operational frequency of 5 GHz has been achieved, along with a power consumption of only 29 µW. The dead zone of the PFD was only 25 ps.
Ensemble Defense System: A Hybrid IDS Approach for Effective Cyber Threat Detection
Authors: Sarah Alharbi, Arshiya Khan
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2401.03491
Pdf link: https://arxiv.org/pdf/2401.03491
Abstract Sophisticated cyber attacks present significant challenges for organizations in detecting and preventing such threats. To address this critical need for advanced defense mechanisms, we propose an Ensemble Defense System (EDS). An EDS is a cybersecurity framework aggregating multiple security tools designed to monitor and alert an organization during cyber attacks. The proposed EDS leverages a comprehensive range of Intrusion Detection System (IDS) capabilities by introducing a hybrid of signature-based IDS and anomaly-based IDS tools. It also incorporates Elasticsearch, an open-source Security Information and Event Management (SIEM) tool, to facilitate data analysis and interactive visualization of alerts generated from IDSs. The effectiveness of the EDS is evaluated through a payload from a bash script that executes various attacks, including port scanning, privilege escalation, and Denial-of-Service (DoS). The evaluation demonstrates the EDS's ability to detect diverse cyber attacks.
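The hybrid idea described above, pairing signature-based with anomaly-based detection, can be illustrated with a toy sketch. This is not the EDS implementation: the signature strings, the z-score threshold, and all function names are hypothetical, and real IDS tools use far richer rule sets and models.

```python
from statistics import mean, stdev

# Hypothetical signature list, for illustration only.
SIGNATURES = ("/etc/shadow", "nmap", "sudo su")


def signature_alert(payload):
    """Signature-based check: flag payloads containing a known-bad pattern."""
    return any(sig in payload for sig in SIGNATURES)


def anomaly_alert(baseline_rates, current_rate, z_threshold=3.0):
    """Anomaly-based check: flag a request rate far above the baseline mean."""
    mu = mean(baseline_rates)
    sigma = stdev(baseline_rates)
    if sigma == 0:
        return current_rate != mu
    return (current_rate - mu) / sigma > z_threshold


def hybrid_alert(payload, baseline_rates, current_rate):
    """Raise an alert if either detector fires."""
    return signature_alert(payload) or anomaly_alert(baseline_rates, current_rate)
```

Combining the two with a logical OR means a known attack pattern is caught even at normal traffic volume, while an unusual traffic spike (for example a DoS burst) is caught even without a matching signature.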
Text-Driven Traffic Anomaly Detection with Temporal High-Frequency Modeling in Driving Videos
Authors: Rongqin Liang, Yuanman Li, Jiantao Zhou, Xia Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.03522
Pdf link: https://arxiv.org/pdf/2401.03522
Abstract Traffic anomaly detection (TAD) in driving videos is critical for ensuring the safety of autonomous driving and advanced driver assistance systems. Previous single-stage TAD methods primarily rely on frame prediction, making them vulnerable to interference from dynamic backgrounds induced by the rapid movement of the dashboard camera. While two-stage TAD methods appear to be a natural solution to mitigate such interference by pre-extracting background-independent features (such as bounding boxes and optical flow) using perceptual algorithms, they are susceptible to the performance of first-stage perceptual algorithms and may result in error propagation. In this paper, we introduce TTHF, a novel single-stage method aligning video clips with text prompts, offering a new perspective on traffic anomaly detection. Unlike previous approaches, the supervised signal of our method is derived from languages rather than orthogonal one-hot vectors, providing a more comprehensive representation. Further, concerning visual representation, we propose to model the high frequency of driving videos in the temporal domain. This modeling captures the dynamic changes of driving scenes, enhances the perception of driving behavior, and significantly improves the detection of traffic anomalies. In addition, to better perceive various types of traffic anomalies, we carefully design an attentive anomaly focusing mechanism that visually and linguistically guides the model to adaptively focus on the visual context of interest, thereby facilitating the detection of traffic anomalies. It is shown that our proposed TTHF achieves promising performance, outperforming state-of-the-art competitors by +5.4% AUC on the DoTA dataset and achieving high generalization on the DADA dataset.
Detecting Anomalies in Blockchain Transactions using Machine Learning Classifiers and Explainability Analysis
Authors: Mohammad Hasan, Mohammad Shahriar Rahman, Helge Janicke, Iqbal H. Sarker
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2401.03530
Pdf link: https://arxiv.org/pdf/2401.03530
Abstract As the use of Blockchain for digital payments continues to rise in popularity, it also becomes susceptible to various malicious attacks. Successfully detecting anomalies within Blockchain transactions is essential for bolstering trust in digital payments. However, the task of anomaly detection in Blockchain transaction data is challenging due to the infrequent occurrence of illicit transactions. Although several studies have been conducted in the field, a limitation persists: the lack of explanations for the model's predictions. This study seeks to overcome this limitation by integrating eXplainable Artificial Intelligence (XAI) techniques and anomaly rules into tree-based ensemble classifiers for detecting anomalous Bitcoin transactions. The Shapley Additive exPlanation (SHAP) method is employed to measure the contribution of each feature, and it is compatible with ensemble models. Moreover, we present rules for interpreting whether a Bitcoin transaction is anomalous or not. Additionally, we have introduced an under-sampling algorithm named XGBCLUS, designed to balance anomalous and non-anomalous transaction data. This algorithm is compared against other commonly used under-sampling and over-sampling techniques. Finally, the outcomes of various tree-based single classifiers are compared with those of stacking and voting ensemble classifiers. Our experimental results demonstrate that: (i) XGBCLUS enhances TPR and ROC-AUC scores compared to state-of-the-art under-sampling and over-sampling techniques, and (ii) our proposed ensemble classifiers outperform traditional single tree-based machine learning classifiers in terms of accuracy, TPR, and FPR scores.
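For readers unfamiliar with under-sampling, here is a minimal sketch of plain random under-sampling, one of the common baselines such work is typically compared against. It is explicitly not XGBCLUS, whose details the abstract does not give; the function name and the toy data are illustrative assumptions.

```python
import random


def random_undersample(rows, labels, seed=0):
    """Downsample every class to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(members) for members in by_class.values())
    balanced = []
    for label, members in by_class.items():
        # Draw `target` rows without replacement from each class.
        for row in rng.sample(members, target):
            balanced.append((row, label))
    rng.shuffle(balanced)
    return balanced


# Toy data (made up): 5 anomalous rows (label 1) among 100 transactions.
rows = list(range(100))
labels = [1] * 5 + [0] * 95
balanced = random_undersample(rows, labels)
```

The trade-off is that most of the majority-class rows are discarded, which is the motivation for more selective under-sampling schemes.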
SeTformer is What You Need for Vision and Language
Authors: Pourya Shamsolmoali, Masoumeh Zareapoor, Eric Granger, Michael Felsberg
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.03540
Pdf link: https://arxiv.org/pdf/2401.03540
Abstract The dot product self-attention (DPSA) is a fundamental component of transformers. However, scaling them to long sequences, like documents or high-resolution images, becomes prohibitively expensive due to quadratic time and memory complexities arising from the softmax operation. Kernel methods are employed to simplify computations by approximating softmax but often lead to performance drops compared to softmax attention. We propose SeTformer, a novel transformer, where DPSA is purely replaced by Self-optimal Transport (SeT) for achieving better performance and computational efficiency. SeT is based on two essential softmax properties: maintaining a non-negative attention matrix and using a nonlinear reweighting mechanism to emphasize important tokens in input sequences. By introducing a kernel cost function for optimal transport, SeTformer effectively satisfies these properties. In particular, with small and base-sized models, SeTformer achieves impressive top-1 accuracies of 84.7% and 86.2% on ImageNet-1K. In object detection, SeTformer-base outperforms the FocalNet counterpart by +2.2 mAP, using 38% fewer parameters and 29% fewer FLOPs. In semantic segmentation, our base-size model surpasses NAT by +3.5 mIoU with 33% fewer parameters. SeTformer also achieves state-of-the-art results in language modeling on the GLUE benchmark. These findings highlight SeTformer's applicability in vision and language tasks.
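The baseline that the abstract contrasts against can be sketched in a few lines. This is standard scaled dot-product self-attention with softmax, not the SeT mechanism; the function names and the toy tokens are illustrative assumptions. It makes the two softmax properties and the quadratic cost visible: every attention row is non-negative and sums to one, and the attention matrix has n x n entries for n tokens.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_self_attention(queries, keys, values):
    """Scaled dot-product attention; builds an n x n attention matrix."""
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    attention = []
    for q in queries:
        # One score per key, so n queries cost n * n dot products.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        attention.append(softmax(scores))
    width = len(values[0])
    outputs = [
        [sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)]
        for weights in attention
    ]
    return attention, outputs


# Toy input (made up): three 2-dimensional tokens used as Q, K, and V.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attention, outputs = dot_product_self_attention(tokens, tokens, tokens)
```

Doubling the number of tokens quadruples the size of `attention`, which is the scaling problem that kernel approximations and alternative mechanisms try to avoid.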
Improving Transferability of Network Intrusion Detection in a Federated Learning Setup
Authors: Shreya Ghosh, Abu Shafin Mohammad Mahdee Jameel, Aly El Gamal
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2401.03560
Pdf link: https://arxiv.org/pdf/2401.03560
Abstract Network Intrusion Detection Systems (IDS) aim to detect the presence of an intruder by analyzing network packets arriving at an internet connected device. Data-driven deep learning systems, popular due to their superior performance compared to traditional IDS, depend on availability of high quality training data for diverse intrusion classes. A way to overcome this limitation is through transferable learning, where training for one intrusion class can lead to detection of unseen intrusion classes after deployment. In this paper, we provide a detailed study on the transferability of intrusion detection. We investigate practical federated learning configurations to enhance the transferability of intrusion detection. We propose two techniques to significantly improve the transferability of a federated intrusion detection system. The code for this work can be found at https://github.com/ghosh64/transferability.
Invisible Reflections: Leveraging Infrared Laser Reflections to Target Traffic Sign Perception
Authors: Takami Sato, Sri Hrushikesh Varma Bhupathiraju, Michael Clifford, Takeshi Sugawara, Qi Alfred Chen, Sara Rampazzi
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.03582
Pdf link: https://arxiv.org/pdf/2401.03582
Abstract All vehicles must follow the rules that govern traffic behavior, regardless of whether the vehicles are human-driven or Connected Autonomous Vehicles (CAVs). Road signs indicate locally active rules, such as speed limits and requirements to yield or stop. Recent research has demonstrated attacks, such as adding stickers or projected colored patches to signs, that cause CAV misinterpretation, resulting in potential safety issues. Humans can see and potentially defend against these attacks. But humans cannot detect what they cannot observe. We have developed an effective physical-world attack that leverages the sensitivity of filterless image sensors and the properties of Infrared Laser Reflections (ILRs), which are invisible to humans. The attack is designed to affect CAV cameras and perception, undermining traffic sign recognition by inducing misclassification. In this work, we formulate the threat model and requirements for an ILR-based traffic sign perception attack to succeed. We evaluate the effectiveness of the ILR attack with real-world experiments against two major traffic sign recognition architectures on four IR-sensitive cameras. Our black-box optimization methodology allows the attack to achieve up to a 100% attack success rate in indoor, static scenarios and a >80.5% attack success rate in our outdoor, moving vehicle scenarios. We find the latest state-of-the-art certifiable defense is ineffective against ILR attacks as it mis-certifies >33.5% of cases. To address this, we propose a detection strategy based on the physical properties of IR laser reflections which can detect 96% of ILR attacks.
Big Data and Deep Learning in Smart Cities: A Comprehensive Dataset for AI-Driven Traffic Accident Detection and Computer Vision Systems
Authors: Victor Adewopo, Nelly Elsayed, Zag Elsayed, Murat Ozer, Constantinos Zekios, Ahmed Abdelgawad, Magdy Bayoumi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.03587
Pdf link: https://arxiv.org/pdf/2401.03587
Abstract In the dynamic urban landscape, where the interplay of vehicles and pedestrians defines the rhythm of life, integrating advanced technology for safety and efficiency is increasingly crucial. This study delves into the application of cutting-edge technological methods in smart cities, focusing on enhancing public safety through improved traffic accident detection. Action recognition plays a pivotal role in interpreting visual data and tracking object motion such as human pose estimation in video sequences. The challenges of action recognition include variability in rapid actions, limited datasets, and environmental factors such as weather, illumination, and occlusions. In this paper, we present a novel comprehensive dataset for traffic accident detection. This dataset is specifically designed to bolster computer vision and action recognition systems in predicting and detecting road traffic accidents. We integrated datasets from a wide variety of data sources, road networks, weather conditions, and regions across the globe. This approach is underpinned by empirical studies, aiming to contribute to the discourse on how technology can enhance the quality of life in densely populated areas. This research aims to bridge existing research gaps by introducing benchmark datasets that leverage state-of-the-art algorithms tailored for traffic accident detection in smart cities. These datasets are expected to advance academic research and also enhance real-time accident detection applications, contributing significantly to the evolution of smart urban environments. Our study marks a pivotal step towards safer, more efficient smart cities, harnessing the power of AI and machine learning to transform urban living.
Inverse-like Antagonistic Scene Text Spotting via Reading-Order Estimation and Dynamic Sampling
Authors: Shi-Xue Zhang, Chun Yang, Xiaobin Zhu, Hongyang Zhou, Hongfa Wang, Xu-Cheng Yin
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.03637
Pdf link: https://arxiv.org/pdf/2401.03637
Abstract Scene text spotting is a challenging task, especially for inverse-like scene text, which has complex layouts, e.g., mirrored, symmetrical, or retro-flexed. In this paper, we propose a unified end-to-end trainable inverse-like antagonistic text spotting framework dubbed IATS, which can effectively spot inverse-like scene texts without sacrificing general ones. Specifically, we propose an innovative reading-order estimation module (REM) that extracts reading-order information from the initial text boundary generated by an initial boundary module (IBM). To optimize and train REM, we propose a joint reading-order estimation loss consisting of a classification loss, an orthogonality loss, and a distribution loss. With the help of IBM, we can divide the initial text boundary into two symmetric control points and iteratively refine the new text boundary using a lightweight boundary refinement module (BRM) for adapting to various shapes and scales. To alleviate the incompatibility between text detection and recognition, we propose a dynamic sampling module (DSM) with a thin-plate spline that can dynamically sample appropriate features for recognition in the detected text region. Without extra supervision, the DSM can proactively learn to sample appropriate features for text recognition through the gradient returned by the recognition module. Extensive experiments on both challenging scene text and inverse-like scene text datasets demonstrate that our method achieves superior performance both on irregular and inverse-like text spotting.
Assessing AI Detectors in Identifying AI-Generated Code: Implications for Education
Authors: Wei Hung Pan, Ming Jie Chok, Jonathan Leong Shan Wong, Yung Xin Shin, Yeong Shian Poon, Zhou Yang, Chun Yong Chong, David Lo, Mei Kuan Lim
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.03676
Pdf link: https://arxiv.org/pdf/2401.03676
Abstract Educators are increasingly concerned about the usage of Large Language Models (LLMs) such as ChatGPT in programming education, particularly regarding the potential exploitation of imperfections in Artificial Intelligence Generated Content (AIGC) Detectors for academic misconduct. In this paper, we present an empirical study where the LLM is examined for its attempts to bypass detection by AIGC Detectors. This is achieved by generating code in response to a given question using different variants. We collected a dataset comprising 5,069 samples, with each sample consisting of a textual description of a coding problem and its corresponding human-written Python solution codes. These samples were obtained from various sources, including 80 from Quescol, 3,264 from Kaggle, and 1,725 from LeetCode. From the dataset, we created 13 sets of code problem variant prompts, which were used to instruct ChatGPT to generate the outputs. Subsequently, we assessed the performance of five AIGC detectors. Our results demonstrate that existing AIGC Detectors perform poorly in distinguishing between human-written code and AI-generated code.
Overview of the 2023 ICON Shared Task on Gendered Abuse Detection in Indic Languages
Authors: Aatman Vaidya, Arnav Arora, Aditya Joshi, Tarunima Prabhakar
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.03677
Pdf link: https://arxiv.org/pdf/2401.03677
Abstract This paper reports the findings of the ICON 2023 shared task on Gendered Abuse Detection in Indic Languages. The shared task deals with the detection of gendered abuse in online text. The shared task was conducted as a part of ICON 2023, based on a novel dataset in Hindi, Tamil and the Indian dialect of English. The participants were given three subtasks with the train dataset consisting of approximately 6500 posts sourced from Twitter. For the test set, approximately 1200 posts were provided. The shared task received a total of 9 registrations. The best F-1 scores are 0.616 for subtask 1, 0.572 for subtask 2, and 0.616 and 0.582 for subtask 3. The paper contains examples of hateful content owing to its topic.
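As a reminder of how an F-1 score is derived from predictions, here is a minimal sketch for a single positive class. The abstract does not say which averaging scheme the shared task used, so this binary version is an assumption, and the function name and labels below are made up for illustration.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (made up): 2 true positives, 1 false positive, 1 false negative.
score = f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

Because F-1 ignores true negatives, it is a common choice for abuse detection, where the positive class is usually the minority.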
From Data to Insights: A Comprehensive Survey on Advanced Applications in Thyroid Cancer Research
Authors: Xinyu Zhang, Vincent CS Lee, Feng Liu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.03722
Pdf link: https://arxiv.org/pdf/2401.03722
Abstract Thyroid cancer, the most prevalent endocrine cancer, has gained significant global attention due to its impact on public health. Extensive research efforts have been dedicated to leveraging artificial intelligence (AI) methods for the early detection of this disease, aiming to reduce its morbidity rates. However, a comprehensive understanding of the structured organization of research applications in this particular field remains elusive. To address this knowledge gap, we conducted a systematic review and developed a comprehensive taxonomy of machine learning-based applications in thyroid cancer pathogenesis, diagnosis, and prognosis. Our primary objective was to facilitate the research community's ability to stay abreast of technological advancements and potentially lead the emerging trends in this field. This survey presents a coherent literature review framework for interpreting the advanced techniques used in thyroid cancer research. A total of 758 related studies were identified and scrutinized. To the best of our knowledge, this is the first review that provides an in-depth analysis of the various aspects of AI applications employed in the context of thyroid cancer. Furthermore, we highlight key challenges encountered in this domain and propose future research opportunities for those interested in studying the latest trends or exploring less-investigated aspects of thyroid cancer research. By presenting this comprehensive review and taxonomy, we contribute to the existing knowledge in the field, while providing valuable insights for researchers, clinicians, and stakeholders in advancing the understanding and management of this disease.
Flowmind2Digital: The First Comprehensive Flowmind Recognition and Conversion Approach
Authors: Huanyu Liu, Jianfeng Cai, Tingjia Zhang, Hongsheng Li, Siyuan Wang, Guangming Zhu, Syed Afaq Ali Shah, Mohammed Bennamoun, Liang Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.03742
Pdf link: https://arxiv.org/pdf/2401.03742
Abstract Flowcharts and mind maps, collectively known as flowmind, are vital in daily activities, with hand-drawn versions facilitating real-time collaboration. However, there's a growing need to digitize them for efficient processing. Automated conversion methods are essential to overcome manual conversion challenges. Existing sketch recognition methods face limitations in practical situations, being field-specific and lacking digital conversion steps. Our paper introduces the Flowmind2digital method and hdFlowmind dataset to address these challenges. Flowmind2digital, utilizing neural networks and keypoint detection, achieves a record 87.3% accuracy on our dataset, surpassing previous methods by 11.9%. The hdFlowmind dataset, comprising 1,776 annotated flowminds across 22 scenarios, outperforms existing datasets. Additionally, our experiments emphasize the importance of simple graphics, enhancing accuracy by 9.3%.
Flying Bird Object Detection Algorithm in Surveillance Video
Authors: Ziwei Sun, Zexi Hua, Hengchao Li, Yan Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.03749
Pdf link: https://arxiv.org/pdf/2401.03749
Abstract Aiming at the characteristics of the flying bird object in surveillance video, such as indistinct single-frame image features, small size in most cases, and asymmetry, this paper proposes a Flying Bird Object Detection method for Surveillance Video (FBOD-SV). Firstly, a new feature aggregation module, the Correlation Attention Feature Aggregation (Co-Attention-FA) module, is designed to aggregate the features of the flying bird object according to the bird object's correlation on multiple consecutive frames of images. Secondly, a Flying Bird Object Detection Network (FBOD-Net) with down-sampling and then up-sampling is designed, which uses a large feature layer that fuses fine spatial information and large receptive field information to detect special multi-scale (mostly small-scale) bird objects. Finally, the SimOTA dynamic label allocation method is applied to One-Category object detection, and the SimOTA-OC dynamic label strategy is proposed to solve the difficult problem of label allocation caused by irregular flying bird objects. In this paper, the algorithm's performance is verified by the experimental data set of the surveillance video of the flying bird object of the traction substation. The experimental results show that the surveillance video flying bird object detection method proposed in this paper effectively improves the detection performance of flying bird objects.
MvKSR: Multi-view Knowledge-guided Scene Recovery for Hazy and Rainy Degradation
Authors: Dong Yang, Wenyu Xu, Yuxu Lu, Yuan Gao, Jingming Zhang, Yu Guo
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2401.03800
Pdf link: https://arxiv.org/pdf/2401.03800
Abstract High-quality imaging is crucial for ensuring safety supervision and intelligent deployment in fields like transportation and industry. It enables precise and detailed monitoring of operations, facilitating timely detection of potential hazards and efficient management. However, adverse weather conditions, such as atmospheric haziness and precipitation, can have a significant impact on image quality. When the atmosphere contains dense haze or water droplets, the incident light scatters, leading to degraded captured images. This degradation is evident in the form of image blur and reduced contrast, increasing the likelihood of incorrect assessments and interpretations by intelligent imaging systems (IIS). To address the challenge of restoring degraded images in hazy and rainy conditions, this paper proposes a novel multi-view knowledge-guided scene recovery network (termed MvKSR). Specifically, guided filtering is performed on the degraded image to separate high/low-frequency components. Subsequently, an encoder-decoder-based multi-view feature coarse extraction module (MCE) is used to coarsely extract features from different views of the degraded image. The multi-view feature fine fusion module (MFF) will learn and infer the restoration of degraded images through mixed supervision under different views. Additionally, we suggest an atrous residual block to handle global restoration and local repair in hazy/rainy/mixed scenes. Extensive experimental results demonstrate that MvKSR outperforms other state-of-the-art methods in terms of efficiency and stability for restoring degraded scenarios in IIS.
WidthFormer: Toward Efficient Transformer-based BEV View Transformation
Authors: Chenhongyi Yang, Tianwei Lin, Lichao Huang, Elliot J. Crowley
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.03836
Pdf link: https://arxiv.org/pdf/2401.03836
Abstract In this work, we present WidthFormer, a novel transformer-based Bird's-Eye-View (BEV) 3D detection method tailored for real-time autonomous-driving applications. WidthFormer is computationally efficient, robust and does not require any special engineering effort to deploy. In this work, we propose a novel 3D positional encoding mechanism capable of accurately encapsulating 3D geometric information, which enables our model to generate high-quality BEV representations with only a single transformer decoder layer. This mechanism is also beneficial for existing sparse 3D object detectors. Inspired by recently proposed works, we further improve our model's efficiency by vertically compressing the image features when serving as attention keys and values. We also introduce two modules to compensate for potential information loss due to feature compression. Experimental evaluation on the widely-used nuScenes 3D object detection benchmark demonstrates that our method outperforms previous approaches across different 3D detection architectures. More importantly, our model is highly efficient. For example, when using $256\times 704$ input images, it achieves 1.5 ms latency on an NVIDIA 3090 GPU. Furthermore, WidthFormer also exhibits strong robustness to different degrees of camera perturbations. Our study offers valuable insights into the deployment of BEV transformation methods in real-world, complex road environments. Code is available at https://github.com/ChenhongyiYang/WidthFormer .
UFO: Unidentified Foreground Object Detection in 3D Point Cloud
Authors: Hyunjun Choi, Hawook Jeong, Jin Young Choi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.03846
Pdf link: https://arxiv.org/pdf/2401.03846
Abstract In this paper, we raise a new issue on Unidentified Foreground Object (UFO) detection in 3D point clouds, which is a crucial technology in autonomous driving in the wild. UFO detection is challenging in that existing 3D object detectors encounter extremely hard challenges in both 3D localization and Out-of-Distribution (OOD) detection. To tackle these challenges, we suggest a new UFO detection framework including three tasks: evaluation protocol, methodology, and benchmark. The evaluation includes a new approach to measure the performance on our goal, i.e. both localization and OOD detection of UFOs. The methodology includes practical techniques to enhance the performance of our goal. The benchmark is composed of the KITTI Misc benchmark and our additional synthetic benchmark for modeling a more diverse range of UFOs. The proposed framework consistently enhances performance by a large margin across all four baseline detectors: SECOND, PointPillars, PV-RCNN, and PartA2, giving insight for future work on UFO detection in the wild.
Survey and Analysis of DNS Filtering Components
Authors: Jonathan Magnusson
Subjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2401.03864
Pdf link: https://arxiv.org/pdf/2401.03864
Abstract The Domain Name System (DNS) comprises name servers translating domain names into, commonly, IP addresses. Authoritative name servers host the resource records (RR) for certain zones, and resolver name servers are responsible for querying and answering DNS queries on behalf of their clients. Unfortunately, cybercriminals often use DNS for malicious purposes, such as phishing, malware distribution, and botnet communication. To combat these threats, filtering resolvers have become increasingly popular, employing various techniques to identify and block malicious requests. In this paper, we survey several techniques to implement and enhance the capabilities of filtering resolvers including response policy zones, threat intelligence feeds, and detection of algorithmically generated domains. We identify the current trends of each area and find missing intersections in the literature, which could be used to improve the effectiveness of filtering resolvers. In addition, we propose future work designing a framework for filtering resolvers using state-of-the-art approaches identified in this study.
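Editor's sketch (not from the paper): the abstract above names two filtering ideas, blocklist matching in the style of response policy zones and detection of algorithmically generated domains. The snippet below illustrates both in a deliberately simplified form. The blocklist entries, the entropy threshold, and the minimum label length are all assumptions chosen for illustration; real resolvers and the surveyed detectors are considerably more sophisticated.

```python
import math
from collections import Counter

# Hypothetical blocklist; a real deployment would load this from a
# response policy zone or a threat intelligence feed.
BLOCKED_ZONES = {"malware.example", "phish.example.net"}


def is_blocklisted(qname, blocked=BLOCKED_ZONES):
    """Return True if qname or any of its parent domains is blocklisted."""
    labels = qname.lower().rstrip(".").split(".")
    # Check "a.b.c", then "b.c", then "c".
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))


def label_entropy(label):
    """Shannon entropy (bits per character) of a single DNS label."""
    counts = Counter(label)
    total = len(label)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def looks_generated(qname, threshold=3.5, min_length=12):
    """Crude heuristic: long, high-entropy first label (assumed thresholds)."""
    first = qname.lower().rstrip(".").split(".")[0]
    return len(first) >= min_length and label_entropy(first) > threshold


def filter_query(qname):
    """Return 'block' or 'allow' for a query name."""
    if is_blocklisted(qname) or looks_generated(qname):
        return "block"
    return "allow"
```

For example, `filter_query("cdn.malware.example")` is blocked through the parent-domain match, while `filter_query("www.example.org")` is allowed.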
RoboFusion: Towards Robust Multi-Modal 3D Object Detection via SAM
Authors: Ziying Song, Guoxing Zhang, Lin Liu, Lei Yang, Shaoqing Xu, Caiyan Jia, Feiyang Jia, Li Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.03907
Pdf link: https://arxiv.org/pdf/2401.03907
Abstract Multi-modal 3D object detectors are dedicated to exploring secure and reliable perception systems for autonomous driving (AD). However, while achieving state-of-the-art (SOTA) performance on clean benchmark datasets, they tend to overlook the complexity and harsh conditions of real-world environments. Meanwhile, with the emergence of visual foundation models (VFMs), opportunities and challenges are presented for improving the robustness and generalization of multi-modal 3D object detection in autonomous driving. Therefore, we propose RoboFusion, a robust framework that leverages VFMs like SAM to tackle out-of-distribution (OOD) noise scenarios. We first adapt the original SAM for autonomous driving scenarios named SAM-AD. To align SAM or SAM-AD with multi-modal methods, we then introduce AD-FPN for upsampling the image features extracted by SAM. We employ wavelet decomposition to denoise the depth-guided images for further noise reduction and weather interference. Lastly, we employ self-attention mechanisms to adaptively reweight the fused features, enhancing informative features while suppressing excess noise. In summary, our RoboFusion gradually reduces noise by leveraging the generalization and robustness of VFMs, thereby enhancing the resilience of multi-modal 3D object detection. Consequently, our RoboFusion achieves state-of-the-art performance in noisy scenarios, as demonstrated by the KITTI-C and nuScenes-C benchmarks.
TextMachina: Seamless Generation of Machine-Generated Text Datasets
Authors: Areg Mikael Sarvazyan, José Ángel González, Marc Franco-Salvador
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2401.03946
Pdf link: https://arxiv.org/pdf/2401.03946
Abstract Recent advancements in Large Language Models (LLMs) have led to high-quality Machine-Generated Text (MGT), giving rise to countless new use cases and applications. However, easy access to LLMs is posing new challenges due to misuse. To address malicious usage, researchers have released datasets to effectively train models on MGT-related tasks. Similar strategies are used to compile these datasets, but no tool currently unifies them. In this scenario, we introduce TextMachina, a modular and extensible Python framework, designed to aid in the creation of high-quality, unbiased datasets to build robust models for MGT-related tasks such as detection, attribution, or boundary detection. It provides a user-friendly pipeline that abstracts away the inherent intricacies of building MGT datasets, such as LLM integrations, prompt templating, and bias mitigation. The quality of the datasets generated by TextMachina has been assessed in previous works, including shared tasks where more than one hundred teams trained robust MGT detectors.
MS-DETR: Efficient DETR Training with Mixed Supervision
Authors: Chuyang Zhao, Yifan Sun, Wenhao Wang, Qiang Chen, Errui Ding, Yi Yang, Jingdong Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.03989
Pdf link: https://arxiv.org/pdf/2401.03989
Abstract DETR accomplishes end-to-end object detection through iteratively generating multiple object candidates based on image features and promoting one candidate for each ground-truth object. The traditional training procedure using one-to-one supervision in the original DETR lacks direct supervision for the object detection candidates. We aim at improving the DETR training efficiency by explicitly supervising the candidate generation procedure through mixing one-to-one supervision and one-to-many supervision. Our approach, namely MS-DETR, is simple, and places one-to-many supervision to the object queries of the primary decoder that is used for inference. In comparison to existing DETR variants with one-to-many supervision, such as Group DETR and Hybrid DETR, our approach does not need additional decoder branches or object queries. The object queries of the primary decoder in our approach directly benefit from one-to-many supervision and thus are superior in object candidate prediction. Experimental results show that our approach outperforms related DETR variants, such as DN-DETR, Hybrid DETR, and Group DETR, and the combination with related DETR variants further improves the performance.
Generative adversarial wavelet neural operator: Application to fault detection and isolation of multivariate time series data
Authors: Jyoti Rani, Tapas Tripura, Hariprasad Kodamana, Souvik Chakraborty
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2401.04004
Pdf link: https://arxiv.org/pdf/2401.04004
Abstract Fault detection and isolation in complex systems are critical to ensure reliable and efficient operation. However, traditional fault detection methods often struggle with issues such as nonlinearity and multivariate characteristics of the time series variables. This article proposes a generative adversarial wavelet neural operator (GAWNO) as a novel unsupervised deep learning approach for fault detection and isolation of multivariate time series processes. The GAWNO combines the strengths of wavelet neural operators and generative adversarial networks (GANs) to effectively capture both the temporal distributions and the spatial dependencies among different variables of an underlying system. The approach of fault detection and isolation using GAWNO consists of two main stages. In the first stage, the GAWNO is trained on a dataset of normal operating conditions to learn the underlying data distribution. In the second stage, a reconstruction error-based threshold approach using the trained GAWNO is employed to detect and isolate faults based on the discrepancy values. We validate the proposed approach using the Tennessee Eastman Process (TEP) dataset and Avedore wastewater treatment plant (WWTP) and N2O emissions named as WWTPN2O datasets. Overall, we showcase that the idea of harnessing the power of wavelet analysis, neural operators, and generative models in a single framework to detect and isolate faults has shown promising results compared to various well-established baselines in the literature.
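Editor's sketch (not from the paper): the two-stage procedure described above, fit on normal data and then threshold the reconstruction error, can be illustrated without the GAWNO itself. Here a per-variable mean stands in for the trained generator, and the threshold rule (mean plus three standard deviations of the error on normal data) is an assumption; the abstract does not say how the authors set their threshold.

```python
import statistics


def fit_mean_model(normal_data):
    """Stand-in for a trained generator: the per-variable mean of normal data."""
    n_vars = len(normal_data[0])
    return [statistics.fmean(row[j] for row in normal_data) for j in range(n_vars)]


def per_variable_errors(sample, reconstruction):
    """Squared reconstruction error for each variable of one sample."""
    return [(x - r) ** 2 for x, r in zip(sample, reconstruction)]


def fit_threshold(normal_data, model, n_sigmas=3.0):
    """Stage 1: threshold = mean + n_sigmas * std of total error on normal data."""
    totals = [sum(per_variable_errors(row, model)) for row in normal_data]
    return statistics.fmean(totals) + n_sigmas * statistics.pstdev(totals)


def detect_and_isolate(sample, model, threshold):
    """Stage 2: flag a fault and report the variable with the largest error."""
    errors = per_variable_errors(sample, model)
    is_fault = sum(errors) > threshold
    culprit = max(range(len(errors)), key=errors.__getitem__) if is_fault else None
    return is_fault, culprit
```

Detection uses the total error across variables; isolation simply points at the variable contributing the most error, which is one plausible reading of "discrepancy values" rather than the paper's stated rule.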
Identifying Fabricated Networks within Authorship-for-Sale Enterprises
Authors: Simon J. Porter, Leslie D. McIntosh
Subjects: Digital Libraries (cs.DL)
Arxiv link: https://arxiv.org/abs/2401.04022
Pdf link: https://arxiv.org/pdf/2401.04022
Abstract Fabricated papers do not just need text, images, and data, they also require a fabricated or partially fabricated network of authors. Most 'authors' on a fabricated paper have not been associated with the research, but rather are added through a transaction. This lack of deeper connection means that there is a low likelihood that co-authors on fabricated papers will ever appear together on the same paper more than once. This paper constructs a model that encodes some of the key characteristics of this activity in an 'authorship-for-sale' network with the aim to create a robust method to detect this type of activity. A characteristic network fingerprint arises from this model that provides a robust statistical approach to the detection of paper-mill networks. The model suggested in this paper detects networks that have a statistically significant overlap with other approaches that principally rely on textual analysis for the detection of fraudulent papers. Researchers connected to networks identified using the methodology outlined in this paper are shown to be connected with 37% of papers identified through the tortured-phrase and clay-feet methods deployed in the Problematic Paper Screener website. Finally, methods to limit the expansion and propagation of these networks are discussed both in technological and social terms.
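Editor's sketch (not from the paper): the abstract's key observation is that co-authors on fabricated papers rarely appear together more than once. One simple way to measure that property on a set of papers is the fraction of co-author pairs that recur. This is only an illustration of the stated characteristic, not the authors' network model or fingerprint; the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def coauthor_pair_counts(papers):
    """Count how many papers each unordered author pair shares."""
    counts = Counter()
    for authors in papers:
        for pair in combinations(sorted(set(authors)), 2):
            counts[pair] += 1
    return counts


def repeat_pair_fraction(papers):
    """Fraction of co-author pairs that appear together on more than one paper."""
    counts = coauthor_pair_counts(papers)
    if not counts:
        return 0.0
    repeated = sum(1 for n in counts.values() if n > 1)
    return repeated / len(counts)
```

A value near zero over many papers would be consistent with the low-recurrence pattern the abstract describes, although genuine one-off collaborations produce the same signal, so it cannot be used alone.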
Digital Twin for Autonomous Surface Vessels for Safe Maritime Navigation
Authors: Daniel Menges, Andreas Von Brandis, Adil Rasheed
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2401.04032
Pdf link: https://arxiv.org/pdf/2401.04032
Abstract Autonomous surface vessels (ASVs) play an increasingly important role in the safety and sustainability of open sea operations. Since most maritime accidents are related to human failure, intelligent algorithms for autonomous collision avoidance and path following can drastically reduce the risk in the maritime sector. A digital twin (DT) is a virtual representative of a real physical system and can enhance the situational awareness (SITAW) of such an ASV to generate optimal decisions. This work builds on an existing DT framework for ASVs and demonstrates foundations for enabling predictive, prescriptive, and autonomous capabilities. In this context, sophisticated target tracking approaches are crucial for estimating and predicting the position and motion of other dynamic objects. The applied tracking method is enabled by real-time automatic identification system (AIS) data and synthetic light detection and ranging (Lidar) measurements. To guarantee safety during autonomous operations, we applied a predictive safety filter, based on the concept of nonlinear model predictive control (NMPC). The approaches are implemented into a DT built with the Unity game engine. As a result, this work demonstrates the potential of a DT capable of making predictions, playing through various what-if scenarios, and providing optimal control decisions according to its enhanced SITAW.
Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning
Authors: Chen Zhao, Shuming Liu, Karttikeya Mangalam, Guocheng Qian, Fatimah Zohra, Abdulmohsen Alghannam, Jitendra Malik, Bernard Ghanem
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.04105
Pdf link: https://arxiv.org/pdf/2401.04105
Abstract Large pretrained models are increasingly crucial in modern computer vision tasks. These models are typically used in downstream tasks by end-to-end finetuning, which is highly memory-intensive for tasks with high-resolution data, e.g., video understanding, small object detection, and point cloud analysis. In this paper, we propose Dynamic Reversible Dual-Residual Networks, or Dr$^2$Net, a novel family of network architectures that acts as a surrogate network to finetune a pretrained model with substantially reduced memory consumption. Dr$^2$Net contains two types of residual connections, one maintaining the residual structure in the pretrained models, and the other making the network reversible. Due to its reversibility, intermediate activations, which can be reconstructed from output, are cleared from memory during training. We use two coefficients on either type of residual connections respectively, and introduce a dynamic training strategy that seamlessly transitions the pretrained model to a reversible network with much higher numerical precision. We evaluate Dr$^2$Net on various pretrained models and various tasks, and show that it can reach comparable performance to conventional finetuning but with significantly less memory usage.
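Editor's sketch (not from the paper): the memory saving described above rests on reversibility, meaning that a block's inputs can be recomputed from its outputs instead of being stored. The snippet shows the generic additive-coupling form of a reversible block on scalars. It is not Dr$^2$Net's dual-residual formulation with its two coefficients; the branch functions `f` and `g` are hypothetical placeholders.

```python
import math


def f(x):
    """Hypothetical residual branch F; it does not need to be invertible."""
    return math.tanh(x)


def g(x):
    """Hypothetical residual branch G."""
    return 0.5 * x * x


def forward(x1, x2):
    """Additive coupling: each output adds a branch of the other stream."""
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2


def inverse(y1, y2):
    """Recover the inputs from the outputs, so activations need not be stored."""
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2
```

The reconstruction is exact only up to floating-point rounding, which is why the abstract's remark about numerical precision matters when many such blocks are stacked.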
Keyword: face recognition

CATFace: Cross-Attribute-Guided Transformer with Self-Attention Distillation for Low-Quality Face Recognition
Authors: Niloufar Alipour Talemi, Hossein Kashiani, Nasser M. Nasrabadi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.03037
Pdf link: https://arxiv.org/pdf/2401.03037
Abstract Although face recognition (FR) has achieved great success in recent years, it is still challenging to accurately recognize faces in low-quality images due to the obscured facial details. Nevertheless, it is often feasible to make predictions about specific soft biometric (SB) attributes, such as gender and baldness, even when dealing with low-quality images. In this paper, we propose a novel multi-branch neural network that leverages SB attribute information to boost the performance of FR. To this end, we propose a cross-attribute-guided transformer fusion (CATF) module that effectively captures the long-range dependencies and relationships between FR and SB feature representations. The synergy created by the reciprocal flow of information in the dual cross-attention operations of the proposed CATF module enhances the performance of FR. Furthermore, we introduce a novel self-attention distillation framework that effectively highlights crucial facial regions, such as landmarks, by aligning low-quality images with those of their high-quality counterparts in the feature space. The proposed self-attention distillation regularizes our network to learn a unified quality-invariant feature representation in unconstrained environments. We conduct extensive experiments on various FR benchmarks varying in quality. Experimental results demonstrate the superiority of our FR method compared to state-of-the-art FR studies.
Transferable Learned Image Compression-Resistant Adversarial Perturbations
Authors: Yang Sui, Zhuohang Li, Ding Ding, Xiang Pan, Xiaozhong Xu, Shan Liu, Zhenzhong Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2401.03115
Pdf link: https://arxiv.org/pdf/2401.03115
Abstract Adversarial attacks can readily disrupt the image classification system, revealing the vulnerability of DNN-based recognition tasks. While existing adversarial perturbations are primarily applied to uncompressed images or compressed images by the traditional image compression method, i.e., JPEG, limited studies have investigated the robustness of models for image classification in the context of DNN-based image compression. With the rapid evolution of advanced image compression, DNN-based learned image compression has emerged as the promising approach for transmitting images in many security-critical applications, such as cloud-based face recognition and autonomous driving, due to its superior performance over traditional compression. Therefore, there is a pressing need to fully investigate the robustness of a classification system post-processed by learned image compression. To bridge this research gap, we explore the adversarial attack on a new pipeline that targets image classification models that utilize learned image compressors as pre-processing modules. Furthermore, to enhance the transferability of perturbations across various quality levels and architectures of learned image compression models, we introduce a saliency score-based sampling method to enable the fast generation of transferable perturbation. Extensive experiments with popular attack methods demonstrate the enhanced transferability of our proposed method when attacking images that have been post-processed with different learned image compression models.
Keyword: augmentation

Improving Natural Language Understanding with Computation-Efficient Retrieval Representation Fusion
Authors: Shangyu Wu, Ying Xiong, Yufei Cui, Xue Liu, Buzhou Tang, Tei-Wei Kuo, Chun Jason Xue
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.02993
Pdf link: https://arxiv.org/pdf/2401.02993
Abstract Retrieval-based augmentations that aim to incorporate knowledge from an external database into language models have achieved great success in various knowledge-intensive (KI) tasks, such as question-answering and text generation. However, integrating retrievals in non-knowledge-intensive (NKI) tasks, such as text classification, is still challenging. Existing works focus on concatenating retrievals to inputs as context to form the prompt-based inputs. Unfortunately, such methods require language models to have the capability to handle long texts. Besides, inferring such concatenated data would also consume a significant amount of computational resources. To solve these challenges, we propose ReFusion in this paper, a computation-efficient Retrieval representation Fusion with neural architecture search. The main idea is to directly fuse the retrieval representations into the language models. Specifically, we first propose an online retrieval module that retrieves representations of similar sentences. Then, we present a retrieval fusion module including two effective ranking schemes, i.e., reranker-based scheme and ordered-mask-based scheme, to fuse the retrieval representations with hidden states. Furthermore, we use Neural Architecture Search (NAS) to seek the optimal fusion structure across different layers. Finally, we conduct comprehensive experiments, and the results demonstrate our ReFusion can achieve superior and robust performance on various NKI tasks.
Advancing DDoS Attack Detection: A Synergistic Approach Using Deep Residual Neural Networks and Synthetic Oversampling
Authors: Ali Alfatemi, Mohamed Rahouti, Ruhul Amin, Sarah ALJamal, Kaiqi Xiong, Yufeng Xin
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.03116
Pdf link: https://arxiv.org/pdf/2401.03116
Abstract Distributed Denial of Service (DDoS) attacks pose a significant threat to the stability and reliability of online systems. Effective and early detection of such attacks is pivotal for safeguarding the integrity of networks. In this work, we introduce an enhanced approach for DDoS attack detection by leveraging the capabilities of Deep Residual Neural Networks (ResNets) coupled with synthetic oversampling techniques. Because of the inherent class imbalance in many cyber-security datasets, conventional methods often struggle with false negatives, misclassifying subtle DDoS patterns as benign. By applying the Synthetic Minority Over-sampling Technique (SMOTE) to the CICIDS dataset, we balance the representation of benign and malicious data points, enabling the model to better discern intricate patterns indicative of an attack. Our deep residual network, tailored for this specific task, further refines the detection process. Experimental results on a real-world dataset demonstrate that our approach achieves an accuracy of 99.98%, significantly outperforming traditional methods. This work underscores the potential of combining advanced data augmentation techniques with deep learning models to bolster cyber-security defenses.
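Editor's sketch (not from the paper): SMOTE, as named in the abstract above, balances classes by creating synthetic minority samples on the line segments between a minority point and one of its nearest minority neighbours. The following is a minimal standard-library version of that textbook idea, not the authors' preprocessing pipeline; the function names, the default `k`, and the seeding are illustrative assumptions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of base, excluding base itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

Oversampling of this kind is normally applied to the training split only; applying it before the train/test split lets synthetic neighbours of test points leak into training and can inflate reported accuracy.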
Enhancing Context Through Contrast
Authors: Kshitij Ambilduke, Aneesh Shetye, Diksha Bagade, Rishika Bhagwatkar, Khurshed Fitter, Prasad Vagdargi, Shital Chiddarwar
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.03314
Pdf link: https://arxiv.org/pdf/2401.03314
Abstract Neural machine translation benefits from semantically rich representations. Considerable progress in learning such representations has been achieved by language modelling and mutual information maximization objectives using contrastive learning. The language-dependent nature of language modelling introduces a trade-off between the universality of the learned representations and the model's performance on the language modelling tasks. Although contrastive learning improves performance, its success cannot be attributed to mutual information alone. We propose a novel Context Enhancement step to improve performance on neural machine translation by maximizing mutual information using the Barlow Twins loss. Unlike other approaches, we do not explicitly augment the data but view languages as implicit augmentations, eradicating the risk of disrupting semantic information. Further, our method does not learn embeddings from scratch and can be generalised to any set of pre-trained embeddings. Finally, we evaluate the language-agnosticism of our embeddings through language classification and use them for neural machine translation to compare with state-of-the-art approaches.
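Editor's sketch (not from the paper): the Barlow Twins loss mentioned above pushes the cross-correlation matrix between two batches of embeddings toward the identity, so matching dimensions agree and different dimensions are decorrelated. In the abstract's setting the two "views" would be embeddings of the same sentences in two languages. The code below is a plain-Python rendering of the standard loss; the off-diagonal weight `lam` is an assumed default, and every embedding dimension is assumed to have non-zero variance over the batch.

```python
import statistics


def standardize_columns(z):
    """Normalise each embedding dimension to zero mean, unit std over the batch."""
    cols = list(zip(*z))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in z]


def barlow_twins_loss(z1, z2, lam=5e-3):
    """Sum of (1 - C_ii)^2 plus lam times the sum of squared off-diagonal C_ij."""
    n, d = len(z1), len(z1[0])
    a, b = standardize_columns(z1), standardize_columns(z2)
    loss = 0.0
    for i in range(d):
        for j in range(d):
            # Cross-correlation between dimension i of view 1 and j of view 2.
            c_ij = sum(a[r][i] * b[r][j] for r in range(n)) / n
            loss += (1.0 - c_ij) ** 2 if i == j else lam * c_ij ** 2
    return loss
```

With identical views whose dimensions are uncorrelated the loss is zero; it grows as matching dimensions disagree or as distinct dimensions become redundant.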
Classifying cow stall numbers using YOLO
Authors: Dheeraj Vajjarapu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.03340
Pdf link: https://arxiv.org/pdf/2401.03340
Abstract This paper introduces the CowStallNumbers dataset, a collection of images extracted from videos focusing on cow teats, designed to advance the field of cow stall number detection. The dataset comprises 1042 training images and 261 test images, featuring stall numbers ranging from 0 to 60. To enhance the dataset, we performed fine-tuning on a YOLO model and applied data augmentation techniques, including random crop, center crop, and random rotation. The experimental outcomes demonstrate a notable 95.4% accuracy in recognizing stall numbers.
Predicting the Skies: A Novel Model for Flight-Level Passenger Traffic Forecasting
Authors: Sian Ehsani, Elina Sergeeva, Wendy Murdy, Benjamin Fox
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Applications (stat.AP)
Arxiv link: https://arxiv.org/abs/2401.03397
Pdf link: https://arxiv.org/pdf/2401.03397
Abstract Accurate prediction of flight-level passenger traffic is of paramount importance in airline operations, influencing key decisions from pricing to route optimization. This study introduces a novel, multimodal deep learning approach to the challenge of predicting flight-level passenger traffic, yielding substantial accuracy improvements compared to traditional models. Leveraging an extensive dataset from American Airlines, our model ingests historical traffic data, fare closure information, and seasonality attributes specific to each flight. Our proposed neural network integrates the strengths of Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN), exploiting the temporal patterns and spatial relationships within the data to enhance prediction performance. Crucial to the success of our model is a comprehensive data processing strategy. We construct 3D tensors to represent data, apply careful masking strategies to mirror real-world dynamics, and employ data augmentation techniques to enrich the diversity of our training set. The efficacy of our approach is borne out in the results: our model demonstrates an approximate 33% improvement in Mean Squared Error (MSE) compared to traditional benchmarks. This study, therefore, highlights the significant potential of deep learning techniques and meticulous data processing in advancing the field of flight traffic prediction.
ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge
Authors: He Wang, Pengcheng Guo, Yue Li, Ao Zhang, Jiayao Sun, Lei Xie, Wei Chen, Pan Zhou, Hui Bu, Xin Xu, Binbin Zhang, Zhuo Chen, Jian Wu, Longbiao Wang, Eng Siong Chng, Sun Li
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2401.03473
Pdf link: https://arxiv.org/pdf/2401.03473
Abstract To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge. This challenge collects over 100 hours of multi-channel speech data recorded inside a new energy vehicle and 40 hours of noise for data augmentation. Two tracks, including automatic speech recognition (ASR) and automatic speech diarization and recognition (ASDR) are set up, using character error rate (CER) and concatenated minimum permutation character error rate (cpCER) as evaluation metrics, respectively. Overall, the ICMC-ASR Challenge attracts 98 participating teams and receives 53 valid results in both tracks. In the end, first-place team USTCiflytek achieves a CER of 13.16% in the ASR track and a cpCER of 21.48% in the ASDR track, showing an absolute improvement of 13.08% and 51.4% compared to our challenge baseline, respectively.
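Editor's sketch (not from the paper): character error rate (CER), the ASR-track metric above, is conventionally the character-level edit distance between hypothesis and reference divided by the reference length. The snippet shows that standard definition; it does not cover cpCER, which additionally searches over speaker permutations, and it is not the challenge's official scoring script.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Because insertions count as errors, CER can exceed 1.0 when the hypothesis is much longer than the reference.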
Unifying Graph Contrastive Learning via Graph Message Augmentation
Authors: Ziyan Zhang, Bo Jiang, Jin Tang, Bin Luo
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.03638
Pdf link: https://arxiv.org/pdf/2401.03638
Abstract Graph contrastive learning is usually performed by first conducting Graph Data Augmentation (GDA) and then employing a contrastive learning pipeline to train GNNs. GDA is known to be an important issue for graph contrastive learning. Various GDAs have been developed recently which mainly involve dropping or perturbing edges, nodes, node attributes and edge attributes. However, to our knowledge, there is still no universal and effective augmentor that is suitable for different types of graph data. To address this issue, in this paper, we first introduce the graph message representation of graph data. Based on it, we then propose a novel Graph Message Augmentation (GMA), a universal scheme for reformulating many existing GDAs. The proposed unified GMA not only gives a new perspective to understand many existing GDAs but also provides a universal and more effective graph data augmentation for graph self-supervised learning tasks. Moreover, GMA introduces an easy way to implement the mixup augmentor which is natural for images but usually challenging for graphs. Based on the proposed GMA, we then propose a unified graph contrastive learning, termed Graph Message Contrastive Learning (GMCL), that employs attribution-guided universal GMA for graph contrastive learning. Experiments on many graph learning tasks demonstrate the effectiveness and benefits of the proposed GMA and GMCL approaches.
NeRFmentation: NeRF-based Augmentation for Monocular Depth Estimation
Authors: Casimir Feldmann, Niall Siegenheim, Nikolas Hars, Lovro Rabuzin, Mert Ertugrul, Luca Wolfart, Marc Pollefeys, Zuria Bauer, Martin R. Oswald
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.03771
Pdf link: https://arxiv.org/pdf/2401.03771
Abstract The capabilities of monocular depth estimation (MDE) models are limited by the availability of sufficient and diverse datasets. In the case of MDE models for autonomous driving, this issue is exacerbated by the linearity of the captured data trajectories. We propose a NeRF-based data augmentation pipeline to introduce synthetic data with more diverse viewing directions into training datasets and demonstrate the benefits of our approach to model performance and robustness. Our data augmentation pipeline, which we call "NeRFmentation", trains NeRFs on each scene in the dataset, filters out subpar NeRFs based on relevant metrics, and uses them to generate synthetic RGB-D images captured from new viewing directions. In this work, we apply our technique in conjunction with three state-of-the-art MDE architectures on the popular autonomous driving dataset KITTI, augmenting its training set of the Eigen split. We evaluate the resulting performance gain on the original test set, a separate popular driving set, and our own synthetic test set.
Limitations of Data-Driven Spectral Reconstruction -- An Optics-Aware Analysis
Authors: Qiang Fu, Matheus Souza, Eunsue Choi, Suhyun Shin, Seung-Hwan Baek, Wolfgang Heidrich
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2401.03835
Pdf link: https://arxiv.org/pdf/2401.03835
Abstract Hyperspectral imaging empowers computer vision systems with the distinct capability of identifying materials through recording their spectral signatures. Recent efforts in data-driven spectral reconstruction aim at extracting spectral information from RGB images captured by cost-effective RGB cameras, instead of dedicated hardware. In this paper we systematically analyze the performance of such methods, evaluating both the practical limitations with respect to current datasets and overfitting, as well as fundamental limits with respect to the nature of the information encoded in the RGB images, and the dependency of this information on the optical system of the camera. We find that the current models are not robust under slight variations, e.g., in noise level or compression of the RGB file. Both the methods and the datasets are also limited in their ability to cope with metameric colors. This issue can in part be overcome with metameric data augmentation. Moreover, optical lens aberrations can help to improve the encoding of the metameric information into the RGB image, which paves the road towards higher performing spectral imaging and reconstruction approaches.
TTMs: Fast Multi-level Tiny Time Mixers for Improved Zero-shot and Few-shot Forecasting of Multivariate Time Series
Authors: Vijay Ekambaram, Arindam Jati, Nam H. Nguyen, Pankaj Dayama, Chandra Reddy, Wesley M. Gifford, Jayant Kalagnanam
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.03955
Pdf link: https://arxiv.org/pdf/2401.03955
Abstract Large Pretrained models for Zero/Few-shot learning excel in language and vision domains but encounter challenges in multivariate time series (TS) due to the diverse nature and scarcity of publicly available pretraining data. Consequently, there has been a recent surge in utilizing pretrained large language models (LLMs) with various adaptations for time series forecasting. These approaches employ cross-domain transfer learning, yielding highly impressive results. However, these models are typically very large ($\sim$ billion parameters), exhibit slow execution, and do not consider cross-channel correlations. To address this, we present Multi-level Tiny Time Mixers (TTM), a significantly smaller model based on the lightweight TSMixer architecture. TTM marks the first success in developing tiny pretrained models ($\le$1 million parameters), exclusively trained on public TS data with effective transfer learning capabilities. To tackle the complexity of pretraining on multiple datasets with varied temporal resolutions, we introduce several novel enhancements such as adaptive patching, dataset augmentation via downsampling, and resolution prefix tuning. Moreover, we employ a multi-level modeling strategy to effectively model channel correlations and incorporate exogenous signals during finetuning, a crucial capability lacking in existing benchmarks. TTM excels in few/zero-shot forecasting, demonstrating significant accuracy gains (12-38%) over existing benchmarks. Further, it achieves a remarkable 14-106X reduction in model parameters, enabling 54-65X faster training/inference as compared to the LLM-TS benchmarks. In fact, TTM's zero-shot results often surpass the few-shot results in many benchmarks, highlighting the efficacy of our approach. Code and Pretrained Models will be open-sourced.
