New submissions for Fri, 12 Apr 24

Keyword: detection

Global versus Local: Evaluating AlexNet Architectures for Tropical Cyclone Intensity Estimation

Authors: Authors: Vikas Dwivedi
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Atmospheric and Oceanic Physics (physics.ao-ph)
Arxiv link: https://arxiv.org/abs/2404.07395
Pdf link: https://arxiv.org/pdf/2404.07395
Abstract Given the destructive impacts of tropical cyclones, it is critical to have a reliable system for cyclone intensity detection. Various techniques are available for this purpose, each with differing levels of accuracy. In this paper, we introduce two ensemble-based models based on AlexNet architecture to estimate tropical cyclone intensity using visible satellite images. The first model, trained on the entire dataset, is called the global AlexNet model. The second model is a distributed version of AlexNet in which multiple AlexNets are trained separately on subsets of the training data categorized according to the Saffir-Simpson wind speed scale prescribed by the meterologists. We evaluated the performance of both models against a deep learning benchmark model called \textit{Deepti} using a publicly available cyclone image dataset. Results indicate that both the global model (with a root mean square error (RMSE) of 9.03 knots) and the distributed model (with a RMSE of 9.3 knots) outperform the benchmark model (with a RMSE of 13.62 knots). We provide a thorough discussion of our solution approach, including an explanantion of the AlexNet's performance using gradient class activation maps (grad-CAM). Our proposed solution strategy allows future experimentation with various deep learning models in both single and multi-channel settings.
Simplifying Two-Stage Detectors for On-Device Inference in Remote Sensing
Authors: Authors: Jaemin Kang, Hoeseok Yang, Hyungshin Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.07405
Pdf link: https://arxiv.org/pdf/2404.07405
Abstract Deep learning has been successfully applied to object detection from remotely sensed images. Images are typically processed on the ground rather than on-board due to the computation power of the ground system. Such offloaded processing causes delays in acquiring target mission information, which hinders its application to real-time use cases. For on-device object detection, researches have been conducted on designing efficient detectors or model compression to reduce inference latency. However, highly accurate two-stage detectors still need further exploitation for acceleration. In this paper, we propose a model simplification method for two-stage object detectors. Instead of constructing a general feature pyramid, we utilize only one feature extraction in the two-stage detector. To compensate for the accuracy drop, we apply a high pass filter to the RPN's score map. Our approach is applicable to any two-stage detector using a feature pyramid network. In the experiments with state-of-the-art two-stage detectors such as ReDet, Oriented-RCNN, and LSKNet, our method reduced computation costs upto 61.2% with the accuracy loss within 2.1% on the DOTAv1.5 dataset. Source code will be released.
Unveiling Behavioral Transparency of Protocols Communicated by IoT Networked Assets (Full Version)
Authors: Authors: Savindu Wannigama (1), Arunan Sivanathan (2), Ayyoob Hamza (2), Hassan Habibi Gharakheili (2) ((1) Department of Computer Engineering, University of Peradeniya, Sri Lanka. (2) School of EE&T, UNSW Sydney, Australia.)
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2404.07408
Pdf link: https://arxiv.org/pdf/2404.07408
Abstract Behavioral transparency for Internet-of-Things (IoT) networked assets involves two distinct yet interconnected tasks: (a) characterizing device types by discerning the patterns exhibited in their network traffic, and (b) assessing vulnerabilities they introduce to the network. While identifying communication protocols, particularly at the application layer, plays a vital role in effective network management, current methods are, at best, ad-hoc. Accurate protocol identification and attribute extraction from packet payloads are crucial for distinguishing devices and discovering vulnerabilities. This paper makes three contributions: (1) We process a public dataset to construct specific packet traces pertinent to six standard protocols (TLS, HTTP, DNS, NTP, DHCP, and SSDP) of ten commercial IoT devices. We manually analyze TLS and HTTP flows, highlighting their characteristics, parameters, and adherence to best practices-we make our data publicly available; (2) We develop a common model to describe protocol signatures that help with the systematic analysis of protocols even when communicated through non-standard port numbers; and, (3) We evaluate the efficacy of our data models for the six protocols, which constitute approximately 97% of our dataset. Our data models, except for SSDP in 0.3% of Amazon Echo's flows, produce no false positives for protocol detection. We draw insights into how various IoT devices behave across those protocols by applying these models to our IoT traces.
Privacy preserving layer partitioning for Deep Neural Network models
Authors: Authors: Kishore Rajasekar, Randolph Loh, Kar Wai Fok, Vrizlynn L. L. Thing
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2404.07437
Pdf link: https://arxiv.org/pdf/2404.07437
Abstract MLaaS (Machine Learning as a Service) has become popular in the cloud computing domain, allowing users to leverage cloud resources for running private inference of ML models on their data. However, ensuring user input privacy and secure inference execution is essential. One of the approaches to protect data privacy and integrity is to use Trusted Execution Environments (TEEs) by enabling execution of programs in secure hardware enclave. Using TEEs can introduce significant performance overhead due to the additional layers of encryption, decryption, security and integrity checks. This can lead to slower inference times compared to running on unprotected hardware. In our work, we enhance the runtime performance of ML models by introducing layer partitioning technique and offloading computations to GPU. The technique comprises two distinct partitions: one executed within the TEE, and the other carried out using a GPU accelerator. Layer partitioning exposes intermediate feature maps in the clear which can lead to reconstruction attacks to recover the input. We conduct experiments to demonstrate the effectiveness of our approach in protecting against input reconstruction attacks developed using trained conditional Generative Adversarial Network(c-GAN). The evaluation is performed on widely used models such as VGG-16, ResNet-50, and EfficientNetB0, using two datasets: ImageNet for Image classification and TON IoT dataset for cybersecurity attack detection.
Enhancing Network Intrusion Detection Performance using Generative Adversarial Networks
Authors: Authors: Xinxing Zhao, Kar Wai Fok, Vrizlynn L. L. Thing
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2404.07464
Pdf link: https://arxiv.org/pdf/2404.07464
Abstract Network intrusion detection systems (NIDS) play a pivotal role in safeguarding critical digital infrastructures against cyber threats. Machine learning-based detection models applied in NIDS are prevalent today. However, the effectiveness of these machine learning-based models is often limited by the evolving and sophisticated nature of intrusion techniques as well as the lack of diverse and updated training samples. In this research, a novel approach for enhancing the performance of an NIDS through the integration of Generative Adversarial Networks (GANs) is proposed. By harnessing the power of GANs in generating synthetic network traffic data that closely mimics real-world network behavior, we address a key challenge associated with NIDS training datasets, which is the data scarcity. Three distinct GAN models (Vanilla GAN, Wasserstein GAN and Conditional Tabular GAN) are implemented in this work to generate authentic network traffic patterns specifically tailored to represent the anomalous activity. We demonstrate how this synthetic data resampling technique can significantly improve the performance of the NIDS model for detecting such activity. By conducting comprehensive experiments using the CIC-IDS2017 benchmark dataset, augmented with GAN-generated data, we offer empirical evidence that shows the effectiveness of our proposed approach. Our findings show that the integration of GANs into NIDS can lead to enhancements in intrusion detection performance for attacks with limited training data, making it a promising avenue for bolstering the cybersecurity posture of organizations in an increasingly interconnected and vulnerable digital landscape.
Trashbusters: Deep Learning Approach for Litter Detection and Tracking
Authors: Authors: Kashish Jain, Manthan Juthani, Jash Jain, Anant V. Nimkar
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.07467
Pdf link: https://arxiv.org/pdf/2404.07467
Abstract The illegal disposal of trash is a major public health and environmental concern. Disposing of trash in unplanned places poses serious health and environmental risks. We should try to restrict public trash cans as much as possible. This research focuses on automating the penalization of litterbugs, addressing the persistent problem of littering in public places. Traditional approaches relying on manual intervention and witness reporting suffer from delays, inaccuracies, and anonymity issues. To overcome these challenges, this paper proposes a fully automated system that utilizes surveillance cameras and advanced computer vision algorithms for litter detection, object tracking, and face recognition. The system accurately identifies and tracks individuals engaged in littering activities, attaches their identities through face recognition, and enables efficient enforcement of anti-littering policies. By reducing reliance on manual intervention, minimizing human error, and providing prompt identification, the proposed system offers significant advantages in addressing littering incidents. The primary contribution of this research lies in the implementation of the proposed system, leveraging advanced technologies to enhance surveillance operations and automate the penalization of litterbugs.
DeVAIC: A Tool for Security Assessment of AI-generated Code
Authors: Authors: Domenico Cotroneo, Roberta De Luca, Pietro Liguori
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2404.07548
Pdf link: https://arxiv.org/pdf/2404.07548
Abstract Context: AI code generators are revolutionizing code writing and software development, but their training on large datasets, including potentially untrusted source code, raises security concerns. Furthermore, these generators can produce incomplete code snippets that are challenging to evaluate using current solutions. Objective: This research work introduces DeVAIC (Detection of Vulnerabilities in AI-generated Code), a tool to evaluate the security of AI-generated Python code, which overcomes the challenge of examining incomplete code. Method: We followed a methodological approach that involved gathering vulnerable samples, extracting implementation patterns, and creating regular expressions to develop the proposed tool. The implementation of DeVAIC includes a set of detection rules based on regular expressions that cover 35 Common Weakness Enumerations (CWEs) falling under the OWASP Top 10 vulnerability categories. Results: We utilized four popular AI models to generate Python code, which we then used as a foundation to evaluate the effectiveness of our tool. DeVAIC demonstrated a statistically significant difference in its ability to detect security vulnerabilities compared to the state-of-the-art solutions, showing an F1 Score and Accuracy of 94% while maintaining a low computational cost of 0.14 seconds per code snippet, on average. Conclusions: The proposed tool provides a lightweight and efficient solution for vulnerability detection even on incomplete code.
SFSORT: Scene Features-based Simple Online Real-Time Tracker
Authors: Authors: M. M. Morsali, Z. Sharifi, F. Fallah, S. Hashembeiki, H. Mohammadzade, S. Bagheri Shouraki
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.07553
Pdf link: https://arxiv.org/pdf/2404.07553
Abstract This paper introduces SFSORT, the world's fastest multi-object tracking system based on experiments conducted on MOT Challenge datasets. To achieve an accurate and computationally efficient tracker, this paper employs a tracking-by-detection method, following the online real-time tracking approach established in prior literature. By introducing a novel cost function called the Bounding Box Similarity Index, this work eliminates the Kalman Filter, leading to reduced computational requirements. Additionally, this paper demonstrates the impact of scene features on enhancing object-track association and improving track post-processing. Using a 2.2 GHz Intel Xeon CPU, the proposed method achieves an HOTA of 61.7\% with a processing speed of 2242 Hz on the MOT17 dataset and an HOTA of 60.9\% with a processing speed of 304 Hz on the MOT20 dataset. The tracker's source code, fine-tuned object detection model, and tutorials are available at \url{https://github.com/gitmehrdad/SFSORT}.
Fragile Model Watermark for integrity protection: leveraging boundary volatility and sensitive sample-pairing
Authors: Authors: ZhenZhe Gao, Zhenjun Tang, Zhaoxia Yin, Baoyuan Wu, Yue Lu
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.07572
Pdf link: https://arxiv.org/pdf/2404.07572
Abstract Neural networks have increasingly influenced people's lives. Ensuring the faithful deployment of neural networks as designed by their model owners is crucial, as they may be susceptible to various malicious or unintentional modifications, such as backdooring and poisoning attacks. Fragile model watermarks aim to prevent unexpected tampering that could lead DNN models to make incorrect decisions. They ensure the detection of any tampering with the model as sensitively as possible.However, prior watermarking methods suffered from inefficient sample generation and insufficient sensitivity, limiting their practical applicability. Our approach employs a sample-pairing technique, placing the model boundaries between pairs of samples, while simultaneously maximizing logits. This ensures that the model's decision results of sensitive samples change as much as possible and the Top-1 labels easily alter regardless of the direction it moves.
GLID: Pre-training a Generalist Encoder-Decoder Vision Model
Authors: Authors: Jihao Liu, Jinliang Zheng, Yu Liu, Hongsheng Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.07603
Pdf link: https://arxiv.org/pdf/2404.07603
Abstract This paper proposes a GeneraLIst encoder-Decoder (GLID) pre-training method for better handling various downstream computer vision tasks. While self-supervised pre-training approaches, e.g., Masked Autoencoder, have shown success in transfer learning, task-specific sub-architectures are still required to be appended for different downstream tasks, which cannot enjoy the benefits of large-scale pre-training. GLID overcomes this challenge by allowing the pre-trained generalist encoder-decoder to be fine-tuned on various vision tasks with minimal task-specific architecture modifications. In the GLID training scheme, pre-training pretext task and other downstream tasks are modeled as "query-to-answer" problems, including the pre-training pretext task and other downstream tasks. We pre-train a task-agnostic encoder-decoder with query-mask pairs. During fine-tuning, GLID maintains the pre-trained encoder-decoder and queries, only replacing the topmost linear transformation layer with task-specific linear heads. This minimizes the pretrain-finetune architecture inconsistency and enables the pre-trained model to better adapt to downstream tasks. GLID achieves competitive performance on various vision tasks, including object detection, image segmentation, pose estimation, and depth estimation, outperforming or matching specialist models such as Mask2Former, DETR, ViTPose, and BinsFormer.
Automatic Detection of Dark Ship-to-Ship Transfers using Deep Learning and Satellite Imagery
Authors: Authors: Ollie Ballinger
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.07607
Pdf link: https://arxiv.org/pdf/2404.07607
Abstract Despite extensive research into ship detection via remote sensing, no studies identify ship-to-ship transfers in satellite imagery. Given the importance of transshipment in illicit shipping practices, this is a significant gap. In what follows, I train a convolutional neural network to accurately detect 4 different types of cargo vessel and two different types of Ship-to-Ship transfer in PlanetScope satellite imagery. I then elaborate a pipeline for the automatic detection of suspected illicit ship-to-ship transfers by cross-referencing satellite detections with vessel borne GPS data. Finally, I apply this method to the Kerch Strait between Ukraine and Russia to identify over 400 dark transshipment events since 2022.
Multi-Image Visual Question Answering for Unsupervised Anomaly Detection
Authors: Authors: Jun Li, Cosmin I. Bercea, Philip Müller, Lina Felsner, Suhwan Kim, Daniel Rueckert, Benedikt Wiestler, Julia A. Schnabel
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.07622
Pdf link: https://arxiv.org/pdf/2404.07622
Abstract Unsupervised anomaly detection enables the identification of potential pathological areas by juxtaposing original images with their pseudo-healthy reconstructions generated by models trained exclusively on normal images. However, the clinical interpretation of resultant anomaly maps presents a challenge due to a lack of detailed, understandable explanations. Recent advancements in language models have shown the capability of mimicking human-like understanding and providing detailed descriptions. This raises an interesting question: \textit{How can language models be employed to make the anomaly maps more explainable?} To the best of our knowledge, we are the first to leverage a language model for unsupervised anomaly detection, for which we construct a dataset with different questions and answers. Additionally, we present a novel multi-image visual question answering framework tailored for anomaly detection, incorporating diverse feature fusion strategies to enhance visual knowledge extraction. Our experiments reveal that the framework, augmented by our new Knowledge Q-Former module, adeptly answers questions on the anomaly detection dataset. Besides, integrating anomaly maps as inputs distinctly aids in improving the detection of unseen pathologies.
2DLIW-SLAM:2D LiDAR-Inertial-Wheel Odometry with Real-Time Loop Closure
Authors: Authors: Bin Zhang, Zexin Peng, Bi Zeng, Junjie Lu
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2404.07644
Pdf link: https://arxiv.org/pdf/2404.07644
Abstract Due to budgetary constraints, indoor navigation typically employs 2D LiDAR rather than 3D LiDAR. However, the utilization of 2D LiDAR in Simultaneous Localization And Mapping (SLAM) frequently encounters challenges related to motion degeneracy, particularly in geometrically similar environments. To address this problem, this paper proposes a robust, accurate, and multi-sensor-fused 2D LiDAR SLAM system specifically designed for indoor mobile robots. To commence, the original LiDAR data undergoes meticulous processing through point and line extraction. Leveraging the distinctive characteristics of indoor environments, line-line constraints are established to complement other sensor data effectively, thereby augmenting the overall robustness and precision of the system. Concurrently, a tightly-coupled front-end is created, integrating data from the 2D LiDAR, IMU, and wheel odometry, thus enabling real-time state estimation. Building upon this solid foundation, a novel global feature point matching-based loop closure detection algorithm is proposed. This algorithm proves highly effective in mitigating front-end accumulated errors and ultimately constructs a globally consistent map. The experimental results indicate that our system fully meets real-time requirements. When compared to Cartographer, our system not only exhibits lower trajectory errors but also demonstrates stronger robustness, particularly in degeneracy problem.
Separated Attention: An Improved Cycle GAN Based Under Water Image Enhancement Method
Authors: Authors: Tashmoy Ghosh
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2404.07649
Pdf link: https://arxiv.org/pdf/2404.07649
Abstract In this paper we have present an improved Cycle GAN based model for under water image enhancement. We have utilized the cycle consistent learning technique of the state-of-the-art Cycle GAN model with modification in the loss function in terms of depth-oriented attention which enhance the contrast of the overall image, keeping global content, color, local texture, and style information intact. We trained the Cycle GAN model with the modified loss functions on the benchmarked Enhancing Underwater Visual Perception (EUPV) dataset a large dataset including paired and unpaired sets of underwater images (poor and good quality) taken with seven distinct cameras in a range of visibility situation during research on ocean exploration and human-robot cooperation. In addition, we perform qualitative and quantitative evaluation which supports the given technique applied and provided a better contrast enhancement model of underwater imagery. More significantly, the upgraded images provide better results from conventional models and further for under water navigation, pose estimation, saliency prediction, object detection and tracking. The results validate the appropriateness of the model for autonomous underwater vehicles (AUV) in visual navigation.
Finding Dino: A plug-and-play framework for unsupervised detection of out-of-distribution objects using prototypes
Authors: Authors: Poulami Sinhamahapatra, Franziska Schwaiger, Shirsha Bose, Huiyu Wang, Karsten Roscher, Stephan Guennemann
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.07664
Pdf link: https://arxiv.org/pdf/2404.07664
Abstract Detecting and localising unknown or Out-of-distribution (OOD) objects in any scene can be a challenging task in vision. Particularly, in safety-critical cases involving autonomous systems like automated vehicles or trains. Supervised anomaly segmentation or open-world object detection models depend on training on exhaustively annotated datasets for every domain and still struggle in distinguishing between background and OOD objects. In this work, we present a plug-and-play generalised framework - PRototype-based zero-shot OOD detection Without Labels (PROWL). It is an inference-based method that does not require training on the domain dataset and relies on extracting relevant features from self-supervised pre-trained models. PROWL can be easily adapted to detect OOD objects in any operational design domain by specifying a list of known classes from this domain. PROWL, as an unsupervised method, outperforms other supervised methods trained without auxiliary OOD data on the RoadAnomaly and RoadObstacle datasets provided in SegmentMeIfYouCan (SMIYC) benchmark. We also demonstrate its suitability for other domains such as rail and maritime scenes.
Dealing with Subject Similarity in Differential Morphing Attack Detection
Authors: Authors: Nicolò Di Domenico, Guido Borghi, Annalisa Franco, Davide Maltoni
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2404.07667
Pdf link: https://arxiv.org/pdf/2404.07667
Abstract The advent of morphing attacks has posed significant security concerns for automated Face Recognition systems, raising the pressing need for robust and effective Morphing Attack Detection (MAD) methods able to effectively address this issue. In this paper, we focus on Differential MAD (D-MAD), where a trusted live capture, usually representing the criminal, is compared with the document image to classify it as morphed or bona fide. We show these approaches based on identity features are effective when the morphed image and the live one are sufficiently diverse; unfortunately, the effectiveness is significantly reduced when the same approaches are applied to look-alike subjects or in all those cases when the similarity between the two compared images is high (e.g. comparison between the morphed image and the accomplice). Therefore, in this paper, we propose ACIdA, a modular D-MAD system, consisting of a module for the attempt type classification, and two modules for the identity and artifacts analysis on input images. Successfully addressing this task would allow broadening the D-MAD applications including, for instance, the document enrollment stage, which currently relies entirely on human evaluation, thus limiting the possibility of releasing ID documents with manipulated images, as well as the automated gates to detect both accomplices and criminals. An extensive cross-dataset experimental evaluation conducted on the introduced scenario shows that ACIdA achieves state-of-the-art results, outperforming literature competitors, while maintaining good performance in traditional D-MAD benchmarks.
Run-time Monitoring of 3D Object Detection in Automated Driving Systems Using Early Layer Neural Activation Patterns
Authors: Authors: Hakan Yekta Yatbaz, Mehrdad Dianati, Konstantinos Koufos, Roger Woodman
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2404.07685
Pdf link: https://arxiv.org/pdf/2404.07685
Abstract Monitoring the integrity of object detection for errors within the perception module of automated driving systems (ADS) is paramount for ensuring safety. Despite recent advancements in deep neural network (DNN)-based object detectors, their susceptibility to detection errors, particularly in the less-explored realm of 3D object detection, remains a significant concern. State-of-the-art integrity monitoring (also known as introspection) mechanisms in 2D object detection mainly utilise the activation patterns in the final layer of the DNN-based detector's backbone. However, that may not sufficiently address the complexities and sparsity of data in 3D object detection. To this end, we conduct, in this article, an extensive investigation into the effects of activation patterns extracted from various layers of the backbone network for introspecting the operation of 3D object detectors. Through a comparative analysis using Kitti and NuScenes datasets with PointPillars and CenterPoint detectors, we demonstrate that using earlier layers' activation patterns enhances the error detection performance of the integrity monitoring system, yet increases computational complexity. To address the real-time operation requirements in ADS, we also introduce a novel introspection method that combines activation patterns from multiple layers of the detector's backbone and report its performance.
Chaos in Motion: Unveiling Robustness in Remote Heart Rate Measurement through Brain-Inspired Skin Tracking
Authors: Authors: Jie Wang, Jing Lian, Minjie Ma, Junqiang Lei, Chunbiao Li, Bin Li, Jizhao Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.07687
Pdf link: https://arxiv.org/pdf/2404.07687
Abstract Heart rate is an important physiological indicator of human health status. Existing remote heart rate measurement methods typically involve facial detection followed by signal extraction from the region of interest (ROI). These SOTA methods have three serious problems: (a) inaccuracies even failures in detection caused by environmental influences or subject movement; (b) failures for special patients such as infants and burn victims; (c) privacy leakage issues resulting from collecting face video. To address these issues, we regard the remote heart rate measurement as the process of analyzing the spatiotemporal characteristics of the optical flow signal in the video. We apply chaos theory to computer vision tasks for the first time, thus designing a brain-inspired framework. Firstly, using an artificial primary visual cortex model to extract the skin in the videos, and then calculate heart rate by time-frequency analysis on all pixels. Our method achieves Robust Skin Tracking for Heart Rate measurement, called HR-RST. The experimental results show that HR-RST overcomes the difficulty of environmental influences and effectively tracks the subject movement. Moreover, the method could extend to other body parts. Consequently, the method can be applied to special patients and effectively protect individual privacy, offering an innovative solution.
SWI-FEED: Smart Water IoT Framework for Evaluation of Energy and Data in Massive Scenarios
Authors: Authors: Antonino Pagano, Domenico Garlisi, Fabrizio Giuliano, Tiziana Cattai, Francesca Cuomo
Subjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2404.07692
Pdf link: https://arxiv.org/pdf/2404.07692
Abstract This paper presents a comprehensive framework designed to facilitate the widespread deployment of the Internet of Things (IoT) for enhanced monitoring and optimization of Water Distribution Systems (WDSs). The framework aims to investigate the utilization of massive IoT in monitoring and optimizing WDSs, with a particular focus on leakage detection, energy consumption and wireless network performance assessment in real-world water networks. The framework integrates simulation environments at both the application level (using EPANET) and the radio level (using NS-3) within the LoRaWAN network. The paper culminates with a practical use case, alongside evaluation results concerning power consumption in a large-scale LoRaWAN network and strategies for optimal gateway positioning.
OpenTrench3D: A Photogrammetric 3D Point Cloud Dataset for Semantic Segmentation of Underground Utilities
Authors: Authors: Lasse H. Hansen, Simon B. Jensen, Mark P. Philipsen, Andreas Møgelmose, Lars Bodum, Thomas B. Moeslund
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.07711
Pdf link: https://arxiv.org/pdf/2404.07711
Abstract Identifying and classifying underground utilities is an important task for efficient and effective urban planning and infrastructure maintenance. We present OpenTrench3D, a novel and comprehensive 3D Semantic Segmentation point cloud dataset, designed to advance research and development in underground utility surveying and mapping. OpenTrench3D covers a completely novel domain for public 3D point cloud datasets and is unique in its focus, scope, and cost-effective capturing method. The dataset consists of 310 point clouds collected across 7 distinct areas. These include 5 water utility areas and 2 district heating utility areas. The inclusion of different geographical areas and main utilities (water and district heating utilities) makes OpenTrench3D particularly valuable for inter-domain transfer learning experiments. We provide benchmark results for the dataset using three state-of-the-art semantic segmentation models, PointNeXt, PointVector and PointMetaBase. Benchmarks are conducted by training on data from water areas, fine-tuning on district heating area 1 and evaluating on district heating area 2. The dataset is publicly available. With OpenTrench3D, we seek to foster innovation and progress in the field of 3D semantic segmentation in applications related to detection and documentation of underground utilities as well as in transfer learning methods in general.
Point cloud obstacle detection with the map filtration
Authors: Authors: Lukas Kratochvila
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2404.07730
Pdf link: https://arxiv.org/pdf/2404.07730
Abstract Obstacle detection is one of the basic tasks of a robot movement in an unknown environment. The use of a LiDAR (Light Detection And Ranging) sensor allows one to obtain a point cloud in the vicinity of the sensor. After processing this data, obstacles can be found and recorded on a map. For this task, I present a pipeline capable of detecting obstacles even on a computationally limited device. The pipeline was also tested on a real robot and qualitatively evaluated on a dataset, which was collected in Brno University of Technology lab. Time consumption was recorded and compared with 3D object detectors.
Exploiting Object-based and Segmentation-based Semantic Features for Deep Learning-based Indoor Scene Classification
Authors: Authors: Ricardo Pereira, Luís Garrote, Tiago Barros, Ana Lopes, Urbano J. Nunes
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.07739
Pdf link: https://arxiv.org/pdf/2404.07739
Abstract Indoor scenes are usually characterized by scattered objects and their relationships, which turns the indoor scene classification task into a challenging computer vision task. Despite the significant performance boost in classification tasks achieved in recent years, provided by the use of deep-learning-based methods, limitations such as inter-category ambiguity and intra-category variation have been holding back their performance. To overcome such issues, gathering semantic information has been shown to be a promising source of information towards a more complete and discriminative feature representation of indoor scenes. Therefore, the work described in this paper uses both semantic information, obtained from object detection, and semantic segmentation techniques. While object detection techniques provide the 2D location of objects allowing to obtain spatial distributions between objects, semantic segmentation techniques provide pixel-level information that allows to obtain, at a pixel-level, a spatial distribution and shape-related features of the segmentation categories. Hence, a novel approach that uses a semantic segmentation mask to provide Hu-moments-based segmentation categories' shape characterization, designated by Segmentation-based Hu-Moments Features (SHMFs), is proposed. Moreover, a three-main-branch network, designated by GOS$^2$F$^2$App, that exploits deep-learning-based global features, object-based features, and semantic segmentation-based features is also proposed. GOS$^2$F$^2$App was evaluated in two indoor scene benchmark datasets: SUN RGB-D and NYU Depth V2, where, to the best of our knowledge, state-of-the-art results were achieved on both datasets, which present evidences of the effectiveness of the proposed approach.
3D-CSAD: Untrained 3D Anomaly Detection for Complex Manufacturing Surfaces
Authors: Authors: Xuanming Cao, Chengyu Tao, Juan Du
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.07748
Pdf link: https://arxiv.org/pdf/2404.07748
Abstract The surface quality inspection of manufacturing parts based on 3D point cloud data has attracted increasing attention in recent years. The reason is that the 3D point cloud can capture the entire surface of manufacturing parts, unlike the previous practices that focus on some key product characteristics. However, achieving accurate 3D anomaly detection is challenging, due to the complex surfaces of manufacturing parts and the difficulty of collecting sufficient anomaly samples. To address these challenges, we propose a novel untrained anomaly detection method based on 3D point cloud data for complex manufacturing parts, which can achieve accurate anomaly detection in a single sample without training data. In the proposed framework, we transform an input sample into two sets of profiles along different directions. Based on one set of the profiles, a novel segmentation module is devised to segment the complex surface into multiple basic and simple components. In each component, another set of profiles, which have the nature of similar shapes, can be modeled as a low-rank matrix. Thus, accurate 3D anomaly detection can be achieved by using Robust Principal Component Analysis (RPCA) on these low-rank matrices. Extensive numerical experiments on different types of parts show that our method achieves promising results compared with the benchmark methods.
Generating consistent PDDL domains with Large Language Models
Authors: Authors: Pavel Smirnov, Frank Joublin, Antonello Ceravola, Michael Gienger
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.07751
Pdf link: https://arxiv.org/pdf/2404.07751
Abstract Large Language Models (LLMs) are capable of transforming natural language domain descriptions into plausibly looking PDDL markup. However, ensuring that actions are consistent within domains still remains a challenging task. In this paper we present a novel concept to significantly improve the quality of LLM-generated PDDL models by performing automated consistency checking during the generation process. Although the proposed consistency checking strategies still can't guarantee absolute correctness of generated models, they can serve as valuable source of feedback reducing the amount of correction efforts expected from a human in the loop. We demonstrate the capabilities of our error detection approach on a number of classical and custom planning domains (logistics, gripper, tyreworld, household, pizza).
ConsistencyDet: Robust Object Detector with Denoising Paradigm of Consistency Model
Authors: Authors: Lifan Jiang, Zhihui Wang, Changmiao Wang, Ming Li, Jiaxu Leng, Xindong Wu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.07773
Pdf link: https://arxiv.org/pdf/2404.07773
Abstract Object detection, a quintessential task in the realm of perceptual computing, can be tackled using a generative methodology. In the present study, we introduce a novel framework designed to articulate object detection as a denoising diffusion process, which operates on perturbed bounding boxes of annotated entities. This framework, termed ConsistencyDet, leverages an innovative denoising concept known as the Consistency Model. The hallmark of this model is its self-consistency feature, which empowers the model to map distorted information from any temporal stage back to its pristine state, thereby realizing a ``one-step denoising'' mechanism. Such an attribute markedly elevates the operational efficiency of the model, setting it apart from the conventional Diffusion Model. Throughout the training phase, ConsistencyDet initiates the diffusion sequence with noise-infused boxes derived from the ground-truth annotations and conditions the model to perform the denoising task. Subsequently, in the inference stage, the model employs a denoising sampling strategy that commences with bounding boxes randomly sampled from a normal distribution. Through iterative refinement, the model transforms an assortment of arbitrarily generated boxes into the definitive detections. Comprehensive evaluations employing standard benchmarks, such as MS-COCO and LVIS, corroborate that ConsistencyDet surpasses other leading-edge detectors in performance metrics.
AUG: A New Dataset and An Efficient Model for Aerial Image Urban Scene Graph Generation
Authors: Authors: Yansheng Li, Kun Li, Yongjun Zhang, Linlin Wang, Dingwen Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.07788
Pdf link: https://arxiv.org/pdf/2404.07788
Abstract Scene graph generation (SGG) aims to understand the visual objects and their semantic relationships from one given image. Until now, lots of SGG datasets with the eyelevel view are released but the SGG dataset with the overhead view is scarcely studied. By contrast to the object occlusion problem in the eyelevel view, which impedes the SGG, the overhead view provides a new perspective that helps to promote the SGG by providing a clear perception of the spatial relationships of objects in the ground scene. To fill in the gap of the overhead view dataset, this paper constructs and releases an aerial image urban scene graph generation (AUG) dataset. Images from the AUG dataset are captured with the low-attitude overhead view. In the AUG dataset, 25,594 objects, 16,970 relationships, and 27,175 attributes are manually annotated. To avoid the local context being overwhelmed in the complex aerial urban scene, this paper proposes one new locality-preserving graph convolutional network (LPG). Different from the traditional graph convolutional network, which has the natural advantage of capturing the global context for SGG, the convolutional layer in the LPG integrates the non-destructive initial features of the objects with dynamically updated neighborhood information to preserve the local context under the premise of mining the global context. To address the problem that there exists an extra-large number of potential object relationship pairs but only a small part of them is meaningful in AUG, we propose the adaptive bounding box scaling factor for potential relationship detection (ABS-PRD) to intelligently prune the meaningless relationship pairs. Extensive experiments on the AUG dataset show that our LPG can significantly outperform the state-of-the-art methods and the effectiveness of the proposed locality-preserving strategy.
Nostra Domina at EvaLatin 2024: Improving Latin Polarity Detection through Data Augmentation
Authors: Authors: Stephen Bothwell, Abigail Swenor, David Chiang
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.07792
Pdf link: https://arxiv.org/pdf/2404.07792
Abstract This paper describes submissions from the team Nostra Domina to the EvaLatin 2024 shared task of emotion polarity detection. Given the low-resource environment of Latin and the complexity of sentiment in rhetorical genres like poetry, we augmented the available data through automatic polarity annotation. We present two methods for doing so on the basis of the $k$-means algorithm, and we employ a variety of Latin large language models (LLMs) in a neural architecture to better capture the underlying contextual sentiment representations. Our best approach achieved the second highest macro-averaged Macro-$F_1$ score on the shared task's test set.
Illicit Promotion on Twitter
Authors: Authors: Hongyu Wang, Ying Li, Ronghong Huang, Xianghang Mi
Subjects: Cryptography and Security (cs.CR); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2404.07797
Pdf link: https://arxiv.org/pdf/2404.07797
Abstract In this paper, we present an extensive study of the promotion of illicit goods and services on Twitter, a popular online social network(OSN). This study is made possible through the design and implementation of multiple novel tools for detecting and analyzing illicit promotion activities as well as their underlying campaigns. As the results, we observe that illicit promotion is prevalent on Twitter, along with noticeable existence on other three popular OSNs including Youtube, Facebook, and TikTok. Particularly, 12 million distinct posts of illicit promotion (PIPs) have been observed on the Twitter platform, which are widely distributed in 5 major natural languages and 10 categories of illicit goods and services, e.g., drugs, data leakage, gambling, and weapon sales. What are also observed are 580K Twitter accounts publishing PIPs as well as 37K distinct instant messaging (IM) accounts that are embedded in PIPs and serve as next hops of communication, which strongly indicates that the campaigns underpinning PIPs are also of a large scale. Also, an arms race between Twitter and illicit promotion operators is also observed. On one hand, Twitter is observed to conduct content moderation in a continuous manner and almost 80% PIPs will get gradually unpublished within six months since posted. However, in the meantime, miscreants adopt various evasion tactics to masquerade their PIPs, which renders more than 90% PIPs keeping hidden from the detection radar for two months or longer.
Voice-Assisted Real-Time Traffic Sign Recognition System Using Convolutional Neural Network
Authors: Authors: Mayura Manawadu, Udaya Wijenayake
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.07807
Pdf link: https://arxiv.org/pdf/2404.07807
Abstract Traffic signs are important in communicating information to drivers. Thus, comprehension of traffic signs is essential for road safety and ignorance may result in road accidents. Traffic sign detection has been a research spotlight over the past few decades. Real-time and accurate detections are the preliminaries of robust traffic sign detection system which is yet to be achieved. This study presents a voice-assisted real-time traffic sign recognition system which is capable of assisting drivers. This system functions under two subsystems. Initially, the detection and recognition of the traffic signs are carried out using a trained Convolutional Neural Network (CNN). After recognizing the specific traffic sign, it is narrated to the driver as a voice message using a text-to-speech engine. An efficient CNN model for a benchmark dataset is developed for real-time detection and recognition using Deep Learning techniques. The advantage of this system is that even if the driver misses a traffic sign, or does not look at the traffic sign, or is unable to comprehend the sign, the system detects it and narrates it to the driver. A system of this type is also important in the development of autonomous vehicles.
Sparse Laneformer
Authors: Authors: Ji Liu, Zifeng Zhang, Mingjie Lu, Hongyang Wei, Dong Li, Yile Xie, Jinzhang Peng, Lu Tian, Ashish Sirasao, Emad Barsoum
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.07821
Pdf link: https://arxiv.org/pdf/2404.07821
Abstract Lane detection is a fundamental task in autonomous driving, and has achieved great progress as deep learning emerges. Previous anchor-based methods often design dense anchors, which highly depend on the training dataset and remain fixed during inference. We analyze that dense anchors are not necessary for lane detection, and propose a transformer-based lane detection framework based on a sparse anchor mechanism. To this end, we generate sparse anchors with position-aware lane queries and angle queries instead of traditional explicit anchors. We adopt Horizontal Perceptual Attention (HPA) to aggregate the lane features along the horizontal direction, and adopt Lane-Angle Cross Attention (LACA) to perform interactions between lane queries and angle queries. We also propose Lane Perceptual Attention (LPA) based on deformable cross attention to further refine the lane predictions. Our method, named Sparse Laneformer, is easy-to-implement and end-to-end trainable. Extensive experiments demonstrate that Sparse Laneformer performs favorably against the state-of-the-art methods, e.g., surpassing Laneformer by 3.0% F1 score and O2SFormer by 0.7% F1 score with fewer MACs on CULane with the same ResNet-34 backbone.
Streaming detection of significant delay changes in public transport systems
Authors: Authors: Przemysław Wrona, Maciej Grzenda, Marcin Luckner
Subjects: Machine Learning (cs.LG); Physics and Society (physics.soc-ph)
Arxiv link: https://arxiv.org/abs/2404.07860
Pdf link: https://arxiv.org/pdf/2404.07860
Abstract Public transport systems are expected to reduce pollution and contribute to sustainable development. However, disruptions in public transport such as delays may negatively affect mobility choices. To quantify delays, aggregated data from vehicle locations systems are frequently used. However, delays observed at individual stops are caused inter alia by fluctuations in running times and propagation of delays occurring in other locations. Hence, in this work, we propose both the method detecting significant delays and reference architecture, relying on stream processing engines, in which the method is implemented. The method can complement the calculation of delays defined as deviation from schedules. This provides both online rather than batch identification of significant and repetitive delays, and resilience to the limited quality of location data. The method we propose can be used with different change detectors, such as ADWIN, applied to location data stream shuffled to individual edges of a transport graph. It can detect in an online manner at which edges statistically significant delays are observed and at which edges delays arise and are reduced. Detections can be used to model mobility choices and quantify the impact of repetitive rather than random disruptions on feasible trips with multimodal trip modelling engines. The evaluation performed with the public transport data of over 2000 vehicles confirms the merits of the method and reveals that a limited-size subgraph of a transport system graph causes statistically significant delays
LeapFrog: The Rowhammer Instruction Skip Attack
Authors: Authors: Andrew Adiletta, Caner Tol, Berk Sunar
Subjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR)
Arxiv link: https://arxiv.org/abs/2404.07878
Pdf link: https://arxiv.org/pdf/2404.07878
Abstract Since its inception, Rowhammer exploits have rapidly evolved into increasingly sophisticated threats not only compromising data integrity but also the control flow integrity of victim processes. Nevertheless, it remains a challenge for an attacker to identify vulnerable targets (i.e., Rowhammer gadgets), understand the outcome of the attempted fault, and formulate an attack that yields useful results. In this paper, we present a new type of Rowhammer gadget, called a LeapFrog gadget, which, when present in the victim code, allows an adversary to subvert code execution to bypass a critical piece of code (e.g., authentication check logic, encryption rounds, padding in security protocols). The Leapfrog gadget manifests when the victim code stores the Program Counter (PC) value in the user or kernel stack (e.g., a return address during a function call) which, when tampered with, re-positions the return address to a location that bypasses a security-critical code pattern. This research also presents a systematic process to identify Leapfrog gadgets. This methodology enables the automated detection of susceptible targets and the determination of optimal attack parameters. We first showcase this new attack vector through a practical demonstration on a TLS handshake client/server scenario, successfully inducing an instruction skip in a client application. We then demonstrate the attack on real-world code found in the wild, implementing an attack on OpenSSL. Our findings extend the impact of Rowhammer attacks on control flow and contribute to the development of more robust defenses against these increasingly sophisticated threats.
Context-aware Video Anomaly Detection in Long-Term Datasets
Authors: Authors: Zhengye Yang, Richard Radke
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.07887
Pdf link: https://arxiv.org/pdf/2404.07887
Abstract Video anomaly detection research is generally evaluated on short, isolated benchmark videos only a few minutes long. However, in real-world environments, security cameras observe the same scene for months or years at a time, and the notion of anomalous behavior critically depends on context, such as the time of day, day of week, or schedule of events. Here, we propose a context-aware video anomaly detection algorithm, Trinity, specifically targeted to these scenarios. Trinity is especially well-suited to crowded scenes in which individuals cannot be easily tracked, and anomalies are due to speed, direction, or absence of group motion. Trinity is a contrastive learning framework that aims to learn alignments between context, appearance, and motion, and uses alignment quality to classify videos as normal or anomalous. We evaluate our algorithm on both conventional benchmarks and a public webcam-based dataset we collected that spans more than three months of activity.
Anomaly Detection in Power Grids via Context-Agnostic Learning
Authors: Authors: SangWoo Park, Amritanshu Pandey
Subjects: Machine Learning (cs.LG); Applications (stat.AP)
Arxiv link: https://arxiv.org/abs/2404.07898
Pdf link: https://arxiv.org/pdf/2404.07898
Abstract An important tool grid operators use to safeguard against failures, whether naturally occurring or malicious, involves detecting anomalies in the power system SCADA data. In this paper, we aim to solve a real-time anomaly detection problem. Given time-series measurement values coming from a fixed set of sensors on the grid, can we identify anomalies in the network topology or measurement data? Existing methods, primarily optimization-based, mostly use only a single snapshot of the measurement values and do not scale well with the network size. Recent data-driven ML techniques have shown promise by using a combination of current and historical data for anomaly detection but generally do not consider physical attributes like the impact of topology or load/generation changes on sensor measurements and thus cannot accommodate regular context-variability in the historical data. To address this gap, we propose a novel context-aware anomaly detection algorithm, GridCAL, that considers the effect of regular topology and load/generation changes. This algorithm converts the real-time power flow measurements to context-agnostic values, which allows us to analyze measurement coming from different grid contexts in an aggregate fashion, enabling us to derive a unified statistical model that becomes the basis of anomaly detection. Through numerical simulations on networks up to 2383 nodes, we show that our approach is accurate, outperforming state-of-the-art approaches, and is computationally efficient.
Triple Component Matrix Factorization: Untangling Global, Local, and Noisy Components
Authors: Authors: Naichen Shi, Salar Fattahi, Raed Al Kontar
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST)
Arxiv link: https://arxiv.org/abs/2404.07955
Pdf link: https://arxiv.org/pdf/2404.07955
Abstract In this work, we study the problem of common and unique feature extraction from noisy data. When we have N observation matrices from N different and associated sources corrupted by sparse and potentially gross noise, can we recover the common and unique components from these noisy observations? This is a challenging task as the number of parameters to estimate is approximately thrice the number of observations. Despite the difficulty, we propose an intuitive alternating minimization algorithm called triple component matrix factorization (TCMF) to recover the three components exactly. TCMF is distinguished from existing works in literature thanks to two salient features. First, TCMF is a principled method to separate the three components given noisy observations provably. Second, the bulk of the computation in TCMF can be distributed. On the technical side, we formulate the problem as a constrained nonconvex nonsmooth optimization problem. Despite the intricate nature of the problem, we provide a Taylor series characterization of its solution by solving the corresponding Karush-Kuhn-Tucker conditions. Using this characterization, we can show that the alternating minimization algorithm makes significant progress at each iteration and converges into the ground truth at a linear rate. Numerical experiments in video segmentation and anomaly detection highlight the superior feature extraction abilities of TCMF.
AD-NEv++ : The multi-architecture neuroevolution-based multivariate anomaly detection framework
Authors: Authors: Marcin Pietroń, Dominik Żurek, Kamil Faber, Roberto Corizzo
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.07968
Pdf link: https://arxiv.org/pdf/2404.07968
Abstract Anomaly detection tools and methods enable key analytical capabilities in modern cyberphysical and sensor-based systems. Despite the fast-paced development in deep learning architectures for anomaly detection, model optimization for a given dataset is a cumbersome and time-consuming process. Neuroevolution could be an effective and efficient solution to this problem, as a fully automated search method for learning optimal neural networks, supporting both gradient and non-gradient fine tuning. However, existing frameworks incorporating neuroevolution lack of support for new layers and architectures and are typically limited to convolutional and LSTM layers. In this paper we propose AD-NEv++, a three-stage neuroevolution-based method that synergically combines subspace evolution, model evolution, and fine-tuning. Our method overcomes the limitations of existing approaches by optimizing the mutation operator in the neuroevolution process, while supporting a wide spectrum of neural layers, including attention, dense, and graph convolutional layers. Our extensive experimental evaluation was conducted with widely adopted multivariate anomaly detection benchmark datasets, and showed that the models generated by AD-NEv++ outperform well-known deep learning architectures and neuroevolution-based approaches for anomaly detection. Moreover, results show that AD-NEv++ can improve and outperform the state-of-the-art GNN (Graph Neural Networks) model architecture in all anomaly detection benchmarks.
Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Representation Learning
Authors: Authors: Simon Schrodi, David T. Hoffmann, Max Argus, Volker Fischer, Thomas Brox
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.07983
Pdf link: https://arxiv.org/pdf/2404.07983
Abstract Contrastive vision-language models like CLIP have gained popularity for their versatile applicable learned representations in various downstream tasks. Despite their successes in some tasks, like zero-shot image recognition, they also perform surprisingly poor on other tasks, like attribute detection. Previous work has attributed these challenges to the modality gap, a separation of image and text in the shared representation space, and a bias towards objects over other factors, such as attributes. In this work we investigate both phenomena. We find that only a few embedding dimensions drive the modality gap. Further, we propose a measure for object bias and find that object bias does not lead to worse performance on other concepts, such as attributes. But what leads to the emergence of the modality gap and object bias? To answer this question we carefully designed an experimental setting which allows us to control the amount of shared information between the modalities. This revealed that the driving factor behind both, the modality gap and the object bias, is the information imbalance between images and captions.
OpenBias: Open-set Bias Detection in Text-to-Image Generative Models
Authors: Authors: Moreno D'Incà, Elia Peruzzo, Massimiliano Mancini, Dejia Xu, Vidit Goel, Xingqian Xu, Zhangyang Wang, Humphrey Shi, Nicu Sebe
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.07990
Pdf link: https://arxiv.org/pdf/2404.07990
Abstract Text-to-image generative models are becoming increasingly popular and accessible to the general public. As these models see large-scale deployments, it is necessary to deeply investigate their safety and fairness to not disseminate and perpetuate any kind of biases. However, existing works focus on detecting closed sets of biases defined a priori, limiting the studies to well-known concepts. In this paper, we tackle the challenge of open-set bias detection in text-to-image generative models presenting OpenBias, a new pipeline that identifies and quantifies the severity of biases agnostically, without access to any precompiled set. OpenBias has three stages. In the first phase, we leverage a Large Language Model (LLM) to propose biases given a set of captions. Secondly, the target generative model produces images using the same set of captions. Lastly, a Vision Question Answering model recognizes the presence and extent of the previously proposed biases. We study the behavior of Stable Diffusion 1.5, 2, and XL emphasizing new biases, never investigated before. Via quantitative experiments, we demonstrate that OpenBias agrees with current closed-set bias detection methods and human judgement.
Keyword: face recognition

Trashbusters: Deep Learning Approach for Litter Detection and Tracking
Authors: Authors: Kashish Jain, Manthan Juthani, Jash Jain, Anant V. Nimkar
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.07467
Pdf link: https://arxiv.org/pdf/2404.07467
Abstract The illegal disposal of trash is a major public health and environmental concern. Disposing of trash in unplanned places poses serious health and environmental risks. We should try to restrict public trash cans as much as possible. This research focuses on automating the penalization of litterbugs, addressing the persistent problem of littering in public places. Traditional approaches relying on manual intervention and witness reporting suffer from delays, inaccuracies, and anonymity issues. To overcome these challenges, this paper proposes a fully automated system that utilizes surveillance cameras and advanced computer vision algorithms for litter detection, object tracking, and face recognition. The system accurately identifies and tracks individuals engaged in littering activities, attaches their identities through face recognition, and enables efficient enforcement of anti-littering policies. By reducing reliance on manual intervention, minimizing human error, and providing prompt identification, the proposed system offers significant advantages in addressing littering incidents. The primary contribution of this research lies in the implementation of the proposed system, leveraging advanced technologies to enhance surveillance operations and automate the penalization of litterbugs.
Dealing with Subject Similarity in Differential Morphing Attack Detection
Authors: Authors: Nicolò Di Domenico, Guido Borghi, Annalisa Franco, Davide Maltoni
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2404.07667
Pdf link: https://arxiv.org/pdf/2404.07667
Abstract The advent of morphing attacks has posed significant security concerns for automated Face Recognition systems, raising the pressing need for robust and effective Morphing Attack Detection (MAD) methods able to effectively address this issue. In this paper, we focus on Differential MAD (D-MAD), where a trusted live capture, usually representing the criminal, is compared with the document image to classify it as morphed or bona fide. We show these approaches based on identity features are effective when the morphed image and the live one are sufficiently diverse; unfortunately, the effectiveness is significantly reduced when the same approaches are applied to look-alike subjects or in all those cases when the similarity between the two compared images is high (e.g. comparison between the morphed image and the accomplice). Therefore, in this paper, we propose ACIdA, a modular D-MAD system, consisting of a module for the attempt type classification, and two modules for the identity and artifacts analysis on input images. Successfully addressing this task would allow broadening the D-MAD applications including, for instance, the document enrollment stage, which currently relies entirely on human evaluation, thus limiting the possibility of releasing ID documents with manipulated images, as well as the automated gates to detect both accomplices and criminals. An extensive cross-dataset experimental evaluation conducted on the introduced scenario shows that ACIdA achieves state-of-the-art results, outperforming literature competitors, while maintaining good performance in traditional D-MAD benchmarks.
Keyword: augmentation

AI-Guided Defect Detection Techniques to Model Single Crystal Diamond Growth
Authors: Authors: Rohan Reddy Mekala, Elias Garratt, Matthias Muehle, Arjun Srinivasan, Adam Porter, Mikael Lindvall
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.07306
Pdf link: https://arxiv.org/pdf/2404.07306
Abstract From a process development perspective, diamond growth via chemical vapor deposition has made significant strides. However, challenges persist in achieving high quality and large-area material production. These difficulties include controlling conditions to maintain uniform growth rates for the entire growth surface. As growth progresses, various factors or defect states emerge, altering the uniform conditions. These changes affect the growth rate and result in the formation of crystalline defects at the microscale. However, there is a distinct lack of methods to identify these defect states and their geometry using images taken during the growth process. This paper details seminal work on defect segmentation pipeline using in-situ optical images to identify features that indicate defective states that are visible at the macroscale. Using a semantic segmentation approach as applied in our previous work, these defect states and corresponding derivative features are isolated and classified by their pixel masks. Using an annotation focused human-in-the-loop software architecture to produce training datasets, with modules for selective data labeling using active learning, data augmentations, and model-assisted labeling, our approach achieves effective annotation accuracy and drastically reduces the time and cost of labeling by orders of magnitude. On the model development front, we found that deep learning-based algorithms are the most efficient. They can accurately learn complex representations from feature-rich datasets. Our best-performing model, based on the YOLOV3 and DeeplabV3plus architectures, achieved excellent accuracy for specific features of interest. Specifically, it reached 93.35% accuracy for center defects, 92.83% for polycrystalline defects, and 91.98% for edge defects.
GANsemble for Small and Imbalanced Data Sets: A Baseline for Synthetic Microplastics Data
Authors: Authors: Daniel Platnick, Sourena Khanzadeh, Alireza Sadeghian, Richard Anthony Valenzano
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.07356
Pdf link: https://arxiv.org/pdf/2404.07356
Abstract Microplastic particle ingestion or inhalation by humans is a problem of growing concern. Unfortunately, current research methods that use machine learning to understand their potential harms are obstructed by a lack of available data. Deep learning techniques in particular are challenged by such domains where only small or imbalanced data sets are available. Overcoming this challenge often involves oversampling underrepresented classes or augmenting the existing data to improve model performance. This paper proposes GANsemble: a two-module framework connecting data augmentation with conditional generative adversarial networks (cGANs) to generate class-conditioned synthetic data. First, the data chooser module automates augmentation strategy selection by searching for the best data augmentation strategy. Next, the cGAN module uses this strategy to train a cGAN for generating enhanced synthetic data. We experiment with the GANsemble framework on a small and imbalanced microplastics data set. A Microplastic-cGAN (MPcGAN) algorithm is introduced, and baselines for synthetic microplastics (SYMP) data are established in terms of Frechet Inception Distance (FID) and Inception Scores (IS). We also provide a synthetic microplastics filter (SYMP-Filter) algorithm to increase the quality of generated SYMP. Additionally, we show the best amount of oversampling with augmentation to fix class imbalance in small microplastics data sets. To our knowledge, this study is the first application of generative AI to synthetically create microplastics data.
Leveraging Data Augmentation for Process Information Extraction
Authors: Authors: Julian Neuberger, Leonie Doll, Benedict Engelmann, Lars Ackermann, Stefan Jablonski
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.07501
Pdf link: https://arxiv.org/pdf/2404.07501
Abstract Business Process Modeling projects often require formal process models as a central component. High costs associated with the creation of such formal process models motivated many different fields of research aimed at automated generation of process models from readily available data. These include process mining on event logs, and generating business process models from natural language texts. Research in the latter field is regularly faced with the problem of limited data availability, hindering both evaluation and development of new techniques, especially learning-based ones. To overcome this data scarcity issue, in this paper we investigate the application of data augmentation for natural language text data. Data augmentation methods are well established in machine learning for creating new, synthetic data without human assistance. We find that many of these methods are applicable to the task of business process information extraction, improving the accuracy of extraction. Our study shows, that data augmentation is an important component in enabling machine learning methods for the task of business process model generation from natural language text, where currently mostly rule-based systems are still state of the art. Simple data augmentation techniques improved the $F_1$ score of mention extraction by 2.9 percentage points, and the $F_1$ of relation extraction by $4.5$. To better understand how data augmentation alters human annotated texts, we analyze the resulting text, visualizing and discussing the properties of augmented textual data. We make all code and experiments results publicly available.
Generalization Gap in Data Augmentation: Insights from Illumination
Authors: Authors: Jianqiang Xiao, Weiwen Guo, Junfeng Liu, Mengze Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.07514
Pdf link: https://arxiv.org/pdf/2404.07514
Abstract In the field of computer vision, data augmentation is widely used to enrich the feature complexity of training datasets with deep learning techniques. However, regarding the generalization capabilities of models, the difference in artificial features generated by data augmentation and natural visual features has not been fully revealed. This study focuses on the visual representation variable 'illumination', by simulating its distribution degradation and examining how data augmentation techniques enhance model performance on a classification task. Our goal is to investigate the differences in generalization between models trained with augmented data and those trained under real-world illumination conditions. Results indicate that after undergoing various data augmentation methods, model performance has been significantly improved. Yet, a noticeable generalization gap still exists after utilizing various data augmentation methods, emphasizing the critical role of feature diversity in the training set for enhancing model generalization.
AnnoCTR: A Dataset for Detecting and Linking Entities, Tactics, and Techniques in Cyber Threat Reports
Authors: Authors: Lukas Lange, Marc Müller, Ghazaleh Haratinezhad Torbati, Dragan Milchevski, Patrick Grau, Subhash Pujari, Annemarie Friedrich
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.07765
Pdf link: https://arxiv.org/pdf/2404.07765
Abstract Monitoring the threat landscape to be aware of actual or potential attacks is of utmost importance to cybersecurity professionals. Information about cyber threats is typically distributed using natural language reports. Natural language processing can help with managing this large amount of unstructured information, yet to date, the topic has received little attention. With this paper, we present AnnoCTR, a new CC-BY-SA-licensed dataset of cyber threat reports. The reports have been annotated by a domain expert with named entities, temporal expressions, and cybersecurity-specific concepts including implicitly mentioned techniques and tactics. Entities and concepts are linked to Wikipedia and the MITRE ATT&CK knowledge base, the most widely-used taxonomy for classifying types of attacks. Prior datasets linking to MITRE ATT&CK either provide a single label per document or annotate sentences out-of-context; our dataset annotates entire documents in a much finer-grained way. In an experimental study, we model the annotations of our dataset using state-of-the-art neural models. In our few-shot scenario, we find that for identifying the MITRE ATT&CK concepts that are mentioned explicitly or implicitly in a text, concept descriptions from MITRE ATT&CK are an effective source for training data augmentation.
MindBridge: A Cross-Subject Brain Decoding Framework
Authors: Authors: Shizun Wang, Songhua Liu, Zhenxiong Tan, Xinchao Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.07850
Pdf link: https://arxiv.org/pdf/2404.07850
Abstract Brain decoding, a pivotal field in neuroscience, aims to reconstruct stimuli from acquired brain signals, primarily utilizing functional magnetic resonance imaging (fMRI). Currently, brain decoding is confined to a per-subject-per-model paradigm, limiting its applicability to the same individual for whom the decoding model is trained. This constraint stems from three key challenges: 1) the inherent variability in input dimensions across subjects due to differences in brain size; 2) the unique intrinsic neural patterns, influencing how different individuals perceive and process sensory information; 3) limited data availability for new subjects in real-world scenarios hampers the performance of decoding models. In this paper, we present a novel approach, MindBridge, that achieves cross-subject brain decoding by employing only one model. Our proposed framework establishes a generic paradigm capable of addressing these challenges by introducing biological-inspired aggregation function and novel cyclic fMRI reconstruction mechanism for subject-invariant representation learning. Notably, by cycle reconstruction of fMRI, MindBridge can enable novel fMRI synthesis, which also can serve as pseudo data augmentation. Within the framework, we also devise a novel reset-tuning method for adapting a pretrained model to a new subject. Experimental results demonstrate MindBridge's ability to reconstruct images for multiple subjects, which is competitive with dedicated subject-specific models. Furthermore, with limited data for a new subject, we achieve a high level of decoding accuracy, surpassing that of subject-specific models. This advancement in cross-subject brain decoding suggests promising directions for wider applications in neuroscience and indicates potential for more efficient utilization of limited fMRI data in real-world scenarios. Project page: https://littlepure2333.github.io/MindBridge

LeeKyungwook / get-arxiv-noti

New submissions for Fri, 12 Apr 24 #1061

Keyword: detection

Global versus Local: Evaluating AlexNet Architectures for Tropical Cyclone Intensity Estimation

Simplifying Two-Stage Detectors for On-Device Inference in Remote Sensing

Unveiling Behavioral Transparency of Protocols Communicated by IoT Networked Assets (Full Version)

Privacy preserving layer partitioning for Deep Neural Network models

Enhancing Network Intrusion Detection Performance using Generative Adversarial Networks

Trashbusters: Deep Learning Approach for Litter Detection and Tracking

DeVAIC: A Tool for Security Assessment of AI-generated Code

SFSORT: Scene Features-based Simple Online Real-Time Tracker

Fragile Model Watermark for integrity protection: leveraging boundary volatility and sensitive sample-pairing

GLID: Pre-training a Generalist Encoder-Decoder Vision Model

Automatic Detection of Dark Ship-to-Ship Transfers using Deep Learning and Satellite Imagery

Multi-Image Visual Question Answering for Unsupervised Anomaly Detection

2DLIW-SLAM:2D LiDAR-Inertial-Wheel Odometry with Real-Time Loop Closure

Separated Attention: An Improved Cycle GAN Based Under Water Image Enhancement Method

Finding Dino: A plug-and-play framework for unsupervised detection of out-of-distribution objects using prototypes

Dealing with Subject Similarity in Differential Morphing Attack Detection

Run-time Monitoring of 3D Object Detection in Automated Driving Systems Using Early Layer Neural Activation Patterns

Chaos in Motion: Unveiling Robustness in Remote Heart Rate Measurement through Brain-Inspired Skin Tracking

SWI-FEED: Smart Water IoT Framework for Evaluation of Energy and Data in Massive Scenarios

OpenTrench3D: A Photogrammetric 3D Point Cloud Dataset for Semantic Segmentation of Underground Utilities

Point cloud obstacle detection with the map filtration

Exploiting Object-based and Segmentation-based Semantic Features for Deep Learning-based Indoor Scene Classification

3D-CSAD: Untrained 3D Anomaly Detection for Complex Manufacturing Surfaces

Generating consistent PDDL domains with Large Language Models

ConsistencyDet: Robust Object Detector with Denoising Paradigm of Consistency Model

AUG: A New Dataset and An Efficient Model for Aerial Image Urban Scene Graph Generation

Nostra Domina at EvaLatin 2024: Improving Latin Polarity Detection through Data Augmentation

Illicit Promotion on Twitter

Voice-Assisted Real-Time Traffic Sign Recognition System Using Convolutional Neural Network

Sparse Laneformer

Streaming detection of significant delay changes in public transport systems

LeapFrog: The Rowhammer Instruction Skip Attack

Context-aware Video Anomaly Detection in Long-Term Datasets

Anomaly Detection in Power Grids via Context-Agnostic Learning

Triple Component Matrix Factorization: Untangling Global, Local, and Noisy Components

AD-NEv++ : The multi-architecture neuroevolution-based multivariate anomaly detection framework

Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Representation Learning

OpenBias: Open-set Bias Detection in Text-to-Image Generative Models

Keyword: face recognition

Trashbusters: Deep Learning Approach for Litter Detection and Tracking

Dealing with Subject Similarity in Differential Morphing Attack Detection

Keyword: augmentation

AI-Guided Defect Detection Techniques to Model Single Crystal Diamond Growth

GANsemble for Small and Imbalanced Data Sets: A Baseline for Synthetic Microplastics Data

Leveraging Data Augmentation for Process Information Extraction

Generalization Gap in Data Augmentation: Insights from Illumination

AnnoCTR: A Dataset for Detecting and Linking Entities, Tactics, and Techniques in Cyber Threat Reports

MindBridge: A Cross-Subject Brain Decoding Framework