New submissions for Wednesday, 24 July 2024 (showing 362 of 362 entries )

Keyword: detection

Title:

      BSH for Collision Detection in Point Cloud models

Authors: Mauro Figueiredo, João Pereira, João Oliveira, Bruno Araujo
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Point cloud models are a common shape representation for several reasons. Three-dimensional scanning devices are widely used nowadays and points are an attractive primitive for rendering complex geometry. Nevertheless, there is not much literature on collision detection for point cloud models. This paper presents a novel collision detection algorithm for large point cloud models using voxels, octrees and bounding spheres hierarchies (BSH). The scene graph is divided in voxels. The objects of each voxel are organized into an octree. Due to the high number of points in the scene, each non-empty cell of the octree is organized in a bounding sphere hierarchy, based on an R-tree hierarchy like structure. The BSH hierarchies are used to group neighboring points and filter out very quickly parts of objects that do not interact with other models. Points derived from laser scanned data typically are not segmented and can have arbitrary spatial resolution thus introducing computational and modeling issues. We address these issues and our results show that the proposed collision detection algorithm effectively finds intersections between point cloud models since it is able to reduce the number of bounding volume checks and updates.
Title:
```
  Decentralized Federated Anomaly Detection in Smart Grids: A P2P Gossip Approach
```
Authors: Muhammad Akbar Husnoo, Adnan Anwar, Md Enamul Haque, A. N. Mahmood
Subjects: Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract The increasing security and privacy concerns in the Smart Grid sector have led to a significant demand for robust intrusion detection systems within critical smart grid infrastructure. To address the challenges posed by privacy preservation and decentralized power system zones with distinct data ownership, Federated Learning (FL) has emerged as a promising privacy-preserving solution which facilitates collaborative training of attack detection models without necessitating the sharing of raw data. However, FL presents several implementation limitations in the power system domain due to its heavy reliance on a centralized aggregator and the risks of privacy leakage during model update transmission. To overcome these technical bottlenecks, this paper introduces a novel decentralized federated anomaly detection scheme based on two main gossip protocols namely Random Walk and Epidemic. Our findings indicate that the Random Walk protocol exhibits superior performance compared to the Epidemic protocol, highlighting its efficacy in decentralized federated learning environments. Experimental validation of the proposed framework utilizing publicly available industrial control systems datasets demonstrates superior attack detection accuracy while safeguarding data confidentiality and mitigating the impact of communication latency and stragglers. Furthermore, our approach yields a notable 35% improvement in training time compared to conventional FL, underscoring the efficacy and robustness of our decentralized learning method.
Title:
```
  Memory Management for Real-Time Appearance-Based Loop Closure Detection
```
Authors: Mathieu Labbé, François Michaud
Subjects: Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Loop closure detection is the process involved when trying to find a match between the current and a previously visited locations in SLAM. Over time, the amount of time required to process new observations increases with the size of the internal map, which may influence real-time processing. In this paper, we present a novel real-time loop closure detection approach for large-scale and long-term SLAM. Our approach is based on a memory management method that keeps computation time for each new observation under a fixed limit. Results demonstrate the approach's adaptability and scalability using four standard data sets.
Title:
```
  Comprehensive Study on Performance Evaluation and Optimization of Model Compression: Bridging Traditional Deep Learning and Large Language Models
```
Authors: Aayush Saxena, Arit Kumar Bishwas, Ayush Ashok Mishra, Ryan Armstrong
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Deep learning models have achieved tremendous success in most of the industries in recent years. The evolution of these models has also led to an increase in the model size and energy requirement, making it difficult to deploy in production on low compute devices. An increase in the number of connected devices around the world warrants compressed models that can be easily deployed at the local devices with low compute capacity and power accessibility. A wide range of solutions have been proposed by different researchers to reduce the size and complexity of such models, prominent among them are, Weight Quantization, Parameter Pruning, Network Pruning, low-rank representation, weights sharing, neural architecture search, knowledge distillation etc. In this research work, we investigate the performance impacts on various trained deep learning models, compressed using quantization and pruning techniques. We implemented both, quantization and pruning, compression techniques on popular deep learning models used in the image classification, object detection, language models and generative models-based problem statements. We also explored performance of various large language models (LLMs) after quantization and low rank adaptation. We used the standard evaluation metrics (model's size, accuracy, and inference time) for all the related problem statements and concluded this paper by discussing the challenges and future work.
Title:
```
  Development of Multistage Machine Learning Classifier using Decision Trees and Boosting Algorithms over Darknet Network Traffic
```
Authors: Anjali Sureshkumar Nair, Dr. Prashant Nitnaware
Subjects: Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract In recent years, the clandestine nature of darknet activities has presented an escalating challenge to cybersecurity efforts, necessitating sophisticated methods for the detection and classification of network traffic associated with these covert operations. The system addresses the significant challenge of class imbalance within Darknet traffic datasets, where malicious traffic constitutes a minority, hindering effective discrimination between normal and malicious behavior. By leveraging boosting algorithms like AdaBoost and Gradient Boosting coupled with decision trees, this study proposes a robust solution for network traffic classification. Boosting algorithms ensemble learning corrects errors iteratively and assigns higher weights to minority class instances, complemented by the hierarchical structure of decision trees. The additional Feature Selection which is a preprocessing method by utilizing Information Gain metrics, Fisher's Score, and Chi-Square test selection for features is employed. Rigorous experimentation with diverse Darknet traffic datasets validates the efficacy of the proposed multistage classifier, evaluated through various performance metrics such as accuracy, precision, recall, and F1-score, offering a comprehensive solution for accurate detection and classification of Darknet activities.
Title:
```
  Multilingual Fine-Grained News Headline Hallucination Detection
```
Authors: Jiaming Shen, Tianqi Liu, Jialu Liu, Zhen Qin, Jay Pavagadhi, Simon Baumgartner, Michael Bendersky
Subjects: Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract The popularity of automated news headline generation has surged with advancements in pre-trained language models. However, these models often suffer from the ``hallucination'' problem, where the generated headline is not fully supported by its source article. Efforts to address this issue have predominantly focused on English, using over-simplistic classification schemes that overlook nuanced hallucination types. In this study, we introduce the first multilingual, fine-grained news headline hallucination detection dataset that contains over 11 thousand pairs in 5 languages, each annotated with detailed hallucination types by experts. We conduct extensive experiments on this dataset under two settings. First, we implement several supervised fine-tuning approaches as preparatory solutions and demonstrate this dataset's challenges and utilities. Second, we test various large language models' in-context learning abilities and propose two novel techniques, language-dependent demonstration selection and coarse-to-fine prompting, to boost the few-shot hallucination detection performance in terms of the example-F1 metric. We release this dataset to foster further research in multilingual, fine-grained headline hallucination detection.
Title:
```
  EfficientCD: A New Strategy For Change Detection Based With Bi-temporal Layers Exchanged
```
Authors: Sijun Dong, Yuwei Zhu, Geng Chen, Xiaoliang Meng
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract With the widespread application of remote sensing technology in environmental monitoring, the demand for efficient and accurate remote sensing image change detection (CD) for natural environments is growing. We propose a novel deep learning framework named EfficientCD, specifically designed for remote sensing image change detection. The framework employs EfficientNet as its backbone network for feature extraction. To enhance the information exchange between bi-temporal image feature maps, we have designed a new Feature Pyramid Network module targeted at remote sensing change detection, named ChangeFPN. Additionally, to make full use of the multi-level feature maps in the decoding stage, we have developed a layer-by-layer feature upsampling module combined with Euclidean distance to improve feature fusion and reconstruction during the decoding stage. The EfficientCD has been experimentally validated on four remote sensing datasets: LEVIR-CD, SYSU-CD, CLCD, and WHUCD. The experimental results demonstrate that EfficientCD exhibits outstanding performance in change detection accuracy. The code and pretrained models will be released at this https URL.
Title:
```
  Wallcamera: Reinventing the Wheel?
```
Authors: Aurélien Bourquard, Jeff Yan
Subjects: Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Developed at MIT CSAIL, the Wallcamera has captivated the public's imagination. Here, we show that the key insight underlying the Wallcamera is the same one that underpins the concept and the prototype of differential imaging forensics (DIF), both of which were validated and reported several years prior to the Wallcamera's debut. Rather than being the first to extract and amplify invisible signals -- aka latent evidence in the forensics context -- from wall reflections in a video, or the first to propose activity recognition following that approach, the Wallcamera's actual innovation is achieving activity recognition at a finer granularity than DIF demonstrated. In addition to activity recognition, DIF as conceived has a number of other applications in forensics, including 1) the recovery of a photographer's personal identifiable information such as body width, height, and even the color of their clothing, from a single photo, and 2) the detection of image tampering and deepfake videos.
Title:
```
  Pavement Fatigue Crack Detection and Severity Classification Based on Convolutional Neural Network
```
Authors: Zhen Wang, Dylan G. Ildefonzo, Linbing Wang
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Due to the varying intensity of pavement cracks, the complexity of topological structure, and the noise of texture background, image classification for asphalt pavement cracking has proven to be a challenging problem. Fatigue cracking, also known as alligator cracking, is one of the common distresses of asphalt pavement. It is thus important to detect and monitor the condition of alligator cracking on roadway pavements. Most research in this area has typically focused on pixel-level detection of cracking using limited datasets. A novel deep convolutional neural network that can achieve two objectives is proposed. The first objective of the proposed neural network is to classify presence of fatigue cracking based on pavement surface images. The second objective is to classify the fatigue cracking severity level based on the Distress Identification Manual (DIM) standard. In this paper, a databank of 4484 high-resolution pavement surface images is established in which images are taken locally in the Town of Blacksburg, Virginia, USA. In the data pre-preparation, over 4000 images are labeled into 4 categories manually according to DIM standards. A four-layer convolutional neural network model is then built to achieve the goal of classification of images by pavement crack severity category. The trained model reached the highest accuracy among all existing methods. After only 30 epochs of training, the model achieved a crack existence classification accuracy of 96.23% and a severity level classification accuracy of 96.74%. After 20 epochs of training, the model achieved a pavement marking presence classification accuracy of 97.64%.
Title:
```
  PLayerTV: Advanced Player Tracking and Identification for Automatic Soccer Highlight Clips
```
Authors: Håkon Maric Solberg, Mehdi Houshmand Sarkhoosh, Sushant Gautam, Saeed Shafiee Sabet, Pål Halvorsen, Cise Midoglu
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract In the rapidly evolving field of sports analytics, the automation of targeted video processing is a pivotal advancement. We propose PlayerTV, an innovative framework which harnesses state-of-the-art AI technologies for automatic player tracking and identification in soccer videos. By integrating object detection and tracking, Optical Character Recognition (OCR), and color analysis, PlayerTV facilitates the generation of player-specific highlight clips from extensive game footage, significantly reducing the manual labor traditionally associated with such tasks. Preliminary results from the evaluation of our core pipeline, tested on a dataset from the Norwegian Eliteserien league, indicate that PlayerTV can accurately and efficiently identify teams and players, and our interactive Graphical User Interface (GUI) serves as a user-friendly application wrapping this functionality for streamlined use.
Title:
```
  FoRA: Low-Rank Adaptation Model beyond Multimodal Siamese Network
```
Authors: Weiying Xie, Yusi Zhang, Tianlin Hui, Jiaqing Zhang, Jie Lei, Yunsong Li
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Multimodal object detection offers a promising prospect to facilitate robust detection in various visual conditions. However, existing two-stream backbone networks are challenged by complex fusion and substantial parameter increments. This is primarily due to large data distribution biases of multimodal homogeneous information. In this paper, we propose a novel multimodal object detector, named Low-rank Modal Adaptors (LMA) with a shared backbone. The shared parameters enhance the consistency of homogeneous information, while lightweight modal adaptors focus on modality unique features. Furthermore, we design an adaptive rank allocation strategy to adapt to the varying heterogeneity at different feature levels. When applied to two multimodal object detection datasets, experiments validate the effectiveness of our method. Notably, on DroneVehicle, LMA attains a 10.4% accuracy improvement over the state-of-the-art method with a 149M-parameters reduction. The code is available at this https URL. Our work was submitted to ACM MM in April 2024, but was rejected. We will continue to refine our work and paper writing next, mainly including proof of theory and multi-task applications of FoRA.
Title:
```
  Users Feel Guilty: Measurement of Illegal Software Installation Guide Videos on YouTube for Malware Distribution
```
Authors: Rei Yamagishi, Shota Fujii, Tatsuya Mori
Subjects: Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract This study introduces and examines a sophisticated malware distribution technique that exploits popular video sharing platforms. In this attack, threat actors distribute malware through deceptive content that promises free versions of premium software and game cheats. Throughout this paper, we call this attack MalTube. MalTube is particularly insidious because it exploits the guilt feelings of users for engaging in potentially illegal activity, making them less likely to report the infection or ask for a help. To investigate this emerging threat, we developed video platform exploitation reconnaissance VIPER, a novel monitoring system designed to detect, monitor, and analyze MalTube activity at scale. Over a four-month data collection period, VIPER processed and analyzed 14,363 videos, 8,671 associated channels, and 1,269 unique fully qualified domain names associated with malware downloads. Our findings reveal that MalTube attackers primarily target young gamers, using the lure of free software and game cheats as infection vectors. The attackers employ various sophisticated social engineering techniques to maximize user engagement and ensure successful malware propagation. These techniques include the strategic use of platform-specific features such as trending keywords, emoticons, and eye-catching thumbnails. These tactics closely mimic legitimate content creation strategies while providing detailed instructions for malware infection. Based on our in-depth analysis, we propose a set of robust detection and mitigation strategies that exploit the invariant characteristics of MalTube videos, offering the potential for automated threat detection and prevention.
Title:
```
  Open-Set Biometrics: Beyond Good Closed-Set Models
```
Authors: Yiyang Su, Minchul Kim, Feng Liu, Anil Jain, Xiaoming Liu
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Biometric recognition has primarily addressed closed-set identification, assuming all probe subjects are in the gallery. However, most practical applications involve open-set biometrics, where probe subjects may or may not be present in the gallery. This poses distinct challenges in effectively distinguishing individuals in the gallery while minimizing false detections. While it is commonly believed that powerful biometric models can excel in both closed- and open-set scenarios, existing loss functions are inconsistent with open-set evaluation. They treat genuine (mated) and imposter (non-mated) similarity scores symmetrically and neglect the relative magnitudes of imposter scores. To address these issues, we simulate open-set evaluation using minibatches during training and introduce novel loss functions: (1) the identification-detection loss optimized for open-set performance under selective thresholds and (2) relative threshold minimization to reduce the maximum negative score for each probe. Across diverse biometric tasks, including face recognition, gait recognition, and person re-identification, our experiments demonstrate the effectiveness of the proposed loss functions, significantly enhancing open-set performance while positively impacting closed-set performance. Our code and models are available at this https URL.
Title:
```
  Discovering overlapping communities in multi-layer directed networks
```
Authors: Huan Qing
Subjects: Subjects: Social and Information Networks (cs.SI); Methodology (stat.ME)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract This article explores the challenging problem of detecting overlapping communities in multi-layer directed networks. Our goal is to understand the underlying asymmetric overlapping community structure by analyzing the mixed memberships of nodes. We introduce a new model, the multi-layer mixed membership stochastic co-block model (multi-layer MM-ScBM), to model multi-layer directed networks in which nodes can belong to multiple communities. We develop a spectral procedure to estimate nodes' memberships in both sending and receiving patterns. Our method uses a successive projection algorithm on a few leading eigenvectors of two debiased aggregation matrices. To our knowledge, this is the first work to detect asymmetric overlapping communities in multi-layer directed networks. We demonstrate the consistent estimation properties of our method by providing per-node error rates under the multi-layer MM-ScBM framework. Our theoretical analysis reveals that increasing the overall sparsity, the number of nodes, or the number of layers can improve the accuracy of overlapping community detection. Extensive numerical experiments are conducted to validate these theoretical findings. We also apply our method to one real-world multi-layer directed network, gaining insightful results.
Title:
```
  LiCROcc: Teach Radar for Accurate Semantic Occupancy Prediction using LiDAR and Camera
```
Authors: Yukai Ma, Jianbiao Mei, Xuemeng Yang, Licheng Wen, Weihua Xu, Jiangning Zhang, Botian Shi, Yong Liu, Xingxing Zuo
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Semantic Scene Completion (SSC) is pivotal in autonomous driving perception, frequently confronted with the complexities of weather and illumination changes. The long-term strategy involves fusing multi-modal information to bolster the system's robustness. Radar, increasingly utilized for 3D target detection, is gradually replacing LiDAR in autonomous driving applications, offering a robust sensing alternative. In this paper, we focus on the potential of 3D radar in semantic scene completion, pioneering cross-modal refinement techniques for improved robustness against weather and illumination changes, and enhancing SSC performance.Regarding model architecture, we propose a three-stage tight fusion approach on BEV to realize a fusion framework for point clouds and images. Based on this foundation, we designed three cross-modal distillation modules-CMRD, BRD, and PDD. Our approach enhances the performance in both radar-only (R-LiCROcc) and radar-camera (RC-LiCROcc) settings by distilling to them the rich semantic and structural information of the fused features of LiDAR and camera. Finally, our LC-Fusion (teacher model), R-LiCROcc and RC-LiCROcc achieve the best performance on the nuScenes-Occupancy dataset, with mIOU exceeding the baseline by 22.9%, 44.1%, and 15.5%, respectively. The project page is available at this https URL.
Title:
```
  Comparison of Static Application Security Testing Tools and Large Language Models for Repo-level Vulnerability Detection
```
Authors: Xin Zhou, Duc-Manh Tran, Thanh Le-Cong, Ting Zhang, Ivana Clairine Irsan, Joshua Sumarlin, Bach Le, David Lo
Subjects: Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Software vulnerabilities pose significant security challenges and potential risks to society, necessitating extensive efforts in automated vulnerability detection. There are two popular lines of work to address automated vulnerability detection. On one hand, Static Application Security Testing (SAST) is usually utilized to scan source code for security vulnerabilities, especially in industries. On the other hand, deep learning (DL)-based methods, especially since the introduction of large language models (LLMs), have demonstrated their potential in software vulnerability detection. However, there is no comparative study between SAST tools and LLMs, aiming to determine their effectiveness in vulnerability detection, understand the pros and cons of both SAST and LLMs, and explore the potential combination of these two families of approaches. In this paper, we compared 15 diverse SAST tools with 12 popular or state-of-the-art open-source LLMs in detecting software vulnerabilities from repositories of three popular programming languages: Java, C, and Python. The experimental results showed that SAST tools obtain low vulnerability detection rates with relatively low false positives, while LLMs can detect up 90\% to 100\% of vulnerabilities but suffer from high false positives. By further ensembling the SAST tools and LLMs, the drawbacks of both SAST tools and LLMs can be mitigated to some extent. Our analysis sheds light on both the current progress and future directions for software vulnerability detection.
Title:
```
  Efficient Detection of Commutative Factors in Factor Graphs
```
Authors: Malte Luttermann, Johann Machemer, Marcel Gehrke
Subjects: Subjects: Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Lifted probabilistic inference exploits symmetries in probabilistic graphical models to allow for tractable probabilistic inference with respect to domain sizes. To exploit symmetries in, e.g., factor graphs, it is crucial to identify commutative factors, i.e., factors having symmetries within themselves due to their arguments being exchangeable. The current state of the art to check whether a factor is commutative with respect to a subset of its arguments iterates over all possible subsets of the factor's arguments, i.e., $O(2^n)$ iterations for a factor with $n$ arguments in the worst case. In this paper, we efficiently solve the problem of detecting commutative factors in a factor graph. In particular, we introduce the detection of commutative factors (DECOR) algorithm, which allows us to drastically reduce the computational effort for checking whether a factor is commutative in practice. We prove that DECOR efficiently identifies restrictions to drastically reduce the number of required iterations and validate the efficiency of DECOR in our empirical evaluation.
Title:
```
  TAPTRv2: Attention-based Position Update Improves Tracking Any Point
```
Authors: Hongyang Li, Hao Zhang, Shilong Liu, Zhaoyang Zeng, Feng Li, Tianhe Ren, Bohan Li, Lei Zhang
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract In this paper, we present TAPTRv2, a Transformer-based approach built upon TAPTR for solving the Tracking Any Point (TAP) task. TAPTR borrows designs from DEtection TRansformer (DETR) and formulates each tracking point as a point query, making it possible to leverage well-studied operations in DETR-like algorithms. TAPTRv2 improves TAPTR by addressing a critical issue regarding its reliance on cost-volume,which contaminates the point queryś content feature and negatively impacts both visibility prediction and cost-volume computation. In TAPTRv2, we propose a novel attention-based position update (APU) operation and use key-aware deformable attention to realize. For each query, this operation uses key-aware attention weights to combine their corresponding deformable sampling positions to predict a new query position. This design is based on the observation that local attention is essentially the same as cost-volume, both of which are computed by dot-production between a query and its surrounding features. By introducing this new operation, TAPTRv2 not only removes the extra burden of cost-volume computation, but also leads to a substantial performance improvement. TAPTRv2 surpasses TAPTR and achieves state-of-the-art performance on many challenging datasets, demonstrating the superiority
Title:
```
  DeepClean: Integrated Distortion Identification and Algorithm Selection for Rectifying Image Corruptions
```
Authors: Aditya Kapoor, Harshad Khadilkar, Jayvardhana Gubbi
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Distortion identification and rectification in images and videos is vital for achieving good performance in downstream vision applications. Instead of relying on fixed trial-and-error based image processing pipelines, we propose a two-level sequential planning approach for automated image distortion classification and rectification. At the higher level it detects the class of corruptions present in the input image, if any. The lower level selects a specific algorithm to be applied, from a set of externally provided candidate algorithms. The entire two-level setup runs in the form of a single forward pass during inference and it is to be queried iteratively until the retrieval of the original image. We demonstrate improvements compared to three baselines on the object detection task on COCO image dataset with rich set of distortions. The advantage of our approach is its dynamic reconfiguration, conditioned on the input image and generalisability to unseen candidate algorithms at inference time, since it relies only on the comparison of their output of the image embeddings.
Title:
```
  Hidden Web Caches Discovery
```
Authors: Matteo Golinelli, Bruno Crispo
Subjects: Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Web caches play a crucial role in web performance and scalability. However, detecting cached responses is challenging when web servers do not reliably communicate the cache status through standardized headers. This paper presents a novel methodology for cache detection using timing analysis. Our approach eliminates the dependency on cache status headers, making it applicable to any web server. The methodology relies on sending paired requests using HTTP multiplexing functionality and makes heavy use of cache-busting to control the origin of the responses. By measuring the time it takes to receive responses from paired requests, we can determine if a response is cached or not. In each pair, one request is cache-busted to force retrieval from the origin server, while the other request is not and might be served from the cache, if present. A faster response time for the non-cache-busted request compared to the cache-busted one suggests the first one is coming from the cache. We implemented this approach in a tool and achieved an estimated accuracy of 89.6% compared to state-of-the-art methods based on cache status headers. Leveraging our cache detection approach, we conducted a large-scale experiment on the Tranco Top 50k websites. We identified a significant presence of hidden caches (5.8%) that do not advertise themselves through headers. Additionally, we employed our methodology to detect Web Cache Deception (WCD) vulnerabilities in these hidden caches. We discovered that 1.020 of them are susceptible to WCD vulnerabilities, potentially leaking sensitive data. Our findings demonstrate the effectiveness of our timing analysis methodology for cache discovery and highlight the importance of a tool that does not rely on cache-communicated cache status headers.
Title:
```
  Understanding Impacts of Electromagnetic Signal Injection Attacks on Object Detection
```
Authors: Youqian Zhang, Chunxi Yang, Eugene Y. Fu, Qinhong Jiang, Chen Yan, Sze-Yiu Chau, Grace Ngai, Hong-Va Leong, Xiapu Luo, Wenyuan Xu
Subjects: Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Object detection can localize and identify objects in images, and it is extensively employed in critical multimedia applications such as security surveillance and autonomous driving. Despite the success of existing object detection models, they are often evaluated in ideal scenarios where captured images guarantee the accurate and complete representation of the detecting scenes. However, images captured by image sensors may be affected by different factors in real applications, including cyber-physical attacks. In particular, attackers can exploit hardware properties within the systems to inject electromagnetic interference so as to manipulate the images. Such attacks can cause noisy or incomplete information about the captured scene, leading to incorrect detection results, potentially granting attackers malicious control over critical functions of the systems. This paper presents a research work that comprehensively quantifies and analyzes the impacts of such attacks on state-of-the-art object detection models in practice. It also sheds light on the underlying reasons for the incorrect detection outcomes.
Title:
```
  FACTTRACK: Time-Aware World State Tracking in Story Outlines
```
Authors: Zhiheng Lyu, Kevin Yang, Lingpeng Kong, Daniel Klein
Subjects: Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract While accurately detecting and correcting factual contradictions in language model outputs has become increasingly important as their capabilities improve, doing so is highly challenging. We propose a novel method, FACTTRACK, for tracking atomic facts and addressing factual contradictions. Crucially, FACTTRACK also maintains time-aware validity intervals for each fact, allowing for change over time. At a high level, FACTTRACK consists of a four-step pipeline to update a world state data structure for each new event: (1) decompose the event into directional atomic facts; (2) determine the validity interval of each atomic fact using the world state; (3) detect contradictions with existing facts in the world state; and finally (4) add new facts to the world state and update existing atomic facts. When we apply FACTTRACK to contradiction detection on structured story outlines, we find that FACTTRACK using LLaMA2-7B-Chat substantially outperforms a fair baseline using LLaMA2-7B-Chat, and achieves performance comparable to a GPT4 baseline. Moreover, when using GPT4, FACTTRACK significantly outperforms the GPT4 baseline.
Title:
```
  Securing Tomorrow's Smart Cities: Investigating Software Security in Internet of Vehicles and Deep Learning Technologies
```
Authors: Ridhi Jain, Norbert Tihanyi, Mohamed Amine Ferrag
Subjects: Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Integrating Deep Learning (DL) techniques in the Internet of Vehicles (IoV) introduces many security challenges and issues that require thorough examination. This literature review delves into the inherent vulnerabilities and risks associated with DL in IoV systems, shedding light on the multifaceted nature of security threats. Through an extensive analysis of existing research, we explore potential threats posed by DL algorithms, including adversarial attacks, data privacy breaches, and model poisoning. Additionally, we investigate the impact of DL on critical aspects of IoV security, such as intrusion detection, anomaly detection, and secure communication protocols. Our review emphasizes the complexities of ensuring the robustness, reliability, and trustworthiness of DL-based IoV systems, given the dynamic and interconnected nature of vehicular networks. Furthermore, we discuss the need for novel security solutions tailored to address these challenges effectively and enhance the security posture of DL-enabled IoV environments. By offering insights into these critical issues, this chapter aims to stimulate further research, innovation, and collaboration in securing DL techniques within the context of the IoV, thereby fostering a safer and more resilient future for vehicular communication and connectivity.
Title:
```
  ESOD: Efficient Small Object Detection on High-Resolution Images
```
Authors: Kai Liu, Zhihang Fu, Sheng Jin, Ze Chen, Fan Zhou, Rongxin Jiang, Yaowu Chen, Jieping Ye
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Enlarging input images is a straightforward and effective approach to promote small object detection. However, simple image enlargement is significantly expensive on both computations and GPU memory. In fact, small objects are usually sparsely distributed and locally clustered. Therefore, massive feature extraction computations are wasted on the non-target background area of images. Recent works have tried to pick out target-containing regions using an extra network and perform conventional object detection, but the newly introduced computation limits their final performance. In this paper, we propose to reuse the detector's backbone to conduct feature-level object-seeking and patch-slicing, which can avoid redundant feature extraction and reduce the computation cost. Incorporating a sparse detection head, we are able to detect small objects on high-resolution inputs (e.g., 1080P or larger) for superior performance. The resulting Efficient Small Object Detection (ESOD) approach is a generic framework, which can be applied to both CNN- and ViT-based detectors to save the computation and GPU memory costs. Extensive experiments demonstrate the efficacy and efficiency of our method. In particular, our method consistently surpasses the SOTA detectors by a large margin (e.g., 8% gains on AP) on the representative VisDrone, UAVDT, and TinyPerson datasets. Code will be made public soon.
Title:
```
  Rethinking Out-of-Distribution Detection on Imbalanced Data Distribution
```
Authors: Kai Liu, Zhihang Fu, Sheng Jin, Chao Chen, Ze Chen, Rongxin Jiang, Fan Zhou, Yaowu Chen, Jieping Ye
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Detecting and rejecting unknown out-of-distribution (OOD) samples is critical for deployed neural networks to void unreliable predictions. In real-world scenarios, however, the efficacy of existing OOD detection methods is often impeded by the inherent imbalance of in-distribution (ID) data, which causes significant performance decline. Through statistical observations, we have identified two common challenges faced by different OOD detectors: misidentifying tail class ID samples as OOD, while erroneously predicting OOD samples as head class from ID. To explain this phenomenon, we introduce a generalized statistical framework, termed ImOOD, to formulate the OOD detection problem on imbalanced data distribution. Consequently, the theoretical analysis reveals that there exists a class-aware bias item between balanced and imbalanced OOD detection, which contributes to the performance gap. Building upon this finding, we present a unified training-time regularization technique to mitigate the bias and boost imbalanced OOD detectors across architecture designs. Our theoretically grounded method translates into consistent improvements on the representative CIFAR10-LT, CIFAR100-LT, and ImageNet-LT benchmarks against several state-of-the-art OOD detection approaches. Code will be made public soon.
Title:
```
  MonoWAD: Weather-Adaptive Diffusion Model for Robust Monocular 3D Object Detection
```
Authors: Youngmin Oh, Hyung-Il Kim, Seong Tae Kim, Jung Uk Kim
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Monocular 3D object detection is an important challenging task in autonomous driving. Existing methods mainly focus on performing 3D detection in ideal weather conditions, characterized by scenarios with clear and optimal visibility. However, the challenge of autonomous driving requires the ability to handle changes in weather conditions, such as foggy weather, not just clear weather. We introduce MonoWAD, a novel weather-robust monocular 3D object detector with a weather-adaptive diffusion model. It contains two components: (1) the weather codebook to memorize the knowledge of the clear weather and generate a weather-reference feature for any input, and (2) the weather-adaptive diffusion model to enhance the feature representation of the input feature by incorporating a weather-reference feature. This serves an attention role in indicating how much improvement is needed for the input feature according to the weather conditions. To achieve this goal, we introduce a weather-adaptive enhancement loss to enhance the feature representation under both clear and foggy weather conditions. Extensive experiments under various weather conditions demonstrate that MonoWAD achieves weather-robust monocular 3D object detection. The code and dataset are released at this https URL.
Title:
```
  Lymphoid Infiltration Assessment of the Tumor Margins in H&E Slides
```
Authors: Zhuxian Guo, Amine Marzouki, Jean-François Emile, Henning Müller, Camille Kurtz, Nicolas Loménie
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Lymphoid infiltration at tumor margins is a key prognostic marker in solid tumors, playing a crucial role in guiding immunotherapy decisions. Current assessment methods, heavily reliant on immunohistochemistry (IHC), face challenges in tumor margin delineation and are affected by tissue preservation conditions. In contrast, we propose a Hematoxylin and Eosin (H&E) staining-based approach, underpinned by an advanced lymphocyte segmentation model trained on a public dataset for the precise detection of CD3+ and CD20+ lymphocytes. In our colorectal cancer study, we demonstrate that our H&E-based method offers a compelling alternative to traditional IHC, achieving comparable results in many cases. Our method's validity is further explored through a Turing test, involving blinded assessments by a pathologist of anonymized curves from H&E and IHC slides. This approach invites the medical community to consider Turing tests as a standard for evaluating medical applications involving expert human evaluation, thereby opening new avenues for enhancing cancer management and immunotherapy planning.
Title:
```
  Machine Translation Hallucination Detection for Low and High Resource Languages using Large Language Models
```
Authors: Kenza Benkirane (1), Laura Gongas (1), Shahar Pelles (1), Naomi Fuchs (1), Joshua Darmon (1), Pontus Stenetorp (1), David Ifeoluwa Adelani (1), Eduardo Sanchez (1 and 2) ((1) University College London, (2) Meta)
Subjects: Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Recent advancements in massively multilingual machine translation systems have significantly enhanced translation accuracy; however, even the best performing systems still generate hallucinations, severely impacting user trust. Detecting hallucinations in Machine Translation (MT) remains a critical challenge, particularly since existing methods excel with High-Resource Languages (HRLs) but exhibit substantial limitations when applied to Low-Resource Languages (LRLs). This paper evaluates hallucination detection approaches using Large Language Models (LLMs) and semantic similarity within massively multilingual embeddings. Our study spans 16 language directions, covering HRLs, LRLs, with diverse scripts. We find that the choice of model is essential for performance. On average, for HRLs, Llama3-70B outperforms the previous state of the art by as much as 0.16 MCC (Matthews Correlation Coefficient). However, for LRLs we observe that Claude Sonnet outperforms other LLMs on average by 0.03 MCC. The key takeaway from our study is that LLMs can achieve performance comparable or even better than previously proposed models, despite not being explicitly trained for any machine translation task. However, their advantage is less significant for LRLs.
Title:
```
  Dynamic Retraining-Updating Mean Teacher for Source-Free Object Detection
```
Authors: Trinh Le Ba Khanh, Huy-Hung Nguyen, Long Hoang Pham, Duong Nguyen-Ngoc Tran, Jae Wook Jeon
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract In object detection, unsupervised domain adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain. However, UDA's reliance on labeled source data restricts its adaptability in privacy-related scenarios. This study focuses on source-free object detection (SFOD), which adapts a source-trained detector to an unlabeled target domain without using labeled source data. Recent advancements in self-training, particularly with the Mean Teacher (MT) framework, show promise for SFOD deployment. However, the absence of source supervision significantly compromises the stability of these approaches. We identify two primary issues, (1) uncontrollable degradation of the teacher model due to inopportune updates from the student model, and (2) the student model's tendency to replicate errors from incorrect pseudo labels, leading to it being trapped in a local optimum. Both factors contribute to a detrimental circular dependency, resulting in rapid performance degradation in recent self-training frameworks. To tackle these challenges, we propose the Dynamic Retraining-Updating (DRU) mechanism, which actively manages the student training and teacher updating processes to achieve co-evolutionary training. Additionally, we introduce Historical Student Loss to mitigate the influence of incorrect pseudo labels. Our method achieves state-of-the-art performance in the SFOD setting on multiple domain adaptation benchmarks, comparable to or even surpassing advanced UDA methods. The code will be released at this https URL
Title:
```
  Spurious Correlations in Concept Drift: Can Explanatory Interaction Help?
```
Authors: Cristiana Lalletti, Stefano Teso
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Long-running machine learning models face the issue of concept drift (CD), whereby the data distribution changes over time, compromising prediction performance. Updating the model requires detecting drift by monitoring the data and/or the model for unexpected changes. We show that, however, spurious correlations (SCs) can spoil the statistics tracked by detection algorithms. Motivated by this, we introduce ebc-exstream, a novel detector that leverages model explanations to identify potential SCs and human feedback to correct for them. It leverages an entropy-based heuristic to reduce the amount of necessary feedback, cutting annotation costs. Our preliminary experiments on artificially confounded data highlight the promise of ebc-exstream for reducing the impact of SCs on detection.
Title:
```
  Assessing In-context Learning and Fine-tuning for Topic Classification of German Web Data
```
Authors: Julian Schelb, Roberto Ulloa, Andreas Spitz
Subjects: Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Researchers in the political and social sciences often rely on classification models to analyze trends in information consumption by examining browsing histories of millions of webpages. Automated scalable methods are necessary due to the impracticality of manual labeling. In this paper, we model the detection of topic-related content as a binary classification task and compare the accuracy of fine-tuned pre-trained encoder models against in-context learning strategies. Using only a few hundred annotated data points per topic, we detect content related to three German policies in a database of scraped webpages. We compare multilingual and monolingual models, as well as zero and few-shot approaches, and investigate the impact of negative sampling strategies and the combination of URL & content-based features. Our results show that a small sample of annotated data is sufficient to train an effective classifier. Fine-tuning encoder-based models yields better results than in-context learning. Classifiers using both URL & content-based features perform best, while using URLs alone provides adequate results when content is unavailable.
Title:
```
  QPT V2: Masked Image Modeling Advances Visual Scoring
```
Authors: Qizhi Xie, Kun Yuan, Yunpeng Qu, Mingda Wu, Ming Sun, Chao Zhou, Jihong Zhu
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Quality assessment and aesthetics assessment aim to evaluate the perceived quality and aesthetics of visual content. Current learning-based methods suffer greatly from the scarcity of labeled data and usually perform sub-optimally in terms of generalization. Although masked image modeling (MIM) has achieved noteworthy advancements across various high-level tasks (e.g., classification, detection etc.). In this work, we take on a novel perspective to investigate its capabilities in terms of quality- and aesthetics-awareness. To this end, we propose Quality- and aesthetics-aware pretraining (QPT V2), the first pretraining framework based on MIM that offers a unified solution to quality and aesthetics assessment. To perceive the high-level semantics and fine-grained details, pretraining data is curated. To comprehensively encompass quality- and aesthetics-related factors, degradation is introduced. To capture multi-scale quality and aesthetic information, model structure is modified. Extensive experimental results on 11 downstream benchmarks clearly show the superior performance of QPT V2 in comparison with current state-of-the-art approaches and other pretraining paradigms. Code and models will be released at \url{this https URL}.
Title:
```
  Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization
```
Authors: Junyan Wu, Wei Lu, Xiangyang Luo, Rui Yang, Qian Wang, Xiaochun Cao
Subjects: Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Recently, a novel form of audio partial forgery has posed challenges to its forensics, requiring advanced countermeasures to detect subtle forgery manipulations within long-duration audio. However, existing countermeasures still serve a classification purpose and fail to perform meaningful analysis of the start and end timestamps of partial forgery segments. To address this challenge, we introduce a novel coarse-to-fine proposal refinement framework (CFPRF) that incorporates a frame-level detection network (FDN) and a proposal refinement network (PRN) for audio temporal forgery detection and localization. Specifically, the FDN aims to mine informative inconsistency cues between real and fake frames to obtain discriminative features that are beneficial for roughly indicating forgery regions. The PRN is responsible for predicting confidence scores and regression offsets to refine the coarse-grained proposals derived from the FDN. To learn robust discriminative features, we devise a difference-aware feature learning (DAFL) module guided by contrastive representation learning to enlarge the sensitive differences between different frames induced by minor manipulations. We further design a boundary-aware feature enhancement (BAFE) module to capture the contextual information of multiple transition boundaries and guide the interaction between boundary information and temporal features via a cross-attention mechanism. Extensive experiments show that our CFPRF achieves state-of-the-art performance on various datasets, including LAV-DF, ASVS2019PS, and HAD.
Title:
```
  COALA: A Practical and Vision-Centric Federated Learning Platform
```
Authors: Weiming Zhuang, Jian Xu, Chen Chen, Jingtao Li, Lingjuan Lyu
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract We present COALA, a vision-centric Federated Learning (FL) platform, and a suite of benchmarks for practical FL scenarios, which we categorize into three levels: task, data, and model. At the task level, COALA extends support from simple classification to 15 computer vision tasks, including object detection, segmentation, pose estimation, and more. It also facilitates federated multiple-task learning, allowing clients to tackle multiple tasks simultaneously. At the data level, COALA goes beyond supervised FL to benchmark both semi-supervised FL and unsupervised FL. It also benchmarks feature distribution shifts other than commonly considered label distribution shifts. In addition to dealing with static data, it supports federated continual learning for continuously changing data in real-world scenarios. At the model level, COALA benchmarks FL with split models and different models in different clients. COALA platform offers three degrees of customization for these practical FL scenarios, including configuration customization, components customization, and workflow customization. We conduct systematic benchmarking experiments for the practical FL scenarios and highlight potential opportunities for further advancements in FL. Codes are open sourced at this https URL.
Title:
```
  Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs
```
Authors: Yifan Xia, Zichen Xie, Peiyu Liu, Kangjie Lu, Yan Liu, Wenhai Wang, Shouling Ji
Subjects: Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract While the automated detection of cryptographic API misuses has progressed significantly, its precision diminishes for intricate targets due to the reliance on manually defined patterns. Large Language Models (LLMs), renowned for their contextual understanding, offer a promising avenue to address existing shortcomings. However, applying LLMs in this security-critical domain presents challenges, particularly due to the unreliability stemming from LLMs' stochastic nature and the well-known issue of hallucination. To explore the prevalence of LLMs' unreliable analysis and potential solutions, this paper introduces a systematic evaluation framework to assess LLMs in detecting cryptographic misuses, utilizing a comprehensive dataset encompassing both manually-crafted samples and real-world projects. Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives. Nevertheless, we demonstrate how a constrained problem scope, coupled with LLMs' self-correction capability, significantly enhances the reliability of the detection. The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks. Moreover, we identify the failure patterns that persistently hinder LLMs' reliability, including both cryptographic knowledge deficiency and code semantics misinterpretation. Guided by these insights, we develop an LLM-based workflow to examine open-source repositories, leading to the discovery of 63 real-world cryptographic misuses. Of these, 46 have been acknowledged by the development community, with 23 currently being addressed and 6 resolved. Reflecting on developers' feedback, we offer recommendations for future research and the development of LLM-based security tools.
Title:
```
  A Tale of Two DL Cities: When Library Tests Meet Compiler
```
Authors: Qingchao Shen, Yongqiang Tian, Haoyang Ma, Junjie Chen, Lili Huang, Ruifeng Fu, Shing-Chi Cheung, Zan Wang
Subjects: Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Deep Learning (DL) compilers typically load a DL model and optimize it with intermediate representation.Existing DL compiler testing techniques mainly focus on model optimization stages, but rarely explore bug detection at the model loading stage. Effectively testing the model loading stage requires covering diverse usages of each DL operator from various DL libraries, which shares a common objective with DL library testing, indicating that the embedded knowledge in DL library tests is beneficial for testing the model loading stage of DL compilers. In this work, we propose OPERA to extract such domain knowledge from the test inputs for DL libraries. OPERA constructs diverse tests from the various test inputs for DL libraries (including the test inputs documented in DL libraries and those generated by recent fuzzers). In addition, it incorporates a diversity-based test prioritization strategy to migrate and execute those test inputs that are more likely to detect diverse bugs earlier. We considered three sources of tests in DL libraries for migration and used eight frontends from three DL compilers (e.g., TVM, TensorRT, and OpenVINO) for evaluation. OPERA detected 170 previously unknown bugs in total, 90 of which have been confirmed/fixed by developers, demonstrating the effectiveness of such the migration-based idea. The test prioritization strategy in OPERA improves testing efficiency with migrated tests by 11.9%~47.4% on average compared to general test prioritization strategies.
Title:
```
  Velocity Driven Vision: Asynchronous Sensor Fusion Birds Eye View Models for Autonomous Vehicles
```
Authors: Seamie Hayes, Sushil Sharma, Ciarán Eising
Subjects: Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Fusing different sensor modalities can be a difficult task, particularly if they are asynchronous. Asynchronisation may arise due to long processing times or improper synchronisation during calibration, and there must exist a way to still utilise this previous information for the purpose of safe driving, and object detection in ego vehicle/ multi-agent trajectory prediction. Difficulties arise in the fact that the sensor modalities have captured information at different times and also at different positions in space. Therefore, they are not spatially nor temporally aligned. This paper will investigate the challenge of radar and LiDAR sensors being asynchronous relative to the camera sensors, for various time latencies. The spatial alignment will be resolved before lifting into BEV space via the transformation of the radar/LiDAR point clouds into the new ego frame coordinate system. Only after this can we concatenate the radar/LiDAR point cloud and lifted camera features. Temporal alignment will be remedied for radar data only, we will implement a novel method of inferring the future radar point positions using the velocity information. Our approach to resolving the issue of sensor asynchrony yields promising results. We demonstrate velocity information can drastically improve IoU for asynchronous datasets, as for a time latency of 360 milliseconds (ms), IoU improves from 49.54 to 53.63. Additionally, for a time latency of 550ms, the camera+radar (C+R) model outperforms the camera+LiDAR (C+L) model by 0.18 IoU. This is an advancement in utilising the often-neglected radar sensor modality, which is less favoured than LiDAR for autonomous driving purposes.
Title:
```
  A Framework for Pupil Tracking with Event Cameras
```
Authors: Khadija Iddrisu, Waseem Shariff, Suzanne Little
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Saccades are extremely rapid movements of both eyes that occur simultaneously, typically observed when an individual shifts their focus from one object to another. These movements are among the swiftest produced by humans and possess the potential to achieve velocities greater than that of blinks. The peak angular speed of the eye during a saccade can reach as high as 700°/s in humans, especially during larger saccades that cover a visual angle of 25°. Previous research has demonstrated encouraging outcomes in comprehending neurological conditions through the study of saccades. A necessary step in saccade detection involves accurately identifying the precise location of the pupil within the eye, from which additional information such as gaze angles can be inferred. Conventional frame-based cameras often struggle with the high temporal precision necessary for tracking very fast movements, resulting in motion blur and latency issues. Event cameras, on the other hand, offer a promising alternative by recording changes in the visual scene asynchronously and providing high temporal resolution and low latency. By bridging the gap between traditional computer vision and event-based vision, we present events as frames that can be readily utilized by standard deep learning algorithms. This approach harnesses YOLOv8, a state-of-the-art object detection technology, to process these frames for pupil tracking using the publicly accessible Ev-Eye dataset. Experimental results demonstrate the framework's effectiveness, highlighting its potential applications in neuroscience, ophthalmology, and human-computer interaction.
Title:
```
  FakingRecipe: Detecting Fake News on Short Video Platforms from the Perspective of Creative Process
```
Authors: Yuyan Bu, Qiang Sheng, Juan Cao, Peng Qi, Danding Wang, Jintao Li
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract As short-form video-sharing platforms become a significant channel for news consumption, fake news in short videos has emerged as a serious threat in the online information ecosystem, making developing detection methods for this new scenario an urgent need. Compared with that in text and image formats, fake news on short video platforms contains rich but heterogeneous information in various modalities, posing a challenge to effective feature utilization. Unlike existing works mostly focusing on analyzing what is presented, we introduce a novel perspective that considers how it might be created. Through the lens of the creative process behind news video production, our empirical analysis uncovers the unique characteristics of fake news videos in material selection and editing. Based on the obtained insights, we design FakingRecipe, a creative process-aware model for detecting fake news short videos. It captures the fake news preferences in material selection from sentimental and semantic aspects and considers the traits of material editing from spatial and temporal aspects. To improve evaluation comprehensiveness, we first construct FakeTT, an English dataset for this task, and conduct experiments on both FakeTT and the existing Chinese FakeSV dataset. The results show FakingRecipe's superiority in detecting fake news on short video platforms.
Title:
```
  PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
```
Authors: Junyi Li, Junfeng Wu, Weizhi Zhao, Song Bai, Xiang Bai
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract We present PartGLEE, a part-level foundation model for locating and identifying both objects and parts in images. Through a unified framework, PartGLEE accomplishes detection, segmentation, and grounding of instances at any granularity in the open world scenario. Specifically, we propose a Q-Former to construct the hierarchical relationship between objects and parts, parsing every object into corresponding semantic parts. By incorporating a large amount of object-level data, the hierarchical relationships can be extended, enabling PartGLEE to recognize a rich variety of parts. We conduct comprehensive studies to validate the effectiveness of our method, PartGLEE achieves the state-of-the-art performance across various part-level tasks and obtain competitive results on object-level tasks. The proposed PartGLEE significantly enhances hierarchical modeling capabilities and part-level perception over our previous GLEE model. Further analysis indicates that the hierarchical cognitive ability of PartGLEE is able to facilitate a detailed comprehension in images for mLLMs. The model and code will be released at this https URL .
Keyword: face recognition

Title:
```
  Open-Set Biometrics: Beyond Good Closed-Set Models
```
Authors: Yiyang Su, Minchul Kim, Feng Liu, Anil Jain, Xiaoming Liu
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Biometric recognition has primarily addressed closed-set identification, assuming all probe subjects are in the gallery. However, most practical applications involve open-set biometrics, where probe subjects may or may not be present in the gallery. This poses distinct challenges in effectively distinguishing individuals in the gallery while minimizing false detections. While it is commonly believed that powerful biometric models can excel in both closed- and open-set scenarios, existing loss functions are inconsistent with open-set evaluation. They treat genuine (mated) and imposter (non-mated) similarity scores symmetrically and neglect the relative magnitudes of imposter scores. To address these issues, we simulate open-set evaluation using minibatches during training and introduce novel loss functions: (1) the identification-detection loss optimized for open-set performance under selective thresholds and (2) relative threshold minimization to reduce the maximum negative score for each probe. Across diverse biometric tasks, including face recognition, gait recognition, and person re-identification, our experiments demonstrate the effectiveness of the proposed loss functions, significantly enhancing open-set performance while positively impacting closed-set performance. Our code and models are available at this https URL.
Title:
```
  Cluster Haptic Texture Database: Haptic Texture Database with Variety in Velocity and Direction of Sliding Contacts
```
Authors: Michikuni Eguchi, Madoka Ito, Tomohiro Hayase, Yuichi Hiroi, Takefumi Hiraki
Subjects: Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Human perception integrates multisensory information, with tactile perception playing a key role in object and surface recognition. While human-machine interfaces with haptic modalities offer enhanced system performance, existing datasets focus primarily on visual data, overlooking comprehensive haptic information. Previous haptic texture databases have recorded sound and acceleration signals, but often ignore the nuanced differences between probe-texture and finger-texture interactions. Recognizing this shortcoming, we present the Cluster Haptic Texture Database, a multimodal dataset that records visual, auditory, and haptic signals from an artificial urethane rubber fingertip interacting with different textured surfaces. This database, designed to mimic the properties of the human finger, includes five velocity levels and eight directional variations, providing a comprehensive study of tactile interactions. Our evaluations reveal the effectiveness of classifiers trained on this dataset in identifying surfaces, and the subtleties of estimating velocity and direction for each surface.
Title:
```
  Federated Learning for Face Recognition via Intra-subject Self-supervised Learning
```
Authors: Hansol Kim, Hoyeol Choi, Youngjun Kwak
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Federated Learning (FL) for face recognition aggregates locally optimized models from individual clients to construct a generalized face recognition model. However, previous studies present two major challenges: insufficient incorporation of self-supervised learning and the necessity for clients to accommodate multiple subjects. To tackle these limitations, we propose FedFS (Federated Learning for personalized Face recognition via intra-subject Self-supervised learning framework), a novel federated learning architecture tailored to train personalized face recognition models without imposing subjects. Our proposed FedFS comprises two crucial components that leverage aggregated features of the local and global models to cooperate with representations of an off-the-shelf model. These components are (1) adaptive soft label construction, utilizing dot product operations to reformat labels within intra-instances, and (2) intra-subject self-supervised learning, employing cosine similarity operations to strengthen robust intra-subject representations. Additionally, we introduce a regularization loss to prevent overfitting and ensure the stability of the optimized model. To assess the effectiveness of FedFS, we conduct comprehensive experiments on the DigiFace-1M and VGGFace datasets, demonstrating superior performance compared to previous methods.
Title:
```
  Can Large Language Models Automatically Jailbreak GPT-4V?
```
Authors: Yuanwei Wu, Yue Huang, Yixin Liu, Xiang Li, Pan Zhou, Lichao Sun
Subjects: Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract GPT-4V has attracted considerable attention due to its extraordinary capacity for integrating and processing multimodal information. At the same time, its ability of face recognition raises new safety concerns of privacy leakage. Despite researchers' efforts in safety alignment through RLHF or preprocessing filters, vulnerabilities might still be exploited. In our study, we introduce AutoJailbreak, an innovative automatic jailbreak technique inspired by prompt optimization. We leverage Large Language Models (LLMs) for red-teaming to refine the jailbreak prompt and employ weak-to-strong in-context learning prompts to boost efficiency. Furthermore, we present an effective search method that incorporates early stopping to minimize optimization time and token expenditure. Our experiments demonstrate that AutoJailbreak significantly surpasses conventional methods, achieving an Attack Success Rate (ASR) exceeding 95.3\%. This research sheds light on strengthening GPT-4V security, underscoring the potential for LLMs to be exploited in compromising GPT-4V integrity.
Keyword: augmentation

Title:
```
  Transformer-based Capacity Prediction for Lithium-ion Batteries with Data Augmentation
```
Authors: Gift Modekwe, Saif Al-Wahaibi, Qiugang Lu
Subjects: Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Lithium-ion batteries are pivotal to technological advancements in transportation, electronics, and clean energy storage. The optimal operation and safety of these batteries require proper and reliable estimation of battery capacities to monitor the state of health. Current methods for estimating the capacities fail to adequately account for long-term temporal dependencies of key variables (e.g., voltage, current, and temperature) associated with battery aging and degradation. In this study, we explore the usage of transformer networks to enhance the estimation of battery capacity. We develop a transformer-based battery capacity prediction model that accounts for both long-term and short-term patterns in battery data. Further, to tackle the data scarcity issue, data augmentation is used to increase the data size, which helps to improve the performance of the model. Our proposed method is validated with benchmark datasets. Simulation results show the effectiveness of data augmentation and the transformer network in improving the accuracy and robustness of battery capacity prediction.
Title:
```
  A Multi-view Mask Contrastive Learning Graph Convolutional Neural Network for Age Estimation
```
Authors: Yiping Zhang, Yuntao Shou, Tao Meng, Wei Ai, Keqin Li
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract The age estimation task aims to use facial features to predict the age of people and is widely used in public security, marketing, identification, and other fields. However, the features are mainly concentrated in facial keypoints, and existing CNN and Transformer-based methods have inflexibility and redundancy for modeling complex irregular structures. Therefore, this paper proposes a Multi-view Mask Contrastive Learning Graph Convolutional Neural Network (MMCL-GCN) for age estimation. Specifically, the overall structure of the MMCL-GCN network contains a feature extraction stage and an age estimation stage. In the feature extraction stage, we introduce a graph structure to construct face images as input and then design a Multi-view Mask Contrastive Learning (MMCL) mechanism to learn complex structural and semantic information about face images. The learning mechanism employs an asymmetric siamese network architecture, which utilizes an online encoder-decoder structure to reconstruct the missing information from the original graph and utilizes the target encoder to learn latent representations for contrastive learning. Furthermore, to promote the two learning mechanisms better compatible and complementary, we adopt two augmentation strategies and optimize the joint losses. In the age estimation stage, we design a Multi-layer Extreme Learning Machine (ML-IELM) with identity mapping to fully use the features extracted by the online encoder. Then, a classifier and a regressor were constructed based on ML-IELM, which were used to identify the age grouping interval and accurately estimate the final age. Extensive experiments show that MMCL-GCN can effectively reduce the error of age estimation on benchmark datasets such as Adience, MORPH-II, and LAP-2016.
Title:
```
  OriGen:Enhancing RTL Code Generation with Code-to-Code Augmentation and Self-Reflection
```
Authors: Fan Cui, Chenyang Yin, Kexing Zhou, Youwei Xiao, Guangyu Sun, Qiang Xu, Qipeng Guo, Demin Song, Dahua Lin, Xingcheng Zhang, Yun (Eric)Liang
Subjects: Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Recent studies have illuminated that Large Language Models (LLMs) exhibit substantial potential in the realm of RTL (Register Transfer Level) code generation, with notable advancements evidenced by commercial models such as GPT-4 and Claude3-Opus. Despite their proficiency, these commercial LLMs often raise concerns regarding privacy and security. Conversely, open-source LLMs, which offer solutions to these concerns, have inferior performance in RTL code generation tasks to commercial models due to the lack of highquality open-source RTL datasets. To address this issue, we introduce OriGen, a fully open-source framework featuring self-reflection capabilities and a dataset augmentation methodology for generating high-quality, large-scale RTL code. We propose a novel code-to-code augmentation methodology that leverages knowledge distillation to enhance the quality of the open-source RTL code datasets. Additionally, OriGen is capable of correcting syntactic errors by leveraging a self-reflection process based on feedback from the compiler. The self-reflection ability of the model is facilitated by a carefully constructed dataset, which comprises a comprehensive collection of samples. Experimental results demonstrate that OriGen remarkably outperforms other open-source alternatives in RTL code generation, surpassing the previous best-performing LLM by 9.8% on the VerilogEval-Human benchmark. Furthermore, OriGen exhibits superior capabilities in self-reflection and error rectification, surpassing GPT-4 by 18.1% on the benchmark designed to evaluate the capability of self-reflection.
Title:
```
  FairFlow: An Automated Approach to Model-based Counterfactual Data Augmentation For NLP
```
Authors: Ewoenam Kwaku Tokpo, Toon Calders
Subjects: Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Despite the evolution of language models, they continue to portray harmful societal biases and stereotypes inadvertently learned from training data. These inherent biases often result in detrimental effects in various applications. Counterfactual Data Augmentation (CDA), which seeks to balance demographic attributes in training data, has been a widely adopted approach to mitigate bias in natural language processing. However, many existing CDA approaches rely on word substitution techniques using manually compiled word-pair dictionaries. These techniques often lead to out-of-context substitutions, resulting in potential quality issues. The advancement of model-based techniques, on the other hand, has been challenged by the need for parallel training data. Works in this area resort to manually generated parallel data that are expensive to collect and are consequently limited in scale. This paper proposes FairFlow, an automated approach to generating parallel data for training counterfactual text generator models that limits the need for human intervention. Furthermore, we show that FairFlow significantly overcomes the limitations of dictionary-based word-substitution approaches whilst maintaining good performance.
Title:
```
  Enhancing Encrypted Internet Traffic Classification Through Advanced Data Augmentation Techniques
```
Authors: Yehonatan Zion, Porat Aharon, Ran Dubin, Amit Dvir, Chen Hajaj
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract The increasing popularity of online services has made Internet Traffic Classification a critical field of study. However, the rapid development of internet protocols and encryption limits usable data availability. This paper addresses the challenges of classifying encrypted internet traffic, focusing on the scarcity of open-source datasets and limitations of existing ones. We propose two Data Augmentation (DA) techniques to synthetically generate data based on real samples: Average augmentation and MTU augmentation. Both augmentations are aimed to improve the performance of the classifier, each from a different perspective: The Average augmentation aims to increase dataset size by generating new synthetic samples, while the MTU augmentation enhances classifier robustness to varying Maximum Transmission Units (MTUs). Our experiments, conducted on two well-known academic datasets and a commercial dataset, demonstrate the effectiveness of these approaches in improving model performance and mitigating constraints associated with limited and homogeneous datasets. Our findings underscore the potential of data augmentation in addressing the challenges of modern internet traffic classification. Specifically, we show that our augmentation techniques significantly enhance encrypted traffic classification models. This improvement can positively impact user Quality of Experience (QoE) by more accurately classifying traffic as video streaming (e.g., YouTube) or chat (e.g., Google Chat). Additionally, it can enhance Quality of Service (QoS) for file downloading activities (e.g., Google Docs).

LeeKyungwook / get-arxiv-noti

New submissions for Wednesday, 24 July 2024 (showing 362 of 362 entries ) #1198

Keyword: detection

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Keyword: face recognition

Title:

Title:

Title:

Title:

Keyword: augmentation

Title:

Title:

Title:

Title:

Title: