New submissions for Thursday, 25 July 2024 (showing 331 of 331 entries )

Keyword: detection

Title:

      Category-Extensible Out-of-Distribution Detection via Hierarchical Context Descriptions

Authors: Kai Liu, Zhihang Fu, Chao Chen, Sheng Jin, Ze Chen, Mingyuan Tao, Rongxin Jiang, Jieping Ye
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract The key to OOD detection has two aspects: generalized feature representation and precise category description. Recently, vision-language models such as CLIP provide significant advances in both two issues, but constructing precise category descriptions is still in its infancy due to the absence of unseen categories. This work introduces two hierarchical contexts, namely perceptual context and spurious context, to carefully describe the precise category boundary through automatic prompt tuning. Specifically, perceptual contexts perceive the inter-category difference (e.g., cats vs apples) for current classification tasks, while spurious contexts further identify spurious (similar but exactly not) OOD samples for every single category (e.g., cats vs panthers, apples vs peaches). The two contexts hierarchically construct the precise description for a certain category, which is, first roughly classifying a sample to the predicted category and then delicately identifying whether it is truly an ID sample or actually OOD. Moreover, the precise descriptions for those categories within the vision-language framework present a novel application: CATegory-EXtensible OOD detection (CATEX). One can efficiently extend the set of recognizable categories by simply merging the hierarchical contexts learned under different sub-task settings. And extensive experiments are conducted to demonstrate CATEX's effectiveness, robustness, and category-extensibility. For instance, CATEX consistently surpasses the rivals by a large margin with several protocols on the challenging ImageNet-1K dataset. In addition, we offer new insights on how to efficiently scale up the prompt engineering in vision-language models to recognize thousands of object categories, as well as how to incorporate large language models (like GPT-3) to boost zero-shot applications. Code will be made public soon.
Title:
```
  Occlusion-Aware 3D Motion Interpretation for Abnormal Behavior Detection
```
Authors: Su Li, Wang Liang, Jianye Wang, Ziheng Zhang, Lei Zhang
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Estimating abnormal posture based on 3D pose is vital in human pose analysis, yet it presents challenges, especially when reconstructing 3D human poses from monocular datasets with occlusions. Accurate reconstructions enable the restoration of 3D movements, which assist in the extraction of semantic details necessary for analyzing abnormal behaviors. However, most existing methods depend on predefined key points as a basis for estimating the coordinates of occluded joints, where variations in data quality have adversely affected the performance of these models. In this paper, we present OAD2D, which discriminates against motion abnormalities based on reconstructing 3D coordinates of mesh vertices and human joints from monocular videos. The OAD2D employs optical flow to capture motion prior information in video streams, enriching the information on occluded human movements and ensuring temporal-spatial alignment of poses. Moreover, we reformulate the abnormal posture estimation by coupling it with Motion to Text (M2T) model in which, the VQVAE is employed to quantize motion features. This approach maps motion tokens to text tokens, allowing for a semantically interpretable analysis of motion, and enhancing the generalization of abnormal posture detection boosted by Language model. Our approach demonstrates the robustness of abnormal behavior detection against severe and self-occlusions, as it reconstructs human motion trajectories in global coordinates to effectively mitigate occlusion issues. Our method, validated using the Human3.6M, 3DPW, and NTU RGB+D datasets, achieves a high $F_1-$Score of 0.94 on the NTU RGB+D dataset for medical condition detection. And we will release all of our code and data.
Title:
```
  What Matters in Range View 3D Object Detection
```
Authors: Benjamin Wilson, Nicholas Autio Mitchell, Jhony Kaesemodel Pontes, James Hays
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Lidar-based perception pipelines rely on 3D object detection models to interpret complex scenes. While multiple representations for lidar exist, the range-view is enticing since it losslessly encodes the entire lidar sensor output. In this work, we achieve state-of-the-art amongst range-view 3D object detection models without using multiple techniques proposed in past range-view literature. We explore range-view 3D object detection across two modern datasets with substantially different properties: Argoverse 2 and Waymo Open. Our investigation reveals key insights: (1) input feature dimensionality significantly influences the overall performance, (2) surprisingly, employing a classification loss grounded in 3D spatial proximity works as well or better compared to more elaborate IoU-based losses, and (3) addressing non-uniform lidar density via a straightforward range subsampling technique outperforms existing multi-resolution, range-conditioned networks. Our experiments reveal that techniques proposed in recent range-view literature are not needed to achieve state-of-the-art performance. Combining the above findings, we establish a new state-of-the-art model for range-view 3D object detection -- improving AP by 2.2% on the Waymo Open dataset while maintaining a runtime of 10 Hz. We establish the first range-view model on the Argoverse 2 dataset and outperform strong voxel-based baselines. All models are multi-class and open-source. Code is available at this https URL.
Title:
```
  Blockchain security for ransomware detection
```
Authors: Elodie Ngoie Mutombo, Mike Wa Nkongolo
Subjects: Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Blockchain networks are critical for safeguarding digital transactions and assets, but they are increasingly targeted by ransomware attacks exploiting zero-day vulnerabilities. Traditional detection techniques struggle due to the complexity of these exploits and the lack of comprehensive datasets. The UGRansome dataset addresses this gap by offering detailed features for analysing ransomware and zero-day attacks, including timestamps, attack types, protocols, network flows, and financial impacts in bitcoins (BTC). This study uses the Lazy Predict library to automate machine learning (ML) on the UGRansome dataset. The study aims to enhance blockchain security through ransomware detection based on zero-day exploit recognition using the UGRansome dataset. Lazy Predict streamlines different ML model comparisons and identifies effective algorithms for threat detection. Key features such as timestamps, protocols, and financial data are used to predict anomalies as zero-day threats and to classify known signatures as ransomware. Results demonstrate that ML can significantly improve cybersecurity in blockchain environments. The DecisionTreeClassifier and ExtraTreeClassifier, with their high performance and low training times, are ideal candidates for deployment in real-time threat detection systems.
Title:
```
  Fostering Microservice Maintainability Assurance through a Comprehensive Framework
```
Authors: Amr S. Abdelfattah
Subjects: Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Cloud-native systems represent a significant leap in constructing scalable, large systems, employing microservice architecture as a key element in developing distributed systems through self-contained components. However, the decentralized nature of these systems, characterized by separate source codes and deployments, introduces challenges in assessing system qualities. Microservice-based systems, with their inherent complexity and the need for coordinated changes across multiple microservices, lack established best practices and guidelines, leading to difficulties in constructing and comprehending the holistic system view. This gap can result in performance degradation and increased maintenance costs, potentially requiring system refactoring. The main goal of this project is to offer maintainability assurance for microservice practitioners. It introduces an automated assessment framework tailored to microservice architecture, enhancing practitioners' understanding and analytical capabilities of the multiple system perspectives. The framework addresses various granularity levels, from artifacts to constructing holistic views of static and dynamic system characteristics. It integrates diverse perspectives, encompassing human-centric elements like architectural visualization and automated evaluations, including coupling detection, testing coverage measurement, and semantic clone identification. Validation studies involving practitioners demonstrate the framework's effectiveness in addressing diverse quality and maintainability issues, revealing insights not apparent when analyzing individual microservices in isolation.
Title:
```
  Vision-Based Adaptive Robotics for Autonomous Surface Crack Repair
```
Authors: Joshua Genova, Eric Cabrera, Vedhus Hoskere
Subjects: Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Surface cracks in infrastructure can lead to significant deterioration and costly maintenance if not efficiently repaired. Manual repair methods are labor-intensive, time-consuming, and imprecise and thus difficult to scale to large areas. Breakthroughs in robotic perception and manipulation have advanced autonomous crack repair, but proposed methods lack end-to-end testing and adaptability to changing crack size. This paper presents an adaptive, autonomous system for surface crack detection and repair using robotics with advanced sensing technologies. The system uses an RGB-D camera for crack detection, a laser scanner for precise measurement, and an extruder and pump for material deposition. A novel validation procedure with 3D-printed crack specimens simulates real-world cracks and ensures testing repeatability. Our study shows that an adaptive system for crack filling is more efficient and effective than a fixed-speed approach, with experimental results confirming precision and consistency. This research paves the way for versatile, reliable robotic infrastructure maintenance.
Title:
```
  Regulating AI Adaptation: An Analysis of AI Medical Device Updates
```
Authors: Kevin Wu, Eric Wu, Kit Rodolfa, Daniel E. Ho, James Zou
Subjects: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract While the pace of development of AI has rapidly progressed in recent years, the implementation of safe and effective regulatory frameworks has lagged behind. In particular, the adaptive nature of AI models presents unique challenges to regulators as updating a model can improve its performance but also introduce safety risks. In the US, the Food and Drug Administration (FDA) has been a forerunner in regulating and approving hundreds of AI medical devices. To better understand how AI is updated and its regulatory considerations, we systematically analyze the frequency and nature of updates in FDA-approved AI medical devices. We find that less than 2% of all devices report having been updated by being re-trained on new data. Meanwhile, nearly a quarter of devices report updates in the form of new functionality and marketing claims. As an illustrative case study, we analyze pneumothorax detection models and find that while model performance can degrade by as much as 0.18 AUC when evaluated on new sites, re-training on site-specific data can mitigate this performance drop, recovering up to 0.23 AUC. However, we also observed significant degradation on the original site after re-training using data from new sites, providing insight from one example that challenges the current one-model-fits-all approach to regulatory approvals. Our analysis provides an in-depth look at the current state of FDA-approved AI device updates and insights for future regulatory policies toward model updating and adaptive AI.
Title:
```
  Profitable Manipulations of Cryptographic Self-Selection are Statistically Detectable
```
Authors: Linda Cai, Jingyi Liu, S. Matthew Weinberg, Chenghan Zhou
Subjects: Subjects: Computer Science and Game Theory (cs.GT); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Cryptographic Self-Selection is a common primitive underlying leader-selection for Proof-of-Stake blockchain protocols. The concept was first popularized in Algorand [CM19], who also observed that the protocol might be manipulable. [FHWY22] provide a concrete manipulation that is strictly profitable for a staker of any size (and also prove upper bounds on the gains from manipulation). Separately, [YSZ23, BM24] initiate the study of undetectable profitable manipulations of consensus protocols with a focus on the seminal Selfish Mining strategy [ES14] for Bitcoin's Proof-of-Work longest-chain protocol. They design a Selfish Mining variant that, for sufficiently large miners, is strictly profitable yet also indistinguishable to an onlooker from routine latency (that is, a sufficiently large profit-maximizing miner could use their strategy to strictly profit over being honest in a way that still appears to the rest of the network as though everyone is honest but experiencing mildly higher latency. This avoids any risk of negatively impacting the value of the underlying cryptocurrency due to attack detection). We investigate the detectability of profitable manipulations of the canonical cryptographic self-selection leader selection protocol introduced in [CM19] and studied in [FHWY22], and establish that for any player with $\alpha < \frac{3-\sqrt{5}}{2} \approx 0.38$ fraction of the total stake, every strictly profitable manipulation is statistically detectable. Specifically, we consider an onlooker who sees only the random seed of each round (and does not need to see any other broadcasts by any other players). We show that the distribution of the sequence of random seeds when any player is profitably manipulating the protocol is inconsistent with any distribution that could arise by honest stakers being offline or timing out (for a natural stylized model of honest timeouts).
Title:
```
  DVPE: Divided View Position Embedding for Multi-View 3D Object Detection
```
Authors: Jiasen Wang, Zhenglin Li, Ke Sun, Xianyuan Liu, Yang Zhou
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Sparse query-based paradigms have achieved significant success in multi-view 3D detection for autonomous vehicles. Current research faces challenges in balancing between enlarging receptive fields and reducing interference when aggregating multi-view features. Moreover, different poses of cameras present challenges in training global attention models. To address these problems, this paper proposes a divided view method, in which features are modeled globally via the visibility crossattention mechanism, but interact only with partial features in a divided local virtual space. This effectively reduces interference from other irrelevant features and alleviates the training difficulties of the transformer by decoupling the position embedding from camera poses. Additionally, 2D historical RoI features are incorporated into the object-centric temporal modeling to utilize highlevel visual semantic information. The model is trained using a one-to-many assignment strategy to facilitate stability. Our framework, named DVPE, achieves state-of-the-art performance (57.2% mAP and 64.5% NDS) on the nuScenes test set. Codes will be available at this https URL.
Title:
```
  NewsUnfold: Creating a News-Reading Application That Indicates Linguistic Media Bias and Collects Feedback
```
Authors: Smi Hinterreiter, Martin Wessel, Fabian Schliski, Isao Echizen, Marc Erich Latoschik, Timo Spinde
Subjects: Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Media bias is a multifaceted problem, leading to one-sided views and impacting decision-making. A way to address digital media bias is to detect and indicate it automatically through machine-learning methods. However, such detection is limited due to the difficulty of obtaining reliable training data. Human-in-the-loop-based feedback mechanisms have proven an effective way to facilitate the data-gathering process. Therefore, we introduce and test feedback mechanisms for the media bias domain, which we then implement on NewsUnfold, a news-reading web application to collect reader feedback on machine-generated bias highlights within online news articles. Our approach augments dataset quality by significantly increasing inter-annotator agreement by 26.31% and improving classifier performance by 2.49%. As the first human-in-the-loop application for media bias, the feedback mechanism shows that a user-centric approach to media bias data collection can return reliable data while being scalable and evaluated as easy to use. NewsUnfold demonstrates that feedback mechanisms are a promising strategy to reduce data collection expenses and continuously update datasets to changes in context.
Title:
```
  Automated Code-centric Software Vulnerability Assessment: How Far Are We? An Empirical Study in C/C++
```
Authors: Anh The Nguyen, Triet Huynh Minh Le, M. Ali Babar
Subjects: Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Background: The C and C++ languages hold significant importance in Software Engineering research because of their widespread use in practice. Numerous studies have utilized Machine Learning (ML) and Deep Learning (DL) techniques to detect software vulnerabilities (SVs) in the source code written in these languages. However, the application of these techniques in function-level SV assessment has been largely unexplored. SV assessment is increasingly crucial as it provides detailed information on the exploitability, impacts, and severity of security defects, thereby aiding in their prioritization and remediation. Aims: We conduct the first empirical study to investigate and compare the performance of ML and DL models, many of which have been used for SV detection, for function-level SV assessment in C/C++. Method: Using 9,993 vulnerable C/C++ functions, we evaluated the performance of six multi-class ML models and five multi-class DL models for the SV assessment at the function level based on the Common Vulnerability Scoring System (CVSS). We further explore multi-task learning, which can leverage common vulnerable code to predict all SV assessment outputs simultaneously in a single model, and compare the effectiveness and efficiency of this model type with those of the original multi-class models. Results: We show that ML has matching or even better performance compared to the multi-class DL models for function-level SV assessment with significantly less training time. Employing multi-task learning allows the DL models to perform significantly better, with an average of 8-22% increase in Matthews Correlation Coefficient (MCC). Conclusions: We distill the practices of using data-driven techniques for function-level SV assessment in C/C++, including the use of multi-task DL to balance efficiency and effectiveness. This can establish a strong foundation for future work in this area.
Title:
```
  Active Loop Closure for OSM-guided Robotic Mapping in Large-Scale Urban Environments
```
Authors: Wei Gao, Zezhou Sun, Mingle Zhao, Cheng-Zhong Xu, Hui Kong
Subjects: Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract The autonomous mapping of large-scale urban scenes presents significant challenges for autonomous robots. To mitigate the challenges, global planning, such as utilizing prior GPS trajectories from OpenStreetMap (OSM), is often used to guide the autonomous navigation of robots for mapping. However, due to factors like complex terrain, unexpected body movement, and sensor noise, the uncertainty of the robot's pose estimates inevitably increases over time, ultimately leading to the failure of robotic mapping. To address this issue, we propose a novel active loop closure procedure, enabling the robot to actively re-plan the previously planned GPS trajectory. The method can guide the robot to re-visit the previous places where the loop-closure detection can be performed to trigger the back-end optimization, effectively reducing errors and uncertainties in pose estimation. The proposed active loop closure mechanism is implemented and embedded into a real-time OSM-guided robot mapping framework. Empirical results on several large-scale outdoor scenarios demonstrate its effectiveness and promising performance.
Title:
```
  When Text and Images Don't Mix: Bias-Correcting Language-Image Similarity Scores for Anomaly Detection
```
Authors: Adam Goodge, Bryan Hooi, Wee Siong Ng
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Contrastive Language-Image Pre-training (CLIP) achieves remarkable performance in various downstream tasks through the alignment of image and text input embeddings and holds great promise for anomaly detection. However, our empirical experiments show that the embeddings of text inputs unexpectedly tightly cluster together, far away from image embeddings, contrary to the model's contrastive training objective to align image-text input pairs. We show that this phenomenon induces a `similarity bias' - in which false negative and false positive errors occur due to bias in the similarities between images and the normal label text embeddings. To address this bias, we propose a novel methodology called BLISS which directly accounts for this similarity bias through the use of an auxiliary, external set of text inputs. BLISS is simple, it does not require strong inductive biases about anomalous behaviour nor an expensive training process, and it significantly outperforms baseline methods on benchmark image datasets, even when access to normal data is extremely limited.
Title:
```
  News Ninja: Gamified Annotation of Linguistic Bias in Online News
```
Authors: Smi Hinterreiter, Timo Spinde, Sebastian Oberdörfer, Isao Echizen, Marc Erich Latoschik
Subjects: Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Recent research shows that visualizing linguistic bias mitigates its negative effects. However, reliable automatic detection methods to generate such visualizations require costly, knowledge-intensive training data. To facilitate data collection for media bias datasets, we present News Ninja, a game employing data-collecting game mechanics to generate a crowdsourced dataset. Before annotating sentences, players are educated on media bias via a tutorial. Our findings show that datasets gathered with crowdsourced workers trained on News Ninja can reach significantly higher inter-annotator agreements than expert and crowdsourced datasets with similar data quality. As News Ninja encourages continuous play, it allows datasets to adapt to the reception and contextualization of news over time, presenting a promising strategy to reduce data collection expenses, educate players, and promote long-term bias mitigation.
Title:
```
  RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer
```
Authors: Wenyu Lv, Yian Zhao, Qinyao Chang, Kui Huang, Guanzhong Wang, Yi Liu
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract In this report, we present RT-DETRv2, an improved Real-Time DEtection TRansformer (RT-DETR). RT-DETRv2 builds upon the previous state-of-the-art real-time detector, RT-DETR, and opens up a set of bag-of-freebies for flexibility and practicality, as well as optimizing the training strategy to achieve enhanced performance. To improve the flexibility, we suggest setting a distinct number of sampling points for features at different scales in the deformable attention to achieve selective multi-scale feature extraction by the decoder. To enhance practicality, we propose an optional discrete sampling operator to replace the grid_sample operator that is specific to RT-DETR compared to YOLOs. This removes the deployment constraints typically associated with DETRs. For the training strategy, we propose dynamic data augmentation and scale-adaptive hyperparameters customization to improve performance without loss of speed. Source code and pre-trained models will be available at this https URL.
Title:
```
  Explainable Artificial Intelligence Techniques for Irregular Temporal Classification of Multidrug Resistance Acquisition in Intensive Care Unit Patients
```
Authors: Óscar Escudero-Arnanz, Cristina Soguero-Ruiz, Joaquín Álvarez-Rodríguez, Antonio G. Marques
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Antimicrobial Resistance represents a significant challenge in the Intensive Care Unit (ICU), where patients are at heightened risk of Multidrug-Resistant (MDR) infections-pathogens resistant to multiple antimicrobial agents. This study introduces a novel methodology that integrates Gated Recurrent Units (GRUs) with advanced intrinsic and post-hoc interpretability techniques for detecting the onset of MDR in patients across time. Within interpretability methods, we propose Explainable Artificial Intelligence (XAI) approaches to handle irregular Multivariate Time Series (MTS), introducing Irregular Time Shapley Additive Explanations (IT-SHAP), a modification of Shapley Additive Explanations designed for irregular MTS with Recurrent Neural Networks focused on temporal outputs. Our methodology aims to identify specific risk factors associated with MDR in ICU patients. GRU with Hadamard's attention demonstrated high initial specificity and increasing sensitivity over time, correlating with increased nosocomial infection risks during prolonged ICU stays. XAI analysis, enhanced by Hadamard attention and IT-SHAP, identified critical factors such as previous non-resistant cultures, specific antibiotic usage patterns, and hospital environment dynamics. These insights suggest that early detection of at-risk patients can inform interventions such as preventive isolation and customized treatments, significantly improving clinical outcomes. The proposed GRU model for temporal classification achieved an average Receiver Operating Characteristic Area Under the Curve of 78.27 +- 1.26 over time, indicating strong predictive performance. In summary, this study highlights the clinical utility of our methodology, which combines predictive accuracy with interpretability, thereby facilitating more effective healthcare interventions by professionals.
Title:
```
  ALPI: Auto-Labeller with Proxy Injection for 3D Object Detection using 2D Labels Only
```
Authors: Saad Lahlali, Nicolas Granger, Hervé Le Borgne, Quoc-Cuong Pham
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract 3D object detection plays a crucial role in various applications such as autonomous vehicles, robotics and augmented reality. However, training 3D detectors requires a costly precise annotation, which is a hindrance to scaling annotation to large datasets. To address this challenge, we propose a weakly supervised 3D annotator that relies solely on 2D bounding box annotations from images, along with size priors. One major problem is that supervising a 3D detection model using only 2D boxes is not reliable due to ambiguities between different 3D poses and their identical 2D projection. We introduce a simple yet effective and generic solution: we build 3D proxy objects with annotations by construction and add them to the training dataset. Our method requires only size priors to adapt to new classes. To better align 2D supervision with 3D detection, our method ensures depth invariance with a novel expression of the 2D losses. Finally, to detect more challenging instances, our annotator follows an offline pseudo-labelling scheme which gradually improves its 3D pseudo-labels. Extensive experiments on the KITTI dataset demonstrate that our method not only performs on-par or above previous works on the Car category, but also achieves performance close to fully supervised methods on more challenging classes. We further demonstrate the effectiveness and robustness of our method by being the first to experiment on the more challenging nuScenes dataset. We additionally propose a setting where weak labels are obtained from a 2D detector pre-trained on MS-COCO instead of human annotations.
Title:
```
  How Good (Or Bad) Are LLMs at Detecting Misleading Visualizations?
```
Authors: Leo Yu-Ho Lo, Huamin Qu
Subjects: Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract In this study, we address the growing issue of misleading charts, a prevalent problem that undermines the integrity of information dissemination. Misleading charts can distort the viewer's perception of data, leading to misinterpretations and decisions based on false information. The development of effective automatic detection methods for misleading charts is an urgent field of research. The recent advancement of multimodal Large Language Models (LLMs) has introduced a promising direction for addressing this challenge. We explored the capabilities of these models in analyzing complex charts and assessing the impact of different prompting strategies on the models' analyses. We utilized a dataset of misleading charts collected from the internet by prior research and crafted nine distinct prompts, ranging from simple to complex, to test the ability of four different multimodal LLMs in detecting over 21 different chart issues. Through three experiments--from initial exploration to detailed analysis--we progressively gained insights into how to effectively prompt LLMs to identify misleading charts and developed strategies to address the scalability challenges encountered as we expanded our detection range from the initial five issues to 21 issues in the final experiment. Our findings reveal that multimodal LLMs possess a strong capability for chart comprehension and critical thinking in data interpretation. There is significant potential in employing multimodal LLMs to counter misleading information by supporting critical thinking and enhancing visualization literacy. This study demonstrates the applicability of LLMs in addressing the pressing concern of misleading charts.
Title:
```
  Global and Local Confidence Based Fraud Detection Graph Neural Network
```
Authors: Jiaxun Liu, Yue Tian, Guanjun Liu
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract This paper presents the Global and Local Confidence Graph Neural Network (GLC-GNN), an innovative approach to graph-based anomaly detection that addresses the challenges of heterophily and camouflage in fraudulent activities. By introducing a prototype to encapsulate the global features of a graph and calculating a Global Confidence (GC) value for each node, GLC-GNN effectively distinguishes between benign and fraudulent nodes. The model leverages GC to generate attention values for message aggregation, enhancing its ability to capture both homophily and heterophily. Through extensive experiments on four open datasets, GLC-GNN demonstrates superior performance over state-of-the-art models in accuracy and convergence speed, while maintaining a compact model size and expedited training process. The integration of global and local confidence measures in GLC-GNN offers a robust solution for detecting anomalies in graphs, with significant implications for fraud detection across diverse domains.
Title:
```
  Preliminary study on artificial intelligence methods for cybersecurity threat detection in computer networks based on raw data packets
```
Authors: Aleksander Ogonowski, Michał Żebrowski, Arkadiusz Ćwiek, Tobiasz Jarosiewicz, Konrad Klimaszewski, Adam Padee, Piotr Wasiuk, Michał Wójcik
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Most of the intrusion detection methods in computer networks are based on traffic flow characteristics. However, this approach may not fully exploit the potential of deep learning algorithms to directly extract features and patterns from raw packets. Moreover, it impedes real-time monitoring due to the necessity of waiting for the processing pipeline to complete and introduces dependencies on additional software components. In this paper, we investigate deep learning methodologies capable of detecting attacks in real-time directly from raw packet data within network traffic. We propose a novel approach where packets are stacked into windows and separately recognised, with a 2D image representation suitable for processing with computer vision models. Our investigation utilizes the CIC IDS-2017 dataset, which includes both benign traffic and prevalent real-world attacks, providing a comprehensive foundation for our research.
Title:
```
  AI Emergency Preparedness: Examining the federal government's ability to detect and respond to AI-related national security threats
```
Authors: Akash Wasil, Everett Smith, Corin Katzke, Justin Bullock
Subjects: Subjects: Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract We examine how the federal government can enhance its AI emergency preparedness: the ability to detect and prepare for time-sensitive national security threats relating to AI. Emergency preparedness can improve the government's ability to monitor and predict AI progress, identify national security threats, and prepare effective response plans for plausible threats and worst-case scenarios. Our approach draws from fields in which experts prepare for threats despite uncertainty about their exact nature or timing (e.g., counterterrorism, cybersecurity, pandemic preparedness). We focus on three plausible risk scenarios: (1) loss of control (threats from a powerful AI system that becomes capable of escaping human control), (2) cybersecurity threats from malicious actors (threats from a foreign actor that steals the model weights of a powerful AI system), and (3) biological weapons proliferation (threats from users identifying a way to circumvent the safeguards of a publicly-released model in order to develop biological weapons.) We evaluate the federal government's ability to detect, prevent, and respond to these threats. Then, we highlight potential gaps and offer recommendations to improve emergency preparedness. We conclude by describing how future work on AI emergency preparedness can be applied to improve policymakers' understanding of risk scenarios, identify gaps in detection capabilities, and form preparedness plans to improve the effectiveness of federal responses to AI-related national security threats.
Title:
```
  Media Manipulations in the Coverage of Events of the Ukrainian Revolution of Dignity: Historical, Linguistic, and Psychological Approaches
```
Authors: Ivan Khoma, Solomia Fedushko, Zoryana Kunch
Subjects: Subjects: Computers and Society (cs.CY); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract This article examines the use of manipulation in the coverage of events of the Ukrainian Revolution of Dignity in the mass media, namely in the content of the online newspaper Ukrainian Truth (Ukrainska pravda), online newspaper High Castle (Vysokyi Zamok), and online newspaper ZIK during the public protest, namely during the Ukrainian Revolution of Dignity. Contents of these online newspapers the historical, linguistic, and psychological approaches are used. Also media manipulations in the coverage of events of the Ukrainian Revolution of Dignity are studied. Internet resources that cover news are analyzed. Current and most popular Internet resources are identified. The content of online newspapers is analyzed and statistically processed. Internet content of newspapers by the level of significance of data (very significant data, significant data and insignificant data) is classified. The algorithm of detection of the media manipulations in the highlighting the course of the Ukrainian revolutions based on historical, linguistic, and psychological approaches is designed. Methods of counteracting information attacks in online newspapers are developed.
Title:
```
  AHMF: Adaptive Hybrid-Memory-Fusion Model for Driver Attention Prediction
```
Authors: Dongyang Xu, Qingfan Wang, Ji Ma, Xiangyun Zeng, Lei Chen
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Accurate driver attention prediction can serve as a critical reference for intelligent vehicles in understanding traffic scenes and making informed driving decisions. Though existing studies on driver attention prediction improved performance by incorporating advanced saliency detection techniques, they overlooked the opportunity to achieve human-inspired prediction by analyzing driving tasks from a cognitive science perspective. During driving, drivers' working memory and long-term memory play crucial roles in scene comprehension and experience retrieval, respectively. Together, they form situational awareness, facilitating drivers to quickly understand the current traffic situation and make optimal decisions based on past driving experiences. To explicitly integrate these two types of memory, this paper proposes an Adaptive Hybrid-Memory-Fusion (AHMF) driver attention prediction model to achieve more human-like predictions. Specifically, the model first encodes information about specific hazardous stimuli in the current scene to form working memories. Then, it adaptively retrieves similar situational experiences from the long-term memory for final prediction. Utilizing domain adaptation techniques, the model performs parallel training across multiple datasets, thereby enriching the accumulated driving experience within the long-term memory module. Compared to existing models, our model demonstrates significant improvements across various metrics on multiple public datasets, proving the effectiveness of integrating hybrid memories in driver attention prediction.
Title:
```
  Looking at Model Debiasing through the Lens of Anomaly Detection
```
Authors: Vito Paolo Pastore, Massimiliano Ciranni, Davide Marinelli, Francesca Odone, Vittorio Murino
Subjects: Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract It is widely recognized that deep neural networks are sensitive to bias in the data. This means that during training these models are likely to learn spurious correlations between data and labels, resulting in limited generalization abilities and low performance. In this context, model debiasing approaches can be devised aiming at reducing the model's dependency on such unwanted correlations, either leveraging the knowledge of bias information or not. In this work, we focus on the latter and more realistic scenario, showing the importance of accurately predicting the bias-conflicting and bias-aligned samples to obtain compelling performance in bias mitigation. On this ground, we propose to conceive the problem of model bias from an out-of-distribution perspective, introducing a new bias identification method based on anomaly detection. We claim that when data is mostly biased, bias-conflicting samples can be regarded as outliers with respect to the bias-aligned distribution in the feature space of a biased model, thus allowing for precisely detecting them with an anomaly detection method. Coupling the proposed bias identification approach with bias-conflicting data upsampling and augmentation in a two-step strategy, we reach state-of-the-art performance on synthetic and real benchmark datasets. Ultimately, our proposed approach shows that the data bias issue does not necessarily require complex debiasing methods, given that an accurate bias identification procedure is defined.
Keyword: face recognition

There is no result

Keyword: augmentation

Title:
```
  Topology Reorganized Graph Contrastive Learning with Mitigating Semantic Drift
```
Authors: Jiaqiang Zhang, Songcan Chen
Subjects: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Graph contrastive learning (GCL) is an effective paradigm for node representation learning in graphs. The key components hidden behind GCL are data augmentation and positive-negative pair selection. Typical data augmentations in GCL, such as uniform deletion of edges, are generally blind and resort to local perturbation, which is prone to producing under-diversity views. Additionally, there is a risk of making the augmented data traverse to other classes. Moreover, most methods always treat all other samples as negatives. Such a negative pairing naturally results in sampling bias and likewise may make the learned representation suffer from semantic drift. Therefore, to increase the diversity of the contrastive view, we propose two simple and effective global topological augmentations to compensate current GCL. One is to mine the semantic correlation between nodes in the feature space. The other is to utilize the algebraic properties of the adjacency matrix to characterize the topology by eigen-decomposition. With the help of both, we can retain important edges to build a better view. To reduce the risk of semantic drift, a prototype-based negative pair selection is further designed which can filter false negative samples. Extensive experiments on various tasks demonstrate the advantages of the model compared to the state-of-the-art methods.
Title:
```
  Adapting Image-based RL Policies via Predicted Rewards
```
Authors: Weiyao Wang, Xinyuan Fang, Gregory D. Hager
Subjects: Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Image-based reinforcement learning (RL) faces significant challenges in generalization when the visual environment undergoes substantial changes between training and deployment. Under such circumstances, learned policies may not perform well leading to degraded results. Previous approaches to this problem have largely focused on broadening the training observation distribution, employing techniques like data augmentation and domain randomization. However, given the sequential nature of the RL decision-making problem, it is often the case that residual errors are propagated by the learned policy model and accumulate throughout the trajectory, resulting in highly degraded performance. In this paper, we leverage the observation that predicted rewards under domain shift, even though imperfect, can still be a useful signal to guide fine-tuning. We exploit this property to fine-tune a policy using reward prediction in the target domain. We have found that, even under significant domain shift, the predicted reward can still provide meaningful signal and fine-tuning substantially improves the original policy. Our approach, termed Predicted Reward Fine-tuning (PRFT), improves performance across diverse tasks in both simulated benchmarks and real-world experiments. More information is available at project web page: this https URL.
Title:
```
  ScholarChemQA: Unveiling the Power of Language Models in Chemical Research Question Answering
```
Authors: Xiuying Chen, Tairan Wang, Taicheng Guo, Kehan Guo, Juexiao Zhou, Haoyang Li, Mingchen Zhuge, Jürgen Schmidhuber, Xin Gao, Xiangliang Zhang
Subjects: Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Question Answering (QA) effectively evaluates language models' reasoning and knowledge depth. While QA datasets are plentiful in areas like general domain and biomedicine, academic chemistry is less explored. Chemical QA plays a crucial role in both education and research by effectively translating complex chemical information into readily understandable format. Addressing this gap, we introduce ScholarChemQA, a large-scale QA dataset constructed from chemical papers. This dataset reflects typical real-world challenges, including an imbalanced data distribution and a substantial amount of unlabeled data that can be potentially useful. Correspondingly, we introduce a QAMatch model, specifically designed to effectively answer chemical questions by fully leveraging our collected data. We first address the issue of imbalanced label distribution by re-weighting the instance-wise loss based on the inverse frequency of each class, ensuring minority classes are not dominated by majority ones during optimization. Next, we utilize the unlabeled data to enrich the learning process, generating a variety of augmentations based on a SoftMix operation and ensuring their predictions align with the same target, i.e., pseudo-labels. To ensure the quality of the pseudo-labels, we propose a calibration procedure aimed at closely aligning the pseudo-label estimates of individual samples with a desired ground truth distribution. Experiments show that our QAMatch significantly outperforms the recent similar-scale baselines and Large Language Models (LLMs) not only on our ScholarChemQA dataset but also on four benchmark datasets. We hope our benchmark and model can facilitate and promote more research on chemical QA.
Title:
```
  RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer
```
Authors: Wenyu Lv, Yian Zhao, Qinyao Chang, Kui Huang, Guanzhong Wang, Yi Liu
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract In this report, we present RT-DETRv2, an improved Real-Time DEtection TRansformer (RT-DETR). RT-DETRv2 builds upon the previous state-of-the-art real-time detector, RT-DETR, and opens up a set of bag-of-freebies for flexibility and practicality, as well as optimizing the training strategy to achieve enhanced performance. To improve the flexibility, we suggest setting a distinct number of sampling points for features at different scales in the deformable attention to achieve selective multi-scale feature extraction by the decoder. To enhance practicality, we propose an optional discrete sampling operator to replace the grid_sample operator that is specific to RT-DETR compared to YOLOs. This removes the deployment constraints typically associated with DETRs. For the training strategy, we propose dynamic data augmentation and scale-adaptive hyperparameters customization to improve performance without loss of speed. Source code and pre-trained models will be available at this https URL.
Title:
```
  Domain Generalized Recaptured Screen Image Identification Using SWIN Transformer
```
Authors: Preeti Mehta, Aman Sagar, Suchi Kumari
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract An increasing number of classification approaches have been developed to address the issue of image rebroadcast and recapturing, a standard attack strategy in insurance frauds, face spoofing, and video piracy. However, most of them neglected scale variations and domain generalization scenarios, performing poorly in instances involving domain shifts, typically made worse by inter-domain and cross-domain scale variances. To overcome these issues, we propose a cascaded data augmentation and SWIN transformer domain generalization framework (DAST-DG) in the current research work Initially, we examine the disparity in dataset representation. A feature generator is trained to make authentic images from various domains indistinguishable. This process is then applied to recaptured images, creating a dual adversarial learning setup. Extensive experiments demonstrate that our approach is practical and surpasses state-of-the-art methods across different databases. Our model achieves an accuracy of approximately 82\% with a precision of 95\% on high-variance datasets.
Title:
```
  Deep Spherical Superpixels
```
Authors: Rémi Giraud, Michaël Clément
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Over the years, the use of superpixel segmentation has become very popular in various applications, serving as a preprocessing step to reduce data size by adapting to the content of the image, regardless of its semantic content. While the superpixel segmentation of standard planar images, captured with a 90° field of view, has been extensively studied, there has been limited focus on dedicated methods to omnidirectional or spherical images, captured with a 360° field of view. In this study, we introduce the first deep learning-based superpixel segmentation approach tailored for omnidirectional images called DSS (for Deep Spherical Superpixels). Our methodology leverages on spherical CNN architectures and the differentiable K-means clustering paradigm for superpixels, to generate superpixels that follow the spherical geometry. Additionally, we propose to use data augmentation techniques specifically designed for 360° images, enabling our model to efficiently learn from a limited set of annotated omnidirectional data. Our extensive validation across two datasets demonstrates that taking into account the inherent circular geometry of such images into our framework improves the segmentation performance over traditional and deep learning-based superpixel methods. Our code is available online.
Title:
```
  Looking at Model Debiasing through the Lens of Anomaly Detection
```
Authors: Vito Paolo Pastore, Massimiliano Ciranni, Davide Marinelli, Francesca Odone, Vittorio Murino
Subjects: Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract It is widely recognized that deep neural networks are sensitive to bias in the data. This means that during training these models are likely to learn spurious correlations between data and labels, resulting in limited generalization abilities and low performance. In this context, model debiasing approaches can be devised aiming at reducing the model's dependency on such unwanted correlations, either leveraging the knowledge of bias information or not. In this work, we focus on the latter and more realistic scenario, showing the importance of accurately predicting the bias-conflicting and bias-aligned samples to obtain compelling performance in bias mitigation. On this ground, we propose to conceive the problem of model bias from an out-of-distribution perspective, introducing a new bias identification method based on anomaly detection. We claim that when data is mostly biased, bias-conflicting samples can be regarded as outliers with respect to the bias-aligned distribution in the feature space of a biased model, thus allowing for precisely detecting them with an anomaly detection method. Coupling the proposed bias identification approach with bias-conflicting data upsampling and augmentation in a two-step strategy, we reach state-of-the-art performance on synthetic and real benchmark datasets. Ultimately, our proposed approach shows that the data bias issue does not necessarily require complex debiasing methods, given that an accurate bias identification procedure is defined.
Title:
```
  $VILA^2$: VILA Augmented VILA
```
Authors: Yunhao Fang, Ligeng Zhu, Yao Lu, Yan Wang, Pavlo Molchanov, Jang Hyun Cho, Marco Pavone, Song Han, Hongxu Yin
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Visual language models (VLMs) have rapidly progressed, driven by the success of large language models (LLMs). While model architectures and training infrastructures advance rapidly, data curation remains under-explored. When data quantity and quality become a bottleneck, existing work either directly crawls more raw data from the Internet that does not have a guarantee of data quality or distills from black-box commercial models (e.g., GPT-4V / Gemini) causing the performance upper bounded by that model. In this work, we introduce a novel approach that includes a self-augment step and a specialist-augment step to iteratively improve data quality and model performance. In the self-augment step, a VLM recaptions its own pretraining data to enhance data quality, and then retrains from scratch using this refined dataset to improve model performance. This process can iterate for several rounds. Once self-augmentation saturates, we employ several specialist VLMs finetuned from the self-augmented VLM with domain-specific expertise, to further infuse specialist knowledge into the generalist VLM through task-oriented recaptioning and retraining. With the combined self-augmented and specialist-augmented training, we introduce $VILA^2$ (VILA-augmented-VILA), a VLM family that consistently improves the accuracy on a wide range of tasks over prior art, and achieves new state-of-the-art results on MMMU leaderboard among open-sourced models.

LeeKyungwook / get-arxiv-noti

New submissions for Thursday, 25 July 2024 (showing 331 of 331 entries ) #1200

Keyword: detection

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Keyword: face recognition

Keyword: augmentation

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title: