Showing new listings for Wednesday, 2 October 2024

Keyword: detection

Title:

      iSurgARy: A mobile augmented reality solution for ventriculostomy in resource-limited settings

Authors: Zahra Asadi, Joshua Pardillo Castillo, Mehrdad Asadi, David S. Sinclair, Marta Kersten-Oertel
Subjects: Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Global disparities in neurosurgical care necessitate innovations addressing affordability and accuracy, particularly for critical procedures like ventriculostomy. This intervention, vital for managing life-threatening intracranial pressure increases, is associated with catheter misplacement rates exceeding 30% when using a freehand technique. Such misplacements hold severe consequences including haemorrhage, infection, prolonged hospital stays, and even morbidity and mortality. To address this issue, we present a novel, stand-alone mobile-based augmented reality system (iSurgARy) aimed at significantly improving ventriculostomy accuracy, particularly in resource-limited settings such as those in low- and middle-income countries. iSurgARy uses landmark based registration by taking advantage of Light Detection and Ranging (LiDaR) to allow for accurate surgical guidance. To evaluate iSurgARy, we conducted a two-phase user study. Initially, we assessed usability and learnability with novice participants using the System Usability Scale (SUS), incorporating their feedback to refine the application. In the second phase, we engaged human-computer interaction (HCI) and clinical domain experts to evaluate our application, measuring Root Mean Square Error (RMSE), System Usability Scale (SUS) and NASA Task Load Index (TLX) metrics to assess accuracy usability, and cognitive workload, respectively
Title:
```
  Transferable Unsupervised Outlier Detection Framework for Human Semantic Trajectories
```
Authors: Zheng Zhang, Hossein Amiri, Dazhou Yu, Yuntong Hu, Liang Zhao, Andreas Zufle
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Semantic trajectories, which enrich spatial-temporal data with textual information such as trip purposes or location activities, are key for identifying outlier behaviors critical to healthcare, social security, and urban planning. Traditional outlier detection relies on heuristic rules, which requires domain knowledge and limits its ability to identify unseen outliers. Besides, there lacks a comprehensive approach that can jointly consider multi-modal data across spatial, temporal, and textual dimensions. Addressing the need for a domain-agnostic model, we propose the Transferable Outlier Detection for Human Semantic Trajectories (TOD4Traj) framework.TOD4Traj first introduces a modality feature unification module to align diverse data feature representations, enabling the integration of multi-modal information and enhancing transferability across different datasets. A contrastive learning module is further pro-posed for identifying regular mobility patterns both temporally and across populations, allowing for a joint detection of outliers based on individual consistency and group majority patterns. Our experimental results have shown TOD4Traj's superior performance over existing models, demonstrating its effectiveness and adaptability in detecting human trajectory outliers across various datasets.
Title:
```
  Enhancing Pre-Trained Language Models for Vulnerability Detection via Semantic-Preserving Data Augmentation
```
Authors: Weiliang Qi (1), Jiahao Cao (2), Darsh Poddar (3), Sophia Li (4), Xinda Wang (5) ((1) University of Texas at Dallas, (2) Tsinghua University, (3) Lebanon Trail High School, (4) Lovejoy High School)
Subjects: Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract With the rapid development and widespread use of advanced network systems, software vulnerabilities pose a significant threat to secure communications and networking. Learning-based vulnerability detection systems, particularly those leveraging pre-trained language models, have demonstrated significant potential in promptly identifying vulnerabilities in communication networks and reducing the risk of exploitation. However, the shortage of accurately labeled vulnerability datasets hinders further progress in this field. Failing to represent real-world vulnerability data variety and preserve vulnerability semantics, existing augmentation approaches provide limited or even counterproductive contributions to model training. In this paper, we propose a data augmentation technique aimed at enhancing the performance of pre-trained language models for vulnerability detection. Given the vulnerability dataset, our method performs natural semantic-preserving program transformation to generate a large volume of new samples with enriched data diversity and variety. By incorporating our augmented dataset in fine-tuning a series of representative code pre-trained models (i.e., CodeBERT, GraphCodeBERT, UnixCoder, and PDBERT), up to 10.1% increase in accuracy and 23.6% increase in F1 can be achieved in the vulnerability detection task. Comparison results also show that our proposed method can substantially outperform other prominent vulnerability augmentation approaches.
Title:
```
  KPCA-CAM: Visual Explainability of Deep Computer Vision Models using Kernel PCA
```
Authors: Sachin Karmani, Thanushon Sivakaran, Gaurav Prasad, Mehmet Ali, Wenbo Yang, Sheyang Tang
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Deep learning models often function as black boxes, providing no straightforward reasoning for their predictions. This is particularly true for computer vision models, which process tensors of pixel values to generate outcomes in tasks such as image classification and object detection. To elucidate the reasoning of these models, class activation maps (CAMs) are used to highlight salient regions that influence a model's output. This research introduces KPCA-CAM, a technique designed to enhance the interpretability of Convolutional Neural Networks (CNNs) through improved class activation maps. KPCA-CAM leverages Principal Component Analysis (PCA) with the kernel trick to capture nonlinear relationships within CNN activations more effectively. By mapping data into higher-dimensional spaces with kernel functions and extracting principal components from this transformed hyperplane, KPCA-CAM provides more accurate representations of the underlying data manifold. This enables a deeper understanding of the features influencing CNN decisions. Empirical evaluations on the ILSVRC dataset across different CNN models demonstrate that KPCA-CAM produces more precise activation maps, providing clearer insights into the model's reasoning compared to existing CAM algorithms. This research advances CAM techniques, equipping researchers and practitioners with a powerful tool to gain deeper insights into CNN decision-making processes and overall behaviors.
Title:
```
  Towards Precise Detection of Personal Information Leaks in Mobile Health Apps
```
Authors: Alireza Ardalani, Joseph Antonucci, Iulian Neamtiu
Subjects: Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Mobile apps are used in a variety of health settings, from apps that help providers, to apps designed for patients, to health and fitness apps designed for the general public. These apps ask the user for, and then collect and leak a wealth of Personal Information (PI). We analyze the PI that apps collect via their user interface, whether the app or third-party code is processing this information, and finally where the data is sent or stored. Prior work on leak detection in Android has focused on detecting leaks of (hardware) device-identifying information, or policy violations; however no work has looked at processing and leaking of PI in the context of health apps. The first challenge we tackle is extracting the semantic information contained in app UIs to discern the extent, and nature, of personal information. The second challenge we tackle is disambiguating between first-party, legitimate leaks (e.g,. the app storing data in its database) and third-party, problematic leaks, e.g., processing this information by, or sending it to, advertisers and analytics. We conducted a study on 1,243 Android apps: 623 medical apps and 621 health&fitness apps. We categorize PI into 16 types, grouped in 3 main categories: identity, medical, anthropometric. We found that the typical app has one first-party leak and five third-party leaks, though 221 apps had 20 or more leaks. Next, we show that third-party leaks (e.g., advertisers, analytics) are 5x more frequent than first-party leaks. Then, we show that 71% of leaks are to local storage (i.e., the phone, where data could be accessed by unauthorized apps) whereas 29% of leaks are to the network (e.g., Cloud). Finally, medical apps have 20% more PI leaks than health&fitness apps, due to collecting additional medical PI.
Title:
```
  Smart Contract Vulnerability Detection based on Static Analysis and Multi-Objective Search
```
Authors: Dongcheng Li, W. Eric Wong, Xiaodan Wang, Sean Pan, Liang-Seng Koh
Subjects: Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract This paper introduces a method for detecting vulnerabilities in smart contracts using static analysis and a multi-objective optimization algorithm. We focus on four types of vulnerabilities: reentrancy, call stack overflow, integer overflow, and timestamp dependencies. Initially, smart contracts are compiled into an abstract syntax tree to analyze relationships between contracts and functions, including calls, inheritance, and data flow. These analyses are transformed into static evaluations and intermediate representations that reveal internal relations. Based on these representations, we examine contract's functions, variables, and data dependencies to detect the specified vulnerabilities. To enhance detection accuracy and coverage, we apply a multi-objective optimization algorithm to the static analysis process. This involves assigning initial numeric values to input data and monitoring changes in statement coverage and detection accuracy. Using coverage and accuracy as fitness values, we calculate Pareto front and crowding distance values to select the best individuals for the new parent population, iterating until optimization criteria are met. We validate our approach using an open-source dataset collected from Etherscan, containing 6,693 smart contracts. Experimental results show that our method outperforms state-of-the-art tools in terms of coverage, accuracy, efficiency, and effectiveness in detecting the targeted vulnerabilities.
Title:
```
  Performance Evaluation of Deep Learning-based Quadrotor UAV Detection and Tracking Methods
```
Authors: Mohssen E. Elshaar, Zeyad M. Manaa, Mohammed R. Elbalshy, Abdul Jabbar Siddiqui, Ayman M. Abdallah
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Unmanned Aerial Vehicles (UAVs) are becoming more popular in various sectors, offering many benefits, yet introducing significant challenges to privacy and safety. This paper investigates state-of-the-art solutions for detecting and tracking quadrotor UAVs to address these concerns. Cutting-edge deep learning models, specifically the YOLOv5 and YOLOv8 series, are evaluated for their performance in identifying UAVs accurately and quickly. Additionally, robust tracking systems, BoT-SORT and Byte Track, are integrated to ensure reliable monitoring even under challenging conditions. Our tests on the DUT dataset reveal that while YOLOv5 models generally outperform YOLOv8 in detection accuracy, the YOLOv8 models excel in recognizing less distinct objects, demonstrating their adaptability and advanced capabilities. Furthermore, BoT-SORT demonstrated superior performance over Byte Track, achieving higher IoU and lower center error in most cases, indicating more accurate and stable tracking. Code: this https URL Tracking demo: this https URL
Title:
```
  VLMGuard: Defending VLMs against Malicious Prompts via Unlabeled Data
```
Authors: Xuefeng Du, Reshmi Ghosh, Robert Sim, Ahmed Salem, Vitor Carvalho, Emily Lawton, Yixuan Li, Jack W. Stokes
Subjects: Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Vision-language models (VLMs) are essential for contextual understanding of both visual and textual information. However, their vulnerability to adversarially manipulated inputs presents significant risks, leading to compromised outputs and raising concerns about the reliability in VLM-integrated applications. Detecting these malicious prompts is thus crucial for maintaining trust in VLM generations. A major challenge in developing a safeguarding prompt classifier is the lack of a large amount of labeled benign and malicious data. To address the issue, we introduce VLMGuard, a novel learning framework that leverages the unlabeled user prompts in the wild for malicious prompt detection. These unlabeled prompts, which naturally arise when VLMs are deployed in the open world, consist of both benign and malicious information. To harness the unlabeled data, we present an automated maliciousness estimation score for distinguishing between benign and malicious samples within this unlabeled mixture, thereby enabling the training of a binary prompt classifier on top. Notably, our framework does not require extra human annotations, offering strong flexibility and practicality for real-world applications. Extensive experiment shows VLMGuard achieves superior detection results, significantly outperforming state-of-the-art methods. Disclaimer: This paper may contain offensive examples; reader discretion is advised.
Title:
```
  Pre-Chirp-Domain Index Modulation for Full-Diversity Affine Frequency Division Multiplexing towards 6G
```
Authors: Guangyao Liu, Tianqi Mao, Zhenyu Xiao, Miaowen Wen
Subjects: Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Affine frequency division multiplexing (AFDM), tailored as a superior multicarrier technique utilizing chirp signals for high-mobility communications, is envisioned as a promising candidate for the sixth-generation (6G) wireless network. AFDM is based on the discrete affine Fourier transform (DAFT) with two adjustable parameters of the chirp signals, termed as the pre-chirp and post-chirp parameters, respectively. We show that the pre-chirp counterpart can be flexibly manipulated for additional degree-of-freedom (DoF). Therefore, this paper proposes a novel AFDM scheme with the pre-chirp index modulation (PIM) philosophy (AFDM-PIM), which can implicitly convey extra information bits through dynamic pre-chirp parameter assignment, thus enhancing both spectral and energy efficiency. Specifically, we first demonstrate that the subcarrier orthogonality is still maintained by applying distinct pre-chirp parameters to various subcarriers in the AFDM modulation process. Inspired by this property, each AFDM subcarrier is constituted with a unique pre-chirp signal according to the incoming bits. By such arrangement, extra binary bits can be embedded into the index patterns of pre-chirp parameter assignment without additional energy consumption. For performance analysis, we derive the asymptotically tight upper bounds on the average bit error rates (BERs) of the proposed schemes with maximum-likelihood (ML) detection, and validate that the proposed AFDM-PIM can achieve the optimal diversity order under doubly dispersive channels. Based on the derivations, we further propose an optimal pre-chirp alphabet design to enhance the BER performance via intelligent optimization algorithms. Simulations demonstrate that the proposed AFDM-PIM outperforms the classical benchmarks under doubly dispersive channel.
Title:
```
  PointAD: Comprehending 3D Anomalies from Points and Pixels for Zero-shot 3D Anomaly Detection
```
Authors: Qihang Zhou, Jiangtao Yan, Shibo He, Wenchao Meng, Jiming Chen
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Zero-shot (ZS) 3D anomaly detection is a crucial yet unexplored field that addresses scenarios where target 3D training samples are unavailable due to practical concerns like privacy protection. This paper introduces PointAD, a novel approach that transfers the strong generalization capabilities of CLIP for recognizing 3D anomalies on unseen objects. PointAD provides a unified framework to comprehend 3D anomalies from both points and pixels. In this framework, PointAD renders 3D anomalies into multiple 2D renderings and projects them back into 3D space. To capture the generic anomaly semantics into PointAD, we propose hybrid representation learning that optimizes the learnable text prompts from 3D and 2D through auxiliary point clouds. The collaboration optimization between point and pixel representations jointly facilitates our model to grasp underlying 3D anomaly patterns, contributing to detecting and segmenting anomalies of unseen diverse 3D objects. Through the alignment of 3D and 2D space, our model can directly integrate RGB information, further enhancing the understanding of 3D anomalies in a plug-and-play manner. Extensive experiments show the superiority of PointAD in ZS 3D anomaly detection across diverse unseen objects.
Title:
```
  PclGPT: A Large Language Model for Patronizing and Condescending Language Detection
```
Authors: Hongbo Wang, Mingda Li, Junyu Lu, Hebin Xia, Liang Yang, Bo Xu, Ruizhu Liu, Hongfei Lin
Subjects: Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Disclaimer: Samples in this paper may be harmful and cause discomfort! Patronizing and condescending language (PCL) is a form of speech directed at vulnerable groups. As an essential branch of toxic language, this type of language exacerbates conflicts and confrontations among Internet communities and detrimentally impacts disadvantaged groups. Traditional pre-trained language models (PLMs) perform poorly in detecting PCL due to its implicit toxicity traits like hypocrisy and false sympathy. With the rise of large language models (LLMs), we can harness their rich emotional semantics to establish a paradigm for exploring implicit toxicity. In this paper, we introduce PclGPT, a comprehensive LLM benchmark designed specifically for PCL. We collect, annotate, and integrate the Pcl-PT/SFT dataset, and then develop a bilingual PclGPT-EN/CN model group through a comprehensive pre-training and supervised fine-tuning staircase process to facilitate implicit toxic detection. Group detection results and fine-grained detection from PclGPT and other models reveal significant variations in the degree of bias in PCL towards different vulnerable groups, necessitating increased societal attention to protect them.
Title:
```
  Descriptor: Face Detection Dataset for Programmable Threshold-Based Sparse-Vision
```
Authors: Riadul Islam, Sri Ranga Sai Krishna Tummala, Joey Mulé, Rohith Kankipati, Suraj Jalapally, Dhandeep Challagundla, Chad Howard, Ryan Robucci
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Smart focal-plane and in-chip image processing has emerged as a crucial technology for vision-enabled embedded systems with energy efficiency and privacy. However, the lack of special datasets providing examples of the data that these neuromorphic sensors compute to convey visual information has hindered the adoption of these promising technologies. Neuromorphic imager variants, including event-based sensors, produce various representations such as streams of pixel addresses representing time and locations of intensity changes in the focal plane, temporal-difference data, data sifted/thresholded by temporal differences, image data after applying spatial transformations, optical flow data, and/or statistical representations. To address the critical barrier to entry, we provide an annotated, temporal-threshold-based vision dataset specifically designed for face detection tasks derived from the same videos used for Aff-Wild2. By offering multiple threshold levels (e.g., 4, 8, 12, and 16), this dataset allows for comprehensive evaluation and optimization of state-of-the-art neural architectures under varying conditions and settings compared to traditional methods. The accompanying tool flow for generating event data from raw videos further enhances accessibility and usability. We anticipate that this resource will significantly support the development of robust vision systems based on smart sensors that can process based on temporal-difference thresholds, enabling more accurate and efficient object detection and localization and ultimately promoting the broader adoption of low-power, neuromorphic imaging technologies. To support further research, we publicly released the dataset at \url{this https URL}.
Title:
```
  AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation
```
Authors: Jiafei Duan, Wilbert Pumacay, Nishanth Kumar, Yi Ru Wang, Shulin Tian, Wentao Yuan, Ranjay Krishna, Dieter Fox, Ajay Mandlekar, Yijie Guo
Subjects: Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Robotic manipulation in open-world settings requires not only task execution but also the ability to detect and learn from failures. While recent advances in vision-language models (VLMs) and large language models (LLMs) have improved robots' spatial reasoning and problem-solving abilities, they still struggle with failure recognition, limiting their real-world applicability. We introduce AHA, an open-source VLM designed to detect and reason about failures in robotic manipulation using natural language. By framing failure detection as a free-form reasoning task, AHA identifies failures and provides detailed, adaptable explanations across different robots, tasks, and environments. We fine-tuned AHA using FailGen, a scalable framework that generates the first large-scale dataset of robotic failure trajectories, the AHA dataset. FailGen achieves this by procedurally perturbing successful demonstrations from simulation. Despite being trained solely on the AHA dataset, AHA generalizes effectively to real-world failure datasets, robotic systems, and unseen tasks. It surpasses the second-best model (GPT-4o in-context learning) by 10.3% and exceeds the average performance of six compared models including five state-of-the-art VLMs by 35.3% across multiple metrics and datasets. We integrate AHA into three manipulation frameworks that utilize LLMs/VLMs for reinforcement learning, task and motion planning, and zero-shot trajectory generation. AHA's failure feedback enhances these policies' performances by refining dense reward functions, optimizing task planning, and improving sub-task verification, boosting task success rates by an average of 21.4% across all three tasks compared to GPT-4 models.
Title:
```
  ECORS: An Ensembled Clustering Approach to Eradicate The Local And Global Outlier In Collaborative Filtering Recommender System
```
Authors: Mahamudul Hasan
Subjects: Subjects: Information Retrieval (cs.IR); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Recommender systems are designed to suggest items based on user preferences, helping users navigate the vast amount of information available on the internet. Given the overwhelming content, outlier detection has emerged as a key research area in recommender systems. It involves identifying unusual or suspicious patterns in user behavior. However, existing studies in this field face several challenges, including the limited universality of algorithms, difficulties in selecting users, and a lack of optimization. In this paper, we propose an approach that addresses these challenges by employing various clustering algorithms. Specifically, we utilize a user-user matrix-based clustering technique to detect outliers. By constructing a user-user matrix, we can identify suspicious users in the system. Both local and global outliers are detected to ensure comprehensive analysis. Our experimental results demonstrate that this approach significantly improves the accuracy of outlier detection in recommender systems.
Title:
```
  TPN: Transferable Proto-Learning Network towards Few-shot Document-Level Relation Extraction
```
Authors: Yu Zhang, Zhao Kang
Subjects: Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Few-shot document-level relation extraction suffers from poor performance due to the challenging cross-domain transferability of NOTA (none-of-the-above) relation representation. In this paper, we introduce a Transferable Proto-Learning Network (TPN) to address the challenging issue. It comprises three core components: Hybrid Encoder hierarchically encodes semantic content of input text combined with attention information to enhance the relation representations. As a plug-and-play module for Out-of-Domain (OOD) Detection, Transferable Proto-Learner computes NOTA prototype through an adaptive learnable block, effectively mitigating NOTA bias across various domains. Dynamic Weighting Calibrator detects relation-specific classification confidence, serving as dynamic weights to calibrate the NOTA-dominant loss function. Finally, to bolster the model's cross-domain performance, we complement it with virtual adversarial training (VAT). We conduct extensive experimental analyses on FREDo and ReFREDo, demonstrating the superiority of TPN. Compared to state-of-the-art methods, our approach achieves competitive performance with approximately half the parameter size. Data and code are available at this https URL.
Title:
```
  A Hitchhikers Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning
```
Authors: Niki Maria Foteinopoulou, Enjie Ghorbel, Djamila Aouada
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Explainability in artificial intelligence is crucial for restoring trust, particularly in areas like face forgery detection, where viewers often struggle to distinguish between real and fabricated content. Vision and Large Language Models (VLLM) bridge computer vision and natural language, offering numerous applications driven by strong common-sense reasoning. Despite their success in various tasks, the potential of vision and language remains underexplored in face forgery detection, where they hold promise for enhancing explainability by leveraging the intrinsic reasoning capabilities of language to analyse fine-grained manipulation areas. As such, there is a need for a methodology that converts face forgery detection to a Visual Question Answering (VQA) task to systematically and fairly evaluate these capabilities. Previous efforts for unified benchmarks in deepfake detection have focused on the simpler binary task, overlooking evaluation protocols for fine-grained detection and text-generative models. We propose a multi-staged approach that diverges from the traditional binary decision paradigm to address this gap. In the first stage, we assess the models' performance on the binary task and their sensitivity to given instructions using several prompts. In the second stage, we delve deeper into fine-grained detection by identifying areas of manipulation in a multiple-choice VQA setting. In the third stage, we convert the fine-grained detection to an open-ended question and compare several matching strategies for the multi-label classification task. Finally, we qualitatively evaluate the fine-grained responses of the VLLMs included in the benchmark. We apply our benchmark to several popular models, providing a detailed comparison of binary, multiple-choice, and open-ended VQA evaluation across seven datasets. \url{this https URL}
Title:
```
  Drone Stereo Vision for Radiata Pine Branch Detection and Distance Measurement: Utilizing Deep Learning and YOLO Integration
```
Authors: Yida Lin, Bing Xue, Mengjie Zhang, Sam Schofield, Richard Green
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract This research focuses on the development of a drone equipped with pruning tools and a stereo vision camera to accurately detect and measure the spatial positions of tree branches. YOLO is employed for branch segmentation, while two depth estimation approaches, monocular and stereo, are investigated. In comparison to SGBM, deep learning techniques produce more refined and accurate depth maps. In the absence of ground-truth data, a fine-tuning process using deep neural networks is applied to approximate optimal depth values. This methodology facilitates precise branch detection and distance measurement, addressing critical challenges in the automation of pruning operations. The results demonstrate notable advancements in both accuracy and efficiency, underscoring the potential of deep learning to drive innovation and enhance automation in the agricultural sector.
Title:
```
  Design and Identification of Keypoint Patches in Unstructured Environments
```
Authors: Taewook Park, Seunghwan Kim, Hyondong Oh
Subjects: Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Reliable perception of targets is crucial for the stable operation of autonomous robots. A widely preferred method is keypoint identification in an image, as it allows direct mapping from raw images to 2D coordinates, facilitating integration with other algorithms like localization and path planning. In this study, we closely examine the design and identification of keypoint patches in cluttered environments, where factors such as blur and shadows can hinder detection. We propose four simple yet distinct designs that consider various scale, rotation and camera projection using a limited number of pixels. Additionally, we customize the Superpoint network to ensure robust detection under various types of image degradation. The effectiveness of our approach is demonstrated through real-world video tests, highlighting potential for vision-based autonomous systems.
Title:
```
  Can We Remove the Ground? Obstacle-aware Point Cloud Compression for Remote Object Detection
```
Authors: Pengxi Zeng, Alberto Presta, Jonah Reinis, Dinesh Bharadia, Hang Qiu, Pamela Cosman
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Efficient point cloud (PC) compression is crucial for streaming applications, such as augmented reality and cooperative perception. Classic PC compression techniques encode all the points in a frame. Tailoring compression towards perception tasks at the receiver side, we ask the question, "Can we remove the ground points during transmission without sacrificing the detection performance?" Our study reveals a strong dependency on the ground from state-of-the-art (SOTA) 3D object detection models, especially on those points below and around the object. In this work, we propose a lightweight obstacle-aware Pillar-based Ground Removal (PGR) algorithm. PGR filters out ground points that do not provide context to object recognition, significantly improving compression ratio without sacrificing the receiver side perception performance. Not using heavy object detection or semantic segmentation models, PGR is light-weight, highly parallelizable, and effective. Our evaluations on KITTI and Waymo Open Dataset show that SOTA detection models work equally well with PGR removing 20-30% of the points, with a speeding of 86 FPS.
Title:
```
  Detecci\'on Autom\'atica de Patolog\'ias en Notas Cl\'inicas en Espa\~nol Combinando Modelos de Lenguaje y Ontolog\'ias M\'edicos
```
Authors: Léon-Paul Schaub Torre, Pelayo Quirós, Helena García Mieres
Subjects: Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract In this paper we present a hybrid method for the automatic detection of dermatological pathologies in medical reports. We use a large language model combined with medical ontologies to predict, given a first appointment or follow-up medical report, the pathology a person may suffer from. The results show that teaching the model to learn the type, severity and location on the body of a dermatological pathology as well as in which order it has to learn these three features significantly increases its accuracy. The article presents the demonstration of state-of-the-art results for classification of medical texts with a precision of 0.84, micro and macro F1-score of 0.82 and 0.75, and makes both the method and the dataset used available to the community. -- En este artículo presentamos un método híbrido para la detección automática de patologías dermatológicas en informes médicos. Usamos un modelo de lenguaje amplio en español combinado con ontologías médicas para predecir, dado un informe médico de primera cita o de seguimiento, la patología del paciente. Los resultados muestran que el tipo, la gravedad y el sitio en el cuerpo de una patología dermatológica, así como en qué orden tiene un modelo que aprender esas tres características, aumentan su precisión. El artículo presenta la demostración de resultados comparables al estado del arte de clasificación de textos médicos con una precisión de 0.84, micro y macro F1-score de 0.82 y 0.75, y deja a disposición de la comunidad tanto el método como el conjunto de datos utilizado.
Title:
```
  Cross-Camera Data Association via GNN for Supervised Graph Clustering
```
Authors: Đorđe Nedeljković
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Cross-camera data association is one of the cornerstones of the multi-camera computer vision field. Although often integrated into detection and tracking tasks through architecture design and loss definition, it is also recognized as an independent challenge. The ultimate goal is to connect appearances of one item from all cameras, wherever it is visible. Therefore, one possible perspective on this task involves supervised clustering of the affinity graph, where nodes are instances captured by all cameras. They are represented by appropriate visual features and positional attributes. We leverage the advantages of GNN (Graph Neural Network) architecture to examine nodes' relations and generate representative edge embeddings. These embeddings are then classified to determine the existence or non-existence of connections in node pairs. Therefore, the core of this approach is graph connectivity prediction. Experimental validation was conducted on multicamera pedestrian datasets across diverse environments such as the laboratory, basketball court, and terrace. Our proposed method, named SGC-CCA, outperformed the state-of-the-art method named GNN-CCA across all clustering metrics, offering an end-to-end clustering solution without the need for graph post-processing. The code is available at this https URL.
Title:
```
  A Survey on Testing and Analysis of Quantum Software
```
Authors: Matteo Paltenghi, Michael Pradel
Subjects: Subjects: Software Engineering (cs.SE); Quantum Physics (quant-ph)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Quantum computing is getting increasing interest from both academia and industry, and the quantum software landscape has been growing rapidly. The quantum software stack comprises quantum programs, implementing algorithms, and platforms like IBM Qiskit, Google Cirq, and Microsoft Q#, enabling their development. To ensure the reliability and performance of quantum software, various techniques for testing and analyzing it have been proposed, such as test generation, bug pattern detection, and circuit optimization. However, the large amount of work and the fact that work on quantum software is performed by several research communities, make it difficult to get a comprehensive overview of the existing techniques. In this work, we provide an extensive survey of the state of the art in testing and analysis of quantum software. We discuss literature from several research communities, including quantum computing, software engineering, programming languages, and formal methods. Our survey covers a wide range of topics, including expected and unexpected behavior of quantum programs, testing techniques, program analysis approaches, optimizations, and benchmarks for testing and analyzing quantum software. We create novel connections between the discussed topics and present them in an accessible way. Finally, we discuss key challenges and open problems to inspire future research.
Title:
```
  Analysis of Cross-Domain Message Passing for OTFS Transmissions
```
Authors: Ruoxi Chong, Shuangyang Li, Zhiqiang Wei, Michail Matthaiou, Derrick Wing Kwan Ng, Giuseppe Caire
Subjects: Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract In this paper, we investigate the performance of the cross-domain iterative detection (CDID) framework with orthogonal time frequency space (OTFS) modulation, where two distinct CDID algorithms are presented. The proposed schemes estimate/detect the information symbols iteratively across the frequency domain and the delay-Doppler (DD) domain via passing either the a posteriori or extrinsic information. Building upon this framework, we investigate the error performance by considering the bias evolution and state evolution. Furthermore, we discuss their error performance in convergence and the DD domain error state lower bounds in each iteration. Specifically, we demonstrate that in convergence, the ultimate error performance of the CDID passing the a posteriori information can be characterized by two potential convergence points. In contrast, the ultimate error performance of the CDID passing the extrinsic information has only one convergence point, which, interestingly, aligns with the matched filter bound. Our numerical results confirm our analytical findings and unveil the promising error performance achieved by the proposed designs.
Title:
```
  RAD: A Dataset and Benchmark for Real-Life Anomaly Detection with Robotic Observations
```
Authors: Kaichen Zhou, Yang Cao, Teawhan Kim, Hao Zhao, Hao Dong, Kai Ming Ting, Ye Zhu
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Recent advancements in industrial anomaly detection have been hindered by the lack of realistic datasets that accurately represent real-world conditions. Existing algorithms are often developed and evaluated using idealized datasets, which deviate significantly from real-life scenarios characterized by environmental noise and data corruption such as fluctuating lighting conditions, variable object poses, and unstable camera positions. To address this gap, we introduce the Realistic Anomaly Detection (RAD) dataset, the first multi-view RGB-based anomaly detection dataset specifically collected using a real robot arm, providing unique and realistic data scenarios. RAD comprises 4765 images across 13 categories and 4 defect types, collected from more than 50 viewpoints, providing a comprehensive and realistic benchmark. This multi-viewpoint setup mirrors real-world conditions where anomalies may not be detectable from every perspective. Moreover, by sampling varying numbers of views, the algorithm's performance can be comprehensively evaluated across different viewpoints. This approach enhances the thoroughness of performance assessment and helps improve the algorithm's robustness. Besides, to support 3D multi-view reconstruction algorithms, we propose a data augmentation method to improve the accuracy of pose estimation and facilitate the reconstruction of 3D point clouds. We systematically evaluate state-of-the-art RGB-based and point cloud-based models using RAD, identifying limitations and future research directions. The code and dataset could found at this https URL
Title:
```
  Discriminative community detection for multiplex networks
```
Authors: Meiby Ortiz-Bouza, Selin Aviyente
Subjects: Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Multiplex networks have emerged as a promising approach for modeling complex systems, where each layer represents a different mode of interaction among entities of the same type. A core task in analyzing these networks is to identify the community structure for a better understanding of the overall functioning of the network. While different methods have been proposed to detect the community structure of multiplex networks, the majority deal with extracting the consensus community structure across layers. In this paper, we address the community detection problem across two closely related multiplex networks. For example in neuroimaging studies, it is common to have multiple multiplex brain networks where each layer corresponds to an individual and each group to different experimental conditions. In this setting, one may be interested in both learning the community structure representing each experimental condition and the discriminative community structure between two groups. In this paper, we introduce two discriminative community detection algorithms based on spectral clustering. The first approach aims to identify the discriminative subgraph structure between the groups, while the second one learns the discriminative and the consensus community structures, simultaneously. The proposed approaches are evaluated on both simulated and real world multiplex networks.
Title:
```
  Show Me What's Wrong: Combining Charts and Text to Guide Data Analysis
```
Authors: Beatriz Feliciano, Rita Costa, Jean Alves, Javier Liébana, Diogo Duarte, Pedro Bizarro
Subjects: Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Analyzing and finding anomalies in multi-dimensional datasets is a cumbersome but vital task across different domains. In the context of financial fraud detection, analysts must quickly identify suspicious activity among transactional data. This is an iterative process made of complex exploratory tasks such as recognizing patterns, grouping, and comparing. To mitigate the information overload inherent to these steps, we present a tool combining automated information highlights, Large Language Model generated textual insights, and visual analytics, facilitating exploration at different levels of detail. We perform a segmentation of the data per analysis area and visually represent each one, making use of automated visual cues to signal which require more attention. Upon user selection of an area, our system provides textual and graphical summaries. The text, acting as a link between the high-level and detailed views of the chosen segment, allows for a quick understanding of relevant details. A thorough exploration of the data comprising the selection can be done through graphical representations. The feedback gathered in a study performed with seven domain experts suggests our tool effectively supports and guides exploratory analysis, easing the identification of suspicious information.
Title:
```
  TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark
```
Authors: Kush Jain, Gabriel Synnaeve, Baptiste Rozière
Subjects: Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Code generation models can help improve many common software tasks ranging from code completion to defect prediction. Most of the existing benchmarks for code generation LLMs focus on code authoring or code completion. Surprisingly, there has been far less effort dedicated to benchmarking software testing, despite the strong correlation between well-tested software and effective bug detection. To address this gap, we create and release TestGenEval, a large-scale benchmark to measure test generation performance. Based on SWEBench, TestGenEval comprises 68,647 tests from 1,210 code and test file pairs across 11 well-maintained Python repositories. It covers initial tests authoring, test suite completion, and code coverage improvements. Test authoring simulates the process of a developer writing a test suite from scratch, while test completion mimics the scenario where a developer aims to improve the coverage of an existing test suite. We evaluate several popular models, with sizes ranging from 7B to 405B parameters. Our detailed analysis highlights TestGenEval's contribution to a comprehensive evaluation of test generation performance. In particular, models struggle to generate high-coverage test suites, with the best model, GPT-4o, achieving an average coverage of only 35.2%. This is primarily due to models struggling to reason about execution, and their frequent assertion errors when addressing complex code paths.
Title:
```
  Enhancing Web Spam Detection through a Blockchain-Enabled Crowdsourcing Mechanism
```
Authors: Noah Kader, Inwon Kang, Oshani Seneviratne
Subjects: Subjects: Cryptography and Security (cs.CR); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract The proliferation of spam on the Web has necessitated the development of machine learning models to automate their detection. However, the dynamic nature of spam and the sophisticated evasion techniques employed by spammers often lead to low accuracy in these models. Traditional machine-learning approaches struggle to keep pace with spammers' constantly evolving tactics, resulting in a persistent challenge to maintain high detection rates. To address this, we propose blockchain-enabled incentivized crowdsourcing as a novel solution to enhance spam detection systems. We create an incentive mechanism for data collection and labeling by leveraging blockchain's decentralized and transparent framework. Contributors are rewarded for accurate labels and penalized for inaccuracies, ensuring high-quality data. A smart contract governs the submission and evaluation process, with participants staking cryptocurrency as collateral to guarantee integrity. Simulations show that incentivized crowdsourcing improves data quality, leading to more effective machine-learning models for spam detection. This approach offers a scalable and adaptable solution to the challenges of traditional methods.
Title:
```
  Review of blockchain application with Graph Neural Networks, Graph Convolutional Networks and Convolutional Neural Networks
```
Authors: Amy Ancelotti, Claudia Liason
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract This paper reviews the applications of Graph Neural Networks (GNNs), Graph Convolutional Networks (GCNs), and Convolutional Neural Networks (CNNs) in blockchain technology. As the complexity and adoption of blockchain networks continue to grow, traditional analytical methods are proving inadequate in capturing the intricate relationships and dynamic behaviors of decentralized systems. To address these limitations, deep learning models such as GNNs, GCNs, and CNNs offer robust solutions by leveraging the unique graph-based and temporal structures inherent in blockchain architectures. GNNs and GCNs, in particular, excel in modeling the relational data of blockchain nodes and transactions, making them ideal for applications such as fraud detection, transaction verification, and smart contract analysis. Meanwhile, CNNs can be adapted to analyze blockchain data when represented as structured matrices, revealing hidden temporal and spatial patterns in transaction flows. This paper explores how these models enhance the efficiency, security, and scalability of both linear blockchains and Directed Acyclic Graph (DAG)-based systems, providing a comprehensive overview of their strengths and future research directions. By integrating advanced neural network techniques, we aim to demonstrate the potential of these models in revolutionizing blockchain analytics, paving the way for more sophisticated decentralized applications and improved network performance.
Title:
```
  OSSA: Unsupervised One-Shot Style Adaptation
```
Authors: Robin Gerster, Holger Caesar, Matthias Rapp, Alexander Wolpert, Michael Teutsch
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Despite their success in various vision tasks, deep neural network architectures often underperform in out-of-distribution scenarios due to the difference between training and target domain style. To address this limitation, we introduce One-Shot Style Adaptation (OSSA), a novel unsupervised domain adaptation method for object detection that utilizes a single, unlabeled target image to approximate the target domain style. Specifically, OSSA generates diverse target styles by perturbing the style statistics derived from a single target image and then applies these styles to a labeled source dataset at the feature level using Adaptive Instance Normalization (AdaIN). Extensive experiments show that OSSA establishes a new state-of-the-art among one-shot domain adaptation methods by a significant margin, and in some cases, even outperforms strong baselines that use thousands of unlabeled target images. By applying OSSA in various scenarios, including weather, simulated-to-real (sim2real), and visual-to-thermal adaptations, our study explores the overarching significance of the style gap in these contexts. OSSA's simplicity and efficiency allow easy integration into existing frameworks, providing a potentially viable solution for practical applications with limited data availability. Code is available at this https URL
Keyword: face recognition

There is no result

Keyword: augmentation

Title:
```
  Enhancing Pre-Trained Language Models for Vulnerability Detection via Semantic-Preserving Data Augmentation
```
Authors: Weiliang Qi (1), Jiahao Cao (2), Darsh Poddar (3), Sophia Li (4), Xinda Wang (5) ((1) University of Texas at Dallas, (2) Tsinghua University, (3) Lebanon Trail High School, (4) Lovejoy High School)
Subjects: Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract With the rapid development and widespread use of advanced network systems, software vulnerabilities pose a significant threat to secure communications and networking. Learning-based vulnerability detection systems, particularly those leveraging pre-trained language models, have demonstrated significant potential in promptly identifying vulnerabilities in communication networks and reducing the risk of exploitation. However, the shortage of accurately labeled vulnerability datasets hinders further progress in this field. Failing to represent real-world vulnerability data variety and preserve vulnerability semantics, existing augmentation approaches provide limited or even counterproductive contributions to model training. In this paper, we propose a data augmentation technique aimed at enhancing the performance of pre-trained language models for vulnerability detection. Given the vulnerability dataset, our method performs natural semantic-preserving program transformation to generate a large volume of new samples with enriched data diversity and variety. By incorporating our augmented dataset in fine-tuning a series of representative code pre-trained models (i.e., CodeBERT, GraphCodeBERT, UnixCoder, and PDBERT), up to 10.1% increase in accuracy and 23.6% increase in F1 can be achieved in the vulnerability detection task. Comparison results also show that our proposed method can substantially outperform other prominent vulnerability augmentation approaches.
Title:
```
  Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
```
Authors: Kun Yuan, Vinkle Srivastav, Nassir Navab, Nicolas Padoy
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Surgical video-language pretraining (VLP) faces unique challenges due to the knowledge domain gap and the scarcity of multi-modal data. This study aims to bridge the gap by addressing issues regarding textual information loss in surgical lecture videos and the spatial-temporal challenges of surgical VLP. We propose a hierarchical knowledge augmentation approach and a novel Procedure-Encoded Surgical Knowledge-Augmented Video-Language Pretraining (PeskaVLP) framework to tackle these issues. The knowledge augmentation uses large language models (LLM) for refining and enriching surgical concepts, thus providing comprehensive language supervision and reducing the risk of overfitting. PeskaVLP combines language supervision with visual self-supervision, constructing hard negative samples and employing a Dynamic Time Warping (DTW) based loss function to effectively comprehend the cross-modal procedural alignment. Extensive experiments on multiple public surgical scene understanding and cross-modal retrieval datasets show that our proposed method significantly improves zero-shot transferring performance and offers a generalist visual representation for further advancements in surgical scene understanding.
Title:
```
  SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs
```
Authors: Leheng Li, Weichao Qiu, Yingjie Cai, Xu Yan, Qing Lian, Bingbing Liu, Ying-Cong Chen
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract The advancement of autonomous driving is increasingly reliant on high-quality annotated datasets, especially in the task of 3D occupancy prediction, where the occupancy labels require dense 3D annotation with significant human effort. In this paper, we propose SyntheOcc, which denotes a diffusion model that Synthesize photorealistic and geometric-controlled images by conditioning Occupancy labels in driving scenarios. This yields an unlimited amount of diverse, annotated, and controllable datasets for applications like training perception models and simulation. SyntheOcc addresses the critical challenge of how to efficiently encode 3D geometric information as conditional input to a 2D diffusion model. Our approach innovatively incorporates 3D semantic multi-plane images (MPIs) to provide comprehensive and spatially aligned 3D scene descriptions for conditioning. As a result, SyntheOcc can generate photorealistic multi-view images and videos that faithfully align with the given geometric labels (semantics in 3D voxel space). Extensive qualitative and quantitative evaluations of SyntheOcc on the nuScenes dataset prove its effectiveness in generating controllable occupancy datasets that serve as an effective data augmentation to perception models.
Title:
```
  Data Augmentation for 3DMM-based Arousal-Valence Prediction for HRI
```
Authors: Christian Arzate Cruz, Yotam Sechayk, Takeo Igarashi, Randy Gomez
Subjects: Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Humans use multiple communication channels to interact with each other. For instance, body gestures or facial expressions are commonly used to convey an intent. The use of such non-verbal cues has motivated the development of prediction models. One such approach is predicting arousal and valence (AV) from facial expressions. However, making these models accurate for human-robot interaction (HRI) settings is challenging as it requires handling multiple subjects, challenging conditions, and a wide range of facial expressions. In this paper, we propose a data augmentation (DA) technique to improve the performance of AV predictors using 3D morphable models (3DMM). We then utilize this approach in an HRI setting with a mediator robot and a group of three humans. Our augmentation method creates synthetic sequences for underrepresented values in the AV space of the SEWA dataset, which is the most comprehensive dataset with continuous AV labels. Results show that using our DA method improves the accuracy and robustness of AV prediction in real-time applications. The accuracy of our models on the SEWA dataset is 0.793 for arousal and valence.
Title:
```
  Cross-lingual Back-Parsing: Utterance Synthesis from Meaning Representation for Zero-Resource Semantic Parsing
```
Authors: Deokhyung Kang, Seonjeong Hwang, Yunsu Kim, Gary Geunbae Lee
Subjects: Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Recent efforts have aimed to utilize multilingual pretrained language models (mPLMs) to extend semantic parsing (SP) across multiple languages without requiring extensive annotations. However, achieving zero-shot cross-lingual transfer for SP remains challenging, leading to a performance gap between source and target languages. In this study, we propose Cross-Lingual Back-Parsing (CBP), a novel data augmentation methodology designed to enhance cross-lingual transfer for SP. Leveraging the representation geometry of the mPLMs, CBP synthesizes target language utterances from source meaning representations. Our methodology effectively performs cross-lingual data augmentation in challenging zero-resource settings, by utilizing only labeled data in the source language and monolingual corpora. Extensive experiments on two cross-language SP benchmarks (Mschema2QA and Xspider) demonstrate that CBP brings substantial gains in the target language. Further analysis of the synthesized utterances shows that our method successfully generates target language utterances with high slot value alignment rates while preserving semantic integrity. Our codes and data are publicly available at this https URL.
Title:
```
  RAD: A Dataset and Benchmark for Real-Life Anomaly Detection with Robotic Observations
```
Authors: Kaichen Zhou, Yang Cao, Teawhan Kim, Hao Zhao, Hao Dong, Kai Ming Ting, Ye Zhu
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Recent advancements in industrial anomaly detection have been hindered by the lack of realistic datasets that accurately represent real-world conditions. Existing algorithms are often developed and evaluated using idealized datasets, which deviate significantly from real-life scenarios characterized by environmental noise and data corruption such as fluctuating lighting conditions, variable object poses, and unstable camera positions. To address this gap, we introduce the Realistic Anomaly Detection (RAD) dataset, the first multi-view RGB-based anomaly detection dataset specifically collected using a real robot arm, providing unique and realistic data scenarios. RAD comprises 4765 images across 13 categories and 4 defect types, collected from more than 50 viewpoints, providing a comprehensive and realistic benchmark. This multi-viewpoint setup mirrors real-world conditions where anomalies may not be detectable from every perspective. Moreover, by sampling varying numbers of views, the algorithm's performance can be comprehensively evaluated across different viewpoints. This approach enhances the thoroughness of performance assessment and helps improve the algorithm's robustness. Besides, to support 3D multi-view reconstruction algorithms, we propose a data augmentation method to improve the accuracy of pose estimation and facilitate the reconstruction of 3D point clouds. We systematically evaluate state-of-the-art RGB-based and point cloud-based models using RAD, identifying limitations and future research directions. The code and dataset could found at this https URL
Title:
```
  Pseudo-Non-Linear Data Augmentation via Energy Minimization
```
Authors: Pingbang Hu, Mahito Sugiyama
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract We propose a novel and interpretable data augmentation method based on energy-based modeling and principles from information geometry. Unlike black-box generative models, which rely on deep neural networks, our approach replaces these non-interpretable transformations with explicit, theoretically grounded ones, ensuring interpretability and strong guarantees such as energy minimization. Central to our method is the introduction of the backward projection algorithm, which reverses dimension reduction to generate new data. Empirical results demonstrate that our method achieves competitive performance with black-box generative models while offering greater transparency and interpretability.
Title:
```
  Targeted synthetic data generation for tabular data via hardness characterization
```
Authors: Tommaso Ferracci, Leonie Tabea Goldmann, Anton Hinel, Francesco Sanna Passino
Subjects: Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Synthetic data generation has been proven successful in improving model performance and robustness in the context of scarce or low-quality data. Using the data valuation framework to statistically identify beneficial and detrimental observations, we introduce a novel augmentation pipeline that generates only high-value training points based on hardness characterization. We first demonstrate via benchmarks on real data that Shapley-based data valuation methods perform comparably with learning-based methods in hardness characterisation tasks, while offering significant theoretical and computational advantages. Then, we show that synthetic data generators trained on the hardest points outperform non-targeted data augmentation on simulated data and on a large scale credit default prediction task. In particular, our approach improves the quality of out-of-sample predictions and it is computationally more efficient compared to non-targeted methods.

LeeKyungwook / get-arxiv-noti

Showing new listings for Wednesday, 2 October 2024 #1298

Keyword: detection

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Keyword: face recognition

Keyword: augmentation

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title: