New submissions for Thursday, 13 June 2024 (showing 438 of 438 entries )

Keyword: detection

Title:

      Individual Packet Features are a Risk to Model Generalisation in ML-Based Intrusion Detection

Authors: Kahraman Kostas, Mike Just, Michael A. Lones
Subjects: Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Machine learning is increasingly used for intrusion detection in IoT networks. This paper explores the effectiveness of using individual packet features (IPF), which are attributes extracted from a single network packet, such as timing, size, and source-destination information. Through literature review and experiments, we identify the limitations of IPF, showing they can produce misleadingly high detection rates. Our findings emphasize the need for approaches that consider packet interactions for robust intrusion detection. Additionally, we demonstrate that models based on IPF often fail to generalize across datasets, compromising their reliability in diverse IoT environments.
Title:
```
  VulDetectBench: Evaluating the Deep Capability of Vulnerability Detection with Large Language Models
```
Authors: Yu Liu, Mingxin Yang, Yu Xie, Ping Chen, Xiaojin Zhang, Wei Chen
Subjects: Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Large Language Models (LLMs) have training corpora containing large amounts of program code, greatly improving the model's code comprehension and generation capabilities. However, sound comprehensive research on detecting program vulnerabilities, a more specific task related to code, and evaluating the performance of LLMs in this more specialized scenario is still lacking. To address common challenges in vulnerability analysis, our study introduces a new benchmark, VulDetectBench, specifically designed to assess the vulnerability detection capabilities of LLMs. The benchmark comprehensively evaluates LLM's ability to identify, classify, and locate vulnerabilities through five tasks of increasing difficulty. We evaluate the performance of 17 models (both open- and closed-source) and find that while existing models can achieve over 80% accuracy on tasks related to vulnerability identification and classification, they still fall short on specific, more detailed vulnerability analysis tasks, with less than 30% accuracy, making it difficult to provide valuable auxiliary information for professional vulnerability mining. Our benchmark effectively evaluates the capabilities of various LLMs at different levels in the specific task of vulnerability detection, providing a foundation for future research and improvements in this critical area of code security. VulDetectBench is publicly available at this https URL.
Title:
```
  FP-Inconsistent: Detecting Evasive Bots using Browser Fingerprint Inconsistencies
```
Authors: Hari Venugopalan, Shaoor Munir, Shuaib Ahmed, Tangbaihe Wang, Samuel T. King, Zubair Shafiq
Subjects: Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract As browser fingerprinting is increasingly being used for bot detection, bots have started altering their fingerprints for evasion. We conduct the first large-scale evaluation of evasive bots to investigate whether and how altering fingerprints helps bots evade detection. To systematically investigate evasive bots, we deploy a honey site incorporating two anti-bot services (DataDome and BotD) and solicit bot traffic from 20 different bot services that purport to sell "realistic and undetectable traffic". Across half a million requests from 20 different bot services on our honey site, we find an average evasion rate of 52.93% against DataDome and 44.56% evasion rate against BotD. Our comparison of fingerprint attributes from bot services that evade each anti-bot service individually as well as bot services that evade both shows that bot services indeed alter different browser fingerprint attributes for evasion. Further, our analysis reveals the presence of inconsistent fingerprint attributes in evasive bots. Given evasive bots seem to have difficulty in ensuring consistency in their fingerprint attributes, we propose a data-driven approach to discover rules to detect such inconsistencies across space (two attributes in a given browser fingerprint) and time (a single attribute at two different points in time). These rules, which can be readily deployed by anti-bot services, reduce the evasion rate of evasive bots against DataDome and BotD by 48.11% and 44.95% respectively.
Title:
```
  Automated Pavement Cracks Detection and Classification Using Deep Learning
```
Authors: Selvia Nafaa, Hafsa Essam, Karim Ashour, Doaa Emad, Rana Mohamed, Mohammed Elhenawy, Huthaifa I. Ashqar, Abdallah A. Hassan, Taqwa I. Alhadidi
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Monitoring asset conditions is a crucial factor in building efficient transportation asset management. Because of substantial advances in image processing, traditional manual classification has been largely replaced by semi-automatic/automatic techniques. As a result, automated asset detection and classification techniques are required. This paper proposes a methodology to detect and classify roadway pavement cracks using the well-known You Only Look Once (YOLO) version five (YOLOv5) and version 8 (YOLOv8) algorithms. Experimental results indicated that the precision of pavement crack detection reaches up to 67.3% under different illumination conditions and image sizes. The findings of this study can assist highway agencies in accurately detecting and classifying asset conditions under different illumination conditions. This will reduce the cost and time that are associated with manual inspection, which can greatly reduce the cost of highway asset maintenance.
Title:
```
  Active Sidestick Control Integration for Enhanced Aircraft Flight Envelope Protection
```
Authors: Çağrı Ege Altunkaya, Fatih Erol, Akın Çatak, Volkan Mert, Pierluigi Capone, Şükrü Akif Ertürk, Emre Koyuncu
Subjects: Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract The design of Envelope and Pilot-Induced Oscillation (PIO) Protection Features, and Failure Cases detection and prevention using Active Control Sidestick (ACS) is a challenging task. While helping the pilot to respect the envelope limitations also in failure scenarios and, therefore, increasing mission effectiveness, these features may have a significant impact on the aircraft's agility. ACS characteristics are investigated in an integrated environment. A set of effective and flexible control laws based on Incremental Nonlinear Dynamic Inversion have been developed in a state-of-the-art aircraft simulation model and coupled with a two-ways communication with the selected ACS. The model can run in real-time in a fixed-based simulator composed of representative cockpit and out-of-the-window.
Title:
```
  A PRISMA Driven Systematic Review of Publicly Available Datasets for Benchmark and Model Developments for Industrial Defect Detection
```
Authors: Can Akbas, Irem Su Arin, Sinan Onal
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Recent advancements in quality control across various industries have increasingly utilized the integration of video cameras and image processing for effective defect detection. A critical barrier to progress is the scarcity of comprehensive datasets featuring annotated defects, which are essential for developing and refining automated defect detection models. This systematic review, spanning from 2015 to 2023, identifies 15 publicly available datasets and critically examines them to assess their effectiveness and applicability for benchmarking and model development. Our findings reveal a diverse landscape of datasets, such as NEU-CLS, NEU-DET, DAGM, KolektorSDD, PCB Defect Dataset, and the Hollow Cylindrical Defect Detection Dataset, each with unique strengths and limitations in terms of image quality, defect type representation, and real-world applicability. The goal of this systematic review is to consolidate these datasets in a single location, providing researchers who seek such publicly available resources with a comprehensive reference.
Title:
```
  A Deep Learning Approach to Detect Complete Safety Equipment For Construction Workers Based On YOLOv7
```
Authors: Md. Shariful Islam, SM Shaqib, Shahriar Sultan Ramit, Shahrun Akter Khushbu, Mr. Abdus Sattar, Dr. Sheak Rashed Haider Noor
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract In the construction sector, ensuring worker safety is of the utmost significance. In this study, a deep learning-based technique is presented for identifying safety gear worn by construction workers, such as helmets, goggles, jackets, gloves, and footwears. The recommended approach uses the YOLO v7 (You Only Look Once) object detection algorithm to precisely locate these safety items. The dataset utilized in this work consists of labeled images split into training, testing and validation sets. Each image has bounding box labels that indicate where the safety equipment is located within the image. The model is trained to identify and categorize the safety equipment based on the labeled dataset through an iterative training approach. We used custom dataset to train this model. Our trained model performed admirably well, with good precision, recall, and F1-score for safety equipment recognition. Also, the model's evaluation produced encouraging results, with a mAP@0.5 score of 87.7\%. The model performs effectively, making it possible to quickly identify safety equipment violations on building sites. A thorough evaluation of the outcomes reveals the model's advantages and points up potential areas for development. By offering an automatic and trustworthy method for safety equipment detection, this research makes a contribution to the fields of computer vision and workplace safety. The proposed deep learning-based approach will increase safety compliance and reduce the risk of accidents in the construction industry
Title:
```
  Vehicle Speed Detection System Utilizing YOLOv8: Enhancing Road Safety and Traffic Management for Metropolitan Areas
```
Authors: SM Shaqib, Alaya Parvin Alo, Shahriar Sultan Ramit, Afraz Ul Haque Rupak, Sadman Sadik Khan, Mr. Md. Sadekur Rahman
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract In order to ensure traffic safety through a reduction in fatalities and accidents, vehicle speed detection is essential. Relentless driving practices are discouraged by the enforcement of speed restrictions, which are made possible by accurate monitoring of vehicle speeds. Road accidents remain one of the leading causes of death in Bangladesh. The Bangladesh Passenger Welfare Association stated in 2023 that 7,902 individuals lost their lives in traffic accidents during the course of the year. Efficient vehicle speed detection is essential to maintaining traffic safety. Reliable speed detection can also help gather important traffic data, which makes it easier to optimize traffic flow and provide safer road infrastructure. The YOLOv8 model can recognize and track cars in videos with greater speed and accuracy when trained under close supervision. By providing insights into the application of supervised learning in object identification for vehicle speed estimation and concentrating on the particular traffic conditions and safety concerns in Bangladesh, this work represents a noteworthy contribution to the area. The MAE was 3.5 and RMSE was 4.22 between the predicted speed of our model and the actual speed or the ground truth measured by the speedometer Promising increased efficiency and wider applicability in a variety of traffic conditions, the suggested solution offers a financially viable substitute for conventional approaches.
Title:
```
  Unleashing the Power of Transfer Learning Model for Sophisticated Insect Detection: Revolutionizing Insect Classification
```
Authors: Md. Mahmudul Hasan, SM Shaqib, Ms. Sharmin Akter, Rabiul Alam, Afraz Ul Haque, Shahrun akter khushbu
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract The purpose of the Insect Detection System for Crop and Plant Health is to keep an eye out for and identify insect infestations in farming areas. By utilizing cutting-edge technology like computer vision and machine learning, the system seeks to identify hazardous insects early and accurately. This would enable prompt response to save crops and maintain optimal plant health. The Method of this study includes Data Acquisition, Preprocessing, Data splitting, Model Implementation and Model evaluation. Different models like MobileNetV2, ResNet152V2, Xecption, Custom CNN was used in this study. In order to categorize insect photos, a Convolutional Neural Network (CNN) based on the ResNet152V2 architecture is constructed and evaluated in this work. Achieving 99% training accuracy and 97% testing accuracy, ResNet152V2 demonstrates superior performance among four implemented models. The results highlight its potential for real-world applications in insect classification and entomology studies, emphasizing efficiency and accuracy. To ensure food security and sustain agricultural output globally, finding insects is crucial. Cutting-edge technology, such as ResNet152V2 models, greatly influence automating and improving the accuracy of insect identification. Efficient insect detection not only minimizes crop losses but also enhances agricultural productivity, contributing to sustainable food production. This underscores the pivotal role of technology in addressing challenges related to global food security.
Title:
```
  Visibility-Aware RRT* for Safety-Critical Navigation of Perception-Limited Robots in Unknown Environments
```
Authors: Taekyung Kim, Dimitra Panagou
Subjects: Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Safe autonomous navigation in unknown environments remains a critical challenge for robots with limited sensing capabilities. While safety-critical control techniques, such as Control Barrier Functions (CBFs), have been proposed to ensure safety, their effectiveness relies on the assumption that the robot has complete knowledge of its surroundings. In reality, robots often operate with restricted field-of-view and finite sensing range, which can lead to collisions with unknown obstacles if the planning algorithm is agnostic to these limitations. To address this issue, we introduce the visibility-aware RRT* algorithm that combines sampling-based planning with CBFs to generate safe and efficient global reference paths in partially unknown environments. The algorithm incorporates a collision avoidance CBF and a novel visibility CBF, which guarantees that the robot remains within locally collision-free regions, enabling timely detection and avoidance of unknown obstacles. We conduct extensive experiments interfacing the path planners with two different safety-critical controllers, wherein our method outperforms all other compared baselines across both safety and efficiency aspects.
Title:
```
  REAL Sampling: Boosting Factuality and Diversity of Open-Ended Generation via Asymptotic Entropy
```
Authors: Haw-Shiuan Chang, Nanyun Peng, Mohit Bansal, Anil Ramakrishna, Tagyoung Chung
Subjects: Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Decoding methods for large language models (LLMs) usually struggle with the tradeoff between ensuring factuality and maintaining diversity. For example, a higher p threshold in the nucleus (top-p) sampling increases the diversity but decreases the factuality, and vice versa. In this paper, we propose REAL (Residual Entropy from Asymptotic Line) sampling, a decoding method that achieves improved factuality and diversity over nucleus sampling by predicting an adaptive threshold of $p$. Specifically, REAL sampling predicts the step-wise likelihood of an LLM to hallucinate, and lowers the p threshold when an LLM is likely to hallucinate. Otherwise, REAL sampling increases the p threshold to boost the diversity. To predict the step-wise hallucination likelihood without supervision, we construct a Token-level Hallucination Forecasting (THF) model to predict the asymptotic entropy (i.e., inherent uncertainty) of the next token by extrapolating the next-token entropies from a series of LLMs with different sizes. If a LLM's entropy is higher than the asymptotic entropy (i.e., the LLM is more uncertain than it should be), the THF model predicts a high hallucination hazard, which leads to a lower p threshold in REAL sampling. In the FactualityPrompts benchmark, we demonstrate that REAL sampling based on a 70M THF model can substantially improve the factuality and diversity of 7B LLMs simultaneously, judged by both retrieval-based metrics and human evaluation. After combined with contrastive decoding, REAL sampling outperforms 9 sampling methods, and generates texts that are more factual than the greedy sampling and more diverse than the nucleus sampling with $p=0.5$. Furthermore, the predicted asymptotic entropy is also a useful unsupervised signal for hallucination detection tasks.
Title:
```
  The MuSe 2024 Multimodal Sentiment Analysis Challenge: Social Perception and Humor Recognition
```
Authors: Shahin Amiriparian, Lukas Christ, Alexander Kathan, Maurice Gerczuk, Niklas Müller, Steffen Klug, Lukas Stappen, Andreas König, Erik Cambria, Björn Schuller, Simone Eulitz
Subjects: Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract The Multimodal Sentiment Analysis Challenge (MuSe) 2024 addresses two contemporary multimodal affect and sentiment analysis problems: In the Social Perception Sub-Challenge (MuSe-Perception), participants will predict 16 different social attributes of individuals such as assertiveness, dominance, likability, and sincerity based on the provided audio-visual data. The Cross-Cultural Humor Detection Sub-Challenge (MuSe-Humor) dataset expands upon the Passau Spontaneous Football Coach Humor (Passau-SFCH) dataset, focusing on the detection of spontaneous humor in a cross-lingual and cross-cultural setting. The main objective of MuSe 2024 is to unite a broad audience from various research domains, including multimodal sentiment analysis, audio-visual affective computing, continuous signal processing, and natural language processing. By fostering collaboration and exchange among experts in these fields, the MuSe 2024 endeavors to advance the understanding and application of sentiment analysis and affective computing across multiple modalities. This baseline paper provides details on each sub-challenge and its corresponding dataset, extracted features from each data modality, and discusses challenge baselines. For our baseline system, we make use of a range of Transformers and expert-designed features and train Gated Recurrent Unit (GRU)-Recurrent Neural Network (RNN) models on them, resulting in a competitive baseline system. On the unseen test datasets of the respective sub-challenges, it achieves a mean Pearson's Correlation Coefficient ($\rho$) of 0.3573 for MuSe-Perception and an Area Under the Curve (AUC) value of 0.8682 for MuSe-Humor.
Title:
```
  Sense Less, Generate More: Pre-training LiDAR Perception with Masked Autoencoders for Ultra-Efficient 3D Sensing
```
Authors: Sina Tayebati, Theja Tulabandhula, Amit R. Trivedi
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract In this work, we propose a disruptively frugal LiDAR perception dataflow that generates rather than senses parts of the environment that are either predictable based on the extensive training of the environment or have limited consequence to the overall prediction accuracy. Therefore, the proposed methodology trades off sensing energy with training data for low-power robotics and autonomous navigation to operate frugally with sensors, extending their lifetime on a single battery charge. Our proposed generative pre-training strategy for this purpose, called as radially masked autoencoding (R-MAE), can also be readily implemented in a typical LiDAR system by selectively activating and controlling the laser power for randomly generated angular regions during on-field operations. Our extensive evaluations show that pre-training with R-MAE enables focusing on the radial segments of the data, thereby capturing spatial relationships and distances between objects more effectively than conventional procedures. Therefore, the proposed methodology not only reduces sensing energy but also improves prediction accuracy. For example, our extensive evaluations on Waymo, nuScenes, and KITTI datasets show that the approach achieves over a 5% average precision improvement in detection tasks across datasets and over a 4% accuracy improvement in transferring domains from Waymo and nuScenes to KITTI. In 3D object detection, it enhances small object detection by up to 4.37% in AP at moderate difficulty levels in the KITTI dataset. Even with 90% radial masking, it surpasses baseline models by up to 5.59% in mAP/mAPH across all object classes in the Waymo dataset. Additionally, our method achieves up to 3.17% and 2.31% improvements in mAP and NDS, respectively, on the nuScenes dataset, demonstrating its effectiveness with both single and fused LiDAR-camera modalities. this https URL.
Title:
```
  Labeling Comic Mischief Content in Online Videos with a Multimodal Hierarchical-Cross-Attention Model
```
Authors: Elaheh Baharlouei, Mahsa Shafaei, Yigeng Zhang, Hugo Jair Escalante, Thamar Solorio
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract We address the challenge of detecting questionable content in online media, specifically the subcategory of comic mischief. This type of content combines elements such as violence, adult content, or sarcasm with humor, making it difficult to detect. Employing a multimodal approach is vital to capture the subtle details inherent in comic mischief content. To tackle this problem, we propose a novel end-to-end multimodal system for the task of comic mischief detection. As part of this contribution, we release a novel dataset for the targeted task consisting of three modalities: video, text (video captions and subtitles), and audio. We also design a HIerarchical Cross-attention model with CAPtions (HICCAP) to capture the intricate relationships among these modalities. The results show that the proposed approach makes a significant improvement over robust baselines and state-of-the-art models for comic mischief detection and its type classification. This emphasizes the potential of our system to empower users, to make informed decisions about the online content they choose to see. In addition, we conduct experiments on the UCF101, HMDB51, and XD-Violence datasets, comparing our model against other state-of-the-art approaches showcasing the outstanding performance of our proposed model in various scenarios.
Title:
```
  Zero-Shot Fake Video Detection by Audio-Visual Consistency
```
Authors: Xiaolou Li, Zehua Liu, Chen Chen, Lantian Li, Li Guo, Dong Wang
Subjects: Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Recent studies have advocated the detection of fake videos as a one-class detection task, predicated on the hypothesis that the consistency between audio and visual modalities of genuine data is more significant than that of fake data. This methodology, which solely relies on genuine audio-visual data while negating the need for forged counterparts, is thus delineated as a `zero-shot' detection paradigm. This paper introduces a novel zero-shot detection approach anchored in content consistency across audio and video. By employing pre-trained ASR and VSR models, we recognize the audio and video content sequences, respectively. Then, the edit distance between the two sequences is computed to assess whether the claimed video is genuine. Experimental results indicate that, compared to two mainstream approaches based on semantic consistency and temporal consistency, our approach achieves superior generalizability across various deepfake techniques and demonstrates strong robustness against audio-visual perturbations. Finally, state-of-the-art performance gains can be achieved by simply integrating the decision scores of these three systems.
Title:
```
  A Comprehensive Survey on Machine Learning Driven Material Defect Detection: Challenges, Solutions, and Future Prospects
```
Authors: Jun Bai, Di Wu, Tristan Shelley, Peter Schubel, David Twine, John Russell, Xuesen Zeng, Ji Zhang
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Material defects (MD) represent a primary challenge affecting product performance and giving rise to safety issues in related products. The rapid and accurate identification and localization of MD constitute crucial research endeavours in addressing contemporary challenges associated with MD. Although conventional non-destructive testing methods such as ultrasonic and X-ray approaches have mitigated issues related to low efficiency in manual inspections, they struggle to meet the diverse requirements of high precision, real-time speed, automation, and intelligence. In recent years, propelled by the swift advancement of machine learning (ML) technologies, particularly exemplified by deep learning, ML has swiftly emerged as the core technology and a prominent research direction for material defect detection (MDD). Through a comprehensive review of the latest literature, we systematically survey the ML techniques applied in MDD into five categories: unsupervised learning, supervised learning, semi-supervised learning, reinforcement learning, and generative learning. We provide a detailed analysis of the main principles and techniques used, together with the advantages and potential challenges associated with these techniques. Furthermore, the survey focuses on the techniques for defect detection in composite materials, which are important types of materials enjoying increasingly wide application in various industries such as aerospace, automotive, construction, and renewable energy. Finally, the survey explores potential future directions in MDD utilizing ML technologies. This comprehensive survey not only consolidates existing literature on ML-based MDD technologies but also serves as a foundational reference for future researchers and industrial practitioners, providing valuable insights and guidance in developing advanced and efficient MDD systems.
Title:
```
  Label-aware Hard Negative Sampling Strategies with Momentum Contrastive Learning for Implicit Hate Speech Detection
```
Authors: Jaehoon Kim, Seungwan Jin, Sohyun Park, Someen Park, Kyungsik Han
Subjects: Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Detecting implicit hate speech that is not directly hateful remains a challenge. Recent research has attempted to detect implicit hate speech by applying contrastive learning to pre-trained language models such as BERT and RoBERTa, but the proposed models still do not have a significant advantage over cross-entropy loss-based learning. We found that contrastive learning based on randomly sampled batch data does not encourage the model to learn hard negative samples. In this work, we propose Label-aware Hard Negative sampling strategies (LAHN) that encourage the model to learn detailed features from hard negative samples, instead of naive negative samples in random batch, using momentum-integrated contrastive learning. LAHN outperforms the existing models for implicit hate speech detection both in- and cross-datasets. The code is available at this https URL
Title:
```
  Defining and Detecting Vulnerability in Human Evaluation Guidelines: A Preliminary Study Towards Reliable NLG Evaluation
```
Authors: Jie Ruan, Wenqing Wang, Xiaojun Wan
Subjects: Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Human evaluation serves as the gold standard for assessing the quality of Natural Language Generation (NLG) systems. Nevertheless, the evaluation guideline, as a pivotal element ensuring reliable and reproducible human assessment, has received limited attention.Our investigation revealed that only 29.84% of recent papers involving human evaluation at top conferences release their evaluation guidelines, with vulnerabilities identified in 77.09% of these guidelines. Unreliable evaluation guidelines can yield inaccurate assessment outcomes, potentially impeding the advancement of NLG in the right direction. To address these challenges, we take an initial step towards reliable evaluation guidelines and propose the first human evaluation guideline dataset by collecting annotations of guidelines extracted from existing papers as well as generated via Large Language Models (LLMs). We then introduce a taxonomy of eight vulnerabilities and formulate a principle for composing evaluation guidelines. Furthermore, a method for detecting guideline vulnerabilities has been explored using LLMs, and we offer a set of recommendations to enhance reliability in human evaluation. The annotated human evaluation guideline dataset and code for the vulnerability detection method are publicly available online.
Title:
```
  DLLens: Testing Deep Learning Libraries via LLM-aided Synthesis
```
Authors: Meiziniu Li, Dongze Li, Jianmeng Liu, Jialun Cao, Yongqiang Tian, Shing-Chi Cheung
Subjects: Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Testing is a major approach to ensuring the quality of deep learning (DL) libraries. Existing testing techniques commonly adopt differential testing to relieve the need for test oracle construction. However, these techniques are limited in finding implementations that offer the same functionality and generating diverse test inputs for differential testing. This paper introduces DLLens, a novel differential testing technique for DL library testing. Our insight is that APIs in different DL libraries are commonly designed to accomplish various computations for the same set of published DL algorithms. Although the mapping of these APIs is not often one-to-one, we observe that their computations can be mutually simulated after proper composition and adaptation. The use of these simulation counterparts facilitates differential testing for the detection of functional DL library bugs. Leveraging the insight, we propose DLLens as a novel mechanism that utilizes a large language model (LLM) to synthesize valid counterparts of DL library APIs. To generate diverse test inputs, DLLens incorporates a static analysis method aided by LLM to extract path constraints from all execution paths in each API and its counterpart's implementations. These path constraints are then used to guide the generation of diverse test inputs. We evaluate DLLens on two popular DL libraries, TensorFlow and PyTorch. Our evaluation shows that DLLens can synthesize counterparts for more than twice as many APIs found by state-of-the-art techniques on these libraries. Moreover, DLLens can extract 26.7% more constraints and detect 2.5 times as many bugs as state-of-the-art techniques. DLLens has successfully found 56 bugs in recent TensorFlow and PyTorch libraries. Among them, 41 are previously unknown, 39 of which have been confirmed by developers after reporting, and 19 of those confirmed bugs have been fixed by developers.
Title:
```
  Political Leaning Inference through Plurinational Scenarios
```
Authors: Joseba Fernandez de Landa, Rodrigo Agerri
Subjects: Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Social media users express their political preferences via interaction with other users, by spontaneous declarations or by participation in communities within the network. This makes a social network such as Twitter a valuable data source to study computational science approaches to political learning inference. In this work we focus on three diverse regions in Spain (Basque Country, Catalonia and Galicia) to explore various methods for multi-party categorization, required to analyze evolving and complex political landscapes, and compare it with binary left-right approaches. We use a two-step method involving unsupervised user representations obtained from the retweets and their subsequent use for political leaning detection. Comprehensive experimentation on a newly collected and curated dataset comprising labeled users and their interactions demonstrate the effectiveness of using Relational Embeddings as representation method for political ideology detection in both binary and multi-party frameworks, even with limited training data. Finally, data visualization illustrates the ability of the Relational Embeddings to capture intricate intra-group and inter-group political affinities.
Title:
```
  Multivariate Log-based Anomaly Detection for Distributed Database
```
Authors: Lingzhe Zhang, Tong Jia, Mengxi Jia, Ying Li, Yong Yang, Zhonghai Wu
Subjects: Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Distributed databases are fundamental infrastructures of today's large-scale software systems such as cloud systems. Detecting anomalies in distributed databases is essential for maintaining software availability. Existing approaches, predominantly developed using Loghub-a comprehensive collection of log datasets from various systems-lack datasets specifically tailored to distributed databases, which exhibit unique anomalies. Additionally, there's a notable absence of datasets encompassing multi-anomaly, multi-node logs. Consequently, models built upon these datasets, primarily designed for standalone systems, are inadequate for distributed databases, and the prevalent method of deeming an entire cluster anomalous based on irregularities in a single node leads to a high false-positive rate. This paper addresses the unique anomalies and multivariate nature of logs in distributed databases. We expose the first open-sourced, comprehensive dataset with multivariate logs from distributed databases. Utilizing this dataset, we conduct an extensive study to identify multiple database anomalies and to assess the effectiveness of state-of-the-art anomaly detection using multivariate log data. Our findings reveal that relying solely on logs from a single node is insufficient for accurate anomaly detection on distributed database. Leveraging these insights, we propose MultiLog, an innovative multivariate log-based anomaly detection approach tailored for distributed databases. Our experiments, based on this novel dataset, demonstrate MultiLog's superiority, outperforming existing state-of-the-art methods by approximately 12%.
Title:
```
  Deep Learning for Slum Mapping in Remote Sensing Images: A Meta-analysis and Review
```
Authors: Anjali Raj, Adway Mitra, Manjira Sinha
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract The major Sustainable Development Goals (SDG) 2030, set by the United Nations Development Program (UNDP), include sustainable cities and communities, no poverty, and reduced inequalities. However, millions of people live in slums or informal settlements with poor living conditions in many major cities around the world, especially in less developed countries. To emancipate these settlements and their inhabitants through government intervention, accurate data about slum location and extent is required. While ground survey data is the most reliable, such surveys are costly and time-consuming. An alternative is remotely sensed data obtained from very high-resolution (VHR) imagery. With the advancement of new technology, remote sensing based mapping of slums has emerged as a prominent research area. The parallel rise of Artificial Intelligence, especially Deep Learning has added a new dimension to this field as it allows automated analysis of satellite imagery to identify complex spatial patterns associated with slums. This article offers a detailed review and meta-analysis of research on slum mapping using remote sensing imagery from 2014 to 2024, with a special focus on deep learning approaches. Our analysis reveals a trend towards increasingly complex neural network architectures, with advancements in data preprocessing and model training techniques significantly enhancing slum identification accuracy. We have attempted to identify key methodologies that are effective across diverse geographic contexts. While acknowledging the transformative impact Convolutional Neural Networks (CNNs) in slum detection, our review underscores the absence of a universally optimal model, suggesting the need for context-specific adaptations. We also identify prevailing challenges in this field, such as data limitations and a lack of model explainability and suggest potential strategies for overcoming these.
Title:
```
  Efficient Network Traffic Feature Sets for IoT Intrusion Detection
```
Authors: Miguel Silva, João Vitorino, Eva Maia, Isabel Praça
Subjects: Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract The use of Machine Learning (ML) models in cybersecurity solutions requires high-quality data that is stripped of redundant, missing, and noisy information. By selecting the most relevant features, data integrity and model efficiency can be significantly improved. This work evaluates the feature sets provided by a combination of different feature selection methods, namely Information Gain, Chi-Squared Test, Recursive Feature Elimination, Mean Absolute Deviation, and Dispersion Ratio, in multiple IoT network datasets. The influence of the smaller feature sets on both the classification performance and the training time of ML models is compared, with the aim of increasing the computational efficiency of IoT intrusion detection. Overall, the most impactful features of each dataset were identified, and the ML models obtained higher computational efficiency while preserving a good generalization, showing little to no difference between the sets.
Title:
```
  A Robust Pipeline for Classification and Detection of Bleeding Frames in Wireless Capsule Endoscopy using Swin Transformer and RT-DETR
```
Authors: Sasidhar Alavala, Anil Kumar Vadde, Aparnamala Kancheti, Subrahmanyam Gorthi
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract In this paper, we present our approach to the Auto WCEBleedGen Challenge V2 2024. Our solution combines the Swin Transformer for the initial classification of bleeding frames and RT-DETR for further detection of bleeding in Wireless Capsule Endoscopy (WCE), enhanced by a series of image preprocessing steps. These steps include converting images to Lab colour space, applying Contrast Limited Adaptive Histogram Equalization (CLAHE) for better contrast, and using Gaussian blur to suppress artefacts. The Swin Transformer utilizes a tiered architecture with shifted windows to efficiently manage self-attention calculations, focusing on local windows while enabling cross-window interactions. RT-DETR features an efficient hybrid encoder for fast processing of multi-scale features and an uncertainty-minimal query selection for enhanced accuracy. The class activation maps by Ablation-CAM are plausible to the model's decisions. On the validation set, this approach achieves a classification accuracy of 98.5% (best among the other state-of-the-art models) compared to 91.7% without any pre-processing and an $\text{AP}_{50}$ of 66.7% compared to 65.0% with state-of-the-art YOLOv8. On the test set, this approach achieves a classification accuracy and F1 score of 87.0% and 89.0% respectively.
Title:
```
  FakeSound: Deepfake General Audio Detection
```
Authors: Zeyu Xie, Baihan Li, Xuenan Xu, Zheng Liang, Kai Yu, Mengyue Wu
Subjects: Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract With the advancement of audio generation, generative models can produce highly realistic audios. However, the proliferation of deepfake general audio can pose negative consequences. Therefore, we propose a new task, deepfake general audio detection, which aims to identify whether audio content is manipulated and to locate deepfake regions. Leveraging an automated manipulation pipeline, a dataset named FakeSound for deepfake general audio detection is proposed, and samples can be viewed on website this https URL. The average binary accuracy of humans on all test sets is consistently below 0.6, which indicates the difficulty humans face in discerning deepfake audio and affirms the efficacy of the FakeSound dataset. A deepfake detection model utilizing a general audio pre-trained model is proposed as a benchmark system. Experimental results demonstrate that the performance of the proposed model surpasses the state-of-the-art in deepfake speech detection and human testers.
Title:
```
  MWIRSTD: A MWIR Small Target Detection Dataset
```
Authors: Nikhil Kumar, Avinash Upadhyay, Shreya Sharma, Manoj Sharma, Pravendra Singh
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract This paper presents a novel mid-wave infrared (MWIR) small target detection dataset (MWIRSTD) comprising 14 video sequences containing approximately 1053 images with annotated targets of three distinct classes of small objects. Captured using cooled MWIR imagers, the dataset offers a unique opportunity for researchers to develop and evaluate state-of-the-art methods for small object detection in realistic MWIR scenes. Unlike existing datasets, which primarily consist of uncooled thermal images or synthetic data with targets superimposed onto the background or vice versa, MWIRSTD provides authentic MWIR data with diverse targets and environments. Extensive experiments on various traditional methods and deep learning-based techniques for small target detection are performed on the proposed dataset, providing valuable insights into their efficacy. The dataset and code are available at this https URL.
Title:
```
  A$^{2}$-MAE: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder
```
Authors: Lixian Zhang, Yi Zhao, Runmin Dong, Jinxiao Zhang, Shuai Yuan, Shilei Cao, Mengxuan Chen, Juepeng Zheng, Weijia Li, Wei Liu, Litong Feng, Haohuan Fu
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Vast amounts of remote sensing (RS) data provide Earth observations across multiple dimensions, encompassing critical spatial, temporal, and spectral information which is essential for addressing global-scale challenges such as land use monitoring, disaster prevention, and environmental change mitigation. Despite various pre-training methods tailored to the characteristics of RS data, a key limitation persists: the inability to effectively integrate spatial, temporal, and spectral information within a single unified model. To unlock the potential of RS data, we construct a Spatial-Temporal-Spectral Structured Dataset (STSSD) characterized by the incorporation of multiple RS sources, diverse coverage, unified locations within image sets, and heterogeneity within images. Building upon this structured dataset, we propose an Anchor-Aware Masked AutoEncoder method (A$^{2}$-MAE), leveraging intrinsic complementary information from the different kinds of images and geo-information to reconstruct the masked patches during the pre-training phase. A$^{2}$-MAE integrates an anchor-aware masking strategy and a geographic encoding module to comprehensively exploit the properties of RS images. Specifically, the proposed anchor-aware masking strategy dynamically adapts the masking process based on the meta-information of a pre-selected anchor image, thereby facilitating the training on images captured by diverse types of RS sources within one model. Furthermore, we propose a geographic encoding method to leverage accurate spatial patterns, enhancing the model generalization capabilities for downstream applications that are generally location-related. Extensive experiments demonstrate our method achieves comprehensive improvements across various downstream tasks compared with existing RS pre-training methods, including image classification, semantic segmentation, and change detection tasks.
Title:
```
  AustroTox: A Dataset for Target-Based Austrian German Offensive Language Detection
```
Authors: Pia Pachinger, Janis Goldzycher, Anna Maria Planitzer, Wojciech Kusa, Allan Hanbury, Julia Neidhardt
Subjects: Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Model interpretability in toxicity detection greatly profits from token-level annotations. However, currently such annotations are only available in English. We introduce a dataset annotated for offensive language detection sourced from a news forum, notable for its incorporation of the Austrian German dialect, comprising 4,562 user comments. In addition to binary offensiveness classification, we identify spans within each comment constituting vulgar language or representing targets of offensive statements. We evaluate fine-tuned language models as well as large language models in a zero- and few-shot fashion. The results indicate that while fine-tuned models excel in detecting linguistic peculiarities such as vulgar dialect, large language models demonstrate superior performance in detecting offensiveness in AustroTox. We publish the data and code.
Title:
```
  Characterizing and Detecting Propaganda-Spreading Accounts on Telegram
```
Authors: Klim Kireev, Yevhen Mykhno, Carmela Troncoso, Rebekah Overdorf
Subjects: Subjects: Social and Information Networks (cs.SI); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Information-based attacks on social media, such as disinformation campaigns and propaganda, are emerging cybersecurity threats. The security community has focused on countering these threats on social media platforms like X and Reddit. However, they also appear in instant-messaging social media platforms such as WhatsApp, Telegram, and Signal. In these platforms information-based attacks primarily happen in groups and channels, requiring manual moderation efforts by channel administrators. We collect, label, and analyze a large dataset of more than 17 million Telegram comments and messages. Our analysis uncovers two independent, coordinated networks that spread pro-Russian and pro-Ukrainian propaganda, garnering replies from real users. We propose a novel mechanism for detecting propaganda that capitalizes on the relationship between legitimate user messages and propaganda replies and is tailored to the information that Telegram makes available to moderators. Our method is faster, cheaper, and has a detection rate (97.6%) 11.6 percentage points higher than human moderators after seeing only one message from an account. It remains effective despite evolving propaganda.
Title:
```
  Scalable Defect Detection via Traversal on Code Graph
```
Authors: Zhengyao Liu, Xitong Zhong, Xingjing Deng, Shuo Hong, Xiang Gao, Hailong Sun
Subjects: Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Detecting defects and vulnerabilities in the early stage has long been a challenge in software engineering. Static analysis, a technique that inspects code without execution, has emerged as a key strategy to address this challenge. Among recent advancements, the use of graph-based representations, particularly Code Property Graph (CPG), has gained traction due to its comprehensive depiction of code structure and semantics. Despite the progress, existing graph-based analysis tools still face performance and scalability issues. The main bottleneck lies in the size and complexity of CPG, which makes analyzing large codebases inefficient and memory-consuming. Also, query rules used by the current tools can be over-specific. Hence, we introduce QVoG, a graph-based static analysis platform for detecting defects and vulnerabilities. It employs a compressed CPG representation to maintain a reasonable graph size, thereby enhancing the overall query efficiency. Based on the CPG, it also offers a declarative query language to simplify the queries. Furthermore, it takes a step forward to integrate machine learning to enhance the generality of vulnerability detection. For projects consisting of 1,000,000+ lines of code, QVoG can complete analysis in approximately 15 minutes, as opposed to 19 minutes with CodeQL.
Title:
```
  Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio
```
Authors: Yi Lu, Yuankun Xie, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Zhiyong Wang, Xin Qi, Xuefei Liu, Yongwei Li, Yukun Liu, Xiaopeng Wang, Shuchen Shi
Subjects: Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract With the proliferation of Large Language Model (LLM) based deepfake audio, there is an urgent need for effective detection methods. Previous deepfake audio generation methods typically involve a multi-step generation process, with the final step using a vocoder to predict the waveform from handcrafted features. However, LLM-based audio is directly generated from discrete neural codecs in an end-to-end generation process, skipping the final step of vocoder processing. This poses a significant challenge for current audio deepfake detection (ADD) models based on vocoder artifacts. To effectively detect LLM-based deepfake audio, we focus on the core of the generation process, the conversion from neural codec to waveform. We propose Codecfake dataset, which is generated by seven representative neural codec methods. Experiment results show that codec-trained ADD models exhibit a 41.406% reduction in average equal error rate compared to vocoder-trained ADD models on the Codecfake test set.
Title:
```
  Valeo4Cast: A Modular Approach to End-to-End Forecasting
```
Authors: Yihong Xu, Éloi Zablocki, Alexandre Boulch, Gilles Puy, Mickael Chen, Florent Bartoccioni, Nermin Samet, Oriane Siméoni, Spyros Gidaris, Tuan-Hung Vu, Andrei Bursuc, Eduardo Valle, Renaud Marlet, Matthieu Cord
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Motion forecasting is crucial in autonomous driving systems to anticipate the future trajectories of surrounding agents such as pedestrians, vehicles, and traffic signals. In end-to-end forecasting, the model must jointly detect from sensor data (cameras or LiDARs) the position and past trajectories of the different elements of the scene and predict their future location. We depart from the current trend of tackling this task via end-to-end training from perception to forecasting and we use a modular approach instead. Following a recent study, we individually build and train detection, tracking, and forecasting modules. We then only use consecutive finetuning steps to integrate the modules better and alleviate compounding errors. Our study reveals that this simple yet effective approach significantly improves performance on the end-to-end forecasting benchmark. Consequently, our solution ranks first in the Argoverse 2 end-to-end Forecasting Challenge held at CVPR 2024 Workshop on Autonomous Driving (WAD), with 63.82 mAPf. We surpass forecasting results by +17.1 points over last year's winner and by +13.3 points over this year's runner-up. This remarkable performance in forecasting can be explained by our modular paradigm, which integrates finetuning strategies and significantly outperforms the end-to-end-trained counterparts.
Title:
```
  CT3D++: Improving 3D Object Detection with Keypoint-induced Channel-wise Transformer
```
Authors: Hualian Sheng, Sijia Cai, Na Zhao, Bing Deng, Qiao Liang, Min-Jian Zhao, Jieping Ye
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract The field of 3D object detection from point clouds is rapidly advancing in computer vision, aiming to accurately and efficiently detect and localize objects in three-dimensional space. Current 3D detectors commonly fall short in terms of flexibility and scalability, with ample room for advancements in performance. In this paper, our objective is to address these limitations by introducing two frameworks for 3D object detection with minimal hand-crafted design. Firstly, we propose CT3D, which sequentially performs raw-point-based embedding, a standard Transformer encoder, and a channel-wise decoder for point features within each proposal. Secondly, we present an enhanced network called CT3D++, which incorporates geometric and semantic fusion-based embedding to extract more valuable and comprehensive proposal-aware information. Additionally, CT3D ++ utilizes a point-to-key bidirectional encoder for more efficient feature encoding with reduced computational cost. By replacing the corresponding components of CT3D with these novel modules, CT3D++ achieves state-of-the-art performance on both the KITTI dataset and the large-scale Way-mo Open Dataset. The source code for our frameworks will be made accessible at this https URL.
Title:
```
  Chemistry3D: Robotic Interaction Benchmark for Chemistry Experiments
```
Authors: Shoujie Li, Yan Huang, Changqing Guo, Tong Wu, Jiawei Zhang, Linrui Zhang, Wenbo Ding
Subjects: Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract The advent of simulation engines has revolutionized learning and operational efficiency for robots, offering cost-effective and swift pipelines. However, the lack of a universal simulation platform tailored for chemical scenarios impedes progress in robotic manipulation and visualization of reaction processes. Addressing this void, we present Chemistry3D, an innovative toolkit that integrates extensive chemical and robotic knowledge. Chemistry3D not only enables robots to perform chemical experiments but also provides real-time visualization of temperature, color, and pH changes during reactions. Built on the NVIDIA Omniverse platform, Chemistry3D offers interfaces for robot operation, visual inspection, and liquid flow control, facilitating the simulation of special objects such as liquids and transparent entities. Leveraging this toolkit, we have devised RL tasks, object detection, and robot operation scenarios. Additionally, to discern disparities between the rendering engine and the real world, we conducted transparent object detection experiments using Sim2Real, validating the toolkit's exceptional simulation performance. The source code is available at this https URL, and a related tutorial can be found at this https URL.
Title:
```
  Continuous fake media detection: adapting deepfake detectors to new generative techniques
```
Authors: Francesco Tassone, Luca Maiano, Irene Amerini
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Generative techniques continue to evolve at an impressively high rate, driven by the hype about these technologies. This rapid advancement severely limits the application of deepfake detectors, which, despite numerous efforts by the scientific community, struggle to achieve sufficiently robust performance against the ever-changing content. To address these limitations, in this paper, we propose an analysis of two continuous learning techniques on a Short and a Long sequence of fake media. Both sequences include a complex and heterogeneous range of deepfakes generated from GANs, computer graphics techniques, and unknown sources. Our study shows that continual learning could be important in mitigating the need for generalizability. In fact, we show that, although with some limitations, continual learning methods help to maintain good performance across the entire training sequence. For these techniques to work in a sufficiently robust way, however, it is necessary that the tasks in the sequence share similarities. In fact, according to our experiments, the order and similarity of the tasks can affect the performance of the models over time. To address this problem, we show that it is possible to group tasks based on their similarity. This small measure allows for a significant improvement even in longer sequences. This result suggests that continual techniques can be combined with the most promising detection methods, allowing them to catch up with the latest generative techniques. In addition to this, we propose an overview of how this learning approach can be integrated into a deepfake detection pipeline for continuous integration and continuous deployment (CI/CD). This allows you to keep track of different funds, such as social networks, new generative tools, or third-party datasets, and through the integration of continuous learning, allows constant maintenance of the detectors.
Title:
```
  Underneath the Numbers: Quantitative and Qualitative Gender Fairness in LLMs for Depression Prediction
```
Authors: Micol Spitale, Jiaee Cheong, Hatice Gunes
Subjects: Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Recent studies show bias in many machine learning models for depression detection, but bias in LLMs for this task remains unexplored. This work presents the first attempt to investigate the degree of gender bias present in existing LLMs (ChatGPT, LLaMA 2, and Bard) using both quantitative and qualitative approaches. From our quantitative evaluation, we found that ChatGPT performs the best across various performance metrics and LLaMA 2 outperforms other LLMs in terms of group fairness metrics. As qualitative fairness evaluation remains an open research question we propose several strategies (e.g., word count, thematic analysis) to investigate whether and how a qualitative evaluation can provide valuable insights for bias analysis beyond what is possible with quantitative evaluation. We found that ChatGPT consistently provides a more comprehensive, well-reasoned explanation for its prediction compared to LLaMA 2. We have also identified several themes adopted by LLMs to qualitatively evaluate gender fairness. We hope our results can be used as a stepping stone towards future attempts at improving qualitative evaluation of fairness for LLMs especially for high-stakes tasks such as depression detection.
Title:
```
  Figuratively Speaking: Authorship Attribution via Multi-Task Figurative Language Modeling
```
Authors: Gregorios A Katsios, Ning Sa, Tomek Strzalkowski
Subjects: Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract The identification of Figurative Language (FL) features in text is crucial for various Natural Language Processing (NLP) tasks, where understanding of the author's intended meaning and its nuances is key for successful communication. At the same time, the use of a specific blend of various FL forms most accurately reflects a writer's style, rather than the use of any single construct, such as just metaphors or irony. Thus, we postulate that FL features could play an important role in Authorship Attribution (AA) tasks. We believe that our is the first computational study of AA based on FL use. Accordingly, we propose a Multi-task Figurative Language Model (MFLM) that learns to detect multiple FL features in text at once. We demonstrate, through detailed evaluation across multiple test sets, that the our model tends to perform equally or outperform specialized binary models in FL detection. Subsequently, we evaluate the predictive capability of joint FL features towards the AA task on three datasets, observing improved AA performance through the integration of MFLM embeddings.
Title:
```
  A Sociotechnical Lens for Evaluating Computer Vision Models: A Case Study on Detecting and Reasoning about Gender and Emotion
```
Authors: Sha Luo, Sang Jung Kim, Zening Duan, Kaiping Chen
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract In the evolving landscape of computer vision (CV) technologies, the automatic detection and interpretation of gender and emotion in images is a critical area of study. This paper investigates social biases in CV models, emphasizing the limitations of traditional evaluation metrics such as precision, recall, and accuracy. These metrics often fall short in capturing the complexities of gender and emotion, which are fluid and culturally nuanced constructs. Our study proposes a sociotechnical framework for evaluating CV models, incorporating both technical performance measures and considerations of social fairness. Using a dataset of 5,570 images related to vaccination and climate change, we empirically compared the performance of various CV models, including traditional models like DeepFace and FER, and generative models like GPT-4 Vision. Our analysis involved manually validating the gender and emotional expressions in a subset of images to serve as benchmarks. Our findings reveal that while GPT-4 Vision outperforms other models in technical accuracy for gender classification, it exhibits discriminatory biases, particularly in response to transgender and non-binary personas. Furthermore, the model's emotion detection skew heavily towards positive emotions, with a notable bias towards associating female images with happiness, especially when prompted by male personas. These findings underscore the necessity of developing more comprehensive evaluation criteria that address both validity and discriminatory biases in CV models. Our proposed framework provides guidelines for researchers to critically assess CV tools, ensuring their application in communication research is both ethical and effective. The significant contribution of this study lies in its emphasis on a sociotechnical approach, advocating for CV technologies that support social good and mitigate biases rather than perpetuate them.
Title:
```
  Using Deep Convolutional Neural Networks to Detect Rendered Glitches in Video Games
```
Authors: Carlos Garcia Ling, Konrad Tollmar, Linus Gisslen
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract In this paper, we present a method using Deep Convolutional Neural Networks (DCNNs) to detect common glitches in video games. The problem setting consists of an image (800x800 RGB) as input to be classified into one of five defined classes, normal image, or one of four different kinds of glitches (stretched, low resolution, missing and placeholder textures). Using a supervised approach, we train a ShuffleNetV2 using generated data. This work focuses on detecting texture graphical anomalies achieving arguably good performance with an accuracy of 86.8\%, detecting 88\% of the glitches with a false positive rate of 8.7\%, and with the models being able to generalize and detect glitches even in unseen objects. We apply a confidence measure as well to tackle the issue with false positives as well as an effective way of aggregating images to achieve better detection in production. The main use of this work is the partial automatization of graphical testing in the final stages of video game development.
Title:
```
  Dataset Enhancement with Instance-Level Augmentations
```
Authors: Orest Kupyn, Christian Rupprecht
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract We present a method for expanding a dataset by incorporating knowledge from the wide distribution of pre-trained latent diffusion models. Data augmentations typically incorporate inductive biases about the image formation process into the training (e.g. translation, scaling, colour changes, etc.). Here, we go beyond simple pixel transformations and introduce the concept of instance-level data augmentation by repainting parts of the image at the level of object instances. The method combines a conditional diffusion model with depth and edge maps control conditioning to seamlessly repaint individual objects inside the scene, being applicable to any segmentation or detection dataset. Used as a data augmentation method, it improves the performance and generalization of the state-of-the-art salient object detection, semantic segmentation and object detection models. By redrawing all privacy-sensitive instances (people, license plates, etc.), the method is also applicable for data anonymization. We also release fully synthetic and anonymized expansions for popular datasets: COCO, Pascal VOC and DUTS.
Title:
```
  A New Class Biorthogonal Spline Wavelet for Image Edge Detection
```
Authors: Dujuan Zhou, Zizhao Yuan
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Spline wavelets have shown favorable characteristics for localizing in both time and frequency. In this paper, we propose a new biorthogonal cubic special spline wavelet (BCSSW), based on the Cohen-Daubechies-Feauveau wavelet construction method and the cubic special spline algorithm. BCSSW has better properties in compact support, symmetry, and frequency domain characteristics. However, current mainstream detection operators usually ignore the uncertain representation of regional pixels and global structures. To solve these problems, we propose a structural uncertainty-aware and multi-structure operator fusion detection algorithm (EDBSW) based on a new BCSSW spline wavelet. By constructing a spline wavelet that efficiently handles edge effects, we utilize structural uncertainty-aware modulus maxima to detect highly uncertain edge samples. The proposed wavelet detection operator utilizes the multi-structure morphological operator and fusion reconstruction strategy to effectively address anti-noise processing and edge information of different frequencies. Numerous experiments have demonstrated its excellent performance in reducing noise and capturing edge structure details.
Title:
```
  Vessel Re-identification and Activity Detection in Thermal Domain for Maritime Surveillance
```
Authors: Yasod Ginige, Ransika Gunasekara, Darsha Hewavitharana, Manjula Ariyarathne, Ranga Rodrigo, Peshala Jayasekara
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Maritime surveillance is vital to mitigate illegal activities such as drug smuggling, illegal fishing, and human trafficking. Vision-based maritime surveillance is challenging mainly due to visibility issues at night, which results in failures in re-identifying vessels and detecting suspicious activities. In this paper, we introduce a thermal, vision-based approach for maritime surveillance with object tracking, vessel re-identification, and suspicious activity detection capabilities. For vessel re-identification, we propose a novel viewpoint-independent algorithm which compares features of the sides of the vessel separately (separate side-spaces) leveraging shape information in the absence of color features. We propose techniques to adapt tracking and activity detection algorithms for the thermal domain and train them using a thermal dataset we created. This dataset will be the first publicly available benchmark dataset for thermal maritime surveillance. Our system is capable of re-identifying vessels with an 81.8% Top1 score and identifying suspicious activities with a 72.4\% frame mAP score; a new benchmark for each task in the thermal domain.
Title:
```
  Large Language Model(LLM) assisted End-to-End Network Health Management based on Multi-Scale Semanticization
```
Authors: Fengxiao Tang, Xiaonan Wang, Xun Yuan, Linfeng Luo, Ming Zhao, Nei Kato
Subjects: Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Network device and system health management is the foundation of modern network operations and maintenance. Traditional health management methods, relying on expert identification or simple rule-based algorithms, struggle to cope with the dynamic heterogeneous networks (DHNs) environment. Moreover, current state-of-the-art distributed anomaly detection methods, which utilize specific machine learning techniques, lack multi-scale adaptivity for heterogeneous device information, resulting in unsatisfactory diagnostic accuracy for DHNs. In this paper, we develop an LLM-assisted end-to-end intelligent network health management framework. The framework first proposes a Multi-Scale Semanticized Anomaly Detection Model (MSADM), incorporating semantic rule trees with an attention mechanism to address the multi-scale anomaly detection problem in DHNs. Secondly, a chain-of-thought-based large language model is embedded in downstream to adaptively analyze the fault detection results and produce an analysis report with detailed fault information and optimization strategies. Experimental results show that the accuracy of our proposed MSADM for heterogeneous network entity anomaly detection is as high as 91.31\%.
Title:
```
  Highly Connected Graph Partitioning: Exact Formulation and Solution Methods
```
Authors: Rahul Swamy, Douglas M. King, Sheldon H. Jacobson
Subjects: Subjects: Discrete Mathematics (cs.DM); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Graph partitioning (GP) and vertex connectivity have traditionally been two distinct fields of study. This paper introduces the highly connected graph partitioning (HCGP) problem, which partitions a graph into compact, size balanced, and $Q$-(vertex) connected parts for any $Q\geq 1$. This problem is valuable in applications that seek cohesion and fault-tolerance within their parts, such as community detection in social networks and resiliency-focused partitioning of power networks. Existing research in this fundamental interconnection primarily focuses on providing theoretical existence guarantees of highly connected partitions for a limited set of dense graphs, and do not include canonical GP considerations such as size balance and compactness. This paper's key contribution is providing a general modeling and algorithmic approach for HCGP, inspired by recent work in the political districting problem, a special case of HCGP with $Q=1$. This approach models $Q$-connectivity constraints as mixed integer programs for any $Q\geq 1$ and provides an efficient branch-and-cut method to solve HCGP. When solution time is a priority over optimality, this paper provides a heuristic method specifically designed for HCGP with $Q=2$. A computational analysis evaluates these methods using a test bed of instances from various real-world graphs. In this analysis, the branch-and-cut method finds an optimal solution within one hour in $82.8\%$ of the instances solved. For $Q=2$, small and sparse instances are challenging for the heuristic, whereas large and sparse instances are challenging for the exact method. Furthermore, this study quantifies the computational cost of ensuring higher connectivity using the branch-and-cut approach, compared to a baseline of ensuring $1$-connectivity. Overall, this work serves as an effective tool to partition a graph into resilient and cohesive parts.
Title:
```
  Eyes Wide Unshut: Unsupervised Mistake Detection in Egocentric Video by Detecting Unpredictable Gaze
```
Authors: Michele Mazzamuto, Antonino Furnari, Giovanni Maria Farinella
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract In this paper, we address the challenge of unsupervised mistake detection in egocentric video through the analysis of gaze signals, a critical component for advancing user assistance in smart glasses. Traditional supervised methods, reliant on manually labeled mistakes, suffer from domain-dependence and scalability issues. This research introduces an unsupervised method for detecting mistakes in videos of human activities, overcoming the challenges of domain-specific requirements and the necessity for annotated data. By analyzing unusual gaze patterns that signal user disorientation during tasks, we propose a gaze completion model that forecasts eye gaze trajectories from incomplete inputs. The difference between the anticipated and observed gaze paths acts as an indicator for identifying errors. Our method is validated on the EPIC-Tent dataset, showing its superiority compared to current one-class supervised and unsupervised techniques.
Title:
```
  LaneCPP: Continuous 3D Lane Detection using Physical Priors
```
Authors: Maximilian Pittner, Joel Janai, Alexandru P. Condurache
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Monocular 3D lane detection has become a fundamental problem in the context of autonomous driving, which comprises the tasks of finding the road surface and locating lane markings. One major challenge lies in a flexible but robust line representation capable of modeling complex lane structures, while still avoiding unpredictable behavior. While previous methods rely on fully data-driven approaches, we instead introduce a novel approach LaneCPP that uses a continuous 3D lane detection model leveraging physical prior knowledge about the lane structure and road geometry. While our sophisticated lane model is capable of modeling complex road structures, it also shows robust behavior since physical constraints are incorporated by means of a regularization scheme that can be analytically applied to our parametric representation. Moreover, we incorporate prior knowledge about the road geometry into the 3D feature space by modeling geometry-aware spatial features, guiding the network to learn an internal road surface representation. In our experiments, we show the benefits of our contributions and prove the meaningfulness of using priors to make 3D lane detection more robust. The results show that LaneCPP achieves state-of-the-art performance in terms of F-Score and geometric errors.
Title:
```
  Transformation-Dependent Adversarial Attacks
```
Authors: Yaoteng Tan, Zikui Cai, M. Salman Asif
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract We introduce transformation-dependent adversarial attacks, a new class of threats where a single additive perturbation can trigger diverse, controllable mis-predictions by systematically transforming the input (e.g., scaling, blurring, compression). Unlike traditional attacks with static effects, our perturbations embed metamorphic properties to enable different adversarial attacks as a function of the transformation parameters. We demonstrate the transformation-dependent vulnerability across models (e.g., convolutional networks and vision transformers) and vision tasks (e.g., image classification and object detection). Our proposed geometric and photometric transformations enable a range of targeted errors from one crafted input (e.g., higher than 90% attack success rate for classifiers). We analyze effects of model architecture and type/variety of transformations on attack effectiveness. This work forces a paradigm shift by redefining adversarial inputs as dynamic, controllable threats. We highlight the need for robust defenses against such multifaceted, chameleon-like perturbations that current techniques are ill-prepared for.
Title:
```
  ORES-Inspect: A technology probe for machine learning audits on enwiki
```
Authors: Zachary Levonian, Lauren Hagen, Lu Li, Jada Lilleboe, Solvejg Wastvedt, Aaron Halfaker, Loren Terveen
Subjects: Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Auditing the machine learning (ML) models used on Wikipedia is important for ensuring that vandalism-detection processes remain fair and effective. However, conducting audits is challenging because stakeholders have diverse priorities and assembling evidence for a model's [in]efficacy is technically complex. We designed an interface to enable editors to learn about and audit the performance of the ORES edit quality model. ORES-Inspect is an open-source web tool and a provocative technology probe for researching how editors think about auditing the many ML models used on Wikipedia. We describe the design of ORES-Inspect and our plans for further research with this system.
Title:
```
  AToM-Bot: Embodied Fulfillment of Unspoken Human Needs with Affective Theory of Mind
```
Authors: Wei Ding, Fanhong Li, Ziteng Ji, Zhengrong Xue, Jia Liu
Subjects: Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract We propose AToM-Bot, a novel task generation and execution framework for proactive robot-human interaction, which leverages the human mental and physical state inference capabilities of the Vision Language Model (VLM) prompted by the Affective Theory of Mind (AToM). Without requiring explicit commands by humans, AToM-Bot proactively generates and follows feasible tasks to improve general human well-being. When around humans, AToM-Bot first detects current human needs based on inferred human states and observations of the surrounding environment. It then generates tasks to fulfill these needs, taking into account its embodied constraints. We designed 16 daily life scenarios spanning 4 common scenes and tasked the same visual stimulus to 59 human subjects and our robot. We used the similarity between human open-ended answers and robot output, and the human satisfaction scores to metric robot performance. AToM-Bot received high human evaluations in need detection (6.42/7, 91.7%), embodied solution (6.15/7, 87.8%) and task execution (6.17/7, 88.1%). We show that AToM-Bot excels in generating and executing feasible plans to fulfill unspoken human needs. Videos and code are available at this https URL.
Title:
```
  Enhancing End-to-End Autonomous Driving with Latent World Model
```
Authors: Yingyan Li, Lue Fan, Jiawei He, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang, Tieniu Tan
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract End-to-end autonomous driving has garnered widespread attention. Current end-to-end approaches largely rely on the supervision from perception tasks such as detection, tracking, and map segmentation to aid in learning scene representations. However, these methods require extensive annotations, hindering the data scalability. To address this challenge, we propose a novel self-supervised method to enhance end-to-end driving without the need for costly labels. Specifically, our framework \textbf{LAW} uses a LAtent World model to predict future latent features based on the predicted ego actions and the latent feature of the current frame. The predicted latent features are supervised by the actually observed features in the future. This supervision jointly optimizes the latent feature learning and action prediction, which greatly enhances the driving performance. As a result, our approach achieves state-of-the-art performance in both open-loop and closed-loop benchmarks without costly annotations.
Keyword: face recognition

There is no result

Keyword: augmentation

Title:
```
  DiffPop: Plausibility-Guided Object Placement Diffusion for Image Composition
```
Authors: Jiacheng Liu, Hang Zhou, Shida Wei, Rui Ma
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract In this paper, we address the problem of plausible object placement for the challenging task of realistic image composition. We propose DiffPop, the first framework that utilizes plausibility-guided denoising diffusion probabilistic model to learn the scale and spatial relations among multiple objects and the corresponding scene image. First, we train an unguided diffusion model to directly learn the object placement parameters in a self-supervised manner. Then, we develop a human-in-the-loop pipeline which exploits human labeling on the diffusion-generated composite images to provide the weak supervision for training a structural plausibility classifier. The classifier is further used to guide the diffusion sampling process towards generating the plausible object placement. Experimental results verify the superiority of our method for producing plausible and diverse composite images on the new Cityscapes-OP dataset and the public OPA dataset, as well as demonstrate its potential in applications such as data augmentation and multi-object placement tasks. Our dataset and code will be released.
Title:
```
  DemosaicFormer: Coarse-to-Fine Demosaicing Network for HybridEVS Camera
```
Authors: Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Hybrid Event-Based Vision Sensor (HybridEVS) is a novel sensor integrating traditional frame-based and event-based sensors, offering substantial benefits for applications requiring low-light, high dynamic range, and low-latency environments, such as smartphones and wearable devices. Despite its potential, the lack of Image signal processing (ISP) pipeline specifically designed for HybridEVS poses a significant challenge. To address this challenge, in this study, we propose a coarse-to-fine framework named DemosaicFormer which comprises coarse demosaicing and pixel correction. Coarse demosaicing network is designed to produce a preliminary high-quality estimate of the RGB image from the HybridEVS raw data while the pixel correction network enhances the performance of image restoration and mitigates the impact of defective pixels. Our key innovation is the design of a Multi-Scale Gating Module (MSGM) applying the integration of cross-scale features, which allows feature information to flow between different scales. Additionally, the adoption of progressive training and data augmentation strategies further improves model's robustness and effectiveness. Experimental results show superior performance against the existing methods both qualitatively and visually, and our DemosaicFormer achieves the best performance in terms of all the evaluation metrics in the MIPI 2024 challenge on Demosaic for Hybridevs Camera. The code is available at this https URL.
Title:
```
  It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF
```
Authors: Taiming Lu, Lingfeng Shen, Xinyu Yang, Weiting Tan, Beidi Chen, Huaxiu Yao
Subjects: Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Reinforcement Learning from Human Feedback (RLHF) involves training policy models (PMs) and reward models (RMs) to align language models with human preferences. Instead of focusing solely on PMs and RMs independently, we propose to examine their interactions during fine-tuning, introducing the concept of seamlessness. Our study starts with observing the saturation phenomenon, where continual improvements in RM and PM do not translate into RLHF progress. Our analysis shows that RMs fail to assign proper scores to PM responses, resulting in a 35% mismatch rate with human preferences, highlighting a significant discrepancy between PM and RM. To measure seamlessness between PM and RM without human effort, we propose an automatic metric, SEAM. SEAM quantifies the discrepancies between PM and RM judgments induced by data samples. We validate the effectiveness of SEAM in data selection and model augmentation. Our experiments demonstrate that (1) using SEAM-filtered data for RL training improves RLHF performance by 4.5%, and (2) SEAM-guided model augmentation results in a 4% performance improvement over standard augmentation methods.
Title:
```
  2nd Place Solution for MOSE Track in CVPR 2024 PVUW workshop: Complex Video Object Segmentation
```
Authors: Zhensong Xu, Jiangtao Yao, Chengjing Wu, Ting Liu, Luoqi Liu
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Complex video object segmentation serves as a fundamental task for a wide range of downstream applications such as video editing and automatic data annotation. Here we present the 2nd place solution in the MOSE track of PVUW 2024. To mitigate problems caused by tiny objects, similar objects and fast movements in MOSE. We use instance segmentation to generate extra pretraining data from the valid and test set of MOSE. The segmented instances are combined with objects extracted from COCO to augment the training data and enhance semantic representation of the baseline model. Besides, motion blur is added during training to increase robustness against image blur induced by motion. Finally, we apply test time augmentation (TTA) and memory strategy to the inference stage. Our method ranked 2nd in the MOSE track of PVUW 2024, with a $\mathcal{J}$ of 0.8007, a $\mathcal{F}$ of 0.8683 and a $\mathcal{J}$\&$\mathcal{F}$ of 0.8345.
Title:
```
  Dataset Enhancement with Instance-Level Augmentations
```
Authors: Orest Kupyn, Christian Rupprecht
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract We present a method for expanding a dataset by incorporating knowledge from the wide distribution of pre-trained latent diffusion models. Data augmentations typically incorporate inductive biases about the image formation process into the training (e.g. translation, scaling, colour changes, etc.). Here, we go beyond simple pixel transformations and introduce the concept of instance-level data augmentation by repainting parts of the image at the level of object instances. The method combines a conditional diffusion model with depth and edge maps control conditioning to seamlessly repaint individual objects inside the scene, being applicable to any segmentation or detection dataset. Used as a data augmentation method, it improves the performance and generalization of the state-of-the-art salient object detection, semantic segmentation and object detection models. By redrawing all privacy-sensitive instances (people, license plates, etc.), the method is also applicable for data anonymization. We also release fully synthetic and anonymized expansions for popular datasets: COCO, Pascal VOC and DUTS.
Title:
```
  Strategies for Pretraining Neural Operators
```
Authors: Anthony Zhou, Cooper Lorsung, AmirPouya Hemmasian, Amir Barati Farimani
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/
Pdf link: https://arxiv.org/pdf/
Abstract Pretraining for partial differential equation (PDE) modeling has recently shown promise in scaling neural operators across datasets to improve generalizability and performance. Despite these advances, our understanding of how pretraining affects neural operators is still limited; studies generally propose tailored architectures and datasets that make it challenging to compare or examine different pretraining frameworks. To address this, we compare various pretraining methods without optimizing architecture choices to characterize pretraining dynamics on different models and datasets as well as to understand its scaling and generalization behavior. We find that pretraining is highly dependent on model and dataset choices, but in general transfer learning or physics-based pretraining strategies work best. In addition, pretraining performance can be further improved by using data augmentations. Lastly, pretraining is additionally beneficial when fine-tuning in scarce data regimes or when generalizing to downstream data similar to the pretraining distribution. Through providing insights into pretraining neural operators for physics prediction, we hope to motivate future work in developing and evaluating pretraining methods for PDEs.

LeeKyungwook / get-arxiv-noti

New submissions for Thursday, 13 June 2024 (showing 438 of 438 entries ) #1145

Keyword: detection

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Title:

Keyword: face recognition

Keyword: augmentation

Title:

Title:

Title:

Title:

Title:

Title: