Abstract
Traditional models focus on horizontal table detection but struggle in rotating contexts, limiting progress in table recognition. This paper introduces a new task: detecting table regions and localizing head-tail parts in rotation scenarios. We propose corresponding datasets, evaluation metrics, and methods. Our novel method, 'Adaptively Bounded Rotation,' addresses dataset scarcity in detecting rotated tables and their head-tail parts. We produced 'TRR360D,' a dataset incorporating semantic information of table head and tail, based on 'ICDAR2019MTD.' A new metric, 'R360 AP,' measures precision in detecting rotated regions and localizing head-tail parts. Our baseline, the high-speed and accurate 'RTMDet-S,' is chosen after extensive review and testing. We introduce 'RTHDet,' enhancing the baseline with a 'r360' rotated rectangle angle representation and an 'Angle Loss' branch, improving head-tail localization. By applying transfer learning and adaptive boundary rotation augmentation, RTHDet's AP50 (T<90) improved from 23.7% to 88.7% compared to the baseline. This demonstrates RTHDet's effectiveness in detecting rotating table regions and accurately localizing head and tail parts.RTHDet is integrated into the widely-used open-source MMRotate toolkit: https://github.com/open-mmlab/mmrotate/tree/dev-1.x/projects/RR360.
Techniques to Detect Crime Leaders within a Criminal Network: A Survey, Experimental, and Comparative Evaluations
Authors: Authors: Kamal Taha, Abdulhadi Shoufan
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)
Abstract
This survey paper offers a thorough analysis of techniques and algorithms used in the identification of crime leaders within criminal networks. For each technique, the paper examines its effectiveness, limitations, potential for improvement, and future prospects. The main challenge faced by existing survey papers focusing on algorithms for identifying crime leaders and predicting crimes is effectively categorizing these algorithms. To address this limitation, this paper proposes a new methodological taxonomy that hierarchically classifies algorithms into more detailed categories and specific techniques. The paper includes empirical and experimental evaluations to rank the different techniques. The combination of the methodological taxonomy, empirical evaluations, and experimental comparisons allows for a nuanced and comprehensive understanding of the techniques and algorithms for identifying crime leaders, assisting researchers in making informed decisions. Moreover, the paper offers valuable insights into the future prospects of techniques for identifying crime leaders, emphasizing potential advancements and opportunities for further research. Here's an overview of our empirical analysis findings and experimental insights, along with the solution we've devised: (1) PageRank and Eigenvector centrality are reliable for mapping network connections, (2) Katz Centrality can effectively identify influential criminals through indirect links, stressing their significance in criminal networks, (3) current models fail to account for the specific impacts of criminal influence levels, the importance of socio-economic context, and the dynamic nature of criminal networks and hierarchies, and (4) we propose enhancements, such as incorporating temporal dynamics and sentiment analysis to reflect the fluidity of criminal activities and relationships, which could improve the detection of key criminals .
Detection of tortured phrases in scientific literature
Abstract
This paper presents various automatic detection methods to extract so called tortured phrases from scientific papers. These tortured phrases, e.g. flag to clamor instead of signal to noise, are the results of paraphrasing tools used to escape plagiarism detection. We built a dataset and evaluated several strategies to flag previously undocumented tortured phrases. The proposed and tested methods are based on language models and either on embeddings similarities or on predictions of masked token. We found that an approach using token prediction and that propagates the scores to the chunk level gives the best results. With a recall value of .87 and a precision value of .61, it could retrieve new tortured phrases to be submitted to domain experts for validation.
Weighted Conformal LiDAR-Mapping for Structured SLAM
Authors: Authors: Natalia Prieto-Fernández, Sergio Fernández-Blanco, Álvaro Fernández-Blanco, José Alberto Benítez-Andrades, Francisco Carro-De-Lorenzo, Carmen Benavides
Abstract
One of the main challenges in simultaneous localization and mapping (SLAM) is real-time processing. High-computational loads linked to data acquisition and processing complicate this task. This article presents an efficient feature extraction approach for mapping structured environments. The proposed methodology, weighted conformal LiDAR-mapping (WCLM), is based on the extraction of polygonal profiles and propagation of uncertainties from raw measurement data. This is achieved using conformal M bius transformation. The algorithm has been validated experimentally using 2-D data obtained from a low-cost Light Detection and Ranging (LiDAR) range finder. The results obtained suggest that computational efficiency is significantly improved with reference to other state-of-the-art SLAM approaches.
A Computer Vision Based Approach for Stalking Detection Using a CNN-LSTM-MLP Hybrid Fusion Model
Abstract
Criminal and suspicious activity detection has become a popular research topic in recent years. The rapid growth of computer vision technologies has had a crucial impact on solving this issue. However, physical stalking detection is still a less explored area despite the evolution of modern technology. Nowadays, stalking in public places has become a common occurrence with women being the most affected. Stalking is a visible action that usually occurs before any criminal activity begins as the stalker begins to follow, loiter, and stare at the victim before committing any criminal activity such as assault, kidnapping, rape, and so on. Therefore, it has become a necessity to detect stalking as all of these criminal activities can be stopped in the first place through stalking detection. In this research, we propose a novel deep learning-based hybrid fusion model to detect potential stalkers from a single video with a minimal number of frames. We extract multiple relevant features, such as facial landmarks, head pose estimation, and relative distance, as numerical values from video frames. This data is fed into a multilayer perceptron (MLP) to perform a classification task between a stalking and a non-stalking scenario. Simultaneously, the video frames are fed into a combination of convolutional and LSTM models to extract the spatio-temporal features. We use a fusion of these numerical and spatio-temporal features to build a classifier to detect stalking incidents. Additionally, we introduce a dataset consisting of stalking and non-stalking videos gathered from various feature films and television series, which is also used to train the model. The experimental results show the efficiency and dynamism of our proposed stalker detection system, achieving 89.58% testing accuracy with a significant improvement as compared to the state-of-the-art approaches.
Extending RAIM with a Gaussian Mixture of Opportunistic Information
Abstract
GNSS are indispensable for various applications, but they are vulnerable to spoofing attacks. The original receiver autonomous integrity monitoring (RAIM) was not designed for securing GNSS. In this context, RAIM was extended with wireless signals, termed signals of opportunity (SOPs), or onboard sensors, typically assumed benign. However, attackers might also manipulate wireless networks, raising the need for a solution that considers untrustworthy SOPs. To address this, we extend RAIM by incorporating all opportunistic information, i.e., measurements from terrestrial infrastructures and onboard sensors, culminating in one function for robust GNSS spoofing detection. The objective is to assess the likelihood of GNSS spoofing by analyzing locations derived from extended RAIM solutions, which include location solutions from GNSS pseudorange subsets and wireless signal subsets of untrusted networks. Our method comprises two pivotal components: subset generation and location fusion. Subsets of ranging information are created and processed through positioning algorithms, producing temporary locations. Onboard sensors provide speed, acceleration, and attitude data, aiding in location filtering based on motion constraints. The filtered locations, modeled with uncertainty, are fused into a composite likelihood function normalized for GNSS spoofing detection. Theoretical assessments of GNSS-only and multi-infrastructure scenarios under uncoordinated and coordinated attacks are conducted. The detection of these attacks is feasible when the number of benign subsets exceeds a specific threshold. A real-world dataset from the Kista area is used for experimental validation. Comparative analysis against baseline methods shows a significant improvement in detection accuracy achieved by our Gaussian Mixture RAIM approach. Moreover, we discuss leveraging RAIM results for plausible location recovery.
Stitching the Spectrum: Semantic Spectrum Segmentation with Wideband Signal
Authors: Authors: Daniel Uvaydov, Milin Zhang, Clifton Paul Robinson, Salvatore D'Oro, Tommaso Melodia, Francesco Restuccia
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
Spectrum has become an extremely scarce and congested resource. As a consequence, spectrum sensing enables the coexistence of different wireless technologies in shared spectrum bands. Most existing work requires spectrograms to classify signals. Ultimately, this implies that images need to be continuously created from I/Q samples, thus creating unacceptable latency for real-time operations. In addition, spectrogram-based approaches do not achieve sufficient granularity level as they are based on object detection performed on pixels and are based on rectangular bounding boxes. For this reason, we propose a completely novel approach based on semantic spectrum segmentation, where multiple signals are simultaneously classified and localized in both time and frequency at the I/Q level. Conversely from the state-of-the-art computer vision algorithm, we add non-local blocks to combine the spatial features of signals, and thus achieve better performance. In addition, we propose a novel data generation approach where a limited set of easy-to-collect real-world wireless signals are ``stitched together'' to generate large-scale, wideband, and diverse datasets. Experimental results obtained on multiple testbeds (including the Arena testbed) using multiple antennas, multiple sampling frequencies, and multiple radios over the course of 3 days show that our approach classifies and localizes signals with a mean intersection over union (IOU) of 96.70% across 5 wireless protocols while performing in real-time with a latency of 2.6 ms. Moreover, we demonstrate that our approach based on non-local blocks achieves 7% more accuracy when segmenting the most challenging signals with respect to the state-of-the-art U-Net algorithm. We will release our 17 GB dataset and code.
Early prediction of onset of sepsis in Clinical Setting
Abstract
This study proposes the use of Machine Learning models to predict the early onset of sepsis using deidentified clinical data from Montefiore Medical Center in Bronx, NY, USA. A supervised learning approach was adopted, wherein an XGBoost model was trained utilizing 80\% of the train dataset, encompassing 107 features (including the original and derived features). Subsequently, the model was evaluated on the remaining 20\% of the test data. The model was validated on prospective data that was entirely unseen during the training phase. To assess the model's performance at the individual patient level and timeliness of the prediction, a normalized utility score was employed, a widely recognized scoring methodology for sepsis detection, as outlined in the PhysioNet Sepsis Challenge paper. Metrics such as F1 Score, Sensitivity, Specificity, and Flag Rate were also devised. The model achieved a normalized utility score of 0.494 on test data and 0.378 on prospective data at threshold 0.3. The F1 scores were 80.8\% and 67.1\% respectively for the test data and the prospective data for the same threshold, highlighting its potential to be integrated into clinical decision-making processes effectively. These results bear testament to the model's robust predictive capabilities and its potential to substantially impact clinical decision-making processes.
How Does Unlabeled Data Provably Help Out-of-Distribution Detection?
Authors: Authors: Xuefeng Du, Zhen Fang, Ilias Diakonikolas, Yixuan Li
Abstract
Using unlabeled data to regularize the machine learning models has demonstrated promise for improving safety and reliability in detecting out-of-distribution (OOD) data. Harnessing the power of unlabeled in-the-wild data is non-trivial due to the heterogeneity of both in-distribution (ID) and OOD data. This lack of a clean set of OOD samples poses significant challenges in learning an optimal OOD classifier. Currently, there is a lack of research on formally understanding how unlabeled data helps OOD detection. This paper bridges the gap by introducing a new learning framework SAL (Separate And Learn) that offers both strong theoretical guarantees and empirical effectiveness. The framework separates candidate outliers from the unlabeled data and then trains an OOD classifier using the candidate outliers and the labeled ID data. Theoretically, we provide rigorous error bounds from the lens of separability and learnability, formally justifying the two components in our algorithm. Our theory shows that SAL can separate the candidate outliers with small error rates, which leads to a generalization guarantee for the learned OOD classifier. Empirically, SAL achieves state-of-the-art performance on common benchmarks, reinforcing our theoretical insights. Code is publicly available at https://github.com/deeplearning-wisc/sal.
nnMamba: 3D Biomedical Image Segmentation, Classification and Landmark Detection with State Space Model
Abstract
In the field of biomedical image analysis, the quest for architectures capable of effectively capturing long-range dependencies is paramount, especially when dealing with 3D image segmentation, classification, and landmark detection. Traditional Convolutional Neural Networks (CNNs) struggle with locality respective field, and Transformers have a heavy computational load when applied to high-dimensional medical images. In this paper, we introduce nnMamba, a novel architecture that integrates the strengths of CNNs and the advanced long-range modeling capabilities of State Space Sequence Models (SSMs). nnMamba adds the SSMs to the convolutional residual-block to extract local features and model complex dependencies. For diffirent tasks, we build different blocks to learn the features. Extensive experiments demonstrate nnMamba's superiority over state-of-the-art methods in a suite of challenging tasks, including 3D image segmentation, classification, and landmark detection. nnMamba emerges as a robust solution, offering both the local representation ability of CNNs and the efficient global context processing of SSMs, setting a new standard for long-range dependency modeling in medical image analysis. Code is available at https://github.com/lhaof/nnMamba
A novel pattern recognition system for detecting Android malware by analyzing suspicious boot sequences
Authors: Authors: Jorge Maestre Vidal, Marco Antonio Sotelo Monge, Luis Javier García Villalba
Abstract
This paper introduces a malware detection system for smartphones based on studying the dynamic behavior of suspicious applications. The main goal is to prevent the installation of the malicious software on the victim systems. The approach focuses on identifying malware addressed against the Android platform. For that purpose, only the system calls performed during the boot process of the recently installed applications are studied. Thereby the amount of information to be considered is reduced, since only activities related with their initialization are taken into account. The proposal defines a pattern recognition system with three processing layers: monitoring, analysis and decision-making. First, in order to extract the sequences of system calls, the potentially compromised applications are executed on a safe and isolated environment. Then the analysis step generates the metrics required for decision-making. This level combines sequence alignment algorithms with bagging, which allow scoring the similarity between the extracted sequences considering their regions of greatest resemblance. At the decision-making stage, the Wilcoxon signed-rank test is implemented, which determines if the new software is labeled as legitimate or malicious. The proposal has been tested in different experiments that include an in-depth study of a particular use case, and the evaluation of its effectiveness when analyzing samples of well-known public datasets. Promising experimental results have been shown, hence demonstrating that the approach is a good complement to the strategies of the bibliography.
The Invisible Game on the Internet: A Case Study of Decoding Deceptive Patterns
Abstract
Deceptive patterns are design practices embedded in digital platforms to manipulate users, representing a widespread and long-standing issue in the web and mobile software development industry. Legislative actions highlight the urgency of globally regulating deceptive patterns. However, despite advancements in detection tools, a significant gap exists in assessing deceptive pattern risks. In this study, we introduce a comprehensive approach involving the interactions between the Adversary, Watchdog (e.g., detection tools), and Challengers (e.g., users) to formalize and decode deceptive pattern threats. Based on this, we propose a quantitative risk assessment system. Representative cases are analyzed to showcase the practicability of the proposed risk scoring system, emphasizing the importance of involving human factors in deceptive pattern risk assessment.
Reverse Engineering and Security Evaluation of Commercial Tags for RFID-Based IoT Applications
Authors: Authors: Tiago M. Fernández-Caramés, Paula Fraga-Lamas, Manuel Suárez-Albela, Luis Castedo
Subjects: Systems and Control (eess.SY); Cryptography and Security (cs.CR)
Abstract
The Internet of Things (IoT) is a distributed system of physical objects that requires the seamless integration of hardware (e.g., sensors, actuators, electronics) and network communications in order to collect and exchange data. IoT smart objects need to be somehow identified to determine the origin of the data and to automatically detect the elements around us. One of the best positioned technologies to perform identification is RFID (Radio Frequency Identification), which in the last years has gained a lot of popularity in applications like access control, payment cards or logistics. Despite its popularity, RFID security has not been properly handled in numerous applications. To foster security in such applications, this article includes three main contributions. First, in order to establish the basics, a detailed review of the most common flaws found in RFID-based IoT systems is provided, including the latest attacks described in the literature. Second, a novel methodology that eases the detection and mitigation of such flaws is presented. Third, the latest RFID security tools are analyzed and the methodology proposed is applied through one of them (Proxmark 3) to validate it. Thus, the methodology is tested in different scenarios where tags are commonly used for identification. In such systems it was possible to clone transponders, extract information, and even emulate both tags and readers. Therefore, it is shown that the methodology proposed is useful for auditing security and reverse engineering RFID communications in IoT applications. It must be noted that, although this paper is aimed at fostering RFID communications security in IoT applications, the methodology can be applied to any RFID communications protocol.
Improving Contextual Congruence Across Modalities for Effective Multimodal Marketing using Knowledge-infused Learning
Authors: Authors: Trilok Padhi, Ugur Kursuncu, Yaman Kumar, Valerie L. Shalin, Lane Peterson Fronczek
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Abstract
The prevalence of smart devices with the ability to capture moments in multiple modalities has enabled users to experience multimodal information online. However, large Language (LLMs) and Vision models (LVMs) are still limited in capturing holistic meaning with cross-modal semantic relationships. Without explicit, common sense knowledge (e.g., as a knowledge graph), Visual Language Models (VLMs) only learn implicit representations by capturing high-level patterns in vast corpora, missing essential contextual cross-modal cues. In this work, we design a framework to couple explicit commonsense knowledge in the form of knowledge graphs with large VLMs to improve the performance of a downstream task, predicting the effectiveness of multi-modal marketing campaigns. While the marketing application provides a compelling metric for assessing our methods, our approach enables the early detection of likely persuasive multi-modal campaigns and the assessment and augmentation of marketing theory.
Environment-Centric Learning Approach for Gait Synthesis in Terrestrial Soft Robots
Abstract
Locomotion gaits are fundamental for control of soft terrestrial robots. However, synthesis of these gaits is challenging due to modeling of robot-environment interaction and lack of a mathematical framework. This work presents an environment-centric, data-driven and fault-tolerant probabilistic Model-Free Control (pMFC) framework that allows for soft multi-limb robots to learn from their environment and synthesize diverse sets of locomotion gaits for realizing open-loop control. Here, discretization of factors dominating robot-environment interactions enables an environment-specific graphical representation where the edges encode experimental locomotion data corresponding to the robot motion primitives. In this graph, locomotion gaits are defined as simple cycles that are transformation invariant, i.e., the locomotion is independent of the starting vertex of these periodic cycles. Gait synthesis, the problem of finding optimal locomotion gaits for a given substrate, is formulated as Binary Integer Linear Programming (BILP) problems with a linearized cost function, linear constraints, and iterative simple cycle detection. Experimentally, gaits are synthesized for varying robot-environment interactions. Variables include robot morphology - three-limb and four-limb robots, TerreSoRo-III and TerreSoRo-IV; substrate - rubber mat, whiteboard and carpet; and actuator functionality - simulated loss of robot limb actuation. On an average, gait synthesis improves the translation and rotation speeds by 82% and 97% respectively. The results highlight that data-driven methods are vital to soft robot locomotion control due to the significant influence of unexpected asymmetries in the system and the dependence of optimal gait sequences on the experimental robot-environment interaction.
BEAM: Beta Distribution Ray Denoising for Multi-view 3D Object Detection
Authors: Authors: Feng Liu, Tengteng Huang, Qianjing Zhang, Haotian Yao, Chi Zhang, Fang Wan, Qixiang Ye, Yanzhao Zhou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Multi-view 3D object detectors struggle with duplicate predictions due to the lack of depth information, resulting in false positive detections. In this study, we introduce BEAM, a novel Beta Distribution Ray Denoising approach that can be applied to any DETR-style multi-view 3D detector to explicitly incorporate structure prior knowledge of the scene. By generating rays from cameras to objects and sampling spatial denoising queries from the Beta distribution family along these rays, BEAM enhances the model's ability to distinguish spatial hard negative samples arising from ambiguous depths. BEAM is a plug-and-play technique that adds only marginal computational costs during training, while impressively preserving the inference speed. Extensive experiments and ablation studies on the NuScenes dataset demonstrate significant improvements over strong baselines, outperforming the state-of-the-art method StreamPETR by 1.9% mAP. The code will be available at https://github.com/LiewFeng/BEAM.
Online Informative Sampling using Semantic Features in Underwater Environments
Abstract
The underwater world remains largely unexplored, with Autonomous Underwater Vehicles (AUVs) playing a crucial role in sub-sea explorations. However, continuous monitoring of underwater environments using AUVs can generate a significant amount of data. In addition, sending live data feed from an underwater environment requires dedicated on-board data storage options for AUVs which can hinder requirements of other higher priority tasks. Informative sampling techniques offer a solution by condensing observations. In this paper, we present a semantically-aware online informative sampling (ON-IS) approach which samples an AUV's visual experience in real-time. Specifically, we obtain visual features from a fine-tuned object detection model to align the sampling outcomes with the desired semantic information. Our contributions are (a) a novel Semantic Online Informative Sampling (SON-IS) algorithm, (b) a user study to validate the proposed approach and (c) a novel evaluation metric to score our proposed algorithm with respect to the suggested samples by human subjects
MMAUD: A Comprehensive Multi-Modal Anti-UAV Dataset for Modern Miniature Drone Threats
Abstract
In response to the evolving challenges posed by small unmanned aerial vehicles (UAVs), which possess the potential to transport harmful payloads or independently cause damage, we introduce MMAUD: a comprehensive Multi-Modal Anti-UAV Dataset. MMAUD addresses a critical gap in contemporary threat detection methodologies by focusing on drone detection, UAV-type classification, and trajectory estimation. MMAUD stands out by combining diverse sensory inputs, including stereo vision, various Lidars, Radars, and audio arrays. It offers a unique overhead aerial detection vital for addressing real-world scenarios with higher fidelity than datasets captured on specific vantage points using thermal and RGB. Additionally, MMAUD provides accurate Leica-generated ground truth data, enhancing credibility and enabling confident refinement of algorithms and models, which has never been seen in other datasets. Most existing works do not disclose their datasets, making MMAUD an invaluable resource for developing accurate and efficient solutions. Our proposed modalities are cost-effective and highly adaptable, allowing users to experiment and implement new UAV threat detection tools. Our dataset closely simulates real-world scenarios by incorporating ambient heavy machinery sounds. This approach enhances the dataset's applicability, capturing the exact challenges faced during proximate vehicular operations. It is expected that MMAUD can play a pivotal role in advancing UAV threat detection, classification, trajectory estimation capabilities, and beyond. Our dataset, codes, and designs will be available in https://github.com/ntu-aris/MMAUD.
Enhancing Embodied Object Detection through Language-Image Pre-training and Implicit Object Memory
Authors: Authors: Nicolas Harvey Chapman, Feras Dayoub, Will Browne, Chris Lehnert
Abstract
Deep-learning and large scale language-image training have produced image object detectors that generalise well to diverse environments and semantic classes. However, single-image object detectors trained on internet data are not optimally tailored for the embodied conditions inherent in robotics. Instead, robots must detect objects from complex multi-modal data streams involving depth, localisation and temporal correlation, a task termed embodied object detection. Paradigms such as Video Object Detection (VOD) and Semantic Mapping have been proposed to leverage such embodied data streams, but existing work fails to enhance performance using language-image training. In response, we investigate how an image object detector pre-trained using language-image data can be extended to perform embodied object detection. We propose a novel implicit object memory that uses projective geometry to aggregate the features of detected objects across long temporal horizons. The spatial and temporal information accumulated in memory is then used to enhance the image features of the base detector. When tested on embodied data streams sampled from diverse indoor scenes, our approach improves the base object detector by 3.09 mAP, outperforming alternative external memories designed for VOD and Semantic Mapping. Our method also shows a significant improvement of 16.90 mAP relative to baselines that perform embodied object detection without first training on language-image data, and is robust to sensor noise and domain shift experienced in real-world deployment.
Abstract
Knowledge graphs (KGs) have garnered significant attention for their vast potential across diverse domains. However, the issue of outdated facts poses a challenge to KGs, affecting their overall quality as real-world information evolves. Existing solutions for outdated fact detection often rely on manual recognition. In response, this paper presents DEAN (Deep outdatEd fAct detectioN), a novel deep learning-based framework designed to identify outdated facts within KGs. DEAN distinguishes itself by capturing implicit structural information among facts through comprehensive modeling of both entities and relations. To effectively uncover latent out-of-date information, DEAN employs a contrastive approach based on a pre-defined Relations-to-Nodes (R2N) graph, weighted by the number of entities. Experimental results demonstrate the effectiveness and superiority of DEAN over state-of-the-art baseline methods.
Investigating the Utility of ChatGPT in the Issue Tracking System: An Exploratory Study
Authors: Authors: Joy Krishan Das, Saikat Mondal, Chanchal K.Roy
Abstract
Issue tracking systems serve as the primary tool for incorporating external users and customizing a software project to meet the users' requirements. However, the limited number of contributors and the challenge of identifying the best approach for each issue often impede effective resolution. Recently, an increasing number of developers are turning to AI tools like ChatGPT to enhance problem-solving efficiency. While previous studies have demonstrated the potential of ChatGPT in areas such as automatic program repair, debugging, and code generation, there is a lack of study on how developers explicitly utilize ChatGPT to resolve issues in their tracking system. Hence, this study aims to examine the interaction between ChatGPT and developers to analyze their prevalent activities and provide a resolution. In addition, we assess the code reliability by confirming if the code produced by ChatGPT was integrated into the project's codebase using the clone detection tool NiCad. Our investigation reveals that developers mainly use ChatGPT for brainstorming solutions but often opt to write their code instead of using ChatGPT-generated code, possibly due to concerns over the generation of "hallucinated code", as highlighted in the literature.
BotSSCL: Social Bot Detection with Self-Supervised Contrastive Learning
Authors: Authors: Mohammad Majid Akhtar, Navid Shadman Bhuiyan, Rahat Masood, Muhammad Ikram, Salil S. Kanhere
Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Abstract
The detection of automated accounts, also known as "social bots", has been an increasingly important concern for online social networks (OSNs). While several methods have been proposed for detecting social bots, significant research gaps remain. First, current models exhibit limitations in detecting sophisticated bots that aim to mimic genuine OSN users. Second, these methods often rely on simplistic profile features, which are susceptible to manipulation. In addition to their vulnerability to adversarial manipulations, these models lack generalizability, resulting in subpar performance when trained on one dataset and tested on another. To address these challenges, we propose a novel framework for social Bot detection with Self-Supervised Contrastive Learning (BotSSCL). Our framework leverages contrastive learning to distinguish between social bots and humans in the embedding space to improve linear separability. The high-level representations derived by BotSSCL enhance its resilience to variations in data distribution and ensure generalizability. We evaluate BotSSCL's robustness against adversarial attempts to manipulate bot accounts to evade detection. Experiments on two datasets featuring sophisticated bots demonstrate that BotSSCL outperforms other supervised, unsupervised, and self-supervised baseline methods. We achieve approx. 6% and approx. 8% higher (F1) performance than SOTA on both datasets. In addition, BotSSCL also achieves 67% F1 when trained on one dataset and tested with another, demonstrating its generalizability. Lastly, BotSSCL increases adversarial complexity and only allows 4% success to the adversary in evading detection.
INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection
Authors: Authors: Chao Chen, Kai Liu, Ze Chen, Yi Gu, Yue Wu, Mingyuan Tao, Zhihang Fu, Jieping Ye
Abstract
Knowledge hallucination have raised widespread concerns for the security and reliability of deployed LLMs. Previous efforts in detecting hallucinations have been employed at logit-level uncertainty estimation or language-level self-consistency evaluation, where the semantic information is inevitably lost during the token-decoding procedure. Thus, we propose to explore the dense semantic information retained within LLMs' \textbf{IN}ternal \textbf{S}tates for halluc\textbf{I}nation \textbf{DE}tection (\textbf{INSIDE}). In particular, a simple yet effective \textbf{EigenScore} metric is proposed to better evaluate responses' self-consistency, which exploits the eigenvalues of responses' covariance matrix to measure the semantic consistency/diversity in the dense embedding space. Furthermore, from the perspective of self-consistent hallucination detection, a test time feature clipping approach is explored to truncate extreme activations in the internal states, which reduces overconfident generations and potentially benefits the detection of overconfident hallucinations. Extensive experiments and ablation studies are performed on several popular LLMs and question-answering (QA) benchmarks, showing the effectiveness of our proposal.
MoD-SLAM: Monocular Dense Mapping for Unbounded 3D Scene Reconstruction
Abstract
Neural implicit representations have recently been demonstrated in many fields including Simultaneous Localization And Mapping (SLAM). Current neural SLAM can achieve ideal results in reconstructing bounded scenes, but this relies on the input of RGB-D images. Neural-based SLAM based only on RGB images is unable to reconstruct the scale of the scene accurately, and it also suffers from scale drift due to errors accumulated during tracking. To overcome these limitations, we present MoD-SLAM, a monocular dense mapping method that allows global pose optimization and 3D reconstruction in real-time in unbounded scenes. Optimizing scene reconstruction by monocular depth estimation and using loop closure detection to update camera pose enable detailed and precise reconstruction on large scenes. Compared to previous work, our approach is more robust, scalable and versatile. Our experiments demonstrate that MoD-SLAM has more excellent mapping performance than prior neural SLAM methods, especially in large borderless scenes.
Weakly Supervised Anomaly Detection via Knowledge-Data Alignment
Authors: Authors: Haihong Zhao, Chenyi Zi, Yang Liu, Chen Zhang, Yan Zhou, Jia Li
Abstract
Anomaly detection (AD) plays a pivotal role in numerous web-based applications, including malware detection, anti-money laundering, device failure detection, and network fault analysis. Most methods, which rely on unsupervised learning, are hard to reach satisfactory detection accuracy due to the lack of labels. Weakly Supervised Anomaly Detection (WSAD) has been introduced with a limited number of labeled anomaly samples to enhance model performance. Nevertheless, it is still challenging for models, trained on an inadequate amount of labeled data, to generalize to unseen anomalies. In this paper, we introduce a novel framework Knowledge-Data Alignment (KDAlign) to integrate rule knowledge, typically summarized by human experts, to supplement the limited labeled data. Specifically, we transpose these rules into the knowledge space and subsequently recast the incorporation of knowledge as the alignment of knowledge and data. To facilitate this alignment, we employ the Optimal Transport (OT) technique. We then incorporate the OT distance as an additional loss term to the original objective function of WSAD methodologies. Comprehensive experimental results on five real-world datasets demonstrate that our proposed KDAlign framework markedly surpasses its state-of-the-art counterparts, achieving superior performance across various anomaly types.
Face Detection: Present State and Research Directions
Abstract
The majority of computer vision applications that handle images featuring humans use face detection as a core component. Face detection still has issues, despite much research on the topic. Face detection's accuracy and speed might yet be increased. This review paper shows the progress made in this area as well as the substantial issues that still need to be tackled. The paper provides research directions that can be taken up as research projects in the field of face detection.
NK Hybrid Genetic Algorithm for Clustering
Authors: Authors: Renato Tinós, Liang Zhao, Francisco Chicano, Darrell Whitley
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
Abstract
The NK hybrid genetic algorithm for clustering is proposed in this paper. In order to evaluate the solutions, the hybrid algorithm uses the NK clustering validation criterion 2 (NKCV2). NKCV2 uses information about the disposition of $N$ small groups of objects. Each group is composed of $K+1$ objects of the dataset. Experimental results show that density-based regions can be identified by using NKCV2 with fixed small $K$. In NKCV2, the relationship between decision variables is known, which in turn allows us to apply gray box optimization. Mutation operators, a partition crossover, and a local search strategy are proposed, all using information about the relationship between decision variables. In partition crossover, the evaluation function is decomposed into $q$ independent components; partition crossover then deterministically returns the best among $2^q$ possible offspring with computational complexity $O(N)$. The NK hybrid genetic algorithm allows the detection of clusters with arbitrary shapes and the automatic estimation of the number of clusters. In the experiments, the NK hybrid genetic algorithm produced very good results when compared to another genetic algorithm approach and to state-of-art clustering algorithms.
A new method for optical steel rope non-destructive damage detection
Authors: Authors: Yunqing Bao, Bin Hu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
This paper presents a novel algorithm for non-destructive damage detection for steel ropes in high-altitude environments (aerial ropeway). The algorithm comprises two key components: First, a segmentation model named RGBD-UNet is designed to accurately extract steel ropes from complex backgrounds. This model is equipped with the capability to process and combine color and depth information through the proposed CMA module. Second, a detection model named VovNetV3.5 is developed to differentiate between normal and abnormal steel ropes. It integrates the VovNet architecture with a DBB module to enhance performance. Besides, a novel background augmentation method is proposed to enhance the generalization ability of the segmentation model. Datasets containing images of steel ropes in different scenarios are created for the training and testing of both the segmentation and detection models. Experiments demonstrate a significant improvement over baseline models. On the proposed dataset, the highest accuracy achieved by the detection model reached 0.975, and the maximum F-measure achieved by the segmentation model reached 0.948.
Efficient Generation of Hidden Outliers for Improved Outlier Detection
Authors: Authors: Jose Cribeiro-Ramallo, Vadim Arzamasov, Klemens Böhm
Abstract
Outlier generation is a popular technique used for solving important outlier detection tasks. Generating outliers with realistic behavior is challenging. Popular existing methods tend to disregard the 'multiple views' property of outliers in high-dimensional spaces. The only existing method accounting for this property falls short in efficiency and effectiveness. We propose BISECT, a new outlier generation method that creates realistic outliers mimicking said property. To do so, BISECT employs a novel proposition introduced in this article stating how to efficiently generate said realistic outliers. Our method has better guarantees and complexity than the current methodology for recreating 'multiple views'. We use the synthetic outliers generated by BISECT to effectively enhance outlier detection in diverse datasets, for multiple use cases. For instance, oversampling with BISECT reduced the error by up to 3 times when compared with the baselines.
AED: Adaptable Error Detection for Few-shot Imitation Policy
Authors: Authors: Jia-Fong Yeh, Kuo-Han Hung, Pang-Chi Lo, Chi-Ming Chung, Tsung-Han Wu, Hung-Ting Su, Yi-Ting Chen, Winston H. Hsu
Abstract
We study how to report few-shot imitation (FSI) policies' behavior errors in novel environments, a novel task named adaptable error detection (AED). The potential to cause serious damage to surrounding areas limits the application of FSI policies in real-world scenarios. Thus, a robust system is necessary to notify operators when FSI policies are inconsistent with the intent of demonstrations. We develop a cross-domain benchmark for the challenging AED task, consisting of 329 base and 158 novel environments. This task introduces three challenges, including (1) detecting behavior errors in novel environments, (2) behavior errors occurring without revealing notable changes, and (3) lacking complete temporal information of the rollout due to the necessity of online detection. To address these challenges, we propose Pattern Observer (PrObe) to parse discernible patterns in the policy feature representations of normal or error states, whose effectiveness is verified in the proposed benchmark. Through our comprehensive evaluation, PrObe consistently surpasses strong baselines and demonstrates a robust capability to identify errors arising from a wide range of FSI policies. Moreover, we conduct comprehensive ablations and experiments (error correction, demonstration quality, etc.) to validate the practicality of our proposed task and methodology.
Can Large Language Models Detect Rumors on Social Media?
Authors: Authors: Qiang Liu, Xiang Tao, Junfei Wu, Shu Wu, Liang Wang
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Abstract
In this work, we investigate to use Large Language Models (LLMs) for rumor detection on social media. However, it is challenging for LLMs to reason over the entire propagation information on social media, which contains news contents and numerous comments, due to LLMs may not concentrate on key clues in the complex propagation information, and have trouble in reasoning when facing massive and redundant information. Accordingly, we propose an LLM-empowered Rumor Detection (LeRuD) approach, in which we design prompts to teach LLMs to reason over important clues in news and comments, and divide the entire propagation information into a Chain-of-Propagation for reducing LLMs' burden. We conduct extensive experiments on the Twitter and Weibo datasets, and LeRuD outperforms several state-of-the-art rumor detection models by 2.4% to 7.6%. Meanwhile, by applying LLMs, LeRuD requires no data for training, and thus shows more promising rumor detection ability in few-shot or zero-shot scenarios.
Joint Beamforming Design for the STAR-RIS-Enabled ISAC Systems with Multiple Targets and Multiple Users
Abstract
In this paper, the sensing beam pattern gain under simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RIS)-enabled integrated sensing and communications (ISAC) systems is investigated, in which multiple targets and multiple users exist. However, multiple targets detection introduces new challenges, since the STAR-RIS cannot directly send sensing beams and detect targets, the dual-functional base station (DFBS) is required to analyze the echoes of the targets. While the echoes reflected by different targets through STAR-RIS come from the same direction for the DFBS, making it impossible to distinguish them. To address the issue, we first introduce the signature sequence (SS) modulation scheme to the ISAC system, and thus, the DFBS can detect different targets by the SS-modulated sensing beams. Next, via the joint beamforming design of DFBS and STAR-RIS, we develop a maxmin sensing beam pattern gain problem, and meanwhile, considering the communication quality requirements, the interference limitations of other targets and users, the passive nature constraint of STAR-RIS, and the total transmit power limitation. Then, to tackle the complex non-convex problem, we propose an alternating optimization method to divide it into two quadratic semidefinite program subproblems and decouple the coupled variables. Drawing on mathematical transformation, semidefinite programming, as well as semidefinite relaxation techniques, these two subproblems are iteratively sloved until convergence, and the ultimate solutions are obtained. Finally, simulation results are conducted to validate the benefits and efficiency of our proposed scheme.
A Digital Twin Design Methodology for Control, Simulation, and Monitoring of Fluidic Circuits
Authors: Authors: Veyis Gunes
Subjects: Systems and Control (eess.SY); Emerging Technologies (cs.ET)
Abstract
We propose a synthesis method for the design of digital twins applicable to various systems (pneumatic, hydraulic, electrical/electronic circuits). The methodology allows representing the operation of these systems through an active digital twin, thereby enabling a more suitable and easier computer-aided design, simulation, control, and monitoring. Furthermore, our methodology enables the detection of a system's actions on its own inputs (for example, in pneumatics: backflow of gases trapped in part of a fluidic system onto its own inputs). During the simulation or monitoring phase, the approach also facilitates real-time diagnosis of the controlled system. The outputs, on the controlled physical system or its digital twin, do not depend only on the current inputs but also on the history of the inputs and the history of internal states and variables. In other words, the underlying sequential logic has a memory while an only combinational logic approach does not. These capabilities can contribute to the digital transformation of the factory of the future.
Multi-class Road Defect Detection and Segmentation using Spatial and Channel-wise Attention for Autonomous Road Repairing
Authors: Authors: Jongmin Yu, Chen Bene Chi, Sebastiano Fichera, Paolo Paoletti, Devansh Mehta, Shan Luo
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Road pavement detection and segmentation are critical for developing autonomous road repair systems. However, developing an instance segmentation method that simultaneously performs multi-class defect detection and segmentation is challenging due to the textural simplicity of road pavement image, the diversity of defect geometries, and the morphological ambiguity between classes. We propose a novel end-to-end method for multi-class road defect detection and segmentation. The proposed method comprises multiple spatial and channel-wise attention blocks available to learn global representations across spatial and channel-wise dimensions. Through these attention blocks, more globally generalised representations of morphological information (spatial characteristics) of road defects and colour and depth information of images can be learned. To demonstrate the effectiveness of our framework, we conducted various ablation studies and comparisons with prior methods on a newly collected dataset annotated with nine road defect classes. The experiments show that our proposed method outperforms existing state-of-the-art methods for multi-class road defect detection and segmentation methods.
The Use of a Large Language Model for Cyberbullying Detection
Abstract
The dominance of social media has added to the channels of bullying for perpetrators. Unfortunately, cyberbullying (CB) is the most prevalent phenomenon in todays cyber world, and is a severe threat to the mental and physical health of citizens. This opens the need to develop a robust system to prevent bullying content from online forums, blogs, and social media platforms to manage the impact in our society. Several machine learning (ML) algorithms have been proposed for this purpose. However, their performances are not consistent due to high class imbalance and generalisation issues. In recent years, large language models (LLMs) like BERT and RoBERTa have achieved state-of-the-art (SOTA) results in several natural language processing (NLP) tasks. Unfortunately, the LLMs have not been applied extensively for CB detection. In our paper, we explored the use of these models for cyberbullying (CB) detection. We have prepared a new dataset (D2) from existing studies (Formspring and Twitter). Our experimental results for dataset D1 and D2 showed that RoBERTa outperformed other models.
Acceleration and energy consumption optimization in cascading classifiers for face detection on low-cost ARM big.LITTLE asymmetric architectures
Authors: Authors: Alberto Corpas, Luis Costero, Guillermo Botella, Francisco D. Igual, Carlos García, Manuel Rodríguez
Abstract
This paper proposes a mechanism to accelerate and optimize the energy consumption of a face detection software based on Haar-like cascading classifiers, taking advantage of the features of low-cost Asymmetric Multicore Processors (AMPs) with limited power budget. A modelling and task scheduling/allocation is proposed in order to efficiently make use of the existing features on big.LITTLE ARM processors, including: (I) source-code adaptation for parallel computing, which enables code acceleration by applying the OmpSs programming model, a task-based programming model that handles data-dependencies between tasks in a transparent fashion; (II) different OmpSs task allocation policies which take into account the processor asymmetry and can dynamically set processing resources in a more efficient way based on their particular features. The proposed mechanism can be efficiently applied to take advantage of the processing elements existing on low-cost and low-energy multi-core embedded devices executing object detection algorithms based on cascading classifiers. Although these classifiers yield the best results for detection algorithms in the field of computer vision, their high computational requirements prevent them from being used on these devices under real-time requirements. Finally, we compare the energy efficiency of a heterogeneous architecture based on asymmetric multicore processors with a suitable task scheduling, with that of a homogeneous symmetric architecture.
Use of Multi-CNNs for Section Analysis in Static Malware Detection
Authors: Authors: Tony Quertier, Grégoire Barrué
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Abstract
Existing research on malware detection focuses almost exclusively on the detection rate. However, in some cases, it is also important to understand the results of our algorithm, or to obtain more information, such as where to investigate in the file for an analyst. In this aim, we propose a new model to analyze Portable Executable files. Our method consists in splitting the files in different sections, then transform each section into an image, in order to train convolutional neural networks to treat specifically each identified section. Then we use all these scores returned by CNNs to compute a final detection score, using models that enable us to improve our analysis of the importance of each section in the final score.
Advancing Legal Reasoning: The Integration of AI to Navigate Complexities and Biases in Global Jurisprudence with Semi-Automated Arbitration Processes (SAAPs)
Authors: Authors: Michael De'Shazer
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Abstract
This study consists of a novel approach toward the analysis of court judgments spanning five countries, including the United States, the United Kingdom, Rwanda, Sweden and Hong Kong. This study also explores the intersection of the latest advancements in artificial intelligence (AI) and legal analysis, emphasizing the role of AI (specifically generative AI) in identifying human biases and facilitating automated, valid, and coherent multisided argumentation of court judgments with the goal of ensuring consistent application of laws in and across various jurisdictions. By incorporating Advanced Language Models (ALMs) and a newly introduced human-AI collaborative framework, this paper seeks to analyze Grounded Theory-based research design with Advanced Language Models (ALMs) in the practice of law. SHIRLEY is the name of the AI-based application (built on top of OpenAI's GPT technology), focusing on detecting logical inconsistencies and biases across various legal decisions. SHIRLEY analysis is aggregated and is accompanied by a comparison-oriented AI-based application called SAM (also an ALM) to identify relative deviations in SHIRLEY bias detections. Further, a CRITIC is generated within semi-autonomous arbitration process via the ALM, SARA. A novel approach is introduced in the utilization of an AI arbitrator to critically evaluate biases and qualitative-in-nature nuances identified by the aforementioned AI applications (SAM in concert with SHIRLEY), based on the Hague Rules on Business and Human Rights Arbitration. This Semi-Automated Arbitration Process (SAAP) aims to uphold the integrity and fairness of legal judgments by ensuring a nuanced debate-resultant "understanding" through a hybrid system of AI and human-based collaborative analysis.
Human Emotions Analysis and Recognition Using EEG Signals in Response to 360$^\circ$ Videos
Authors: Authors: Haseeb ur Rahman Abbasi, Zeeshan Rashid, Muhammad Majid, Syed Muhammad Anwar
Abstract
Emotion recognition (ER) technology is an integral part for developing innovative applications such as drowsiness detection and health monitoring that plays a pivotal role in contemporary society. This study delves into ER using electroencephalography (EEG), within immersive virtual reality (VR) environments. There are four main stages in our proposed methodology including data acquisition, pre-processing, feature extraction, and emotion classification. Acknowledging the limitations of existing 2D datasets, we introduce a groundbreaking 3D VR dataset to elevate the precision of emotion elicitation. Leveraging the Interaxon Muse headband for EEG recording and Oculus Quest 2 for VR stimuli, we meticulously recorded data from 40 participants, prioritizing subjects without reported mental illnesses. Pre-processing entails rigorous cleaning, uniform truncation, and the application of a Savitzky-Golay filter to the EEG data. Feature extraction encompasses a comprehensive analysis of metrics such as power spectral density, correlation, rational and divisional asymmetry, and power spectrum. To ensure the robustness of our model, we employed a 10-fold cross-validation, revealing an average validation accuracy of 85.54\%, with a noteworthy maximum accuracy of 90.20\% in the best fold. Subsequently, the trained model demonstrated a commendable test accuracy of 82.03\%, promising favorable outcomes.
COPS: A Compact On-device Pipeline for real-time Smishing detection
Authors: Authors: Harichandana B S S, Sumit Kumar, Manjunath Bhimappa Ujjinakoppa, Barath Raj Kandur Raja
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Abstract
Smartphones have become indispensable in our daily lives and can do almost everything, from communication to online shopping. However, with the increased usage, cybercrime aimed at mobile devices is rocketing. Smishing attacks, in particular, have observed a significant upsurge in recent years. This problem is further exacerbated by the perpetrator creating new deceptive websites daily, with an average life cycle of under 15 hours. This renders the standard practice of keeping a database of malicious URLs ineffective. To this end, we propose a novel on-device pipeline: COPS that intelligently identifies features of fraudulent messages and URLs to alert the user in real-time. COPS is a lightweight pipeline with a detection module based on the Disentangled Variational Autoencoder of size 3.46MB for smishing and URL phishing detection, and we benchmark it on open datasets. We achieve an accuracy of 98.15% and 99.5%, respectively, for both tasks, with a false negative and false positive rate of a mere 0.037 and 0.015, outperforming previous works with the added advantage of ensuring real-time alerts on resource-constrained devices.
SHIELD : An Evaluation Benchmark for Face Spoofing and Forgery Detection with Multimodal Large Language Models
Authors: Authors: Yichen Shi, Yuhao Gao, Yingxin Lai, Hongyang Wang, Jun Feng, Lei He, Jun Wan, Changsheng Chen, Zitong Yu, Xiaochun Cao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Multimodal large language models (MLLMs) have demonstrated remarkable problem-solving capabilities in various vision fields (e.g., generic object recognition and grounding) based on strong visual semantic representation and language reasoning ability. However, whether MLLMs are sensitive to subtle visual spoof/forged clues and how they perform in the domain of face attack detection (e.g., face spoofing and forgery detection) is still unexplored. In this paper, we introduce a new benchmark, namely SHIELD, to evaluate the ability of MLLMs on face spoofing and forgery detection. Specifically, we design true/false and multiple-choice questions to evaluate multimodal face data in these two face security tasks. For the face anti-spoofing task, we evaluate three different modalities (i.e., RGB, infrared, depth) under four types of presentation attacks (i.e., print attack, replay attack, rigid mask, paper mask). For the face forgery detection task, we evaluate GAN-based and diffusion-based data with both visual and acoustic modalities. Each question is subjected to both zero-shot and few-shot tests under standard and chain of thought (COT) settings. The results indicate that MLLMs hold substantial potential in the face security domain, offering advantages over traditional specific models in terms of interpretability, multimodal flexible reasoning, and joint face spoof and forgery detection. Additionally, we develop a novel Multi-Attribute Chain of Thought (MA-COT) paradigm for describing and judging various task-specific and task-irrelevant attributes of face images, which provides rich task-related knowledge for subtle spoof/forged clue mining. Extensive experiments in separate face anti-spoofing, separate face forgery detection, and joint detection tasks demonstrate the effectiveness of the proposed MA-COT. The project is available at https$:$//github.com/laiyingxin2/SHIELD
Keyword: face recognition
There is no result
Keyword: augmentation
RTHDet: Rotate Table Area and Head Detection in images
Authors: Authors: Wenxing Hu, Minglei Tong
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Traditional models focus on horizontal table detection but struggle in rotating contexts, limiting progress in table recognition. This paper introduces a new task: detecting table regions and localizing head-tail parts in rotation scenarios. We propose corresponding datasets, evaluation metrics, and methods. Our novel method, 'Adaptively Bounded Rotation,' addresses dataset scarcity in detecting rotated tables and their head-tail parts. We produced 'TRR360D,' a dataset incorporating semantic information of table head and tail, based on 'ICDAR2019MTD.' A new metric, 'R360 AP,' measures precision in detecting rotated regions and localizing head-tail parts. Our baseline, the high-speed and accurate 'RTMDet-S,' is chosen after extensive review and testing. We introduce 'RTHDet,' enhancing the baseline with a 'r360' rotated rectangle angle representation and an 'Angle Loss' branch, improving head-tail localization. By applying transfer learning and adaptive boundary rotation augmentation, RTHDet's AP50 (T<90) improved from 23.7% to 88.7% compared to the baseline. This demonstrates RTHDet's effectiveness in detecting rotating table regions and accurately localizing head and tail parts.RTHDet is integrated into the widely-used open-source MMRotate toolkit: https://github.com/open-mmlab/mmrotate/tree/dev-1.x/projects/RR360.
Connect Later: Improving Fine-tuning for Robustness with Targeted Augmentations
Authors: Authors: Helen Qu, Sang Michael Xie
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Models trained on a labeled source domain (e.g., labeled images from wildlife camera traps) often generalize poorly when deployed on an out-of-distribution (OOD) target domain (e.g., images from new camera trap locations). In the domain adaptation setting where unlabeled target data is available, self-supervised pretraining (e.g., masked autoencoding or contrastive learning) is a promising method to mitigate this performance drop. Pretraining improves OOD error when the generic data augmentations used (e.g., masking or cropping) connect the source and target domains, which may be far apart in the input space. In this paper, we show on real-world tasks that standard fine-tuning after pretraining does not consistently improve OOD error over simply training from scratch on labeled source data. To better leverage pretraining for distribution shifts, we propose Connect Later: after pretraining with generic augmentations, fine-tune with targeted augmentations designed with knowledge of the distribution shift. Pretraining learns good representations within the source and target domains, while targeted augmentations connect the domains better during fine-tuning. Connect Later improves average OOD error over standard fine-tuning and supervised learning with targeted augmentations on 4 real-world datasets: Connect Later achieves the state-of-the-art on astronomical time-series classification (AstroClassification) by 2.5%, wildlife species identification (iWildCam-WILDS) with ResNet-50 by 0.9%, and tumor identification (Camelyon17-WILDS) with DenseNet121 by 1.1%; as well as best performance on a new dataset for astronomical time-series redshift prediction (Redshifts) by 0.03 RMSE (11% relative). Code and datasets are available at https://github.com/helenqu/connect-later.
Interplay of Semantic Communication and Knowledge Learning
Abstract
In the swiftly advancing realm of communication technologies, Semantic Communication (SemCom), which emphasizes knowledge understanding and processing, has emerged as a hot topic. By integrating artificial intelligence technologies, SemCom facilitates a profound understanding, analysis and transmission of communication content. In this chapter, we clarify the means of knowledge learning in SemCom with a particular focus on the utilization of Knowledge Graphs (KGs). Specifically, we first review existing efforts that combine SemCom with knowledge learning. Subsequently, we introduce a KG-enhanced SemCom system, wherein the receiver is carefully calibrated to leverage knowledge from its static knowledge base for ameliorating the decoding performance. Contingent upon this framework, we further explore potential approaches that can empower the system to operate in evolving knowledge base more effectively. Furthermore, we investigate the possibility of integration with Large Language Models (LLMs) for data augmentation, offering additional perspective into the potential implementation means of SemCom. Extensive numerical results demonstrate that the proposed framework yields superior performance on top of the KG-enhanced decoding and manifests its versatility under different scenarios.
Neural networks for abstraction and reasoning: Towards broad generalization in machines
Authors: Authors: Mikel Bober-Irizar, Soumya Banerjee
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
For half a century, artificial intelligence research has attempted to reproduce the human qualities of abstraction and reasoning - creating computer systems that can learn new concepts from a minimal set of examples, in settings where humans find this easy. While specific neural networks are able to solve an impressive range of problems, broad generalisation to situations outside their training data has proved elusive.In this work, we look at several novel approaches for solving the Abstraction & Reasoning Corpus (ARC), a dataset of abstract visual reasoning tasks introduced to test algorithms on broad generalization. Despite three international competitions with $100,000 in prizes, the best algorithms still fail to solve a majority of ARC tasks and rely on complex hand-crafted rules, without using machine learning at all. We revisit whether recent advances in neural networks allow progress on this task. First, we adapt the DreamCoder neurosymbolic reasoning solver to ARC. DreamCoder automatically writes programs in a bespoke domain-specific language to perform reasoning, using a neural network to mimic human intuition. We present the Perceptual Abstraction and Reasoning Language (PeARL) language, which allows DreamCoder to solve ARC tasks, and propose a new recognition model that allows us to significantly improve on the previous best implementation.We also propose a new encoding and augmentation scheme that allows large language models (LLMs) to solve ARC tasks, and find that the largest models can solve some ARC tasks. LLMs are able to solve a different group of problems to state-of-the-art solvers, and provide an interesting way to complement other approaches. We perform an ensemble analysis, combining models to achieve better results than any system alone. Finally, we publish the arckit Python library to make future research on ARC easier.
Autopilot System for Depth and Pitch Control in Underwater Vehicles: Navigating Near-Surface Waves and Disturbances
Authors: Authors: Vladimir Petrov, Gage MacLin, Venanzio Cichella
Abstract
This paper introduces a framework for depth and pitch control of underwater vehicles in near-surface wave conditions. By effectively managing tail, sail plane angles and hover tank operations utilizing a Linear Quadratic Regulator controller and L1 Adaptive Autopilot augmentation, the system ensures balanced control input distribution and significantly attenuates wave disturbances. This development in underwater vehicle control systems offers potential for improved functionality across a range of marine applications. The proposed framework is demonstrated to be robust in a variety of wave conditions, enabling more precise navigation and improved safety in operational scenarios. The effectiveness of this control strategy is validated through extensive simulations using the Joubert BB2 model.
Improving Contextual Congruence Across Modalities for Effective Multimodal Marketing using Knowledge-infused Learning
Authors: Authors: Trilok Padhi, Ugur Kursuncu, Yaman Kumar, Valerie L. Shalin, Lane Peterson Fronczek
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Abstract
The prevalence of smart devices with the ability to capture moments in multiple modalities has enabled users to experience multimodal information online. However, large Language (LLMs) and Vision models (LVMs) are still limited in capturing holistic meaning with cross-modal semantic relationships. Without explicit, common sense knowledge (e.g., as a knowledge graph), Visual Language Models (VLMs) only learn implicit representations by capturing high-level patterns in vast corpora, missing essential contextual cross-modal cues. In this work, we design a framework to couple explicit commonsense knowledge in the form of knowledge graphs with large VLMs to improve the performance of a downstream task, predicting the effectiveness of multi-modal marketing campaigns. While the marketing application provides a compelling metric for assessing our methods, our approach enables the early detection of likely persuasive multi-modal campaigns and the assessment and augmentation of marketing theory.
SHMC-Net: A Mask-guided Feature Fusion Network for Sperm Head Morphology Classification
Authors: Authors: Nishchal Sapkota, Yejia Zhang, Sirui Li, Peixian Liang, Zhuo Zhao, Danny Z Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Male infertility accounts for about one-third of global infertility cases. Manual assessment of sperm abnormalities through head morphology analysis encounters issues of observer variability and diagnostic discrepancies among experts. Its alternative, Computer-Assisted Semen Analysis (CASA), suffers from low-quality sperm images, small datasets, and noisy class labels. We propose a new approach for sperm head morphology classification, called SHMC-Net, which uses segmentation masks of sperm heads to guide the morphology classification of sperm images. SHMC-Net generates reliable segmentation masks using image priors, refines object boundaries with an efficient graph-based method, and trains an image network with sperm head crops and a mask network with the corresponding masks. In the intermediate stages of the networks, image and mask features are fused with a fusion scheme to better learn morphological features. To handle noisy class labels and regularize training on small datasets, SHMC-Net applies Soft Mixup to combine mixup augmentation and a loss function. We achieve state-of-the-art results on SCIAN and HuSHeM datasets, outperforming methods that use additional pre-training or costly ensembling techniques.
A new method for optical steel rope non-destructive damage detection
Authors: Authors: Yunqing Bao, Bin Hu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
This paper presents a novel algorithm for non-destructive damage detection for steel ropes in high-altitude environments (aerial ropeway). The algorithm comprises two key components: First, a segmentation model named RGBD-UNet is designed to accurately extract steel ropes from complex backgrounds. This model is equipped with the capability to process and combine color and depth information through the proposed CMA module. Second, a detection model named VovNetV3.5 is developed to differentiate between normal and abnormal steel ropes. It integrates the VovNet architecture with a DBB module to enhance performance. Besides, a novel background augmentation method is proposed to enhance the generalization ability of the segmentation model. Datasets containing images of steel ropes in different scenarios are created for the training and testing of both the segmentation and detection models. Experiments demonstrate a significant improvement over baseline models. On the proposed dataset, the highest accuracy achieved by the detection model reached 0.975, and the maximum F-measure achieved by the segmentation model reached 0.948.
Boosting Adversarial Transferability across Model Genus by Deformation-Constrained Warping
Authors: Authors: Qinliang Lin, Cheng Luo, Zenghao Niu, Xilin He, Weicheng Xie, Yuanbo Hou, Linlin Shen, Siyang Song
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Adversarial examples generated by a surrogate model typically exhibit limited transferability to unknown target systems. To address this problem, many transferability enhancement approaches (e.g., input transformation and model augmentation) have been proposed. However, they show poor performances in attacking systems having different model genera from the surrogate model. In this paper, we propose a novel and generic attacking strategy, called Deformation-Constrained Warping Attack (DeCoWA), that can be effectively applied to cross model genus attack. Specifically, DeCoWA firstly augments input examples via an elastic deformation, namely Deformation-Constrained Warping (DeCoW), to obtain rich local details of the augmented input. To avoid severe distortion of global semantics led by random deformation, DeCoW further constrains the strength and direction of the warping transformation by a novel adaptive control strategy. Extensive experiments demonstrate that the transferable examples crafted by our DeCoWA on CNN surrogates can significantly hinder the performance of Transformers (and vice versa) on various tasks, including image classification, video action recognition, and audio recognition. Code is made available at https://github.com/LinQinLiang/DeCoWA.
Efficient Availability Attacks against Supervised and Contrastive Learning Simultaneously
Abstract
Availability attacks can prevent the unauthorized use of private data and commercial datasets by generating imperceptible noise and making unlearnable examples before release. Ideally, the obtained unlearnability prevents algorithms from training usable models. When supervised learning (SL) algorithms have failed, a malicious data collector possibly resorts to contrastive learning (CL) algorithms to bypass the protection. Through evaluation, we have found that most of the existing methods are unable to achieve both supervised and contrastive unlearnability, which poses risks to data protection. Different from recent methods based on contrastive error minimization, we employ contrastive-like data augmentations in supervised error minimization or maximization frameworks to obtain attacks effective for both SL and CL. Our proposed AUE and AAP attacks achieve state-of-the-art worst-case unlearnability across SL and CL algorithms with less computation consumption, showcasing prospects in real-world applications.
Polyp-DDPM: Diffusion-Based Semantic Polyp Synthesis for Enhanced Segmentation
Abstract
This study introduces Polyp-DDPM, a diffusion-based method for generating realistic images of polyps conditioned on masks, aimed at enhancing the segmentation of gastrointestinal (GI) tract polyps. Our approach addresses the challenges of data limitations, high annotation costs, and privacy concerns associated with medical images. By conditioning the diffusion model on segmentation masks-binary masks that represent abnormal areas-Polyp-DDPM outperforms state-of-the-art methods in terms of image quality (achieving a Frechet Inception Distance (FID) score of 78.47, compared to scores above 83.79) and segmentation performance (achieving an Intersection over Union (IoU) of 0.7156, versus less than 0.6694 for synthetic images from baseline models and 0.7067 for real data). Our method generates a high-quality, diverse synthetic dataset for training, thereby enhancing polyp segmentation models to be comparable with real images and offering greater data augmentation capabilities to improve segmentation models. The source code and pretrained weights for Polyp-DDPM are made publicly available at https://github.com/mobaidoctor/polyp-ddpm.
Low-Distortion Clustering with Ordinal and Limited Cardinal Information
Authors: Authors: Jakob Burkhardt, Ioannis Caragiannis, Karl Fehrs, Matteo Russo, Chris Schwiegelshohn, Sudarshan Shyam
Subjects: Computer Science and Game Theory (cs.GT)
Abstract
Motivated by recent work in computational social choice, we extend the metric distortion framework to clustering problems. Given a set of $n$ agents located in an underlying metric space, our goal is to partition them into $k$ clusters, optimizing some social cost objective. The metric space is defined by a distance function $d$ between the agent locations. Information about $d$ is available only implicitly via $n$ rankings, through which each agent ranks all other agents in terms of their distance from her. Still, we would like to evaluate clustering algorithms in terms of social cost objectives that are defined using $d$. This is done using the notion of distortion, which measures how far from optimality a clustering can be, taking into account all underlying metrics that are consistent with the ordinal information available. Unfortunately, the most important clustering objectives do not admit algorithms with finite distortion. To sidestep this disappointing fact, we follow two alternative approaches: We first explore whether resource augmentation can be beneficial. We consider algorithms that use more than $k$ clusters but compare their social cost to that of the optimal $k$-clusterings. We show that using exponentially (in terms of $k$) many clusters, we can get low (constant or logarithmic) distortion for the $k$-center and $k$-median objectives. Interestingly, such an exponential blowup is shown to be necessary. More importantly, we explore whether limited cardinal information can be used to obtain better results. Somewhat surprisingly, for $k$-median and $k$-center, we show that a number of queries that is polynomial in $k$ and only logarithmic in $n$ (i.e., only sublinear in the number of agents for the most relevant scenarios in practice) is enough to get constant distortion.
Improved Generalization of Weight Space Networks via Augmentations
Authors: Authors: Aviv Shamsian, Aviv Navon, David W. Zhang, Yan Zhang, Ethan Fetaya, Gal Chechik, Haggai Maron
Abstract
Learning in deep weight spaces (DWS), where neural networks process the weights of other neural networks, is an emerging research direction, with applications to 2D and 3D neural fields (INRs, NeRFs), as well as making inferences about other types of neural networks. Unfortunately, weight space models tend to suffer from substantial overfitting. We empirically analyze the reasons for this overfitting and find that a key reason is the lack of diversity in DWS datasets. While a given object can be represented by many different weight configurations, typical INR training sets fail to capture variability across INRs that represent the same object. To address this, we explore strategies for data augmentation in weight spaces and propose a MixUp method adapted for weight spaces. We demonstrate the effectiveness of these methods in two setups. In classification, they improve performance similarly to having up to 10 times more data. In self-supervised contrastive learning, they yield substantial 5-10% gains in downstream classification.
Keyword: detection
RTHDet: Rotate Table Area and Head Detection in images
Techniques to Detect Crime Leaders within a Criminal Network: A Survey, Experimental, and Comparative Evaluations
Detection of tortured phrases in scientific literature
Weighted Conformal LiDAR-Mapping for Structured SLAM
A Computer Vision Based Approach for Stalking Detection Using a CNN-LSTM-MLP Hybrid Fusion Model
Extending RAIM with a Gaussian Mixture of Opportunistic Information
Stitching the Spectrum: Semantic Spectrum Segmentation with Wideband Signal
Early prediction of onset of sepsis in Clinical Setting
How Does Unlabeled Data Provably Help Out-of-Distribution Detection?
nnMamba: 3D Biomedical Image Segmentation, Classification and Landmark Detection with State Space Model
A novel pattern recognition system for detecting Android malware by analyzing suspicious boot sequences
The Invisible Game on the Internet: A Case Study of Decoding Deceptive Patterns
Reverse Engineering and Security Evaluation of Commercial Tags for RFID-Based IoT Applications
Improving Contextual Congruence Across Modalities for Effective Multimodal Marketing using Knowledge-infused Learning
Environment-Centric Learning Approach for Gait Synthesis in Terrestrial Soft Robots
BEAM: Beta Distribution Ray Denoising for Multi-view 3D Object Detection
Online Informative Sampling using Semantic Features in Underwater Environments
MMAUD: A Comprehensive Multi-Modal Anti-UAV Dataset for Modern Miniature Drone Threats
Enhancing Embodied Object Detection through Language-Image Pre-training and Implicit Object Memory
Deep Outdated Fact Detection in Knowledge Graphs
Investigating the Utility of ChatGPT in the Issue Tracking System: An Exploratory Study
BotSSCL: Social Bot Detection with Self-Supervised Contrastive Learning
INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection
MoD-SLAM: Monocular Dense Mapping for Unbounded 3D Scene Reconstruction
Weakly Supervised Anomaly Detection via Knowledge-Data Alignment
Face Detection: Present State and Research Directions
NK Hybrid Genetic Algorithm for Clustering
A new method for optical steel rope non-destructive damage detection
Efficient Generation of Hidden Outliers for Improved Outlier Detection
AED: Adaptable Error Detection for Few-shot Imitation Policy
Can Large Language Models Detect Rumors on Social Media?
Joint Beamforming Design for the STAR-RIS-Enabled ISAC Systems with Multiple Targets and Multiple Users
A Digital Twin Design Methodology for Control, Simulation, and Monitoring of Fluidic Circuits
Multi-class Road Defect Detection and Segmentation using Spatial and Channel-wise Attention for Autonomous Road Repairing
The Use of a Large Language Model for Cyberbullying Detection
Acceleration and energy consumption optimization in cascading classifiers for face detection on low-cost ARM big.LITTLE asymmetric architectures
Use of Multi-CNNs for Section Analysis in Static Malware Detection
Advancing Legal Reasoning: The Integration of AI to Navigate Complexities and Biases in Global Jurisprudence with Semi-Automated Arbitration Processes (SAAPs)
Human Emotions Analysis and Recognition Using EEG Signals in Response to 360$^\circ$ Videos
COPS: A Compact On-device Pipeline for real-time Smishing detection
SHIELD : An Evaluation Benchmark for Face Spoofing and Forgery Detection with Multimodal Large Language Models
Keyword: face recognition
There is no result
Keyword: augmentation
RTHDet: Rotate Table Area and Head Detection in images
Connect Later: Improving Fine-tuning for Robustness with Targeted Augmentations
Interplay of Semantic Communication and Knowledge Learning
Neural networks for abstraction and reasoning: Towards broad generalization in machines
Autopilot System for Depth and Pitch Control in Underwater Vehicles: Navigating Near-Surface Waves and Disturbances
Improving Contextual Congruence Across Modalities for Effective Multimodal Marketing using Knowledge-infused Learning
SHMC-Net: A Mask-guided Feature Fusion Network for Sperm Head Morphology Classification
A new method for optical steel rope non-destructive damage detection
Boosting Adversarial Transferability across Model Genus by Deformation-Constrained Warping
Efficient Availability Attacks against Supervised and Contrastive Learning Simultaneously
Polyp-DDPM: Diffusion-Based Semantic Polyp Synthesis for Enhanced Segmentation
Low-Distortion Clustering with Ordinal and Limited Cardinal Information
Improved Generalization of Weight Space Networks via Augmentations