New submissions for Mon, 1 Jan 24

Keyword: detection

TimePillars: Temporally-Recurrent 3D LiDAR Object Detection

Authors: Ernesto Lozano Calvo, Bernardo Taveira, Fredrik Kahl, Niklas Gustafsson, Jonathan Larsson, Adam Tonderski
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2312.17260
Pdf link: https://arxiv.org/pdf/2312.17260
Abstract Object detection applied to LiDAR point clouds is a relevant task in robotics, and particularly in autonomous driving. Single frame methods, predominant in the field, exploit information from individual sensor scans. Recent approaches achieve good performance, at relatively low inference time. Nevertheless, given the inherent high sparsity of LiDAR data, these methods struggle in long-range detection (e.g. 200m) which we deem to be critical in achieving safe automation. Aggregating multiple scans not only leads to a denser point cloud representation, but it also brings time-awareness to the system, and provides information about how the environment is changing. Solutions of this kind, however, are often highly problem-specific, demand careful data processing, and tend not to fulfil runtime requirements. In this context we propose TimePillars, a temporally-recurrent object detection pipeline which leverages the pillar representation of LiDAR data across time, respecting hardware integration efficiency constraints, and exploiting the diversity and long-range information of the novel Zenseact Open Dataset (ZOD). Through experimentation, we prove the benefits of having recurrency, and show how basic building blocks are enough to achieve robust and efficient results.
Transformer-Based Multi-Object Smoothing with Decoupled Data Association and Smoothing
Authors: Juliano Pinto, Georg Hess, Yuxuan Xia, Henk Wymeersch, Lennart Svensson
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.17261
Pdf link: https://arxiv.org/pdf/2312.17261
Abstract Multi-object tracking (MOT) is the task of estimating the state trajectories of an unknown and time-varying number of objects over a certain time window. Several algorithms have been proposed to tackle the multi-object smoothing task, where object detections can be conditioned on all the measurements in the time window. However, the best-performing methods suffer from intractable computational complexity and require approximations, performing suboptimally in complex settings. Deep learning based algorithms are a possible avenue for tackling this issue but have not been applied extensively in settings where accurate multi-object models are available and measurements are low-dimensional. We propose a novel DL architecture specifically tailored for this setting that decouples the data association task from the smoothing task. We compare the performance of the proposed smoother to the state-of-the-art in different tasks of varying difficulty and provide, to the best of our knowledge, the first comparison between traditional Bayesian trackers and DL trackers in the smoothing problem setting.
$μ$-Net: ConvNext-Based U-Nets for Cosmic Muon Tomography
Authors: Li Xin Jed Lim, Ziming Qiu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Instrumentation and Detectors (physics.ins-det)
Arxiv link: https://arxiv.org/abs/2312.17265
Pdf link: https://arxiv.org/pdf/2312.17265
Abstract Muon scattering tomography utilises muons, typically originating from cosmic rays, to image the interiors of dense objects. However, due to the low flux of cosmic ray muons at sea-level and the highly complex interactions that muons display when travelling through matter, existing reconstruction algorithms often suffer from low resolution and high noise. In this work, we develop a novel two-stage deep learning algorithm, $\mu$-Net, consisting of an MLP to predict the muon trajectory and a ConvNeXt-based U-Net to convert the scattering points into voxels. $\mu$-Net achieves a state-of-the-art performance of 17.14 PSNR at the dosage of 1024 muons, outperforming traditional reconstruction algorithms such as the point of closest approach algorithm and maximum likelihood and expectation maximisation algorithm. Furthermore, we find that our method is robust to various corruptions such as inaccuracies in the muon momentum or a limited detector resolution. We also generate and publicly release the first large-scale dataset that maps muon detections to voxels. We hope that our research will spark further investigations into the potential of deep learning to revolutionise this field.
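For readers unfamiliar with the PSNR figure quoted above, the following is a minimal sketch of how peak signal-to-noise ratio is commonly computed between a reference volume and a reconstruction. The flattened-list input, the function name `psnr`, and the default `max_value` of 1.0 are assumptions made for illustration; the abstract does not state the exact definition or voxel range used by the authors.

```python
import math


def psnr(reference, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio (dB) between two equal-length voxel lists.

    max_value is the assumed largest possible voxel intensity.
    """
    if len(reference) != len(reconstruction) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all voxels.
    mse = sum((r - p) ** 2 for r, p in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical volumes
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example: every voxel is off by 0.1 on a [0, 1] intensity scale.
print(psnr([0.0, 1.0], [0.1, 0.9]))
```

Higher values indicate a reconstruction closer to the reference; a uniform error of 0.1 on a unit scale gives roughly 20 dB.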
Anticipated Network Surveillance -- An extrapolated study to predict cyber-attacks using Machine Learning and Data Analytics
Authors: Aviral Srivastava, Dhyan Thakkar, Dr. Sharda Valiveti, Dr. Pooja Shah, Dr. Gaurang Raval
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.17270
Pdf link: https://arxiv.org/pdf/2312.17270
Abstract Machine learning and data mining techniques are utilized to enhance the security of any network. Researchers have used machine learning for pattern detection, anomaly detection, dynamic policy setting, etc. The methods allow the program to learn from data and make decisions without human intervention, consuming a huge training period and computation power. This paper discusses a novel technique to predict an upcoming attack in a network based on several data parameters. The dataset is continuous in real-time implementation. The proposed model comprises dataset pre-processing and training, followed by the testing phase. Based on the results of the testing phase, the best model is selected and used to extract the event class which may lead to an attack. The event statistics are used for attack
Intelligent Parsing: An Automated Parsing Framework for Extracting Design Semantics from E-commerce Creatives
Authors: Guandong Li, Xian Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.17283
Pdf link: https://arxiv.org/pdf/2312.17283
Abstract In the industrial e-commerce landscape, creative designs such as banners and posters are ubiquitous. Extracting structured semantic information from creative e-commerce design materials (manuscripts crafted by designers) to obtain design semantics represents a core challenge in the realm of intelligent design. In this paper, we propose a comprehensive automated framework for intelligently parsing creative materials. This framework comprises material recognition, preprocess, smartname, and label layers. The material recognition layer consolidates various detection and recognition interfaces, covering business aspects including detection of auxiliary areas within creative materials and layer-level detection, alongside label identification. Algorithmically, it encompasses a variety of coarse-to-fine methods such as Cascade RCNN, GFL, and other models. The preprocess layer involves filtering creative layers and grading creative materials. The smartname layer achieves intelligent naming for creative materials, while the label layer covers multi-level tagging for creative materials, enabling tagging at different hierarchical levels. Intelligent parsing constitutes a complete parsing framework that significantly aids downstream processes such as intelligent creation, creative optimization, and material library construction. Within the practical business applications at Suning, it markedly enhances the exposure, circulation, and click-through rates of creative materials, expediting the closed-loop production of creative materials and yielding substantial benefits.
AI Content Self-Detection for Transformer-based Large Language Models
Authors: Antônio Junior Alves Caiado, Michael Hahsler
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.17289
Pdf link: https://arxiv.org/pdf/2312.17289
Abstract The usage of generative artificial intelligence (AI) tools based on large language models, including ChatGPT, Bard, and Claude, for text generation has many exciting applications with the potential for phenomenal productivity gains. One issue is authorship attribution when using AI tools. This is especially important in an academic setting where the inappropriate use of generative AI tools may hinder student learning or stifle research by creating a large amount of automatically generated derivative work. Existing plagiarism detection systems can trace the source of submitted text but are not yet equipped with methods to accurately detect AI-generated text. This paper introduces the idea of direct origin detection and evaluates whether generative AI systems can recognize their output and distinguish it from human-written texts. We argue why current transformer-based models may be able to self-detect their own generated text and perform a small empirical study using zero-shot learning to investigate if that is the case. Results reveal varying capabilities of AI systems to identify their generated text. Google's Bard model exhibits the largest capability of self-detection with an accuracy of 94%, followed by OpenAI's ChatGPT with 83%. On the other hand, Anthropic's Claude model does not seem able to self-detect.
Towards Scalable Generation of Realistic Test Data for Duplicate Detection
Authors: Fabian Panse, Wolfram Wingerath, Benjamin Wollmer
Subjects: Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2312.17324
Pdf link: https://arxiv.org/pdf/2312.17324
Abstract Due to the increasing volume, volatility, and diversity of data in virtually all areas of our lives, the ability to detect duplicates in potentially linked data sources is more important than ever before. However, while research is already intensively engaged in adapting duplicate detection algorithms to the changing circumstances, existing test data generators are still designed for small -- mostly relational -- datasets and can thus fulfill their intended task only to a limited extent. In this report, we present our ongoing research on a novel approach for test data generation that -- in contrast to existing solutions -- is able to produce large test datasets with complex schemas and more realistic error patterns while being easy to use for inexperienced users.
Unmasking information manipulation: A quantitative approach to detecting Copy-pasta, Rewording, and Translation on Social Media
Authors: Manon Richard, Lisa Giordani, Cristian Brokate, Jean Liénard
Subjects: Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2312.17338
Pdf link: https://arxiv.org/pdf/2312.17338
Abstract This study proposes a comprehensive methodology for identifying three techniques utilized in foreign-operated information manipulation campaigns: Copy-Pasta, Rewording, and Translation. Our approach, dubbed the ``$3\Delta$-space duplicate methodology'', quantifies the semantic, grapheme, and language aspects of messages. Computing pairwise distances within these dimensions enables detection of abnormally close messages that are likely part of a coordinated campaign. We validate our approach using a synthetic dataset generated with ChatGPT and DeepL, further applying it to a real-world dataset on Venezuelan actors from Twitter Transparency. Our method successfully identifies all three types of inauthentic duplicates in the synthetic dataset, and is able to uncover inauthentic duplicates across political, commercial, and entertainment contexts in the Twitter dataset. The distinct focus on clustered alterations to messages, rather than individual messages, makes our approach efficient and effective at detecting large-scale instances of textual manipulation, including AI-generated ones. Moreover, our method offers a robust tool for identifying translated content, overlooked in previous research. This research also represents the first comprehensive analysis of copy-pasta detection, providing a reliable technique for tracking duplicate textual content across social networks.
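The idea of computing pairwise distances and flagging abnormally close messages can be sketched for the grapheme dimension alone. This is only an illustration under stated assumptions: the abstract does not say which distance measures or thresholds the authors use, so `difflib.SequenceMatcher` serves here as a stand-in character-level similarity, and the names `grapheme_distance`, `close_pairs`, and the threshold of 0.2 are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def grapheme_distance(a, b):
    """Character-level distance in [0, 1]; 0 means identical strings.

    Stand-in measure (assumption): 1 minus difflib's similarity ratio.
    """
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def close_pairs(messages, threshold=0.2):
    """Return (i, j, distance) for every message pair at or below threshold."""
    flagged = []
    for (i, a), (j, b) in combinations(enumerate(messages), 2):
        d = grapheme_distance(a, b)
        if d <= threshold:
            flagged.append((i, j, d))
    return flagged


messages = [
    "Vote now for change!",
    "Vote now for change!!",        # near copy of the first message
    "The weather is nice today",    # unrelated text
]
for i, j, d in close_pairs(messages):
    print(i, j, round(d, 3))
```

Detecting rewording or translation would need the semantic and language dimensions as well, which require embedding or language-identification models not shown here.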
Can you See me? On the Visibility of NOPs against Android Malware Detectors
Authors: Diego Soi, Davide Maiorca, Giorgio Giacinto, Harel Berger
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2312.17356
Pdf link: https://arxiv.org/pdf/2312.17356
Abstract Android malware still represents the most significant threat to mobile systems. While Machine Learning systems are increasingly used to identify these threats, past studies have revealed that attackers can bypass these detection mechanisms by making subtle changes to Android applications, such as adding specific API calls. These modifications are often referred to as No OPerations (NOP), which ideally should not alter the semantics of the program. However, many NOPs can be spotted and eliminated by refining the app analysis process. This paper proposes a visibility metric that assesses the difficulty in spotting NOPs and similar non-operational codes. We tested our metric on a state-of-the-art, opcode-based deep learning system for Android malware detection. We implemented attacks on the feature and problem spaces and calculated their visibility according to our metric. The attained results show an intriguing trade-off between evasion efficacy and detectability: our metric can be valuable to ensure the real effectiveness of an adversarial attack, also serving as a useful aid to develop better defenses.
Comparing roughness descriptors for distinct terrain surfaces in point cloud data
Authors: Lei Fan, Yang Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2312.17407
Pdf link: https://arxiv.org/pdf/2312.17407
Abstract Terrain surface roughness, often described abstractly, poses challenges in quantitative characterisation with various descriptors found in the literature. This study compares five commonly used roughness descriptors, exploring correlations among their quantified terrain surface roughness maps across three terrains with distinct spatial variations. Additionally, the study investigates the impacts of spatial scales and interpolation methods on these correlations. Dense point cloud data obtained through Light Detection and Ranging technique are used in this study. The findings highlight both global pattern similarities and local pattern distinctions in the derived roughness maps, emphasizing the significance of incorporating multiple descriptors in studies where local roughness values play a crucial role in subsequent analyses. The spatial scales were found to have a smaller impact on rougher terrain, while interpolation methods had minimal influence on roughness maps derived from different descriptors.
Social Bots: Detection and Challenges
Authors: Kai-Cheng Yang, Onur Varol, Alexander C. Nwala, Mohsen Sayyadiharikandeh, Emilio Ferrara, Alessandro Flammini, Filippo Menczer
Subjects: Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2312.17423
Pdf link: https://arxiv.org/pdf/2312.17423
Abstract While social media are a key source of data for computational social science, their ease of manipulation by malicious actors threatens the integrity of online information exchanges and their analysis. In this Chapter, we focus on malicious social bots, a prominent vehicle for such manipulation. We start by discussing recent studies about the presence and actions of social bots in various online discussions to show their real-world implications and the need for detection methods. Then we discuss the challenges of bot detection methods and use Botometer, a publicly available bot detection tool, as a case study to describe recent developments in this area. We close with a practical guide on how to handle social bots in social media research.
ChangeNet: Multi-Temporal Asymmetric Change Detection Dataset
Authors: Deyi Ji, Siqi Gao, Mingyuan Tao, Hongtao Lu, Feng Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.17428
Pdf link: https://arxiv.org/pdf/2312.17428
Abstract Change Detection (CD) has attracted extensive interest with the availability of bi-temporal datasets. However, due to the huge cost of multi-temporal image acquisition and labeling, existing change detection datasets are small in quantity, short in temporal coverage, and low in practicality. Therefore, a large-scale practical-oriented dataset covering wide temporal phases is urgently needed to facilitate the community. To this end, the ChangeNet dataset is presented especially for multi-temporal change detection, along with the new task of ``Asymmetric Change Detection''. Specifically, ChangeNet consists of 31,000 multi-temporal image pairs, a wide range of complex scenes from 100 cities, and 6 pixel-level annotated categories, which is far superior to all the existing change detection datasets including LEVIR-CD, WHU Building CD, etc. In addition, ChangeNet contains considerable amounts of real-world perspective distortion in different temporal phases on the same areas, which is able to promote the practical application of change detection algorithms. The ChangeNet dataset is suitable for both binary change detection (BCD) and semantic change detection (SCD) tasks. Accordingly, we benchmark the ChangeNet dataset on six BCD methods and two SCD methods, and extensive experiments demonstrate its challenges and great significance. The dataset is available at https://github.com/jankyee/ChangeNet.
MVPatch: More Vivid Patch for Adversarial Camouflaged Attacks on Object Detectors in the Physical World
Authors: Zheng Zhou, Hongbo Zhao, Ju Liu, Qiaosheng Zhang, Guangbiao Wang, Chunlei Wang, Wenquan Feng
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.17431
Pdf link: https://arxiv.org/pdf/2312.17431
Abstract Recent research has shown that adversarial patches can manipulate outputs from object detection models. However, the conspicuous patterns on these patches may draw more attention and raise suspicions among humans. Moreover, existing works have primarily focused on the attack performance of individual models and have neglected the generation of adversarial patches for ensemble attacks on multiple object detection models. To tackle these concerns, we propose a novel approach referred to as the More Vivid Patch (MVPatch), which aims to improve the transferability and stealthiness of adversarial patches while considering the limitations observed in prior paradigms, such as easy identification and poor transferability. Our approach incorporates an attack algorithm that decreases object confidence scores of multiple object detectors by using the ensemble attack loss function, thereby enhancing the transferability of adversarial patches. Additionally, we propose a lightweight visual similarity measurement algorithm realized by the Compared Specified Image Similarity (CSS) loss function, which allows for the generation of natural and stealthy adversarial patches without the reliance on additional generative models. Extensive experiments demonstrate that the proposed MVPatch algorithm achieves superior attack transferability compared to similar algorithms in both digital and physical domains, while also exhibiting a more natural appearance. These findings emphasize the remarkable stealthiness and transferability of the proposed MVPatch attack algorithm.
LiDAR Odometry Survey: Recent Advancements and Remaining Challenges
Authors: Dongjae Lee, Minwoo Jung, Wooseong Yang, Ayoung Kim
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2312.17487
Pdf link: https://arxiv.org/pdf/2312.17487
Abstract Odometry is crucial for robot navigation, particularly in situations where global positioning methods like global positioning system (GPS) are unavailable. The main goal of odometry is to predict the robot's motion and accurately determine its current location. Various sensors, such as wheel encoder, inertial measurement unit (IMU), camera, radar, and Light Detection and Ranging (LiDAR), are used for odometry in robotics. LiDAR, in particular, has gained attention for its ability to provide rich three-dimensional (3D) data and immunity to light variations. This survey aims to examine advancements in LiDAR odometry thoroughly. We start by exploring LiDAR technology and then scrutinize LiDAR odometry works, categorizing them based on their sensor integration approaches. These approaches include methods relying solely on LiDAR, those combining LiDAR with IMU, strategies involving multiple LiDARs, and methods fusing LiDAR with other sensor modalities. In conclusion, we address existing challenges and outline potential future directions in LiDAR odometry. Additionally, we analyze public datasets and evaluation methods for LiDAR odometry. To our knowledge, this survey is the first comprehensive exploration of LiDAR odometry.
Operator learning for hyperbolic partial differential equations
Authors: Christopher Wang, Alex Townsend
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.17489
Pdf link: https://arxiv.org/pdf/2312.17489
Abstract We construct the first rigorously justified probabilistic algorithm for recovering the solution operator of a hyperbolic partial differential equation (PDE) in two variables from input-output training pairs. The primary challenge of recovering the solution operator of hyperbolic PDEs is the presence of characteristics, along which the associated Green's function is discontinuous. Therefore, a central component of our algorithm is a rank detection scheme that identifies the approximate location of the characteristics. By combining the randomized singular value decomposition with an adaptive hierarchical partition of the domain, we construct an approximant to the solution operator using $O(\Psi_\epsilon^{-1}\epsilon^{-7}\log(\Xi_\epsilon^{-1}\epsilon^{-1}))$ input-output pairs with relative error $O(\Xi_\epsilon^{-1}\epsilon)$ in the operator norm as $\epsilon\to0$, with high probability. Here, $\Psi_\epsilon$ represents the existence of degenerate singular values of the solution operator, and $\Xi_\epsilon$ measures the quality of the training data. Our assumptions on the regularity of the coefficients of the hyperbolic PDE are relatively weak given that hyperbolic PDEs do not have the ``instantaneous smoothing effect'' of elliptic and parabolic PDEs, and our recovery rate improves as the regularity of the coefficients increases.
HEAP: Unsupervised Object Discovery and Localization with Contrastive Grouping
Authors: Xin Zhang, Jinheng Xie, Yuan Yuan, Michael Bi Mi, Robby T. Tan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.17492
Pdf link: https://arxiv.org/pdf/2312.17492
Abstract Unsupervised object discovery and localization aims to detect or segment objects in an image without any supervision. Recent efforts have demonstrated a notable potential to identify salient foreground objects by utilizing self-supervised transformer features. However, their scopes only build upon patch-level features within an image, neglecting region/image-level and cross-image relationships at a broader scale. Moreover, these methods cannot differentiate various semantics from multiple instances. To address these problems, we introduce Hierarchical mErging framework via contrAstive grouPing (HEAP). Specifically, a novel lightweight head with cross-attention mechanism is designed to adaptively group intra-image patches into semantically coherent regions based on correlation among self-supervised features. Further, to ensure the distinguishability among various regions, we introduce a region-level contrastive clustering loss to pull closer similar regions across images. Also, an image-level contrastive loss is present to push foreground and background representations apart, with which foreground objects and background are accordingly discovered. HEAP facilitates efficient hierarchical image decomposition, which contributes to more accurate object discovery while also enabling differentiation among objects of various classes. Extensive experimental results on semantic segmentation retrieval, unsupervised object discovery, and saliency detection tasks demonstrate that HEAP achieves state-of-the-art performance.
Visual Point Cloud Forecasting enables Scalable Autonomous Driving
Authors: Zetong Yang, Li Chen, Yanan Sun, Hongyang Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.17655
Pdf link: https://arxiv.org/pdf/2312.17655
Abstract In contrast to extensive studies on general vision, pre-training for scalable visual autonomous driving remains seldom explored. Visual autonomous driving applications require features encompassing semantics, 3D geometry, and temporal information simultaneously for joint perception, prediction, and planning, posing dramatic challenges for pre-training. To resolve this, we bring up a new pre-training task termed visual point cloud forecasting - predicting future point clouds from historical visual input. The key merit of this task is that it captures the synergic learning of semantics, 3D structures, and temporal dynamics. Hence it shows superiority in various downstream tasks. To cope with this new problem, we present ViDAR, a general model to pre-train downstream visual encoders. It first extracts historical embeddings by the encoder. These representations are then transformed to 3D geometric space via a novel Latent Rendering operator for future point cloud prediction. Experiments show significant gain in downstream tasks, e.g., 3.1% NDS on 3D detection, ~10% error reduction on motion forecasting, and ~15% less collision rate on planning.
Shape-IoU: More Accurate Metric considering Bounding Box Shape and Scale
Authors: Hao Zhang, Shuaijie Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.17663
Pdf link: https://arxiv.org/pdf/2312.17663
Abstract As an important component of the detector localization branch, bounding box regression loss plays a significant role in object detection tasks. The existing bounding box regression methods usually consider the geometric relationship between the GT box and the predicted box, and calculate the loss by using the relative position and shape of the bounding boxes, while ignoring the influence of inherent properties such as the shape and scale of the bounding boxes on bounding box regression. In order to make up for the shortcomings of existing research, this article proposes a bounding box regression method that focuses on the shape and scale of the bounding box itself. Firstly, we analyzed the regression characteristics of the bounding boxes and found that the shape and scale factors of the bounding boxes themselves will have an impact on the regression results. Based on the above conclusions, we propose the Shape IoU method, which can calculate the loss by focusing on the shape and scale of the bounding box itself, thereby making the bounding box regression more accurate. Finally, we validated our method through a large number of comparative experiments, which showed that our method can effectively improve detection performance and outperform existing methods, achieving state-of-the-art performance in different detection tasks. Code is available at https://github.com/malagoutou/Shape-IoU
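To make the scale effect mentioned above concrete, here is a minimal sketch of the plain intersection-over-union (IoU) measure that such losses build on. This is not the Shape-IoU formulation, which the abstract does not spell out; the `(x1, y1, x2, y2)` box convention and the function name `iou` are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Plain IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# The same 1-unit horizontal shift applied to a small and a large box.
small = iou((0, 0, 4, 4), (1, 0, 5, 4))
large = iou((0, 0, 40, 40), (1, 0, 41, 40))
print(small, large)
```

The identical offset costs the 4x4 box far more IoU (about 0.6) than the 40x40 box (about 0.95), which is one way the scale of a box influences a regression signal based on IoU.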
Data Augmentation for Supervised Graph Outlier Detection with Latent Diffusion Models
Authors: Kay Liu, Hengrui Zhang, Ziqing Hu, Fangxin Wang, Philip S. Yu
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2312.17679
Pdf link: https://arxiv.org/pdf/2312.17679
Abstract Graph outlier detection is a prominent task of research and application in the realm of graph neural networks. It identifies the outlier nodes that exhibit deviation from the majority in the graph. One of the fundamental challenges confronting supervised graph outlier detection algorithms is the prevalent issue of class imbalance, where the scarcity of outlier instances compared to normal instances often results in suboptimal performance. Conventional methods mitigate the imbalance by reweighting instances in the estimation of the loss function, assigning higher weights to outliers and lower weights to inliers. Nonetheless, these strategies are prone to overfitting and underfitting, respectively. Recently, generative models, especially diffusion models, have demonstrated their efficacy in synthesizing high-fidelity images. Despite their extraordinary generation quality, their potential in data augmentation for supervised graph outlier detection remains largely underexplored. To bridge this gap, we introduce GODM, a novel data augmentation for mitigating class imbalance in supervised Graph Outlier detection with latent Diffusion Models. Specifically, our proposed method consists of three key components: (1) Variational Encoder maps the heterogeneous information inherent within the graph data into a unified latent space, (2) Graph Generator synthesizes graph data that are statistically similar to real outliers from latent space, and (3) Latent Diffusion Model learns the latent space distribution of real organic data by iterative denoising. Extensive experiments conducted on multiple datasets substantiate the effectiveness and efficiency of GODM. The case study further demonstrated the generation quality of our synthetic data. To foster accessibility and reproducibility, we encapsulate GODM into a plug-and-play package and release it at the Python Package Index (PyPI).
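The conventional reweighting baseline described above (not GODM itself) can be sketched as a class-weighted binary cross-entropy. The function name `weighted_bce`, the label convention (1 for outlier, 0 for inlier), and the example weights are assumptions for illustration only.

```python
import math


def weighted_bce(labels, probs, outlier_weight=1.0, inlier_weight=1.0):
    """Class-weighted binary cross-entropy, normalised by the total weight.

    labels: 1 for outlier, 0 for inlier (assumed convention).
    probs:  predicted probability that each node is an outlier.
    """
    eps = 1e-12  # keeps log() away from zero
    total, weight_sum = 0.0, 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        w = outlier_weight if y == 1 else inlier_weight
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weight_sum += w
    return total / weight_sum


# One missed outlier and one correctly scored inlier.
labels, probs = [1, 0], [0.1, 0.1]
print(weighted_bce(labels, probs))                      # equal weights
print(weighted_bce(labels, probs, outlier_weight=9.0))  # outlier up-weighted
```

Raising `outlier_weight` makes the missed outlier dominate the loss, which is the mechanism the abstract says can lead to overfitting on the few outlier instances.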
Malware Detection in IOT Systems Using Machine Learning Techniques
Authors: Ali Mehrban, Pegah Ahadian
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2312.17683
Pdf link: https://arxiv.org/pdf/2312.17683
Abstract Malware detection in IoT environments necessitates robust methodologies. This study introduces a CNN-LSTM hybrid model for IoT malware identification and evaluates its performance against established methods. Leveraging K-fold cross-validation, the proposed approach achieved 95.5% accuracy, surpassing existing methods. The CNN algorithm enabled superior learning model construction, and the LSTM classifier exhibited heightened accuracy in classification. Comparative analysis against prevalent techniques demonstrated the efficacy of the proposed model, highlighting its potential for enhancing IoT security. The study advocates for future exploration of SVMs as alternatives, emphasizes the need for distributed detection strategies, and underscores the importance of predictive analyses for more powerful IoT security. This research serves as a platform for developing more resilient security measures in IoT ecosystems.
Multiscale Vision Transformers meet Bipartite Matching for efficient single-stage Action Localization
Authors: Ioanna Ntinou, Enrique Sanchez, Georgios Tzimiropoulos
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.17686
Pdf link: https://arxiv.org/pdf/2312.17686
Abstract Action Localization is a challenging problem that combines detection and recognition tasks, which are often addressed separately. State-of-the-art methods rely on off-the-shelf bounding box detections pre-computed at high resolution and propose transformer models that focus on the classification task alone. Such two-stage solutions are prohibitive for real-time deployment. On the other hand, single-stage methods target both tasks by devoting part of the network (generally the backbone) to sharing the majority of the workload, compromising performance for speed. These methods build on adding a DETR head with learnable queries that, after cross- and self-attention, can be sent to corresponding MLPs for detecting a person's bounding box and action. However, DETR-like architectures are challenging to train and can incur high complexity. In this paper, we observe that a straight bipartite matching loss can be applied to the output tokens of a vision transformer. This results in a backbone + MLP architecture that can do both tasks without the need for an extra encoder-decoder head and learnable queries. We show that a single MViT-S architecture trained with bipartite matching to perform both tasks surpasses the same MViT-S when trained with RoI align on pre-computed bounding boxes. With a careful design of token pooling and the proposed training pipeline, our MViTv2-S model achieves +3 mAP on AVA2.2 w.r.t. the two-stage counterpart. Code and models will be released after paper revision.
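The bipartite matching step mentioned above pairs each ground-truth instance with exactly one distinct output token so that the summed matching cost is minimal. The sketch below does this by brute force over a tiny cost matrix, purely for illustration; practical implementations typically use the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`). The cost values and the function name `bipartite_match` are hypothetical, and the abstract does not specify how the authors define the cost.

```python
from itertools import permutations


def bipartite_match(cost):
    """Return (pairs, total) minimising the summed assignment cost.

    cost[i][j] is the (hypothetical) cost of assigning output token i to
    ground-truth instance j. Needs at least as many tokens as instances.
    """
    if not cost or not cost[0]:
        return [], 0.0
    n_tokens, n_targets = len(cost), len(cost[0])
    if n_tokens < n_targets:
        raise ValueError("need at least as many tokens as targets")
    best_total, best_assignment = float("inf"), None
    # Try every way of giving each target its own distinct token.
    for tokens in permutations(range(n_tokens), n_targets):
        total = sum(cost[t][j] for j, t in enumerate(tokens))
        if total < best_total:
            best_total, best_assignment = total, tokens
    pairs = sorted((t, j) for j, t in enumerate(best_assignment))
    return pairs, best_total


# Three output tokens, two ground-truth instances.
cost = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.5],
]
pairs, total = bipartite_match(cost)
print(pairs, total)
```

Here token 0 is matched to instance 1 and token 1 to instance 0, while token 2 stays unmatched; in a detection setting, unmatched tokens are usually trained to predict "no object".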
TuPy-E: detecting hate speech in Brazilian Portuguese social media with a novel dataset and comprehensive analysis of models
Authors: Authors: Felipe Oliveira, Victoria Reis, Nelson Ebecken
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2312.17704
Pdf link: https://arxiv.org/pdf/2312.17704
Abstract Social media has become integral to human interaction, providing a platform for communication and expression. However, the rise of hate speech on these platforms poses significant risks to individuals and communities. Detecting and addressing hate speech is particularly challenging in languages like Portuguese due to its rich vocabulary, complex grammar, and regional variations. To address this, we introduce TuPy-E, the largest annotated Portuguese corpus for hate speech detection. TuPy-E leverages an open-source approach, fostering collaboration within the research community. We conduct a detailed analysis using advanced techniques like BERT models, contributing to both academic understanding and practical applications.
Comparing Effectiveness and Efficiency of Interactive Application Security Testing (IAST) and Runtime Application Self-Protection (RASP) Tools in a Large Java-based System
Authors: Authors: Aishwarya Seth, Saikath Bhattacharya, Sarah Elder, Nusrat Zahan, Laurie Williams
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2312.17726
Pdf link: https://arxiv.org/pdf/2312.17726
Abstract Security resources are scarce, and practitioners need guidance in the effective and efficient usage of techniques and tools available in the cybersecurity industry. Two emerging tool types, Interactive Application Security Testing (IAST) and Runtime Application Self-Protection (RASP), have not been thoroughly evaluated against well-established counterparts such as Dynamic Application Security Testing (DAST) and Static Application Security Testing (SAST). The goal of this research is to aid practitioners in making informed choices about the use of Interactive Application Security Testing (IAST) and Runtime Application Self-Protection (RASP) tools through an analysis of their effectiveness and efficiency in comparison with different vulnerability detection and prevention techniques and tools. We apply IAST and RASP on OpenMRS, an open-source Java-based online application. We compare the efficiency and effectiveness of IAST and RASP with techniques applied on OpenMRS in prior work. We measure efficiency and effectiveness in terms of the number and type of vulnerabilities detected and prevented per hour. Our study shows IAST performed relatively well compared to other techniques, performing second-best in both efficiency and effectiveness. IAST detected eight Top-10 OWASP security risks compared to nine by SMPT and seven for EMPT, DAST, and SAST. IAST found more vulnerabilities than SMPT. The efficiency of IAST (2.14 VpH) is second to only EMPT (2.22 VpH). These findings imply that our study benefited from using IAST when conducting black-box security testing. In the context of a large, enterprise-scale web application such as OpenMRS, RASP does not replace vulnerability detection, while IAST is a powerful tool that complements other techniques.
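The efficiency metric quoted above, vulnerabilities per hour (VpH), is a simple ratio. A minimal sketch follows; the function name and the input numbers are made up for illustration and are not figures from the study.

```python
def vulnerabilities_per_hour(n_vulnerabilities, hours):
    """Efficiency as the number of vulnerabilities found per hour of effort."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return n_vulnerabilities / hours


# Hypothetical numbers, not from the paper: 30 findings in 12 hours of work.
example_vph = vulnerabilities_per_hour(30, 12)
```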
MURP: Multi-Agent Ultra-Wideband Relative Pose Estimation with Constrained Communications in 3D Environments
Authors: Authors: Andrew Fishberg, Brian Quiter, Jonathan P. How
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2312.17731
Pdf link: https://arxiv.org/pdf/2312.17731
Abstract Inter-agent relative localization is critical for many multi-robot systems operating in the absence of external positioning infrastructure or prior environmental knowledge. We propose a novel inter-agent relative 3D pose estimation system where each participating agent is equipped with several ultra-wideband (UWB) ranging tags. Prior work typically supplements noisy UWB range measurements with additional continuously transmitted data, such as odometry, leading to potential scaling issues with increased team size and/or decreased communication network capability. By equipping each agent with multiple UWB antennas, our approach addresses these concerns by using only locally collected UWB range measurements, a priori state constraints, and detections of when said constraints are violated. Leveraging our learned mean ranging bias correction, we gain a 19% positional error improvement giving us experimental mean absolute position and heading errors of 0.24m and 9.5 degrees respectively. When compared to other state-of-the-art approaches, our work demonstrates improved performance over similar systems, while remaining competitive with methods that have significantly higher communication costs. Additionally, we make our datasets available.
Keyword: face recognition

QGFace: Quality-Guided Joint Training For Mixed-Quality Face Recognition
Authors: Authors: Youzhe Song, Feng Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2312.17494
Pdf link: https://arxiv.org/pdf/2312.17494
Abstract The quality of a face crop in an image is decided by many factors such as camera resolution, distance, and illumination condition. This makes the discrimination of face images with different qualities a challenging problem in realistic applications. However, most existing approaches are designed specifically for high-quality (HQ) or low-quality (LQ) images, and their performance degrades on mixed-quality images. Besides, many methods ask for pre-trained feature extractors or other auxiliary structures to support the training and the evaluation. In this paper, we point out that the key to better understanding both the HQ and the LQ images simultaneously is to apply different learning methods according to their qualities. We propose a novel quality-guided joint training approach for mixed-quality face recognition, which could simultaneously learn the images of different qualities with a single encoder. Based on quality partition, a classification-based method is employed for HQ data learning. Meanwhile, for the LQ images which lack identity information, we learn them with self-supervised image-image contrastive learning. To keep up with model updates and improve the discriminability of contrastive learning in our joint training scenario, we further propose a proxy-updated real-time queue to compose the contrastive pairs with features from the genuine encoder. Experiments on the low-quality datasets SCface and Tinyface, the mixed-quality dataset IJB-B, and five high-quality datasets demonstrate the effectiveness of our proposed approach in recognizing face images of different qualities.
Keyword: augmentation

Towards Mitigating Dimensional Collapse of Representations in Collaborative Filtering
Authors: Authors: Huiyuan Chen, Vivian Lai, Hongye Jin, Zhimeng Jiang, Mahashweta Das, Xia Hu
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2312.17468
Pdf link: https://arxiv.org/pdf/2312.17468
Abstract Contrastive Learning (CL) has shown promising performance in collaborative filtering. The key idea is to generate augmentation-invariant embeddings by maximizing the Mutual Information between different augmented views of the same instance. However, we empirically observe that existing CL models suffer from the dimensional collapse issue, where user/item embeddings only span a low-dimensional subspace of the entire feature space. This suppresses other dimensional information and weakens the distinguishability of embeddings. Here we propose a non-contrastive learning objective, named nCL, which explicitly mitigates dimensional collapse of representations in collaborative filtering. Our nCL aims to achieve geometric properties of Alignment and Compactness on the embedding space. In particular, the alignment tries to push together representations of positive-related user-item pairs, while compactness tends to find the optimal coding length of user/item embeddings, subject to a given distortion. More importantly, our nCL requires neither data augmentation nor negative sampling during training, making it scalable to large datasets. Experimental results demonstrate the superiority of our nCL.
FerKD: Surgical Label Adaptation for Efficient Distillation
Authors: Authors: Zhiqiang Shen
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2312.17473
Pdf link: https://arxiv.org/pdf/2312.17473
Abstract We present FerKD, a novel efficient knowledge distillation framework that incorporates partial soft-hard label adaptation coupled with a region-calibration mechanism. Our approach stems from the observation and intuition that standard data augmentations, such as RandomResizedCrop, tend to transform inputs into diverse conditions: easy positives, hard positives, or hard negatives. In traditional distillation frameworks, these transformed samples are utilized equally through their predictive probabilities derived from pretrained teacher models. However, merely relying on prediction values from a pretrained teacher, a common practice in prior studies, neglects the reliability of these soft label predictions. To address this, we propose a new scheme that calibrates the less-confident regions to be the context using softened hard ground-truth labels. Our approach involves the processes of hard-region mining and calibration. We demonstrate empirically that this method can dramatically improve the convergence speed and final accuracy. Additionally, we find that a consistent mixing strategy can stabilize the distributions of soft supervision, taking advantage of the soft labels. As a result, we introduce a stabilized SelfMix augmentation that weakens the variation of the mixed images and corresponding soft labels through mixing similar regions within the same image. FerKD is an intuitive and well-designed learning system that eliminates several heuristics and hyperparameters in the former FKD solution. More importantly, it achieves remarkable improvement on ImageNet-1K and downstream tasks. For instance, FerKD achieves 81.2% on ImageNet-1K with ResNet-50, outperforming FKD and FunMatch by remarkable margins. Leveraging better pre-trained weights and larger architectures, our finetuned ViT-G14 even achieves 89.9%. Our code is available at https://github.com/szq0214/FKD/tree/main/FerKD.
HiBid: A Cross-Channel Constrained Bidding System with Budget Allocation by Hierarchical Offline Deep Reinforcement Learning
Authors: Authors: Hao Wang, Bo Tang, Chi Harold Liu, Shangqin Mao, Jiahong Zhou, Zipeng Dai, Yaqi Sun, Qianlong Xie, Xingxing Wang, Dong Wang
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT)
Arxiv link: https://arxiv.org/abs/2312.17503
Pdf link: https://arxiv.org/pdf/2312.17503
Abstract Online display advertising platforms service numerous advertisers by providing real-time bidding (RTB) for the scale of billions of ad requests every day. The bidding strategy handles ad requests across multiple channels to maximize the number of clicks under the set financial constraints, i.e., total budget and cost-per-click (CPC), etc. Different from existing works mainly focusing on single channel bidding, we explicitly consider cross-channel constrained bidding with budget allocation. Specifically, we propose a hierarchical offline deep reinforcement learning (DRL) framework called "HiBid", consisting of a high-level planner equipped with auxiliary loss for non-competitive budget allocation, and a data augmentation enhanced low-level executor for adaptive bidding strategy in response to allocated budgets. Additionally, a CPC-guided action selection mechanism is introduced to satisfy the cross-channel CPC constraint. Through extensive experiments on both the large-scale log data and online A/B testing, we confirm that HiBid outperforms six baselines in terms of the number of clicks, CPC satisfactory ratio, and return-on-investment (ROI). We also deploy HiBid on the Meituan advertising platform, where it already services tens of thousands of advertisers every day.
Distance Guided Generative Adversarial Network for Explainable Binary Classifications
Authors: Authors: Xiangyu Xiong, Yue Sun, Xiaohong Liu, Wei Ke, Chan-Tong Lam, Jiangang Chen, Mingfeng Jiang, Mingwei Wang, Hui Xie, Tong Tong, Qinquan Gao, Hao Chen, Tao Tan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2312.17538
Pdf link: https://arxiv.org/pdf/2312.17538
Abstract Despite the potential benefits of data augmentation for mitigating the data insufficiency, traditional augmentation methods primarily rely on the prior intra-domain knowledge. On the other hand, advanced generative adversarial networks (GANs) generate inter-domain samples with limited variety. These previous methods make limited contributions to describing the decision boundaries for binary classification. In this paper, we propose a distance guided GAN (DisGAN) which controls the variation degrees of generated samples in the hyperplane space. Specifically, we instantiate the idea of DisGAN by combining two ways. The first way is vertical distance GAN (VerDisGAN) where the inter-domain generation is conditioned on the vertical distances. The second way is horizontal distance GAN (HorDisGAN) where the intra-domain generation is conditioned on the horizontal distances. Furthermore, VerDisGAN can produce the class-specific regions by mapping the source images to the hyperplane. Experimental results show that DisGAN consistently outperforms the GAN-based augmentation methods with explainable binary classification. The proposed method can apply to different classification architectures and has potential to extend to multi-class classification.
The Tyranny of Possibilities in the Design of Task-Oriented LLM Systems: A Scoping Survey
Authors: Authors: Dhruv Dhamani, Mary Lou Maher
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.17601
Pdf link: https://arxiv.org/pdf/2312.17601
Abstract This scoping survey focuses on our current understanding of the design space for task-oriented LLM systems and elaborates on definitions and relationships among the available design parameters. The paper begins by defining a minimal task-oriented LLM system and exploring the design space of such systems through a thought experiment contemplating the performance of diverse LLM system configurations (involving single LLMs, single LLM-based agents, and multiple LLM-based agent systems) on a complex software development task and hypothesizes the results. We discuss a pattern in our results and formulate them into three conjectures. While these conjectures may be partly based on faulty assumptions, they provide a starting point for future research. The paper then surveys a select few design parameters: covering and organizing research in LLM augmentation, prompting techniques, and uncertainty estimation, and discussing their significance. The paper notes the lack of focus on computational and energy efficiency in evaluating research in these areas. Our survey findings provide a basis for developing the concept of linear and non-linear contexts, which we define and use to enable an agent-centric projection of prompting techniques providing a lens through which prompting techniques can be viewed as multi-agent systems. The paper discusses the implications of this lens, for the cross-pollination of research between LLM prompting and LLM-based multi-agent systems; and also, for the generation of synthetic training data based on existing prompting techniques in research. In all, the scoping survey presents seven conjectures that can help guide future research efforts.
Data Augmentation for Supervised Graph Outlier Detection with Latent Diffusion Models
Authors: Authors: Kay Liu, Hengrui Zhang, Ziqing Hu, Fangxin Wang, Philip S. Yu
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2312.17679
Pdf link: https://arxiv.org/pdf/2312.17679
Abstract Graph outlier detection is a prominent task of research and application in the realm of graph neural networks. It identifies the outlier nodes that exhibit deviation from the majority in the graph. One of the fundamental challenges confronting supervised graph outlier detection algorithms is the prevalent issue of class imbalance, where the scarcity of outlier instances compared to normal instances often results in suboptimal performance. Conventional methods mitigate the imbalance by reweighting instances in the estimation of the loss function, assigning higher weights to outliers and lower weights to inliers. Nonetheless, these strategies are prone to overfitting and underfitting, respectively. Recently, generative models, especially diffusion models, have demonstrated their efficacy in synthesizing high-fidelity images. Despite their extraordinary generation quality, their potential in data augmentation for supervised graph outlier detection remains largely underexplored. To bridge this gap, we introduce GODM, a novel data augmentation for mitigating class imbalance in supervised Graph Outlier detection with latent Diffusion Models. Specifically, our proposed method consists of three key components: (1) a Variational Encoder maps the heterogeneous information inherent within the graph data into a unified latent space; (2) a Graph Generator synthesizes graph data that are statistically similar to real outliers from latent space; and (3) a Latent Diffusion Model learns the latent space distribution of real organic data by iterative denoising. Extensive experiments conducted on multiple datasets substantiate the effectiveness and efficiency of GODM. The case study further demonstrated the generation quality of our synthetic data. To foster accessibility and reproducibility, we encapsulate GODM into a plug-and-play package and release it at the Python Package Index (PyPI).
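The conventional reweighting baseline that the abstract contrasts with GODM can be sketched as follows. This is not GODM itself: it assumes inverse-frequency class weights, one common choice that the abstract does not specify, and the function names and toy labels are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n / (num_classes * class_count).

    Rare classes (outliers) receive larger weights than common ones (inliers).
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def weighted_mean_loss(losses, labels, weights):
    """Average per-node losses, scaling each by its class weight."""
    total_weight = sum(weights[y] for y in labels)
    return sum(weights[y] * l for l, y in zip(losses, labels)) / total_weight


# Hypothetical toy graph: nine inlier nodes (label 0) and one outlier (label 1).
labels = [0] * 9 + [1]
weights = inverse_frequency_weights(labels)
loss = weighted_mean_loss([0.1] * 9 + [2.0], labels, weights)
```

With these weights the single outlier contributes as much to the loss as all nine inliers together, which is the overfitting risk the abstract points to.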

LeeKyungwook / get-arxiv-noti

New submissions for Mon, 1 Jan 24 #913
