New submissions for Thu, 11 Jan 24

Keyword: detection

Exploring Attack Resilience in Distributed Platoon Controllers with Model Predictive Control

Authors: Authors: Tashfique Hasnine Choudhury
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.04736
Pdf link: https://arxiv.org/pdf/2401.04736
Abstract The extensive use of distributed vehicle platoon controllers has resulted in several benefits for transportation systems, such as increased traffic flow, fuel efficiency, and decreased pollution. The rising reliance on interconnected systems and communication networks, on the other hand, exposes these controllers to potential cyber-attacks, which may compromise their safety and functionality. This thesis aims to improve the security of distributed vehicle platoon controllers by investigating attack scenarios and assessing their influence on system performance. Various attack techniques, including man-in-the-middle (MITM) and false data injection (FDI), are simulated using Model Predictive Control (MPC) controller to identify vulnerabilities and weaknesses of the platoon controller. Countermeasures are offered and tested, that includes attack analysis and reinforced communication protocols using Machine Learning techniques for detection. The findings emphasize the significance of integrating security issues into their design and implementation, which helps to construct safe and resilient distributed platoon controllers.
LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection
Authors: Authors: Hongcheng Guo, Jian Yang, Jiaheng Liu, Jiaqi Bai, Boyang Wang, Zhoujun Li, Tieqiao Zheng, Bo Zhang, Junran peng, Qi Tian
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2401.04749
Pdf link: https://arxiv.org/pdf/2401.04749
Abstract Log anomaly detection is a key component in the field of artificial intelligence for IT operations (AIOps). Considering log data of variant domains, retraining the whole network for unknown domains is inefficient in real industrial scenarios. However, previous deep models merely focused on extracting the semantics of log sequences in the same domain, leading to poor generalization on multi-domain logs. To alleviate this issue, we propose a unified Transformer-based framework for Log anomaly detection (LogFormer) to improve the generalization ability across different domains, where we establish a two-stage process including the pre-training and adapter-based tuning stage. Specifically, our model is first pre-trained on the source domain to obtain shared semantic knowledge of log data. Then, we transfer such knowledge to the target domain via shared parameters. Besides, the Log-Attention module is proposed to supplement the information ignored by the log-paring. The proposed method is evaluated on three public and one real-world datasets. Experimental results on multiple benchmarks demonstrate the effectiveness of our LogFormer with fewer trainable parameters and lower training costs.
Network Layout Algorithm with Covariate Smoothing
Authors: Authors: Octavious Smiley, Till Hoffmann, Jukka-Pekka Onnela
Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph); Methodology (stat.ME)
Arxiv link: https://arxiv.org/abs/2401.04771
Pdf link: https://arxiv.org/pdf/2401.04771
Abstract Network science explores intricate connections among objects, employed in diverse domains like social interactions, fraud detection, and disease spread. Visualization of networks facilitates conceptualizing research questions and forming scientific hypotheses. Networks, as mathematical high-dimensional objects, require dimensionality reduction for (planar) visualization. Visualizing empirical networks present additional challenges. They often contain false positive (spurious) and false negative (missing) edges. Traditional visualization methods don't account for errors in observation, potentially biasing interpretations. Moreover, contemporary network data includes rich nodal attributes. However, traditional methods neglect these attributes when computing node locations. Our visualization approach aims to leverage nodal attribute richness to compensate for network data limitations. We employ a statistical model estimating the probability of edge connections between nodes based on their covariates. We enhance the Fruchterman-Reingold algorithm to incorporate estimated dyad connection probabilities, allowing practitioners to balance reliance on observed versus estimated edges. We explore optimal smoothing levels, offering a natural way to include relevant nodal information in layouts. Results demonstrate the effectiveness of our method in achieving robust network visualization, providing insights for improved analysis.
Phishing Website Detection through Multi-Model Analysis of HTML Content
Authors: Authors: Furkan Çolhak, Mert İlhan Ecevit, Bilal Emir Uçar, Reiner Creutzburg, Hasan Dağ
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.04820
Pdf link: https://arxiv.org/pdf/2401.04820
Abstract The way we communicate and work has changed significantly with the rise of the Internet. While it has opened up new opportunities, it has also brought about an increase in cyber threats. One common and serious threat is phishing, where cybercriminals employ deceptive methods to steal sensitive information.This study addresses the pressing issue of phishing by introducing an advanced detection model that meticulously focuses on HTML content. Our proposed approach integrates a specialized Multi-Layer Perceptron (MLP) model for structured tabular data and two pretrained Natural Language Processing (NLP) models for analyzing textual features such as page titles and content. The embeddings from these models are harmoniously combined through a novel fusion process. The resulting fused embeddings are then input into a linear classifier. Recognizing the scarcity of recent datasets for comprehensive phishing research, our contribution extends to the creation of an up-to-date dataset, which we openly share with the community. The dataset is meticulously curated to reflect real-life phishing conditions, ensuring relevance and applicability. The research findings highlight the effectiveness of the proposed approach, with the CANINE demonstrating superior performance in analyzing page titles and the RoBERTa excelling in evaluating page content. The fusion of two NLP and one MLP model,termed MultiText-LP, achieves impressive results, yielding a 96.80 F1 score and a 97.18 accuracy score on our research dataset. Furthermore, our approach outperforms existing methods on the CatchPhish HTML dataset, showcasing its efficacies.
The site linkage spectrum of data arrays
Authors: Authors: Christopher Barrett, Andrei Bura, Fenix Huang, Christian Reidys
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2401.04827
Pdf link: https://arxiv.org/pdf/2401.04827
Abstract A new perspective is introduced regarding the analysis of Multiple Sequence Alignments (MSA), representing aligned data defined over a finite alphabet of symbols. The framework is designed to produce a block decomposition of an MSA, where each block is comprised of sequences exhibiting a certain site-coherence. The key component of this framework is an information theoretical potential defined on pairs of sites (links) within the MSA. This potential quantifies the expected drop in variation of information between the two constituent sites, where the expectation is taken with respect to all possible sub-alignments, obtained by removing a finite, fixed collection of rows. It is proved that the potential is zero for linked sites representing columns, whose symbols are in bijective correspondence and it is strictly positive, otherwise. It is furthermore shown that the potential assumes its unique minimum for links at which each symbol pair appears with the same multiplicity. Finally, an application is presented regarding anomaly detection in an MSA, composed of inverse fold solutions of a fixed tRNA secondary structure, where the anomalies are represented by inverse fold solutions of a different RNA structure.
T-PRIME: Transformer-based Protocol Identification for Machine-learning at the Edge
Authors: Authors: Mauro Belgiovine, Joshua Groen, Miquel Sirera, Chinenye Tassie, Ayberk Yarkin Yildiz, Sage Trudeau, Stratis Ioannidis, Kaushik Chowdhury
Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2401.04837
Pdf link: https://arxiv.org/pdf/2401.04837
Abstract Spectrum sharing allows different protocols of the same standard (e.g., 802.11 family) or different standards (e.g., LTE and DVB) to coexist in overlapping frequency bands. As this paradigm continues to spread, wireless systems must also evolve to identify active transmitters and unauthorized waveforms in real time under intentional distortion of preambles, extremely low signal-to-noise ratios and challenging channel conditions. We overcome limitations of correlation-based preamble matching methods in such conditions through the design of T-PRIME: a Transformer-based machine learning approach. T-PRIME learns the structural design of transmitted frames through its attention mechanism, looking at sequence patterns that go beyond the preamble alone. The paper makes three contributions: First, it compares Transformer models and demonstrates their superiority over traditional methods and state-of-the-art neural networks. Second, it rigorously analyzes T-PRIME's real-time feasibility on DeepWave's AIR-T platform. Third, it utilizes an extensive 66 GB dataset of over-the-air (OTA) WiFi transmissions for training, which is released along with the code for community use. Results reveal nearly perfect (i.e. $>98\%$) classification accuracy under simulated scenarios, showing $100\%$ detection improvement over legacy methods in low SNR ranges, $97\%$ classification accuracy for OTA single-protocol transmissions and up to $75\%$ double-protocol classification accuracy in interference scenarios.
On Achieving High-Fidelity Grant-free Non-Orthogonal Multiple Access
Authors: Authors: Haoran Mei, Limei Peng, Pin-Han Ho
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2401.04908
Pdf link: https://arxiv.org/pdf/2401.04908
Abstract Grant-free access (GFA) has been envisioned to play an active role in massive Machine Type Communication (mMTC) under 5G and Beyond mobile systems, which targets at achieving significant reduction of signaling overhead and access latency in the presence of sporadic traffic and small-size data. The paper focuses on a novel K-repetition GFA (K-GFA) scheme by incorporating Reed-Solomon (RS) code with the contention resolution diversity slotted ALOHA (CRDSA), aiming to achieve high-reliability and low-latency access in the presence of massive uncoordinated MTC devices (MTCDs). We firstly defines a MAC layer transmission structure at each MTCD for supporting message-level RS coding on a data message of $Q$ packets, where a RS code of $KQ$ packets is generated and sent in a super time frame (STF) that is composed of $Q$ time frames. The access point (AP) can recover the original $Q$ packets of the data message if at least $Q$ out of the $KQ$ packets of the RS code are successfully received. The AP buffers the received MTCD signals of each resource block (RB) within an STF and exercises the CRDSA based multi-user detection (MUD) by exploring signal-level inter-RB correlation via iterative interference cancellation (IIC). With the proposed CRDSA based K-GFA scheme, we provide the complexity analysis, and derive a closed-form analytical model on the access probability for each MTCD as well as its simplified approximate form. Extensive numerical experiments are conducted to validate its effectiveness on the proposed CRDSA based K-GFA scheme and gain deep understanding on its performance regarding various key operational parameters.
Rethinking Test-time Likelihood: The Likelihood Path Principle and Its Application to OOD Detection
Authors: Authors: Sicong Huang, Jiawei He, Kry Yik Chau Lui
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2401.04933
Pdf link: https://arxiv.org/pdf/2401.04933
Abstract While likelihood is attractive in theory, its estimates by deep generative models (DGMs) are often broken in practice, and perform poorly for out of distribution (OOD) Detection. Various recent works started to consider alternative scores and achieved better performances. However, such recipes do not come with provable guarantees, nor is it clear that their choices extract sufficient information. We attempt to change this by conducting a case study on variational autoencoders (VAEs). First, we introduce the likelihood path (LPath) principle, generalizing the likelihood principle. This narrows the search for informative summary statistics down to the minimal sufficient statistics of VAEs' conditional likelihoods. Second, introducing new theoretic tools such as nearly essential support, essential distance and co-Lipschitzness, we obtain non-asymptotic provable OOD detection guarantees for certain distillation of the minimal sufficient statistics. The corresponding LPath algorithm demonstrates SOTA performances, even using simple and small VAEs with poor likelihood estimates. To our best knowledge, this is the first provable unsupervised OOD method that delivers excellent empirical results, better than any other VAEs based techniques. We use the same model as \cite{xiao2020likelihood}, open sourced from: https://github.com/XavierXiao/Likelihood-Regret
FBSDetector: Fake Base Station and Multi Step Attack Detection in Cellular Networks using Machine Learning
Authors: Authors: Kazi Samin Mubasshir, Imtiaz Karim, Elisa Bertino
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2401.04958
Pdf link: https://arxiv.org/pdf/2401.04958
Abstract Fake base stations (FBSes) pose a significant security threat by impersonating legitimate base stations. Though efforts have been made to defeat this threat, up to this day, the presence of FBSes and the multi-step attacks (MSAs) stemming from them can lead to unauthorized surveillance, interception of sensitive information, and disruption of network services for legitimate users. Therefore, detecting these malicious entities is crucial to ensure the security and reliability of cellular networks. Traditional detection methods often rely on additional hardware, predefined rules, signal scanning, changing protocol specifications, or cryptographic mechanisms that have limitations and incur huge infrastructure costs in accurately identifying FBSes. In this paper, we develop FBSDetector-an effective and efficient detection solution that can reliably detect FBSes and MSAs from layer-3 network traces using machine learning (ML) at the user equipment (UE) side. To develop FBSDetector, we created FBSAD and MSAD, the first-ever high-quality and large-scale datasets for training machine learning models capable of detecting FBSes and MSAs. These datasets capture the network traces in different real-world cellular network scenarios (including mobility and different attacker capabilities) incorporating legitimate base stations and FBSes. The combined network trace has a volume of 6.6 GB containing 751963 packets. Our novel ML models, specially designed to detect FBSes and MSAs, can effectively detect FBSes with an accuracy of 92% and a false positive rate of 5.96% and recognize MSAs with an accuracy of 86% and a false positive rate of 7.82%. We deploy FBSDetector as a real-world solution to protect end-users through an Android app and validate in a controlled lab environment. Compared to the existing solutions that fail to detect FBSes, FBSDetector can detect FBSes in the wild in real time.
ECC-PolypDet: Enhanced CenterNet with Contrastive Learning for Automatic Polyp Detection
Authors: Authors: Yuncheng Jiang, Zixun Zhang, Yiwen Hu, Guanbin Li, Xiang Wan, Song Wu, Shuguang Cui, Silin Huang, Zhen Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.04961
Pdf link: https://arxiv.org/pdf/2401.04961
Abstract Accurate polyp detection is critical for early colorectal cancer diagnosis. Although remarkable progress has been achieved in recent years, the complex colon environment and concealed polyps with unclear boundaries still pose severe challenges in this area. Existing methods either involve computationally expensive context aggregation or lack prior modeling of polyps, resulting in poor performance in challenging cases. In this paper, we propose the Enhanced CenterNet with Contrastive Learning (ECC-PolypDet), a two-stage training \& end-to-end inference framework that leverages images and bounding box annotations to train a general model and fine-tune it based on the inference score to obtain a final robust model. Specifically, we conduct Box-assisted Contrastive Learning (BCL) during training to minimize the intra-class difference and maximize the inter-class difference between foreground polyps and backgrounds, enabling our model to capture concealed polyps. Moreover, to enhance the recognition of small polyps, we design the Semantic Flow-guided Feature Pyramid Network (SFFPN) to aggregate multi-scale features and the Heatmap Propagation (HP) module to boost the model's attention on polyp targets. In the fine-tuning stage, we introduce the IoU-guided Sample Re-weighting (ISR) mechanism to prioritize hard samples by adaptively adjusting the loss weight for each sample during fine-tuning. Extensive experiments on six large-scale colonoscopy datasets demonstrate the superiority of our model compared with previous state-of-the-art detectors.
Optimising Graph Representation for Hardware Implementation of Graph Convolutional Networks for Event-based Vision
Authors: Authors: Kamil Jeziorek, Piotr Wzorek, Krzysztof Blachut, Andrea Pinna, Tomasz Kryjak
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2401.04988
Pdf link: https://arxiv.org/pdf/2401.04988
Abstract Event-based vision is an emerging research field involving processing data generated by Dynamic Vision Sensors (neuromorphic cameras). One of the latest proposals in this area are Graph Convolutional Networks (GCNs), which allow to process events in its original sparse form while maintaining high detection and classification performance. In this paper, we present the hardware implementation of a~graph generation process from an event camera data stream, taking into account both the advantages and limitations of FPGAs. We propose various ways to simplify the graph representation and use scaling and quantisation of values. We consider both undirected and directed graphs that enable the use of PointNet convolution. The results obtained show that by appropriately modifying the graph representation, it is possible to create a~hardware module for graph generation. Moreover, the proposed modifications have no significant impact on object detection performance, only 0.08% mAP less for the base model and the N-Caltech data set.Finally, we describe the proposed hardware architecture of the graph generation module.
Dual-Perspective Knowledge Enrichment for Semi-Supervised 3D Object Detection
Authors: Authors: Yucheng Han, Na Zhao, Weiling Chen, Keng Teck Ma, Hanwang Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.05011
Pdf link: https://arxiv.org/pdf/2401.05011
Abstract Semi-supervised 3D object detection is a promising yet under-explored direction to reduce data annotation costs, especially for cluttered indoor scenes. A few prior works, such as SESS and 3DIoUMatch, attempt to solve this task by utilizing a teacher model to generate pseudo-labels for unlabeled samples. However, the availability of unlabeled samples in the 3D domain is relatively limited compared to its 2D counterpart due to the greater effort required to collect 3D data. Moreover, the loose consistency regularization in SESS and restricted pseudo-label selection strategy in 3DIoUMatch lead to either low-quality supervision or a limited amount of pseudo labels. To address these issues, we present a novel Dual-Perspective Knowledge Enrichment approach named DPKE for semi-supervised 3D object detection. Our DPKE enriches the knowledge of limited training data, particularly unlabeled data, from two perspectives: data-perspective and feature-perspective. Specifically, from the data-perspective, we propose a class-probabilistic data augmentation method that augments the input data with additional instances based on the varying distribution of class probabilities. Our DPKE achieves feature-perspective knowledge enrichment by designing a geometry-aware feature matching method that regularizes feature-level similarity between object proposals from the student and teacher models. Extensive experiments on the two benchmark datasets demonstrate that our DPKE achieves superior performance over existing state-of-the-art approaches under various label ratio conditions. The source code will be made available to the public.
CreINNs: Credal-Set Interval Neural Networks for Uncertainty Estimation in Classification Tasks
Authors: Authors: Kaizheng Wang, Keivan Shariatmadar, Shireen Kudukkil Manchingal, Fabio Cuzzolin, David Moens, Hans Hallez
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.05043
Pdf link: https://arxiv.org/pdf/2401.05043
Abstract Uncertainty estimation is increasingly attractive for improving the reliability of neural networks. In this work, we present novel credal-set interval neural networks (CreINNs) designed for classification tasks. CreINNs preserve the traditional interval neural network structure, capturing weight uncertainty through deterministic intervals, while forecasting credal sets using the mathematical framework of probability intervals. Experimental validations on an out-of-distribution detection benchmark (CIFAR10 vs SVHN) showcase that CreINNs outperform epistemic uncertainty estimation when compared to variational Bayesian neural networks (BNNs) and deep ensembles (DEs). Furthermore, CreINNs exhibit a notable reduction in computational complexity compared to variational BNNs and demonstrate smaller model sizes than DEs.
MuTox: Universal MUltilingual Audio-based TOXicity Dataset and Zero-shot Detector
Authors: Authors: Marta R. Costa-jussà, Mariano Coria Meglioli, Pierre Andrews, David Dale, Prangthip Hansanti, Elahe Kalbassi, Alex Mourachko, Christophe Ropers, Carleigh Wood
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2401.05060
Pdf link: https://arxiv.org/pdf/2401.05060
Abstract Research in toxicity detection in natural language processing for the speech modality (audio-based) is quite limited, particularly for languages other than English. To address these limitations and lay the groundwork for truly multilingual audio-based toxicity detection, we introduce MuTox, the first highly multilingual audio-based dataset with toxicity labels. The dataset comprises 20,000 audio utterances for English and Spanish, and 4,000 for the other 19 languages. To demonstrate the quality of this dataset, we trained the MuTox audio-based toxicity classifier, which enables zero-shot toxicity detection across a wide range of languages. This classifier outperforms existing text-based trainable classifiers by more than 1% AUC, while expanding the language coverage more than tenfold. When compared to a wordlist-based classifier that covers a similar number of languages, MuTox improves precision and recall by approximately 2.5 times. This significant improvement underscores the potential of MuTox in advancing the field of audio-based toxicity detection.
Aligning Translation-Specific Understanding to General Understanding in Large Language Models
Authors: Authors: Yichong Huang, Xiaocheng Feng, Baohang Li, Chengpeng Fu, Wenshuai Huo, Ting Liu, Bing Qin
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2401.05072
Pdf link: https://arxiv.org/pdf/2401.05072
Abstract Although large language models (LLMs) have shown surprising language understanding and generation capabilities, they have yet to gain a revolutionary advancement in the field of machine translation. One potential cause of the limited performance is the misalignment between the translation-specific understanding and general understanding inside LLMs. To align the translation-specific understanding to the general one, we propose a novel translation process xIoD (Cross-Lingual Interpretation of Difficult words), explicitly incorporating the general understanding on the content incurring inconsistent understanding to guide the translation. Specifically, xIoD performs the cross-lingual interpretation for the difficult-to-translate words and enhances the translation with the generated interpretations. Furthermore, we reframe the external tools of QE to tackle the challenges of xIoD in the detection of difficult words and the generation of helpful interpretations. We conduct experiments on the self-constructed benchmark ChallengeMT, which includes cases in which multiple SOTA translation systems consistently underperform. Experimental results show the effectiveness of our xIoD, which improves up to +3.85 COMET.
SwiMDiff: Scene-wide Matching Contrastive Learning with Diffusion Constraint for Remote Sensing Image
Authors: Authors: Jiayuan Tian, Jie Lei, Jiaqing Zhang, Weiying Xie, Yunsong Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.05093
Pdf link: https://arxiv.org/pdf/2401.05093
Abstract With recent advancements in aerospace technology, the volume of unlabeled remote sensing image (RSI) data has increased dramatically. Effectively leveraging this data through self-supervised learning (SSL) is vital in the field of remote sensing. However, current methodologies, particularly contrastive learning (CL), a leading SSL method, encounter specific challenges in this domain. Firstly, CL often mistakenly identifies geographically adjacent samples with similar semantic content as negative pairs, leading to confusion during model training. Secondly, as an instance-level discriminative task, it tends to neglect the essential fine-grained features and complex details inherent in unstructured RSIs. To overcome these obstacles, we introduce SwiMDiff, a novel self-supervised pre-training framework designed for RSIs. SwiMDiff employs a scene-wide matching approach that effectively recalibrates labels to recognize data from the same scene as false negatives. This adjustment makes CL more applicable to the nuances of remote sensing. Additionally, SwiMDiff seamlessly integrates CL with a diffusion model. Through the implementation of pixel-level diffusion constraints, we enhance the encoder's ability to capture both the global semantic information and the fine-grained features of the images more comprehensively. Our proposed framework significantly enriches the information available for downstream tasks in remote sensing. Demonstrating exceptional performance in change detection and land-cover classification tasks, SwiMDiff proves its substantial utility and value in the field of remote sensing.
Toward distortion-aware change detection in realistic scenarios
Authors: Authors: Yitao Zhao, Heng-Chao Li, Nanqing Liu, Rui Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.05157
Pdf link: https://arxiv.org/pdf/2401.05157
Abstract In the conventional change detection (CD) pipeline, two manually registered and labeled remote sensing datasets serve as the input of the model for training and prediction. However, in realistic scenarios, data from different periods or sensors could fail to be aligned as a result of various coordinate systems. Geometric distortion caused by coordinate shifting remains a thorny issue for CD algorithms. In this paper, we propose a reusable self-supervised framework for bitemporal geometric distortion in CD tasks. The whole framework is composed of Pretext Representation Pre-training, Bitemporal Image Alignment, and Down-stream Decoder Fine-Tuning. With only single-stage pre-training, the key components of the framework can be reused for assistance in the bitemporal image alignment, while simultaneously enhancing the performance of the CD decoder. Experimental results in 2 large-scale realistic scenarios demonstrate that our proposed method can alleviate the bitemporal geometric distortion in CD tasks.
Watermark Text Pattern Spotting in Document Images
Authors: Authors: Mateusz Krubinski, Stefan Matcovici, Diana Grigore, Daniel Voinea, Alin-Ionut Popa
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.05167
Pdf link: https://arxiv.org/pdf/2401.05167
Abstract Watermark text spotting in document images can offer access to an often unexplored source of information, providing crucial evidence about a record's scope, audience and sometimes even authenticity. Stemming from the problem of text spotting, detecting and understanding watermarks in documents inherits the same hardships - in the wild, writing can come in various fonts, sizes and forms, making generic recognition a very difficult problem. To address the lack of resources in this field and propel further research, we propose a novel benchmark (K-Watermark) containing 65,447 data samples generated using Wrender, a watermark text patterns rendering procedure. A validity study using humans raters yields an authenticity score of 0.51 against pre-generated watermarked documents. To prove the usefulness of the dataset and rendering technique, we developed an end-to-end solution (Wextract) for detecting the bounding box instances of watermark text, while predicting the depicted text. To deal with this specific task, we introduce a variance minimization loss and a hierarchical self-attention mechanism. To the best of our knowledge, we are the first to propose an evaluation benchmark and a complete solution for retrieving watermarks from documents surpassing baselines by 5 AP points in detection and 4 points in character accuracy.
CLIP-guided Source-free Object Detection in Aerial Images
Authors: Authors: Nanqing Liu, Xun Xu, Yongyi Su, Chengxin Liu, Peiliang Gong, Heng-Chao Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.05168
Pdf link: https://arxiv.org/pdf/2401.05168
Abstract Domain adaptation is crucial in aerial imagery, as the visual representation of these images can significantly vary based on factors such as geographic location, time, and weather conditions. Additionally, high-resolution aerial images often require substantial storage space and may not be readily accessible to the public. To address these challenges, we propose a novel Source-Free Object Detection (SFOD) method. Specifically, our approach is built upon a self-training framework; however, self-training can lead to inaccurate learning in the absence of labeled training data. To address this issue, we further integrate Contrastive Language-Image Pre-training (CLIP) to guide the generation of pseudo-labels, termed CLIP-guided Aggregation. By leveraging CLIP's zero-shot classification capability, we use it to aggregate scores with the original predicted bounding boxes, enabling us to obtain refined scores for the pseudo-labels. To validate the effectiveness of our method, we constructed two new datasets from different domains based on the DIOR dataset, named DIOR-C and DIOR-Cloudy. Experiments demonstrate that our method outperforms other comparative algorithms.
Video-based Automatic Lameness Detection of Dairy Cows using Pose Estimation and Multiple Locomotion Traits
Authors: Authors: Helena Russello, Rik van der Tol, Menno Holzhauer, Eldert J. van Henten, Gert Kootstra
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.05202
Pdf link: https://arxiv.org/pdf/2401.05202
Abstract This study presents an automated lameness detection system that uses deep-learning image processing techniques to extract multiple locomotion traits associated with lameness. Using the T-LEAP pose estimation model, the motion of nine keypoints was extracted from videos of walking cows. The videos were recorded outdoors, with varying illumination conditions, and T-LEAP extracted 99.6% of correct keypoints. The trajectories of the keypoints were then used to compute six locomotion traits: back posture measurement, head bobbing, tracking distance, stride length, stance duration, and swing duration. The three most important traits were back posture measurement, head bobbing, and tracking distance. For the ground truth, we showed that a thoughtful merging of the scores of the observers could improve intra-observer reliability and agreement. We showed that including multiple locomotion traits improves the classification accuracy from 76.6% with only one trait to 79.9% with the three most important traits and to 80.1% with all six locomotion traits.
Distributed Monitoring for Data Distribution Shifts in Edge-ML Fraud Detection
Authors: Authors: Nader Karayanni, Robert J. Shahla, Chieh-Lien Hsiao
Subjects: Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.05219
Pdf link: https://arxiv.org/pdf/2401.05219
Abstract The digital era has seen a marked increase in financial fraud. edge ML emerged as a promising solution for smartphone payment services fraud detection, enabling the deployment of ML models directly on edge devices. This approach enables a more personalized real-time fraud detection. However, a significant gap in current research is the lack of a robust system for monitoring data distribution shifts in these distributed edge ML applications. Our work bridges this gap by introducing a novel open-source framework designed for continuous monitoring of data distribution shifts on a network of edge devices. Our system includes an innovative calculation of the Kolmogorov-Smirnov (KS) test over a distributed network of edge devices, enabling efficient and accurate monitoring of users behavior shifts. We comprehensively evaluate the proposed framework employing both real-world and synthetic financial transaction datasets and demonstrate the framework's effectiveness.
CASA: Causality-driven Argument Sufficiency Assessment
Authors: Authors: Xiao Liu, Yansong Feng, Kai-Wei Chang
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2401.05249
Pdf link: https://arxiv.org/pdf/2401.05249
Abstract The argument sufficiency assessment task aims to determine if the premises of a given argument support its conclusion. To tackle this task, existing works often train a classifier on data annotated by humans. However, annotating data is laborious, and annotations are often inconsistent due to subjective criteria. Motivated by the probability of sufficiency (PS) definition in the causal literature, we propose CASA, a zero-shot causality-driven argument sufficiency assessment framework. PS measures how likely introducing the premise event would lead to the conclusion, when both the premise and conclusion events are absent. To estimate this probability, we propose to use large language models (LLMs) to generate contexts that are inconsistent with the premise and conclusion, and revise them by injecting the premise event. Experiments on two logical fallacy detection datasets demonstrate that CASA accurately identifies insufficient arguments. We further deploy CASA in a writing assistance application, and find that suggestions generated by CASA enhance the sufficiency of student-written arguments. Code and data are available at https://github.com/xxxiaol/CASA.
Keyword: face recognition

There is no result

Keyword: augmentation

Content-Conditioned Generation of Stylized Free hand Sketches
Authors: Authors: Jiajun Liu, Siyuan Wang, Guangming Zhu, Liang Zhang, Ning Li, Eryang Gao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.04739
Pdf link: https://arxiv.org/pdf/2401.04739
Abstract In recent years, the recognition of free-hand sketches has remained a popular task. However, in some special fields such as the military field, free-hand sketches are difficult to sample on a large scale. Common data augmentation and image generation techniques are difficult to produce images with various free-hand sketching styles. Therefore, the recognition and segmentation tasks in related fields are limited. In this paper, we propose a novel adversarial generative network that can accurately generate realistic free-hand sketches with various styles. We explore the performance of the model, including using styles randomly sampled from a prior normal distribution to generate images with various free-hand sketching styles, disentangling the painters' styles from known free-hand sketches to generate images with specific styles, and generating images of unknown classes that are not in the training set. We further demonstrate with qualitative and quantitative evaluations our advantages in visual quality, content accuracy, and style imitation on SketchIME.
Dual-Perspective Knowledge Enrichment for Semi-Supervised 3D Object Detection
Authors: Authors: Yucheng Han, Na Zhao, Weiling Chen, Keng Teck Ma, Hanwang Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.05011
Pdf link: https://arxiv.org/pdf/2401.05011
Abstract Semi-supervised 3D object detection is a promising yet under-explored direction to reduce data annotation costs, especially for cluttered indoor scenes. A few prior works, such as SESS and 3DIoUMatch, attempt to solve this task by utilizing a teacher model to generate pseudo-labels for unlabeled samples. However, the availability of unlabeled samples in the 3D domain is relatively limited compared to its 2D counterpart due to the greater effort required to collect 3D data. Moreover, the loose consistency regularization in SESS and restricted pseudo-label selection strategy in 3DIoUMatch lead to either low-quality supervision or a limited amount of pseudo labels. To address these issues, we present a novel Dual-Perspective Knowledge Enrichment approach named DPKE for semi-supervised 3D object detection. Our DPKE enriches the knowledge of limited training data, particularly unlabeled data, from two perspectives: data-perspective and feature-perspective. Specifically, from the data-perspective, we propose a class-probabilistic data augmentation method that augments the input data with additional instances based on the varying distribution of class probabilities. Our DPKE achieves feature-perspective knowledge enrichment by designing a geometry-aware feature matching method that regularizes feature-level similarity between object proposals from the student and teacher models. Extensive experiments on the two benchmark datasets demonstrate that our DPKE achieves superior performance over existing state-of-the-art approaches under various label ratio conditions. The source code will be made available to the public.
Singer Identity Representation Learning using Self-Supervised Techniques
Authors: Authors: Bernardo Torres, Stefan Lattner, Gaël Richard
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2401.05064
Pdf link: https://arxiv.org/pdf/2401.05064
Abstract Significant strides have been made in creating voice identity representations using speech data. However, the same level of progress has not been achieved for singing voices. To bridge this gap, we suggest a framework for training singer identity encoders to extract representations suitable for various singing-related tasks, such as singing voice similarity and synthesis. We explore different self-supervised learning techniques on a large collection of isolated vocal tracks and apply data augmentations during training to ensure that the representations are invariant to pitch and content variations. We evaluate the quality of the resulting representations on singer similarity and identification tasks across multiple datasets, with a particular emphasis on out-of-domain generalization. Our proposed framework produces high-quality embeddings that outperform both speaker verification and wav2vec 2.0 pre-trained baselines on singing voice while operating at 44.1 kHz. We release our code and trained models to facilitate further research on singing voice and related areas.
Hierarchical Classification of Transversal Skills in Job Ads Based on Sentence Embeddings
Authors: Authors: Florin Leon, Marius Gavrilescu, Sabina-Adriana Floria, Alina-Adriana Minea
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2401.05073
Pdf link: https://arxiv.org/pdf/2401.05073
Abstract This paper proposes a classification framework aimed at identifying correlations between job ad requirements and transversal skill sets, with a focus on predicting the necessary skills for individual job descriptions using a deep learning model. The approach involves data collection, preprocessing, and labeling using ESCO (European Skills, Competences, and Occupations) taxonomy. Hierarchical classification and multi-label strategies are used for skill identification, while augmentation techniques address data imbalance, enhancing model robustness. A comparison between results obtained with English-specific and multi-language sentence embedding models reveals close accuracy. The experimental case studies detail neural network configurations, hyperparameters, and cross-validation results, highlighting the efficacy of the hierarchical approach and the suitability of the multi-language model for the diverse European job market. Thus, a new approach is proposed for the hierarchical classification of transversal skills from job ads.

LeeKyungwook / get-arxiv-noti

New submissions for Thu, 11 Jan 24 #929

Keyword: detection

Exploring Attack Resilience in Distributed Platoon Controllers with Model Predictive Control

LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection

Network Layout Algorithm with Covariate Smoothing

Phishing Website Detection through Multi-Model Analysis of HTML Content

The site linkage spectrum of data arrays

T-PRIME: Transformer-based Protocol Identification for Machine-learning at the Edge

On Achieving High-Fidelity Grant-free Non-Orthogonal Multiple Access

Rethinking Test-time Likelihood: The Likelihood Path Principle and Its Application to OOD Detection

FBSDetector: Fake Base Station and Multi Step Attack Detection in Cellular Networks using Machine Learning

ECC-PolypDet: Enhanced CenterNet with Contrastive Learning for Automatic Polyp Detection

Optimising Graph Representation for Hardware Implementation of Graph Convolutional Networks for Event-based Vision

Dual-Perspective Knowledge Enrichment for Semi-Supervised 3D Object Detection

CreINNs: Credal-Set Interval Neural Networks for Uncertainty Estimation in Classification Tasks

MuTox: Universal MUltilingual Audio-based TOXicity Dataset and Zero-shot Detector

Aligning Translation-Specific Understanding to General Understanding in Large Language Models

SwiMDiff: Scene-wide Matching Contrastive Learning with Diffusion Constraint for Remote Sensing Image

Toward distortion-aware change detection in realistic scenarios

Watermark Text Pattern Spotting in Document Images

CLIP-guided Source-free Object Detection in Aerial Images

Video-based Automatic Lameness Detection of Dairy Cows using Pose Estimation and Multiple Locomotion Traits

Distributed Monitoring for Data Distribution Shifts in Edge-ML Fraud Detection

CASA: Causality-driven Argument Sufficiency Assessment

Keyword: face recognition

Keyword: augmentation

Content-Conditioned Generation of Stylized Free hand Sketches

Dual-Perspective Knowledge Enrichment for Semi-Supervised 3D Object Detection

Singer Identity Representation Learning using Self-Supervised Techniques

Hierarchical Classification of Transversal Skills in Job Ads Based on Sentence Embeddings