New submissions for Wed, 3 Jan 24

Keyword: detection

PlanarNeRF: Online Learning of Planar Primitives with Neural Radiance Fields

Authors: Authors: Zheng Chen, Qingan Yan, Huangying Zhan, Changjiang Cai, Xiangyu Xu, Yuzhong Huang, Weihan Wang, Ziyue Feng, Lantao Liu, Yi Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.00871
Pdf link: https://arxiv.org/pdf/2401.00871
Abstract Identifying spatially complete planar primitives from visual data is a crucial task in computer vision. Prior methods are largely restricted to either 2D segment recovery or simplifying 3D structures, even with extensive plane annotations. We present PlanarNeRF, a novel framework capable of detecting dense 3D planes through online learning. Drawing upon the neural field representation, PlanarNeRF brings three major contributions. First, it enhances 3D plane detection with concurrent appearance and geometry knowledge. Second, a lightweight plane fitting module is proposed to estimate plane parameters. Third, a novel global memory bank structure with an update mechanism is introduced, ensuring consistent cross-frame correspondence. The flexible architecture of PlanarNeRF allows it to function in both 2D-supervised and self-supervised solutions, in each of which it can effectively learn from sparse training signals, significantly improving training efficiency. Through extensive experiments, we demonstrate the effectiveness of PlanarNeRF in various scenarios and remarkable improvement over existing works.
A Bayesian Unification of Self-Supervised Clustering and Energy-Based Models
Authors: Authors: Emanuele Sansone, Robin Manhaeve
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.00873
Pdf link: https://arxiv.org/pdf/2401.00873
Abstract Self-supervised learning is a popular and powerful method for utilizing large amounts of unlabeled data, for which a wide variety of training objectives have been proposed in the literature. In this study, we perform a Bayesian analysis of state-of-the-art self-supervised learning objectives, elucidating the underlying probabilistic graphical models in each class and presenting a standardized methodology for their derivation from first principles. The analysis also indicates a natural means of integrating self-supervised learning with likelihood-based generative models. We instantiate this concept within the realm of cluster-based self-supervised learning and energy models, introducing a novel lower bound which is proven to reliably penalize the most important failure modes. Furthermore, this newly proposed lower bound enables the training of a standard backbone architecture without the necessity for asymmetric elements such as stop gradients, momentum encoders, or specialized clustering layers - typically introduced to avoid learning trivial solutions. Our theoretical findings are substantiated through experiments on synthetic and real-world data, including SVHN, CIFAR10, and CIFAR100, thus showing that our objective function allows to outperform existing self-supervised learning strategies in terms of clustering, generation and out-of-distribution detection performance by a wide margin. We also demonstrate that GEDI can be integrated into a neural-symbolic framework to mitigate the reasoning shortcut problem and to learn higher quality symbolic representations thanks to the enhanced classification performance.
Balanced Graph Structure Information for Brain Disease Detection
Authors: Authors: Falih Gozi Febrinanto, Mujie Liu, Feng Xia
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)
Arxiv link: https://arxiv.org/abs/2401.00876
Pdf link: https://arxiv.org/pdf/2401.00876
Abstract Analyzing connections between brain regions of interest (ROI) is vital to detect neurological disorders such as autism or schizophrenia. Recent advancements employ graph neural networks (GNNs) to utilize graph structures in brains, improving detection performances. Current methods use correlation measures between ROI's blood-oxygen-level-dependent (BOLD) signals to generate the graph structure. Other methods use the training samples to learn the optimal graph structure through end-to-end learning. However, implementing those methods independently leads to some issues with noisy data for the correlation graphs and overfitting problems for the optimal graph. In this work, we proposed Bargrain (balanced graph structure for brains), which models two graph structures: filtered correlation matrix and optimal sample graph using graph convolution networks (GCNs). This approach aims to get advantages from both graphs and address the limitations of only relying on a single type of structure. Based on our extensive experiment, Bargrain outperforms state-of-the-art methods in classification tasks on brain disease datasets, as measured by average F1 scores.
Social-LLM: Modeling User Behavior at Scale using Language Models and Social Network Data
Authors: Authors: Julie Jiang, Emilio Ferrara
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.00893
Pdf link: https://arxiv.org/pdf/2401.00893
Abstract The proliferation of social network data has unlocked unprecedented opportunities for extensive, data-driven exploration of human behavior. The structural intricacies of social networks offer insights into various computational social science issues, particularly concerning social influence and information diffusion. However, modeling large-scale social network data comes with computational challenges. Though large language models make it easier than ever to model textual content, any advanced network representation methods struggle with scalability and efficient deployment to out-of-sample users. In response, we introduce a novel approach tailored for modeling social network data in user detection tasks. This innovative method integrates localized social network interactions with the capabilities of large language models. Operating under the premise of social network homophily, which posits that socially connected users share similarities, our approach is designed to address these challenges. We conduct a thorough evaluation of our method across seven real-world social network datasets, spanning a diverse range of topics and detection tasks, showcasing its applicability to advance research in computational social science.
ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention
Authors: Authors: Chenhang He, Ruihuang Li, Guowen Zhang, Lei Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.00912
Pdf link: https://arxiv.org/pdf/2401.00912
Abstract Window-based transformers have demonstrated strong ability in large-scale point cloud understanding by capturing context-aware representations with affordable attention computation in a more localized manner. However, because of the sparse nature of point clouds, the number of voxels per window varies significantly. Current methods partition the voxels in each window into multiple subsets of equal size, which cost expensive overhead in sorting and padding the voxels, making them run slower than sparse convolution based methods. In this paper, we present ScatterFormer, which, for the first time to our best knowledge, could directly perform attention on voxel sets with variable length. The key of ScatterFormer lies in the innovative Scatter Linear Attention (SLA) module, which leverages the linear attention mechanism to process in parallel all voxels scattered in different windows. Harnessing the hierarchical computation units of the GPU and matrix blocking algorithm, we reduce the latency of the proposed SLA module to less than 1 ms on moderate GPUs. Besides, we develop a cross-window interaction module to simultaneously enhance the local representation and allow the information flow across windows, eliminating the need for window shifting. Our proposed ScatterFormer demonstrates 73 mAP (L2) on the large-scale Waymo Open Dataset and 70.5 NDS on the NuScenes dataset, running at an outstanding detection rate of 28 FPS. Code is available at https://github.com/skyhehe123/ScatterFormer
Accurate Leukocyte Detection Based on Deformable-DETR and Multi-Level Feature Fusion for Aiding Diagnosis of Blood Diseases
Authors: Authors: Yifei Chen, Chenyan Zhang, Ben Chen, Yiyu Huang, Yifei Sun, Changmiao Wang, Xianjun Fu, Yuxing Dai, Feiwei Qin, Yong Peng, Yu Gao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.00926
Pdf link: https://arxiv.org/pdf/2401.00926
Abstract In standard hospital blood tests, the traditional process requires doctors to manually isolate leukocytes from microscopic images of patients' blood using microscopes. These isolated leukocytes are then categorized via automatic leukocyte classifiers to determine the proportion and volume of different types of leukocytes present in the blood samples, aiding disease diagnosis. This methodology is not only time-consuming and labor-intensive, but it also has a high propensity for errors due to factors such as image quality and environmental conditions, which could potentially lead to incorrect subsequent classifications and misdiagnosis. To address these issues, this paper proposes an innovative method of leukocyte detection: the Multi-level Feature Fusion and Deformable Self-attention DETR (MFDS-DETR). To tackle the issue of leukocyte scale disparity, we designed the High-level Screening-feature Fusion Pyramid (HS-FPN), enabling multi-level fusion. This model uses high-level features as weights to filter low-level feature information via a channel attention module and then merges the screened information with the high-level features, thus enhancing the model's feature expression capability. Further, we address the issue of leukocyte feature scarcity by incorporating a multi-scale deformable self-attention module in the encoder and using the self-attention and cross-deformable attention mechanisms in the decoder, which aids in the extraction of the global features of the leukocyte feature maps. The effectiveness, superiority, and generalizability of the proposed MFDS-DETR method are confirmed through comparisons with other cutting-edge leukocyte detection models using the private WBCDD, public LISC and BCCD datasets. Our source code and private WBCCD dataset are available at https://github.com/JustlfC03/MFDS-DETR.
Improve Fidelity and Utility of Synthetic Credit Card Transaction Time Series from Data-centric Perspective
Authors: Authors: Din-Yin Hsieh, Chi-Hua Wang, Guang Cheng
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.00965
Pdf link: https://arxiv.org/pdf/2401.00965
Abstract Exploring generative model training for synthetic tabular data, specifically in sequential contexts such as credit card transaction data, presents significant challenges. This paper addresses these challenges, focusing on attaining both high fidelity to actual data and optimal utility for machine learning tasks. We introduce five pre-processing schemas to enhance the training of the Conditional Probabilistic Auto-Regressive Model (CPAR), demonstrating incremental improvements in the synthetic data's fidelity and utility. Upon achieving satisfactory fidelity levels, our attention shifts to training fraud detection models tailored for time-series data, evaluating the utility of the synthetic data. Our findings offer valuable insights and practical guidelines for synthetic data practitioners in the finance sector, transitioning from real to synthetic datasets for training purposes, and illuminating broader methodologies for synthesizing credit card transaction time series.
Downstream Task-Oriented Generative Model Selections on Synthetic Data Training for Fraud Detection Models
Authors: Authors: Yinan Cheng, Chi-Hua Wang, Vamsi K. Potluru, Tucker Balch, Guang Cheng
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.00974
Pdf link: https://arxiv.org/pdf/2401.00974
Abstract Devising procedures for downstream task-oriented generative model selections is an unresolved problem of practical importance. Existing studies focused on the utility of a single family of generative models. They provided limited insights on how synthetic data practitioners select the best family generative models for synthetic training tasks given a specific combination of machine learning model class and performance metric. In this paper, we approach the downstream task-oriented generative model selections problem in the case of training fraud detection models and investigate the best practice given different combinations of model interpretability and model performance constraints. Our investigation supports that, while both Neural Network(NN)-based and Bayesian Network(BN)-based generative models are both good to complete synthetic training task under loose model interpretability constrain, the BN-based generative models is better than NN-based when synthetic training fraud detection model under strict model interpretability constrain. Our results provides practical guidance for machine learning practitioner who is interested in replacing their training dataset from real to synthetic, and shed lights on more general downstream task-oriented generative model selection problems.
Real-Time Object Detection in Occluded Environment with Background Cluttering Effects Using Deep Learning
Authors: Authors: Syed Muhammad Aamir, Hongbin Ma, Malak Abid Ali Khan, Muhammad Aaqib
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.00986
Pdf link: https://arxiv.org/pdf/2401.00986
Abstract Detection of small, undetermined moving objects or objects in an occluded environment with a cluttered background is the main problem of computer vision. This greatly affects the detection accuracy of deep learning models. To overcome these problems, we concentrate on deep learning models for real-time detection of cars and tanks in an occluded environment with a cluttered background employing SSD and YOLO algorithms and improved precision of detection and reduce problems faced by these models. The developed method makes the custom dataset and employs a preprocessing technique to clean the noisy dataset. For training the developed model we apply the data augmentation technique to balance and diversify the data. We fine-tuned, trained, and evaluated these models on the established dataset by applying these techniques and highlighting the results we got more accurately than without applying these techniques. The accuracy and frame per second of the SSD-Mobilenet v2 model are higher than YOLO V3 and YOLO V4. Furthermore, by employing various techniques like data enhancement, noise reduction, parameter optimization, and model fusion we improve the effectiveness of detection and recognition. We further added a counting algorithm, and target attributes experimental comparison, and made a graphical user interface system for the developed model with features of object counting, alerts, status, resolution, and frame per second. Subsequently, to justify the importance of the developed method analysis of YOLO V3, V4, and SSD were incorporated. Which resulted in the overall completion of the proposed method.
Detection and Defense Against Prominent Attacks on Preconditioned LLM-Integrated Virtual Assistants
Authors: Authors: Chun Fai Chan, Daniel Wankit Yip, Aysan Esmradi
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2401.00994
Pdf link: https://arxiv.org/pdf/2401.00994
Abstract The emergence of LLM (Large Language Model) integrated virtual assistants has brought about a rapid transformation in communication dynamics. During virtual assistant development, some developers prefer to leverage the system message, also known as an initial prompt or custom prompt, for preconditioning purposes. However, it is important to recognize that an excessive reliance on this functionality raises the risk of manipulation by malicious actors who can exploit it with carefully crafted prompts. Such malicious manipulation poses a significant threat, potentially compromising the accuracy and reliability of the virtual assistant's responses. Consequently, safeguarding the virtual assistants with detection and defense mechanisms becomes of paramount importance to ensure their safety and integrity. In this study, we explored three detection and defense mechanisms aimed at countering attacks that target the system message. These mechanisms include inserting a reference key, utilizing an LLM evaluator, and implementing a Self-Reminder. To showcase the efficacy of these mechanisms, they were tested against prominent attack techniques. Our findings demonstrate that the investigated mechanisms are capable of accurately identifying and counteracting the attacks. The effectiveness of these mechanisms underscores their potential in safeguarding the integrity and reliability of virtual assistants, reinforcing the importance of their implementation in real-world scenarios. By prioritizing the security of virtual assistants, organizations can maintain user trust, preserve the integrity of the application, and uphold the high standards expected in this era of transformative technologies.
AI Mobile Application for Archaeological Dating of Bronze Dings
Authors: Authors: Chuntao Li, Ruihua Qi, Chuan Tang, Jiafu Wei, Xi Yang, Qian Zhang, Rixin Zhou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.01002
Pdf link: https://arxiv.org/pdf/2401.01002
Abstract We develop an AI application for archaeological dating of bronze Dings. A classification model is employed to predict the period of the input Ding, and a detection model is used to show the feature parts for making a decision of archaeological dating. To train the two deep learning models, we collected a large number of Ding images from published materials, and annotated the period and the feature parts on each image by archaeological experts. Furthermore, we design a user system and deploy our pre-trained models based on the platform of WeChat Mini Program for ease of use. Only need a smartphone installed WeChat APP, users can easily know the result of intelligent archaeological dating, the feature parts, and other reference artifacts, by taking a photo of a bronze Ding. To use our application, please scan this QR code by WeChat.
Unsupervised Continual Anomaly Detection with Contrastively-learned Prompt
Authors: Authors: Jiaqi Liu, Kai Wu, Qiang Nie, Ying Chen, Bin-Bin Gao, Yong Liu, Jinbao Wang, Chengjie Wang, Feng Zheng
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.01010
Pdf link: https://arxiv.org/pdf/2401.01010
Abstract Unsupervised Anomaly Detection (UAD) with incremental training is crucial in industrial manufacturing, as unpredictable defects make obtaining sufficient labeled data infeasible. However, continual learning methods primarily rely on supervised annotations, while the application in UAD is limited due to the absence of supervision. Current UAD methods train separate models for different classes sequentially, leading to catastrophic forgetting and a heavy computational burden. To address this issue, we introduce a novel Unsupervised Continual Anomaly Detection framework called UCAD, which equips the UAD with continual learning capability through contrastively-learned prompts. In the proposed UCAD, we design a Continual Prompting Module (CPM) by utilizing a concise key-prompt-knowledge memory bank to guide task-invariant anomaly' model predictions using task-specificnormal' knowledge. Moreover, Structure-based Contrastive Learning (SCL) is designed with the Segment Anything Model (SAM) to improve prompt learning and anomaly segmentation results. Specifically, by treating SAM's masks as structure, we draw features within the same mask closer and push others apart for general feature representations. We conduct comprehensive experiments and set the benchmark on unsupervised continual anomaly detection and segmentation, demonstrating that our method is significantly better than anomaly detection methods, even with rehearsal training. The code will be available at https://github.com/shirowalker/UCAD.
Boosting Transformer's Robustness and Efficacy in PPG Signal Artifact Detection with Self-Supervised Learning
Authors: Authors: Thanh-Dung Le
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2401.01013
Pdf link: https://arxiv.org/pdf/2401.01013
Abstract Recent research at CHU Sainte Justine's Pediatric Critical Care Unit (PICU) has revealed that traditional machine learning methods, such as semi-supervised label propagation and K-nearest neighbors, outperform Transformer-based models in artifact detection from PPG signals, mainly when data is limited. This study addresses the underutilization of abundant unlabeled data by employing self-supervised learning (SSL) to extract latent features from these data, followed by fine-tuning on labeled data. Our experiments demonstrate that SSL significantly enhances the Transformer model's ability to learn representations, improving its robustness in artifact classification tasks. Among various SSL techniques, including masking, contrastive learning, and DINO (self-distillation with no labels)-contrastive learning exhibited the most stable and superior performance in small PPG datasets. Further, we delve into optimizing contrastive loss functions, which are crucial for contrastive SSL. Inspired by InfoNCE, we introduce a novel contrastive loss function that facilitates smoother training and better convergence, thereby enhancing performance in artifact classification. In summary, this study establishes the efficacy of SSL in leveraging unlabeled data, particularly in enhancing the capabilities of the Transformer model. This approach holds promise for broader applications in PICU environments, where annotated data is often limited.
Small Bird Detection using YOLOv7 with Test-Time Augmentation
Authors: Authors: Kosuke Shigematsu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.01018
Pdf link: https://arxiv.org/pdf/2401.01018
Abstract In this paper, we propose a method specifically aimed at improving small bird detection for the Small Object Detection Challenge for Spotting Birds 2023. Utilizing YOLOv7 model with test-time augmentation, our approach involves increasing the input resolution, incorporating multiscale inference, considering flipped images during the inference process, and employing weighted boxes fusion to merge detection results. We rigorously explore the impact of each technique on detection performance. Experimental results demonstrate significant improvements in detection accuracy. Our method achieved a top score in the Development category, with a public AP of 0.732 and a private AP of 27.2, both at IoU=0.5.
Class Relevance Learning For Out-of-distribution Detection
Authors: Authors: Butian Xiong, Liguang Zhou, Tin Lun Lam, Yangsheng Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.01021
Pdf link: https://arxiv.org/pdf/2401.01021
Abstract Image classification plays a pivotal role across diverse applications, yet challenges persist when models are deployed in real-world scenarios. Notably, these models falter in detecting unfamiliar classes that were not incorporated during classifier training, a formidable hurdle for safe and effective real-world model deployment, commonly known as out-of-distribution (OOD) detection. While existing techniques, like max logits, aim to leverage logits for OOD identification, they often disregard the intricate interclass relationships that underlie effective detection. This paper presents an innovative class relevance learning method tailored for OOD detection. Our method establishes a comprehensive class relevance learning framework, strategically harnessing interclass relationships within the OOD pipeline. This framework significantly augments OOD detection capabilities. Extensive experimentation on diverse datasets, encompassing generic image classification datasets (Near OOD and Far OOD datasets), demonstrates the superiority of our method over state-of-the-art alternatives for OOD detection.
CautionSuicide: A Deep Learning Based Approach for Detecting Suicidal Ideation in Real Time Chatbot Conversation
Authors: Authors: Nelly Elsayed, Zag ElSayed, Murat Ozer
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.01023
Pdf link: https://arxiv.org/pdf/2401.01023
Abstract Suicide is recognized as one of the most serious concerns in the modern society. Suicide causes tragedy that affects countries, communities, and families. There are many factors that lead to suicidal ideations. Early detection of suicidal ideations can help to prevent suicide occurrence by providing the victim with the required professional support, especially when the victim does not recognize the danger of having suicidal ideations. As technology usage has increased, people share and express their ideations digitally via social media, chatbots, and other digital platforms. In this paper, we proposed a novel, simple deep learning-based model to detect suicidal ideations in digital content, mainly focusing on chatbots as the primary data source. In addition, we provide a framework that employs the proposed suicide detection integration with a chatbot-based support system.
A Comparison of Bounding Box and Landmark Detection Methods for Video-Based Heart Rate Estimation
Authors: Authors: Laurence Liang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2401.01032
Pdf link: https://arxiv.org/pdf/2401.01032
Abstract Remote Photoplethysmography (rPPG) uses the cyclic variation of skin tone on a person's forehead region to estimate that person's heart rate. This paper compares two methods: a bounding box-based method and a landmark-detection-based method to estimate heart rate, and discovered that the landmark-based approach has a smaller variance in terms of model results with a standard deviation that is more than 4 times smaller (4.171 compared to 18.720).
Learning in the Wild: Towards Leveraging Unlabeled Data for Effectively Tuning Pre-trained Code Models
Authors: Authors: Shuzheng Gao, Wenxin Mao, Cuiyun Gao, Li Li, Xing Hu, Xin Xia, Michael R. Lyu
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2401.01060
Pdf link: https://arxiv.org/pdf/2401.01060
Abstract Pre-trained code models have recently achieved substantial improvements in many code intelligence tasks. These models are first pre-trained on large-scale unlabeled datasets in a task-agnostic manner using self-supervised learning, and then fine-tuned on labeled datasets in downstream tasks. However, the labeled datasets are usually limited in size (i.e., human intensive efforts), which may hinder the performance of pre-trained code models in specific tasks. To mitigate this, one possible solution is to leverage the large-scale unlabeled data in the tuning stage by pseudo-labeling. However, directly employing the pseudo-labeled data can bring a large amount of noise, i.e., incorrect labels, leading to suboptimal performance. How to effectively leverage the noisy pseudo-labeled data is a challenging yet under-explored problem.In this paper, we propose a novel approach named HINT to improve pre-trained code models with large-scale unlabeled datasets by better utilizing the pseudo-labeled data. HINT includes two main modules: HybrId pseudo-labeled data selection and Noise-tolerant Training. In the hybrid pseudo-data selection module, considering the robustness issue, apart from directly measuring the quality of pseudo labels through training loss, we further propose to employ a retrieval-based method to filter low-quality pseudo-labeled data. The noise-tolerant training module aims to further mitigate the influence of errors in pseudo labels by training the model with a noise-tolerant loss function and by regularizing the consistency of model predictions.The experimental results show that HINT can better leverage those unlabeled data in a task-specific way and provide complementary benefits for pre-trained models, e.g., improving the best baseline model by 15.33%, 16.50%, and 8.98% on code summarization, defect detection, and assertion generation, respectively.
A Novel Dual-Stage Evolutionary Algorithm for Finding Robust Solutions
Authors: Authors: Wei Du, Wenxuan Fang, Chen Liang, Yang Tang, Yaochu Jin
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2401.01070
Pdf link: https://arxiv.org/pdf/2401.01070
Abstract In robust optimization problems, the magnitude of perturbations is relatively small. Consequently, solutions within certain regions are less likely to represent the robust optima when perturbations are introduced. Hence, a more efficient search process would benefit from increased opportunities to explore promising regions where global optima or good local optima are situated. In this paper, we introduce a novel robust evolutionary algorithm named the dual-stage robust evolutionary algorithm (DREA) aimed at discovering robust solutions. DREA operates in two stages: the peak-detection stage and the robust solution-searching stage. The primary objective of the peak-detection stage is to identify peaks in the fitness landscape of the original optimization problem. Conversely, the robust solution-searching stage focuses on swiftly identifying the robust optimal solution using information obtained from the peaks discovered in the initial stage. These two stages collectively enable the proposed DREA to efficiently obtain the robust optimal solution for the optimization problem. This approach achieves a balance between solution optimality and robustness by separating the search processes for optimal and robust optimal solutions. Experimental results demonstrate that DREA significantly outperforms five state-of-the-art algorithms across 18 test problems characterized by diverse complexities. Moreover, when evaluated on higher-dimensional robust optimization problems (100-$D$ and 200-$D$), DREA also demonstrates superior performance compared to all five counterpart algorithms.
Depth-discriminative Metric Learning for Monocular 3D Object Detection
Authors: Authors: Wonhyeok Choi, Mingyu Shin, Sunghoon Im
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.01075
Pdf link: https://arxiv.org/pdf/2401.01075
Abstract Monocular 3D object detection poses a significant challenge due to the lack of depth information in RGB images. Many existing methods strive to enhance the object depth estimation performance by allocating additional parameters for object depth estimation, utilizing extra modules or data. In contrast, we introduce a novel metric learning scheme that encourages the model to extract depth-discriminative features regardless of the visual attributes without increasing inference time and model size. Our method employs the distance-preserving function to organize the feature space manifold in relation to ground-truth object depth. The proposed (K, B, eps)-quasi-isometric loss leverages predetermined pairwise distance restriction as guidance for adjusting the distance among object descriptors without disrupting the non-linearity of the natural feature manifold. Moreover, we introduce an auxiliary head for object-wise depth estimation, which enhances depth quality while maintaining the inference time. The broad applicability of our method is demonstrated through experiments that show improvements in overall performance when integrated into various baselines. The results show that our method consistently improves the performance of various baselines by 23.51% and 5.78% on average across KITTI and Waymo, respectively.
PLE-SLAM: A Visual-Inertial SLAM Based on Point-Line Features and Efficient IMU Initialization
Authors: Authors: Jiaming He, Mingrui Li, Yangyang Wang, Hongyu Wang
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2401.01081
Pdf link: https://arxiv.org/pdf/2401.01081
Abstract Visual-inertial SLAM is essential in various fields, such as AR/VR, uncrewed aerial vehicles, industrial robots, and autonomous driving. The fusion of a camera and inertial measurement unit (IMU) can make up for the shortcomings of a signal sensor, which significantly improves the accuracy and robustness of localization in challenging environments. Robust tracking and accurate inertial parameter estimation are the basis for the stable operation of the system. This article presents PLE-SLAM, an entirely precise and real-time visual-inertial SLAM algorithm based on point-line features and efficient IMU initialization. First, we introduce line features in a point-based visual-inertial SLAM system. We use parallel computing methods to extract features and compute descriptors to ensure real-time performance. Second, the proposed system estimates gyroscope bias with rotation pre-integration and point and line observations. Accelerometer bias and gravity direction are solved by an analytical method. After initialization, all inertial parameters are refined through maximum a posteriori (MAP) estimation. Moreover, we open a dynamic feature elimination thread to improve the adaptability to dynamic environments and use CNN, bag-of-words and GNN to detect loops and match features. Excellent wide baseline matching capability of DNN-based matching method and illumination robustness significantly improve loop detection recall and loop inter-frame pose estimation. The front-end and back-end are designed for hardware acceleration. The experiments are performed on public datasets, and the results show that the proposed system is one of the state-of-the-art methods in complex scenarios.
Exploring Hyperspectral Anomaly Detection with Human Vision: A Small Target Aware Detector
Authors: Authors: Jitao Ma, Weiying Xie, Yunsong Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.01093
Pdf link: https://arxiv.org/pdf/2401.01093
Abstract Hyperspectral anomaly detection (HAD) aims to localize pixel points whose spectral features differ from the background. HAD is essential in scenarios of unknown or camouflaged target features, such as water quality monitoring, crop growth monitoring and camouflaged target detection, where prior information of targets is difficult to obtain. Existing HAD methods aim to objectively detect and distinguish background and anomalous spectra, which can be achieved almost effortlessly by human perception. However, the underlying processes of human visual perception are thought to be quite complex. In this paper, we analyze hyperspectral image (HSI) features under human visual perception, and transfer the solution process of HAD to the more robust feature space for the first time. Specifically, we propose a small target aware detector (STAD), which introduces saliency maps to capture HSI features closer to human visual perception. STAD not only extracts more anomalous representations, but also reduces the impact of low-confidence regions through a proposed small target filter (STF). Furthermore, considering the possibility of HAD algorithms being applied to edge devices, we propose a full connected network to convolutional network knowledge distillation strategy. It can learn the spectral and spatial features of the HSI while lightening the network. We train the network on the HAD100 training set and validate the proposed method on the HAD100 test set. Our method provides a new solution space for HAD that is closer to human visual perception with high confidence. Sufficient experiments on real HSI with multiple method comparisons demonstrate the excellent performance and unique potential of the proposed method. The code is available at https://github.com/majitao-xd/STAD-HAD.
CityPulse: Fine-Grained Assessment of Urban Change with Street View Time Series
Authors: Authors: Tianyuan Huang, Zejia Wu, Jiajun Wu, Jackelyn Hwang, Ram Rajagopal
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.01107
Pdf link: https://arxiv.org/pdf/2401.01107
Abstract Urban transformations have profound societal impact on both individuals and communities at large. Accurately assessing these shifts is essential for understanding their underlying causes and ensuring sustainable urban planning. Traditional measurements often encounter constraints in spatial and temporal granularity, failing to capture real-time physical changes. While street view imagery, capturing the heartbeat of urban spaces from a pedestrian point of view, can add as a high-definition, up-to-date, and on-the-ground visual proxy of urban change. We curate the largest street view time series dataset to date, and propose an end-to-end change detection model to effectively capture physical alterations in the built environment at scale. We demonstrate the effectiveness of our proposed method by benchmark comparisons with previous literature and implementing it at the city-wide level. Our approach has the potential to supplement existing dataset and serve as a fine-grained and accurate assessment of urban change.
Static Deadlock Detection for Rust Programs
Authors: Authors: Yu Zhang, Kaiwen Zhang, Guanjun Liu
Subjects: Programming Languages (cs.PL); Cryptography and Security (cs.CR); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2401.01114
Pdf link: https://arxiv.org/pdf/2401.01114
Abstract Rust relies on its unique ownership mechanism to ensure thread and memory safety. However, numerous potential security vulnerabilities persist in practical applications. New language features in Rust pose new challenges for vulnerability detection. This paper proposes a static deadlock detection method tailored for Rust programs, aiming to identify various deadlock types, including double lock, conflict lock, and deadlock associated with conditional variables. With due consideration for Rust's ownership and lifetimes, we first complete the pointer analysis. Then, based on the obtained points-to information, we analyze dependencies among variables to identify potential deadlocks. We develop a tool and conduct experiments based on the proposed method. The experimental results demonstrate that our method outperforms existing deadlock detection methods in precision.
Explainable Adaptive Tree-based Model Selection for Time Series Forecasting
Authors: Authors: Matthias Jakobs, Amal Saadallah
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.01124
Pdf link: https://arxiv.org/pdf/2401.01124
Abstract Tree-based models have been successfully applied to a wide variety of tasks, including time series forecasting. They are increasingly in demand and widely accepted because of their comparatively high level of interpretability. However, many of them suffer from the overfitting problem, which limits their application in real-world decision-making. This problem becomes even more severe in online-forecasting settings where time series observations are incrementally acquired, and the distributions from which they are drawn may keep changing over time. In this context, we propose a novel method for the online selection of tree-based models using the TreeSHAP explainability method in the task of time series forecasting. We start with an arbitrary set of different tree-based models. Then, we outline a performance-based ranking with a coherent design to make TreeSHAP able to specialize the tree-based forecasters across different regions in the input time series. In this framework, adequate model selection is performed online, adaptively following drift detection in the time series. In addition, explainability is supported on three levels, namely online input importance, model selection, and model output explanation. An extensive empirical study on various real-world datasets demonstrates that our method achieves excellent or on-par results in comparison to the state-of-the-art approaches as well as several baselines.
Joint Generative Modeling of Scene Graphs and Images via Diffusion Models
Authors: Authors: Bicheng Xu, Qi Yan, Renjie Liao, Lele Wang, Leonid Sigal
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.01130
Pdf link: https://arxiv.org/pdf/2401.01130
Abstract In this paper, we present a novel generative task: joint scene graph - image generation. While previous works have explored image generation conditioned on scene graphs or layouts, our task is distinctive and important as it involves generating scene graphs themselves unconditionally from noise, enabling efficient and interpretable control for image generation. Our task is challenging, requiring the generation of plausible scene graphs with heterogeneous attributes for nodes (objects) and edges (relations among objects), including continuous object bounding boxes and discrete object and relation categories. We introduce a novel diffusion model, DiffuseSG, that jointly models the adjacency matrix along with heterogeneous node and edge attributes. We explore various types of encodings for the categorical data, relaxing it into a continuous space. With a graph transformer being the denoiser, DiffuseSG successively denoises the scene graph representation in a continuous space and discretizes the final representation to generate the clean scene graph. Additionally, we introduce an IoU regularization to enhance the empirical performance. Our model significantly outperforms existing methods in scene graph generation on the Visual Genome and COCO-Stuff datasets, both on standard and newly introduced metrics that better capture the problem complexity. Moreover, we demonstrate the additional benefits of our model in two downstream applications: 1) excelling in a series of scene graph completion tasks, and 2) improving scene graph detection models by using extra training samples generated from DiffuseSG.
Hybrid Pooling and Convolutional Network for Improving Accuracy and Training Convergence Speed in Object Detection
Authors: Authors: Shiwen Zhao, Wei Wang, Junhui Hou, Hai Wu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.01134
Pdf link: https://arxiv.org/pdf/2401.01134
Abstract This paper introduces HPC-Net, a high-precision and rapidly convergent object detection network.
Deep Learning-Based Detection for Marker Codes over Insertion and Deletion Channels
Authors: Authors: Guochen Ma, Xiaopeng Jiao, Jianjun Mu, Hui Han, Yaming Yang
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.01155
Pdf link: https://arxiv.org/pdf/2401.01155
Abstract Marker code is an effective coding scheme to protect data from insertions and deletions. It has potential applications in future storage systems, such as DNA storage and racetrack memory. When decoding marker codes, perfect channel state information (CSI), i.e., insertion and deletion probabilities, are required to detect insertion and deletion errors. Sometimes, the perfect CSI is not easy to obtain or the accurate channel model is unknown. Therefore, it is deserved to develop detecting algorithms for marker code without the knowledge of perfect CSI. In this paper, we propose two CSI-agnostic detecting algorithms for marker code based on deep learning. The first one is a model-driven deep learning method, which deep unfolds the original iterative detecting algorithm of marker code. In this method, CSI become weights in neural networks and these weights can be learned from training data. The second one is a data-driven method which is an end-to-end system based on the deep bidirectional gated recurrent unit network. Simulation results show that error performances of the proposed methods are significantly better than that of the original detection algorithm with CSI uncertainty. Furthermore, the proposed data-driven method exhibits better error performances than other methods for unknown channel models.
YOLO algorithm with hybrid attention feature pyramid network for solder joint defect detection
Authors: Authors: Li Ang, Siti Khatijah Nor Abdul Rahim, Raseeda Hamzah, Raihah Aminuddin, Gao Yousheng
Subjects: Computer Vision and Pattern Recognition (cs.CV); Logic in Computer Science (cs.LO)
Arxiv link: https://arxiv.org/abs/2401.01214
Pdf link: https://arxiv.org/pdf/2401.01214
Abstract Traditional manual detection for solder joint defect is no longer applied during industrial production due to low efficiency, inconsistent evaluation, high cost and lack of real-time data. A new approach has been proposed to address the issues of low accuracy, high false detection rates and computational cost of solder joint defect detection in surface mount technology of industrial scenarios. The proposed solution is a hybrid attention mechanism designed specifically for the solder joint defect detection algorithm to improve quality control in the manufacturing process by increasing the accuracy while reducing the computational cost. The hybrid attention mechanism comprises a proposed enhanced multi-head self-attention and coordinate attention mechanisms increase the ability of attention networks to perceive contextual information and enhances the utilization range of network features. The coordinate attention mechanism enhances the connection between different channels and reduces location information loss. The hybrid attention mechanism enhances the capability of the network to perceive long-distance position information and learn local features. The improved algorithm model has good detection ability for solder joint defect detection, with mAP reaching 91.5%, 4.3% higher than the You Only Look Once version 5 algorithm and better than other comparative algorithms. Compared to other versions, mean Average Precision, Precision, Recall, and Frame per Seconds indicators have also improved. The improvement of detection accuracy can be achieved while meeting real-time detection requirements.
Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond
Authors: Authors: Dimitrios Kollias, Viktoriia Sharmanska, Stefanos Zafeiriou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.01219
Pdf link: https://arxiv.org/pdf/2401.01219
Abstract Multi-Task Learning (MTL) is a framework, where multiple related tasks are learned jointly and benefit from a shared representation space, or parameter transfer. To provide sufficient learning support, modern MTL uses annotated data with full, or sufficiently large overlap across tasks, i.e., each input sample is annotated for all, or most of the tasks. However, collecting such annotations is prohibitive in many real applications, and cannot benefit from datasets available for individual tasks. In this work, we challenge this setup and show that MTL can be successful with classification tasks with little, or non-overlapping annotations, or when there is big discrepancy in the size of labeled data per task. We explore task-relatedness for co-annotation and co-training, and propose a novel approach, where knowledge exchange is enabled between the tasks via distribution matching. To demonstrate the general applicability of our method, we conducted diverse case studies in the domains of affective computing, face recognition, species recognition, and shopping item classification using nine datasets. Our large-scale study of affective tasks for basic expression recognition and facial action unit detection illustrates that our approach is network agnostic and brings large performance improvements compared to the state-of-the-art in both tasks and across all studied databases. In all case studies, we show that co-training via task-relatedness is advantageous and prevents negative transfer (which occurs when MT model's performance is worse than that of at least one single-task model).
Deep Learning-Based Computational Model for Disease Identification in Cocoa Pods (Theobroma cacao L.)
Authors: Authors: Darlyn Buenaño Vera, Byron Oviedo, Washington Chiriboga Casanova, Cristian Zambrano-Vega
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.01247
Pdf link: https://arxiv.org/pdf/2401.01247
Abstract The early identification of diseases in cocoa pods is an important task to guarantee the production of high-quality cocoa. The use of artificial intelligence techniques such as machine learning, computer vision and deep learning are promising solutions to help identify and classify diseases in cocoa pods. In this paper we introduce the development and evaluation of a deep learning computational model applied to the identification of diseases in cocoa pods, focusing on "monilia" and "black pod" diseases. An exhaustive review of state-of-the-art of computational models was carried out, based on scientific articles related to the identification of plant diseases using computer vision and deep learning techniques. As a result of the search, EfficientDet-Lite4, an efficient and lightweight model for object detection, was selected. A dataset, including images of both healthy and diseased cocoa pods, has been utilized to train the model to detect and pinpoint disease manifestations with considerable accuracy. Significant enhancements in the model training and evaluation demonstrate the capability of recognizing and classifying diseases through image analysis. Furthermore, the functionalities of the model were integrated into an Android native mobile with an user-friendly interface, allowing to younger or inexperienced farmers a fast and accuracy identification of health status of cocoa pods
$f$-Divergence Based Classification: Beyond the Use of Cross-Entropy
Authors: Authors: Nicola Novello, Andrea M. Tonello
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2401.01268
Pdf link: https://arxiv.org/pdf/2401.01268
Abstract In deep learning, classification tasks are formalized as optimization problems solved via the minimization of the cross-entropy. However, recent advancements in the design of objective functions allow the $f$-divergence measure to generalize the formulation of the optimization problem for classification. With this goal in mind, we adopt a Bayesian perspective and formulate the classification task as a maximum a posteriori probability problem. We propose a class of objective functions based on the variational representation of the $f$-divergence, from which we extract a list of five posterior probability estimators leveraging well-known $f$-divergences. In addition, driven by the challenge of improving the state-of-the-art approach, we propose a bottom-up method that leads us to the formulation of a new objective function (and posterior probability estimator) corresponding to a novel $f$-divergence referred to as shifted log (SL). First, we theoretically prove the convergence property of the posterior probability estimators. Then, we numerically test the set of proposed objective functions in three application scenarios: toy examples, image data sets, and signal detection/decoding problems. The analyzed tasks demonstrate the effectiveness of the proposed estimators and that the SL divergence achieves the highest classification accuracy in almost all the scenarios.
LLbezpeky: Leveraging Large Language Models for Vulnerability Detection
Authors: Authors: Noble Saji Mathews, Yelizaveta Brus, Yousra Aafer, Mei Nagappan, Shane McIntosh
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2401.01269
Pdf link: https://arxiv.org/pdf/2401.01269
Abstract Despite the continued research and progress in building secure systems, Android applications continue to be ridden with vulnerabilities, necessitating effective detection methods. Current strategies involving static and dynamic analysis tools come with limitations like overwhelming number of false positives and limited scope of analysis which make either difficult to adopt. Over the past years, machine learning based approaches have been extensively explored for vulnerability detection, but its real-world applicability is constrained by data requirements and feature engineering challenges. Large Language Models (LLMs), with their vast parameters, have shown tremendous potential in understanding semnatics in human as well as programming languages. We dive into the efficacy of LLMs for detecting vulnerabilities in the context of Android security. We focus on building an AI-driven workflow to assist developers in identifying and rectifying vulnerabilities. Our experiments show that LLMs outperform our expectations in finding issues within applications correctly flagging insecure apps in 91.67% of cases in the Ghera benchmark. We use inferences from our experiments towards building a robust and actionable vulnerability detection system and demonstrate its effectiveness. Our experiments also shed light on how different various simple configurations can affect the True Positive (TP) and False Positive (FP) rates.
GEqO: ML-Accelerated Semantic Equivalence Detection
Authors: Authors: Brandon Haynes, Rana Alotaibi, Anna Pavlenko, Jyoti Leeka, Alekh Jindal, Yuanyuan Tian
Subjects: Databases (cs.DB); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.01280
Pdf link: https://arxiv.org/pdf/2401.01280
Abstract Large scale analytics engines have become a core dependency for modern data-driven enterprises to derive business insights and drive actions. These engines support a large number of analytic jobs processing huge volumes of data on a daily basis, and workloads are often inundated with overlapping computations across multiple jobs. Reusing common computation is crucial for efficient cluster resource utilization and reducing job execution time. Detecting common computation is the first and key step for reducing this computational redundancy. However, detecting equivalence on large-scale analytics engines requires efficient and scalable solutions that are fully automated. In addition, to maximize computation reuse, equivalence needs to be detected at the semantic level instead of just the syntactic level (i.e., the ability to detect semantic equivalence of seemingly different-looking queries). Unfortunately, existing solutions fall short of satisfying these requirements. In this paper, we take a major step towards filling this gap by proposing GEqO, a portable and lightweight machine-learning-based framework for efficiently identifying semantically equivalent computations at scale. GEqO introduces two machine-learning-based filters that quickly prune out nonequivalent subexpressions and employs a semi-supervised learning feedback loop to iteratively improve its model with an intelligent sampling mechanism. Further, with its novel database-agnostic featurization method, GEqO can transfer the learning from one workload and database to another. Our extensive empirical evaluation shows that, on TPC-DS-like queries, GEqO yields significant performance gains-up to 200x faster than automated verifiers-and finds up to 2x more equivalences than optimizer and signature-based equivalence detection approaches.
Experimental Validation of Sensor Fusion-based GNSS Spoofing Attack Detection Framework for Autonomous Vehicles
Authors: Authors: Sagar Dasgupta, Kazi Hassan Shakib, Mizanur Rahman
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.01304
Pdf link: https://arxiv.org/pdf/2401.01304
Abstract In this paper, we validate the performance of the a sensor fusion-based Global Navigation Satellite System (GNSS) spoofing attack detection framework for Autonomous Vehicles (AVs). To collect data, a vehicle equipped with a GNSS receiver, along with Inertial Measurement Unit (IMU) is used. The detection framework incorporates two strategies: The first strategy involves comparing the predicted location shift, which is the distance traveled between two consecutive timestamps, with the inertial sensor-based location shift. For this purpose, data from low-cost in-vehicle inertial sensors such as the accelerometer and gyroscope sensor are fused and fed into a long short-term memory (LSTM) neural network. The second strategy employs a Random-Forest supervised machine learning model to detect and classify turns, distinguishing between left and right turns using the output from the steering angle sensor. In experiments, two types of spoofing attack models: turn-by-turn and wrong turn are simulated. These spoofing attacks are modeled as SQL injection attacks, where, upon successful implementation, the navigation system perceives injected spoofed location information as legitimate while being unable to detect legitimate GNSS signals. Importantly, the IMU data remains uncompromised throughout the spoofing attack. To test the effectiveness of the detection framework, experiments are conducted in Tuscaloosa, AL, mimicking urban road structures. The results demonstrate the framework's ability to detect various sophisticated GNSS spoofing attacks, even including slow position drifting attacks. Overall, the experimental results showcase the robustness and efficacy of the sensor fusion-based spoofing attack detection approach in safeguarding AVs against GNSS spoofing threats.
Keyword: face recognition

Dual Teacher Knowledge Distillation with Domain Alignment for Face Anti-spoofing
Authors: Authors: Zhe Kong, Wentian Zhang, Tao Wang, Kaihao Zhang, Yuexiang Li, Xiaoying Tang, Wenhan Luo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.01102
Pdf link: https://arxiv.org/pdf/2401.01102
Abstract Face recognition systems have raised concerns due to their vulnerability to different presentation attacks, and system security has become an increasingly critical concern. Although many face anti-spoofing (FAS) methods perform well in intra-dataset scenarios, their generalization remains a challenge. To address this issue, some methods adopt domain adversarial training (DAT) to extract domain-invariant features. However, the competition between the encoder and the domain discriminator can cause the network to be difficult to train and converge. In this paper, we propose a domain adversarial attack (DAA) method to mitigate the training instability problem by adding perturbations to the input images, which makes them indistinguishable across domains and enables domain alignment. Moreover, since models trained on limited data and types of attacks cannot generalize well to unknown attacks, we propose a dual perceptual and generative knowledge distillation framework for face anti-spoofing that utilizes pre-trained face-related models containing rich face priors. Specifically, we adopt two different face-related models as teachers to transfer knowledge to the target student model. The pre-trained teacher models are not from the task of face anti-spoofing but from perceptual and generative tasks, respectively, which implicitly augment the data. By combining both DAA and dual-teacher knowledge distillation, we develop a dual teacher knowledge distillation with domain alignment framework (DTDA) for face anti-spoofing. The advantage of our proposed method has been verified through extensive ablation studies and comparison with state-of-the-art methods on public datasets across multiple protocols.
Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond
Authors: Authors: Dimitrios Kollias, Viktoriia Sharmanska, Stefanos Zafeiriou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.01219
Pdf link: https://arxiv.org/pdf/2401.01219
Abstract Multi-Task Learning (MTL) is a framework, where multiple related tasks are learned jointly and benefit from a shared representation space, or parameter transfer. To provide sufficient learning support, modern MTL uses annotated data with full, or sufficiently large overlap across tasks, i.e., each input sample is annotated for all, or most of the tasks. However, collecting such annotations is prohibitive in many real applications, and cannot benefit from datasets available for individual tasks. In this work, we challenge this setup and show that MTL can be successful with classification tasks with little, or non-overlapping annotations, or when there is big discrepancy in the size of labeled data per task. We explore task-relatedness for co-annotation and co-training, and propose a novel approach, where knowledge exchange is enabled between the tasks via distribution matching. To demonstrate the general applicability of our method, we conducted diverse case studies in the domains of affective computing, face recognition, species recognition, and shopping item classification using nine datasets. Our large-scale study of affective tasks for basic expression recognition and facial action unit detection illustrates that our approach is network agnostic and brings large performance improvements compared to the state-of-the-art in both tasks and across all studied databases. In all case studies, we show that co-training via task-relatedness is advantageous and prevents negative transfer (which occurs when MT model's performance is worse than that of at least one single-task model).
Keyword: augmentation

Data Augmentation Techniques for Cross-Domain WiFi CSI-based Human Activity Recognition
Authors: Authors: Julian Strohmayer, Martin Kampel
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.00964
Pdf link: https://arxiv.org/pdf/2401.00964
Abstract The recognition of human activities based on WiFi Channel State Information (CSI) enables contactless and visual privacy-preserving sensing in indoor environments. However, poor model generalization, due to varying environmental conditions and sensing hardware, is a well-known problem in this space. To address this issue, in this work, data augmentation techniques commonly used in image-based learning are applied to WiFi CSI to investigate their effects on model generalization performance in cross-scenario and cross-system settings. In particular, we focus on the generalization between line-of-sight (LOS) and non-line-of-sight (NLOS) through-wall scenarios, as well as on the generalization between different antenna systems, which remains under-explored. We collect and make publicly available a dataset of CSI amplitude spectrograms of human activities. Utilizing this data, an ablation study is conducted in which activity recognition models based on the EfficientNetV2 architecture are trained, allowing us to assess the effects of each augmentation on model generalization performance. The gathered results show that specific combinations of simple data augmentation techniques applied to CSI amplitude data can significantly improve cross-scenario and cross-system generalization.
Real-Time Object Detection in Occluded Environment with Background Cluttering Effects Using Deep Learning
Authors: Authors: Syed Muhammad Aamir, Hongbin Ma, Malak Abid Ali Khan, Muhammad Aaqib
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.00986
Pdf link: https://arxiv.org/pdf/2401.00986
Abstract Detection of small, undetermined moving objects or objects in an occluded environment with a cluttered background is the main problem of computer vision. This greatly affects the detection accuracy of deep learning models. To overcome these problems, we concentrate on deep learning models for real-time detection of cars and tanks in an occluded environment with a cluttered background employing SSD and YOLO algorithms and improved precision of detection and reduce problems faced by these models. The developed method makes the custom dataset and employs a preprocessing technique to clean the noisy dataset. For training the developed model we apply the data augmentation technique to balance and diversify the data. We fine-tuned, trained, and evaluated these models on the established dataset by applying these techniques and highlighting the results we got more accurately than without applying these techniques. The accuracy and frame per second of the SSD-Mobilenet v2 model are higher than YOLO V3 and YOLO V4. Furthermore, by employing various techniques like data enhancement, noise reduction, parameter optimization, and model fusion we improve the effectiveness of detection and recognition. We further added a counting algorithm, and target attributes experimental comparison, and made a graphical user interface system for the developed model with features of object counting, alerts, status, resolution, and frame per second. Subsequently, to justify the importance of the developed method analysis of YOLO V3, V4, and SSD were incorporated. Which resulted in the overall completion of the proposed method.
Small Bird Detection using YOLOv7 with Test-Time Augmentation
Authors: Authors: Kosuke Shigematsu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.01018
Pdf link: https://arxiv.org/pdf/2401.01018
Abstract In this paper, we propose a method specifically aimed at improving small bird detection for the Small Object Detection Challenge for Spotting Birds 2023. Utilizing YOLOv7 model with test-time augmentation, our approach involves increasing the input resolution, incorporating multiscale inference, considering flipped images during the inference process, and employing weighted boxes fusion to merge detection results. We rigorously explore the impact of each technique on detection performance. Experimental results demonstrate significant improvements in detection accuracy. Our method achieved a top score in the Development category, with a public AP of 0.732 and a private AP of 27.2, both at IoU=0.5.
Optimally or almost optimally extendable self-orthogonal codes and locally recoverable codes from some functions over finite fields
Authors: Authors: Ziling Heng, Xiaoru Li, Yansheng Wu, Qi Wang
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2401.01135
Pdf link: https://arxiv.org/pdf/2401.01135
Abstract Linear codes are widely studied in coding theory as they have nice applications in distributed storage, combinatorics, lattices, cryptography and so on. Constructing linear codes with desirable properties is an interesting research topic. In this paper, based on the augmentation technique, we present two families of linear codes from some functions over finite fields. The first family of linear codes is constructed from monomial functions over finite fields. The locality of them is determined and the weight distributions of two subfamilies of the codes are also given. An infinite family of almost optimal recoverable codes and some optimal recoverable codes are obtained from the linear codes. In particular, the two subfamilies of the codes are proved to be both optimally or almost optimally extendable and self-orthogonal. The second family of linear codes is constructed from weakly regular bent functions over finite fields and their weight distribution is determined. This family of codes is proved to have locality 3 for some cases and is conjectured to have locality 2 for other cases. Particularly, two families of optimal locally recoverable codes are derived from the linear codes. Besides, this family of codes is also proved to be both optimally or almost optimally extendable and self-orthogonal.

LeeKyungwook / get-arxiv-noti

New submissions for Wed, 3 Jan 24 #917

Keyword: detection

PlanarNeRF: Online Learning of Planar Primitives with Neural Radiance Fields

A Bayesian Unification of Self-Supervised Clustering and Energy-Based Models

Balanced Graph Structure Information for Brain Disease Detection

Social-LLM: Modeling User Behavior at Scale using Language Models and Social Network Data

ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention

Accurate Leukocyte Detection Based on Deformable-DETR and Multi-Level Feature Fusion for Aiding Diagnosis of Blood Diseases

Improve Fidelity and Utility of Synthetic Credit Card Transaction Time Series from Data-centric Perspective

Downstream Task-Oriented Generative Model Selections on Synthetic Data Training for Fraud Detection Models

Real-Time Object Detection in Occluded Environment with Background Cluttering Effects Using Deep Learning

Detection and Defense Against Prominent Attacks on Preconditioned LLM-Integrated Virtual Assistants

AI Mobile Application for Archaeological Dating of Bronze Dings

Unsupervised Continual Anomaly Detection with Contrastively-learned Prompt

Boosting Transformer's Robustness and Efficacy in PPG Signal Artifact Detection with Self-Supervised Learning

Small Bird Detection using YOLOv7 with Test-Time Augmentation

Class Relevance Learning For Out-of-distribution Detection

CautionSuicide: A Deep Learning Based Approach for Detecting Suicidal Ideation in Real Time Chatbot Conversation

A Comparison of Bounding Box and Landmark Detection Methods for Video-Based Heart Rate Estimation

Learning in the Wild: Towards Leveraging Unlabeled Data for Effectively Tuning Pre-trained Code Models

A Novel Dual-Stage Evolutionary Algorithm for Finding Robust Solutions

Depth-discriminative Metric Learning for Monocular 3D Object Detection

PLE-SLAM: A Visual-Inertial SLAM Based on Point-Line Features and Efficient IMU Initialization

Exploring Hyperspectral Anomaly Detection with Human Vision: A Small Target Aware Detector

CityPulse: Fine-Grained Assessment of Urban Change with Street View Time Series

Static Deadlock Detection for Rust Programs

Explainable Adaptive Tree-based Model Selection for Time Series Forecasting

Joint Generative Modeling of Scene Graphs and Images via Diffusion Models

Hybrid Pooling and Convolutional Network for Improving Accuracy and Training Convergence Speed in Object Detection

Deep Learning-Based Detection for Marker Codes over Insertion and Deletion Channels

YOLO algorithm with hybrid attention feature pyramid network for solder joint defect detection

Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond

Deep Learning-Based Computational Model for Disease Identification in Cocoa Pods (Theobroma cacao L.)

$f$-Divergence Based Classification: Beyond the Use of Cross-Entropy

LLbezpeky: Leveraging Large Language Models for Vulnerability Detection

GEqO: ML-Accelerated Semantic Equivalence Detection

Experimental Validation of Sensor Fusion-based GNSS Spoofing Attack Detection Framework for Autonomous Vehicles

Keyword: face recognition

Dual Teacher Knowledge Distillation with Domain Alignment for Face Anti-spoofing

Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond

Keyword: augmentation

Data Augmentation Techniques for Cross-Domain WiFi CSI-based Human Activity Recognition

Real-Time Object Detection in Occluded Environment with Background Cluttering Effects Using Deep Learning

Small Bird Detection using YOLOv7 with Test-Time Augmentation

Optimally or almost optimally extendable self-orthogonal codes and locally recoverable codes from some functions over finite fields