New submissions for Thu, 8 Feb 24

Keyword: detection

Breaking Data Silos: Cross-Domain Learning for Multi-Agent Perception from Independent Private Sources

Authors: Jinlong Li, Baolu Li, Xinyu Liu, Runsheng Xu, Jiaqi Ma, Hongkai Yu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.04273
Pdf link: https://arxiv.org/pdf/2402.04273
Abstract The diverse agents in multi-agent perception systems may be from different companies. Each company might use the identical classic neural network architecture based encoder for feature extraction. However, the data source to train the various agents is independent and private in each company, leading to a Distribution Gap between the private data used to train distinct agents in a multi-agent perception system. The data silos caused by this Distribution Gap could result in a significant performance decline in multi-agent perception. In this paper, we thoroughly examine the impact of the distribution gap on existing multi-agent perception systems. To break the data silos, we introduce the Feature Distribution-aware Aggregation (FDA) framework for cross-domain learning to mitigate the Distribution Gap in multi-agent perception. FDA comprises two key components: Learnable Feature Compensation Module and Distribution-aware Statistical Consistency Module, both aimed at enhancing intermediate features to minimize the distribution gap among multi-agent features. Intensive experiments on the public OPV2V and V2XSet datasets underscore FDA's effectiveness in point cloud-based 3D object detection, presenting it as an invaluable augmentation to existing multi-agent perception systems.
Road Surface Defect Detection -- From Image-based to Non-image-based: A Survey
Authors: Jongmin Yu, Jiaqi Jiang, Sebastiano Fichera, Paolo Paoletti, Lisa Layzell, Devansh Mehta, Shan Luo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.04297
Pdf link: https://arxiv.org/pdf/2402.04297
Abstract Ensuring traffic safety is crucial, which necessitates the detection and prevention of road surface defects. As a result, there has been a growing interest in the literature on the subject, leading to the development of various road surface defect detection methods. The methods for detecting road defects can be categorised in various ways depending on the input data types or training methodologies. The predominant approach involves image-based methods, which analyse pixel intensities and surface textures to identify defects. Despite their popularity, image-based methods share the distinct limitation of vulnerability to weather and lighting changes. To address this issue, researchers have explored the use of additional sensors, such as laser scanners or LiDARs, providing explicit depth information to enable the detection of defects in terms of scale and volume. However, the exploration of data beyond images has not been sufficiently investigated. In this survey paper, we provide a comprehensive review of road surface defect detection studies, categorising them based on input data types and methodologies used. Additionally, we review recently proposed non-image-based methods and discuss several challenges and open problems associated with these techniques.
3D printer-controlled syringe pumps for dual, active, regulable and simultaneous dispensing of reagents. Manufacturing of immunochromatographic test strips
Authors: Gabriel Siano, Leandro Peretti, Juan Manuel Marquez, Nazarena Pujato, Leonardo Giovanini, Claudio Berli
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.04354
Pdf link: https://arxiv.org/pdf/2402.04354
Abstract Lateral flow immunoassays (LFIA) are widely used worldwide for the detection of different analytes because they combine multiple advantages such as low production cost, simplicity, and portability, which allows biomarker detection without requiring infrastructure or highly trained personnel. Here we propose to provide solutions to the manufacturing process of LFIA at laboratory-scale, particularly to the controlled and active dispensing of the reagents in the form of the Test Lines (TL) and the Control Lines (CL). To accomplish this task, we adapted a 3D printer to also control Syringe Pumps (SP), since the proposed adaptation of a 3D printer is easy, free and many laboratories already have it in their infrastructure. In turn, the standard function of the 3D printer can be easily restored by disconnecting the SPs and reconnecting the extruder. Additionally, the unified control of the 3D printer enables dual, active, regulable and simultaneous dispensing, four features that are typically found only in certain high-cost commercial equipment. With the proposed setup, the challenge of dispensing simultaneously at least 2 lines (CL and TL) with SPs controlled by a 3D printer was addressed, including regulation in the width of dispensed lines within experimental limits. Also, the construction of an LFIA for the detection of leptospirosis is shown as a practical example of automatized reagent dispensing.
Detection Transformer for Teeth Detection, Segmentation, and Numbering in Oral Rare Diseases: Focus on Data Augmentation and Inpainting Techniques
Authors: Hocine Kadi, Théo Sourget, Marzena Kawczynski, Sara Bendjama, Bruno Grollemund, Agnès Bloch-Zupan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.04408
Pdf link: https://arxiv.org/pdf/2402.04408
Abstract In this work, we focused on deep learning image processing in the context of oral rare diseases, which pose challenges due to limited data availability. A crucial step involves teeth detection, segmentation and numbering in panoramic radiographs. To this end, we used a dataset consisting of 156 panoramic radiographs from individuals with rare oral diseases and labeled by experts. We trained the Detection Transformer (DETR) neural network for teeth detection, segmentation, and numbering the 52 teeth classes. In addition, we used data augmentation techniques, including geometric transformations. Finally, we generated new panoramic images using inpainting techniques with stable diffusion, by removing teeth from a panoramic radiograph and integrating teeth into it. The results showed a mAP exceeding 0.69 for DETR without data augmentation. The mAP improved to 0.82 when data augmentation techniques were used. Furthermore, we observed promising performance when using new panoramic radiographs generated with the inpainting technique, with a mAP of 0.76.
Optimal Binary Signaling for a Two Sensor Gaussian MAC Network
Authors: Luca Sardellitti, Glen Takahara, Fady Alajaji
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2402.04424
Pdf link: https://arxiv.org/pdf/2402.04424
Abstract We consider a two sensor distributed detection system transmitting a binary non-uniform source over a Gaussian multiple access channel (MAC). We model the network via binary sensors whose outputs are generated by binary symmetric channels of different noise levels. We prove an optimal one dimensional constellation design under individual sensor power constraints which minimizes the error probability of detecting the source. Three distinct cases arise for this optimization based on the parameters in the problem setup. In the most notable case (Case III), the optimal signaling design is to not necessarily use all of the power allocated to the more noisy sensor (with less correlation to the source). We compare the error performance of the optimal one dimensional constellation to orthogonal signaling. The results show that the optimal one dimensional constellation achieves lower error probability than using orthogonal channels.
BAdaCost: Multi-class Boosting with Costs
Authors: Antonio Fernández-Baldera, José M. Buenaposada, Luis Baumela
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.04465
Pdf link: https://arxiv.org/pdf/2402.04465
Abstract We present BAdaCost, a multi-class cost-sensitive classification algorithm. It combines a set of cost-sensitive multi-class weak learners to obtain a strong classification rule within the Boosting framework. To derive the algorithm we introduce CMEL, a Cost-sensitive Multi-class Exponential Loss that generalizes the losses optimized in various classification algorithms such as AdaBoost, SAMME, Cost-sensitive AdaBoost and PIBoost, hence unifying them under a common theoretical framework. In the experiments performed we show that BAdaCost achieves significant gains in performance when compared to previous multi-class cost-sensitive approaches. The advantages of the proposed algorithm in asymmetric multi-class classification are also evaluated in practical multi-view face and car detection problems.
IoT Network Traffic Analysis with Deep Learning
Authors: Mei Liu, Leon Yang
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2402.04469
Pdf link: https://arxiv.org/pdf/2402.04469
Abstract As IoT networks become more complex and generate massive amounts of dynamic data, it is difficult to monitor and detect anomalies using traditional statistical methods and machine learning methods. Deep learning algorithms can process and learn from large amounts of data and can also be trained using unsupervised learning techniques, meaning they don't require labelled data to detect anomalies. This makes it possible to detect new and unknown anomalies that may not have been detected before. Also, deep learning algorithms can be automated and highly scalable; thereby, they can run continuously in the backend and make it achievable to monitor large IoT networks instantly. In this work, we conduct a literature review on the most recent works using deep learning techniques and implement a model using ensemble techniques on the KDD Cup 99 dataset. The experimental results showcase the impressive performance of our deep anomaly detection model, achieving an accuracy of over 98%.
M2fNet: Multi-modal Forest Monitoring Network on Large-scale Virtual Dataset
Authors: Yawen Lu, Yunhan Huang, Su Sun, Tansi Zhang, Xuewen Zhang, Songlin Fei, Victor Chen
Subjects: Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2402.04534
Pdf link: https://arxiv.org/pdf/2402.04534
Abstract Forest monitoring and education are key to forest protection, education and management, which is an effective way to measure the progress of a country's forest and climate commitments. Due to the lack of a large-scale wild forest monitoring benchmark, the common practice is to train the model on a common outdoor benchmark (e.g., KITTI) and evaluate it on real forest datasets (e.g., CanaTree100). However, there is a large domain gap in this setting, which makes the evaluation and deployment difficult. In this paper, we propose a new photorealistic virtual forest dataset and a multimodal transformer-based algorithm for tree detection and instance segmentation. To the best of our knowledge, it is the first time that a multimodal detection and segmentation algorithm is applied to large-scale forest scenes. We believe that the proposed dataset and method will inspire the simulation, computer vision, education, and forestry communities towards a more comprehensive multi-modal understanding.
MuNES: Multifloor Navigation Including Elevators and Stairs
Authors: Donghwi Jung, Chan Kim, Jae-Kyung Cho, Seong-Woo Kim
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2402.04535
Pdf link: https://arxiv.org/pdf/2402.04535
Abstract We propose a scheme called MuNES for single mapping and trajectory planning including elevators and stairs. Optimized multifloor trajectories are important for optimal interfloor movements of robots. However, given two or more options of moving between floors, it is difficult to select the best trajectory because there are no suitable indoor multifloor maps in the existing methods. To solve this problem, MuNES creates a single multifloor map including elevators and stairs by estimating altitude changes based on pressure data. In addition, the proposed method performs floor-based loop detection for faster and more accurate loop closure. The single multifloor map is then voxelized leaving only the parts needed for trajectory planning. An optimal and realistic multifloor trajectory is generated by exploring the voxels using an A* algorithm based on the proposed cost function, which accounts for realistic factors. We tested this algorithm using data acquired from around a campus and note that a single accurate multifloor map could be created. Furthermore, an optimal and realistic multifloor trajectory could be found by choosing between elevators and stairs as the means of moving between floors, according to factors such as the starting point, ending point, and elevator waiting time. The code and data used in this work are available at https://github.com/donghwijung/MuNES.
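The altitude-from-pressure step mentioned in this abstract can be illustrated with the standard international barometric formula. This is a minimal sketch and not the MuNES implementation: the function names, the 3.0 m floor height, and the sample pressure readings are assumptions made for illustration only.

```python
# Hypothetical sketch: relative altitude and floor changes from barometric pressure.
# Nothing here is taken from the MuNES code base; names and constants are assumptions.

SEA_LEVEL_HPA = 1013.25  # standard-atmosphere reference pressure in hPa


def pressure_to_altitude(pressure_hpa: float, reference_hpa: float = SEA_LEVEL_HPA) -> float:
    """International barometric formula (lower troposphere), result in metres."""
    return 44330.0 * (1.0 - (pressure_hpa / reference_hpa) ** (1.0 / 5.255))


def floor_change(start_hpa: float, end_hpa: float, floor_height_m: float = 3.0) -> int:
    """Estimate the number of floors climbed (negative means descended).

    floor_height_m is an assumed uniform storey height.
    """
    delta_m = pressure_to_altitude(end_hpa) - pressure_to_altitude(start_hpa)
    return round(delta_m / floor_height_m)


if __name__ == "__main__":
    # A drop of about 0.72 hPa near sea level corresponds to roughly 6 m of climb.
    print(floor_change(1013.25, 1012.53))
```

Only the pressure difference matters for floor estimation, so slow weather-driven drift in absolute pressure affects both readings similarly over short time spans.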
FM-Fusion: Instance-aware Semantic Mapping Boosted by Vision-Language Foundation Models
Authors: Chuhao Liu, Ke Wang, Jieqi Shi, Zhijian Qiao, Shaojie Shen
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2402.04555
Pdf link: https://arxiv.org/pdf/2402.04555
Abstract Semantic mapping based on supervised object detectors is sensitive to image distribution. In real-world environments, object detection and segmentation performance can drop substantially, preventing the use of semantic mapping in a wider domain. On the other hand, the development of vision-language foundation models demonstrates a strong zero-shot transferability across data distribution. It provides an opportunity to construct generalizable instance-aware semantic maps. Hence, this work explores how to boost instance-aware semantic mapping from object detection generated from foundation models. We propose a probabilistic label fusion method to predict closed-set semantic classes from open-set label measurements. An instance refinement module merges the over-segmented instances caused by inconsistent segmentation. We integrate all the modules into a unified semantic mapping system. Reading a sequence of RGB-D input, our work incrementally reconstructs an instance-aware semantic map. We evaluate the zero-shot performance of our method in ScanNet and SceneNN datasets. Our method achieves 40.3 mean average precision (mAP) on the ScanNet semantic instance segmentation task. It outperforms the traditional semantic mapping method significantly.
OIL-AD: An Anomaly Detection Framework for Sequential Decision Sequences
Authors: Chen Wang, Sarah Erfani, Tansu Alpcan, Christopher Leckie
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2402.04567
Pdf link: https://arxiv.org/pdf/2402.04567
Abstract Anomaly detection in decision-making sequences is a challenging problem due to the complexity of normality representation learning and the sequential nature of the task. Most existing methods based on Reinforcement Learning (RL) are difficult to implement in the real world due to unrealistic assumptions, such as having access to environment dynamics, reward signals, and online interactions with the environment. To address these limitations, we propose an unsupervised method named Offline Imitation Learning based Anomaly Detection (OIL-AD), which detects anomalies in decision-making sequences using two extracted behaviour features: action optimality and sequential association. Our offline learning model is an adaptation of behavioural cloning with a transformer policy network, where we modify the training process to learn a Q function and a state value function from normal trajectories. We propose that the Q function and the state value function can provide sufficient information about agents' behavioural data, from which we derive two features for anomaly detection. The intuition behind our method is that the action optimality feature derived from the Q function can differentiate the optimal action from others at each local state, and the sequential association feature derived from the state value function has the potential to maintain the temporal correlations between decisions (state-action pairs). Our experiments show that OIL-AD can achieve outstanding online anomaly detection performance with up to 34.8% improvement in F1 score over comparable baselines.
Ransomware Detection Dynamics: Insights and Implications
Authors: Mike Nkongolo
Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2402.04594
Pdf link: https://arxiv.org/pdf/2402.04594
Abstract The rise of ransomware attacks has necessitated the development of effective strategies for identifying and mitigating these threats. This research investigates the utilization of a feature selection algorithm for distinguishing ransomware-related and benign transactions in both Bitcoin (BTC) and United States Dollar (USD). Leveraging the UGRansome dataset, a comprehensive repository of ransomware related BTC and USD transactions, we propose a set of novel features designed to capture the distinct characteristics of ransomware activity within the cryptocurrency ecosystem. These features encompass transaction metadata, ransom analysis, and behavioral patterns, offering a multifaceted view of ransomware-related financial transactions. Through rigorous experimentation and evaluation, we demonstrate the effectiveness of our feature set in accurately extracting BTC and USD transactions, thereby aiding in the early detection and prevention of ransomware-related financial flows. We introduce a Ransomware Feature Selection Algorithm (RFSA) based on Gini Impurity and Mutual Information (MI) for selecting crucial ransomware features from the UGRansome dataset. Insights from the visualization highlight the potential of Gini Impurity and MI-based feature selection to enhance ransomware detection systems by effectively discriminating between ransomware classes. The analysis reveals that approximately 68% of ransomware incidents involve BTC transactions within the range of 1.46 to 2.56, with an average of 2.01 BTC transactions per attack. The findings emphasize the dynamic and adaptable nature of ransomware demands, suggesting that there is no fixed amount for specific cyberattacks, highlighting the evolving landscape of ransomware threats.
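The two scoring criteria named in this abstract, Gini Impurity and Mutual Information, can be sketched for discrete features as follows. This is an illustrative sketch only and is not the RFSA algorithm from the paper: the function names, the ranking rule, and the toy feature names are assumptions.

```python
# Hypothetical sketch: scoring discrete features with Gini impurity and mutual information.
# This is NOT the paper's RFSA; it only illustrates the two criteria the abstract names.
import math
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(feature, labels):
    """Impurity decrease obtained by splitting the labels on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    weighted = sum(len(g) / n * gini_impurity(g) for g in groups.values())
    return gini_impurity(labels) - weighted


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in nats between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def rank_features(features, labels):
    """Return feature names sorted by mutual information with the labels, best first."""
    return sorted(features, key=lambda name: mutual_information(features[name], labels), reverse=True)


if __name__ == "__main__":
    labels = [0, 0, 1, 1]  # toy class labels
    features = {"informative": [0, 0, 1, 1], "noise": [0, 1, 0, 1]}  # toy feature columns
    print(rank_features(features, labels))
```

Continuous features such as transaction amounts would need to be binned before these discrete estimators apply.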
LLMs Meet VLMs: Boost Open Vocabulary Object Detection with Fine-grained Descriptors
Authors: Sheng Jin, Xueying Jiang, Jiaxing Huang, Lewei Lu, Shijian Lu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.04630
Pdf link: https://arxiv.org/pdf/2402.04630
Abstract Inspired by the outstanding zero-shot capability of vision language models (VLMs) in image classification tasks, open-vocabulary object detection has attracted increasing interest by distilling the broad VLM knowledge into detector training. However, most existing open-vocabulary detectors learn by aligning region embeddings with categorical labels (e.g., bicycle) only, disregarding the capability of VLMs on aligning visual embeddings with fine-grained text description of object parts (e.g., pedals and bells). This paper presents DVDet, a Descriptor-Enhanced Open Vocabulary Detector that introduces conditional context prompts and hierarchical textual descriptors that enable precise region-text alignment as well as open-vocabulary detection training in general. Specifically, the conditional context prompt transforms regional embeddings into image-like representations that can be directly integrated into general open vocabulary detection training. In addition, we introduce large language models as an interactive and implicit knowledge repository which enables iterative mining and refining visually oriented textual descriptors for precise region-text alignment. Extensive experiments over multiple large-scale benchmarks show that DVDet outperforms the state-of-the-art consistently by large margins.
G-NAS: Generalizable Neural Architecture Search for Single Domain Generalization Object Detection
Authors: Fan Wu, Jinling Gao, Lanqing Hong, Xinbing Wang, Chenghu Zhou, Nanyang Ye
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.04672
Pdf link: https://arxiv.org/pdf/2402.04672
Abstract In this paper, we focus on a realistic yet challenging task, Single Domain Generalization Object Detection (S-DGOD), where only one source domain's data can be used for training object detectors, but the detectors have to generalize to multiple distinct target domains. In S-DGOD, both high-capacity fitting and generalization abilities are needed due to the task's complexity. Differentiable Neural Architecture Search (NAS) is known for its high capacity for complex data fitting and we propose to leverage Differentiable NAS to solve S-DGOD. However, it may confront severe over-fitting issues due to the feature imbalance phenomenon, where parameters optimized by gradient descent are biased to learn from the easy-to-learn features, which are usually non-causal and spuriously correlated to ground truth labels, such as the features of background in object detection data. Consequently, this leads to serious performance degradation, especially in generalizing to unseen target domains with huge domain gaps between the source domain and target domains. To address this issue, we propose the Generalizable loss (G-loss), which is an OoD-aware objective, preventing NAS from over-fitting by using gradient descent to optimize parameters not only on a subset of easy-to-learn features but also the remaining predictive features for generalization, and the overall framework is named G-NAS. Experimental results on the S-DGOD urban-scene datasets demonstrate that the proposed G-NAS achieves SOTA performance compared to baseline methods. Codes are available at https://github.com/wufan-cse/G-NAS.
Source Identification in Abstractive Summarization
Authors: Yoshi Suhara, Dimitris Alikaniotis
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2402.04677
Pdf link: https://arxiv.org/pdf/2402.04677
Abstract Neural abstractive summarization models make summaries in an end-to-end manner, and little is known about how the source information is actually converted into summaries. In this paper, we define input sentences that contain essential information in the generated summary as "source sentences" and study how abstractive summaries are made by analyzing the source sentences. To this end, we annotate source sentences for reference summaries and system summaries generated by PEGASUS on document-summary pairs sampled from the CNN/DailyMail and XSum datasets. We also formulate automatic source sentence detection and compare multiple methods to establish a strong baseline for the task. Experimental results show that the perplexity-based method performs well in highly abstractive settings, while similarity-based methods perform robustly in relatively extractive settings. Our code and data are available at https://github.com/suhara/sourcesum.
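A similarity-based source sentence detector of the general kind this abstract mentions can be sketched with plain token overlap. The abstract does not say which similarity measures the authors compare, so the Jaccard score, the whitespace tokenizer, and all function names below are assumptions chosen for illustration.

```python
# Hypothetical sketch: rank input sentences by token overlap with a summary.
# The similarity measure (Jaccard) and the tokenizer are assumptions, not the paper's method.


def tokenize(text):
    """Lowercase whitespace tokenization into a set of tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between the token sets of two strings."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def rank_source_sentences(document_sentences, summary, top_k=1):
    """Return indices of the top_k input sentences most similar to the summary."""
    scored = sorted(
        enumerate(document_sentences),
        key=lambda pair: jaccard(pair[1], summary),
        reverse=True,
    )
    return [index for index, _ in scored[:top_k]]


if __name__ == "__main__":
    document = [
        "the cat sat on the mat",
        "stocks fell sharply on monday",
        "rain is expected tomorrow",
    ]
    print(rank_source_sentences(document, "stocks fell on monday"))
```

Lexical overlap of this kind fits extractive summaries best, which is consistent with the abstract's observation that similarity-based methods are robust in relatively extractive settings.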
Detection Schemes with Low-Resolution ADCs and Spatial Oversampling for Transmission with Higher-Order Constellations in the Terahertz Band
Authors: Christian Forsch, Peter Zillmann, Osama Alrabadi, Stefan Brueck, Wolfgang Gerstacker
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2402.04728
Pdf link: https://arxiv.org/pdf/2402.04728
Abstract In this work, we consider Terahertz (THz) communications with low-resolution uniform quantization and spatial oversampling at the receiver side. We compare different analog-to-digital converter (ADC) parametrizations in a fair manner by keeping the ADC power consumption constant. Here, 1-, 2-, and 3-bit quantization is investigated with different oversampling factors. We analytically compute the statistics of the detection variable, and we propose the optimal as well as several suboptimal detection schemes for arbitrary quantization resolutions. Then, we evaluate the symbol error rate (SER) of the different detectors for a 16- and a 64-ary quadrature amplitude modulation (QAM) constellation. The results indicate that there is a noticeable performance degradation of the suboptimal detection schemes compared to the optimal scheme when the constellation size is larger than the number of quantization levels. Furthermore, at low signal-to-noise ratios (SNRs), 1-bit quantization outperforms 2- and 3-bit quantization, respectively, even when employing higher-order constellations. We confirm our analytical results by Monte Carlo simulations. Both a pure line-of-sight (LoS) and a more realistically modeled indoor THz channel are considered. Then, we optimize the input signal constellation with respect to SER for 1-bit quantization. The results show that the minimum SER can be lowered significantly for 16-QAM by increasing the distance between the inner and outer points of the input constellation. For larger constellations, however, the achievable reduction of the minimum SER is much smaller compared to 16-QAM.
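The low-resolution uniform quantization discussed in this abstract can be illustrated with a simple b-bit quantizer applied separately to the in-phase and quadrature components. This is a sketch under stated assumptions: the mid-rise characteristic, the clipping level, and the function names are illustrative choices and are not taken from the paper.

```python
# Hypothetical sketch: b-bit uniform mid-rise quantizer with clipping.
# The mid-rise design and the clip level are assumptions made for illustration.
import math


def uniform_quantize(x: float, bits: int, clip: float = 1.0) -> float:
    """Quantize a real sample to one of 2**bits levels spread uniformly over [-clip, clip]."""
    levels = 2 ** bits
    step = 2.0 * clip / levels
    index = math.floor((x + clip) / step)
    index = min(max(index, 0), levels - 1)  # saturate samples outside [-clip, clip]
    return -clip + (index + 0.5) * step     # reconstruct at the cell midpoint


def quantize_iq(sample: complex, bits: int, clip: float = 1.0) -> complex:
    """Quantize the real and imaginary parts of a complex baseband sample independently."""
    return complex(
        uniform_quantize(sample.real, bits, clip),
        uniform_quantize(sample.imag, bits, clip),
    )


if __name__ == "__main__":
    received = complex(0.3, -0.9)  # toy received sample
    for bits in (1, 2, 3):
        print(bits, quantize_iq(received, bits))
```

With 1 bit only the sign of each component survives, which is why amplitude information for higher-order constellations has to be recovered from several spatially oversampled observations.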
Review of Cetacean's click detection algorithms
Authors: Mak Gracic, Guy Gubnisky, Roee Diamant
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM)
Arxiv link: https://arxiv.org/abs/2402.04735
Pdf link: https://arxiv.org/pdf/2402.04735
Abstract The detection of echolocation clicks is key in understanding the intricate behaviors of cetaceans and monitoring their populations. Cetacean species that rely on clicks for navigation, foraging and even communication include sperm whales (Physeter macrocephalus) and a variety of dolphin groups. Echolocation clicks are wideband signals of short duration that are often emitted in sequences of varying inter-click-intervals. While datasets and models for clicks exist, the detection and classification of clicks present a significant challenge, mostly due to the diversity of clicks' structures, overlapping signals from simultaneously emitting animals, and the abundance of noise transients from, for example, snapping shrimps and shipping cavitation noise. This paper provides a survey of the many detection and classification methodologies of clicks, ranging from 2002 to 2023. We divide the surveyed techniques into categories by their methodology. Specifically, the categories are feature analysis (e.g., phase, ICI and duration), frequency content, energy-based detection, supervised and unsupervised machine learning, template matching and adaptive detection approaches. Also surveyed are open access platforms for click detection, and databases openly available for testing. Details of the method applied for each paper are given along with advantages and limitations, and for each category we analyze the remaining challenges. The paper also includes a performance comparison for several schemes over a shared database. Finally, we provide tables summarizing the existing detection schemes in terms of challenges addressed, methods, detection and classification tools applied, features used and applications.
Color Recognition in Challenging Lighting Environments: CNN Approach
Authors: Nizamuddin Maitlo, Nooruddin Noonari, Sajid Ahmed Ghanghro, Sathishkumar Duraisamy, Fayaz Ahmed
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.04762
Pdf link: https://arxiv.org/pdf/2402.04762
Abstract Light plays a vital role in vision, whether human or machine, and the perceived color always depends on the lighting conditions of the surroundings. Researchers are working to enhance color detection techniques for computer vision applications. They have proposed several methods using different color detection approaches, but there is still a gap that can be filled. To address this issue, a color detection method based on a Convolutional Neural Network (CNN) is proposed. First, image segmentation is performed using the edge detection segmentation technique to isolate the object, and then the segmented object is fed to the Convolutional Neural Network trained to detect the color of an object in different lighting conditions. It is experimentally verified that our method can substantially enhance the robustness of color detection in different lighting conditions, and that it achieves better results than existing methods.
Multiple bipolar fuzzy measures: an application to community detection problems for networks with additional information
Authors: Inmaculada Gutiérrez, Daniel Gómez, Javier Castro, Rosa Espínola
Subjects: Social and Information Networks (cs.SI); Statistics Theory (math.ST); Physics and Society (physics.soc-ph)
Arxiv link: https://arxiv.org/abs/2402.04786
Pdf link: https://arxiv.org/pdf/2402.04786
Abstract In this paper we introduce the concept of multiple bipolar fuzzy measures as a generalization of a bipolar fuzzy measure. We also propose a new definition of a group, which is based on the multidimensional bipolar fuzzy relations of its elements. Taking into account this information, we provide a novel procedure (based on the well-known Louvain algorithm) to deal with community detection problems. This new method considers the multidimensional bipolar information provided by multiple bipolar fuzzy measures, as well as the information provided by a graph. We also give some detailed computational tests, obtained from the application of this algorithm in several benchmark models.
How Realistic Is Your Synthetic Data? Constraining Deep Generative Models for Tabular Data
Authors: Mihaela Cătălina Stoian, Salijona Dyrmishi, Maxime Cordy, Thomas Lukasiewicz, Eleonora Giunchiglia
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.04823
Pdf link: https://arxiv.org/pdf/2402.04823
Abstract Deep Generative Models (DGMs) have been shown to be powerful tools for generating tabular data, as they have been increasingly able to capture the complex distributions that characterize them. However, to generate realistic synthetic data, it is often not enough to have a good approximation of their distribution, as it also requires compliance with constraints that encode essential background knowledge on the problem at hand. In this paper, we address this limitation and show how DGMs for tabular data can be transformed into Constrained Deep Generative Models (C-DGMs), whose generated samples are guaranteed to be compliant with the given constraints. This is achieved by automatically parsing the constraints and transforming them into a Constraint Layer (CL) seamlessly integrated with the DGM. Our extensive experimental analysis with various DGMs and tasks reveals that standard DGMs often violate constraints, some exceeding 95% non-compliance, while their corresponding C-DGMs are never non-compliant. Then, we quantitatively demonstrate that, at training time, C-DGMs are able to exploit the background knowledge expressed by the constraints to outperform their standard counterparts with up to 6.5% improvement in utility and detection. Further, we show how our CL does not necessarily need to be integrated at training time, as it can be also used as a guardrail at inference time, still producing some improvements in the overall performance of the models. Finally, we show that our CL does not hinder the sample generation time of the models.
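The idea of using a constraint layer as an inference-time guardrail can be illustrated with a much simpler stand-in: a post-processing step that repairs generated rows violating ordering constraints between columns. This is a hypothetical sketch and not the paper's Constraint Layer; the constraint format, the repair rule, and the column names are all assumptions.

```python
# Hypothetical sketch: repair generated tabular samples so that "lower <= upper" holds.
# A simplified stand-in for an inference-time guardrail, not the paper's Constraint Layer.


def violates(sample, constraints):
    """True if any (lower, upper) pair has sample[lower] > sample[upper]."""
    return any(sample[lower] > sample[upper] for lower, upper in constraints)


def apply_guardrail(sample, constraints):
    """Return a repaired copy of the sample that satisfies every ordering constraint.

    Assumption: constraints are listed so that a column is repaired before it is
    used as the lower bound of another column (a topological order).
    """
    fixed = dict(sample)
    for lower, upper in constraints:
        if fixed[lower] > fixed[upper]:
            fixed[upper] = fixed[lower]  # raise the upper column to the bound
    return fixed


if __name__ == "__main__":
    constraints = [("min_price", "max_price")]       # toy constraint
    generated = {"min_price": 5.0, "max_price": 3.0}  # toy non-compliant sample
    repaired = apply_guardrail(generated, constraints)
    print(violates(generated, constraints), violates(repaired, constraints))
```

Because the repair runs after sampling, the generator itself is left unchanged, which mirrors the abstract's point that the layer need not be integrated at training time.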
Advancing Anomaly Detection: An Adaptation Model and a New Dataset
Authors: Liyun Zhu, Arjun Raj, Lei Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.04857
Pdf link: https://arxiv.org/pdf/2402.04857
Abstract Industry surveillance is widely applicable in sectors like retail, manufacturing, education, and smart cities, each presenting unique anomalies requiring specialized detection. However, adapting anomaly detection models to novel viewpoints within the same scenario poses challenges. Extending these models to entirely new scenarios necessitates retraining or fine-tuning, a process that can be time consuming. To address these challenges, we propose the Scenario-Adaptive Anomaly Detection (SA2D) method, leveraging the few-shot learning framework for faster adaptation of pre-trained models to new concepts. Despite this approach, a significant challenge emerges from the absence of a comprehensive dataset with diverse scenarios and camera views. In response, we introduce the Multi-Scenario Anomaly Detection (MSAD) dataset, encompassing 14 distinct scenarios captured from various camera views. This real-world dataset is the first high-resolution anomaly detection dataset, offering a solid foundation for training superior models. MSAD includes diverse normal motion patterns, incorporating challenging variations like different lighting and weather conditions. Through experimentation, we validate the efficacy of SA2D, particularly when trained on the MSAD dataset. Our results show that SA2D not only excels under novel viewpoints within the same scenario but also demonstrates competitive performance when faced with entirely new scenarios. This highlights our method's potential in addressing challenges in detecting anomalies across diverse and evolving surveillance scenarios.
STAR: Shape-focused Texture Agnostic Representations for Improved Object Detection and 6D Pose Estimation
Authors: Peter Hönig, Stefan Thalhammer, Jean-Baptiste Weibel, Matthias Hirschmanner, Markus Vincze
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.04878
Pdf link: https://arxiv.org/pdf/2402.04878
Abstract Recent advances in machine learning have greatly benefited object detection and 6D pose estimation for robotic grasping. However, textureless and metallic objects still pose a significant challenge due to fewer visual cues and the texture bias of CNNs. To address this issue, we propose a texture-agnostic approach that focuses on learning from CAD models and emphasizes object shape features. To achieve a focus on learning shape features, the textures are randomized during the rendering of the training data. By treating the texture as noise, the need for real-world object instances or their final appearance during training data generation is eliminated. The TLESS and ITODD datasets, specifically created for industrial settings in robotics and featuring textureless and metallic objects, were used for evaluation. Texture agnosticity also increases the robustness against image perturbations such as imaging noise, motion blur, and brightness changes, which are common in robotics applications. Code and datasets are publicly available at github.com/hoenigpeter/randomized_texturing.
Toward Accurate Camera-based 3D Object Detection via Cascade Depth Estimation and Calibration
Authors: Chaoqun Wang, Yiran Qin, Zijian Kang, Ningning Ma, Ruimao Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.04883
Pdf link: https://arxiv.org/pdf/2402.04883
Abstract Recent camera-based 3D object detection is limited by the precision of transforming from image to 3D feature spaces, as well as the accuracy of object localization within the 3D space. This paper aims to address such a fundamental problem of camera-based 3D object detection: How to effectively learn depth information for accurate feature lifting and object localization. Different from previous methods which directly predict depth distributions by using a supervised estimation model, we propose a cascade framework consisting of two depth-aware learning paradigms. First, a depth estimation (DE) scheme leverages relative depth information to realize the effective feature lifting from 2D to 3D spaces. Furthermore, a depth calibration (DC) scheme introduces depth reconstruction to further adjust the 3D object localization perturbation along the depth axis. In practice, the DE is explicitly realized by using both the absolute and relative depth optimization loss to promote the precision of depth prediction, while the capability of DC is implicitly embedded into the detection Transformer through a depth denoising mechanism in the training phase. The entire model is trained in an end-to-end manner. We propose a baseline detector and evaluate the effectiveness of our proposal with +2.2%/+2.7% NDS/mAP improvements on the NuScenes benchmark, and achieve comparable performance with 55.9%/45.7% NDS/mAP. Furthermore, we conduct extensive experiments to demonstrate its generality based on various detectors with about +2% NDS improvements.
Detecting Generated Native Ads in Conversational Search
Authors: Sebastian Schmidt, Ines Zelch, Janek Bevendorff, Benno Stein, Matthias Hagen, Martin Potthast
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2402.04889
Pdf link: https://arxiv.org/pdf/2402.04889
Abstract Conversational search engines such as YouChat and Microsoft Copilot use large language models (LLMs) to generate answers to queries. It is only a small step to also use this technology to generate and integrate advertising within these answers - instead of placing ads separately from the organic search results. This type of advertising is reminiscent of native advertising and product placement, both of which are very effective forms of subtle and manipulative advertising. It is likely that information seekers will be confronted with such use of LLM technology in the near future, especially when considering the high computational costs associated with LLMs, for which providers need to develop sustainable business models. This paper investigates whether LLMs can also be used as a countermeasure against generated native ads, i.e., to block them. For this purpose we compile a large dataset of ad-prone queries and of generated answers with automatically integrated ads to experiment with fine-tuned sentence transformers and state-of-the-art LLMs on the task of recognizing the ads. In our experiments sentence transformers achieve detection precision and recall values above 0.9, while the investigated LLMs struggle with the task.
Deep Reinforcement Learning with Dynamic Graphs for Adaptive Informative Path Planning
Authors: Apoorva Vashisth, Julius Rückin, Federico Magistri, Cyrill Stachniss, Marija Popović
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.04894
Pdf link: https://arxiv.org/pdf/2402.04894
Abstract Autonomous robots are often employed for data collection due to their efficiency and low labour costs. A key task in robotic data acquisition is planning paths through an initially unknown environment to collect observations given platform-specific resource constraints, such as limited battery life. Adaptive online path planning in 3D environments is challenging due to the large set of valid actions and the presence of unknown occlusions. To address these issues, we propose a novel deep reinforcement learning approach for adaptively replanning robot paths to map targets of interest in unknown 3D environments. A key aspect of our approach is a dynamically constructed graph that restricts planning actions local to the robot, allowing us to quickly react to newly discovered obstacles and targets of interest. For replanning, we propose a new reward function that balances between exploring the unknown environment and exploiting online-collected data about the targets of interest. Our experiments show that our method enables more efficient target detection compared to state-of-the-art learning and non-learning baselines. We also show the applicability of our approach for orchard monitoring using an unmanned aerial vehicle in a photorealistic simulator.
Text or Image? What is More Important in Cross-Domain Generalization Capabilities of Hate Meme Detection Models?
Authors: Piush Aggarwal, Jawar Mehrabanian, Weigang Huang, Özge Alacam, Torsten Zesch
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.04967
Pdf link: https://arxiv.org/pdf/2402.04967
Abstract This paper delves into the formidable challenge of cross-domain generalization in multimodal hate meme detection, presenting compelling findings. We provide evidence supporting the hypothesis that only the textual component of hateful memes enables the existing multimodal classifier to generalize across different domains, while the image component proves highly sensitive to a specific training dataset. The evidence includes demonstrations showing that hate-text classifiers perform similarly to hate-meme classifiers in a zero-shot setting. Simultaneously, the introduction of captions generated from images of memes to the hate-meme classifier worsens performance by an average F1 of 0.02. Through blackbox explanations, we identify a substantial contribution of the text modality (average of 83%), which diminishes with the introduction of the memes' image captions (52%). Additionally, our evaluation on a newly created confounder dataset reveals higher performance on text confounders as compared to image confounders with an average $\Delta$F1 of 0.18.
Scalable Algorithm for Finding Balanced Subgraphs with Tolerance in Signed Networks
Authors: Jingbang Chen, Qiuyang Mang, Hangrui Zhou, Richard Peng, Yu Gao, Chenhao Ma
Subjects: Social and Information Networks (cs.SI); Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2402.05006
Pdf link: https://arxiv.org/pdf/2402.05006
Abstract Signed networks, characterized by edges labeled as either positive or negative, offer nuanced insights into interaction dynamics beyond the capabilities of unsigned graphs. Central to this is the task of identifying the maximum balanced subgraph, crucial for applications like polarized community detection in social networks and portfolio analysis in finance. Traditional models, however, are limited by an assumption of perfect partitioning, which fails to mirror the complexities of real-world data. Addressing this gap, we introduce an innovative generalized balanced subgraph model that incorporates tolerance for irregularities. Our proposed region-based heuristic algorithm, tailored for this NP-hard problem, strikes a balance between low time complexity and high-quality outcomes. Comparative experiments validate its superior performance against leading solutions, delivering enhanced effectiveness (notably larger subgraph sizes) and efficiency (achieving up to 100x speedup) in both traditional and generalized contexts.
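As a rough illustration of the "perfect partitioning" assumption mentioned in the abstract above, the sketch below counts the edges of a signed graph that disagree with a given two-sided split: positive edges are expected inside a side, negative edges across sides. The tolerance check is only one assumed reading of "tolerance for irregularities"; the paper's actual model and its region-based heuristic are not reproduced here, and the names `count_violations` and `within_tolerance` are hypothetical.

```python
def count_violations(signed_edges, side):
    """Count edges that contradict a two-sided partition of a signed graph.

    signed_edges: iterable of (u, v, sign) with sign +1 or -1.
    side: dict mapping each node to 0 or 1.
    A positive edge is consistent when both endpoints share a side;
    a negative edge is consistent when the endpoints are on different sides.
    """
    violations = 0
    for u, v, sign in signed_edges:
        same_side = side[u] == side[v]
        if (sign > 0) != same_side:
            violations += 1
    return violations


def within_tolerance(signed_edges, side, max_violations):
    """Assumed tolerance rule: accept the split if few enough edges disagree."""
    return count_violations(signed_edges, side) <= max_violations


# Toy example: a triangle with one negative edge cannot be perfectly balanced.
edges = [(0, 1, +1), (1, 2, -1), (0, 2, +1)]
split = {0: 0, 1: 0, 2: 1}
violations = count_violations(edges, split)
```

With a limit of zero violations this reduces to the classic balance condition for the given split; a positive limit admits subgraphs that are only approximately balanced.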
Information Theoretically Secure Encryption Key Generation over Wireless Networks by Exploiting Packet Errors
Authors: Amir K. Khandani
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2402.05012
Pdf link: https://arxiv.org/pdf/2402.05012
Abstract This article presents a novel method for establishing an information theoretically secure encryption key over wireless channels. It exploits the fact that data transmission over wireless links is accompanied by packet error, while noise terms, and thereby the error events observed by two separate receivers, are independent of each other. A number of data packets, with random data, are transmitted from a first legitimate node, say Alice, to a second legitimate node, say Bob. Bob identifies all packets that are received error-free in the first transmission attempt and sends their indices to Alice over a public channel. Then, both Alice and Bob mix the contents of identified packets, e.g., using a hash function, and thereby derive an identical encryption key. Since error events from Alice to Bob are independent of error events from Alice to Eve, the chances that Eve has successfully received all packets used in key generation error-free diminish as the number of packets increases. In many wireless standards, the first stage in error detection and Automatic Repeat Request (ARQ) is deployed at the PHY/MAC (Physical Layer/Medium Access Control) layer. In such setups, the first re-transmission is managed by the PHY/MAC layer without informing higher layers. This makes it impossible to directly access the information related to packet errors through high-level programming interfaces available to an end-user. A method is presented for determining packets received error-free in first transmission attempts through high-level programming. Examples are presented in conjunction with an LTE cellular network.
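The exchange described in the abstract above can be simulated in a few lines. This is a minimal sketch, not the author's implementation: packet errors are modeled as independent coin flips per receiver, SHA-256 stands in for the unspecified "hash function", and the function name and parameters (`simulate_key_agreement`, `p_err_bob`, `p_err_eve`) are hypothetical. It does not model the PHY/MAC retransmission issue the abstract raises.

```python
import hashlib
import random


def simulate_key_agreement(num_packets=64, packet_len=16,
                           p_err_bob=0.3, p_err_eve=0.3, seed=0):
    rng = random.Random(seed)

    # Alice transmits packets filled with random data.
    packets = [bytes(rng.getrandbits(8) for _ in range(packet_len))
               for _ in range(num_packets)]

    # Assumption: each receiver loses each packet independently.
    bob_ok = [rng.random() >= p_err_bob for _ in range(num_packets)]
    eve_ok = [rng.random() >= p_err_eve for _ in range(num_packets)]

    # Bob publishes only the indices of packets he received error-free.
    indices = [i for i, ok in enumerate(bob_ok) if ok]

    def derive_key(known_packets):
        # Mix the contents of the identified packets with a hash function.
        digest = hashlib.sha256()
        for i in indices:
            digest.update(known_packets[i])
        return digest.digest()

    alice_key = derive_key(packets)
    bob_key = derive_key({i: packets[i] for i in indices})

    # Eve can rebuild the key only if she received every identified packet.
    eve_has_all = all(eve_ok[i] for i in indices)
    return alice_key, bob_key, eve_has_all, len(indices)


alice_key, bob_key, eve_has_all, used = simulate_key_agreement()
```

Under this independence assumption, the probability that Eve holds every identified packet is `(1 - p_err_eve) ** used`, which shrinks as more packets contribute to the key.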
Community detection problem based on polarization measures:an application to Twitter: the COVID-19 case in Spain
Authors: Inmaculada Gutiérrez, Juan Antonio Guevara, Daniel Gómez, Javier Castro, Rosa Espínola
Subjects: Social and Information Networks (cs.SI); Statistics Theory (math.ST); Physics and Society (physics.soc-ph)
Arxiv link: https://arxiv.org/abs/2402.05028
Pdf link: https://arxiv.org/pdf/2402.05028
Abstract In this paper, we address one of the most important topics in the field of Social Networks Analysis: the community detection problem with additional information. That additional information is modeled by a fuzzy measure that represents the risk of polarization. Particularly, we are interested in dealing with the problem of taking into account the polarization of nodes in the community detection problem. Adding this type of information to the community detection problem makes it more realistic, as a community is more likely to be defined if the corresponding elements are willing to maintain a peaceful dialogue. The polarization capacity is modeled by a fuzzy measure based on the JDJpol measure of polarization related to two poles. We also present an efficient algorithm for finding groups whose elements are not polarized. We then work on a real case: a network obtained from Twitter, concerning the political position against the Spanish government taken by several influential users. We analyze how the resulting partitions change when additional information about how polarized that society is gets added to the problem.
Efficient Multi-Resolution Fusion for Remote Sensing Data with Label Uncertainty
Authors: Hersh Vakharia, Xiaoxiao Du
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.05045
Pdf link: https://arxiv.org/pdf/2402.05045
Abstract Multi-modal sensor data fusion takes advantage of complementary or reinforcing information from each sensor and can boost overall performance in applications such as scene classification and target detection. This paper presents a new method for fusing multi-modal and multi-resolution remote sensor data without requiring pixel-level training labels, which can be difficult to obtain. Previously, we developed a Multiple Instance Multi-Resolution Fusion (MIMRF) framework that addresses label uncertainty for fusion, but it can be slow to train due to the large search space for the fuzzy measures used to integrate sensor data sources. We propose a new method based on binary fuzzy measures, which reduces the search space and significantly improves the efficiency of the MIMRF framework. We present experimental results on synthetic data and a real-world remote sensing detection task and show that the proposed MIMRF-BFM algorithm can effectively and efficiently perform multi-resolution fusion given remote sensing data with uncertainty.
Keyword: face recognition

There is no result

Keyword: augmentation

Breaking Data Silos: Cross-Domain Learning for Multi-Agent Perception from Independent Private Sources
Authors: Jinlong Li, Baolu Li, Xinyu Liu, Runsheng Xu, Jiaqi Ma, Hongkai Yu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.04273
Pdf link: https://arxiv.org/pdf/2402.04273
Abstract The diverse agents in multi-agent perception systems may be from different companies. Each company might use the identical classic neural network architecture based encoder for feature extraction. However, the data source to train the various agents is independent and private in each company, leading to the Distribution Gap of different private data for training distinct agents in multi-agent perception system. The data silos by the above Distribution Gap could result in a significant performance decline in multi-agent perception. In this paper, we thoroughly examine the impact of the distribution gap on existing multi-agent perception systems. To break the data silos, we introduce the Feature Distribution-aware Aggregation (FDA) framework for cross-domain learning to mitigate the above Distribution Gap in multi-agent perception. FDA comprises two key components: Learnable Feature Compensation Module and Distribution-aware Statistical Consistency Module, both aimed at enhancing intermediate features to minimize the distribution gap among multi-agent features. Intensive experiments on the public OPV2V and V2XSet datasets underscore FDA's effectiveness in point cloud-based 3D object detection, presenting it as an invaluable augmentation to existing multi-agent perception systems.
Detection Transformer for Teeth Detection, Segmentation, and Numbering in Oral Rare Diseases: Focus on Data Augmentation and Inpainting Techniques
Authors: Hocine Kadi, Théo Sourget, Marzena Kawczynski, Sara Bendjama, Bruno Grollemund, Agnès Bloch-Zupan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.04408
Pdf link: https://arxiv.org/pdf/2402.04408
Abstract In this work, we focused on deep learning image processing in the context of oral rare diseases, which pose challenges due to limited data availability. A crucial step involves teeth detection, segmentation and numbering in panoramic radiographs. To this end, we used a dataset consisting of 156 panoramic radiographs from individuals with rare oral diseases and labeled by experts. We trained the Detection Transformer (DETR) neural network for teeth detection, segmentation, and numbering the 52 teeth classes. In addition, we used data augmentation techniques, including geometric transformations. Finally, we generated new panoramic images using inpainting techniques with stable diffusion, by removing teeth from a panoramic radiograph and integrating teeth into it. The results showed a mAP exceeding 0.69 for DETR without data augmentation. The mAP improved to 0.82 when data augmentation techniques were used. Furthermore, we observed promising performance when using new panoramic radiographs generated with the inpainting technique, with a mAP of 0.76.
De-amplifying Bias from Differential Privacy in Language Model Fine-tuning
Authors: Sanjari Srivastava, Piotr Mardziel, Zhikhun Zhang, Archana Ahlawat, Anupam Datta, John C Mitchell
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computers and Society (cs.CY); Methodology (stat.ME)
Arxiv link: https://arxiv.org/abs/2402.04489
Pdf link: https://arxiv.org/pdf/2402.04489
Abstract Fairness and privacy are two important values machine learning (ML) practitioners often seek to operationalize in models. Fairness aims to reduce model bias for social/demographic sub-groups. Privacy via differential privacy (DP) mechanisms, on the other hand, limits the impact of any individual's training data on the resulting model. The trade-offs between privacy and fairness goals of trustworthy ML pose a challenge to those wishing to address both. We show that DP amplifies gender, racial, and religious bias when fine-tuning large language models (LLMs), producing models more biased than ones fine-tuned without DP. We find the cause of the amplification to be a disparity in convergence of gradients across sub-groups. Through the case of binary gender bias, we demonstrate that Counterfactual Data Augmentation (CDA), a known method for addressing bias, also mitigates bias amplification by DP. As a consequence, DP and CDA together can be used to fine-tune models while maintaining both fairness and privacy.
UltraLink: An Open-Source Knowledge-Enhanced Multilingual Supervised Fine-tuning Dataset
Authors: Haoyu Wang, Shuo Wang, Yukun Yan, Xujia Wang, Zhiyu Yang, Yuzhuang Xu, Zhenghao Liu, Ning Ding, Xu Han, Zhiyuan Liu, Maosong Sun
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2402.04588
Pdf link: https://arxiv.org/pdf/2402.04588
Abstract Open-source large language models (LLMs) have gained significant strength across diverse fields. Nevertheless, the majority of studies primarily concentrate on English, with only limited exploration into the realm of multilingual supervised fine-tuning. In this work, we therefore construct an open-source multilingual supervised fine-tuning dataset. Different from previous works that simply translate English instructions, we consider both the language-specific and language-agnostic abilities of LLMs. For language-specific abilities, we introduce a knowledge-grounded data augmentation approach to elicit more culture-specific knowledge of LLMs, improving their ability to serve users from different countries. For language-agnostic abilities, we find through experiments that modern LLMs exhibit strong cross-lingual transfer capabilities, thus repeatedly learning identical content in various languages is not necessary. Consequently, we can substantially prune the language-agnostic SFT data without any performance degradation, making the SFT process more efficient. The resulting UltraLink dataset comprises approximately 1 million samples across five languages, and the proposed data construction method can also be easily extended to other languages. UltraLink-LM, which is trained on UltraLink, outperforms several representative baselines across many tasks.
SPARQL Generation: an analysis on fine-tuning OpenLLaMA for Question Answering over a Life Science Knowledge Graph
Authors: Julio C. Rangel, Tarcisio Mendes de Farias, Ana Claudia Sima, Norio Kobayashi
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Databases (cs.DB); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2402.04627
Pdf link: https://arxiv.org/pdf/2402.04627
Abstract The recent success of Large Language Models (LLM) in a wide range of Natural Language Processing applications opens the path towards novel Question Answering Systems over Knowledge Graphs leveraging LLMs. However, one of the main obstacles preventing their implementation is the scarcity of training data for the task of translating questions into corresponding SPARQL queries, particularly in the case of domain-specific KGs. To overcome this challenge, in this study, we evaluate several strategies for fine-tuning the OpenLlama LLM for question answering over life science knowledge graphs. In particular, we propose an end-to-end data augmentation approach for extending a set of existing queries over a given knowledge graph towards a larger dataset of semantically enriched question-to-SPARQL query pairs, enabling fine-tuning even for datasets where these pairs are scarce. In this context, we also investigate the role of semantic "clues" in the queries, such as meaningful variable names and inline comments. Finally, we evaluate our approach over the real-world Bgee gene expression knowledge graph and we show that semantic clues can improve model performance by up to 33% compared to a baseline with random variable names and no comments included.
CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay
Authors: Natasha Butt, Blazej Manczak, Auke Wiggers, Corrado Rainone, David Zhang, Michaël Defferrard, Taco Cohen
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.04858
Pdf link: https://arxiv.org/pdf/2402.04858
Abstract Large language models are increasingly solving tasks that are commonly believed to require human-level reasoning ability. However, these models still perform very poorly on benchmarks of general intelligence such as the Abstraction and Reasoning Corpus (ARC). In this paper, we approach ARC as a programming-by-examples problem, and introduce a novel and scalable method for language model self-improvement called Code Iteration (CodeIt). Our method iterates between 1) program sampling and hindsight relabeling, and 2) learning from prioritized experience replay. By relabeling the goal of an episode (i.e., the target program output given input) to the realized output produced by the sampled program, our method effectively deals with the extreme sparsity of rewards in program synthesis. Applying CodeIt to the ARC dataset, we demonstrate that prioritized hindsight replay, along with pre-training and data-augmentation, leads to successful inter-task generalization. CodeIt is the first neuro-symbolic approach that scales to the full ARC evaluation dataset. Our method solves 15% of ARC evaluation tasks, achieving state-of-the-art performance and outperforming existing neural and symbolic baselines.
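The hindsight relabeling step in the abstract above can be illustrated with a toy sketch. Everything here is an assumption for illustration: the three named list functions stand in for sampled programs, uniform random choice stands in for sampling from a language model, and the replay prioritization is omitted because the abstract does not state how priorities are assigned.

```python
import random

# Hypothetical toy "programs": named functions over lists of integers.
PROGRAMS = {
    "reverse": lambda xs: xs[::-1],
    "sort": lambda xs: sorted(xs),
    "double": lambda xs: [2 * x for x in xs],
}


def sample_and_relabel(task_input, target_output, rng, num_samples=3):
    """Sample programs and store each with its realized output as the goal."""
    buffer = []
    for _ in range(num_samples):
        name = rng.choice(sorted(PROGRAMS))
        realized = PROGRAMS[name](task_input)
        buffer.append({
            "input": task_input,
            # Relabeled goal: whatever the sampled program actually produced.
            "goal": realized,
            "program": name,
            # Kept only for inspection: did the sample solve the real task?
            "solves_original": realized == target_output,
        })
    return buffer


experiences = sample_and_relabel([3, 1, 2], [1, 2, 3], random.Random(0))
```

Every stored triple is a correct (input, goal, program) example by construction, even when the sampled program misses the original target, which is how relabeling sidesteps sparse rewards.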
PAC Learnability under Explanation-Preserving Graph Perturbations
Authors: Xu Zheng, Farhad Shirani, Tianchun Wang, Shouwei Gao, Wenqian Dong, Wei Cheng, Dongsheng Luo
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.05039
Pdf link: https://arxiv.org/pdf/2402.05039
Abstract Graphical models capture relations between entities in a wide range of applications including social networks, biology, and natural language processing, among others. Graph neural networks (GNN) are neural models that operate over graphs, enabling the model to leverage the complex relationships and dependencies in graph-structured data. A graph explanation is a subgraph which is an 'almost sufficient' statistic of the input graph with respect to its classification label. Consequently, the classification label is invariant, with high probability, to perturbations of graph edges not belonging to its explanation subgraph. This work considers two methods for leveraging such perturbation invariances in the design and training of GNNs. First, explanation-assisted learning rules are considered. It is shown that the sample complexity of explanation-assisted learning can be arbitrarily smaller than explanation-agnostic learning. Next, explanation-assisted data augmentation is considered, where the training set is enlarged by artificially producing new training samples via perturbation of the non-explanation edges in the original training set. It is shown that such data augmentation methods may improve performance if the augmented data is in-distribution; however, they may also lead to worse sample complexity compared to explanation-agnostic learning rules if the augmented data is out-of-distribution. Extensive empirical evaluations are provided to verify the theoretical analysis.
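A minimal sketch of explanation-assisted data augmentation as described in the abstract above: node pairs outside the explanation subgraph are perturbed, while explanation edges are left untouched so the label should be preserved. The independent per-pair flip (which can both remove and add edges) and the names `augment_graph` and `flip_prob` are assumptions; the paper may use a different perturbation scheme.

```python
import random


def augment_graph(edges, explanation_edges, num_nodes, rng, flip_prob=0.2):
    """Return a perturbed edge set of an undirected graph.

    Pairs in the explanation keep their original status; every other
    node pair has its presence flipped with probability flip_prob.
    """
    def norm(edge):
        return (min(edge), max(edge))

    explanation = {norm(e) for e in explanation_edges}
    current = {norm(e) for e in edges}

    augmented = set()
    for u in range(num_nodes):
        for v in range(u + 1, num_nodes):
            pair = (u, v)
            present = pair in current
            if pair not in explanation and rng.random() < flip_prob:
                present = not present  # perturb a non-explanation pair
            if present:
                augmented.add(pair)
    return augmented


graph = [(0, 1), (1, 2), (2, 3)]
explanation = [(0, 1)]
new_sample = augment_graph(graph, explanation, 4, random.Random(0))
```

A large `flip_prob` pushes the augmented graphs further from the training distribution, which is the regime where the abstract warns that augmentation can hurt sample complexity.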
Language-Based Augmentation to Address Shortcut Learning in Object Goal Navigation
Authors: Dennis Hoftijzer, Gertjan Burghouts, Luuk Spreeuwers
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.05090
Pdf link: https://arxiv.org/pdf/2402.05090
Abstract Deep Reinforcement Learning (DRL) has shown great potential in enabling robots to find certain objects (e.g., 'find a fridge') in environments like homes or schools. This task is known as Object-Goal Navigation (ObjectNav). DRL methods are predominantly trained and evaluated using environment simulators. Although DRL has shown impressive results, the simulators may be biased or limited. This creates a risk of shortcut learning, i.e., learning a policy tailored to specific visual details of training environments. We aim to deepen our understanding of shortcut learning in ObjectNav and its implications, and to propose a solution. We design an experiment for inserting a shortcut bias in the appearance of training environments. As a proof-of-concept, we associate room types to specific wall colors (e.g., bedrooms with green walls), and observe poor generalization of a state-of-the-art (SOTA) ObjectNav method to environments where this is not the case (e.g., bedrooms with blue walls). We find that shortcut learning is the root cause: the agent learns to navigate to target objects by simply searching for the associated wall color of the target object's room. To solve this, we propose Language-Based (L-B) augmentation. Our key insight is that we can leverage the multimodal feature space of a Vision-Language Model (VLM) to augment visual representations directly at the feature-level, requiring no changes to the simulator, and only an addition of one layer to the model. Where the SOTA ObjectNav method's success rate drops 69%, our proposal has only a drop of 23%.

