New submissions for Tue, 13 Feb 24

Keyword: detection

Transfer learning with generative models for object detection on limited datasets

Authors: Matteo Paiano, Stefano Martina, Carlotta Giannelli, Filippo Caruso
Subjects: Computer Vision and Pattern Recognition (cs.CV); Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2402.06784
Pdf link: https://arxiv.org/pdf/2402.06784
Abstract The availability of data is limited in some fields, especially for object detection tasks, where it is necessary to have correctly labeled bounding boxes around each object. A notable example of such data scarcity is found in the domain of marine biology, where it is useful to develop methods to automatically detect submarine species for environmental monitoring. To address this data limitation, the state-of-the-art machine learning strategies employ two main approaches. The first involves pretraining models on existing datasets before generalizing to the specific domain of interest. The second strategy is to create synthetic datasets specifically tailored to the target domain using methods like copy-paste techniques or ad-hoc simulators. The first strategy often faces a significant domain shift, while the second demands custom solutions crafted for the specific task. In response to these challenges, here we propose a transfer learning framework that is valid for a generic scenario. In this framework, generated images help to improve the performances of an object detector in a few-real data regime. This is achieved through a diffusion-based generative model that was pretrained on large generic datasets, and is not trained on the task-specific domain. We validate our approach on object detection tasks, specifically focusing on fishes in an underwater environment, and on the more common domain of cars in an urban setting. Our method achieves detection performance comparable to models trained on thousands of images, using only a few hundreds of input data. Our results pave the way for new generative AI-based protocols for machine learning applications in various domains, for instance ranging from geophysics to biology and medicine.
Reasoning Grasping via Multimodal Large Language Model
Authors: Shiyu Jin, Jinxuan Xu, Yutian Lei, Liangjun Zhang
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2402.06798
Pdf link: https://arxiv.org/pdf/2402.06798
Abstract Despite significant progress in robotic systems for operation within human-centric environments, existing models still heavily rely on explicit human commands to identify and manipulate specific objects. This limits their effectiveness in environments where understanding and acting on implicit human intentions are crucial. In this study, we introduce a novel task: reasoning grasping, where robots need to generate grasp poses based on indirect verbal instructions or intentions. To accomplish this, we propose an end-to-end reasoning grasping model that integrates a multi-modal Large Language Model (LLM) with a vision-based robotic grasping framework. In addition, we present the first reasoning grasping benchmark dataset generated from GraspNet-1Billion, incorporating implicit instructions for object-level and part-level grasping, and this dataset will soon be available for public access. Our results show that directly integrating CLIP or LLaVA with the grasp detection model performs poorly on the challenging reasoning grasping tasks, while our proposed model demonstrates significantly enhanced performance both in the reasoning grasping benchmark and real-world experiments.
Event-to-Video Conversion for Overhead Object Detection
Authors: Darryl Hannan, Ragib Arnab, Gavin Parpart, Garrett T. Kenyon, Edward Kim, Yijing Watkins
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.06805
Pdf link: https://arxiv.org/pdf/2402.06805
Abstract Collecting overhead imagery using an event camera is desirable due to the energy efficiency of the image sensor compared to standard cameras. However, event cameras complicate downstream image processing, especially for complex tasks such as object detection. In this paper, we investigate the viability of event streams for overhead object detection. We demonstrate that across a number of standard modeling approaches, there is a significant gap in performance between dense event representations and corresponding RGB frames. We establish that this gap is, in part, due to a lack of overlap between the event representations and the pre-training data used to initialize the weights of the object detectors. Then, we apply event-to-video conversion models that convert event streams into gray-scale video to close this gap. We demonstrate that this approach results in a large performance increase, outperforming even event-specific object detection techniques on our overhead target task. These results suggest that better alignment between event representations and existing large pre-trained models may result in greater short-term performance gains compared to end-to-end event-specific architectural improvements.
Neural Rendering based Urban Scene Reconstruction for Autonomous Driving
Authors: Shihao Shen, Louis Kerofsky, Varun Ravi Kumar, Senthil Yogamani
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2402.06826
Pdf link: https://arxiv.org/pdf/2402.06826
Abstract Dense 3D reconstruction has many applications in automated driving including automated annotation validation, multimodal data augmentation, providing ground truth annotations for systems lacking LiDAR, as well as enhancing auto-labeling accuracy. LiDAR provides highly accurate but sparse depth, whereas camera images enable estimation of dense depth but noisy particularly at long ranges. In this paper, we harness the strengths of both sensors and propose a multimodal 3D scene reconstruction using a framework combining neural implicit surfaces and radiance fields. In particular, our method estimates dense and accurate 3D structures and creates an implicit map representation based on signed distance fields, which can be further rendered into RGB images, and depth maps. A mesh can be extracted from the learned signed distance field and culled based on occlusion. Dynamic objects are efficiently filtered on the fly during sampling using 3D object detection models. We demonstrate qualitative and quantitative results on challenging automotive scenes.
Benchmarking Frameworks and Comparative Studies of Controller Area Network (CAN) Intrusion Detection Systems: A Review
Authors: Shaila Sharmin, Hafizah Mansor, Andi Fitriah Abdul Kadir, Normaziah A. Aziz
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2402.06904
Pdf link: https://arxiv.org/pdf/2402.06904
Abstract The development of intrusion detection systems (IDS) for the in-vehicle Controller Area Network (CAN) bus is one of the main efforts being taken to secure the in-vehicle network against various cyberattacks, which have the potential to cause vehicles to malfunction and result in dangerous accidents. These CAN IDS are evaluated in disparate experimental conditions that vary in terms of the workload used, the features used, the metrics reported, etc., which makes direct comparison difficult. Therefore, there have been several benchmarking frameworks and comparative studies designed to evaluate CAN IDS in similar experimental conditions to understand their relative performance and facilitate the selection of the best CAN IDS for implementation in automotive networks. This work provides a comprehensive survey of CAN IDS benchmarking frameworks and comparative studies in the current literature. A CAN IDS evaluation design space is also proposed in this work, which draws from the wider CAN IDS literature. This is not only expected to serve as a guide for designing CAN IDS evaluation experiments but is also used for categorizing current benchmarking efforts. The surveyed works have been discussed on the basis of the five aspects in the design space (namely IDS type, attack model, evaluation type, workload generation, and evaluation metrics), and recommendations for future work have been identified.
Assessing Uncertainty Estimation Methods for 3D Image Segmentation under Distribution Shifts
Authors: Masoumeh Javanbakhat, Md Tasnimul Hasan, Cristoph Lippert
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.06937
Pdf link: https://arxiv.org/pdf/2402.06937
Abstract In recent years, machine learning has witnessed extensive adoption across various sectors, yet its application in medical image-based disease detection and diagnosis remains challenging due to distribution shifts in real-world data. In practical settings, deployed models encounter samples that differ significantly from the training dataset, especially in the health domain, leading to potential performance issues. This limitation hinders the expressiveness and reliability of deep learning models in health applications. Thus, it becomes crucial to identify methods capable of producing reliable uncertainty estimation in the context of distribution shifts in the health sector. In this paper, we explore the feasibility of using cutting-edge Bayesian and non-Bayesian methods to detect distributionally shifted samples, aiming to achieve reliable and trustworthy diagnostic predictions in segmentation task. Specifically, we compare three distinct uncertainty estimation methods, each designed to capture either unimodal or multimodal aspects in the posterior distribution. Our findings demonstrate that methods capable of addressing multimodal characteristics in the posterior distribution, offer more dependable uncertainty estimates. This research contributes to enhancing the utility of deep learning in healthcare, making diagnostic predictions more robust and trustworthy.
Semantic Object-level Modeling for Robust Visual Camera Relocalization
Authors: Yifan Zhu, Lingjuan Miao, Haitao Wu, Zhiqiang Zhou, Weiyi Chen, Longwen Wu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2402.06951
Pdf link: https://arxiv.org/pdf/2402.06951
Abstract Visual relocalization is crucial for autonomous visual localization and navigation of mobile robotics. Due to the improvement of CNN-based object detection algorithms, the robustness of visual relocalization is greatly enhanced, especially in viewpoints where classical methods fail. However, ellipsoids (quadrics) generated by axis-aligned object detection may limit the accuracy of the object-level representation and degrade the performance of the visual relocalization system. In this paper, we propose a novel method of automatic object-level voxel modeling for accurate ellipsoidal representations of objects. As for visual relocalization, we design a better pose optimization strategy for camera pose recovery, to fully utilize the projection characteristics of 2D fitted ellipses and the 3D accurate ellipsoids. All of these modules are entirely integrated into a visual SLAM system. Experimental results show that our semantic object-level mapping and object-based visual relocalization methods significantly enhance the performance of visual relocalization in terms of robustness to new viewpoints.
Architectural Neural Backdoors from First Principles
Authors: Harry Langford, Ilia Shumailov, Yiren Zhao, Robert Mullins, Nicolas Papernot
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.06957
Pdf link: https://arxiv.org/pdf/2402.06957
Abstract While previous research backdoored neural networks by changing their parameters, recent work uncovered a more insidious threat: backdoors embedded within the definition of the network's architecture. This involves injecting common architectural components, such as activation functions and pooling layers, to subtly introduce a backdoor behavior that persists even after (full re-)training. However, the full scope and implications of architectural backdoors have remained largely unexplored. Bober-Irizar et al. [2023] introduced the first architectural backdoor; they showed how to create a backdoor for a checkerboard pattern, but never explained how to target an arbitrary trigger pattern of choice. In this work we construct an arbitrary trigger detector which can be used to backdoor an architecture with no human supervision. This leads us to revisit the concept of architecture backdoors and taxonomise them, describing 12 distinct types. To gauge the difficulty of detecting such backdoors, we conducted a user study, revealing that ML developers can only identify suspicious components in common model definitions as backdoors in 37% of cases, while they surprisingly preferred backdoored models in 33% of cases. To contextualize these results, we find that language models outperform humans at the detection of backdoors. Finally, we discuss defenses against architectural backdoors, emphasizing the need for robust and comprehensive strategies to safeguard the integrity of ML systems.
A Change Detection Reality Check
Authors: Isaac Corley, Caleb Robinson, Anthony Ortiz
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.06994
Pdf link: https://arxiv.org/pdf/2402.06994
Abstract In recent years, there has been an explosion of proposed change detection deep learning architectures in the remote sensing literature. These approaches claim to offer state-of-the-art performance on different standard benchmark datasets. However, has the field truly made significant progress? In this paper we perform experiments which conclude that a simple U-Net segmentation baseline without training tricks or complicated architectural changes is still a top performer for the task of change detection.
Gemini Goes to Med School: Exploring the Capabilities of Multimodal Large Language Models on Medical Challenge Problems & Hallucinations
Authors: Ankit Pal, Malaikannan Sankarasubbu
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.07023
Pdf link: https://arxiv.org/pdf/2402.07023
Abstract Large language models have the potential to be valuable in the healthcare industry, but it's crucial to verify their safety and effectiveness through rigorous evaluation. For this purpose, we comprehensively evaluated both open-source LLMs and Google's new multimodal LLM called Gemini across Medical reasoning, hallucination detection, and Medical Visual Question Answering tasks. While Gemini showed competence, it lagged behind state-of-the-art models like MedPaLM 2 and GPT-4 in diagnostic accuracy. Additionally, Gemini achieved an accuracy of 61.45% on the medical VQA dataset, significantly lower than GPT-4V's score of 88%. Our analysis revealed that Gemini is highly susceptible to hallucinations, overconfidence, and knowledge gaps, which indicate risks if deployed uncritically. We also performed a detailed analysis by medical subject and test type, providing actionable feedback for developers and clinicians. To mitigate risks, we applied prompting strategies that improved performance. Additionally, we facilitated future research and development by releasing a Python module for medical LLM evaluation and establishing a dedicated leaderboard on Hugging Face for medical domain LLMs. The Python module can be found at https://github.com/promptslab/RosettaEval
Domain Adaptable Fine-Tune Distillation Framework For Advancing Farm Surveillance
Authors: Raza Imam, Muhammad Huzaifa, Nabil Mansour, Shaher Bano Mirza, Fouad Lamghari
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.07059
Pdf link: https://arxiv.org/pdf/2402.07059
Abstract In this study, we propose an automated framework for camel farm monitoring, introducing two key contributions: the Unified Auto-Annotation framework and the Fine-Tune Distillation framework. The Unified Auto-Annotation approach combines two models, GroundingDINO (GD), and Segment-Anything-Model (SAM), to automatically annotate raw datasets extracted from surveillance videos. Building upon this foundation, the Fine-Tune Distillation framework conducts fine-tuning of student models using the auto-annotated dataset. This process involves transferring knowledge from a large teacher model to a student model, resembling a variant of Knowledge Distillation. The Fine-Tune Distillation framework aims to be adaptable to specific use cases, enabling the transfer of knowledge from the large models to the small models, making it suitable for domain-specific applications. By leveraging our raw dataset collected from Al-Marmoom Camel Farm in Dubai, UAE, and a pre-trained teacher model, GroundingDINO, the Fine-Tune Distillation framework produces a lightweight deployable model, YOLOv8. This framework demonstrates high performance and computational efficiency, facilitating efficient real-time object detection. Our code is available at https://github.com/Razaimam45/Fine-Tune-Distillation
Explainable Global Wildfire Prediction Models using Graph Neural Networks
Authors: Dayou Chen, Sibo Cheng, Jinwei Hu, Matthew Kasoar, Rossella Arcucci
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2402.07152
Pdf link: https://arxiv.org/pdf/2402.07152
Abstract Wildfire prediction has become increasingly crucial due to the escalating impacts of climate change. Traditional CNN-based wildfire prediction models struggle with handling missing oceanic data and addressing the long-range dependencies across distant regions in meteorological data. In this paper, we introduce an innovative Graph Neural Network (GNN)-based model for global wildfire prediction. We propose a hybrid model that combines the spatial prowess of Graph Convolutional Networks (GCNs) with the temporal depth of Long Short-Term Memory (LSTM) networks. Our approach uniquely transforms global climate and wildfire data into a graph representation, addressing challenges such as null oceanic data locations and long-range dependencies inherent in traditional models. Benchmarking against established architectures using an unseen ensemble of JULES-INFERNO simulations, our model demonstrates superior predictive accuracy. Furthermore, we emphasise the model's explainability, unveiling potential wildfire correlation clusters through community detection and elucidating feature importance via Integrated Gradient analysis. Our findings not only advance the methodological domain of wildfire prediction but also underscore the importance of model transparency, offering valuable insights for stakeholders in wildfire management.
On (Mis)perceptions of testing effectiveness: an empirical study
Authors: Sira Vegas, Patricia Riofrio, Esperanza Marcos, Natalia Juristo
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2402.07222
Pdf link: https://arxiv.org/pdf/2402.07222
Abstract A recurring problem in software development is incorrect decision making on the techniques, methods and tools to be used. Mostly, these decisions are based on developers' perceptions about them. A factor influencing people's perceptions is past experience, but it is not the only one. In this research, we aim to discover how well the perceptions of the defect detection effectiveness of different techniques match their real effectiveness in the absence of prior experience. To do this, we conduct an empirical study plus a replication. During the original study, we conduct a controlled experiment with students applying two testing techniques and a code review technique. At the end of the experiment, they take a survey to find out which technique they perceive to be most effective. The results show that participants' perceptions are wrong and that this mismatch is costly in terms of quality. In order to gain further insight into the results, we replicate the controlled experiment and extend the survey to include questions about participants' opinions on the techniques and programs. The results of the replicated study confirm the findings of the original study and suggest that participants' perceptions might be based not on their opinions about complexity or preferences for techniques but on how well they think that they have applied the techniques.
Can Tree Based Approaches Surpass Deep Learning in Anomaly Detection? A Benchmarking Study
Authors: Santonu Sarkar, Shanay Mehta, Nicole Fernandes, Jyotirmoy Sarkar, Snehanshu Saha
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.07281
Pdf link: https://arxiv.org/pdf/2402.07281
Abstract Detection of anomalous situations for complex mission-critical systems holds paramount importance when their service continuity needs to be ensured. A major challenge in detecting anomalies from the operational data arises due to the imbalanced class distribution problem since the anomalies are supposed to be rare events. This paper evaluates a diverse array of machine learning-based anomaly detection algorithms through a comprehensive benchmark study. The paper contributes significantly by conducting an unbiased comparison of various anomaly detection algorithms, spanning classical machine learning including various tree-based approaches to deep learning and outlier detection methods. The inclusion of 104 publicly available and a few proprietary industrial systems datasets enhances the diversity of the study, allowing for a more realistic evaluation of algorithm performance and emphasizing the importance of adaptability to real-world scenarios. The paper dispels the deep learning myth, demonstrating that though powerful, deep learning is not a universal solution in this case. We observed that recently proposed tree-based evolutionary algorithms outperform in many scenarios. We noticed that tree-based approaches catch a singleton anomaly in a dataset where deep learning methods fail. On the other hand, classical SVM performs the best on datasets with more than 10% anomalies, implying that such scenarios can be best modeled as a classification problem rather than anomaly detection. To our knowledge, such a study on a large number of state-of-the-art algorithms using diverse data sets, with the objective of guiding researchers and practitioners in making informed algorithmic choices, has not been attempted earlier.
Towards Explainable, Safe Autonomous Driving with Language Embeddings for Novelty Identification and Active Learning: Framework and Experimental Analysis with Real-World Data Sets
Authors: Ross Greer, Mohan Trivedi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.07320
Pdf link: https://arxiv.org/pdf/2402.07320
Abstract This research explores the integration of language embeddings for active learning in autonomous driving datasets, with a focus on novelty detection. Novelty arises from unexpected scenarios that autonomous vehicles struggle to navigate, necessitating higher-level reasoning abilities. Our proposed method employs language-based representations to identify novel scenes, emphasizing the dual purpose of safety takeover responses and active learning. The research presents a clustering experiment using Contrastive Language-Image Pretrained (CLIP) embeddings to organize datasets and detect novelties. We find that the proposed algorithm effectively isolates novel scenes from a collection of subsets derived from two real-world driving datasets, one vehicle-mounted and one infrastructure-mounted. From the generated clusters, we further present methods for generating textual explanations of elements which differentiate scenes classified as novel from other scenes in the data pool, presenting qualitative examples from the clustered results. Our results demonstrate the effectiveness of language-driven embeddings in identifying novel elements and generating explanations of data, and we further discuss potential applications in safe takeovers, data curation, and multi-task active learning.
Exploring Saliency Bias in Manipulation Detection
Authors: Joshua Krinsky, Alan Bettis, Qiuyu Tang, Daniel Moreira, Aparna Bharati
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.07338
Pdf link: https://arxiv.org/pdf/2402.07338
Abstract The social media-fuelled explosion of fake news and misinformation supported by tampered images has led to growth in the development of models and datasets for image manipulation detection. However, existing detection methods mostly treat media objects in isolation, without considering the impact of specific manipulations on viewer perception. Forensic datasets are usually analyzed based on the manipulation operations and corresponding pixel-based masks, but not on the semantics of the manipulation, i.e., type of scene, objects, and viewers' attention to scene content. The semantics of the manipulation play an important role in spreading misinformation through manipulated images. In an attempt to encourage further development of semantic-aware forensic approaches to understand visual misinformation, we propose a framework to analyze the trends of visual and semantic saliency in popular image manipulation datasets and their impact on detection.
Leveraging AI to Advance Science and Computing Education across Africa: Progress, Challenges, and Opportunities
Authors: George Boateng
Subjects: Computers and Society (cs.CY); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2402.07397
Pdf link: https://arxiv.org/pdf/2402.07397
Abstract Across the African continent, students grapple with various educational challenges, including limited access to essential resources such as computers, internet connectivity, reliable electricity, and a shortage of qualified teachers. Despite these challenges, recent advances in AI such as BERT, and GPT-4 have demonstrated their potential for advancing education. Yet, these AI tools tend to be deployed and evaluated predominantly within the context of Western educational settings, with limited attention directed towards the unique needs and challenges faced by students in Africa. In this book chapter, we describe our works developing and deploying AI in Education tools in Africa: (1) SuaCode, an AI-powered app that enables Africans to learn to code using their smartphones, (2) AutoGrad, an automated grading, and feedback tool for graphical and interactive coding assignments, (3) a tool for code plagiarism detection that shows visual evidence of plagiarism, (4) Kwame, a bilingual AI teaching assistant for coding courses, (5) Kwame for Science, a web-based AI teaching assistant that provides instant answers to students' science questions and (6) Brilla AI, an AI contestant for the National Science and Maths Quiz competition. We discuss challenges and potential opportunities to use AI to advance science and computing education across Africa.
Large Language Models are Few-shot Generators: Proposing Hybrid Prompt Algorithm To Generate Webshell Escape Samples
Authors: Mingrui Ma, Lansheng Han, Chunjie Zhou
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2402.07408
Pdf link: https://arxiv.org/pdf/2402.07408
Abstract The frequent occurrence of cyber-attacks has made webshell attacks and defense gradually become a research hotspot in the field of network security. However, the lack of publicly available benchmark datasets and the over-reliance on manually defined rules for webshell escape sample generation have slowed down the progress of research related to webshell escape sample generation strategies and artificial intelligence-based webshell detection algorithms. To address the drawbacks of weak webshell sample escape capabilities, the lack of webshell datasets with complex malicious features, and to promote the development of webshell detection technology, we propose the Hybrid Prompt algorithm for webshell escape sample generation with the help of large language models. As a prompt algorithm specifically developed for webshell sample generation, the Hybrid Prompt algorithm not only combines various prompt ideas including Chain of Thought, Tree of Thought, but also incorporates various components such as webshell hierarchical module and few-shot example to facilitate the LLM in learning and reasoning webshell escape strategies. Experimental results show that the Hybrid Prompt algorithm can work with multiple LLMs with excellent code reasoning ability to generate high-quality webshell samples with high Escape Rate (88.61% with GPT-4 model on VIRUSTOTAL detection engine) and Survival Rate (54.98% with GPT-4 model).
Context-aware Multi-Model Object Detection for Diversely Heterogeneous Compute Systems
Authors: Justin Davis, Mehmet E. Belviranli
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2402.07415
Pdf link: https://arxiv.org/pdf/2402.07415
Abstract In recent years, deep neural networks (DNNs) have gained widespread adoption for continuous mobile object detection (OD) tasks, particularly in autonomous systems. However, a prevalent issue in their deployment is the one-size-fits-all approach, where a single DNN is used, resulting in inefficient utilization of computational resources. This inefficiency is particularly detrimental in energy-constrained systems, as it degrades overall system efficiency. We identify that, the contextual information embedded in the input data stream (e.g. the frames in the camera feed that the OD models are run on) could be exploited to allow a more efficient multi-model-based OD process. In this paper, we propose SHIFT which continuously selects from a variety of DNN-based OD models depending on the dynamically changing contextual information and computational constraints. During this selection, SHIFT uniquely considers multi-accelerator execution to better optimize the energy-efficiency while satisfying the latency constraints. Our proposed methodology results in improvements of up to 7.5x in energy usage and 2.8x in latency compared to state-of-the-art GPU-based single model OD approaches.
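The selection idea in the abstract above can be illustrated with a minimal sketch. This is not the SHIFT algorithm: the abstract does not give its scoring rule, so the policy below (pick the lowest-energy candidate that meets a latency budget and an accuracy floor for the current context) is an assumption, and every name, accelerator label, and number is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """One (model, accelerator) pairing with hypothetical profiled costs."""
    name: str
    accelerator: str
    latency_ms: float         # assumed profiled latency per frame
    energy_j: float           # assumed profiled energy per frame
    expected_accuracy: float  # assumed accuracy estimate for the current context


def select_model(candidates: List[Candidate],
                 max_latency_ms: float,
                 min_accuracy: float) -> Optional[Candidate]:
    """Return the lowest-energy candidate that satisfies both constraints."""
    feasible = [c for c in candidates
                if c.latency_ms <= max_latency_ms
                and c.expected_accuracy >= min_accuracy]
    if not feasible:
        return None  # caller decides on a fallback
    return min(feasible, key=lambda c: c.energy_j)


# Hypothetical candidates; none of these figures come from the paper.
CANDIDATES = [
    Candidate("small-detector", "npu", latency_ms=12.0, energy_j=0.4, expected_accuracy=0.62),
    Candidate("medium-detector", "gpu", latency_ms=20.0, energy_j=1.1, expected_accuracy=0.74),
    Candidate("large-detector", "gpu", latency_ms=45.0, energy_j=2.6, expected_accuracy=0.81),
]

easy_scene = select_model(CANDIDATES, max_latency_ms=33.0, min_accuracy=0.60)
hard_scene = select_model(CANDIDATES, max_latency_ms=33.0, min_accuracy=0.70)
print(easy_scene.name, hard_scene.name)
```

In this sketch a change in context is modelled only as a change in the accuracy floor; a real system would re-estimate `expected_accuracy` per frame or per scene.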
Malicious Package Detection using Metadata Information
Authors: S. Halder, M. Bewong, A. Mahboubi, Y. Jiang, R. Islam, Z. Islam, R. Ip, E. Ahmed, G. Ramachandran, A. Babar
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2402.07444
Pdf link: https://arxiv.org/pdf/2402.07444
Abstract Protecting software supply chains from malicious packages is paramount in the evolving landscape of software development. Attacks on the software supply chain involve attackers injecting harmful software into commonly used packages or libraries in a software repository. For instance, JavaScript uses Node Package Manager (NPM), and Python uses Python Package Index (PyPi) as their respective package repositories. In the past, NPM has had vulnerabilities such as the event-stream incident, where a malicious package was introduced into a popular NPM package, potentially impacting a wide range of projects. As the integration of third-party packages becomes increasingly ubiquitous in modern software development, accelerating the creation and deployment of applications, the need for a robust detection mechanism has become critical. On the other hand, due to the sheer volume of new packages being released daily, the task of identifying malicious packages presents a significant challenge. To address this issue, in this paper, we introduce a metadata-based malicious package detection model, MeMPtec. This model extracts a set of features from package metadata information. These extracted features are classified as either easy-to-manipulate (ETM) or difficult-to-manipulate (DTM) features based on monotonicity and restricted control properties. By utilising these metadata features, not only do we improve the effectiveness of detecting malicious packages, but also we demonstrate its resistance to adversarial attacks in comparison with existing state-of-the-art. Our experiments indicate a significant reduction in both false positives (up to 97.56%) and false negatives (up to 91.86%).
TriAug: Out-of-Distribution Detection for Robust Classification of Imbalanced Breast Lesion in Ultrasound
Authors: Authors: Yinyu Ye, Shijing Chen, Dong Ni, Ruobing Huang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.07452
Pdf link: https://arxiv.org/pdf/2402.07452
Abstract Different diseases, such as histological subtypes of breast lesions, have severely varying incidence rates. Even when trained with a substantial amount of in-distribution (ID) data, models often encounter out-of-distribution (OOD) samples belonging to unseen classes in clinical reality. To address this, we propose a novel framework built upon a long-tailed OOD detection task for breast ultrasound images. It is equipped with a triplet state augmentation (TriAug) which improves ID classification accuracy while maintaining a promising OOD detection performance. Meanwhile, we designed a balanced sphere loss to handle the class imbalance problem.
ClusterTabNet: Supervised clustering method for table detection and table structure recognition
Authors: Authors: Marek Polewczyk, Marco Spinaci
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.07502
Pdf link: https://arxiv.org/pdf/2402.07502
Abstract We present a novel deep-learning-based method to cluster words in documents which we apply to detect and recognize tables given the OCR output. We interpret table structure bottom-up as a graph of relations between pairs of words (belonging to the same row, column, header, as well as to the same table) and use a transformer encoder model to predict its adjacency matrix. We demonstrate the performance of our method on the PubTables-1M dataset as well as PubTabNet and FinTabNet datasets. Compared to the current state-of-the-art detection methods such as DETR and Faster R-CNN, our method achieves similar or better accuracy, while requiring a significantly smaller model.
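
The abstract above describes predicting an adjacency matrix of pairwise word relations (same row, same column, and so on). As a minimal sketch of how such a matrix could be turned into word groups, the following Python example thresholds the scores and takes connected components. The function name `clusters_from_adjacency`, the 0.5 threshold, the toy scores, and the use of connected components are all assumptions made for illustration; the abstract does not say how the authors decode the matrix.

```python
def clusters_from_adjacency(adj, threshold=0.5):
    """Group item indices into connected components of a thresholded adjacency matrix.

    `adj[i][j]` is a hypothetical predicted score that items i and j are related.
    """
    n = len(adj)
    parent = list(range(n))  # union-find forest, one node per item

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            # Treat the relation as symmetric: link if either direction is confident.
            if max(adj[i][j], adj[j][i]) >= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy "same row" scores for four OCR words (made-up numbers).
words = ["Item", "Price", "Apple", "1.20"]
same_row = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.0, 0.2],
    [0.1, 0.0, 1.0, 0.8],
    [0.0, 0.2, 0.8, 1.0],
]
rows = [[words[i] for i in group] for group in clusters_from_adjacency(same_row)]
print(rows)
```

Running the same grouping on a "same column" or "same table" matrix would give the other structural groupings, under the same assumptions.
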
Using Ensemble Inference to Improve Recall of Clone Detection
Authors: Authors: Gul Aftab Ahmed, James Vincent Patten, Yuanhua Han, Guoxian Lu, David Gregg, Jim Buckley, Muslim Chochlov
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2402.07523
Pdf link: https://arxiv.org/pdf/2402.07523
Abstract Large-scale source-code clone detection is a challenging task. In our previous work, we proposed an approach (SSCD) that leverages artificial neural networks and approximate nearest neighbour search to locate clones in large-scale bodies of code effectively and in a time-efficient manner. However, our literature review suggests that the relative efficacy of differing neural network models has not been assessed in the context of large-scale clone detection approaches. In this work, we aim to assess several such models individually, in terms of their potential to maximize recall, while preserving a high level of precision during clone detection. We investigate if ensemble inference (in this case, using the results of more than one of these neural network models in combination) can further assist in this task. To assess this, we employed four state-of-the-art neural network models and evaluated them individually and in combination. The results, on an illustrative dataset of approximately 500K lines of C/C++ code, suggest that ensemble inference outperforms individual models in all trialled cases, when recall is concerned. Of individual models, the ADA model (belonging to the ChatGPT family of models) has the best performance. However, commercial companies may not be prepared to hand their proprietary source code over to the cloud, as required by that approach. Consequently, they may be more interested in an ensemble-combination of CodeBERT-based and CodeT5 models, resulting in similar (if slightly lesser) recall and precision results.
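
To make the recall argument in the abstract above concrete, here is a minimal Python sketch that combines the clone pairs reported by two models and compares precision and recall. Taking the set union of the predictions is an assumption chosen for illustration (the abstract only says the results are used "in combination"), and the fragment names, model outputs, and ground truth are made-up toy data.

```python
def pair(a, b):
    """Normalise a clone pair so that (a, b) and (b, a) compare equal."""
    return tuple(sorted((a, b)))


def precision_recall(predicted, truth):
    """Return (precision, recall) of a set of predicted clone pairs."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical ground truth and per-model predictions over code fragments f1..f9.
truth = {pair("f1", "f2"), pair("f3", "f4"), pair("f5", "f6"), pair("f7", "f8")}
model_1 = {pair("f1", "f2"), pair("f3", "f4"), pair("f1", "f9")}
model_2 = {pair("f4", "f3"), pair("f5", "f6")}

# Assumed combination rule: a pair is reported if any model reports it.
ensemble = model_1 | model_2

for name, predicted in [("model_1", model_1), ("model_2", model_2), ("ensemble", ensemble)]:
    p, r = precision_recall(predicted, truth)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

With a union rule, recall can never fall below that of any member model, while precision may drop because every model's false positives are kept; other combination rules (for example, majority voting) trade these off differently.
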
BreakGPT: A Large Language Model with Multi-stage Structure for Financial Breakout Detection
Authors: Authors: Kang Zhang, Osamu Yoshie, Weiran Huang
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2402.07536
Pdf link: https://arxiv.org/pdf/2402.07536
Abstract Trading range breakout (TRB) is a key method in the technical analysis of financial trading, widely employed by traders in financial markets such as stocks, futures, and foreign exchange. However, distinguishing between true and false breakouts and providing the correct rationale pose significant challenges to investors. Recently, large language models have achieved success in various downstream applications, but their effectiveness in the domain of financial breakout detection has been subpar. The reason is that breakout detection requires unique data and specific knowledge. To address these issues, we introduce BreakGPT, the first large language model for financial breakout detection. Furthermore, we have developed a novel framework for large language models, namely multi-stage structure, effectively reducing mistakes in downstream applications. Experimental results indicate that compared to GPT-3.5, BreakGPT improves the accuracy of answers and rationale by 44%, with the multi-stage structure contributing 17.6% to the improvement. Additionally, it outperforms ChatGPT-4 by 42.07%. Our code is publicly available: https://github.com/Neviim96/BreakGPT
ASAP-Repair: API-Specific Automated Program Repair Based on API Usage Graphs
Authors: Authors: Sebastian Nielebock, Paul Blockhaus, Jacob Krüger, Frank Ortmeier
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2402.07542
Pdf link: https://arxiv.org/pdf/2402.07542
Abstract Modern software development relies on the reuse of code via Application Programming Interfaces (APIs). Such reuse relieves developers from learning and developing established algorithms and data structures anew, enabling them to focus on their problem at hand. However, there is also the risk of misusing an API due to a lack of understanding or proper documentation. While many techniques target API misuse detection, only limited efforts have been put into automatically repairing API misuses. In this paper, we present our advances on our technique API-Specific Automated Program Repair (ASAP-Repair). ASAP-Repair is intended to fix API misuses based on API Usage Graphs (AUGs) by leveraging API usage templates of state-of-the-art API misuse detectors. We demonstrate that ASAP-Repair is in principle applicable on an established API misuse dataset. Moreover, we discuss next steps and challenges to evolve ASAP-Repair towards a full-fledged Automatic Program Repair (APR) technique.
Unveiling Group-Specific Distributed Concept Drift: A Fairness Imperative in Federated Learning
Authors: Authors: Teresa Salazar, João Gama, Helder Araújo, Pedro Henriques Abreu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.07586
Pdf link: https://arxiv.org/pdf/2402.07586
Abstract In the evolving field of machine learning, ensuring fairness has become a critical concern, prompting the development of algorithms designed to mitigate discriminatory outcomes in decision-making processes. However, achieving fairness in the presence of group-specific concept drift remains an unexplored frontier, and our research represents pioneering efforts in this regard. Group-specific concept drift refers to situations where one group experiences concept drift over time while another does not, leading to a decrease in fairness even if accuracy remains fairly stable. Within the framework of federated learning, where clients collaboratively train models, its distributed nature further amplifies these challenges since each client can experience group-specific concept drift independently while still sharing the same underlying concept, creating a complex and dynamic environment for maintaining fairness. One of the significant contributions of our research is the formalization and introduction of the problem of group-specific concept drift and its distributed counterpart, shedding light on its critical importance in the realm of fairness. In addition, leveraging insights from prior research, we adapt an existing distributed concept drift adaptation algorithm to tackle group-specific distributed concept drift which utilizes a multi-model approach, a local group-specific drift detection mechanism, and continuous clustering of models over time. The findings from our experiments highlight the importance of addressing group-specific concept drift and its distributed counterpart to advance fairness in machine learning.
Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles
Authors: Authors: Rui Song, Chenwei Liang, Hu Cao, Zhiran Yan, Walter Zimmer, Markus Gross, Andreas Festag, Alois Knoll
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.07635
Pdf link: https://arxiv.org/pdf/2402.07635
Abstract Collaborative perception in automated vehicles leverages the exchange of information between agents, aiming to elevate perception results. Previous camera-based collaborative 3D perception methods typically employ 3D bounding boxes or bird's eye views as representations of the environment. However, these approaches fall short in offering a comprehensive 3D environmental prediction. To bridge this gap, we introduce the first method for collaborative 3D semantic occupancy prediction. Particularly, it improves local 3D semantic occupancy predictions by hybrid fusion of (i) semantic and occupancy task features, and (ii) compressed orthogonal attention features shared between vehicles. Additionally, due to the lack of a collaborative perception dataset designed for semantic occupancy prediction, we augment a current collaborative perception dataset to include 3D collaborative semantic occupancy labels for a more robust evaluation. The experimental findings highlight that: (i) our collaborative semantic occupancy predictions excel above the results from single vehicles by over 30%, and (ii) models anchored on semantic occupancy outpace state-of-the-art collaborative 3D detection techniques in subsequent perception applications, showcasing enhanced accuracy and enriched semantic-awareness in road environments.
A Flow-based Credibility Metric for Safety-critical Pedestrian Detection
Authors: Authors: Maria Lyssenko, Christoph Gladisch, Christian Heinzemann, Matthias Woehrle, Rudolph Triebel
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.07642
Pdf link: https://arxiv.org/pdf/2402.07642
Abstract Safety is of utmost importance for perception in automated driving (AD). However, a prime safety concern in state-of-the-art object detection is that standard evaluation schemes utilize safety-agnostic metrics to argue sufficient detection performance. Hence, it is imperative to leverage supplementary domain knowledge to accentuate safety-critical misdetections during evaluation tasks. To tackle the underspecification, this paper introduces a novel credibility metric, called c-flow, for pedestrian bounding boxes. To this end, c-flow relies on a complementary optical flow signal from image sequences and enhances the analyses of safety-critical misdetections without requiring additional labels. We implement and evaluate c-flow with a state-of-the-art pedestrian detector on a large AD dataset. Our analysis demonstrates that c-flow allows developers to identify safety-critical misdetections.
AYDIV: Adaptable Yielding 3D Object Detection via Integrated Contextual Vision Transformer
Authors: Authors: Tanmoy Dam, Sanjay Bhargav Dharavath, Sameer Alam, Nimrod Lilith, Supriyo Chakraborty, Mir Feroskhan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2402.07680
Pdf link: https://arxiv.org/pdf/2402.07680
Abstract Combining LiDAR and camera data has shown potential in enhancing short-distance object detection in autonomous driving systems. Yet, the fusion encounters difficulties with extended distance detection due to the contrast between LiDAR's sparse data and the dense resolution of cameras. Besides, discrepancies in the two data representations further complicate fusion methods. We introduce AYDIV, a novel framework integrating a tri-phase alignment process specifically designed to enhance long-distance detection even amidst data discrepancies. AYDIV consists of the Global Contextual Fusion Alignment Transformer (GCFAT), which improves the extraction of camera features and provides a deeper understanding of large-scale patterns; the Sparse Fused Feature Attention (SFFA), which fine-tunes the fusion of LiDAR and camera details; and the Volumetric Grid Attention (VGA) for a comprehensive spatial data fusion. AYDIV's performance on the Waymo Open Dataset (WOD), with an improvement of 1.24% in mAPH value (L2 difficulty), and on the Argoverse2 Dataset, with a performance improvement of 7.40% in AP value, demonstrates its efficacy in comparison to other existing fusion-based methods. Our code is publicly available at https://github.com/sanjay-810/AYDIV2
Evaluation of a Smart Mobile Robotic System for Industrial Plant Inspection and Supervision
Authors: Authors: Georg K.J. Fischer, Max Bergau, D. Adriana Gómez-Rosal, Andreas Wachaja, Johannes Gräter, Matthias Odenweller, Uwe Piechottka, Fabian Hoeflinger, Nikhil Gosala, Niklas Wetzel, Daniel Büscher, Abhinav Valada, Wolfram Burgard
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2402.07691
Pdf link: https://arxiv.org/pdf/2402.07691
Abstract Automated and autonomous industrial inspection is a longstanding research field, driven by the necessity to enhance safety and efficiency within industrial settings. In addressing this need, we introduce an autonomously navigating robotic system designed for comprehensive plant inspection. This innovative system comprises a robotic platform equipped with a diverse array of sensors integrated to facilitate the detection of various process and infrastructure parameters. These sensors encompass optical (LiDAR, Stereo, UV/IR/RGB cameras), olfactory (electronic nose), and acoustic (microphone array) capabilities, enabling the identification of factors such as methane leaks, flow rates, and infrastructural anomalies. The proposed system underwent individual evaluation at a wastewater treatment site within a chemical plant, providing a practical and challenging environment for testing. The evaluation process encompassed key aspects such as object detection, 3D localization, and path planning. Furthermore, specific evaluations were conducted for optical methane leak detection and localization, as well as acoustic assessments focusing on pump equipment and gas leak localization.
TELLER: A Trustworthy Framework for Explainable, Generalizable and Controllable Fake News Detection
Authors: Authors: Hui Liu, Wenya Wang, Haoru Li, Haoliang Li
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2402.07776
Pdf link: https://arxiv.org/pdf/2402.07776
Abstract The proliferation of fake news has emerged as a severe societal problem, raising significant interest from industry and academia. While existing deep-learning based methods have made progress in detecting fake news accurately, their reliability may be compromised by non-transparent reasoning processes, poor generalization abilities and inherent risks of integration with large language models (LLMs). To address this challenge, we propose TELLER, a novel framework for trustworthy fake news detection that prioritizes explainability, generalizability and controllability of models. This is achieved via a dual-system framework that integrates cognition and decision systems, adhering to the principles above. The cognition system harnesses human expertise to generate logical predicates, which guide LLMs in generating human-readable logic atoms. Meanwhile, the decision system deduces generalizable logic rules to aggregate these atoms, enabling the identification of the truthfulness of the input news across diverse domains and enhancing transparency in the decision-making process. Finally, we present comprehensive evaluation results on four datasets, demonstrating the feasibility and trustworthiness of our proposed framework. Our implementation is available at https://github.com/less-and-less-bugs/Trust_TELLER.
PBADet: A One-Stage Anchor-Free Approach for Part-Body Association
Authors: Authors: Zhongpai Gao, Huayi Zhou, Abhishek Sharma, Meng Zheng, Benjamin Planche, Terrence Chen, Ziyan Wu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2402.07814
Pdf link: https://arxiv.org/pdf/2402.07814
Abstract The detection of human parts (e.g., hands, face) and their correct association with individuals is an essential task, e.g., for ubiquitous human-machine interfaces and action recognition. Traditional methods often employ multi-stage processes, rely on cumbersome anchor-based systems, or do not scale well to larger part sets. This paper presents PBADet, a novel one-stage, anchor-free approach for part-body association detection. Building upon the anchor-free object representation across multi-scale feature maps, we introduce a singular part-to-body center offset that effectively encapsulates the relationship between parts and their parent bodies. Our design is inherently versatile and capable of managing multiple parts-to-body associations without compromising on detection accuracy or robustness. Comprehensive experiments on various datasets underscore the efficacy of our approach, which not only outperforms existing state-of-the-art techniques but also offers a more streamlined and efficient solution to the part-body association challenge.
On the Detection of Reviewer-Author Collusion Rings From Paper Bidding
Authors: Authors: Steven Jecmen, Nihar B. Shah, Fei Fang, Leman Akoglu
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)
Arxiv link: https://arxiv.org/abs/2402.07860
Pdf link: https://arxiv.org/pdf/2402.07860
Abstract A major threat to the peer-review systems of computer science conferences is the existence of "collusion rings" between reviewers. In such collusion rings, reviewers who have also submitted their own papers to the conference work together to manipulate the conference's paper assignment, with the aim of being assigned to review each other's papers. The most straightforward way that colluding reviewers can manipulate the paper assignment is by indicating their interest in each other's papers through strategic paper bidding. One potential approach to solve this important problem would be to detect the colluding reviewers from their manipulated bids, after which the conference can take appropriate action. While prior work has developed effective techniques to detect other kinds of fraud, no research has yet established that detecting collusion rings is even possible. In this work, we tackle the question of whether it is feasible to detect collusion rings from the paper bidding. To answer this question, we conduct empirical analysis of two realistic conference bidding datasets, including evaluations of existing algorithms for fraud detection in other applications. We find that collusion rings can achieve considerable success at manipulating the paper assignment while remaining hidden from detection: for example, in one dataset, undetected colluders are able to achieve assignment to up to 30% of the papers authored by other colluders. In addition, when 10 colluders bid on all of each other's papers, no detection algorithm outputs a group of reviewers with more than 31% overlap with the true colluders. These results suggest that collusion cannot be effectively detected from the bidding, demonstrating the need to develop more complex detection algorithms that leverage additional metadata.
Using Graph Theory for Improving Machine Learning-based Detection of Cyber Attacks
Authors: Authors: Giacomo Zonneveld, Lorenzo Principi, Marco Baldi
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.07878
Pdf link: https://arxiv.org/pdf/2402.07878
Abstract Early detection of network intrusions and cyber threats is one of the main pillars of cybersecurity. One of the most effective approaches for this purpose is to analyze network traffic with the help of artificial intelligence algorithms, with the aim of detecting the possible presence of an attacker by distinguishing it from a legitimate user. This is commonly done by collecting the traffic exchanged between terminals in a network and analyzing it on a per-packet or per-connection basis. In this paper, we propose instead to perform pre-processing of network traffic under analysis with the aim of extracting some new metrics on which we can perform more efficient detection and overcome some limitations of classical approaches. These new metrics are based on graph theory, and consider the network as a whole, rather than focusing on individual packets or connections. Our approach is validated through experiments performed on publicly available data sets, from which it results that it can not only overcome some of the limitations of classical approaches, but also achieve a better detection capability of cyber threats.
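
As a rough illustration of what "considering the network as a whole" can mean, the following Python sketch builds a directed graph from observed connections and derives per-host degree features. The abstract above does not name the metrics the authors use, so in-degree and out-degree are assumptions picked as the simplest graph-theoretic examples; the function name `degree_features` and the IP addresses are hypothetical.

```python
from collections import defaultdict


def degree_features(flows):
    """Compute per-host in/out degree from (source, destination) connection records.

    Degrees count distinct peers, so repeated connections between the same
    two hosts do not inflate the value.
    """
    out_peers = defaultdict(set)
    in_peers = defaultdict(set)
    for src, dst in flows:
        out_peers[src].add(dst)
        in_peers[dst].add(src)

    hosts = sorted(set(out_peers) | set(in_peers))
    return {
        host: {
            "out_degree": len(out_peers.get(host, ())),
            "in_degree": len(in_peers.get(host, ())),
        }
        for host in hosts
    }


# Toy traffic: 10.0.0.5 contacts many distinct hosts, as a scanner might.
flows = [
    ("10.0.0.5", "10.0.0.1"),
    ("10.0.0.5", "10.0.0.2"),
    ("10.0.0.5", "10.0.0.3"),
    ("10.0.0.1", "10.0.0.2"),
    ("10.0.0.5", "10.0.0.1"),  # repeated connection, counted once
]
features = degree_features(flows)
print(features["10.0.0.5"])
```

Features of this kind describe a host's position in the traffic graph and could be fed to a classifier alongside, or instead of, per-packet or per-connection features.
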
Distributed Anomaly Detection in Modern Power Systems: A Penalty-based Mitigation Approach
Authors: Authors: Erfan Mehdipour Abadi, Masoud H. Nazari
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2402.07884
Pdf link: https://arxiv.org/pdf/2402.07884
Abstract The evolving landscape of electric power networks, influenced by the integration of distributed energy resources, requires the development of novel power system monitoring and control architectures. This paper develops an algorithm to monitor and detect anomalies in different parts of a power system that cannot be measured directly, by applying neighboring measurements and a dynamic probing technique in a distributed fashion. Additionally, the proposed method accurately assesses the severity of the anomaly. A decision-making algorithm is introduced to effectively penalize anomalous agents, ensuring vigilant oversight of the entire power system's functioning. Simulation results show the efficacy of the algorithms in distributed anomaly detection and mitigation.
MODIPHY: Multimodal Obscured Detection for IoT using PHantom Convolution-Enabled Faster YOLO
Authors: Authors: Shubhabrata Mukherjee, Cory Beard, Zhu Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.07894
Pdf link: https://arxiv.org/pdf/2402.07894
Abstract Low-light conditions and occluded scenarios impede object detection in real-world Internet of Things (IoT) applications like autonomous vehicles and security systems. While advanced machine learning models strive for accuracy, their computational demands clash with the limitations of resource-constrained devices, hampering real-time performance. In our current research, we tackle this challenge, by introducing "YOLO Phantom", one of the smallest YOLO models ever conceived. YOLO Phantom utilizes the novel Phantom Convolution block, achieving comparable accuracy to the latest YOLOv8n model while simultaneously reducing both parameters and model size by 43%, resulting in a significant 19% reduction in Giga Floating Point Operations (GFLOPs). YOLO Phantom leverages transfer learning on our multimodal RGB-infrared dataset to address low-light and occlusion issues, equipping it with robust vision under adverse conditions. Its real-world efficacy is demonstrated on an IoT platform with advanced low-light and RGB cameras, seamlessly connecting to an AWS-based notification endpoint for efficient real-time object detection. Benchmarks reveal a substantial boost of 17% and 14% in frames per second (FPS) for thermal and RGB detection, respectively, compared to the baseline YOLOv8n model. For community contribution, both the code and the multimodal dataset are available on GitHub.
Detection of Spider Mites on Labrador Beans through Machine Learning Approaches Using Custom Datasets
Authors: Authors: Violet Liu, Jason Chen, Ans Qureshi, Mahla Nejati
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2402.07895
Pdf link: https://arxiv.org/pdf/2402.07895
Abstract Amidst growing food production demands, early plant disease detection is essential to safeguard crops; this study proposes a visual machine learning approach for plant disease detection, harnessing RGB and NIR data collected in real-world conditions through a JAI FS-1600D-10GE camera to build an RGBN dataset. A two-stage early plant disease detection model with YOLOv8 and a sequential CNN was used to train on a dataset with partial labels, which showed a 3.6% increase in mAP compared to a single-stage end-to-end segmentation model. The sequential CNN model achieved 90.62% validation accuracy utilising RGBN data. An average of 6.25% validation accuracy increase is found using RGBN in classification compared to RGB using ResNet15 and the sequential CNN models. Further research and dataset improvements are needed to meet food production demands.
Keyword: face recognition

Trade-off Between Spatial and Angular Resolution in Facial Recognition
Authors: Authors: Muhammad Zeshan Alam, Sousso kelowani, Mohamed Elsaeidy
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.07263
Pdf link: https://arxiv.org/pdf/2402.07263
Abstract Ensuring robustness in face recognition systems across various challenging conditions is crucial for their versatility. State-of-the-art methods often incorporate additional information, such as depth, thermal, or angular data, to enhance performance. However, light field-based face recognition approaches that leverage angular information face computational limitations. This paper investigates the fundamental trade-off between spatio-angular resolution in light field representation to achieve improved face recognition performance. By utilizing macro-pixels with varying angular resolutions while maintaining the overall image size, we aim to quantify the impact of angular information at the expense of spatial resolution, while considering computational constraints. Our experimental results demonstrate a notable performance improvement in face recognition systems by increasing the angular resolution, up to a certain extent, at the cost of spatial resolution.
Keyword: augmentation

Neural Models for Source Code Synthesis and Completion
Authors: Authors: Mitodru Niyogi
Subjects: Software Engineering (cs.SE); Computation and Language (cs.CL); Machine Learning (cs.LG); Programming Languages (cs.PL)
Arxiv link: https://arxiv.org/abs/2402.06690
Pdf link: https://arxiv.org/pdf/2402.06690
Abstract Natural language (NL) to code suggestion systems assist developers in Integrated Development Environments (IDEs) by translating NL utterances into compilable code snippets. The current approaches mainly involve hard-coded, rule-based systems based on semantic parsing. These systems make heavy use of hand-crafted rules that map patterns in NL or elements in its syntax parse tree to various query constructs and can only work on a limited subset of NL with a restricted NL syntax. These systems are unable to extract semantic information from the coding intents of the developer, and often fail to infer types, names, and the context of the source code to get accurate system-level code suggestions. In this master's thesis, we present sequence-to-sequence deep learning models and training paradigms to map NL to general-purpose programming languages that can assist users with suggestions of source code snippets, given an NL intent, and also extend auto-completion functionality of the source code to users while they are writing source code. The developed architecture incorporates contextual awareness into neural models which generate source code tokens directly instead of generating parse trees/abstract meaning representations from the source code and converting them back to source code. The proposed pretraining strategy and the data augmentation techniques improve the performance of the proposed architecture. The proposed architecture has been found to exceed the performance of a neural semantic parser, TranX, based on the BLEU-4 metric by 10.82%. Thereafter, a finer analysis for the parsable code translations from the NL intent for the CoNaLA challenge was introduced. The proposed system is bidirectional as it can also be used to generate NL code documentation given source code. Lastly, a RoBERTa masked language model for Python was proposed to extend the developed system for code completion.
ExGRG: Explicitly-Generated Relation Graph for Self-Supervised Representation Learning
Authors: Authors: Mahdi Naseri, Mahdi Biparva
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2402.06737
Pdf link: https://arxiv.org/pdf/2402.06737
Abstract Self-supervised Learning (SSL) has emerged as a powerful technique in pre-training deep learning models without relying on expensive annotated labels, instead leveraging embedded signals in unlabeled data. While SSL has shown remarkable success in computer vision tasks through intuitive data augmentation, its application to graph-structured data poses challenges due to the semantic-altering and counter-intuitive nature of graph augmentations. Addressing this limitation, this paper introduces a novel non-contrastive SSL approach to Explicitly Generate a compositional Relation Graph (ExGRG) instead of relying solely on the conventional augmentation-based implicit relation graph. ExGRG offers a framework for incorporating prior domain knowledge and online extracted information into the SSL invariance objective, drawing inspiration from the Laplacian Eigenmap and Expectation-Maximization (EM). Employing an EM perspective on SSL, our E-step involves relation graph generation to identify candidates to guide the SSL invariance objective, and M-step updates the model parameters by integrating the derived relational information. Extensive experimentation on diverse node classification datasets demonstrates the superiority of our method over state-of-the-art techniques, affirming ExGRG as an effective adoption of SSL for graph representation learning.
Evaluation Metrics for Text Data Augmentation in NLP
Authors: Authors: Marcellus Amadeus, William Alberto Cruz Castañeda
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2402.06766
Pdf link: https://arxiv.org/pdf/2402.06766
Abstract Recent surveys on data augmentation for natural language processing have reported different techniques and advancements in the field. Several frameworks, tools, and repositories promote the implementation of text data augmentation pipelines. However, a lack of evaluation criteria and standards for method comparison due to different tasks, metrics, datasets, architectures, and experimental settings makes comparisons meaningless. Also, methods lack unification, and text data augmentation research would benefit from unified metrics to compare different augmentation methods. Thus, academia and industry seek relevant evaluation metrics for text data augmentation techniques. The contribution of this work is to provide a taxonomy of evaluation metrics for text augmentation methods and serve as a direction for a unified benchmark. The proposed taxonomy organizes categories that include tools for implementation and metrics calculation. Finally, with this study, we intend to present opportunities to explore the unification and standardization of text data augmentation metrics.
Neural Rendering based Urban Scene Reconstruction for Autonomous Driving
Authors: Shihao Shen, Louis Kerofsky, Varun Ravi Kumar, Senthil Yogamani
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2402.06826
Pdf link: https://arxiv.org/pdf/2402.06826
Abstract Dense 3D reconstruction has many applications in automated driving including automated annotation validation, multimodal data augmentation, providing ground truth annotations for systems lacking LiDAR, as well as enhancing auto-labeling accuracy. LiDAR provides highly accurate but sparse depth, whereas camera images enable estimation of dense but noisy depth, particularly at long ranges. In this paper, we harness the strengths of both sensors and propose a multimodal 3D scene reconstruction using a framework combining neural implicit surfaces and radiance fields. In particular, our method estimates dense and accurate 3D structures and creates an implicit map representation based on signed distance fields, which can be further rendered into RGB images and depth maps. A mesh can be extracted from the learned signed distance field and culled based on occlusion. Dynamic objects are efficiently filtered on the fly during sampling using 3D object detection models. We demonstrate qualitative and quantitative results on challenging automotive scenes.
For Better or For Worse? Learning Minimum Variance Features With Label Augmentation
Authors: Muthu Chidambaram, Rong Ge
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2402.06855
Pdf link: https://arxiv.org/pdf/2402.06855
Abstract Data augmentation has been pivotal in successfully training deep learning models on classification tasks over the past decade. An important subclass of data augmentation techniques - which includes both label smoothing and Mixup - involves modifying not only the input data but also the input label during model training. In this work, we analyze the role played by the label augmentation aspect of such methods. We prove that linear models on linearly separable data trained with label augmentation learn only the minimum variance features in the data, while standard training (which includes weight decay) can learn higher variance features. An important consequence of our results is negative: label smoothing and Mixup can be less robust to adversarial perturbations of the training data when compared to standard training. We verify that our theory reflects practice via a range of experiments on synthetic data and image classification benchmarks.
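As a rough illustration of the label augmentation this abstract refers to, the sketch below builds label-smoothed and Mixup targets from one-hot labels. It follows the standard textbook definitions rather than the paper's own code; the function names and the smoothing strength `alpha` are assumptions chosen for this example.

```python
def one_hot(label, num_classes):
    """Return a one-hot target vector for an integer class label."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def label_smoothing(label, num_classes, alpha=0.1):
    """Mix the one-hot target with the uniform distribution over classes.

    alpha is a hypothetical smoothing strength in [0, 1].
    """
    return [(1.0 - alpha) * v + alpha / num_classes
            for v in one_hot(label, num_classes)]


def mixup(x1, y1, x2, y2, lam):
    """Take the same convex combination of two inputs and of their targets.

    lam is the mixing weight in [0, 1]; in practice it is commonly drawn
    from a Beta distribution, which is left to the caller here.
    """
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


if __name__ == "__main__":
    print(label_smoothing(0, 4, alpha=0.2))
    print(mixup([0.0, 2.0], one_hot(0, 2), [2.0, 0.0], one_hot(1, 2), lam=0.25))
```

In both cases the training target is no longer a hard one-hot vector, which is the "label augmentation aspect" the abstract analyzes.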
Understanding Test-Time Augmentation
Authors: Masanari Kimura
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.06892
Pdf link: https://arxiv.org/pdf/2402.06892
Abstract Test-Time Augmentation (TTA) is a very powerful heuristic that takes advantage of data augmentation during testing to produce averaged output. Despite the experimental effectiveness of TTA, there is insufficient discussion of its theoretical aspects. In this paper, we aim to give theoretical guarantees for TTA and clarify its behavior.
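The averaging that the abstract describes can be sketched in a few lines. This is a minimal illustration under assumptions, not the paper's formulation: `model` is any callable returning a list of scores, `augmentations` is a list of callables applied to the input, and the name `tta_predict` is made up for this example.

```python
def tta_predict(model, x, augmentations):
    """Average the model's outputs over several augmented copies of x."""
    if not augmentations:
        raise ValueError("at least one augmentation is required")
    outputs = [model(aug(x)) for aug in augmentations]
    n = len(outputs)
    # Average each output coordinate across the augmented views.
    return [sum(column) / n for column in zip(*outputs)]


if __name__ == "__main__":
    # Toy stand-ins: a "model" whose second score depends on input order,
    # and two augmentations (identity and reversal).
    toy_model = lambda v: [float(sum(v)), float(v[0])]
    identity = lambda v: list(v)
    flip = lambda v: list(reversed(v))
    print(tta_predict(toy_model, [1, 2, 3], [identity, flip]))
```

With the toy inputs, the order-independent score is unchanged by averaging, while the order-dependent score becomes the mean over the two views.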
Enhancing Multi-field B2B Cloud Solution Matching via Contrastive Pre-training
Authors: Haonan Chen, Zhicheng Dou, Xuetong Hao, Yunhao Tao, Shiren Song, Zhenli Sheng
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2402.07076
Pdf link: https://arxiv.org/pdf/2402.07076
Abstract Cloud solutions have gained significant popularity in the technology industry as they offer a combination of services and tools to tackle specific problems. However, despite their widespread use, the task of identifying appropriate company customers for a specific target solution to the sales team of a solution provider remains a complex business problem that existing matching systems have yet to adequately address. In this work, we study the B2B solution matching problem and identify two main challenges of this scenario: (1) the modeling of complex multi-field features and (2) the limited, incomplete, and sparse transaction data. To tackle these challenges, we propose a framework CAMA, which is built with a hierarchical multi-field matching structure as its backbone and supplemented by three data augmentation strategies and a contrastive pre-training objective to compensate for the imperfections in the available data. Through extensive experiments on a real-world dataset, we demonstrate that CAMA outperforms several strong baseline matching models significantly. Furthermore, we have deployed our matching framework on a system of Huawei Cloud. Our observations indicate an improvement of about 30% compared to the previous online model in terms of Conversion Rate (CVR), which demonstrates its great business value.
Generalizing Conversational Dense Retrieval via LLM-Cognition Data Augmentation
Authors: Haonan Chen, Zhicheng Dou, Kelong Mao, Jiongnan Liu, Ziliang Zhao
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2402.07092
Pdf link: https://arxiv.org/pdf/2402.07092
Abstract Conversational search utilizes multi-turn natural language contexts to retrieve relevant passages. Existing conversational dense retrieval models mostly view a conversation as a fixed sequence of questions and responses, overlooking the severe data sparsity problem -- that is, users can perform a conversation in various ways, and these alternate conversations are unrecorded. Consequently, they often struggle to generalize to diverse conversations in real-world scenarios. In this work, we propose a framework for generalizing Conversational dense retrieval via LLM-cognition data Augmentation (ConvAug). ConvAug first generates multi-level augmented conversations to capture the diverse nature of conversational contexts. Inspired by human cognition, we devise a cognition-aware process to mitigate the generation of false positives, false negatives, and hallucinations. Moreover, we develop a difficulty-adaptive sample filter that selects challenging samples for complex conversations, thereby giving the model a larger learning space. A contrastive learning objective is then employed to train a better conversational context encoder. Extensive experiments conducted on four public datasets, under both normal and zero-shot settings, demonstrate the effectiveness, generalizability, and applicability of ConvAug.
TriAug: Out-of-Distribution Detection for Robust Classification of Imbalanced Breast Lesion in Ultrasound
Authors: Yinyu Ye, Shijing Chen, Dong Ni, Ruobing Huang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.07452
Pdf link: https://arxiv.org/pdf/2402.07452
Abstract Different diseases, such as histological subtypes of breast lesions, have severely varying incidence rates. Even when trained with a substantial amount of in-distribution (ID) data, models often encounter out-of-distribution (OOD) samples belonging to unseen classes in clinical reality. To address this, we propose a novel framework built upon a long-tailed OOD detection task for breast ultrasound images. It is equipped with a triplet state augmentation (TriAug) which improves ID classification accuracy while maintaining a promising OOD detection performance. Meanwhile, we designed a balanced sphere loss to handle the class imbalance problem.
One Train for Two Tasks: An Encrypted Traffic Classification Framework Using Supervised Contrastive Learning
Authors: Haozhen Zhang, Xi Xiao, Le Yu, Qing Li, Zhen Ling, Ye Zhang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2402.07501
Pdf link: https://arxiv.org/pdf/2402.07501
Abstract As network security receives widespread attention, encrypted traffic classification has become the current research focus. However, existing methods conduct traffic classification without sufficiently considering the common characteristics between data samples, leading to suboptimal performance. Moreover, they train the packet-level and flow-level classification tasks independently, which is redundant because the packet representations learned in the packet-level task can be exploited by the flow-level task. Therefore, in this paper, we propose an effective model named a Contrastive Learning Enhanced Temporal Fusion Encoder (CLE-TFE). In particular, we utilize supervised contrastive learning to enhance the packet-level and flow-level representations and perform graph data augmentation on the byte-level traffic graph so that the fine-grained semantic-invariant characteristics between bytes can be captured through contrastive learning. We also propose cross-level multi-task learning, which simultaneously accomplishes the packet-level and flow-level classification tasks in the same model with one training. Further experiments show that CLE-TFE achieves the best overall performance on the two tasks, while its computational overhead (i.e., floating point operations, FLOPs) is only about 1/14 of the pre-trained model (e.g., ET-BERT). We release the code at https://github.com/ViktorAxelsen/CLE-TFE
MAFIA: Multi-Adapter Fused Inclusive LanguAge Models
Authors: Prachi Jain, Ashutosh Sathe, Varun Gumma, Kabir Ahuja, Sunayana Sitaram
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2402.07519
Pdf link: https://arxiv.org/pdf/2402.07519
Abstract Pretrained Language Models (PLMs) are widely used in NLP for various tasks. Recent studies have identified various biases that such models exhibit and have proposed methods to correct these biases. However, most of the works address a limited set of bias dimensions independently such as gender, race, or religion. Moreover, the methods typically involve finetuning the full model to maintain the performance on the downstream task. In this work, we aim to modularly debias a pretrained language model across multiple dimensions. Previous works extensively explored debiasing PLMs using limited US-centric counterfactual data augmentation (CDA). We use structured knowledge and a large generative model to build a diverse CDA across multiple bias dimensions in a semi-automated way. We highlight how existing debiasing methods do not consider interactions between multiple societal biases and propose a debiasing model that exploits the synergy amongst various societal biases and enables multi-bias debiasing simultaneously. An extensive evaluation on multiple tasks and languages demonstrates the efficacy of our approach.
Engineering Weighted Connectivity Augmentation Algorithms
Authors: Marcelo Fonseca Faraj, Ernestine Großmann, Felix Joos, Thomas Möller, Christian Schulz
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2402.07753
Pdf link: https://arxiv.org/pdf/2402.07753
Abstract Increasing the connectivity of a graph is a pivotal challenge in robust network design. The weighted connectivity augmentation problem is a common version of the problem that takes link costs into consideration. The problem is then to find a minimum cost subset of a given set of weighted links that increases the connectivity of a graph by one when the links are added to the edge set of the input instance. In this work, we give a first implementation of recently discovered better-than-2 approximations. Furthermore, we propose three new heuristics and one exact approach. These include a greedy algorithm considering link costs and the number of unique cuts covered, an approach based on minimum spanning trees and a local search algorithm that may improve a given solution by swapping links of paths. Our exact approach uses an ILP formulation with efficient cut enumeration as well as a fast initialization routine. We then perform an extensive experimental evaluation which shows that our algorithms are faster and yield the best solutions compared to the current state-of-the-art as well as the recently discovered better-than-2 approximation algorithms. Our novel local search algorithm can improve solution quality even further.
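The abstract mentions a greedy heuristic that weighs link costs against the number of cuts covered. The sketch below shows one plausible reading of that idea as a generic cost-per-newly-covered-cut greedy rule, in the style of weighted set cover. It is not the authors' algorithm: the input format (each link given as a cost and the set of cut identifiers it covers) and the name `greedy_cover` are assumptions for this example, and cut enumeration is taken as already done.

```python
def greedy_cover(cuts, links):
    """Pick links greedily by lowest cost per newly covered cut.

    cuts:  iterable of cut identifiers that must all be covered.
    links: dict mapping a link name to (cost, set of cuts it covers).
    Returns the chosen link names in selection order.
    """
    uncovered = set(cuts)
    chosen = []
    while uncovered:
        best, best_ratio = None, float("inf")
        for name, (cost, covered) in links.items():
            gain = len(covered & uncovered)  # cuts this link would newly cover
            if gain and cost / gain < best_ratio:
                best, best_ratio = name, cost / gain
        if best is None:
            raise ValueError("some cuts cannot be covered by any link")
        chosen.append(best)
        uncovered -= links[best][1]
    return chosen


if __name__ == "__main__":
    example_links = {
        "a": (2.0, {1, 2, 3}),
        "b": (1.0, {1}),
        "c": (3.0, {3, 4}),
        "d": (1.0, {4}),
    }
    print(greedy_cover({1, 2, 3, 4}, example_links))
```

A rule like this gives no optimality guarantee, which is consistent with the abstract pairing its heuristics with a local search step and an exact ILP approach.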
AI-Augmented Predictions: LLM Assistants Improve Human Forecasting Accuracy
Authors: Philipp Schoenegger, Peter S. Park, Ezra Karger, Philip E. Tetlock
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2402.07862
Pdf link: https://arxiv.org/pdf/2402.07862
Abstract Large language models (LLMs) show impressive capabilities, matching and sometimes exceeding human performance in many domains. This study explores the potential of LLMs to augment judgement in forecasting tasks. We evaluated the impact on forecasting accuracy of two GPT-4-Turbo assistants: one designed to provide high-quality advice ('superforecasting'), and the other designed to be overconfident and base-rate-neglecting. Participants (N = 991) had the option to consult their assigned LLM assistant throughout the study, in contrast to a control group that used a less advanced model (DaVinci-003) without direct forecasting support. Our preregistered analyses reveal that LLM augmentation significantly enhances forecasting accuracy by 23% across both types of assistants, compared to the control group. This improvement occurs despite the superforecasting assistant's higher accuracy in predictions, indicating the augmentation's benefit is not solely due to model prediction accuracy. Exploratory analyses showed a pronounced effect in one forecasting item, without which we find that the superforecasting assistant increased accuracy by 43%, compared with 28% for the biased assistant. We further examine whether LLM augmentation disproportionately benefits less skilled forecasters, degrades the wisdom-of-the-crowd by reducing prediction diversity, or varies in effectiveness with question difficulty. Our findings do not consistently support these hypotheses. Our results suggest that access to an LLM assistant, even a biased one, can be a helpful decision aid in cognitively demanding tasks where the answer is not known at the time of interaction.

LeeKyungwook / get-arxiv-noti

New submissions for Tue, 13 Feb 24 #975
