New submissions for Wed, 3 Apr 24

Keyword: detection

Graph-Based Optimisation of Network Expansion in a Dockless Bike Sharing System

Authors: Authors: Mark Roantree, Niamh Murphi, Dinh Viet Cuong, Vuong Minh Ngo
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2404.01320
Pdf link: https://arxiv.org/pdf/2404.01320
Abstract Bike-sharing systems (BSSs) are deployed in over a thousand cities worldwide and play an important role in many urban transportation systems. BSSs alleviate congestion, reduce pollution and promote physical exercise. It is essential to explore the spatiotemporal patterns of bike-sharing demand, as well as the factors that influence these patterns, in order to optimise system operational efficiency. In this study, an optimised geo-temporal graph is constructed using trip data from Moby Bikes, a dockless BSS operator. The process of optimising the graph unveiled prime locations for erecting new stations during future expansions of the BSS. The Louvain algorithm, a community detection technique, is employed to uncover usage patterns at different levels of temporal granularity. The community detection results reveal largely self-contained sub-networks that exhibit similar usage patterns at their respective levels of temporal granularity. Overall, this study reinforces that BSSs are intrinsically spatiotemporal systems, with community presence driven by spatiotemporal dynamics. These findings may aid operators in improving redistribution efficiency.
FineFake: A Knowledge-Enriched Dataset for Fine-Grained Multi-Domain Fake News Detecction
Authors: Authors: Ziyi Zhou, Xiaoming Zhang, Litian Zhang, Jiacheng Liu, Xi Zhang, Chaozhuo Li
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2404.01336
Pdf link: https://arxiv.org/pdf/2404.01336
Abstract Existing benchmarks for fake news detection have significantly contributed to the advancement of models in assessing the authenticity of news content. However, these benchmarks typically focus solely on news pertaining to a single semantic topic or originating from a single platform, thereby failing to capture the diversity of multi-domain news in real scenarios. In order to understand fake news across various domains, the external knowledge and fine-grained annotations are indispensable to provide precise evidence and uncover the diverse underlying strategies for fabrication, which are also ignored by existing benchmarks. To address this gap, we introduce a novel multi-domain knowledge-enhanced benchmark with fine-grained annotations, named \textbf{FineFake}. FineFake encompasses 16,909 data samples spanning six semantic topics and eight platforms. Each news item is enriched with multi-modal content, potential social context, semi-manually verified common knowledge, and fine-grained annotations that surpass conventional binary labels. Furthermore, we formulate three challenging tasks based on FineFake and propose a knowledge-enhanced domain adaptation network. Extensive experiments are conducted on FineFake under various scenarios, providing accurate and reliable benchmarks for future endeavors. The entire FineFake project is publicly accessible as an open-source repository at \url{https://github.com/Accuser907/FineFake}.
Detection of Temporality at Discourse Level on Financial News by Combining Natural Language Processing and Machine Learning
Authors: Authors: Silvia García-Méndez, Francisco de Arriba-Pérez, Ana Barros-Vila, Francisco J. González-Castaño
Subjects: Computation and Language (cs.CL); Computational Engineering, Finance, and Science (cs.CE); Information Retrieval (cs.IR); Machine Learning (cs.LG); Statistical Finance (q-fin.ST)
Arxiv link: https://arxiv.org/abs/2404.01337
Pdf link: https://arxiv.org/pdf/2404.01337
Abstract Finance-related news such as Bloomberg News, CNN Business and Forbes are valuable sources of real data for market screening systems. In news, an expert shares opinions beyond plain technical analyses that include context such as political, sociological and cultural factors. In the same text, the expert often discusses the performance of different assets. Some key statements are mere descriptions of past events while others are predictions. Therefore, understanding the temporality of the key statements in a text is essential to separate context information from valuable predictions. We propose a novel system to detect the temporality of finance-related news at discourse level that combines Natural Language Processing and Machine Learning techniques, and exploits sophisticated features such as syntactic and semantic dependencies. More specifically, we seek to extract the dominant tenses of the main statements, which may be either explicit or implicit. We have tested our system on a labelled dataset of finance-related news annotated by researchers with knowledge in the field. Experimental results reveal a high detection precision compared to an alternative rule-based baseline approach. Ultimately, this research contributes to the state-of-the-art of market screening by identifying predictive knowledge for financial decision making.
Automatic detection of relevant information, predictions and forecasts in financial news through topic modelling with Latent Dirichlet Allocation
Authors: Authors: Silvia García-Méndez, Francisco de Arriba-Pérez, Ana Barros-Vila, Francisco J. González-Castaño, Enrique Costa-Montenegro
Subjects: Computation and Language (cs.CL); Computational Engineering, Finance, and Science (cs.CE); Information Retrieval (cs.IR); Machine Learning (cs.LG); Statistical Finance (q-fin.ST)
Arxiv link: https://arxiv.org/abs/2404.01338
Pdf link: https://arxiv.org/pdf/2404.01338
Abstract Financial news items are unstructured sources of information that can be mined to extract knowledge for market screening applications. Manual extraction of relevant information from the continuous stream of finance-related news is cumbersome and beyond the skills of many investors, who, at most, can follow a few sources and authors. Accordingly, we focus on the analysis of financial news to identify relevant text and, within that text, forecasts and predictions. We propose a novel Natural Language Processing (NLP) system to assist investors in the detection of relevant financial events in unstructured textual sources by considering both relevance and temporality at the discursive level. Firstly, we segment the text to group together closely related text. Secondly, we apply co-reference resolution to discover internal dependencies within segments. Finally, we perform relevant topic modelling with Latent Dirichlet Allocation (LDA) to separate relevant from less relevant text and then analyse the relevant text using a Machine Learning-oriented temporal approach to identify predictions and speculative statements. We created an experimental data set composed of 2,158 financial news items that were manually labelled by NLP researchers to evaluate our solution. The ROUGE-L values for the identification of relevant text and predictions/forecasts were 0.662 and 0.982, respectively. To our knowledge, this is the first work to jointly consider relevance and temporality at the discursive level. It contributes to the transfer of human associative discourse capabilities to expert systems through the combination of multi-paragraph topic segmentation and co-reference resolution to separate author expression patterns, topic modelling with LDA to detect relevant text, and discursive temporality analysis to identify forecasts and predictions within this text.
Enhancing Bangla Fake News Detection Using Bidirectional Gated Recurrent Units and Deep Learning Techniques
Authors: Authors: Utsha Roy, Mst. Sazia Tahosin, Md. Mahedi Hassan, Taminul Islam, Fahim Imtiaz, Md Rezwane Sadik, Yassine Maleh, Rejwan Bin Sulaiman, Md. Simul Hasan Talukder
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.01345
Pdf link: https://arxiv.org/pdf/2404.01345
Abstract The rise of fake news has made the need for effective detection methods, including in languages other than English, increasingly important. The study aims to address the challenges of Bangla which is considered a less important language. To this end, a complete dataset containing about 50,000 news items is proposed. Several deep learning models have been tested on this dataset, including the bidirectional gated recurrent unit (GRU), the long short-term memory (LSTM), the 1D convolutional neural network (CNN), and hybrid architectures. For this research, we assessed the efficacy of the model utilizing a range of useful measures, including recall, precision, F1 score, and accuracy. This was done by employing a big application. We carry out comprehensive trials to show the effectiveness of these models in identifying bogus news in Bangla, with the Bidirectional GRU model having a stunning accuracy of 99.16%. Our analysis highlights the importance of dataset balance and the need for continual improvement efforts to a substantial degree. This study makes a major contribution to the creation of Bangla fake news detecting systems with limited resources, thereby setting the stage for future improvements in the detection process.
Distribution-Agnostic Database De-Anonymization Under Obfuscation And Synchronization Errors
Authors: Authors: Serhat Bakirtas, Elza Erkip
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2404.01366
Pdf link: https://arxiv.org/pdf/2404.01366
Abstract Database de-anonymization typically involves matching an anonymized database with correlated publicly available data. Existing research focuses either on practical aspects without requiring knowledge of the data distribution yet provides limited guarantees, or on theoretical aspects assuming known distributions. This paper aims to bridge these two approaches, offering theoretical guarantees for database de-anonymization under synchronization errors and obfuscation without prior knowledge of data distribution. Using a modified replica detection algorithm and a new seeded deletion detection algorithm, we establish sufficient conditions on the database growth rate for successful matching, demonstrating a double-logarithmic seed size relative to row size is sufficient for detecting deletions in the database. Importantly, our findings indicate that these sufficient de-anonymization conditions are tight and are the same as in the distribution-aware setting, avoiding asymptotic performance loss due to unknown distributions. Finally, we evaluate the performance of our proposed algorithms through simulations, confirming their effectiveness in more practical, non-asymptotic, scenarios.
Object-conditioned Bag of Instances for Few-Shot Personalized Instance Recognition
Authors: Authors: Umberto Michieli, Jijoong Moon, Daehyun Kim, Mete Ozay
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2404.01397
Pdf link: https://arxiv.org/pdf/2404.01397
Abstract Nowadays, users demand for increased personalization of vision systems to localize and identify personal instances of objects (e.g., my dog rather than dog) from a few-shot dataset only. Despite outstanding results of deep networks on classical label-abundant benchmarks (e.g., those of the latest YOLOv8 model for standard object detection), they struggle to maintain within-class variability to represent different instances rather than object categories only. We construct an Object-conditioned Bag of Instances (OBoI) based on multi-order statistics of extracted features, where generic object detection models are extended to search and identify personal instances from the OBoI's metric space, without need for backpropagation. By relying on multi-order statistics, OBoI achieves consistent superior accuracy in distinguishing different instances. In the results, we achieve 77.1% personal object recognition accuracy in case of 18 personal instances, showing about 12% relative gain over the state of the art.
The Radar Ghost Dataset -- An Evaluation of Ghost Objects in Automotive Radar Data
Authors: Authors: Florian Kraus, Nicolas Scheiner, Werner Ritter, Klaus Dietmayer
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.01437
Pdf link: https://arxiv.org/pdf/2404.01437
Abstract Radar sensors have a long tradition in advanced driver assistance systems (ADAS) and also play a major role in current concepts for autonomous vehicles. Their importance is reasoned by their high robustness against meteorological effects, such as rain, snow, or fog, and the radar's ability to measure relative radial velocity differences via the Doppler effect. The cause for these advantages, namely the large wavelength, is also one of the drawbacks of radar sensors. Compared to camera or lidar sensor, a lot more surfaces in a typical traffic scenario appear flat relative to the radar's emitted signal. This results in multi-path reflections or so called ghost detections in the radar signal. Ghost objects pose a major source for potential false positive detections in a vehicle's perception pipeline. Therefore, it is important to be able to segregate multi-path reflections from direct ones. In this article, we present a dataset with detailed manual annotations for different kinds of ghost detections. Moreover, two different approaches for identifying these kinds of objects are evaluated. We hope that our dataset encourages more researchers to engage in the fields of multi-path object suppression or exploitation.
Finding Regions of Interest in Whole Slide Images Using Multiple Instance Learning
Authors: Authors: Martim Afonso, Praphulla M. S. Bhawsar, Monjoy Saha, Jonas S. Almeida, Arlindo L. Oliveira
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.01446
Pdf link: https://arxiv.org/pdf/2404.01446
Abstract Whole Slide Images (WSI), obtained by high-resolution digital scanning of microscope slides at multiple scales, are the cornerstone of modern Digital Pathology. However, they represent a particular challenge to AI-based/AI-mediated analysis because pathology labeling is typically done at slide-level, instead of tile-level. It is not just that medical diagnostics is recorded at the specimen level, the detection of oncogene mutation is also experimentally obtained, and recorded by initiatives like The Cancer Genome Atlas (TCGA), at the slide level. This configures a dual challenge: a) accurately predicting the overall cancer phenotype and b) finding out what cellular morphologies are associated with it at the tile level. To address these challenges, a weakly supervised Multiple Instance Learning (MIL) approach was explored for two prevalent cancer types, Invasive Breast Carcinoma (TCGA-BRCA) and Lung Squamous Cell Carcinoma (TCGA-LUSC). This approach was explored for tumor detection at low magnification levels and TP53 mutations at various levels. Our results show that a novel additive implementation of MIL matched the performance of reference implementation (AUC 0.96), and was only slightly outperformed by Attention MIL (AUC 0.97). More interestingly from the perspective of the molecular pathologist, these different AI architectures identify distinct sensitivities to morphological features (through the detection of Regions of Interest, RoI) at different amplification levels. Tellingly, TP53 mutation was most sensitive to features at the higher applications where cellular morphology is resolved.
QuAD: Query-based Interpretable Neural Motion Planning for Autonomous Driving
Authors: Authors: Sourav Biswas, Sergio Casas, Quinlan Sykora, Ben Agro, Abbas Sadat, Raquel Urtasun
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.01486
Pdf link: https://arxiv.org/pdf/2404.01486
Abstract A self-driving vehicle must understand its environment to determine the appropriate action. Traditional autonomy systems rely on object detection to find the agents in the scene. However, object detection assumes a discrete set of objects and loses information about uncertainty, so any errors compound when predicting the future behavior of those agents. Alternatively, dense occupancy grid maps have been utilized to understand free-space. However, predicting a grid for the entire scene is wasteful since only certain spatio-temporal regions are reachable and relevant to the self-driving vehicle. We present a unified, interpretable, and efficient autonomy framework that moves away from cascading modules that first perceive, then predict, and finally plan. Instead, we shift the paradigm to have the planner query occupancy at relevant spatio-temporal points, restricting the computation to those regions of interest. Exploiting this representation, we evaluate candidate trajectories around key factors such as collision avoidance, comfort, and progress for safety and interpretability. Our approach achieves better highway driving quality than the state-of-the-art in high-fidelity closed-loop simulations.
Modality Translation for Object Detection Adaptation Without Forgetting Prior Knowledge
Authors: Authors: Heitor Rapela Medeiros, Masih Aminbeidokhti, Fidel Guerrero Pena, David Latortue, Eric Granger, Marco Pedersoli
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.01492
Pdf link: https://arxiv.org/pdf/2404.01492
Abstract A common practice in deep learning consists of training large neural networks on massive datasets to perform accurately for different domains and tasks. While this methodology may work well in numerous application areas, it only applies across modalities due to a larger distribution shift in data captured using different sensors. This paper focuses on the problem of adapting a large object detection model to one or multiple modalities while being efficient. To do so, we propose ModTr as an alternative to the common approach of fine-tuning large models. ModTr consists of adapting the input with a small transformation network trained to minimize the detection loss directly. The original model can therefore work on the translated inputs without any further change or fine-tuning to its parameters. Experimental results on translating from IR to RGB images on two well-known datasets show that this simple ModTr approach provides detectors that can perform comparably or better than the standard fine-tuning without forgetting the original knowledge. This opens the doors to a more flexible and efficient service-based detection pipeline in which, instead of using a different detector for each modality, a unique and unaltered server is constantly running, where multiple modalities with the corresponding translations can query it. Code: https://github.com/heitorrapela/ModTr.
MosquitoFusion: A Multiclass Dataset for Real-Time Detection of Mosquitoes, Swarms, and Breeding Sites Using Deep Learning
Authors: Authors: Md. Faiyaz Abdullah Sayeedi, Fahim Hafiz, Md Ashiqur Rahman
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.01501
Pdf link: https://arxiv.org/pdf/2404.01501
Abstract In this paper, we present an integrated approach to real-time mosquito detection using our multiclass dataset (MosquitoFusion) containing 1204 diverse images and leverage cutting-edge technologies, specifically computer vision, to automate the identification of Mosquitoes, Swarms, and Breeding Sites. The pre-trained YOLOv8 model, trained on this dataset, achieved a mean Average Precision (mAP@50) of 57.1%, with precision at 73.4% and recall at 50.5%. The integration of Geographic Information Systems (GIS) further enriches the depth of our analysis, providing valuable insights into spatial patterns. The dataset and code are available at https://github.com/faiyazabdullah/MosquitoFusion.
Leveraging YOLO-World and GPT-4V LMMs for Zero-Shot Person Detection and Action Recognition in Drone Imagery
Authors: Authors: Christian Limberg, Artur Gonçalves, Bastien Rigault, Helmut Prendinger
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2404.01571
Pdf link: https://arxiv.org/pdf/2404.01571
Abstract In this article, we explore the potential of zero-shot Large Multimodal Models (LMMs) in the domain of drone perception. We focus on person detection and action recognition tasks and evaluate two prominent LMMs, namely YOLO-World and GPT-4V(ision) using a publicly available dataset captured from aerial views. Traditional deep learning approaches rely heavily on large and high-quality training datasets. However, in certain robotic settings, acquiring such datasets can be resource-intensive or impractical within a reasonable timeframe. The flexibility of prompt-based Large Multimodal Models (LMMs) and their exceptional generalization capabilities have the potential to revolutionize robotics applications in these scenarios. Our findings suggest that YOLO-World demonstrates good detection performance. GPT-4V struggles with accurately classifying action classes but delivers promising results in filtering out unwanted region proposals and in providing a general description of the scenery. This research represents an initial step in leveraging LMMs for drone perception and establishes a foundation for future investigations in this area.
Diffusion Deepfake
Authors: Authors: Chaitali Bhattacharyya, Hanxiao Wang, Feng Zhang, Sungho Kim, Xiatian Zhu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.01579
Pdf link: https://arxiv.org/pdf/2404.01579
Abstract Recent progress in generative AI, primarily through diffusion models, presents significant challenges for real-world deepfake detection. The increased realism in image details, diverse content, and widespread accessibility to the general public complicates the identification of these sophisticated deepfakes. Acknowledging the urgency to address the vulnerability of current deepfake detectors to this evolving threat, our paper introduces two extensive deepfake datasets generated by state-of-the-art diffusion models as other datasets are less diverse and low in quality. Our extensive experiments also showed that our dataset is more challenging compared to the other face deepfake datasets. Our strategic dataset creation not only challenge the deepfake detectors but also sets a new benchmark for more evaluation. Our comprehensive evaluation reveals the struggle of existing detection methods, often optimized for specific image domains and manipulations, to effectively adapt to the intricate nature of diffusion deepfakes, limiting their practical utility. To address this critical issue, we investigate the impact of enhancing training data diversity on representative detection methods. This involves expanding the diversity of both manipulation techniques and image domains. Our findings underscore that increasing training data diversity results in improved generalizability. Moreover, we propose a novel momentum difficulty boosting strategy to tackle the additional challenge posed by training data heterogeneity. This strategy dynamically assigns appropriate sample weights based on learning difficulty, enhancing the model's adaptability to both easy and challenging samples. Extensive experiments on both existing and newly proposed benchmarks demonstrate that our model optimization approach surpasses prior alternatives significantly.
Learning Temporal Cues by Predicting Objects Move for Multi-camera 3D Object Detection
Authors: Authors: Seokha Moon, Hongbeen Park, Jungphil Kwon, Jaekoo Lee, Jinkyu Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.01580
Pdf link: https://arxiv.org/pdf/2404.01580
Abstract In autonomous driving and robotics, there is a growing interest in utilizing short-term historical data to enhance multi-camera 3D object detection, leveraging the continuous and correlated nature of input video streams. Recent work has focused on spatially aligning BEV-based features over timesteps. However, this is often limited as its gain does not scale well with long-term past observations. To address this, we advocate for supervising a model to predict objects' poses given past observations, thus explicitly guiding to learn objects' temporal cues. To this end, we propose a model called DAP (Detection After Prediction), consisting of a two-branch network: (i) a branch responsible for forecasting the current objects' poses given past observations and (ii) another branch that detects objects based on the current and past observations. The features predicting the current objects from branch (i) is fused into branch (ii) to transfer predictive knowledge. We conduct extensive experiments with the large-scale nuScenes datasets, and we observe that utilizing such predictive information significantly improves the overall detection performance. Our model can be used plug-and-play, showing consistent performance gain.
BERT-Enhanced Retrieval Tool for Homework Plagiarism Detection System
Authors: Authors: Jiarong Xian, Jibao Yuan, Peiwei Zheng, Dexian Chen
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2404.01582
Pdf link: https://arxiv.org/pdf/2404.01582
Abstract Text plagiarism detection task is a common natural language processing task that aims to detect whether a given text contains plagiarism or copying from other texts. In existing research, detection of high level plagiarism is still a challenge due to the lack of high quality datasets. In this paper, we propose a plagiarized text data generation method based on GPT-3.5, which produces 32,927 pairs of text plagiarism detection datasets covering a wide range of plagiarism methods, bridging the gap in this part of research. Meanwhile, we propose a plagiarism identification method based on Faiss with BERT with high efficiency and high accuracy. Our experiments show that the performance of this model outperforms other models in several metrics, including 98.86\%, 98.90%, 98.86%, and 0.9888 for Accuracy, Precision, Recall, and F1 Score, respectively. At the end, we also provide a user-friendly demo platform that allows users to upload a text library and intuitively participate in the plagiarism analysis.
FLEXIS: FLEXible Frequent Subgraph Mining using Maximal Independent Sets
Authors: Authors: Akshit Sharma, Sam Reinher, Dinesh Mehta, Bo Wu
Subjects: Databases (cs.DB); Performance (cs.PF)
Arxiv link: https://arxiv.org/abs/2404.01585
Pdf link: https://arxiv.org/pdf/2404.01585
Abstract Frequent Subgraph Mining (FSM) is the process of identifying common subgraph patterns that surpass a predefined frequency threshold. While FSM is widely applicable in fields like bioinformatics, chemical analysis, and social network anomaly detection, its execution remains time-consuming and complex. This complexity stems from the need to recognize high-frequency subgraphs and ascertain if they exceed the set threshold. Current approaches to identifying these patterns often rely on edge or vertex extension methods. However, these strategies can introduce redundancies and cause increased latency. To address these challenges, this paper introduces a novel approach for identifying potential k-vertex patterns by combining two frequently observed (k - 1)-vertex patterns. This method optimizes the breadth-]first search, which allows for quicker search termination based on vertices count and support value. Another challenge in FSM is the validation of the presumed pattern against a specific threshold. Existing metrics, such as Maximum Independent Set (MIS) and Minimum Node Image (MNI), either demand significant computational time or risk overestimating pattern counts. Our innovative approach aligns with the MIS and identifies independent subgraphs. Through the "Maximal Independent Set" metric, this paper offers an efficient solution that minimizes latency and provides users with control over pattern overlap. Through extensive experimentation, our proposed method achieves an average of 10.58x speedup when compared to GraMi and an average 3x speedup when compared to T-FSM
LR-FPN: Enhancing Remote Sensing Object Detection with Location Refined Feature Pyramid Network
Authors: Authors: Hanqian Li, Ruinan Zhang, Ye Pan, Junchi Ren, Fei Shen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.01614
Pdf link: https://arxiv.org/pdf/2404.01614
Abstract Remote sensing target detection aims to identify and locate critical targets within remote sensing images, finding extensive applications in agriculture and urban planning. Feature pyramid networks (FPNs) are commonly used to extract multi-scale features. However, existing FPNs often overlook extracting low-level positional information and fine-grained context interaction. To address this, we propose a novel location refined feature pyramid network (LR-FPN) to enhance the extraction of shallow positional information and facilitate fine-grained context interaction. The LR-FPN consists of two primary modules: the shallow position information extraction module (SPIEM) and the contextual interaction module (CIM). Specifically, SPIEM first maximizes the retention of solid location information of the target by simultaneously extracting positional and saliency information from the low-level feature map. Subsequently, CIM injects this robust location information into different layers of the original FPN through spatial and channel interaction, explicitly enhancing the object area. Moreover, in spatial interaction, we introduce a simple local and non-local interaction strategy to learn and retain the saliency information of the object. Lastly, the LR-FPN can be readily integrated into common object detection frameworks to improve performance significantly. Extensive experiments on two large-scale remote sensing datasets (i.e., DOTAV1.0 and HRSC2016) demonstrate that the proposed LR-FPN is superior to state-of-the-art object detection approaches. Our code and models will be publicly available.
Voice EHR: Introducing Multimodal Audio Data for Health
Authors: Authors: James Anibal, Hannah Huth, Ming Li, Lindsey Hazen, Yen Minh Lam, Nguyen Thi Thu Hang, Michael Kleinman, Shelley Ost, Christopher Jackson, Laura Sprabery, Cheran Elangovan, Balaji Krishnaiah, Lee Akst, Ioan Lina, Iqbal Elyazar, Lenny Ekwati, Stefan Jansen, Richard Nduwayezu, Charisse Garcia, Jeffrey Plum, Jacqueline Brenner, Miranda Song, Emily Ricotta, David Clifton, C. Louise Thwaites, Yael Bensoussan, Bradford Wood
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2404.01620
Pdf link: https://arxiv.org/pdf/2404.01620
Abstract Large AI models trained on audio data may have the potential to rapidly classify patients, enhancing medical decision-making and potentially improving outcomes through early detection. Existing technologies depend on limited datasets using expensive recording equipment in high-income, English-speaking countries. This challenges deployment in resource-constrained, high-volume settings where audio data may have a profound impact. This report introduces a novel data type and a corresponding collection system that captures health data through guided questions using only a mobile/web application. This application ultimately results in an audio electronic health record (voice EHR) which may contain complex biomarkers of health from conventional voice/respiratory features, speech patterns, and language with semantic meaning - compensating for the typical limitations of unimodal clinical datasets. This report introduces a consortium of partners for global work, presents the application used for data collection, and showcases the potential of informative voice EHR to advance the scalability and diversity of audio AI.
Enhancing Functional Safety in Automotive AMS Circuits through Unsupervised Machine Learning
Authors: Authors: Ayush Arunachalam, Ian Kintz, Suvadeep Banerjee, Arnab Raha, Xiankun Jin, Fei Su, Viswanathan Pillai Prasanth, Rubin A. Parekhji, Suriyaprakash Natarajan, Kanad Basu
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2404.01632
Pdf link: https://arxiv.org/pdf/2404.01632
Abstract Given the widespread use of safety-critical applications in the automotive field, it is crucial to ensure the Functional Safety (FuSa) of circuits and components within automotive systems. The Analog and Mixed-Signal (AMS) circuits prevalent in these systems are more vulnerable to faults induced by parametric perturbations, noise, environmental stress, and other factors, in comparison to their digital counterparts. However, their continuous signal characteristics present an opportunity for early anomaly detection, enabling the implementation of safety mechanisms to prevent system failure. To address this need, we propose a novel framework based on unsupervised machine learning for early anomaly detection in AMS circuits. The proposed approach involves injecting anomalies at various circuit locations and individual components to create a diverse and comprehensive anomaly dataset, followed by the extraction of features from the observed circuit signals. Subsequently, we employ clustering algorithms to facilitate anomaly detection. Finally, we propose a time series framework to enhance and expedite anomaly detection performance. Our approach encompasses a systematic analysis of anomaly abstraction at multiple levels pertaining to the automotive domain, from hardware- to block-level, where anomalies are injected to create diverse fault scenarios. By monitoring the system behavior under these anomalous conditions, we capture the propagation of anomalies and their effects at different abstraction levels, thereby potentially paving the way for the implementation of reliable safety mechanisms to ensure the FuSa of automotive SoCs. Our experimental findings indicate that our approach achieves 100% anomaly detection accuracy and significantly optimizes the associated latency by 5X, underscoring the effectiveness of our devised solution.
Learning to Control Camera Exposure via Reinforcement Learning
Authors: Authors: Kyunghyun Lee, Ukcheol Shin, Byeong-Uk Lee
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2404.01636
Pdf link: https://arxiv.org/pdf/2404.01636
Abstract Adjusting camera exposure in arbitrary lighting conditions is the first step to ensure the functionality of computer vision applications. Poorly adjusted camera exposure often leads to critical failure and performance degradation. Traditional camera exposure control methods require multiple convergence steps and time-consuming processes, making them unsuitable for dynamic lighting conditions. In this paper, we propose a new camera exposure control framework that rapidly controls camera exposure while performing real-time processing by exploiting deep reinforcement learning. The proposed framework consists of four contributions: 1) a simplified training ground to simulate real-world's diverse and dynamic lighting changes, 2) flickering and image attribute-aware reward design, along with lightweight state design for real-time processing, 3) a static-to-dynamic lighting curriculum to gradually improve the agent's exposure-adjusting capability, and 4) domain randomization techniques to alleviate the limitation of the training ground and achieve seamless generalization in the wild.As a result, our proposed method rapidly reaches a desired exposure level within five steps with real-time processing (1 ms). Also, the acquired images are well-exposed and show superiority in various computer vision tasks, such as feature extraction and object detection.
NLP Systems That Can't Tell Use from Mention Censor Counterspeech, but Teaching the Distinction Helps
Authors: Authors: Kristina Gligoric, Myra Cheng, Lucia Zheng, Esin Durmus, Dan Jurafsky
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2404.01651
Pdf link: https://arxiv.org/pdf/2404.01651
Abstract The use of words to convey speaker's intent is traditionally distinguished from the `mention' of words for quoting what someone said, or pointing out properties of a word. Here we show that computationally modeling this use-mention distinction is crucial for dealing with counterspeech online. Counterspeech that refutes problematic content often mentions harmful language but is not harmful itself (e.g., calling a vaccine dangerous is not the same as expressing disapproval of someone for calling vaccines dangerous). We show that even recent language models fail at distinguishing use from mention, and that this failure propagates to two key downstream tasks: misinformation and hate speech detection, resulting in censorship of counterspeech. We introduce prompting mitigations that teach the use-mention distinction, and show they reduce these errors. Our work highlights the importance of the use-mention distinction for NLP and CSS and offers ways to address it.
Supporting Mitosis Detection AI Training with Inter-Observer Eye-Gaze Consistencies
Authors: Authors: Hongyan Gu, Zihan Yan, Ayesha Alvi, Brandon Day, Chunxu Yang, Zida Wu, Shino Magaki, Mohammad Haeri, Xiang 'Anthony' Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.01656
Pdf link: https://arxiv.org/pdf/2404.01656
Abstract The expansion of artificial intelligence (AI) in pathology tasks has intensified the demand for doctors' annotations in AI development. However, collecting high-quality annotations from doctors is costly and time-consuming, creating a bottleneck in AI progress. This study investigates eye-tracking as a cost-effective technology to collect doctors' behavioral data for AI training with a focus on the pathology task of mitosis detection. One major challenge in using eye-gaze data is the low signal-to-noise ratio, which hinders the extraction of meaningful information. We tackled this by levering the properties of inter-observer eye-gaze consistencies and creating eye-gaze labels from consistent eye-fixations shared by a group of observers. Our study involved 14 non-medical participants, from whom we collected eye-gaze data and generated eye-gaze labels based on varying group sizes. We assessed the efficacy of such eye-gaze labels by training Convolutional Neural Networks (CNNs) and comparing their performance to those trained with ground truth annotations and a heuristic-based baseline. Results indicated that CNNs trained with our eye-gaze labels closely followed the performance of ground-truth-based CNNs, and significantly outperformed the baseline. Although primarily focused on mitosis, we envision that insights from this study can be generalized to other medical imaging tasks.
Event Detection from Social Media for Epidemic Prediction
Authors: Authors: Tanmay Parekh, Anh Mac, Jiarui Yu, Yuxuan Dong, Syed Shahriar, Bonnie Liu, Eric Yang, Kuan-Hao Huang, Wei Wang, Nanyun Peng, Kai-Wei Chang
Subjects: Computation and Language (cs.CL); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
Arxiv link: https://arxiv.org/abs/2404.01679
Pdf link: https://arxiv.org/pdf/2404.01679
Abstract Social media is an easy-to-access platform providing timely updates about societal trends and events. Discussions regarding epidemic-related events such as infections, symptoms, and social interactions can be crucial for informing policymaking during epidemic outbreaks. In our work, we pioneer exploiting Event Detection (ED) for better preparedness and early warnings of any upcoming epidemic by developing a framework to extract and analyze epidemic-related events from social media posts. To this end, we curate an epidemic event ontology comprising seven disease-agnostic event types and construct a Twitter dataset SPEED with human-annotated events focused on the COVID-19 pandemic. Experimentation reveals how ED models trained on COVID-based SPEED can effectively detect epidemic events for three unseen epidemics of Monkeypox, Zika, and Dengue; while models trained on existing ED datasets fail miserably. Furthermore, we show that reporting sharp increases in the extracted events by our framework can provide warnings 4-9 weeks earlier than the WHO epidemic declaration for Monkeypox. This utility of our framework lays the foundations for better preparedness against emerging epidemics.
Beyond Image Super-Resolution for Image Recognition with Task-Driven Perceptual Loss
Authors: Authors: Jaeha Kim, Junghun Oh, Kyoung Mu Lee
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.01692
Pdf link: https://arxiv.org/pdf/2404.01692
Abstract In real-world scenarios, image recognition tasks, such as semantic segmentation and object detection, often pose greater challenges due to the lack of information available within low-resolution (LR) content. Image super-resolution (SR) is one of the promising solutions for addressing the challenges. However, due to the ill-posed property of SR, it is challenging for typical SR methods to restore task-relevant high-frequency contents, which may dilute the advantage of utilizing the SR method. Therefore, in this paper, we propose Super-Resolution for Image Recognition (SR4IR) that effectively guides the generation of SR images beneficial to achieving satisfactory image recognition performance when processing LR images. The critical component of our SR4IR is the task-driven perceptual (TDP) loss that enables the SR network to acquire task-specific knowledge from a network tailored for a specific task. Moreover, we propose a cross-quality patch mix and an alternate training framework that significantly enhances the efficacy of the TDP loss by addressing potential problems when employing the TDP loss. Through extensive experiments, we demonstrate that our SR4IR achieves outstanding task performance by generating SR images useful for a specific image recognition task, including semantic segmentation, object detection, and image classification. The implementation code is available at https://github.com/JaehaKim97/SR4IR.
Task Integration Distillation for Object Detectors
Authors: Authors: Hai Su, ZhenWen Jian, Songsen Yu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.01699
Pdf link: https://arxiv.org/pdf/2404.01699
Abstract Knowledge distillation is a widely adopted technique for model lightening. However, the performance of most knowledge distillation methods in the domain of object detection is not satisfactory. Typically, knowledge distillation approaches consider only the classification task among the two sub-tasks of an object detector, largely overlooking the regression task. This oversight leads to a partial understanding of the object detector's comprehensive task, resulting in skewed estimations and potentially adverse effects. Therefore, we propose a knowledge distillation method that addresses both the classification and regression tasks, incorporating a task significance strategy. By evaluating the importance of features based on the output of the detector's two sub-tasks, our approach ensures a balanced consideration of both classification and regression tasks in object detection. Drawing inspiration from real-world teaching processes and the definition of learning condition, we introduce a method that focuses on both key and weak areas. By assessing the value of features for knowledge distillation based on their importance differences, we accurately capture the current model's learning situation. This method effectively prevents the issue of biased predictions about the model's learning reality caused by an incomplete utilization of the detector's outputs.
Disentangled Pre-training for Human-Object Interaction Detection
Authors: Authors: Zhuolong Li, Xingao Li, Changxing Ding, Xiangmin Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.01725
Pdf link: https://arxiv.org/pdf/2404.01725
Abstract Detecting human-object interaction (HOI) has long been limited by the amount of supervised data available. Recent approaches address this issue by pre-training according to pseudo-labels, which align object regions with HOI triplets parsed from image captions. However, pseudo-labeling is tricky and noisy, making HOI pre-training a complex process. Therefore, we propose an efficient disentangled pre-training method for HOI detection (DP-HOI) to address this problem. First, DP-HOI utilizes object detection and action recognition datasets to pre-train the detection and interaction decoder layers, respectively. Then, we arrange these decoder layers so that the pre-training architecture is consistent with the downstream HOI detection task. This facilitates efficient knowledge transfer. Specifically, the detection decoder identifies reliable human instances in each action recognition dataset image, generates one corresponding query, and feeds it into the interaction decoder for verb classification. Next, we combine the human instance verb predictions in the same image and impose image-level supervision. The DP-HOI structure can be easily adapted to the HOI detection task, enabling effective model parameter initialization. Therefore, it significantly enhances the performance of existing HOI detection models on a broad range of rare categories. The code and pre-trained weight are available at https://github.com/xingaoli/DP-HOI.
Generalizing 6-DoF Grasp Detection via Domain Prior Knowledge
Authors: Authors: Haoxiang Ma, Modi Shi, Boyang Gao, Di Huang
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.01727
Pdf link: https://arxiv.org/pdf/2404.01727
Abstract We focus on the generalization ability of the 6-DoF grasp detection method in this paper. While learning-based grasp detection methods can predict grasp poses for unseen objects using the grasp distribution learned from the training set, they often exhibit a significant performance drop when encountering objects with diverse shapes and structures. To enhance the grasp detection methods' generalization ability, we incorporate domain prior knowledge of robotic grasping, enabling better adaptation to objects with significant shape and structure differences. More specifically, we employ the physical constraint regularization during the training phase to guide the model towards predicting grasps that comply with the physical rule on grasping. For the unstable grasp poses predicted on novel objects, we design a contact-score joint optimization using the projection contact map to refine these poses in cluttered scenarios. Extensive experiments conducted on the GraspNet-1billion benchmark demonstrate a substantial performance gain on the novel object set and the real-world grasping experiments also demonstrate the effectiveness of our generalizing 6-DoF grasp detection method.
Atom-Level Optical Chemical Structure Recognition with Limited Supervision
Authors: Authors: Martijn Oldenhof, Edward De Brouwer, Adam Arany, Yves Moreau
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.01743
Pdf link: https://arxiv.org/pdf/2404.01743
Abstract Identifying the chemical structure from a graphical representation, or image, of a molecule is a challenging pattern recognition task that would greatly benefit drug development. Yet, existing methods for chemical structure recognition do not typically generalize well, and show diminished effectiveness when confronted with domains where data is sparse, or costly to generate, such as hand-drawn molecule images. To address this limitation, we propose a new chemical structure recognition tool that delivers state-of-the-art performance and can adapt to new domains with a limited number of data samples and supervision. Unlike previous approaches, our method provides atom-level localization, and can therefore segment the image into the different atoms and bonds. Our model is the first model to perform OCSR with atom-level entity detection with only SMILES supervision. Through rigorous and extensive benchmarking, we demonstrate the preeminence of our chemical structure recognition approach in terms of data efficiency, accuracy, and atom-level entity prediction.
Unleash the Potential of CLIP for Video Highlight Detection
Authors: Authors: Donghoon Han, Seunghyeon Seo, Eunhwan Park, Seong-Uk Nam, Nojun Kwak
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.01745
Pdf link: https://arxiv.org/pdf/2404.01745
Abstract Multimodal and large language models (LLMs) have revolutionized the utilization of open-world knowledge, unlocking novel potentials across various tasks and applications. Among these domains, the video domain has notably benefited from their capabilities. In this paper, we present Highlight-CLIP (HL-CLIP), a method designed to excel in the video highlight detection task by leveraging the pre-trained knowledge embedded in multimodal models. By simply fine-tuning the multimodal encoder in combination with our innovative saliency pooling technique, we have achieved the state-of-the-art performance in the highlight detection task, the QVHighlight Benchmark, to the best of our knowledge.
Analyzing the Single Event Upset Vulnerability of Binarized Neural Networks on SRAM FPGAs
Authors: Authors: Ioanna Souvatzoglou, Athanasios Papadimitriou, Aitzan Sari, Vasileios Vlagkoulis, Mihalis Psarakis
Subjects: Hardware Architecture (cs.AR)
Arxiv link: https://arxiv.org/abs/2404.01757
Pdf link: https://arxiv.org/pdf/2404.01757
Abstract Neural Networks (NNs) are increasingly used in the last decade in several demanding applications, such as object detection and classification, autonomous driving, etc. Among different computing platforms for implementing NNs, FPGAs have multiple advantages due to design flexibility and high performance-to-watt ratio. Moreover, approximation techniques, such as quantization, have been introduced, which reduce the computational and storage requirements, thus enabling the integration of larger NNs into FPGA devices. On the other hand, FPGAs are sensitive to radiation-induced Single Event Upsets (SEUs). In this work, we perform an in-depth reliability analysis in an FPGA-based Binarized Fully Connected Neural Network (BNN) accelerator running a statistical fault injection campaign. The BNN benchmark has been produced by FINN, an open-source framework that provides an end-to-end flow from abstract level to design, making it easy to design customized FPGA NN accelerators, while it also supports various approximation techniques. The campaign includes the injection of faults in the configuration memory of a state-of-the-art Xilinx Ultrascale+ FPGA running the BNN, as well an exhaustive fault injection in the user flip flops. We have analyzed the fault injection results characterizing the SEU vulnerability of the circuit per network layer, per clock cycle, and register. In general, the results show that the BNNs are inherently resilient to soft errors, since a low portion of SEUs in the configuration memory and the flip flops, cause system crashes or misclassification errors.
Class-Incremental Few-Shot Event Detection
Authors: Authors: Kailin Zhao, Xiaolong Jin, Long Bai, Jiafeng Guo, Xueqi Cheng
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.01767
Pdf link: https://arxiv.org/pdf/2404.01767
Abstract Event detection is one of the fundamental tasks in information extraction and knowledge graph. However, a realistic event detection system often needs to deal with new event classes constantly. These new classes usually have only a few labeled instances as it is time-consuming and labor-intensive to annotate a large number of unlabeled instances. Therefore, this paper proposes a new task, called class-incremental few-shot event detection. Nevertheless, this task faces two problems, i.e., old knowledge forgetting and new class overfitting. To solve these problems, this paper further presents a novel knowledge distillation and prompt learning based method, called Prompt-KD. Specifically, to handle the forgetting problem about old knowledge, Prompt-KD develops an attention based multi-teacher knowledge distillation framework, where the ancestor teacher model pre-trained on base classes is reused in all learning sessions, and the father teacher model derives the current student model via adaptation. On the other hand, in order to cope with the few-shot learning scenario and alleviate the corresponding new class overfitting problem, Prompt-KD is also equipped with a prompt learning mechanism. Extensive experiments on two benchmark datasets, i.e., FewEvent and MAVEN, demonstrate the superior performance of Prompt-KD.
Auditing Large Language Models for Enhanced Text-Based Stereotype Detection and Probing-Based Bias Evaluation
Authors: Authors: Zekun Wu, Sahan Bulathwela, Maria Perez-Ortiz, Adriano Soares Koshiyama
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.01768
Pdf link: https://arxiv.org/pdf/2404.01768
Abstract Recent advancements in Large Language Models (LLMs) have significantly increased their presence in human-facing Artificial Intelligence (AI) applications. However, LLMs could reproduce and even exacerbate stereotypical outputs from training data. This work introduces the Multi-Grain Stereotype (MGS) dataset, encompassing 51,867 instances across gender, race, profession, religion, and stereotypical text, collected by fusing multiple previously publicly available stereotype detection datasets. We explore different machine learning approaches aimed at establishing baselines for stereotype detection, and fine-tune several language models of various architectures and model sizes, presenting in this work a series of stereotypes classifier models for English text trained on MGS. To understand whether our stereotype detectors capture relevant features (aligning with human common sense) we utilise a variety of explanainable AI tools, including SHAP, LIME, and BertViz, and analyse a series of example cases discussing the results. Finally, we develop a series of stereotype elicitation prompts and evaluate the presence of stereotypes in text generation tasks with popular LLMs, using one of our best performing previously presented stereotypes detectors. Our experiments yielded several key findings: i) Training stereotype detectors in a multi-dimension setting yields better results than training multiple single-dimension classifiers.ii) The integrated MGS Dataset enhances both the in-dataset and cross-dataset generalisation ability of stereotype detectors compared to using the datasets separately. iii) There is a reduction in stereotypes in the content generated by GPT Family LLMs with newer versions.
A noisy elephant in the room: Is your out-of-distribution detector robust to label noise?
Authors: Authors: Galadrielle Humblot-Renaux, Sergio Escalera, Thomas B. Moeslund
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.01775
Pdf link: https://arxiv.org/pdf/2404.01775
Abstract The ability to detect unfamiliar or unexpected images is essential for safe deployment of computer vision systems. In the context of classification, the task of detecting images outside of a model's training domain is known as out-of-distribution (OOD) detection. While there has been a growing research interest in developing post-hoc OOD detection methods, there has been comparably little discussion around how these methods perform when the underlying classifier is not trained on a clean, carefully curated dataset. In this work, we take a closer look at 20 state-of-the-art OOD detection methods in the (more realistic) scenario where the labels used to train the underlying classifier are unreliable (e.g. crowd-sourced or web-scraped labels). Extensive experiments across different datasets, noise types & levels, architectures and checkpointing strategies provide insights into the effect of class label noise on OOD detection, and show that poor separation between incorrectly classified ID samples vs. OOD samples is an overlooked yet important limitation of existing methods. Code: https://github.com/glhr/ood-labelnoise
A Feature Dataset of Microservices-based Systems
Authors: Authors: Weipan Yang, Yongchao Xing, Yiming Lyu, Zhihao Liang, Zhiying Tu
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2404.01789
Pdf link: https://arxiv.org/pdf/2404.01789
Abstract Microservice architecture has become a dominant architectural style in the service-oriented software industry. Poor practices in the design and development of microservices are called microservice bad smells. In microservice bad smells research, the detection of these bad smells relies on feature data from microservices. However, there is a lack of an appropriate open-source microservice feature dataset. The availability of such datasets may contribute to the detection of microservice bad smells unexpectedly. To address this research gap, this paper collects a number of open-source microservice systems utilizing Spring Cloud. Additionally, feature metrics are established based on the architecture and interactions of Spring Boot style microservices. And an extraction program is developed. The program is then applied to the collected open-source microservice systems, extracting the necessary information, and undergoing manual verification to create an open-source feature dataset specific to microservice systems using Spring Cloud. The dataset is made available through a CSV file. We believe that both the extraction program and the dataset have the potential to contribute to the study of micro-service bad smells.
Super-Resolution Analysis for Landfill Waste Classification
Authors: Authors: Matias Molina, Rita P. Ribeiro, Bruno Veloso, João Gama
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.01790
Pdf link: https://arxiv.org/pdf/2404.01790
Abstract Illegal landfills are a critical issue due to their environmental, economic, and public health impacts. This study leverages aerial imagery for environmental crime monitoring. While advances in artificial intelligence and computer vision hold promise, the challenge lies in training models with high-resolution literature datasets and adapting them to open-access low-resolution images. Considering the substantial quality differences and limited annotation, this research explores the adaptability of models across these domains. Motivated by the necessity for a comprehensive evaluation of waste detection algorithms, it advocates cross-domain classification and super-resolution enhancement to analyze the impact of different image resolutions on waste classification as an evaluation to combat the proliferation of illegal landfills. We observed performance improvements by enhancing image quality but noted an influence on model sensitivity, necessitating careful threshold fine-tuning.
Improved Text Emotion Prediction Using Combined Valence and Arousal Ordinal Classification
Authors: Authors: Michael Mitsios, Georgios Vamvoukakis, Georgia Maniati, Nikolaos Ellinas, Georgios Dimitriou, Konstantinos Markopoulos, Panos Kakoulidis, Alexandra Vioni, Myrsini Christidou, Junkwang Oh, Gunu Jho, Inchul Hwang, Georgios Vardaxoglou, Aimilios Chalamandaris, Pirros Tsiakoulis, Spyros Raptis
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.01805
Pdf link: https://arxiv.org/pdf/2404.01805
Abstract Emotion detection in textual data has received growing interest in recent years, as it is pivotal for developing empathetic human-computer interaction systems. This paper introduces a method for categorizing emotions from text, which acknowledges and differentiates between the diversified similarities and distinctions of various emotions. Initially, we establish a baseline by training a transformer-based model for standard emotion classification, achieving state-of-the-art performance. We argue that not all misclassifications are of the same importance, as there are perceptual similarities among emotional classes. We thus redefine the emotion labeling problem by shifting it from a traditional classification model to an ordinal classification one, where discrete emotions are arranged in a sequential order according to their valence levels. Finally, we propose a method that performs ordinal classification in the two-dimensional emotion space, considering both valence and arousal scales. The results show that our approach not only preserves high accuracy in emotion prediction but also significantly reduces the magnitude of errors in cases of misclassification.
Sparse Semi-DETR: Sparse Learnable Queries for Semi-Supervised Object Detection
Authors: Authors: Tahira Shehzadi, Khurram Azeem Hashmi, Didier Stricker, Muhammad Zeshan Afzal
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.01819
Pdf link: https://arxiv.org/pdf/2404.01819
Abstract In this paper, we address the limitations of the DETR-based semi-supervised object detection (SSOD) framework, particularly focusing on the challenges posed by the quality of object queries. In DETR-based SSOD, the one-to-one assignment strategy provides inaccurate pseudo-labels, while the one-to-many assignments strategy leads to overlapping predictions. These issues compromise training efficiency and degrade model performance, especially in detecting small or occluded objects. We introduce Sparse Semi-DETR, a novel transformer-based, end-to-end semi-supervised object detection solution to overcome these challenges. Sparse Semi-DETR incorporates a Query Refinement Module to enhance the quality of object queries, significantly improving detection capabilities for small and partially obscured objects. Additionally, we integrate a Reliable Pseudo-Label Filtering Module that selectively filters high-quality pseudo-labels, thereby enhancing detection accuracy and consistency. On the MS-COCO and Pascal VOC object detection benchmarks, Sparse Semi-DETR achieves a significant improvement over current state-of-the-art methods that highlight Sparse Semi-DETR's effectiveness in semi-supervised object detection, particularly in challenging scenarios involving small or partially obscured objects.
A (More) Realistic Evaluation Setup for Generalisation of Community Models on Malicious Content Detection
Authors: Authors: Ivo Verhoeven, Pushkar Mishra, Rahel Beloch, Helen Yannakoudakis, Ekaterina Shutova
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2404.01822
Pdf link: https://arxiv.org/pdf/2404.01822
Abstract Community models for malicious content detection, which take into account the context from a social graph alongside the content itself, have shown remarkable performance on benchmark datasets. Yet, misinformation and hate speech continue to propagate on social media networks. This mismatch can be partially attributed to the limitations of current evaluation setups that neglect the rapid evolution of online content and the underlying social graph. In this paper, we propose a novel evaluation setup for model generalisation based on our few-shot subgraph sampling approach. This setup tests for generalisation through few labelled examples in local explorations of a larger graph, emulating more realistic application settings. We show this to be a challenging inductive setup, wherein strong performance on the training graph is not indicative of performance on unseen tasks, domains, or graph structures. Lastly, we show that graph meta-learners trained with our proposed few-shot subgraph sampling outperform standard community models in the inductive setup. We make our code publicly available.
Semi-Supervised Domain Adaptation for Wildfire Detection
Authors: Authors: JooYoung Jang, Youngseo Cha, Jisu Kim, SooHyung Lee, Geonu Lee, Minkook Cho, Young Hwang, Nojun Kwak
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.01842
Pdf link: https://arxiv.org/pdf/2404.01842
Abstract Recently, both the frequency and intensity of wildfires have increased worldwide, primarily due to climate change. In this paper, we propose a novel protocol for wildfire detection, leveraging semi-supervised Domain Adaptation for object detection, accompanied by a corresponding dataset designed for use by both academics and industries. Our dataset encompasses 30 times more diverse labeled scenes for the current largest benchmark wildfire dataset, HPWREN, and introduces a new labeling policy for wildfire detection. Inspired by CoordConv, we propose a robust baseline, Location-Aware Object Detection for Semi-Supervised Domain Adaptation (LADA), utilizing a teacher-student based framework capable of extracting translational variance features characteristic of wildfires. With only using 1% target domain labeled data, our framework significantly outperforms our source-only baseline by a notable margin of 3.8% in mean Average Precision on the HPWREN wildfire dataset. Our dataset is available at https://github.com/BloomBerry/LADA.
Unmasking the Nuances of Loneliness: Using Digital Biomarkers to Understand Social and Emotional Loneliness in College Students
Authors: Authors: Malik Muhammad Qirtas, Evi Zafeirid, Dirk Pesch, Eleanor Bantry White
Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2404.01845
Pdf link: https://arxiv.org/pdf/2404.01845
Abstract Background: Loneliness among students is increasing across the world, with potential consequences for mental health and academic success. To address this growing problem, accurate methods of detection are needed to identify loneliness and to differentiate social and emotional loneliness so that intervention can be personalized to individual need. Passive sensing technology provides a unique technique to capture behavioral patterns linked with distinct loneliness forms, allowing for more nuanced understanding and interventions for loneliness. Methods: To differentiate between social and emotional loneliness using digital biomarkers, our study included statistical tests, machine learning for predictive modeling, and SHAP values for feature importance analysis, revealing important factors in loneliness classification. Results: Our analysis revealed significant behavioral differences between socially and emotionally lonely groups, particularly in terms of phone usage and location-based features , with machine learning models demonstrating substantial predictive power in classifying loneliness levels. The XGBoost model, in particular, showed high accuracy and was effective in identifying key digital biomarkers, including phone usage duration and location-based features, as significant predictors of loneliness categories. Conclusion: This study underscores the potential of passive sensing data, combined with machine learning techniques, to provide insights into the behavioral manifestations of social and emotional loneliness among students. The identification of key digital biomarkers paves the way for targeted interventions aimed at mitigating loneliness in this population.
Scene Adaptive Sparse Transformer for Event-based Object Detection
Authors: Authors: Yansong Peng, Hebei Li, Yueyi Zhang, Xiaoyan Sun, Feng Wu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.01882
Pdf link: https://arxiv.org/pdf/2404.01882
Abstract While recent Transformer-based approaches have shown impressive performances on event-based object detection tasks, their high computational costs still diminish the low power consumption advantage of event cameras. Image-based works attempt to reduce these costs by introducing sparse Transformers. However, they display inadequate sparsity and adaptability when applied to event-based object detection, since these approaches cannot balance the fine granularity of token-level sparsification and the efficiency of window-based Transformers, leading to reduced performance and efficiency. Furthermore, they lack scene-specific sparsity optimization, resulting in information loss and a lower recall rate. To overcome these limitations, we propose the Scene Adaptive Sparse Transformer (SAST). SAST enables window-token co-sparsification, significantly enhancing fault tolerance and reducing computational overhead. Leveraging the innovative scoring and selection modules, along with the Masked Sparse Window Self-Attention, SAST showcases remarkable scene-aware adaptability: It focuses only on important objects and dynamically optimizes sparsity level according to scene complexity, maintaining a remarkable balance between performance and computational cost. The evaluation results show that SAST outperforms all other dense and sparse networks in both performance and efficiency on two large-scale event-based object detection datasets (1Mpx and Gen1). Code: https://github.com/Peterande/SAST
ASTRA: An Action Spotting TRAnsformer for Soccer Videos
Authors: Authors: Artur Xarles, Sergio Escalera, Thomas B. Moeslund, Albert Clapés
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.01891
Pdf link: https://arxiv.org/pdf/2404.01891
Abstract In this paper, we introduce ASTRA, a Transformer-based model designed for the task of Action Spotting in soccer matches. ASTRA addresses several challenges inherent in the task and dataset, including the requirement for precise action localization, the presence of a long-tail data distribution, non-visibility in certain actions, and inherent label noise. To do so, ASTRA incorporates (a) a Transformer encoder-decoder architecture to achieve the desired output temporal resolution and to produce precise predictions, (b) a balanced mixup strategy to handle the long-tail distribution of the data, (c) an uncertainty-aware displacement head to capture the label variability, and (d) input audio signal to enhance detection of non-visible actions. Results demonstrate the effectiveness of ASTRA, achieving a tight Average-mAP of 66.82 on the test set. Moreover, in the SoccerNet 2023 Action Spotting challenge, we secure the 3rd position with an Average-mAP of 70.21 on the challenge set.
Humanizing Machine-Generated Content: Evading AI-Text Detection through Adversarial Attack
Authors: Authors: Ying Zhou, Ben He, Le Sun
Subjects: Computation and Language (cs.CL); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.01907
Pdf link: https://arxiv.org/pdf/2404.01907
Abstract With the development of large language models (LLMs), detecting whether text is generated by a machine becomes increasingly challenging in the face of malicious use cases like the spread of false information, protection of intellectual property, and prevention of academic plagiarism. While well-trained text detectors have demonstrated promising performance on unseen test data, recent research suggests that these detectors have vulnerabilities when dealing with adversarial attacks such as paraphrasing. In this paper, we propose a framework for a broader class of adversarial attacks, designed to perform minor perturbations in machine-generated content to evade detection. We consider two attack settings: white-box and black-box, and employ adversarial learning in dynamic scenarios to assess the potential enhancement of the current detection model's robustness against such attacks. The empirical results reveal that the current detection models can be compromised in as little as 10 seconds, leading to the misclassification of machine-generated text as human-written content. Furthermore, we explore the prospect of improving the model's robustness over iterative adversarial learning. Although some improvements in model robustness are observed, practical applications still face significant challenges. These findings shed light on the future development of AI-text detectors, emphasizing the need for more accurate and robust detection methods.
SCANNER: Knowledge-Enhanced Approach for Robust Multi-modal Named Entity Recognition of Unseen Entities
Authors: Authors: Hyunjong Ok, Taeho Kil, Sukmin Seo, Jaeho Lee
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.01914
Pdf link: https://arxiv.org/pdf/2404.01914
Abstract Recent advances in named entity recognition (NER) have pushed the boundary of the task to incorporate visual signals, leading to many variants, including multi-modal NER (MNER) or grounded MNER (GMNER). A key challenge to these tasks is that the model should be able to generalize to the entities unseen during the training, and should be able to handle the training samples with noisy annotations. To address this obstacle, we propose SCANNER (Span CANdidate detection and recognition for NER), a model capable of effectively handling all three NER variants. SCANNER is a two-stage structure; we extract entity candidates in the first stage and use it as a query to get knowledge, effectively pulling knowledge from various sources. We can boost our performance by utilizing this entity-centric extracted knowledge to address unseen entities. Furthermore, to tackle the challenges arising from noisy annotations in NER datasets, we introduce a novel self-distillation method, enhancing the robustness and accuracy of our model in processing training data with inherent uncertainties. Our approach demonstrates competitive performance on the NER benchmark and surpasses existing methods on both MNER and GMNER benchmarks. Further analysis shows that the proposed distillation and knowledge utilization methods improve the performance of our model on various benchmarks.
PREGO: online mistake detection in PRocedural EGOcentric videos
Authors: Authors: Alessandro Flaborea, Guido Maria D'Amely di Melendugno, Leonardo Plini, Luca Scofano, Edoardo De Matteis, Antonino Furnari, Giovanni Maria Farinella, Fabio Galasso
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.01933
Pdf link: https://arxiv.org/pdf/2404.01933
Abstract Promptly identifying procedural errors from egocentric videos in an online setting is highly challenging and valuable for detecting mistakes as soon as they happen. This capability has a wide range of applications across various fields, such as manufacturing and healthcare. The nature of procedural mistakes is open-set since novel types of failures might occur, which calls for one-class classifiers trained on correctly executed procedures. However, no technique can currently detect open-set procedural mistakes online. We propose PREGO, the first online one-class classification model for mistake detection in PRocedural EGOcentric videos. PREGO is based on an online action recognition component to model the current action, and a symbolic reasoning module to predict the next actions. Mistake detection is performed by comparing the recognized current action with the expected future one. We evaluate PREGO on two procedural egocentric video datasets, Assembly101 and Epic-tent, which we adapt for online benchmarking of procedural mistake detection to establish suitable benchmarks, thus defining the Assembly101-O and Epic-tent-O datasets, respectively.
An Exploratory Study of the Relationship between SATD and Other Software Development Activities
Authors: Authors: Shima Esfandiari, Ashkan Sami
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2404.01950
Pdf link: https://arxiv.org/pdf/2404.01950
Abstract Technical Debt is a common issue that arises when short-term gains are prioritized over long-term costs, leading to a degradation in the quality of the code. Self-Admitted Technical Debt (SATD) is a specific type of Technical Debt that involves documenting code to remind developers of its debt. Previous research has explored various aspects of SATD, including detection methods, distribution, and its impact on software quality. To better understand SATD, one comprehension technique is to examine its co-occurrence with other activities, such as refactoring and bug fixing. This study investigates the relationship between removing and adding SATD and activities such as refactoring, bug fixing, adding new features, and testing. To do so, we analyzed 77 open-source Java projects using TODO/FIXME/XXX removal or addition in inline comments as indicators of SATD. We examined the co-occurrence of SATD with each activity in each project through chi-square and odds ratio evaluations. Our results show that SATD removal occurs simultaneously with refactoring in 95% of projects, while its addition occurs in 89% of projects. Furthermore, we found that three types of refactoring - "move class", "remove method", and "move attribute" - occur more frequently in the presence of SATD. However, their distribution is similar in projects with and without SATD.
Automatic Wood Pith Detector: Local Orientation Estimation and Robust Accumulation
Authors: Authors: Henry Marichal, Diego Passarella, Gregory Randall
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.01952
Pdf link: https://arxiv.org/pdf/2404.01952
Abstract A fully automated technique for wood pith detection (APD), relying on the concentric shape of the structure of wood ring slices, is introduced. The method estimates the ring's local orientations using the 2D structure tensor and finds the pith position, optimizing a cost function designed for this problem. We also present a variant (APD-PCL), using the parallel coordinates space, that enhances the method's effectiveness when there are no clear tree ring patterns. Furthermore, refining previous work by Kurdthongmee, a YoloV8 net is trained for pith detection, producing a deep learning-based approach to the same problem (APD-DL). All methods were tested on seven datasets, including images captured under diverse conditions (controlled laboratory settings, sawmill, and forest) and featuring various tree species (Pinus taeda, Douglas fir, Abies alba, and Gleditsia triacanthos). All proposed approaches outperform existing state-of-the-art methods and can be used in CPU-based real-time applications. Additionally, we provide a novel dataset comprising images of gymnosperm and angiosperm species. Dataset and source code are available at this http URL
Bi-LORA: A Vision-Language Approach for Synthetic Image Detection
Authors: Authors: Mamadou Keita, Wassim Hamidouche, Hessen Bougueffa Eutamene, Abdenour Hadid, Abdelmalik Taleb-Ahmed
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.01959
Pdf link: https://arxiv.org/pdf/2404.01959
Abstract Advancements in deep image synthesis techniques, such as generative adversarial networks (GANs) and diffusion models (DMs), have ushered in an era of generating highly realistic images. While this technological progress has captured significant interest, it has also raised concerns about the potential difficulty in distinguishing real images from their synthetic counterparts. This paper takes inspiration from the potent convergence capabilities between vision and language, coupled with the zero-shot nature of vision-language models (VLMs). We introduce an innovative method called Bi-LORA that leverages VLMs, combined with low-rank adaptation (LORA) tuning techniques, to enhance the precision of synthetic image detection for unseen model-generated images. The pivotal conceptual shift in our methodology revolves around reframing binary classification as an image captioning task, leveraging the distinctive capabilities of cutting-edge VLM, notably bootstrapping language image pre-training (BLIP2). Rigorous and comprehensive experiments are conducted to validate the effectiveness of our proposed approach, particularly in detecting unseen diffusion-generated images from unknown diffusion-based generative models during training, showcasing robustness to noise, and demonstrating generalization capabilities to GANs. The obtained results showcase an impressive average accuracy of 93.41% in synthetic image detection on unseen generation models. The code and models associated with this research can be publicly accessed at https://github.com/Mamadou-Keita/VLM-DETECT.
Cooperative Students: Navigating Unsupervised Domain Adaptation in Nighttime Object Detection
Authors: Authors: Jicheng Yuan, Anh Le-Tuan, Manfred Hauswirth, Danh Le-Phuoc
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.01988
Pdf link: https://arxiv.org/pdf/2404.01988
Abstract Unsupervised Domain Adaptation (UDA) has shown significant advancements in object detection under well-lit conditions; however, its performance degrades notably in low-visibility scenarios, especially at night, posing challenges not only for its adaptability in low signal-to-noise ratio (SNR) conditions but also for the reliability and efficiency of automated vehicles. To address this problem, we propose a \textbf{Co}operative \textbf{S}tudents (\textbf{CoS}) framework that innovatively employs global-local transformations (GLT) and a proxy-based target consistency (PTC) mechanism to capture the spatial consistency in day- and night-time scenarios effectively, and thus bridge the significant domain shift across contexts. Building upon this, we further devise an adaptive IoU-informed thresholding (AIT) module to gradually avoid overlooking potential true positives and enrich the latent information in the target domain. Comprehensive experiments show that CoS essentially enhanced UDA performance in low-visibility conditions and surpasses current state-of-the-art techniques, achieving an increase in mAP of 3.0\%, 1.9\%, and 2.5\% on BDD100K, SHIFT, and ACDC datasets, respectively. Code is available at https://github.com/jichengyuan/Cooperitive_Students.
Breaking the Silence Detecting and Mitigating Gendered Abuse in Hindi, Tamil, and Indian English Online Spaces
Authors: Authors: Advaitha Vetagiri, Gyandeep Kalita, Eisha Halder, Chetna Taparia, Partha Pakray, Riyanka Manna
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.02013
Pdf link: https://arxiv.org/pdf/2404.02013
Abstract Online gender-based harassment is a widespread issue limiting the free expression and participation of women and marginalized genders in digital spaces. Detecting such abusive content can enable platforms to curb this menace. We participated in the Gendered Abuse Detection in Indic Languages shared task at ICON2023 that provided datasets of annotated Twitter posts in English, Hindi and Tamil for building classifiers to identify gendered abuse. Our team CNLP-NITS-PP developed an ensemble approach combining CNN and BiLSTM networks that can effectively model semantic and sequential patterns in textual data. The CNN captures localized features indicative of abusive language through its convolution filters applied on embedded input text. To determine context-based offensiveness, the BiLSTM analyzes this sequence for dependencies among words and phrases. Multiple variations were trained using FastText and GloVe word embeddings for each language dataset comprising over 7,600 crowdsourced annotations across labels for explicit abuse, targeted minority attacks and general offences. The validation scores showed strong performance across f1-measures, especially for English 0.84. Our experiments reveal how customizing embeddings and model hyperparameters can improve detection capability. The proposed architecture ranked 1st in the competition, proving its ability to handle real-world noisy text with code-switching. This technique has a promising scope as platforms aim to combat cyber harassment facing Indic language internet users. Our Code is at https://github.com/advaithavetagiri/CNLP-NITS-PP
Multitask-based Evaluation of Open-Source LLM on Software Vulnerability
Authors: Authors: Xin Yin, Chao Ni
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2404.02056
Pdf link: https://arxiv.org/pdf/2404.02056
Abstract This paper proposes a pipeline for quantitatively evaluating interactive LLMs using publicly available datasets. We carry out an extensive technical evaluation of LLMs using Big-Vul covering four different common software vulnerability tasks. We evaluate the multitask and multilingual aspects of LLMs based on this dataset. We find that the existing state-of-the-art methods are generally superior to LLMs in software vulnerability detection. Although LLMs improve accuracy when providing context information, they still have limitations in accurately predicting severity ratings for certain CWE types. In addition, LLMs demonstrate some ability to locate vulnerabilities for certain CWE types, but their performance varies among different CWE types. Finally, LLMs show uneven performance in generating CVE descriptions for various CWE types, with limited accuracy in a few-shot setting. Overall, though LLMs perform well in some aspects, they still need improvement in understanding the subtle differences in code vulnerabilities and the ability to describe vulnerabilities to fully realize their potential. Our evaluation pipeline provides valuable insights for further enhancing LLMs' software vulnerability handling capabilities.
EGTR: Extracting Graph from Transformer for Scene Graph Generation
Authors: Authors: Jinbae Im, JeongYeon Nam, Nokyung Park, Hyungmin Lee, Seunghyun Park
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.02072
Pdf link: https://arxiv.org/pdf/2404.02072
Abstract Scene Graph Generation (SGG) is a challenging task of detecting objects and predicting relationships between objects. After DETR was developed, one-stage SGG models based on a one-stage object detector have been actively studied. However, complex modeling is used to predict the relationship between objects, and the inherent relationship between object queries learned in the multi-head self-attention of the object detector has been neglected. We propose a lightweight one-stage SGG model that extracts the relation graph from the various relationships learned in the multi-head self-attention layers of the DETR decoder. By fully utilizing the self-attention by-products, the relation graph can be extracted effectively with a shallow relation extraction head. Considering the dependency of the relation extraction task on the object detection task, we propose a novel relation smoothing technique that adjusts the relation label adaptively according to the quality of the detected objects. By the relation smoothing, the model is trained according to the continuous curriculum that focuses on object detection task at the beginning of training and performs multi-task learning as the object detection performance gradually improves. Furthermore, we propose a connectivity prediction task that predicts whether a relation exists between object pairs as an auxiliary task of the relation extraction. We demonstrate the effectiveness and efficiency of our method for the Visual Genome and Open Image V6 datasets. Our code is publicly available at https://github.com/naver-ai/egtr .
LastResort at SemEval-2024 Task 3: Exploring Multimodal Emotion Cause Pair Extraction as Sequence Labelling Task
Authors: Authors: Suyash Vardhan Mathur, Akshett Rai Jindal, Hardik Mittal, Manish Shrivastava
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2404.02088
Pdf link: https://arxiv.org/pdf/2404.02088
Abstract Conversation is the most natural form of human communication, where each utterance can range over a variety of possible emotions. While significant work has been done towards the detection of emotions in text, relatively little work has been done towards finding the cause of the said emotions, especially in multimodal settings. SemEval 2024 introduces the task of Multimodal Emotion Cause Analysis in Conversations, which aims to extract emotions reflected in individual utterances in a conversation involving multiple modalities (textual, audio, and visual modalities) along with the corresponding utterances that were the cause for the emotion. In this paper, we propose models that tackle this task as an utterance labeling and a sequence labeling problem and perform a comparative study of these models, involving baselines using different encoders, using BiLSTM for adding contextual information of the conversation, and finally adding a CRF layer to try to model the inter-dependencies between adjacent utterances more effectively. In the official leaderboard for the task, our architecture was ranked 8th, achieving an F1-score of 0.1759 on the leaderboard.
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Authors: Authors: Jienneg Chen, Qihang Yu, Xiaohui Shen, Alan Yuille, Liang-Chieh Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.02132
Pdf link: https://arxiv.org/pdf/2404.02132
Abstract Recent breakthroughs in vision-language models (VLMs) start a new page in the vision community. The VLMs provide stronger and more generalizable feature embeddings compared to those from ImageNet-pretrained models, thanks to the training on the large-scale Internet image-text pairs. However, despite the amazing achievement from the VLMs, vanilla Vision Transformers (ViTs) remain the default choice for the image encoder. Although pure transformer proves its effectiveness in the text encoding area, it remains questionable whether it is also the case for image encoding, especially considering that various types of networks are proposed on the ImageNet benchmark, which, unfortunately, are rarely studied in VLMs. Due to small data/model scale, the original conclusions of model design on ImageNet can be limited and biased. In this paper, we aim at building an evaluation protocol of vision models in the vision-language era under the contrastive language-image pretraining (CLIP) framework. We provide a comprehensive way to benchmark different vision models, covering their zero-shot performance and scalability in both model and training data sizes. To this end, we introduce ViTamin, a new vision models tailored for VLMs. ViTamin-L significantly outperforms ViT-L by 2.0% ImageNet zero-shot accuracy, when using the same publicly available DataComp-1B dataset and the same OpenCLIP training scheme. ViTamin-L presents promising results on 60 diverse benchmarks, including classification, retrieval, open-vocabulary detection and segmentation, and large multi-modal models. When further scaling up the model size, our ViTamin-XL with only 436M parameters attains 82.9% ImageNet zero-shot accuracy, surpassing 82.0% achieved by EVA-E that has ten times more parameters (4.4B).
ResNet with Integrated Convolutional Block Attention Module for Ship Classification Using Transfer Learning on Optical Satellite Imagery
Authors: Authors: Ryan Donghan Kwon, Gangjoo Robin Nam, Jisoo Tak, Yeom Hyeok, Junseob Shin, Hyerin Cha, Kim Soo Bin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2404.02135
Pdf link: https://arxiv.org/pdf/2404.02135
Abstract This study proposes a novel transfer learning framework for effective ship classification using high-resolution optical remote sensing satellite imagery. The framework is based on the deep convolutional neural network model ResNet50 and incorporates the Convolutional Block Attention Module (CBAM) to enhance performance. CBAM enables the model to attend to salient features in the images, allowing it to better discriminate between subtle differences between ships and backgrounds. Furthermore, this study adopts a transfer learning approach tailored for accurately classifying diverse types of ships by fine-tuning a pre-trained model for the specific task. Experimental results demonstrate the efficacy of the proposed framework in ship classification using optical remote sensing imagery, achieving a high classification accuracy of 94% across 5 classes, outperforming existing methods. This research holds potential applications in maritime surveillance and management, illegal fishing detection, and maritime traffic monitoring.
Topic-based Watermarks for LLM-Generated Text
Authors: Authors: Alexander Nemecek, Yuzhou Jiang, Erman Ayday
Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.02138
Pdf link: https://arxiv.org/pdf/2404.02138
Abstract Recent advancements of large language models (LLMs) have resulted in indistinguishable text outputs comparable to human-generated text. Watermarking algorithms are potential tools that offer a way to differentiate between LLM- and human-generated text by embedding detectable signatures within LLM-generated output. However, current watermarking schemes lack robustness against known attacks against watermarking algorithms. In addition, they are impractical considering an LLM generates tens of thousands of text outputs per day and the watermarking algorithm needs to memorize each output it generates for the detection to work. In this work, focusing on the limitations of current watermarking schemes, we propose the concept of a "topic-based watermarking algorithm" for LLMs. The proposed algorithm determines how to generate tokens for the watermarked LLM output based on extracted topics of an input prompt or the output of a non-watermarked LLM. Inspired from previous work, we propose using a pair of lists (that are generated based on the specified extracted topic(s)) that specify certain tokens to be included or excluded while generating the watermarked output of the LLM. Using the proposed watermarking algorithm, we show the practicality of a watermark detection algorithm. Furthermore, we discuss a wide range of attacks that can emerge against watermarking algorithms for LLMs and the benefit of the proposed watermarking scheme for the feasibility of modeling a potential attacker considering its benefit vs. loss.
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Authors: Authors: Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2404.02151
Pdf link: https://arxiv.org/pdf/2404.02151
Abstract We show that even the most recent safety-aligned LLMs are not robust to simple adaptive jailbreaking attacks. First, we demonstrate how to successfully leverage access to logprobs for jailbreaking: we initially design an adversarial prompt template (sometimes adapted to the target LLM), and then we apply random search on a suffix to maximize the target logprob (e.g., of the token "Sure"), potentially with multiple restarts. In this way, we achieve nearly 100\% attack success rate -- according to GPT-4 as a judge -- on GPT-3.5/4, Llama-2-Chat-7B/13B/70B, Gemma-7B, and R2D2 from HarmBench that was adversarially trained against the GCG attack. We also show how to jailbreak all Claude models -- that do not expose logprobs -- via either a transfer or prefilling attack with 100\% success rate. In addition, we show how to use random search on a restricted set of tokens for finding trojan strings in poisoned models -- a task that shares many similarities with jailbreaking -- which is the algorithm that brought us the first place in the SaTML'24 Trojan Detection Competition. The common theme behind these attacks is that adaptivity is crucial: different models are vulnerable to different prompting templates (e.g., R2D2 is very sensitive to in-context learning prompts), some models have unique vulnerabilities based on their APIs (e.g., prefilling for Claude), and in some settings it is crucial to restrict the token search space based on prior knowledge (e.g., for trojan detection). We provide the code, prompts, and logs of the attacks at https://github.com/tml-epfl/llm-adaptive-attacks.
Keyword: face recognition

There is no result

Keyword: augmentation

Position-Aware Parameter Efficient Fine-Tuning Approach for Reducing Positional Bias in LLMs
Authors: Authors: Zheng Zhang, Fan Yang, Ziyan Jiang, Zheng Chen, Zhengyang Zhao, Chengyuan Ma, Liang Zhao, Yang Liu
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.01430
Pdf link: https://arxiv.org/pdf/2404.01430
Abstract Recent advances in large language models (LLMs) have enhanced their ability to process long input contexts. This development is particularly crucial for tasks that involve retrieving knowledge from an external datastore, which can result in long inputs. However, recent studies show a positional bias in LLMs, demonstrating varying performance depending on the location of useful information within the input sequence. In this study, we conduct extensive experiments to investigate the root causes of positional bias. Our findings indicate that the primary contributor to LLM positional bias stems from the inherent positional preferences of different models. We demonstrate that merely employing prompt-based solutions is inadequate for overcoming the positional preferences. To address this positional bias issue of a pre-trained LLM, we developed a Position-Aware Parameter Efficient Fine-Tuning (PAPEFT) approach which is composed of a data augmentation technique and a parameter efficient adapter, enhancing a uniform attention distribution across the input context. Our experiments demonstrate that the proposed approach effectively reduces positional bias, improving LLMs' effectiveness in handling long context sequences for various tasks that require externally retrieved knowledge.
AAdaM at SemEval-2024 Task 1: Augmentation and Adaptation for Multilingual Semantic Textual Relatedness
Authors: Authors: Miaoran Zhang, Mingyang Wang, Jesujoba O. Alabi, Dietrich Klakow
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.01490
Pdf link: https://arxiv.org/pdf/2404.01490
Abstract This paper presents our system developed for the SemEval-2024 Task 1: Semantic Textual Relatedness for African and Asian Languages. The shared task aims at measuring the semantic textual relatedness between pairs of sentences, with a focus on a range of under-represented languages. In this work, we propose using machine translation for data augmentation to address the low-resource challenge of limited training data. Moreover, we apply task-adaptive pre-training on unlabeled task data to bridge the gap between pre-training and task adaptation. For model training, we investigate both full fine-tuning and adapter-based tuning, and adopt the adapter framework for effective zero-shot cross-lingual transfer. We achieve competitive results in the shared task: our system performs the best among all ranked teams in both subtask A (supervised learning) and subtask C (cross-lingual transfer).
Set-Aligning Framework for Auto-Regressive Event Temporal Graph Generation
Authors: Authors: Xingwei Tan, Yuxiang Zhou, Gabriele Pergola, Yulan He
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2404.01532
Pdf link: https://arxiv.org/pdf/2404.01532
Abstract Event temporal graphs have been shown as convenient and effective representations of complex temporal relations between events in text. Recent studies, which employ pre-trained language models to auto-regressively generate linearised graphs for constructing event temporal graphs, have shown promising results. However, these methods have often led to suboptimal graph generation as the linearised graphs exhibit set characteristics which are instead treated sequentially by language models. This discrepancy stems from the conventional text generation objectives, leading to erroneous penalisation of correct predictions caused by the misalignment of elements in target sequences. To address these challenges, we reframe the task as a conditional set generation problem, proposing a Set-aligning Framework tailored for the effective utilisation of Large Language Models (LLMs). The framework incorporates data augmentations and set-property regularisations designed to alleviate text generation loss penalties associated with the linearised graph edge sequences, thus encouraging the generation of more relation edges. Experimental results show that our framework surpasses existing baselines for event temporal graph generation. Furthermore, under zero-shot settings, the structural knowledge introduced through our framework notably improves model generalisation, particularly when the training examples available are limited.
ContrastCAD: Contrastive Learning-based Representation Learning for Computer-Aided Design Models
Authors: Authors: Minseop Jung, Minseong Kim, Jibum Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.01645
Pdf link: https://arxiv.org/pdf/2404.01645
Abstract The success of Transformer-based models has encouraged many researchers to learn CAD models using sequence-based approaches. However, learning CAD models is still a challenge, because they can be represented as complex shapes with long construction sequences. Furthermore, the same CAD model can be expressed using different CAD construction sequences. We propose a novel contrastive learning-based approach, named ContrastCAD, that effectively captures semantic information within the construction sequences of the CAD model. ContrastCAD generates augmented views using dropout techniques without altering the shape of the CAD model. We also propose a new CAD data augmentation method, called a Random Replace and Extrude (RRE) method, to enhance the learning performance of the model when training an imbalanced training CAD dataset. Experimental results show that the proposed RRE augmentation method significantly enhances the learning performance of Transformer-based autoencoders, even for complex CAD models having very long construction sequences. The proposed ContrastCAD model is shown to be robust to permutation changes of construction sequences and performs better representation learning by generating representation spaces where similar CAD models are more closely clustered. Our codes are available at https://github.com/cm8908/ContrastCAD.
A Universal Knowledge Embedded Contrastive Learning Framework for Hyperspectral Image Classification
Authors: Authors: Quanwei Liu, Yanni Dong, Tao Huang, Lefei Zhang, Bo Do
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.01673
Pdf link: https://arxiv.org/pdf/2404.01673
Abstract Hyperspectral image (HSI) classification techniques have been intensively studied and a variety of models have been developed. However, these HSI classification models are confined to pocket models and unrealistic ways of datasets partitioning. The former limits the generalization performance of the model and the latter is partitioned leads to inflated model evaluation metrics, which results in plummeting model performance in the real world. Therefore, we propose a universal knowledge embedded contrastive learning framework (KnowCL) for supervised, unsupervised, and semisupervised HSI classification, which largely closes the gap of HSI classification models between pocket models and standard vision backbones. We present a new HSI processing pipeline in conjunction with a range of data transformation and augmentation techniques that provide diverse data representations and realistic data partitioning. The proposed framework based on this pipeline is compatible with all kinds of backbones and can fully exploit labeled and unlabeled samples with expected training time. Furthermore, we design a new loss function, which can adaptively fuse the supervised loss and unsupervised loss, enhancing the learning performance. This proposed new classification paradigm shows great potentials in exploring for HSI classification technology. The code can be accessed at https://github.com/quanweiliu/KnowCL.
Learning-based model augmentation with LFRs
Authors: Authors: Jan H. Hoekstra, Chris Verhoek, Roland Tóth, Maarten Schoukens
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2404.01901
Pdf link: https://arxiv.org/pdf/2404.01901
Abstract Artificial neural networks (ANN) have proven to be effective in dealing with the identification nonlinear models for highly complex systems. To still make use of the prior information available from baseline models derived from, e.g., first-principles (FP), methods have been developed that integrate the prior knowledge into the identification algorithm for the ANN in a variety of methods. These methods have shown better estimation speeds and/or accuracy on unseen data. Among these methods are model augmentation structures. A variety of these structures have been considered in literature, there is however no unifying theory to these. In this paper, we propose a flexible linear-fractional-representation (LFR) based model augmentation structure. This model structure is able to represent many common model augmentation structures, thus unifying them under the proposed model structure. Furthermore, we introduce an identification algorithm capable of estimating the proposed model augmentation structure. The performance and generalization capabilities of the identification algorithm and the augmentation structure is demonstrated on a hardening mass-spring-damper simulation example.
A Rationale-centric Counterfactual Data Augmentation Method for Cross-Document Event Coreference Resolution
Authors: Authors: Bowen Ding, Qingkai Min, Shengkun Ma, Yingjie Li, Linyi Yang, Yue Zhang
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.01921
Pdf link: https://arxiv.org/pdf/2404.01921
Abstract Based on Pre-trained Language Models (PLMs), event coreference resolution (ECR) systems have demonstrated outstanding performance in clustering coreferential events across documents. However, the existing system exhibits an excessive reliance on the `triggers lexical matching' spurious pattern in the input mention pair text. We formalize the decision-making process of the baseline ECR system using a Structural Causal Model (SCM), aiming to identify spurious and causal associations (i.e., rationales) within the ECR task. Leveraging the debiasing capability of counterfactual data augmentation, we develop a rationale-centric counterfactual data augmentation method with LLM-in-the-loop. This method is specialized for pairwise input in the ECR system, where we conduct direct interventions on triggers and context to mitigate the spurious association while emphasizing the causation. Our approach achieves state-of-the-art performance on three popular cross-document ECR benchmarks and demonstrates robustness in out-of-domain scenarios.
Fashion Style Editing with Generative Human Prior
Authors: Authors: Chaerin Kong, Seungyong Lee, Soohyeok Im, Wonsuk Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.01984
Pdf link: https://arxiv.org/pdf/2404.01984
Abstract Image editing has been a long-standing challenge in the research community with its far-reaching impact on numerous applications. Recently, text-driven methods started to deliver promising results in domains like human faces, but their applications to more complex domains have been relatively limited. In this work, we explore the task of fashion style editing, where we aim to manipulate the fashion style of human imagery using text descriptions. Specifically, we leverage a generative human prior and achieve fashion style editing by navigating its learned latent space. We first verify that the existing text-driven editing methods fall short for our problem due to their overly simplified guidance signal, and propose two directions to reinforce the guidance: textual augmentation and visual referencing. Combined with our empirical findings on the latent space structure, our Fashion Style Editing framework (FaSE) successfully projects abstract fashion concepts onto human images and introduces exciting new applications to the field.

LeeKyungwook / get-arxiv-noti

New submissions for Wed, 3 Apr 24 #1047

Keyword: detection

Graph-Based Optimisation of Network Expansion in a Dockless Bike Sharing System

FineFake: A Knowledge-Enriched Dataset for Fine-Grained Multi-Domain Fake News Detecction

Detection of Temporality at Discourse Level on Financial News by Combining Natural Language Processing and Machine Learning

Automatic detection of relevant information, predictions and forecasts in financial news through topic modelling with Latent Dirichlet Allocation

Enhancing Bangla Fake News Detection Using Bidirectional Gated Recurrent Units and Deep Learning Techniques

Distribution-Agnostic Database De-Anonymization Under Obfuscation And Synchronization Errors

Object-conditioned Bag of Instances for Few-Shot Personalized Instance Recognition

The Radar Ghost Dataset -- An Evaluation of Ghost Objects in Automotive Radar Data

Finding Regions of Interest in Whole Slide Images Using Multiple Instance Learning

QuAD: Query-based Interpretable Neural Motion Planning for Autonomous Driving

Modality Translation for Object Detection Adaptation Without Forgetting Prior Knowledge

MosquitoFusion: A Multiclass Dataset for Real-Time Detection of Mosquitoes, Swarms, and Breeding Sites Using Deep Learning

Leveraging YOLO-World and GPT-4V LMMs for Zero-Shot Person Detection and Action Recognition in Drone Imagery

Diffusion Deepfake

Learning Temporal Cues by Predicting Objects Move for Multi-camera 3D Object Detection

BERT-Enhanced Retrieval Tool for Homework Plagiarism Detection System

FLEXIS: FLEXible Frequent Subgraph Mining using Maximal Independent Sets

LR-FPN: Enhancing Remote Sensing Object Detection with Location Refined Feature Pyramid Network

Voice EHR: Introducing Multimodal Audio Data for Health

Enhancing Functional Safety in Automotive AMS Circuits through Unsupervised Machine Learning

Learning to Control Camera Exposure via Reinforcement Learning

NLP Systems That Can't Tell Use from Mention Censor Counterspeech, but Teaching the Distinction Helps

Supporting Mitosis Detection AI Training with Inter-Observer Eye-Gaze Consistencies

Event Detection from Social Media for Epidemic Prediction

Beyond Image Super-Resolution for Image Recognition with Task-Driven Perceptual Loss

Task Integration Distillation for Object Detectors

Disentangled Pre-training for Human-Object Interaction Detection

Generalizing 6-DoF Grasp Detection via Domain Prior Knowledge

Atom-Level Optical Chemical Structure Recognition with Limited Supervision

Unleash the Potential of CLIP for Video Highlight Detection

Analyzing the Single Event Upset Vulnerability of Binarized Neural Networks on SRAM FPGAs

Class-Incremental Few-Shot Event Detection

Auditing Large Language Models for Enhanced Text-Based Stereotype Detection and Probing-Based Bias Evaluation

A noisy elephant in the room: Is your out-of-distribution detector robust to label noise?

A Feature Dataset of Microservices-based Systems

Super-Resolution Analysis for Landfill Waste Classification

Improved Text Emotion Prediction Using Combined Valence and Arousal Ordinal Classification

Sparse Semi-DETR: Sparse Learnable Queries for Semi-Supervised Object Detection

A (More) Realistic Evaluation Setup for Generalisation of Community Models on Malicious Content Detection

Semi-Supervised Domain Adaptation for Wildfire Detection

Unmasking the Nuances of Loneliness: Using Digital Biomarkers to Understand Social and Emotional Loneliness in College Students

Scene Adaptive Sparse Transformer for Event-based Object Detection

ASTRA: An Action Spotting TRAnsformer for Soccer Videos

Humanizing Machine-Generated Content: Evading AI-Text Detection through Adversarial Attack

SCANNER: Knowledge-Enhanced Approach for Robust Multi-modal Named Entity Recognition of Unseen Entities

PREGO: online mistake detection in PRocedural EGOcentric videos

An Exploratory Study of the Relationship between SATD and Other Software Development Activities

Automatic Wood Pith Detector: Local Orientation Estimation and Robust Accumulation

Bi-LORA: A Vision-Language Approach for Synthetic Image Detection

Cooperative Students: Navigating Unsupervised Domain Adaptation in Nighttime Object Detection

Breaking the Silence Detecting and Mitigating Gendered Abuse in Hindi, Tamil, and Indian English Online Spaces

Multitask-based Evaluation of Open-Source LLM on Software Vulnerability

EGTR: Extracting Graph from Transformer for Scene Graph Generation

LastResort at SemEval-2024 Task 3: Exploring Multimodal Emotion Cause Pair Extraction as Sequence Labelling Task

ViTamin: Designing Scalable Vision Models in the Vision-Language Era

ResNet with Integrated Convolutional Block Attention Module for Ship Classification Using Transfer Learning on Optical Satellite Imagery

Topic-based Watermarks for LLM-Generated Text

Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks

Keyword: face recognition

Keyword: augmentation

Position-Aware Parameter Efficient Fine-Tuning Approach for Reducing Positional Bias in LLMs

AAdaM at SemEval-2024 Task 1: Augmentation and Adaptation for Multilingual Semantic Textual Relatedness

Set-Aligning Framework for Auto-Regressive Event Temporal Graph Generation

ContrastCAD: Contrastive Learning-based Representation Learning for Computer-Aided Design Models

A Universal Knowledge Embedded Contrastive Learning Framework for Hyperspectral Image Classification

Learning-based model augmentation with LFRs

A Rationale-centric Counterfactual Data Augmentation Method for Cross-Document Event Coreference Resolution

Fashion Style Editing with Generative Human Prior