New submissions for Thu, 25 Apr 24

Keyword: detection

Feature Distribution Shift Mitigation with Contrastive Pretraining for Intrusion Detection

Authors: Authors: Weixing Wang, Haojin Yang, Christoph Meinel, Hasan Yagiz Özkan, Cristian Bermudez Serna, Carmen Mas-Machuca
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2404.15382
Pdf link: https://arxiv.org/pdf/2404.15382
Abstract In recent years, there has been a growing interest in using Machine Learning (ML), especially Deep Learning (DL) to solve Network Intrusion Detection (NID) problems. However, the feature distribution shift problem remains a difficulty, because the change in features' distributions over time negatively impacts the model's performance. As one promising solution, model pretraining has emerged as a novel training paradigm, which brings robustness against feature distribution shift and has proven to be successful in Computer Vision (CV) and Natural Language Processing (NLP). To verify whether this paradigm is beneficial for NID problem, we propose SwapCon, a ML model in the context of NID, which compresses shift-invariant feature information during the pretraining stage and refines during the finetuning stage. We exemplify the evidence of feature distribution shift using the Kyoto2006+ dataset. We demonstrate how pretraining a model with the proper size can increase robustness against feature distribution shifts by over 8%. Moreover, we show how an adequate numerical embedding strategy also enhances the performance of pretrained models. Further experiments show that the proposed SwapCon model also outperforms eXtreme Gradient Boosting (XGBoost) and K-Nearest Neighbor (KNN) based models by a large margin.
Uncertainty in latent representations of variational autoencoders optimized for visual tasks
Authors: Authors: Josefina Catoni, Enzo Ferrante, Diego H. Milone, Rodrigo Echeveste
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.15390
Pdf link: https://arxiv.org/pdf/2404.15390
Abstract Deep learning methods are increasingly becoming instrumental as modeling tools in computational neuroscience, employing optimality principles to build bridges between neural responses and perception or behavior. Developing models that adequately represent uncertainty is however challenging for deep learning methods, which often suffer from calibration problems. This constitutes a difficulty in particular when modeling cortical circuits in terms of Bayesian inference, beyond single point estimates such as the posterior mean or the maximum a posteriori. In this work we systematically studied uncertainty representations in latent representations of variational auto-encoders (VAEs), both in a perceptual task from natural images and in two other canonical tasks of computer vision, finding a poor alignment between uncertainty and informativeness or ambiguities in the images. We next showed how a novel approach which we call explaining-away variational auto-encoders (EA-VAEs), fixes these issues, producing meaningful reports of uncertainty in a variety of scenarios, including interpolation, image corruption, and even out-of-distribution detection. We show EA-VAEs may prove useful both as models of perception in computational neuroscience and as inference tools in computer vision.
ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning
Authors: Authors: Weifeng Chen, Jiacheng Zhang, Jie Wu, Hefeng Wu, Xuefeng Xiao, Liang Lin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.15449
Pdf link: https://arxiv.org/pdf/2404.15449
Abstract The rapid development of diffusion models has triggered diverse applications. Identity-preserving text-to-image generation (ID-T2I) particularly has received significant attention due to its wide range of application scenarios like AI portrait and advertising. While existing ID-T2I methods have demonstrated impressive results, several key challenges remain: (1) It is hard to maintain the identity characteristics of reference portraits accurately, (2) The generated images lack aesthetic appeal especially while enforcing identity retention, and (3) There is a limitation that cannot be compatible with LoRA-based and Adapter-based methods simultaneously. To address these issues, we present \textbf{ID-Aligner}, a general feedback learning framework to enhance ID-T2I performance. To resolve identity features lost, we introduce identity consistency reward fine-tuning to utilize the feedback from face detection and recognition models to improve generated identity preservation. Furthermore, we propose identity aesthetic reward fine-tuning leveraging rewards from human-annotated preference data and automatically constructed feedback on character structure generation to provide aesthetic tuning signals. Thanks to its universal feedback fine-tuning framework, our method can be readily applied to both LoRA and Adapter models, achieving consistent performance gains. Extensive experiments on SD1.5 and SDXL diffusion models validate the effectiveness of our approach. \textbf{Project Page: \url{https://idaligner.github.io/}}
CFPFormer: Feature-pyramid like Transformer Decoder for Segmentation and Detection
Authors: Authors: Hongyi Cai, Mohammad Mahdinur Rahman, Jingyu Wu, Yulun Deng
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.15451
Pdf link: https://arxiv.org/pdf/2404.15451
Abstract Feature pyramids have been widely adopted in convolutional neural networks (CNNs) and transformers for tasks like medical image segmentation and object detection. However, the currently existing models generally focus on the Encoder-side Transformer to extract features, from which decoder improvement can bring further potential with well-designed architecture. We propose CFPFormer, a novel decoder block that integrates feature pyramids and transformers. Specifically, by leveraging patch embedding, cross-layer feature concatenation, and Gaussian attention mechanisms, CFPFormer enhances feature extraction capabilities while promoting generalization across diverse tasks. Benefiting from Transformer structure and U-shaped Connections, our introduced model gains the ability to capture long-range dependencies and effectively up-sample feature maps. Our model achieves superior performance in detecting small objects compared to existing methods. We evaluate CFPFormer on medical image segmentation datasets and object detection benchmarks (VOC 2007, VOC2012, MS-COCO), demonstrating its effectiveness and versatility. On the ACDC Post-2017-MICCAI-Challenge online test set, our model reaches exceptionally impressive accuracy, and performed well compared with the original decoder setting in Synapse multi-organ segmentation dataset.
IryoNLP at MEDIQA-CORR 2024: Tackling the Medical Error Detection & Correction Task On the Shoulders of Medical Agents
Authors: Authors: Jean-Philippe Corbeil
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2404.15488
Pdf link: https://arxiv.org/pdf/2404.15488
Abstract In natural language processing applied to the clinical domain, utilizing large language models has emerged as a promising avenue for error detection and correction on clinical notes, a knowledge-intensive task for which annotated data is scarce. This paper presents MedReAct'N'MedReFlex, which leverages a suite of four LLM-based medical agents. The MedReAct agent initiates the process by observing, analyzing, and taking action, generating trajectories to guide the search to target a potential error in the clinical notes. Subsequently, the MedEval agent employs five evaluators to assess the targeted error and the proposed correction. In cases where MedReAct's actions prove insufficient, the MedReFlex agent intervenes, engaging in reflective analysis and proposing alternative strategies. Finally, the MedFinalParser agent formats the final output, preserving the original style while ensuring the integrity of the error correction process. One core component of our method is our RAG pipeline based on our ClinicalCorp corpora. Among other well-known sources containing clinical guidelines and information, we preprocess and release the open-source MedWiki dataset for clinical RAG application. Our results demonstrate the central role of our RAG approach with ClinicalCorp leveraged through the MedReAct'N'MedReFlex framework. It achieved the ninth rank on the MEDIQA-CORR 2024 final leaderboard.
Drop-Connect as a Fault-Tolerance Approach for RRAM-based Deep Neural Network Accelerators
Authors: Authors: Mingyuan Xiang, Xuhan Xie, Pedro Savarese, Xin Yuan, Michael Maire, Yanjing Li
Subjects: Emerging Technologies (cs.ET)
Arxiv link: https://arxiv.org/abs/2404.15498
Pdf link: https://arxiv.org/pdf/2404.15498
Abstract Resistive random-access memory (RRAM) is widely recognized as a promising emerging hardware platform for deep neural networks (DNNs). Yet, due to manufacturing limitations, current RRAM devices are highly susceptible to hardware defects, which poses a significant challenge to their practical applicability. In this paper, we present a machine learning technique that enables the deployment of defect-prone RRAM accelerators for DNN applications, without necessitating modifying the hardware, retraining of the neural network, or implementing additional detection circuitry/logic. The key idea involves incorporating a drop-connect inspired approach during the training phase of a DNN, where random subsets of weights are selected to emulate fault effects (e.g., set to zero to mimic stuck-at-1 faults), thereby equipping the DNN with the ability to learn and adapt to RRAM defects with the corresponding fault rates. Our results demonstrate the viability of the drop-connect approach, coupled with various algorithm and system-level design and trade-off considerations. We show that, even in the presence of high defect rates (e.g., up to 30%), the degradation of DNN accuracy can be as low as less than 1% compared to that of the fault-free version, while incurring minimal system-level runtime/energy costs.
Cross-Temporal Spectrogram Autoencoder (CTSAE): Unsupervised Dimensionality Reduction for Clustering Gravitational Wave Glitches
Authors: Authors: Yi Li, Yunan Wu, Aggelos K. Katsaggelos
Subjects: Computer Vision and Pattern Recognition (cs.CV); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG); General Relativity and Quantum Cosmology (gr-qc)
Arxiv link: https://arxiv.org/abs/2404.15552
Pdf link: https://arxiv.org/pdf/2404.15552
Abstract The advancement of The Laser Interferometer Gravitational-Wave Observatory (LIGO) has significantly enhanced the feasibility and reliability of gravitational wave detection. However, LIGO's high sensitivity makes it susceptible to transient noises known as glitches, which necessitate effective differentiation from real gravitational wave signals. Traditional approaches predominantly employ fully supervised or semi-supervised algorithms for the task of glitch classification and clustering. In the future task of identifying and classifying glitches across main and auxiliary channels, it is impractical to build a dataset with manually labeled ground-truth. In addition, the patterns of glitches can vary with time, generating new glitches without manual labels. In response to this challenge, we introduce the Cross-Temporal Spectrogram Autoencoder (CTSAE), a pioneering unsupervised method for the dimensionality reduction and clustering of gravitational wave glitches. CTSAE integrates a novel four-branch autoencoder with a hybrid of Convolutional Neural Networks (CNN) and Vision Transformers (ViT). To further extract features across multi-branches, we introduce a novel multi-branch fusion method using the CLS (Class) token. Our model, trained and evaluated on the GravitySpy O3 dataset on the main channel, demonstrates superior performance in clustering tasks when compared to state-of-the-art semi-supervised learning methods. To the best of our knowledge, CTSAE represents the first unsupervised approach tailored specifically for clustering LIGO data, marking a significant step forward in the field of gravitational wave research. The code of this paper is available at https://github.com/Zod-L/CTSAE
VulEval: Towards Repository-Level Evaluation of Software Vulnerability Detection
Authors: Authors: Xin-Cheng Wen, Xinchen Wang, Yujia Chen, Ruida Hu, David Lo, Cuiyun Gao
Subjects: Software Engineering (cs.SE); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2404.15596
Pdf link: https://arxiv.org/pdf/2404.15596
Abstract Deep Learning (DL)-based methods have proven to be effective for software vulnerability detection, with a potential for substantial productivity enhancements for detecting vulnerabilities. Current methods mainly focus on detecting single functions (i.e., intra-procedural vulnerabilities), ignoring the more complex inter-procedural vulnerability detection scenarios in practice. For example, developers routinely engage with program analysis to detect vulnerabilities that span multiple functions within repositories. In addition, the widely-used benchmark datasets generally contain only intra-procedural vulnerabilities, leaving the assessment of inter-procedural vulnerability detection capabilities unexplored. To mitigate the issues, we propose a repository-level evaluation system, named \textbf{VulEval}, aiming at evaluating the detection performance of inter- and intra-procedural vulnerabilities simultaneously. Specifically, VulEval consists of three interconnected evaluation tasks: \textbf{(1) Function-Level Vulnerability Detection}, aiming at detecting intra-procedural vulnerability given a code snippet; \textbf{(2) Vulnerability-Related Dependency Prediction}, aiming at retrieving the most relevant dependencies from call graphs for providing developers with explanations about the vulnerabilities; and \textbf{(3) Repository-Level Vulnerability Detection}, aiming at detecting inter-procedural vulnerabilities by combining with the dependencies identified in the second task. VulEval also consists of a large-scale dataset, with a total of 4,196 CVE entries, 232,239 functions, and corresponding 4,699 repository-level source code in C/C++ programming languages. Our analysis highlights the current progress and future directions for software vulnerability detection.
Deepfakes and Higher Education: A Research Agenda and Scoping Review of Synthetic Media
Authors: Authors: Jasper Roe (1), Mike Perkins (2) ((1) James Cook University Singapore, (2) British University Vietnam)
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.15601
Pdf link: https://arxiv.org/pdf/2404.15601
Abstract The availability of software which can produce convincing yet synthetic media poses both threats and benefits to tertiary education globally. While other forms of synthetic media exist, this study focuses on deepfakes, which are advanced Generative AI (GenAI) fakes of real people. This conceptual paper assesses the current literature on deepfakes across multiple disciplines by conducting an initial scoping review of 182 peer-reviewed publications. The review reveals three major trends: detection methods, malicious applications, and potential benefits, although no specific studies on deepfakes in the tertiary educational context were found. Following a discussion of these trends, this study applies the findings to postulate the major risks and potential mitigation strategies of deepfake technologies in higher education, as well as potential beneficial uses to aid the teaching and learning of both deepfakes and synthetic media. This culminates in the proposal of a research agenda to build a comprehensive, cross-cultural approach to investigate deepfakes in higher education.
Optimizing OOD Detection in Molecular Graphs: A Novel Approach with Diffusion Models
Authors: Authors: Xu Shen, Yili Wang, Kaixiong Zhou, Shirui Pan, Xin Wang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.15625
Pdf link: https://arxiv.org/pdf/2404.15625
Abstract The open-world test dataset is often mixed with out-of-distribution (OOD) samples, where the deployed models will struggle to make accurate predictions. Traditional detection methods need to trade off OOD detection and in-distribution (ID) classification performance since they share the same representation learning model. In this work, we propose to detect OOD molecules by adopting an auxiliary diffusion model-based framework, which compares similarities between input molecules and reconstructed graphs. Due to the generative bias towards reconstructing ID training samples, the similarity scores of OOD molecules will be much lower to facilitate detection. Although it is conceptually simple, extending this vanilla framework to practical detection applications is still limited by two significant challenges. First, the popular similarity metrics based on Euclidian distance fail to consider the complex graph structure. Second, the generative model involving iterative denoising steps is time-consuming especially when it runs on the enormous pool of drugs. To address these challenges, our research pioneers an approach of Prototypical Graph Reconstruction for Molecular OOD Detection, dubbed as PGR-MOOD and hinges on three innovations: i) An effective metric to comprehensively quantify the matching degree of input and reconstructed molecules; ii) A creative graph generator to construct prototypical graphs that are in line with ID but away from OOD; iii) An efficient and scalable OOD detector to compare the similarity between test samples and pre-constructed prototypical graphs and omit the generative process on every new molecule. Extensive experiments on ten benchmark datasets and six baselines are conducted to demonstrate our superiority.
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
Authors: Authors: Sachin Mehta, Maxwell Horton, Fartash Faghri, Mohammad Hossein Sekhavat, Mahyar Najibi, Mehrdad Farajtabar, Oncel Tuzel, Mohammad Rastegari
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.15653
Pdf link: https://arxiv.org/pdf/2404.15653
Abstract Contrastive learning has emerged as a transformative method for learning effective visual representations through the alignment of image and text embeddings. However, pairwise similarity computation in contrastive loss between image and text pairs poses computational challenges. This paper presents a novel weakly supervised pre-training of vision models on web-scale image-text data. The proposed method reframes pre-training on image-text data as a classification task. Consequently, it eliminates the need for pairwise similarity computations in contrastive loss, achieving a remarkable $2.7\times$ acceleration in training speed compared to contrastive learning on web-scale data. Through extensive experiments spanning diverse vision tasks, including detection and segmentation, we demonstrate that the proposed method maintains high representation quality. Our source code along with pre-trained model weights and training recipes is available at \url{https://github.com/apple/corenet}.
Augmented CARDS: A machine learning approach to identifying triggers of climate change misinformation on Twitter
Authors: Authors: Cristian Rojas, Frank Algra-Maschio, Mark Andrejevic, Travis Coan, John Cook, Yuan-Fang Li
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.15673
Pdf link: https://arxiv.org/pdf/2404.15673
Abstract Misinformation about climate change poses a significant threat to societal well-being, prompting the urgent need for effective mitigation strategies. However, the rapid proliferation of online misinformation on social media platforms outpaces the ability of fact-checkers to debunk false claims. Automated detection of climate change misinformation offers a promising solution. In this study, we address this gap by developing a two-step hierarchical model, the Augmented CARDS model, specifically designed for detecting contrarian climate claims on Twitter. Furthermore, we apply the Augmented CARDS model to five million climate-themed tweets over a six-month period in 2022. We find that over half of contrarian climate claims on Twitter involve attacks on climate actors or conspiracy theories. Spikes in climate contrarianism coincide with one of four stimuli: political events, natural events, contrarian influencers, or convinced influencers. Implications for automated responses to climate misinformation are discussed.
Graph Neural Networks for Vulnerability Detection: A Counterfactual Explanation
Authors: Authors: Zhaoyang Chu, Yao Wan, Qian Li, Yang Wu, Hongyu Zhang, Yulei Sui, Guandong Xu, Hai Jin
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2404.15687
Pdf link: https://arxiv.org/pdf/2404.15687
Abstract Vulnerability detection is crucial for ensuring the security and reliability of software systems. Recently, Graph Neural Networks (GNNs) have emerged as a prominent code embedding approach for vulnerability detection, owing to their ability to capture the underlying semantic structure of source code. However, GNNs face significant challenges in explainability due to their inherently black-box nature. To this end, several factual reasoning-based explainers have been proposed. These explainers provide explanations for the predictions made by GNNs by analyzing the key features that contribute to the outcomes. We argue that these factual reasoning-based explanations cannot answer critical what-if questions: What would happen to the GNN's decision if we were to alter the code graph into alternative structures? Inspired by advancements of counterfactual reasoning in artificial intelligence, we propose CFExplainer, a novel counterfactual explainer for GNN-based vulnerability detection. Unlike factual reasoning-based explainers, CFExplainer seeks the minimal perturbation to the input code graph that leads to a change in the prediction, thereby addressing the what-if questions for vulnerability detection. We term this perturbation a counterfactual explanation, which can pinpoint the root causes of the detected vulnerability and furnish valuable insights for developers to undertake appropriate actions for fixing the vulnerability. Extensive experiments on four GNN-based vulnerability detection models demonstrate the effectiveness of CFExplainer over existing state-of-the-art factual reasoning-based explainers.
DeepFeatureX Net: Deep Features eXtractors based Network for discriminating synthetic from real images
Authors: Authors: Orazio Pontorno (1), Luca Guarnera (1), Sebastiano Battiato (1) ((1) University of Catania)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.15697
Pdf link: https://arxiv.org/pdf/2404.15697
Abstract Deepfakes, synthetic images generated by deep learning algorithms, represent one of the biggest challenges in the field of Digital Forensics. The scientific community is working to develop approaches that can discriminate the origin of digital images (real or AI-generated). However, these methodologies face the challenge of generalization, that is, the ability to discern the nature of an image even if it is generated by an architecture not seen during training. This usually leads to a drop in performance. In this context, we propose a novel approach based on three blocks called Base Models, each of which is responsible for extracting the discriminative features of a specific image class (Diffusion Model-generated, GAN-generated, or real) as it is trained by exploiting deliberately unbalanced datasets. The features extracted from each block are then concatenated and processed to discriminate the origin of the input image. Experimental results showed that this approach not only demonstrates good robust capabilities to JPEG compression but also outperforms state-of-the-art methods in several generalization tests. Code, models and dataset are available at https://github.com/opontorno/block-based_deepfake-detection.
SRAGAN: Saliency Regularized and Attended Generative Adversarial Network for Chinese Ink-wash Painting Generation
Authors: Authors: Xiang Gao, Yuqi Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.15743
Pdf link: https://arxiv.org/pdf/2404.15743
Abstract This paper handles the problem of converting real pictures into traditional Chinese ink-wash paintings, i.e., Chinese ink-wash painting style transfer. Though this problem could be realized by a wide range of image-to-image translation models, a notable issue with all these methods is that the original image content details could be easily erased or corrupted due to transfer of ink-wash style elements. To solve or ameliorate this issue, we propose to incorporate saliency detection into the unpaired image-to-image translation framework to regularize content information of the generated paintings. The saliency map is utilized for content regularization from two aspects, both explicitly and implicitly: (\romannumeral1) we propose saliency IOU (SIOU) loss to explicitly regularize saliency consistency before and after stylization; (\romannumeral2) we propose saliency adaptive normalization (SANorm) which implicitly enhances content integrity of the generated paintings by injecting saliency information to the generator network to guide painting generation. Besides, we also propose saliency attended discriminator network which harnesses saliency mask to focus generative adversarial attention onto salient image regions, it contributes to producing finer ink-wash stylization effect for salient objects of images. Qualitative and quantitative experiments consistently demonstrate superiority of our model over related advanced methods for Chinese ink-wash painting style transfer.
Exploring Machine Learning Algorithms for Infection Detection Using GC-IMS Data: A Preliminary Study
Authors: Authors: Christos Sardianos, Chrysostomos Symvoulidis, Matthias Schlögl, Iraklis Varlamis, Georgios Th. Papadopoulos
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.15757
Pdf link: https://arxiv.org/pdf/2404.15757
Abstract The developing field of enhanced diagnostic techniques in the diagnosis of infectious diseases, constitutes a crucial domain in modern healthcare. By utilizing Gas Chromatography-Ion Mobility Spectrometry (GC-IMS) data and incorporating machine learning algorithms into one platform, our research aims to tackle the ongoing issue of precise infection identification. Inspired by these difficulties, our goals consist of creating a strong data analytics process, enhancing machine learning (ML) models, and performing thorough validation for clinical applications. Our research contributes to the emerging field of advanced diagnostic technologies by integrating Gas Chromatography-Ion Mobility Spectrometry (GC-IMS) data and machine learning algorithms within a unified Laboratory Information Management System (LIMS) platform. Preliminary trials demonstrate encouraging levels of accuracy when employing various ML algorithms to differentiate between infected and non-infected samples. Continuing endeavors are currently concentrated on enhancing the effectiveness of the model, investigating techniques to clarify its functioning, and incorporating many types of data to further support the early detection of diseases.
Vision Transformer-based Adversarial Domain Adaptation
Authors: Authors: Yahan Li, Yuan Wu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.15817
Pdf link: https://arxiv.org/pdf/2404.15817
Abstract Unsupervised domain adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain. The most recent UDA methods always resort to adversarial training to yield state-of-the-art results and a dominant number of existing UDA methods employ convolutional neural networks (CNNs) as feature extractors to learn domain invariant features. Vision transformer (ViT) has attracted tremendous attention since its emergence and has been widely used in various computer vision tasks, such as image classification, object detection, and semantic segmentation, yet its potential in adversarial domain adaptation has never been investigated. In this paper, we fill this gap by employing the ViT as the feature extractor in adversarial domain adaptation. Moreover, we empirically demonstrate that ViT can be a plug-and-play component in adversarial domain adaptation, which means directly replacing the CNN-based feature extractor in existing UDA methods with the ViT-based feature extractor can easily obtain performance improvement. The code is available at https://github.com/LluckyYH/VT-ADA.
CLAD: Robust Audio Deepfake Detection Against Manipulation Attacks with Contrastive Learning
Authors: Authors: Haolin Wu, Jing Chen, Ruiying Du, Cong Wu, Kun He, Xingcan Shang, Hao Ren, Guowen Xu
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2404.15854
Pdf link: https://arxiv.org/pdf/2404.15854
Abstract The increasing prevalence of audio deepfakes poses significant security threats, necessitating robust detection methods. While existing detection systems exhibit promise, their robustness against malicious audio manipulations remains underexplored. To bridge the gap, we undertake the first comprehensive study of the susceptibility of the most widely adopted audio deepfake detectors to manipulation attacks. Surprisingly, even manipulations like volume control can significantly bypass detection without affecting human perception. To address this, we propose CLAD (Contrastive Learning-based Audio deepfake Detector) to enhance the robustness against manipulation attacks. The key idea is to incorporate contrastive learning to minimize the variations introduced by manipulations, therefore enhancing detection robustness. Additionally, we incorporate a length loss, aiming to improve the detection accuracy by clustering real audios more closely in the feature space. We comprehensively evaluated the most widely adopted audio deepfake detection models and our proposed CLAD against various manipulation attacks. The detection models exhibited vulnerabilities, with FAR rising to 36.69%, 31.23%, and 51.28% under volume control, fading, and noise injection, respectively. CLAD enhanced robustness, reducing the FAR to 0.81% under noise injection and consistently maintaining an FAR below 1.63% across all tests. Our source code and documentation are available in the artifact repository (https://github.com/CLAD23/CLAD).
Optimizing Energy Efficiency of 5G RedCap Beam Management for Smart Agriculture Applications
Authors: Authors: Manishika Rawat, Matteo Pagin, Marco Giordani, Louis-Adrien Dufrene, Quentin Lampin, Michele Zorzi
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2404.15857
Pdf link: https://arxiv.org/pdf/2404.15857
Abstract Beam management in 5G NR involves the transmission and reception of control signals such as Synchronization Signal Blocks (SSBs), crucial for tasks like initial access and/or channel estimation. However, this procedure consumes energy, which is particularly challenging to handle for battery-constrained nodes such as RedCap devices. Specifically, in this work we study a mid-market Internet of Things (IoT) Smart Agriculture (SmA) deployment where an Unmanned Autonomous Vehicle (UAV) acts as a base station "from the sky" (UAV-gNB) to monitor and control ground User Equipments (UEs) in the field. Then, we formalize a multi-variate optimization problem to determine the optimal beam management design for RedCap SmA devices in order to reduce the energy consumption at the UAV-gNB. Specifically, we jointly optimize the transmission power and the beamwidth at the UAV-gNB. Based on the analysis, we derive the so-called "regions of feasibility," i.e., the upper limit(s) of the beam management parameters for which RedCap Quality of Service (QoS) and energy constraints are met. We study the impact of factors like the total transmission power at the gNB, the Signal-to-Noise Ratio (SNR) threshold for successful packet decoding, the number of UEs in the region, and the misdetection probability. Simulation results demonstrate that there exists an optimal configuration for beam management to promote energy efficiency, which depends on the speed of the UEs, the beamwidth, and other network parameters.
CONNECTION: COvert chaNnel NEtwork attaCk Through bIt-rate mOdulatioN
Authors: Authors: Simone Soderi, Rocco De Nicola
Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2404.15858
Pdf link: https://arxiv.org/pdf/2404.15858
Abstract Covert channel networks are a well-known method for circumventing the security measures organizations put in place to protect their networks from adversarial attacks. This paper introduces a novel method based on bit-rate modulation for implementing covert channels between devices connected over a wide area network. This attack can be exploited to exfiltrate sensitive information from a machine (i.e., covert sender) and stealthily transfer it to a covert receiver while evading network security measures and detection systems. We explain how to implement this threat, focusing specifically on covert channel networks and their potential security risks to network information transmission. The proposed method leverages bit-rate modulation, where a high bit rate represents a '1' and a low bit rate represents a '0', enabling covert communication. We analyze the key metrics associated with covert channels, including robustness in the presence of legitimate traffic and other interference, bit-rate capacity, and bit error rate. Experiments demonstrate the good performance of this attack, which achieved 5 bps with excellent robustness and a channel capacity of up to 0.9239 bps/Hz under different noise sources. Therefore, we show that bit-rate modulation effectively violates network security and compromises sensitive data.
Revisiting Out-of-Distribution Detection in LiDAR-based 3D Object Detection
Authors: Authors: Michael Kösel, Marcel Schreiber, Michael Ulrich, Claudius Gläser, Klaus Dietmayer
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.15879
Pdf link: https://arxiv.org/pdf/2404.15879
Abstract LiDAR-based 3D object detection has become an essential part of automated driving due to its ability to localize and classify objects precisely in 3D. However, object detectors face a critical challenge when dealing with unknown foreground objects, particularly those that were not present in their original training data. These out-of-distribution (OOD) objects can lead to misclassifications, posing a significant risk to the safety and reliability of automated vehicles. Currently, LiDAR-based OOD object detection has not been well studied. We address this problem by generating synthetic training data for OOD objects by perturbing known object categories. Our idea is that these synthetic OOD objects produce different responses in the feature map of an object detector compared to in-distribution (ID) objects. We then extract features using a pre-trained and fixed object detector and train a simple multilayer perceptron (MLP) to classify each detection as either ID or OOD. In addition, we propose a new evaluation protocol that allows the use of existing datasets without modifying the point cloud, ensuring a more authentic evaluation of real-world scenarios. The effectiveness of our method is validated through experiments on the newly proposed nuScenes OOD benchmark. The source code is available at https://github.com/uulm-mrm/mmood3d.
Steal Now and Attack Later: Evaluating Robustness of Object Detection against Black-box Adversarial Attacks
Authors: Authors: Erh-Chung Chen, Pin-Yu Chen, I-Hsin Chung, Che-Rung Lee
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.15881
Pdf link: https://arxiv.org/pdf/2404.15881
Abstract Latency attacks against object detection represent a variant of adversarial attacks that aim to inflate the inference time by generating additional ghost objects in a target image. However, generating ghost objects in the black-box scenario remains a challenge since information about these unqualified objects remains opaque. In this study, we demonstrate the feasibility of generating ghost objects in adversarial examples by extending the concept of "steal now, decrypt later" attacks. These adversarial examples, once produced, can be employed to exploit potential vulnerabilities in the AI service, giving rise to significant security concerns. The experimental results demonstrate that the proposed attack achieves successful attacks across various commonly used models and Google Vision API without any prior knowledge about the target model. Additionally, the average cost of each attack is less than \$ 1 dollars, posing a significant threat to AI security.
Unexplored Faces of Robustness and Out-of-Distribution: Covariate Shifts in Environment and Sensor Domains
Authors: Authors: Eunsu Baek, Keondo Park, Jiyoon Kim, Hyung-Sin Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.15882
Pdf link: https://arxiv.org/pdf/2404.15882
Abstract Computer vision applications predict on digital images acquired by a camera from physical scenes through light. However, conventional robustness benchmarks rely on perturbations in digitized images, diverging from distribution shifts occurring in the image acquisition process. To bridge this gap, we introduce a new distribution shift dataset, ImageNet-ES, comprising variations in environmental and camera sensor factors by directly capturing 202k images with a real camera in a controllable testbed. With the new dataset, we evaluate out-of-distribution (OOD) detection and model robustness. We find that existing OOD detection methods do not cope with the covariate shifts in ImageNet-ES, implying that the definition and detection of OOD should be revisited to embrace real-world distribution shifts. We also observe that the model becomes more robust in both ImageNet-C and -ES by learning environment and sensor variations in addition to existing digital augmentations. Lastly, our results suggest that effective shift mitigation via camera sensor control can significantly improve performance without increasing model size. With these findings, our benchmark may aid future research on robustness, OOD, and camera sensor control for computer vision. Our code and dataset are available at https://github.com/Edw2n/ImageNet-ES.
Detecting Disjoint Shortest Paths in Linear Time and More
Authors: Authors: Shyan Akmal, Virginia Vassilevska Williams, Nicole Wein
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)
Arxiv link: https://arxiv.org/abs/2404.15916
Pdf link: https://arxiv.org/pdf/2404.15916
Abstract In the $k$-Disjoint Shortest Paths ($k$-DSP) problem, we are given a weighted graph $G$ on $n$ nodes and $m$ edges with specified source vertices $s_1, \dots, s_k$, and target vertices $t_1, \dots, t_k$, and are tasked with determining if $G$ contains vertex-disjoint $(s_i,t_i)$-shortest paths. For any constant $k$, it is known that $k$-DSP can be solved in polynomial time over undirected graphs and directed acyclic graphs (DAGs). However, the exact time complexity of $k$-DSP remains mysterious, with large gaps between the fastest known algorithms and best conditional lower bounds. In this paper, we obtain faster algorithms for important cases of $k$-DSP, and present better conditional lower bounds for $k$-DSP and its variants. Previous work solved 2-DSP over weighted undirected graphs in $O(n^7)$ time, and weighted DAGs in $O(mn)$ time. For the main result of this paper, we present linear time algorithms for solving 2-DSP on weighted undirected graphs and DAGs. Our algorithms are algebraic however, and so only solve the detection rather than search version of 2-DSP. For lower bounds, prior work implied that $k$-Clique can be reduced to $2k$-DSP in DAGs and undirected graphs with $O((kn)^2)$ nodes. We improve this reduction, by showing how to reduce from $k$-Clique to $k$-DSP in DAGs and undirected graphs with $O((kn)^2)$ nodes. A variant of $k$-DSP is the $k$-Disjoint Paths ($k$-DP) problem, where the solution paths no longer need to be shortest paths. Previous work reduced from $k$-Clique to $p$-DP in DAGs with $O(kn)$ nodes, for $p= k + k(k-1)/2$. We improve this by showing a reduction from $k$-Clique to $p$-DP, for $p=k + \lfloor k^2/4\rfloor$. Under the $k$-Clique Hypothesis from fine-grained complexity, our results establish better conditional lower bounds for $k$-DSP for all $k\ge 4$, and better conditional lower bounds for $p$-DP for all $p\le 4031$.
Mammo-CLIP: Leveraging Contrastive Language-Image Pre-training (CLIP) for Enhanced Breast Cancer Diagnosis with Multi-view Mammography
Authors: Authors: Xuxin Chen, Yuheng Li, Mingzhe Hu, Ella Salari, Xiaoqian Chen, Richard L.J. Qiu, Bin Zheng, Xiaofeng Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2404.15946
Pdf link: https://arxiv.org/pdf/2404.15946
Abstract Although fusion of information from multiple views of mammograms plays an important role to increase accuracy of breast cancer detection, developing multi-view mammograms-based computer-aided diagnosis (CAD) schemes still faces challenges and no such CAD schemes have been used in clinical practice. To overcome the challenges, we investigate a new approach based on Contrastive Language-Image Pre-training (CLIP), which has sparked interest across various medical imaging tasks. By solving the challenges in (1) effectively adapting the single-view CLIP for multi-view feature fusion and (2) efficiently fine-tuning this parameter-dense model with limited samples and computational resources, we introduce Mammo-CLIP, the first multi-modal framework to process multi-view mammograms and corresponding simple texts. Mammo-CLIP uses an early feature fusion strategy to learn multi-view relationships in four mammograms acquired from the CC and MLO views of the left and right breasts. To enhance learning efficiency, plug-and-play adapters are added into CLIP image and text encoders for fine-tuning parameters and limiting updates to about 1% of the parameters. For framework evaluation, we assembled two datasets retrospectively. The first dataset, comprising 470 malignant and 479 benign cases, was used for few-shot fine-tuning and internal evaluation of the proposed Mammo-CLIP via 5-fold cross-validation. The second dataset, including 60 malignant and 294 benign cases, was used to test generalizability of Mammo-CLIP. Study results show that Mammo-CLIP outperforms the state-of-art cross-view transformer in AUC (0.841 vs. 0.817, 0.837 vs. 0.807) on both datasets. It also surpasses previous two CLIP-based methods by 20.3% and 14.3%. This study highlights the potential of applying the finetuned vision-language models for developing next-generation, image-text-based CAD schemes of breast cancer.
Beyond Deepfake Images: Detecting AI-Generated Videos
Authors: Authors: Danial Samadi Vahdati, Tai D. Nguyen, Aref Azizpour, Matthew C. Stamm
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.15955
Pdf link: https://arxiv.org/pdf/2404.15955
Abstract Recent advances in generative AI have led to the development of techniques to generate visually realistic synthetic video. While a number of techniques have been developed to detect AI-generated synthetic images, in this paper we show that synthetic image detectors are unable to detect synthetic videos. We demonstrate that this is because synthetic video generators introduce substantially different traces than those left by image generators. Despite this, we show that synthetic video traces can be learned, and used to perform reliable synthetic video detection or generator source attribution even after H.264 re-compression. Furthermore, we demonstrate that while detecting videos from new generators through zero-shot transferability is challenging, accurate detection of videos from a new generator can be achieved through few-shot learning.
A Survey on Visual Mamba
Authors: Authors: Hanwei Zhang, Ying Zhu, Dan Wang, Lijun Zhang, Tianxiang Chen, Zi Ye
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.15956
Pdf link: https://arxiv.org/pdf/2404.15956
Abstract State space models (SSMs) with selection mechanisms and hardware-aware architectures, namely Mamba, have recently demonstrated significant promise in long-sequence modeling. Since the self-attention mechanism in transformers has quadratic complexity with image size and increasing computational demands, the researchers are now exploring how to adapt Mamba for computer vision tasks. This paper is the first comprehensive survey aiming to provide an in-depth analysis of Mamba models in the field of computer vision. It begins by exploring the foundational concepts contributing to Mamba's success, including the state space model framework, selection mechanisms, and hardware-aware design. Next, we review these vision mamba models by categorizing them into foundational ones and enhancing them with techniques such as convolution, recurrence, and attention to improve their sophistication. We further delve into the widespread applications of Mamba in vision tasks, which include their use as a backbone in various levels of vision processing. This encompasses general visual tasks, Medical visual tasks (e.g., 2D / 3D segmentation, classification, and image registration, etc.), and Remote Sensing visual tasks. We specially introduce general visual tasks from two levels: High/Mid-level vision (e.g., Object detection, Segmentation, Video classification, etc.) and Low-level vision (e.g., Image super-resolution, Image restoration, Visual generation, etc.). We hope this endeavor will spark additional interest within the community to address current challenges and further apply Mamba models in computer vision.
Fast and Robust Expectation Propagation MIMO Detection via Preconditioned Conjugated Gradient
Authors: Authors: Luca Schmid, Dominik Sulz, Laurent Schmalen
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2404.15968
Pdf link: https://arxiv.org/pdf/2404.15968
Abstract We study the expectation propagation (EP) algorithm for symbol detection in massive multiple-input multiple-output (MIMO) systems. The EP detector shows excellent performance but suffers from a high computational complexity due to the matrix inversion, required in each EP iteration to perform marginal inference on a Gaussian system. We propose an inversion-free variant of the EP algorithm by treating inference on the mean and variance as two separate and simpler subtasks: We study the preconditioned conjugate gradient algorithm for obtaining the mean, which can significantly reduce the complexity and increase stability by relying on the Jacobi preconditioner that proves to fit the EP characteristics very well. For the variance, we use a simple approximation based on linear regression of the Gram channel matrix. Numerical studies on the Rayleigh-fading channel and on a realistic 3GPP channel model reveal the efficiency of the proposed scheme, which offers an attractive performance-complexity tradeoff and even outperforms the original EP detector in high multi-user inference cases where the matrix inversion becomes numerically unstable.
Keyword: face recognition

3D Face Morphing Attack Generation using Non-Rigid Registration
Authors: Authors: Jag Mohan Singh, Raghavendra Ramachandra
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.15765
Pdf link: https://arxiv.org/pdf/2404.15765
Abstract Face Recognition Systems (FRS) are widely used in commercial environments, such as e-commerce and e-banking, owing to their high accuracy in real-world conditions. However, these systems are vulnerable to facial morphing attacks, which are generated by blending face color images of different subjects. This paper presents a new method for generating 3D face morphs from two bona fide point clouds. The proposed method first selects bona fide point clouds with neutral expressions. The two input point clouds were then registered using a Bayesian Coherent Point Drift (BCPD) without optimization, and the geometry and color of the registered point clouds were averaged to generate a face morphing point cloud. The proposed method generates 388 face-morphing point clouds from 200 bona fide subjects. The effectiveness of the method was demonstrated through extensive vulnerability experiments, achieving a Generalized Morphing Attack Potential (G-MAP) of 97.93%, which is superior to the existing state-of-the-art (SOTA) with a G-MAP of 81.61%.
Keyword: augmentation

Neural Proto-Language Reconstruction
Authors: Authors: Chenxuan Cui, Ying Chen, Qinxin Wang, David R. Mortensen
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.15690
Pdf link: https://arxiv.org/pdf/2404.15690
Abstract Proto-form reconstruction has been a painstaking process for linguists. Recently, computational models such as RNN and Transformers have been proposed to automate this process. We take three different approaches to improve upon previous methods, including data augmentation to recover missing reflexes, adding a VAE structure to the Transformer model for proto-to-language prediction, and using a neural machine translation model for the reconstruction task. We find that with the additional VAE structure, the Transformer model has a better performance on the WikiHan dataset, and the data augmentation step stabilizes the training.
BlissCam: Boosting Eye Tracking Efficiency with Learned In-Sensor Sparse Sampling
Authors: Authors: Yu Feng, Tianrui Ma, Yuhao Zhu, Xuan Zhang
Subjects: Hardware Architecture (cs.AR)
Arxiv link: https://arxiv.org/abs/2404.15733
Pdf link: https://arxiv.org/pdf/2404.15733
Abstract Eye tracking is becoming an increasingly important task domain in emerging computing platforms such as Augmented/Virtual Reality (AR/VR). Today's eye tracking system suffers from long end-to-end tracking latency and can easily eat up half of the power budget of a mobile VR device. Most existing optimization efforts exclusively focus on the computation pipeline by optimizing the algorithm and/or designing dedicated accelerators while largely ignoring the front-end of any eye tracking pipeline: the image sensor. This paper makes a case for co-designing the imaging system with the computing system. In particular, we propose the notion of "in-sensor sparse sampling", whereby the pixels are drastically downsampled (by 20x) within the sensor. Such in-sensor sampling enhances the overall tracking efficiency by significantly reducing 1) the power consumption of the sensor readout chain and sensor-host communication interfaces, two major power contributors, and 2) the work done on the host, which receives and operates on far fewer pixels. With careful reuse of existing pixel circuitry, our proposed BLISSCAM requires little hardware augmentation to support the in-sensor operations. Our synthesis results show up to 8.2x energy reduction and 1.4x latency reduction over existing eye tracking pipelines.
An Empirical Study of Aegis
Authors: Authors: Daniel Saragih, Paridhi Goel, Tejas Balaji, Alyssa Li
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.15784
Pdf link: https://arxiv.org/pdf/2404.15784
Abstract Bit flipping attacks are one class of attacks on neural networks with numerous defense mechanisms invented to mitigate its potency. Due to the importance of ensuring the robustness of these defense mechanisms, we perform an empirical study on the Aegis framework. We evaluate the baseline mechanisms of Aegis on low-entropy data (MNIST), and we evaluate a pre-trained model with the mechanisms fine-tuned on MNIST. We also compare the use of data augmentation to the robustness training of Aegis, and how Aegis performs under other adversarial attacks, such as the generation of adversarial examples. We find that both the dynamic-exit strategy and robustness training of Aegis has some drawbacks. In particular, we see drops in accuracy when testing on perturbed data, and on adversarial examples, as compared to baselines. Moreover, we found that the dynamic exit-strategy loses its uniformity when tested on simpler datasets. The code for this project is available on GitHub.
Unexplored Faces of Robustness and Out-of-Distribution: Covariate Shifts in Environment and Sensor Domains
Authors: Authors: Eunsu Baek, Keondo Park, Jiyoon Kim, Hyung-Sin Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.15882
Pdf link: https://arxiv.org/pdf/2404.15882
Abstract Computer vision applications predict on digital images acquired by a camera from physical scenes through light. However, conventional robustness benchmarks rely on perturbations in digitized images, diverging from distribution shifts occurring in the image acquisition process. To bridge this gap, we introduce a new distribution shift dataset, ImageNet-ES, comprising variations in environmental and camera sensor factors by directly capturing 202k images with a real camera in a controllable testbed. With the new dataset, we evaluate out-of-distribution (OOD) detection and model robustness. We find that existing OOD detection methods do not cope with the covariate shifts in ImageNet-ES, implying that the definition and detection of OOD should be revisited to embrace real-world distribution shifts. We also observe that the model becomes more robust in both ImageNet-C and -ES by learning environment and sensor variations in addition to existing digital augmentations. Lastly, our results suggest that effective shift mitigation via camera sensor control can significantly improve performance without increasing model size. With these findings, our benchmark may aid future research on robustness, OOD, and camera sensor control for computer vision. Our code and dataset are available at https://github.com/Edw2n/ImageNet-ES.
Drawing the Line: Deep Segmentation for Extracting Art from Ancient Etruscan Mirrors
Authors: Authors: Rafael Sterzinger, Simon Brenner, Robert Sablatnig
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.15903
Pdf link: https://arxiv.org/pdf/2404.15903
Abstract Etruscan mirrors constitute a significant category within Etruscan art and, therefore, undergo systematic examinations to obtain insights into ancient times. A crucial aspect of their analysis involves the labor-intensive task of manually tracing engravings from the backside. Additionally, this task is inherently challenging due to the damage these mirrors have sustained, introducing subjectivity into the process. We address these challenges by automating the process through photometric-stereo scanning in conjunction with deep segmentation networks which, however, requires effective usage of the limited data at hand. We accomplish this by incorporating predictions on a per-patch level, and various data augmentations, as well as exploring self-supervised learning. Compared to our baseline, we improve predictive performance w.r.t. the pseudo-F-Measure by around 16%. When assessing performance on complete mirrors against a human baseline, our approach yields quantitative similar performance to a human annotator and significantly outperforms existing binarization methods. With our proposed methodology, we streamline the annotation process, enhance its objectivity, and reduce overall workload, offering a valuable contribution to the examination of these historical artifacts and other non-traditional documents.
Hardness and Tight Approximations of Demand Strip Packing
Authors: Authors: Klaus Jansen, Malin Rau, Malte Tutas
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2404.15917
Pdf link: https://arxiv.org/pdf/2404.15917
Abstract We settle the pseudo-polynomial complexity of the Demand Strip Packing (DSP) problem: Given a strip of fixed width and a set of items with widths and heights, the items must be placed inside the strip with the objective of minimizing the peak height. This problem has gained significant scientific interest due to its relevance in smart grids[Deppert et al.\ APPROX'21, G\'alvez et al.\ APPROX'21]. Smart Grids are a modern form of electrical grid that provide opportunities for optimization. They are forecast to impact the future of energy provision significantly. Algorithms running in pseudo-polynomial time lend themselves to these applications as considered time intervals, such as days, are small. Moreover, such algorithms can provide superior approximation guarantees over those running in polynomial time. Consequently, they evoke scientific interest in related problems. We prove that Demand Strip Packing is strongly NP-hard for approximation ratios below $5/4$. Through this proof, we provide novel insights into the relation of packing and scheduling problems. Using these insights, we show a series of frameworks that solve both Demand Strip Packing and Parallel Task Scheduling optimally when increasing the strip's width or number of machines. Such alterations to problems are known as resource augmentation. Applications are found when penalty costs are prohibitively large. Finally, we provide a pseudo-polynomial time approximation algorithm for DSP with an approximation ratio of $(5/4+\varepsilon)$, which is nearly optimal assuming $P\neq NP$. The construction of this algorithm provides several insights into the structure of DSP solutions and uses novel techniques to restructure optimal solutions.
Mixed Supervised Graph Contrastive Learning for Recommendation
Authors: Authors: Weizhi Zhang, Liangwei Yang, Zihe Song, Henry Peng Zou, Ke Xu, Yuanjie Zhu, Philip S. Yu
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.15954
Pdf link: https://arxiv.org/pdf/2404.15954
Abstract Recommender systems (RecSys) play a vital role in online platforms, offering users personalized suggestions amidst vast information. Graph contrastive learning aims to learn from high-order collaborative filtering signals with unsupervised augmentation on the user-item bipartite graph, which predominantly relies on the multi-task learning framework involving both the pair-wise recommendation loss and the contrastive loss. This decoupled design can cause inconsistent optimization direction from different losses, which leads to longer convergence time and even sub-optimal performance. Besides, the self-supervised contrastive loss falls short in alleviating the data sparsity issue in RecSys as it learns to differentiate users/items from different views without providing extra supervised collaborative filtering signals during augmentations. In this paper, we propose Mixed Supervised Graph Contrastive Learning for Recommendation (MixSGCL) to address these concerns. MixSGCL originally integrates the training of recommendation and unsupervised contrastive losses into a supervised contrastive learning loss to align the two tasks within one optimization direction. To cope with the data sparsity issue, instead unsupervised augmentation, we further propose node-wise and edge-wise mixup to mine more direct supervised collaborative filtering signals based on existing user-item interactions. Extensive experiments on three real-world datasets demonstrate that MixSGCL surpasses state-of-the-art methods, achieving top performance on both accuracy and efficiency. It validates the effectiveness of MixSGCL with our coupled design on supervised graph contrastive learning.

LeeKyungwook / get-arxiv-noti

New submissions for Thu, 25 Apr 24 #1078

Keyword: detection

Feature Distribution Shift Mitigation with Contrastive Pretraining for Intrusion Detection

Uncertainty in latent representations of variational autoencoders optimized for visual tasks

ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning

CFPFormer: Feature-pyramid like Transformer Decoder for Segmentation and Detection

IryoNLP at MEDIQA-CORR 2024: Tackling the Medical Error Detection & Correction Task On the Shoulders of Medical Agents

Drop-Connect as a Fault-Tolerance Approach for RRAM-based Deep Neural Network Accelerators

Cross-Temporal Spectrogram Autoencoder (CTSAE): Unsupervised Dimensionality Reduction for Clustering Gravitational Wave Glitches

VulEval: Towards Repository-Level Evaluation of Software Vulnerability Detection

Deepfakes and Higher Education: A Research Agenda and Scoping Review of Synthetic Media

Optimizing OOD Detection in Molecular Graphs: A Novel Approach with Diffusion Models

CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data

Augmented CARDS: A machine learning approach to identifying triggers of climate change misinformation on Twitter

Graph Neural Networks for Vulnerability Detection: A Counterfactual Explanation

DeepFeatureX Net: Deep Features eXtractors based Network for discriminating synthetic from real images

SRAGAN: Saliency Regularized and Attended Generative Adversarial Network for Chinese Ink-wash Painting Generation

Exploring Machine Learning Algorithms for Infection Detection Using GC-IMS Data: A Preliminary Study

Vision Transformer-based Adversarial Domain Adaptation

CLAD: Robust Audio Deepfake Detection Against Manipulation Attacks with Contrastive Learning

Optimizing Energy Efficiency of 5G RedCap Beam Management for Smart Agriculture Applications

CONNECTION: COvert chaNnel NEtwork attaCk Through bIt-rate mOdulatioN

Revisiting Out-of-Distribution Detection in LiDAR-based 3D Object Detection

Steal Now and Attack Later: Evaluating Robustness of Object Detection against Black-box Adversarial Attacks

Unexplored Faces of Robustness and Out-of-Distribution: Covariate Shifts in Environment and Sensor Domains

Detecting Disjoint Shortest Paths in Linear Time and More

Mammo-CLIP: Leveraging Contrastive Language-Image Pre-training (CLIP) for Enhanced Breast Cancer Diagnosis with Multi-view Mammography

Beyond Deepfake Images: Detecting AI-Generated Videos

A Survey on Visual Mamba

Fast and Robust Expectation Propagation MIMO Detection via Preconditioned Conjugated Gradient

Keyword: face recognition

3D Face Morphing Attack Generation using Non-Rigid Registration

Keyword: augmentation

Neural Proto-Language Reconstruction

BlissCam: Boosting Eye Tracking Efficiency with Learned In-Sensor Sparse Sampling

An Empirical Study of Aegis

Unexplored Faces of Robustness and Out-of-Distribution: Covariate Shifts in Environment and Sensor Domains

Drawing the Line: Deep Segmentation for Extracting Art from Ancient Etruscan Mirrors

Hardness and Tight Approximations of Demand Strip Packing

Mixed Supervised Graph Contrastive Learning for Recommendation