【CS-part2】New submissions for Fri, 29 Mar 24

Keyword: webgpu

There is no result

Keyword: webgl

There is no result

Keyword: pre-rendering

There is no result

Keyword: prerendering

There is no result

Keyword: motion prediction

Egocentric Scene-aware Human Trajectory Prediction
Authors: Weizhuo Wang, C. Karen Liu, Monroe Kennedy III
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2403.19026
Pdf link: https://arxiv.org/pdf/2403.19026
Abstract Wearable collaborative robots stand to assist human wearers who need fall prevention assistance or wear exoskeletons. Such a robot needs to be able to predict the ego-motion of the wearer based on egocentric vision and the surrounding scene. In this work, we leverage body-mounted cameras and sensors to anticipate the trajectory of human wearers through complex surroundings. To facilitate research in ego-motion prediction, we have collected a comprehensive walking scene navigation dataset centered on the user's perspective. We present a method to predict human motion conditioned on the surrounding static scene. Our method leverages a diffusion model to produce a distribution of potential future trajectories, taking into account the user's observation of the environment. We introduce a compact representation to encode the user's visual memory of the surroundings, as well as an efficient sample-generating technique to speed up real-time inference of a diffusion model. We ablate our model and compare it to baselines, and results show that our model outperforms existing methods on key metrics of collision avoidance and trajectory mode coverage.
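The abstract scores sampled future trajectories on collision avoidance and mode coverage. As a rough illustration only, the sketch below computes two generic metrics of that kind, assuming trajectories are arrays of 2D waypoints in metres and the static scene is a binary occupancy grid with its origin at (0, 0). The function names, the grid convention, and both metric definitions are hypothetical; this is not the paper's evaluation code.

```python
import numpy as np


def collision_rate(samples, occupancy, cell_size):
    """Fraction of sampled trajectories with at least one waypoint in an occupied cell.

    samples:   (K, T, 2) array of K sampled trajectories with T (x, y) waypoints.
    occupancy: (H, W) boolean grid, True where the static scene is occupied;
               row index = y cell, column index = x cell (assumed convention).
    cell_size: metres per grid cell.
    """
    idx = np.floor(samples / cell_size).astype(int)
    h, w = occupancy.shape
    # Waypoints outside the grid are treated as free space in this sketch.
    inside = (idx[..., 0] >= 0) & (idx[..., 0] < w) & (idx[..., 1] >= 0) & (idx[..., 1] < h)
    hit = np.zeros(idx.shape[:2], dtype=bool)
    hit[inside] = occupancy[idx[..., 1][inside], idx[..., 0][inside]]
    return hit.any(axis=1).mean()


def min_ade(samples, ground_truth):
    """Best-of-K average displacement error, a common proxy for mode coverage."""
    ade = np.linalg.norm(samples - ground_truth[None], axis=-1).mean(axis=1)  # (K,)
    return ade.min()


# Tiny example: a wall fills the column x in [5, 6) m of a 10 m x 10 m grid.
occ = np.zeros((10, 10), dtype=bool)
occ[:, 5] = True
t = np.linspace(0.0, 4.0, 5)
straight = np.stack([np.full(5, 2.5), t], axis=1)        # walks along y at x = 2.5
crossing = np.stack([t + 2.0, np.full(5, 2.5)], axis=1)  # walks along x from 2 to 6
samples = np.stack([straight, crossing])
print(collision_rate(samples, occ, cell_size=1.0))  # 0.5
print(min_ade(samples, ground_truth=straight))      # 0.0
```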
Keyword: incremental learning

There is no result

Keyword: svm incremental

There is no result

Keyword: nerf

Sine Activated Low-Rank Matrices for Parameter Efficient Learning
Authors: Yiping Ji, Hemanth Saratchandran, Cameron Gordon, Zeyu Zhang, Simon Lucey
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2403.19243
Pdf link: https://arxiv.org/pdf/2403.19243
Abstract Low-rank decomposition has emerged as a vital tool for enhancing parameter efficiency in neural network architectures, gaining traction across diverse applications in machine learning. These techniques significantly lower the number of parameters, striking a balance between compactness and performance. However, a common challenge has been the compromise between parameter efficiency and the accuracy of the model, where reduced parameters often lead to diminished accuracy compared to their full-rank counterparts. In this work, we propose a novel theoretical framework that integrates a sinusoidal function within the low-rank decomposition process. This approach not only preserves the benefits of the parameter efficiency characteristic of low-rank methods but also increases the decomposition's rank, thereby enhancing model accuracy. Our method proves to be an adaptable enhancement for existing low-rank models, as evidenced by its successful application in Vision Transformers (ViT), Large Language Models (LLMs), Neural Radiance Fields (NeRF), and 3D shape modeling. This demonstrates the wide-ranging potential and efficiency of our proposed technique.
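One way to read the claim that a sinusoid "increases the decomposition's rank" is to apply a sine elementwise to the product of the low-rank factors. The sketch below assumes exactly that form, W = sin(omega * A @ B), with a hypothetical frequency omega; the paper's precise formulation and any scaling it uses may differ. Both matrices are built from the same n*k + k*m parameters, but only the plain product is capped at rank k.

```python
import numpy as np


def sine_low_rank(a, b, omega):
    """Elementwise sine of a low-rank product: W = sin(omega * (a @ b)).

    a: (n, k) and b: (k, m) factors; omega is an assumed frequency hyperparameter.
    """
    return np.sin(omega * (a @ b))


rng = np.random.default_rng(0)
n, m, k = 32, 32, 2
a = rng.standard_normal((n, k))
b = rng.standard_normal((k, m))

plain = a @ b                                # rank at most k
sined = sine_low_rank(a, b, omega=50.0)      # same n*k + k*m parameters
print(np.linalg.matrix_rank(plain))          # 2
print(np.linalg.matrix_rank(sined))          # typically far above 2 for large omega
```

The sine is not linear, so it does not preserve the rank bound of the product; how much rank is gained depends on omega, which is why it behaves like a tunable hyperparameter.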
Mesh2NeRF: Direct Mesh Supervision for Neural Radiance Field Representation and Generation
Authors: Yujin Chen, Yinyu Nie, Benjamin Ummenhofer, Reiner Birkl, Michael Paulitsch, Matthias Müller, Matthias Nießner
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2403.19319
Pdf link: https://arxiv.org/pdf/2403.19319
Abstract We present Mesh2NeRF, an approach to derive ground-truth radiance fields from textured meshes for 3D generation tasks. Many 3D generative approaches represent 3D scenes as radiance fields for training. Their ground-truth radiance fields are usually fitted from multi-view renderings of a large-scale synthetic 3D dataset, which often results in artifacts due to occlusions or under-fitting. In Mesh2NeRF, we propose an analytic solution to directly obtain ground-truth radiance fields from 3D meshes, characterizing the density field with an occupancy function featuring a defined surface thickness, and determining view-dependent color through a reflection function considering both the mesh and environment lighting. Mesh2NeRF extracts accurate radiance fields that provide direct supervision for training generative NeRFs and single-scene representations. We validate the effectiveness of Mesh2NeRF across various tasks, achieving a 3.12 dB improvement in PSNR for view synthesis in single-scene representation on the ABO dataset, a 0.69 dB PSNR improvement in single-view conditional generation on ShapeNet Cars, and notably improved mesh extraction from NeRF in unconditional generation of Objaverse Mugs.
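Mesh2NeRF characterises density with an occupancy function that has a defined surface thickness. A minimal sketch of that idea follows, assuming a sphere stands in for the mesh (so the distance to the surface is |norm(x) - R|) and assuming a hard, box-shaped occupancy profile; it is paired with the standard NeRF quadrature for accumulated opacity along a ray. The paper's actual density profile and its view-dependent reflection model are not reproduced, and all names are hypothetical.

```python
import numpy as np


def shell_density(points, radius, thickness, sigma_max):
    """Density of a sphere surface with finite thickness (stand-in for a mesh).

    Occupancy is 1 within thickness / 2 of the surface and 0 elsewhere;
    density is that occupancy scaled by sigma_max.
    """
    dist_to_surface = np.abs(np.linalg.norm(points, axis=-1) - radius)
    return sigma_max * (dist_to_surface <= thickness / 2)


def ray_opacity(origin, direction, near, far, n_samples, density_fn):
    """Accumulated opacity 1 - exp(-sum(sigma_i * delta_i)) along one ray."""
    t = np.linspace(near, far, n_samples)
    delta = np.diff(t, append=far)            # spacing to the next sample; last one is 0
    pts = origin + t[:, None] * direction
    return 1.0 - np.exp(-np.sum(density_fn(pts) * delta))


def dens(p):
    return shell_density(p, radius=1.0, thickness=0.02, sigma_max=1e4)


z_axis = np.array([0.0, 0.0, 1.0])
hit = ray_opacity(np.array([0.0, 0.0, -3.0]), z_axis, 0.0, 6.0, 6001, dens)   # through the sphere
miss = ray_opacity(np.array([0.0, 2.0, -3.0]), z_axis, 0.0, 6.0, 6001, dens)  # beside the sphere
print(hit, miss)
```

Because the shell has nonzero thickness, the sample spacing (here 0.001) must be small enough that several samples land inside it; otherwise rays can skip the surface entirely.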
CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians
Authors: Avinash Paliwal, Wei Ye, Jinhui Xiong, Dmytro Kotovenko, Rakesh Ranjan, Vikas Chandra, Nima Khademi Kalantari
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2403.19495
Pdf link: https://arxiv.org/pdf/2403.19495
Abstract The field of 3D reconstruction from images has rapidly evolved in the past few years, first with the introduction of Neural Radiance Fields (NeRF) and more recently with 3D Gaussian Splatting (3DGS). The latter provides a significant edge over NeRF in training and inference speed, as well as in reconstruction quality. Although 3DGS works well for dense input images, its unstructured, point-cloud-like representation quickly overfits in the more challenging setup of extremely sparse input images (e.g., 3 images), creating a representation that appears as a jumble of needles from novel views. To address this issue, we propose regularized optimization and depth-based initialization. Our key idea is to introduce a structured Gaussian representation that can be controlled in 2D image space. We then constrain the Gaussians, in particular their positions, and prevent them from moving independently during optimization. Specifically, we introduce single- and multi-view constraints through an implicit convolutional decoder and a total variation loss, respectively. With the coherency introduced to the Gaussians, we further constrain the optimization through a flow-based loss function. To support our regularized optimization, we propose an approach to initialize the Gaussians using monocular depth estimates at each input view. We demonstrate significant improvements compared to state-of-the-art sparse-view NeRF-based approaches on a variety of scenes.
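CoherentGS ties Gaussians to pixels, so their positions form an image-aligned map, and uses a total variation loss as its multi-view constraint. Below is a generic anisotropic total variation term over an (H, W, C) map, assuming L1 differences between 4-connected neighbours and mean reduction. The abstract does not state which quantity is regularised or with what norm and weighting, so treat this as an illustration of the loss family only.

```python
import numpy as np


def total_variation(position_map):
    """Anisotropic (L1) total variation of an (H, W, C) map, mean-reduced.

    Penalises differences between vertically and horizontally adjacent pixels,
    e.g. between the 3D positions of Gaussians attached to neighbouring pixels.
    """
    dy = np.abs(np.diff(position_map, axis=0))   # (H-1, W, C)
    dx = np.abs(np.diff(position_map, axis=1))   # (H, W-1, C)
    return dy.mean() + dx.mean()


flat = np.zeros((4, 4, 3))                            # perfectly coherent map
ramp = np.tile(np.arange(3.0), (2, 1))[..., None]     # values 0, 1, 2 along x
print(total_variation(flat), total_variation(ramp))   # 0.0 1.0
```

The L1 form tolerates sharp depth edges better than a squared penalty, which is one common reason to prefer it for geometry maps.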
SAID-NeRF: Segmentation-AIDed NeRF for Depth Completion of Transparent Objects
Authors: Avinash Ummadisingu, Jongkeum Choi, Koki Yamane, Shimpei Masuda, Naoki Fukaya, Kuniyuki Takahashi
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2403.19607
Pdf link: https://arxiv.org/pdf/2403.19607
Abstract Acquiring accurate depth information for transparent objects using off-the-shelf RGB-D cameras is a well-known challenge in computer vision and robotics. Depth estimation/completion methods are typically employed and trained on datasets with quality depth labels acquired from simulation, additional sensors, or specialized data collection setups and known 3D models. However, acquiring reliable depth information for datasets at scale is not straightforward, limiting training scalability and generalization. Neural Radiance Fields (NeRFs) are learning-free approaches and have demonstrated wide success in novel view synthesis and shape recovery. However, heuristics and controlled environments (lights, backgrounds, etc.) are often required to accurately capture specular surfaces. In this paper, we propose using Visual Foundation Models (VFMs) for segmentation in a zero-shot, label-free way to guide the NeRF reconstruction process for these objects, via the simultaneous reconstruction of semantic fields and extensions that increase robustness. Our proposed method, Segmentation-AIDed NeRF (SAID-NeRF), performs strongly on depth completion datasets for transparent objects and on robotic grasping.
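SAID-NeRF relies on NeRF reconstruction to complete depth. The standard way a NeRF produces a depth value for a ray is the expected termination distance: the sum of the per-sample compositing weights times the sample distances. The sketch below implements that generic formula; the sample layout and the slab-shaped density in the example are hypothetical, and SAID-NeRF's semantic fields and robustness extensions are not modelled.

```python
import numpy as np


def expected_depth(t, sigma):
    """NeRF-style expected ray termination depth and accumulated opacity.

    t: (S,) increasing sample distances; sigma: (S,) densities at those samples.
    Returns (sum_i w_i * t_i, sum_i w_i). Dividing the first by the second gives
    a normalised depth when the ray is not fully opaque.
    """
    delta = np.diff(t, append=t[-1])                  # last interval has zero width
    alpha = 1.0 - np.exp(-sigma * delta)
    # Transmittance before sample i: product of (1 - alpha_j) over j < i.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    weights = trans * alpha
    return np.sum(weights * t), np.sum(weights)


t = np.linspace(0.0, 4.0, 401)
sigma = np.where((t >= 2.0) & (t < 2.1), 1e3, 0.0)   # hypothetical opaque slab at 2 m
depth, opacity = expected_depth(t, sigma)
print(round(depth, 2), round(opacity, 3))
```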
Keyword: multiorgan

There is no result

Keyword: multi-organ

AIC-UNet: Anatomy-informed Cascaded UNet for Robust Multi-Organ Segmentation
Authors: Young Seok Jeon, Hongfei Yang, Huazhu Fu, Mengling Feng
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2403.18878
Pdf link: https://arxiv.org/pdf/2403.18878
Abstract Imposing key anatomical features, such as the number of organs and their shapes, sizes, and relative positions, is crucial for building a robust multi-organ segmentation model. Current attempts to incorporate anatomical features include enlarging the effective receptive field (ERF) with resource- and data-intensive modules such as self-attention, or introducing organ-specific topology regularizers, which may not scale to multi-organ segmentation problems where inter-organ relations also play a major role. We introduce a new approach to impose anatomical constraints on any existing encoder-decoder segmentation model by conditioning model predictions on a learnable anatomy prior. More specifically, given an abdominal scan, a part of the encoder spatially warps a learnable prior to align it with the input scan using thin plate spline (TPS) grid interpolation. The warped prior is then integrated during the decoding phase to guide the model toward more anatomy-informed predictions. Code is available at https://anonymous.4open.science/r/AIC-UNet-7048.
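AIC-UNet warps a learnable prior with thin plate spline (TPS) grid interpolation. The core of any TPS warp is fitting a smooth 2D map through control-point correspondences; a NumPy sketch of that fit, using the kernel U(r) = r^2 log r plus an affine term, follows. How AIC-UNet's encoder predicts the control points and how the warped prior is resampled onto the image grid are not shown, and the function names are hypothetical.

```python
import numpy as np


def _tps_kernel(r):
    """Thin plate spline radial basis U(r) = r^2 log r, with U(0) = 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        u = r**2 * np.log(r)
    return np.nan_to_num(u)          # 0 * log(0) -> nan -> 0


def fit_tps(src, dst):
    """Fit a 2D TPS mapping control points src -> dst, both (n, 2).

    Solves [[K, P], [P^T, 0]] [W; A] = [dst; 0]; returns weights W (n, 2)
    and affine part A (3, 2). Needs at least 3 non-collinear points.
    """
    n = src.shape[0]
    K = _tps_kernel(np.linalg.norm(src[:, None] - src[None], axis=-1))
    P = np.hstack([np.ones((n, 1)), src])
    lhs = np.zeros((n + 3, n + 3))
    lhs[:n, :n] = K
    lhs[:n, n:] = P
    lhs[n:, :n] = P.T
    rhs = np.vstack([dst, np.zeros((3, 2))])
    sol = np.linalg.solve(lhs, rhs)
    return sol[:n], sol[n:]


def apply_tps(points, src, weights, affine):
    """Evaluate the fitted spline at query points (m, 2)."""
    U = _tps_kernel(np.linalg.norm(points[:, None] - src[None], axis=-1))
    return U @ weights + np.hstack([np.ones((len(points), 1)), points]) @ affine


# 3 x 3 grid of control points with a hypothetical smooth displacement.
src = np.array([[x, y] for x in (0.0, 1.0, 2.0) for y in (0.0, 1.0, 2.0)])
dst = src + 0.1 * np.sin(src[:, ::-1])
w, a = fit_tps(src, dst)
warped_src = apply_tps(src, src, w, a)          # reproduces dst at the control points

# An affine correspondence is reproduced exactly everywhere, not just at control points.
M = np.array([[1.1, 0.2], [-0.1, 0.9]])
shift = np.array([0.5, -0.3])
w2, a2 = fit_tps(src, src @ M + shift)
query = np.array([[0.5, 1.5], [1.7, 0.2]])
warped_query = apply_tps(query, src, w2, a2)    # equals query @ M + shift
```

In a segmentation network the same map would typically be evaluated on every output pixel coordinate to obtain sampling locations in the prior, which is then resampled (for example bilinearly).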
Keyword: multi organ

There is no result

Keyword: SAM

Unleashing the Power of AI. A Systematic Review of Cutting-Edge Techniques in AI-Enhanced Scientometrics, Webometrics, and Bibliometrics
Authors: Hamid Reza Saeidnia, Elaheh Hosseini, Shadi Abdoli, Marcel Ausloos
Subjects: Digital Libraries (cs.DL); Artificial Intelligence (cs.AI); Physics and Society (physics.soc-ph)
Arxiv link: https://arxiv.org/abs/2403.18838
Pdf link: https://arxiv.org/pdf/2403.18838
Abstract Purpose: The study aims to analyze the synergy of Artificial Intelligence (AI) with scientometrics, webometrics, and bibliometrics, in order to highlight the potential applications and benefits of AI algorithms in these fields. Design/methodology/approach: Through a systematic literature review, we explore the potential of AI to revolutionize the methods used to measure and analyze scholarly communication, identify emerging research trends, and evaluate the impact of scientific publications. To this end, we implemented a comprehensive search strategy across reputable databases such as ProQuest, IEEE Xplore, EBSCO, Web of Science, and Scopus. Our search covered articles published from January 1, 2000, to September 2022, resulting in a review of 61 relevant articles. Findings: (i) In scientometrics, AI offers distinct advantages for analyzing publications and citations, predicting research impact, studying collaboration, analyzing research trends, and mapping knowledge within a more objective and reliable framework. (ii) In webometrics, AI algorithms can enhance web crawling and data collection, web link analysis, web content analysis, social media analysis, web impact analysis, and recommender systems. (iii) In bibliometrics, the potential applications of AI include automated data collection, citation analysis, author disambiguation, co-authorship network analysis, research impact assessment, text mining, and recommender systems. Originality/value: This study highlights the novel benefits and potential of AI-enhanced scientometrics, webometrics, and bibliometrics, emphasizing the significant prospects of integrating AI into these fields.
Graph Bayesian Optimization for Multiplex Influence Maximization
Authors: Zirui Yuan, Minglai Shao, Zhiqian Chen
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2403.18866
Pdf link: https://arxiv.org/pdf/2403.18866
Abstract Influence maximization (IM) is the problem of identifying a limited number of initial influential users within a social network to maximize the number of influenced users. However, previous research has mostly focused on the propagation of individual information items, neglecting the simultaneous and interactive dissemination of multiple items. In reality, when users encounter a piece of information, such as a smartphone product, they often associate it with related products, such as earphones or computers from the same brand. Additionally, information platforms frequently recommend related content to users, amplifying this cascading effect and leading to multiplex influence diffusion. This paper first formulates the Multiplex Influence Maximization (Multi-IM) problem using multiplex diffusion models with an information association mechanism. In this problem, the seed set is a combination of influential users and information items. To manage the combinatorial complexity, we propose Graph Bayesian Optimization for Multi-IM (GBIM). The multiplex diffusion process is modeled using a global kernelized attention message-passing module. This module, combined with Bayesian linear regression (BLR), yields a scalable surrogate model. A data acquisition module that balances exploration and exploitation is then used to further optimize the seed set. Extensive experiments on synthetic and real-world datasets demonstrate the effectiveness of the proposed framework. The code is available at https://github.com/zirui-yuan/GBIM.
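GBIM pairs a message-passing module with Bayesian linear regression (BLR) to obtain a scalable surrogate, then picks seed sets with an exploration-exploitation acquisition step. The sketch below shows closed-form BLR plus an upper confidence bound (UCB) acquisition, assuming each candidate seed set is already summarised by a fixed feature vector. Random vectors stand in for the learned graph embeddings, and UCB is an assumed choice of acquisition function, not necessarily the one the paper uses.

```python
import numpy as np


def blr_posterior(phi, y, alpha=1.0, beta=25.0):
    """Closed-form Bayesian linear regression posterior over weights.

    Prior w ~ N(0, alpha^-1 I); likelihood y ~ N(phi @ w, beta^-1).
    Returns the posterior mean (d,) and covariance (d, d).
    """
    d = phi.shape[1]
    precision = alpha * np.eye(d) + beta * phi.T @ phi
    cov = np.linalg.inv(precision)
    mean = beta * cov @ phi.T @ y
    return mean, cov


def ucb(phi_candidates, mean, cov, beta=25.0, kappa=2.0):
    """Upper confidence bound: predictive mean + kappa * predictive std."""
    mu = phi_candidates @ mean
    var = 1.0 / beta + np.einsum("ij,jk,ik->i", phi_candidates, cov, phi_candidates)
    return mu + kappa * np.sqrt(var)


rng = np.random.default_rng(0)
true_w = np.array([1.0, -0.5, 0.3, 2.0])        # hypothetical ground truth
phi = rng.standard_normal((200, 4))              # stand-ins for embeddings of evaluated seed sets
y = phi @ true_w + rng.normal(0.0, 0.2, 200)     # noisy spread; std 0.2 matches beta = 25
mean, cov = blr_posterior(phi, y)
candidates = rng.standard_normal((30, 4))        # stand-ins for unevaluated seed sets
scores = ucb(candidates, mean, cov)
next_idx = int(np.argmax(scores))                # seed set to evaluate next
```

BLR keeps the surrogate cheap: updating it costs one d x d solve, independent of graph size once the embeddings exist, and kappa sets how strongly uncertain candidates are favoured.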
Targeted Visualization of the Backbone of Encoder LLMs
Authors: Isaac Roberts, Alexander Schulz, Luca Hermes, Barbara Hammer
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2403.18872
Pdf link: https://arxiv.org/pdf/2403.18872
Abstract Attention-based Large Language Models (LLMs) are the state of the art in natural language processing (NLP). The two most common architectures are encoders, such as BERT, and decoders, such as the GPT models. Despite the success of encoder models, on which we focus in this work, they also bear several risks, including bias and susceptibility to adversarial attacks, which signifies the need for explainable AI to detect such issues. While various local explainability methods exist that focus on the predictions for single inputs, global methods based on dimensionality reduction for classification inspection, which have emerged in other domains and go beyond simply applying t-SNE in the embedding space, are not widespread in NLP. To narrow this gap, we investigate the application of DeepView, a method for visualizing part of the decision function together with a data set in two dimensions, to the NLP domain. Whereas previous work used DeepView to inspect deep image classification models, we demonstrate how to apply it to BERT-based NLP classifiers and investigate its usability in this domain, including settings with adversarially perturbed input samples and pre-trained, fine-tuned, and multi-task models.
A Geometric Explanation of the Likelihood OOD Detection Paradox
Authors: Hamidreza Kamkari, Brendan Leigh Ross, Jesse C. Cresswell, Anthony L. Caterini, Rahul G. Krishnan, Gabriel Loaiza-Ganem
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2403.18910
Pdf link: https://arxiv.org/pdf/2403.18910
Abstract Likelihood-based deep generative models (DGMs) commonly exhibit a puzzling behaviour: when trained on a relatively complex dataset, they assign higher likelihood values to out-of-distribution (OOD) data from simpler sources. Adding to the mystery, OOD samples are never generated by these DGMs despite having higher likelihoods. This two-pronged paradox has yet to be conclusively explained, making likelihood-based OOD detection unreliable. Our primary observation is that high-likelihood regions will not be generated if they contain minimal probability mass. We demonstrate how this seeming contradiction of large densities yet low probability mass can occur around data confined to low-dimensional manifolds. We also show that this scenario can be identified through local intrinsic dimension (LID) estimation, and propose a method for OOD detection which pairs the likelihoods and LID estimates obtained from a pre-trained DGM. Our method can be applied to normalizing flows and score-based diffusion models, and obtains results which match or surpass state-of-the-art OOD detection benchmarks using the same DGM backbones. Our code is available at https://github.com/layer6ai-labs/dgm_ood_detection.
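The method pairs likelihoods with local intrinsic dimension (LID) estimates taken from the pre-trained DGM itself. To show what LID measures without a generative model, the sketch below swaps in the classic nearest-neighbour maximum-likelihood estimator of Levina and Bickel, and adds a dual-threshold rule of the assumed form "in-distribution only if both likelihood and LID are high", following the abstract's reasoning that high density with low probability mass signals OOD data. Thresholds and names are hypothetical.

```python
import numpy as np


def lid_mle(data, k=20):
    """Levina-Bickel maximum-likelihood intrinsic dimension, averaged over points.

    data: (n, D). Uses O(n^2) memory for pairwise distances; fine for small n.
    """
    d2 = np.sum((data[:, None, :] - data[None, :, :]) ** 2, axis=-1)
    dists = np.sqrt(np.sort(d2, axis=1)[:, 1:k + 1])   # k nearest, self excluded
    logs = np.log(dists[:, -1:] / dists[:, :-1])         # log(T_k / T_j) for j < k
    per_point = (k - 1) / logs.sum(axis=1)
    return per_point.mean()


def is_ood(log_likelihood, lid, ll_threshold, lid_threshold):
    """Assumed dual-threshold rule: in-distribution only if BOTH scores are high."""
    return not (log_likelihood > ll_threshold and lid > lid_threshold)


rng = np.random.default_rng(0)
plane = np.zeros((500, 5))
plane[:, :2] = rng.uniform(size=(500, 2))    # 2D data embedded in a 5D ambient space
lid_plane = lid_mle(plane)
print(lid_plane)                              # close to 2, not 5
```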
Formal Verification of Consistency for Systems with Redundant Controllers
Authors: Bjarne Johansson (ABB AB, Västerås, Sweden), Bahman Pourvatan (Mälardalen University, Västerås, Sweden), Zahra Moezkarimi (Mälardalen University, Västerås, Sweden), Alessandro Papadopoulos (Mälardalen University, Västerås, Sweden), Marjan Sirjani (Mälardalen University, Västerås, Sweden)
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2403.18917
Pdf link: https://arxiv.org/pdf/2403.18917
Abstract A potential problem in distributed control systems is the existence of more than one primary controller in redundancy plans, which may lead to inconsistency. An algorithm called NRP FD has been proposed to solve this issue by prioritizing consistency over availability. In this paper, we demonstrate how, using modeling and formal verification, we discovered an issue in NRP FD that can result in two primary controllers at the same time. We then provide a solution to mitigate the identified issue, thereby enhancing the robustness and reliability of such systems.
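The paper finds a two-primaries bug through modeling and formal verification. The toy below illustrates the underlying idea with explicit-state breadth-first search over a deliberately naive failover protocol in which a backup promotes itself whenever the link is down. The protocol, state encoding, and function names are hypothetical and are neither NRP FD nor the authors' models; the point is only how exhaustive search returns a shortest trace that violates the single-primary invariant.

```python
from collections import deque

# Toy state: (role_a, role_b, link_up). Roles are "P" (primary) or "B" (backup).
INITIAL = ("P", "B", True)


def successors(state):
    """Nondeterministic transitions of a deliberately naive failover protocol."""
    role_a, role_b, link_up = state
    nxt = [("link_toggle", (role_a, role_b, not link_up))]   # link may fail or recover
    # A backup that stops hearing heartbeats promotes itself; it cannot tell
    # a crashed primary apart from a lost link, which is the flaw.
    if not link_up and role_b == "B":
        nxt.append(("b_promotes", (role_a, "P", link_up)))
    if not link_up and role_a == "B":
        nxt.append(("a_promotes", ("P", role_b, link_up)))
    return nxt


def single_primary(state):
    return not (state[0] == "P" and state[1] == "P")


def check(initial, invariant):
    """BFS over reachable states; returns a shortest counterexample trace or None."""
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for _, nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


trace = check(INITIAL, single_primary)
print(trace)
```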
CPR: Retrieval Augmented Generation for Copyright Protection
Authors: Aditya Golatkar, Alessandro Achille, Luca Zancato, Yu-Xiang Wang, Ashwin Swaminathan, Stefano Soatto
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2403.18920
Pdf link: https://arxiv.org/pdf/2403.18920
Abstract Retrieval Augmented Generation (RAG) is emerging as a flexible and robust technique to adapt models to private user data without training, to handle credit attribution, and to allow efficient machine unlearning at scale. However, RAG techniques for image generation may lead to parts of the retrieved samples being copied in the model's output. To reduce the risk of leaking private information contained in the retrieved set, we introduce Copy-Protected generation with Retrieval (CPR), a new method for RAG with strong copyright protection guarantees in a mixed-private setting for diffusion models. CPR allows conditioning the output of diffusion models on a set of retrieved images, while also guaranteeing that unique identifiable information about those examples is not exposed in the generated outputs. In particular, it does so by sampling from a mixture of a public (safe) distribution and a private (user) distribution by merging their diffusion scores at inference. We prove that CPR satisfies Near Access Freeness (NAF), which bounds the amount of information an attacker may be able to extract from the generated images. We provide two algorithms for copyright protection, CPR-KL and CPR-Choose. Unlike previously proposed rejection-sampling-based NAF methods, our methods enable efficient copyright-protected sampling with a single run of backward diffusion. We show that our method can be applied to any pre-trained conditional diffusion model, such as Stable Diffusion or unCLIP. In particular, we empirically show that applying CPR on top of unCLIP improves the quality and text-to-image alignment of the generated results (from 81.4 to 83.17 on the TIFA benchmark), while enabling credit attribution, copyright protection, and deterministic, constant-time unlearning.
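CPR samples from a mixture of a public and a private distribution by merging their diffusion scores at inference. The general identity behind score merging for a two-component mixture is that the mixture score equals the responsibility-weighted sum of the component scores. The sketch below checks that identity numerically on 1D Gaussians, which stand in for learned diffusion scores; the specific CPR-KL and CPR-Choose combination rules are not reproduced.

```python
import numpy as np


def gaussian_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))


def gaussian_score(x, mu, sigma):
    """d/dx log N(x; mu, sigma^2)."""
    return -(x - mu) / sigma**2


def log_mixture(x, pi_pub, pub, priv):
    """log(pi * p_pub(x) + (1 - pi) * p_priv(x)); pub and priv are (mu, sigma)."""
    return np.logaddexp(np.log(pi_pub) + gaussian_logpdf(x, *pub),
                        np.log(1.0 - pi_pub) + gaussian_logpdf(x, *priv))


def mixture_score(x, pi_pub, pub, priv):
    """Mixture score as the responsibility-weighted sum of component scores."""
    log_a = np.log(pi_pub) + gaussian_logpdf(x, *pub)
    r_pub = np.exp(log_a - log_mixture(x, pi_pub, pub, priv))   # posterior weight of pub
    return r_pub * gaussian_score(x, *pub) + (1.0 - r_pub) * gaussian_score(x, *priv)


pub, priv, pi_pub = (0.0, 1.0), (2.0, 0.5), 0.7     # hypothetical components
x, h = 0.3, 1e-5
analytic = mixture_score(x, pi_pub, pub, priv)
numeric = (log_mixture(x + h, pi_pub, pub, priv)
           - log_mixture(x - h, pi_pub, pub, priv)) / (2 * h)
print(analytic, numeric)                             # agree closely
```

Note that the weights depend on x: the mixture score is not a fixed blend of the two scores, which is why naive constant-weight averaging of scores does not sample from the mixture.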
Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D
Authors: Mukund Varma T, Peihao Wang, Zhiwen Fan, Zhangyang Wang, Hao Su, Ravi Ramamoorthi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2403.18922
Pdf link: https://arxiv.org/pdf/2403.18922
Abstract In recent years, there has been an explosion of 2D vision models for numerous tasks such as semantic segmentation, style transfer or scene editing, enabled by large-scale 2D image datasets. At the same time, there has been renewed interest in 3D scene representations such as neural radiance fields from multi-view images. However, the availability of 3D or multiview data is still substantially limited compared to 2D image datasets, making extending 2D vision models to 3D data highly desirable but also very challenging. Indeed, extending a single 2D vision operator like scene editing to 3D typically requires a highly creative method specialized to that task and often requires per-scene optimization. In this paper, we ask the question of whether any 2D vision model can be lifted to make 3D consistent predictions. We answer this question in the affirmative; our new Lift3D method trains to predict unseen views on feature spaces generated by a few visual models (i.e. DINO and CLIP), but then generalizes to novel vision operators and tasks, such as style transfer, super-resolution, open vocabulary segmentation and image colorization; for some of these tasks, there is no comparable previous 3D method. In many cases, we even outperform state-of-the-art methods specialized for the task in question. Moreover, Lift3D is a zero-shot method, in the sense that it requires no task-specific training, nor scene-specific optimization.
Random Aggregate Beamforming for Over-the-Air Federated Learning in Large-Scale Networks
Authors: Chunmei Xu, Shengheng Liu, Yongming Huang, Bjorn Ottersten, Dusit Niyato
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2403.18946
Pdf link: https://arxiv.org/pdf/2403.18946
Abstract There is a growing trend toward deploying ubiquitous artificial intelligence (AI) applications at the network edge. As a promising framework for secure edge intelligence, federated learning (FL) has received widespread attention, and over-the-air computing (AirComp) has been integrated into it to further improve communication efficiency. In this paper, we consider a joint device selection and aggregate beamforming design with the objectives of minimizing the aggregate error and maximizing the number of selected devices. This yields a combinatorial problem that is difficult to solve, especially in large-scale networks. To tackle the problem cost-effectively, we propose a random aggregate beamforming-based scheme, which generates the aggregator beamforming vector via random sampling rather than optimization. The implementation of the proposed scheme does not require channel estimation. We additionally use asymptotic analysis to study the resulting aggregate error and the number of selected devices as the number of devices becomes large. Furthermore, we propose a refined method that runs multiple randomizations to improve performance. Extensive simulation results demonstrate the effectiveness of both the proposed random aggregate beamforming-based scheme and the refined method.
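The scheme draws the aggregation beamformer at random instead of optimizing it, and a refined variant repeats the draw several times. A simplified sketch follows, assuming N receive antennas, single-antenna devices, and a hypothetical selection rule that admits a device when its effective gain |m^H h_k|^2 clears a threshold gamma. For brevity it evaluates gains from known channels centrally, so it models neither how the actual scheme avoids channel estimation nor the paper's aggregate-error expression.

```python
import numpy as np


def random_beamformer(rng, n_antennas):
    """Unit-norm complex Gaussian receive beamformer, drawn rather than optimized."""
    m = rng.standard_normal(n_antennas) + 1j * rng.standard_normal(n_antennas)
    return m / np.linalg.norm(m)


def select_devices(m, channels, gamma):
    """Indices of devices whose effective gain |m^H h_k|^2 is at least gamma (assumed rule).

    channels: (K, N) complex, one row h_k per device.
    """
    gains = np.abs(channels @ m.conj()) ** 2
    return np.flatnonzero(gains >= gamma)


def refined_selection(candidates, channels, gamma):
    """Multiple randomizations: keep the drawn beamformer that admits the most devices."""
    best = max(candidates, key=lambda m: len(select_devices(m, channels, gamma)))
    return best, select_devices(best, channels, gamma)


rng = np.random.default_rng(0)
K, N, gamma = 50, 8, 1.0
channels = rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))
candidates = [random_beamformer(rng, N) for _ in range(20)]
single = select_devices(candidates[0], channels, gamma)     # one randomization
best, selected = refined_selection(candidates, channels, gamma)
print(len(single), len(selected))
```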
Hybridizing Traditional and Next-Generation Reservoir Computing to Accurately and Efficiently Forecast Dynamical Systems
Authors: Ravi Chepuri, Dael Amzalag, Thomas Antonsen Jr., Michelle Girvan
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2403.18953
Pdf link: https://arxiv.org/pdf/2403.18953
Abstract Reservoir computers (RCs) are powerful machine learning architectures for time series prediction. Recently, next generation reservoir computers (NGRCs) have been introduced, offering distinct advantages over RCs, such as reduced computational expense and lower data requirements. However, NGRCs have their own practical difficulties distinct from those of RCs, including sensitivity to the sampling time and to the type of nonlinearities in the data. Here, we introduce a hybrid RC-NGRC approach for time series forecasting of complex and chaotic dynamical systems. We show that our hybrid approach can produce accurate short-term predictions and capture the long-term statistics of dynamical systems in situations where the RC and NGRC components alone are insufficient. The advantage of the hybrid RC-NGRC approach is most pronounced when both components are limited in their prediction capabilities, e.g., for a small RC and a large sampling time in the training data. Under these conditions, we show for several chaotic systems that the hybrid RC-NGRC method with a small reservoir (N ≈ 100) can achieve prediction performance rivaling that of a pure RC with a much larger reservoir (N ≈ 1000), illustrating that the hybrid approach offers significant gains in computational efficiency over traditional RCs while simultaneously addressing some of the limitations of NGRCs.
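One way to build such a hybrid is to concatenate an echo-state reservoir state with NGRC features (a constant, time-delayed inputs, and their quadratic monomials) and train a single linear readout by ridge regression. The sketch below assumes that concatenation scheme, two delays, a 100-node tanh reservoir, and a one-step-ahead target; these are illustrative choices rather than the paper's configuration, and a sine wave replaces the chaotic systems studied in the paper.

```python
import numpy as np


def ngrc_features(x, k=2):
    """NGRC features per time t >= k-1: [1, x_t, ..., x_{t-k+1}, unique quadratic monomials]."""
    T = x.shape[0]
    lin = np.hstack([x[k - 1 - j:T - j] for j in range(k)])    # (T-k+1, k*d)
    iu = np.triu_indices(lin.shape[1])
    quad = (lin[:, :, None] * lin[:, None, :])[:, iu[0], iu[1]]
    return np.hstack([np.ones((lin.shape[0], 1)), lin, quad])


def reservoir_states(x, n_res=100, spectral_radius=0.9, seed=0):
    """Echo state network states r_t = tanh(W r_{t-1} + W_in x_t)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_res, n_res))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    W_in = rng.uniform(-0.5, 0.5, (n_res, x.shape[1]))
    r = np.zeros(n_res)
    states = []
    for xt in x:
        r = np.tanh(W @ r + W_in @ xt)
        states.append(r)
    return np.array(states)


def fit_ridge(features, targets, lam=1e-6):
    """Ridge-regression readout: solve (F^T F + lam I) W = F^T Y."""
    A = features.T @ features + lam * np.eye(features.shape[1])
    return np.linalg.solve(A, features.T @ targets)


k = 2
x = np.sin(np.arange(0.0, 60.0, 0.1))[:, None]                   # (T, 1) toy series
hybrid = np.hstack([ngrc_features(x, k), reservoir_states(x)[k - 1:]])
X, Y = hybrid[:-1], x[k:]                                         # features at t -> x_{t+1}
W_out = fit_ridge(X, Y)
mse = np.mean((X @ W_out - Y) ** 2)
print(X.shape, mse)
```

Only the readout is trained; the reservoir weights stay fixed, which is what keeps both components cheap to fit.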
Conformal Intent Classification and Clarification for Fast and Accurate Intent Recognition
Authors: Floris den Hengst, Ralf Wolter, Patrick Altmeyer, Arda Kaygan
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2403.18973
Pdf link: https://arxiv.org/pdf/2403.18973
Abstract We present Conformal Intent Classification and Clarification (CICC), a framework for fast and accurate intent classification in task-oriented dialogue systems. The framework turns the heuristic uncertainty scores of any intent classifier into a clarification question that is guaranteed to contain the true intent at a predefined confidence level. By disambiguating between a small number of likely intents, the user query can be resolved quickly and accurately. Additionally, we propose an extension of the framework for out-of-scope detection. In a comparative evaluation on seven intent recognition datasets, we find that CICC generates small clarification questions and is capable of out-of-scope detection. CICC can substantially help practitioners and researchers improve the user experience of dialogue agents with specific clarification questions.
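CICC builds on conformal prediction. A split-conformal sketch follows, assuming the nonconformity score 1 - p(true intent) computed on a held-out calibration set; under exchangeability, the resulting set of intents contains the true one with probability at least 1 - alpha. The `respond` helper, the intent names, and treating an empty set as out-of-scope are hypothetical choices for illustration, not the paper's procedure.

```python
import math


def conformal_threshold(cal_true_probs, alpha):
    """Split-conformal threshold on the score 1 - p(true intent)."""
    scores = sorted(1.0 - p for p in cal_true_probs)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha) - 1e-9)   # small epsilon guards float round-up
    return float("inf") if k > n else scores[k - 1]


def prediction_set(intent_probs, q_hat):
    """All intents whose score 1 - p is within the threshold."""
    return sorted(i for i, p in intent_probs.items() if 1.0 - p <= q_hat)


def respond(intent_probs, q_hat):
    """Act on a singleton set; otherwise ask a clarification question."""
    candidates = prediction_set(intent_probs, q_hat)
    if not candidates:
        return "out-of-scope"      # assumption: an empty set is treated as out-of-scope
    if len(candidates) == 1:
        return f"execute:{candidates[0]}"
    return "clarify: did you mean " + " or ".join(candidates) + "?"


# Classifier probabilities of the true intent on 9 hypothetical calibration queries.
cal = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.35, 0.45]
q_hat = conformal_threshold(cal, alpha=0.2)
print(respond({"book_flight": 0.5, "cancel_flight": 0.4, "weather": 0.1}, q_hat))
print(respond({"book_flight": 0.9, "cancel_flight": 0.08, "weather": 0.02}, q_hat))
```

Lowering alpha raises the threshold, so sets (and clarification questions) grow; that trade-off between coverage and question size is the knob the framework exposes.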
"Sorry, Come Again?" Prompting -- Enhancing Comprehension and Diminishing Hallucination with [PAUSE]-injected Optimal Paraphrasing
Authors: Vipula Rawte, S.M Towhidul Islam Tonmoy, S M Mehedi Zaman, Prachi Priya, Aman Chadha, Amit P. Sheth, Amitava Das
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2403.18976
Pdf link: https://arxiv.org/pdf/2403.18976
Abstract Hallucination has emerged as the most vulnerable aspect of contemporary Large Language Models (LLMs). In this paper, we introduce Sorry, Come Again (SCA) prompting, which aims to avoid LLM hallucinations by enhancing comprehension through (i) optimal paraphrasing and (ii) injecting [PAUSE] tokens to delay LLM generation. First, we provide an in-depth analysis of the linguistic nuances of prompts (formality, readability, and concreteness) for 21 LLMs, and elucidate how these nuances contribute to hallucinated generation. Prompts with lower readability, formality, or concreteness pose comprehension challenges for LLMs, similar to those faced by humans. In such scenarios, an LLM tends to speculate and generate content based on its imagination (associative memory) to fill these information gaps. Although these speculations may occasionally align with factual information, their accuracy is not assured, often resulting in hallucination. Recent studies reveal that an LLM often neglects the middle sections of extended prompts, a phenomenon termed "lost in the middle". While a specific paraphrase may suit one LLM, the same paraphrased version may elicit a different response from another LLM. Therefore, we propose an optimal paraphrasing technique that identifies the most comprehensible paraphrase of a given prompt, evaluated using Integrated Gradients (and its variations) to ensure that the LLM accurately processes all words. While reading lengthy sentences, humans often pause at various points to better comprehend what they have read so far. We fine-tuned an LLM with injected [PAUSE] tokens, allowing it to pause while reading longer prompts. Our key contributions are: (i) determining the optimal position at which to inject [PAUSE], (ii) determining the number of [PAUSE] tokens to insert, and (iii) introducing reverse proxy tuning to fine-tune the LLM for [PAUSE] insertion.
Dealing with Imbalanced Classes in Bot-IoT Dataset
Authors: Jesse Atuhurra, Takanori Hara, Yuanyu Zhang, Masahiro Sasabe, Shoji Kasahara
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2403.18989
Pdf link: https://arxiv.org/pdf/2403.18989
Abstract With the rapidly spreading use of Internet of Things (IoT) devices, network intrusion detection systems (NIDSs) play an important role in detecting and protecting against various types of attacks in IoT networks. To evaluate the robustness of NIDSs in IoT networks, prior work proposed a realistic botnet dataset for IoT networks (the Bot-IoT dataset) and applied it to machine learning-based anomaly detection. This dataset is imbalanced: normal packets are far fewer than attack packets. Such imbalance may make it difficult to identify the minority class correctly. In this thesis, to address the class imbalance problem in the Bot-IoT dataset, we propose a binary classification method using the synthetic minority over-sampling technique (SMOTE). The proposed classifier aims to detect attack packets and overcome the class imbalance problem using the SMOTE algorithm. Through numerical results, we demonstrate the proposed classifier's fundamental characteristics and the impact of imbalanced data on its performance.
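For reference, SMOTE creates synthetic minority samples by interpolating between a minority point and one of its k nearest minority neighbours. A from-scratch NumPy sketch follows (in practice the imbalanced-learn library provides an `SMOTE` class); per the abstract, the minority class in Bot-IoT is normal traffic. The function name, k = 5, and the random example data are assumptions.

```python
import numpy as np


def smote(minority, n_synthetic, k=5, seed=0):
    """Synthetic minority samples on segments toward k nearest minority neighbours.

    minority: (n, d) feature vectors of the minority class, with n > k.
    """
    rng = np.random.default_rng(seed)
    d2 = np.sum((minority[:, None] - minority[None]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)                  # a point is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]    # (n, k) neighbour indices
    base = rng.integers(0, len(minority), n_synthetic)
    nbr = neighbours[base, rng.integers(0, k, n_synthetic)]
    gap = rng.random((n_synthetic, 1))            # uniform in [0, 1)
    return minority[base] + gap * (minority[nbr] - minority[base])


rng = np.random.default_rng(1)
minority = rng.random((20, 3))                    # e.g. features of the rare normal packets
synthetic = smote(minority, n_synthetic=100)
print(synthetic.shape)                            # (100, 3)
```

Oversampling should be applied to the training split only; synthesizing points before splitting leaks near-copies of test samples into training.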
Few-Shot Cross-System Anomaly Trace Classification for Microservice-based systems
Authors: Yuqing Wang, Mika V. Mantylä, Serge Demeyer, Mutlu Beyazit, Joanna Kisaakye, Jesse Nyyssölä
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2403.18998
Pdf link: https://arxiv.org/pdf/2403.18998
Abstract Microservice-based systems (MSS) may experience failures in various fault categories due to their complex and dynamic nature. To handle failures effectively, AIOps tools use trace-based anomaly detection and root cause analysis. In this paper, we propose a novel framework for few-shot abnormal trace classification for MSS. Our framework comprises two main components: (1) a Multi-Head Attention Autoencoder for constructing system-specific trace representations, which enables (2) Transformer Encoder-based Model-Agnostic Meta-Learning to perform effective and efficient few-shot learning for abnormal trace classification. The proposed framework is evaluated on two representative MSS, Trainticket and OnlineBoutique, with open datasets. The results show that our framework can adapt the learned knowledge to classify new, unseen abnormal traces of novel fault categories, both within the system it was initially trained on and even in a different MSS. Within the same MSS, our framework achieves average accuracies of 93.26% and 85.2% across 50 meta-testing tasks for Trainticket and OnlineBoutique, respectively, when provided with 10 instances per task. In a cross-system context, our framework achieves average accuracies of 92.19% and 84.77% on the same meta-testing tasks of the respective systems, again with 10 instances per task. Our work demonstrates the feasibility of few-shot abnormal trace classification for MSS and shows how it can enable cross-system adaptability. This opens an avenue for building more generalized AIOps tools that require less system-specific data labeling for anomaly detection and root cause analysis.
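The framework's second component is Model-Agnostic Meta-Learning (MAML). The sketch below shows the inner/outer loop structure using the first-order MAML approximation on toy 1D linear-regression tasks with 10 support and 10 query examples each. The model, task distribution, and learning rates are illustrative assumptions; the paper meta-learns a Transformer encoder over trace representations.

```python
import numpy as np


def loss_grad(w, x, y):
    """Mean squared error of y ~ w * x and its gradient with respect to the scalar w."""
    err = w * x - y
    return np.mean(err**2), np.mean(2 * err * x)


def fomaml(n_iters=500, inner_lr=0.1, outer_lr=0.1, seed=0):
    """First-order MAML over toy tasks y = a * x with task slope a ~ N(2, 0.1^2)."""
    rng = np.random.default_rng(seed)
    w = 0.0
    for _ in range(n_iters):
        a = rng.normal(2.0, 0.1)                              # sample a task
        x_s, x_q = rng.uniform(-1, 1, 10), rng.uniform(-1, 1, 10)
        _, g_s = loss_grad(w, x_s, a * x_s)
        w_task = w - inner_lr * g_s                           # inner adaptation on support set
        _, g_q = loss_grad(w_task, x_q, a * x_q)
        w -= outer_lr * g_q                                   # first-order meta-update
    return w


w_meta = fomaml()
print(w_meta)   # a meta-learned initialization near the typical task slope
```

The first-order variant drops the second-derivative term of full MAML, trading some accuracy in the meta-gradient for much cheaper updates.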
Exploiting Symmetry in Dynamics for Model-Based Reinforcement Learning with Asymmetric Rewards
Authors: Yasin Sonmez, Neelay Junnarkar, Murat Arcak
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2403.19024
Pdf link: https://arxiv.org/pdf/2403.19024
Abstract Recent work in reinforcement learning has leveraged symmetries in the model to improve sample efficiency in training a policy. A commonly used simplifying assumption is that the dynamics and reward both exhibit the same symmetry. However, in many real-world environments, the dynamical model exhibits symmetry independent of the reward model: the reward may not satisfy the same symmetries as the dynamics. In this paper, we investigate scenarios where only the dynamics are assumed to exhibit symmetry, extending the scope of problems in reinforcement learning and learning in control theory where symmetry techniques can be applied. We use Cartan's moving frame method to introduce a technique for learning dynamics which, by construction, exhibit specified symmetries. We demonstrate through numerical experiments that the proposed method learns a more accurate dynamical model.
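A minimal sketch of how a moving frame can build a symmetry into a learned dynamics model, assuming planar rotation symmetry (SO(2)), a 2-D state and a 2-D control; the names `rotate` and `make_equivariant` and the toy model `g` are illustrative, not the authors' code. The frame is chosen from the state itself (rotate it onto the positive x-axis), an arbitrary model is applied in that canonical frame, and the output is rotated back, so the wrapped model is rotation-equivariant by construction. The reward is deliberately kept outside the wrapper, so it may remain asymmetric.

```python
import math


def rotate(v, angle):
    """Rotate a 2-D vector counter-clockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def make_equivariant(g):
    """Wrap an arbitrary (non-equivariant) model g(x_canon, u_canon) -> next state
    so that the result satisfies f(R x, R u) = R f(x, u) for every rotation R.
    Illustrative moving-frame canonicalization for SO(2) only."""
    def f(x, u):
        theta = math.atan2(x[1], x[0])  # frame angle read off the state itself
        x_c = rotate(x, -theta)         # canonical state, approximately (|x|, 0)
        u_c = rotate(u, -theta)         # control expressed in the same frame
        y_c = g(x_c, u_c)               # any learned map, applied in the canonical frame
        return rotate(y_c, theta)       # map the prediction back to the original frame
    return f
```

The frame is undefined at the origin (here `atan2(0, 0)` silently returns 0), so a practical version would need a separate rule for that case.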
Egocentric Scene-aware Human Trajectory Prediction
Authors: Weizhuo Wang, C. Karen Liu, Monroe Kennedy III
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2403.19026
Pdf link: https://arxiv.org/pdf/2403.19026
Abstract Wearable collaborative robots stand to assist human wearers who need fall prevention assistance or wear exoskeletons. Such a robot needs to be able to predict the ego motion of the wearer based on egocentric vision and the surrounding scene. In this work, we leveraged body-mounted cameras and sensors to anticipate the trajectory of human wearers through complex surroundings. To facilitate research in ego-motion prediction, we have collected a comprehensive walking scene navigation dataset centered on the user's perspective. We present a method to predict human motion conditioning on the surrounding static scene. Our method leverages a diffusion model to produce a distribution of potential future trajectories, taking into account the user's observation of the environment. We introduce a compact representation to encode the user's visual memory of the surroundings, as well as an efficient sample-generating technique to speed up real-time inference of a diffusion model. We ablate our model and compare it to baselines, and results show that our model outperforms existing methods on key metrics of collision avoidance and trajectory mode coverage.
Visualizing High-Dimensional Temporal Data Using Direction-Aware t-SNE
Authors: Pavlin G. Poličar, Blaž Zupan
Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2403.19040
Pdf link: https://arxiv.org/pdf/2403.19040
Abstract Many real-world data sets contain a temporal component or involve transitions from state to state. For exploratory data analysis, we can represent these high-dimensional data sets in two-dimensional maps, using embeddings of the data objects under exploration and representing their temporal relationships with directed edges. Most existing dimensionality reduction techniques, such as t-SNE and UMAP, do not take into account the temporal or relational nature of the data when constructing the embeddings, resulting in temporally cluttered visualizations that obscure potentially interesting patterns. To address this problem, we propose two complementary, direction-aware loss terms in the optimization function of t-SNE that emphasize the temporal aspects of the data, guiding the optimization and the resulting embedding to reveal temporal patterns that might otherwise go unnoticed. The Directional Coherence Loss (DCL) encourages nearby arrows connecting two adjacent time series points to point in the same direction, while the Edge Length Loss (ELL) penalizes arrows (which effectively represent time gaps in the visualized embedding) based on their length. Both loss terms are differentiable and can be easily incorporated into existing dimensionality reduction techniques. By promoting local directionality of the directed edges, our procedure produces more temporally meaningful and less cluttered visualizations. We demonstrate the effectiveness of our approach on a toy dataset and two real-world datasets.
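The abstract does not give the loss formulas, so the sketch below assumes simple forms: ELL as the sum of squared arrow lengths, and DCL as the sum of (1 - cosine similarity) over pairs of arrows whose midpoints lie within a fixed `radius`. Both helper names are hypothetical; in practice the terms would be weighted, added to the t-SNE objective and minimized with automatic differentiation.

```python
import math


def arrows(points):
    """Arrows between consecutive time points of one trajectory in a 2-D embedding."""
    return [(points[i], points[i + 1]) for i in range(len(points) - 1)]


def edge_length_loss(points):
    # Assumed form: sum of squared arrow lengths (long arrows = large time gaps).
    return sum((b[0] - a[0]) ** 2 + (b[1] - a[1]) ** 2 for a, b in arrows(points))


def directional_coherence_loss(points, radius=1.0):
    # Assumed form: for each pair of arrows whose midpoints are within `radius`,
    # add (1 - cosine similarity) of their direction vectors.
    arr = arrows(points)
    loss = 0.0
    for i in range(len(arr)):
        for j in range(i + 1, len(arr)):
            (a0, a1), (b0, b1) = arr[i], arr[j]
            mid_i = ((a0[0] + a1[0]) / 2, (a0[1] + a1[1]) / 2)
            mid_j = ((b0[0] + b1[0]) / 2, (b0[1] + b1[1]) / 2)
            if math.dist(mid_i, mid_j) > radius:
                continue  # only nearby arrows are encouraged to agree
            u = (a1[0] - a0[0], a1[1] - a0[1])
            v = (b1[0] - b0[0], b1[1] - b0[1])
            nu, nv = math.hypot(*u), math.hypot(*v)
            if nu == 0 or nv == 0:
                continue  # zero-length arrows have no direction
            cos = (u[0] * v[0] + u[1] * v[1]) / (nu * nv)
            loss += 1.0 - cos
    return loss
```

A straight trajectory gives zero coherence loss, while a trajectory that doubles back on itself is penalized.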
Detecting Generative Parroting through Overfitting Masked Autoencoders
Authors: Saeid Asgari Taghanaki, Joseph Lambourne
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2403.19050
Pdf link: https://arxiv.org/pdf/2403.19050
Abstract The advent of generative AI models has revolutionized digital content creation, yet it introduces challenges in maintaining copyright integrity due to generative parroting, where models mimic their training data too closely. Our research presents a novel approach to tackle this issue by employing an overfitted Masked Autoencoder (MAE) to detect such parroted samples effectively. We establish a detection threshold based on the mean loss across the training dataset, allowing for the precise identification of parroted content in modified datasets. Preliminary evaluations demonstrate promising results, suggesting our method's potential to ensure ethical use and enhance the legal compliance of generative models.
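A minimal sketch of the thresholding step only, assuming the overfitted MAE's per-sample reconstruction losses are already computed (the model itself is not shown, and both function names are hypothetical). Treating a loss at or below the mean training loss as the parroting signal is an interpretation: an overfitted model reconstructs its own training data unusually well, so a generated sample it reconstructs equally well is suspicious.

```python
from statistics import mean


def parroting_threshold(train_losses):
    # Threshold = mean reconstruction loss of the overfitted MAE over its training set.
    return mean(train_losses)


def flag_parroted(candidate_losses, threshold):
    # Assumption: a sample reconstructed at least as well as the average training
    # sample (loss <= threshold) is flagged as likely parroted.
    return [loss <= threshold for loss in candidate_losses]
```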
CAUSE: Counterfactual Assessment of User Satisfaction Estimation in Task-Oriented Dialogue Systems
Authors: Amin Abolghasemi, Zhaochun Ren, Arian Askari, Mohammad Aliannejadi, Maarten de Rijke, Suzan Verberne
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2403.19056
Pdf link: https://arxiv.org/pdf/2403.19056
Abstract An important unexplored aspect in previous work on user satisfaction estimation for Task-Oriented Dialogue (TOD) systems is their evaluation in terms of robustness for the identification of user dissatisfaction: current benchmarks for user satisfaction estimation in TOD systems are highly skewed towards dialogues for which the user is satisfied. The effect of having a more balanced set of satisfaction labels on performance is unknown. However, balancing the data with more dissatisfactory dialogue samples requires further data collection and human annotation, which is costly and time-consuming. In this work, we leverage large language models (LLMs) and unlock their ability to generate satisfaction-aware counterfactual dialogues to augment the set of original dialogues of a test collection. We gather human annotations to ensure the reliability of the generated samples. We evaluate two open-source LLMs as user satisfaction estimators on our augmented collection against state-of-the-art fine-tuned models. Our experiments show that when used as few-shot user satisfaction estimators, open-source LLMs show higher robustness to the increase in the number of dissatisfaction labels in the test collection than the fine-tuned state-of-the-art models. Our results shed light on the need for data augmentation approaches for user satisfaction estimation in TOD systems. We release our aligned counterfactual dialogues, which are curated by human annotation, to facilitate further research on this topic.
MVEB: Self-Supervised Learning with Multi-View Entropy Bottleneck
Authors: Liangjian Wen, Xiasi Wang, Jianzhuang Liu, Zenglin Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2403.19078
Pdf link: https://arxiv.org/pdf/2403.19078
Abstract Self-supervised learning aims to learn representations that can be effectively generalized to downstream tasks. Many self-supervised approaches regard two views of an image as both the input and the self-supervised signals, assuming that either view contains the same task-relevant information and the shared information is (approximately) sufficient for predicting downstream tasks. Recent studies show that discarding superfluous information not shared between the views can improve generalization. Hence, the ideal representation is sufficient for downstream tasks and contains minimal superfluous information, termed minimal sufficient representation. One can learn this representation by maximizing the mutual information between the representation and the supervised view while eliminating superfluous information. Nevertheless, the computation of mutual information is notoriously intractable. In this work, we propose an objective termed multi-view entropy bottleneck (MVEB) to learn minimal sufficient representation effectively. MVEB simplifies minimal sufficient representation learning to maximizing both the agreement between the embeddings of two views and the differential entropy of the embedding distribution. Our experiments confirm that MVEB significantly improves performance. For example, it achieves top-1 accuracy of 76.9% on ImageNet with a vanilla ResNet-50 backbone on linear evaluation. To the best of our knowledge, this is a new state-of-the-art result with ResNet-50.
A Real-Time Framework for Domain-Adaptive Underwater Object Detection with Image Enhancement
Authors: Junjie Wen, Jinqiang Cui, Benyun Zhao, Bingxin Han, Xuchen Liu, Zhi Gao, Ben M. Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2403.19079
Pdf link: https://arxiv.org/pdf/2403.19079
Abstract In recent years, significant progress has been made in the field of underwater image enhancement (UIE). However, its practical utility for high-level vision tasks, such as underwater object detection (UOD) in Autonomous Underwater Vehicles (AUVs), remains relatively unexplored. This may be attributed to several factors: (1) Existing methods typically employ UIE as a pre-processing step, which inevitably introduces considerable computational overhead and latency. (2) The process of enhancing images prior to training object detectors may not necessarily yield performance improvements. (3) Complex underwater environments can induce significant domain shifts across different scenarios, seriously deteriorating the UOD performance. To address these challenges, we introduce EnYOLO, an integrated real-time framework designed for simultaneous UIE and UOD with domain-adaptation capability. Specifically, both the UIE and UOD task heads share the same network backbone and utilize a lightweight design. Furthermore, to ensure balanced training for both tasks, we present a multi-stage training strategy aimed at consistently enhancing their performance. Additionally, we propose a novel domain-adaptation strategy to align feature embeddings originating from diverse underwater environments. Comprehensive experiments demonstrate that our framework not only achieves state-of-the-art (SOTA) performance in both UIE and UOD tasks, but also shows superior adaptability when applied to different underwater scenarios. Our efficiency analysis further highlights the substantial potential of our framework for onboard deployment.
A Stabilized Physics Informed Neural Networks Method for Wave Equations
Authors: Yuling Jiao, Yuhui Liu, Jerry Zhijian Yang, Cheng Yuan
Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph)
Arxiv link: https://arxiv.org/abs/2403.19090
Pdf link: https://arxiv.org/pdf/2403.19090
Abstract In this article, we propose a novel Stabilized Physics Informed Neural Networks method (SPINNs) for solving wave equations. In general, this method not only demonstrates theoretical convergence but also exhibits higher efficiency compared to the original PINNs. By replacing the $L^2$ norm with the $H^1$ norm in the learning of initial and boundary conditions, we theoretically prove that the error of the solution can be upper bounded by the risk in SPINNs. Based on this, we decompose the error of SPINNs into approximation error, statistical error and optimization error. Furthermore, by applying the approximation theory of $ReLU^3$ networks and the learning theory on Rademacher complexity, covering number and pseudo-dimension of neural networks, we present a systematic non-asymptotic convergence analysis of our method, which shows that the error of SPINNs can be well controlled if the number of training samples and the depth and width of the deep neural networks are appropriately chosen. Two illustrative numerical examples on 1-dimensional and 2-dimensional wave equations demonstrate that SPINNs can achieve faster and better convergence than the classical PINNs method.
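To make the change of norm concrete, the sketch below approximates the squared $L^2$ and $H^1$ norms of an initial-condition error on a uniform 1-D grid, with the derivative taken by forward differences (a discretization chosen for illustration, not taken from the paper; the helper names are hypothetical). A small-amplitude but oscillatory error is nearly invisible in $L^2$ yet large in $H^1$, which is the extra control an $H^1$ loss term provides.

```python
import math


def l2_sq(values, h):
    # Riemann-sum approximation of the squared L^2 norm on a uniform grid with spacing h.
    return h * sum(v * v for v in values)


def h1_sq(values, h):
    # Squared H^1 norm = squared L^2 norm of the function plus squared L^2 norm of
    # its derivative, here approximated by forward differences.
    grads = [(values[i + 1] - values[i]) / h for i in range(len(values) - 1)]
    return l2_sq(values, h) + l2_sq(grads, h)
```

For an error $e(x) = \varepsilon \sin(kx)$ on $[0, 1]$, the squared $H^1$ norm is roughly $(1 + k^2)$ times the squared $L^2$ norm, so high-frequency errors dominate the $H^1$ term.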
Purposeful remixing with generative AI: Constructing designer voice in multimodal composing
Authors: Xiao Tan, Wei Xu, Chaoran Wang
Subjects: Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2403.19095
Pdf link: https://arxiv.org/pdf/2403.19095
Abstract Voice, the discursive construction of the writer's identity, has been extensively studied and theorized in composition studies. In multimodal writing, students are able to mobilize both linguistic and non-linguistic resources to express their real or imagined identities. At the same time, however, when students are limited to choosing from available online resources, their voices might be compromised due to the incompatibility between their authorial intentions and the existing materials. This study, therefore, investigates whether the use of generative AI tools could help student authors construct a more consistent voice in multimodal writing. In this study, we have designed a photo essay assignment where students recount a story in the form of photo essays and prompt AI image-generation tools to create photos for their storytelling. Drawing on interview data, written reflection, written annotation, and multimodal products from seven focal participants, we have identified two remixing practices, through which students attempted to establish a coherent and unique voice in writing. The study sheds light on the intentional and discursive nature of multimodal writing with AI as afforded by the technological flexibility, while also highlighting the practical and ethical challenges that could be attributed to students' insufficient prompt and multimodal literacy and the innate limitations of AI systems. This study provides important implications for incorporating AI tools in designing multimodal writing tasks.
Co-Designing Statistical MIMO Radar and In-band Full-Duplex Multi-User MIMO Communications -- Part II: Joint Precoder, Radar Code, and Receive Filters Design
Authors: Jiawei Liu, Kumar Vijay Mishra, Mohammad Saquib
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2403.19119
Pdf link: https://arxiv.org/pdf/2403.19119
Abstract We address the challenge of spectral sharing between a statistical multiple-input multiple-output (MIMO) radar and an in-band full-duplex (IBFD) multi-user MIMO (MU-MIMO) communications system operating simultaneously in the same frequency band. Existing research on joint MIMO-radar-MIMO-communications (MRMC) systems has limitations, such as focusing on colocated MIMO radars, half-duplex MIMO communications, single-user scenarios, neglecting practical constraints, or employing separate transmit/receive units for MRMC coexistence. This paper, along with companion papers (Part I and III), proposes a comprehensive MRMC framework that addresses all these challenges. In the previous companion paper (Part I), we presented signal processing techniques for a distributed IBFD MRMC system. In this paper, we introduce joint design of statistical MIMO radar codes, uplink/downlink precoders, and corresponding receive filters using a novel metric called compounded-and-weighted sum mutual information. To solve the resulting highly non-convex problem, we employ a combination of block coordinate descent (BCD) and alternating projection methods. Numerical experiments show convergence of our algorithm, mitigation of uplink interference, and stable data rates under varying noise levels, channel estimate imperfections, and self-interference. The subsequent companion paper (Part III) extends the discussion to multiple targets and evaluates the tracking performance of our MRMC system.
Co-Designing Statistical MIMO Radar and In-band Full-Duplex Multi-User MIMO Communications -- Part III: Multi-Target Tracking
Authors: Sk Nayemuzzaman, Kumar Vijay Mishra, Jiawei Liu, Mohammad Saquib
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2403.19120
Pdf link: https://arxiv.org/pdf/2403.19120
Abstract As a next-generation wireless technology, in-band full-duplex (IBFD) transmission enables simultaneous transmission and reception of signals over the same frequency, thereby doubling spectral efficiency. Further, a continuous up-scaling of wireless network carrier frequencies arising from ever-increasing data traffic is driving research on integrated sensing and communications (ISAC) systems. In this context, we study the co-design of common waveforms, precoders, and filters for an IBFD multi-user (MU) multiple-input multiple-output (MIMO) communications system with a distributed MIMO radar. This paper, along with companion papers (Part I and II), proposes a comprehensive MIMO-radar-MIMO-communications (MRMC) framework that addresses all these challenges. In the companion papers, we developed signal processing and joint design algorithms for this distributed system. In this paper, we tackle multi-target detection, localization, and tracking. This co-design problem, which includes practical MU-MIMO constraints on power and quality-of-service, is highly non-convex. We propose a low-complexity procedure based on the Barzilai-Borwein gradient algorithm to obtain the design parameters and a mixed-integer linear program for distributed target localization. Numerical experiments demonstrate the feasibility and accuracy of multi-target sensing of the distributed FD ISAC system. Finally, we localize and track multiple targets by adapting the joint probabilistic data association and extended Kalman filter for this system.
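The Barzilai-Borwein (BB) rule picks the step size from the last two iterates, $\alpha_k = s^\top s / s^\top y$ with $s = x_k - x_{k-1}$ and $y = \nabla f(x_k) - \nabla f(x_{k-1})$. The sketch applies it to a toy convex quadratic; the paper's non-convex co-design objective and constraints are not reproduced, and the function name, the fallback step `alpha0` and the stopping rule are assumptions.

```python
def bb_gradient_descent(grad, x0, alpha0=1e-2, iters=100, tol=1e-10):
    """Gradient descent with the BB1 step size (s^T s) / (s^T y)."""
    x_prev = list(x0)
    g_prev = grad(x_prev)
    # The first step has no history, so it uses the fixed step size alpha0.
    x = [xi - alpha0 * gi for xi, gi in zip(x_prev, g_prev)]
    for _ in range(iters):
        g = grad(x)
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break
        s = [a - b for a, b in zip(x, x_prev)]  # change in iterate
        y = [a - b for a, b in zip(g, g_prev)]  # change in gradient
        sy = sum(a * b for a, b in zip(s, y))
        # Safeguard: fall back to alpha0 when the curvature estimate is not positive.
        alpha = sum(a * a for a in s) / sy if sy > 0 else alpha0
        x_prev, g_prev = x, g
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x
```

For example, on $f(x) = \tfrac12 x^\top A x - b^\top x$ with $A = \begin{pmatrix}3 & 1\\ 1 & 2\end{pmatrix}$ and $b = (1, 1)^\top$, passing the gradient $Ax - b$ drives the iterates to the solution $(0.2, 0.4)$ of $Ax = b$. BB steps are non-monotone, which is why practical variants often add a line search.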
Code Comparison Tuning for Code Large Language Models
Authors: Yufan Jiang, Qiaozhi He, Xiaomin Zhuang, Zhihua Wu
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2403.19121
Pdf link: https://arxiv.org/pdf/2403.19121
Abstract We present Code Comparison Tuning (CCT), a simple and effective tuning method for code large language models (Code LLMs) to better handle subtle code errors. Specifically, we integrate the concept of comparison into instruction tuning, both at the token and sequence levels, enabling the model to discern even the slightest deviations in code. To compare the original code with an erroneous version containing manually added code errors, we use token-level preference loss for detailed token-level comparisons. Additionally, we combine code segments to create a new instruction tuning sample for sequence-level comparisons, enhancing the model's bug-fixing capability. Experimental results on the HumanEvalFix benchmark show that CCT surpasses instruction tuning in pass@1 scores by up to 4 points across diverse code LLMs, and extensive analysis demonstrates the effectiveness of our method.
PoCo: A Self-Supervised Approach via Polar Transformation Based Progressive Contrastive Learning for Ophthalmic Disease Diagnosis
Authors: Jinhong Wang, Tingting Chen, Jintai Chen, Yixuan Wu, Yuyang Xu, Danny Chen, Haochao Ying, Jian Wu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2403.19124
Pdf link: https://arxiv.org/pdf/2403.19124
Abstract Automatic ophthalmic disease diagnosis on fundus images is important in clinical practice. However, due to complex fundus textures and limited annotated data, developing an effective automatic method for this problem is still challenging. In this paper, we present a self-supervised method via polar transformation based progressive contrastive learning, called PoCo, for ophthalmic disease diagnosis. Specifically, we inject the polar transformation into contrastive learning to 1) make contrastive pre-training faster and more stable and 2) naturally capture task-free and rotation-related textures, which provides insights into disease recognition on fundus images. As a benefit, a simple translation-invariant convolution on transformed images can equivalently replace the complex rotation-invariant and sector convolutions on raw images. After that, we develop a progressive contrastive learning method to efficiently utilize large sets of unannotated images and a novel progressive hard negative sampling scheme to gradually reduce the number of negative samples for efficient training and performance enhancement. Extensive experiments on three public ophthalmic disease datasets show that PoCo achieves state-of-the-art performance with good generalization ability, validating that our method can reduce annotation efforts and provide reliable diagnosis. Code is available at https://github.com/wjh892521292/PoCo.
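A minimal nearest-neighbour polar transformation of a square image about its centre; the sampling grid, the centre choice, the border clamping and the function name are assumptions, since the abstract does not specify them. In the polar image, a rotation of the raw image about the centre becomes (up to sampling error) a cyclic shift along the angle axis, which is why an ordinary translation-invariant convolution on the transformed image can stand in for rotation-aware convolutions on the raw one.

```python
import math


def polar_transform(image, n_radii, n_angles):
    """Nearest-neighbour polar warp of a square image (list of rows) about its centre.
    Output row index = radius step, column index = angle 2*pi*t/n_angles."""
    size = len(image)
    c = (size - 1) / 2.0  # image centre (pixel coordinates)
    max_r = c             # largest radius that stays inside the inscribed circle
    out = []
    for ri in range(n_radii):
        r = max_r * ri / max(n_radii - 1, 1)
        row = []
        for t in range(n_angles):
            theta = 2 * math.pi * t / n_angles
            # Round to the nearest pixel and clamp to the image bounds.
            x = min(max(int(round(c + r * math.cos(theta))), 0), size - 1)
            y = min(max(int(round(c + r * math.sin(theta))), 0), size - 1)
            row.append(image[y][x])
        out.append(row)
    return out
```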
QNCD: Quantization Noise Correction for Diffusion Models
Authors: Huanpeng Chu, Wei Wu, Chengjie Zang, Kun Yuan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2403.19140
Pdf link: https://arxiv.org/pdf/2403.19140
Abstract Diffusion models have revolutionized image synthesis, setting new benchmarks in quality and creativity. However, their widespread adoption is hindered by the intensive computation required during the iterative denoising process. Post-training quantization (PTQ) presents a solution to accelerate sampling, albeit at the expense of sample quality, especially in low-bit settings. Addressing this, our study introduces a unified Quantization Noise Correction Scheme (QNCD), aimed at diminishing quantization noise throughout the sampling process. We identify two primary quantization challenges: intra and inter quantization noise. Intra quantization noise, mainly exacerbated by embeddings in the resblock module, extends activation quantization ranges, increasing disturbances in each single denoising step. In addition, inter quantization noise stems from cumulative quantization deviations across the entire denoising process, altering data distributions step-by-step. QNCD combats these through embedding-derived feature smoothing for eliminating intra quantization noise and an effective runtime noise estimation module for dynamically filtering inter quantization noise. Extensive experiments demonstrate that our method outperforms previous quantization methods for diffusion models, achieving lossless results in W4A8 and W8A8 quantization settings on ImageNet (LDM-4). Code is available at: https://github.com/huanpengchu/QNCD
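QNCD's correction modules are not reproduced here. The sketch only illustrates the range problem the abstract describes, using generic symmetric per-tensor (min-max) quantization with hypothetical helper names: one large activation, standing in for an embedding-induced outlier, stretches the quantization scale and increases the rounding error of every other activation.

```python
def quantize_dequantize(values, n_bits=8):
    # Symmetric per-tensor uniform quantization: the scale is set by the largest magnitude.
    q_max = 2 ** (n_bits - 1) - 1
    scale = max(abs(v) for v in values) / q_max or 1.0  # guard against an all-zero tensor
    return [max(-q_max, min(q_max, round(v / scale))) * scale for v in values]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
```

For example, quantizing activations spread over [-0.5, 0.5] to 8 bits gives a small error, but appending a single value of 20.0 before quantizing multiplies the step size by 40 and the error on the original activations grows accordingly.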
MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation
Authors: Seyeon Kim, Siyoon Jin, Jihye Park, Kihong Kim, Jiyoung Kim, Jisu Nam, Seungryong Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2403.19144
Pdf link: https://arxiv.org/pdf/2403.19144
Abstract Conventional GAN-based models for talking head generation often suffer from limited quality and unstable training. Recent approaches based on diffusion models aimed to address these limitations and improve fidelity. However, they still face challenges, including long sampling times and difficulties in maintaining temporal consistency due to the high stochasticity of diffusion models. To overcome these challenges, we propose a novel motion-disentangled diffusion model for high-quality talking head generation, dubbed MoDiTalker. We introduce two modules: audio-to-motion (AToM), designed to generate synchronized lip motion from audio, and motion-to-video (MToV), designed to produce high-quality head video following the generated motion. AToM excels in capturing subtle lip movements by leveraging an audio attention mechanism. In addition, MToV enhances temporal consistency by leveraging an efficient tri-plane representation. Our experiments conducted on standard benchmarks demonstrate that our model achieves superior performance compared to existing models. We also provide comprehensive ablation studies and user study results.
Towards Understanding Dual BN In Hybrid Adversarial Training
Authors: Chenshuang Zhang, Chaoning Zhang, Kang Zhang, Axi Niu, Junmo Kim, In So Kweon
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2403.19150
Pdf link: https://arxiv.org/pdf/2403.19150
Abstract There is a growing concern about applying batch normalization (BN) in adversarial training (AT), especially when the model is trained on both adversarial samples and clean samples (termed Hybrid-AT). With the assumption that adversarial and clean samples are from two different domains, a common practice in prior works is to adopt Dual BN, where BN_adv and BN_clean are used for adversarial and clean branches, respectively. A popular belief for motivating Dual BN is that estimating normalization statistics of this mixture distribution is challenging and thus disentangling it for normalization achieves stronger robustness. In contrast to this belief, we reveal that disentangling statistics plays a lesser role than disentangling affine parameters in model training. This finding aligns with prior work (Rebuffi et al., 2023), and we build upon their research for further investigations. We demonstrate that the domain gap between adversarial and clean samples is not very large, which is counter-intuitive considering the significant influence of adversarial perturbation on the model accuracy. We further propose a two-task hypothesis which serves as the empirical foundation and a unified framework for Hybrid-AT improvement. We also investigate Dual BN at test time and reveal that affine parameters characterize the robustness during inference. Overall, our work sheds new light on understanding the mechanism of Dual BN in Hybrid-AT and its underlying justification.
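A single-channel, training-free sketch of the two Dual BN ingredients the abstract separates: normalization statistics and affine parameters (`gamma`, `beta`). The class name and the `share_stats` switch are hypothetical. With `share_stats=True`, both branches are normalized with statistics of the pooled batch while keeping their own affine parameters, the configuration the abstract suggests matters more. Running averages for inference are omitted.

```python
from statistics import fmean, pvariance


class DualBN1d:
    """One (gamma, beta) pair per branch; batch statistics either per branch
    (share_stats=False) or computed on the union of both batches (share_stats=True)."""

    def __init__(self, share_stats=False, eps=1e-5):
        self.share_stats = share_stats
        self.eps = eps
        self.affine = {"adv": [1.0, 0.0], "clean": [1.0, 0.0]}  # [gamma, beta]

    def __call__(self, clean, adv):
        if self.share_stats:
            pooled = clean + adv  # list concatenation: one mixed batch
            shared = (fmean(pooled), pvariance(pooled))
            stats = {"clean": shared, "adv": shared}
        else:
            stats = {"clean": (fmean(clean), pvariance(clean)),
                     "adv": (fmean(adv), pvariance(adv))}
        out = {}
        for name, batch in (("clean", clean), ("adv", adv)):
            mu, var = stats[name]
            gamma, beta = self.affine[name]
            out[name] = [gamma * (x - mu) / (var + self.eps) ** 0.5 + beta for x in batch]
        return out
```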
RecDiffusion: Rectangling for Image Stitching with Diffusion Models
Authors: Tianhao Zhou, Haipeng Li, Ziyi Wang, Ao Luo, Chen-Lin Zhang, Jiajun Li, Bing Zeng, Shuaicheng Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2403.19164
Pdf link: https://arxiv.org/pdf/2403.19164
Abstract Image stitching from different captures often results in non-rectangular boundaries, which are generally considered unappealing. To handle non-rectangular boundaries, current solutions involve cropping, which discards image content, inpainting, which can introduce unrelated content, or warping, which can distort non-linear features and introduce artifacts. To overcome these issues, we introduce a novel diffusion-based learning framework, RecDiffusion, for image stitching rectangling. This framework combines Motion Diffusion Models (MDM), which generate motion fields that effectively transition from the stitched image's irregular borders to a geometrically corrected intermediary, with Content Diffusion Models (CDM) for image detail refinement. Notably, our sampling process utilizes a weighted map to identify regions needing correction during each iteration of CDM. RecDiffusion ensures geometric accuracy and overall visual appeal, surpassing all previous methods in both quantitative and qualitative measures when evaluated on public benchmarks. Code is released at https://github.com/lhaippp/RecDiffusion.
Mining Bug Repositories for Multi-Fault Programs
Authors: Dylan Callaghan, Bernd Fischer
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2403.19171
Pdf link: https://arxiv.org/pdf/2403.19171
Abstract Datasets such as Defects4J and BugsInPy that contain bugs from real-world software projects are necessary for a realistic evaluation of automated debugging tools. However, these datasets largely identify only a single bug in each entry, while real-world software projects (including those used in Defects4J and BugsInPy) typically contain multiple bugs at the same time. We lift this limitation and describe an extension to these datasets in which multiple bugs are identified in individual entries. We use test case transplantation and fault location translation in order to expose and locate the bugs, respectively. We thus provide datasets of true multi-fault versions within real-world software projects, which maintain the properties and usability of the original datasets.
Ordering Collective Unit Tasks: from Scheduling to Computational Social Choice
Authors: Martin Durand, Fanny Pascual
Subjects: Computer Science and Game Theory (cs.GT)
Arxiv link: https://arxiv.org/abs/2403.19197
Pdf link: https://arxiv.org/pdf/2403.19197
Abstract We study the collective schedules problem, which consists of computing a one-machine schedule of a set of tasks, knowing that a set of individuals (also called voters) have preferences regarding the order of the execution of the tasks. Our aim is to return a consensus schedule. We consider the setting in which all tasks have the same length, so such a schedule can also be viewed as a ranking. We study two rules, one based on a distance criterion and another based on a binary criterion, and we show that these rules extend classic scheduling criteria. We also consider time constraints and precedence constraints between the tasks, and focus on two cases: the preferences of the voters fulfill these constraints, or they do not fulfill these constraints (but the collective schedule should fulfill them). In each case, we either show that the problem is NP-hard or provide a polynomial-time algorithm which solves it. We also provide an analysis of a heuristic, which appears to be a 2-approximation of Spearman's rule.
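For unit-length tasks a schedule is a ranking, so a distance-based rule can be sketched as choosing the order that minimizes the total distance to the voters' preferred orders. The Spearman footrule distance, the brute-force search and the function names are assumptions for illustration, feasible only for a handful of tasks; the paper's exact rules and its time and precedence constraints are not modelled.

```python
from itertools import permutations


def footrule(schedule, preference):
    # Spearman footrule: sum over tasks of |position in schedule - position in preference|.
    pos = {task: i for i, task in enumerate(preference)}
    return sum(abs(i - pos[task]) for i, task in enumerate(schedule))


def consensus_schedule(preferences):
    # Brute force over all orders of the unit tasks (tiny instances only).
    tasks = preferences[0]
    return min(permutations(tasks),
               key=lambda s: sum(footrule(s, p) for p in preferences))
```

For larger instances, the same footrule objective can be solved as a min-cost assignment of tasks to positions instead of enumerating all orders.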
Are Large Language Models Good at Utility Judgments?
Authors: Hengran Zhang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2403.19216
Pdf link: https://arxiv.org/pdf/2403.19216
Abstract Retrieval-augmented generation (RAG) is considered to be a promising approach to alleviate the hallucination issue of large language models (LLMs), and it has recently received widespread attention from researchers. Due to the limited semantic understanding of retrieval models, the success of RAG heavily relies on the ability of LLMs to identify passages with utility. Recent efforts have explored the ability of LLMs to assess the relevance of passages in retrieval, but there has been limited work on evaluating the utility of passages in supporting question answering. In this work, we conduct a comprehensive study of the capabilities of LLMs in utility evaluation for open-domain QA. Specifically, we introduce a benchmarking procedure and collection of candidate passages with different characteristics, facilitating a series of experiments with five representative LLMs. Our experiments reveal that: (i) well-instructed LLMs can distinguish between relevance and utility, and LLMs are highly receptive to newly generated counterfactual passages. Moreover, (ii) we scrutinize key factors that affect utility judgments in the instruction design. Finally, (iii) to verify the efficacy of utility judgments in practical retrieval augmentation applications, we delve into LLMs' QA capabilities using the evidence judged with utility and direct dense retrieval results. (iv) We propose a k-sampling, listwise approach to reduce the dependency of LLMs on the sequence of input passages, thereby facilitating subsequent answer generation. We believe that the way we formalize and study the problem along with our findings contributes to a critical assessment of retrieval-augmented LLMs. Our code and benchmark can be found at https://github.com/ict-bigdatalab/utility_judgments.
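A sketch of the k-sampling idea as described: repeat a listwise utility judgment on k random orderings of the candidate passages and aggregate, so that no single input order dominates. `judge` is a hypothetical callable standing in for an LLM prompt, and majority voting is an assumed aggregation rule, not necessarily the one used in the paper.

```python
import random
from collections import Counter


def k_sampling_utility(question, passages, judge, k=5, seed=0):
    """Run a listwise utility judgment k times on shuffled passage orders and keep the
    passages selected in a majority of runs. `judge(question, ordered_passages)` is a
    hypothetical callable returning the passages it deems useful."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(k):
        order = passages[:]
        rng.shuffle(order)  # a fresh input order for each judgment
        votes.update(set(judge(question, order)))  # at most one vote per passage per run
    return [p for p in passages if votes[p] > k / 2]
```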
Taming Lookup Tables for Efficient Image Retouching
Authors: Sidi Yang, Binxiao Huang, Mingdeng Cao, Yatai Ji, Hanzhong Guo, Ngai Wong, Yujiu Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2403.19238
Pdf link: https://arxiv.org/pdf/2403.19238
Abstract The widespread use of high-definition screens in edge devices, such as end-user cameras, smartphones, and televisions, is spurring significant demand for image enhancement. Existing enhancement models often optimize for high performance but fall short of reducing hardware inference time and power consumption, especially on edge devices with constrained computing and storage resources. To this end, we propose Image Color Enhancement Lookup Table (ICELUT), which adopts LUTs for extremely efficient edge inference without any convolutional neural network (CNN). During training, we leverage pointwise (1x1) convolution to extract color information, alongside a split fully connected layer to incorporate global information. Both components are then seamlessly converted into LUTs for hardware-agnostic deployment. ICELUT achieves near-state-of-the-art performance and remarkably low power consumption. We observe that the pointwise network structure exhibits robust scalability, maintaining performance even with a heavily downsampled 32x32 input image. These properties enable ICELUT, the first purely LUT-based image enhancer, to reach an unprecedented speed of 0.4 ms on GPU and 7 ms on CPU, at least an order of magnitude faster than any CNN solution. Code is available at https://github.com/Stephen0808/ICELUT.
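The idea of converting a pointwise network into a lookup table can be illustrated generically. A 1x1 convolution (plus any per-pixel activation) maps each RGB triple independently, so its outputs can be precomputed once on a quantized color grid and then applied by indexing. The sketch below assumes 8-bit RGB, a uniform grid with 2**bits levels per channel, and nearest-lower-bin lookup. `warm_tone` is a hypothetical stand-in for a trained pointwise network, and ICELUT's actual table layout (including its split fully connected global branch) is not reproduced here.

```python
from typing import Callable, Dict, List, Tuple

Pixel = Tuple[int, int, int]


def bake_lut(fn: Callable[[Pixel], Pixel], bits: int = 4) -> Dict[Pixel, Pixel]:
    """Precompute fn on a uniform grid with 2**bits levels per 8-bit channel."""
    step = 256 >> bits  # width of one quantization bin
    return {
        (r, g, b): fn((r, g, b))
        for r in range(0, 256, step)
        for g in range(0, 256, step)
        for b in range(0, 256, step)
    }


def apply_lut(lut: Dict[Pixel, Pixel], image: List[List[Pixel]], bits: int = 4) -> List[List[Pixel]]:
    """Replace each pixel by the entry of its lower grid cell: only quantize-and-index at inference."""
    shift = 8 - bits
    return [[lut[tuple((c >> shift) << shift for c in px)] for px in row] for row in image]


def warm_tone(px: Pixel) -> Pixel:
    """Hypothetical pointwise 'network': per-pixel channel gains, clipped to [0, 255]."""
    r, g, b = px
    return (min(255, round(1.1 * r)), g, max(0, round(0.9 * b)))


lut = bake_lut(warm_tone)
print(len(lut))  # 4096 entries for a 16 x 16 x 16 grid
print(apply_lut(lut, [[(16, 32, 48), (20, 40, 50)]]))  # [[(18, 32, 43), (18, 32, 43)]]
```

Practical LUT-based enhancers usually interpolate between neighboring grid points instead of snapping to the lower bin; trilinear interpolation reduces banding at the cost of eight table reads per pixel.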
RTracker: Recoverable Tracking via PN Tree Structured Memory
Authors: Yuqing Huang, Xin Li, Zikun Zhou, Yaowei Wang, Zhenyu He, Ming-Hsuan Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2403.19242
Pdf link: https://arxiv.org/pdf/2403.19242
Abstract Existing tracking methods mainly focus on learning better target representations or developing more robust prediction models to improve tracking performance. Although tracking performance has improved significantly, target loss still occurs frequently due to tracking failures, complete occlusion, or out-of-view situations. However, considerably less attention has been paid to the self-recovery ability of tracking methods, which is crucial for practical applications. To this end, we propose a recoverable tracking framework, RTracker, that uses a tree-structured memory to dynamically associate a tracker and a detector to enable self-recovery. Specifically, we propose a Positive-Negative (PN) tree-structured memory to chronologically store and maintain positive and negative target samples. Based on the PN tree memory, we develop corresponding walking rules for determining the state of the target and define a set of control flows to unite the tracker and the detector in different tracking scenarios. Our core idea is to use the support samples of positive and negative target categories to establish a relative distance-based criterion for a reliable assessment of target loss. Favorable performance compared with state-of-the-art methods on numerous challenging benchmarks demonstrates the effectiveness of the proposed algorithm.
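A relative distance-based criterion can be sketched as follows: compare a candidate's distance to the nearest positive support sample with its distance to the nearest negative support sample, and treat the target as present only when the positive side is closer. Euclidean distance on feature vectors, the nearest-sample rule, and the `margin` parameter are assumptions for illustration, not RTracker's published rule, and the PN tree itself is not modeled.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def nearest_distance(x: Vec, supports: Sequence[Vec]) -> float:
    """Euclidean distance from x to its closest support sample."""
    return min(math.dist(x, s) for s in supports)


def target_present(candidate: Vec, positives: Sequence[Vec], negatives: Sequence[Vec], margin: float = 1.0) -> bool:
    """Relative criterion: the candidate must be closer to the positive memory
    than margin times its distance to the negative memory."""
    d_pos = nearest_distance(candidate, positives)
    d_neg = nearest_distance(candidate, negatives)
    return d_pos < margin * d_neg


positives = [(0.0, 0.0), (1.0, 0.0)]  # features of confirmed target appearances
negatives = [(5.0, 5.0)]              # features of known distractors
print(target_present((0.5, 0.0), positives, negatives))  # True
print(target_present((5.0, 4.0), positives, negatives))  # False
```

One motivation for a relative rather than an absolute threshold is that it does not require choosing a fixed distance scale in advance for each sequence.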
Inferring Latent Temporal Sparse Coordination Graph for Multi-Agent Reinforcement Learning
Authors: Wei Duan, Jie Lu, Junyu Xuan
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2403.19253
Pdf link: https://arxiv.org/pdf/2403.19253
Abstract Effective agent coordination is crucial in cooperative Multi-Agent Reinforcement Learning (MARL). While agent cooperation can be represented by graph structures, prevailing graph learning methods in MARL are limited: they rely solely on one-step observations and neglect crucial historical experiences, leading to deficient graphs that foster redundant or detrimental information exchanges. Additionally, the high computational demands of action-pair calculations in dense graphs impede scalability. To address these challenges, we propose inferring a Latent Temporal Sparse Coordination Graph (LTS-CG) for MARL. The LTS-CG leverages agents' historical observations to calculate an agent-pair probability matrix, from which a sparse graph is sampled and used for knowledge exchange between agents, thereby simultaneously capturing agent dependencies and relation uncertainty. The computational complexity of this procedure depends only on the number of agents. This graph learning process is further augmented by two innovative characteristics: Predict-Future, which enables agents to foresee upcoming observations, and Infer-Present, which ensures a thorough grasp of the environmental context from limited data. These features allow LTS-CG to construct temporal graphs from historical and real-time information, promoting knowledge exchange during policy learning and effective collaboration. Graph learning and agent training occur simultaneously in an end-to-end manner. Results on the StarCraft II benchmark demonstrate LTS-CG's superior performance.
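Sampling a sparse graph from an agent-pair probability matrix can be shown in a few lines. Each unordered pair of agents gets an edge independently with its probability, so the work is one draw per pair (n(n-1)/2 draws for n agents) and depends only on the number of agents. The sketch below takes the matrix as given (in LTS-CG it is computed from historical observations, which is not modeled here), symmetrizes it by averaging, and uses plain Bernoulli draws. A trainable version would need a differentiable relaxation of these discrete draws, which this sketch omits; the function names are hypothetical.

```python
import random
from typing import List, Set, Tuple


def sample_sparse_graph(prob: List[List[float]], seed: int = 0) -> Set[Tuple[int, int]]:
    """Draw an undirected graph: edge (i, j) appears with probability
    0.5 * (prob[i][j] + prob[j][i]), independently for each pair i < j."""
    rng = random.Random(seed)
    n = len(prob)
    edges: Set[Tuple[int, int]] = set()
    for i in range(n):
        for j in range(i + 1, n):
            p = 0.5 * (prob[i][j] + prob[j][i])  # symmetrize the pair probability
            if rng.random() < p:
                edges.add((i, j))
    return edges


def neighbors(edges: Set[Tuple[int, int]], agent: int) -> List[int]:
    """Agents that `agent` exchanges messages with in the sampled graph."""
    return sorted(j if i == agent else i for i, j in edges if agent in (i, j))


P = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
g = sample_sparse_graph(P)
print(g, neighbors(g, 0))  # {(0, 1)} [1]
```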
NaijaHate: Evaluating Hate Speech Detection on Nigerian Twitter Using Representative Data
Authors: Manuel Tonneau, Pedro Vitor Quinta de Castro, Karim Lasri, Ibrahim Farouq, Lakshminarayanan Subramanian, Victor Orozco-Olvera, Samuel Fraiberger
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2403.19260
Pdf link: https://arxiv.org/pdf/2403.19260
Abstract To address the global issue of hateful content proliferating on online platforms, hate speech detection (HSD) models are typically developed on datasets collected in the United States, and therefore fail to generalize to English dialects from the Majority World. Furthermore, HSD models are often evaluated on curated samples, raising concerns about overestimating model performance in real-world settings. In this work, we introduce NaijaHate, the first dataset annotated for HSD that contains a representative sample of Nigerian tweets. We demonstrate that HSD evaluated on biased datasets traditionally used in the literature largely overestimates real-world performance on representative data. We also propose NaijaXLM-T, a pretrained model tailored to the Nigerian Twitter context, and establish the key role played by domain-adaptive pretraining and finetuning in maximizing HSD performance. Finally, we show that in this context, a human-in-the-loop approach to content moderation in which humans review 1% of Nigerian tweets flagged as hateful would make it possible to moderate 60% of all hateful content. Taken together, these results pave the way towards robust HSD systems and better protection of social media users from hateful content in low-resource settings.
DeepSample: DNN sampling-based testing for operational accuracy assessment
Authors: Antonio Guerriero, Roberto Pietrantuono, Stefano Russo
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2403.19271
Pdf link: https://arxiv.org/pdf/2403.19271
Abstract Deep Neural Networks (DNNs) are core components for classification and regression tasks in many software systems. Companies incur high costs for testing DNNs with datasets representative of the inputs expected in operation, as these need to be manually labelled. The challenge is to select a representative set of test inputs that is as small as possible, to reduce the labelling cost, while still yielding unbiased, high-confidence estimates of the expected DNN accuracy. At the same time, testers are interested in exposing as many DNN mispredictions as possible to improve the DNN, which creates the need for techniques pursuing a threefold aim: small dataset size, trustworthy estimates, and misprediction exposure. This study presents DeepSample, a family of DNN testing techniques for cost-effective accuracy assessment based on probabilistic sampling. We investigate whether, to what extent, and under which conditions probabilistic sampling can help to tackle the outlined challenge. We implement five new sampling-based testing techniques and perform a comprehensive comparison of these techniques and three further state-of-the-art techniques for both DNN classification and regression tasks. The results serve as guidance for the best use of sampling-based testing to obtain faithful, high-confidence estimates of DNN accuracy in operation at low cost.
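As a point of reference for sampling-based accuracy assessment, the simplest probabilistic scheme is simple random sampling: label n inputs drawn uniformly without replacement from the operational pool and report the observed accuracy with a normal-approximation confidence interval that includes the finite population correction. This is a baseline sketch, not one of DeepSample's five techniques; `is_correct` is a hypothetical stand-in for manually labelling an input and comparing the label with the DNN's prediction.

```python
import math
import random
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def estimate_accuracy(
    pool: Sequence[T],
    is_correct: Callable[[T], bool],
    n: int,
    z: float = 1.96,
    seed: int = 0,
) -> Tuple[float, float]:
    """Simple random sampling without replacement from the operational pool.

    Returns (estimated accuracy, half-width of an approximate CI; ~95% for z=1.96).
    """
    N = len(pool)
    if not 0 < n <= N:
        raise ValueError("sample size must satisfy 0 < n <= len(pool)")
    rng = random.Random(seed)
    sample = rng.sample(list(pool), n)  # the n inputs that would be manually labelled
    p_hat = sum(1 for x in sample if is_correct(x)) / n
    fpc = (N - n) / (N - 1) if N > 1 else 0.0  # finite population correction
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n * fpc)
    return p_hat, half_width


def oracle(x: int) -> bool:
    """Toy ground truth: the DNN is wrong on every tenth input (90% accuracy)."""
    return x % 10 != 0


pool = list(range(1000))
print(estimate_accuracy(pool, oracle, n=1000))  # (0.9, 0.0): a full census has no sampling error
print(estimate_accuracy(pool, oracle, n=200))
```

Schemes that oversample inputs likely to be mispredicted expose more failures for the same labelling budget, but they then need weighting (for example, a Horvitz-Thompson estimator) to keep the accuracy estimate unbiased.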
Enhanced Bayesian Personalized Ranking for Robust Hard Negative Sampling in Recommender Systems
Authors: Kexin Shi, Jing Zhang, Linjiajie Fang, Wenjia Wang, Bingyi Jing
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2403.19276
Pdf link: https://arxiv.org/pdf/2403.19276
Abstract In implicit collaborative filtering, hard negative mining techniques are developed to accelerate and enhance recommendation model learning. However, the inadvertent selection of false negatives remains a major concern in hard negative sampling, as these false negatives can provide incorrect information and mislead model learning. To date, only a small number of studies have addressed the false negative problem, primarily by designing sophisticated sampling algorithms to filter out false negatives. In contrast, this paper focuses on refining the loss function. We find that the original Bayesian Personalized Ranking (BPR) objective, initially designed for uniform negative sampling, is inadequate in hard sampling scenarios. Hence, we introduce an enhanced Bayesian Personalized Ranking objective, named Hard-BPR, which is specifically crafted for dynamic hard negative sampling to mitigate the influence of false negatives. This method is simple yet efficient for real-world deployment. Extensive experiments on three real-world datasets demonstrate the effectiveness and robustness of our approach, along with its enhanced ability to distinguish false negatives.
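For reference, the original BPR objective that Hard-BPR refines maximizes the log-probability that a user scores an observed item i above a sampled negative j, i.e. it minimizes -ln σ(x_ui - x_uj). The sketch below pairs that standard loss with a simple dynamic hard-negative step (choose the highest-scoring candidate negative) to show the setting in which false negatives cause trouble. Dot-product scoring and the helper names are assumptions, and Hard-BPR's modified loss is not reproduced.

```python
import math
from typing import Dict, Sequence

Vec = Sequence[float]


def score(user: Vec, item: Vec) -> float:
    """Dot-product preference score x_ui (an assumed scoring model)."""
    return sum(a * b for a, b in zip(user, item))


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def hardest_negative(user: Vec, candidates: Dict[str, Vec]) -> str:
    """Dynamic hard negative sampling: the unobserved item the model currently ranks highest."""
    return max(candidates, key=lambda k: score(user, candidates[k]))


def bpr_loss(user: Vec, pos: Vec, neg: Vec) -> float:
    """Standard BPR loss -ln sigmoid(x_ui - x_uj) for one (u, i, j) triple."""
    return -log_sigmoid(score(user, pos) - score(user, neg))


user = (1.0, 0.0)
candidates = {"a": (0.2, 0.0), "b": (0.9, 0.0)}
neg = hardest_negative(user, candidates)
print(neg)  # b
print(bpr_loss(user, (1.0, 0.0), candidates[neg]))
```

If item "b" is actually something the user would like (a false negative), the hard-negative step keeps pushing its score down, which is the failure mode described in the abstract.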
Fine-Tuning Language Models with Reward Learning on Policy
Authors: Hao Lang, Fei Huang, Yongbin Li
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2403.19279
Pdf link: https://arxiv.org/pdf/2403.19279
Abstract Reinforcement learning from human feedback (RLHF) has emerged as an effective approach to aligning large language models (LLMs) with human preferences. RLHF consists of three steps, i.e., human preference collection, reward learning, and policy optimization, which are usually performed serially. Despite its popularity, however, (fixed) reward models may become inaccurate off-distribution, since policy optimization continuously shifts LLMs' data distribution. Repeatedly collecting new preference data from the latest LLMs may alleviate this issue, but unfortunately makes the resulting system more complicated and difficult to optimize. In this paper, we propose reward learning on policy (RLP), an unsupervised framework that refines a reward model using policy samples to keep it on-distribution. Specifically, an unsupervised multi-view learning method is introduced to learn robust representations of policy samples. Meanwhile, a synthetic preference generation approach is developed to simulate high-quality preference data with policy outputs. Extensive experiments on three benchmark datasets show that RLP consistently outperforms the state of the art. Our code is available at https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/rlp.
Adaptive optimization of isogeometric multi-patch discretizations using artificial neural networks
Authors: Dany Rios, Felix Scholz, Thomas Takacs
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2403.19286
Pdf link: https://arxiv.org/pdf/2403.19286
Abstract In isogeometric analysis, isogeometric function spaces are employed for accurately representing the solution to a partial differential equation (PDE) on a parameterized domain. They are generated from a tensor-product spline space by composing the basis functions with the inverse of the parameterization. Depending on the geometry of the domain and on the data of the PDE, the solution might not have maximum Sobolev regularity, leading to a reduced convergence rate. In this case, it is necessary to reduce the local mesh size close to the singularities. The classical approach is to perform adaptive h-refinement, which leads either to an unnecessarily large number of degrees of freedom or to a spline space that does not possess a tensor-product structure. Based on the concept of r-adaptivity, we present a novel approach for finding a suitable isogeometric function space for a given PDE without sacrificing the tensor-product structure of the underlying spline space. In particular, we use the fact that different reparameterizations of the same computational domain lead to different isogeometric function spaces while preserving the geometry. Starting from a multi-patch domain consisting of bilinearly parameterized patches, we aim to find the biquadratic multi-patch parameterization that leads to the isogeometric function space with the smallest best-approximation error of the solution. In order to estimate the location of the optimal control points, we employ a trained residual neural network that is applied to the graph surfaces of the approximated solution and its derivatives. In our experiments, we observe that our new method greatly reduces the approximation error for different PDE problems on multi-patch domains.
Graph Neural Networks for Treatment Effect Prediction
Authors: George Panagopoulos, Daniele Malitesta, Fragkiskos D. Malliaros, Jun Pang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Methodology (stat.ME)
Arxiv link: https://arxiv.org/abs/2403.19289
Pdf link: https://arxiv.org/pdf/2403.19289
Abstract Estimating causal effects in e-commerce tends to involve costly treatment assignments, which can be impractical in large-scale settings. Leveraging machine learning to predict such treatment effects without actual intervention is a standard practice to diminish the risk. However, existing methods for treatment effect prediction tend to rely on training sets of substantial size, which are built from real experiments and are thus inherently risky to create. In this work, we propose a graph neural network to reduce the required training set size, relying on graphs that are common in e-commerce data. Specifically, we view the problem as node regression with a restricted number of labeled instances, develop a two-model neural architecture akin to previous causal effect estimators, and test varying message-passing layers for encoding. Furthermore, as an extra step, we combine the model with an acquisition function to guide the creation of the training set in settings with extremely low experimental budgets. The framework is flexible, since each step can be used separately with other models or policies. Experiments on real large-scale networks indicate a clear advantage of our methodology over the state of the art, which in many cases performs close to random, underlining the need for models that can generalize from limited labeled samples to reduce experimental risks.
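The "two-model" structure can be read as the familiar T-learner pattern (our interpretation, not confirmed by the abstract): fit one outcome model on treated units and another on control units, then estimate a unit's treatment effect as the difference of the two predictions. The sketch uses one-feature least-squares lines purely for illustration; in the paper each model would instead be a graph neural network with message-passing encoders, and the acquisition-function step is not shown. Function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def t_learner(data: List[Tuple[float, int, float]]) -> Callable[[float], float]:
    """data holds (feature, treatment in {0, 1}, observed outcome) triples.
    Returns a function estimating the effect mu1(x) - mu0(x)."""
    treated = [(x, y) for x, t, y in data if t == 1]
    control = [(x, y) for x, t, y in data if t == 0]
    a1, b1 = fit_line(*zip(*treated))  # outcome model for treated units
    a0, b0 = fit_line(*zip(*control))  # outcome model for control units
    return lambda x: (a1 * x + b1) - (a0 * x + b0)


# Toy data: control outcome 2x + 1, treated outcome 3x + 4, so the true effect is x + 3.
data = ([(x, 0, 2 * x + 1) for x in (0.0, 1.0, 2.0, 3.0)]
        + [(x, 1, 3 * x + 4) for x in (0.0, 1.0, 2.0, 3.0)])
effect = t_learner(data)
print(effect(2.0))  # 5.0
```

Because each model sees only its own treatment group, the labeled set is effectively split in two, which is one reason small training sets are a bottleneck for this family of estimators.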
MRNaB: Mixed Reality-based Robot Navigation Interface using Optical-see-through MR-beacon
Authors: Eduardo Iglesius, Masato Kobayashi, Yuki Uranishi, Haruo Takemura
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2403.19310
Pdf link: https://arxiv.org/pdf/2403.19310
Abstract Recent advancements in robotics have led to the development of numerous interfaces to enhance the intuitiveness of robot navigation. However, the reliance on traditional 2D displays limits the simultaneous visualization of information. Mixed Reality (MR) technology addresses this issue by enhancing the dimensionality of information visualization, allowing users to perceive multiple pieces of information concurrently. This paper proposes a Mixed Reality-based robot navigation interface using an optical-see-through MR-beacon (MRNaB), a novel approach that incorporates an MR-beacon, situated atop the real-world environment, to function as a signal transmitter for robot navigation. This MR-beacon is designed to be persistent, eliminating the need for repeated navigation inputs for the same location. Our system is built around four primary functions: "Add", "Move", "Delete", and "Select". These allow for adding an MR-beacon, moving its location, deleting it, and selecting an MR-beacon for navigation, respectively. The effectiveness of the proposed method was then validated through experiments comparing it with the traditional 2D system. As a result, MRNaB was shown to improve user performance, both subjectively and objectively, when navigating to a given location. For additional material, please see: https://mertcookimg.github.io/mrnab
Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction
Authors: Xiaoyang Lyu, Chirui Chang, Peng Dai, Yang-tian Sun, Xiaojuang Qi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2403.19314
Pdf link: https://arxiv.org/pdf/2403.19314
Abstract Scene reconstruction from multi-view images is a fundamental problem in computer vision and graphics. Recent neural implicit surface reconstruction methods have achieved high-quality results; however, editing and manipulating the 3D geometry of reconstructed scenes remains challenging due to the absence of naturally decomposed object entities and complex object/background compositions. In this paper, we present Total-Decom, a novel method for decomposed 3D reconstruction with minimal human interaction. Our approach seamlessly integrates the Segment Anything Model (SAM) with hybrid implicit-explicit neural surface representations and a mesh-based region-growing technique for accurate 3D object decomposition. Total-Decom requires minimal human annotations while providing users with real-time control over the granularity and quality of decomposition. We extensively evaluate our method on benchmark datasets and demonstrate its potential for downstream applications, such as animation and scene editing. The code is available at https://github.com/CVMI-Lab/Total-Decom.git.
Beyond Borders: Investigating Cross-Jurisdiction Transfer in Legal Case Summarization
Authors: T.Y.S.S Santosh, Vatsal Venkatkrishna, Saptarshi Ghosh, Matthias Grabmair
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2403.19317
Pdf link: https://arxiv.org/pdf/2403.19317
Abstract Legal professionals face the challenge of managing an overwhelming volume of lengthy judgments, making automated legal case summarization crucial. However, prior approaches mainly focused on training and evaluating these models within the same jurisdiction. In this study, we explore the cross-jurisdictional generalizability of legal case summarization models. Specifically, we explore how to effectively summarize legal cases of a target jurisdiction where reference summaries are not available. In particular, we investigate whether supplementing models with an unlabeled target-jurisdiction corpus and extractive silver summaries obtained from unsupervised algorithms on target data enhances transfer performance. Our comprehensive study on three datasets from different jurisdictions highlights the role of pre-training in improving transfer performance. We shed light on the pivotal influence of jurisdictional similarity in selecting optimal source datasets for effective transfer. Furthermore, our findings underscore that incorporating unlabeled target data yields improvements in general pre-trained models, with additional gains when silver summaries are introduced. This augmentation is especially valuable when dealing with extractive datasets and scenarios featuring limited alignment between source and target jurisdictions. Our study provides key insights for developing adaptable legal case summarization systems that transcend jurisdictional boundaries.
Learning a Formally Verified Control Barrier Function in Stochastic Environment
Authors: Manan Tayal, Hongchao Zhang, Pushpak Jagtap, Andrew Clark, Shishir Kolathaya
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2403.19332
Pdf link: https://arxiv.org/pdf/2403.19332
Abstract Safety is a fundamental requirement of control systems. Control Barrier Functions (CBFs) have been proposed to ensure the safety of control systems by constructing safety filters or synthesizing control inputs. However, the safety guarantees and performance of safe controllers rely on the construction of valid CBFs. Inspired by the universal approximation property of neural networks, CBFs have been represented by neural networks, known as neural CBFs (NCBFs). This paper presents an algorithm for synthesizing formally verified continuous-time neural Control Barrier Functions in stochastic environments in a single step. The proposed training process ensures efficacy across the entire state space with only a finite number of data points by constructing a sample-based learning framework for Stochastic Neural CBFs (SNCBFs). Our methodology eliminates the need for post hoc verification by enforcing Lipschitz bounds on the neural network, its Jacobian, and its Hessian. We demonstrate the effectiveness of our approach through case studies on an inverted pendulum system and obstacle avoidance in autonomous driving, showing larger safe regions than baseline methods.
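For context, a common form of the stochastic CBF condition is stated below for Itô dynamics with control-affine drift. The notation (f, g, σ, W, α) is ours, α is assumed to be an extended class-K function, and the paper's exact condition may differ. Because Itô's formula adds a second-order term, the condition involves both the gradient and the Hessian of h, which is consistent with the abstract's emphasis on bounding the network, its Jacobian, and its Hessian.

```latex
dx = \big(f(x) + g(x)\,u\big)\,dt + \sigma(x)\,dW_t, \qquad
\mathcal{C} = \{\, x : h(x) \ge 0 \,\},

\sup_{u}\Big[\frac{\partial h}{\partial x}\big(f(x) + g(x)\,u\big)
  + \tfrac{1}{2}\operatorname{tr}\Big(\sigma(x)^{\top}\,\frac{\partial^{2} h}{\partial x^{2}}\,\sigma(x)\Big)\Big]
  \;\ge\; -\alpha\big(h(x)\big) \qquad \text{for all } x \in \mathcal{C}.
```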
Test-Time Domain Generalization for Face Anti-Spoofing
Authors: Qianyu Zhou, Ke-Yue Zhang, Taiping Yao, Xuequan Lu, Shouhong Ding, Lizhuang Ma
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2403.19334
Pdf link: https://arxiv.org/pdf/2403.19334
Abstract Face Anti-Spoofing (FAS) is pivotal in safeguarding facial recognition systems against presentation attacks. While domain generalization (DG) methods have been developed to enhance FAS performance, they predominantly focus on learning domain-invariant features during training, which may not guarantee generalizability to unseen data that differs substantially from the source distributions. Our insight is that testing data can serve as a valuable resource for enhancing generalizability in DG FAS, beyond mere evaluation. In this paper, we introduce a novel Test-Time Domain Generalization (TTDG) framework for FAS, which leverages the testing data to boost the model's generalizability. Our method, consisting of Test-Time Style Projection (TTSP) and Diverse Style Shifts Simulation (DSSS), effectively projects the unseen data to the seen domain space. In particular, we first introduce TTSP to project the styles of arbitrary unseen samples of the testing distribution to the known source space of the training distributions. We then design the efficient DSSS to synthesize diverse style shifts via learnable style bases with two specifically designed losses in a hyperspherical feature space. Our method eliminates the need for model updates at test time and can be seamlessly integrated into both CNN and ViT backbones. Comprehensive experiments on widely used cross-domain FAS benchmarks demonstrate our method's state-of-the-art performance and effectiveness.
IVLMap: Instance-Aware Visual Language Grounding for Consumer Robot Navigation
Authors: Jiacui Huang, Hongtao Zhang, Mingbo Zhao, Zhou Wu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2403.19336
Pdf link: https://arxiv.org/pdf/2403.19336
Abstract Vision-and-Language Navigation (VLN) is a challenging task that requires a robot to navigate in photo-realistic environments following natural language instructions from humans. Recent studies aim to handle this task by constructing a semantic spatial map representation of the environment and then leveraging the strong reasoning ability of large language models to generate code for guiding robot navigation. However, these methods face limitations in instance-level and attribute-level navigation tasks, as they cannot distinguish different instances of the same object. To address this challenge, we propose a new method, namely, Instance-aware Visual Language Map (IVLMap), to empower the robot with instance-level and attribute-level semantic mapping. The map is autonomously constructed by fusing the RGBD video data collected from the robot agent with specially designed natural language map indexing in the bird's-eye view. Such indexing is instance-level and attribute-level. In particular, when integrated with a large language model, IVLMap demonstrates the capability to i) transform natural language into navigation targets with instance and attribute information, enabling precise localization, and ii) accomplish zero-shot end-to-end navigation tasks based on natural language commands. Extensive navigation experiments are conducted. Simulation results illustrate that our method can achieve an average improvement of 14.4% in navigation accuracy. Code and demo are released at https://ivlmap.github.io/.
A diverse Multilingual News Headlines Dataset from around the World
Authors: Felix Leeb, Bernhard Schölkopf
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2403.19352
Pdf link: https://arxiv.org/pdf/2403.19352
Abstract Babel Briefings is a novel dataset featuring 4.7 million news headlines from August 2020 to November 2021, across 30 languages and 54 locations worldwide, with English translations of all articles included. Designed for natural language processing and media studies, it serves as a high-quality dataset for training or evaluating language models and offers a simple, accessible collection of articles, for example, to analyze global news coverage and cultural narratives. As a simple demonstration of the analyses facilitated by this dataset, we apply a basic procedure based on a TF-IDF weighted similarity metric to group articles into clusters about the same event. We then visualize the event signatures of each event, showing in which languages articles appear over time and revealing intuitive features based on the proximity and unexpectedness of the event. The dataset is available on Kaggle (https://www.kaggle.com/datasets/felixludos/babel-briefings) and HuggingFace (https://huggingface.co/datasets/felixludos/babel-briefings), with accompanying code on GitHub (https://github.com/felixludos/babel-briefings).
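The clustering step can be approximated with a short script: weight each headline's terms by TF-IDF, compute the cosine similarity between headlines, and link any pair above a threshold, taking connected components as event clusters. Whitespace tokenization, the log(n / df) IDF variant, the 0.2 threshold, and single-link grouping via union-find are all assumptions for this sketch; the paper's procedure may differ, and the event-signature visualization is not shown.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def tfidf_vectors(docs: List[str]) -> List[Vector]:
    """Sparse TF-IDF vectors with raw term counts and idf = log(n / df)."""
    tokenized = [d.lower().split() for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    return [{t: c * math.log(n / df[t]) for t, c in Counter(toks).items()} for toks in tokenized]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_headlines(docs: List[str], threshold: float = 0.2) -> List[List[int]]:
    """Link pairs with cosine >= threshold; return connected components (single link)."""
    vecs = tfidf_vectors(docs)
    parent = list(range(len(docs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


headlines = [
    "Earthquake strikes Turkey",
    "Turkey earthquake toll rises",
    "Central bank raises interest rates",
    "Bank raises rates again",
]
print(cluster_headlines(headlines))  # [[0, 1], [2, 3]]
```

Comparing every pair is quadratic in the number of headlines, so at the scale of millions of articles one would at least restrict comparisons to a time window or use an inverted index over shared terms.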
A Software-Defined Networking Solution for Interconnecting Network Functions in Service-Based Architectures
Authors: Pablo Fondo-Ferreiro, Felipe Gil-Castiñeira, Francisco Javier González-Castaño, David Candal-Ventureira
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2403.19353
Pdf link: https://arxiv.org/pdf/2403.19353
Abstract Mobile core networks handle critical control functions for delivering services in modern cellular networks. Traditional point-to-point architectures, where network functions are directly connected through standardized interfaces, are being replaced by service-based architectures (SBAs), where core functionalities are finer-grained microservices decoupled from the underlying infrastructure. In this way, network functions and services can be distributed, with scaling and fail-over mechanisms, and can be dynamically deployed, updated, or removed to support slicing. A myriad of network functions can be deployed or removed according to traffic flows, thereby increasing the complexity of connection management. In this context, 3GPP Release 16 defines the service communication proxy (SCP) as a unified communication interface for a set of network functions. In this paper, we propose a novel software-defined networking (SDN)-based solution that plays the same role for a service mesh architecture in which network functions can be deployed anywhere in the infrastructure. We demonstrate its efficiency in comparison with alternative architectures.
A robust two-level overlapping preconditioner for Darcy flow in high-contrast media
Authors: Changqing Ye, Shubin Fu, Eric T. Chung, Jizu Huang
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2403.19356
Pdf link: https://arxiv.org/pdf/2403.19356
Abstract In this article, a two-level overlapping domain decomposition preconditioner is developed for solving linear algebraic systems obtained from simulating Darcy flow in high-contrast media. Our preconditioner starts from a mixed finite element method for discretizing the partial differential equation given by Darcy's law with the no-flux boundary condition, followed by a velocity elimination technique that yields a linear algebraic system with only pressure unknowns. Our main objective is then to design a robust and efficient domain decomposition preconditioner for this system, which is accomplished by engineering a multiscale coarse space that is capable of characterizing high-contrast features of the permeability field. A generalized eigenvalue problem is solved in each non-overlapping coarse element in a communication-free manner to form the global solver, which is accompanied by local solvers derived from additive Schwarz methods but with a non-Galerkin discretization to obtain the two-level preconditioner. We provide a rigorous analysis showing that the condition number of the preconditioned system is bounded above under several assumptions. Extensive numerical experiments with various types of three-dimensional high-contrast models are presented. In particular, we study the robustness against the contrast of the media as well as the influence of the number of eigenfunctions, oversampling sizes, and subdomain partitions on the efficiency of the proposed preconditioner. Strong and weak scalability are also examined.
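For orientation, the generic shape of a two-level additive Schwarz preconditioner with a spectral coarse space is written out below. The symbols are ours, and two choices are assumptions: the Galerkin coarse operator A_0 = R_0 A R_0^T and the particular form of the local eigenproblem (S^{(K)} is a hypothetical weighted mass-type matrix). The abstract states that the local solvers use a non-Galerkin discretization, so the local operators \tilde{A}_i are deliberately not written as R_i A R_i^T. R_i restricts a pressure vector to the i-th overlapping subdomain, and the columns of R_0^T are built from the eigenvectors associated with the smallest eigenvalues in each coarse element K; since each eigenproblem involves only one element, they can be solved independently.

```latex
M^{-1} = R_0^{\top} A_0^{-1} R_0 + \sum_{i=1}^{N} R_i^{\top} \tilde{A}_i^{-1} R_i,
\qquad A_0 = R_0 A R_0^{\top},

A^{(K)} \phi_j^{(K)} = \lambda_j^{(K)} S^{(K)} \phi_j^{(K)}
\qquad \text{in each coarse element } K.
```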
Coordinated Allocation of Radio Resources to Wi-Fi and Cellular Technologies in Shared Unlicensed Frequencies
Authors: David Candal-Ventureira, Francisco Javier González-Castaño, Felipe Gil-Castiñeira, Pablo Fondo-Ferreiro
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2403.19359
Pdf link: https://arxiv.org/pdf/2403.19359
Abstract Wireless connectivity is essential for industrial production processes and workflow management. Moreover, the connectivity requirements of industrial devices, which are usually long-term investments, are diverse and require different radio interfaces. In this regard, the 3GPP has studied how to support heterogeneous radio access technologies (RATs) such as Wi-Fi and unlicensed cellular technologies in 5G core networks. In some cases, these technologies coexist in the same spectrum. Dynamic spectrum sharing (DSS), which has already been proven to increase spectrum efficiency in licensed bands, can also be applied to this scenario. In this paper, we propose two solutions for mobile network operators (MNOs) or service providers to dynamically divide (multiplex) the radio resources of a shared channel between a Wi-Fi basic service set (BSS) and one or several carriers of scheduled wireless networks, such as cellular technologies, with a configurable level of sharing granularity. These solutions do not require modifications to the current commercial off-the-shelf (COTS) end devices. We adapt the existing IEEE 802.11 procedures to notify the Wi-Fi stations that they must share channels with different access networks. We demonstrate that our dynamic sharing proposals are also advantageous over direct coexistence and evaluate each of them quantitatively and qualitatively to determine when one or the other is preferable. The evaluation is particularized for IEEE 802.11ac and long-term evolution (LTE) license assisted access (LAA), but the solutions can be easily extended to 5G new radio-unlicensed (5G NR-U) or to any other wireless technology in which the network side schedules end device transmissions.
EthioMT: Parallel Corpus for Low-resource Ethiopian Languages
Authors: Atnafu Lambebo Tonja, Olga Kolesnikova, Alexander Gelbukh, Jugal Kalita
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2403.19365
Pdf link: https://arxiv.org/pdf/2403.19365
Abstract Recent research in natural language processing (NLP) has achieved impressive performance in tasks such as machine translation (MT), news classification, and question answering in high-resource languages. However, MT performance in low-resource languages leaves much to be desired, because the available parallel corpora for these languages are small, if they exist at all. NLP in Ethiopian languages suffers from the same issue, since publicly accessible datasets for NLP tasks, including MT, are unavailable. To help the research community and foster research on Ethiopian languages, we introduce EthioMT, a new parallel corpus for 15 languages. We also create a new benchmark by collecting a dataset for better-researched languages in Ethiopia. We evaluate the newly collected corpus and the benchmark dataset for 23 Ethiopian languages using transformer and fine-tuning approaches.
A noise-tolerant, resource-saving probabilistic binary neural network implemented by the SOT-MRAM compute-in-memory system
Authors: Yu Gu, Puyang Huang, Tianhao Chen, Chenyi Fu, Aitian Chen, Shouzhong Peng, Xixiang Zhang, Xufeng Kou
Subjects: Emerging Technologies (cs.ET); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2403.19374
Pdf link: https://arxiv.org/pdf/2403.19374
Abstract We report a spin-orbit torque (SOT) magnetoresistive random-access memory (MRAM)-based probabilistic binary neural network (PBNN) for resource-saving and hardware-noise-tolerant computing applications. In the presence of thermal fluctuation, the non-destructive SOT-driven magnetization switching characteristics lead to a random weight matrix with a controllable probability distribution. Meanwhile, the proposed compute-in-memory (CIM) architecture allows the probabilistic vector-matrix multiplication (PVMM) and binarization to be executed concurrently. Furthermore, by leveraging the effectiveness of random binary cells in propagating multi-bit probabilistic information, our SOT-MRAM-based PBNN system achieves 97.78% classification accuracy under a 7.01% weight variation on the MNIST database through 10 sampling cycles, and the number of bit-level computation operations is reduced by a factor of 6.9 compared with the full-precision LeNet-5 network. Our work provides a compelling framework for the design of reliable neural networks tailored to applications with low power consumption and limited computational resources.
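To make the sampling idea concrete, here is a minimal NumPy sketch of a probabilistic binary vector-matrix product: each weight is +1 with probability p (and -1 otherwise), and the output is averaged over a fixed number of sampling cycles, so its expectation equals x @ (2p - 1). The function name, the ±1 encoding, and the plain averaging are assumptions for illustration; the SOT-MRAM device physics and the paper's CIM circuit are not modeled.

```python
import numpy as np

def probabilistic_binary_vmm(x, p_plus, n_samples=10, rng=None):
    """Average x @ W over n_samples draws of a random binary weight matrix W.

    x:      (n_in,) input vector.
    p_plus: (n_in, n_out) probability that each weight equals +1 (otherwise -1).
    In expectation the result equals x @ (2 * p_plus - 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    acc = np.zeros(p_plus.shape[1])
    for _ in range(n_samples):
        # One sampling cycle: draw every binary weight independently.
        w = np.where(rng.random(p_plus.shape) < p_plus, 1.0, -1.0)
        acc += x @ w
    return acc / n_samples

rng = np.random.default_rng(0)
x = np.array([1.0, -0.5, 2.0])
p_plus = np.array([[0.9, 0.2], [0.5, 0.5], [0.7, 0.1]])
print(probabilistic_binary_vmm(x, p_plus, n_samples=10, rng=rng))  # noisy estimate
print(x @ (2 * p_plus - 1))                                        # its expectation
```

More sampling cycles trade extra operations for a lower-variance estimate of the expected product.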
Clustering MOOC Programming Solutions to Diversify Their Presentation to Students
Authors: Elizaveta Artser, Anastasiia Birillo, Yaroslav Golubev, Maria Tigina, Hieke Keuning, Nikolay Vyahhi, Timofey Bryksin
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2403.19398
Pdf link: https://arxiv.org/pdf/2403.19398
Abstract In many MOOCs, whenever a student completes a programming task, they can see other students' previous solutions to find potentially different ways of solving the problem and to learn new coding constructs. However, many MOOCs simply show the most recent solutions, disregarding their diversity or quality. To solve this novel problem, we adapted the existing plagiarism detection tool JPlag to Python submissions on Hyperskill, a popular MOOC platform. However, due to the tool's inner algorithm, it fully processed only 46 of the 867 studied tasks. Therefore, we developed our own tool called Rhubarb. This tool first standardizes solutions that are algorithmically the same, then calculates the structure-aware edit distance between them, and then applies clustering. Finally, it selects one example from each of the largest clusters, taking their code quality into account. Rhubarb was able to handle all 867 tasks successfully. We compared the approaches on a set of 59 tasks that both tools could process. Eight experts rated the selected solutions based on diversity, code quality, and usefulness. The platform's default approach of selecting recent submissions received on average 3.12 out of 5, JPlag 3.77, and Rhubarb 3.50. Since a real MOOC must process every task, we created a system that uses JPlag on the 5.3% of tasks it fully processes and Rhubarb on the remaining 94.7%.
Scaling up ridge regression for brain encoding in a massive individual fMRI dataset
Authors: Sana Ahmadi, Pierre Bellec, Tristan Glatard
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC); Quantitative Methods (q-bio.QM)
Arxiv link: https://arxiv.org/abs/2403.19421
Pdf link: https://arxiv.org/pdf/2403.19421
Abstract Brain encoding with neuroimaging data is an established analysis aimed at predicting human brain activity directly from complex stimuli features such as movie frames. Typically, these features are the latent space representation from an artificial neural network, and the stimuli are image, audio, or text inputs. Ridge regression is a popular prediction model for brain encoding due to its good out-of-sample generalization performance. However, training a ridge regression model can be highly time-consuming when dealing with large-scale deep functional magnetic resonance imaging (fMRI) datasets that include many space-time samples of brain activity. This paper evaluates different parallelization techniques to reduce the training time of brain encoding with ridge regression on the CNeuroMod Friends dataset, one of the largest deep fMRI resources currently available. With multi-threading, our results show that the Intel Math Kernel Library (MKL) significantly outperforms the OpenBLAS library, being 1.9 times faster using 32 threads on a single machine. We then evaluated the Dask multi-CPU implementation of ridge regression readily available in scikit-learn (MultiOutput), and we proposed a new "batch" version of Dask parallelization, motivated by a time complexity analysis. In line with our theoretical analysis, MultiOutput parallelization was found to be impractical, i.e., slower than multi-threading on a single machine. In contrast, the Batch-MultiOutput regression scaled well across compute nodes and threads, providing speed-ups of up to 33 times with 8 compute nodes and 32 threads compared to a single-threaded scikit-learn execution. Batch parallelization using Dask thus emerges as a scalable approach for brain encoding with ridge regression on high-performance computing systems using scikit-learn and large fMRI datasets.
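The batching idea can be illustrated with a minimal NumPy sketch, assuming closed-form ridge regression with a single shared regularization strength alpha: because every target column (e.g., a voxel) shares the same design matrix X, targets can be split into batches and each batch solved with one linear solve, giving the same weights as fitting all targets at once. The function names and the sequential loop are assumptions; in the setting described above each batch would be a separate Dask task, which this sketch does not use.

```python
import numpy as np

def ridge_fit(X, Y, alpha):
    """Closed-form ridge weights W = (X^T X + alpha I)^{-1} X^T Y for every column of Y."""
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ Y)

def ridge_fit_batched(X, Y, alpha, n_batches):
    """Split the target columns into n_batches groups and fit each group independently.

    Each group needs one linear solve shared by all of its targets; on a cluster,
    each group could run on a different worker (here they run one after another).
    """
    W = np.empty((X.shape[1], Y.shape[1]))
    for cols in np.array_split(np.arange(Y.shape[1]), n_batches):
        W[:, cols] = ridge_fit(X, Y[:, cols], alpha)
    return W

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))      # 200 fMRI time points x 50 stimulus features
Y = rng.standard_normal((200, 1000))    # 1000 voxel time series (targets)
W_full = ridge_fit(X, Y, alpha=1.0)
W_batch = ridge_fit_batched(X, Y, alpha=1.0, n_batches=8)
print(np.allclose(W_full, W_batch))     # True: batching does not change the solution
```

By contrast, a per-target wrapper such as scikit-learn's MultiOutputRegressor fits one regressor per target, so in this sketch's terms it would repeat the solve once per voxel rather than once per batch.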
Burst Super-Resolution with Diffusion Models for Improving Perceptual Quality
Authors: Kyotaro Tokoro, Kazutoshi Akita, Norimichi Ukita
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2403.19428
Pdf link: https://arxiv.org/pdf/2403.19428
Abstract While burst low-resolution (LR) images are useful for improving super-resolution (SR) image quality compared with a single LR image, prior SR networks that accept burst LR images are trained in a deterministic manner, which is known to produce blurry SR images. In addition, it is difficult to align burst LR images perfectly, which makes the SR image even blurrier. Since such blurry images are perceptually degraded, we aim to reconstruct sharp, high-fidelity boundaries. Such high-fidelity images can be reconstructed by diffusion models. However, prior SR methods using diffusion models are not properly optimized for the burst SR task. Specifically, the reverse process starting from a random sample is not optimized for image enhancement and restoration methods, including burst SR. In our proposed method, on the other hand, burst LR features are used to reconstruct an initial burst SR image that is fed into an intermediate step of the diffusion model. This reverse process from the intermediate step 1) skips the diffusion steps that reconstruct the global structure of the image and 2) focuses on the steps that refine detailed textures. Our experimental results demonstrate that our method can improve the scores of perceptual quality metrics. Code: https://github.com/placerkyo/BSRD
Learning Sampling Distribution and Safety Filter for Autonomous Driving with VQ-VAE and Differentiable Optimization
Authors: Simon Idoko, Basant Sharma, Arun Kumar Singh
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2403.19461
Pdf link: https://arxiv.org/pdf/2403.19461
Abstract Sampling trajectories from a distribution and then ranking them with a specified cost function is a common approach in autonomous driving. Typically, the sampling distribution is hand-crafted (e.g., a Gaussian or a grid). Recently, there have been efforts to learn the sampling distribution through generative models such as the Conditional Variational Autoencoder (CVAE). However, these approaches fail to capture the multi-modality of driving behaviour due to the Gaussian latent prior of the CVAE. Thus, in this paper, we re-imagine distribution learning through a vector quantized variational autoencoder (VQ-VAE), whose discrete latent space is well equipped to capture multi-modal sampling distributions. The VQ-VAE is trained with demonstration data of optimal trajectories. We further propose a differentiable-optimization-based safety filter that minimally corrects the VQ-VAE-sampled trajectories to ensure collision avoidance. We use backpropagation through the optimization layers in a self-supervised learning set-up to learn a good initialization and optimal parameters of the safety filter. We perform extensive comparisons with a state-of-the-art CVAE-based baseline in dense and aggressive traffic scenarios and show up to a 12-fold reduction in collision rate while remaining competitive in driving speed.
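The discrete latent space mentioned above comes from the vector-quantization step of a standard VQ-VAE, in which each encoder output is replaced by its nearest codebook vector. A minimal NumPy sketch of that lookup follows; the codebook values are made up, and the encoder, decoder, codebook training, and the paper's safety filter are all omitted. Because sampling then means choosing discrete code indices rather than drawing from a single Gaussian, such a latent space can represent several distinct modes.

```python
import numpy as np

def vector_quantize(z, codebook):
    """Replace each encoder output by its nearest codebook vector.

    z:        (N, D) continuous encoder outputs.
    codebook: (K, D) learned embedding vectors.
    Returns the (N,) discrete code indices and the (N, D) quantized vectors.
    """
    # Squared Euclidean distance from every z_i to every codebook entry e_k: shape (N, K).
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return idx, codebook[idx]

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])  # K = 3 toy codes
z = np.array([[0.9, 1.2], [-0.1, 0.1], [-0.8, 1.7]])
idx, z_q = vector_quantize(z, codebook)
print(idx)   # [1 0 2]
```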
Offline Imitation Learning from Multiple Baselines with Applications to Compiler Optimization
Authors: Teodor V. Marinov, Alekh Agarwal, Mircea Trofin
Subjects: Machine Learning (cs.LG); Programming Languages (cs.PL)
Arxiv link: https://arxiv.org/abs/2403.19462
Pdf link: https://arxiv.org/pdf/2403.19462
Abstract This work studies a Reinforcement Learning (RL) problem in which we are given a set of trajectories collected with K baseline policies. Each of these policies can be quite suboptimal in isolation, yet have strong performance in complementary parts of the state space. The goal is to learn a policy that performs as well as the best combination of baselines on the entire state space. We propose a simple imitation-learning-based algorithm, show a sample complexity bound on its accuracy, and prove that the algorithm is minimax optimal by showing a matching lower bound. Further, we apply the algorithm to machine-learning-guided compiler optimization to learn policies for inlining programs with the objective of creating a small binary. We demonstrate that a few iterations of our approach can learn a policy that outperforms an initial policy learned via standard RL.
The linear sampling method for data generated by small random scatterers
Authors: J. Garnier, H. Haddar, H. Montanelli
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2403.19482
Pdf link: https://arxiv.org/pdf/2403.19482
Abstract We present an extension of the linear sampling method for solving the sound-soft inverse scattering problem in two dimensions with data generated by randomly distributed small scatterers. The theoretical justification of our novel sampling method is based on a rigorous asymptotic model, a modified Helmholtz--Kirchhoff identity, and our previous work on the linear sampling method for random sources. Our numerical implementation incorporates boundary elements, Singular Value Decomposition, Tikhonov regularization, and Morozov's discrepancy principle. We showcase the robustness and accuracy of our algorithms with a series of numerical experiments.
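The regularization pipeline named above (singular value decomposition, Tikhonov regularization, Morozov's discrepancy principle) is standard and can be sketched generically in NumPy. Here A stands for any discretized forward operator (in the linear sampling method it would typically be a discretized near- or far-field operator, which this sketch does not build), and alpha is chosen by bisection so that the residual matches an assumed noise level delta. The bracketing interval, the iteration count, and the random test operator are arbitrary choices.

```python
import numpy as np

def tikhonov_svd(A, b, alpha):
    """Minimize ||A x - b||^2 + alpha ||x||^2 using the SVD of A (real or complex)."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    coeffs = (s / (s**2 + alpha)) * (U.conj().T @ b)   # filtered spectral coefficients
    return Vh.conj().T @ coeffs

def morozov_alpha(A, b, delta, lo=1e-12, hi=1e2, iters=100):
    """Choose alpha so that ||A x_alpha - b|| is approximately the noise level delta.

    The residual increases monotonically with alpha, so bisection on log(alpha) works
    as long as residual(lo) < delta < residual(hi).
    """
    def residual(alpha):
        return np.linalg.norm(A @ tikhonov_svd(A, b, alpha) - b)

    log_lo, log_hi = np.log(lo), np.log(hi)
    for _ in range(iters):
        mid = 0.5 * (log_lo + log_hi)
        if residual(np.exp(mid)) > delta:
            log_hi = mid   # over-regularized: decrease alpha
        else:
            log_lo = mid   # under-regularized: increase alpha
    return float(np.exp(0.5 * (log_lo + log_hi)))

rng = np.random.default_rng(0)
Q1, _ = np.linalg.qr(rng.standard_normal((20, 20)))
Q2, _ = np.linalg.qr(rng.standard_normal((20, 20)))
A = Q1 @ np.diag(np.logspace(0, -6, 20)) @ Q2.T   # ill-conditioned test operator
noise = rng.standard_normal(20)
noise *= 1e-3 / np.linalg.norm(noise)              # noise level delta = 1e-3
b = A @ rng.standard_normal(20) + noise
alpha = morozov_alpha(A, b, delta=1e-3)
x_reg = tikhonov_svd(A, b, alpha)
```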
Tensor Network-Constrained Kernel Machines as Gaussian Processes
Authors: Frederiek Wesel, Kim Batselier
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2403.19500
Pdf link: https://arxiv.org/pdf/2403.19500
Abstract Tensor Networks (TNs) have recently been used to speed up kernel machines by constraining the model weights, yielding exponential computational and storage savings. In this paper, we prove that the outputs of Canonical Polyadic Decomposition (CPD) and Tensor Train (TT)-constrained kernel machines recover a Gaussian Process (GP), which we fully characterize, when i.i.d. priors are placed over their parameters. We analyze the convergence of both CPD- and TT-constrained models, and show how TT yields models exhibiting more GP behavior than CPD for the same number of model parameters. We empirically observe this behavior in two numerical experiments, which analyze the convergence to the GP and the predictive performance, respectively. We thereby establish a connection between TN-constrained kernel machines and GPs.
SineNet: Learning Temporal Dynamics in Time-Dependent Partial Differential Equations
Authors: Xuan Zhang, Jacob Helwig, Yuchao Lin, Yaochen Xie, Cong Fu, Stephan Wojtowytsch, Shuiwang Ji
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2403.19507
Pdf link: https://arxiv.org/pdf/2403.19507
Abstract We consider using deep neural networks to solve time-dependent partial differential equations (PDEs), where multi-scale processing is crucial for modeling complex, time-evolving dynamics. While the U-Net architecture with skip connections is commonly used by prior studies to enable multi-scale processing, our analysis shows that the need for features to evolve across layers results in temporally misaligned features in skip connections, which limits the model's performance. To address this limitation, we propose SineNet, consisting of multiple sequentially connected U-shaped network blocks, referred to as waves. In SineNet, high-resolution features are evolved progressively through multiple stages, thereby reducing the amount of misalignment within each stage. We furthermore analyze the role of skip connections in enabling both parallel and sequential processing of multi-scale information. Our method is rigorously tested on multiple PDE datasets, including the Navier-Stokes equations and shallow water equations, showcasing the advantages of our proposed approach over conventional U-Nets with a comparable parameter budget. We further demonstrate that increasing the number of waves in SineNet while maintaining the same number of parameters leads to monotonically improved performance. The results highlight the effectiveness of SineNet and the potential of our approach in advancing the state-of-the-art in neural PDE solver design. Our code is available as part of AIRS (https://github.com/divelab/AIRS).
CDIMC-net: Cognitive Deep Incomplete Multi-view Clustering Network
Authors: Jie Wen, Zheng Zhang, Yong Xu, Bob Zhang, Lunke Fei, Guo-Sen Xie
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2403.19514
Pdf link: https://arxiv.org/pdf/2403.19514
Abstract In recent years, incomplete multi-view clustering, which studies the challenging multi-view clustering problem with missing views, has received growing research interest. Although a series of methods have been proposed to address this issue, the following problems remain: 1) Almost all existing methods are based on shallow models, which makes it difficult to obtain discriminative common representations. 2) These methods are generally sensitive to noise or outliers, since negative samples are treated the same as important samples. In this paper, we propose a novel incomplete multi-view clustering network, called the Cognitive Deep Incomplete Multi-view Clustering Network (CDIMC-net), to address these issues. Specifically, it captures the high-level features and local structure of each view by incorporating view-specific deep encoders and a graph embedding strategy into one framework. Moreover, inspired by human cognition, i.e., learning from easy to hard, it introduces a self-paced strategy that selects the most confident samples for model training, which can reduce the negative influence of outliers. Experimental results on several incomplete datasets show that CDIMC-net outperforms state-of-the-art incomplete multi-view clustering methods.
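The easy-to-hard selection idea can be sketched with the classic hard self-paced rule: a sample is used for training only if its current loss is below a pace threshold, and the threshold grows over training so harder samples are admitted later. This is a generic NumPy sketch, not CDIMC-net's exact selection rule; the function names, the linear pace schedule, and the toy losses are assumptions, and real self-paced training would recompute the losses after every model update (they are kept fixed here for brevity).

```python
import numpy as np

def select_confident(losses, frac):
    """Indices of the `frac` fraction of samples with the smallest loss (the 'easiest' ones).

    Equivalent to the hard self-paced rule v_i = 1 if loss_i < lambda, with lambda set
    just above the k-th smallest loss.
    """
    k = max(1, int(round(frac * len(losses))))
    return np.argsort(losses)[:k]

def pace_schedule(n_rounds, start=0.5, end=1.0):
    """Fraction of samples admitted in each training round, growing from easy to hard."""
    return np.linspace(start, end, n_rounds)

losses = np.array([0.1, 2.0, 0.4, 5.0, 0.3])   # sample 3 looks like an outlier
for frac in pace_schedule(3, start=0.6):
    print(frac, sorted(select_confident(losses, frac).tolist()))
```

The suspected outlier (sample 3) is excluded until the last round, which is how self-paced selection limits the influence of noisy samples early in training.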
De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts
Authors: Yuzheng Wang, Dingkang Yang, Zhaoyu Chen, Yang Liu, Siao Liu, Wenqiang Zhang, Lihua Zhang, Lizhe Qi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2403.19539
Pdf link: https://arxiv.org/pdf/2403.19539
Abstract Data-Free Knowledge Distillation (DFKD) is a promising task for training high-performance small models for practical deployment without relying on the original training data. Existing methods commonly avoid relying on private data by using synthetic or sampled data. However, a long-overlooked issue is the severe distribution shift between this substitute data and the original data, which manifests as large differences in image quality and class proportions. These harmful shifts essentially act as a confounder that causes significant performance bottlenecks. To tackle this issue, this paper proposes a novel causal-inference perspective to disentangle the student models from the impact of such shifts. By designing a customized causal graph, we first reveal the causal relationships among the variables in the DFKD task. Subsequently, we propose a Knowledge Distillation Causal Intervention (KDCI) framework based on backdoor adjustment to de-confound the confounder. KDCI can be flexibly combined with most existing state-of-the-art baselines. Experiments combining it with six representative DFKD methods demonstrate the effectiveness of KDCI, which clearly helps existing methods in almost all settings, e.g., improving the baseline by up to 15.54% accuracy on the CIFAR-100 dataset.
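Backdoor adjustment itself is a standard causal-inference formula, P(Y | do(X)) = sum over z of P(Y | X, z) P(z). The minimal NumPy sketch below applies it to discrete toy tables with made-up numbers and contrasts it with the confounded conditional P(Y | X). How KDCI applies the adjustment to learned features is not described in this abstract and is not reproduced here.

```python
import numpy as np

def backdoor_adjust(p_y_given_xz, p_z):
    """P(Y | do(X)) = sum_z P(Y | X, z) P(z) for a discrete confounder Z.

    p_y_given_xz: (n_x, n_z, n_y) array with entries P(Y=y | X=x, Z=z).
    p_z:          (n_z,) marginal distribution of Z.
    Returns an (n_x, n_y) array of interventional distributions.
    """
    return np.einsum("xzy,z->xy", p_y_given_xz, p_z)

def observational(p_y_given_xz, p_z_given_x):
    """P(Y | X) = sum_z P(Y | X, z) P(z | X): the confounded conditional, for comparison."""
    return np.einsum("xzy,xz->xy", p_y_given_xz, p_z_given_x)

# Toy numbers: P(Y=1 | x, z) for x in {0, 1}, z in {0, 1}.
p_y1 = np.array([[0.2, 0.6], [0.3, 0.7]])
p_y_given_xz = np.stack([1 - p_y1, p_y1], axis=-1)     # shape (2, 2, 2)
p_z = np.array([0.5, 0.5])
p_z_given_x = np.array([[0.9, 0.1], [0.1, 0.9]])        # Z strongly tied to X
print(backdoor_adjust(p_y_given_xz, p_z)[:, 1])         # effect of intervening on X
print(observational(p_y_given_xz, p_z_given_x)[:, 1])   # confounded association
```

In this toy case the observational gap between x = 0 and x = 1 (0.24 vs 0.66) is much larger than the interventional one (0.4 vs 0.5), because most of the association flows through the confounder Z.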
Expectation Maximization Aided Modified Weighted Sequential Energy Detector for Distributed Cooperative Spectrum Sensing
Authors: Mohammed Rashid, Jeffrey A. Nanzer
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2403.19556
Pdf link: https://arxiv.org/pdf/2403.19556
Abstract Distributed cooperative spectrum sensing usually involves a group of unlicensed secondary users (SUs) collaborating to detect the primary user (PU) in the channel, so that they can opportunistically use the channel without causing interference to the PU. Conventional energy detector (ED)-based spectrum sensing ignores the dynamic nature of the PU by using the energy statistic from only the present sensing interval to detect the PU. However, for a dynamic PU, previous studies have shown that improved detection capabilities can be achieved by aggregating both present and past energy samples in a test statistic. To this end, a weighted sequential energy detector (WSED) has been proposed, but it aggregates all the energy samples collected over an observation window. For a highly dynamic PU, this means that outdated samples are also combined in the test statistic. In this paper, we propose a modified WSED (mWSED) that uses information about the PU states over the window to aggregate only the highly correlated energy samples in its test statistic. In practice, since the PU states are not known a priori, we also develop a scheme based on a joint expectation-maximization and Viterbi (EM-Viterbi) algorithm that iteratively estimates the states from the energy samples collected over the window. The estimated states are then used in mWSED to compute its test statistics, and the resulting algorithm is referred to here as EM-mWSED. Simulation results are presented to demonstrate the state estimation performance of EM-Viterbi and the PU detection performance of EM-mWSED. The results show that, for both highly dynamic and slowly time-varying PUs, these algorithms outperform ED and WSED at PU detection, and their performance improves as the average number of neighbors per SU in the network, the SNR, or the number of samples per energy statistic increases.
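The difference between aggregating every energy sample in the window and aggregating only the relevant ones can be sketched in a few lines of NumPy. One plausible reading of "aggregate only the highly correlated energy samples" is to keep only the most recent run of intervals that share the current PU state; that contiguous-run rule, the normalized weighting, and the weights themselves are assumptions, and the EM-Viterbi state estimator is not implemented here (the states are simply given).

```python
import numpy as np

def energy_statistic(x):
    """Energy of one sensing interval: sum of squared magnitudes of its samples."""
    return float(np.sum(np.abs(x) ** 2))

def wsed(energies, weights):
    """Weighted average of all energies in the window (index 0 = current interval)."""
    energies = np.asarray(energies, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(weights @ energies / weights.sum())

def modified_wsed(energies, weights, states):
    """Weighted average over only the most recent run of intervals whose (estimated)
    PU state equals the current state; older, differently-labelled intervals are dropped."""
    run = 1
    while run < len(states) and states[run] == states[0]:
        run += 1
    return wsed(energies[:run], weights[:run])

energies = [10.2, 9.8, 1.1, 0.9]     # newest first; the PU switched on two intervals ago
states = [1, 1, 0, 0]                # e.g., states estimated by an EM-Viterbi step
weights = [0.4, 0.3, 0.2, 0.1]
print(wsed(energies, weights))                    # diluted by the stale idle intervals
print(modified_wsed(energies, weights, states))   # uses only the current busy run
```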
A Public and Reproducible Assessment of the Topics API on Real Data
Authors: Yohan Beugin, Patrick McDaniel
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2403.19577
Pdf link: https://arxiv.org/pdf/2403.19577
Abstract The Topics API for the web is Google's privacy-enhancing alternative to third-party cookies. Results of prior work have led to an ongoing discussion between Google and research communities about the capability of Topics to trade off both utility and privacy. The central point of contention is largely the realism of the datasets used in these analyses and their reproducibility: researchers have used data collected on a small sample of users or generated synthetic datasets, while Google's results are inferred from a private dataset. In this paper, we complement prior research by performing a reproducible assessment of the latest version of the Topics API on the largest publicly available dataset of real browsing histories. First, we measure how unique and stable real users' interests are over time. Then, we evaluate whether Topics can be used to fingerprint users from these real browsing traces by adapting methodologies from prior privacy studies. Finally, we call on web actors to perform and enable reproducible evaluations by releasing anonymized distributions. We find that 46%, 55%, and 60% of the 1207 users in the dataset are uniquely re-identified across websites after only 1, 2, and 3 observations of their topics by advertisers, respectively. This paper shows on real data that Topics does not provide the same privacy guarantees to all users, further highlighting the need for public and reproducible evaluations of the claims made by new web proposals.
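As a rough illustration of what "uniquely re-identified" means, the sketch below counts the share of users whose observed topic set matches no other user's. It is a simplified uniqueness measure over made-up data, not the paper's cross-site re-identification methodology, and the topic names are placeholders rather than entries of the real Topics taxonomy.

```python
from collections import Counter

def fraction_unique(observations):
    """Fraction of users whose set of observed topics is shared with no other user.

    observations: dict mapping a user id to the topics an advertiser observed for that user.
    """
    keys = {user: frozenset(topics) for user, topics in observations.items()}
    counts = Counter(keys.values())
    return sum(1 for key in keys.values() if counts[key] == 1) / len(keys)

observations = {
    "u1": {"Travel", "Sports"},
    "u2": {"Sports", "Travel"},   # same set as u1, so neither is unique
    "u3": {"News"},
    "u4": {"News", "Cooking"},
}
print(fraction_unique(observations))  # 0.5
```

As more topics are observed per user, the sets tend to become more distinctive, which is the intuition behind uniqueness rising with the number of observations.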
TOGS: Gaussian Splatting with Temporal Opacity Offset for Real-Time 4D DSA Rendering
Authors: Shuai Zhang, Huangxuan Zhao, Zhenghong Zhou, Guanjun Wu, Chuansheng Zheng, Xinggang Wang, Wenyu Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2403.19586
Pdf link: https://arxiv.org/pdf/2403.19586
Abstract Four-dimensional Digital Subtraction Angiography (4D DSA) is a medical imaging technique that provides a series of 2D images captured at different stages and angles as contrast agent fills the blood vessels. It plays a significant role in the diagnosis of cerebrovascular diseases. Improving the rendering quality and speed under sparse sampling is important for observing the status and location of lesions. Current methods exhibit inadequate rendering quality in sparse views and suffer from slow rendering speed. To overcome these limitations, we propose TOGS, a Gaussian splatting method with opacity offset over time, which can effectively improve the rendering quality and speed of 4D DSA. We introduce an opacity offset table for each Gaussian to model the temporal variations in the radiance of the contrast agent. By interpolating the opacity offset table, the opacity variation of the Gaussian at different time points can be determined, which enables us to render the 2D DSA image at that specific moment. Additionally, we introduce a Smooth loss term in the loss function to mitigate the overfitting that may arise when the model deals with sparse-view scenarios. During the training phase, we randomly prune Gaussians, thereby reducing the storage overhead of the model. The experimental results demonstrate that, compared with previous methods, this model achieves state-of-the-art reconstruction quality under the same number of training views. Additionally, it enables real-time rendering while maintaining low storage overhead. The code will be publicly available.
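The opacity-offset lookup can be sketched as a per-Gaussian table sampled at a few time points and interpolated at render time. In this minimal NumPy sketch, linear interpolation, additive offsets, and clipping to [0, 1] are assumptions; the abstract does not specify TOGS's table size, interpolation scheme, or opacity activation, and the table values are made up.

```python
import numpy as np

def opacity_at(t, base_opacity, offset_table, t_max=1.0):
    """Per-Gaussian opacity at time t from a table of opacity offsets.

    base_opacity: (G,) time-independent opacity of each Gaussian.
    offset_table: (G, M) offsets stored at M equally spaced times in [0, t_max].
    The offset is linearly interpolated at t, added to the base value, and clipped to [0, 1].
    """
    knots = np.linspace(0.0, t_max, offset_table.shape[1])
    offsets = np.array([np.interp(t, knots, row) for row in offset_table])
    return np.clip(base_opacity + offsets, 0.0, 1.0)

base = np.array([0.2, 0.5])
table = np.array([[0.0, 0.4, 0.8],     # Gaussian 0 brightens as contrast arrives
                  [0.0, -0.2, -0.6]])  # Gaussian 1 fades over time
print(opacity_at(0.25, base, table))   # interpolated between the first two table entries
print(opacity_at(1.0, base, table))    # clipped at the end of the sequence
```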
Frame by Familiar Frame: Understanding Replication in Video Diffusion Models
Authors: Aimon Rahman, Malsha V. Perera, Vishal M. Patel
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2403.19593
Pdf link: https://arxiv.org/pdf/2403.19593
Abstract Building on the momentum of image generation diffusion models, there is an increasing interest in video-based diffusion models. However, video generation poses greater challenges due to its higher-dimensional nature, the scarcity of training data, and the complex spatiotemporal relationships involved. Image generation models, due to their extensive data requirements, have already strained computational resources to their limits. There have been instances of these models reproducing elements from the training samples, leading to concerns and even legal disputes over sample replication. Video diffusion models, which operate with even more constrained datasets and are tasked with generating both spatial and temporal content, may be more prone to replicating samples from their training sets. Compounding the issue, these models are often evaluated using metrics that inadvertently reward replication. In our paper, we present a systematic investigation into the phenomenon of sample replication in video diffusion models. We scrutinize various recent diffusion models for video synthesis, assessing their tendency to replicate spatial and temporal content in both unconditional and conditional generation scenarios. Our study identifies strategies that are less likely to lead to replication. Furthermore, we propose new evaluation strategies that take replication into account, offering a more accurate measure of a model's ability to generate original content.
Nearest Neighbor Classification for Classical Image Upsampling
Authors: Evan Matthews, Nicolas Prate
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2403.19611
Pdf link: https://arxiv.org/pdf/2403.19611
Abstract Given a set of ordered pixel data in the form of an image, our goal is to perform upsampling on the data such that: (1) the resulting resolution is improved by some factor; (2) the final result passes the human test, having added new, believable, and realistic information and detail to the image; and (3) the time complexity of upscaling is relatively close to that of lossy upscaling implementations.
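For reference, classical nearest-neighbor interpolation, a common fast upscaling baseline, can be written in a few lines of NumPy. It meets the resolution goal but adds no new information, so it would not satisfy the "human test" goal above. This is only a baseline sketch, not the authors' classification-based method; whether it is among the implementations they compare against is an assumption.

```python
import numpy as np

def nearest_neighbor_upsample(img, factor):
    """Upscale an (H, W) or (H, W, C) image by an integer factor: every input pixel
    becomes a factor x factor block of identical pixels."""
    if not isinstance(factor, int) or factor < 1:
        raise ValueError("factor must be a positive integer")
    out = np.repeat(img, factor, axis=0)   # duplicate rows
    return np.repeat(out, factor, axis=1)  # duplicate columns

img = np.array([[1, 2],
                [3, 4]])
print(nearest_neighbor_upsample(img, 2))
```

The cost is linear in the number of output pixels, which is the kind of time budget the third goal refers to.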
SA-GS: Scale-Adaptive Gaussian Splatting for Training-Free Anti-Aliasing
Authors: Xiaowei Song, Jv Zheng, Shiran Yuan, Huan-ang Gao, Jingwei Zhao, Xiang He, Weihao Gu, Hao Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2403.19615
Pdf link: https://arxiv.org/pdf/2403.19615
Abstract In this paper, we present a Scale-adaptive method for Anti-aliasing Gaussian Splatting (SA-GS). While the state-of-the-art method Mip-Splatting requires modifying the training procedure of Gaussian splatting, our method works at test time and is training-free. Specifically, SA-GS can be applied to any pretrained Gaussian splatting field as a plugin to significantly improve the field's anti-aliasing performance. The core technique is to apply 2D scale-adaptive filters to each Gaussian at test time. As pointed out by Mip-Splatting, observing Gaussians at different frequencies leads to mismatches between the Gaussian scales during training and testing. Mip-Splatting resolves this issue using 3D smoothing and 2D Mip filters, which are unfortunately not aware of the testing frequency. In this work, we show that a 2D scale-adaptive filter that is informed of the testing frequency can effectively match the Gaussian scale, thus keeping the Gaussian primitive distribution consistent across different testing frequencies. When scale inconsistency is eliminated, sampling rates smaller than the scene frequency result in conventional jaggedness, so we propose to integrate the projected 2D Gaussian within each pixel during testing. This integration is actually a limiting case of super-sampling, and it significantly improves anti-aliasing performance over vanilla Gaussian Splatting. Through extensive experiments using various settings and both bounded and unbounded scenes, we show that SA-GS performs comparably to or better than Mip-Splatting. Note that super-sampling and integration are only effective when our scale-adaptive filtering is activated. Our code, data, and models are available at https://github.com/zsy1987/SA-GS.
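The statement that per-pixel integration is a limiting case of super-sampling can be checked numerically. The sketch below uses an axis-aligned, normalized 2D Gaussian (real splats have a full 2D covariance and an opacity, and SA-GS's scale-adaptive filter is not modeled) and compares the average of point samples on an n x n sub-pixel grid with the closed-form integral over the unit pixel, which factorizes into differences of error functions. As n grows, the sample average converges to the integral.

```python
import math

def pixel_integral(mx, my, sx, sy, px, py):
    """Mass of a normalized, axis-aligned 2D Gaussian inside pixel [px, px+1) x [py, py+1)."""
    def axis_mass(m, s, lo):
        c = 1.0 / (s * math.sqrt(2.0))
        return 0.5 * (math.erf((lo + 1.0 - m) * c) - math.erf((lo - m) * c))
    return axis_mass(mx, sx, px) * axis_mass(my, sy, py)

def supersample(mx, my, sx, sy, px, py, n):
    """Average of the Gaussian density at an n x n grid of points inside the pixel."""
    total = 0.0
    for i in range(n):
        for j in range(n):
            x, y = px + (i + 0.5) / n, py + (j + 0.5) / n
            dx, dy = (x - mx) / sx, (y - my) / sy
            total += math.exp(-0.5 * (dx * dx + dy * dy)) / (2 * math.pi * sx * sy)
    return total / (n * n)

# A small Gaussian sitting on the corner shared by four pixels.
g = (1.0, 1.0, 0.05, 0.05)
print(supersample(*g, 0, 0, n=1))     # single center sample: essentially 0, the Gaussian is missed
print(supersample(*g, 0, 0, n=200))   # denser sampling approaches the integral
print(pixel_integral(*g, 0, 0))       # exact integral: a quarter of the mass
```

The single-sample case shows the aliasing problem: a Gaussian much smaller than a pixel can fall between sample positions and vanish entirely.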
Collaborative Interactive Evolution of Art in the Latent Space of Deep Generative Models
Authors: Ole Hall, Anil Yaman
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2403.19620
Pdf link: https://arxiv.org/pdf/2403.19620
Abstract Generative Adversarial Networks (GANs) have shown great success in generating high-quality images and are thus used as one of the main approaches to generating art images. However, the image generation process usually involves sampling from the latent space of the learned art representations, which allows little control over the output. In this work, we first employ GANs that are trained to produce creative images using an architecture known as Creative Adversarial Networks (CANs), and then employ an evolutionary approach to navigate within the latent space of the models to discover images. We use automatic aesthetic and collaborative interactive human evaluation metrics to assess the generated images. In the human interactive evaluation case, we propose a collaborative evaluation based on the assessments of several participants. Furthermore, we experiment with an intelligent mutation operator that aims to improve the quality of the images through local search based on an aesthetic measure. We evaluate the effectiveness of this approach by comparing the results produced by automatic and collaborative interactive evolution. The results show that the proposed approach can generate highly attractive art images when the evolution is guided by collaborative human feedback.
Human-compatible driving partners through data-regularized self-play reinforcement learning
Authors: Daphne Cornelisse, Eugene Vinitsky
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2403.19648
Pdf link: https://arxiv.org/pdf/2403.19648
Abstract A central challenge for autonomous vehicles is coordinating with humans. Therefore, incorporating realistic human agents is essential for scalable training and evaluation of autonomous driving systems in simulation. Simulation agents are typically developed by imitating large-scale, high-quality datasets of human driving. However, pure imitation learning agents empirically have high collision rates when executed in a multi-agent closed-loop setting. To build agents that are realistic and effective in closed-loop settings, we propose Human-Regularized PPO (HR-PPO), a multi-agent algorithm where agents are trained through self-play with a small penalty for deviating from a human reference policy. In contrast to prior work, our approach is RL-first and only uses 30 minutes of imperfect human demonstrations. We evaluate agents in a large set of multi-agent traffic scenes. Results show our HR-PPO agents are highly effective in achieving goals, with a success rate of 93%, an off-road rate of 3.5%, and a collision rate of 3%. At the same time, the agents drive in a human-like manner, as measured by their similarity to existing human driving logs. We also find that HR-PPO agents show considerable improvements on proxy measures for coordination with human driving, particularly in highly interactive scenarios. We open-source our code and trained agents at https://github.com/Emerge-Lab/nocturne_lab and provide demonstrations of agent behaviors at https://sites.google.com/view/driving-partners.
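The "small penalty for deviating from a human reference policy" can be read as a KL-regularized objective. A minimal NumPy sketch for a single state with discrete actions follows; the KL direction, the coefficient lam, and the action labels are assumptions for illustration, and the PPO clipped surrogate itself is abstracted as a single number, rl_objective.

```python
import numpy as np

def kl_categorical(p, q, eps=1e-12):
    """KL(p || q) between two categorical action distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def regularized_objective(rl_objective, pi_agent, pi_human, lam=0.02):
    """Self-play RL objective minus a small penalty for deviating from the human policy."""
    return rl_objective - lam * kl_categorical(pi_agent, pi_human)

pi_human = [0.7, 0.2, 0.1]   # e.g., P(steer left / straight / right) under a human reference policy
print(regularized_objective(1.0, [0.7, 0.2, 0.1], pi_human))  # no deviation, no penalty
print(regularized_objective(1.0, [0.1, 0.2, 0.7], pi_human))  # deviation is penalized
```

A small lam leaves most of the optimization pressure on the RL objective while still pulling the agent toward human-like action choices.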
GraspXL: Generating Grasping Motions for Diverse Objects at Scale
Authors: Hui Zhang, Sammy Christen, Zicong Fan, Otmar Hilliges, Jie Song
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2403.19649
Pdf link: https://arxiv.org/pdf/2403.19649
Abstract Human hands possess the dexterity to interact with diverse objects, such as grasping specific parts of the objects and/or approaching them from desired directions. More importantly, humans can grasp objects of any shape without object-specific skills. Recent works synthesize grasping motions following single objectives such as a desired approach heading direction or a grasping area. Moreover, they usually rely on expensive 3D hand-object data during training and inference, which limits their capability to synthesize grasping motions for unseen objects at scale. In this paper, we unify the generation of hand-object grasping motions across multiple motion objectives, diverse object shapes and dexterous hand morphologies in a policy learning framework GraspXL. The objectives are composed of the graspable area, heading direction during approach, wrist rotation, and hand position. Without requiring any 3D hand-object interaction data, our policy trained with 58 objects can robustly synthesize diverse grasping motions for more than 500k unseen objects with a success rate of 82.2%. At the same time, the policy adheres to the objectives, which enables the generation of diverse grasps per object. Moreover, we show that our framework can be deployed to different dexterous hands and work with reconstructed or generated objects. We quantitatively and qualitatively evaluate our method to show the efficacy of our approach. Our model and code will be made available.
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
Authors: Kai Zhang, Yi Luan, Hexiang Hu, Kenton Lee, Siyuan Qiao, Wenhu Chen, Yu Su, Ming-Wei Chang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2403.19651
Pdf link: https://arxiv.org/pdf/2403.19651
Abstract Image retrieval, i.e., finding desired images given a reference image, inherently encompasses rich, multi-faceted search intents that are difficult to capture solely using image-based measures. Recent work leverages text instructions to allow users to more freely express their search intents. However, existing work primarily focuses on image pairs that are visually similar and/or can be characterized by a small set of pre-defined relations. The core thesis of this paper is that text instructions can enable retrieving images with richer relations beyond visual similarity. To show this, we introduce MagicLens, a series of self-supervised image retrieval models that support open-ended instructions. MagicLens is built on a key novel insight: image pairs that naturally occur on the same web pages contain a wide range of implicit relations (e.g., inside view of), and we can make those implicit relations explicit by synthesizing instructions via large multimodal models (LMMs) and large language models (LLMs). Trained on 36.7M (query image, instruction, target image) triplets with rich semantic relations mined from the web, MagicLens achieves results comparable to or better than prior state-of-the-art (SOTA) methods on eight benchmarks of various image retrieval tasks. Remarkably, it outperforms the previous SOTA on multiple benchmarks with a 50X smaller model. Additional human analyses on a 1.4M-image unseen corpus further demonstrate the diversity of search intents supported by MagicLens.
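The abstract does not describe MagicLens's architecture, so the sketch below only illustrates the generic final step an instruction-conditioned retriever performs: embed the (query image, instruction) pair into one vector, then rank corpus images by cosine similarity. `fuse_query` (summing unit-normalized embeddings) is a hypothetical placeholder for a learned multimodal encoder, and all vectors and image ids are toy values.

```python
import math
from typing import Dict, List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0.0 else list(v)


def fuse_query(image_vec: Sequence[float], text_vec: Sequence[float]) -> List[float]:
    """Hypothetical placeholder for a learned image+instruction encoder:
    simply sums the two unit-normalized embeddings."""
    return [a + b for a, b in zip(normalize(image_vec), normalize(text_vec))]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity as the dot product of unit-normalized vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def retrieve(
    query_vec: Sequence[float], corpus: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Rank corpus images by cosine similarity to the query; return the top k."""
    scored = [(image_id, cosine(query_vec, vec)) for image_id, vec in corpus.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 2-D embeddings standing in for real image and instruction features.
corpus = {"img_a": [1.0, 0.0], "img_b": [0.0, 1.0], "img_c": [1.0, 1.0]}
query = fuse_query(image_vec=[1.0, 0.0], text_vec=[0.9, 0.1])
top = retrieve(query, corpus, k=2)
print(top)
```

In a real system, the corpus embeddings would be precomputed once (e.g. for the 1.4M-image corpus mentioned above) and the linear scan replaced by an approximate nearest-neighbour index.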