【CS-part2】New submissions for Mon, 15 Apr 24

Keyword: webgpu

There is no result

Keyword: webgl

There is no result

Keyword: pre-rendering

There is no result

Keyword: prerendering

There is no result

Keyword: motion prediction

There is no result

Keyword: incremental learning

There is no result

Keyword: svm incremental

There is no result

Keyword: nerf

MonoPatchNeRF: Improving Neural Radiance Fields with Patch-based Monocular Guidance

Authors: Yuqun Wu, Jae Yong Lee, Chuhang Zou, Shenlong Wang, Derek Hoiem
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.08252
Pdf link: https://arxiv.org/pdf/2404.08252
Abstract The latest regularized Neural Radiance Field (NeRF) approaches produce poor geometry and view extrapolation for multiview stereo (MVS) benchmarks such as ETH3D. In this paper, we aim to create 3D models that provide accurate geometry and view synthesis, partially closing the large geometric performance gap between NeRF and traditional MVS methods. We propose a patch-based approach that effectively leverages monocular surface normal and relative depth predictions. The patch-based ray sampling also enables the appearance regularization of normalized cross-correlation (NCC) and structural similarity (SSIM) between randomly sampled virtual and training views. We further show that "density restrictions" based on sparse structure-from-motion points can help greatly improve geometric accuracy with a slight drop in novel view synthesis metrics. Our experiments show 4x the performance of RegNeRF and 8x that of FreeNeRF on average F1@2cm for ETH3D MVS benchmark, suggesting a fruitful research direction to improve the geometric accuracy of NeRF-based models, and sheds light on a potential future approach to enable NeRF-based optimization to eventually outperform traditional MVS.
GPN: Generative Point-based NeRF
Authors: Haipeng Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2404.08312
Pdf link: https://arxiv.org/pdf/2404.08312
Abstract Scanning real-life scenes with modern registration devices typically gives incomplete point cloud representations, primarily due to the limitations of partial scanning, 3D occlusions, and dynamic light conditions. Recent works on processing incomplete point clouds have always focused on point cloud completion. However, these approaches do not ensure consistency between the completed point cloud and the captured images regarding color and geometry. We propose using Generative Point-based NeRF (GPN) to reconstruct and repair a partial cloud by fully utilizing the scanning images and the corresponding reconstructed cloud. The repaired point cloud can achieve multi-view consistency with the captured images at high spatial resolution. For the finetunes of a single scene, we optimize the global latent condition by incorporating an Auto-Decoder architecture while retaining multi-view consistency. As a result, the generated point clouds are smooth, plausible, and geometrically consistent with the partial scanning images. Extensive experiments on ShapeNet demonstrate that our works achieve competitive performances to the other state-of-the-art point cloud-based neural scene rendering and editing performances.
OccGaussian: 3D Gaussian Splatting for Occluded Human Rendering
Authors: Jingrui Ye, Zongkai Zhang, Yujiao Jiang, Qingmin Liao, Wenming Yang, Zongqing Lu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.08449
Pdf link: https://arxiv.org/pdf/2404.08449
Abstract Rendering dynamic 3D human from monocular videos is crucial for various applications such as virtual reality and digital entertainment. Most methods assume the people is in an unobstructed scene, while various objects may cause the occlusion of body parts in real-life scenarios. Previous method utilizing NeRF for surface rendering to recover the occluded areas, but it requiring more than one day to train and several seconds to render, failing to meet the requirements of real-time interactive applications. To address these issues, we propose OccGaussian based on 3D Gaussian Splatting, which can be trained within 6 minutes and produces high-quality human renderings up to 160 FPS with occluded input. OccGaussian initializes 3D Gaussian distributions in the canonical space, and we perform occlusion feature query at occluded regions, the aggregated pixel-align feature is extracted to compensate for the missing information. Then we use Gaussian Feature MLP to further process the feature along with the occlusion-aware loss functions to better perceive the occluded area. Extensive experiments both in simulated and real-world occlusions, demonstrate that our method achieves comparable or even superior performance compared to the state-of-the-art method. And we improving training and inference speeds by 250x and 800x, respectively. Our code will be available for research purposes.
Keyword: multiorgan

There is no result

Keyword: multi-organ

There is no result

Keyword: multi organ

There is no result

Keyword: SAM

Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis
Authors: Guangchen Lan, Dong-Jun Han, Abolfazl Hashemi, Vaneet Aggarwal, Christopher G. Brinton
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2404.08003
Pdf link: https://arxiv.org/pdf/2404.08003
Abstract To improve the efficiency of reinforcement learning, we propose a novel asynchronous federated reinforcement learning framework termed AFedPG, which constructs a global model through collaboration among $N$ agents using policy gradient (PG) updates. To handle the challenge of lagged policies in asynchronous settings, we design delay-adaptive lookahead and normalized update techniques that can effectively handle the heterogeneous arrival times of policy gradients. We analyze the theoretical global convergence bound of AFedPG, and characterize the advantage of the proposed algorithm in terms of both the sample complexity and time complexity. Specifically, our AFedPG method achieves $\mathcal{O}(\frac{{\epsilon}^{-2.5}}{N})$ sample complexity at each agent on average. Compared to the single agent setting with $\mathcal{O}(\epsilon^{-2.5})$ sample complexity, it enjoys a linear speedup with respect to the number of agents. Moreover, compared to synchronous FedPG, AFedPG improves the time complexity from $\mathcal{O}(\frac{t{\max}}{N})$ to $\mathcal{O}(\frac{1}{\sum{i=1}^{N} \frac{1}{t{i}}})$, where $t{i}$ denotes the time consumption in each iteration at the agent $i$, and $t{\max}$ is the largest one. The latter complexity $\mathcal{O}(\frac{1}{\sum{i=1}^{N} \frac{1}{t{i}}})$ is always smaller than the former one, and this improvement becomes significant in large-scale federated settings with heterogeneous computing powers ($t{\max}\gg t_{\min}$). Finally, we empirically verify the improved performances of AFedPG in three MuJoCo environments with varying numbers of agents. We also demonstrate the improvements with different computing heterogeneity.
GRANP: A Graph Recurrent Attentive Neural Process Model for Vehicle Trajectory Prediction
Authors: Yuhao Luo, Kehua Chen, Meixin Zhu
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2404.08004
Pdf link: https://arxiv.org/pdf/2404.08004
Abstract As a vital component in autonomous driving, accurate trajectory prediction effectively prevents traffic accidents and improves driving efficiency. To capture complex spatial-temporal dynamics and social interactions, recent studies developed models based on advanced deep-learning methods. On the other hand, recent studies have explored the use of deep generative models to further account for trajectory uncertainties. However, the current approaches demonstrating indeterminacy involve inefficient and time-consuming practices such as sampling from trained models. To fill this gap, we proposed a novel model named Graph Recurrent Attentive Neural Process (GRANP) for vehicle trajectory prediction while efficiently quantifying prediction uncertainty. In particular, GRANP contains an encoder with deterministic and latent paths, and a decoder for prediction. The encoder, including stacked Graph Attention Networks, LSTM and 1D convolutional layers, is employed to extract spatial-temporal relationships. The decoder is used to learn a latent distribution and thus quantify prediction uncertainty. To reveal the effectiveness of our model, we evaluate the performance of GRANP on the highD dataset. Extensive experiments show that GRANP achieves state-of-the-art results and can efficiently quantify uncertainties. Additionally, we undertake an intuitive case study that showcases the interpretability of the proposed approach. The code is available at https://github.com/joy-driven/GRANP.
Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition
Authors: Kehua Feng, Keyan Ding, Kede Ma, Zhihua Wang, Qiang Zhang, Huajun Chen
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2404.08008
Pdf link: https://arxiv.org/pdf/2404.08008
Abstract The past years have witnessed a proliferation of large language models (LLMs). Yet, automated and unbiased evaluation of LLMs is challenging due to the inaccuracy of standard metrics in reflecting human preferences and the inefficiency in sampling informative and diverse test examples. While human evaluation remains the gold standard, it is expensive and time-consuming, especially when dealing with a large number of testing samples. To address this problem, we propose a sample-efficient human evaluation method based on MAximum Discrepancy (MAD) competition. MAD automatically selects a small set of informative and diverse instructions, each adapted to two LLMs, whose responses are subject to three-alternative forced choice by human subjects. The pairwise comparison results are then aggregated into a global ranking using the Elo rating system. We select eight representative LLMs and compare them in terms of four skills: knowledge understanding, mathematical reasoning, writing, and coding. Experimental results show that the proposed method achieves a reliable and sensible ranking of LLMs' capabilities, identifies their relative strengths and weaknesses, and offers valuable insights for further LLM advancement.
Resource-aware Deployment of Dynamic DNNs over Multi-tiered Interconnected Systems
Authors: Chetna Singhal, Yashuo Wu, Francesco Malandrino, Marco Levorato, Carla Fabiana Chiasserini
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2404.08060
Pdf link: https://arxiv.org/pdf/2404.08060
Abstract The increasing pervasiveness of intelligent mobile applications requires to exploit the full range of resources offered by the mobile-edge-cloud network for the execution of inference tasks. However, due to the heterogeneity of such multi-tiered networks, it is essential to make the applications' demand amenable to the available resources while minimizing energy consumption. Modern dynamic deep neural networks (DNN) achieve this goal by designing multi-branched architectures where early exits enable sample-based adaptation of the model depth. In this paper, we tackle the problem of allocating sections of DNNs with early exits to the nodes of the mobile-edge-cloud system. By envisioning a 3-stage graph-modeling approach, we represent the possible options for splitting the DNN and deploying the DNN blocks on the multi-tiered network, embedding both the system constraints and the application requirements in a convenient and efficient way. Our framework -- named Feasible Inference Graph (FIN) -- can identify the solution that minimizes the overall inference energy consumption while enabling distributed inference over the multi-tiered network with the target quality and latency. Our results, obtained for DNNs with different levels of complexity, show that FIN matches the optimum and yields over 65% energy savings relative to a state-of-the-art technique for cost minimization.
WildGraph: Realistic Graph-based Trajectory Generation for Wildlife
Authors: Ali Al-Lawati, Elsayed Eshra, Prasenjit Mitra
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.08068
Pdf link: https://arxiv.org/pdf/2404.08068
Abstract Trajectory generation is an important task in movement studies; it circumvents the privacy, ethical, and technical challenges of collecting real trajectories from the target population. In particular, real trajectories in the wildlife domain are scarce as a result of ethical and environmental constraints of the collection process. In this paper, we consider the problem of generating long-horizon trajectories, akin to wildlife migration, based on a small set of real samples. We propose a hierarchical approach to learn the global movement characteristics of the real dataset and recursively refine localized regions. Our solution, WildGraph, discretizes the geographic path into a prototype network of H3 (https://www.uber.com/blog/h3/) regions and leverages a recurrent variational auto-encoder to probabilistically generate paths over the regions, based on occupancy. WildGraph successfully generates realistic months-long trajectories using a sample size as small as 60. Experiments performed on two wildlife migration datasets demonstrate that our proposed method improves the generalization of the generated trajectories in comparison to existing work while achieving superior or comparable performance in several benchmark metrics. Our code is published on the following repository: \url{https://github.com/aliwister/wildgraph}.
Persistent Classification: A New Approach to Stability of Data and Adversarial Examples
Authors: Brian Bell, Michael Geyer, David Glickenstein, Keaton Hamm, Carlos Scheidegger, Amanda Fernandez, Juston Moore
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.08069
Pdf link: https://arxiv.org/pdf/2404.08069
Abstract There are a number of hypotheses underlying the existence of adversarial examples for classification problems. These include the high-dimensionality of the data, high codimension in the ambient space of the data manifolds of interest, and that the structure of machine learning models may encourage classifiers to develop decision boundaries close to data points. This article proposes a new framework for studying adversarial examples that does not depend directly on the distance to the decision boundary. Similarly to the smoothed classifier literature, we define a (natural or adversarial) data point to be $(\gamma,\sigma)$-stable if the probability of the same classification is at least $\gamma$ for points sampled in a Gaussian neighborhood of the point with a given standard deviation $\sigma$. We focus on studying the differences between persistence metrics along interpolants of natural and adversarial points. We show that adversarial examples have significantly lower persistence than natural examples for large neural networks in the context of the MNIST and ImageNet datasets. We connect this lack of persistence with decision boundary geometry by measuring angles of interpolants with respect to decision boundaries. Finally, we connect this approach with robustness by developing a manifold alignment gradient metric and demonstrating the increase in robustness that can be achieved when training with the addition of this metric.
Byzantine Reliable Broadcast with Low Communication and Time Complexity
Authors: Thomas Locher
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2404.08070
Pdf link: https://arxiv.org/pdf/2404.08070
Abstract Byzantine reliable broadcast is a fundamental problem in distributed computing, which has been studied extensively over the past decades. State-of-the-art algorithms are predominantly based on the approach to share encoded fragments of the broadcast message, yielding an asymptotically optimal communication complexity when the message size exceeds the network size, a condition frequently encountered in practice. However, algorithms following the standard coding approach incur an overhead factor of at least 3, which can already be a burden for bandwidth-constrained applications. Minimizing this overhead is an important objective with immediate benefits to protocols that use a reliable broadcast routine as a building block. This paper introduces a novel mechanism to lower the communication and computational complexity. Two algorithms are presented that employ this mechanism to reliably broadcast messages in an asynchronous network where less than a third of all nodes are Byzantine. The first algorithm reduces the overhead factor to 2 and has a time complexity of 3 if the sender is honest, whereas the second algorithm attains an optimal time complexity of 2 with the same overhead factor in the absence of equivocation. Moreover, an optimization for real-world implementations is proposed, reducing the overhead factor to 3/2 under normal operation. Lastly, a lower bound is proved that an overhead factor lower than 3/2 cannot be achieved for a relevant class of reliable broadcast algorithms.
SQBC: Active Learning using LLM-Generated Synthetic Data for Stance Detection in Online Political Discussions
Authors: Stefan Sylvius Wagner, Maike Behrendt, Marc Ziegele, Stefan Harmeling
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.08078
Pdf link: https://arxiv.org/pdf/2404.08078
Abstract Stance detection is an important task for many applications that analyse or support online political discussions. Common approaches include fine-tuning transformer based models. However, these models require a large amount of labelled data, which might not be available. In this work, we present two different ways to leverage LLM-generated synthetic data to train and improve stance detection agents for online political discussions: first, we show that augmenting a small fine-tuning dataset with synthetic data can improve the performance of the stance detection model. Second, we propose a new active learning method called SQBC based on the "Query-by-Comittee" approach. The key idea is to use LLM-generated synthetic data as an oracle to identify the most informative unlabelled samples, that are selected for manual labelling. Comprehensive experiments show that both ideas can improve the stance detection performance. Curiously, we observed that fine-tuning on actively selected samples can exceed the performance of using the full dataset.
The Impact of School and Family Networks on COVID-19 Infections Among Dutch Students: A Study Using Population-Level Registry Data
Authors: Javier Garcia-Bernardo, Christine Hedde-von Westernhagen, Tom Emery, Albert Jan van Hoek
Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY); Physics and Society (physics.soc-ph)
Arxiv link: https://arxiv.org/abs/2404.08098
Pdf link: https://arxiv.org/pdf/2404.08098
Abstract Understanding the impact of different social interactions is key to improving epidemic models. Here, we use extensive registry data -- including PCR test results and population-level networks -- to investigate the impact of school, family, and other social contacts on SARS-CoV-2 transmission in the Netherlands (June 2020--October 2021). We isolate and compare different contexts of potential SARS-CoV-2 transmission by matching pairs of students based on their attendance at the same or different primary school (in 2020) and secondary school (in 2021) and their geographic proximity. We then calculated the probability of temporally associated infections -- i.e. the probability of both students testing positive within a 14-day period. Our results highlight the relative importance of household and family transmission in the spread of SARS-CoV-2 compared to school settings. The probability of temporally associated infections for siblings and parent-child pairs living in the same household was 22.6--23.2\%, and 4.7--7.9\% for family members living in different household. In contrast, the probability of temporally associated infections was 0.52\% for pairs of students living nearby but not attending the same primary or secondary school, 0.66\% for pairs attending different secondary schools but having attended the same primary school, and 1.65\% for pairs attending the same secondary school. Finally, we used multilevel regression analyses to examine how individual, school, and geographic factors contribute to transmission risk. We found that the largest differences in transmission probabilities were due to unobserved individual (60\%) and school-level (34\%) factors. Only a small proportion (3\%) could be attributed to geographic proximity of students or to school size, denomination, or the median income of the school area.
Protein intrinsic disorder prediction using Attention U-Net and ProtTrans protein language model
Authors: Krzysztof Kotowski, Irena Roterman, Katarzyna Stapor
Subjects: Machine Learning (cs.LG); Biomolecules (q-bio.BM)
Arxiv link: https://arxiv.org/abs/2404.08108
Pdf link: https://arxiv.org/pdf/2404.08108
Abstract The prediction of intrinsic disorder regions has significant implications for understanding protein function, structure, and dynamics. It can help to discover novel functions or protein-protein interactions essential to designing new drugs, therapies, or enzymes. Recently, a new generation of predictors based on protein language models is emerging. These algorithms reach state-of-the-art accuracy without calculating time-consuming multiple sequence alignments (MSAs). The article pre-sents a new protein intrinsic disorder predictor DisorderUnetLM based on the Attention U-Net convolutional neural network using features from the protein language model ProtTrans. DisorderUnetLM shows top results in the direct comparison with flDPnn and IDP-CRF predictors using MSAs and with the SETH predictor using features from the same ProtTrans model. Moreover, among 41 predictors from the latest Critical Assessment of Protein Intrinsic Disorder Prediction (CAID-2) benchmark, it ranks 9th for the Disorder-PDB subset (with ROC-AUC of 0.924) and 1st for the Disorder-NOX subset (with ROC-AUC of 0.844) which confirms its potential to perform well in the upcoming CAID-3 challenge for which Disor-derUnetLM was submitted.
Self-Supervised Learning of Color Constancy
Authors: Markus R. Ernst, Francisco M. López, Arthur Aubret, Roland W. Fleming, Jochen Triesch
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.08127
Pdf link: https://arxiv.org/pdf/2404.08127
Abstract Color constancy (CC) describes the ability of the visual system to perceive an object as having a relatively constant color despite changes in lighting conditions. While CC and its limitations have been carefully characterized in humans, it is still unclear how the visual system acquires this ability during development. Here, we present a first study showing that CC develops in a neural network trained in a self-supervised manner through an invariance learning objective. During learning, objects are presented under changing illuminations, while the network aims to map subsequent views of the same object onto close-by latent representations. This gives rise to representations that are largely invariant to the illumination conditions, offering a plausible example of how CC could emerge during human cognitive development via a form of self-supervised learning.
Eliminating Catastrophic Overfitting Via Abnormal Adversarial Examples Regularization
Authors: Runqi Lin, Chaojian Yu, Tongliang Liu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.08154
Pdf link: https://arxiv.org/pdf/2404.08154
Abstract Single-step adversarial training (SSAT) has demonstrated the potential to achieve both efficiency and robustness. However, SSAT suffers from catastrophic overfitting (CO), a phenomenon that leads to a severely distorted classifier, making it vulnerable to multi-step adversarial attacks. In this work, we observe that some adversarial examples generated on the SSAT-trained network exhibit anomalous behaviour, that is, although these training samples are generated by the inner maximization process, their associated loss decreases instead, which we named abnormal adversarial examples (AAEs). Upon further analysis, we discover a close relationship between AAEs and classifier distortion, as both the number and outputs of AAEs undergo a significant variation with the onset of CO. Given this observation, we re-examine the SSAT process and uncover that before the occurrence of CO, the classifier already displayed a slight distortion, indicated by the presence of few AAEs. Furthermore, the classifier directly optimizing these AAEs will accelerate its distortion, and correspondingly, the variation of AAEs will sharply increase as a result. In such a vicious circle, the classifier rapidly becomes highly distorted and manifests as CO within a few iterations. These observations motivate us to eliminate CO by hindering the generation of AAEs. Specifically, we design a novel method, termed Abnormal Adversarial Examples Regularization (AAER), which explicitly regularizes the variation of AAEs to hinder the classifier from becoming distorted. Extensive experiments demonstrate that our method can effectively eliminate CO and further boost adversarial robustness with negligible additional computational overhead.
On the Power of Interactive Proofs for Learning
Authors: Tom Gur, Mohammad Mahdi Jahanara, Mohammad Mahdi Khodabandeh, Ninad Rajgopal, Bahar Salamatian, Igor Shinkar
Subjects: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.08158
Pdf link: https://arxiv.org/pdf/2404.08158
Abstract We continue the study of doubly-efficient proof systems for verifying agnostic PAC learning, for which we obtain the following results. - We construct an interactive protocol for learning the $t$ largest Fourier characters of a given function $f \colon {0,1}^n \to {0,1}$ up to an arbitrarily small error, wherein the verifier uses $\mathsf{poly}(t)$ random examples. This improves upon the Interactive Goldreich-Levin protocol of Goldwasser, Rothblum, Shafer, and Yehudayoff (ITCS 2021) whose sample complexity is $\mathsf{poly}(t,n)$. - For agnostically learning the class $\mathsf{AC}^0[2]$ under the uniform distribution, we build on the work of Carmosino, Impagliazzo, Kabanets, and Kolokolova (APPROX/RANDOM 2017) and design an interactive protocol, where given a function $f \colon {0,1}^n \to {0,1}$, the verifier learns the closest hypothesis up to $\mathsf{polylog}(n)$ multiplicative factor, using quasi-polynomially many random examples. In contrast, this class has been notoriously resistant even for constructing realisable learners (without a prover) using random examples. - For agnostically learning $k$-juntas under the uniform distribution, we obtain an interactive protocol, where the verifier uses $O(2^k)$ random examples to a given function $f \colon {0,1}^n \to {0,1}$. Crucially, the sample complexity of the verifier is independent of $n$. We also show that if we do not insist on doubly-efficient proof systems, then the model becomes trivial. Specifically, we show a protocol for an arbitrary class $\mathcal{C}$ of Boolean functions in the distribution-free setting, where the verifier uses $O(1)$ labeled examples to learn $f$.
Naively Sorting Evolving Data is Optimal and Robust
Authors: Marcos Kiwi, George Giakkoupis, Dimitrios Los
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2404.08162
Pdf link: https://arxiv.org/pdf/2404.08162
Abstract We study comparison sorting in the evolving data model [AKMU11], where the true total order changes while the sorting algorithm is processing the input. More precisely, each comparison operation of the algorithm is followed by a sequence of evolution steps, where an evolution step perturbs the rank of a random item by a "small" random value. The goal is to maintain an ordering that remains close to the true order over time. Previous works have analyzed adaptations of classic sorting algorithms, assuming that an evolution step changes the rank of an item by just one, and that a fixed constant number $b$ of evolution steps take place between two comparisons. In fact, the only previous result achieving optimal $O(n)$ total deviation from the true order, where $n$ is the number of items, applies just for $b=1$ [BDEGJ18]. We analyze a very simple sorting algorithm suggested in [M14], which samples a random pair of adjacent items in each step and swaps them if they are out of order. We show that the algorithm achieves and maintains, w.h.p., optimal total deviation, $O(n)$, and optimal maximum deviation, $O(\log n)$, under very general model settings. Namely, the perturbation introduced by each evolution step follows a distribution of bounded moment generating function, and over a linear number of steps, on average the number of evolution steps between two sorting steps is bounded by an arbitrary constant. Our proof consists of a novel potential function argument that inserts "gaps" in the list of items, and a general framework which separates the analysis of sorting from that of the evolution steps, and is applicable to a variety of settings for which previous approaches do not apply. Our results settle conjectures by [AKMU11] and [M14], and provide theoretical support for the empirical evidence that simple quadratic algorithms are optimal and robust for sorting evolving data [BDEGJ18].
Lightweight Cryptanalysis of IoT Encryption Algorithms : Is Quota Sampling the Answer?
Authors: Jonathan Cook, Sabih ur Rehman, M. Arif Khan
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2404.08165
Pdf link: https://arxiv.org/pdf/2404.08165
Abstract Rapid growth in the number of small sensor devices known as the Internet of Things (IoT) has seen the development of lightweight encryption algorithms. Two well-known lightweight algorithms are SIMON and SIMECK which have been specifically designed for use on resource-constrained IoT devices. These lightweight encryption algorithms are based on the efficient Feistel block structure which is known to exhibit vulnerabilities to differential cryptanalysis. Consequently, it is necessary to test these algorithms for resilience against such attacks. While existing state-of-the-art research has demonstrated novel heuristic methods of differential cryptanalysis that improve time efficiency on previous techniques, the large state sizes of these encryption algorithms inhibit cryptanalysis time efficiency. In this paper, we introduce Versatile Investigative Sampling Technique for Advanced Cryptanalysis (VISTA-CRYPT) - a time-efficient enhancement of differential cryptanalysis of lightweight encryption algorithms. The proposed technique introduces a simple framework of quota sampling that produces state-of-the-art results with time reductions of up to $76\%$ over existing techniques. Further, we present a preliminary graph-based analysis of the output differentials for the identification of relationships within the data and future research opportunities to further enhance the performance of differential cryptanalysis. The code designed for this work and associated datasets will be available at https://github.com/johncook1979/simon-cryptanalysis.
GazePointAR: A Context-Aware Multimodal Voice Assistant for Pronoun Disambiguation in Wearable Augmented Reality
Authors: Jaewook Lee, Jun Wang, Elizabeth Brown, Liam Chu, Sebastian S. Rodriguez, Jon E. Froehlich
Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2404.08213
Pdf link: https://arxiv.org/pdf/2404.08213
Abstract Voice assistants (VAs) like Siri and Alexa are transforming human-computer interaction; however, they lack awareness of users' spatiotemporal context, resulting in limited performance and unnatural dialogue. We introduce GazePointAR, a fully-functional context-aware VA for wearable augmented reality that leverages eye gaze, pointing gestures, and conversation history to disambiguate speech queries. With GazePointAR, users can ask "what's over there?" or "how do I solve this math problem?" simply by looking and/or pointing. We evaluated GazePointAR in a three-part lab study (N=12): (1) comparing GazePointAR to two commercial systems; (2) examining GazePointAR's pronoun disambiguation across three tasks; (3) and an open-ended phase where participants could suggest and try their own context-sensitive queries. Participants appreciated the naturalness and human-like nature of pronoun-driven queries, although sometimes pronoun use was counter-intuitive. We then iterated on GazePointAR and conducted a first-person diary study examining how GazePointAR performs in-the-wild. We conclude by enumerating limitations and design considerations for future context-aware VAs.
HCL-MTSAD: Hierarchical Contrastive Consistency Learning for Accurate Detection of Industrial Multivariate Time Series Anomalies
Authors: Haili Sun, Yan Huang, Lansheng Han, Cai Fu, Chunjie Zhou
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Information Theory (cs.IT); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2404.08224
Pdf link: https://arxiv.org/pdf/2404.08224
Abstract Multivariate Time Series (MTS) anomaly detection focuses on pinpointing samples that diverge from standard operational patterns, which is crucial for ensuring the safety and security of industrial applications. The primary challenge in this domain is to develop representations capable of discerning anomalies effectively. The prevalent methods for anomaly detection in the literature are predominantly reconstruction-based and predictive in nature. However, they typically concentrate on a single-dimensional instance level, thereby not fully harnessing the complex associations inherent in industrial MTS. To address this issue, we propose a novel self-supervised hierarchical contrastive consistency learning method for detecting anomalies in MTS, named HCL-MTSAD. It innovatively leverages data consistency at multiple levels inherent in industrial MTS, systematically capturing consistent associations across four latent levels-measurement, sample, channel, and process. By developing a multi-layer contrastive loss, HCL-MTSAD can extensively mine data consistency and spatio-temporal association, resulting in more informative representations. Subsequently, an anomaly discrimination module, grounded in self-supervised hierarchical contrastive learning, is designed to detect timestamp-level anomalies by calculating multi-scale data consistency. Extensive experiments conducted on six diverse MTS datasets retrieved from real cyber-physical systems and server machines, in comparison with 20 baselines, indicate that HCL-MTSAD's anomaly detection capability outperforms the state-of-the-art benchmark models by an average of 1.8\% in terms of F1 score.
MonoPatchNeRF: Improving Neural Radiance Fields with Patch-based Monocular Guidance
Authors: Yuqun Wu, Jae Yong Lee, Chuhang Zou, Shenlong Wang, Derek Hoiem
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.08252
Pdf link: https://arxiv.org/pdf/2404.08252
Abstract The latest regularized Neural Radiance Field (NeRF) approaches produce poor geometry and view extrapolation for multiview stereo (MVS) benchmarks such as ETH3D. In this paper, we aim to create 3D models that provide accurate geometry and view synthesis, partially closing the large geometric performance gap between NeRF and traditional MVS methods. We propose a patch-based approach that effectively leverages monocular surface normal and relative depth predictions. The patch-based ray sampling also enables the appearance regularization of normalized cross-correlation (NCC) and structural similarity (SSIM) between randomly sampled virtual and training views. We further show that "density restrictions" based on sparse structure-from-motion points can help greatly improve geometric accuracy with a slight drop in novel view synthesis metrics. Our experiments show 4x the performance of RegNeRF and 8x that of FreeNeRF on average F1@2cm for ETH3D MVS benchmark, suggesting a fruitful research direction to improve the geometric accuracy of NeRF-based models, and sheds light on a potential future approach to enable NeRF-based optimization to eventually outperform traditional MVS.
Balanced Mixed-Type Tabular Data Synthesis with Diffusion Models
Authors: Zeyu Yang, Peikun Guo, Khadija Zanna, Akane Sano
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.08254
Pdf link: https://arxiv.org/pdf/2404.08254
Abstract Diffusion models have emerged as a robust framework for various generative tasks, such as image and audio synthesis, and have also demonstrated a remarkable ability to generate mixed-type tabular data comprising both continuous and discrete variables. However, current approaches to training diffusion models on mixed-type tabular data tend to inherit the imbalanced distributions of features present in the training dataset, which can result in biased sampling. In this research, we introduce a fair diffusion model designed to generate balanced data on sensitive attributes. We present empirical evidence demonstrating that our method effectively mitigates the class imbalance in training data while maintaining the quality of the generated samples. Furthermore, we provide evidence that our approach outperforms existing methods for synthesizing tabular data in terms of performance and fairness.
Practical Region-level Attack against Segment Anything Models
Authors: Yifan Shen, Zhengyuan Li, Gang Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2404.08255
Pdf link: https://arxiv.org/pdf/2404.08255
Abstract Segment Anything Models (SAM) have made significant advancements in image segmentation, allowing users to segment target portions of an image with a single click (i.e., user prompt). Given its broad applications, the robustness of SAM against adversarial attacks is a critical concern. While recent works have explored adversarial attacks against a pre-defined prompt/click, their threat model is not yet realistic: (1) they often assume the user-click position is known to the attacker (point-based attack), and (2) they often operate under a white-box setting with limited transferability. In this paper, we propose a more practical region-level attack where attackers do not need to know the precise user prompt. The attack remains effective as the user clicks on any point on the target object in the image, hiding the object from SAM. Also, by adapting a spectrum transformation method, we make the attack more transferable under a black-box setting. Both control experiments and testing against real-world SAM services confirm its effectiveness.
Bottom-up Rebalancing Binary Search Trees by Flipping a Coin
Authors: Gerth Stølting Brodal
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2404.08287
Pdf link: https://arxiv.org/pdf/2404.08287
Abstract Rebalancing schemes for dynamic binary search trees are numerous in the literature, where the goal is to maintain trees of low height, either in the worst-case or expected sense. In this paper we study randomized rebalancing schemes for sequences of $n$ insertions into an initially empty binary search tree, under the assumption that a tree only stores the elements and the tree structure without any additional balance information. Seidel~(2009) presented a top-down randomized insertion algorithm, where insertions take expected $O\big(\lg^2 n\big)$ time, and the resulting trees have the same distribution as inserting a uniform random permutation into a binary search tree without rebalancing. Seidel states as an open problem if a similar result can be achieved with bottom-up insertions. In this paper we fail to answer this question. We consider two simple canonical randomized bottom-up insertion algorithms on binary search trees, assuming that an insertion is given the position where to insert the next element. The subsequent rebalancing is performed bottom-up in expected $O(1)$ time, uses expected $O(1)$ random bits, performs at most two rotations, and the rotations appear with geometrically decreasing probability in the distance from the leaf. For some insertion sequences the expected depth of each node is proved to be $O(\lg n)$. On the negative side, we prove for both algorithms that there exist simple insertion sequences where the expected depth is $\Omega(n)$, i.e., the studied rebalancing schemes are \emph{not} competitive with (most) other rebalancing schemes in the literature.
A Large Scale Survey of Motivation in Software Development and Analysis of its Validity
Authors: Idan Amit, Dror G. Feitelson
Subjects: Software Engineering (cs.SE); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.08303
Pdf link: https://arxiv.org/pdf/2404.08303
Abstract Context: Motivation is known to improve performance. In software development in particular, there has been considerable interest in the motivation of contributors to open source. Objective: We identify 11 motivators from the literature (enjoying programming, ownership of code, learning, self use, etc.), and evaluate their relative effect on motivation. Since motivation is an internal subjective feeling, we also analyze the validity of the answers. Method: We conducted a survey with 66 questions on motivation which was completed by 521 developers. Most of the questions used an 11 point scale. We evaluated the validity of the answers validity by comparing related questions, comparing to actual behavior on GitHub, and comparison with the same developer in a follow up survey. Results: Validity problems include moderate correlations between answers to related questions, as well as self promotion and mistakes in the answers. Despite these problems, predictive analysis, investigating how diverse motivators influence the probability of high motivation, provided valuable insights. The correlations between the different motivators are low, implying their independence. High values in all 11 motivators predict increased probability of high motivation. In addition, improvement analysis shows that an increase in most motivators predicts an increase in general motivation.
The Integration of Semantic and Structural Knowledge in Knowledge Graph Entity Typing
Authors: Muzhi Li, Minda Hu, Irwin King, Ho-fung Leung
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.08313
Pdf link: https://arxiv.org/pdf/2404.08313
Abstract The Knowledge Graph Entity Typing (KGET) task aims to predict missing type annotations for entities in knowledge graphs. Recent works only utilize the \textit{\textbf{structural knowledge}} in the local neighborhood of entities, disregarding \textit{\textbf{semantic knowledge}} in the textual representations of entities, relations, and types that are also crucial for type inference. Additionally, we observe that the interaction between semantic and structural knowledge can be utilized to address the false-negative problem. In this paper, we propose a novel \textbf{\underline{S}}emantic and \textbf{\underline{S}}tructure-aware KG \textbf{\underline{E}}ntity \textbf{\underline{T}}yping~{(SSET)} framework, which is composed of three modules. First, the \textit{Semantic Knowledge Encoding} module encodes factual knowledge in the KG with a Masked Entity Typing task. Then, the \textit{Structural Knowledge Aggregation} module aggregates knowledge from the multi-hop neighborhood of entities to infer missing types. Finally, the \textit{Unsupervised Type Re-ranking} module utilizes the inference results from the two models above to generate type predictions that are robust to false-negative samples. Extensive experiments show that SSET significantly outperforms existing state-of-the-art methods.
BOND: Bootstrapping From-Scratch Name Disambiguation with Multi-task Promoting
Authors: Yuqing Cheng, Bo Chen, Fanjin Zhang, Jie Tang
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.08322
Pdf link: https://arxiv.org/pdf/2404.08322
Abstract From-scratch name disambiguation is an essential task for establishing a reliable foundation for academic platforms. It involves partitioning documents authored by identically named individuals into groups representing distinct real-life experts. Canonically, the process is divided into two decoupled tasks: locally estimating the pairwise similarities between documents followed by globally grouping these documents into appropriate clusters. However, such a decoupled approach often inhibits optimal information exchange between these intertwined tasks. Therefore, we present BOND, which bootstraps the local and global informative signals to promote each other in an end-to-end regime. Specifically, BOND harnesses local pairwise similarities to drive global clustering, subsequently generating pseudo-clustering labels. These global signals further refine local pairwise characterizations. The experimental results establish BOND's superiority, outperforming other advanced baselines by a substantial margin. Moreover, an enhanced version, BOND+, incorporating ensemble and post-match techniques, rivals the top methods in the WhoIsWho competition.
Salience-Based Adaptive Masking: Revisiting Token Dynamics for Enhanced Pre-training
Authors: Hyesong Choi, Hyejin Park, Kwang Moo Yi, Sungmin Cha, Dongbo Min
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.08327
Pdf link: https://arxiv.org/pdf/2404.08327
Abstract In this paper, we introduce Saliency-Based Adaptive Masking (SBAM), a novel and cost-effective approach that significantly enhances the pre-training performance of Masked Image Modeling (MIM) approaches by prioritizing token salience. Our method provides robustness against variations in masking ratios, effectively mitigating the performance instability issues common in existing methods. This relaxes the sensitivity of MIM-based pre-training to masking ratios, which in turn allows us to propose an adaptive strategy for `tailored' masking ratios for each data sample, which no existing method can provide. Toward this goal, we propose an Adaptive Masking Ratio (AMR) strategy that dynamically adjusts the proportion of masking for the unique content of each image based on token salience. We show that our method significantly improves over the state-of-the-art in mask-based pre-training on the ImageNet-1K dataset.
OTFS Channel Estimation and Detection for Channels with Very Large Delay Spread
Authors: Preety Priya, Yi Hong, Emanuele Viterbo
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2404.08333
Pdf link: https://arxiv.org/pdf/2404.08333
Abstract In low latency applications and in general, for overspread channels, channel delay spread is a large percentage of the transmission frame duration. In this paper, we consider OTFS in an overspread channel exhibiting a delay spread that exceeds the block duration in a frame, where traditional channel estimation (CE) fails. We propose a two-stage CE method based on a delay-Doppler (DD) training frame, consisting of a dual chirp converted from time domain and a higher power pilot. The first stage employs a DD domain embedded pilot CE to estimate the aliased delays (due to modulo operation) and Doppler shifts, followed by identifying all the underspread paths not coinciding with any overspread path. The second stage utilizes time domain dual chirp correlation to estimate the actual delays and Doppler shifts of the remaining paths. This stage also resolves ambiguity in estimating delays and Doppler shifts for paths sharing same aliased delay. Furthermore, we present a modified low-complexity maximum ratio combining (MRC) detection algorithm for OTFS in overspread channels. Finally, we evaluate performance of OTFS using the proposed CE and the modified MRC detection in terms of normalized mean square error (NMSE) and bit error rate (BER).
Data-driven Interval MDP for Robust Control Synthesis
Authors: Rudi Coppola, Andrea Peruffo, Licio Romao, Alessandro Abate, Manuel Mazo Jr
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2404.08344
Pdf link: https://arxiv.org/pdf/2404.08344
Abstract The abstraction of dynamical systems is a powerful tool that enables the design of feedback controllers using a correct-by-design framework. We investigate a novel scheme to obtain data-driven abstractions of discrete-time stochastic processes in terms of richer discrete stochastic models, whose actions lead to nondeterministic transitions over the space of probability measures. The data-driven component of the proposed methodology lies in the fact that we only assume samples from an unknown probability distribution. We also rely on the model of the underlying dynamics to build our abstraction through backward reachability computations. The nondeterminism in the probability space is captured by a collection of Markov Processes, and we identify how this model can improve upon existing abstraction techniques in terms of satisfying temporal properties, such as safety or reach-avoid. The connection between the discrete and the underlying dynamics is made formal through the use of the scenario approach theory. Numerical experiments illustrate the advantages and main limitations of the proposed techniques with respect to existing approaches.
Learning to Rebalance Multi-Modal Optimization by Adaptively Masking Subnetworks
Authors: Yang Yang, Hongpeng Pan, Qing-Yuan Jiang, Yi Xu, Jinghui Tang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.08347
Pdf link: https://arxiv.org/pdf/2404.08347
Abstract Multi-modal learning aims to enhance performance by unifying models from various modalities but often faces the "modality imbalance" problem in real data, leading to a bias towards dominant modalities and neglecting others, thereby limiting its overall effectiveness. To address this challenge, the core idea is to balance the optimization of each modality to achieve a joint optimum. Existing approaches often employ a modal-level control mechanism for adjusting the update of each modal parameter. However, such a global-wise updating mechanism ignores the different importance of each parameter. Inspired by subnetwork optimization, we explore a uniform sampling-based optimization strategy and find it more effective than global-wise updating. According to the findings, we further propose a novel importance sampling-based, element-wise joint optimization method, called Adaptively Mask Subnetworks Considering Modal Significance(AMSS). Specifically, we incorporate mutual information rates to determine the modal significance and employ non-uniform adaptive sampling to select foreground subnetworks from each modality for parameter updates, thereby rebalancing multi-modal learning. Additionally, we demonstrate the reliability of the AMSS strategy through convergence analysis. Building upon theoretical insights, we further enhance the multi-modal mask subnetwork strategy using unbiased estimation, referred to as AMSS+. Extensive experiments reveal the superiority of our approach over comparison methods.
FlowWalker: A Memory-efficient and High-performance GPU-based Dynamic Graph Random Walk Framework
Authors: Junyi Mei, Shixuan Sun, Chao Li, Cheng Xu, Cheng Chen, Yibo Liu, Jing Wang, Cheng Zhao, Xiaofeng Hou, Minyi Guo, Bingsheng He, Xiaoliang Cong
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2404.08364
Pdf link: https://arxiv.org/pdf/2404.08364
Abstract Dynamic graph random walk (DGRW) emerges as a practical tool for capturing structural relations within a graph. Effectively executing DGRW on GPU presents certain challenges. First, existing sampling methods demand a pre-processing buffer, causing substantial space complexity. Moreover, the power-law distribution of graph vertex degrees introduces workload imbalance issues, rendering DGRW embarrassed to parallelize. In this paper, we propose FlowWalker, a GPU-based dynamic graph random walk framework. FlowWalker implements an efficient parallel sampling method to fully exploit the GPU parallelism and reduce space complexity. Moreover, it employs a sampler-centric paradigm alongside a dynamic scheduling strategy to handle the huge amounts of walking queries. FlowWalker stands as a memory-efficient framework that requires no auxiliary data structures in GPU global memory. We examine the performance of FlowWalker extensively on ten datasets, and experiment results show that FlowWalker achieves up to 752.2x, 72.1x, and 16.4x speedup compared with existing CPU, GPU, and FPGA random walk frameworks, respectively. Case study shows that FlowWalker diminishes random walk time from 35% to 3% in a pipeline of ByteDance friend recommendation GNN training.
Collective Bayesian Decision-Making in a Swarm of Miniaturized Robots for Surface Inspection
Authors: Thiemen Siemensma, Darren Chiu, Sneha Ramshanker, Radhika Nagpal, Bahar Haghighat
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2404.08390
Pdf link: https://arxiv.org/pdf/2404.08390
Abstract Robot swarms can effectively serve a variety of sensing and inspection applications. Certain inspection tasks require a binary classification decision. This work presents an experimental setup for a surface inspection task based on vibration sensing and studies a Bayesian two-outcome decision-making algorithm in a swarm of miniaturized wheeled robots. The robots are tasked with individually inspecting and collectively classifying a 1mx1m tiled surface consisting of vibrating and non-vibrating tiles based on the majority type of tiles. The robots sense vibrations using onboard IMUs and perform collision avoidance using a set of IR sensors. We develop a simulation and optimization framework leveraging the Webots robotic simulator and a Particle Swarm Optimization (PSO) method. We consider two existing information sharing strategies and propose a new one that allows the swarm to rapidly reach accurate classification decisions. We first find optimal parameters that allow efficient sampling in simulation and then evaluate our proposed strategy against the two existing ones using 100 randomized simulation and 10 real experiments. We find that our proposed method compels the swarm to make decisions at an accelerated rate, with an improvement of up to 20.52% in mean decision time at only 0.78% loss in accuracy.
Data-Driven Preference Sampling for Pareto Front Learning
Authors: Rongguang Ye, Lei Chen, Weiduo Liao, Jinyuan Zhang, Hisao Ishibuchi
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.08397
Pdf link: https://arxiv.org/pdf/2404.08397
Abstract Pareto front learning is a technique that introduces preference vectors in a neural network to approximate the Pareto front. Previous Pareto front learning methods have demonstrated high performance in approximating simple Pareto fronts. These methods often sample preference vectors from a fixed Dirichlet distribution. However, no fixed sampling distribution can be adapted to diverse Pareto fronts. Efficiently sampling preference vectors and accurately estimating the Pareto front is a challenge. To address this challenge, we propose a data-driven preference vector sampling framework for Pareto front learning. We utilize the posterior information of the objective functions to adjust the parameters of the sampling distribution flexibly. In this manner, the proposed method can sample preference vectors from the location of the Pareto front with a high probability. Moreover, we design the distribution of the preference vector as a mixture of Dirichlet distributions to improve the performance of the model in disconnected Pareto fronts. Extensive experiments validate the superiority of the proposed method compared with state-of-the-art algorithms.
Seismic First Break Picking in a Higher Dimension Using Deep Graph Learning
Authors: Hongtao Wang, Li Long, Jiangshe Zhang, Xiaoli Wei, Chunxia Zhang, Zhenbo Guo
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP); Geophysics (physics.geo-ph)
Arxiv link: https://arxiv.org/abs/2404.08408
Pdf link: https://arxiv.org/pdf/2404.08408
Abstract Contemporary automatic first break (FB) picking methods typically analyze 1D signals, 2D source gathers, or 3D source-receiver gathers. Utilizing higher-dimensional data, such as 2D or 3D, incorporates global features, improving the stability of local picking. Despite the benefits, high-dimensional data requires structured input and increases computational demands. Addressing this, we propose a novel approach using deep graph learning called DGL-FB, constructing a large graph to efficiently extract information. In this graph, each seismic trace is represented as a node, connected by edges that reflect similarities. To manage the size of the graph, we develop a subgraph sampling technique to streamline model training and inference. Our proposed framework, DGL-FB, leverages deep graph learning for FB picking. It encodes subgraphs into global features using a deep graph encoder. Subsequently, the encoded global features are combined with local node signals and fed into a ResUNet-based 1D segmentation network for FB detection. Field survey evaluations of DGL-FB show superior accuracy and stability compared to a 2D U-Net-based benchmark method.
Evolutionary Preference Sampling for Pareto Set Learning
Authors: Rongguang Ye, Longcan Chen, Jinyuan Zhang, Hisao Ishibuchi
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.08414
Pdf link: https://arxiv.org/pdf/2404.08414
Abstract Recently, Pareto Set Learning (PSL) has been proposed for learning the entire Pareto set using a neural network. PSL employs preference vectors to scalarize multiple objectives, facilitating the learning of mappings from preference vectors to specific Pareto optimal solutions. Previous PSL methods have shown their effectiveness in solving artificial multi-objective optimization problems (MOPs) with uniform preference vector sampling. The quality of the learned Pareto set is influenced by the sampling strategy of the preference vector, and the sampling of the preference vector needs to be decided based on the Pareto front shape. However, a fixed preference sampling strategy cannot simultaneously adapt the Pareto front of multiple MOPs. To address this limitation, this paper proposes an Evolutionary Preference Sampling (EPS) strategy to efficiently sample preference vectors. Inspired by evolutionary algorithms, we consider preference sampling as an evolutionary process to generate preference vectors for neural network training. We integrate the EPS strategy into five advanced PSL methods. Extensive experiments demonstrate that our proposed method has a faster convergence speed than baseline algorithms on 7 testing problems. Our implementation is available at https://github.com/rG223/EPS.
AdapterSwap: Continuous Training of LLMs with Data Removal and Access-Control Guarantees
Authors: William Fleshman, Aleem Khan, Marc Marone, Benjamin Van Durme
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.08417
Pdf link: https://arxiv.org/pdf/2404.08417
Abstract Large language models (LLMs) are increasingly capable of completing knowledge intensive tasks by recalling information from a static pretraining corpus. Here we are concerned with LLMs in the context of evolving data requirements. For instance: batches of new data that are introduced periodically; subsets of data with user-based access controls; or requirements on dynamic removal of documents with guarantees that associated knowledge cannot be recalled. We wish to satisfy these requirements while at the same time ensuring a model does not forget old information when new data becomes available. To address these issues, we introduce AdapterSwap, a training and inference scheme that organizes knowledge from a data collection into a set of low-rank adapters, which are dynamically composed during inference. Our experiments demonstrate AdapterSwap's ability to support efficient continual learning, while also enabling organizations to have fine-grained control over data access and deletion.
Direct May Not Be the Best: An Incremental Evolution View of Pose Generation
Authors: Yuelong Li, Tengfei Xiao, Lei Geng, Jianming Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.08419
Pdf link: https://arxiv.org/pdf/2404.08419
Abstract Pose diversity is an inherent representative characteristic of 2D images. Due to the 3D to 2D projection mechanism, there is evident content discrepancy among distinct pose images. This is the main obstacle bothering pose transformation related researches. To deal with this challenge, we propose a fine-grained incremental evolution centered pose generation framework, rather than traditional direct one-to-one in a rush. Since proposed approach actually bypasses the theoretical difficulty of directly modeling dramatic non-linear variation, the incurred content distortion and blurring could be effectively constrained, at the same time the various individual pose details, especially clothes texture, could be precisely maintained. In order to systematically guide the evolution course, both global and incremental evolution constraints are elaborately designed and merged into the overall frame?work. And a novel triple-path knowledge fusion structure is worked out to take full advantage of all available valuable knowledge to conduct high-quality pose synthesis. In addition, our framework could generate a series of valuable byproducts, namely the various intermediate poses. Extensive experiments have been conducted to verify the effectiveness of the proposed approach. Code is available at https://github.com/Xiaofei-CN/Incremental-Evolution-Pose-Generation.
Adapting the Segment Anything Model During Usage in Novel Situations
Authors: Robin Schön, Julian Lorenz, Katja Ludwig, Rainer Lienhart
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.08421
Pdf link: https://arxiv.org/pdf/2404.08421
Abstract The interactive segmentation task consists in the creation of object segmentation masks based on user interactions. The most common way to guide a model towards producing a correct segmentation consists in clicks on the object and background. The recently published Segment Anything Model (SAM) supports a generalized version of the interactive segmentation problem and has been trained on an object segmentation dataset which contains 1.1B masks. Though being trained extensively and with the explicit purpose of serving as a foundation model, we show significant limitations of SAM when being applied for interactive segmentation on novel domains or object types. On the used datasets, SAM displays a failure rate $\text{FR}{30}@90$ of up to $72.6 \%$. Since we still want such foundation models to be immediately applicable, we present a framework that can adapt SAM during immediate usage. For this we will leverage the user interactions and masks, which are constructed during the interactive segmentation process. We use this information to generate pseudo-labels, which we use to compute a loss function and optimize a part of the SAM model. The presented method causes a relative reduction of up to $48.1 \%$ in the $\text{FR}{20}@85$ and $46.6 \%$ in the $\text{FR}_{30}@90$ metrics.
Anti-Byzantine Attacks Enabled Vehicle Selection for Asynchronous Federated Learning in Vehicular Edge Computing
Authors: Cui Zhang, Xiao Xu, Qiong Wu, Pingyi Fan, Qiang Fan, Huiling Zhu, Jiangzhou Wang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.08444
Pdf link: https://arxiv.org/pdf/2404.08444
Abstract In vehicle edge computing (VEC), asynchronous federated learning (AFL) is used, where the edge receives a local model and updates the global model, effectively reducing the global aggregation latency.Due to different amounts of local data,computing capabilities and locations of the vehicles, renewing the global model with same weight is inappropriate.The above factors will affect the local calculation time and upload time of the local model, and the vehicle may also be affected by Byzantine attacks, leading to the deterioration of the vehicle data. However, based on deep reinforcement learning (DRL), we can consider these factors comprehensively to eliminate vehicles with poor performance as much as possible and exclude vehicles that have suffered Byzantine attacks before AFL. At the same time, when aggregating AFL, we can focus on those vehicles with better performance to improve the accuracy and safety of the system. In this paper, we proposed a vehicle selection scheme based on DRL in VEC. In this scheme, vehicle s mobility, channel conditions with temporal variations, computational resources with temporal variations, different data amount, transmission channel status of vehicles as well as Byzantine attacks were taken into account.Simulation results show that the proposed scheme effectively improves the safety and accuracy of the global model.
Federated Optimization with Doubly Regularized Drift Correction
Authors: Xiaowen Jiang, Anton Rodomanov, Sebastian U. Stich
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2404.08447
Pdf link: https://arxiv.org/pdf/2404.08447
Abstract Federated learning is a distributed optimization paradigm that allows training machine learning models across decentralized devices while keeping the data localized. The standard method, FedAvg, suffers from client drift which can hamper performance and increase communication costs over centralized methods. Previous works proposed various strategies to mitigate drift, yet none have shown uniformly improved communication-computation trade-offs over vanilla gradient descent. In this work, we revisit DANE, an established method in distributed optimization. We show that (i) DANE can achieve the desired communication reduction under Hessian similarity constraints. Furthermore, (ii) we present an extension, DANE+, which supports arbitrary inexact local solvers and has more freedom to choose how to aggregate the local updates. We propose (iii) a novel method, FedRed, which has improved local computational complexity and retains the same communication complexity compared to DANE/DANE+. This is achieved by using doubly regularized drift correction.
Joint Physical-Digital Facial Attack Detection Via Simulating Spoofing Clues
Authors: Xianhua He, Dashuang Liang, Song Yang, Zhanlong Hao, Hui Ma, Binjie Mao, Xi Li, Yao Wang, Pengfei Yan, Ajian Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.08450
Pdf link: https://arxiv.org/pdf/2404.08450
Abstract Face recognition systems are frequently subjected to a variety of physical and digital attacks of different types. Previous methods have achieved satisfactory performance in scenarios that address physical attacks and digital attacks, respectively. However, few methods are considered to integrate a model that simultaneously addresses both physical and digital attacks, implying the necessity to develop and maintain multiple models. To jointly detect physical and digital attacks within a single model, we propose an innovative approach that can adapt to any network architecture. Our approach mainly contains two types of data augmentation, which we call Simulated Physical Spoofing Clues augmentation (SPSC) and Simulated Digital Spoofing Clues augmentation (SDSC). SPSC and SDSC augment live samples into simulated attack samples by simulating spoofing clues of physical and digital attacks, respectively, which significantly improve the capability of the model to detect "unseen" attack types. Extensive experiments show that SPSC and SDSC can achieve state-of-the-art generalization in Protocols 2.1 and 2.2 of the UniAttackData dataset, respectively. Our method won first place in "Unified Physical-Digital Face Attack Detection" of the 5th Face Anti-spoofing Challenge@CVPR2024. Our final submission obtains 3.75% APCER, 0.93% BPCER, and 2.34% ACER, respectively. Our code is available at https://github.com/Xianhua-He/cvpr2024-face-anti-spoofing-challenge.
Lightweight Multi-System Multivariate Interconnection and Divergence Discovery
Authors: Mulugeta Weldezgina Asres, Christian Walter Omlin, Jay Dittmann, Pavel Parygin, Joshua Hiltbrand, Seth I. Cooper, Grace Cummings, David Yu
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2404.08453
Pdf link: https://arxiv.org/pdf/2404.08453
Abstract Identifying outlier behavior among sensors and subsystems is essential for discovering faults and facilitating diagnostics in large systems. At the same time, exploring large systems with numerous multivariate data sets is challenging. This study presents a lightweight interconnection and divergence discovery mechanism (LIDD) to identify abnormal behavior in multi-system environments. The approach employs a multivariate analysis technique that first estimates the similarity heatmaps among the sensors for each system and then applies information retrieval algorithms to provide relevant multi-level interconnection and discrepancy details. Our experiment on the readout systems of the Hadron Calorimeter of the Compact Muon Solenoid (CMS) experiment at CERN demonstrates the effectiveness of the proposed method. Our approach clusters readout systems and their sensors consistent with the expected calorimeter interconnection configurations, while capturing unusual behavior in divergent clusters and estimating their root causes.
The Squared Kemeny Rule for Averaging Rankings
Authors: Patrick Lederer, Dominik Peters, Tomasz Wąs
Subjects: Computer Science and Game Theory (cs.GT); Theoretical Economics (econ.TH)
Arxiv link: https://arxiv.org/abs/2404.08474
Pdf link: https://arxiv.org/pdf/2404.08474
Abstract For the problem of aggregating several rankings into one ranking, Kemeny (1959) proposed two methods: the median rule which selects the ranking with the smallest total swap distance to the input rankings, and the mean rule which minimizes the squared swap distances to the input rankings. The median rule has been extensively studied since and is now known simply as Kemeny's rule. It exhibits majoritarian properties, so for example if more than half of the input rankings are the same, then the output of the rule is the same ranking. We observe that this behavior is undesirable in many rank aggregation settings. For example, when we rank objects by different criteria (quality, price, etc.) and want to aggregate them with specified weights for the criteria, then a criterion with weight 51% should have 51% influence on the output instead of 100%. We show that the Squared Kemeny rule (i.e., the mean rule) behaves this way, by establishing a bound on the distance of the output ranking to any input rankings, as a function of their weights. Furthermore, we give an axiomatic characterization of the Squared Kemeny rule, which mirrors the existing characterization of the Kemeny rule but replaces the majoritarian Condorcet axiom by a proportionality axiom. Finally, we discuss the computation of the rule and show its behavior in a simulation study.
Thematic Analysis with Large Language Models: does it work with languages other than English? A targeted test in Italian
Authors: Stefano De Paoli
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.08488
Pdf link: https://arxiv.org/pdf/2404.08488
Abstract This paper proposes a test to perform Thematic Analysis (TA) with Large Language Model (LLM) on data which is in a different language than English. While there has been initial promising work on using pre-trained LLMs for TA on data in English, we lack any tests on whether these models can reasonably perform the same analysis with good quality in other language. In this paper a test will be proposed using an open access dataset of semi-structured interviews in Italian. The test shows that a pre-trained model can perform such a TA on the data, also using prompts in Italian. A comparative test shows the model capacity to produce themes which have a good resemblance with those produced independently by human researchers. The main implication of this study is that pre-trained LLMs may thus be suitable to support analysis in multilingual situations, so long as the language is supported by the model used.
Mitigating Language-Level Performance Disparity in mPLMs via Teacher Language Selection and Cross-lingual Self-Distillation
Authors: Haozhe Zhao, Zefan Cai, Shuzheng Si, Liang Chen, Yufeng He, Kaikai An, Baobao Chang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.08491
Pdf link: https://arxiv.org/pdf/2404.08491
Abstract Large-scale multilingual Pretrained Language Models (mPLMs) yield impressive performance on cross-language tasks, yet significant performance disparities exist across different languages within the same mPLM. Previous studies endeavored to narrow these disparities by supervise fine-tuning the mPLMs with multilingual data. However, obtaining labeled multilingual data is time-consuming, and fine-tuning mPLM with limited labeled multilingual data merely encapsulates the knowledge specific to the labeled data. Therefore, we introduce ALSACE to leverage the learned knowledge from the well-performing languages to guide under-performing ones within the same mPLM, eliminating the need for additional labeled multilingual data. Experiments show that ALSACE effectively mitigates language-level performance disparity across various mPLMs while showing the competitive performance on different multilingual NLU tasks, ranging from full resource to limited resource settings. The code for our approach is available at https://github.com/pkunlp-icler/ALSACE.
Almost-Sure Termination by Guarded Refinement
Authors: Simon Oddershede Gregersen, Alejandro Aguirre, Philipp G. Haselwarter, Joseph Tassarotti, Lars Birkedal
Subjects: Logic in Computer Science (cs.LO); Programming Languages (cs.PL)
Arxiv link: https://arxiv.org/abs/2404.08494
Pdf link: https://arxiv.org/pdf/2404.08494
Abstract Almost-sure termination is an important correctness property for probabilistic programs, and a number of program logics have been developed for establishing it. However, these logics have mostly been developed for first-order programs written in languages with specific syntactic patterns for looping. In this paper, we consider almost-sure termination for higher-order probabilistic programs with general references. This combination of features allows for recursion and looping to be encoded through a variety of patterns. Therefore, rather than developing proof rules for reasoning about particular recursion patterns, we instead propose an approach based on proving refinement between a higher-order program and a simpler probabilistic model, in such a way that the refinement preserves termination behavior. By proving a refinement, almost-sure termination behavior of the program can then be established by analyzing the simpler model. We present this approach in the form of Caliper, a higher-order separation logic for proving termination-preserving refinements. Caliper uses probabilistic couplings to carry out relational reasoning between a program and a model. To handle the range of recursion patterns found in higher-order programs, Caliper uses guarded recursion, in particular the principle of L\"ob induction. A technical novelty is that Caliper does not require the use of transfinite step indexing or other technical restrictions found in prior work on guarded recursion for termination-preservation refinement. We demonstrate the flexibility of this approach by proving almost-sure termination of several examples, including first-order loop constructs, a random list generator, treaps, and a sampler for Galton-Watson trees that uses higher-order store. All the results have been mechanized in the Coq proof assistant.
Dataset Reset Policy Optimization for RLHF
Authors: Jonathan D. Chang, Wenhao Shan, Owen Oertell, Kianté Brantley, Dipendra Misra, Jason D. Lee, Wen Sun
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.08495
Pdf link: https://arxiv.org/pdf/2404.08495
Abstract Reinforcement Learning (RL) from Human Preference-based feedback is a popular paradigm for fine-tuning generative models, which has produced impressive models such as GPT-4 and Claude3 Opus. This framework often consists of two steps: learning a reward model from an offline preference dataset followed by running online RL to optimize the learned reward model. In this work, leveraging the idea of reset, we propose a new RLHF algorithm with provable guarantees. Motivated by the fact that offline preference dataset provides informative states (i.e., data that is preferred by the labelers), our new algorithm, Dataset Reset Policy Optimization (DR-PO), integrates the existing offline preference dataset into the online policy training procedure via dataset reset: it directly resets the policy optimizer to the states in the offline dataset, instead of always starting from the initial state distribution. In theory, we show that DR-PO learns to perform at least as good as any policy that is covered by the offline dataset under general function approximation with finite sample complexity. In experiments, we demonstrate that on both the TL;DR summarization and the Anthropic Helpful Harmful (HH) dataset, the generation from DR-PO is better than that from Proximal Policy Optimization (PPO) and Direction Preference Optimization (DPO), under the metric of GPT4 win-rate. Code for this work can be found at https://github.com/Cornell-RL/drpo.
Adversarial Imitation Learning via Boosting
Authors: Jonathan D. Chang, Dhruv Sreenivas, Yingbing Huang, Kianté Brantley, Wen Sun
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.08513
Pdf link: https://arxiv.org/pdf/2404.08513
Abstract Adversarial imitation learning (AIL) has stood out as a dominant framework across various imitation learning (IL) applications, with Discriminator Actor Critic (DAC) (Kostrikov et al.,, 2019) demonstrating the effectiveness of off-policy learning algorithms in improving sample efficiency and scalability to higher-dimensional observations. Despite DAC's empirical success, the original AIL objective is on-policy and DAC's ad-hoc application of off-policy training does not guarantee successful imitation (Kostrikov et al., 2019; 2020). Follow-up work such as ValueDICE (Kostrikov et al., 2020) tackles this issue by deriving a fully off-policy AIL objective. Instead in this work, we develop a novel and principled AIL algorithm via the framework of boosting. Like boosting, our new algorithm, AILBoost, maintains an ensemble of properly weighted weak learners (i.e., policies) and trains a discriminator that witnesses the maximum discrepancy between the distributions of the ensemble and the expert policy. We maintain a weighted replay buffer to represent the state-action distribution induced by the ensemble, allowing us to train discriminators using the entire data collected so far. In the weighted replay buffer, the contribution of the data from older policies are properly discounted with the weight computed based on the boosting framework. Empirically, we evaluate our algorithm on both controller state-based and pixel-based environments from the DeepMind Control Suite. AILBoost outperforms DAC on both types of environments, demonstrating the benefit of properly weighting replay buffer data for off-policy training. On state-based environments, DAC outperforms ValueDICE and IQ-Learn (Gary et al., 2021), achieving competitive performance with as little as one expert trajectory.
Automatic Recommendations for Evolving Relational Databases Schema
Authors: Anne Etien, Nicolas Anquetil
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2404.08525
Pdf link: https://arxiv.org/pdf/2404.08525
Abstract Relational databases play a central role in many information systems. Their schema contains structural (e.g. tables and columns) and behavioral (e.g. stored procedures or views) entity descriptions. Then, just like for ``normal'' software, changes in legislation, offered functionalities, or functional contexts, impose to evolve databases and their schemas. But in some scenarios, it is not so easy to deconstruct a wished evolution of the schema into a precise sequence of operations. Changing a database schema may impose manually dropping and recreating dependent entities, or manually searching for dependencies in stored procedures. This is important because getting even the order of application of the operators can be difficult and have profound consequences. This meta-model allows us to compute the impact of planned changes and recommend additional changes that will ensure that the RDBMS constraints are always verified. The recommendations can then be compiled into a valid SQL patch actually updating the database schema in an orderly way. We replicated a past evolution showing that, without detailed knowledge of the database, we could perform the same change in 75\% less time than the expert database architect. We also exemplify the use of our approach on other planned changes.
Pathological Primitive Segmentation Based on Visual Foundation Model with Zero-Shot Mask Generation
Authors: Abu Bakor Hayat Arnob, Xiangxue Wang, Yiping Jiao, Xiao Gan, Wenlong Ming, Jun Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.08584
Pdf link: https://arxiv.org/pdf/2404.08584
Abstract Medical image processing usually requires a model trained with carefully crafted datasets due to unique image characteristics and domain-specific challenges, especially in pathology. Primitive detection and segmentation in digitized tissue samples are essential for objective and automated diagnosis and prognosis of cancer. SAM (Segment Anything Model) has recently been developed to segment general objects from natural images with high accuracy, but it requires human prompts to generate masks. In this work, we present a novel approach that adapts pre-trained natural image encoders of SAM for detection-based region proposals. Regions proposed by a pre-trained encoder are sent to cascaded feature propagation layers for projection. Then, local semantic and global context is aggregated from multi-scale for bounding box localization and classification. Finally, the SAM decoder uses the identified bounding boxes as essential prompts to generate a comprehensive primitive segmentation map. The entire base framework, SAM, requires no additional training or fine-tuning but could produce an end-to-end result for two fundamental segmentation tasks in pathology. Our method compares with state-of-the-art models in F1 score for nuclei detection and binary/multiclass panoptic(bPQ/mPQ) and mask quality(dice) for segmentation quality on the PanNuke dataset while offering end-to-end efficiency. Our model also achieves remarkable Average Precision (+4.5%) on the secondary dataset (HuBMAP Kidney) compared to Faster RCNN. The code is publicly available at https://github.com/learner-codec/autoprom_sam.
Improving Referring Image Segmentation using Vision-Aware Text Features
Authors: Hai Nguyen-Truong, E-Ro Nguyen, Tuan-Anh Vu, Minh-Triet Tran, Binh-Son Hua, Sai-Kit Yeung
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.08590
Pdf link: https://arxiv.org/pdf/2404.08590
Abstract Referring image segmentation is a challenging task that involves generating pixel-wise segmentation masks based on natural language descriptions. Existing methods have relied mostly on visual features to generate the segmentation masks while treating text features as supporting components. This over-reliance on visual features can lead to suboptimal results, especially in complex scenarios where text prompts are ambiguous or context-dependent. To overcome these challenges, we present a novel framework VATEX to improve referring image segmentation by enhancing object and context understanding with Vision-Aware Text Feature. Our method involves using CLIP to derive a CLIP Prior that integrates an object-centric visual heatmap with text description, which can be used as the initial query in DETR-based architecture for the segmentation task. Furthermore, by observing that there are multiple ways to describe an instance in an image, we enforce feature similarity between text variations referring to the same visual input by two components: a novel Contextual Multimodal Decoder that turns text embeddings into vision-aware text features, and a Meaning Consistency Constraint to ensure further the coherent and consistent interpretation of language expressions with the context understanding obtained from the image. Our method achieves a significant performance improvement on three benchmark datasets RefCOCO, RefCOCO+ and G-Ref. Code is available at: https://nero1342.github.io/VATEX\_RIS.
Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation
Authors: Yanhao Zheng, Kai Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.08603
Pdf link: https://arxiv.org/pdf/2404.08603
Abstract Open-vocabulary object detection (OVOD) aims at localizing and recognizing visual objects from novel classes unseen at the training time. Whereas, empirical studies reveal that advanced detectors generally assign lower scores to those novel instances, which are inadvertently suppressed during inference by commonly adopted greedy strategies like Non-Maximum Suppression (NMS), leading to sub-optimal detection performance for novel classes. This paper systematically investigates this problem with the commonly-adopted two-stage OVOD paradigm. Specifically, in the region-proposal stage, proposals that contain novel instances showcase lower objectness scores, since they are treated as background proposals during the training phase. Meanwhile, in the object-classification stage, novel objects share lower region-text similarities (i.e., classification scores) due to the biased visual-language alignment by seen training samples. To alleviate this problem, this paper introduces two advanced measures to adjust confidence scores and conserve erroneously dismissed objects: (1) a class-agnostic localization quality estimate via overlap degree of region/object proposals, and (2) a text-guided visual similarity estimate with proxy prototypes for novel classes. Integrated with adjusting techniques specifically designed for the region-proposal and object-classification stages, this paper derives the aggregated confidence estimate for the open-vocabulary object detection paradigm (AggDet). Our AggDet is a generic and training-free post-processing scheme, which consistently bolsters open-vocabulary detectors across model scales and architecture designs. For instance, AggDet receives 3.3% and 1.5% gains on OV-COCO and OV-LVIS benchmarks respectively, without any training cost.
Automatic Quantification of Serial PET/CT Images for Pediatric Hodgkin Lymphoma Patients Using a Longitudinally-Aware Segmentation Network
Authors: Xin Tie, Muheon Shin, Changhee Lee, Scott B. Perlman, Zachary Huemann, Amy J. Weisman, Sharon M. Castellino, Kara M. Kelly, Kathleen M. McCarten, Adina L. Alazraki, Junjie Hu, Steve Y. Cho, Tyler J. Bradshaw
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Medical Physics (physics.med-ph)
Arxiv link: https://arxiv.org/abs/2404.08611
Pdf link: https://arxiv.org/pdf/2404.08611
Abstract $\textbf{Purpose}$: Automatic quantification of longitudinal changes in PET scans for lymphoma patients has proven challenging, as residual disease in interim-therapy scans is often subtle and difficult to detect. Our goal was to develop a longitudinally-aware segmentation network (LAS-Net) that can quantify serial PET/CT images for pediatric Hodgkin lymphoma patients. $\textbf{Materials and Methods}$: This retrospective study included baseline (PET1) and interim (PET2) PET/CT images from 297 patients enrolled in two Children's Oncology Group clinical trials (AHOD1331 and AHOD0831). LAS-Net incorporates longitudinal cross-attention, allowing relevant features from PET1 to inform the analysis of PET2. Model performance was evaluated using Dice coefficients for PET1 and detection F1 scores for PET2. Additionally, we extracted and compared quantitative PET metrics, including metabolic tumor volume (MTV) and total lesion glycolysis (TLG) in PET1, as well as qPET and $\Delta$SUVmax in PET2, against physician measurements. We quantified their agreement using Spearman's $\rho$ correlations and employed bootstrap resampling for statistical analysis. $\textbf{Results}$: LAS-Net detected residual lymphoma in PET2 with an F1 score of 0.606 (precision/recall: 0.615/0.600), outperforming all comparator methods (P<0.01). For baseline segmentation, LAS-Net achieved a mean Dice score of 0.772. In PET quantification, LAS-Net's measurements of qPET, $\Delta$SUVmax, MTV and TLG were strongly correlated with physician measurements, with Spearman's $\rho$ of 0.78, 0.80, 0.93 and 0.96, respectively. The performance remained high, with a slight decrease, in an external testing cohort. $\textbf{Conclusion}$: LAS-Net achieved high performance in quantifying PET metrics across serial scans, highlighting the value of longitudinal awareness in evaluating multi-time-point imaging datasets.
Synthetic Dataset Creation and Fine-Tuning of Transformer Models for Question Answering in Serbian
Authors: Aleksa Cvetanović, Predrag Tadić
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.08617
Pdf link: https://arxiv.org/pdf/2404.08617
Abstract In this paper, we focus on generating a synthetic question answering (QA) dataset using an adapted Translate-Align-Retrieve method. Using this method, we created the largest Serbian QA dataset of more than 87K samples, which we name SQuAD-sr. To acknowledge the script duality in Serbian, we generated both Cyrillic and Latin versions of the dataset. We investigate the dataset quality and use it to fine-tune several pre-trained QA models. Best results were obtained by fine-tuning the BERTi\'c model on our Latin SQuAD-sr dataset, achieving 73.91% Exact Match and 82.97% F1 score on the benchmark XQuAD dataset, which we translated into Serbian for the purpose of evaluation. The results show that our model exceeds zero-shot baselines, but fails to go beyond human performance. We note the advantage of using a monolingual pre-trained model over multilingual, as well as the performance increase gained by using Latin over Cyrillic. By performing additional analysis, we show that questions about numeric values or dates are more likely to be answered correctly than other types of questions. Finally, we conclude that SQuAD-sr is of sufficient quality for fine-tuning a Serbian QA model, in the absence of a manually crafted and annotated dataset.
Using Information Flow to estimate interference between developers same method contributions
Authors: Roberto Souto Maior de Barros Filho, Paulo Borba
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2404.08619
Pdf link: https://arxiv.org/pdf/2404.08619
Abstract This work's main goal is to understand if Information Flow Control (IFC), a security technique used for discovering leaks in software, could be used to indicate the presence of dynamic semantic conflicts between developers contributions in merge scenarios. However, as defining if a dynamic semantic conflict exists involves understanding the expected behaviour of a system, and as such behavioural specifications are often hard to capture, formalize and reason about, we instead try to detect a code level adaptation of the notion of interference from Goguen and Meseguer. We limit our scope to interference caused by developers contributions on the same method. Therefore, we conduct an evaluation to understand if information flow may be used to estimate interference. In particular, we use Java Object-sensitive Analysis (JOANA) to do the IFC for Java programs. JOANA does the IFC of Java programs by using a System Dependence Graph (SDG), a directed graph representing the information flow through a program. Additionally, we bring evidence that information flow between developers same-method contributions occurred for around 64% of the scenarios we evaluated. Finally, we conducted a manual analysis, on 35 scenarios with information flow between developers same-method contributions, to understand the limitations of using information flow to estimate interference between same-method contributions. From the 35 analysed scenarios, for only 15 we considered that an interference in fact existed. We found three different major reasons for detecting information flow and no interference: cases related to the nature of changes, to excessive annotation from our strategy and to the conservativeness of the flows identified by JOANA. We conclude that information flow may be used to estimate interference, but, ideally, the number of false positives should be reduced.
FCert: Certifiably Robust Few-Shot Classification in the Era of Foundation Models
Authors: Yanting Wang, Wei Zou, Jinyuan Jia
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2404.08631
Pdf link: https://arxiv.org/pdf/2404.08631
Abstract Few-shot classification with foundation models (e.g., CLIP, DINOv2, PaLM-2) enables users to build an accurate classifier with a few labeled training samples (called support samples) for a classification task. However, an attacker could perform data poisoning attacks by manipulating some support samples such that the classifier makes the attacker-desired, arbitrary prediction for a testing input. Empirical defenses cannot provide formal robustness guarantees, leading to a cat-and-mouse game between the attacker and defender. Existing certified defenses are designed for traditional supervised learning, resulting in sub-optimal performance when extended to few-shot classification. In our work, we propose FCert, the first certified defense against data poisoning attacks to few-shot classification. We show our FCert provably predicts the same label for a testing input under arbitrary data poisoning attacks when the total number of poisoned support samples is bounded. We perform extensive experiments on benchmark few-shot classification datasets with foundation models released by OpenAI, Meta, and Google in both vision and text domains. Our experimental results show our FCert: 1) maintains classification accuracy without attacks, 2) outperforms existing state-of-the-art certified defenses for data poisoning attacks, and 3) is efficient and general.
Pre-training Small Base LMs with Fewer Tokens
Authors: Sunny Sanyal, Sujay Sanghavi, Alexandros G. Dimakis
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.08634
Pdf link: https://arxiv.org/pdf/2404.08634
Abstract We study the effectiveness of a simple approach to develop a small base language model (LM) starting from an existing large base LM: first inherit a few transformer blocks from the larger LM, and then train this smaller model on a very small subset (0.1\%) of the raw pretraining data of the larger model. We call our simple recipe Inheritune and first demonstrate it for building a small base LM with 1.5B parameters using 1B tokens (and a starting few layers of larger LM of 3B parameters); we do this using a single A6000 GPU for less than half a day. Across 9 diverse evaluation datasets as well as the MMLU benchmark, the resulting model compares favorably to publicly available base models of 1B-2B size, some of which have been trained using 50-1000 times more tokens. We investigate Inheritune in a slightly different setting where we train small LMs utilizing larger LMs and their full pre-training dataset. Here we show that smaller LMs trained utilizing some of the layers of GPT2-medium (355M) and GPT-2-large (770M) can effectively match the val loss of their bigger counterparts when trained from scratch for the same number of training steps on OpenWebText dataset with 9B tokens. We analyze our recipe with extensive experiments and demonstrate it efficacy on diverse settings. Our code is available at https://github.com/sanyalsunny111/LLM-Inheritune.

Yukeaaa / arxiv-daily

【CS-part2】New submissions for Mon, 15 Apr 24 #1362

Keyword: webgpu

Keyword: webgl

Keyword: pre-rendering

Keyword: prerendering

Keyword: motion prediction

Keyword: incremental learning

Keyword: svm incremental

Keyword: nerf

MonoPatchNeRF: Improving Neural Radiance Fields with Patch-based Monocular Guidance

GPN: Generative Point-based NeRF

OccGaussian: 3D Gaussian Splatting for Occluded Human Rendering

Keyword: multiorgan

Keyword: multi-organ

Keyword: multi organ

Keyword: SAM

Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis

GRANP: A Graph Recurrent Attentive Neural Process Model for Vehicle Trajectory Prediction

Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition

Resource-aware Deployment of Dynamic DNNs over Multi-tiered Interconnected Systems

WildGraph: Realistic Graph-based Trajectory Generation for Wildlife

Persistent Classification: A New Approach to Stability of Data and Adversarial Examples

Byzantine Reliable Broadcast with Low Communication and Time Complexity

SQBC: Active Learning using LLM-Generated Synthetic Data for Stance Detection in Online Political Discussions

The Impact of School and Family Networks on COVID-19 Infections Among Dutch Students: A Study Using Population-Level Registry Data

Protein intrinsic disorder prediction using Attention U-Net and ProtTrans protein language model

Self-Supervised Learning of Color Constancy

Eliminating Catastrophic Overfitting Via Abnormal Adversarial Examples Regularization

On the Power of Interactive Proofs for Learning

Naively Sorting Evolving Data is Optimal and Robust

Lightweight Cryptanalysis of IoT Encryption Algorithms : Is Quota Sampling the Answer?

GazePointAR: A Context-Aware Multimodal Voice Assistant for Pronoun Disambiguation in Wearable Augmented Reality

HCL-MTSAD: Hierarchical Contrastive Consistency Learning for Accurate Detection of Industrial Multivariate Time Series Anomalies

MonoPatchNeRF: Improving Neural Radiance Fields with Patch-based Monocular Guidance

Balanced Mixed-Type Tabular Data Synthesis with Diffusion Models

Practical Region-level Attack against Segment Anything Models

Bottom-up Rebalancing Binary Search Trees by Flipping a Coin

A Large Scale Survey of Motivation in Software Development and Analysis of its Validity

The Integration of Semantic and Structural Knowledge in Knowledge Graph Entity Typing

BOND: Bootstrapping From-Scratch Name Disambiguation with Multi-task Promoting

Salience-Based Adaptive Masking: Revisiting Token Dynamics for Enhanced Pre-training

OTFS Channel Estimation and Detection for Channels with Very Large Delay Spread

Data-driven Interval MDP for Robust Control Synthesis

Learning to Rebalance Multi-Modal Optimization by Adaptively Masking Subnetworks

FlowWalker: A Memory-efficient and High-performance GPU-based Dynamic Graph Random Walk Framework

Collective Bayesian Decision-Making in a Swarm of Miniaturized Robots for Surface Inspection

Data-Driven Preference Sampling for Pareto Front Learning

Seismic First Break Picking in a Higher Dimension Using Deep Graph Learning

Evolutionary Preference Sampling for Pareto Set Learning

AdapterSwap: Continuous Training of LLMs with Data Removal and Access-Control Guarantees

Direct May Not Be the Best: An Incremental Evolution View of Pose Generation

Adapting the Segment Anything Model During Usage in Novel Situations

Anti-Byzantine Attacks Enabled Vehicle Selection for Asynchronous Federated Learning in Vehicular Edge Computing

Federated Optimization with Doubly Regularized Drift Correction

Joint Physical-Digital Facial Attack Detection Via Simulating Spoofing Clues

Lightweight Multi-System Multivariate Interconnection and Divergence Discovery

The Squared Kemeny Rule for Averaging Rankings

Thematic Analysis with Large Language Models: does it work with languages other than English? A targeted test in Italian

Mitigating Language-Level Performance Disparity in mPLMs via Teacher Language Selection and Cross-lingual Self-Distillation

Almost-Sure Termination by Guarded Refinement

Dataset Reset Policy Optimization for RLHF

Adversarial Imitation Learning via Boosting

Automatic Recommendations for Evolving Relational Databases Schema

Pathological Primitive Segmentation Based on Visual Foundation Model with Zero-Shot Mask Generation

Improving Referring Image Segmentation using Vision-Aware Text Features

Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation

Automatic Quantification of Serial PET/CT Images for Pediatric Hodgkin Lymphoma Patients Using a Longitudinally-Aware Segmentation Network

Synthetic Dataset Creation and Fine-Tuning of Transformer Models for Question Answering in Serbian

Using Information Flow to estimate interference between developers same method contributions

FCert: Certifiably Robust Few-Shot Classification in the Era of Foundation Models

Pre-training Small Base LMs with Fewer Tokens