Abstract
Implicit neural representations (INRs) have garnered significant interest recently for their ability to model complex, high-dimensional data without explicit parameterisation. In this work, we introduce TRIDENT, a novel function for implicit neural representations characterised by a trilogy of nonlinearities. Firstly, it is designed to represent high-order features through order compactness. Secondly, TRIDENT efficiently captures frequency information, a feature called frequency compactness. Thirdly, it has the capability to represent signals or images such that most of their energy is concentrated in a limited spatial region, denoting spatial compactness. We demonstrate through extensive experiments on various inverse problems that our proposed function outperforms existing implicit neural representation functions.
Efficient Transformer Knowledge Distillation: A Performance Review
Authors: Nathan Brown, Ashton Williamson, Tahj Anderson, Logan Lawrence
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
As pretrained transformer language models continue to achieve state-of-the-art performance, the Natural Language Processing community has pushed for advances in model compression and efficient attention mechanisms to address high computational requirements and limited input sequence length. Despite these separate efforts, no investigation has been done into the intersection of these two fields. In this work, we provide an evaluation of model compression via knowledge distillation on efficient attention transformers. We provide cost-performance trade-offs for the compression of state-of-the-art efficient attention architectures and the gains made in performance in comparison to their full attention counterparts. Furthermore, we introduce a new long-context Named Entity Recognition dataset, GONERD, to train and test the performance of NER models on long sequences. We find that distilled efficient attention transformers can preserve a significant amount of original model performance, preserving up to 98.6% across short-context tasks (GLUE, SQuAD, CoNLL-2003), up to 94.6% across long-context question answering tasks (HotpotQA, TriviaQA), and up to 98.8% on long-context Named Entity Recognition (GONERD), while decreasing inference times by up to 57.8%. We find that, for most models on most tasks, performing knowledge distillation is an effective method to yield high-performing efficient attention models with low costs.
Molly: A Verified Compiler for Cryptoprotocol Roles
Abstract
Molly is a program that compiles cryptographic protocol roles written in a high-level notation into straight-line programs in an intermediate-level imperative language, suitable for implementation in a conventional programming language. We define a denotational semantics for protocol roles based on an axiomatization of the runtime. A notable feature of our approach is that we assume that encryption is randomized. Thus, at the runtime level we treat encryption as a relation rather than a function. Molly is written in Coq, and generates a machine-checked proof that the procedure it constructs is correct with respect to the runtime semantics. Using Coq's extraction mechanism, one can build an efficient functional program for compilation.
Abstract
Semi-supervised learning is designed to help reduce the cost of the manual labelling process by exploiting useful features from a large quantity of unlabelled data during training. Since pixel-level manual labelling in large-scale remote sensing imagery is expensive, semi-supervised learning becomes an appropriate solution to this problem. However, most of the existing semi-supervised learning methods still lack efficient perturbation methods to promote the diversity of features and the precision of pseudo labels during training. In order to fill this gap, we propose DiverseNet architectures which explore multi-head and multi-model semi-supervised learning algorithms by simultaneously promoting precision and diversity during training. The two proposed methods of DiverseNet, namely DiverseHead and DiverseModel, achieve the highest semantic segmentation performance on four widely utilised remote sensing imagery data sets compared to state-of-the-art semi-supervised learning methods. Meanwhile, the proposed DiverseHead architecture is relatively lightweight in terms of parameter space compared to the state-of-the-art methods whilst reaching high-performance results for all the tested data sets.
A Unified Approach to Count-Based Weakly-Supervised Learning
Authors: Vinay Shukla, Zhe Zeng, Kareem Ahmed, Guy Van den Broeck
Abstract
High-quality labels are often very scarce, whereas unlabeled data with inferred weak labels occurs more naturally. In many cases, these weak labels dictate the frequency of each respective class over a set of instances. In this paper, we develop a unified approach to learning from such weakly-labeled data, which we call count-based weakly-supervised learning. At the heart of our approach is the ability to compute the probability of exactly k out of n outputs being set to true. This computation is differentiable, exact, and efficient. Building upon the previous computation, we derive a count loss penalizing the model for deviations in its distribution from an arithmetic constraint defined over label counts. We evaluate our approach on three common weakly-supervised learning paradigms and observe that our proposed approach achieves state-of-the-art or highly competitive results across all three of the paradigms.
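One standard way to realize the exact, differentiable count computation described above is the Poisson-binomial dynamic program sketched below (our illustration, not the authors' code): given per-instance probabilities, it computes the probability that exactly k of the n outputs are true in O(nk) operations, each of which is differentiable in the inputs.

```python
def count_probability(probs, k):
    """P(exactly k of the n independent outputs are true).

    Poisson-binomial dynamic program: dp[j] holds the probability
    that exactly j of the outputs processed so far are true.
    """
    dp = [1.0] + [0.0] * k
    for p in probs:
        for j in range(k, 0, -1):
            dp[j] = dp[j] * (1.0 - p) + dp[j - 1] * p
        dp[0] *= 1.0 - p
    return dp[k]
```

Replacing the Python floats with autograd tensors would make a count-based loss of the kind described in the abstract directly trainable.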
Sample-Efficient Training for Diffusion
Authors: Shivam Gupta, Aditya Parulekar, Eric Price, Zhiyang Xun
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Statistics Theory (math.ST); Machine Learning (stat.ML)
Abstract
Score-based diffusion models have become the most popular approach to deep generative modeling of images, largely due to their empirical performance and reliability. Recently, a number of theoretical works \citep{chen2022, Chen2022ImprovedAO, Chenetal23flowode, benton2023linear} have shown that diffusion models can efficiently sample, assuming $L^2$-accurate score estimates. The score-matching objective naturally approximates the true score in $L^2$, but the sample complexity of existing bounds depends \emph{polynomially} on the data radius and desired Wasserstein accuracy. By contrast, the time complexity of sampling is only logarithmic in these parameters. We show that estimating the score in $L^2$ \emph{requires} this polynomial dependence, but that a number of samples scaling polylogarithmically in the Wasserstein accuracy suffices for sampling. We show that with a polylogarithmic number of samples, the ERM of the score-matching objective is $L^2$ accurate on all but a probability $\delta$ fraction of the true distribution, and that this weaker guarantee is sufficient for efficient sampling.
3D-MIR: A Benchmark and Empirical Study on 3D Medical Image Retrieval in Radiology
Authors: Asma Ben Abacha, Alberto Santamaria-Pang, Ho Hin Lee, Jameson Merkow, Qin Cai, Surya Teja Devarakonda, Abdullah Islam, Julia Gong, Matthew P. Lungren, Thomas Lin, Noel C Codella, Ivan Tarapov
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
The increasing use of medical imaging in healthcare settings presents a significant challenge due to the increasing workload for radiologists, yet it also offers an opportunity to enhance healthcare outcomes if effectively leveraged. 3D image retrieval holds potential to reduce radiologist workloads by enabling clinicians to efficiently search through diagnostically similar or otherwise relevant cases, resulting in faster and more precise diagnoses. However, the field of 3D medical image retrieval is still emerging, lacking established evaluation benchmarks, comprehensive datasets, and thorough studies. This paper attempts to bridge this gap by introducing a novel benchmark for 3D Medical Image Retrieval (3D-MIR) that encompasses four different anatomies imaged with computed tomography. Using this benchmark, we explore a diverse set of search strategies that use aggregated 2D slices, 3D volumes, and multi-modal embeddings from popular multi-modal foundation models as queries. Quantitative and qualitative assessments of each approach are provided alongside an in-depth discussion that offers insight for future research. To promote the advancement of this field, our benchmark, dataset, and code are made publicly available.
Work-Efficient Parallel Derandomization I: Chernoff-like Concentrations via Pairwise Independence
Authors: Mohsen Ghaffari, Christoph Grunau, Václav Rozhoň
Abstract
We present a novel technique for work-efficient parallel derandomization, for algorithms that rely on the concentration of measure bounds such as Chernoff, Hoeffding, and Bernstein inequalities. Our method increases the algorithm's computational work and depth by only polylogarithmic factors. Before our work, the only known method to obtain parallel derandomization with such strong concentrations was by the results of [Motwani, Naor, and Naor FOCS'89; Berger and Rompel FOCS'89], which perform a binary search in a $k$-wise independent space for $k=poly(\log n)$. However, that method blows up the computational work by a high $poly(n)$ factor and does not yield work-efficient parallel algorithms. Their method was an extension of the approach of [Luby FOCS'88], which gave a work-efficient derandomization but was limited to algorithms analyzed with only pairwise independence. Pushing the method from pairwise to the higher $k$-wise analysis resulted in the $poly(n)$ factor computational work blow-up. Our work can be viewed as an alternative extension from the pairwise case, which yields the desired strong concentrations while retaining work efficiency up to logarithmic factors. Our approach works by casting the problem of determining the random variables as an iterative process with $poly(\log n)$ iterations, where different iterations have independent randomness. This is done so that for the desired concentrations, we need only pairwise independence inside each iteration. In particular, we model each binary random variable as a result of a gradual random walk, and our method shows that the desired Chernoff-like concentrations about the endpoints of these walks can be boiled down to some pairwise analysis on the steps of these random walks in each iteration (while having independence across iterations).
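As a hypothetical illustration of the pairwise-independence primitive that Luby-style derandomization (and each iteration of the method above) relies on, the affine family h_{a,b}(i) = (a·i + b) mod p yields n nearly unbiased bits that are pairwise independent while consuming only two random seeds, so the entire seed space is small enough to search deterministically (the constants below are our own toy choices):

```python
P = 101  # any prime at least n

def pairwise_bits(a, b, n):
    """n pairwise-independent bits from just two seeds a, b in [0, P)."""
    # For i != j, the map (a, b) -> (a*i + b, a*j + b) mod P is a bijection
    # on pairs of residues, so any two output bits are (almost exactly)
    # independent even though only 2*log(P) random bits were consumed.
    return [((a * i + b) % P) % 2 for i in range(n)]
```

Derandomizing then amounts to evaluating the algorithm's pairwise analysis on all P² seed pairs instead of drawing one at random.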
Work-Efficient Parallel Derandomization II: Optimal Concentrations via Bootstrapping
Abstract
We present an efficient parallel derandomization method for randomized algorithms that rely on concentrations such as the Chernoff bound. This settles a classic problem in parallel derandomization, which dates back to the 1980s. Consider the \textit{set balancing} problem where $m$ sets of size at most $s$ are given in a ground set of size $n$, and we should partition the ground set into two parts such that each set is split evenly up to a small additive (discrepancy) bound. A random partition achieves a discrepancy of $O(\sqrt{s \log m})$ in each set, by the Chernoff bound. We give a deterministic parallel algorithm that matches this bound, using near-linear work and polylogarithmic depth. The previous results were weaker in discrepancy and/or work bounds: Motwani, Naor, and Naor [FOCS'89] and Berger and Rompel [FOCS'89] achieve discrepancy $s^{\varepsilon} \cdot O(\sqrt{s \log m})$ with work $\tilde{O}(m+n+\sum_{i=1}^{m} |S_i|) \cdot m^{\Theta(1/\varepsilon)}$ and polylogarithmic depth; the discrepancy was optimized to $O(\sqrt{s \log m})$ in later work, e.g. by Harris [Algorithmica'19], but the work bound remained high at $\tilde{O}(m^4n^3)$. Ghaffari, Grunau, and Rozhon [FOCS'23] achieve discrepancy $s/poly(\log(nm)) + O(\sqrt{s \log m})$ with near-linear work and polylogarithmic depth. Notice that this discrepancy is barely sublinear with respect to the trivial bound of $s$. Our method relies on a novel bootstrapping idea that uses crude partitioning algorithms as a subroutine. In particular, we solve the problem recursively, by using the crude partition in each iteration to split the variables into many smaller parts, and then we find a constraint for the variables in each part such that we reduce the overall number of variables in the problem. The scheme relies on an interesting application of the multiplicative weights update method to control the variance losses in each iteration.
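To make the set-balancing benchmark concrete, here is a tiny simulation under our own assumed sizes (not the paper's deterministic algorithm): a uniformly random 2-coloring of the ground set, whose worst-set discrepancy the Chernoff bound places at O(√(s log m)).

```python
import math
import random

def random_partition_discrepancy(sets, n, seed=0):
    """Worst-set discrepancy of a uniformly random 2-coloring of [n]:
    max over sets of |#(+1 elements) - #(-1 elements)|."""
    rng = random.Random(seed)
    color = [rng.choice((-1, 1)) for _ in range(n)]
    return max(abs(sum(color[i] for i in s)) for s in sets)

# m = 64 random sets of size s = 256 in a ground set of size n = 1024
gen = random.Random(1)
n, m, s = 1024, 64, 256
sets = [gen.sample(range(n), s) for _ in range(m)]
disc = random_partition_discrepancy(sets, n)
bound = math.sqrt(s * math.log(m))  # the Chernoff scale, constants omitted
```

At these sizes the observed discrepancy sits within a small constant factor of √(s log m), far below the trivial bound of s = 256; the paper's contribution is matching this random-coloring bound deterministically with near-linear work.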
Scalable AI Generative Content for Vehicular Network Semantic Communication
Abstract
Perceiving vehicles in a driver's blind spot is vital for safe driving. The detection of potentially dangerous vehicles in these blind spots can benefit from vehicular network semantic communication technology. However, efficient semantic communication involves a trade-off between accuracy and delay, especially in bandwidth-limited situations. This paper unveils a scalable Artificial Intelligence Generated Content (AIGC) system that leverages an encoder-decoder architecture. This system converts images into textual representations and reconstructs them into quality-acceptable images, optimizing transmission for vehicular network semantic communication. Moreover, when bandwidth allows, auxiliary information is integrated. The encoder-decoder aims to maintain semantic equivalence with the original images across various tasks. The proposed approach then employs reinforcement learning to enhance the reliability of the generated content. Experimental results suggest that the proposed method surpasses the baseline in perceiving vehicles in blind spots and effectively compresses communication data. While this method is specifically designed for driving scenarios, this encoder-decoder architecture also holds potential for wide use across various semantic communication scenarios.
Safe Physical Human-Robot Interaction through Variable Impedance Control based on ISO/TS 15066
Authors: Armin Ghanbarzadeh, Esmaeil Najafi
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
The successful implementation of Physical Human-Robot Interaction in industrial environments depends on ensuring safe collaboration between human operators and robotic devices. This necessitates the adoption of measures that guarantee the safety of human operators in close proximity to robots, without constraining the speed and motion of the robotic systems. This paper proposes a novel variable impedance-based controller for cobots that ensures safe collaboration by adhering to the ISO/TS 15066 safety standard, namely power and force limiting mode, while achieving higher operational speeds. The effectiveness of the proposed controller has been compared with conventional methods and implemented on two different robotic platforms. The results demonstrate the designed controller achieves higher speeds, while maintaining compliance with safety standards. The proposed variable impedance holds significant potential for enabling efficient and safe collaboration between humans and robots in industrial settings.
HypUC: Hyperfine Uncertainty Calibration with Gradient-boosted Corrections for Reliable Regression on Imbalanced Electrocardiograms
Authors: Uddeshya Upadhyay, Sairam Bade, Arjun Puranik, Shahir Asfahan, Melwin Babu, Francisco Lopez-Jimenez, Samuel J. Asirvatham, Ashim Prasad, Ajit Rajasekharan, Samir Awasthi, Rakesh Barve
Abstract
The automated analysis of medical time series, such as the electrocardiogram (ECG), electroencephalogram (EEG), pulse oximetry, etc., has the potential to serve as a valuable tool for diagnostic decisions, allowing for remote monitoring of patients and more efficient use of expensive and time-consuming medical procedures. Deep neural networks (DNNs) have been demonstrated to process such signals effectively. However, previous research has primarily focused on classifying medical time series rather than attempting to regress the continuous-valued physiological parameters central to diagnosis. One significant challenge in this regard is the imbalanced nature of the dataset, as a low prevalence of abnormal conditions can lead to heavily skewed data that results in inaccurate predictions and a lack of certainty in such predictions when deployed. To address these challenges, we propose HypUC, a framework for imbalanced probabilistic regression in medical time series, making several contributions. (i) We introduce a simple kernel density-based technique to tackle the imbalanced regression problem with medical time series. (ii) Moreover, we employ a probabilistic regression framework that allows uncertainty estimation for the predicted continuous values. (iii) We also present a new approach to calibrate the predicted uncertainty further. (iv) Finally, we demonstrate a technique to use calibrated uncertainty estimates to improve the predicted continuous value and show the efficacy of the calibrated uncertainty estimates to flag unreliable predictions. HypUC is evaluated on a large, diverse, real-world dataset of ECGs collected from millions of patients, outperforming several conventional baselines on various diagnostic tasks, suggesting a potential use-case for the reliable clinical deployment of deep learning models.
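One common way to instantiate the kernel-density idea in contribution (i) (our hedged reading, not the HypUC implementation) is to up-weight rare target values in the regression loss by the inverse of a kernel density estimate over the training labels:

```python
import math

def kde(x, labels, bandwidth=1.0):
    """Gaussian kernel density estimate at x over the training labels."""
    z = math.sqrt(2.0 * math.pi) * bandwidth * len(labels)
    return sum(math.exp(-0.5 * ((x - y) / bandwidth) ** 2) for y in labels) / z

def density_weights(labels, bandwidth=1.0, eps=1e-6):
    """Inverse-density loss weights, normalized to mean 1, so that samples
    with rare label values contribute more to the training loss."""
    w = [1.0 / (kde(y, labels, bandwidth) + eps) for y in labels]
    mean = sum(w) / len(w)
    return [v / mean for v in w]
```

For example, in a label set dominated by normal heart rates, the few samples with abnormal values receive weights well above 1, counteracting the skew described in the abstract.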
Constraint-Guided Online Data Selection for Scalable Data-Driven Safety Filters in Uncertain Robotic Systems
Authors: Jason J. Choi, Fernando Castañeda, Wonsuhk Jung, Bike Zhang, Claire J. Tomlin, Koushil Sreenath
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
As the use of autonomous robotic systems expands in tasks that are complex and challenging to model, the demand for robust data-driven control methods that can certify safety and stability in uncertain conditions is increasing. However, the practical implementation of these methods often faces scalability issues due to the growing amount of data points with system complexity, and a significant reliance on high-quality training data. In response to these challenges, this study presents a scalable data-driven controller that efficiently identifies and infers from the most informative data points for implementing data-driven safety filters. Our approach is grounded in the integration of a model-based certificate function-based method and Gaussian Process (GP) regression, reinforced by a novel online data selection algorithm that reduces time complexity from quadratic to linear relative to dataset size. Empirical evidence, gathered from successful real-world cart-pole swing-up experiments and simulated locomotion of a five-link bipedal robot, demonstrates the efficacy of our approach. Our findings reveal that our efficient online data selection algorithm, which strategically selects key data points, enhances the practicality and efficiency of data-driven certifying filters in complex robotic systems, significantly mitigating scalability concerns inherent in nonparametric learning-based control methods.
A reduced basis warm-start iterative solver for parameterized systems
Abstract
Reduced basis methods (RBMs) are widely used for the fast solution of parametrized linear systems. For problems lacking good order-reduction properties, the RBMs alone cannot deliver a high-precision solution at an affordable offline computational cost. To obtain a high-precision solution while balancing the offline and online costs, we explore a reasonable and effective framework for accelerating iterative methods based on the RBMs. Firstly, the highly efficient reduced basis (RB) solver is used to generate accurate initial values; this data-driven initialization provides a warm start for the iterative methods. Secondly, we analyze the further use of the RBM as a preconditioner and find that, when a high-precision solution is required, the RBM preconditioner not only fails to accelerate convergence but also incurs extra cost from overuse of the RBM. Two numerical tests on 3D steady-state diffusion equations with two- and six-dimensional parameter spaces demonstrate the capability and efficiency of the RBM-initialized high-fidelity iterative methods.
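The warm-start idea above can be sketched with a toy stationary iteration (a simplified assumption on our part: a Jacobi solver stands in for the high-fidelity iterative method, and a crude approximate solution stands in for the RB-generated initial value):

```python
def jacobi(A, b, x0, tol=1e-10, max_iter=10000):
    """Jacobi iteration; returns (approximate solution, iterations used)."""
    n = len(b)
    x = list(x0)
    for it in range(1, max_iter + 1):
        x_new = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
                 for i in range(n)]
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new, it
        x = x_new
    return x, max_iter

# a small SPD system standing in for the high-fidelity discretization
A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b = [1.0, 2.0, 3.0]
cold, it_cold = jacobi(A, b, [0.0, 0.0, 0.0])     # cold (zero) start
warm, it_warm = jacobi(A, b, [0.18, 0.29, 0.68])  # "RB" warm start
```

Because the warm start is already close to the true solution, the iteration meets the same tolerance in fewer sweeps, which is exactly the offline/online trade the abstract describes.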
Abstract
This paper proposes PointPCA+, a computationally simplified and descriptor-richer Point Cloud Quality Assessment (PCQA) metric that extends PointPCA. PointPCA proposed a set of perceptually-relevant descriptors based on PCA decomposition that were applied to both the geometry and texture data of point clouds for full reference PCQA. PointPCA+ employs PCA only on the geometry data while enriching existing geometry and texture descriptors, which are computed more efficiently. Similarly to PointPCA, a total quality score is obtained through a learning-based fusion of individual predictions from geometry and texture descriptors that capture local shape and appearance properties, respectively. Before feature fusion, a feature selection module is introduced to choose the most effective features from a proposed super-set. Experimental results show that PointPCA+ achieves high predictive performance against subjective ground truth scores obtained from publicly available datasets. The code is available at \url{https://github.com/cwi-dis/pointpca_suite/}.
A Multi-solution Study on GDPR AI-enabled Completeness Checking of DPAs
Abstract
Specifying legal requirements for software systems to ensure their compliance with the applicable regulations is a major concern in requirements engineering (RE). Personal data collected by an organization is often shared with other organizations to perform certain processing activities. In such cases, the General Data Protection Regulation (GDPR) requires issuing a data processing agreement (DPA) which regulates the processing and further ensures that personal data remains protected. Violating GDPR can lead to huge fines reaching billions of Euros. Software systems involving personal data processing must adhere to the legal obligations stipulated in GDPR and outlined in DPAs. Requirements engineers can elicit from DPAs legal requirements for regulating the data processing activities in software systems. Checking the completeness of a DPA according to the GDPR provisions is therefore an essential prerequisite to ensure that the elicited requirements are complete. Analyzing DPAs entirely manually is time-consuming and requires adequate legal expertise. In this paper, we propose an automation strategy to address the completeness checking of DPAs against GDPR. Specifically, we pursue ten alternative solutions which are enabled by different technologies, namely traditional machine learning, deep learning, language modeling, and few-shot learning. The goal of our work is to empirically examine how these different technologies fare in the legal domain. We compute the F2 score on a set of 30 real DPAs. Our evaluation shows that the best-performing solutions, yielding F2 scores of 86.7% and 89.7%, are based on the pre-trained BERT and RoBERTa language models. Our analysis further shows that other alternative solutions based on deep learning (e.g., BiLSTM) and few-shot learning (e.g., SetFit) can achieve comparable accuracy, yet are more efficient to develop.
Controlling Large Language Model-based Agents for Large-Scale Decision-Making: An Actor-Critic Approach
Authors: Bin Zhang, Hangyu Mao, Jingqing Ruan, Ying Wen, Yang Li, Shao Zhang, Zhiwei Xu, Dapeng Li, Ziyue Li, Rui Zhao, Lijuan Li, Guoliang Fan
Abstract
The significant advancements in large language models (LLMs) have presented novel opportunities for tackling planning and decision-making within multi-agent systems. However, as the number of agents increases, the issues of hallucination in LLMs and coordination in multi-agent systems (MAS) have become increasingly pronounced. Additionally, the efficient utilization of tokens becomes a critical consideration when employing LLMs to facilitate the interactions of large numbers of agents. In this paper, we present a novel framework aimed at enhancing coordination and decision-making capabilities of LLMs within large-scale multi-agent environments. Our approach draws inspiration from the actor-critic framework employed in multi-agent reinforcement learning, and we develop a modular and token-efficient solution that effectively addresses challenges presented by LLMs and MAS. Through evaluations conducted in experiments involving system resource allocation and robot grid transportation, we demonstrate the considerable advantages afforded by our proposed approach.
High-order upwind summation-by-parts methods for nonlinear conservation laws
Authors: Hendrik Ranocha, Andrew R. Winters, Michael Schlottke-Lakemper, Philipp Öffner, Jan Glaubitz, Gregor J. Gassner
Abstract
High-order methods for conservation laws can be very efficient, in particular on modern hardware. However, it can be challenging to guarantee their stability and robustness, especially for under-resolved flows. A typical approach is to combine a well-working baseline scheme with additional techniques to ensure invariant domain preservation. To obtain good results without too much dissipation, it is important to develop suitable baseline methods. In this article, we study upwind summation-by-parts operators, which have been used mostly for linear problems so far. These operators come with some built-in dissipation everywhere, not only at element interfaces as typical in discontinuous Galerkin methods. At the same time, this dissipation does not introduce additional parameters. We discuss the relation of high-order upwind summation-by-parts methods to flux vector splitting schemes and investigate their local linear/energy stability. Finally, we present some numerical examples for shock-free flows of the compressible Euler equations.
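For readers unfamiliar with flux vector splitting, a minimal first-order sketch for linear advection (illustrative only; the paper's operators are high-order summation-by-parts discretizations) splits f(u) = a·u into right- and left-going parts and differences each in its upwind direction:

```python
def fvs_advection_step(u, a, dx, dt):
    """One forward-Euler step of first-order flux vector splitting for
    u_t + a*u_x = 0 on a periodic grid."""
    n = len(u)
    ap, am = max(a, 0.0), min(a, 0.0)  # split f = f+ + f- by wave direction
    un = []
    for i in range(n):
        # backward difference for the right-going part (u[-1] wraps around),
        # forward difference for the left-going part
        dfp = ap * (u[i] - u[i - 1]) / dx
        dfm = am * (u[(i + 1) % n] - u[i]) / dx
        un.append(u[i] - dt * (dfp + dfm))
    return un
```

The one-sided differences build in dissipation at every grid point rather than only at element interfaces, which is the property the abstract highlights for upwind summation-by-parts operators.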
Beamforming Design for Hybrid IRS-aided AF Relay Wireless Networks
Authors: Xuehui Wang, Yifan Zhao, Feng Shu, Yan Wang
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
In this paper, a hybrid IRS-aided amplify-and-forward (AF) relay wireless network is put forward, where the hybrid IRS is made up of passive and active elements. To maximize the signal-to-noise ratio (SNR), a low-complexity method based on successive convex approximation and fractional programming (LC-SCA-FP) is proposed to jointly optimize the beamforming matrix at the AF relay and the reflecting coefficient matrices at the IRS. Simulation results verify that the rate achieved by the proposed LC-SCA-FP method surpasses those of the benchmark schemes, namely the passive IRS-aided AF relay and the AF-relay-only network.
A comparison of Algebraic Multigrid Bidomain solvers on hybrid CPU-GPU architectures
Abstract
The numerical simulation of cardiac electrophysiology is a highly challenging problem in scientific computing. The Bidomain system is the most complete mathematical model of cardiac bioelectrical activity. It consists of an elliptic and a parabolic partial differential equation (PDE), of reaction-diffusion type, describing the spread of electrical excitation in the cardiac tissue. The two PDEs are coupled with a stiff system of ordinary differential equations (ODEs), representing ionic currents through the cardiac membrane. Developing efficient and scalable preconditioners for the linear systems arising from the discretization of such a computationally challenging model is crucial in order to reduce the computational costs required by the numerical simulations of cardiac electrophysiology. In this work, focusing on the Bidomain system as a model problem, we have benchmarked two popular implementations of the Algebraic Multigrid (AMG) preconditioner embedded in the PETSc library and we have studied the performance impact of calibrating specific parameters. We have conducted our analysis on modern HPC architectures, performing scalability tests on multi-core and multi-GPU settings. The results have shown that, for our problem, although scalability is verified on CPUs, GPUs are the optimal choice, since they yield the best performance in terms of solution time.
Robustness-Reinforced Knowledge Distillation with Correlation Distance and Network Pruning
Authors: Seonghak Kim, Gyeongdo Ham, Yucheol Cho, Daeshik Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The improvement in the performance of efficient and lightweight models (i.e., the student model) is achieved through knowledge distillation (KD), which involves transferring knowledge from more complex models (i.e., the teacher model). However, most existing KD techniques rely on Kullback-Leibler (KL) divergence, which has certain limitations. First, if the teacher distribution has high entropy, the KL divergence's mode-averaging nature hinders the transfer of sufficient target information. Second, when the teacher distribution has low entropy, the KL divergence tends to excessively focus on specific modes, which fails to convey an abundant amount of valuable knowledge to the student. Consequently, when dealing with datasets that contain numerous confounding or challenging samples, student models may struggle to acquire sufficient knowledge, resulting in subpar performance. Furthermore, in previous KD approaches, we observed that data augmentation, a technique aimed at enhancing a model's generalization, can have an adverse impact. Therefore, we propose a Robustness-Reinforced Knowledge Distillation (R2KD) that leverages correlation distance and network pruning. This approach enables KD to effectively incorporate data augmentation for performance improvement. Extensive experiments on various datasets, including CIFAR-100, FGVR, TinyImagenet, and ImageNet, demonstrate our method's superiority over current state-of-the-art methods.
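A plain correlation-distance objective (our assumption of the general form; the paper's R2KD loss may differ in detail) is one minus the Pearson correlation between teacher and student outputs, which makes it invariant to the shift and scale of the logits, unlike the KL divergence:

```python
import math

def correlation_distance(t, s):
    """1 - Pearson correlation between teacher outputs t and student outputs s.
    0 for perfectly correlated outputs, 2 for perfectly anti-correlated ones."""
    n = len(t)
    mt, ms = sum(t) / n, sum(s) / n
    cov = sum((a - mt) * (b - ms) for a, b in zip(t, s))
    vt = math.sqrt(sum((a - mt) ** 2 for a in t))
    vs = math.sqrt(sum((b - ms) ** 2 for b in s))
    return 1.0 - cov / (vt * vs)
```

Because only the relative ordering and shape of the outputs matter, such a loss is less sensitive to the entropy of the teacher distribution than mode-averaging or mode-seeking KL behavior.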
Abstract
Machine-generated data is rapidly growing and poses challenges for data-intensive systems, especially as the growth of data outpaces the growth of storage space. To cope with the storage issue, compression plays a critical role in storage engines, particularly for data-intensive applications, where high compression ratios and efficient random access are essential. However, existing compression techniques tend to focus on general-purpose and data block approaches, but overlook the inherent structure of machine-generated data and hence result in low compression ratios or limited lookup efficiency. To address these limitations, we introduce the Pattern-Based Compression (PBC) algorithm, which specifically targets patterns in machine-generated data to achieve Pareto-optimality in most cases. Unlike traditional data block-based methods, PBC compresses data on a per-record basis, facilitating rapid random access. Our experimental evaluation demonstrates that PBC, on average, achieves a compression ratio twice as high as state-of-the-art techniques while maintaining competitive compression and decompression speeds. We also integrate PBC into a production database system and achieve improvements in both compression ratio and throughput.
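A minimal sketch of the per-record, pattern-based idea (hypothetical; the actual PBC algorithm and its pattern model are more elaborate): factor each record into a shared template plus its variable fields, so that decompressing any single record touches only that record's entry, never a whole block.

```python
import re

FIELD = re.compile(r"\d+")  # variable fields: runs of digits (an assumption)

def compress(records):
    """Factor each record into a shared template plus its digit fields."""
    templates = {}   # template string -> template id
    compressed = []  # per-record entry: (template id, variable fields)
    for rec in records:
        tpl = FIELD.sub("\x00", rec)
        tid = templates.setdefault(tpl, len(templates))
        compressed.append((tid, tuple(FIELD.findall(rec))))
    return list(templates), compressed

def decompress_one(templates, entry):
    """Random access: rebuild a single record from its own entry only."""
    tid, fields = entry
    parts = templates[tid].split("\x00")
    out = parts[0]
    for field, tail in zip(fields, parts[1:]):
        out += field + tail
    return out
```

On log-like data, many records collapse onto a handful of templates, which is where the high compression ratio comes from, while per-record entries preserve O(1) lookup.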
Optimal Power Flow in Highly Renewable Power System Based on Attention Neural Networks
Authors: Chen Li, Alexander Kies, Kai Zhou, Markus Schlott, Omar El Sayed, Mariia Bilousova, Horst Stoecker
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract
The Optimal Power Flow (OPF) problem is pivotal for power system operations, guiding generator output and power distribution to meet demand at minimized costs, while adhering to physical and engineering constraints. The integration of renewable energy sources, like wind and solar, however, poses challenges due to their inherent variability. This variability, driven largely by changing weather conditions, demands frequent recalibrations of power settings, thus necessitating recurrent OPF resolutions. This task is daunting using traditional numerical methods, particularly for extensive power systems. In this work, we present a cutting-edge, physics-informed machine learning methodology, trained using imitation learning and historical European weather datasets. Our approach directly correlates electricity demand and weather patterns with power dispatch and generation, circumventing the iterative requirements of traditional OPF solvers. This offers a more expedient solution apt for real-time applications. Rigorous evaluations on aggregated European power systems validate our method's superiority over existing data-driven techniques in OPF solving. By presenting a quick, robust, and efficient solution, this research sets a new standard in real-time OPF resolution, paving the way for more resilient power systems in the era of renewable energy.
Electric Network Frequency Optical Sensing Devices
Abstract
Electric Network Frequency (ENF) acts as a fingerprint in multimedia forensics applications. In indoor environments, ENF variations affect the intensity of light sources connected to power mains. Accordingly, the light intensity variations captured by sensing devices can be exploited to estimate the ENF. A first optical sensing device based on a photodiode is developed for capturing ENF variations in indoor lighting environments. In addition, a device that captures the ENF directly from power mains is implemented. This device serves as a ground truth ENF collector. Video recordings captured by a camera are also employed to estimate the ENF. The camera serves as a second optical sensor. The factors affecting the ENF estimation are thoroughly studied. The maximum correlation coefficient between the ENF estimated by the two optical sensors and that estimated directly from power mains is used to measure the estimation accuracy. The paper's major contribution is in the disclosure of extensive experimental evidence on ENF estimation in scenes ranging from static ones capturing a white wall to non-static ones, including human activity.
Efficient Trigger Word Insertion
Authors: Yueqi Zeng, Ziqiang Li, Pengfei Xia, Lei Liu, Bin Li
Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL)
Abstract
With the recent boom in the natural language processing (NLP) field, backdoor attacks pose immense threats against deep neural network models. However, previous works hardly consider the effect of the poisoning rate. In this paper, our main objective is to reduce the number of poisoned samples while still achieving a satisfactory Attack Success Rate (ASR) in text backdoor attacks. To accomplish this, we propose an efficient trigger word insertion strategy in terms of trigger word optimization and poisoned sample selection. Extensive experiments on different datasets and models demonstrate that our proposed method can significantly improve attack effectiveness in text classification tasks. Remarkably, our approach achieves an ASR of over 90% with only 10 poisoned samples in the dirty-label setting and requires merely 1.5% of the training data in the clean-label setting.
An Efficient Distributed Nash Equilibrium Seeking with Compressed and Event-triggered Communication
Abstract
Distributed Nash equilibrium (NE) seeking problems for networked games have been widely investigated in recent years. Despite the increasing attention, communication expenditure is becoming a major bottleneck for scaling up distributed approaches within the limited communication bandwidth between agents. To reduce communication cost, an efficient distributed NE seeking algorithm (ETC-DNES) is proposed to obtain an NE for games over directed graphs, where communication efficiency is improved by event-triggered exchanges of compressed information among neighbors. ETC-DNES saves communication costs in both transmitted bits and rounds of communication. Furthermore, our method only requires the row-stochastic property of the adjacency matrix, unlike previous approaches that hinged on doubly stochastic communication matrices. We provide convergence guarantees for ETC-DNES on games with restricted strongly monotone mappings, demonstrating that such a communication method is efficient without sacrificing the accuracy of the algorithm. The algorithm and analysis are extended to a compressed algorithm with a stochastic event-triggered mechanism (SETC-DNES). In SETC-DNES, we introduce a random variable in the triggering condition to further enhance algorithm efficiency. We demonstrate that SETC-DNES guarantees linear convergence to the optimal NE while achieving even greater reductions in communication costs compared to ETC-DNES. Finally, numerical simulations illustrate the effectiveness of the proposed algorithms.
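The two communication-saving ingredients can be sketched generically (a hypothetical illustration, not the paper's ETC-DNES update rule): a crude quantizer stands in for the compression operator, and a deviation threshold stands in for the event-triggering condition.

```python
import numpy as np

def compress(x, bits=4):
    # Crude uniform quantizer standing in for the compression operator.
    scale = np.max(np.abs(x)) + 1e-12
    levels = 2 ** (bits - 1) - 1
    return np.round(x / scale * levels) / levels * scale

def event_triggered_send(x, x_hat, thresh):
    # Transmit a compressed state only when the local state deviates
    # enough from the neighbor's last received copy; otherwise stay
    # silent and save both bits and a communication round.
    if np.linalg.norm(x - x_hat) > thresh:
        return compress(x)
    return None

msg = event_triggered_send(np.array([1.0, 0.5, -0.2]), np.zeros(3), 0.1)
assert msg is not None               # large deviation: a round is triggered
msg2 = event_triggered_send(np.array([0.01, 0.0, 0.0]), np.zeros(3), 0.1)
assert msg2 is None                  # small deviation: no communication
```

The randomized variant described in the abstract would replace the deterministic threshold test with a probabilistic one; the convergence analysis in the paper is what guarantees these savings do not hurt accuracy.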
An efficient mixed finite element method for nonlinear magnetostatics and quasistatics
Authors: Herbert Egger, Felix Engertsberger, Bogdan Radu
Abstract
We consider systems of nonlinear magnetostatics and quasistatics that typically arise in the modeling and simulation of electric machines. The nonlinear problems, eventually obtained after time discretization, are usually solved by employing a vector potential formulation. In the relevant two-dimensional setting, a discretization can be obtained by H1-conforming finite elements. We here consider an alternative formulation based on the $H$-field, which leads to a nonlinear saddle-point problem. After commenting on the unique solvability, we study the numerical approximation by H(curl)-conforming finite elements and present the main convergence results. A particular focus is put on the efficient solution of the linearized systems arising in every step of the nonlinear Newton solver. Via hybridization, the linearized saddle-point systems can be transformed into linear elliptic problems, which can be solved with similar computational complexity as those arising in the vector or scalar potential formulation. In summary, we can thus claim that the mixed finite element approach based on the $H$-field can be considered a competitive alternative to the standard vector or scalar potential formulations for the solution of problems in nonlinear magneto-quasistatics.
Abstract
End users face a choice between privacy and efficiency in current Large Language Model (LLM) service paradigms. In cloud-based paradigms, users are forced to compromise data locality for generation quality and processing speed. Conversely, edge device paradigms maintain data locality but fail to deliver satisfactory performance. In this work, we propose a novel LLM service paradigm that distributes privacy-sensitive computation on edge devices and shared computation in the cloud. Only activations are transmitted between the central cloud and edge devices to ensure data locality. Our core innovation, PrivateLoRA, addresses the challenging communication overhead by exploiting the low rank of residual activations, achieving over 95% communication reduction. Consequently, PrivateLoRA effectively maintains data locality and is extremely resource efficient. Under standard 5G networks, PrivateLoRA achieves throughput over 300% of device-only solutions for 7B models and over 80% of an A100 GPU for 33B models. PrivateLoRA also provides tuning performance comparable to LoRA for advanced personalization. Our approach democratizes access to state-of-the-art generative AI for edge devices, paving the way for more tailored LLM experiences for the general public. To our knowledge, our proposed framework is the first efficient and privacy-preserving LLM solution in the literature.
Step size control for explicit relaxation Runge-Kutta methods preserving invariants
Abstract
Many time-dependent differential equations are equipped with invariants. Preserving such invariants under discretization can be important, e.g., to improve the qualitative and quantitative properties of numerical solutions. Recently, relaxation methods have been proposed as small modifications of standard time integration schemes guaranteeing the correct evolution of functionals of the solution. Here, we investigate how to combine these relaxation techniques with efficient step size control mechanisms based on local error estimates for explicit Runge-Kutta methods. We demonstrate our results in several numerical experiments including ordinary and partial differential equations.
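For a quadratic invariant, the relaxation idea admits a compact closed form: scale the Runge-Kutta update direction $d$ by a factor $\gamma$ chosen so the invariant is restored exactly. The sketch below (illustrative only; the paper's contribution is combining relaxation with error-based step size control, which is omitted here) applies this to the harmonic oscillator, whose squared norm is conserved.

```python
import numpy as np

def rk4_step(f, u, h):
    # Classical fourth-order Runge-Kutta step.
    k1 = f(u)
    k2 = f(u + h / 2 * k1)
    k3 = f(u + h / 2 * k2)
    k4 = f(u + h * k3)
    return u + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def relaxed_step(f, u, h):
    # Relaxation: pick gamma so that ||u + gamma*d||^2 = ||u||^2 exactly,
    # i.e. the nonzero root of the quadratic 2*gamma*<u,d> + gamma^2*<d,d> = 0.
    d = rk4_step(f, u, h) - u
    gamma = -2 * np.dot(u, d) / np.dot(d, d)
    return u + gamma * d

f = lambda u: np.array([-u[1], u[0]])  # harmonic oscillator, ||u||^2 invariant
u = np.array([1.0, 0.0])
for _ in range(1000):
    u = relaxed_step(f, u, 0.1)
assert abs(np.dot(u, u) - 1.0) < 1e-10  # invariant preserved to round-off
```

Plain RK4 would slowly dissipate the energy here; the relaxation factor stays near 1 (it is $1 + O(h^p)$), so accuracy and the order of the base scheme are retained.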
Assessing the Impact of Noise on Quantum Neural Networks: An Experimental Analysis
Authors: Erik B. Terres Escudero, Danel Arias Alamo, Oier Mentxaka Gómez, Pablo García Bringas
Abstract
In the race towards quantum computing, the potential benefits of quantum neural networks (QNNs) have become increasingly apparent. However, Noisy Intermediate-Scale Quantum (NISQ) processors are prone to errors, which poses a significant challenge for the execution of complex algorithms or quantum machine learning. To ensure the quality and security of QNNs, it is crucial to explore the impact of noise on their performance. This paper provides a comprehensive analysis of the impact of noise on QNNs, examining the Mottonen state preparation algorithm under various noise models and studying the degradation of quantum states as they pass through multiple layers of QNNs. Additionally, the paper evaluates the effect of noise on the performance of pre-trained QNNs and highlights the challenges posed by noise models in quantum computing. The findings of this study have significant implications for the development of quantum software, emphasizing the importance of prioritizing stability and noise-correction measures when developing QNNs to ensure reliable and trustworthy results. This paper contributes to the growing body of literature on quantum computing and quantum machine learning, providing new insights into the impact of noise on QNNs and paving the way towards the development of more robust and efficient quantum algorithms.
Hardware Resilience Properties of Text-Guided Image Classifiers
Authors: Syed Talal Wasim, Kabila Haile Saboka, Abdulrahman Mahmoud, Salman Khan, David Brooks, Gu-Yeon Wei
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This paper presents a novel method to enhance the reliability of image classification models during deployment in the face of transient hardware errors. By utilizing enriched text embeddings derived from GPT-3 with question prompts per class and CLIP pretrained text encoder, we investigate their impact as an initialization for the classification layer. Our approach achieves a remarkable $5.5\times$ average increase in hardware reliability (and up to $14\times$) across various architectures in the most critical layer, with minimal accuracy drop (0.3% on average) compared to baseline PyTorch models. Furthermore, our method seamlessly integrates with any image classification backbone, showcases results across various network architectures, decreases parameter and FLOPs overhead, and follows a consistent training recipe. This research offers a practical and efficient solution to bolster the robustness of image classification models against hardware failures, with potential implications for future studies in this domain. Our code and models are released at https://github.com/TalalWasim/TextGuidedResilience.
You Only Explain Once
Authors: David A. Kelly, Hana Chockler, Daniel Kroening, Nathan Blake, Aditi Ramaswamy, Melane Navaratnarajah, Aaditya Shivakumar
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this paper, we propose a new black-box explainability algorithm and tool, YO-ReX, for efficient explanation of the outputs of object detectors. The new algorithm computes explanations for all objects detected in the image simultaneously. Hence, compared to the baseline, the new algorithm reduces the number of queries by a factor of 10 for the case of ten detected objects. The speedup increases further with the number of objects. Our experimental results demonstrate that YO-ReX can explain the outputs of YOLO with a negligible overhead over the running time of YOLO. We also demonstrate similar results for explaining SSD and Faster R-CNN. The speedup is achieved by combining aggressive pruning with a causal analysis, which avoids backtracking.
SySMOL: A Hardware-software Co-design Framework for Ultra-Low and Fine-Grained Mixed-Precision Neural Networks
Authors: Cyrus Zhou, Vaughn Richard, Pedro Savarese, Zachary Hassman, Michael Maire, Michael DiBrino, Yanjing Li
Abstract
Recent advancements in quantization and mixed-precision techniques offer significant promise for improving the run-time and energy efficiency of neural networks. In this work, we further show that neural networks, wherein individual parameters or activations can take on different precisions ranging between 1 and 4 bits, can achieve accuracies comparable to or exceeding their full-precision counterparts. However, the deployment of such networks poses numerous challenges, stemming from the necessity to manage and control the compute/communication/storage requirements associated with these extremely fine-grained mixed precisions for each piece of data. Existing hardware and system-level support tailored to these unique and challenging requirements is lacking. Our research introduces the first holistic hardware-software co-design approach for these networks, which enables a continuous feedback loop between hardware design, training, and inference to facilitate systematic design exploration. As a proof of concept, we illustrate this co-design approach by designing new, configurable CPU SIMD architectures tailored for these networks, tightly integrating the architecture with new system-aware training and inference techniques. We perform systematic design space exploration using this framework to analyze various tradeoffs. The design for mixed-precision networks that achieves optimized tradeoffs corresponds to an architecture that supports 1-, 2-, and 4-bit fixed-point operations with four configurable precision patterns; when coupled with system-aware training and inference optimization, networks trained for this design achieve accuracies that closely match full-precision accuracies while compressing the networks and improving their run-time efficiency drastically, by 10-20x, compared to full-precision networks.
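The 1-, 2-, and 4-bit fixed-point operations mentioned above can be illustrated with a minimal per-tensor quantizer (a generic sketch under our own assumptions, not SySMOL's quantization scheme; in the paper's setting the precision would be chosen per parameter or activation, not per tensor).

```python
import numpy as np

def quantize(w, bits):
    # Uniform symmetric fixed-point quantization to `bits` bits.
    if bits == 1:
        # Binary case: sign times the mean magnitude.
        return np.sign(w) * np.mean(np.abs(w))
    levels = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / levels
    return np.round(w / scale) * scale

w = np.array([0.5, -0.25, 0.1])
q4 = quantize(w, 4)   # 4-bit: max error bounded by half a quantization step
q1 = quantize(w, 1)   # 1-bit: only the sign pattern survives
assert np.max(np.abs(q4 - w)) < 0.04
assert np.all(np.sign(q1) == np.sign(w))
```

The hardware challenge the abstract describes comes from letting different elements of the same tensor use different `bits` values, which breaks the uniform-width assumptions of conventional SIMD datapaths.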
A Blockchain Solution for Collaborative Machine Learning over IoT
Authors: Carlos Beis-Penedo, Francisco Troncoso-Pastoriza, Rebeca P. Díaz-Redondo, Ana Fernández-Vilas, Manuel Fernández-Veiga, Martín González Soto
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)
Abstract
The rapid growth of Internet of Things (IoT) devices and applications has led to an increased demand for advanced analytics and machine learning techniques capable of handling the challenges associated with data privacy, security, and scalability. Federated learning (FL) and blockchain technologies have emerged as promising approaches to address these challenges by enabling decentralized, secure, and privacy-preserving model training on distributed data sources. In this paper, we present a novel IoT solution that combines the incremental learning vector quantization algorithm (XuILVQ) with Ethereum blockchain technology to facilitate secure and efficient data sharing, model training, and prototype storage in a distributed environment. Our proposed architecture addresses the shortcomings of existing blockchain-based FL solutions by reducing computational and communication overheads while maintaining data privacy and security. We assess the performance of our system through a series of experiments, showcasing its potential to enhance the accuracy and efficiency of machine learning tasks in IoT settings.
Class Balanced Dynamic Acquisition for Domain Adaptive Semantic Segmentation using Active Learning
Authors: Marc Schachtsiek, Simone Rossi, Thomas Hannagan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Abstract
Domain adaptive active learning is leading the charge in label-efficient training of neural networks. For semantic segmentation, state-of-the-art models jointly use two criteria of uncertainty and diversity to select training labels, combined with a pixel-wise acquisition strategy. However, we show that such methods currently suffer from a class imbalance issue which degrades their performance for larger active learning budgets. We then introduce Class Balanced Dynamic Acquisition (CBDA), a novel active learning method that mitigates this issue, especially in high-budget regimes. The more balanced labels increase minority class performance, which in turn allows the model to outperform the previous baseline by 0.6, 1.7, and 2.4 mIoU for budgets of 5%, 10%, and 20%, respectively. Additionally, the focus on minority classes leads to improvements of the minimum class performance of 0.5, 2.9, and 4.6 IoU respectively. The top-performing model even exceeds the fully supervised baseline, showing that a more balanced label than the entire ground truth can be beneficial.
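The class-balancing idea can be sketched with a simple acquisition rule (an illustrative simplification, not the paper's CBDA method, which balances dynamically): spend the pixel labeling budget evenly across predicted classes, taking the most uncertain pixels within each class rather than globally.

```python
import numpy as np

def class_balanced_acquire(uncertainty, pred_class, budget, n_classes):
    # Split the budget evenly per predicted class, then take the most
    # uncertain pixels within each class. A purely global top-k would
    # let majority classes dominate the selected labels.
    per_class = budget // n_classes
    picks = []
    for c in range(n_classes):
        idx = np.where(pred_class == c)[0]
        order = idx[np.argsort(-uncertainty[idx])]
        picks.extend(order[:per_class].tolist())
    return picks

uncertainty = np.array([0.9, 0.1, 0.5, 0.2, 0.8, 0.3])
pred_class  = np.array([0,   0,   0,   1,   1,   1])
picks = class_balanced_acquire(uncertainty, pred_class, budget=4, n_classes=2)
assert sorted(picks) == [0, 2, 4, 5]  # two pixels from each class
```

A global top-4 on this example would select pixels {0, 4, 2, 5} as well, but with skewed class frequencies the hard per-class quota is what protects minority-class performance.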
Tube-NeRF: Efficient Imitation Learning of Visuomotor Policies from MPC using Tube-Guided Data Augmentation and NeRFs
Abstract
Imitation learning (IL) can train computationally-efficient sensorimotor policies from a resource-intensive Model Predictive Controller (MPC), but it often requires many samples, leading to long training times or limited robustness. To address these issues, we combine IL with a variant of robust MPC that accounts for process and sensing uncertainties, and we design a data augmentation (DA) strategy that enables efficient learning of vision-based policies. The proposed DA method, named Tube-NeRF, leverages Neural Radiance Fields (NeRFs) to generate novel synthetic images, and uses properties of the robust MPC (the tube) to select relevant views and to efficiently compute the corresponding actions. We tailor our approach to the task of localization and trajectory tracking on a multirotor, by learning a visuomotor policy that generates control actions using images from the onboard camera as the only source of horizontal position information. Our evaluations numerically demonstrate learning of a robust visuomotor policy with an 80-fold increase in demonstration efficiency and a 50% reduction in training time over current IL methods. Additionally, our policies successfully transfer to a real multirotor, achieving accurate localization and low tracking errors despite large disturbances, with an onboard inference time of only 1.5 ms.
Variational Annealing on Graphs for Combinatorial Optimization
Authors: Sebastian Sanokowski, Wilhelm Berghammer, Sepp Hochreiter, Sebastian Lehner
Abstract
Several recent unsupervised learning methods use probabilistic approaches to solve combinatorial optimization (CO) problems based on the assumption of statistically independent solution variables. We demonstrate that this assumption imposes performance limitations in particular on difficult problem instances. Our results corroborate that an autoregressive approach which captures statistical dependencies among solution variables yields superior performance on many popular CO problems. We introduce subgraph tokenization in which the configuration of a set of solution variables is represented by a single token. This tokenization technique alleviates the drawback of the long sequential sampling procedure which is inherent to autoregressive methods without sacrificing expressivity. Importantly, we theoretically motivate an annealed entropy regularization and show empirically that it is essential for efficient and stable learning.
Gradient-based bilevel optimization for multi-penalty Ridge regression through matrix differential calculus
Abstract
Common regularization algorithms for linear regression, such as LASSO and Ridge regression, rely on a regularization hyperparameter that balances the tradeoff between minimizing the fitting error and the norm of the learned model coefficients. As this hyperparameter is scalar, it can be easily selected via random or grid search optimizing a cross-validation criterion. However, using a scalar hyperparameter limits the algorithm's flexibility and potential for better generalization. In this paper, we address the problem of linear regression with l2-regularization, where a different regularization hyperparameter is associated with each input variable. We optimize these hyperparameters using a gradient-based approach, wherein the gradient of a cross-validation criterion with respect to the regularization hyperparameters is computed analytically through matrix differential calculus. Additionally, we introduce two strategies tailored for sparse model learning problems aiming at reducing the risk of overfitting to the validation data. Numerical examples demonstrate that our multi-hyperparameter regularization approach outperforms LASSO, Ridge, and Elastic Net regression. Moreover, the analytical computation of the gradient proves to be more efficient in terms of computational time compared to automatic differentiation, especially when handling a large number of input variables. Application to the identification of over-parameterized Linear Parameter-Varying models is also presented.
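The per-variable penalty described above admits a closed-form solution for fixed hyperparameters: $w = (X^\top X + \mathrm{diag}(\lambda))^{-1} X^\top y$. The sketch below (an illustration with invented data, not the paper's gradient-based hyperparameter optimization, which would tune $\lambda$ against a cross-validation criterion) shows how penalizing one input variable more heavily than the others shrinks only that coefficient.

```python
import numpy as np

def multi_ridge(X, y, lam):
    # Ridge regression with a separate l2 penalty per input variable:
    # w = (X^T X + diag(lam))^{-1} X^T y
    return np.linalg.solve(X.T @ X + np.diag(lam), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([2.0, 0.0, -1.0])           # second feature is irrelevant
y = X @ w_true + 0.01 * rng.normal(size=50)

# Heavily penalizing only the irrelevant feature shrinks its weight
# while leaving the informative coefficients essentially untouched.
w = multi_ridge(X, y, np.array([1e-3, 1e3, 1e-3]))
assert abs(w[1]) < 0.05
assert abs(w[0] - 2.0) < 0.2
```

A single scalar $\lambda$ would have to trade off shrinking the irrelevant coefficient against biasing the informative ones; the per-variable vector removes that coupling, at the cost of a hyperparameter search that grid search cannot handle, which is what motivates the analytical gradient.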
ECRF: Entropy-Constrained Neural Radiance Fields Compression with Frequency Domain Optimization
Authors: Soonbin Lee, Fangwen Shu, Yago Sanchez, Thomas Schierl, Cornelius Hellge
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Explicit feature-grid based NeRF models have shown promising results in terms of rendering quality and significant speed-up in training. However, these methods often require a significant amount of data to represent a single scene or object. In this work, we present a compression model that aims to minimize the entropy in the frequency domain in order to effectively reduce the data size. First, we propose using the discrete cosine transform (DCT) on the tensorial radiance fields to compress the feature-grid. This feature-grid is transformed into coefficients, which are then quantized and entropy encoded, following a similar approach to the traditional video coding pipeline. Furthermore, to achieve a higher level of sparsity, we propose using an entropy parameterization technique for the frequency domain, specifically for DCT coefficients of the feature-grid. Since the transformed coefficients are optimized during the training phase, the proposed model does not require any fine-tuning or additional information. Our model only requires a lightweight compression pipeline for encoding and decoding, making it easier to apply volumetric radiance field methods for real-world applications. Experimental results demonstrate that our proposed frequency domain entropy model can achieve superior compression performance across various datasets. The source code will be made publicly available.
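The transform-and-quantize portion of such a pipeline can be sketched in a few lines (a generic illustration assuming SciPy's `scipy.fft.dctn`; the paper's entropy parameterization and encoding stages are omitted): a smooth grid concentrates its energy in a few DCT coefficients, so uniform quantization zeroes out most of the rest while reconstruction stays accurate.

```python
import numpy as np
from scipy.fft import dctn, idctn

# A smooth 2D "feature grid" standing in for a tensorial radiance field slice.
x = np.linspace(0.0, 1.0, 16)
grid = np.outer(np.sin(2 * np.pi * x), np.cos(2 * np.pi * x))

step = 0.01                                   # quantization step size
coeffs = dctn(grid, norm="ortho")             # to the frequency domain
q = np.round(coeffs / step)                   # quantize (entropy coding would follow)
rec = idctn(q * step, norm="ortho")           # dequantize and invert

assert np.max(np.abs(rec - grid)) < 0.1       # reconstruction stays close
assert np.any(q == 0)                         # small coefficients vanish
```

Quantized zero coefficients cost almost nothing after entropy coding, which is why pushing sparsity into the frequency domain, as the entropy-constrained training in the abstract does, directly reduces the stored data size.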
Maximum Cardinality $f$-Matching in Time $O(n^{2/3}m)$
Abstract
We present an algorithm that finds a maximum cardinality $f$-matching of a simple graph in time $O(n^{2/3} m)$. Here $f:V\to \mathbb{N}$ is a given function, and an $f$-matching is a subgraph wherein each vertex $v\in V$ has degree $\le f(v)$. This result generalizes a line of algorithms that concentrate on simple bipartite graphs. The bipartite case is based on the notion of a level graph, introduced by Dinic for network flow. For general graphs the ``level'' of a vertex is unclear: a given vertex can occur on many different levels in augmenting trails. In fact there does not seem to be a unique level graph; our notion of level graph depends on the trails being analyzed. Our analysis presents new properties of blossoms of shortest augmenting trails. Our algorithm, unmodified, is also efficient on multigraphs, achieving time $O(\min\{\sqrt{f(V)},\, n\}\, m)$, for $f(V)=\sum_v f(v)$.
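The degree-capped constraint is easy to state in code. The greedy sketch below only produces a maximal $f$-matching, not the maximum-cardinality one the paper's $O(n^{2/3} m)$ algorithm finds (that requires augmenting trails and blossoms), but it makes the feasibility condition concrete.

```python
def greedy_f_matching(edges, f):
    # Add edges greedily while keeping every vertex v within its degree
    # cap f(v). The result is maximal but not necessarily maximum:
    # augmenting trails could still enlarge it in general.
    deg = {v: 0 for v in f}
    matched = []
    for u, v in edges:
        if deg[u] < f[u] and deg[v] < f[v]:
            matched.append((u, v))
            deg[u] += 1
            deg[v] += 1
    return matched

f = {1: 1, 2: 2, 3: 1}
M = greedy_f_matching([(1, 2), (2, 3), (1, 3)], f)
assert len(M) == 2                 # edge (1,3) is blocked: vertices 1 and 3 are full
```

With $f(v)=1$ for all $v$ this reduces to ordinary matching, which is the sense in which $f$-matching generalizes the classical problem.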
How We Manage an Army of Teaching Assistants: Experience Report on Scaling a CS1 Course
Abstract
A considerable increase in enrollment numbers poses major challenges in course management, such as fragmented information sharing, inefficient meetings, and poor understanding of course activities among a large team of teaching assistants. To address these challenges, we restructured the course, drawing inspiration from successful management and educational practices. We developed an organized, three-tier structure for teams, each led by an experienced Lead TA. We also formed five functional teams, each focusing on a specific area of responsibility: communication, content, "lost student" support, plagiarism, and scheduling. In addition, we updated our recruitment method for undergraduate TAs, following a model similar to the one used in the software industry, while also deciding to mentor Lead TAs in place of traditional training. Our experiences, lessons learned, and future plans for enhancement are detailed in this experience report. We emphasize the value of applying management techniques to large-scale course handling and invite other institutions to consider and adapt this approach, tailoring it to their specific needs.
Abstract
We are interested in testing properties of distributions with systematically mislabeled samples. Our goal is to make decisions about unknown probability distributions, using a sample that has been collected by a confused collector, such as a machine-learning classifier that has not learned to distinguish all elements of the domain. The confused collector holds an unknown clustering of the domain and an input distribution $\mu$, and provides two oracles: a sample oracle which produces a sample from $\mu$ that has been labeled according to the clustering; and a label-query oracle which returns the label of a query point $x$ according to the clustering. Our first set of results shows that identity, uniformity, and equivalence of distributions can be tested efficiently, under the earth-mover distance, with remarkably weak conditions on the confused collector, even when the unknown clustering is adversarial. This requires defining a variant of the distribution testing task (inspired by the recent testable learning framework of Rubinfeld & Vasilyan), where the algorithm should test a joint property of the distribution and its clustering. As an example, we get efficient testers when the distribution tester is allowed to reject if it detects that the confused collector clustering is "far" from being a decision tree. The second set of results shows that we can sometimes do significantly better when the clustering is random instead of adversarial. For certain one-dimensional random clusterings, we show that uniformity can be tested under the TV distance using $\widetilde O\left(\frac{\sqrt n}{\rho^{3/2} \epsilon^2}\right)$ samples and zero queries, where $\rho \in (0,1]$ controls the "resolution" of the clustering. We improve this to $O\left(\frac{\sqrt n}{\rho \epsilon^2}\right)$ when queries are allowed.
Efficient Local Search for Nonlinear Real Arithmetic
Authors: Zhonghan Wang, Bohua Zhan, Bohan Li, Shaowei Cai
Abstract
Local search has recently been applied to SMT problems over various arithmetic theories. Among these, nonlinear real arithmetic poses special challenges due to its uncountable solution space and potential need to solve higher-degree polynomials. As a consequence, existing work on local search only considered fragments of the theory. In this work, we analyze the difficulties and propose ways to address them, resulting in an efficient search algorithm that covers the full theory of nonlinear real arithmetic. In particular, we present two algorithmic improvements: incremental computation of variable scores and temporary relaxation of equality constraints. We also discuss the choice of candidate moves and a look-ahead mechanism for cases where no critical moves are available. The resulting implementation is competitive on satisfiable problem instances against complete methods such as MCSAT in existing SMT solvers.
Bursting Spikes: Efficient and High-performance SNNs for Event-based Vision
Abstract
Advancing event-driven vision through spiking neural networks (SNNs) is crucial to empowering high-speed and efficient perception. While directly converting the pre-trained artificial neural networks (ANNs) - by replacing the non-linear activation with spiking neurons - can provide SNNs with good performance, the resultant SNNs typically demand long timesteps and high energy consumption to achieve their optimal performance. To address this challenge, we introduce the burst-spike mechanism inspired by the biological nervous system, allowing multiple spikes per timestep to reduce conversion errors and produce low-latency SNNs. To further bolster this enhancement, we leverage the Pareto Frontier-driven algorithm to reallocate burst-firing patterns. Moreover, to reduce energy consumption during the conversion process, we propose a sensitivity-driven spike compression technique, which automatically locates the optimal threshold ratio according to layer-specific sensitivity. Extensive experiments demonstrate our approach outperforms state-of-the-art SNN methods, showcasing superior performance and reduced energy usage across classification and object detection. Our code will be available at https://github.com/bic-L/burst-ann2snn.
Segmentation-Based Parametric Painting
Authors: Manuel Ladron de Guevara, Matthew Fisher, Aaron Hertzmann
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
We introduce a novel image-to-painting method that facilitates the creation of large-scale, high-fidelity paintings with human-like quality and stylistic variation. To process large images and gain control over the painting process, we introduce a segmentation-based painting process and a dynamic attention map approach inspired by human painting strategies, allowing optimization of brush strokes to proceed in batches over different image regions, thereby capturing both large-scale structure and fine details, while also allowing stylistic control over detail. Our optimized batch processing and patch-based loss framework enable efficient handling of large canvases, ensuring our painted outputs are both aesthetically compelling and functionally superior as compared to previous methods, as confirmed by rigorous evaluations. Code available at: https://github.com/manuelladron/semantic\_based\_painting.git
Fair Influence Maximization in Social Networks: A Community-Based Evolutionary Algorithm
Abstract
Influence Maximization (IM) has been extensively studied in network science, which attempts to find a subset of users to maximize the influence spread. A new variant of IM, Fair Influence Maximization (FIM), which primarily enhances the fair propagation of information, attracts increasing attention in academia. However, existing algorithms for FIM suffer from a trade-off between fairness and running time, since it is difficult to ensure that users are fairly influenced in terms of sensitive attributes, such as race or gender, while maintaining a high influence spread. To tackle this problem, in this paper, we propose an effective and efficient Community-based Evolutionary Algorithm for FIM (named CEA-FIM). In CEA-FIM, a community-based node selection strategy is proposed to identify potential nodes, which considers not only the size of the community but also the attributes of the nodes in the community. Subsequently, we design an evolutionary algorithm based on the proposed node selection strategy to hasten the search for the optimal solution, including novel initialization, crossover, and mutation strategies. We validate the proposed algorithm CEA-FIM by performing experiments on real-world and synthetic networks. The experimental results show that the proposed CEA-FIM achieves a better balance between effectiveness and efficiency, compared to the state-of-the-art baseline algorithms.
Exploiting Active RIS in NOMA Networks with Hardware Impairments
Abstract
Active reconfigurable intelligent surface (ARIS) is a promising way to compensate for multiplicative fading attenuation by amplifying and reflecting incident signals to selected users. This paper investigates the performance of ARIS assisted non-orthogonal multiple access (NOMA) networks over cascaded Nakagami-m fading channels. The effects of hardware impairments (HIS) and reflection coefficients on ARIS-NOMA networks with imperfect successive interference cancellation (ipSIC) and perfect successive interference cancellation (pSIC) are considered. More specifically, we develop new precise and asymptotic expressions of outage probability and ergodic data rate with ipSIC/pSIC for ARIS-NOMA-HIS networks. Based on the approximated analyses, the diversity orders and multiplexing gains for a pair of non-orthogonal users are derived in detail. Additionally, the energy efficiency of ARIS-NOMA-HIS networks is surveyed in delay-limited and delay-tolerant transmission schemes. The simulation findings demonstrate that: i) the outage behaviors and ergodic data rates of ARIS-NOMA-HIS networks outperform those of ARIS aided orthogonal multiple access (OMA) and passive reconfigurable intelligent surface (PRIS) aided OMA; ii) as the reflection coefficient of ARIS increases, ARIS-NOMA-HIS networks are able to provide strengthened outage performance; and iii) ARIS-NOMA-HIS networks are more energy efficient than ARIS/PRIS-OMA networks and conventional cooperative schemes.
AdaMedGraph: Adaboosting Graph Neural Networks for Personalized Medicine
Authors: Jie Lian, Xufang Luo, Caihua Shan, Dongqi Han, Varut Vardhanabhuti, Dongsheng Li
Abstract
Precision medicine tailored to individual patients has gained significant attention in recent times. Machine learning techniques are now employed to process personalized data from various sources, including images, genetics, and assessments, and have demonstrated good outcomes in many clinical prediction tasks. Notably, the approach of constructing graphs by linking similar patients and then applying graph neural networks (GNNs) stands out, because related information from analogous patients is aggregated and considered for prediction. However, selecting the appropriate edge features to define patient similarity and construct the graph is challenging, given that each patient is described by high-dimensional features from diverse sources. Previous studies rely on human expertise to select edge features, which is neither scalable nor efficient at pinpointing crucial edge features for complex diseases. In this paper, we propose a novel algorithm named \ours, which can automatically select important features to construct multiple patient similarity graphs, and train GNNs based on these graphs as weak learners in adaptive boosting. \ours{} is evaluated on two real-world medical scenarios and shows superior performance.
Distance-Only Task Orchestration Algorithm for Energy Efficiency in Satellite-Based Mist Computing
Abstract
This paper addresses the challenge of efficiently offloading heavy computing tasks from ground mobile devices to a satellite-based mist computing environment. With ground-based edge and cloud servers often inaccessible, exploiting satellite mist computing becomes imperative. Existing offloading algorithms have shown limitations in adapting to the unique characteristics of heavy computing tasks. We therefore propose a heavy-computing-task offloading algorithm that prioritizes satellite proximity. This approach not only reduces energy consumption during telecommunication but also ensures that tasks, which are typically non-time-critical, are executed within their specified timing constraints. Our proposed algorithm outperforms other offloading schemes in terms of satellite energy consumption, average end-to-end delay, and task success rate. Although it exhibits a higher average VM CPU usage, this increase does not pose critical challenges. This distance-based approach offers a promising solution for enhancing energy efficiency in satellite-based mist computing, making it well suited to the demands of heavy computing tasks.
Stable Cluster Discrimination for Deep Clustering
Authors: Qi Qian
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Deep clustering can optimize representations of instances (i.e., representation learning) and explore the inherent data distribution (i.e., clustering) simultaneously, demonstrating superior performance over conventional clustering methods with given features. However, the coupled objective admits a trivial solution in which all instances collapse to uniform features. To tackle the challenge, a two-stage training strategy is developed for decoupling, which introduces an additional pre-training stage for representation learning and then fine-tunes the obtained model for clustering. Meanwhile, one-stage methods are developed mainly for representation learning rather than clustering, where various constraints on cluster assignments are designed to explicitly avoid collapsing. Despite the success of these methods, an appropriate learning objective tailored for deep clustering has not been investigated sufficiently. In this work, we first show that the prevalent discrimination task in supervised learning is unstable for one-stage clustering due to the lack of ground-truth labels and positive instances for certain clusters in each mini-batch. To mitigate the issue, a novel stable cluster discrimination (SeCu) task is proposed, and a new hardness-aware clustering criterion can be obtained accordingly. Moreover, a global entropy constraint for cluster assignments is studied with efficient optimization. Extensive experiments are conducted on benchmark data sets and ImageNet. SeCu achieves state-of-the-art performance on all of them, which demonstrates the effectiveness of one-stage deep clustering. Code is available at \url{https://github.com/idstcv/SeCu}.
RelJoin: Relative-cost-based Selection of Distributed Join Methods for Query Plan Optimization
Authors: F. Liang, F.C.M. Lau, H. Cui, Y. Li, B. Lin, C. Li, X. Hu
Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Selecting appropriate distributed join methods for logical join operations in a query plan is crucial for the performance of data-intensive scalable computing (DISC). Different network communication patterns in the data exchange phase generate varying network communication workloads and significantly affect the distributed join performance. However, most cost-based query optimizers focus on the local computing cost and do not precisely model the network communication cost. We propose a cost model for various distributed join methods to optimize join queries in DISC platforms. Our method precisely measures the network and local computing workloads in different execution phases, using information on the size and cardinality statistics of datasets and cluster join parallelism. Our cost model reveals the importance of the relative size of the joining datasets. We implement an efficient distributed join selection strategy, RelJoin, in SparkSQL, an industry-prevalent distributed data processing framework. RelJoin uses runtime adaptive statistics for accurate cost estimation and selects optimal distributed join methods for logical joins to optimize the physical query plan. The evaluation results on the TPC-DS benchmark show that RelJoin performs best in 62 of the 97 queries and can reduce the average query time by 21% compared with other strategies.
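To illustrate the relative-cost idea in a minimal form, the sketch below compares estimated network traffic for a broadcast join against a shuffle join. The cost terms are deliberately simplified assumptions and omit the cardinality statistics and local computing costs that RelJoin's actual model accounts for; the function name is hypothetical.

```python
def pick_join_method(small_bytes, large_bytes, parallelism):
    """Choose the join method with the lower estimated network traffic.

    Simplified assumptions: a broadcast join ships the small relation
    to every one of `parallelism` tasks, while a shuffle join
    repartitions both relations across the network exactly once.
    """
    broadcast_cost = small_bytes * parallelism
    shuffle_cost = small_bytes + large_bytes
    return "broadcast" if broadcast_cost < shuffle_cost else "shuffle"
```

Note that the comparison is driven by the relative size of the joining datasets rather than their absolute sizes, echoing the observation in the abstract.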
An Adaptive Fast-Multipole-Accelerated Hybrid Boundary Integral Equation Method for Accurate Diffusion Curves
Abstract
In theory, diffusion curves promise complex color gradations for infinite-resolution vector graphics. In practice, existing realizations suffer from poor scaling, discretization artifacts, or insufficient support for rich boundary conditions. Previous applications of the boundary element method to diffusion curves have relied on polygonal approximations, which either forfeit the high-order smoothness of B\'ezier curves, or, when the polygonal approximation is extremely detailed, result in large and costly systems of equations that must be solved. In this paper, we utilize the boundary integral equation method to accurately and efficiently solve the underlying partial differential equation. Given a desired resolution and viewport, we then interpolate this solution and use the boundary element method to render it. We couple this hybrid approach with the fast multipole method on a non-uniform quadtree for efficient computation. Furthermore, we introduce an adaptive strategy to enable truly scalable infinite-resolution diffusion curves.
Numerical methods and regularity properties for viscosity solutions of nonlocal in space and time diffusion equations
Authors: Félix del Teso, Łukasz Płociniczak
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
Abstract
We consider a general family of nonlocal in space and time diffusion equations with space-time dependent diffusivity and prove convergence of finite difference schemes in the context of viscosity solutions under very mild conditions. The proofs, based on regularity properties and compactness arguments on the numerical solution, allow us to inherit a number of interesting results for the limit equation. More precisely, assuming H\"older regularity only on the initial condition, we prove convergence of the scheme, space-time H\"older regularity of the solution depending on the fractional orders of the operators, as well as specific blow-up rates of the first time derivative. Finally, using the obtained regularity results, we are able to prove orders of convergence of the scheme in some cases. These results are consistent with previous studies. The schemes' performance is further numerically verified using both constructed exact solutions and realistic examples. Our experiments show that multithreaded implementation yields an efficient method to solve nonlocal equations numerically.
Abstract
3D whole-body human mesh recovery aims to reconstruct the 3D human body, face, and hands from a single image. Although powerful deep learning models have achieved accurate estimation in this task, they require enormous memory and computational resources. Consequently, these methods can hardly be deployed on resource-limited edge devices. In this work, we propose a Binarized Dual Residual Network (BiDRN), a novel quantization method to estimate the 3D human body, face, and hands parameters efficiently. Specifically, we design a basic unit Binarized Dual Residual Block (BiDRB) composed of Local Convolution Residual (LCR) and Block Residual (BR), which can preserve full-precision information as much as possible. For LCR, we generalize it to four kinds of convolutional modules so that full-precision information can be propagated even between mismatched dimensions. We also binarize the face and hands box-prediction network as Binarized BoxNet, which can further reduce the model redundancy. Comprehensive quantitative and qualitative experiments demonstrate the effectiveness of BiDRN, which has a significant improvement over state-of-the-art binarization algorithms. Moreover, our proposed BiDRN achieves comparable performance with the full-precision method Hand4Whole while using just 22.1% parameters and 14.8% operations. We will release all the code and pretrained models.
Cycle Invariant Positional Encoding for Graph Representation Learning
Abstract
Cycles are fundamental elements in graph-structured data and have demonstrated their effectiveness in enhancing graph learning models. To encode such information into a graph learning framework, prior works often extract a summary quantity, ranging from the number of cycles to the more sophisticated persistence diagram summaries. However, more detailed information, such as which edges are encoded in a cycle, has not yet been used in graph neural networks. In this paper, we take a step toward addressing this gap and propose a structure encoding module, called CycleNet, that encodes cycle information via edge structure encoding in a permutation invariant manner. To efficiently encode the space of all cycles, we start with a cycle basis (i.e., a minimal set of cycles generating the cycle space), which we compute via the kernel of the 1-dimensional Hodge Laplacian of the input graph. To guarantee the encoding is invariant w.r.t. the choice of cycle basis, we encode the cycle information via the orthogonal projector of the cycle basis, which is inspired by BasisNet proposed by Lim et al. We also develop a more efficient variant, which, however, requires that the input graph has a unique shortest cycle basis. To demonstrate the effectiveness of the proposed module, we provide some theoretical understanding of its expressive power. Moreover, we show via a range of experiments that networks enhanced by our CycleNet module perform better on various benchmarks than several existing SOTA models.
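The basis-invariance trick described above can be sketched in a few lines: the orthogonal projector onto the column span of a cycle basis does not depend on which basis of that space is chosen. The function below is an illustrative sketch, not CycleNet's implementation.

```python
import numpy as np

def cycle_space_projector(basis):
    """Orthogonal projector onto the column span of `basis`.

    `basis` is a (num_edges, k) array whose columns span the cycle
    space, e.g. a basis of the kernel of the 1-dimensional Hodge
    Laplacian. The projector B (B^T B)^{-1} B^T is identical for every
    basis of the same space, which makes the encoding invariant to the
    choice of cycle basis.
    """
    B = np.asarray(basis, dtype=float)
    return B @ np.linalg.solve(B.T @ B, B.T)
```

Feeding the projector (rather than the raw basis) to a downstream network is what removes the ambiguity of basis choice.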
Comparative Analysis of Transformers for Modeling Tabular Data: A Case Study Using an Industry-Scale Dataset
Abstract
We perform a comparative analysis of transformer-based models designed for modeling tabular data, specifically on an industry-scale dataset. While earlier studies demonstrated promising outcomes on smaller public or synthetic datasets, the effectiveness did not extend to larger industry-scale datasets. The challenges identified include handling high-dimensional data, the necessity for efficient pre-processing of categorical and numerical features, and addressing substantial computational requirements. To overcome the identified challenges, the study conducts an extensive examination of various transformer-based models using both synthetic datasets and the default prediction Kaggle dataset (2022) from American Express. The paper presents crucial insights into optimal data pre-processing, compares pre-training and direct supervised learning methods, discusses strategies for managing categorical and numerical features, and highlights trade-offs between computational resources and performance. Focusing on temporal financial data modeling, the research aims to facilitate the systematic development and deployment of transformer-based models in real-world scenarios, emphasizing scalability.
Deciphering and integrating invariants for neural operator learning with various physical mechanisms
Abstract
Neural operators have been explored as surrogate models for simulating physical systems to overcome the limitations of traditional partial differential equation (PDE) solvers. However, most existing operator learning methods assume that the data originate from a single physical mechanism, limiting their applicability and performance in more realistic scenarios. To this end, we propose Physical Invariant Attention Neural Operator (PIANO) to decipher and integrate the physical invariants (PI) for operator learning from the PDE series with various physical mechanisms. PIANO employs self-supervised learning to extract physical knowledge and attention mechanisms to integrate them into dynamic convolutional layers. Compared to existing techniques, PIANO can reduce the relative error by 13.6\%-82.2\% on PDE forecasting tasks across varying coefficients, forces, or boundary conditions. Additionally, varied downstream tasks reveal that the PI embeddings deciphered by PIANO align well with the underlying invariants in the PDE systems, verifying the physical significance of PIANO. The source code will be publicly available at: https://github.com/optray/PIANO.
Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling
Authors: Mingze Wang, Zeping Min, Lei Wu
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Abstract
In this work, we investigate the margin-maximization bias exhibited by gradient-based algorithms in classifying linearly separable data. We present an in-depth analysis of the specific properties of the velocity field associated with (normalized) gradients, focusing on their role in margin maximization. Inspired by this analysis, we propose a novel algorithm called Progressive Rescaling Gradient Descent (PRGD) and show that PRGD can maximize the margin at an {\em exponential rate}. This stands in stark contrast to all existing algorithms, which maximize the margin at a slow {\em polynomial rate}. Specifically, we identify mild conditions on data distribution under which existing algorithms such as gradient descent (GD) and normalized gradient descent (NGD) {\em provably fail} in maximizing the margin efficiently. To validate our theoretical findings, we present both synthetic and real-world experiments. Notably, PRGD also shows promise in enhancing the generalization performance when applied to linearly non-separable datasets and deep neural networks.
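A minimal sketch of the idea on a toy separable dataset, assuming an exponential loss: normalized gradient steps are interleaved with periodic upward rescaling of the iterate's norm. The step size and rescaling schedule here are illustrative guesses, not the schedule analyzed in the paper.

```python
import numpy as np

def margin(w, X, y):
    """Normalized classification margin min_i y_i <w, x_i> / ||w||."""
    return np.min(y * (X @ w)) / np.linalg.norm(w)

def prgd_sketch(X, y, steps=200, lr=0.1, rescale_every=20, factor=2.0):
    """Normalized gradient steps plus periodic norm rescaling (illustrative)."""
    w = np.full(X.shape[1], 1e-3)  # small nonzero start
    for t in range(1, steps + 1):
        # gradient of the exponential loss sum_i exp(-y_i <w, x_i>)
        g = -(y * np.exp(-y * (X @ w))) @ X
        gn = np.linalg.norm(g)
        if gn > 0:
            w = w - lr * g / gn    # normalized gradient step
        if t % rescale_every == 0:
            w = w * factor         # progressive norm rescaling
    return w
```

Since the margin is scale-invariant, the rescaling only accelerates how fast the direction of `w` settles; on this symmetric two-point problem the maximum margin is 1/sqrt(2).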
Refinement Proofs in Rust Using Ghost Locks
Authors: Aurel Bílý (1), João C. Pereira (1), Jan Schär (1), Peter Müller (1) ((1) ETH Zurich)
Abstract
Refinement transforms an abstract system model into a concrete, executable program, such that properties established for the abstract model carry over to the concrete implementation. Refinement has been used successfully in the development of substantial verified systems. Nevertheless, existing refinement techniques have limitations that impede their practical usefulness. Some techniques generate executable code automatically, which generally leads to implementations with sub-optimal performance. Others employ bottom-up program verification to reason about efficient implementations, but impose strict requirements on the structure of the code, the structure of the refinement proofs, as well as the employed verification logic and tools. In this paper, we present a novel refinement technique that removes these limitations. It supports a wide range of program structures, data representations, and proof structures. Our approach supports reasoning about both safety and liveness properties. We implement our approach in a state-of-the-art verifier for the Rust language, which itself offers a strong foundation for memory safety. We demonstrate the practicality of our approach on a number of substantial case studies.
Efficient Gradient Estimation via Adaptive Sampling and Importance Sampling
Abstract
Machine learning problems rely heavily on stochastic gradient descent (SGD) for optimization. The effectiveness of SGD is contingent upon accurately estimating gradients from a mini-batch of data samples. Instead of the commonly used uniform sampling, adaptive or importance sampling reduces noise in gradient estimation by forming mini-batches that prioritize crucial data points. Previous research has suggested that data points should be selected with probabilities proportional to their gradient norm. Nevertheless, existing algorithms have struggled to efficiently integrate importance sampling into machine learning frameworks. In this work, we make two contributions. First, we present an algorithm that can incorporate existing importance functions into our framework. Second, we propose a simplified importance function that relies solely on the loss gradient of the output layer. By leveraging our proposed gradient estimation techniques, we observe improved convergence in classification and regression tasks with minimal computational overhead. We validate the effectiveness of our adaptive and importance-sampling approach on image and point-cloud datasets.
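A hedged sketch of gradient-norm importance sampling: mini-batch indices are drawn with probability proportional to per-sample gradient norms, and each sampled gradient is reweighted by 1/(n p_i) so the estimator stays unbiased. In practice one would replace full per-sample gradients with a cheap proxy such as the output-layer loss gradient, as the abstract proposes; the function below is illustrative, not the paper's implementation.

```python
import numpy as np

def importance_batch_gradient(per_sample_grads, batch_size, rng):
    """Unbiased mini-batch gradient estimate via importance sampling."""
    g = np.asarray(per_sample_grads, dtype=float)
    n = len(g)
    norms = np.linalg.norm(g, axis=1)
    total = norms.sum()
    # fall back to uniform sampling if all gradients vanish
    p = norms / total if total > 0 else np.full(n, 1.0 / n)
    idx = rng.choice(n, size=batch_size, p=p)
    # reweight by 1 / (n * p_i) so the expectation equals the full-batch mean
    weights = 1.0 / (n * p[idx])
    return (weights[:, None] * g[idx]).mean(axis=0)
```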
Controlled Text Generation via Language Model Arithmetic
Authors: Jasper Dekoninck, Marc Fischer, Luca Beurer-Kellner, Martin Vechev
Abstract
As Large Language Models (LLMs) are deployed more widely, customization with respect to vocabulary, style and character becomes more important. In this work we introduce model arithmetic, a novel inference framework for composing and biasing LLMs without the need for model (re)training or highly specific datasets. In addition, the framework allows for more precise control of generated text than direct prompting and prior controlled text generation (CTG) techniques. Using model arithmetic, we can express prior CTG techniques as simple formulas and naturally extend them to new and more effective formulations. Further, we show that speculative sampling, a technique for efficient LLM sampling, extends to our setting. This enables highly efficient text generation with multiple composed models with only marginal overhead over a single model. Our empirical evaluation demonstrates that model arithmetic allows fine-grained control of generated text while outperforming state-of-the-art on the task of toxicity reduction.
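As a simplified illustration (not the paper's full framework), a linear "formula" over next-token logits can be evaluated and renormalized as follows; for example, subtracting a toxicity model's logits biases generation away from tokens that model favors. The function name and formula encoding are assumptions for this sketch.

```python
import numpy as np

def combine_logits(formula):
    """Evaluate a linear model-arithmetic formula over next-token logits.

    `formula` is a list of (weight, logits) pairs, e.g.
    [(1.0, M), (-0.6, T)] to steer model M away from toxicity model T.
    Returns a valid probability distribution over the vocabulary.
    """
    combined = sum(w * l for w, l in formula)
    z = combined - combined.max()  # stabilize the softmax
    probs = np.exp(z)
    return probs / probs.sum()
```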
Abstract
Today, much of our daily activity, such as online payment and data transfer, takes place over the Internet, with thousands of users connecting every day, so it is essential to protect them. Malicious objects must be detected and prevented from gaining persistence and causing destruction within an organization, which makes malware analysis necessary for securing systems and calls for effective and efficient approaches to detecting OS malware. As technology has become cheaper, artificial intelligence has also become easier to apply to malware analysis. This paper covers in detail the categorization and analysis of OS malware using various AI-based analysis techniques.
Filasofia: A Framework for Streamlined Development of Real-Time Surgical Simulations
Authors: Vladimir Poliakov, Dzmitry Tsetserukou, Emmanuel Vander Poorten
Abstract
Virtual reality simulation has become a popular approach for training and assessing medical students. It offers diverse scenarios, realistic visuals, and quantitative performance metrics for objective evaluation. However, creating these simulations can be time-consuming and complex, even for experienced users. The SOFA framework is an open-source solution that efficiently simulates finite element (FE) models in real-time. Yet, some users find it challenging to navigate the software due to the numerous components required for a basic simulation and their variability. Additionally, SOFA has limited visual rendering capabilities, leading developers to integrate other software for high-quality visuals. To address these issues, we developed Filasofia, a dedicated framework that simplifies development, provides modern visualization, and allows fine-tuning using SOFA objects. Our experiments demonstrate that Filasofia outperforms conventional SOFA simulations, even with real-time subdivision. Our design approach aims to streamline development while offering flexibility for fine-tuning. Future work will focus on further simplification of the development process for users.
Morphing Graph Drawings in the Presence of Point Obstacles
Authors: Oksana Firman, Tim Hegemann, Boris Klemz, Felix Klesen, Marie Diana Sieper, Alexander Wolff, Johannes Zink
Abstract
A crossing-free morph is a continuous deformation between two graph drawings that preserves straight-line pairwise noncrossing edges. Motivated by applications in 3D morphing problems, we initiate the study of morphing graph drawings in the plane in the presence of stationary point obstacles, which need to be avoided throughout the deformation. As our main result, we prove that it is NP-hard to decide whether such an obstacle-avoiding 2D morph between two given drawings of the same graph exists. This is in sharp contrast to the classical case without obstacles, where there is an efficiently verifiable (necessary and sufficient) criterion for the existence of a morph.
Abstract
Contrastive Language-Audio Pretraining (CLAP) has become crucial in the field of audio and speech processing, with applications ranging from sound event detection to text-to-audio generation. However, one of its main limitations is the considerable amount of data required for training and the overall computational complexity during inference. This paper investigates how to reduce the complexity of contrastive language-audio pre-trained models, yielding an efficient model that we call tinyCLAP. We derive a unimodal distillation loss from first principles and explore how the dimensionality of the shared multimodal latent space can be reduced via pruning. TinyCLAP uses only 6% of the original Microsoft CLAP parameters with a minimal reduction (less than 5%) in zero-shot classification performance across the three sound event detection datasets on which it was tested.
GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting
Authors: Yiwen Chen, Zilong Chen, Chi Zhang, Feng Wang, Xiaofeng Yang, Yikai Wang, Zhongang Cai, Lei Yang, Huaping Liu, Guosheng Lin
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
3D editing plays a crucial role in many areas such as gaming and virtual reality. Traditional 3D editing methods, which rely on representations like meshes and point clouds, often fall short in realistically depicting complex scenes. On the other hand, methods based on implicit 3D representations, like Neural Radiance Field (NeRF), render complex scenes effectively but suffer from slow processing speeds and limited control over specific scene areas. In response to these challenges, our paper presents GaussianEditor, an innovative and efficient 3D editing algorithm based on Gaussian Splatting (GS), a novel 3D representation. GaussianEditor enhances precision and control in editing through our proposed Gaussian semantic tracing, which traces the editing target throughout the training process. Additionally, we propose Hierarchical Gaussian splatting (HGS) to achieve stabilized and fine results under stochastic generative guidance from 2D diffusion models. We also develop editing strategies for efficient object removal and integration, a challenging task for existing methods. Our comprehensive experiments demonstrate GaussianEditor's superior control, efficacy, and rapid performance, marking a significant advancement in 3D editing. Project Page: https://buaacyw.github.io/gaussian-editor/
Evaluation of a Non-Coherent Ultra-Wideband Transceiver for Micropower Sensor Nodes
Authors: Jonah Imfeld, Silvano Cortesi, Philipp Mayer, Michele Magno
Abstract
Spatial and contextual awareness has the potential to revolutionize sensor nodes, enabling spatially augmented data collection and location-based services. With its high bandwidth, superior energy efficiency, and precise time-of-flight measurements, ultra-wideband (UWB) technology emerges as an ideal solution for such devices. This paper presents an evaluation and comparison of a non-coherent UWB transceiver within the context of highly energy-constrained wireless sensing nodes and pervasive Internet of Things (IoT) devices. Experimental results highlight the unique properties of UWB transceivers, showcasing efficient data transfer ranging from 2 kbit/s to 7.2 Mbit/s while reaching an energy consumption of 0.29 nJ/bit and 1.39 nJ/bit for transmitting and receiving, respectively. Notably, a ranging accuracy of up to +/-25 cm can be achieved. Moreover, at 6.7 mW in TX and 23 mW in RX, the peak power consumption of the UWB transceiver is significantly lower than that of other commercial UWB transceivers.
Finding Foundation Models for Time Series Classification with a PreText Task
Authors: Ali Ismail-Fawaz, Maxime Devanne, Stefano Berretti, Jonathan Weber, Germain Forestier
Abstract
Over the past decade, Time Series Classification (TSC) has gained increasing attention. While various methods have been explored, deep learning, particularly through Convolutional Neural Networks (CNNs), stands out as an effective approach. However, due to the limited availability of training data, defining a foundation model for TSC that overcomes the overfitting problem is still a challenging task. The UCR archive, encompassing a wide spectrum of datasets ranging from motion recognition to ECG-based heart disease detection, serves as a prime example for exploring this issue in diverse TSC scenarios. In this paper, we address the overfitting challenge by introducing pre-trained domain foundation models. A key aspect of our methodology is a novel pretext task that spans multiple datasets. This task is designed to identify the originating dataset of each time series sample, with the goal of creating flexible convolution filters that can be applied across different datasets. The research process consists of two phases: a pre-training phase where the model acquires general features through the pretext task, and a subsequent fine-tuning phase for specific dataset classifications. Our extensive experiments on the UCR archive demonstrate that this pre-training strategy significantly outperforms the conventional training approach without pre-training. This strategy effectively reduces overfitting in small datasets and provides an efficient route for adapting these models to new datasets, thus advancing the capabilities of deep learning in TSC.
Deep learning based reduced order modeling of Darcy flow systems with local mass conservation
Authors: Wietse M. Boon, Nicola R. Franco, Alessio Fumagalli, Paolo Zunino
Abstract
We propose a new reduced order modeling strategy for tackling parametrized Partial Differential Equations (PDEs) with linear constraints, in particular Darcy flow systems in which the constraint is given by mass conservation. Our approach employs classical neural network architectures and supervised learning, but it is constructed in such a way that the resulting Reduced Order Model (ROM) is guaranteed to satisfy the linear constraints exactly. The procedure is based on a splitting of the PDE solution into a particular solution satisfying the constraint and a homogeneous solution. The homogeneous solution is approximated by mapping a suitable potential function, generated by a neural network model, onto the kernel of the constraint operator; for the particular solution, instead, we propose an efficient spanning tree algorithm. Starting from this paradigm, we present three approaches that follow this methodology, obtained by exploring different choices of the potential spaces: from empirical ones, derived via Proper Orthogonal Decomposition (POD), to more abstract ones based on differential complexes. All proposed approaches combine computational efficiency with rigorous mathematical interpretation, thus guaranteeing the explainability of the model outputs. To demonstrate the efficacy of the proposed strategies and to emphasize their advantages over vanilla black-box approaches, we present a series of numerical experiments on fluid flows in porous media, ranging from mixed-dimensional problems to nonlinear systems. This research lays the foundation for further exploration and development in the realm of model order reduction, potentially unlocking new capabilities and solutions in computational geosciences and beyond.
Counting Solutions to Conjunctive Queries: Structural and Hybrid Tractability
Authors: Hubie Chen, Gianluigi Greco, Stefan Mengel, Francesco Scarcello
Abstract
Counting the number of answers to conjunctive queries is a fundamental problem in databases that, under standard assumptions, does not have an efficient solution. The issue is inherently #P-hard, extending even to classes of acyclic instances. To address this, we pinpoint tractable classes by examining the structural properties of instances and introducing the novel concept of #-hypertree decomposition. We establish the feasibility of counting answers in polynomial time for classes of queries featuring bounded #-hypertree width. Additionally, employing novel techniques from the realm of fixed-parameter computational complexity, we prove that, for bounded arity queries, the bounded #-hypertree width property precisely delineates the frontier of tractability for the counting problem. This result closes an important gap in our understanding of the complexity of such a basic problem for conjunctive queries and, equivalently, for constraint satisfaction problems (CSPs). Drawing upon #-hypertree decompositions, a ''hybrid'' decomposition method emerges. This approach leverages both the structural characteristics of the query and properties intrinsic to the input database, including keys or other (weaker) degree constraints that limit the permissible combinations of values. Intuitively, these features may introduce distinct structural properties that elude identification through the ''worst-possible database'' perspective inherent in purely structural methods.
Target-driven splitting SPH optimization of thermal conductivity distribution
Authors: Bo Zhang, Chi Zhang, Xiangyu Hu
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Abstract
Efficiently enhancing heat conduction through optimized distribution of a limited quantity of high thermal conductivity material is paramount in cooling electronic devices and numerous other applications. This paper introduces a target-driven all-at-once approach for PDE-constrained optimization and derives a splitting smoothed particle hydrodynamics (SPH) method for optimizing the distribution of thermal conductivity in heat conduction problems. In this method, the optimization iteration of the system is split into several easily addressed steps. A targeting step is employed to progressively enforce the direct target, which potentially leads to increased PDE residuals. Then, these residuals are recovered through an evolution step of the design variable. After this, a PDE solution step is carried out to further decrease the PDE residuals, and the system is ready for the next iteration. Unlike the simulation-based approaches, the present method does not rely on the adjoint state equation and converged state variable field in each iteration, and the optimization process is significantly simplified and accelerated. With the utilization of an implicit SPH splitting operator and a general numerical regularization formulation, the information propagation is further accelerated and the numerical stability is greatly enhanced. Typical examples of heat conduction optimization demonstrate that the current method yields optimal results comparable to previous methods and exhibits considerable computational efficiency. Moreover, the optimal results feature more moderate extreme values, which offers distinct advantages for the easier selection of appropriate material with high thermal conductivity.
Received Signal and Channel Parameter Estimation in Molecular Communications
Authors: O. Tansel Baydas, Ozgur B. Akan
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
Molecular communication (MC) is a paradigm that employs molecules as information carriers, hence requiring unconventional transceivers and detection techniques for the Internet of Bio-Nano Things (IoBNT). In this study, we provide a novel MC model that incorporates a spherical transmitter and a partially absorbing spherical receiver. This model offers a more realistic representation than the receiver architectures in the literature, e.g., passive or fully absorbing configurations. An optimization-based technique utilizing particle swarm optimization (PSO) is employed to accurately estimate the cumulative number of molecules received. This technique yields nearly constant correction parameters and demonstrates a 5-fold improvement in root mean square error (RMSE). The estimated channel model provides an approximate analytical impulse response and is hence used to estimate channel parameters such as the distance, the diffusion coefficient, or a combination of both. We apply iterative maximum likelihood estimation (MLE) for the parameter estimation, which yields errors consistent with the estimated Cramer-Rao Lower Bound (CRLB).
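As an illustration of the optimization-based estimation step, a minimal particle swarm optimizer can fit a channel parameter such as the distance to noisy channel observations. The sketch below assumes a standard fully absorbing spherical-receiver hitting-time model and generic PSO coefficients, not the paper's partially absorbing model or settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def hitting_rate(t, d, D=1e-9, r_rx=1e-6):
    """First-passage-time density for a fully absorbing sphere (an assumption)."""
    gap = d - r_rx
    return (r_rx / d) * gap / np.sqrt(4 * np.pi * D * t**3) * np.exp(-gap**2 / (4 * D * t))

t = np.linspace(1e-3, 2.0, 200)
d_true = 10e-6
obs = hitting_rate(t, d_true) * (1 + 0.01 * rng.standard_normal(t.size))

def rmse(d):
    return np.sqrt(np.mean((hitting_rate(t, d) - obs) ** 2))

# Minimal PSO over the scalar distance parameter d.
n, iters = 30, 100
pos = rng.uniform(2e-6, 50e-6, n)
vel = np.zeros(n)
pbest = pos.copy()
pbest_f = np.array([rmse(p) for p in pos])
gbest = pbest[np.argmin(pbest_f)]
for _ in range(iters):
    r1, r2 = rng.random(n), rng.random(n)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 2e-6, 50e-6)
    f = np.array([rmse(p) for p in pos])
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    gbest = pbest[np.argmin(pbest_f)]
print(gbest)  # close to d_true = 1e-5
```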
A General Framework for User-Guided Bayesian Optimization
Abstract
The optimization of expensive-to-evaluate black-box functions is prevalent in various scientific disciplines. Bayesian optimization is an automatic, general and sample-efficient method to solve these problems with minimal knowledge of the underlying function dynamics. However, the ability of Bayesian optimization to incorporate prior knowledge or beliefs about the function at hand in order to accelerate the optimization is limited, which reduces its appeal for knowledgeable practitioners with tight budgets. To allow domain experts to customize the optimization routine, we propose ColaBO, the first Bayesian-principled framework for incorporating prior beliefs beyond the typical kernel structure, such as the likely location of the optimizer or the optimal value. The generality of ColaBO makes it applicable across different Monte Carlo acquisition functions and types of user beliefs. We empirically demonstrate ColaBO's ability to substantially accelerate optimization when the prior information is accurate, and to retain approximately default performance when it is misleading.
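To illustrate how a user belief can reshape an acquisition function, the sketch below weights expected improvement by a Gaussian prior over the optimizer's location. This is a simplified stand-in in the spirit of prior-over-optimum approaches; ColaBO's Monte Carlo formulation is more general:

```python
import numpy as np
from math import erf

def expected_improvement(mu, sigma, best):
    # EI for minimization under a Gaussian posterior N(mu, sigma^2).
    z = (best - mu) / sigma
    cdf = 0.5 * (1 + np.vectorize(erf)(z / np.sqrt(2)))
    pdf = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return (best - mu) * cdf + sigma * pdf

x = np.linspace(0, 1, 101)
mu = np.sin(6 * x)             # stand-in posterior mean
sigma = 0.3 * np.ones_like(x)  # stand-in posterior std
best = mu.min() + 0.1

prior = np.exp(-0.5 * ((x - 0.8) / 0.1) ** 2)  # user believes x* is near 0.8
weighted = expected_improvement(mu, sigma, best) * prior
print(x[np.argmax(weighted)])  # next query is pulled toward the prior mode
```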
Learning in Deep Factor Graphs with Gaussian Belief Propagation
Authors: Seth Nabarro, Mark van der Wilk, Andrew J Davison
Abstract
We propose an approach to learning in Gaussian factor graphs. We treat all relevant quantities (inputs, outputs, parameters, latents) as random variables in a graphical model, and view both training and prediction as inference problems with different observed nodes. Our experiments show that these problems can be efficiently solved with belief propagation (BP), whose updates are inherently local, presenting exciting opportunities for distributed and asynchronous training. Our approach can be scaled to deep networks and provides a natural means of continual learning: the BP-estimated parameter marginals of the current task serve as parameter priors for the next. On a video denoising task we demonstrate the benefit of learnable parameters over a classical factor graph approach, and we show encouraging performance of deep factor graphs for continual image classification on MNIST.
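The inference view above can be made concrete on a toy chain: in information form, Gaussian BP messages on a tree reproduce the exact marginals. The sketch below uses two variables and one pairwise factor with illustrative numbers, not the paper's deep factor graphs:

```python
import numpy as np

# Priors: x1 ~ N(0, 1), x2 ~ N(0, 4); pairwise factor: x2 - x1 ~ N(1, 0.25).
L1, e1 = 1.0, 0.0   # prior on x1 in information form: Lambda = 1/var, eta = mean/var
L2, e2 = 0.25, 0.0  # prior on x2
Lp = 1 / 0.25       # pairwise precision
# Pairwise factor on (x1, x2): Lambda = Lp*[[1,-1],[-1,1]], eta = Lp*[-1, 1].
# Message from x1 to x2: marginalize x1 out of (prior_1 * pairwise) via Schur complement.
Laa, Lab, Lbb = Lp + L1, -Lp, Lp
ea, eb = -Lp * 1.0 + e1, Lp * 1.0
msg_L = Lbb - Lab * (1 / Laa) * Lab
msg_e = eb - Lab * (1 / Laa) * ea
post_L = L2 + msg_L  # belief at x2 = prior * incoming message
post_e = e2 + msg_e
mean2, var2 = post_e / post_L, 1 / post_L

# Ground truth from the joint precision matrix.
J = np.array([[L1 + Lp, -Lp], [-Lp, L2 + Lp]])
h = np.array([e1 - Lp * 1.0, e2 + Lp * 1.0])
cov = np.linalg.inv(J)
mu = cov @ h
print(mean2, var2)  # matches mu[1] and cov[1, 1]: BP is exact on a tree
```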
GVEL: Fast Graph Loading in Edgelist and Compressed Sparse Row (CSR) formats
Abstract
Efficient IO techniques are crucial in high-performance graph processing frameworks like Gunrock and Hornet, as fast graph loading is essential to minimize processing time and reduce system/cloud usage charges. This research presents approaches for efficiently reading an Edgelist from a text file and converting it to a Compressed Sparse Row (CSR) representation. On a server with dual 16-core Intel Xeon Gold 6226R processors and MegaRAID SAS-3 storage, our approach, which we term GVEL, outperforms Hornet, Gunrock, and PIGO by significant margins in CSR reading, exhibiting average speedups of 78x, 112x, and 1.8x, respectively. For Edgelist reading, GVEL is 2.6x faster than PIGO on average and achieves an Edgelist read rate of 1.9 billion edges/s. For every doubling of threads, GVEL improves performance at an average rate of 1.9x and 1.7x for reading Edgelist and CSR, respectively.
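The Edgelist-to-CSR conversion that GVEL accelerates can be sketched in a few lines. This illustrates the data layout only; GVEL itself is a multithreaded implementation:

```python
def edgelist_to_csr(edges, n):
    """edges: list of (u, v) pairs; n: number of vertices."""
    offsets = [0] * (n + 1)
    for u, _ in edges:            # pass 1: per-vertex degree counts
        offsets[u + 1] += 1
    for i in range(n):            # prefix sum -> row offsets
        offsets[i + 1] += offsets[i]
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()  # next free slot per vertex
    for u, v in edges:            # pass 2: scatter edge targets
        targets[cursor[u]] = v
        cursor[u] += 1
    return offsets, targets

offsets, targets = edgelist_to_csr([(0, 1), (0, 2), (2, 0), (1, 2)], 3)
print(offsets, targets)  # [0, 2, 3, 4] [1, 2, 2, 0]
```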
History Filtering in Imperfect Information Games: Algorithms and Complexity
Authors: Christopher Solinas, Douglas Rebstock, Nathan R. Sturtevant, Michael Buro
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI)
Abstract
Historically applied exclusively to perfect information games, depth-limited search with value functions has been key to recent advances in AI for imperfect information games. Most prominent approaches with strong theoretical guarantees require subgame decomposition - a process in which a subgame is computed from public information and player beliefs. However, subgame decomposition can itself require non-trivial computations, and its tractability depends on the existence of efficient algorithms for either full enumeration or generation of the histories that form the root of the subgame. Despite this, no formal analysis of the tractability of such computations has been established in prior work, and application domains have often consisted of games, such as poker, for which enumeration is trivial on modern hardware. Applying these ideas to more complex domains requires understanding their cost. In this work, we introduce and analyze the computational aspects and tractability of filtering histories for subgame decomposition. We show that constructing a single history from the root of the subgame is generally intractable, and then provide a necessary and sufficient condition for efficient enumeration. We also introduce a novel Markov Chain Monte Carlo-based generation algorithm for trick-taking card games - a domain where enumeration is often prohibitively expensive. Our experiments demonstrate its improved scalability in the trick-taking card game Oh Hell. These contributions clarify when and how depth-limited search via subgame decomposition can be an effective tool for sequential decision-making in imperfect information settings.
One Pass Streaming Algorithm for Super Long Token Attention Approximation in Sublinear Space
Authors: Raghav Addanki, Chenyang Li, Zhao Song, Chiwun Yang
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
Abstract
Deploying Large Language Models (LLMs) in streaming applications that involve long contexts, particularly for extended dialogues and text analysis, is of paramount importance but presents two significant challenges. Firstly, memory consumption is substantial during the decoding phase due to the caching of Key and Value (KV) states of previous tokens. Secondly, attention computation is time-consuming, with a time complexity of $O(n^2)$ for the generation of each token. At the recent OpenAI DevDay (Nov 6, 2023), OpenAI released a new model able to support 128K-token documents; in this paper, we focus on the memory-efficiency issue when the context length $n$ is much greater than 128K ($n \gg 2^d$). Considering a single-layer self-attention with Query, Key, and Value matrices $Q, K, V \in \mathbb{R}^{n \times d}$, the polynomial method approximates the attention output $T \in \mathbb{R}^{n \times d}$. It accomplishes this by constructing $U_1, U_2 \in \mathbb{R}^{n \times t}$ to expedite the computation of the attention ${\sf Attn}(Q, K, V)$ within $n^{1+o(1)}$ time. Despite this, storing the Key and Value matrices $K, V \in \mathbb{R}^{n \times d}$ still necessitates $O(nd)$ space, leading to significant memory usage. In response to these challenges, we introduce a new algorithm that reads the data in a single streaming pass. It employs sublinear space $o(n)$ to store three sketch matrices, alleviating the need for exact $K, V$ storage. Notably, our algorithm exhibits excellent memory efficiency with super-long token sequences: as the token length $n$ increases, our error guarantee diminishes while the memory usage remains nearly constant. This unique attribute underscores the potential of our technique in efficiently handling LLMs in streaming applications.
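The sublinear-space idea, maintaining small sketches instead of the full $K, V$ matrices while streaming tokens one at a time, can be illustrated as follows. This is a generic random-projection sketch, not the paper's algorithm or its error guarantee:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, t = 10_000, 16, 64  # stream length n >> sketch size t

SK = np.zeros((t, d))  # sketch of K
SV = np.zeros((t, d))  # sketch of V
for i in range(n):     # one pass over the token stream
    k_i = rng.standard_normal(d)  # stand-ins for the streamed K, V rows
    v_i = rng.standard_normal(d)
    # Per-row sketching vector; in practice it is regenerated from a
    # seed/hash, so it never needs to be stored.
    s_i = rng.standard_normal(t) / np.sqrt(t)
    SK += np.outer(s_i, k_i)  # SK = S @ K, built incrementally
    SV += np.outer(s_i, v_i)
# SK, SV occupy O(t*d) memory regardless of n; downstream attention
# estimates would be computed against these sketches instead of K, V.
print(SK.shape, SV.shape)  # (64, 16) (64, 16)
```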
Keyword: faster
BackboneLearn: A Library for Scaling Mixed-Integer Optimization-Based Machine Learning
Authors: Vassilis Digalakis Jr, Christos Ziakas
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Abstract
We present BackboneLearn: an open-source software package and framework for scaling mixed-integer optimization (MIO) problems with indicator variables to high-dimensional problems. This optimization paradigm can naturally be used to formulate fundamental problems in interpretable supervised learning (e.g., sparse regression and decision trees), in unsupervised learning (e.g., clustering), and beyond; BackboneLearn solves the aforementioned problems faster than exact methods and with higher accuracy than commonly used heuristics. The package is built in Python and is user-friendly and easily extensible: users can directly implement a backbone algorithm for their MIO problem at hand. The source code of BackboneLearn is available on GitHub (link: https://github.com/chziakas/backbone_learn).
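The backbone idea, screening a large problem down to a small core and then solving exactly on that core, can be sketched for sparse regression. This is a toy illustration of the strategy, not BackboneLearn's API or its MIO formulation:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, p, k = 200, 50, 3
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[[3, 17, 41]] = [2.0, -1.5, 1.0]
y = X @ beta + 0.1 * rng.standard_normal(n)

# Step 1: backbone = the 10 features with largest |correlation with y|.
scores = np.abs(X.T @ y)
backbone = np.argsort(scores)[-10:]

# Step 2: exact best subset of size k, but only within the backbone,
# so the combinatorial search stays tiny.
def rss(idx):
    coef, *_ = np.linalg.lstsq(X[:, idx], y, rcond=None)
    return np.sum((y - X[:, idx] @ coef) ** 2)

best = min(combinations(backbone, k), key=rss)
print(sorted(best))  # recovers the true support [3, 17, 41]
```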
3D-MIR: A Benchmark and Empirical Study on 3D Medical Image Retrieval in Radiology
Authors: Asma Ben Abacha, Alberto Santamaria-Pang, Ho Hin Lee, Jameson Merkow, Qin Cai, Surya Teja Devarakonda, Abdullah Islam, Julia Gong, Matthew P. Lungren, Thomas Lin, Noel C Codella, Ivan Tarapov
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
The increasing use of medical imaging in healthcare settings presents a significant challenge due to the growing workload for radiologists, yet it also offers an opportunity to enhance healthcare outcomes if effectively leveraged. 3D image retrieval holds potential to reduce radiologist workloads by enabling clinicians to efficiently search through diagnostically similar or otherwise relevant cases, resulting in faster and more precise diagnoses. However, the field of 3D medical image retrieval is still emerging, lacking established evaluation benchmarks, comprehensive datasets, and thorough studies. This paper attempts to bridge this gap by introducing a novel benchmark for 3D Medical Image Retrieval (3D-MIR) that encompasses four different anatomies imaged with computed tomography. Using this benchmark, we explore a diverse set of search strategies that use aggregated 2D slices, 3D volumes, and multi-modal embeddings from popular multi-modal foundation models as queries. Quantitative and qualitative assessments of each approach are provided alongside an in-depth discussion that offers insight for future research. To promote the advancement of this field, our benchmark, dataset, and code are made publicly available.
Some Like It Small: Czech Semantic Embedding Models for Industry Applications
Authors: Jiří Bednář, Jakub Náplava, Petra Barančíková, Ondřej Lisický
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Abstract
This article focuses on the development and evaluation of Small-sized Czech sentence embedding models. Small models are important components for real-time industry applications in resource-constrained environments. Given the limited availability of labeled Czech data, alternative approaches, including pre-training, knowledge distillation, and unsupervised contrastive fine-tuning, are investigated. Comprehensive intrinsic and extrinsic analyses are conducted, showcasing the competitive performance of our models compared to significantly larger counterparts, with approximately 8 times smaller size and 5 times faster speed than conventional Base-sized models. To promote cooperation and reproducibility, both the models and the evaluation pipeline are made publicly accessible. Ultimately, this article presents practical applications of the developed sentence embedding models in Seznam.cz, the Czech search engine. These models have effectively replaced previous counterparts, enhancing the overall search experience, for instance in organic search, featured snippets, and image search. This transition has yielded improved performance.
Shadow: A Novel Loss Function for Efficient Training in Siamese Networks
Authors: Alif Elham Khan, Mohammad Junayed Hasan, Humayra Anjum, Nabeel Mohammed
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Despite significant recent advances in similarity detection tasks, existing approaches pose substantial challenges under memory constraints. One of the primary reasons for this is the use of computationally expensive metric learning loss functions such as Triplet Loss in Siamese networks. In this paper, we present a novel loss function called Shadow Loss that compresses the dimensions of an embedding space during loss calculation without loss of performance. The distance between the projections of the embeddings is learned from inputs on a compact projection space, where distances directly correspond to a measure of class similarity. By projecting onto a lower-dimensional space, our loss function converges faster, and the resulting classified image clusters have higher inter-class and smaller intra-class distances. Shadow Loss not only reduces embedding dimensions, favoring memory-constrained devices, but also consistently outperforms the state-of-the-art Triplet Margin Loss by 5\%-10\% accuracy across diverse datasets. The proposed loss function is also model-agnostic, upholding its performance across several tested models. Its effectiveness and robustness across balanced, imbalanced, medical, and non-medical image datasets suggest that it is not specific to a particular model or dataset but consistently demonstrates superior performance while using less memory and computation.
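The core idea, computing a margin loss on low-dimensional projections of the embeddings, can be sketched as follows. The fixed random projection and loss form here are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.standard_normal((128, 8)) / np.sqrt(8)  # 128-d -> 8-d projection

def shadow_style_loss(anchor, positive, negative, margin=1.0):
    a, p, n = anchor @ P, positive @ P, negative @ P  # project first
    d_ap = np.linalg.norm(a - p)  # distances live in the small space,
    d_an = np.linalg.norm(a - n)  # so the loss is cheap to evaluate
    return max(0.0, d_ap - d_an + margin)

anchor = rng.standard_normal(128)
loss_easy = shadow_style_loss(anchor, anchor, -anchor)  # negative is far away
loss_hard = shadow_style_loss(anchor, -anchor, anchor)  # margin is violated
print(loss_easy, loss_hard)  # 0.0 for the easy triplet, > 0 for the hard one
```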
DPSUR: Accelerating Differentially Private Stochastic Gradient Descent Using Selective Update and Release
Authors: Jie Fu, Qingqing Ye, Haibo Hu, Zhili Chen, Lulu Wang, Kuncan Wang, Ran Xun
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Abstract
Machine learning models are known to memorize private data to reduce their training loss, which can be inadvertently exploited by privacy attacks such as model inversion and membership inference. To protect against these attacks, differential privacy (DP) has become the de facto standard for privacy-preserving machine learning, particularly for popular training algorithms based on stochastic gradient descent, such as DPSGD. Nonetheless, DPSGD still suffers from severe utility loss due to its slow convergence. This is caused partially by the random sampling, which brings bias and variance to the gradient, and partially by the Gaussian noise, which leads to fluctuating gradient updates. Our key idea to address these issues is to apply selective updates to the model training, discarding those updates that are useless or even harmful. Motivated by this, this paper proposes DPSUR, a Differentially Private training framework based on Selective Updates and Release, where the gradient from each iteration is evaluated via a validation test, and only those updates leading to convergence are applied to the model. As such, DPSUR keeps the training moving in the right direction and thus achieves faster convergence than DPSGD. The main challenges lie in two aspects -- privacy concerns arising from gradient evaluation, and the gradient selection strategy for model updates. To address these challenges, DPSUR introduces a clipping strategy for update randomization and a threshold mechanism for gradient selection. Experiments conducted on MNIST, FMNIST, CIFAR-10, and IMDB datasets show that DPSUR significantly outperforms previous works in terms of convergence speed and model utility.
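The selective-update loop can be sketched on a toy regression problem. This is an illustration only: in DPSUR the validation test, clipping, and threshold are themselves designed to satisfy DP, which this sketch omits:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((256, 5))
w_true = rng.standard_normal(5)
y = X @ w_true + 0.1 * rng.standard_normal(256)
Xv, yv = X[:64], y[:64]  # held-out validation split (toy)

def val_loss(w):
    return np.mean((Xv @ w - yv) ** 2)

w = np.zeros(5)
clip, sigma, lr = 1.0, 0.5, 0.1
for _ in range(200):
    idx = rng.choice(256, 32, replace=False)
    g = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / 32
    g = g / max(1.0, np.linalg.norm(g) / clip)          # clip the gradient
    g = g + sigma * clip * rng.standard_normal(5) / 32  # add DP-style noise
    cand = w - lr * g
    if val_loss(cand) < val_loss(w):  # selective update: keep only steps
        w = cand                      # that move toward convergence
print(val_loss(w) < val_loss(np.zeros(5)))  # True
```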
You Only Explain Once
Authors: David A. Kelly, Hana Chockler, Daniel Kroening, Nathan Blake, Aditi Ramaswamy, Melane Navaratnarajah, Aaditya Shivakumar
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this paper, we propose a new black-box explainability algorithm and tool, YO-ReX, for efficient explanation of the outputs of object detectors. The new algorithm computes explanations for all objects detected in an image simultaneously. Hence, compared to the baseline, it reduces the number of queries by a factor of 10X in the case of ten detected objects, and the speedup increases further with the number of objects. Our experimental results demonstrate that YO-ReX can explain the outputs of YOLO with negligible overhead over the running time of YOLO. We also demonstrate similar results for explaining SSD and Faster R-CNN. The speedup is achieved by combining aggressive pruning with a causal analysis to avoid backtracking.
Risk Bounds of Accelerated SGD for Overparameterized Linear Regression
Abstract
Accelerated stochastic gradient descent (ASGD) is a workhorse in deep learning and often achieves better generalization performance than SGD. However, existing optimization theory can only explain the faster convergence of ASGD, but cannot explain its better generalization. In this paper, we study the generalization of ASGD for overparameterized linear regression, which is possibly the simplest setting of learning with overparameterization. We establish an instance-dependent excess risk bound for ASGD within each eigen-subspace of the data covariance matrix. Our analysis shows that (i) ASGD outperforms SGD in the subspace of small eigenvalues, exhibiting a faster rate of exponential decay for bias error, while in the subspace of large eigenvalues, its bias error decays slower than SGD; and (ii) the variance error of ASGD is always larger than that of SGD. Our result suggests that ASGD can outperform SGD when the difference between the initialization and the true weight vector is mostly confined to the subspace of small eigenvalues. Additionally, when our analysis is specialized to linear regression in the strongly convex setting, it yields a tighter bound for bias error than the best-known result.
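For reference, a common two-sequence (Nesterov-style) parameterization of the ASGD iteration studied in this line of work is shown below; the paper's exact parameterization may differ:

```latex
\begin{aligned}
w_{t+1} &= v_t - \eta\, g_t(v_t), \\
v_{t+1} &= w_{t+1} + \beta\,(w_{t+1} - w_t),
\end{aligned}
```

where $g_t(v_t)$ is a stochastic gradient evaluated at the extrapolated point $v_t$, $\eta$ is the step size, and $\beta$ is the momentum parameter; setting $\beta = 0$ recovers plain SGD.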
Efficient Open-world Reinforcement Learning via Knowledge Distillation and Autonomous Rule Discovery
Abstract
Deep reinforcement learning suffers from catastrophic forgetting and sample inefficiency, making it less applicable to the ever-changing real world. However, the ability to use previously learned knowledge is essential for AI agents to quickly adapt to novelties. Often, certain spatial information observed by the agent in previous interactions can be leveraged to infer task-specific rules. Inferred rules can then help the agent avoid potentially dangerous situations in previously unseen states and guide the learning process, increasing the agent's novelty-adaptation speed. In this work, we propose a general framework that is applicable to deep reinforcement learning agents. Our framework provides the agent with an autonomous way to discover task-specific rules in novel environments and to self-supervise its learning. We provide a rule-driven deep Q-learning agent (RDQ) as one possible implementation of this framework. We show that RDQ successfully extracts task-specific rules as it interacts with the world and uses them to drastically increase its learning efficiency. In our experiments, we show that the RDQ agent is significantly more resilient to novelties than the baseline agents and is able to detect and adapt to novel situations faster.
Four-set Hypergraphlets for Characterization of Directed Hypergraphs
Authors: Heechan Moon, Hyunju Kim, Sunwoo Kim, Kijung Shin
Abstract
A directed hypergraph, which consists of nodes and hyperarcs, is a higher-order data structure that naturally models directional group interactions (e.g., chemical reactions of molecules). Although there have been extensive studies on local structures of (directed) graphs in the real world, those of directed hypergraphs remain unexplored. In this work, we focus on measurements, findings, and applications related to local structures of directed hypergraphs, which together contribute to a systematic understanding of various real-world systems interconnected by directed group interactions. Our first contribution is to define 91 directed hypergraphlets (DHGs), which disjointly categorize directed connections and overlaps among the four node sets that compose two incident hyperarcs. Our second contribution is to develop exact and approximate algorithms for counting the occurrences of each DHG. Our last contribution is to characterize 11 real-world directed hypergraphs and the individual hyperarcs in them using the occurrences of DHGs, which reveals clear domain-based local structural patterns. Our experiments demonstrate that our DHG-based characterization gives up to 12% and 33% better performance on hypergraph clustering and hyperarc prediction, respectively, than baseline characterization methods. Moreover, we show that CODA-A, our proposed approximate algorithm, is up to 32X faster than its competitors with similar characterization quality.
ToddlerDiffusion: Flash Interpretable Controllable Diffusion Model
Authors: Eslam Mohamed Bakr, Liangbing Zhao, Vincent Tao Hu, Matthieu Cord, Patrick Perez, Mohamed Elhoseiny
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Diffusion-based generative models excel in perceptually impressive synthesis but face challenges in interpretability. This paper introduces ToddlerDiffusion, an interpretable 2D diffusion image-synthesis framework inspired by the human generation system. Unlike traditional diffusion models with opaque denoising steps, our approach decomposes the generation process into simpler, interpretable stages: generating contours, a palette, and a detailed colored image. This not only enhances overall performance but also enables robust editing and interaction capabilities. Each stage is meticulously formulated for efficiency and accuracy, surpassing Stable-Diffusion (LDM). Extensive experiments on datasets like LSUN-Churches and COCO validate our approach, consistently outperforming existing methods. ToddlerDiffusion achieves notable efficiency, matching LDM performance on LSUN-Churches while operating three times faster with a 3.76 times smaller architecture. Our source code is provided in the supplementary material and will be publicly accessible.
Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models
Authors: Yufei Zhan, Yousong Zhu, Zhiyang Chen, Fan Yang, Ming Tang, Jinqiao Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Replicating the innate human ability to detect all objects based on free-form texts at any granularity remains a formidable challenge for Vision-Language models. Current Large Vision Language Models (LVLMs) are predominantly constrained to grounding a single, pre-existing object, relying solely on data from Referring Expression Comprehension tasks. This limitation leads to a compromise in model design, necessitating the introduction of visual expert models or the integration of customized head structures. Beyond these constraints, our research delves into the untapped potential of LVLMs and uncovers their inherent capability for basic object perception, allowing them to accurately identify and locate objects of interest. Building on this insight, we introduce a novel language-prompted localization dataset designed to fully unleash the capabilities of LVLMs in integrating fine-grained object perception with precise location awareness. More importantly, we present $\textbf{Griffon}$, a purely LVLM-based baseline, which does not require the introduction of any special tokens, expert models, or additional detection modules. It simply maintains a consistent structure with popular LVLMs by unifying data formats across various localization-related scenarios and is trained end-to-end through a well-designed pipeline. Comprehensive experiments demonstrate that $\textbf{Griffon}$ not only achieves state-of-the-art performance on the fine-grained RefCOCO series but also approaches the capabilities of the expert model Faster RCNN on the detection benchmark MSCOCO.
A Metalearned Neural Circuit for Nonparametric Bayesian Inference
Authors: Jake C. Snell, Gianluca Bencomo, Thomas L. Griffiths
Abstract
Most applications of machine learning to classification assume a closed set of balanced classes. This is at odds with the real world, where class occurrence statistics often follow a long-tailed power-law distribution and it is unlikely that all classes are seen in a single sample. Nonparametric Bayesian models naturally capture this phenomenon, but have significant practical barriers to widespread adoption, namely implementation complexity and computational inefficiency. To address this, we present a method for extracting the inductive bias from a nonparametric Bayesian model and transferring it to an artificial neural network. By simulating data with a nonparametric Bayesian prior, we can metalearn a sequence model that performs inference over an unlimited set of classes. After training, this "neural circuit" has distilled the corresponding inductive bias and can successfully perform sequential inference over an open set of classes. Our experimental results show that the metalearned neural circuit achieves comparable or better performance than particle filter-based methods for inference in these models while being faster and simpler to use than methods that explicitly incorporate Bayesian nonparametric inference.
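The long-tailed, open-ended class statistics described above are exactly what a Chinese restaurant process prior produces, and simulating it is a simple way to generate training sequences for such a metalearned model. The sampler below is a generic CRP, illustrative of the setting rather than the paper's pipeline:

```python
import numpy as np

def sample_crp(n, alpha, rng):
    """Sample a partition of n items; alpha controls the new-class rate."""
    counts = []  # size of each class seen so far
    labels = []
    for i in range(n):
        # Join class j with prob counts[j]/(i+alpha); new class with prob alpha/(i+alpha).
        probs = np.array(counts + [alpha], dtype=float)
        probs /= i + alpha
        z = rng.choice(len(probs), p=probs)
        if z == len(counts):
            counts.append(1)  # open a new class
        else:
            counts[z] += 1    # join an existing class
        labels.append(z)
    return labels, counts

rng = np.random.default_rng(0)
labels, counts = sample_crp(500, alpha=2.0, rng=rng)
print(len(counts), sum(counts))  # class sizes follow a long-tailed distribution; sizes sum to 500
```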
GVEL: Fast Graph Loading in Edgelist and Compressed Sparse Row (CSR) formats
Abstract
Efficient IO techniques are crucial in high-performance graph processing frameworks like Gunrock and Hornet, as fast graph loading is essential to minimize processing time and reduce system/cloud usage charges. This research presents approaches for efficiently reading an Edgelist from a text file and converting it to a Compressed Sparse Row (CSR) representation. On a server with dual 16-core Intel Xeon Gold 6226R processors and MegaRAID SAS-3 storage, our approach, which we term GVEL, outperforms Hornet, Gunrock, and PIGO by significant margins in CSR reading, exhibiting average speedups of 78x, 112x, and 1.8x, respectively. For Edgelist reading, GVEL is 2.6x faster than PIGO on average and achieves an Edgelist read rate of 1.9 billion edges/s. For every doubling of threads, GVEL improves performance at an average rate of 1.9x and 1.7x for reading Edgelist and CSR, respectively.
Keyword: mobile
Differences of communication activity and mobility patterns between urban and rural people
Authors: Fumiko Ogushi, Chandreyee Roy, Kimmo Kaski
Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY); Physics and Society (physics.soc-ph)
Abstract
Human mobility and other social activity patterns influence various aspects of society such as urban planning, traffic prediction, crisis resilience, and epidemic prevention. The behaviour of individuals, such as their communication frequencies and movements, is shaped by societal and socio-economic factors. In addition, differences in people's geolocation as well as their gender and age affect their activity patterns. In this study we investigate these patterns using mobile phone data, specifically call detail records (CDRs), to analyze the social communication and mobility patterns of people. This dataset provides insight into individual- and population-level behaviours in rural and urban environments on a daily, weekly, and seasonal basis. The results of our analyses show that in urban areas people have high calling activity but low mobility, while in rural areas they show the opposite behaviour, i.e. low calling activity combined with high mobility. Overall, there is a decreasing trend in people's mobility through the year even though their calling activity remains consistent, except for the holidays, during which the communication frequency drops markedly. We also observe significant differences in mobility between workdays and free days. Finally, the age and gender of individuals are also observed to play a role in the seasonal patterns, differently in urban and rural areas.
Data-Driven Robot Fault Detection and Diagnosis Using Generative Models: A Modified SFDD Algorithm
Abstract
This paper presents a modification of the data-driven sensor-based fault detection and diagnosis (SFDD) algorithm for online robot monitoring. Our version of the algorithm uses a collection of generative models, in particular restricted Boltzmann machines, each of which represents the distribution of sliding-window correlations between a pair of correlated measurements. We use such models in a residual generation scheme, where high residuals generate conflict sets that are then used in a subsequent diagnosis step. As a proof of concept, the framework is evaluated on a mobile logistics robot for the problem of recognising disconnected wheels. The evaluation demonstrates the feasibility of the framework (on the faulty data set, the models obtained 88.6% precision and 75.6% recall), but also shows that the monitoring results are influenced by the choice of distribution model and the model parameters as a whole.
AdapterFL: Adaptive Heterogeneous Federated Learning for Resource-constrained Mobile Computing Systems
Authors: Ruixuan Liu, Ming Hu, Zeke Xia, Jun Xia, Pengyu Zhang, Yihao Huang, Yang Liu, Mingsong Chen
Abstract
Federated Learning (FL) enables collaborative learning across large numbers of distributed clients without data sharing. However, due to the disparity of computing resources among massive numbers of mobile computing devices, the performance of traditional homogeneous-model-based Federated Learning (FL) is seriously limited. On the one hand, to achieve model training across all the diverse clients, mobile computing systems can only use small, low-performance models for collaborative learning. On the other hand, devices with high computing resources cannot train a high-performance large model with their insufficient raw data. To address the resource-constrained problem in mobile computing systems, we present a novel heterogeneous FL approach named AdapterFL, which uses a model-reassembly strategy to adaptively facilitate the collaborative training of massive heterogeneous mobile devices. Specifically, we select multiple candidate heterogeneous models based on the computing performance of massive mobile devices and then divide each heterogeneous model into two partitions. By reassembling the partitions, we can generate models of varied sizes that combine the partial parameters of the large model with the partial parameters of the small model. Using these reassembled models for FL training, we can train the partial parameters of the large model using low-performance devices. In this way, we can alleviate performance degradation in large models due to resource constraints. The experimental results show that AdapterFL can achieve up to 12\% accuracy improvement compared to state-of-the-art heterogeneous federated learning methods in resource-constrained scenarios.
Multi-Agent Motion Planning with Bézier Curve Optimization under Kinodynamic Constraints
Authors: Jingtian Yan, Jiaoyang Li
Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA)
Abstract
Multi-Agent Motion Planning (MAMP) is a problem that seeks collision-free dynamically-feasible trajectories for multiple moving agents in a known environment while minimizing their travel time. MAMP is closely related to the well-studied Multi-Agent Path-Finding (MAPF) problem. Recently, MAPF methods have achieved great success in finding collision-free paths for a substantial number of agents. However, those methods often overlook the kinodynamic constraints of the agents, assuming instantaneous movement, which limits their practicality and realism. In this paper, we present a three-level MAPF-based planner called PSB to address the challenges posed by MAMP. PSB fully considers the kinodynamic capability of the agents and produces solutions with smooth speed profiles that can be directly executed by the controller. Empirically, we evaluate PSB within the domains of traffic intersection coordination for autonomous vehicles and obstacle-rich grid map navigation for mobile robots. PSB shows up to 49.79% improvements in solution cost compared to existing methods.
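Trajectories parameterized by Bezier control points, as in PSB, are evaluated by repeated linear interpolation (de Casteljau's algorithm). A minimal sketch:

```python
def bezier(points, t):
    """Evaluate a Bezier curve with the given control points at t in [0, 1]."""
    pts = [tuple(p) for p in points]
    while len(pts) > 1:  # repeatedly interpolate adjacent points
        pts = [tuple((1 - t) * a + t * b for a, b in zip(p, q))
               for p, q in zip(pts, pts[1:])]
    return pts[0]

# Quadratic example: the curve starts/ends at the first/last control points.
ctrl = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)]
print(bezier(ctrl, 0.0), bezier(ctrl, 1.0), bezier(ctrl, 0.5))
# (0.0, 0.0) (2.0, 0.0) (1.0, 1.0)
```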
CRISP: Hybrid Structured Sparsity for Class-aware Model Pruning
Abstract
Machine learning pipelines for classification tasks often train a universal model to achieve accuracy across a broad range of classes. However, a typical user encounters only a limited selection of classes regularly. This disparity provides an opportunity to enhance computational efficiency by tailoring models to focus on user-specific classes. Existing works rely on unstructured pruning, which introduces randomly distributed non-zero values in the model, making it unsuitable for hardware acceleration. Alternatively, some approaches employ structured pruning, such as channel pruning, but these tend to provide only minimal compression and may lead to reduced model accuracy. In this work, we propose CRISP, a novel pruning framework leveraging a hybrid structured sparsity pattern that combines both fine-grained N:M structured sparsity and coarse-grained block sparsity. Our pruning strategy is guided by a gradient-based class-aware saliency score, allowing us to retain weights crucial for user-specific classes. CRISP achieves high accuracy with minimal memory consumption for popular models like ResNet-50, VGG-16, and MobileNetV2 on ImageNet and CIFAR-100 datasets. Moreover, CRISP delivers up to 14$\times$ reduction in latency and energy consumption compared to existing pruning methods while maintaining comparable accuracy. Our code is available at https://github.com/shivmgg/CRISP/.
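As a rough illustration of the hybrid sparsity pattern, the sketch below builds a fine-grained 2:4 mask and a coarse-grained column-block mask with NumPy. The helper names and block geometry are our own, and CRISP's gradient-based class-aware saliency selection is not reproduced here:

```python
import numpy as np

def n_m_mask(w, n=2, m=4):
    """Fine-grained N:M sparsity: keep the n largest-magnitude weights
    in every contiguous group of m weights."""
    flat = w.reshape(-1, m)
    idx = np.argsort(-np.abs(flat), axis=1)[:, :n]
    mask = np.zeros_like(flat)
    np.put_along_axis(mask, idx, 1.0, axis=1)
    return mask.reshape(w.shape)

def block_mask(w, block=4, keep=1):
    """Coarse-grained block sparsity: keep the `keep` highest-norm
    column blocks of width `block`."""
    norms = [np.linalg.norm(w[:, i:i + block])
             for i in range(0, w.shape[1], block)]
    keep_ids = np.argsort(norms)[-keep:]
    mask = np.zeros_like(w)
    for i in keep_ids:
        mask[:, i * block:(i + 1) * block] = 1.0
    return mask

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8))
# Hybrid pattern: at most 2 nonzeros per group of 4, restricted to one block.
pruned = w * block_mask(w, block=4, keep=1) * n_m_mask(w, 2, 4)
print(int((pruned != 0).sum()))
```

The combined mask is hardware-friendly because both of its components are structured, unlike the randomly scattered nonzeros of unstructured pruning.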
Racing With ROS 2: A Navigation System for an Autonomous Formula Student Race Car
Authors: Alastair Bradford, Grant van Breda, Tobias Fischer
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Abstract
The advent of autonomous vehicle technologies has significantly impacted various sectors, including motorsport, where Formula Student and Formula: Society of Automotive Engineers have introduced autonomous racing classes. These offer new challenges to aspiring engineers, including the team at QUT Motorsport, but also raise the entry barrier due to the complexity of high-speed navigation and control. This paper presents an open-source solution using the Robot Operating System 2, specifically its open-source navigation stack, to address these challenges in autonomous Formula Student race cars. We compare the off-the-shelf navigation libraries that this stack comprises against traditional custom-made programs developed by QUT Motorsport to evaluate their applicability in autonomous racing scenarios, and we integrate them into an autonomous race car. Our contributions include quantitative and qualitative comparisons of these packages against traditional navigation solutions, aiming to lower the entry barrier for autonomous racing. This paper also serves as a comprehensive tutorial for teams participating in similar racing disciplines and in other autonomous mobile robot applications.
Distance-Only Task Orchestration Algorithm for Energy Efficiency in Satellite-Based Mist Computing
Abstract
This paper addresses the challenge of efficiently offloading heavy computing tasks from ground mobile devices to the satellite-based mist computing environment. With ground-based edge and cloud servers often being inaccessible, exploiting satellite mist computing becomes imperative. Existing offloading algorithms have shown limitations in adapting to the unique characteristics of heavy computing tasks. We therefore propose a heavy computing task offloading algorithm that prioritizes satellite proximity. This approach not only reduces energy consumption during telecommunication but also ensures that tasks, which are typically non-time-critical, are executed within their specified timing constraints. Our proposed algorithm outperforms other offloading schemes in terms of satellites' energy consumption, average end-to-end delay, and task success rates. Although it exhibits a higher average VM CPU usage, this increase does not pose critical challenges. This distance-based approach offers a promising solution for enhancing energy efficiency in satellite-based mist computing, making it well suited to the demands of heavy computing tasks.
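A distance-only selection policy of the kind described can be sketched in a few lines; the data layout and field names below are illustrative assumptions:

```python
import math

def offload_target(device, satellites):
    """Pick the satellite closest to the device (distance-only policy),
    minimizing transmission energy for non-time-critical heavy tasks."""
    return min(satellites, key=lambda s: math.dist(device, s["pos"]))["id"]

# Toy positions in km; altitude dominates the distance for LEO satellites.
satellites = [
    {"id": "sat-A", "pos": (0.0, 0.0, 550.0)},
    {"id": "sat-B", "pos": (100.0, 0.0, 550.0)},
    {"id": "sat-C", "pos": (40.0, 10.0, 560.0)},
]
print(offload_target((35.0, 5.0, 0.0), satellites))
```

Because the policy ignores load and deadlines, it stays cheap to evaluate, at the cost of the higher average VM CPU usage the abstract reports.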
Prototype of deployment of Federated Learning with IoT devices
Authors: Pablo García Santaclara, Ana Fernández Vilas, Rebeca P. Díaz Redondo
Abstract
In the age of technology, data is an increasingly important resource. This importance is growing in the field of Artificial Intelligence (AI), where subfields such as Machine Learning (ML) need more and more data to achieve better results. The Internet of Things (IoT) is the connection of sensors and smart objects to collect and exchange data, in addition to achieving many other tasks. A huge amount of this desired resource, data, is stored on mobile devices, sensors, and other IoT devices, but remains there due to data protection restrictions. At the same time, these devices do not have enough data or computational capacity to train good models. Moreover, transmitting, storing, and processing all this data on a centralised server is problematic. Federated Learning (FL) provides an innovative solution that allows devices to learn in a collaborative way. More importantly, it accomplishes this without violating data protection laws. FL is currently growing, and there are several solutions that implement it. This article presents a prototype of an FL solution in which the IoT devices used were Raspberry Pi boards. The results compare the performance of a solution of this type with those obtained with traditional approaches. In addition, the FL solution's performance was tested in a hostile environment. A convolutional neural network (CNN) and an image dataset were used. The results show the feasibility and usability of these techniques, although in many cases they do not reach the performance of traditional approaches.
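The article does not spell out its aggregation step here, but a typical FL round on such devices uses federated averaging; a minimal sketch, assuming each client reports its model weights and local dataset size:

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """FedAvg: average client model weights, weighted by local dataset size,
    so no raw data ever leaves a device."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three IoT clients sharing one 1-D "model" shape but holding different
# amounts of local data.
clients = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
sizes = [10, 10, 20]
print(fed_avg(clients, sizes))
```

The server broadcasts the averaged weights back, and the round repeats; only model parameters, never images, cross the network.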
Receding Horizon Optimization with PPUM: An Approach for Autonomous Robot Path Planning in Uncertain Environments
Authors: Zijian Ge, Jingjing Jiang, Matthew Coombes, Liang Sun
Abstract
The ability to understand spatial-temporal patterns of crowds is crucial for achieving long-term autonomy of mobile robots deployed in human environments. However, traditional historical-data-driven memory models are inadequate for handling anomalies, resulting in poor reasoning by the robot when estimating the crowd's spatial distribution. In this article, a Receding Horizon Optimization (RHO) formulation is proposed that incorporates a Probability-related Partially Updated Memory (PPUM) for robot path planning in crowded environments with uncertainties. The PPUM acts as a memory layer that combines real-time sensor observations with historical knowledge using weighted evidence fusion to improve the robot's adaptivity to dynamic environments. RHO then utilizes the PPUM as informed knowledge to generate a path that minimizes the likelihood of encountering dense crowds while reducing the cost of local motion planning. The proposed approach provides an innovative solution to the problem of a robot's long-term safe interaction with humans in uncertain crowded environments. In simulation, the results demonstrate the superior performance of our approach compared to benchmark methods in terms of crowd distribution estimation accuracy, adaptability to anomalies, and path planning efficiency.
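A toy version of a partially updated memory can be sketched as follows; the update rule, weight, and variable names are our hypothetical reading of PPUM, not the paper's exact fusion formula:

```python
import numpy as np

def ppum_update(memory, observation, weight):
    """Hypothetical weighted-evidence update: blend real-time observations
    into historical crowd-density memory only in cells the robot can
    currently sense (the 'partially updated' aspect)."""
    observed = ~np.isnan(observation)
    fused = memory.copy()
    fused[observed] = (weight * observation[observed]
                       + (1.0 - weight) * memory[observed])
    return fused

memory = np.array([0.8, 0.2, 0.5])      # historical crowd densities per cell
obs = np.array([0.1, np.nan, np.nan])   # anomaly observed in cell 0 only
print(ppum_update(memory, obs, weight=0.75))
```

Cells outside the sensor footprint keep their historical values, which is what lets the memory react to anomalies without discarding long-term knowledge.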
Fault Detection in Telecom Networks using Bi-level Federated Graph Neural Networks
Authors: R. Bourgerie, T. Zanouda
Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Abstract
5G and beyond networks are becoming increasingly complex and heterogeneous, with diversified and demanding requirements from a wide variety of emerging applications. The complexity and diversity of telecom networks place an increasing strain on maintenance and operation efforts. Moreover, strict security and privacy requirements present a challenge for mobile operators seeking to leverage network data. To detect network faults and mitigate future failures, prior work has focused on leveraging traditional ML/DL methods to locate anomalies in networks. These approaches, although powerful, do not consider the intertwined nature of embedded and software-intensive Radio Access Network (RAN) systems. In this paper, we propose a bi-level federated Graph Neural Network anomaly detection and diagnosis model that is able to detect anomalies in telecom networks in a privacy-preserving manner while minimizing communication costs. Our method revolves around conceptualizing telecom data as a bi-level temporal graph. The first graph captures the interactions between RAN nodes exposed to different deployment scenarios in the network, while each individual RAN node is further elaborated into its software (SW) execution graph. Additionally, we use Federated Learning to address privacy and security limitations. Furthermore, we study the performance of the anomaly detection model under three settings: (1) centralized, (2) Federated Learning, and (3) Personalized Federated Learning, using real-world data from an operational network. Our comprehensive experiments show that the Personalized Federated Temporal Graph Neural Network method outperforms the most commonly used techniques for anomaly detection.
An Industrial Perspective on Multi-Agent Decision Making for Interoperable Robot Navigation following the VDA5050 Standard
Authors: Niels van Duijkeren, Luigi Palmieri, Ralph Lange, Alexander Kleiner
Abstract
This paper provides a perspective on the literature and current challenges in Multi-Agent Systems for interoperable robot navigation in industry. The focus is on the multi-agent decision stack for Autonomous Mobile Robots operating in mixed environments with humans, manually driven vehicles, and legacy Automated Guided Vehicles. We describe typical characteristics of such Multi-Agent Systems observed today and how these are expected to change in the short term due to the new standard VDA5050 and the interoperability framework OpenRMF. We present recent changes in fleet management standards and the role of open middleware frameworks like ROS2 in reaching industrial-grade quality. Approaches to increase the robustness and performance of multi-robot navigation systems for transportation are discussed, and research opportunities are derived.
Keyword: pruning
Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning
Abstract
Dataset pruning aims to construct a coreset capable of achieving performance comparable to that of the original, full dataset. Most existing dataset pruning methods rely on snapshot-based criteria to identify representative samples, often resulting in poor generalization across various pruning ratios and cross-architecture scenarios. Recent studies have addressed this issue by expanding the scope of training dynamics considered, including factors such as forgetting events and probability changes, typically using an averaging approach. However, these works struggle to integrate a broader range of training dynamics without overlooking well-generalized samples, which may not be sufficiently highlighted by averaging. In this study, we propose a novel dataset pruning method termed Temporal Dual-Depth Scoring (TDDS) to tackle this problem. TDDS utilizes a dual-depth strategy to balance the incorporation of extensive training dynamics with the identification of representative samples for dataset pruning. In the first depth, we estimate the series of each sample's individual contributions spanning the training progress, ensuring comprehensive integration of training dynamics. In the second depth, we focus on the variability of the sample-wise contributions identified in the first depth to highlight well-generalized samples. Extensive experiments conducted on the CIFAR and ImageNet datasets verify the superiority of TDDS over previous SOTA methods. Specifically, on CIFAR-100, our method achieves 54.51% accuracy with only 10% of the training data, surpassing random selection by 7.83% and other comparison methods by at least 12.69%.
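The two depths can be caricatured in a few lines of NumPy; this is a simplified stand-in for TDDS, and the exact contribution measure and windowing in the paper differ:

```python
import numpy as np

def tdds_scores(contributions, window=3):
    """contributions: (num_samples, num_epochs) per-sample training dynamics.
    Depth 1: smooth each sample's trajectory over a sliding window,
    integrating dynamics across the whole training progress.
    Depth 2: score each sample by the variability of that trajectory."""
    kernel = np.ones(window) / window
    smoothed = np.apply_along_axis(
        lambda t: np.convolve(t, kernel, "valid"), 1, contributions)
    return smoothed.std(axis=1)

rng = np.random.default_rng(0)
stable = np.full((1, 10), 0.5)               # flat dynamics -> low score
varied = rng.normal(0.5, 0.3, size=(1, 10))  # fluctuating -> high score
scores = tdds_scores(np.vstack([stable, varied]))
keep = np.argsort(-scores)[:1]               # retain highest-scoring samples
print(keep)
```

Samples whose contribution never varies carry little extra signal, so a variability-based score tends to favour the informative ones when constructing the coreset.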
Robustness-Reinforced Knowledge Distillation with Correlation Distance and Network Pruning
Authors: Seonghak Kim, Gyeongdo Ham, Yucheol Cho, Daeshik Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The improvement in the performance of efficient and lightweight models (i.e., the student model) is achieved through knowledge distillation (KD), which involves transferring knowledge from more complex models (i.e., the teacher model). However, most existing KD techniques rely on Kullback-Leibler (KL) divergence, which has certain limitations. First, if the teacher distribution has high entropy, the KL divergence's mode-averaging nature hinders the transfer of sufficient target information. Second, when the teacher distribution has low entropy, the KL divergence tends to excessively focus on specific modes, which fails to convey an abundant amount of valuable knowledge to the student. Consequently, when dealing with datasets that contain numerous confounding or challenging samples, student models may struggle to acquire sufficient knowledge, resulting in subpar performance. Furthermore, in previous KD approaches, we observed that data augmentation, a technique aimed at enhancing a model's generalization, can have an adverse impact. Therefore, we propose a Robustness-Reinforced Knowledge Distillation (R2KD) that leverages correlation distance and network pruning. This approach enables KD to effectively incorporate data augmentation for performance improvement. Extensive experiments on various datasets, including CIFAR-100, FGVR, TinyImagenet, and ImageNet, demonstrate our method's superiority over current state-of-the-art methods.
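To see why a correlation-based distance is insensitive to the entropy of the teacher distribution, consider this small sketch; it is our own illustration, and R2KD's full loss is more involved:

```python
import numpy as np

def correlation_distance(p, q):
    """1 - Pearson correlation between teacher and student output vectors.
    Unlike KL divergence, it depends only on the relative ordering and
    shape of the outputs, not on how sharp (low-entropy) either side is."""
    pc, qc = p - p.mean(), q - q.mean()
    return 1.0 - (pc @ qc) / (np.linalg.norm(pc) * np.linalg.norm(qc))

teacher = np.array([0.70, 0.20, 0.10])
aligned = np.array([0.60, 0.25, 0.15])   # same ranking as the teacher
shuffled = np.array([0.10, 0.20, 0.70])  # ranking reversed
print(correlation_distance(teacher, aligned) <
      correlation_distance(teacher, shuffled))
```

A student that preserves the teacher's ranking gets a small distance even when the two distributions differ in sharpness, which is exactly where KL's mode-averaging or mode-seeking behaviour causes trouble.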
You Only Explain Once
Authors: David A. Kelly, Hana Chockler, Daniel Kroening, Nathan Blake, Aditi Ramaswamy, Melane Navaratnarajah, Aaditya Shivakumar
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this paper, we propose a new black-box explainability algorithm and tool, YO-ReX, for efficient explanation of the outputs of object detectors. The new algorithm computes explanations for all objects detected in the image simultaneously. Hence, compared to the baseline, it reduces the number of queries by a factor of 10X in the case of ten detected objects, and the speedup increases further with the number of objects. Our experimental results demonstrate that YO-ReX can explain the outputs of YOLO with negligible overhead over the running time of YOLO. We also demonstrate similar results for explaining SSD and Faster R-CNN. The speedup is achieved by combining aggressive pruning with a causal analysis that avoids backtracking.
Analysing the Impact of Removing Infrequent Words on Topic Quality in LDA Models
Authors: Victor Bystrov, Viktoriia Naboka-Krell, Anna Staszewska-Bystrova, Peter Winker
Abstract
An initial procedure in text-as-data applications is text preprocessing. One of the typical steps, which can substantially facilitate computations, consists in removing infrequent words believed to provide limited information about the corpus. Despite the popularity of vocabulary pruning, few guidelines on how to implement it are available in the literature. The aim of this paper is to fill this gap by examining the effects of removing infrequent words on the quality of topics estimated using Latent Dirichlet Allocation. The analysis is based on Monte Carlo experiments taking into account different criteria for infrequent-term removal and various evaluation metrics. The results indicate that pruning is beneficial and that the share of vocabulary that can be eliminated is quite considerable.
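Vocabulary pruning by corpus frequency is straightforward to implement; a minimal sketch with a hypothetical frequency threshold:

```python
from collections import Counter

def prune_vocabulary(docs, min_count=2):
    """Remove words whose total corpus frequency is below min_count,
    a typical preprocessing step before fitting an LDA model."""
    counts = Counter(w for doc in docs for w in doc)
    return [[w for w in doc if counts[w] >= min_count] for doc in docs]

corpus = [["topic", "model", "rare"],
          ["topic", "model", "words"],
          ["topic", "singleton"]]
print(prune_vocabulary(corpus))
```

The paper's experiments vary exactly this kind of removal criterion (absolute counts, document frequency, etc.) and measure the effect on topic quality.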
Abstract
Contrastive Language-Audio Pretraining (CLAP) has become of crucial importance in the field of audio and speech processing. Its applications range from sound event detection to text-to-audio generation. However, one of its main limitations is the considerable amount of data required in the training process and the overall computational complexity during inference. This paper investigates how we can reduce the complexity of contrastive language-audio pre-trained models, yielding an efficient model that we call tinyCLAP. We derive an unimodal distillation loss from first principles and explore how the dimensionality of the shared multimodal latent space can be reduced via pruning. TinyCLAP uses only 6% of the original Microsoft CLAP parameters with a minimal reduction (less than 5%) in zero-shot classification performance across the three sound event detection datasets on which it was tested.
Keyword: diffusion
Breathing Life Into Sketches Using Text-to-Video Priors
Authors: Rinon Gal, Yael Vinker, Yuval Alaluf, Amit H. Bermano, Daniel Cohen-Or, Ariel Shamir, Gal Chechik
Abstract
A sketch is one of the most intuitive and versatile tools humans use to convey their ideas visually. An animated sketch opens another dimension to the expression of ideas and is widely used by designers for a variety of purposes. Animating sketches is a laborious process, requiring extensive experience and professional design skills. In this work, we present a method that automatically adds motion to a single-subject sketch (hence, "breathing life into it"), merely by providing a text prompt indicating the desired motion. The output is a short animation provided in vector representation, which can be easily edited. Our method does not require extensive training, but instead leverages the motion prior of a large pretrained text-to-video diffusion model using a score-distillation loss to guide the placement of strokes. To promote natural and smooth motion and to better preserve the sketch's appearance, we model the learned motion through two components. The first governs small local deformations and the second controls global affine transformations. Surprisingly, we find that even models that struggle to generate sketch videos on their own can still serve as a useful backbone for animating abstract representations.
Boosting3D: High-Fidelity Image-to-3D by Boosting 2D Diffusion Prior to 3D Prior with Progressive Learning
Abstract
We present Boosting3D, a multi-stage single-image-to-3D generation method that can robustly generate reasonable 3D objects across different data domains. The point of this work is to solve the view-consistency problem in single-image-guided 3D generation by modeling a reasonable geometric structure. For this purpose, we propose to utilize a better 3D prior to train the NeRF. More specifically, we train an object-level LoRA for the target object using the original image and the rendering output of the NeRF, and then train the LoRA and NeRF with a progressive training strategy; the LoRA and NeRF boost each other during training. After the progressive training, the LoRA learns the 3D information of the generated object and eventually turns into an object-level 3D prior. In the final stage, we extract the mesh from the trained NeRF and use the trained LoRA to optimize the structure and appearance of the mesh. The experiments demonstrate the effectiveness of the proposed method: Boosting3D learns an object-specific 3D prior that is beyond the ability of pre-trained diffusion priors and achieves state-of-the-art performance in the single-image-to-3D generation task.
The Challenges of Image Generation Models in Generating Multi-Component Images
Authors: Tham Yik Foong, Shashank Kotyan, Po Yuan Mao, Danilo Vasconcellos Vargas
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recent advances in text-to-image generators have led to substantial capabilities in image generation. However, the complexity of prompts acts as a bottleneck on the quality of generated images. A particularly under-explored facet is the ability of generative models to create high-quality images comprising multiple components given as a prior. In this paper, we propose and validate a metric called Components Inclusion Score (CIS) to evaluate the extent to which a model can correctly generate multiple components. Our results reveal that the evaluated models struggle to incorporate all the visual elements from prompts with multiple components (8.53% drop in CIS per component for all evaluated models). We also identify a significant decline in the quality of the images and in context awareness within an image as the number of components increases (15.91% decrease in Inception Score and 9.62% increase in Fréchet Inception Distance). To remedy this issue, we fine-tuned Stable Diffusion V2 on a custom-created test dataset with multiple components, outperforming its vanilla counterpart. In conclusion, these findings reveal a critical limitation in existing text-to-image generators, shedding light on the challenge of generating multiple components within a single image from a complex prompt.
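The abstract suggests CIS measures how many of a prompt's components actually appear in the generated image; one plausible, hypothetical formalization (the paper's exact detector and aggregation may differ):

```python
def components_inclusion_score(prompt_components, detected_components):
    """Hypothetical reading of CIS: for each prompt, the fraction of its
    required components that a detector finds in the generated image,
    averaged over all prompts."""
    scores = [len(set(req) & set(found)) / len(req)
              for req, found in zip(prompt_components, detected_components)]
    return sum(scores) / len(scores)

required = [["cat", "hat", "table"], ["dog", "ball"]]
found = [["cat", "table"], ["dog", "ball"]]   # "hat" was not generated
print(components_inclusion_score(required, found))
```

Under this reading, a per-component drop in CIS means each extra component in the prompt makes every component less likely to be rendered.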
TDiffDe: A Truncated Diffusion Model for Remote Sensing Hyperspectral Image Denoising
Authors: Jiang He, Yajie Li, Jie L, Qiangqiang Yuan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Hyperspectral images play a crucial role in precision agriculture, environmental monitoring, and ecological analysis. However, due to sensor equipment and the imaging environment, observed hyperspectral images are often inevitably corrupted by various types of noise. In this study, we propose a truncated diffusion model, called TDiffDe, to recover the useful information in hyperspectral images gradually. Rather than starting from pure noise, the input data in hyperspectral image denoising already contains image information. We therefore truncate the trained diffusion model at small steps to avoid destroying valid information.
Diffusion models meet image counter-forensics
Authors: Matías Tailanian, Marina Gardella, Álvaro Pardo, Pablo Musé
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
From its acquisition in the camera sensors to its storage, different operations are performed to generate the final image. This pipeline imprints specific traces into the image to form a natural watermark. Tampering with an image disturbs these traces; these disruptions are clues that are used by most methods to detect and locate forgeries. In this article, we assess the capabilities of diffusion models to erase the traces left by forgers and, therefore, deceive forensics methods. Such an approach has been recently introduced for adversarial purification, achieving significant performance. We show that diffusion purification methods are well suited for counter-forensics tasks. Such approaches outperform already existing counter-forensics techniques both in deceiving forensics methods and in preserving the natural look of the purified images. The source code is publicly available at https://github.com/mtailanian/diff-cf.
A Somewhat Robust Image Watermark against Diffusion-based Editing Models
Authors: Mingtian Tan, Tianhao Wang, Somesh Jha
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Recently, diffusion models (DMs) have become the state-of-the-art method for image synthesis. Editing models based on DMs, known for their high fidelity and precision, have inadvertently introduced new challenges related to image copyright infringement and malicious editing. Our work is the first to formalize and address this issue. After assessing and attempting to enhance traditional image watermarking techniques, we recognize their limitations in this emerging context. In response, we develop a novel technique, RIW (Robust Invisible Watermarking), to embed invisible watermarks leveraging adversarial example techniques. Our technique ensures a high extraction accuracy of $96\%$ for the invisible watermark after editing, compared to the $0\%$ offered by conventional methods. We provide access to our code at https://github.com/BennyTMT/RIW.
Sample-Efficient Training for Diffusion
Authors: Shivam Gupta, Aditya Parulekar, Eric Price, Zhiyang Xun
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Statistics Theory (math.ST); Machine Learning (stat.ML)
Abstract
Score-based diffusion models have become the most popular approach to deep generative modeling of images, largely due to their empirical performance and reliability. Recently, a number of theoretical works \citep{chen2022, Chen2022ImprovedAO, Chenetal23flowode, benton2023linear} have shown that diffusion models can efficiently sample, assuming $L^2$-accurate score estimates. The score-matching objective naturally approximates the true score in $L^2$, but the sample complexity of existing bounds depends \emph{polynomially} on the data radius and desired Wasserstein accuracy. By contrast, the time complexity of sampling is only logarithmic in these parameters. We show that estimating the score in $L^2$ \emph{requires} this polynomial dependence, but that a number of samples that scales polylogarithmically in the Wasserstein accuracy actually do suffice for sampling. We show that with a polylogarithmic number of samples, the ERM of the score-matching objective is $L^2$ accurate on all but a probability $\delta$ fraction of the true distribution, and that this weaker guarantee is sufficient for efficient sampling.
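The L2 score-matching setup can be made concrete with a one-dimensional denoising score matching example; this is our own illustration of the objective, unrelated to the paper's sample-complexity bounds:

```python
import numpy as np

# Denoising score matching in 1-D: for x_t = x_0 + sigma * eps, the
# regression target is -eps / sigma, and the empirical risk minimizer
# approximates the true score of the noised distribution in L^2.
rng = np.random.default_rng(0)
x0 = rng.normal(0.0, 1.0, size=50_000)   # data ~ N(0, 1)
sigma = 0.5
eps = rng.normal(size=x0.shape)
xt = x0 + sigma * eps

# Fit a linear score model s(x) = a * x by least squares on the DSM target.
target = -eps / sigma
a = (xt @ target) / (xt @ xt)

# The true score of N(0, 1 + sigma^2) is -x / (1 + sigma^2), so the
# fitted slope should be close to -0.8 here.
print(float(a))
```

With enough samples the ERM slope converges to the true score's slope, which is the kind of L2 accuracy the paper's sampling guarantees assume.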
Posterior Distillation Sampling
Authors: Juil Koo, Chanho Park, Minhyuk Sung
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
We introduce Posterior Distillation Sampling (PDS), a novel optimization method for parametric image editing based on diffusion models. Existing optimization-based methods, which leverage the powerful 2D prior of diffusion models to handle various parametric images, have mainly focused on generation. Unlike generation, editing requires a balance between conforming to the target attribute and preserving the identity of the source content. Recent 2D image editing methods have achieved this balance by leveraging the stochastic latent encoded in the generative process of diffusion models. To extend the editing capabilities of diffusion models shown in pixel space to parameter space, we reformulate the 2D image editing method into an optimization form named PDS. PDS matches the stochastic latents of the source and the target, enabling the sampling of targets in diverse parameter spaces that align with a desired attribute while maintaining the source's identity. We demonstrate that this optimization resembles running a generative process with the target attribute, but aligning this process with the trajectory of the source's generative process. Extensive editing results in Neural Radiance Fields and Scalable Vector Graphics representations demonstrate that PDS is capable of sampling targets to fulfill the aforementioned balance across various parameter spaces.
Lego: Learning to Disentangle and Invert Concepts Beyond Object Appearance in Text-to-Image Diffusion Models
Authors: Saman Motamed, Danda Pani Paudel, Luc Van Gool
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
Diffusion models have revolutionized generative content creation and text-to-image (T2I) diffusion models in particular have increased the creative freedom of users by allowing scene synthesis using natural language. T2I models excel at synthesizing concepts such as nouns, appearances, and styles. To enable customized content creation based on a few example images of a concept, methods such as Textual Inversion and DreamBooth invert the desired concept and enable synthesizing it in new scenes. However, inverting more general concepts that go beyond object appearance and style (adjectives and verbs) through natural language, remains a challenge. Two key characteristics of these concepts contribute to the limitations of current inversion methods. 1) Adjectives and verbs are entangled with nouns (subject) and can hinder appearance-based inversion methods, where the subject appearance leaks into the concept embedding and 2) describing such concepts often extends beyond single word embeddings (being frozen in ice, walking on a tightrope, etc.) that current methods do not handle. In this study, we introduce Lego, a textual inversion method designed to invert subject entangled concepts from a few example images. Lego disentangles concepts from their associated subjects using a simple yet effective Subject Separation step and employs a Context Loss that guides the inversion of single/multi-embedding concepts. In a thorough user study, Lego-generated concepts were preferred over 70% of the time when compared to the baseline. Additionally, visual question answering using a large language model suggested Lego-generated concepts are better aligned with the text description of the concept.
Adversarial defense based on distribution transfer
Abstract
The presence of adversarial examples poses a significant threat to deep learning models and their applications. Existing defense methods provide certain resilience against adversarial examples, but often suffer from decreased accuracy and generalization performance, making it challenging to achieve a trade-off between robustness and generalization. To address this, our paper interprets the adversarial example problem from the perspective of sample distribution and proposes a defense method based on distribution shift, leveraging the distribution transfer capability of a diffusion model for adversarial defense. The core idea is to exploit the discrepancy between normal and adversarial sample distributions to achieve adversarial defense using a pretrained diffusion model. Specifically, an adversarial sample undergoes a forward diffusion process, moving away from the source distribution, followed by a reverse process guided by the protected model (victim model) output to map it back to the normal distribution. Experimental evaluations on the CIFAR10 and ImageNet30 datasets are conducted, comparing against adversarial training and input preprocessing methods. For infinity-norm attacks with 8/255 perturbation, accuracy rates of 78.1% and 83.5% are achieved, respectively. For 2-norm attacks with 128/255 perturbation, accuracy rates are 74.3% and 82.5%. Additional experiments considering perturbation amplitude, diffusion iterations, and adaptive attacks also validate the effectiveness of the proposed method. The results demonstrate that even when the attacker has knowledge of the defense, the proposed distribution-based method effectively withstands adversarial examples. It fills the gaps of traditional approaches, restoring high-quality original samples and showcasing superior performance in model robustness and generalization.
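The forward half of such a purification pipeline can be sketched as follows; the noise schedule and shapes are illustrative, and the victim-model-guided reverse process from the paper is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_diffuse(x0, alpha_bar):
    """q(x_t | x_0): interpolate the (possibly adversarial) input toward
    Gaussian noise, partially destroying small adversarial perturbations."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

x_adv = np.ones(4)   # stand-in for an adversarial image
x_t = forward_diffuse(x_adv, alpha_bar=0.7)
# A pretrained reverse process (guided by the protected model in the
# paper's method) would then map x_t back toward the clean distribution.
print(x_t.shape)
```

The intuition is that adversarial perturbations are off-distribution and get washed out by the forward noising, while the learned reverse process only knows how to reconstruct on-distribution content.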
Abstract
The number of sampling methods can be daunting for a practitioner looking to apply powerful machine learning methods to their specific problem. This paper takes a theoretical stance to review and organize many sampling approaches in the ``generative modeling'' setting, where one wants to generate new data that are similar to some training examples. By revealing links between existing methods, it might prove useful to overcome some of the current challenges in sampling with diffusion models, such as long inference time due to diffusion simulation, or the lack of diversity in generated samples.
A reduced basis warm-start iterative solver for parameterized systems
Abstract
Reduced basis methods (RBMs) are widely used for the fast solution of parametrized linear systems. In some problems lacking good order-reduction properties, the RBMs alone cannot deliver a high-precision solution at an affordable offline computational cost. To obtain a high-precision solution while balancing the offline and online costs, we explore a reasonable and effective framework for accelerating iterative methods based on the RBMs. Firstly, the highly efficient reduced basis (RB) solver is used to generate accurate initial values. This data-driven initialization provides a warm start for the iterative methods. Secondly, we analyze the further use of the RBMs as a preconditioner. For high-precision solutions, the RBM preconditioner not only fails to accelerate convergence but also incurs additional cost from overuse of the RBMs. Two numerical tests on 3D steady-state diffusion equations with two- and six-dimensional parameter spaces demonstrate the capability and efficiency of the RBM-initialized pure high-fidelity iterative methods.
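As a minimal, hypothetical illustration of the warm-start idea (a 1-D affine family rather than the paper's 3D diffusion problems), the sketch below builds a reduced basis from a few high-fidelity snapshots and uses the lifted reduced solution as the initial guess for a conjugate gradient solver:

```python
import numpy as np

def cg(A, b, x0, tol=1e-8, max_iter=500):
    """Plain conjugate gradient; returns the solution and the iteration count."""
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for k in range(max_iter):
        if np.sqrt(rs) < tol * np.linalg.norm(b):
            return x, k
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x, max_iter

# Hypothetical affine family: 1-D diffusion stiffness plus a reaction term.
n = 200
K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
A = lambda mu: K + mu * np.eye(n)

# Offline: high-fidelity snapshots at training parameters, compressed by SVD.
snaps = np.column_stack([np.linalg.solve(A(m), b) for m in (0.1, 0.5, 1.0, 5.0, 10.0)])
V, _, _ = np.linalg.svd(snaps, full_matrices=False)

# Online: solve the small reduced system and lift it as a warm start.
mu = 3.0
x_rb = V @ np.linalg.solve(V.T @ A(mu) @ V, V.T @ b)

x_cold, it_cold = cg(A(mu), b, np.zeros(n))
x_warm, it_warm = cg(A(mu), b, x_rb)
```

In this toy setting the warm-started solve needs fewer CG iterations than starting from zero, which is the offline/online cost balance the abstract describes.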
An Application of Reduced Basis Methods to Core Computation in APOLLO3
Authors: Yonah Conjungo Taumhas (SERMA), Geneviève Dusson (LMB), Virginie Ehrlacher (CERMICS, MATHERIALS), Tony Lelièvre (CERMICS, MATHERIALS), François Madiot (SERMA)
Abstract
With the aim of reducing the computational cost of solving parameter-dependent eigenvalue problems, a model order reduction (MOR) procedure is proposed. We focus on the case of non-self-adjoint generalized eigenvalue problems, such as the stationary multigroup neutron diffusion equations. The method relies on an approximation of the manifold of solutions using a Proper Orthogonal Decomposition approach. The numerical method is composed of two stages. In the offline stage, we build a reduced space which approximates the manifold. In the online stage, for any given new set of parameters, we solve a reduced problem on the reduced space in much less computational time than is required to solve the high-fidelity problem. This method is applied to core computations in the APOLLO3 code.
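A minimal sketch of the offline/online split, using a small symmetric surrogate in place of the multigroup neutron diffusion operators (all matrices, parameter values, and the snapshot choice are illustrative):

```python
import numpy as np

def smallest_eig(A, B):
    """Eigenpair with smallest real part of the generalized problem A v = lam B v."""
    lam, vecs = np.linalg.eig(np.linalg.solve(B, A))
    i = np.argmin(lam.real)
    return lam[i].real, vecs[:, i].real

# Hypothetical parameter-dependent operators: a stiffness matrix plus a
# mu-scaled diagonal term, standing in for the physics operators.
n = 40
K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
D = np.diag(np.sin(np.pi * np.linspace(0.0, 1.0, n)))
B = np.eye(n)
A = lambda mu: K + mu * D

# Offline stage: snapshot eigenvectors over a parameter sample, POD via SVD.
snaps = np.column_stack([smallest_eig(A(m), B)[1] for m in (0.5, 1.0, 2.0, 4.0)])
V, _, _ = np.linalg.svd(snaps, full_matrices=False)

# Online stage: for a new parameter, solve only a 4x4 reduced eigenproblem.
mu_new = 3.0
lam_red, _ = smallest_eig(V.T @ A(mu_new) @ V, V.T @ B @ V)
lam_full, _ = smallest_eig(A(mu_new), B)  # high-fidelity reference
```

The reduced eigenvalue tracks the high-fidelity one closely at new parameters inside the training range, while the online solve scales with the reduced dimension rather than with n.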
A comparison of Algebraic Multigrid Bidomain solvers on hybrid CPU-GPU architectures
Abstract
The numerical simulation of cardiac electrophysiology is a highly challenging problem in scientific computing. The Bidomain system is the most complete mathematical model of cardiac bioelectrical activity. It consists of an elliptic and a parabolic partial differential equation (PDE), of reaction-diffusion type, describing the spread of electrical excitation in the cardiac tissue. The two PDEs are coupled with a stiff system of ordinary differential equations (ODEs), representing ionic currents through the cardiac membrane. Developing efficient and scalable preconditioners for the linear systems arising from the discretization of such a computationally challenging model is crucial in order to reduce the computational costs required by the numerical simulations of cardiac electrophysiology. In this work, focusing on the Bidomain system as a model problem, we have benchmarked two popular implementations of the Algebraic Multigrid (AMG) preconditioner embedded in the PETSc library and we have studied how the calibration of specific parameters affects performance. We have conducted our analysis on modern HPC architectures, performing scalability tests on multi-core and multi-GPU settings. The results have shown that, for our problem, although scalability is verified on CPUs, GPUs are the optimal choice, since they yield the best performance in terms of solution time.
Continual Learning of Diffusion Models with Generative Distillation
Authors: Sergi Masip, Pau Rodriguez, Tinne Tuytelaars, Gido M. van de Ven
Abstract
Diffusion models are powerful generative models that achieve state-of-the-art performance in tasks such as image synthesis. However, training them demands substantial amounts of data and computational resources. Continual learning would allow models to learn new tasks incrementally and accumulate knowledge, making it possible to reuse already trained models. One potentially suitable approach is generative replay, where a copy of a generative model trained on previous tasks produces synthetic data that are interleaved with data from the current task. However, standard generative replay applied to diffusion models results in a catastrophic loss in denoising capabilities. In this paper, we propose generative distillation, an approach that distils the entire reverse process of a diffusion model. We demonstrate that our approach significantly improves the continual learning performance of generative replay with only a moderate increase in the computational costs.
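The core idea, training a student to match the teacher's denoising predictions along the teacher's reverse trajectory, can be sketched with hypothetical scalar linear "denoisers" standing in for the diffusion networks (all constants here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar "denoisers": eps_hat(x) = w * x. The real method
# distils a diffusion-network teacher into a student; only the structure of
# the distillation loss along the reverse trajectory is sketched.
w_teacher = 0.7
w_student = 0.0
lr = 0.05

for step in range(200):
    x = rng.standard_normal()    # start the reverse process from pure noise
    for t in range(5):           # a short reverse trajectory
        eps_t = w_teacher * x    # teacher's denoising prediction
        eps_s = w_student * x    # student's prediction at the same point
        # Distillation loss at this step: (eps_s - eps_t)^2; gradient w.r.t. w.
        w_student -= lr * 2.0 * (eps_s - eps_t) * x
        x = x - 0.2 * eps_t      # follow the teacher's trajectory
```

Because the student is supervised at every step of the reverse process rather than only on final replay samples, its denoiser converges to the teacher's, which is the failure mode standard generative replay leaves unaddressed.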
RetroDiff: Retrosynthesis as Multi-stage Distribution Interpolation
Abstract
Retrosynthesis poses a fundamental challenge in biopharmaceuticals, aiming to aid chemists in finding appropriate reactant molecules and synthetic pathways given the target product molecules. With the reactant and product represented as 2D graphs, retrosynthesis constitutes a conditional graph-to-graph generative task. Inspired by the recent advancements in discrete diffusion models for graph generation, we introduce Retrosynthesis Diffusion (RetroDiff), a novel diffusion-based method designed to address this problem. However, integrating a diffusion-based graph-to-graph framework while retaining essential chemical reaction template information presents a notable challenge. Our key innovation is to develop a multi-stage diffusion process. In this method, we decompose the retrosynthesis procedure to first sample external groups from the dummy distribution given products and then generate the external bonds to connect the products and generated groups. Interestingly, such a generation process is exactly the reverse of the widely adopted semi-template retrosynthesis procedure, i.e. from reaction center identification to synthon completion, which significantly reduces error accumulation. Experimental results on the benchmark demonstrate the superiority of our method over all other semi-template methods.
Abstract
Though diffusion models excel in image generation, their step-by-step denoising leads to slow generation speeds. Consistency training addresses this issue with single-step sampling but often produces lower-quality generations and requires high training costs. In this paper, we show that optimizing the consistency training loss minimizes the Wasserstein distance between the target and generated distributions. As the timestep increases, the upper bound accumulates previous consistency training losses. Therefore, larger batch sizes are needed to reduce both current and accumulated losses. We propose Adversarial Consistency Training (ACT), which directly minimizes the Jensen-Shannon (JS) divergence between distributions at each timestep using a discriminator. Theoretically, ACT enhances generation quality and convergence. By incorporating a discriminator into the consistency training framework, our method achieves improved FID scores on CIFAR10 and ImageNet 64$\times$64, retains zero-shot image inpainting capabilities, and uses less than $1/6$ of the original batch size and fewer than $1/2$ of the model parameters and training steps compared to the baseline method, leading to a substantial reduction in resource consumption.
HACD: Hand-Aware Conditional Diffusion for Monocular Hand-Held Object Reconstruction
Authors: Bowen Fu, Yan Di, Chenyangguang Zhang, Gu Wang, Ziqin Huang, Zhiying Leng, Fabian Manhardt, Xiangyang Ji, Federico Tombari
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Reconstructing hand-held objects from a single RGB image without known 3D object templates, category prior, or depth information is a vital yet challenging problem in computer vision. In contrast to prior works that utilize deterministic modeling paradigms, which make it hard to account for the uncertainties introduced by hand- and self-occlusion, we employ a probabilistic point cloud denoising diffusion model to tackle the above challenge. In this work, we present Hand-Aware Conditional Diffusion for monocular hand-held object reconstruction (HACD), modeling the hand-object interaction in two aspects. First, we introduce hand-aware conditioning to model hand-object interaction from both semantic and geometric perspectives. Specifically, a unified hand-object semantic embedding compensates for the 2D local feature deficiency induced by hand occlusion, and a hand articulation embedding further encodes the relationship between object vertices and hand joints. Second, we propose a hand-constrained centroid fixing scheme, which utilizes hand vertex priors to restrict the centroid deviation of the partially denoised point cloud during the diffusion and reverse processes. Removing the centroid bias interference allows the diffusion models to focus on the reconstruction of shape, thus enhancing the stability and precision of local feature projection. Experiments on the synthetic ObMan dataset and two real-world datasets, HO3D and MOW, demonstrate that our approach surpasses all existing methods by a large margin.
Abstract
Image super-resolution (SR) methods typically model degradation to improve reconstruction accuracy in complex and unknown degradation scenarios. However, extracting degradation information from low-resolution images is challenging, which limits the model performance. To boost image SR performance, one feasible approach is to introduce additional priors. Inspired by advancements in multi-modal methods and text prompt image processing, we introduce text prompts to image SR to provide degradation priors. Specifically, we first design a text-image generation pipeline to integrate text into the SR dataset through the text degradation representation and degradation model. The text representation uses a binning-based discretization to describe the degradation abstractly. This representation method can also maintain the flexibility of language. Meanwhile, we propose the PromptSR to realize the text prompt SR. The PromptSR employs the diffusion model and the pre-trained language model (e.g., T5 and CLIP). We train the model on the generated text-image dataset. Extensive experiments indicate that introducing text prompts into image SR yields excellent results on both synthetic and real-world images. Code: https://github.com/zhengchen1999/PromptSR.
Paragraph-to-Image Generation with Information-Enriched Diffusion Model
Authors: Weijia Wu, Zhuang Li, Yefei He, Mike Zheng Shou, Chunhua Shen, Lele Cheng, Yan Li, Tingting Gao, Di Zhang, Zhongyuan Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Text-to-image (T2I) models have recently experienced rapid development, achieving astonishing performance in terms of fidelity and textual alignment capabilities. However, given a long paragraph (up to 512 words), these generation models still struggle to achieve strong alignment and are unable to generate images depicting complex scenes. In this paper, we introduce an information-enriched diffusion model for the paragraph-to-image generation task, termed ParaDiffusion, which delves into the transference of the extensive semantic comprehension capabilities of large language models to the task of image generation. At its core is using a large language model (e.g., Llama V2) to encode long-form text, followed by fine-tuning with LoRA to align the text-image feature spaces in the generation task. To facilitate the training of long-text semantic alignment, we also curated a high-quality paragraph-image pair dataset, namely ParaImage. This dataset combines a small amount of high-quality, meticulously annotated data with a large-scale synthetic set whose long text descriptions are generated using a vision-language model. Experiments demonstrate that ParaDiffusion outperforms state-of-the-art models (SD XL, DeepFloyd IF) on ViLG-300 and ParaPrompts, achieving up to 15% and 45% human voting rate improvements for visual appeal and text faithfulness, respectively. The code and dataset will be released to foster community research on long-text alignment.
Decouple Content and Motion for Conditional Image-to-Video Generation
Abstract
The goal of conditional image-to-video (cI2V) generation is to create a believable new video by beginning with the condition, i.e., one image and text. Previous cI2V generation methods conventionally operate in RGB pixel space, with limitations in modeling motion consistency and visual continuity. Additionally, the efficiency of generating videos in pixel space is quite low. In this paper, we propose a novel approach to address these challenges by disentangling the target RGB pixels into two distinct components: spatial content and temporal motions. Specifically, we predict temporal motions, which include a motion vector and a residual, based on a 3D-UNet diffusion model. By explicitly modeling temporal motions and warping them to the starting image, we improve the temporal consistency of generated videos. This results in a reduction of spatial redundancy, emphasizing temporal details. Our proposed method achieves performance improvements by disentangling content and motion, all without introducing new structural complexities to the model. Extensive experiments on various datasets confirm our approach's superior performance over the majority of state-of-the-art methods in both effectiveness and efficiency.
An Adaptive Fast-Multipole-Accelerated Hybrid Boundary Integral Equation Method for Accurate Diffusion Curves
Abstract
In theory, diffusion curves promise complex color gradations for infinite-resolution vector graphics. In practice, existing realizations suffer from poor scaling, discretization artifacts, or insufficient support for rich boundary conditions. Previous applications of the boundary element method to diffusion curves have relied on polygonal approximations, which either forfeit the high-order smoothness of B\'ezier curves, or, when the polygonal approximation is extremely detailed, result in large and costly systems of equations that must be solved. In this paper, we utilize the boundary integral equation method to accurately and efficiently solve the underlying partial differential equation. Given a desired resolution and viewport, we then interpolate this solution and use the boundary element method to render it. We couple this hybrid approach with the fast multipole method on a non-uniform quadtree for efficient computation. Furthermore, we introduce an adaptive strategy to enable truly scalable infinite-resolution diffusion curves.
Numerical methods and regularity properties for viscosity solutions of nonlocal in space and time diffusion equations
Authors: Félix del Teso, Łukasz Płociniczak
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
Abstract
We consider a general family of nonlocal in space and time diffusion equations with space-time dependent diffusivity and prove convergence of finite difference schemes in the context of viscosity solutions under very mild conditions. The proofs, based on regularity properties and compactness arguments on the numerical solution, allow us to inherit a number of interesting results for the limit equation. More precisely, assuming H\"older regularity only on the initial condition, we prove convergence of the scheme, space-time H\"older regularity of the solution depending on the fractional orders of the operators, as well as specific blow up rates of the first time derivative. Finally, using the obtained regularity results, we are able to prove orders of convergence of the scheme in some cases. These results are consistent with previous studies. The schemes' performance is further numerically verified using both constructed exact solutions and realistic examples. Our experiments show that multithreaded implementation yields an efficient method to solve nonlocal equations numerically.
Highly Detailed and Temporal Consistent Video Stylization via Synchronized Multi-Frame Diffusion
Authors: Minshan Xie, Hanyuan Liu, Chengze Li, Tien-Tsin Wong
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Text-guided video-to-video stylization transforms the visual appearance of a source video to a different appearance guided by textual prompts. Existing text-guided image diffusion models can be extended for stylized video synthesis. However, they struggle to generate videos with both highly detailed appearance and temporal consistency. In this paper, we propose a synchronized multi-frame diffusion framework to maintain both the visual details and the temporal consistency. Frames are denoised in a synchronous fashion, and more importantly, information of different frames is shared from the beginning of the denoising process. Such information sharing ensures that a consensus, in terms of the overall structure and color distribution, among frames can be reached in the early stage of the denoising process before it is too late. The optical flow from the original video serves as the connection, and hence the venue for information sharing, among frames. We demonstrate the effectiveness of our method in generating high-quality and diverse results in extensive experiments. Our method shows superior qualitative and quantitative results compared to state-of-the-art video editing methods.
MVControl: Adding Conditional Control to Multi-view Diffusion for Controllable Text-to-3D Generation
Authors: Zhiqi Li, Yiming Chen, Lingzhe Zhao, Peidong Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
We introduce MVControl, a novel neural network architecture that enhances existing pre-trained multi-view 2D diffusion models by incorporating additional input conditions, e.g. edge maps. Our approach enables the generation of controllable multi-view images and view-consistent 3D content. To achieve controllable multi-view image generation, we leverage MVDream as our base model, and train a new neural network module as an additional plugin for end-to-end task-specific condition learning. To precisely control the shapes and views of generated images, we innovatively propose a new conditioning mechanism that predicts an embedding encapsulating the input spatial and view conditions, which is then injected into the network globally. Once MVControl is trained, score distillation sampling (SDS) loss-based optimization can be performed to generate 3D content, during which we propose to use a hybrid diffusion prior. The hybrid prior relies on a pre-trained Stable-Diffusion network and our trained MVControl for additional guidance. Extensive experiments demonstrate that our method achieves robust generalization and enables the controllable generation of high-quality 3D content.
GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting
Authors: Yiwen Chen, Zilong Chen, Chi Zhang, Feng Wang, Xiaofeng Yang, Yikai Wang, Zhongang Cai, Lei Yang, Huaping Liu, Guosheng Lin
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
3D editing plays a crucial role in many areas such as gaming and virtual reality. Traditional 3D editing methods, which rely on representations like meshes and point clouds, often fall short in realistically depicting complex scenes. On the other hand, methods based on implicit 3D representations, like Neural Radiance Field (NeRF), render complex scenes effectively but suffer from slow processing speeds and limited control over specific scene areas. In response to these challenges, our paper presents GaussianEditor, an innovative and efficient 3D editing algorithm based on Gaussian Splatting (GS), a novel 3D representation. GaussianEditor enhances precision and control in editing through our proposed Gaussian semantic tracing, which traces the editing target throughout the training process. Additionally, we propose Hierarchical Gaussian splatting (HGS) to achieve stabilized and fine results under stochastic generative guidance from 2D diffusion models. We also develop editing strategies for efficient object removal and integration, a challenging task for existing methods. Our comprehensive experiments demonstrate GaussianEditor's superior control, efficacy, and rapid performance, marking a significant advancement in 3D editing. Project Page: https://buaacyw.github.io/gaussian-editor/
ToddlerDiffusion: Flash Interpretable Controllable Diffusion Model
Authors: Eslam Mohamed Bakr, Liangbing Zhao, Vincent Tao Hu, Matthieu Cord, Patrick Perez, Mohamed Elhoseiny
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Diffusion-based generative models excel in perceptually impressive synthesis but face challenges in interpretability. This paper introduces ToddlerDiffusion, an interpretable 2D diffusion image-synthesis framework inspired by the human generation system. Unlike traditional diffusion models with opaque denoising steps, our approach decomposes the generation process into simpler, interpretable stages: generating contours, a palette, and a detailed colored image. This not only enhances overall performance but also enables robust editing and interaction capabilities. Each stage is meticulously formulated for efficiency and accuracy, surpassing Stable-Diffusion (LDM). Extensive experiments on datasets like LSUN-Churches and COCO validate our approach, consistently outperforming existing methods. ToddlerDiffusion achieves notable efficiency, matching LDM performance on LSUN-Churches while operating three times faster with a 3.76 times smaller architecture. Our source code is provided in the supplementary material and will be publicly accessible.
Animate124: Animating One Image to 4D Dynamic Scene
Abstract
We introduce Animate124 (Animate-one-image-to-4D), the first work to animate a single in-the-wild image into a 3D video through textual motion descriptions, an underexplored problem with significant applications. Our 4D generation leverages an advanced 4D grid dynamic Neural Radiance Field (NeRF) model, optimized in three distinct stages using multiple diffusion priors. Initially, a static model is optimized using the reference image, guided by 2D and 3D diffusion priors, which serves as the initialization for the dynamic NeRF. Subsequently, a video diffusion model is employed to learn the motion specific to the subject. However, the object in the 3D videos tends to drift away from the reference image over time. This drift is mainly due to the misalignment between the text prompt and the reference image in the video diffusion model. In the final stage, a personalized diffusion prior is therefore utilized to address the semantic drift. As the pioneering image-text-to-4D generation framework, our method demonstrates significant advancements over existing baselines, evidenced by comprehensive quantitative and qualitative assessments.
Received Signal and Channel Parameter Estimation in Molecular Communications
Authors: O. Tansel Baydas, Ozgur B. Akan
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
Molecular communication (MC) is a paradigm that employs molecules as information transmitters, hence, requiring unconventional transceivers and detection techniques for the Internet of Bio-Nano Things (IoBNT). In this study, we provide a novel MC model that incorporates a spherical transmitter and receiver with partial absorption. This model offers a more realistic representation than receiver architectures in the literature, e.g. passive or entirely absorbing configurations. An optimization-based technique utilizing particle swarm optimization (PSO) is employed to accurately estimate the cumulative number of molecules received. This technique yields nearly constant correction parameters and demonstrates a fivefold improvement in root mean square error (RMSE). The estimated channel model provides an approximate analytical impulse response; hence, it is used for estimating channel parameters such as distance, diffusion coefficient, or a combination of both. We apply iterative maximum likelihood estimation (MLE) for the parameter estimation, which gives consistent errors compared to the estimated Cramer-Rao Lower Bound (CRLB).
CatVersion: Concatenating Embeddings for Diffusion-Based Text-to-Image Personalization
Abstract
We propose CatVersion, an inversion-based method that learns the personalized concept through a handful of examples. Subsequently, users can utilize text prompts to generate images that embody the personalized concept, thereby achieving text-to-image personalization. Existing approaches emphasize word embedding learning or parameter fine-tuning of the diffusion model, which can cause concept dilution or overfitting. In contrast, our method concatenates embeddings in the feature-dense space of the diffusion model's text encoder to learn the gap between the personalized concept and its base class, aiming to maximize the preservation of prior knowledge in diffusion models while restoring the personalized concepts. To this end, we first dissect the text encoder's integration in the image generation process to identify the feature-dense space of the encoder. Afterward, we concatenate embeddings on the Keys and Values in this space to learn the gap between the personalized concept and its base class. In this way, the concatenated embeddings ultimately manifest as a residual on the original attention output. To quantify the results of personalized image generation more accurately and without bias, we improve the CLIP image alignment score based on masks. Qualitatively and quantitatively, CatVersion helps to restore personalization concepts more faithfully and enables more robust editing.
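A toy sketch of the concatenation mechanism: appending learned rows to the Keys and Values of a single attention head, whose effect shows up as a residual on the original attention output. The dimensions, initialization, and single-head form are illustrative, not the paper's.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention for a single head."""
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

rng = np.random.default_rng(0)
n, d = 5, 8  # sequence length and head dimension (toy values)
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))

# Hypothetical learned concept embeddings, concatenated as one extra row
# each on the Keys and Values; only these rows would be trained.
k_new = rng.standard_normal((1, d))
v_new = rng.standard_normal((1, d))

out_base = attention(Q, K, V)
out_cat = attention(Q, np.vstack([K, k_new]), np.vstack([V, v_new]))

# The concatenated embeddings act as a residual on the original output.
residual = out_cat - out_base
```

Because the original K and V rows are untouched, the base model's attention pattern is preserved up to this additive residual, which is the prior-preservation intuition behind concatenation rather than overwriting embeddings.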
Keyword: adaptive
Cross-layer scheme for low latency multiple description video streaming over Vehicular Ad-hoc NETworks (VANETs)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Networking and Internet Architecture (cs.NI); Image and Video Processing (eess.IV)
Abstract
There is nowadays a growing demand in vehicular communications for real-time applications requiring video assistance. The new state-of-the-art high-efficiency video coding (HEVC) standard is very promising for real-time video streaming. It offers high coding efficiency, as well as dedicated low delay coding structures. Among these, the all intra (AI) coding structure guarantees minimal coding time at the expense of higher video bitrates, which therefore penalizes transmission performance. In this work, we propose an original cross-layer system in order to enhance received video quality in vehicular communications. The system has low complexity and relies on a multiple description coding (MDC) approach. It is based on an adaptive mapping mechanism applied at the IEEE 802.11p standard medium access control (MAC) layer. Simulation results in a realistic vehicular environment demonstrate that for low delay video communications, the proposed method provides significant video quality improvements on the receiver side.
Sample as You Infer: Predictive Coding With Langevin Dynamics
Abstract
We present a novel algorithm for parameter learning in generic deep generative models that builds upon the predictive coding (PC) framework of computational neuroscience. Our approach modifies the standard PC algorithm to bring performance on par with, and exceeding, that obtained from standard variational auto-encoder (VAE) training. By injecting Gaussian noise into the PC inference procedure we re-envision it as overdamped Langevin sampling, which facilitates optimisation with respect to a tight evidence lower bound (ELBO). We improve the resultant encoder-free training method by incorporating an encoder network to provide an amortised warm-start to our Langevin sampling and test three different objectives for doing so. Finally, to increase robustness to the sampling step size and reduce sensitivity to curvature, we validate a lightweight and easily computable form of preconditioning, inspired by Riemann Manifold Langevin and adaptive optimizers from the SGD literature. We compare against VAEs by training like-for-like generative models using our technique against those trained with standard reparameterisation-trick-based ELBOs. We observe our method outperforms or matches performance across a number of metrics, including sample quality, while converging in a fraction of the number of SGD training iterations.
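The overdamped Langevin update underlying the method can be sketched on a one-dimensional target with a known score; in the paper the score comes from the PC inference objective, and the target, step size, and chain length here are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "posterior" over a latent: N(mu, sigma^2), whose score is closed-form.
mu, sigma = 2.0, 0.5
score = lambda z: -(z - mu) / sigma**2

# Overdamped Langevin dynamics: z <- z + (eta/2) * score(z) + sqrt(eta) * xi.
eta = 0.01
z = 0.0
samples = []
for k in range(20000):
    z = z + 0.5 * eta * score(z) + np.sqrt(eta) * rng.standard_normal()
    if k >= 2000:  # discard burn-in before collecting samples
        samples.append(z)
samples = np.asarray(samples)
```

After burn-in, the chain's samples match the target's mean and spread up to a small step-size-dependent bias, which is why a well-chosen (or preconditioned) step size matters in the full method.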
A Unified Framework for Fair Spectral Clustering With Effective Graph Learning
Authors: Xiang Zhang, Qiao Wang
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
Abstract
We consider the problem of spectral clustering under group fairness constraints, where samples from each sensitive group are approximately proportionally represented in each cluster. Traditional fair spectral clustering (FSC) methods consist of two consecutive stages, i.e., performing fair spectral embedding on a given graph and conducting $k$-means to obtain discrete cluster labels. However, in practice, the graph is usually unknown, and we need to construct the underlying graph from potentially noisy data, the quality of which inevitably affects subsequent fair clustering performance. Furthermore, performing FSC through separate steps breaks the connections among these steps, leading to suboptimal results. To this end, we first theoretically analyze the effect of the constructed graph on FSC. Motivated by the analysis, we propose a novel graph construction method with a node-adaptive graph filter to learn graphs from noisy data. Then, all independent stages of conventional FSC are integrated into a single objective function, forming an end-to-end framework that inputs raw data and outputs discrete cluster labels. An algorithm is developed to jointly and alternately update the variables in each stage. Finally, we conduct extensive experiments on synthetic, benchmark, and real data, which show that our model is superior to state-of-the-art fair clustering methods.
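For orientation, here is a minimal sketch of the graph-construction-plus-spectral-embedding pipeline that FSC builds on, without the fairness constraint or the learned graph; the kernel width and toy data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two well-separated 1-D groups; a real pipeline would also carry the
# sensitive-group labels needed for the fairness constraint.
x = np.concatenate([rng.normal(0.0, 0.2, 20), rng.normal(5.0, 0.2, 20)])

# Graph construction from data via a Gaussian kernel (the paper instead
# learns the graph from noisy data with a node-adaptive filter).
W = np.exp(-((x[:, None] - x[None, :]) / 2.0) ** 2)
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W  # unnormalised graph Laplacian

# Spectral embedding: the eigenvector of the second-smallest eigenvalue
# (the Fiedler vector); thresholding its sign yields two clusters.
evals, evecs = np.linalg.eigh(L)
labels = (evecs[:, 1] > 0).astype(int)
```

The quality of W directly shapes the Laplacian's spectrum and hence the clusters, which is the dependence the abstract's end-to-end formulation is designed to account for.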
Progressive Learning with Visual Prompt Tuning for Variable-Rate Image Compression
Authors: Shiyu Qin, Yimin Zhou, Jinpeng Wang, Bin Chen, Baoyi An, Tao Dai, Shu-Tao Xia
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT)
Abstract
In this paper, we propose a progressive learning paradigm for transformer-based variable-rate image compression. Our approach covers a wide range of compression rates with the assistance of the Layer-adaptive Prompt Module (LPM). Inspired by visual prompt tuning, we use LPM to extract prompts for input images and hidden features at the encoder side and decoder side, respectively, which are fed as additional information into the Swin Transformer layers of a pre-trained transformer-based image compression model to affect the allocation of attention regions and bits, which in turn changes the target compression ratio of the model. To keep the network lightweight, we integrate prompt networks with fewer convolutional layers. Exhaustive experiments show that, compared to methods based on multiple models optimized separately for different target rates, the proposed method achieves the same performance with 80% savings in parameter storage and 90% savings in datasets. Meanwhile, our model outperforms all current variable-rate image compression methods in terms of rate-distortion performance and approaches the state-of-the-art fixed-rate image compression methods trained from scratch.
Perceptual Image Compression with Cooperative Cross-Modal Side Information
Authors: Shiyu Qin, Bin Chen, Yujun Huang, Baoyi An, Tao Dai, Shu-Tao Xia
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Image and Video Processing (eess.IV)
Abstract
The explosion of data has resulted in more and more associated text being transmitted along with images. Inspired by distributed source coding, many works utilize image side information to enhance image compression. However, existing methods generally do not consider using text as side information to enhance perceptual compression of images, even though the benefits of multimodal synergy have been widely demonstrated in research. This begs the following question: How can we effectively transfer text-level semantic dependencies, available only at the decoder, to help image compression? In this work, we propose a novel deep image compression method with text-guided side information to achieve a better rate-perception-distortion tradeoff. Specifically, we employ the CLIP text encoder and an effective Semantic-Spatial Aware block to fuse the text and image features. This is done by predicting a semantic mask to guide the learned text-adaptive affine transformation at the pixel level. Furthermore, we design a text-conditional generative adversarial network to improve the perceptual quality of reconstructed images. Extensive experiments involving four datasets and ten image quality assessment metrics demonstrate that the proposed approach achieves superior results in terms of rate-perception trade-off and semantic distortion.
Parameter Exchange for Robust Dynamic Domain Generalization
Abstract
Agnostic domain shift is the main reason for model degradation on unknown target domains, which brings an urgent need to develop Domain Generalization (DG). Recent advances in DG use dynamic networks to achieve training-free adaptation on the unknown target domains, termed Dynamic Domain Generalization (DDG), which compensates for the lack of self-adaptability in static models with fixed weights. The parameters of dynamic networks can be decoupled into a static and a dynamic component, which are designed to learn domain-invariant and domain-specific features, respectively. Building on existing work, we try to push the limits of DDG by disentangling the static and dynamic components more thoroughly from an optimization perspective. Our main consideration is that we can enable the static component to learn domain-invariant features more comprehensively by augmenting the domain-specific information. As a result, the more comprehensive domain-invariant features learned by the static component can then enforce the dynamic component to focus more on learning adaptive domain-specific features. To this end, we propose a simple yet effective Parameter Exchange (PE) method to perturb the combination between the static and dynamic components. We optimize the model using the gradients from both the perturbed and non-perturbed feed-forward passes jointly to implicitly achieve the aforementioned disentanglement. In this way, the two components can be optimized in a mutually beneficial manner, which can resist agnostic domain shifts and improve self-adaptability on the unknown target domain. Extensive experiments show that PE can be easily plugged into existing dynamic networks to improve their generalization ability without bells and whistles.
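A small numpy sketch of the exchange idea (illustrative, with a stand-in dynamic weight generator, not the paper's networks): a layer's effective weight is static plus dynamic, and the perturbed pass swaps the dynamic parts between two samples before the losses are combined into one joint objective.

```python
import numpy as np

rng = np.random.default_rng(0)
W_static = rng.standard_normal((4, 4)) * 0.1   # shared, domain-invariant part

def dynamic_part(x):
    # stand-in for an input-conditioned weight generator
    return 0.01 * np.outer(x, x)

def loss(W, x, y):
    return float(((W @ x - y) ** 2).mean())

x1, x2 = rng.standard_normal(4), rng.standard_normal(4)
y1, y2 = rng.standard_normal(4), rng.standard_normal(4)

# normal pass: each sample uses its own dynamic component
normal = loss(W_static + dynamic_part(x1), x1, y1) \
       + loss(W_static + dynamic_part(x2), x2, y2)
# exchanged pass: dynamic components are swapped between the two samples
exchanged = loss(W_static + dynamic_part(x2), x1, y1) \
          + loss(W_static + dynamic_part(x1), x2, y2)

# joint objective driving the update (the gradient step itself is omitted)
joint = 0.5 * (normal + exchanged)
print(joint > 0)
```

Training on the exchanged pass pushes the static component to carry everything that survives a mismatched dynamic partner, i.e., the domain-invariant features.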
5G Edge Vision: Wearable Assistive Technology for People with Blindness and Low Vision
Authors: Tommy Azzino, Marco Mezzavilla, Sundeep Rangan, Yao Wang, John-Ross Rizzo
Abstract
In an increasingly visual world, people with blindness and low vision (pBLV) face substantial challenges in navigating their surroundings and interpreting visual information. VIS4ION, from our previous work, is a smart wearable that assists pBLV with their day-to-day challenges. It enables multiple artificial intelligence (AI)-based microservices such as visual scene processing, navigation, and vision-language inference. These microservices require powerful computational resources and, in some cases, stringent inference times, hence the need to offload computation to edge servers. This paper introduces a novel video streaming platform that improves the capabilities of VIS4ION by providing real-time support of the microservices at the network edge. When video is offloaded wirelessly to the edge, the time-varying nature of the wireless network requires the use of adaptation strategies for a seamless video service. We demonstrate the performance of an adaptive real-time video streaming platform through experimentation with an open-source 5G deployment based on OpenAirInterface (OAI). The experiments demonstrate the ability to provide the microservices robustly under time-varying network loads.
AdapterFL: Adaptive Heterogeneous Federated Learning for Resource-constrained Mobile Computing Systems
Authors: Ruixuan Liu, Ming Hu, Zeke Xia, Jun Xia, Pengyu Zhang, Yihao Huang, Yang Liu, Mingsong Chen
Abstract
Federated Learning (FL) enables collaborative learning of large-scale distributed clients without data sharing. However, due to the disparity of computing resources among massive mobile computing devices, the performance of traditional homogeneous model-based Federated Learning (FL) is seriously limited. On the one hand, to achieve model training in all the diverse clients, mobile computing systems can only use small low-performance models for collaborative learning. On the other hand, devices with high computing resources cannot train a high-performance large model with their insufficient raw data. To address the resource-constrained problem in mobile computing systems, we present a novel heterogeneous FL approach named AdapterFL, which uses a model reassembly strategy to facilitate collaborative training of massive heterogeneous mobile devices adaptively. Specifically, we select multiple candidate heterogeneous models based on the computing performance of massive mobile devices and then divide each heterogeneous model into two partitions. By reassembling the partitions, we can generate models of varied sizes that combine partial parameters of the large model with partial parameters of the small model. Using these reassembled models for FL training, we can train the partial parameters of the large model using low-performance devices. In this way, we can alleviate performance degradation in large models due to resource constraints. The experimental results show that AdapterFL can achieve up to 12\% accuracy improvement compared to the state-of-the-art heterogeneous federated learning methods in resource-constrained scenarios.
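A sketch of the reassembly step with hypothetical layer lists (the block names and the halfway split point are placeholders, not the paper's architectures): each candidate model is cut into a front and a back partition, and reassembled variants mix a large-model partition with a small-model partition so weak devices still train part of the large model.

```python
large = ["L-block1", "L-block2", "L-block3", "L-block4"]
small = ["S-block1", "S-block2"]

def split(model):
    # cut a model into front and back partitions at the midpoint
    cut = len(model) // 2
    return model[:cut], model[cut:]

l_front, l_back = split(large)
s_front, s_back = split(small)

# reassembled variants of intermediate size
mixed_a = l_front + s_back   # trains the large model's front on weak devices
mixed_b = s_front + l_back   # trains the large model's back on weak devices

print(mixed_a)  # ['L-block1', 'L-block2', 'S-block2']
print(mixed_b)  # ['S-block1', 'L-block3', 'L-block4']
```

After local training, updates to the shared large-model partitions can be aggregated back into the full large model, which is how the approach sidesteps the homogeneous-model requirement.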
Class Balanced Dynamic Acquisition for Domain Adaptive Semantic Segmentation using Active Learning
Authors: Marc Schachtsiek, Simone Rossi, Thomas Hannagan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Abstract
Domain adaptive active learning is leading the charge in label-efficient training of neural networks. For semantic segmentation, state-of-the-art models jointly use two criteria of uncertainty and diversity to select training labels, combined with a pixel-wise acquisition strategy. However, we show that such methods currently suffer from a class imbalance issue which degrades their performance for larger active learning budgets. We then introduce Class Balanced Dynamic Acquisition (CBDA), a novel active learning method that mitigates this issue, especially in high-budget regimes. The more balanced labels increase minority class performance, which in turn allows the model to outperform the previous baseline by 0.6, 1.7, and 2.4 mIoU for budgets of 5%, 10%, and 20%, respectively. Additionally, the focus on minority classes leads to improvements of the minimum class performance of 0.5, 2.9, and 4.6 IoU respectively. The top-performing model even exceeds the fully supervised baseline, showing that a more balanced label set than the entire ground truth can be beneficial.
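A toy sketch of class-balanced acquisition (an assumed uncertainty score and a hard per-class cap; CBDA's actual criterion and dynamic budget allocation differ): unlabeled pixels are ranked by score, but no predicted class may exceed its share of the budget, so minority classes are guaranteed representation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
pred_class = rng.choice([0, 1, 2], size=n, p=[0.8, 0.15, 0.05])  # imbalanced
uncertainty = rng.random(n)                                      # stand-in score

def class_balanced_acquire(scores, classes, budget, n_classes):
    # pick the most uncertain samples per predicted class, capped equally
    per_class = budget // n_classes
    chosen = []
    for c in range(n_classes):
        idx = np.where(classes == c)[0]
        top = idx[np.argsort(scores[idx])[::-1][:per_class]]
        chosen.extend(top.tolist())
    return chosen

picked = class_balanced_acquire(uncertainty, pred_class, budget=90, n_classes=3)
counts = [sum(pred_class[i] == c for i in picked) for c in range(3)]
print(counts)   # each class capped at 30 despite the 80/15/5 imbalance
```

An unconstrained top-90 selection would be dominated by the majority class; the cap is what lifts minimum-class performance.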
On the convergence of adaptive approximations for stochastic differential equations
Authors: James Foster
Subjects: Numerical Analysis (math.NA); Probability (math.PR)
Abstract
In this paper, we study numerical approximations for stochastic differential equations (SDEs) that use adaptive step sizes. In particular, we consider a general setting where decisions to reduce step sizes are allowed to depend on the future trajectory of the underlying Brownian motion. Since these adaptive step sizes may not be previsible, the standard mean squared error analysis cannot be directly applied to show that the numerical method converges to the solution of the SDE. Building upon the pioneering work of Gaines and Lyons, we shall instead use rough path theory to establish convergence for a wide class of adaptive numerical methods on general Stratonovich SDEs (with sufficiently smooth vector fields). To the author's knowledge, this is the first error analysis applicable to standard solvers, such as the Milstein and Heun methods, with non-previsible step sizes. In our analysis, we require the sequence of adaptive step sizes to be nested and the SDE solver to have unbiased "L\'evy area" terms in its Taylor expansion. We conjecture that for adaptive SDE solvers more generally, convergence is still possible provided the method does not introduce "L\'evy area bias". We present a simple example where the step size control can skip over previously considered times, resulting in the numerical method converging to an incorrect limit (i.e. not the Stratonovich SDE). Finally, we conclude with a numerical experiment demonstrating a newly introduced adaptive scheme and showing the potential improvements in accuracy when step sizes are allowed to be non-previsible.
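A sketch of the nested-refinement mechanics via the Brownian bridge (this is the standard construction underpinning adaptive SDE solvers; the paper's solver-specific analysis is not reproduced): halving a step must condition on the increment already sampled, so coarse and fine discretizations see the same underlying Brownian path.

```python
import numpy as np

rng = np.random.default_rng(0)

def bridge_midpoint(w_left, w_right, t_left, t_right, rng):
    # W at the midpoint given both endpoints: conditional mean is the
    # average of the endpoints, conditional variance is (t_right-t_left)/4
    h = t_right - t_left
    mean = 0.5 * (w_left + w_right)
    return mean + np.sqrt(h / 4) * rng.standard_normal()

w0, t0, t1 = 0.0, 0.0, 1.0
w1 = w0 + np.sqrt(t1 - t0) * rng.standard_normal()   # coarse increment
wm = bridge_midpoint(w0, w1, t0, t1, rng)            # refine the step

# the two half-increments recombine exactly into the coarse increment,
# so nested step sizes stay consistent with one Brownian trajectory
print(abs((wm - w0) + (w1 - wm) - (w1 - w0)))
```

The paper's counterexample shows what can go wrong when step-size control is allowed to skip previously considered times instead of nesting refinements in this way.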
Extending Variability-Aware Model Selection with Bias Detection in Machine Learning Projects
Authors: Cristina Tavares, Nathalia Nascimento, Paulo Alencar, Donald Cowan
Abstract
Data science projects often involve various machine learning (ML) methods that depend on data, code, and models. One of the key activities in these projects is the selection of a model or algorithm that is appropriate for the data analysis at hand. ML model selection depends on several factors, which include data-related attributes such as sample size, functional requirements such as the prediction algorithm type, and non-functional requirements such as performance and bias. However, the factors that influence such selection are often not well understood and explicitly represented. This paper describes ongoing work on extending an adaptive variability-aware model selection method with bias detection in ML projects. The method involves: (i) modeling the variability of the factors that affect model selection using feature models based on heuristics proposed in the literature; (ii) instantiating our variability model with added features related to bias (e.g., bias-related metrics); and (iii) conducting experiments in a case study based on a heart failure prediction project to illustrate our approach. The proposed approach aims to advance the state of the art by making explicit factors that influence model selection, particularly those related to bias, as well as their interactions. The provided representations can transform model selection in ML projects into a non-ad-hoc, adaptive, and explainable process.
AdaMedGraph: Adaboosting Graph Neural Networks for Personalized Medicine
Authors: Jie Lian, Xufang Luo, Caihua Shan, Dongqi Han, Varut Vardhanabhuti, Dongsheng Li
Abstract
Precision medicine tailored to individual patients has gained significant attention in recent times. Machine learning techniques are now employed to process personalized data from various sources, including images, genetics, and assessments. These techniques have demonstrated good outcomes in many clinical prediction tasks. Notably, the approach of constructing graphs by linking similar patients and then applying graph neural networks (GNNs) stands out, because related information from analogous patients is aggregated and considered for prediction. However, selecting the appropriate edge feature to define patient similarity and construct the graph is challenging, given that each patient is depicted by high-dimensional features from diverse sources. Previous studies rely on human expertise to select the edge feature, which is neither scalable nor efficient in pinpointing crucial edge features for complex diseases. In this paper, we propose a novel algorithm named AdaMedGraph, which can automatically select important features to construct multiple patient similarity graphs, and train GNNs based on these graphs as weak learners in adaptive boosting. AdaMedGraph is evaluated on two real-world medical scenarios and shows superior performance.
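A generic AdaBoost loop with interchangeable weak learners, sketching the boosting scaffold into which per-graph GNNs would slot (the weak learner here is a trivial threshold stub on synthetic data, not a GNN, and the reweighting scheme is plain binary AdaBoost):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = rng.standard_normal(n)
y = np.sign(X + 0.3 * rng.standard_normal(n)).astype(int)  # noisy labels in {-1, 1}

def weak_learner(X, y, w):
    # pick the threshold minimizing weighted error (stand-in for a GNN
    # trained on one candidate patient-similarity graph)
    best = None
    for thr in np.linspace(-2, 2, 41):
        pred = np.where(X > thr, 1, -1)
        err = w[pred != y].sum()
        if best is None or err < best[1]:
            best = (thr, err)
    thr, err = best
    return (lambda X, t=thr: np.where(X > t, 1, -1)), err

w = np.ones(n) / n
ensemble = []
for _ in range(5):
    h, err = weak_learner(X, y, w)
    err = max(err, 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)     # learner weight
    w = w * np.exp(-alpha * y * h(X))         # upweight misclassified samples
    w /= w.sum()
    ensemble.append((alpha, h))

agg = np.sign(sum(a * h(X) for a, h in ensemble))
accuracy = (agg == y).mean()
print(accuracy)
```

In AdaMedGraph, each boosting round would additionally choose which edge feature defines the round's graph, which is how the algorithm automates the similarity-feature selection.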
RelJoin: Relative-cost-based Selection of Distributed Join Methods for Query Plan Optimization
Authors: F. Liang, F.C.M. Lau, H. Cui, Y. Li, B. Lin, C. Li, X. Hu
Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Selecting appropriate distributed join methods for logical join operations in a query plan is crucial for the performance of data-intensive scalable computing (DISC). Different network communication patterns in the data exchange phase generate varying network communication workloads and significantly affect the distributed join performance. However, most cost-based query optimizers focus on the local computing cost and do not precisely model the network communication cost. We propose a cost model for various distributed join methods to optimize join queries in DISC platforms. Our method precisely measures the network and local computing workloads in different execution phases, using information on the size and cardinality statistics of datasets and cluster join parallelism. Our cost model reveals the importance of the relative size of the joining datasets. We implement an efficient distributed join selection strategy, known as RelJoin, in SparkSQL, an industry-prevalent distributed data processing framework. RelJoin uses runtime adaptive statistics for accurate cost estimation and selects optimal distributed join methods for logical joins to optimize the physical query plan. The evaluation results on the TPC-DS benchmark show that RelJoin performs best in 62 of the 97 queries and can reduce the average query time by 21% compared with other strategies.
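A toy comparison of broadcast versus shuffle join costs (illustrative formulas only; RelJoin's actual model covers more phases and uses runtime statistics): broadcast ships the small side to every task, while shuffle repartitions both sides across the network once, so the winner depends on the relative sizes.

```python
def broadcast_cost(small_bytes, large_bytes, parallelism):
    # the small side is copied to each of the parallel tasks
    return small_bytes * parallelism

def shuffle_cost(small_bytes, large_bytes, parallelism):
    # both sides cross the network roughly once during repartitioning
    return small_bytes + large_bytes

def pick_join(small_bytes, large_bytes, parallelism):
    b = broadcast_cost(small_bytes, large_bytes, parallelism)
    s = shuffle_cost(small_bytes, large_bytes, parallelism)
    return "broadcast" if b < s else "shuffle"

print(pick_join(10_000, 10_000_000, 100))     # tiny small side -> broadcast
print(pick_join(5_000_000, 10_000_000, 100))  # comparable sizes -> shuffle
```

This mirrors the abstract's observation that the relative size of the joining datasets, not their absolute sizes, drives the choice of join method.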
An Adaptive Fast-Multipole-Accelerated Hybrid Boundary Integral Equation Method for Accurate Diffusion Curves
Abstract
In theory, diffusion curves promise complex color gradations for infinite-resolution vector graphics. In practice, existing realizations suffer from poor scaling, discretization artifacts, or insufficient support for rich boundary conditions. Previous applications of the boundary element method to diffusion curves have relied on polygonal approximations, which either forfeit the high-order smoothness of B\'ezier curves, or, when the polygonal approximation is extremely detailed, result in large and costly systems of equations that must be solved. In this paper, we utilize the boundary integral equation method to accurately and efficiently solve the underlying partial differential equation. Given a desired resolution and viewport, we then interpolate this solution and use the boundary element method to render it. We couple this hybrid approach with the fast multipole method on a non-uniform quadtree for efficient computation. Furthermore, we introduce an adaptive strategy to enable truly scalable infinite-resolution diffusion curves.
Efficient Gradient Estimation via Adaptive Sampling and Importance Sampling
Abstract
Machine learning problems rely heavily on stochastic gradient descent (SGD) for optimization. The effectiveness of SGD is contingent upon accurately estimating gradients from a mini-batch of data samples. Instead of the commonly used uniform sampling, adaptive or importance sampling reduces noise in gradient estimation by forming mini-batches that prioritize crucial data points. Previous research has suggested that data points should be selected with probabilities proportional to their gradient norm. Nevertheless, existing algorithms have struggled to efficiently integrate importance sampling into machine learning frameworks. In this work, we make two contributions. First, we present an algorithm that can incorporate existing importance functions into our framework. Second, we propose a simplified importance function that relies solely on the loss gradient of the output layer. By leveraging our proposed gradient estimation techniques, we observe improved convergence in classification and regression tasks with minimal computational overhead. We validate the effectiveness of our adaptive and importance-sampling approach on image and point-cloud datasets.
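A sketch of gradient-norm importance sampling (simplified: exact per-sample gradients of a linear least-squares model stand in for the paper's cheaper output-layer approximation): samples are drawn with probability proportional to their gradient norm, and each sampled gradient is reweighted by 1/(n p_i) so the mini-batch estimator remains unbiased.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.standard_normal((n, d))
w = rng.standard_normal(d)
y = X @ rng.standard_normal(d)

# per-sample gradients of the squared-error loss
grads = 2 * (X @ w - y)[:, None] * X
p = np.linalg.norm(grads, axis=1)
p /= p.sum()                                  # sampling distribution

batch = rng.choice(n, size=32, p=p)           # importance-sampled mini-batch
estimate = (grads[batch] / (n * p[batch, None])).mean(axis=0)

# unbiasedness check: the expectation over draws of one reweighted sample
# equals the full-batch gradient, since p_i * g_i / (n p_i) sums to mean(g)
expectation = (p[:, None] * grads / (n * p[:, None])).sum(axis=0)
full = grads.mean(axis=0)
print(np.allclose(expectation, full))
```

Concentrating draws on high-norm samples reduces the variance of `estimate` relative to uniform sampling, which is the source of the improved convergence reported above.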
MABFuzz: Multi-Armed Bandit Algorithms for Fuzzing Processors
Abstract
As the complexities of processors keep increasing, the task of effectively verifying their integrity and security becomes ever more daunting. The intricate web of instructions, microarchitectural features, and interdependencies woven into modern processors poses a formidable challenge for even the most diligent verification and security engineers. To tackle this growing concern, researchers have recently developed fuzzing techniques explicitly tailored for hardware processors. However, a prevailing issue with these hardware fuzzers is their heavy reliance on static strategies to make decisions in their algorithms. To address this problem, we develop a novel dynamic and adaptive decision-making framework, MABFuzz, that uses multi-armed bandit (MAB) algorithms to fuzz processors. MABFuzz is agnostic to, and hence applicable to, any existing hardware fuzzer. In the process of designing MABFuzz, we encounter challenges related to the compatibility of MAB algorithms with fuzzers and maximizing their efficacy for fuzzing. We overcome these challenges by modifying the fuzzing process and tailoring MAB algorithms to accommodate the special requirements of hardware fuzzing. We integrate three widely used MAB algorithms in a state-of-the-art hardware fuzzer and evaluate them on three popular RISC-V-based processors. Experimental results demonstrate the ability of MABFuzz to cover a broader spectrum of processors' intricate landscapes while doing so with remarkable efficiency. In particular, MABFuzz achieves up to 308x speedup in detecting vulnerabilities and up to 5x speedup in achieving coverage compared to a state-of-the-art technique.
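A minimal UCB1 sketch of the multi-armed bandit loop (the generic algorithm; MABFuzz's fuzzer-specific reward shaping is not reproduced): each "arm" stands for a fuzzing strategy and the reward for whether a run produced new coverage, so the loop shifts effort toward whichever strategy is currently paying off.

```python
import math
import random

def ucb1(reward_fns, rounds, seed=0):
    random.seed(seed)
    k = len(reward_fns)
    counts, sums = [0] * k, [0.0] * k
    for t in range(1, rounds + 1):
        if t <= k:                      # play each arm once to initialize
            arm = t - 1
        else:                           # exploit mean reward + explore bonus
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        counts[arm] += 1
        sums[arm] += reward_fns[arm]()
    return counts

# stand-in strategies: arm 1 yields "new coverage" far more often
arms = [lambda: 1.0 if random.random() < 0.2 else 0.0,
        lambda: 1.0 if random.random() < 0.8 else 0.0]
counts = ucb1(arms, rounds=2000)
print(counts)   # most pulls concentrate on the better arm
```

A static strategy would split effort by a fixed schedule regardless of feedback; the bandit's dynamic reallocation is what the abstract's speedups come from.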
Keyword: quantization
Compact 3D Gaussian Representation for Radiance Field
Authors: Joo Chan Lee, Daniel Rho, Xiangyu Sun, Jong Hwan Ko, Eunbyung Park
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Abstract
Neural Radiance Fields (NeRFs) have demonstrated remarkable potential in capturing complex 3D scenes with high fidelity. However, one persistent challenge that hinders the widespread adoption of NeRFs is the computational bottleneck due to the volumetric rendering. On the other hand, 3D Gaussian splatting (3DGS) has recently emerged as an alternative representation that leverages a 3D Gaussian-based representation and adopts the rasterization pipeline to render the images rather than volumetric rendering, achieving very fast rendering speed and promising image quality. However, a significant drawback arises as 3DGS entails a substantial number of 3D Gaussians to maintain the high fidelity of the rendered images, which requires a large amount of memory and storage. To address this critical issue, we place a specific emphasis on two key objectives: reducing the number of Gaussian points without sacrificing performance and compressing the Gaussian attributes, such as view-dependent color and covariance. To this end, we propose a learnable mask strategy that significantly reduces the number of Gaussians while preserving high performance. In addition, we propose a compact but effective representation of view-dependent color by employing a grid-based neural field rather than relying on spherical harmonics. Finally, we learn codebooks to compactly represent the geometric attributes of Gaussians by vector quantization. In our extensive experiments, we consistently show over 10$\times$ reduced storage and enhanced rendering speed, while maintaining the quality of the scene representation, compared to 3DGS. Our work provides a comprehensive framework for 3D scene representation, achieving high performance, fast training, compactness, and real-time rendering. Our project page is available at https://maincold2.github.io/c3dgs/.
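A sketch of codebook-based vector quantization for per-Gaussian attributes (a generic k-means VQ on random stand-in vectors, not the paper's learned codebooks): N attribute vectors are replaced by K codewords plus N small indices, which is where the storage saving comes from.

```python
import numpy as np

rng = np.random.default_rng(0)
attrs = rng.standard_normal((500, 8))      # stand-in geometric attributes

def kmeans_vq(x, k, iters=10, rng=None):
    # plain k-means: codewords are cluster centroids of the attribute vectors
    rng = rng or np.random.default_rng(0)
    codebook = x[rng.choice(len(x), k, replace=False)].copy()
    for _ in range(iters):
        d = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for c in range(k):
            if (assign == c).any():
                codebook[c] = x[assign == c].mean(0)
    return codebook, assign

codebook, assign = kmeans_vq(attrs, k=16)
reconstructed = codebook[assign]
# storage drops from 500x8 floats to 16x8 floats plus 500 small indices
err = np.linalg.norm(attrs - reconstructed) / np.linalg.norm(attrs)
print(round(err, 3))
```

In the paper the codebooks are learned jointly with rendering quality rather than fit post hoc, but the index-plus-codebook storage layout is the same.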
SySMOL: A Hardware-software Co-design Framework for Ultra-Low and Fine-Grained Mixed-Precision Neural Networks
Authors: Cyrus Zhou, Vaughn Richard, Pedro Savarese, Zachary Hassman, Michael Maire, Michael DiBrino, Yanjing Li
Abstract
Recent advancements in quantization and mixed-precision techniques offer significant promise for improving the run-time and energy efficiency of neural networks. In this work, we further show that neural networks, wherein individual parameters or activations can take on different precisions ranging between 1 and 4 bits, can achieve accuracies comparable to or exceeding their full-precision counterparts. However, the deployment of such networks poses numerous challenges, stemming from the necessity to manage and control the compute/communication/storage requirements associated with these extremely fine-grained mixed precisions for each piece of data. There is a lack of existing efficient hardware and system-level support tailored to these unique and challenging requirements. Our research introduces the first holistic hardware-software co-design approach for these networks, which enables a continuous feedback loop between hardware design, training, and inference to facilitate systematic design exploration. As a proof-of-concept, we illustrate this co-design approach by designing new, configurable CPU SIMD architectures tailored for these networks, tightly integrating the architecture with new system-aware training and inference techniques. We perform systematic design space exploration using this framework to analyze various tradeoffs. The design for mixed-precision networks that achieves optimized tradeoffs corresponds to an architecture that supports 1, 2, and 4-bit fixed-point operations with four configurable precision patterns; when coupled with system-aware training and inference optimization, networks trained for this design achieve accuracies that closely match full-precision accuracies, while compressing and improving run-time efficiency of the neural networks drastically by 10-20x, compared to full-precision networks.
A Blockchain Solution for Collaborative Machine Learning over IoT
Authors: Carlos Beis-Penedo, Francisco Troncoso-Pastoriza, Rebeca P. Díaz-Redondo, Ana Fernández-Vilas, Manuel Fernández-Veiga, Martín González Soto
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)
Abstract
The rapid growth of Internet of Things (IoT) devices and applications has led to an increased demand for advanced analytics and machine learning techniques capable of handling the challenges associated with data privacy, security, and scalability. Federated learning (FL) and blockchain technologies have emerged as promising approaches to address these challenges by enabling decentralized, secure, and privacy-preserving model training on distributed data sources. In this paper, we present a novel IoT solution that combines the incremental learning vector quantization algorithm (XuILVQ) with Ethereum blockchain technology to facilitate secure and efficient data sharing, model training, and prototype storage in a distributed environment. Our proposed architecture addresses the shortcomings of existing blockchain-based FL solutions by reducing computational and communication overheads while maintaining data privacy and security. We assess the performance of our system through a series of experiments, showcasing its potential to enhance the accuracy and efficiency of machine learning tasks in IoT settings.
Abstract
3D whole-body human mesh recovery aims to reconstruct the 3D human body, face, and hands from a single image. Although powerful deep learning models have achieved accurate estimation in this task, they require enormous memory and computational resources. Consequently, these methods can hardly be deployed on resource-limited edge devices. In this work, we propose a Binarized Dual Residual Network (BiDRN), a novel quantization method to estimate the 3D human body, face, and hands parameters efficiently. Specifically, we design a basic unit Binarized Dual Residual Block (BiDRB) composed of Local Convolution Residual (LCR) and Block Residual (BR), which can preserve full-precision information as much as possible. For LCR, we generalize it to four kinds of convolutional modules so that full-precision information can be propagated even between mismatched dimensions. We also binarize the face and hands box-prediction network as Binarized BoxNet, which can further reduce the model redundancy. Comprehensive quantitative and qualitative experiments demonstrate the effectiveness of BiDRN, which has a significant improvement over state-of-the-art binarization algorithms. Moreover, our proposed BiDRN achieves comparable performance with the full-precision method Hand4Whole while using just 22.1% parameters and 14.8% operations. We will release all the code and pretrained models.
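A sketch of 1-bit weight binarization with a per-channel scaling factor, the common building block behind binarized networks (this is the generic XNOR-Net-style scheme, not BiDRN's dual-residual design): each weight row is replaced by its sign times a scalar, and the optimal scalar is the mean absolute value.

```python
import numpy as np

def binarize(w):
    # alpha * sign(w) minimizes ||w - alpha*sign(w)||^2 at alpha = mean(|w|),
    # computed per output channel (per row here)
    alpha = np.abs(w).mean(axis=1, keepdims=True)
    return alpha * np.sign(w)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 16))
wb = binarize(w)

# each row of the binarized tensor takes only two values, +/- alpha
unique_per_row = [len(np.unique(np.abs(row))) for row in wb]
print(unique_per_row)   # [1, 1, 1, 1]
```

Residual connections like those in BiDRB exist precisely to carry the full-precision information that this 1-bit approximation discards.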
Keyword: efficient
TRIDENT: The Nonlinear Trilogy for Implicit Neural Representations
Efficient Transformer Knowledge Distillation: A Performance Review
Molly: A Verified Compiler for Cryptoprotocol Roles
DiverseNet: Decision Diversified Semi-supervised Semantic Segmentation Networks for Remote Sensing Imagery
A Unified Approach to Count-Based Weakly-Supervised Learning
Sample-Efficient Training for Diffusion
3D-MIR: A Benchmark and Empirical Study on 3D Medical Image Retrieval in Radiology
Work-Efficient Parallel Derandomization I: Chernoff-like Concentrations via Pairwise Independence
Work-Efficient Parallel Derandomization II: Optimal Concentrations via Bootstrapping
Scalable AI Generative Content for Vehicular Network Semantic Communication
Safe Physical Human-Robot Interaction through Variable Impedance Control based on ISO/TS 15066
HypUC: Hyperfine Uncertainty Calibration with Gradient-boosted Corrections for Reliable Regression on Imbalanced Electrocardiograms
Constraint-Guided Online Data Selection for Scalable Data-Driven Safety Filters in Uncertain Robotic Systems
A reduced basis warm-start iterative solver for the parameterized systems
PointPCA+: Extending PointPCA objective quality assessment metric
A Multi-solution Study on GDPR AI-enabled Completeness Checking of DPAs
Controlling Large Language Model-based Agents for Large-Scale Decision-Making: An Actor-Critic Approach
High-order upwind summation-by-parts methods for nonlinear conservation laws
Beamforming Design for Hybrid IRS-aided AF Relay Wireless Networks
A comparison of Algebraic Multigrid Bidomain solvers on hybrid CPU-GPU architectures
Robustness-Reinforced Knowledge Distillation with Correlation Distance and Network Pruning
High-Ratio Compression for Machine-Generated Data
Optimal Power Flow in Highly Renewable Power System Based on Attention Neural Networks
Electric Network Frequency Optical Sensing Devices
Efficient Trigger Word Insertion
An Efficient Distributed Nash Equilibrium Seeking with Compressed and Event-triggered Communication
An efficient mixed finite element method for nonlinear magnetostatics and quasistatics
PrivateLoRA For Efficient Privacy Preserving LLM
Step size control for explicit relaxation Runge-Kutta methods preserving invariants
Assessing the Impact of Noise on Quantum Neural Networks: An Experimental Analysis
Hardware Resilience Properties of Text-Guided Image Classifiers
You Only Explain Once
SySMOL: A Hardware-software Co-design Framework for Ultra-Low and Fine-Grained Mixed-Precision Neural Networks
A Blockchain Solution for Collaborative Machine Learning over IoT
Class Balanced Dynamic Acquisition for Domain Adaptive Semantic Segmentation using Active Learning
Tube-NeRF: Efficient Imitation Learning of Visuomotor Policies from MPC using Tube-Guided Data Augmentation and NeRFs
Variational Annealing on Graphs for Combinatorial Optimization
Gradient-based bilevel optimization for multi-penalty Ridge regression through matrix differential calculus
ECRF: Entropy-Constrained Neural Radiance Fields Compression with Frequency Domain Optimization
Maximum Cardinality $f$-Matching in Time $O(n^{2/3}m)$
How We Manage an Army of Teaching Assistants: Experience Report on Scaling a CS1 Course
Distribution Testing with a Confused Collector
Efficient Local Search for Nonlinear Real Arithmetic
Bursting Spikes: Efficient and High-performance SNNs for Event-based Vision
Segmentation-Based Parametric Painting
Fair Influence Maximization in Social Networks: A Community-Based Evolutionary Algorithm
Exploiting Active RIS in NOMA Networks with Hardware Impairments
AdaMedGraph: Adaboosting Graph Neural Networks for Personalized Medicine
Distance-Only Task Orchestration Algorithm for Energy Efficiency in Satellite-Based Mist Computing
Stable Cluster Discrimination for Deep Clustering
RelJoin: Relative-cost-based Selection of Distributed Join Methods for Query Plan Optimization
An Adaptive Fast-Multipole-Accelerated Hybrid Boundary Integral Equation Method for Accurate Diffusion Curves
Numerical methods and regularity properties for viscosity solutions of nonlocal in space and time diffusion equations
Binarized 3D Whole-body Human Mesh Recovery
Cycle Invariant Positional Encoding for Graph Representation Learning
Comparative Analysis of Transformers for Modeling Tabular Data: A Case Study using Industry Scale Dataset
Deciphering and integrating invariants for neural operator learning with various physical mechanisms
Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling
Refinement Proofs in Rust Using Ghost Locks
Efficient Gradient Estimation via Adaptive Sampling and Importance Sampling
Controlled Text Generation via Language Model Arithmetic
Malware Analysis on AI Technique
Filasofia: A Framework for Streamlined Development of Real-Time Surgical Simulations
Morphing Graph Drawings in the Presence of Point Obstacles
tinyCLAP: Distilling Contrastive Language-Audio Pretrained Models
GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting
Evaluation of a Non-Coherent Ultra-Wideband Transceiver for Micropower Sensor Nodes
Finding Foundation Models for Time Series Classification with a PreText Task
Deep learning based reduced order modeling of Darcy flow systems with local mass conservation
Counting Solutions to Conjunctive Queries: Structural and Hybrid Tractability
Target-driven splitting SPH optimization of thermal conductivity distribution
Received Signal and Channel Parameter Estimation in Molecular Communications
A General Framework for User-Guided Bayesian Optimization
Learning in Deep Factor Graphs with Gaussian Belief Propagation
GVEL: Fast Graph Loading in Edgelist and Compressed Sparse Row (CSR) formats
History Filtering in Imperfect Information Games: Algorithms and Complexity
One Pass Streaming Algorithm for Super Long Token Attention Approximation in Sublinear Space
Keyword: faster
BackboneLearn: A Library for Scaling Mixed-Integer Optimization-Based Machine Learning
3D-MIR: A Benchmark and Empirical Study on 3D Medical Image Retrieval in Radiology
Some Like It Small: Czech Semantic Embedding Models for Industry Applications
Shadow: A Novel Loss Function for Efficient Training in Siamese Networks
DPSUR: Accelerating Differentially Private Stochastic Gradient Descent Using Selective Update and Release
You Only Explain Once
Risk Bounds of Accelerated SGD for Overparameterized Linear Regression
Efficient Open-world Reinforcement Learning via Knowledge Distillation and Autonomous Rule Discovery
Four-set Hypergraphlets for Characterization of Directed Hypergraphs
ToddlerDiffusion: Flash Interpretable Controllable Diffusion Model
Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models
A Metalearned Neural Circuit for Nonparametric Bayesian Inference
GVEL: Fast Graph Loading in Edgelist and Compressed Sparse Row (CSR) formats
Keyword: mobile
Differences of communication activity and mobility patterns between urban and rural people
Data-Driven Robot Fault Detection and Diagnosis Using Generative Models: A Modified SFDD Algorithm
AdapterFL: Adaptive Heterogeneous Federated Learning for Resource-constrained Mobile Computing Systems
Multi-Agent Motion Planning with Bézier Curve Optimization under Kinodynamic Constraints
CRISP: Hybrid Structured Sparsity for Class-aware Model Pruning
Racing With ROS 2: A Navigation System for an Autonomous Formula Student Race Car
Distance-Only Task Orchestration Algorithm for Energy Efficiency in Satellite-Based Mist Computing
Prototype of deployment of Federated Learning with IoT devices
Receding Horizon Optimization with PPUM: An Approach for Autonomous Robot Path Planning in Uncertain Environments
Fault Detection in Telecom Networks using Bi-level Federated Graph Neural Networks
An Industrial Perspective on Multi-Agent Decision Making for Interoperable Robot Navigation following the VDA5050 Standard
Keyword: pruning
Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning
Robustness-Reinforced Knowledge Distillation with Correlation Distance and Network Pruning
You Only Explain Once
CRISP: Hybrid Structured Sparsity for Class-aware Model Pruning
Analysing the Impact of Removing Infrequent Words on Topic Quality in LDA Models
tinyCLAP: Distilling Contrastive Language-Audio Pretrained Models
Keyword: diffusion
Breathing Life Into Sketches Using Text-to-Video Priors
Boosting3D: High-Fidelity Image-to-3D by Boosting 2D Diffusion Prior to 3D Prior with Progressive Learning
The Challenges of Image Generation Models in Generating Multi-Component Images
TDiffDe: A Truncated Diffusion Model for Remote Sensing Hyperspectral Image Denoising
Diffusion models meet image counter-forensics
A Somewhat Robust Image Watermark against Diffusion-based Editing Models
Sample-Efficient Training for Diffusion
Posterior Distillation Sampling
Lego: Learning to Disentangle and Invert Concepts Beyond Object Appearance in Text-to-Image Diffusion Models
Adversarial defense based on distribution transfer
Touring sampling with pushforward maps
A reduced basis warm-start iterative solver for the parameterized systems
An Application of Reduced Basis Methods to Core Computation in APOLLO3
A comparison of Algebraic Multigrid Bidomain solvers on hybrid CPU-GPU architectures
Continual Learning of Diffusion Models with Generative Distillation
RetroDiff: Retrosynthesis as Multi-stage Distribution Interpolation
ACT: Adversarial Consistency Models
HACD: Hand-Aware Conditional Diffusion for Monocular Hand-Held Object Reconstruction
Image Super-Resolution with Text Prompt Diffusion
Paragraph-to-Image Generation with Information-Enriched Diffusion Model
Decouple Content and Motion for Conditional Image-to-Video Generation
An Adaptive Fast-Multipole-Accelerated Hybrid Boundary Integral Equation Method for Accurate Diffusion Curves
Numerical methods and regularity properties for viscosity solutions of nonlocal in space and time diffusion equations
Highly Detailed and Temporal Consistent Video Stylization via Synchronized Multi-Frame Diffusion
MVControl: Adding Conditional Control to Multi-view Diffusion for Controllable Text-to-3D Generation
GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting
ToddlerDiffusion: Flash Interpretable Controllable Diffusion Model
Animate124: Animating One Image to 4D Dynamic Scene
Received Signal and Channel Parameter Estimation in Molecular Communications
CatVersion: Concatenating Embeddings for Diffusion-Based Text-to-Image Personalization
Keyword: adaptive
Cross-layer scheme for low latency multiple description video streaming over Vehicular Ad-hoc NETworks (VANETs)
Sample as You Infer: Predictive Coding With Langevin Dynamics
A Unified Framework for Fair Spectral Clustering With Effective Graph Learning
Adversarial defense based on distribution transfer
Progressive Learning with Visual Prompt Tuning for Variable-Rate Image Compression
Perceptual Image Compression with Cooperative Cross-Modal Side Information
Parameter Exchange for Robust Dynamic Domain Generalization
5G Edge Vision: Wearable Assistive Technology for People with Blindness and Low Vision
AdapterFL: Adaptive Heterogeneous Federated Learning for Resource-constrained Mobile Computing Systems
Class Balanced Dynamic Acquisition for Domain Adaptive Semantic Segmentation using Active Learning
On the convergence of adaptive approximations for stochastic differential equations
Extending Variability-Aware Model Selection with Bias Detection in Machine Learning Projects
AdaMedGraph: Adaboosting Graph Neural Networks for Personalized Medicine
RelJoin: Relative-cost-based Selection of Distributed Join Methods for Query Plan Optimization
An Adaptive Fast-Multipole-Accelerated Hybrid Boundary Integral Equation Method for Accurate Diffusion Curves
Efficient Gradient Estimation via Adaptive Sampling and Importance Sampling
MABFuzz: Multi-Armed Bandit Algorithms for Fuzzing Processors
Keyword: quantization
Compact 3D Gaussian Representation for Radiance Field
SySMOL: A Hardware-software Co-design Framework for Ultra-Low and Fine-Grained Mixed-Precision Neural Networks
A Blockchain Solution for Collaborative Machine Learning over IoT
Binarized 3D Whole-body Human Mesh Recovery