New submissions for Fri, 14 Jul 23

Keyword: efficient

Assessment of the suitability of degradation models for the planning of CCTV inspections of sewer pipes
Authors: Fidae El Morer, Stefan Wittek, Andreas Rausch
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.06341
Pdf link: https://arxiv.org/pdf/2307.06341
Abstract The degradation of sewer pipes poses significant economic, environmental and health concerns. The maintenance of such assets requires structured inspection plans, which are more efficient when structural and environmental features are considered along with the results of previous inspection reports. Developing such plans requires degradation models, which can be based on statistical or machine learning methods. This work proposes a methodology to assess the suitability of these models for inspection planning along three dimensions: accuracy metrics, ability to produce long-term degradation curves, and explainability. Results suggest that although ensemble models yield the highest accuracy, they are unable to infer the long-term degradation of the pipes, whereas Logistic Regression offers a slightly less accurate model that produces consistent degradation curves with high explainability. A use case is presented to demonstrate this methodology and the efficiency of model-based planning compared to the current inspection plan.
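The abstract credits Logistic Regression with producing consistent long-term degradation curves. The Python sketch below illustrates why, under loudly stated assumptions: a single covariate (pipe age in decades), synthetic binary labels (1 = poor condition) drawn from an invented curve, and plain batch gradient descent. None of this is the authors' data, feature set, or code. Because the fitted model is a smooth function of age, it can be evaluated beyond the observed ages, and with a positive age coefficient the curve is monotone.

```python
# Illustrative sketch only: single-feature logistic degradation curve on synthetic data.
import math
import random
from typing import List, Tuple

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(ages: List[float], labels: List[int],
                 lr: float = 0.1, epochs: int = 500) -> Tuple[float, float]:
    """Fit P(poor | age) = sigmoid(b0 + b1 * age) by batch gradient descent on the log-loss."""
    b0 = b1 = 0.0
    n = len(ages)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(ages, labels):
            err = sigmoid(b0 + b1 * x) - y    # derivative of the log-loss w.r.t. the logit
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

def degradation_curve(b0: float, b1: float, max_age: int) -> List[float]:
    """P(poor condition) at ages 0..max_age, including ages beyond the observed data."""
    return [sigmoid(b0 + b1 * a) for a in range(max_age + 1)]

# Hypothetical inspections: ages in decades (0-80 years), labels from an assumed true curve.
rng = random.Random(42)
ages = [rng.uniform(0, 8) for _ in range(400)]
labels = [1 if rng.random() < sigmoid(-4 + 0.9 * a) else 0 for a in ages]
b0, b1 = fit_logistic(ages, labels)
curve = degradation_curve(b0, b1, 12)   # extrapolate to 120 years
```

Tree ensembles, by contrast, predict piecewise-constant values that stay flat outside the feature range seen in training, which is one plausible reading of the limitation reported above.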
ConvNeXt-ChARM: ConvNeXt-based Transform for Efficient Neural Image Compression
Authors: Ahmed Ghorbel, Wassim Hamidouche, Luce Morin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2307.06342
Pdf link: https://arxiv.org/pdf/2307.06342
Abstract Over the last few years, neural image compression has gained wide attention from research and industry, yielding promising end-to-end deep neural codecs that outperform their conventional counterparts in rate-distortion performance. Despite significant advances, current methods, including attention-based transform coding, still leave room for improvement in reducing the coding rate while preserving reconstruction fidelity, especially in non-homogeneous textured image areas. These models also require more parameters and longer decoding times. To tackle these challenges, we propose ConvNeXt-ChARM, an efficient ConvNeXt-based transform coding framework paired with a compute-efficient channel-wise auto-regressive prior that captures both global and local contexts from the hyper and quantized latent representations. The proposed architecture can be optimized end-to-end to fully exploit the context information and extract a compact latent representation while reconstructing higher-quality images. Experimental results on four widely used datasets show that ConvNeXt-ChARM brings consistent and significant BD-rate (PSNR) reductions, estimated on average at 5.24% and 1.22% over the Versatile Video Coding (VVC) reference encoder (VTM-18.0) and the state-of-the-art learned image compression method SwinT-ChARM, respectively. Moreover, we provide model scaling studies to verify the computational efficiency of our approach and conduct several objective and subjective analyses to bring to the fore the performance gap between the next-generation ConvNet, namely ConvNeXt, and the Swin Transformer.
Curve Fitting Simplified: Exploring the Intuitive Features of CurvPy
Authors: Sidharth SS
Subjects: Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2307.06377
Pdf link: https://arxiv.org/pdf/2307.06377
Abstract Curve fitting is a fundamental task in data analysis, allowing researchers to uncover underlying patterns and relationships in their datasets. In this paper, we introduce CurvPy, a data analysis tool designed to streamline the curve-fitting process. CurvPy offers three main functionalities: DataSleuth, FuncPlot, and OptiFit. DataSleuth analyses input data in CSV format and provides a best-guess estimate of the underlying mathematical function. FuncPlot enables users to visually inspect the fit between the function and the data by generating graphs. OptiFit computes optimal parameter values, allowing effortless optimisation of equation parameters for precise and efficient data modelling. CurvPy is built using Flask, pandas, numpy, matplotlib, scipy, and scikit-learn, providing a user-friendly interface and efficient computational capabilities. By integrating these tools, CurvPy helps researchers gain insights from their data and supports decision-making. Evaluation demonstrates the effectiveness and efficiency of CurvPy in diverse curve-fitting scenarios. The availability of CurvPy as an open-source tool further encourages collaboration and expands its potential applications in various domains. Overall, CurvPy offers a comprehensive solution for curve-fitting tasks and holds great promise for advancing data analysis techniques.
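DataSleuth is described as returning a best-guess function for the input data. One simple way such a guess could work, sketched here in Python as an assumption rather than CurvPy's actual logic or API, is to fit several candidate families by least squares and keep the one with the smallest residual. All function names are hypothetical, and only the standard library is used, whereas CurvPy itself builds on scipy and scikit-learn.

```python
# Hypothetical helpers, not CurvPy's API: pick the best of a few curve families.
import math
from typing import Callable, Dict, List, Tuple

def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def candidate_models(xs: List[float], ys: List[float]) -> Dict[str, Callable[[float], float]]:
    """Fit common families; the non-linear ones are linearised with logarithms."""
    models: Dict[str, Callable[[float], float]] = {}
    a, b = fit_line(xs, ys)
    models["linear"] = lambda x, a=a, b=b: a + b * x
    if all(y > 0 for y in ys):
        a, b = fit_line(xs, [math.log(y) for y in ys])                 # log y = a + b x
        models["exponential"] = lambda x, a=a, b=b: math.exp(a + b * x)
        if all(x > 0 for x in xs):
            a, b = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])  # log y = a + b log x
            models["power"] = lambda x, a=a, b=b: math.exp(a) * x ** b
    return models

def best_guess(xs: List[float], ys: List[float]) -> Tuple[str, Dict[str, float]]:
    """Return the family with the smallest sum of squared residuals on the original scale."""
    sse = {name: sum((f(x) - y) ** 2 for x, y in zip(xs, ys))
           for name, f in candidate_models(xs, ys).items()}
    return min(sse, key=sse.get), sse
```

Fitting the exponential and power families in log space is a simplification: it minimises relative rather than absolute error, so a parameter optimiser in the spirit of OptiFit would normally refine those estimates with a non-linear least-squares solver.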
A Program That Simplifies Regular Expressions (Tool paper)
Authors: Baudouin Le Charlier
Subjects: Symbolic Computation (cs.SC)
Arxiv link: https://arxiv.org/abs/2307.06436
Pdf link: https://arxiv.org/pdf/2307.06436
Abstract This paper presents the main features of a system that aims to transform regular expressions into shorter equivalent expressions. The system can also perform other operations useful for simplification, such as checking the inclusion of regular languages. The main novelty of this work is that it combines known but distinct ways of representing regular languages into a global unified data structure that makes the operations more efficient. In addition, representations of regular languages are dynamically reduced as operations are performed on them. Expressions are normalized and represented by a unique identifier (an integer). Expressions found to be equivalent (i.e., denoting the same regular language) are grouped into equivalence classes from which a shortest representative is chosen. The article briefly describes the main algorithms working on the global data structure. Some of them are direct adaptations of well-known algorithms, but most incorporate new ideas that are necessary to make the system efficient. Finally, to show its usefulness, the system is applied to some examples from the literature. Statistics on randomly generated sets of expressions are also provided.
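The abstract states that expressions are normalized and represented by a unique integer identifier. A common way to obtain that property is hash-consing. The Python sketch below assumes only a handful of algebraic rules (union is associative, commutative and idempotent with ∅ as its unit; ε is the unit and ∅ the zero of concatenation; ∅* = ε* = ε; (r*)* = r*). The class and method names are hypothetical, and the paper's global data structure does considerably more, such as grouping equivalent expressions into classes with a shortest representative.

```python
# Hypothetical hash-consing table for regular expressions (not the paper's data structure).
from typing import Dict, List, Tuple

class ExprTable:
    """Every normalized regular expression receives exactly one integer id."""

    def __init__(self) -> None:
        self._ids: Dict[Tuple, int] = {}
        self._nodes: List[Tuple] = []

    def _intern(self, node: Tuple) -> int:
        # Structurally equal nodes always map to the same id.
        if node not in self._ids:
            self._ids[node] = len(self._nodes)
            self._nodes.append(node)
        return self._ids[node]

    def kind(self, e: int) -> str:
        return self._nodes[e][0]

    def empty(self) -> int:
        return self._intern(("empty",))      # the empty language
    def eps(self) -> int:
        return self._intern(("eps",))        # the language containing only ""
    def sym(self, c: str) -> int:
        return self._intern(("sym", c))

    def union(self, a: int, b: int) -> int:
        # Flatten nested unions, drop the empty language, deduplicate and sort,
        # so union is associative, commutative and idempotent at the id level.
        ops = set()
        for e in (a, b):
            if self.kind(e) == "union":
                ops.update(self._nodes[e][1])
            elif self.kind(e) != "empty":
                ops.add(e)
        if not ops:
            return self.empty()
        if len(ops) == 1:
            return next(iter(ops))
        return self._intern(("union", tuple(sorted(ops))))

    def concat(self, a: int, b: int) -> int:
        if self.kind(a) == "empty" or self.kind(b) == "empty":
            return self.empty()              # the empty language is a zero
        if self.kind(a) == "eps":
            return b                         # epsilon is a unit
        if self.kind(b) == "eps":
            return a
        return self._intern(("concat", a, b))

    def star(self, a: int) -> int:
        if self.kind(a) in ("empty", "eps"):
            return self.eps()
        if self.kind(a) == "star":
            return a                         # (r*)* = r*
        return self._intern(("star", a))
```

Because equal normal forms share one id, checking equivalence up to these rules is an integer comparison; deciding full language equivalence still needs automaton-style reasoning on top.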
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Authors: Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Performance (cs.PF)
Arxiv link: https://arxiv.org/abs/2307.06440
Pdf link: https://arxiv.org/pdf/2307.06440
Abstract The computation necessary for training Transformer-based language models has skyrocketed in recent years. This trend has motivated research on efficient training algorithms designed to improve training, validation, and downstream performance faster than standard training. In this work, we revisit three categories of such algorithms: dynamic architectures (layer stacking, layer dropping), batch selection (selective backprop, RHO loss), and efficient optimizers (Lion, Sophia). When pre-training BERT and T5 with a fixed computation budget using such methods, we find that their training, validation, and downstream gains vanish compared to a baseline with a fully-decayed learning rate. We define an evaluation protocol that enables computation to be done on arbitrary machines by mapping all computation time to a reference machine; we call the resulting measure reference system time. We discuss the limitations of our proposed protocol and release our code to encourage rigorous research in efficient training procedures: https://github.com/JeanKaddour/NoTrainNoGain.
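The baseline that the revisited methods fail to beat uses a learning rate that is fully decayed by the end of the computation budget. A minimal Python sketch, assuming linear warmup followed by linear decay (the abstract does not state the schedule's shape), is shown below. `elapsed`, `budget` and `warmup` can be counted in optimizer steps or, in the spirit of the reference-system-time protocol, in reference-machine seconds. The function name is hypothetical.

```python
# Hypothetical schedule: assumes linear warmup and linear decay to exactly zero at the budget.
def fully_decayed_lr(elapsed: float, budget: float, peak_lr: float,
                     warmup: float = 0.0) -> float:
    """Linear warmup to peak_lr, then linear decay reaching 0 when elapsed == budget.
    elapsed, budget and warmup share one unit; budget must exceed warmup."""
    if warmup > 0.0 and elapsed < warmup:
        return peak_lr * elapsed / warmup
    remaining = max(budget - elapsed, 0.0)   # clamp: stays at 0 after the budget is spent
    return peak_lr * remaining / (budget - warmup)
```

Tying the decay to the budget rather than to a fixed step count is what makes the comparison fair: every method, fast or slow per step, ends its run with a fully annealed learning rate.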
Primal logic of information
Authors: Yuri Gurevich, Andreas Blass
Subjects: Logic in Computer Science (cs.LO); Logic (math.LO)
Arxiv link: https://arxiv.org/abs/2307.06454
Pdf link: https://arxiv.org/pdf/2307.06454
Abstract Primal logic arose in access control; it has a remarkably efficient (linear-time) decision procedure for its entailment problem. But primal logic is a general logic of information. In the realm of arbitrary items of information (infons), conjunction, disjunction, and implication may seem to correspond (set-theoretically) to union, intersection, and relative complementation. But, while infons are closed under union, they are not closed under intersection or relative complementation. It turns out that there is a systematic transformation of propositional intuitionistic calculi into the original (propositional) primal calculi; we call it Flatting. We extend Flatting to quantifier rules, obtaining arguably the right quantified primal logic, QPL. The QPL entailment problem is exponential-time complete, but it is polynomial-time complete when the number of quantifiers is bounded, a case of importance to applications (at least to access control).
Efficiently-Verifiable Strong Uniquely Solvable Puzzles and Matrix Multiplication
Authors: Matthew Anderson, Vu Le
Subjects: Computational Complexity (cs.CC); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Mathematical Software (cs.MS)
Arxiv link: https://arxiv.org/abs/2307.06463
Pdf link: https://arxiv.org/pdf/2307.06463
Abstract We advance the Cohn-Umans framework for developing fast matrix multiplication algorithms. We introduce, analyze, and search for a new subclass of strong uniquely solvable puzzles (SUSP), which we call simplifiable SUSPs. We show that these puzzles are efficiently verifiable, which remains an open question for general SUSPs. We also show that individual simplifiable SUSPs can achieve the same strength of bounds on the matrix multiplication exponent $\omega$ that infinite families of SUSPs can. We report on the construction, by computer search, of larger SUSPs than previously known for small width. This, combined with our tighter analysis, strengthens the upper bound on the matrix multiplication exponent from $2.66$ to $2.505$ obtainable via this computational approach, and nears the results of the handcrafted constructions of Cohn et al.
Misclassification in Automated Content Analysis Causes Bias in Regression. Can We Fix It? Yes We Can!
Authors: Nathan TeBlunthuis, Valerie Hase, Chung-Hong Chan
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2307.06483
Pdf link: https://arxiv.org/pdf/2307.06483
Abstract Automated classifiers (ACs), often built via supervised machine learning (SML), can categorize large, statistically powerful samples of data ranging from text to images and video, and have become widely popular measurement devices in communication science and related fields. Despite this popularity, even highly accurate classifiers make errors that cause misclassification bias and misleading results in downstream analyses, unless such analyses account for these errors. As we show in a systematic literature review of SML applications, communication scholars largely ignore misclassification bias. In principle, existing statistical methods can use "gold standard" validation data, such as that created by human annotators, to correct misclassification bias and produce consistent estimates. We introduce and test such methods, including a new method that we design and implement in the R package misclassificationmodels, via Monte Carlo simulations designed to reveal each method's limitations; we also release these simulations. Based on our results, we recommend our new error correction method because it is versatile and efficient. In sum, automated classifiers, even those below common accuracy standards or making systematic misclassifications, can be useful for measurement with careful study design and appropriate error correction methods.
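To make the bias concrete: if a binary predictor $X$ with $P(X=1)=0.5$ is misclassified symmetrically with probability $p$, independently of the outcome, then $\mathrm{Cov}(X, X^*) = (1-2p)/4$ and $\mathrm{Var}(X^*) = 1/4$, so the OLS slope on the classifier output $X^*$ shrinks from $\beta$ to $\beta(1-2p)$. The Python simulation below assumes exactly that setting, estimates $p$ from a simulated "gold standard" subsample, and divides the attenuation out. This simple method-of-moments correction is only an illustration; it is not the method implemented in misclassificationmodels, and all parameter values are invented.

```python
# Illustrative Monte Carlo: attenuation from a misclassified binary predictor, and a
# simple moment correction using gold-standard labels (not the misclassificationmodels method).
import random
from typing import List, Tuple

def ols_slope(xs: List[float], ys: List[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def simulate(n: int = 20000, beta: float = 2.0, flip: float = 0.15,
             n_gold: int = 2000, seed: int = 1) -> Tuple[float, float]:
    rng = random.Random(seed)
    x_true = [rng.random() < 0.5 for _ in range(n)]            # P(X = 1) = 0.5
    y = [beta * x + rng.gauss(0.0, 1.0) for x in x_true]
    # Automated classifier: flips each label with probability `flip`, independently of y.
    x_ac = [x if rng.random() >= flip else not x for x in x_true]
    naive = ols_slope([float(v) for v in x_ac], y)              # expected ~ beta * (1 - 2 * flip)
    # Simulated human-coded validation subsample used to estimate the flip rate.
    gold = rng.sample(range(n), n_gold)
    flip_hat = sum(x_true[i] != x_ac[i] for i in gold) / n_gold
    corrected = naive / (1.0 - 2.0 * flip_hat)
    return naive, corrected
```

With the defaults, the naive slope lands near $2 \times 0.7 = 1.4$ while the corrected one returns close to the true $\beta = 2$; the correction breaks down when errors depend on the outcome, which is the harder case the paper's methods target.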
Market Driven Multi-domain Network Service Orchestration in 5G Networks
Authors: Mouhamad Dieye, Wael Jaafar, Halima Elbiaze, Roch Glitho
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2307.06488
Pdf link: https://arxiv.org/pdf/2307.06488
Abstract The advent of a new breed of enhanced multimedia services has put network operators in a position where they must support innovative services while ensuring both end-to-end Quality of Service requirements and profitability. Recently, Network Function Virtualization (NFV) has been touted as a cost-effective underlying technology in 5G networks to efficiently provision novel services. These NFV-based services have been increasingly associated with multi-domain networks. However, several orchestration issues, linked to cross-domain interactions and exacerbated by the heterogeneity of underlying technologies and administrative authorities, present an important challenge. In this paper, we tackle the cross-domain interaction issue by proposing an intelligent and profitable auction-based approach to enable inter-domain resource allocation.
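The abstract does not specify the auction format. As an assumption only, the Python sketch below uses a sealed-bid second-price procurement (reverse Vickrey) auction, a common baseline because truthful cost reporting is a dominant strategy for bidders: each domain bids the cost of hosting a requested resource, the cheapest wins, and it is paid the second-lowest bid. Domain names, the reserve price, and the function name are hypothetical.

```python
# Hypothetical reverse Vickrey auction for one inter-domain resource request.
from typing import Dict, Optional, Tuple

def second_price_procurement(bids: Dict[str, float],
                             reserve: Optional[float] = None) -> Optional[Tuple[str, float]]:
    """Cheapest eligible domain wins and is paid the second-lowest eligible bid,
    or the reserve price when it is the only eligible bidder."""
    eligible = sorted((cost, domain) for domain, cost in bids.items()
                      if reserve is None or cost <= reserve)
    if not eligible:
        return None                          # no domain can serve the request within budget
    winner_cost, winner = eligible[0]
    if len(eligible) > 1:
        payment = eligible[1][0]
    elif reserve is not None:
        payment = reserve
    else:
        payment = winner_cost                # lone bidder and no reserve: pay its own bid
    return winner, payment
```

For example, bids of 10, 7 and 12 from three domains select the domain bidding 7 and pay it 10; the gap between payment and cost is the winner's margin, which is one way profitability can enter such a mechanism.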
Microbial Genetic Algorithm-based Black-box Attack against Interpretable Deep Learning Systems
Authors: Eldor Abdukhamidov, Mohammed Abuhamad, Simon S. Woo, Eric Chan-Tin, Tamer Abuhmed
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.06496
Pdf link: https://arxiv.org/pdf/2307.06496
Abstract Deep learning models are susceptible to adversarial samples in both white-box and black-box settings. Although previous studies have shown high attack success rates, coupling DNN models with interpretation models could offer a sense of security when a human expert is involved who can identify whether a given sample is benign or malicious. However, in white-box environments, interpretable deep learning systems (IDLSes) have been shown to be vulnerable to malicious manipulations. In black-box settings, as access to the components of IDLSes is limited, it becomes more challenging for the adversary to fool the system. In this work, we propose QuScore, a Query-efficient Score-based black-box attack against IDLSes that requires no knowledge of the target model or its coupled interpretation model. QuScore builds on transfer-based and score-based methods by employing an effective microbial genetic algorithm. Our method is designed to reduce the number of queries necessary to carry out successful attacks, resulting in a more efficient process. By continuously refining the adversarial samples based on feedback scores from the IDLS, our approach effectively navigates the search space to identify perturbations that can fool the system. We evaluate the attack's effectiveness on four CNN models (Inception, ResNet, VGG, DenseNet) and two interpretation models (CAM, Grad), using both the ImageNet and CIFAR datasets. Our results show that the proposed approach is query-efficient, with attack success rates between 95% and 100% and transferability with an average success rate of 69% on the ImageNet and CIFAR datasets. Our attack method generates adversarial examples whose attribution maps resemble those of benign samples. We also demonstrate that our attack is resilient against various preprocessing defense techniques and can easily be transferred to different DNN models.
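QuScore is built on a microbial genetic algorithm. The core loop of a microbial GA is shown below in Python on a toy bit-string objective (the count of ones) rather than an adversarial-perturbation fitness: two individuals are drawn at random, the winner is left unchanged, and the loser copies ("is infected by") some of the winner's genes and is then mutated. Parameter values and names are assumptions; in the attack setting the genome would encode a perturbation and the fitness would come from the IDLS scores, which this sketch does not model.

```python
# Illustrative microbial GA on a toy objective; not QuScore itself.
import random
from typing import Callable, List

def microbial_ga(fitness: Callable[[List[int]], float], n_genes: int,
                 pop_size: int = 20, tournaments: int = 3000,
                 infect_rate: float = 0.5, mut_rate: float = 0.02,
                 seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(tournaments):
        i, j = rng.sample(range(pop_size), 2)
        # Only the loser is modified, so the best fitness in the population never decreases.
        win, lose = (i, j) if fitness(pop[i]) >= fitness(pop[j]) else (j, i)
        for g in range(n_genes):
            if rng.random() < infect_rate:
                pop[lose][g] = pop[win][g]   # infection: copy the winner's gene
            if rng.random() < mut_rate:
                pop[lose][g] ^= 1            # bit-flip mutation
    return max(pop, key=fitness)

best = microbial_ga(sum, n_genes=20)         # toy fitness: number of ones
```

Each tournament needs only two fitness evaluations, which is attractive when every evaluation costs a query to a black-box system.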
Hybrid Control Policy for Artificial Pancreas via Ensemble Deep Reinforcement Learning
Authors: Wenzhou Lv, Tianyu Wu, Luolin Xiong, Liang Wu, Jian Zhou, Yang Tang, Feng Qi
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.06501
Pdf link: https://arxiv.org/pdf/2307.06501
Abstract Objective: The artificial pancreas (AP) has shown promising potential in achieving closed-loop glucose control for individuals with type 1 diabetes mellitus (T1DM). However, designing an effective control policy for the AP remains challenging due to the complex physiological processes, delayed insulin response, and inaccurate glucose measurements. While model predictive control (MPC) offers safety and stability through the dynamic model and safety constraints, it lacks individualization and is adversely affected by unannounced meals. Conversely, deep reinforcement learning (DRL) provides personalized and adaptive strategies but faces challenges with distribution shifts and substantial data requirements. Methods: We propose a hybrid control policy for the artificial pancreas (HyCPAP) to address the above challenges. HyCPAP combines an MPC policy with an ensemble DRL policy, leveraging the strengths of both policies while compensating for their respective limitations. To facilitate faster deployment of AP systems in real-world settings, we further incorporate meta-learning techniques into HyCPAP, leveraging previous experience and patient-shared knowledge to enable fast adaptation to new patients with limited available data. Results: We conduct extensive experiments using the FDA-accepted UVA/Padova T1DM simulator across three scenarios. Our approaches achieve the highest percentage of time spent in the desired euglycemic range and the lowest occurrences of hypoglycemia. Conclusion: The results clearly demonstrate the superiority of our methods for closed-loop glucose management in individuals with T1DM. Significance: The study presents novel control policies for AP systems, affirming the great potential of proposed methods for efficient closed-loop glucose control.
Improving Nonalcoholic Fatty Liver Disease Classification Performance With Latent Diffusion Models
Authors: Romain Hardy, Cornelia Ilin, Joe Klepich, Ryan Mitchell, Steve Hall, Jericho Villareal
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.06507
Pdf link: https://arxiv.org/pdf/2307.06507
Abstract Integrating deep learning with clinical expertise holds great potential for addressing healthcare challenges and empowering medical professionals with improved diagnostic tools. However, the need for annotated medical images is often an obstacle to leveraging the full power of machine learning models. Our research demonstrates that by combining synthetic images, generated using diffusion models, with real images, we can enhance nonalcoholic fatty liver disease (NAFLD) classification performance. We evaluate the quality of the synthetic images using two metrics, Inception Score (IS) and Fréchet Inception Distance (FID), computed on diffusion-generated images and on images generated by generative adversarial networks (GANs). Our results show superior performance for the diffusion-generated images, with a maximum IS of $1.90$ compared to $1.67$ for GANs, and a minimum FID of $69.45$ compared to $99.53$ for GANs. Utilizing a partially frozen CNN backbone (EfficientNet v1), our synthetic augmentation method achieves a maximum image-level ROC AUC of $0.904$ on a NAFLD prediction task.
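For reference, the standard definitions of the two image-quality metrics are given below, where $p(y \mid x)$ is the Inception classifier's label distribution for a generated image $x$, $p(y)$ its marginal over generated images, and $\mu$, $\Sigma$ the mean and covariance of Inception features for real ($r$) and generated ($g$) images. Higher IS and lower FID are better, which is why the abstract reports a maximum IS and a minimum FID.

$$\mathrm{IS} = \exp\Big(\mathbb{E}_{x \sim p_g}\, D_{\mathrm{KL}}\big(p(y \mid x) \,\|\, p(y)\big)\Big)$$

$$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\big(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2}\big)$$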
Migrating to Post-Quantum Cryptography: a Framework Using Security Dependency Analysis
Authors: Khondokar Fida Hasan, Leonie Simpson, Mir Ali Rezazadeh Baee, Chadni Islam, Ziaur Rahman, Warren Armstrong, Praveen Gauravaram, Matthew McKague
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2307.06520
Pdf link: https://arxiv.org/pdf/2307.06520
Abstract Quantum computing is emerging as an unprecedented threat to the current state of widely used cryptographic systems. Cryptographic methods that have been considered secure for decades will likely be broken, with enormous impact on the security of sensitive data and communications in enterprises worldwide. A plan to migrate to quantum-resistant cryptographic systems is required. However, migrating an enterprise system to ensure a quantum-safe state is a complex process. Enterprises will require systematic guidance to perform this migration to remain resilient in a post-quantum era, as many organisations do not have staff with the expertise to manage this process unaided. This paper presents a comprehensive framework designed to aid enterprises in their migration. The framework articulates key steps and technical considerations in the cryptographic migration process. It makes use of existing organisational inventories and provides a roadmap for prioritising the replacement of cryptosystems in a post-quantum context. The framework enables the efficient identification of cryptographic objects, and can be integrated with other frameworks in enterprise settings to minimise operational disruption during migration. Practical case studies are included to demonstrate the utility and efficacy of the proposed framework using graph-theoretic techniques to determine and evaluate cryptographic dependencies.
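The framework determines cryptographic dependencies with graph-theoretic techniques, but the abstract does not name a specific prioritisation metric. As one hypothetical heuristic, the Python sketch below ranks each quantum-vulnerable cryptographic object by the number of assets that depend on it directly or transitively, found by breadth-first search over reversed dependency edges. Asset names, the heuristic, and the function name are illustrative assumptions.

```python
# Hypothetical prioritisation heuristic over a cryptographic dependency graph.
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

def migration_priority(depends_on: Dict[str, List[str]],
                       quantum_vulnerable: Set[str]) -> List[Tuple[str, int]]:
    """Rank vulnerable objects by their number of direct or transitive dependents.
    depends_on[x] lists the components that x relies on."""
    dependents: Dict[str, Set[str]] = defaultdict(set)
    for asset, deps in depends_on.items():
        for d in deps:
            dependents[d].add(asset)         # reverse edge: d is relied upon by asset
    ranking = []
    for obj in quantum_vulnerable:
        seen: Set[str] = set()
        queue = deque([obj])
        while queue:                         # BFS over reversed edges
            for parent in dependents[queue.popleft()]:
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
        ranking.append((obj, len(seen)))
    # Most widely relied-upon objects first; ties broken by name for determinism.
    return sorted(ranking, key=lambda t: (-t[1], t[0]))
```

In a small invented inventory where a web portal uses a TLS stack that uses RSA-2048, code signing also uses RSA-2048, and a VPN uses ECDH P-256, RSA-2048 ranks first with three dependents.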
Optimised Least Squares Approach for Accurate Rectangle Fitting
Authors: Yiming Quan, Shian Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.06528
Pdf link: https://arxiv.org/pdf/2307.06528
Abstract This study introduces a novel and efficient least-squares-based method for rectangle fitting, using a continuous fitness function that accurately approximates a unit square. The proposed method is compared with an existing method from the literature using both simulated and real data. The real data are derived from aerial photogrammetry point clouds of a rectangular building. The simulated tests show that the proposed method performs better than the reference method, reducing the root-mean-square error by about 93% and 14% for clean datasets and noisy point clouds, respectively. The proposed method also improves the fitting of the real dataset by about 81%, achieving centimetre-level accuracy. Furthermore, the test results show that the proposed method converges in fewer than 10 iterations.
Wavelet-based Edge Multiscale Parareal Algorithm for subdiffusion equations with heterogeneous coefficients in a large time domain
Authors: Guanglian Li
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2307.06529
Pdf link: https://arxiv.org/pdf/2307.06529
Abstract We present the Wavelet-based Edge Multiscale Parareal (WEMP) Algorithm, recently proposed in [Li and Hu, J. Comput. Phys., 2021], for efficiently solving subdiffusion equations with heterogeneous coefficients over long time intervals. This algorithm combines the benefits of multiscale methods, which can handle heterogeneity in the spatial domain, with the strength of parareal algorithms in speeding up time evolution problems when sufficient processors are available. Our algorithm overcomes the challenge that the nonlocality of the fractional derivative poses for previous work on parabolic problems by constructing an auxiliary problem on each coarse temporal subdomain to completely decouple the temporal variable. We prove the approximation properties of the correction operator and derive a new sum-of-exponentials approximation to generate a single-step time-stepping scheme whose number of terms is $\mathcal{O}(|\log{\tau_f}|^2)$, independent of the final time, where $\tau_f$ is the fine-scale time step size. We establish the convergence rate of our algorithm in terms of the mesh size in the spatial domain, the level parameter used in the multiscale method, the coarse-scale time step size, and the fine-scale time step size. Finally, we present several numerical tests that demonstrate the effectiveness of our algorithm and validate our theoretical results.
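The parareal component follows the classical iteration $U_{n+1}^{k+1} = \mathcal{G}(U_n^{k+1}) + \mathcal{F}(U_n^{k}) - \mathcal{G}(U_n^{k})$, where $\mathcal{G}$ is a cheap coarse propagator and $\mathcal{F}$ an accurate fine one over a time slice. The Python sketch below applies it to a scalar ODE $y' = f(y)$ with explicit Euler for both propagators. It omits everything specific to WEMP (the wavelet-based multiscale spatial solver and the sum-of-exponentials scheme for the fractional derivative), and all names are hypothetical.

```python
# Classical parareal on a scalar ODE; an illustration, not the WEMP algorithm.
from typing import Callable, List

def parareal(y0: float, f: Callable[[float], float], T: float,
             n_slices: int, n_fine: int, n_iter: int) -> List[List[float]]:
    """Parareal for y' = f(y) on [0, T]; returns the iterates U^0, ..., U^n_iter."""
    dt = T / n_slices

    def coarse(y: float) -> float:           # G: one explicit Euler step per slice
        return y + dt * f(y)

    def fine(y: float) -> float:             # F: n_fine explicit Euler sub-steps per slice
        h = dt / n_fine
        for _ in range(n_fine):
            y = y + h * f(y)
        return y

    U = [y0]
    for n in range(n_slices):                # U^0: sequential coarse sweep
        U.append(coarse(U[n]))
    history = [U]
    for _ in range(n_iter):
        F_old = [fine(u) for u in U[:-1]]    # independent per slice: the parallel part
        G_old = [coarse(u) for u in U[:-1]]
        new = [y0]
        for n in range(n_slices):            # cheap sequential correction sweep
            new.append(coarse(new[n]) + F_old[n] - G_old[n])
        U = new
        history.append(U)
    return history
```

The fine solves in each iteration are independent across time slices, which is where the speed-up comes from when enough processors are available; after $k$ iterations the first $k$ slices match the serial fine solution exactly, so useful runs stop well before $k$ reaches the number of slices.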
Efficient SGD Neural Network Training via Sublinear Activated Neuron Identification
Authors: Lianke Qin, Zhao Song, Yuanyuan Yang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.06565
Pdf link: https://arxiv.org/pdf/2307.06565
Abstract Deep learning has been widely used in many fields, but the model training process usually consumes massive computational resources and time. Therefore, designing an efficient neural network training method with a provable convergence guarantee is a fundamental and important research question. In this paper, we present a static half-space report data structure for a fully connected two-layer neural network with shifted ReLU activation, which enables activated neuron identification in sublinear time via geometric search. We also prove that our algorithm can converge in $O(M^2/\epsilon^2)$ time with network size quadratic in the coefficient norm upper bound $M$ and error term $\epsilon$.
Deep learning based enhancement of ordered statistics decoding of LDPC codes
Authors: Guangwen Li, Xiao Yu
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2307.06575
Pdf link: https://arxiv.org/pdf/2307.06575
Abstract Aiming to design decoders that are free of channel information, have low complexity and high throughput, and approach maximum-likelihood performance, we put forward a streamlined architecture that concatenates three components sequentially. Specifically, to tackle the decoding failures of normalized min-sum, the whole decoding trajectory, rather than only the last-iteration information as is conventional, is fed into a trained convolutional neural network to yield a new reliability metric for each sequence bit, a step termed decoding information aggregation. Then an adapted ordered statistics decoding (OSD), following the suggested decoding path, processes the sequence ordered by the new metric more efficiently, since many invalid searches contained in conventional methods are avoided. The role of decoding information aggregation is elaborated via statistical data, revealing that it can arrange more error-prone bits into the front part of the most reliable basis of OSD, which is vital for effective decoding enhancement. We argue that the combination of the improved bitwise reliability of the most reliable basis and the rigorous code structure imposed by OSD makes the proposed architecture a competitive rival of state-of-the-art decoders, as verified by extensive simulations in terms of performance, complexity, and latency for short and moderate-length LDPC codes.
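For readers unfamiliar with the final stage, the Python sketch below implements conventional order-0 ordered statistics decoding: sort positions by reliability, find the most reliable basis by Gaussian elimination over GF(2), hard-decide those bits, and re-encode. It is not the paper's adapted OSD; there, the magnitudes used for sorting would come from the CNN's aggregated reliability metric rather than raw channel LLRs, and higher-order OSD would additionally test flipped patterns on the basis. The sign convention (positive LLR favours bit 0), the function name, and the Hamming (7,4) code used in the check are assumptions for illustration.

```python
# Conventional order-0 OSD for a binary linear code; an illustration, not the paper's decoder.
from typing import List, Sequence

def osd0(G: List[List[int]], llr: Sequence[float]) -> List[int]:
    """G: full-rank k x n generator matrix over GF(2); llr[j] > 0 favours bit 0."""
    k, n = len(G), len(G[0])
    order = sorted(range(n), key=lambda j: -abs(llr[j]))   # most reliable positions first
    M = [[row[j] for j in order] for row in G]              # permute columns to match
    basis: List[int] = []                                   # pivot columns: most reliable basis
    r = 0
    for c in range(n):
        if r == k:
            break
        pivot = next((i for i in range(r, k) if M[i][c]), None)
        if pivot is None:
            continue                                        # dependent column: skip it
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(k):                                  # Gauss-Jordan elimination over GF(2)
            if i != r and M[i][c]:
                M[i] = [a ^ b for a, b in zip(M[i], M[r])]
        basis.append(c)
        r += 1
    # Hard-decide the k most reliable independent bits, then re-encode them.
    info = [0 if llr[order[c]] >= 0 else 1 for c in basis]
    permuted = [0] * n
    for row, bit in zip(M, info):
        if bit:
            permuted = [a ^ b for a, b in zip(permuted, row)]
    codeword = [0] * n
    for pos, j in enumerate(order):                         # undo the column permutation
        codeword[j] = permuted[pos]
    return codeword
```

Errors confined to the least reliable positions are corrected for free by re-encoding; the design pressure the abstract describes is about making sure the bits that end up in the basis are actually correct.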
Online Distributed Learning with Quantized Finite-Time Coordination
Authors: Nicola Bastianello, Apostolos I. Rikos, Karl H. Johansson
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2307.06620
Pdf link: https://arxiv.org/pdf/2307.06620
Abstract In this paper we consider online distributed learning problems. Online distributed learning refers to the process of training learning models on distributed data sources. In our setting, a set of agents needs to cooperatively train a learning model from streaming data. Unlike federated learning, the proposed approach does not rely on a central server but only on peer-to-peer communications among the agents. This approach is often used in scenarios where data cannot be moved to a centralized location due to privacy, security, or cost reasons. To overcome the absence of a central server, we propose a distributed algorithm that relies on a quantized, finite-time coordination protocol to aggregate the locally trained models. Furthermore, our algorithm allows for the use of stochastic gradients during local training. Stochastic gradients are computed using a randomly sampled subset of the local training data, which makes the proposed algorithm more efficient and scalable than traditional gradient descent. We analyze the performance of the proposed algorithm in terms of the mean distance from the online solution. Finally, we present numerical results for a logistic regression task.
cjdb: a simple, fast, and lean database solution for the CityGML data model
Authors: Leon Powałka, Chris Poon, Yitong Xia, Siebren Meines, Lan Yan, Yuduan Cai, Gina Stavropoulou, Balázs Dukai, Hugo Ledoux
Subjects: Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2307.06621
Pdf link: https://arxiv.org/pdf/2307.06621
Abstract When it comes to storing 3D city models in a database, the implementation of the CityGML data model can be quite demanding and often results in complicated schemas. As an example, 3DCityDB, a widely used solution, depends on a schema having 66 tables, mapping closely the CityGML architecture. In this paper, we propose an alternative (called cjdb) for storing CityGML models efficiently in PostgreSQL with a much simpler table structure and data model design (only 3 tables are necessary). This is achieved by storing the attributes and geometries of the objects directly in JSON. In the case of the geometries we thus adopt the Simple Feature paradigm and we use the structure of CityJSON. We compare our solution against 3DCityDB with large real-world 3D city models, and we find that cjdb has significantly lower demands in storage space (around a factor of 10), allows for faster import/export of data, and has a comparable data retrieval speed with some queries being faster and some slower. The accompanying software (importer and exporter) is available at https://github.com/cityjson/cjdb/ under a permissive open-source license.
Frameless Graph Knowledge Distillation
Authors: Dai Shi, Zhiqi Shao, Yi Guo, Junbin Gao
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.06631
Pdf link: https://arxiv.org/pdf/2307.06631
Abstract Knowledge distillation (KD) has shown great potential for transferring knowledge from a complex teacher model to a simple student model, with which the heavy learning task can be accomplished efficiently without losing too much prediction accuracy. Recently, many attempts have been made to apply the KD mechanism to graph representation learning models such as graph neural networks (GNNs) to accelerate inference via student models. However, many existing KD-based GNNs use an MLP as a universal approximator in the student model to imitate the teacher model's process without considering the graph knowledge from the teacher model. In this work, we provide a KD-based framework on multi-scaled GNNs, known as graph framelets, and prove that by adequately utilizing the graph knowledge provided in a multi-scaled manner by graph framelet decomposition, the student model is capable of adapting to both homophilic and heterophilic graphs and has the potential to alleviate the over-squashing issue with a simple yet effective graph surgery. Furthermore, we show how the graph knowledge supplied by the teacher is learned and digested by the student model from both algebraic and geometric perspectives. Comprehensive experiments show that our proposed model can achieve learning accuracy matching or even surpassing that of the teacher model while maintaining high inference speed.
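For context, the generic soft-target distillation objective that KD methods commonly start from is sketched below in Python: a temperature-softened KL divergence between teacher and student outputs plus an ordinary cross-entropy on the true label. This is the widely used baseline loss, not the framelet-based scheme proposed in the paper; the temperature, weighting, and function names are assumptions.

```python
# Generic soft-target KD loss for one example; not the paper's framelet-based method.
import math
from typing import List, Sequence

def softmax(logits: Sequence[float], T: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, max-subtracted for numerical stability."""
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            label: int, T: float = 4.0, alpha: float = 0.5) -> float:
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * cross-entropy(label, student)."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    ce = -math.log(softmax(student_logits)[label])
    # T^2 keeps soft-target gradients on roughly the same scale as the hard-label term.
    return alpha * T * T * kl + (1.0 - alpha) * ce
```

When the student's logits equal the teacher's, the KL term vanishes and only the weighted cross-entropy remains; the abstract's point is that an MLP student trained this way ignores graph structure, which the framelet-based student is meant to retain.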
Making local algorithms efficiently self-stabilizing in arbitrary asynchronous environments
Authors: Stéphane Devismes (UPJV), David Ilcinkas (LaBRI), Colette Johnen (LaBRI), Frédéric Mazoit (LaBRI)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2307.06635
Pdf link: https://arxiv.org/pdf/2307.06635
Abstract This paper deals with the trade-off between time, workload, and versatility in self-stabilization, a general and lightweight fault-tolerant concept in distributed computing. In this context, we propose a transformer that provides an asynchronous silent self-stabilizing version Trans(AlgI) of any terminating synchronous algorithm AlgI. The transformed algorithm Trans(AlgI) works under the distributed unfair daemon and is efficient in both moves and rounds. Our transformer makes it easy to obtain fully-polynomial silent self-stabilizing solutions that are also asymptotically optimal in rounds. We illustrate the efficiency and versatility of our transformer with several efficient (i.e., fully-polynomial) silent self-stabilizing instances solving major distributed computing problems, namely vertex coloring, Breadth-First Search (BFS) spanning tree construction, k-clustering, and leader election.
Packing squares independently
Authors: Wei Wu, Hiroki Numaguchi, Yannan Hu, Mutsunori Yagiura
Subjects: Discrete Mathematics (cs.DM); Computational Complexity (cs.CC)
Arxiv link: https://arxiv.org/abs/2307.06654
Pdf link: https://arxiv.org/pdf/2307.06654
Abstract Given a set of squares and a strip of bounded width and infinite height, we consider a square strip packing problem, which we call the square independent packing problem (SIPP), to minimize the strip height so that all the squares are packed into independent cells separated by horizontal and vertical partitions. For the SIPP, we first investigate efficient solution representations and propose a compact representation that reduces the search space from $\Omega(n!)$ to $O(2^n)$, with $n$ the number of given squares, while guaranteeing that there exists a solution representation that corresponds to an optimal solution. Based on the solution representation, we show that the problem is NP-hard, and then we propose a fully polynomial-time approximation scheme (FPTAS) to solve it. We also propose three mathematical programming formulations based on different solution representations and confirm the performance of these algorithms through computational experiments. Finally, we discuss several extensions that are relevant to practical applications.
Downlink Precoding for Cell-free FBMC/OQAM Systems With Asynchronous Reception
Authors: Yuhao Qi, Jian Dang, Zaichen Zhang, Liang Wu, Yongpeng Wu
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2307.06657
Pdf link: https://arxiv.org/pdf/2307.06657
Abstract In this work, an efficient precoding design scheme is proposed for downlink cell-free distributed massive multiple-input multiple-output (DM-MIMO) filter bank multi-carrier (FBMC) systems with asynchronous reception and high frequency selectivity. The proposed scheme includes a multiple interpolation structure to eliminate the impact of the response difference we recently discovered, and it performs better in highly frequency-selective channels. In addition, we consider the phase shift in asynchronous reception and introduce phase compensation into the design process. The phase compensation also benefits from the multiple interpolation structure and better adapts to asynchronous reception. Based on the proposed scheme, we theoretically analyze its ergodic achievable rate performance and derive a closed-form expression. Simulation results show that the derived expression can accurately characterize the rate performance, and that FBMC with the proposed scheme outperforms orthogonal frequency-division multiplexing (OFDM) in the asynchronous scenario.
Transformer-based end-to-end classification of variable-length volumetric data
Authors: Marzieh Oghbaie, Teresa Araujo, Taha Emre, Ursula Schmidt-Erfurth, Hrvoje Bogunovic
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2307.06666
Pdf link: https://arxiv.org/pdf/2307.06666
Abstract The automatic classification of 3D medical data is memory-intensive. Also, variations in the number of slices between samples are common. Naive solutions such as subsampling can solve these problems, but at the cost of potentially eliminating relevant diagnostic information. Transformers have shown promising performance for sequential data analysis. However, applying them to long sequences is demanding in terms of data, computation, and memory. In this paper, we propose an end-to-end Transformer-based framework that efficiently classifies volumetric data of variable length. In particular, by randomizing the input slice-wise resolution during training, we enhance the capacity of the learnable positional embedding assigned to each volume slice. Consequently, the accumulated positional information in each positional embedding can be generalized to the neighbouring slices, even for high-resolution volumes at test time. By doing so, the model becomes more robust to variable volume length and amenable to different computational budgets. We evaluated the proposed approach on retinal OCT volume classification and achieved a 21.96% average improvement in balanced accuracy on a 9-class diagnostic task, compared to state-of-the-art video transformers. Our findings show that varying the slice-wise resolution of the input during training results in a more informative volume representation than training with a fixed number of slices per volume. Our code is available at: https://github.com/marziehoghbaie/VLFAT.
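A minimal sketch of the slice-resolution randomization idea, under stated assumptions: the number of sampled slices `k` is drawn uniformly per training volume, and each kept slice is mapped to the learnable positional embedding nearest its relative depth. The function name, the uniform draw, and the nearest-index mapping are hypothetical choices for illustration; the released VLFAT code may differ.

```python
import random

def sample_slices(num_slices, min_k, max_k, num_pos_embeddings, rng=random):
    """Pick a random slice-wise resolution and map slices to positional embeddings.

    Hypothetical helper. Returns (slice_indices, pos_indices): both sorted lists
    of length k, with k drawn uniformly from [min_k, min(max_k, num_slices)].
    Requires min_k <= num_slices. Each kept slice gets the positional-embedding
    index closest to its relative depth, so neighbouring slices share embeddings.
    """
    k = rng.randint(min_k, min(max_k, num_slices))
    slice_indices = sorted(rng.sample(range(num_slices), k))
    pos_indices = [
        round(i * (num_pos_embeddings - 1) / max(num_slices - 1, 1))
        for i in slice_indices
    ]
    return slice_indices, pos_indices
```

Because a different `k` is drawn for every volume, each positional embedding is trained on slices from a range of neighbouring depths rather than a single fixed position.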
Overcoming the Mental Set Effect in Programming Problem Solving
Authors: Agnia Sergeyuk, Sergey Titov, Yaroslav Golubev, Timofey Bryksin
Subjects: Software Engineering (cs.SE); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2307.06673
Pdf link: https://arxiv.org/pdf/2307.06673
Abstract This paper adopts a cognitive psychology perspective to investigate recurring mistakes in code resulting from the mental set (Einstellung) effect. The Einstellung effect is the tendency to approach problem-solving with a preconceived mindset, often overlooking better solutions that may be available. This effect can significantly impact creative thinking, as established patterns of thought can hinder the emergence of novel and creative ideas. Our study aims to test the Einstellung effect and two mechanisms for overcoming it in the field of programming. The first intervention was changing the color scheme of the code editor to a less habitual one. The second intervention combined an instruction to "forget the previous solutions and tasks" with the change in color scheme. During the experiment, participants were given two sets of four programming tasks. Each task had two possible solutions: one using suboptimal code dictated by the mental set, and the other using a less familiar but more efficient and recommended methodology. Between the sets, participants either received no treatment or one of the two interventions aimed at helping them overcome the mental set. The results of our experiment suggest that the tested techniques were insufficient to support overcoming the mental set, which we attribute to the specificity of the programming domain. The study contributes to the existing literature by providing insights into creativity support during problem-solving in software development and by offering a framework for experimental research in this field.
Meta-State-Space Learning: An Identification Approach for Stochastic Dynamical Systems
Authors: Gerben I. Beintema, Maarten Schoukens, Roland Tóth
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2307.06675
Pdf link: https://arxiv.org/pdf/2307.06675
Abstract Available methods for identification of stochastic dynamical systems from input-output data generally impose restrictive structural assumptions on either the noise structure in the data-generating system or the possible state probability distributions. In this paper, we introduce a novel identification method for such systems, which results in a dynamical model that is able to produce the time-varying output distribution accurately without making restrictive assumptions on the data-generating process. The method is formulated by first deriving a novel and exact representation of a wide class of nonlinear stochastic systems in a so-called meta-state-space form, where the meta-state can be interpreted as a parameter vector of a state probability function space parameterization. As the resulting representation of the meta-state dynamics is deterministic, we can capture the stochastic system based on a deterministic model, which is highly attractive for identification. The meta-state-space representation often involves unknown and heavily nonlinear functions; hence, we propose an artificial neural network (ANN)-based identification method capable of efficiently learning nonlinear meta-state-space models. We demonstrate that the proposed identification method can obtain models with a log-likelihood close to the theoretical limit even for highly nonlinear, highly stochastic systems.
YOLIC: An Efficient Method for Object Localization and Classification on Edge Devices
Authors: Kai Su, Qiangfu Zhao, Yoichi Tomioka, Yong Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.06689
Pdf link: https://arxiv.org/pdf/2307.06689
Abstract In the realm of Tiny AI, we introduce "You Only Look at Interested Cells" (YOLIC), an efficient method for object localization and classification on edge devices. Seamlessly blending the strengths of semantic segmentation and object detection, YOLIC offers superior computational efficiency and precision. By classifying Cells of Interest instead of individual pixels, YOLIC encapsulates relevant information, reduces computational load, and enables rough object shape inference. Importantly, bounding box regression is unnecessary, as YOLIC capitalizes on the predetermined cell configuration, which provides information about potential object location, size, and shape. To overcome the limitations of single-label classification, a multi-label classification approach is applied to each cell, effectively recognizing overlapping or closely situated objects. This paper presents extensive experiments on multiple datasets, demonstrating that YOLIC achieves detection performance comparable to the state-of-the-art YOLO algorithms while surpassing them in speed, exceeding 30 fps on a Raspberry Pi 4B CPU. All resources related to this study, including datasets, cell designer, image annotation tool, and source code, have been made publicly available on our project website at https://kai3316.github.io/yolic.github.io
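To illustrate the multi-label-per-cell idea (this is not the YOLIC codebase), the sketch below scores every class independently with a sigmoid, so one cell can report several overlapping objects or none. Because the cell layout is predetermined, the cell index itself stands in for location, which is why no box regression appears. The 0.5 threshold and the function names are assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_cells(cell_logits, class_names, threshold=0.5):
    """Turn per-cell multi-label logits into a {cell_id: [labels]} map (hypothetical helper).

    cell_logits: one list of logits per cell, one logit per class.
    Each class is scored independently with a sigmoid (multi-label, not softmax),
    so a cell can contain several overlapping objects, or none at all.
    """
    detections = {}
    for cell_id, logits in enumerate(cell_logits):
        labels = [name for name, z in zip(class_names, logits)
                  if sigmoid(z) >= threshold]
        if labels:
            detections[cell_id] = labels
    return detections
```

For example, `decode_cells([[2.0, -3.0], [1.0, 0.5]], ["car", "person"])` marks cell 0 as containing a car and cell 1 as containing both a car and a person.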
Breaking 3-Factor Approximation for Correlation Clustering in Polylogarithmic Rounds
Authors: Nairen Cao, Shang-En Huang, Hsin-Hao Su
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.06723
Pdf link: https://arxiv.org/pdf/2307.06723
Abstract In this paper, we study parallel algorithms for the correlation clustering problem, where every pair of distinct entities is labeled as either similar or dissimilar. The goal is to partition the entities into clusters to minimize the number of disagreements with the labels. Currently, all efficient parallel algorithms have an approximation ratio of at least 3, leaving a significant gap with the $1.994+\epsilon$ ratio achieved by polynomial-time sequential algorithms [CLN22]. We propose the first poly-logarithmic depth parallel algorithm that achieves a better approximation ratio than 3. Specifically, our algorithm computes a $(2.4+\epsilon)$-approximate solution and uses $\tilde{O}(m^{1.5})$ work. Additionally, it can be translated into a $\tilde{O}(m^{1.5})$-time sequential algorithm and a sublinear-memory MPC algorithm with poly-logarithmic rounds and $\tilde{O}(m^{1.5})$ total memory. Our approach is inspired by Awerbuch, Khandekar, and Rao's [AKR12] length-constrained multi-commodity flow algorithm, where we develop an efficient parallel algorithm to solve a truncated correlation clustering linear program of Charikar, Guruswami, and Wirth [CGW05]. Then we show that the solution of the truncated linear program can be rounded with a factor of at most 2.4 loss by using the framework of [CMSY15]. Such a rounding framework can then be implemented using parallel pivot-based approaches.
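The pivot-based approaches mentioned at the end of the abstract build on the classic KwikCluster algorithm of Ailon, Charikar, and Newman, which gives a 3-approximation in expectation on complete signed graphs. The sequential sketch below shows that baseline only; it is not the paper's $(2.4+\epsilon)$ LP-rounding algorithm, and the function names are hypothetical.

```python
import random

def pivot_clustering(nodes, similar, rng=random):
    """Classic sequential pivot (KwikCluster) for correlation clustering.

    similar: set of frozensets {u, v} for pairs labeled "similar"; every other
    pair is "dissimilar". Visiting nodes in a random order and taking the first
    unclustered one is equivalent to picking a uniformly random pivot each round.
    """
    order = list(nodes)
    rng.shuffle(order)
    unclustered = set(order)
    clusters = []
    for pivot in order:
        if pivot not in unclustered:
            continue
        # The pivot's cluster is itself plus all unclustered "similar" neighbours.
        cluster = {pivot} | {v for v in unclustered
                             if v != pivot and frozenset((pivot, v)) in similar}
        unclustered -= cluster
        clusters.append(cluster)
    return clusters

def disagreements(clusters, nodes, similar):
    """Count pairs whose label disagrees with the clustering."""
    label = {v: i for i, c in enumerate(clusters) for v in c}
    nodes = list(nodes)
    count = 0
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            together = label[u] == label[v]
            if together != (frozenset((u, v)) in similar):
                count += 1
    return count
```

Parallel versions process many non-adjacent pivots in the same round, which is what makes polylogarithmic-depth implementations possible.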
Cramer Type Distances for Learning Gaussian Mixture Models by Gradient Descent
Authors: Ruichong Zhang
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2307.06753
Pdf link: https://arxiv.org/pdf/2307.06753
Abstract The learning of Gaussian Mixture Models (GMMs) plays an important role in machine learning. Known for their expressiveness and interpretability, Gaussian mixture models have a wide range of applications, from statistics and computer vision to distributional reinforcement learning. However, as of today, few known algorithms can fit or learn these models, including Expectation-Maximization algorithms and the Sliced Wasserstein Distance. Even fewer algorithms are compatible with gradient descent, the common learning process for neural networks. In this paper, we derive a closed formula for the Cramér 2-distance between two GMMs in the univariate, one-dimensional case, then propose a distance function called the Sliced Cramér 2-distance for learning general multivariate GMMs. Our approach has several advantages over many previous methods. First, it has a closed-form expression for the univariate case and is easy to compute and implement using common machine learning libraries (e.g., PyTorch and TensorFlow). Second, it is compatible with gradient descent, which enables us to integrate GMMs with neural networks seamlessly. Third, it can fit a GMM not only to a set of data points, but also directly to another GMM, without sampling from the target model. Fourth, it has theoretical guarantees such as global gradient boundedness and unbiased sampling gradients. These features are especially useful for distributional reinforcement learning and Deep Q Networks, where the goal is to learn a distribution over future rewards. We also construct a Gaussian Mixture Distributional Deep Q Network as a toy example to demonstrate its effectiveness. Compared with previous models, this model is parameter-efficient in representing a distribution and offers better interpretability.
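One way to obtain a closed form in the univariate case (we have not checked that it matches the paper's derivation) uses the 1-D identity $2\int (F-G)^2\,dx = 2E|X-Y| - E|X-X'| - E|Y-Y'|$, together with the fact that the difference of two independent Gaussian components is itself Gaussian. The plain-Python sketch below computes the squared Cramér 2-distance between two univariate GMMs given as `(weight, mean, std)` triples; that representation and the function names are assumptions.

```python
import math

def _mean_abs_normal(m, s):
    """E|Z| for Z ~ N(m, s^2) (mean of the folded normal); requires s > 0."""
    return (s * math.sqrt(2 / math.pi) * math.exp(-m * m / (2 * s * s))
            + m * math.erf(m / (s * math.sqrt(2))))

def _mean_abs_diff(gmm_a, gmm_b):
    """E|X - Y| for independent X ~ gmm_a and Y ~ gmm_b.

    For components (i, j), X - Y is N(mu_i - nu_j, sigma_i^2 + tau_j^2),
    so the expectation is a weighted double sum of folded-normal means.
    """
    return sum(wa * wb * _mean_abs_normal(ma - mb, math.hypot(sa, sb))
               for wa, ma, sa in gmm_a for wb, mb, sb in gmm_b)

def cramer2_sq(gmm_a, gmm_b):
    """Squared Cramér 2-distance, i.e. the integral of (F_a - F_b)^2, for 1-D GMMs."""
    return 0.5 * (2 * _mean_abs_diff(gmm_a, gmm_b)
                  - _mean_abs_diff(gmm_a, gmm_a)
                  - _mean_abs_diff(gmm_b, gmm_b))
```

Every operation here is differentiable in the weights, means, and standard deviations, so the same expression written with tensor ops could serve as a gradient-descent loss. For a sliced variant, a multivariate component $N(\mu, \Sigma)$ projects onto a unit direction $\theta$ as $N(\theta^\top\mu, \theta^\top\Sigma\theta)$, so the 1-D formula could be averaged over random directions; how the paper samples directions is not stated in the abstract.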
Layered controller synthesis for dynamic multi-agent systems
Authors: Emily Clement, Nicolas Perrin-Gilbert, Philipp Schlehuber-Caissier
Subjects: Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2307.06758
Pdf link: https://arxiv.org/pdf/2307.06758
Abstract In this paper, we present a layered approach to the multi-agent control problem, decomposed into three stages, each building upon the results of the previous one. First, a high-level plan for a coarse abstraction of the system is computed, relying on parametric timed automata augmented with stopwatches, as they allow the simplified dynamics of such systems to be modeled efficiently. In the second stage, the high-level plan, which mainly handles the combinatorial aspects of the problem, is refined through an SMT formulation into a more dynamically accurate solution. These stages are collectively referred to as the SWA-SMT solver. They are correct by construction but lack a crucial feature: they cannot be executed in real time. To overcome this, we use SWA-SMT solutions as the initial training dataset for our last stage, which aims at obtaining a neural network control policy. We use reinforcement learning to train the policy, and show that the initial dataset is crucial for the overall success of the method.
Planar Disjoint Paths, Treewidth, and Kernels
Authors: Michał Włodarczyk, Meirav Zehavi
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2307.06792
Pdf link: https://arxiv.org/pdf/2307.06792
Abstract In the Planar Disjoint Paths problem, one is given an undirected planar graph with a set of $k$ vertex pairs $(s_i,t_i)$ and the task is to find $k$ pairwise vertex-disjoint paths such that the $i$-th path connects $s_i$ to $t_i$. We study the problem through the lens of kernelization, aiming at efficiently reducing the input size in terms of a parameter. We show that Planar Disjoint Paths does not admit a polynomial kernel when parameterized by $k$ unless coNP $\subseteq$ NP/poly, resolving an open problem by [Bodlaender, Thomassé, Yeo, ESA'09]. Moreover, we rule out the existence of a polynomial Turing kernel unless the WK-hierarchy collapses. Our reduction carries over to the setting of edge-disjoint paths, where the kernelization status remained open even in general graphs. On the positive side, we present a polynomial kernel for Planar Disjoint Paths parameterized by $k + tw$, where $tw$ denotes the treewidth of the input graph. As a consequence of both our results, we rule out the possibility of a polynomial-time (Turing) treewidth reduction to $tw = k^{O(1)}$ under the same assumptions. To the best of our knowledge, this is the first hardness result of this kind. Finally, combining our kernel with the known techniques [Adler, Kolliopoulos, Krause, Lokshtanov, Saurabh, Thilikos, JCTB'17; Schrijver, SICOMP'94] yields an alternative (and arguably simpler) proof that Planar Disjoint Paths can be solved in time $2^{O(k^2)}\cdot n^{O(1)}$, matching the result of [Lokshtanov, Misra, Pilipczuk, Saurabh, Zehavi, STOC'20].
Fast and Functional Structured Data Generators Rooted in Out-of-Equilibrium Physics
Authors: Alessandra Carbone, Aurélien Decelle, Lorenzo Rosset, Beatriz Seoane
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Biomolecules (q-bio.BM); Genomics (q-bio.GN); Quantitative Methods (q-bio.QM)
Arxiv link: https://arxiv.org/abs/2307.06797
Pdf link: https://arxiv.org/pdf/2307.06797
Abstract In this study, we address the challenge of using energy-based models to produce high-quality, label-specific data in complex structured datasets, such as population genetics, RNA, or protein sequence data. Traditional training methods encounter difficulties due to inefficient Markov chain Monte Carlo mixing, which affects the diversity of synthetic data and increases generation times. To address these issues, we use a novel training algorithm that exploits non-equilibrium effects. This approach, applied to the Restricted Boltzmann Machine, improves the model's ability to correctly classify samples and to generate high-quality synthetic data in only a few sampling steps. The effectiveness of this method is demonstrated by its successful application to four different types of data: handwritten digits, mutations of human genomes classified by continental origin, functionally characterized sequences of an enzyme protein family, and homologous RNA sequences from specific taxonomies.
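For context on "sampling steps": generating from a binary RBM typically means running block Gibbs sampling, alternating between sampling all hidden units given the visibles and all visible units given the hiddens. The sketch below runs `k` such steps in plain Python; it assumes binary units and does not implement the paper's out-of-equilibrium training algorithm, which the abstract does not detail. The function name is hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gibbs_steps(v, W, b_vis, c_hid, k, rng=random):
    """Run k steps of block Gibbs sampling in a binary RBM (hypothetical helper).

    v: list of 0/1 visible units; W[i][j] couples visible unit i and hidden unit j;
    b_vis, c_hid: visible and hidden biases. Returns the visible state after k steps.
    """
    n_vis, n_hid = len(b_vis), len(c_hid)
    for _ in range(k):
        # Sample all hidden units in parallel, conditioned on the visibles.
        h = [1 if rng.random() < sigmoid(c_hid[j] + sum(v[i] * W[i][j] for i in range(n_vis))) else 0
             for j in range(n_hid)]
        # Sample all visible units in parallel, conditioned on the hiddens.
        v = [1 if rng.random() < sigmoid(b_vis[i] + sum(W[i][j] * h[j] for j in range(n_hid))) else 0
             for i in range(n_vis)]
    return v
```

When the chain mixes slowly, a small `k` leaves samples close to their starting point, which is the diversity problem the abstract attributes to conventional training.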
Data-driven Nonlinear Parametric Model Order Reduction Framework using Deep Hierarchical Variational Autoencoder
Authors: SiHun Lee, Sangmin Lee, Kijoo Jang, Haeseong Cho, SangJoon Shin
Subjects: Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Fluid Dynamics (physics.flu-dyn)
Arxiv link: https://arxiv.org/abs/2307.06816
Pdf link: https://arxiv.org/pdf/2307.06816
Abstract A data-driven parametric model order reduction (MOR) method using a deep artificial neural network is proposed. The present network, the least-squares hierarchical variational autoencoder (LSH-VAE), is capable of performing nonlinear MOR for the parametric interpolation of a nonlinear dynamic system with a significant number of degrees of freedom. LSH-VAE introduces two major changes to existing networks: a hierarchical deep structure and a hybrid weighted, probabilistic loss function. These enhancements yield significantly improved accuracy and stability compared with conventional nonlinear MOR methods, namely the autoencoder and the variational autoencoder. Building on LSH-VAE, a parametric MOR framework is presented based on spherical linear interpolation of the latent manifold. The present framework is validated and evaluated on three nonlinear and multiphysics dynamic systems. First, the framework is evaluated on the fluid-structure interaction benchmark problem to assess its efficiency and accuracy. Then, a highly nonlinear aeroelastic phenomenon, limit cycle oscillation, is analyzed. Finally, the framework is applied to a three-dimensional fluid flow to demonstrate its capability of efficiently analyzing a significantly large number of degrees of freedom. The performance of LSH-VAE is emphasized by comparing its results against those of widely used nonlinear MOR methods, the convolutional autoencoder and $\beta$-VAE. The present framework exhibits significantly higher accuracy than the conventional methods while still achieving a large speed-up factor.
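The interpolation step can be illustrated with standard spherical linear interpolation (slerp) between two latent vectors. This is a generic sketch, not the LSH-VAE code: it assumes latent codes at two known parameter values and a scalar `t` in [0, 1] locating the new parameter between them (for example, linearly in the parameter), which is our assumption.

```python
import math

def slerp(a, b, t):
    """Spherical linear interpolation between latent vectors a and b, with t in [0, 1].

    Follows the great-circle arc between the directions of a and b; falls back to
    linear interpolation when they are (nearly) parallel or antiparallel, where
    the slerp weights become numerically unstable.
    """
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    cos_omega = sum(x * y for x, y in zip(a, b)) / (na * nb)
    cos_omega = max(-1.0, min(1.0, cos_omega))  # guard against rounding outside [-1, 1]
    omega = math.acos(cos_omega)
    if math.sin(omega) < 1e-8:
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    wa = math.sin((1 - t) * omega) / math.sin(omega)
    wb = math.sin(t * omega) / math.sin(omega)
    return [wa * x + wb * y for x, y in zip(a, b)]
```

Unlike straight-line interpolation, slerp keeps intermediate codes away from the low-norm interior of the latent space, which is a common motivation for using it with VAE latents.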
Federated Multi-Agent Deep Reinforcement Learning for Dynamic and Flexible 3D Operation of 5G Multi-MAP Networks
Authors: Esteban Catté, Mohamed Sana, Mickael Maman
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2307.06842
Pdf link: https://arxiv.org/pdf/2307.06842
Abstract This paper addresses the efficient management of Mobile Access Points (MAPs), which are Unmanned Aerial Vehicles (UAVs), in 5G networks. We propose a two-level hierarchical architecture that dynamically reconfigures the network while considering Integrated Access-Backhaul (IAB) constraints. The high-layer decision process determines the number of MAPs through consensus, and we develop a joint optimization process to account for co-dependence in network self-management. In the low layer, MAPs manage their placement using a double-attention-based Deep Reinforcement Learning (DRL) model that encourages cooperation without retraining. To improve generalization and reduce complexity, we propose a federated mechanism for training and sharing one placement model for every MAP in the low layer. Additionally, we jointly optimize the placement and backhaul connectivity of MAPs using a multi-objective reward function, considering the impact of varying MAP placement on wireless backhaul connectivity.
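The abstract does not say how the shared placement model is aggregated. A common choice in federated learning is FedAvg-style weighted averaging (McMahan et al.), sketched below under that assumption, with each MAP's parameters flattened into a list and weighted by its number of local samples; the function name is hypothetical.

```python
def federated_average(local_models, num_samples):
    """Weighted average of per-agent parameter vectors (FedAvg-style, hypothetical helper).

    local_models: list of equal-length parameter lists, one per agent (MAP).
    num_samples: number of local training samples per agent; agents with more
    data contribute proportionally more to the shared model.
    """
    total = sum(num_samples)
    n_params = len(local_models[0])
    return [
        sum(model[p] * n for model, n in zip(local_models, num_samples)) / total
        for p in range(n_params)
    ]
```

The averaged parameters would then be broadcast back so that every MAP runs the same placement policy, matching the "one placement model for every MAP" described above.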
The Human Blockage Impact on ARIS Assisted D2D Communication Systems
Authors: Ahmed M. Nor, Octavian Fratu, Simona Halunga
Subjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2307.06856
Pdf link: https://arxiv.org/pdf/2307.06856
Abstract An aerial reconfigurable intelligent surface (ARIS), i.e., an intelligent reflecting surface (IRS) mounted on an unmanned aerial vehicle (UAV), represents a promising candidate for assisting device-to-device (D2D) millimeter wave (mmWave) communication in temporary and urgent situations, e.g., open-air events. In the base-station-to-device use case, an IRS can efficiently mitigate the severe impact of blockage on mmWave signal propagation. However, the D2D scenario is different, as both the transmitter (TX) to ARIS and the ARIS to receiver (RX) links are highly susceptible to blockage due to the low height of the TX and RX. Consequently, in this paper, the impact of human-body blockage on ARIS-aided D2D mmWave communication is studied. First, we confirm the effectiveness of using ARIS in this network to significantly enhance its performance; then, the effect of ARIS height on blockage occurrence and system performance is investigated to find the optimum height. Our results show that ARIS substantially mitigates blockage, reducing it by 85% compared with the case without ARIS. Moreover, a large increase in system spectral efficiency, 1.2 bps/Hz, can be guaranteed if ARIS is configured at the optimum height.
Digital Twinning in Smart Grid Networks: Interplay, Resource Allocation and Use Cases
Authors: Abdullah Othman, Georges Kaddoum, Joao V. C. Evangelista, Minh Au, Basile L. Agba
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2307.06861
Pdf link: https://arxiv.org/pdf/2307.06861
Abstract Motivated by climate change, increasing industrialization, and energy reliability concerns, the smart grid is set to revolutionize traditional power systems. Moreover, the exponential annual rise in the number of grid-connected users and emerging key players, e.g., electric vehicles, strain the limited radio resources, which stresses the need for novel and scalable resource management techniques. Digital twin is a cutting-edge virtualization technology that has shown great potential by offering solutions for inherent bottlenecks in traditional wireless networks. In this article, we set the stage for the various roles digital twinning can fulfill by optimizing congested radio resources in a proactive and resilient smart grid. Digital twins can help smart grid networks through real-time monitoring, advanced precise modeling, and efficient radio resource allocation, both for normal operations and for service restoration following unexpected events. However, reliable real-time communications, intricate abstraction abilities, interoperability with other smart grid technologies, robust computing capabilities, and resilient security schemes are some open challenges for future work on digital twins.
Open Source Reconfigurable Intelligent Surface for the Frequency Range of 5 GHz WiFi
Authors: Markus Heinrichs, Aydin Sezgin, Rainer Kronberger
Subjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2307.06868
Pdf link: https://arxiv.org/pdf/2307.06868
Abstract Reconfigurable Intelligent Surfaces (RIS) have been identified as a potential ingredient to enhance the performance of contemporary wireless communication and sensing systems. Yet, most of the existing devices are either costly or not available for reproduction. To close this gap, a Reconfigurable Intelligent Surface for the frequency range of 5 GHz WiFi is presented in this work. We describe the designed unit cell, which is optimized for the full frequency range of 5.15 to 5.875 GHz. Standard FR4 substrate is used for cost optimization. The measured reflection coefficient of a rectangular RIS prototype with 256 elements is used for RF performance evaluation. Fabrication data and firmware source code are made open source, which makes RIS more readily available for real measurement setups.
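As a quick sanity check on the stated band, the free-space wavelength across 5.15 to 5.875 GHz and the band's fractional bandwidth can be computed directly. This is plain arithmetic, not taken from the paper; the actual unit-cell dimensions and element spacing are not given in the abstract, and wavelengths inside the FR4 substrate would be shorter than these free-space values.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def wavelength_mm(freq_ghz):
    """Free-space wavelength in millimetres for a frequency given in GHz."""
    return C / (freq_ghz * 1e9) * 1e3

f_low, f_high = 5.15, 5.875          # band edges stated in the abstract, GHz
f_center = (f_low + f_high) / 2      # arithmetic centre frequency, GHz
fractional_bw = (f_high - f_low) / f_center

print(f"centre {f_center:.4f} GHz, fractional bandwidth {fractional_bw:.1%}")
for f in (f_low, f_center, f_high):
    lam = wavelength_mm(f)
    print(f"{f:.4f} GHz: lambda = {lam:.1f} mm, lambda/2 = {lam / 2:.1f} mm")
```

Running it gives a centre frequency of about 5.51 GHz, a fractional bandwidth of roughly 13%, and free-space half-wavelengths between about 25.5 mm and 29.1 mm.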
mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs
Authors: Gregor Geigle, Abhay Jain, Radu Timofte, Goran Glavaš
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2307.06930
Pdf link: https://arxiv.org/pdf/2307.06930
Abstract Modular vision-language models (Vision-LLMs) align pretrained image encoders with (pretrained) large language models (LLMs), representing a computationally much more efficient alternative to end-to-end training of large vision-language models from scratch, which is prohibitively expensive for most. Vision-LLMs instead post-hoc condition LLMs to 'understand' the output of an image encoder. With the abundance of readily available high-quality English image-text data as well as monolingual English LLMs, the research focus has been on English-only Vision-LLMs. Multilingual vision-language models are still predominantly obtained via expensive end-to-end pretraining, resulting in comparatively smaller models, trained on limited multilingual image data supplemented with text-only multilingual corpora. In this work, we present mBLIP, the first multilingual Vision-LLM, which we obtain in a computationally efficient manner (on consumer hardware, using only a few million training examples) by leveraging a pretrained multilingual LLM. To this end, we re-align an image encoder previously tuned to an English LLM to a new, multilingual LLM; for this, we leverage multilingual data from a mix of vision-and-language tasks, which we obtain by machine-translating high-quality English data to 95 languages. On the IGLUE benchmark, mBLIP yields results competitive with state-of-the-art models. Moreover, in image captioning on XM3600, mBLIP (zero-shot) even outperforms PaLI-X (a model with 55B parameters). Compared to these very large multilingual vision-language models trained from scratch, we obtain mBLIP by training orders of magnitude fewer parameters on orders of magnitude less data. We release our model and code at https://github.com/gregor-ge/mBLIP.
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
Authors: Syed Talal Wasim, Muhammad Uzair Khattak, Muzammal Naseer, Salman Khan, Mubarak Shah, Fahad Shahbaz Khan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.06947
Pdf link: https://arxiv.org/pdf/2307.06947
Abstract Recent video recognition models use Transformer models for long-range spatio-temporal context modeling. Video transformer designs are based on self-attention, which can model global context at a high computational cost. In comparison, convolutional designs for videos offer an efficient alternative but lack long-range dependency modeling. To achieve the best of both designs, this work proposes Video-FocalNet, an effective and efficient architecture for video recognition that models both local and global contexts. Video-FocalNet is based on a spatio-temporal focal modulation architecture that reverses the interaction and aggregation steps of self-attention for better efficiency. Further, the aggregation step and the interaction step are both implemented using efficient convolution and element-wise multiplication operations that are computationally less expensive than their self-attention counterparts on video representations. We extensively explore the design space of focal modulation-based spatio-temporal context modeling and demonstrate that our parallel spatial and temporal encoding design is the optimal choice. Video-FocalNets perform favorably against state-of-the-art transformer-based models for video recognition on three large-scale datasets (Kinetics-400, Kinetics-600, and SS-v2) at a lower computational cost. Our code/models are released at https://github.com/TalalWasim/Video-FocalNets.
Self-regulating Prompts: Foundational Model Adaptation without Forgetting
Authors: Muhammad Uzair Khattak, Syed Talal Wasim, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.06948
Pdf link: https://arxiv.org/pdf/2307.06948
Abstract Prompt learning has emerged as an efficient alternative to fine-tuning foundational models, such as CLIP, for various downstream tasks. Conventionally trained using the task-specific objective, i.e., cross-entropy loss, prompts tend to overfit downstream data distributions and struggle to capture task-agnostic general features from the frozen CLIP. This leads to the loss of the model's original generalization capability. To address this issue, our work introduces a self-regularization framework for prompting called PromptSRC (Prompting with Self-regulating Constraints). PromptSRC guides the prompts to optimize for both task-specific and task-agnostic general representations using a three-pronged approach: (a) regulating prompted representations via mutual agreement maximization with the frozen model, (b) regulating with a self-ensemble of prompts over the training trajectory to encode their complementary strengths, and (c) regulating with textual diversity to mitigate sample diversity imbalance with the visual branch. To the best of our knowledge, this is the first regularization framework for prompt learning that avoids overfitting by jointly attending to pre-trained model features, the training trajectory during prompting, and the textual diversity. PromptSRC explicitly steers the prompts to learn a representation space that maximizes performance on downstream tasks without compromising CLIP generalization. We perform extensive experiments on 4 benchmarks, where PromptSRC overall performs favorably compared to existing methods. Our code and pre-trained models are publicly available at: https://github.com/muzairkhattak/PromptSRC.
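Point (b), the self-ensemble of prompts over the training trajectory, can be pictured as a weighted average of prompt snapshots saved at each epoch. The sketch below assumes Gaussian weights over epochs; `mu`, `sigma`, and the function names are hypothetical, and the abstract does not specify the exact weighting PromptSRC uses.

```python
import math

def gaussian_weights(num_epochs, mu, sigma):
    """Normalised Gaussian weights over epochs 1..num_epochs (hypothetical helper)."""
    raw = [math.exp(-((e - mu) ** 2) / (2 * sigma ** 2)) for e in range(1, num_epochs + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def aggregate_prompts(prompt_snapshots, mu, sigma):
    """Self-ensemble of prompt vectors saved at the end of each epoch.

    prompt_snapshots[e] is the flattened prompt after epoch e + 1. Moving mu
    emphasises earlier or later epochs; sigma controls how many epochs contribute.
    """
    weights = gaussian_weights(len(prompt_snapshots), mu, sigma)
    dim = len(prompt_snapshots[0])
    return [sum(w * snap[d] for w, snap in zip(weights, prompt_snapshots)) for d in range(dim)]
```

Averaging snapshots from different points of training is one way to combine prompts that generalize well early with prompts that fit the task well late, which is the "complementary strengths" the abstract refers to.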
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
Authors: Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Wei Wei, Tingbo Hou, Yael Pritch, Neal Wadhwa, Michael Rubinstein, Kfir Aberman
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.06949
Pdf link: https://arxiv.org/pdf/2307.06949
Abstract Personalization has emerged as a prominent aspect within the field of generative AI, enabling the synthesis of individuals in diverse contexts and styles while retaining high fidelity to their identities. However, the process of personalization presents inherent challenges in terms of time and memory requirements. Fine-tuning each personalized model needs considerable GPU time investment, and storing a personalized model per subject can be demanding in terms of storage capacity. To overcome these challenges, we propose HyperDreamBooth, a hypernetwork capable of efficiently generating a small set of personalized weights from a single image of a person. By composing these weights into the diffusion model, coupled with fast fine-tuning, HyperDreamBooth can generate a person's face in various contexts and styles, with high subject detail while also preserving the model's crucial knowledge of diverse styles and semantic modifications. Our method achieves personalization on faces in roughly 20 seconds, 25x faster than DreamBooth and 125x faster than Textual Inversion, using as few as one reference image, with the same quality and style diversity as DreamBooth. Our method also yields a model that is 10000x smaller than a normal DreamBooth model. Project page: https://hyperdreambooth.github.io
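The abstract does not specify the format of the "small set of personalized weights". A common way to keep per-subject weights small is a low-rank (LoRA-style) residual added to a frozen layer, and the sketch below assumes that form purely for illustration; `compose_low_rank`, the layer sizes, and the rank are hypothetical.

```python
import numpy as np


def compose_low_rank(W: np.ndarray, A: np.ndarray, B: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Return W + scale * (B @ A): a frozen base weight plus a rank-r personalized residual."""
    assert B.shape[1] == A.shape[0], "inner (rank) dimensions must match"
    return W + scale * (B @ A)


# Toy layer: 64x32 frozen weight, rank-1 personalized factors.
d_out, d_in, r = 64, 32, 1
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))   # shared, frozen base weight
A = rng.standard_normal((r, d_in))       # per-subject factor
B = rng.standard_normal((d_out, r))      # per-subject factor
W_personal = compose_low_rank(W, A, B)

full_params = W.size                     # entries in the full matrix
personal_params = A.size + B.size        # entries stored per subject
```

Here only `A` and `B` (96 numbers) would need to be stored per subject instead of the full 2,048-entry matrix; the ratio is specific to these toy sizes and says nothing about the 10000x figure reported above.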
Keyword: faster

No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Authors: Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Performance (cs.PF)
Arxiv link: https://arxiv.org/abs/2307.06440
Pdf link: https://arxiv.org/pdf/2307.06440
Abstract The computation necessary for training Transformer-based language models has skyrocketed in recent years. This trend has motivated research on efficient training algorithms designed to improve training, validation, and downstream performance faster than standard training. In this work, we revisit three categories of such algorithms: dynamic architectures (layer stacking, layer dropping), batch selection (selective backprop, RHO loss), and efficient optimizers (Lion, Sophia). When pre-training BERT and T5 with a fixed computation budget using such methods, we find that their training, validation, and downstream gains vanish compared to a baseline with a fully-decayed learning rate. We define an evaluation protocol that enables computation to be done on arbitrary machines by mapping all computation time to a reference machine which we call reference system time. We discuss the limitations of our proposed protocol and release our code to encourage rigorous research in efficient training procedures: https://github.com/JeanKaddour/NoTrainNoGain.
Efficient Convolution and Transformer-Based Network for Video Frame Interpolation
Authors: Issa Khalifeh, Luka Murn, Marta Mrak, Ebroul Izquierdo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.06443
Pdf link: https://arxiv.org/pdf/2307.06443
Abstract Video frame interpolation is an increasingly important research task with several key industrial applications in the video coding, broadcast and production sectors. Recently, transformers have been introduced to the field, resulting in substantial performance gains. However, this comes at the cost of greatly increased memory usage and longer training and inference times. In this paper, a novel method integrating a transformer encoder and convolutional features is proposed. This network reduces the memory burden by close to 50% and runs up to four times faster during inference compared to existing transformer-based interpolation methods. A dual-encoder architecture is introduced that combines the strength of convolutions in modelling local correlations with that of the transformer in modelling long-range dependencies. Quantitative evaluations are conducted on various benchmarks with complex motion to showcase the robustness of the proposed method, achieving competitive performance compared to state-of-the-art interpolation networks.
WiscSort: External Sorting For Byte-Addressable Storage
Authors: Vinay Banakar, Kan Wu, Yuvraj Patel, Kimberly Keeton, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
Subjects: Databases (cs.DB); Performance (cs.PF)
Arxiv link: https://arxiv.org/abs/2307.06476
Pdf link: https://arxiv.org/pdf/2307.06476
Abstract We present WiscSort, a new approach to high-performance concurrent sorting for existing and future byte-addressable storage (BAS) devices. WiscSort carefully reduces writes, exploits random reads by splitting keys and values during sorting, and performs interference-aware scheduling with thread pool sizing to avoid I/O bandwidth degradation. We introduce the BRAID model which encompasses the unique characteristics of BAS devices. Many state-of-the-art sorting systems do not comply with the BRAID model and deliver sub-optimal performance, whereas WiscSort demonstrates the effectiveness of complying with BRAID. We show that WiscSort is 2-7x faster than competing approaches on a standard sort benchmark. We evaluate the effectiveness of key-value separation on different key-value sizes and compare our concurrency optimizations with various other concurrency models. Finally, we emulate generic BAS devices and show how our techniques perform well with various combinations of hardware properties.
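The key-value separation idea can be sketched in memory: sort a compact index of keys and record positions, then fetch each value by position in a second pass. On byte-addressable storage that second pass becomes random reads, which such devices serve well. This is a simplified Python illustration with a list standing in for the device; `sort_key_value_separated` is a hypothetical name, and the write reduction and interference-aware scheduling described above are not modeled.

```python
from typing import List, Tuple


def sort_key_value_separated(records: List[Tuple[bytes, bytes]]) -> List[Tuple[bytes, bytes]]:
    """Sort records by key while moving only (key, position) pairs during the sort."""
    # Pass 1: build a compact index; the (possibly large) values are not moved.
    index = [(key, pos) for pos, (key, _value) in enumerate(records)]
    index.sort(key=lambda entry: entry[0])  # stable: equal keys keep input order
    # Pass 2: gather values by position (random reads on the storage device).
    return [(key, records[pos][1]) for key, pos in index]


records = [(b"k3", b"v3"), (b"k1", b"v1"), (b"k2", b"v2")]
sorted_records = sort_key_value_separated(records)
```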
Hybrid Control Policy for Artificial Pancreas via Ensemble Deep Reinforcement Learning
Authors: Wenzhou Lv, Tianyu Wu, Luolin Xiong, Liang Wu, Jian Zhou, Yang Tang, Feng Qi
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.06501
Pdf link: https://arxiv.org/pdf/2307.06501
Abstract Objective: The artificial pancreas (AP) has shown promising potential in achieving closed-loop glucose control for individuals with type 1 diabetes mellitus (T1DM). However, designing an effective control policy for the AP remains challenging due to the complex physiological processes, delayed insulin response, and inaccurate glucose measurements. While model predictive control (MPC) offers safety and stability through the dynamic model and safety constraints, it lacks individualization and is adversely affected by unannounced meals. Conversely, deep reinforcement learning (DRL) provides personalized and adaptive strategies but faces challenges with distribution shifts and substantial data requirements. Methods: We propose a hybrid control policy for the artificial pancreas (HyCPAP) to address the above challenges. HyCPAP combines an MPC policy with an ensemble DRL policy, leveraging the strengths of both policies while compensating for their respective limitations. To facilitate faster deployment of AP systems in real-world settings, we further incorporate meta-learning techniques into HyCPAP, leveraging previous experience and patient-shared knowledge to enable fast adaptation to new patients with limited available data. Results: We conduct extensive experiments using the FDA-accepted UVA/Padova T1DM simulator across three scenarios. Our approaches achieve the highest percentage of time spent in the desired euglycemic range and the lowest occurrences of hypoglycemia. Conclusion: The results clearly demonstrate the superiority of our methods for closed-loop glucose management in individuals with T1DM. Significance: The study presents novel control policies for AP systems, affirming the great potential of proposed methods for efficient closed-loop glucose control.
On the Effective Horizon of Inverse Reinforcement Learning
Authors: Yiqing Xu, Finale Doshi-Velez, David Hsu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.06541
Pdf link: https://arxiv.org/pdf/2307.06541
Abstract Inverse reinforcement learning (IRL) algorithms often rely on (forward) reinforcement learning or planning over a given time horizon to compute an approximately optimal policy for a hypothesized reward function and then match this policy with expert demonstrations. The time horizon plays a critical role in determining both the accuracy of the reward estimate and the computational efficiency of IRL algorithms. Interestingly, an effective time horizon shorter than the ground-truth value often produces better results faster. This work formally analyzes this phenomenon and provides an explanation: the time horizon controls the complexity of an induced policy class and mitigates overfitting with limited data. This analysis leads to a principled choice of the effective horizon for IRL. It also prompts us to reexamine the classic IRL formulation: it is more natural to learn the reward and the effective horizon jointly rather than the reward alone with a given horizon. Our experimental results confirm the theoretical analysis.
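To make the role of the horizon concrete, the sketch below runs standard finite-horizon value iteration on a small tabular MDP and returns the greedy first action for a given horizon. It is not the paper's IRL algorithm; the array layout (`P[a, s, s']`, state-only reward `R[s]`) and the function name `finite_horizon_policy` are assumptions made for illustration.

```python
import numpy as np


def finite_horizon_policy(P: np.ndarray, R: np.ndarray, horizon: int) -> np.ndarray:
    """
    P: transition probabilities, shape (A, S, S), with P[a, s, s2] = Pr(s2 | s, a).
    R: per-state reward, shape (S,).
    Returns the greedy first-step action for each state after `horizon` backups.
    """
    num_actions, num_states, _ = P.shape
    V = np.zeros(num_states)                  # value with 0 steps remaining
    policy = np.zeros(num_states, dtype=int)
    for _ in range(horizon):
        Q = R[None, :] + P @ V                # shape (A, S): reward now + expected future value
        policy = Q.argmax(axis=0)             # ties resolve to the lowest action index
        V = Q.max(axis=0)
    return policy
```

On a three-state chain with a reward only at the last state, horizon 1 yields a policy indifferent to moving, while longer horizons make moving right worthwhile from progressively farther states, which is one way to see the horizon shaping the induced policy class.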
cjdb: a simple, fast, and lean database solution for the CityGML data model
Authors: Leon Powałka, Chris Poon, Yitong Xia, Siebren Meines, Lan Yan, Yuduan Cai, Gina Stavropoulou, Balázs Dukai, Hugo Ledoux
Subjects: Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2307.06621
Pdf link: https://arxiv.org/pdf/2307.06621
Abstract When it comes to storing 3D city models in a database, the implementation of the CityGML data model can be quite demanding and often results in complicated schemas. As an example, 3DCityDB, a widely used solution, depends on a schema of 66 tables that closely mirrors the CityGML architecture. In this paper, we propose an alternative (called cjdb) for storing CityGML models efficiently in PostgreSQL with a much simpler table structure and data model design (only 3 tables are necessary). This is achieved by storing the attributes and geometries of the objects directly in JSON. For the geometries, we thus adopt the Simple Feature paradigm and use the structure of CityJSON. We compare our solution against 3DCityDB with large real-world 3D city models, and we find that cjdb has significantly lower storage demands (around a factor of 10), allows for faster import/export of data, and has comparable data retrieval speed, with some queries being faster and some slower. The accompanying software (importer and exporter) is available at https://github.com/cityjson/cjdb/ under a permissive open-source license.
Tensor Completion via Leverage Sampling and Tensor QR Decomposition for Network Latency Estimation
Authors: Jun Lei, Ji-Qian Zhao, Jing-Qi Wang, An-Bao Xu
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG); Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2307.06848
Pdf link: https://arxiv.org/pdf/2307.06848
Abstract In this paper, we consider network latency estimation; network latency is an important metric of network performance. However, large-scale network latency estimation requires a lot of computing time. Therefore, we propose a new method that is much faster and maintains high accuracy. The data structure of network nodes can form a matrix, and a tensor model can be formed by introducing the time dimension. Thus, the entire problem can be summarized as a tensor completion problem. The main idea of our method is to improve the tensor leverage sampling strategy and to introduce tensor QR decomposition into tensor completion. To achieve faster tensor leverage sampling, we replace the tensor singular value decomposition (t-SVD) with tensor CSVD-QR to approximate the t-SVD. To achieve faster completion of incomplete tensors, we use the tensor $L_{2,1}$-norm rather than the traditional tensor nuclear norm. Furthermore, we introduce tensor QR decomposition into the alternating direction method of multipliers (ADMM) framework. Numerical experiments show that our method is faster than state-of-the-art algorithms with satisfactory accuracy.
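For intuition about the $L_{2,1}$-norm, the matrix case sums the Euclidean norms of the columns, and its proximal operator (the step an ADMM solver applies repeatedly) shrinks each column toward zero. The sketch below shows only this matrix analogue; the paper's tensor $L_{2,1}$-norm and its QR-based ADMM updates may be defined differently, and the function names are hypothetical.

```python
import numpy as np


def l21_norm(X: np.ndarray) -> float:
    """Matrix L_{2,1}-norm: sum of the Euclidean norms of the columns of X."""
    return float(np.linalg.norm(X, axis=0).sum())


def prox_l21(X: np.ndarray, tau: float) -> np.ndarray:
    """argmin_Z tau * ||Z||_{2,1} + 0.5 * ||Z - X||_F^2, solved column by column."""
    norms = np.linalg.norm(X, axis=0)
    # Columns with norm <= tau are set to zero; others are scaled down by (1 - tau / norm).
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return X * scale[None, :]


X = np.array([[3.0, 0.0],
              [4.0, 0.0]])
shrunk = prox_l21(X, tau=1.0)
```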
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
Authors: Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Wei Wei, Tingbo Hou, Yael Pritch, Neal Wadhwa, Michael Rubinstein, Kfir Aberman
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.06949
Pdf link: https://arxiv.org/pdf/2307.06949
Abstract Personalization has emerged as a prominent aspect within the field of generative AI, enabling the synthesis of individuals in diverse contexts and styles while retaining high fidelity to their identities. However, the process of personalization presents inherent challenges in terms of time and memory requirements. Fine-tuning each personalized model needs considerable GPU time investment, and storing a personalized model per subject can be demanding in terms of storage capacity. To overcome these challenges, we propose HyperDreamBooth, a hypernetwork capable of efficiently generating a small set of personalized weights from a single image of a person. By composing these weights into the diffusion model, coupled with fast fine-tuning, HyperDreamBooth can generate a person's face in various contexts and styles, with high subject detail while also preserving the model's crucial knowledge of diverse styles and semantic modifications. Our method achieves personalization on faces in roughly 20 seconds, 25x faster than DreamBooth and 125x faster than Textual Inversion, using as few as one reference image, with the same quality and style diversity as DreamBooth. Our method also yields a model that is 10000x smaller than a normal DreamBooth model. Project page: https://hyperdreambooth.github.io
Keyword: mobile

Scientific mobility, prestige and skill alignment in academic institutions
Authors: Marcia Ferreira, Rodrigo Costas, Vito Servedio, Stefan Thurner
Subjects: Digital Libraries (cs.DL); Physics and Society (physics.soc-ph)
Arxiv link: https://arxiv.org/abs/2307.06426
Pdf link: https://arxiv.org/pdf/2307.06426
Abstract Scientific institutions play a crucial role in driving intellectual, social, and technological progress. Their capacity to innovate depends mainly on their ability to attract, retain, and nurture scientific talent and ultimately make it available to other organizations, industries, or the economy. As researchers change institutions during their careers, their skills are also transferred. The extent and mechanisms by which academic institutions manage their internal portfolio of scientific skills by attracting and sending researchers are far from being understood. We examine 25 million publication histories of 9.2 million scientists extracted from a large-scale bibliographic database covering thousands of research institutions worldwide to understand how the skills of mobile scientists align with those present in-house. We find a clear association between top-ranked institutions and greater skill alignment, i.e., the degree to which skills of incoming academics match those of their colleagues at the institution. We uncover similar high-alignment for scientists leaving top-ranked institutions. This type of academic alignment is more pronounced in engineering and life, health, earth, and physical sciences than in mathematics, computer science, social sciences, and the humanities. We show that over the past two decades, institutions generally have become more closely aligned in their overall skill profiles. We interpret these results in terms of levels of proactive management of the composition of the scientific workforce, diversity, and internal collaboration strategies at the institutional level.
Regression-Oriented Knowledge Distillation for Lightweight Ship Orientation Angle Prediction with Optical Remote Sensing Images
Authors: Zhan Shi, Xin Ding, Peng Ding, Chun Yang, Ru Huang, Xiaoxuan Song
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.06566
Pdf link: https://arxiv.org/pdf/2307.06566
Abstract Ship orientation angle prediction (SOAP) with optical remote sensing images is an important image processing task, which often relies on deep convolutional neural networks (CNNs) to make accurate predictions. This paper proposes a novel framework to reduce the model sizes and computational costs of SOAP models without harming prediction accuracy. First, a new SOAP model called Mobile-SOAP is designed based on MobileNetV2, achieving state-of-the-art prediction accuracy. Four tiny SOAP models are also created by replacing the convolutional blocks in Mobile-SOAP with four small-scale networks, respectively. Then, to transfer knowledge from Mobile-SOAP to the four lightweight models, we propose a novel knowledge distillation (KD) framework termed SOAP-KD, consisting of a novel feature-based guidance loss and an optimized synthetic-samples-based knowledge transfer mechanism. Lastly, extensive experiments on the FGSC-23 dataset confirm the superiority of Mobile-SOAP over existing models and also demonstrate the effectiveness of SOAP-KD in improving the prediction performance of the four specially designed tiny models. Notably, with SOAP-KD, the test mean absolute error of the ShuffleNetV2x1.0-based model is only 8% higher than that of Mobile-SOAP, while its number of parameters and multiply-accumulate operations (MACs) are 61.6% and 60.8% lower, respectively.
Multivariate Time Series characterization and forecasting of VoIP traffic in real mobile networks
Authors: Mario Di Mauro, Giovanni Galatro, Fabio Postiglione, Wei Song, Antonio Liotta
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2307.06645
Pdf link: https://arxiv.org/pdf/2307.06645
Abstract Predicting the behavior of real-time traffic (e.g., VoIP) in mobility scenarios could help operators better plan their network infrastructures and optimize the allocation of resources. Accordingly, in this work the authors propose a forecasting analysis of crucial QoS/QoE descriptors (some of which have been neglected in the technical literature) of VoIP traffic in a real mobile environment. The problem is formulated in terms of a multivariate time series analysis. Such a formalization makes it possible to discover and model the temporal relationships among various descriptors and to forecast their behavior for future periods. Techniques such as Vector Autoregressive models and machine learning (deep-based and tree-based) approaches are employed and compared in terms of performance and time complexity, by reframing the multivariate time series problem into a supervised learning one. Moreover, a series of auxiliary analyses (stationarity, orthogonal impulse responses, etc.) is performed to discover the analytical structure of the time series and to provide deep insights into their relationships. The whole theoretical analysis has an experimental counterpart, since a set of trials across a real-world LTE-Advanced environment has been performed to collect, post-process and analyze about 600,000 voice packets, organized per flow and differentiated per codec.
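Reframing a multivariate time series as a supervised learning problem is commonly done with a sliding window: the last few observations of all descriptors become the input features and a later observation becomes the target. A minimal NumPy sketch, assuming an array of shape `(T, D)` and a hypothetical helper `to_supervised`; the lag count and horizon used in the paper are not stated here.

```python
import numpy as np


def to_supervised(series: np.ndarray, n_lags: int, horizon: int = 1):
    """
    series: array of shape (T, D) holding D descriptors over T time steps.
    Returns X with shape (N, n_lags * D) and y with shape (N, D): each row of X
    holds the n_lags most recent observations, and y is the observation
    `horizon` steps after the last of them.
    """
    T, D = series.shape
    X, y = [], []
    for t in range(n_lags, T - horizon + 1):
        X.append(series[t - n_lags:t].reshape(-1))  # flatten the lag window
        y.append(series[t + horizon - 1])           # target row
    return np.array(X), np.array(y)


# Usage: 5 time steps of 2 descriptors (e.g., two QoS metrics), 2 lags, one-step-ahead target.
series = np.arange(10, dtype=float).reshape(5, 2)
X, y = to_supervised(series, n_lags=2)
```

Any tabular regressor (tree-based or neural) can then be fitted on `X` and `y`, which is what makes the comparison with Vector Autoregressive models possible.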
Federated Multi-Agent Deep Reinforcement Learning for Dynamic and Flexible 3D Operation of 5G Multi-MAP Networks
Authors: Esteban Catté, Mohamed Sana, Mickael Maman
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2307.06842
Pdf link: https://arxiv.org/pdf/2307.06842
Abstract This paper addresses the efficient management of Mobile Access Points (MAPs), which are Unmanned Aerial Vehicles (UAVs), in 5G networks. We propose a two-level hierarchical architecture, which dynamically reconfigures the network while considering Integrated Access-Backhaul (IAB) constraints. The high-layer decision process determines the number of MAPs through consensus, and we develop a joint optimization process to account for co-dependence in network self-management. In the low layer, MAPs manage their placement using a double-attention based Deep Reinforcement Learning (DRL) model that encourages cooperation without retraining. To improve generalization and reduce complexity, we propose a federated mechanism for training and sharing one placement model for every MAP in the low layer. Additionally, we jointly optimize the placement and backhaul connectivity of MAPs using a multi-objective reward function, considering the impact of varying MAP placement on wireless backhaul connectivity.
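The abstract does not say how the shared placement model is aggregated. Federated averaging (FedAvg), where each client's parameters are weighted by its share of the training data, is a standard choice and is assumed in this sketch; `federated_average` and the per-client sample counts are illustrative, not the paper's mechanism.

```python
from typing import List, Sequence

import numpy as np


def federated_average(client_weights: Sequence[List[np.ndarray]],
                      client_sizes: Sequence[int]) -> List[np.ndarray]:
    """FedAvg-style aggregation: sample-weighted mean of each parameter tensor across clients."""
    total = float(sum(client_sizes))
    averaged = []
    for tensors in zip(*client_weights):  # the same parameter tensor from every client
        averaged.append(sum(t * (n / total) for t, n in zip(tensors, client_sizes)))
    return averaged


# Usage: two MAP agents, each holding a two-tensor model; agent 0 saw twice as many samples.
agent0 = [np.array([0.0, 0.0]), np.array([[1.0]])]
agent1 = [np.array([3.0, 6.0]), np.array([[4.0]])]
shared = federated_average([agent0, agent1], [2, 1])
```

The averaged parameters would then be broadcast back to every agent, so all MAPs share one placement model.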
Measuring a Low-Earth-Orbit Satellite Network
Authors: Jianping Pan, Jinwei Zhao, Lin Cai
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2307.06863
Pdf link: https://arxiv.org/pdf/2307.06863
Abstract Starlink and similar systems have attracted a lot of attention recently; however, the inner workings of these low-earth-orbit (LEO) satellite networks are still largely unknown. This paper presents an ongoing measurement campaign focusing on Starlink, including its satellite access networks, gateway and point-of-presence structures, and backbone and Internet connections, revealing insights applicable to other LEO satellite providers. It also highlights the challenges and research opportunities of the integrated space-air-ground-aqua network envisioned by 6G mobile communication systems, and calls for a concerted community effort from practical and experimental perspectives.
Keyword: pruning

There is no result

Keyword: diffusion

Improving Nonalcoholic Fatty Liver Disease Classification Performance With Latent Diffusion Models
Authors: Romain Hardy, Cornelia Ilin, Joe Klepich, Ryan Mitchell, Steve Hall, Jericho Villareal
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.06507
Pdf link: https://arxiv.org/pdf/2307.06507
Abstract Integrating deep learning with clinical expertise holds great potential for addressing healthcare challenges and empowering medical professionals with improved diagnostic tools. However, the need for annotated medical images is often an obstacle to leveraging the full power of machine learning models. Our research demonstrates that by combining synthetic images, generated using diffusion models, with real images, we can enhance nonalcoholic fatty liver disease (NAFLD) classification performance. We evaluate the quality of the synthetic images by comparing two metrics: Inception Score (IS) and Fréchet Inception Distance (FID), computed on diffusion-generated images and generative adversarial networks (GANs)-generated images. Our results show superior performance for the diffusion-generated images, with a maximum IS score of $1.90$ compared to $1.67$ for GANs, and a minimum FID score of $69.45$ compared to $99.53$ for GANs. Utilizing a partially frozen CNN backbone (EfficientNet v1), our synthetic augmentation method achieves a maximum image-level ROC AUC of $0.904$ on a NAFLD prediction task.
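The FID values above compare Gaussians fitted to Inception features of real and generated images via $\|\mu_1-\mu_2\|^2 + \mathrm{Tr}(\Sigma_1+\Sigma_2-2(\Sigma_1\Sigma_2)^{1/2})$. The sketch below computes that distance from precomputed means and covariances (the Inception feature extraction is omitted) using only NumPy; the function names are hypothetical.

```python
import numpy as np


def _sqrtm_psd(M: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    M = 0.5 * (M + M.T)                      # remove round-off asymmetry
    vals, vecs = np.linalg.eigh(M)
    vals = np.clip(vals, 0.0, None)          # clamp tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(mu1, sigma1, mu2, sigma2) -> float:
    """||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2}) between two Gaussians."""
    diff = mu1 - mu2
    s1_half = _sqrtm_psd(sigma1)
    # Tr((S1 S2)^{1/2}) equals Tr((S1^{1/2} S2 S1^{1/2})^{1/2}); the latter is symmetric PSD.
    cross = _sqrtm_psd(s1_half @ sigma2 @ s1_half)
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.trace(cross))
```

Using the similarity identity keeps every matrix square root symmetric, so `numpy.linalg.eigh` suffices and no general (possibly complex) matrix square root is needed. Lower values indicate feature statistics closer to those of the real images.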
AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion
Authors: Shuo Huang, Zongxin Yang, Liangting Li, Yi Yang, Jia Jia
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2307.06526
Pdf link: https://arxiv.org/pdf/2307.06526
Abstract Large-scale pre-trained vision-language models allow for the zero-shot text-based generation of 3D avatars. The previous state-of-the-art method utilized CLIP to supervise neural implicit models that reconstructed a human body mesh. However, this approach has two limitations. Firstly, the lack of avatar-specific models can cause facial distortion and unrealistic clothing in the generated avatars. Secondly, CLIP only provides optimization direction for the overall appearance, resulting in less impressive results. To address these limitations, we propose AvatarFusion, the first framework to use a latent diffusion model to provide pixel-level guidance for generating human-realistic avatars while simultaneously segmenting clothing from the avatar's body. AvatarFusion includes the first clothing-decoupled neural implicit avatar model that employs a novel Dual Volume Rendering strategy to render the decoupled skin and clothing sub-models in one space. We also introduce a novel optimization method, called Pixel-Semantics Difference-Sampling (PS-DS), which semantically separates the generation of body and clothes, and generates a variety of clothing styles. Moreover, we establish the first benchmark for zero-shot text-to-avatar generation. Our experimental results demonstrate that our framework outperforms previous approaches, with significant improvements observed in all metrics. Additionally, since our model is clothing-decoupled, we can exchange the clothes of avatars. Code will be available on Github.
Wavelet-based Edge Multiscale Parareal Algorithm for subdiffusion equations with heterogeneous coefficients in a large time domain
Authors: Guanglian Li
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2307.06529
Pdf link: https://arxiv.org/pdf/2307.06529
Abstract We present the Wavelet-based Edge Multiscale Parareal (WEMP) Algorithm, recently proposed in [Li and Hu, J. Comput. Phys., 2021], for efficiently solving subdiffusion equations with heterogeneous coefficients over long time intervals. This algorithm combines the benefits of multiscale methods, which can handle heterogeneity in the spatial domain, with the strength of parareal algorithms in speeding up time-evolution problems when sufficient processors are available. Our algorithm overcomes the challenge posed by the nonlocality of the fractional derivative in previous work on parabolic problems by constructing an auxiliary problem on each coarse temporal subdomain to completely decouple the temporal variable. We prove the approximation properties of the correction operator and derive a new sum-of-exponentials approximation to generate a single-step time-stepping scheme, with the number of terms of $\mathcal{O}(|\log{\tau_f}|^2)$ independent of the final time, where $\tau_f$ is the fine-scale time step size. We establish the convergence rate of our algorithm in terms of the mesh size in the spatial domain, the level parameter used in the multiscale method, the coarse-scale time step size, and the fine-scale time step size. Finally, we present several numerical tests that demonstrate the effectiveness of our algorithm and validate our theoretical results.
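The parareal component can be illustrated on a scalar decay equation $y' = -\lambda y$ with the classic update $U_{n+1}^{k+1} = \mathcal{G}(U_n^{k+1}) + \mathcal{F}(U_n^{k}) - \mathcal{G}(U_n^{k})$, where $\mathcal{G}$ is a cheap coarse propagator and $\mathcal{F}$ an accurate fine one. This is a generic textbook parareal sketch with explicit Euler propagators; it does not include WEMP's multiscale spatial solver, the auxiliary problems, or the treatment of the fractional derivative.

```python
import numpy as np


def euler(y0: float, t0: float, t1: float, lam: float, n_steps: int) -> float:
    """Explicit Euler for y' = -lam * y on [t0, t1] using n_steps equal steps."""
    y, dt = y0, (t1 - t0) / n_steps
    for _ in range(n_steps):
        y = y + dt * (-lam * y)
    return y


def parareal(y0: float, T: float, n_slices: int, lam: float, n_iter: int,
             fine_steps: int = 100):
    t = np.linspace(0.0, T, n_slices + 1)

    def coarse(y, n):   # G: one Euler step per time slice (cheap, inaccurate)
        return euler(y, t[n], t[n + 1], lam, 1)

    def fine(y, n):     # F: many Euler steps per time slice (expensive, accurate)
        return euler(y, t[n], t[n + 1], lam, fine_steps)

    U = np.empty(n_slices + 1)
    U[0] = y0
    for n in range(n_slices):                 # initial serial coarse sweep
        U[n + 1] = coarse(U[n], n)
    for _ in range(n_iter):
        # The fine solves are independent across slices; this is the part
        # that parareal distributes over separate processors.
        F_old = [fine(U[n], n) for n in range(n_slices)]
        G_old = [coarse(U[n], n) for n in range(n_slices)]
        U_new = np.empty_like(U)
        U_new[0] = y0
        for n in range(n_slices):             # serial predictor-corrector update
            U_new[n + 1] = coarse(U_new[n], n) + F_old[n] - G_old[n]
        U = U_new
    return t, U
```

After as many iterations as there are time slices, the result matches the serial fine solution up to round-off, and even a single iteration typically reduces the coarse error.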
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
Authors: Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Wei Wei, Tingbo Hou, Yael Pritch, Neal Wadhwa, Michael Rubinstein, Kfir Aberman
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.06949
Pdf link: https://arxiv.org/pdf/2307.06949
Abstract Personalization has emerged as a prominent aspect within the field of generative AI, enabling the synthesis of individuals in diverse contexts and styles while retaining high fidelity to their identities. However, the process of personalization presents inherent challenges in terms of time and memory requirements. Fine-tuning each personalized model needs considerable GPU time investment, and storing a personalized model per subject can be demanding in terms of storage capacity. To overcome these challenges, we propose HyperDreamBooth, a hypernetwork capable of efficiently generating a small set of personalized weights from a single image of a person. By composing these weights into the diffusion model, coupled with fast fine-tuning, HyperDreamBooth can generate a person's face in various contexts and styles, with high subject detail while also preserving the model's crucial knowledge of diverse styles and semantic modifications. Our method achieves personalization on faces in roughly 20 seconds, 25x faster than DreamBooth and 125x faster than Textual Inversion, using as few as one reference image, with the same quality and style diversity as DreamBooth. Our method also yields a model that is 10000x smaller than a normal DreamBooth model. Project page: https://hyperdreambooth.github.io
Keyword: adaptive

Trainability, Expressivity and Interpretability in Gated Neural ODEs
Authors: Timothy Doyeon Kim, Tankut Can, Kamesh Krishnamurthy
Subjects: Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
Arxiv link: https://arxiv.org/abs/2307.06398
Pdf link: https://arxiv.org/pdf/2307.06398
Abstract Understanding how the dynamics in biological and artificial neural networks implement the computations required for a task is a salient open question in machine learning and neuroscience. In particular, computations requiring complex memory storage and retrieval pose a significant challenge for these networks to implement or learn. Recently, a family of models described by neural ordinary differential equations (nODEs) has emerged as powerful dynamical neural network models capable of capturing complex dynamics. Here, we extend nODEs by endowing them with adaptive timescales using gating interactions. We refer to these as gated neural ODEs (gnODEs). Using a task that requires memory of continuous quantities, we demonstrate the inductive bias of the gnODEs to learn (approximate) continuous attractors. We further show how reduced-dimensional gnODEs retain their modeling power while greatly improving interpretability, even allowing explicit visualization of the structure of learned attractors. We introduce a novel measure of expressivity which probes the capacity of a neural network to generate complex trajectories. Using this measure, we explore how the phase-space dimension of the nODEs and the complexity of the function modeling the flow field contribute to expressivity. We see that a more complex function for modeling the flow field allows a lower-dimensional nODE to capture a given target dynamics. Finally, we demonstrate the benefit of gating in nODEs on several real-world tasks.
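One common way to give an ODE state adaptive timescales with gating is $\dot h = g \odot (-h + f(h, x))$, where a sigmoid gate $g$ scales the rate at which each unit relaxes toward $f$. The sketch below takes one explicit Euler step of that form; the exact gnODE parameterization in the paper may differ, and the parameter tuple and the name `gnode_step` are assumptions.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def gnode_step(h: np.ndarray, x: np.ndarray, params, dt: float) -> np.ndarray:
    """One Euler step of dh/dt = g * (-h + tanh(W_h h + U_h x + b_h)),
    with gate g = sigmoid(W_g h + U_g x + b_g) (assumed parameterization)."""
    W_h, U_h, b_h, W_g, U_g, b_g = params
    g = sigmoid(W_g @ h + U_g @ x + b_g)   # per-unit gate in (0, 1): sets the local timescale
    f = np.tanh(W_h @ h + U_h @ x + b_h)   # target the state relaxes toward
    return h + dt * g * (-h + f)
```

A gate near 0 nearly freezes the state (long memory), and a gate near 1 lets it move quickly toward the target.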
Hybrid Control Policy for Artificial Pancreas via Ensemble Deep Reinforcement Learning
Authors: Wenzhou Lv, Tianyu Wu, Luolin Xiong, Liang Wu, Jian Zhou, Yang Tang, Feng Qi
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.06501
Pdf link: https://arxiv.org/pdf/2307.06501
Abstract Objective: The artificial pancreas (AP) has shown promising potential in achieving closed-loop glucose control for individuals with type 1 diabetes mellitus (T1DM). However, designing an effective control policy for the AP remains challenging due to the complex physiological processes, delayed insulin response, and inaccurate glucose measurements. While model predictive control (MPC) offers safety and stability through the dynamic model and safety constraints, it lacks individualization and is adversely affected by unannounced meals. Conversely, deep reinforcement learning (DRL) provides personalized and adaptive strategies but faces challenges with distribution shifts and substantial data requirements. Methods: We propose a hybrid control policy for the artificial pancreas (HyCPAP) to address the above challenges. HyCPAP combines an MPC policy with an ensemble DRL policy, leveraging the strengths of both policies while compensating for their respective limitations. To facilitate faster deployment of AP systems in real-world settings, we further incorporate meta-learning techniques into HyCPAP, leveraging previous experience and patient-shared knowledge to enable fast adaptation to new patients with limited available data. Results: We conduct extensive experiments using the FDA-accepted UVA/Padova T1DM simulator across three scenarios. Our approaches achieve the highest percentage of time spent in the desired euglycemic range and the lowest occurrences of hypoglycemia. Conclusion: The results clearly demonstrate the superiority of our methods for closed-loop glucose management in individuals with T1DM. Significance: The study presents novel control policies for AP systems, affirming the great potential of proposed methods for efficient closed-loop glucose control.
Free-Form Composition Networks for Egocentric Action Recognition
Authors: Haoran Wang, Qinghua Cheng, Baosheng Yu, Yibing Zhan, Dapeng Tao, Liang Ding, Haibin Ling
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.06527
Pdf link: https://arxiv.org/pdf/2307.06527
Abstract Egocentric action recognition is gaining significant attention in the field of human action recognition. In this paper, we address the data scarcity issue in egocentric action recognition from a compositional generalization perspective. To tackle this problem, we propose a free-form composition network (FFCN) that can simultaneously learn disentangled verb, preposition, and noun representations, and then use them to compose new samples in the feature space for rare classes of action videos. First, we use a graph to capture the spatial-temporal relations among different hand/object instances in each action video. We then decompose each action into a set of verb and preposition spatial-temporal representations using the edge features in the graph. The temporal decomposition extracts verb and preposition representations from different video frames, while the spatial decomposition adaptively learns verb and preposition representations from action-related instances in each frame. With these spatial-temporal representations of verbs and prepositions, we can compose new samples for rare classes in a free-form manner, which is not restricted to the rigid form of a verb and a noun. The proposed FFCN can directly generate new training samples for rare classes, thereby significantly improving action recognition performance. We evaluated our method on three popular egocentric action recognition datasets, Something-Something V2, H2O, and EPIC-KITCHENS-100, and the experimental results demonstrate the effectiveness of the proposed method for handling data scarcity problems, including long-tailed and few-shot egocentric action recognition.
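To make the idea of feature-space composition concrete, here is a toy sketch of how new samples for a rare (verb, preposition, noun) class could be assembled from component features observed in other videos. It is not FFCN: the component banks, the function name `compose_samples`, and composition by plain concatenation are all hypothetical choices for illustration. "Free-form" is modeled only by allowing the preposition to be absent, in which case a zero vector fills its slot.

```python
import random
from typing import Dict, List, Optional, Tuple

Vector = List[float]
ActionLabel = Tuple[str, Optional[str], str]  # (verb, preposition or None, noun)


def compose_samples(
    verb_bank: Dict[str, List[Vector]],
    prep_bank: Dict[str, List[Vector]],
    noun_bank: Dict[str, List[Vector]],
    rare_classes: List[ActionLabel],
    per_class: int,
    seed: int = 0,
) -> List[Tuple[ActionLabel, Vector]]:
    """Synthesize feature-space samples for rare action classes (hypothetical helper).

    Each component feature is drawn from any video containing that component,
    so a rare combination can be assembled from parts seen in frequent classes.
    Composition is plain concatenation, an assumption for illustration.
    """
    rng = random.Random(seed)
    # Dimension of a preposition feature, used to zero-fill when it is absent.
    prep_dim = len(next(iter(prep_bank.values()))[0])
    samples = []
    for verb, prep, noun in rare_classes:
        for _ in range(per_class):
            v = rng.choice(verb_bank[verb])
            # Free-form: a label need not contain a preposition.
            p = rng.choice(prep_bank[prep]) if prep is not None else [0.0] * prep_dim
            n = rng.choice(noun_bank[noun])
            samples.append(((verb, prep, noun), v + p + n))
    return samples
```

In a real pipeline the composed vectors would be added to the training set of the classifier head; the learned disentanglement described in the abstract is what makes such recombination meaningful.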
Domain-adaptive Person Re-identification without Cross-camera Paired Samples
Authors: Huafeng Li, Yanmei Mao, Yafei Zhang, Guanqiu Qi, Zhengtao Yu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.06533
Pdf link: https://arxiv.org/pdf/2307.06533
Abstract Existing person re-identification (re-ID) research mainly focuses on pedestrian identity matching across cameras in adjacent areas. In practice, however, pedestrian identities must often be matched across long-distance scenes. Cross-camera pedestrian samples collected from such scenes often contain no positive pairs, which makes it extremely challenging to achieve cross-region identity matching using only cross-camera negative samples. Therefore, a novel domain-adaptive person re-ID method is proposed that focuses on learning cross-camera consistent discriminative features under the supervision of unpaired samples. The method mainly consists of a category synergy co-promotion module (CSCM) and a cross-camera consistent feature learning module (CCFLM). In CSCM, a task-specific feature recombination (FRT) mechanism is proposed. This mechanism first groups features according to their contributions to specific tasks. Then an interactive promotion learning (IPL) scheme between feature groups is developed and embedded in this mechanism to enhance feature discriminability. Because dividing features by task reduces the control parameters of each task-specific model, the generalization ability of the model is improved. In CCFLM, instance-level feature distribution alignment and cross-camera identity-consistent learning methods are constructed. Supervised model training is thus performed under the style supervision of the target domain by exchanging styles between source-domain and target-domain samples, and the lack of cross-camera paired samples is addressed by exploiting cross-camera similar samples. In experiments, three challenging datasets are used as target domains, and the effectiveness of the proposed method is demonstrated under four experimental settings.
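The abstract mentions exchanging styles between source-domain and target-domain samples but does not say how. One widely used way to transfer "style" between feature maps is adaptive instance normalization (AdaIN), which gives each channel of one map the mean and standard deviation of the other. The sketch below uses AdaIN as a stand-in under that assumption; it is not necessarily the paper's mechanism, and feature maps are represented as plain lists of per-channel values.

```python
import statistics
from typing import List, Tuple

FeatureMap = List[List[float]]  # channels x flattened spatial positions


def channel_stats(x: FeatureMap, eps: float = 1e-5) -> Tuple[List[float], List[float]]:
    """Per-channel mean and (epsilon-stabilized) standard deviation."""
    means = [statistics.fmean(c) for c in x]
    stds = [(statistics.pvariance(c) + eps) ** 0.5 for c in x]
    return means, stds


def exchange_style(content: FeatureMap, style: FeatureMap) -> FeatureMap:
    """AdaIN: normalize each content channel, then rescale to the style channel's stats."""
    c_mean, c_std = channel_stats(content)
    s_mean, s_std = channel_stats(style)
    return [
        [(v - cm) / cs * ss + sm for v in ch]
        for ch, cm, cs, sm, ss in zip(content, c_mean, c_std, s_mean, s_std)
    ]


def swap_styles(a: FeatureMap, b: FeatureMap) -> Tuple[FeatureMap, FeatureMap]:
    """Return (a rendered in b's style, b rendered in a's style)."""
    return exchange_style(a, b), exchange_style(b, a)
```

Rendering a labeled source sample in a target camera's style keeps its identity label while exposing the model to target-domain statistics, which is the intuition behind training "under the style supervision of the target domain".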
MPR-Net:Multi-Scale Pattern Reproduction Guided Universality Time Series Interpretable Forecasting
Authors: Tianlong Zhao, Xiang Ma, Xuemei Li, Caiming Zhang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.06736
Pdf link: https://arxiv.org/pdf/2307.06736
Abstract Time series forecasting has received wide interest from existing research due to its broad applications and inherent challenges. The research challenge lies in identifying effective patterns in historical series and applying them to future forecasting. Advanced models based on point-wise connected MLP and Transformer architectures have strong fitting power, but their quadratic computational complexity limits their practicality. Additionally, those structures inherently disrupt the temporal order, reducing information utilization and making the forecasting process uninterpretable. To solve these problems, this paper proposes a forecasting model, MPR-Net. It first adaptively decomposes multi-scale historical series patterns using convolution operations, then constructs a pattern extension forecasting method based on the prior knowledge of pattern reproduction, and finally reconstructs future patterns into future series using deconvolution operations. By leveraging the temporal dependencies present in the time series, MPR-Net not only achieves linear time complexity but also makes the forecasting process interpretable. In extensive experiments on more than ten real datasets covering both short- and long-term forecasting tasks, MPR-Net achieves state-of-the-art forecasting performance, as well as good generalization and robustness.
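The decompose, extend, and reconstruct pipeline can be illustrated with a deliberately simple, non-learned analogue. In this sketch fixed moving-average kernels stand in for the learned convolutions, each fine-scale component is extended by repeating its last period (a naive form of "pattern reproduction"), the coarsest trend is extrapolated linearly, and plain summation stands in for deconvolution. The period is assumed known, and the kernel sizes are arbitrary; this is not MPR-Net, only a demonstration that such a pipeline runs in linear time.

```python
from typing import List, Sequence


def moving_average(x: Sequence[float], k: int) -> List[float]:
    """Centered moving average (odd k) with edge replication, O(n) via prefix sums."""
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd integer")
    h = k // 2
    padded = [x[0]] * h + list(x) + [x[-1]] * h
    prefix = [0.0]
    for v in padded:
        prefix.append(prefix[-1] + v)
    return [(prefix[i + k] - prefix[i]) / k for i in range(len(x))]


def decompose(x: Sequence[float], kernels: Sequence[int]) -> List[List[float]]:
    """Split x into fine-to-coarse components that sum back to x."""
    comps, current = [], list(x)
    for k in kernels:  # increasing kernel sizes mean coarser scales
        smooth = moving_average(current, k)
        comps.append([a - b for a, b in zip(current, smooth)])
        current = smooth
    comps.append(current)  # coarsest component acts as the trend
    return comps


def forecast(x: Sequence[float], horizon: int, period: int,
             kernels: Sequence[int] = (3, 7)) -> List[float]:
    if period < 1 or len(x) <= period:
        raise ValueError("need 1 <= period < len(x)")
    *patterns, trend = decompose(x, kernels)
    out = [0.0] * horizon
    for comp in patterns:
        last = comp[-period:]  # reproduce the most recent pattern cycle
        for t in range(horizon):
            out[t] += last[t % period]
    slope = (trend[-1] - trend[-1 - period]) / period
    for t in range(horizon):
        out[t] += trend[-1] + slope * (t + 1)
    return out
```

Every step is a single pass over the series per scale, so the cost grows linearly with the input length, in contrast to the quadratic cost of full attention.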
Vehicle Dispatching and Routing of On-Demand Intercity Ride-Pooling Services: A Multi-Agent Hierarchical Reinforcement Learning Approach
Authors: Jinhua Si, Fang He, Xi Lin, Xindi Tang
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.06742
Pdf link: https://arxiv.org/pdf/2307.06742
Abstract The integrated development of city clusters has given rise to an increasing demand for intercity travel. Intercity ride-pooling service exhibits considerable potential in upgrading traditional intercity bus services by implementing demand-responsive enhancements. Nevertheless, its online operation suffers from inherent complexity arising from the coupling of vehicle resource allocation among cities with pooled-ride vehicle routing. To tackle these challenges, this study proposes a two-level framework designed to facilitate online fleet management. Specifically, a novel multi-agent feudal reinforcement learning model is proposed at the upper level of the framework to cooperatively assign idle vehicles to different intercity lines, while the lower level updates the routes of vehicles using an adaptive large neighborhood search heuristic. Numerical studies based on a realistic dataset from Xiamen and its surrounding cities in China show that the proposed framework effectively mitigates supply and demand imbalances and achieves significant improvements in both the average daily system profit and the order fulfillment ratio.
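The lower level of the framework relies on adaptive large neighborhood search (ALNS). A minimal single-vehicle sketch of the general technique follows: destroy operators remove customers, a greedy repair operator reinserts them, a simulated-annealing rule decides acceptance, and each destroy operator's selection weight adapts to how well it has recently performed. Assumptions: one vehicle, Euclidean distances, the depot at index 0, and no capacities, time windows, or pickup-delivery pairing, all of which a real intercity ride-pooling router would need. It is not the paper's implementation.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def route_cost(route: List[int], pts: List[Point]) -> float:
    """Length of the closed tour depot(0) -> route -> depot(0)."""
    tour = [0] + route + [0]
    return sum(math.dist(pts[a], pts[b]) for a, b in zip(tour, tour[1:]))


def random_removal(route, pts, q, rng):
    """Destroy operator: drop q customers chosen uniformly at random."""
    removed = rng.sample(route, q)
    return [c for c in route if c not in removed], removed


def worst_removal(route, pts, q, rng):
    """Destroy operator: drop the q customers whose removal saves the most distance."""
    def saving(i):
        prev = route[i - 1] if i > 0 else 0
        nxt = route[i + 1] if i + 1 < len(route) else 0
        c = route[i]
        return (math.dist(pts[prev], pts[c]) + math.dist(pts[c], pts[nxt])
                - math.dist(pts[prev], pts[nxt]))
    idx = sorted(range(len(route)), key=saving, reverse=True)[:q]
    removed = [route[i] for i in idx]
    return [c for c in route if c not in removed], removed


def greedy_insert(route, removed, pts):
    """Repair operator: insert each removed customer at its cheapest position."""
    route = list(route)
    for c in removed:
        pos = min(range(len(route) + 1),
                  key=lambda p: route_cost(route[:p] + [c] + route[p:], pts))
        route.insert(pos, c)
    return route


def alns(pts: List[Point], iters: int = 200, q: int = 2, reaction: float = 0.2,
         temp0: float = 1.0, cooling: float = 0.99, seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    destroy_ops = [random_removal, worst_removal]
    weights = [1.0] * len(destroy_ops)
    current = list(range(1, len(pts)))  # initial route: customers in index order
    cur_cost = best_cost = route_cost(current, pts)
    best, temp = list(current), temp0
    for _ in range(iters):
        k = rng.choices(range(len(destroy_ops)), weights=weights)[0]
        partial, removed = destroy_ops[k](current, pts, min(q, len(current)), rng)
        cand = greedy_insert(partial, removed, pts)
        cost = route_cost(cand, pts)
        # Score the operator: new global best (3), accepted move (1), rejected (0).
        if cost < best_cost - 1e-12:
            best, best_cost = cand, cost
            current, cur_cost, score = cand, cost, 3.0
        elif cost < cur_cost or rng.random() < math.exp(-(cost - cur_cost) / temp):
            current, cur_cost, score = cand, cost, 1.0
        else:
            score = 0.0
        # Adaptive step: move the chosen operator's weight toward its latest score.
        weights[k] = max(0.05, (1 - reaction) * weights[k] + reaction * score)
        temp = max(temp * cooling, 1e-9)
    return best
```

The weight floor of 0.05 keeps every operator selectable, so an operator that performs poorly early on can still recover later in the search.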
Ensemble learning for blending gridded satellite and gauge-measured precipitation data
Authors: Georgia Papacharalampous, Hristos Tyralis, Nikolaos Doulamis, Anastasios Doulamis
Subjects: Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph); Applications (stat.AP); Methodology (stat.ME)
Arxiv link: https://arxiv.org/abs/2307.06840
Pdf link: https://arxiv.org/pdf/2307.06840
Abstract Regression algorithms are regularly used for improving the accuracy of satellite precipitation products. In this context, ground-based measurements are the dependent variable and the satellite data are the predictor variables, together with topography factors. Alongside this, it is increasingly recognised in many fields that combinations of algorithms through ensemble learning can lead to substantial predictive performance improvements. Still, the literature currently lacks a sufficient number of ensemble learners for improving the accuracy of satellite precipitation products, as well as a large-scale comparison of them. In this work, we fill this specific gap by proposing 11 new ensemble learners in the field and by extensively comparing them for the entire contiguous United States and for a 15-year period. We use monthly data from the PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks) and IMERG (Integrated Multi-satellitE Retrievals for GPM) gridded datasets. We also use gauge-measured precipitation data from the Global Historical Climatology Network monthly database, version 2 (GHCNm). The ensemble learners combine the predictions of six regression algorithms (base learners), namely the multivariate adaptive regression splines (MARS), multivariate adaptive polynomial splines (poly-MARS), random forests (RF), gradient boosting machines (GBM), extreme gradient boosting (XGBoost) and Bayesian regularized neural networks (BRNN), and each of them is based on a different combiner. The combiners include the equal-weight combiner, the median combiner, two best learners and seven variants of a sophisticated stacking method. The latter stacks a regression algorithm on top of the base learners to combine their independent predictions...
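Three of the combiner families named in the abstract can be sketched in a few lines: the equal-weight combiner (mean of base predictions), the median combiner, and stacking, where a second-level regression is fit on the base learners' predictions. Here the stacker is ordinary least squares with an intercept, solved via the normal equations; the paper's seven stacking variants are not specified in the abstract, so linear stacking is only one illustrative choice. In practice the stacker should be fit on out-of-sample base predictions (for example, from cross-validation) to avoid leakage.

```python
import statistics
from typing import List, Sequence

Preds = Sequence[Sequence[float]]  # preds[j][i]: base learner j's prediction for sample i


def equal_weight(preds: Preds) -> List[float]:
    return [statistics.fmean(col) for col in zip(*preds)]


def median_combiner(preds: Preds) -> List[float]:
    return [statistics.median(col) for col in zip(*preds)]


def fit_linear_stacker(preds: Preds, y: Sequence[float], ridge: float = 1e-8) -> List[float]:
    """Least-squares weights [intercept, w_1, ..., w_m] for stacking base predictions.

    Solves (X^T X + ridge*I) w = X^T y by Gaussian elimination with partial pivoting.
    """
    rows = [[1.0] + list(col) for col in zip(*preds)]  # intercept column first
    p = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda i: abs(a[i][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = a[r][col] / a[col][col]
            for c in range(col, p):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    w = [0.0] * p
    for r in range(p - 1, -1, -1):
        w[r] = (b[r] - sum(a[r][c] * w[c] for c in range(r + 1, p))) / a[r][r]
    return w


def apply_stacker(w: Sequence[float], preds: Preds) -> List[float]:
    return [w[0] + sum(wj * xj for wj, xj in zip(w[1:], col)) for col in zip(*preds)]
```

The median combiner is robust to a single badly wrong base learner, whereas a stacker can learn to down-weight such a learner if the training data reveal its weakness.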
FDAPT: Federated Domain-adaptive Pre-training for Language Models
Authors: Lekang Jiang, Filip Svoboda, Nicholas D. Lane
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2307.06933
Pdf link: https://arxiv.org/pdf/2307.06933
Abstract Combining Domain-adaptive Pre-training (DAPT) with Federated Learning (FL) can enhance model adaptation by leveraging more sensitive and distributed data while preserving data privacy. However, few studies have focused on this method. Therefore, we conduct the first comprehensive empirical study to evaluate the performance of Federated Domain-adaptive Pre-training (FDAPT). We demonstrate that FDAPT can maintain downstream task performance competitive with the centralized baseline in both IID and non-IID situations. Furthermore, we propose a novel algorithm, Frozen Federated Domain-adaptive Pre-training (FFDAPT). FFDAPT improves computational efficiency by 12.1% on average and exhibits downstream task performance similar to standard FDAPT, with performance fluctuations generally remaining below 1%. Finally, through a critical evaluation of our work, we identify promising future research directions for this new research area.
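FFDAPT saves computation by freezing part of the model during federated pre-training; the abstract does not say which parts. The sketch below shows one aggregation round of federated averaging (FedAvg), weighting each client by its dataset size, with a hypothetical `frozen` set of parameter names that are kept at their global values and never aggregated. Parameters are represented as flat lists keyed by name purely for illustration.

```python
from typing import Dict, List, Set

Params = Dict[str, List[float]]


def fed_avg(client_params: List[Params], client_sizes: List[int],
            global_params: Params, frozen: Set[str]) -> Params:
    """One FedAvg aggregation round; frozen tensors are copied from global_params."""
    total = sum(client_sizes)
    new_params: Params = {}
    for name, g in global_params.items():
        if name in frozen:
            new_params[name] = list(g)  # frozen: not trained, not aggregated
            continue
        acc = [0.0] * len(g)
        for params, n in zip(client_params, client_sizes):
            for i, v in enumerate(params[name]):
                acc[i] += v * n / total  # size-weighted average
        new_params[name] = acc
    return new_params
```

Because frozen parameters need no gradients on clients and no aggregation on the server, both local compute and the amount of data exchanged per round shrink.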
Keyword: quantization

There is no result

A-suozhang / GetArxivDaily

New submissions for Fri, 14 Jul 23 #102

Keyword: efficient

Assessment of the suitability of degradation models for the planning of CCTV inspections of sewer pipes

ConvNeXt-ChARM: ConvNeXt-based Transform for Efficient Neural Image Compression

Curve Fitting Simplified: Exploring the Intuitive Features of CurvPy

A Program That Simplifies Regular Expressions (Tool paper)

No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models

Primal logic of information

Efficiently-Verifiable Strong Uniquely Solvable Puzzles and Matrix Multiplication

Misclassification in Automated Content Analysis Causes Bias in Regression. Can We Fix It? Yes We Can!

Market Driven Multi-domain Network Service Orchestration in 5G Networks

Microbial Genetic Algorithm-based Black-box Attack against Interpretable Deep Learning Systems

Hybrid Control Policy for Artificial Pancreas via Ensemble Deep Reinforcement Learning

Improving Nonalcoholic Fatty Liver Disease Classification Performance With Latent Diffusion Models

Migrating to Post-Quantum Cryptography: a Framework Using Security Dependency Analysis

Optimised Least Squares Approach for Accurate Rectangle Fitting

Wavelet-based Edge Multiscale Parareal Algorithm for subdiffusion equations with heterogeneous coefficients in a large time domain

Efficient SGD Neural Network Training via Sublinear Activated Neuron Identification

Deep learning based enhancement of ordered statistics decoding of LDPC codes

Online Distributed Learning with Quantized Finite-Time Coordination

cjdb: a simple, fast, and lean database solution for the CityGML data model

Frameless Graph Knowledge Distillation

Making local algorithms efficiently self-stabilizing in arbitrary asynchronous environments

Packing squares independently

Downlink Precoding for Cell-free FBMC/OQAM Systems With Asynchronous Reception

Transformer-based end-to-end classification of variable-length volumetric data

Overcoming the Mental Set Effect in Programming Problem Solving

Meta-State-Space Learning: An Identification Approach for Stochastic Dynamical Systems

YOLIC: An Efficient Method for Object Localization and Classification on Edge Devices

Breaking 3-Factor Approximation for Correlation Clustering in Polylogarithmic Rounds

Cramer Type Distances for Learning Gaussian Mixture Models by Gradient Descent

Layered controller synthesis for dynamic multi-agent systems

Planar Disjoint Paths, Treewidth, and Kernels

Fast and Functional Structured Data Generators Rooted in Out-of-Equilibrium Physics

Data-driven Nonlinear Parametric Model Order Reduction Framework using Deep Hierarchical Variational Autoencoder

Federated Multi-Agent Deep Reinforcement Learning for Dynamic and Flexible 3D Operation of 5G Multi-MAP Networks

The Human Blockage Impact on ARIS Assisted D2D Communication Systems

Digital Twinning in Smart Grid Networks: Interplay, Resource Allocation and Use Cases

Open Source Reconfigurable Intelligent Surface for the Frequency Range of 5 GHz WiFi

mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs

Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition

Self-regulating Prompts: Foundational Model Adaptation without Forgetting

HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models

Keyword: faster

No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models

Efficient Convolution and Transformer-Based Network for Video Frame Interpolation

WiscSort: External Sorting For Byte-Addressable Storage

Hybrid Control Policy for Artificial Pancreas via Ensemble Deep Reinforcement Learning

On the Effective Horizon of Inverse Reinforcement Learning

cjdb: a simple, fast, and lean database solution for the CityGML data model

Tensor Completion via Leverage Sampling and Tensor QR Decomposition for Network Latency Estimation

HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models

Keyword: mobile

Scientific mobility, prestige and skill alignment in academic institutions

Regression-Oriented Knowledge Distillation for Lightweight Ship Orientation Angle Prediction with Optical Remote Sensing Images

Multivariate Time Series characterization and forecasting of VoIP traffic in real mobile networks

Federated Multi-Agent Deep Reinforcement Learning for Dynamic and Flexible 3D Operation of 5G Multi-MAP Networks

Measuring a Low-Earth-Orbit Satellite Network

Keyword: pruning

Keyword: diffusion

Improving Nonalcoholic Fatty Liver Disease Classification Performance With Latent Diffusion Models

AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion

Wavelet-based Edge Multiscale Parareal Algorithm for subdiffusion equations with heterogeneous coefficients in a large time domain

HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models

Keyword: adaptive

Trainability, Expressivity and Interpretability in Gated Neural ODEs

Hybrid Control Policy for Artificial Pancreas via Ensemble Deep Reinforcement Learning

Free-Form Composition Networks for Egocentric Action Recognition

Domain-adaptive Person Re-identification without Cross-camera Paired Samples

MPR-Net:Multi-Scale Pattern Reproduction Guided Universality Time Series Interpretable Forecasting

Vehicle Dispatching and Routing of On-Demand Intercity Ride-Pooling Services: A Multi-Agent Hierarchical Reinforcement Learning Approach

Ensemble learning for blending gridded satellite and gauge-measured precipitation data

FDAPT: Federated Domain-adaptive Pre-training for Language Models

Keyword: quantization