New submissions for Mon, 26 Jun 23

Keyword: efficient

AmicroN: A Framework for Generating Annotations for Human Activity Recognition with Granular Micro-Activities

Authors: Soumyajit Chatterjee, Bivas Mitra, Sandip Chakraborty
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.13149
Pdf link: https://arxiv.org/pdf/2306.13149
Abstract Efficient human activity recognition (HAR) using sensor data needs a significant volume of annotated data. The growing volume of unlabelled sensor data has challenged conventional practices for gathering HAR annotations with human-in-the-loop approaches, often leading to the collection of shallower annotations. These shallower annotations ignore the fine-grained micro-activities that constitute any complex activities of daily living (ADL). Understanding this, we, in this paper, first analyze this lack of granular annotations from available pre-annotated datasets to understand the practical inconsistencies and also perform a detailed survey to look into the human perception surrounding annotations. Drawing motivations from these, we next develop the framework AmicroN that can automatically generate micro-activity annotations using locomotive signatures and the available coarse-grain macro-activity labels. In the backend, AmicroN applies change-point detection followed by zero-shot learning with activity embeddings to identify the unseen micro-activities in an unsupervised manner. Rigorous evaluation on publicly available datasets shows that AmicroN can accurately generate micro-activity annotations with a median F1-score of >0.75. Additionally, we also show that AmicroN can be used in a plug-and-play manner with Large Language Models (LLMs) to obtain the micro-activity labels, thus making it more practical for realistic applications.
Optimal Cost-Preference Trade-off Planning with Multiple Temporal Tasks
Authors: Peter Amorese, Morteza Lahijanian
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL)
Arxiv link: https://arxiv.org/abs/2306.13222
Pdf link: https://arxiv.org/pdf/2306.13222
Abstract Autonomous robots are increasingly utilized in realistic scenarios with multiple complex tasks. In these scenarios, there may be a preferred way of completing all of the given tasks, but it is often in conflict with optimal execution. Recent work studies preference-based planning; however, it has yet to extend the notion of preference to the behavior of the robot with respect to each task. In this work, we introduce a novel notion of preference that provides a generalized framework to express preferences over individual tasks as well as their relations. Then, we perform an optimal trade-off (Pareto) analysis between behaviors that adhere to the user's preference and the ones that are resource optimal. We introduce an efficient planning framework that generates Pareto-optimal plans given the user's preference by extending A* search. Further, we show a method of computing the entire Pareto front (the set of all optimal trade-offs) via an adaptation of a multi-objective A* algorithm. We also present a problem-agnostic search heuristic to enable scalability. We illustrate the power of the framework on both mobile robots and manipulators. Our benchmarks show the effectiveness of the heuristic with up to two orders of magnitude speedup.
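
To make the notion of a Pareto front concrete, here is a minimal sketch that filters a list of candidate plans down to the non-dominated ones. It is not the search algorithm from the paper; it only illustrates what "the set of all optimal trade-offs" means. The `(resource cost, preference cost)` encoding, with both values minimized, and the name `pareto_front` are assumptions made for this illustration.

```python
from typing import List, Tuple

# Hypothetical encoding: (resource cost, preference cost), both to be minimized.
Plan = Tuple[float, float]


def pareto_front(plans: List[Plan]) -> List[Plan]:
    """Return the non-dominated plans, sorted by increasing resource cost."""
    front: List[Plan] = []
    best_pref = float("inf")
    # Sort by resource cost (ties broken by preference cost), then sweep once:
    # a plan survives only if it beats every cheaper plan on preference cost.
    for cost, pref in sorted(set(plans)):
        if pref < best_pref:
            front.append((cost, pref))
            best_pref = pref
    return front


candidates = [(4.0, 9.0), (5.0, 6.0), (6.0, 6.5), (7.0, 3.0), (9.0, 3.0), (10.0, 1.0)]
print(pareto_front(candidates))
# → [(4.0, 9.0), (5.0, 6.0), (7.0, 3.0), (10.0, 1.0)]
```

The plans `(6.0, 6.5)` and `(9.0, 3.0)` are dropped because a cheaper plan is at least as good on preference cost.
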
Document Image Cleaning using Budget-Aware Black-Box Approximation
Authors: Ganesh Tata, Katyani Singh, Eric Van Oeveren, Nilanjan Ray
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.13236
Pdf link: https://arxiv.org/pdf/2306.13236
Abstract Recent work has shown that by approximating the behaviour of a non-differentiable black-box function using a neural network, the black-box can be integrated into a differentiable training pipeline for end-to-end training. This methodology is termed "differentiable bypass," and a successful application of this method involves training a document preprocessor to improve the performance of a black-box OCR engine. However, a good approximation of an OCR engine requires querying it for all samples throughout the training process, which can be computationally and financially expensive. Several zeroth-order optimization (ZO) algorithms have been proposed in black-box attack literature to find adversarial examples for a black-box model by computing its gradient in a query-efficient manner. However, the query complexity and convergence rate of such algorithms make them infeasible for our problem. In this work, we propose two sample selection algorithms to train an OCR preprocessor with less than 10% of the original system's OCR engine queries, resulting in more than 60% reduction of the total training time without significant loss of accuracy. We also show an improvement of 4% in the word-level accuracy of a commercial OCR engine with only 2.5% of the total queries and a 32x reduction in monetary cost. Further, we propose a simple ranking technique to prune 30% of the document images from the training dataset without affecting the system's performance.
Nonsmooth Control Barrier Functions for Obstacle Avoidance between Convex Regions
Authors: Akshay Thirugnanam, Jun Zeng, Koushil Sreenath
Subjects: Robotics (cs.RO); Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2306.13259
Pdf link: https://arxiv.org/pdf/2306.13259
Abstract In this paper, we focus on non-conservative obstacle avoidance between robots with control affine dynamics with strictly convex and polytopic shapes. The core challenge for this obstacle avoidance problem is that the minimum distance between strictly convex regions or polytopes is generally implicit and non-smooth, such that distance constraints cannot be enforced directly in the optimization problem. To handle this challenge, we employ non-smooth control barrier functions to reformulate the avoidance problem in the dual space, with the positivity of the minimum distance between robots equivalently expressed using a quadratic program. Our approach is proven to guarantee system safety. We theoretically analyze the smoothness properties of the minimum distance quadratic program and its KKT conditions. We validate our approach by demonstrating computationally-efficient obstacle avoidance for multi-agent robotic systems with strictly convex and polytopic shapes. To our best knowledge, this is the first time a real-time QP problem can be formulated for general non-conservative avoidance between strictly convex shapes and polytopes.
Variance-Covariance Regularization Improves Representation Learning
Authors: Jiachen Zhu, Ravid Shwartz-Ziv, Yubei Chen, Yann LeCun
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.13292
Pdf link: https://arxiv.org/pdf/2306.13292
Abstract Transfer learning has emerged as a key approach in the machine learning domain, enabling the application of knowledge derived from one domain to improve performance on subsequent tasks. Given the often limited information about these subsequent tasks, a strong transfer learning approach calls for the model to capture a diverse range of features during the initial pretraining stage. However, recent research suggests that, without sufficient regularization, the network tends to concentrate on features that primarily reduce the pretraining loss function. This tendency can result in inadequate feature learning and impaired generalization capability for target tasks. To address this issue, we propose Variance-Covariance Regularization (VCR), a regularization technique aimed at fostering diversity in the learned network features. Drawing inspiration from recent advancements in the self-supervised learning approach, our approach promotes learned representations that exhibit high variance and minimal covariance, thus preventing the network from focusing solely on loss-reducing features. We empirically validate the efficacy of our method through comprehensive experiments coupled with in-depth analytical studies on the learned representations. In addition, we develop an efficient implementation strategy that assures minimal computational overhead associated with our method. Our results indicate that VCR is a powerful and efficient method for enhancing transfer learning performance for both supervised learning and self-supervised learning, opening new possibilities for future research in this domain.
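
The idea of "high variance and minimal covariance" can be sketched as a penalty on a batch of features. The abstract does not give the paper's exact formulation, so the version below is an assumption modelled on VICReg-style terms: a hinge that pushes each feature's standard deviation up to a target, plus the squared off-diagonal entries of the feature covariance matrix. The function name, `target_std`, and `eps` are illustrative choices.

```python
import numpy as np


def variance_covariance_penalty(z, target_std=1.0, eps=1e-4):
    """Return (variance_term, covariance_term) for features z of shape (batch, dim)."""
    z = z - z.mean(axis=0, keepdims=True)  # centre each feature over the batch
    n, d = z.shape
    # Variance term: hinge on the per-feature standard deviation.
    std = np.sqrt(z.var(axis=0, ddof=1) + eps)
    variance_term = np.mean(np.maximum(0.0, target_std - std))
    # Covariance term: squared off-diagonal entries of the covariance matrix.
    cov = (z.T @ z) / (n - 1)
    off_diag = cov - np.diag(np.diag(cov))
    covariance_term = np.sum(off_diag ** 2) / d
    return variance_term, covariance_term


rng = np.random.default_rng(0)
independent = rng.standard_normal((512, 4))
redundant = np.repeat(independent[:, :1], 4, axis=1)  # four copies of one feature
print(variance_covariance_penalty(independent))
print(variance_covariance_penalty(redundant))
```

The redundant features incur a much larger covariance term than the independent ones, which is the kind of behaviour such a regularizer is meant to discourage.
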
Deep Omni-supervised Learning for Rib Fracture Detection from Chest Radiology Images
Authors: Zhizhong Chai, Luyang Luo, Huangjing Lin, Pheng-Ann Heng, Hao Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.13301
Pdf link: https://arxiv.org/pdf/2306.13301
Abstract Deep learning (DL)-based rib fracture detection has shown promise of playing an important role in preventing mortality and improving patient outcomes. Normally, developing DL-based object detection models requires a huge amount of bounding-box annotation. However, annotating medical data is time-consuming and expertise-demanding, making it infeasible to obtain a large amount of fine-grained annotations. This poses a pressing need to develop label-efficient detection models to alleviate radiologists' labeling burden. To tackle this challenge, the object detection literature has witnessed an increase in weakly-supervised and semi-supervised approaches, yet still lacks a unified framework that leverages various forms of fully-labeled, weakly-labeled, and unlabeled data. In this paper, we present a novel omni-supervised object detection network, ORF-Netv2, to leverage as much available supervision as possible. Specifically, a multi-branch omni-supervised detection head is introduced with each branch trained with a specific type of supervision. A co-training-based dynamic label assignment strategy is then proposed to enable flexible and robust learning from the weakly-labeled and unlabeled data. Extensive evaluation was conducted for the proposed framework with three rib fracture datasets on both chest CT and X-ray. By leveraging all forms of supervision, ORF-Netv2 achieves mAPs of 34.7, 44.7, and 19.4 on the three datasets, respectively, surpassing the baseline detector which uses only box annotations by mAP gains of 3.8, 4.8, and 5.0, respectively. Furthermore, ORF-Netv2 consistently outperforms other competitive label-efficient methods over various scenarios, showing a promising framework for label-efficient fracture detection.
Abstractive Text Summarization for Resumes With Cutting Edge NLP Transformers and LSTM
Authors: Öykü Berfin Mercan, Sena Nur Cavsak, Aysu Deliahmetoglu (Intern), Senem Tanberk
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.13315
Pdf link: https://arxiv.org/pdf/2306.13315
Abstract Text summarization is a fundamental task in natural language processing that aims to condense large amounts of textual information into concise and coherent summaries. With the exponential growth of content and the need to extract key information efficiently, text summarization has gained significant attention in recent years. In this study, the performance of LSTM and pre-trained T5, Pegasus, BART and BART-Large models was evaluated on open-source datasets (Xsum, CNN/Daily Mail, Amazon Fine Food Review and News Summary) and a prepared resume dataset. The resume dataset comprises 75 resumes and covers information such as language, education, experience, personal information, and skills. The primary objective of this research was to classify resume text. Various techniques such as LSTM, pre-trained models, and fine-tuned models were assessed using a dataset of resumes. The BART-Large model fine-tuned with the resume dataset gave the best performance.
Patch-Level Contrasting without Patch Correspondence for Accurate and Dense Contrastive Representation Learning
Authors: Shaofeng Zhang, Feng Zhu, Rui Zhao, Junchi Yan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.13337
Pdf link: https://arxiv.org/pdf/2306.13337
Abstract We propose ADCLR: Accurate and Dense Contrastive Representation Learning, a novel self-supervised learning framework for learning accurate and dense vision representation. To extract spatial-sensitive information, ADCLR introduces query patches for contrasting in addition to global contrasting. Compared with previous dense contrasting methods, ADCLR mainly enjoys three merits: i) achieving both global-discriminative and spatial-sensitive representation, ii) model-efficient (no extra parameters in addition to the global contrasting baseline), and iii) correspondence-free and thus simpler to implement. Our approach achieves new state-of-the-art performance for contrastive methods. On classification tasks, for ViT-S, ADCLR achieves 77.5% top-1 accuracy on ImageNet with linear probing, outperforming our baseline (DINO) without our devised techniques as plug-in, by 0.5%. For ViT-B, ADCLR achieves 79.8%, 84.0% accuracy on ImageNet by linear probing and finetune, outperforming iBOT by 0.3%, 0.2% accuracy. For dense tasks, on MS-COCO, ADCLR achieves significant improvements of 44.3% AP on object detection, 39.7% AP on instance segmentation, outperforming previous SOTA method SelfPatch by 2.2% and 1.2%, respectively. On ADE20K, ADCLR outperforms SelfPatch by 1.0% mIoU, 1.2% mAcc on the segme
Multi-objective optimization based network control principles for identifying personalized drug targets with cancer
Authors: Jing Liang, Zhuo Hu, Zong-Wei Li, Kang-Jia Qiao, Wei-Feng Guo
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2306.13349
Pdf link: https://arxiv.org/pdf/2306.13349
Abstract It is a big challenge to develop efficient models for identifying personalized drug targets (PDTs) from high-dimensional personalized genomic profile of individual patients. Recent structural network control principles have introduced a new approach to discover PDTs by selecting an optimal set of driver genes in personalized gene interaction network (PGIN). However, most of current methods only focus on controlling the system through a minimum driver-node set and ignore the existence of multiple candidate driver-node sets for therapeutic drug target identification in PGIN. Therefore, this paper proposed multi-objective optimization-based structural network control principles (MONCP) by considering minimum driver nodes and maximum prior-known drug-target information. To solve MONCP, a discrete multi-objective optimization problem is formulated with many constrained variables, and a novel evolutionary optimization model called LSCV-MCEA was developed by adapting a multi-tasking framework and a rankings-based fitness function method. With genomics data of patients with breast or lung cancer from The Cancer Genome Atlas database, the effectiveness of LSCV-MCEA was validated. The experimental results indicated that compared with other advanced methods, LSCV-MCEA can more effectively identify PDTs with the highest Area Under the Curve score for predicting clinically annotated combinatorial drugs. Meanwhile, LSCV-MCEA can more effectively solve MONCP than other evolutionary optimization methods in terms of algorithm convergence and diversity. Particularly, LSCV-MCEA can efficiently detect disease signals for individual patients with BRCA cancer. The study results show that multi-objective optimization can solve structural network control principles effectively and offer a new perspective for understanding tumor heterogeneity in cancer precision medicine.
A selectively reduced degree basis for efficient mixed nonlinear isogeometric beam formulations with extensible directors
Authors: Myung-Jin Choi, Roger A. Sauer, Sven Klinkel
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2306.13354
Pdf link: https://arxiv.org/pdf/2306.13354
Abstract The effect of higher order continuity in the solution field by using NURBS basis function in isogeometric analysis (IGA) is investigated for an efficient mixed finite element formulation for elastostatic beams. It is based on the Hu-Washizu variational principle considering geometrical and material nonlinearities. Here we present a reduced degree of basis functions for the additional fields of the stress resultants and strains of the beam, which are allowed to be discontinuous across elements. This approach turns out to significantly improve the computational efficiency and the accuracy of the results. We consider a beam formulation with extensible directors, where cross-sectional strains are enriched to avoid Poisson locking by an enhanced assumed strain method. In numerical examples, we show the superior per degree-of-freedom accuracy of IGA over conventional finite element analysis, due to the higher order continuity in the displacement field. We further verify the efficient rotational coupling between beams, as well as the path-independence of the results.
Human Activity Behavioural Pattern Recognition in Smarthome with Long-hour Data Collection
Authors: Ranjit Kolkar, Geetha V
Subjects: Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2306.13374
Pdf link: https://arxiv.org/pdf/2306.13374
Abstract The research on human activity recognition has provided novel solutions to many applications like healthcare, sports, and user profiling. Considering the complex nature of human activities, it is still challenging even now that effective and efficient sensors are available. The existing works on human activity recognition using smartphone sensors focus on recognizing basic human activities like sitting, sleeping, standing, stair up and down and running. However, more than these basic activities is needed to analyze human behavioural patterns. The proposed framework recognizes basic human activities using deep learning models. Also, ambient sensors like PIR, pressure sensors, and smartphone-based sensors like accelerometers and gyroscopes are combined to make it hybrid-sensor-based human activity recognition. The hybrid approach helped derive more activities than the basic ones, which also helped derive human activity patterns or user profiling. User profiling provides sufficient information to identify daily living activity patterns and predict whether any anomaly exists. The framework provides the base for applications such as monitoring elderly people when they are alone at home. The GRU model achieves an accuracy of 95% in recognizing the basic activities. Finally, human activity patterns over time are recognized based on the duration and frequency of the activities. It is observed that human activity patterns, such as morning walking duration, vary depending on the day of the week.
Solving a class of multi-scale elliptic PDEs by means of Fourier-based mixed physics informed neural networks
Authors: Xi'an Li, Jinran Wu, Zhi-Qin John Xu, You-Gan Wang
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2306.13385
Pdf link: https://arxiv.org/pdf/2306.13385
Abstract Deep neural networks have received significant attention due to their simplicity and flexibility in the fields of engineering and scientific calculation. In this work, we probe into solving a class of elliptic PDEs with multiple scales by means of Fourier-based mixed physics-informed neural networks (called FMPINN), whose solver is configured as a multi-scale DNN model. Unlike the classical PINN method, a dual (flux) variable about the rough coefficient of PDEs is introduced to avoid the ill-conditioning of the neural tangent kernel matrix that results from the oscillating coefficient of multi-scale PDEs. Therefore, apart from the physical conservation laws, the discrepancy between the auxiliary variables and the gradients of multi-scale coefficients is incorporated into the cost function, and an optimization method is then used to obtain a satisfactory solution of the PDEs by minimizing the defined loss. Additionally, a novel trigonometric activation function is introduced for FMPINN, which is suited for representing the derivatives of complex target functions. Handling the input data by Fourier feature mapping will effectively improve the capacity of deep neural networks to solve high-frequency problems. Finally, by introducing several numerical examples of multi-scale problems in various dimensional Euclidean spaces, we validate the efficiency and robustness of the proposed FMPINN algorithm in both low-frequency and high-frequency oscillation cases.
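
A Fourier feature mapping of the kind mentioned in the abstract can be sketched in a few lines: the input is projected onto a set of frequencies and passed through cosine and sine before it reaches the network. The frequency matrix used here is an assumption chosen for illustration; the abstract does not state the paper's frequencies or scaling.

```python
import numpy as np


def fourier_features(x, freqs):
    """Map inputs x of shape (n, d) to [cos(x @ freqs), sin(x @ freqs)] of shape (n, 2m).

    freqs has shape (d, m); its values are an illustrative assumption.
    """
    proj = x @ freqs
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=1)


# Five points on [0, 1] and four hypothetical frequencies (pi, 2pi, 4pi, 8pi).
x = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
freqs = np.pi * np.array([[1.0, 2.0, 4.0, 8.0]])
features = fourier_features(x, freqs)
print(features.shape)  # → (5, 8)
```

Feeding `features` instead of `x` to a network gives it direct access to oscillatory components at the chosen scales.
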
3DSAM-adapter: Holistic Adaptation of SAM from 2D to 3D for Promptable Medical Image Segmentation
Authors: Shizhan Gong, Yuan Zhong, Wenao Ma, Jinpeng Li, Zhao Wang, Jingyang Zhang, Pheng-Ann Heng, Qi Dou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.13465
Pdf link: https://arxiv.org/pdf/2306.13465
Abstract Although the segment anything model (SAM) has achieved impressive results on general-purpose semantic segmentation with strong generalization ability on daily images, its demonstrated performance on medical image segmentation is less precise and less stable, especially when dealing with tumor segmentation tasks that involve objects of small sizes, irregular shapes, and low contrast. Notably, the original SAM architecture is designed for 2D natural images and therefore cannot effectively extract the 3D spatial information from volumetric medical data. In this paper, we propose a novel adaptation method for transferring SAM from 2D to 3D for promptable medical image segmentation. Through a holistically designed scheme for architecture modification, we transfer the SAM to support volumetric inputs while retaining the majority of its pre-trained parameters for reuse. The fine-tuning process is conducted in a parameter-efficient manner, wherein most of the pre-trained parameters remain frozen, and only a few lightweight spatial adapters are introduced and tuned. Regardless of the domain gap between natural and medical data and the disparity in the spatial arrangement between 2D and 3D, the transformer trained on natural images can effectively capture the spatial patterns present in volumetric medical images with only lightweight adaptations. We conduct experiments on four open-source tumor segmentation datasets, and with a single click prompt, our model can outperform domain state-of-the-art medical image segmentation models on 3 out of 4 tasks, specifically by 8.25%, 29.87%, and 10.11% for kidney tumor, pancreas tumor, and colon cancer segmentation, and achieve similar performance for liver tumor segmentation. We also compare our adaptation method with existing popular adapters and observe significant performance improvement on most datasets.
A Stabilized Circuit-Consistent Foil Conductor Model
Authors: Jonas Bundschuh, Idoia Cortes Garcia, Herbert De Gersem, Elias Paakkunainen, Sebastian Schöps
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2306.13477
Pdf link: https://arxiv.org/pdf/2306.13477
Abstract The magnetoquasistatic simulation of large power converters, in particular transformers, requires efficient models for their foil windings by means of homogenization techniques. In this article, the classical foil conductor model is derived and an inconsistency in terms of circuit theory is observed, which may lead to time-stepping instability. This can be related to the differential-algebraic nature of the resulting system of equations. It is shown how the foil conductor model can be adapted to mitigate this problem by a modified definition of the turn-by-turn conductance matrix. Numerical results are presented to demonstrate the instability and to verify the effectiveness of the new adapted foil conductor model.
Safe Risk-averse Bayesian Optimization for Controller Tuning
Authors: Christopher Koenig, Miks Ozols, Anastasia Makarova, Efe C. Balta, Andreas Krause, Alisa Rupenyan
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2306.13479
Pdf link: https://arxiv.org/pdf/2306.13479
Abstract Controller tuning and parameter optimization are crucial in system design to improve both the controller and underlying system performance. Bayesian optimization has been established as an efficient model-free method for controller tuning and adaptation. Standard methods, however, are not enough for high-precision systems to be robust with respect to unknown input-dependent noise and stable under safety constraints. In this work, we present a novel data-driven approach, RaGoOSE, for safe controller tuning in the presence of heteroscedastic noise, combining safe learning with risk-averse Bayesian optimization. We demonstrate the method for synthetic benchmark and compare its performance to established BO-based tuning methods. We further evaluate RaGoOSE performance on a real precision-motion system utilized in semiconductor industry applications and compare it to the built-in auto-tuning routine.
Circuit simulation using explicit methods: singular matrix issues
Authors: Mahesh B. Patil
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2306.13489
Pdf link: https://arxiv.org/pdf/2306.13489
Abstract Some aspects of the ELectrical EXplicit (ELEX) scheme for using explicit integration schemes in circuit simulation are discussed. It is pointed out that the parallel resistor approach, presented earlier to address singular matrix issues arising in the ELEX scheme, is not adequately robust for incorporation in a general-purpose simulator for power electronic circuits. New topology-aware approaches, which are more robust and efficient compared to the parallel resistor approach, are presented. Several circuit examples are considered to illustrate the new approaches.
Smoothed Circulant Embedding with Applications to Multilevel Monte Carlo Methods for PDEs with Random Coefficients
Authors: Anastasia Istratuca, Aretha Teckentrup
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2306.13493
Pdf link: https://arxiv.org/pdf/2306.13493
Abstract We consider the computational efficiency of Monte Carlo (MC) and Multilevel Monte Carlo (MLMC) methods applied to partial differential equations with random coefficients. These arise, for example, in groundwater flow modelling, where a commonly used model for the unknown parameter is a random field. We make use of the circulant embedding procedure for sampling from the aforementioned coefficient. To improve the computational complexity of the MLMC estimator in the case of highly oscillatory random fields, we devise and implement a smoothing technique integrated into the circulant embedding method. This allows us to choose the coarsest mesh on the first level of MLMC independently of the correlation length of the covariance function of the random field, leading to considerable savings in computational cost. We illustrate this with numerical experiments, where we see savings of a factor of 5-10 in computational cost for accuracies of practical interest.
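
For readers unfamiliar with circulant embedding, the following is a minimal one-dimensional sketch of the standard (unsmoothed) procedure: the Toeplitz covariance matrix of a stationary field on a uniform grid is embedded in a circulant matrix, which the FFT diagonalizes. The exponential covariance, the grid, and the function name are assumptions for illustration; the smoothing technique proposed in the paper is not reproduced here.

```python
import numpy as np


def sample_exponential_field(n, h, corr_len, rng, sigma2=1.0):
    """Draw two independent samples of a zero-mean stationary Gaussian field.

    The field lives on n grid points with spacing h and has the (assumed)
    covariance c(r) = sigma2 * exp(-|r| / corr_len).
    """
    # First row of the Toeplitz covariance matrix.
    c = sigma2 * np.exp(-h * np.arange(n) / corr_len)
    # Minimal circulant embedding of size m = 2(n - 1): mirror the interior entries.
    s = np.concatenate([c, c[-2:0:-1]])
    m = s.size
    # Eigenvalues of the circulant matrix are the FFT of its first row.
    lam = np.fft.fft(s).real
    if lam.min() < -1e-10 * lam.max():
        raise ValueError("embedding is not nonnegative definite; enlarge the grid")
    lam = np.clip(lam, 0.0, None)
    # Complex white noise; real and imaginary parts of y are independent samples.
    xi = rng.standard_normal(m) + 1j * rng.standard_normal(m)
    y = np.fft.fft(np.sqrt(lam / m) * xi)
    return y.real[:n], y.imag[:n]


rng = np.random.default_rng(0)
field_a, field_b = sample_exponential_field(n=64, h=1.0 / 63, corr_len=0.1, rng=rng)
print(field_a.shape, field_b.shape)
```

Each call costs one FFT of length `2(n - 1)` and yields two fields, which is what makes the approach attractive inside Monte Carlo loops.
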
Onboarding Citizens to Digital Identity Systems
Authors: Tasos Spiliotopoulos, Al Tariq Sheik, Debora Gottardello, Robert Dover
Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2306.13511
Pdf link: https://arxiv.org/pdf/2306.13511
Abstract Digital Identity (DI) technologies have the potential to enhance the quality of life of citizens through the provision of seamless services, improve the effectiveness of public services, and increase overall economic competitiveness. However, lack of access to DIs can limit these benefits, while unequal access can lead to uneven distribution of these benefits across social groups and escalate existing tensions. Accessible, user-friendly and efficient onboarding can play a key role in ensuring equitable access and wide adoption of DI technologies. This paper proposes the development of physical locations (Experience Centres) that can be used for citizen onboarding to national DI systems, positively shaping citizens' first impression with the technology and, in turn, promoting adoption. To this end, we outline a multidisciplinary research approach for identifying and addressing the considerations necessary for designing, developing and operating a model Experience Centre for DI onboarding in an inclusive manner.
DISCO-10M: A Large-Scale Music Dataset
Authors: Luca A. Lanzendörfer, Florian Grötschla, Emil Funke, Roger Wattenhofer
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2306.13512
Pdf link: https://arxiv.org/pdf/2306.13512
Abstract Music datasets play a crucial role in advancing research in machine learning for music. However, existing music datasets suffer from limited size, accessibility, and lack of audio resources. To address these shortcomings, we present DISCO-10M, a novel and extensive music dataset that surpasses the largest previously available music dataset by an order of magnitude. To ensure high-quality data, we implement a multi-stage filtering process. This process incorporates similarities based on textual descriptions and audio embeddings. Moreover, we provide precomputed CLAP embeddings alongside DISCO-10M, facilitating direct application on various downstream tasks. These embeddings enable efficient exploration of machine learning applications on the provided data. With DISCO-10M, we aim to democratize and facilitate new research to help advance the development of novel machine learning models for music.
Binary domain generalization for sparsifying binary neural networks
Authors: Riccardo Schiavone, Francesco Galati, Maria A. Zuluaga
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.13515
Pdf link: https://arxiv.org/pdf/2306.13515
Abstract Binary neural networks (BNNs) are an attractive solution for developing and deploying deep neural network (DNN)-based applications in resource constrained devices. Despite their success, BNNs still suffer from a fixed and limited compression factor that may be explained by the fact that existing pruning methods for full-precision DNNs cannot be directly applied to BNNs. In fact, weight pruning of BNNs leads to performance degradation, which suggests that the standard binarization domain of BNNs is not well adapted for the task. This work proposes a novel more general binary domain that extends the standard binary one that is more robust to pruning techniques, thus guaranteeing improved compression and avoiding severe performance losses. We demonstrate a closed-form solution for quantizing the weights of a full-precision network into the proposed binary domain. Finally, we show the flexibility of our method, which can be combined with other pruning strategies. Experiments over CIFAR-10 and CIFAR-100 demonstrate that the novel approach is able to generate efficient sparse networks with reduced memory usage and run-time latency, while maintaining performance.
Torsion Graph Neural Networks
Authors: Cong Shen, Xiang Liu, Jiawei Luo, Kelin Xia
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.13541
Pdf link: https://arxiv.org/pdf/2306.13541
Abstract Geometric deep learning (GDL) models have demonstrated a great potential for the analysis of non-Euclidean data. They are developed to incorporate the geometric and topological information of non-Euclidean data into the end-to-end deep learning architectures. Motivated by the recent success of discrete Ricci curvature in graph neural networks (GNNs), we propose TorGNN, an analytic Torsion enhanced Graph Neural Network model. The essential idea is to characterize graph local structures with an analytic torsion based weight formula. Mathematically, analytic torsion is a topological invariant that can distinguish spaces which are homotopy equivalent but not homeomorphic. In our TorGNN, for each edge, a corresponding local simplicial complex is identified, then the analytic torsion (for this local simplicial complex) is calculated, and further used as a weight (for this edge) in the message-passing process. Our TorGNN model is validated on link prediction tasks from sixteen different types of networks and node classification tasks from three types of networks. It has been found that our TorGNN can achieve superior performance on both tasks, and outperform various state-of-the-art models. This demonstrates that analytic torsion is a highly efficient topological invariant in the characterization of graph structures and can significantly boost the performance of GNNs.
Manifold Contrastive Learning with Variational Lie Group Operators
Authors: Kion Fallah, Alec Helbling, Kyle A. Johnsen, Christopher J. Rozell
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.13544
Pdf link: https://arxiv.org/pdf/2306.13544
Abstract Self-supervised learning of deep neural networks has become a prevalent paradigm for learning representations that transfer to a variety of downstream tasks. Similar to proposed models of the ventral stream of biological vision, it is observed that these networks lead to a separation of category manifolds in the representations of the penultimate layer. Although this observation matches the manifold hypothesis of representation learning, current self-supervised approaches are limited in their ability to explicitly model this manifold. Indeed, current approaches often only apply augmentations from a pre-specified set of "positive pairs" during learning. In this work, we propose a contrastive learning approach that directly models the latent manifold using Lie group operators parameterized by coefficients with a sparsity-promoting prior. A variational distribution over these coefficients provides a generative model of the manifold, with samples which provide feature augmentations applicable both during contrastive training and downstream tasks. Additionally, learned coefficient distributions provide a quantification of which transformations are most likely at each point on the manifold while preserving identity. We demonstrate benefits in self-supervised benchmarks for image datasets, as well as a downstream semi-supervised task. In the former case, we demonstrate that the proposed methods can effectively apply manifold feature augmentations and improve learning both with and without a projection head. In the latter case, we demonstrate that feature augmentations sampled from learned Lie group operators can improve classification performance when using few labels.
Inferring Hierarchical Structure in Multi-Room Maze Environments
Authors: Daria de Tinguy, Toon Van de Maele, Tim Verbelen, Bart Dhoedt
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.13546
Pdf link: https://arxiv.org/pdf/2306.13546
Abstract Cognitive maps play a crucial role in facilitating flexible behaviour by representing spatial and conceptual relationships within an environment. The ability to learn and infer the underlying structure of the environment is crucial for effective exploration and navigation. This paper introduces a hierarchical active inference model addressing the challenge of inferring structure in the world from pixel-based observations. We propose a three-layer hierarchical model consisting of a cognitive map, an allocentric, and an egocentric world model, combining curiosity-driven exploration with goal-oriented behaviour at the different levels of reasoning from context to place to motion. This allows for efficient exploration and goal-directed search in room-structured mini-grid environments.
Active Coverage for PAC Reinforcement Learning
Authors: Aymen Al-Marjani, Andrea Tirinzoni, Emilie Kaufmann
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.13601
Pdf link: https://arxiv.org/pdf/2306.13601
Abstract Collecting and leveraging data with good coverage properties plays a crucial role in different aspects of reinforcement learning (RL), including reward-free exploration and offline learning. However, the notion of "good coverage" really depends on the application at hand, as data suitable for one context may not be so for another. In this paper, we formalize the problem of active coverage in episodic Markov decision processes (MDPs), where the goal is to interact with the environment so as to fulfill given sampling requirements. This framework is sufficiently flexible to specify any desired coverage property, making it applicable to any problem that involves online exploration. Our main contribution is an instance-dependent lower bound on the sample complexity of active coverage and a simple game-theoretic algorithm, CovGame, that nearly matches it. We then show that CovGame can be used as a building block to solve different PAC RL tasks. In particular, we obtain a simple algorithm for PAC reward-free exploration with an instance-dependent sample complexity that, in certain MDPs which are "easy to explore", is lower than the minimax one. By further coupling this exploration algorithm with a new technique to do implicit eliminations in policy space, we obtain a computationally-efficient algorithm for best-policy identification whose instance-dependent sample complexity scales with gaps between policy values.
Machine Learning methods for simulating particle response in the Zero Degree Calorimeter at the ALICE experiment, CERN
Authors: Jan Dubiński, Kamil Deja, Sandro Wenzel, Przemysław Rokita, Tomasz Trzciński
Subjects: Computer Vision and Pattern Recognition (cs.CV); High Energy Physics - Experiment (hep-ex)
Arxiv link: https://arxiv.org/abs/2306.13606
Pdf link: https://arxiv.org/pdf/2306.13606
Abstract Currently, over half of the computing power at CERN GRID is used to run High Energy Physics simulations. The recent updates at the Large Hadron Collider (LHC) create the need for developing more efficient simulation methods. In particular, there exists a demand for a fast simulation of the neutron Zero Degree Calorimeter, where existing Monte Carlo-based methods impose a significant computational burden. We propose an alternative approach to the problem that leverages machine learning. Our solution utilises neural network classifiers and generative models to directly simulate the response of the calorimeter. In particular, we examine the performance of variational autoencoders and generative adversarial networks, expanding the GAN architecture by an additional regularisation network and a simple, yet effective postprocessing step. Our approach increases the simulation speed by 2 orders of magnitude while maintaining the high fidelity of the simulation.
Adversarial Robustness Certification for Bayesian Neural Networks
Authors: Matthew Wicker, Andrea Patane, Luca Laurenti, Marta Kwiatkowska
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.13614
Pdf link: https://arxiv.org/pdf/2306.13614
Abstract We study the problem of certifying the robustness of Bayesian neural networks (BNNs) to adversarial input perturbations. Given a compact set of input points $T \subseteq \mathbb{R}^m$ and a set of output points $S \subseteq \mathbb{R}^n$, we define two notions of robustness for BNNs in an adversarial setting: probabilistic robustness and decision robustness. Probabilistic robustness is the probability that for all points in $T$ the output of a BNN sampled from the posterior is in $S$. On the other hand, decision robustness considers the optimal decision of a BNN and checks if for all points in $T$ the optimal decision of the BNN for a given loss function lies within the output set $S$. Although exact computation of these robustness properties is challenging due to the probabilistic and non-convex nature of BNNs, we present a unified computational framework for efficiently and formally bounding them. Our approach is based on weight interval sampling, integration, and bound propagation techniques, and can be applied to BNNs with a large number of parameters, and independently of the (approximate) inference method employed to train the BNN. We evaluate the effectiveness of our methods on various regression and classification tasks, including an industrial regression benchmark, MNIST, traffic sign recognition, and airborne collision avoidance, and demonstrate that our approach enables certification of robustness and uncertainty of BNN predictions.
CIDGIKc: Distance-Geometric Inverse Kinematics for Continuum Robots
Authors: Hanna Jiamei Zhang, Matthew Giamou, Filip Marić, Jonathan Kelly, Jessica Burgner-Kahrs
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2306.13617
Pdf link: https://arxiv.org/pdf/2306.13617
Abstract The small size, high dexterity, and intrinsic compliance of continuum robots (CRs) make them well suited for constrained environments. Solving the inverse kinematics (IK), that is, finding robot joint configurations that satisfy desired position or pose queries, is a fundamental challenge in motion planning, control, and calibration for any robot structure. For CRs, the need to avoid obstacles in tightly confined workspaces greatly complicates the search for feasible IK solutions. Without an accurate initialization or multiple re-starts, existing algorithms often fail to find a solution. We present CIDGIKc (Convex Iteration for Distance-Geometric Inverse Kinematics for Continuum Robots), an algorithm that solves these nonconvex feasibility problems with a sequence of semidefinite programs whose objectives are designed to encourage low-rank minimizers. CIDGIKc is enabled by a novel distance-geometric parameterization of constant curvature segment geometry for CRs with extensible segments. The resulting IK formulation involves only quadratic expressions and can efficiently incorporate a large number of collision avoidance constraints. Our experimental results demonstrate >98% solve success rates within complex, highly cluttered environments which existing algorithms cannot account for.
On particular solutions of linear partial differential equations with polynomial right-hand-sides
Authors: Thomas G. Anderson, Marc Bonnet, Luiz M. Faria, Carlos Pérez-Arancibia
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
Arxiv link: https://arxiv.org/abs/2306.13628
Pdf link: https://arxiv.org/pdf/2306.13628
Abstract This paper introduces general methodologies for constructing closed-form solutions to several important partial differential equations (PDEs) with polynomial right-hand sides in two and three spatial dimensions. The covered equations include the isotropic and anisotropic Poisson, Helmholtz, Stokes, and elastostatic equations, as well as the time-harmonic linear elastodynamic and Maxwell equations. Polynomial solutions have recently regained significance in the development of numerical techniques for evaluating volume integral operators and have potential applications in certain kinds of Trefftz finite element methods. Our approach to all of these PDEs relates the particular solution to polynomial particular solutions of the Poisson and Helmholtz equations, which can in turn be obtained, respectively, from expansions using homogeneous polynomials and the Neumann series expansion of the operator $(k^2+\Delta)^{-1}$. No matrix inversion is required to compute the solution. The method naturally incorporates divergence constraints on the solution, such as in the case of Maxwell and Stokes flow equations. This work is accompanied by a freely available Julia library, PolynomialSolutions.jl, which implements the proposed methodology in a non-symbolic format and efficiently constructs and provides access to rapid evaluation of the desired solution.
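To illustrate the Neumann series step mentioned in the abstract above, here is a minimal Python sketch (not the authors' Julia library). For a polynomial right-hand side $f$, the series $(k^2+\Delta)^{-1} f = \sum_{m \ge 0} (-1)^m \Delta^m f / k^{2m+2}$ terminates because repeated Laplacians of a polynomial eventually vanish. The dictionary representation `{(i, j): coeff}` for $x^i y^j$ and the function names are assumptions made for this example only.

```python
from fractions import Fraction

def laplacian(poly):
    """Laplacian of a 2D polynomial stored as {(i, j): coeff} for x**i * y**j."""
    out = {}
    for (i, j), c in poly.items():
        if i >= 2:
            out[(i - 2, j)] = out.get((i - 2, j), 0) + c * i * (i - 1)
        if j >= 2:
            out[(i, j - 2)] = out.get((i, j - 2), 0) + c * j * (j - 1)
    return {key: v for key, v in out.items() if v != 0}

def helmholtz_particular(f, k2):
    """Polynomial u with laplacian(u) + k2 * u == f, via the finite Neumann series."""
    u = {}
    term = dict(f)               # holds Laplacian^m applied to f
    scale = Fraction(1) / k2     # holds (-1)^m / k2^(m+1)
    while term:                  # stops once Laplacian^m f is identically zero
        for key, c in term.items():
            u[key] = u.get(key, 0) + scale * c
        term = laplacian(term)
        scale = -scale / k2
    return {key: v for key, v in u.items() if v != 0}

def apply_helmholtz(u, k2):
    """Evaluate laplacian(u) + k2 * u as a polynomial, used here as a check."""
    out = laplacian(u)
    for key, c in u.items():
        out[key] = out.get(key, 0) + k2 * c
    return {key: v for key, v in out.items() if v != 0}

# Hypothetical example: f = x^2 + 2xy + 3y^2 with k^2 = 4.
f = {(2, 0): Fraction(1), (1, 1): Fraction(2), (0, 2): Fraction(3)}
k2 = Fraction(4)
u = helmholtz_particular(f, k2)
assert apply_helmholtz(u, k2) == f
```

No linear system is solved here, which matches the abstract's remark that no matrix inversion is required; exact rational arithmetic is used only to make the check exact.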
LightGlue: Local Feature Matching at Light Speed
Authors: Philipp Lindenberger, Paul-Edouard Sarlin, Marc Pollefeys
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.13643
Pdf link: https://arxiv.org/pdf/2306.13643
Abstract We introduce LightGlue, a deep neural network that learns to match local features across images. We revisit multiple design decisions of SuperGlue, the state of the art in sparse matching, and derive simple but effective improvements. Cumulatively, they make LightGlue more efficient - in terms of both memory and computation, more accurate, and much easier to train. One key property is that LightGlue is adaptive to the difficulty of the problem: the inference is much faster on image pairs that are intuitively easy to match, for example because of a larger visual overlap or limited appearance change. This opens up exciting prospects for deploying deep matchers in latency-sensitive applications like 3D reconstruction. The code and trained models are publicly available at https://github.com/cvg/LightGlue.
ProRes: Exploring Degradation-aware Visual Prompt for Universal Image Restoration
Authors: Jiaqi Ma, Tianheng Cheng, Guoli Wang, Qian Zhang, Xinggang Wang, Lefei Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.13653
Pdf link: https://arxiv.org/pdf/2306.13653
Abstract Image restoration aims to reconstruct degraded images, e.g., by denoising or deblurring. Existing works focus on designing task-specific methods, and there are inadequate attempts at universal methods. However, simply unifying multiple tasks into one universal architecture suffers from uncontrollable and undesired predictions. To address those issues, we explore prompt learning in universal architectures for image restoration tasks. In this paper, we present Degradation-aware Visual Prompts, which encode various types of image degradation, e.g., noise and blur, into unified visual prompts. These degradation-aware prompts provide control over image processing and allow weighted combinations for customized image restoration. We then leverage degradation-aware visual prompts to establish a controllable and universal model for image restoration, called ProRes, which is applicable to an extensive range of image restoration tasks. ProRes leverages the vanilla Vision Transformer (ViT) without any task-specific designs. Furthermore, the pre-trained ProRes can easily adapt to new tasks through efficient prompt tuning with only a few images. Without bells and whistles, ProRes achieves competitive performance compared to task-specific methods, and experiments demonstrate its ability for controllable restoration and adaptation to new tasks. The code and models will be released at https://github.com/leonmakise/ProRes.
Keyword: faster

Improving Log-Cumulant Based Estimation of Roughness Information in SAR imagery
Authors: Jeova Farias Sales Rocha Neto, Francisco Alixandre Avila Rodrigues
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.13200
Pdf link: https://arxiv.org/pdf/2306.13200
Abstract Synthetic Aperture Radar (SAR) image understanding is crucial in remote sensing applications, but it is hindered by its intrinsic noise contamination, called speckle. Sophisticated statistical models, such as the $\mathcal{G}^0$ family of distributions, have been applied to SAR data, and many of the current advancements in processing this imagery have been accomplished through extracting information from these models. In this paper, we propose improvements to parameter estimation in $\mathcal{G}^0$ distributions using the Method of Log-Cumulants. First, using Bayesian modeling, we construct estimators that regularly produce reliable roughness estimates under both $\mathcal{G}^0_A$ and $\mathcal{G}^0_I$ models. Second, we make use of an approximation of the Trigamma function to compute the estimated roughness in constant time, making it considerably faster than the existing method for this task. Finally, we show how we can use this method to achieve fast and reliable SAR image understanding based on roughness information.
Neural Network Pruning for Real-time Polyp Segmentation
Authors: Suman Sapkota, Pranav Poudel, Sudarshan Regmi, Bibek Panthi, Binod Bhattarai
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.13203
Pdf link: https://arxiv.org/pdf/2306.13203
Abstract Computer-assisted treatment has emerged as a viable application of medical imaging, owing to the efficacy of deep learning models. Real-time inference speed remains a key requirement for such applications to help medical personnel. Even though there generally exists a trade-off between performance and model size, impressive efforts have been made to retain near-original performance while reducing model size. Neural network pruning has emerged as an exciting area that aims to eliminate redundant parameters to make inference faster. In this study, we show an application of neural network pruning in polyp segmentation. We compute the importance score of convolutional filters and remove the filters with the lowest scores, which, up to a certain pruning level, does not degrade the performance. For computing the importance score, we use the Taylor First Order (TaylorFO) approximation of the change in network output for the removal of certain filters. Specifically, we employ a gradient-normalized backpropagation for the computation of the importance score. Through experiments on the polyp datasets, we validate that our approach can significantly reduce the parameter count and FLOPs while retaining similar performance.
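As a rough illustration of the filter-scoring idea in the abstract above, the following Python sketch ranks filters by a generic first-order Taylor estimate, |sum(activation * gradient)|, of the change caused by zeroing a filter's output. This is an assumption-laden sketch and not the authors' implementation: the gradient-normalized backpropagation they mention is not reproduced, and the function names and toy numbers are hypothetical.

```python
def taylor_fo_importance(activations, gradients):
    """First-order Taylor importance per filter.

    activations[f] and gradients[f] are flattened lists holding filter f's
    output values and the gradients of the network output with respect to them.
    """
    return [
        abs(sum(a * g for a, g in zip(acts, grads)))
        for acts, grads in zip(activations, gradients)
    ]

def filters_to_prune(scores, fraction):
    """Indices of the lowest-scoring filters, covering `fraction` of all filters."""
    n_prune = int(len(scores) * fraction)
    order = sorted(range(len(scores)), key=lambda f: scores[f])
    return sorted(order[:n_prune])

# Toy values for three filters (hypothetical numbers, not from the paper).
activations = [[1.0, 2.0], [0.5, -0.5], [3.0, 1.0]]
gradients = [[0.1, 0.2], [0.2, 0.2], [-1.0, 0.5]]
scores = taylor_fo_importance(activations, gradients)
pruned = filters_to_prune(scores, 0.34)
```

In this toy case the second filter's contributions cancel, so it receives the lowest score and is the one selected for removal.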
LightGlue: Local Feature Matching at Light Speed
Authors: Philipp Lindenberger, Paul-Edouard Sarlin, Marc Pollefeys
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.13643
Pdf link: https://arxiv.org/pdf/2306.13643
Abstract We introduce LightGlue, a deep neural network that learns to match local features across images. We revisit multiple design decisions of SuperGlue, the state of the art in sparse matching, and derive simple but effective improvements. Cumulatively, they make LightGlue more efficient - in terms of both memory and computation, more accurate, and much easier to train. One key property is that LightGlue is adaptive to the difficulty of the problem: the inference is much faster on image pairs that are intuitively easy to match, for example because of a larger visual overlap or limited appearance change. This opens up exciting prospects for deploying deep matchers in latency-sensitive applications like 3D reconstruction. The code and trained models are publicly available at https://github.com/cvg/LightGlue.
Keyword: mobile

Cryptanalysis on Secure ECC based Mutual Authentication Protocol for Cloud-Assisted TMIS
Authors: Diksha, Meenakshi
Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2306.13100
Pdf link: https://arxiv.org/pdf/2306.13100
Abstract The creation of TMIS (Telecare Medical Information System) makes it simpler for patients to receive healthcare services and opens up options for seeking medical attention and storing medical records with access control. With a Wireless Medical Sensor Network and cloud-based architecture, TMIS gives patients the chance to collect their physical health information from medical sensors and upload this information to the cloud through their mobile devices. The communication takes place over the internet, so security and privacy are the main concerns of a secure cloud-assisted TMIS. Moreover, because very sensitive data is transmitted between patients and doctors through the cloud server, security protection is important for this system. Recently, Kumar et al. designed a mutual authentication protocol for cloud-assisted TMIS based on ECC [2]. In this paper, we revisit this scheme and find that it has some significant pitfalls, such as vulnerability to a health report revelation attack and a lack of report confidentiality. In this study, we provide the cryptanalysis of the scheme developed by Kumar et al.
Optimal Cost-Preference Trade-off Planning with Multiple Temporal Tasks
Authors: Peter Amorese, Morteza Lahijanian
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL)
Arxiv link: https://arxiv.org/abs/2306.13222
Pdf link: https://arxiv.org/pdf/2306.13222
Abstract Autonomous robots are increasingly utilized in realistic scenarios with multiple complex tasks. In these scenarios, there may be a preferred way of completing all of the given tasks, but it is often in conflict with optimal execution. Recent work studies preference-based planning; however, it has yet to extend the notion of preference to the behavior of the robot with respect to each task. In this work, we introduce a novel notion of preference that provides a generalized framework to express preferences over individual tasks as well as their relations. Then, we perform an optimal trade-off (Pareto) analysis between behaviors that adhere to the user's preference and the ones that are resource optimal. We introduce an efficient planning framework that generates Pareto-optimal plans given the user's preference by extending A* search. Further, we show a method of computing the entire Pareto front (the set of all optimal trade-offs) via an adaptation of a multi-objective A* algorithm. We also present a problem-agnostic search heuristic to enable scalability. We illustrate the power of the framework on both mobile robots and manipulators. Our benchmarks show the effectiveness of the heuristic with up to two orders of magnitude speedup.
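The notion of a Pareto front used in the abstract above can be made concrete with a small Python sketch. It only filters a given list of candidate plans, each summarized by two costs to be minimized, down to the non-dominated ones; it is not the A*-based planner the paper describes. The pairing `(resource_cost, preference_cost)` and the sample values are assumptions for illustration.

```python
def pareto_front(points):
    """Return the non-dominated (resource_cost, preference_cost) pairs.

    Both objectives are minimized. After sorting by the first objective, a
    point is kept only if it strictly improves the best second objective so far.
    """
    front = []
    for p in sorted(set(points)):
        if not front or p[1] < front[-1][1]:
            front.append(p)
    return front

# Hypothetical candidate plans: (resource cost, preference cost).
plans = [(3, 1), (1, 4), (2, 2), (2, 5), (4, 1)]
front = pareto_front(plans)
```

Here `(2, 5)` is dominated by `(2, 2)` and `(4, 1)` by `(3, 1)`, so the remaining three plans are the optimal trade-offs among these candidates.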
Explainable Lifelong Stream Learning Based on "Glocal" Pairwise Fusion
Authors: Chu Kiong Loo, Wei Shiung Liew, Stefan Wermter
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.13410
Pdf link: https://arxiv.org/pdf/2306.13410
Abstract Real-time on-device continual learning applications are used on mobile phones, consumer robots, and smart appliances. Such devices have limited processing and memory storage capabilities, whereas continual learning acquires data over a long period of time. By necessity, lifelong learning algorithms have to be able to operate under such constraints while delivering good performance. This study presents the Explainable Lifelong Learning (ExLL) model, which incorporates several important traits: 1) learning to learn, in a single pass, from streaming data with scarce examples and resources; 2) a self-organizing prototype-based architecture that expands as needed and clusters streaming data into separable groups by similarity and preserves data against catastrophic forgetting; 3) an interpretable architecture to convert the clusters into explainable IF-THEN rules as well as to justify model predictions in terms of what is similar and dissimilar to the inference; and 4) inferences at the global and local level using a pairwise decision fusion process to enhance the accuracy of the inference, hence "Glocal Pairwise Fusion." We compare ExLL against contemporary online learning algorithms for image recognition, using OpenLoris, F-SIOL-310, and Places datasets to evaluate several continual learning scenarios for video streams, low-sample learning, ability to scale, and imbalanced data streams. The algorithms are evaluated for their performance in accuracy, number of parameters, and experiment runtime requirements. ExLL outperforms all algorithms for accuracy in the majority of the tested scenarios.
Keyword: pruning

Neural Network Pruning for Real-time Polyp Segmentation
Authors: Suman Sapkota, Pranav Poudel, Sudarshan Regmi, Bibek Panthi, Binod Bhattarai
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.13203
Pdf link: https://arxiv.org/pdf/2306.13203
Abstract Computer-assisted treatment has emerged as a viable application of medical imaging, owing to the efficacy of deep learning models. Real-time inference speed remains a key requirement for such applications to help medical personnel. Even though there generally exists a trade-off between performance and model size, impressive efforts have been made to retain near-original performance while reducing model size. Neural network pruning has emerged as an exciting area that aims to eliminate redundant parameters to make inference faster. In this study, we show an application of neural network pruning in polyp segmentation. We compute the importance score of convolutional filters and remove the filters with the lowest scores, which, up to a certain pruning level, does not degrade the performance. For computing the importance score, we use the Taylor First Order (TaylorFO) approximation of the change in network output for the removal of certain filters. Specifically, we employ a gradient-normalized backpropagation for the computation of the importance score. Through experiments on the polyp datasets, we validate that our approach can significantly reduce the parameter count and FLOPs while retaining similar performance.
Pruning for Better Domain Generalizability
Authors: Xinglong Sun
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.13237
Pdf link: https://arxiv.org/pdf/2306.13237
Abstract In this paper, we investigate whether pruning can be used as a reliable method to boost the generalization ability of a model. We find that existing pruning methods such as L2 can already offer a small improvement in target-domain performance. We further propose a novel pruning scoring method, called DSS, designed not to maintain source accuracy as in typical pruning work, but to directly enhance the robustness of the model. We conduct empirical experiments to validate our method and demonstrate that it can even be combined with state-of-the-art generalization work like MIRO (Cha et al., 2022) to further boost performance. On MNIST to MNIST-M, we improve the baseline performance by over 5 points by introducing 60% channel sparsity into the model. On the DomainBed benchmark with state-of-the-art MIRO, we further boost performance by 1 point by introducing only 10% sparsity into the model. Code can be found at: https://github.com/AlexSunNik/Pruning-for-Better-Domain-Generalizability
Efficient Online Processing with Deep Neural Networks
Authors: Lukas Hedegaard
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.13474
Pdf link: https://arxiv.org/pdf/2306.13474
Abstract The capabilities and adoption of deep neural networks (DNNs) grow at an exhilarating pace: Vision models accurately classify human actions in videos and identify cancerous tissue in medical scans as precisely as human experts; large language models answer wide-ranging questions, generate code, and write prose, becoming the topic of everyday dinner-table conversations. Even though their uses are exhilarating, the continually increasing model sizes and computational complexities have a dark side. The economic cost and negative environmental externalities of training and serving models are in evident disharmony with financial viability and climate action goals. Instead of pursuing yet another increase in predictive performance, this dissertation is dedicated to the improvement of neural network efficiency. Specifically, a core contribution addresses the efficiency aspects during online inference. Here, the concept of Continual Inference Networks (CINs) is proposed and explored across four publications. CINs extend prior state-of-the-art methods developed for offline processing of spatio-temporal data and reuse their pre-trained weights, improving their online processing efficiency by an order of magnitude. These advances are attained through a bottom-up computational reorganization and judicious architectural modifications. The benefit to online inference is demonstrated by reformulating several widely used network architectures into CINs, including 3D CNNs, ST-GCNs, and Transformer Encoders. An orthogonal contribution tackles the concurrent adaptation and computational acceleration of a large source model into multiple lightweight derived models. Drawing on fusible adapter networks and structured pruning, Structured Pruning Adapters achieve superior predictive accuracy under aggressive pruning using significantly fewer learned weights compared to fine-tuning with pruning.
Binary domain generalization for sparsifying binary neural networks
Authors: Riccardo Schiavone, Francesco Galati, Maria A. Zuluaga
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.13515
Pdf link: https://arxiv.org/pdf/2306.13515
Abstract Binary neural networks (BNNs) are an attractive solution for developing and deploying deep neural network (DNN)-based applications in resource-constrained devices. Despite their success, BNNs still suffer from a fixed and limited compression factor, which may be explained by the fact that existing pruning methods for full-precision DNNs cannot be directly applied to BNNs. In fact, weight pruning of BNNs leads to performance degradation, which suggests that the standard binarization domain of BNNs is not well adapted for the task. This work proposes a novel, more general binary domain that extends the standard binary one and is more robust to pruning techniques, thus guaranteeing improved compression and avoiding severe performance losses. We demonstrate a closed-form solution for quantizing the weights of a full-precision network into the proposed binary domain. Finally, we show the flexibility of our method, which can be combined with other pruning strategies. Experiments over CIFAR-10 and CIFAR-100 demonstrate that the novel approach is able to generate efficient sparse networks with reduced memory usage and run-time latency, while maintaining performance.
Keyword: diffusion

Evaluating the Robustness of Text-to-image Diffusion Models against Real-world Attacks
Authors: Hongcheng Gao, Hao Zhang, Yinpeng Dong, Zhijie Deng
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.13103
Pdf link: https://arxiv.org/pdf/2306.13103
Abstract Text-to-image (T2I) diffusion models (DMs) have shown promise in generating high-quality images from textual descriptions. The real-world applications of these models require particular attention to their safety and fidelity, but this has not been sufficiently explored. One fundamental question is whether existing T2I DMs are robust against variations over input texts. To answer it, this work provides the first robustness evaluation of T2I DMs against real-world attacks. Unlike prior studies that focus on malicious attacks involving apocryphal alterations to the input texts, we consider an attack space spanned by realistic errors (e.g., typo, glyph, phonetic) that humans can make, to ensure semantic consistency. Given the inherent randomness of the generation process, we develop novel distribution-based attack objectives to mislead T2I DMs. We perform attacks in a black-box manner without any knowledge of the model. Extensive experiments demonstrate the effectiveness of our method for attacking popular T2I DMs and simultaneously reveal their non-trivial robustness issues. Moreover, we provide an in-depth analysis of our method to show that it is not designed to attack the text encoder in T2I DMs solely.
DiMSam: Diffusion Models as Samplers for Task and Motion Planning under Partial Observability
Authors: Xiaolin Fang, Caelan Reed Garrett, Clemens Eppner, Tomás Lozano-Pérez, Leslie Pack Kaelbling, Dieter Fox
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.13196
Pdf link: https://arxiv.org/pdf/2306.13196
Abstract Task and Motion Planning (TAMP) approaches are effective at planning long-horizon autonomous robot manipulation. However, because they require a planning model, it can be difficult to apply them to domains where the environment and its dynamics are not fully known. We propose to overcome these limitations by leveraging deep generative modeling, specifically diffusion models, to learn constraints and samplers that capture these difficult-to-engineer aspects of the planning model. These learned samplers are composed and combined within a TAMP solver in order to find action parameter values jointly that satisfy the constraints along a plan. To tractably make predictions for unseen objects in the environment, we define these samplers on low-dimensional learned latent embeddings of changing object state. We evaluate our approach in an articulated object manipulation domain and show how the combination of classical TAMP, generative learning, and latent embeddings enables long-horizon constraint-based reasoning.
Directional diffusion models for graph representation learning
Authors: Run Yang, Yuling Yang, Fan Zhou, Qiang Sun
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.13210
Pdf link: https://arxiv.org/pdf/2306.13210
Abstract In recent years, diffusion models have achieved remarkable success in various domains of artificial intelligence, such as image synthesis, super-resolution, and 3D molecule generation. However, the application of diffusion models in graph learning has received relatively little attention. In this paper, we address this gap by investigating the use of diffusion models for unsupervised graph representation learning. We begin by identifying the anisotropic structures of graphs and a crucial limitation of the vanilla forward diffusion process in learning anisotropic structures. This process relies on continuously adding an isotropic Gaussian noise to the data, which may convert the anisotropic signals to noise too quickly. This rapid conversion hampers the training of denoising neural networks and impedes the acquisition of semantically meaningful representations in the reverse process. To address this challenge, we propose a new class of models called directional diffusion models. These models incorporate data-dependent, anisotropic, and directional noises in the forward diffusion process. To assess the efficacy of our proposed models, we conduct extensive experiments on 12 publicly available datasets, focusing on two distinct graph representation learning tasks. The experimental results demonstrate the superiority of our models over state-of-the-art baselines, indicating their effectiveness in capturing meaningful graph representations. Our studies not only provide valuable insights into the forward process of diffusion models but also highlight the wide-ranging potential of these models for various graph-related tasks.
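For readers unfamiliar with the "vanilla forward diffusion process" mentioned in the abstract above, the following is a minimal sketch of the standard isotropic Gaussian forward step and how fast the signal-to-noise ratio decays. It is not the authors' code and does not implement their directional noise; the linear schedule, its endpoints, and all function names are assumptions chosen for illustration.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed, commonly used defaults)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def forward_sample(x0, alpha_bar_t, rng):
    """Draw x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I).

    The same unit-variance noise is added to every coordinate, which is
    what makes this forward process isotropic.
    """
    signal_scale = math.sqrt(alpha_bar_t)
    noise_scale = math.sqrt(1.0 - alpha_bar_t)
    return [signal_scale * v + noise_scale * rng.gauss(0.0, 1.0) for v in x0]

def snr(alpha_bar_t):
    """Signal-to-noise ratio of x_t: alpha_bar_t / (1 - alpha_bar_t)."""
    return alpha_bar_t / (1.0 - alpha_bar_t)

if __name__ == "__main__":
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    rng = random.Random(0)
    x0 = [1.0, -2.0, 0.5]  # hypothetical node feature vector
    for t in (0, 99, 499, 999):
        x_t = forward_sample(x0, alpha_bars[t], rng)
        print(t + 1, snr(alpha_bars[t]), x_t)
```

Printing the SNR at a few steps shows how the signal fades under this schedule, which is the behaviour the abstract describes as converting anisotropic signals to noise too quickly.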
DreamEditor: Text-Driven 3D Scene Editing with Neural Fields
Authors: Jingyu Zhuang, Chen Wang, Lingjie Liu, Liang Lin, Guanbin Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.13455
Pdf link: https://arxiv.org/pdf/2306.13455
Abstract Neural fields have achieved impressive advancements in view synthesis and scene reconstruction. However, editing these neural fields remains challenging due to the implicit encoding of geometry and texture information. In this paper, we propose DreamEditor, a novel framework that enables users to perform controlled editing of neural fields using text prompts. By representing scenes as mesh-based neural fields, DreamEditor allows localized editing within specific regions. DreamEditor utilizes the text encoder of a pretrained text-to-image diffusion model to automatically identify the regions to be edited based on the semantics of the text prompts. Subsequently, DreamEditor optimizes the editing region and aligns its geometry and texture with the text prompts through score distillation sampling [29]. Extensive experiments have demonstrated that DreamEditor can accurately edit neural fields of real-world scenes according to the given text prompts while ensuring consistency in irrelevant areas. DreamEditor generates highly realistic textures and geometry, significantly surpassing previous works in both quantitative and qualitative evaluations.
Keyword: adaptive

A First Order Meta Stackelberg Method for Robust Federated Learning (Technical Report)
Authors: Henger Li, Tianyi Xu, Tao Li, Yunian Pan, Quanyan Zhu, Zizhan Zheng
Subjects: Cryptography and Security (cs.CR); Computer Science and Game Theory (cs.GT)
Arxiv link: https://arxiv.org/abs/2306.13273
Pdf link: https://arxiv.org/pdf/2306.13273
Abstract Recent research efforts indicate that federated learning (FL) systems are vulnerable to a variety of security breaches. While numerous defense strategies have been suggested, they are mainly designed to counter specific attack patterns and lack adaptability, rendering them less effective when facing uncertain or adaptive threats. This work models adversarial FL as a Bayesian Stackelberg Markov game (BSMG) between the defender and the attacker to address the lack of adaptability to uncertain adaptive attacks. We further devise an effective meta-learning technique to solve for the Stackelberg equilibrium, leading to a resilient and adaptable defense. The experimental results suggest that our meta-Stackelberg learning approach excels in combating intense model poisoning and backdoor attacks of indeterminate types.
Energy-optimal control of adaptive structures
Authors: Manuel Schaller, Amelie Zeller, Michael Böhm, Oliver Sawodny, Cristina Tarín, Karl Worthmann
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2306.13331
Pdf link: https://arxiv.org/pdf/2306.13331
Abstract Adaptive structures are equipped with sensors and actuators to actively counteract external loads such as wind. This can significantly reduce resource consumption and emissions during the life cycle compared to conventional structures. A common approach is to derive a port-Hamiltonian model and to employ linear quadratic control. However, the quadratic control penalization lacks physical interpretation and merely serves as a regularization term. Instead, we propose a controller that achieves the goal of vibration damping while acting energy-optimally. Exploiting the port-Hamiltonian structure, we show that the optimal control is uniquely determined, even on singular arcs. Further, we prove a stable long-time behavior of optimal trajectories in the sense of a turnpike property. Last, the proposed controller's efficiency is evaluated by means of a numerical study.
PathMLP: Smooth Path Towards High-order Homophily
Authors: Chenxuan Xie, Jiajun Zhou, Shengbo Gong, Jiacheng Wan, Jiaxu Qian, Shanqing Yu, Qi Xuan, Xiaoniu Yang
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2306.13532
Pdf link: https://arxiv.org/pdf/2306.13532
Abstract Real-world graphs exhibit increasing heterophily, where nodes no longer tend to be connected to nodes with the same label, challenging the homophily assumption of classical graph neural networks (GNNs) and impeding their performance. Intriguingly, we observe that certain high-order information on heterophilous data exhibits high homophily, which motivates us to involve high-order information in node representation learning. However, common practices in GNNs acquire high-order information mainly by increasing model depth and altering message-passing mechanisms, which, albeit effective to a certain extent, suffer from three shortcomings: 1) over-smoothing due to excessive model depth and propagation times; 2) incomplete use of high-order information; 3) low computational efficiency. In this regard, we design a similarity-based path sampling strategy to capture smooth paths containing high-order homophily. Then we propose a lightweight model based on multi-layer perceptrons (MLP), named PathMLP, which can encode messages carried by paths via simple transformation and concatenation operations, and effectively learn node representations in heterophilous graphs through adaptive path aggregation. Extensive experiments demonstrate that our method outperforms baselines on 16 out of 20 datasets, underlining its effectiveness and superiority in alleviating the heterophily problem. In addition, our method is immune to over-smoothing and has high computational efficiency.
LightGlue: Local Feature Matching at Light Speed
Authors: Philipp Lindenberger, Paul-Edouard Sarlin, Marc Pollefeys
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.13643
Pdf link: https://arxiv.org/pdf/2306.13643
Abstract We introduce LightGlue, a deep neural network that learns to match local features across images. We revisit multiple design decisions of SuperGlue, the state of the art in sparse matching, and derive simple but effective improvements. Cumulatively, they make LightGlue more efficient (in terms of both memory and computation), more accurate, and much easier to train. One key property is that LightGlue is adaptive to the difficulty of the problem: the inference is much faster on image pairs that are intuitively easy to match, for example because of a larger visual overlap or limited appearance change. This opens up exciting prospects for deploying deep matchers in latency-sensitive applications like 3D reconstruction. The code and trained models are publicly available at https://github.com/cvg/LightGlue.
Keyword: quantization

There is no result

A-suozhang / GetArxivDaily

New submissions for Mon, 26 Jun 23 #88

Keyword: efficient

AmicroN: A Framework for Generating Annotations for Human Activity Recognition with Granular Micro-Activities

Optimal Cost-Preference Trade-off Planning with Multiple Temporal Tasks

Document Image Cleaning using Budget-Aware Black-Box Approximation

Nonsmooth Control Barrier Functions for Obstacle Avoidance between Convex Regions

Variance-Covariance Regularization Improves Representation Learning

Deep Omni-supervised Learning for Rib Fracture Detection from Chest Radiology Images

Abstractive Text Summarization for Resumes With Cutting Edge NLP Transformers and LSTM

Patch-Level Contrasting without Patch Correspondence for Accurate and Dense Contrastive Representation Learning

Multi-objective optimization based network control principles for identifying personalized drug targets with cancer

A selectively reduced degree basis for efficient mixed nonlinear isogeometric beam formulations with extensible directors

Human Activity Behavioural Pattern Recognition in Smarthome with Long-hour Data Collection

Solving a class of multi-scale elliptic PDEs by means of Fourier-based mixed physics informed neural networks

3DSAM-adapter: Holistic Adaptation of SAM from 2D to 3D for Promptable Medical Image Segmentation

A Stabilized Circuit-Consistent Foil Conductor Model

Safe Risk-averse Bayesian Optimization for Controller Tuning

Circuit simulation using explicit methods: singular matrix issues

Smoothed Circulant Embedding with Applications to Multilevel Monte Carlo Methods for PDEs with Random Coefficients

Onboarding Citizens to Digital Identity Systems

DISCO-10M: A Large-Scale Music Dataset

Binary domain generalization for sparsifying binary neural networks

Torsion Graph Neural Networks

Manifold Contrastive Learning with Variational Lie Group Operators

Inferring Hierarchical Structure in Multi-Room Maze Environments

Active Coverage for PAC Reinforcement Learning

Machine Learning methods for simulating particle response in the Zero Degree Calorimeter at the ALICE experiment, CERN

Adversarial Robustness Certification for Bayesian Neural Networks

CIDGIKc: Distance-Geometric Inverse Kinematics for Continuum Robots

On particular solutions of linear partial differential equations with polynomial right-hand-sides

LightGlue: Local Feature Matching at Light Speed

ProRes: Exploring Degradation-aware Visual Prompt for Universal Image Restoration

Keyword: faster

Improving Log-Cumulant Based Estimation of Roughness Information in SAR imagery

Neural Network Pruning for Real-time Polyp Segmentation

LightGlue: Local Feature Matching at Light Speed

Keyword: mobile

Cryptanalysis on Secure ECC based Mutual Authentication Protocol for Cloud-Assisted TMIS

Optimal Cost-Preference Trade-off Planning with Multiple Temporal Tasks

Explainable Lifelong Stream Learning Based on "Glocal" Pairwise Fusion

Keyword: pruning

Neural Network Pruning for Real-time Polyp Segmentation

Pruning for Better Domain Generalizability

Efficient Online Processing with Deep Neural Networks

Binary domain generalization for sparsifying binary neural networks

Keyword: diffusion

Evaluating the Robustness of Text-to-image Diffusion Models against Real-world Attacks

DiMSam: Diffusion Models as Samplers for Task and Motion Planning under Partial Observability

Directional diffusion models for graph representation learning

DreamEditor: Text-Driven 3D Scene Editing with Neural Fields

Keyword: adaptive

A First Order Meta Stackelberg Method for Robust Federated Learning (Technical Report)

Energy-optimal control of adaptive structures

PathMLP: Smooth Path Towards High-order Homophily

LightGlue: Local Feature Matching at Light Speed

Keyword: quantization