Abstract
We propose an academic publishing system where research papers are stored in a network of data centres owned by university libraries and research institutions, and are interfaced with the academic community through a website. In our system, the editor is replaced by an initial adjusted community-wide evaluation; the standard peer review is accompanied by a post-publication, open-ended, community-wide review process, aiming at a more objective and longer-term evaluation; the publishing costs are reduced to the running costs of the servers; and access is fully open. Our proposal addresses the fundamental problems of the current system: it reduces publishing costs, allowing easier access by less well-funded institutions (especially from developing countries); it makes the editorial evaluation distributed and more transparent; it speeds up the peer-review process by eliminating the need for multiple resubmissions; and it introduces a long-term, community-wide evaluation of papers, ensuring their continued relevance and accuracy, all while maximising its main goals: ensuring the highest quality of peer review and giving the best papers the best referees, the most visibility, and the most credit. Our scheme is time-efficient, financially sustainable, ethically fair, and represents a significant improvement over the current system.
Parallel bootstrap-based on-policy deep reinforcement learning for continuous flow control applications
Authors: J. Viquerat, E. Hachem
Subjects: Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an)
Abstract
The coupling of deep reinforcement learning to numerical flow control problems has recently received considerable attention, leading to groundbreaking results and opening new perspectives for the domain. Due to the usually high computational cost of fluid dynamics solvers, the use of parallel environments during the learning process represents an essential ingredient to attain efficient control in a reasonable time. Yet, most of the deep reinforcement learning literature for flow control relies on on-policy algorithms, for which massively parallel transition collection may break theoretical assumptions and lead to suboptimal control models. To overcome this issue, we propose a parallelism pattern relying on partial-trajectory buffers terminated by a return-bootstrapping step, allowing a flexible use of parallel environments while preserving the on-policy nature of the updates. This approach is illustrated on a CPU-intensive continuous flow control problem from the literature.
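To make the bootstrapping idea concrete, here is a minimal sketch (our own illustration under assumed names, not the authors' implementation) of how a partial trajectory truncated by the buffer can be closed with a value estimate; value_fn stands in for the critic and gamma for the discount factor:

```python
import numpy as np

def bootstrapped_returns(rewards, last_obs, done, value_fn, gamma=0.99):
    """Discounted returns for a partial trajectory.

    If the trajectory was truncated by the buffer rather than terminated
    by the environment, the critic's value estimate of the last observation
    bootstraps the missing tail of the return.
    """
    running = 0.0 if done else value_fn(last_obs)  # bootstrap on truncation
    returns = np.empty(len(rewards))
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# A 3-step partial trajectory cut off by the buffer (toy critic):
rets = bootstrapped_returns([1.0, 0.0, 1.0], last_obs=None, done=False,
                            value_fn=lambda obs: 0.5)
```

Because every transition in each buffer is collected under the current policy before an update, parallel workers can contribute trajectories of unequal length without breaking the on-policy assumption.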
Beyond the Pixel: a Photometrically Calibrated HDR Dataset for Luminance and Color Temperature Prediction
Authors: Christophe Bolduc, Justine Giroux, Marc Hébert, Claude Demers, Jean-François Lalonde
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Light plays an important role in human well-being. However, most computer vision tasks treat pixels without considering their relationship to physical luminance. To address this shortcoming, we present the first large-scale photometrically calibrated dataset of high dynamic range 360° panoramas. Our key contribution is the calibration of an existing, uncalibrated HDR dataset. We do so by accurately capturing RAW bracketed exposures simultaneously with a professional photometric measurement device (chroma meter) for multiple scenes across a variety of lighting conditions. Using the resulting measurements, we establish the calibration coefficients to be applied to the HDR images. The resulting dataset is a rich representation of indoor scenes which displays a wide range of illuminance and color temperature, and varied types of light sources. We exploit the dataset to introduce three novel tasks: predicting per-pixel luminance, per-pixel color temperature, and planar illuminance from a single input image. Finally, we also capture another smaller calibrated dataset with a commercial 360° camera, to experiment on generalization across cameras. We are optimistic that the release of our datasets and associated code will spark interest in physically accurate light estimation within the community.
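As a toy illustration of what such a calibration entails (our sketch, not the released code), a single multiplicative coefficient can be fitted so that HDR pixel values in the metered regions reproduce the chroma-meter luminance readings:

```python
import numpy as np

def fit_calibration(hdr_region_means, meter_luminances):
    """Least-squares scale through the origin: k = <x, y> / <x, x>."""
    x = np.asarray(hdr_region_means)
    y = np.asarray(meter_luminances)
    return float(x @ y / (x @ x))

# Hypothetical readings: relative HDR means vs. metered luminance (cd/m^2).
k = fit_calibration([0.12, 0.45, 0.80], [30.0, 110.0, 205.0])
panorama = np.random.rand(16, 32)      # toy stand-in for an HDR panorama
calibrated = k * panorama              # pixel values now in cd/m^2
```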
Efficient and Scalable Path-Planning Algorithms for Curvature Constrained Motion in the Hamilton-Jacobi Formulation
Abstract
We present a partial-differential-equation-based optimal path-planning framework for curvature-constrained motion, with application to vehicles in two and three spatial dimensions. This formulation relies on optimal control theory, dynamic programming, and a Hamilton-Jacobi-Bellman equation. Many authors have developed similar models and employed grid-based numerical methods to solve the partial differential equation required to generate optimal trajectories. However, these methods can be inefficient and do not scale well to high dimensions. We describe how efficient and scalable algorithms for high-dimensional Hamilton-Jacobi equations can be developed to solve such problems, even in high dimensions, while maintaining the Hamilton-Jacobi formulation. We demonstrate our method with several examples.
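As a concrete instance of the formulation (our illustration, using the standard Dubins-car model of curvature-constrained motion rather than the paper's exact system), the minimum-time value function $u(x, y, \theta)$ satisfies a Hamilton-Jacobi-Bellman equation of the form:

```latex
% Dubins car at speed v with minimum turning radius \rho; the control is
% the turning rate |\omega| \le v/\rho. Dynamic programming gives
\[
  v\cos\theta\,\partial_x u + v\sin\theta\,\partial_y u
  - \frac{v}{\rho}\,\bigl|\partial_\theta u\bigr| + 1 = 0,
  \qquad u = 0 \ \text{on the target set},
\]
% and optimal trajectories are recovered by descending the characteristics.
```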
Recognizing and generating unswitchable graphs
Authors: Asish Mukhopadhyay, Daniel John, Srivatsan Vasudevan
Abstract
In this paper, we show that unswitchable graphs are a proper subclass of split graphs, and exploit this fact to propose efficient algorithms for their recognition and generation.
Green Video Complexity Analysis for Efficient Encoding in Adaptive Video Streaming
Authors: Vignesh V Menon, Christian Feldmann, Klaus Schoeffmann, Mohammad Ghanbari, Christian Timmerer
Abstract
For adaptive streaming applications, low-complexity and accurate video complexity features are necessary to analyze the video content in real time, which ensures fast and compression-efficient video streaming without disruptions. The state-of-the-art video complexity features are the Spatial Information (SI) and Temporal Information (TI) features, which do not correlate well with the encoding parameters in adaptive streaming applications. To address this, the Video Complexity Analyzer (VCA) was introduced, determining the features based on Discrete Cosine Transform (DCT) energy. This paper presents optimizations on VCA for faster and more energy-efficient video complexity analysis. Experimental results show that VCA v2.0, using eight CPU threads, Single Instruction Multiple Data (SIMD), and low-pass DCT optimization, determines seven complexity features of Ultra High Definition 8-bit videos with better accuracy at speeds of up to 292.68 fps and with 97.06% lower energy consumption than the reference SITI implementation.
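For intuition, a simplified spatial-complexity feature in the DCT-energy spirit can be sketched as follows (our toy illustration; VCA's actual feature definitions and optimizations differ):

```python
import numpy as np
from scipy.fft import dctn

def block_dct_energy(luma, block=32):
    """Mean high-frequency DCT energy over non-overlapping blocks.

    The DC coefficient is zeroed so that flat blocks contribute nothing,
    leaving a texture/complexity measure for the frame.
    """
    h, w = luma.shape
    energies = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            coeffs = dctn(luma[i:i + block, j:j + block], norm="ortho")
            coeffs[0, 0] = 0.0                    # drop the DC term
            energies.append(np.abs(coeffs).sum())
    return float(np.mean(energies))

frame = np.random.rand(1080, 1920)                # stand-in for a luma plane
print(block_dct_energy(frame))
```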
Matrix-free GPU-accelerated saddle-point solvers for high-order problems in $H(\mathrm{div})$
Authors: Will Pazner, Tzanio Kolev, Panayot Vassilevski
Abstract
This work describes the development of matrix-free GPU-accelerated solvers for high-order finite element problems in $H(\mathrm{div})$. The solvers are applicable to grad-div and Darcy problems in saddle-point formulation, and have applications in radiation diffusion and porous media flow problems, among others. Using the interpolation-histopolation basis (cf. SIAM J. Sci. Comput., 45 (2023), A675-A702, arXiv:2203.02465), efficient matrix-free preconditioners can be constructed for the $(1,1)$-block and Schur complement of the block system. With these approximations, block-preconditioned MINRES converges in a number of iterations that is independent of the mesh size and polynomial degree. The approximate Schur complement takes the form of an M-matrix graph Laplacian, and therefore can be well-preconditioned by highly scalable algebraic multigrid methods. High-performance GPU-accelerated algorithms for all components of the solution algorithm are developed, discussed, and benchmarked. Numerical results are presented on a number of challenging test cases, including the "crooked pipe" grad-div problem, the SPE10 reservoir modeling benchmark problem, and a nonlinear radiation diffusion test case.
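The block-preconditioning idea can be sketched in a few lines (a CPU toy with diagonal approximations standing in for the paper's matrix-free GPU operators):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, minres

# Toy saddle-point system [[A, B^T], [B, 0]] with a block-diagonal
# preconditioner diag(A_hat, S_hat); here A_hat = diag(A) and
# S_hat = diag(B diag(A)^-1 B^T) stand in for the paper's approximations.
n, m = 200, 80
rng = np.random.default_rng(0)
A = sp.diags(rng.uniform(1.0, 2.0, n))            # SPD (1,1)-block (toy)
B = sp.random(m, n, density=0.05, random_state=0)
K = sp.bmat([[A, B.T], [B, None]], format="csr")

dA = A.diagonal()
dS = np.maximum((B @ sp.diags(1.0 / dA) @ B.T).diagonal(), 1e-8)

def apply_prec(r):
    out = np.empty_like(r)
    out[:n] = r[:n] / dA          # approximate (1,1)-block solve
    out[n:] = r[n:] / dS          # approximate Schur-complement solve
    return out

M = LinearOperator(K.shape, matvec=apply_prec)
b = rng.standard_normal(n + m)
x, info = minres(K, b, M=M)       # symmetric indefinite system, SPD M
```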
HDCC: A Hyperdimensional Computing compiler for classification on embedded systems and high-performance computing
Authors: Pere Vergés, Mike Heddes, Igor Nunes, Tony Givargis, Alexandru Nicolau
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Hyperdimensional Computing (HDC) is a bio-inspired computing framework that has gained increasing attention, especially as a more efficient approach to machine learning (ML). This work introduces the HDCC compiler, the first open-source compiler that translates high-level descriptions of HDC classification methods into optimized C code. The code generated by the proposed compiler has three main features for embedded systems and High-Performance Computing: (1) it is self-contained and has no library or platform dependencies; (2) it supports multithreading and single instruction multiple data (SIMD) instructions using C intrinsics; (3) it is optimized for maximum performance and minimal memory usage. HDCC is designed like a modern compiler, featuring an intuitive and descriptive input language, an intermediate representation (IR), and a retargetable backend. This makes HDCC a valuable tool for research and applications exploring HDC for classification tasks on embedded systems and High-Performance Computing. To substantiate these claims, we conducted experiments with HDCC on several of the most popular datasets in the HDC literature. The experiments were run on four different machines, including different hyperparameter configurations, and the results were compared to a popular prototyping library built on PyTorch. The results show a training and inference speedup of up to 132x, averaging 25x across all datasets and machines. Regarding memory usage, using 10240-dimensional hypervectors, the average reduction was 5x, reaching up to 14x. When considering vectors of 64 dimensions, the average reduction was 85x, with a maximum of 158x less memory utilization.
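For readers new to HDC, the classification scheme such compilers target can be reduced to a few numpy lines (our simplified sketch; HDCC emits optimized C, not this Python):

```python
import numpy as np

D = 10_000                                   # hypervector dimensionality
rng = np.random.default_rng(0)
n_features, n_levels, n_classes = 16, 8, 3
feat_hv = rng.choice([-1, 1], size=(n_features, D))   # random bipolar HVs
level_hv = rng.choice([-1, 1], size=(n_levels, D))

def encode(levels):
    """Bind each feature id with its quantized value, then bundle."""
    return np.sign((feat_hv * level_hv[levels]).sum(axis=0))

# Train: bundle the encodings of each class into a class prototype.
X = rng.integers(0, n_levels, size=(60, n_features))
y = rng.integers(0, n_classes, size=60)
protos = np.array([np.sign(sum(encode(x) for x in X[y == c]))
                   for c in range(n_classes)])

def classify(levels):
    q = encode(levels)
    sims = protos @ q / (np.linalg.norm(protos, axis=1) * np.linalg.norm(q))
    return int(np.argmax(sims))                       # nearest prototype
```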
Abstract
We consider the problem of constructing a code capable of correcting a single long tandem duplication error of variable length. As the main contribution of this paper, we present a $q$-ary efficiently encodable code of length $n+1$ and redundancy $1$ that can correct a single duplication of length at least $K=4\cdot\lceil \log_q n\rceil +1$. The complexity of encoding is $O(\frac{n^2}{\log n})$ and the complexity of decoding is $O(n)$. We also present a $q$-ary non-efficient code of length $n+1$ correcting a single long duplication of length at least $K = \lceil \log_q n\rceil +\phi(n)$, where $\phi(n)\rightarrow{\infty}$ as $n\rightarrow{\infty}$. This code has redundancy less than $1$ for sufficiently large $n$. Moreover, we show that in the class of codes correcting a single long duplication with redundancy $1$, the value $K$ in our constructions is order-optimal.
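For readers unfamiliar with the channel model: a tandem duplication copies a substring in place, immediately after its original occurrence. A short illustration (ours):

```python
def tandem_duplicate(s: str, start: int, length: int) -> str:
    """Insert a copy of s[start:start+length] immediately after it."""
    return s[:start + length] + s[start:start + length] + s[start + length:]

# A single duplication of length 4 starting at position 1:
assert tandem_duplicate("021201", 1, 4) == "0212021201"
```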
PEFT-Ref: A Modular Reference Architecture and Typology for Parameter-Efficient Finetuning Techniques
Authors: Mohammed Sabry, Anya Belz
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
Recent parameter-efficient finetuning (PEFT) techniques aim to mitigate the considerable cost of fully finetuning large pretrained language models (PLMs). As different PEFT techniques proliferate, it is becoming difficult to compare them, in particular in terms of (i) the structure and functionality they add to the PLM, (ii) the different types and degrees of efficiency improvements achieved, (iii) performance at different downstream tasks, and (iv) how differences in structure and functionality relate to efficiency and task performance. To facilitate such comparisons, this paper presents a reference framework which standardises aspects shared by different PEFT techniques, while isolating differences to specific locations and interactions with the standard components. Through this process of standardising and isolating differences, a modular view of PEFT techniques emerges, supporting not only direct comparison of different techniques and their efficiency and task performance, but also systematic exploration of reusability and composability of the different types of finetuned modules. We demonstrate how the reference framework can be applied to understand properties and relative advantages of PEFT techniques, hence to inform selection of techniques for specific tasks, and design choices for new PEFT techniques.
Sample-Efficient and Surrogate-Based Design Optimization of Underwater Vehicle Hulls
Authors: Harsh Vardhan, David Hyde, Umesh Timalsina, Peter Volgyesi, Janos Sztipanovits
Abstract
Physics simulations are a computational bottleneck in computer-aided design (CAD) optimization processes. Hence, in order to make accurate (computationally expensive) simulations feasible for use in design optimization, one requires either an optimization framework that is highly sample-efficient, or fast data-driven proxies (surrogate models) for long-running simulations. In this work, we leverage recent advances in optimization and artificial intelligence (AI) to pursue both of these potential solutions, in the context of designing an optimal unmanned underwater vehicle (UUV). We first investigate and compare the sample efficiency and convergence behavior of different optimization techniques with a standard computational fluid dynamics (CFD) solver in the optimization loop. We then develop a deep neural network (DNN) based surrogate model to approximate drag forces that would otherwise be computed via direct numerical simulation with the CFD solver. The surrogate model is in turn used in the optimization loop of the hull design. Our study finds that the Bayesian Optimization Lower Confidence Bound (BO LCB) algorithm is the most sample-efficient optimization framework and has the best convergence behavior of those considered. Subsequently, we show that our DNN-based surrogate model predicts drag force on test data in tight agreement with CFD simulations, with a mean absolute percentage error (MAPE) of 1.85%. Combining these results, we demonstrate a two-orders-of-magnitude speedup (with comparable accuracy) for the design optimization process when the surrogate model is used. To our knowledge, this is the first study applying Bayesian optimization and DNN-based surrogate modeling to the problem of UUV design optimization, and we share our developments as open-source software.
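The LCB acquisition rule at the heart of the compared framework is simple to state; here is a minimal sketch for drag minimization (our illustration, with mu and sigma assumed to come from a GP surrogate):

```python
import numpy as np

def lcb(mu, sigma, kappa=2.0):
    """Lower-confidence-bound acquisition for minimization.

    Picks the candidate minimizing mu - kappa * sigma, trading off
    exploitation (low predicted drag) against exploration (high
    posterior uncertainty).
    """
    return mu - kappa * sigma

# Posterior mean/std of predicted drag for three candidate hulls (toy values):
mu = np.array([3.1, 2.7, 2.9])
sigma = np.array([0.05, 0.40, 0.10])
next_design = int(np.argmin(lcb(mu, sigma)))   # evaluate this hull with CFD
```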
TIGTEC: Token Importance Guided TExt Counterfactuals
Authors: Milan Bhan, Jean-Noel Vittaut, Nicolas Chesneau, Marie-Jeanne Lesot
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Methodology (stat.ME)
Abstract
Counterfactual examples explain a prediction by highlighting changes to an instance that flip the outcome of a classifier. This paper proposes TIGTEC, an efficient and modular method for generating sparse, plausible and diverse counterfactual explanations for textual data. TIGTEC is a text-editing heuristic that targets and modifies words with high contribution using local feature importance. A new attention-based local feature importance measure is proposed. Counterfactual candidates are generated and assessed with a cost function integrating semantic distance, while the solution space is efficiently explored in a beam-search fashion. The conducted experiments show the relevance of TIGTEC in terms of success rate, sparsity, diversity and plausibility. The method can be used in either a model-specific or a model-agnostic way, which makes it very convenient for generating counterfactual explanations.
Sparse Private LASSO Logistic Regression
Authors: Amol Khanna, Fred Lu, Edward Raff
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Abstract
LASSO regularized logistic regression is particularly useful for its built-in feature selection, allowing coefficients to be removed from deployment and producing sparse solutions. Differentially private versions of LASSO logistic regression have been developed, but generally produce dense solutions, reducing the intrinsic utility of the LASSO penalty. In this paper, we present a differentially private method for sparse logistic regression that maintains hard zeros. Our key insight is to first train a non-private LASSO logistic regression model to determine an appropriate privatized number of non-zero coefficients to use in final model selection. To demonstrate our method's performance, we run experiments on synthetic and real-world datasets.
VpROM: A novel Variational AutoEncoder-boosted Reduced Order Model for the treatment of parametric dependencies in nonlinear systems
Authors: Thomas Simpson, Konstantinos Vlachas, Anthony Garland, Nikolaos Dervilis, Eleni Chatzi
Subjects: Numerical Analysis (math.NA); Computational Engineering, Finance, and Science (cs.CE)
Abstract
Reduced Order Models (ROMs) are of considerable importance in many areas of engineering in which computational time presents difficulties. Established approaches employ projection-based reduction, such as Proper Orthogonal Decomposition; however, such methods can become inefficient or fail in the case of parametric or strongly nonlinear models. Such limitations are usually tackled via a library of local reduction bases, each of which is valid for a given parameter vector. The success of such methods, however, is strongly reliant upon the method used to relate the parameter vectors to the local bases; this is typically achieved using clustering or interpolation methods. We propose the replacement of these methods with a Variational Autoencoder (VAE) used as a generative model which can infer the local basis corresponding to a given parameter vector in a probabilistic manner. The resulting VAE-boosted parametric ROM, \emph{VpROM}, still retains the physical insights of a projection-based method but also allows for better treatment of problems where model dependencies or excitation traits cause the dynamic behavior to span multiple response regimes. Moreover, the probabilistic treatment of the VAE representation allows for uncertainty quantification on the reduction bases, which may then be propagated to the ROM response. The performance of the proposed approach is validated on an open-source simulation benchmark featuring hysteresis and multi-parametric dependencies, and on a large-scale wind turbine tower characterised by nonlinear material behavior and model uncertainty.
Instance-Optimality in Interactive Decision Making: Toward a Non-Asymptotic Theory
Abstract
We consider the development of adaptive, instance-dependent algorithms for interactive decision making (bandits, reinforcement learning, and beyond) that, rather than only performing well in the worst case, adapt to favorable properties of real-world instances for improved performance. We aim for instance-optimality, a strong notion of adaptivity which asserts that, on any particular problem instance, the algorithm under consideration outperforms all consistent algorithms. Instance-optimality enjoys a rich asymptotic theory originating from the work of \citet{lai1985asymptotically,graves1997asymptotically}, but non-asymptotic guarantees have remained elusive outside of certain special cases. Even for problems as simple as tabular reinforcement learning, existing algorithms do not attain instance-optimal performance until the number of rounds of interaction is doubly exponential in the number of states. In this paper, we take the first step toward developing a non-asymptotic theory of instance-optimal decision making with general function approximation. We introduce a new complexity measure, the Allocation-Estimation Coefficient (AEC), and provide a new algorithm, $\mathsf{AE}^2$, which attains non-asymptotic instance-optimal performance at a rate controlled by the AEC. Our results recover the best known guarantees for well-studied problems such as finite-armed and linear bandits and, when specialized to tabular reinforcement learning, attain the first instance-optimal regret bounds with polynomial dependence on all problem parameters, improving over prior work exponentially. We complement these results with lower bounds that show that i) existing notions of statistical complexity are insufficient to derive non-asymptotic guarantees, and ii) under certain technical conditions, boundedness of the AEC is necessary to learn an instance-optimal allocation of decisions in finite time.
Evaluating Adversarial Robustness on Document Image Classification
Abstract
Adversarial attacks and defenses have gained increasing interest in computer vision systems in recent years, but as of today, most investigations are limited to natural images. However, many artificial intelligence models actually handle documentary data, which is very different from real-world images. Hence, in this work, we apply the adversarial attack philosophy to documentary and natural data and protect models against such attacks. We focus our work on untargeted gradient-based, transfer-based and score-based attacks and evaluate the impact of adversarial training, JPEG input compression and grey-scale input transformation on the robustness of ResNet50 and EfficientNetB0 model architectures. To the best of our knowledge, no such work has been conducted by the community to study the impact of these attacks on the document image classification task.
Queue Routing Strategies to Improve Equitable Housing Coordination in New York City
Authors: Yaren Bilge Kaya, Kayse Lee Maass
Subjects: Numerical Analysis (math.NA); Probability (math.PR)
Abstract
Runaway and homeless youth (RHY) are a group of youth and young adults who are at high risk of being exploited through human trafficking. Although access to housing and support services is an effective way to decrease their vulnerability to being exploited, research reveals that coordination of these services provided to RHY by non-profit and government organizations is neither standardized nor efficient. This situation often causes decreased, delayed, and inequitable access to these scarce housing resources. In this study, we aim to increase the efficiency of the housing system and reduce the barriers that contribute to inequitable access to housing through simulation modeling and analyses. Specifically, we simulate a set of crisis and emergency shelters in New York City, funded by a single governmental organization, as a queuing network with pools of multiple parallel servers, servers with demographic eligibility criteria, stochastic RHY arrivals, impatient youth behaviour (the possibility of abandonment), and a decision-maker (coordinator) that determines to which server pool each youth is routed. This simulation allows us to evaluate the impact of different queue routing strategies. Our simulation results show that by changing the way RHY are routed to shelters, we can reduce the average wait time by approximately a day and decrease the proportion of RHY abandoning the shelters by 13%.
Graph Convolutional Networks based on Manifold Learning for Semi-Supervised Image Classification
Authors: Lucas Pascotti Valem, Daniel Carlos Guimarães Pedronette, Longin Jan Latecki
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Due to the huge volume of information in many domains, the need for classification methods is imperative. In spite of many advances, most approaches require a large amount of labeled data, which is often not available due to the costs and difficulties of manual labeling processes. In this scenario, unsupervised and semi-supervised approaches have been gaining increasing attention. GCNs (Graph Convolutional Neural Networks) represent a promising solution, since they encode neighborhood information and have achieved state-of-the-art results in scenarios with limited labeled data. However, since GCNs require graph-structured data, their use for semi-supervised image classification is still scarce in the literature. In this work, we propose a novel approach, the Manifold-GCN, based on GCNs for semi-supervised image classification. The main hypothesis of this paper is that the use of manifold learning to model the graph structure can further improve GCN classification. To the best of our knowledge, this is the first framework that allows the combination of GCNs with different types of manifold learning approaches for image classification. All manifold learning algorithms employed are completely unsupervised, which is especially useful for scenarios where the availability of labeled data is a concern. A broad experimental evaluation was conducted considering 5 GCN models, 3 manifold learning approaches, 3 image datasets, and 5 deep features. The results reveal that our approach achieves better accuracy than traditional and recent state-of-the-art methods with very efficient run times for both training and testing.
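The core pipeline, building a graph from (manifold-learned) image features and then propagating with a GCN layer, can be sketched as follows (our simplification of the general recipe, not the paper's code):

```python
import numpy as np

def knn_graph(feats, k=5):
    """Symmetric kNN adjacency built from (manifold-learned) features."""
    d = np.linalg.norm(feats[:, None] - feats[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    A = np.zeros((len(feats), len(feats)))
    A[np.repeat(np.arange(len(feats)), k), nn.ravel()] = 1.0
    return np.maximum(A, A.T)                     # symmetrize

def gcn_layer(A, H, W):
    """One GCN propagation: relu(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(len(A))
    d = A_hat.sum(axis=1)
    return np.maximum((A_hat / np.sqrt(np.outer(d, d))) @ H @ W, 0.0)

feats = np.random.rand(100, 64)                   # deep features of 100 images
H1 = gcn_layer(knn_graph(feats), feats, np.random.randn(64, 32) * 0.1)
```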
DualSlide: Global-to-Local Sketching Interface for Slide Content and Layout Design
Abstract
Online learning and academic conferences have become pervasive and essential for education and professional development, especially since the onset of the pandemic. Academic presentations usually require well-designed slides that are easily understood. Sketches visually represent design intentions and are readily accessible to average users. To assist non-expert users in creating visually appealing academic slides, we propose DualSlide, a global-to-local two-stage sketching interface system that provides image retrieval and user guidance. At the global stage, DualSlide provides a heat-map canvas to display the distribution of all slide layouts in a dataset, allowing users to explore reference slides efficiently. At the local stage, the system provides detailed references and guidance for designing slide content, such as diagrams and fonts. We further propose a sketch-matching algorithm to compare the user's input sketch with similar diagrams. All user guidance can be adapted in real-time editing, and users can design slides with a high degree of freedom. We conducted a user study to verify the effectiveness and usability of the proposed DualSlide system, confirming that DualSlide provides high retrieval accuracy and satisfactory design results with a good user experience. Video: https://youtu.be/lUI1zjxCdM0
Hint-Aug: Drawing Hints from Foundation Vision Transformers Towards Boosted Few-Shot Parameter-Efficient Tuning
Abstract
Despite the growing demand for tuning foundation vision transformers (FViTs) on downstream tasks, fully unleashing FViTs' potential under data-limited scenarios (e.g., few-shot tuning) remains a challenge due to FViTs' data-hungry nature. Common data augmentation techniques fall short in this context due to the limited features contained in the few-shot tuning data. To tackle this challenge, we first identify an opportunity for FViTs in few-shot tuning: pretrained FViTs themselves have already learned highly representative features from large-scale pretraining data, which are fully preserved during widely used parameter-efficient tuning. We thus hypothesize that leveraging those learned features to augment the tuning data can boost the effectiveness of few-shot FViT tuning. To this end, we propose a framework called Hint-based Data Augmentation (Hint-Aug), which aims to boost FViTs in few-shot tuning by augmenting the over-fitted parts of tuning samples with the learned features of pretrained FViTs. Specifically, Hint-Aug integrates two key enablers: (1) an Attentive Over-fitting Detector (AOD) to detect over-confident patches of foundation ViTs for potentially alleviating their over-fitting on the few-shot tuning data and (2) a Confusion-based Feature Infusion (CFI) module to infuse easy-to-confuse features from the pretrained FViTs into the over-confident patches detected by the AOD, in order to enhance feature diversity during tuning. Extensive experiments and ablation studies on five datasets and three parameter-efficient tuning techniques consistently validate Hint-Aug's effectiveness: 0.04% to 32.91% higher accuracy over the state-of-the-art (SOTA) data augmentation method under various low-shot settings. For example, on the Pet dataset, Hint-Aug achieves 2.22% higher accuracy with 50% less training data than SOTA data augmentation methods.
Abstract
The addition of Foley sound effects during post-production is a common technique used to enhance the perceived acoustic properties of multimedia content. Traditionally, Foley sound has been produced by human Foley artists, which involves manual recording and mixing of sound. However, recent advances in sound synthesis and generative models have generated interest in machine-assisted or automatic Foley synthesis techniques. To promote further research in this area, we have organized a challenge in DCASE 2023: Task 7 - Foley Sound Synthesis. Our challenge aims to provide a standardized evaluation framework that is both rigorous and efficient, allowing for the evaluation of different Foley synthesis systems. Through this challenge, we hope to encourage active participation from the research community and advance the state-of-the-art in automatic Foley synthesis. In this technical report, we provide a detailed overview of the Foley sound synthesis challenge, including task definition, dataset, baseline, evaluation scheme and criteria, and discussion.
Text-guided Eyeglasses Manipulation with Spatial Constraints
Abstract
Virtual try-on of eyeglasses involves placing eyeglasses of different shapes and styles onto a face image without physically trying them on. While existing methods have shown impressive results, the variety of eyeglasses styles is limited and the interactions are not always intuitive or efficient. To address these limitations, we propose a Text-guided Eyeglasses Manipulation method that allows for control of the eyeglasses shape and style based on a binary mask and text, respectively. Specifically, we introduce a mask encoder to extract mask conditions and a modulation module that enables simultaneous injection of text and mask conditions. This design allows for fine-grained control of the eyeglasses' appearance based on both textual descriptions and spatial constraints. Our approach includes a disentangled mapper and a decoupling strategy that preserves irrelevant areas, resulting in better local editing. We employ a two-stage training scheme to handle the different convergence speeds of the various modality conditions, successfully controlling both the shape and style of eyeglasses. Extensive comparison experiments and ablation analyses demonstrate the effectiveness of our approach in achieving diverse eyeglasses styles while preserving irrelevant areas.
Efficient Bayesian inference using physics-informed invertible neural networks for inverse problems
Abstract
In this paper, we propose a novel approach for solving Bayesian inverse problems with physics-informed invertible neural networks (PI-INN). The architecture of the PI-INN consists of two sub-networks: an invertible neural network (INN) and a neural basis network (NB-Net). With the aid of the NB-Net, an invertible map between the parametric input and the INN output is constructed to provide a tractable estimation of the posterior distribution, which enables efficient sampling and accurate density evaluation. Furthermore, the loss function of the PI-INN includes two components: a residual-based physics-informed loss term and a new independence loss term. The presented independence loss term can Gaussianize the random latent variables and ensure statistical independence between the two parts of the INN output by effectively utilizing the estimated density function. Several numerical experiments are presented to demonstrate the efficiency and accuracy of the proposed PI-INN, including inverse kinematics, inverse problems for the 1-d and 2-d diffusion equations, and seismic traveltime tomography.
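The invertibility that makes density evaluation tractable typically comes from coupling-style layers; below is a minimal RealNVP-style affine coupling sketch (a generic building block under our own toy parametrization, not the PI-INN architecture itself):

```python
import numpy as np

class AffineCoupling:
    """Minimal invertible affine coupling layer (RealNVP-style).

    The first half of the input conditions the scale/shift applied to the
    second half, so the forward map is exactly invertible with a
    triangular Jacobian (log-determinant = sum of log-scales).
    """
    def __init__(self, dim, rng):
        self.W = rng.standard_normal((dim // 2, dim)) * 0.1  # toy "network"

    def _scale_shift(self, x1):
        h = np.tanh(x1 @ self.W)
        return h[:, : h.shape[1] // 2], h[:, h.shape[1] // 2 :]

    def forward(self, x):
        x1, x2 = np.split(x, 2, axis=1)
        s, t = self._scale_shift(x1)
        return np.concatenate([x1, x2 * np.exp(s) + t], axis=1), s.sum(axis=1)

    def inverse(self, y):
        y1, y2 = np.split(y, 2, axis=1)
        s, t = self._scale_shift(y1)
        return np.concatenate([y1, (y2 - t) * np.exp(-s)], axis=1)

rng = np.random.default_rng(0)
layer = AffineCoupling(dim=8, rng=rng)
x = rng.standard_normal((4, 8))
y, logdet = layer.forward(x)
assert np.allclose(layer.inverse(y), x)       # exact invertibility
```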
SwinFSR: Stereo Image Super-Resolution using SwinIR and Frequency Domain Knowledge
Authors: Ke Chen, Liangyan Li, Huan Liu, Yunzhe Li, Congling Tang, Jun Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Stereo Image Super-Resolution (StereoSR) has attracted significant attention in recent years due to the extensive deployment of dual cameras in mobile phones, autonomous vehicles and robots. In this work, we propose a new StereoSR method, named SwinFSR, based on an extension of SwinIR, originally designed for single image restoration, and on frequency domain knowledge obtained via the Fast Fourier Convolution (FFC). Specifically, to effectively gather global information, we modify the Residual Swin Transformer blocks (RSTBs) in SwinIR by explicitly incorporating the frequency domain knowledge using the FFC and employing the resulting residual Swin Fourier Transformer blocks (RSFTBs) for feature extraction. Besides, for the efficient and accurate fusion of stereo views, we propose a new cross-attention module, referred to as RCAM, which achieves highly competitive performance while requiring less computational cost than state-of-the-art cross-attention modules. Extensive experimental results and ablation studies demonstrate the effectiveness and efficiency of our proposed SwinFSR.
Performance Optimization using Multimodal Modeling and Heterogeneous GNN
Authors: Akash Dutta, Jordi Alcaraz, Ali TehraniJamsaz, Anna Sikora, Eduardo Cesar, Ali Jannesari
Abstract
Growing heterogeneity and configurability in HPC architectures have made auto-tuning applications and runtime parameters on these systems very complex. Users are presented with a multitude of options to configure parameters. In addition to application-specific solutions, a common approach is to use general-purpose search strategies, which often fail to identify the best configurations or whose time to convergence is a significant barrier. There is, thus, a need for a general-purpose and efficient tuning approach that can be easily scaled and adapted to various tuning tasks. We propose a technique for tuning parallel code regions that is general enough to be adapted to multiple tasks. In this paper, we analyze IR-based programming models to make task-specific performance optimizations. To this end, we propose the Multimodal Graph Neural Network and Autoencoder (MGA) tuner, a multimodal deep learning based approach that adapts Heterogeneous Graph Neural Networks and Denoising Autoencoders for modeling IR-based code representations that serve as separate modalities. This approach is used as part of our pipeline to model a syntax-, semantics-, and structure-aware IR-based code representation for tuning parallel code regions/kernels. We extensively experiment on OpenMP and OpenCL code regions/kernels obtained from the PolyBench, Rodinia, STREAM, DataRaceBench, AMD SDK, NPB, NVIDIA SDK, Parboil, SHOC, and LULESH benchmarks. We apply our multimodal learning techniques to the tasks of i) optimizing the number of threads, scheduling policy and chunk size in OpenMP loops and ii) identifying the best device for heterogeneous device mapping of OpenCL kernels. Our experiments show that this multimodal learning based approach outperforms the state-of-the-art in all experiments.
Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures
Authors: Evangelos Georganas, Dhiraj Kalamkar, Kirill Voronin, Antonio Noack, Hans Pabst, Alexander Breuer, Alexander Heinecke
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
Abstract
During the past decade, Deep Learning (DL) algorithms, programming systems and hardware have converged with their High Performance Computing (HPC) counterparts. Nevertheless, the programming methodology of DL and HPC systems is stagnant, relying on highly-optimized, yet platform-specific and inflexible vendor libraries. Such libraries provide close-to-peak performance on the specific platforms, kernels and shapes to which vendors have dedicated optimization efforts, while they underperform in the remaining use cases, yielding non-portable codes with performance glass-jaws. This work introduces a framework to develop efficient, portable DL and HPC kernels for modern CPU architectures. We decompose kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs), a compact, versatile set of 2D-tensor operators; 2) expressing the logical loops around TPPs in a high-level, declarative fashion, whereas the exact instantiation (ordering, tiling, parallelization) is determined via simple knobs. We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
AdaLIO: Robust Adaptive LiDAR-Inertial Odometry in Degenerate Indoor Environments
Authors: Hyungtae Lim, Daebeom Kim, Beomsoo Kim, Hyun Myung
Abstract
In recent years, the demand for mapping construction sites or buildings using light detection and ranging (LiDAR) sensors has increased to model environments for efficient site management. However, LiDAR-based approaches sometimes diverge in narrow and confined environments, such as spiral stairs and corridors, owing to parameters that remain fixed regardless of changes in the environment. That is, the parameters of LiDAR(-inertial) odometry are mostly set for open spaces; thus, if the same parameters suitable for open space are applied in a corridor-like scene, the odometry methods diverge, which is referred to as \textit{degeneracy}. To tackle this degeneracy problem, we propose a robust LiDAR-inertial odometry called \textit{AdaLIO}, which employs an adaptive parameter-setting strategy. To this end, we first check for degeneracy by testing whether the surroundings are corridor-like environments. If so, the parameters relevant to voxelization and normal-vector estimation are adaptively changed to increase the number of correspondences. As verified on a public dataset, our proposed method showed promising performance in narrow and cramped environments, avoiding the degeneracy problem.
MixNeRF: Memory Efficient NeRF with Feature Mixed-up Hash Table
Authors: Yongjae Lee, Li Yang, Deliang Fan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Neural radiance field (NeRF) has shown remarkable performance in generating photo-realistic novel views. Since the emergence of NeRF, many studies have been conducted, among which managing features with explicit structures such as grids has achieved exceptionally fast training by reducing the complexity of multilayer perceptron (MLP) networks. However, storing features in dense grids requires significantly large memory space, which leads to a memory bottleneck in computer systems and thus long training times. To address this issue, in this work, we propose MixNeRF, a memory-efficient NeRF framework that employs a mixed-up hash table to improve memory efficiency and reduce training time while maintaining reconstruction quality. We first design a \textit{mixed-up hash table} to adaptively mix part of the multi-level feature grids into one and map it to a single hash table. Following that, in order to obtain the correct index of a grid point, we further design an \textit{index transformation} method that transforms indices of an arbitrary level grid to those of a canonical grid. Extensive experiments benchmarking against the state-of-the-art Instant-NGP, TensoRF, and DVGO indicate that our MixNeRF achieves the fastest training time on the same GPU hardware with similar or even higher reconstruction quality. Source code is available at \url{https://github.com/nfyfamr/MixNeRF}.
Analog Iterative Machine (AIM): using light to solve quadratic optimization problems with mixed variables
Authors: Kirill Kalinin, George Mourgias-Alexandris, Hitesh Ballani, Natalia G. Berloff, James H. Clegg, Daniel Cletheroe, Christos Gkantsidis, Istvan Haller, Vassily Lyutsarev, Francesca Parmigiani, Lucinda Pickup, Antony Rowstron
Subjects: Emerging Technologies (cs.ET); Optimization and Control (math.OC); Applied Physics (physics.app-ph)
Abstract
Solving optimization problems is challenging for existing digital computers and even for future quantum hardware. The practical importance of diverse problems, from healthcare to financial optimization, has driven the emergence of specialised hardware over the past decade. However, its support for problems with only binary variables severely restricts the scope of practical problems that can be efficiently embedded. We build the analog iterative machine (AIM), the first instance of an opto-electronic solver that natively implements a wider class of quadratic unconstrained mixed optimization (QUMO) problems and supports all-to-all connectivity of both continuous and binary variables. Beyond synthetic 7-bit problems at small scale, AIM solves the financial transaction settlement problem entirely in the analog domain with higher accuracy than quantum hardware and at room temperature. With compute-in-memory operation and spatial-division multiplexed representation of variables, the design of AIM paves the path to a chip-scale architecture with 100 times speed-up per unit power over the latest GPUs for solving problems with 10,000 variables. The robustness of the AIM algorithm at such scales is further demonstrated by comparing it with commercial production solvers across multiple benchmarks, where for several problems we report new best solutions. By combining the superior QUMO abstraction, sophisticated gradient descent methods inspired by machine learning, and commodity hardware, AIM introduces a novel platform with a step change in expressiveness, performance, and scalability for optimization in the post-Moore's-law era.
Fast Continuous Subgraph Matching over Streaming Graphs via Backtracking Reduction
Abstract
Streaming graphs are drawing increasing attention in both academic and industrial communities, as many graphs in real applications evolve over time. Continuous subgraph matching (CSM for short) aims to report the incremental matches of a query graph in such streaming graphs. Answering CSM involves two major steps: candidate maintenance and incremental match generation. Throughout the course of continuous subgraph matching, the backtracking over the search space performed during incremental match generation dominates the total cost. However, most previous approaches focus on developing techniques for efficient candidate maintenance, while incremental match generation receives less attention despite its importance in CSM. Aiming to minimize the overall cost, we propose two techniques to reduce backtracking in this paper. We present a cost-effective index, CaLiG, that yields tighter candidate maintenance, shrinking the search space of backtracking. In addition, we develop a novel incremental matching paradigm, KSS, that decomposes the query vertices into conditional kernel vertices and shell vertices. Given the matches of the kernel vertices, the incremental matches can be produced immediately by joining the candidates of shell vertices, without any backtracking. Benefiting from reduced backtracking, the elapsed time of CSM decreases significantly. Extensive experiments over real graphs show that our method runs orders of magnitude faster than the state-of-the-art algorithm.
Weakly-Supervised Temporal Action Localization with Bidirectional Semantic Consistency Constraint
Authors: Guozhang Li, De Cheng, Xinpeng Ding, Nannan Wang, Jie Li, Xinbo Gao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Weakly Supervised Temporal Action Localization (WTAL) aims to classify actions and localize their temporal boundaries in a video, given only video-level category labels in the training datasets. Due to the lack of boundary information during training, existing approaches formulate WTAL as a classification problem, i.e., generating the temporal class activation map (T-CAM) for localization. However, with only a classification loss, the model would be sub-optimized, i.e., the action-related scenes are enough to distinguish different class labels. Regarding other actions in the action-related scene (i.e., the same scene as the positive actions) as co-scene actions, this sub-optimized model would misclassify the co-scene actions as positive actions. To address this misclassification, we propose a simple yet efficient method, named bidirectional semantic consistency constraint (Bi-SCC), to discriminate positive actions from co-scene actions. The proposed Bi-SCC first adopts a temporal context augmentation to generate an augmented video that breaks the correlation between positive actions and their co-scene actions across videos. Then, a semantic consistency constraint (SCC) is used to enforce the predictions of the original video and the augmented video to be consistent, hence suppressing the co-scene actions. However, we find that this augmented video would destroy the original temporal context. Simply applying the consistency constraint would affect the completeness of localized positive actions. Hence, we boost the SCC in a bidirectional way to suppress co-scene actions while ensuring the integrity of positive actions, by cross-supervising the original and augmented videos. Finally, our proposed Bi-SCC can be applied to current WTAL approaches and improve their performance. Experimental results show that our approach outperforms the state-of-the-art methods on THUMOS14 and ActivityNet.
Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation
Abstract
The Segment Anything Model (SAM) has recently gained popularity in the field of image segmentation. Thanks to its impressive capabilities in all-round segmentation tasks and its prompt-based interface, SAM has sparked intensive discussion within the community. It is even said by many prestigious experts that the image segmentation task has been "finished" by SAM. However, medical image segmentation, although an important branch of the image segmentation family, seems not to be included in the scope of Segmenting "Anything". Many individual experiments and recent studies have shown that SAM performs subpar in medical image segmentation. A natural question is how to find the missing piece of the puzzle to extend the strong segmentation capability of SAM to medical image segmentation. In this paper, we present a possible solution by fine-tuning the pretrained SAM model following the parameter-efficient fine-tuning paradigm with Adapters. Although this work is still one of only a few to transfer the popular NLP technique of Adapters to computer vision cases, this simple implementation shows surprisingly good performance on medical image segmentation. A medical image adapted SAM, which we have dubbed Medical SAM Adapter (MSA), shows superior performance on 19 medical image segmentation tasks with various image modalities including CT, MRI, ultrasound images, fundus images, and dermoscopic images. MSA outperforms a wide range of state-of-the-art (SOTA) medical image segmentation methods, such as nnUNet, TransUNet, UNetr, MedSegDiff, and so on. Code will be released at: https://github.com/WuJunde/Medical-SAM-Adapter.
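The Adapter technique itself is a small bottleneck module inserted into a frozen backbone; a generic PyTorch sketch follows (our illustration of the general pattern; MSA's exact placement and design within SAM differ):

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual.

    Only these small layers are trained; the pretrained backbone weights
    stay frozen, which is the essence of parameter-efficient fine-tuning.
    """
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)        # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

tokens = torch.randn(2, 196, 768)             # (batch, patches, embed dim)
out = Adapter(768)(tokens)
```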
Spatiotemporal Graph Convolutional Recurrent Neural Network Model for Citywide Air Pollution Forecasting
Authors: Van-Duc Le
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Signal Processing (eess.SP)
Abstract
Citywide air pollution forecasting tries to precisely predict air quality multiple hours ahead for an entire city. This task is challenging because air pollution varies in a spatiotemporal manner and depends on many complicated factors. Our previous research solved the problem by considering the whole city as an image and leveraged a Convolutional Long Short-Term Memory (ConvLSTM) model to learn the spatiotemporal features. However, an image-based representation may not be ideal, as air pollution and other impact factors have natural graph structures. In this research, we argue that a Graph Convolutional Network (GCN) can efficiently represent the spatial features of air quality readings across the whole city. Specifically, we extend the ConvLSTM model to a Spatiotemporal Graph Convolutional Recurrent Neural Network (Spatiotemporal GCRNN) model by tightly integrating a GCN architecture into an RNN structure to efficiently learn the spatiotemporal characteristics of air quality values and their influential factors. Our extensive experiments show that the proposed model performs better than the state-of-the-art ConvLSTM model for air pollution prediction while having a much smaller number of parameters. Moreover, our approach is also superior to a hybrid GCN-based method on a real-world air pollution dataset.
LMSFC: A Novel Multidimensional Index based on Learned Monotonic Space Filling Curves
Authors: Jian Gao, Xin Cao, Xin Yao, Gong Zhang, Wei Wang
Abstract
The recently proposed learned indexes have attracted much attention, as they can adapt to the actual data and query distributions to attain better search efficiency. Based on this technique, several existing works build indexes for multi-dimensional data and achieve improved query performance. A common paradigm of these works is to (i) map multi-dimensional data points to a one-dimensional space using a fixed space-filling curve (SFC) or its variant and (ii) then apply the learned indexing techniques. The first step typically uses a fixed SFC method, such as row-major order or z-order, which limits the potential of learned multi-dimensional indexes to adapt to variable data distributions and query workloads. In this paper, we propose the novel idea of learning a space-filling curve that is carefully designed and actively optimized for efficient query processing. We also identify innovative offline and online optimization opportunities common to SFC-based learned indexes and offer optimal and/or heuristic solutions. Experimental results demonstrate that our proposed method, LMSFC, outperforms state-of-the-art non-learned and learned methods across three commonly used real-world datasets and diverse experimental settings.
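For contrast with the learned curve, the fixed z-order mapping in step (i) is just bit interleaving; a short sketch (ours):

```python
def z_order_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of (x, y) into the point's Morton (z-order) key.

    A fixed SFC like this maps 2-D points to one dimension regardless of
    the data; LMSFC instead learns curve parameters that fit the data and
    query workload.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
    return key

assert z_order_key(0b11, 0b10) == 0b1101   # x bits land at even positions
```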
A Practical Algorithm for Max-Norm Optimal Binary Labeling of Graphs
Abstract
This paper concerns the efficient implementation of a method for optimal binary labeling of graph vertices, originally proposed by Malmberg and Ciesielski (2020). This method finds, in quadratic time with respect to graph size, a labeling that globally minimizes an objective function based on the $L_\infty$-norm. The method enables global optimization for a novel class of optimization problems, with high relevance in application areas such as image processing and computer vision. In its original formulation, the Malmberg-Ciesielski algorithm is unfortunately very computationally expensive, limiting its utility in practical applications. Here, we present a modified version of the algorithm that exploits redundancies in the original method to reduce computation time. While our proposed method has the same theoretical asymptotic time complexity, we demonstrate that it is substantially more efficient in practice. Even for small problems, we observe a speedup of 4-5 orders of magnitude. This reduction in computation time makes the Malmberg-Ciesielski method a viable option for many practical applications.
Evaluating the Energy Measurements of the IBM POWER9 On-Chip Controller
Authors: Hannes Tröpgen, Mario Bielert, Thomas Ilsche
Abstract
Dependable power measurements are the backbone of energy-efficient computing systems. The IBM PowerNV platform offers such power measurements through an embedded PowerPC 405 processor: the On-Chip Controller (OCC). Among other system-control tasks, the OCC provides power measurements for several domains, such as system, CPU, and GPU. This paper provides a detailed description and an in-depth evaluation of these OCC-provided power measurements. For that, we describe the provided interfaces themselves and experimentally verify their overhead (3.6 µs to 10.8 µs per access) and readout rate (24.95 Sa/s). We also study the consistency of the reported sensor readouts across the measurement domains and compare them to externally measured data. Furthermore, we estimate the internal sampling rate (1996 Sa/s) by provoking aliasing errors with artificial workloads, and quantify the errors that such aliasing could introduce in practice (up to 12% of processor power consumption in our experimental worst-case scenario). Given these insights, practitioners using the IBM PowerNV platform can assess the quality of the embedded measurements, permitting sought-after energy efficiency improvements.
Towards Generating Hop-constrained s-t Simple Path Graphs
Authors: Yuzheng Cai, Siyuan Liu, Weiguo Zheng, Xuemin Lin
Abstract
Graphs have been widely used in real-world applications, in which investigating relations between vertices is an important task. In this paper, we study the problem of generating the k-hop-constrained s-t simple path graph, i.e., the subgraph consisting of all simple paths from vertex s to vertex t of length no larger than k. To the best of our knowledge, we are the first to formalize this problem and prove its NP-hardness on directed graphs. To tackle this challenging problem, we propose an efficient algorithm named EVE, which exploits the paradigm of edge-wise examination rather than exhaustively enumerating all paths. Powered by essential vertices appearing in all simple paths between vertex pairs, EVE distinguishes the edges that are definitely (or definitely not) contained in the desired simple path graph, producing a tight upper-bound graph at a time cost of $\mathcal{O}(k^2|E|)$. Each remaining undetermined edge is further verified to deliver the exact answer. Extensive experiments are conducted on 15 real networks. The results show that EVE significantly outperforms all baselines by several orders of magnitude. Moreover, by taking EVE as a built-in block, state-of-the-art methods for hop-constrained simple path enumeration can be accelerated by up to an order of magnitude.
Patch-based 3D Natural Scene Generation from a Single Example
Abstract
We target a 3D generative model for general natural scenes, which are typically unique and intricate. The lack of the necessary volumes of training data, along with the difficulties of devising ad hoc designs in the presence of varying scene characteristics, renders existing setups intractable. Inspired by classical patch-based image models, we advocate synthesizing 3D scenes at the patch level, given a single example. At the core of this work lie important algorithmic designs w.r.t. the scene representation and the generative patch nearest-neighbor module, which address unique challenges arising from lifting the classical 2D patch-based framework to 3D generation. These design choices, on a collective level, contribute to a robust, effective, and efficient model that can generate high-quality general natural scenes with both realistic geometric structure and visual appearance, in large quantities and varieties, as demonstrated on a variety of exemplar scenes.
A Static Pruning Study on Sparse Neural Retrievers
Authors: Carlos Lassance, Simon Lupart, Hervé Dejean, Stéphane Clinchant, Nicola Tonellotto
Abstract
Sparse neural retrievers, such as DeepImpact, uniCOIL and SPLADE, have been introduced recently as an efficient and effective way to perform retrieval with inverted indexes. They aim to learn term importance and, in some cases, document expansions, to provide a more effective document ranking compared to traditional bag-of-words retrieval models such as BM25. However, these sparse neural retrievers have been shown to increase the computational costs and latency of query processing compared to their classical counterparts. To mitigate this, we apply a well-known family of techniques for boosting the efficiency of query processing over inverted indexes: static pruning. We experiment with three static pruning strategies, namely document-centric, term-centric and agnostic pruning, and we assess, over diverse datasets, that these techniques still work with sparse neural retrievers. In particular, static pruning achieves $2\times$ speedup with negligible effectiveness loss ($\leq 2\%$ drop) and, depending on the use case, even $4\times$ speedup with minimal impact on the effectiveness ($\leq 8\%$ drop). Moreover, we show that neural rerankers are robust to candidates from statically pruned indexes.
Focusing on Information Context for ITS using a Spatial Age of Information Model
Abstract
New technologies for sensing and communication act as enablers for cooperative driving applications. Sensors are able to detect objects in the surrounding environment, and information such as their current location is exchanged among vehicles. In order to cope with the vehicles' mobility, such information is required to be as fresh as possible for proper operation of cooperative driving applications. The age of information (AoI) has been proposed as a metric for evaluating the freshness of information, recently also within the context of intelligent transportation systems (ITS). We investigate mechanisms to reduce the AoI of data transported in the form of beacon messages while controlling their emission rate. We aim to balance packet collision probability and beacon frequency using the average peak age of information (PAoI) as a metric. This metric, however, only accounts for the generation time of the data but not for application-specific aspects, such as the location of the transmitting vehicle. We thus propose a new way of interpreting the AoI by considering information context, thereby incorporating vehicles' locations. As an example, we characterize such importance using the orientation and the distance of the involved vehicles. In particular, we introduce a weighting coefficient used in combination with the PAoI to evaluate the information freshness, emphasizing information from more important neighbors. We further design the beaconing approach to meet a given AoI requirement, thus saving resources on the wireless channel while keeping the AoI minimal. We illustrate the effectiveness of our approach in Manhattan-like urban scenarios, reaching pre-specified targets for the AoI of beacon messages.
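One concrete way to write such a context-weighted age metric (our sketch; the paper's exact coefficient may differ) is to scale the raw PAoI of each neighbor by a distance and orientation term:

```latex
% Context-weighted peak age of information for neighbor i, with distance
% d_i, heading difference \phi_i, maximum relevant range d_max, and
% assumed mixing weights \alpha + \beta = 1:
\[
  \widetilde{\Delta}_i = w_i\,\Delta_i, \qquad
  w_i = \alpha\Bigl(1 - \frac{d_i}{d_{\max}}\Bigr)
        + \beta\,\frac{1 + \cos\phi_i}{2},
\]
% so that staleness is penalized most heavily for nearby neighbors
% approaching head-on.
```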
Towards Characterizing the First-order Query Complexity of Learning (Approximate) Nash Equilibria in Zero-sum Matrix Games
Authors: Hédi Hadiji, Sarah Sachs (UvA), Tim van Erven (UvA), Wouter M. Koolen (CWI)
Subjects: Computer Science and Game Theory (cs.GT); Optimization and Control (math.OC); Machine Learning (stat.ML)
Abstract
In the first-order query model for zero-sum $K\times K$ matrix games, players observe the expected pay-offs for all their possible actions under the randomized action played by their opponent. This is a classical model, which has received renewed interest after the discovery by Rakhlin and Sridharan that $\epsilon$-approximate Nash equilibria can be computed efficiently from $O(\ln K / \epsilon)$ instead of $O(\ln K / \epsilon^2)$ queries. Surprisingly, the optimal number of such queries, as a function of both $\epsilon$ and $K$, is not known. We make progress on this question on two fronts. First, we fully characterise the query complexity of learning exact equilibria ($\epsilon=0$), by showing that they require a number of queries that is linear in $K$, which means that it is essentially as hard as querying the whole matrix, which can also be done with $K$ queries. Second, for $\epsilon > 0$, the current query complexity upper bound stands at $O(\min(\ln(K) / \epsilon, K))$. We argue that, unfortunately, obtaining a matching lower bound is not possible with existing techniques: we prove that no lower bound can be derived by constructing hard matrices whose entries take values in a known countable set, because such matrices can be fully identified by a single query. This rules out, for instance, reducing to a submodular optimization problem over the hypercube by encoding it as a binary matrix. We then introduce a new technique for lower bounds, which allows us to obtain lower bounds of order $\tilde\Omega(\log(1 / (K\epsilon)))$ for any $\epsilon \leq 1/(cK^4)$, where $c$ is a constant independent of $K$. We further discuss possible future directions to improve on our techniques in order to close the gap with the upper bounds.
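For readers unfamiliar with the query model, a single first-order query is easy to illustrate; the pay-off matrix below is a toy example, not from the paper.

```python
# One first-order query in a zero-sum matrix game: given the opponent's
# mixed strategy y, the query reveals the expected pay-off A @ y of every
# one of the K row actions at once.
import numpy as np

A = np.array([[0.0, 1.0, -1.0],
              [-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])   # rock-paper-scissors pay-offs

def first_order_query(A, y):
    """Expected pay-off of each row action against the randomized action y."""
    return A @ y

y = np.array([0.5, 0.25, 0.25])    # opponent's mixed strategy
print(first_order_query(A, y))     # [ 0.   -0.25  0.25]
```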
Authors: Yang Li, Wei Wang, Ming Wang, Chunmeng Dou, Zhengyu Ma, Huihui Zhou, Peng Zhang, Nicola Lepri, Xumeng Zhang, Qing Luo, Xiaoxin Xu, Guanhua Yang, Feng Zhang, Ling Li, Daniele Ielmini, Ming Liu
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Signal Processing (eess.SP); Data Analysis, Statistics and Probability (physics.data-an)
Abstract
Deep learning requires high-precision handling of forward signals, backpropagated errors, and weight updates. This is inherently required by the learning algorithm, since the gradient-descent learning rule relies on a chain product of partial derivatives. However, it is challenging to implement deep learning in hardware systems that use noisy analog memristors as artificial synapses, and such high precision is also not biologically plausible. Memristor-based implementations generally result in an excessive cost of neuronal circuits and stringent demands for idealized synaptic devices. Here, we demonstrate that the requirement for high precision is not necessary and that more efficient deep learning can be achieved when this requirement is lifted. We propose a binary stochastic learning algorithm that modifies all elementary neural network operations by introducing (i) stochastic binarization of both the forward signals and the activation-function derivatives, (ii) signed binarization of the backpropagated errors, and (iii) step-wise weight updates. Through an extensive hybrid approach of software simulation and hardware experiments, we find that binary stochastic deep learning systems can provide better performance than software-based benchmarks using the high-precision learning algorithm. Moreover, the binary stochastic algorithm strongly simplifies the neural network operations in hardware, improving the energy efficiency of the multiply-and-accumulate operations by more than three orders of magnitude.
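The three binarizations named in the abstract can be sketched on toy arrays as follows; the scaling and clipping conventions are assumptions for illustration.

```python
# Hedged sketch of the three modified operations (i)-(iii).
import numpy as np

rng = np.random.default_rng(0)

def stochastic_binarize(x):
    """Map x in [0, 1] to {0, 1}, firing with probability x (item i)."""
    return (rng.random(x.shape) < np.clip(x, 0.0, 1.0)).astype(np.int8)

def sign_binarize(err):
    """Keep only the sign of the backpropagated error (item ii)."""
    return np.sign(err).astype(np.int8)

def stepwise_update(w, grad_sign, step=0.01):
    """Fixed-size weight step along the binary gradient sign (item iii)."""
    return w - step * grad_sign

activations = rng.random(5)
errors = rng.standard_normal(5)
w = rng.standard_normal(5)
w = stepwise_update(w, sign_binarize(errors))
print(stochastic_binarize(activations), sign_binarize(errors))
```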
SPDH-Sign: towards Efficient, Post-quantum Group-based Signatures
Authors: Christopher Battarbee, Delaram Kahrobaei, Ludovic Perret, Siamak F. Shahandashti
Abstract
In this paper, we present a new diverse class of post-quantum group-based Digital Signature Schemes (DSS). The approach is significantly different from previous examples of group-based digital signatures and adopts the framework of group action-based cryptography: we show that each finite group defines a group action relative to the semidirect product of the group by its automorphism group, and give security bounds on the resulting signature scheme in terms of the group-theoretic computational problem known as the Semidirect Discrete Logarithm Problem (SDLP). Crucially, we make progress towards being able to efficiently compute the novel group action, and give an example of a parameterised family of groups for which the group action can be computed for any parameters, thereby negating the need for expensive offline computation or inclusion of redundancy required in other schemes of this type.
User-Centric Federated Learning: Trading off Wireless Resources for Personalization
Authors: Mohamad Mestoukirdi, Matteo Zecchin, David Gesbert, Qianrui Li
Abstract
Statistical heterogeneity across clients in a Federated Learning (FL) system increases the algorithm's convergence time and reduces its generalization performance, resulting in a large communication overhead in return for a poor model. To tackle the above problems without violating the privacy constraints that FL imposes, personalized FL methods have to couple statistically similar clients without directly accessing their data, in order to guarantee a privacy-preserving transfer. In this work, we design user-centric aggregation rules at the parameter server (PS) that are based on readily available gradient information and are capable of producing personalized models for each FL client. The proposed aggregation rules are inspired by an upper bound of the weighted aggregate empirical risk minimizer. We then derive a communication-efficient variant based on user clustering, which greatly enhances its applicability to communication-constrained systems. Our algorithm outperforms popular personalized FL baselines in terms of average accuracy, worst-node performance, and training communication overhead.
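A minimal sketch of a user-centric aggregation rule, assuming cosine similarity of client updates as the weighting signal; the paper derives its weights from an upper bound on the weighted aggregate empirical risk instead.

```python
# Hedged sketch: each client receives its own model, built as a
# similarity-weighted average of all client updates at the server.
import numpy as np

def user_centric_aggregate(updates):
    """updates: (n_clients, dim) array of client model updates.
    Returns one personalized aggregate per client."""
    norms = np.linalg.norm(updates, axis=1, keepdims=True) + 1e-12
    unit = updates / norms
    sim = unit @ unit.T                      # pairwise cosine similarity
    weights = np.maximum(sim, 0.0)           # ignore dissimilar clients
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ updates                 # one row per client

rng = np.random.default_rng(1)
updates = rng.standard_normal((4, 8))
personalized = user_centric_aggregate(updates)
print(personalized.shape)                    # (4, 8)
```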
SALSA: Simulated Annealing based Loop-Ordering Scheduler for DNN Accelerators
Authors: Victor J.B. Jung, Arne Symons, Linyan Mei, Marian Verhelst, Luca Benini
Abstract
To meet the growing need for computational power for DNNs, multiple specialized hardware architectures have been proposed. Each DNN layer should be mapped onto the hardware with the most efficient schedule; however, SotA schedulers struggle to consistently provide optimal schedules in a reasonable time across all DNN-HW combinations. This paper proposes SALSA, a fast dual-engine scheduler that generates optimal execution schedules for both even and uneven mappings. We introduce a new strategy, combining exhaustive search with simulated annealing, to address the dynamic nature of the loop-ordering design space size across layers. SALSA is extensively benchmarked against two SotA schedulers, LOMA and Timeloop, on five different DNNs. On average, SALSA finds schedules with 11.9% and 7.6% lower energy while speeding up the search by 1.7x and 24x compared to LOMA and Timeloop, respectively.
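The simulated-annealing half of such a dual-engine scheduler can be sketched as follows, with a stand-in cost function; a real scheduler would score loop orderings with a hardware cost model.

```python
# Hedged sketch of simulated-annealing search over loop orderings.
import math
import random

random.seed(0)
LOOPS = ["K", "C", "OX", "OY", "FX", "FY"]  # typical conv loop dimensions

def toy_cost(order):
    """Stand-in cost: prefer 'OX' first and 'K' last (purely illustrative)."""
    return order.index("OX") + (len(order) - 1 - order.index("K"))

def anneal(order, steps=2000, t0=5.0):
    best = cur = order[:]
    for s in range(steps):
        t = t0 * (1.0 - s / steps) + 1e-9        # linear cooling schedule
        cand = cur[:]
        i, j = random.sample(range(len(cand)), 2)  # swap two loops
        cand[i], cand[j] = cand[j], cand[i]
        d = toy_cost(cand) - toy_cost(cur)
        if d <= 0 or random.random() < math.exp(-d / t):
            cur = cand                            # accept move
        if toy_cost(cur) < toy_cost(best):
            best = cur[:]
    return best

best = anneal(LOOPS)
print(best, toy_cost(best))
```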
Abstract
Human language is full of compositional syntactic structures, and although neural networks have contributed to groundbreaking improvements in computer systems that process language, widely-used neural network architectures still exhibit limitations in their ability to process syntax. To address this issue, prior work has proposed adding stack data structures to neural networks, drawing inspiration from theoretical connections between syntax and stacks. However, these methods employ deterministic stacks designed to track one parse at a time, whereas syntactic ambiguity, which requires a nondeterministic stack to parse, is extremely common in language. In this dissertation, we remedy this discrepancy by proposing a method of incorporating nondeterministic stacks into neural networks. We develop a differentiable data structure that efficiently simulates a nondeterministic pushdown automaton, representing an exponential number of computations with a dynamic programming algorithm. We incorporate this module into two predominant architectures: recurrent neural networks (RNNs) and transformers. We show that this raises their formal recognition power to arbitrary context-free languages, and also aids training, even on deterministic context-free languages. Empirically, neural networks with nondeterministic stacks learn context-free languages much more effectively than prior stack-augmented models, including a language with theoretically maximal parsing difficulty. We also show that an RNN augmented with a nondeterministic stack is capable of surprisingly powerful behavior, such as learning cross-serial dependencies, a well-known non-context-free pattern. We demonstrate improvements on natural language modeling and provide analysis on a syntactic generalization benchmark. This work represents an important step toward building systems that learn to use syntax in a more human-like fashion.
Faster High Accuracy Multi-Commodity Flow from Single-Commodity Techniques
Authors: Jan van den Brand, Daniel Zhang
Subjects: Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC)
Abstract
Since the development of efficient linear program solvers in the 80s, all major improvements for solving multi-commodity flows to high accuracy have come from improvements to general linear program solvers. This differs from the single-commodity problem (e.g.~maximum flow), where all recent improvements also rely on graph-specific techniques such as graph decompositions or the Laplacian paradigm (see e.g.~[CMSV17,KLS20,BLL+21,CKL+22]). This phenomenon sparked research to understand why these graph techniques are unlikely to help for multi-commodity flow. [Kyng, Zhang'20] reduced solving multi-commodity Laplacians to general linear systems, and [Ding, Kyng, Zhang'22] showed that general linear programs can be reduced to 2-commodity flow. However, these reductions create sparse graph instances, so improvements to multi-commodity flow on denser graphs might exist. We show that one can indeed speed up multi-commodity flow algorithms on non-sparse graphs using graph techniques from single-commodity flow algorithms. This is the first improvement to high-accuracy multi-commodity flow algorithms that does not stem purely from improvements to general linear program solvers. In particular, using graph data structures from the recent min-cost flow algorithm of [BLL+21], based on the celebrated expander decomposition framework, we show that 2-commodity flow on an $n$-vertex $m$-edge graph can be solved in $\tilde{O}(\sqrt{m}n^{\omega-1/2})$ time for the current bound on fast matrix multiplication $\omega \approx 2.373$, improving upon the previous fastest algorithms with $\tilde{O}(m^\omega)$ [CLS19] and $\tilde{O}(\sqrt{m}n^2)$ [KV96] time complexity. For general $k$ commodities, our algorithm runs in $\tilde{O}(k^{2.5}\sqrt{m}n^{\omega-1/2})$ time.
Room dimensions and absorption inference from room transfer function via machine learning
Authors: Yuanxin Xia, Cheol-Ho Jeong
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract
The inference of the absorption configuration of an existing room solely from acoustic signals can be challenging. This research presents two methods for estimating room dimensions and frequency-dependent absorption coefficients from room transfer functions. The first method, a knowledge-based approach, calculates the room dimensions from the damped resonant frequencies of the room. The second method, a machine learning approach, employs multi-task convolutional neural networks to infer the room dimensions and the frequency-dependent absorption coefficients of each surface. The study shows that accurate wave-based simulation data can be used to train neural networks for real-world measurements, and demonstrates the potential of the algorithm for estimating the boundary input data of room acoustic simulations. The proposed methods can be a valuable tool for room acoustic simulations during acoustic renovation or intervention projects, as they make it possible to infer the room geometry and absorption conditions with reasonably small data requirements.
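The knowledge-based step rests on the standard rectangular-room mode formula, which is easy to illustrate; the speed of sound and room size below are examples, and damping shifts the measured resonances slightly.

```python
# Modal frequencies of a rectangular room follow
#   f(nx, ny, nz) = (c/2) * sqrt((nx/Lx)^2 + (ny/Ly)^2 + (nz/Lz)^2),
# so an identified axial mode (n, 0, 0) inverts to Lx = c * n / (2 * f).
C = 343.0  # speed of sound in air, m/s

def mode_frequency(n, L):
    """Axial-mode frequency for mode order n along a dimension of length L."""
    return (C / 2.0) * (n / L)

def dimension_from_mode(f, n=1):
    """Invert an identified axial resonance back to a room dimension."""
    return (C / 2.0) * (n / f)

Lx = 5.0                                # metres (example room)
f1 = mode_frequency(1, Lx)              # first axial mode along x
print(round(f1, 2), round(dimension_from_mode(f1), 2))   # 34.3 Hz, 5.0 m
```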
On the Generalization of Learned Structured Representations
Abstract
Despite tremendous progress over the past decade, deep learning methods generally fall short of human-level systematic generalization. It has been argued that explicitly capturing the underlying structure of data should allow connectionist systems to generalize in a more predictable and systematic manner. Indeed, evidence in humans suggests that interpreting the world in terms of symbol-like compositional entities may be crucial for intelligent behavior and high-level reasoning. Another common limitation of deep learning systems is that they require large amounts of training data, which can be expensive to obtain. In representation learning, large datasets are leveraged to learn generic data representations that may be useful for efficient learning of arbitrary downstream tasks. This thesis is about structured representation learning. We study methods that learn, with little or no supervision, representations of unstructured data that capture its hidden structure. In the first part of the thesis, we focus on representations that disentangle the explanatory factors of variation of the data. We scale up disentangled representation learning to a novel robotic dataset, and perform a systematic large-scale study on the role of pretrained representations for out-of-distribution generalization in downstream robotic tasks. The second part of this thesis focuses on object-centric representations, which capture the compositional structure of the input in terms of symbol-like entities, such as objects in visual scenes. Object-centric learning methods learn to form meaningful entities from unstructured input, enabling symbolic information processing on a connectionist substrate. In this study, we train a selection of methods on several common datasets, and investigate their usefulness for downstream tasks and their ability to generalize out of distribution.
Flickr-PAD: New Face High-Resolution Presentation Attack Detection Database
Authors: Diego Pasmino, Carlos Aravena, Juan Tapia, Christoph Busch
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Nowadays, Presentation Attack Detection (PAD) is a very active research area. Several databases in the state-of-the-art are constituted from images extracted from videos. One of the main problems identified is that many databases contain low-quality, small images and do not represent an operational scenario in a real remote biometric system, where images are currently captured with smartphones at higher quality and resolution. In order to increase the diversity of image quality, this work presents a new PAD database based on open-access Flickr images, called "Flickr-PAD". Our new hand-made database covers high-quality printed and screen scenarios. This will help researchers to compare new approaches to existing algorithms on a wider database. The database will be available to other researchers. A leave-one-out protocol was used to train and evaluate three PAD models based on MobileNet-V3 (small and large) and EfficientNet-B0. The best result was reached with MobileNet-V3 large, with a BPCER10 of 7.08% and a BPCER20 of 11.15%.
Keyword: faster
Green Video Complexity Analysis for Efficient Encoding in Adaptive Video Streaming
Authors: Vignesh V Menon, Christian Feldmann, Klaus Schoeffmann, Mohammad Ghanbari, Christian Timmerer
Abstract
For adaptive streaming applications, low-complexity and accurate video complexity features are necessary to analyze video content in real time, which ensures fast and compression-efficient video streaming without disruptions. State-of-the-art video complexity features are the Spatial Information (SI) and Temporal Information (TI) features, which do not correlate well with the encoding parameters in adaptive streaming applications. In this light, the Video Complexity Analyzer (VCA) was introduced, determining features based on Discrete Cosine Transform (DCT) energy. This paper presents optimizations to VCA for faster and more energy-efficient video complexity analysis. Experimental results show that VCA v2.0, using eight CPU threads, Single Instruction Multiple Data (SIMD), and a low-pass DCT optimization, determines seven complexity features of Ultra High Definition 8-bit videos with better accuracy at a speed of up to 292.68 fps and an energy consumption 97.06% lower than the reference SITI implementation.
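A block-wise DCT-energy feature of the kind VCA builds on can be sketched as follows; the block size and the use of summed AC magnitudes are simplifications of what VCA actually computes.

```python
# Hedged sketch of a DCT-energy spatial-complexity feature for one frame.
import numpy as np
from scipy.fft import dctn

def dct_energy(frame, block=32):
    """Average AC-coefficient energy over non-overlapping blocks."""
    h, w = frame.shape
    h, w = h - h % block, w - w % block   # crop to full blocks
    total, n = 0.0, 0
    for i in range(0, h, block):
        for j in range(0, w, block):
            coeffs = dctn(frame[i:i + block, j:j + block], norm="ortho")
            coeffs[0, 0] = 0.0            # drop DC: keep texture energy only
            total += np.abs(coeffs).sum()
            n += 1
    return total / n

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(1080, 1920)).astype(np.float64)
print(round(dct_energy(frame), 1))
```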
DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents
Abstract
Information Extraction from visually rich documents is a challenging task that has gained a lot of attention in recent years due to its importance in several document-control-based applications and its widespread commercial value. The majority of the research work conducted on this topic to date follows a two-step pipeline: first, the text is read using an off-the-shelf Optical Character Recognition (OCR) engine; then, the fields of interest are extracted from the obtained text. The main drawback of these approaches is their dependence on an external OCR system, which can negatively impact both performance and computational speed. Recent OCR-free methods were proposed to address these issues. Inspired by their promising results, we propose in this paper an OCR-free end-to-end information extraction model named DocParser. It differs from prior end-to-end approaches in its ability to better extract discriminative character features. DocParser achieves state-of-the-art results on various datasets, while still being faster than previous works.
Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models
Abstract
Diffusion models are powerful, but they require a lot of time and data to train. We propose Patch Diffusion, a generic patch-wise training framework, to significantly reduce the training time costs while improving data efficiency, which thus helps democratize diffusion model training to broader users. At the core of our innovations is a new conditional score function at the patch level, where the patch location in the original image is included as additional coordinate channels, while the patch size is randomized and diversified throughout training to encode the cross-region dependency at multiple scales. Sampling with our method is as easy as in the original diffusion model. Through Patch Diffusion, we could achieve $\mathbf{\ge 2\times}$ faster training, while maintaining comparable or better generation quality. Patch Diffusion meanwhile improves the performance of diffusion models trained on relatively small datasets, $e.g.$, as few as 5,000 images to train from scratch. We achieve state-of-the-art FID scores 1.77 on CelebA-64$\times$64 and 1.93 on AFHQv2-Wild-64$\times$64. We will share our code and pre-trained models soon.
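The patch-level conditioning described above can be sketched as a data-preparation step; the channel layout and normalization below are assumptions for illustration.

```python
# Hedged sketch: crop a random patch at a random size and stack
# normalized (y, x) coordinate channels so the score network can see
# where the patch sits in the full image.
import numpy as np

def random_patch_with_coords(img, sizes=(16, 32, 64)):
    """img: (H, W, C) array. Returns a (p, p, C+2) patch with coord channels."""
    h, w, _ = img.shape
    p = int(np.random.choice(sizes))       # randomized patch size
    top = np.random.randint(0, h - p + 1)
    left = np.random.randint(0, w - p + 1)
    patch = img[top:top + p, left:left + p, :]
    ys = np.linspace(top, top + p - 1, p) / (h - 1)    # global y in [0, 1]
    xs = np.linspace(left, left + p - 1, p) / (w - 1)  # global x in [0, 1]
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return np.concatenate([patch, yy[..., None], xx[..., None]], axis=-1)

img = np.random.rand(64, 64, 3)
print(random_patch_with_coords(img).shape)   # e.g. (32, 32, 5)
```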
Fast Continuous Subgraph Matching over Streaming Graphs via Backtracking Reduction
Abstract
Streaming graphs are drawing increasing attention in both academic and industrial communities, as many graphs in real applications evolve over time. Continuous subgraph matching (abbreviated as CSM) aims to report the incremental matches of a query graph in such streaming graphs. It involves two major steps, i.e., candidate maintenance and incremental match generation. Throughout the course of continuous subgraph matching, the backtracking search performed during incremental match generation dominates the total cost. However, most previous approaches focus on developing techniques for efficient candidate maintenance, while incremental match generation receives less attention despite its importance in CSM. Aiming to minimize the overall cost, we propose two techniques to reduce backtracking in this paper. We present a cost-effective index, CaLiG, that yields tighter candidate maintenance, shrinking the search space of backtracking. In addition, we develop a novel incremental matching paradigm, KSS, that decomposes the query vertices into conditional kernel vertices and shell vertices. With the matches of kernel vertices, the incremental matches can be produced immediately by joining the candidates of shell vertices, without any backtracking. Benefiting from reduced backtracking, the elapsed time of CSM decreases significantly. Extensive experiments over real graphs show that our method runs orders of magnitude faster than the state-of-the-art algorithm.
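The kernel/shell idea can be illustrated on a toy query: once kernel vertices are matched, full matches follow by joining shell candidate sets, with no backtracking; the query and candidate sets below are contrived.

```python
# Hedged sketch of backtracking-free expansion from a kernel match.
from itertools import product

kernel_match = {"u1": 7, "u2": 12}            # matched kernel vertices
shell_candidates = {"u3": [3, 5], "u4": [9]}  # candidates per shell vertex

def expand(kernel_match, shell_candidates):
    """Enumerate full matches as the Cartesian product of shell candidates."""
    names = list(shell_candidates)
    for combo in product(*(shell_candidates[n] for n in names)):
        if len(set(combo) | set(kernel_match.values())) == \
           len(combo) + len(kernel_match):      # enforce injectivity
            yield {**kernel_match, **dict(zip(names, combo))}

for m in expand(kernel_match, shell_candidates):
    print(m)
# {'u1': 7, 'u2': 12, 'u3': 3, 'u4': 9}
# {'u1': 7, 'u2': 12, 'u3': 5, 'u4': 9}
```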
Demystifying Random Number in Ethereum Smart Contract: Taxonomy, Vulnerability Identification, and Attack Detection
Authors: Peng Qian, Jianting He, Lingling Lu, Siwei Wu, Zhipeng Lu, Lei Wu, Yajin Zhou, Qinming He
Subjects: Software Engineering (cs.SE); Cryptography and Security (cs.CR)
Abstract
Recent years have witnessed explosive growth in blockchain smart contract applications. As smart contracts become increasingly popular and carry trillions of dollars' worth of digital assets, they become an appealing target for attackers, who have exploited vulnerabilities in smart contracts to cause catastrophic economic losses. Notwithstanding the proliferation of work developed to detect an impressive list of vulnerabilities, the bad randomness vulnerability is overlooked by many existing tools. In this paper, we make the first attempt to provide a systematic analysis of random numbers in Ethereum smart contracts, by investigating the principles behind pseudo-random number generation and organizing them into a taxonomy. We also examine various attacks against bad random numbers and group them into four categories. Furthermore, we present RNVulDet, a tool that incorporates taint analysis techniques to automatically identify bad randomness vulnerabilities and detect corresponding attack transactions. To extensively verify the effectiveness of RNVulDet, we construct three new datasets: i) 34 well-known contracts that are reported to possess bad randomness vulnerabilities, ii) 214 popular contracts that have been rigorously audited before launch and are regarded as free of bad randomness vulnerabilities, and iii) a dataset consisting of 47,668 smart contracts and 49,951 suspicious transactions. We compare RNVulDet with three state-of-the-art smart contract vulnerability detectors, and our tool significantly outperforms them. Meanwhile, RNVulDet spends 2.98s per contract on average, in most cases orders of magnitude faster than other tools. RNVulDet successfully reveals 44,264 attack transactions. We release our implementation and datasets in the hope of inspiring further research.
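As a deliberately simple stand-in for such analysis, one can flag contracts that derive randomness from predictable block fields; RNVulDet itself relies on taint analysis rather than the keyword scan sketched here.

```python
# Hedged sketch: flag Solidity sources referencing predictable block
# fields that are commonly (mis)used as entropy sources.
import re

BAD_SOURCES = [
    r"block\.timestamp", r"block\.difficulty", r"block\.number",
    r"blockhash\s*\(", r"block\.coinbase", r"now\b",
]

def flag_bad_randomness(source: str):
    """Return the predictable block fields referenced by the contract."""
    return [p for p in BAD_SOURCES if re.search(p, source)]

contract = """
function lottery() public {
    uint r = uint(keccak256(abi.encodePacked(block.timestamp, msg.sender)));
    if (r % 100 == 0) { payable(msg.sender).transfer(1 ether); }
}
"""
print(flag_bad_randomness(contract))   # ['block\\.timestamp']
```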
Channel Estimation and Signal Detection for NLOS Ultraviolet Scattering Communication with Space Division Multiple Access
Abstract
We design a receiver assembling several photomultipliers (PMTs) as an array, to increase the field of view (FOV) of the receiver and adapt to multiuser situations over non-line-of-sight (NLOS) ultraviolet (UV) channels. Channel estimation and signal detection are investigated according to the space-division characteristics of the structure. First, we adopt a balanced structure for the pilot matrix, analyze the channel estimation mean square error (MSE), and optimize the structure parameters. Then, with the estimated parameters, an analytical threshold detection rule is proposed as a preliminary step toward multiuser detection. The detection rule can be optimized by analyzing the separability of two users based on the Gaussian approximation of a Poisson weighted sum. To assess the effect of imperfect estimation, a sensitivity analysis of the channel estimation error on two-user signal detection is performed. Moreover, we propose a successive elimination method for on-off keying (OOK) modulated multiuser symbol detection based on the preceding threshold detection rule. A closed-form upper bound on the detection error rate is calculated, which turns out to be a good approximation of that of multiuser maximum-likelihood (ML) detection. The proposed successive elimination method is twenty times faster than ML detection with negligible degradation of the detection error rate.
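The Gaussian approximation of a Poisson photon count (mean = variance = $\lambda$) suggests a simple way to pick an OOK decision threshold, sketched below with example rates.

```python
# Hedged sketch of OOK threshold detection under a Gaussian approximation
# of Poisson counts; lam0/lam1 are example off/on photon rates.
import math

def q(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def detection_threshold(lam0, lam1, grid=1000):
    """Threshold minimizing P(count > t | off) + P(count < t | on)."""
    best_t, best_err = lam0, float("inf")
    for k in range(grid + 1):
        t = lam0 + (lam1 - lam0) * k / grid
        err = q((t - lam0) / math.sqrt(lam0)) + q((lam1 - t) / math.sqrt(lam1))
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err

t, err = detection_threshold(lam0=5.0, lam1=30.0)
print(round(t, 2), f"{err:.2e}")
```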
Keyword: mobile
IMUPoser: Full-Body Pose Estimation using IMUs in Phones, Watches, and Earbuds
Abstract
Tracking body pose on-the-go could have powerful uses in fitness, mobile gaming, context-aware virtual assistants, and rehabilitation. However, users are unlikely to buy and wear special suits or sensor arrays to achieve this end. Instead, in this work, we explore the feasibility of estimating body pose using IMUs already present in devices that many users own -- namely smartphones, smartwatches, and earbuds. This approach has several challenges, including noisy data from low-cost commodity IMUs and the fact that the number of instrumentation points on a user's body is both sparse and in flux. Our pipeline receives whatever subset of IMU data is available, potentially from just a single device, and produces a best-guess pose. To evaluate our model, we created the IMUPoser Dataset, collected from 10 participants wearing or holding off-the-shelf consumer devices across a variety of activity contexts. We provide a comprehensive evaluation of our system, benchmarking it on both our own and existing IMU datasets.
SwinFSR: Stereo Image Super-Resolution using SwinIR and Frequency Domain Knowledge
Authors: Ke Chen, Liangyan Li, Huan Liu, Yunzhe Li, Congling Tang, Jun Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Stereo Image Super-Resolution (stereoSR) has attracted significant attention in recent years due to the extensive deployment of dual cameras in mobile phones, autonomous vehicles and robots. In this work, we propose a new stereoSR method, named SwinFSR, based on an extension of SwinIR, originally designed for single image restoration, and the frequency domain knowledge obtained by the Fast Fourier Convolution (FFC). Specifically, to effectively gather global information, we modify the Residual Swin Transformer blocks (RSTBs) in SwinIR by explicitly incorporating the frequency domain knowledge using the FFC and employing the resulting residual Swin Fourier Transformer blocks (RSFTBs) for feature extraction. Moreover, for the efficient and accurate fusion of stereo views, we propose a new cross-attention module referred to as RCAM, which achieves highly competitive performance while requiring less computational cost than state-of-the-art cross-attention modules. Extensive experimental results and ablation studies demonstrate the effectiveness and efficiency of our proposed SwinFSR.
Social media in the Global South: A Network Dataset of the Malian Twittersphere
Authors: Daniel Thilo Schroeder, Mirjam de Bruijn, Luca Bruls, Mulatu Alemayehu Moges, Samba Dialimpa Badji, Noémie Fritz, Modibo Galy Cisse, Johannes Langguth, Bruce Mutsvairo, Kristin Skare Orgeret
Abstract
With the expansion of mobile communications infrastructure and the resulting proliferation of smartphones, social media usage in the Global South is surging, with Twitter fast becoming an important platform. In this paper, we present what is, to our knowledge, the first dataset of a Twitter landscape in an African country beset by conflict. In particular, we provide a comprehensive database for exploring Twitter usage in Mali, a West African country that until recently has had a relatively precarious media ecology. Since 2012, Mali has been affected by an intersection of armed conflicts, often between different ethnic and religious groups. We collected the database in 2022, in a period when the Malian conflict became more violent, both internally and towards external, international actors. We assume that this context influences the ways in which people access social media, and therefore the shape of the Twittersphere and its characteristics. Hence, our primary aim is to invite researchers from various disciplines, including complex networks and social sciences scholars, to further explore these characteristics. The given snapshot of the Malian Twitter follower network contains 7M accounts, with 56K accounts clearly identifiable as Malian, a figure that coincides with official numbers. In addition, we present the tweets; both are attached to the dataset. The dataset is available at https://osf.io/XXX (available after review). The corresponding hydrate scripts are available at https://github.com/XXX (available after review).
Linguistic Dead-Ends and Alphabet Soup: Finding Dark Patterns in Japanese Apps
Authors: Shun Hidaka, Sota Kobuki, Mizuki Watanabe, Katie Seaborn
Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY); Graphics (cs.GR)
Abstract
Dark patterns are deceptive and malicious properties of user interfaces that lead the end-user to do something different from what they intended or expected. While now a key topic in critical computing, most work has been conducted in Western contexts. Japan, with its booming app market, is a relatively uncharted context that offers culturally and linguistically sensitive differences in design standards, contexts of use, values, and language, all of which could influence the presence and expression of dark patterns. In this work, we analyzed 200 popular mobile apps in the Japanese market. We found that most apps had dark patterns, with an average of 3.9 per app. We also identified a new class of dark pattern: "Linguistic Dead-Ends," in the forms of "Untranslation" and "Alphabet Soup." We outline the implications for design and research practice, especially for future cross-cultural research on dark patterns.
Automated Solubility Analysis System and Method Using Computer Vision and Machine Learning
Authors: Gahee Kim, Minwoo Jeon, Hyun Do Choi, Jun Ki Cho, Youn-Suk Choi, Hyoseok Hwang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Chemical Physics (physics.chem-ph)
Abstract
In this study, a novel active solubility sensing device using computer vision is proposed to improve separation and purification performance and to prevent malfunctions of separation equipment such as preparative liquid chromatographs and evaporators. The proposed device actively measures solubility by imaging light transmitted through the solution against a background image, estimating dissolution and particle presence from changes to that background image. The device consists of four parts: camera, display, adjustment, and server units. The camera unit is the rear image sensor of a mobile phone. The display unit is a tablet screen. The adjustment unit is composed of rotating and height-adjustment jigs. Finally, the server unit consists of a socket server for communication between the units and a PC, including an automated solubility analysis system implemented in Python. The dissolution status of the solution was divided into four categories, a case study was conducted, and the algorithms were trained on its results. Six organic materials and four organic solvents were combined in 202 tests to train the developed algorithm. As a result, evaluation of the dissolution state exhibited an accuracy of 95%. For use in autonomous systems such as synthetic automation platforms, the device and method still require a feedback function that can add solvent or solute after dissolution detection based on the solubility results. Finally, diversification of the sensing method is expected to extend the approach beyond solutions, to solubility and homogeneity analysis of films.
Flickr-PAD: New Face High-Resolution Presentation Attack Detection Database
Authors: Diego Pasmino, Carlos Aravena, Juan Tapia, Christoph Busch
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Nowadays, Presentation Attack Detection (PAD) is a very active research area. Several databases in the state-of-the-art are constituted from images extracted from videos. One of the main problems identified is that many databases contain low-quality, small images and do not represent an operational scenario in a real remote biometric system, where images are currently captured with smartphones at higher quality and resolution. In order to increase the diversity of image quality, this work presents a new PAD database based on open-access Flickr images, called "Flickr-PAD". Our new hand-made database covers high-quality printed and screen scenarios. This will help researchers to compare new approaches to existing algorithms on a wider database. The database will be available to other researchers. A leave-one-out protocol was used to train and evaluate three PAD models based on MobileNet-V3 (small and large) and EfficientNet-B0. The best result was reached with MobileNet-V3 large, with a BPCER10 of 7.08% and a BPCER20 of 11.15%.
Keyword: pruning
Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures
Authors: Eugenia Iofinova, Alexandra Peste, Dan Alistarh
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Pruning - that is, setting a significant subset of the parameters of a neural network to zero - is one of the most popular methods of model compression. Yet, several recent works have raised the issue that pruning may induce or exacerbate bias in the output of the compressed model. Despite existing evidence for this phenomenon, the relationship between neural network pruning and induced bias is not well-understood. In this work, we systematically investigate and characterize this phenomenon in Convolutional Neural Networks for computer vision. First, we show that it is in fact possible to obtain highly-sparse models, e.g. with less than 10% remaining weights, which do not decrease in accuracy nor substantially increase in bias when compared to dense models. At the same time, we also find that, at higher sparsities, pruned models exhibit higher uncertainty in their outputs, as well as increased correlations, which we directly link to increased bias. We propose easy-to-use criteria which, based only on the uncompressed model, establish whether bias will increase with pruning, and identify the samples most susceptible to biased predictions post-compression.
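The compression operation under study, magnitude pruning, is easy to state in code; the 90% sparsity below echoes the "less than 10% remaining weights" regime mentioned in the abstract.

```python
# Hedged sketch of global magnitude pruning: zero out the
# smallest-magnitude fraction of a weight tensor.
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude `sparsity` fraction of entries."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    threshold = np.partition(flat, k)[k]          # k-th smallest magnitude
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
pruned = magnitude_prune(w, sparsity=0.9)
print(f"remaining weights: {np.mean(pruned != 0):.1%}")   # ~10.0%
```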
A Static Pruning Study on Sparse Neural Retrievers
Authors: Carlos Lassance, Simon Lupart, Hervé Dejean, Stéphane Clinchant, Nicola Tonellotto
Abstract
Sparse neural retrievers, such as DeepImpact, uniCOIL and SPLADE, have been introduced recently as an efficient and effective way to perform retrieval with inverted indexes. They aim to learn term importance and, in some cases, document expansions, to provide a more effective document ranking compared to traditional bag-of-words retrieval models such as BM25. However, these sparse neural retrievers have been shown to increase the computational cost and latency of query processing compared to their classical counterparts. To mitigate this, we apply a well-known family of techniques for boosting the efficiency of query processing over inverted indexes: static pruning. We experiment with three static pruning strategies, namely document-centric, term-centric and agnostic pruning, and we show, over diverse datasets, that these techniques remain effective with sparse neural retrievers. In particular, static pruning achieves a $2\times$ speedup with negligible effectiveness loss ($\leq 2\%$ drop) and, depending on the use case, even a $4\times$ speedup with minimal impact on effectiveness ($\leq 8\%$ drop). Moreover, we show that neural rerankers are robust to candidates from statically pruned indexes.
Expand-and-Cluster: Exact Parameter Recovery of Neural Networks
Abstract
Can we recover the hidden parameters of an Artificial Neural Network (ANN) by probing its input-output mapping? We propose a systematic method, called `Expand-and-Cluster' that needs only the number of hidden layers and the activation function of the probed ANN to identify all network parameters. In the expansion phase, we train a series of student networks of increasing size using the probed data of the ANN as a teacher. Expansion stops when a minimal loss is consistently reached in student networks of a given size. In the clustering phase, weight vectors of the expanded students are clustered, which allows structured pruning of superfluous neurons in a principled way. We find that an overparameterization of a factor four is sufficient to reliably identify the minimal number of neurons and to retrieve the original network parameters in $80\%$ of tasks across a family of 150 toy problems of variable difficulty. Furthermore, a teacher network trained on MNIST data can be identified with less than $5\%$ overhead in the neuron number. Thus, while direct training of a student network with a size identical to that of the teacher is practically impossible because of the non-convex loss function, training with mild overparameterization followed by clustering and structured pruning correctly identifies the target network.
Keyword: voxel
AdaLIO: Robust Adaptive LiDAR-Inertial Odometry in Degenerate Indoor Environments
Authors: Hyungtae Lim, Daebeom Kim, Beomsoo Kim, Hyun Myung
Abstract
In recent years, the demand for mapping construction sites or buildings using light detection and ranging~(LiDAR) sensors has increased, to model environments for efficient site management. However, LiDAR-based approaches are sometimes observed to diverge in narrow and confined environments, such as spiral stairs and corridors, caused by parameters that stay fixed regardless of changes in the environment. That is, the parameters of LiDAR(-inertial) odometry are mostly set for open space; thus, if the same parameters suitable for open space are applied in a corridor-like scene, the odometry methods diverge, which is referred to as \textit{degeneracy}. To tackle this degeneracy problem, we propose a robust LiDAR-inertial odometry called \textit{AdaLIO}, which employs an adaptive parameter setting strategy. To this end, we first check for degeneracy by testing whether the surroundings are corridor-like environments. If so, the parameters relevant to voxelization and normal vector estimation are adaptively changed to increase the number of correspondences. As verified on a public dataset, our proposed method shows promising performance in narrow and cramped environments, avoiding the degeneracy problem.
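A minimal sketch of the adaptive idea: detect a corridor-like scan from the spread of surface normals, then switch to a finer voxel size; the thresholds and sizes are illustrative assumptions, not AdaLIO's actual values.

```python
# Hedged sketch of degeneracy detection plus adaptive voxel sizing.
import numpy as np

def is_corridor_like(normals, eig_ratio_thresh=0.05):
    """Degeneracy check: normals concentrated on few directions give a
    covariance with one near-zero eigenvalue."""
    cov = np.cov(normals.T)
    eig = np.sort(np.linalg.eigvalsh(cov))
    return eig[0] / (eig[-1] + 1e-12) < eig_ratio_thresh

def pick_voxel_size(normals, open_space=0.4, confined=0.1):
    return confined if is_corridor_like(normals) else open_space

# Toy scan: normals of two parallel walls only (classic corridor pattern).
walls = np.repeat(np.array([[1.0, 0, 0], [-1.0, 0, 0]]), 50, axis=0)
walls += 0.01 * np.random.default_rng(0).standard_normal(walls.shape)
print(pick_voxel_size(walls))   # 0.1 -> finer voxels in the corridor
```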
DQS3D: Densely-matched Quantization-aware Semi-supervised 3D Detection
Abstract
In this paper, we study the problem of semi-supervised 3D object detection, which is of great importance considering the high annotation cost for cluttered 3D indoor scenes. We resort to the robust and principled framework of self-teaching, which has triggered notable progress for semi-supervised learning recently. While this paradigm is natural for image-level or pixel-level prediction, adapting it to the detection problem is challenged by the issue of proposal matching. Prior methods are based upon two-stage pipelines, matching heuristically selected proposals generated in the first stage and resulting in spatially sparse training signals. In contrast, we propose the first semi-supervised 3D detection algorithm that works in a single-stage manner and allows spatially dense training signals. A fundamental issue of this new design is the quantization error caused by point-to-voxel discretization, which inevitably leads to misalignment between two transformed views in the voxel domain. To this end, we derive and implement closed-form rules that compensate for this misalignment on-the-fly. Our results are significant, e.g., promoting ScanNet mAP@0.5 from 35.2% to 48.5% using 20% annotation. Codes and data will be publicly available.
Keyword: lidar
Pointersect: Neural Rendering with Cloud-Ray Intersection
Abstract
We propose a novel method that renders point clouds as if they were surfaces. The proposed method is differentiable and requires no scene-specific optimization. This unique capability enables, out-of-the-box, surface normal estimation, rendering of room-scale point clouds, inverse rendering, and ray tracing with global illumination. Unlike existing work that focuses on converting point clouds to other representations--e.g., surfaces or implicit functions--our key idea is to directly infer the intersection of a light ray with the underlying surface represented by the given point cloud. Specifically, we train a set transformer that, given a small number of local neighbor points along a light ray, provides the intersection point, the surface normal, and the material blending weights, which are used to render the outcome of this light ray. Localizing the problem into small neighborhoods enables us to train a model with only 48 meshes and apply it to unseen point clouds. Our model achieves higher estimation accuracy than state-of-the-art surface reconstruction and point-cloud rendering methods on three test sets. When applied to room-scale point clouds, without any scene-specific optimization, the model achieves competitive quality with state-of-the-art novel-view rendering methods. Moreover, we demonstrate the ability to render and manipulate Lidar-scanned point clouds, with applications such as lighting control and object insertion.
End-to-End Lidar-Camera Self-Calibration for Autonomous Vehicles
Authors: Arya Rachman, Jürgen Seiler, André Kaup
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Autonomous vehicles are equipped with a multi-modal sensor setup to enable the car to drive safely. The initial calibration of such perception sensors is a highly mature topic and is routinely done in an automated factory environment. However, an intriguing question arises as to how to maintain the calibration quality throughout the vehicle's operating duration. Another challenge is to calibrate multiple sensors jointly to ensure no propagation of systemic errors. In this paper, we propose CaLiCa, an end-to-end deep self-calibration network which addresses the automatic calibration problem for pinhole camera and Lidar. We jointly predict the camera intrinsic parameters (focal length and distortion) as well as the Lidar-camera extrinsic parameters (rotation and translation) by regressing feature correlations between the camera image and the Lidar point cloud. The network is arranged in a Siamese-twin structure to constrain the feature learning to features mutually shared by the point cloud and camera (Lidar-camera constraint). Evaluation using the KITTI dataset shows that we achieve 0.154 {\deg} and 0.059 m accuracy with a reprojection error of 0.028 pixel in a single-pass inference. We also provide an ablative study showing how our end-to-end learning architecture offers lower terminal loss (a 21% decrease in rotation loss) compared to isolated calibration.
Object Semantics Give Us the Depth We Need: Multi-task Approach to Aerial Depth Completion
Abstract
Depth completion and object detection are two crucial tasks often used for aerial 3D mapping, path planning, and collision avoidance of Uncrewed Aerial Vehicles (UAVs). Common solutions include using measurements from a LiDAR sensor; however, the generated point cloud is often sparse and irregular and limits the system's capabilities in 3D rendering and safety-critical decision-making. To mitigate this challenge, information from other sensors on the UAV (viz., a camera used for object detection) is utilized to help the depth completion process generate denser 3D models. Performing both aerial depth completion and object detection tasks while fusing the data from the two sensors poses a challenge to resource efficiency. We address this challenge by proposing a novel approach to jointly execute the two tasks in a single pass. The proposed method is based on an encoder-focused multi-task learning model that exposes the two tasks to jointly learned features. We demonstrate how semantic expectations of the objects in the scene learned by the object detection pathway can boost the performance of the depth completion pathway while placing the missing depth values. Experimental results show that the proposed multi-task network outperforms its single-task counterpart, particularly when exposed to defective inputs.
AdaLIO: Robust Adaptive LiDAR-Inertial Odometry in Degenerate Indoor Environments
Authors: Hyungtae Lim, Daebeom Kim, Beomsoo Kim, Hyun Myung
Abstract
In recent years, the demand for mapping construction sites or buildings using light detection and ranging~(LiDAR) sensors has increased, to model environments for efficient site management. However, LiDAR-based approaches are sometimes observed to diverge in narrow and confined environments, such as spiral stairs and corridors, caused by parameters that stay fixed regardless of changes in the environment. That is, the parameters of LiDAR(-inertial) odometry are mostly set for open space; thus, if the same parameters suitable for open space are applied in a corridor-like scene, the odometry methods diverge, which is referred to as \textit{degeneracy}. To tackle this degeneracy problem, we propose a robust LiDAR-inertial odometry called \textit{AdaLIO}, which employs an adaptive parameter setting strategy. To this end, we first check for degeneracy by testing whether the surroundings are corridor-like environments. If so, the parameters relevant to voxelization and normal vector estimation are adaptively changed to increase the number of correspondences. As verified on a public dataset, our proposed method shows promising performance in narrow and cramped environments, avoiding the degeneracy problem.
ContrastMotion: Self-supervised Scene Motion Learning for Large-Scale LiDAR Point Clouds
Authors: Xiangze Jia, Hui Zhou, Xinge Zhu, Yandong Guo, Ji Zhang, Yuexin Ma
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this paper, we propose a novel self-supervised motion estimator for LiDAR-based autonomous driving via BEV representation. Different from commonly adopted self-supervised strategies for data-level structure consistency, we predict scene motion via feature-level consistency between pillars in consecutive frames, which can eliminate the effect caused by noise points and view-changing point clouds in dynamic scenes. Specifically, we propose a \textit{Soft Discriminative Loss} that provides the network with more pseudo-supervised signals to learn discriminative and robust features in a contrastive learning manner. We also propose a \textit{Gated Multi-frame Fusion} block that automatically learns valid compensation between point cloud frames to enhance feature extraction. Finally, \textit{pillar association} is proposed to predict pillar correspondence probabilities based on feature distance, from which scene motion is further predicted. Extensive experiments show the effectiveness and superiority of our \textbf{ContrastMotion} on both scene flow and motion prediction tasks. The code will be available soon.
Keyword: diffusion
Matrix-free GPU-accelerated saddle-point solvers for high-order problems in $H(\mathrm{div})$
Authors: Will Pazner, Tzanio Kolev, Panayot Vassilevski
Abstract
This work describes the development of matrix-free GPU-accelerated solvers for high-order finite element problems in $H(\mathrm{div})$. The solvers are applicable to grad-div and Darcy problems in saddle-point formulation, and have applications in radiation diffusion and porous media flow problems, among others. Using the interpolation-histopolation basis (cf. SIAM J. Sci. Comput., 45 (2023), A675-A702, arXiv:2203.02465), efficient matrix-free preconditioners can be constructed for the $(1,1)$-block and Schur complement of the block system. With these approximations, block-preconditioned MINRES converges in a number of iterations that is independent of the mesh size and polynomial degree. The approximate Schur complement takes the form of an M-matrix graph Laplacian, and therefore can be well-preconditioned by highly scalable algebraic multigrid methods. High-performance GPU-accelerated algorithms for all components of the solution algorithm are developed, discussed, and benchmarked. Numerical results are presented on a number of challenging test cases, including the "crooked pipe" grad-div problem, the SPE10 reservoir modeling benchmark problem, and a nonlinear radiation diffusion test case.
TextMesh: Generation of Realistic 3D Meshes From Text Prompts
Authors: Christina Tsalicoglou, Fabian Manhardt, Alessio Tonioni, Michael Niemeyer, Federico Tombari
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The ability to generate highly realistic 2D images from mere text prompts has recently made huge progress in terms of speed and quality, thanks to the advent of image diffusion models. Naturally, the question arises whether this can also be achieved in the generation of 3D content from such text prompts. To this end, a new line of methods has recently emerged, trying to harness diffusion models, trained on 2D images, to supervise 3D model generation using view-dependent prompts. While achieving impressive results, these methods have two major drawbacks. First, rather than commonly used 3D meshes, they generate neural radiance fields (NeRFs), making them impractical for most real applications. Second, these approaches tend to produce over-saturated models, giving the output a cartoonish-looking effect. Therefore, in this work we propose a novel method for the generation of highly realistic-looking 3D meshes. To this end, we extend NeRF to employ an SDF backbone, leading to improved 3D mesh extraction. In addition, we propose a novel way to finetune the mesh texture, removing the effect of high saturation and improving the details of the output 3D mesh.
RenderDiffusion: Text Generation as Image Generation
Authors: Junyi Li, Wayne Xin Zhao, Jian-Yun Nie, Ji-Rong Wen
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Diffusion models have become a new generative paradigm for text generation. Considering the discrete categorical nature of text, in this paper we propose \textsc{RenderDiffusion}, a novel diffusion approach for text generation via text-guided image generation. Our key idea is to render the target text as a \emph{glyph image} containing visual language content. In this way, conditional text generation can be cast as a glyph image generation task, and it is then natural to apply continuous diffusion models to discrete texts. Specifically, we utilize a cascaded architecture (\ie a base and a super-resolution diffusion model) to generate high-fidelity glyph images, conditioned on the input text. Furthermore, we design a text grounding module to transform and refine the visual language content from generated glyph images into the final texts. In experiments over four conditional text generation tasks and two classes of metrics (\ie quality and diversity), \textsc{RenderDiffusion} can achieve comparable or even better results than several baselines, including pretrained language models. Our model also makes significant improvements compared to the recent diffusion model.
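The glyph-rendering step can be sketched with Pillow; the font, image size, and layout below are assumptions for illustration.

```python
# Hedged sketch: render target text into an image so that conditional
# text generation can be cast as glyph image generation.
from PIL import Image, ImageDraw, ImageFont

def render_glyph_image(text, size=(256, 64)):
    """Render `text` as a white-on-black glyph image."""
    img = Image.new("L", size, color=0)
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()
    draw.text((4, 4), text, fill=255, font=font)
    return img

glyph = render_glyph_image("diffusion models generate text")
glyph.save("glyph.png")   # the model is trained to produce such images
```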
Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models
Abstract
Diffusion models are powerful, but they require a lot of time and data to train. We propose Patch Diffusion, a generic patch-wise training framework, to significantly reduce the training time costs while improving data efficiency, which thus helps democratize diffusion model training to broader users. At the core of our innovations is a new conditional score function at the patch level, where the patch location in the original image is included as additional coordinate channels, while the patch size is randomized and diversified throughout training to encode the cross-region dependency at multiple scales. Sampling with our method is as easy as in the original diffusion model. Through Patch Diffusion, we could achieve $\mathbf{\ge 2\times}$ faster training, while maintaining comparable or better generation quality. Patch Diffusion meanwhile improves the performance of diffusion models trained on relatively small datasets, $e.g.$, as few as 5,000 images to train from scratch. We achieve state-of-the-art FID scores 1.77 on CelebA-64$\times$64 and 1.93 on AFHQv2-Wild-64$\times$64. We will share our code and pre-trained models soon.
Exploring Compositional Visual Generation with Latent Classifier Guidance
Authors: Changhao Shi, Haomiao Ni, Kai Li, Shaobo Han, Mingfu Liang, Martin Renqiang Min
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Diffusion probabilistic models have achieved enormous success in the field of image generation and manipulation. In this paper, we explore a novel paradigm of using the diffusion model and classifier guidance in the latent semantic space for compositional visual tasks. Specifically, we train latent diffusion models and auxiliary latent classifiers to facilitate non-linear navigation of latent representation generation for any pre-trained generative model with a semantic latent space. We demonstrate that such conditional generation achieved by latent classifier guidance provably maximizes a lower bound of the conditional log probability during training. To maintain the original semantics during manipulation, we introduce a new guidance term, which we show is crucial for achieving compositionality. With additional assumptions, we show that the non-linear manipulation reduces to a simple latent arithmetic approach. We show that this paradigm based on latent classifier guidance is agnostic to pre-trained generative models, and present competitive results for both image generation and sequential manipulation of real and synthetic images. Our findings suggest that latent classifier guidance is a promising approach that merits further exploration, even in the presence of other strong competing methods.
Efficient Bayesian inference using physics-informed invertible neural networks for inverse problems
Abstract
In this paper, we propose a novel approach for solving Bayesian inverse problems with physics-informed invertible neural networks (PI-INN). The architecture of PI-INN consists of two sub-networks: an invertible neural network (INN) and a neural basis network (NB-Net). The invertible map between the parametric input and the INN output, constructed with the aid of the NB-Net, provides a tractable estimation of the posterior distribution, which enables efficient sampling and accurate density evaluation. Furthermore, the loss function of PI-INN includes two components: a residual-based physics-informed loss term and a new independence loss term. The presented independence loss term can Gaussianize the random latent variables and ensure statistical independence between the two parts of the INN output by effectively utilizing the estimated density function. Several numerical experiments are presented to demonstrate the efficiency and accuracy of the proposed PI-INN, including inverse kinematics, inverse problems of the 1-d and 2-d diffusion equations, and seismic traveltime tomography.
CoDi: Co-evolving Contrastive Diffusion Models for Mixed-type Tabular Synthesis
Abstract
With growing attention to tabular data, attempts to apply synthetic tables to various tasks have expanded to many scenarios. Owing to recent advances in generative modeling, fake data generated by tabular data synthesis models has become sophisticated and realistic. However, modeling the discrete variables (columns) of tabular data remains difficult. In this work, we propose to process continuous and discrete variables separately (but conditioned on each other) with two diffusion models. The two diffusion models are co-evolved during training by reading conditions from each other. To further bind the diffusion models, moreover, we introduce a contrastive learning method with a negative sampling method. In our experiments with 11 real-world tabular datasets and 8 baseline methods, we prove the efficacy of the proposed method, called CoDi.
Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning
Authors: Cheng Lu, Huayu Chen, Jianfei Chen, Hang Su, Chongxuan Li, Jun Zhu
Abstract
Guided sampling is a vital approach for applying diffusion models in real-world tasks, as it embeds human-defined guidance during the sampling procedure. This paper considers a general setting where the guidance is defined by an (unnormalized) energy function. The main challenge for this setting is that the intermediate guidance during the diffusion sampling procedure, which is jointly defined by the sampling distribution and the energy function, is unknown and is hard to estimate. To address this challenge, we propose an exact formulation of the intermediate guidance as well as a novel training objective named contrastive energy prediction (CEP) to learn the exact guidance. Our method is guaranteed to converge to the exact guidance under unlimited model capacity and data samples, while previous methods cannot. We demonstrate the effectiveness of our method by applying it to offline reinforcement learning (RL). Extensive experiments on D4RL benchmarks demonstrate that our method outperforms existing state-of-the-art algorithms. We also provide some examples of applying CEP for image synthesis to demonstrate the scalability of CEP on high-dimensional data.
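As a rough illustration of energy-based contrastive training, the sketch below scores one positive sample against several negatives with a softmax over negated energies. This is a generic InfoNCE-style loss assumed here for intuition; the paper's exact CEP objective and its intermediate-guidance construction are not reproduced.

```python
import numpy as np

def contrastive_energy_loss(e_pos, e_neg):
    """Softmax-style contrastive loss over energies: the positive
    sample should have lower energy than the negatives. A generic
    InfoNCE-like sketch, not the exact CEP objective from the paper."""
    logits = -np.concatenate([[e_pos], e_neg])  # low energy -> high logit
    logits -= logits.max()                       # numerical stability
    return -(logits[0] - np.log(np.exp(logits).sum()))

# Toy check: the loss shrinks as the positive's energy drops below
# the negatives' energies.
print(contrastive_energy_loss(1.0, np.array([0.0, 0.5])))
print(contrastive_energy_loss(-3.0, np.array([0.0, 0.5])))
```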
The Score-Difference Flow for Implicit Generative Modeling
Abstract
Implicit generative modeling (IGM) aims to produce samples of synthetic data matching the characteristics of a target data distribution. Recent work (e.g. score-matching networks, diffusion models) has approached the IGM problem from the perspective of pushing synthetic source data toward the target distribution via dynamical perturbations or flows in the ambient space. We introduce the score difference (SD) between arbitrary target and source distributions as a flow that optimally reduces the Kullback-Leibler divergence between them while also solving the Schr\"odinger bridge problem. We apply the SD flow to convenient proxy distributions, which are aligned if and only if the original distributions are aligned. We demonstrate the formal equivalence of this formulation to denoising diffusion models under certain conditions. However, unlike diffusion models, SD flow places no restrictions on the prior distribution. We also show that the training of generative adversarial networks includes a hidden data-optimization sub-problem, which induces the SD flow under certain choices of loss function when the discriminator is optimal. As a result, the SD flow provides a theoretical link between model classes that, taken together, address all three challenges of the "generative modeling trilemma": high sample quality, mode coverage, and fast sampling.
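The core flow is easy to state: particles move along the difference of two score functions, dx/dt = grad log p_target(x) - grad log p_source(x). The toy below runs this between one-dimensional Gaussians, where both scores are analytic; fitting the evolving particle cloud with a running Gaussian is our simplifying assumption, not the paper's proxy-distribution construction.

```python
import numpy as np

# Score-difference flow between two 1-D Gaussians. The score of
# N(mu, s^2) at x is -(x - mu) / s**2, so both terms are analytic.
rng = np.random.default_rng(0)
x = rng.normal(-4.0, 0.5, size=5000)     # source samples
mu_t, s_t = 2.0, 1.0                      # target N(2, 1)

dt = 0.05
for _ in range(200):
    # Assumption: approximate the current particle distribution by a
    # Gaussian fit, standing in for the source score.
    mu_s, s_s = x.mean(), x.std() + 1e-8
    score_target = -(x - mu_t) / s_t**2
    score_source = -(x - mu_s) / s_s**2
    x += dt * (score_target - score_source)

print(x.mean(), x.std())                  # approaches (2.0, 1.0)
```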
Keyword: dynamic
Parallel bootstrap-based on-policy deep reinforcement learning for continuous flow control applications
Authors: J. Viquerat, E. Hachem
Subjects: Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an)
Abstract
The coupling of deep reinforcement learning to numerical flow control problems has recently received considerable attention, leading to groundbreaking results and opening new perspectives for the domain. Due to the usually high computational cost of fluid dynamics solvers, the use of parallel environments during the learning process represents an essential ingredient to attain efficient control in a reasonable time. Yet, most of the deep reinforcement learning literature for flow control relies on on-policy algorithms, for which the massively parallel transition collection may break theoretical assumptions and lead to suboptimal control models. To overcome this issue, we propose a parallelism pattern relying on partial-trajectory buffers terminated by a return bootstrapping step, allowing a flexible use of parallel environments while preserving the on-policiness of the updates. This approach is illustrated on a CPU-intensive continuous flow control problem from the literature.
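The bootstrapping step can be sketched in a few lines: when a buffer is cut before the episode ends, the critic's value at the cut-off state stands in for the missing tail of the return. The function below is a minimal n-step version of that idea; the paper's full parallel-buffer machinery is not reproduced.

```python
import numpy as np

def bootstrapped_returns(rewards, v_last, gamma=0.99):
    """Discounted returns for a *partial* trajectory: the missing tail
    is replaced by the critic's value estimate at the cut-off state.
    A minimal sketch of the buffer-termination step only."""
    g = v_last                      # bootstrap the unfinished tail
    out = np.empty(len(rewards))
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        out[t] = g
    return out

# Each parallel environment contributes a short buffer; returns are
# computed per-buffer, so updates can stay on-policy.
print(bootstrapped_returns(np.array([1.0, 0.0, 0.5]), v_last=2.0))
```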
Beyond the Pixel: a Photometrically Calibrated HDR Dataset for Luminance and Color Temperature Prediction
Authors: Christophe Bolduc, Justine Giroux, Marc Hébert, Claude Demers, Jean-François Lalonde
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Light plays an important role in human well-being. However, most computer vision tasks treat pixels without considering their relationship to physical luminance. To address this shortcoming, we present the first large-scale photometrically calibrated dataset of high dynamic range \ang{360} panoramas. Our key contribution is the calibration of an existing, uncalibrated HDR Dataset. We do so by accurately capturing RAW bracketed exposures simultaneously with a professional photometric measurement device (chroma meter) for multiple scenes across a variety of lighting conditions. Using the resulting measurements, we establish the calibration coefficients to be applied to the HDR images. The resulting dataset is a rich representation of indoor scenes which displays a wide range of illuminance and color temperature, and varied types of light sources. We exploit the dataset to introduce three novel tasks: per-pixel luminance, per-pixel color temperature, and planar illuminance prediction from a single input image. Finally, we also capture another smaller calibrated dataset with a commercial \ang{360} camera, to experiment on generalization across cameras. We are optimistic that the release of our datasets and associated code will spark interest in physically accurate light estimation within the community.
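Applying such a calibration is conceptually a per-pixel scaling. The sketch below maps linear HDR RGB to luminance using assumed Rec.709 luma weights and a hypothetical scalar coefficient k; the paper's actual coefficients come from regressing image values against the chroma-meter readings.

```python
import numpy as np

def per_pixel_luminance(hdr_rgb, k):
    """Map linear HDR RGB to photometric luminance (cd/m^2).
    `k` is a scalar calibration coefficient of the kind recovered by
    regressing chroma-meter readings against image values. The Rec.709
    luma weights below are our assumption, not a detail of the paper."""
    rel_y = (0.2126 * hdr_rgb[..., 0]
             + 0.7152 * hdr_rgb[..., 1]
             + 0.0722 * hdr_rgb[..., 2])
    return k * rel_y

img = np.random.rand(4, 4, 3).astype(np.float32)  # stand-in HDR tile
print(per_pixel_luminance(img, k=179.0).shape)     # hypothetical k
```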
Efficient and Scalable Path-Planning Algorithms for Curvature Constrained Motion in the Hamilton-Jacobi Formulation
Abstract
We present a partial-differential-equation-based optimal path-planning framework for curvature constrained motion, with application to vehicles in 2 and 3 spatial dimensions. This formulation relies on optimal control theory, dynamic programming, and a Hamilton-Jacobi-Bellman equation. Many authors have developed similar models and employed grid-based numerical methods to solve the partial differential equation required to generate optimal trajectories. However, these methods can be inefficient and do not scale well to high dimensions. We describe how efficient and scalable algorithms for solutions of high-dimensional Hamilton-Jacobi equations can be developed to solve similar problems, even in high dimensions, while maintaining the Hamilton-Jacobi formulation. We demonstrate our method with several examples.
PID-inspired modifications in response threshold models in swarm intelligent systems
Authors: Maryam Kebari, Annie S. Wu, H. David Mathias
Abstract
In this study, we investigate the effectiveness of using PID (Proportional-Integral-Derivative) control loop factors for modifying response thresholds in a decentralized, non-communicating, threshold-based swarm. Each agent in our swarm has a set of four thresholds, each corresponding to a task the agent is capable of performing. The agent will act on a particular task if the stimulus is higher than its corresponding threshold. The ability to modify their thresholds allows the agents to specialize dynamically in response to task demands. Current approaches to dynamic thresholds typically use a learning and forgetting process to adjust thresholds. These methods are able to specialize effectively once, but can have difficulty re-specializing if the task demands change. Our approach, inspired by the PID control loop, alters the threshold values based on the current task demand value, the change in task demand, and the cumulative sum of previous task demands. We show that our PID-inspired method is scalable and outperforms fixed thresholds as well as current learning and forgetting response thresholds under non-changing, constant, and abruptly changing task demands. This superior performance is due to the ability of our method to re-specialize repeatedly in response to changing task demands.
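A minimal version of the update, with illustrative gains and an assumed additive form, might look as follows: the threshold falls when current demand (P), accumulated demand (I), or rising demand (D) is high, making the agent more likely to act on that task.

```python
def pid_threshold(theta, demand, prev_demand, cum_demand,
                  kp=0.1, ki=0.01, kd=0.05):
    """PID-inspired threshold update: the threshold drops (the agent
    becomes more responsive) when demand is high, rising, or has
    accumulated. Gains and the exact additive form are illustrative
    assumptions, not the paper's tuned update."""
    p = demand                     # proportional: current demand
    i = cum_demand                 # integral: accumulated demand
    d = demand - prev_demand       # derivative: change in demand
    return max(0.0, theta - (kp * p + ki * i + kd * d))

theta, cum, prev = 5.0, 0.0, 0.0
for demand in [1.0, 2.0, 4.0, 4.0]:    # rising task demand
    cum += demand
    theta = pid_threshold(theta, demand, prev, cum)
    prev = demand
    print(round(theta, 3))              # threshold falls: agent specializes
```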
CEDR-API: Productive, Performant Programming of Domain-Specific Embedded Systems
Authors: Joshua Mack, Serhan Gener, Sahil Hassan, H. Umut Suluhan, Ali Akoglu
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
As the computing landscape evolves, system designers continue to explore design methodologies that leverage increased levels of heterogeneity to push performance within limited size, weight, power, and cost budgets. One such methodology is to build Domain-Specific Systems on Chip (DSSoCs) that promise increased productivity through the narrowed scope of their target application domain. In previous works, we have proposed CEDR, an open-source, unified compilation and runtime framework for DSSoC architectures that allows applications, scheduling heuristics, and accelerators to be co-designed in a cohesive manner that maximizes system performance. In this work, we present changes to the application development workflow that enable a more productive and expressive API-based programming methodology. These changes allow for more rapid integration of new applications without sacrificing application performance. Towards the design of heterogeneous SoCs with a rich set of accelerators, we experimentally study the impact of increasing workload complexity and a growing pool of compute resources on the execution time of dynamically arriving workloads composed of real-life applications, executed over architectures emulated on the Xilinx ZCU102 MPSoC and Nvidia Jetson AGX Xavier. We expand CEDR into the application domain of autonomous vehicles, and we find that API-based CEDR achieves a runtime overhead reduction of 19.5% with respect to the original CEDR.
Synthesizing Stable Reduced-Order Visuomotor Policies for Nonlinear Systems via Sums-of-Squares Optimization
Authors: Glen Chou, Russ Tedrake
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract
We present a method for synthesizing dynamic, reduced-order output-feedback polynomial control policies for control-affine nonlinear systems which guarantees runtime stability to a goal state, when using visual observations and a learned perception module in the feedback control loop. We leverage Lyapunov analysis to formulate the problem of synthesizing such policies. This problem is nonconvex in the policy parameters and the Lyapunov function that is used to prove the stability of the policy. To solve this problem approximately, we propose two approaches: the first solves a sequence of sum-of-squares optimization problems to iteratively improve a policy which is provably-stable by construction, while the second directly performs gradient-based optimization on the parameters of the polynomial policy, and its closed-loop stability is verified a posteriori. We extend our approach to provide stability guarantees in the presence of observation noise, which realistically arises due to errors in the learned perception module. We evaluate our approach on several underactuated nonlinear systems, including pendula and quadrotors, showing that our guarantees translate to empirical stability when controlling these systems from images, while baseline approaches can fail to reliably stabilize the system.
Sample-Efficient and Surrogate-Based Design Optimization of Underwater Vehicle Hulls
Authors: Harsh Vardhan, David Hyde, Umesh Timalsina, Peter Volgyesi, Janos Sztipanovits
Abstract
Physics simulations are a computational bottleneck in computer-aided design (CAD) optimization processes. Hence, in order to make accurate (computationally expensive) simulations feasible for use in design optimization, one requires either an optimization framework that is highly sample-efficient or fast data-driven proxies (surrogate models) for long-running simulations. In this work, we leverage recent advances in optimization and artificial intelligence (AI) to explore both of these potential solutions, in the context of designing an optimal unmanned underwater vehicle (UUV). We first investigate and compare the sample efficiency and convergence behavior of different optimization techniques with a standard computational fluid dynamics (CFD) solver in the optimization loop. We then develop a deep neural network (DNN) based surrogate model to approximate drag forces that would otherwise be computed via direct numerical simulation with the CFD solver. The surrogate model is in turn used in the optimization loop of the hull design. Our study finds that Bayesian optimization with a lower confidence bound (BO-LCB) is the most sample-efficient optimization framework and has the best convergence behavior of those considered. Subsequently, we show that our DNN-based surrogate model predicts drag force on test data in tight agreement with CFD simulations, with a mean absolute percentage error (MAPE) of 1.85%. Combining these results, we demonstrate a two-orders-of-magnitude speedup (with comparable accuracy) for the design optimization process when the surrogate model is used. To our knowledge, this is the first study applying Bayesian optimization and DNN-based surrogate modeling to the problem of UUV design optimization, and we share our developments as open-source software.
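For intuition, one BO-LCB step can be sketched with a stock Gaussian process: predict a mean and standard deviation over candidate designs, then pick the minimizer of mu - kappa*sigma. This is a generic acquisition step under assumed defaults, not the authors' tuned setup; the toy "drag" function and design space are ours.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def lcb_pick(gp, candidates, kappa=2.0):
    """Lower-confidence-bound acquisition for a minimization problem
    (e.g., drag): prefer points with low predicted mean or high
    uncertainty. A generic BO-LCB step, not the authors' full loop."""
    mu, sigma = gp.predict(candidates, return_std=True)
    return candidates[np.argmin(mu - kappa * sigma)]

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (8, 2))             # evaluated hull designs (toy)
y = np.sin(X[:, 0] * 6) + X[:, 1]         # stand-in drag values
gp = GaussianProcessRegressor().fit(X, y)
print(lcb_pick(gp, rng.uniform(0, 1, (256, 2))))  # next design to simulate
```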
Neuroevolution of Recurrent Architectures on Control Tasks
Authors: Maximilien Le Clei, Pierre Bellec
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
Abstract
Modern artificial intelligence methods typically train the parameters of fixed-size deep neural networks using gradient-based optimization techniques. Simple evolutionary algorithms have recently been shown to also be capable of optimizing deep neural network parameters, at times matching the performance of gradient-based techniques, e.g. in reinforcement learning settings. In addition to optimizing network parameters, many evolutionary computation techniques are also capable of progressively constructing network architectures. However, constructing network architectures from elementary evolution rules has not yet been shown to scale to modern reinforcement learning benchmarks. In this paper we therefore propose a new approach in which the architectures of recurrent neural networks dynamically evolve according to a small set of mutation rules. We implement a massively parallel evolutionary algorithm and run experiments on all 19 OpenAI Gym state-based reinforcement learning control tasks. We find that in most cases, dynamic agents match or exceed the performance of gradient-based agents while utilizing orders of magnitude fewer parameters. We believe our work to open avenues for real-life applications where network compactness and autonomous design are of critical importance. We provide our source code, final model checkpoints and full results at github.com/MaximilienLC/nra.
VpROM: A novel Variational AutoEncoder-boosted Reduced Order Model for the treatment of parametric dependencies in nonlinear systems
Authors: Thomas Simpson, Konstantinos Vlachas, Anthony Garland, Nikolaos Dervilis, Eleni Chatzi
Subjects: Numerical Analysis (math.NA); Computational Engineering, Finance, and Science (cs.CE)
Abstract
Reduced Order Models (ROMs) are of considerable importance in many areas of engineering in which computational time presents difficulties. Established approaches employ projection-based reduction, such as Proper Orthogonal Decomposition; however, such methods can become inefficient or fail in the case of parametric or strongly nonlinear models. Such limitations are usually tackled via a library of local reduction bases, each of which is valid for a given parameter vector. The success of such methods, however, is strongly reliant upon the method used to relate the parameter vectors to the local bases; this is typically achieved using clustering or interpolation methods. We propose the replacement of these methods with a Variational Autoencoder (VAE) to be used as a generative model which can infer the local basis corresponding to a given parameter vector in a probabilistic manner. The resulting VAE-boosted parametric ROM \emph{VpROM} still retains the physical insights of a projection-based method but also allows for better treatment of problems where model dependencies or excitation traits cause the dynamic behavior to span multiple response regimes. Moreover, the probabilistic treatment of the VAE representation allows for uncertainty quantification on the reduction bases, which may then be propagated to the ROM response. The performance of the proposed approach is validated on an open-source simulation benchmark featuring hysteresis and multi-parametric dependencies, and on a large-scale wind turbine tower characterised by nonlinear material behavior and model uncertainty.
Real-Time Ground Fault Detection for Inverter-Based Microgrid Systems
Abstract
Ground fault detection in inverter-based microgrid systems is challenging, particularly in a real-time setting, as the fault current deviates slightly from the nominal value. This difficulty is reinforced when natural disturbances exhibit similar output patterns as a faulty setting does. The conventional solution of installing more relays to obtain additional measurements is costly and also increases the complexity of the system. In this paper, we propose diagnosis schemes based on optimization-based fault detection filters with the output current as the only measurement. Modeling the microgrid dynamics and the diagnosis filter, we formulate the filter design as a linear programming (LP) problem that accounts for decoupling a class of disturbances and ensuring fault sensitivity simultaneously. Next, we robustify the filter to disturbances that cannot be fully decoupled. To this end, we leverage tools from the existing literature and extend the optimization program to a quadratic programming (QP) problem in which the filter is trained for this class of disturbances. To ease the computational effort, we also provide an approximate but analytical solution to this QP. Additionally, we use classical statistical results to provide a thresholding mechanism that enjoys probabilistic false-alarm guarantees. Finally, we verify the effectiveness of the proposed methods through several numerical simulations.
Large Intelligent Surface Measurements for Joint Communication and Sensing
Authors: Christian Nelson, Xuhong Li, Thomas Wilding, Benjamin Deutschmann, Klaus Witrisal, Fredrik Tufvesson
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
Multiple concepts for future generations of wireless communication standards utilize coherent processing of signals from many distributed antennas. Names for these concepts include distributed MIMO, cell-free massive MIMO, XL-MIMO, and large intelligent surfaces. They aim to improve communication reliability and capacity, as well as energy efficiency, and provide possibilities for new applications through joint communication and sensing. One such recently proposed solution is the concept of RadioWeaves. It proposes a new radio infrastructure for distributed MIMO with distributed internal processing, storage, and compute resources integrated into the infrastructure. The large bandwidths available in the higher bands have inspired much work regarding sensing in the mmWave and sub-THz bands; however, sub-6 GHz cellular bands will still be the main provider of broad cellular coverage due to the more favorable propagation conditions. In this paper, we present results from a sub-6 GHz measurement campaign targeting the non-stationary spatial channel statistics for a large RadioWeave and the temporal non-stationarity in a dynamic scenario with RadioWeaves. From the results, we also predict the possibility of multi-static sensing and positioning of users in the environment.
Model-Free Learning and Optimal Policy Design in Multi-Agent MDPs Under Probabilistic Agent Dropout
Authors: Carmel Fiscko, Soummya Kar, Bruno Sinopoli
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
This work studies a multi-agent Markov decision process (MDP) that can undergo agent dropout and the computation of policies for the post-dropout system based on control and sampling of the pre-dropout system. The controller's objective is to find an optimal policy that maximizes the value of the expected system given a priori knowledge of the agents' dropout probabilities. Finding an optimal policy for any specific dropout realization is a special case of this problem. For MDPs with a certain transition independence and reward separability structure, we assume that removing agents from the system forms a new MDP comprised of the remaining agents with new state and action spaces, transition dynamics that marginalize the removed agents, and rewards that are independent of the removed agents. We first show that under these assumptions, the value of the expected post-dropout system can be represented by a single MDP; this "robust MDP" eliminates the need to evaluate all $2^N$ realizations of the system, where $N$ denotes the number of agents. More significantly, in a model-free context, it is shown that the robust MDP value can be estimated with samples generated by the pre-dropout system, meaning that robust policies can be found before dropout occurs. This fact is used to propose a policy importance sampling (IS) routine that performs policy evaluation for dropout scenarios while controlling the existing system with good pre-dropout policies. The policy IS routine produces value estimates for both the robust MDP and specific post-dropout system realizations and is justified with exponential confidence bounds. Finally, the utility of this approach is verified in simulation, showing how structural properties of agent dropout can help a controller find good post-dropout policies before dropout occurs.
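The flavor of the policy evaluation step can be conveyed with the textbook ordinary importance-sampling estimator, which reweights returns collected under the pre-dropout behavior policy by the likelihood ratio of the target policy. The paper's routine is more elaborate; this sketch only shows the reweighting idea, on a toy single-state MDP of our own.

```python
import numpy as np

def is_value_estimate(trajectories, pi, beta, gamma=0.95):
    """Ordinary importance sampling: estimate the value of a target
    (e.g., post-dropout) policy `pi` from trajectories collected under
    the behavior (pre-dropout) policy `beta`. A textbook IS estimator,
    simpler than the paper's routine, shown for intuition only."""
    estimates = []
    for traj in trajectories:                 # traj: [(s, a, r), ...]
        weight, ret, disc = 1.0, 0.0, 1.0
        for s, a, r in traj:
            weight *= pi[s][a] / beta[s][a]   # likelihood ratio
            ret += disc * r
            disc *= gamma
        estimates.append(weight * ret)
    return np.mean(estimates)

pi   = {0: {0: 0.9, 1: 0.1}}                  # toy single-state MDP
beta = {0: {0: 0.5, 1: 0.5}}
trajs = [[(0, 0, 1.0)], [(0, 1, 0.0)], [(0, 0, 1.0)]]
print(is_value_estimate(trajs, pi, beta))
```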
Recurrent Transformer Encoders for Vision-based Estimation of Fatigue and Engagement in Cognitive Training Sessions
Abstract
The effectiveness of computerized cognitive training in slowing cognitive decline and brain aging in dementia is often limited by the engagement of participants in the training. Monitoring older users' real-time engagement in domains of attention, motivation, and affect is crucial to understanding the overall effectiveness of such training. In this paper, we propose to predict engagement, quantified via an established mental fatigue measure assessing users' perceived attention, motivation, and affect throughout computerized cognitive training sessions, in older adults with mild cognitive impairment (MCI), by monitoring their real-time video-recorded facial gestures in training sessions. To achieve the goal, we used computer vision, analyzing video frames every 5 seconds to optimize the balance between information retention and data size, and developed a novel Recurrent Video Transformer (RVT). Our RVT model, which combines a clip-wise transformer encoder module and a session-wise Recurrent Neural Network (RNN) classifier, achieved the highest balanced accuracy, F1 score, and precision compared to other state-of-the-art models for both detecting mental fatigue/disengagement cases (binary classification) and rating the level of mental fatigue (multi-class classification). By leveraging dynamic temporal information, the RVT model demonstrates the potential to accurately predict engagement among computerized cognitive training users, which lays the foundation for future work to modulate the level of engagement in computerized cognitive training interventions. The code will be released.
Artificial General Intelligence (AGI) for Education
Authors: Ehsan Latif, Gengchen Mai, Matthew Nyaaba, Xuansheng Wu, Ninghao Liu, Guoyu Lu, Sheng Li, Tianming Liu, Xiaoming Zhai
Abstract
Artificial general intelligence (AGI) has gained global recognition as a future technology due to the emergence of breakthrough large language models and chatbots such as GPT-4 and ChatGPT, respectively. AGI aims to replicate human intelligence through computer systems, and it is one of the critical technologies with the potential to revolutionize the field of education. Conventional AI models are typically designed for a limited range of tasks, demand significant amounts of domain-specific data for training, and may not always consider intricate interpersonal dynamics in education. AGI, driven by the recent large pre-trained models, represents a significant leap in the capability of machines to perform tasks that require human-level intelligence, such as reasoning, problem-solving, decision-making, and even understanding human emotions and social interactions. This work reviews AGI's key concepts, capabilities, scope, and potential within future education, including setting educational goals, designing pedagogy and curriculum, and performing assessments. We also provide rich discussions of various ethical issues in education raised by AGI and how AGI will affect human educators. The development of AGI necessitates interdisciplinary collaborations between educators and AI engineers to advance research and application efforts.
Information Theory for Complex Systems Scientists
Authors: Thomas F. Varley
Subjects: Information Theory (cs.IT); Data Analysis, Statistics and Probability (physics.data-an); Quantitative Methods (q-bio.QM); Other Statistics (stat.OT)
Abstract
In the 21st century, many of the crucial scientific and technical issues facing humanity can be understood as problems associated with understanding, modelling, and ultimately controlling complex systems: systems comprised of a large number of non-trivially interacting components whose collective behaviour can be difficult to predict. Information theory, a branch of mathematics historically associated with questions about encoding and decoding messages, has emerged as something of a lingua franca for those studying complex systems, far exceeding its original narrow domain of communication systems engineering. In the context of complexity science, information theory provides a set of tools which allow researchers to uncover the statistical and effective dependencies between interacting components, relationships between systems and their environment, and mereological whole-part relationships, while remaining sensitive to non-linearities missed by common parametric statistical models. In this review, we aim to provide an accessible introduction to the core of modern information theory, aimed specifically at aspiring (and established) complex systems scientists. We begin with standard measures, such as Shannon entropy, relative entropy, and mutual information, before building up to more advanced topics, including: information dynamics, measures of statistical complexity, information decomposition, and effective network inference. In addition to detailing the formal definitions, in this review we make an effort to discuss how information theory can be interpreted and to develop the intuition behind abstract concepts like "entropy," in the hope that this will enable interested readers to understand what information is, and how it is used, at a more fundamental level.
What is the Expected Transient Behavior of Opinion Evolution for Two Communities?
Authors: Yu Xing, Karl H. Johansson
Subjects: Systems and Control (eess.SY); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
Abstract
We study the transient behavior of a gossip model, in which agents randomly interact pairwise over a weighted graph with two communities. Edges within each community have identical weights, different from the weights between communities. It is shown that, at the early stage of the opinion evolution, the expected agent states in the same community have identical sign, despite the influence of stubborn agents. Moreover, it is shown that the expected states of the agents in the same community concentrate around the initial average opinion of that community, if the weights within communities are larger than those between them. In contrast, if the edge weights between communities are larger, then the expected states of all agents concentrate around the initial average opinion of the entire network. Different from the traditional asymptotic analysis in the opinion dynamics literature, these results focus on the initial phase of opinion evolution and establish a correspondence between community structure and the transient behavior of the gossip model. The results are illustrated by numerical examples.
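A quick simulation makes the transient claim tangible: with mostly intra-community interactions, each community's average opinion keeps the sign of its initial average during the early phase. The edge-sampling rule below is a simplification of the weighted-graph model, and the stubborn agents are omitted.

```python
import numpy as np

# Gossip dynamics on two communities: at each step an edge is sampled
# (intra-community with probability proportional to w_in, inter with
# w_out) and its endpoints average their opinions. A toy sketch of the
# setup; stubborn agents and the exact edge weighting are omitted.
rng = np.random.default_rng(1)
n = 50                                          # agents per community
x = np.concatenate([rng.normal(1.0, 0.2, n),    # community A opinions
                    rng.normal(-1.0, 0.2, n)])  # community B opinions
w_in, w_out = 0.9, 0.1                          # intra vs. inter weights

for _ in range(200):                            # early phase only
    if rng.random() < w_in / (w_in + w_out):    # intra-community edge
        c = rng.integers(2) * n
        i, j = c + rng.choice(n, 2, replace=False)
    else:                                       # inter-community edge
        i, j = rng.integers(n), n + rng.integers(n)
    x[i] = x[j] = (x[i] + x[j]) / 2             # pairwise averaging

# At this early stage, each community's mean keeps its initial sign.
print(x[:n].mean(), x[n:].mean())
```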
Causal Semantic Communication for Digital Twins: A Generalizable Imitation Learning Approach
Abstract
A digital twin (DT) leverages a virtual representation of the physical world, along with communication (e.g., 6G), computing (e.g., edge computing), and artificial intelligence (AI) technologies to enable many connected intelligence services. In order to handle the large amounts of network data based on digital twins (DTs), wireless systems can exploit the paradigm of semantic communication (SC) for facilitating informed decision-making under strict communication constraints by utilizing AI techniques such as causal reasoning. In this paper, a novel framework called causal semantic communication (CSC) is proposed for DT-based wireless systems. The CSC system is posed as an imitation learning (IL) problem, where the transmitter, with access to optimal network control policies using a DT, teaches the receiver using SC over a bandwidth limited wireless channel how to improve its knowledge to perform optimal control actions. The causal structure in the source data is extracted using novel approaches from the framework of deep end-to-end causal inference, thereby enabling the creation of a semantic representation that is causally invariant, which in turn helps generalize the learned knowledge of the system to unseen scenarios. The CSC decoder at the receiver is designed to extract and estimate semantic information while ensuring high semantic reliability. The receiver control policies, semantic decoder, and causal inference are formulated as a bi-level optimization problem within a variational inference framework. This problem is solved using a novel concept called network state models, inspired by world models in generative AI, that faithfully represents the environment dynamics leading to data generation. Simulation results demonstrate that the proposed CSC system outperforms state-of-the-art SC systems by achieving better semantic reliability and reduced semantic representation.
Mobilizing Personalized Federated Learning via Random Walk Stochastic ADMM
Authors: Ziba Parsons, Fei Dou, Houyi Du, Jin Lu
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
Abstract
In this research, we investigate the barriers associated with implementing Federated Learning (FL) in real-world scenarios, where a consistent connection between the central server and all clients cannot be maintained, and data distribution is heterogeneous. To address these challenges, we focus on mobilizing the federated setting, where the server moves between groups of adjacent clients to learn local models. Specifically, we propose a new algorithm, Random Walk Stochastic Alternating Direction Method of Multipliers (RWSADMM), capable of adapting to dynamic and ad-hoc network conditions as long as a sufficient number of connected clients are available for model training. In RWSADMM, the server walks randomly toward a group of clients. It formulates local proximity among adjacent clients based on hard inequality constraints instead of consensus updates to address data heterogeneity. Our proposed method is convergent, reduces communication costs, and enhances scalability by reducing the number of clients the central server needs to communicate with.
Opinion Control under Adversarial Network Perturbation: A Stackelberg Game Approach
Authors: Yuejiang Li, Zhanjiang Chen, H. Vicky Zhao
Abstract
Emerging social network platforms enable users to share their own opinions, as well as to exchange opinions with others. However, adversarial network perturbation, where malicious users intentionally spread their extreme opinions, rumors, and misinformation to others, is ubiquitous in social networks. Such adversarial network perturbation greatly influences the opinion formation of the public and threatens our societies. Thus, it is critical to study and control the influence of adversarial network perturbation. Although tremendous efforts have been made in both academia and industry to guide and control public opinion dynamics, most of these works assume that the network is static and ignore such adversarial network perturbation. In this work, based on the well-accepted Friedkin-Johnsen opinion dynamics model, we model the adversarial network perturbation and analyze its impact on the networks' opinion. Then, from the adversary's perspective, we analyze its optimal network perturbation, which maximally changes the network's opinion. Next, from the network defender's perspective, we formulate a Stackelberg game and aim to control the network's opinion even under such adversarial network perturbation. We devise a projected subgradient algorithm to solve the formulated Stackelberg game. Extensive simulations on real social networks validate our analysis of the adversarial network perturbation's influence and the effectiveness of the proposed opinion control algorithm.
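For reference, the Friedkin-Johnsen fixed point has the closed form x* = (I - LW)^(-1)(I - L)u, with W the row-stochastic influence network, u the innate opinions, and L the diagonal susceptibility matrix. That makes the effect of a perturbed influence matrix easy to probe numerically, as the sketch below does; the perturbation and all values are illustrative, not the paper's optimal attack.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
W = rng.random((n, n)); W /= W.sum(axis=1, keepdims=True)  # influence net
L = np.diag(rng.uniform(0.3, 0.9, n))                      # susceptibilities
u = rng.uniform(-1, 1, n)                                  # innate opinions

def fj_steady_state(W):
    # Closed form of the FJ fixed point: x* = (I - L W)^(-1) (I - L) u.
    I = np.eye(n)
    return np.linalg.solve(I - L @ W, (I - L) @ u)

# Hypothetical adversarial perturbation: boost everyone's attention to
# agent 0, then renormalize rows to keep W stochastic.
dW = np.zeros((n, n)); dW[:, 0] = 0.2
Wp = W + dW; Wp /= Wp.sum(axis=1, keepdims=True)
print(fj_steady_state(W).mean(), fj_steady_state(Wp).mean())
```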
Real-time Safety Assessment of Dynamic Systems in Non-stationary Environments: A Review of Methods and Techniques
Authors: Zeyi Liu, Songqiao Hu, Xiao He
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
Real-time safety assessment (RTSA) of dynamic systems is a critical task that has significant implications for various fields such as industrial and transportation applications, especially in non-stationary environments. However, the absence of a comprehensive review of real-time safety assessment methods in non-stationary environments impedes the progress and refinement of related methods. In this paper, a review of methods and techniques for RTSA tasks in non-stationary environments is provided. Specifically, the background and significance of RTSA approaches in non-stationary environments are firstly highlighted. We then present a problem description that covers the definition, classification, and main challenges. We review recent developments in related technologies such as online active learning, online semi-supervised learning, online transfer learning, and online anomaly detection. Finally, we discuss future outlooks and potential directions for further research. Our review aims to provide a comprehensive and up-to-date overview of real-time safety assessment methods in non-stationary environments, which can serve as a valuable resource for researchers and practitioners in this field.
ContrastMotion: Self-supervised Scene Motion Learning for Large-Scale LiDAR Point Clouds
Authors: Xiangze Jia, Hui Zhou, Xinge Zhu, Yandong Guo, Ji Zhang, Yuexin Ma
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this paper, we propose a novel self-supervised motion estimator for LiDAR-based autonomous driving via BEV representation. Different from usually adopted self-supervised strategies for data-level structure consistency, we predict scene motion via feature-level consistency between pillars in consecutive frames, which can eliminate the effect caused by noise points and view-changing point clouds in dynamic scenes. Specifically, we propose \textit{Soft Discriminative Loss} that provides the network with more pseudo-supervised signals to learn discriminative and robust features in a contrastive learning manner. We also propose \textit{Gated Multi-frame Fusion} block that learns valid compensation between point cloud frames automatically to enhance feature extraction. Finally, \textit{pillar association} is proposed to predict pillar correspondence probabilities based on feature distance, from which scene motion is further predicted. Extensive experiments show the effectiveness and superiority of our \textbf{ContrastMotion} on both scene flow and motion prediction tasks. The code will be released soon.
Partially Observable Mean Field Multi-Agent Reinforcement Learning Based on Graph-Attention
Abstract
Traditional multi-agent reinforcement learning algorithms are difficult to apply in large-scale multi-agent environments. The introduction of mean field theory has enhanced the scalability of multi-agent reinforcement learning in recent years. This paper considers partially observable multi-agent reinforcement learning (MARL), where each agent can only observe other agents within a fixed range. This partial observability affects the agent's ability to assess the quality of the actions of surrounding agents. This paper focuses on developing a method to capture more effective information from local observations in order to select more effective actions. Previous work in this field employs probability distributions or weighted mean fields to update the average actions of neighborhood agents, but it does not fully consider the feature information of surrounding neighbors and can lead to a local optimum. In this paper, we propose a novel multi-agent reinforcement learning algorithm, Partially Observable Mean Field Multi-Agent Reinforcement Learning based on Graph-Attention (GAMFQ), to remedy this flaw. GAMFQ uses a graph attention module and a mean field module to describe how an agent is influenced by the actions of other agents at each time step. The graph attention module consists of a graph attention encoder and a differentiable attention mechanism, which outputs a dynamic graph to represent the effectiveness of neighborhood agents with respect to central agents. The mean-field module approximates the effect of a neighborhood agent on a central agent as the average effect of effective neighborhood agents. We evaluate GAMFQ on three challenging tasks in the MAgents framework. Experiments show that GAMFQ outperforms baselines, including state-of-the-art partially observable mean-field reinforcement learning algorithms.
Dynamic Video Frame Interpolation with integrated Difficulty Pre-Assessment
Authors: Ban Chen, Xin Jin, Youxin Chen, Longhai Wu, Jie Chen, Jayoon Koo, Cheul-hee Hahm
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Video frame interpolation (VFI) has witnessed great progress in recent years. However, existing VFI models still struggle to achieve a good trade-off between accuracy and efficiency: fast models often have inferior accuracy, while accurate models typically run slowly. Meanwhile, easy samples with small motion or clear texture can achieve competitive results with simple models and do not require heavy computation. In this paper, we present an integrated pipeline which combines difficulty assessment with video frame interpolation. Specifically, it first leverages a pre-assessment model to measure the interpolation difficulty level of the input frames, and then dynamically selects an appropriate VFI model to generate the interpolation result. Furthermore, a large-scale VFI difficulty assessment dataset is collected and annotated to train our pre-assessment model. Extensive experiments show that easy samples pass through fast models while difficult samples are handled by heavy models, and our proposed pipeline improves the accuracy-efficiency trade-off for VFI.
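The routing logic itself is simple, as the sketch below suggests: a cheap assessment model scores the input pair, and the score selects among interpolators of increasing cost. All names, thresholds, and the stand-in models are hypothetical; the paper's pre-assessment model is a learned network, stubbed here as `assess`.

```python
def interpolate(frame0, frame1, assess, models, thresholds=(0.33, 0.66)):
    """Difficulty-aware VFI routing: a cheap pre-assessment model scores
    the input pair, then an appropriately sized interpolator is chosen.
    Names and thresholds are illustrative assumptions."""
    score = assess(frame0, frame1)          # 0 = easy, 1 = hard
    if score < thresholds[0]:
        model = models["fast"]              # small motion / clear texture
    elif score < thresholds[1]:
        model = models["medium"]
    else:
        model = models["heavy"]             # large motion: accurate model
    return model(frame0, frame1)

# Stand-in components: average blending at three imaginary quality tiers.
models = {k: (lambda a, b: (a + b) / 2) for k in ("fast", "medium", "heavy")}
print(interpolate(0.0, 1.0, assess=lambda a, b: abs(b - a), models=models))
```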
Low-Power Data Streaming in Systolic Arrays with Bus-Invert Coding and Zero-Value Clock Gating
Abstract
Systolic Array (SA) architectures are well suited for accelerating matrix multiplications through the use of a pipelined array of Processing Elements (PEs) communicating with local connections and pre-orchestrated data movements. Even though most of the dynamic power consumption in SAs is due to multiplications and additions, pipelined data movement within the SA constitutes an additional important contributor. The goal of this work is to reduce the dynamic power consumption associated with the feeding of data to the SA, by synergistically applying bus-invert coding and zero-value clock gating. By exploiting salient attributes of state-of-the-art CNNs, such as the value distribution of the weights, the proposed SA applies appropriate encoding only to the data that exhibits high switching activity. Similarly, when one of the inputs is zero, unnecessary operations are entirely skipped. This selectively targeted, application-aware encoding approach is demonstrated to reduce the dynamic power consumption of data streaming in CNN applications using Bfloat16 arithmetic by 1%-19%. This translates to an overall dynamic power reduction of 6.2%-9.4%.
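Bus-invert coding itself is a one-comparison decision per transfer, as the sketch below shows: if a new word would toggle more than half the bus lines relative to the previous one, its complement is sent along with an invert flag, bounding switching activity at half the bus width. The 16-bit width and example words are illustrative; the zero-value gating step (skipping operations on all-zero operands) is a separate mechanism not shown here.

```python
def bus_invert(word, prev_word, width=16):
    """Bus-invert coding for one bus transfer: count the lines that
    would toggle; if more than half would, send the complement and set
    the invert flag instead. A textbook sketch, not the paper's RTL."""
    toggles = bin(word ^ prev_word).count("1")
    if toggles > width // 2:
        return word ^ ((1 << width) - 1), 1   # inverted word, flag set
    return word, 0

prev = 0x0000
for w in (0xFFFF, 0x0F0F, 0x0001):
    sent, flag = bus_invert(w, prev)
    print(hex(sent), flag)                     # what actually drives the bus
    prev = sent
```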
Learning Robust Deep Equilibrium Models
Authors: Haoyu Chu, Shikui Wei, Ting Liu
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Deep equilibrium (DEQ) models have emerged as a promising class of implicit layer models in deep learning, which abandon traditional depth by solving for the fixed points of a single nonlinear layer. Despite their success, the stability of the fixed points for these models remains poorly understood. Recently, Lyapunov theory has been applied to Neural ODEs, another type of implicit layer model, to confer adversarial robustness. By considering DEQ models as nonlinear dynamic systems, we propose a robust DEQ model named LyaDEQ with provable stability guarantees via Lyapunov theory. The crux of our method is ensuring that the fixed points of the DEQ models are Lyapunov stable, which enables the LyaDEQ models to resist minor initial perturbations. To avoid poor adversarial defense due to Lyapunov-stable fixed points being located near each other, we add an orthogonal fully connected layer after the Lyapunov stability module to separate different fixed points. We evaluate LyaDEQ models on several widely used datasets under well-known adversarial attacks, and experimental results demonstrate significant improvement in robustness. Furthermore, we show that the LyaDEQ model can be combined with other defense methods, such as adversarial training, to achieve even better adversarial robustness.
Inverting the Imaging Process by Learning an Implicit Camera Model
Abstract
Representing visual signals with implicit coordinate-based neural networks, as an effective replacement of the traditional discrete signal representation, has gained considerable popularity in computer vision and graphics. In contrast to existing implicit neural representations which focus on modelling the scene only, this paper proposes a novel implicit camera model which represents the physical imaging process of a camera as a deep neural network. We demonstrate the power of this new implicit camera model on two inverse imaging tasks: i) generating all-in-focus photos, and ii) HDR imaging. Specifically, we devise an implicit blur generator and an implicit tone mapper to model the aperture and exposure of the camera's imaging process, respectively. Our implicit camera model is jointly learned together with implicit scene models under multi-focus stack and multi-exposure bracket supervision. We have demonstrated the effectiveness of our new model on a large number of test images and videos, producing accurate and visually appealing all-in-focus and high dynamic range images. In principle, our new implicit neural camera model has the potential to benefit a wide array of other inverse imaging tasks.
Blockchain Large Language Models
Authors: Yu Gai, Liyi Zhou, Kaihua Qin, Dawn Song, Arthur Gervais
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Abstract
This paper presents a dynamic, real-time approach to detecting anomalous blockchain transactions. The proposed tool, TXRANK, generates tracing representations of blockchain activity and trains from scratch a large language model to act as a real-time Intrusion Detection System. Unlike traditional methods, TXRANK is designed to offer an unrestricted search space and does not rely on predefined rules or patterns, enabling it to detect a broader range of anomalies. We demonstrate the effectiveness of TXRANK through its use as an anomaly detection tool for Ethereum transactions. In our experiments, it effectively identifies abnormal transactions among a dataset of 68M transactions and has a batched throughput of 2284 transactions per second on average. Our results show that TXRANK identifies abnormal transactions by ranking 49 out of 124 attacks among the top-3 most abnormal transactions interacting with their victim contracts. This work makes contributions to the field of blockchain transaction analysis by introducing a custom data encoding compatible with the transformer architecture, a domain-specific tokenization technique, and a tree encoding method specifically crafted for the Ethereum Virtual Machine (EVM) trace representation.
Parallel Spiking Neurons with High Efficiency and Long-term Dependencies Learning Ability
Abstract
Vanilla spiking neurons in Spiking Neural Networks (SNNs) use charge-fire-reset neuronal dynamics, which can only be simulated serially and can hardly learn long-term dependencies. We find that when the reset is removed, the neuronal dynamics can be reformulated in a non-iterative form and parallelized. By rewriting neuronal dynamics without reset in a general formulation, we propose the Parallel Spiking Neuron (PSN), which uses dense connections between time-steps to maximize the utilization of temporal information. To avoid the use of future inputs for low-latency inference, we add masks on the weights and obtain the masked PSN. By sharing weights across time-steps, the sliding PSN is proposed, with the ability to handle sequences of variable length. We evaluate the PSN family on simulation speed and temporal/static data classification, and the results show the overwhelming advantage of the PSN family in efficiency and accuracy. To the best of our knowledge, this is the first work on parallelizing spiking neurons, and it can be a cornerstone for the spiking deep learning community. Our codes are available at \url{https://github.com/fangwei123456/Parallel-Spiking-Neuron}.
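The key observation is that reset-free leaky integration is linear in time, so all T membrane potentials follow from one matrix product. The sketch below fixes the time-step weights to powers of the leak for clarity, whereas in the PSN these weights are learnable; the threshold, leak, and sizes are illustrative.

```python
import numpy as np

# Reset-free leaky integration is a linear map over time, so all T
# membrane potentials can be computed at once as H = W @ X, with a
# lower-triangular (causal) W. Here W is fixed to powers of the leak;
# in the PSN these time-step weights are learnable parameters.
T, lam, v_th = 8, 0.8, 0.5
idx = np.arange(T)
W = np.where(idx[:, None] >= idx[None, :],
             lam ** (idx[:, None] - idx[None, :]), 0.0)

x = np.random.default_rng(0).random(T)   # input current over time
h = W @ x                                 # all potentials in parallel
spikes = (h >= v_th).astype(float)        # thresholding step
print(spikes)
```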
Abstract
The direction of conditional branches is predicted with great accuracy in modern processors. We find several instructions in the dynamic instruction stream that contribute only towards computing the conditions of these branches. Hence, when the predicted direction of a conditional branch is indeed correct, these instructions become Ineffectual: the functional state of the program would not be different had these instructions been dropped. However, the execution of ineffectual instructions cannot be avoided altogether because it is possible that the prediction of the branch direction is wrong. In this work, we determine all sources of ineffectuality in an instruction stream, such as conditional branches, predicated instructions, indirect jumps and dynamically dead instructions. Then, we propose a technique to steer the ineffectual instructions away from the primary execution cluster so that effectual instructions can execute uncontended. We find that such ineffectuality-based clustering of instructions naturally simplifies the design and avoids several caveats of a clustered architecture. Finally, we propose a technique to detect instances when instructions were incorrectly marked as ineffectual, say due to a branch misprediction, and recover the pipeline. The empirical evaluation of the proposed changes on the SPEC CPU2017 and GAPBS benchmarks shows performance uplifts of up to 4.9% and 10.3% on average, respectively.
Adaptive Collective Responses to Local Stimuli in Anonymous Dynamic Networks
Authors: Shunhao Oh, Dana Randall, Andréa W. Richa
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Emerging Technologies (cs.ET)
Abstract
We develop a framework for self-induced phase changes in programmable matter in which a collection of agents with limited computational and communication capabilities can collectively perform appropriate global tasks in response to local stimuli that dynamically appear and disappear. Agents reside on graph vertices, where each stimulus is only recognized locally, and agents communicate via token passing along edges to alert other agents to transition to an "aware" state when stimuli are present and an "unaware" state when the stimuli disappear. We present an Adaptive Stimuli Algorithm that is robust to competing waves of messages as multiple stimuli change, possibly adversarially. Moreover, in addition to handling arbitrary stimulus dynamics, the algorithm can handle agents reconfiguring the connections (edges) of the graph over time in a controlled way. As an application, we show how this Adaptive Stimuli Algorithm on reconfigurable graphs can be used to solve the foraging problem, where food sources may be discovered, removed, or shifted at arbitrary times. We would like the agents to consistently self-organize using only local interactions, such that if the food remains in position long enough, the agents transition to a gather phase, collectively forming a single large component with small perimeter around the food. Alternatively, if no food source has existed recently, the agents should self-induce a switch to a search phase in which they distribute themselves randomly throughout the lattice region to search for food. Unlike previous approaches to foraging, this process is indefinitely repeatable. Like a physical phase change, microscopic changes such as the deletion or addition of a single food source trigger these macroscopic, system-wide transitions as agents share information about the environment and respond locally to get the desired collective response.
Modeling Adaptive Self-healing Systems
Authors: Habtom Kahsay Gidey, Diego Marmsoler, Dominik Ascher
Abstract
Motivation: Smart grid design requires energy distribution operations to be adaptable to abnormality. This requirement entails distribution system operators (DSOs) optimizing restoration to normal operational states dynamically. However, these design challenges demand collaborative research efforts on sophisticated modeling and simulation approaches. Approach: In the ESOSEG research project, analyzing the smart grid domain as a software-intensive system, we employed a dynamic architecture approach, particularly the FOCUS theory, to model and assure the domain's self-healing requirements. Although some works specify various self-healing systems, to the best of our knowledge, ours is the first work to enable a formal specification and verification of self-healing properties in smart grids. Results: To support the modeling and verification process, we developed tool support with the Eclipse Modeling Framework (EMF), Xtext, and other languages in the EMF ecosystem. The tool includes a grammar or meta-model of the DSL, an interface to enable textual and graphical modeling of architectural patterns, and a code transformer engine for verification. Furthermore, we evaluated the modeling and verification features of the tool support with an e-Car charging scenario for modeling adaptive self-healing properties. Future work: Future works could include the investigation of comprehensive case studies, for instance, further adaptability scenarios addressing challenges faced by DSOs. Another interesting aspect could be the evaluation of the modeling approach by investigating its use with engineers involved in smart grid design. Next, the evaluation could be followed by abstractions of the verification process to make it usable by system architects with no knowledge of the proof language, Isabelle/HOL.
Towards a generalizable simulation framework to study collisions between spacecraft and debris
Abstract
In recent years, computer simulators of rigid-body systems have been successfully used to improve and expand the field of developing new space robots, becoming a leading tool for the preliminary investigation and evaluation of space robotic missions. However, the impressive progress in performance has not been matched yet by an improvement in modelling capabilities, which remain limited to very basic representations of real systems. We present a new approach to modelling and simulation of collision-inclusive multibody dynamics by leveraging symbolic models generated by a computer algebra system (CAS). While similar investigations into contact dynamics on other domains exploit pre-existing models of common multibody systems (e.g., industrial robot arms, humanoids, and wheeled robots), our focus is on allowing researchers to develop models of novel designs of systems that are not as common or yet to be fabricated: e.g., small spacecraft manipulators. In this paper, we demonstrate the usefulness of our approach to investigate spacecraft-debris collision dynamics.
Adaptive Services Function Chain Orchestration For Digital Health Twin Use Cases: Heuristic-boosted Q-Learning Approach
Authors: Jamila Alsayed Kassem, Li Zhong, Arie Taal, Paola Grosso
Subjects: Networking and Internet Architecture (cs.NI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Abstract
Digital Twin (DT) is a prominent technology to utilise and deploy within the healthcare sector. Yet, the main challenges facing such applications are strict health data-sharing policies, high-performance network requirements, and possible infrastructure resource limitations. In this paper, we address all these challenges by provisioning adaptive Virtual Network Functions (VNFs) to enforce the security policies associated with different data-sharing scenarios. We define a Cloud-Native Network orchestrator on top of a multi-node cluster mesh infrastructure for flexible and dynamic container scheduling. The proposed framework considers the intended data-sharing use case, the associated policies, and the infrastructure configurations, then provisions Service Function Chaining (SFC) and provides routing configurations accordingly, with little to no human intervention. Moreover, what is \textit{optimal} when deploying SFC depends on the use case itself, so we tune the hyperparameters to prioritise resource utilisation or latency in an effort to comply with the performance requirements. As a result, we provide an adaptive network orchestration for digital health twin use cases that is policy-aware, requirements-aware, and resource-aware.
Constraining Chaos: Enforcing dynamical invariants in the training of recurrent neural networks
Authors: Jason A. Platt, Stephen G. Penny, Timothy A. Smith, Tse-Chun Chen, Henry D. I. Abarbanel
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Geophysics (physics.geo-ph)
Abstract
Drawing on ergodic theory, we introduce a novel training method for machine-learning-based forecasting of chaotic dynamical systems. The training enforces dynamical invariants--such as the Lyapunov exponent spectrum and fractal dimension--in the systems of interest, enabling longer and more stable forecasts when operating with limited data. The technique is demonstrated in detail using the recurrent neural network architecture of reservoir computing. Results are given for the Lorenz 1996 chaotic dynamical system and a spectral quasi-geostrophic model, both typical test cases for numerical weather prediction.
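As a worked example of such an invariant, the largest Lyapunov exponent of the logistic map can be computed directly as the average log-derivative along a trajectory; for r = 4 it equals ln 2. This toy map is ours for illustration; the paper enforces such invariants on RNN forecast models, not on this map.

```python
import numpy as np

# Largest Lyapunov exponent of the logistic map x -> r x (1 - x),
# via the analytic derivative: lambda = <log |f'(x)|> along an orbit.
r, x = 4.0, 0.3
logs = []
for _ in range(10000):
    logs.append(np.log(abs(r * (1 - 2 * x))))  # log |f'(x)| at current x
    x = r * x * (1 - x)                        # iterate the map
print(np.mean(logs))                           # ~ log 2 ~ 0.693 for r = 4
```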
Data-Driven Robust Optimization for Energy-Aware and Safe Navigation of Electric Vehicles
Authors: Simran Kumari, Ashish R. Hota, Siddhartha Mukhopadhyay
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
Abstract
In this paper, we simultaneously tackle the problem of energy-optimal and safe navigation of electric vehicles in a data-driven robust optimization framework. We consider a dynamic model of the electric vehicle which includes longitudinal and lateral motion as well as the dynamics of the stored energy level. We leverage past data of obstacle motion to construct a future occupancy set with probabilistic guarantees, and formulate robust collision avoidance constraints with respect to such an occupancy set using convex programming duality. Consequently, we present the finite horizon optimal control problem subject to robust collision avoidance constraints while penalizing resulting energy consumption. Finally, we show the effectiveness of the proposed techniques in reducing energy consumption and ensuring safe navigation via extensive simulations.
The Score-Difference Flow for Implicit Generative Modeling
Abstract
Implicit generative modeling (IGM) aims to produce samples of synthetic data matching the characteristics of a target data distribution. Recent work (e.g. score-matching networks, diffusion models) has approached the IGM problem from the perspective of pushing synthetic source data toward the target distribution via dynamical perturbations or flows in the ambient space. We introduce the score difference (SD) between arbitrary target and source distributions as a flow that optimally reduces the Kullback-Leibler divergence between them while also solving the Schr\"odinger bridge problem. We apply the SD flow to convenient proxy distributions, which are aligned if and only if the original distributions are aligned. We demonstrate the formal equivalence of this formulation to denoising diffusion models under certain conditions. However, unlike diffusion models, SD flow places no restrictions on the prior distribution. We also show that the training of generative adversarial networks includes a hidden data-optimization sub-problem, which induces the SD flow under certain choices of loss function when the discriminator is optimal. As a result, the SD flow provides a theoretical link between model classes that, taken together, address all three challenges of the "generative modeling trilemma": high sample quality, mode coverage, and fast sampling.
Direct Collocation Methods for Trajectory Optimization in Constrained Robotic Systems
Authors: Ricard Bordalba, Tobias Schoels, Lluís Ros, Josep M. Porta, Moritz Diehl
Abstract
Direct collocation methods are powerful tools to solve trajectory optimization problems in robotics. While their resulting trajectories tend to be dynamically accurate, they may also present large kinematic errors in the case of constrained mechanical systems, i.e., those whose state coordinates are subject to holonomic or nonholonomic constraints, like loop-closure or rolling-contact constraints. These constraints confine the robot trajectories to an implicitly-defined manifold, which complicates the computation of accurate solutions. Discretization errors inherent to the transcription of the problem easily make the trajectories drift away from this manifold, which results in physically inconsistent motions that are difficult to track with a controller. This paper reviews existing methods to deal with this problem and proposes new ones to overcome their limitations. Current approaches either disregard the kinematic constraints (which leads to drift accumulation) or modify the system dynamics to keep the trajectory close to the manifold (which adds artificial forces or energy dissipation to the system). The methods we propose, in contrast, achieve full drift elimination on the discrete trajectory, or even along the continuous one, without artificial modifications of the system dynamics. We illustrate and compare the methods using various examples of different complexity.
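The drift problem the abstract describes has a standard local remedy that the paper's methods build beyond: projecting a drifted state back onto the constraint manifold. A minimal Gauss-Newton projection, with an illustrative loop-closure-like constraint, looks like this (the generic projection step, not the paper's specific schemes):

```python
import numpy as np

def project_to_manifold(q, Phi, J, tol=1e-10, max_iter=20):
    """Gauss-Newton projection of a drifted state q onto {q : Phi(q) = 0}.
    Phi: holonomic constraint function (m,); J: its Jacobian (m, n)."""
    for _ in range(max_iter):
        r = Phi(q)
        if np.linalg.norm(r) < tol:
            break
        Jq = J(q)
        # minimum-norm correction: q <- q - J^+ Phi(q)
        q = q - Jq.T @ np.linalg.solve(Jq @ Jq.T, r)
    return q

# Example: enforce a unit-circle constraint q0^2 + q1^2 = 1
Phi = lambda q: np.array([q[0] ** 2 + q[1] ** 2 - 1.0])
J = lambda q: np.array([[2 * q[0], 2 * q[1]]])
q_fixed = project_to_manifold(np.array([1.1, 0.2]), Phi, J)
```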
System Identification with Copula Entropy
Authors: Jian Ma
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Methodology (stat.ME)
Abstract
Identifying the differential equations governing a dynamical system is an important problem with wide applications. Copula Entropy (CE) is a mathematical concept for measuring statistical independence in information theory. In this paper we propose a method for identifying the differential equations of dynamical systems with CE. The problem is cast as a variable selection problem and solved with the previously proposed CE-based method for variable selection. The proposed method is composed of two components: the difference operator and the CE estimator. Since both components can be computed non-parametrically, the proposed method is model-free and hyperparameter-free. A simulation experiment with the 3D Lorenz system verifies the effectiveness of the proposed method.
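A minimal sketch of the two-stage pipeline, under a strong simplifying assumption: the paper's CE estimator is non-parametric, whereas here we approximate CE through a Gaussian copula (normal scores of ranks), which has a closed form. Candidate library terms are then ranked by their dependence with a finite-difference derivative.

```python
import numpy as np
from scipy.stats import norm, rankdata

def copula_entropy_gaussian(X):
    """Gaussian-copula approximation of copula entropy:
    CE ~= 0.5 * log det(corr of normal scores)  (<= 0)."""
    n, d = X.shape
    U = np.column_stack([rankdata(X[:, j]) for j in range(d)]) / (n + 1)
    Z = norm.ppf(U)
    return 0.5 * np.log(np.linalg.det(np.corrcoef(Z.T)))

def identify_terms(X, dt, candidates):
    """Difference operator + CE-based ranking of candidate terms for the
    first state variable's equation; -CE acts as a dependence score."""
    dX = (X[1:] - X[:-1]) / dt          # crude derivative estimate
    scores = {}
    for name, f in candidates.items():
        F = f(X[:-1])                   # candidate term evaluated on states
        scores[name] = -copula_entropy_gaussian(
            np.column_stack([dX[:, 0], F]))
    return sorted(scores.items(), key=lambda kv: -kv[1])
```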
SALSA: Simulated Annealing based Loop-Ordering Scheduler for DNN Accelerators
Authors: Victor J.B. Jung, Arne Symons, Linyan Mei, Marian Verhelst, Luca Benini
Abstract
To meet the growing need for computational power for DNNs, multiple specialized hardware architectures have been proposed. Each DNN layer should be mapped onto the hardware with the most efficient schedule; however, SotA schedulers struggle to consistently provide optimal schedules in reasonable time across all DNN-HW combinations. This paper proposes SALSA, a fast dual-engine scheduler that generates optimal execution schedules for both even and uneven mappings. We introduce a new strategy combining exhaustive search with simulated annealing to address the dynamic nature of the loop-ordering design space, whose size varies across layers. SALSA is extensively benchmarked against two SotA schedulers, LOMA and Timeloop, on 5 different DNNs; on average, SALSA finds schedules with 11.9% and 7.6% lower energy while speeding up the search by 1.7x and 24x compared to LOMA and Timeloop, respectively.
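The simulated-annealing engine is easy to sketch generically: a swap-move search over loop orderings against an abstract cost model. In SALSA the cost would come from a hardware energy model; the toy inversion-count cost below is purely illustrative.

```python
import math, random

def anneal_loop_order(loops, cost, T0=1.0, alpha=0.995, steps=20_000, seed=0):
    """Generic simulated annealing over permutations of `loops`:
    random pairwise swaps, Metropolis acceptance, geometric cooling."""
    rng = random.Random(seed)
    order = list(loops)
    cur = best = cost(order)
    best_order = order[:]
    T = T0
    for _ in range(steps):
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]      # propose a swap
        new = cost(order)
        if new <= cur or rng.random() < math.exp((cur - new) / T):
            cur = new
            if new < best:
                best, best_order = new, order[:]
        else:
            order[i], order[j] = order[j], order[i]  # revert the swap
        T *= alpha
    return best_order, best

# Toy usage: recover a target ordering with an inversion-count "cost model"
target = list("ABCDEF")
cost = lambda o: sum(o.index(a) > o.index(b)
                     for i, a in enumerate(target) for b in target[i + 1:])
order, energy = anneal_loop_order("FEDCBA", cost)
```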
The ACCompanion: Combining Reactivity, Robustness, and Musical Expressivity in an Automatic Piano Accompanist
Authors: Carlos Cancino-Chacón, Silvan Peter, Patricia Hu, Emmanouil Karystinaios, Florian Henkel, Francesco Foscarin, Nimrod Varga, Gerhard Widmer
Abstract
This paper introduces the ACCompanion, an expressive accompaniment system. Similarly to a musician who accompanies a soloist playing a given musical piece, our system can produce a human-like rendition of the accompaniment part that follows the soloist's choices in terms of tempo, dynamics, and articulation. The ACCompanion works in the symbolic domain, i.e., it needs a musical instrument capable of producing and playing MIDI data, with explicitly encoded onset, offset, and pitch for each played note. We describe the components that go into such a system, from real-time score following and prediction to expressive performance generation and online adaptation to the expressive choices of the human player. Based on our experience with repeated live demonstrations in front of various audiences, we offer an analysis of the challenges of combining these components into a system that is highly reactive and precise while still being a reliable musical partner, robust to possible performance errors and responsive to expressive variations.
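To illustrate the flavour of online tempo adaptation in such a system, here is a toy smoother of performed versus notated inter-onset intervals. This stand-in is an assumption for exposition only; the ACCompanion's actual tempo models are more sophisticated.

```python
def track_tempo(perf_onsets, score_onsets, alpha=0.3, tempo0=0.5):
    """Toy tempo tracker: exponentially smoothed ratio of performed to
    notated inter-onset intervals, in seconds per beat."""
    tempo = tempo0
    for i in range(1, len(perf_onsets)):
        d_perf = perf_onsets[i] - perf_onsets[i - 1]
        d_score = score_onsets[i] - score_onsets[i - 1]
        if d_score > 0:
            tempo = (1 - alpha) * tempo + alpha * (d_perf / d_score)
    return tempo
```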
Latent Traversals in Generative Models as Potential Flows
Authors: Yue Song, Andy Keller, Nicu Sebe, Max Welling
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Despite the significant recent progress in deep generative models, the underlying structure of their latent spaces is still poorly understood, thereby making the task of performing semantically meaningful latent traversals an open research challenge. Most prior work has aimed to solve this challenge by modeling latent structures linearly, and finding corresponding linear directions which result in `disentangled' generations. In this work, we instead propose to model latent structures with a learned dynamic potential landscape, thereby performing latent traversals as the flow of samples down the landscape's gradient. Inspired by physics, optimal transport, and neuroscience, these potential landscapes are learned as physically realistic partial differential equations, thereby allowing them to flexibly vary over both space and time. To achieve disentanglement, multiple potentials are learned simultaneously, and are constrained by a classifier to be distinct and semantically self-consistent. Experimentally, we demonstrate that our method achieves trajectories that are both qualitatively and quantitatively more disentangled than those of state-of-the-art baselines. Further, we demonstrate that our method can be integrated as a regularization term during training, thereby acting as an inductive bias towards the learning of structured representations, ultimately improving model likelihood on similarly structured data.
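The traversal mechanism itself reduces to gradient descent on a potential in latent space. A minimal PyTorch sketch, where `potential` stands in for the PDE-parameterised landscape learned in the paper:

```python
import torch

def traverse(z0, potential, steps=20, eta=0.1):
    """Latent traversal as gradient flow down a potential U(z):
    z_{t+1} = z_t - eta * grad_z U(z_t), returning the whole path."""
    z = z0.clone().requires_grad_(True)
    path = [z0.clone()]
    for _ in range(steps):
        (g,) = torch.autograd.grad(potential(z).sum(), z)
        z = (z - eta * g).detach().requires_grad_(True)
        path.append(z.detach().clone())
    return torch.stack(path)

# e.g. traverse(torch.randn(8, 16), lambda z: 0.5 * (z ** 2).sum(dim=-1))
```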
Nondeterministic Stacks in Neural Networks
Abstract
Human language is full of compositional syntactic structures, and although neural networks have contributed to groundbreaking improvements in computer systems that process language, widely used neural network architectures still exhibit limitations in their ability to process syntax. To address this issue, prior work has proposed adding stack data structures to neural networks, drawing inspiration from theoretical connections between syntax and stacks. However, these methods employ deterministic stacks that are designed to track one parse at a time, whereas syntactic ambiguity, which requires a nondeterministic stack to parse, is extremely common in language. In this dissertation, we remedy this discrepancy by proposing a method of incorporating nondeterministic stacks into neural networks. We develop a differentiable data structure that efficiently simulates a nondeterministic pushdown automaton, representing an exponential number of computations with a dynamic programming algorithm. We incorporate this module into two predominant architectures: recurrent neural networks (RNNs) and transformers. We show that this raises their formal recognition power to arbitrary context-free languages, and also aids training, even on deterministic context-free languages. Empirically, neural networks with nondeterministic stacks learn context-free languages much more effectively than prior stack-augmented models, including a language with theoretically maximal parsing difficulty. We also show that an RNN augmented with a nondeterministic stack is capable of surprisingly powerful behavior, such as learning cross-serial dependencies, a well-known non-context-free pattern. We demonstrate improvements on natural language modeling and provide analysis on a syntactic generalization benchmark. This work represents an important step toward building systems that learn to use syntax in a more human-like fashion.
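For intuition about differentiable stack operations (a much simpler, deterministic cousin of the nondeterministic PDA simulation the dissertation develops), here is a continuous superposition stack in the style of Joulin and Mikolov: the new stack is a convex combination of the hard push, pop, and no-op outcomes.

```python
import torch

def stack_step(S, a_push, a_pop, a_noop, v):
    """One soft stack update. S: (depth, dim) stack; v: (dim,) vector to
    push; action weights a_* should be nonnegative and sum to 1."""
    pushed = torch.cat([v.unsqueeze(0), S[:-1]], dim=0)          # shift down
    popped = torch.cat([S[1:], torch.zeros_like(S[:1])], dim=0)  # shift up
    return a_push * pushed + a_pop * popped + a_noop * S
```

Because the update is differentiable in the action weights, a controller network can learn when to push and pop end-to-end; the dissertation's contribution is to extend this idea to a weighted superposition over *all* runs of a nondeterministic automaton.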
Centralized control for multi-agent RL in a complex Real-Time-Strategy game
Authors: Roger Creus Castanyer
Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Abstract
Multi-Agent Reinforcement Learning (MARL) studies the behaviour of multiple learning agents that coexist in a shared environment. MARL is more challenging than single-agent RL because it involves more complex learning dynamics: the observations and rewards of each agent are functions of all the other agents. In the context of MARL, Real-Time Strategy (RTS) games represent very challenging environments in which multiple players interact simultaneously and control many units of different natures all at once. In fact, RTS games are so challenging for current RL methods that simply being able to tackle them with RL is interesting in its own right. This project provides the end-to-end experience of applying RL in the Lux AI v2 Kaggle competition, where competitors design agents to control variable-sized fleets of units and tackle a multi-variable optimization, resource gathering, and allocation problem in a 1v1 scenario against other competitors. We use a centralized approach for training the RL agents and report the multiple design decisions made along the way. We provide the source code of the project: https://github.com/roger-creus/centralized-control-lux.
PoseVocab: Learning Joint-structured Pose Embeddings for Human Avatar Modeling
Abstract
Creating pose-driven human avatars amounts to modeling the mapping from the low-frequency driving pose to high-frequency dynamic human appearances, so an effective pose encoding method that can encode high-fidelity human details is essential to human avatar modeling. To this end, we present PoseVocab, a novel pose encoding method that encourages the network to discover the optimal pose embeddings for learning the dynamic human appearance. Given multi-view RGB videos of a character, PoseVocab constructs key poses and latent embeddings based on the training poses. To achieve pose generalization and temporal consistency, we sample key rotations in $so(3)$ of each joint rather than the global pose vectors, and assign a pose embedding to each sampled key rotation. These joint-structured pose embeddings not only encode the dynamic appearances under different key poses, but also factorize the global pose embedding into joint-structured ones to better learn the appearance variation related to the motion of each joint. To improve the representation ability of the pose embedding while maintaining memory efficiency, we introduce feature lines, a compact yet effective 3D representation, to model more fine-grained details of human appearances. Furthermore, given a query pose and a spatial position, a hierarchical query strategy is introduced to interpolate pose embeddings and acquire the conditional pose feature for dynamic human synthesis. Overall, PoseVocab effectively encodes the dynamic details of human appearance and enables realistic and generalized animation under novel poses. Experiments show that our method outperforms other state-of-the-art baselines both qualitatively and quantitatively in terms of synthesis quality. Code is available at https://github.com/lizhe00/PoseVocab.
Bake off redux: a review and experimental evaluation of recent time series classification algorithms
Authors: Matthew Middlehurst, Patrick Schäfer, Anthony Bagnall
Abstract
In 2017, a research paper compared 18 Time Series Classification (TSC) algorithms on 85 datasets from the University of California, Riverside (UCR) archive. This study, commonly referred to as a `bake off', identified that only nine algorithms performed significantly better than the Dynamic Time Warping (DTW) and Rotation Forest benchmarks that were used. The study categorised each algorithm by the type of feature it extracts from time series data, forming a taxonomy of five main algorithm types. This categorisation of algorithms, alongside the provision of code and accessible results for reproducibility, has helped fuel an increase in the popularity of the TSC field. Over six years have passed since this bake off; the UCR archive has expanded to 112 datasets, and a large number of new algorithms have been proposed. We revisit the bake off, seeing how each of the proposed categories has advanced since the original publication, and evaluate the performance of newer algorithms against the previous best of each category using an expanded UCR archive. We extend the taxonomy to include three new categories to reflect recent developments. Alongside the originally proposed distance, interval, shapelet, dictionary and hybrid based algorithms, we compare newer convolution and feature based algorithms as well as deep learning approaches. We introduce 30 classification datasets either recently donated to the archive or reformatted to the TSC format, and use these to further evaluate the best-performing algorithm from each category. Overall, we find that two recently proposed algorithms, Hydra+MultiROCKET and HIVE-COTEv2, perform significantly better than other approaches on both the current and new TSC problems.
Keyword: efficient
Proposal for a distributed, community-driven academic publishing system
Parallel bootstrap-based on-policy deep reinforcement learning for continuous flow control applications
Beyond the Pixel: a Photometrically Calibrated HDR Dataset for Luminance and Color Temperature Prediction
Efficient and Scalable Path-Planning Algorithms for Curvature Constrained Motion in the Hamilton-Jacobi Formulation
Recognizing and generating unswitchable graphs
Green Video Complexity Analysis for Efficient Encoding in Adaptive Video Streaming
Matrix-free GPU-accelerated saddle-point solvers for high-order problems in $H(\mathrm{div})$
HDCC: A Hyperdimensional Computing compiler for classification on embedded systems and high-performance computing
Codes Correcting a Single Long Duplication Error
PEFT-Ref: A Modular Reference Architecture and Typology for Parameter-Efficient Finetuning Techniques
Sample-Efficient and Surrogate-Based Design Optimization of Underwater Vehicle Hulls
TIGTEC : Token Importance Guided TExt Counterfactuals
Sparse Private LASSO Logistic Regression
VpROM: A novel Variational AutoEncoder-boosted Reduced Order Model for the treatment of parametric dependencies in nonlinear systems
Instance-Optimality in Interactive Decision Making: Toward a Non-Asymptotic Theory
Evaluating Adversarial Robustness on Document Image Classification
Queue Routing Strategies to Improve Equitable Housing Coordination in New York City
Graph Convolutional Networks based on Manifold Learning for Semi-Supervised Image Classification
DualSlide: Global-to-Local Sketching Interface for Slide Content and Layout Design
Hint-Aug: Drawing Hints from Foundation Vision Transformers Towards Boosted Few-Shot Parameter-Efficient Tuning
Foley Sound Synthesis at the DCASE 2023 Challenge
Text-guided Eyeglasses Manipulation with Spatial Constraints
Efficient Bayesian inference using physics-informed invertible neural networks for inverse problems
SwinFSR: Stereo Image Super-Resolution using SwinIR and Frequency Domain Knowledge
Performance Optimization using Multimodal Modeling and Heterogeneous GNN
Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures
AdaLIO: Robust Adaptive LiDAR-Inertial Odometry in Degenerate Indoor Environments
MixNeRF: Memory Efficient NeRF with Feature Mixed-up Hash Table
Analog Iterative Machine (AIM): using light to solve quadratic optimization problems with mixed variables
Fast Continuous Subgraph Matching over Streaming Graphs via Backtracking Reduction
Weakly-Supervised Temporal Action Localization with Bidirectional Semantic Consistency Constraint
Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation
Spatiotemporal Graph Convolutional Recurrent Neural Network Model for Citywide Air Pollution Forecasting
LMSFC: A Novel Multidimensional Index based on Learned Monotonic Space Filling Curves
A Practical Algorithm for Max-Norm Optimal Binary Labeling of Graphs
Evaluating the Energy Measurements of the IBM POWER9 On-Chip Controller
Towards Generating Hop-constrained s-t Simple Path Graphs
Patch-based 3D Natural Scene Generation from a Single Example
A Static Pruning Study on Sparse Neural Retrievers
Focusing on Information Context for ITS using a Spatial Age of Information Model
Towards Characterizing the First-order Query Complexity of Learning (Approximate) Nash Equilibria in Zero-sum Matrix Games
Binary stochasticity enabled highly efficient neuromorphic deep learning achieves better-than-software accuracy
SPDH-Sign: towards Efficient, Post-quantum Group-based Signatures
User-Centric Federated Learning: Trading off Wireless Resources for Personalization
SALSA: Simulated Annealing based Loop-Ordering Scheduler for DNN Accelerators
Nondeterministic Stacks in Neural Networks
Faster High Accuracy Multi-Commodity Flow from Single-Commodity Techniques
Room dimensions and absorption inference from room transfer function via machine learning
On the Generalization of Learned Structured Representations
Flickr-PAD: New Face High-Resolution Presentation Attack Detection Database
Keyword: faster
Green Video Complexity Analysis for Efficient Encoding in Adaptive Video Streaming
DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents
Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models
Fast Continuous Subgraph Matching over Streaming Graphs via Backtracking Reduction
Demystifying Random Number in Ethereum Smart Contract: Taxonomy, Vulnerability Identification, and Attack Detection
Channel Estimation and Signal Detection for NLOS Ultraviolet Scattering Communication with Space Division Multiple Access
Keyword: mobile
IMUPoser: Full-Body Pose Estimation using IMUs in Phones, Watches, and Earbuds
SwinFSR: Stereo Image Super-Resolution using SwinIR and Frequency Domain Knowledge
Social media in the Global South: A Network Dataset of the Malian Twittersphere
Linguistic Dead-Ends and Alphabet Soup: Finding Dark Patterns in Japanese Apps
Automated Solubility Analysis System and Method Using Computer Vision and Machine Learning
Flickr-PAD: New Face High-Resolution Presentation Attack Detection Database
Keyword: pruning
Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures
A Static Pruning Study on Sparse Neural Retrievers
Expand-and-Cluster: Exact Parameter Recovery of Neural Networks
Keyword: voxel
AdaLIO: Robust Adaptive LiDAR-Inertial Odometry in Degenerate Indoor Environments
DQS3D: Densely-matched Quantization-aware Semi-supervised 3D Detection
Keyword: lidar
Pointersect: Neural Rendering with Cloud-Ray Intersection
End-to-End Lidar-Camera Self-Calibration for Autonomous Vehicles
Object Semantics Give Us the Depth We Need: Multi-task Approach to Aerial Depth Completion
AdaLIO: Robust Adaptive LiDAR-Inertial Odometry in Degenerate Indoor Environments
ContrastMotion: Self-supervised Scene Motion Learning for Large-Scale LiDAR Point Clouds
Keyword: diffusion
Matrix-free GPU-accelerated saddle-point solvers for high-order problems in $H(\mathrm{div})$
TextMesh: Generation of Realistic 3D Meshes From Text Prompts
RenderDiffusion: Text Generation as Image Generation
Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models
Exploring Compositional Visual Generation with Latent Classifier Guidance
Efficient Bayesian inference using physics-informed invertible neural networks for inverse problems
CoDi: Co-evolving Contrastive Diffusion Models for Mixed-type Tabular Synthesis
Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning
The Score-Difference Flow for Implicit Generative Modeling
Keyword: dynamic
Parallel bootstrap-based on-policy deep reinforcement learning for continuous flow control applications
Beyond the Pixel: a Photometrically Calibrated HDR Dataset for Luminance and Color Temperature Prediction
Efficient and Scalable Path-Planning Algorithms for Curvature Constrained Motion in the Hamilton-Jacobi Formulation
PID-inspired modifications in response threshold models in swarm intelligent systems
CEDR-API: Productive, Performant Programming of Domain-Specific Embedded Systems
Synthesizing Stable Reduced-Order Visuomotor Policies for Nonlinear Systems via Sums-of-Squares Optimization
Sample-Efficient and Surrogate-Based Design Optimization of Underwater Vehicle Hulls
Neuroevolution of Recurrent Architectures on Control Tasks
VpROM: A novel Variational AutoEncoder-boosted Reduced Order Model for the treatment of parametric dependencies in nonlinear systems
Real-Time Ground Fault Detection for Inverter-Based Microgrid Systems
Large Intelligent Surface Measurements for Joint Communication and Sensing
Model-Free Learning and Optimal Policy Design in Multi-Agent MDPs Under Probabilistic Agent Dropout
Recurrent Transformer Encoders for Vision-based Estimation of Fatigue and Engagement in Cognitive Training Sessions
Artificial General Intelligence (AGI) for Education
Information Theory for Complex Systems Scientists
What is the Expected Transient Behavior of Opinion Evolution for Two Communities?
Causal Semantic Communication for Digital Twins: A Generalizable Imitation Learning Approach
Mobilizing Personalized Federated Learning via Random Walk Stochastic ADMM
Opinion Control under Adversarial Network Perturbation: A Stackelberg Game Approach
Real-time Safety Assessment of Dynamic Systems in Non-stationary Environments: A Review of Methods and Techniques
ContrastMotion: Self-supervised Scene Motion Learning for Large-Scale LiDAR Point Clouds
Partially Observable Mean Field Multi-Agent Reinforcement Learning Based on Graph-Attention
Dynamic Video Frame Interpolation with integrated Difficulty Pre-Assessment
Low-Power Data Streaming in Systolic Arrays with Bus-Invert Coding and Zero-Value Clock Gating
Learning Robust Deep Equilibrium Models
Inverting the Imaging Process by Learning an Implicit Camera Model
Blockchain Large Language Models
Parallel Spiking Neurons with High Efficiency and Long-term Dependencies Learning Ability
Dynamic Ineffectuality-based Clustered Architectures
Adaptive Collective Responses to Local Stimuli in Anonymous Dynamic Networks
Modeling Adaptive Self-healing Systems
Towards a generalizable simulation framework to study collisions between spacecraft and debris
Adaptive Services Function Chain Orchestration For Digital Health Twin Use Cases: Heuristic-boosted Q-Learning Approach
Constraining Chaos: Enforcing dynamical invariants in the training of recurrent neural networks
Data-Driven Robust Optimization for Energy-Aware and Safe Navigation of Electric Vehicles
The Score-Difference Flow for Implicit Generative Modeling
Direct Collocation Methods for Trajectory Optimization in Constrained Robotic Systems
System Identification with Copula Entropy
SALSA: Simulated Annealing based Loop-Ordering Scheduler for DNN Accelerators
The ACCompanion: Combining Reactivity, Robustness, and Musical Expressivity in an Automatic Piano Accompanist
Latent Traversals in Generative Models as Potential Flows
Nondeterministic Stacks in Neural Networks
Centralized control for multi-agent RL in a complex Real-Time-Strategy game
PoseVocab: Learning Joint-structured Pose Embeddings for Human Avatar Modeling
Bake off redux: a review and experimental evaluation of recent time series classification algorithms